# Tensors: The Core of PyTorch

- Create tensors from different data sources like Python lists and NumPy arrays.
- Reshape and manipulate tensor dimensions to prepare data for model inputs.
- Use indexing and slicing techniques to access and filter specific parts of your data.
- Perform the mathematical and logical operations that form the basis of all neural network computations.

### Imports


In [2]:
import torch
import numpy as np
import pandas as pd

## Tensor Creation

### From Existing Data Structures

In [3]:
# From Python lists
x = torch.tensor([1, 2, 3])

print("FROM PYTHON LISTS:", x)
print("TENSOR DATA TYPE:", x.dtype)

FROM PYTHON LISTS: tensor([1, 2, 3])
TENSOR DATA TYPE: torch.int64


In [5]:
# From a NumPy array
numpy_array = np.array([[1, 2, 3], [4, 5, 6]])
torch_tensor_from_numpy = torch.from_numpy(numpy_array)

print("TENSOR FROM NUMPY:\n\n", torch_tensor_from_numpy)
print("TENSOR DATA TYPE:", torch_tensor_from_numpy.dtype)

TENSOR FROM NUMPY:

 tensor([[1, 2, 3],
        [4, 5, 6]])
TENSOR DATA TYPE: torch.int64
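One detail worth knowing: torch.from_numpy() shares memory with the source array, whereas torch.tensor() copies the data. The short sketch below (the array is a throwaway example) shows the difference by modifying the NumPy array after both tensors have been created.

```python
import numpy as np
import torch

numpy_array = np.array([1, 2, 3])

shared = torch.from_numpy(numpy_array)  # shares memory with numpy_array
copied = torch.tensor(numpy_array)      # independent copy of the data

numpy_array[0] = 100  # modify the original NumPy array in place

print("SHARED:", shared)  # first element is now 100
print("COPIED:", copied)  # still starts with 1
```

If you need an independent tensor, use torch.tensor(); if you want to avoid a copy for a large array, torch.from_numpy() is cheaper but changes flow both ways.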


In [6]:
# From Pandas DataFrame
# Read the data from the CSV file into a DataFrame
df = pd.read_csv('./data.csv')

# Extract the data as a NumPy array (to_numpy() is the recommended replacement for .values)
all_values = df.to_numpy()

# Convert the DataFrame's values to a PyTorch tensor
tensor_from_df = torch.tensor(all_values)

print("ORIGINAL DATAFRAME:\n\n", df)
print("\nRESULTING TENSOR:\n\n", tensor_from_df)
print("\nTENSOR DATA TYPE:", tensor_from_df.dtype)

ORIGINAL DATAFRAME:

    distance_miles  delivery_time_minutes
0            1.60                   7.22
1           13.09                  32.41
2            6.97                  17.47

RESULTING TENSOR:

 tensor([[ 1.6000,  7.2200],
        [13.0900, 32.4100],
        [ 6.9700, 17.4700]], dtype=torch.float64)

TENSOR DATA TYPE: torch.float64
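Because the resulting tensor is float64 while models usually expect float32, you can request the dtype during conversion. This sketch builds a small DataFrame inline with the same values as above, standing in for data.csv so it runs without the file; the names all_data, features, and targets are illustrative.

```python
import pandas as pd
import torch

# Inline stand-in for data.csv, using the values shown above
df = pd.DataFrame({
    "distance_miles": [1.60, 13.09, 6.97],
    "delivery_time_minutes": [7.22, 32.41, 17.47],
})

# Convert every column at once, requesting float32 instead of the default float64
all_data = torch.tensor(df.to_numpy(), dtype=torch.float32)

# Or select columns by name; double brackets keep a 2D [rows, 1] shape
features = torch.tensor(df[["distance_miles"]].to_numpy(), dtype=torch.float32)
targets = torch.tensor(df["delivery_time_minutes"].to_numpy(), dtype=torch.float32)

print(all_data.dtype, all_data.shape)  # torch.float32 torch.Size([3, 2])
print(features.shape, targets.shape)   # torch.Size([3, 1]) torch.Size([3])
```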


### With Predefined Values

In [7]:
# All zeros
zeros = torch.zeros(2, 3)

print("TENSOR WITH ZEROS:\n\n", zeros)

TENSOR WITH ZEROS:

 tensor([[0., 0., 0.],
        [0., 0., 0.]])


In [8]:
# All ones
ones = torch.ones(2, 3)

print("TENSOR WITH ONES:\n\n", ones)

TENSOR WITH ONES:

 tensor([[1., 1., 1.],
        [1., 1., 1.]])


In [9]:
# Random numbers
random = torch.rand(2, 3)

print("RANDOM TENSOR:\n\n", random)

RANDOM TENSOR:

 tensor([[0.2335, 0.0480, 0.3448],
        [0.5353, 0.5431, 0.0021]])
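Since torch.rand() returns different values on every run, your output will differ from the one above. A minimal sketch using torch.manual_seed() to make random results repeatable, plus two related creation functions, torch.randn() and torch.full(); the seed value 42 and the fill value 7.0 are arbitrary choices.

```python
import torch

torch.manual_seed(42)        # fix the random number generator's state
first = torch.rand(2, 3)

torch.manual_seed(42)        # reset to the same state
second = torch.rand(2, 3)

print(torch.equal(first, second))  # True: same seed, same numbers

normal = torch.randn(2, 3)         # samples from a standard normal distribution
sevens = torch.full((2, 3), 7.0)   # every element set to 7.0
print(normal.shape)
print(sevens)
```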


### From a Sequence

In [10]:
# Range of numbers
range_tensor = torch.arange(0, 10, step=1)

print("ARANGE TENSOR:", range_tensor)

ARANGE TENSOR: tensor([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
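torch.arange() excludes the end value, just like Python's range(). When you need a fixed number of evenly spaced points that include both ends, torch.linspace() is the usual alternative; the sketch below compares the two.

```python
import torch

evens = torch.arange(0, 10, step=2)     # end value 10 is excluded
points = torch.linspace(0, 1, steps=5)  # 5 evenly spaced values, both ends included

print(evens)   # tensor([0, 2, 4, 6, 8])
print(points)  # tensor([0.0000, 0.2500, 0.5000, 0.7500, 1.0000])
```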


## Reshaping and Manipulating

In [11]:
# A 2D tensor
x = torch.tensor([[1, 2, 3],
                  [4, 5, 6]])

print("ORIGINAL TENSOR:\n\n", x)
print("\nTENSOR SHAPE:", x.shape)

ORIGINAL TENSOR:

 tensor([[1, 2, 3],
        [4, 5, 6]])

TENSOR SHAPE: torch.Size([2, 3])


### Changing a Tensor's Dimensions
When a tensor's shape doesn't match what your model expects, you need to correct it. A frequent task is adding a dimension to a single data sample to create a batch of size one, or removing that extra dimension once the batch operation is complete.

Adding Dimension: torch.Tensor.unsqueeze() inserts a new dimension at the specified index.
Notice how the shape changes from [2, 3] to [1, 2, 3] and the tensor is wrapped in an extra pair of square brackets [].

In [15]:
print("ORIGINAL TENSOR:\n\n", x)
print("\nTENSOR SHAPE:", x.shape)
print("-"*45)

# Add dimension
expanded = x.unsqueeze(0)  # Add dimension at index 0

print("\nTENSOR WITH ADDED DIMENSION AT INDEX 0:\n\n", expanded)
print("\nTENSOR SHAPE:", expanded.shape)

ORIGINAL TENSOR:

 tensor([[1, 2, 3],
        [4, 5, 6]])

TENSOR SHAPE: torch.Size([2, 3])
---------------------------------------------

TENSOR WITH ADDED DIMENSION AT INDEX 0:

 tensor([[[1, 2, 3],
         [4, 5, 6]]])

TENSOR SHAPE: torch.Size([1, 2, 3])


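Removing Dimension: torch.Tensor.squeeze() removes dimensions of size 1. Called without an argument, it removes every size-1 dimension; pass an index (e.g., squeeze(0)) to remove only that one.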
This reverses the unsqueeze operation, removing the 1 from the shape and taking away a pair of outer square brackets.

In [16]:
print("EXPANDED TENSOR:\n\n", expanded)
print("\nTENSOR SHAPE:", expanded.shape)
print("-"*45)

# Remove dimension
squeezed = expanded.squeeze()

print("\nTENSOR WITH DIMENSION REMOVED:\n\n", squeezed)
print("\nTENSOR SHAPE:", squeezed.shape)

EXPANDED TENSOR:

 tensor([[[1, 2, 3],
         [4, 5, 6]]])

TENSOR SHAPE: torch.Size([1, 2, 3])
---------------------------------------------

TENSOR WITH DIMENSION REMOVED:

 tensor([[1, 2, 3],
        [4, 5, 6]])

TENSOR SHAPE: torch.Size([2, 3])
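Because squeeze() with no argument removes every size-1 dimension, it can remove more than you intended, for example a batch dimension of size 1 together with a channel dimension of size 1. A short sketch on a throwaway [1, 2, 1] tensor:

```python
import torch

t = torch.zeros(1, 2, 1)

all_removed = t.squeeze()     # every size-1 dim removed
first_removed = t.squeeze(0)  # only dim 0 removed
unchanged = t.squeeze(1)      # dim 1 has size 2, so nothing is removed

print(all_removed.shape)    # torch.Size([2])
print(first_removed.shape)  # torch.Size([2, 1])
print(unchanged.shape)      # torch.Size([1, 2, 1])
```

Passing an explicit index is the safer choice when you only mean to undo a specific unsqueeze().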


### Restructuring

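Reshaping: torch.Tensor.reshape() changes the shape of a tensor to the specified dimensions. The new shape must hold the same total number of elements as the original (here 2 x 3 = 3 x 2 = 6).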

In [17]:
print("ORIGINAL TENSOR:\n\n", x)
print("\nTENSOR SHAPE:", x.shape)
print("-"*45)

# Reshape
reshaped = x.reshape(3, 2)

print("\nAFTER PERFORMING reshape(3, 2):\n\n", reshaped)
print("\nTENSOR SHAPE:", reshaped.shape)

ORIGINAL TENSOR:

 tensor([[1, 2, 3],
        [4, 5, 6]])

TENSOR SHAPE: torch.Size([2, 3])
---------------------------------------------

AFTER PERFORMING reshape(3, 2):

 tensor([[1, 2],
        [3, 4],
        [5, 6]])

TENSOR SHAPE: torch.Size([3, 2])
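reshape() also accepts -1 for one dimension, letting PyTorch infer its size from the element count, and it raises an error when the requested shape holds a different number of elements. A minimal sketch reusing the same 2x3 tensor:

```python
import torch

x = torch.tensor([[1, 2, 3],
                  [4, 5, 6]])

flat = x.reshape(-1)       # -1 asks PyTorch to infer the size: shape [6]
column = x.reshape(-1, 1)  # infer the rows, fix 1 column: shape [6, 1]

print(flat)
print(column.shape)

reshape_failed = False
try:
    x.reshape(4, 2)        # 4 x 2 = 8 elements, but x only holds 6
except RuntimeError as err:
    reshape_failed = True
    print("RuntimeError:", err)
```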


Transposing: torch.Tensor.transpose() swaps the specified dimensions of a tensor.

In [18]:
print("ORIGINAL TENSOR:\n\n", x)
print("\nTENSOR SHAPE:", x.shape)
print("-"*45)

# Transpose
transposed = x.transpose(0, 1)

print("\nAFTER PERFORMING transpose(0, 1):\n\n", transposed)
print("\nTENSOR SHAPE:", transposed.shape)

ORIGINAL TENSOR:

 tensor([[1, 2, 3],
        [4, 5, 6]])

TENSOR SHAPE: torch.Size([2, 3])
---------------------------------------------

AFTER PERFORMING transpose(0, 1):

 tensor([[1, 4],
        [2, 5],
        [3, 6]])

TENSOR SHAPE: torch.Size([3, 2])
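For 2D tensors, .T is a common shorthand for transpose(0, 1). On tensors with more dimensions, transpose() swaps exactly the two dimensions you name and leaves the rest in place, as this small sketch on a throwaway [2, 3, 4] tensor shows.

```python
import torch

x = torch.tensor([[1, 2, 3],
                  [4, 5, 6]])

# For a 2D tensor, .T is shorthand for transpose(0, 1)
print(torch.equal(x.T, x.transpose(0, 1)))  # True

# For higher-dimensional tensors, only the two named dims trade places
t = torch.zeros(2, 3, 4)
swapped = t.transpose(0, 2)
print(swapped.shape)  # torch.Size([4, 3, 2])
```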


### Combining Tensors

torch.cat(): Joins a sequence of tensors along an existing dimension. Note: All tensors must have the same shape in dimensions other than the one being concatenated.

In [19]:
# Create two tensors to concatenate
tensor_a = torch.tensor([[1, 2],
                         [3, 4]])
tensor_b = torch.tensor([[5, 6],
                         [7, 8]])

# Concatenate along columns (dim=1)
concatenated_tensors = torch.cat((tensor_a, tensor_b), dim=1)


print("TENSOR A:\n\n", tensor_a)
print("\nTENSOR B:\n\n", tensor_b)
print("-"*45)
print("\nCONCATENATED TENSOR (dim=1):\n\n", concatenated_tensors)

TENSOR A:

 tensor([[1, 2],
        [3, 4]])

TENSOR B:

 tensor([[5, 6],
        [7, 8]])
---------------------------------------------

CONCATENATED TENSOR (dim=1):

 tensor([[1, 2, 5, 6],
        [3, 4, 7, 8]])
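Concatenating along dim=0 stacks the rows instead, while torch.stack() is a different operation: it adds a brand-new dimension rather than extending an existing one. A short comparison using the same two tensors:

```python
import torch

tensor_a = torch.tensor([[1, 2],
                         [3, 4]])
tensor_b = torch.tensor([[5, 6],
                         [7, 8]])

rows = torch.cat((tensor_a, tensor_b), dim=0)  # joins along rows: shape [4, 2]
stacked = torch.stack((tensor_a, tensor_b))    # adds a NEW dim 0: shape [2, 2, 2]

print(rows)
print(stacked.shape)
```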


### Indexing and Slicing

- Accessing Elements
These are the fundamental techniques for getting data out of a tensor, and they work much like indexing a standard Python list.

  - Standard Indexing: Accessing single elements or entire rows using integer indices (e.g., x[0], x[1, 2]).

In [20]:
# Create a 3x4 tensor
x = torch.tensor([
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12]
])
print("ORIGINAL TENSOR:\n\n", x)
print("-" * 55)

# Get a single element at row 1, column 2
single_element_tensor = x[1, 2]

print("\nINDEXING SINGLE ELEMENT AT [1, 2]:", single_element_tensor)
print("-" * 55)

# Get the entire second row (index 1)
second_row = x[1]

print("\nINDEXING ENTIRE ROW [1]:", second_row)
print("-" * 55)

# Last row
last_row = x[-1]

print("\nINDEXING ENTIRE LAST ROW ([-1]):", last_row, "\n")

ORIGINAL TENSOR:

 tensor([[ 1,  2,  3,  4],
        [ 5,  6,  7,  8],
        [ 9, 10, 11, 12]])
-------------------------------------------------------

INDEXING SINGLE ELEMENT AT [1, 2]: tensor(7)
-------------------------------------------------------

INDEXING ENTIRE ROW [1]: tensor([5, 6, 7, 8])
-------------------------------------------------------

INDEXING ENTIRE LAST ROW ([-1]): tensor([ 9, 10, 11, 12]) 



- Slicing: Extracting sub-tensors using [start:end:step] notation (e.g., x[:2, ::2]). The end index itself is not included in the slice.
- Slicing can also select entire columns: a bare : keeps every row (e.g., x[:, 2]).

In [21]:
print("ORIGINAL TENSOR:\n\n", x)
print("-" * 55)

# Get the first two rows
first_two_rows = x[0:2]

print("\nSLICING FIRST TWO ROWS ([0:2]):\n\n", first_two_rows)
print("-" * 55)

# Get the third column of all rows
third_column = x[:, 2]

print("\nSLICING THIRD COLUMN ([:, 2]):", third_column)
print("-" * 55)

# Every other column
every_other_col = x[:, ::2]

print("\nEVERY OTHER COLUMN ([:, ::2]):\n\n", every_other_col)
print("-" * 55)

# Last column
last_col = x[:, -1]

print("\nLAST COLUMN ([:, -1]):", last_col, "\n")

ORIGINAL TENSOR:

 tensor([[ 1,  2,  3,  4],
        [ 5,  6,  7,  8],
        [ 9, 10, 11, 12]])
-------------------------------------------------------

SLICING FIRST TWO ROWS ([0:2]):

 tensor([[1, 2, 3, 4],
        [5, 6, 7, 8]])
-------------------------------------------------------

SLICING THIRD COLUMN ([:, 2]): tensor([ 3,  7, 11])
-------------------------------------------------------

EVERY OTHER COLUMN ([:, ::2]):

 tensor([[ 1,  3],
        [ 5,  7],
        [ 9, 11]])
-------------------------------------------------------

LAST COLUMN ([:, -1]): tensor([ 4,  8, 12]) 



- Combining Indexing & Slicing

In [22]:
print("ORIGINAL TENSOR:\n\n", x)
print("-" * 55)

# Combining slicing and indexing (First two rows, last two columns)
combined = x[0:2, 2:]

print("\nFIRST TWO ROWS, LAST TWO COLS ([0:2, 2:]):\n\n", combined, "\n")

ORIGINAL TENSOR:

 tensor([[ 1,  2,  3,  4],
        [ 5,  6,  7,  8],
        [ 9, 10, 11, 12]])
-------------------------------------------------------

FIRST TWO ROWS, LAST TWO COLS ([0:2, 2:]):

 tensor([[3, 4],
        [7, 8]]) 



- .item(): Extracts the value from a single-element tensor as a standard Python number.

In [23]:
print("SINGLE-ELEMENT TENSOR:", single_element_tensor)
print("-" * 45)

# Extract the value from a single-element tensor as a standard Python number
value = single_element_tensor.item()

print("\n.item() PYTHON NUMBER EXTRACTED:", value)
print("TYPE:", type(value))

SINGLE-ELEMENT TENSOR: tensor(7)
---------------------------------------------

.item() PYTHON NUMBER EXTRACTED: 7
TYPE: <class 'int'>
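.item() only works on tensors with exactly one element. For a tensor with several elements, .tolist() converts it to a (possibly nested) Python list instead. A minimal sketch:

```python
import torch

row = torch.tensor([5, 6, 7, 8])
as_list = row.tolist()  # plain Python list of ints
print(as_list)

item_failed = False
try:
    row.item()          # fails: the tensor has 4 elements, not 1
except RuntimeError as err:
    item_failed = True
    print("RuntimeError:", err)
```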


### Advanced Indexing

- Boolean Masking: Using a boolean tensor to select elements that meet a certain condition (e.g., x[x > 5]).

In [26]:
print("ORIGINAL TENSOR:\n\n", x)
print("-" * 55)

# Boolean indexing using logical comparisons
mask = x > 6

print("MASK (VALUES > 6):\n\n", mask, "\n")

# Applying Boolean masking
mask_applied = x[mask]

print("VALUES AFTER APPLYING MASK:", mask_applied, "\n")

ORIGINAL TENSOR:

 tensor([[ 1,  2,  3,  4],
        [ 5,  6,  7,  8],
        [ 9, 10, 11, 12]])
-------------------------------------------------------
MASK (VALUES > 6):

 tensor([[False, False, False, False],
        [False, False,  True,  True],
        [ True,  True,  True,  True]]) 

VALUES AFTER APPLYING MASK: tensor([ 7,  8,  9, 10, 11, 12]) 
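Note that applying a mask returns a flattened 1D tensor of the selected values. If you want to keep the original shape, torch.where() chooses between two values element by element, and a mask can also drive in-place assignment. A short sketch, where the cap value 6 and the replacement value 0 are arbitrary:

```python
import torch

x = torch.tensor([[1, 2, 3, 4],
                  [5, 6, 7, 8],
                  [9, 10, 11, 12]])

# torch.where(condition, a, b): take a where condition is True, otherwise b.
# The result keeps the original 3x4 shape.
capped = torch.where(x > 6, torch.tensor(6), x)
print(capped)

# A mask can also select positions for in-place assignment (on a copy here)
y = x.clone()
y[y > 6] = 0
print(y)
```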



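- Fancy Indexing: Using tensors of indices to select specific, possibly non-contiguous elements. In the example below, row_indices[:, None] turns the row indices into a column of shape [2, 1], which broadcasts against the column indices to select every row/column combination.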

In [29]:
print("ORIGINAL TENSOR:\n\n", x)
print("-" * 55)

# Fancy indexing

# Get first and third rows
row_indices = torch.tensor([0, 2])

# Get second and fourth columns
col_indices = torch.tensor([1, 3]) 

# Gets values at (0,1), (0,3), (2,1), (2,3)
get_values = x[row_indices[:, None], col_indices]

print("\nSPECIFIC ELEMENTS USING INDICES:\n\n", get_values, "\n")

ORIGINAL TENSOR:

 tensor([[ 1,  2,  3,  4],
        [ 5,  6,  7,  8],
        [ 9, 10, 11, 12]])
-------------------------------------------------------

SPECIFIC ELEMENTS USING INDICES:

 tensor([[ 2,  4],
        [10, 12]]) 
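The [:, None] step matters. Without it, the two index tensors are paired element by element rather than combined into a grid. The sketch below compares both forms on the same tensor.

```python
import torch

x = torch.tensor([[1, 2, 3, 4],
                  [5, 6, 7, 8],
                  [9, 10, 11, 12]])
row_indices = torch.tensor([0, 2])
col_indices = torch.tensor([1, 3])

# Paired element by element: positions (0, 1) and (2, 3) -> 1D result
pairs = x[row_indices, col_indices]
print(pairs)  # tensor([ 2, 12])

# row_indices[:, None] has shape [2, 1]; it broadcasts against col_indices
# (shape [2]) to select every row/column combination -> 2x2 result
grid = x[row_indices[:, None], col_indices]
print(grid)
```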



### Mathematical and Logical Operations

- Arithmetic: These operations are the foundation of how a neural network processes data. You'll see how PyTorch handles element-wise calculations and uses a powerful feature called broadcasting to simplify your code.
  -  Element-wise Operations: Standard math operators (+, *) that apply to each element independently.

In [31]:
a = torch.tensor([1, 2, 3])
b = torch.tensor([4, 5, 6])
print("TENSOR A:", a)
print("TENSOR B:", b)
print("-" * 60)

# Element-wise addition
element_add = a + b

print("\nAFTER PERFORMING ELEMENT-WISE ADDITION:", element_add, "\n")

TENSOR A: tensor([1, 2, 3])
TENSOR B: tensor([4, 5, 6])
------------------------------------------------------------

AFTER PERFORMING ELEMENT-WISE ADDITION: tensor([5, 7, 9]) 



In [32]:
print("TENSOR A:", a)
print("TENSOR B:", b)
print("-" * 65)

# Element-wise multiplication
element_mul = a * b

print("\nAFTER PERFORMING ELEMENT-WISE MULTIPLICATION:", element_mul, "\n")

TENSOR A: tensor([1, 2, 3])
TENSOR B: tensor([4, 5, 6])
-----------------------------------------------------------------

AFTER PERFORMING ELEMENT-WISE MULTIPLICATION: tensor([ 4, 10, 18]) 



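- Dot Product (torch.matmul()): For two 1D vectors, torch.matmul() computes their dot product, the sum of the element-wise products (1*4 + 2*5 + 3*6 = 32). For 2D matrices it performs matrix multiplication instead.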

In [33]:
print("TENSOR A:", a)
print("TENSOR B:", b)
print("-" * 65)

# Dot product
dot_product = torch.matmul(a, b)

print("\nAFTER PERFORMING DOT PRODUCT:", dot_product, "\n")

TENSOR A: tensor([1, 2, 3])
TENSOR B: tensor([4, 5, 6])
-----------------------------------------------------------------

AFTER PERFORMING DOT PRODUCT: tensor(32) 
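To see the difference between the two cases, the sketch below uses torch.dot(), which accepts only 1D tensors, and then multiplies two small example matrices with the @ operator (shorthand for torch.matmul()), alongside element-wise * for comparison. The matrices are made up for illustration and use floats.

```python
import torch

a = torch.tensor([1, 2, 3])
b = torch.tensor([4, 5, 6])
print(torch.dot(a, b))  # tensor(32): same as torch.matmul(a, b) for 1D inputs

A = torch.tensor([[1.0, 2.0],
                  [3.0, 4.0]])
B = torch.tensor([[5.0, 6.0],
                  [7.0, 8.0]])

matrix_product = A @ B  # row-by-column sums: [[19, 22], [43, 50]]
elementwise = A * B     # position-by-position: [[5, 12], [21, 32]]

print(matrix_product)
print(elementwise)
```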



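- Broadcasting: The automatic expansion of smaller tensors to match the shape of larger tensors during arithmetic operations.
  - Broadcasting allows operations between tensors with compatible shapes, even if they don't have the exact same dimensions. Shapes are compared from the last dimension backward, and two dimensions are compatible when they are equal or one of them is 1 (a missing dimension counts as 1). In the example below, a has shape [3] and b has shape [3, 1], so both are expanded to [3, 3].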

In [34]:
a = torch.tensor([1, 2, 3])
b = torch.tensor([[1],
                 [2],
                 [3]])

print("TENSOR A:", a)
print("SHAPE:", a.shape)
print("\nTENSOR B:\n\n", b)
print("\nSHAPE:", b.shape)
print("-" * 65)

# Apply broadcasting
c = a + b


print("\nTENSOR C:\n\n", c)
print("\nSHAPE:", c.shape, "\n")

TENSOR A: tensor([1, 2, 3])
SHAPE: torch.Size([3])

TENSOR B:

 tensor([[1],
        [2],
        [3]])

SHAPE: torch.Size([3, 1])
-----------------------------------------------------------------

TENSOR C:

 tensor([[2, 3, 4],
        [3, 4, 5],
        [4, 5, 6]])

SHAPE: torch.Size([3, 3]) 
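A quick way to check the compatibility rule is to compare shapes from the right. In this sketch, a [3] row and a [2, 1] column both broadcast against a [2, 3] matrix, while a [2] vector fails because its size does not match the trailing dimension 3 and is not 1. The example values are arbitrary.

```python
import torch

matrix = torch.ones(2, 3)
row = torch.tensor([10.0, 20.0, 30.0])  # shape [3]: matches the trailing dim
col = torch.tensor([[1.0], [2.0]])      # shape [2, 1]: size 1 stretches across columns

with_row = matrix + row
with_col = matrix + col
print(with_row.shape, with_col.shape)   # both torch.Size([2, 3])

broadcast_failed = False
try:
    matrix + torch.tensor([1.0, 2.0])   # shape [2] vs trailing dim 3: incompatible
except RuntimeError as err:
    broadcast_failed = True
    print("RuntimeError:", err)
```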



### Logic & Comparisons
Logical operations are powerful tools for data preparation and analysis. They allow you to create boolean masks to filter, select, or modify your data based on specific conditions you define.

- Comparison Operators: Element-wise comparisons (>, ==, <) that produce a boolean tensor.

In [35]:
temperatures = torch.tensor([20, 35, 19, 35, 42])
print("TEMPERATURES:", temperatures)
print("-" * 50)

### Comparison Operators (>, <=, ==)

# Use '>' (greater than) to find temperatures above 30
is_hot = temperatures > 30

# Use '<=' (less than or equal to) to find temperatures 20 or below
is_cool = temperatures <= 20

# Use '==' (equal to) to find temperatures exactly equal to 35
is_35_degrees = temperatures == 35

print("\nHOT (> 30 DEGREES):", is_hot)
print("COOL (<= 20 DEGREES):", is_cool)
print("EXACTLY 35 DEGREES:", is_35_degrees, "\n")

TEMPERATURES: tensor([20, 35, 19, 35, 42])
--------------------------------------------------

HOT (> 30 DEGREES): tensor([False,  True, False,  True,  True])
COOL (<= 20 DEGREES): tensor([ True, False,  True, False, False])
EXACTLY 35 DEGREES: tensor([False,  True, False,  True, False]) 



- Logical Operators: Element-wise logical operations (& for AND, | for OR) on boolean tensors.

In [36]:
is_morning = torch.tensor([True, False, False, True])
is_raining = torch.tensor([False, False, True, True])
print("IS MORNING:", is_morning)
print("IS RAINING:", is_raining)
print("-" * 50)

### Logical Operators (&, |)

# Use '&' (AND) to find when it's both morning and raining
morning_and_raining = (is_morning & is_raining)

# Use '|' (OR) to find when it's either morning or raining
morning_or_raining = is_morning | is_raining

print("\nMORNING & (AND) RAINING:", morning_and_raining)
print("MORNING | (OR) RAINING:", morning_or_raining)

IS MORNING: tensor([ True, False, False,  True])
IS RAINING: tensor([False, False,  True,  True])
--------------------------------------------------

MORNING & (AND) RAINING: tensor([False, False, False,  True])
MORNING | (OR) RAINING: tensor([ True, False,  True,  True])
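Two related points in a short sketch: ~ is the element-wise NOT (torch.logical_and() and torch.logical_not() are the named equivalents), and because & binds more tightly than comparison operators in Python, each comparison must be wrapped in parentheses before combining them.

```python
import torch

is_morning = torch.tensor([True, False, False, True])
is_raining = torch.tensor([False, False, True, True])

dry = ~is_raining  # element-wise NOT
print(dry)         # tensor([ True,  True, False, False])

# Named functions give the same results as the operators
same_and = torch.equal(torch.logical_and(is_morning, is_raining), is_morning & is_raining)
same_not = torch.equal(torch.logical_not(is_raining), dry)
print(same_and, same_not)  # True True

temps = torch.tensor([20, 35, 19, 35, 42])
# Parentheses are required: '&' would otherwise be applied before '>' and '<'
warm = (temps > 20) & (temps < 40)
print(warm)  # tensor([False,  True, False,  True, False])
```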


### Statistics

Calculating statistics like the mean or standard deviation can be useful for understanding your dataset or for implementing certain types of normalization during the data preparation phase.

- torch.mean(): Calculates the mean of all elements in a tensor. The tensor must have a floating-point dtype; calling it on an integer tensor raises an error.

In [37]:
data = torch.tensor([10.0, 20.0, 30.0, 40.0, 50.0])
print("DATA:", data)
print("-" * 45)

# Calculate the mean
data_mean = data.mean()

print("\nCALCULATED MEAN:", data_mean, "\n")

DATA: tensor([10., 20., 30., 40., 50.])
---------------------------------------------

CALCULATED MEAN: tensor(30.) 
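mean() also takes a dim argument, which is how you get per-column or per-row statistics, for example per-feature means for normalization. A minimal sketch on a small made-up 2x3 tensor, including the cast needed for integer data:

```python
import torch

scores = torch.tensor([[1.0, 2.0, 3.0],
                       [4.0, 5.0, 6.0]])

overall = scores.mean()         # mean of all elements: 3.5
col_means = scores.mean(dim=0)  # collapse rows -> one mean per column: [2.5, 3.5, 4.5]
row_means = scores.mean(dim=1)  # collapse columns -> one mean per row: [2.0, 5.0]

ints = torch.tensor([1, 2, 3])
# ints.mean() would raise a RuntimeError, so cast to float first
int_mean = ints.float().mean()

print(overall, col_means, row_means, int_mean)
```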



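- torch.std(): Calculates the standard deviation of all elements. By default it computes the sample standard deviation, dividing by n - 1 (Bessel's correction); the manual calculation that follows confirms the result.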

In [38]:
print("DATA:", data)
print("-" * 45)

# Calculate the standard deviation
data_std = data.std()

print("\nCALCULATED STD:", data_std, "\n")

DATA: tensor([10., 20., 30., 40., 50.])
---------------------------------------------

CALCULATED STD: tensor(15.8114) 



In [42]:
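# Manual check: sample standard deviation, dividing by (n - 1)
total = 10 + 20 + 30 + 40 + 50  # named 'total' to avoid shadowing Python's built-in sum()
print(total)

mean = total / 5
print(mean)

# Variance: average squared difference from the mean, using n - 1 = 4
variance = ((10 - mean)**2 + (20 - mean)**2 + (30 - mean)**2 + (40 - mean)**2 + (50 - mean)**2) / 4
print(variance)

print(f"std: {variance ** 0.5}")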


150
30.0
250.0
std: 15.811388300841896
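If you need the population standard deviation instead (dividing by n rather than n - 1), pass unbiased=False. For this data the variance becomes 1000 / 5 = 200, so the result is sqrt(200), roughly 14.1421. A minimal sketch:

```python
import torch

data = torch.tensor([10.0, 20.0, 30.0, 40.0, 50.0])

sample_std = data.std()                    # divides by n - 1 (default)
population_std = data.std(unbiased=False)  # divides by n

print(sample_std)      # tensor(15.8114)
print(population_std)  # tensor(14.1421)
```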


### Data Types
Just as important as a tensor's shape is its data type. Neural networks typically perform their calculations using 32-bit floating point numbers (float32). Providing data of the wrong type, such as an integer, can lead to runtime errors or unexpected behavior during training. It is a good practice to ensure your tensors have the correct data type for your model.

Type Casting (.int(), .float(), .to(), etc.): Converts a tensor from one data type to another (e.g., from float to integer).

In [43]:
print("DATA:", data)
print("DATA TYPE:", data.dtype)
print("-" * 45)

# Cast the tensor to a 32-bit integer type
int_tensor = data.int()

print("\nCAST DATA:", int_tensor)
print("CAST DATA TYPE:", int_tensor.dtype)

DATA: tensor([10., 20., 30., 40., 50.])
DATA TYPE: torch.float32
---------------------------------------------

CAST DATA: tensor([10, 20, 30, 40, 50], dtype=torch.int32)
CAST DATA TYPE: torch.int32
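Casting from float to integer truncates toward zero rather than rounding, which can silently change values that are not whole numbers. The sketch below shows this with made-up values, plus .to(), the general-purpose casting method.

```python
import torch

values = torch.tensor([2.7, -2.7])

truncated = values.int()          # drops the fractional part: [2, -2]
rounded = values.round().int()    # round first if that is what you need: [3, -3]
print(truncated, rounded)

counts = torch.tensor([1, 2, 3])  # int64 by default
as_float = counts.to(torch.float32)  # equivalent to counts.float()
print(as_float.dtype)             # torch.float32
```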


### Analyze Monthly Sales

In [47]:
# Sales data for 3 products over 4 months
sales_data = torch.tensor([[100, 120, 130, 110],   # Product A
                           [ 90,  95, 105, 125],   # Product B
                           [140, 115, 120, 150]    # Product C
                          ], dtype=torch.float32)

print("ORIGINAL SALES DATA:\n\n", sales_data)
print("-" * 45)

# 1. Calculate total sales for Product B (row 1).
total_sales_product_b = torch.sum(sales_data[1])

# 2. Find months where sales for Product C were > 130.
high_sales_mask_product_c = sales_data[2] > 130

# 3. Get sales for Feb and Mar (columns 1 and 2) for all products.
# Plain slicing is enough here; the fancy-indexing form
# sales_data[torch.tensor([0, 1, 2])[:, None], torch.tensor([1, 2])]
# returns the same 3x2 result, with values at (0,1), (0,2), (1,1), (1,2), (2,1), (2,2).
sales_feb_mar = sales_data[:, 1:3]

print("\nTotal Sales for Product B:                   ", total_sales_product_b)
print("\nMonths with >130 Sales for Product C (Mask): ", high_sales_mask_product_c)
print("\nSales for Feb & Mar:\n\n", sales_feb_mar)

ORIGINAL SALES DATA:

 tensor([[100., 120., 130., 110.],
        [ 90.,  95., 105., 125.],
        [140., 115., 120., 150.]])
---------------------------------------------

Total Sales for Product B:                    tensor(415.)

Months with >130 Sales for Product C (Mask):  tensor([ True, False, False,  True])

Sales for Feb & Mar:

 tensor([[120., 130.],
        [ 95., 105.],
        [115., 120.]])
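Reductions such as torch.sum() also accept a dim argument, which makes per-product and per-month totals one-liners. A minimal sketch reusing the sales figures above; the variable names are illustrative.

```python
import torch

# Same sales figures as above: rows are products A-C, columns are months 1-4
sales_data = torch.tensor([[100, 120, 130, 110],
                           [ 90,  95, 105, 125],
                           [140, 115, 120, 150]], dtype=torch.float32)

totals_per_product = sales_data.sum(dim=1)  # collapse the month axis: 460, 415, 525
totals_per_month = sales_data.sum(dim=0)    # collapse the product axis: 330, 330, 355, 385
best_month_c = sales_data[2].argmax()       # index of Product C's best month

print(totals_per_product)
print(totals_per_month)
print(best_month_c)  # tensor(3), i.e. the fourth month
```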


### Image Batch Transformation

You're working on a computer vision model and have a batch of 4 grayscale images, each of size 3x3 pixels. The data is currently in a tensor with the shape [4, 3, 3], which represents [batch_size, height, width].

For processing with certain deep learning frameworks, you need to transform this data into the [batch_size, channels, height, width] format. Since the images are grayscale, you'll need to:

1. Add a new dimension of size 1 at index 1 to represent the color channel.
2. After adding the channel, you realize the model expects the shape [batch_size, height, width, channels]. Rearrange the dimensions to move the channel dimension to the end while keeping height and width in their original order.
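With square 3x3 images, transpose(1, 3) and permute(0, 2, 3, 1) produce the same shape, which hides an important difference: transpose(1, 3) also swaps height and width. The sketch below uses a hypothetical batch of 2 non-square images (2 pixels tall, 3 wide) to make the difference visible.

```python
import torch

# Hypothetical batch: 2 grayscale images, 2 pixels tall and 3 pixels wide
images = torch.arange(12).reshape(2, 1, 2, 3)  # [batch, channels, height, width]

swapped = images.transpose(1, 3)       # [batch, width, height, channels]
permuted = images.permute(0, 2, 3, 1)  # [batch, height, width, channels]

print(swapped.shape)   # torch.Size([2, 3, 2, 1]): height and width traded places
print(permuted.shape)  # torch.Size([2, 2, 3, 1])

# Only permute keeps each image's pixel layout intact
print(torch.equal(permuted[0, :, :, 0], images[0, 0]))    # True
print(torch.equal(swapped[0, :, :, 0], images[0, 0].T))   # True: image is transposed
```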

In [54]:
# A batch of 4 grayscale images, each 3x3
image_batch = torch.rand(4, 3, 3)
print(image_batch)
print("ORIGINAL BATCH SHAPE:", image_batch.shape)
print("-" * 45)

### START CODE HERE ###

# 1. Add a channel dimension at index 1.
image_batch_with_channel = image_batch.unsqueeze(1)

# 2. Move the channel dimension to the end: [B, C, H, W] -> [B, H, W, C].
# permute keeps height and width in order; transpose(1, 3) would also swap them.
image_batch_transposed = image_batch_with_channel.permute(0, 2, 3, 1)

### END CODE HERE ###


print("\nSHAPE AFTER UNSQUEEZE:", image_batch_with_channel.shape)
print(image_batch_with_channel)
print("SHAPE AFTER TRANSPOSE:", image_batch_transposed.shape)
print(image_batch_transposed)

tensor([[[0.5401, 0.3937, 0.5628],
         [0.2107, 0.0996, 0.4891],
         [0.9485, 0.9127, 0.5402]],

        [[0.8148, 0.4050, 0.5280],
         [0.7934, 0.8750, 0.4484],
         [0.8891, 0.4880, 0.5281]],

        [[0.5830, 0.0549, 0.2497],
         [0.4584, 0.6014, 0.0362],
         [0.7792, 0.2347, 0.3544]],

        [[0.4126, 0.3484, 0.7043],
         [0.0906, 0.1803, 0.7070],
         [0.7044, 0.5443, 0.7248]]])
ORIGINAL BATCH SHAPE: torch.Size([4, 3, 3])
---------------------------------------------

SHAPE AFTER UNSQUEEZE: torch.Size([4, 1, 3, 3])
tensor([[[[0.5401, 0.3937, 0.5628],
          [0.2107, 0.0996, 0.4891],
          [0.9485, 0.9127, 0.5402]]],


        [[[0.8148, 0.4050, 0.5280],
          [0.7934, 0.8750, 0.4484],
          [0.8891, 0.4880, 0.5281]]],


        [[[0.5830, 0.0549, 0.2497],
          [0.4584, 0.6014, 0.0362],
          [0.7792, 0.2347, 0.3544]]],


        [[[0.4126, 0.3484, 0.7043],
          [0.0906, 0.1803, 0.7070],
          [0.7044, 0.5443

### Combining and Weighting Sensor Data

Consider an environment monitoring system that uses two sensors, one for temperature and one for humidity. The readings from these sensors arrive as two separate 1D tensors.

- Concatenate the two tensors into a single 2x5 tensor, where the first row is temperature data and the second is humidity data.
- Create a weights tensor torch.tensor([0.6, 0.4]).
- Use broadcasting and element-wise multiplication to apply these weights to the combined sensor data. The temperature data should be multiplied by 0.6 and the humidity data by 0.4.
- Finally, calculate the weighted average for each time step by summing the weighted values along dim=0 and dividing by the sum of the weights.


In [63]:
# Sensor readings (5 time steps)
temperature = torch.tensor([22.5, 23.1, 21.9, 22.8, 23.5])
humidity = torch.tensor([55.2, 56.4, 54.8, 57.1, 56.8])

print("TEMPERATURE DATA: ", temperature)
print("HUMIDITY DATA:    ", humidity)
print("-" * 45)

# 1. Concatenate the two tensors.
# torch.cat joins along an existing dimension, so first give each 1D tensor
# a leading dimension with unsqueeze(0): [5] -> [1, 5].
# (torch.stack([temperature, humidity], dim=0) is an equivalent alternative.)
combined_data = torch.cat([temperature.unsqueeze(0), humidity.unsqueeze(0)], dim=0)

# 2. Create the weights tensor.
weights = torch.tensor([0.6, 0.4])

# 3. Apply weights using broadcasting.
# You need to reshape weights to [2, 1] to broadcast across columns.
weighted_data = combined_data * weights.unsqueeze(1)

# 4. Calculate the weighted average for each time step.
#    (A true average = weighted sum / sum of weights)
weighted_sum = torch.sum(weighted_data, dim=0)
weighted_average = weighted_sum / torch.sum(weights)

print("\nCOMBINED DATA (2x5):\n\n", combined_data)
print("\nWEIGHTED DATA:\n\n", weighted_data)
print("\nWEIGHTED AVERAGE:", weighted_average)

TEMPERATURE DATA:  tensor([22.5000, 23.1000, 21.9000, 22.8000, 23.5000])
HUMIDITY DATA:     tensor([55.2000, 56.4000, 54.8000, 57.1000, 56.8000])
---------------------------------------------

COMBINED DATA (2x5):

 tensor([[22.5000, 23.1000, 21.9000, 22.8000, 23.5000],
        [55.2000, 56.4000, 54.8000, 57.1000, 56.8000]])

WEIGHTED DATA:

 tensor([[13.5000, 13.8600, 13.1400, 13.6800, 14.1000],
        [22.0800, 22.5600, 21.9200, 22.8400, 22.7200]])

WEIGHTED AVERAGE: tensor([35.5800, 36.4200, 35.0600, 36.5200, 36.8200])
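As a sanity check, the same pipeline can be written with torch.stack() and weights.view(2, 1). Because these weights sum to 1, the weighted average should equal 0.6 * temperature + 0.4 * humidity computed directly. A minimal sketch:

```python
import torch

temperature = torch.tensor([22.5, 23.1, 21.9, 22.8, 23.5])
humidity = torch.tensor([55.2, 56.4, 54.8, 57.1, 56.8])
weights = torch.tensor([0.6, 0.4])

stacked = torch.stack([temperature, humidity], dim=0)  # shape [2, 5], no unsqueeze needed
weighted_average = (stacked * weights.view(2, 1)).sum(dim=0) / weights.sum()

# With weights summing to 1, the division leaves the weighted sum unchanged
direct = 0.6 * temperature + 0.4 * humidity
print(torch.allclose(weighted_average, direct))  # True
```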


### Feature Engineering for Taxi Fares

You have a dataset of taxi trips stored in a tensor, trip_data, where each row is a trip and the columns represent [distance (km), hour_of_day (24h)].

- The goal is to engineer a new binary feature called is_rush_hour_long_trip. This feature should be True (or 1) only if a trip meets both of the following criteria:
  - It's a long trip (distance > 10 km).
  - It occurs during a rush hour (8-10 AM or 5-7 PM, i.e., [8, 10) or [17, 19)).

**Steps**
- Slice the trip_data tensor to isolate the distance and hour columns.
- Use logical and comparison operators to create boolean masks for each condition (long trip, morning rush, evening rush).
- Combine these masks to create the final is_rush_hour_long_trip feature.
- Reshape this new 1D feature tensor into a 2D column vector and convert its data type to float so it can be combined with the original data.

In [74]:
# Data for 8 taxi trips: [distance, hour_of_day]
trip_data = torch.tensor([
    [5.3, 7],   # Not rush hour, not long
    [12.1, 9],  # Morning rush, long trip -> RUSH HOUR LONG
    [15.5, 13], # Not rush hour, long trip
    [6.7, 18],  # Evening rush, not long
    [2.4, 20],  # Not rush hour, not long
    [11.8, 17], # Evening rush, long trip -> RUSH HOUR LONG
    [9.0, 9],   # Morning rush, not long
    [14.2, 8]   # Morning rush, long trip -> RUSH HOUR LONG
], dtype=torch.float32)


print("ORIGINAL TRIP DATA (Distance, Hour):\n\n", trip_data)
print("-" * 55)

# 1. Slice the main tensor to get 1D tensors for each feature.
distances = trip_data[:, 0]
hours = trip_data[:, 1]

# 2. Create boolean masks for each condition.
is_long_trip = distances > 10
is_morning_rush = (hours >= 8) & (hours < 10)    # half-open window [8, 10)
is_evening_rush = (hours >= 17) & (hours < 19)   # half-open window [17, 19)

# 3. Combine masks to identify rush hour long trips.
# A trip is a rush hour long trip if it's (a morning OR evening rush) AND a long trip.
is_rush_hour_long_trip_mask = (is_morning_rush | is_evening_rush) & is_long_trip

# 4. Reshape the new feature into a column vector and cast to float.
new_feature_col = is_rush_hour_long_trip_mask.float().unsqueeze(1)

print("\n'IS RUSH HOUR LONG TRIP' MASK: ", is_rush_hour_long_trip_mask)
print("\nNEW FEATURE COLUMN (Reshaped):\n\n", new_feature_col)

# You can now concatenate this new feature to the original data
enhanced_trip_data = torch.cat((trip_data, new_feature_col), dim=1)
print("\nENHANCED DATA (with new feature at the end):\n\n", enhanced_trip_data)

ORIGINAL TRIP DATA (Distance, Hour):

 tensor([[ 5.3000,  7.0000],
        [12.1000,  9.0000],
        [15.5000, 13.0000],
        [ 6.7000, 18.0000],
        [ 2.4000, 20.0000],
        [11.8000, 17.0000],
        [ 9.0000,  9.0000],
        [14.2000,  8.0000]])
-------------------------------------------------------

'IS RUSH HOUR LONG TRIP' MASK:  tensor([False,  True, False, False, False,  True, False,  True])

NEW FEATURE COLUMN (Reshaped):

 tensor([[0.],
        [1.],
        [0.],
        [0.],
        [0.],
        [1.],
        [0.],
        [1.]])

ENHANCED DATA (with new feature at the end):

 tensor([[ 5.3000,  7.0000,  0.0000],
        [12.1000,  9.0000,  1.0000],
        [15.5000, 13.0000,  0.0000],
        [ 6.7000, 18.0000,  0.0000],
        [ 2.4000, 20.0000,  0.0000],
        [11.8000, 17.0000,  1.0000],
        [ 9.0000,  9.0000,  0.0000],
        [14.2000,  8.0000,  1.0000]])
