### Comprehensive Exercise: Combining Datasets, DataLoaders, and Tensor Operations

#### Problem Statement:
Build a mini machine learning pipeline using PyTorch's `Dataset` and `DataLoader` utilities. The exercise includes:
1. **Custom Dataset**: Create a custom dataset class using `torch.utils.data.Dataset`. Use synthetic tensor data that mimics a regression task.
2. **Data Transformation**: Apply tensor operations to preprocess the data within the `__getitem__` method of the dataset.
3. **Model Training Preparation**: Use `DataLoader` for efficient data batching.
4. **Integration**: Use knowledge from the "Tensors and Tensor Operations" tutorial to manipulate the data.

#### Steps:
1. **Create Synthetic Data**:
   - Generate random tensors representing features (`X`) and labels (`y`) for a regression task, e.g., \( y = 3x + \text{noise} \).
   - Ensure the dataset contains 500 samples.

2. **Implement a Custom Dataset**:
   - Create a class `RegressionDataset` that subclasses `torch.utils.data.Dataset`.
   - Override the `__len__` and `__getitem__` methods.
   - In `__getitem__`, normalize the features (e.g., using mean and standard deviation).

3. **Use DataLoader**:
   - Create a `DataLoader` for the dataset with a batch size of 32.
   - Shuffle the data during training and use `DataLoader` to iterate over batches.

#### Deliverables:
1. Custom Dataset class code.
2. Code for tensor transformations within the dataset.
3. DataLoader initialization and batch visualization.

This exercise reinforces your understanding of:
- PyTorch's `Dataset` and `DataLoader`.
- Tensor operations for data preprocessing.

In [65]:
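# Step 1: Create synthetic data
import torch

# 500 integer-valued samples drawn uniformly from [0, 20)
X = torch.randint(0, 20, (500, ))
# Uniform noise in [0, 1) added to the features
noise = torch.rand(X.size())
noisy_X = X + noise
# Standardize the noisy features to zero mean and unit standard deviation
mean, std = noisy_X.mean(), noisy_X.std()
noisy_X = (noisy_X - mean) / std

# Targets are computed from the clean features: y = 3x + 3 (no noise term)
y = 3 * X + 3
print(X[:10])
print(noisy_X[:10])
print(y[:10])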


tensor([ 9, 13,  5,  9,  3,  2, 12, 19, 10, 14])
tensor([-0.0280,  0.6014, -0.6486,  0.0255, -1.0669, -1.2129,  0.4126,  1.7381,
         0.0891,  0.9029])
tensor([30, 42, 18, 30, 12,  9, 39, 60, 33, 45])
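
The problem statement describes the target as \( y = 3x + \text{noise} \), whereas the cell above perturbs the features and keeps the targets noise-free. As a minimal sketch of the literal formulation, the variant below adds Gaussian noise to the targets instead. The seed, the noise scale `0.5`, and the names `x_clean` and `y_noisy` are illustrative assumptions, not part of the original solution.

```python
import torch

torch.manual_seed(0)  # assumed seed, only so the sketch is reproducible

# 500 feature values drawn uniformly from the integers 0..19, cast to float
x_clean = torch.randint(0, 20, (500,)).float()

# Targets follow y = 3x + noise; the noise scale 0.5 is an arbitrary choice
y_noisy = 3 * x_clean + 0.5 * torch.randn(500)

print(x_clean.shape, y_noisy.shape)
```

Either formulation yields a usable regression dataset; the difference is whether the model sees noisy inputs or noisy labels.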


In [66]:
# Step 2: Implement a Custom Dataset
from torch.utils.data import Dataset
class RegressionDataset(Dataset):
    def __init__(self, X, y, transform=None, target_transform=None):
        self.X = X
        self.y = y
        self.transform = transform
        self.target_transform = target_transform

    def __len__(self):
        return len(self.X)
    
    def __getitem__(self, index):
        # Look up one sample and its label
        features = self.X[index]
        target = self.y[index]
        # Apply the optional per-sample transforms, if provided
        if self.transform:
            features = self.transform(features)
        if self.target_transform:
            target = self.target_transform(target)
        return features, target
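
The exercise asks for normalization inside `__getitem__`, while the solution normalizes the whole tensor up front in Step 1. One possible way to move that step into the dataset is to pass a callable through the `transform` hook. The sketch below is self-contained and uses a hypothetical helper class `Standardize` and an illustrative dataset name `NormalizedRegressionDataset`; neither is part of PyTorch or of the original solution.

```python
import torch
from torch.utils.data import Dataset


class Standardize:
    """Hypothetical helper: rescales a sample using fixed statistics."""

    def __init__(self, mean, std):
        self.mean = mean
        self.std = std

    def __call__(self, sample):
        return (sample - self.mean) / self.std


class NormalizedRegressionDataset(Dataset):
    """Illustrative variant that normalizes features on access."""

    def __init__(self, X, y, transform=None):
        self.X = X
        self.y = y
        self.transform = transform

    def __len__(self):
        return len(self.X)

    def __getitem__(self, index):
        features = self.X[index]
        if self.transform:
            features = self.transform(features)
        return features, self.y[index]


torch.manual_seed(0)  # assumed seed for reproducibility
raw_X = torch.randint(0, 20, (500,)).float()
raw_y = 3 * raw_X + 3

# Statistics are computed once, up front, and reused for every sample
dataset = NormalizedRegressionDataset(
    raw_X, raw_y, transform=Standardize(raw_X.mean(), raw_X.std())
)
sample_features, sample_target = dataset[0]
print(sample_features, sample_target)
```

Computing the mean and standard deviation once and reusing them avoids recomputing statistics on every access, and the stored tensor stays in its raw form.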
        

In [67]:
# 80/20 train/test split; features and targets must be sliced at the same index
train_size = int(len(noisy_X)*0.8)
train_data = RegressionDataset(noisy_X[:train_size].type(torch.float), y[:train_size].type(torch.float))
test_data = RegressionDataset(noisy_X[train_size:].type(torch.float), y[train_size:].type(torch.float))
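
Slicing by index is one way to split the data. As an alternative sketch, `torch.utils.data.random_split` draws a random partition and keeps each feature paired with its target automatically. The example below is self-contained and uses `TensorDataset` with stand-in data; the seeds and variable names are assumptions for illustration.

```python
import torch
from torch.utils.data import TensorDataset, random_split

torch.manual_seed(0)  # assumed seed for the stand-in data
features = torch.randn(500)
targets = 3 * features + 3

# Wrap the paired tensors so each index returns (feature, target)
full_dataset = TensorDataset(features, targets)

# A seeded generator makes the random 400/100 partition reproducible
generator = torch.Generator().manual_seed(42)
train_subset, test_subset = random_split(
    full_dataset, [400, 100], generator=generator
)
print(len(train_subset), len(test_subset))
```

Because the split happens on the dataset rather than on separate tensors, features and targets cannot drift out of alignment.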

In [68]:
# Step 3: Use DataLoader
from torch.utils.data import DataLoader
batch_size = 32
train_dataloader = DataLoader(train_data, batch_size=batch_size, shuffle=True)
# Evaluation does not depend on sample order, so the test set is not shuffled
test_dataloader = DataLoader(test_data, batch_size=batch_size, shuffle=False)

In [69]:
# Fetch a single batch and inspect the shapes of its features and targets
train_features, train_target = next(iter(train_dataloader))
print(train_features.size(), train_target.size())

torch.Size([32]) torch.Size([32])
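
Inspecting one batch shows the typical shape, but iterating over the whole loader also reveals the smaller final batch. The sketch below is self-contained and uses 400 stand-in samples (the size of an 80% training split of 500) with illustrative names; it records the size of every batch in one pass.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)  # assumed seed for the stand-in data
features = torch.randn(400)
targets = 3 * features + 3
loader = DataLoader(TensorDataset(features, targets), batch_size=32, shuffle=True)

# Collect the number of samples in each batch over one full pass
batch_sizes = []
for batch_features, batch_targets in loader:
    batch_sizes.append(batch_features.size(0))

print(len(loader), batch_sizes[-1])
```

Since 400 is not a multiple of 32, the pass yields 12 full batches and one batch of 16. Passing `drop_last=True` to `DataLoader` would discard that partial batch if uniform batch sizes are required.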
