# Synthetic Regression Data



In [1]:
%matplotlib inline
import random
import torch
from d2l import torch as d2l

Construct an artificial dataset
according to a linear model with additive Gaussian noise.
The true parameters used to generate the targets are
$\mathbf{w} = [2, -3.4]^\top$ and $b = 4.2$,
and the noise $\boldsymbol{\epsilon}$ has mean 0 and standard deviation 0.01.
$$\mathbf{y}= \mathbf{X} \mathbf{w} + b + \boldsymbol{\epsilon}$$

In [3]:
class SyntheticRegressionData(d2l.DataModule):  
    def __init__(self, w, b, noise=0.01, num_examples=1000, batch_size=8):
        super().__init__()
        self.save_hyperparameters()
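        # Features: num_examples rows drawn from a standard normal distribution
        self.X = torch.normal(0, 1, (num_examples, len(w)))
        # Labels: Xw + b plus Gaussian noise with standard deviation `noise`
        y = torch.matmul(self.X, w) + b + torch.normal(0, noise, (num_examples,))
        # Store labels as a column vector of shape (num_examples, 1)
        self.y = y.reshape((-1, 1))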

w = torch.tensor([2, -3.4])
b = 4.2
data = SyntheticRegressionData(w, b)

Each row of `data.X` is a feature vector in $\mathbb{R}^2$, and each row of `data.y` is a scalar label.

In [4]:
print('features:', data.X[0],'\nlabel:', data.y[0])

features: tensor([ 1.0323, -0.0329]) 
label: tensor([6.3724])
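
As a quick sanity check, the printed label can be compared with the noise-free prediction $\mathbf{w}^\top \mathbf{x} + b$. The sketch below uses plain Python and the rounded values printed above, so the residual is only approximate. The names `x`, `label`, `noise_free`, and `residual` are illustrative and not part of the `d2l` API.

```python
# Values copied from the printed output above (rounded to 4 decimals).
x = [1.0323, -0.0329]
label = 6.3724

# True parameters used to generate the data.
w = [2.0, -3.4]
b = 4.2

# Noise-free prediction: w . x + b
noise_free = sum(w_j * x_j for w_j, x_j in zip(w, x)) + b

# What is left over should be on the order of the noise level (0.01).
residual = label - noise_free

print(f'noise-free: {noise_free:.4f}')
print(f'residual:   {residual:.4f}')
```

The residual comes out at roughly -0.004, which is consistent with noise drawn at a standard deviation of 0.01.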


Define the `train_dataloader` method,
which shuffles the example indices
and yields minibatches of features and labels,
each containing at most `batch_size` examples

In [6]:
@d2l.add_to_class(SyntheticRegressionData)
def train_dataloader(self):
    indices = list(range(self.num_examples))
    random.shuffle(indices)
    for i in range(0, self.num_examples, self.batch_size):
        batch_indices = torch.tensor(
            indices[i: min(i + self.batch_size, self.num_examples)])
        yield self.X[batch_indices], self.y[batch_indices]

X, y = next(iter(data.train_dataloader()))
print('X shape:', X.shape, '\ny shape:', y.shape)

X shape: torch.Size([8, 2]) 
y shape: torch.Size([8, 1])
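
The slice `indices[i: min(i + self.batch_size, self.num_examples)]` makes the final minibatch smaller whenever `num_examples` is not a multiple of `batch_size`. With 1000 examples and a batch size of 8 every batch is full, so the effect is not visible above. A minimal pure-Python sketch of the same index logic, using a hypothetical helper `batch_index_slices` and toy sizes chosen only for illustration:

```python
import random

def batch_index_slices(num_examples, batch_size, seed=0):
    """Yield shuffled index lists, mirroring the loop in train_dataloader."""
    indices = list(range(num_examples))
    # A local, seeded RNG keeps the sketch reproducible (an assumption made
    # here; the notebook code uses the global random.shuffle).
    random.Random(seed).shuffle(indices)
    for i in range(0, num_examples, batch_size):
        yield indices[i: min(i + batch_size, num_examples)]

batches = list(batch_index_slices(num_examples=10, batch_size=4))
print([len(batch) for batch in batches])  # [4, 4, 2]
```

Every index appears in exactly one batch, so one pass over the generator covers the dataset once.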


Alternatively, use the framework's built-in data-loading API (`TensorDataset` and `DataLoader`) instead of writing the iterator by hand

In [8]:
@d2l.add_to_class(SyntheticRegressionData)  
def train_dataloader(self):
    dataset = torch.utils.data.TensorDataset(self.X, self.y)
    return torch.utils.data.DataLoader(dataset, self.batch_size, shuffle=True)

next(iter(data.train_dataloader()))

[tensor([[ 1.8273, -0.3405],
         [ 0.3609, -1.4379],
         [-0.1293, -1.0305],
         [-1.0967,  0.7427],
         [ 0.0045, -0.3536],
         [-1.6509,  1.9914],
         [-0.9864,  2.0244],
         [ 1.1538,  0.7084]]),
 tensor([[ 9.0194],
         [ 9.8133],
         [ 7.4450],
         [-0.5365],
         [ 5.4222],
         [-5.8550],
         [-4.6594],
         [ 4.0990]])]

In [9]:
len(data.train_dataloader())

125
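
The value 125 is the number of minibatches per epoch: 1000 examples split into batches of 8. More generally, assuming the `DataLoader` default of `drop_last=False`, the count is the ceiling of `num_examples / batch_size`. A small sketch of that arithmetic, where the helper name `num_batches` is hypothetical:

```python
import math

def num_batches(num_examples, batch_size):
    """Batches per epoch when the final partial batch is kept."""
    return math.ceil(num_examples / batch_size)

print(num_batches(1000, 8))   # 125
print(num_batches(1000, 32))  # 32 (31 full batches plus one of 8 examples)
```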