#### 9. Why do we need to reshuffle the dataset? Can you design a case where a maliciously constructed dataset would break the optimization algorithm otherwise?

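Minibatch SGD relies on each minibatch being a roughly representative sample of the whole dataset. Without shuffling, the order of the data decides what each minibatch contains. Suppose the first 1000 data points follow a positive linear relationship and the next 1000 follow the opposite, negative one. With a batch size of 1000, every minibatch then comes from a single regime, so each gradient step pulls the parameters toward a fit that is wrong for the other half of the data.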
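
The two-regime scenario can be sketched without any library. This is a minimal illustration rather than the notebook's actual experiment: it uses a hypothetical 1-D, noise-free dataset (slope +2 for the first half, slope -2 for the second), and the helper names `make_two_regime_data` and `sgd_slope` and the learning rate are assumptions made for this example.

```python
import random


def make_two_regime_data(n_per_regime=1000, seed=0):
    """1-D toy data: slope +2 for the first half, slope -2 for the second."""
    rng = random.Random(seed)
    xs = [rng.gauss(0.0, 1.0) for _ in range(2 * n_per_regime)]
    ys = [(2.0 if i < n_per_regime else -2.0) * x for i, x in enumerate(xs)]
    return list(zip(xs, ys))


def sgd_slope(data, batch_size, lr=0.5, epochs=20, shuffle=False, seed=0):
    """Fit y ~ w * x by minibatch SGD on squared loss and return the final w."""
    rng = random.Random(seed)
    data = list(data)  # copy so the caller's ordering is left untouched
    w = 0.0
    for _ in range(epochs):
        if shuffle:
            rng.shuffle(data)
        for start in range(0, len(data), batch_size):
            batch = data[start:start + batch_size]
            # Gradient of mean((w * x - y) ** 2) / 2 with respect to w
            grad = sum((w * x - y) * x for x, y in batch) / len(batch)
            w -= lr * grad
    return w


data = make_two_regime_data()
w_sorted = sgd_slope(data, batch_size=1000, shuffle=False)
w_shuffled = sgd_slope(data, batch_size=1000, shuffle=True)
print(f"no shuffle: w = {w_sorted:.3f}")
print(f"shuffle:    w = {w_shuffled:.3f}")
```

Without shuffling, `w` swings between the two regimes inside every epoch and ends clearly negative, biased toward whichever batch came last. With shuffling, each minibatch mixes both regimes and `w` stays near 0, which is the least-squares slope for the combined data.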

A pattern like this can be constructed maliciously for almost any dataset with high variance simply by sorting it, for example by label, so that consecutive minibatches cover very different parts of the data.
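
As a rough illustration of the sorting attack, the sketch below compares per-batch label means for a sorted and a shuffled copy of the same values. The data are hypothetical (standard-normal targets), and the helper `batch_means` and the batch size of 500 are assumptions chosen for this example.

```python
import random
import statistics

rng = random.Random(0)
# Hypothetical labels with unit variance
targets = [rng.gauss(0.0, 1.0) for _ in range(2000)]


def batch_means(values, batch_size):
    """Mean of each consecutive, non-overlapping batch."""
    return [
        statistics.fmean(values[i:i + batch_size])
        for i in range(0, len(values), batch_size)
    ]


# Adversarial ordering: consecutive batches cover disjoint label ranges
sorted_means = batch_means(sorted(targets), 500)

# Random ordering: every batch resembles the full dataset
shuffled = targets[:]
rng.shuffle(shuffled)
shuffled_means = batch_means(shuffled, 500)

print("sorted:  ", [round(m, 2) for m in sorted_means])
print("shuffled:", [round(m, 2) for m in shuffled_means])
```

In the sorted copy, the batch means climb steadily from strongly negative to strongly positive, so each gradient step sees a differently biased slice of the data. In the shuffled copy, all batch means stay close to the global mean.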

### With shuffle

![image](linear-regression-scratch_9_1.png)

### Without shuffle

![image](linear-regression-scratch_9_2.png)

## Code

In [None]:
import torch
from d2l import torch as d2l

DO_SHUFFLE = True  # Set to False to feed the data in its original, adversarial order

class DataLoader(d2l.DataModule):
    """Wrap fixed tensors X and y so that shuffling is controlled by DO_SHUFFLE.

    Adapted from the synthetic regression data module in
    :numref:`sec_synthetic-regression-data`"""
    def __init__(self, X, y, batch_size=1000):
        super().__init__()
        self.save_hyperparameters()

    def get_dataloader(self, train):
        """Return a loader over (X, y); `train` is ignored on purpose."""
        # The second argument of get_tensorloader decides whether to shuffle
        return self.get_tensorloader((self.X, self.y), DO_SHUFFLE)

# Construct a malicious dataset
X1 = torch.randn(1000, 2)
y1 = 2 * X1[:, 0] + 3 * X1[:, 1] + torch.randn(1000)  # Positive linear relationship

X2 = torch.randn(1000, 2)
y2 = -2 * X2[:, 0] - 3 * X2[:, 1] + torch.randn(1000)  # Negative linear relationship

X = torch.cat([X1, X2], dim=0)
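# Reshape to a column so that y matches the model output of shape (n, 1);
# a 1-D y would broadcast to an (n, n) matrix inside the squared loss
y = torch.cat([y1, y2], dim=0).reshape(-1, 1)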

# Train the model (shuffling is controlled by DO_SHUFFLE above)
model = d2l.LinearRegressionScratch(2, lr=0.03)
data = DataLoader(X, y)
trainer = d2l.Trainer(max_epochs=15)
trainer.fit(model, data)

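# Compare against the first regime's parameters (w = [2, 3], b = 0).
# The least-squares fit to the combined data is close to w = [0, 0].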
with torch.no_grad():
    print(f'error in estimating w: {model.w.reshape(2) - torch.tensor([2, 3])}')
    print(f'error in estimating b: {model.b - 0}')