In [20]:
import pandas as pd
from sklearn.model_selection import train_test_split
import torch
from torch.utils.data import Dataset, DataLoader
import torch.nn as nn
import torch.optim as optim

In [21]:
torch.manual_seed(42)

<torch._C.Generator at 0x7a72fbfed3f0>

In [22]:
df = pd.read_csv('/content/fmnist_small.csv')
df.head()

Unnamed: 0,label,pixel1,pixel2,pixel3,pixel4,pixel5,pixel6,pixel7,pixel8,pixel9,...,pixel775,pixel776,pixel777,pixel778,pixel779,pixel780,pixel781,pixel782,pixel783,pixel784
0,9,0,0,0,0,0,0,0,0,0,...,0,7,0,50,205,196,213,165,0,0
1,7,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,1,0,0,0,...,142,142,142,21,0,3,0,0,0,0
3,8,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,8,0,0,0,0,0,0,0,0,0,...,213,203,174,151,188,10,0,0,0,0


In [23]:
X = df.iloc[:, 1:].values
y = df.iloc[:, 0].values

In [24]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

In [25]:
# Scaling
X_train = X_train/255.0
y_test = y_test/255.0

### Why PyTorch Prefers Specific Data Types

PyTorch uses specific data types for features and labels during model training for efficiency and compatibility. Here's a breakdown:

**Features as Floats (`torch.float32`)**

* **Precision in Calculations:** Deep learning models rely on floating-point arithmetic for accurate computations. Using `torch.float32` provides the necessary precision to capture complex patterns within the data.
* **Gradient-Based Optimization:** Training deep learning models involves calculating gradients (derivatives). These gradients are naturally represented as floating-point numbers, making features as floats crucial for optimization algorithms like stochastic gradient descent.
* **GPU Acceleration:** Modern GPUs are optimized for processing floating-point operations. Storing features as floats allows PyTorch to leverage GPU acceleration effectively, leading to significantly faster model training.

**Labels as Long Integers (`torch.long`)**

* **Classification Tasks:** Labels in classification problems often represent distinct categories (e.g., 0 for "cat," 1 for "dog"). Long integers (`torch.long`) are well-suited for storing such categorical data efficiently.
* **Loss Functions:** Loss functions used in classification (e.g., cross-entropy loss) typically expect integer labels for comparing predicted probabilities against the actual target categories. Using `torch.long` ensures compatibility with these loss functions.
* **Indexing and Memory:** Representing labels as long integers is memory-efficient and allows for efficient indexing when accessing specific data points within the dataset.

**In Summary:**

* **Features:** Stored as floats (`torch.float32`) for precise calculations, gradient-based optimization, and GPU acceleration.
* **Labels:** Stored as long integers (`torch.long`) for efficient handling of discrete categories, compatibility with loss functions, and memory efficiency.

By adhering to these data type conventions, you ensure your data is processed correctly within the PyTorch framework, leading to more effective and efficient model training.

In [26]:
class FashionD(Dataset):
    def __init__(self, features, labels):
        self.features = torch.tensor(features, dtype=torch.float32)
        self.labels = torch.tensor(labels, dtype=torch.long)

    def __len__(self):
        return len(self.features)

    def __getitem__(self, idx):
        features = self.features[idx]
        label = self.labels[idx]
        return features, label

In [27]:
# Create train_dataset object
train_dataset = FashionD(X_train, y_train)

In [28]:
len(train_dataset)

4800

In [29]:
# Create test_dataset object
test_dataset = FashionD(X_test, y_test)

In [30]:
len(test_dataset)

1200

## Shuffling in Data Loaders

* **`shuffle=True` (for `train_loader`):** When set to `True` for the training data loader, the data samples within the training dataset are shuffled randomly before each epoch. This prevents the model from memorizing the order of the data and helps it generalize better to unseen data.

* **`shuffle=False` (for `test_loader`):** It is crucial to keep the test data in a consistent order, so `shuffle` is set to `False` for the test data loader. This allows for accurate tracking of the model's progress and ensures that the evaluation is performed on the same set of samples each time.

### Why Shuffle?

Shuffling the training data helps to:

* Prevent the model from learning patterns specific to the order of the data, leading to better generalization.
* Improve the stability and convergence of the training process.
* Reduce the impact of biases that may be present in the data due to the order in which it was collected or stored.

### Why Not Shuffle Test Data?

The test data is used to evaluate the performance of the model on unseen data. Therefore, it is essential to keep the test data in a consistent order to ensure that the evaluation results are comparable across different runs. Shuffling the test data would introduce randomness and make it difficult to track the model's progress accurately.

In [31]:
# Create train and test loader
train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=32, shuffle=False)

# why nn.Sequential

1. Simplified Model Definition:

nn.Sequential makes defining a sequence of layers more concise and readable. Layers are listed in the order they should be applied, improving the clarity of the model's structure.

self.fc1 = nn.Sequential(
   nn.Linear(num_features, 128),
   nn.ReLU(),
   nn.Linear(128, 64),
   nn.ReLU(),
   nn.Linear(64, 10)
)

2. Organized Forward Pass:

During the forward pass, input data automatically flows through each layer within nn.Sequential in the defined order. This eliminates the need for separate forward pass code for each layer, leading to better code organization and fewer errors.

def forward(self, x):
   return self.fc1(x)

3. Modular Design: ... (and so on)

**Therefore, to have it as pure text, you would need to remove the code block fencing and any Markdown-specific formatting.**

I apologize for the initial misunderstanding. I hope this clarifies things!


In [32]:
class FashionClassifier(nn.Module):
    def __init__(self, num_features):
        super(FashionClassifier, self).__init__()

        self.fc1 = nn.Sequential(
            nn.Linear(num_features, 128),
            nn.ReLU(),
            nn.Linear(128, 64),
            nn.ReLU(),
            nn.Linear(64, 10)
        )

    def forward(self, x):
        return self.fc1(x)

In [33]:
epochs = 100
learning_rate = 0.1

In [39]:
model = FashionClassifier(X_train.shape[1])

loss_function = nn.CrossEntropyLoss()

optimizer = optim.SGD(model.parameters(), lr=learning_rate)

In [35]:
len(train_loader) # number of batches

150

In [40]:
for epoch in range(epochs):
  total_loss = 0

  for batch_features, batch_labels in train_loader:
    outputs = model(batch_features)

    loss = loss_function(outputs, batch_labels)

    optimizer.zero_grad()
    loss.backward()

    optimizer.step()

    total_loss += loss.item()

  print(f"Epoch {epoch+1}/{epoch}, Loss: {total_loss/len(train_loader)}")

Epoch 1/0, Loss: 1.3706780107816061
Epoch 2/1, Loss: 0.814962179462115
Epoch 3/2, Loss: 0.6875333511829376
Epoch 4/3, Loss: 0.6041322767734527
Epoch 5/4, Loss: 0.5527137624224027
Epoch 6/5, Loss: 0.5165035071969032
Epoch 7/6, Loss: 0.47892788489659627
Epoch 8/7, Loss: 0.45615307370821634
Epoch 9/8, Loss: 0.42594304800033567
Epoch 10/9, Loss: 0.4140771572291851
Epoch 11/10, Loss: 0.40038476963837943
Epoch 12/11, Loss: 0.37696492080887156
Epoch 13/12, Loss: 0.36135029544432956
Epoch 14/13, Loss: 0.3501733988026778
Epoch 15/14, Loss: 0.3195268313586712
Epoch 16/15, Loss: 0.3267906175057093
Epoch 17/16, Loss: 0.30053788791100183
Epoch 18/17, Loss: 0.29019319981336594
Epoch 19/18, Loss: 0.28491120698551337
Epoch 20/19, Loss: 0.26746990407506627
Epoch 21/20, Loss: 0.25396797065933546
Epoch 22/21, Loss: 0.2509445640941461
Epoch 23/22, Loss: 0.24148049689829348
Epoch 24/23, Loss: 0.23145229910810788
Epoch 25/24, Loss: 0.2295145431905985
Epoch 26/25, Loss: 0.20969781262179216
Epoch 27/26, Loss:

In [41]:
model.eval()

FashionClassifier(
  (fc1): Sequential(
    (0): Linear(in_features=784, out_features=128, bias=True)
    (1): ReLU()
    (2): Linear(in_features=128, out_features=64, bias=True)
    (3): ReLU()
    (4): Linear(in_features=64, out_features=10, bias=True)
  )
)

In [42]:
# evaluation code(no library used)
total = 0
correct = 0

with torch.no_grad():
  for batch_features, batch_labels in test_loader:
    outputs = model(batch_features)

    _, predicted = torch.max(outputs, 1)

    total += batch_labels.shape[0]
    correct += (predicted == batch_labels).sum().item()

print(f"Accuracy: {100*correct/total: .2f}%")

Accuracy:  10.58%
