# Inspect data notebook

This notebook inspects our dataset to check whether the processed data fits our expectations or whether the data.py file needs changes.

### Checking Dimensions 

In [4]:
import json

import matplotlib.pyplot as plt
import torch

train_images = torch.load("../data/processed/train_images.pt")
train_targets = torch.load("../data/processed/train_targets.pt")

test_images = torch.load("../data/processed/test_images.pt")
test_targets = torch.load("../data/processed/test_targets.pt")

print("Train images shape:", train_images.shape)
print("Train targets shape:", train_targets.shape)
print("Test images shape:", test_images.shape)
print("Test targets shape:", test_targets.shape)

Train images shape: torch.Size([6528, 3, 224, 224])
Train targets shape: torch.Size([6528, 25])
Test images shape: torch.Size([726, 3, 224, 224])
Test targets shape: torch.Size([726, 25])


This matches the 7,254 observations (6,528 + 726 = 7,254) in the CSV and our 25 genre labels, so the shapes fit our expectations.
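
The same check can be written as assertions, so that a mismatch fails loudly instead of depending on someone reading the printed shapes. The sketch below is one minimal way to do it: `check_split` is a hypothetical helper (not part of data.py), and small dummy tensors stand in for the real `train_images` / `train_targets`, assuming the same `(N, C, H, W)` and `(N, num_labels)` layout.

```python
import torch


def check_split(images: torch.Tensor, targets: torch.Tensor, num_labels: int) -> int:
    """Validate one split and return its number of observations (hypothetical helper)."""
    # Images are expected as (N, C, H, W), targets as (N, num_labels).
    assert images.ndim == 4, f"expected 4D image tensor, got {images.ndim}D"
    assert targets.ndim == 2, f"expected 2D target tensor, got {targets.ndim}D"
    assert images.shape[0] == targets.shape[0], "image/target count mismatch"
    assert targets.shape[1] == num_labels, "unexpected number of labels"
    return images.shape[0]


# Small dummy tensors stand in for the real data (assumption: same layout).
dummy_train = (torch.zeros(8, 3, 4, 4), torch.zeros(8, 25))
dummy_test = (torch.zeros(2, 3, 4, 4), torch.zeros(2, 25))

n_total = check_split(*dummy_train, num_labels=25) + check_split(*dummy_test, num_labels=25)
print("Total observations:", n_total)
```

With the real tensors, `n_total` could then be compared against the row count of the CSV (7,254 here).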

### Checking Single Instance

In [5]:
i = 0
print("Single image shape:", train_images[i].shape)
print("Single target vector:", train_targets[i])
print("Number of active labels:", train_targets[i].sum())

Single image shape: torch.Size([3, 224, 224])
Single target vector: tensor([1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0,
        0])
Number of active labels: tensor(3)


In [7]:
import json

with open("../data/processed/label_names.json") as f:
    label_names = json.load(f)

active_labels = train_targets[i].nonzero(as_tuple=True)[0]

print("Genres for image", i)
for idx in active_labels.tolist():  # convert tensor indices to plain Python ints
    print("-", label_names[idx])

Genres for image 0
- Action
- Adventure
- Sci-Fi
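
Beyond a single instance, it may also be worth looking at how often each genre occurs and whether any image ended up with no label at all. The following is a rough sketch under assumptions: the tiny `targets` tensor and three-entry `label_names` list are made-up stand-ins for `train_targets` and the contents of label_names.json, and the targets are assumed to be multi-hot rows as shown above.

```python
import torch

# Hypothetical stand-ins for train_targets and label_names (assumption: multi-hot rows).
label_names = ["Action", "Adventure", "Sci-Fi"]
targets = torch.tensor([[1, 1, 0],
                        [1, 0, 0],
                        [0, 0, 0],
                        [1, 1, 1]])

# Summing over the sample dimension gives the number of images per genre.
counts = targets.sum(dim=0)
per_genre = dict(zip(label_names, counts.tolist()))
print(per_genre)

# Rows that sum to zero have no genre at all and may indicate a processing issue.
num_unlabeled = int((targets.sum(dim=1) == 0).sum())
print("Images without any label:", num_unlabeled)
```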


### Checking normalization

In [8]:
print("Train mean:", train_images.mean().item())
print("Train std:", train_images.std().item())

Train mean: -2.8876092628138395e-08
Train std: 1.0


The mean is effectively 0 (about -2.9e-08) and the standard deviation is 1.0, so the normalization seems to have worked.
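
Note that a global mean of 0 and standard deviation of 1 does not by itself show that each colour channel is normalized. A per-channel check can be sketched as follows, assuming the `(N, C, H, W)` layout seen above. The random `images` tensor and the per-channel offsets are made up for illustration; how data.py actually normalizes is not shown in this notebook.

```python
import torch

torch.manual_seed(0)

# Dummy batch standing in for train_images, layout (N, C, H, W) (assumption).
images = torch.rand(16, 3, 8, 8)
# Give each channel a different offset so the channels are visibly different.
images = images + torch.tensor([0.0, 0.5, 1.0]).view(1, 3, 1, 1)

# Normalize with a single global mean/std, which yields global mean ~0 and std ~1.
normalized = (images - images.mean()) / images.std()

# Reduce over samples, height and width to get one statistic per channel.
channel_mean = normalized.mean(dim=(0, 2, 3))
channel_std = normalized.std(dim=(0, 2, 3))

print("Global mean/std:", normalized.mean().item(), normalized.std().item())
print("Per-channel mean:", channel_mean)
print("Per-channel std:", channel_std)
```

In this toy example the global statistics look fine while the per-channel means differ clearly. On the real data, per-channel values far from 0 and 1 would suggest the normalization was computed globally rather than per channel, which may or may not be what was intended.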