<a href="https://colab.research.google.com/github/Anupama83/PyTorch-for-Deep-Learning/blob/main/03_PyTorch_Custom_Dataset_Udemy_4.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Custom Dataset
1. https://pytorch.org/tutorials/beginner/basics/data_tutorial.html
2. https://pytorch.org/vision/0.8/datasets.html
3. https://pytorch.org/audio/stable/index.html



# What we're going to build
1. Getting a custom dataset with PyTorch
2. Becoming one with the data (preparing and visualizing)
3. Transforming data for use with a model
4. Loading custom data with pre-built functions and custom functions
5. Building FoodVision Mini to classify food images
6. Comparing models with and without data augmentation
7. Making predictions on custom data



* FoodVision Mini

PyTorch libraries:
1. torchvision.transforms
2. torch.utils.data.Dataset
3. torch.utils.data.DataLoader




We've used some datasets with PyTorch before.

But how do you get your own data into PyTorch?


One of the ways to do so is via a custom dataset.

## Domain Libraries

Depending on what you're working on (vision, text, audio, recommendation), you'll want to look into the matching PyTorch domain library for existing data loading functions and customizable data loading functions.
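As a rough illustration of the domain-to-library mapping, the sketch below checks which domain libraries are importable in the current environment. The package names (`torchvision`, `torchtext`, `torchaudio`, `torchrec`) are the commonly used ones and are an assumption here, since the text above only names the domains. The helper `installed_domain_libraries` is hypothetical.

```python
from importlib.util import find_spec

# Assumed mapping of problem domain -> PyTorch domain library package name.
DOMAIN_LIBRARIES = {
    "vision": "torchvision",
    "text": "torchtext",
    "audio": "torchaudio",
    "recommendation": "torchrec",
}


def installed_domain_libraries():
    """Return {domain: bool} telling whether each library can be imported."""
    return {
        domain: find_spec(package) is not None
        for domain, package in DOMAIN_LIBRARIES.items()
    }


for domain, available in installed_domain_libraries().items():
    status = "installed" if available else "not installed"
    print(f"{domain}: {DOMAIN_LIBRARIES[domain]} ({status})")
```

On Google Colab some of these are usually preinstalled, but that depends on the runtime image, so it is worth checking before importing.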

## 0. Importing PyTorch and Setting Up Device-Agnostic Code


In [1]:
import torch
from torch import nn

torch.__version__

'2.5.1+cu121'

In [2]:
# Set up device-agnostic code
device = "cuda" if torch.cuda.is_available() else "cpu"
device

'cuda'

In [3]:
!nvidia-smi

Sun Dec 22 13:53:17 2024       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.104.05             Driver Version: 535.104.05   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|   0  Tesla T4                       Off | 00000000:00:04.0 Off |                    0 |
| N/A   50C    P8              10W /  70W |      3MiB / 15360MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
                                                                    

## 1. Get Data

Our dataset is a subset of the Food101 dataset.

Food101 has 101 different classes of food and 1000 images per class (750 training and 250 testing).

Our dataset starts with 3 classes of food and only 10% of the images per class (~75 training, 25 testing).


Why do this?

When starting out ML projects, it's important to try things on a small scale and then increase the scale when necessary.

The whole point is to speed up how fast you can experiment.
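The 10% subsetting described above can be sketched with a reproducible random sample. This is only an illustration of the arithmetic (750 → 75 training, 250 → 25 testing per class). The file names are made up, the helper `sample_subset` is hypothetical, and the way the actual pizza/steak/sushi subset was created is not shown in this notebook.

```python
import random


def sample_subset(image_ids, fraction=0.1, seed=42):
    """Return a reproducible random sample holding `fraction` of `image_ids`."""
    rng = random.Random(seed)  # fixed seed so the same subset is picked each run
    k = round(len(image_ids) * fraction)
    return sorted(rng.sample(image_ids, k))


# Hypothetical file names standing in for the Food101 images of each class.
target_classes = ["pizza", "steak", "sushi"]
train_ids = {c: [f"{c}/{i:04d}.jpg" for i in range(750)] for c in target_classes}
test_ids = {c: [f"{c}/{i:04d}.jpg" for i in range(750, 1000)] for c in target_classes}

train_subset = {c: sample_subset(ids) for c, ids in train_ids.items()}
test_subset = {c: sample_subset(ids) for c, ids in test_ids.items()}

for c in target_classes:
    # Each class ends up with 75 training and 25 testing images.
    print(c, len(train_subset[c]), len(test_subset[c]))
```

Raising `fraction` later is the "increase the scale when necessary" step: the rest of the pipeline stays the same while the dataset grows.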



In [9]:
import requests
import zipfile
from pathlib import Path

# Setup path to a data folder
data_path = Path("data/")
image_path = data_path / "pizza_steak_sushi"

# If the image folder doesn't exist, download it and prepare it
if image_path.is_dir():
  print(f"{image_path} directory exists... skipping download")
else:
  print(f"{image_path} does not exist, creating one...")
  image_path.mkdir(parents=True, exist_ok=True)
  zip_path = data_path / "pizza_steak_sushi.zip"

  # Download pizza, steak and sushi data
  print("Downloading pizza, steak, sushi data...")
  request = requests.get("https://github.com/mrdbourke/pytorch-deep-learning/raw/main/data/pizza_steak_sushi.zip")
  request.raise_for_status()  # stop early if the download failed
  with open(zip_path, "wb") as f:
    f.write(request.content)

  # Unzip
  with zipfile.ZipFile(zip_path, "r") as zip_ref:
    print("Unzipping pizza steak and sushi data")
    zip_ref.extractall(image_path)

  # Remove zip file
  zip_path.unlink()


data/pizza_steak_sushi directory exists... skipping download


In [8]:
data_path / "pizza_steak_sushi.zip"

PosixPath('data/pizza_steak_sushi.zip')
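
Once the archive is extracted, a quick sanity check is to count the images per split and class. The sketch below assumes a `root/split/class/image.jpg` layout (for example `train/pizza/...`), which is an assumption not confirmed in the cells above. The helper `count_images` is hypothetical, and the sketch builds a tiny stand-in directory tree so that it runs without the real download.

```python
import tempfile
from collections import Counter
from pathlib import Path


def count_images(root, pattern="*.jpg"):
    """Count images per (split, class), assuming root/split/class/image files."""
    counts = Counter()
    for image_file in Path(root).glob(f"*/*/{pattern}"):
        split, class_name = image_file.parts[-3], image_file.parts[-2]
        counts[(split, class_name)] += 1
    return counts


# Build a small fake dataset (empty .jpg files) purely for demonstration.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "pizza_steak_sushi"
    for split, n_images in [("train", 3), ("test", 1)]:
        for class_name in ["pizza", "steak", "sushi"]:
            class_dir = root / split / class_name
            class_dir.mkdir(parents=True)
            for i in range(n_images):
                (class_dir / f"{i}.jpg").touch()
    demo_counts = count_images(root)

for (split, class_name), n in sorted(demo_counts.items()):
    print(f"{split}/{class_name}: {n} images")
```

On the real data, calling `count_images(image_path)` would be the equivalent check, provided the extracted folder follows the assumed layout.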