# <b> <p align="center"> <span style="color: #DCC43C"> GOING MODULAR </span> </p> </b>
### <b> <p align="center"> <span style="color: #BFF0FF"> script mode </span> </p> </b>

## What is script mode?

**Script mode** uses [Jupyter Notebook cell magic](https://ipython.readthedocs.io/en/stable/interactive/magics.html) (special commands) to turn specific cells into Python scripts.

For example, if you run the following code in a cell, you'll create a Python file called `hello_world.py`:

```
%%writefile hello_world.py
print("hello world, machine learning is fun!")
```

You could then run this Python file on the command line with:

```
python hello_world.py

>>> hello world, machine learning is fun!
```

The main cell magic we're interested in using is `%%writefile`.

Putting `%%writefile filename` at the top of a cell in Jupyter or Google Colab will write the contents of that cell to a specified `filename`.
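
Outside a notebook, the same write-then-run workflow can be reproduced in plain Python. The snippet below is only a rough sketch of what `%%writefile` followed by `python hello_world.py` amounts to. The temporary directory and the variable names are assumptions made for illustration, not part of the magic command itself.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

# Source code that a `%%writefile hello_world.py` cell would contain
cell_contents = 'print("hello world, machine learning is fun!")\n'

# Write the "cell" to a file (a temporary directory is used to avoid clutter)
script_path = Path(tempfile.mkdtemp()) / "hello_world.py"
script_path.write_text(cell_contents)

# Run the script with the current interpreter, like `python hello_world.py`
result = subprocess.run(
    [sys.executable, str(script_path)],
    capture_output=True, text=True, check=True,
)
print(result.stdout.strip())
```

In a notebook, the magic command is the shorter route; the sketch only shows that nothing special happens beyond saving text to a `.py` file.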

> **Question:** Do I have to create Python files like this? Can't I just start directly with a Python file and skip using a Google Colab notebook?
>
> **Answer:** No, you don't have to, and yes, you can start directly with a Python file. This is only *one* way of creating Python scripts. If you know the kind of script you'd like to write, you could start writing it straight away. But since using Jupyter/Google Colab notebooks is a popular way of starting off data science and machine learning projects, knowing about the `%%writefile` magic command is a handy tip.

## What has script mode got to do with PyTorch?

If you've written some useful code in a Jupyter Notebook or Google Colab notebook, chances are you'll want to use that code again.

And turning your useful cells into Python scripts (`.py` files) means you can use specific pieces of your code in other projects.

This practice is not PyTorch specific.

But it's how you'll see many different online PyTorch repositories structured.

### PyTorch in the wild

For example, if you find a PyTorch project on GitHub, it may be structured in the following way:

```
pytorch_project/
├── pytorch_project/
│   ├── data_setup.py
│   ├── engine.py
│   ├── model.py
│   ├── train.py
│   └── utils.py
├── models/
│   ├── model_1.pth
│   └── model_2.pth
└── data/
    ├── data_folder_1/
    └── data_folder_2/
```

Here, the top level directory is called `pytorch_project` but you could call it whatever you want.

Inside there's another directory called `pytorch_project`, which contains several `.py` files. Their purposes may be:
* `data_setup.py` - a file to prepare data (and download data if needed).
* `engine.py` - a file containing various training functions.
* `model_builder.py` or `model.py` - a file to create a PyTorch model.
* `train.py` - a file to leverage all other files and train a target PyTorch model.
* `utils.py` - a file dedicated to helpful utility functions.

And the `models` and `data` directories could hold PyTorch models and data files respectively. Because models and data files tend to be large, it's unlikely you'll find the *full* versions of these on GitHub; the directories are shown above mainly for demonstration purposes.
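
To make the idea of `train.py` leveraging the other files concrete, here is a small sketch that builds a throwaway project with two files and runs it. The helper `describe_run` and the file contents are hypothetical and exist only for illustration; real projects will contain actual data, model and training code.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

# Build a tiny throwaway project: project_dir/pytorch_project/{utils.py, train.py}
project_dir = Path(tempfile.mkdtemp())
package_dir = project_dir / "pytorch_project"
package_dir.mkdir()

# Hypothetical helper module (the function name is made up for illustration)
(package_dir / "utils.py").write_text(
    "def describe_run(epochs):\n"
    "    return f'training for {epochs} epochs'\n"
)

# Hypothetical train.py that reuses the helper from the sibling file
(package_dir / "train.py").write_text(
    "import utils\n"
    "print(utils.describe_run(epochs=5))\n"
)

# Equivalent to running `python pytorch_project/train.py` from the project root.
# Python puts the script's own folder on sys.path, so `import utils` resolves.
result = subprocess.run(
    [sys.executable, str(package_dir / "train.py")],
    cwd=project_dir, capture_output=True, text=True, check=True,
)
print(result.stdout.strip())
```

The same mechanism is what lets a single `train.py` pull in data setup, model and engine code that live in neighbouring `.py` files.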

> **Note:** There are many different ways to structure a Python project and subsequently a PyTorch project. This isn't a guide on *how* to structure your projects, only an example of how you *might* come across PyTorch projects in the wild. For more on structuring Python projects, see Real Python's [*Python Application Layouts: A Reference*](https://realpython.com/python-application-layouts/) guide.

## What we're going to cover

By the end of this notebook you should finish with a directory structure of:

```
going_modular/
├── going_modular/
│   ├── data_setup.py
│   ├── engine.py
│   ├── model_builder.py
│   ├── train.py
│   └── utils.py
├── models/
│   ├── 05_going_modular_cell_mode_tinyvgg_model.pth
│   └── 05_going_modular_script_mode_tinyvgg_model.pth
└── data/
    └── pizza_steak_sushi/
        ├── train/
        │   ├── pizza/
        │   │   ├── image01.jpeg
        │   │   └── ...
        │   ├── steak/
        │   └── sushi/
        └── test/
            ├── pizza/
            ├── steak/
            └── sushi/
```

Using this directory structure, you should be able to train a model from within a notebook with the command:

```
!python going_modular/train.py
```

Or from the command line with:

```
python going_modular/train.py
```

In essence, we will have turned our helpful notebook code into **reusable modular code**.

## 0. Creating a folder for storing Python scripts

Since we're going to be creating Python scripts out of our most useful code cells, let's create a folder for storing those scripts.

We'll call the folder `going_modular` and create it using Python's [`os.makedirs()`](https://docs.python.org/3/library/os.html) function.

In [2]:
import os

# Create the folder that will hold our scripts (no error if it already exists)
os.makedirs("going_modular", exist_ok=True)

## 1. Get data

We're going to start by downloading the same data we used in [notebook 04](https://github.com/ShafaetUllah032/DL_with_PyTorch/blob/main/04%20pytorch%20custom%20dataset.ipynb), the `pizza_steak_sushi` dataset with images of pizza, steak and sushi.

In [None]:
# from google.oauth2 import service_account
# from googleapiclient.discovery import build
# import requests
# import os

# # Path to the credentials.json file you downloaded
# credentials_path = 'credentials.json'

# # Authenticate using the service account
# SCOPES = ['https://www.googleapis.com/auth/drive.readonly']
# creds = service_account.Credentials.from_service_account_file(credentials_path, scopes=SCOPES)
# service = build('drive', 'v3', credentials=creds)

# # Folder ID from the shareable link
# folder_id = 'YOUR_FOLDER_ID_HERE'

# # List files in the folder
# results = service.files().list(q=f"'{folder_id}' in parents", fields="files(id, name)").execute()
# files = results.get('files', [])

# if not files:
#     print('No files found.')
# else:
#     print('Files:')
#     for file in files:
#         print(f"{file['name']} ({file['id']})")

#     # Create a folder to save the downloaded files
#     download_folder = 'downloaded_images'
#     os.makedirs(download_folder, exist_ok=True)

#     # Download each file
#     for file in files:
#         file_id = file['id']
#         file_name = file['name']
#         request = service.files().get_media(fileId=file_id)
#         file_path = os.path.join(download_folder, file_name)

#         with open(file_path, 'wb') as f:
#             downloader = requests.get(f"https://www.googleapis.com/drive/v3/files/{file_id}?alt=media", headers={"Authorization": f"Bearer {creds.token}"})
#             f.write(downloader.content)
#             print(f"Downloaded {file_name}")


In [2]:
import os
import zipfile
from pathlib import Path
import requests

# Setup path to data folder
data_path = Path("data/")
image_path = data_path / "pizza_steak_sushi"

# Create the image folder if it doesn't already exist
if image_path.is_dir():
    print(f"{image_path} exist no need to create ..")
else:
    image_path.mkdir(parents=True, exist_ok=True)
    print(f"{image_path} created successfully")

# Download pizza, steak, sushi data (this runs whether or not the folder already existed)

with open(data_path / "pizza_steak_sushi.zip", "wb") as f:
    request = requests.get("https://github.com/mrdbourke/pytorch-deep-learning/raw/main/data/pizza_steak_sushi.zip")
    print("Downloading pizza, steak, sushi data...")
    f.write(request.content)

# Unzip pizza, steak, sushi data
with zipfile.ZipFile(data_path / "pizza_steak_sushi.zip", "r") as zip_ref:
    print("Unzipping pizza, steak, sushi data...")
    zip_ref.extractall(image_path)

# Remove zip file
os.remove(data_path / "pizza_steak_sushi.zip")
print("zip file removed ... ")

data\pizza_steak_sushi exist no need to create ..
Downloading pizza, steak, sushi data...
Unzipping pizza, steak, sushi data...
zip file removed ... 
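
As the output shows, the cell downloads and unzips the archive even when the folder already exists. If you would rather skip the download in that case, one option is to guard the whole step. The sketch below is an assumption-laden illustration: `prepare_data` and `fetch_zip` are hypothetical names, and the download is passed in as a function so the example can be run without network access.

```python
import io
import tempfile
import zipfile
from pathlib import Path
from typing import Callable

def prepare_data(image_path: Path, fetch_zip: Callable[[], bytes]) -> bool:
    """Fetch and unzip into image_path only if it doesn't exist yet.

    Returns True if data was fetched, False if the step was skipped.
    """
    if image_path.is_dir():
        return False
    image_path.mkdir(parents=True, exist_ok=True)
    # Unzip straight from memory, so no temporary zip file needs removing
    with zipfile.ZipFile(io.BytesIO(fetch_zip())) as zip_ref:
        zip_ref.extractall(image_path)
    return True

# Demo with an in-memory zip standing in for the real download
def fake_fetch() -> bytes:
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as zf:
        zf.writestr("train/pizza/image01.jpeg", b"not a real image")
    return buffer.getvalue()

demo_path = Path(tempfile.mkdtemp()) / "pizza_steak_sushi"
first_run = prepare_data(demo_path, fake_fetch)   # folder missing: fetch and unzip
second_run = prepare_data(demo_path, fake_fetch)  # folder present: skipped
print(first_run, second_run)
```

With the real dataset, `fetch_zip` could be something like `lambda: requests.get(url).content`, where `url` is the zip address used in the cell above.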


In [3]:
# Setup train and testing paths

train_dir=image_path/"train" 
test_dir=image_path/"test"

train_dir,test_dir

(WindowsPath('data/pizza_steak_sushi/train'),
 WindowsPath('data/pizza_steak_sushi/test'))
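
Before building datasets, it can help to confirm that each split contains one folder per class with images inside. The following is a minimal sketch of such a check; `count_images_per_class` is a hypothetical helper, and the demo runs on a throwaway folder that mimics the `train/` layout rather than on the real data.

```python
import tempfile
from pathlib import Path

def count_images_per_class(split_dir: Path) -> dict:
    """Return {class_folder_name: number_of_files} for one split directory."""
    return {
        class_dir.name: sum(1 for item in class_dir.iterdir() if item.is_file())
        for class_dir in sorted(split_dir.iterdir())
        if class_dir.is_dir()
    }

# Demo on a throwaway folder that mimics the train/ layout
demo_train_dir = Path(tempfile.mkdtemp()) / "train"
for class_name, n_images in [("pizza", 3), ("steak", 2), ("sushi", 1)]:
    (demo_train_dir / class_name).mkdir(parents=True)
    for i in range(n_images):
        (demo_train_dir / class_name / f"image{i:02d}.jpeg").touch()

class_counts = count_images_per_class(demo_train_dir)
print(class_counts)
```

On the real data you would call `count_images_per_class(train_dir)` and `count_images_per_class(test_dir)` instead.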

## 2. Create Datasets and DataLoaders

Let's turn our data into PyTorch `Dataset`s and `DataLoader`s and find out a few useful attributes from them, such as `classes` and their lengths.

In [5]:
from torchvision import datasets, transforms

# Create simple transform

data_transform=transforms.Compose([
    transforms.Resize((64,64)),
    transforms.ToTensor(),
])

# Use ImageFolder to create dataset(s)
train_data = datasets.ImageFolder(root=train_dir, # target folder of images
                                  transform=data_transform, # transforms to perform on data (images)
                                  target_transform=None) # transforms to perform on labels (if necessary)

test_data = datasets.ImageFolder(root=test_dir,
                                 transform=data_transform)

print(f"Train data:\n{train_data}\nTest data:\n{test_data}")

Train data:
Dataset ImageFolder
    Number of datapoints: 225
    Root location: data\pizza_steak_sushi\train
    StandardTransform
Transform: Compose(
               Resize(size=(64, 64), interpolation=bilinear, max_size=None, antialias=True)
               ToTensor()
           )
Test data:
Dataset ImageFolder
    Number of datapoints: 75
    Root location: data\pizza_steak_sushi\test
    StandardTransform
Transform: Compose(
               Resize(size=(64, 64), interpolation=bilinear, max_size=None, antialias=True)
               ToTensor()
           )


In [6]:
# Get class names as a list
class_names = train_data.classes
class_names

['pizza', 'steak', 'sushi']

In [7]:
# Can also get class names as a dict
class_dict = train_data.class_to_idx
class_dict

{'pizza': 0, 'steak': 1, 'sushi': 2}
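
The class names and indices above come from the folder names: `ImageFolder` treats each subdirectory of the root as a class and numbers the classes in sorted order. The sketch below imitates that behaviour in plain Python. It is a simplified illustration (torchvision's own implementation includes extra checks), and the demo directory is made up.

```python
import os
import tempfile

def find_classes(directory):
    """Map sorted subdirectory names to integer labels, ImageFolder-style."""
    classes = sorted(entry.name for entry in os.scandir(directory) if entry.is_dir())
    class_to_idx = {name: idx for idx, name in enumerate(classes)}
    return classes, class_to_idx

# Create the class folders in a deliberately unsorted order
demo_root = tempfile.mkdtemp()
for class_name in ["sushi", "pizza", "steak"]:
    os.makedirs(os.path.join(demo_root, class_name))

demo_classes, demo_class_to_idx = find_classes(demo_root)
print(demo_classes)
print(demo_class_to_idx)
```

Because the order is alphabetical, renaming or adding a class folder can shift the integer labels, which matters if you reuse a saved model with a different folder layout.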

In [8]:
# Check the lengths
len(train_data), len(test_data)

(225, 75)

In [9]:
# Turn train and test Datasets into DataLoaders
from torch.utils.data import DataLoader

train_dataloader = DataLoader(dataset=train_data,
                              batch_size=1, # how many samples per batch?
                              num_workers=1, # how many subprocesses to use for data loading? (higher = more)
                              shuffle=True) # shuffle the data?

test_dataloader = DataLoader(dataset=test_data,
                             batch_size=1,
                             num_workers=1,
                             shuffle=False) # don't usually need to shuffle testing data

train_dataloader, test_dataloader

(<torch.utils.data.dataloader.DataLoader at 0x1b75ac263d0>,
 <torch.utils.data.dataloader.DataLoader at 0x1b75ac26f40>)

In [10]:
# Check out single image size/shape
img, label = next(iter(train_dataloader))

# Batch size is 1 here; try changing the batch_size parameter above and see what happens
print(f"Image shape: {img.shape} -> [batch_size, color_channels, height, width]")
print(f"Label shape: {label.shape}")

Image shape: torch.Size([1, 3, 64, 64]) -> [batch_size, color_channels, height, width]
Label shape: torch.Size([1])
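
Changing `batch_size` also changes how many batches each `DataLoader` yields per epoch. With the default `drop_last=False`, the number of batches is the dataset length divided by the batch size, rounded up. The short sketch below works this out for the dataset sizes shown earlier (225 training and 75 test images); the batch size of 32 is just an example value, and `batches_per_epoch` is a hypothetical helper.

```python
import math

def batches_per_epoch(num_samples: int, batch_size: int) -> int:
    """Number of batches when the final, smaller batch is kept (drop_last=False)."""
    return math.ceil(num_samples / batch_size)

train_samples, test_samples = 225, 75  # lengths reported by len(train_data), len(test_data)

for batch_size in (1, 32):
    train_batches = batches_per_epoch(train_samples, batch_size)
    test_batches = batches_per_epoch(test_samples, batch_size)
    print(f"batch_size={batch_size}: {train_batches} train batches, {test_batches} test batches")

# Size of the final training batch when batch_size=32
last_train_batch = train_samples - 32 * (batches_per_epoch(train_samples, 32) - 1)
print(f"Last training batch holds {last_train_batch} sample(s)")
```

With a batch size of 32, the image tensor from `next(iter(train_dataloader))` would have a leading dimension of 32 instead of 1, except for that final, smaller batch.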
