# Going Modular

### [Resource](https://www.learnpytorch.io/05_pytorch_going_modular/)

Turning the most useful code cells in [`04_PyTorch_Custom_Datasets`](https://github.com/LuluW8071/Data-Science/tree/main/Pytorch/04_PyTorch_Custom_Datasets) into a series of Python scripts saved to a directory called `going_modular`.

### What is going modular?
Going modular involves turning notebook code (from a **Jupyter Notebook** or **Google Colab Notebook**) into a series of different Python scripts that offer similar functionality.

For example, we could turn our notebook code from a series of cells into the following Python files:

- `data_setup.py` - a file to prepare and download data if needed.
- `dataset.py` - a file to create a dataloader.
- `model_builder.py` or `model.py` - a file to create a PyTorch model.
- `engine.py` - a file containing various training functions.
- `train.py` - a file to leverage all other files and train a target PyTorch model.
- `utils.py` - a file dedicated to helpful utility functions.
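
As a rough illustration of how `train.py` could act as the single entry point, here is a minimal sketch using only the standard library. The flag names and every default except `--epochs` (the training log further down shows 100 epochs) are assumptions for illustration, not taken from the actual scripts.

```python
import argparse


def parse_args(argv=None):
    """Parse hyperparameters for a hypothetical train.py entry point."""
    parser = argparse.ArgumentParser(description="Train a PyTorch model.")
    # NOTE: flag names and defaults are illustrative assumptions.
    parser.add_argument("--epochs", type=int, default=100)
    parser.add_argument("--batch-size", type=int, default=32)
    parser.add_argument("--lr", type=float, default=1e-3)
    parser.add_argument("--data-dir", default="dataset")
    return parser.parse_args(argv)


def main(argv=None):
    args = parse_args(argv)
    # A real train.py would call into the other modules at this point:
    # dataset.py for dataloaders, model.py for the model,
    # engine.py for the training loop and utils.py for saving the model.
    print(
        f"epochs={args.epochs} batch_size={args.batch_size} "
        f"lr={args.lr} data_dir={args.data_dir}"
    )
    return args


if __name__ == "__main__":
    main()
```

Because every flag has a default, a plain `python train.py` still works, while something like `python train.py --epochs 5` allows a quick test run.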

### Why would you want to go **modular**?
Notebooks are fantastic for iteratively exploring and running experiments quickly. However, for larger-scale projects you may find Python scripts more reproducible and easier to run.

For example, if you have an app running online that other people can access and use, the code running that app is considered **production code**, which is where scripts tend to be the better fit.

## Pros and Cons of Notebooks vs Scripts

|                    | Pros                                                                                | Cons                                                                                       |
|--------------------|-------------------------------------------------------------------------------------|--------------------------------------------------------------------------------------------|
| **Notebooks**      | Easy to experiment/get started                                                      | Versioning can be hard                                                                     |
|                    | Easy to share (e.g. a link to a Google Colab notebook)                              | Hard to use only specific parts                                                            |
|                    | Very visual                                                                         | Text and graphics can get in the way of code                                               |
| **Python scripts** | Can package code together (saves rewriting similar code across different notebooks) | Experimenting isn't as visual (usually have to run the whole script rather than one cell)  |
|                    | Can use git for versioning                                                          |                                                                                            |
|                    | Many open source projects use scripts                                               |                                                                                            |
|                    | Larger projects can be run on cloud vendors (not as much support for notebooks)     |                                                                                            |

## What we're working towards
By the end of this section we want to have two things:

1. The ability to train the model we built in [`04_PyTorch_Custom_Datasets`](https://github.com/LuluW8071/Data-Science/tree/main/Pytorch/04_PyTorch_Custom_Datasets) with one line of code on the command line:

    ```bash
    python train.py
    ```

2. A directory structure of reusable Python scripts, such as:

    ```
    05_PyTorch_Going_Modular/
    ├── going_modular/
    │   ├── data_setup.py
    │   ├── dataset.py
    │   ├── engine.py
    │   ├── model.py
    │   ├── train.py
    │   ├── utils.py
    │   └── dataset/
    │       ├── test/
    │       │   ├── donuts
    │       │   ├── dumplings
    │       │   ├── ice_cream
    │       │   ├── pizza
    │       │   ├── ramen
    │       │   ├── samosa
    │       │   ├── steak
    │       │   └── sushi
    │       └── train/
    │           ├── donuts
    │           ├── dumplings
    │           ├── ice_cream
    │           ├── pizza
    │           ├── ramen
    │           ├── samosa
    │           ├── steak
    │           └── sushi
    └── models/
        └── tinyvgg_model.pth
    ```
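
With one subfolder per class under `train/` and `test/`, the class names can be read straight from the directory layout. The helper below is a minimal sketch of that idea; `find_classes` is a hypothetical name and may not match what the actual `dataset.py` does (in practice `torchvision.datasets.ImageFolder` exposes the same information through its `classes` and `class_to_idx` attributes).

```python
from pathlib import Path


def find_classes(directory):
    """Return (class_names, class_to_idx) from the subfolders of `directory`."""
    # Each subfolder name (e.g. "pizza", "sushi") is treated as one class.
    class_names = sorted(p.name for p in Path(directory).iterdir() if p.is_dir())
    if not class_names:
        raise FileNotFoundError(f"No class folders found in {directory}")
    # Map each class name to an integer label in alphabetical order.
    class_to_idx = {name: idx for idx, name in enumerate(class_names)}
    return class_names, class_to_idx


if __name__ == "__main__":
    # Path assumed from the directory tree above; skipped if it does not exist.
    train_dir = Path("going_modular/dataset/train")
    if train_dir.is_dir():
        print(find_classes(train_dir))
```

Sorting the folder names keeps the label indices stable between runs, so the same folder name always maps to the same integer.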



After writing the scripts, run the command in a terminal:

```bash
python train.py
```

<i>**Note**: 
If you don't have a dedicated GPU, you can also:
- use a Google Colab notebook,
- upload the scripts to the Colab runtime, and
- switch the runtime type to T4 GPU.</i>

```
!python3 train.py

Downloading...
From (original): https://drive.google.com/uc?id=1J0syU84FNmtxkf9AzDPdRSDmtUr1CSy8
From (redirected): https://drive.google.com/uc?id=1J0syU84FNmtxkf9AzDPdRSDmtUr1CSy8&confirm=t&uuid=0e0cfab4-0c19-4381-b70d-1aa293fb1387
To: /content/Food_dataset.zip
100% 367M/367M [00:04<00:00, 87.4MB/s]
Files extracted successfully to: ./dataset
  0% 0/100 [00:00<?, ?it/s]
Epoch: 1 | Train loss: 2.0327 - Train acc: 19.99% -- Test_loss: 1.9544 -- Test_acc: 23.89%
  1% 1/100 [00:34<56:20, 34.14s/it]
Epoch: 2 | Train loss: 1.9358 - Train acc: 26.23% -- Test_loss: 1.8934 -- Test_acc: 26.29%
  2% 2/100 [01:07<55:20, 33.88s/it]
Epoch: 3 | Train loss: 1.8911 - Train acc: 27.52% -- Test_loss: 1.9725 -- Test_acc: 22.76%
  3% 3/100 [01:41<54:56, 33.98s/it]
Epoch: 4 | Train loss: 1.8232 - Train acc: 31.60% -- Test_loss: 1.7604 -- Test_acc: 34.98%
  4% 4/100 [02:14<53:38, 33.53s/it]
Epoch: 5 | Train loss: 1.7706 - Train acc: 35.18% -- Test_loss: 1.7971 -- Test_acc: 31.27%
  5% 5/100 [02:48<53:04, 33.
```

To run the demo version of `05_PyTorch_Going_Modular`, install the required dependencies from the `demo` directory:

```bash
cd demo
pip install -r requirements.txt
```

Run the Streamlit command for local web deployment:

```bash
streamlit run main.py
```

The trained model seems to be underfitting, which could mainly be because:
- the model is too simple, so it may not be capable of representing the complexity of the data.
- the input features used to train the model are not adequate representations of the underlying factors influencing the target variable.
- the training dataset is not large enough.

Other architectures, such as `ResNet50` or `EfficientNetB0`, may produce better results.
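
One rough way to check the underfitting claim is to parse the training log and compare train and test accuracy. The sketch below uses the five epoch lines from the Colab output above; the function names and both thresholds (`acc_threshold`, `max_gap`) are arbitrary assumptions, so treat the result as a hint rather than a diagnosis.

```python
import re

# Matches the epoch lines printed by train.py in the log above.
LOG_PATTERN = re.compile(
    r"Epoch: (\d+) \| Train loss: ([\d.]+) - Train acc: ([\d.]+)% "
    r"-- Test_loss: ([\d.]+) -- Test_acc: ([\d.]+)%"
)

SAMPLE_LOG = """
Epoch: 1 | Train loss: 2.0327 - Train acc: 19.99% -- Test_loss: 1.9544 -- Test_acc: 23.89%
Epoch: 2 | Train loss: 1.9358 - Train acc: 26.23% -- Test_loss: 1.8934 -- Test_acc: 26.29%
Epoch: 3 | Train loss: 1.8911 - Train acc: 27.52% -- Test_loss: 1.9725 -- Test_acc: 22.76%
Epoch: 4 | Train loss: 1.8232 - Train acc: 31.60% -- Test_loss: 1.7604 -- Test_acc: 34.98%
Epoch: 5 | Train loss: 1.7706 - Train acc: 35.18% -- Test_loss: 1.7971 -- Test_acc: 31.27%
"""


def parse_log(text):
    """Extract per-epoch metrics from log text as a list of dicts."""
    rows = []
    for match in LOG_PATTERN.finditer(text):
        epoch, train_loss, train_acc, test_loss, test_acc = match.groups()
        rows.append({
            "epoch": int(epoch),
            "train_loss": float(train_loss),
            "train_acc": float(train_acc),
            "test_loss": float(test_loss),
            "test_acc": float(test_acc),
        })
    return rows


def looks_underfit(rows, acc_threshold=50.0, max_gap=10.0):
    """Heuristic: low train accuracy AND a small train/test accuracy gap."""
    last = rows[-1]
    gap = abs(last["train_acc"] - last["test_acc"])
    return last["train_acc"] < acc_threshold and gap <= max_gap


rows = parse_log(SAMPLE_LOG)
print(f"parsed {len(rows)} epochs, underfit heuristic: {looks_underfit(rows)}")
```

Low accuracy on both splits with a small gap between them is the usual sign of underfitting, whereas high train accuracy with much lower test accuracy would point to overfitting. Five epochs out of 100 is early, so the check is more meaningful on the full log.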