In [None]:
#@title Hidden Code (Run Me!)
### DO NOT CHANGE ###
%load_ext autoreload
%autoreload 2

! git clone https://github.com/Momentum-AI-Org/momentum_final_projects.git
%cd momentum_final_projects 
! pip install -e .

from api.config import ProjectConfig
from api.setup_script import setup_script
from utils.constants import PROJECT_TYPE

# one of SHOES, FRUIT, PIZZA, RECAPTCHA, WEATHER
ProjectConfig.PROJECT_NAME = PROJECT_TYPE.FRUIT
setup_script()

from api.common import (
    download_data,
    get_train_test_datasets,
    get_model,
    train_model,
    display_loss_curves,
    visualize_dataset,
    evaluate_pretrain_accuracy,
    evaluate_test_accuracy,
    visualize_predictions,
)
### DO NOT CHANGE ###

# Fruit Classifier Project
Welcome to the Fruit Classifier project! In this project, you'll be building a classifier (a convolutional neural network) to identify different types of fruit from images. You'll be using much of what you've learned over the past week, so feel free to refer back to those materials as you work on your project. As always, feel free to ask your mentors questions if you're stuck (that's what they're here for).

The project will have the following structure:
1. Downloading the dataset
2. Building the train/test datasets
3. Looking at the dataset
4. Building our model
5. Choosing the hyperparameters
6. Training our model
7. Evaluating our model
8. Seeing our model's predictions!

## Helper Functions!

There are predefined functions to walk you through this lab.

 **Hint: the functions are presented in the order you should use them!**

<br/>

`download_data()`
- This downloads the fruit dataset you need! 
- This function takes in no parameters and doesn't return anything.

<br/>

`get_train_test_datasets(num_train_imgs_per_class, num_test_imgs_per_class) -> train_dataset, test_dataset`
- Creates a train and test dataset. 
- You pass in how many images per class you want the train and test datasets to contain (`num_train_imgs_per_class` and `num_test_imgs_per_class`), and this function returns a `train_dataset` and a `test_dataset`.
- Try training with `100` images per class and testing with `25` images per class to begin, and increase from there.
- Note that if you try to make the datasets too big (and there aren't enough downloaded images), this function will throw an error.

<br/>

`visualize_dataset(dataset)`
- This function takes in one parameter, which is the `dataset` you want to visualize.
- This function doesn't return anything.


<br/>

`get_model(depth, num_filters) -> model`
- This function takes in two parameters.
- `depth` controls the number of layers that your network has. More depth means a larger and more complex network, but one that is slower to train. Try keeping this value in the range `(4, 8)`.
- `num_filters` controls the number of filters that your network has (think back to convolutional neural networks!). More filters mean a larger and more complex network, but one that is slower to train. Try keeping this value in the range `(128, 512)`.
- This function returns an untrained neural network `model`.
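
To get a feel for why more depth and more filters make a network larger, here is a rough sketch that counts the weights in a stack of convolutional layers. It rests on assumptions that may not match what `get_model` actually builds: every layer is a 3x3 convolution with `num_filters` output channels, and the input is an RGB image with 3 channels. The function name `conv_stack_param_count` is made up for this illustration.

```python
def conv_stack_param_count(depth, num_filters, in_channels=3, kernel_size=3):
    """Count weights and biases in a plain stack of conv layers.

    Assumption (not taken from get_model): every layer is a
    kernel_size x kernel_size convolution with num_filters output channels.
    """
    total = 0
    channels_in = in_channels
    for _ in range(depth):
        weights = kernel_size * kernel_size * channels_in * num_filters
        biases = num_filters
        total += weights + biases
        channels_in = num_filters  # the next layer reads this layer's output
    return total


# Compare the small and large ends of the suggested ranges.
for depth, num_filters in [(4, 128), (8, 128), (4, 512), (8, 512)]:
    count = conv_stack_param_count(depth, num_filters)
    print(f"depth={depth}, num_filters={num_filters}: {count:,} parameters")
```

Under these assumptions, doubling `depth` roughly doubles the count, while doubling `num_filters` roughly quadruples it, because each layer connects every input channel to every output channel.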

<br/>

`evaluate_pretrain_accuracy(model, test_dataset) -> pretrained_accuracy`
- This function takes in your untrained `model` and `test_dataset` and returns how accurate your model is at classifying the images correctly before any training (`pretrained_accuracy`).


<br/>

`train_model(model, train_dataset, test_dataset, n_epochs, lr)`
- This function takes in several parameters:
- `model`: your untrained model
- `train_dataset`: the dataset to use for training
- `test_dataset`: the dataset to use for testing
- `n_epochs`: the number of times to loop through the `train_dataset` during training. We suggest keeping this value between `10` and `60`.
- `lr`: the learning rate to use during training (remember going down the hill!). We suggest keeping this value between `0.0001` and `0.01`.
- This function doesn't return anything, but edits your existing model to make it better!
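
To make the "going down the hill" picture concrete, here is a minimal sketch of gradient descent on a toy hill, `f(x) = x**2`. This is not what `train_model` does internally (a real network has many more parameters and a different loss); the function `gradient_descent` and the step counts are invented for illustration only.

```python
def gradient_descent(lr, n_steps=20, start=5.0):
    """Minimize the toy function f(x) = x**2 and return every x visited."""
    x = start
    path = [x]
    for _ in range(n_steps):
        grad = 2 * x        # slope of x**2 at the current position
        x = x - lr * grad   # step downhill, scaled by the learning rate
        path.append(x)
    return path


# The lowest point of the hill is x = 0. Compare three learning rates.
for lr in (0.01, 0.1, 1.1):
    final_x = gradient_descent(lr)[-1]
    print(f"lr={lr}: distance from the bottom after 20 steps = {abs(final_x):.4f}")
```

On this toy hill, a small learning rate creeps toward the bottom slowly, a moderate one gets close quickly, and one that is too large overshoots and ends up farther away than where it started. The specific values that work for the toy hill do not transfer to your model, so stick to the suggested range when training.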


<br/>


`display_loss_curves()`
- This function will display pretty loss curves after you finish training your model! Helpful to detect overfitting!
- This function does not have any parameters, nor does it return anything.

<br/>

`evaluate_test_accuracy(model, test_dataset) -> test_accuracy`
- This function takes in your `model` and `test_dataset` and returns how accurate your model is at classifying the images correctly (`test_accuracy`).
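
Accuracy is usually the fraction of images whose predicted label matches the true label. The sketch below shows that idea on a handful of made-up labels. It is an assumption about the general definition, not a copy of how `evaluate_test_accuracy` is implemented (for example, the real function might report a percentage instead of a fraction).

```python
def accuracy(predicted_labels, true_labels):
    """Return the fraction of predictions that match the true labels."""
    if len(predicted_labels) != len(true_labels):
        raise ValueError("predictions and labels must have the same length")
    if not true_labels:
        raise ValueError("cannot compute accuracy on an empty dataset")
    correct = sum(p == t for p, t in zip(predicted_labels, true_labels))
    return correct / len(true_labels)


# Hypothetical labels, invented for this example.
predicted = ["apple", "banana", "apple", "orange"]
actual = ["apple", "banana", "orange", "orange"]
print(accuracy(predicted, actual))  # 3 of 4 correct -> 0.75
```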

<br/>

`visualize_predictions(model, test_dataset)`
- This function takes in your `model` and `test_dataset` and will create a pretty graphic of your model's predictions!
- This function doesn't return anything.

## Downloading the Data

Machine learning is all about data, so let's start by getting some data! Unfortunately, our dataset lives online, so let's download it using one of the functions above :)

Fill in the code box below to download the image data to our Colab notebook. Wait until the code below finishes running.

In [None]:
# your code here! ( our solution is just one line of code :D )

## Compile the Data Into Datasets

The downloaded data is messy, so we've done some of the grunt work to clean it up for you! Call the right function below to create a `train_dataset` and `test_dataset`. We will train our model on the `train_dataset` and evaluate it on the `test_dataset`.

**Why do we want to split our data into two datasets? Hint: it has to do with overfitting!**

In [None]:
# your code here! ( our solution is just one line of code :D )

## Looking at our Data

That was a lot of data we just downloaded, processed, and organized! But how do we know we downloaded the right thing? Well, we can look at the data!

Try visualizing your `train_dataset` and `test_dataset` below!

Consider the following:
- Does the data look like what you expected? 
- What's in the foreground of the images that you see?
- What's in the background?
- What colors are the objects in the images that you see? Is there one color that stands out?
- Are the images blurry or clear?

Be critical about your data because your model can only ever be as good as your data!

In [None]:
# your code here! ( visualizing each dataset can be done in one line of code :D )

## Create an Untrained Model

We're ready to build our model! You get to decide how complex to make your model (tune `depth` and `num_filters` accordingly, after first reading the documentation above). 

Remember: simpler models are much faster to train, but larger ones can capture more of the details and patterns in your data!

In [None]:
# your code here! ( our solution is just one line of code :D )

## Evaluating Our Untrained Model

Let's see how well our untrained model does! It shouldn't do much better than random guessing...
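
What does "random guessing" look like as a number? If every class has the same number of test images, a model that guesses at random gets about `1 / number_of_classes` right. The sketch below checks this with a quick simulation. The value `NUM_CLASSES = 5` is only a placeholder for illustration; use the number of fruit types you actually see in your dataset.

```python
import random


def chance_accuracy(num_classes):
    """Expected accuracy of uniform random guessing on balanced classes."""
    return 1 / num_classes


def simulate_random_guessing(num_classes, num_images, seed=0):
    """Guess a random class for each image and measure the accuracy."""
    rng = random.Random(seed)
    labels = [rng.randrange(num_classes) for _ in range(num_images)]
    guesses = [rng.randrange(num_classes) for _ in range(num_images)]
    correct = sum(g == y for g, y in zip(guesses, labels))
    return correct / num_images


NUM_CLASSES = 5  # placeholder: replace with the number of classes in your dataset
print("expected:", chance_accuracy(NUM_CLASSES))
print("simulated:", simulate_random_guessing(NUM_CLASSES, num_images=10_000))
```

If your untrained model scores close to this baseline, that is normal. The interesting question is how far above it you get after training.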

In [None]:
# your code here! ( our solution is just one line of code :D )

## Training Time

Now for the exciting part! Let's train our model.
Write the code below to train our model, and adjust the hyperparameters `n_epochs` and `lr` to your liking.

Training the model will likely take a while depending on the parameters you chose (up to 15 minutes)! This may be a great time to 
- eat a snack
- use the bathroom
- talk to a friend
- go outside

... but remember to keep an eye on your training loss to see if your model is improving!

In [None]:
# your code here! ( our solution is just one line of code :D )

## Evaluate our Model
We can evaluate our model in 3 ways.

1. We can look at the loss curves during training 
2. We can compute our model's overall test accuracy
3. We can see some sample predictions that our model makes

(each of these is a separate function)

<br/>

### What to look for
Machine learning is sometimes more of an art than a science. Here's what to watch out for:
1. You want to make sure your training loss is steadily decreasing and your validation accuracy is steadily increasing.
  - If your training loss is jumping around, your learning rate may be too high.
  - If your validation loss starts to increase late into training while your training loss keeps decreasing, it can be a sign of overfitting!

2.  These are hard problems and we are limited by time! Don't worry about making your accuracy perfect. Try for over `60%` :)

3. See if there is a pattern to the mistakes your model is making.
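
The overfitting pattern described above can also be spotted in numbers rather than by eye. The sketch below is one simple way to do it, assuming you have per-epoch training and validation losses as two lists. `display_loss_curves()` only draws a plot and does not return these values, so the lists here are made up, and the helper `first_overfitting_epoch` is hypothetical.

```python
def first_overfitting_epoch(train_losses, val_losses, patience=2):
    """Return the epoch with the best validation loss if validation loss then
    failed to improve for `patience` epochs while training loss kept falling.
    Return None if no such pattern is found."""
    best_epoch = 0
    for epoch in range(1, len(val_losses)):
        if val_losses[epoch] < val_losses[best_epoch]:
            best_epoch = epoch
        elif (epoch - best_epoch >= patience
              and train_losses[epoch] < train_losses[best_epoch]):
            return best_epoch
    return None


# Invented loss values for illustration only.
train = [1.0, 0.8, 0.6, 0.5, 0.4, 0.3]
val_overfit = [1.1, 0.9, 0.8, 0.85, 0.9, 1.0]   # turns upward after epoch 2
val_healthy = [1.1, 0.9, 0.8, 0.7, 0.6, 0.5]    # keeps improving

print(first_overfitting_epoch(train, val_overfit))  # 2
print(first_overfitting_epoch(train, val_healthy))  # None
```

If you see this pattern in your own curves, training for fewer epochs or using more training images are two things to try.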


<br/>

### Improving Your Model

There are several numbers you can play around with to try to boost your model's performance:

1. Change your dataset size (number of images)
2. Change your model's complexity (depth and # of filters)
3. Change your training time (# of epochs)

If you're stuck, ask a mentor for help!
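
If you want to be systematic about which settings to try, one option is to write the candidates down and enumerate the combinations. The sketch below only builds the list of settings; it does not train anything. The candidate values are illustrative picks from the suggested ranges (the value `200` is an arbitrary example of "increasing from there"), and `enumerate_configs` is a made-up helper.

```python
from itertools import product

# Illustrative candidate values, chosen from the ranges suggested earlier.
search_space = {
    "num_train_imgs_per_class": [100, 200],
    "depth": [4, 6, 8],
    "num_filters": [128, 256],
    "n_epochs": [10, 30],
}


def enumerate_configs(space):
    """Return one dict per combination of the candidate values."""
    names = list(space)
    return [dict(zip(names, values)) for values in product(*space.values())]


configs = enumerate_configs(search_space)
print(len(configs))   # 2 * 3 * 2 * 2 = 24 combinations
print(configs[0])
```

Since a single training run can take several minutes, trying all 24 combinations is probably not realistic in this lab. Changing one number at a time and comparing the test accuracy is usually the more practical approach.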

### Visualize Loss Curves

In [None]:
display_loss_curves()

### Compute Test Accuracy

In [None]:
evaluate_test_accuracy(model, test_dataset)

### Visualize Predictions

In [None]:
visualize_predictions(model, test_dataset)