# Assignment 1: Machine Learning, Deep Learning, YOLO

For this assignment we will be working with the **California housing dataset** for machine learning, the **CIFAR-10 dataset** for deep learning, and your **own** dataset for object detection using YOLO.

- **California Housing Dataset:**
  - A regression dataset from scikit-learn, based on the 1990 U.S. census. It includes housing features like median income, house age, and number of rooms, with the goal of predicting median house values in California.

- **CIFAR-10 Dataset:**
  - A classification dataset containing 60,000 32x32 color images in 10 classes, including animals and vehicles. It is widely used for training and benchmarking image recognition models, especially CNNs.



Parts of this assignment will require you to go **online** and find resources to help you answer some questions, as they were not covered in the tutorial. This is common in real-world development and is **encouraged**.



---


Open the cells below to start.



# Machine Learning

## Step 1: Load and Explore the Dataset

In [None]:
from sklearn.datasets import fetch_california_housing
import pandas as pd

# TASK: Load the California Housing dataset and convert it to a pandas DataFrame.


# TASK: Explore the first few rows of the dataset.


## Step 2: Basic Statistical Overview

In [None]:
# TASK: Display a statistical overview of the dataset.



## Step 3: Visualize the Data

In [None]:
import matplotlib.pyplot as plt
import seaborn as sns

# TASK: Create a heatmap visualization to understand the data better.
# Hint 1: Look at the seaborn library



**Graph should look like this:**
![Image Description](https://drive.google.com/uc?export=view&id=1PzKy45PwQEw-YPvSFawiYNSztUZh92Ck)


## Step 4: Data Processing

In [None]:
# TASK: Perform feature scaling on the dataset and split it into training and testing sets.
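One common way to order this step is to split first and then fit the scaler on the training rows only, so that the test rows do not influence the scaling statistics. The sketch below illustrates that ordering on synthetic data; the arrays, the 80/20 ratio, and the `random_state` value are assumptions for illustration, not part of the assignment.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for features and target (NOT the real housing data).
rng = np.random.default_rng(0)
X = rng.normal(loc=50.0, scale=10.0, size=(200, 3))
y = X @ np.array([0.5, -0.2, 0.1]) + rng.normal(size=200)

# Split first so the test rows never influence the scaling statistics.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Fit the scaler on the training split only, then reuse it on the test split.
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

print(X_train_scaled.shape, X_test_scaled.shape)
```

After scaling, each training column has mean 0 and standard deviation 1; the test columns will be close to that but not exact, because they reuse the training statistics.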


## Step 5: Train the Model

In [None]:
# TASK: Initialize a RandomForestRegressor and train it on the training data.



## Step 6: Model Evaluation

In [None]:
# TASK: Evaluate the model using MSE and R-squared metrics.
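To make the two metrics concrete, here is a small sketch that computes them by hand with NumPy on made-up numbers. The helper names `mse` and `r_squared` are hypothetical; in practice `sklearn.metrics.mean_squared_error` and `sklearn.metrics.r2_score` compute the same quantities.

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error: average of the squared residuals."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))

def r_squared(y_true, y_pred):
    """R-squared: 1 minus (residual sum of squares / total sum of squares)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

# Made-up targets and predictions, for illustration only.
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.5, 5.0, 7.5, 9.0]

print("MSE:", mse(y_true, y_pred))
print("R^2:", r_squared(y_true, y_pred))
```

MSE is in squared target units, so lower is better. R-squared compares the model against always predicting the mean: 1.0 is a perfect fit and 0.0 is no better than the mean.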



## Step 7: Using the Model for Prediction

In [None]:
# TASK: Use the model to make predictions on new data.


## Step 8: Reflections



1.   **Model Insights:** Reflect on the model's performance based on the metrics used (MSE, R-squared, etc.). What do these metrics tell you about the model's accuracy and reliability?

2.   **Feature Influence:** Which features did you find to be most influential in predicting housing prices, based on your understanding of the dataset and the model's behavior?

3.   **Model Complexity and Performance:** Do you think the complexity of the RandomForestRegressor was appropriate for this dataset? How might the model's complexity impact its performance and ability to generalize?

4.   **Challenges in the Assignment:** What were the most challenging aspects you encountered during this assignment? How did you address these challenges?

5.   **Potential Improvements:** What improvements or additional experiments would you consider to enhance the model's performance or to gain deeper insights?

6.   **Real-World Implications:** How can the insights gained from this model be applied in real-world scenarios? What are some practical applications and potential limitations?

7.   **Ethical Considerations:** Discuss any ethical concerns that might arise when using machine learning models in real estate, such as the prediction of housing prices. How might these concerns influence how you approach a data science problem?

**Put your answers down below**

1.

2.

3.

4.

5.

6.

7.

# Deep Learning

## Step 1: Data Loading and Initial Exploration

In [None]:
import tensorflow as tf
import matplotlib.pyplot as plt

# TASK: Load the CIFAR-10 dataset using TensorFlow Keras and display the first few images.
# Here are the class names for CIFAR-10
class_names = ['Airplane', 'Car', 'Bird', 'Cat', 'Deer', 'Dog', 'Frog', 'Horse', 'Ship', 'Truck']


**Images should look like this**
![Image Description](https://drive.google.com/uc?export=view&id=1_jCI6nepIuO3ya6gOQwDrZd7NEgXMc9H)


## Step 2: Preprocessing the Data

In [None]:
# TASK: Normalize the pixel values of the images from [0, 255] to [0, 1] range

# TASK: Check the image shape. CIFAR-10 images are already (32, 32, 3), so no extra channel dimension needs to be added for the CNN.


## Step 3: Building the Neural Network Model (CNNs)

In [None]:
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Conv2D, MaxPooling2D, Flatten

# TASK: Build a convolutional neural network for image classification.
# Use the model summary below as a guide for building your model.

# Model: "sequential"
# _________________________________________________________________
#  Layer (type)                Output Shape              Param #
# =================================================================
#  conv2d (Conv2D)             (None, 30, 30, 32)        896

#  max_pooling2d (MaxPooling2  (None, 15, 15, 32)        0
#  D)

#  conv2d_1 (Conv2D)           (None, 13, 13, 64)        18496

#  max_pooling2d_1 (MaxPoolin  (None, 6, 6, 64)          0
#  g2D)

#  conv2d_2 (Conv2D)           (None, 4, 4, 64)          36928

#  flatten (Flatten)           (None, 1024)              0

#  dense (Dense)               (None, 64)                65600

#  dense_1 (Dense)             (None, 10)                650
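The output shapes and parameter counts in the summary above can be checked by hand. The sketch below assumes 3x3 kernels with no padding and 2x2 max pooling, which is what the shapes 32 → 30 → 15 → 13 → 6 → 4 imply. The helper names are hypothetical and not part of Keras.

```python
def conv2d_params(kernel_h, kernel_w, in_channels, filters):
    """Weights per filter (kernel area * input channels) plus one bias, times filters."""
    return (kernel_h * kernel_w * in_channels + 1) * filters

def dense_params(in_units, out_units):
    """One weight per input plus one bias, for every output unit."""
    return (in_units + 1) * out_units

def conv_out(size, kernel=3):
    """Spatial size after a stride-1 convolution without padding."""
    return size - kernel + 1

def pool_out(size, pool=2):
    """Spatial size after non-overlapping max pooling."""
    return size // pool

# Trace the spatial size through the network: 32 -> 30 -> 15 -> 13 -> 6 -> 4.
size = 32
for layer in (conv_out, pool_out, conv_out, pool_out, conv_out):
    size = layer(size)
flat_units = size * size * 64  # Flatten output fed into the first Dense layer

counts = {
    "conv2d": conv2d_params(3, 3, 3, 32),
    "conv2d_1": conv2d_params(3, 3, 32, 64),
    "conv2d_2": conv2d_params(3, 3, 64, 64),
    "dense": dense_params(flat_units, 64),
    "dense_1": dense_params(64, 10),
}
for name, n in counts.items():
    print(f"{name:10s} {n}")
print("total     ", sum(counts.values()))
```

If your own `model.summary()` shows different numbers, the kernel size, padding, or filter counts likely differ from the target architecture.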



## Step 4: Training the Model

In [None]:
from sklearn.model_selection import train_test_split
# TASK: Combine the preprocessed train and test sets and then split into training, validation, and test sets (70:20:10 split).
# Hint 1: https://www.geeksforgeeks.org/training-vs-testing-vs-validation-sets/
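`train_test_split` only produces two parts per call, so a three-way split usually takes two calls. The sketch below shows one possible approach on placeholder arrays; the array contents, the `random_state`, and the use of `stratify` are assumptions for illustration. Passing integer sizes avoids rounding surprises with fractional `test_size` values.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder data standing in for the combined images and labels.
X = np.arange(1000).reshape(1000, 1)
y = np.arange(1000) % 10  # ten balanced classes

n_total = len(X)
n_test = n_total // 10      # 10% of the full dataset
n_val = n_total * 2 // 10   # 20% of the full dataset

# Stage 1: hold out the test set.
X_rest, X_test, y_rest, y_test = train_test_split(
    X, y, test_size=n_test, random_state=42, stratify=y
)

# Stage 2: carve the validation set out of what remains; the rest is training.
X_train, X_val, y_train, y_val = train_test_split(
    X_rest, y_rest, test_size=n_val, random_state=42, stratify=y_rest
)

print(len(X_train), len(X_val), len(X_test))
```

Stratifying keeps the class proportions similar across the three sets, which matters when comparing validation and test accuracy.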


# TASK: Compile the CNN model with an optimizer, loss function for multi-class classification, and metrics for evaluation.

# TASK: Train the CNN model on the training data and validate it using the validation set. Choose an appropriate number of epochs.

# TASK: Print training and validation metrics to evaluate the performance.


## Step 5: Evaluating the Model and Making Predictions

In [None]:
import numpy as np
import matplotlib.pyplot as plt

# TASK: Use the CNN model to make predictions on the test set and compare these predictions to the actual values.

# TASK: Create a visualization to compare the actual and predicted values.



**Predictions should look like this:**
![Image Description](https://drive.google.com/uc?export=view&id=1uy3Lfg_TyhZA6SQuMc-x0BXI1j4p2Iln)


## Step 6: Reflection Questions


1.   **Model Performance Analysis:** Reflect on the model's performance on the training, validation, and test data. Did you observe any signs of overfitting or underfitting? How well did the model generalize to the test data?

2.   **Hyperparameter Tuning Insights:** Adjust some hyperparameters and train again. How did adjustments to the learning rate, number of epochs, or batch size affect the training process and the final model performance?


3. **Impact of Model Complexity:** Do you think the complexity of your model was appropriate for the task? How might changes in the model's architecture (like adding or removing layers) affect its performance?

4. **Comparison to Traditional Models:** How do you think a neural network model for this image classification task compares to more traditional models, like logistic regression or decision trees?

5. **Real-World Applications:** Consider how this model could be used in real-world scenarios. What are some potential applications and challenges?

6. **Learning and Challenges:** Reflect on your learning experience throughout this project. What were the most challenging aspects, and how did you address them?

**Put your answers down below**

1.

2.

3.

4.

5.

6.

# YOLO

For this part of the assignment, you will be training a YOLOv4 Tiny object detection model using your **own** dataset from Roboflow, following the same procedure we used in the tutorial.

Go here to find and fork a dataset: https://public.roboflow.com/object-detection/

The majority of the code setup will be very similar to what we established in the tutorial, with some key changes:

1.   Add some **augmentations** of your choice, such as rotation, saturation, or grayscale, and upload a snapshot of the augmentation options you picked to **Step 6: Reflections**.

  *   Ensure you have **Auto-Orient** and **Resize** for preprocessing.

  *   Ideally, choose the smallest **max version size** (the number of new variations generated from the augmentations); a larger max version size results in a longer training time.


2.   Use the full **max_batches** formula as we discussed. This will take a **long** time, but for this assignment you will be training a **full** model.

3.   Run your model on some images, save and upload **3** of them to **Step 6: Reflections**.

4.   Answer the questions in **Step 6: Reflections** regarding your YOLOv4 Tiny model.
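As a reminder of what the **max_batches** setting involves, here is a minimal sketch. It assumes the formula in question is the guideline from the Darknet README: `classes * 2000`, but not less than the number of training images and not less than 6000, with `steps` at 80% and 90% of `max_batches`. If the tutorial used a different variant, follow the tutorial. The function name `darknet_schedule` is hypothetical.

```python
def darknet_schedule(num_classes, num_train_images=0):
    """Return (max_batches, steps) following the Darknet README guideline.

    Assumption: max_batches = classes * 2000, floored at the number of
    training images and at 6000; steps are 80% and 90% of max_batches.
    """
    max_batches = max(num_classes * 2000, num_train_images, 6000)
    # Integer arithmetic avoids float rounding when taking 80% and 90%.
    steps = (max_batches * 8 // 10, max_batches * 9 // 10)
    return max_batches, steps

# Example: a hypothetical 3-class dataset.
max_batches, steps = darknet_schedule(num_classes=3)
print(f"max_batches={max_batches}")
print(f"steps={steps[0]},{steps[1]}")
```

Training time grows roughly linearly with `max_batches`, which is why the full setting takes much longer than a shortened run.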




## Step 1: Environment Setup in Google Colab

In [None]:
# TASK: Move to the proper directory and check Nvidia CUDA drivers.


In [None]:
# TASK: Pull from the Darknet framework


In [None]:
# TASK: Modify Makefile


In [None]:
# TASK: Download the newly released yolov4-tiny weights


## Step 2: Obtaining and Processing Data

In [None]:
# TASK: Pull your data from Roboflow


In [None]:
# TASK: Set up training file directories for custom dataset


## Step 3: Create Config file

In [None]:
# TASK: Build config dynamically based on number of classes
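The number of classes determines one value in the config that is easy to get wrong. In a Darknet cfg, the convolutional layer directly before each `[yolo]` layer needs `filters = (classes + 5) * 3`, where 3 is the number of anchor masks per detection layer. The sketch below is illustrative only; the helper names and the sample label text are hypothetical.

```python
def count_classes(labels_text):
    """Count non-empty lines in a labels file's contents (one class name per line)."""
    return sum(1 for line in labels_text.splitlines() if line.strip())

def yolo_filters(num_classes, masks_per_layer=3):
    """Filters for the conv layer preceding each [yolo] layer.

    Each anchor predicts 4 box coordinates + 1 objectness score + one score
    per class, hence (classes + 5) per anchor mask.
    """
    return (num_classes + 5) * masks_per_layer

# Hypothetical labels file contents, for illustration only.
labels_text = "cat\ndog\nbird\n"
num_classes = count_classes(labels_text)
print("classes:", num_classes)
print("filters:", yolo_filters(num_classes))
```

If `filters` does not match the class count, Darknet typically fails at startup or produces a detection head with the wrong shape.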


In [None]:
# TASK: Write cfg template

## Step 4: Train Model

In [None]:
# TASK: Train your model


## Step 5: Predict and visualize

In [None]:
# TASK: Create a function to visualize your results


In [None]:
# TASK: Copy label names to coco


In [None]:
# TASK: Predict on some test images and visualize


## Step 6: Reflections

**Insert Augmentation choice images here:**

**Should look like this:**

<img src="https://drive.google.com/uc?export=view&id=10Ra8pjegvU0JV2i5py5MLqamiGnoFRd9" alt="Image Description" width="200"/>


**Insert 3 prediction images here**



1.   **Model Performance and Observations:** Reflect on the performance of your YOLOv4 Tiny model. How effective was it in terms of accuracy and speed for object detection? Did you notice any specific challenges or successes in detecting objects?

2.   **Dataset Selection and Challenges:** Discuss the dataset you used for training the model. What were the main characteristics of this dataset, and what challenges did you face while working with it?

3.   **Challenges in Training and Detection:**
What were the most significant challenges you encountered during the training of the model or in the detection phase? How did you address these challenges?

4.   **Real-World Application and Ethical Considerations:**
How could the trained model be applied in real-world scenarios? Are there ethical considerations to be aware of when deploying this model in a practical setting?

5.   **Reflection on Learning and Future Improvements:**
Reflect on your learning experience throughout this assignment. What aspects did you find most enlightening? Based on your experience, what improvements would you suggest for future projects involving YOLOv4 Tiny or similar object detection tasks?

**Put your answers down below**

1.

2.

3.

4.

5.
