# Multiclass Food Image Classification with Convolutional Neural Networks

### Technology & Skills
**Technology:** Python, Jupyter Notebook, GitHub, Git

**Python Libraries:** pandas, requests, time, urllib, sys, os, numpy, tensorflow, keras, scikit-learn, matplotlib

**Technical Skills:** Multiclass classification, data collection, data cleaning, scraping web API, image processing, pre-processing, modeling, data visualization, exploratory data analysis (EDA), computer vision, deep learning, convolutional neural networks, neural network architecture, image data augmentation, accuracy, precision, recall, confusion matrix, pre-trained model, transfer learning, bias-variance tradeoff

**Models:** Convolutional neural network, ResNet50, EfficientNet-B0

### Overview
The code notebooks contain the following 4 sections:
1. Introduction
2. Data Collection & Cleaning
3. Pre-Processing, Modeling, Data Visualization & EDA
4. Conclusion

### Problem Statement
In this project, I intend to build a convolutional neural network (CNN) to classify food images.

My objectives are to:
- Collect and clean at least 200 images per food class for 5 classes using the pushshift.io Reddit API, and
- Build convolutional neural networks from scratch and apply pre-trained, state-of-the-art models to predict multiclass food classes with accuracy scores higher than that of the null model.
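
The null model in the second objective is the baseline that always predicts the most frequent class. The snippet below is a minimal sketch of how that baseline accuracy could be computed. The helper name `null_model_accuracy` and the perfectly balanced 200-images-per-class label list are illustrative assumptions, not the project's actual data.

```python
from collections import Counter


def null_model_accuracy(labels):
    """Return the majority class and the accuracy of always predicting it."""
    if not labels:
        raise ValueError("labels must not be empty")
    counts = Counter(labels)
    majority_class, majority_count = counts.most_common(1)[0]
    return majority_class, majority_count / len(labels)


# Hypothetical, perfectly balanced dataset: 200 images for each of 5 classes.
classes = ["burger", "hotdog", "pizza", "taco", "sushi"]
labels = [name for name in classes for _ in range(200)]

majority_class, baseline = null_model_accuracy(labels)
print(f"Null model accuracy: {baseline:.2f}")  # 0.20 for 5 balanced classes
```

If the collected classes end up imbalanced, the baseline rises to the share of the largest class, so it is worth recomputing on the cleaned data rather than assuming 20%.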

For this multiclass classification problem, I aim to predict the following 5 classes:
1. Burger
2. Hotdog
3. Pizza
4. Taco
5. Sushi

I will be using the following subreddits to collect image data:
- r/burgers (103k members)
- r/hotdogs (19.5k members)
- r/Pizza (309k members)
- r/tacos (47.7k members)
- r/sushi (226k members)

The general workflow is as follows: image data --> multiclass outcome.

### Data Sources
Given more resources, the optimal way to collect image data would have been through a web API for Google Images. Unfortunately, this would have required a bigger budget and careful navigation of the `/robots.txt` rules for Google Images, which are very particular about what is and is not allowed. For practical reasons, I chose to collect the image data using the pushshift.io Reddit API instead. Links to the data are provided below:
- [r/burgers](https://api.pushshift.io/reddit/search/submission?subreddit=burgers): The pushshift.io Reddit API for r/burgers.
- [r/hotdogs](https://api.pushshift.io/reddit/search/submission?subreddit=hotdogs): The pushshift.io Reddit API for r/hotdogs.
- [r/Pizza](https://api.pushshift.io/reddit/search/submission?subreddit=Pizza): The pushshift.io Reddit API for r/Pizza.
- [r/tacos](https://api.pushshift.io/reddit/search/submission?subreddit=tacos): The pushshift.io Reddit API for r/tacos.
- [r/sushi](https://api.pushshift.io/reddit/search/submission?subreddit=sushi): The pushshift.io Reddit API for r/sushi.
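
Each link above is the same endpoint with a different `subreddit` query parameter. As a rough sketch, the URLs could be built programmatically as shown below. The helper `build_query_url` is hypothetical, and the optional `size` and `before` parameters are assumptions based on the API's historical query options, so they should be checked against the current pushshift.io documentation before use.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.pushshift.io/reddit/search/submission"


def build_query_url(subreddit, size=None, before=None):
    """Build a submission-search URL for one subreddit.

    `size` and `before` are assumed optional query parameters;
    they are only added when a value is supplied.
    """
    params = {"subreddit": subreddit}
    if size is not None:
        params["size"] = size
    if before is not None:
        params["before"] = before
    return f"{BASE_URL}?{urlencode(params)}"


subreddits = ["burgers", "hotdogs", "Pizza", "tacos", "sushi"]
urls = [build_query_url(name) for name in subreddits]
for url in urls:
    print(url)
```

The resulting URL can then be passed to `requests.get`, ideally with a pause between calls (for example via `time.sleep`) to avoid overloading the API.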

### Data Dictionary

|Feature|Type|Dataset|Description|
|---|---|---|---|
|subreddit|object|df|The name of the subreddit|
|title|object|df|The title text of the post|
|url|object|df|The url for the image|
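
The three columns above can be derived from an API response as sketched below. This is an illustration rather than the notebook's actual cleaning code: it assumes the response JSON holds submissions under a `data` key, and the helper `submissions_to_rows`, the extension list, and the sample payload are all hypothetical.

```python
# Assumed set of file extensions that indicate a direct image link.
IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png")


def submissions_to_rows(payload):
    """Keep one row per unique image URL with the data dictionary's columns."""
    rows = []
    seen_urls = set()
    for post in payload.get("data", []):
        url = post.get("url") or ""
        if not url.lower().endswith(IMAGE_EXTENSIONS):
            continue  # skip text posts and links that are not direct images
        if url in seen_urls:
            continue  # skip reposts of the same image URL
        seen_urls.add(url)
        rows.append(
            {
                "subreddit": post.get("subreddit"),
                "title": post.get("title"),
                "url": url,
            }
        )
    return rows


# Hypothetical payload: one image post, one duplicate, one non-image link.
sample_payload = {
    "data": [
        {"subreddit": "Pizza", "title": "Homemade margherita", "url": "https://example.com/pizza.jpg"},
        {"subreddit": "Pizza", "title": "Repost", "url": "https://example.com/pizza.jpg"},
        {"subreddit": "Pizza", "title": "Dough question", "url": "https://example.com/comments/abc"},
    ]
}
rows = submissions_to_rows(sample_payload)
print(rows)
```

The resulting list of dictionaries can be passed directly to `pandas.DataFrame(rows)` to obtain a `df` with the `subreddit`, `title`, and `url` columns.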