# Loading Libraries

This cell imports the libraries and modules used throughout the notebook: os for file-system operations, pandas and NumPy for data manipulation, Matplotlib and seaborn for visualization, scikit-learn for machine learning (classifiers, train/test splitting, label encoding, and accuracy scoring), and joblib for saving and reloading fitted models.

In [4]:
import os
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
# scikit-learn: classifiers, data splitting, label encoding, and evaluation
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from sklearn.metrics import accuracy_score
# joblib: save (dump) and reload (load) fitted models
from joblib import dump, load
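Several of these imports (LabelEncoder, train_test_split, RandomForestClassifier, accuracy_score, dump, load) are not used in this section. As a rough sketch of how they typically fit together, the block below runs them on a small synthetic dataset. The toy columns, the high_carb/low_carb labels, and all parameter values are assumptions for illustration, not the notebook's actual modeling setup.

```python
import os
import tempfile

import numpy as np
import pandas as pd
from joblib import dump, load
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder

# Hypothetical toy data standing in for the real nutrient table.
rng = np.random.default_rng(0)
X = pd.DataFrame({
    "Protein_g": rng.uniform(0, 30, 60),
    "Fat_g": rng.uniform(0, 40, 60),
    "Carb_g": rng.uniform(0, 80, 60),
})
# Hypothetical string labels derived from carbohydrate content.
labels = np.where(X["Carb_g"] > 40, "high_carb", "low_carb")

# LabelEncoder maps the string labels to integers 0..n_classes-1 (classes sorted alphabetically).
encoder = LabelEncoder()
y = encoder.fit_transform(labels)

# Hold out 25% of the rows for evaluation.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)

model = RandomForestClassifier(n_estimators=50, random_state=42)
model.fit(X_train, y_train)
accuracy = accuracy_score(y_test, model.predict(X_test))

# Persist the fitted model to disk with joblib, then reload it and compare predictions.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "model.joblib")
    dump(model, path)
    reloaded = load(path)
    same_predictions = bool((reloaded.predict(X_test) == model.predict(X_test)).all())

print(f"accuracy={accuracy:.2f}, reloaded model agrees: {same_predictions}")
```

The other imported classifiers (SVC, KNeighborsClassifier, GaussianNB) share the same fit/predict interface, so they could be swapped in for RandomForestClassifier in a sketch like this.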

# Loading the Data

This cell loads the dataset from a CSV file into a pandas DataFrame and displays the first five rows to give a first look at the data.

In [None]:
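# Set the working directory to the project folder (machine-specific path; adjust for your system)
os.chdir(r'C:\Users\guilh\OneDrive\Área de Trabalho\FinalProject\FinalProject')
# Load the dataset
file_path = 'done_food_data.csv'
food_data = pd.read_csv(file_path)
# Display the first 5 rows
food_data.head()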
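os.chdir() changes the working directory for the whole session, and the hard-coded path only exists on the author's machine. A minimal alternative sketch, assuming the CSV sits in a directory you pass in explicitly, builds the path with pathlib instead. load_food_data is a hypothetical helper, and the one-row demo file is invented for illustration.

```python
import tempfile
from pathlib import Path

import pandas as pd


def load_food_data(data_dir, file_name: str = "done_food_data.csv") -> pd.DataFrame:
    """Read the CSV from an explicit directory instead of changing the working directory."""
    csv_path = Path(data_dir) / file_name
    if not csv_path.exists():
        raise FileNotFoundError(f"Expected data file at {csv_path}")
    return pd.read_csv(csv_path)


# Demo: write a tiny hypothetical CSV to a temporary folder and load it back.
with tempfile.TemporaryDirectory() as tmp:
    pd.DataFrame({"FoodGroup": ["Fruits"], "Energy_kcal": [52]}).to_csv(
        Path(tmp) / "done_food_data.csv", index=False
    )
    demo = load_food_data(tmp)

print(demo.head())
```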

***

# Missing Values


This code checks the food_data dataset for missing values. The isnull() method returns a Boolean DataFrame that marks each missing entry, and sum() counts those entries per column, so the resulting Series missing_values holds the number of missing values in each column.

The last line, missing_values[missing_values > 0], keeps only the columns with at least one missing value, which shows at a glance which columns need to be handled.

If the dataset has no missing values, this output is an empty Series; otherwise, it lists each affected column with its count of missing values.

In [None]:
#Checking for missing values in the dataset
missing_values = food_data.isnull().sum()
missing_values[missing_values > 0]
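A small extension of the check above, sketched on a hypothetical four-row DataFrame (not the real dataset), reports the percentage of missing rows next to the count. It relies on isnull().mean() giving the fraction of missing entries per column.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature dataset with a few gaps.
toy = pd.DataFrame({
    "FoodGroup": ["Fruits", "Dairy", None, "Grains"],
    "Energy_kcal": [52.0, np.nan, 120.0, 360.0],
    "Protein_g": [0.3, 3.4, 5.0, 10.0],
})

missing_counts = toy.isnull().sum()         # number of missing entries per column
missing_share = toy.isnull().mean() * 100   # percent of rows missing per column

# Combine both measures and keep only the columns that actually have gaps.
report = pd.DataFrame({"missing": missing_counts, "percent": missing_share})
report = report[report["missing"] > 0]
print(report)
```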

***

# Visualization-1

This code creates a box plot of the distribution of energy content (in kcal) across the food groups in the food_data dataset. seaborn's boxplot() function draws the plot, with Energy_kcal on the x-axis and FoodGroup on the y-axis, so each food group gets its own horizontal box.

plt.figure(figsize=(14, 8)) sets the figure size to 14 inches wide by 8 inches tall; figsize is a parameter of figure(), not a separate function. The title(), xlabel(), and ylabel() functions set the title and axis labels.

In the resulting plot, the line inside each box marks the median, the box spans the interquartile range (IQR, from the 25th to the 75th percentile), and the whiskers extend to the most extreme data points within 1.5 times the IQR of the box edges (seaborn's default whis=1.5). Points beyond the whiskers are drawn individually as outliers.

This visualization helps identify which food groups have typically higher or lower energy content (by median), as well as the range and spread of energy content within each group.

In [None]:
#Visualization of the distribution of energy content in kcal across different food groups
plt.figure(figsize=(14, 8))
sns.boxplot(x='Energy_kcal', y='FoodGroup', data=food_data)
plt.title('Distribution of Energy Content (kcal) Across Food Groups')
plt.xlabel('Energy (kcal)')
plt.ylabel('Food Group')
plt.show()
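To read the box plot numerically, the statistics it draws can be computed per group with pandas. The sketch below assumes the standard 1.5 × IQR whisker rule with linearly interpolated quartiles, which should correspond to seaborn's default whis=1.5. box_stats is a hypothetical helper and the toy values are invented.

```python
import pandas as pd


def box_stats(values: pd.Series, whis: float = 1.5) -> pd.Series:
    """Quartiles, whisker ends, and outlier count for one group (hypothetical helper)."""
    q1, median, q3 = values.quantile([0.25, 0.5, 0.75])
    iqr = q3 - q1
    low_fence, high_fence = q1 - whis * iqr, q3 + whis * iqr
    inside = values[(values >= low_fence) & (values <= high_fence)]
    outliers = values[(values < low_fence) | (values > high_fence)]
    return pd.Series({
        "q1": q1, "median": median, "q3": q3, "iqr": iqr,
        # Whiskers stop at the most extreme data points that lie within the fences.
        "whisker_low": inside.min(), "whisker_high": inside.max(),
        "n_outliers": len(outliers),
    })


# Hypothetical toy data with one high outlier per group.
toy = pd.DataFrame({
    "FoodGroup": ["Fruits"] * 5 + ["Fats"] * 5,
    "Energy_kcal": [40, 50, 52, 60, 300, 700, 720, 740, 760, 884],
})

# One row of statistics per food group.
stats = pd.DataFrame(
    {name: box_stats(group) for name, group in toy.groupby("FoodGroup")["Energy_kcal"]}
).T
print(stats)
```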

***

# Visualization-2

This code creates a stacked bar plot of the average macronutrient composition (protein, fat, carbohydrates) of each food group in the food_data dataset.
pandas groupby() computes the mean of Protein_g, Fat_g, and Carb_g for each food group, and plot() with kind='bar' and stacked=True stacks the three averages into a single bar per group, so each bar's total height is the sum of the three means.
The figure size, explicit bar colors, rotated x-axis labels, legend, and tight_layout() improve readability. The result makes it easy to compare the macronutrient distribution across food groups.

In [None]:
#Visualization of the average macronutrient composition (protein, fat, carbohydrates) for each food group
food_groups_macronutrients = food_data.groupby('FoodGroup')[['Protein_g', 'Fat_g', 'Carb_g']].mean()
#Creating a stacked bar plot to display the macronutrient composition
food_groups_macronutrients.plot(kind='bar', stacked=True, figsize=(14, 8), color=['#1f77b4', '#ff7f0e', '#2ca02c'])
plt.title('Average Macronutrient Composition in Different Food Groups')
plt.xlabel('Food Group')
plt.ylabel('Average Macronutrient Content (g)')
plt.xticks(rotation=45, ha='right')
plt.legend(title='Macronutrients')
plt.tight_layout()
plt.show()
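Because stacked=True piles the three averages on top of each other, each bar's height is the row sum of the group means. The sketch below, on hypothetical toy data, computes those totals and each macronutrient's share of its bar. Plotting the shares with the same plot(kind='bar', stacked=True) call would compare composition rather than absolute grams.

```python
import pandas as pd

# Hypothetical toy data: two food groups, two foods each.
toy = pd.DataFrame({
    "FoodGroup": ["Dairy", "Dairy", "Grains", "Grains"],
    "Protein_g": [3.0, 5.0, 10.0, 12.0],
    "Fat_g": [1.0, 3.0, 2.0, 2.0],
    "Carb_g": [5.0, 5.0, 70.0, 74.0],
})

means = toy.groupby("FoodGroup")[["Protein_g", "Fat_g", "Carb_g"]].mean()
# In a stacked bar plot, each bar's total height is the row sum of the stacked means.
bar_heights = means.sum(axis=1)
# Dividing each row by its total gives each macronutrient's share of the bar (rows sum to 1).
shares = means.div(bar_heights, axis=0)
print(means, bar_heights, shares.round(3), sep="\n\n")
```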

***

# Visualization-3

This code creates a grouped bar plot of the average micronutrient content (vitamin A, vitamin C, calcium, iron) for each food group in the food_data dataset.
pandas groupby() computes the mean of VitA_mcg, VitC_mg, Calcium_mg, and Iron_mg for each food group, and plot(kind='bar') draws one bar per nutrient, side by side, for each food group on the x-axis.
The figsize parameter sets the plot size, colormap='viridis' sets the bar colors, and the title, axis labels, legend, rotated x-axis labels, and tight_layout() improve readability and spacing.
The resulting plot shows how the average micronutrient content varies across food groups. Note that the columns use different units (vitamin A in micrograms, the others in milligrams), so bar heights are comparable across food groups for the same nutrient, but not across nutrients.

In [None]:
#Visualization of the average micronutrient content (VitA, VitC, Calcium, Iron) for each food group
micronutrients_columns = ['VitA_mcg', 'VitC_mg', 'Calcium_mg', 'Iron_mg']
food_groups_micronutrients = food_data.groupby('FoodGroup')[micronutrients_columns].mean()
#Creating a bar plot for the average micronutrient content
food_groups_micronutrients.plot(kind='bar', figsize=(14, 8), colormap='viridis')
plt.title('Average Micronutrient Content in Different Food Groups')
plt.xlabel('Food Group')
plt.ylabel('Average Content (mcg for VitA, mg otherwise)')
plt.xticks(rotation=45, ha='right')
plt.legend(title='Micronutrients')
plt.tight_layout()
plt.show()
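Since VitA_mcg is in micrograms and the other columns are in milligrams, one option is to convert everything to a common unit, or to rescale each nutrient, before plotting. The sketch below shows both on hypothetical group means; the toy numbers are invented, and dividing by the column maximum is just one possible normalization.

```python
import pandas as pd

# Hypothetical per-group averages in the dataset's original units.
toy_means = pd.DataFrame(
    {"VitA_mcg": [500.0, 20.0], "VitC_mg": [5.0, 50.0],
     "Calcium_mg": [120.0, 30.0], "Iron_mg": [0.5, 2.0]},
    index=pd.Index(["Dairy", "Fruits"], name="FoodGroup"),
)

# Option 1: express every column in milligrams (1 mg = 1000 mcg).
in_mg = toy_means.rename(columns={"VitA_mcg": "VitA_mg"})
in_mg["VitA_mg"] = toy_means["VitA_mcg"] / 1000

# Option 2: scale each nutrient by its largest group mean, so every column peaks at 1.0.
relative = toy_means / toy_means.max()

print(in_mg, relative, sep="\n\n")
```

Option 2 discards absolute magnitudes, so it only supports comparing food groups within each nutrient; option 1 keeps real amounts, but iron and vitamin A may still be dwarfed by calcium on a shared axis.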

***

# Visualization-4

This code creates a box plot of the distribution of sugar content across food categories in the food_data dataset. seaborn's boxplot() function draws the plot, with Sugar_g on the x-axis and category on the y-axis.
matplotlib's figure() function sets the plot size, and title(), xlabel(), and ylabel() set the title and labels. Each box spans the interquartile range (IQR) with the median as the inner line; the whiskers reach the most extreme points within 1.5 times the IQR of the box, and points beyond them are drawn individually as outliers.
This visualization helps assess the typical (median) sugar content, its range, and its distribution within each category, and shows which categories have higher or lower sugar levels.

In [None]:
#Visualization of the distribution of sugar content across different food categories
plt.figure(figsize=(14, 8))
sns.boxplot(x='Sugar_g', y='category', data=food_data)
plt.title('Distribution of Sugar Content Across Food Categories')
plt.xlabel('Sugar Content (g)')
plt.ylabel('Category')
plt.show()
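A numeric companion to the sugar box plot, sketched on hypothetical toy data, ranks categories by median sugar content. The resulting order list could be passed to seaborn's order parameter to sort the boxes.

```python
import pandas as pd

# Hypothetical toy data using the same column names as the plot above.
toy = pd.DataFrame({
    "category": ["Snacks", "Snacks", "Snacks", "Vegetables", "Vegetables", "Beverages", "Beverages"],
    "Sugar_g": [30.0, 45.0, 50.0, 2.0, 4.0, 10.0, 12.0],
})

# Median, mean, and row count per category, highest median first.
sugar_summary = (
    toy.groupby("category")["Sugar_g"]
    .agg(["median", "mean", "count"])
    .sort_values("median", ascending=False)
)
order = sugar_summary.index.tolist()
print(sugar_summary)
# e.g. sns.boxplot(x='Sugar_g', y='category', data=food_data, order=order)
```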

***