# Obesity and health issues: how the food we eat influences health conditions in society
### Erick Alcalá (A01025213@itesm.mx), Emiliano Cabrera (A01025453@itesm.mx), Andrew Dunkerley (e-mail), Do Hyun Nam (A01025276@itesm.mx)
Professor: Saúl Juárez Ordóñez
Tecnologico de Monterrey, School of Engineering and Sciences, Sciences Department
Av. Carlos Lazo 100, Santa Fe, 01389, Mexico City, Mexico

## Abstract
[Abstract to be written]

Keywords: `Linear Regression` `Health` `Nutrition` `Calories`



## 1. Introduction
[Introduction to be written]


## 2. Theoretical Framework

Mexico is facing several epidemics at once: COVID-19, which arrived this year, and obesity and overweight, which have affected the country for many years. According to the ENSANUT, 72.7% of adult women are overweight or obese, compared with 69.4% of adult men. The COVID-19 pandemic has put this situation back in the spotlight: according to the General Directorate of Epidemiology, of the 105,459 COVID-19 deaths, 33.77% are related to obesity, and 45.54% and 38.57% are related to diabetes and hypertension, conditions that are direct consequences of obesity and overweight. The government is therefore searching, now more than ever, for ways to tackle this problem.


### 2.1. Public policies
Public policies are being analyzed and voted on at the federal and state levels of government. Two of them stand out. The first is the new nutritional labeling of pre-packaged products, which adds octagonal stickers with warnings such as “High in sodium” and “High in sugar”. Although this change was approved by the Chamber of Deputies last year, it only came into force on October 1 of this year. The policy drew many critics, who argued that the government was once again ignoring the main reason obesity and overweight are so widespread in Mexico: poverty (Aguirre, 2019).


The second public policy that stands out was approved in the southern state of Oaxaca. The new law prohibits the distribution, sale, supply, or gift of junk food and sugary drinks to minors. The law was debated for a year, but the current pandemic and its effects on public health provided the decisive argument for approving it. It was praised by the UN and UNICEF but criticized by business organizations, which alleged that it would be a huge blow to the state's economy (Galván, 2020). The government is hurting small and large businesses at a moment when they need all the help they can get, especially in a state where poverty is rampant and small businesses are family-owned and at greater risk of closing. Is it acceptable to affect hundreds of families and businesses in an attempt to tackle obesity and overweight in children? The answer depends on each person's ethics and morals.

### 2.2. Technological tools
Technology is an important resource that the government is neglecting. We live in an era of rapid technological advancement, and the tools for writing software that addresses the obesity and overweight problem are within reach: programs that give consumers nutritional information about what they eat and predict its caloric load, helping them make better choices. That is precisely what the code in this project does.

## 3. Analysis and results

Using Python's NumPy, pandas, and scikit-learn modules, we analyzed a database of nutritional facts for various food items. The Matplotlib and Seaborn modules were used to visualize the data. Then, using a linear regression model, we checked whether the model was designed correctly and identified biases and flaws from the testing.

In [None]:
# Importing the necessary modules for the analysis
import sys
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import pandas_profiling  # Only needed for the optional nut.profile_report() call
from sympy import latex

sns.set()  # Apply seaborn's default plot styling

In [None]:
# Creating dataframe from the data parsed from the database in a .csv format
nut = pd.read_csv('nutrition.csv', index_col=1) # Dataframe named nut as an abbreviation of nutrition
print(f'Dataframe of size {nut.shape[0]} and {nut.shape[1]}')
#nut.profile_report() # Returns a summary of the dataframe's characteristics, it appears commented due to it being slow

# Using the dropna method, we remove rows with undefined values
newnut = nut.dropna()
# Filter the dataframe, keeping only the columns of interest for this research
# (.copy() avoids pandas' SettingWithCopyWarning in the assignments below)
newnut2 = newnut[['calories','total_fat','carbohydrate','sodium','protein']].copy()

# Converting all data to a unified unit, being the gram (g)
newnut2['sodium'] = newnut2['sodium'].str.rstrip('mg').astype('float')
newnut2['total_fat'] = newnut2['total_fat'].str.rstrip('g').astype('float')
newnut2['carbohydrate'] = newnut2['carbohydrate'].str.rstrip('g').astype('float')
newnut2['protein'] = newnut2['protein'].str.rstrip('g').astype('float')
newnut2['sodium'] = newnut2['sodium']/1000
# Newnut2 corresponds to the filtered and cleaned nut dataframe

# Creating test samples of four different kinds of food
nut_nuts = newnut2.loc[['Nuts, pecans']]
nut_ramen = newnut2.loc[['Soup, dry, beef flavor, ramen noodle']]
nut_teff = newnut2.loc[['Teff, uncooked']]
nut_sherbet = newnut2.loc[['Sherbet, orange']]
newnut3 = newnut2.loc[['Nuts, pecans','Soup, dry, beef flavor, ramen noodle','Teff, uncooked','Sherbet, orange']]
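
The unit conversion above relies on `str.rstrip`, which removes a set of trailing characters rather than a literal suffix. As a minimal sketch of a stricter alternative, the snippet below parses the number and the unit separately. It assumes the raw columns hold strings such as `'9.00 mg'` or `'72g'`; the sample values and the helper name `to_grams` are hypothetical and not taken from `nutrition.csv`.

```python
import pandas as pd

# Hypothetical sample mimicking the string format assumed for nutrition.csv
raw = pd.DataFrame({
    "sodium": ["9.00 mg", "1727.00 mg"],
    "total_fat": ["72g", "14g"],
})

def to_grams(column: pd.Series) -> pd.Series:
    """Split '<number><unit>' strings and return the values in grams."""
    parts = column.str.extract(r"^\s*([0-9.]+)\s*(mg|g)\s*$")
    values = parts[0].astype(float)
    # Gram entries are kept as they are; milligram entries are divided by 1000
    return values.where(parts[1] == "g", values / 1000)

clean = raw.apply(to_grams)
print(clean)
```

Entries that do not match the expected pattern come out as `NaN` instead of raising an error, so they can be inspected or dropped afterwards.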


In [None]:
#%% Plotting the histogram relevant to each factor of analysis
fig, axes = plt.subplots(3, 2, figsize=(20, 20), sharex=False)

fig.suptitle('Histograms of Calories, Fat, Carbohydrates, Sodium and Protein', y=0.9, fontsize=20)

sns.histplot(newnut2["calories"], color="skyblue", stat='count', bins=100, ax=axes[0, 0])
sns.histplot(newnut2["total_fat"], color="olive", stat='count', bins=100, ax=axes[0, 1])
sns.histplot(newnut2["carbohydrate"], color="gold", stat='count', bins=100, ax=axes[1, 0])
sns.histplot(newnut2["sodium"], color="teal", stat='count', bins=100,ax=axes[1, 1])
sns.histplot(newnut2["protein"], color="blue", stat='count', bins=100,ax=axes[2, 0])
fig.delaxes(axes[2,1])

In [None]:
# Plotting a boxplot that helps with the visualization of central tendencies
f, axes = plt.subplots(nrows=3, ncols=2, figsize=(26,18))

f.suptitle('Boxplots of Fat, Sodium, Protein, Carbohydrates and Calories', y=0.9, fontsize=20)
sns.boxplot(x=newnut2['calories'], color="skyblue", ax=axes[0, 0])
sns.boxplot(x=newnut2['total_fat'], color="olive", ax=axes[0, 1])
sns.boxplot(x=newnut2['carbohydrate'], color="gold", ax=axes[1, 0])
sns.boxplot(x=newnut2['sodium'], color="teal", ax=axes[1, 1])
sns.boxplot(x=newnut2['protein'], color="blue", ax=axes[2, 0])

f.delaxes(axes[2,1])

In [None]:

# Filters outliers
Q1 = newnut2.quantile(0.25)
Q3 = newnut2.quantile(0.75)
IQR = Q3 - Q1

# finalnut stores the dataframe with the outliers removed:
# a row is dropped if any column lies outside [Q1 - 2.5*IQR, Q3 + 2.5*IQR]
finalnut = newnut2[~((newnut2 < (Q1 - 2.5 * IQR)) | (newnut2 > (Q3 + 2.5 * IQR))).any(axis=1)]

# Calculates the number of data points removed
len_after = len(finalnut)
len_before = len(newnut2)
len_difference = len_before - len_after
print('We reduced our data size from {} foods by {} foods to {} foods.'.format(len_before, len_difference, len_after))
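
To make the interquartile-range filter easier to follow, here is a minimal sketch on a toy dataframe. The values are hypothetical and chosen so that one calorie entry is clearly extreme; the multiplier `k = 2.5` mirrors the one used in the cell above.

```python
import pandas as pd

# Hypothetical toy data: one obviously extreme calorie value
toy = pd.DataFrame({
    "calories": [50, 60, 70, 80, 90, 900],
    "protein": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
})

k = 2.5  # same multiplier as in the analysis
q1 = toy.quantile(0.25)
q3 = toy.quantile(0.75)
iqr = q3 - q1

# A row is an outlier if any of its columns falls outside [Q1 - k*IQR, Q3 + k*IQR]
is_outlier = ((toy < q1 - k * iqr) | (toy > q3 + k * iqr)).any(axis=1)
filtered = toy[~is_outlier]
print(filtered)
```

Both comparisons are made against the dataframe itself, so the lower and upper fences are applied column by column.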


In [None]:
#%% Linear fit
newnut3 = newnut2

from sklearn.linear_model import LinearRegression

X = newnut3[['total_fat', 'carbohydrate', 'sodium', 'protein']] 
Y = newnut3['calories']

# Using the scikit-learn module, we obtain a linear regression model
calories_prediction = LinearRegression()
calories_prediction.fit(X, Y)

print('Intercept: \n', calories_prediction.intercept_)
print('Coefficients: \n', calories_prediction.coef_)
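
One way to check whether a model of this kind behaves sensibly is to compare its coefficients with the general Atwater factors (about 9 kcal per gram of fat and 4 kcal per gram of carbohydrate or protein) and to score it on data it was not fitted on. The sketch below illustrates the idea on synthetic data only: the feature ranges, noise level, and 80/20 split are assumptions for illustration, not results from `nutrition.csv`.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 500
# Synthetic macronutrient values in grams (hypothetical, not taken from nutrition.csv)
X = pd.DataFrame({
    "total_fat": rng.uniform(0, 40, n),
    "carbohydrate": rng.uniform(0, 60, n),
    "sodium": rng.uniform(0, 2, n),
    "protein": rng.uniform(0, 30, n),
})
# Calories generated from the general Atwater factors plus a small amount of noise
y = 9 * X["total_fat"] + 4 * X["carbohydrate"] + 4 * X["protein"] + rng.normal(0, 5, n)

# Hold out 20% of the rows so the score is not computed on the training data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
model = LinearRegression().fit(X_train, y_train)

coefs = dict(zip(X.columns, model.coef_))
r2_test = model.score(X_test, y_test)
print(coefs)
print("Held-out R^2:", r2_test)
```

On real data, coefficients far from these reference values, or a held-out score much lower than the training score, would be a hint that the cleaning or the choice of features deserves another look.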

### Testing with four different foods eaten in our daily life

In [None]:
fat = 3.9
sod = 0.42
prot = 4.6
carbo = 24
# Features must follow the training column order: total_fat, carbohydrate, sodium, protein
print('Number of calories: \n', calories_prediction.predict([[fat, carbo, sod, prot]]))
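
Because `predict` pairs a plain list of values with the coefficients by position, the order of the inputs matters. A minimal sketch of a safer pattern is to build the sample as a dataframe and select the columns in the training order. The tiny training set below is hypothetical and exists only to provide a fitted model; the sample values are the ones used in the cell above.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

feature_cols = ["total_fat", "carbohydrate", "sodium", "protein"]

# Tiny hypothetical training set, only to obtain a fitted model for the demonstration
train = pd.DataFrame({
    "total_fat":    [0.0, 10.0, 0.0, 0.0, 5.0, 2.0],
    "carbohydrate": [10.0, 0.0, 0.0, 20.0, 5.0, 8.0],
    "sodium":       [0.1, 0.2, 0.0, 0.5, 0.3, 0.4],
    "protein":      [0.0, 0.0, 10.0, 5.0, 5.0, 1.0],
})
train["calories"] = 9 * train["total_fat"] + 4 * train["carbohydrate"] + 4 * train["protein"]

model = LinearRegression().fit(train[feature_cols], train["calories"])

# The sample is written in arbitrary key order, then reordered to match the training columns
sample = pd.DataFrame([{"protein": 4.6, "sodium": 0.42, "total_fat": 3.9, "carbohydrate": 24.0}])
predicted = model.predict(sample[feature_cols])[0]
print(round(predicted, 1))
```

Selecting `sample[feature_cols]` keeps each value attached to its column name, so swapping two nutrients by accident is much harder than with a positional list.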