# Introduction

The hypothesis tests below verify some of the observations and comments about the diets that were made in the [Exploratory Data Analysis](./ExploratoryDataAnalysis.ipynb). They address the main question of this work: whether the nutritional contributions of the different diets differ.

# 0. Setting Dataset, Functions and Other Code

## 0.1 Import Libraries

In [122]:
# Importing required libraries

import pandas as pd
import numpy as np

from scipy import stats
import statsmodels.stats.weightstats as sm

import matplotlib.pyplot as plt
import seaborn as sns

In [141]:
# Importing functions for Hypothesis Testing

from FunctionsHypothesisTesting import *

## 0.2 Loading of Dataset

In [2]:
# Loading cleaned dataset

Diets_Dataset = pd.read_csv('../Datasets/Diets_Dataset_Cleaned.csv')

## 0.3 Setting Universal Variables

In [33]:
# Naming main variables

Macronutrients = ['Carbs(g)','Protein(g)','Fat(g)']
TotalMacronutrients = 'Total_Macronutrients'
Diets = ['dash', 'keto', 'mediterranean', 'paleo', 'vegan']

In [34]:
# Setting random state

RANDOM_STATE = 8013

# 1. [DASH] Is there a balance in daily macronutrient intake?

To determine whether daily macronutrient intake is balanced, it is sufficient to look at the nutritional intake from five recipes, equivalent to five meals a day.

To do so, eating five meals is simulated by randomly sampling five recipes and summing their contributions to the total of each macronutrient. Repeating this sampling 200 times yields a sample equivalent to following the diet for 200 days using the recipes in the dataset.

Testing for balance in daily macronutrient intake means testing whether the observed intake differs significantly from a balanced intake, that is, from consuming the three macronutrients in equal proportions.

The following set of hypotheses is used to test the balance:

$$H_0 : \mu_{\text{M}} = 1/3$$
$$H_1 : \mu_{\text{M}} \ne 1/3$$

Where $\mu_{\text{M}}$ is the mean daily proportion of a given macronutrient, estimated by the sample average $\overline{\text{M}}$. Because the sample size is $n = 200$, it is large enough for the Central Limit Theorem (CLT) to apply, so the Z Test gives reliable results for rejecting or failing to reject the null hypothesis $H_0$. As an additional check, the normality of the data is tested using the two-tailed Kolmogorov-Smirnov Test at a significance level of $5\%$. The Z Test is likewise two-tailed with a significance level of $5\%$.
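For reference, the two-tailed Z statistic implied by these hypotheses can be written as below. Here $s_{\text{M}}$, the sample standard deviation of the daily proportions, is a symbol introduced only for this illustration, and $z_{0.975} \approx 1.96$ is the critical value of the standard normal distribution for a $5\%$ two-tailed test.

$$Z = \frac{\overline{\text{M}} - 1/3}{s_{\text{M}} / \sqrt{n}}, \qquad \text{reject } H_0 \text{ if } |Z| > z_{0.975}$$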

Since the mean daily intake of each macronutrient differs significantly from the balanced proportion of $1/3$ at the $5\%$ significance level, the DASH diet cannot be classified as balanced with respect to macronutrient intake. This is consistent with how the diet is defined: it favors certain food groups in the interest of cardiovascular health, and that preference shapes its macronutrient profile.

## 1.1 Simulation of 200 Days following the DASH Diet

In [None]:
# Getting sample of recipes

sample_size_dash = 1000 # 5 recipes times 200 days

sample_recipes_dash = Diets_Dataset.query("Diet_type == 'dash'").sample(n=sample_size_dash,random_state=RANDOM_STATE)
sample_recipes_dash.reset_index(inplace=True,drop=True)

sample_recipes_dash

| | Diet_type | Recipe_name | Cuisine_type | Protein(g) | Carbs(g) | Fat(g) | Total_Macronutrients |
|---|---|---|---|---|---|---|---|
| 0 | dash | Apple and cabbage salad with apple molasses dr... | middle eastern | 0.111425 | 0.465508 | 0.423067 | 467.94 |
| 1 | dash | Yogurt Pops | american | 0.177773 | 0.667385 | 0.154841 | 51.02 |
| 2 | dash | Caesar Bread Salad Or Panzanella Giulio | american | 0.169658 | 0.520477 | 0.309865 | 583.35 |
| 3 | dash | Cashew Chicken Bake | south american | 0.342140 | 0.397390 | 0.260470 | 436.02 |
| 4 | dash | Crab and Avocado Mimosa recipes | british | 0.273198 | 0.230713 | 0.496089 | 204.54 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 995 | dash | Olive Crema | mediterranean | 0.068931 | 0.455211 | 0.475857 | 30.03 |
| 996 | dash | Bourbon Old Fashioned Recipe | british | 0.000000 | 1.000000 | 0.000000 | 10.42 |
| 997 | dash | Apple Cinnamon Turnovers | middle eastern | 0.067926 | 0.756360 | 0.175714 | 205.96 |
| 998 | dash | Nutted Chicken-Rice Noodle Casserole | asian | 0.225585 | 0.429676 | 0.344738 | 549.46 |


In [42]:
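# Getting daily intake macronutrients

# The macronutrient columns hold proportions of each recipe's total,
# so multiplying by the total recovers the amount of each macronutrient
total_macronutrients_sample_dash = sample_recipes_dash[TotalMacronutrients].to_numpy()[:,None]
macronutrients_sample_recipes_dash = sample_recipes_dash[Macronutrients]*total_macronutrients_sample_dash

# Every block of five consecutive recipes is summed as one simulated day
daily_macronutrients_dash = macronutrients_sample_recipes_dash.groupby(macronutrients_sample_recipes_dash.index//5).sum()

daily_macronutrients_dash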

| | Carbs(g) | Protein(g) | Fat(g) |
|---|---|---|---|
| 0 | 775.96 | 365.24 | 601.67 |
| 1 | 560.89 | 260.33 | 392.56 |
| 2 | 1023.40 | 213.60 | 225.97 |
| 3 | 1261.33 | 576.28 | 1137.50 |
| 4 | 460.30 | 325.97 | 238.28 |
| ... | ... | ... | ... |
| 195 | 892.45 | 301.45 | 461.31 |
| 196 | 778.50 | 346.90 | 264.71 |
| 197 | 528.89 | 307.86 | 274.54 |
| 198 | 356.44 | 325.35 | 257.69 |
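The grouping by `index // 5` used above may be easier to follow on a small case. The sketch below uses a made-up frame of ten rows (the name `toy` and its values are illustrative, not taken from the dataset) to show how integer division of a default index turns every five consecutive recipes into one simulated day.

```python
import pandas as pd

# Toy data: 10 "recipes" with made-up gram values (not from the dataset)
toy = pd.DataFrame({
    "Carbs(g)": [10, 20, 30, 40, 50, 1, 2, 3, 4, 5],
    "Fat(g)":   [1, 1, 1, 1, 1, 2, 2, 2, 2, 2],
})

# With a default 0..n-1 index, index // 5 labels rows 0-4 as day 0 and rows 5-9 as day 1
day_labels = toy.index // 5

# Summing within each label gives one row of totals per simulated day
daily = toy.groupby(day_labels).sum()

print(daily)
```

This only works as intended when the index runs from 0 upward without gaps, which is why the sampled recipes are re-indexed with `reset_index(drop=True)` before grouping.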


In [46]:
# Getting proportional macronutrients in each day

total_daily_macronutrients_dash = daily_macronutrients_dash.sum(axis=1).to_numpy()[:,None]

daily_macronutrients_dash /= total_daily_macronutrients_dash
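As a sanity check on this normalization, each day's three proportions should add up to 1. The following sketch shows an equivalent way to write the division with `DataFrame.div`, using made-up daily totals (the names `daily_grams` and `daily_share` are illustrative and do not appear in the notebook).

```python
import numpy as np
import pandas as pd

# Made-up daily totals in grams (illustrative values only)
daily_grams = pd.DataFrame({
    "Carbs(g)":   [600.0, 300.0],
    "Protein(g)": [200.0, 300.0],
    "Fat(g)":     [200.0, 400.0],
})

# Divide each row by its own total; axis=0 aligns the row sums with the row index
daily_share = daily_grams.div(daily_grams.sum(axis=1), axis=0)

# Each day's shares should add up to 1
row_sums = daily_share.sum(axis=1)
print(daily_share)
print(np.allclose(row_sums, 1.0))
```

Unlike the in-place `/=`, this form leaves the gram totals untouched, which can be convenient if they are needed again later.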

## 1.2 Applying the Hypothesis Test

In [None]:
# Testing significance difference

pvalues_z_test = DashBalanceMacronutrients(daily_macronutrients_dash,Macronutrients)

Carbs does follow a normal distribution
Carbs is not balanced

Protein does follow a normal distribution
Protein is not balanced

Fat does follow a normal distribution
Fat is not balanced
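The body of `DashBalanceMacronutrients` lives in `FunctionsHypothesisTesting` and is not shown in this notebook. The sketch below is only a guess at what such a helper might do, based on the procedure described in Section 1: a two-sided Kolmogorov-Smirnov check for normality followed by a two-tailed one-sample Z test against $1/3$. The function name `balance_test_sketch`, its parameters, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd
from scipy import stats

def balance_test_sketch(daily_share, columns, target=1/3, alpha=0.05):
    """Hypothetical stand-in for DashBalanceMacronutrients (the real one is not shown)."""
    pvalues = {}
    for col in columns:
        x = daily_share[col].to_numpy()
        n = len(x)
        mean, std = x.mean(), x.std(ddof=1)
        name = col.split("(")[0]  # "Carbs(g)" -> "Carbs"

        # Two-sided KS test against a normal with the sample's own mean and std
        ks_pvalue = stats.kstest(x, "norm", args=(mean, std)).pvalue
        is_normal = ks_pvalue >= alpha

        # Two-tailed one-sample Z test of H0: mean == target
        z = (mean - target) / (std / np.sqrt(n))
        z_pvalue = 2 * stats.norm.sf(abs(z))

        print(f"{name} {'does' if is_normal else 'does not'} follow a normal distribution")
        print(f"{name} {'is not' if z_pvalue < alpha else 'is'} balanced")
        pvalues[col] = z_pvalue
    return pvalues

# Synthetic proportions, for illustration only (not the recipe data)
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "Carbs(g)": rng.normal(0.50, 0.08, size=200),
    "Fat(g)": rng.normal(1/3, 0.08, size=200),
})
demo_pvalues = balance_test_sketch(demo, ["Carbs(g)", "Fat(g)"])
```

Two caveats apply to this sketch. Estimating the normal's mean and standard deviation from the same sample makes the Kolmogorov-Smirnov p-value conservative, so it is a rough check only. The Z statistic is computed by hand here to keep the example dependent on SciPy alone; `ztest` from `statsmodels.stats.weightstats`, which the notebook already imports, should be usable for the same purpose with `value=1/3`.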



# 2. [Keto] Do the recipes have the same macronutrient composition?

# 3. [Mediterranean] Do recipes from other regions differ from those in the Mediterranean?

# 4. [Paleo] Are the nutritional contributions of the recipes different for each region?

# 5. [Vegan] Are the macronutrient contributions the same among the different recipes?

# 6. Is there a significant difference in the nutritional contributions of the different diets?