# Introduction

The following hypothesis tests verify some of the observations and comments about the diets made in the [Exploratory Data Analysis](./ExploratoryDataAnalysis.ipynb). They also address the main question of this work: whether the nutritional contributions of the different diets differ. The tests use the data set obtained after transforming the macronutrients in [EDA: Transformation of Macronutrient Values](./ExploratoryDataAnalysis.ipynb).

# 0. Setting Dataset, Functions and Other Code

## 0.1 Import Libraries

In [1]:
# Importing required libraries

import pandas as pd
import numpy as np

from scipy import stats
import statsmodels.stats.weightstats as sm

import matplotlib.pyplot as plt
import seaborn as sns

In [2]:
# Importing functions for Hypothesis Testing

from FunctionsHypothesisTesting import *

## 0.2 Loading of Dataset

In [3]:
# Loading cleaned dataset

Diets_Dataset = pd.read_csv('../Datasets/Diets_Dataset_Processed.csv')

## 0.3 Setting Universal Variables

In [4]:
# Naming main variables

Macronutrients = ['Carbs(g)','Protein(g)','Fat(g)']
TotalMacronutrients = 'Total_Macronutrients'
Diets = ['dash', 'keto', 'mediterranean', 'paleo', 'vegan']

In [5]:
# Setting random state

RANDOM_STATE = 8013

# 1. [DASH] Is there a balance in daily macronutrient intake?

To determine whether daily macronutrient intake is balanced, it is sufficient to examine the nutritional intake from consuming five recipes, equivalent to five meals a day.

To do this, eating five meals is simulated by randomly sampling five recipes and summing their contributions to the total of each macronutrient. This sampling is repeated 200 times, which yields a sample equivalent to following the diet for 200 days using the recipes in the data set.

Testing for balance means testing whether the observed intake differs significantly from a balanced intake, that is, from consuming the three macronutrients in equal proportions.

The following set of hypotheses is used to test the balance of macronutrients:

$$H_0 : \overline{\text{M}} = 1/3$$
$$H_1 : \overline{\text{M}} \ne 1/3$$

Where $\overline{\text{M}}$ refers to the sample mean of the proportion contributed by a given macronutrient. Because the sample size is $n = 200$, the Central Limit Theorem (CLT) applies, so the Z Test gives reliable results for rejecting or failing to reject the null hypothesis $H_0$. As an additional check, the normality of the data is tested with a two-tailed Kolmogorov-Smirnov Test at a significance level of $5\%$. The Z Test is also two-tailed with a significance level of $5\%$.
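The procedure described above can be sketched as follows. This is only an illustration of the two tests, not the implementation behind `DashBalanceMacronutrients`, which lives in `FunctionsHypothesisTesting` and is not shown here. The function name `balance_test`, its return keys, and the synthetic data are assumptions made for the example.

```python
import numpy as np
from scipy import stats

def balance_test(proportions, target=1/3, alpha=0.05):
    """Two-sided KS normality check and one-sample Z test against `target`."""
    x = np.asarray(proportions, dtype=float)
    n = x.size
    mean, std = x.mean(), x.std(ddof=1)

    # Kolmogorov-Smirnov test against a normal with the sample's own mean and std.
    # Estimating the parameters from the same sample makes this check approximate.
    ks_pvalue = stats.kstest(x, 'norm', args=(mean, std)).pvalue

    # One-sample Z statistic and its two-sided p-value.
    z = (mean - target) / (std / np.sqrt(n))
    z_pvalue = 2 * stats.norm.sf(abs(z))

    return {
        'normal': ks_pvalue >= alpha,    # True when normality is not rejected
        'balanced': z_pvalue >= alpha,   # True when H0 (mean = target) is not rejected
        'z': z,
        'z_pvalue': z_pvalue,
    }

# Synthetic daily proportions for illustration only (not the DASH data).
rng = np.random.default_rng(8013)
result = balance_test(rng.normal(loc=0.5, scale=0.1, size=200))
print(result['balanced'], result['z_pvalue'])
```

The Z statistic is computed by hand here to keep the sketch dependent only on NumPy and SciPy; `statsmodels.stats.weightstats.ztest` provides a ready-made one-sample Z test with a `value` argument for the hypothesized mean.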

Since the mean daily intake of each macronutrient differs significantly from the balanced value of $1/3$ at the $95\%$ confidence level, the DASH diet cannot be classified as balanced with respect to macronutrient intake. By definition, the diet favors certain food groups in the interest of cardiovascular health, and this preference shifts its macronutrient intake away from equal proportions.

## 1.1 Simulation of 200 Days following the DASH Diet

In [6]:
# Getting sample of recipes

sample_size_dash = 1000 # 5 recipes times 200 days

sample_recipes_dash = Diets_Dataset.query("Diet_type == 'dash'").sample(n=sample_size_dash,random_state=RANDOM_STATE)
sample_recipes_dash.reset_index(inplace=True,drop=True)

sample_recipes_dash

Unnamed: 0,Diet_type,Recipe_name,Cuisine_type,Protein(g),Carbs(g),Fat(g),Total_Macronutrients
0,dash,Old Fashioned,world,0.012245,0.985714,0.002041,9.80
1,dash,Southwestern Breakfast Tostadas,american,0.295270,0.470753,0.233977,100.01
2,dash,Best-Ever Guacamole,mexican,0.082764,0.397757,0.519479,171.21
3,dash,La Paloma,world,0.059126,0.922879,0.017995,3.89
4,dash,Melty Chocolate-Stuffed French Toast with Lemo...,french,0.168321,0.673972,0.157707,626.48
...,...,...,...,...,...,...,...
995,dash,East India Cocktail,american,0.007240,0.990950,0.001810,11.05
996,dash,Kale and Feta Burrito,mediterranean,0.198113,0.682939,0.118949,270.20
997,dash,Cranberry-Whiskey Sour Slush,world,0.007384,0.989938,0.002678,261.38
998,dash,Tiny Meat Cakes (Bolos de Carne),mexican,0.538876,0.143764,0.317360,187.39


In [7]:
# Getting daily intake macronutrients

total_macronutrients_sample_dash = sample_recipes_dash[TotalMacronutrients].to_numpy()[:,None]
macronutriens_sample_recipes_dash = sample_recipes_dash[Macronutrients]*total_macronutrients_sample_dash
daily_macronutrients_dash = macronutriens_sample_recipes_dash.groupby(macronutriens_sample_recipes_dash.index//5).sum()

daily_macronutrients_dash

Unnamed: 0,Carbs(g),Protein(g),Fat(g)
0,550.66,149.50,211.23
1,606.00,150.59,482.76
2,850.35,644.08,727.49
3,346.88,409.59,307.41
4,261.65,40.63,88.99
...,...,...,...
195,679.70,100.53,367.02
196,1181.21,385.90,258.08
197,575.96,57.29,40.45
198,511.62,412.34,923.98


In [8]:
# Getting proportional macronutrients in each day

total_daily_macronutrients_dash = daily_macronutrients_dash.sum(axis=1).to_numpy()[:,None]

daily_macronutrients_dash /= total_daily_macronutrients_dash
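
The grouping and normalization steps above can be checked on a small toy table. The sketch below uses made-up values (ten hypothetical recipes, i.e. two simulated days) and the same column names as the data set; it is meant only to show that converting proportions to grams, summing every five rows, and dividing by the daily total yields rows that add up to 1.

```python
import numpy as np
import pandas as pd

# Toy recipes (hypothetical values): proportion per macronutrient and total grams.
toy = pd.DataFrame({
    'Carbs(g)':   [0.5, 0.2, 0.6, 0.3, 0.4, 0.1, 0.7, 0.5, 0.2, 0.5],
    'Protein(g)': [0.3, 0.3, 0.2, 0.3, 0.2, 0.5, 0.1, 0.3, 0.4, 0.2],
    'Fat(g)':     [0.2, 0.5, 0.2, 0.4, 0.4, 0.4, 0.2, 0.2, 0.4, 0.3],
    'Total_Macronutrients': [100, 50, 80, 120, 60, 90, 40, 110, 70, 30],
})
columns = ['Carbs(g)', 'Protein(g)', 'Fat(g)']

# Proportions times total grams gives grams of each macronutrient per recipe.
grams = toy[columns].mul(toy['Total_Macronutrients'], axis=0)

# Integer division of the row index by 5 labels rows 0-4 as day 0 and rows 5-9 as day 1.
daily = grams.groupby(grams.index // 5).sum()

# Dividing by the daily total expresses each day as proportions.
daily_share = daily.div(daily.sum(axis=1), axis=0)

print(daily_share.round(3))
```

Using `DataFrame.mul` and `DataFrame.div` with `axis=0` is an alternative to the `to_numpy()[:,None]` broadcasting used in the cells above; both align one value per row.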

## 1.2 Applying the Hypothesis Test

In [9]:
# Testing for a significant difference

pvalues_z_test = DashBalanceMacronutrients(daily_macronutrients_dash)

Carbs does follow a normal distribution
Carbs is not balanced

Protein does follow a normal distribution
Protein is not balanced

Fat does follow a normal distribution
Fat is not balanced



# 2. [Keto] Do the recipes have the same macronutrient composition?

The working assumption is that if the recipes tend to have similar macronutrient compositions, then the three macronutrients will show the same variability in their values. The means add up to $1$, so fixing one macronutrient leaves the other two with ranges of equal size, shifted according to their means. Equal variability corresponds to equal variances.

Therefore, showing that the recipes have similar compositions only requires that the three variances do not differ significantly when compared simultaneously. In other words, the homoscedasticity of the three macronutrients must be verified.

The following set of hypotheses is used to test similarity in compositions:

$$H_0 : s^2_C = s^2_P = s^2_F$$
$$H_1 : \text{at least one of } s^2_C,\ s^2_P,\ s^2_F \text{ differs from the others}$$

Where $s^2_M$ represents the variance of macronutrient $M$. Since the distributions are non-normal, the homoscedasticity of three or more groups is tested with Levene's Test [[1]](#references), which is robust to non-normal distributions. Levene's Test is performed at a significance level of $5\%$ and is, by definition, right-tailed.
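
A minimal sketch of this test with `scipy.stats.levene` is shown below. It is not the code of `KetoHomocedasticity`, which is defined in `FunctionsHypothesisTesting` and not shown here. The helper name `same_variance` and the synthetic samples are assumptions for illustration.

```python
import numpy as np
from scipy import stats

def same_variance(samples, alpha=0.05):
    """Return (decision, p-value); decision is True when H0 is not rejected."""
    # center='median' is SciPy's default and is the variant recommended
    # for skewed, non-normal distributions.
    statistic, pvalue = stats.levene(*samples, center='median')
    return pvalue >= alpha, pvalue

# Synthetic samples for illustration only (not the keto data).
rng = np.random.default_rng(8013)
carbs = rng.normal(0.2, 0.05, size=500)
protein = rng.normal(0.3, 0.10, size=500)
fat = rng.normal(0.5, 0.20, size=500)

equal, pvalue = same_variance([carbs, protein, fat])
print(equal, pvalue)
```

Because the helper accepts any list of samples, the same call works for two groups, which is how a pairwise comparison of macronutrients could be run.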

Since the variances of the three macronutrients differ significantly at the $95\%$ confidence level, keto diet recipes do not follow the same composition, as defined above. The recipes therefore vary in composition: recipes with unique nutritional contributions, or contributions sufficiently different from each other, can be found.

The homoscedasticity test for each pair of macronutrients also shows heteroscedasticity at the $95\%$ confidence level, which indicates a stronger tendency for nutritional contributions not to repeat. That is, the recipes are made from different products and foods and provide different macronutrient contributions.

## 2.1 Getting Compositions by Macronutrient

In [10]:
# Getting data for the testing

recipes_keto = Diets_Dataset.query("Diet_type == 'keto'")
macronutrients_values_keto = [recipes_keto[macronutrient] for macronutrient in Macronutrients]

macronutrients_values_keto

[2796    0.202812
 2797    0.082788
 2798    0.103195
 2799    0.507561
 2800    0.415284
           ...   
 4303    0.199920
 4304    0.017468
 4305    0.031938
 4306    0.236214
 4307    0.166648
 Name: Carbs(g), Length: 1512, dtype: float64,
 2796    0.125048
 2797    0.474841
 2798    0.120306
 2799    0.130021
 2800    0.253493
           ...   
 4303    0.134506
 4304    0.406934
 4305    0.681742
 4306    0.372316
 4307    0.064661
 Name: Protein(g), Length: 1512, dtype: float64,
 2796    0.672141
 2797    0.442371
 2798    0.776499
 2799    0.362418
 2800    0.331224
           ...   
 4303    0.665574
 4304    0.575598
 4305    0.286320
 4306    0.391470
 4307    0.768690
 Name: Fat(g), Length: 1512, dtype: float64]

## 2.2 Applying the Hypothesis Test

In [11]:
# Testing homoscedasticity

homocedasticity_result = KetoHomocedasticity(macronutrients_values_keto)

There is not same composition


In [12]:
# Testing homoscedasticity in pairs

for index_macronutrien in range(3):
    index_left , index_right = (index_macronutrien-1)%3 , (index_macronutrien+1)%3
    print(f'{Macronutrients[index_left][:-3]} & {Macronutrients[index_right][:-3]}')
    KetoHomocedasticity([macronutrients_values_keto[index_left],macronutrients_values_keto[index_right]])
    print()

Fat & Protein
There is not same composition

Carbs & Fat
There is not same composition

Protein & Carbs
There is not same composition



# [Mediterranean] Do recipes from other regions differ from those in the Mediterranean?

# [Paleo] Are the nutritional contributions of the recipes different for each region?

# [Vegan] Are the macronutrient contributions the same among the different recipes?

# Is there a significant difference in the nutritional contributions of the different diets?

# References

* [1] Levene Test. SciPy. https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.levene.html#scipy.stats.levene