# <p style="text-align: center;"> Predicting Diabetes and Recommending Food For Farm Workers </p>

**Abstract:**

This project aims to reduce the number of pre-diabetic and diabetic individuals among farm workers. To that end, we built an AI application that estimates whether a person is pre-diabetic, based on data provided by Dr. Angelos Sikalidis. We also built an AI-based recommendation system that suggests foods based on the vitamin levels measured for a person. The project uses NumPy, pandas, scikit-learn, and Keras.

In [1]:
#import required libraries
import pandas as pd
import tensorflow as tf
from tensorflow import keras
from sklearn.model_selection import train_test_split
import numpy as np

Let's load in the CSV. This file contains mock data for the measurements that will be taken from individual farm workers.
The measurements are:
- BMI
- Weight (Kg)
- Waist Circumfrence (cm)
- Systolic BP (mmHg)
- Diastolic BP (mmHg)
- HDL-c (mg/dl)
- LDL-c (mg/dl)
- Total Cholesterol (mg/dL)
- Atherogenicity index
- TAG (mg/dL)
- FPG (mg/dL)
- HbA1c (%)
- CRP (mg/L)
- Pulse Oximetry (%)
- Servings Grain (oz.)
- Servings Fruit (2c.)
- Servings Veg (2.5c.)
- Servings Pro (5.5oz.)
- Servings Dairy (1c.)
- Total Caloric Intake (kcal)
- % Energy CHO
- % Energy Pro
- % Energy Fat
- Vitamin D (mcg)
- Vitamin E (mg)
- Vitamin A (mcg)
- Vitamin K (mcg)
- Vitamin B12 (mcg)
- Folate (mcg)
- Category [Malnourished, Normal, Borderline Risk, Overweight At Risk, High Risk]
- Diabetic [Non-diabetic, pre-diabetic, diabetic]

The first thing we want to do is classify each individual into a health category. These health categories include:
- **Malnourished: suffering from malnutrition.**
- **Normal: conforming to a standard health.**
- **Borderline Risk: close to bad health.**
- **Overweight At Risk: above a weight considered normal or desirable.**
- **High Risk: very dangerous health**

In [2]:
df = pd.read_csv('mock_data.csv')
properties = list(df.columns.values)
properties.remove('Category')

y = df['Category']
c = [p for p in range(len(y))]
    
m_data = pd.DataFrame(np.zeros((len(y),len(properties))),columns=properties,index=c)

for i in range(df.shape[0]):
    for p in properties:
        data = df.iloc[i][p]
        if isinstance(data, str):
            # Numeric fields should not arrive as strings; strip thousands separators if they do
            print("We got a string, should not happen")
            data = np.float64(data.replace(',',''))
        if np.isnan(data):
            # Missing values keep the 0.0 placeholder that m_data was initialized with
            continue
        # .loc writes into m_data directly (chained .iloc[i][p] may write to a copy)
        m_data.loc[i, p] = data

X = m_data[properties]
X.head

<bound method NDFrame.head of       BMI  Weight (Kg)  Waist Circumfrence (cm)  Systolic BP (mmHg)  \
0    17.0         50.0                     70.0                55.0   
1    18.0         54.0                     73.0                60.0   
2    16.0         56.0                     72.0                59.0   
3    17.0         57.0                     71.0                57.0   
4    18.0         53.0                     70.0                59.0   
..    ...          ...                      ...                 ...   
114  50.0        106.0                    105.0               182.0   
115  47.0        119.0                    106.0               181.0   
116  45.0        107.0                    107.0              1184.0   
117  47.0        118.0                    109.0               185.0   
118  50.0        120.0                    110.0               184.0   

     Diastolic BP (mmHg)  HDL-c (mg/dl)  LDL-c (mg/dl)  \
0                   66.0           27.0           55.0   
1
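
The element-by-element cleaning loop above can also be expressed with pandas column operations. The following is a minimal sketch rather than the notebook's own method: the helper name `clean_numeric_frame` and the two-column `demo` frame are made up for illustration, and it assumes that commas only appear as thousands separators and that missing or unparseable cells should become 0.0.

```python
import pandas as pd

def clean_numeric_frame(frame):
    """Return a float copy of `frame`: commas stripped, missing or unparseable cells set to 0.0."""
    cleaned = frame.copy()
    for col in cleaned.columns:
        if not pd.api.types.is_numeric_dtype(cleaned[col]):
            # Text columns: drop thousands separators such as "1,184" before parsing
            cleaned[col] = cleaned[col].astype(str).str.replace(",", "", regex=False)
        # errors="coerce" turns anything that still is not a number into NaN
        cleaned[col] = pd.to_numeric(cleaned[col], errors="coerce")
    return cleaned.fillna(0.0).astype(float)

# Hypothetical two-column frame standing in for mock_data.csv
demo = pd.DataFrame({
    "BMI": [17.0, 18.0, None],
    "Systolic BP (mmHg)": ["55", "1,184", "60"],
})
cleaned_demo = clean_numeric_frame(demo)
print(cleaned_demo)
```

Filling gaps with 0.0 mirrors the loop's behavior, but a zero is a real-looking measurement to the model, so dropping or imputing those rows may be worth considering.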

In [22]:
# Once the file is loaded, hold out 20% of the data for testing.
X_train, X_test, y_train, y_test = train_test_split(X,y,test_size=0.2,random_state=0)
y_train = keras.utils.to_categorical(y_train) #turning targets into categorical vectors training
y_test = keras.utils.to_categorical(y_test) #turning targets into categorical vectors testing
num_classes = len(y_test[0])

feature_vector_length = len(properties)
input_shape = (feature_vector_length,)
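
To make the `to_categorical` step concrete, here is a NumPy-only sketch of one-hot encoding for integer class labels. `one_hot` is a hypothetical helper written for illustration, not a Keras function, and it assumes the labels are integers starting at 0, which is what the Category column appears to contain.

```python
import numpy as np

def one_hot(labels, num_classes=None):
    """Turn integer labels (0..num_classes-1) into rows containing a single 1."""
    labels = np.asarray(labels, dtype=int)
    if num_classes is None:
        # Infer the class count from the largest label seen
        num_classes = int(labels.max()) + 1
    # Row k of the identity matrix is the one-hot vector for class k
    return np.eye(num_classes)[labels]

encoded = one_hot([0, 3, 4, 1])
print(encoded)
print(encoded.shape[1])  # number of classes inferred from the labels
```

One consequence of inferring the class count from the labels: if a small test split happens to miss the highest class, the encoded width shrinks. Passing the class count explicitly is the safer choice on a dataset this small.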

In [49]:
# Configure the model
c_model = keras.Sequential()
c_model.add(keras.layers.Dense(32,input_shape=input_shape,activation=tf.nn.relu,kernel_initializer='he_uniform'))
c_model.add(keras.layers.Dense(32,activation=tf.nn.relu,kernel_initializer='he_uniform'))
c_model.add(keras.layers.Dense(num_classes,activation=tf.nn.softmax))

# Compile the model
c_model.compile(loss='categorical_crossentropy',optimizer='adam',metrics=['accuracy'])
# Train the model
c_model.fit(X_train,y_train,epochs=64,verbose=1,validation_split=0.2)

Epoch 1/64
...
Epoch 64/64


<tensorflow.python.keras.callbacks.History at 0x13cdf5ca0>

In [50]:
test_results = c_model.evaluate(X_test,y_test,verbose=1)
print(f'Test results - Loss: {test_results[0]} - Accuracy: {test_results[1]*100}%')

# Save the model if it is 100% accurate
if test_results[1]*100 == 100:
    c_model.save('models')

Test results - Loss: 0.11522891372442245 - Accuracy: 95.83333134651184%


The trained model reaches an accuracy of about **95.8%** on the held-out test set, which corresponds to 23 of the 24 test rows. Let's see what the model predicts for five example farmers, one per health category.

Each farmer is described by the following values, in this order:

[BMI, Weight (Kg), Waist Circumfrence (cm), Systolic BP (mmHg), Diastolic BP (mmHg), HDL-c (mg/dl), LDL-c (mg/dl), Total Cholesterol (mg/dL), Atherogenicity index, TAG (mg/dL), FPG (mg/dL), HbA1c (%), CRP (mg/L), Pulse Oximetry (%), Servings Grain (oz.), Servings Fruit (2c.), Servings Veg (2.5c.), Servings Pro (5.5oz.), Servings Dairy (1c.), Total Caloric Intake (kcal), % Energy CHO, % Energy Pro, % Energy Fat, Vitamin D (mcg), Vitamin E (mg), Vitamin A (mcg), Vitamin K (mcg), Vitamin B12 (mcg), Folate (mcg), Diabetic]

In the bolded vectors below, the second-to-last value is the Category label and the last value is the Diabetic label.

**Farmer 0: [17,50,70,55,66,27,55,100,0,0,48,3.8,0,84,2,0,0,1,0,1799,40,20,15,25,10,699,75,1.4,390,0,0]**

This person is not diabetic but is **malnourished**. 

**Farmer 1: [24,63,81,65,69,76,75,169,0,109,86,4.2,0.53,102,3,2.5,2.5,2,1.5,1800,52,34,19,55,12,999,124,2,415,1,0]**

This person is not diabetic and has **normal health status**. 

**Farmer 2: [28,85,84,130,84,57,100,192,0.21,200,111,6.3,1.8,90,7,0,0,0,0,2300,59,40,39,0,0,0,0,0,0,2,1]**

This person is not diabetic and has **borderline risk health status**. 

**Farmer 3: [30,92,91,165,99,42,158,208,0.25,398,124,6.5,2.8,89,9,0,0,7,0,2600,62,40,40,0,0,0,0,0,0,3,2]**

This person is diabetic and has **overweight at risk health status**.

**Farmer 4: [45,106,105,182,130,31,176,256,0.38,520,130,9.5,3.8,80,12,0,0,9,0,3400,69,48,51,101,24,3025,147,3.4,640,4,2]**

This person is also diabetic and has **high risk health status**.

In [52]:
farmer_00 = [17,50,70,55,66,27,55,100,0,0,48,3.8,0,84,2,0,0,1,0,1799,40,20,15,25,10,699,75,1.4,390,0]
farmer_01 = [24,63,81,65,69,76,75,169,0,109,86,4.2,0.53,102,3,2.5,2.5,2,1.5,1800,52,34,19,55,12,999,124,2,415,0]
farmer_02 = [28,85,84,130,84,57,100,192,0.21,200,111,6.3,1.8,90,7,0,0,0,0,2300,59,40,39,0,0,0,0,0,0,1]
farmer_03 = [30,92,91,165,99,42,158,208,0.25,398,124,6.5,2.8,89,9,0,0,7,0,2600,62,40,40,0,0,0,0,0,0,2]
farmer_04 = [45,106,105,182,130,31,176,256,0.38,520,130,9.5,3.8,80,12,0,0,9,0,3400,69,48,51,101,24,3025,147,3.4,640,2]

t = [farmer_00,farmer_01,farmer_02,farmer_03,farmer_04]
# Pass a 2-D array so Keras treats the list as one batch of five rows
res = c_model.predict(np.array(t))

for index,result in enumerate(res):
    print("Farmer ",index)
    print(res[index])
    print("-----------------------------------------------------------------------")

Farmer  0
[1.000000e+00 6.235514e-23 0.000000e+00 0.000000e+00 0.000000e+00]
-----------------------------------------------------------------------
Farmer  1
[1.8886124e-18 9.9891210e-01 0.0000000e+00 0.0000000e+00 1.0879657e-03]
-----------------------------------------------------------------------
Farmer  2
[2.2996549e-13 5.3317379e-23 9.9568546e-01 4.3145404e-03 0.0000000e+00]
-----------------------------------------------------------------------
Farmer  3
[1.7985874e-23 2.2627673e-09 2.2694378e-08 1.0000000e+00 0.0000000e+00]
-----------------------------------------------------------------------
Farmer  4
[0. 0. 0. 0. 1.]
-----------------------------------------------------------------------


The output of our model is an array of probabilities, one for each index 0-4.

Each index is a class for a health category.
- 0: Malnourished
- 1: Normal Health Status
- 2: Borderline Risk Health Status
- 3: Overweight at Risk Health Status
- 4: High Risk Health Status

At each index, the probability tells us how likely it is that the farmer falls into the corresponding category. Looking at farmer 0:

Farmer  0: [1.000000e+00 , 6.235514e-23 , 0.000000e+00 , 0.000000e+00 , 0.000000e+00]

At index 0, we have 1.000000e+00 which is a probability of 100%.

At index 1, we have 6.235514e-23 which is a probability of ~0%.

At index 2, we have 0.000000e+00 which is a probability of 0%.

At index 3, we have 0.000000e+00 which is a probability of 0%.

At index 4, we have 0.000000e+00 which is a probability of 0%.

The model predicts that the farmer falls into the index 0 category, Malnourished, which is the correct health status.
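
These probabilities come from the softmax activation on the last layer, which rescales the layer's raw scores so that they are non-negative and sum to 1. Below is a small NumPy sketch of that step followed by picking the most likely class. The `logits` values are invented for illustration and are not taken from the trained model.

```python
import numpy as np

CATEGORIES = ["Malnourished", "Normal", "Borderline Risk",
              "Overweight At Risk", "High Risk"]

def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    shifted = scores - np.max(scores)  # subtract the max for numerical stability
    exps = np.exp(shifted)
    return exps / exps.sum()

# Hypothetical raw scores from the final Dense layer for one farmer
logits = np.array([9.0, 1.5, -3.0, -4.0, -6.0])
probs = softmax(logits)

# The predicted category is the index with the largest probability
predicted = CATEGORIES[int(np.argmax(probs))]

print(np.round(probs, 4))
print(predicted)  # Malnourished
```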

In [54]:
labels = ["Malnourished","Normal","Borderline Risk","Overweight At Risk","High Risk"]

for i,rr in enumerate(res):
    print("Farmer ",i)
    # The five example farmers are ordered so that farmer i belongs to category i
    print("True= " + labels[i])
    # The predicted class is the index with the highest probability
    print("Pred= " + labels[int(np.argmax(rr))])
    print("-------------------------")

Farmer  0
True= Malnourished
Pred= Malnourished
-------------------------
Farmer  1
True= Normal
Pred= Normal
-------------------------
Farmer  2
True= Borderline Risk
Pred= Borderline Risk
-------------------------
Farmer  3
True= Overweight At Risk
Pred= Overweight At Risk
-------------------------
Farmer  4
True= High Risk
Pred= High Risk
-------------------------


Comparing the model's output with the true values, the prediction matches in all five tests. However, if we retrain the model several times, the test accuracy varies widely, from about 60% to 100%. We therefore cannot claim that the model is 95% accurate: the dataset has only 119 rows, so a single train/test split is not a reliable estimate. A much larger dataset would give a more accurate result.
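
One way to quantify this run-to-run spread is repeated stratified k-fold cross-validation, which reports a mean and a standard deviation instead of a single split's score. The sketch below is a stand-in built on assumptions: it uses synthetic data from `make_classification` shaped like the mock data (119 rows, 30 features, 5 classes) and a scikit-learn logistic regression in place of the Keras network, so the numbers it prints say nothing about the real model.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the mock data: 119 rows, 30 features, 5 classes
X_demo, y_demo = make_classification(
    n_samples=119, n_features=30, n_informative=10,
    n_classes=5, n_clusters_per_class=1, random_state=0,
)

# Stand-in classifier; scaling the features first helps the solver converge
clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# 5 folds repeated 3 times gives 15 accuracy estimates
cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=3, random_state=0)
scores = cross_val_score(clf, X_demo, y_demo, cv=cv, scoring="accuracy")

print(f"mean accuracy: {scores.mean():.3f}")
print(f"std deviation: {scores.std():.3f}")
print(f"min / max:     {scores.min():.3f} / {scores.max():.3f}")
```

Reporting the mean together with the spread gives a more honest picture on a small dataset than quoting the best or the most recent run.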

Let's now categorize our mock data.

In [56]:
maln_index = 0
norm_index = 1
border_index = 2
overw_index = 3
high_index = 4

res = c_model.predict(X)

# health_status[k] collects the feature rows predicted as class k
health_status = [[] for _ in range(5)]
for i,rr in enumerate(res):
    # The predicted class is the index with the highest probability
    health_status[int(np.argmax(rr))].append(list(X.loc[i]))

malnourished, normal, borderline_risk, overweight_risk, high_risk = health_status

## Prediabetic or Diabetic ##

Now that we can classify each individual into a health category, it would be very useful to know whether the individual is diabetic, pre-diabetic, or not diabetic. That way, when we recommend food to a person, we can be sure we are not recommending a food that diabetics should not consume.

In [57]:
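properties = list(df.columns.values)
properties.remove('Diabetic')

# The earlier X still contains the 'Diabetic' column and omits 'Category'.
# Rebuild the inputs so that the target is not among the features.
X_diabetes = m_data.drop(columns=['Diabetic'])
X_diabetes['Category'] = df['Category'].values
X_diabetes = X_diabetes[properties]

y_diabetes = df['Diabetic']

In [58]:
# Splitting test and training sets
Xdiabetes_train, Xdiabetes_test, ydiabetes_train, ydiabetes_test = train_test_split(X_diabetes,y_diabetes,test_size=0.2,random_state=0)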
ydiabetes_train = keras.utils.to_categorical(ydiabetes_train) #turning targets into categorical vectors training
ydiabetes_test = keras.utils.to_categorical(ydiabetes_test) #turning targets into categorical vectors testing
num_classes = len(ydiabetes_test[0])

feature_vector_length = len(properties)
input_shape = (feature_vector_length,)

In [122]:
diabetes_model = keras.Sequential()
diabetes_model.add(keras.layers.Dense(32,input_shape=input_shape,activation=tf.nn.relu,kernel_initializer='he_uniform'))
diabetes_model.add(keras.layers.Dense(32,activation=tf.nn.relu,kernel_initializer='he_uniform'))
diabetes_model.add(keras.layers.Dense(num_classes,activation=tf.nn.softmax))

diabetes_model.compile(loss='categorical_crossentropy',optimizer='adam',metrics=['accuracy'])
history_diabetes = diabetes_model.fit(Xdiabetes_train,ydiabetes_train,epochs=64,verbose=1,validation_split=0.2)

Epoch 1/64
...
Epoch 64/64


In [123]:
test_results_diabetes = diabetes_model.evaluate(Xdiabetes_test,ydiabetes_test,verbose=1)
print(f'Test results - Loss: {test_results_diabetes[0]} - Accuracy: {test_results_diabetes[1]*100}%')

# Save the model if it is 100% accurate
if test_results_diabetes[1]*100 == 100:
    diabetes_model.save('models')

Test results - Loss: 0.02044145204126835 - Accuracy: 100.0%
INFO:tensorflow:Assets written to: models/assets


Using the same farmers as before, let's see what our model predicts. In these vectors the last value is the Category label rather than the Diabetic label.

**Farmer 0: [17,50,70,55,66,27,55,100,0,0,48,3.8,0,84,2,0,0,1,0,1799,40,20,15,25,10,699,75,1.4,390,0]**

This person is **not diabetic** but is malnourished.

**Farmer 1: [24,63,81,65,69,76,75,169,0,109,86,4.2,0.53,102,3,2.5,2.5,2,1.5,1800,52,34,19,55,12,999,124,2,415,1]**

This person is **not diabetic** and has normal health status.

**Farmer 2: [28,85,84,130,84,57,100,192,0.21,200,111,6.3,1.8,90,7,0,0,0,0,2300,59,40,39,0,0,0,0,0,0,2]**

This person is **not diabetic** and has borderline risk health status.

**Farmer 3: [30,92,91,165,99,42,158,208,0.25,398,124,6.5,2.8,89,9,0,0,7,0,2600,62,40,40,0,0,0,0,0,0,3]**

This person is **diabetic** and has overweight at risk health status.

**Farmer 4: [45,106,105,182,130,31,176,256,0.38,520,130,9.5,3.8,80,12,0,0,9,0,3400,69,48,51,101,24,3025,147,3.4,640,4]**

This person is also **diabetic** and has high risk health status.


In [124]:
farmer_00 = [17,50,70,55,66,27,55,100,0,0,48,3.8,0,84,2,0,0,1,0,1799,40,20,15,25,10,699,75,1.4,390,0]
farmer_01 = [24,63,81,65,69,76,75,169,0,109,86,4.2,0.53,102,3,2.5,2.5,2,1.5,1800,52,34,19,55,12,999,124,2,415,1]
farmer_02 = [28,85,84,130,84,57,100,192,0.21,200,111,6.3,1.8,90,7,0,0,0,0,2300,59,40,39,0,0,0,0,0,0,2]
farmer_03 = [30,92,91,165,99,42,158,208,0.25,398,124,6.5,2.8,89,9,0,0,7,0,2600,62,40,40,0,0,0,0,0,0,3]
farmer_04 = [45,106,105,182,130,31,176,256,0.38,520,130,9.5,3.8,80,12,0,0,9,0,3400,69,48,51,101,24,3025,147,3.4,640,4]

# Pass a 2-D array so Keras treats the list as one batch of five rows
res_diabetic = diabetes_model.predict(np.array([farmer_00,farmer_01,farmer_02,farmer_03,farmer_04]))
#print(res_diabetic)

# Hand-written labels for the five farmers, and the meaning of each output index
true_names = ["Not diabetic","Not diabetic","Not diabetic","Diabetic","Diabetic"]
pred_names = ["Not diabetic","Prediabetic","Diabetic"]

for i,result in enumerate(res_diabetic):
    print("Farmer ",i)
    print("True: " + true_names[i])
    # The predicted class is the index with the highest probability
    print("Pred: " + pred_names[int(np.argmax(result))])
    print("------------------")


Farmer  0
True: Not diabetic
Pred: Not diabetic
------------------
Farmer  1
True: Not diabetic
Pred: Not diabetic
------------------
Farmer  2
True: Not diabetic
Pred: Prediabetic
------------------
Farmer  3
True: Diabetic
Pred: Diabetic
------------------
Farmer  4
True: Diabetic
Pred: Diabetic
------------------


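Even though the model scores 100% on the held-out test set, that set has only 24 rows, so a perfect score there does not rule out errors on new inputs. Note also that Farmer 2's Diabetic value in the first listing is 1, which corresponds to pre-diabetic under the encoding used here, so the hard-coded "True: Not diabetic" label may be the inconsistent part rather than the prediction. A bigger sample size would help make the model more accurate and its evaluation more reliable.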
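
A per-class tally makes this kind of disagreement easier to see than a single accuracy figure. The following sketch builds a confusion matrix with NumPy from the five hand-labeled farmers above. The integer encoding (0 = not diabetic, 1 = prediabetic, 2 = diabetic) follows the print logic in the previous cell, and `confusion_matrix` here is a small helper written for illustration.

```python
import numpy as np

def confusion_matrix(y_true, y_pred, num_classes):
    """Count outcomes: rows are true classes, columns are predicted classes."""
    matrix = np.zeros((num_classes, num_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        matrix[t, p] += 1
    return matrix

# Labels as printed above: 0 = not diabetic, 1 = prediabetic, 2 = diabetic
y_true_demo = [0, 0, 0, 2, 2]
y_pred_demo = [0, 0, 1, 2, 2]

cm = confusion_matrix(y_true_demo, y_pred_demo, num_classes=3)
print(cm)
# [[2 1 0]
#  [0 0 0]
#  [0 0 2]]
```

The single off-diagonal entry (row 0, column 1) is Farmer 2: labeled not diabetic, predicted prediabetic.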

## Recommendation System

**A very basic CSV file that contains about 7,412 food varieties and their nutritional facts**

In [125]:
#import external libraries and functions
import sklearn
from sklearn.neighbors import NearestNeighbors

#load dataset from CSV file and show first 5 records
food_bank = pd.read_csv('food.csv')
food_bank.head()

Unnamed: 0,Category,Description,Nutrient Data Bank Number,Data.Alpha Carotene,Data.Ash,Data.Beta Carotene,Data.Beta Cryptoxanthin,Data.Carbohydrate,Data.Cholesterol,Data.Choline,...,Data.Major Minerals.Potassium,Data.Major Minerals.Sodium,Data.Major Minerals.Zinc,Data.Vitamins.Vitamin A - IU,Data.Vitamins.Vitamin A - RAE,Data.Vitamins.Vitamin B12,Data.Vitamins.Vitamin B6,Data.Vitamins.Vitamin C,Data.Vitamins.Vitamin E,Data.Vitamins.Vitamin K
0,BUTTER,"BUTTER,WITH SALT",1001,0,2.11,158,0,0.06,215,19,...,24,576,0.09,2499,684,0.17,0.003,0.0,2.32,7.0
1,BUTTER,"BUTTER,WHIPPED,WITH SALT",1002,0,2.11,158,0,0.06,219,19,...,26,827,0.05,2499,684,0.13,0.003,0.0,2.32,7.0
2,BUTTER OIL,"BUTTER OIL,ANHYDROUS",1003,0,0.0,193,0,0.0,256,22,...,5,2,0.01,3069,840,0.01,0.001,0.0,2.8,8.6
3,CHEESE,"CHEESE,BLUE",1004,0,5.11,74,0,2.34,75,15,...,256,1395,2.66,763,198,1.22,0.166,0.0,0.25,2.4
4,CHEESE,"CHEESE,BRICK",1005,0,3.18,76,0,2.79,94,15,...,136,560,2.6,1080,292,1.26,0.065,0.0,0.26,2.5


In [126]:
data_key = pd.read_csv('mock_data_key.csv')

print(data_key)

                            MALNOURISHED NORMAL/DESIRABLE BORDERLINE RISK  \
BMI                                 < 19            19-25           26-29   
Weight (Kg)                         < 63            63-68           69-86   
Waist Circumference (cm)            ≤ 73            74-81           82-89   
Systolic BP (mmHg)                  ≤ 60           61-129         130-139   
Diastolic BP (mmHg)                 ≤ 67            68-79           80-89   
HDL-c (mg/dL)                       < 30           60-100           46-59   
LDL-c (mg/dL)                       < 65            65-90          91-130   
Total Cholesterol (mg/dL)          < 110          160-180         181-200   
Atherogenicity index                   X         −0.3-0.1       0.11-0.24   
TAG (mg/dL)                            X            < 150         150-200   
FPG (mg/dL)                         < 75           75-100         100-115   
HbA1c (%)                            < 4            4-5.7         5.7-6.4   

**The data_key tells us the range each measurement must fall in for a person to be placed in a given health category.**
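
As a concrete reading of the key, the sketch below checks a BMI value against the three BMI ranges visible in the printed table. `bmi_band` is a hypothetical helper. Only the three columns shown above are encoded, and values the printout does not cover are reported as such rather than guessed.

```python
def bmi_band(bmi):
    """Map a BMI value to the category columns visible in the printed key."""
    if bmi < 19:
        return "MALNOURISHED"        # key: "< 19"
    if 19 <= bmi <= 25:
        return "NORMAL/DESIRABLE"    # key: "19-25"
    if 26 <= bmi <= 29:
        return "BORDERLINE RISK"     # key: "26-29"
    # Higher bands are cut off in the printout, and the key does not say
    # how in-between values such as 25.5 are treated
    return "not covered by the printed key"

# BMI values of example farmers 0, 1, 2 and 4
for value in (17, 24, 28, 45):
    print(value, "->", bmi_band(value))
```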

Let's try to recommend a food to this person (the vitamin values used by the recommender are in bold):

farmer_00 = [17, 50, 70, 55, 66, 27, 55, 100, 0, 0, 48, 3.8, 0, 84, 2, 0, 0, 1, 0, 1799, 40, 20, 15, 25, **10 (Vitamin E), 699 (Vitamin A), 75 (Vitamin K), 1.4 (Vitamin B12)**, 390, 0]

This farmer is categorized as malnourished and has:
- 10 mg of Vitamin E
- 699 mcg of Vitamin A
- 75 mcg of Vitamin K
- 1.4 mcg of Vitamin B12

The recommended amounts are:
- Vitamin E: 12-18 mg
- Vitamin A: 700-1000 mcg
- Vitamin K: 110-140 mcg
- Vitamin B12: 1.5-3 mcg

Subtracting the actual value from the lower end of each recommended range gives the amount still needed, which drives the food recommendation.

Need:
- 2 mg of Vitamin E
- 1 mcg of Vitamin A
- 35 mcg of Vitamin K
- 0.1 mcg of Vitamin B12

Since the largest shortfall is in Vitamin K, we need to recommend a food that will help bring that level back into the normal range.
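
The subtraction above can be written out as a short sketch. It assumes, as the figures suggest, that the need is measured against the lower end of each recommended range. The dictionary names are illustrative.

```python
# Lower end of each recommended range, and farmer 0's measured values
recommended_min = {"Vitamin E (mg)": 12, "Vitamin A (mcg)": 700,
                   "Vitamin K (mcg)": 110, "Vitamin B12 (mcg)": 1.5}
measured = {"Vitamin E (mg)": 10, "Vitamin A (mcg)": 699,
            "Vitamin K (mcg)": 75, "Vitamin B12 (mcg)": 1.4}

# Need = recommended minimum - measured value, never below zero
need = {name: round(max(recommended_min[name] - measured[name], 0), 2)
        for name in recommended_min}

# The vitamin with the largest raw shortfall
largest_gap = max(need, key=need.get)

print(need)
print("Largest shortfall:", largest_gap)
```

Comparing raw gaps mixes units (mg for Vitamin E, mcg for the others) and very different scales, so a relative shortfall (need divided by the recommended minimum) may be a fairer basis for ranking.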

In [127]:
farmer_00 = [17,50,70,55,66,27,55,100,0,0,48,3.8,0,84,2,0,0,1,0,1799,40,20,15,25,10,699,75,1.4,390,0]

# Indices to where each farmer's vitamin measures are
v_e = 24
v_a = 25
v_k = 26
v_b12 = 27

# Food bank indices of vitamins we want
f_a_index = 41
f_b12_index = 43
f_e_index = 46
f_k_index = 47

# Recommended Vitamins
rec_vitamins = {}
rec_vitamins["vitamin_e"] = 12
rec_vitamins["vitamin_a"] = 700
rec_vitamins["vitamin_k"] = 110
rec_vitamins["vitamin_b12"] = 1.5

recommendations = []

# Input: A list of vitamins that might be needed
# Outputs: A dictionary of the vitamin name as key and the amount of vitamin needed as value
def within_range(need_list):
    threshold = 5
    vitamins_needed = {}

    vitamins_needed["Vitamin_E"] = 0
    vitamins_needed["Vitamin_A"] = 0
    vitamins_needed["Vitamin_K"] = 0
    vitamins_needed["Vitamin_B12"] = 0
    
    # If a person needs 5 or fewer units of a vitamin, we don't recommend anything
    for ni,n in enumerate(need_list):
        if n > threshold:
            if ni == 0:
                vitamins_needed["Vitamin_E"] = n
            elif ni == 1:
                vitamins_needed["Vitamin_A"] = n
            elif ni == 2:
                vitamins_needed["Vitamin_K"] = n
            elif ni == 3:
                vitamins_needed["Vitamin_B12"] = n
    return vitamins_needed
    
# Input: a farmer's measurement list, with vitamin values at indices v_e, v_a, v_k, v_b12
# Output: (foods, food_quantity); foods maps each needed vitamin to the recommended food rows,
#         food_quantity maps each food name to its number of servings
def get_food_rec(farmer):
    # Get the farmer's needed vitamins, in the order within_range expects (E, A, K, B12)
    need_list = [
        rec_vitamins["vitamin_e"] - farmer[v_e],
        rec_vitamins["vitamin_a"] - farmer[v_a],
        rec_vitamins["vitamin_k"] - farmer[v_k],
        rec_vitamins["vitamin_b12"] - farmer[v_b12],
    ]
    vitamins_needed = within_range(need_list)

    # Food bank column of each vitamin, in the same (E, A, K, B12) order
    vitamin_columns = {"Vitamin_E": f_e_index, "Vitamin_A": f_a_index,
                       "Vitamin_K": f_k_index, "Vitamin_B12": f_b12_index}
    X_rec = food_bank.iloc[:,list(vitamin_columns.values())].values
    nbrs = NearestNeighbors(n_neighbors=1).fit(X_rec)

    foods = {}
    food_quantity = {}

    for position,key in enumerate(vitamin_columns):
        if vitamins_needed[key] == 0:
            continue
        food_list = []
        while vitamins_needed[key] > 5:
            # Query with the remaining need for this vitamin and 0 for the others
            query = [0,0,0,0]
            query[position] = vitamins_needed[key]
            food = food_bank.iloc[nbrs.kneighbors([query])[1][0][0]]
            supplied = food.iloc[vitamin_columns[key]]
            if supplied <= 0:
                # Guard against an endless loop when the nearest food adds none of this vitamin
                break
            name = food.iloc[0]
            if name not in food_quantity:
                food_quantity[name] = 1
                food_list.append(food)
            else:
                food_quantity[name] += 1
            vitamins_needed[key] -= supplied
        foods[key] = food_list
    return foods,food_quantity

def print_rec(foods,food_quantity):
    for k in foods:
        print("For " + k + " we recommend:")
        for food in foods[k]:
            # The first column of the food bank holds the food name
            print(food.iloc[0] + " x" + str(food_quantity[food.iloc[0]]) + " serving(s)")
        print("-------------------------------")
        
foods,food_quantity = get_food_rec(farmer_00)
print_rec(foods,food_quantity)


For Vitamin_K we recommend:
CASHEW NUTS x1 serving(s)
-------------------------------


The recommended food is **CASHEW NUTS (Vitamin K)**.

For Vitamin K, we needed about 35 mcg more.

The food CSV gives the following information for cashew nuts:

Data.Vitamins.Vitamin A:                                                         0

Data.Vitamins.Vitamin B12:                                                            0

Data.Vitamins.Vitamin E:                                                           0.92

**Data.Vitamins.Vitamin K:                                                           34.7**
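
To see why a query of roughly 35 mcg of Vitamin K lands on cashew nuts, it helps to spell out what a 1-nearest-neighbor lookup does: it returns the row whose (E, A, K, B12) vector has the smallest Euclidean distance to the query. The sketch below does this by hand with NumPy on a three-row table. The butter and blue cheese numbers come from the `food_bank.head()` output and the cashew numbers from the listing above. Treating the "Vitamin A - RAE" column as the Vitamin A feature is an assumption.

```python
import numpy as np

# Columns: Vitamin E, Vitamin A, Vitamin K, Vitamin B12
names = ["BUTTER", "CHEESE", "CASHEW NUTS"]
vitamins = np.array([
    [2.32, 684.0, 7.0, 0.17],   # butter, with salt
    [0.25, 198.0, 2.4, 1.22],   # cheese, blue
    [0.92, 0.0, 34.7, 0.0],     # cashew nuts
])

# Query: only the Vitamin K need (35 mcg) is non-zero
query = np.array([0.0, 0.0, 35.0, 0.0])

# Euclidean distance from the query to every food row
distances = np.linalg.norm(vitamins - query, axis=1)
nearest = names[int(np.argmin(distances))]

print(np.round(distances, 2))
print(nearest)  # CASHEW NUTS
```

Because the query sets the other three vitamins to 0, foods rich in Vitamin A or E are pushed far away even if they also supply Vitamin K. That is a property of this query design, not of nearest-neighbor search in general.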


What kind of food is recommended to a farmer with normal health status?

In [128]:
n_farmer = health_status[norm_index][0]
n_food,food_q = get_food_rec(n_farmer)
print_rec(n_food,food_q)

This farmer has normal health status and does not lack any vitamins, so we did not recommend any food for them.

## Future Works

Some of the future work that we would have liked to implement:

- Having a recommendation system for physical activities. Similar to the food recommendation, an AI would recommend activities for an individual, based on previously performed activities, to help them stay healthy.

- Using Apple's affordable Apple Watch. Apple came out with an affordable Apple Watch that could be accessible to our target users.

- Having a user web and mobile interface that can interact with the application. The user will be able to see all their progress and results.

- Having a chat bot or AI assistant that can communicate with the user.


# Questions?