# Predicting Recommended Daily Calories

This notebook covers a simple first step that we can take before building the full model on the other dataset, which contains all the necessary nutritional information. Here we use the data we already have to predict the recommended daily calories for each person.

## Importing the libraries


In [6]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error

In [10]:
df = pd.read_csv('recommended_nutrition_cleaned.csv')
df.head(5)

Unnamed: 0,Sex,Age,Height,Weight,Activity Level,BMI,Daily Calories,Carbs,Fiber,Protein,Fat,Water,Vitamin C,Vitamin A,Vitamin D,Vitamin E,Vitamin B12,Vitamin K,Niacin,Calcium
0,1,18.0,4.0,88.0,Sedentary,26.9,1166.0,131 - 189 grams\n,38 grams,34 grams,32 - 45 grams\n,3.3 liters (about 14 cups)\n,75 mg,900 mcg,15 mcg,15 mg,2.4 mcg,75 mcg,16 mg,"1,300 mg"
1,1,18.0,4.0,90.0,Sedentary,27.5,1190.0,134 - 193 grams\n,38 grams,35 grams,33 - 46 grams\n,3.3 liters (about 14 cups)\n,75 mg,900 mcg,15 mcg,15 mg,2.4 mcg,75 mcg,16 mg,"1,300 mg"
2,1,18.0,4.0,93.0,Sedentary,28.4,1227.0,138 - 199 grams\n,38 grams,36 grams,34 - 48 grams\n,3.3 liters (about 14 cups)\n,75 mg,900 mcg,15 mcg,15 mg,2.4 mcg,75 mcg,16 mg,"1,300 mg"
3,1,18.0,4.0,95.0,Sedentary,29.0,1251.0,141 - 203 grams\n,38 grams,37 grams,35 - 49 grams\n,3.3 liters (about 14 cups)\n,75 mg,900 mcg,15 mcg,15 mg,2.4 mcg,75 mcg,16 mg,"1,300 mg"
4,1,18.0,4.0,97.0,Sedentary,29.6,1275.0,143 - 207 grams\n,38 grams,37 grams,35 - 50 grams\n,3.3 liters (about 14 cups)\n,75 mg,900 mcg,15 mcg,15 mg,2.4 mcg,75 mcg,16 mg,"1,300 mg"
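
The dataset does not state the units of `Height` and `Weight`. A quick consistency check suggests that weight is in pounds and height in feet: assuming the standard imperial BMI formula (703 × weight in lb / height in inches squared), the values from the five rows above reproduce the `BMI` column. The helper name `bmi_imperial` is hypothetical, and the unit interpretation is an inference from these rows, not something the dataset documents.

```python
def bmi_imperial(weight_lb, height_ft):
    """BMI from weight in pounds and height in feet (imperial formula)."""
    height_in = height_ft * 12  # feet -> inches
    return 703 * weight_lb / height_in ** 2

# (Weight, Height, BMI) copied from the five rows shown above
rows = [(88.0, 4.0, 26.9), (90.0, 4.0, 27.5), (93.0, 4.0, 28.4),
        (95.0, 4.0, 29.0), (97.0, 4.0, 29.6)]

for weight, height, bmi in rows:
    # Print the recomputed BMI next to the value stored in the dataset
    print(weight, height, round(bmi_imperial(weight, height), 1), bmi)
```

If the recomputed and stored values agree for every row, the pounds/feet interpretation is at least consistent with the data.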


This is a simple regression problem, and we can use the following features to predict the recommended daily calories:

In [11]:
features = ['Weight', 'Height', 'Age', 'Sex']
target = 'Daily Calories'

X = df[features]
y = df[target]

## Dividing the data into train and test datasets

Next, we split the data into a training set and a test set. The training set is used to fit the model and the test set to evaluate it. We use 80% of the data for training and 20% for testing.

In [12]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
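
To make the effect of `test_size=0.2` concrete, here is a minimal sketch on made-up data (the lists `rows` and `labels` are stand-ins, not the real dataset). It shows that 10 samples are split into 8 for training and 2 for testing, and that each feature row stays paired with its target.

```python
from sklearn.model_selection import train_test_split

rows = list(range(10))              # stand-in for 10 feature rows
labels = [r * 100 for r in rows]    # stand-in targets, one per row

X_tr, X_te, y_tr, y_te = train_test_split(
    rows, labels, test_size=0.2, random_state=42
)

print(len(X_tr), len(X_te))  # sizes of the training and test sets
# Each target still belongs to its original row after shuffling
print(all(y == x * 100 for x, y in zip(X_tr, y_tr)))
```

Fixing `random_state` makes the shuffle reproducible, so the same rows end up in the test set on every run.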

The model used here is a Random Forest Regressor with its default hyperparameters (only `random_state` is fixed for reproducibility). We use it to predict the recommended daily calories for the test set and then compare the predictions with the actual values. Because the dataset is limited, we cannot expect good results. However, this notebook serves as a starting point for the next steps: the model expects four features (Weight, Height, Age, Sex) and returns a prediction.

In [13]:
model = RandomForestRegressor(random_state=42)

In [14]:
model.fit(X_train, y_train)

In [16]:
predictions = model.predict(X_test)

mae = mean_absolute_error(y_test, predictions)
print(f"Mean Absolute Error: {mae}")

Mean Absolute Error: 17.62399999999999
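
The mean absolute error is the average of the absolute differences between actual and predicted values, so it is expressed in the same unit as the target (calories). A value of about 17.6 therefore means the predictions on the test set are off by roughly 17.6 calories on average. The sketch below computes the same metric by hand on made-up numbers; `mean_absolute_error_manual` is a hypothetical helper for illustration, not scikit-learn's implementation.

```python
def mean_absolute_error_manual(actual, predicted):
    """Average of |actual - predicted| over all samples."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

# Made-up values for illustration only, not taken from the dataset
actual = [1200.0, 1500.0, 2000.0]
predicted = [1210.0, 1480.0, 2030.0]

# Absolute errors are 10, 20 and 30, so the mean is 20.0
print(mean_absolute_error_manual(actual, predicted))
```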


Here we define the sample data used to predict the recommended daily calories. In this case it is hardcoded, but in the final product it will be extracted from the website/application.

In [26]:
new_data = {'Weight': [70], 'Height': [5], 'Age': [18], 'Sex': [1]}  # Example values
new_df = pd.DataFrame(new_data)
new_df

Unnamed: 0,Weight,Height,Age,Sex
0,70,5,18,1


In [27]:
predicted_calories = model.predict(new_df)
print(f"Predicted Daily Calories: {predicted_calories[0]}")

Predicted Daily Calories: 1455.02


So for a weight of 70 lbs (about 32 kg), the model predicts roughly 1,455 calories per day. Obviously, this single prediction is not very informative on its own, but it shows the structure that the future model will follow.
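
The pound-to-kilogram conversion for the example weights in this notebook can be checked with a few lines. This is a small sketch; `lb_to_kg` is a hypothetical helper name, and it assumes the `Weight` column is in pounds.

```python
LB_TO_KG = 0.45359237  # definition of the international pound in kilograms

def lb_to_kg(weight_lb):
    """Convert a weight in pounds to kilograms."""
    return weight_lb * LB_TO_KG

print(round(lb_to_kg(70), 1))   # 31.8
print(round(lb_to_kg(200), 1))  # 90.7
```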

In [24]:
new_data = {'Weight': [200], 'Height': [5], 'Age': [21], 'Sex': [1]}  # Example values
new_df = pd.DataFrame(new_data)
new_df

Unnamed: 0,Weight,Height,Age,Sex
0,200,5,21,1


In [25]:
predicted_calories = model.predict(new_df)
print(f"Predicted Daily Calories: {predicted_calories[0]}")

Predicted Daily Calories: 2047.51
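
Since the final product is meant to take these values from a website or application rather than from hardcoded dictionaries, the input will need to be validated and arranged in the same column order used for training. The following is a minimal sketch under that assumption; `build_model_input` and the `payload` dictionary are hypothetical and only illustrate the idea.

```python
FEATURES = ['Weight', 'Height', 'Age', 'Sex']  # same order as used for training

def build_model_input(form_data):
    """Validate raw user input and return a dict ready for pd.DataFrame."""
    missing = [name for name in FEATURES if name not in form_data]
    if missing:
        raise ValueError(f"Missing fields: {missing}")
    row = {}
    for name in FEATURES:
        value = float(form_data[name])  # raises ValueError on non-numeric input
        if value < 0:
            raise ValueError(f"{name} must be non-negative")
        row[name] = [value]  # one-element list = one row in the DataFrame
    return row

# Hypothetical payload, e.g. parsed from a web form (values arrive as strings)
payload = {'Sex': '1', 'Age': '21', 'Weight': '200', 'Height': '5'}
new_data = build_model_input(payload)
print(new_data)
```

The returned dictionary can be passed to `pd.DataFrame(...)` and then to `model.predict`, as in the cells above. Keeping the column order identical to training avoids mismatches between the input and the data the model was fitted on.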
