In this section, we import the libraries needed for data manipulation, model training, and evaluation.


In [453]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression  # used in the modelling cell below
from sklearn.metrics import classification_report, f1_score



Loading the Dataset

In [454]:
df = pd.read_csv('Dietary.csv')

Printing the shape of the DataFrame

In [455]:
shape = df.shape
print("Shape of the DataFrame:", shape)

Shape of the DataFrame: (78, 27)


In [456]:
df.head()


Unnamed: 0,Age,Gender,How many meals do you have a day? (number of regular occasions in a day when a significant and reasonably filling amount of food is eaten),What would best describe your diet:,Choose all that apply: [I skip meals],Choose all that apply: [I experience feelings of hunger during the day],Choose all that apply: [I consult a nutritionist/dietician],Choose all that apply: [I cook my own meals],What would you consider to be the main meal of YOUR day?,What does your diet mostly consist of and how is it prepared?,How many times a week do you order-in or go out to eat?,Are you allergic to any of the following? (Tick all that apply),What is your weekly food intake frequency of the following food categories: [Sweet foods],What is your weekly food intake frequency of the following food categories: [Salty foods],What is your weekly food intake frequency of the following food categories: [Fresh fruit],What is your weekly food intake frequency of the following food categories: [Fresh vegetables],"What is your weekly food intake frequency of the following food categories: [Oily, fried foods]",What is your weekly food intake frequency of the following food categories: [Meat],What is your weekly food intake frequency of the following food categories: [Seafood ],How frequently do you consume these beverages [Tea],How frequently do you consume these beverages [Coffee],How frequently do you consume these beverages [Aerated (Soft) Drinks],How frequently do you consume these beverages [Fruit Juices (Fresh/Packaged)],"How frequently do you consume these beverages [Dairy Beverages (Milk, Milkshakes, Smoothies, Buttermilk, etc)]",How frequently do you consume these beverages [Alcoholic Beverages],"What is your water consumption like (in a day, 1 cup=250ml approx)",Unnamed: 27
0,18-24,Male,5,Pollotarian (Vegetarian who consumes poultry a...,Rarely,Often,Never,Sometimes,Lunch,Freshly home-cooked produce,4,Milk,Less often,Once a day,Less often,Once a day,Less often,Often,Often,Never,Never,Less often,Never,Less often,Never,More than 15 cups,
1,18-24,Male,4,Vegetarian (No egg or meat),Rarely,Often,Rarely,Rarely,Lunch,Freshly home-cooked produce,1,I do not have any allergies,Often,Often,Less often,Often,Often,Never,Never,Less often,Never,Often,Once a day,Often,Never,11-14 cups,
2,45-54,Male,3,Pescatarian (Vegetarian who consumes only seaf...,Never,Rarely,Never,Never,All,Freshly home-cooked produce,3,I do not have any allergies,Once a day,Several times a day,In every meal,In every meal,Less often,Never,Often,Once a day,Less often,Never,Less often,Once a day,Never,More than 15 cups,
3,18-24,Male,2,Non-Vegetarian,Often,Often,Never,Sometimes,Lunch,Freshly home-cooked produce,1,I do not have any allergies,Once a day,Once a day,Several times a day,In every meal,Few times a week,Once a day,Few times a week,Few times a week,Once a day,Once a month,Once a month,Few times a week,Never,7-10 cups,
4,18-24,Female,3,Eggetarian (Vegetarian who consumes egg and eg...,Sometimes,Sometimes,Never,Often,Breakfast,Freshly home-cooked produce,1,I do not have any allergies,Few times a week,Few times a week,Once a day,In every meal,Few times a week,Never,Never,Never,Never,Once a month,Once a month,Once a day,Once a month,4-6 cups,


Dropping columns that contain only null values

In [457]:
df = df.dropna(axis=1, how='all')
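
Note that `dropna(axis=1, how='all')` removes only the columns in which every value is missing; rows and partially filled columns are left untouched. A minimal sketch on a made-up frame (the column names and values here are hypothetical, not taken from `Dietary.csv`):

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: one fully empty column and one partially empty column.
toy = pd.DataFrame({
    "Gender": ["Male", "Female", "Male"],
    "Meals": [3, np.nan, 4],
    "Empty": [np.nan, np.nan, np.nan],
})

# Drop only the columns where every value is missing.
toy_clean = toy.dropna(axis=1, how="all")

print(toy_clean.columns.tolist())
print(toy_clean.shape)
```

The partially empty `Meals` column survives with its missing value intact, so any remaining gaps would still need separate handling.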

Handling Age Column with One-Hot Encoding

In this section, we apply **One-Hot Encoding** to the 'Age' column of the dataset, which is a categorical feature.

In [458]:
age_dummies = pd.get_dummies(df['Age'])
df = pd.concat([df, age_dummies], axis=1)
df = df.drop(columns=['Age'])
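
As an aside, the three steps above (encode, concatenate, drop) can likely be folded into one call by passing `columns=` to `pd.get_dummies`. The sketch below uses hypothetical toy data; unlike the cell above, this variant prefixes each new column with the source column name (`Age_...`), so the resulting column names differ.

```python
import pandas as pd

# Hypothetical toy data standing in for the survey frame.
toy = pd.DataFrame({
    "Age": ["18-24", "45-54", "18-24"],
    "Meals": [5, 3, 2],
})

# Encode 'Age' in place: the original column is removed and
# one indicator column per age bracket is appended.
toy_encoded = pd.get_dummies(toy, columns=["Age"])

print(toy_encoded.columns.tolist())
```

The prefix makes it easier to tell later which indicator columns came from which question.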

Dropping the Water Consumption column

In [459]:

df = df.drop(columns=['What is your water consumption like (in a day, 1 cup=250ml approx)'])


Selecting Categorical columns

In [460]:
categorical_cols = df.select_dtypes(include=['object']).columns

In [461]:
print(categorical_cols)

Index(['Gender', 'What would best describe your diet:',
       'Choose all that apply: [I skip meals]',
       'Choose all that apply: [I experience feelings of hunger during the day]',
       'Choose all that apply: [I consult a nutritionist/dietician]',
       'Choose all that apply: [I cook my own meals]',
       'What would you consider to be the main meal of YOUR day?',
       'What does your diet mostly consist of and how is it prepared?',
       'Are you allergic to any of the following? (Tick all that apply)',
       'What is your weekly food intake frequency of the following food categories: [Sweet foods]',
       'What is your weekly food intake frequency of the following food categories: [Salty foods]',
       'What is your weekly food intake frequency of the following food categories: [Fresh fruit]',
       'What is your weekly food intake frequency of the following food categories: [Fresh vegetables]',
       'What is your weekly food intake frequency of the following food

Encoding Categorical Variables

In [462]:
df_encoded = pd.get_dummies(df, columns=categorical_cols)

Shape before and after encoding

In [463]:
shape = df.shape
print("Shape of the DataFrame before encoding:", shape)
shape = df_encoded.shape
print("Shape of the DataFrame after encoding:", shape)

Shape of the DataFrame before encoding: (78, 29)
Shape of the DataFrame after encoding: (78, 134)
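
The jump in column count follows from how `pd.get_dummies` works: each encoded column is replaced by one indicator column per distinct value it contains. The sketch below checks that arithmetic on a hypothetical toy frame (names and values are illustrative only).

```python
import pandas as pd

# Hypothetical toy data: one numeric column and two categorical columns.
toy = pd.DataFrame({
    "Meals": [5, 4, 3, 2],
    "Gender": ["Male", "Male", "Female", "Female"],
    "Main meal": ["Lunch", "Lunch", "All", "Breakfast"],
})
cat_cols = ["Gender", "Main meal"]

# Expected width: untouched columns plus one indicator per distinct category.
expected_width = (toy.shape[1] - len(cat_cols)) + sum(
    toy[c].nunique() for c in cat_cols
)

toy_encoded = pd.get_dummies(toy, columns=cat_cols)

print("Expected width:", expected_width)
print("Actual width:", toy_encoded.shape[1])
```

The same calculation on the real frame would show which questions contribute the most indicator columns.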


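Collapsing the Nutritionist Consultation Answers into a Binary Label

The 'Rarely' and 'Sometimes' indicator columns are dropped, and every respondent who did not answer 'Never' is marked as 'Often'. After this step, the '_Often' column means "consults a nutritionist/dietician at least rarely".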

In [464]:
# Remove the intermediate answer levels; they are merged into '_Often' below
df_encoded_cleaned = df_encoded.drop(columns=[
    'Choose all that apply: [I consult a nutritionist/dietician]_Rarely',
    'Choose all that apply: [I consult a nutritionist/dietician]_Sometimes'
])

# Flag every respondent who did not answer 'Never' as a positive case
df_encoded_cleaned.loc[
    df_encoded_cleaned['Choose all that apply: [I consult a nutritionist/dietician]_Never'] == False,
    'Choose all that apply: [I consult a nutritionist/dietician]_Often'
] = True
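
For comparison, an equivalent binary label could probably be built directly from the raw answer column before any encoding. This is only a sketch on hypothetical toy answers, and the variable name `consults` is illustrative.

```python
import pandas as pd

col = "Choose all that apply: [I consult a nutritionist/dietician]"

# Hypothetical toy answers covering all four levels seen in the survey.
toy = pd.DataFrame({col: ["Never", "Rarely", "Sometimes", "Often", "Never"]})

# 1 = consults a nutritionist at least rarely, 0 = never consults.
consults = (toy[col] != "Never").astype(int)

print(consults.tolist())
```

Building the label this way keeps the merging rule in one expression, at the cost of having to exclude the raw column from the features by hand.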



Counting True Values in Specific Columns

In [465]:

true_count = df_encoded_cleaned['Choose all that apply: [I consult a nutritionist/dietician]_Often'].sum()
print(f"Number of true values in the column 'Choose all that apply: [I consult a nutritionist/dietician]_Often': {true_count}")

true_count = df_encoded_cleaned['Choose all that apply: [I consult a nutritionist/dietician]_Never'].sum()

print(f"Number of true values in the column 'Choose all that apply: [I consult a nutritionist/dietician]_Never': {true_count}")


Number of true values in the column 'Choose all that apply: [I consult a nutritionist/dietician]_Often': 15
Number of true values in the column 'Choose all that apply: [I consult a nutritionist/dietician]_Never': 63


Training and Evaluating a Logistic Regression Model

In [466]:
X = df_encoded_cleaned.drop(columns=[
    'Choose all that apply: [I consult a nutritionist/dietician]_Never',
    'Choose all that apply: [I consult a nutritionist/dietician]_Often'
])  # Features

# Create the target variable
y = df_encoded_cleaned['Choose all that apply: [I consult a nutritionist/dietician]_Often'].astype(int)

# Check class distribution
print("Class distribution:")
print(y.value_counts())

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train the Logistic Regression model with class weights
logistic_model = LogisticRegression(random_state=42, class_weight='balanced', max_iter=1000)
logistic_model.fit(X_train, y_train)

# Make predictions
y_pred_logistic = logistic_model.predict(X_test)

# Evaluate the model
print("Logistic Regression Classification Report:")
print(classification_report(y_test, y_pred_logistic))

# Calculate F1 Score
f1_logistic = f1_score(y_test, y_pred_logistic)
print("Logistic Regression F1 Score:", f1_logistic)

Class distribution:
Choose all that apply: [I consult a nutritionist/dietician]_Often
0    63
1    15
Name: count, dtype: int64
Logistic Regression Classification Report:
              precision    recall  f1-score   support

           0       0.75      0.69      0.72        13
           1       0.00      0.00      0.00         3

    accuracy                           0.56        16
   macro avg       0.38      0.35      0.36        16
weighted avg       0.61      0.56      0.58        16

Logistic Regression F1 Score: 0.0
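
The test set above contains only 3 positive cases, so the F1 score of 0.0 rests on three predictions. One way to get a more stable estimate on a dataset this small is stratified k-fold cross-validation. The sketch below is an assumption-laden illustration: it runs on synthetic data with the same class balance (63 vs 15), not on the survey features, and the names `X_demo`, `y_demo`, and the weak signal added to the first feature are hypothetical.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic stand-in data with the same class balance as the survey label.
rng = np.random.default_rng(42)
y_demo = np.array([0] * 63 + [1] * 15)
X_demo = rng.normal(size=(78, 10))
X_demo[:, 0] += y_demo  # give one feature a weak signal

model = LogisticRegression(class_weight="balanced", max_iter=1000)

# Stratification keeps the share of positives roughly equal in every fold.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_val_score(model, X_demo, y_demo, cv=cv, scoring="f1")

print("F1 per fold:", scores.round(2))
print("Mean F1:", scores.mean().round(2))
```

With the real data, `X` and `y` from the cell above would take the place of `X_demo` and `y_demo`; every sample is then used for testing exactly once, which averages out some of the noise of a single 16-row test split.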
