Data mining in health and wellness offers numerous impactful applications. It transforms raw nutrition and food tracking data into actionable insights, helping users make informed decisions that support their overall health and wellness goals.

In this practice activity, you will carry out a preliminary review of a nutrition and food tracking dataset. The main objective is to understand the dataset's structure and become familiar with the meaning of each feature in the nutritional data.

**Instructions:**

1. Download the nutrition dataset [here](https://drive.google.com/file/d/1sLoEkd84nwHpJK0yOhBhcsNzQyV_fd_X/view?usp=drive_link) and load it into [Jupyter Notebooks](https://jupyter.org/).
2. Inspect the first few rows and columns using head() to get a sense of the data.
3. Review the dataset’s features and identify potential target variables.
4. Based on the initial findings, answer the following questions:
    1. How many entries does the dataset have?
    2. What are the column names, and what kind of data do they contain?
    3. Do you notice any missing or unusual values?
5. Create a table that lists each feature, describes its meaning, and identifies its type (e.g., numerical, categorical).
6. Based on the results, answer the following questions:
    1. Which columns are numerical, and which are categorical?
    2. Are there any features that look redundant or unnecessary for analysis?

In [16]:
# 1. Import Libraries
import os

import pandas as pd
from google.colab import drive
from sklearn.preprocessing import LabelEncoder

# Mount Google Drive so the dataset is reachable from Colab
drive.mount('/content/drive')

# 2. Set the absolute path to the dataset on the mounted drive
file_path = "/content/drive/MyDrive/Colab Notebooks/nutrition_data.csv"

# 3. Check that the file exists before reading it
print(os.path.exists(file_path))

# 4. Load the dataset
df = pd.read_csv(file_path)

# 5. Verify the dataset load
# Optional: copy the file into the Colab working directory
!cp "/content/drive/MyDrive/Colab Notebooks/nutrition_data.csv" "/content/"
print(df.head())

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).
True
        food_item  calories  protein  carbs  fats  meal_time
0           Apple      95.0      0.5   25.0   0.3  Breakfast
1          Banana     105.0      1.3   27.0   0.4      Snack
2  Chicken Breast     165.0     31.0    0.0   3.6      Lunch
3           Steak     679.0     62.0    0.0  48.0     Dinner
4           Salad     150.0      2.0   15.0   7.0      Lunch


Reviewing the dataset's features.


In [None]:
# 6. Dataset Structure
num_rows = df.shape[0]
num_cols = df.shape[1]

print(f"The dataset has {num_rows} rows and {num_cols} columns.")

print("\nDataset Info:")
df.info()

The dataset has 101 rows and 6 columns.

Dataset Info:
<class 'pandas.core.frame.DataFrame'>
Index: 101 entries, 0 to 104
Data columns (total 6 columns):
 #   Column     Non-Null Count  Dtype  
---  ------     --------------  -----  
 0   food_item  101 non-null    object 
 1   calories   101 non-null    float64
 2   protein    101 non-null    float64
 3   carbs      101 non-null    float64
 4   fats       101 non-null    float64
 5   meal_time  101 non-null    object 
dtypes: float64(4), object(2)
memory usage: 5.5+ KB


The dataset has 101 rows and 6 columns: 'food_item', 'calories', 'protein', 'carbs', 'fats', and 'meal_time'. The index labels run from 0 to 104, so they are not contiguous.

Checking for missing and unusual values in the dataset.

In [None]:
# 7. Missing Values
missing_values = df.isnull().sum()
print("\nMissing Values in the Dataset:")
print(missing_values)


Missing Values in the Dataset:
food_item    0
calories     0
protein      0
carbs        0
fats         0
meal_time    0
dtype: int64
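
The check above covers missing values only. As a minimal sketch of how unusual values might also be screened, the block below uses the five rows printed by `head()` as stand-in data. The `sample` frame, the variable names, and the 1.5 × IQR threshold are illustrative assumptions, not part of the original notebook.

```python
import pandas as pd

# Stand-in data: the five rows shown by df.head() earlier in this notebook
sample = pd.DataFrame({
    "food_item": ["Apple", "Banana", "Chicken Breast", "Steak", "Salad"],
    "calories": [95.0, 105.0, 165.0, 679.0, 150.0],
    "protein": [0.5, 1.3, 31.0, 62.0, 2.0],
    "carbs": [25.0, 27.0, 0.0, 0.0, 15.0],
    "fats": [0.3, 0.4, 3.6, 48.0, 7.0],
    "meal_time": ["Breakfast", "Snack", "Lunch", "Dinner", "Lunch"],
})

numeric_cols = ["calories", "protein", "carbs", "fats"]

# Summary statistics make extreme minimums and maximums easy to spot
print(sample[numeric_cols].describe())

# Nutritional quantities should never be negative
negative_counts = (sample[numeric_cols] < 0).sum()
print(negative_counts)

# Fully duplicated rows can inflate counts and averages
duplicate_rows = int(sample.duplicated().sum())
print(duplicate_rows)

# Flag rows with any value outside 1.5 * IQR of its column (a common rule of thumb)
q1 = sample[numeric_cols].quantile(0.25)
q3 = sample[numeric_cols].quantile(0.75)
iqr = q3 - q1
outlier_mask = (sample[numeric_cols] < q1 - 1.5 * iqr) | (sample[numeric_cols] > q3 + 1.5 * iqr)
flagged = sample.loc[outlier_mask.any(axis=1), "food_item"].tolist()
print(flagged)
```

A flagged row is not necessarily an error. A high-calorie item such as a steak can be a legitimate value, so flagged rows are candidates for a closer look rather than automatic removal.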


**Handling Missing Data**

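Filling any missing values with column means. The check above found no missing values, so this step leaves the data unchanged and acts only as a safeguard.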


In [None]:
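# 8. Fill Missing Values with Column Means
# Only numeric columns have a mean; text columns are left untouched
for column in df.select_dtypes(include="number").columns:
    if df[column].isnull().any():  # Check if the column has any missing values
        column_mean = df[column].mean()
        # Assign the result back instead of calling fillna(inplace=True) on a column
        df[column] = df[column].fillna(column_mean)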



Verify that the missing values are filled and that none remain.

In [None]:
# 9. Verify that Missing Values are Filled
print("\nChecking for missing values after filling:")
print(df.isnull().sum())  # Should show 0 missing values in the filled columns



Checking for missing values after filling:
food_item    0
calories     0
protein      0
carbs        0
fats         0
meal_time    0
dtype: int64
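
Because this dataset has no gaps, the fill step cannot be observed on it directly. The sketch below shows the same idea on a small hypothetical frame (`toy` and its values are made up for illustration). It fills numeric gaps with the column mean and, as one possible choice for text columns, uses the most frequent value.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame with one gap in a numeric column and one in a text column
toy = pd.DataFrame({
    "calories": [100.0, np.nan, 300.0],
    "meal_time": ["Lunch", None, "Dinner"],
})

# Fill numeric gaps with each column's mean
numeric_cols = toy.select_dtypes(include="number").columns
toy[numeric_cols] = toy[numeric_cols].fillna(toy[numeric_cols].mean())

# Fill text gaps with the most frequent value; on a tie, mode() returns the
# candidates in sorted order and the first one is used here
for column in toy.select_dtypes(exclude="number").columns:
    toy[column] = toy[column].fillna(toy[column].mode().iloc[0])

print(toy)
```

Mean imputation keeps the column average unchanged but shrinks its variance, so it is best treated as a simple default rather than the only option.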


In [14]:
# 10. Feature Review Table
features = {
    "Feature": ["food_item", "calories", "protein", "carbs", "fats", "meal_time"],
    "Description": [
        "Name of the food item",
        "Calories in the food item",
        "Protein content (grams)",
        "Carbohydrate content (grams)",
        "Fat content (grams)",
        "Meal timing (e.g., breakfast, lunch, etc.)"
    ],
    "Type": ["Categorical", "Numerical", "Numerical", "Numerical", "Numerical", "Categorical"]
}

feature_review_df = pd.DataFrame(features)
print("\nFeature Review Table:")
print(feature_review_df)

# 11. Analyze Feature Types
numerical_columns = df.select_dtypes(include=['int64', 'float64']).columns.tolist()
categorical_columns = df.select_dtypes(include=['object']).columns.tolist()

print(f"\nNumerical Columns: {numerical_columns}")
print(f"Categorical Columns: {categorical_columns}")


Feature Review Table:
     Feature                                 Description         Type
0  food_item                       Name of the food item  Categorical
1   calories                   Calories in the food item    Numerical
2    protein                     Protein content (grams)    Numerical
3      carbs                Carbohydrate content (grams)    Numerical
4       fats                         Fat content (grams)    Numerical
5  meal_time  Meal timing (e.g., breakfast, lunch, etc.)  Categorical

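Numerical Columns: ['calories', 'protein', 'carbs', 'fats', 'meal_time_encoded']
Categorical Columns: []

Note: this output lists 'meal_time_encoded' and no categorical columns, which does not match the six-column dataframe shown by df.info(). It appears to come from running this cell after the encoding steps further down. In the dataset as loaded, 'calories', 'protein', 'carbs', and 'fats' are numerical (float64), while 'food_item' and 'meal_time' are categorical (object).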
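
One way to approach the question of redundant features is to test whether a column can be approximated from the others. The sketch below assumes the general Atwater factors (roughly 4 kcal per gram of protein and carbohydrate, and 9 kcal per gram of fat) and uses the five rows printed by `head()` as stand-in data. The `sample` frame and the `estimated_calories` and `difference` columns are illustrative names, not part of the original notebook.

```python
import pandas as pd

# Stand-in data: the five rows shown by df.head() earlier in this notebook
sample = pd.DataFrame({
    "food_item": ["Apple", "Banana", "Chicken Breast", "Steak", "Salad"],
    "calories": [95.0, 105.0, 165.0, 679.0, 150.0],
    "protein": [0.5, 1.3, 31.0, 62.0, 2.0],
    "carbs": [25.0, 27.0, 0.0, 0.0, 15.0],
    "fats": [0.3, 0.4, 3.6, 48.0, 7.0],
})

# Assumed general Atwater factors, in kcal per gram
KCAL_PER_G_PROTEIN = 4
KCAL_PER_G_CARBS = 4
KCAL_PER_G_FAT = 9

# Estimate calories from the three macronutrient columns
sample["estimated_calories"] = (
    KCAL_PER_G_PROTEIN * sample["protein"]
    + KCAL_PER_G_CARBS * sample["carbs"]
    + KCAL_PER_G_FAT * sample["fats"]
)

# Compare the estimate with the recorded value
sample["difference"] = sample["calories"] - sample["estimated_calories"]
print(sample[["food_item", "calories", "estimated_calories", "difference"]])
```

If the estimate tracks `calories` closely across the full dataset, `calories` adds little information beyond the three macronutrient columns, which is worth noting before any modeling step.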


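Converting the 'food_item' column from categorical into numerical format with one-hot encoding. Because `drop_first=True` is used, the first category (Apple) gets no column of its own; an Apple row is the one where every `food_item_*` column is False.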


In [17]:
# Apply One-Hot Encoding to the 'food_item' column
df = pd.get_dummies(df, columns=['food_item'], drop_first=True)

# Display the updated dataframe with new one-hot encoded columns
print(df.head())


   calories  protein  carbs  fats  meal_time  food_item_Banana  \
0      95.0      0.5   25.0   0.3  Breakfast             False   
1     105.0      1.3   27.0   0.4      Snack              True   
2     165.0     31.0    0.0   3.6      Lunch             False   
3     679.0     62.0    0.0  48.0     Dinner             False   
4     150.0      2.0   15.0   7.0      Lunch             False   

   food_item_Chicken Breast  food_item_Fish  food_item_Pasta  food_item_Pizza  \
0                     False           False            False            False   
1                     False           False            False            False   
2                      True           False            False            False   
3                     False           False            False            False   
4                     False           False            False            False   

   food_item_Rice  food_item_Salad  food_item_Steak  food_item_Yogurt  
0           False            False          

Checking the columns.

In [18]:
print(df.columns)


Index(['calories', 'protein', 'carbs', 'fats', 'meal_time', 'food_item_Banana',
       'food_item_Chicken Breast', 'food_item_Fish', 'food_item_Pasta',
       'food_item_Pizza', 'food_item_Rice', 'food_item_Salad',
       'food_item_Steak', 'food_item_Yogurt'],
      dtype='object')
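
The one-hot columns above hold True/False values. If 0/1 integers are preferred, `pd.get_dummies` accepts a `dtype` argument. A minimal sketch on a hypothetical four-row frame (`toy` and `encoded` are illustrative names):

```python
import pandas as pd

# Hypothetical toy frame with a repeated category
toy = pd.DataFrame({
    "food_item": ["Apple", "Banana", "Steak", "Apple"],
    "calories": [95.0, 105.0, 679.0, 95.0],
})

# drop_first=True removes the first category (Apple) to avoid a redundant column;
# dtype=int produces 0/1 integers instead of booleans
encoded = pd.get_dummies(toy, columns=["food_item"], drop_first=True, dtype=int)

print(encoded)
print(encoded.columns.tolist())
```

Both Apple rows end up with 0 in every `food_item_*` column, which is how the dropped category stays recoverable.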


Converting the 'meal_time' column from categorical into numerical format.

In [19]:
# Initialize the label encoder
label_encoder = LabelEncoder()

# Apply label encoding to the 'meal_time' column
df['meal_time_encoded'] = label_encoder.fit_transform(df['meal_time'])

# Drop the original 'meal_time' column if not needed
df.drop('meal_time', axis=1, inplace=True)

# Display the transformed dataframe
print(df.head())


   calories  protein  carbs  fats  food_item_Banana  food_item_Chicken Breast  \
0      95.0      0.5   25.0   0.3             False                     False   
1     105.0      1.3   27.0   0.4              True                     False   
2     165.0     31.0    0.0   3.6             False                      True   
3     679.0     62.0    0.0  48.0             False                     False   
4     150.0      2.0   15.0   7.0             False                     False   

   food_item_Fish  food_item_Pasta  food_item_Pizza  food_item_Rice  \
0           False            False            False           False   
1           False            False            False           False   
2           False            False            False           False   
3           False            False            False           False   
4           False            False            False           False   

   food_item_Salad  food_item_Steak  food_item_Yogurt  meal_time_encoded  
0          
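
After label encoding, the integer codes are hard to read without knowing which meal time each one stands for. The sketch below uses the `meal_time` values from the first five rows as stand-in data and shows how the mapping can be read from the encoder's `classes_` attribute and reversed with `inverse_transform`. The `meal_times`, `encoder`, and `mapping` names are illustrative.

```python
from sklearn.preprocessing import LabelEncoder

# Stand-in data: meal_time values from the first five rows of the dataset
meal_times = ["Breakfast", "Snack", "Lunch", "Dinner", "Lunch"]

encoder = LabelEncoder()
codes = encoder.fit_transform(meal_times)

# classes_ holds the original labels in sorted order; position i is code i
mapping = {str(label): code for code, label in enumerate(encoder.classes_)}
print(mapping)

# inverse_transform converts the integer codes back to the original labels
restored = [str(label) for label in encoder.inverse_transform(codes)]
print(restored)
```

The codes follow alphabetical order, so they imply a ranking (Breakfast < Dinner < Lunch < Snack) that has no nutritional meaning. For models that are sensitive to numeric magnitude, one-hot encoding `meal_time` as well may be a safer choice.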