Technological Institute of the Philippines | Quezon City - Computer Engineering
--- | ---
Course Code: | CPE 019
Course Title: | Emerging Technologies in CpE 2
2nd Semester | AY 2023-2024
<hr> | <hr>
<u>**Assignment 7.1**</u> | Classifications and Regression
**Name** | Buenafe, Dhafny S.
**Section** | CPE32S3
**Date Performed**: |April 7, 2024
**Date Submitted**: |April 11, 2024
**Instructor**: | Engr. Roman Richard

<hr>

For classification, do the following:
  - Create a base model
  - Evaluate the model with k-fold cross validation
  - Improve the accuracy of your model by applying additional hidden layers
  
For regression, do the following:
  - Create a base model
  - Improve the model by standardizing the dataset
  - Show tuning of layers and neurons (see evaluating small and larger networks)


In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

In [None]:
df = pd.read_csv('/content/Obesity Classification.csv')

The dataset covers obesity classification: each record lists age, gender, height, weight, and BMI, along with a label of underweight, normal weight, overweight, or obese. This analysis investigates how those features influence the assigned class.

In [None]:
df.head()

Unnamed: 0,ID,Age,Gender,Height,Weight,BMI,Label
0,1,25,Male,175,80,25.3,Normal Weight
1,2,30,Female,160,60,22.5,Normal Weight
2,3,35,Male,180,90,27.3,Overweight
3,4,40,Female,150,50,20.0,Underweight
4,5,45,Male,190,100,31.2,Obese


In [None]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 108 entries, 0 to 107
Data columns (total 7 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   ID      108 non-null    int64  
 1   Age     108 non-null    int64  
 2   Gender  108 non-null    object 
 3   Height  108 non-null    int64  
 4   Weight  108 non-null    int64  
 5   BMI     108 non-null    float64
 6   Label   108 non-null    object 
dtypes: float64(1), int64(4), object(2)
memory usage: 6.0+ KB


In [None]:
df.isnull().sum()

ID        0
Age       0
Gender    0
Height    0
Weight    0
BMI       0
Label     0
dtype: int64

In [None]:
df["Gender"] = df["Gender"].apply(lambda toLabel: 0 if toLabel == 'Male' else 1)
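The lambda above sends every value other than 'Male' to 1, so a typo or a missing entry would be counted as female without any warning. A minimal sketch of a stricter encoding with an explicit dictionary is shown below; the `sample` frame and its misspelled value are hypothetical and only illustrate the behavior.

```python
import pandas as pd

# Hypothetical sample; the last value is a deliberate typo
sample = pd.DataFrame({"Gender": ["Male", "Female", "male"]})

# Same coding as the lambda: Male -> 0, Female -> 1
gender_mapping = {"Male": 0, "Female": 1}
encoded = sample["Gender"].map(gender_mapping)

# Values missing from the mapping become NaN instead of silently turning into 1
unexpected = sample.loc[encoded.isna(), "Gender"].tolist()
print(unexpected)
```

If `unexpected` is empty on the real data, the two encodings give the same result.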

In [None]:
df.head()

Unnamed: 0,ID,Age,Gender,Height,Weight,BMI,Label
0,1,25,0,175,80,25.3,Normal Weight
1,2,30,1,160,60,22.5,Normal Weight
2,3,35,0,180,90,27.3,Overweight
3,4,40,1,150,50,20.0,Underweight
4,5,45,0,190,100,31.2,Obese


In [None]:
label_mapping = {'Underweight': 0, 'Normal Weight': 1, 'Overweight': 2, 'Obese': 3}

In [None]:
df['Encoded_Label'] = df['Label'].map(label_mapping)
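Since the model will output integer codes, it may help to keep the reverse lookup around for reading predictions. The sketch below inverts `label_mapping`; the `predicted_codes` list is a hypothetical example, not model output.

```python
label_mapping = {'Underweight': 0, 'Normal Weight': 1, 'Overweight': 2, 'Obese': 3}

# Reverse lookup: integer code -> class name
inverse_mapping = {code: name for name, code in label_mapping.items()}

predicted_codes = [1, 2, 0, 3]  # hypothetical predictions
predicted_names = [inverse_mapping[code] for code in predicted_codes]
print(predicted_names)
```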

In [None]:
df.head()

Unnamed: 0,ID,Age,Gender,Height,Weight,BMI,Label,Encoded_Label
0,1,25,0,175,80,25.3,Normal Weight,1
1,2,30,1,160,60,22.5,Normal Weight,1
2,3,35,0,180,90,27.3,Overweight,2
3,4,40,1,150,50,20.0,Underweight,0
4,5,45,0,190,100,31.2,Obese,3


In [None]:
y = (df["Encoded_Label"])
y

0      1
1      1
2      2
3      0
4      3
      ..
103    0
104    0
105    0
106    0
107    0
Name: Encoded_Label, Length: 108, dtype: int64

In [None]:
x = df.drop(["ID","Label","Encoded_Label"], axis=1)
x

Unnamed: 0,Age,Gender,Height,Weight,BMI
0,25,0,175,80,25.3
1,30,1,160,60,22.5
2,35,0,180,90,27.3
3,40,1,150,50,20.0
4,45,0,190,100,31.2
...,...,...,...,...,...
103,11,0,175,10,3.9
104,16,1,160,10,3.9
105,21,0,180,15,5.6
106,26,1,150,15,5.6


#Splitting Data

In [None]:
from sklearn.model_selection import train_test_split

In [None]:
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.3)
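The split above is drawn at random on every run, so the rows in each set change between executions. A hedged sketch of a reproducible, class-balanced split follows; `x_demo` and `y_demo` are hypothetical stand-ins with the same number of rows and features as the notebook's data, and the evenly sized classes are an assumption made only for the demonstration.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for x and y: 108 rows, 5 features, 4 classes
rng = np.random.default_rng(0)
x_demo = rng.normal(size=(108, 5))
y_demo = np.repeat([0, 1, 2, 3], 27)

# random_state fixes the shuffle; stratify keeps class proportions in both sets
x_tr, x_te, y_tr, y_te = train_test_split(
    x_demo, y_demo, test_size=0.3, stratify=y_demo, random_state=42
)

print(x_tr.shape, x_te.shape)
print(np.bincount(y_tr), np.bincount(y_te))
```

With only 108 rows, stratifying reduces the chance that a class is nearly absent from the test set.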

In [None]:
x_train

Unnamed: 0,Age,Gender,Height,Weight,BMI
14,38,0,190,90,27.3
94,17,1,160,15,5.6
45,52,1,130,75,25.0
76,55,0,210,85,26.1
25,93,1,140,40,16.7
...,...,...,...,...,...
46,57,0,210,105,28.9
40,27,0,180,75,24.2
0,25,0,175,80,25.3
90,48,1,130,40,16.7


In [None]:
y_train

14    2
94    0
45    2
76    2
25    0
     ..
46    3
40    1
0     1
90    0
23    0
Name: Encoded_Label, Length: 75, dtype: int64

#Classification

In [None]:
from sklearn.model_selection import cross_val_score, KFold
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout

In [None]:
scaler = StandardScaler()
x_scaled = scaler.fit_transform(x)
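Fitting the scaler on all rows before cross-validation lets each test fold influence the mean and standard deviation used for training. One way to avoid that is to put the scaler inside a `Pipeline`, so it is refit on the training part of every fold. The sketch below uses `LogisticRegression` as a stand-in classifier, which is an assumption made to keep the example small; it is not the Keras model from this notebook, and `x_demo` and `y_demo` are random placeholder data.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical placeholder data: 108 rows, 5 features, 4 classes
rng = np.random.default_rng(0)
x_demo = rng.normal(loc=50.0, scale=20.0, size=(108, 5))
y_demo = rng.integers(0, 4, size=108)

# The scaler is fit only on the training rows of each fold
pipeline = Pipeline([
    ("scaler", StandardScaler()),
    ("clf", LogisticRegression(max_iter=1000)),  # stand-in classifier
])

kfold_demo = KFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_val_score(pipeline, x_demo, y_demo, cv=kfold_demo, scoring="accuracy")
print(len(scores))
```

The same idea applies to a manual loop: call `fit_transform` on the training indices and `transform` on the test indices inside each fold.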

In [None]:
def create_base_model():
    model = Sequential([
        Dense(64, activation='relu', input_shape=(x.shape[1],)),
        Dropout(0.5),  # Adding dropout for regularization
        Dense(32, activation='relu'),
        Dense(4, activation='softmax')  # Output layer with 4 units for classification
    ])
    model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
    return model

In [None]:
kfold = KFold(n_splits=5, shuffle=True, random_state=42)

In [None]:
accuracy_scores = []
for train_index, test_index in kfold.split(x_scaled):
    x_train, x_test = x_scaled[train_index], x_scaled[test_index]
    y_train, y_test = y.iloc[train_index], y.iloc[test_index]

    # Create and train model
    model = create_base_model()
    model.fit(x_train, y_train, epochs=10, batch_size=32, verbose=0)

    # Evaluate model
    _, accuracy = model.evaluate(x_test, y_test, verbose=0)
    accuracy_scores.append(accuracy)



In [None]:
print("Accuracy: %.2f%%" % (np.mean(accuracy_scores)*100))

Accuracy: 51.77%


- The mean accuracy across the five folds is 51.77%. With four classes, that is above the 25% expected from uniform random guessing, but the model still misclassifies close to half of the samples, so its performance on the provided features (age, gender, height, weight, and BMI) is moderate at best. The model needs further adjustment, such as additional hidden layers, to improve its accuracy.
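To judge whether a cross-validated accuracy is good, it helps to compare it with the accuracy of always predicting the most common class. The helper below is a minimal sketch; `majority_baseline_accuracy` is a hypothetical name, and `y_demo` is made-up data rather than the notebook's labels.

```python
import numpy as np

def majority_baseline_accuracy(labels):
    """Accuracy obtained by always predicting the most frequent class."""
    counts = np.bincount(np.asarray(labels))
    return counts.max() / counts.sum()

# Hypothetical labels: class 0 appears 3 times out of 8
y_demo = np.array([0, 0, 0, 1, 1, 2, 3, 3])
baseline = majority_baseline_accuracy(y_demo)
print(f"{baseline:.3f}")
```

In the notebook, calling the helper on `y` would give the figure that the mean fold accuracy has to beat.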

#Regression

In [None]:
from sklearn.preprocessing import LabelEncoder
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

In [None]:
# Create the base model
model2= Sequential([
    Dense(64, activation='relu', input_shape=(x_train.shape[1],)),
    Dense(4, activation='softmax')
])

# Compile the model
model2.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])

# Train the model
history = model2.fit(x_train, y_train, epochs=50, batch_size=18, validation_split=0.3)

Epoch 1/50
4/4 ━━━━━━━━━━━━━━━━━━━━ 1s 70ms/step - accuracy: 0.1004 - loss: 1.3999 - val_accuracy: 0.1852 - val_loss: 1.3793
Epoch 2/50
4/4 ━━━━━━━━━━━━━━━━━━━━ 0s 11ms/step - accuracy: 0.1859 - loss: 1.3728 - val_accuracy: 0.1852 - val_loss: 1.3535
Epoch 3/50
4/4 ━━━━━━━━━━━━━━━━━━━━ 0s 12ms/step - accuracy: 0.3496 - loss: 1.3224 - val_accuracy: 0.1852 - val_loss: 1.3292
Epoch 4/50
4/4 ━━━━━━━━━━━━━━━━━━━━ 0s 11ms/step - accuracy: 0.4422 - loss: 1.2667 - val_accuracy: 0.2222 - val_loss: 1.3068
Epoch 5/50
4/4 ━━━━━━━━━━━━━━━━━━━━ 0s 17ms/step - accuracy: 0.5200 - loss: 1.2125 - val_accuracy: 0.2593 - val_loss: 1.2870
Epoch 6/50
4/4 ━━━━━━━━━━━━━━━━━━━━ 0s 12ms/step - accuracy: 0.4033 - loss: 1.2352 - val_accuracy: 0.2963 - val_loss: 1.2688
Epoch 7/50
...

- The model learns the training data well, reaching a highest training accuracy of about 80.52%. Its validation accuracy is lower, at around 70.37%, so it handles new, unseen data less well. This gap suggests the model is overfitting: it fits specifics of the training data that do not carry over to new samples. The model needs further refinement to generalize to new data.
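The gap between training and validation accuracy can be read directly from the dictionary that Keras stores in `history.history`. The sketch below is a small helper for that; `summarize_history` is a hypothetical name, and `demo_history` holds only the first six epochs copied from the base-model log above, not the full 50-epoch run.

```python
def summarize_history(history_dict):
    """Return the epoch with the best validation accuracy and the train/val gap there."""
    val_acc = history_dict["val_accuracy"]
    best_index = max(range(len(val_acc)), key=val_acc.__getitem__)
    return {
        "best_epoch": best_index + 1,  # epochs are reported starting from 1
        "best_val_accuracy": val_acc[best_index],
        "train_minus_val": history_dict["accuracy"][best_index] - val_acc[best_index],
    }

# First six epochs of the base-model log (illustrative subset)
demo_history = {
    "accuracy": [0.1004, 0.1859, 0.3496, 0.4422, 0.5200, 0.4033],
    "val_accuracy": [0.1852, 0.1852, 0.1852, 0.2222, 0.2593, 0.2963],
}

summary = summarize_history(demo_history)
print(summary)
```

In the notebook, `summarize_history(history.history)` would report the same quantities for the whole run; a large positive `train_minus_val` is the overfitting signal described above.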

In [None]:
smaller_model = Sequential([
    Dense(32, activation='relu', input_shape=(x_train.shape[1],)),
    Dense(4, activation='softmax')
])

smaller_model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
smaller_history = smaller_model.fit(x_train, y_train, epochs=50, batch_size=8, validation_split=0.3)

Epoch 1/50
8/8 ━━━━━━━━━━━━━━━━━━━━ 1s 31ms/step - accuracy: 0.1803 - loss: 1.4370 - val_accuracy: 0.0000e+00 - val_loss: 2.3707
Epoch 2/50
8/8 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - accuracy: 0.3215 - loss: 1.3077 - val_accuracy: 0.0000e+00 - val_loss: 2.2906
Epoch 3/50
8/8 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - accuracy: 0.3327 - loss: 1.3055 - val_accuracy: 0.0000e+00 - val_loss: 2.2105
Epoch 4/50
8/8 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - accuracy: 0.3660 - loss: 1.2902 - val_accuracy: 0.0370 - val_loss: 2.1342
Epoch 5/50
8/8 ━━━━━━━━━━━━━━━━━━━━ 0s 14ms/step - accuracy: 0.3805 - loss: 1.2419 - val_accuracy: 0.1111 - val_loss: 2.0613
Epoch 6/50
8/8 ━━━━━━━━━━━━━━━━━━━━ 0s 23ms/step - accuracy: 0.3533 - loss: 1.2594 - val_accuracy: 0.1111 - val_loss: 1.9921
Epoch 7/50
...

In [None]:
larger_model = Sequential([
    Dense(128, activation='relu', input_shape=(x_train.shape[1],)),
    Dense(64, activation='relu'),
    Dense(4, activation='softmax')
])

larger_model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
larger_history = larger_model.fit(x_train, y_train, epochs=50, batch_size=8, validation_split=0.3)

Epoch 1/50
8/8 ━━━━━━━━━━━━━━━━━━━━ 1s 31ms/step - accuracy: 0.2921 - loss: 1.4109 - val_accuracy: 0.6296 - val_loss: 1.2989
Epoch 2/50
8/8 ━━━━━━━━━━━━━━━━━━━━ 0s 8ms/step - accuracy: 0.6753 - loss: 1.2185 - val_accuracy: 0.6296 - val_loss: 1.1522
Epoch 3/50
8/8 ━━━━━━━━━━━━━━━━━━━━ 0s 7ms/step - accuracy: 0.6866 - loss: 1.0865 - val_accuracy: 0.7037 - val_loss: 1.0199
Epoch 4/50
8/8 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - accuracy: 0.6873 - loss: 0.9980 - val_accuracy: 0.7037 - val_loss: 0.9038
Epoch 5/50
8/8 ━━━━━━━━━━━━━━━━━━━━ 0s 9ms/step - accuracy: 0.6660 - loss: 0.9105 - val_accuracy: 0.7407 - val_loss: 0.7846
Epoch 6/50
8/8 ━━━━━━━━━━━━━━━━━━━━ 0s 7ms/step - accuracy: 0.6547 - loss: 0.8662 - val_accuracy: 0.7407 - val_loss: 0.7047
Epoch 7/50
...

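- In the epochs shown, the larger network (two hidden layers of 128 and 64 units) reaches a validation accuracy of about 0.74 by epoch 6, while the smaller network (one hidden layer of 32 units) is still at about 0.11. On this dataset, adding hidden layers and neurons lets the model learn the data better; the smaller network underperforms and needs improvement.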
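The difference in capacity between the three networks can be made concrete by counting trainable parameters. A fully connected layer with `n_in` inputs and `n_out` units has `n_in * n_out` weights plus `n_out` biases. The sketch below applies that formula to the layer sizes used above, assuming the five input features (Age, Gender, Height, Weight, BMI); `dense_param_count` is a hypothetical helper, not a Keras function.

```python
def dense_param_count(n_inputs, layer_units):
    """Total weights and biases in a stack of fully connected layers."""
    total = 0
    previous = n_inputs
    for units in layer_units:
        total += previous * units + units  # weights + biases
        previous = units
    return total

n_features = 5  # Age, Gender, Height, Weight, BMI

smaller = dense_param_count(n_features, [32, 4])
base = dense_param_count(n_features, [64, 4])
larger = dense_param_count(n_features, [128, 64, 4])

print(smaller, base, larger)  # 324 644 9284
```

The larger network has roughly 29 times as many parameters as the smaller one, which on a dataset of 108 rows also raises the risk of overfitting, so validation accuracy should be watched alongside training accuracy.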