# Logistic Regression for Age-Group Prediction on an Image Dataset

The `nvidia-smi` output below confirms that this notebook ran on an NVIDIA Tesla T4 GPU in Google Colab.

In [1]:
!nvidia-smi

Fri Mar  8 09:54:56 2024       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.104.05             Driver Version: 535.104.05   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|   0  Tesla T4                       Off | 00000000:00:04.0 Off |                    0 |
| N/A   61C    P8              10W /  70W |      0MiB / 15360MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
                                                                    

The dataset is stored on Google Drive, so I mounted the drive to make it accessible from Google Colab.

In [2]:
from google.colab import drive
drive.mount('/content/drive/')

Mounted at /content/drive/


In [3]:
# List the contents of the root directory
!ls '/content/drive/MyDrive/'

# Change directory
%cd '/content/drive/MyDrive/ML'

'Colab Notebooks'	       mysql			    'Untitled document (2).gdoc'
'Copy of setup-database.sql'  'Neural Networks'		    'Untitled document (3).gdoc'
 databases		      'Subject Materials'	    'Untitled document (4).gdoc'
 ML			      'Untitled Diagram.drawio'     'Untitled document.gdoc'
 MyModels		      'Untitled document (1).gdoc'
/content/drive/MyDrive/ML


In [4]:
ls

age_detection.csv  [0m[01;34mSharpenedImages[0m/  [01;34mtest[0m/  [01;34mTest_Faces[0m/  [01;34mTrain[0m/  train.csv


Importing all necessary libraries and packages

In [5]:
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from tensorflow.keras.preprocessing.image import img_to_array, load_img
from tensorflow.keras.applications.vgg16 import VGG16, preprocess_input
from tensorflow.keras.models import Model

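Using the pre-trained VGG16 model (ImageNet weights) for feature extraction: the output of its first fully connected layer, `fc1`, gives a 4096-dimensional feature vector per image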

In [6]:
# Load pre-trained VGG16 model for feature extraction
base_model = VGG16(weights='imagenet')
model = Model(inputs=base_model.input, outputs=base_model.get_layer('fc1').output)

Downloading data from https://storage.googleapis.com/tensorflow/keras-applications/vgg16/vgg16_weights_tf_dim_ordering_tf_kernels.h5


In [7]:
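# Function to preprocess and extract features from a single image
def extract_features(img_path, model):
    img = load_img(img_path, target_size=(224, 224))         # VGG16 expects 224x224 RGB input
    img_array = img_to_array(img)                            # shape (224, 224, 3), float32
    expanded_img_array = np.expand_dims(img_array, axis=0)   # add batch axis -> (1, 224, 224, 3)
    preprocessed_img = preprocess_input(expanded_img_array)  # VGG16-specific input preprocessing
    features = model.predict(preprocessed_img)               # fc1 activations, shape (1, 4096)
    return features.flatten()                                # -> shape (4096,)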
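For reference, `preprocess_input` from `tensorflow.keras.applications.vgg16` applies Keras' "caffe" preprocessing: it converts RGB images to BGR and subtracts the ImageNet per-channel mean, without scaling values to [0, 1]. The NumPy function below is a minimal illustrative re-implementation for channels-last input, included only to show what that step does; `caffe_preprocess` and `IMAGENET_BGR_MEAN` are hypothetical names, and the notebook should keep calling the library function.

```python
import numpy as np

# ImageNet channel means in BGR order, as used by Keras' "caffe" preprocessing mode
IMAGENET_BGR_MEAN = np.array([103.939, 116.779, 123.68], dtype=np.float32)

def caffe_preprocess(batch_rgb: np.ndarray) -> np.ndarray:
    """Illustrative NumPy version of vgg16.preprocess_input (channels-last only).

    batch_rgb: array of shape (n, height, width, 3), RGB order, values in [0, 255].
    Returns a new float32 array in BGR order with the ImageNet mean subtracted.
    """
    bgr = batch_rgb[..., ::-1].astype(np.float32)  # reverse the channel axis: RGB -> BGR
    return bgr - IMAGENET_BGR_MEAN                 # broadcasts over the last (channel) axis

# A single 2x2 image filled with pure red
img = np.zeros((1, 2, 2, 3), dtype=np.float32)
img[..., 0] = 255.0  # red channel in RGB order
out = caffe_preprocess(img)
print(out[0, 0, 0])  # blue and green become minus their means; red becomes 255 - 123.68
```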

Loading and preparing the dataset

In [8]:
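data_1 = pd.read_csv('train.csv')
# Encode the age groups as integers: YOUNG -> 0, MIDDLE -> 1, OLD -> 2
# (assigning the result avoids inplace replace on a column, which pandas 2.x deprecates)
data_1['Class'] = data_1['Class'].map({'YOUNG': 0, 'MIDDLE': 1, 'OLD': 2})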

data_1.head()

          ID  Class
0    377.jpg      1
1  17814.jpg      0
2  21283.jpg      1
3  16496.jpg      0
4   4487.jpg      1
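Keeping the label mapping in a dictionary makes it easy to check that every row was encoded and to turn integer predictions back into class names. A minimal sketch, using hypothetical helper names (`encode_labels`, `decode_labels`) and a toy column rather than rows from `train.csv`:

```python
import pandas as pd

# Label encoding used in this notebook: YOUNG -> 0, MIDDLE -> 1, OLD -> 2
LABEL_TO_ID = {'YOUNG': 0, 'MIDDLE': 1, 'OLD': 2}
ID_TO_LABEL = {v: k for k, v in LABEL_TO_ID.items()}

def encode_labels(labels: pd.Series) -> pd.Series:
    """Map string classes to integers, failing loudly on unknown labels."""
    encoded = labels.map(LABEL_TO_ID)  # unknown labels become NaN
    if encoded.isna().any():
        unknown = sorted(labels[encoded.isna()].unique())
        raise ValueError(f'Unknown class labels: {unknown}')
    return encoded.astype(int)

def decode_labels(ids) -> list:
    """Turn integer predictions back into readable class names."""
    return [ID_TO_LABEL[int(i)] for i in ids]

# Hypothetical toy column, not rows from train.csv
toy = pd.Series(['MIDDLE', 'YOUNG', 'OLD', 'MIDDLE'])
codes = encode_labels(toy)
print(codes.tolist())        # [1, 0, 2, 1]
print(decode_labels(codes))  # ['MIDDLE', 'YOUNG', 'OLD', 'MIDDLE']
```

In the notebook, `encode_labels` would be applied to the raw string column in place of the encoding step, and `decode_labels(y_pred)` would make predictions readable.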


In [9]:
sample_data = data_1.sample(n=5000, random_state=42)

In [10]:
%cd Train

/content/drive/MyDrive/ML/Train
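Changing the working directory with `%cd` works, but every later relative path then depends on it. An alternative sketch is to build full paths explicitly and check for missing files before the slow feature-extraction loop. `image_paths` is a hypothetical helper, and the demo uses a temporary directory with made-up file names:

```python
from pathlib import Path
import tempfile

def image_paths(image_dir, file_names):
    """Join each file name onto image_dir and report any files that do not exist."""
    image_dir = Path(image_dir)
    paths = [image_dir / name for name in file_names]
    missing = [p.name for p in paths if not p.is_file()]
    return paths, missing

# Self-contained demo in a temporary directory (hypothetical file names)
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / '377.jpg').touch()  # create one empty placeholder file
    paths, missing = image_paths(tmp, ['377.jpg', '99999.jpg'])
    print([p.name for p in paths])  # ['377.jpg', '99999.jpg']
    print(missing)                  # ['99999.jpg']
```

With the notebook's variables this might look like `paths, missing = image_paths('/content/drive/MyDrive/ML/Train', sample_data['ID'])`, followed by `extract_features(str(p), model)` for each path.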


In [11]:
# Extract features for each image
features = np.array([extract_features(path, model) for path in sample_data['ID']])


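Feature extraction runs one `model.predict` call per image and is by far the slowest step here, so it can be worth caching the resulting matrix on Drive and reloading it instead of recomputing. A sketch using NumPy's `.npz` format; the helper names and the file name are hypothetical, and random data stands in for the real features:

```python
import os
import tempfile
import numpy as np

def save_features(path, features, labels):
    """Store the feature matrix and its labels together in one compressed .npz file."""
    np.savez_compressed(path, features=features, labels=labels)

def load_features(path):
    """Load what save_features wrote; returns (features, labels)."""
    with np.load(path) as data:  # arrays are read into memory before the file closes
        return data['features'], data['labels']

# Random stand-in data shaped like the VGG16 fc1 output (4096 values per image)
rng = np.random.default_rng(0)
fake_features = rng.normal(size=(3, 4096)).astype(np.float32)
fake_labels = np.array([1, 0, 2])

with tempfile.TemporaryDirectory() as tmp:
    cache = os.path.join(tmp, 'vgg16_fc1_features.npz')
    save_features(cache, fake_features, fake_labels)
    X, y = load_features(cache)

print(X.shape, y.tolist())  # (3, 4096) [1, 0, 2]
```

In the notebook this would be something like `save_features('/content/drive/MyDrive/ML/vgg16_fc1_features.npz', features, sample_data['Class'].to_numpy())`; the output path is an assumption.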


Splitting the dataset into a training set and a test set, holding out 20% of the samples for testing and fixing the random state at 42 for reproducibility

In [12]:
# Split the dataset
X_train, X_test, y_train, y_test = train_test_split(features, sample_data['Class'], test_size=0.2, random_state=42)
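The split above is not stratified, and the classes are imbalanced: according to the confusion matrix further down, the test set contains 353 YOUNG, 529 MIDDLE and 118 OLD images. Passing `stratify=` to `train_test_split` keeps the class proportions the same in both splits. A sketch on hypothetical toy labels; the notebook itself did not do this:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical imbalanced labels: 35 of class 0, 50 of class 1, 15 of class 2
y = np.array([0] * 35 + [1] * 50 + [2] * 15)
X = np.arange(len(y)).reshape(-1, 1)  # placeholder features

X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y  # preserve class ratios in both splits
)

print(np.bincount(y_te))  # test-set class counts: 7, 10, 3
```

With the notebook's variables: `train_test_split(features, sample_data['Class'], test_size=0.2, random_state=42, stratify=sample_data['Class'])`.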


In [13]:
# Train logistic regression model
lr_model = LogisticRegression(max_iter=5000)
lr_model.fit(X_train, y_train)
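The VGG16 features are passed to `LogisticRegression` unscaled. One common variant, not used in this notebook, is to standardize them inside a pipeline so the scaler is fitted on the training split only; scaling often helps the default `lbfgs` solver converge in fewer iterations. The sketch below uses synthetic data from `make_classification` as a stand-in, so it says nothing about accuracy on the age dataset:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the image features: 600 samples, 50 features, 3 classes
X, y = make_classification(n_samples=600, n_features=50, n_informative=10,
                           n_classes=3, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=42)

# StandardScaler learns mean/std from X_tr during fit and reuses them on X_te
clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=5000))
clf.fit(X_tr, y_tr)
print(f'Held-out accuracy: {clf.score(X_te, y_te):.3f}')
```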

Evaluating the logistic regression model

In [14]:
# Predict and evaluate
y_pred = lr_model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print(f'Accuracy: {accuracy}')

Accuracy: 0.659
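Accuracy is easier to interpret next to a trivial baseline. Using the test-set class counts, i.e. the row sums of the confusion matrix printed further down, a classifier that always predicts MIDDLE would score 0.529 on this split, so 0.659 is about 13 percentage points above it. `sklearn.dummy.DummyClassifier(strategy='most_frequent')` produces this kind of baseline from the training labels (it matches the figure here if MIDDLE is also the training majority). The arithmetic:

```python
import numpy as np

# Test-set class counts (YOUNG, MIDDLE, OLD) = row sums of the confusion matrix below
support = np.array([223 + 122 + 8, 105 + 382 + 42, 10 + 54 + 54])

# Always predicting the most frequent class is correct exactly for that class's rows
majority_baseline = support.max() / support.sum()
print(support.tolist(), majority_baseline)  # [353, 529, 118] 0.529
```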


In [15]:
# Confusion matrix plus precision, recall and F1-score
from sklearn.metrics import confusion_matrix, precision_score, recall_score, f1_score

confusion = confusion_matrix(y_test, y_pred)  # rows: true class, columns: predicted class

# Multiclass scores need an averaging mode; 'weighted' averages per-class scores by class support
precision = precision_score(y_test, y_pred, average='weighted')
recall = recall_score(y_test, y_pred, average='weighted')
f1 = f1_score(y_test, y_pred, average='weighted')

print("Confusion Matrix:")
print(confusion)
print("Precision:", precision)
print("Recall:", recall)
print("F1-Score:", f1)

Confusion Matrix:
[[223 122   8]
 [105 382  42]
 [ 10  54  54]]
Precision: 0.6563126338783907
Recall: 0.659
F1-Score: 0.657054863479181
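The weighted averages hide how unevenly the model performs across age groups. The per-class scores can be recomputed from the printed confusion matrix with NumPy; this reproduces the weighted figures above and adds the macro (unweighted) averages for comparison. `sklearn.metrics.classification_report(y_test, y_pred, target_names=['YOUNG', 'MIDDLE', 'OLD'])` prints a similar per-class table directly.

```python
import numpy as np

# Confusion matrix from the cell above: rows = true class, columns = predicted class
# Class order: YOUNG (0), MIDDLE (1), OLD (2)
cm = np.array([[223, 122, 8],
               [105, 382, 42],
               [10, 54, 54]])

tp = np.diag(cm).astype(float)  # correctly classified images per class
support = cm.sum(axis=1)        # true images per class
predicted = cm.sum(axis=0)      # images predicted as each class

precision = tp / predicted
recall = tp / support
f1 = 2 * precision * recall / (precision + recall)

weights = support / support.sum()  # 'weighted' averaging uses class support
for name, per_class in [('precision', precision), ('recall', recall), ('f1', f1)]:
    print(f'{name:9s} per class: {np.round(per_class, 3)}  '
          f'weighted: {per_class @ weights:.4f}  macro: {per_class.mean():.4f}')
```

On these numbers OLD is the weakest class: only 54 of its 118 test images are recognized (recall about 0.46), which is why the macro averages come out lower than the weighted ones.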
