<a href="https://colab.research.google.com/github/Nacxht/ML-Archive/blob/main/gender_classification.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Data Preparation

## Kaggle & GDrive config

In [1]:
!pip install kaggle



In [2]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [3]:
import os
os.environ['KAGGLE_CONFIG_DIR'] = '/content/drive/MyDrive/kaggle'
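
The Kaggle CLI reads its API token from a `kaggle.json` file in the directory named by `KAGGLE_CONFIG_DIR`. As a minimal sketch, the check below confirms the file is present before the download step runs. The helper name `kaggle_token_present` is hypothetical and not part of the Kaggle package.

```python
import os


def kaggle_token_present(config_dir):
    """Return True if a kaggle.json token file exists in config_dir."""
    return os.path.isfile(os.path.join(config_dir, "kaggle.json"))


# Fall back to the Drive folder used in this notebook if the variable is unset.
config_dir = os.environ.get("KAGGLE_CONFIG_DIR", "/content/drive/MyDrive/kaggle")

if not kaggle_token_present(config_dir):
    print(f"No kaggle.json found in {config_dir}; the download will likely fail.")
```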

## Download dataset

In [4]:
!kaggle datasets download -d elakiricoder/gender-classification-dataset

Downloading gender-classification-dataset.zip to /content
  0% 0.00/19.0k [00:00<?, ?B/s]
100% 19.0k/19.0k [00:00<00:00, 31.1MB/s]


## Load the dataset

In [5]:
import zipfile

filepath = '/content/gender-classification-dataset.zip'

with zipfile.ZipFile(filepath, 'r') as zip_ref:
  zip_ref.extractall('/content/dataset')
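
Before extracting, it can help to see which CSV files the archive actually contains, since the file name is hard-coded in the next cell. The sketch below uses only the standard library; `list_csv_members` is a hypothetical helper name.

```python
import zipfile


def list_csv_members(zip_path):
    """Return the names of the .csv files stored in the archive."""
    with zipfile.ZipFile(zip_path, "r") as zip_ref:
        return [name for name in zip_ref.namelist() if name.lower().endswith(".csv")]


# Example call for this notebook (run it only after the download cell):
# list_csv_members('/content/gender-classification-dataset.zip')
```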

In [6]:
import pandas as pd

gender_df = pd.read_csv('/content/dataset/gender_classification_v7.csv')

## Checking Dataset

In [7]:
gender_df.head()

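| | long_hair | forehead_width_cm | forehead_height_cm | nose_wide | nose_long | lips_thin | distance_nose_to_lip_long | gender |
|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 11.8 | 6.1 | 1 | 0 | 1 | 1 | Male |
| 1 | 0 | 14.0 | 5.4 | 0 | 0 | 1 | 0 | Female |
| 2 | 0 | 11.8 | 6.3 | 1 | 1 | 1 | 1 | Male |
| 3 | 0 | 14.4 | 6.1 | 0 | 1 | 1 | 1 | Male |
| 4 | 1 | 13.5 | 5.9 | 0 | 0 | 0 | 0 | Female |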


In [8]:
gender_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5001 entries, 0 to 5000
Data columns (total 8 columns):
 #   Column                     Non-Null Count  Dtype  
---  ------                     --------------  -----  
 0   long_hair                  5001 non-null   int64  
 1   forehead_width_cm          5001 non-null   float64
 2   forehead_height_cm         5001 non-null   float64
 3   nose_wide                  5001 non-null   int64  
 4   nose_long                  5001 non-null   int64  
 5   lips_thin                  5001 non-null   int64  
 6   distance_nose_to_lip_long  5001 non-null   int64  
 7   gender                     5001 non-null   object 
dtypes: float64(2), int64(5), object(1)
memory usage: 312.7+ KB
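
The `info()` output shows no null values in any column. Two further checks are often worth running at this point: the total count of missing cells and the balance between the label classes. The sketch below runs on a small made-up frame so that it is self-contained; `summarize` and `toy_df` are hypothetical names, and on the real data the call would be `summarize(gender_df, 'gender')`.

```python
import pandas as pd


def summarize(df, label_col):
    """Return (total missing cells, class proportions of label_col)."""
    missing = int(df.isna().sum().sum())
    balance = df[label_col].value_counts(normalize=True).to_dict()
    return missing, balance


# Made-up rows for illustration only, not taken from the dataset.
toy_df = pd.DataFrame({
    "long_hair": [1, 0, 0, 1],
    "gender": ["Male", "Female", "Male", "Female"],
})

missing, balance = summarize(toy_df, "gender")
print(f"Missing cells: {missing}")
print(f"Class balance: {balance}")
```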


## Separating dataset attributes and labels

In [9]:
# Features: every column except the last one ("gender")
X = gender_df.iloc[:, :-1]
# Labels: the "gender" column
y = gender_df['gender']

In [10]:
X.head()

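| | long_hair | forehead_width_cm | forehead_height_cm | nose_wide | nose_long | lips_thin | distance_nose_to_lip_long |
|---|---|---|---|---|---|---|---|
| 0 | 1 | 11.8 | 6.1 | 1 | 0 | 1 | 1 |
| 1 | 0 | 14.0 | 5.4 | 0 | 0 | 1 | 0 |
| 2 | 0 | 11.8 | 6.3 | 1 | 1 | 1 | 1 |
| 3 | 0 | 14.4 | 6.1 | 0 | 1 | 1 | 1 |
| 4 | 1 | 13.5 | 5.9 | 0 | 0 | 0 | 0 |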


In [11]:
y.head()

0      Male
1    Female
2      Male
3      Male
4    Female
Name: gender, dtype: object
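
Selecting the features by position works here because `gender` is the last column. A possible alternative, sketched below on made-up rows, is to drop the label by name so that the split does not depend on column order. `toy_df`, `X_toy`, and `y_toy` are hypothetical names.

```python
import pandas as pd

# Made-up rows for illustration only.
toy_df = pd.DataFrame({
    "long_hair": [1, 0],
    "forehead_width_cm": [11.8, 14.0],
    "gender": ["Male", "Female"],
})

# Drop the label column by name instead of relying on its position.
X_toy = toy_df.drop(columns=["gender"])
y_toy = toy_df["gender"]

print(list(X_toy.columns))
```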

## Train Test Split

In [12]:
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.4, random_state=42
)
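
The split above holds out 40% of the rows for testing and shuffles them with a fixed seed. It does not stratify, so the proportion of each label can differ slightly between the two parts. As a sketch of one option, passing `stratify=y` keeps the label proportions the same in both parts. The data below is made up for illustration.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# 20 made-up rows with perfectly balanced labels.
toy_X = pd.DataFrame({"forehead_width_cm": [11.0 + 0.2 * i for i in range(20)]})
toy_y = pd.Series(["Male", "Female"] * 10, name="gender")

X_tr, X_te, y_tr, y_te = train_test_split(
    toy_X, toy_y, test_size=0.4, random_state=42, stratify=toy_y
)

# Both parts keep the 50/50 label ratio of the full toy set.
print(y_tr.value_counts().to_dict())
print(y_te.value_counts().to_dict())
```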

# Creating Model (Decision Tree)

In [13]:
from sklearn.tree import DecisionTreeClassifier

model = DecisionTreeClassifier()
model.fit(X_train, y_train)
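
The classifier above is created without a `random_state`. scikit-learn's decision tree permutes the features at each split, so when two splits are equally good the chosen one can differ between runs, and the resulting accuracy may vary slightly. The sketch below, on made-up data where both features separate the classes equally well, shows that fixing the seed makes two fits agree.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

# Made-up rows: either feature alone separates the two labels.
toy_X = pd.DataFrame({
    "forehead_width_cm": [11.5, 12.0, 12.4, 14.2, 14.8, 15.1],
    "nose_wide": [0, 0, 0, 1, 1, 1],
})
toy_y = ["Female"] * 3 + ["Male"] * 3

first = DecisionTreeClassifier(random_state=42).fit(toy_X, toy_y)
second = DecisionTreeClassifier(random_state=42).fit(toy_X, toy_y)

# With the same seed, both trees split on the same feature at the root.
same_root = first.tree_.feature[0] == second.tree_.feature[0]
print(f"Same root feature: {same_root}")
```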

# Model Evaluation

In [14]:
from sklearn.metrics import accuracy_score

y_pred = model.predict(X_test)
acc = accuracy_score(y_test, y_pred)

print(f'Accuracy: {acc}')

Accuracy: 0.9585207396301849
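
For reference, accuracy is the fraction of test rows whose predicted label equals the true label. The sketch below computes that fraction in plain Python on made-up labels. The `accuracy` function is a hypothetical helper for illustration, not scikit-learn's implementation.

```python
def accuracy(y_true, y_pred):
    """Fraction of positions where the prediction equals the true label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    matches = sum(t == p for t, p in zip(y_true, y_pred))
    return matches / len(y_true)


# Made-up labels: three of the four predictions are correct.
true_labels = ["Male", "Female", "Male", "Male"]
pred_labels = ["Male", "Female", "Female", "Male"]

print(accuracy(true_labels, pred_labels))  # 0.75
```

A single accuracy figure does not show which class the errors fall on, so a per-class view such as a confusion matrix is a common complement.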


# Prediction

In [15]:
# Feature order expected by the model:
# "long hair", "forehead width (cm)", "forehead height (cm)", "nose wide", "nose long", "lips thin", "distance nose to lip"

"""
/ - "long hair"             -> whether this person has long hair or not. 1 represents long and 0 not.
/ - "forehead width"        -> the width of the forehead from right to left, in cm.
/ - "forehead height"       -> the height of the forehead in cm, from the hairline to the eyebrows.
/ - "nose wide"             -> whether the nose is wide or not. 1 represents wide and 0 not.
/ - "nose long"             -> whether the nose is long or not. 1 represents long and 0 not.
/ - "lips thin"             -> whether this person has thin lips or not. 1 represents thin and 0 not.
/ - "distance nose to lip"  -> whether the distance from nose to lip is long. 1 represents long and 0 not.
"""

# Wrap the sample in a DataFrame with the training column names so that
# scikit-learn does not warn about missing feature names.
sample = pd.DataFrame([[0, 12.5, 7, 0, 0, 1, 0]], columns=X.columns)
gender = model.predict(sample)

print(f'Gender: {gender[0]}')

Gender: Male
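
`predict` returns only the winning label. When the class probabilities are also of interest, `predict_proba` returns one value per entry in `classes_`. The sketch below uses a tiny made-up training set with two of the dataset's column names; `toy_model` and the sample values are hypothetical.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

feature_names = ["forehead_width_cm", "nose_wide"]

# Made-up training rows for illustration only.
toy_X = pd.DataFrame(
    [[11.5, 0], [12.0, 0], [14.5, 1], [15.0, 1]], columns=feature_names
)
toy_y = ["Female", "Female", "Male", "Male"]

toy_model = DecisionTreeClassifier(random_state=0).fit(toy_X, toy_y)

# Use the same column names for the sample as for the training data.
sample = pd.DataFrame([[14.8, 1]], columns=feature_names)

proba = toy_model.predict_proba(sample)[0]
result = dict(zip(toy_model.classes_, proba))
print(result)
```

For a fully grown tree the probabilities are the class fractions in the leaf the sample lands in, so they are often exactly 0 or 1.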


