# Sex Classification Example

 **Problem:** classify a person as male or female from three measured features: height, weight, and foot size.
 
 **Data:**
 
 |Gender|Height (feet)|Weight (lbs)|Foot Size (inches)|
 |----|---:|---:|---:|
 |male  |6           |180   |12       |
 |male  |5.92        |190   |11       |
 |male  |5.58        |170   |12       |
 |male  |5.92        |165   |10       |
 |female|5           |100   |6        |
 |female|5.5         |150   |8        |
 |female|5.42        |130   |7        |
 |female|5.75        |150   |9        |

### Importing basic Libraries

In [1]:
%matplotlib inline

# __future__ imports must come before any other import
from __future__ import division

# Import basic libraries
import numpy as np
import pandas as pd
from sklearn.naive_bayes import GaussianNB

### Creating dataset

In [2]:
data = pd.DataFrame()

### Creating the target classes

In [3]:
data['Gender'] = ['male','male','male','male','female','female','female','female']
data

| |Gender|
|---:|----|
|0|male|
|1|male|
|2|male|
|3|male|
|4|female|
|5|female|
|6|female|
|7|female|


### Adding feature values

In [4]:
data['Height'] = [6,5.92,5.58,5.92,5,5.5,5.42,5.75]
data['Weight'] = [180,190,170,165,100,150,130,150]
data['Foot_Size'] = [12,11,12,10,6,8,7,9]
data

| |Gender|Height|Weight|Foot_Size|
|---:|----|---:|---:|---:|
|0|male|6.0|180|12|
|1|male|5.92|190|11|
|2|male|5.58|170|12|
|3|male|5.92|165|10|
|4|female|5.0|100|6|
|5|female|5.5|150|8|
|6|female|5.42|130|7|
|7|female|5.75|150|9|


### Creating Test dataset

In [5]:
new_person = pd.DataFrame()

### Adding features

In [6]:
new_person['Height'] = [6]
new_person['Weight'] = [130]
new_person['Foot_Size'] = [8]
new_person

| |Height|Weight|Foot_Size|
|---:|---:|---:|---:|
|0|6|130|8|


### Calculating Priors -- ( P(male) and P(female) )

In [7]:
n_male = data['Gender'][data['Gender'] == 'male'].count()
print('Total number of Males in data is: %s' % (n_male))

n_female = data['Gender'][data['Gender'] == 'female'].count()
print('Total number of Females in data is: %s' % (n_female))

total_people = data['Gender'].count()
print('Total number of People in data is: %s' % (total_people))

prior_male = n_male / total_people
print('Probability of male is: %s' % (prior_male))

prior_female = n_female / total_people
print('Probability of female is: %s' % (prior_female))

Total number of Males in data is: 4
Total number of Females in data is: 4
Total number of People in data is: 8
Probability of male is: 0.5
Probability of female is: 0.5
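
As a cross-check on the counts above, the same priors can be read off in one step with pandas. This is only a sketch: the `gender` series is rebuilt here so the snippet runs on its own, and `priors` is a name introduced for illustration.

```python
import pandas as pd

# Same class labels as the 'Gender' column of the example data
gender = pd.Series(['male'] * 4 + ['female'] * 4, name='Gender')

# normalize=True returns relative frequencies, i.e. the empirical class priors
priors = gender.value_counts(normalize=True)
print(priors['male'], priors['female'])
```

With a balanced dataset like this one, both priors are equal, so they do not affect which class wins.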


### Calculating Likelihoods: P(height | female), P(weight | female), and so on
A Gaussian naive Bayes classifier assumes that each feature is normally distributed within each class, so we only need the mean and variance of every feature for each class.
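
For reference, the assumption can be written out in standard Gaussian naive Bayes notation. The symbols $\mu_y$ and $\sigma_y^2$ are labels introduced here for the mean and variance of one feature within class $y$; they do not appear elsewhere in this notebook.

$$
p(x \mid y) = \frac{1}{\sqrt{2\pi\sigma_y^2}} \exp\left(-\frac{(x-\mu_y)^2}{2\sigma_y^2}\right)
$$

Because the features are treated as independent given the class, the posterior is proportional to the prior times the product of the per-feature densities:

$$
P(y \mid \text{height}, \text{weight}, \text{foot size}) \propto P(y)\, p(\text{height} \mid y)\, p(\text{weight} \mid y)\, p(\text{foot size} \mid y)
$$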

#### Calculating means

In [8]:
data_means = data.groupby('Gender').mean()
data_means

|Gender|Height|Weight|Foot_Size|
|----|---:|---:|---:|
|female|5.4175|132.5|7.5|
|male|5.855|176.25|11.25|


#### Calculating variance

In [9]:
data_variance = data.groupby('Gender').var()
data_variance

|Gender|Height|Weight|Foot_Size|
|----|---:|---:|---:|
|female|0.097225|558.333333|1.666667|
|male|0.035033|122.916667|0.916667|
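
One detail worth knowing about these numbers: pandas `var()` defaults to the sample variance (`ddof=1`, dividing by n - 1). The sketch below rebuilds the dataset so it runs on its own and compares that with the population variance (`ddof=0`). The names `sample_var` and `population_var` are introduced here for illustration. To my understanding, scikit-learn's `GaussianNB` estimates a population-style variance (plus a small smoothing term), so its internal values may differ slightly from the table above.

```python
import pandas as pd

data = pd.DataFrame({
    'Gender': ['male'] * 4 + ['female'] * 4,
    'Height': [6, 5.92, 5.58, 5.92, 5, 5.5, 5.42, 5.75],
    'Weight': [180, 190, 170, 165, 100, 150, 130, 150],
    'Foot_Size': [12, 11, 12, 10, 6, 8, 7, 9],
})

grouped = data.groupby('Gender')
sample_var = grouped.var()            # pandas default: ddof=1 (divide by n - 1)
population_var = grouped.var(ddof=0)  # divide by n instead

print(sample_var.loc['male', 'Height'])      # matches the table above
print(population_var.loc['male', 'Height'])  # smaller by a factor of (n - 1) / n
```

With only four samples per class, the two estimates differ by 25%, which is enough to change the computed likelihoods noticeably.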


### Moving the results to independent variables

In [10]:

# Look up each statistic by class label with .loc
male_height_mean = data_means.loc['male', 'Height']
male_weight_mean = data_means.loc['male', 'Weight']
male_footsize_mean = data_means.loc['male', 'Foot_Size']

male_height_variance = data_variance.loc['male', 'Height']
male_weight_variance = data_variance.loc['male', 'Weight']
male_footsize_variance = data_variance.loc['male', 'Foot_Size']

female_height_mean = data_means.loc['female', 'Height']
female_weight_mean = data_means.loc['female', 'Weight']
female_footsize_mean = data_means.loc['female', 'Foot_Size']

female_height_variance = data_variance.loc['female', 'Height']
female_weight_variance = data_variance.loc['female', 'Weight']
female_footsize_variance = data_variance.loc['female', 'Foot_Size']

### Creating function to calculate likelihood

In [11]:
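def p_x_given_y(x, mean_y, variance_y):
    """Gaussian probability density of x for a class with the given mean and variance."""
    # Normalizing constant times the exponential of the scaled squared deviation
    p = 1 / np.sqrt(2 * np.pi * variance_y) * np.exp(-(x - mean_y) ** 2 / (2 * variance_y))
    return p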
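
A quick sanity check of this function, as a sketch: a valid density should integrate to about 1, and its value at a single point can exceed 1, because it is a density rather than a probability. The function is repeated so the snippet runs on its own. The mean and variance are the male height statistics from the tables above, and the grid width and the names `area` and `peak` are arbitrary choices made here.

```python
import numpy as np

def p_x_given_y(x, mean_y, variance_y):
    # Same Gaussian density as in the cell above
    return 1 / np.sqrt(2 * np.pi * variance_y) * np.exp(-(x - mean_y) ** 2 / (2 * variance_y))

mean, variance = 5.855, 0.035033  # male height mean and variance
std = np.sqrt(variance)

# Riemann sum over +/- 10 standard deviations approximates the integral
grid = np.linspace(mean - 10 * std, mean + 10 * std, 20001)
dx = grid[1] - grid[0]
area = np.sum(p_x_given_y(grid, mean, variance)) * dx

# Density at the mean: larger than 1 because the variance is small
peak = p_x_given_y(mean, mean, variance)
print(area, peak)
```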

### Applying Classifier to the test Person

In [12]:
prior_male * \
p_x_given_y(new_person['Height'][0], male_height_mean, male_height_variance) * \
p_x_given_y(new_person['Weight'][0], male_weight_mean, male_weight_variance) * \
p_x_given_y(new_person['Foot_Size'][0], male_footsize_mean, male_footsize_variance)


6.197071843878078e-09

In [13]:
prior_female * \
p_x_given_y(new_person['Height'][0], female_height_mean, female_height_variance) * \
p_x_given_y(new_person['Weight'][0], female_weight_mean, female_weight_variance) * \
p_x_given_y(new_person['Foot_Size'][0], female_footsize_mean, female_footsize_variance)

0.0005377909183630018
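
The two cells above can be generalized into a loop over classes. The sketch below does this in log space, which avoids underflow when many small densities are multiplied. It rebuilds the data so it runs on its own. `log_gaussian`, `log_numerators`, and `numerators` are names introduced here, not part of the original notebook.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame({
    'Gender': ['male'] * 4 + ['female'] * 4,
    'Height': [6, 5.92, 5.58, 5.92, 5, 5.5, 5.42, 5.75],
    'Weight': [180, 190, 170, 165, 100, 150, 130, 150],
    'Foot_Size': [12, 11, 12, 10, 6, 8, 7, 9],
})
new_person = {'Height': 6, 'Weight': 130, 'Foot_Size': 8}
features = ['Height', 'Weight', 'Foot_Size']

means = data.groupby('Gender')[features].mean()
variances = data.groupby('Gender')[features].var()  # ddof=1, as in the walkthrough
priors = data['Gender'].value_counts(normalize=True)

def log_gaussian(x, mean, variance):
    # Logarithm of the Gaussian density
    return -0.5 * np.log(2 * np.pi * variance) - (x - mean) ** 2 / (2 * variance)

# log(prior) + sum of per-feature log densities, for each class
log_numerators = {}
for label in means.index:
    log_numerators[label] = np.log(priors[label]) + sum(
        log_gaussian(new_person[f], means.loc[label, f], variances.loc[label, f])
        for f in features
    )

# Exponentiate only to compare with the values printed above
numerators = {label: float(np.exp(v)) for label, v in log_numerators.items()}
print(numerators)
```

For classification alone, comparing the log values is enough, since the logarithm preserves ordering.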

### Since the posterior numerator for female is greater than the one for male, we predict that the new person is female
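
The two numerators are not probabilities yet. Dividing each by their sum (the evidence) gives the actual posteriors. A minimal sketch using the two values printed above, with variable names chosen here for illustration:

```python
# Posterior numerators copied from the two outputs above
numerator_male = 6.197071843878078e-09
numerator_female = 0.0005377909183630018

# The evidence is the same for both classes, so it only rescales the numerators
evidence = numerator_male + numerator_female
posterior_male = numerator_male / evidence
posterior_female = numerator_female / evidence

print('P(male | features)   = %s' % posterior_male)
print('P(female | features) = %s' % posterior_female)
```

The normalization does not change the ranking, which is why comparing numerators is sufficient for the prediction itself.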

# *Using sklearn Library*

### Initiate Classifier

In [14]:
gnb = GaussianNB()

### Defining features to be used in the prediction model

In [15]:
used_features = ['Height','Weight','Foot_Size']

### Training the Model

In [16]:
# Fit on a DataFrame so that fit and predict both receive named feature columns
gnb.fit(data[used_features], data['Gender'])

GaussianNB(priors=None)

### Predict Gender of new_person using the new Model

In [17]:
gender_pred = gnb.predict(new_person[used_features])
print(gender_pred)

['female']
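
Beyond the predicted label, `GaussianNB` can also report class probabilities through `predict_proba`, with columns ordered as in `classes_`. The sketch below rebuilds the data and model so it runs on its own. The exact probabilities may differ a little from the manual calculation, since scikit-learn estimates the variances in its own way.

```python
import pandas as pd
from sklearn.naive_bayes import GaussianNB

data = pd.DataFrame({
    'Gender': ['male'] * 4 + ['female'] * 4,
    'Height': [6, 5.92, 5.58, 5.92, 5, 5.5, 5.42, 5.75],
    'Weight': [180, 190, 170, 165, 100, 150, 130, 150],
    'Foot_Size': [12, 11, 12, 10, 6, 8, 7, 9],
})
new_person = pd.DataFrame({'Height': [6], 'Weight': [130], 'Foot_Size': [8]})
used_features = ['Height', 'Weight', 'Foot_Size']

gnb = GaussianNB()
gnb.fit(data[used_features], data['Gender'])

# One row per sample, one column per class (same order as gnb.classes_)
proba = gnb.predict_proba(new_person[used_features])[0]
for label, p in zip(gnb.classes_, proba):
    print('%s: %s' % (label, p))
```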
