# Social Network Ads Classifier Project

## Problem Statement
The aim of this project is to predict whether a user will purchase a product after seeing a social network ad, based on the user's age, gender, and estimated salary.

## Project Description
The dataset consists of the following columns:
- User ID: Unique identifier for each user
- Gender: Gender of the user
- Age: Age of the user
- EstimatedSalary: Estimated salary of the user
- Purchased: Whether the user purchased the product (binary classification target)






In [2]:
import pandas as pd
import numpy as np

In [3]:
df = pd.read_csv("Social_Network_Ads.csv")

In [4]:
df.head()

Unnamed: 0,User ID,Gender,Age,EstimatedSalary,Purchased
0,15624510,Male,19,19000,0
1,15810944,Male,35,20000,0
2,15668575,Female,26,43000,0
3,15603246,Female,27,57000,0
4,15804002,Male,19,76000,0


### Data Preprocessing
- **Handling Missing Values**: `df.isnull().any()` reports no missing values in any column.
- **Encoding Categorical Features**: The 'Gender' column is one-hot encoded with `drop_first=True`, which leaves a single numeric 'Male' indicator column (1 for male, 0 for female).
- **Feature Selection**: The 'User ID' column is dropped because it is only an identifier and does not contribute to the prediction. The original 'Gender' column is also dropped and replaced by the 'Male' indicator.
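
The preprocessing steps above can also be expressed as one call on the whole frame. The following is a minimal sketch on a hypothetical three-row frame (the rows are made up for illustration; only the column names come from the dataset). Note that passing `columns=["Gender"]` names the indicator `Gender_Male`, whereas the notebook's approach of encoding the series directly names it `Male`.

```python
import pandas as pd

# Hypothetical toy frame that reuses the dataset's column names
toy = pd.DataFrame({
    "User ID": [1, 2, 3],
    "Gender": ["Male", "Female", "Male"],
    "Age": [19, 26, 35],
    "EstimatedSalary": [19000, 43000, 20000],
    "Purchased": [0, 0, 1],
})

# Drop the identifier, then one-hot encode Gender.
# drop_first=True drops the alphabetically first category (Female),
# and dtype=int yields 0/1 instead of booleans.
encoded = pd.get_dummies(
    toy.drop(columns=["User ID"]),
    columns=["Gender"],
    drop_first=True,
    dtype=int,
)

print(encoded)
```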

In [6]:
df.isnull().any()

User ID            False
Gender             False
Age                False
EstimatedSalary    False
Purchased          False
dtype: bool

In [7]:
# dtype=int keeps the indicator as 0/1; recent pandas versions default to booleans
gender_df = pd.get_dummies(df['Gender'], drop_first=True, dtype=int)

gender_df

Unnamed: 0,Male
0,1
1,1
2,0
3,0
4,1
...,...
395,0
396,1
397,0
398,1


In [8]:
df.drop('User ID',axis=1,inplace=True)


In [9]:
df

Unnamed: 0,Gender,Age,EstimatedSalary,Purchased
0,Male,19,19000,0
1,Male,35,20000,0
2,Female,26,43000,0
3,Female,27,57000,0
4,Male,19,76000,0
...,...,...,...,...
395,Female,46,41000,1
396,Male,51,23000,1
397,Female,50,20000,1
398,Male,36,33000,0


In [10]:
df.drop('Gender',axis=1,inplace=True)

df

Unnamed: 0,Age,EstimatedSalary,Purchased
0,19,19000,0
1,35,20000,0
2,26,43000,0
3,27,57000,0
4,19,76000,0
...,...,...,...
395,46,41000,1
396,51,23000,1
397,50,20000,1
398,36,33000,0


In [11]:
df = pd.concat([df, gender_df], axis=1)

In [12]:
df

Unnamed: 0,Age,EstimatedSalary,Purchased,Male
0,19,19000,0,1
1,35,20000,0,1
2,26,43000,0,0
3,27,57000,0,0
4,19,76000,0,1
...,...,...,...,...
395,46,41000,1,0
396,51,23000,1,1
397,50,20000,1,0
398,36,33000,0,1


### Model Training
The dataset is split into training and testing sets in an 80:20 ratio, with `random_state=0` so the split is reproducible. Standard scaling is then applied so that age and salary, which differ by orders of magnitude, end up on a comparable scale. The following classifier is trained:
- **Gaussian Naive Bayes Classifier**: Trained on the standardized training data.
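
As an alternative way to organize the same steps, scaling and classification can be chained in a scikit-learn pipeline, so the scaler is fit on the training split only and reused unchanged on the test split. This is a minimal sketch on synthetic data, not the notebook's code: the feature values and the labeling rule below are assumptions made up for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the [Age, EstimatedSalary, Male] feature matrix
rng = np.random.default_rng(0)
n = 200
age = rng.integers(18, 60, size=n)
salary = rng.integers(15000, 150000, size=n)
male = rng.integers(0, 2, size=n)
X = np.column_stack([age, salary, male])

# Made-up labeling rule, only so the example has something to learn
y = ((age > 40) | (salary > 90000)).astype(int)

# Same 80:20 split and seed as in the notebook
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# fit() learns the scaling statistics from X_train only;
# score() and predict() apply those statistics to new data
model = make_pipeline(StandardScaler(), GaussianNB())
model.fit(X_train, y_train)

test_accuracy = model.score(X_test, y_test)
print(f"Test accuracy on synthetic data: {test_accuracy:.3f}")
```

A pipeline also means only one object has to be saved and loaded later, instead of a separate scaler and classifier.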





In [13]:
# Features: Age, EstimatedSalary, Male (column positions 0, 1 and 3)
x = df.iloc[:, [0, 1, 3]].values
x

array([[   19, 19000,     1],
       [   35, 20000,     1],
       [   26, 43000,     0],
       ...,
       [   50, 20000,     0],
       [   36, 33000,     1],
       [   49, 36000,     0]], dtype=int64)

In [14]:
# Target: Purchased (selected by name rather than by position)
y = df['Purchased'].values
y

array([0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0, 1,
       0, 1, 1, 1, 0, 0, 1, 1, 0, 1, 1, 0, 1, 1, 0, 1, 0, 0, 0, 1, 1, 0,
       1, 1, 0, 1, 0, 1, 0, 1, 0, 0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 1, 1, 0,
       1, 1, 0, 0, 1, 0, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 0, 1, 1, 0, 1,
       0, 1, 0, 1, 1, 1, 1, 0, 0, 0, 1, 1, 0, 1, 1,

In [15]:
from sklearn.model_selection import train_test_split

x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=0)

In [16]:
from sklearn.preprocessing import StandardScaler

In [17]:
sc = StandardScaler()

In [18]:
x_train = sc.fit_transform(x_train)

In [19]:
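# Reuse the statistics learned from the training set; refitting on the test set would leak test information
x_test = sc.transform(x_test)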

In [20]:
from sklearn.naive_bayes import GaussianNB

In [21]:
classifier = GaussianNB()

In [22]:
classifier.fit(x_train, y_train)

### Model Evaluation
The trained model is evaluated using the following metrics:
- **Accuracy Score**: Percentage of correctly classified instances in the test set.
- **Confusion Matrix**: A matrix showing the counts of true positive, true negative, false positive, and false negative predictions.
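
To make the two metrics concrete, here is a minimal sketch on hand-made labels (eight made-up values, not the project's predictions). In scikit-learn's `confusion_matrix`, rows correspond to true classes and columns to predicted classes, so for binary labels the layout is `[[TN, FP], [FN, TP]]`.

```python
from sklearn.metrics import accuracy_score, confusion_matrix

# Hypothetical labels for illustration only
y_true = [0, 0, 1, 1, 0, 1, 0, 1]
y_hat = [0, 1, 1, 1, 0, 0, 0, 1]

cm = confusion_matrix(y_true, y_hat)
acc = accuracy_score(y_true, y_hat)

# Unpack the four counts from the 2x2 matrix
tn, fp, fn, tp = cm.ravel()

print(cm)   # [[3 1]
            #  [1 3]]
print(acc)  # 0.75, i.e. (tn + tp) / total = 6 / 8
```

In the notebook, the same call would take `y_test` and `y_pred` as its arguments.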

In [25]:
y_pred = classifier.predict(x_test)

In [26]:
y_pred

array([0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 1,
       0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0,
       1, 0, 0, 1, 0, 1, 1, 0, 0, 1, 1, 1, 0, 0, 1, 0, 0, 1, 0, 1, 0, 1,
       0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 1, 1], dtype=int64)

In [27]:
from sklearn.metrics import accuracy_score,confusion_matrix

In [28]:
ac = accuracy_score(y_test,y_pred)
ac

0.9375


## Results
The Gaussian Naive Bayes classifier achieves an accuracy of 93.75% on the test set, which corresponds to 75 of the 80 test samples being classified correctly.

In [29]:
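import pickle
# Save the fitted scaler, then reload it under the same file name (names are case-sensitive on many systems)
with open('scaler.pickle', 'wb') as f:
    pickle.dump(sc, f)
with open('scaler.pickle', 'rb') as f:
    ssc = pickle.load(f)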


In [30]:
with open('nbclassifier.pkl', 'wb') as f:
    pickle.dump(classifier, f)
with open('nbclassifier.pkl', 'rb') as f:
    model = pickle.load(f)
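
Once the scaler and classifier have been reloaded, a new user can be scored by applying the saved scaler first and the saved model second. The following is a minimal sketch under stated assumptions: it trains on six made-up rows and round-trips the objects through `pickle` in memory instead of reading the files above, and the new user's values are hypothetical.

```python
import pickle

import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.preprocessing import StandardScaler

# Made-up training rows in the notebook's feature order: [Age, EstimatedSalary, Male]
X = np.array([
    [19, 19000, 1],
    [26, 43000, 0],
    [27, 57000, 0],
    [46, 41000, 0],
    [51, 23000, 1],
    [50, 120000, 0],
], dtype=float)
y = np.array([0, 0, 0, 1, 1, 1])

scaler = StandardScaler().fit(X)
clf = GaussianNB().fit(scaler.transform(X), y)

# Serialize and restore both objects, as the notebook does via files
restored_scaler = pickle.loads(pickle.dumps(scaler))
restored_clf = pickle.loads(pickle.dumps(clf))

# Hypothetical new user: 30 years old, salary 87000, male
new_user = np.array([[30, 87000, 1]], dtype=float)

# Always transform with the saved scaler, never refit it on new data
prediction = restored_clf.predict(restored_scaler.transform(new_user))
print("Predicted Purchased label:", int(prediction[0]))
```

Unpickling should only be done on files from a trusted source, since loading a pickle can execute arbitrary code.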