# About Dataset
This dataset records demographic attributes of 400 individuals together with whether each one made a purchase. Besides a `User ID` identifier, each row has the following columns:

- Gender: The gender of the individual, categorized as either "Male" or "Female."
- Age: The age of the individual, represented as an integer.
- EstimatedSalary: The estimated annual salary of the individual, expressed in monetary units.
- Purchased: A binary indicator (0 or 1) representing whether the individual made a purchase (1) or not (0).

The dataset can be used for predictive modeling of purchasing behavior, in particular how gender, age, and salary relate to the decision to purchase. Such an analysis could help marketers tailor their strategies to specific demographics.

In [194]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split 
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score
from sklearn.preprocessing import LabelEncoder

In [255]:
df = pd.read_csv("Assets/Social_Network_Ads.csv")

In [11]:
df.head()  # User ID is only an identifier, so it is not used as a feature

Unnamed: 0,User ID,Gender,Age,EstimatedSalary,Purchased
0,15624510,Male,19,19000,0
1,15810944,Male,35,20000,0
2,15668575,Female,26,43000,0
3,15603246,Female,27,57000,0
4,15804002,Male,19,76000,0


In [259]:
df.isnull().sum() #check if any value is null

User ID            0
Gender             0
Age                0
EstimatedSalary    0
Purchased          0
dtype: int64

In [13]:
df['Gender']

0        Male
1        Male
2      Female
3      Female
4        Male
        ...  
395    Female
396      Male
397    Female
398      Male
399    Female
Name: Gender, Length: 400, dtype: object

In [42]:
le = LabelEncoder()
Gender = le.fit_transform(df['Gender'])
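
`LabelEncoder` assigns integer codes in sorted order of the class labels, which is why `Female` becomes 0 and `Male` becomes 1. A minimal sketch, using a small hypothetical list of labels rather than the real column (the names `le_demo` and `train_labels` are illustrative only), of inspecting that mapping and reusing the fitted encoder on new data:

```python
from sklearn.preprocessing import LabelEncoder

# Hypothetical sample labels standing in for df['Gender'].
train_labels = ["Male", "Male", "Female", "Female", "Male"]

le_demo = LabelEncoder()
train_codes = le_demo.fit_transform(train_labels)

# classes_ holds the labels in sorted order; the position is the code.
mapping = {label: int(code) for code, label in enumerate(le_demo.classes_)}
print(mapping)

# Reuse the fitted encoder on new data with transform (not fit_transform),
# so the codes match the ones seen during training.
new_codes = le_demo.transform(["Female", "Male"])
print(new_codes.tolist())
```

With only two categories the integer codes are harmless, but for features with three or more categories they would impose an artificial order that a distance-based model such as KNN would take literally.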

In [44]:
Gender

array([1, 1, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0,
       1, 0, 1, 1, 1, 0, 1, 1, 1, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 1, 1,
       0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 1, 1, 0, 1, 0, 1,
       1, 0, 0, 0, 1, 0, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 0, 0, 1, 0,
       1, 1, 1, 0, 1, 0, 0, 0, 0, 1, 1, 0, 1, 1, 0, 0, 0, 1, 0, 1, 1, 0,
       0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 1, 0, 0, 1, 1, 1, 0, 1, 1,
       1, 0, 0, 1, 0, 1, 0, 1, 1, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 1, 1, 0,
       1, 0, 1, 1, 1, 0, 1, 1, 0, 1, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0,
       0, 1, 1, 0, 1, 0, 0, 1, 0, 1, 0, 0, 1, 1, 1, 0, 1, 1, 1, 1, 0, 0,
       1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 1, 1, 0, 1,
       0, 1, 1, 1, 0, 1, 0, 1, 0, 0, 0, 1, 1, 1, 0, 1, 1, 0, 0, 0, 1, 1,
       0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0,
       1, 0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 1, 1, 1, 0, 0, 0, 1, 1, 0, 1, 0,
       0, 0, 1, 0, 1, 1, 1, 1, 0, 0, 1, 0, 1, 1, 0,

In [46]:
Gender = pd.DataFrame(data = Gender, columns = ['la_Gender'])

In [48]:
Gender

Unnamed: 0,la_Gender
0,1
1,1
2,0
3,0
4,1
...,...
395,0
396,1
397,0
398,1


In [51]:
df = pd.concat([df, Gender], axis=1)


In [59]:
df.drop(columns=['Gender'])  # displays a copy without Gender; df itself is not modified

Unnamed: 0,User ID,Age,EstimatedSalary,Purchased,la_Gender
0,15624510,19,19000,0,1
1,15810944,35,20000,0,1
2,15668575,26,43000,0,0
3,15603246,27,57000,0,0
4,15804002,19,76000,0,1
...,...,...,...,...,...
395,15691863,46,41000,1,0
396,15706071,51,23000,1,1
397,15654296,50,20000,1,0
398,15755018,36,33000,0,1
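
The table above is a copy: `DataFrame.drop` returns a new frame and leaves `df` unchanged unless the result is assigned back. A small sketch with the first three rows of the data (the frame name `demo` is hypothetical):

```python
import pandas as pd

# First three rows from the df.head() output, plus the encoded column.
demo = pd.DataFrame({
    "Gender": ["Male", "Male", "Female"],
    "Age": [19, 35, 26],
    "la_Gender": [1, 1, 0],
})

# drop returns a new DataFrame; demo itself keeps the Gender column.
dropped = demo.drop(columns=["Gender"])
print(list(demo.columns))
print(list(dropped.columns))

# Assign the result back to make the change stick.
demo = demo.drop(columns=["Gender"])
```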


In [65]:
columns = df[["Age","EstimatedSalary","la_Gender"]]

In [71]:
X = columns

In [75]:
X.shape

(400, 3)

In [83]:
y = df["Purchased"]  # select the target by name; a positional index breaks if columns change

In [85]:
y

0      0
1      0
2      0
3      0
4      0
      ..
395    1
396    1
397    1
398    0
399    1
Name: Purchased, Length: 400, dtype: int64

In [87]:
y.shape

(400,)

In [89]:
knn = KNeighborsClassifier()
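
`KNeighborsClassifier()` with no arguments uses 5 neighbors and Euclidean distance. Because `EstimatedSalary` is in the tens of thousands while `Age` is in the tens, the salary column dominates unscaled distances. The following is a sketch, not what this notebook does, of standardizing the features before KNN. The nine rows are the ones visible in the outputs above, and `n_neighbors=3` is an arbitrary choice for such a tiny sample.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Rows visible in the notebook output: Age, EstimatedSalary, la_Gender.
X_demo = np.array([
    [19, 19000, 1], [35, 20000, 1], [26, 43000, 0],
    [27, 57000, 0], [19, 76000, 1], [46, 41000, 0],
    [51, 23000, 1], [50, 20000, 0], [36, 33000, 1],
], dtype=float)
y_demo = np.array([0, 0, 0, 0, 0, 1, 1, 1, 0])

# Unscaled Euclidean distance between rows 0 and 1 is driven almost
# entirely by the 1000-unit salary gap, not the 16-year age gap.
raw_dist = np.linalg.norm(X_demo[0] - X_demo[1])

# Standardize each column to zero mean and unit variance, then fit KNN.
scaled_knn = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=3))
scaled_knn.fit(X_demo, y_demo)
demo_pred = scaled_knn.predict(X_demo[:2])
print(round(float(raw_dist), 2), demo_pred)
```

Putting the scaler inside a pipeline means the same scaling learned from the training rows is applied automatically at prediction time.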

In [91]:
knn.fit(X, y)  # fit on all 400 rows; no data is held out for evaluation
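
Fitting on all 400 rows leaves no data for measuring how well the model generalizes. The imported but unused `train_test_split` and `accuracy_score` can cover that. A sketch under stated assumptions: the data below is randomly generated as a stand-in for `X` and `y` (it is not the real dataset, so the printed accuracy carries no meaning), and `test_size=0.25` and `random_state=0` are arbitrary choices.

```python
import numpy as np
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in with the same shape as X (400, 3) and y (400,).
rng = np.random.default_rng(0)
X_fake = np.column_stack([
    rng.integers(18, 61, size=400),          # Age
    rng.integers(15000, 150001, size=400),   # EstimatedSalary
    rng.integers(0, 2, size=400),            # la_Gender
])
y_fake = rng.integers(0, 2, size=400)        # Purchased

# Hold out 25% of the rows; stratify keeps the class balance similar.
X_train, X_val, y_train, y_val = train_test_split(
    X_fake, y_fake, test_size=0.25, random_state=0, stratify=y_fake
)

knn_eval = KNeighborsClassifier()
knn_eval.fit(X_train, y_train)

# Accuracy on rows the model never saw during fitting.
val_acc = accuracy_score(y_val, knn_eval.predict(X_val))
print(f"validation accuracy: {val_acc:.3f}")
```

On the real data, the same split would be applied to `X` and `y` before calling `fit`.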

In [141]:
df = pd.read_csv("Assets/predict_Social_Network_Ads.csv")  # df now holds the rows to predict, not the training data

In [143]:
df

Unnamed: 0,Gender,Age,EstimatedSalary,Purchased
0,Male,34,15000,
1,Male,23,30000,
2,Female,21,200000,
3,Male,22,10000,
4,Female,75,20000,
5,Female,16,150000,
6,Male,23,31000,


In [145]:
le_Gender = le.transform(df['Gender'])  # reuse the encoder fitted on the training data instead of refitting

In [147]:
la_Gender = pd.DataFrame(le_Gender, columns = ['la_Gender'])

In [149]:
la_Gender

Unnamed: 0,la_Gender
0,1
1,1
2,0
3,1
4,0
5,0
6,1


In [151]:
df = pd.concat([df, la_Gender], axis=1)


In [153]:
df

Unnamed: 0,Gender,Age,EstimatedSalary,Purchased,la_Gender
0,Male,34,15000,,1
1,Male,23,30000,,1
2,Female,21,200000,,0
3,Male,22,10000,,1
4,Female,75,20000,,0
5,Female,16,150000,,0
6,Male,23,31000,,1


In [155]:
df.drop(columns=["Gender"])  # displays a copy without Gender; df itself is not modified

Unnamed: 0,Age,EstimatedSalary,Purchased,la_Gender
0,34,15000,,1
1,23,30000,,1
2,21,200000,,0
3,22,10000,,1
4,75,20000,,0
5,16,150000,,0
6,23,31000,,1


In [161]:
X_test = df[["Age","EstimatedSalary","la_Gender"]]

In [165]:
X_test.shape

(7, 3)

In [221]:
pred = knn.predict(X_test)

In [223]:
pred

array([0, 0, 1, 0, 0, 1, 0], dtype=int64)

In [225]:
pred = pred.flatten()  # no-op here: predict already returns a 1-D array

In [243]:
pred

array([0, 0, 1, 0, 0, 1, 0], dtype=int64)

In [245]:
input_file = 'Assets/predict_Social_Network_Ads.csv'

In [247]:
#Input file
df_input = pd.read_csv(input_file)
df_input

Unnamed: 0,Gender,Age,EstimatedSalary,Purchased
0,Male,34,15000,
1,Male,23,30000,
2,Female,21,200000,
3,Male,22,10000,
4,Female,75,20000,
5,Female,16,150000,
6,Male,23,31000,


In [251]:
df_input['Purchased'] = pred
df_input

Unnamed: 0,Gender,Age,EstimatedSalary,Purchased
0,Male,34,15000,0
1,Male,23,30000,0
2,Female,21,200000,1
3,Male,22,10000,0
4,Female,75,20000,0
5,Female,16,150000,1
6,Male,23,31000,0


In [253]:
df_input.to_csv(input_file, index=False)  # note: overwrites the input file with the predictions filled in
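
Writing back to `input_file` replaces the original prediction inputs. A sketch of saving to a separate file instead; the output name `predictions.csv` and the temporary directory are hypothetical, chosen only so the example is self-contained:

```python
import tempfile
from pathlib import Path

import pandas as pd

# Two rows from the prediction file with their predicted labels.
df_out = pd.DataFrame({
    "Gender": ["Male", "Female"],
    "Age": [34, 21],
    "EstimatedSalary": [15000, 200000],
    "Purchased": [0, 1],
})

# Hypothetical output location; adjust the path as needed.
out_dir = Path(tempfile.mkdtemp())
out_path = out_dir / "predictions.csv"

# index=False keeps the row index out of the file, as in the cell above.
df_out.to_csv(out_path, index=False)

# Read the file back to confirm the round trip.
round_trip = pd.read_csv(out_path)
print(round_trip)
```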