**Step 1: Importing the libraries**

In [82]:
import pandas as pd
import seaborn as sns

**Step 2: Importing dataset**

In [83]:
df = pd.read_csv("Data.csv")
df

Unnamed: 0,Country,Age,Salary,Purchased
0,France,44.0,72000.0,No
1,Spain,27.0,48000.0,Yes
2,Germany,30.0,54000.0,No
3,Spain,38.0,61000.0,No
4,Germany,40.0,,Yes
5,France,35.0,58000.0,Yes
6,Spain,,52000.0,No
7,France,48.0,79000.0,Yes
8,Germany,50.0,83000.0,No
9,France,37.0,67000.0,Yes


In [84]:
df.dtypes

Country       object
Age          float64
Salary       float64
Purchased     object
dtype: object

**Step 3: Handling the missing data**

In [85]:
df.isnull().sum()

Country      0
Age          1
Salary       1
Purchased    0
dtype: int64

In [86]:
df['Age'] = df['Age'].fillna(df['Age'].mean())

In [87]:
df['Salary'] = df['Salary'].fillna(df['Salary'].mean())

In [88]:
df.isnull().sum()

Country      0
Age          0
Salary       0
Purchased    0
dtype: int64
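The two fillna calls above can also be written as a single step. The following is a minimal sketch on a toy frame (the name toy is hypothetical) that copies the Age and Salary values from the table, assuming every numeric column should be imputed with its own mean:

```python
import numpy as np
import pandas as pd

# Toy frame mirroring the Age and Salary columns above, including the two gaps.
toy = pd.DataFrame({
    "Age": [44, 27, 30, 38, 40, 35, np.nan, 48, 50, 37],
    "Salary": [72000, 48000, 54000, 61000, np.nan,
               58000, 52000, 79000, 83000, 67000],
})

# mean() skips NaN by default, so each column mean is taken over
# the nine observed values in that column.
means = toy.mean(numeric_only=True)

# Passing a Series to fillna fills each column with its matching entry.
filled = toy.fillna(means)

print(means.round(2))
print(filled.isnull().sum())
```

Passing the Series of means fills every numeric column at once, which may be convenient when more than two columns have gaps.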

In [89]:
df['Purchased'] = df['Purchased'].astype("str")
# Casting to int64 truncates the imputed means (38.78 -> 38, 63777.78 -> 63777)
df['Age'] = df['Age'].astype("int64")
df['Salary'] = df['Salary'].astype("int64")

In [90]:
df.dtypes

Country      object
Age           int64
Salary        int64
Purchased    object
dtype: object
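Note that casting a float column to int64 drops the fractional part rather than rounding. A small sketch of the difference, using the two imputed means from this dataset (the variable names are hypothetical):

```python
import pandas as pd

# The imputed means for Age and Salary, rounded to two decimals.
imputed = pd.Series([38.78, 63777.78])

# astype("int64") discards the fractional part.
truncated = imputed.astype("int64")

# Rounding first keeps the nearest whole number instead.
rounded = imputed.round().astype("int64")

print(truncated.tolist())  # [38, 63777]
print(rounded.tolist())    # [39, 63778]
```

Whether truncation or rounding is preferable depends on the use case; the notebook above uses truncation.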

**Step 4: Encoding categorical data**

In [91]:
df['Country'].unique()

array(['France', 'Spain', 'Germany'], dtype=object)

In [92]:
df['Purchased'].unique()

array(['No', 'Yes'], dtype=object)

**Step 5: Creating a dummy variable**

In [93]:
dff = pd.get_dummies(df)
dff

Unnamed: 0,Age,Salary,Country_France,Country_Germany,Country_Spain,Purchased_No,Purchased_Yes
0,44,72000,1,0,0,1,0
1,27,48000,0,0,1,0,1
2,30,54000,0,1,0,1,0
3,38,61000,0,0,1,1,0
4,40,63777,0,1,0,0,1
5,35,58000,1,0,0,0,1
6,38,52000,0,0,1,1,0
7,48,79000,1,0,0,0,1
8,50,83000,0,1,0,1,0
9,37,67000,1,0,0,0,1

In [94]:
df['Country'] = df['Country'].map({'France':0, 'Spain':1, 'Germany':2})
df['Purchased'] = df['Purchased'].map({'Yes':1, 'No':0})

In [95]:
df

Unnamed: 0,Country,Age,Salary,Purchased
0,0,44,72000,0
1,1,27,48000,1
2,2,30,54000,0
3,1,38,61000,0
4,2,40,63777,1
5,0,35,58000,1
6,1,38,52000,0
7,0,48,79000,1
8,2,50,83000,0
9,0,37,67000,1


The categorical columns are encoded in two ways:
1. get_dummies one-hot encodes them, and the result is stored in a separate variable (dff).
2. Because each column has only a few distinct values, the map function encodes them in place in df. This mapped version is the one used to split the data.
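The two approaches can also be combined. Mapping Country to 0, 1, 2 introduces an ordering that some models may interpret as meaningful, so one option is to one-hot encode only the feature column and map only the binary target. The sketch below uses a small hypothetical frame (toy) rather than the full dataset:

```python
import pandas as pd

# Hypothetical four-row sample with the same two categorical columns.
toy = pd.DataFrame({
    "Country": ["France", "Spain", "Germany", "Spain"],
    "Purchased": ["No", "Yes", "No", "No"],
})

# One-hot encode only the nominal feature; dtype=int yields 0/1 columns.
encoded = pd.get_dummies(toy, columns=["Country"], dtype=int)

# A binary target can safely be mapped to 0/1.
encoded["Purchased"] = encoded["Purchased"].map({"No": 0, "Yes": 1})

print(encoded)
```

Restricting get_dummies with the columns argument keeps Purchased as a single target column instead of splitting it into Purchased_No and Purchased_Yes.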

**Step 6: Splitting the dataset into training and test sets**

In [96]:
x = df[['Age', 'Salary', 'Country']].values
y = df['Purchased']
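With only ten rows, a random 80/20 split can leave the two test rows in the same class. As a sketch, and assuming a balanced test set is wanted, train_test_split accepts a stratify argument. The arrays below are stand-ins (x_demo is placeholder data, y_demo copies the encoded Purchased column):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder features: ten rows, two columns.
x_demo = np.arange(20).reshape(10, 2)

# Purchased column from the table above, encoded as No=0, Yes=1.
y_demo = np.array([0, 1, 0, 0, 1, 1, 0, 1, 0, 1])

# stratify=y_demo keeps the class proportions the same in both subsets.
x_tr, x_te, y_tr, y_te = train_test_split(
    x_demo, y_demo, test_size=0.2, random_state=0, stratify=y_demo
)

print(x_tr.shape, x_te.shape)
print(sorted(y_te.tolist()))
```

Since the target has five rows per class, stratifying gives a test set with one row from each class.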

In [100]:
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x,y, test_size = 0.2, random_state = 0)

**Step 7: Feature Scaling**

In [101]:
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
# Fit on the training features only; StandardScaler ignores the target
scaler.fit(x_train)
x_train_scaled = scaler.transform(x_train)
x_test_scaled = scaler.transform(x_test)
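To check what the scaler does, the sketch below fits it on a few training rows and applies the same statistics to a held-out row. The arrays are hypothetical samples built from Age and Salary values in the table, not the actual split produced above:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical training rows (Age, Salary) and one held-out row.
x_train_demo = np.array(
    [[44, 72000], [27, 48000], [30, 54000], [38, 61000]], dtype=float
)
x_test_demo = np.array([[50, 83000]], dtype=float)

# Learn the mean and standard deviation from the training rows only.
scaler_demo = StandardScaler().fit(x_train_demo)

# Reuse those training statistics for both sets.
x_train_demo_scaled = scaler_demo.transform(x_train_demo)
x_test_demo_scaled = scaler_demo.transform(x_test_demo)

print(x_train_demo_scaled.mean(axis=0))  # close to 0 per column
print(x_train_demo_scaled.std(axis=0))   # close to 1 per column
print(x_test_demo_scaled)
```

The scaled training columns have mean 0 and standard deviation 1 up to floating-point error, while the test row is expressed relative to the training statistics, so information from the test set does not leak into the fit.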