# Matching Feature
* The purpose of this notebook is to design a matching feature that finds the kindergartens most similar to the data entered by the user
* This feature was developed and deployed as one of many features for [Kiddy](https://github.com/MaysaM-M-Mousa/GraduationProject-Backend) graduation project
* In this feature, we use [Cosine Similarity](https://en.wikipedia.org/wiki/Cosine_similarity) to measure the similarity between kindergartens
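For reference, the cosine similarity of two feature vectors `a` and `b` is `(a · b) / (||a|| * ||b||)`, the cosine of the angle between them. The sketch below computes it by hand and checks it against scikit-learn's `cosine_similarity`; the helper `cosine` and the two vectors (values rounded from kindergarten 8 and the sample user input used later) are illustrative only.

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity


def cosine(a, b):
    """Cosine similarity: dot product divided by the product of the L2 norms."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


# Toy vectors laid out as [latitude, longitude, tuition] (illustrative values)
x = np.array([32.22, 35.24, 320.0])
y = np.array([32.25, 35.23, 330.0])

manual = cosine(x, y)
# scikit-learn expects 2-D inputs: one row per sample
library = cosine_similarity(x.reshape(1, -1), y.reshape(1, -1))[0, 0]
print(round(manual, 6), round(library, 6))
```

A value of 1 means the vectors point in the same direction; orthogonal vectors score 0.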

## Table of Contents
* EDA
* Preprocessing
* Finding Similarities
* Evaluation & Testing
* Trying New Data Input

In [1]:
# importing necessary libraries

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

from sklearn.metrics.pairwise import cosine_similarity

from sklearn.preprocessing import StandardScaler
from sklearn.preprocessing import LabelEncoder
from sklearn.preprocessing import OneHotEncoder
from sklearn.preprocessing import Normalizer

import json
import joblib

In [2]:
# raw string, so the Windows backslashes are not read as escape sequences
original_df = pd.read_csv(r'D:\Machine Data\GradProjectData\ML\kindergarten.csv')

In [3]:
# work on a copy, so the in-place drops below do not also modify original_df
df = original_df.copy()
df

,id,name,location_formatted,latitude,longitude,email,phone,country,city,website,about,createdAt,id.1,start_date,end_date,registration_expiration,name.1,tuition,createdAt.1,kindergartenId
0,2,Al- Aqsa Kindergarten,"803 Nablus, Palestinian Territory",31.952162,35.233154,Aqsa@gmail.com,594177742,Palestine,Nablus,aqsa.edu,Nice Kindergarten,9/26/2022 14:02,5,1/30/2023,2/28/2023,2/28/2023,2023 First,100,11/30/2022 18:13,2
1,9,Al-Makhfeya,"Palestine, Nablus",32.217492,35.236420,makhfeya@edu.com,45342189,Palestine,Nablus,www.jaberi.com,summary,12/1/2022 4:32,6,12/5/2022,3/15/2023,12/25/2022,2023 First,350,12/1/2022 4:33,9
2,8,Al-Jaberi Kindergarten,"Palestine, Nablus",32.221399,35.238845,jaberi@edu.com,123456789,Palestine,Nablus,www.jaberi.com,summary,12/1/2022 4:31,7,12/25/2022,4/17/2023,1/10/2023,2022-2023 First,320,12/1/2022 4:34,8
3,14,Ammany,"Jordann, Amman",31.934158,35.930048,ammany@edu.com,56497542,Jordan,Amman,www.ammany.com,summary,12/1/2022 4:43,8,12/25/2022,4/17/2023,1/10/2023,2022-2023 First,150,12/1/2022 4:46,14
4,13,Amman,"Jordan, Amman",31.899435,35.212263,amman@edu.com,56497542,Jordan,Amman,www.amman.com,summary,12/1/2022 4:40,9,1/25/2023,5/29/2023,2/10/2023,2023 First,150,12/1/2022 4:47,13
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
155,145,Ūdalah kindergarten,22348 6th Point,32.153630,35.276460,dseivertsen30@gmail.com,6589592200,Palestine,Ūdalah,,summary,3/16/2020 23:33,161,12/23/2022,4/23/2023,1/4/2023,2022 Semester,255,12/23/2022 0:00,145
156,122,Wādī as Salqā kindergarten,4 Stephen Road,31.396843,34.365268,sedmonds2d@gmail.com,8187070387,Palestine,Wādī as Salqā,,summary,12/14/2018 0:16,162,1/2/2023,5/2/2023,1/14/2023,2023 Semester,405,1/2/2023 0:00,122
157,175,Wādī Raḩḩāl kindergarten,79 Delladonna Court,31.665160,35.167270,wstoddart3u@gmail.com,3105268284,Palestine,Wādī Raḩḩāl,,summary,6/20/2018 12:10,163,12/18/2022,4/18/2023,12/30/2022,2022 Semester,230,12/18/2022 0:00,175
158,84,Yāsūf kindergarten,973 Laurel Alley,32.109371,35.239715,jpockey1b@gmail.com,7357913251,Palestine,Yāsūf,,summary,6/14/2021 21:27,164,12/12/2022,4/12/2023,12/24/2022,2022 Semester,263,12/12/2022 0:00,84


# EDA

In [4]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 160 entries, 0 to 159
Data columns (total 20 columns):
 #   Column                   Non-Null Count  Dtype  
---  ------                   --------------  -----  
 0   id                       160 non-null    int64  
 1   name                     160 non-null    object 
 2   location_formatted       160 non-null    object 
 3   latitude                 160 non-null    float64
 4   longitude                160 non-null    float64
 5   email                    160 non-null    object 
 6   phone                    160 non-null    int64  
 7   country                  160 non-null    object 
 8   city                     160 non-null    object 
 9   website                  23 non-null     object 
 10  about                    160 non-null    object 
 11  createdAt                160 non-null    object 
 12  id.1                     160 non-null    int64  
 13  start_date               160 non-null    object 
 14  end_date                 1

In [5]:
df.describe()

,id,latitude,longitude,phone,id.1,tuition,kindergartenId
count,160.0,160.0,160.0,160.0,160.0,160.0,160.0
mean,99.6875,31.995637,35.197226,4675585000.0,85.4625,328.98125,99.6875
std,52.175706,0.296952,0.292795,3001171000.0,46.395197,109.078961,52.175706
min,2.0,31.25997,34.2826,12515420.0,5.0,100.0,2.0
25%,56.75,31.876157,35.072538,2454120000.0,45.75,222.5,56.75
50%,101.5,32.023256,35.197103,4483783000.0,85.5,343.5,101.5
75%,145.25,32.214261,35.258366,7122264000.0,125.25,422.75,145.25
max,186.0,32.54346,36.085255,9947318000.0,165.0,499.0,186.0


# Preprocessing

### 1. Dropping unnecessary columns
The features we rely on to find the kindergartens most similar to the user's input are:
* `latitude` 
* `longitude`
* `country`
* `city`
* `tuition`
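Equivalently, one can keep only these five features (plus `id`) instead of listing every column to drop; selecting with a column list and calling `.copy()` also leaves the raw frame untouched. A minimal sketch on a toy two-row frame standing in for `kindergarten.csv`; `FEATURES` and `select_features` are illustrative names, not part of the notebook:

```python
import pandas as pd

FEATURES = ['id', 'latitude', 'longitude', 'country', 'city', 'tuition']


def select_features(raw: pd.DataFrame) -> pd.DataFrame:
    # Column selection plus .copy() returns a frame independent of `raw`
    return raw[FEATURES].copy()


# Toy stand-in for the raw kindergarten table (illustrative values)
raw = pd.DataFrame({
    'id': [2, 9], 'name': ['A', 'B'],
    'latitude': [31.95, 32.22], 'longitude': [35.23, 35.24],
    'country': ['Palestine', 'Palestine'], 'city': ['Nablus', 'Nablus'],
    'tuition': [100, 350], 'email': ['a@example.com', 'b@example.com'],
})

features = select_features(raw)
print(list(features.columns))
```

A keep-list also stays correct if new, irrelevant columns are later added to the export.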

In [6]:
df.drop(columns=['name', 'email', 'phone', 'website', 'about', 'createdAt', 'location_formatted',
                 'id.1', 'name.1', 'createdAt.1','kindergartenId', 'start_date', 'end_date', 'registration_expiration'],
       inplace=True)
df

,id,latitude,longitude,country,city,tuition
0,2,31.952162,35.233154,Palestine,Nablus,100
1,9,32.217492,35.236420,Palestine,Nablus,350
2,8,32.221399,35.238845,Palestine,Nablus,320
3,14,31.934158,35.930048,Jordan,Amman,150
4,13,31.899435,35.212263,Jordan,Amman,150
...,...,...,...,...,...,...
155,145,32.153630,35.276460,Palestine,Ūdalah,255
156,122,31.396843,34.365268,Palestine,Wādī as Salqā,405
157,175,31.665160,35.167270,Palestine,Wādī Raḩḩāl,230
158,84,32.109371,35.239715,Palestine,Yāsūf,263


### 2. Defining One-Hot Encoders for the `city` and `country` features

In [7]:
city_encoder = OneHotEncoder() 
country_encoder = OneHotEncoder()
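Both encoders use scikit-learn's default `handle_unknown='error'`, so a user city or country that is not among the fitted categories would make the prediction pipeline raise at `transform` time. One option, sketched below on toy cities (`Jenin` stands in for any unseen city), is `handle_unknown='ignore'`, which encodes an unknown category as an all-zeros vector so the match falls back on the remaining features:

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

train = pd.DataFrame({'city': ['Nablus', 'Amman', 'Bethlehem']})

# handle_unknown='ignore': unseen categories become all-zero rows instead of raising
encoder = OneHotEncoder(handle_unknown='ignore')
encoder.fit(train[['city']])

known = encoder.transform(pd.DataFrame({'city': ['Nablus']})).toarray()
unseen = encoder.transform(pd.DataFrame({'city': ['Jenin']})).toarray()

print(encoder.categories_[0])  # categories are stored sorted
print(known)
print(unseen)
```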

##### 2.1 Fitting city encoder to cities in our dataset

In [8]:
city_encoder.fit(df[['city']])
city_encoder.categories_

[array(['Al Buq‘ah', 'Al Bīrah', 'Al Judayrah', 'Al Jīb', 'Al Karmil',
        'Al Lubban al Gharbī', 'Al Majd', 'Al Mazra‘ah ash Sharqīyah',
        'Al Midyah', 'Al Mughayyir', 'Al Qarārah', 'Al ‘Awjā',
        'Al ‘Ayzarīyah', 'Amman', 'An Naşr', 'An Naşşārīyah',
        'Ash Shuhadā’', 'Ash Shuyūkh', 'Azun Atme', 'Aţ Ţaybah', 'Baghdad',
        'Bardalah', 'Bayt Maqdūm', 'Bayt Ta‘mar', 'Bayt Ūmmar',
        'Bayt ‘Īnūn', 'Bayt ‘Ūr at Taḩtā', 'Baytā al Fawqā', 'Bazzāryah',
        'Bethlehem', 'Bil‘īn', 'Burqah', 'Bīr Nabālā', 'Ciro',
        'Dayr Sāmit', 'Dayr al Ghuşūn', 'Dūrā al Qar‘', 'Far‘ūn', 'Faḩmah',
        'Idhnā', 'Iktābah', 'Immātīn', 'Jaba‘', 'Jifnā', 'Juḩr ad Dīk',
        'Jīt', 'Jūrat ash Sham‘ah', 'Jūrīsh', 'Kafr Dān', 'Kafr Thulth',
        'Kafr Zībād', 'Kafr ad Dīk', 'Kafr Şūr', 'Karney Shomron',
        'Khallat Şāliḩ', 'Khursā', 'Majdal Banī Fāḑil', 'Mardā',
        'Maythalūn', 'Mislīyah', 'Nablus', 'Naḩḩālīn', 'Qaffīn',
        'Qalandiyā', 'Qalqīlyah', 'Qar

##### 2.2 Transforming cities to OHE vectors

In [9]:
encoded_cities = city_encoder.transform(df[['city']]).toarray()
encoded_cities.shape

(160, 98)

In [10]:
encoded_cities

array([[0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       ...,
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.]])

##### 2.3 Fitting the country encoder to the countries in our dataset

In [11]:
country_encoder.fit(df[['country']])
country_encoder.categories_

[array(['Egypt', 'Iran', 'Iraq', 'Jordan', 'Lebanon', 'Libya', 'Oman',
        'Palestine', 'Saudi Arabia', 'Sudan'], dtype=object)]

##### 2.4 Transforming countries to OHE vectors

In [12]:
encoded_countries = country_encoder.transform(df[['country']]).toarray()
encoded_countries.shape

(160, 10)

In [13]:
encoded_countries

array([[0., 0., 0., ..., 1., 0., 0.],
       [0., 0., 0., ..., 1., 0., 0.],
       [0., 0., 0., ..., 1., 0., 0.],
       ...,
       [0., 0., 0., ..., 1., 0., 0.],
       [0., 0., 0., ..., 1., 0., 0.],
       [0., 0., 0., ..., 1., 0., 0.]])

### 3. Concatenating the city and country OHE vectors to the dataframe

In [14]:
df = pd.concat([df, pd.DataFrame(encoded_cities, columns=city_encoder.categories_)], axis=1)
df = pd.concat([df, pd.DataFrame(encoded_countries, columns=country_encoder.categories_)], axis=1)

df.drop(columns=['country', 'city'], inplace=True)
df

,id,latitude,longitude,tuition,"(Al Buq‘ah,)","(Al Bīrah,)","(Al Judayrah,)","(Al Jīb,)","(Al Karmil,)","(Al Lubban al Gharbī,)",...,"(Egypt,)","(Iran,)","(Iraq,)","(Jordan,)","(Lebanon,)","(Libya,)","(Oman,)","(Palestine,)","(Saudi Arabia,)","(Sudan,)"
0,2,31.952162,35.233154,100,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0
1,9,32.217492,35.236420,350,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0
2,8,32.221399,35.238845,320,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0
3,14,31.934158,35.930048,150,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0
4,13,31.899435,35.212263,150,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
155,145,32.153630,35.276460,255,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0
156,122,31.396843,34.365268,405,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0
157,175,31.665160,35.167270,230,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0
158,84,32.109371,35.239715,263,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0


### 4. Normalizing each sample (row) to unit L2 norm

In [15]:
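# pass a plain array: newer scikit-learn versions reject the mixed str/tuple column names
normalizer = Normalizer().fit(df.drop(columns=['id']).to_numpy())
normalizer



In [16]:
normalized_data = normalizer.transform(df.drop(columns=['id']).to_numpy())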
normalized_data



array([[0.28852177, 0.31814849, 0.90298043, ..., 0.0090298 , 0.        ,
        0.        ],
       [0.09120455, 0.09975084, 0.99081556, ..., 0.0028309 , 0.        ,
        0.        ],
       [0.09958832, 0.10891449, 0.98904029, ..., 0.00309075, 0.        ,
        0.        ],
       ...,
       [0.13484744, 0.14976133, 0.97946487, ..., 0.00425854, 0.        ,
        0.        ],
       [0.12012942, 0.13184084, 0.98395067, ..., 0.00374126, 0.        ,
        0.        ],
       [0.08431098, 0.09125224, 0.99224557, ..., 0.00259072, 0.        ,
        0.        ]])

In [17]:
normalized_data.shape

(160, 111)
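`Normalizer` divides each row (one kindergarten) by its own L2 norm, `x / ||x||`; it does not rescale feature columns. The sketch below reproduces the first three values of the first output row above (kindergarten id 2: latitude 31.952162, longitude 35.233154, tuition 100, with one active city flag and one active country flag). The other one-hot entries are zero and do not affect the norm, so they are left out here:

```python
import numpy as np
from sklearn.preprocessing import Normalizer

# [latitude, longitude, tuition, city flag, country flag] for kindergarten id 2
row = np.array([[31.952162, 35.233154, 100.0, 1.0, 1.0]])

normalized = Normalizer(norm='l2').fit_transform(row)
manual = row / np.linalg.norm(row, axis=1, keepdims=True)

print(normalized.round(6))
print(np.linalg.norm(normalized))  # unit length, up to floating-point error
```

The row norm is about 110.7, almost all of it from tuition, which is why tuition maps to about 0.903 while each one-hot flag shrinks to about 0.009.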

In [18]:
normalized_df = pd.DataFrame(normalized_data)
normalized_df = normalized_df.set_index(df['id'])
normalized_df

id,0,1,2,3,4,5,6,7,8,9,...,101,102,103,104,105,106,107,108,109,110
2,0.288522,0.318148,0.902980,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.009030,0.0,0.0
9,0.091205,0.099751,0.990816,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.002831,0.0,0.0
8,0.099588,0.108914,0.989040,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.003091,0.0,0.0
14,0.202730,0.228097,0.952256,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.006348,0.0,0.0,0.0,0.000000,0.0,0.0
13,0.202727,0.223781,0.953280,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.006355,0.0,0.0,0.0,0.000000,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
145,0.123938,0.135975,0.982914,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.003855,0.0,0.0
122,0.077016,0.084297,0.993454,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.002453,0.0,0.0
175,0.134847,0.149761,0.979465,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.004259,0.0,0.0
84,0.120129,0.131841,0.983951,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.003741,0.0,0.0


# Finding Similarity

In [19]:
similarities = pd.DataFrame(cosine_similarity(normalized_data))
similarities = similarities.set_index(df['id'])
similarities.columns = df['id']
similarities

id,2,9,8,14,13,12,59,101,116,62,...,165,33,174,87,112,145,122,175,84,91
2,1.000000,0.952788,0.956524,0.990929,0.990480,0.991145,0.943195,0.942176,0.987336,0.990165,...,0.942129,0.940526,0.962012,0.947727,0.961840,0.966606,0.946131,0.971029,0.965127,0.949359
9,0.952788,1.000000,0.999921,0.984753,0.985337,0.984562,0.999540,0.999444,0.988826,0.985811,...,0.999439,0.999265,0.999475,0.999861,0.999495,0.998765,0.999769,0.997719,0.999032,0.999932
8,0.956524,0.999921,1.000000,0.986852,0.987395,0.986678,0.999083,0.998948,0.990616,0.987838,...,0.998942,0.998708,0.999798,0.999579,0.999810,0.999306,0.999424,0.998484,0.999501,0.999714
14,0.990929,0.984753,0.986852,1.000000,0.999990,0.999919,0.979085,0.978460,0.999604,0.999894,...,0.978429,0.977456,0.989813,0.981819,0.989723,0.992128,0.980864,0.994199,0.991399,0.982779
13,0.990480,0.985337,0.987395,0.999990,1.000000,0.999905,0.979767,0.979153,0.999689,0.999916,...,0.979124,0.978160,0.990291,0.982454,0.990203,0.992547,0.981517,0.994555,0.991838,0.983401
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
145,0.966606,0.998765,0.999306,0.992128,0.992547,0.992002,0.996825,0.996578,0.994978,0.992896,...,0.996566,0.996158,0.999836,0.997834,0.999824,1.000000,0.997497,0.999823,0.999969,0.998160
122,0.946131,0.999769,0.999424,0.980864,0.981517,0.980646,0.999954,0.999923,0.985456,0.982045,...,0.999921,0.999847,0.998574,0.999981,0.998607,0.997497,1.000000,0.996073,0.997884,0.999942
175,0.971029,0.997719,0.998484,0.994199,0.994555,0.994097,0.995242,0.994941,0.996601,0.994858,...,0.994926,0.994436,0.999363,0.996500,0.999340,0.999823,0.996073,1.000000,0.999705,0.996916
84,0.965127,0.999032,0.999501,0.991399,0.991838,0.991266,0.997264,0.997034,0.994391,0.992201,...,0.997023,0.996642,0.999919,0.998194,0.999911,0.999969,0.997884,0.999705,1.000000,0.998489
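Two properties explain why every score above is 0.94 or higher. First, `cosine_similarity` already divides by the row norms, so L2-normalizing the rows beforehand leaves the similarity matrix unchanged. Second, with no column-wise scaling, tuition (hundreds) dominates the angle between rows while the one-hot flags (0 or 1) barely move it. A sketch on synthetic data (not the notebook's dataset), with columns laid out as `[latitude, longitude, tuition, city_A, city_B]`:

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.preprocessing import Normalizer

# Synthetic non-negative feature matrix with tuition-sized third column
rng = np.random.default_rng(0)
X = np.abs(rng.normal(size=(6, 5))) * np.array([30.0, 35.0, 300.0, 1.0, 1.0])

# 1) Pre-normalizing rows does not change cosine similarity
raw_sim = cosine_similarity(X)
normalized_sim = cosine_similarity(Normalizer().fit_transform(X))
print(np.allclose(raw_sim, normalized_sim))

# 2) Tuition dominates: compare a tuition change with a city change
base = np.array([[32.2, 35.2, 200.0, 1.0, 0.0]])
other_fee = np.array([[32.2, 35.2, 300.0, 1.0, 0.0]])   # same city, tuition +100
other_city = np.array([[32.2, 35.2, 200.0, 0.0, 1.0]])  # other city, same tuition
sim_fee = cosine_similarity(base, other_fee)[0, 0]
sim_city = cosine_similarity(base, other_city)[0, 0]
print(round(sim_fee, 5), round(sim_city, 5))
```

In this toy case a kindergarten in another city with the same tuition outranks one in the same city whose tuition differs by 100, which matches the mix of cities in the results below.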


# Evaluation & Testing Results

In [20]:
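def find_most_similar(kindergartenId, topN):
    # pair every kindergarten id with its similarity to the target kindergarten
    target = list(zip(df['id'], similarities[kindergartenId]))
    # sort by similarity, highest first; the target itself is included (similarity 1.0)
    sorted_target = sorted(target, key=lambda x: x[1], reverse=True)
    list_of_ids = [i[0] for i in sorted_target[:topN]]
    # isin() keeps the dataset's row order, so the returned rows are not ranked by similarity
    return original_df[original_df['id'].isin(list_of_ids)][['id','latitude', 'longitude', 'country', 'city', 'tuition']]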

In [21]:
find_most_similar(67, 10)

,id,latitude,longitude,country,city,tuition
12,67,31.850989,35.183103,Palestine,Al Jīb,190
46,138,31.557252,34.979243,Palestine,Idhnā,200
56,38,32.18966,34.97063,Palestine,Qalqīlyah,195
60,53,32.358901,35.246147,Palestine,Şānūr,186
64,74,32.353463,35.074453,Palestine,Dayr al Ghuşūn,197
70,168,32.16523,34.97748,Palestine,Ḩablah,198
74,20,31.899435,35.212263,Iraq,Ciro,190
104,71,31.963463,35.215092,Palestine,Jifnā,197
117,58,32.029973,35.018973,Palestine,Rantīs,184
131,152,32.421701,35.386433,Palestine,Al Mughayyir,184
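Note that the result above includes kindergarten 67 itself and lists rows in dataset order rather than by score. A minimal sketch of a ranked variant that excludes the query, assuming a similarity frame indexed and labelled by id as in `similarities`; `rank_most_similar` is a hypothetical helper and the 3x3 matrix reuses three scores from the table above:

```python
import pandas as pd


def rank_most_similar(similarities: pd.DataFrame, kindergarten_id, top_n: int) -> pd.Series:
    """Hypothetical helper: the top_n most similar ids, best first, excluding the query."""
    scores = similarities[kindergarten_id].drop(index=kindergarten_id)
    return scores.nlargest(top_n)


# Toy similarity matrix for ids 2, 9 and 8 (scores taken from the table above)
ids = [2, 9, 8]
sim = pd.DataFrame([[1.0, 0.952788, 0.956524],
                    [0.952788, 1.0, 0.999921],
                    [0.956524, 0.999921, 1.0]], index=ids, columns=ids)

ranked = rank_most_similar(sim, 9, 2)
print(ranked)
```

To display the matching rows in ranked order, one could then index `original_df.set_index('id').loc[ranked.index]`.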


# Trying New Data Input (Pipeline)

In [22]:
def find_top_N_similar(user_input, topN):
    # store the user input in a one-row dataframe for easier processing
    user_input_df = pd.DataFrame(user_input, index=[0])
    
    # one-hot encode the city and country columns of user_input
    user_input_city_encoded = city_encoder.transform(user_input_df[['city']]).toarray()
    user_input_country_encoded = country_encoder.transform(user_input_df[['country']]).toarray()
    
    # append the encoded city and country columns to the user_input dataframe
    user_input_df = pd.concat([user_input_df, pd.DataFrame(user_input_city_encoded, columns=city_encoder.categories_)], axis=1)
    user_input_df = pd.concat([user_input_df, pd.DataFrame(user_input_country_encoded, columns=country_encoder.categories_)], axis=1)

    # drop the original city and country columns
    user_input_df.drop(columns=['city', 'country'], inplace=True)
    
    # normalize user_input to unit L2 norm; a plain array sidesteps
    # scikit-learn's feature-name check on the mixed str/tuple column names
    user_input_scaled = normalizer.transform(user_input_df.to_numpy())
    
    # compute the similarity between every pre-normalized kindergarten and the processed user_input
    similarity = pd.DataFrame(cosine_similarity(user_input_scaled, normalized_df).reshape(-1, 1), columns=['similarity'])
    similarity['id'] = df['id']
    
    # sort the result in descending order of similarity
    similarity = similarity.sort_values('similarity', ascending=False)
    
    # get the ids of the top N kindergartens most similar to user_input;
    # .iloc keeps the slice positional (recent pandas versions warn about plain [:topN] on an integer index)
    ids_to_return = similarity['id'].iloc[:topN].values
    
    return original_df[original_df['id'].isin(ids_to_return)][['id','latitude', 'longitude', 'country', 'city', 'tuition']]
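One alternative design, not the deployed one, is to bundle all preprocessing in a single scikit-learn `ColumnTransformer`: `StandardScaler` (imported in the first cell but unused) gives each numeric column zero mean and unit variance, so tuition no longer outweighs the coordinates and flags, and `OneHotEncoder(handle_unknown='ignore')` tolerates unseen cities. Whether this yields better matches for Kiddy's users is untested here. A sketch on a toy four-row table (values rounded from the rows shown earlier); `preprocess` and `top_n_similar` are hypothetical names:

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.preprocessing import OneHotEncoder, StandardScaler

NUMERIC = ['latitude', 'longitude', 'tuition']
CATEGORICAL = ['country', 'city']

# Toy stand-in for the kindergarten table
kindergartens = pd.DataFrame({
    'id': [2, 9, 8, 14],
    'latitude': [31.95, 32.22, 32.22, 31.93],
    'longitude': [35.23, 35.24, 35.24, 35.93],
    'country': ['Palestine', 'Palestine', 'Palestine', 'Jordan'],
    'city': ['Nablus', 'Nablus', 'Nablus', 'Amman'],
    'tuition': [100, 350, 320, 150],
})

# Standardize numeric columns; one-hot encode categorical ones (unseen -> all zeros)
preprocess = ColumnTransformer(
    [('numeric', StandardScaler(), NUMERIC),
     ('categorical', OneHotEncoder(handle_unknown='ignore'), CATEGORICAL)],
    sparse_threshold=0.0,  # always return a dense array
)
features = preprocess.fit_transform(kindergartens[NUMERIC + CATEGORICAL])


def top_n_similar(user_input: dict, top_n: int) -> pd.DataFrame:
    """Hypothetical replacement for find_top_N_similar, ranked best first."""
    query = preprocess.transform(pd.DataFrame([user_input])[NUMERIC + CATEGORICAL])
    scores = cosine_similarity(query, features)[0]
    order = np.argsort(scores)[::-1][:top_n]
    return kindergartens.iloc[order].assign(similarity=scores[order])


result = top_n_similar({'latitude': 32.25, 'longitude': 35.23, 'country': 'Palestine',
                        'city': 'Nablus', 'tuition': 330}, top_n=2)
print(result[['id', 'city', 'tuition', 'similarity']])
```

With one fitted object, only one artifact would need to be saved instead of two encoders plus a normalizer.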

In [23]:
user_input = {
    'latitude': 32.2458139,
    'longitude': 35.227928,
    'country': 'Palestine',
    'city': 'Nablus',
    'tuition': 330
}

In [24]:
find_top_N_similar(user_input, 10)


,id,latitude,longitude,country,city,tuition
2,8,32.221399,35.238845,Palestine,Nablus,320
29,181,31.851182,35.200835,Palestine,Bīr Nabālā,327
42,88,31.89609,35.08178,Palestine,Bayt ‘Ūr at Taḩtā,329
45,45,31.896059,35.254768,Palestine,Burqah,327
58,83,32.12214,35.17173,Palestine,Qīrah,325
68,129,32.38291,35.17912,Palestine,Faḩmah,337
76,164,32.213184,35.170689,Palestine,Jīt,337
82,5,32.04,35.98,Palestine,Nablus,340
102,184,31.896059,35.254768,Palestine,Burqah,333
112,161,32.38613,35.2878,Palestine,Mislīyah,335


# Saving Models

##### Saving city encoder

In [25]:
model_name = 'city_encoder.sav'
joblib.dump(city_encoder, model_name)

['city_encoder.sav']

##### Saving country encoder

In [26]:
model_name = 'country_encoder.sav'
joblib.dump(country_encoder, model_name)

['country_encoder.sav']

##### Saving normalizer

In [27]:
model_name = 'normalizer.sav'
joblib.dump(normalizer, model_name)

['normalizer.sav']

##### Saving scaled dataframe

In [28]:
normalized_df.to_csv('normalized_df.csv')
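On the serving side, these artifacts would be loaded back before handling requests; how the Kiddy backend actually does this is not shown here. A sketch of the round trip, assuming `joblib.load` for the estimators and `pd.read_csv(..., index_col='id')` to restore the id index; small stand-in objects (`demo_*`) and a temporary folder replace the real files. Pickled scikit-learn objects should be loaded with the same scikit-learn version that saved them.

```python
import os
import tempfile

import joblib
import numpy as np
import pandas as pd
from sklearn.preprocessing import Normalizer, OneHotEncoder

with tempfile.TemporaryDirectory() as folder:
    # Stand-ins for the fitted objects and the normalized dataframe saved above
    demo_encoder = OneHotEncoder().fit(pd.DataFrame({'city': ['Nablus', 'Amman']}))
    demo_normalizer = Normalizer().fit(np.ones((1, 2)))
    demo_df = pd.DataFrame(np.eye(2), index=pd.Index([2, 9], name='id'))

    joblib.dump(demo_encoder, os.path.join(folder, 'city_encoder.sav'))
    joblib.dump(demo_normalizer, os.path.join(folder, 'normalizer.sav'))
    demo_df.to_csv(os.path.join(folder, 'normalized_df.csv'))

    # Loading side: joblib.load restores the estimators;
    # index_col='id' turns the saved id column back into the index
    loaded_encoder = joblib.load(os.path.join(folder, 'city_encoder.sav'))
    loaded_normalizer = joblib.load(os.path.join(folder, 'normalizer.sav'))
    loaded_df = pd.read_csv(os.path.join(folder, 'normalized_df.csv'), index_col='id')

print(list(loaded_encoder.categories_[0]))
print(loaded_df.index.tolist())
```

The CSV column labels come back as strings (`'0'`, `'1'`, ...), which does not matter when the values are passed to `cosine_similarity`.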

#### Coded by Maysam M. Mousa