
# **Data Wrangling with Pandas**

Import libraries

In [79]:
import pandas as pd

Read the dataset with pandas and preview the first five rows

In [80]:

shopping_customer = pd.read_csv("Shopping_CustomerData.csv")
print(shopping_customer.head())

   CustomerID CustomerGender  ...  SpendingScore CustomerCityID
0        1001           Male  ...             78              1
1        1002           Male  ...             63              1
2        1003         Female  ...             69              4
3        1004         Female  ...             30              1
4        1005         Female  ...              6              1

[5 rows x 8 columns]


Check the dataset's column names, non-null counts, and dtypes

In [81]:
shopping_customer.info()  # info() prints the summary itself and returns None, so no print() is needed

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 200 entries, 0 to 199
Data columns (total 8 columns):
 #   Column          Non-Null Count  Dtype  
---  ------          --------------  -----  
 0   CustomerID      200 non-null    int64  
 1   CustomerGender  200 non-null    object 
 2   CustomerAge     200 non-null    int64  
 3   CustomerCity    200 non-null    object 
 4   AnnualIncome    200 non-null    float64
 5   CreditScore     200 non-null    int64  
 6   SpendingScore   200 non-null    int64  
 7   CustomerCityID  200 non-null    int64  
dtypes: float64(1), int64(5), object(2)
memory usage: 12.6+ KB


Check for missing values (every column reports zero)

In [82]:
print(shopping_customer.isnull().sum())

CustomerID        0
CustomerGender    0
CustomerAge       0
CustomerCity      0
AnnualIncome      0
CreditScore       0
SpendingScore     0
CustomerCityID    0
dtype: int64
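
The dataset above has no gaps, so the check returns zeros everywhere. As a rough illustration of what the same call reports when values are missing, here is a minimal sketch on a made-up frame. The column names are borrowed from the dataset, but the rows and the NaN values are invented for the example.

```python
import pandas as pd

# Made-up rows; the NaN entries are invented purely for illustration.
toy = pd.DataFrame(
    {
        "CustomerAge": [49.0, float("nan"), 54.0],
        "AnnualIncome": [50000.0, 42000.0, float("nan")],
    }
)

# Count missing entries per column, as in the cell above.
missing_per_column = toy.isnull().sum()
print(missing_per_column)

# Select the rows that contain at least one missing value.
rows_with_gaps = toy[toy.isnull().any(axis=1)]
print(rows_with_gaps)

# One possible follow-up: keep only the fully populated rows.
complete = toy.dropna()
print(complete)
```

Whether to drop or fill such rows depends on the analysis; `dropna()` is shown here only because it is the simplest option.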


Statistical summary of the numeric columns

In [83]:
print(shopping_customer.describe(exclude=['O']))

        CustomerID  CustomerAge  ...  SpendingScore  CustomerCityID
count   200.000000   200.000000  ...      200.00000      200.000000
mean   1100.500000    45.520000  ...       50.70500        2.850000
std      57.879185    16.113592  ...       28.72269        1.475938
min    1001.000000    18.000000  ...        2.00000        1.000000
25%    1050.750000    31.750000  ...       27.75000        1.750000
50%    1100.500000    46.500000  ...       48.00000        3.000000
75%    1150.250000    59.000000  ...       77.00000        4.000000
max    1200.000000    75.000000  ...      100.00000        5.000000

[8 rows x 6 columns]
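
The `...` in the summary above means pandas elided the middle columns to fit the display width. A minimal sketch of one way to show every column, using `pd.option_context` on a made-up wide frame (the `feature_*` names and values are hypothetical):

```python
import pandas as pd

# Toy frame (made-up values) with 30 columns, wide enough that pandas
# may elide some of them under its default display settings.
wide = pd.DataFrame({f"feature_{i}": [float(i), float(i) + 1.0] for i in range(30)})

# Temporarily lift the column limit; the setting reverts when the block exits.
with pd.option_context("display.max_columns", None):
    full_summary = repr(wide.describe())

print(full_summary)
```

The same context manager could wrap `print(shopping_customer.describe(exclude=['O']))` to reveal the hidden columns without changing the notebook's global display options.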


Statistical summary of the categorical (object) columns

In [84]:
print(shopping_customer.describe(include=['O']))

       CustomerGender CustomerCity
count             200          200
unique              2            5
top            Female    Bengaluru
freq              112           50


# **Data normalization with MinMax**

In [85]:
import numpy as np
from sklearn import preprocessing

In [86]:
array = shopping_customer.values
X = array[:,4:8] # columns to scale: AnnualIncome, CreditScore, SpendingScore, CustomerCityID
Y = array[:,0:4] # columns left unscaled: CustomerID, CustomerGender, CustomerAge, CustomerCity

# Rebuild the frame with readable names; the labels follow the CSV column order.
shopping_customer = pd.DataFrame(
    {
    'Customer ID' : array[:,0],
    'Customer Gender' : array[:,1],
    'Customer Age' : array[:,2],
    'Customer City' : array[:,3],
    'Annual Income' : array[:,4],
    'Credit Score' : array[:,5],
    'Spending Score' : array[:,6],
    'Customer City ID' : array[:,7]
    }
    )
print("dataset before normalization :")
print(shopping_customer.head(10))

min_max_scaler = preprocessing.MinMaxScaler(feature_range=(0,1)) # scale each column to [0, 1]
data = min_max_scaler.fit_transform(X) # fit the scaler on X, then transform X
# Scaled columns first, then the unscaled columns from Y.
shopping_customer = pd.DataFrame(
    {
    'Annual Income' : data[:,0],
    'Credit Score' : data[:,1],
    'Spending Score' : data[:,2],
    'Customer City ID' : data[:,3],
    'Customer ID' : Y[:,0],
    'Customer Gender' : Y[:,1],
    'Customer Age' : Y[:,2],
    'Customer City' : Y[:,3]
    }
    )

print("dataset after normalization :")
print(shopping_customer.head(10))

dataset before normalization :
  Customer ID Customer Gender  ... Spending Score Customer City ID
0        1001            Male  ...             78                1
1        1002            Male  ...             63                1
2        1003          Female  ...             69                4
3        1004          Female  ...             30                1
4        1005          Female  ...              6                1
5        1006          Female  ...             97                2
6        1007          Female  ...              2                2
7        1008          Female  ...             77                5
8        1009            Male  ...             22                2
9        1010          Female  ...             97                3

[10 rows x 8 columns]
dataset after normalization :
   Annual Income  Credit Score  ...  Customer Age  Customer City
0       0.757719      0.294798  ...            49      Bengaluru
1       0.295262      0.228324  ...            59      Bengaluru
2       0.233602
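
Min-max scaling maps each value linearly into the target range using x' = (x - min) / (max - min), computed per column. The sketch below applies that formula in plain Python to the first five `SpendingScore` values from the `head()` output. The column's minimum (2) and maximum (100) from `describe()` are appended so the scaling matches a fit on the full column. The helper name `min_max_scale` is hypothetical, and this is an illustration of the formula, not the scikit-learn implementation.

```python
def min_max_scale(values, feature_range=(0.0, 1.0)):
    """Rescale a list of numbers linearly into feature_range."""
    lo, hi = feature_range
    v_min, v_max = min(values), max(values)
    span = v_max - v_min
    if span == 0:
        # A constant column has no spread; this sketch maps it to the lower bound.
        return [lo for _ in values]
    return [lo + (v - v_min) / span * (hi - lo) for v in values]

# First five SpendingScore values, followed by the column min (2) and max (100).
spending = [78, 63, 69, 30, 6, 2, 100]
scaled = min_max_scale(spending)
print([round(s, 4) for s in scaled])
```

For example, a score of 78 becomes (78 - 2) / (100 - 2) = 76 / 98, roughly 0.7755.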

# **Shopping Index**

In [87]:
shopping_index = pd.read_csv("Shopping_ShoppingIndexData.csv")
print(shopping_index.head())

   Bengaluru  Chennai  Kolkata  Delhi  Mumbai  ShoppingIndex
0        180      159      204    134     339             92
1        121      336      419    419     346             90
2        153      242      399    378     107             88
3        163      156      186    364     419             87
4        404      115      133    426     353             85
