# Customer Segmentation 

Customer segmentation is an extremely popular technique in customer analytics. It aims to divide customers into groups with distinct behaviours so that each group can be treated as a class. This is a powerful concept.

Treating customers by segment can lead to better customer engagement, a better customer experience and lower costs for the business.

In [1]:
import logging
from datetime import datetime
import pandas as pd
import numpy as np
import tensorflow as tf
from plotly.offline import download_plotlyjs, init_notebook_mode, plot, iplot
import plotly.graph_objs as go
from scipy import stats

init_notebook_mode(connected=True)
np.set_printoptions(suppress=True)

logger = logging.getLogger('tensorflow')
logger.setLevel(logging.DEBUG)

In [2]:
city = pd.read_csv("data/city.csv", names = ['CityID', 'CityName', 'Zipcode', 'CountryID'], header = 0)
country = pd.read_csv("data/country.csv" , names = ['CountryID', 'CountryName', 'CountryCode'], header = 0)
customer = pd.read_csv("data/customer.csv", names = ['CustomerID', 'FirstName', 'MiddleInitial', 'LastName',
       'CityID', 'Address'], header = 0)
product = pd.read_csv("data/product.csv", names = ['ProductID', 'ProductName', 'Price', 'CategoryID', 'Class',
       'ModifyDate', 'Resistant', 'IsAllergic', 'VitalityDays'], header = 0)
product_category = pd.read_csv("data/product_category.csv", names = ['CategoryID', 'CategoryName'], header = 0)
staff = pd.read_csv("data/staff.csv", names = ['EmployeeID', 'FirstName', 'MiddleInitial', 'LastName',
       'BirthDate', 'Gender', 'CityID', 'HireDate'], header = 0)
transaction = pd.read_csv("data/transaction.csv", names = ['SalesID', 'SalesPersonID', 'CustomerID', 'ProductID',
       'Quantity', 'Discount', 'TotalPrice', 'SalesDate',
       'TransactionNumber'], header = 0)

In [3]:
city.head()

Unnamed: 0,CityID,CityName,Zipcode,CountryID
0,1,Dayton,80563,32
1,2,Buffalo,17420,32
2,3,Chicago,44751,32
3,4,Fremont,20641,32
4,5,Virginia Beach,62389,32


In [4]:
country.head()

Unnamed: 0,CountryID,CountryName,CountryCode
0,1,Armenia,AN
1,2,Canada,FO
2,3,Belize,MK
3,4,Uganda,LV
4,5,Thailand,VI


In [5]:
customer.head()

Unnamed: 0,CustomerID,FirstName,MiddleInitial,LastName,CityID,Address
0,1,Stefanie,Y,Frye,79,97 Oak Avenue
1,2,Sandy,T,Kirby,96,52 White First Freeway
2,3,Lee,T,Zhang,55,921 White Fabien Avenue
3,4,Regina,S,Avery,40,75 Old Avenue
4,5,Daniel,S,Mccann,2,283 South Green Hague Avenue


In [6]:
product.head()

Unnamed: 0,ProductID,ProductName,Price,CategoryID,Class,ModifyDate,Resistant,IsAllergic,VitalityDays
0,1,Flour - Whole Wheat,742988,3,Medium,2018-02-16 08:21:49.190,Durable,,
1,2,Cookie Chocolate Chip With,912329,3,Medium,2017-02-12 11:39:10.970,,,
2,3,Onions - Cippolini,91379,9,Medium,2018-03-15 08:11:51.560,Weak,False,111.0
3,4,"Sauce - Gravy, Au Jus, Mix",543055,9,Medium,2017-07-16 00:46:28.880,Durable,,
4,5,Artichokes - Jerusalem,654771,2,Low,2017-08-16 14:13:35.430,Durable,True,27.0


In [7]:
product_category.head()

Unnamed: 0,CategoryID,CategoryName
0,1,Confections
1,2,Shell fish
2,3,Cereals
3,4,Dairy
4,5,Beverages


In [12]:
transaction.head()

Unnamed: 0,SalesID,SalesPersonID,CustomerID,ProductID,Quantity,Discount,TotalPrice,SalesDate,TransactionNumber
0,1,6,27039,381,7,,0,2018-02-05 07:38:25.430,FQL4S94E4ME1EZFTG42G
1,2,16,25011,61,7,,0,2018-02-02 16:03:31.150,12UGLX40DJ1A5DTFBHB8
2,3,13,94024,23,24,,0,2018-05-03 19:31:56.880,5DT8RCPL87KI5EORO7B0
3,4,8,73966,176,19,0.2,0,2018-04-07 14:43:55.420,R3DR9MLD5NR76VO17ULE
4,5,10,32653,310,9,,0,2018-02-12 15:37:03.940,4BGS0Z5OMAZ8NDAFHHP3


In [9]:
staff.head()

Unnamed: 0,EmployeeID,FirstName,MiddleInitial,LastName,BirthDate,Gender,CityID,HireDate
0,1,Nicole,T,Fuller,1981-03-07 00:00:00.000,F,80,2011-06-20 07:15:36.920
1,2,Christine,W,Palmer,1968-01-25 00:00:00.000,F,4,2011-04-27 04:07:56.930
2,3,Pablo,Y,Cline,1963-02-09 00:00:00.000,M,70,2012-03-30 18:55:23.270
3,4,Darnell,O,Nielsen,1989-02-06 00:00:00.000,M,39,2014-03-06 06:55:02.780
4,5,Desiree,L,Stuart,1963-05-03 00:00:00.000,F,23,2014-11-16 22:59:54.720


## Building the feature vector

In order to categorise the customers, we need to build a semantic feature vector. This essentially means deciding the context over which you want to build your clusters: geography, purchase behaviour, and so on.

Given that this is a made-up example, let's build a feature vector that uses all the information we have. This may not be the right thing to do, and design decisions such as these should be discussed in detail with the team who is interested in building these clusters.

Start from the transaction dataset; these features will hopefully correspond to purchasing behaviour. Then we can join the customer data to the transactions.
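
The two steps above (deriving per-customer features from the transactions, then joining the customer data) might be sketched as follows. This is only an illustration, not the notebook's actual feature set: the small frames are made-up stand-ins that reuse the notebook's column names, the feature names (`n_purchases`, `total_quantity`, `n_distinct_products`, `mean_discount`, `last_purchase`) are hypothetical, and treating a missing `Discount` as zero is an assumption that should be confirmed with the business.

```python
import pandas as pd

# Made-up stand-ins for the notebook's `transaction` and `customer` frames.
# Only the column names come from the notebook; the values are illustrative.
transaction = pd.DataFrame({
    "SalesID": [1, 2, 3, 4],
    "CustomerID": [1, 1, 2, 3],
    "ProductID": [381, 61, 23, 176],
    "Quantity": [7, 3, 24, 19],
    "Discount": [None, 0.1, None, 0.2],
    "SalesDate": [
        "2018-02-05 07:38:25.430",
        "2018-02-02 16:03:31.150",
        "2018-05-03 19:31:56.880",
        "2018-04-07 14:43:55.420",
    ],
})
customer = pd.DataFrame({
    "CustomerID": [1, 2, 3],
    "CityID": [79, 96, 55],
})

# Assumption: a missing Discount means "no discount", so fill it with 0.
prepared = transaction.assign(
    Discount=transaction["Discount"].fillna(0.0),
    SalesDate=pd.to_datetime(transaction["SalesDate"]),
)

# One row per customer, summarising purchasing behaviour.
features = (
    prepared.groupby("CustomerID")
    .agg(
        n_purchases=("SalesID", "count"),
        total_quantity=("Quantity", "sum"),
        n_distinct_products=("ProductID", "nunique"),
        mean_discount=("Discount", "mean"),
        last_purchase=("SalesDate", "max"),
    )
    .reset_index()
)

# Join the customer attributes onto the behavioural features.
customer_features = features.merge(
    customer[["CustomerID", "CityID"]], on="CustomerID", how="left"
)
print(customer_features)
```

A left join from the aggregated transactions keeps only customers who have at least one purchase; starting from `customer` instead would also keep customers with no transactions, whose behavioural features would then be missing.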

## Please describe how you would engage with the business to clarify requirements.
