# Customer Intelligence

Customer intelligence is quite a large domain, and is studied extensively in both industry and academia. The whole field can often be traced back to one need at its inception.

How can we understand our customer better?

Data science teams in industry are often faced with this question, and they tackle it in a wide variety of ways. However, it is important to note that this question is almost impossible to answer in practice, for two reasons.

* What does *better* mean?
Which metric tells you that you understand the customer better? Is it measurable? Is the definition of *better* consistent throughout the organisation? Is it about sales? Perhaps it's about how your brand is perceived? Or maybe it's about developing more suitable products?
* Better is a rolling goalpost.
Understanding customers better is always a moving target. You can always understand better; you can always look at the customer cake from different angles.

Enough theorising, let's answer some questions.

## 1. What are the possible scenarios in which the data could be used?

Even without diving into the data, we can generate scenarios in which this data may come in handy. Let's discuss some of these before deep diving into some exploratory data analysis to really demonstrate **how** it is done.

### Business intelligence

This type of data is most often used to develop BI metrics. The primary focus of BI is to give operational teams better situational awareness. Much of the data in question here would be usable at all levels of the business. For example, sales metrics are important at the highest level of the business for determining business strategy.

Business intelligence should give an instant snapshot of the current state of your business. Let's define some potential BI metrics that an e-commerce retailer may derive from the data feed given.

1. Sales variance (YoY, QoQ, or various other facets)
2. Product demand (which products are hottest, again with multiple facets depending on need)
3. Category demand (similar to above)
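
As a rough illustration of the first two metrics, here is a minimal sketch in pandas. It assumes transaction columns named `SalesDate`, `ProductID` and `Quantity`; the helper names `period_over_period` and `top_products` and the toy data are hypothetical, and units sold stand in for revenue purely for illustration.

```python
import pandas as pd


def period_over_period(df, date_col="SalesDate", value_col="Quantity", freq="Q"):
    """Total of value_col per period and its change versus the previous period."""
    periods = pd.to_datetime(df[date_col]).dt.to_period(freq)
    totals = df.groupby(periods)[value_col].sum()
    return pd.DataFrame({"total": totals, "pct_change": totals.pct_change()})


def top_products(df, n=3, product_col="ProductID", value_col="Quantity"):
    """The n products with the highest total of value_col."""
    return df.groupby(product_col)[value_col].sum().nlargest(n)


# Toy data, not taken from the real feed.
toy = pd.DataFrame({
    "SalesDate": ["2018-01-15", "2018-02-20", "2018-04-10", "2018-05-05"],
    "ProductID": [381, 61, 381, 23],
    "Quantity": [10, 10, 15, 15],
})

qoq = period_over_period(toy)      # quarter-over-quarter totals and change
hottest = top_products(toy, n=2)   # two best-selling products by units
```

Passing `freq="Y"` would give a year-over-year view, provided the data spans more than one year.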

### QA and Analytics

Question answering, usually through descriptive analytics, is the most common use of data warehouses. After all, they're placed in OLAP databases so you can do, well, **analytics**. In my previous experience on the "data science" teams of non-data businesses, QA and analytics seems to get the most attention.

Analytics takes BI one step further. While BI is about seeing the state of your business at any given time, analytics is about helping the decision-making process. Let's look at some of the questions that may arise at an e-commerce retailer and how the given data may be used to aid the decision-making process.

#### Scenario: Ad campaign
Ad campaigns have become both easier and harder to carry out. They are easier in the sense that the delivery medium has started to become dominated by social media platforms, so the traditional question of how to split the budget has become less important.

1. What are the possible scenarios in which the data could be used?
2. Use one of these scenarios to develop a model of your choice, using the tools of your choice.
    * Please describe how you would engage with the business to clarify requirements.
    * Show thinking behind setting up data model and how this is designed.
    * Prepare a presentation of your findings to present to the business stakeholders


In [20]:
import os
import pandas as pd
from plotly.offline import download_plotlyjs, init_notebook_mode, plot, iplot
from datetime import datetime
import plotly.graph_objs as go
from scipy import stats

init_notebook_mode(connected=True)

In [2]:
city = pd.read_csv("data/city.csv")
country = pd.read_csv("data/country.csv")
customer = pd.read_csv("data/customer.csv")
product = pd.read_csv("data/product.csv")
product_category = pd.read_csv("data/product_category.csv")
staff = pd.read_csv("data/staff.csv")
transaction = pd.read_csv("data/transaction.csv").rename(columns={'\ufeffSalesID':'SalesID'}) # Column rename.
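
The `\ufeffSalesID` rename is needed because the file starts with a UTF-8 byte order mark (BOM). One alternative, sketched below on an in-memory stand-in for the file, is to let the `utf-8-sig` codec strip the BOM while reading. This is only a sketch and assumes the file is UTF-8 encoded; the bytes in `raw` are made up.

```python
import io

import pandas as pd

# In-memory stand-in for a CSV that begins with a UTF-8 BOM (bytes EF BB BF).
raw = b"\xef\xbb\xbfSalesID,Quantity\n1,7\n2,7\n"

# The utf-8-sig codec drops a leading BOM, so no column rename is needed.
frame = pd.read_csv(io.BytesIO(raw), encoding="utf-8-sig")
print(list(frame.columns))
```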

In [3]:
print(city.shape
      , country.shape
      , customer.shape
      , product.shape
      , product_category.shape
      , staff.shape
      , transaction.shape)

(96, 4) (206, 3) (98759, 6) (452, 9) (11, 2) (23, 8) (1000000, 9)


In [4]:
transaction.head()

Unnamed: 0,SalesID,SalesPersonID,CustomerID,ProductID,Quantity,Discount,TotalPrice,SalesDate,TransactionNumber
0,1,6,27039,381,7,,0,2018-02-05 07:38:25.430,FQL4S94E4ME1EZFTG42G
1,2,16,25011,61,7,,0,2018-02-02 16:03:31.150,12UGLX40DJ1A5DTFBHB8
2,3,13,94024,23,24,,0,2018-05-03 19:31:56.880,5DT8RCPL87KI5EORO7B0
3,4,8,73966,176,19,0.2,0,2018-04-07 14:43:55.420,R3DR9MLD5NR76VO17ULE
4,5,10,32653,310,9,,0,2018-02-12 15:37:03.940,4BGS0Z5OMAZ8NDAFHHP3
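
In this preview `Discount` is mostly missing and `TotalPrice` is 0 in every row shown, so it may be worth checking how widespread that is before building metrics on those columns. A minimal sketch, using a hypothetical helper `column_quality` and toy rows that only mimic the preview:

```python
import pandas as pd


def column_quality(df):
    """Share of missing values per column and share of zeros per numeric column."""
    numeric = df.select_dtypes(include="number")
    return pd.DataFrame({
        "missing_share": df.isna().mean(),
        "zero_share": (numeric == 0).mean(),  # NaN for non-numeric columns
    })


# Toy rows shaped like the preview, not the real data.
toy = pd.DataFrame({
    "Quantity": [7, 7, 24, 19],
    "Discount": [None, None, None, 0.2],
    "TotalPrice": [0, 0, 0, 0],
    "TransactionNumber": ["a", "b", "c", "d"],
})

report = column_quality(toy)
print(report)
```

On the real data this would be called as `column_quality(transaction)`.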


#### Business intelligence and operations
In many cases, sales and transaction data help to build a clear picture of the state of the business. For many teams this means descriptive statistics and operational metrics.

In [78]:
# Parse timestamps; values that cannot be parsed become NaT (errors='coerce')
transaction['SalesDate'] = pd.to_datetime(transaction['SalesDate'], errors='coerce')
# Calendar date, weekday number (Monday = 0) and weekday name
transaction['Date'] = transaction['SalesDate'].dt.date
transaction['DayOfWeek'] = transaction['SalesDate'].dt.weekday
transaction['Day'] = transaction['SalesDate'].dt.day_name()
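
Because `errors='coerce'` silently turns unparseable timestamps into `NaT`, it can be useful to count how many rows were affected. Missing dates are also one possible reason for `DayOfWeek` showing up as floats (`0.0`, `4.0`) rather than integers. A minimal sketch on made-up values:

```python
import pandas as pd

# Made-up inputs: one valid timestamp, one garbage string, one missing value.
raw = pd.Series(["2018-02-05 07:38:25.430", "not a date", None])

parsed = pd.to_datetime(raw, errors="coerce")

# Number of rows that could not be parsed (or were missing to begin with).
n_failed = int(parsed.isna().sum())

# With NaT present, the weekday column is float so that it can hold NaN.
weekday = parsed.dt.weekday
print(n_failed, weekday.dtype)
```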

In [79]:
transaction.head()

Unnamed: 0,SalesID,SalesPersonID,CustomerID,ProductID,Quantity,Discount,TotalPrice,SalesDate,TransactionNumber,Date,DayOfWeek,tx,Day
0,1,6,27039,381,7,,0,2018-02-05 07:38:25.430,FQL4S94E4ME1EZFTG42G,2018-02-05,0.0,1,Monday
1,2,16,25011,61,7,,0,2018-02-02 16:03:31.150,12UGLX40DJ1A5DTFBHB8,2018-02-02,4.0,1,Friday
2,3,13,94024,23,24,,0,2018-05-03 19:31:56.880,5DT8RCPL87KI5EORO7B0,2018-05-03,3.0,1,Thursday
3,4,8,73966,176,19,0.2,0,2018-04-07 14:43:55.420,R3DR9MLD5NR76VO17ULE,2018-04-07,5.0,1,Saturday
4,5,10,32653,310,9,,0,2018-02-12 15:37:03.940,4BGS0Z5OMAZ8NDAFHHP3,2018-02-12,0.0,1,Monday


In [21]:
# Aggregate sales by day
tx_by_day = transaction.groupby(['Date']).count().reset_index()
# Convert Date back to datetime for plotting
tx_by_day['Date'] = pd.to_datetime(tx_by_day['Date'])
tx_by_day.head()

Unnamed: 0,Date,SalesID,SalesPersonID,CustomerID,ProductID,Quantity,Discount,TotalPrice,SalesDate,TransactionNumber,DayOfWeek
0,2018-01-01,7755,7755,7755,7755,7755,1435,7755,7755,7755,7755
1,2018-01-02,7674,7674,7674,7674,7674,1570,7674,7674,7674,7674
2,2018-01-03,7575,7575,7575,7575,7575,1526,7575,7575,7575,7575
3,2018-01-04,7646,7646,7646,7646,7646,1455,7646,7646,7646,7646
4,2018-01-05,7876,7876,7876,7876,7876,1510,7876,7876,7876,7876
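
Note that `count()` returns the number of non-null values for every column, which is why `Discount` differs from the other columns here. If only the number of transactions per day is needed, `size()` gives a single column instead. A minimal sketch on toy data (the column name `Transactions` is a hypothetical choice):

```python
import pandas as pd

# Toy rows, not the real data.
toy = pd.DataFrame({
    "Date": ["2018-01-01", "2018-01-01", "2018-01-02"],
    "SalesID": [1, 2, 3],
    "Discount": [0.2, None, None],
})

# One row per day with a single count column, regardless of missing values.
per_day = toy.groupby("Date").size().reset_index(name="Transactions")
per_day["Date"] = pd.to_datetime(per_day["Date"])
print(per_day)
```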


In [22]:
# Plot time series.
iplot({'data':[go.Scatter(x = tx_by_day['Date'], y = tx_by_day['SalesID'])] , 'layout': {'title':'Sales',}} )

Looks like there isn't a strong seasonality within the quarter. Let's do some more digging.
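
One way to make a judgment like this less dependent on eyeballing the raw line is to smooth out the weekly cycle with a 7-day rolling mean and look at what remains. A minimal sketch on toy data; the series below is made up, and the trailing window is an arbitrary choice:

```python
import pandas as pd

# Toy daily counts with a repeating weekly pattern (not real data).
days = pd.date_range("2018-01-01", periods=14, freq="D")
daily = pd.Series([1, 2, 3, 4, 5, 6, 7] * 2, index=days)

# Trailing 7-day mean: the first six values are NaN, and after that the
# weekly pattern averages out to a flat line.
smoothed = daily.rolling(window=7).mean()
print(smoothed)
```

On the real data, the same call could be applied to `tx_by_day.set_index('Date')['SalesID']`.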

In [81]:
# Total transaction count by day of week.
tx_by_dow = transaction.groupby(['DayOfWeek','Day']).count().reset_index()
tx_by_dow = tx_by_dow.sort_values(by=['DayOfWeek'])
tx_by_dow.head()

Unnamed: 0,DayOfWeek,Day,SalesID,SalesPersonID,CustomerID,ProductID,Quantity,Discount,TotalPrice,SalesDate,TransactionNumber,Date,tx
0,0.0,Monday,145736,145736,145736,145736,145736,29249,145736,145736,145736,145736,145736
1,1.0,Tuesday,145816,145816,145816,145816,145816,29362,145816,145816,145816,145816,145816
2,2.0,Wednesday,145609,145609,145609,145609,145609,28943,145609,145609,145609,145609,145609
3,3.0,Thursday,138513,138513,138513,138513,138513,27638,138513,138513,138513,138513,138513
4,4.0,Friday,138677,138677,138677,138677,138677,27735,138677,138677,138677,138677,138677


In [83]:
iplot({'data':[go.Bar(x = tx_by_dow['Day'], y = tx_by_dow['SalesID'])] , 'layout': {'title':'Sales by day of week',}} )
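
Note that these totals depend on how many Mondays, Tuesdays and so on fall inside the date range. A per-weekday average of the daily counts removes that effect. A minimal sketch, where the helper name `mean_daily_count_by_weekday` and the toy dates are hypothetical:

```python
import pandas as pd


def mean_daily_count_by_weekday(dates):
    """Average number of rows per calendar day, grouped by weekday."""
    days = pd.to_datetime(pd.Series(dates), errors="coerce").dt.normalize()
    daily = days.value_counts().sort_index()  # rows per calendar day, NaT dropped
    frame = pd.DataFrame({
        "DayOfWeek": daily.index.weekday,      # Monday = 0
        "Day": daily.index.day_name(),
        "Transactions": daily.to_numpy(),
    })
    return frame.groupby(["DayOfWeek", "Day"], as_index=False)["Transactions"].mean()


# Toy dates: two Mondays with 2 and 4 rows, one Tuesday with 1 row.
toy_dates = ["2018-01-01"] * 2 + ["2018-01-08"] * 4 + ["2018-01-02"]
avg_by_dow = mean_daily_count_by_weekday(toy_dates)
print(avg_by_dow)
```

On the real data this would be called as `mean_daily_count_by_weekday(transaction['SalesDate'])`.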

Looks like sales are lower later in the week. Let's have a look at the time of day at which sales are made.