Before you turn this problem in, make sure everything runs as expected. First, **restart the kernel** (in the menubar, select Kernel$\rightarrow$Restart) and then **run all cells** (in the menubar, select Cell$\rightarrow$Run All).

Make sure you fill in any place that says `YOUR CODE HERE` or "YOUR ANSWER HERE", as well as your name and collaborators below:

In [None]:
NAME = ""
COLLABORATORS = ""

---

# Predicting the number of your customers

## Introduction

Suppose you have a shop. Every month you count the number of customers who bought from your shop. To keep things simple, we assume that each customer buys either one unit or nothing.


Your customers are of two types:
* some are "returning" customers (i.e. they also bought last month) and 
* the others are new (did not buy last month)

So we will not worry about customers who, say, bought two months ago but not last month.

In this notebook, we first generate the data ourselves (so that you fully understand the structure of the data) and then we analyze this data.

## Importing libraries

We first import the libraries that we need.

In [None]:
import datetime
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
%matplotlib inline

## Model

We will generate the data with the following model.

Let $x_t$ denote the logarithm of the number of customers in period $t$. We assume that $x_t$ evolves over time as follows:

$$
x_t = \rho x_{t-1} + u_t
$$

where $u_t$ is normally distributed with mean $\mu_u \geq 0$ and standard deviation $\sigma_u \geq 0$, and $\rho \in [0,1]$. We interpret $\rho$ as the fraction of last period's customers who return to buy this period, and $u_t$ as the inflow of new customers in period $t$.
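As a minimal sketch of how this process behaves, the snippet below simulates the model with a constant $\rho$ and $\mu_u$. The parameter values, the seed and the helper `simulate_log_customers` are illustrative assumptions, not part of the assignment. For $0 \le \rho < 1$ the process forgets its starting value and fluctuates around its long-run mean $\mu_u/(1-\rho)$, which follows from taking expectations on both sides of the equation above.

```python
import numpy as np

def simulate_log_customers(x0, rho, mu_u, sigma_u, n_periods, rng):
    """Hypothetical helper: simulate x_t = rho * x_{t-1} + u_t, u_t ~ N(mu_u, sigma_u**2)."""
    x = np.empty(n_periods + 1)
    x[0] = x0
    for t in range(1, n_periods + 1):
        u = rng.normal(mu_u, sigma_u)  # inflow of new customers (in logs)
        x[t] = rho * x[t - 1] + u
    return x

# Illustrative parameter values (assumed, not taken from the assignment)
rng = np.random.default_rng(seed=0)
rho_demo, mu_u_demo, sigma_u_demo = 0.5, 1.0, 1.0
x_sim = simulate_log_customers(x0=10.0, rho=rho_demo, mu_u=mu_u_demo,
                               sigma_u=sigma_u_demo, n_periods=100_000, rng=rng)

# Long-run mean of x_t: E[x] = rho * E[x] + mu_u  =>  E[x] = mu_u / (1 - rho)
long_run_mean = mu_u_demo / (1 - rho_demo)
# Drop the first 1000 periods so the starting value x0 = 10 no longer matters
print(f"sample mean: {x_sim[1000:].mean():.3f}, theory: {long_run_mean:.3f}")
```

Starting far above the long-run mean ($x_0 = 10$) makes it easy to see how quickly the influence of the initial value dies out when $\rho$ is well below 1.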


-------------

**Exercise**

What is an advantage of defining $x_t$ as the logarithm of the number of customers? [hint: what would you need to "worry" about if $x_t$ denotes the number of customers?]

--------------

YOUR ANSWER HERE

We assume that both $\mu_u$ and $\rho$ are functions of the price $p$ that the shop charges. With a higher price, fewer customers return and the inflow of new customers is lower.
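To make the price dependence concrete, the sketch below evaluates the same functional forms used in the data-generation cell, `mu(m, p) = m*(1-p)` and `rho(r, p) = r*(1-p)`, with the constants used there (`m = 2`, `r = 0.5`) at the two prices $p = 0.2$ and $p = 0.6$. It also reports the long-run mean of $x_t$, $\mu_u/(1-\rho)$, which is valid for $\rho < 1$; the dictionary `long_run_means` is only an illustrative container.

```python
def mu(m, p):
    # mean inflow of new customers (in logs); falls as the price p rises
    return m * (1 - p)

def rho(r, p):
    # persistence of last period's customers; also falls as p rises
    return r * (1 - p)

long_run_means = {}
for p in (0.2, 0.6):
    mu_u, rho_p = mu(2, p), rho(0.5, p)
    long_run_means[p] = mu_u / (1 - rho_p)  # long-run mean of x_t (rho < 1)
    print(f"p={p}: mu_u={mu_u:.2f}, rho={rho_p:.2f}, "
          f"long-run mean of x_t={long_run_means[p]:.3f}")
```

With these constants, raising the price from 0.2 to 0.6 lowers the long-run mean of the log number of customers from about 2.67 to 1.0.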

We generate data for 24 months. The code below builds an index of 24 month-end dates, the last of which is the most recent month-end on or before today, and defines how $\rho$ and $\mu_u$ depend on the price $p$ that the shop charges in a period.

Finally, we create a pandas dataframe `df_customers` with this data. 
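To see which dates `pd.date_range` produces with a month-end frequency, here is a small sketch with a fixed, hypothetical end date instead of today's date, so that the result is reproducible. It passes `pd.offsets.MonthEnd()`, the offset that the string alias `'M'` also denotes; pandas 2.2 and later deprecate the `'M'` string in favour of `'ME'`.

```python
import pandas as pd

# Hypothetical mid-month end date (instead of today's date) for reproducibility
end_date = pd.Timestamp("2024-03-15")

# MonthEnd places every date on the last day of a month; because end_date is
# not a month-end, it is rolled back to the last month-end on or before it.
demo_index = pd.date_range(end=end_date, periods=3, freq=pd.offsets.MonthEnd())
print(demo_index)
```

The index ends on 2024-02-29, not on 2024-03-15, so the most recent period in `df_customers` is the last completed month-end rather than today.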

In [None]:
number_of_periods = 24
todays_date = datetime.datetime.now().date()
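# month-end dates; the 'M' alias is deprecated in pandas >= 2.2, MonthEnd() works across versions
index = pd.date_range(end=todays_date, periods=number_of_periods, freq=pd.offsets.MonthEnd())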
columns = ['log number of customers']

def mu(m,p):
    return m*(1-p)

def rho(r,p):
    return r*(1-p)

sigma = 1.0

p_0 = [0.2]
p_1 = [0.6]
period_0 = 12
period_1 = number_of_periods+1-period_0
vector_p = period_0*p_0 + period_1*p_1

x_0 = 10
x = []
x.append(x_0)
for t in range(1,number_of_periods+1):
    u = np.random.normal(mu(2,vector_p[t]), sigma)
    x.append(rho(0.5,vector_p[t])*x[t-1]+u)
    
df_customers = pd.DataFrame(x[1:], index=index, columns=columns)

---------

**Exercise**

Explain what the code above does:

* what is `vector_p` and how is it generated?
* what is the type of `x`? How is this vector generated?
* show the first few rows of `df_customers`.

YOUR ANSWER HERE

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

-----------

**Exercise**

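Calculate the number of customers in each period and store it in a new column `number_of_customers` of `df_customers` (the moving-average code further down uses this column name). [hint: you may want to check numpy's `exp` function]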

-----------

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

-----------

**Exercise**

Calculate the average number of customers over the 24-month period. [hint: check the DataCamp course on pandas or search for "python pandas average" to see how to calculate the average of a dataframe column]

-----------

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

Based on the mean, you might conclude that you can expect around 30 customers per month (your number may differ, because the data are generated randomly).

-----------

**Exercise**

To get an idea of whether this is realistic, also calculate the median number of customers. What do you learn from this?

--------------

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

YOUR ANSWER HERE

To get an idea of how the number of customers varies over time, we plot the number of customers together with two moving averages: one over 3 months and one over 6 months.

In [None]:
df_customers['MA_3'] = df_customers['number_of_customers'].rolling(window=3).mean()
df_customers['MA_6'] = df_customers['number_of_customers'].rolling(window=6).mean()

------------

**Exercise**

Plot the number of customers and the moving averages defined above.

-------------

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

Consider the first couple of rows of the dataframe `df_customers`.

----------

**Exercise**

What does "NaN" mean, and why do NaN values appear in these rows? Hint: search online if you do not know what "NaN" stands for.

-----------

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

YOUR ANSWER HERE

--------------

**Exercise**

Add a column to `df_customers` with the price in each period (note that `vector_p` has 25 elements while `df_customers` has 24 rows).

------------

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

Assume that your costs per sale equal 0.1.

-----------

**Exercise**

Add a column `profits` to the dataframe.

-----------


In [None]:
# YOUR CODE HERE
raise NotImplementedError()

------------

**Exercise**

Plot the number of customers against profit. Explain the shape of the curve that you see.

------------

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

YOUR ANSWER HERE