


# BUSA 603 Lab 3: Marketing Mix Modeling

## Outcomes

- Understand, run, and interpret the results of a simple Marketing Mix Model built with regression.

## Marketing Mix Model

A Marketing Mix Model (MMM) is a technique used to determine market attribution. Specifically, it applies a statistical technique (usually regression) to marketing and sales data to estimate the impact of the various elements of the marketing mix.

Usually, Marketing Mix Models attempt to **measure the impact of hard-to-measure marketing channels**, like TV, radio, newspapers, social media, etc.

Attribution Modeling, a subset of MMM, is concerned with the impact of digital marketing touchpoints on conversion. Here, digital marketing touchpoints are instances in which a customer interacts with an element of the firm’s digital marketing efforts. The problem of marketing attribution is similar to the general problem of analyzing the impact of the marketing mix on marketing outcomes. However, because marketing attribution focuses on digital marketing, there is rich, granular data on the actions of each individual customer across a number of touchpoints. Marketing mix modeling is the starting place for marketing analytics professionals who want to apply statistical models to data.

Generally, in MMM your output variable will be sales or conversions, but it can also be something like website traffic. Your input variables typically consist of marketing spend by channel by period (day, week, month, quarter, etc.), but can also include other variables, which we’ll get to later.

## Review Linear Regression

Linear regression is a linear approach to modeling the relationship between a dependent (output) variable and one or more independent (input) variables. In simpler terms, it is the **line of best fit** that represents a dataset.

### Simple linear regression

Below is an example of a line that best fits the data points. The line is represented by the following equation:

    
$y = w_0 + w_1x$

$w_0$ is the intercept and $w_1$ is the slope.



<div>
<img src="https://github.com/franklin-univ-data-science/data/blob/master/images/10_01.png?raw=true" width="600"/>
</div>



This best-fitting line is also called the regression line. 

The vertical distances between the sample points ($y$) and the regression line ($\hat{y}$) are the offsets, that is, the errors of our prediction.

The difference between a sample point and the regression line is called the residual, $y-\hat{y}$.

We often summarize the error of a regression model with the Sum of Squared Errors (SSE): $SSE = \sum_{i=1}^n(y^{(i)}-\hat{y}^{(i)})^{2}$.



### Multiple linear regression

Simple linear regression is useful when you want to find an equation that represents two variables, the independent variable (x) and the dependent variable (y). But what if you have many independent variables? For example, the price of a car is probably based on multiple factors, like its horsepower, the size of the car, and the value of the brand itself.

This is when multiple regression comes in. Multiple regression is used to explain the relationship between a dependent variable and more than one independent variable.

The image below shows a plot between Target (y) and Feature1 and Feature2 (x1 and x2). When there are two independent variables, a plane of best fit is found instead of a line of best fit.

<div>
<img src="https://github.com/franklin-univ-data-science/data/blob/master/images/10_15.png?raw=true" width="600"/>
</div>

More generally, a linear model with $m$ input variables is defined as:

$y = w_0 + w_1x_1 + \dots + w_mx_m$
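As a small illustration of the two-input case, the sketch below recovers the plane $y = w_0 + w_1x_1 + w_2x_2$ from made-up, noise-free data with NumPy's least-squares solver. The numbers are assumptions chosen so that the true weights are known in advance.

```python
import numpy as np

# Made-up data generated exactly from y = 1 + 2*x1 + 3*x2 (no noise).
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0])
y = 1.0 + 2.0 * x1 + 3.0 * x2

# Design matrix: a column of ones for the intercept w0, then one column per input.
X = np.column_stack([np.ones_like(x1), x1, x2])

# Least-squares solution of X @ w = y.
w, *_ = np.linalg.lstsq(X, y, rcond=None)
w0, w1, w2 = w

print(f"w0 = {w0:.2f}, w1 = {w1:.2f}, w2 = {w2:.2f}")
```

The column of ones is what lets the solver estimate the intercept alongside the slopes; with $m$ inputs the design matrix simply gains more columns.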

## Explore the Marketing dataset

### Load the dataset into a data frame

Let's load a fictional dataset that consists of marketing spend on TV, radio, and newspaper, as well as the corresponding dollar sales by period.

In [None]:
import pandas as pd

df = pd.read_csv('Advertising.csv')

df.head()

### Explore the data

You can see that the column `Unnamed: 0` is essentially an index starting at 1, so we can remove it.

In [None]:
# here, axis='columns' means we want to drop the column
df = df.drop(['Unnamed: 0'],axis='columns')
df.head()

Because this is a simple dataset, there are a lot of steps that we don’t have to worry about, like handling outliers and missing values. But generally, you want to make sure that your dataset is clean.
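A quick way to confirm that a dataset is clean is to count missing values and duplicate rows. The sketch below does this on a tiny, hypothetical CSV that only mimics the column layout of `Advertising.csv`; the values are made up. It also shows that passing `index_col=0` to `read_csv` is an alternative to dropping the unnamed index column afterwards.

```python
import io

import pandas as pd

# Hypothetical CSV text that mimics the layout of Advertising.csv:
# an unnamed index column followed by spend and sales columns.
csv_text = """,TV,radio,newspaper,sales
1,100.0,20.0,30.0,15.0
2,50.0,,10.0,9.0
3,100.0,20.0,30.0,15.0
"""

# index_col=0 uses the first column as the row index,
# so no 'Unnamed: 0' column appears in the data frame.
toy = pd.read_csv(io.StringIO(csv_text), index_col=0)

missing_per_column = toy.isna().sum()          # missing values in each column
duplicate_rows = int(toy.duplicated().sum())   # rows identical to an earlier row

print(missing_per_column)
print("duplicate rows:", duplicate_rows)
```

On the real data, the same two checks can be run directly on `df`.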

### Visualizing the important characteristics of a dataset

Exploratory Data Analysis (EDA) is an important and recommended first step prior to the training of a machine learning model. 

A **scatterplot matrix** allows us to visualize the pair-wise correlations between the different variables in this dataset in one place.

Using the following scatterplot matrix, we can quickly eyeball how the data is distributed, the relationship between features and target, and whether the variables contain outliers. 

We will use the seaborn library to draw the plots. Seaborn is a Python data visualization library that provides a high-level interface for drawing attractive and informative statistical graphics conveniently. Refer to the examples in the [Step-by-step Seaborn Tutorial](https://elitedatascience.com/python-seaborn-tutorial) for more details.

In [None]:
import seaborn as sns

sns.pairplot(df, height=2.5)


There appears to be a strong relationship between TV and sales, a weaker one between radio and sales, and an even weaker one between newspaper and sales.
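To put numbers on a visual impression like this, one option is the Pearson correlation of each input with the target. The sketch below uses a small made-up data frame with the same column names as the lab data, so the values are purely illustrative. On the real data, the same call would be applied to `df`.

```python
import pandas as pd

# Made-up values with the same column names as the lab dataset.
toy = pd.DataFrame({
    "TV":        [10, 20, 30, 40, 50, 60],
    "radio":     [5, 3, 8, 4, 9, 7],
    "newspaper": [7, 2, 9, 4, 3, 8],
    "sales":     [12, 21, 33, 41, 52, 60],
})

# Correlation of every input with sales, strongest first.
corr_with_sales = (
    toy.corr()["sales"]
    .drop("sales")                 # the target's correlation with itself is always 1
    .sort_values(ascending=False)
)

print(corr_with_sales.round(2))
```

Pearson correlation only measures linear association, so it complements the scatterplots rather than replacing them.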

## Estimate the coefficients with an OLS model

Ordinary least squares (OLS) regression is a statistical method that estimates the relationship between one or more independent variables and a dependent variable. It does so by minimizing the sum of the squared differences between the observed and predicted values of the dependent variable, which is the sum of squared residuals (SSE) introduced earlier.

Conveniently, Python already has a library, `statsmodels`, that we can use to create an OLS model:
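The "minimizing" part can be checked numerically. The following sketch, on synthetic data with made-up coefficients, computes the OLS estimate from the normal equations and shows that nudging the weights away from it increases the SSE. This is a didactic illustration and not necessarily how `statsmodels` computes its estimate internally.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: y depends linearly on two inputs plus noise (all values made up).
X_raw = rng.uniform(0, 100, size=(50, 2))
y = 3.0 + 0.05 * X_raw[:, 0] + 0.2 * X_raw[:, 1] + rng.normal(0, 1, size=50)

# Add a column of ones so the model has an intercept.
X = np.column_stack([np.ones(len(X_raw)), X_raw])


def sse(w):
    """Sum of squared residuals for the coefficient vector w."""
    residuals = y - X @ w
    return float(residuals @ residuals)


# OLS estimate from the normal equations (X^T X) w = X^T y.
w_ols = np.linalg.solve(X.T @ X, X.T @ y)

# Any other coefficient vector gives a larger SSE.
w_shifted = w_ols + np.array([0.5, 0.01, -0.01])

print("SSE at OLS estimate:   ", round(sse(w_ols), 3))
print("SSE at shifted weights:", round(sse(w_shifted), 3))
```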

In [None]:
import statsmodels.formula.api as smf

# Regress sales on the three channel spends; the formula adds an intercept automatically
model = smf.ols(formula="sales ~ TV + radio + newspaper", data=df).fit()
print(model.summary())

In [None]:
# Store the fitted (in-sample) predictions next to the observed sales
df['y_pred'] = model.predict()
df.head()

## Evaluate the performance of linear regression models

### Residual plots

Since our model uses multiple explanatory variables, we can't visualize the linear regression line (or hyperplane to be precise) in a two-dimensional plot.

But we can plot the residuals (**the differences between the actual and predicted values**) versus the predicted values to diagnose our regression model. **Residual plots** are a commonly used graphical tool for diagnosing regression models. They can help detect nonlinearity and outliers, and check whether the errors are randomly distributed.

In [None]:
import matplotlib.pyplot as plt

plt.scatter(df['y_pred'],  df['sales'] - df['y_pred'],
            c='steelblue', marker='o', edgecolor='white'
            )
plt.xlabel('Predicted values')
plt.ylabel('Residuals')
plt.hlines(y=0, xmin=0, xmax=30, color='black', lw=2)

plt.show()

In the case of a perfect prediction, the residuals would be precisely zero, which we will probably never encounter in real and practical applications. However, for a good regression model, we would expect that the errors are randomly distributed, and the residuals should be randomly scattered around the centerline. 

If we see patterns in a residual plot, our model has failed to capture some explanatory information, which has leaked into the residuals, as we can see to a small degree in the residual plot above. Furthermore, we can also use residual plots to detect outliers, represented by the points with a significant deviation from the centerline.

Because linear regression captures only linear relationships between the inputs and the output, the inverted V shape in the residual plot is probably caused by a non-linear relationship between TV and sales. Also, one data point appears to be an outlier.
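To see how a non-linear relationship produces an arch-shaped residual pattern, the sketch below fits a straight line to synthetic data whose true shape is a square-root curve. The square-root form is only an assumption used for illustration; the lab does not establish the actual functional form for TV.

```python
import numpy as np

# Synthetic example: the true relationship is concave (square root), not linear.
x = np.linspace(1, 300, 60)
y = 2.0 * np.sqrt(x)

# Fit a straight line y = w0 + w1 * x by least squares.
# np.polyfit returns the highest-degree coefficient first.
w1, w0 = np.polyfit(x, y, deg=1)
residuals = y - (w0 + w1 * x)

# Residuals at the two ends of the x range versus the middle of the range.
first, last = residuals[0], residuals[-1]
middle_mean = residuals[20:40].mean()

print("residual at smallest x:   ", round(first, 3))
print("mean residual in middle x:", round(middle_mean, 3))
print("residual at largest x:    ", round(last, 3))
```

The line overshoots at both ends and undershoots in the middle, so the residuals are negative, then positive, then negative again: a systematic pattern rather than random scatter.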


### R-squared

Sometimes it is useful to report the coefficient of determination ($R^2$):


$R^2 = \frac{SST-SSE}{SST}$

where $SST = \sum_{i=1}^n(y^{(i)}-\mu_y)^{2}$

$SSE = \sum_{i=1}^n(y^{(i)}-\hat{y}^{(i)})^{2}$

Here, SST is the total sum of squares (the variance of the output variable multiplied by $n$), and SSE is the sum of squared fitting errors. Thus, $R^2$ can be understood as the **percentage of the response variance that is explained by a linear model**.

In our fit, the R-squared is 0.897, which means that almost 90% of the variation in sales can be explained by our model, which is pretty good.
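The definition of $R^2$ translates directly into code. Here is a minimal sketch with made-up observed and predicted values; the helper name `r_squared` is an assumption for illustration. For a fitted `statsmodels` model, the same quantity is available as `model.rsquared`.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: (SST - SSE) / SST."""
    mean_y = sum(y_true) / len(y_true)
    sst = sum((y - mean_y) ** 2 for y in y_true)               # total sum of squares
    sse = sum((y - yh) ** 2 for y, yh in zip(y_true, y_pred))  # sum of squared errors
    return (sst - sse) / sst


# Made-up observed values and predictions.
y_true = [10.0, 12.0, 14.0, 16.0, 18.0]
y_pred = [11.0, 12.0, 13.0, 16.0, 18.0]

print(f"R^2 = {r_squared(y_true, y_pred):.2f}")
```

Here SST is 40 and SSE is 2, so $R^2 = 38/40 = 0.95$.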




## Interpret model results

### p-values
The p-value for each input variable tests the null hypothesis that its coefficient equals zero (no effect). A low p-value (<= 0.05) indicates that you can reject the null hypothesis. In other words, an input variable with a low p-value is likely to be a meaningful addition to your model because changes in its value are related to changes in the output variable. Conversely, a larger (insignificant) p-value means the data do not provide enough evidence that changes in the input are associated with changes in the output.

In this case, the p-values of TV and radio are small (<0.05), while the p-value of newspaper is large. Thus, only TV and radio are meaningful input variables.
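Rather than reading p-values off the printed summary, they can be pulled out of a fitted model programmatically through its `pvalues` attribute. The sketch below uses synthetic data with hypothetical column names (`useful`, `noise`, `outcome`) so that it runs without the lab's CSV file.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(42)
n = 200

# Synthetic data: 'useful' drives the outcome, 'noise' is unrelated to it.
toy = pd.DataFrame({
    "useful": rng.uniform(0, 100, n),
    "noise": rng.uniform(0, 100, n),
})
toy["outcome"] = 5.0 + 0.3 * toy["useful"] + rng.normal(0, 2.0, n)

toy_model = smf.ols("outcome ~ useful + noise", data=toy).fit()

# pvalues is a pandas Series indexed by term name (including the intercept).
print(toy_model.pvalues.round(4))

# Terms whose p-value falls below the conventional 0.05 threshold.
significant = toy_model.pvalues[toy_model.pvalues < 0.05].index.tolist()
print("significant terms:", significant)
```

An unrelated input such as `noise` will usually, but not always, show a large p-value: at a 0.05 threshold, roughly one in twenty truly irrelevant inputs will look significant by chance.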

### Coefficients

Regression coefficients represent the mean change in the output variable for one unit of change in the input variable while holding other inputs in the model constant. 

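In this case, each additional unit of TV spend is associated with an increase of 0.0458 units in sales, and each additional unit of radio spend with an increase of 0.1885 units, holding the other channels constant. Thus, per unit of spend, the radio channel is more productive.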
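As a worked example of this interpretation, the sketch below applies the two coefficients quoted above to a hypothetical budget change of 100 units per channel. The helper name and the scenario are assumptions, and the units are whatever units the dataset uses for spend and sales.

```python
# Coefficients quoted in the text (change in sales per unit of spend).
coef = {"TV": 0.0458, "radio": 0.1885}


def predicted_sales_change(extra_spend):
    """Predicted change in sales for extra spend per channel, other inputs held fixed."""
    return sum(coef[channel] * amount for channel, amount in extra_spend.items())


# Hypothetical scenario: put 100 extra units of budget into one channel at a time.
for channel in coef:
    change = predicted_sales_change({channel: 100})
    print(f"+100 units on {channel}: predicted sales change = {change:.2f} units")
```

This is a statement about the fitted linear model, so it is most trustworthy for spend levels similar to those observed in the data.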


## Further Considerations

In reality, the data probably won’t be as clean as this and the results probably won’t look as pretty. In practice, you’ll want to consider more variables that impact sales, including but not limited to:

- Seasonality: It’s almost always the case that company sales are seasonal. For example, a snowboard company’s sales would be much higher during the winter than in the summer. In practice, you’ll want to include a variable to account for seasonality.

- Carryover Effects: The impact of marketing is not usually immediate. In many cases, consumers need time to think about their purchasing decisions after seeing advertisements. Carryover effects account for the time lag between when consumers are exposed to an ad and their response to the ad.

- Base sales vs incremental sales: Not every sale is attributable to marketing. If a company spent absolutely nothing on marketing and still made sales, this would be called its base sales. Thus, to take it a step further, you could try to model the effect of advertising spend on incremental sales as opposed to total sales.
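One common way to represent carryover effects, assumed here rather than taken from this lab, is a geometric "adstock" transformation: each period's effective advertising equals that period's spend plus a decayed share of the previous period's effective advertising. The function name, the decay value, and the spend series below are all hypothetical.

```python
def geometric_adstock(spend, decay):
    """Carry a fraction `decay` of each period's accumulated effect into the next period."""
    if not 0 <= decay < 1:
        raise ValueError("decay must be in [0, 1)")
    adstocked = []
    carried = 0.0
    for s in spend:
        # Current spend plus what is left over from earlier periods.
        carried = s + decay * carried
        adstocked.append(carried)
    return adstocked


# Hypothetical weekly TV spend: a burst in week 1 and a smaller one in week 4.
weekly_tv_spend = [100.0, 0.0, 0.0, 50.0, 0.0]

print(geometric_adstock(weekly_tv_spend, decay=0.5))
# prints [100.0, 50.0, 25.0, 62.5, 31.25]
```

In a model, the adstocked series would take the place of raw spend as the regression input, with the decay rate chosen or tuned per channel.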


*Reference: [How to Build a Simple Marketing Mix Model with Python](https://towardsdatascience.com/building-a-simple-marketing-mix-model-with-ols-571ac3d5b64f)*

## Assignment

The Red Bull data contains the sales amount and the expenses in dollars for the following channels:

- Banner ad
- Facebook
- Instagram
- E-zine
- TV
- Twitter
- YouTube

Action items:

- Explore the data; discuss the relationship between the input variables and the output variable.
- Apply the OLS model using all the input variables.
- Evaluate the model performance; discuss the residual plot.
- Find which channel is the best using linear regression, and explain why. 




In [None]:
import pandas as pd

df = pd.read_csv('redbull.csv')
df.head()