# Executive Summary


1) This company **spends 18% of its revenue on advertising** (\$102 million out of \$557 million in revenue). Although advertising spending depends strongly on the industry and the market, 18% is on the high side (see [here](http://smallbusiness.chron.com/percentage-gross-revenue-should-used-marketing-advertising-55928.html)). This raises the question of whether **the company is spending too much on advertising**.

2) There is **huge variability in the effectiveness of the different media initiatives**. A few initiatives (4, 9 and 14) clearly drive up sales, while the others show no statistically significant effect on them. Dropping the ineffective initiatives would save about 88% of the advertising budget!

3) Very little money is invested in the initiatives that actually drive up sales significantly. Therefore, there is a great opportunity to optimize the spending plan and **both reduce advertising spending and increase sales.**

4) Six of the structural occurrences studied were found to increase weekly sales (we call them positive structural occurrences), while the remaining four were found to decrease sales (negative structural occurrences). The effect was substantial in some cases (up to 20%). Further work is needed to see how the company can use positive structural occurrences to its advantage and minimize the effect of negative ones.

In summary, based on the findings above, there is **huge potential for optimizing the spending plan**. We have identified which marketing initiatives to discontinue and how to redistribute the budget. Our results show that we can **both reduce advertising spending (by XX) and increase sales (by YY)**. The details of this analysis are reported below.
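
The headline ratio in point 1 can be checked with a line of arithmetic. The snippet below is only a sanity check on the two figures quoted in the summary; the variable names are ours and are not part of the original analysis.

```python
# Figures quoted in point 1 of the summary, in millions of dollars.
ad_spend = 102.0
revenue = 557.0

# Share of revenue that goes to advertising.
ad_share = ad_spend / revenue
print(f"Advertising share of revenue: {ad_share:.1%}")
```

The result rounds to the 18% quoted in the summary.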

# Data Analysis

Below we use standard Python tools for data science: pandas for data wrangling, statistical libraries (SciPy and statsmodels) for data analysis, and plotting libraries (matplotlib and seaborn) to visualize the results.

In [1]:
import matplotlib.pyplot as plt 
import pandas as pd
import seaborn as sns
import numpy as np
import statsmodels.api as sm
import statsmodels.tsa as tsa
import statsmodels.formula.api as smf
from scipy import stats

# pd.DataFrame.from_csv is deprecated and removed from recent pandas; use pd.read_csv
observations = pd.read_csv('./observations.csv')
spend = pd.read_csv('./spend_values.csv')



## The idea

In what follows we want to see whether, and how strongly, each media initiative affects sales. To do that, we plot weekly sales against the number of media impressions in that week. The key idea is that, if the media campaign is working, then **the more impressions there are in a week, the higher the sales**. In statistical terms, there is a positive correlation between these two variables.

Below we perform this type of analysis by fitting a straight line to the data and looking at the slope of that line. If the slope is positive and large, the media campaign is working. If the slope is zero, the campaign has no effect on sales. If it is negative, the campaign is associated with lower sales. We also look at the p-value obtained from the linear fit. Low p-values (< 0.05) indicate a statistically significant correlation; larger p-values imply that the data do not show a statistically meaningful trend, even if the estimated slope is $\neq 0$.
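
To make the reading of the slope and the p-value concrete, here is a minimal sketch on synthetic data (not the company's data). All numbers, the random seed and the variable names are assumptions chosen for illustration: one simulated campaign adds one dollar of sales per impression, the other has no effect at all.

```python
import numpy as np
from scipy import stats

# Synthetic weekly data; every number below is made up for illustration.
rng = np.random.default_rng(0)
n_weeks = 104
impressions = rng.uniform(0, 1_000_000, size=n_weeks)

# Effective campaign: each impression adds about one dollar of sales, plus noise.
sales_effective = 5_000_000 + 1.0 * impressions + rng.normal(0, 100_000, size=n_weeks)

# Ineffective campaign: sales fluctuate independently of impressions.
sales_ineffective = 5_000_000 + rng.normal(0, 100_000, size=n_weeks)

# linregress returns the fitted slope and the p-value for the hypothesis "slope == 0".
fit_effective = stats.linregress(impressions, sales_effective)
fit_ineffective = stats.linregress(impressions, sales_ineffective)

print("effective:  ", fit_effective.slope, fit_effective.pvalue)
print("ineffective:", fit_ineffective.slope, fit_ineffective.pvalue)
```

For the effective campaign the fitted slope should land close to 1 with a very small p-value. For the ineffective one the slope should be close to 0, and the p-value will usually (though not always, by construction of the 0.05 threshold) exceed 0.05.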

In [18]:
# Regress weekly sales on the impressions of each media initiative
# and print only the initiatives whose fit is significant (p < 0.05)
 
for i in range(1,21):
    
    name  = 'Media ' + str(i) + ' Impressions'

    x = observations[name]
    y = observations['Sales']
    slope, intercept, r_value, p_value, std_err = stats.linregress(x,y)
    if p_value < 0.05:
        print(name)
        print(slope, p_value)


Media 4 Impressions
(0.94844273769339082, 0.0024921889321075207)
Media 9 Impressions
(1.1155098752792703, 9.2639306094687362e-05)
Media 14 Impressions
(1.1173720894054608, 5.657327783174512e-05)


## Only three media initiatives have a positive effect on sales!

Based on the above analysis, only three initiatives show a statistically significant positive correlation between the number of impressions and sales. In other words, there is no statistical evidence that the other initiatives increase sales. On this basis, one could keep only initiatives 4, 9 and 14 and **save \$89 million**, which corresponds to 88% of the advertising budget.
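
The savings figure follows from adding up the spend on the initiatives that would be dropped. The sketch below shows that bookkeeping with a hypothetical helper, `savings_from_dropping`, and made-up spend values; the real numbers live in `spend_values.csv`, whose layout is not shown here, so the dictionary structure is an assumption.

```python
def savings_from_dropping(spend_by_initiative, keep):
    """Return (amount saved, fraction of budget saved) when only `keep` is funded."""
    total = sum(spend_by_initiative.values())
    kept = sum(v for k, v in spend_by_initiative.items() if k in keep)
    saved = total - kept
    return saved, saved / total


# Made-up spend figures in millions of dollars, for illustration only.
example_spend = {
    "Media 1": 40.0,
    "Media 4": 5.0,
    "Media 9": 3.0,
    "Media 14": 2.0,
    "Media 20": 50.0,
}

saved, share = savings_from_dropping(
    example_spend, keep={"Media 4", "Media 9", "Media 14"}
)
print(f"Saved {saved:.1f} million, i.e. {share:.0%} of the budget")
```

With the actual spend table, the same calculation would be run over all twenty initiatives, keeping 4, 9 and 14.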

In [2]:
for i in range(1,21):
    
    name  = 'Media ' + str(i) + ' Impressions'

    plt.figure(figsize=(15,7.5))
    x = observations[name]
    y = observations['Sales']
    xp = np.linspace(0, 1000000, 100)
    slope, intercept, r_value, p_value, std_err = stats.linregress(x,y)
    z = np.polyfit(x, y, 1)

    p = np.poly1d(z)
    plt.xlabel(name)
    plt.ylabel('Sales ($)')
    # Call plt.title; assigning to it would overwrite the function
    plt.title('Correlation between sales and type of media campaign')
    plt.plot(x, y, '.', xp, p(xp), '-')

0.448241808262


