___
<a id='section5'></a>
<h2 align="center">KPI Analysis - Regression Trends</h2>

Key Performance Indicator (KPI) analysis uses trends of an input (x) against an output (y) to find patterns across account segments such as Brand vs Performance. Regression trends (for example, linear) are fitted to the data points shown in the scatter plot below. Colors represent categories such as theme (product & lifestyle), campaign type (brand & performance) and creative audience (women, men, wedding, etc.).

<a id='sub11'></a>

In [1]:
from scipy import stats
import pandas as pd
from ipywidgets import interact
from IPython.display import HTML#, IFrame
from plotly.offline import init_notebook_mode, plot#, iplot
import plotly.graph_objs as go
import numpy as np
import warnings

#prep plotly offline and filter warnings
warnings.filterwarnings('ignore')
init_notebook_mode(connected=True)

#read and clean up dataframe from csv file
trend_df = pd.read_csv('Data for charts/kpi_analysis_data.csv')
trend_df = trend_df.dropna(subset=['Campaign_Type', 'Theme', 'Audience'])
trend_df = trend_df[trend_df['Theme'] != 'Other']
trend_df['Week'] = pd.to_datetime(trend_df['Week'], format='%m/%d/%Y')
trend_df = trend_df[trend_df['Week']>='2018-01-01']
trend_df['Quarter'] = pd.PeriodIndex(trend_df['Week'], freq='Q').astype(str)
q_list = list(trend_df['Quarter'].unique())[:-1]
q_list.insert(0,'ALL')

#clean up variable selection (i.e I_Visits=Interested Visits)
n_col = list(trend_df.select_dtypes('number').columns)
x_list = []
col_key = dict()
for i in n_col:
    #build the display label once, then map it back to the original column name
    label = str(i).replace('Imp', 'Impressions').replace('I_Visits', 'Interested Visits').replace('LP_Conversions', 'Universal Landing Pages')
    x_list.append(label)
    col_key[label] = i

#interactive dropdowns
@interact
def scatter_plot(x=x_list, y=x_list[1:], breakout=['Campaign_Type', 'Theme', 'Creative_Audience'], annotations=['CTA', 'Product', 'Product_Count', 'Material', 'Partner', 'Week'], quarter=q_list, trend=['Linear', 'Polynomial', 'Logarithmic']):   
    x_col = col_key[x]
    y_col = col_key[y]

    #take out na's and find outliers
    df = trend_df
    thresholds = df[x_col].quantile([.45,.998])
    max_y = df[y_col].quantile(.999)  #scalar quantile (a list argument would return a Series)
    df = df.dropna(subset=[x_col])
    
    #filter on quarter and outliers (quantiles)
    df = df if quarter == 'ALL' else df[df['Quarter'] == quarter]
    visual_df = df[(df[x_col] >= min(thresholds)) & (df[x_col] <= max(thresholds))]
    visual_df = visual_df[(visual_df[y_col] > 0) & (visual_df[y_col] < int(max_y))]
    
    #empty array for visual plot data set (will append breakouts later) & colors to index
    visual_data = []
    colors = ['blue', 'lightblue', 'red', 'coral', 'green', 'lightgreen', 'deeppink', 'lightpink', 'darkturquoise', 'turquoise', 'violet', 'mediumpurple']
    k=0

    #begin loop of breakout (i.e. campaign_type = brand then performance)
    for i in visual_df[breakout].unique():
        temp_df = visual_df[visual_df[breakout] == i]
         
        #dynamic trend selection (linear, polynomial, logarithmic)
        if trend == 'Linear':
            slope, intercept, r_value, p_value, std_err = stats.linregress(temp_df[x_col].values,temp_df[y_col].values)
            str_line= str(round(slope,4))+'*x+'+str(round(intercept, 2))
            x_line = temp_df[x_col].values
            y_line = slope*temp_df[x_col].values+intercept
        elif trend == 'Polynomial':
            poly_slope2, poly_slope1, poly_intercept = np.polyfit(temp_df[x_col].values, temp_df[y_col].values, 2)
            str_line = str(round(poly_slope2,4))+'x^2 + '+str(round(poly_slope1,4))+'x + '+str(round(poly_intercept,2))
            x_line = np.linspace(np.min(temp_df[x_col].values), np.max(temp_df[x_col].values), 100)
            y_line = (poly_slope2*(x_line**2))+(poly_slope1*x_line)+poly_intercept
        elif trend == 'Logarithmic':
            log_slope, log_intercept = np.polyfit(np.log(temp_df[x_col].values), temp_df[y_col].values, 1, w=temp_df[y_col].values)
            str_line = str(round(log_slope,2))+'*log(x) + '+ str(round(log_intercept,2))
            x_line = np.linspace(np.min(temp_df[x_col].values), np.max(temp_df[x_col].values), 100)
            y_line = log_slope*np.log(x_line)+log_intercept
            filter_df = pd.DataFrame({'x':x_line, 'y':y_line})
            filter_df = filter_df[filter_df['y'] >= 0]
            x_line = filter_df['x'].values
            y_line = filter_df['y'].values
            
        #append trend line then data points in scatter plot
        visual_data.append(go.Scatter(x = x_line, y = y_line, mode = 'lines', name = i+' Trends', text='Formula = '+str(str_line), marker=dict(size = 5, color=colors[k])))
        visual_data.append(go.Scatter(x = temp_df[x_col].values, y = temp_df[y_col].values, mode = 'markers', text=temp_df[annotations].values, name = i, marker=dict(size=5, color=colors[(k+1)])))
        k+=2
    
    #set up layout of visual data and plot chart
    layout = go.Layout(title=go.layout.Title(text=x+' v '+y),  yaxis=go.layout.YAxis(title=go.layout.yaxis.Title(text=y)), xaxis=go.layout.XAxis(title=go.layout.xaxis.Title(text=x)))
    fig = go.Figure(data=visual_data, layout=layout)
    temp_plot = plot(fig, config={"displayModeBar": False}, show_link=False, include_plotlyjs=False, output_type='div', auto_open=False)
    #return iplot(fig) ## jupyter notebook output
    #temp_plot = plot(fig, auto_open=False)
    return HTML(temp_plot)
    #return IFrame(temp_plot, width=950,height=550)## html file output
    


<a id='sub12'></a>
__Chart callouts:__
+ Interactive dropdowns allow the user to choose the x and y axes (input and output) along with a breakout (subset) within the data. Frequently used x and y combos:
    + x = Impressions | y = Clicks (CTR)
    + x = Impressions | y = Universal Landing Pages (ULP conversion rate)
    + x = Visits | y = Interested Visits (IVR)
    + x = Impressions | y = Revenue __OR__ Orders (sale conversion rate)
    + x = Universal Landing Pages | y = Revenue __OR__ Orders (site activity to sale conversion)
+ Hover labels (annotations) are another dropdown option (includes CTA, Product, Product Count, Partner, and more)
+ Timeseries option available through quarter dropdown (ALL = Q1 2018 - Q1 2019)
+ The best-fitting trend option varies but will typically be logarithmic or linear. Details below:
    + Logarithmic trend - as the input variable (x) increases, the output variable (y) increases quickly at first, but the gains shrink as x grows (e.g. x = impressions vs y = orders). Can be used to find optimal investment levels with regard to market saturation.
    + Linear trend - the output changes at a constant rate as the input changes (e.g. x = clicks vs y = visits). Better suited to QA analysis since the trend between the variables moves in a straight line.
    + Polynomial trend - similar to the logarithmic trend but better for gauging gains and losses since the trend can have hills and valleys (e.g. x = impressions vs y = revenue). The chart fits a second-degree polynomial, which allows a single peak or trough.
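
To make the three trend options concrete, here is a minimal sketch that fits each of them to the same data and compares the fit with R². The data is synthetic and purely illustrative (it is not drawn from the KPI dataset), and the `r_squared` helper is a hypothetical name introduced for this example. The logarithmic fit here is unweighted, whereas the chart code passes the y values as weights.

```python
import numpy as np

# Synthetic, illustrative data only: y rises quickly at first and then
# flattens out, i.e. diminishing returns on the input.
rng = np.random.default_rng(0)
x = np.linspace(1_000, 1_000_000, 200)
y = 40 * np.log(x) - 250 + rng.normal(0, 5, x.size)

def r_squared(y_true, y_pred):
    """Coefficient of determination for a fitted curve (hypothetical helper)."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1 - ss_res / ss_tot

# Linear: y = m*x + b
m, b = np.polyfit(x, y, 1)
# Polynomial (degree 2): y = a2*x^2 + a1*x + a0
a2, a1, a0 = np.polyfit(x, y, 2)
# Logarithmic: y = k*log(x) + c, fitted as a straight line in log(x)
k, c = np.polyfit(np.log(x), y, 1)

scores = {
    'Linear': r_squared(y, m * x + b),
    'Polynomial': r_squared(y, a2 * x**2 + a1 * x + a0),
    'Logarithmic': r_squared(y, k * np.log(x) + c),
}
best = max(scores, key=scores.get)

for name, score in scores.items():
    print(f'{name}: R^2 = {score:.3f}')
print('Best fitting trend:', best)
```

R² is only one way to choose between trends. A higher-degree polynomial can score well by following noise, so the shape of the relationship should still be checked against the scatter plot.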

__Trend Insights:__
+ Campaign (Brand vs Performance) Callouts:
    + Overall, Brand campaigns had a higher CTR than performance campaigns (log trend recommended)
    + Performance segment begins to surpass brand efficiency in lower portion of consumer funnel, found in:
        + Impressions vs ULP - ad reach to site landing page efficiency, the next step after CTR, is where performance starts to take a significant lead over brand (log trend).
        + Impressions vs orders - performance increases its lead over brand at the bottom of the funnel (log trend).
        + Impressions vs revenue - performance segment still dominates brand output, but performance revenue starts to decline once impression levels exceed 800K (polynomial trend).
    + Performance segment also did slightly better than brand segment on site engagement (visits vs interested visits) (linear trend).
    + __Key Takeaways:__
        + Brand has proven to be a strong upper funnel tactic but does very little for sale conversions. Moving forward, brand campaigns can be used as an upper funnel tactic to assist performance campaign sale conversions (__found below in lag time analysis__).
            + Performance segment has a strong investment to sale conversion relationship (impression to revenue). That being said, market saturation occurred for performance segment at 800K weekly impressions (by creative/vendor).
                + Holiday impressions (Q4 2018) from performance segment reached over 1.2 million weekly impressions without any signs of sale CVR fatigue.
+ Theme (Product vs Lifestyle) Callouts:
    + In the theme group, creatives are classified as either product or lifestyle. Product ads, which focus on the products themselves, appear far more often than lifestyle ads, which feature people/models wearing David Yurman jewelry.
    + Generally, product ads tend to outperform lifestyle ads. Lifestyle's weaker performance could in part be due to low digital investment, which may be too small to support a meaningful analysis.
        + The KPIs with the smallest difference between product and lifestyle performance were CTR and IVR. Lifestyle lower funnel conversions almost completely fell off and show little to no correlation with upper funnel metrics such as impressions.
    + __Key Takeaways:__
        + Continue high investment behind product focused ads but, similar to brand segment, try to use lifestyle segment to support product ad sale conversions.
        + Product focused creatives should not exceed 1,000,000 weekly impressions outside the holiday season. During the holiday season, market saturation has yet to be reached.
            + Can invest more than in last year's holiday season and still reap high ROI from product creatives (similar to performance campaigns).
+ Creative Audience (Women, Men, Wedding, & Geotarget) Callouts:
    + Men and women were typically the top performing segments, although wedding did bring in strong CTR similar to men and women.
    + While geotarget segment did not have as much investment or unique creatives it did show possible opportunity in specific cases (__reference wordcloud analysis__).
    + __Key Takeaways:__
        + Men and women segments drove bottom funnel conversions. Will need to keep high share of gender focused ads within creative rotation.
        + Wedding had the lowest sale conversions and the lowest landing page visits. However, wedding also had one of the strongest CTRs compared to other groups, suggesting consumers are more inclined to drop off online and finish the purchase in stores.
            + Consider wedding product sales model to compare against media performance to gain better insight into this area.
        + Geotarget creatives (i.e. creatives with store location or store locator) have shown strong performance across several KPIs in Q1 2019. Will monitor geotargeted ads moving forward and optimize accordingly.
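
The saturation levels quoted above (for example, the point where performance revenue starts to decline) can be read off a polynomial trend by locating the peak of the fitted curve. The sketch below shows one way to do that, assuming a second-degree fit like the chart's polynomial option. The data is hypothetical and generated so that the peak sits near 800K. The variable names are illustrative, and this is not necessarily how the figures in this section were derived.

```python
import numpy as np

# Hypothetical weekly data (illustrative only): revenue rises with impressions,
# peaks, and then declines, mimicking a saturation pattern.
rng = np.random.default_rng(1)
impressions = np.linspace(50_000, 1_200_000, 120)
true_peak = 800_000
revenue = (90_000
           - 1e-7 * (impressions - true_peak) ** 2
           + rng.normal(0, 1_000, impressions.size))

# Rescale x to units of 100K impressions to keep the fit well conditioned
x = impressions / 1e5
a2, a1, a0 = np.polyfit(x, revenue, 2)

# A downward-opening parabola (a2 < 0) has its maximum at x = -a1 / (2 * a2)
if a2 < 0:
    peak_impressions = -a1 / (2 * a2) * 1e5
    print(f'Estimated saturation point: {peak_impressions:,.0f} weekly impressions')
else:
    peak_impressions = None
    print('Fitted curve has no interior maximum; no saturation point detected')
```

If the estimated peak falls outside the observed range of x, the data does not yet show saturation, which matches the holiday-season observation that higher impression levels showed no sign of fatigue.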