In [None]:
# Initialize Otter
import otter
grader = otter.Notebook("lab_m1.ipynb")

# Lab M1: Discontinuity

## Setup

In [None]:
import warnings
from datascience import Table, are
from numpy import mean, sqrt
from statsmodels.formula.api import ols

# Force display of all values 
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"

# Handle some obnoxious warning messages
warnings.filterwarnings("ignore")

## Sales Experience

### Business Decision

Reynolds Products is assessing the effectiveness of its new sales staff, who are hired contingent on sales performance during an initial 90-day probation period.  Management suspects that an individual salesperson's average daily sales increase with experience during and beyond the probation period.

Assume that you are the manager of new hire sales for Reynolds Products.  What do you think about the effectiveness of the probation period for new salespeople?

### Data

Retrieve a dataset from file 'Reynolds.csv'.  Show the first few observations.
Visualize the dataset as a scatterplot.

In [None]:
reynolds = ...
reynolds

reynolds.scatter('Employed')

In [None]:
grader.check("q1")

### Analysis: One-Piece Model

Build a linear regression model to predict a salesperson's average sales based on number of days employed.
Show the model goodness of fit (R^2).  
Show the model parameters.
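
As a reminder of what the goodness-of-fit number means, here is a minimal sketch on made-up data (the `days` and `sales` arrays are hypothetical, not taken from 'Reynolds.csv'). It uses `numpy.polyfit` as a stand-in for a fitted regression and computes $R^2$ by hand as one minus the ratio of residual to total sum of squares. For a straight-line model with an intercept, this should be the same quantity the lab's model reports as `.rsquared`.

```python
import numpy as np

# Hypothetical data (NOT from Reynolds.csv): days employed and average sales
days = np.array([10.0, 20.0, 30.0, 40.0, 50.0])
sales = np.array([12.0, 15.0, 19.0, 24.0, 25.0])

# Least-squares line: sales = intercept + slope * days
slope, intercept = np.polyfit(days, sales, deg=1)
fitted = intercept + slope * days

# R^2 = 1 - (residual sum of squares) / (total sum of squares)
ss_res = np.sum((sales - fitted) ** 2)
ss_tot = np.sum((sales - sales.mean()) ** 2)
r_squared = 1 - ss_res / ss_tot

print(slope, intercept, r_squared)
```

An $R^2$ close to 1 means the line accounts for most of the variation in the response; a value near 0 means it accounts for very little.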

In [None]:
model_1 = ...
model_1.rsquared
model_1.params

In [None]:
grader.check("q2_1")

Add a variable for the predicted average sales. 
Add a variable for the prediction errors. _(These are called "residuals"; use the fitted model's `.resid` attribute.)_

In [None]:
predicted_sales_1 = ...
prediction_errors_1 = ...

reynolds_predictions = reynolds.with_columns('sales_predicted', predicted_sales_1,
                         'error', prediction_errors_1)
reynolds_predictions

In [None]:
grader.check("q2_2")

RMSE is $\textit{sqrt}(\textit{mean}(\textit{error}^2))$.  It is a measure of predictive performance, similar to R^2, but it is expressed in the same units as sales and lower values are better.

Show the RMSE calculated based on the dataset.  
Visualize the model performance as a scatterplot of average sales and predicted average sales vs. number of days employed.
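
The RMSE formula can be checked by hand on a few numbers. The sketch below uses made-up `actual` and `predicted` arrays (hypothetical values, not the lab's data) and applies the formula step by step: subtract, square, average, take the square root.

```python
import numpy as np

# Hypothetical values (NOT from Reynolds.csv)
actual = np.array([12.0, 15.0, 19.0, 24.0, 25.0])
predicted = np.array([12.0, 15.5, 19.0, 22.5, 26.0])

# error = actual - predicted, one value per observation
errors = actual - predicted

# RMSE = sqrt(mean(error^2))
rmse = np.sqrt(np.mean(errors ** 2))

print(rmse)
```

Here the squared errors sum to 3.5 over 5 observations, so the RMSE is the square root of 0.7.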

In [None]:
RMSE_1 = ...
RMSE_1

selected_cols = ...
selected_cols.scatter('Employed')

In [None]:
grader.check("q2_3")

#### Predict

Predict the average sales of a salesperson that has been employed for 80 days.

In [None]:
sales_80_prediction_1 = model_1.predict(Table().with_columns('Employed', 80))
sales_80_prediction_1

In [None]:
grader.check("q2_4")

### Analysis: Two-Piece Model

Reset the dataset to include only average sales and number of days employed variables.  
Visualize the dataset as a scatterplot.

In [None]:
sales_employed = ...

sales_employed.scatter('Employed')

In [None]:
grader.check("q3_1")

Set the breakpoint to 90.  This is the predictor variable value at which an apparent discontinuity occurs.

In [None]:
breakpoint = 90
breakpoint

#### Model 1

Filter the dataset to include only sales associated with number of days employed less than or equal to the breakpoint.  
Show the resulting filtered dataset.

In [None]:
sales_employed_below_break = ...
sales_employed_below_break.show()

In [None]:
grader.check("q3_2")

Build a linear regression model to predict a salesperson's average sales based on number of days employed (use the filtered dataset).  
Show the model goodness of fit (R^2).  
Show the model parameters.

In [None]:
model_below_break = ...
model_below_break.rsquared
model_below_break.params

In [None]:
grader.check("q3_3")

#### Model 2

Filter the (unfiltered) data to include only sales associated with number of days employed greater than the breakpoint.  
Show the resulting filtered dataset.

In [None]:
sales_employed_above_break = ...
sales_employed_above_break.show()

In [None]:
grader.check("q3_4")

Build a linear regression model to predict a salesperson's average sales based on number of days employed (use the filtered dataset).  
Show the model goodness of fit (R^2).  
Show the model parameters.

In [None]:
model_above_break = ...
model_above_break.rsquared
model_above_break.params

In [None]:
grader.check("q3_5")

#### Recombine

Add a variable for predicted sales to each filtered dataset.  
Add a variable for errors to each filtered dataset.  
Build a new dataset that combines the filtered datasets.  
Show the resulting dataset.
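
To see why a two-piece model can produce a jump at the breakpoint, consider this minimal sketch on made-up, noise-free data (all names and values are hypothetical, and `numpy.polyfit` stands in for the lab's regression models). Each side of the breakpoint gets its own line, every observation is predicted by the line for its own side, and the two lines are then compared at the breakpoint itself.

```python
import numpy as np

# Hypothetical data with a jump after day 90 (NOT from Reynolds.csv)
days = np.array([60.0, 70.0, 80.0, 90.0, 100.0, 110.0, 120.0])
sales = np.array([10.0, 12.0, 14.0, 16.0, 30.0, 31.0, 32.0])
break_day = 90  # hypothetical name, plays the role of the breakpoint

below = days <= break_day   # observations at or below the breakpoint
above = ~below              # observations above the breakpoint

# One least-squares line per side
slope_lo, intercept_lo = np.polyfit(days[below], sales[below], deg=1)
slope_hi, intercept_hi = np.polyfit(days[above], sales[above], deg=1)

# Recombine: each observation is predicted by the model for its own side
predicted = np.where(below,
                     intercept_lo + slope_lo * days,
                     intercept_hi + slope_hi * days)
rmse = np.sqrt(np.mean((sales - predicted) ** 2))

# The two lines are fitted independently, so they need not meet at the breakpoint
gap_at_break = (intercept_hi + slope_hi * break_day) - (intercept_lo + slope_lo * break_day)

print(slope_lo, slope_hi, gap_at_break, rmse)
```

Because nothing ties the two fits together, the predicted value can change abruptly as the number of days crosses the breakpoint.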

In [None]:
predicted_sales_below_break = ...
errors_below_break = ...

predicted_sales_above_break = ...
errors_above_break = ...

below_break_preds = sales_employed_below_break.with_columns('sales_predicted', predicted_sales_below_break, 'error', errors_below_break)
above_break_preds = sales_employed_above_break.with_columns('sales_predicted', predicted_sales_above_break, 'error', errors_above_break)
data_combo = ...

data_combo.show()

In [None]:
grader.check("q3_4")

Show the RMSE calculated based on the new dataset.  
Visualize the performance of the 2-piece model as a scatterplot of average sales and predicted average sales
vs. number of days employed.

In [None]:
RMSE_combo = ...
RMSE_combo

selected_cols_combo = ...
selected_cols_combo.scatter('Employed')

In [None]:
grader.check("q3_5")

#### Predict

Predict the average sales of a salesperson that has been employed for 80 days.

In [None]:
sales_80_prediction = ...
sales_80_prediction

In [None]:
grader.check("q3_6")

Predict the average sales of a salesperson that has been employed for 100 days.

In [None]:
sales_100_prediction = ...
sales_100_prediction

In [None]:
grader.check("q3_7")

### Analysis: Piecewise Model

Reset the dataset to include only average sales and number of days employed variables.  
Visualize the dataset as a scatterplot.

In [None]:
sales_employed_2 = ...

sales_employed_2.scatter('Employed')

In [None]:
grader.check("q4_1")

Set the breakpoint to 90.  This is the predictor variable value at which an apparent discontinuity occurs.  
Add a variable to the dataset for switching (1 means number of days employed is greater than breakpoint, 0 means otherwise).  
Add a variable to the dataset for adjustment, like this: $(\textit{number of days employed} - \textit{breakpoint}) \times \textit{switch}$  
Show the resulting dataset.
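
The switch and adjustment definitions above can be illustrated on a plain NumPy array. This is only a sketch with hypothetical names and values (`days`, `break_day`); the lab itself works with table columns.

```python
import numpy as np

# Hypothetical days-employed values (NOT from Reynolds.csv)
days = np.array([70, 80, 90, 100, 110])
break_day = 90  # hypothetical name, plays the role of the breakpoint

# switch: 1 where days employed is greater than the breakpoint, 0 otherwise
switch = (days > break_day).astype(int)

# adjust: (days employed - breakpoint) * switch
adjust = (days - break_day) * switch

print(switch)
print(adjust)
```

Note that an observation exactly at the breakpoint gets a switch of 0, so its adjustment is 0 as well.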

In [None]:
breakpoint = 90

switch = ...
switch_adjustment = sales_employed_2.with_column('switch', (switch).astype(int))

adjustment = ...
switch_adjustment = switch_adjustment.with_column('adjust', adjustment)

switch_adjustment.show()

In [None]:
grader.check("q4_2")

Build a linear regression model to predict a salesperson's average sales based on number of days employed and an adjustment.  
Show the model goodness of fit (R^2).  
Show the model parameters.
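
With the adjustment variable, the model has the form $\textit{sales} = b_0 + b_1 \times \textit{days} + b_2 \times \textit{adjust}$ (the symbols $b_0$, $b_1$, $b_2$ are notation assumed here). Up to the breakpoint the slope is $b_1$; beyond it the slope is $b_1 + b_2$, and because the adjustment is 0 at the breakpoint, the two segments meet there. The sketch below fits this form to made-up, noise-free data using `numpy.linalg.lstsq` as a stand-in for the lab's regression; all names and values are hypothetical.

```python
import numpy as np

# Hypothetical data whose slope changes at day 90 (NOT from Reynolds.csv)
days = np.array([60.0, 70.0, 80.0, 90.0, 100.0, 110.0, 120.0])
sales = np.array([10.0, 12.0, 14.0, 16.0, 21.0, 26.0, 31.0])
break_day = 90  # hypothetical name, plays the role of the breakpoint

# adjust = (days - breakpoint) * switch, where switch is 1 above the breakpoint
adjust = (days - break_day) * (days > break_day)

# Design matrix with columns: intercept, days, adjust
X = np.column_stack([np.ones_like(days), days, adjust])
coef = np.linalg.lstsq(X, sales, rcond=None)[0]
b0, b1, b2 = coef

slope_before = b1        # slope at or below the breakpoint
slope_after = b1 + b2    # slope above the breakpoint

print(b0, slope_before, slope_after)
```

Unlike two separately fitted lines, this single model changes slope at the breakpoint without jumping.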

In [None]:
model_2 = ...
model_2.rsquared
model_2.params

In [None]:
grader.check("q4_3")

Add a variable to the dataset for predicted sales.  Show the resulting dataset.

In [None]:
predicted_sales_2 = ...

switch_adjustment_preds = switch_adjustment.with_column('sales_predicted', predicted_sales_2)
switch_adjustment_preds.show()

In [None]:
grader.check("q4_4")

Show the RMSE calculated based on the dataset.
Visualize the performance of the model as a scatterplot of average sales and predicted average sales
vs. number of days employed.

In [None]:
RMSE_2 = ...
RMSE_2

selected_cols_switch_adjust = switch_adjustment_preds.select('Employed','Sales', 'sales_predicted')
selected_cols_switch_adjust.scatter('Employed')

In [None]:
grader.check("q4_5")

Predict the average sales of a salesperson that has been employed for 80 days.

In [None]:
# 80 days is below the breakpoint, so switch is 0 and the adjustment is 0
sales_80_prediction_2 = model_2.predict(Table().with_columns('Employed', 80, 'adjust', (80-breakpoint)*0))
sales_80_prediction_2

In [None]:
grader.check("q4_6")

Predict the average sales of a salesperson that has been employed for 100 days.

In [None]:
sales_100_prediction_2 = model_2.predict(Table().with_columns('Employed', 100, 'adjust', (100-breakpoint)*1))
sales_100_prediction_2

In [None]:
grader.check("q4_7")

Predict the average sales of a salesperson that has been employed for 90 days.

In [None]:
# 90 days is not greater than the breakpoint, so switch is 0 and the adjustment is 0
sales_90_prediction = model_2.predict(Table().with_columns('Employed', 90, 'adjust', (90-breakpoint)*0))
sales_90_prediction

In [None]:
grader.check("q4_8")

<p style="text-align:left; font-size:10px;">
Copyright (c) Huntsinger Associates, LLC
<span style="float:right;">
Document revised November 18, 2023
</span>
</p>