{

"nbformat": 4,

"nbformat_minor": 0,

"metadata": {

"kernelspec": {

"display_name": "Python 3",

"language": "python",

"name": "python3"

},

"language_info": {

"codemirror_mode": {

"name": "ipython",

"version": 3

},

"file_extension": ".py",

"mimetype": "text/x-python",

"name": "python",

"nbconvert_exporter": "python",

"pygments_lexer": "ipython3",

"version": "3.7.3"

},

"colab": {

"name": "Flock Freight Takehome Response.ipynb",

"provenance": \[\],

"collapsed_sections": \[

"yMGlbagJLcvS",

"oHF5_ZHDLcvV"

\]

}

},

"cells": \[

{

"cell_type": "markdown",

"metadata": {

"id": "mQIPVEGFLcvJ"

},

"source": \[

"\# Flock Freight Value Metrics\\n",

"\\n",

"\#\# Goal\\n",

"\\n",

"1. Create some metric(s) that quantify the value of a carrier
relationship.\\n",

"2. Assess the ability to predict this value.\\n",

"3. Summarize the findings. Imagine this needs to be communicated to the
carrier relationship team in an understandable way.\\n",

"4. Explain the approach you took in part 1, why you chose it, and what
its limitations are.\\n",

"5. Explain what you would ideally do next, if you were doing this in
your job and had the time and resources you needed to do it to your
satisfaction.\\n",

"\\n",

"\#\# Ideas\\n",

"\\n",

"- Customer lifetime value (LTV)\\n",

" - Pros: Standard business metric, units (dollars) are
understandable\\n",

" - Cons: Can be hard to model real costs with marketing, call center,
etc. A simple churn model isn't going to work given the sporadic jobs\\n",

"- Monthly net revenue\\n",

" - Pros: Can start simple and build, units (dollars per month) are
understandable\\n",

" - Cons: Not as useful as LTV for planning marketing or acquisition
costs\\n",

"- Churn model using
\[Convoys\](https://better.engineering/convoys/)\\n",

" - Pros: I have been wanting to use the model for a while, it is
well-suited to the \\"rolling start\\" data, and the trucking name pun
is gold\\n",

" - Cons: Learning a new library in limited time, not sure how to
leverage for useful predictions\\n",

"- Probability of becoming a large revenue customer\\n",

" - Pros: Turns problem into a binary supervised learning problem,
demonstrates I can do ML\\n",

" - Cons: Without understanding the domains and goals it is likely to
produce garbage\\n",

"- Probability of accepting job\\n",

" - Pros: Simple metric, easy to finish in time, likely to be right\\n",

" - Cons: Simple metric, is a less intuitive \\"value of relationship\\"
measure\\n",

" \\n",

"Based on the pros and cons (and limited time) I think I am going to go
with monthly net and probability of acceptance.\\n",

"\\n",

"\#\# Future Work\\n",

"\\n",

"Top 3 things I would do if this were a real task, in priority
order:\\n",

"\\n",

"1. Better understand the business. Impossible to make a good LTV model
without that.\\n",

"2. Learn what value metrics would be useful. Talk with consumers of the
information.\\n",

"3. Spend more time looking at the data. Get crisp on missing values,
seasonality, data generation process. Meaning of some terms needs
clarity (see below)\\n",

"\\n",

"The two value metrics I worked on below each have their own \\"Future
Work\\" section.\\n",

"\\n",

"\#\# Specific future tasks\\n",

"\\n",

"- Sometimes \`LOADS\` is larger than \`OFFERS\` + \`SELF_SERVE_OFFERS\`
in a single row. Figure out why.\\n",

" - My guess: delay between calendar week of offer and load. This is
supported by the existence of fields like \`OFFERS_REVENUE_HAULED\`? If
so, a data format with a record per offer and proper occurred-at
timestamps would be better.\\n",

" - Example: Using only 2020 data for ID
\`9f22382a-efdb-4597-a057-bb5959bcdb00\` Loads \`768\` Offers (self and
regular) \`195\`.\\n",

" - Actually, Loads \> Total Offers seems to be true for the majority of
carriers. Must be that loads happen without offers?\\n",

"- Total Escalations is almost as large as sum of Loads. That seems very
high, so I must be misunderstanding something?\\n",

"- Need to understand the mechanism for the partial data in next to last
week (see below)\\n",

"\\n",

"\#\# Time Log\\n",

"\\n",

"- Friday night: \\n",

" - \~30 minutes reading documentation, downloading CSV and poking
around in Tableau\\n",

"- Saturday: \\n",

" - 9:30-10:45 planning, documentation\\n",

" - 11:15-12:00 math + code for Wilson\\n",

" - 1:00-1:45 code: finished up Wilson + validation\\n",

"- Monday\\n",

" - 7:00-8:00 simple net revenue predictions, documentation\\n",

" - 8:30-10:00 pivot to \`REVENUE_HAULED\`, debugging\\n",

" - 10:00-12:00 regression, more debugging, documentation"

\]

},

{

"cell_type": "code",

"metadata": {

"id": "SDcLBKAHLcvK",

"outputId": "ff75fd16-cc9d-4f1e-fb04-a24555a08dd3"

},

"source": \[

"print(f\\"Total Hours {(30+75+45+45+60+90+120)/60}\\")"

\],

"execution_count": null,

"outputs": \[

{

"name": "stdout",

"output_type": "stream",

"text": \[

"Total Hours 7.75\\n"

\]

}

\]

},

{

"cell_type": "code",

"metadata": {

"id": "QcGxFbOjLcvL"

},

"source": \[

"import pandas as pd\\n",

"import numpy as np\\n",

"from datetime import datetime, timedelta\\n",

"from matplotlib import pyplot as plt\\n",

"%matplotlib inline"

\],

"execution_count": null,

"outputs": \[\]

},

{

"cell_type": "markdown",

"metadata": {

"id": "A4YdR26oLcvL"

},

"source": \[

"\# Likelihood of Delay Escalation\\n",

"\\n",

"This was \*\*Probability of Accepting a job\*\*, but something is up
with the \`Offers\` and \`Loads\` data (see \\"Specific future tasks\\"
at top). I had already done the scoring code and didn't want to toss it
given the limited time, so I pivoted to likelihood of escalation for a
delay.\\n",

"\\n",

"\#\# Goal\\n",

"\\n",

"Rank carriers by the likelihood that they will have delay escalations
based on their past performance.\\n",

"\\n",

"\#\# Approach\\n",

"\\n",

"The obvious approach is to estimate the probability of delay escalation
based on the carrier's past performance using something like the ratio
of delays \$D\$ to loads \$L\$:\\n",

"\\n",

"\\\\begin{equation\*}\\n",

"p(delay) \\\\approx \\\\frac{D}{L}\\n",

"\\\\end{equation\*}\\n",

"\\n",

"But that can be a bad estimator for small samples. For example a
carrier with a single load that is delayed probably isn't as likely to
have delays as a carrier with 100 loads that \*all\* had delays.\\n",

"\\n",

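"The solution I like for this is the Wilson score interval, with \$\\\\alpha = D\$ delay escalations out of \$N = L\$ loads. The code below uses the lower bound as the score:\\n",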

"\\n",

"\\\\begin{equation\*}\\n",

"\\\\frac{\\\\alpha + \\\\frac{z^2}{2}}{N+z^2} \\\\pm \\\\frac{z}{N+z^2}
\\\\sqrt{\\\\frac{\\\\alpha\\\\left(N-\\\\alpha\\\\right)}{N} +
\\\\frac{z^2}{4}}\\n",

"\\\\end{equation\*}\\n",

"\\n",

"For a 0.95 confidence level we can use \$z\$ = 1.96.\\n",
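
"\\n",

"As a worked check of the lower bound, take \$\\\\alpha = 3\$ delay escalations out of \$N = 8\$ loads, which matches the first carrier in the scored table further down (0.136842):\\n",

"\\n",

"\\\\begin{equation\*}\\n",

"\\\\frac{3 + 1.9208}{8 + 3.8416} - \\\\frac{1.96}{8 + 3.8416} \\\\sqrt{\\\\frac{3 \\\\cdot 5}{8} + 0.9604} \\\\approx 0.4156 - 0.1655 \\\\times 1.6839 \\\\approx 0.137\\n",

"\\\\end{equation\*}\\n",

"\\n",

"That is well below the raw ratio \$3/8 = 0.375\$, which is the intended penalty for a small sample.\\n",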

"\\n",

"For background:\\n",

"- I really like the article \\"\[how not to sort by average
rating\](https://www.evanmiller.org/how-not-to-sort-by-average-rating.html)\\"
as an overview for using the Wilson score for problems like this. Other
resources:\\n",

"- \[This has nice theoretical
motivation\](https://en.wikipedia.org/wiki/Binomial_proportion_confidence_interval\#Wilson_score_interval)\\n",

"- Wilson score is also \[Reddit's \\"best\\"
score\](https://redditblog.com/2009/10/15/reddits-new-comment-sorting-system/)\\n",

"\\n",

"\#\# Future work\\n",

"\\n",

"- Get data with load per row (see notes at top) so that I can actually
do percentage of loads with delay rather than this messy weekly
sum.\\n",

"- Use additional information about the carrier (not just past
performance) to make a more predictive model.\\n",

"- Explore seasonality (likely to be significant because of weather)."

\]

},

{

"cell_type": "code",

"metadata": {

"id": "Pmj91ibVLcvM",

"outputId": "3f61e776-1149-4c4e-8e73-33dba743db72"

},

"source": \[

"df = pd.read_csv(\\"carrier_valuation_takehome_data.csv\\")\\n",

"df.head()"

\],

"execution_count": null,

"outputs": \[

{

"data": {

"text/html": \[

"\<div\>\\n",

"\<style scoped\>\\n",

" .dataframe tbody tr th:only-of-type {\\n",

" vertical-align: middle;\\n",

" }\\n",

"\\n",

" .dataframe tbody tr th {\\n",

" vertical-align: top;\\n",

" }\\n",

"\\n",

" .dataframe thead th {\\n",

" text-align: right;\\n",

" }\\n",

"\</style\>\\n",

"\<table border=\\"1\\" class=\\"dataframe\\"\>\\n",

" \<thead\>\\n",

" \<tr style=\\"text-align: right;\\"\>\\n",

" \<th\>\</th\>\\n",

" \<th\>ID\</th\>\\n",

" \<th\>CALENDAR_WEEK\</th\>\\n",

" \<th\>TRACKING_SUPPORTED\</th\>\\n",

" \<th\>BOL_NUMBER_TRACKING_SUPPORTED\</th\>\\n",

" \<th\>PICKUP_NUMBER_TRACKING_SUPPORTED\</th\>\\n",

" \<th\>AUTOMATIC_CHECK_IN_SUPPORTED\</th\>\\n",

" \<th\>DOCUMENT_RETRIEVAL_SUPPORTED\</th\>\\n",

" \<th\>POWER_UNITS\</th\>\\n",

" \<th\>LOADS\</th\>\\n",

" \<th\>SELF_SERVE_LOADS\</th\>\\n",

" \<th\>...\</th\>\\n",

" \<th\>ESCALATIONS_APPT\</th\>\\n",

" \<th\>ESCALATIONS_CLAIMS\</th\>\\n",

" \<th\>ESCALATIONS_FEES\</th\>\\n",

" \<th\>ESCALATIONS_TRACKING\</th\>\\n",

" \<th\>ESCALATIONS_EQUIPMENT\</th\>\\n",

" \<th\>ESCALATIONS_EDIT\</th\>\\n",

" \<th\>ESCALATIONS_OTHER\</th\>\\n",

" \<th\>ESCALATIONS_DELAY\</th\>\\n",

" \<th\>ESCALATIONS_DETAILS\</th\>\\n",

" \<th\>ESCALATIONS_AVG_RESPONSE_SECONDS\</th\>\\n",

" \</tr\>\\n",

" \</thead\>\\n",

" \<tbody\>\\n",

" \<tr\>\\n",

" \<th\>0\</th\>\\n",

" \<td\>00008702-898c-4fbf-a65e-386927d6d219\</td\>\\n",

" \<td\>2019-12-16 00:00:00\</td\>\\n",

" \<td\>False\</td\>\\n",

" \<td\>False\</td\>\\n",

" \<td\>False\</td\>\\n",

" \<td\>False\</td\>\\n",

" \<td\>False\</td\>\\n",

" \<td\>NaN\</td\>\\n",

" \<td\>0\</td\>\\n",

" \<td\>0\</td\>\\n",

" \<td\>...\</td\>\\n",

" \<td\>0\</td\>\\n",

" \<td\>0\</td\>\\n",

" \<td\>0\</td\>\\n",

" \<td\>0\</td\>\\n",

" \<td\>0\</td\>\\n",

" \<td\>0\</td\>\\n",

" \<td\>0\</td\>\\n",

" \<td\>0\</td\>\\n",

" \<td\>0\</td\>\\n",

" \<td\>0.0\</td\>\\n",

" \</tr\>\\n",

" \<tr\>\\n",

" \<th\>1\</th\>\\n",

" \<td\>00008702-898c-4fbf-a65e-386927d6d219\</td\>\\n",

" \<td\>2019-12-23 00:00:00\</td\>\\n",

" \<td\>False\</td\>\\n",

" \<td\>False\</td\>\\n",

" \<td\>False\</td\>\\n",

" \<td\>False\</td\>\\n",

" \<td\>False\</td\>\\n",

" \<td\>NaN\</td\>\\n",

" \<td\>0\</td\>\\n",

" \<td\>0\</td\>\\n",

" \<td\>...\</td\>\\n",

" \<td\>0\</td\>\\n",

" \<td\>0\</td\>\\n",

" \<td\>0\</td\>\\n",

" \<td\>0\</td\>\\n",

" \<td\>0\</td\>\\n",

" \<td\>0\</td\>\\n",

" \<td\>0\</td\>\\n",

" \<td\>0\</td\>\\n",

" \<td\>0\</td\>\\n",

" \<td\>0.0\</td\>\\n",

" \</tr\>\\n",

" \<tr\>\\n",

" \<th\>2\</th\>\\n",

" \<td\>00008702-898c-4fbf-a65e-386927d6d219\</td\>\\n",

" \<td\>2019-12-30 00:00:00\</td\>\\n",

" \<td\>False\</td\>\\n",

" \<td\>False\</td\>\\n",

" \<td\>False\</td\>\\n",

" \<td\>False\</td\>\\n",

" \<td\>False\</td\>\\n",

" \<td\>NaN\</td\>\\n",

" \<td\>0\</td\>\\n",

" \<td\>0\</td\>\\n",

" \<td\>...\</td\>\\n",

" \<td\>0\</td\>\\n",

" \<td\>0\</td\>\\n",

" \<td\>0\</td\>\\n",

" \<td\>0\</td\>\\n",

" \<td\>0\</td\>\\n",

" \<td\>0\</td\>\\n",

" \<td\>0\</td\>\\n",

" \<td\>0\</td\>\\n",

" \<td\>0\</td\>\\n",

" \<td\>0.0\</td\>\\n",

" \</tr\>\\n",

" \<tr\>\\n",

" \<th\>3\</th\>\\n",

" \<td\>00008702-898c-4fbf-a65e-386927d6d219\</td\>\\n",

" \<td\>2020-01-06 00:00:00\</td\>\\n",

" \<td\>False\</td\>\\n",

" \<td\>False\</td\>\\n",

" \<td\>False\</td\>\\n",

" \<td\>False\</td\>\\n",

" \<td\>False\</td\>\\n",

" \<td\>NaN\</td\>\\n",

" \<td\>0\</td\>\\n",

" \<td\>0\</td\>\\n",

" \<td\>...\</td\>\\n",

" \<td\>0\</td\>\\n",

" \<td\>0\</td\>\\n",

" \<td\>0\</td\>\\n",

" \<td\>0\</td\>\\n",

" \<td\>0\</td\>\\n",

" \<td\>0\</td\>\\n",

" \<td\>0\</td\>\\n",

" \<td\>0\</td\>\\n",

" \<td\>0\</td\>\\n",

" \<td\>0.0\</td\>\\n",

" \</tr\>\\n",

" \<tr\>\\n",

" \<th\>4\</th\>\\n",

" \<td\>00008702-898c-4fbf-a65e-386927d6d219\</td\>\\n",

" \<td\>2020-01-13 00:00:00\</td\>\\n",

" \<td\>False\</td\>\\n",

" \<td\>False\</td\>\\n",

" \<td\>False\</td\>\\n",

" \<td\>False\</td\>\\n",

" \<td\>False\</td\>\\n",

" \<td\>NaN\</td\>\\n",

" \<td\>0\</td\>\\n",

" \<td\>0\</td\>\\n",

" \<td\>...\</td\>\\n",

" \<td\>0\</td\>\\n",

" \<td\>0\</td\>\\n",

" \<td\>0\</td\>\\n",

" \<td\>0\</td\>\\n",

" \<td\>0\</td\>\\n",

" \<td\>0\</td\>\\n",

" \<td\>0\</td\>\\n",

" \<td\>0\</td\>\\n",

" \<td\>0\</td\>\\n",

" \<td\>0.0\</td\>\\n",

" \</tr\>\\n",

" \</tbody\>\\n",

"\</table\>\\n",

"\<p\>5 rows × 38 columns\</p\>\\n",

"\</div\>"

\],

"text/plain": \[

" ID CALENDAR_WEEK \\\\\\n",

"0 00008702-898c-4fbf-a65e-386927d6d219 2019-12-16 00:00:00 \\n",

"1 00008702-898c-4fbf-a65e-386927d6d219 2019-12-23 00:00:00 \\n",

"2 00008702-898c-4fbf-a65e-386927d6d219 2019-12-30 00:00:00 \\n",

"3 00008702-898c-4fbf-a65e-386927d6d219 2020-01-06 00:00:00 \\n",

"4 00008702-898c-4fbf-a65e-386927d6d219 2020-01-13 00:00:00 \\n",

"\\n",

" TRACKING_SUPPORTED BOL_NUMBER_TRACKING_SUPPORTED \\\\\\n",

"0 False False \\n",

"1 False False \\n",

"2 False False \\n",

"3 False False \\n",

"4 False False \\n",

"\\n",

" PICKUP_NUMBER_TRACKING_SUPPORTED AUTOMATIC_CHECK_IN_SUPPORTED
\\\\\\n",

"0 False False \\n",

"1 False False \\n",

"2 False False \\n",

"3 False False \\n",

"4 False False \\n",

"\\n",

" DOCUMENT_RETRIEVAL_SUPPORTED POWER_UNITS LOADS SELF_SERVE_LOADS ...
\\\\\\n",

"0 False NaN 0 0 ... \\n",

"1 False NaN 0 0 ... \\n",

"2 False NaN 0 0 ... \\n",

"3 False NaN 0 0 ... \\n",

"4 False NaN 0 0 ... \\n",

"\\n",

" ESCALATIONS_APPT ESCALATIONS_CLAIMS ESCALATIONS_FEES \\\\\\n",

"0 0 0 0 \\n",

"1 0 0 0 \\n",

"2 0 0 0 \\n",

"3 0 0 0 \\n",

"4 0 0 0 \\n",

"\\n",

" ESCALATIONS_TRACKING ESCALATIONS_EQUIPMENT ESCALATIONS_EDIT \\\\\\n",

"0 0 0 0 \\n",

"1 0 0 0 \\n",

"2 0 0 0 \\n",

"3 0 0 0 \\n",

"4 0 0 0 \\n",

"\\n",

" ESCALATIONS_OTHER ESCALATIONS_DELAY ESCALATIONS_DETAILS \\\\\\n",

"0 0 0 0 \\n",

"1 0 0 0 \\n",

"2 0 0 0 \\n",

"3 0 0 0 \\n",

"4 0 0 0 \\n",

"\\n",

" ESCALATIONS_AVG_RESPONSE_SECONDS \\n",

"0 0.0 \\n",

"1 0.0 \\n",

"2 0.0 \\n",

"3 0.0 \\n",

"4 0.0 \\n",

"\\n",

"\[5 rows x 38 columns\]"

\]

},

"execution_count": 3,

"metadata": {},

"output_type": "execute_result"

}

\]

},

{

"cell_type": "code",

"metadata": {

"id": "zRu8JTlmLcvM"

},

"source": \[

"def wilson(alpha, N, z=1.96, trimAlpha=True):\\n",

"    \# Lower bound of the Wilson score interval for alpha events in N trials\\n",

"    if alpha == 0:\\n",

"        return 0\\n",

"\\n",

"    if alpha \> N:\\n",

"        if trimAlpha:\\n",

"            alpha = N  \# cap the count so the proportion cannot exceed 1\\n",

"        else:\\n",

"            raise ValueError('Alpha cannot be larger than N')\\n",

"\\n",

"    z2 = z \*\* 2\\n",

"    center = (alpha + z2 / 2) / (N + z2)\\n",

"    halfWidth = (z / (N + z2)) \* np.sqrt(alpha \* (N - alpha) / N + z2 / 4)\\n",

"\\n",

"    \# Return only the lower bound, which is what the ranking uses\\n",

"    return center - halfWidth\\n",

"\\n",

"\# Plain ratio for comparison, with the same cap at 1\\n",

"def correctedMean(alpha, N):\\n",

"    return min(alpha, N) / N"

\],

"execution_count": null,

"outputs": \[\]

},

{

"cell_type": "code",

"metadata": {

"id": "PRyw_64dLcvM"

},

"source": \[

"\# Used this multiple times, so decided to wrap it as a function\\n",

"def addScores(rawDf):\\n",

"    \# Select columns with a list; newer pandas rejects tuple-style selection on a groupby\\n",

"    df = rawDf.groupby(\\"ID\\")\[\['ESCALATIONS_DELAY', 'LOADS'\]\].sum()\\n",

"    \# Drop carriers with no loads (no prediction)\\n",

"    df = df\[df\['LOADS'\] != 0\].copy()\\n",

"    \# Calculate Delay Wilson Score\\n",

"    df\[\\"DELAY_WILSON_SCORE\\"\] = df.apply(\\n",

"        lambda row: wilson(row\['ESCALATIONS_DELAY'\], row\['LOADS'\]), axis=1)\\n",

"    \# Plain delay ratio for comparison\\n",

"    df\[\\"DELAY_MEAN\\"\] = df.apply(\\n",

"        lambda row: correctedMean(row\['ESCALATIONS_DELAY'\], row\['LOADS'\]), axis=1)\\n",

"    return df"

\],

"execution_count": null,

"outputs": \[\]

},

{

"cell_type": "code",

"metadata": {

"id": "F7dQ9aFaLcvN",

"outputId": "80457c21-8115-4517-9d5e-751331708809"

},

"source": \[

"loadsDf = addScores(df)\\n",

"\\n",

"loadsDf.head()"

\],

"execution_count": null,

"outputs": \[

{

"data": {

"text/html": \[

"\<div\>\\n",

"\<style scoped\>\\n",

" .dataframe tbody tr th:only-of-type {\\n",

" vertical-align: middle;\\n",

" }\\n",

"\\n",

" .dataframe tbody tr th {\\n",

" vertical-align: top;\\n",

" }\\n",

"\\n",

" .dataframe thead th {\\n",

" text-align: right;\\n",

" }\\n",

"\</style\>\\n",

"\<table border=\\"1\\" class=\\"dataframe\\"\>\\n",

" \<thead\>\\n",

" \<tr style=\\"text-align: right;\\"\>\\n",

" \<th\>\</th\>\\n",

" \<th\>ESCALATIONS_DELAY\</th\>\\n",

" \<th\>LOADS\</th\>\\n",

" \<th\>DELAY_WILSON_SCORE\</th\>\\n",

" \<th\>DELAY_MEAN\</th\>\\n",

" \</tr\>\\n",

" \<tr\>\\n",

" \<th\>ID\</th\>\\n",

" \<th\>\</th\>\\n",

" \<th\>\</th\>\\n",

" \<th\>\</th\>\\n",

" \<th\>\</th\>\\n",

" \</tr\>\\n",

" \</thead\>\\n",

" \<tbody\>\\n",

" \<tr\>\\n",

" \<th\>001b992a-d64d-46f4-98a8-4ae08d300945\</th\>\\n",

" \<td\>3\</td\>\\n",

" \<td\>8\</td\>\\n",

" \<td\>0.136842\</td\>\\n",

" \<td\>0.375000\</td\>\\n",

" \</tr\>\\n",

" \<tr\>\\n",

" \<th\>00241b8b-d8f0-4623-a0fe-03f679a337ad\</th\>\\n",

" \<td\>2\</td\>\\n",

" \<td\>11\</td\>\\n",

" \<td\>0.051367\</td\>\\n",

" \<td\>0.181818\</td\>\\n",

" \</tr\>\\n",

" \<tr\>\\n",

" \<th\>002f1b0a-8372-4534-b8f1-5912c03d126c\</th\>\\n",

" \<td\>0\</td\>\\n",

" \<td\>5\</td\>\\n",

" \<td\>0.000000\</td\>\\n",

" \<td\>0.000000\</td\>\\n",

" \</tr\>\\n",

" \<tr\>\\n",

" \<th\>003650f8-3ca5-4292-827a-4456e76a73a8\</th\>\\n",

" \<td\>3\</td\>\\n",

" \<td\>10\</td\>\\n",

" \<td\>0.107789\</td\>\\n",

" \<td\>0.300000\</td\>\\n",

" \</tr\>\\n",

" \<tr\>\\n",

" \<th\>004371a5-4624-4821-a66b-525e2d1cd455\</th\>\\n",

" \<td\>0\</td\>\\n",

" \<td\>5\</td\>\\n",

" \<td\>0.000000\</td\>\\n",

" \<td\>0.000000\</td\>\\n",

" \</tr\>\\n",

" \</tbody\>\\n",

"\</table\>\\n",

"\</div\>"

\],

"text/plain": \[

" ESCALATIONS_DELAY LOADS \\\\\\n",

"ID \\n",

"001b992a-d64d-46f4-98a8-4ae08d300945 3 8 \\n",

"00241b8b-d8f0-4623-a0fe-03f679a337ad 2 11 \\n",

"002f1b0a-8372-4534-b8f1-5912c03d126c 0 5 \\n",

"003650f8-3ca5-4292-827a-4456e76a73a8 3 10 \\n",

"004371a5-4624-4821-a66b-525e2d1cd455 0 5 \\n",

"\\n",

" DELAY_WILSON_SCORE DELAY_MEAN \\n",

"ID \\n",

"001b992a-d64d-46f4-98a8-4ae08d300945 0.136842 0.375000 \\n",

"00241b8b-d8f0-4623-a0fe-03f679a337ad 0.051367 0.181818 \\n",

"002f1b0a-8372-4534-b8f1-5912c03d126c 0.000000 0.000000 \\n",

"003650f8-3ca5-4292-827a-4456e76a73a8 0.107789 0.300000 \\n",

"004371a5-4624-4821-a66b-525e2d1cd455 0.000000 0.000000 "

\]

},

"execution_count": 6,

"metadata": {},

"output_type": "execute_result"

}

\]

},

{

"cell_type": "markdown",

"metadata": {

"id": "4a89h8E0LcvO"

},

"source": \[

"\#\# Validation \\n",

"\\n",

"One of the goals is \\"Assess the ability to predict this value.\\"
Let's do a simple chronological validation.\\n",

"\\n",

"Validation metric: RMSE to start, but for future work this might be a
good candidate for a ranking metric, since that is probably closer to
what we care about.\\n",

"\\n",

"\#\#\# Notes\\n",

"\\n",

"- I tried to use only the last week of data, but it was partial and
there are only 20 loads, so I went back one more.\\n",

"\\n",

"\#\#\# Conclusions\\n",

"\\n",

"I am pleasantly surprised that the Wilson score has a lower RMSE than
the mean, but for this example it did! I ran it for a few different
weeks, and the results are pretty consistent. Nice!"

\]

},

{

"cell_type": "code",

"metadata": {

"id": "sdGDuPp6LcvO",

"outputId": "86355097-5b0a-4f90-cb7b-2f0817a82c1e"

},

"source": \[

"maxDate = \\"2021-02-15 00:00:00\\" \# \\"2021-02-22 00:00:00\\" \#
max(df\['CALENDAR_WEEK'\])\\n",

"validationDf = df\[df\['CALENDAR_WEEK'\] == maxDate\]\\n",

"validationDf = addScores(validationDf)\\n",

"\\n",

"print(f\\"{sum(validationDf\['LOADS'\])} loads\\")\\n",

"print(f\\"{sum(validationDf\['ESCALATIONS_DELAY'\])} delay
escalations\\")"

\],

"execution_count": null,

"outputs": \[

{

"name": "stdout",

"output_type": "stream",

"text": \[

"2963 loads\\n",

"564 delay escalations\\n"

\]

}

\]

},

{

"cell_type": "code",

"metadata": {

"id": "WEZrut4qLcvO"

},

"source": \[

"trainDf = df\[df\['CALENDAR_WEEK'\] \< maxDate\] \# I love
ISO-8601\\n",

"\\n",

"trainDf = addScores(trainDf)"

\],

"execution_count": null,

"outputs": \[\]

},

{

"cell_type": "code",

"metadata": {

"id": "uaKnlDJhLcvO",

"outputId": "f5db0568-f2f5-4aac-a415-7e072ded1c6f"

},

"source": \[

"metricsDf = pd.merge(validationDf, trainDf, on=\\"ID\\",
how=\\"inner\\", suffixes=(\\"\_TEST\\",\\"\_TRAIN\\"))\\n",

"metricsDf.head()"

\],

"execution_count": null,

"outputs": \[

{

"data": {

"text/html": \[

"\<div\>\\n",

"\<style scoped\>\\n",

" .dataframe tbody tr th:only-of-type {\\n",

" vertical-align: middle;\\n",

" }\\n",

"\\n",

" .dataframe tbody tr th {\\n",

" vertical-align: top;\\n",

" }\\n",

"\\n",

" .dataframe thead th {\\n",

" text-align: right;\\n",

" }\\n",

"\</style\>\\n",

"\<table border=\\"1\\" class=\\"dataframe\\"\>\\n",

" \<thead\>\\n",

" \<tr style=\\"text-align: right;\\"\>\\n",

" \<th\>\</th\>\\n",

" \<th\>ESCALATIONS_DELAY_TEST\</th\>\\n",

" \<th\>LOADS_TEST\</th\>\\n",

" \<th\>DELAY_WILSON_SCORE_TEST\</th\>\\n",

" \<th\>DELAY_MEAN_TEST\</th\>\\n",

" \<th\>ESCALATIONS_DELAY_TRAIN\</th\>\\n",

" \<th\>LOADS_TRAIN\</th\>\\n",

" \<th\>DELAY_WILSON_SCORE_TRAIN\</th\>\\n",

" \<th\>DELAY_MEAN_TRAIN\</th\>\\n",

" \</tr\>\\n",

" \<tr\>\\n",

" \<th\>ID\</th\>\\n",

" \<th\>\</th\>\\n",

" \<th\>\</th\>\\n",

" \<th\>\</th\>\\n",

" \<th\>\</th\>\\n",

" \<th\>\</th\>\\n",

" \<th\>\</th\>\\n",

" \<th\>\</th\>\\n",

" \<th\>\</th\>\\n",

" \</tr\>\\n",

" \</thead\>\\n",

" \<tbody\>\\n",

" \<tr\>\\n",

" \<th\>00e9b4c6-0e2b-4b36-8301-d2b023ec4c50\</th\>\\n",

" \<td\>3\</td\>\\n",

" \<td\>4\</td\>\\n",

" \<td\>0.300636\</td\>\\n",

" \<td\>0.750000\</td\>\\n",

" \<td\>68\</td\>\\n",

" \<td\>369\</td\>\\n",

" \<td\>0.148045\</td\>\\n",

" \<td\>0.184282\</td\>\\n",

" \</tr\>\\n",

" \<tr\>\\n",

" \<th\>01149542-a459-46fc-b505-b0ef757fd539\</th\>\\n",

" \<td\>0\</td\>\\n",

" \<td\>1\</td\>\\n",

" \<td\>0.000000\</td\>\\n",

" \<td\>0.000000\</td\>\\n",

" \<td\>2\</td\>\\n",

" \<td\>25\</td\>\\n",

" \<td\>0.022220\</td\>\\n",

" \<td\>0.080000\</td\>\\n",

" \</tr\>\\n",

" \<tr\>\\n",

" \<th\>014a0ffd-dbfa-4056-b38e-8a4c51056397\</th\>\\n",

" \<td\>1\</td\>\\n",

" \<td\>3\</td\>\\n",

" \<td\>0.061490\</td\>\\n",

" \<td\>0.333333\</td\>\\n",

" \<td\>26\</td\>\\n",

" \<td\>67\</td\>\\n",

" \<td\>0.280489\</td\>\\n",

" \<td\>0.388060\</td\>\\n",

" \</tr\>\\n",

" \<tr\>\\n",

" \<th\>01fa8035-a806-4a09-8de0-596ed0cd4e8f\</th\>\\n",

" \<td\>0\</td\>\\n",

" \<td\>3\</td\>\\n",

" \<td\>0.000000\</td\>\\n",

" \<td\>0.000000\</td\>\\n",

" \<td\>10\</td\>\\n",

" \<td\>99\</td\>\\n",

" \<td\>0.055796\</td\>\\n",

" \<td\>0.101010\</td\>\\n",

" \</tr\>\\n",

" \<tr\>\\n",

" \<th\>020ba5a4-9b5c-46e2-8ff1-d767f9f0f515\</th\>\\n",

" \<td\>0\</td\>\\n",

" \<td\>2\</td\>\\n",

" \<td\>0.000000\</td\>\\n",

" \<td\>0.000000\</td\>\\n",

" \<td\>34\</td\>\\n",

" \<td\>86\</td\>\\n",

" \<td\>0.298623\</td\>\\n",

" \<td\>0.395349\</td\>\\n",

" \</tr\>\\n",

" \</tbody\>\\n",

"\</table\>\\n",

"\</div\>"

\],

"text/plain": \[

" ESCALATIONS_DELAY_TEST LOADS_TEST \\\\\\n",

"ID \\n",

"00e9b4c6-0e2b-4b36-8301-d2b023ec4c50 3 4 \\n",

"01149542-a459-46fc-b505-b0ef757fd539 0 1 \\n",

"014a0ffd-dbfa-4056-b38e-8a4c51056397 1 3 \\n",

"01fa8035-a806-4a09-8de0-596ed0cd4e8f 0 3 \\n",

"020ba5a4-9b5c-46e2-8ff1-d767f9f0f515 0 2 \\n",

"\\n",

" DELAY_WILSON_SCORE_TEST \\\\\\n",

"ID \\n",

"00e9b4c6-0e2b-4b36-8301-d2b023ec4c50 0.300636 \\n",

"01149542-a459-46fc-b505-b0ef757fd539 0.000000 \\n",

"014a0ffd-dbfa-4056-b38e-8a4c51056397 0.061490 \\n",

"01fa8035-a806-4a09-8de0-596ed0cd4e8f 0.000000 \\n",

"020ba5a4-9b5c-46e2-8ff1-d767f9f0f515 0.000000 \\n",

"\\n",

" DELAY_MEAN_TEST \\\\\\n",

"ID \\n",

"00e9b4c6-0e2b-4b36-8301-d2b023ec4c50 0.750000 \\n",

"01149542-a459-46fc-b505-b0ef757fd539 0.000000 \\n",

"014a0ffd-dbfa-4056-b38e-8a4c51056397 0.333333 \\n",

"01fa8035-a806-4a09-8de0-596ed0cd4e8f 0.000000 \\n",

"020ba5a4-9b5c-46e2-8ff1-d767f9f0f515 0.000000 \\n",

"\\n",

" ESCALATIONS_DELAY_TRAIN LOADS_TRAIN \\\\\\n",

"ID \\n",

"00e9b4c6-0e2b-4b36-8301-d2b023ec4c50 68 369 \\n",

"01149542-a459-46fc-b505-b0ef757fd539 2 25 \\n",

"014a0ffd-dbfa-4056-b38e-8a4c51056397 26 67 \\n",

"01fa8035-a806-4a09-8de0-596ed0cd4e8f 10 99 \\n",

"020ba5a4-9b5c-46e2-8ff1-d767f9f0f515 34 86 \\n",

"\\n",

" DELAY_WILSON_SCORE_TRAIN \\\\\\n",

"ID \\n",

"00e9b4c6-0e2b-4b36-8301-d2b023ec4c50 0.148045 \\n",

"01149542-a459-46fc-b505-b0ef757fd539 0.022220 \\n",

"014a0ffd-dbfa-4056-b38e-8a4c51056397 0.280489 \\n",

"01fa8035-a806-4a09-8de0-596ed0cd4e8f 0.055796 \\n",

"020ba5a4-9b5c-46e2-8ff1-d767f9f0f515 0.298623 \\n",

"\\n",

" DELAY_MEAN_TRAIN \\n",

"ID \\n",

"00e9b4c6-0e2b-4b36-8301-d2b023ec4c50 0.184282 \\n",

"01149542-a459-46fc-b505-b0ef757fd539 0.080000 \\n",

"014a0ffd-dbfa-4056-b38e-8a4c51056397 0.388060 \\n",

"01fa8035-a806-4a09-8de0-596ed0cd4e8f 0.101010 \\n",

"020ba5a4-9b5c-46e2-8ff1-d767f9f0f515 0.395349 "

\]

},

"execution_count": 9,

"metadata": {},

"output_type": "execute_result"

}

\]

},

{

"cell_type": "code",

"metadata": {

"id": "YO5uLj0iLcvO"

},

"source": \[

"\# Root Mean Square Error\\n",

"def rmse(x,y):\\n",

"    return np.sqrt(np.mean((x - y) \*\* 2))"

\],

"execution_count": null,

"outputs": \[\]

},

{

"cell_type": "code",

"metadata": {

"id": "LYGXFRhELcvP",

"outputId": "9c5b3ed7-0fa4-43d1-c1f9-a4c2f4fc9d3f"

},

"source": \[

"rmse(metricsDf\['DELAY_MEAN_TRAIN'\], metricsDf\['DELAY_MEAN_TEST'\])"

\],

"execution_count": null,

"outputs": \[

{

"data": {

"text/plain": \[

"0.3939428175239132"

\]

},

"execution_count": 11,

"metadata": {},

"output_type": "execute_result"

}

\]

},

{

"cell_type": "code",

"metadata": {

"id": "RuYu_jUsLcvP",

"outputId": "2abc3b4d-e2be-493e-d535-dd535e42b559"

},

"source": \[

"rmse(metricsDf\['DELAY_WILSON_SCORE_TRAIN'\],
metricsDf\['DELAY_MEAN_TEST'\])"

\],

"execution_count": null,

"outputs": \[

{

"data": {

"text/plain": \[

"0.30223122277532216"

\]

},

"execution_count": 12,

"metadata": {},

"output_type": "execute_result"

}

\]

},

{

"cell_type": "markdown",

"metadata": {

"id": "0fQOWFIcLcvQ"

},

"source": \[

"\# Monthly Net Revenue\\n",

"\\n",

"\#\# Goal\\n",

"\\n",

"Predict monthly net revenue per carrier.\\n",

"\\n",

"\#\# Approach\\n",

"\\n",

"1. Figure out which fields relate to revenue and cost (looks like
\`REVENUE_HAULED\` and \`ACTUAL_COST\`, but I would want to verify
that.)\\n",

"2. Do the simple predictions\\n",

"3. \<s\>Build a simple time series forecast model\</s\> Ran out of
time. I was just planning a basic regression model, but given the
failure of the average to predict I decided it was better to spend the
time debugging.\\n",

"\\n",

"\#\#\# Very simple predictions \\n",

"\\n",

"Let's try a few \*very\* simple predictions to start:\\n",

"\\n",

"1. Last Week: Predicted week = previous week (per carrier)\\n",

"2. Average: Predicted week = average of all previous weeks (per
carrier)\\n",

"3. Zero: since the mode is 0, what happens if I just predict 0 all the
time?\\n",

"\\n",

"This will establish a baseline for prediction.\\n",

"\\n",

"\#\# Notes\\n",

"\\n",

"- I decided to do weekly predictions because of the format of the data.
Just multiply by 30/7 to get monthly.\\n",

"- This would normally be a different notebook, but I kept them together
to make sharing easier.\\n",

"\\n",

"\#\# Conclusions\\n",

"\\n",

"RMSE seems very high compared to average value, meaning these are bad
predictions. The best model of the three is just predicting zero all the
time.\\n",

"\\n",

"\| Model \| RMSE \|\\n",

"\| - \| - \|\\n",

"\| Last Week \| 895 \|\\n",

"\| Average \| 398 \|\\n",

"\| Zero \| 376 \|\\n",

"\\n",

"I think we have two problems:\\n",

"\\n",

"First, the per-carrier time series is pretty sporadic: almost all
values are 0. Predicting rare events is noisy!\\n",

"\\n",

"!\[image.png\](attachment:image.png)\\n",

"\\n",

"Second, \*\*net\*\* revenue is much noisier than either gross or costs,
because the difference between two numbers that are close greatly
magnifies the noise. I was hoping the averaging would overcome that
some, but it clearly didn't (or there is a bug in the code or an error
in my understanding of the problem). Reminds me of \[this XKCD
comic\](https://www.explainxkcd.com/wiki/index.php/2295:\_Garbage_Math).\\n",
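
"\\n",

"A rough sketch of why, assuming (as a simplification I have not checked against the data) that the week-to-week noise in revenue \$R\$ and cost \$C\$ is independent:\\n",

"\\n",

"\\\\begin{equation\*}\\n",

"\\\\sigma_{R-C} = \\\\sqrt{\\\\sigma_R^2 + \\\\sigma_C^2}\\n",

"\\\\end{equation\*}\\n",

"\\n",

"With illustrative numbers (not taken from the dataset), \$R = 1000 \\\\pm 50\$ and \$C = 950 \\\\pm 50\$ give a net of \$50 \\\\pm 71\$: each input is known to 5%, but the difference has more noise than signal. If revenue and cost move together, the positive covariance would shrink this.\\n",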

"\\n",

"It is also worth noting that these predictions may be useful for
long-term forecasting even if they have high error in single week
prediction.\\n",

"\\n",

"\\n",

"\#\# Future Work\\n",

"\\n",

"- Do proper LTV by adding sales, marketing, and support costs (see
notes at top)\\n",

"- Once there is enough history for seasonality, etc., try \[Facebook
Prophet\](https://facebook.github.io/prophet/), which has given me good
results in the past.\\n",

"- ML model with carrier features (didn't attempt because I don't know
the problem area / data well enough)."

\]

},

{

"cell_type": "code",

"metadata": {

"id": "35-e8GhlLcvQ"

},

"source": \[

"\# Grab a fresh copy of the data\\n",

"df = pd.read_csv(\\"carrier_valuation_takehome_data.csv\\")\\n",

"\\n",

"\# Replace this with a more accurate net value later\\n",

"df\['NET'\] = df\['REVENUE_HAULED'\] - df\['ACTUAL_COST'\]\\n",

"\\n",

"\# Last full week as validation data\\n",

"maxDate = \\"2021-02-22 00:00:00\\" \# max(df\['CALENDAR_WEEK'\])\\n",

"testSeries = df\[df\['CALENDAR_WEEK'\] ==
maxDate\].set_index(\\"ID\\")\['NET'\]\\n",

"trainDf = df\[df\['CALENDAR_WEEK'\] \< maxDate\]"

\],

"execution_count": null,

"outputs": \[\]

},

{

"cell_type": "code",

"metadata": {

"id": "YacUhZ0HLcvQ"

},

"source": \[

"def rmse_join(leftSeries, rightSeries, indexName=\\"ID\\"):\\n",

"    \# Pass in two pandas Series with an ID index, join on the indices, and compute RMSE\\n",

"    metricsDf = pd.merge(leftSeries, rightSeries, how=\\"inner\\", on=indexName)\\n",

"    return rmse(metricsDf.iloc\[:, 0\], metricsDf.iloc\[:, 1\])"

\],

"execution_count": null,

"outputs": \[\]

},

{

"cell_type": "code",

"metadata": {

"id": "tLToL71iLcvR",

"outputId": "c2ba2373-fd73-44af-8ad6-22e2945633b0"

},

"source": \[

"\#\# LAST WEEK\\n",

"\\n",

"\# Would have been faster to hard code this, but I wanted to show
robust coding\\n",

"weekAgoDatetime = datetime.fromisoformat(maxDate) -
timedelta(days=7)\\n",

"weekBeforeMaxDate =
datetime.isoformat(weekAgoDatetime).replace(\\"T\\", \\" \\") \# ISO
mismatch\\n",

"\\n",

"predSeries = df\[df\['CALENDAR_WEEK'\] ==
weekBeforeMaxDate\].set_index(\\"ID\\")\['NET'\]\\n",

"rmse_join(predSeries, testSeries)"

\],

"execution_count": null,

"outputs": \[

{

"data": {

"text/plain": \[

"894.7376578958671"

\]

},

"execution_count": 15,

"metadata": {},

"output_type": "execute_result"

}

\]

},

{

"cell_type": "code",

"metadata": {

"id": "ON0s8tRgLcvR",

"outputId": "a086cc1f-ebb5-4f7a-9552-8e53ae1ab444"

},

"source": \[

"\#\# AVERAGE\\n",

"meanNetSeries = trainDf.groupby(\\"ID\\")\['NET'\].mean()\\n",

"rmse_join(meanNetSeries, testSeries)"

\],

"execution_count": null,

"outputs": \[

{

"data": {

"text/plain": \[

"397.79333915566735"

\]

},

"execution_count": 16,

"metadata": {},

"output_type": "execute_result"

}

\]

},

{

"cell_type": "code",

"metadata": {

"id": "eFnQsjtpLcvR",

"outputId": "8b1830ac-d336-46b2-a34d-6095cf199c93"

},

"source": \[

"\#\# JUST PREDICT ZERO\\n",

"rmse_join(testSeries-testSeries, testSeries)"

\],

"execution_count": null,

"outputs": \[

{

"data": {

"text/plain": \[

"376.0191153362688"

\]

},

"execution_count": 17,

"metadata": {},

"output_type": "execute_result"

}

\]

},

{

"cell_type": "markdown",

"metadata": {

"id": "-1bmIOTTLcvR"

},

"source": \[

"\# Forecasting \`REVENUE_HAULED\`\\n",

"\\n",

"\#\# Goal\\n",

"\\n",

"Predict future values of \`REVENUE_HAULED\`.\\n",

"\\n",

"\#\# Notes\\n",

"\\n",

"- Pros: This should be easier than net revenue because it eliminates
the subtraction issue described above.\\n",

"- Cons: Possibly not as helpful for planning?\\n",

"\\n",

"\#\#\# Changes from above work:\\n",

"\\n",

"- Looking at \`REVENUE_HAULED\` in Tableau, there seems to be
significant temporal correlation (\\"streaks\\"), which suggests that
regression models should have some predictive power\\n",

"\\n",

"\#\# Conclusions\\n",

"\\n",

"Note: RMSE is higher than previous example because of change from net
to gross revenue.\\n",

"\\n",

"After updating the max date to Feb 15 (see below for motivation) I
get:\\n",

"\\n",

"\| Model \| RMSE \|\\n",

"\| - \| - \|\\n",

"\| Last Week \| 3,014 \|\\n",

"\| Average \| 2,402 \|\\n",

"\| Zero \| 2,416 \|\\n",

"\\n",

"Yes! The average now performs \*slightly\* better than predicting all
zeros. This gives me hope that this is a tractable problem."

\]

},

{

"cell_type": "code",

"metadata": {

"id": "eJXQc_dZLcvS"

},

"source": \[

"\# Last full week as validation data (fresh copy)\\n",

"maxDate = \\"2021-02-15 00:00:00\\" \# \\"2021-02-22 00:00:00\\" \#
max(df\['CALENDAR_WEEK'\])\\n",

"\\n",

"weekAgoDatetime = datetime.fromisoformat(maxDate) -
timedelta(days=7)\\n",

"weekBeforeMaxDate =
datetime.isoformat(weekAgoDatetime).replace(\\"T\\", \\" \\") \# ISO
mismatch\\n",

"\\n",

"testSeries = df\[df\['CALENDAR_WEEK'\] ==
maxDate\].set_index(\\"ID\\")\['REVENUE_HAULED'\]\\n",

"trainDf = df\[df\['CALENDAR_WEEK'\] \< maxDate\]"

\],

"execution_count": null,

"outputs": \[\]

},

{

"cell_type": "code",

"metadata": {

"id": "MthhZjEWLcvS",

"outputId": "35229da7-b757-4a58-c9a2-cdba36c42c3d"

},

"source": \[

"\#\# LAST WEEK\\n",

"predSeries = trainDf\[trainDf\['CALENDAR_WEEK'\] ==
weekBeforeMaxDate\].set_index(\\"ID\\")\['REVENUE_HAULED'\]\\n",

"rmse_join(predSeries, testSeries)"

\],

"execution_count": null,

"outputs": \[

{

"data": {

"text/plain": \[

"3013.860472414932"

\]

},

"execution_count": 19,

"metadata": {},

"output_type": "execute_result"

}

\]

},

{

"cell_type": "code",

"metadata": {

"id": "GCX6jetzLcvS",

"outputId": "823ea26c-adeb-451e-d261-f8fc85977f57"

},

"source": \[

"\#\# AVERAGE\\n",

"meanSeries = trainDf.groupby(\\"ID\\")\['REVENUE_HAULED'\].mean()\\n",

"rmse_join(meanSeries, testSeries)"

\],

"execution_count": null,

"outputs": \[

{

"data": {

"text/plain": \[

"2402.2064113190945"

\]

},

"execution_count": 20,

"metadata": {},

"output_type": "execute_result"

}

\]

},

{

"cell_type": "code",

"metadata": {

"id": "Eil_aPjwLcvS",

"outputId": "45203e6c-895d-4de7-b1b5-eff819d70fb6"

},

"source": \[

"\#\# JUST PREDICT ZERO\\n",

"rmse_join(meanSeries - meanSeries, testSeries)"

\],

"execution_count": null,

"outputs": \[

{

"data": {

"text/plain": \[

"2416.095746499544"

\]

},

"execution_count": 21,

"metadata": {},

"output_type": "execute_result"

}

\]

},

{

"cell_type": "markdown",

"metadata": {

"id": "yMGlbagJLcvS"

},

"source": \[

"\#\# Debugging\\n",

"\\n",

"Ok, something seems up here. I can understand that predicting net
revenue is hard, but it seems like using the average to predict
\`REVENUE_HAULED\` should beat zero.\\n",

"\\n",

"I explored several possible sources of the error, deleting each check
once it matched my expectations.\\n",

"\\n",

"In the end I found that the Feb 22 data is likely also partial, which
was causing the bad predictions above. Revising work with the 15th as
the validation data."

\]

},

{

"cell_type": "code",

"metadata": {

"id": "dmsWNMcvLcvS",

"outputId": "36203c48-bc45-42de-a98b-339c6ff6caa8"

},

"source": \[

"np.mean(meanSeries)"

\],

"execution_count": null,

"outputs": \[

{

"data": {

"text/plain": \[

"194.59957212607705"

\]

},

"execution_count": 22,

"metadata": {},

"output_type": "execute_result"

}

\]

},

{

"cell_type": "code",

"metadata": {

"id": "EoApsUnrLcvT",

"outputId": "3595eb87-61bc-457b-b5c3-7821e770e39b"

},

"source": \[

"\# This should be close to the training mean if revenue per carrier
is even roughly stationary\\n",

"np.mean(testSeries)"

\],

"execution_count": null,

"outputs": \[

{

"data": {

"text/plain": \[

"157.11335875358742"

\]

},

"execution_count": 23,

"metadata": {},

"output_type": "execute_result"

}

\]

},

{

"cell_type": "code",

"metadata": {

"id": "hZAV5m54LcvT"

},

"source": \[

"\# Ergodicity check: total REVENUE_HAULED across all carriers for each of the last 16 weeks\\n",

"revenueList = \[\]\\n",

"dateList = \[\]\\n",

"maxDate = max(df\['CALENDAR_WEEK'\])\\n",

"for weeksAgo in range(16):\\n",

" ts = datetime.fromisoformat(maxDate) -
timedelta(days=7\*weeksAgo)\\n",

" date = datetime.isoformat(ts).replace(\\"T\\", \\" \\") \# ISO
mismatch\\n",

" dateList.append(date)\\n",

" revenueList.append(np.sum(df\[df\['CALENDAR_WEEK'\] ==
date\].set_index(\\"ID\\")\['REVENUE_HAULED'\]))"

\],

"execution_count": null,

"outputs": \[\]

},

{

"cell_type": "code",

"metadata": {

"id": "QwfRe5ReLcvT",

"outputId": "bf620a31-65a4-4d50-c401-60ed1e51b50b"

},

"source": \[

"plt.plot(dateList, revenueList, 'o')\\n",

"plt.xticks(rotation = 90)"

\],

"execution_count": null,

"outputs": \[

{

"data": {

"text/plain": \[

"(\[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15\],\\n",

" \<a list of 16 Text xticklabel objects\>)"

\]

},

"execution_count": 25,

"metadata": {},

"output_type": "execute_result"

},

{

"data": {


"text/plain": \[

"\<Figure size 432x288 with 1 Axes\>"

\]

},

"metadata": {

"needs_background": "light"

},

"output_type": "display_data"

}

\]

},

{

"cell_type": "markdown",

"metadata": {

"id": "0QS7VdsiLcvT"

},

"source": \[

"\# Regression model\\n",

"\\n",

"I could have just stopped where I was, but I wanted to see if a simple
linear regression model would beat a simple average. It also seemed
weird to turn this in with no ML at all..."

\]

},

{

"cell_type": "code",

"metadata": {

"id": "JyN9VV64LcvT"

},

"source": \[

"from sklearn.linear_model import LinearRegression\\n",

"from sklearn.model_selection import train_test_split"

\],

"execution_count": null,

"outputs": \[\]

},

{

"cell_type": "code",

"metadata": {

"id": "mAMt1827LcvT"

},

"source": \[

"\# Grab a fresh copy of the data\\n",

"df = pd.read_csv(\\"carrier_valuation_takehome_data.csv\\")"

\],

"execution_count": null,

"outputs": \[\]

},

{

"cell_type": "markdown",

"metadata": {

"id": "dx7u6NM3LcvT"

},

"source": \[

"At one point I removed the carriers with zero revenue for all time
(about 26k of the 37k total) because I was worried they were pulling
the model toward predicting all zeros, a common theme with this sparse
data. Predictions for those carriers could be handled separately, or
ignored, depending on the goals of the metric.\\n",

"\\n",

"It didn't help performance, so I reverted."

\]

},

{

"cell_type": "code",

"metadata": {

"id": "eDGBs6fLLcvT"

},

"source": \[

"\# \# Get rid of entries with all zeros in training data\\n",

"\# revenueSum = df.groupby(\\"ID\\")\['REVENUE_HAULED'\].sum()\\n",

"\# nonzeroCarriers = revenueSum\[revenueSum!=0\].index\\n",

"\# df = df\[df\['ID'\].isin(nonzeroCarriers)\]"

\],

"execution_count": null,

"outputs": \[\]

},

{

"cell_type": "code",

"metadata": {

"id": "G_ouSkTLLcvU"

},

"source": \[

"weeksHistory = 6\\n",

"\\n",

"\# Make a list of dates so I can ensure things are sorted\\n",

"dateWeeksOffset = 2\\n",

"dateList = \[\]\\n",

"for weeksAgo in range(dateWeeksOffset, weeksHistory+dateWeeksOffset+1):
\# Offset for partial data\\n",

" ts = datetime.fromisoformat(max(df\['CALENDAR_WEEK'\])) -
timedelta(days=7\*weeksAgo)\\n",

" date = datetime.isoformat(ts).replace(\\"T\\", \\" \\") \# ISO
mismatch\\n",

" dateList.append(date)"

\],

"execution_count": null,

"outputs": \[\]

},

{

"cell_type": "code",

"metadata": {

"id": "uSna7m4lLcvU"

},

"source": \[

"\# Extract and format revenue as a matrix\\n",

"\# I originally did this with a dictionary and lots of indexing, but
it was slow and the pivot is much better\\n",

"revenueDf = df.pivot(index='ID',
columns='CALENDAR_WEEK',values='REVENUE_HAULED').fillna(0)"

\],

"execution_count": null,

"outputs": \[\]

},

{

"cell_type": "code",

"metadata": {

"id": "XtyCZLtNLcvU"

},

"source": \[

"y = np.array(revenueDf\[dateList\[0\]\]) \# target\\n",

"X = np.array(revenueDf\[dateList\[1:\]\])\\n",

"\\n",

"X_train, X_test, y_train, y_test = train_test_split(\\n",

" X, y, test_size=0.4, random_state=1234)\\n",

"\\n",

"lr = LinearRegression().fit(X_train, y_train)"

\],

"execution_count": null,

"outputs": \[\]

},

{

"cell_type": "code",

"metadata": {

"id": "YE2WMOMNLcvU",

"outputId": "41ccdc64-561b-4516-f69c-bad08d916aec"

},

"source": \[

"X_train.shape, y_train.shape"

\],

"execution_count": null,

"outputs": \[

{

"data": {

"text/plain": \[

"((22345, 6), (22345,))"

\]

},

"execution_count": 32,

"metadata": {},

"output_type": "execute_result"

}

\]

},

{

"cell_type": "code",

"metadata": {

"id": "LDXp4j27LcvU"

},

"source": \[

"y_pred = lr.predict(X_test)"

\],

"execution_count": null,

"outputs": \[\]

},

{

"cell_type": "code",

"metadata": {

"id": "AzMwg2V9LcvU",

"outputId": "69d90e89-2fc0-4054-a6a4-74188ca69b57"

},

"source": \[

"lr.coef\_"

\],

"execution_count": null,

"outputs": \[

{

"data": {

"text/plain": \[

"array(\[0.16734361, 0.07489673, 0.19985379, 0.05014435,
0.11859244,\\n",

" 0.02379543\])"

\]

},

"execution_count": 34,

"metadata": {},

"output_type": "execute_result"

}

\]

},

{

"cell_type": "code",

"metadata": {

"id": "s9LW_RA4LcvU",

"outputId": "2e3fc64c-d9a2-4dfe-9424-e68c3ebf3dc3"

},

"source": \[

"rmse(y_pred, y_test)"

\],

"execution_count": null,

"outputs": \[

{

"data": {

"text/plain": \[

"2233.0484502677114"

\]

},

"execution_count": 35,

"metadata": {},

"output_type": "execute_result"

}

\]

},

{

"cell_type": "code",

"metadata": {

"id": "RgEbJYEJLcvU",

"outputId": "a6c6725a-33b5-439c-d54b-5499779802d2"

},

"source": \[

"\# Zeros (same hold-out split)\\n",

"rmse(0, y_test)"

\],

"execution_count": null,

"outputs": \[

{

"data": {

"text/plain": \[

"2252.3253419055095"

\]

},

"execution_count": 36,

"metadata": {},

"output_type": "execute_result"

}

\]

},

{

"cell_type": "code",

"metadata": {

"id": "e_B6Qp7ILcvU",

"outputId": "50f71ff4-f4dd-4cfd-b8eb-60ef4c3616c9"

},

"source": \[

"\# Last Week (same hold-out split)\\n",

"rmse(X_test\[:,0\], y_test)"

\],

"execution_count": null,

"outputs": \[

{

"data": {

"text/plain": \[

"3092.3305113499446"

\]

},

"execution_count": 37,

"metadata": {},

"output_type": "execute_result"

}

\]

},

{

"cell_type": "code",

"metadata": {

"id": "ac73LwDuLcvV",

"outputId": "f036a9d2-1e37-442c-db76-f1ad2d0b4c0a"

},

"source": \[

"\# Test the average prediction on the same hold-out split\\n",

"\# (omits the ragged start, so calling this \\"window average\\")\\n",

"rmse(np.mean(X_test, 1), y_test)"

\],

"execution_count": null,

"outputs": \[

{

"data": {

"text/plain": \[

"2379.142783003256"

\]

},

"execution_count": 38,

"metadata": {},

"output_type": "execute_result"

}

\]

},

{

"cell_type": "markdown",

"metadata": {

"id": "oHF5_ZHDLcvV"

},

"source": \[

"\#\# Regression conclusions\\n",

"\\n",

"With six weeks of history:\\n",

"\\n",

"\| Model \| RMSE \|\\n",

"\| - \| - \|\\n",

"\| Zeros \| 2,252 \|\\n",

"\| Last Week \| 3,092 \|\\n",

"\| Window average \| 2,379 \|\\n",

"\| Regression \| 2,233 \|\\n",

"\\n",

"Results were pretty similar with 52 weeks of history, but they vary
somewhat depending on the random seed for the train/test split.\\n",

"\\n",

"Bottom line: I am not convinced that regression significantly beats
the window average, and I think they are all beaten by the ragged
average above. None are accurate predictors of weekly revenue."

\]

}

\]

}