# 5 Modeling<a id='5_Modeling'></a>

## 5.1 Contents<a id='5.1_Contents'></a>
* [5 Modeling](#5_Modeling)
  * [5.1 Contents](#5.1_Contents)
  * [5.2 Introduction](#5.2_Introduction)
  * [5.3 Imports](#5.3_Imports)
  * [5.4 Load Model](#5.4_Load_Model)
  * [5.5 Load Data](#5.5_Load_Data)
  * [5.6 Refit Model On All Available Data (excluding Big Mountain)](#5.6_Refit_Model_On_All_Available_Data_(excluding_Big_Mountain))
  * [5.7 Calculate Expected Big Mountain Ticket Price From The Model](#5.7_Calculate_Expected_Big_Mountain_Ticket_Price_From_The_Model)
  * [5.8 Big Mountain Resort In Market Context](#5.8_Big_Mountain_Resort_In_Market_Context)
    * [5.8.1 Ticket price](#5.8.1_Ticket_price)
    * [5.8.2 Vertical drop](#5.8.2_Vertical_drop)
    * [5.8.3 Snow making area](#5.8.3_Snow_making_area)
    * [5.8.4 Total number of chairs](#5.8.4_Total_number_of_chairs)
    * [5.8.5 Fast quads](#5.8.5_Fast_quads)
    * [5.8.6 Runs](#5.8.6_Runs)
    * [5.8.7 Longest run](#5.8.7_Longest_run)
    * [5.8.8 Trams](#5.8.8_Trams)
    * [5.8.9 Skiable terrain area](#5.8.9_Skiable_terrain_area)
  * [5.9 Modeling scenarios](#5.9_Modeling_scenarios)
    * [5.9.1 Scenario 1](#5.9.1_Scenario_1)
    * [5.9.2 Scenario 2](#5.9.2_Scenario_2)
    * [5.9.3 Scenario 3](#5.9.3_Scenario_3)
    * [5.9.4 Scenario 4](#5.9.4_Scenario_4)
  * [5.10 Summary](#5.10_Summary)
  * [5.11 Further work](#5.11_Further_work)


## 5.2 Introduction<a id='5.2_Introduction'></a>

In this notebook, we take our model for ski resort ticket price and use it to estimate what price Big Mountain's facilities might support, and to explore how sensitive that price is to changes in various resort parameters. This relies on the implicit assumption that other resorts largely set their prices according to how much people value certain facilities. In effect, it assumes prices are set by a free market.

With that caveat in mind, the model can suggest what Big Mountain's ticket price could be and how that might change under various scenarios.

## 5.3 Imports<a id='5.3_Imports'></a>

In [1]:
import pandas as pd
import numpy as np
import os
import pickle
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn import __version__ as sklearn_version
from sklearn.model_selection import cross_validate

## 5.5 Load Data<a id='5.5_Load_Data'></a>

In [14]:
ski_data = pd.read_csv(r'C:\Users\fahiy\Documents\Springboard\Capstone\Unit-6\Unit-6-Step3\step3_output.csv')

In [23]:
# Drop the leftover unnamed index columns picked up from earlier CSV round trips
ski_data = ski_data.drop(columns=['Unnamed: 0', 'Unnamed: 0.1'])
ski_data.columns

Index(['Name', 'state', 'summit_elev', 'vertical_drop', 'trams', 'fastEight',
       'fastSixes', 'fastQuads', 'quad', 'triple', 'double', 'surface',
       'total_chairs', 'Runs', 'TerrainParks', 'LongestRun_mi',
       'SkiableTerrain_ac', 'Snow Making_ac', 'daysOpenLastYear', 'yearsOpen',
       'averageSnowfall', 'AdultWeekday', 'AdultWeekend', 'projectedDaysOpen',
       'NightSkiing_ac', 'clusters'],
      dtype='object')

In [24]:
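# One-hot encode the state column and append the indicator columns
dumm = pd.get_dummies(ski_data['state'])
merged = pd.concat([ski_data, dumm], axis=1)
# Drop the original categorical column now that it is encoded
final = merged.drop(['state'], axis=1)
df = final
df.head()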

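| | Name | summit_elev | vertical_drop | trams | fastEight | fastSixes | fastQuads | quad | triple | double | ... | Rhode Island | South Dakota | Tennessee | Utah | Vermont | Virginia | Washington | West Virginia | Wisconsin | Wyoming |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Eaglecrest Ski Area | 2600 | 1540 | 0 | 0 | 0 | 0 | 0 | 0 | 4 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | Hilltop Ski Area | 2090 | 294 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 2 | Sunrise Park Resort | 11100 | 1800 | 0 | 0 | 0 | 1 | 2 | 3 | 1 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 3 | Yosemite Ski & Snowboard Area | 7800 | 600 | 0 | 0 | 0 | 0 | 0 | 1 | 3 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 4 | Boreal Mountain Resort | 7700 | 500 | 0 | 0 | 0 | 1 | 1 | 3 | 1 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |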


In [25]:
from sklearn import preprocessing

# Separate the features from the target (weekend ticket price)
X = df.drop(['Name', 'AdultWeekend'], axis=1)
y = df['AdultWeekend']
# Standardize every feature to zero mean and unit variance
scaler = preprocessing.StandardScaler().fit(X)
X_scaled = scaler.transform(X)

In [26]:
from sklearn.model_selection import train_test_split
# Convert the target to a 1-D NumPy array
y = np.ravel(y)
X_train, X_test, y_train, y_test = train_test_split(X_scaled, y, test_size=0.25, random_state=1)

In [27]:
from sklearn import linear_model
from sklearn.metrics import explained_variance_score, mean_absolute_error
# Fit an ordinary least squares regression on the training split
lm = linear_model.LinearRegression()
model = lm.fit(X_train, y_train)
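
The cells above fit the scaler on the full feature matrix and split afterwards, so the test rows contribute to the scaling statistics. One possible alternative, sketched here on synthetic data (`X_demo`, `y_demo`, and `pipe` are hypothetical names, not part of this notebook), is to split first and let a scikit-learn pipeline fit the scaler on the training rows only.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the resort features and ticket prices (hypothetical data)
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(120, 4))
y_demo = 50 + X_demo @ np.array([8.0, 3.0, 0.0, -2.0]) + rng.normal(scale=2.0, size=120)

# Split first, so the test rows play no part in estimating the scaling statistics
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.25, random_state=1)

# The pipeline fits the scaler on the training rows only, then fits the regression
pipe = make_pipeline(StandardScaler(), LinearRegression())
pipe.fit(X_tr, y_tr)

# predict() reuses the training-set mean and standard deviation on the test rows
y_hat = pipe.predict(X_te)
print(y_hat[:5])
```

With a plain linear regression the practical difference is usually small, but the pipeline keeps the evaluation honest and carries over unchanged to cross-validation.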

In [43]:
y_pred=model.predict(X_test)
y_pred

array([ 6.16091003e+01,  7.00431824e+01,  5.28254089e+01,  4.09748230e+01,
        5.42951355e+01, -1.33196483e+13,  5.54484558e+01,  5.12013855e+01,
        3.92326355e+01,  3.66857605e+01,  6.02990417e+01,  8.91369324e+01,
        4.79767761e+01,  6.37424011e+01,  6.85221863e+01,  7.26115417e+01,
       -1.33194122e+13,  8.26862488e+01,  6.26310730e+01,  5.66779480e+01,
        4.84220886e+01,  4.38727722e+01,  4.70939636e+01,  5.40138855e+01,
        5.80373230e+01,  8.69591980e+01,  5.12038269e+01,  4.51535339e+01,
       -1.33194122e+13,  4.48151550e+01,  4.48517761e+01,  4.94928894e+01,
        3.14113464e+01,  5.17306824e+01,  7.88459167e+01,  3.12121277e+01,
        5.15920105e+01,  7.55695496e+01,  5.94469910e+01,  6.13517761e+01])

In [44]:
from sklearn.metrics import explained_variance_score, mean_absolute_error
print(explained_variance_score(y_test, y_pred))
print(mean_absolute_error(y_test, y_pred))

-4.802399571012608e+22
998961818934.5469


In [45]:
print(lm.intercept_)

-254512565274.57205


In [46]:
pd.DataFrame(abs(lm.coef_), X.columns, columns=['Coefficient']).sort_values(by=['Coefficient'],ascending=False)

| Feature | Coefficient |
|---|---|
| fastSixes | 15382660000000.0 |
| trams | 9426654000000.0 |
| total_chairs | 5030271000000.0 |
| New York | 3915392000000.0 |
| California | 3670611000000.0 |
| Pennsylvania | 3538833000000.0 |
| Michigan | 3538833000000.0 |
| Wisconsin | 3399838000000.0 |
| New Hampshire | 3252700000000.0 |
| Minnesota | 2929028000000.0 |
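
The enormous coefficients, intercept, and error above are a pattern often seen when features are perfectly collinear. A complete set of state indicator columns sums to one in every row, so together they duplicate the intercept; `total_chairs` may likewise be a sum of the individual lift counts, though that is an assumption not checked here. The sketch below uses hypothetical toy data (`toy` and `design_rank` are illustrative names) to show how the rank of the design matrix reveals the redundancy, and how `drop_first=True` in `pd.get_dummies` removes it.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: six resorts spread over three states
toy = pd.DataFrame({"state": ["Montana", "Utah", "Vermont", "Utah", "Montana", "Vermont"]})

def design_rank(dummies):
    """Return (rank, n_columns) of the design matrix [intercept | dummies]."""
    design = np.column_stack([np.ones(len(dummies)), dummies.to_numpy(dtype=float)])
    return int(np.linalg.matrix_rank(design)), design.shape[1]

full = pd.get_dummies(toy["state"])                      # one column per state
reduced = pd.get_dummies(toy["state"], drop_first=True)  # first state becomes the baseline

full_rank, full_cols = design_rank(full)
reduced_rank, reduced_cols = design_rank(reduced)
print(full_rank, full_cols)        # rank below the column count: collinear
print(reduced_rank, reduced_cols)  # rank equal to the column count
```

When the rank is lower than the number of columns, the least squares solution is not unique, and very large offsetting coefficients can fit the training data while producing wild predictions on new rows.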


In [47]:
df.columns

Index(['Name', 'summit_elev', 'vertical_drop', 'trams', 'fastEight',
       'fastSixes', 'fastQuads', 'quad', 'triple', 'double', 'surface',
       'total_chairs', 'Runs', 'TerrainParks', 'LongestRun_mi',
       'SkiableTerrain_ac', 'Snow Making_ac', 'daysOpenLastYear', 'yearsOpen',
       'averageSnowfall', 'AdultWeekday', 'AdultWeekend', 'projectedDaysOpen',
       'NightSkiing_ac', 'clusters', 'Alaska', 'Arizona', 'California',
       'Colorado', 'Connecticut', 'Idaho', 'Illinois', 'Indiana', 'Iowa',
       'Maine', 'Maryland', 'Massachusetts', 'Michigan', 'Minnesota',
       'Missouri', 'Montana', 'Nevada', 'New Hampshire', 'New Mexico',
       'New York', 'North Carolina', 'Ohio', 'Oregon', 'Pennsylvania',
       'Rhode Island', 'South Dakota', 'Tennessee', 'Utah', 'Vermont',
       'Virginia', 'Washington', 'West Virginia', 'Wisconsin', 'Wyoming'],
      dtype='object')

In [48]:
from sklearn import preprocessing

X = df.drop(['Name', 'AdultWeekend'], axis=1)
# Drop the one-hot state columns ('Alaska' through 'Wyoming')
X = X.drop(columns=X.loc[:, 'Alaska':'Wyoming'].columns)
y = df['AdultWeekend']

scaler = preprocessing.StandardScaler().fit(X)
X_scaled = scaler.transform(X)

In [49]:
from sklearn.model_selection import train_test_split
# Convert the target to a 1-D NumPy array
y = np.ravel(y)
X_train, X_test, y_train, y_test = train_test_split(X_scaled, y, test_size=0.25, random_state=1)

In [50]:
from sklearn import linear_model
from sklearn.metrics import explained_variance_score, mean_absolute_error
# Refit the regression without the state indicator columns
lm = linear_model.LinearRegression()
model = lm.fit(X_train, y_train)

In [51]:
y_pred=model.predict(X_test)
y_pred

array([62.35468583, 68.15508923, 60.9246815 , 47.61702255, 46.23764793,
       79.36458953, 49.39901008, 50.01363356, 46.87747133, 36.20314359,
       63.88880621, 88.0492567 , 47.6948717 , 63.02090573, 66.59702617,
       74.98708249, 45.54448524, 77.89568635, 62.6870567 , 58.37833728,
       53.69476771, 45.01639712, 44.7386535 , 56.06401917, 60.21440226,
       89.32209615, 46.96398469, 34.7795009 , 57.59308542, 51.30780299,
       55.24182645, 55.83423405, 41.05293366, 63.91814655, 76.88143409,
       34.15945909, 47.75027568, 70.72287029, 61.76095994, 52.91256685])

In [52]:
from sklearn.metrics import explained_variance_score, mean_absolute_error
print(explained_variance_score(y_test, y_pred))
print(mean_absolute_error(y_test, y_pred))

0.7528700021729586
6.207972322027411
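
These two numbers come from a single 75/25 split, so they depend on which rows happened to land in the test set. Since `cross_validate` is already imported in section 5.3, one option is to report the same two metrics across several folds. The sketch below is an illustration on synthetic data (`X_demo`, `y_demo`, and `pipe` are hypothetical names), not a rerun of the model above.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_validate
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the resort features and ticket prices (hypothetical data)
rng = np.random.default_rng(1)
X_demo = rng.normal(size=(150, 5))
y_demo = 55 + X_demo @ np.array([10.0, 3.0, 2.0, 0.0, 0.0]) + rng.normal(scale=5.0, size=150)

# Scaling lives inside the pipeline, so it is refit on each training fold
pipe = make_pipeline(StandardScaler(), LinearRegression())
cv_results = cross_validate(
    pipe, X_demo, y_demo, cv=5,
    scoring=("explained_variance", "neg_mean_absolute_error"),
)

# scikit-learn reports MAE negated so that higher is always better; flip the sign back
ev = cv_results["test_explained_variance"]
mae = -cv_results["test_neg_mean_absolute_error"]
print(f"explained variance: {ev.mean():.3f} +/- {ev.std():.3f}")
print(f"MAE: {mae.mean():.3f} +/- {mae.std():.3f}")
```

The spread across folds gives a rough sense of how much a difference between two models' scores can be trusted.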


In [53]:
print(lm.intercept_)

57.81334552107014


In [55]:
pd.DataFrame(abs(lm.coef_), X.columns, columns=['Coefficient']).sort_values(by=['Coefficient'],ascending=False)

| Feature | Coefficient |
|---|---|
| AdultWeekday | 10.65109 |
| clusters | 3.397534 |
| vertical_drop | 2.643725 |
| quad | 1.237157 |
| triple | 1.109286 |
| TerrainParks | 1.10723 |
| SkiableTerrain_ac | 1.104818 |
| summit_elev | 0.9426123 |
| Runs | 0.7982729 |
| yearsOpen | 0.7431705 |
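
The table above ranks features by the absolute value of their coefficients, which hides whether each feature pushes the predicted price up or down. A small sketch of one way to keep the sign while still sorting by magnitude follows; the values and feature names are hypothetical placeholders for `lm.coef_` and `X.columns`.

```python
import numpy as np
import pandas as pd

# Hypothetical coefficients standing in for lm.coef_ and X.columns
coef_demo = np.array([10.7, -3.4, 2.6, -0.9])
names_demo = ["feature_a", "feature_b", "feature_c", "feature_d"]

coefs = pd.Series(coef_demo, index=names_demo, name="Coefficient")

# Sort by magnitude via the key argument, but keep the signed values in the result
ranked = coefs.sort_values(key=np.abs, ascending=False)
print(ranked)
```

Because the features were standardized, each signed coefficient can be read as the change in predicted price for a one standard deviation increase in that feature.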


Next model: in addition to the state columns, drop `summit_elev` and `vertical_drop`.

In [56]:
from sklearn import preprocessing

X = df.drop(['Name', 'AdultWeekend'], axis=1)
# Drop the one-hot state columns, then the two elevation features
X = X.drop(columns=X.loc[:, 'Alaska':'Wyoming'].columns)
X = X.drop(['summit_elev', 'vertical_drop'], axis=1)

y = df['AdultWeekend']
scaler = preprocessing.StandardScaler().fit(X)
X_scaled = scaler.transform(X)

In [57]:
from sklearn.model_selection import train_test_split
# Convert the target to a 1-D NumPy array
y = np.ravel(y)
X_train, X_test, y_train, y_test = train_test_split(X_scaled, y, test_size=0.25, random_state=1)

In [58]:
from sklearn import linear_model
from sklearn.metrics import explained_variance_score, mean_absolute_error
# Refit the regression on the reduced feature set
lm = linear_model.LinearRegression()
model = lm.fit(X_train, y_train)

In [59]:
y_pred=model.predict(X_test)
y_pred

array([62.86861137, 67.64153457, 61.92694132, 47.98171821, 45.88141418,
       78.74490719, 49.17773809, 47.87736528, 46.50622605, 36.26281917,
       65.27953921, 85.23929435, 48.09147002, 63.9621801 , 70.1044346 ,
       78.23943327, 47.80997653, 81.03041672, 62.44132197, 57.88983875,
       53.9861783 , 44.91430143, 44.31278009, 55.28605822, 59.0316183 ,
       90.71472614, 47.22366194, 34.13383441, 59.39179068, 53.51871601,
       54.46227661, 54.85264496, 43.11810894, 64.09289711, 77.17643591,
       34.10883639, 45.08455945, 66.24473364, 63.32951519, 51.24020957])

In [60]:
from sklearn.metrics import explained_variance_score, mean_absolute_error
print(explained_variance_score(y_test, y_pred))
print(mean_absolute_error(y_test, y_pred))

0.7609932533178463
6.157755374149225


In [61]:
print(lm.intercept_)

57.847493024287765


In [62]:
pd.DataFrame(abs(lm.coef_), X.columns, columns=['Coefficient']).sort_values(by=['Coefficient'],ascending=False)

| Feature | Coefficient |
|---|---|
| AdultWeekday | 11.404809 |
| clusters | 3.044546 |
| quad | 1.33256 |
| TerrainParks | 1.198856 |
| triple | 1.032486 |
| yearsOpen | 0.889017 |
| LongestRun_mi | 0.886525 |
| total_chairs | 0.580642 |
| averageSnowfall | 0.577816 |
| Snow Making_ac | 0.367334 |


In [63]:
df.to_csv(r'C:\Users\fahiy\Documents\Springboard\Capstone\Unit-6\Unit-6-Step5\step5_output.csv')
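
By default, `to_csv` writes the DataFrame index as an extra unlabeled column, which `read_csv` then loads back as `Unnamed: 0`. That is the usual origin of columns like the ones deleted near the top of this notebook. The sketch below demonstrates the round trip on a tiny hypothetical frame (`demo`), using an in-memory buffer instead of a file, and shows that `index=False` avoids the extra column.

```python
import io
import pandas as pd

# Hypothetical two-row frame standing in for the resort data
demo = pd.DataFrame({"Name": ["Resort A", "Resort B"], "AdultWeekend": [60.0, 75.0]})

# Default behavior: the index is written as an extra, unlabeled first column
with_index = io.StringIO()
demo.to_csv(with_index)
with_index.seek(0)
cols_with = pd.read_csv(with_index).columns.tolist()
print(cols_with)

# index=False writes only the data columns
without_index = io.StringIO()
demo.to_csv(without_index, index=False)
without_index.seek(0)
cols_without = pd.read_csv(without_index).columns.tolist()
print(cols_without)
```

Passing `index=False` when saving would spare the next notebook from having to delete these columns after loading.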