# Time Series to observe DAILY temperature variations 
## Daily temperature prediction using Random Forest

Going one step beyond a single decision tree, this notebook uses **Random Forests**. We begin by loading the necessary libraries and the paths used to read the "pickles" and to store the graph images produced towards the end of the code.
The pickles are read and the data is fed into a Random Forest (RF) model.

Finally, two graphs show the RF fitted values against the actual test temperatures, and the predictions alongside the training and test data.
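As a rough illustration of what "one step beyond a tree" means, the sketch below fits a forest on synthetic data (not the temperature data) and checks that the forest's prediction is the average of its individual trees. The names `X_demo`, `y_demo` and `forest` are placeholders introduced here; the hyperparameters mirror the ones used later in this notebook.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic regression data, for illustration only
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 3))
y_demo = 2.0 * X_demo[:, 0] + rng.normal(scale=0.1, size=200)

forest = RandomForestRegressor(max_depth=8, n_estimators=50, random_state=0)
forest.fit(X_demo, y_demo)

# Each fitted tree is available in forest.estimators_
per_tree = np.stack([tree.predict(X_demo) for tree in forest.estimators_])
manual_average = per_tree.mean(axis=0)

n_trees = len(forest.estimators_)
deepest = max(tree.get_depth() for tree in forest.estimators_)
matches = np.allclose(manual_average, forest.predict(X_demo))
print(n_trees, deepest, matches)
```

Averaging many depth-limited trees, each grown on a bootstrap sample, is what tends to make the forest less variable than any single tree.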

In [11]:
%run helper_functions.py
%matplotlib inline

Create a folder for each run of the Random Forest to store its images.

In [12]:
IMAGE_DIR = create_results_perrun()
print("Path of the results Image directory",IMAGE_DIR )

Path of the results Image directory ../Images/RESULTS/RUN-15
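The body of `create_results_perrun` lives in `helper_functions.py` and is not shown here. A minimal sketch of how a per-run folder such as `RUN-15` could be allocated, assuming the helper simply picks the next free number, might look like this. The function name `create_run_dir` is hypothetical.

```python
import re
import tempfile
from pathlib import Path


def create_run_dir(base_dir):
    """Create and return the next RUN-<n> directory under base_dir (hypothetical helper)."""
    base = Path(base_dir)
    base.mkdir(parents=True, exist_ok=True)

    # Find the highest existing run number, ignoring unrelated entries
    highest = 0
    for child in base.iterdir():
        match = re.fullmatch(r"RUN-(\d+)", child.name)
        if child.is_dir() and match:
            highest = max(highest, int(match.group(1)))

    run_dir = base / f"RUN-{highest + 1}"
    run_dir.mkdir()
    return run_dir


# Demonstrate in a throwaway directory so no real results folder is touched
with tempfile.TemporaryDirectory() as tmp:
    first = create_run_dir(tmp)
    second = create_run_dir(tmp)
    run_names = (first.name, second.name)

print(run_names)
```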


Here we load the training and test data from the pickle files created by the EDA file.

In [13]:
city='Los_Angeles'
X_train = pd.read_pickle(f'{PICKLE_PATH}/X_train_{city}.pkl')
Y_train = pd.read_pickle(f'{PICKLE_PATH}/Y_train_{city}.pkl')

X_test  = pd.read_pickle(f'{PICKLE_PATH}/X_test_{city}.pkl')
Y_test  = pd.read_pickle(f'{PICKLE_PATH}/Y_test_{city}.pkl')

print("Shape of Training Dataset " , X_train.shape)
print("Shape of Testing Dataset " , X_test.shape)

Shape of Training Dataset  (1432, 31)
Shape of Testing Dataset  (90, 31)
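The 31 columns are lag and moving-average features built in the EDA file, which is not shown here. The sketch below is only an assumption about how columns named like `temperature_lag1` or `temperature_mavg2` could be derived with pandas, followed by a chronological train/test split. The synthetic series and the exact moving-average definition are illustrative, not the author's.

```python
import numpy as np
import pandas as pd

# Synthetic daily temperatures, for illustration only
dates = pd.date_range("2017-01-01", periods=10, freq="D")
temps = pd.Series(np.arange(10, 20, dtype=float), index=dates, name="temperature")

features = pd.DataFrame(index=dates)
features["temperature_lag1"] = temps.shift(1)   # value one day earlier
features["temperature_lag2"] = temps.shift(2)   # value two days earlier
# Assumed definition: mean of the two previous days (excludes the current day)
features["temperature_mavg2"] = temps.shift(1).rolling(window=2).mean()

# Drop the leading rows that have no history yet
dataset = features.assign(target=temps).dropna()

# Chronological split: the last 3 days form the test set
split = len(dataset) - 3
X_tr = dataset.iloc[:split].drop(columns="target")
X_te = dataset.iloc[split:].drop(columns="target")
print(X_tr.shape, X_te.shape)
```

Splitting by date rather than at random keeps every test day strictly after the training period, which matches the 90-day test window used here.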


In [14]:
# Fit a random forest regressor with the chosen max_depth and n_estimators
max_depth = 8
n_estimators = 50
fitted_model = RandomForestRegressor(max_depth=max_depth, random_state=0, n_estimators=n_estimators)
fitted_model.fit(X_train, Y_train)

# Dataframe to show features and their importances
top_features = 10
features_importances_df= show_feature_importances(X_train.columns.values.tolist(),
                                                  fitted_model.feature_importances_,top_features)
features_importances_df

| | Features | Feature Importances in (%) |
|---|---|---|
| 6 | temperature_mavg2 | 98.128928 |
| 0 | temperature_lag1 | 1.137553 |
| 1 | temperature_lag2 | 0.196494 |
| 21 | enhanced_nao_value_lag365 | 0.044277 |
| 13 | enhanced_ao_value_lag30 | 0.041907 |
| 3 | temperature_lag30 | 0.032616 |
| 7 | temperature_mavg7 | 0.030422 |
| 19 | enhanced_nao_value_lag7 | 0.027545 |
| 18 | enhanced_nao_value_lag2 | 0.02621 |
| 2 | temperature_lag7 | 0.026082 |
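`show_feature_importances` is defined in `helper_functions.py` and its body is not shown. A minimal sketch of a function producing a table of this shape, assuming it converts the importances to percentages and keeps the top rows, could be written as follows. The name `top_feature_importances` is hypothetical, and the three values are taken from the table above divided by 100.

```python
import pandas as pd


def top_feature_importances(feature_names, importances, top_n):
    """Return the top_n features by importance, expressed in percent (hypothetical helper)."""
    table = pd.DataFrame({
        "Features": feature_names,
        "Feature Importances in (%)": [100.0 * value for value in importances],
    })
    # Sorting keeps the original row index, i.e. the column position in X_train
    return table.sort_values("Feature Importances in (%)", ascending=False).head(top_n)


names = ["temperature_lag1", "temperature_lag2", "temperature_mavg2"]
values = [0.01137553, 0.00196494, 0.98128928]
top = top_feature_importances(names, values, top_n=2)
print(top)
```

Because the original index is preserved, the leftmost column tells you where each feature sits in the input matrix, as in the output above.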


In [15]:
# Run the model on the training dataset
Y_train_pred = fitted_model.predict(X_train)
# Calculate mean squared error for the predicted values
mse_train = mean_squared_error(Y_train, Y_train_pred)
print('Mean Squared Error for the training dataset: %.3f' % mse_train)

Mean Squared Error for the training dataset: 0.060


In [16]:
# Run the model on the testing dataset
Y_test_pred = fitted_model.predict(X_test)
# Calculate mean squared error for the test vs predicted values
mse_test = mean_squared_error(Y_test, Y_test_pred)
print('Mean Squared Error for the testing dataset: %.3f' % mse_test)  

Mean Squared Error for the testing dataset: 0.366
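MSE is expressed in squared temperature units, so taking the square root makes the two numbers easier to read. The short calculation below uses only the rounded values printed above; the variable names are chosen here for illustration.

```python
import math

# Rounded values as printed by the two cells above
mse_train_reported = 0.060
mse_test_reported = 0.366

# RMSE is in the same units as the temperature column
rmse_train = math.sqrt(mse_train_reported)
rmse_test = math.sqrt(mse_test_reported)

# How much larger the test error is than the training error
mse_ratio = mse_test_reported / mse_train_reported

print(f"RMSE train: {rmse_train:.3f}")
print(f"RMSE test:  {rmse_test:.3f}")
print(f"Test MSE / train MSE: {mse_ratio:.1f}")
```

The typical error is therefore roughly 0.24 on the training set and 0.60 on the test set, a test MSE about six times the training MSE, which is the gap one would expect to watch when tuning `max_depth` and `n_estimators`.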


In [17]:
# Creating a dataframe for predicted/fitted values
future_forecast = pd.DataFrame(Y_test_pred,index = Y_test.index,columns=['Fitted'])

# Concatenate the predicted/fitted values with actual values to display graphs
predictions = pd.concat([Y_test,future_forecast],axis=1)
predictions.columns = ["Actual","Fitted"]

# Displaying few of the predicted values
predictions.head(10)

| datetime | Actual | Fitted |
|---|---|---|
| 2017-09-02 | 30.18375 | 29.218853 |
| 2017-09-03 | 30.973333 | 29.532528 |
| 2017-09-04 | 25.004167 | 27.531416 |
| 2017-09-05 | 23.995833 | 23.977647 |
| 2017-09-06 | 24.89625 | 24.269558 |
| 2017-09-07 | 23.748333 | 24.276287 |
| 2017-09-08 | 23.3725 | 23.226363 |
| 2017-09-09 | 22.320417 | 22.229922 |
| 2017-09-10 | 23.954583 | 23.437974 |
| 2017-09-11 | 25.469583 | 25.203804 |
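To see where the model misses most, the per-day error can be computed directly from these two columns. The sketch below rebuilds the ten rows shown above as a small DataFrame (named `sample` here for illustration) rather than relying on the notebook's `predictions` object.

```python
import pandas as pd

# The ten rows displayed above, copied by hand
sample = pd.DataFrame(
    {
        "Actual": [30.18375, 30.973333, 25.004167, 23.995833, 24.89625,
                   23.748333, 23.3725, 22.320417, 23.954583, 25.469583],
        "Fitted": [29.218853, 29.532528, 27.531416, 23.977647, 24.269558,
                   24.276287, 23.226363, 22.229922, 23.437974, 25.203804],
    },
    index=pd.date_range("2017-09-02", periods=10, freq="D"),
)

# Signed error: positive means the model predicted too warm
sample["Error"] = sample["Fitted"] - sample["Actual"]

worst_day = sample["Error"].abs().idxmax()
worst_error = sample.loc[worst_day, "Error"]
print(worst_day.date(), round(worst_error, 3))
```

In these rows the largest miss falls on 2017-09-04, where the actual temperature drops by about six degrees in one day and the fitted value overshoots by roughly 2.5.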


In [18]:
city = city.replace('_',' ')
# Plotting the daily predicted temperature vs actual temperature - Random Forest
fig = charter_helper_fitted("Daily Predicted Temperature using Random Forest for " + city, predictions)
iplot(fig)
#py.image.save_as(fig, f'{IMAGE_DIR}/Daily_RF_actual_vs_predict.png')

In [19]:
# Plotting the training data for the past year, actual/test data and predicted temperature - Random Forest
fig = charter_helper_prediction("Daily Predicted Temperature using Random Forest for " + city,
                                X_train, Y_train, X_test, Y_test, future_forecast)

iplot(fig)
#py.image.save_as(fig, f'{IMAGE_DIR}/Daily_RF_predict.png')

In [20]:
results = update_results_function(
    'RANDOM FOREST', city, X_train._metadata["feature_set_type"],
    {'max_depth': max_depth, 'n_estimators': n_estimators, 'Info': X_train._metadata},
    {'features': X_train.columns.values.tolist(),
     'importances': fitted_model.feature_importances_,
     'mse_train': mse_train},
    mse_test)
results

| | RUN NO | DATETIME | MODEL NAME | CITY | FEATURE_TYPE | HOST_MACHINE | PARAMETERS | RESULTS | MEAN SQUARED ERROR |
|---|---|---|---|---|---|---|---|---|---|
| 0 | RUN-1 | 2018-08-11 02:35:23.882031 | DECISION TREE | New York | | ELS-F25750M | {'max_depth': 8, 'Info': {'city': 'New York', ... | {'features': ['temperature_lag1', 'temperature... | 0.519263 |
| 1 | RUN-2 | 2018-08-11 02:42:21.539158 | DECISION TREE | New York | Enhanced Features | ELS-F25750M | {'max_depth': 8, 'Info': {'feature_set_type': ... | {'features': ['temperature_lag1', 'temperature... | 0.507592 |
| 2 | RUN-3 | 2018-08-11 02:55:40.130331 | DECISION TREE | New York | Enhanced Features | ELS-F25750M | {'max_depth': 8, 'Info': {'feature_set_type': ... | {'features': ['temperature_lag1', 'temperature... | 0.512219 |
| 3 | RUN-4 | 2018-08-11 02:58:39.851712 | DECISION TREE | New York | Enhanced Features | ELS-F25750M | {'max_depth': 8, 'Info': {'feature_set_type': ... | {'features': ['temperature_lag1', 'temperature... | 0.544546 |
| 4 | RUN-5 | 2018-08-11 02:59:40.073802 | RANDOM FOREST | New York | Enhanced Features | ELS-F25750M | {'max_depth': 8, 'n_estimators': 50, 'Info': {... | {'features': ['temperature_lag1', 'temperature... | 0.321614 |
| 5 | RUN-6 | 2018-08-11 03:42:53.491706 | RNN | New York | Enhanced Features | ELS-F25750M | {'epochs': 50, 'Info': {'feature_set_type': 'E... | {'features': ['temperature_lag1', 'temperature... | 19.882019 |
| 6 | RUN-7 | 2018-08-11 03:55:27.381404 | ARIMA | New York | Enhanced Features | ELS-F25750M | {'order': (7, 0, 1)} | {'features': ['temperature']} | 27.11741 |
| 7 | RUN-8 | 2018-08-11 04:08:13.465006 | DECISION TREE | New York | Enhanced Features | ELS-F25750M | {'max_depth': 8, 'Info': {'feature_set_type': ... | {'features': ['temperature_lag1', 'temperature... | 0.511456 |
| 8 | RUN-9 | 2018-08-11 04:08:20.505215 | RANDOM FOREST | New York | Enhanced Features | ELS-F25750M | {'max_depth': 8, 'n_estimators': 50, 'Info': {... | {'features': ['temperature_lag1', 'temperature... | 0.321614 |
| 9 | RUN-10 | 2018-08-11 04:08:48.142332 | RNN | New York | Enhanced Features | ELS-F25750M | {'epochs': 50, 'Info': {'feature_set_type': 'E... | {'features': ['temperature_lag1', 'temperature... | 11.20659 |
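With several runs per model in the log, a grouped summary makes the comparison easier to read. The sketch below uses only the model names and MSE values visible in the ten rows above, rebuilt by hand as a small DataFrame named `runs`. On the real `results` object the same `groupby` should apply, assuming the column names shown.

```python
import pandas as pd

# Model name and test MSE for the ten logged runs shown above
runs = pd.DataFrame({
    "MODEL NAME": ["DECISION TREE", "DECISION TREE", "DECISION TREE", "DECISION TREE",
                   "RANDOM FOREST", "RNN", "ARIMA", "DECISION TREE", "RANDOM FOREST", "RNN"],
    "MEAN SQUARED ERROR": [0.519263, 0.507592, 0.512219, 0.544546,
                           0.321614, 19.882019, 27.11741, 0.511456, 0.321614, 11.20659],
})

# Number of runs, best and average test MSE per model, best model first
summary = (
    runs.groupby("MODEL NAME")["MEAN SQUARED ERROR"]
    .agg(["count", "min", "mean"])
    .sort_values("min")
)
best_model = summary.index[0]
print(summary)
```

For these New York runs the Random Forest has the lowest test MSE (0.321614), followed by the Decision Tree runs at around 0.51, with the RNN and ARIMA runs far behind.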
