## *DISCLAIMER*
<p style="font-size:16px; color:#117d30;">
 By accessing this code, you acknowledge the code is made available for presentation and demonstration purposes only and that the code: (1) is not subject to SOC 1 and SOC 2 compliance audits; (2) is not designed or intended to be a substitute for the professional advice, diagnosis, treatment, or judgment of a certified financial services professional; (3) is not designed, intended or made available as a medical device; and (4) is not designed or intended to be a substitute for professional medical advice, diagnosis, treatment or judgement. Do not use this code to replace, substitute, or provide professional financial advice or judgment, or to replace, substitute or provide medical advice, diagnosis, treatment or judgement. You are solely responsible for ensuring the regulatory, legal, and/or contractual compliance of any use of the code, including obtaining any authorizations or consents, and any solution you choose to build that incorporates this code in whole or in part.
</p>

# Hospital Wait Time Forecast

In this notebook we use Azure AutoML to forecast the average daily wait time of patients in each city.

$*****$ For demonstration purposes only. Please customize as per your enterprise security and compliance needs. License agreement: https://github.com/microsoft/Azure-Analytics-and-AI-Engagement/blob/main/HealthCare/License.md $*****$

## Legal Notices 

This presentation, demonstration, and demonstration model are for informational purposes only. Microsoft makes no warranties, express or implied, in this presentation, demonstration, and demonstration model. Nothing in this presentation, demonstration, or demonstration model modifies any of the terms and conditions of Microsoft’s written and signed agreements. This is not an offer and applicable terms and the information provided is subject to revision and may be changed at any time by Microsoft.

This presentation, demonstration, and/or demonstration model do not give you or your organization any license to any patents, trademarks, copyrights, or other intellectual property covering the subject matter in this presentation, demonstration, and demonstration model.

The information contained in this presentation, demonstration and demonstration model represent the current view of Microsoft on the issues discussed as of the date of presentation and/or demonstration, and the duration of your access to the demonstration model. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information presented after the date of presentation and/or demonstration and for the duration of your access to the demonstration model.

No Microsoft technology, nor any of its component technologies, including the demonstration model, is intended or made available: (1) as a medical device; (2) for the diagnosis of disease or other conditions, or in the cure, mitigation, treatment or prevention of a disease or other conditions; or (3) as a substitute for the professional clinical advice, opinion, or judgment of a treating healthcare professional. Partners or customers are responsible for ensuring the regulatory compliance of any solution they build using Microsoft technologies.

© 2020 Microsoft Corporation. All rights reserved

## Setting up the workspace

In [4]:
import azureml.core
print("SDK Version:", azureml.core.VERSION)

from azureml.core import Workspace, Datastore, Dataset
ws = Workspace.from_config()
ws

SDK Version: 1.19.0


Workspace.create(name='mlw-healthcare-dev', subscription_id='6f6a71d2-83bb-42b0-9912-2e243ef214c4', resource_group='rg-healthcare-dev')

#### Create new datastore for Datasets

In [5]:
import GlobalVariables

In [6]:
from azureml.core import Datastore

blob_datastore_name=GlobalVariables.WAIT_TIME_DATASTORE_NAME # Name of the datastore in workspace
container_name=GlobalVariables.GLOBAL_CONTAINER_NAME
account_name=GlobalVariables.STORAGE_ACCOUNT_NAME
account_key=GlobalVariables.STORAGE_ACCOUNT_KEY # Storage account access key

blob_datastore = Datastore.register_azure_blob_container(workspace=ws, 
                                                         datastore_name=blob_datastore_name, 
                                                         container_name=container_name, 
                                                         account_name=account_name,
                                                         account_key=account_key)

dstore = Datastore.get(ws, datastore_name=blob_datastore_name)

In [7]:
from azureml.data.datapath import DataPath
filepath = GlobalVariables.WAIT_TIME_INPUT_FILE_NAME
print(filepath)

# Set the path to the storage account containing the file
datastore_path = [DataPath(dstore, filepath)]
patientdataset = Dataset.Tabular.from_delimited_files(path=datastore_path)
patientdataset.take(5).to_pandas_dataframe()

/pbiPatientPredictiveSet.csv


Unnamed: 0,encounter_id,hospital_id,department_id,city,patient_id,patient_age,risk_level,acute_type,patient_category,doctor_id,...,drug_cost,hospital_expense,follow_up,readmitted_patient,payment_type,date,month,year,disease,reason_for_readmission
0,21059,1,1,Los Angeles,738311d9-2f2c-11eb-aa27-70b5e8b8edbb,61,5,Acute,InPatient,9542,...,840,6300,0,0,Medicaid,2016-06-28 17:46:00,Jun,2016,,radiotherapy
1,2305342,2,6,Chicago,0e930c2e-2f31-11eb-8d13-70b5e8b8edbb,59,2,Non Acute,InPatient,3127,...,754,6074,0,0,Medicaid,2019-06-18 20:47:00,Jun,2019,,alzheimer
2,426911,1,1,Los Angeles,c7ae99aa-2f2c-11eb-88b4-70b5e8b8edbb,79,1,Non Acute,InPatient,7261,...,1017,7037,0,0,Private Insurance,2017-08-10 23:47:00,Aug,2017,,radiotherapy
3,797146,1,2,Los Angeles,4a63175e-2f2d-11eb-bf71-70b5e8b8edbb,14,2,Non Acute,InPatient,11029,...,691,6069,0,0,Medicare,2019-11-04 02:59:00,Nov,2019,,scoliosis
4,2847178,21,7,Miami,79430da9-2f32-11eb-87ff-70b5e8b8edbb,50,4,Acute,InPatient,12480,...,751,5659,1,0,Private Insurance,2016-06-11 17:28:00,Jun,2016,,flu


#### Convert to Pandas DataFrame to do data preparation

In [48]:
patient_df = patientdataset.to_pandas_dataframe()
patient_df.head()

Unnamed: 0,encounter_id,hospital_id,department_id,city,patient_id,patient_age,risk_level,acute_type,patient_category,doctor_id,...,drug_cost,hospital_expense,follow_up,readmitted_patient,payment_type,date,month,year,disease,reason_for_readmission
0,21059,1,1,Los Angeles,738311d9-2f2c-11eb-aa27-70b5e8b8edbb,61,5,Acute,InPatient,9542,...,840,6300,0,0,Medicaid,2016-06-28 17:46:00,Jun,2016,,radiotherapy
1,2305342,2,6,Chicago,0e930c2e-2f31-11eb-8d13-70b5e8b8edbb,59,2,Non Acute,InPatient,3127,...,754,6074,0,0,Medicaid,2019-06-18 20:47:00,Jun,2019,,alzheimer
2,426911,1,1,Los Angeles,c7ae99aa-2f2c-11eb-88b4-70b5e8b8edbb,79,1,Non Acute,InPatient,7261,...,1017,7037,0,0,Private Insurance,2017-08-10 23:47:00,Aug,2017,,radiotherapy
3,797146,1,2,Los Angeles,4a63175e-2f2d-11eb-bf71-70b5e8b8edbb,14,2,Non Acute,InPatient,11029,...,691,6069,0,0,Medicare,2019-11-04 02:59:00,Nov,2019,,scoliosis
4,2847178,21,7,Miami,79430da9-2f32-11eb-87ff-70b5e8b8edbb,50,4,Acute,InPatient,12480,...,751,5659,1,0,Private Insurance,2016-06-11 17:28:00,Jun,2016,,flu


In [49]:
# View info to see what the column names and types are
patient_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4369400 entries, 0 to 4369399
Data columns (total 25 columns):
encounter_id              int64
hospital_id               int64
department_id             int64
city                      object
patient_id                object
patient_age               int64
risk_level                int64
acute_type                object
patient_category          object
doctor_id                 int64
length_of_stay            int64
wait_time                 int64
type_of_stay              object
treatment_cost            int64
claim_cost                int64
drug_cost                 int64
hospital_expense          int64
follow_up                 int64
readmitted_patient        int64
payment_type              object
date                      datetime64[ns]
month                     object
year                      int64
disease                   object
reason_for_readmission    object
dtypes: datetime64[ns](1), int64(15), object(9)
memory usage: 833.4+ 

## Data Preparation for AutoML

In [55]:
import pandas as pd

In [56]:
# Copy the selected columns so later column assignments do not act on a view of patient_df
timeseries_df = patient_df[['city','date', 'wait_time']].copy()
timeseries_df

Unnamed: 0,city,date,wait_time
0,Los Angeles,2016-06-28 17:46:00,31
1,Chicago,2019-06-18 20:47:00,38
2,Los Angeles,2017-08-10 23:47:00,35
3,Los Angeles,2019-11-04 02:59:00,42
4,Miami,2016-06-11 17:28:00,50
...,...,...,...
4369395,Miami,2018-11-11 20:06:00,41
4369396,Miami,2018-11-14 15:48:00,44
4369397,Miami,2018-11-07 05:06:00,43
4369398,Miami,2018-11-02 07:03:00,44


#### Remove time dimension from the date column

In [57]:
# normalize() sets the time of day to midnight and keeps the datetime64 dtype
timeseries_df['date'] = timeseries_df['date'].dt.normalize()

In [58]:
timeseries_df.head()

Unnamed: 0,city,date,wait_time
0,Los Angeles,2016-06-28,31
1,Chicago,2019-06-18,38
2,Los Angeles,2017-08-10,35
3,Los Angeles,2019-11-04,42
4,Miami,2016-06-11,50
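
As a minimal sketch of what removing the time dimension does, the snippet below truncates two timestamps to midnight with `dt.normalize()`. The `sample` frame and its values are hypothetical and only illustrate the step; they are not read from the dataset.

```python
import pandas as pd

# Hypothetical sample rows, used only to illustrate the step.
sample = pd.DataFrame({
    "city": ["Los Angeles", "Chicago"],
    "date": pd.to_datetime(["2016-06-28 17:46:00", "2019-06-18 20:47:00"]),
})

# normalize() resets the time of day to 00:00:00 while keeping a datetime column,
# so rows from the same calendar day end up with identical values.
sample["date"] = sample["date"].dt.normalize()
print(sample)
```

Keeping the column as a datetime (rather than a Python `date` object) lets later comparisons against a `pd.Timestamp` cutoff work without conversion.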


In [59]:
timeseries_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4369400 entries, 0 to 4369399
Data columns (total 3 columns):
city         object
date         datetime64[ns]
wait_time    int64
dtypes: datetime64[ns](1), int64(1), object(1)
memory usage: 100.0+ MB


In [61]:
timeseries_df_grouped = timeseries_df.groupby(['city','date'])['wait_time'].mean().reset_index()
timeseries_df_grouped = timeseries_df_grouped.sort_values(['city','date']).reset_index(drop=True)
timeseries_df_grouped

Unnamed: 0,city,date,wait_time
0,Anchorage,2015-12-17,37.00
1,Anchorage,2015-12-18,46.50
2,Anchorage,2015-12-19,41.67
3,Anchorage,2015-12-20,39.00
4,Anchorage,2015-12-21,43.00
...,...,...,...
9043,Miami,2020-11-24,38.27
9044,Miami,2020-11-25,38.53
9045,Miami,2020-11-26,39.00
9046,Miami,2020-11-27,37.87
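
The aggregation above turns one row per encounter into one row per city and day. A small sketch of the same grouping, assuming a hypothetical `visits` frame with made-up values:

```python
import pandas as pd

# Hypothetical encounters: two visits in Miami on the same day, one the next day,
# and one visit in Chicago.
visits = pd.DataFrame({
    "city": ["Miami", "Miami", "Miami", "Chicago"],
    "date": pd.to_datetime(["2020-01-01", "2020-01-01", "2020-01-02", "2020-01-01"]),
    "wait_time": [40, 44, 38, 35],
})

# Average the wait time per (city, date) pair and order the rows by city, then date.
daily_mean = (
    visits.groupby(["city", "date"], as_index=False)["wait_time"]
    .mean()
    .sort_values(["city", "date"])
    .reset_index(drop=True)
)
print(daily_mean)
```

Note that the averaged `wait_time` column becomes a float even though the raw values are integers.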


## Split Data based on Cities

In [66]:
city_wise_dfs = {}

cities = list(timeseries_df_grouped['city'].unique())
for city in cities:
    city_df = timeseries_df_grouped[timeseries_df_grouped['city'] == city]
    city_wise_dfs[city] = city_df[['date', 'wait_time']]
    
city_wise_dfs['Honolulu']

Unnamed: 0,date,wait_time
3618,2015-12-16,36.00
3619,2015-12-17,39.25
3620,2015-12-18,45.00
3621,2015-12-19,37.73
3622,2015-12-20,40.67
...,...,...
5423,2020-11-24,40.34
5424,2020-11-25,39.33
5425,2020-11-26,39.58
5426,2020-11-27,40.29


## Prepare Training and Testing set

Since we plan on forecasting the average wait time for October, November and December 2020, we split the training and testing data based on the date: rows before 2020-10-01 are used for training and rows from that date onward for testing.

#### Split data based on time

In [67]:
import pandas as pd
date_cutoff = pd.to_datetime('2020-10-01')

all_train_dfs = {}
for city, df in city_wise_dfs.items():
    train_df = df[df['date'] < date_cutoff]
    all_train_dfs[city] = train_df

all_train_dfs['Miami']

Unnamed: 0,date,wait_time
7238,2015-12-16,43.00
7239,2015-12-17,42.11
7240,2015-12-18,40.44
7241,2015-12-19,41.05
7242,2015-12-20,40.50
...,...,...
8984,2020-09-26,40.55
8985,2020-09-27,41.04
8986,2020-09-28,41.85
8987,2020-09-29,38.56


In [68]:
all_test_dfs = {}
for city, df in city_wise_dfs.items():
    test_df = df[df['date'] >= date_cutoff]
    all_test_dfs[city] = test_df
    
all_test_dfs['Honolulu']

Unnamed: 0,date,wait_time
5369,2020-10-01,42.8
5370,2020-10-02,41.12
5371,2020-10-03,41.1
5372,2020-10-04,42.31
5373,2020-10-05,42.03
5374,2020-10-06,41.45
5375,2020-10-07,41.72
5376,2020-10-08,40.79
5377,2020-10-09,41.38
5378,2020-10-10,40.93
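
A time-based split like the one above can be sanity-checked on a small example. The sketch below uses a hypothetical six-day `series` and the same cutoff date; the two assertions check that no row is lost or duplicated and that every training date precedes every test date.

```python
import pandas as pd

# Hypothetical daily averages around the cutoff date.
series = pd.DataFrame({
    "date": pd.date_range("2020-09-28", periods=6, freq="D"),
    "wait_time": [40.5, 41.0, 38.6, 42.8, 41.1, 41.1],
})
cutoff = pd.Timestamp("2020-10-01")

train = series[series["date"] < cutoff]   # strictly before the cutoff
test = series[series["date"] >= cutoff]   # the cutoff day and later

# Every row lands in exactly one part, and the parts do not overlap in time.
assert len(train) + len(test) == len(series)
assert train["date"].max() < test["date"].min()
print(len(train), len(test))
```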


#### Upload training and testing set to the Storage Account

In [75]:
import os

local_data_folder = 'wait_time_data/'
# Create the folder if it is missing; do nothing if it already exists
os.makedirs(local_data_folder, exist_ok=True)

base_train_file = 'wait_time_data_train_'
base_test_file = 'wait_time_data_test_'

local_files = []
for city, train_df in all_train_dfs.items():
    city_without_spaces = '-'.join(city.split(' '))
  
    # Save train file
    train_file = base_train_file + city_without_spaces + '.csv'
    train_df.to_csv(local_data_folder + train_file, index=False)
    local_files.append(local_data_folder + train_file)
    
    # Save test file
    test_file = base_test_file + city_without_spaces + '.csv'
    test_df = all_test_dfs[city]
    test_df.to_csv(local_data_folder + test_file, index=False)
    local_files.append(local_data_folder + test_file)
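
The file names are derived from the city name with spaces replaced by hyphens. One way to keep that rule in a single place is a small helper; `city_file_name` below is a hypothetical function introduced for illustration and is not part of the notebook.

```python
def city_file_name(base: str, city: str, folder: str = "wait_time_data") -> str:
    """Build a per-city CSV path, replacing spaces in the city name with hyphens."""
    return f"{folder}/{base}{city.replace(' ', '-')}.csv"


# The prefixes below mirror base_train_file and base_test_file.
print(city_file_name("wait_time_data_train_", "Los Angeles"))
print(city_file_name("wait_time_data_test_", "Miami"))
```

Calling the helper with the current city inside each loop ties the file name to that city, rather than to a variable left over from an earlier loop.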


In [76]:
# Upload the data
print(local_files)

dstore.upload_files(
    files = local_files,
    relative_root = local_data_folder,
    target_path = '/',
    overwrite=True,
    show_progress=True
)

['wait_time_data/wait_time_data_train_Anchorage.csv', 'wait_time_data/wait_time_data_test_Anchorage.csv', 'wait_time_data/wait_time_data_train_Chicago.csv', 'wait_time_data/wait_time_data_test_Chicago.csv', 'wait_time_data/wait_time_data_train_Honolulu.csv', 'wait_time_data/wait_time_data_test_Honolulu.csv', 'wait_time_data/wait_time_data_train_Los-Angeles.csv', 'wait_time_data/wait_time_data_test_Los-Angeles.csv', 'wait_time_data/wait_time_data_train_Miami.csv', 'wait_time_data/wait_time_data_test_Miami.csv']
Uploading an estimated of 10 files
Uploading wait_time_data/wait_time_data_train_Anchorage.csv
Uploaded wait_time_data/wait_time_data_train_Anchorage.csv, 1 files out of an estimated total of 10
Uploading wait_time_data/wait_time_data_test_Anchorage.csv
Uploaded wait_time_data/wait_time_data_test_Anchorage.csv, 2 files out of an estimated total of 10
Uploading wait_time_data/wait_time_data_train_Chicago.csv
Uploaded wait_time_data/wait_time_data_train_Chicago.csv, 3 files out of 

$AZUREML_DATAREFERENCE_readmission_prediction_store

### Set up AutoML Experiment

#### Set the Data Types for each column. 
This needs to be done explicitly so that `date` is parsed as a datetime and `city` is kept as a string, instead of relying on automatic type inference.

In [77]:
from azureml.data import DataType

data_types = {
    'city': DataType.to_string(),
    'wait_time': DataType.to_long(),
    'date': DataType.to_datetime("%Y-%m-%d"),
}

print(len(data_types))

3
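
The effect of declaring column types up front can be tried locally with pandas. This is only an analogue of the idea, not the Azure ML API: the inline CSV text is made up, and reading `wait_time` as a float is a choice of this sketch.

```python
import io

import pandas as pd

# Hypothetical CSV content in the same three-column shape.
csv_text = (
    "city,date,wait_time\n"
    "Miami,2020-09-29,38.56\n"
    "Miami,2020-09-30,41.20\n"
)

# Declare the types instead of relying on inference:
# city as text, wait_time as float, date parsed as a datetime.
frame = pd.read_csv(
    io.StringIO(csv_text),
    dtype={"city": str, "wait_time": "float64"},
    parse_dates=["date"],
)
print(frame.dtypes)
```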


#### Load training data from the storage blob as a TabularDataset

In [79]:
all_train_datasets = {}
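for city in all_train_dfs.keys():
    # Build the file name from the current city so each dataset points at its own file
    city_without_spaces = '-'.join(city.split(' '))
    filepath = base_train_file + city_without_spaces + '.csv'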

    datastore_path = [DataPath(dstore, filepath)]
    traindataset = Dataset.Tabular.from_delimited_files(path=datastore_path, set_column_types=data_types)
    traindataset.to_pandas_dataframe().info()
    all_train_datasets[city] = traindataset
    


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1751 entries, 0 to 1750
Data columns (total 2 columns):
date         1751 non-null datetime64[ns]
wait_time    1751 non-null int64
dtypes: datetime64[ns](1), int64(1)
memory usage: 27.5 KB
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1751 entries, 0 to 1750
Data columns (total 2 columns):
date         1751 non-null datetime64[ns]
wait_time    1751 non-null int64
dtypes: datetime64[ns](1), int64(1)
memory usage: 27.5 KB
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1751 entries, 0 to 1750
Data columns (total 2 columns):
date         1751 non-null datetime64[ns]
wait_time    1751 non-null int64
dtypes: datetime64[ns](1), int64(1)
memory usage: 27.5 KB
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1751 entries, 0 to 1750
Data columns (total 2 columns):
date         1751 non-null datetime64[ns]
wait_time    1751 non-null int64
dtypes: datetime64[ns](1), int64(1)
memory usage: 27.5 KB
<class 'pandas.core.frame.DataFrame'>
RangeI

In [80]:
y_variable = "wait_time"

#### Set up the compute cluster

In [81]:
from azureml.core.compute import AmlCompute

compute = AmlCompute(ws, "health-cluster")

#### Configure the AutoML model and run it

In [84]:
from azureml.core.experiment import Experiment
from azureml.train.automl import AutoMLConfig

for city, traindataset in all_train_datasets.items():
    city_without_spaces = '-'.join(city.split(' '))
    experiment_name = 'Waittime-Forecasting-Experiment_' + city_without_spaces
    experiment = Experiment(ws, experiment_name)

    automl_config = AutoMLConfig(task = 'forecasting',
                         debug_log = 'automl_errors.log',
                         iteration_timeout_minutes = 15,
                         n_cross_validations=3,
                         experiment_timeout_minutes = 15,
                         label_column_name=y_variable,
                         time_column_name='date',
                         enable_early_stopping=True,
                         compute_target = compute,
                         training_data = traindataset,
                         model_explainability=True)

    training_run = experiment.submit(automl_config, show_output = False)

Running on remote.
Running on remote.
Running on remote.
Running on remote.
Running on remote.


#### Retrieve model to predict the test set

In [85]:
import azureml.core
from azureml.core import Workspace, Datastore, Dataset, Experiment

ws = Workspace.from_config()
blob_datastore_name=GlobalVariables.GLOBAL_DATASTORE_NAME
dstore = Datastore.get(ws, datastore_name=blob_datastore_name)
#ws_ds = ws.get_default_datastore()

print('Workspace Name: ' + ws.name, 
      'Resource Group: ' + ws.resource_group,
      'Default Storage Account Name: ' + dstore.account_name,
      'AzureML Core Version: ' + azureml.core.VERSION,
      sep = '\n')

Workspace Name: mlw-healthcare-dev
Resource Group: rg-healthcare-dev
Default Storage Account Name: sthealthcaredev001
AzureML Core Version: 1.19.0


In [109]:
autoMLRunIds = {
    'Miami': 'AutoML_7cf8ec97-fb9e-4576-a10e-753ec8c27dbb',
    'Los Angeles': 'AutoML_abe874cc-0dcb-4895-93c7-530fb0059153',
    'Honolulu': 'AutoML_d1bbc763-b8aa-4494-8131-ccec316fc5b3',
    'Chicago': 'AutoML_9a85a9f5-d2db-4819-bb3f-ba8701812e8a',
    'Anchorage': 'AutoML_cb7fa5fa-bce9-4b14-b65b-3f7a4f795ed3',    
}

In [110]:
from azureml.train.automl.run import AutoMLRun

all_automl_runs = {}
for city, autoMLRunId in autoMLRunIds.items():
    city_without_spaces = '-'.join(city.split(' '))
    experiment_name = 'Waittime-Forecasting-Experiment_' + city_without_spaces

    experiment = Experiment(workspace = ws, name = experiment_name)
    automl_run = AutoMLRun(experiment, autoMLRunId, outputs = None)
    display(automl_run)
    all_automl_runs[city] = automl_run

Experiment,Id,Type,Status,Details Page,Docs Page
Waittime-Forecasting-Experiment_Miami,AutoML_7cf8ec97-fb9e-4576-a10e-753ec8c27dbb,automl,Completed,Link to Azure Machine Learning studio,Link to Documentation


Experiment,Id,Type,Status,Details Page,Docs Page
Waittime-Forecasting-Experiment_Los-Angeles,AutoML_abe874cc-0dcb-4895-93c7-530fb0059153,automl,Completed,Link to Azure Machine Learning studio,Link to Documentation


Experiment,Id,Type,Status,Details Page,Docs Page
Waittime-Forecasting-Experiment_Honolulu,AutoML_d1bbc763-b8aa-4494-8131-ccec316fc5b3,automl,Completed,Link to Azure Machine Learning studio,Link to Documentation


Experiment,Id,Type,Status,Details Page,Docs Page
Waittime-Forecasting-Experiment_Chicago,AutoML_9a85a9f5-d2db-4819-bb3f-ba8701812e8a,automl,Completed,Link to Azure Machine Learning studio,Link to Documentation


Experiment,Id,Type,Status,Details Page,Docs Page
Waittime-Forecasting-Experiment_Anchorage,AutoML_cb7fa5fa-bce9-4b14-b65b-3f7a4f795ed3,automl,Completed,Link to Azure Machine Learning studio,Link to Documentation


In [111]:
all_models = {}

for city, automl_run in all_automl_runs.items():
    best_run, fitted_model = automl_run.get_output()
    # print(fitted_model.steps)
    model_name = best_run.properties['model_name']
    print(model_name)
    all_models[city] = fitted_model

AutoML7cf8ec97f0
AutoMLabe874cc00
AutoMLd1bbc763b0
AutoML9a85a9f5d0
AutoMLcb7fa5fab4


In [112]:
all_models['Honolulu']

ForecastingPipelineWrapper(pipeline=Pipeline(memory=None,
                                             steps=[('timeseriestransformer',
                                                     TimeSeriesTransformer(featurization_config=None,
                                                                           pipeline_type=<TimeSeriesPipelineType.FULL: 1>)),
                                                    ('AutoArima',
                                                     <azureml.automl.runtime.shared._auto_arima.AutoArima object at 0x7f45fc0a6ef0>)],
                                             verbose=False),
                           stddev=None)

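#### Predict on the test period

The test_df also contains the y_variable, which must not be passed to the model. Rather than dropping it, we build an input frame that contains only the dates to forecast (October through December 2020).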

In [113]:
X_test_df = pd.DataFrame({'date': pd.date_range(start='2020-10-01', end='2020-12-31')})
X_test_df

Unnamed: 0,date
0,2020-10-01
1,2020-10-02
2,2020-10-03
3,2020-10-04
4,2020-10-05
...,...
87,2020-12-27
88,2020-12-28
89,2020-12-29
90,2020-12-30


In [114]:
all_predictions = {}
for city, fitted_model in all_models.items():
    predictions = fitted_model.predict(X_test_df)
    display(predictions)
    all_predictions[city] = predictions

array([40.23881717, 40.45922721, 40.28689296, 40.37154738, 40.32154208,
       40.34896927, 40.33363142, 40.34232316, 40.33761261, 40.34040093,
       40.33899921, 40.33994008, 40.33957157, 40.33993501, 40.33988931,
       40.34007231, 40.34012747, 40.3402541 , 40.34034078, 40.34044978,
       40.34054631, 40.34064981, 40.34074941, 40.34085119, 40.34095176,
       40.341053  , 40.34115386, 40.34125494, 40.3413559 , 40.34145692,
       40.34155791, 40.34165892, 40.34175991, 40.34186091, 40.34196191,
       40.34206291, 40.34216391, 40.34226491, 40.34236591, 40.34246691,
       40.34256792, 40.34266892, 40.34276992, 40.34287092, 40.34297192,
       40.34307292, 40.34317392, 40.34327492, 40.34337592, 40.34347692,
       40.34357792, 40.34367892, 40.34377992, 40.34388092, 40.34398192,
       40.34408292, 40.34418392, 40.34428492, 40.34438592, 40.34448692,
       40.34458792, 40.34468892, 40.34478992, 40.34489092, 40.34499192,
       40.34509292, 40.34519392, 40.34529492, 40.34539592, 40.34

array([40.23881717, 40.45922721, 40.28689296, 40.37154738, 40.32154208,
       40.34896927, 40.33363142, 40.34232316, 40.33761261, 40.34040093,
       40.33899921, 40.33994008, 40.33957157, 40.33993501, 40.33988931,
       40.34007231, 40.34012747, 40.3402541 , 40.34034078, 40.34044978,
       40.34054631, 40.34064981, 40.34074941, 40.34085119, 40.34095176,
       40.341053  , 40.34115386, 40.34125494, 40.3413559 , 40.34145692,
       40.34155791, 40.34165892, 40.34175991, 40.34186091, 40.34196191,
       40.34206291, 40.34216391, 40.34226491, 40.34236591, 40.34246691,
       40.34256792, 40.34266892, 40.34276992, 40.34287092, 40.34297192,
       40.34307292, 40.34317392, 40.34327492, 40.34337592, 40.34347692,
       40.34357792, 40.34367892, 40.34377992, 40.34388092, 40.34398192,
       40.34408292, 40.34418392, 40.34428492, 40.34438592, 40.34448692,
       40.34458792, 40.34468892, 40.34478992, 40.34489092, 40.34499192,
       40.34509292, 40.34519392, 40.34529492, 40.34539592, 40.34

array([40.23881717, 40.45922721, 40.28689296, 40.37154738, 40.32154208,
       40.34896927, 40.33363142, 40.34232316, 40.33761261, 40.34040093,
       40.33899921, 40.33994008, 40.33957157, 40.33993501, 40.33988931,
       40.34007231, 40.34012747, 40.3402541 , 40.34034078, 40.34044978,
       40.34054631, 40.34064981, 40.34074941, 40.34085119, 40.34095176,
       40.341053  , 40.34115386, 40.34125494, 40.3413559 , 40.34145692,
       40.34155791, 40.34165892, 40.34175991, 40.34186091, 40.34196191,
       40.34206291, 40.34216391, 40.34226491, 40.34236591, 40.34246691,
       40.34256792, 40.34266892, 40.34276992, 40.34287092, 40.34297192,
       40.34307292, 40.34317392, 40.34327492, 40.34337592, 40.34347692,
       40.34357792, 40.34367892, 40.34377992, 40.34388092, 40.34398192,
       40.34408292, 40.34418392, 40.34428492, 40.34438592, 40.34448692,
       40.34458792, 40.34468892, 40.34478992, 40.34489092, 40.34499192,
       40.34509292, 40.34519392, 40.34529492, 40.34539592, 40.34

array([40.23881717, 40.45922721, 40.28689296, 40.37154738, 40.32154208,
       40.34896927, 40.33363142, 40.34232316, 40.33761261, 40.34040093,
       40.33899921, 40.33994008, 40.33957157, 40.33993501, 40.33988931,
       40.34007231, 40.34012747, 40.3402541 , 40.34034078, 40.34044978,
       40.34054631, 40.34064981, 40.34074941, 40.34085119, 40.34095176,
       40.341053  , 40.34115386, 40.34125494, 40.3413559 , 40.34145692,
       40.34155791, 40.34165892, 40.34175991, 40.34186091, 40.34196191,
       40.34206291, 40.34216391, 40.34226491, 40.34236591, 40.34246691,
       40.34256792, 40.34266892, 40.34276992, 40.34287092, 40.34297192,
       40.34307292, 40.34317392, 40.34327492, 40.34337592, 40.34347692,
       40.34357792, 40.34367892, 40.34377992, 40.34388092, 40.34398192,
       40.34408292, 40.34418392, 40.34428492, 40.34438592, 40.34448692,
       40.34458792, 40.34468892, 40.34478992, 40.34489092, 40.34499192,
       40.34509292, 40.34519392, 40.34529492, 40.34539592, 40.34

array([40.71428571, 40.71428571, 40.71428571, 40.71428571, 40.71428571,
       40.71428571, 40.71428571, 40.71428571, 40.71428571, 40.71428571,
       40.71428571, 40.71428571, 40.71428571, 40.71428571, 40.71428571,
       40.71428571, 40.71428571, 40.71428571, 40.71428571, 40.71428571,
       40.71428571, 40.71428571, 40.71428571, 40.71428571, 40.71428571,
       40.71428571, 40.71428571, 40.71428571, 39.68181818, 39.68181818,
       39.68181818, 40.71428571, 40.71428571, 40.71428571, 40.71428571,
       40.71428571, 40.71428571, 40.71428571, 40.71428571, 40.71428571,
       40.71428571, 40.71428571, 40.71428571, 40.71428571, 40.71428571,
       40.71428571, 40.71428571, 40.71428571, 40.71428571, 40.71428571,
       40.71428571, 40.71428571, 40.71428571, 40.71428571, 40.71428571,
       40.71428571, 40.71428571, 40.71428571, 40.71428571, 39.68181818,
       39.68181818, 40.71428571, 40.71428571, 40.71428571, 40.71428571,
       40.71428571, 40.71428571, 40.71428571, 40.71428571, 40.71

In [115]:
predicted_dfs = {}

for city, predictions in all_predictions.items():
    df = X_test_df.copy()
    df['wait_time'] = predictions
    display(df)
    predicted_dfs[city] = df

Unnamed: 0,date,wait_time
0,2020-10-01,40.24
1,2020-10-02,40.46
2,2020-10-03,40.29
3,2020-10-04,40.37
4,2020-10-05,40.32
...,...,...
87,2020-12-27,40.35
88,2020-12-28,40.35
89,2020-12-29,40.35
90,2020-12-30,40.35


Unnamed: 0,date,wait_time
0,2020-10-01,40.24
1,2020-10-02,40.46
2,2020-10-03,40.29
3,2020-10-04,40.37
4,2020-10-05,40.32
...,...,...
87,2020-12-27,40.35
88,2020-12-28,40.35
89,2020-12-29,40.35
90,2020-12-30,40.35


Unnamed: 0,date,wait_time
0,2020-10-01,40.24
1,2020-10-02,40.46
2,2020-10-03,40.29
3,2020-10-04,40.37
4,2020-10-05,40.32
...,...,...
87,2020-12-27,40.35
88,2020-12-28,40.35
89,2020-12-29,40.35
90,2020-12-30,40.35


Unnamed: 0,date,wait_time
0,2020-10-01,40.24
1,2020-10-02,40.46
2,2020-10-03,40.29
3,2020-10-04,40.37
4,2020-10-05,40.32
...,...,...
87,2020-12-27,40.35
88,2020-12-28,40.35
89,2020-12-29,40.35
90,2020-12-30,40.35


Unnamed: 0,date,wait_time
0,2020-10-01,40.71
1,2020-10-02,40.71
2,2020-10-03,40.71
3,2020-10-04,40.71
4,2020-10-05,40.71
...,...,...
87,2020-12-27,40.93
88,2020-12-28,40.93
89,2020-12-29,39.68
90,2020-12-30,39.68
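
One possible way to judge such forecasts is to align them with the held-out test rows by date and compute error metrics. The sketch below is an assumption about how that check could look, not a step of this notebook; the `actual` and `forecast` frames hold made-up numbers.

```python
import pandas as pd

# Hypothetical held-out observations and forecasts; the forecast covers one extra day.
actual = pd.DataFrame({
    "date": pd.to_datetime(["2020-10-01", "2020-10-02", "2020-10-03"]),
    "wait_time": [42.0, 41.0, 40.0],
})
forecast = pd.DataFrame({
    "date": pd.to_datetime(["2020-10-01", "2020-10-02", "2020-10-03", "2020-10-04"]),
    "wait_time": [40.0, 40.0, 41.0, 40.0],
})

# An inner merge keeps only the dates that have both an observation and a forecast.
merged = actual.merge(forecast, on="date", suffixes=("_actual", "_forecast"))
errors = merged["wait_time_forecast"] - merged["wait_time_actual"]

mae = errors.abs().mean()            # mean absolute error
rmse = (errors ** 2).mean() ** 0.5   # root mean squared error
print(f"MAE={mae:.3f} RMSE={rmse:.3f}")
```

Merging on `date` rather than relying on row position avoids mismatches when the forecast range extends beyond the available test data.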


#### Upload predictions to storage account

In [116]:
final_dfs = []

for city, predicted_df in predicted_dfs.items():
    print(city)
    train_df = all_train_dfs[city]
    final_df = pd.concat([train_df, predicted_df])
    city_list = [city]*len(final_df)
    final_df['city'] = city_list
    display(final_df)
    final_dfs.append(final_df)

Miami


Unnamed: 0,date,wait_time,city
7238,2015-12-16,43.00,Miami
7239,2015-12-17,42.11,Miami
7240,2015-12-18,40.44,Miami
7241,2015-12-19,41.05,Miami
7242,2015-12-20,40.50,Miami
...,...,...,...
87,2020-12-27,40.35,Miami
88,2020-12-28,40.35,Miami
89,2020-12-29,40.35,Miami
90,2020-12-30,40.35,Miami


Los Angeles


Unnamed: 0,date,wait_time,city
5428,2015-12-16,39.00,Los Angeles
5429,2015-12-17,42.50,Los Angeles
5430,2015-12-18,41.00,Los Angeles
5431,2015-12-19,40.38,Los Angeles
5432,2015-12-20,38.80,Los Angeles
...,...,...,...
87,2020-12-27,40.35,Los Angeles
88,2020-12-28,40.35,Los Angeles
89,2020-12-29,40.35,Los Angeles
90,2020-12-30,40.35,Los Angeles


Honolulu


Unnamed: 0,date,wait_time,city
3618,2015-12-16,36.00,Honolulu
3619,2015-12-17,39.25,Honolulu
3620,2015-12-18,45.00,Honolulu
3621,2015-12-19,37.73,Honolulu
3622,2015-12-20,40.67,Honolulu
...,...,...,...
87,2020-12-27,40.35,Honolulu
88,2020-12-28,40.35,Honolulu
89,2020-12-29,40.35,Honolulu
90,2020-12-30,40.35,Honolulu


Chicago


Unnamed: 0,date,wait_time,city
1808,2015-12-16,45.33,Chicago
1809,2015-12-17,41.33,Chicago
1810,2015-12-18,42.75,Chicago
1811,2015-12-19,42.07,Chicago
1812,2015-12-20,42.62,Chicago
...,...,...,...
87,2020-12-27,40.35,Chicago
88,2020-12-28,40.35,Chicago
89,2020-12-29,40.35,Chicago
90,2020-12-30,40.35,Chicago


Anchorage


Unnamed: 0,date,wait_time,city
0,2015-12-17,37.00,Anchorage
1,2015-12-18,46.50,Anchorage
2,2015-12-19,41.67,Anchorage
3,2015-12-20,39.00,Anchorage
4,2015-12-21,43.00,Anchorage
...,...,...,...
87,2020-12-27,40.93,Anchorage
88,2020-12-28,40.93,Anchorage
89,2020-12-29,39.68,Anchorage
90,2020-12-30,39.68,Anchorage


In [117]:
full_df = pd.concat(final_dfs)
print(full_df.shape)
full_df.to_csv(local_data_folder+'wait_time_forecasted.csv',index=False)

(9214, 3)
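
Because the training rows keep their original index and the forecast rows start again at 0, the concatenated frame carries repeated index labels. A minimal sketch, assuming hypothetical `history` and `forecast` frames, shows how `ignore_index=True` produces a clean running index and how a scalar assignment fills the city column:

```python
import pandas as pd

# Hypothetical history rows with a leftover index, and forecast rows indexed from 0.
history = pd.DataFrame(
    {"date": pd.to_datetime(["2020-09-29", "2020-09-30"]), "wait_time": [38.56, 41.20]},
    index=[8987, 8988],
)
forecast = pd.DataFrame(
    {"date": pd.to_datetime(["2020-10-01", "2020-10-02"]), "wait_time": [40.24, 40.46]}
)

# ignore_index=True discards both source indexes and renumbers the rows 0..n-1.
combined = pd.concat([history, forecast], ignore_index=True)

# Assigning a scalar broadcasts the same city name to every row.
combined["city"] = "Miami"
print(combined)
```

Since the result is written to CSV with `index=False`, the repeated labels do not reach the file either way; the renumbering mainly helps when inspecting or filtering the frame in memory.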


In [118]:
# Upload the data
local_files = [local_data_folder + 'wait_time_forecasted.csv']
print(local_files)

dstore.upload_files(
    files = local_files,
    relative_root = local_data_folder,
    target_path = '/',
    overwrite=True,
    show_progress=True
)

['wait_time_data/wait_time_forecasted.csv']
Uploading an estimated of 1 files
Uploading wait_time_data/wait_time_forecasted.csv
Uploaded wait_time_data/wait_time_forecasted.csv, 1 files out of an estimated total of 1
Uploaded 1 files


$AZUREML_DATAREFERENCE_predictiveanalytics_store