<a href="https://colab.research.google.com/github/corinneah/AutoML-examples/blob/main/automl.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Install Packages**

In [1]:

!pip install tpot mljar-supervised

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting tpot
  Downloading TPOT-0.11.7-py3-none-any.whl (87 kB)
Collecting mljar-supervised
  Downloading mljar-supervised-0.11.3.tar.gz (112 kB)
Collecting update-checker>=0.16
  Downloading update_checker-0.18.0-py3-none-any.whl (7.0 kB)
Collecting deap>=1.2
  Downloading deap-1.3.3-cp38-cp38-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl (139 kB)
Collecting stopit>=1.1.1
  Downloading stopit-1.1.2.tar.gz (18 kB)
Collecting xgboost>=1.1.0
  Downloading xgboost-1.7.1-py3-none-manylinux2014_x86_64.whl (193.6 MB)
Collecting lightgbm>=3.0.0
  Downloading lightgbm-3.3.3-py3-none-manylinux1_x86_64.whl 

In [2]:

from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from supervised.automl import AutoML

### Options Available
- mode: the package ships with four built-in modes.
  - Explain is suited to exploring and understanding the data. It produces feature-importance plots and decision-tree visualizations.
  - Perform is intended for building ML models for production.
  - Compete is intended for building models for machine learning competitions.
  - Optuna searches for highly tuned ML models.
- algorithms: the algorithms to try, usually passed as a list of names.
- results_path: the directory where the results are stored.
- total_time_limit: the total training time budget, in seconds.
- train_ensemble: whether an ensemble is built at the end of training.
- stack_models: whether a stack of models is built.
- eval_metric: the metric to optimize. With "auto", logloss is used for classification problems and rmse for regression problems.

In [None]:
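# Template for the options listed above; uncomment and adjust the values before use.
# automl = AutoML(
#     mode="Explain",
#     algorithms=["Decision Tree", "Xgboost"],
#     results_path="AutoML_22",
#     total_time_limit=30 * 60,
#     train_ensemble=True,
#     stack_models=False,
#     eval_metric="auto",
# )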

# Healthcare Dataset - SPARCS

## Load in dataset

In [3]:
import pandas as pd
sparcs = pd.read_csv('https://raw.githubusercontent.com/hantswilliams/HHA-507-2022/main/autoML/datasets/data_sparcs.csv')
sparcs

Unnamed: 0,Health Service Area,Hospital County,Operating Certificate Number,Facility Id,Facility Name,Age Group,Zip Code - 3 digits,Gender,Race,Ethnicity,...,APR Risk of Mortality,APR Medical Surgical Description,Payment Typology 1,Payment Typology 2,Payment Typology 3,Birth Weight,Abortion Edit Indicator,Emergency Department Indicator,Total Charges,Total Costs
0,Western NY,Allegany,226700.0,37.0,Cuba Memorial Hospital Inc,30 to 49,147,M,White,Not Span/Hispanic,...,Minor,Medical,Private Health Insurance,,,0,N,Y,4757.01,4747.83
1,Western NY,Allegany,228000.0,39.0,Memorial Hosp of Wm F & Gertrude F Jones A/K/A...,18 to 29,148,F,White,Not Span/Hispanic,...,Minor,Medical,Blue Cross/Blue Shield,Self-Pay,Self-Pay,0,N,N,5090.25,2985.64
2,Western NY,Allegany,228000.0,39.0,Memorial Hosp of Wm F & Gertrude F Jones A/K/A...,0 to 17,147,M,White,Not Span/Hispanic,...,Minor,Medical,Self-Pay,Self-Pay,Self-Pay,2900,N,N,4948.50,2129.67
3,Western NY,Allegany,228000.0,39.0,Memorial Hosp of Wm F & Gertrude F Jones A/K/A...,70 or Older,148,F,White,Not Span/Hispanic,...,Moderate,Medical,Medicare,Medicare,Self-Pay,0,N,Y,4719.75,8454.41
4,Western NY,Allegany,228000.0,39.0,Memorial Hosp of Wm F & Gertrude F Jones A/K/A...,50 to 69,148,M,White,Not Span/Hispanic,...,Major,Medical,Blue Cross/Blue Shield,Medicare,Self-Pay,0,N,Y,50384.75,34565.03
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
23578,Long Island,Suffolk,5157003.0,943.0,St Catherine of Siena Hospital,70 or Older,117,F,White,Not Span/Hispanic,...,Moderate,Medical,Medicare,Private Health Insurance,,0,N,Y,50833.00,8961.40
23579,Long Island,Suffolk,5157003.0,943.0,St Catherine of Siena Hospital,0 to 17,117,F,Other Race,Spanish/Hispanic,...,Minor,Medical,Private Health Insurance,,,3200,N,N,10948.00,2214.06
23580,Long Island,Suffolk,5157003.0,943.0,St Catherine of Siena Hospital,30 to 49,117,M,White,Not Span/Hispanic,...,Minor,Medical,Medicaid,,,0,N,N,46421.00,11083.24
23581,Long Island,Suffolk,5157003.0,943.0,St Catherine of Siena Hospital,70 or Older,117,M,White,Not Span/Hispanic,...,Major,Medical,Medicare,Medicare,,0,N,Y,46122.00,7951.26


In [4]:
sparcs.columns

Index(['Health Service Area', 'Hospital County',
       'Operating Certificate Number', 'Facility Id', 'Facility Name',
       'Age Group', 'Zip Code - 3 digits', 'Gender', 'Race', 'Ethnicity',
       'Length of Stay', 'Type of Admission', 'Patient Disposition',
       'Discharge Year', 'CCS Diagnosis Code', 'CCS Diagnosis Description',
       'CCS Procedure Code', 'CCS Procedure Description', 'APR DRG Code',
       'APR DRG Description', 'APR MDC Code', 'APR MDC Description',
       'APR Severity of Illness Code', 'APR Severity of Illness Description',
       'APR Risk of Mortality', 'APR Medical Surgical Description',
       'Payment Typology 1', 'Payment Typology 2', 'Payment Typology 3',
       'Birth Weight', 'Abortion Edit Indicator',
       'Emergency Department Indicator', 'Total Charges', 'Total Costs'],
      dtype='object')

# Potential variables of interest

- APR Risk of Mortality (categorical)
- Total costs (continuous)
- Total charges (continuous)
- Length of Stay
- Race


In [5]:
sparcs['Total Charges'].describe() # continuous

count    2.358300e+04
mean     4.344052e+04
std      8.434949e+04
min      1.000000e+00
25%      1.226175e+04
50%      2.375403e+04
75%      4.702837e+04
max      4.410671e+06
Name: Total Charges, dtype: float64

In [7]:
sparcs['Type of Admission'].value_counts() #categorical

Emergency        14968
Elective          4508
Newborn           2285
Urgent            1743
Trauma              63
Not Available       16
Name: Type of Admission, dtype: int64
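The counts above point to a strongly imbalanced target. As a minimal sketch, the cell below rebuilds the column from those counts (the `counts` dict is copied from the output above, and `admission` is a stand-in Series rather than the real column) and prints each class's share with `value_counts(normalize=True)`.

```python
import pandas as pd

# Counts copied from the value_counts() output above.
counts = {
    "Emergency": 14968,
    "Elective": 4508,
    "Newborn": 2285,
    "Urgent": 1743,
    "Trauma": 63,
    "Not Available": 16,
}

# Stand-in for sparcs['Type of Admission']: one label per row.
admission = pd.Series(
    [label for label, n in counts.items() for _ in range(n)],
    name="Type of Admission",
)

# normalize=True returns fractions of the total instead of raw counts.
shares = admission.value_counts(normalize=True)
print(shares.round(4))
```

On the real data the equivalent call would be `sparcs['Type of Admission'].value_counts(normalize=True)`.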

In [6]:
sparcs['Race'].value_counts()

White                     13433
Other Race                 5442
Black/African American     4467
Multi-racial                241
Name: Race, dtype: int64

# MLJar Examples

Multiclass Classifier Example 1 - SPARCS

Create new model

In [35]:

X = sparcs.drop(columns=['Type of Admission'])

In [36]:
y = sparcs["Type of Admission"]

In [37]:
X

Unnamed: 0,Health Service Area,Hospital County,Operating Certificate Number,Facility Id,Facility Name,Age Group,Zip Code - 3 digits,Gender,Race,Ethnicity,...,APR Risk of Mortality,APR Medical Surgical Description,Payment Typology 1,Payment Typology 2,Payment Typology 3,Birth Weight,Abortion Edit Indicator,Emergency Department Indicator,Total Charges,Total Costs
0,Western NY,Allegany,226700.0,37.0,Cuba Memorial Hospital Inc,30 to 49,147,M,White,Not Span/Hispanic,...,Minor,Medical,Private Health Insurance,,,0,N,Y,4757.01,4747.83
1,Western NY,Allegany,228000.0,39.0,Memorial Hosp of Wm F & Gertrude F Jones A/K/A...,18 to 29,148,F,White,Not Span/Hispanic,...,Minor,Medical,Blue Cross/Blue Shield,Self-Pay,Self-Pay,0,N,N,5090.25,2985.64
2,Western NY,Allegany,228000.0,39.0,Memorial Hosp of Wm F & Gertrude F Jones A/K/A...,0 to 17,147,M,White,Not Span/Hispanic,...,Minor,Medical,Self-Pay,Self-Pay,Self-Pay,2900,N,N,4948.50,2129.67
3,Western NY,Allegany,228000.0,39.0,Memorial Hosp of Wm F & Gertrude F Jones A/K/A...,70 or Older,148,F,White,Not Span/Hispanic,...,Moderate,Medical,Medicare,Medicare,Self-Pay,0,N,Y,4719.75,8454.41
4,Western NY,Allegany,228000.0,39.0,Memorial Hosp of Wm F & Gertrude F Jones A/K/A...,50 to 69,148,M,White,Not Span/Hispanic,...,Major,Medical,Blue Cross/Blue Shield,Medicare,Self-Pay,0,N,Y,50384.75,34565.03
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
23578,Long Island,Suffolk,5157003.0,943.0,St Catherine of Siena Hospital,70 or Older,117,F,White,Not Span/Hispanic,...,Moderate,Medical,Medicare,Private Health Insurance,,0,N,Y,50833.00,8961.40
23579,Long Island,Suffolk,5157003.0,943.0,St Catherine of Siena Hospital,0 to 17,117,F,Other Race,Spanish/Hispanic,...,Minor,Medical,Private Health Insurance,,,3200,N,N,10948.00,2214.06
23580,Long Island,Suffolk,5157003.0,943.0,St Catherine of Siena Hospital,30 to 49,117,M,White,Not Span/Hispanic,...,Minor,Medical,Medicaid,,,0,N,N,46421.00,11083.24
23581,Long Island,Suffolk,5157003.0,943.0,St Catherine of Siena Hospital,70 or Older,117,M,White,Not Span/Hispanic,...,Major,Medical,Medicare,Medicare,,0,N,Y,46122.00,7951.26


In [38]:
y

0           Urgent
1           Urgent
2          Newborn
3        Emergency
4        Emergency
           ...    
23578    Emergency
23579      Newborn
23580       Urgent
23581    Emergency
23582    Emergency
Name: Type of Admission, Length: 23583, dtype: object

In [39]:
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, test_size=0.25)
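Because the target is imbalanced, the split above passes `stratify=y`. The sketch below illustrates what that argument does on toy labels (the `labels` and `features` lists are hypothetical, not SPARCS rows, and `random_state=0` is added here only to make the toy split repeatable; the original call does not set it).

```python
from collections import Counter

from sklearn.model_selection import train_test_split

# Toy, imbalanced labels: 80% / 16% / 4%.
labels = ["Emergency"] * 80 + ["Elective"] * 16 + ["Trauma"] * 4
features = list(range(len(labels)))

f_train, f_test, l_train, l_test = train_test_split(
    features, labels, stratify=labels, test_size=0.25, random_state=0
)

# With stratify, both parts keep the same class proportions.
print(Counter(l_train))
print(Counter(l_test))
```

Without `stratify`, a rare class such as Trauma could end up entirely in one part of the split by chance.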

In [40]:
X_test

Unnamed: 0,Health Service Area,Hospital County,Operating Certificate Number,Facility Id,Facility Name,Age Group,Zip Code - 3 digits,Gender,Race,Ethnicity,...,APR Risk of Mortality,APR Medical Surgical Description,Payment Typology 1,Payment Typology 2,Payment Typology 3,Birth Weight,Abortion Edit Indicator,Emergency Department Indicator,Total Charges,Total Costs
11645,New York City,Kings,7001009.0,1294.0,Coney Island Hospital,70 or Older,112,F,Black/African American,Not Span/Hispanic,...,Major,Medical,Medicare,Medicare,Medicaid,0,N,Y,63814.95,31547.11
8236,Hudson Valley,Westchester,5902001.0,1045.0,White Plains Hospital Center,0 to 17,105,M,Other Race,Not Span/Hispanic,...,Minor,Medical,Medicaid,Self-Pay,Self-Pay,2900,N,Y,4192.00,2711.59
15963,New York City,Manhattan,7002054.0,1458.0,New York Presbyterian Hospital - New York Weil...,0 to 17,100,M,White,Not Span/Hispanic,...,Minor,Medical,Private Health Insurance,Self-Pay,,3500,N,N,9033.74,986.28
15025,New York City,Manhattan,7002020.0,1453.0,Memorial Hospital for Cancer and Allied Diseases,18 to 29,OOS,M,White,Not Span/Hispanic,...,Moderate,Surgical,Private Health Insurance,,,0,N,N,75740.56,37491.35
3230,Southern Tier,Broome,301001.0,43.0,Our Lady of Lourdes Memorial Hospital Inc,70 or Older,137,M,White,Not Span/Hispanic,...,Major,Medical,Medicare,,,0,N,Y,14183.58,7805.65
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
4965,Central NY,St Lawrence,4401000.0,798.0,Claxton-Hepburn Medical Center,30 to 49,136,F,White,Not Span/Hispanic,...,Minor,Medical,Private Health Insurance,,,0,N,Y,11129.00,6148.99
3531,Southern Tier,Chenango,824000.0,128.0,Chenango Memorial Hospital Inc,18 to 29,138,F,White,Not Span/Hispanic,...,Minor,Medical,Medicaid,,,0,N,N,7067.50,3379.77
15548,New York City,Manhattan,7002024.0,1456.0,Mount Sinai Hospital,50 to 69,105,M,White,Not Span/Hispanic,...,Minor,Surgical,Private Health Insurance,Self-Pay,,0,N,N,28011.61,12321.12
19196,New York City,Queens,7003013.0,1638.0,Forest Hills Hospital,70 or Older,113,M,White,Not Span/Hispanic,...,Minor,Surgical,Medicare,Medicaid,,0,N,N,62919.56,22668.02


In [65]:
automl = AutoML(results_path="sparcs_type_of_admission", mode="Explain")
     

In [66]:

automl.fit(X_train, y_train)

Linear algorithm was disabled.
AutoML directory: sparcs_type_of_admission
The task is multiclass_classification with evaluation metric logloss
AutoML will use algorithms: ['Baseline', 'Decision Tree', 'Random Forest', 'Xgboost', 'Neural Network']
AutoML will ensemble available models
AutoML steps: ['simple_algorithms', 'default_algorithms', 'ensemble']
* Step simple_algorithms will try to check up to 2 models
1_Baseline logloss 1.047051 trained in 1.18 seconds
2_DecisionTree logloss 0.371394 trained in 28.59 seconds
* Step default_algorithms will try to check up to 3 models
3_Default_Xgboost logloss 0.214198 trained in 61.86 seconds
4_Default_NeuralNetwork logloss 0.358817 trained in 8.42 seconds
5_Default_RandomForest logloss 0.337978 trained in 44.71 seconds
* Step ensemble will try to check up to 1 model
Ensemble logloss 0.214198 trained in 0.61 seconds
AutoML fit time: 155.99 seconds
AutoML best model: 3_Default_Xgboost


AutoML(results_path='sparcs_type_of_admission')
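The log reports logloss for every model, so it helps to have a reference point. As a rough sanity check, and assuming the Baseline model simply predicts the class frequencies for every row (an assumption about its behaviour, not something the log states), its logloss should be close to the entropy of the class distribution. The sketch below computes that entropy from the `Type of Admission` counts shown earlier; the counts cover the full dataset rather than the training split, so only approximate agreement with the `1_Baseline` value is expected.

```python
import math

# Class counts copied from the earlier value_counts() output.
counts = {
    "Emergency": 14968,
    "Elective": 4508,
    "Newborn": 2285,
    "Urgent": 1743,
    "Trauma": 63,
    "Not Available": 16,
}

total = sum(counts.values())
priors = {label: n / total for label, n in counts.items()}

# Expected logloss when the class priors are predicted for every row:
# -sum(p * ln p), i.e. the entropy of the class distribution (natural log).
prior_logloss = -sum(p * math.log(p) for p in priors.values())
print(round(prior_logloss, 3))
```

Any model with a logloss well below this value is learning something beyond the class frequencies.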

In [67]:
pred = automl.predict(X_test)
pred

array(['Emergency', 'Emergency', 'Newborn', ..., 'Elective', 'Emergency',
       'Emergency'], dtype=object)
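`accuracy_score` was imported at the top of the notebook but is not used. A minimal sketch of how it scores predictions against true labels follows; `y_true_demo` and `y_pred_demo` are hypothetical stand-ins for `y_test` and `pred`.

```python
from sklearn.metrics import accuracy_score

# Hypothetical labels standing in for y_test and pred.
y_true_demo = ["Emergency", "Emergency", "Newborn", "Elective", "Urgent"]
y_pred_demo = ["Emergency", "Emergency", "Newborn", "Elective", "Emergency"]

# Fraction of positions where the prediction equals the true label (4 of 5 here).
demo_accuracy = accuracy_score(y_true_demo, y_pred_demo)
print(demo_accuracy)
```

For the model above the corresponding call would be `accuracy_score(y_test, pred)`.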

In [68]:

automl.report()

Best model,name,model_type,metric_type,metric_value,train_time
,1_Baseline,Baseline,logloss,1.04705,1.9
,2_DecisionTree,Decision Tree,logloss,0.371394,29.83
the best,3_Default_Xgboost,Xgboost,logloss,0.214198,63.14
,4_Default_NeuralNetwork,Neural Network,logloss,0.358817,9.44
,5_Default_RandomForest,Random Forest,logloss,0.337978,46.0
,Ensemble,Ensemble,logloss,0.214198,0.61

3_Default_Xgboost (logloss 0.214198)

Unnamed: 0,Elective,Emergency,Newborn,Not Available,Trauma,Urgent,accuracy,macro avg,weighted avg,logloss
precision,0.827586,0.960155,0.995349,0.8,0,0.728346,0.923146,0.718573,0.918319,0.214198
recall,0.880473,0.970075,1.0,0.8,0,0.565749,0.923146,0.702716,0.923146,0.214198
f1-score,0.853211,0.965089,0.997669,0.8,0,0.636833,0.923146,0.7088,0.919805,0.214198
support,845.0,2807.0,428.0,5.0,12,327.0,0.923146,4424.0,4424.0,0.214198

Unnamed: 0,Predicted as Elective,Predicted as Emergency,Predicted as Newborn,Predicted as Not Available,Predicted as Trauma,Predicted as Urgent
Labeled as Elective,744,58,0,0,0,43
Labeled as Emergency,55,2723,2,1,0,26
Labeled as Newborn,0,0,428,0,0,0
Labeled as Not Available,0,1,0,4,0,0
Labeled as Trauma,1,11,0,0,0,0
Labeled as Urgent,99,43,0,0,0,185

4_Default_NeuralNetwork (logloss 0.358817)

Unnamed: 0,Elective,Emergency,Newborn,Not Available,Trauma,Urgent,accuracy,macro avg,weighted avg,logloss
precision,0.721264,0.938904,0.981567,0,0,0.438776,0.880651,0.513419,0.860887,0.358817
recall,0.891124,0.952618,0.995327,0,0,0.131498,0.880651,0.495095,0.880651,0.358817
f1-score,0.797247,0.945712,0.988399,0,0,0.202353,0.880651,0.488952,0.862905,0.358817
support,845.0,2807.0,428.0,5,12,327.0,0.880651,4424.0,4424.0,0.358817

Unnamed: 0,Predicted as Elective,Predicted as Emergency,Predicted as Newborn,Predicted as Not Available,Predicted as Trauma,Predicted as Urgent
Labeled as Elective,753,65,2,0,0,25
Labeled as Emergency,103,2674,2,0,0,28
Labeled as Newborn,0,0,426,0,0,2
Labeled as Not Available,0,5,0,0,0,0
Labeled as Trauma,2,10,0,0,0,0
Labeled as Urgent,186,94,4,0,0,43

2_DecisionTree (logloss 0.371394)

Unnamed: 0,Elective,Emergency,Newborn,Not Available,Trauma,Urgent,accuracy,macro avg,weighted avg,logloss
precision,0.591236,0.973277,0.990741,0,0,0.722222,0.853752,0.546246,0.879698,0.371394
recall,0.973964,0.895262,1.0,0,0,0.0397554,0.853752,0.48483,0.853752,0.371394
f1-score,0.735807,0.932641,0.995349,0,0,0.0753623,0.853752,0.456526,0.834162,0.371394
support,845.0,2807.0,428.0,5,12,327.0,0.853752,4424.0,4424.0,0.371394

Unnamed: 0,Predicted as Elective,Predicted as Emergency,Predicted as Newborn,Predicted as Not Available,Predicted as Trauma,Predicted as Urgent
Labeled as Elective,823,19,1,0,0,2
Labeled as Emergency,289,2513,2,0,0,3
Labeled as Newborn,0,0,428,0,0,0
Labeled as Not Available,0,5,0,0,0,0
Labeled as Trauma,1,11,0,0,0,0
Labeled as Urgent,279,34,1,0,0,13

5_Default_RandomForest (logloss 0.337978)

Unnamed: 0,Elective,Emergency,Newborn,Not Available,Trauma,Urgent,accuracy,macro avg,weighted avg,logloss
precision,0.732714,0.904825,0.990741,0,0,0.722222,0.874774,0.558417,0.863289,0.337978
recall,0.840237,0.96865,1.0,0,0,0.0397554,0.874774,0.474774,0.874774,0.337978
f1-score,0.7828,0.93565,0.995349,0,0,0.0753623,0.874774,0.46486,0.845047,0.337978
support,845.0,2807.0,428.0,5,12,327.0,0.874774,4424.0,4424.0,0.337978

Unnamed: 0,Predicted as Elective,Predicted as Emergency,Predicted as Newborn,Predicted as Not Available,Predicted as Trauma,Predicted as Urgent
Labeled as Elective,710,132,1,0,0,2
Labeled as Emergency,83,2719,2,0,0,3
Labeled as Newborn,0,0,428,0,0,0
Labeled as Not Available,0,5,0,0,0,0
Labeled as Trauma,1,11,0,0,0,0
Labeled as Urgent,175,138,1,0,0,13

Ensemble (logloss 0.214198)

Model,Weight
3_Default_Xgboost,1

Unnamed: 0,Elective,Emergency,Newborn,Not Available,Trauma,Urgent,accuracy,macro avg,weighted avg,logloss
precision,0.827586,0.960155,0.995349,0.8,0,0.728346,0.923146,0.718573,0.918319,0.214198
recall,0.880473,0.970075,1.0,0.8,0,0.565749,0.923146,0.702716,0.923146,0.214198
f1-score,0.853211,0.965089,0.997669,0.8,0,0.636833,0.923146,0.7088,0.919805,0.214198
support,845.0,2807.0,428.0,5.0,12,327.0,0.923146,4424.0,4424.0,0.214198

Unnamed: 0,Predicted as Elective,Predicted as Emergency,Predicted as Newborn,Predicted as Not Available,Predicted as Trauma,Predicted as Urgent
Labeled as Elective,744,58,0,0,0,43
Labeled as Emergency,55,2723,2,1,0,26
Labeled as Newborn,0,0,428,0,0,0
Labeled as Not Available,0,1,0,4,0,0
Labeled as Trauma,1,11,0,0,0,0
Labeled as Urgent,99,43,0,0,0,185

1_Baseline (logloss 1.04705)

Unnamed: 0,Elective,Emergency,Newborn,Not Available,Trauma,Urgent,accuracy,macro avg,weighted avg,logloss
precision,0,0.634494,0,0,0,0,0.634494,0.105749,0.402582,1.04705
recall,0,1.0,0,0,0,0,0.634494,0.166667,0.634494,1.04705
f1-score,0,0.776379,0,0,0,0,0.634494,0.129397,0.492608,1.04705
support,845,2807.0,428,5,12,327,0.634494,4424.0,4424.0,1.04705

Unnamed: 0,Predicted as Elective,Predicted as Emergency,Predicted as Newborn,Predicted as Not Available,Predicted as Trauma,Predicted as Urgent
Labeled as Elective,0,845,0,0,0,0
Labeled as Emergency,0,2807,0,0,0,0
Labeled as Newborn,0,428,0,0,0,0
Labeled as Not Available,0,5,0,0,0,0
Labeled as Trauma,0,12,0,0,0,0
Labeled as Urgent,0,327,0,0,0,0
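The precision, recall and accuracy figures in the report can be re-derived from the confusion matrices. As a sketch, the cell below does this for the first confusion matrix in the report (the one with logloss 0.214198); the helper `safe_div` is introduced here for illustration and returns 0 when a class is never predicted, which matches the 0 shown for Trauma.

```python
labels = ["Elective", "Emergency", "Newborn", "Not Available", "Trauma", "Urgent"]

# Rows are true labels, columns are predicted labels (copied from the report).
cm = [
    [744, 58, 0, 0, 0, 43],
    [55, 2723, 2, 1, 0, 26],
    [0, 0, 428, 0, 0, 0],
    [0, 1, 0, 4, 0, 0],
    [1, 11, 0, 0, 0, 0],
    [99, 43, 0, 0, 0, 185],
]


def safe_div(num, den):
    """Return num / den, or 0.0 when the denominator is zero."""
    return num / den if den else 0.0


per_class = {}
for i, label in enumerate(labels):
    predicted_as = sum(row[i] for row in cm)  # column sum
    labeled_as = sum(cm[i])                   # row sum (the "support")
    per_class[label] = {
        "precision": safe_div(cm[i][i], predicted_as),
        "recall": safe_div(cm[i][i], labeled_as),
    }

# Accuracy is the diagonal (correct predictions) over all rows.
accuracy = sum(cm[i][i] for i in range(len(labels))) / sum(map(sum, cm))

for label, m in per_class.items():
    print(f"{label:14s} precision={m['precision']:.6f} recall={m['recall']:.6f}")
print(f"accuracy={accuracy:.6f}")
```

The weakest class is Urgent, which is most often confused with Elective, while Trauma is never predicted at all.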


# Regression

Continuous target: Total Costs

In [45]:

import numpy as np
import pandas as pd
from supervised.automl import AutoML

In [47]:

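# NOTE: this drops 'Total Charges' but keeps the target 'Total Costs' in x_cols,
# so the target leaks into the features and the reported scores are optimistic.
# Exclude 'Total Costs' as well to avoid the leak.
x_cols = [c for c in sparcs.columns if c != 'Total Charges']
x = sparcs[x_cols]
y = sparcs['Total Costs']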

In [48]:

x_cols

['Health Service Area',
 'Hospital County',
 'Operating Certificate Number',
 'Facility Id',
 'Facility Name',
 'Age Group',
 'Zip Code - 3 digits',
 'Gender',
 'Race',
 'Ethnicity',
 'Length of Stay',
 'Type of Admission',
 'Patient Disposition',
 'Discharge Year',
 'CCS Diagnosis Code',
 'CCS Diagnosis Description',
 'CCS Procedure Code',
 'CCS Procedure Description',
 'APR DRG Code',
 'APR DRG Description',
 'APR MDC Code',
 'APR MDC Description',
 'APR Severity of Illness Code',
 'APR Severity of Illness Description',
 'APR Risk of Mortality',
 'APR Medical Surgical Description',
 'Payment Typology 1',
 'Payment Typology 2',
 'Payment Typology 3',
 'Birth Weight',
 'Abortion Edit Indicator',
 'Emergency Department Indicator',
 'Total Costs']
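The list above still ends with `'Total Costs'`, which is also the regression target, so the model can read the answer directly from its inputs. A minimal sketch of a leak-free selection follows; `columns` is an abridged, hand-copied subset of `sparcs.columns`, and the names `target`, `excluded` and `feature_cols` are introduced here for illustration.

```python
# Abridged subset of sparcs.columns, copied by hand for the example.
columns = [
    "Length of Stay",
    "Type of Admission",
    "APR Risk of Mortality",
    "Total Charges",
    "Total Costs",
]

target = "Total Costs"
# Drop the target itself, plus 'Total Charges' as the original cell does.
excluded = {target, "Total Charges"}

feature_cols = [c for c in columns if c not in excluded]
print(feature_cols)
```

On the real frame this would become `x = sparcs[feature_cols]` and `y = sparcs[target]`, with `columns` replaced by `sparcs.columns`.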

In [49]:
x

Unnamed: 0,Health Service Area,Hospital County,Operating Certificate Number,Facility Id,Facility Name,Age Group,Zip Code - 3 digits,Gender,Race,Ethnicity,...,APR Severity of Illness Description,APR Risk of Mortality,APR Medical Surgical Description,Payment Typology 1,Payment Typology 2,Payment Typology 3,Birth Weight,Abortion Edit Indicator,Emergency Department Indicator,Total Costs
0,Western NY,Allegany,226700.0,37.0,Cuba Memorial Hospital Inc,30 to 49,147,M,White,Not Span/Hispanic,...,Minor,Minor,Medical,Private Health Insurance,,,0,N,Y,4747.83
1,Western NY,Allegany,228000.0,39.0,Memorial Hosp of Wm F & Gertrude F Jones A/K/A...,18 to 29,148,F,White,Not Span/Hispanic,...,Minor,Minor,Medical,Blue Cross/Blue Shield,Self-Pay,Self-Pay,0,N,N,2985.64
2,Western NY,Allegany,228000.0,39.0,Memorial Hosp of Wm F & Gertrude F Jones A/K/A...,0 to 17,147,M,White,Not Span/Hispanic,...,Minor,Minor,Medical,Self-Pay,Self-Pay,Self-Pay,2900,N,N,2129.67
3,Western NY,Allegany,228000.0,39.0,Memorial Hosp of Wm F & Gertrude F Jones A/K/A...,70 or Older,148,F,White,Not Span/Hispanic,...,Moderate,Moderate,Medical,Medicare,Medicare,Self-Pay,0,N,Y,8454.41
4,Western NY,Allegany,228000.0,39.0,Memorial Hosp of Wm F & Gertrude F Jones A/K/A...,50 to 69,148,M,White,Not Span/Hispanic,...,Extreme,Major,Medical,Blue Cross/Blue Shield,Medicare,Self-Pay,0,N,Y,34565.03
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
23578,Long Island,Suffolk,5157003.0,943.0,St Catherine of Siena Hospital,70 or Older,117,F,White,Not Span/Hispanic,...,Moderate,Moderate,Medical,Medicare,Private Health Insurance,,0,N,Y,8961.40
23579,Long Island,Suffolk,5157003.0,943.0,St Catherine of Siena Hospital,0 to 17,117,F,Other Race,Spanish/Hispanic,...,Minor,Minor,Medical,Private Health Insurance,,,3200,N,N,2214.06
23580,Long Island,Suffolk,5157003.0,943.0,St Catherine of Siena Hospital,30 to 49,117,M,White,Not Span/Hispanic,...,Moderate,Minor,Medical,Medicaid,,,0,N,N,11083.24
23581,Long Island,Suffolk,5157003.0,943.0,St Catherine of Siena Hospital,70 or Older,117,M,White,Not Span/Hispanic,...,Major,Major,Medical,Medicare,Medicare,,0,N,Y,7951.26


In [50]:

y

0         4747.83
1         2985.64
2         2129.67
3         8454.41
4        34565.03
           ...   
23578     8961.40
23579     2214.06
23580    11083.24
23581     7951.26
23582     6212.95
Name: Total Costs, Length: 23583, dtype: float64

In [51]:

automl2 = AutoML(
      results_path="sparcs_total_costs",
      mode="Explain"
)

In [52]:

automl2.fit(x, y)

Linear algorithm was disabled.
AutoML directory: sparcs_total_costs
The task is regression with evaluation metric rmse
AutoML will use algorithms: ['Baseline', 'Decision Tree', 'Random Forest', 'Xgboost', 'Neural Network']
AutoML will ensemble available models
AutoML steps: ['simple_algorithms', 'default_algorithms', 'ensemble']
* Step simple_algorithms will try to check up to 2 models
1_Baseline rmse 23541.268908 trained in 2.34 seconds
2_DecisionTree rmse 6405.785233 trained in 11.67 seconds
* Step default_algorithms will try to check up to 3 models
3_Default_Xgboost rmse 6672.006997 trained in 20.58 seconds
4_Default_NeuralNetwork rmse 9293.443885 trained in 6.1 seconds
5_Default_RandomForest rmse 8876.605692 trained in 16.69 seconds
* Step ensemble will try to check up to 1 model
Ensemble rmse 4785.876276 trained in 0.27 seconds
AutoML fit time: 67.7 seconds
AutoML best model: Ensemble


AutoML(results_path='sparcs_total_costs')

In [53]:

# In-sample predictions: x is the same data that was passed to fit().
sparcs["predictions"] = automl2.predict(x)


In [56]:

print("Predictions")
print(sparcs[['Total Costs', 'predictions']].head())

Predictions
   Total Costs   predictions
0      4747.83   5015.534887
1      2985.64   4126.185277
2      2129.67   3710.800267
3      8454.41   6865.572972
4     34565.03  34270.075173
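To put numbers on how far these predictions are from the actual costs, the sketch below computes the per-row errors and the RMSE for the five rows printed above (values copied by hand). The predictions were made on rows the model was fitted on, so this says little about performance on unseen data.

```python
import math

# Values copied from the printed head() above.
actual = [4747.83, 2985.64, 2129.67, 8454.41, 34565.03]
predicted = [5015.534887, 4126.185277, 3710.800267, 6865.572972, 34270.075173]

# Signed error per row: positive means the model over-predicted.
errors = [p - a for a, p in zip(actual, predicted)]

# Root mean squared error over the five rows.
rmse = math.sqrt(sum(e * e for e in errors) / len(errors))

for a, p, e in zip(actual, predicted, errors):
    print(f"actual={a:10.2f} predicted={p:10.2f} error={e:+10.2f}")
print(f"rmse over 5 rows: {rmse:.2f}")
```

For the full column the same calculation could be written as `np.sqrt(((sparcs['predictions'] - sparcs['Total Costs']) ** 2).mean())`.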


# Download outputs

In [57]:

# get current working directory
import os
os.getcwd()

'/content'

In [58]:

folders = os.listdir()
foldersML = [x for x in folders if x.startswith('sparcs')]
print(foldersML)

['sparcs_total_costs']


In [62]:

!zip -r /content/sparcs.zip /content/sparcs_total_costs

  adding: content/sparcs_total_costs/ (stored 0%)
  adding: content/sparcs_total_costs/3_Default_Xgboost/ (stored 0%)
  adding: content/sparcs_total_costs/3_Default_Xgboost/learner_fold_0_shap_importance.csv (deflated 47%)
  adding: content/sparcs_total_costs/3_Default_Xgboost/predictions_validation.csv (deflated 62%)
  adding: content/sparcs_total_costs/3_Default_Xgboost/learning_curves.png (deflated 10%)
  adding: content/sparcs_total_costs/3_Default_Xgboost/learner_fold_0_importance.csv (deflated 44%)
  adding: content/sparcs_total_costs/3_Default_Xgboost/learner_fold_0_shap_dependence.png (deflated 6%)
  adding: content/sparcs_total_costs/3_Default_Xgboost/learner_fold_0_shap_best_decisions.png (deflated 13%)
  adding: content/sparcs_total_costs/3_Default_Xgboost/learner_fold_0.xgboost (deflated 62%)
  adding: content/sparcs_total_costs/3_Default_Xgboost/predicted_vs_residuals.png (deflated 15%)
  adding: content/sparcs_total_costs/3_Default_Xgboost/README.md (deflated 56%)
  addin

In [69]:

folders = os.listdir()
foldersML = [x for x in folders if x.startswith('sparcs')]
print(foldersML)

['sparcs.zip', 'sparcs_total_costs', 'sparcs_type_of_admission']


In [70]:

!zip -r /content/sparcs.zip /content/sparcs_type_of_admission

  adding: content/sparcs_type_of_admission/ (stored 0%)
  adding: content/sparcs_type_of_admission/3_Default_Xgboost/ (stored 0%)
  adding: content/sparcs_type_of_admission/3_Default_Xgboost/learner_fold_0_shap_importance.csv (deflated 44%)
  adding: content/sparcs_type_of_admission/3_Default_Xgboost/learner_fold_0_sample_0_worst_decisions.png (deflated 7%)
  adding: content/sparcs_type_of_admission/3_Default_Xgboost/predictions_validation.csv (deflated 60%)
  adding: content/sparcs_type_of_admission/3_Default_Xgboost/learning_curves.png (deflated 8%)
  adding: content/sparcs_type_of_admission/3_Default_Xgboost/learner_fold_0_importance.csv (deflated 45%)
  adding: content/sparcs_type_of_admission/3_Default_Xgboost/learner_fold_0_shap_dependence_class_Emergency.png (deflated 6%)
  adding: content/sparcs_type_of_admission/3_Default_Xgboost/learner_fold_0_shap_dependence_class_Not Available.png (deflated 5%)
  adding: content/sparcs_type_of_admission/3_Default_Xgboost/learner_fold_0_shap
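The two `zip -r` calls above could also be done in one step from Python. The following is a sketch using only the standard library; `zip_result_folders` is a hypothetical helper written for this example, and the demo runs against a temporary directory with placeholder files rather than the real `/content` results.

```python
import os
import tempfile
import zipfile


def zip_result_folders(root, prefix, archive_path):
    """Zip every folder under `root` whose name starts with `prefix`."""
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for name in sorted(os.listdir(root)):
            folder = os.path.join(root, name)
            # Skip plain files (including the archive itself) and other folders.
            if not (name.startswith(prefix) and os.path.isdir(folder)):
                continue
            for dirpath, _dirnames, filenames in os.walk(folder):
                for filename in filenames:
                    full_path = os.path.join(dirpath, filename)
                    # Store paths relative to root, e.g. sparcs_total_costs/README.md
                    zf.write(full_path, os.path.relpath(full_path, root))
    return archive_path


# Demo on a throwaway directory with placeholder result folders.
with tempfile.TemporaryDirectory() as root:
    for folder in ("sparcs_total_costs", "sparcs_type_of_admission"):
        os.makedirs(os.path.join(root, folder))
        with open(os.path.join(root, folder, "README.md"), "w") as fh:
            fh.write("placeholder\n")

    archive = zip_result_folders(root, "sparcs", os.path.join(root, "sparcs.zip"))
    with zipfile.ZipFile(archive) as zf:
        archived_names = sorted(zf.namelist())

print(archived_names)
```

In Colab the call would look like `zip_result_folders('/content', 'sparcs', '/content/sparcs.zip')`.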