<a href="https://colab.research.google.com/github/alicewu1/AutoML-examples/blob/main/AutoML.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [3]:
!pip install tpot mljar-supervised

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting tpot
  Using cached TPOT-0.11.7-py3-none-any.whl (87 kB)
Collecting mljar-supervised
  Downloading mljar-supervised-0.11.3.tar.gz (112 kB)
     |████████████████████████████████| 112 kB 5.1 MB/s 
Collecting update-checker>=0.16
  Downloading update_checker-0.18.0-py3-none-any.whl (7.0 kB)
Collecting deap>=1.2
  Downloading deap-1.3.3-cp37-cp37m-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl (139 kB)
     |████████████████████████████████| 139 kB 42.5 MB/s 
Collecting stopit>=1.1.1
  Downloading stopit-1.1.2.tar.gz (18 kB)
Collecting xgboost>=1.1.0
  Downloading xgboost-1.6.2-py3-none-manylinux2014_x86_64.whl (255.9 MB)
     |████████████████████████████████| 255.9 MB 43 kB/s 
Collecting lightgbm>=3.0.0
  Downloading lightgbm-3.3.3-py3-none-manylinux1_x86_64.whl (2.0 MB)
     |████████████████████████████████| 2.0 MB

In [4]:
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from supervised.automl import AutoML

# Options Available

- mode: the package ships with four built-in modes.
  - Explain mode is ideal for exploring and understanding the data. It produces visualizations such as feature importance plots and decision tree diagrams.
  - Perform mode is intended for building ML models for production.
  - Compete mode is intended for building models for machine learning competitions.
  - Optuna mode uses Optuna to search for highly tuned ML models.
- algorithms: the algorithms you would like to try, passed as a list of names.
- results_path: the directory where the trained models and reports are stored.
- total_time_limit: the total time budget for training, in seconds.
- train_ensemble: whether an ensemble is built at the end of training.
- stack_models: whether a stacked ensemble of models is built.
- eval_metric: the metric to optimize. With "auto" (the default), logloss is used for classification problems and rmse for regression problems.


In [None]:
#automl = AutoML(
    # mode="Explain",
    # algorithms=["Baseline", "Decision Tree", "Xgboost"],
    # results_path="AutoML_22",
    # total_time_limit=30 * 60,
    # train_ensemble=True,
    # stack_models=False,
    # eval_metric="auto",
#)
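The `eval_metric="auto"` rule from the options list can be sketched as a small lookup. This is a hypothetical helper (`resolve_eval_metric` is not part of mljar-supervised); it only mirrors the documented default of logloss for classification and rmse for regression, using the task names that appear in AutoML's log output.

```python
# Hypothetical helper, NOT part of mljar-supervised: it only mirrors the
# documented eval_metric="auto" default described in the options list.
AUTO_METRICS = {
    "binary_classification": "logloss",
    "multiclass_classification": "logloss",
    "regression": "rmse",
}


def resolve_eval_metric(task: str, eval_metric: str = "auto") -> str:
    """Return the metric that would be optimized for a given task type."""
    if eval_metric != "auto":
        # An explicitly chosen metric is used as-is.
        return eval_metric
    try:
        return AUTO_METRICS[task]
    except KeyError:
        raise ValueError(f"unknown task type: {task!r}") from None


print(resolve_eval_metric("binary_classification"))  # logloss
print(resolve_eval_metric("regression"))             # rmse
print(resolve_eval_metric("regression", "mae"))      # mae
```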

# Healthcare Dataset - [Modified SPARCS dataset](https://raw.githubusercontent.com/hantswilliams/HHA-507-2022/main/autoML/datasets/data_sparcs.csv)

## Load the Dataset

In [122]:
import pandas as pd
sparcs = pd.read_csv('https://raw.githubusercontent.com/hantswilliams/HHA-507-2022/main/autoML/datasets/data_sparcs.csv')
sparcs

Unnamed: 0,Health Service Area,Hospital County,Operating Certificate Number,Facility Id,Facility Name,Age Group,Zip Code - 3 digits,Gender,Race,Ethnicity,...,APR Risk of Mortality,APR Medical Surgical Description,Payment Typology 1,Payment Typology 2,Payment Typology 3,Birth Weight,Abortion Edit Indicator,Emergency Department Indicator,Total Charges,Total Costs
0,Western NY,Allegany,226700.0,37.0,Cuba Memorial Hospital Inc,30 to 49,147,M,White,Not Span/Hispanic,...,Minor,Medical,Private Health Insurance,,,0,N,Y,4757.01,4747.83
1,Western NY,Allegany,228000.0,39.0,Memorial Hosp of Wm F & Gertrude F Jones A/K/A...,18 to 29,148,F,White,Not Span/Hispanic,...,Minor,Medical,Blue Cross/Blue Shield,Self-Pay,Self-Pay,0,N,N,5090.25,2985.64
2,Western NY,Allegany,228000.0,39.0,Memorial Hosp of Wm F & Gertrude F Jones A/K/A...,0 to 17,147,M,White,Not Span/Hispanic,...,Minor,Medical,Self-Pay,Self-Pay,Self-Pay,2900,N,N,4948.50,2129.67
3,Western NY,Allegany,228000.0,39.0,Memorial Hosp of Wm F & Gertrude F Jones A/K/A...,70 or Older,148,F,White,Not Span/Hispanic,...,Moderate,Medical,Medicare,Medicare,Self-Pay,0,N,Y,4719.75,8454.41
4,Western NY,Allegany,228000.0,39.0,Memorial Hosp of Wm F & Gertrude F Jones A/K/A...,50 to 69,148,M,White,Not Span/Hispanic,...,Major,Medical,Blue Cross/Blue Shield,Medicare,Self-Pay,0,N,Y,50384.75,34565.03
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
23578,Long Island,Suffolk,5157003.0,943.0,St Catherine of Siena Hospital,70 or Older,117,F,White,Not Span/Hispanic,...,Moderate,Medical,Medicare,Private Health Insurance,,0,N,Y,50833.00,8961.40
23579,Long Island,Suffolk,5157003.0,943.0,St Catherine of Siena Hospital,0 to 17,117,F,Other Race,Spanish/Hispanic,...,Minor,Medical,Private Health Insurance,,,3200,N,N,10948.00,2214.06
23580,Long Island,Suffolk,5157003.0,943.0,St Catherine of Siena Hospital,30 to 49,117,M,White,Not Span/Hispanic,...,Minor,Medical,Medicaid,,,0,N,N,46421.00,11083.24
23581,Long Island,Suffolk,5157003.0,943.0,St Catherine of Siena Hospital,70 or Older,117,M,White,Not Span/Hispanic,...,Major,Medical,Medicare,Medicare,,0,N,Y,46122.00,7951.26


In [6]:
sparcs.columns

Index(['Health Service Area', 'Hospital County',
       'Operating Certificate Number', 'Facility Id', 'Facility Name',
       'Age Group', 'Zip Code - 3 digits', 'Gender', 'Race', 'Ethnicity',
       'Length of Stay', 'Type of Admission', 'Patient Disposition',
       'Discharge Year', 'CCS Diagnosis Code', 'CCS Diagnosis Description',
       'CCS Procedure Code', 'CCS Procedure Description', 'APR DRG Code',
       'APR DRG Description', 'APR MDC Code', 'APR MDC Description',
       'APR Severity of Illness Code', 'APR Severity of Illness Description',
       'APR Risk of Mortality', 'APR Medical Surgical Description',
       'Payment Typology 1', 'Payment Typology 2', 'Payment Typology 3',
       'Birth Weight', 'Abortion Edit Indicator',
       'Emergency Department Indicator', 'Total Charges', 'Total Costs'],
      dtype='object')

## Potential Variables Of Interest
- Type of Admission (categorical)
- Total Charges (continuous)
- Gender (categorical)
- Race (categorical)
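A quick way to confirm which candidate columns pandas treats as categorical (text, `object` dtype) versus continuous (numeric) is to inspect their dtypes. The sketch below uses a tiny illustrative frame with the same column names; on the real data the same check would run on `sparcs`.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype

# Illustrative rows that reuse the SPARCS column names.
toy = pd.DataFrame({
    "Type of Admission": ["Urgent", "Urgent", "Newborn"],
    "Total Charges": [4757.01, 5090.25, 4948.50],
    "Gender": ["M", "F", "M"],
    "Race": ["White", "White", "White"],
})

# Label each variable of interest by whether its dtype is numeric.
kinds = {
    col: "continuous" if is_numeric_dtype(toy[col]) else "categorical"
    for col in toy.columns
}
print(kinds)
```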

In [115]:
sparcs['Total Charges'].describe()

count    2.358300e+04
mean     4.344052e+04
std      8.434949e+04
min      1.000000e+00
25%      1.226175e+04
50%      2.375403e+04
75%      4.702837e+04
max      4.410671e+06
Name: Total Charges, dtype: float64

In [120]:

sparcs['Gender'].describe()

count     23583
unique        3
top           F
freq      13187
Name: Gender, dtype: object

In [118]:
sparcs['Type of Admission'].value_counts()

Emergency        14968
Elective          4508
Newborn           2285
Urgent            1743
Trauma              63
Not Available       16
Name: Type of Admission, dtype: int64

## Create Some Simplified Binary Versions

In [123]:
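# NOTE: 'Race' holds text labels (e.g. 'White'), so pd.to_numeric(..., errors='coerce') makes
# every value NaN; NaN > 3 is False, so every row becomes 'short', as the output below shows.
# Dropping 'Race' in place also breaks later cells that read it if they are re-run after this one.
sparcs['Race'] = pd.to_numeric(sparcs['Race'], errors='coerce')
sparcs['sparcs_race'] = sparcs['Race'].apply(lambda x: 'long' if x > 3 else 'short')
sparcs.drop('Race', axis=1, inplace=True)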
sparcs['sparcs_race'].value_counts()

short    23583
Name: sparcs_race, dtype: int64
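Because `Race` holds text labels, a binary version has to be built by comparing labels rather than by thresholding numbers. A minimal sketch, assuming a hypothetical "White" vs. "Not White" grouping (the notebook does not say which split was intended), on a few illustrative rows:

```python
import pandas as pd

# Illustrative sample; on the real data, apply the same logic to sparcs["Race"].
toy = pd.DataFrame({"Race": ["White", "Black/African American", "Other Race",
                             "White", "Multi-racial"]})

# Hypothetical binary grouping: compare the text label directly instead of
# calling pd.to_numeric, which would turn every label into NaN.
toy["sparcs_race"] = (toy["Race"] == "White").map({True: "White", False: "Not White"})

print(toy["sparcs_race"].value_counts())
```

Keeping the original `Race` column (no in-place drop) also leaves later cells that read it working.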

# MLJar Examples
## Classifier Example 1 - SPARCS (Race)
### **Create new model**

In [17]:
x = sparcs.drop(columns=['Race'])

In [74]:
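# The predictions and report below contain Race labels ('White', 'Other Race', and so on), so the
# model saved in "Race" was evidently trained with the multiclass sparcs["Race"] as its target.
y = sparcs["sparcs_race"]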

In [19]:
x

Unnamed: 0,Health Service Area,Hospital County,Operating Certificate Number,Facility Id,Facility Name,Age Group,Zip Code - 3 digits,Gender,Ethnicity,Type of Admission,...,APR Medical Surgical Description,Payment Typology 1,Payment Typology 2,Payment Typology 3,Birth Weight,Abortion Edit Indicator,Emergency Department Indicator,Total Charges,Total Costs,sparcs_los
0,Western NY,Allegany,226700.0,37.0,Cuba Memorial Hospital Inc,30 to 49,147,M,Not Span/Hispanic,Urgent,...,Medical,Private Health Insurance,,,0,N,Y,4757.01,4747.83,short
1,Western NY,Allegany,228000.0,39.0,Memorial Hosp of Wm F & Gertrude F Jones A/K/A...,18 to 29,148,F,Not Span/Hispanic,Urgent,...,Medical,Blue Cross/Blue Shield,Self-Pay,Self-Pay,0,N,N,5090.25,2985.64,short
2,Western NY,Allegany,228000.0,39.0,Memorial Hosp of Wm F & Gertrude F Jones A/K/A...,0 to 17,147,M,Not Span/Hispanic,Newborn,...,Medical,Self-Pay,Self-Pay,Self-Pay,2900,N,N,4948.50,2129.67,short
3,Western NY,Allegany,228000.0,39.0,Memorial Hosp of Wm F & Gertrude F Jones A/K/A...,70 or Older,148,F,Not Span/Hispanic,Emergency,...,Medical,Medicare,Medicare,Self-Pay,0,N,Y,4719.75,8454.41,short
4,Western NY,Allegany,228000.0,39.0,Memorial Hosp of Wm F & Gertrude F Jones A/K/A...,50 to 69,148,M,Not Span/Hispanic,Emergency,...,Medical,Blue Cross/Blue Shield,Medicare,Self-Pay,0,N,Y,50384.75,34565.03,long
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
23578,Long Island,Suffolk,5157003.0,943.0,St Catherine of Siena Hospital,70 or Older,117,F,Not Span/Hispanic,Emergency,...,Medical,Medicare,Private Health Insurance,,0,N,Y,50833.00,8961.40,long
23579,Long Island,Suffolk,5157003.0,943.0,St Catherine of Siena Hospital,0 to 17,117,F,Spanish/Hispanic,Newborn,...,Medical,Private Health Insurance,,,3200,N,N,10948.00,2214.06,short
23580,Long Island,Suffolk,5157003.0,943.0,St Catherine of Siena Hospital,30 to 49,117,M,Not Span/Hispanic,Urgent,...,Medical,Medicaid,,,0,N,N,46421.00,11083.24,long
23581,Long Island,Suffolk,5157003.0,943.0,St Catherine of Siena Hospital,70 or Older,117,M,Not Span/Hispanic,Emergency,...,Medical,Medicare,Medicare,,0,N,Y,46122.00,7951.26,long


In [75]:
y

0        short
1        short
2        short
3        short
4        short
         ...  
23578    short
23579    short
23580    short
23581    short
23582    short
Name: sparcs_race, Length: 23583, dtype: object

In [76]:
x_train, x_test, y_train, y_test = train_test_split(x, y, stratify=y, test_size=0.25)
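`stratify=y` makes `train_test_split` keep the class proportions of `y` in both splits. A small sketch on synthetic labels (the 80/20 mix and `random_state=0` are arbitrary choices for illustration):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced labels: 80 "short" and 20 "long".
y = np.array(["short"] * 80 + ["long"] * 20)
X = np.arange(len(y)).reshape(-1, 1)

X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, stratify=y, test_size=0.25, random_state=0
)

# Both splits keep the 80/20 ratio: the 25-row test set gets 20 "short" and 5 "long".
print((y_tr == "long").mean(), (y_te == "long").mean())
```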

In [77]:
x_test

Unnamed: 0,Health Service Area,Hospital County,Operating Certificate Number,Facility Id,Facility Name,Age Group,Zip Code - 3 digits,Gender,Ethnicity,Type of Admission,...,APR Medical Surgical Description,Payment Typology 1,Payment Typology 2,Payment Typology 3,Birth Weight,Abortion Edit Indicator,Emergency Department Indicator,Total Charges,Total Costs,sparcs_los
14664,New York City,Manhattan,7002012.0,1447.0,Hospital for Special Surgery,50 to 69,OOS,M,Not Span/Hispanic,Elective,...,Surgical,Blue Cross/Blue Shield,Self-Pay,,0,N,N,65781.96,23760.02,short
2586,Finger Lakes,Monroe,2701005.0,413.0,Strong Memorial Hospital,0 to 17,145,F,Spanish/Hispanic,Emergency,...,Medical,Medicaid,Medicaid,Self-Pay,0,N,Y,15239.22,8454.02,short
23561,Long Island,Suffolk,5157003.0,943.0,St Catherine of Siena Hospital,30 to 49,117,M,Not Span/Hispanic,Emergency,...,Medical,Medicaid,,,0,N,Y,17524.00,2890.73,short
6343,Capital/Adirond,Saratoga,4501000.0,818.0,Saratoga Hospital,50 to 69,128,M,Not Span/Hispanic,Emergency,...,Medical,Private Health Insurance,Unknown,Unknown,0,N,Y,9668.24,2829.45,short
23577,Long Island,Suffolk,5157003.0,943.0,St Catherine of Siena Hospital,70 or Older,117,F,Not Span/Hispanic,Emergency,...,Surgical,Medicare,Blue Cross/Blue Shield,,0,N,Y,65016.23,13235.35,short
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
132,Western NY,Chautauqua,602001.0,103.0,Woman's Christian Association,0 to 17,147,M,Not Span/Hispanic,Newborn,...,Medical,Medicaid,,,3100,N,N,2320.86,1930.69,short
4872,Central NY,Onondaga,3301008.0,636.0,Crouse Hospital,70 or Older,132,M,Not Span/Hispanic,Emergency,...,Medical,Medicare,Blue Cross/Blue Shield,Federal/State/Local/VA,0,N,Y,18892.51,7817.18,short
7600,Hudson Valley,Putnam,3950000.0,752.0,Putnam Hospital Center,70 or Older,105,M,Not Span/Hispanic,Emergency,...,Medical,Private Health Insurance,Medicare,,0,N,Y,21982.35,6394.49,short
2403,Finger Lakes,Monroe,2701005.0,413.0,Strong Memorial Hospital,30 to 49,144,M,Not Span/Hispanic,Urgent,...,Medical,Medicare,Medicaid,Self-Pay,0,N,N,7216.07,4353.27,short


In [78]:

automl = AutoML(results_path="Race", mode="Explain")

In [84]:
automl.fit(x_train, y_train)  # lower-case x_train is the name created by train_test_split above

This model has already been fitted. You can use predict methods or select a new 'results_path' for a new a 'fit()'.
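As the message says, an `AutoML` object pointed at an existing `results_path` loads the saved models instead of training again, so retraining needs a directory name that is not in use yet. A minimal sketch with a hypothetical helper (`fresh_results_path` is illustrative, not an mljar-supervised function) that appends a counter until it finds an unused name:

```python
import os
import tempfile


def fresh_results_path(base: str) -> str:
    """Return base if it is unused, else the first unused name among base_2, base_3, and so on."""
    if not os.path.exists(base):
        return base
    n = 2
    while os.path.exists(f"{base}_{n}"):
        n += 1
    return f"{base}_{n}"


# Demo in a temporary directory: "Race" already exists, so "Race_2" is returned.
with tempfile.TemporaryDirectory() as tmp:
    os.mkdir(os.path.join(tmp, "Race"))
    print(fresh_results_path(os.path.join(tmp, "Race")))

# In the notebook it would be used like this (not run here):
# automl = AutoML(results_path=fresh_results_path("Race"), mode="Explain")
```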


In [85]:
pred = automl.predict(x_test)
pred

array(['White', 'Other Race', 'White', ..., 'White', 'White',
       'Other Race'], dtype=object)

In [86]:
automl.report()

Best model,name,model_type,metric_type,metric_value,train_time
,1_Baseline,Baseline,logloss,1.0208,1.71
,2_DecisionTree,Decision Tree,logloss,0.824643,18.64
the best,3_Default_Xgboost,Xgboost,logloss,0.631893,43.45
,4_Default_NeuralNetwork,Neural Network,logloss,0.795246,9.71
,5_Default_RandomForest,Random Forest,logloss,0.768391,46.64
,Ensemble,Ensemble,logloss,0.631893,0.53

Model,Weight
3_Default_Xgboost,1

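Ensemble: metric details (these per-model tables are computed on AutoML's internal validation split, 4,422 rows or about 25% of x_train, not on x_test)
metric,Black/African American,Multi-racial,Other Race,White,accuracy,macro avg,weighted avg,logloss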
precision,0.655172,0.357143,0.688172,0.784594,0.745138,0.62127,0.733477,0.631893
recall,0.476134,0.111111,0.627451,0.893609,0.745138,0.527076,0.745138,0.631893
f1-score,0.551486,0.169492,0.65641,0.835561,0.745138,0.553237,0.733625,0.631893
support,838.0,45.0,1020.0,2519.0,0.745138,4422.0,4422.0,0.631893

Ensemble: confusion matrix
,Predicted as Black/African American,Predicted as Multi-racial,Predicted as Other Race,Predicted as White
Labeled as Black/African American,399,1,112,326
Labeled as Multi-racial,0,5,23,17
Labeled as Other Race,101,4,640,275
Labeled as White,109,4,155,2251

4_Default_NeuralNetwork: metric details
metric,Black/African American,Multi-racial,Other Race,White,accuracy,macro avg,weighted avg,logloss
precision,0.581602,0,0.580466,0.702889,0.668702,0.466239,0.644513,0.795246
recall,0.23389,0,0.512745,0.888448,0.668702,0.408771,0.668702,0.795246
f1-score,0.333617,0,0.544508,0.78485,0.668702,0.415744,0.635913,0.795246
support,838.0,45,1020.0,2519.0,0.668702,4422.0,4422.0,0.795246

4_Default_NeuralNetwork: confusion matrix
,Predicted as Black/African American,Predicted as Multi-racial,Predicted as Other Race,Predicted as White
Labeled as Black/African American,196,0,162,480
Labeled as Multi-racial,1,0,14,30
Labeled as Other Race,61,0,523,436
Labeled as White,79,0,202,2238

1_Baseline: metric details
metric,Black/African American,Multi-racial,Other Race,White,accuracy,macro avg,weighted avg,logloss
precision,0,0,0,0.569652,0.569652,0.142413,0.324503,1.0208
recall,0,0,0,1.0,0.569652,0.25,0.569652,1.0208
f1-score,0,0,0,0.725832,0.569652,0.181458,0.413471,1.0208
support,838,45,1020,2519.0,0.569652,4422.0,4422.0,1.0208

1_Baseline: confusion matrix
,Predicted as Black/African American,Predicted as Multi-racial,Predicted as Other Race,Predicted as White
Labeled as Black/African American,0,0,0,838
Labeled as Multi-racial,0,0,0,45
Labeled as Other Race,0,0,0,1020
Labeled as White,0,0,0,2519

5_Default_RandomForest: metric details
metric,Black/African American,Multi-racial,Other Race,White,accuracy,macro avg,weighted avg,logloss
precision,0.577348,0,0.661743,0.684008,0.671868,0.480775,0.651699,0.768391
recall,0.249403,0,0.439216,0.918618,0.671868,0.401809,0.671868,0.768391
f1-score,0.348333,0,0.527991,0.784141,0.671868,0.415116,0.634488,0.768391
support,838.0,45,1020.0,2519.0,0.671868,4422.0,4422.0,0.768391

5_Default_RandomForest: confusion matrix
,Predicted as Black/African American,Predicted as Multi-racial,Predicted as Other Race,Predicted as White
Labeled as Black/African American,209,0,76,553
Labeled as Multi-racial,0,0,11,34
Labeled as Other Race,90,0,448,482
Labeled as White,63,0,142,2314

2_DecisionTree: metric details
metric,Black/African American,Multi-racial,Other Race,White,accuracy,macro avg,weighted avg,logloss
precision,0.492386,0,0.649852,0.698765,0.663727,0.460251,0.641261,0.824643
recall,0.347255,0,0.429412,0.875744,0.663727,0.413103,0.663727,0.824643
f1-score,0.407278,0,0.517119,0.777308,0.663727,0.425426,0.639258,0.824643
support,838.0,45,1020.0,2519.0,0.663727,4422.0,4422.0,0.824643

2_DecisionTree: confusion matrix
,Predicted as Black/African American,Predicted as Multi-racial,Predicted as Other Race,Predicted as White
Labeled as Black/African American,291,0,76,471
Labeled as Multi-racial,0,0,11,34
Labeled as Other Race,136,0,438,446
Labeled as White,164,0,149,2206

3_Default_Xgboost: metric details (identical to the Ensemble, whose only member is this model)
metric,Black/African American,Multi-racial,Other Race,White,accuracy,macro avg,weighted avg,logloss
precision,0.655172,0.357143,0.688172,0.784594,0.745138,0.62127,0.733477,0.631893
recall,0.476134,0.111111,0.627451,0.893609,0.745138,0.527076,0.745138,0.631893
f1-score,0.551486,0.169492,0.65641,0.835561,0.745138,0.553237,0.733625,0.631893
support,838.0,45.0,1020.0,2519.0,0.745138,4422.0,4422.0,0.631893

3_Default_Xgboost: confusion matrix
,Predicted as Black/African American,Predicted as Multi-racial,Predicted as Other Race,Predicted as White
Labeled as Black/African American,399,1,112,326
Labeled as Multi-racial,0,5,23,17
Labeled as Other Race,101,4,640,275
Labeled as White,109,4,155,2251
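The per-class numbers in these tables follow directly from the confusion matrix: precision divides each diagonal entry by its column total (everything predicted as that class), and recall divides it by its row total (everything labeled as that class, i.e. the support). The sketch below recomputes them with NumPy from the best model's confusion matrix (3_Default_Xgboost, which is also the Ensemble's only member) as reported above.

```python
import numpy as np

labels = ["Black/African American", "Multi-racial", "Other Race", "White"]
# Rows = labeled as, columns = predicted as (values copied from the report above).
cm = np.array([
    [399, 1, 112, 326],
    [0, 5, 23, 17],
    [101, 4, 640, 275],
    [109, 4, 155, 2251],
])

correct = np.diag(cm)
precision = correct / cm.sum(axis=0)   # divide by column totals
recall = correct / cm.sum(axis=1)      # divide by row totals (the support)
f1 = 2 * precision * recall / (precision + recall)
accuracy = correct.sum() / cm.sum()

for name, p, r, f in zip(labels, precision, recall, f1):
    print(f"{name:<24} precision={p:.6f} recall={r:.6f} f1={f:.6f}")
print(f"accuracy={accuracy:.6f}")
```

The printed values match the report's precision, recall, f1-score and accuracy rows, which is a quick way to check how those tables were derived.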


### **Test New (not really) Data**

In [32]:
# load the trained model saved in the "Race" results directory (no retraining happens because it already exists)

automl_sparcs_race = AutoML(results_path="Race")

In [34]:
# create a new dataset that follows the same data structure as the training set
X_withrace = sparcs.sample(25)
X_withoutrace = X_withrace.drop(columns=['Race'])

In [35]:

X_withrace

Unnamed: 0,Health Service Area,Hospital County,Operating Certificate Number,Facility Id,Facility Name,Age Group,Zip Code - 3 digits,Gender,Race,Ethnicity,...,APR Medical Surgical Description,Payment Typology 1,Payment Typology 2,Payment Typology 3,Birth Weight,Abortion Edit Indicator,Emergency Department Indicator,Total Charges,Total Costs,sparcs_los
1590,Western NY,Erie,1401014.0,3067.0,Millard Fillmore Suburban Hospital,18 to 29,141.0,F,Black/African American,Not Span/Hispanic,...,Medical,Medicaid,Self-Pay,,0,N,N,5106.31,4125.25,short
2535,Finger Lakes,Monroe,2701005.0,413.0,Strong Memorial Hospital,30 to 49,146.0,F,White,Not Span/Hispanic,...,Medical,Blue Cross/Blue Shield,Self-Pay,,0,N,N,4381.04,2923.71,short
19712,New York City,Richmond,7004003.0,1740.0,Staten Island University Hosp-North,70 or Older,103.0,F,White,Spanish/Hispanic,...,Medical,Medicare,Medicare,Medicaid,0,N,Y,52853.32,12163.04,long
17090,New York City,Manhattan,7002054.0,1464.0,New York Presbyterian Hospital - Columbia Pres...,30 to 49,119.0,M,Other Race,Not Span/Hispanic,...,Surgical,Medicare,Self-Pay,,0,N,N,40940.2,18591.88,short
6391,Capital/Adirond,Saratoga,4501000.0,818.0,Saratoga Hospital,70 or Older,120.0,M,White,Not Span/Hispanic,...,Medical,Medicare,Private Health Insurance,Unknown,0,N,Y,25129.31,6297.68,short
1089,Western NY,Erie,1401013.0,218.0,Sisters of Charity Hospital,50 to 69,142.0,M,White,Not Span/Hispanic,...,Surgical,Blue Cross/Blue Shield,,,0,N,N,21086.16,9848.17,short
17922,New York City,Queens,7003001.0,1628.0,Flushing Hospital Medical Center,18 to 29,114.0,F,Other Race,Not Span/Hispanic,...,Medical,Medicaid,,,0,N,Y,15428.0,5644.05,short
223,Western NY,Erie,1401014.0,207.0,Buffalo General Medical Center,50 to 69,142.0,M,Black/African American,Not Span/Hispanic,...,Medical,Medicare,Medicaid,Self-Pay,0,N,Y,15713.96,7624.69,long
4212,Central NY,Onondaga,3301003.0,630.0,St Josephs Hospital Health Center,70 or Older,130.0,M,White,Not Span/Hispanic,...,Surgical,Medicare,Private Health Insurance,,0,N,Y,126157.67,45431.16,long
10040,New York City,Bronx,7000006.0,1169.0,Montefiore Medical Center - Henry & Lucy Moses...,30 to 49,100.0,M,Black/African American,Not Span/Hispanic,...,Surgical,Self-Pay,Self-Pay,,0,N,N,50740.74,14377.37,short


In [36]:
X_withoutrace

Unnamed: 0,Health Service Area,Hospital County,Operating Certificate Number,Facility Id,Facility Name,Age Group,Zip Code - 3 digits,Gender,Ethnicity,Type of Admission,...,APR Medical Surgical Description,Payment Typology 1,Payment Typology 2,Payment Typology 3,Birth Weight,Abortion Edit Indicator,Emergency Department Indicator,Total Charges,Total Costs,sparcs_los
1590,Western NY,Erie,1401014.0,3067.0,Millard Fillmore Suburban Hospital,18 to 29,141.0,F,Not Span/Hispanic,Elective,...,Medical,Medicaid,Self-Pay,,0,N,N,5106.31,4125.25,short
2535,Finger Lakes,Monroe,2701005.0,413.0,Strong Memorial Hospital,30 to 49,146.0,F,Not Span/Hispanic,Urgent,...,Medical,Blue Cross/Blue Shield,Self-Pay,,0,N,N,4381.04,2923.71,short
19712,New York City,Richmond,7004003.0,1740.0,Staten Island University Hosp-North,70 or Older,103.0,F,Spanish/Hispanic,Emergency,...,Medical,Medicare,Medicare,Medicaid,0,N,Y,52853.32,12163.04,long
17090,New York City,Manhattan,7002054.0,1464.0,New York Presbyterian Hospital - Columbia Pres...,30 to 49,119.0,M,Not Span/Hispanic,Urgent,...,Surgical,Medicare,Self-Pay,,0,N,N,40940.2,18591.88,short
6391,Capital/Adirond,Saratoga,4501000.0,818.0,Saratoga Hospital,70 or Older,120.0,M,Not Span/Hispanic,Emergency,...,Medical,Medicare,Private Health Insurance,Unknown,0,N,Y,25129.31,6297.68,short
1089,Western NY,Erie,1401013.0,218.0,Sisters of Charity Hospital,50 to 69,142.0,M,Not Span/Hispanic,Elective,...,Surgical,Blue Cross/Blue Shield,,,0,N,N,21086.16,9848.17,short
17922,New York City,Queens,7003001.0,1628.0,Flushing Hospital Medical Center,18 to 29,114.0,F,Not Span/Hispanic,Emergency,...,Medical,Medicaid,,,0,N,Y,15428.0,5644.05,short
223,Western NY,Erie,1401014.0,207.0,Buffalo General Medical Center,50 to 69,142.0,M,Not Span/Hispanic,Emergency,...,Medical,Medicare,Medicaid,Self-Pay,0,N,Y,15713.96,7624.69,long
4212,Central NY,Onondaga,3301003.0,630.0,St Josephs Hospital Health Center,70 or Older,130.0,M,Not Span/Hispanic,Emergency,...,Surgical,Medicare,Private Health Insurance,,0,N,Y,126157.67,45431.16,long
10040,New York City,Bronx,7000006.0,1169.0,Montefiore Medical Center - Henry & Lucy Moses...,30 to 49,100.0,M,Not Span/Hispanic,Elective,...,Surgical,Self-Pay,Self-Pay,,0,N,N,50740.74,14377.37,short


In [37]:
predict = automl_sparcs_race.predict(X_withoutrace)
predict

array(['White', 'White', 'Other Race', 'White', 'White', 'White',
       'Other Race', 'White', 'White', 'Other Race', 'Other Race',
       'White', 'Black/African American', 'Black/African American',
       'Other Race', 'White', 'Black/African American', 'White',
       'Black/African American', 'Other Race', 'White', 'White', 'White',
       'Other Race', 'Black/African American'], dtype=object)

In [38]:
# actual values from X_withrace
values_actual = X_withrace['Race'].values.tolist()
values_predicted = predict.tolist()
output = pd.DataFrame({'actual': values_actual, 'predicted': values_predicted})
output

Unnamed: 0,actual,predicted
0,Black/African American,White
1,White,White
2,White,Other Race
3,Other Race,White
4,White,White
5,White,White
6,Other Race,Other Race
7,Black/African American,White
8,White,White
9,Black/African American,Other Race
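To summarize how often the predictions match, the actual and predicted columns can be scored with `accuracy_score` or cross-tabulated to see which labels get confused. The sketch below rebuilds only the ten rows displayed above (the full `output` frame has 25 rows, so its figures may differ).

```python
import pandas as pd
from sklearn.metrics import accuracy_score

# The ten rows shown in the table above (the full output frame has 25 rows).
output = pd.DataFrame({
    "actual": ["Black/African American", "White", "White", "Other Race", "White",
               "White", "Other Race", "Black/African American", "White",
               "Black/African American"],
    "predicted": ["White", "White", "Other Race", "White", "White",
                  "White", "Other Race", "White", "White", "Other Race"],
})

print("accuracy:", accuracy_score(output["actual"], output["predicted"]))
# Rows = actual label, columns = predicted label.
print(pd.crosstab(output["actual"], output["predicted"]))
```

On these ten rows, 5 of 10 predictions match; none of the three Black/African American rows is predicted correctly.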


## **Binary Classifier Example 2 - GENERIC**

In [39]:
import pandas as pd
from supervised.automl import AutoML
import os

from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

df = pd.read_csv("https://raw.githubusercontent.com/hantswilliams/HHA-507-2022/main/autoML/datasets/data_binary_bank.csv")

X = df[df.columns[:-1]]
y = df["y"]
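`df[df.columns[:-1]]` keeps every column except the last, which works here because the target `y` is the last column of the bank file. Dropping the target by name gives the same result without relying on column order; a small sketch on an illustrative frame:

```python
import pandas as pd

# Illustrative frame whose last column is the target, like the bank dataset.
df = pd.DataFrame({"age": [56, 57], "job": ["housemaid", "services"], "y": ["no", "no"]})

X_by_position = df[df.columns[:-1]]   # all columns but the last
X_by_name = df.drop(columns=["y"])    # same result, independent of column order
y = df["y"]

print(X_by_position.equals(X_by_name))
```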

In [40]:
X

Unnamed: 0,age,job,marital,education,default,housing,loan,contact,month,day_of_week,duration,campaign,pdays,previous,poutcome,emp.var.rate,cons.price.idx,cons.conf.idx,euribor3m,nr.employed
0,56,housemaid,married,basic.4y,no,no,no,telephone,may,mon,261,1,999,0,nonexistent,1.1,93.994,-36.4,4.857,5191.0
1,57,services,married,high.school,unknown,no,no,telephone,may,mon,149,1,999,0,nonexistent,1.1,93.994,-36.4,4.857,5191.0
2,37,services,married,high.school,no,yes,no,telephone,may,mon,226,1,999,0,nonexistent,1.1,93.994,-36.4,4.857,5191.0
3,40,admin.,married,basic.6y,no,no,no,telephone,may,mon,151,1,999,0,nonexistent,1.1,93.994,-36.4,4.857,5191.0
4,56,services,married,high.school,no,no,yes,telephone,may,mon,307,1,999,0,nonexistent,1.1,93.994,-36.4,4.857,5191.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
41183,73,retired,married,professional.course,no,yes,no,cellular,nov,fri,334,1,999,0,nonexistent,-1.1,94.767,-50.8,1.028,4963.6
41184,46,blue-collar,married,professional.course,no,no,no,cellular,nov,fri,383,1,999,0,nonexistent,-1.1,94.767,-50.8,1.028,4963.6
41185,56,retired,married,university.degree,no,yes,no,cellular,nov,fri,189,2,999,0,nonexistent,-1.1,94.767,-50.8,1.028,4963.6
41186,44,technician,married,professional.course,no,no,no,cellular,nov,fri,442,1,999,0,nonexistent,-1.1,94.767,-50.8,1.028,4963.6


In [41]:
y

0         no
1         no
2         no
3         no
4         no
        ... 
41183    yes
41184     no
41185     no
41186    yes
41187     no
Name: y, Length: 41188, dtype: object

In [42]:
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, test_size=0.25)

In [43]:
automl = AutoML(
    # results_path="AutoML_22",
    # total_time_limit=30 * 60,
    # start_random_models=10,
    # hill_climbing_steps=3,
    # top_models_to_improve=3,
    # train_ensemble=True,
    mode="Explain"
)

automl.fit(X_train, y_train)

Linear algorithm was disabled.
AutoML directory: AutoML_1
The task is binary_classification with evaluation metric logloss
AutoML will use algorithms: ['Baseline', 'Decision Tree', 'Random Forest', 'Xgboost', 'Neural Network']
AutoML will ensemble available models
AutoML steps: ['simple_algorithms', 'default_algorithms', 'ensemble']
* Step simple_algorithms will try to check up to 2 models
1_Baseline logloss 0.352021 trained in 1.19 seconds
2_DecisionTree logloss 0.209987 trained in 19.98 seconds
* Step default_algorithms will try to check up to 3 models
3_Default_Xgboost logloss 0.173248 trained in 15.62 seconds
4_Default_NeuralNetwork logloss 0.197463 trained in 18.59 seconds
5_Default_RandomForest logloss 0.191215 trained in 16.22 seconds
* Step ensemble will try to check up to 1 model
Ensemble logloss 0.171901 trained in 2.18 seconds
AutoML fit time: 85.88 seconds
AutoML best model: Ensemble


AutoML()

In [44]:
pred = automl.predict(X_test)
pred
# print("Test accuracy", accuracy_score(y_test, pred))

array(['no', 'no', 'no', ..., 'no', 'no', 'no'], dtype=object)
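`predict` returns a plain NumPy array of labels (as the output above shows), so it can be passed straight to `accuracy_score`; indexing it with `["label"]` would fail, since that form matches the DataFrame returned by `automl.predict_all`. A self-contained sketch with illustrative stand-ins for `y_test` and `pred`:

```python
import numpy as np
import pandas as pd
from sklearn.metrics import accuracy_score

# Illustrative stand-ins: y_test is a pandas Series and pred is a NumPy array,
# matching the types used in the notebook cells above.
y_test = pd.Series(["no", "no", "yes", "no", "yes"])
pred = np.array(["no", "no", "no", "no", "yes"])

print("Test accuracy", accuracy_score(y_test, pred))  # 4 of 5 match -> 0.8
```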

In [45]:
automl.report()

Best model,name,model_type,metric_type,metric_value,train_time
,1_Baseline,Baseline,logloss,0.352021,2.26
,2_DecisionTree,Decision Tree,logloss,0.209987,21.48
,3_Default_Xgboost,Xgboost,logloss,0.173248,17.15
,4_Default_NeuralNetwork,Neural Network,logloss,0.197463,19.86
,5_Default_RandomForest,Random Forest,logloss,0.191215,17.8
the best,Ensemble,Ensemble,logloss,0.171901,2.18

Model,Weight
3_Default_Xgboost,3
4_Default_NeuralNetwork,1

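Ensemble: metric details (these per-model tables are computed on AutoML's internal validation split, 7,723 rows or about 25% of X_train, not on X_test)
metric,score,threshold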
logloss,0.171901,
auc,0.950163,
f1,0.665366,0.299938
accuracy,0.919332,0.449205
precision,0.662338,0.536245
recall,1.0,0.000219012
mcc,0.625009,0.299938

Ensemble: metric details at the accuracy threshold (0.449205)
metric,score,threshold
logloss,0.171901,
auc,0.950163,
f1,0.638841,0.449205
accuracy,0.919332,0.449205
precision,0.644444,0.449205
recall,0.633333,0.449205
mcc,0.593468,0.449205

Ensemble: confusion matrix at threshold 0.449205
,Predicted as no,Predicted as yes
Labeled as no,6549,304
Labeled as yes,319,551

4_Default_NeuralNetwork: metric details
metric,score,threshold
logloss,0.197463,
auc,0.942946,
f1,0.661864,0.267054
accuracy,0.9148,0.49136
precision,0.697581,0.521716
recall,1.0,1.64415e-22
mcc,0.622039,0.267054

4_Default_NeuralNetwork: metric details at the accuracy threshold (0.49136)
metric,score,threshold
logloss,0.197463,
auc,0.942946,
f1,0.626984,0.49136
accuracy,0.9148,0.49136
precision,0.618568,0.49136
recall,0.635632,0.49136
mcc,0.57897,0.49136

4_Default_NeuralNetwork: confusion matrix at threshold 0.49136
,Predicted as no,Predicted as yes
Labeled as no,6512,341
Labeled as yes,317,553

1_Baseline: metric details
metric,score,threshold
logloss,0.352021,
auc,0.5,
f1,0.20249,0.10139
accuracy,0.112651,0.10139
precision,0.112651,0.10139
recall,1.0,0.10139
mcc,0.0,0.10139

1_Baseline: metric details at the accuracy threshold (0.10139)
metric,score,threshold
logloss,0.352021,
auc,0.5,
f1,0.20249,0.10139
accuracy,0.112651,0.10139
precision,0.112651,0.10139
recall,1.0,0.10139
mcc,0.0,0.10139

1_Baseline: confusion matrix at threshold 0.10139
,Predicted as no,Predicted as yes
Labeled as no,0,6853
Labeled as yes,0,870

5_Default_RandomForest: metric details
metric,score,threshold
logloss,0.191215,
auc,0.931324,
f1,0.628462,0.285839
accuracy,0.912987,0.420024
precision,0.654459,0.541212
recall,1.0,0.0104041
mcc,0.589513,0.198775

5_Default_RandomForest: metric details at the accuracy threshold (0.420024)
metric,score,threshold
logloss,0.191215,
auc,0.931324,
f1,0.612903,0.420024
accuracy,0.912987,0.420024
precision,0.614319,0.420024
recall,0.611494,0.420024
mcc,0.56389,0.420024

5_Default_RandomForest: confusion matrix at threshold 0.420024
,Predicted as no,Predicted as yes
Labeled as no,6519,334
Labeled as yes,338,532

2_DecisionTree: metric details
metric,score,threshold
logloss,0.209987,
auc,0.900315,
f1,0.605839,0.260852
accuracy,0.905995,0.413197
precision,0.655093,0.506173
recall,1.0,0.0139786
mcc,0.560038,0.260852

2_DecisionTree: metric details at the accuracy threshold (0.413197)
metric,score,threshold
logloss,0.209987,
auc,0.900315,
f1,0.583716,0.413197
accuracy,0.905995,0.413197
precision,0.58238,0.413197
recall,0.585057,0.413197
mcc,0.530733,0.413197

2_DecisionTree: confusion matrix at threshold 0.413197
,Predicted as no,Predicted as yes
Labeled as no,6488,365
Labeled as yes,361,509

3_Default_Xgboost: metric details
metric,score,threshold
logloss,0.173248,
auc,0.949644,
f1,0.662389,0.312964
accuracy,0.91739,0.424204
precision,0.704403,0.595047
recall,1.0,0.000292015
mcc,0.622322,0.277397

3_Default_Xgboost: metric details at the accuracy threshold (0.424204)
metric,score,threshold
logloss,0.173248,
auc,0.949644,
f1,0.638322,0.424204
accuracy,0.91739,0.424204
precision,0.629754,0.424204
recall,0.647126,0.424204
mcc,0.591771,0.424204

3_Default_Xgboost: confusion matrix at threshold 0.424204
,Predicted as no,Predicted as yes
Labeled as no,6522,331
Labeled as yes,307,563
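Each confusion matrix in this report is computed at a single cut-off on the predicted probability of `yes`, and the thresholded metrics follow from its four cells. The sketch below recomputes them in plain Python from the Ensemble's confusion matrix (the first one in the report, right after the ensemble weights), treating `yes` as the positive class.

```python
import math

# Ensemble confusion matrix from the report above (positive class = "yes").
tn, fp = 6549, 304   # labeled "no":  predicted no / predicted yes
fn, tp = 319, 551    # labeled "yes": predicted no / predicted yes

accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)
# Matthews correlation coefficient
mcc = (tp * tn - fp * fn) / math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))

print(f"accuracy={accuracy:.6f} precision={precision:.6f} recall={recall:.6f}")
print(f"f1={f1:.6f} mcc={mcc:.6f}")
```

These reproduce the Ensemble's accuracy-threshold table (accuracy 0.919332, precision 0.644444, recall 0.633333, f1 0.638841, mcc 0.593468).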


## **Regression Example - SPARCS (Total Charges)**

In [94]:
import numpy as np
import pandas as pd
from supervised.automl import AutoML

df = pd.read_csv("https://raw.githubusercontent.com/hantswilliams/HHA-507-2022/main/autoML/datasets/data_sparcs.csv")
x_cols = [c for c in df.columns if c != "Total Charges"]
X = df[x_cols]
y = df["Total Charges"]
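For this regression target, `eval_metric="auto"` optimizes rmse (root mean squared error): the square root of the average squared difference between predicted and actual `Total Charges`. A worked sketch with illustrative numbers, computed by hand and with scikit-learn's `mean_squared_error` for comparison:

```python
import numpy as np
from sklearn.metrics import mean_squared_error

# Illustrative actual and predicted charges (in dollars).
y_true = np.array([1000.0, 2000.0, 3000.0, 4000.0])
y_pred = np.array([1100.0, 1900.0, 3300.0, 4000.0])

# By hand: errors are 100, -100, 300, 0, so MSE = (1e4 + 1e4 + 9e4 + 0) / 4 = 27,500.
rmse_manual = np.sqrt(np.mean((y_pred - y_true) ** 2))
rmse_sklearn = np.sqrt(mean_squared_error(y_true, y_pred))

print(rmse_manual, rmse_sklearn)  # both equal sqrt(27500), about 165.83
```

Because errors are squared before averaging, a few very expensive stays (the `Total Charges` maximum above is about 4.4 million) can dominate rmse.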

In [96]:
df

Unnamed: 0,Health Service Area,Hospital County,Operating Certificate Number,Facility Id,Facility Name,Age Group,Zip Code - 3 digits,Gender,Race,Ethnicity,...,APR Risk of Mortality,APR Medical Surgical Description,Payment Typology 1,Payment Typology 2,Payment Typology 3,Birth Weight,Abortion Edit Indicator,Emergency Department Indicator,Total Charges,Total Costs
0,Western NY,Allegany,226700.0,37.0,Cuba Memorial Hospital Inc,30 to 49,147,M,White,Not Span/Hispanic,...,Minor,Medical,Private Health Insurance,,,0,N,Y,4757.01,4747.83
1,Western NY,Allegany,228000.0,39.0,Memorial Hosp of Wm F & Gertrude F Jones A/K/A...,18 to 29,148,F,White,Not Span/Hispanic,...,Minor,Medical,Blue Cross/Blue Shield,Self-Pay,Self-Pay,0,N,N,5090.25,2985.64
2,Western NY,Allegany,228000.0,39.0,Memorial Hosp of Wm F & Gertrude F Jones A/K/A...,0 to 17,147,M,White,Not Span/Hispanic,...,Minor,Medical,Self-Pay,Self-Pay,Self-Pay,2900,N,N,4948.50,2129.67
3,Western NY,Allegany,228000.0,39.0,Memorial Hosp of Wm F & Gertrude F Jones A/K/A...,70 or Older,148,F,White,Not Span/Hispanic,...,Moderate,Medical,Medicare,Medicare,Self-Pay,0,N,Y,4719.75,8454.41
4,Western NY,Allegany,228000.0,39.0,Memorial Hosp of Wm F & Gertrude F Jones A/K/A...,50 to 69,148,M,White,Not Span/Hispanic,...,Major,Medical,Blue Cross/Blue Shield,Medicare,Self-Pay,0,N,Y,50384.75,34565.03
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
23578,Long Island,Suffolk,5157003.0,943.0,St Catherine of Siena Hospital,70 or Older,117,F,White,Not Span/Hispanic,...,Moderate,Medical,Medicare,Private Health Insurance,,0,N,Y,50833.00,8961.40
23579,Long Island,Suffolk,5157003.0,943.0,St Catherine of Siena Hospital,0 to 17,117,F,Other Race,Spanish/Hispanic,...,Minor,Medical,Private Health Insurance,,,3200,N,N,10948.00,2214.06
23580,Long Island,Suffolk,5157003.0,943.0,St Catherine of Siena Hospital,30 to 49,117,M,White,Not Span/Hispanic,...,Minor,Medical,Medicaid,,,0,N,N,46421.00,11083.24
23581,Long Island,Suffolk,5157003.0,943.0,St Catherine of Siena Hospital,70 or Older,117,M,White,Not Span/Hispanic,...,Major,Medical,Medicare,Medicare,,0,N,Y,46122.00,7951.26


In [97]:
x_cols

['Health Service Area',
 'Hospital County',
 'Operating Certificate Number',
 'Facility Id',
 'Facility Name',
 'Age Group',
 'Zip Code - 3 digits',
 'Gender',
 'Race',
 'Ethnicity',
 'Length of Stay',
 'Type of Admission',
 'Patient Disposition',
 'Discharge Year',
 'CCS Diagnosis Code',
 'CCS Diagnosis Description',
 'CCS Procedure Code',
 'CCS Procedure Description',
 'APR DRG Code',
 'APR DRG Description',
 'APR MDC Code',
 'APR MDC Description',
 'APR Severity of Illness Code',
 'APR Severity of Illness Description',
 'APR Risk of Mortality',
 'APR Medical Surgical Description',
 'Payment Typology 1',
 'Payment Typology 2',
 'Payment Typology 3',
 'Birth Weight',
 'Abortion Edit Indicator',
 'Emergency Department Indicator',
 'Total Costs']

In [99]:
X

Unnamed: 0,Health Service Area,Hospital County,Operating Certificate Number,Facility Id,Facility Name,Age Group,Zip Code - 3 digits,Gender,Race,Ethnicity,...,APR Severity of Illness Description,APR Risk of Mortality,APR Medical Surgical Description,Payment Typology 1,Payment Typology 2,Payment Typology 3,Birth Weight,Abortion Edit Indicator,Emergency Department Indicator,Total Costs
0,Western NY,Allegany,226700.0,37.0,Cuba Memorial Hospital Inc,30 to 49,147,M,White,Not Span/Hispanic,...,Minor,Minor,Medical,Private Health Insurance,,,0,N,Y,4747.83
1,Western NY,Allegany,228000.0,39.0,Memorial Hosp of Wm F & Gertrude F Jones A/K/A...,18 to 29,148,F,White,Not Span/Hispanic,...,Minor,Minor,Medical,Blue Cross/Blue Shield,Self-Pay,Self-Pay,0,N,N,2985.64
2,Western NY,Allegany,228000.0,39.0,Memorial Hosp of Wm F & Gertrude F Jones A/K/A...,0 to 17,147,M,White,Not Span/Hispanic,...,Minor,Minor,Medical,Self-Pay,Self-Pay,Self-Pay,2900,N,N,2129.67
3,Western NY,Allegany,228000.0,39.0,Memorial Hosp of Wm F & Gertrude F Jones A/K/A...,70 or Older,148,F,White,Not Span/Hispanic,...,Moderate,Moderate,Medical,Medicare,Medicare,Self-Pay,0,N,Y,8454.41
4,Western NY,Allegany,228000.0,39.0,Memorial Hosp of Wm F & Gertrude F Jones A/K/A...,50 to 69,148,M,White,Not Span/Hispanic,...,Extreme,Major,Medical,Blue Cross/Blue Shield,Medicare,Self-Pay,0,N,Y,34565.03
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
23578,Long Island,Suffolk,5157003.0,943.0,St Catherine of Siena Hospital,70 or Older,117,F,White,Not Span/Hispanic,...,Moderate,Moderate,Medical,Medicare,Private Health Insurance,,0,N,Y,8961.40
23579,Long Island,Suffolk,5157003.0,943.0,St Catherine of Siena Hospital,0 to 17,117,F,Other Race,Spanish/Hispanic,...,Minor,Minor,Medical,Private Health Insurance,,,3200,N,N,2214.06
23580,Long Island,Suffolk,5157003.0,943.0,St Catherine of Siena Hospital,30 to 49,117,M,White,Not Span/Hispanic,...,Moderate,Minor,Medical,Medicaid,,,0,N,N,11083.24
23581,Long Island,Suffolk,5157003.0,943.0,St Catherine of Siena Hospital,70 or Older,117,M,White,Not Span/Hispanic,...,Major,Major,Medical,Medicare,Medicare,,0,N,Y,7951.26


In [100]:
y

0         4757.01
1         5090.25
2         4948.50
3         4719.75
4        50384.75
           ...   
23578    50833.00
23579    10948.00
23580    46421.00
23581    46122.00
23582    32225.00
Name: Total Charges, Length: 23583, dtype: float64

In [142]:
automl = AutoML(results_path="sparcs_regression", mode="Explain")
automl.fit(X, y)

There are samples with missing target values in the data which will be excluded for further analysis


AutoML directory: sparcs_regression
The task is multiclass_classification with evaluation metric logloss
AutoML will use algorithms: ['Baseline', 'Linear', 'Decision Tree', 'Random Forest', 'Xgboost', 'Neural Network']
AutoML will ensemble available models
AutoML steps: ['simple_algorithms', 'default_algorithms', 'ensemble']
* Step simple_algorithms will try to check up to 3 models
1_Baseline logloss 1.098612 trained in 1.45 seconds
2_DecisionTree logloss 1.500057 trained in 6.78 seconds
3_Linear logloss 0.252079 trained in 5.42 seconds
* Step default_algorithms will try to check up to 3 models
4_Default_Xgboost logloss 0.376001 trained in 7.05 seconds
5_Default_NeuralNetwork logloss 0.404506 trained in 1.2 seconds
6_Default_RandomForest logloss 0.217595 trained in 7.43 seconds
* Step ensemble will try to check up to 1 model
Ensemble logloss 0.216842 trained in 0.28 seconds
AutoML fit time: 39.5 seconds
AutoML best model: Ensemble


AutoML(results_path='sparcs_regression')
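The log above reports `multiclass_classification` with `logloss`, which does not fit a continuous `Total Charges` target. Its scores are identical to the iris run further down (`1_Baseline logloss 1.098612`, `3_Linear logloss 0.252079`). The zipped `sparcs_regression` folder later also contains `confusion_matrix.png` and `roc_curve.png`. Together these suggest the cell was re-executed after `X` and `y` had been reassigned to the iris data. Passing `ml_task="regression"` to mljar's `AutoML` declares the task explicitly instead of relying on auto-detection.

A cheap extra guard is to check the target before fitting. The helper below is a hypothetical sketch. Its threshold of 20 distinct values is an arbitrary assumption, not mljar's detection rule.

```python
import math


def check_regression_target(values, min_unique=20):
    """Hypothetical pre-fit guard: raise if a supposed regression target looks categorical.

    min_unique=20 is an arbitrary threshold chosen for this sketch, not mljar's rule.
    """
    # Ignore missing values (None or float NaN) when counting distinct targets
    present = [
        v for v in values
        if v is not None and not (isinstance(v, float) and math.isnan(v))
    ]
    n_unique = len(set(present))
    if n_unique < min_unique:
        raise ValueError(
            f"target has only {n_unique} distinct values; "
            "this looks like classification, not regression"
        )
    return n_unique
```

In the notebook this would run as `check_regression_target(y.tolist())` just before `automl.fit(X, y)`. With the iris labels (`1.0`, `2.0`, `121.0`) in `y`, it raises instead of silently training a classifier.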

In [104]:
df["Predictions"] = automl.predict(X)

In [105]:
print("Predictions")
print(df[["Total Charges", "Predictions"]].head())

Predictions
   Total Charges   Predictions
0        4757.01   7439.961726
1        5090.25   7661.853523
2        4948.50   7600.419929
3        4719.75   7525.814851
4       50384.75  58561.460695
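These predictions are made on the same rows the model was trained on, so they overstate how well it generalizes. Holding out a test set with `train_test_split` (imported at the top of the notebook) gives a fairer estimate. Either way, the gap between `Total Charges` and `Predictions` is usually summarized with MAE and RMSE. Below is a minimal pure-Python sketch over the five rows printed above; the helper names are illustrative.

```python
import math


def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean squared error (penalizes large misses more than MAE)."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


# The five rows shown in the cell output above
total_charges = [4757.01, 5090.25, 4948.50, 4719.75, 50384.75]
predictions = [7439.961726, 7661.853523, 7600.419929, 7525.814851, 58561.460695]

print(f"MAE:  {mae(total_charges, predictions):.2f}")
print(f"RMSE: {rmse(total_charges, predictions):.2f}")
```

All five of these predictions overshoot the actual charges. On full columns, `sklearn.metrics.mean_absolute_error` and `sklearn.metrics.mean_squared_error` (take the square root for RMSE) compute the same quantities.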


## **Multiclass Classifier - GENERIC**

In [124]:
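import warnings

import numpy as np
import pandas as pd
import supervised
from sklearn import datasets
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from supervised.automl import AutoML
from supervised.exceptions import AutoMLException

# Raise SettingWithCopyWarning as an error so chained-assignment bugs fail loudly.
# pandas >= 1.5 exposes the class as pd.errors.SettingWithCopyWarning; older
# releases only have pd.core.common.SettingWithCopyWarning.
SettingWithCopyWarning = getattr(pd.errors, "SettingWithCopyWarning", None) or getattr(
    pd.core.common, "SettingWithCopyWarning", None
)
if SettingWithCopyWarning is not None:
    warnings.filterwarnings("error", category=SettingWithCopyWarning)

# Iris-style data: four numeric features and a "class" label
df = pd.read_csv("https://raw.githubusercontent.com/hantswilliams/HHA-507-2022/main/autoML/datasets/data_classes_iris.csv")
X = df[["feature_1", "feature_2", "feature_3", "feature_4"]]
y = df["class"]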

In [125]:
df

Unnamed: 0,feature_1,feature_2,feature_3,feature_4,class
0,5.1,3.5,1.4,0.2,1.0
1,4.9,3.0,1.4,0.2,1.0
2,4.7,3.2,1.3,,1.0
3,4.6,3.1,1.5,,1.0
4,5.0,3.6,1.4,0.2,1.0
...,...,...,...,...,...
145,6.7,3.0,5.2,2.3,121.0
146,6.3,2.5,5.0,1.9,121.0
147,6.5,3.0,5.2,2.0,121.0
148,6.2,3.4,5.4,2.3,121.0


In [126]:
X

Unnamed: 0,feature_1,feature_2,feature_3,feature_4
0,5.1,3.5,1.4,0.2
1,4.9,3.0,1.4,0.2
2,4.7,3.2,1.3,
3,4.6,3.1,1.5,
4,5.0,3.6,1.4,0.2
...,...,...,...,...
145,6.7,3.0,5.2,2.3
146,6.3,2.5,5.0,1.9
147,6.5,3.0,5.2,2.0
148,6.2,3.4,5.4,2.3


In [127]:
y.value_counts()

121.0    50
1.0      49
2.0      49
Name: class, dtype: int64
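The three counts sum to 148, while the frame has 150 rows (`X.shape` printed below is `(150, 4)`). `value_counts()` drops missing values by default, so the two unlabeled rows are hidden here. They are the "samples with missing target values" that mljar reports and excludes at fit time. The pandas sketch below uses hypothetical helper names; it makes those rows visible and drops them explicitly.

```python
import pandas as pd


def missing_target_report(y: pd.Series) -> pd.Series:
    """Class counts including a NaN bucket, so unlabeled rows are not hidden."""
    return y.value_counts(dropna=False)


def drop_unlabeled(X: pd.DataFrame, y: pd.Series):
    """Keep only the rows whose target is present."""
    mask = y.notna()
    return X.loc[mask], y.loc[mask]
```

Usage in the notebook: `missing_target_report(y)`, then `X_labeled, y_labeled = drop_unlabeled(X, y)` if you prefer to drop the rows yourself rather than rely on mljar.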

In [131]:
automl = AutoML()

In [132]:
automl.fit(X, y)

There are samples with missing target values in the data which will be excluded for further analysis


AutoML directory: AutoML_5
The task is multiclass_classification with evaluation metric logloss
AutoML will use algorithms: ['Baseline', 'Linear', 'Decision Tree', 'Random Forest', 'Xgboost', 'Neural Network']
AutoML will ensemble available models
AutoML steps: ['simple_algorithms', 'default_algorithms', 'ensemble']
* Step simple_algorithms will try to check up to 3 models
1_Baseline logloss 1.098612 trained in 1.25 seconds
2_DecisionTree logloss 1.500057 trained in 13.94 seconds
3_Linear logloss 0.252079 trained in 5.32 seconds
* Step default_algorithms will try to check up to 3 models
4_Default_Xgboost logloss 0.376001 trained in 7.08 seconds
5_Default_NeuralNetwork logloss 0.404506 trained in 1.09 seconds
6_Default_RandomForest logloss 0.217595 trained in 7.17 seconds
* Step ensemble will try to check up to 1 model
Ensemble logloss 0.216842 trained in 0.29 seconds
AutoML fit time: 45.72 seconds
AutoML best model: Ensemble


AutoML()

In [133]:
predictions = automl.predict_all(X)

In [134]:
print(predictions.head())
print(predictions.tail())

print(X.shape)
print(predictions.shape)

   prediction_1.0  prediction_2.0  prediction_121.0  label
0        0.997160        0.002840      6.605212e-08    1.0
1        0.984784        0.015215      2.997777e-07    1.0
2        0.681572        0.312921      5.506837e-03    1.0
3        0.676756        0.317721      5.522294e-03    1.0
4        0.998470        0.001530      3.552659e-08    1.0
     prediction_1.0  prediction_2.0  prediction_121.0  label
145        0.000014        0.008776          0.991210  121.0
146        0.004885        0.138837          0.856278  121.0
147        0.000091        0.023626          0.976283  121.0
148        0.000142        0.007597          0.992261  121.0
149        0.000930        0.053527          0.945544  121.0
(150, 4)
(150, 4)
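`predict_all` returns one `prediction_<class>` probability column per class plus a `label` column. In the rows shown, `label` is the class whose column has the highest probability. Predictions also cover all 150 rows, including the two whose `class` is missing (the counts above sum to 148). Any accuracy check should therefore be restricted to labeled rows. Below is a pandas sketch of both steps; the helper names are illustrative. `accuracy_score` from the notebook's imports would work equally well on the masked columns.

```python
import pandas as pd


def labels_from_probabilities(pred: pd.DataFrame) -> pd.Series:
    """Return the class whose prediction_<class> column has the highest probability."""
    prob_cols = [c for c in pred.columns if c.startswith("prediction_")]
    best_col = pred[prob_cols].idxmax(axis=1)  # e.g. "prediction_121.0"
    return best_col.str.replace("prediction_", "", regex=False).astype(float)


def accuracy_on_labeled(y_true: pd.Series, y_pred: pd.Series) -> float:
    """Share of correct predictions, ignoring rows whose true class is missing."""
    mask = y_true.notna()
    return float((y_true[mask] == y_pred[mask]).mean())
```

For example: `accuracy_on_labeled(y, predictions["label"])`. This scores the same rows used for training, so it is an optimistic figure.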


## **Download Outputs**

### **Binary Classification Download**

In [135]:
# get current working directory
import os
os.getcwd()

'/content'

In [136]:
folders = os.listdir()
foldersML = [x for x in folders if x.startswith('Race')]
print(foldersML)

['Race.zip', 'Race']
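As the output shows, the `startswith('Race')` filter also picks up the earlier `Race.zip` archive. Adding an `os.path.isdir` check keeps only results directories. `find_result_dirs` is a hypothetical helper name.

```python
import os


def find_result_dirs(prefix, parent="."):
    """Sub-directories of parent whose names start with prefix (files such as .zip archives are skipped)."""
    return sorted(
        name
        for name in os.listdir(parent)
        if name.startswith(prefix) and os.path.isdir(os.path.join(parent, name))
    )
```

In the notebook, `find_result_dirs("Race", "/content")` would return only the `Race` folder.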


In [137]:
!zip -r /content/Race.zip /content/Race

updating: content/Race/ (stored 0%)
updating: content/Race/leaderboard.csv (deflated 43%)
updating: content/Race/params.json (deflated 61%)
updating: content/Race/README.html (deflated 37%)
updating: content/Race/Ensemble/ (stored 0%)
updating: content/Race/Ensemble/precision_recall_curve.png (deflated 6%)
updating: content/Race/Ensemble/status.txt (stored 0%)
updating: content/Race/Ensemble/confusion_matrix_normalized.png (deflated 12%)
updating: content/Race/Ensemble/ensemble.json (deflated 44%)
updating: content/Race/Ensemble/learning_curves.png (deflated 10%)
updating: content/Race/Ensemble/predictions_ensemble.csv (deflated 58%)
updating: content/Race/Ensemble/README.md (deflated 72%)
updating: content/Race/Ensemble/roc_curve.png (deflated 7%)
updating: content/Race/Ensemble/confusion_matrix.png (deflated 12%)
updating: content/Race/4_Default_NeuralNetwork/ (stored 0%)
updating: content/Race/4_Default_NeuralNetwork/learner_fold_0.neural_network (deflated 0%)
updating: content/Race
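Because `zip` is given absolute paths, entries are stored as `content/Race/leaderboard.csv` and so on. A standard-library alternative is `shutil.make_archive`. With `root_dir` and `base_dir`, it stores paths relative to the results folder (`Race/leaderboard.csv`). In Colab, `google.colab.files.download` then sends the archive to the browser. `zip_results` and `download` are hypothetical helper names, and the Colab import only succeeds inside a Colab runtime.

```python
import os
import shutil


def zip_results(results_dir, parent="."):
    """Zip <parent>/<results_dir> into <parent>/<results_dir>.zip and return the archive path."""
    src = os.path.join(parent, results_dir)
    if not os.path.isdir(src):
        raise FileNotFoundError(src)
    # root_dir + base_dir keep archive entries relative, e.g. "Race/leaderboard.csv"
    return shutil.make_archive(
        base_name=os.path.join(parent, results_dir),
        format="zip",
        root_dir=parent,
        base_dir=results_dir,
    )


def download(path):
    """Send a file to the browser in Colab; elsewhere just report where it is."""
    try:
        from google.colab import files  # only importable inside a Colab runtime
    except ImportError:
        print(f"Archive written to {path}")
        return False
    files.download(path)
    return True
```

In the notebook this would be `download(zip_results("Race", parent="/content"))`, and likewise with `"sparcs_regression"` for the regression results.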

### **Regression Download**

In [139]:
os.getcwd()

'/content'

In [143]:
folders = os.listdir()
foldersML = [x for x in folders if x.startswith('sparcs_regression')]
print(foldersML)

['sparcs_regression']


In [145]:
!zip -r /content/Regression.zip /content/sparcs_regression

  adding: content/sparcs_regression/ (stored 0%)
  adding: content/sparcs_regression/leaderboard.csv (deflated 42%)
  adding: content/sparcs_regression/params.json (deflated 62%)
  adding: content/sparcs_regression/Ensemble/ (stored 0%)
  adding: content/sparcs_regression/Ensemble/precision_recall_curve.png (deflated 7%)
  adding: content/sparcs_regression/Ensemble/status.txt (stored 0%)
  adding: content/sparcs_regression/Ensemble/confusion_matrix_normalized.png (deflated 16%)
  adding: content/sparcs_regression/Ensemble/ensemble.json (deflated 50%)
  adding: content/sparcs_regression/Ensemble/learning_curves.png (deflated 15%)
  adding: content/sparcs_regression/Ensemble/predictions_ensemble.csv (deflated 51%)
  adding: content/sparcs_regression/Ensemble/README.md (deflated 67%)
  adding: content/sparcs_regression/Ensemble/roc_curve.png (deflated 9%)
  adding: content/sparcs_regression/Ensemble/confusion_matrix.png (deflated 20%)
  adding: content/sparcs_regression/5_Default_NeuralNe