<a href="https://colab.research.google.com/github/Chinchilla7/AutoML-examples/blob/main/autoML/autoML.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Install Packages

In [1]:
!pip install tpot mljar-supervised

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/


In [12]:
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from supervised.automl import AutoML


# Options Available

- mode: the package ships with four built-in modes.
  - Explain is suited to exploring and understanding the data. It produces feature-importance plots and decision-tree visualizations.
  - Perform is used when building ML models for production.
  - Compete is meant for building models for machine learning competitions.
  - Optuna is used to search for highly tuned ML models.
- algorithms: the algorithms to try, usually passed as a list.
- results_path: the path where the results will be stored.
- total_time_limit: the total training time budget, in seconds.
- train_ensemble: whether an ensemble is built at the end of training.
- stack_models: whether a stack of models is built.
- eval_metric: the metric to optimize. If set to auto, logloss is used for classification problems and rmse for regression problems.

In [None]:
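# Template of the main constructor arguments, kept commented out.
# Empty strings are placeholders to fill in before use.
# automl = AutoML(
#     mode="Explain",
#     algorithms="",
#     results_path="AutoML_22",
#     total_time_limit=30 * 60,
#     train_ensemble=True,
#     stack_models="",
#     eval_metric="",
# )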
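
The options listed above can be collected in a plain dictionary and passed to the constructor in one go. The sketch below is only an illustration: the values (`"Perform"`, the two algorithm names, the `AutoML_demo` folder) are hypothetical choices, not settings used elsewhere in this notebook, and `make_automl` is a helper name introduced here.

```python
# Hypothetical settings for a time-boxed classification run.
automl_params = {
    "mode": "Perform",
    "algorithms": ["Random Forest", "Xgboost"],  # passed as a list
    "results_path": "AutoML_demo",               # hypothetical output folder
    "total_time_limit": 30 * 60,                 # seconds, i.e. 30 minutes
    "train_ensemble": True,
    "stack_models": False,
    "eval_metric": "logloss",
}


def make_automl(params):
    """Build an AutoML object from a dict of constructor arguments.

    The import is done lazily so the dictionary can be inspected
    even where mljar-supervised is not installed.
    """
    from supervised.automl import AutoML

    return AutoML(**params)


print(automl_params["total_time_limit"], "seconds")
```

Keeping the settings in a dictionary makes it easy to log them or to reuse the same configuration for several datasets; calling `make_automl(automl_params)` requires the `mljar-supervised` package installed above.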

# Healthcare Dataset - SPARCS

## Load in dataset

In [2]:
import pandas as pd
sparcs = pd.read_csv('https://raw.githubusercontent.com/hantswilliams/HHA-507-2022/main/autoML/datasets/data_sparcs.csv')
sparcs

Unnamed: 0,Health Service Area,Hospital County,Operating Certificate Number,Facility Id,Facility Name,Age Group,Zip Code - 3 digits,Gender,Race,Ethnicity,...,APR Risk of Mortality,APR Medical Surgical Description,Payment Typology 1,Payment Typology 2,Payment Typology 3,Birth Weight,Abortion Edit Indicator,Emergency Department Indicator,Total Charges,Total Costs
0,Western NY,Allegany,226700.0,37.0,Cuba Memorial Hospital Inc,30 to 49,147,M,White,Not Span/Hispanic,...,Minor,Medical,Private Health Insurance,,,0,N,Y,4757.01,4747.83
1,Western NY,Allegany,228000.0,39.0,Memorial Hosp of Wm F & Gertrude F Jones A/K/A...,18 to 29,148,F,White,Not Span/Hispanic,...,Minor,Medical,Blue Cross/Blue Shield,Self-Pay,Self-Pay,0,N,N,5090.25,2985.64
2,Western NY,Allegany,228000.0,39.0,Memorial Hosp of Wm F & Gertrude F Jones A/K/A...,0 to 17,147,M,White,Not Span/Hispanic,...,Minor,Medical,Self-Pay,Self-Pay,Self-Pay,2900,N,N,4948.50,2129.67
3,Western NY,Allegany,228000.0,39.0,Memorial Hosp of Wm F & Gertrude F Jones A/K/A...,70 or Older,148,F,White,Not Span/Hispanic,...,Moderate,Medical,Medicare,Medicare,Self-Pay,0,N,Y,4719.75,8454.41
4,Western NY,Allegany,228000.0,39.0,Memorial Hosp of Wm F & Gertrude F Jones A/K/A...,50 to 69,148,M,White,Not Span/Hispanic,...,Major,Medical,Blue Cross/Blue Shield,Medicare,Self-Pay,0,N,Y,50384.75,34565.03
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
23578,Long Island,Suffolk,5157003.0,943.0,St Catherine of Siena Hospital,70 or Older,117,F,White,Not Span/Hispanic,...,Moderate,Medical,Medicare,Private Health Insurance,,0,N,Y,50833.00,8961.40
23579,Long Island,Suffolk,5157003.0,943.0,St Catherine of Siena Hospital,0 to 17,117,F,Other Race,Spanish/Hispanic,...,Minor,Medical,Private Health Insurance,,,3200,N,N,10948.00,2214.06
23580,Long Island,Suffolk,5157003.0,943.0,St Catherine of Siena Hospital,30 to 49,117,M,White,Not Span/Hispanic,...,Minor,Medical,Medicaid,,,0,N,N,46421.00,11083.24
23581,Long Island,Suffolk,5157003.0,943.0,St Catherine of Siena Hospital,70 or Older,117,M,White,Not Span/Hispanic,...,Major,Medical,Medicare,Medicare,,0,N,Y,46122.00,7951.26


In [3]:
sparcs.columns

Index(['Health Service Area', 'Hospital County',
       'Operating Certificate Number', 'Facility Id', 'Facility Name',
       'Age Group', 'Zip Code - 3 digits', 'Gender', 'Race', 'Ethnicity',
       'Length of Stay', 'Type of Admission', 'Patient Disposition',
       'Discharge Year', 'CCS Diagnosis Code', 'CCS Diagnosis Description',
       'CCS Procedure Code', 'CCS Procedure Description', 'APR DRG Code',
       'APR DRG Description', 'APR MDC Code', 'APR MDC Description',
       'APR Severity of Illness Code', 'APR Severity of Illness Description',
       'APR Risk of Mortality', 'APR Medical Surgical Description',
       'Payment Typology 1', 'Payment Typology 2', 'Payment Typology 3',
       'Birth Weight', 'Abortion Edit Indicator',
       'Emergency Department Indicator', 'Total Charges', 'Total Costs'],
      dtype='object')

## Potential variables of interest

- APR Risk of Mortality (categorical)
- Total Costs (continuous)
- Length of Stay
- Total Charges (continuous)
- Type of Admission (categorical)


In [4]:
sparcs['Total Charges'].describe()

count    2.358300e+04
mean     4.344052e+04
std      8.434949e+04
min      1.000000e+00
25%      1.226175e+04
50%      2.375403e+04
75%      4.702837e+04
max      4.410671e+06
Name: Total Charges, dtype: float64

In [5]:
sparcs['Type of Admission'].value_counts()

Emergency        14968
Elective          4508
Newborn           2285
Urgent            1743
Trauma              63
Not Available       16
Name: Type of Admission, dtype: int64
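
The counts above show a strongly imbalanced target. As a rough yardstick, the sketch below computes each class's share and the accuracy a model would get by always predicting the most common class. The counts are copied from the output above; the variable names are introduced here for illustration.

```python
# Class counts copied from the value_counts() output above.
admission_counts = {
    "Emergency": 14968,
    "Elective": 4508,
    "Newborn": 2285,
    "Urgent": 1743,
    "Trauma": 63,
    "Not Available": 16,
}

total = sum(admission_counts.values())
shares = {label: count / total for label, count in admission_counts.items()}

# A model that always predicts the majority class is right this often.
majority_class = max(admission_counts, key=admission_counts.get)
majority_accuracy = shares[majority_class]

for label, share in shares.items():
    print(f"{label:<14} {share:.4f}")
print(f"Always predicting {majority_class!r} gives accuracy {majority_accuracy:.4f}")
```

Any classifier trained on this target should be judged against that majority-class accuracy rather than against zero, and per-class metrics matter for the rare `Trauma` and `Not Available` classes.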

In [6]:
sparcs['Gender'].value_counts()

F    13187
M    10395
U        1
Name: Gender, dtype: int64

### **Create new model**

In [7]:
X = sparcs.drop(columns=['Type of Admission'])

In [8]:
y = sparcs["Type of Admission"]

In [9]:
X

Unnamed: 0,Health Service Area,Hospital County,Operating Certificate Number,Facility Id,Facility Name,Age Group,Zip Code - 3 digits,Gender,Race,Ethnicity,...,APR Risk of Mortality,APR Medical Surgical Description,Payment Typology 1,Payment Typology 2,Payment Typology 3,Birth Weight,Abortion Edit Indicator,Emergency Department Indicator,Total Charges,Total Costs
0,Western NY,Allegany,226700.0,37.0,Cuba Memorial Hospital Inc,30 to 49,147,M,White,Not Span/Hispanic,...,Minor,Medical,Private Health Insurance,,,0,N,Y,4757.01,4747.83
1,Western NY,Allegany,228000.0,39.0,Memorial Hosp of Wm F & Gertrude F Jones A/K/A...,18 to 29,148,F,White,Not Span/Hispanic,...,Minor,Medical,Blue Cross/Blue Shield,Self-Pay,Self-Pay,0,N,N,5090.25,2985.64
2,Western NY,Allegany,228000.0,39.0,Memorial Hosp of Wm F & Gertrude F Jones A/K/A...,0 to 17,147,M,White,Not Span/Hispanic,...,Minor,Medical,Self-Pay,Self-Pay,Self-Pay,2900,N,N,4948.50,2129.67
3,Western NY,Allegany,228000.0,39.0,Memorial Hosp of Wm F & Gertrude F Jones A/K/A...,70 or Older,148,F,White,Not Span/Hispanic,...,Moderate,Medical,Medicare,Medicare,Self-Pay,0,N,Y,4719.75,8454.41
4,Western NY,Allegany,228000.0,39.0,Memorial Hosp of Wm F & Gertrude F Jones A/K/A...,50 to 69,148,M,White,Not Span/Hispanic,...,Major,Medical,Blue Cross/Blue Shield,Medicare,Self-Pay,0,N,Y,50384.75,34565.03
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
23578,Long Island,Suffolk,5157003.0,943.0,St Catherine of Siena Hospital,70 or Older,117,F,White,Not Span/Hispanic,...,Moderate,Medical,Medicare,Private Health Insurance,,0,N,Y,50833.00,8961.40
23579,Long Island,Suffolk,5157003.0,943.0,St Catherine of Siena Hospital,0 to 17,117,F,Other Race,Spanish/Hispanic,...,Minor,Medical,Private Health Insurance,,,3200,N,N,10948.00,2214.06
23580,Long Island,Suffolk,5157003.0,943.0,St Catherine of Siena Hospital,30 to 49,117,M,White,Not Span/Hispanic,...,Minor,Medical,Medicaid,,,0,N,N,46421.00,11083.24
23581,Long Island,Suffolk,5157003.0,943.0,St Catherine of Siena Hospital,70 or Older,117,M,White,Not Span/Hispanic,...,Major,Medical,Medicare,Medicare,,0,N,Y,46122.00,7951.26


In [10]:
y

0           Urgent
1           Urgent
2          Newborn
3        Emergency
4        Emergency
           ...    
23578    Emergency
23579      Newborn
23580       Urgent
23581    Emergency
23582    Emergency
Name: Type of Admission, Length: 23583, dtype: object

In [13]:
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, test_size=0.25)
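
The `stratify=y` argument keeps the class proportions roughly equal in the training and test sets. The sketch below shows the idea in plain Python on a small made-up label list. It is a simplified stand-in, not scikit-learn's actual algorithm, and `stratified_split` is a helper name introduced here.

```python
import random
from collections import Counter, defaultdict


def stratified_split(labels, test_size=0.25, seed=0):
    """Split row indices so each class contributes ~test_size of its rows to the test set."""
    rng = random.Random(seed)
    indices_by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        indices_by_class[label].append(idx)

    train_idx, test_idx = [], []
    for indices in indices_by_class.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_size)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Made-up, imbalanced labels (not the SPARCS data).
labels = ["Emergency"] * 80 + ["Elective"] * 16 + ["Trauma"] * 4
train_idx, test_idx = stratified_split(labels)
test_counts = Counter(labels[i] for i in test_idx)
print(test_counts)
```

Without stratification, a rare class such as `Trauma` could end up entirely in one of the two sets by chance.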

In [16]:
X_test

Unnamed: 0,Health Service Area,Hospital County,Operating Certificate Number,Facility Id,Facility Name,Age Group,Zip Code - 3 digits,Gender,Race,Ethnicity,...,APR Risk of Mortality,APR Medical Surgical Description,Payment Typology 1,Payment Typology 2,Payment Typology 3,Birth Weight,Abortion Edit Indicator,Emergency Department Indicator,Total Charges,Total Costs
10276,New York City,Bronx,7000008.0,1172.0,Lincoln Medical & Mental Health Center,50 to 69,104,M,Black/African American,Not Span/Hispanic,...,Moderate,Medical,Medicaid,Medicaid,,0,N,Y,13551.43,7057.63
6774,Capital/Adirond,Warren,5601000.0,1005.0,Glens Falls Hospital,30 to 49,128,F,White,Not Span/Hispanic,...,Minor,Surgical,Blue Cross/Blue Shield,,,0,N,N,13646.39,5909.82
14200,New York City,Manhattan,7002002.0,1439.0,Mount Sinai Beth Israel,50 to 69,112,M,Black/African American,Spanish/Hispanic,...,Minor,Medical,Medicaid,Self-Pay,,0,N,N,8340.00,1954.91
18400,New York City,Queens,7003004.0,1630.0,Long Island Jewish Medical Center,70 or Older,113,F,White,Not Span/Hispanic,...,Moderate,Surgical,Medicare,Medicaid,Self-Pay,0,N,N,76029.38,15652.39
296,Western NY,Erie,1401014.0,207.0,Buffalo General Medical Center,70 or Older,143,M,White,Not Span/Hispanic,...,Moderate,Medical,Medicare,Blue Cross/Blue Shield,Self-Pay,0,N,Y,23254.89,8745.09
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2725,Finger Lakes,Monroe,2701005.0,413.0,Strong Memorial Hospital,50 to 69,146,M,Black/African American,Not Span/Hispanic,...,Major,Medical,Medicaid,Self-Pay,,0,N,Y,21684.22,12870.19
9614,New York City,Bronx,7000006.0,1168.0,Montefiore Medical Center-Wakefield Hospital,50 to 69,104,M,Black/African American,Not Span/Hispanic,...,Minor,Medical,Medicare,Medicare,Blue Cross/Blue Shield,0,N,Y,65350.80,15120.01
22037,Long Island,Nassau,2952006.0,551.0,St. Joseph Hospital,70 or Older,117,M,White,Not Span/Hispanic,...,Major,Medical,Medicare,Blue Cross/Blue Shield,,0,N,N,20346.96,5279.29
19149,New York City,Queens,7003010.0,1637.0,NewYork-Presbyterian/Queens,70 or Older,113,F,White,Not Span/Hispanic,...,Major,Medical,Medicare,Medicaid,Self-Pay,0,N,Y,52589.75,14030.30


In [14]:
automl = AutoML(results_path="sparcs_classification", mode="Explain")

In [15]:
automl.fit(X_train, y_train)

Linear algorithm was disabled.
AutoML directory: sparcs_classification
The task is multiclass_classification with evaluation metric logloss
AutoML will use algorithms: ['Baseline', 'Decision Tree', 'Random Forest', 'Xgboost', 'Neural Network']
AutoML will ensemble available models
AutoML steps: ['simple_algorithms', 'default_algorithms', 'ensemble']
* Step simple_algorithms will try to check up to 2 models
1_Baseline logloss 1.047051 trained in 0.71 seconds




2_DecisionTree logloss 0.376179 trained in 26.96 seconds
* Step default_algorithms will try to check up to 3 models
3_Default_Xgboost logloss 0.221934 trained in 60.95 seconds
4_Default_NeuralNetwork logloss 0.359239 trained in 8.78 seconds
5_Default_RandomForest logloss 0.343617 trained in 31.48 seconds
* Step ensemble will try to check up to 1 model
Ensemble logloss 0.221934 trained in 0.6 seconds
AutoML fit time: 139.93 seconds
AutoML best model: 3_Default_Xgboost


AutoML(results_path='sparcs_classification')
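
To put the logloss values in context, the sketch below computes the log loss of a predictor that outputs the class frequencies as probabilities for every row. Whether the `Baseline` model works exactly this way is an assumption, not something the log above confirms. The class counts are the per-class `support` values of the validation split shown in the `automl.report()` output further down.

```python
import math

# Per-class support of the validation split, taken from the report output.
validation_counts = {
    "Elective": 845,
    "Emergency": 2807,
    "Newborn": 428,
    "Not Available": 5,
    "Trauma": 12,
    "Urgent": 327,
}


def constant_prior_logloss(counts):
    """Log loss when every row is assigned the observed class frequencies.

    Each row of class k contributes -log(p_k), so the mean over all rows
    is -sum(p_k * log(p_k)), the entropy of the class distribution.
    """
    total = sum(counts.values())
    return -sum((n / total) * math.log(n / total) for n in counts.values() if n > 0)


prior_logloss = constant_prior_logloss(validation_counts)
print(f"Log loss of a frequency-only predictor: {prior_logloss:.4f}")
```

The result is close to the `1_Baseline` logloss reported above, which suggests the other models' lower values reflect information actually learned from the features.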

In [17]:
pred = automl.predict(X_test)
pred

array(['Emergency', 'Elective', 'Urgent', ..., 'Emergency', 'Emergency',
       'Elective'], dtype=object)

In [18]:
automl.report()

Best model,name,model_type,metric_type,metric_value,train_time
,1_Baseline,Baseline,logloss,1.04705,1.43
,2_DecisionTree,Decision Tree,logloss,0.376179,28.17
the best,3_Default_Xgboost,Xgboost,logloss,0.221934,62.22
,4_Default_NeuralNetwork,Neural Network,logloss,0.359239,9.83
,5_Default_RandomForest,Random Forest,logloss,0.343617,32.74
,Ensemble,Ensemble,logloss,0.221934,0.6

Model,Weight
3_Default_Xgboost,1

Unnamed: 0,Elective,Emergency,Newborn,Not Available,Trauma,Urgent,accuracy,macro avg,weighted avg,logloss
precision,0.823982,0.959646,0.995338,0.8,1.0,0.743083,0.922694,0.887008,0.921109,0.221934
recall,0.886391,0.9658,0.997664,0.8,0.25,0.574924,0.922694,0.745796,0.922694,0.221934
f1-score,0.854048,0.962713,0.996499,0.8,0.4,0.648276,0.922694,0.776923,0.920274,0.221934
support,845.0,2807.0,428.0,5.0,12.0,327.0,0.922694,4424.0,4424.0,0.221934

Unnamed: 0,Predicted as Elective,Predicted as Emergency,Predicted as Newborn,Predicted as Not Available,Predicted as Trauma,Predicted as Urgent
Labeled as Elective,749,54,0,1,0,41
Labeled as Emergency,71,2711,1,0,0,24
Labeled as Newborn,0,1,427,0,0,0
Labeled as Not Available,0,1,0,4,0,0
Labeled as Trauma,0,9,0,0,3,0
Labeled as Urgent,89,49,1,0,0,188

Unnamed: 0,Elective,Emergency,Newborn,Not Available,Trauma,Urgent,accuracy,macro avg,weighted avg,logloss
precision,0.73431,0.900498,0.997664,0,0,0.64,0.872514,0.545412,0.85544,0.343617
recall,0.830769,0.967225,0.997664,0,0,0.0489297,0.872514,0.474098,0.872514,0.343617
f1-score,0.779567,0.932669,0.997664,0,0,0.0909091,0.872514,0.466801,0.843911,0.343617
support,845.0,2807.0,428.0,5,12,327.0,0.872514,4424.0,4424.0,0.343617

Unnamed: 0,Predicted as Elective,Predicted as Emergency,Predicted as Newborn,Predicted as Not Available,Predicted as Trauma,Predicted as Urgent
Labeled as Elective,702,138,0,0,0,5
Labeled as Emergency,89,2715,0,0,0,3
Labeled as Newborn,0,0,427,0,0,1
Labeled as Not Available,0,5,0,0,0,0
Labeled as Trauma,1,11,0,0,0,0
Labeled as Urgent,164,146,1,0,0,16

Unnamed: 0,Elective,Emergency,Newborn,Not Available,Trauma,Urgent,accuracy,macro avg,weighted avg,logloss
precision,0,0.634494,0,0,0,0,0.634494,0.105749,0.402582,1.04705
recall,0,1.0,0,0,0,0,0.634494,0.166667,0.634494,1.04705
f1-score,0,0.776379,0,0,0,0,0.634494,0.129397,0.492608,1.04705
support,845,2807.0,428,5,12,327,0.634494,4424.0,4424.0,1.04705

Unnamed: 0,Predicted as Elective,Predicted as Emergency,Predicted as Newborn,Predicted as Not Available,Predicted as Trauma,Predicted as Urgent
Labeled as Elective,0,845,0,0,0,0
Labeled as Emergency,0,2807,0,0,0,0
Labeled as Newborn,0,428,0,0,0,0
Labeled as Not Available,0,5,0,0,0,0
Labeled as Trauma,0,12,0,0,0,0
Labeled as Urgent,0,327,0,0,0,0

Unnamed: 0,Elective,Emergency,Newborn,Not Available,Trauma,Urgent,accuracy,macro avg,weighted avg,logloss
precision,0.823982,0.959646,0.995338,0.8,1.0,0.743083,0.922694,0.887008,0.921109,0.221934
recall,0.886391,0.9658,0.997664,0.8,0.25,0.574924,0.922694,0.745796,0.922694,0.221934
f1-score,0.854048,0.962713,0.996499,0.8,0.4,0.648276,0.922694,0.776923,0.920274,0.221934
support,845.0,2807.0,428.0,5.0,12.0,327.0,0.922694,4424.0,4424.0,0.221934

Unnamed: 0,Predicted as Elective,Predicted as Emergency,Predicted as Newborn,Predicted as Not Available,Predicted as Trauma,Predicted as Urgent
Labeled as Elective,749,54,0,1,0,41
Labeled as Emergency,71,2711,1,0,0,24
Labeled as Newborn,0,1,427,0,0,0
Labeled as Not Available,0,1,0,4,0,0
Labeled as Trauma,0,9,0,0,3,0
Labeled as Urgent,89,49,1,0,0,188

Unnamed: 0,Elective,Emergency,Newborn,Not Available,Trauma,Urgent,accuracy,macro avg,weighted avg,logloss
precision,0.770492,0.935484,0.979167,0,0,0.475556,0.882233,0.526783,0.870606,0.359239
recall,0.83432,0.950481,0.988318,0,0,0.327217,0.882233,0.516723,0.882233,0.359239
f1-score,0.801136,0.942923,0.983721,0,0,0.387681,0.882233,0.519244,0.875124,0.359239
support,845.0,2807.0,428.0,5,12,327.0,0.882233,4424.0,4424.0,0.359239

Unnamed: 0,Predicted as Elective,Predicted as Emergency,Predicted as Newborn,Predicted as Not Available,Predicted as Trauma,Predicted as Urgent
Labeled as Elective,705,68,4,0,0,68
Labeled as Emergency,91,2668,1,0,0,47
Labeled as Newborn,2,0,423,0,0,3
Labeled as Not Available,0,5,0,0,0,0
Labeled as Trauma,1,11,0,0,0,0
Labeled as Urgent,116,100,4,0,0,107

Unnamed: 0,Elective,Emergency,Newborn,Not Available,Trauma,Urgent,accuracy,macro avg,weighted avg,logloss
precision,0.582861,0.975762,0.993023,0,0,0.625,0.850136,0.529441,0.872711,0.376179
recall,0.973964,0.889206,0.997664,0,0,0.0458716,0.850136,0.484451,0.850136,0.376179
f1-score,0.729287,0.930475,0.995338,0,0,0.0854701,0.850136,0.456762,0.832289,0.376179
support,845.0,2807.0,428.0,5,12,327.0,0.850136,4424.0,4424.0,0.376179

Unnamed: 0,Predicted as Elective,Predicted as Emergency,Predicted as Newborn,Predicted as Not Available,Predicted as Trauma,Predicted as Urgent
Labeled as Elective,823,16,0,0,0,6
Labeled as Emergency,308,2496,1,0,0,2
Labeled as Newborn,0,0,427,0,0,1
Labeled as Not Available,0,5,0,0,0,0
Labeled as Trauma,1,11,0,0,0,0
Labeled as Urgent,280,30,2,0,0,15
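
The precision, recall, and accuracy figures in the report can be recomputed from a confusion matrix. The sketch below does this for the matrix reported for the best model (logloss 0.221934), with rows as true labels and columns as predicted labels. The function name `per_class_metrics` is introduced here for illustration.

```python
labels = ["Elective", "Emergency", "Newborn", "Not Available", "Trauma", "Urgent"]

# Rows: labeled class. Columns: predicted class. Copied from the report above.
confusion = [
    [749, 54, 0, 1, 0, 41],
    [71, 2711, 1, 0, 0, 24],
    [0, 1, 427, 0, 0, 0],
    [0, 1, 0, 4, 0, 0],
    [0, 9, 0, 0, 3, 0],
    [89, 49, 1, 0, 0, 188],
]


def per_class_metrics(matrix, names):
    """Return {class: (precision, recall)} from a square confusion matrix."""
    metrics = {}
    for k, name in enumerate(names):
        true_positive = matrix[k][k]
        predicted_as_k = sum(row[k] for row in matrix)  # column sum
        labeled_as_k = sum(matrix[k])                   # row sum
        precision = true_positive / predicted_as_k if predicted_as_k else 0.0
        recall = true_positive / labeled_as_k if labeled_as_k else 0.0
        metrics[name] = (precision, recall)
    return metrics


metrics = per_class_metrics(confusion, labels)
n_rows = sum(sum(row) for row in confusion)
accuracy = sum(confusion[k][k] for k in range(len(labels))) / n_rows

for name, (precision, recall) in metrics.items():
    print(f"{name:<14} precision={precision:.6f} recall={recall:.6f}")
print(f"accuracy={accuracy:.6f}")
```

Reading the matrix this way makes the weak spots visible: `Urgent` rows are often predicted as `Elective` or `Emergency`, and most `Trauma` rows are predicted as `Emergency`.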


## Regression 

In [19]:
import numpy as np
import pandas as pd
from supervised.automl import AutoML

# Load the housing data; AGE is the regression target and every other column is a feature.
df = pd.read_csv("https://raw.githubusercontent.com/hantswilliams/HHA-507-2022/main/autoML/datasets/data_regression_housing.csv")
x_cols = [c for c in df.columns if c != "AGE"]
X = df[x_cols]
y = df["AGE"]

In [20]:
df

Unnamed: 0,CRIM,ZN,INDUS,CHAS,NOX,RM,AGE,DIS,RAD,TAX,PTRATIO,B,LSTAT,MEDV
0,0.00632,18.0,2.31,0,0.538,6.575,65.2,4.0900,1,296,15.3,396.90,4.98,24.0
1,0.02731,0.0,7.07,0,0.469,6.421,78.9,4.9671,2,242,17.8,396.90,9.14,21.6
2,0.02729,0.0,7.07,0,0.469,7.185,61.1,4.9671,2,242,17.8,392.83,4.03,34.7
3,0.03237,0.0,2.18,0,0.458,6.998,45.8,6.0622,3,222,18.7,394.63,2.94,33.4
4,0.06905,0.0,2.18,0,,7.147,54.2,6.0622,3,222,18.7,396.90,5.33,36.2
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
501,0.06263,0.0,11.93,0,0.573,6.593,69.1,2.4786,1,273,21.0,391.99,9.67,22.4
502,0.04527,0.0,11.93,0,0.573,6.120,76.7,2.2875,1,273,21.0,396.90,9.08,20.6
503,0.06076,0.0,11.93,0,0.573,6.976,91.0,2.1675,1,273,21.0,396.90,5.64,23.9
504,0.10959,0.0,11.93,0,0.573,6.794,89.3,2.3889,1,273,21.0,393.45,6.48,22.0


In [21]:
x_cols

['CRIM',
 'ZN',
 'INDUS',
 'CHAS',
 'NOX',
 'RM',
 'DIS',
 'RAD',
 'TAX',
 'PTRATIO',
 'B',
 'LSTAT',
 'MEDV']

In [22]:
X

Unnamed: 0,CRIM,ZN,INDUS,CHAS,NOX,RM,DIS,RAD,TAX,PTRATIO,B,LSTAT,MEDV
0,0.00632,18.0,2.31,0,0.538,6.575,4.0900,1,296,15.3,396.90,4.98,24.0
1,0.02731,0.0,7.07,0,0.469,6.421,4.9671,2,242,17.8,396.90,9.14,21.6
2,0.02729,0.0,7.07,0,0.469,7.185,4.9671,2,242,17.8,392.83,4.03,34.7
3,0.03237,0.0,2.18,0,0.458,6.998,6.0622,3,222,18.7,394.63,2.94,33.4
4,0.06905,0.0,2.18,0,,7.147,6.0622,3,222,18.7,396.90,5.33,36.2
...,...,...,...,...,...,...,...,...,...,...,...,...,...
501,0.06263,0.0,11.93,0,0.573,6.593,2.4786,1,273,21.0,391.99,9.67,22.4
502,0.04527,0.0,11.93,0,0.573,6.120,2.2875,1,273,21.0,396.90,9.08,20.6
503,0.06076,0.0,11.93,0,0.573,6.976,2.1675,1,273,21.0,396.90,5.64,23.9
504,0.10959,0.0,11.93,0,0.573,6.794,2.3889,1,273,21.0,393.45,6.48,22.0


In [23]:
y

0      65.2
1      78.9
2      61.1
3      45.8
4      54.2
       ... 
501    69.1
502    76.7
503    91.0
504    89.3
505    80.8
Name: AGE, Length: 506, dtype: float64

In [24]:
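# The folder name keeps the "sparcs" prefix so the download cells below pick it up,
# even though this model is trained on the housing data.
automl = AutoML(results_path="sparcs_regression", mode="Explain")
automl.fit(X, y)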

AutoML directory: sparcs_regression
The task is regression with evaluation metric rmse
AutoML will use algorithms: ['Baseline', 'Linear', 'Decision Tree', 'Random Forest', 'Xgboost', 'Neural Network']
AutoML will ensemble available models
AutoML steps: ['simple_algorithms', 'default_algorithms', 'ensemble']
* Step simple_algorithms will try to check up to 3 models
1_Baseline rmse 30.026682 trained in 1.74 seconds
2_DecisionTree rmse 15.428887 trained in 4.2 seconds
3_Linear rmse 17.141279 trained in 2.73 seconds
* Step default_algorithms will try to check up to 3 models
4_Default_Xgboost rmse 13.683388 trained in 5.63 seconds
5_Default_NeuralNetwork rmse 12.830056 trained in 0.84 seconds
6_Default_RandomForest rmse 14.250642 trained in 7.33 seconds
* Step ensemble will try to check up to 1 model
Ensemble rmse 12.704008 trained in 0.3 seconds
AutoML fit time: 30.63 seconds
AutoML best model: Ensemble


AutoML(results_path='sparcs_regression')

In [25]:
# In-sample predictions: X is the same data the model was fitted on.
df["predictions"] = automl.predict(X)


In [26]:
print("Predictions")
print(df[["AGE", "predictions"]].head())

Predictions
    AGE  predictions
0  65.2    63.169380
1  78.9    52.861020
2  61.1    41.372815
3  45.8    27.006589
4  54.2    42.278704
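
The regression run is scored with rmse, the root mean squared error. As a small worked example, the sketch below applies the formula to the five rows printed above. The `rmse` helper is written here for illustration and is not part of the notebook.

```python
import math

# The five rows printed above: actual AGE and the model's prediction.
actual = [65.2, 78.9, 61.1, 45.8, 54.2]
predicted = [63.169380, 52.861020, 41.372815, 27.006589, 42.278704]


def rmse(y_true, y_pred):
    """Root mean squared error: sqrt(mean((y_true - y_pred)^2))."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


five_row_rmse = rmse(actual, predicted)
print(f"RMSE over the five printed rows: {five_row_rmse:.2f}")
```

This comes to about 17.7 for these five rows. Five rows are far too few to compare with the rmse values in the training log, which are computed on the validation data.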


# Download outputs

In [27]:
# get current working directory
import os
os.getcwd()

'/content'

In [28]:
folders = os.listdir()
foldersML = [x for x in folders if x.startswith('sparcs')]
print(foldersML)

['sparcs_regression', 'sparcs_classification']


In [33]:
!zip -r /content/sparcs.zip /content/sparcs_classification

  adding: content/sparcs_classification/ (stored 0%)
  adding: content/sparcs_classification/README.md (deflated 68%)
  adding: content/sparcs_classification/params.json (deflated 62%)
  adding: content/sparcs_classification/ldb_performance.png (deflated 25%)
  adding: content/sparcs_classification/progress.json (deflated 92%)
  adding: content/sparcs_classification/split_validation_indices.npy (deflated 66%)
  adding: content/sparcs_classification/Ensemble/ (stored 0%)
  adding: content/sparcs_classification/Ensemble/README.md (deflated 75%)
  adding: content/sparcs_classification/Ensemble/roc_curve.png (deflated 9%)
  adding: content/sparcs_classification/Ensemble/status.txt (stored 0%)
  adding: content/sparcs_classification/Ensemble/learning_curves.png (deflated 10%)
  adding: content/sparcs_classification/Ensemble/predictions_ensemble.csv (deflated 60%)
  adding: content/sparcs_classification/Ensemble/confusion_matrix_normalized.png (deflated 11%)
  adding: content/sparcs_classifi

In [34]:
!zip -r /content/sparcs.zip /content/sparcs_regression


  adding: content/sparcs_regression/ (stored 0%)
  adding: content/sparcs_regression/README.md (deflated 69%)
  adding: content/sparcs_regression/params.json (deflated 63%)
  adding: content/sparcs_regression/ldb_performance.png (deflated 25%)
  adding: content/sparcs_regression/progress.json (deflated 90%)
  adding: content/sparcs_regression/split_validation_indices.npy (deflated 69%)
  adding: content/sparcs_regression/Ensemble/ (stored 0%)
  adding: content/sparcs_regression/Ensemble/README.md (deflated 51%)
  adding: content/sparcs_regression/Ensemble/status.txt (stored 0%)
  adding: content/sparcs_regression/Ensemble/learning_curves.png (deflated 12%)
  adding: content/sparcs_regression/Ensemble/predicted_vs_residuals.png (deflated 12%)
  adding: content/sparcs_regression/Ensemble/predictions_ensemble.csv (deflated 48%)
  adding: content/sparcs_regression/Ensemble/true_vs_predicted.png (deflated 11%)
  adding: content/sparcs_regression/Ensemble/ensemble.json (deflated 50%)
  addin
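
The two `!zip` cells above could also be done in Python with the standard-library `zipfile` module, which avoids depending on the shell. The sketch below is one possible version: `zip_folders` is a helper name introduced here, and the demo runs on temporary placeholder folders rather than on the real AutoML output.

```python
import tempfile
import zipfile
from pathlib import Path


def zip_folders(folders, archive_path):
    """Write every file under each folder into one zip archive.

    Paths inside the archive start at the folder name,
    for example 'sparcs_regression/README.md'.
    """
    archive_path = Path(archive_path)
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for folder in map(Path, folders):
            for path in sorted(folder.rglob("*")):
                if path.is_file():
                    archive.write(path, path.relative_to(folder.parent).as_posix())
    return archive_path


# Demo on placeholder folders in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    demo_folders = [root / "sparcs_classification", root / "sparcs_regression"]
    for folder in demo_folders:
        folder.mkdir()
        (folder / "README.md").write_text("placeholder")

    archive_file = zip_folders(demo_folders, root / "sparcs.zip")
    with zipfile.ZipFile(archive_file) as archive:
        archived_names = sorted(archive.namelist())

print(archived_names)
```

In the notebook itself, the equivalent call would be `zip_folders(foldersML, "sparcs.zip")`, using the `foldersML` list built earlier.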