# LECTURE: AutoML with Neural Networks in AutoKeras

**NOTE: Use the conda tensorflow environment.**

First, install AutoKeras.

In [None]:
! pip install autokeras

Looking in indexes: https://pypi.org/simple, https://pip.repos.neuron.amazonaws.com


In [None]:
! pip install git+https://github.com/keras-team/keras-tuner.git

Looking in indexes: https://pypi.org/simple, https://pip.repos.neuron.amazonaws.com
Collecting git+https://github.com/keras-team/keras-tuner.git
  Cloning https://github.com/keras-team/keras-tuner.git to /tmp/pip-req-build-ixwmn2_2
  Running command git clone --filter=blob:none --quiet https://github.com/keras-team/keras-tuner.git /tmp/pip-req-build-ixwmn2_2
  Resolved https://github.com/keras-team/keras-tuner.git to commit 55a5f02842071b62f644954ca16e8b0e3ffee168
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done


Import necessary libraries

In [None]:
import pandas as pd
from sklearn.model_selection import train_test_split
import autokeras as ak
from sklearn.metrics import r2_score
import boto3

Create an S3 resource object for accessing the data bucket

In [None]:
s3 = boto3.resource('s3')

Load the dataset **flights.csv** from the bucket. The target column, `price`, is numeric, which makes this demo a regression problem.

In [None]:
dataset_name = 'flights.csv'
bucket_data_name = 'bah-data'
data_location = 's3://{}/{}'.format(bucket_data_name, dataset_name)

data = pd.read_csv(data_location)
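
Note that `pd.read_csv` resolves `s3://` paths through the optional `s3fs` package rather than through a `boto3` resource, so the URI string is all it needs. The following is a minimal sketch of how such a URI is assembled and split back into bucket and key, using only the standard library. The helper names `build_s3_uri` and `split_s3_uri` are hypothetical and are not part of pandas or boto3.

```python
from urllib.parse import urlparse


def build_s3_uri(bucket: str, key: str) -> str:
    """Join a bucket name and an object key into an s3:// URI (hypothetical helper)."""
    return f"s3://{bucket}/{key}"


def split_s3_uri(uri: str) -> tuple:
    """Split an s3:// URI back into (bucket, key) (hypothetical helper)."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3":
        raise ValueError(f"not an S3 URI: {uri}")
    # netloc holds the bucket; the path holds the key with a leading slash
    return parsed.netloc, parsed.path.lstrip("/")


data_location = build_s3_uri("bah-data", "flights.csv")
bucket, key = split_s3_uri(data_location)
print(data_location, bucket, key)
```

Splitting the URI this way can be handy if the same object later needs to be addressed by bucket and key instead of by URI.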

In [None]:
data.head()

|  | Unnamed: 0 | airline | flight | source_city | departure_time | stops | arrival_time | destination_city | class | duration | days_left | price |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | SpiceJet | SG-8709 | Delhi | Evening | zero | Night | Mumbai | Economy | 2.17 | 1 | 5953 |
| 1 | 1 | SpiceJet | SG-8157 | Delhi | Early_Morning | zero | Morning | Mumbai | Economy | 2.33 | 1 | 5953 |
| 2 | 2 | AirAsia | I5-764 | Delhi | Early_Morning | zero | Early_Morning | Mumbai | Economy | 2.17 | 1 | 5956 |
| 3 | 3 | Vistara | UK-995 | Delhi | Morning | zero | Afternoon | Mumbai | Economy | 2.25 | 1 | 5955 |
| 4 | 4 | Vistara | UK-963 | Delhi | Morning | zero | Morning | Mumbai | Economy | 2.33 | 1 | 5955 |


In [None]:
data.shape

(300153, 12)

In [None]:
data.isnull().any()

Unnamed: 0          False
airline             False
flight              False
source_city         False
departure_time      False
stops               False
arrival_time        False
destination_city    False
class               False
duration            False
days_left           False
price               False
dtype: bool

In [None]:
data.dtypes

Unnamed: 0            int64
airline              object
flight               object
source_city          object
departure_time       object
stops                object
arrival_time         object
destination_city     object
class                object
duration            float64
days_left             int64
price                 int64
dtype: object

In [None]:
data.nunique()

Unnamed: 0          300153
airline                  6
flight                1561
source_city              6
departure_time           6
stops                    3
arrival_time             6
destination_city         6
class                    2
duration               476
days_left               49
price                12157
dtype: int64

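Drop the unwanted column `Unnamed: 0`. It has one unique value per row, so it is a leftover row index with no predictive value.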

In [None]:
data.drop(['Unnamed: 0'], axis=1, inplace=True)

In [None]:
y = data['price']
X = data.drop(['price'], axis=1)

Create the train/test split for prediction

In [None]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

Check the shapes of the split

In [None]:
print('X_train:', X_train.shape)
print('y_train:', y_train.shape)
print('X_test:', X_test.shape)
print('y_test:', y_test.shape)

X_train: (240122, 10)
y_train: (240122,)
X_test: (60031, 10)
y_test: (60031,)
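
The shapes above can be reproduced by hand. The sketch below assumes that `train_test_split` rounds the test share up (ceiling) and gives the remaining rows to the training set when only `test_size` is passed. The function name `split_sizes` is hypothetical and serves only as an illustration.

```python
import math


def split_sizes(n_samples: int, test_size: float) -> tuple:
    """Return (n_train, n_test), assuming the test share is rounded up."""
    n_test = math.ceil(test_size * n_samples)
    n_train = n_samples - n_test
    return n_train, n_test


# 300153 rows in total, 20% held out for testing
n_train, n_test = split_sizes(300153, 0.2)
print(n_train, n_test)  # 240122 60031
```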


Set hyperparameters

In [None]:
EPOCHS = 10
MAX_TRIALS = 3
VALIDATION_SPLIT = 0.1
METRIC = 'mae'

Initialize the StructuredDataRegressor

In [None]:
regression_model = ak.StructuredDataRegressor(max_trials=MAX_TRIALS, 
                                              overwrite=True,
                                              metrics=[METRIC])

Search for the best model and fit it

In [None]:
regression_model.fit(X_train.to_numpy(), 
                     y_train.to_numpy(),
                     validation_split=VALIDATION_SPLIT,
                     epochs=EPOCHS, 
                     verbose=1)

Trial 1 Complete [00h 02m 16s]
val_loss: 36169544.0

Best val_loss So Far: 36169544.0
Total elapsed time: 00h 02m 16s
INFO:tensorflow:Oracle triggered exit
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5




INFO:tensorflow:Assets written to: ./structured_data_regressor/best_model/assets




<keras.callbacks.History at 0x7fe51029afb0>
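
The reported `val_loss` is hard to read on its own. Assuming the regressor is using mean squared error as its loss (believed to be the AutoKeras default for regression, but an assumption here since no loss was passed explicitly), taking the square root gives an error in the same units as `price`. A minimal sketch:

```python
import math

# Best val_loss reported by the search above
val_loss = 36169544.0

# Assumption: the loss is mean squared error, so its square root is the RMSE
rmse = math.sqrt(val_loss)
print(f"Validation RMSE: about {rmse:.0f} price units")
```

Under that assumption, a typical validation error is on the order of a few thousand price units.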

Evaluate the model

In [None]:
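# Pass NumPy arrays, matching the format used in fit()
regression_model.evaluate(X_test.to_numpy(), y_test.to_numpy())
# Both arrays are multiplied by the same constant, so the printed values are
# 100000x the dataset's price units; R² is unaffected by this rescaling
predicted = regression_model.predict(X_test.to_numpy()).flatten() * 100000
real = y_test.to_numpy() * 100000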



Sanity check of results

In [None]:
for i in range(15):
    print('Predicted price:', predicted[i].round(3))
    print('Real price:', real[i].round(0))
    print('')

Predicted price: 480583349.609
Real price: 736600000

Predicted price: 6051760156.25
Real price: 6483100000

Predicted price: 779147021.484
Real price: 619500000

Predicted price: 5679044921.875
Real price: 6016000000

Predicted price: 663460449.219
Real price: 657800000

Predicted price: 828379394.531
Real price: 455500000

Predicted price: 3467793750.0
Real price: 2383800000

Predicted price: 670370019.531
Real price: 386000000

Predicted price: 5027532812.5
Real price: 3223000000

Predicted price: 5919110156.25
Real price: 7684100000

Predicted price: 5617348828.125
Real price: 3809900000

Predicted price: 4977090234.375
Real price: 6050800000

Predicted price: 363914916.992
Real price: 247700000

Predicted price: 543122802.734
Real price: 722000000

Predicted price: 3460648437.5
Real price: 3285900000
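
Since the search tracks `mae` as its metric, it can help to compute the same quantity on a few of the printed pairs. The sketch below takes the first five pairs from the output above, divides by the factor of 100000 applied in the evaluation cell to get back to the dataset's price units, and computes the mean absolute error in plain Python. Five rows are far too few to estimate the test-set MAE; this only illustrates the calculation.

```python
SCALE = 100000  # factor applied to both arrays in the evaluation cell

# First five pairs copied from the printed output above
predicted_scaled = [480583349.609, 6051760156.25, 779147021.484,
                    5679044921.875, 663460449.219]
real_scaled = [736600000, 6483100000, 619500000, 6016000000, 657800000]

# Undo the scaling to recover the dataset's price units
predicted_price = [p / SCALE for p in predicted_scaled]
real_price = [r / SCALE for r in real_scaled]

# Absolute error per row, then the mean over the sample
abs_errors = [abs(r - p) for r, p in zip(real_price, predicted_price)]
mae = sum(abs_errors) / len(abs_errors)

for r, p, e in zip(real_price, predicted_price, abs_errors):
    print(f"real={r:.0f}  predicted={p:.2f}  abs_error={e:.2f}")
print(f"MAE over these {len(abs_errors)} rows: {mae:.2f}")
```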



In [None]:
r2_score(real, predicted).round(3)

0.93
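
To make the score less of a black box, here is a minimal from-scratch sketch of the coefficient of determination, $R^2 = 1 - SS_{res} / SS_{tot}$. It uses the last five pairs printed in the sanity check, converted back to the dataset's price units, and it also shows that multiplying both arrays by 100000 leaves the score unchanged. The function name `r2` is hypothetical, and a five-row sample will not reproduce the full test-set score.

```python
import math


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Last five pairs from the sanity check, divided by 100000
real_price = [38099, 60508, 2477, 7220, 32859]
predicted_price = [56173.48828125, 49770.90234375, 3639.14916992,
                   5431.22802734, 34606.484375]

score = r2(real_price, predicted_price)

# Rescaling both arrays by the same constant does not change R²
score_scaled = r2([v * 100000 for v in real_price],
                  [v * 100000 for v in predicted_price])

print(round(score, 3), math.isclose(score, score_scaled, rel_tol=1e-9))
```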

Let's check the model summary.

First, we export the best model as a Keras model

In [None]:
keras_model = regression_model.export_model()

Now we ask for the model summary:

In [None]:
keras_model.summary()

Model: "model"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 input_1 (InputLayer)        [(None, 10)]              0         
                                                                 
 multi_category_encoding (Mu  (None, 10)               0         
 ltiCategoryEncoding)                                            
                                                                 
 normalization (Normalizatio  (None, 10)               21        
 n)                                                              
                                                                 
 dense (Dense)               (None, 32)                352       
                                                                 
 re_lu (ReLU)                (None, 32)                0         
                                                                 
 dense_1 (Dense)             (None, 32)                1056  