# AutoML

## Acknowledgements
- AutoKeras documentation - https://autokeras.com/tutorial/text_classification/

## Steps

1. Identify the dataset
   1. IMDB dataset for text classification
   2. MNIST dataset for image classification
2. Identify the AutoML libraries


## Setup

```zsh
# Create a new environment for the experiments
conda activate base
conda create -n automlenv python=3.8 anaconda
conda activate automlenv

# Install AutoKeras with the active environment's own pip
python -m pip install autokeras
```
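Once `automlenv` is active, you can confirm that the packages this notebook imports are installed by querying their versions. The sketch below uses only the standard library (`importlib.metadata`, available from Python 3.8). The helper name `installed_versions` is hypothetical, and the package list is an assumption based on the imports used later in this notebook.

```python
from importlib import metadata


def installed_versions(packages):
    """Hypothetical helper: map each distribution name to its installed version, or None."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Not installed in the currently active environment.
            versions[name] = None
    return versions


# Distribution names assumed from the imports used in this notebook.
print(installed_versions(["autokeras", "tensorflow", "scikit-learn", "numpy"]))
```

Run it with the `automlenv` interpreter. A `None` entry means that package still needs to be installed.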

## Data Preparation

The IMDB movie-review dataset is used for the text classification task.

```python
import os
import numpy as np
import tensorflow as tf
from sklearn.datasets import load_files
```

```python
# Download the archive (cached under ~/.keras/datasets by default) and extract it.
dataset = tf.keras.utils.get_file(
    fname="aclImdb.tar.gz",
    origin="http://ai.stanford.edu/~amaas/data/sentiment/aclImdb_v1.tar.gz",
    extract=True,
)
```

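```python
# Set the path to the extracted dataset.
IMDB_DATADIR = os.path.join(os.path.dirname(dataset), "aclImdb")

# load_files treats each sub-folder as one class. train/ also contains an
# unlabelled "unsup" folder, so only the "pos" and "neg" folders are loaded.
classes = ["pos", "neg"]
train_data = load_files(
    os.path.join(IMDB_DATADIR, "train"), shuffle=True, categories=classes
)
test_data = load_files(
    os.path.join(IMDB_DATADIR, "test"), shuffle=False, categories=classes
)
```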

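How `load_files` numbers the classes matters when reading `y_train`. It appears to sort the sub-folder names and keep only those listed in `categories` before assigning indices from 0. If so, `target_names` is `['neg', 'pos']` regardless of the order in `classes`, which means `neg` maps to 0 and `pos` to 1. The sketch below checks this on a tiny throwaway corpus that mimics the `aclImdb/train` layout. The helper names and toy reviews are made up for illustration.

```python
import os
import tempfile

from sklearn.datasets import load_files


def make_toy_corpus(root):
    """Hypothetical helper: write a tiny aclImdb-like tree with one folder per class."""
    reviews = {
        "pos": ["A wonderful film."],
        "neg": ["A tedious film."],
        "unsup": ["An unlabelled review."],  # mimics the unlabelled folder in aclImdb/train
    }
    for label, texts in reviews.items():
        os.makedirs(os.path.join(root, label))
        for i, text in enumerate(texts):
            path = os.path.join(root, label, f"{i}.txt")
            with open(path, "w", encoding="utf-8") as f:
                f.write(text)


def load_toy(categories):
    """Hypothetical helper: load the toy corpus unshuffled; return (target_names, targets, data)."""
    with tempfile.TemporaryDirectory() as root:
        make_toy_corpus(root)
        # File contents are read eagerly, so they survive the temporary directory's cleanup.
        bunch = load_files(root, categories=categories, shuffle=False)
        return bunch.target_names, bunch.target.tolist(), bunch.data


names, targets, data = load_toy(["pos", "neg"])
print(names)    # ['neg', 'pos']
print(targets)  # [0, 1]
print(data)     # [b'A tedious film.', b'A wonderful film.']
```

With the real data, the same rule means `y_train == 1` marks the positive reviews.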
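```python
# Convert to NumPy arrays: the reviews stay as raw bytes, the labels are integer class indices.
x_train = np.array(train_data.data)
y_train = np.array(train_data.target)
x_test = np.array(test_data.data)
y_test = np.array(test_data.target)
```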

```python
print(x_train.shape)  # (25000,)
print(y_train.shape)  # (25000,)
print(x_train[0][:50])  # first 50 bytes of the first (shuffled) training review
```


```
(25000,)
(25000,)
b'Zero Day leads you to think, even re-think why two'
```


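The `b'...'` prefix shows that the reviews are still raw bytes, and IMDB reviews also contain HTML line breaks (`<br />`). The helper below is a hypothetical sanity check and is not part of the AutoKeras tutorial. It reports the split size, the number of samples per label (via `np.bincount`), and a decoded, lightly cleaned preview of the first review. The toy arrays in the usage lines are made up.

```python
import numpy as np


def preview_split(x, y, n_chars=50):
    """Hypothetical helper: size, per-label counts and a readable preview of one split."""
    first = x[0]
    # load_files returns bytes; decode for display (undecodable bytes become U+FFFD).
    if isinstance(first, bytes):
        text = first.decode("utf-8", errors="replace")
    else:
        text = str(first)
    # Replace the HTML line breaks that appear inside IMDB reviews.
    text = text.replace("<br />", " ")
    return {
        "n_samples": len(x),
        "class_counts": np.bincount(y).tolist(),  # index 0 = neg, 1 = pos
        "preview": text[:n_chars],
    }


# Toy data standing in for x_train / y_train.
x_toy = np.array([b"Great movie.<br />Loved it", b"Dull."])
y_toy = np.array([1, 0])
print(preview_split(x_toy, y_toy))
```

On the real arrays, `preview_split(x_train, y_train)` should report 25,000 samples. The IMDB training split is documented as balanced, so the two class counts are expected to be equal.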
## AutoKeras Implementation

```python
import autokeras as ak
```

```
Using TensorFlow backend
```


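```python
# Initialize the text classifier.
# max_trials=1: try only one candidate model, as a quick demo.
# overwrite=True: discard results of any previous search in the project directory.
clf = ak.TextClassifier(overwrite=True, max_trials=1)
```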

```python
# Feed the text classifier with training data.
clf.fit(x_train, y_train, epochs=2)
# Predict with the best model.
predicted_y = clf.predict(x_test)
```
```
Trial 1 Complete [00h 00m 53s]
val_loss: 0.27512702345848083

Best val_loss So Far: 0.27512702345848083
Total elapsed time: 00h 00m 53s
Epoch 1/2
Epoch 2/2
INFO:tensorflow:Assets written to: ./text_classifier/best_model/assets
```


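```python
# Evaluate the best model with testing data.
print(clf.evaluate(x_test, y_test))
```

```
[0.27151361107826233, 0.8902000188827515]
```

The returned list is `[loss, accuracy]`: on the test set, the best model reached a loss of about 0.272 and an accuracy of about 89.0%.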

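As a cross-check on `clf.evaluate`, you can recompute the accuracy directly from `predicted_y`, along with a confusion matrix that shows which class the errors fall into. This is a hedged NumPy sketch. The helper name `binary_report` is hypothetical. The exact dtype and shape returned by `clf.predict` may vary between AutoKeras versions (for example, an `(n, 1)` array of label strings), so both inputs are flattened and cast to integers first.

```python
import numpy as np


def binary_report(predicted, y_true):
    """Hypothetical helper: accuracy and 2x2 confusion matrix for 0/1 labels."""
    # Flatten (n, 1) -> (n,) and cast, since predictions may come back as strings like "1".
    pred = np.asarray(predicted).ravel().astype(int)
    true = np.asarray(y_true).ravel().astype(int)
    if pred.shape != true.shape:
        raise ValueError("predictions and labels must have the same length")
    accuracy = float(np.mean(pred == true))
    # Rows are true labels, columns predicted labels (0 = neg, 1 = pos).
    confusion = np.zeros((2, 2), dtype=int)
    np.add.at(confusion, (true, pred), 1)
    return accuracy, confusion


# Toy example; in the notebook this would be binary_report(predicted_y, y_test).
acc, cm = binary_report(np.array([["1"], ["0"], ["1"], ["1"]]), np.array([1, 0, 0, 1]))
print(acc)  # 0.75
print(cm)
```

If `predict` returns class labels, this accuracy should match the second value printed by `clf.evaluate`, up to rounding.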