# Customized Image Classification using AutoGluon

In this tutorial, we load images and their labels into [AutoGluon](https://autogluon.mxnet.io/index.html) and use them to train a neural network that can classify new images. In a conventional workflow, we would define the network by hand and then choose the training hyperparameters ourselves. With AutoGluon, a single call to the `fit` function can train models under different hyperparameter configurations and return the one that achieved the highest accuracy.

Note: Please train on a **GPU**. On a CPU, training is slow enough that the script may appear to run indefinitely.

In [1]:
! pip install -q nvidia-ml-py3==7.352.0
! pip install -q torch==1.10.1
! pip install -q torchvision==0.11.2
! pip install -q d2l==0.16.0
! pip install -q numpy==1.19.5
! pip install -q setuptools==54.1.1
! pip install -q wheel==0.36.2
! pip install -q autogluon==0.1.0
! pip install -q timm==0.4.12
! pip install -q gluoncv==0.10.3

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
pathos 0.2.8 requires dill>=0.3.4, but you have dill 0.3.3 which is incompatible.
multiprocess 0.70.12.2 requires dill>=0.3.4, but you have dill 0.3.3 which is incompatible.


Let's import `ImagePredictor`.

In [4]:
from autogluon.vision import ImagePredictor

To train a computer vision model with AutoGluon, we need to organize our data in the following structure:

    data/
    ├── train/
    │   ├── class1/
    │   ├── class2/
    │   ├── class3/
    │   └── ...
    └── test/
        ├── class1/
        ├── class2/
        ├── class3/
        └── ...

Here each subfolder contains all images that belong to that category, e.g., `class1` contains all images belonging to the first class. We generally recommend at least 100 training images per class for reasonable classification performance, but this might depend on the type of images in your specific use-case.
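
As a rough illustration of this layout, the sketch below walks a folder with one subfolder per class, builds `(image_path, label)` pairs, and flags classes that fall short of a minimum image count. The helper names `scan_image_folder` and `classes_below_minimum`, the list of image suffixes, and the rule that label indices follow the sorted folder names are all assumptions made for this example. It is not AutoGluon's loader.

```python
import tempfile
from collections import Counter
from pathlib import Path

# Assumed set of image suffixes for this sketch.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".bmp"}


def scan_image_folder(root):
    """Return (samples, classes) for a folder with one subfolder per class.

    samples is a list of (image_path, label_index) pairs; classes is the
    sorted list of subfolder names, so label_index == classes.index(name).
    """
    root = Path(root)
    classes = sorted(p.name for p in root.iterdir() if p.is_dir())
    samples = []
    for label, name in enumerate(classes):
        for path in sorted((root / name).iterdir()):
            if path.suffix.lower() in IMAGE_EXTENSIONS:
                samples.append((str(path), label))
    return samples, classes


def classes_below_minimum(samples, classes, minimum=100):
    """List the classes that have fewer than `minimum` images."""
    counts = Counter(label for _, label in samples)
    return [name for i, name in enumerate(classes) if counts[i] < minimum]


# Demo on a throwaway directory filled with empty placeholder files.
with tempfile.TemporaryDirectory() as tmp:
    train = Path(tmp) / "train"
    for name, n_images in [("class1", 3), ("class2", 2)]:
        (train / name).mkdir(parents=True)
        for i in range(n_images):
            (train / name / f"img{i}.jpg").touch()
    samples, classes = scan_image_folder(train)
    sparse = classes_below_minimum(samples, classes, minimum=3)

print(classes, len(samples), sparse)
```

On a real dataset, calling `classes_below_minimum(samples, classes)` with the default of 100 gives a quick check against the recommendation above.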

## Download the dataset

For demonstration purposes, we use a subset of the __Shopee-IET__ dataset from [Kaggle](https://www.kaggle.com/competitions). Each image in this data depicts a clothing item and the corresponding label specifies its clothing category. Our subset of the data contains the following possible labels: BabyPants, BabyShirt, womencasualshoes, womenchiffontop.

We download the data subset and create the training and test datasets as shown below. To use your own data, point the loader at your training or test folder, for example: `train_dataset = ImagePredictor.Dataset.from_folder('mydataset/train')`

In [3]:
path = 'https://autogluon.s3.amazonaws.com/datasets/shopee-iet.zip'
train_dataset, _, test_dataset = ImagePredictor.Dataset.from_folders(path)

Downloading /home/ec2-user/.gluoncv/archive/shopee-iet.zip from https://autogluon.s3.amazonaws.com/datasets/shopee-iet.zip...


100%|██████████| 40895/40895 [00:00<00:00, 42922.99KB/s]


data/
├── test/
└── train/


Let's print the training dataset.

In [4]:
print(train_dataset)

                                                 image  label
0    /home/ec2-user/.gluoncv/datasets/shopee-iet/da...      0
1    /home/ec2-user/.gluoncv/datasets/shopee-iet/da...      0
2    /home/ec2-user/.gluoncv/datasets/shopee-iet/da...      0
3    /home/ec2-user/.gluoncv/datasets/shopee-iet/da...      0
4    /home/ec2-user/.gluoncv/datasets/shopee-iet/da...      0
..                                                 ...    ...
795  /home/ec2-user/.gluoncv/datasets/shopee-iet/da...      3
796  /home/ec2-user/.gluoncv/datasets/shopee-iet/da...      3
797  /home/ec2-user/.gluoncv/datasets/shopee-iet/da...      3
798  /home/ec2-user/.gluoncv/datasets/shopee-iet/da...      3
799  /home/ec2-user/.gluoncv/datasets/shopee-iet/da...      3

[800 rows x 2 columns]


## Use AutoGluon to Fit Models

Now, let's fit a __classifier__ using AutoGluon [predictor.fit()](https://auto.gluon.ai/stable/tutorials/image_prediction/beginner.html). Within fit, the dataset is __automatically__ split into training and validation sets. The model with the best hyperparameter configuration is selected based on its performance on the __validation set__.
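
To make the two ideas in this paragraph concrete, here is a minimal sketch of a random holdout split followed by picking the candidate with the highest validation accuracy. The function names, the fixed 10% holdout fraction, and the trial records are assumptions invented for illustration. The fit log in this tutorial reports a 717/83 split, so the library's actual split rule evidently differs from the fixed fraction used here.

```python
import random


def random_split(items, holdout_frac=0.1, seed=0):
    """Shuffle a copy of items and split it into (train, validation)."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_val = max(1, int(len(shuffled) * holdout_frac))
    return shuffled[n_val:], shuffled[:n_val]


def pick_best(trials):
    """Return the trial record with the highest validation accuracy."""
    return max(trials, key=lambda t: t["valid_acc"])


# Stand-in for 800 training samples, identified by index.
train_part, val_part = random_split(range(800), holdout_frac=0.1)

# Made-up trial records purely for illustration; these are not real results.
trials = [
    {"config": {"lr": 0.1, "batch_size": 128}, "valid_acc": 0.50},
    {"config": {"lr": 0.01, "batch_size": 16}, "valid_acc": 0.75},
]
best = pick_best(trials)
print(len(train_part), len(val_part), best["config"])
```

Selecting on validation accuracy rather than training accuracy matters because the validation images were never used to update the model's weights.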

In [5]:
predictor = ImagePredictor()

time_limit = 10 * 60  # how long fit() should run (in seconds)
predictor.fit(
    train_dataset,
    epochs=10,
    time_limit=time_limit,
    ngpus_per_trial=1,
)

INFO:gluoncv.auto.tasks.image_classification:Randomly split train_data into train[717]/validation[83] splits.
INFO:gluoncv.auto.tasks.image_classification:Starting fit without HPO
INFO:ImageClassificationEstimator:modified configs(<old> != <new>): {
INFO:ImageClassificationEstimator:root.train.batch_size 128 != 16
INFO:ImageClassificationEstimator:root.train.lr        0.1 != 0.01
INFO:ImageClassificationEstimator:root.train.epochs    10 != 15
INFO:ImageClassificationEstimator:root.train.rec_val_idx ~/.mxnet/datasets/imagenet/rec/val.idx != auto
INFO:ImageClassificationEstimator:root.train.rec_train_idx ~/.mxnet/datasets/imagenet/rec/train.idx != auto
INFO:ImageClassificationEstimator:root.train.num_training_samples 1281167 != -1
INFO:ImageClassificationEstimator:root.train.rec_val   ~/.mxnet/datasets/imagenet/rec/val.rec != auto
INFO:ImageClassificationEstimator:root.train.rec_train ~/.mxnet/datasets/imagenet/rec/train.rec != auto
INFO:ImageClassificationEstimator:root.train.data_dir  

Downloading /home/ec2-user/.mxnet/models/resnet50_v1b-0ecdba34.zip from https://apache-mxnet.s3-accelerate.dualstack.amazonaws.com/gluon/models/resnet50_v1b-0ecdba34.zip...


100%|██████████| 55344/55344 [00:00<00:00, 68016.02KB/s]
INFO:ImageClassificationEstimator:Start training from [Epoch 0]


[2022-10-19 06:51:41.116 ip-172-16-58-12:13573 INFO utils.py:27] RULE_JOB_STOP_SIGNAL_FILENAME: None
[2022-10-19 06:51:41.116 ip-172-16-58-12:13591 INFO utils.py:27] RULE_JOB_STOP_SIGNAL_FILENAME: None
[2022-10-19 06:51:41.116 ip-172-16-58-12:13600 INFO utils.py:27] RULE_JOB_STOP_SIGNAL_FILENAME: None
[2022-10-19 06:51:41.117 ip-172-16-58-12:13582 INFO utils.py:27] RULE_JOB_STOP_SIGNAL_FILENAME: None
[2022-10-19 06:51:41.248 ip-172-16-58-12:13591 INFO profiler_config_parser.py:111] Unable to find config at /opt/ml/input/config/profilerconfig.json. Profiler is disabled.
[2022-10-19 06:51:41.248 ip-172-16-58-12:13582 INFO profiler_config_parser.py:111] Unable to find config at /opt/ml/input/config/profilerconfig.json. Profiler is disabled.
[2022-10-19 06:51:41.248 ip-172-16-58-12:13573 INFO profiler_config_parser.py:111] Unable to find config at /opt/ml/input/config/profilerconfig.json. Profiler is disabled.
[2022-10-19 06:51:41.248 ip-172-16-58-12:13600 INFO profiler_config_parser.py:111] 
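
The `fit()` call above passes both `epochs` and `time_limit`. One plausible way to think about how two such limits interact is a loop that stops at whichever is reached first. The sketch below is only that mental model: `train_with_budget`, `FakeClock`, and the 90-second epoch duration are hypothetical, and this is not a description of how AutoGluon enforces its time limit internally.

```python
import time


def train_with_budget(train_one_epoch, epochs, time_limit, clock=time.monotonic):
    """Run up to `epochs` epochs, stopping once `time_limit` seconds have passed.

    The budget is checked before each epoch starts, so an epoch already in
    progress is allowed to finish. Returns the number of completed epochs.
    """
    start = clock()
    completed = 0
    for epoch in range(epochs):
        if clock() - start >= time_limit:
            break  # budget exhausted before this epoch could start
        train_one_epoch(epoch)
        completed += 1
    return completed


class FakeClock:
    """Simulated clock so the demo runs instantly."""

    def __init__(self):
        self.now = 0.0

    def __call__(self):
        return self.now


fake = FakeClock()


def fake_epoch(epoch):
    # Pretend each epoch takes 90 seconds of wall-clock time.
    fake.now += 90.0


done = train_with_budget(fake_epoch, epochs=10, time_limit=10 * 60, clock=fake)
print(done)
```

With ten requested epochs of 90 simulated seconds each and a 600-second budget, the loop completes 7 epochs: the seventh starts at 540 seconds and the eighth is never started.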

## Model Results

AutoGluon also provides a summary of the training results, available by calling `predictor.fit_summary()`.

In [6]:
fit_result = predictor.fit_summary()

In [7]:
fit_result

We can read individual results from this summary, for example the training and validation accuracies:

In [2]:
print('Train acc: %.3f, val acc: %.3f' %(fit_result['train_acc'], fit_result['valid_acc']))

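Note: run this cell only after the `fit_summary()` cell. Executed before it, the line fails with `NameError: name 'fit_result' is not defined`.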

The hyperparameters of the best model, such as learning rate, batch size, and number of epochs, can be printed as follows:

In [9]:
fit_result['fit_history']['best_config']

## Making Predictions

We can call the `predict` function on a single image by passing its path:

In [10]:
image_path = test_dataset.iloc[0]['image']
predictor.predict(image_path)

Let's get predictions on the test set.

In [12]:
pred = predictor.predict(test_dataset)
print(pred)
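
Printing the predictions does not by itself say how well the model did. One simple follow-up is to compare predicted labels with the true labels and report the fraction that match. The sketch below assumes you have already extracted two equal-length sequences of label indices; how to pull them out of `pred` and `test_dataset` depends on the returned format, which this tutorial does not show. The helper `top1_accuracy` and the sample values are hypothetical.

```python
def top1_accuracy(predicted, actual):
    """Fraction of positions where the predicted label equals the true label."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("cannot score an empty set")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


# Hypothetical label indices for illustration only (not results from this run).
predicted_labels = [0, 0, 1, 3, 2, 3]
true_labels = [0, 1, 1, 3, 2, 2]
accuracy = top1_accuracy(predicted_labels, true_labels)
print(f"Test acc: {accuracy:.3f}")
```

In this toy example four of the six predictions match, so the printed accuracy is 0.667.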