# Aim x Ludwig

<a href="https://user-images.githubusercontent.com/13848158/154338760-edfe1885-06f3-4e02-87fe-4b13a403516b.png"><img src="https://user-images.githubusercontent.com/13848158/154338760-edfe1885-06f3-4e02-87fe-4b13a403516b.png" title="source: imgur.com" /></a>

Ludwig is a toolbox built on top of TensorFlow that allows users to train and test deep learning models without the need to write code.

All you need to provide is a dataset file containing your data, a list of columns to use as inputs, and a list of columns to use as outputs; Ludwig does the rest. Simple commands let you train models locally or in a distributed way, and use them to predict on new data.

A programmatic API is also available for using Ludwig from your Python code, and a suite of visualization tools lets you analyze models' training and test performance and compare models.
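A minimal sketch of that API, assuming Ludwig 0.5 (the version this notebook runs): `LudwigModel` takes the same config the CLI reads from YAML, here copied from the base config that Ludwig prints in the training log further down. `train_and_predict` and its CSV path arguments are hypothetical.

```python
# Same content as the base config printed in this notebook's training log.
config = {
    "input_features": [
        {"name": "Title", "type": "text"},
        {"name": "Description", "type": "text"},
    ],
    "output_features": [{"name": "ClassIndex", "type": "category"}],
    "trainer": {"epochs": 20},
}


def train_and_predict(train_csv, predict_csv):
    """Train a Ludwig model on train_csv and return its predictions for predict_csv."""
    # Imported lazily so the config above can be inspected without Ludwig installed.
    from ludwig.api import LudwigModel

    model = LudwigModel(config)
    # train() returns (training statistics, preprocessed data, output directory).
    model.train(dataset=train_csv)
    # predict() returns (predictions DataFrame, output directory).
    predictions, _ = model.predict(dataset=predict_csv)
    return predictions
```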

### Why use Aim

`Aim` is an open-source, self-hosted ML experiment tracking tool. It handles thousands of training runs and lets you compare them in a fast, polished UI.

Besides the Aim UI, you can use the Aim SDK to query your runs' metadata programmatically. This is especially useful for automation and for further analysis in a Jupyter notebook.
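As a sketch of that SDK usage, the function below pulls every tracked series with a given metric name out of the repo this notebook later opens with `aim up`. It follows the query pattern from Aim 3.x's SDK documentation (`Repo`, `query_metrics(...).iter_runs()`, `metric.values.sparse_numpy()`). The metric name `"loss"` is an assumption, so check which names Ludwig actually logs in the Aim UI; `collect_metric` is a hypothetical helper.

```python
REPO_PATH = "results/Classification_run"  # same repo that `aim up --repo` opens later


def collect_metric(repo_path=REPO_PATH, metric_name="loss"):
    """Return (run hash, context, steps, values) for every series named metric_name."""
    # Imported lazily so this cell can be defined even where Aim is not installed.
    from aim import Repo

    repo = Repo(repo_path)
    query = f"metric.name == '{metric_name}'"
    series = []
    for run_metrics in repo.query_metrics(query).iter_runs():
        for metric in run_metrics:
            # Recorded steps and values, returned as a pair of NumPy arrays.
            steps, values = metric.values.sparse_numpy()
            series.append((metric.run.hash, metric.context, steps, values))
    return series
```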

Aim's mission is to democratize AI dev tools.

### Using Aim with Ludwig

Aim integrates directly with Ludwig's internals (the training and inference pipelines), so you can track your experiments in fine-grained detail.
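Ludwig's built-in Aim callback is not shown in this notebook, so the following is only a hypothetical sketch of per-split tracking with Aim's `Run.track(value, name, step, epoch, context=...)`. It turns an evaluation table like the ones Ludwig prints during training into one tracked value per metric and split. The helper names and the `subset` context key are our own choices, not Ludwig's.

```python
# Values copied from the first evaluation table in the training log (step 6, epoch 2).
EVAL_TABLE = {
    "train": {"loss": 0.4670, "accuracy": 0.7416},
    "vali": {"loss": 0.5670, "accuracy": 0.7143},
    "test": {"loss": 0.5450, "accuracy": 0.7460},
}


def eval_table_to_track_calls(table, step, epoch):
    """Flatten {split: {metric: value}} into keyword arguments for aim.Run.track."""
    calls = []
    for split, metrics in table.items():
        for name, value in metrics.items():
            calls.append({"value": value, "name": name, "step": step,
                          "epoch": epoch, "context": {"subset": split}})
    return calls


def track_eval_table(run, table, step, epoch):
    """Send every metric to an aim.Run (or any object with the same track() method)."""
    for kwargs in eval_table_to_track_calls(table, step, epoch):
        run.track(**kwargs)
```

With Aim installed, `track_eval_table(aim.Run(experiment="Classification"), EVAL_TABLE, step=6, epoch=2)` would record six values, which the Aim UI can then group or filter by the `subset` context.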

Training with Aim tracking only takes the `--aim` flag:
`ludwig train --dataset DATA_PATH --config_file CONFIG_PATH --aim`
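When launching runs from a script rather than a notebook cell, the same command can be assembled and run with `subprocess`. This sketch uses the flags from the training cell executed later in this notebook (`--config`, `-g`, `--aim`, `--experiment_name`); flag spellings may differ between Ludwig versions, so check `ludwig train --help`. `build_train_command` and `run_training` are hypothetical helpers.

```python
import shlex
import subprocess


def build_train_command(dataset, config, experiment_name=None, gpus=None, aim=True):
    """Assemble a `ludwig train` invocation as an argument list."""
    cmd = ["ludwig", "train", "--dataset", dataset, "--config", config]
    if gpus is not None:
        cmd += ["-g", str(gpus)]
    if aim:
        cmd.append("--aim")  # enable Aim tracking
    if experiment_name:
        cmd += ["--experiment_name", experiment_name]
    return cmd


def run_training(**kwargs):
    """Run the command; raises CalledProcessError if training exits non-zero."""
    cmd = build_train_command(**kwargs)
    print("Running:", shlex.join(cmd))
    subprocess.run(cmd, check=True)
```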

# Installation and demo

In [12]:
# ! pip install git+https://github.com/uber/ludwig.git -qq
# ! pip install ludwig[serve] -qq
# ! pip install aim -qq 

In [13]:
!which python

/home/erik/anaconda3/envs/ludwig/bin/python


In [14]:
import tensorflow as tf
import pandas as pd

# Fail fast if TensorFlow cannot see a GPU.
device_name = tf.test.gpu_device_name()
if device_name != '/device:GPU:0':
    raise SystemError('GPU device not found')
print(f'Found GPU at: {device_name}')

Found GPU at: /device:GPU:0



In [15]:
from tensorflow.python.client import device_lib
def get_available_devices():
    local_device_protos = device_lib.list_local_devices()
    return [x.name for x in local_device_protos]
print(get_available_devices()) 

['/device:CPU:0', '/device:XLA_CPU:0', '/device:XLA_GPU:0', '/device:GPU:0']



### Text Classification

Text classification, also known as text tagging or text categorization, is the process of sorting text into organized groups. Using Natural Language Processing (NLP), a text classifier can automatically analyze a piece of text and assign it predefined tags or categories based on its content.

Unstructured text is everywhere (emails, chat conversations, websites, social media), but it is hard to extract value from it unless it is organized in some way. Organizing it used to be difficult and expensive, because it meant either sorting the data by hand or writing handcrafted rules that are hard to maintain.

Let's build a text classifier using Ludwig.

### Kaggle's AGNews Dataset
AG is a collection of more than 1 million news articles, gathered from more than 2000 news sources by ComeToMyHead over more than a year of activity. ComeToMyHead is an academic news search engine that has been running since July 2004. The dataset is provided by the academic community for research purposes in data mining (clustering, classification, etc.), information retrieval (ranking, search, etc.), XML, data compression, data streaming, and any other non-commercial activity. More information is available at http://www.di.unipi.it/~gulli/AG_corpus_of_news_articles.html

The articles are divided into 4 classes:
```
World
Sports
Business
Sci/Tech
```
Let's download the dataset.

Note that we train on only a small sample of the dataset, which is enough to demonstrate the tracking functionality; training on the entire dataset works the same way.
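The notebook does not show how `final_train.csv` was produced. The split sizes Ludwig reports during training (209 + 28 + 63) add up to 300 rows, so one plausible approach, sketched here with the standard library, is to draw a fixed-size random sample and rename the class column to match the config. The Kaggle header `Class Index,Title,Description`, the 300-row default, and the seed are assumptions, and both functions are hypothetical helpers.

```python
import csv
import random

FIELDS = ["ClassIndex", "Title", "Description"]


def sample_agnews_rows(rows, n=300, seed=42):
    """Randomly pick up to n rows and rename "Class Index" to "ClassIndex"."""
    rng = random.Random(seed)
    picked = rng.sample(rows, min(n, len(rows)))
    return [
        {"ClassIndex": r["Class Index"], "Title": r["Title"], "Description": r["Description"]}
        for r in picked
    ]


def sample_agnews(src_path, dst_path, n=300, seed=42):
    """Read the Kaggle CSV at src_path and write the sample to dst_path."""
    with open(src_path, newline="", encoding="utf-8") as f:
        rows = list(csv.DictReader(f))
    sample = sample_agnews_rows(rows, n, seed)
    with open(dst_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(sample)
    return len(sample)
```

For example, `sample_agnews("train.csv", "final_train.csv")` would write a 300-row file that Ludwig can then split into training, validation, and test sets.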

In [16]:
import aim

In [17]:
id_to_label = {
    1: 'World', 2: 'Sports', 3: 'Business', 4: 'Sci/Tech'
}
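The notebook defines `id_to_label` but does not use it in the cells shown. One natural use is turning numeric `ClassIndex` values, or a model's predicted indices, back into readable names. `decode_labels` is a hypothetical helper, and the snippet re-declares the mapping so it runs on its own.

```python
id_to_label = {1: 'World', 2: 'Sports', 3: 'Business', 4: 'Sci/Tech'}


def decode_labels(class_indices):
    """Map AG News class indices (ints or numeric strings) to readable names."""
    # int() accepts both 3 and '3', since CSV-loaded categories are often strings.
    return [id_to_label[int(i)] for i in class_indices]
```

If Ludwig's predictions DataFrame has a `ClassIndex_predictions` column (the `<feature>_predictions` naming is an assumption here), `decode_labels(predictions['ClassIndex_predictions'])` would give the class names.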

## Experiment Tracking
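The training command below reads its model definition from `train_conf.yaml`, which the notebook does not show. Judging from the base config that Ludwig echoes in the training log, a file like the following should be equivalent. The content is reconstructed from that log line only, and `write_train_conf` is a hypothetical helper.

```python
from pathlib import Path

# Reconstructed from the "base config" line in the training log; settings not
# listed here are assumed to fall back to Ludwig's defaults.
TRAIN_CONF = """\
input_features:
  - name: Title
    type: text
  - name: Description
    type: text
output_features:
  - name: ClassIndex
    type: category
trainer:
  epochs: 20
"""


def write_train_conf(path="train_conf.yaml"):
    """Write the config so `ludwig train --config train_conf.yaml` can find it."""
    target = Path(path)
    target.write_text(TRAIN_CONF, encoding="utf-8")
    return target
```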


### Train 
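The `ludwig train` command trains a model on your data. Here it reads the dataset from `final_train.csv` and the model config from `train_conf.yaml`, runs on GPU 0 (`-g 0`), and logs the run to Aim (`--aim`) under the experiment name `Classification`: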


In [18]:
!ludwig train --dataset final_train.csv --config train_conf.yaml -g 0 \
--aim --experiment_name "Classification"


NumExpr defaulting to 8 threads.
2022-06-03 18:19:11.469307: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
ray.init() failed: Could not find any running Ray instance. Please specify the one to connect to by setting `--address` flag or `RAY_ADDRESS` environment variable.
███████████████████████
█ █ █ █  ▜█ █ █ █ █   █
█ █ █ █ █ █ █ █ █ █ ███
█ █   █ █ █ █ █ █ █ ▌ █
█ █████ █ █ █ █ █ █ █ █
█     █  ▟█     █ █   █
███████████████████████
ludwig v0.5rc2 - Train


╒════════════════════════╕
│ EXPERIMENT DESCRIPTION │
╘════════════════════════╛

╒══════════════════╤═════════════════════════════════════════════════════════════════════════════════╕
│ Experiment name  │ Classification                                                                  │
├──────────────────┼─────────────────────────────────────────────────────────────────────────────────┤
│ Model name       │ run                                                 

Building dataset: DONE
Writing preprocessed training set cache
Writing preprocessed test set cache
Writing preprocessed validation set cache
Writing train set metadata

Dataset sizes:
╒════════════╤════════╕
│ Dataset    │   Size │
╞════════════╪════════╡
│ Training   │    209 │
├────────────┼────────┤
│ Validation │     28 │
├────────────┼────────┤
│ Test       │     63 │
╘════════════╧════════╛
aim.on_train_init() called...
base config  {'input_features': [{'name': 'Title', 'type': 'text'}, {'name': 'Description', 'type': 'text'}], 'output_features': [{'name': 'ClassIndex', 'type': 'category'}], 'trainer': {'epochs': 20}}
experiment directory  /home/erik/Documents/UCPH/aim_projs/aim-ludwig-demo/results/Classification_run
experiment name  Classification
model name  run
output directory  /home/erik/Documents/UCPH/aim_projs/aim-ludwig-demo/results/Classification_run
{'name': 'run', 'dir': '/home/erik/Documents/UCPH/aim_projs/aim-ludwig-demo/results/Classification_run'}


Training:  12%|████▎                             | 5/40 [00:01<00:09,  3.62it/s]
Running evaluation for step: 6, epoch: 2
Evaluation train: 100%|███████████████████████████| 2/2 [00:00<00:00, 34.81it/s]
Evaluation vali : 100%|███████████████████████████| 1/1 [00:00<00:00, 95.87it/s]
Evaluation test : 100%|███████████████████████████| 1/1 [00:00<00:00, 59.33it/s]
╒══════════════╤════════╤════════════╕
│ ClassIndex   │   loss │   accuracy │
╞══════════════╪════════╪════════════╡
│ train        │ 0.4670 │     0.7416 │
├──────────────┼────────┼────────────┤
│ vali         │ 0.5670 │     0.7143 │
├──────────────┼────────┼────────────┤
│ test         │ 0.5450 │     0.7460 │
╘══════════════╧════════╧════════════╛
╒════════════╤════════╕
│ combined   │   loss │
╞════════════╪════════╡
│ train      │ 0.4670 │
├────────────┼────────┤
│ vali       │ 0.5670 │
├────────────┼────────┤
│ test       │ 0.5450 │
╘════════════╧════════╛
Validation loss on combined improved, model saved.

END OF EPOCH

Training:  32%|██████████▋                      | 13/40 [00:03<00:07,  3.72it/s]
Running evaluation for step: 14, epoch: 6
Evaluation train: 100%|███████████████████████████| 2/2 [00:00<00:00, 36.66it/s]
Evaluation vali : 100%|██████████████████████████| 1/1 [00:00<00:00, 124.55it/s]
Evaluation test : 100%|███████████████████████████| 1/1 [00:00<00:00, 62.71it/s]
╒══════════════╤════════╤════════════╕
│ ClassIndex   │   loss │   accuracy │
╞══════════════╪════════╪════════════╡
│ train        │ 0.1965 │     0.9904 │
├──────────────┼────────┼────────────┤
│ vali         │ 0.4860 │     0.6429 │
├──────────────┼────────┼────────────┤
│ test         │ 0.5091 │     0.7143 │
╘══════════════╧════════╧════════════╛
╒════════════╤════════╕
│ combined   │   loss │
╞════════════╪════════╡
│ train      │ 0.1965 │
├────────────┼────────┤
│ vali       │ 0.4860 │
├────────────┼────────┤
│ test       │ 0.5091 │
╘════════════╧════════╛
Validation loss on combined improved, model saved.

END OF EPOCH

Training:  52%|█████████████████▎               | 21/40 [00:05<00:05,  3.55it/s]
Running evaluation for step: 22, epoch: 10
Evaluation train: 100%|███████████████████████████| 2/2 [00:00<00:00, 38.10it/s]
Evaluation vali : 100%|██████████████████████████| 1/1 [00:00<00:00, 103.80it/s]
Evaluation test : 100%|███████████████████████████| 1/1 [00:00<00:00, 60.87it/s]
╒══════════════╤════════╤════════════╕
│ ClassIndex   │   loss │   accuracy │
╞══════════════╪════════╪════════════╡
│ train        │ 0.0026 │     1.0000 │
├──────────────┼────────┼────────────┤
│ vali         │ 0.3737 │     0.8214 │
├──────────────┼────────┼────────────┤
│ test         │ 0.3858 │     0.8095 │
╘══════════════╧════════╧════════════╛
╒════════════╤════════╕
│ combined   │   loss │
╞════════════╪════════╡
│ train      │ 0.0026 │
├────────────┼────────┤
│ vali       │ 0.3737 │
├────────────┼────────┤
│ test       │ 0.3858 │
╘════════════╧════════╛
Validation loss on combined improved, model saved.

END OF EPOCH

Training:  72%|███████████████████████▉         | 29/40 [00:07<00:02,  4.27it/s]
Running evaluation for step: 30, epoch: 14
Evaluation train: 100%|███████████████████████████| 2/2 [00:00<00:00, 38.72it/s]
Evaluation vali : 100%|███████████████████████████| 1/1 [00:00<00:00, 81.70it/s]
Evaluation test : 100%|███████████████████████████| 1/1 [00:00<00:00, 61.66it/s]
╒══════════════╤════════╤════════════╕
│ ClassIndex   │   loss │   accuracy │
╞══════════════╪════════╪════════════╡
│ train        │ 0.0001 │     1.0000 │
├──────────────┼────────┼────────────┤
│ vali         │ 0.4413 │     0.8571 │
├──────────────┼────────┼────────────┤
│ test         │ 0.4002 │     0.8095 │
╘══════════════╧════════╧════════════╛
╒════════════╤════════╕
│ combined   │   loss │
╞════════════╪════════╡
│ train      │ 0.0001 │
├────────────┼────────┤
│ vali       │ 0.4413 │
├────────────┼────────┤
│ test       │ 0.4002 │
╘════════════╧════════╛
Last improvement of combined validation loss happened 8 step(s) ago.

Training:  92%|██████████████████████████████▌  | 37/40 [00:09<00:00,  4.64it/s]
Running evaluation for step: 38, epoch: 18
Evaluation train: 100%|███████████████████████████| 2/2 [00:00<00:00, 35.88it/s]
Evaluation vali : 100%|███████████████████████████| 1/1 [00:00<00:00, 66.57it/s]
Evaluation test : 100%|███████████████████████████| 1/1 [00:00<00:00, 59.98it/s]
╒══════════════╤════════╤════════════╕
│ ClassIndex   │   loss │   accuracy │
╞══════════════╪════════╪════════════╡
│ train        │ 0.0000 │     1.0000 │
├──────────────┼────────┼────────────┤
│ vali         │ 0.5443 │     0.8571 │
├──────────────┼────────┼────────────┤
│ test         │ 0.4768 │     0.8413 │
╘══════════════╧════════╧════════════╛
╒════════════╤════════╕
│ combined   │   loss │
╞════════════╪════════╡
│ train      │ 0.0000 │
├────────────┼────────┤
│ vali       │ 0.5443 │
├────────────┼────────┤
│ test       │ 0.4768 │
╘════════════╧════════╛
Last improvement of combined validation loss happened 16 step(s) ago.

All of these training details are also logged to Aim. To explore them in the Aim dashboard, start the Aim UI on the run's repo:



In [None]:
!aim up --repo "results/Classification_run"


┌------------------------------------------------------------------------┐
                Aim UI collects anonymous usage analytics.                
                        Read how to opt-out here:                         
    https://aimstack.readthedocs.io/en/latest/community/telemetry.html    
└------------------------------------------------------------------------┘
Running Aim UI on repo `<Repo#-6691639633609373110 path=/home/erik/Documents/UCPH/aim_projs/aim-ludwig-demo/results/Classification_run/.aim read_only=None>`
Open http://127.0.0.1:43800
Press Ctrl+C to exit
