# Ludwig from Uber
[Ludwig](https://eng.uber.com/introducing-ludwig/) is a "code-free" system for training and deploying simple ML models when your data is in a tabular format. They have the [fanciest GitHub Pages site](https://uber.github.io/ludwig/) I've ever seen.

They also provide a Python API, but my experience playing with it in this notebook was not great. It's probably better to try the CLI they advocate.

### Titanic example

In [None]:
import yaml
import pandas as pd
import numpy as np

from ludwig.api import LudwigModel

If you use the CLI, Ludwig will do your splits for you, but I'm not entirely sure how they persist the splits across calls to `ludwig train` and `ludwig predict`.

Since we're using the Python API I think we have to do the splits ourselves.

In [None]:
titanic_df = pd.read_csv('http://biostat.mc.vanderbilt.edu/wiki/pub/Main/DataSets/titanic3.csv')
titanic_df

In [None]:
# draw 80% of the row positions *without* replacement, so no row is repeated in the training set
train_idx = np.random.choice(len(titanic_df), int(0.8 * len(titanic_df)), replace=False)
train_df = titanic_df.iloc[train_idx]
# everything not drawn for training becomes the test set
test_df = titanic_df.iloc[~titanic_df.index.isin(train_idx)]
assert len(set(train_df.index).intersection(set(test_df.index))) == 0

Use the model definition from [the docs](https://uber.github.io/ludwig/examples/#kaggles-titanic-predicting-survivors), although I converted the feature names to lowercase to match this CSV.

The model definition defines and *types* the input features:

In [None]:
# load the YAML model definition, closing the file when done
with open('./titanic-model-def.yaml') as f:
    model_definition = yaml.safe_load(f)
model_definition
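The YAML file itself isn't reproduced in this notebook. As a rough sketch, assuming it mirrors the Titanic example from the docs with the feature names lowercased, the loaded dictionary would look something like the one below. The exact feature list, the types, and the `preprocessing` options are my assumptions from that example, not a dump of the real file, and `feature_types` is just a hypothetical helper for eyeballing the typing.

```python
# Hypothetical contents of titanic-model-def.yaml once loaded into Python.
# Feature names, types and preprocessing options are assumptions based on the
# docs' Titanic example, lowercased to match the columns of titanic3.csv.
example_definition = {
    "input_features": [
        {"name": "pclass", "type": "category"},
        {"name": "sex", "type": "category"},
        {"name": "age", "type": "numerical",
         "preprocessing": {"missing_value_strategy": "fill_with_mean"}},
        {"name": "sibsp", "type": "numerical"},
        {"name": "parch", "type": "numerical"},
        {"name": "fare", "type": "numerical",
         "preprocessing": {"missing_value_strategy": "fill_with_mean"}},
        {"name": "embarked", "type": "category"},
    ],
    "output_features": [
        {"name": "survived", "type": "binary"},
    ],
}


def feature_types(definition):
    """Map each input feature name to its declared type."""
    return {f["name"]: f["type"] for f in definition["input_features"]}


print(feature_types(example_definition))
```

Each entry pairs a column name with a type, and the single output feature is what the model learns to predict.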

In [None]:
# train a model
model = LudwigModel(model_definition)
train_stats = model.train(train_df)

# obtain predictions
predictions = model.predict(test_df)

In [None]:
predictions

In [None]:
print("Accuracy was ", sum(test_df.survived.values == predictions.survived_predictions.values) / len(predictions))

# Visualization
This training run created a folder called `results`, which we can now interrogate. Interacting with the visualization tools feels a bit clunky from Python. It's also really not clear where the different experiment runs come from and how they differ!
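One way to get a handle on the runs is to list what is actually on disk. This is a minimal sketch using only the standard library; it assumes every run lives in its own subdirectory of `results` and contains a `training_statistics.json` file, which is the layout implied by the path used further down. `list_runs` and `load_stats` are hypothetical helpers, not part of Ludwig.

```python
import json
from pathlib import Path


def list_runs(results_dir="results"):
    """Return run directories under results_dir that contain training statistics."""
    root = Path(results_dir)
    if not root.is_dir():
        return []
    return sorted(
        p for p in root.iterdir()
        if (p / "training_statistics.json").is_file()
    )


def load_stats(run_dir):
    """Load the training statistics JSON stored in a single run directory."""
    with open(Path(run_dir) / "training_statistics.json") as f:
        return json.load(f)


# print each run's name alongside the top-level entries of its statistics file
for run in list_runs():
    print(run.name, list(load_stats(run)))
```

Comparing the directory names against the number of times `model.train` was called should at least show whether each call produced a new run.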

In [None]:
!pip install seaborn

In [None]:
from ludwig import visualize

In [None]:
visualize.learning_curves_cli(training_statistics=['./results/api_experiment_run_2/training_statistics.json'],
                             output_feature_name='survived')