### For standard tasks, deep learning already has ready-made solutions and libraries that speed up the work considerably and let you quickly build a baseline with good results.
![](https://www.meme-arsenal.com/memes/0eb7d7ba093f7a61409127282e0d4f1b.jpg)

After a bit of googling and asking my colleagues, I found many solutions for our task: pre-trained models with wrappers and ready-made pipelines.
For example:

* [https://finetune.indico.io](https://finetune.indico.io)
* [https://github.com/huggingface/pytorch-transformers](https://github.com/huggingface/pytorch-transformers)
* [https://github.com/deepset-ai/FARM](https://github.com/deepset-ai/FARM)
* [https://github.com/kaushaltrivedi/fast-bert](https://github.com/kaushaltrivedi/fast-bert)
* [https://github.com/amaiya/ktrain](https://github.com/amaiya/ktrain)
* [https://github.com/ludwig-ai/ludwig/tree/master](https://github.com/ludwig-ai/ludwig/tree/master)

We'll work with the last one, Ludwig. Let's see how it goes...

Ludwig was "born" at Uber, and its key feature is that everything is driven by YAML configs. The authors promise that you do not need to write any code at all:
> Ludwig is a toolbox built on top of TensorFlow that allows users to train and test deep learning models without the need to write code.

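To make the "no code" idea concrete, here is a minimal sketch of what such a YAML config might look like, parsed with PyYAML into a plain dictionary. The feature names `text` and `genre` and the training values are taken from the config dictionary used later in this notebook; the YAML layout simply mirrors that dictionary and is an illustration, not a copy from the Ludwig docs.

```python
import yaml

# YAML counterpart of the config dictionary defined later in this notebook.
CONFIG_YAML = """
input_features:
  - name: text
    type: text
    encoder: bert
output_features:
  - name: genre
    type: category
training:
  batch_size: 32
  learning_rate: 0.0001
  epochs: 4
"""

# safe_load turns the YAML text into ordinary Python dicts and lists.
config_from_yaml = yaml.safe_load(CONFIG_YAML)
print(config_from_yaml["input_features"][0])
```

The resulting dictionary has the same shape as the hand-written `config` below, so either form should describe the same model.
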
In [None]:
import warnings
warnings.filterwarnings('ignore')

In [None]:
!pip install ludwig==0.3.3 -q

In [None]:
!pip freeze > requirements.txt

In [None]:
from ludwig.api import LudwigModel
from ludwig.visualize import learning_curves
import yaml

In [None]:
import random
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
# plt
import matplotlib.pyplot as plt
#let's increase the default size of the charts
from pylab import rcParams
rcParams['figure.figsize'] = 10, 5
#graphs in svg look clearer
%config InlineBackend.figure_format = 'svg' 
%matplotlib inline

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

In [None]:
!conda update -y -n base -c conda-forge conda

In [None]:
!conda update -y --force-reinstall pandas

# Data
#### TRAIN

In [None]:
DATA_PATH = '/kaggle/input/sf-dl-movie-genre-classification/'
PATH      = '/kaggle/working/'

In [None]:
train = pd.read_csv(DATA_PATH+'train.csv',)

In [None]:
train.head()

In [None]:
train.info()

In [None]:
train.genre.value_counts().plot(kind='bar',figsize=(12,4),fontsize=10)
plt.xticks(rotation=60)
plt.xlabel("Genres",fontsize=10)
plt.ylabel("Counts",fontsize=10)

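The bar chart shows raw counts; relative shares make class imbalance easier to read. Below is a small sketch using only the standard library. The toy labels are made up for illustration and are not taken from `train.csv`; on the real DataFrame, `train.genre.value_counts(normalize=True)` should give the same kind of table.

```python
from collections import Counter

def class_shares(labels):
    """Return {label: share of all samples}, largest class first."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.most_common()}

# Toy labels for illustration only (hypothetical, not from the dataset).
toy_genres = ["drama"] * 6 + ["comedy"] * 3 + ["horror"] * 1
shares = class_shares(toy_genres)
for genre, share in shares.items():
    print(f"{genre:<8} {share:.0%}")
```
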
#### TEST

In [None]:
test = pd.read_csv(DATA_PATH+'test.csv',)
test.head()

# MODEL 
We build our solution based on the examples in the documentation: [https://ludwig-ai.github.io/ludwig-docs/examples/](https://ludwig-ai.github.io/ludwig-docs/examples/)

There are many ready-made solutions for text classification available in Ludwig.  
We'll take BERT, since it is currently one of the best language models: [https://habr.com/ru/post/436878/](https://habr.com/ru/post/436878/)

[And another article from Ludwig where models are compared](https://medium.com/ludwig-ai/the-complete-guide-to-sentiment-analysis-with-ludwig-part-ii-d9f3952a06c6)
<img src="http://www.aitimes.kr/news/photo/201901/13117_13465_1541.jpg" width="600">

In [None]:
config = {
    "input_features": [
        {
            "name": "text",
            "type": "text",
            "encoder": "bert",
        },
        # Optional second input feature (the `name` column), disabled for now:
        # {
        #     "name": "name",
        #     "type": "text",
        #     "encoder": "bert",
        # },
    ],
    "output_features": [
        {
            "name": "genre",
            "type": "category",
        }
    ],
    "training": {
        "batch_size": 32,
        "decay": True,
        "trainable": True,
        "learning_rate": 0.0001,
        "epochs": 4,
    },
}

In [None]:
bert = LudwigModel(config, logging_level=50,)

In [None]:
%%time
print("Training Model...")
train_stats_bert, _, _ = bert.train(
    train,   
    model_name='bert',
    skip_save_processed_input=True,
    random_seed=42
    )

In [None]:
predictions, _ = bert.predict(test)

In [None]:
predictions

In [None]:
submission = pd.DataFrame({'id':range(1, len(predictions)+1),
                           'genre':predictions['genre_predictions'].values},
                          columns=['id', 'genre'])
submission.to_csv('submission.csv', index=False)
submission.head()

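Before uploading, it can be worth sanity-checking the file. The sketch below is one possible check written with the standard library only; `check_submission` is a hypothetical helper, and the rules it enforces (columns `id` and `genre`, ids running from 1 to n in order) are read off the submission cell above rather than from any official competition specification.

```python
import csv
import io

def check_submission(csv_text, expected_rows):
    """Raise ValueError if the submission CSV looks malformed."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    if not rows or list(rows[0].keys()) != ["id", "genre"]:
        raise ValueError("expected exactly the columns: id, genre")
    if len(rows) != expected_rows:
        raise ValueError(f"expected {expected_rows} rows, got {len(rows)}")
    ids = [int(row["id"]) for row in rows]
    if ids != list(range(1, expected_rows + 1)):
        raise ValueError("ids must run from 1 to n without gaps")
    if any(not row["genre"] for row in rows):
        raise ValueError("found an empty genre prediction")
    return True

# Toy file contents for illustration only.
toy_csv = "id,genre\n1,drama\n2,comedy\n3,drama\n"
print(check_submission(toy_csv, expected_rows=3))
```

On the real file this could be called as `check_submission(open('submission.csv').read(), len(test))`.
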
# Summary:
### That's how, with minimal code, we get a working solution with a good result!

# What can be done to improve the result:
* Read the official [docs](https://ludwig-ai.github.io/ludwig-docs/)
* Add the `name` column to the model as a second input feature
* Try other [models](https://ludwig-ai.github.io/ludwig-docs/user_guide/#bert-encoder)
* Choose a training strategy
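
Several of these ideas come down to editing the config dictionary. Here is a minimal sketch of a helper that derives variants from a base config without mutating it. `make_variant` is a hypothetical helper, the `name` column is assumed to exist in the data (it appears in the commented-out block of the config), and `epochs=6` is an arbitrary illustrative value, not a tuned one. Only keys already used in this notebook are touched.

```python
import copy

# Same structure as the config used for training in this notebook.
BASE_CONFIG = {
    "input_features": [{"name": "text", "type": "text", "encoder": "bert"}],
    "output_features": [{"name": "genre", "type": "category"}],
    "training": {
        "batch_size": 32,
        "decay": True,
        "trainable": True,
        "learning_rate": 0.0001,
        "epochs": 4,
    },
}

def make_variant(base, extra_text_columns=(), encoder=None, **training_overrides):
    """Return a modified deep copy of `base`; the original stays untouched."""
    config = copy.deepcopy(base)
    # Add further text columns as extra input features.
    for column in extra_text_columns:
        config["input_features"].append(
            {"name": column, "type": "text", "encoder": "bert"}
        )
    # Optionally switch every input feature to another encoder name.
    if encoder is not None:
        for feature in config["input_features"]:
            feature["encoder"] = encoder
    # Override training settings such as epochs or learning_rate.
    config["training"].update(training_overrides)
    return config

variant = make_variant(BASE_CONFIG, extra_text_columns=["name"], epochs=6)
print([feature["name"] for feature in variant["input_features"]])
print(variant["training"]["epochs"])
```

Each variant can then be passed to `LudwigModel` in place of `config`, which keeps experiments easy to compare.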