# Introduction
The purpose of this notebook was to gain experience with text pre-processing, which is highly relevant for my master's thesis.  
I used this [dataset](https://www.kaggle.com/datatattle/covid-19-nlp-text-classification) and this [guide](https://www.kaggle.com/sudalairajkumar/getting-started-with-text-preprocessing) to work my way through the notebook.

# Table of contents:
1. [Define problem](#1.-Define-problem)
    1. Domain knowledge
2. [Gather data](#2.-Gather-data)
    1. Import libraries
    2. Load data
    3. Merge / concatenate
3. [Initial exploration of data](#3.-Initial-exploration-of-data)
4. [Pre-process data](#4.-Pre-process-data)
    1. Data clean-up
    2. Data imputation
    3. Encode categorical features
5. [Data analyze](#5.-Data-analyze)
    1. Visualization
    2. Variance
    3. Correlation
    4. Feature importance
6. [Feature selection](#6.-Feature-selection)
7. [Feature engineering](#7.-Feature-engineering)
8. [Train-val-test split](#8.-Train-val-test-split)
9. [Prepare data](#9.-Prepare-data)
    1. Transform data
    2. Feature scaling
    3. Check for imbalance
10. [Train some models](#10.-Train-some-models)
11. [Evaluate models](#11.-Evaluate-models)
12. [Tune selected models](#12.-Tune-selected-models)
13. [Production](#13.-Production)

# 1. Define problem
## 1. Domain knowledge

In [2]:
# Not relevant for the moment

# 2. Gather data
The data is from Kaggle.
## 1. Import libraries

In [3]:
import os
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

## 2. Load data

In [19]:
path_to_data = "../data/original/"
file_names = os.listdir(path_to_data)
file_names

['Corona_NLP_test.csv', 'Corona_NLP_train.csv', 'README.md']

In [20]:
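# os.listdir() returns entries in arbitrary order, so select the files by name
train = pd.read_csv(os.path.join(path_to_data, 'Corona_NLP_train.csv'), encoding='latin1')
test = pd.read_csv(os.path.join(path_to_data, 'Corona_NLP_test.csv'), encoding='latin1')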
print("Loaded csv")

Loaded csv


# 3. Initial exploration of data

In [6]:
train.columns

Index(['UserName', 'ScreenName', 'Location', 'TweetAt', 'OriginalTweet',
       'Sentiment'],
      dtype='object')

In [7]:
train.head(20)

Unnamed: 0,UserName,ScreenName,Location,TweetAt,OriginalTweet,Sentiment
0,3799,48751,London,16-03-2020,@MeNyrbie @Phil_Gahan @Chrisitv https://t.co/i...,Neutral
1,3800,48752,UK,16-03-2020,advice Talk to your neighbours family to excha...,Positive
2,3801,48753,Vagabonds,16-03-2020,Coronavirus Australia: Woolworths to give elde...,Positive
3,3802,48754,,16-03-2020,My food stock is not the only one which is emp...,Positive
4,3803,48755,,16-03-2020,"Me, ready to go at supermarket during the #COV...",Extremely Negative
5,3804,48756,"ÃT: 36.319708,-82.363649",16-03-2020,As news of the regionÂs first confirmed COVID...,Positive
6,3805,48757,"35.926541,-78.753267",16-03-2020,Cashier at grocery store was sharing his insig...,Positive
7,3806,48758,Austria,16-03-2020,Was at the supermarket today. Didn't buy toile...,Neutral
8,3807,48759,"Atlanta, GA USA",16-03-2020,Due to COVID-19 our retail store and classroom...,Positive
9,3808,48760,"BHAVNAGAR,GUJRAT",16-03-2020,"For corona prevention,we should stop to buy th...",Negative


In [8]:
train.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 41157 entries, 0 to 41156
Data columns (total 6 columns):
 #   Column         Non-Null Count  Dtype 
---  ------         --------------  ----- 
 0   UserName       41157 non-null  int64 
 1   ScreenName     41157 non-null  int64 
 2   Location       32567 non-null  object
 3   TweetAt        41157 non-null  object
 4   OriginalTweet  41157 non-null  object
 5   Sentiment      41157 non-null  object
dtypes: int64(2), object(4)
memory usage: 1.9+ MB


In [10]:
train.describe()

Unnamed: 0,UserName,ScreenName
count,41157.0,41157.0
mean,24377.0,69329.0
std,11881.146851,11881.146851
min,3799.0,48751.0
25%,14088.0,59040.0
50%,24377.0,69329.0
75%,34666.0,79618.0
max,44955.0,89907.0


In [12]:
pd.pivot_table(train, index='OriginalTweet', columns='Sentiment', values='Location', aggfunc='count' )

Sentiment,Extremely Negative,Extremely Positive,Negative,Neutral,Positive
OriginalTweet,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1
Coronavirus lingers in air longer than previously thought scientists warn,,,1.0,,
amp,,,,1.0,
Hand Sanitizer Free Gift For Current Situation,,0.0,,,
Our medical frontliners are our country s first line of defense in our fight against COVID 19 As our way of showing our appreciation they are given access to the priority lanes at Robinsons Supermarket,,,,,1.0
Police officers handed out rolls of toilet paper at a supermarket on Thursday to try to calm shoppers down during the outbreak in,,,,,1.0
...,...,...,...,...,...
Â«Â Industrial real-estate operators expect the disruption of consumer supply chains caused by the coronavirus pandemic to drive a new surge in #warehousing demandÂ Â» #RealEstate #COVID2019 #logistics\r\r\nhttps://t.co/0jiC0w0yGZ,,,1.0,,
"Â«Â Well, officer, it is quite simple actually: I cycled up the hill to the supermarket to buy catÂs food, so I ticked all thÃ© boxesÂ Â». #MaVieConfinee #confinementjour2 #COVID2019 #relax ?? https://t.co/WYRIReGhen",,,0.0,,
Â» CONSUMER ALERT: Coronavirus (COVID-19): Know Your Rights | Attorney General Karl A. Racine https://t.co/5pJGz2aRNk,,,,,1.0
Ãa se peut? Why Does Covid-19 Make Some People So Sick? Ask Their DNA Consumer genomics company 23andMe wants to mine its database of millions of customers for clues to why the virus hits some https://t.co/7uPouMm12j,1.0,,,,
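
The pivot table above has one row per distinct tweet and counts non-null `Location` values, so a `0.0` only means that the tweet had no location. A per-class count may give a more compact first look at the labels. The sketch below is an assumption about what is useful at this stage, not part of the original exploration; the `sentiment_counts` helper and the tiny `demo` frame are made up for illustration and stand in for `train`.

```python
import pandas as pd

def sentiment_counts(df, label_col="Sentiment"):
    """Return the number of rows per sentiment class, largest first."""
    return df[label_col].value_counts()

# Hypothetical stand-in for `train`, only to show the shape of the result.
demo = pd.DataFrame({
    "OriginalTweet": ["a", "b", "c", "d"],
    "Sentiment": ["Positive", "Neutral", "Positive", "Negative"],
})
counts = sentiment_counts(demo)
print(counts)
```

On the real data the call would be `sentiment_counts(train)`.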


# 4. Pre-process data
## 1. Data clean-up
I'll only keep 'OriginalTweet' since I'm interested in text analysis.  
  
Some common text preprocessing / cleaning steps are:
- Lower casing
- Removal of Punctuations
- Removal of Stopwords
- Removal of Frequent words
- Removal of Rare words
- Stemming
- Lemmatization
- Removal of emojis
- Removal of emoticons
- Conversion of emoticons to words
- Conversion of emojis to words
- Removal of URLs
- Removal of HTML tags
- Chat words conversion
- Spelling correction

### Remove columns 

In [21]:
train = train[['OriginalTweet']].copy()  # copy so that adding columns later does not trigger SettingWithCopyWarning

In [22]:
train

Unnamed: 0,OriginalTweet
0,@MeNyrbie @Phil_Gahan @Chrisitv https://t.co/i...
1,advice Talk to your neighbours family to excha...
2,Coronavirus Australia: Woolworths to give elde...
3,My food stock is not the only one which is emp...
4,"Me, ready to go at supermarket during the #COV..."
...,...
41152,Airline pilots offering to stock supermarket s...
41153,Response to complaint not provided citing COVI...
41154,You know itÂs getting tough when @KameronWild...
41155,Is it wrong that the smell of hand sanitizer i...


### Lower casing

In [28]:
train["text_lower"] = train['OriginalTweet'].str.lower()
train.head()

Unnamed: 0,OriginalTweet,text_wo_punct,text_lower
0,@MeNyrbie @Phil_Gahan @Chrisitv https://t.co/i...,MeNyrbie PhilGahan Chrisitv httpstcoiFz9FAn2Pa...,@menyrbie @phil_gahan @chrisitv https://t.co/i...
1,advice Talk to your neighbours family to excha...,advice Talk to your neighbours family to excha...,advice talk to your neighbours family to excha...
2,Coronavirus Australia: Woolworths to give elde...,Coronavirus Australia Woolworths to give elder...,coronavirus australia: woolworths to give elde...
3,My food stock is not the only one which is emp...,My food stock is not the only one which is emp...,my food stock is not the only one which is emp...
4,"Me, ready to go at supermarket during the #COV...",Me ready to go at supermarket during the COVID...,"me, ready to go at supermarket during the #cov..."


### Removal of Punctuations

In [29]:
import string

PUNCT_TO_REMOVE = string.punctuation
def remove_punctuation(text):
    """custom function to remove the punctuation"""
    return text.translate(str.maketrans('', '', PUNCT_TO_REMOVE))

train['text_wo_punct'] = train['text_lower'].apply(remove_punctuation)
train.head()

Unnamed: 0,OriginalTweet,text_wo_punct,text_lower
0,@MeNyrbie @Phil_Gahan @Chrisitv https://t.co/i...,menyrbie philgahan chrisitv httpstcoifz9fan2pa...,@menyrbie @phil_gahan @chrisitv https://t.co/i...
1,advice Talk to your neighbours family to excha...,advice talk to your neighbours family to excha...,advice talk to your neighbours family to excha...
2,Coronavirus Australia: Woolworths to give elde...,coronavirus australia woolworths to give elder...,coronavirus australia: woolworths to give elde...
3,My food stock is not the only one which is emp...,my food stock is not the only one which is emp...,my food stock is not the only one which is emp...
4,"Me, ready to go at supermarket during the #COV...",me ready to go at supermarket during the covid...,"me, ready to go at supermarket during the #cov..."


### Removal of stopwords
Stopwords are commonly occurring words in a language, such as 'the' and 'a'. They can usually be removed from the text, as they don't provide valuable information for downstream analysis. In cases like part-of-speech (POS) tagging, we should not remove them, as they provide very valuable information about the POS. [Link](https://www.kaggle.com/sudalairajkumar/getting-started-with-text-preprocessing#Removal-of-stopwords).

In [31]:
! pip install nltk

Collecting nltk
  Downloading nltk-3.6.3-py3-none-any.whl (1.5 MB)
Collecting regex
  Downloading regex-2021.9.24-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (761 kB)
Installing collected packages: regex, nltk
Successfully installed nltk-3.6.3 regex-2021.9.24


In [39]:
import nltk

nltk.download('stopwords')

[nltk_data] Downloading package stopwords to /home/jovyan/nltk_data...
[nltk_data]   Unzipping corpora/stopwords.zip.


True

In [None]:
from nltk.corpus import stopwords
# ", ".join(stopwords.words('english'))
", ".join(stopwords.words('norwegian'))

In [46]:
STOPWORDS = set(stopwords.words('english'))
def remove_stopwords(text):
    """custom function to remove the stopwords"""
    return " ".join([word for word in str(text).split() if word not in STOPWORDS])

train['text_wo_stop'] = train['text_wo_punct'].apply(remove_stopwords)
train.head()

Unnamed: 0,OriginalTweet,text_wo_punct,text_lower,text_wo_stop
0,@MeNyrbie @Phil_Gahan @Chrisitv https://t.co/i...,menyrbie philgahan chrisitv httpstcoifz9fan2pa...,@menyrbie @phil_gahan @chrisitv https://t.co/i...,menyrbie philgahan chrisitv httpstcoifz9fan2pa...
1,advice Talk to your neighbours family to excha...,advice talk to your neighbours family to excha...,advice talk to your neighbours family to excha...,advice talk neighbours family exchange phone n...
2,Coronavirus Australia: Woolworths to give elde...,coronavirus australia woolworths to give elder...,coronavirus australia: woolworths to give elde...,coronavirus australia woolworths give elderly ...
3,My food stock is not the only one which is emp...,my food stock is not the only one which is emp...,my food stock is not the only one which is emp...,food stock one empty please dont panic enough ...
4,"Me, ready to go at supermarket during the #COV...",me ready to go at supermarket during the covid...,"me, ready to go at supermarket during the #cov...",ready go supermarket covid19 outbreak im paran...


### Removal of Frequent words
I'll skip this step for now

In [51]:
from collections import Counter
cnt = Counter()
for text in train["text_wo_stop"].values:
    for word in text.split():
        cnt[word] += 1
        
cnt.most_common(30)

[('coronavirus', 17958),
 ('covid19', 16795),
 ('prices', 7882),
 ('food', 7032),
 ('supermarket', 6981),
 ('store', 6776),
 ('grocery', 6232),
 ('people', 5467),
 ('amp', 4955),
 ('consumer', 4455),
 ('19', 3698),
 ('shopping', 3587),
 ('online', 3413),
 ('covid', 3253),
 ('pandemic', 3136),
 ('get', 2867),
 ('need', 2700),
 ('us', 2610),
 ('workers', 2566),
 ('panic', 2445),
 ('like', 2362),
 ('sanitizer', 2346),
 ('time', 2269),
 ('demand', 2255),
 ('go', 2251),
 ('home', 2226),
 ('help', 2131),
 ('hand', 2058),
 ('stock', 1970),
 ('going', 1943)]
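
Although the step is skipped here, the counts above already contain what it would need. The following is a minimal sketch, assuming that the most common tokens are simply treated as "frequent" words; the cut-off `top_n`, the names `build_freqwords`, `remove_freqwords` and `FREQWORDS`, and the toy corpus are illustrative assumptions, not part of the notebook.

```python
from collections import Counter

def build_freqwords(texts, top_n=10):
    """Collect the top_n most common whitespace-separated tokens."""
    counter = Counter()
    for text in texts:
        counter.update(text.split())
    return {word for word, _ in counter.most_common(top_n)}

def remove_freqwords(text, freqwords):
    """Drop every token that appears in the freqwords set."""
    return " ".join(word for word in text.split() if word not in freqwords)

# Toy corpus standing in for train["text_wo_stop"].
corpus = [
    "coronavirus prices rise",
    "coronavirus panic buying",
    "food prices coronavirus",
]
FREQWORDS = build_freqwords(corpus, top_n=1)
cleaned = [remove_freqwords(text, FREQWORDS) for text in corpus]
print(cleaned)
```

In this dataset the top tokens are `coronavirus` and `covid19`. Whether dropping them helps probably depends on the downstream task, which is one reason to keep the step optional.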

### Removal of Rare words
I'll leave this one as well

In [56]:
n=10
cnt.most_common()[:-n-1:-1]

[('whethe', 1),
 ('rift', 1),
 ('newused', 1),
 ('tartiicat', 1),
 ('martinsville', 1),
 ('kameronwilds', 1),
 ('rejecting', 1),
 ('httpstcocz89ua0hnp', 1),
 ('mrsilverscott', 1),
 ('httpstcov8xdxhqeyn', 1)]

In [70]:
RAREWORDS = {w for w, wc in cnt.items() if wc <= 1}
len(RAREWORDS)

62613

In [72]:
def remove_rarewords(text):
    """custom function to remove the rare words"""
    return " ".join([word for word in str(text).split() if word not in RAREWORDS])

train['text_wo_stopfreqrare'] = train['text_wo_stop'].apply(remove_rarewords)
train.head()

Unnamed: 0,OriginalTweet,text_wo_punct,text_lower,text_wo_stop,text_wo_stopfreqrare
0,@MeNyrbie @Phil_Gahan @Chrisitv https://t.co/i...,menyrbie philgahan chrisitv httpstcoifz9fan2pa...,@menyrbie @phil_gahan @chrisitv https://t.co/i...,menyrbie philgahan chrisitv httpstcoifz9fan2pa...,chrisitv
1,advice Talk to your neighbours family to excha...,advice talk to your neighbours family to excha...,advice talk to your neighbours family to excha...,advice talk neighbours family exchange phone n...,advice talk neighbours family exchange phone n...
2,Coronavirus Australia: Woolworths to give elde...,coronavirus australia woolworths to give elder...,coronavirus australia: woolworths to give elde...,coronavirus australia woolworths give elderly ...,coronavirus australia woolworths give elderly ...
3,My food stock is not the only one which is emp...,my food stock is not the only one which is emp...,my food stock is not the only one which is emp...,food stock one empty please dont panic enough ...,food stock one empty please dont panic enough ...
4,"Me, ready to go at supermarket during the #COV...",me ready to go at supermarket during the covid...,"me, ready to go at supermarket during the #cov...",ready go supermarket covid19 outbreak im paran...,ready go supermarket covid19 outbreak im paran...


### Stemming
Stemming is the process of reducing inflected (or sometimes derived) words to their word stem, base or root form.  
  
For example, if the corpus contains the two words walks and walking, stemming strips the suffix from both and reduces them to walk.  
[Link](https://www.kaggle.com/sudalairajkumar/getting-started-with-text-preprocessing#Stemming)
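
The walks/walking example can be checked directly. This is a small sketch using NLTK's `PorterStemmer`, which needs no extra corpus downloads; the word list is only an illustration.

```python
from nltk.stem.porter import PorterStemmer

stemmer = PorterStemmer()

# Inflected forms of the same verb should collapse to one stem.
words = ["walks", "walking", "walk"]
stems = [stemmer.stem(word) for word in words]
print(stems)
```

All three forms map to the same stem, which is what makes stemming useful for shrinking the vocabulary before counting words.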

In [74]:
# Stemmer for other languages
from nltk.stem.snowball import SnowballStemmer
SnowballStemmer.languages

('arabic',
 'danish',
 'dutch',
 'english',
 'finnish',
 'french',
 'german',
 'hungarian',
 'italian',
 'norwegian',
 'porter',
 'portuguese',
 'romanian',
 'russian',
 'spanish',
 'swedish')

In [73]:
from nltk.stem.porter import PorterStemmer

stemmer = PorterStemmer()
def stem_words(text):
    return " ".join([stemmer.stem(word) for word in text.split()])

train['text_stemmed'] = train['text_wo_stopfreqrare'].apply(stem_words)
train.head()

Unnamed: 0,OriginalTweet,text_wo_punct,text_lower,text_wo_stop,text_wo_stopfreqrare,text_stemmed
0,@MeNyrbie @Phil_Gahan @Chrisitv https://t.co/i...,menyrbie philgahan chrisitv httpstcoifz9fan2pa...,@menyrbie @phil_gahan @chrisitv https://t.co/i...,menyrbie philgahan chrisitv httpstcoifz9fan2pa...,chrisitv,chrisitv
1,advice Talk to your neighbours family to excha...,advice talk to your neighbours family to excha...,advice talk to your neighbours family to excha...,advice talk neighbours family exchange phone n...,advice talk neighbours family exchange phone n...,advic talk neighbour famili exchang phone numb...
2,Coronavirus Australia: Woolworths to give elde...,coronavirus australia woolworths to give elder...,coronavirus australia: woolworths to give elde...,coronavirus australia woolworths give elderly ...,coronavirus australia woolworths give elderly ...,coronaviru australia woolworth give elderli di...
3,My food stock is not the only one which is emp...,my food stock is not the only one which is emp...,my food stock is not the only one which is emp...,food stock one empty please dont panic enough ...,food stock one empty please dont panic enough ...,food stock one empti pleas dont panic enough f...
4,"Me, ready to go at supermarket during the #COV...",me ready to go at supermarket during the covid...,"me, ready to go at supermarket during the #cov...",ready go supermarket covid19 outbreak im paran...,ready go supermarket covid19 outbreak im paran...,readi go supermarket covid19 outbreak im paran...


### Lemmatization
Lemmatization is similar to stemming in that it reduces inflected words to a base form, but it differs in making sure that the resulting root word (also called the lemma) belongs to the language.  
It's a more advanced form of stemming.  
[Link](https://www.kaggle.com/sudalairajkumar/getting-started-with-text-preprocessing#Lemmatization)

In [78]:
import nltk
nltk.download('wordnet')
nltk.download('averaged_perceptron_tagger')

[nltk_data] Downloading package wordnet to /home/jovyan/nltk_data...
[nltk_data]   Package wordnet is already up-to-date!
[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     /home/jovyan/nltk_data...
[nltk_data]   Unzipping taggers/averaged_perceptron_tagger.zip.


True

In [79]:
from nltk.corpus import wordnet
from nltk.stem import WordNetLemmatizer

lemmatizer = WordNetLemmatizer()
wordnet_map = {"N":wordnet.NOUN, "V":wordnet.VERB, "J":wordnet.ADJ, "R":wordnet.ADV}
def lemmatize_words(text):
    pos_tagged_text = nltk.pos_tag(text.split())
    return " ".join([lemmatizer.lemmatize(word, wordnet_map.get(pos[0], wordnet.NOUN)) for word, pos in pos_tagged_text])

train['text_lemmatized'] = train['text_wo_stopfreqrare'].apply(lemmatize_words)
train.head()

Unnamed: 0,OriginalTweet,text_wo_punct,text_lower,text_wo_stop,text_wo_stopfreqrare,text_stemmed,text_lemmatized
0,@MeNyrbie @Phil_Gahan @Chrisitv https://t.co/i...,menyrbie philgahan chrisitv httpstcoifz9fan2pa...,@menyrbie @phil_gahan @chrisitv https://t.co/i...,menyrbie philgahan chrisitv httpstcoifz9fan2pa...,chrisitv,chrisitv,chrisitv
1,advice Talk to your neighbours family to excha...,advice talk to your neighbours family to excha...,advice talk to your neighbours family to excha...,advice talk neighbours family exchange phone n...,advice talk neighbours family exchange phone n...,advic talk neighbour famili exchang phone numb...,advice talk neighbour family exchange phone nu...
2,Coronavirus Australia: Woolworths to give elde...,coronavirus australia woolworths to give elder...,coronavirus australia: woolworths to give elde...,coronavirus australia woolworths give elderly ...,coronavirus australia woolworths give elderly ...,coronaviru australia woolworth give elderli di...,coronavirus australia woolworths give elderly ...
3,My food stock is not the only one which is emp...,my food stock is not the only one which is emp...,my food stock is not the only one which is emp...,food stock one empty please dont panic enough ...,food stock one empty please dont panic enough ...,food stock one empti pleas dont panic enough f...,food stock one empty please dont panic enough ...
4,"Me, ready to go at supermarket during the #COV...",me ready to go at supermarket during the covid...,"me, ready to go at supermarket during the #cov...",ready go supermarket covid19 outbreak im paran...,ready go supermarket covid19 outbreak im paran...,readi go supermarket covid19 outbreak im paran...,ready go supermarket covid19 outbreak im paran...


### Removal of emojis

In [None]:
import re

def remove_emoji(text):
    """Remove characters in common emoji code-point ranges."""
    emoji_pattern = re.compile("["
                           u"\U0001F600-\U0001F64F"  # emoticons
                           u"\U0001F300-\U0001F5FF"  # symbols & pictographs
                           u"\U0001F680-\U0001F6FF"  # transport & map symbols
                           u"\U0001F1E0-\U0001F1FF"  # flags (iOS)
                           u"\U00002702-\U000027B0"  # dingbats
                           u"\U000024C2-\U0001F251"  # very broad range; also covers many non-emoji characters (e.g. CJK)
                           "]+", flags=re.UNICODE)
    return emoji_pattern.sub(r'', text)

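### Removal of emoticons

In [None]:
import re

# NOTE: EMOTICONS is never defined in this notebook; it must be a dict keyed by emoticon regex patterns (as appears to be the case in the linked guide).
def remove_emoticons(text):
    emoticon_pattern = re.compile(u'(' + u'|'.join(k for k in EMOTICONS) + u')')
    return emoticon_pattern.sub(r'', text)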
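
Since the cell above relies on an emoticon table that is not shown, here is a self-contained sketch of the same idea. The table `EMOTICONS_DEMO` is hypothetical and deliberately tiny, and its keys are literal strings, so they are passed through `re.escape` before being joined into a pattern. All names ending in `_demo` or `_DEMO` are assumptions for illustration.

```python
import re

# Hypothetical, deliberately tiny emoticon table (literal strings, not regexes).
EMOTICONS_DEMO = {
    ":)": "smiley",
    ":(": "sad face",
    ":D": "laughing",
    ";)": "wink",
}

def build_emoticon_pattern(emoticons):
    """Compile one alternation; longest keys first so longer emoticons win."""
    keys = sorted(emoticons, key=len, reverse=True)
    return re.compile("|".join(re.escape(key) for key in keys))

EMOTICON_PATTERN = build_emoticon_pattern(EMOTICONS_DEMO)

def remove_emoticons_demo(text):
    """Delete every literal emoticon from the demo table."""
    return EMOTICON_PATTERN.sub("", text)

result = remove_emoticons_demo("stocked up on pasta :) but no toilet paper :(")
print(result)
```

Emoticons consist of punctuation characters, so a step like this only has an effect if it runs before punctuation is removed.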

### Conversion of emoticons to words

In [80]:
# Not relevant

### Conversion of emojis to words

In [81]:
# Not relevant

### Removal of URLs

In [82]:
import re

def remove_urls(text):
    url_pattern = re.compile(r'https?://\S+|www\.\S+')
    return url_pattern.sub(r'', text)

### Removal of HTML tags

In [83]:
import re

def remove_html(text):
    html_pattern = re.compile('<.*?>')
    return html_pattern.sub(r'', text)
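
In the earlier cells punctuation is stripped before URLs, which turns links into tokens such as `httpstco...` that the URL pattern can no longer match. The sketch below shows one possible ordering that avoids this: lower-casing first, then URLs and HTML tags, and punctuation last. The function name `clean_tweet` and the example tweet are assumptions for illustration; the patterns are the ones used above.

```python
import re
import string

URL_PATTERN = re.compile(r"https?://\S+|www\.\S+")
HTML_PATTERN = re.compile(r"<.*?>")
PUNCT_TABLE = str.maketrans("", "", string.punctuation)

def clean_tweet(text):
    """Lower-case, then strip URLs and HTML tags before punctuation."""
    text = text.lower()
    text = URL_PATTERN.sub("", text)
    text = HTML_PATTERN.sub("", text)
    text = text.translate(PUNCT_TABLE)
    return " ".join(text.split())  # collapse leftover whitespace

example = "Stay safe! <b>Read</b> https://t.co/abc123 #COVID19"
cleaned = clean_tweet(example)
print(cleaned)
```

Applied to the notebook's data this would be something like `train['OriginalTweet'].apply(clean_tweet)`, with stopword removal and stemming or lemmatization following as before.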

### Chat words conversion

In [84]:
# Not relevant for now

### Spelling correction
[Supported languages](https://pyspellchecker.readthedocs.io/en/latest/#non-english-dictionaries):
- English - ‘en’
- Spanish - ‘es’
- French - ‘fr’
- Portuguese - ‘pt’
- German - ‘de’
- Russian - ‘ru’

In [86]:
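# The SpellChecker class imported below ships in the `pyspellchecker` package.
# The PyPI package named `spellchecker` (which the log below was produced by) is a different project.
!pip install pyspellchecker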

Collecting spellchecker
  Downloading spellchecker-0.4.tar.gz (3.9 MB)
Collecting inexactsearch
  Downloading inexactsearch-1.0.2.tar.gz (21 kB)
Collecting soundex>=1.0
  Downloading soundex-1.1.3.tar.gz (9.1 kB)
Collecting silpa_common>=0.3
  Downloading silpa_common-0.3.tar.gz (9.4 kB)
Building wheels for collected packages: spellchecker, inexactsearch, silpa-common, soundex
  Building wheel for spellchecker (setup.py) ... done
  Created wheel for spellchecker: filename=spellchecker-0.4-py3-none-any.whl size=3966514 sha256=89ebb73884b3de8fa257a3293b70c8426082bb9d46f40ff0d8348565c55914ef
  Stored in directory: /home/jovyan/.cache/pip/wheels/8c/52/b3/8795c86fe999a3b1a7d0f1b75a197fd984e37057cb537e2977
  Building wheel for inexactsearch (setup.py) ... done
  Created wheel for inexactsearch: filename=inexactsearch-1.0.2-py3-none-

In [88]:
from spellchecker import SpellChecker

spell = SpellChecker()
def correct_spellings(text):
    corrected_text = []
    misspelled_words = spell.unknown(text.split())
    for word in text.split():
        if word in misspelled_words:
            corrected_text.append(spell.correction(word))
        else:
            corrected_text.append(word)
    return " ".join(corrected_text)
        
text = "speling correctin"
correct_spellings(text)

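ModuleNotFoundError: No module named 'indexer'

This error is most likely caused by the PyPI package `spellchecker` (the one in the install log above), which is an unrelated project that tries to import a missing `indexer` module. The `SpellChecker` class described in the documentation linked above is distributed as `pyspellchecker`, so running `pip uninstall spellchecker` followed by `pip install pyspellchecker` should make the import work.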

## 2. Data imputation


## 3. Encode categorical features
...

# 5. Data analyze
...
## 1. Visualization
...

## 2. Variance
...

## 3. Correlation
...

## 4. Feature importance
...

# 6. Feature selection
...

# 7. Feature engineering
...

# 8. Train-val-test split
...

# 9. Prepare data
...
## 1. Transform data
...

## 2. Feature scaling
...

## 3. Check for imbalance
...

# 10. Train some models
...

# 11. Evaluate models
...

# 12. Tune selected models
...

# 13. Production
...