# 4. Technical notebook for a MovieLens recommendation engine

In this deliverable I will attempt to answer the question “What movie should I watch this evening?” by modelling MovieLens data.

# Exploratory analysis summary

Our main variables are ratings, genres, movie titles and users.

Ratings tend to be quite positive overall. Variables other than the quality of the movies could be influencing users' ratings.

The most popular movie genres are Drama, Comedy, Action, Thriller, and Romance.

The five most-watched movie titles are:

- American Beauty (1999)
- Jurassic Park (1993)
- Saving Private Ryan (1998)
- Matrix, The (1999)
- Back to the Future (1985)

Our users are mainly male, aged 18-44, working in a variety of occupations (e.g. students, professionals, academics and technicians) and located across US states.

# Modelling approach

James Lee's blog post describes four types of recommendation engines that we could explore for our movie recommendations:
https://medium.com/@james_aka_yale/the-4-recommendation-engines-that-can-predict-your-movie-tastes-bbec857b8223

1. Content-Based

Benefits:

- Easy to implement because no training or optimization is involved.
- No need for data on other users, thus no cold-start or sparsity problems.
- Can recommend to users with unique tastes.
- Can recommend new and unpopular items.
- Can explain recommendations by listing the content features that caused an item to be recommended (in this case, movie genres).

Drawbacks:

- Finding the appropriate features is hard.
- Does not recommend items outside a user’s content profile.
- Unable to exploit the quality judgments of other users.
- Model performance decreases with sparse data, which hinders the scalability of the approach.
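A minimal sketch of content-based similarity that uses genres as the only content feature. The genre strings come from the `dfc` rows shown later in this notebook; the Jaccard overlap measure and the `similar_movies` helper are illustrative assumptions, not James Lee's implementation.

```python
# Genre strings in the MovieLens "Genre|Genre" format (taken from dfc rows)
movies = {
    "Bug's Life, A (1998)": "Animation|Children's|Comedy",
    "James and the Giant Peach (1996)": "Animation|Children's|Musical",
    "My Fair Lady (1964)": "Musical|Romance",
    "Erin Brockovich (2000)": "Drama",
}


def genre_set(genres):
    """Split a 'A|B|C' genre string into a set of genres."""
    return set(genres.split("|"))


def jaccard(a, b):
    """Share of genres two movies have in common (0 = none, 1 = identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similar_movies(title, movies, top_n=2):
    """Rank the other movies by genre overlap with `title`."""
    target = genre_set(movies[title])
    scores = [(other, jaccard(target, genre_set(g)))
              for other, g in movies.items() if other != title]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:top_n]


print(similar_movies("Bug's Life, A (1998)", movies))
```

No other user's data is needed, which is why this approach avoids the cold-start problem, but it can only ever suggest movies whose genres resemble what the user already liked.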



2. Memory-Based Collaborative Filtering

Benefits:

- Easy to implement because no training or optimization is involved.
- Reasonable prediction quality.

Drawbacks:

- It doesn't address the cold-start problem: it suffers when new users or items without any ratings enter the system.
- It can't deal with sparse data, because it's hard to find users who have rated the same items.
- It tends to recommend popular items.
- It doesn’t scale particularly well to massive datasets, especially for real-time recommendations based on user-behavior similarities, which take a lot of computation.
- Ratings matrices may overfit to noisy representations of user tastes and preferences.
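A minimal user-based sketch, assuming cosine similarity over co-rated movies and a similarity-weighted average of other users' ratings as the prediction (mean-centred or Pearson similarity is a common refinement). The users, movies and ratings are hypothetical. The `None` result for a user or movie with no overlapping ratings is the cold-start problem in miniature.

```python
import math

# Hypothetical ratings: user -> {movie: rating}
ratings = {
    "u1": {"A": 5, "B": 3, "C": 4},
    "u2": {"A": 4, "B": 3, "C": 5, "D": 4},
    "u3": {"A": 1, "B": 5, "D": 2},
}


def cosine(r1, r2):
    """Cosine similarity computed only over movies both users rated."""
    common = set(r1) & set(r2)
    if not common:
        return 0.0
    dot = sum(r1[m] * r2[m] for m in common)
    n1 = math.sqrt(sum(r1[m] ** 2 for m in common))
    n2 = math.sqrt(sum(r2[m] ** 2 for m in common))
    return dot / (n1 * n2)


def predict(user, movie, ratings):
    """Similarity-weighted average of other users' ratings for `movie`."""
    num = den = 0.0
    for other, their in ratings.items():
        if other == user or movie not in their:
            continue
        sim = cosine(ratings[user], their)
        num += sim * their[movie]
        den += abs(sim)
    return num / den if den else None  # None: nobody comparable rated it


print(predict("u1", "D", ratings))  # leans towards u2, the most similar user
```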



3. Model-Based Collaborative Filtering (based on Matrix Factorization)

The goal of matrix factorization (MF) is to learn the latent preferences of users and the latent attributes of items from the known ratings (that is, to learn features that describe the characteristics of the ratings), and then to predict the unknown ratings through the dot product of the user and item latent features. James Lee applied dimensionality reduction techniques, including Singular Value Decomposition (SVD), to derive tastes and preferences from the raw data.
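A minimal NumPy sketch of the idea on a tiny hypothetical ratings matrix. Plain SVD needs a complete matrix, so this sketch assumes missing ratings are filled with each user's mean before factorising (a simple choice; methods fitted only to the known ratings avoid this step). Each predicted rating is the dot product of a user's latent factors and a movie's latent factors, plus the user's mean.

```python
import numpy as np

# Hypothetical 4 users x 5 movies; 0 marks "not rated"
R = np.array([
    [5, 4, 0, 1, 0],
    [4, 0, 4, 1, 1],
    [1, 1, 0, 5, 4],
    [0, 1, 1, 4, 5],
], dtype=float)

mask = R > 0
user_means = np.array([row[m].mean() for row, m in zip(R, mask)])

# Fill the gaps with each user's mean rating, then centre each user's row
filled = np.where(mask, R, user_means[:, None])
centered = filled - user_means[:, None]

U, s, Vt = np.linalg.svd(centered, full_matrices=False)
k = 2                              # number of latent features kept
user_factors = U[:, :k] * s[:k]    # latent preferences of each user
item_factors = Vt[:k, :].T         # latent attributes of each movie

# Predicted rating = user factors . movie factors + user mean
pred = user_factors @ item_factors.T + user_means[:, None]
print(np.round(pred, 2))
```

Keeping only `k` factors is the dimensionality-reduction step: the rank-`k` product smooths the filled-in matrix and yields values for the unrated cells.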

Why reduce dimensions?

- We can discover hidden correlations / features in the raw data.
- We can remove redundant and noisy features that are not useful.
- We can interpret and visualize the data more easily.
- Data storage and processing become easier.

Benefits:

- Widely used for recommender systems.
- It deals better with scalability and sparsity than memory-based CF.

Drawbacks:

- Classic Singular Value Decomposition (SVD) is an older methodology, and newer factorisation methods may perform better. Candidates include PCA (which is usually computed via SVD) and Non-Negative Matrix Factorisation.


4. Deep Learning / Neural Network

The idea is similar to model-based matrix factorization. In matrix factorization, the original sparse matrix is decomposed into the product of two low-rank orthogonal matrices; in the deep learning approach these matrices don't need to be orthogonal.
We want our model to learn the values of the embedding matrices themselves. The user and movie latent features for a specific user-movie combination are looked up from the embedding matrices, and they become the input values for further linear and non-linear layers. We can pass this input through multiple ReLU, linear or sigmoid layers and learn the corresponding weights with any optimization algorithm.
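A NumPy sketch of the forward pass only, with randomly initialised (untrained) weights. The sizes, the single hidden ReLU layer, and the sigmoid output rescaled to a 1-5 rating range are assumptions for illustration; in practice the embeddings and layer weights would be learned with an optimizer in a deep learning framework.

```python
import numpy as np

rng = np.random.default_rng(0)
n_users, n_movies, n_factors, n_hidden = 100, 50, 8, 16

# Embedding matrices: one row of latent features per user / movie
user_emb = rng.normal(scale=0.1, size=(n_users, n_factors))
movie_emb = rng.normal(scale=0.1, size=(n_movies, n_factors))
# Weights of a linear+ReLU layer followed by a linear+sigmoid layer
W1 = rng.normal(scale=0.1, size=(2 * n_factors, n_hidden))
b1 = np.zeros(n_hidden)
W2 = rng.normal(scale=0.1, size=(n_hidden, 1))
b2 = np.zeros(1)


def relu(x):
    return np.maximum(x, 0)


def sigmoid(x):
    return 1 / (1 + np.exp(-x))


def predict(user_ids, movie_ids, min_rating=1.0, max_rating=5.0):
    # Look up latent features for each user-movie pair and concatenate them
    x = np.concatenate([user_emb[user_ids], movie_emb[movie_ids]], axis=1)
    h = relu(x @ W1 + b1)               # linear + ReLU layer
    out = sigmoid(h @ W2 + b2)[:, 0]    # linear + sigmoid layer, in (0, 1)
    # Rescale to the rating range
    return min_rating + (max_rating - min_rating) * out


print(predict(np.array([0, 1, 2]), np.array([10, 20, 30])))
```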

Benefits:

- This model performed better than all the approaches James Lee attempted before.

Drawbacks:

- Requires high computing power.
- Significant time investment needed to tune the model.
- Highly complex model that is difficult to troubleshoot.
- Covered in the last lesson of this course, so I don't have practical experience with the methodology yet.


After researching the best approach given the project and dataset constraints, I decided to implement a recommendation engine that doesn't use user ratings, because that variable is not always available and I would like the model to be as replicable as possible in other scenarios. My aim is to make meaningful movie recommendations with the least user information needed, so the model can be reused in other contexts (e.g. recommending unrated events on a different website).

For this, I will attempt to use the Latent Dirichlet Allocation (LDA) modelling method.

In natural language processing, latent Dirichlet allocation (LDA) is a generative statistical model that allows sets of observations to be explained by unobserved groups that explain why some parts of the data are similar. For example, if observations are words collected into documents, it posits that each document is a mixture of a small number of topics and that each word's creation is attributable to one of the document's topics. LDA is an example of a topic model. Source: https://en.wikipedia.org/wiki/Latent_Dirichlet_allocation

My goal is to group the watched movie titles into themes, so that recommendations are the top movies within a specific theme that the user hasn't watched yet.
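The recommendation step can be sketched in plain Python, assuming each movie has already been assigned to its most probable LDA theme and that popularity is the number of users who watched it (as in the `Users_count` column later on). The titles, theme IDs and counts below are hypothetical, and `recommend` is an illustrative helper.

```python
from collections import Counter

# Hypothetical inputs: movie -> most probable theme, movie -> users who watched it
movie_theme = {"Movie A": 0, "Movie B": 0, "Movie C": 0, "Movie D": 0, "Movie E": 1}
popularity = {"Movie A": 50, "Movie B": 40, "Movie C": 10, "Movie D": 20, "Movie E": 99}


def recommend(watched, movie_theme, popularity, top_n=3):
    """Most popular unwatched movies from the user's most-watched theme."""
    theme_counts = Counter(movie_theme[m] for m in watched if m in movie_theme)
    if not theme_counts:
        return []  # no viewing history to work with (cold start)
    theme = theme_counts.most_common(1)[0][0]
    candidates = [m for m, t in movie_theme.items()
                  if t == theme and m not in watched]
    return sorted(candidates, key=lambda m: popularity.get(m, 0), reverse=True)[:top_n]


print(recommend({"Movie A", "Movie B"}, movie_theme, popularity))
# ['Movie D', 'Movie C']
```

"Movie E" is the most popular title overall but is not suggested, because it belongs to a different theme from the user's history.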

Therefore, I expect my model to be similar to James Lee's Model-Based Collaborative Filtering, but using LDA, which is meant to be a more sophisticated method. My project is also inspired by the New York Times recommendation engine blog post:
https://open.blogs.nytimes.com/2015/08/11/building-the-next-new-york-times-recommendation-engine/?_r=0

Benefits:

- Widely used for recommender systems. LDA is central to topic modeling and has revolutionized the field.
- LDA tends to perform well on small datasets because Bayesian methods can avoid overfitting the data.
- It deals better with scalability and sparsity than memory-based CF.
- LDA is a probabilistic model with interpretable topics.
- LDA gives you categories for free, in any dataset.

Drawbacks:

- It's hard to know when LDA is working because themes are soft clusters, so there is no objective metric to say "this is the best choice" of hyperparameters.
- Fixed K (the number of themes is fixed and must be known ahead of time).
- Uncorrelated topics (the Dirichlet theme distribution cannot capture correlations).
- Non-hierarchical (in data-limited regimes, hierarchical models allow sharing of data).
- Static (no evolution of themes over time).
- Bag of words (assumes words are exchangeable; sentence structure is not modeled).
- Unsupervised (sometimes weak supervision is desirable, e.g. in sentiment analysis).
- The accuracy of statistical inference (which LDA is based on) depends on the number of observations.

I will be using the "Single Variable Strategy", meaning that I will start with the most important variable and gradually add others, paying attention to the model's performance, if scope allows for it. My variables of choice are movie titles and users.

# Data transformations needed before modelling

In [2]:
#imports
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import statsmodels.api as sm
import seaborn as sns
%matplotlib inline

# Setting up spaCy once; the "en" shortcut was removed in spaCy v3,
# so this assumes the small English model (en_core_web_sm) is installed
import spacy
nlp = spacy.load("en_core_web_sm")

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_extraction import text

# Regular expressions (standard library module)
import re

# Gensim is used for LDA (and other models)
import gensim
from gensim.models import LdaModel
from gensim.matutils import Sparse2Corpus



In [3]:
# Bringing in movie frequency data (unpickling my moviewatchedrank dataframe)
unpickled_moviewatchedrank = pd.read_pickle('moviefreq.pkl')

#Reassigning
movie_freq = unpickled_moviewatchedrank
movie_freq.head()



| | MovieID | Users_count |
|---|---|---|
| 2651 | 2858 | 3428 |
| 466 | 480 | 2672 |
| 1848 | 2028 | 2653 |
| 2374 | 2571 | 2590 |
| 1178 | 1270 | 2583 |


In [4]:
# Bringing in rest of data (unpickling my dfc dataframe)
unpickled_dfc = pd.read_pickle('dfc.pkl')

#Reassigning
dfc = unpickled_dfc
dfc.head()

| | UserID | MovieID | Rating | Timestamp | Title | Genres | Gender | Age | Occupation | Zip-code |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 1193 | 5 | 978300760 | One Flew Over the Cuckoo's Nest (1975) | Drama | F | 1 | 10 | 48067 |
| 1 | 1 | 661 | 3 | 978302109 | James and the Giant Peach (1996) | Animation\|Children's\|Musical | F | 1 | 10 | 48067 |
| 2 | 1 | 914 | 3 | 978301968 | My Fair Lady (1964) | Musical\|Romance | F | 1 | 10 | 48067 |
| 3 | 1 | 3408 | 4 | 978300275 | Erin Brockovich (2000) | Drama | F | 1 | 10 | 48067 |
| 4 | 1 | 2355 | 5 | 978824291 | Bug's Life, A (1998) | Animation\|Children's\|Comedy | F | 1 | 10 | 48067 |


In [5]:
#Swapping MovieIDs (movie_freq) for movie titles (dfc)
movie_freq1 = movie_freq.join(dfc.Title, on='MovieID', how='inner', lsuffix='movie_freq', rsuffix='', sort=False)

#Reordering my columns, sorting by users count descending and keeping the top 20 titles
titles_freq = movie_freq1.reindex(columns=['MovieID', 'Title', 'Users_count'])
titles_freqc = titles_freq.sort_values('Users_count', ascending=False).head(20)
titles_freqc.head()

| | MovieID | Title | Users_count |
|---|---|---|---|
| 2651 | 2858 | Misérables, Les (1998) | 3428 |
| 466 | 480 | Superman (1978) | 2672 |
| 1848 | 2028 | Babes in Toyland (1961) | 2653 |
| 2374 | 2571 | Brady Bunch Movie, The (1995) | 2590 |
| 1178 | 1270 | Four Weddings and a Funeral (1994) | 2583 |
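These titles don't line up with the exploratory analysis, which lists American Beauty (1999) as the most-watched movie, yet the top row here (MovieID 2858, 3,428 users) is labelled Misérables, Les (1998). `DataFrame.join(other, on='MovieID')` matches the `MovieID` column against `other`'s *index*, and `dfc` keeps a default row-number index, so each ID picks up the title stored at that row position. A minimal sketch of a column-to-column lookup with `merge`, using small hypothetical stand-ins for the pickled `movie_freq` and `dfc` frames:

```python
import pandas as pd

# Hypothetical stand-ins; dfc has one row per rating, so titles repeat
# and its index is just a row counter
movie_freq = pd.DataFrame({"MovieID": [2858, 480], "Users_count": [3428, 2672]})
dfc = pd.DataFrame({
    "MovieID": [480, 2858, 480, 1193],
    "Title": ["Jurassic Park (1993)", "American Beauty (1999)",
              "Jurassic Park (1993)", "One Flew Over the Cuckoo's Nest (1975)"],
})

# One row per movie, then match on the MovieID column of both frames
movie_titles = dfc[["MovieID", "Title"]].drop_duplicates("MovieID")
titles_freq = (
    movie_freq.merge(movie_titles, on="MovieID", how="inner")
    [["MovieID", "Title", "Users_count"]]
    .sort_values("Users_count", ascending=False)
)
print(titles_freq)
```

Because the LDA cells below are built from `titles_freqc`, re-running them after a key-based lookup would change the vocabulary and topics shown in this notebook.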


# Latent variable modelling with LDA

# Pending

_Latent variable models_ are different in that, instead of attempting to recreate the rules of language, we'll try to understand language based on **how** the words are used. For example, we won't attempt to learn that 'bad' and 'badly' are related because they share the same root; instead, we'll determine that they are related because they are often used in the same way or near the same words.

We'll use _unsupervised_ learning techniques (discovering patterns or structure) to extract the information.

Rather than inferring that 'Python' and 'C++' are both programming languages because they are often nouns preceded by the verb 'program' or 'code', we'll infer a category by identifying that they are often used in the same way. We won't need to guide the model with particular phrases or part-of-speech patterns.

# Pending

_Latent variable models_ are models in which we assume that the data we are observing has some hidden, underlying structure that we can't see, and which we'd like to learn. These hidden, underlying structures are the _latent_ variables we want our model to understand.

Text processing is a common application of latent variable models. Again, in the classical sense we know that language is built from a set of pre-structured grammar rules and vocabulary; however, we also know that we break those rules pretty often and create new words that get added to our vocabulary (see: selfie).

Instead of attempting to learn the rules of 'proper' grammar, we look to uncover the hidden structure and ignore preexisting rules (which might not even describe our syntax anyway). Sometimes the hidden structures we uncover _are_ the basic rules of our language, but sometimes they may also unveil something new.

These techniques are commonly used for recommending news articles, or for mining large troves of data to find commonalities. Topic modeling, the method used in this notebook, is used in the [NY times recommendation engine](http://open.blogs.nytimes.com/2015/08/11/building-the-next-new-york-times-recommendation-engine/?_r=0). They attempt to map their articles to a latent space (or underlying structure) of topics using the content of the article.

[Lyst](http://developers.lyst.com/2014/11/11/word-embeddings-for-fashion/), an online fashion retailer, uses latent representations of clothing descriptions to find similar clothing. If we can map phrases like 'chelsea boot' or 'felted hat' to some underlying structure, we can use that new structure to find similar products.

# Pending

Our previous 'representation' of a set of text documents (articles) for classification was a matrix with one row per document and one column per word (or _n-gram_).

![Word Factorization Matrix](./assets/images/word-matrix-factorization.png)

While this does sum up most of the information, it does drop a few things - mostly structure and order. Additionally, many of the columns may be dependent on each other (or correlated).

For example, an article that contains the word 'IPO' is also likely to contain the word 'stocks' or 'NASDAQ'. Therefore, those columns are repetitive and likely represent the same 'concept' or idea. For classification, we may not care whether the document has the word 'IPO', 'NASDAQ' or 'stocks', only that it has financial-related words.

One way to do this is with regularization - `L1` or `lasso` regularization tends to remove repetitive features by bringing their learned coefficients to 0.

Another is to perform `dimensionality reduction`, where we first identify the correlated columns and then replace them with a column that represents the concept they have in common. For instance, we could replace the 'IPO', 'stocks', and 'NASDAQ' columns with a single 'HasFinancialWords' column.

There are many techniques to do this automatically and most follow a very similar approach:

1. Identify correlated columns
2. Replace them with a new column that encapsulates the others

The techniques vary in how they define correlation and how much of the relationship between the original and new columns you need to save.

There are many dimensionality-reduction techniques built into `scikit-learn`. One of the most common is **PCA** or **Principal Components Analysis**. Like most of the models we've seen, dimensionality-reduction techniques can be either _linear_ or _non-linear_, meaning that they pick up linear or non-linear correlations between columns.

**PCA** when applied to text data is sometimes known as **LSI** or **Latent Semantic Indexing**.
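The "replace correlated columns with a concept column" idea can be sketched with NumPy's SVD on a tiny hypothetical document-term matrix. LSI applies a truncated SVD directly to the document-term counts, whereas PCA runs the same decomposition on mean-centred columns. Because 'ipo', 'stocks' and 'nasdaq' always co-occur here, the first component loads equally on those three words and plays the role of a 'HasFinancialWords' column.

```python
import numpy as np

terms = ["ipo", "stocks", "nasdaq", "goal", "match"]
# Hypothetical counts: two finance articles, then two sports articles
X = np.array([
    [1, 1, 1, 0, 0],
    [2, 2, 2, 0, 0],
    [0, 0, 0, 1, 1],
    [0, 0, 0, 2, 2],
], dtype=float)

# Truncated SVD (as in LSI): keep the top k components
U, s, Vt = np.linalg.svd(X, full_matrices=False)
k = 2
concepts = Vt[:k]            # how strongly each term loads on each concept
X_reduced = X @ concepts.T   # documents re-expressed with k concept columns

for i, row in enumerate(concepts):
    loadings = [round(float(v), 2) for v in np.abs(row)]
    print(f"concept {i}:", dict(zip(terms, loadings)))
```

Five word columns become two concept columns; the signs of SVD components are arbitrary, which is why the loadings are printed as absolute values.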


# Pending

Mixture models (and specifically **LDA** or **Latent Dirichlet Allocation**) take this concept further and generate more structure around the documents. Instead of just replacing correlated columns, we create clusters of common words and generate probability distributions to explicitly state how related words are.

To understand this better, let's imagine a new way to generate text:

1. Start writing a document
    1. First choose a topic (sports, news, science)
        1. Choose a word from that topic
    2. Repeat
2. Repeat for the next document

What this 'model' of text is assuming is that each document is some _mixture_ of topics. It may be mostly science, but may contain some business information. The _latent_ structure we want to uncover are the topics (or concepts) that generated that text.
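The generative story above can be simulated in a few lines of NumPy. The topics and their three-word vocabularies are hypothetical, each topic picks its words uniformly for brevity (real LDA topics have non-uniform word distributions), and the symmetric Dirichlet prior `alpha` is an assumption. LDA's job is to run this story in reverse and infer the mixtures and word distributions from finished documents.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical topics and the words each one can produce
topics = {
    "sports": ["goal", "match", "team"],
    "news": ["election", "vote", "minister"],
    "science": ["atom", "cell", "theory"],
}
names = list(topics)
alpha = np.ones(len(names))  # symmetric Dirichlet prior (assumed)


def generate_document(n_words):
    # Each document gets its own mixture of topics
    theta = rng.dirichlet(alpha)
    words = []
    for _ in range(n_words):
        topic = names[rng.choice(len(names), p=theta)]  # 1. choose a topic
        words.append(str(rng.choice(topics[topic])))    # 2. choose a word from it
    return theta, words


theta, words = generate_document(10)
print(np.round(theta, 2), words)
```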

_Latent Dirichlet Allocation_ is a model that assumes this is the way text is generated and then attempts to learn two things:

1. What is the _word distribution_ of each topic?
2. What is the _topic distribution_ of each document?

The _word distribution_ is a multinomial distribution for each topic representing what words are most likely from that topic.

Let's say we have 3 topics: sports, business, science.
For each topic, we uncover the words most likely to come from it; that is, for each word and topic pair, we learn `P(word | topic)`.

The _topic distribution_ is a multinomial distribution for each document representing which topics are most likely to be in that document. For all documents, we then have a distribution over {sports, science, business}
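Combining the two distributions gives each document's word probabilities: P(word | doc) is the sum over topics k of P(word | k) · P(k | doc). A NumPy sketch with hypothetical numbers, where `phi` holds the word distribution of each topic and `theta` the topic distribution of each document:

```python
import numpy as np

vocab = ["goal", "match", "stock", "ipo", "atom"]

# P(word | topic): one row per topic, each row sums to 1 (hypothetical)
phi = np.array([
    [0.50, 0.40, 0.05, 0.00, 0.05],   # sports
    [0.00, 0.05, 0.50, 0.45, 0.00],   # business
    [0.05, 0.00, 0.05, 0.00, 0.90],   # science
])
# P(topic | document) for two documents (hypothetical)
theta = np.array([
    [0.8, 0.1, 0.1],   # mostly sports
    [0.1, 0.2, 0.7],   # mostly science
])

# Word distribution of each document, mixing the topics' word distributions
p_word_given_doc = theta @ phi
for d, row in enumerate(p_word_given_doc):
    print(f"doc {d}: most likely word = {vocab[int(row.argmax())]}")
```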

Topic models are useful for organizing a collection of documents and uncovering the main underlying concepts.

There are also many variants that attempt to incorporate more structure into the 'model':

 - Supervised Topic Models
    - Guide the process with pre-decided topics
 - Position-dependent topic models
    - Ignore which words occur in what document but instead focus on _where_ they occur
 - Variable number of topics
    - Test a different number of topics to find the best model

# Pending

- Latent variable models attempt to uncover structure from text.
- Dimensionality reduction is focused on replacing correlated columns.
- Topic modeling (or LDA) uncovers the topics that are most common to each document and then the words most common to those topics.
- Word2Vec builds a representation of a word from the way it was used originally.
- Both techniques avoid learning grammar rules and instead rely on large datasets. They learn based on how the words are used, making them very flexible.

In [6]:
titles_freqc.head()

| | MovieID | Title | Users_count |
|---|---|---|---|
| 2651 | 2858 | Misérables, Les (1998) | 3428 |
| 466 | 480 | Superman (1978) | 2672 |
| 1848 | 2028 | Babes in Toyland (1961) | 2653 |
| 2374 | 2571 | Brady Bunch Movie, The (1995) | 2590 |
| 1178 | 1270 | Four Weddings and a Funeral (1994) | 2583 |


# LDA in Gensim

In [51]:
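#Adding custom stop words to the exclusion list

def cust_stop_list(lists):
    '''
    Take an iterable of strings (titles), split each one on spaces and
    collect the words that contain brackets, commas or selected years.
    Returns them as a stop list.
    '''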

    stop_list = []

    for line in lists:
        words = line.split(' ')
        for word in words:
            if '(' in word:
                stop_list.append(word)
            if ')' in word:
                stop_list.append(word)
            if ',' in word:
                stop_list.append(word)
            if '1930' in word:
                stop_list.append(word)  
            if '1945' in word:
                stop_list.append(word)
            if '1951' in word:
                stop_list.append(word)
            if '1961' in word:
                stop_list.append(word)
            if '1967' in word:
                stop_list.append(word)
            if '1978' in word:
                stop_list.append(word)
            if '1987' in word:
                stop_list.append(word)
            if '1988' in word:
                stop_list.append(word)
            if '1992' in word:
                stop_list.append(word)
            if '1994' in word:
                stop_list.append(word)
            if '1995' in word:
                stop_list.append(word)
            if '1997' in word:
                stop_list.append(word)
            if '1998' in word:
                stop_list.append(word)
            if '1999' in word:
                stop_list.append(word)
            if '2000' in word:
                stop_list.append(word)
    return(stop_list)

#The year conditionals should be done with a regex, but I can't make it work yet

stop_list = cust_stop_list(titles_freqc.Title)
print(stop_list[:20])

['Misérables,', '(1998)', '(1998)', '(1998)', '(1978)', '(1978)', '(1978)', '(1961)', '(1961)', '(1961)', 'Movie,', '(1995)', '(1995)', '(1995)', '(1994)', '(1994)', '(1994)', '(1997)', '(1997)', '(1997)']


In [52]:
#Checking stop words list

stop_words = text.ENGLISH_STOP_WORDS
print(stop_words)

frozenset({'de', 'i', 'thereupon', 'at', 'hers', 'never', 'eight', 'so', 'more', 'somehow', 'only', 'thus', 'put', 'alone', 'except', 'give', 'yet', 'over', 'although', 'often', 'seems', 'sometime', 'you', 'yourselves', 'nobody', 'hence', 'part', 'thin', 'please', 'we', 'whereby', 'their', 'show', 'ltd', 'whither', 'yours', 'empty', 'anywhere', 'its', 'without', 'now', 'very', 'been', 'do', 'am', 'onto', 'my', 'several', 'first', 'our', 'everything', 'next', 'become', 'former', 'seem', 'there', 'therein', 'her', 'off', 'whoever', 'beyond', 'fire', 'together', 'through', 'itself', 'whence', 'then', 'always', 'though', 'anything', 'eg', 'into', 'nowhere', 'sixty', 'this', 'down', 'thereby', 'among', 'may', 'seemed', 'who', 'indeed', 'mill', 'due', 'side', 'three', 'same', 'fill', 'cannot', 'if', 'me', 'his', 'inc', 'below', 'other', 'any', 'than', 'beside', 'above', 'an', 'call', 'for', 'how', 'she', 'whom', 'amoungst', 'eleven', 'whether', 'even', 'besides', 'being', 'elsewhere', 'whate

In [53]:
#Adding custom stop words list

cust_stop_words = stop_words.union(stop_list)
print(cust_stop_words)

frozenset({'de', 'i', 'hers', 'eight', 'so', 'more', 'except', 'yet', 'although', 'sometime', 'hence', 'part', 'thin', 'please', 'we', 'ltd', 'whither', 'its', 'without', 'very', 'been', 'am', 'my', 'several', 'there', 'our', 'former', 'whoever', 'her', 'together', 'itself', 'then', 'always', 'though', 'eg', 'nowhere', 'this', 'among', 'may', 'who', 'due', 'three', 'same', 'cannot', 'me', 'other', 'his', 'below', 'than', 'beside', 'above', 'how', 'she', 'amoungst', 'eleven', 'whether', 'even', 'being', 'whatever', 'as', 'the', 'must', 'fifteen', 'each', 'seeming', 'whereas', '(1930)', 'thick', 'back', 'because', 'made', 'they', 'un', 're', 'was', 'neither', 'can', 'cry', 'thru', 'moreover', 'etc', 'throughout', 'anyhow', 'a', 'others', 'sometimes', 'these', 'from', 'keep', 'almost', '(1945)', 'every', 'Hime)', 'con', 'no', 'where', 'ours', 'your', 'serious', 'rather', 'enough', 'detail', 'would', 'against', 'should', 'twenty', 'here', 'themselves', 'herein', 'in', 'that', 'hereafter', 

In [54]:
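df = pd.DataFrame({'titles': titles_freqc.Title})

# Recent scikit-learn versions expect stop_words as a list (or 'english')
cv = CountVectorizer(binary=False, stop_words=list(cust_stop_words))

docs = cv.fit_transform(df['titles'].dropna())

#Build a mapping of numerical ID to word
#(get_feature_names() was removed in scikit-learn 1.2; get_feature_names_out() replaces it)
id2word = dict(enumerate(cv.get_feature_names_out()))

id2word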

{0: '1930',
 1: '1945',
 2: '1951',
 3: '1961',
 4: '1967',
 5: '1978',
 6: '1987',
 7: '1988',
 8: '1992',
 9: '1994',
 10: '1995',
 11: '1997',
 12: '1998',
 13: '1999',
 14: '2000',
 15: 'act',
 16: 'babes',
 17: 'big',
 18: 'boiled',
 19: 'bonnie',
 20: 'brady',
 21: 'bunch',
 22: 'clyde',
 23: 'dracula',
 24: 'election',
 25: 'funeral',
 26: 'hard',
 27: 'heart',
 28: 'hill',
 29: 'hime',
 30: 'house',
 31: 'lashou',
 32: 'les',
 33: 'man',
 34: 'misérables',
 35: 'mononoke',
 36: 'movie',
 37: 'notting',
 38: 'princess',
 39: 'quiet',
 40: 'running',
 41: 'shentan',
 42: 'sister',
 43: 'strangers',
 44: 'superman',
 45: 'thing',
 46: 'titanic',
 47: 'toyland',
 48: 'train',
 49: 'untouchables',
 50: 'weddings',
 51: 'western',
 52: 'world'}

In [55]:
#We convert our word matrix into gensim's format (each row of docs is a document)
corpus = Sparse2Corpus(docs, documents_columns=False)

#Then we fit an LDA model. I will check for 10 topics for now: my movies are spread across
#5 main genres as per the exploratory analysis findings, and I want an initial look at the results
lda_model = LdaModel(corpus=corpus, id2word=id2word, num_topics=10)
lda_model.show_topics()

[(0,
  '0.133*"1967" + 0.133*"clyde" + 0.133*"bonnie" + 0.012*"1987" + 0.012*"1999" + 0.012*"election" + 0.012*"1951" + 0.012*"heart" + 0.012*"1988" + 0.012*"1997"'),
 (1,
  '0.107*"1998" + 0.107*"les" + 0.107*"misérables" + 0.107*"1999" + 0.107*"election" + 0.010*"1951" + 0.010*"1987" + 0.010*"1997" + 0.010*"heart" + 0.010*"1992"'),
 (2,
  '0.133*"1999" + 0.133*"hill" + 0.133*"notting" + 0.012*"1987" + 0.012*"1951" + 0.012*"1997" + 0.012*"big" + 0.012*"superman" + 0.012*"1992" + 0.012*"titanic"'),
 (3,
  '0.133*"world" + 0.133*"thing" + 0.133*"1951" + 0.012*"1987" + 0.012*"1999" + 0.012*"election" + 0.012*"1997" + 0.012*"big" + 0.012*"titanic" + 0.012*"superman"'),
 (4,
  '0.072*"1992" + 0.072*"hard" + 0.072*"boiled" + 0.072*"shentan" + 0.072*"lashou" + 0.072*"toyland" + 0.072*"1961" + 0.072*"babes" + 0.072*"untouchables" + 0.072*"1987"'),
 (5,
  '0.103*"mononoke" + 0.103*"1997" + 0.054*"princess" + 0.054*"hime" + 0.054*"funeral" + 0.054*"1951" + 0.054*"train" + 0.054*"1994" + 0.054*"

# Evaluating the model fit

In the model above, we need to explicitly specify the number of topics we want the model to uncover. This is a critical step but unfortunately there is not a lot of guidance on the best way to select it. Having domain knowledge about your data may help.

Once we have fit this model, as with other unsupervised learning techniques, validation is mostly a matter of interpretation:

- Did we learn reasonable topics?
- Do the words that make up a topic make sense?

We can evaluate this by viewing the top words for each topic; gensim's `show_topics` method does this.
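Beyond reading the top words, topic coherence gives a rough numeric check that can be compared across different numbers of topics, though it does not settle the choice on its own. A pure-Python sketch of the UMass coherence score, which sums log((D(w_i, w_j) + 1) / D(w_j)) over pairs of a topic's top words (w_j ranked above w_i), where D counts the documents containing the word(s). The tokenized titles below are hypothetical, and the sketch assumes every top word occurs in at least one document. gensim also provides a `CoherenceModel` class with a `'u_mass'` option.

```python
import math


def umass_coherence(top_words, documents):
    """UMass coherence of one topic (higher, i.e. closer to 0, is better).

    top_words: the topic's top words, most probable first.
    documents: list of sets of words, one set per document.
    """
    def doc_freq(*words):
        return sum(1 for doc in documents if all(w in doc for w in words))

    score = 0.0
    for i in range(1, len(top_words)):
        for j in range(i):
            co_occurrences = doc_freq(top_words[i], top_words[j])
            score += math.log((co_occurrences + 1) / doc_freq(top_words[j]))
    return score


# Hypothetical tokenized titles
docs = [{"bonnie", "clyde"}, {"bonnie", "clyde"}, {"notting", "hill"}, {"superman"}]
print(umass_coherence(["bonnie", "clyde"], docs))     # words that co-occur
print(umass_coherence(["bonnie", "superman"], docs))  # words that never co-occur
```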

In [56]:
# A way of evaluating the model fit is by looking at the top words in each topic with gensim's show_topics method.
#I expect some topics to represent some concepts clearly but others not so much.
num_topics = 10
num_words_per_topic = 5
for ti, topic in enumerate(lda_model.show_topics(num_topics=num_topics, num_words=num_words_per_topic)):
    print("Topic: %d" % ti)
    print(topic)
    print()

Topic: 0
(0, '0.133*"1967" + 0.133*"clyde" + 0.133*"bonnie" + 0.012*"1987" + 0.012*"1999"')

Topic: 1
(1, '0.107*"1998" + 0.107*"les" + 0.107*"misérables" + 0.107*"1999" + 0.107*"election"')

Topic: 2
(2, '0.133*"1999" + 0.133*"hill" + 0.133*"notting" + 0.012*"1987" + 0.012*"1951"')

Topic: 3
(3, '0.133*"world" + 0.133*"thing" + 0.133*"1951" + 0.012*"1987" + 0.012*"1999"')

Topic: 4
(4, '0.072*"1992" + 0.072*"hard" + 0.072*"boiled" + 0.072*"shentan" + 0.072*"lashou"')

Topic: 5
(5, '0.103*"mononoke" + 0.103*"1997" + 0.054*"princess" + 0.054*"hime" + 0.054*"funeral"')

Topic: 6
(6, '0.097*"house" + 0.097*"sister" + 0.097*"1945" + 0.097*"1992" + 0.097*"act"')

Topic: 7
(7, '0.118*"1978" + 0.118*"2000" + 0.118*"heart" + 0.118*"superman" + 0.011*"1987"')

Topic: 8
(8, '0.019*"1999" + 0.019*"1987" + 0.019*"1951" + 0.019*"superman" + 0.019*"1997"')

Topic: 9
(9, '0.072*"1995" + 0.072*"movie" + 0.072*"man" + 0.072*"1987" + 0.072*"western"')



I am having issues removing the years from my titles, which would make the words within each topic more meaningful.
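A likely reason the custom stop list has no effect: `CountVectorizer` lowercases and tokenizes each title first (the default `token_pattern` keeps runs of two or more word characters) and only then drops tokens found in `stop_words`. 'Misérables, Les (1998)' becomes the tokens 'misérables', 'les' and '1998', none of which equals the stop-list entries 'Misérables,' or '(1998)'. One option, sketched below under the assumption that titles end in a four-digit year in parentheses (as in the MovieLens titles above), is to strip the year with a regular expression before vectorizing. `strip_year` is a hypothetical helper.

```python
import re

# Trailing " (YYYY)" at the end of a MovieLens-style title
YEAR_SUFFIX = re.compile(r"\s*\(\d{4}\)\s*$")


def strip_year(title):
    """Remove a trailing '(YYYY)' from a movie title."""
    return YEAR_SUFFIX.sub("", title)


titles = ["Misérables, Les (1998)", "Superman (1978)", "Brady Bunch Movie, The (1995)"]
clean_titles = [strip_year(t) for t in titles]
print(clean_titles)
# ['Misérables, Les', 'Superman', 'Brady Bunch Movie, The']
```

In the notebook this would replace the year checks, e.g. `df = pd.DataFrame({'titles': titles_freqc.Title.map(strip_year)})`. An alternative is to keep the titles as they are and pass `token_pattern=r'(?u)\b[^\W\d]{2,}\b'` to `CountVectorizer`, so that only tokens made of two or more letters are kept and digit-only tokens such as '1998' never enter the vocabulary.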

# Model interpretation

# Improvement areas

# Model optimisation and associated costs/benefits