# Initial Solution

## Description
This solution uses the simplest available method to solve the problem and produces a baseline score against which all future attempts can be compared. I perform a simple clean-up of the text data and then use exact matching on the cleaned text. The number of matches found for each row can then serve as a check on more complex matching attempts: if a future solution produces a worse result than this method, I know that it contains an error.
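
As a minimal sketch of how the baseline could serve as that check, the snippet below compares per-row match counts from a later method against the baseline counts and lists the rows where the later method finds fewer titles. The function name `rows_below_baseline` and the candidate counts are hypothetical, and "worse" is assumed here to mean fewer matches for a row.

```python
def rows_below_baseline(baseline_counts, candidate_counts):
    """Return the positions where the candidate finds fewer matches than the baseline."""
    if len(baseline_counts) != len(candidate_counts):
        raise ValueError("both methods must score the same rows")
    return [
        i
        for i, (base, cand) in enumerate(zip(baseline_counts, candidate_counts))
        if cand < base
    ]


# Illustrative counts for five responses; the candidate values are made up.
baseline = [4, 3, 2, 3, 6]
candidate = [4, 5, 3, 2, 6]

flagged = rows_below_baseline(baseline, candidate)
print(flagged)  # positions that deserve a closer look
```

Rows returned by this check are candidates for review: a lower count points to a likely regression, but it is worth inspecting the row before concluding that the new method is at fault.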

# Importing The Data

In [66]:
import numpy as np
import pandas as pd
import re
pd.set_option('display.width', 1000)
pd.set_option('display.max_colwidth', 200)

In [67]:
content_titles_df = pd.read_csv('ProvidedFiles/Content sample.csv')

In [68]:
content_titles_df.head()

Unnamed: 0,Content_name
0,Fear the Walking Dead
1,Supernatural
2,The Gentlemen
3,Outlander
4,The Good Doctor


In [69]:
survey_response_df = pd.read_csv('ProvidedFiles/Survey response sample data.csv')

## Sample Survey Responses

In [70]:
df_styler = survey_response_df.style.set_properties(**{'text-align': 'left'})
df_styler.set_table_styles([dict(selector='th', props=[('text-align', 'left')])])

Unnamed: 0,Customer_id,Response
0,1,"Fear the walking dead,Supernatural (huge fan and sad it has finished),The Gentlemen, Outlander"
1,2,A lot! -good doctor -gangs of London - the gentleman -ma -spies in disguise
2,3,"Miss scarlet and the duke,knifes out,Dublin murders"
3,4,"History drama-Vikings,Kid friendly-Casper,Sometimes the conversations while watching Neon can get serious but we all end up having fun together, :)"
4,5,"The Undoing,Game of thrones,Outlander, Vikings,CB Strike (and most all British dramas) Westworld"


# Data Transformations

## Survey Response Transformations

In [71]:
# Lower-case each response and replace every non-word character with a space.
survey_response_df['transformed'] = survey_response_df.apply(lambda x: re.sub(r'[^\w]', ' ', x['Response'].lower()), axis=1)
survey_response_df.head()

Unnamed: 0,Customer_id,Response,transformed
0,1,"Fear the walking dead,Supernatural (huge fan and sad it has finished),The Gentlemen, Outlander",fear the walking dead supernatural huge fan and sad it has finished the gentlemen outlander
1,2,A lot!\n\n-good doctor\n-gangs of London \n- the gentleman\n-ma \n-spies in disguise,a lot good doctor gangs of london the gentleman ma spies in disguise
2,3,"Miss scarlet and the duke,knifes out,Dublin murders",miss scarlet and the duke knifes out dublin murders
3,4,"History drama-Vikings,Kid friendly-Casper,Sometimes the conversations while watching Neon can get serious but we all end up having fun together, :)",history drama vikings kid friendly casper sometimes the conversations while watching neon can get serious but we all end up having fun together
4,5,"The Undoing,Game of thrones,Outlander, Vikings,CB Strike (and most all British dramas) Westworld",the undoing game of thrones outlander vikings cb strike and most all british dramas westworld


## Content Title Transformations

In [72]:
# Apply the same clean-up to the titles so both sides are compared in the same form.
content_titles_df['transformed'] = content_titles_df.apply(lambda x: re.sub(r'[^\w]', ' ', x['Content_name'].lower()), axis=1)
content_titles_df.head()

Unnamed: 0,Content_name,transformed
0,Fear the Walking Dead,fear the walking dead
1,Supernatural,supernatural
2,The Gentlemen,the gentlemen
3,Outlander,outlander
4,The Good Doctor,the good doctor
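
Both transformations apply the same clean-up: lower-case the text, then replace every non-word character with a space. The sketch below expresses that step as a standalone function using only `re`. The name `clean_text` and the optional `collapse` flag are assumptions added for illustration and are not part of the notebook. The substitution alone keeps one space per replaced character, so a sequence such as `, ` becomes two spaces, which the rendered tables do not show. Collapsing whitespace is one possible way to keep multi-word titles comparable.

```python
import re


def clean_text(text, collapse=False):
    """Lower-case text and replace each non-word character with a space."""
    cleaned = re.sub(r'[^\w]', ' ', text.lower())
    if collapse:
        # Optional extra step (assumption, not in the notebook):
        # squeeze runs of whitespace down to a single space.
        cleaned = ' '.join(cleaned.split())
    return cleaned


sample = "The Gentlemen, Outlander"
print(repr(clean_text(sample)))                 # 'the gentlemen  outlander'
print(repr(clean_text(sample, collapse=True)))  # 'the gentlemen outlander'
```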


# Finding the Matches

In [73]:
titles_found_list = []
number_of_matches_list = []
# For every survey response, test each cleaned title against the cleaned response.
for index_2, row_2 in survey_response_df.iterrows():
    number_of_matches = 0
    titles_found = ''
    for index_1, row_1 in content_titles_df.iterrows():
        keyword = row_1['transformed']
        # Plain substring test: a short title such as "ma" also matches inside longer words.
        if keyword in row_2['transformed']:
            number_of_matches += 1
            titles_found += row_1['Content_name'] + ', '
    number_of_matches_list.append(number_of_matches)
    titles_found_list.append(titles_found)
# Collect the per-response counts and matched titles into a result table.
result_dict = {
    'Customer_id': survey_response_df['Customer_id'],
    'Response': survey_response_df['Response'],
    'Number_of_matches': number_of_matches_list,
    'Titles_found': titles_found_list}
result_df = pd.DataFrame(result_dict)
result_df.head()

Unnamed: 0,Customer_id,Response,Number_of_matches,Titles_found
0,1,"Fear the walking dead,Supernatural (huge fan and sad it has finished),The Gentlemen, Outlander",4,"Fear the Walking Dead, Supernatural, The Gentlemen, Outlander,"
1,2,A lot!\n\n-good doctor\n-gangs of London \n- the gentleman\n-ma \n-spies in disguise,3,"Gangs of London, Ma, Spies in Disguise,"
2,3,"Miss scarlet and the duke,knifes out,Dublin murders",2,"Miss Scarlet and the Duke, Dublin Murders,"
3,4,"History drama-Vikings,Kid friendly-Casper,Sometimes the conversations while watching Neon can get serious but we all end up having fun together, :)",3,"Ma, Vikings, Casper,"
4,5,"The Undoing,Game of thrones,Outlander, Vikings,CB Strike (and most all British dramas) Westworld",6,"Outlander, Ma, Vikings, The Undoing, Game of Thrones, Westworld,"
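
The substring test explains why "Ma" is reported for customers 4 and 5: their cleaned responses contain "drama" and "dramas", and "ma" is a substring of both. A minimal sketch of one possible refinement, whole-word matching with `re`, is shown below. The function name `contains_title` is hypothetical, the responses are abbreviated, and this is an assumption about a next step rather than part of the baseline.

```python
import re


def contains_title(cleaned_title, cleaned_response):
    """Return True if the title occurs in the response as whole words."""
    # \b anchors the title at word boundaries; re.escape keeps the title literal.
    pattern = r'\b' + re.escape(cleaned_title) + r'\b'
    return re.search(pattern, cleaned_response) is not None


# Abbreviated cleaned responses for customers 4 and 2.
response_4 = "history drama vikings kid friendly casper"
response_2 = "a lot good doctor gangs of london the gentleman ma spies in disguise"

print("ma" in response_4)                     # True: substring hit inside "drama"
print(contains_title("ma", response_4))       # False
print(contains_title("ma", response_2))       # True
print(contains_title("vikings", response_4))  # True
```

Whole-word matching would lower the match counts for rows where the baseline picked up a title only by accident, so those rows are expected to differ from the baseline.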
