# Loading

In [1]:
import autocompleter 
autocompl = autocompleter.Autocompleter()

In [2]:
df = autocompl.import_json("sample_conversations.json")
df.shape, df.columns

load json file...
(22264, 3)


((22264, 3), Index(['IsFromCustomer', 'Text', 'index'], dtype='object'))

The file contains about 22K messages exchanged between customers and representatives; the `index` column identifies the conversation each message belongs to.
For this project, we are only interested in completing the representative's messages.

In [3]:
df.head()

|   | IsFromCustomer | Text | index |
|---|---|---|---|
| 0 | True | Hi! I placed an order on your website and I ca... | 0 |
| 1 | True | I think I used my email address to log in. | 0 |
| 2 | True | My battery exploded! | 1 |
| 3 | True | It's on fire, it's melting the carpet! | 1 |
| 4 | True | What should I do! | 1 |


# Data Selection and Cleaning

The processing step keeps only the representative's messages and splits them into sentences on punctuation (the punctuation itself is kept). The resulting text is cleaned up with some light regex, and only sentences longer than one word are kept. Duplicate sentences are then counted and removed.

In [4]:
new_df = autocompl.process_data(df)
new_df.shape, new_df.columns

select representative threads...
split sentenses on punctuation...
Text Cleaning using simple regex...
calculate nb words of sentenses...
count occurence of sentenses...
remove duplicates (keep last)...
(8560, 5)


((8560, 5),
 Index(['IsFromCustomer', 'Text', 'index', 'nb_words', 'Counts'], dtype='object'))
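
To make the logged steps concrete, here is a minimal sketch of what such a pipeline could look like in pandas. It is not the actual `process_data` implementation: the function name `process_data_sketch`, the splitting regex, and the whitespace cleanup are assumptions chosen for illustration. Only the column names come from the outputs above.

```python
import re

import pandas as pd


def process_data_sketch(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical re-creation of the logged steps, not the real process_data."""
    # 1. Keep only the representative's messages.
    rep = df[~df["IsFromCustomer"].astype(bool)].copy()

    # 2. Split each message into sentences, keeping the trailing punctuation.
    #    (Assumed pattern: a run of non-terminators followed by any terminators.)
    rep["Text"] = rep["Text"].apply(lambda t: re.findall(r"[^.!?]+[.!?]*", t))
    rep = rep.explode("Text").dropna(subset=["Text"]).reset_index(drop=True)

    # 3. Light cleanup: collapse whitespace and trim (assumed cleaning rule).
    rep["Text"] = rep["Text"].str.replace(r"\s+", " ", regex=True).str.strip()

    # 4. Count words and keep sentences longer than one word.
    rep["nb_words"] = rep["Text"].str.split().str.len()
    rep = rep[rep["nb_words"] > 1].copy()

    # 5. Count how often each sentence occurs.
    rep["Counts"] = rep.groupby("Text")["Text"].transform("count")

    # 6. Remove duplicates, keeping the last occurrence.
    return rep.drop_duplicates(subset="Text", keep="last").reset_index(drop=True)


# Tiny made-up sample with the same columns as the real data.
sample = pd.DataFrame(
    {
        "IsFromCustomer": [True, False, False, False],
        "Text": [
            "My battery exploded!",
            "Hello. How can I help you?",
            "How can I help you?",
            "Ok.",
        ],
        "index": [0, 0, 1, 1],
    }
)
out = process_data_sketch(sample)
print(out[["Text", "nb_words", "Counts"]])
```

On this sample, the one-word sentences are dropped and the two identical sentences collapse into a single row with `Counts` equal to 2.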

# Model and TFIDF matrix

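A TF-IDF matrix is computed from the frequency of all the words in the data, using scikit-learn's `TfidfVectorizer`. Each row represents one sentence, and the matrix is used later to measure similarity between a prefix and the sentences.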

In [5]:
model_tf, tfidf_matrice = autocompl.calc_matrice(new_df)

tfidf_matrice  (8560, 99397)
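
As an illustration of this step, the sketch below fits a `TfidfVectorizer` on a few made-up sentences. The settings used inside `calc_matrice` are not shown in this notebook, so the `ngram_range=(1, 3)` argument here is an assumption, as are the variable names.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Made-up sentences standing in for the cleaned representative sentences.
sentences = [
    "How can I help you?",
    "How can I help you today?",
    "What is your order number?",
    "Let me look",
]

# Assumption: word n-grams up to length 3. The real settings are unknown.
# By default the vectorizer lowercases text, so matching is case-insensitive.
vectorizer = TfidfVectorizer(ngram_range=(1, 3))

# Sparse matrix with one row per sentence and one column per n-gram.
tfidf_matrix = vectorizer.fit_transform(sentences)
print(tfidf_matrix.shape)
```

The first dimension of the shape is the number of sentences and the second is the vocabulary size, which is how the `(8560, 99397)` output above should be read.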


# Ranking Function

Finally, the autocompleter computes the similarity between the prefix typed so far and each sentence in the data. As a weighting feature, we chose to reorder the most similar sentences by their frequency, so that the most common phrasings come first.
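
A minimal sketch of this idea follows, assuming cosine similarity over TF-IDF vectors followed by a re-sort on frequency. The function `rank_completions`, its parameters, and the sample data are hypothetical and are not the internals of `generate_completions`.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import linear_kernel


def rank_completions(prefix, sentences, counts, vectorizer, tfidf_matrix, top_n=3):
    """Hypothetical ranking: pick the most similar sentences, then sort by frequency."""
    # Project the prefix into the same TF-IDF space as the sentences.
    prefix_vec = vectorizer.transform([prefix])

    # TF-IDF rows are L2-normalised by default, so the dot product
    # computed by linear_kernel equals the cosine similarity.
    sims = linear_kernel(prefix_vec, tfidf_matrix).ravel()

    # Keep the top_n most similar sentences that share at least one term.
    candidates = [i for i in np.argsort(-sims, kind="stable")[:top_n] if sims[i] > 0]

    # Reorder the candidates by how often each sentence occurs in the data.
    candidates.sort(key=lambda i: counts[i], reverse=True)
    return [sentences[i] for i in candidates]


# Made-up sentences and frequencies for illustration only.
sentences = [
    "How can I help you?",
    "How can I help you today?",
    "What is your order number?",
    "Let me look",
]
counts = [10, 50, 20, 5]

vectorizer = TfidfVectorizer()
tfidf_matrix = vectorizer.fit_transform(sentences)

completions = rank_completions("how can", sentences, counts, vectorizer, tfidf_matrix)
print(completions)
```

Splitting the ranking into two stages means similarity decides which sentences are candidates, while frequency only decides their order. A sentence that is frequent but unrelated to the prefix never enters the list.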

Examples of auto completions:

In [6]:
prefix = 'What is your'

print(prefix,"\n")

autocompl.generate_completions(prefix, new_df, model_tf,tfidf_matrice)

What is your 



['What is your account number?',
 'What is your order number?',
 'What is your phone number?',
 'What is your address?',
 'What is your username?',
 'What is your order?',
 'What is your flight number?']

In [7]:
prefix = 'How can'
print(prefix," ")
autocompl.generate_completions(prefix, new_df, model_tf,tfidf_matrice)

How can  


['How can I help you?',
 'How can I help you today?',
 'Ok lets see how I can help',
 'How can we help you?',
 'Ok let me see how I can help',
 'How can we be of assistance to you?',
 'How may I help you?']

In [8]:
prefix = 'Let me'
print(prefix," ")
autocompl.generate_completions(prefix, new_df, model_tf,tfidf_matrice)

Let me  


['Let me investigate',
 'Let me assist you',
 'Let me look',
 'Let me know',
 'Let me help',
 'Let me help you',
 'Let me research']

In [9]:
prefix = 'when was'
print(prefix," ")
autocompl.generate_completions(prefix, new_df, model_tf,tfidf_matrice)

when was  


['When was the last time you changed your password?',
 'When was your flight scheduled for?',
 'When was the last time you tried?',
 'When was the last time you changed your password for the router?',
 'When was the last time you changed your wi-fi password?',
 'When was the last time you changed your password for your router?',
 'When was the last time you changed your password on modem/router?']

Now, with a lowercase prefix that contains only the important words:

In [10]:
prefix = 'when time password'
print(prefix," ")
autocompl.generate_completions(prefix, new_df, model_tf,tfidf_matrice)

when time password  


['When was the last time you changed your password?',
 'When you select you password?',
 'Take your time',
 'At this time',
 'When was the last time you changed your password for the router?',
 'When was the last time you changed your wi-fi password?',
 'When was the last time you changed your password for your router?']

In [11]:
prefix = 'how is the'
print(prefix," ")
autocompl.generate_completions(prefix, new_df, model_tf, tfidf_matrice)

how is the  


['How is the service?',
 'How is the reception at your house?',
 'How is the internet speed looking now?',
 'How is March 21st?',
 'How is your service now?',
 'How is your service at your house?',
 'What is the address?']

In [12]:
prefix = 'Are you'
print(prefix," ")
autocompl.generate_completions(prefix, new_df, model_tf, tfidf_matrice)

Are you  


['Are you still there?',
 'Are you available?',
 'Are you there?',
 'Are you still there',
 'Are you avilable?',
 'Are you aware of that?',
 'Are you aware of this?']