<a href="https://colab.research.google.com/github/gofornaman/Data-Science-Projects/blob/master/Customer_Queries_Classification_NLP.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Say What? Customer Query Classification with some simple A.I

![alt text](https://www.techbleez.com/wp-content/uploads/2018/06/Online-customer-support-with-AI.png)

I´m happy to welcome you back to yet another short tutorial on how to use Machine Learning to help  accelerate and facilitate things in your company. Whatever sector you might be in, I´m certain that you have clients. And clients have this pesky tendency of complaining about stuff or requesting help ...

Today's exercise will show you how we can take a written message from a customer requesting customer support and classify that message as we see fit. The messages I am going to be using as an example today actually come from *real* customers from my very *real* website. Don't worry, these messages are completely anonymized and do not contain any sensitive information. The website I run helps English teachers find easy classroom material, and every day teachers write to us directly asking for specific material. Now in my case, each request we receive falls into one of two categories: it is either a request that can be dealt with automatically (category *0*), or it is something more detailed and complex that needs manual, human intervention (category *1*). For your own case, the categories might vary: you could use urgent and non-urgent help requests, or even classify them into separate areas (category 0 messages for general support, category 1 messages go to the sales team, category 2 messages are for the fraud team, etc.). It all depends on your needs and context.

The idea here is for you to get an idea of how we can use A.I and M.L on raw text communication, so that it can inspire you to automate other parts of your company ( email classification, Chatbots, presentation summarizers etc., the list is endless!).

The great thing for our use case is that the data is extremely basic: we just need the raw message from our customer and its associated label. Have a look at the [example dataset that we'll be using here](https://docs.google.com/spreadsheets/d/1diEXKKl8j4SsqEopc7ln1NTZNzUZ6oFcN371XSkEoWo/edit#gid=0) from my users. So you can just scrape through all your past customer service requests, and that will be enough for you to automate this in the future.

So here is, as always, the order of proceedings to get this mission accomplished:

## Part 1: Data Cleaning (As always, the most important part)

## Part 2: Data Learning (As always, the surprisingly easiest part)

## Part 3: Data Predicting (As always, the most fun part if all goes well)



### Step 0- Importing libraries

But before all that, let´s import our libraries without which we couldn´t perform any magic tricks.

In [0]:
# A general purpose ML library that has almost everything we need
import sklearn
from  sklearn.feature_extraction.text import CountVectorizer, HashingVectorizer
from sklearn.decomposition import PCA
from sklearn.svm import LinearSVC
# for making linear algebra readable
import numpy as np

#Allows to upload and download files directly from the browser
from google.colab import files 


#Allows us to manipulate data in a quick and easy way
import pandas as pd

## Part 1- Data Cleaning

So let's get started with the data pre-processing or "cleaning". This first step has to be done before any actual Machine Learning, and it is usually the most crucial. Think of it like cooking a meal: even if you are the best cook in the world, if your ingredients are going off or are missing, then your food will not taste good no matter what fancy utensils or techniques you use. On the flip side, if you have fresh, high-quality ingredients, you can whip up a pretty delicious meal pretty simply.

The same analogy applies to M.L: if you don't have well-processed data, then you can forget about the rest. First of all, let's have a look at our dataset. As I mentioned before, we start off with two columns: the raw text input, and a second column giving us its category, being either 0 for a low-level request or 1 for an urgent query that requires human intervention:

In [0]:
data_url=("https://raw.githubusercontent.com/busyML/Customer-Queries-Classification-NLP/master/NLP%20INSTANTEACH%20ML%20-%20Sheet1.csv")

data=pd.read_csv(data_url)

data




Unnamed: 0,Input Query,Category
0,be or do,0
1,comparatives and superlatives,0
2,how to teach overcoming obstacles concept wit...,1
3,Present perfect simple or continuous,0
4,-,0
5,-----,0
6,1st conditional,0
7,2018 World Cup,1
8,2018 World Cup,1
9,2nd conditional,0


### Always Shuffle

So as we have seen in other tutorials, shuffling our data randomly should always be our first step. In our case, our data is ordered alphabetically, so we don´t want that to cause any unwanted biases to slip in. It’s always safest to shuffle our data before we even start touching it. And there is no excuse for not doing so when it can be done in 1 line of code:

In [0]:

#we use the "sample" command of pandas to shuffle our data, the random state means that we will always shuffle the data in the same way so that when different people load this code, they will all get the same results.
data= data.sample(frac=1, random_state=11)

#we print out the first 11 rows of our data to check that it has indeed been shuffled, on the left we have the index number which we can also think of as an ID number.

data.head(11)




Unnamed: 0,Input Query,Category
640,"reading, about 2 special schools in London and...",1
336,"grammar, functions, thematic reading, thematic...",0
972,Tenses,0
177,debate,0
328,grammar . speaking. writing,0
548,Places to go _ tourist,0
93,Camping,1
782,Texts and/or audios to expand practice on Simp...,0
80,Business,0
750,Sport,0


In [0]:
#data.to_csv("shuffled.csv")

#files.download("shuffled.csv")
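
If you're curious what the `random_state` actually buys us, here is a minimal sketch on a tiny made-up table (the rows below are invented for illustration, they are not the real dataset). It should show that the same seed gives the same shuffle on every run, and that shuffling only reorders rows without losing any:

```python
import pandas as pd

# A made-up, alphabetically ordered table standing in for the real queries
toy = pd.DataFrame({
    "Input Query": ["adjectives", "business", "camping", "debate", "tenses"],
    "Category": [0, 0, 1, 0, 0],
})

# Same seed -> same shuffle on every run
shuffled_a = toy.sample(frac=1, random_state=11)
shuffled_b = toy.sample(frac=1, random_state=11)
print(shuffled_a.equals(shuffled_b))

# The shuffle only reorders rows: every original index is still there exactly once
print(sorted(shuffled_a.index) == list(toy.index))

# Optional: renumber the rows 0..n-1 after shuffling
shuffled_reset = shuffled_a.reset_index(drop=True)
print(shuffled_reset.index.tolist())
```

Resetting the index is purely cosmetic here; the tutorial keeps the original index so that each row's "ID number" stays visible.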

### Step 1- Vectorization of Words

So there is a clear issue that you are probably wondering about, which is of course: how on earth can an algorithm read text? Well, the answer is, it doesn't. As is always the case, we are going to have to find a way to convert our raw text to numbers that our algorithm can learn from.

Fortunately, this is a well-studied topic and there are countless options we could use, ranging from the simplest to the most insanely complex.

We'll go through the simplest form, explain its limitations, and then move on to a medium-range solution that should work well in most cases:


***Counting Word Frequency***

So this is the granddad of word vectorization, the simplest way we can transform words to numbers: for each sentence, we count the number of times each word appears. When we are comparing different raw texts, we take the whole range of vocabulary across all of them and, for each sentence, count the frequency of every word in that total vocabulary. This simplistic technique is commonly called the "bag of words" technique.

![alt text](https://www.python-course.eu/images/bag_of_words.png)




To see this in practice, [here is how the dataset would look after the transformation at the following link](https://docs.google.com/spreadsheets/d/1diEXKKl8j4SsqEopc7ln1NTZNzUZ6oFcN371XSkEoWo/edit#gid=1530603417). 
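
To make the bag of words concrete before we run it on the real data, here is a small sketch on three made-up queries (invented for illustration). Each column is one word of the total vocabulary, and each row counts how often that word appears in the query:

```python
from sklearn.feature_extraction.text import CountVectorizer

# Three made-up queries (not from the real dataset)
toy_queries = ["past tense", "past perfect tense", "camping trip"]

toy_vectorizer = CountVectorizer()
toy_counts = toy_vectorizer.fit_transform(toy_queries).toarray()

# vocabulary_ maps each word to its column number (columns are in alphabetical order)
vocab = sorted(toy_vectorizer.vocabulary_, key=toy_vectorizer.vocabulary_.get)
print(vocab)

# One row of counts per query, one column per vocabulary word
for query, row in zip(toy_queries, toy_counts):
    print(query, "->", row.tolist())
```

Notice that three short queries already need five columns. Every new word adds another one, which is exactly the scaling problem discussed further down.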




In [0]:
count_vectorization = CountVectorizer()

count_example= (count_vectorization.fit_transform(data["Input Query"].values.astype('U'))).toarray()

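#"get_feature_names_out" gives us the vocabulary, one word per column (older versions of sklearn called this "get_feature_names", which has since been removed)
vocab_list=list(count_vectorization.get_feature_names_out())

#we put the counts in Pandas format and label each column with the word it counts
count_example =pd.DataFrame(count_example, columns=vocab_list)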
  

#count_example.to_csv("countexample.csv")

#files.download("countexample.csv")





However, this way of converting the text really isn't the best; sometimes the algorithm won't be able to learn much from it because of its simplistic nature. But most importantly, this count vectorization generates a very big dataset, and this can be a big problem. Take for instance our small dataset here: it only has about a thousand queries, and there are 1,200 unique vocabulary words. 1,000 x 1,200 = 1.2 million cells!

So even with just a small dataset of queries, our numerical dataset is huge. Now keep in mind that a medium-sized company will probably have at least 100,000 past examples with 10,000 individual words in them, and the size truly becomes mind-boggling. We need a more practical system that can scale better.
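
The arithmetic above can be sketched in a few lines. The helper below is hypothetical (it is not part of the tutorial's code) and assumes every cell is stored as a 64-bit number, i.e. 8 bytes, in a regular dense table:

```python
def dense_size(rows, columns, bytes_per_cell=8):
    """Return (number of cells, size in megabytes) of a dense rows x columns table."""
    cells = rows * columns
    megabytes = cells * bytes_per_cell / 1_000_000
    return cells, megabytes

# Our dataset: roughly 1,000 queries and 1,200 vocabulary words
small_cells, small_mb = dense_size(1_000, 1_200)

# The medium-sized company: 100,000 queries and 10,000 vocabulary words
big_cells, big_mb = dense_size(100_000, 10_000)

print(f"{small_cells:,} cells, about {small_mb:,.1f} MB")
print(f"{big_cells:,} cells, about {big_mb:,.1f} MB")
```

Under that 8-byte assumption, the second table holds a billion cells and would need about 8 GB of memory once converted to a dense array, which is more than many laptops can comfortably spare.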

**Hashing Vectorization**

So hashing vectorization uses a bit of good ol' math wizardry: a hash function turns each word into a column number between 0 and "n_features" minus 1, and the word's count is added to that column. Unlike the bag of words, no vocabulary has to be stored, and the number of columns is fixed in advance no matter how many different words show up. The trade-off is that two different words can occasionally land in the same column (a "collision"), and that we can no longer tell which word a column stands for. Note that hashing on its own does not learn how words relate to each other. That is the job of word embeddings such as Word2Vec, which place words as points in a space so that related words end up close together. For comparison, the following representations illustrate that embedding idea:

![alt text](https://www.tensorflow.org/images/linear-relationships.png)

![alt text](https://3qeqpr26caki16dnhd19sv6by6v-wpengine.netdna-ssl.com/wp-content/uploads/2017/08/Scatter-Plot-of-PCA-Projection-of-Word2Vec-Model.png)

In an embedding, words that are used in similar contexts cluster together. The hashed dataset we build below is simpler than that: it is a fixed-width table of normalized word counts. Once again, [if you want to see how the dataset ended up looking, have a gander over at this link.](https://docs.google.com/spreadsheets/d/1diEXKKl8j4SsqEopc7ln1NTZNzUZ6oFcN371XSkEoWo/edit#gid=281324509)
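
Here is a minimal sketch of what `HashingVectorizer` does on three made-up queries. To keep the output readable it uses only 8 columns and switches off normalization and sign alternation (`norm=None`, `alternate_sign=False`), which are choices for this illustration, not the settings used in the tutorial. Each word is hashed to a column number, so the width of the table is fixed up front:

```python
from sklearn.feature_extraction.text import HashingVectorizer

# Three made-up queries (not from the real dataset)
toy_queries = ["past tense", "past perfect tense", "camping trip"]

# 8 columns only, raw positive counts, purely to keep the output easy to read
toy_hasher = HashingVectorizer(n_features=8, norm=None, alternate_sign=False)

# No fitting needed: the hash function is fixed, so transform works straight away
hashed = toy_hasher.transform(toy_queries).toarray()
print(hashed.shape)
print(hashed)

# Each row still sums to the number of words in the query
print(hashed.sum(axis=1))

# A query full of words never seen before still fits into the same 8 columns
new_hashed = toy_hasher.transform(["teaching idioms with songs"]).toarray()
print(new_hashed.shape)
```

With so few columns, different words may well share a column. That is the price of the fixed width, and the reason the tutorial uses 1,024 columns rather than 8.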

In [0]:
#We call the Hashing Vectorizer from SKlearn, and we limit its columns to 1024 (2 to the tenth power)

vectorization= HashingVectorizer(n_features=2**10, norm = "l1")

#applying the vectorizer on our text
vec_counts = (vectorization.fit_transform(data["Input Query"].values.astype('U'))).toarray()

#putting our training data in Pandas format
training_data=pd.DataFrame(vec_counts)

#We create an excel file that contains the hashed training data
#training_data.to_excel("instanteachnlptraining.xlsx")

#We use the ".download" command to download the new excel file to our browser
#files.download("instanteachnlptraining.xlsx")

training_data.head(5)

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,1014,1015,1016,1017,1018,1019,1020,1021,1022,1023
0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,-0.027778,...,0.0,0.0,0.0,0.0,0.0,-0.027778,0.0,0.0,0.0,0.0
1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


Right, so after a few lines of code, we have our dataset. Most of the time when dealing with text input, this hash vectorization will be the way to go. The great thing here is that we get to cap the number of columns the conversion generates: the dataset will always have exactly 1,024 columns no matter how many new words appear, which makes adding more examples far less problematic. If you wish to use this code for yourself with something like 100,000 examples instead of only 1,000 like I have, you might want to increase "n_features" to 2^11 or 2^12. More columns mean fewer different words being squeezed into the same column.

However, although our dataset no longer grows wider with every new word, it is still rather big and may still be difficult to scale depending on your computing resources. Furthermore, if you look at our dataset, you'll see that there are many zeroes, which seems a bit unnecessary. If only there was a way to compress all these numbers so that they didn't take so much space in our computer's memory...

**Compressing the data with PCA**

There is a great technique for this called [Principal Component Analysis](https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.PCA.html), or PCA as every normal mortal calls it. PCA takes any dataset and basically boils it down to a more concise version, keeping only the most essential data and discarding the rest, allowing the dataset to become smaller. It´s like when you take someone´s rambling 3 paragraph email and turn it into a few bullet points. Or another analogy could be of taking a high resolution picture and converting it to a lower quality; we´ll still be able to see and understand what the picture is despite it being a bit fuzzier, but the picture will take up less space on our hard drive: 

![alt text](https://raw.githubusercontent.com/Benli11/data/master/img/reTiger.png)

Of course, the second picture is of less quality **but we can still clearly identify it as a tiger**. That's the core idea behind PCA.
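
To see the same idea with numbers instead of tigers, here is a small sketch on made-up data. The table below has 10 columns, but it is deliberately built from only 3 underlying "directions" plus a little noise, so most of its columns are redundant. Passing a number between 0 and 1 to `PCA` asks it for the fewest columns that keep that share of the variance:

```python
import numpy as np
from sklearn.decomposition import PCA

# Made-up data: 200 rows, 10 columns, built from 3 hidden directions plus a little noise
rng = np.random.default_rng(0)
hidden = rng.normal(size=(200, 3))
mixing = rng.normal(size=(3, 10))
toy_data = hidden @ mixing + 0.01 * rng.normal(size=(200, 10))

# Keep the fewest components that retain at least 99% of the variance
toy_pca = PCA(0.99)
toy_compressed = toy_pca.fit_transform(toy_data)

columns_kept = toy_compressed.shape[1]
variance_kept = toy_pca.explained_variance_ratio_.sum()

print("columns before:", toy_data.shape[1])
print("columns after:", columns_kept)
print("share of variance kept:", variance_kept)
```

Because the data was built from 3 directions, PCA should need no more than 3 columns here. Real text data is messier, so expect a less dramatic reduction.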



In [0]:
training_rows, training_columns= training_data.shape

#We load PCA from Sklearn, the "0.99" means that we want the compression to retain 99% of the variance (the "spread") of the original data.
pca_compressor=PCA(0.99)

#We use the PCA transformer to compress the data and we use the "DataFrame" function to convert it into our favorite pandas format
compressed_training_data = pd.DataFrame(pca_compressor.fit_transform(training_data))

#we get how many columns the new compressed dataset has so that we can compare with the original one.
compressed_rows, compressed_columns = compressed_training_data.shape

#We  print out the comparison between the two
print("number of columns before compression:",training_columns,"\n","number of columns after compression:" ,compressed_columns)

#compressed_training_data.to_excel("pcadata.xlsx")

#files.download("pcadata.xlsx")

compressed_training_data.head()


number of columns before compression: 1024 
 number of columns after compression: 371


Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,361,362,363,364,365,366,367,368,369,370
0,-0.018548,-0.009943,-0.041752,-0.009834,0.038122,-0.020786,0.03267,-0.002762,-0.022673,0.002803,...,0.000165,0.002007,0.000524,0.005563,0.005353,0.003982,-0.002125,0.005118,-0.002865,-0.002994
1,0.144641,-0.030201,-0.024569,-0.010797,0.046357,0.01659,0.104466,-0.018984,-0.016666,0.089007,...,0.000126,-0.004897,-0.001377,-0.005838,-0.000881,0.003819,0.00419,0.005479,-0.001715,-0.003087
2,-0.036206,-0.031946,0.059021,-0.153236,0.376132,0.819627,-0.13438,-0.002157,0.215932,-0.243564,...,0.000546,-0.000513,-4.7e-05,0.000207,-0.00024,-0.00101,0.001648,-0.00038,0.000367,0.001204
3,-0.017365,-0.006946,-0.032016,-0.010879,0.022846,-0.024185,-0.008863,0.004951,-0.010268,-0.015265,...,0.00011,0.000158,8.5e-05,3.7e-05,-8.8e-05,2.8e-05,1.1e-05,-3e-05,-6e-05,8.1e-05
4,0.311567,-0.059424,-0.079415,0.240529,-0.139072,0.096719,-0.001947,0.005382,-0.000205,0.01451,...,-0.002816,-0.003351,0.000699,0.001242,0.000354,2.3e-05,0.001958,-0.001988,-0.00283,-0.000235


Well, would you believe it! After applying the compression process, we only have 371 columns in our dataset. In other words, we've cut the number of columns by roughly 64% while giving up only about 1% of the variance in the original data. As you can see, we no longer have columns full of zeroes; the information has been packed into as few columns as possible. This makes our data far more scalable when adding examples to it in the future.
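
For the record, the size of the reduction can be worked out from the two column counts printed above:

```python
# Column counts taken from the output printed above
columns_before = 1024
columns_after = 371

columns_removed = columns_before - columns_after
reduction = 1 - columns_after / columns_before

print(f"columns removed: {columns_removed}")
print(f"size reduction: {reduction:.1%}")
```

The number of rows is untouched: PCA only shrinks the width of the table, so every query is still there.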

### The "answer" dataset

Well we now have our training data ready, all we need now is to create another, much smaller dataset that stores the attributed label, and we can do so with one line of code. Our algorithm will use this as its basis to learn and classify future queries.

In [0]:
training_answers= data["Category"]

training_answers.head(15)

640    1
336    0
972    0
177    0
328    0
548    0
93     1
782    0
80     0
750    0
69     0
792    0
672    1
38     0
950    1
Name: Category, dtype: int64

Our model will use these answers to "learn" the relation between each input text and its label category.

Of course, in your context, you might have several different categories assigned to your customer queries. The model should be able to handle more than two categories as well if you need it to.
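
As a rough sketch of that multi-category case, here are a few made-up queries with three hypothetical categories (0 = general support, 1 = sales, 2 = fraud; both the queries and the categories are invented for illustration). `LinearSVC` copes with more than two classes out of the box, using a one-vs-rest scheme by default:

```python
from sklearn.feature_extraction.text import HashingVectorizer
from sklearn.svm import LinearSVC

# Made-up queries with three hypothetical categories:
# 0 = general support, 1 = sales, 2 = fraud
toy_queries = [
    "how do I reset my password",
    "where can I change my email",
    "how do I update my profile",
    "I would like a quote for 50 licences",
    "do you offer a discount for schools",
    "what does the premium plan cost",
    "there is a charge I did not make",
    "someone used my card without permission",
    "I think my account was hacked",
]
toy_labels = [0, 0, 0, 1, 1, 1, 2, 2, 2]

# Same kind of hashing as in the tutorial, with fewer columns for this tiny example
toy_features = HashingVectorizer(n_features=2**8, norm="l1").transform(toy_queries)

# Nothing special is needed for three classes: the labels simply take three values
toy_model = LinearSVC(C=7)
toy_model.fit(toy_features, toy_labels)

toy_predictions = toy_model.predict(toy_features)
print(toy_model.classes_)
print(toy_predictions)
```

Nine examples are of course far too few to learn anything useful; the point is only that the code barely changes when a third category appears.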

# Learning from the data

So now the "surprisingly easy" part. Indeed, if our data has been well cleaned and formatted in the previous step, it should be relatively easy to get good results. First of all, this is a classification problem, so the first thing that should jump to our mind is: what's more dangerous for my company? A false positive or a false negative? Enter the concepts of Precision and Recall:

### Step 1- Choosing between Precision or Recall as our key metric:

First of all, let's define some useful vocabulary. This will help us to understand how well our model is performing. There are four types of predictions that our model could make: True Positive, False Positive, True Negative, False Negative. Let's define these terms:

True Positive: This is a good prediction, our model predicted that the query needed human intervention (emitted a 1) and the query did in reality need a human (category=1).

False Positive: This is an incorrect prediction, our model predicted that the query needed human intervention (emitted a 1), however it turns out that the query could in fact be dealt with automatically (category=0).

True Negative: This is again a correct prediction, our model said that the query could be dealt with automatically (it emitted a 0) and this turned out to be correct in the real world (category was equal to 0).

False Negative: Again, this is an incorrect prediction, but of a different kind. Here our model predicted that the query could be dealt with automatically (emitted a 0), but lo and behold, in reality the query did in fact need a human (category=1).

With this now clearly defined, we need to decide what evaluation metric we will use to evaluate our model.

This actually is a crucial concept, and we have three evaluation options to choose from:


*   **Precision** - High precision means that the model does not give many "False Positives".

*   **Recall** - High recall means that the model has a very low proportion of "False Negatives".

*   **F1 Score** - The F1 score is a balance between precision and recall (their harmonic mean), which makes it a great single measure when both kinds of error matter equally.

![alt text](https://upload.wikimedia.org/wikipedia/commons/thumb/2/26/Precisionrecall.svg/350px-Precisionrecall.svg.png)


Which of these metrics are most important to pay attention to? Well, this wholly depends on your context and what type of error is most costly to your business. Here we need to analyze the consequence of each type of error.

If, for example a false positive is very very costly, such as in the case of Face Recognition or Mortgage Approval, then we will want to maximise the Precision metric as much as possible.

However, if getting a false negative is even more problematic, as for example in cases such as Fraud Detection or Cancer Diagnosis, then we'll make sure our model has the highest Recall measure possible.

If obtaining a false negative is just as bad as a false positive and there is no difference between them, then we can simply use the F1 score.

Remember there is no right or wrong answer here because it is a judgment call based on your own individual context.
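
To make the three metrics tangible, here is a small sketch on ten made-up labels, where 1 is the "positive" class. It uses the ready-made functions from `sklearn.metrics`, and the comments show the formula behind each one:

```python
from sklearn.metrics import confusion_matrix, precision_score, recall_score, f1_score

# Made-up labels: what really happened vs. what a model predicted
actual    = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 1, 1, 0, 0, 0, 0]

# For binary labels, ravel() returns the counts in this order: TN, FP, FN, TP
tn, fp, fn, tp = confusion_matrix(actual, predicted).ravel()
print("TP:", tp, "FP:", fp, "TN:", tn, "FN:", fn)

precision = precision_score(actual, predicted)  # TP / (TP + FP)
recall = recall_score(actual, predicted)        # TP / (TP + FN)
f1 = f1_score(actual, predicted)                # 2 * P * R / (P + R)

print("precision:", precision)
print("recall:", recall)
print("f1:", f1)
```

In this toy case the model finds 3 of the 4 real positives (recall of 0.75) but raises 2 false alarms along the way (precision of 0.6), and the F1 score lands in between.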

So for the personal context of my website, it just so happens that it is far more important for me to be able to detect the "1" category because these are urgent and need human intervention, whereas the "0" category responses are not so important since they can be dealt with automatically by the system. It´s not a big deal if a few "0" type queries get bunched with the "1", a human can also deal with them. However, if a user is issuing an urgent query that demands human attention and the model classifies that as being a non-important "0" query, then that could cause high customer dissatisfaction (because no one will be dealing with their query).

Therefore in practice, I will want my model to be "paranoid" and to always lean towards the "1" category unless it is extremely certain that it is a type "0" query. Basically, when there is any room for doubt, the model will classify the query as a "1" just to be on the safe side of things.

Of course this decision is based on my unique business context, you will have to go through this decision-making process yourself for your own company needs, but it´s nothing that a bit of common sense can´t handle.

So how do we manage this programmatically? We simply need to assign to each category, in our case "0" and "1", different weights. We´ll assign a heavier weight to the "1" category because we want to be as certain as possible to always detect that class. If you want to use this code for yourself, you can adjust the weights for your own classes to what you need for your own use case.



In [0]:
# the class weights tell the algorithm how paranoid to be about each class. They are not probabilities: they scale the penalty for getting each class wrong. Here a mistake on a "1" query costs about 6.7 times more (0.87/0.13) than a mistake on a "0" query, so when in doubt the model leans towards "1" and doesn't take any risks with that category
class_weights = {0:0.13, 1:0.87}
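
A quick sketch of what these weights do under the hood. According to the scikit-learn documentation, the `class_weight` parameter sets the penalty parameter C of each class to `class_weight[i] * C`. Taking the C=7 that the model further down is created with, the arithmetic looks like this (the names `base_C` and `effective_C` are just for this illustration):

```python
# Values from this tutorial: the class weights above and the C=7 passed to LinearSVC later on
class_weights = {0: 0.13, 1: 0.87}
base_C = 7

# Each class gets its own penalty strength: C multiplied by the class weight
effective_C = {label: base_C * weight for label, weight in class_weights.items()}
print(effective_C)

# How much more a mistake on a "1" costs compared to a mistake on a "0"
ratio = class_weights[1] / class_weights[0]
print(round(ratio, 1))
```

So only the ratio between the two weights decides which way the model leans; scaling both weights by the same factor has the same effect as changing C.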

### Step 2- Learning about the learning algorithm

Right, now let´s get to the fancy part. We´ll be using an algorithm called a linear SVC (Support Vector Classifier). It´s not too complex of a concept as I hope you can discern from the beautiful drawings below. And as always, you don´t need to know the math behind it, all you need to know is the one line of code to load it from the Sklearn library.

![alt text](http://michelleful.github.io/code-blog/assets/images/201506/svm2_new.png)

![alt text](https://chrisalbon.com/images/machine_learning_flashcards/Support_Vector_Classifier_print.png)




In [0]:


# We load the LinearSVC from Scikit learn. The C parameter controls regularization (smaller values hold the model back more, which helps stop it from just memorizing the dataset), and we also input the class weights... in the class_weight parameter
svc_model= LinearSVC(C=7, dual=True, loss="squared_hinge", penalty="l2", tol=1e-7, class_weight=class_weights)

#we use the .fit command to get the model to learn from the formatted compressed data, matching them with the training answers. 
svc_model.fit(compressed_training_data,training_answers)



LinearSVC(C=7, class_weight={0: 0.13, 1: 0.87}, dual=True, fit_intercept=True,
     intercept_scaling=1, loss='squared_hinge', max_iter=1000,
     multi_class='ovr', penalty='l2', random_state=None, tol=1e-07,
     verbose=0)

As always, we can load the model with one simple line of code and train it with the ".fit" command to get it to learn from the data. Learning from the data only takes a few seconds! As you can see from the code above, it is very easy and simple to implement.



### Step 3 - Evaluating the model

So now we actually need to test our model, to see if this thing actually works! Below I´ve constructed a little program that counts the number of false positives and the number of false negatives etc. that our model generates. 

For the eagle-eyed out there, you will have noticed that we are not using a "test" dataset here to examine the performance of our model, but rather, we are re-using our training data for this. How dare we?! The reason is that we are in an online context, where this model will always be tested and learning in real time as user input comes through. The true test will come when we deploy into the real world. This also means we'll be adding new examples to our dataset, ideally every couple of weeks, so the model should steadily improve over time.
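
For readers who would still like an offline estimate before deploying, here is a minimal sketch of a held-out test, assuming scikit-learn's `train_test_split`. The queries are made up for illustration, and the PCA step is left out to keep the sketch short, so this is not the exact setup used in this tutorial:

```python
from sklearn.feature_extraction.text import HashingVectorizer
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

# Made-up queries: 0 = short, general topic; 1 = long, specific request
toy_queries = [
    "tenses", "debate", "sport", "business", "grammar", "past simple",
    "modal verbs", "vocabulary", "conditionals", "listening",
    "a reading text about two special schools in London",
    "how to teach overcoming obstacles with a short story",
    "audio material about the 2018 World Cup for teenagers",
    "a role play about booking a camping trip by phone",
    "worksheet comparing British and American holidays for adults",
    "song lyrics to practise the second conditional with beginners",
]
toy_labels = [0] * 10 + [1] * 6

features = HashingVectorizer(n_features=2**8, norm="l1").transform(toy_queries)

# Keep 25% of the rows aside; stratify keeps the 0/1 proportions similar in both parts
X_train, X_test, y_train, y_test = train_test_split(
    features, toy_labels, test_size=0.25, stratify=toy_labels, random_state=11
)

model = LinearSVC(C=7, class_weight={0: 0.13, 1: 0.87})
model.fit(X_train, y_train)

# Recall measured only on rows the model has never seen
held_out_recall = recall_score(y_test, model.predict(X_test))
print("held-out recall:", held_out_recall)
```

With only sixteen examples the number itself means very little; on a real dataset, the gap between this score and the training score gives a feel for how much the model is memorizing.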

So below, we´ll count the number of correct and incorrect answers and we´ll print out our key metrics:



In [0]:
#initializing our counts of true positive, false negative, etc.
TP=0
TN=0
FP=0
FN=0

#we ask the model for all of its predictions in one go (much faster than predicting row by row)
predictions = svc_model.predict(compressed_training_data)

#we create what is called a for loop to iterate through every row of the dataset
for i in range(len(training_data)):
    predicted = predictions[i]
    actual = training_answers.iloc[i]

    #Counts True Positives if the answer is "1" and the model predicted "1"
    if predicted==1 and actual==1:
        TP=TP+1
    #Counts False Positives if the answer is "0" and the model predicted "1"
    elif predicted==1 and actual==0:
        FP=FP+1
    #Counts True Negatives if the answer is "0" and the model predicted "0"
    elif predicted==0 and actual==0:
        TN=TN+1
    #Counts False Negatives if the answer is "1" and the model predicted "0"
    elif predicted==0 and actual==1:
        FN=FN+1

print ("Model", "True Positives:",TP, "False Positives:",FP,"True Negatives:",TN , "False Negatives:", FN) 

model_accuracy= (TP+TN)/(len(training_answers))
model_precision= TP/(TP+FP) 
model_recall=TP/(TP+FN)
model_f1= 2 * (model_precision * model_recall) / (model_precision + model_recall)

print("Model´s Accuracy On Training Data:", model_accuracy *100,'%')
print("Model´s Precision On Training Data:", model_precision *100,'%')
print("Model´s Recall On Training Data:", model_recall *100,'%')
print("Model´s F1 On Training Data:", model_f1 *100,'%')



Model True Positives: 191 False Positives: 125 True Negatives: 767 False Negatives: 1
Model´s Accuracy On Training Data: 88.37638376383764 %
Model´s Precision On Training Data: 60.44303797468354 %
Model´s Recall On Training Data: 99.47916666666666 %
Model´s F1 On Training Data: 75.1968503937008 %


So we've calculated our metrics by using some pretty simple summation and multiplication.

And as we can see, the most important metric for us, **Recall**, was pretty good at 99% (note that this will probably drop in the real world to maybe around 90-95%). We also see that we have quite a few false positives, but that's ok for our business context. Of course, if in your case you wanted to prioritize the "0" category, then the **precision** metric is the one you would try to maximize (by readjusting the class weights from earlier).





# Part 3- Data Predicting

So we've done the hardest part, now we need to create a little program that allows us to classify any new text inputted into it. The idea here is that you'll have this program waiting in the background on your platform, and whenever a customer issues a customer support query, the program will classify the query and direct it to the correct funnel for it to be solved.

This section is more to do with developer implementation and deployment, so it's not so important to look through the code. However, I do recommend that you have a go at it yourself and input some text as an example. (For this, pretend that you are an English teacher who needs a specific topic for your class. A general query should output a 0 whereas a specific query should be classified as a 1.) Have fun!

In [0]:
def query_classifier(new_input):
  #we get the raw text into a "list" format with these brackets
  new_input=[new_input]

  #Now we need to format it in the same way we formatted our training data. We first apply the hash vectorization (it needs no fitting, so ".transform" is enough)
  new_input_vectorized = vectorization.transform(new_input).toarray()

  #Now that we have the hash vectors, we compress them with the SAME PCA object that we fitted on the training data.
  #Re-fitting PCA here would produce different columns from the ones the model learned from, so we only use ".transform"
  new_input_compressed = pca_compressor.transform(new_input_vectorized)

  #Now we use the ".predict" function to classify the text as "0" or "1"
  prediction=svc_model.predict(new_input_compressed)

  print(prediction)
  if prediction[0]==0:
    print("Not to worry, we can deal with this query automatically. This is not an urgent request!")
  else:
    print("Human, help please! This request is too complex and specific... Please do it manually")
    

new_input= (input("input new text:"))


query_classifier(new_input)
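
For deployment, one option worth considering is to bundle the three steps (hashing, PCA, model) into a single scikit-learn pipeline, so that new text is guaranteed to go through exactly the same fitted steps as the training data. The sketch below is an alternative to the function above, not the method used in this tutorial; the queries are made up and `to_dense` is a small hypothetical helper:

```python
from sklearn.decomposition import PCA
from sklearn.feature_extraction.text import HashingVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import FunctionTransformer
from sklearn.svm import LinearSVC


def to_dense(sparse_matrix):
    """PCA needs a regular (dense) array, so we unpack the sparse hash matrix."""
    return sparse_matrix.toarray()


# Made-up training queries: 0 = general topic, 1 = specific request
toy_queries = [
    "tenses", "debate", "sport", "business", "grammar", "past simple",
    "a reading text about two special schools in London",
    "how to teach overcoming obstacles with a short story",
    "audio material about the 2018 World Cup for teenagers",
    "a role play about booking a camping trip by phone",
]
toy_labels = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]

# Hashing -> dense array -> PCA -> linear SVC, all fitted together in one go
classifier = make_pipeline(
    HashingVectorizer(n_features=2**8, norm="l1"),
    FunctionTransformer(to_dense),
    PCA(0.99),
    LinearSVC(C=7, class_weight={0: 0.13, 1: 0.87}),
)
classifier.fit(toy_queries, toy_labels)

# New text goes through the same fitted steps with a single call
new_prediction = classifier.predict(["a lesson plan about recycling for ten year olds"])
print(new_prediction)
```

The whole `classifier` object can then be saved and loaded as one piece, which leaves less room for the training and prediction code to drift apart.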

# Conclusion

That's all. I hope you were able to appreciate how easy it was to automate a significant portion of the customer support query helpline. Once we know what type of query is being inputted, we can do lots of magical things like take the user to the exact help page he/she needs or open a chat with a human agent, all depending on what they wrote in their support query and the problem they are having. I can only see this as a win-win, seeing as the user gets a more personalized and satisfying experience while the company saves precious human time.

So that will be it for today. The key thing to remember is that by using a **"hash vectorizer"**, we can conveniently convert any raw text to numbers that can then be easily digested by the algorithm. However, depending on your case, this text-to-number conversion can generate huge datasets, so we can use PCA compression to reduce it and save valuable memory space.

Lastly, and perhaps the most important message of all, I hope you continue to be convinced that you don't need to be a programmer to implement these tools in your company today, and that you can do so to make your and your colleagues' lives a lot easier.

Feel free to contact me about any questions, comments or feedback at my email: [conrad.w.s@gmail.com](mailto:conrad.w.s@gmail.com) or hit me up [on Linkedin at Conrad WS.](https://www.linkedin.com/in/conrad-wilkinson-schwarz-210aa9b2/)

In [0]:
print("thank you for reading!!")