This script reuses the random forest and gradient boosting script built for the healthcare
and wellness services recommender system, but splits the data 85/15 into train/test sets.
Both models use 150 trees, and the gradient boosted trees have a maximum depth of 11.
(The '80-20' labels printed by the prediction functions below are carried over from that script.)

This dataset covers 19 massage modalities, with a description of each modality and its
benefits, contraindications, and side effects. There is only one unique entry per modality,
so each of the 19 rows was repeated to give 24 identical copies per class (19 × 24 = 456 rows).
That way, splitting the data does not leave any modality out of the training set: with only
19 unique samples, every sample held out for the 15% test set would belong to a class that is
missing from the 85% training set, so it could never be classified correctly.
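A minimal sketch of the point above, using scikit-learn's `train_test_split` on a hypothetical frame of 19 placeholder class names (not the real CSV). It compares a 15% split of one row per class with a split of 24 copies per class. Passing `stratify=` is an addition here (the notebook's own split in In [17] does not use it); it guarantees that every class appears in both sets.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical placeholders for the 19 modality names
classes = [f"modality_{i}" for i in range(19)]

# One row per class: every held-out row is a class the model never trains on
single = pd.DataFrame({"modality": classes})
tr, te = train_test_split(single, test_size=0.15, random_state=47)
unseen = set(te["modality"]) - set(tr["modality"])
print(len(te), "test rows,", len(unseen), "of them from classes missing in training")

# 24 identical copies per class (19 x 24 = 456 rows), stratified by class
dup = pd.concat([single] * 24, ignore_index=True)
tr, te = train_test_split(dup, test_size=0.15, random_state=47, stratify=dup["modality"])
print(dup.shape, len(tr), len(te), set(te["modality"]) <= set(tr["modality"]))
```

One consequence worth keeping in mind: because the 24 copies are identical, every test row also appears verbatim in the training set, so the test scores measure how well the models separate the 19 records rather than how they handle new wording.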


In [1]:
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np


In [2]:
modes = pd.read_csv('MassageModalities2.csv')

In [3]:
modes.head()

Unnamed: 0,modality,Description,benefits,contraindications,sideEffects
0,Swedish Massage,"Traditional spa or clinic massage with hands, ...","improves tight muscles, loosens tight muscle f...","dehydration, fever, rashes, infection, some me...",if no contraindications for massage exist in c...
1,Shiatsu Massage,Accupressure or small and localized pressure i...,"improved circulation, relaxing, detoxes, break...","dehydration, fever, rashes, infection, some me...",can cause muscle soreness for a few days from ...
2,Prenatal Massage,swedish massage up to medium pressure or middl...,"improved circulation, better sleep, pain relie...","dehydration, Be a high risk pregnancy (history...",can make client dizzy or nausious if first mas...
3,Sports Massage,Massage that is focused on sports related disc...,improves healing from sports or work related m...,"dehydration, fever, rashes, infection, some me...",can cause muscle soreness for a few days from ...
4,Lymphatic Drainage Massage,Massage focused on superficial muscles and lym...,"improves circulation,","dehydration, fever, rashes, infection, some me...",client could get dizzy or feel nausious if he ...


In [4]:
modes.shape

(456, 5)

In [6]:
modes.groupby('modality').describe()

Unnamed: 0_level_0,Description,Description,Description,Description,benefits,benefits,benefits,benefits,contraindications,contraindications,contraindications,contraindications,sideEffects,sideEffects,sideEffects,sideEffects
Unnamed: 0_level_1,count,unique,top,freq,count,unique,top,freq,count,unique,top,freq,count,unique,top,freq
modality,Unnamed: 1_level_2,Unnamed: 2_level_2,Unnamed: 3_level_2,Unnamed: 4_level_2,Unnamed: 5_level_2,Unnamed: 6_level_2,Unnamed: 7_level_2,Unnamed: 8_level_2,Unnamed: 9_level_2,Unnamed: 10_level_2,Unnamed: 11_level_2,Unnamed: 12_level_2,Unnamed: 13_level_2,Unnamed: 14_level_2,Unnamed: 15_level_2,Unnamed: 16_level_2
Aromatherapy,24,1,natural therapeutic properties of essential oi...,24,24,1,"anxiety, stress, back pain, insomnia, headache...",24,24,1,"dehydration, asthma, allergies to fragrances, ...",24,24,1,can burn if directly on skin without carrier m...,24
Biofreeze Muscle Pain Relief Gel,24,1,This is a non-narcotic pain relief gel applied...,24,24,1,pain relief of muscles and tendons from workin...,24,24,1,"dehydration, fever, rashes, infection, some me...",24,24,1,can irritate the eyes if client touches the bo...,24
Cannabidiol (CBD) Massage Balm,24,1,can be combined with other massage modalities ...,24,24,1,helps with chronic pain associated with nerve ...,24,24,1,"dehydration, fever, rashes, infection, some me...",24,24,1,depends on the ingredients and amount of CBD i...,24
Cold Stone Therapy,24,1,This can be combined with other massage modali...,24,24,1,"helps relieve pain, calms nervous system, impr...",24,24,1,"dehydration, fever, rashes, infection, some me...",24,24,1,if placed directly on the skin it could slight...,24
Craniosacral Massage,24,1,massage while client face up with focus of mas...,24,24,1,"headaches, scoliosis, visual disturbances, aud...",24,24,1,"dehydration, seizures, psychological disorders...",24,24,1,can make client dizzy or slightly disoriented ...,24
Cupping Therapy,24,1,silicone cups are placed on the areas of the d...,24,24,1,"improved circulation, better sleep, pain relie...",24,24,1,"dehydration, fever, rashes, infection, some me...",24,24,1,can leave marks on the body that last up to t...,24
Deep tissue Massage,24,1,Swedish massage with a slow and controlled dee...,24,24,1,"relax client, reduce stress, calming, increas...",24,24,1,"dehydration, fever, rashes, infection, some me...",24,24,1,can cause muscle soreness for a few days from ...,24
Hot Stone Therapy Massage,24,1,hot stones that are not too hot for the massag...,24,24,1,"relax client, reduce stress, calming, increas...",24,24,1,"dehydration, fever, rashes, infection, some me...",24,24,1,"can burn if stones are too hot for the client,...",24
Instrument Assisted Soft Tissue Mobilization (IASTM) Friction Massage,24,1,"solid, curved, smooth, stainless steel or othe...",24,24,1,"trigger pain points, fibrosis, increased range...",24,24,1,"dehydration, wounds, directly over surgical in...",24,24,1,Can leave red marks along areas used that can ...,24
Lymphatic Drainage Massage,24,1,Massage focused on superficial muscles and lym...,24,24,1,"improves circulation,",24,24,1,"dehydration, fever, rashes, infection, some me...",24,24,1,client could get dizzy or feel nausious if he ...,24


In [7]:
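def split_into_tokens(review):
    # Split a text string into a list of words with TextBlob.
    # TextBlob is imported in the next cell; it only has to exist when this function is called.
    return TextBlob(review).words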

In [8]:
import pandas as pd 
import matplotlib.pyplot as plt 
from textblob import TextBlob 
import sklearn 
import numpy as np 
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer 
from sklearn.naive_bayes import MultinomialNB 
from sklearn.metrics import classification_report, f1_score, accuracy_score, confusion_matrix 

import re
import string
import nltk 

np.random.seed(47) 

In [9]:
modes.benefits.head().apply(split_into_tokens)

0    [improves, tight, muscles, loosens, tight, mus...
1    [improved, circulation, relaxing, detoxes, bre...
2    [improved, circulation, better, sleep, pain, r...
3    [improves, healing, from, sports, or, work, re...
4                              [improves, circulation]
Name: benefits, dtype: object

In [10]:
modes.head()

Unnamed: 0,modality,Description,benefits,contraindications,sideEffects
0,Swedish Massage,"Traditional spa or clinic massage with hands, ...","improves tight muscles, loosens tight muscle f...","dehydration, fever, rashes, infection, some me...",if no contraindications for massage exist in c...
1,Shiatsu Massage,Accupressure or small and localized pressure i...,"improved circulation, relaxing, detoxes, break...","dehydration, fever, rashes, infection, some me...",can cause muscle soreness for a few days from ...
2,Prenatal Massage,swedish massage up to medium pressure or middl...,"improved circulation, better sleep, pain relie...","dehydration, Be a high risk pregnancy (history...",can make client dizzy or nausious if first mas...
3,Sports Massage,Massage that is focused on sports related disc...,improves healing from sports or work related m...,"dehydration, fever, rashes, infection, some me...",can cause muscle soreness for a few days from ...
4,Lymphatic Drainage Massage,Massage focused on superficial muscles and lym...,"improves circulation,","dehydration, fever, rashes, infection, some me...",client could get dizzy or feel nausious if he ...


In [11]:
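# The NLTK data must be available first, e.g. via nltk.download('stopwords') and nltk.download('wordnet')
stopwords = nltk.corpus.stopwords.words('english')
ps = nltk.PorterStemmer()
wn = nltk.WordNetLemmatizer()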


In [12]:
def count_punct(text):
    count=sum([1 for char in text if char in string.punctuation])
    return round(count/(len(text)-text.count(" ")),3)*100

In [13]:
def clean_text(text):
    # Lowercase and delete punctuation characters (deleted, not replaced by spaces)
    text="".join([char.lower() for char in text if char not in string.punctuation])
    tokens=re.split(r'\W+',text)
    # Porter-stem each token and drop stopwords; joined into a string for n-gram vectorization
    text=" ".join([ps.stem(word) for word in tokens if word not in stopwords])
    return text

def lemmatize(text):
    text="".join([char.lower() for char in text if char not in string.punctuation])
    tokens=re.split(r'\W+', text)
    # WordNet-lemmatize each token and drop stopwords; joined into a string for n-gram vectorization
    text=" ".join([wn.lemmatize(word) for word in tokens if word not in stopwords])
    # If this function is passed to CountVectorizer as the analyzer, return a list instead;
    # a returned string would be iterated character by character (single letters as features).
    #text=[wn.lemmatize(word) for word in tokens if word not in stopwords]
    return text
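Both functions above delete punctuation characters rather than replacing them with spaces, so items separated only by a comma are fused into one token; the contraindications output in In [73] contains `historypyschosis`, which came from `history,pyschosis`. A small sketch comparing that behaviour with one alternative that maps punctuation to spaces via `str.translate`. The helper names are hypothetical, and stemming, lemmatizing, and stopword removal are left out so it runs without NLTK data.

```python
import string

def strip_punct_delete(text):
    # What clean_text/lemmatize do: drop punctuation characters entirely, then split
    return "".join(ch.lower() for ch in text if ch not in string.punctuation).split()

# Translation table mapping every punctuation character to a space
PUNCT_TO_SPACE = str.maketrans(string.punctuation, " " * len(string.punctuation))

def strip_punct_space(text):
    # Alternative: replace punctuation with spaces, then split on whitespace
    return text.lower().translate(PUNCT_TO_SPACE).split()

sample = "aneurism history,pyschosis, numbness in limbs"
print(strip_punct_delete(sample))  # 'historypyschosis' ends up as one token
print(strip_punct_space(sample))   # 'history' and 'pyschosis' stay separate
```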

In [14]:
modes['Cleaned_text']=modes['benefits'].apply(lambda x: clean_text(x))
modes['Lemmatized']=modes['benefits'].apply(lambda x: lemmatize(x))
modes.head()

Unnamed: 0,modality,Description,benefits,contraindications,sideEffects,Cleaned_text,Lemmatized
0,Swedish Massage,"Traditional spa or clinic massage with hands, ...","improves tight muscles, loosens tight muscle f...","dehydration, fever, rashes, infection, some me...",if no contraindications for massage exist in c...,improv tight muscl loosen tight muscl fascia i...,improves tight muscle loosens tight muscle fas...
1,Shiatsu Massage,Accupressure or small and localized pressure i...,"improved circulation, relaxing, detoxes, break...","dehydration, fever, rashes, infection, some me...",can cause muscle soreness for a few days from ...,improv circul relax detox break apart adhes im...,improved circulation relaxing detox break apar...
2,Prenatal Massage,swedish massage up to medium pressure or middl...,"improved circulation, better sleep, pain relie...","dehydration, Be a high risk pregnancy (history...",can make client dizzy or nausious if first mas...,improv circul better sleep pain relief relax e...,improved circulation better sleep pain relief ...
3,Sports Massage,Massage that is focused on sports related disc...,improves healing from sports or work related m...,"dehydration, fever, rashes, infection, some me...",can cause muscle soreness for a few days from ...,improv heal sport work relat muscl tendon pain...,improves healing sport work related muscle ten...
4,Lymphatic Drainage Massage,Massage focused on superficial muscles and lym...,"improves circulation,","dehydration, fever, rashes, infection, some me...",client could get dizzy or feel nausious if he ...,improv circul,improves circulation


In [17]:
from sklearn.model_selection import train_test_split

X_train,X_test,y_train,y_test=train_test_split(modes[['benefits','Description','contraindications','Cleaned_text','Lemmatized']],modes['modality'],test_size=0.15)

In [18]:
from sklearn.feature_extraction.text import CountVectorizer

In [23]:
n_gram_vect=CountVectorizer(ngram_range=(1,4))

In [24]:
type(X_train['Cleaned_text'])

pandas.core.series.Series

In [25]:
X_train['Lemmatized'].head()

388        muscle trauma muscle spasm pain trigger point
182    pain relief muscle tendon working muscular adh...
391    pain relief muscle tendon working muscular adh...
17     help chronic pain associated nerve pain arthri...
100    relaxing improves circulation improves sleep h...
Name: Lemmatized, dtype: object

In [26]:
n_gram_vect_fit=n_gram_vect.fit(X_train['Lemmatized'])


n_gram_train=n_gram_vect_fit.transform(X_train['Lemmatized'])
n_gram_test=n_gram_vect_fit.transform(X_test['Lemmatized'])

In [27]:
len(n_gram_vect_fit.get_feature_names_out())  # get_feature_names() was removed in scikit-learn 1.2

905

In [28]:
print(n_gram_vect_fit.get_feature_names_out().tolist()[200:500])

['doctor approved', 'doctor approved high', 'doctor approved high risk', 'effect', 'effect calming', 'effect calming effect', 'effect calming effect improves', 'effect improves', 'effect improves range', 'effect improves range motion', 'effect soothing', 'effect soothing effect', 'effect soothing effect calming', 'emotional', 'emotional disorder', 'emotional disorder rheumatoid', 'emotional disorder rheumatoid arthritis', 'emotionally', 'emotionally upset', 'emotionally upset detox', 'emotionally upset detox recharge', 'energize', 'energize unwind', 'energize unwind invigorate', 'fascia', 'fascia adhesion', 'fascia adhesion increase', 'fascia adhesion increase healing', 'fascia improves', 'fascia improves circulation', 'fascia improves circulation improves', 'fibromyalgia', 'fibromyalgia myofascial', 'fibromyalgia myofascial pain', 'fibromyalgia myofascial pain visceral', 'fibromyalgia neuropathic', 'fibromyalgia neuropathic pain', 'fibrosis', 'fibrosis increased', 'fibrosis increased 

In [29]:
n_gram_train_df=pd.concat([X_train[['benefits','Description','contraindications','Cleaned_text','Lemmatized']].reset_index(drop=True),pd.DataFrame(n_gram_train.toarray())],axis=1)

n_gram_test_df=pd.concat([X_test[['benefits','Description','contraindications','Cleaned_text','Lemmatized']].reset_index(drop=True),pd.DataFrame(n_gram_test.toarray())],axis=1)

In [30]:
n_gram_train_df.head()

Unnamed: 0,benefits,Description,contraindications,Cleaned_text,Lemmatized,0,1,2,3,4,...,895,896,897,898,899,900,901,902,903,904
0,"muscle trauma, muscle spasms, pain trigger points",massage to release adhesions in muscle fascia ...,"dehydration, local site wounds, sores, sensiti...",muscl trauma muscl spasm pain trigger point,muscle trauma muscle spasm pain trigger point,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,pain relief of muscles and tendons from workin...,This is a non-narcotic pain relief gel applied...,"dehydration, fever, rashes, infection, some me...",pain relief muscl tendon work muscular adhes s...,pain relief muscle tendon working muscular adh...,0,0,0,0,0,...,1,1,1,0,0,0,0,0,0,0
2,pain relief of muscles and tendons from workin...,This is a non-narcotic pain relief gel applied...,"dehydration, fever, rashes, infection, some me...",pain relief muscl tendon work muscular adhes s...,pain relief muscle tendon working muscular adh...,0,0,0,0,0,...,1,1,1,0,0,0,0,0,0,0
3,helps with chronic pain associated with nerve ...,can be combined with other massage modalities ...,"dehydration, fever, rashes, infection, some me...",help chronic pain associ nerv pain arthriti st...,help chronic pain associated nerve pain arthri...,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,"relaxing, improves circulation, improves sleep...",Focus of this massage is to stimulate body fun...,"dehydration, fever, rashes, infection, some me...",relax improv circul improv sleep help pain dis...,relaxing improves circulation improves sleep h...,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [31]:
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.metrics import precision_recall_fscore_support as score
import time

In [32]:
rf=RandomForestClassifier(n_estimators=150, max_depth=None, n_jobs=-1)
start=time.time()
rf_model=rf.fit(n_gram_train,y_train)
end=time.time()
fit_time=(end-start)
fit_time

0.4838418960571289

In [33]:
start=time.time()
y_pred=rf_model.predict(n_gram_test)
end=time.time()
pred_time=(end-start)
pred_time

0.26906824111938477

In [34]:

prd = pd.DataFrame(y_pred)
prd.columns=['Predicted']

prd.index=y_test.index
pred=pd.concat([pd.DataFrame(prd),y_test],axis=1)
print(pred)

                                             Predicted  \
39                                     Shiatsu Massage   
282                                 Cold Stone Therapy   
351                                Deep tissue Massage   
128                                    Cupping Therapy   
416                     Cannabidiol (CBD) Massage Balm   
..                                                 ...   
75   Instrument Assisted Soft Tissue Mobilization (...   
217                                 Myofascial Massage   
251                         Lymphatic Drainage Massage   
330                              Trigger Point Therapy   
132  Instrument Assisted Soft Tissue Mobilization (...   

                                              modality  
39                                     Shiatsu Massage  
282                                 Cold Stone Therapy  
351                                Deep tissue Massage  
128                                    Cupping Therapy  
416               
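The printed comparison is truncated, so the misclassified rows are not visible. Filtering `pred` to the rows where the two columns disagree lists them directly. The frame below is a hypothetical stand-in: rows 128 and 351 are copied from the output above, and row 42 is invented to show a mismatch.

```python
import pandas as pd

# Hypothetical stand-in for `pred` (Predicted vs. true modality)
pred = pd.DataFrame(
    {
        "Predicted": ["Cupping Therapy", "Deep tissue Massage", "Deep tissue Massage"],
        "modality": ["Cupping Therapy", "Deep tissue Massage", "Hot Stone Therapy Massage"],
    },
    index=[128, 351, 42],
)

# Keep only the rows where the prediction differs from the true label
errors = pred[pred["Predicted"] != pred["modality"]]
print(errors)
```

On the notebook's data, the confusion matrix below implies this would list the four Hot Stone Therapy Massage rows predicted as Deep tissue Massage.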

In [35]:
from sklearn.metrics import classification_report, f1_score, accuracy_score, confusion_matrix 

print('accuracy', accuracy_score(y_test, y_pred))

accuracy 0.9420289855072463


In [36]:
print('confusion matrix\n', confusion_matrix(y_test, y_pred))
print('(row=expected, col=predicted)')

confusion matrix
 [[6 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
 [0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
 [0 0 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
 [0 0 0 4 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
 [0 0 0 0 5 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
 [0 0 0 0 0 7 0 0 0 0 0 0 0 0 0 0 0 0 0]
 [0 0 0 0 0 0 2 0 0 0 0 0 0 0 0 0 0 0 0]
 [0 0 0 0 0 0 4 0 0 0 0 0 0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0 0 4 0 0 0 0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0 0 0 4 0 0 0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0 0 0 0 6 0 0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0 0 0 0 0 3 0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0 0 0 0 0 0 4 0 0 0 0 0 0]
 [0 0 0 0 0 0 0 0 0 0 0 0 0 6 0 0 0 0 0]
 [0 0 0 0 0 0 0 0 0 0 0 0 0 0 3 0 0 0 0]
 [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 0 0 0]
 [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 3 0 0]
 [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0]
 [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2]]
(row=expected, col=predicted)


In [37]:
print(classification_report(y_test, y_pred))

                                                                       precision    recall  f1-score   support

                                                         Aromatherapy       1.00      1.00      1.00         6
                                     Biofreeze Muscle Pain Relief Gel       1.00      1.00      1.00         1
                                       Cannabidiol (CBD) Massage Balm       1.00      1.00      1.00         2
                                                   Cold Stone Therapy       1.00      1.00      1.00         4
                                                 Craniosacral Massage       1.00      1.00      1.00         5
                                                      Cupping Therapy       1.00      1.00      1.00         7
                                                  Deep tissue Massage       0.33      1.00      0.50         2
                                            Hot Stone Therapy Massage       0.00      0.00      0.00         4


  'precision', 'predicted', average, warn_for)


A note on the above: every Deep tissue Massage sample was predicted correctly, but none of the
Hot Stone Therapy Massage samples were; all four were predicted as deep tissue. This is why recall
for deep tissue is 100% (2 of 2 deep tissue samples found), while its precision is 33%: six samples
were predicted as deep tissue, and only 2 of those 6 (1/3) actually were.
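The arithmetic behind those numbers, using only the Deep tissue (row/column 6) and Hot Stone (row/column 7) entries of the confusion matrix above, since every other cell in those rows and columns is 0. Hot Stone has no predicted samples at all, so its precision is 0/0, which scikit-learn reports as 0.00 together with an `UndefinedMetricWarning`.

```python
import numpy as np

# 2x2 block of the confusion matrix above (rows = true, columns = predicted)
# index 0 = Deep tissue Massage, index 1 = Hot Stone Therapy Massage
cm = np.array([[2, 0],
               [4, 0]])

deep = 0
precision = cm[deep, deep] / cm[:, deep].sum()  # 2 correct out of 6 predicted as deep tissue
recall = cm[deep, deep] / cm[deep, :].sum()     # 2 found out of 2 true deep tissue samples
f1 = 2 * precision * recall / (precision + recall)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```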

Another note: each row in this dataset is one of 19 unique records repeated 24 times, so within a
class the Description, benefits, contraindications, and sideEffects values are all identical. It is
therefore odd that any predictions are wrong; Hot Stone Therapy Massage likely shares most of its
keywords with Deep tissue Massage. We will run this on the contraindications next.

In [38]:
gb=GradientBoostingClassifier(n_estimators=150,max_depth=11)
start=time.time()
gb_model=gb.fit(n_gram_train,y_train)
end=time.time()
fit_time=(end-start)
fit_time

6.874598979949951

In [39]:
start=time.time()
y_pred=gb_model.predict(n_gram_test)
end=time.time()
pred_time=(end-start)
pred_time

0.0070035457611083984

In [40]:
prd = pd.DataFrame(y_pred)
prd.columns=['Predicted']

prd.index=y_test.index
pred=pd.concat([pd.DataFrame(prd),y_test],axis=1)
print(pred)

                                             Predicted  \
39                                     Shiatsu Massage   
282                                 Cold Stone Therapy   
351                                Deep tissue Massage   
128                                    Cupping Therapy   
416                     Cannabidiol (CBD) Massage Balm   
..                                                 ...   
75   Instrument Assisted Soft Tissue Mobilization (...   
217                                 Myofascial Massage   
251                         Lymphatic Drainage Massage   
330                              Trigger Point Therapy   
132  Instrument Assisted Soft Tissue Mobilization (...   

                                              modality  
39                                     Shiatsu Massage  
282                                 Cold Stone Therapy  
351                                Deep tissue Massage  
128                                    Cupping Therapy  
416               

In [41]:
from sklearn.metrics import classification_report, f1_score, accuracy_score, confusion_matrix 

print('accuracy', accuracy_score(y_test, y_pred))
print('confusion matrix\n', confusion_matrix(y_test, y_pred))
print('(row=expected, col=predicted)')
print(classification_report(y_test, y_pred))

accuracy 0.9420289855072463
confusion matrix
 [[6 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
 [0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
 [0 0 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
 [0 0 0 4 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
 [0 0 0 0 5 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
 [0 0 0 0 0 7 0 0 0 0 0 0 0 0 0 0 0 0 0]
 [0 0 0 0 0 0 2 0 0 0 0 0 0 0 0 0 0 0 0]
 [0 0 0 0 0 0 4 0 0 0 0 0 0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0 0 4 0 0 0 0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0 0 0 4 0 0 0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0 0 0 0 6 0 0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0 0 0 0 0 3 0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0 0 0 0 0 0 4 0 0 0 0 0 0]
 [0 0 0 0 0 0 0 0 0 0 0 0 0 6 0 0 0 0 0]
 [0 0 0 0 0 0 0 0 0 0 0 0 0 0 3 0 0 0 0]
 [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 0 0 0]
 [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 3 0 0]
 [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0]
 [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2]]
(row=expected, col=predicted)
                                                                       precision    recall  f1-score   support

                                

  'precision', 'predicted', average, warn_for)


Gradient boosted trees give exactly the same accuracy, confusion matrix, and precision and recall
for Deep tissue Massage and Hot Stone Therapy Massage as the random forest.

In [61]:
modes['modality'][9:11]

9           Deep tissue Massage
10    Hot Stone Therapy Massage
Name: modality, dtype: object

In [67]:
pd.set_option('max_colwidth', 200) 
m = pd.DataFrame(modes[['Lemmatized','modality']][9:11])

m[0:2]

Unnamed: 0,Lemmatized,modality
9,relax client reduce stress calming increase healing improve circulation break adhesion improve range motion improve sleep,Deep tissue Massage
10,relax client reduce stress calming increase healing improve circulation break adhesion improve range motion improve sleep,Hot Stone Therapy Massage
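Because the lemmatized benefits text of Deep tissue Massage and Hot Stone Therapy Massage is identical, both produce the same n-gram vector, and no classifier can separate them: whichever label it gives that vector, the other class is always wrong. That is consistent with both models failing in exactly the same way (the training set holds 22 Deep tissue copies versus 20 Hot Stone copies, which plausibly tips both toward Deep tissue). A quick way to find such collisions is to count the distinct labels per text; the frame below copies three rows from the outputs above, the third being Lymphatic Drainage Massage's lemmatized benefits.

```python
import pandas as pd

text = ("relax client reduce stress calming increase healing improve circulation "
        "break adhesion improve range motion improve sleep")
df = pd.DataFrame({
    "Lemmatized": [text, text, "improves circulation"],
    "modality": ["Deep tissue Massage", "Hot Stone Therapy Massage",
                 "Lymphatic Drainage Massage"],
})

# Texts that map to more than one modality can never be told apart by a model
labels_per_text = df.groupby("Lemmatized")["modality"].nunique()
collisions = labels_per_text[labels_per_text > 1]
print(collisions)
```

On the full data, the same check would be `modes.groupby('Lemmatized')['modality'].nunique()`.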


In [42]:
def clean_text(text):
    text="".join([word.lower() for word in text if word not in string.punctuation])
    tokens=re.split(r'\W+',text)
    text=" ".join([ps.stem(word) for word in tokens if word not in stopwords])
    return text

def predict_ngramRFC_clean(new_review): 
    nr=pd.DataFrame([new_review])
    nr.columns=['newReview']
    nr['clean']=nr['newReview'].apply(lambda x: clean_text(x))

    rf=RandomForestClassifier(n_estimators=150,max_depth=None, n_jobs=-1)
    n_gram_vect=CountVectorizer(ngram_range=(1,4))
    
    n_gram_vect_fit=n_gram_vect.fit(X_train['Cleaned_text'])
    n_gram_train=n_gram_vect_fit.transform(X_train['Cleaned_text'])
    n_gram_test=n_gram_vect_fit.transform(nr['clean'])
    
    model = rf.fit(n_gram_train,y_train)
    pred=pd.DataFrame(model.predict(n_gram_test))
    pred.columns=['Recommended Healthcare Service']
    pred.index= ['stemmed_1ngram4_RFC_80-20:']
    print('\n\n',pred)
    

In [43]:
def lemmatize(text):
    text="".join([word.lower() for word in text if word not in string.punctuation])
    tokens=re.split(r'\W+', text)
    text=" ".join([wn.lemmatize(word) for word in tokens if word not in stopwords])
    return text    
    
def predict_ngramRFC_lemma(new_review): 
    nr=pd.DataFrame([new_review])
    nr.columns=['newReview']
    nr['lemma']=nr['newReview'].apply(lambda x: lemmatize(x))

    rf=RandomForestClassifier(n_estimators=150,max_depth=None, n_jobs=-1)
    n_gram_vect=CountVectorizer(ngram_range=(1,4))
    
    n_gram_vect_fit=n_gram_vect.fit(X_train['Lemmatized'])
    n_gram_train=n_gram_vect_fit.transform(X_train['Lemmatized'])
    n_gram_test=n_gram_vect_fit.transform(nr['lemma'])
    
    model = rf.fit(n_gram_train,y_train)
    pred=pd.DataFrame(model.predict(n_gram_test))
    pred.columns=['Recommended Healthcare Service:']
    pred.index= ['lemmatized_1ngram4RFC_80-20:']
    print('\n\n',pred)

In [44]:
def clean_text(text):
    text="".join([word.lower() for word in text if word not in string.punctuation])
    tokens=re.split(r'\W+',text)
    text=" ".join([ps.stem(word) for word in tokens if word not in stopwords])
    return text

def predict_ngramGBC_clean(new_review): 
    nr=pd.DataFrame([new_review])
    nr.columns=['newReview']
    nr['clean']=nr['newReview'].apply(lambda x: clean_text(x))

    gb=GradientBoostingClassifier(n_estimators=150,max_depth=11)
    n_gram_vect=CountVectorizer(ngram_range=(1,4))
    
    n_gram_vect_fit=n_gram_vect.fit(X_train['Cleaned_text'])
    n_gram_train=n_gram_vect_fit.transform(X_train['Cleaned_text'])
    n_gram_test=n_gram_vect_fit.transform(nr['clean'])
    
    model = gb.fit(n_gram_train,y_train)
    pred=pd.DataFrame(model.predict(n_gram_test))
    pred.columns=['Recommended Healthcare Service']
    pred.index= ['stemmed_1ngram4_GBC_80-20:']
    print('\n\n',pred)
    

In [45]:
def lemmatize(text):
    text="".join([word.lower() for word in text if word not in string.punctuation])
    tokens=re.split(r'\W+', text)
    text=" ".join([wn.lemmatize(word) for word in tokens if word not in stopwords])
    return text    
    
def predict_ngramGBC_lemma(new_review): 
    nr=pd.DataFrame([new_review])
    nr.columns=['newReview']
    nr['lemma']=nr['newReview'].apply(lambda x: lemmatize(x))

    gb=GradientBoostingClassifier(n_estimators=150,max_depth=11)
    n_gram_vect=CountVectorizer(ngram_range=(1,4))
    
    n_gram_vect_fit=n_gram_vect.fit(X_train['Lemmatized'])
    n_gram_train=n_gram_vect_fit.transform(X_train['Lemmatized'])
    n_gram_test=n_gram_vect_fit.transform(nr['lemma'])
    
    model = gb.fit(n_gram_train,y_train)
    pred=pd.DataFrame(model.predict(n_gram_test))
    pred.columns=['Recommended Healthcare Service:']
    pred.index= ['lemmatized_1ngram4GBC_80-20:']
    print('\n\n',pred)
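Each `predict_ngram*` function above rebuilds the vectorizer and refits the model on every call. A sketch of an alternative that fits once with a scikit-learn `Pipeline` and reuses it for each request. The three training texts and labels are hypothetical placeholders (the notebook would use `X_train['Lemmatized']` and `y_train`), `recommend` is an illustrative helper name, and `random_state` is added for repeatability.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import make_pipeline

# Hypothetical training texts and labels, standing in for X_train['Lemmatized'] and y_train
train_text = [
    "relax reduce stress improve sleep",
    "improves circulation drain edema",
    "pain relief trigger point muscle spasm",
]
train_labels = ["Swedish Massage", "Lymphatic Drainage Massage", "Trigger Point Therapy"]

# 1- to 4-gram counts and a 150-tree forest, fitted once up front
model = make_pipeline(
    CountVectorizer(ngram_range=(1, 4)),
    RandomForestClassifier(n_estimators=150, random_state=47),
)
model.fit(train_text, train_labels)


def recommend(request):
    """Predict a modality for one free-text request without refitting anything."""
    return model.predict([request])[0]


print(recommend("I need a massage that relaxes and improves my sleep"))
```

The notebook's `lemmatize` or `clean_text` could be passed as `CountVectorizer(preprocessor=...)` so that new requests get the same cleaning as the training text.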

In [46]:
predict_ngramGBC_clean('I need a massage that relaxes, improves my sleep, relieves my aches, and increases my immunity') 



                            Recommended Healthcare Service
stemmed_1ngram4_GBC_80-20:     Lymphatic Drainage Massage


In [47]:
predict_ngramGBC_lemma('I need a massage that relaxes, improves my sleep, relieves my aches, and increases my immunity ')



                              Recommended Healthcare Service:
lemmatized_1ngram4GBC_80-20:      Lymphatic Drainage Massage


In [48]:
predict_ngramRFC_lemma('I need a massage that relaxes, improves my sleep, relieves my aches, and increases my immunity')



                              Recommended Healthcare Service:
lemmatized_1ngram4RFC_80-20:      Lymphatic Drainage Massage


In [49]:
predict_ngramRFC_clean('I need a massage that relaxes, improves my sleep, relieves my aches, and increases my immunity') 



                            Recommended Healthcare Service
stemmed_1ngram4_RFC_80-20:     Lymphatic Drainage Massage


In [73]:
modes['Cleaned_text']=modes['contraindications'].apply(lambda x: clean_text(x))
modes['Lemmatized']=modes['contraindications'].apply(lambda x: lemmatize(x))
modes.head()

Unnamed: 0,modality,Description,benefits,contraindications,sideEffects,Cleaned_text,Lemmatized
0,Swedish Massage,"Traditional spa or clinic massage with hands, palms, elbows, forearms of massage therapist used with glides and varying pressure applied along the muscle fibers of the body with varying amounts of...","improves tight muscles, loosens tight muscle fascia, improves circulation, improves relaxation, improves immunity, improves sleep, improves range of motion","dehydration, fever, rashes, infection, some mental disorders, nausea, epilepsy, aneurism history,pyschosis, numbness in limbs, limb numbness, nerve tingling, nerve pain, difficulty breathing, rag...","if no contraindications for massage exist in client, can make client tired",dehydr fever rash infect mental disord nausea epilepsi aneur historypyschosi numb limb limb numb nerv tingl nerv pain difficulti breath rag breath troubl breath,dehydration fever rash infection mental disorder nausea epilepsy aneurism historypyschosis numbness limb limb numbness nerve tingling nerve pain difficulty breathing ragged breathing trouble breat...
1,Shiatsu Massage,"Accupressure or small and localized pressure is applied with knuckles, thumbs, or tools to deep muscle layers along the spine and limbs or meridians of the body to promote healing of muscle aches,...","improved circulation, relaxing, detoxes, breaks apart adhesions, improves sleep, improves healing","dehydration, fever, rashes, infection, some mental disorders, nausea, epilepsy, aneurism history, cancer,pyschosis, numbness in limbs, limb numbness, nerve tingling, nerve pain, difficulty breath...",can cause muscle soreness for a few days from breaking apart adhesions and pressure used,dehydr fever rash infect mental disord nausea epilepsi aneur histori cancerpyschosi numb limb limb numb nerv tingl nerv pain difficulti breath rag breath troubl breath,dehydration fever rash infection mental disorder nausea epilepsy aneurism history cancerpyschosis numbness limb limb numbness nerve tingling nerve pain difficulty breathing ragged breathing troubl...
2,Prenatal Massage,"swedish massage up to medium pressure or middle muscle layers to help relax client going through body changes due to pregnancy and associated aches in the feet, low back, and upper back, avoids th...","improved circulation, better sleep, pain relief, relaxing effect, soothing effect, calming effect, improves range of motion, helps with congestion, helps detox, helps clean old bruises, modified m...","dehydration, Be a high risk pregnancy (history of painful menstruation, history of uterine diseases such as fibroids, cysts, endometriosis, history of miscarriages, over the age of 35 and first ch...",can make client dizzy or nausious if first massage and not familiar with massage or early stages of pregnancy and a first time pregnancy but not a high risk pregnancy,dehydr high risk pregnanc histori pain menstruat histori uterin diseas fibroid cyst endometriosi histori miscarriag age 35 first child diabet blood disord heart disord fever rash infect mental dis...,dehydration high risk pregnancy history painful menstruation history uterine disease fibroid cyst endometriosis history miscarriage age 35 first child diabetes blood disorder heart disorder fever ...
3,Sports Massage,"Massage that is focused on sports related discomforts around tendons and tight muscles to improve muscle recovery or improve muscle performance, improve range of motion, can include stretching and...","improves healing from sports or work related muscle and tendon pains and discomforts, relaxing, improves range of motion, improves flexibility, improves immunity, improves sleep, improves workouts...","dehydration, fever, rashes, infection, some mental disorders, nausea, epilepsy, pregnant, aneurism history, cancer,pyschosis, numbness in limbs, limb numbness, nerve tingling, nerve pain, difficu...",can cause muscle soreness for a few days from breaking apart adhesions and pressure used,dehydr fever rash infect mental disord nausea epilepsi pregnant aneur histori cancerpyschosi numb limb limb numb nerv tingl nerv pain difficulti breath rag breath troubl breath,dehydration fever rash infection mental disorder nausea epilepsy pregnant aneurism history cancerpyschosis numbness limb limb numbness nerve tingling nerve pain difficulty breathing ragged breathi...
4,Lymphatic Drainage Massage,"Massage focused on superficial muscles and lymphatic system of the body to drain edema or excess fluid retention from the limbs to the torso or heart to be eliminated, it is rhythmic and slower to...","improves circulation,","dehydration, fever, rashes, infection, some mental disorders, nausea, epilepsy, pregnant, on wound, pregnant, heart condition, aneurism history,pyschosis, numbness in limbs, limb numbness, nerve ...","client could get dizzy or feel nausious if he or she has an underlying health condition such as a circulatory problem, blood disorder, blockage somewhere in the body",dehydr fever rash infect mental disord nausea epilepsi pregnant wound pregnant heart condit aneur historypyschosi numb limb limb numb nerv tingl nerv pain difficulti breath rag breath troubl breath,dehydration fever rash infection mental disorder nausea epilepsy pregnant wound pregnant heart condition aneurism historypyschosis numbness limb limb numbness nerve tingling nerve pain difficulty ...


In [74]:
n_gram_vect_fit=n_gram_vect.fit(X_train['Lemmatized'])


n_gram_train=n_gram_vect_fit.transform(X_train['Lemmatized'])
n_gram_test=n_gram_vect_fit.transform(X_test['Lemmatized'])

In [75]:
# get_feature_names() was removed in scikit-learn 1.2; get_feature_names_out() replaces it
len(n_gram_vect_fit.get_feature_names_out())

905

In [76]:
print(n_gram_vect_fit.get_feature_names_out()[200:500].tolist())

['doctor approved', 'doctor approved high', 'doctor approved high risk', 'effect', 'effect calming', 'effect calming effect', 'effect calming effect improves', 'effect improves', 'effect improves range', 'effect improves range motion', 'effect soothing', 'effect soothing effect', 'effect soothing effect calming', 'emotional', 'emotional disorder', 'emotional disorder rheumatoid', 'emotional disorder rheumatoid arthritis', 'emotionally', 'emotionally upset', 'emotionally upset detox', 'emotionally upset detox recharge', 'energize', 'energize unwind', 'energize unwind invigorate', 'fascia', 'fascia adhesion', 'fascia adhesion increase', 'fascia adhesion increase healing', 'fascia improves', 'fascia improves circulation', 'fascia improves circulation improves', 'fibromyalgia', 'fibromyalgia myofascial', 'fibromyalgia myofascial pain', 'fibromyalgia myofascial pain visceral', 'fibromyalgia neuropathic', 'fibromyalgia neuropathic pain', 'fibrosis', 'fibrosis increased', 'fibrosis increased 
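The feature names above are contiguous word sequences of length 1 to 4. The plain-Python sketch below roughly illustrates how `CountVectorizer(ngram_range=(1,4))` builds them by enumerating word n-grams. It assumes simple lowercase whitespace tokenization. scikit-learn's default `token_pattern` also drops single-character tokens, so the real vocabulary can differ slightly. `word_ngrams` is a hypothetical helper, not part of scikit-learn.

```python
def word_ngrams(text, min_n=1, max_n=4):
    """Return all contiguous word n-grams with min_n <= n <= max_n.

    Hypothetical helper: approximates CountVectorizer(ngram_range=(1, 4))
    using lowercase whitespace tokenization.
    """
    tokens = text.lower().split()
    grams = []
    for n in range(min_n, max_n + 1):
        # slide a window of n words across the token list
        for i in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[i:i + n]))
    return grams


# a phrase taken from the feature list above
print(word_ngrams("doctor approved high risk"))
```

A four-word phrase already yields 10 n-grams (4 + 3 + 2 + 1). That is how 'doctor approved', 'doctor approved high' and 'doctor approved high risk' in the list above all come from the same phrase.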

In [77]:
n_gram_train_df=pd.concat([X_train[['benefits','Description','contraindications','Cleaned_text','Lemmatized']].reset_index(drop=True),pd.DataFrame(n_gram_train.toarray())],axis=1)

n_gram_test_df=pd.concat([X_test[['benefits','Description','contraindications','Cleaned_text','Lemmatized']].reset_index(drop=True),pd.DataFrame(n_gram_test.toarray())],axis=1)

In [78]:
n_gram_train_df.head()

Unnamed: 0,benefits,Description,contraindications,Cleaned_text,Lemmatized,0,1,2,3,4,...,895,896,897,898,899,900,901,902,903,904
0,"muscle trauma, muscle spasms, pain trigger points","massage to release adhesions in muscle fascia to improve range of motion and restore muscle length, cross fiber friction of muscles and tendons to break apart adhesions at superficial, middle, and...","dehydration, local site wounds, sores, sensitive skin, thin skin, cuts, burns, rash, fever, rashes, infection, some mental disorders, nausea, epilepsy, pregnant, aneurism history, cancer,pyschosis...",muscl trauma muscl spasm pain trigger point,muscle trauma muscle spasm pain trigger point,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,pain relief of muscles and tendons from working out or muscular adhesions from stress,"This is a non-narcotic pain relief gel applied to the body that is cooling to the touch and designed to calm the pain radiating from aching muscles and tendons, usually used with a hot towel prior...","dehydration, fever, rashes, infection, some mental disorders, nausea, epilepsy, allergies, sensitivity to pain, high blood pressure, low blood pressure, pregnant, aneurism history, cancer, sensiti...",pain relief muscl tendon work muscular adhes stress,pain relief muscle tendon working muscular adhesion stress,0,0,0,0,0,...,1,1,1,0,0,0,0,0,0,0
2,pain relief of muscles and tendons from working out or muscular adhesions from stress,"This is a non-narcotic pain relief gel applied to the body that is cooling to the touch and designed to calm the pain radiating from aching muscles and tendons, usually used with a hot towel prior...","dehydration, fever, rashes, infection, some mental disorders, nausea, epilepsy, allergies, sensitivity to pain, high blood pressure, low blood pressure, pregnant, aneurism history, cancer, sensiti...",pain relief muscl tendon work muscular adhes stress,pain relief muscle tendon working muscular adhesion stress,0,0,0,0,0,...,1,1,1,0,0,0,0,0,0,0
3,"helps with chronic pain associated with nerve pain, arthritis, or stress","can be combined with other massage modalities like hot stone therapy, cold stone therapy, aromatheray, CBD or cannabidiol products, cupping therapy, stretching, massage gun therapy, biofreeze, cra...","dehydration, fever, rashes, infection, some mental disorders, nausea, epilepsy, pregnant, aneurism history, cancer, autoimmune disease,pyschosis, numbness in limbs, limb numbness, nerve tingling,...",help chronic pain associ nerv pain arthriti stress,help chronic pain associated nerve pain arthritis stress,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,"relaxing, improves circulation, improves sleep, helps with pain and discomfort, increases immunity, recommended for people with pain sensitivity or people who cannot be touched because it causes d...","Focus of this massage is to stimulate body functions and improve health of clients who find it more relaxing to have their scalp, hands, and/or feet massaged where these areas of the body have ali...","dehydration, fever, rashes, infection, some mental disorders, nausea, epilepsy, ,aneurism history,pyschosis, numbness in limbs, limb numbness, nerve tingling, nerve pain, difficulty breathing, ra...",relax improv circul improv sleep help pain discomfort increas immun recommend peopl pain sensit peopl cannot touch caus discomfort tickl itch hurt like case fibromyalgia neuropath pain,relaxing improves circulation improves sleep help pain discomfort increase immunity recommended people pain sensitivity people cannot touched cause discomfort tickling itching hurting like case fi...,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [79]:
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.metrics import precision_recall_fscore_support as score
import time

In [80]:
rf=RandomForestClassifier(n_estimators=150, max_depth=None, n_jobs=-1)
start=time.time()
rf_model=rf.fit(n_gram_train,y_train)
end=time.time()
fit_time=(end-start)
fit_time

0.8330368995666504

In [81]:
start=time.time()
y_pred=rf_model.predict(n_gram_test)
end=time.time()
pred_time=(end-start)
pred_time

0.2371077537536621

In [88]:

prd = pd.DataFrame(y_pred)
prd.columns=['Predicted']

prd.index=y_test.index
pred=pd.concat([pd.DataFrame(prd),pd.DataFrame(y_test)],axis=1)
pd.set_option('max_colwidth', 20)
print(pred)

               Predicted             modality
39       Shiatsu Massage      Shiatsu Massage
282   Cold Stone Therapy   Cold Stone Therapy
351  Deep tissue Massage  Deep tissue Massage
128      Cupping Therapy      Cupping Therapy
416  Cannabidiol (CBD...  Cannabidiol (CBD...
..                   ...                  ...
75   Instrument Assis...  Instrument Assis...
217   Myofascial Massage   Myofascial Massage
251  Lymphatic Draina...  Lymphatic Draina...
330  Trigger Point Th...  Trigger Point Th...
132  Instrument Assis...  Instrument Assis...

[69 rows x 2 columns]


In [89]:
from sklearn.metrics import classification_report, f1_score, accuracy_score, confusion_matrix 

print('accuracy', accuracy_score(y_test, y_pred))

accuracy 0.9420289855072463


In [90]:
print('confusion matrix\n', confusion_matrix(y_test, y_pred))
print('(row=expected, col=predicted)')

confusion matrix
 [[6 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
 [0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
 [0 0 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
 [0 0 0 4 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
 [0 0 0 0 5 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
 [0 0 0 0 0 7 0 0 0 0 0 0 0 0 0 0 0 0 0]
 [0 0 0 0 0 0 2 0 0 0 0 0 0 0 0 0 0 0 0]
 [0 0 0 0 0 0 4 0 0 0 0 0 0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0 0 4 0 0 0 0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0 0 0 4 0 0 0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0 0 0 0 6 0 0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0 0 0 0 0 3 0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0 0 0 0 0 0 4 0 0 0 0 0 0]
 [0 0 0 0 0 0 0 0 0 0 0 0 0 6 0 0 0 0 0]
 [0 0 0 0 0 0 0 0 0 0 0 0 0 0 3 0 0 0 0]
 [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 0 0 0]
 [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 3 0 0]
 [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0]
 [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2]]
(row=expected, col=predicted)
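The reported accuracy can be checked against this confusion matrix. Accuracy is the sum of the diagonal (correct predictions) divided by the total number of test rows. The sketch below rebuilds the matrix printed above in plain Python. It assumes the rows and columns follow the sorted class labels, which is the order `confusion_matrix` uses by default and the order of the classification report below.

```python
# Diagonal of the random forest confusion matrix printed above.
diagonal = [6, 1, 2, 4, 5, 7, 2, 0, 4, 4, 6, 3, 4, 6, 3, 2, 3, 1, 2]

cm = [[0] * len(diagonal) for _ in diagonal]
for i, count in enumerate(diagonal):
    cm[i][i] = count
# The only off-diagonal cell: row 7 (Hot Stone Therapy Massage)
# predicted as column 6 (Deep tissue Massage).
cm[7][6] = 4


def accuracy_from_confusion(matrix):
    """Return (correct, total, accuracy) for a square confusion matrix."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct, total, correct / total


correct, total, acc = accuracy_from_confusion(cm)
print(correct, total, acc)
```

65 of the 69 test rows are correct. The 4 errors are exactly the Hot Stone rows, which gives the reported accuracy of about 0.942.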


In [91]:
print(classification_report(y_test, y_pred))

                                                                       precision    recall  f1-score   support

                                                         Aromatherapy       1.00      1.00      1.00         6
                                     Biofreeze Muscle Pain Relief Gel       1.00      1.00      1.00         1
                                       Cannabidiol (CBD) Massage Balm       1.00      1.00      1.00         2
                                                   Cold Stone Therapy       1.00      1.00      1.00         4
                                                 Craniosacral Massage       1.00      1.00      1.00         5
                                                      Cupping Therapy       1.00      1.00      1.00         7
                                                  Deep tissue Massage       0.33      1.00      0.50         2
                                            Hot Stone Therapy Massage       0.00      0.00      0.00         4




Note that even when the contraindications are used, Deep tissue Massage and Hot Stone Therapy Massage show the same misclassification. Deep tissue Massage reaches 100% recall, but none of the Hot Stone Therapy Massage test rows are classified correctly. All four of them are predicted as Deep tissue Massage, which is why its precision drops to 0.33.
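The 0.33 precision and 0.50 F1 for Deep tissue Massage follow directly from these counts. Two true Deep tissue rows and four Hot Stone rows were all predicted as Deep tissue, so precision is 2/6 and recall is 2/2. Hot Stone Therapy Massage is never predicted, so its precision is 0/0, which is what triggers scikit-learn's UndefinedMetricWarning. `classification_report` and `precision_recall_fscore_support` accept a `zero_division` argument (added in version 0.22) that sets this value explicitly.

The plain-Python sketch below reproduces the arithmetic. `per_class_scores` is a hypothetical helper, and the label lists contain only the six test rows of these two classes.

```python
def per_class_scores(y_true, y_pred, label, zero_division=0.0):
    """Precision, recall and F1 for one label (hypothetical helper).

    zero_division is returned for any ratio that is 0/0; F1 is also set to
    zero_division when precision and recall are both zero.
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
    predicted = sum(1 for p in y_pred if p == label)  # tp + fp
    actual = sum(1 for t in y_true if t == label)     # tp + fn
    precision = tp / predicted if predicted else zero_division
    recall = tp / actual if actual else zero_division
    if precision + recall == 0:
        f1 = zero_division
    else:
        f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# The six test rows of the two confused classes, per the confusion matrix above.
y_true = ["Deep tissue Massage"] * 2 + ["Hot Stone Therapy Massage"] * 4
y_pred = ["Deep tissue Massage"] * 6

for label in ("Deep tissue Massage", "Hot Stone Therapy Massage"):
    p, r, f = per_class_scores(y_true, y_pred, label)
    print(f"{label}: precision={p:.2f} recall={r:.2f} f1={f:.2f}")
```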

In [93]:
gb=GradientBoostingClassifier(n_estimators=150,max_depth=11)
start=time.time()
gb_model=gb.fit(n_gram_train,y_train)
end=time.time()
fit_time=(end-start)
fit_time

6.578742265701294

In [94]:
start=time.time()
y_pred=gb_model.predict(n_gram_test)
end=time.time()
pred_time=(end-start)
pred_time

0.009007930755615234

In [95]:
prd = pd.DataFrame(y_pred)
prd.columns=['Predicted']

prd.index=y_test.index
pred=pd.concat([pd.DataFrame(prd),y_test],axis=1)
print(pred)

               Predicted             modality
39       Shiatsu Massage      Shiatsu Massage
282   Cold Stone Therapy   Cold Stone Therapy
351  Deep tissue Massage  Deep tissue Massage
128      Cupping Therapy      Cupping Therapy
416  Cannabidiol (CBD...  Cannabidiol (CBD...
..                   ...                  ...
75   Instrument Assis...  Instrument Assis...
217   Myofascial Massage   Myofascial Massage
251  Lymphatic Draina...  Lymphatic Draina...
330  Trigger Point Th...  Trigger Point Th...
132  Instrument Assis...  Instrument Assis...

[69 rows x 2 columns]


In [96]:
from sklearn.metrics import classification_report, f1_score, accuracy_score, confusion_matrix 

print('accuracy', accuracy_score(y_test, y_pred))
print('confusion matrix\n', confusion_matrix(y_test, y_pred))
print('(row=expected, col=predicted)')
print(classification_report(y_test, y_pred))

accuracy 0.9420289855072463
confusion matrix
 [[6 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
 [0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
 [0 0 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
 [0 0 0 4 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
 [0 0 0 0 5 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
 [0 0 0 0 0 7 0 0 0 0 0 0 0 0 0 0 0 0 0]
 [0 0 0 0 0 0 2 0 0 0 0 0 0 0 0 0 0 0 0]
 [0 0 0 0 0 0 4 0 0 0 0 0 0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0 0 4 0 0 0 0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0 0 0 4 0 0 0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0 0 0 0 6 0 0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0 0 0 0 0 3 0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0 0 0 0 0 0 4 0 0 0 0 0 0]
 [0 0 0 0 0 0 0 0 0 0 0 0 0 6 0 0 0 0 0]
 [0 0 0 0 0 0 0 0 0 0 0 0 0 0 3 0 0 0 0]
 [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 0 0 0]
 [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 3 0 0]
 [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0]
 [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2]]
(row=expected, col=predicted)
                                                                       precision    recall  f1-score   support

                                



In [97]:
pd.set_option('max_colwidth', 200) 
m = pd.DataFrame(modes[['Lemmatized','modality']][9:11])

m[0:2]

Unnamed: 0,Lemmatized,modality
9,dehydration fever rash infection mental disorder nausea epilepsy arthritis neuropathic pain pregnant aneurism history cancer osteoporosispyschosis numbness limb limb numbness nerve tingling nerve ...,Deep tissue Massage
10,dehydration fever rash infection mental disorder nausea epilepsy pregnant aneurism history cancer heat sensitive sensitivity heatpyschosis numbness limb limb numbness nerve tingling nerve pain dif...,Hot Stone Therapy Massage
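The two rows above suggest why these classes are easy to confuse: their lemmatized contraindication texts start with almost the same tokens. One way to quantify this is the Jaccard similarity of their n-gram sets, meaning shared n-grams divided by all distinct n-grams. The sketch below applies it only to the visible prefixes of rows 9 and 10, stopping before the run-together tokens ('osteoporosispyschosis', 'heatpyschosis'). The full texts may overlap more or less. `ngram_set` and `jaccard` are hypothetical helpers.

```python
def ngram_set(text, min_n=1, max_n=4):
    """Set of contiguous word n-grams (whitespace tokenization)."""
    tokens = text.split()
    return {
        " ".join(tokens[i:i + n])
        for n in range(min_n, max_n + 1)
        for i in range(len(tokens) - n + 1)
    }


def jaccard(a, b, min_n=1, max_n=4):
    """|A & B| / |A | B| over the n-gram sets of two texts."""
    sa, sb = ngram_set(a, min_n, max_n), ngram_set(b, min_n, max_n)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 1.0


# Visible prefixes of rows 9 and 10 above (the displayed text is truncated).
deep_tissue = ("dehydration fever rash infection mental disorder nausea epilepsy "
               "arthritis neuropathic pain pregnant aneurism history cancer")
hot_stone = ("dehydration fever rash infection mental disorder nausea epilepsy "
             "pregnant aneurism history cancer heat sensitive sensitivity")

print("unigram Jaccard:", round(jaccard(deep_tissue, hot_stone, 1, 1), 3))
print("1-4 gram Jaccard:", round(jaccard(deep_tissue, hot_stone), 3))
```

On these prefixes, two thirds of the distinct words and half of all distinct 1-4-grams are shared. Relatively few features are left to separate the two classes.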


In [98]:
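def clean_text(text):
    # drop punctuation characters and lowercase (string, re, ps and stopwords come from an earlier cell)
    text="".join([char.lower() for char in text if char not in string.punctuation])
    tokens=re.split(r'\W+',text)
    # stem each token with ps, skipping stopwords
    text=" ".join([ps.stem(word) for word in tokens if word not in stopwords])
    return text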

def predict_ngramRFC_clean(new_review): 
    nr=pd.DataFrame([new_review])
    nr.columns=['newReview']
    nr['clean']=nr['newReview'].apply(lambda x: clean_text(x))

    rf=RandomForestClassifier(n_estimators=150,max_depth=None, n_jobs=-1)
    n_gram_vect=CountVectorizer(ngram_range=(1,4))
    
    n_gram_vect_fit=n_gram_vect.fit(X_train['Cleaned_text'])
    n_gram_train=n_gram_vect_fit.transform(X_train['Cleaned_text'])
    n_gram_test=n_gram_vect_fit.transform(nr['clean'])
    
    model = rf.fit(n_gram_train,y_train)
    pred=pd.DataFrame(model.predict(n_gram_test))
    pred.columns=['Recommended Healthcare Service']
    # note: the label says 80-20, but this notebook splits the data 85/15
    pred.index= ['stemmed_1ngram4_RFC_80-20:']
    print('\n\n',pred)
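`clean_text` and `lemmatize` delete punctuation characters instead of replacing them. Because of this, a missing space after a comma in the CSV fuses two words: 'cancer,pyschosis' becomes 'cancerpyschosis', as in the stemmed and lemmatized columns of the first rows above. The sketch below contrasts that behaviour with an alternative that replaces punctuation with spaces before splitting. Both helpers are hypothetical. Stemming, lemmatization and stopword removal are left out so the example runs without NLTK. Re-running the notebook with the alternative would change the vocabulary and the results reported here.

```python
import re
import string

# Translation table mapping every ASCII punctuation character to a space.
PUNCT_TO_SPACE = str.maketrans(string.punctuation, " " * len(string.punctuation))


def tokenize_delete_punct(text):
    """Approximates the notebook's approach: delete punctuation, then split."""
    text = "".join(ch for ch in text.lower() if ch not in string.punctuation)
    return re.split(r"\W+", text)


def tokenize_keep_boundaries(text):
    """Lowercase, turn punctuation into spaces, then split on whitespace."""
    return text.lower().translate(PUNCT_TO_SPACE).split()


sample = "aneurism history, cancer,pyschosis"
print(tokenize_delete_punct(sample))     # fuses 'cancer' and 'pyschosis'
print(tokenize_keep_boundaries(sample))  # keeps them as separate tokens
```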
    

In [99]:
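def lemmatize(text):
    # drop punctuation characters and lowercase (string, re, wn and stopwords come from an earlier cell)
    text="".join([char.lower() for char in text if char not in string.punctuation])
    tokens=re.split(r'\W+', text)
    # lemmatize each token with wn, skipping stopwords
    text=" ".join([wn.lemmatize(word) for word in tokens if word not in stopwords])
    return text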
    
def predict_ngramRFC_lemma(new_review): 
    nr=pd.DataFrame([new_review])
    nr.columns=['newReview']
    nr['lemma']=nr['newReview'].apply(lambda x: lemmatize(x))

    rf=RandomForestClassifier(n_estimators=150,max_depth=None, n_jobs=-1)
    n_gram_vect=CountVectorizer(ngram_range=(1,4))
    
    n_gram_vect_fit=n_gram_vect.fit(X_train['Lemmatized'])
    n_gram_train=n_gram_vect_fit.transform(X_train['Lemmatized'])
    n_gram_test=n_gram_vect_fit.transform(nr['lemma'])
    
    model = rf.fit(n_gram_train,y_train)
    pred=pd.DataFrame(model.predict(n_gram_test))
    pred.columns=['Recommended Healthcare Service:']
    # note: the label says 80-20, but this notebook splits the data 85/15
    pred.index= ['lemmatized_1ngram4RFC_80-20:']
    print('\n\n',pred)

In [100]:
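def clean_text(text):
    # drop punctuation characters and lowercase (string, re, ps and stopwords come from an earlier cell)
    text="".join([char.lower() for char in text if char not in string.punctuation])
    tokens=re.split(r'\W+',text)
    # stem each token with ps, skipping stopwords
    text=" ".join([ps.stem(word) for word in tokens if word not in stopwords])
    return text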

def predict_ngramGBC_clean(new_review): 
    nr=pd.DataFrame([new_review])
    nr.columns=['newReview']
    nr['clean']=nr['newReview'].apply(lambda x: clean_text(x))

    gb=GradientBoostingClassifier(n_estimators=150,max_depth=11)
    n_gram_vect=CountVectorizer(ngram_range=(1,4))
    
    n_gram_vect_fit=n_gram_vect.fit(X_train['Cleaned_text'])
    n_gram_train=n_gram_vect_fit.transform(X_train['Cleaned_text'])
    n_gram_test=n_gram_vect_fit.transform(nr['clean'])
    
    model = gb.fit(n_gram_train,y_train)
    pred=pd.DataFrame(model.predict(n_gram_test))
    pred.columns=['Recommended Healthcare Service']
    # note: the label says 80-20, but this notebook splits the data 85/15
    pred.index= ['stemmed_1ngram4_GBC_80-20:']
    print('\n\n',pred)
    

In [101]:
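def lemmatize(text):
    # drop punctuation characters and lowercase (string, re, wn and stopwords come from an earlier cell)
    text="".join([char.lower() for char in text if char not in string.punctuation])
    tokens=re.split(r'\W+', text)
    # lemmatize each token with wn, skipping stopwords
    text=" ".join([wn.lemmatize(word) for word in tokens if word not in stopwords])
    return text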
    
def predict_ngramGBC_lemma(new_review): 
    nr=pd.DataFrame([new_review])
    nr.columns=['newReview']
    nr['lemma']=nr['newReview'].apply(lambda x: lemmatize(x))

    gb=GradientBoostingClassifier(n_estimators=150,max_depth=11)
    n_gram_vect=CountVectorizer(ngram_range=(1,4))
    
    n_gram_vect_fit=n_gram_vect.fit(X_train['Lemmatized'])
    n_gram_train=n_gram_vect_fit.transform(X_train['Lemmatized'])
    n_gram_test=n_gram_vect_fit.transform(nr['lemma'])
    
    model = gb.fit(n_gram_train,y_train)
    pred=pd.DataFrame(model.predict(n_gram_test))
    pred.columns=['Recommended Healthcare Service:']
    # note: the label says 80-20, but this notebook splits the data 85/15
    pred.index= ['lemmatized_1ngram4GBC_80-20:']
    print('\n\n',pred)
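`predict_ngramRFC_clean`, `predict_ngramRFC_lemma`, `predict_ngramGBC_clean` and `predict_ngramGBC_lemma` each refit the vectorizer and the model on every call. Every query therefore repeats the fit, which took about 0.8 s for the random forest and 6.6 s for gradient boosting in the timings above. One alternative is to fit once and reuse the fitted objects, for example with scikit-learn's `Pipeline`.

In the sketch below, `make_recommender` is a hypothetical helper and the `preprocess` argument stands in for `clean_text` or `lemmatize`. The tiny training set is made-up illustration data, not rows from MassageModalities2.csv.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import Pipeline


def make_recommender(train_texts, train_labels, preprocess, model):
    """Fit an n-gram vectorizer and a classifier once; return a predict function.

    Hypothetical helper: `preprocess` should be the same cleaner that produced
    `train_texts` (e.g. the notebook's clean_text or lemmatize).
    """
    pipe = Pipeline([
        ("vect", CountVectorizer(ngram_range=(1, 4))),
        ("clf", model),
    ])
    pipe.fit(train_texts, train_labels)  # fitted a single time

    def recommend(new_text):
        # Only transform + predict run per query; nothing is refit.
        return pipe.predict([preprocess(new_text)])[0]

    return recommend


# Made-up illustration data, duplicated per class as in the notebook.
texts = (["pregnant high risk pregnancy"] * 2
         + ["heat sensitive sensitivity"] * 2
         + ["muscle trauma muscle spasm"] * 2)
labels = (["Prenatal Massage"] * 2
          + ["Hot Stone Therapy Massage"] * 2
          + ["Myofascial Massage"] * 2)

recommend = make_recommender(
    texts, labels, preprocess=str.lower,
    model=RandomForestClassifier(n_estimators=150, random_state=0),
)
print(recommend("Muscle trauma muscle spasm"))
```

Passing `X_train['Lemmatized']` and `y_train` as the training data, `lemmatize` as `preprocess` and `GradientBoostingClassifier(n_estimators=150, max_depth=11)` as `model` would correspond to `predict_ngramGBC_lemma`, but without the per-call refit.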

In [106]:
X_train['Lemmatized']

388                                                                                                                                                              muscle trauma muscle spasm pain trigger point
182                                                                                                                                                 pain relief muscle tendon working muscular adhesion stress
391                                                                                                                                                 pain relief muscle tendon working muscular adhesion stress
17                                                                                                                                                    help chronic pain associated nerve pain arthritis stress
100    relaxing improves circulation improves sleep help pain discomfort increase immunity recommended people pain sensitivity people cannot touched cause discomfort tickli

In [102]:
predict_ngramGBC_clean('I need a massage, have allergies, sensitive to pain, ticklish feet, and pregnant') 



                            Recommended Healthcare Service
stemmed_1ngram4_GBC_80-20:     Lymphatic Drainage Massage


In [107]:
predict_ngramGBC_lemma('I need a massage, have allergies, sensitive to pain, ticklish feet, and pregnant ')



                              Recommended Healthcare Service:
lemmatized_1ngram4GBC_80-20:                Prenatal Massage


In [109]:
predict_ngramRFC_lemma('I need a massage, have allergies, sensitive to pain, ticklish feet, and pregnant')



                              Recommended Healthcare Service:
lemmatized_1ngram4RFC_80-20:      Lymphatic Drainage Massage


In [110]:
predict_ngramRFC_clean('I need a massage, have allergies, sensitive to pain, ticklish feet, and pregnant') 



                            Recommended Healthcare Service
stemmed_1ngram4_RFC_80-20:     Lymphatic Drainage Massage


It is clear that using this classifier to flag which massage a user should not get, based on the health conditions in their intake, does not work well. For example, the gradient boosted trees classifier trained on lemmatized 1-4-gram tokens of the contraindications suggested avoiding a Prenatal Massage even though the user is pregnant.

A recommender built on facts or claims about the benefits, contraindications, and side effects of specific massage modalities would be better suited to a program that does not use machine learning. Such a program would use lists and if/else statements to eliminate the modalities a user cannot have, and then suggest one of the remaining modalities.
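The rule-based alternative can be sketched in a few lines. Store each modality's contraindications as a set, drop every modality whose set overlaps the user's conditions, and recommend from what remains. The dictionary below is a hypothetical, heavily abbreviated excerpt based on contraindications visible earlier in this notebook. A real system would load the full `contraindications` column and map free-text intake answers to these terms, which this sketch does not attempt.

```python
# Hypothetical, abbreviated excerpt of per-modality contraindications.
CONTRAINDICATIONS = {
    "Sports Massage": {"dehydration", "fever", "epilepsy", "pregnant"},
    "Lymphatic Drainage Massage": {"dehydration", "fever", "pregnant", "heart condition"},
    "Biofreeze Muscle Pain Relief Gel": {"dehydration", "allergies", "sensitivity to pain", "pregnant"},
    "Hot Stone Therapy Massage": {"dehydration", "fever", "pregnant", "heat sensitive"},
    "Prenatal Massage": {"dehydration", "high risk pregnancy"},
}


def eligible_modalities(conditions, contraindications=CONTRAINDICATIONS):
    """Return the modalities whose contraindications share no term with `conditions`."""
    conditions = {c.lower() for c in conditions}
    return sorted(
        name
        for name, contra in contraindications.items()
        if not contra & conditions  # keep only if nothing overlaps
    )


print(eligible_modalities({"pregnant", "allergies", "sensitivity to pain"}))
```

For the intake used above (allergies, pain sensitivity, pregnancy), this excerpt eliminates every modality except Prenatal Massage. That is the opposite of what the lemmatized gradient boosted model suggested.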