# Problem Statement:
Build a Drug Recommendation System that recommends the most effective drug for the given condition based on the reviews of various drugs used for that condition.<br>

Our task is to classify each review as Positive or Negative based on text analysis, then calculate a recommendation score for each drug so that the most effective drug can be recommended. The classification step is therefore a binary classification problem.
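The scoring formula is not defined in this section. As a rough sketch only, one hypothetical choice is the share of positive reviews per drug within a condition. The `sentiment` column, the toy rows, and the `recommendation_scores` helper below are all assumptions made for illustration, not the notebook's method.

```python
import pandas as pd

# Toy reviews; `sentiment` stands in for a classifier's output (1 = positive, 0 = negative).
reviews = pd.DataFrame({
    "condition": ["ADHD", "ADHD", "ADHD", "Pain", "Pain"],
    "drugName": ["Guanfacine", "Guanfacine", "Vyvanse", "Tramadol", "Tramadol"],
    "sentiment": [1, 0, 1, 1, 1],
})

def recommendation_scores(df):
    """Hypothetical score: fraction of positive reviews per (condition, drug)."""
    scores = (
        df.groupby(["condition", "drugName"])["sentiment"]
        .mean()
        .rename("score")
        .reset_index()
    )
    # Within each condition, put the highest-scoring drug first.
    return scores.sort_values(["condition", "score"], ascending=[True, False])

scores = recommendation_scores(reviews)
# The first row of each condition group is the top-scoring drug.
best = scores.groupby("condition").head(1).set_index("condition")["drugName"].to_dict()
print(best)
```

A plain fraction ignores how many reviews a drug has, so a real score would probably also weight by review count or `usefulCount`.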

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import warnings
warnings.filterwarnings('ignore')
from wordcloud import WordCloud
from wordcloud import STOPWORDS
import nltk
import regex as re
#nltk.download('stopwords')
from nltk.corpus import stopwords
from nltk.stem.snowball import SnowballStemmer

### EDA
Exploratory Data Analysis. Getting to know the data.

In [2]:
train_data = pd.read_csv("../archive/drugsComTrain_raw.csv")
test_data = pd.read_csv("../archive/drugsComTest_raw.csv")
print("train_data size: ",train_data.shape)
print("test_data size: ",test_data.shape)

train_data size:  (161297, 7)
test_data size:  (53766, 7)


Merging the given datasets. <br>
The data will be modified during cleaning, so the train and test files are merged first. Later we can split the merged data into train, validation, and test sets.

In [3]:
data = pd.concat([train_data,test_data])
data.reset_index(inplace=True,drop=True)
print(data.shape)
data.head()

(215063, 7)


Unnamed: 0,uniqueID,drugName,condition,review,rating,date,usefulCount
0,206461,Valsartan,Left Ventricular Dysfunction,"""It has no side effect, I take it in combinati...",9,20-May-12,27
1,95260,Guanfacine,ADHD,"""My son is halfway through his fourth week of ...",8,27-Apr-10,192
2,92703,Lybrel,Birth Control,"""I used to take another oral contraceptive, wh...",5,14-Dec-09,17
3,138000,Ortho Evra,Birth Control,"""This is my first time using any form of birth...",8,3-Nov-15,10
4,35696,Buprenorphine / naloxone,Opiate Dependence,"""Suboxone has completely turned my life around...",9,27-Nov-16,37
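The later split into train, validation, and test sets is not shown in this section. A minimal sketch of one way to do it with pandas alone follows. The 80/10/10 fractions, the seed, and the `split_frame` helper are assumptions, not values taken from the notebook.

```python
import pandas as pd

def split_frame(df, val_frac=0.1, test_frac=0.1, seed=42):
    """Shuffle rows once, then cut into train/val/test (fractions are assumptions)."""
    shuffled = df.sample(frac=1.0, random_state=seed).reset_index(drop=True)
    n_test = int(len(shuffled) * test_frac)
    n_val = int(len(shuffled) * val_frac)
    test = shuffled.iloc[:n_test]
    val = shuffled.iloc[n_test:n_test + n_val]
    train = shuffled.iloc[n_test + n_val:]
    return train, val, test

# Toy frame standing in for the merged `data`.
toy = pd.DataFrame({"review": [f"review {i}" for i in range(20)], "rating": range(20)})
train, val, test = split_frame(toy)
print(len(train), len(val), len(test))
```

Shuffling before cutting matters here because `pd.concat` leaves the original train rows before the original test rows.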


In [4]:
data.dtypes

uniqueID        int64
drugName       object
condition      object
review         object
rating          int64
date           object
usefulCount     int64
dtype: object

Checking and dropping null values

In [5]:
data.isnull().any()

uniqueID       False
drugName       False
condition       True
review         False
rating         False
date           False
usefulCount    False
dtype: bool

In [6]:
#checking for the number of null values and percentage in given dataset
null_size = data.isnull().sum()['condition']
null_size

1194

In [7]:
datasize = len(data)
null_per = (null_size/datasize)*100
null_per

0.5551861547546533

In [8]:
print('Size of the dataset before dropping null values:',data.shape)
data = data.dropna(axis=0)
print('Size of the dataset after dropping null values:',data.shape)

Size of the dataset before dropping null values: (215063, 7)
Size of the dataset after dropping null values: (213869, 7)
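The null check above can also be done for every column at once. The sketch below uses a toy frame and a hypothetical `null_summary` helper to report count and percentage per column, and drops rows with `dropna(subset=...)` so that only the column known to contain nulls is considered.

```python
import numpy as np
import pandas as pd

def null_summary(df):
    """Return null count and percentage per column, keeping only columns with nulls."""
    counts = df.isnull().sum()
    summary = pd.DataFrame({"null_count": counts, "null_pct": counts / len(df) * 100})
    return summary[summary["null_count"] > 0]

# Toy frame standing in for the merged `data`.
toy = pd.DataFrame({
    "drugName": ["A", "B", "C", "D"],
    "condition": ["Pain", np.nan, "ADHD", "Pain"],
})
summary = null_summary(toy)
# Drop only the rows where `condition` is missing.
cleaned = toy.dropna(subset=["condition"])
print(summary)
```

In this dataset only `condition` has nulls, so `dropna(subset=["condition"])` and `dropna(axis=0)` should remove the same rows.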


Analyzing Disease Conditions

In [9]:
conditions = dict(data['condition'].value_counts())
data.condition.unique().shape

(916,)

In [10]:
conditions

{'Birth Control': 38436,
 'Depression': 12164,
 'Pain': 8245,
 'Anxiety': 7812,
 'Acne': 7435,
 'Bipolar Disorde': 5604,
 'Insomnia': 4904,
 'Weight Loss': 4857,
 'Obesity': 4757,
 'ADHD': 4509,
 'Diabetes, Type 2': 3362,
 'Emergency Contraception': 3290,
 'High Blood Pressure': 3104,
 'Vaginal Yeast Infection': 3085,
 'Abnormal Uterine Bleeding': 2744,
 'Bowel Preparation': 2498,
 'Smoking Cessation': 2440,
 'ibromyalgia': 2370,
 'Migraine': 2277,
 'Anxiety and Stress': 2236,
 'Major Depressive Disorde': 2131,
 'Constipation': 2120,
 'Chronic Pain': 1940,
 'Panic Disorde': 1932,
 'Migraine Prevention': 1867,
 'Urinary Tract Infection': 1747,
 'Muscle Spasm': 1631,
 'Osteoarthritis': 1626,
 'Generalized Anxiety Disorde': 1542,
 'Opiate Dependence': 1477,
 'Erectile Dysfunction': 1467,
 'Irritable Bowel Syndrome': 1339,
 'Allergic Rhinitis': 1323,
 'Rheumatoid Arthritis': 1315,
 'Bacterial Infection': 1252,
 'Cough': 1224,
 'Sinusitis': 1124,
 'Nausea/Vomiting': 1013,
 'GERD': 968,
 'Ov

Looking through the full list of conditions, some entries have the form '0</span> users found this comment helpful.', which is not a valid condition.<br>
There is also a condition named 'Not Listed / Othe'.<br>
We can remove the rows with these conditions because no drug can be recommended for them.

In [11]:
# Top 15 Conditions
top_con = list(conditions.keys())[0:15]
top_con

['Birth Control',
 'Depression',
 'Pain',
 'Anxiety',
 'Acne',
 'Bipolar Disorde',
 'Insomnia',
 'Weight Loss',
 'Obesity',
 'ADHD',
 'Diabetes, Type 2',
 'Emergency Contraception',
 'High Blood Pressure',
 'Vaginal Yeast Infection',
 'Abnormal Uterine Bleeding']

In [12]:
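# Removing rows whose condition is scraped HTML residue rather than a real condition
s = "</span> users found this comment helpful."
# regex=False matches s as literal text, so "." is not treated as a wildcard
data = data[~data['condition'].str.contains(s, regex=False)]
conditions = dict(data['condition'].value_counts())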
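The text above also names 'Not Listed / Othe' as a condition to remove, while the cell above filters only the `</span>` rows. A minimal sketch of dropping both kinds of row is shown on a toy frame. Whether the notebook removes 'Not Listed / Othe' at a later stage is not visible in this section.

```python
import pandas as pd

# Toy frame standing in for `data`.
toy = pd.DataFrame({
    "drugName": ["A", "B", "C", "D"],
    "condition": [
        "Pain",
        "0</span> users found this comment helpful.",
        "Not Listed / Othe",
        "ADHD",
    ],
})

span_marker = "</span> users found this comment helpful."
# regex=False treats the marker as literal text, so "." is not a wildcard.
is_span = toy["condition"].str.contains(span_marker, regex=False)
is_unlisted = toy["condition"] == "Not Listed / Othe"
filtered = toy[~(is_span | is_unlisted)].reset_index(drop=True)
print(filtered)
```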

In [13]:
# For each unique condition, count how many unique drugs are given to patients
con_nunique_drugs = []
for con in conditions:
	# Select the rows for this condition and count the distinct drug names
	con_nunique_drugs.append(data[data['condition'] == con]['drugName'].nunique())
con_nunique_drugs

[181,
 115,
 219,
 81,
 127,
 82,
 85,
 22,
 43,
 58,
 97,
 12,
 146,
 26,
 77,
 29,
 17,
 46,
 60,
 19,
 52,
 44,
 53,
 40,
 51,
 53,
 32,
 84,
 18,
 9,
 16,
 52,
 95,
 107,
 46,
 43,
 42,
 40,
 45,
 29,
 12,
 40,
 29,
 46,
 36,
 43,
 24,
 61,
 39,
 18,
 33,
 64,
 22,
 47,
 18,
 30,
 38,
 12,
 56,
 25,
 43,
 253,
 13,
 18,
 29,
 29,
 13,
 8,
 9,
 32,
 24,
 15,
 30,
 7,
 30,
 47,
 21,
 13,
 30,
 20,
 18,
 25,
 42,
 37,
 5,
 22,
 38,
 5,
 27,
 23,
 17,
 19,
 37,
 23,
 31,
 6,
 29,
 15,
 29,
 14,
 28,
 18,
 1,
 19,
 21,
 13,
 16,
 27,
 10,
 27,
 17,
 10,
 17,
 16,
 37,
 42,
 38,
 14,
 26,
 7,
 28,
 22,
 9,
 20,
 30,
 9,
 15,
 13,
 20,
 37,
 7,
 25,
 14,
 35,
 20,
 26,
 23,
 2,
 15,
 42,
 13,
 20,
 5,
 5,
 15,
 20,
 27,
 12,
 11,
 20,
 24,
 45,
 18,
 9,
 8,
 9,
 35,
 18,
 3,
 16,
 11,
 26,
 7,
 18,
 4,
 10,
 10,
 10,
 17,
 15,
 31,
 9,
 16,
 11,
 12,
 3,
 7,
 7,
 5,
 12,
 25,
 8,
 3,
 20,
 3,
 13,
 18,
 11,
 11,
 15,
 4,
 9,
 16,
 20,
 21,
 14,
 19,
 3,
 17,
 17,
 4,
 7,
 8,
 8,
 9,
 4,
 

In [14]:
# A dict mapping each condition to the number of unique drugs given to patients.
drug_cond = dict(zip(list(conditions.keys()),con_nunique_drugs))
drug_cond

{'Birth Control': 181,
 'Depression': 115,
 'Pain': 219,
 'Anxiety': 81,
 'Acne': 127,
 'Bipolar Disorde': 82,
 'Insomnia': 85,
 'Weight Loss': 22,
 'Obesity': 43,
 'ADHD': 58,
 'Diabetes, Type 2': 97,
 'Emergency Contraception': 12,
 'High Blood Pressure': 146,
 'Vaginal Yeast Infection': 26,
 'Abnormal Uterine Bleeding': 77,
 'Bowel Preparation': 29,
 'Smoking Cessation': 17,
 'ibromyalgia': 46,
 'Migraine': 60,
 'Anxiety and Stress': 19,
 'Major Depressive Disorde': 52,
 'Constipation': 44,
 'Chronic Pain': 53,
 'Panic Disorde': 40,
 'Migraine Prevention': 51,
 'Urinary Tract Infection': 53,
 'Muscle Spasm': 32,
 'Osteoarthritis': 84,
 'Generalized Anxiety Disorde': 18,
 'Opiate Dependence': 9,
 'Erectile Dysfunction': 16,
 'Irritable Bowel Syndrome': 52,
 'Allergic Rhinitis': 95,
 'Rheumatoid Arthritis': 107,
 'Bacterial Infection': 46,
 'Cough': 43,
 'Sinusitis': 42,
 'Nausea/Vomiting': 40,
 'GERD': 45,
 'Overactive Bladde': 29,
 'Hyperhidrosis': 12,
 'Multiple Sclerosis': 40,
 'H
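The loop and `zip` in the two cells above can probably be collapsed into a single `groupby`, which avoids re-filtering the whole frame once per condition. The sketch below uses a toy frame, and `reindex` keeps the same ordering by review count that `value_counts()` produces.

```python
import pandas as pd

# Toy frame standing in for `data`.
toy = pd.DataFrame({
    "condition": ["Pain", "Pain", "Pain", "ADHD", "ADHD"],
    "drugName": ["Tramadol", "Oxycodone", "Tramadol", "Vyvanse", "Vyvanse"],
})

# Conditions ordered by review count, as value_counts() does in the notebook.
order = toy["condition"].value_counts().index
# Number of distinct drugs per condition, reindexed to that same order.
drug_cond = toy.groupby("condition")["drugName"].nunique().reindex(order).to_dict()
print(drug_cond)
```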

In [16]:
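# For each condition, collect up to three of its most-reviewed drugs
cond_drug = {}
for cond in conditions:
	# Drug names for this condition, ordered by number of reviews (most reviewed first).
	# Reading names from the index works regardless of how the pandas version
	# labels the columns after value_counts().reset_index().
	top_drugs = data[data['condition'] == cond]['drugName'].value_counts().index
	l = len(top_drugs)
	print(l)
	# Keep at most three drugs
	n = min(3, l)
	print(n)
	li = []
	for i in range(n):
		print(top_drugs[i])
		li.append(top_drugs[i])
	cond_drug[cond] = li

cond_drug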

181
3
Etonogestrel
Ethinyl estradiol / norethindrone
Levonorgestrel
115
3
Bupropion
Sertraline
Venlafaxine
219
3
Tramadol
Acetaminophen / hydrocodone
Oxycodone
81
3
Escitalopram
Alprazolam
Buspirone
127
3
Isotretinoin
Adapalene / benzoyl peroxide
Doxycycline
82
3
Lamotrigine
Quetiapine
Lamictal
85
3
Zolpidem
Trazodone
Ambien
22
3
Phentermine
Lorcaserin
Belviq
43
3
Bupropion / naltrexone
Contrave
Liraglutide
58
3
Lisdexamfetamine
Vyvanse
Methylphenidate
97
3
Liraglutide
Victoza
Dulaglutide
12
3
Levonorgestrel
Plan B
Plan B One-Step
146
3
Lisinopril
Losartan
Amlodipine
26
3
Miconazole
Tioconazole
Fluconazole
77
3
Medroxyprogesterone
Depo-Provera
Levonorgestrel
29
3
Magnesium sulfate / potassium sulfate / sodium sulfate
Suprep Bowel Prep Kit
Polyethylene glycol 3350 with electrolytes
17
3
Varenicline
Chantix
Bupropion
46
3
Milnacipran
Savella
Pregabalin
60
3
Sumatriptan
Rizatriptan
Imitrex
19
3
Citalopram
Celexa
Fluoxetine
52
3
Vortioxetine
Trintellix
Bupropion
44
3
Bisacodyl
Dulcolax
Mag

12
3
Aluminum chloride hexahydrate
Drysol
Hypercare
40
3
Glatiramer
Copaxone
Natalizumab
29
3
Ledipasvir / sofosbuvir
Harvoni
Sofosbuvir / velpatasvir
46
3
Efavirenz / emtricitabine / tenofovir
Atripla
Cobicistat / elvitegravir / emtricitabine / tenofovir
36
3
Atorvastatin
Simvastatin
Rosuvastatin
43
3
Acetaminophen / hydrocodone
Tramadol
Naproxen
24
3
Ropinirole
Pramipexole
Tramadol
61
3
Ustekinumab
Stelara
Adalimumab
39
3
Paliperidone
Lurasidone
Latuda
18
3
Linaclotide
Linzess
Lubiprostone
33
3
Sertraline
Fluvoxamine
Zoloft
64
3
Leuprolide
Lupron Depot
Ethinyl estradiol / levonorgestrel
22
3
Tamsulosin
Silodosin
Flomax
47
3
Azithromycin
Clarithromycin
Levofloxacin
18
3
Testosterone
Axiron
Testim
30
3
Brimonidine
Mirvaso
Ivermectin
38
3
Levetiracetam
Keppra
Lacosamide
12
3
Metronidazole
Clindamycin
Tinidazole
56
3
Symbicort
Fluticasone / salmeterol
Montelukast
25
3
Modafinil
Armodafinil
Nuvigil
43
3
Acetaminophen / butalbital / caffeine
Fioricet
Acetaminophen / dichloralphenazone / is

{'Birth Control': ['Etonogestrel',
  'Ethinyl estradiol / norethindrone',
  'Levonorgestrel'],
 'Depression': ['Bupropion', 'Sertraline', 'Venlafaxine'],
 'Pain': ['Tramadol', 'Acetaminophen / hydrocodone', 'Oxycodone'],
 'Anxiety': ['Escitalopram', 'Alprazolam', 'Buspirone'],
 'Acne': ['Isotretinoin', 'Adapalene / benzoyl peroxide', 'Doxycycline'],
 'Bipolar Disorde': ['Lamotrigine', 'Quetiapine', 'Lamictal'],
 'Insomnia': ['Zolpidem', 'Trazodone', 'Ambien'],
 'Weight Loss': ['Phentermine', 'Lorcaserin', 'Belviq'],
 'Obesity': ['Bupropion / naltrexone', 'Contrave', 'Liraglutide'],
 'ADHD': ['Lisdexamfetamine', 'Vyvanse', 'Methylphenidate'],
 'Diabetes, Type 2': ['Liraglutide', 'Victoza', 'Dulaglutide'],
 'Emergency Contraception': ['Levonorgestrel', 'Plan B', 'Plan B One-Step'],
 'High Blood Pressure': ['Lisinopril', 'Losartan', 'Amlodipine'],
 'Vaginal Yeast Infection': ['Miconazole', 'Tioconazole', 'Fluconazole'],
 'Abnormal Uterine Bleeding': ['Medroxyprogesterone',
  'Depo-Provera
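The same mapping from each condition to its most-reviewed drugs can be sketched without the index loop or the debug prints. The `top_drugs` helper and the toy frame are illustrative assumptions. Note that the ranking is by number of reviews, not by rating.

```python
import pandas as pd

# Toy frame standing in for `data`: 10 Pain reviews and 3 ADHD reviews.
toy = pd.DataFrame({
    "condition": ["Pain"] * 10 + ["ADHD"] * 3,
    "drugName": (
        ["Tramadol"] * 4 + ["Oxycodone"] * 3 + ["Naproxen"] * 2 + ["Ibuprofen"]
        + ["Vyvanse"] * 2 + ["Guanfacine"]
    ),
})

def top_drugs(df, k=3):
    """Map each condition to its k most-reviewed drug names (fewer if not available)."""
    return {
        cond: group["drugName"].value_counts().head(k).index.tolist()
        for cond, group in df.groupby("condition", sort=False)
    }

cond_drug = top_drugs(toy)
print(cond_drug)
```

Because brand and generic names are counted separately in this dataset (for example 'Lamotrigine' and 'Lamictal'), the top three can contain the same drug under two names.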

In [17]:
cond_drug

{'Birth Control': ['Etonogestrel',
  'Ethinyl estradiol / norethindrone',
  'Levonorgestrel'],
 'Depression': ['Bupropion', 'Sertraline', 'Venlafaxine'],
 'Pain': ['Tramadol', 'Acetaminophen / hydrocodone', 'Oxycodone'],
 'Anxiety': ['Escitalopram', 'Alprazolam', 'Buspirone'],
 'Acne': ['Isotretinoin', 'Adapalene / benzoyl peroxide', 'Doxycycline'],
 'Bipolar Disorde': ['Lamotrigine', 'Quetiapine', 'Lamictal'],
 'Insomnia': ['Zolpidem', 'Trazodone', 'Ambien'],
 'Weight Loss': ['Phentermine', 'Lorcaserin', 'Belviq'],
 'Obesity': ['Bupropion / naltrexone', 'Contrave', 'Liraglutide'],
 'ADHD': ['Lisdexamfetamine', 'Vyvanse', 'Methylphenidate'],
 'Diabetes, Type 2': ['Liraglutide', 'Victoza', 'Dulaglutide'],
 'Emergency Contraception': ['Levonorgestrel', 'Plan B', 'Plan B One-Step'],
 'High Blood Pressure': ['Lisinopril', 'Losartan', 'Amlodipine'],
 'Vaginal Yeast Infection': ['Miconazole', 'Tioconazole', 'Fluconazole'],
 'Abnormal Uterine Bleeding': ['Medroxyprogesterone',
  'Depo-Provera

In [18]:
data

Unnamed: 0,uniqueID,drugName,condition,review,rating,date,usefulCount
0,206461,Valsartan,Left Ventricular Dysfunction,"""It has no side effect, I take it in combinati...",9,20-May-12,27
1,95260,Guanfacine,ADHD,"""My son is halfway through his fourth week of ...",8,27-Apr-10,192
2,92703,Lybrel,Birth Control,"""I used to take another oral contraceptive, wh...",5,14-Dec-09,17
3,138000,Ortho Evra,Birth Control,"""This is my first time using any form of birth...",8,3-Nov-15,10
4,35696,Buprenorphine / naloxone,Opiate Dependence,"""Suboxone has completely turned my life around...",9,27-Nov-16,37
...,...,...,...,...,...,...,...
215058,159999,Tamoxifen,"Breast Cancer, Prevention","""I have taken Tamoxifen for 5 years. Side effe...",10,13-Sep-14,43
215059,140714,Escitalopram,Anxiety,"""I&#039;ve been taking Lexapro (escitaploprgra...",9,8-Oct-16,11
215060,130945,Levonorgestrel,Birth Control,"""I&#039;m married, 34 years old and I have no ...",8,15-Nov-10,7
215061,47656,Tapentadol,Pain,"""I was prescribed Nucynta for severe neck/shou...",1,28-Nov-11,20


In [19]:
data.to_csv('../csv/merged_data.csv',index=False)
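`to_csv` does not create missing directories, so the save above fails if `../csv/` does not exist yet. A minimal sketch that creates the parent folder first is shown below. The `save_csv` helper is hypothetical, and a temporary directory stands in for the notebook's path.

```python
from pathlib import Path
import tempfile

import pandas as pd

def save_csv(df, path):
    """Create the parent directory if needed, then write the frame without its index."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    df.to_csv(path, index=False)
    return path

toy = pd.DataFrame({"drugName": ["Valsartan"], "rating": [9]})
# A temporary directory stands in for the notebook's ../csv/ folder.
out_path = save_csv(toy, Path(tempfile.mkdtemp()) / "csv" / "merged_data.csv")
reloaded = pd.read_csv(out_path)
print(reloaded)
```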