<a href="https://colab.research.google.com/github/candanbalcii/health-condition-prediction/blob/main/HealthConditionPrediction.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [17]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [18]:
import pandas as pd

df = pd.read_csv('/content/drive/My Drive/drugLibTrain_raw.tsv', sep='\t')

df.head()


Unnamed: 0.1,Unnamed: 0,urlDrugName,rating,effectiveness,sideEffects,condition,benefitsReview,sideEffectsReview,commentsReview
0,2202,enalapril,4,Highly Effective,Mild Side Effects,management of congestive heart failure,slowed the progression of left ventricular dys...,"cough, hypotension , proteinuria, impotence , ...","monitor blood pressure , weight and asses for ..."
1,3117,ortho-tri-cyclen,1,Highly Effective,Severe Side Effects,birth prevention,Although this type of birth control has more c...,"Heavy Cycle, Cramps, Hot Flashes, Fatigue, Lon...","I Hate This Birth Control, I Would Not Suggest..."
2,1146,ponstel,10,Highly Effective,No Side Effects,menstrual cramps,I was used to having cramps so badly that they...,Heavier bleeding and clotting than normal.,I took 2 pills at the onset of my menstrual cr...
3,3947,prilosec,3,Marginally Effective,Mild Side Effects,acid reflux,The acid reflux went away for a few months aft...,"Constipation, dry mouth and some mild dizzines...",I was given Prilosec prescription at a dose of...
4,1951,lyrica,2,Marginally Effective,Severe Side Effects,fibromyalgia,I think that the Lyrica was starting to help w...,I felt extremely drugged and dopey. Could not...,See above


In [19]:
df.condition.value_counts()

condition,count
depression,236
acne,165
anxiety,63
insomnia,54
birth control,49
...,...
"panic attacks, depression",1
extrinsic aging,1
all over and various type pain,1
"excessive coughing, later diagnosed as pneumonia",1


In [20]:
# Keep only the four most frequent conditions (copy to avoid SettingWithCopyWarning later)
df_train = df[df['condition'].isin(['depression', 'acne', 'anxiety', 'insomnia'])].copy()


In [21]:
# Drop everything except the label ('condition') and the free-text 'commentsReview'
df_train = df_train.drop(columns=['Unnamed: 0', 'urlDrugName', 'rating', 'effectiveness', 'sideEffects', 'benefitsReview', 'sideEffectsReview'])
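
The filter above hard-codes the four most frequent conditions from the `value_counts()` output. As a minimal sketch, the same selection could be derived from the counts instead. The toy DataFrame, the `TOP_N` cutoff, and the `toy_*` names below are assumptions for illustration, not part of the original notebook.

```python
import pandas as pd

# Hypothetical toy data standing in for the real DataFrame
toy = pd.DataFrame({
    "condition": ["depression"] * 4 + ["acne"] * 3 + ["anxiety"] * 2
                 + ["insomnia"] * 2 + ["migraine"],
    "commentsReview": ["some review text"] * 12,
})

TOP_N = 4  # assumed cutoff: keep the N most frequent conditions

# Index of the N most frequent condition labels
top_conditions = toy["condition"].value_counts().nlargest(TOP_N).index

# Keep matching rows and only the two columns used for modelling
toy_train = toy.loc[toy["condition"].isin(top_conditions),
                    ["condition", "commentsReview"]].copy()

print(sorted(toy_train["condition"].unique()))
```

Deriving the labels this way keeps the filter in step with the data if the file changes, at the cost of making the class list implicit.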


In [22]:
!pip install --upgrade nltk




DATA PREPROCESSING

In [23]:
import nltk
from nltk.corpus import stopwords

# Download the NLTK stopword list (a no-op if it is already present)
nltk.download('stopwords')
stop = stopwords.words('english')

[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!


In [25]:
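import re
from nltk.corpus import stopwords

# A set makes the per-word membership test O(1)
stop = set(stopwords.words('english'))

def clean_text(text):
    """Lowercase, strip digits and punctuation, and drop English stopwords."""
    if not isinstance(text, str):
        # Missing values (NaN) become empty strings
        return ""
    text = text.lower()
    text = re.sub(r'\d+', '', text)      # remove digits
    text = re.sub(r'[^\w\s]', '', text)  # remove punctuation (underscores are kept)
    return ' '.join(word for word in text.split() if word not in stop)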

df_train['commentsReview'] = df_train['commentsReview'].apply(clean_text)

print(df_train.head())


     condition                                     commentsReview
7   depression                         one day taken hour bedtime
12  depression  prescribes treatment depression seasonable dis...
15     anxiety  took medication years hard get drug liked way ...
22        acne  confess im confused section real details assoc...
29        acne  take medication twice day combination good ski...
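
To make each step of `clean_text` concrete, the sketch below applies the same two regular expressions to a made-up sentence, one step at a time. The sentence and the tiny `toy_stop` set are assumptions; `toy_stop` stands in for NLTK's much longer English list so the example runs without a download.

```python
import re

sample = "I took 2 pills at 8pm; it didn't help!"  # made-up review text
toy_stop = {"i", "at", "it"}  # assumption: tiny stand-in for stopwords.words('english')

lowered = sample.lower()
no_digits = re.sub(r'\d+', '', lowered)       # "2" and the "8" in "8pm" disappear
no_punct = re.sub(r'[^\w\s]', '', no_digits)  # ";", "'" and "!" disappear
cleaned = ' '.join(w for w in no_punct.split() if w not in toy_stop)

print(cleaned)  # took pills pm didnt help
```

Note that stripping digits and apostrophes leaves fragments such as `pm` and `didnt`, which then become vocabulary terms in their own right.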


In [26]:
from sklearn.feature_extraction.text import TfidfVectorizer

vectorizer = TfidfVectorizer(stop_words='english')
X = vectorizer.fit_transform(df_train['commentsReview'])

print(X.shape)


(518, 2642)


In [27]:
y = df_train['condition']

print(y.value_counts())


condition
depression    236
acne          165
anxiety        63
insomnia       54
Name: count, dtype: int64


In [28]:
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

lr_model = LogisticRegression(class_weight='balanced')
lr_model.fit(X_train, y_train)

y_pred_lr = lr_model.predict(X_test)
print("Logistic Regression Classification Report:")
print(classification_report(y_test, y_pred_lr))

rf_model = RandomForestClassifier(class_weight='balanced')
rf_model.fit(X_train, y_train)

y_pred_rf = rf_model.predict(X_test)
print("Random Forest Classification Report:")
print(classification_report(y_test, y_pred_rf))


Logistic Regression Classification Report:
              precision    recall  f1-score   support

        acne       0.90      0.88      0.89        50
     anxiety       0.44      0.47      0.46        17
  depression       0.81      0.83      0.82        66
    insomnia       0.90      0.83      0.86        23

    accuracy                           0.81       156
   macro avg       0.76      0.75      0.76       156
weighted avg       0.81      0.81      0.81       156

Random Forest Classification Report:
              precision    recall  f1-score   support

        acne       0.84      0.86      0.85        50
     anxiety       0.57      0.24      0.33        17
  depression       0.71      0.89      0.79        66
    insomnia       0.87      0.57      0.68        23

    accuracy                           0.76       156
   macro avg       0.75      0.64      0.67       156
weighted avg       0.76      0.76      0.75       156
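
Two details of the setup above are worth keeping in mind when reading these reports: the `TfidfVectorizer` was fitted on all 518 reviews before the split, so the test rows contributed to the vocabulary and IDF weights, and the split was not stratified, which matters with only 17 anxiety reviews in the test set. One possible alternative, sketched here on a made-up corpus, wraps the vectorizer and classifier in a pipeline and scores it with stratified cross-validation. The toy texts, `max_iter=1000`, the three folds, and the macro-F1 metric are assumptions, not the notebook's choices.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline

# Hypothetical toy corpus standing in for the cleaned reviews and labels
toy_texts = [
    "skin cleared up nicely", "fewer pimples on face",
    "skin dry but pimples gone", "face breakouts reduced",
    "cream helped skin breakouts", "pimples and redness faded",
    "fell asleep faster at night", "slept through the night",
    "sleep improved greatly", "still awake at night",
    "groggy morning but better sleep", "asleep within an hour",
]
toy_labels = ["acne"] * 6 + ["insomnia"] * 6

# The vectorizer is refitted inside each training fold, so held-out text
# never influences the vocabulary or the IDF weights.
pipe = make_pipeline(
    TfidfVectorizer(stop_words='english'),
    LogisticRegression(class_weight='balanced', max_iter=1000),
)

# Stratified folds keep the class proportions equal across folds
cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=42)
scores = cross_val_score(pipe, toy_texts, toy_labels, cv=cv, scoring='f1_macro')

print(scores.mean())
```

On the real data, the same pattern would take `df_train['commentsReview']` and `df_train['condition']` in place of the toy lists.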



In [29]:
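# Fill missing comments in the original DataFrame; df_train is unaffected because clean_text already mapped non-strings to ""
df['commentsReview'] = df['commentsReview'].fillna('')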


In [30]:
new_review = input("Please enter your review: ")

# Apply the same cleaning used for the training text before vectorizing
new_review_tfidf = vectorizer.transform([clean_text(new_review)])

prediction = lr_model.predict(new_review_tfidf)
# prediction = rf_model.predict(new_review_tfidf)

print(f"The predicted condition for the review is: {prediction[0]}")


Please enter your review: I feel like I have no energy anymore. Every day feels like a struggle just to get out of bed
The predicted condition for the review is: depression
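
A single predicted label does not show how confident the model is. As a rough sketch, `predict_proba` can list a probability for every class; the toy texts, labels, and `toy_*` names below are assumptions standing in for the fitted `vectorizer` and `lr_model`.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Hypothetical toy data standing in for the cleaned reviews and labels
toy_texts = [
    "skin cleared pimples gone",
    "face breakouts reduced",
    "slept through night",
    "fell asleep faster night",
]
toy_labels = ["acne", "acne", "insomnia", "insomnia"]

toy_vectorizer = TfidfVectorizer(stop_words='english')
toy_model = LogisticRegression(class_weight='balanced')
toy_model.fit(toy_vectorizer.fit_transform(toy_texts), toy_labels)

# One probability per class, in the order given by toy_model.classes_
probs = toy_model.predict_proba(
    toy_vectorizer.transform(["awake most of the night"])
)[0]

# Print the classes from most to least likely
for label, p in sorted(zip(toy_model.classes_, probs), key=lambda pair: -pair[1]):
    print(f"{label}: {p:.2f}")
```

With the notebook's objects, the equivalent call would be `lr_model.predict_proba(new_review_tfidf)`, read against `lr_model.classes_`.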
