# Deep Learning based Medical Diagnoser

## Introduction

The rapid growth of Artificial Intelligence has opened new possibilities in the healthcare domain, especially in assisting medical diagnosis. With the help of Deep Learning, it is now possible to analyze patient symptoms and generate meaningful insights within seconds. Deep Learning models can automatically learn complex patterns from large volumes of data, making them highly effective for decision-support systems in medicine.

This project focuses on developing an intelligent medical diagnosis system that predicts diseases and recommends appropriate medications based on a patient’s symptoms. Instead of relying on manual analysis, the system processes textual symptom descriptions and provides instant predictions using a trained deep learning model. Such AI-driven solutions can help improve efficiency, reduce diagnostic time, and support healthcare professionals in decision-making.

## Project Description

In this project, a Deep Learning–based medical diagnosis model is designed and implemented using TensorFlow. The model is trained on textual data representing patient symptoms and is capable of predicting both the disease and the corresponding medication as output.

Since patient symptoms are provided as text sequences, a Recurrent Neural Network (RNN) architecture is a natural fit for capturing the contextual relationships between words. This project uses a Long Short-Term Memory (LSTM) network. LSTMs are well known for retaining important information from earlier parts of a sentence, which makes them suitable for understanding symptom progression and severity.

For example, symptoms such as loss of appetite followed by fatigue and muscle weakness carry more meaning when their order is preserved. The LSTM layer effectively learns these sequential dependencies to improve prediction accuracy.

The dataset used for training includes three key components:
* Patient Symptoms: Text-based descriptions of medical issues
* Diseases: Diagnosed medical conditions
* Medications: Prescribed treatments for each condition

The model architecture consists of an embedding layer for text representation, followed by an LSTM layer for sequence learning. The learned features are then passed to two separate dense output layers—one for disease classification and another for medication recommendation. This multi-output approach allows the system to provide complete diagnostic assistance in a single prediction.

Overall, this project demonstrates the practical application of Deep Learning and Natural Language Processing in healthcare, showcasing how AI can be leveraged to build efficient and intelligent medical support systems.

## Implementation: Medical Diagnosis with LSTM

Building a deep learning model for medical diagnosis requires a large dataset of labeled medical data. The dataset used in this tutorial includes:

* Patient Symptoms: Textual descriptions of the patient's symptoms.
* Diagnoses: The confirmed diseases for each patient.
* Medications: The prescribed medications for each patient's condition.

## Importing Libraries

In [19]:
import pandas as pd
import numpy as np
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.utils import to_categorical
from sklearn.preprocessing import LabelEncoder
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Input, Embedding, LSTM, Dense


## Loading the Dataset

In [20]:
data = pd.read_csv('medical_data.csv')
data.head()


| Unnamed: 0 | Patient_Problem | Disease | Prescription |
|---|---|---|---|
| 0 | Constant fatigue and muscle weakness, struggli... | Chronic Fatigue Syndrome | Cognitive behavioral therapy, graded exercise ... |
| 1 | Frequent severe migraines, sensitivity to ligh... | Migraine with Aura | Prescription triptans, avoid triggers like bri... |
| 2 | Sudden weight gain and feeling cold, especiall... | Hypothyroidism | Levothyroxine to regulate thyroid hormone levels. |
| 3 | High fever, sore throat, and swollen lymph nod... | Mononucleosis | Rest and hydration, ibuprofen for pain. |
| 4 | Excessive thirst and frequent urination, dry m... | Diabetes Mellitus | Insulin therapy and lifestyle changes. |


## Data Preprocessing and Preparation

Before using medical data in a deep learning model, it needs to be preprocessed to ensure the model can understand it. Preprocessing steps often include:

* **Text Tokenization**: Converting textual data into sequences of numbers that the model can process.
* **Padding Sequences**: Making all sequences the same length by adding padding characters at the beginning or end of shorter sequences.
* **Label Encoding**: Converting categorical variables, such as disease names and medication names, into numerical labels.

### Tokenizing and Sequencing Text Data

A 'tokenizer' object is created to convert the textual data into sequences of integers. It keeps only the 5,000 most frequent words in the dataset to limit the vocabulary size. Any word outside that vocabulary, including words that appear only in new patient input, is replaced with the `<OOV>` token.

In [21]:
tokenizer = Tokenizer(num_words=5000, oov_token="<OOV>")
tokenizer.fit_on_texts(data['Patient_Problem'])

sequences = tokenizer.texts_to_sequences(data['Patient_Problem'])
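
To make the mapping concrete, here is a simplified pure-Python sketch of what this step does conceptually. It is not the Keras implementation: the helper names `fit_word_index` and `to_sequences` and the toy sentences are illustrative assumptions, and the word splitting is cruder than the Keras default. It does follow the Keras convention of reserving index 1 for the OOV token and giving more frequent words smaller indices.

```python
import re
from collections import Counter

def split_words(text):
    # Assumption: lowercase and keep letters, digits, and apostrophes only.
    return re.findall(r"[a-z0-9']+", text.lower())

def fit_word_index(texts, oov_token="<OOV>"):
    """Build a word -> integer mapping; index 1 is reserved for the OOV token."""
    counts = Counter(word for text in texts for word in split_words(text))
    word_index = {oov_token: 1}
    # More frequent words receive smaller indices, starting at 2.
    for rank, (word, _) in enumerate(counts.most_common(), start=2):
        word_index[word] = rank
    return word_index

def to_sequences(texts, word_index, num_words, oov_token="<OOV>"):
    """Replace each word by its index, or by the OOV index if it is unknown or too rare."""
    oov = word_index[oov_token]
    sequences = []
    for text in texts:
        seq = []
        for word in split_words(text):
            idx = word_index.get(word)
            seq.append(idx if idx is not None and idx < num_words else oov)
        sequences.append(seq)
    return sequences

word_index = fit_word_index(["fever and cough", "fever and headache"])
print(to_sequences(["fever and rash"], word_index, num_words=5000))
# [[2, 3, 1]]  ("rash" was never seen, so it maps to the OOV index)
```

Lowering `num_words` in this sketch pushes the rarer words onto the OOV index as well, which is the trade-off the 5,000-word limit makes.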


### Padding Sequences

To give all input sequences the same length, the code finds the longest sequence and pads every shorter one with zeros at the end ('post' padding) to match that length.

In [22]:
max_length = max(len(x) for x in sequences)
padded_sequences = pad_sequences(sequences, maxlen=max_length, padding='post') 
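
The effect of 'post' padding can be sketched in plain Python. The helper `pad_post` below is a hypothetical stand-in written for illustration, not the Keras function. It also mirrors one Keras default that matters at prediction time: a sequence longer than `maxlen` is cut from the front (`truncating='pre'`).

```python
def pad_post(sequences, maxlen=None, value=0):
    """Pad each sequence at the end with `value` until it has `maxlen` items."""
    if maxlen is None:
        maxlen = max(len(seq) for seq in sequences)
    padded = []
    for seq in sequences:
        seq = list(seq)
        if len(seq) > maxlen:
            # Keep the last `maxlen` tokens, like Keras's default truncating='pre'.
            seq = seq[len(seq) - maxlen:]
        padded.append(seq + [value] * (maxlen - len(seq)))
    return padded

print(pad_post([[5, 3], [7, 8, 9, 2], [4]]))
# [[5, 3, 0, 0], [7, 8, 9, 2], [4, 0, 0, 0]]
print(pad_post([[1, 2, 3, 4, 5]], maxlen=3))
# [[3, 4, 5]]
```

One consequence is that a new symptom description longer than 'max_length' would lose its leading words unless a different truncation setting is chosen.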


### Encoding the Labels and Converting them to Categorical

We will encode the 'Disease' and 'Prescription' columns as integers. Then the integer-encoded labels are converted into binary class matrices.

In [23]:
# Encoding the labels
label_encoder_disease = LabelEncoder()
label_encoder_prescription = LabelEncoder()

disease_labels = label_encoder_disease.fit_transform(data['Disease'])
prescription_labels = label_encoder_prescription.fit_transform(data['Prescription'])

# Converting labels to categorical
disease_labels_categorical = to_categorical(disease_labels)
prescription_labels_categorical = to_categorical(prescription_labels)
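
The two-stage encoding can be sketched without any libraries. The helpers `encode_labels` and `one_hot` are hypothetical names used only for illustration. Like 'LabelEncoder', the sketch assigns integers in sorted class order; unlike 'to_categorical', it returns plain integer lists instead of a float array.

```python
def encode_labels(labels):
    """Return the sorted class names and the integer index of each label."""
    classes = sorted(set(labels))
    index_of = {name: i for i, name in enumerate(classes)}
    return classes, [index_of[label] for label in labels]

def one_hot(indices, num_classes):
    """Turn each integer label into a row with a single 1 at that position."""
    return [[1 if col == idx else 0 for col in range(num_classes)] for idx in indices]

classes, indices = encode_labels(["Migraine", "Asthma", "Migraine"])
print(classes)                         # ['Asthma', 'Migraine']
print(indices)                         # [1, 0, 1]
print(one_hot(indices, len(classes)))  # [[0, 1], [1, 0], [0, 1]]
```

Because every distinct string becomes its own class, two prescriptions that differ only in wording or punctuation would be treated as unrelated labels.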


### Combining Labels into a Multi-label Target Variable

Finally, we stack the two binary class matrices side by side to form a single target variable 'Y', with the disease columns first and the prescription columns after them. Note that the training step below does not use 'Y'; it passes the two label matrices to the model separately, one per output layer.

In [24]:
Y = np.hstack((disease_labels_categorical, prescription_labels_categorical))
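
A small pure-Python sketch of the stacking, and of how the combined rows can be split back into their disease and prescription parts. The helper names and the toy rows are assumptions for illustration; the notebook itself uses 'np.hstack'.

```python
def stack_targets(disease_rows, prescription_rows):
    """Concatenate each disease row with the matching prescription row."""
    return [d + p for d, p in zip(disease_rows, prescription_rows)]

def split_targets(stacked_rows, n_disease):
    """Undo the stacking: the first n_disease columns belong to the disease."""
    diseases = [row[:n_disease] for row in stacked_rows]
    prescriptions = [row[n_disease:] for row in stacked_rows]
    return diseases, prescriptions

disease_rows = [[0, 1], [1, 0]]             # two disease classes
prescription_rows = [[0, 0, 1], [1, 0, 0]]  # three prescription classes
stacked = stack_targets(disease_rows, prescription_rows)
print(stacked)  # [[0, 1, 0, 0, 1], [1, 0, 1, 0, 0]]
```

Each stacked row contains two 1s, one per block, so a single softmax over the whole row would not be appropriate; splitting by column count recovers the two separate targets.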


## Model Building - LSTM Model

Now we will build the model with an LSTM layer, using the Keras functional API from TensorFlow. This model will learn from our preprocessed dataset to predict both diseases and prescriptions based on patient symptoms.

### Defining Model Architecture

We will use the 'Model' and 'Input' to define the model architecture, and 'Embedding' to convert the integer sequences into dense vectors of fixed size. We will use 'Dense' for output layers that make predictions.

In [25]:
input_layer = Input(shape=(max_length,))

embedding = Embedding(input_dim=5000, output_dim=64)(input_layer)
lstm_layer = LSTM(64)(embedding)

disease_output = Dense(len(label_encoder_disease.classes_), activation='softmax',
                       name='disease_output')(lstm_layer)

prescription_output = Dense(len(label_encoder_prescription.classes_),
                            activation='softmax', name='prescription_output')(lstm_layer)


The model starts with an input layer that accepts sequences of length 'max_length'. An embedding layer then turns each integer token into a 64-dimensional vector. After that, an LSTM layer reads these vectors in order, and finally two dense layers with softmax activations predict the disease and the prescription.
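
As a rough check on the size of each layer, the number of trainable parameters can be worked out by hand and compared with the output of 'model.summary()'. The functions below are illustrative helpers, not Keras calls, and they assume the default Keras settings (biases enabled). The class count of 10 in the last line is a made-up placeholder, since the real number depends on the dataset.

```python
def embedding_params(vocab_size, embed_dim):
    # One vector of length embed_dim per vocabulary entry.
    return vocab_size * embed_dim

def lstm_params(input_dim, units):
    # Four gates, each with input weights, recurrent weights, and a bias.
    return 4 * (units * (input_dim + units) + units)

def dense_params(input_dim, n_classes):
    # One weight per input per class, plus one bias per class.
    return input_dim * n_classes + n_classes

print(embedding_params(5000, 64))  # 320000
print(lstm_params(64, 64))         # 33024
print(dense_params(64, 10))        # 650 (hypothetical 10 classes)
```

With these settings the embedding table holds most of the parameters, so the vocabulary size has a large effect on the total model size.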

### Compiling the model

In [26]:
model = Model(inputs=input_layer, outputs=[disease_output, prescription_output])

model.compile(
    loss={'disease_output': 'categorical_crossentropy',
    'prescription_output': 'categorical_crossentropy'},
    optimizer='adam',
    metrics={'disease_output': ['accuracy'], 'prescription_output': ['accuracy']}
)

model.summary()
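
For reference, categorical cross-entropy for one example is the negative log of the probability the model assigns to the true class. The sketch below computes it in plain Python for each output and adds the two values, which is how the combined 'loss' is formed when no loss weights are given. The function name, the clipping constant, and the probabilities are assumptions chosen for illustration.

```python
import math

def categorical_crossentropy(one_hot_target, predicted_probs, eps=1e-7):
    """Negative log-probability of the true class for a single example."""
    return -sum(t * math.log(max(p, eps))
                for t, p in zip(one_hot_target, predicted_probs))

# Made-up predictions for one patient: three diseases, two prescriptions.
disease_loss = categorical_crossentropy([0, 1, 0], [0.2, 0.7, 0.1])
prescription_loss = categorical_crossentropy([1, 0], [0.5, 0.5])

# With equal weights, the total loss is the sum of the per-output losses.
total_loss = disease_loss + prescription_loss
print(round(disease_loss, 4), round(prescription_loss, 4), round(total_loss, 4))
# 0.3567 0.6931 1.0498
```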


### Training the model

In [27]:
model.fit(
    padded_sequences,
    {
        'disease_output': disease_labels_categorical,
        'prescription_output': prescription_labels_categorical
    },
    epochs=100,
    batch_size=32,
)
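
The call above trains on every row, so the accuracies it reports are measured on the training data itself. One possible refinement, not part of the original notebook, is to hold out some rows for validation: 'model.fit' accepts a 'validation_split' argument that reserves the last fraction of the samples, taken before shuffling. The pure-Python sketch below shows that slicing on plain indices; `holdout_indices` is a hypothetical helper, not a Keras function.

```python
def holdout_indices(n_samples, fraction):
    """Split row indices so the last `fraction` of rows is held out."""
    n_val = int(n_samples * fraction)
    split_at = n_samples - n_val
    train_idx = list(range(split_at))
    val_idx = list(range(split_at, n_samples))
    return train_idx, val_idx

train_idx, val_idx = holdout_indices(10, 0.2)
print(train_idx)  # [0, 1, 2, 3, 4, 5, 6, 7]
print(val_idx)    # [8, 9]
```

Because the held-out rows are simply the last ones, it may be worth shuffling the dataset first if the file is ordered by disease.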


Epoch 1/100
13/13 ━━━━━━━━━━━━━━━━━━━━ 5s 15ms/step - disease_output_accuracy: 0.0147 - disease_output_loss: 5.1774 - loss: 11.1441 - prescription_output_accuracy: 0.0000e+00 - prescription_output_loss: 5.9666
Epoch 2/100
13/13 ━━━━━━━━━━━━━━━━━━━━ 0s 17ms/step - disease_output_accuracy: 0.0319 - disease_output_loss: 5.1511 - loss: 11.1157 - prescription_output_accuracy: 0.0025 - prescription_output_loss: 5.9645
Epoch 3/100
13/13 ━━━━━━━━━━━━━━━━━━━━ 0s 16ms/step - disease_output_accuracy: 0.0319 - disease_output_loss: 5.0521 - loss: 11.0353 - prescription_output_accuracy: 0.0025 - prescription_output_loss: 5.9842
Epoch 4/100
13/13 ━━━━━━━━━━━━━━━━━━━━ 0s 15ms/step - disease_output_accuracy: 0.0319 - disease_output_loss: 4.9375 - loss: 10.9223 - prescription_output_accuracy: 0.0147 - prescription_output_loss: 5.9841
Epoch 5/100
13/13

<keras.src.callbacks.history.History at 0x2444d0342d0>
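
The first epoch of the log can be read with a little arithmetic. The combined 'loss' is the sum of the two per-output losses. In addition, a model that guesses uniformly over K classes has a cross-entropy of ln(K), so exponentiating the initial loss gives a rough estimate of how many classes each output has. This is an inference from the logged numbers, not a count taken from the dataset.

```python
import math

# Values copied from the "Epoch 1/100" line of the training log.
disease_loss = 5.1774
prescription_loss = 5.9666

total = disease_loss + prescription_loss
print(round(total, 4))  # 11.144, matching the logged loss of 11.1441 up to rounding

# exp(loss) approximates the class count for a near-uniform initial guess.
approx_disease_classes = math.exp(disease_loss)
approx_prescription_classes = math.exp(prescription_loss)
print(round(approx_disease_classes), round(approx_prescription_classes))
```

The estimates come out at roughly 177 disease classes and 390 prescription classes. With 13 batches of 32 rows per epoch, the dataset has at most 416 rows, which would leave very few examples per prescription if the estimate is close.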

### Making Predictions

The model is used to make predictions for new patients:

1. Preprocess the patient's symptoms with the same tokenization and padding used for training.
2. Feed the preprocessed data into the trained model.
3. Let the model predict the disease and medication based on the patient's symptoms.
4. Decode the predicted disease and medication and present them.

In [28]:
def make_prediction(patient_problem):
    # Preprocessing the input
    sequence = tokenizer.texts_to_sequences([patient_problem])
    padded_sequence = pad_sequences(sequence, maxlen=max_length, padding='post')

    # Making prediction
    prediction = model.predict(padded_sequence)

    # Decoding the prediction
    disease_index = np.argmax(prediction[0], axis=1)[0]
    prescription_index = np.argmax(prediction[1], axis=1)[0]

    disease_predicted = label_encoder_disease.inverse_transform([disease_index])[0]
    prescription_predicted = label_encoder_prescription.inverse_transform([prescription_index])[0]

    print(f"Predicted Disease: {disease_predicted}")
    print(f"Suggested Prescription: {prescription_predicted}")
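
The function above reports only the single most likely class for each output. Since each output is a softmax distribution, it can also be useful to look at the runner-up classes and their probabilities. The helper `top_k` below is a hypothetical addition, and the probabilities in the example are made up; in the notebook they would come from 'prediction[0][0]' and 'label_encoder_disease.classes_'.

```python
def top_k(probabilities, class_names, k=3):
    """Return the k most probable classes as (name, probability) pairs."""
    ranked = sorted(zip(probabilities, class_names),
                    key=lambda pair: pair[0], reverse=True)
    return [(name, prob) for prob, name in ranked[:k]]

# Made-up softmax output over three of the disease names seen in this notebook.
names = ["Depression", "Hypothyroidism", "Psoriasis"]
probs = [0.2, 0.7, 0.1]
print(top_k(probs, names, k=2))
# [('Hypothyroidism', 0.7), ('Depression', 0.2)]
```

A low top probability, or two classes with similar scores, is a sign that the prediction should be treated with caution.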


## Conclusion

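Adding deep learning to medical diagnostics can make a real difference in healthcare. In this project, the model is trained on the preprocessed dataset, iteratively adjusting its internal parameters to improve its accuracy in predicting diseases and medications based on patient symptoms. The sample predictions below show how it behaves on new inputs. Because the disease and the prescription come from two separate output layers, the suggested prescription does not always correspond to the predicted disease.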

In [29]:
patient_input = "I've experienced a loss of appetite and don't enjoy food anymore."
make_prediction(patient_input)


1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 373ms/step
Predicted Disease: Depression
Suggested Prescription: Antidepressants; eating nutrient-rich foods.


In [30]:
patient_input = "I am experiencing stomach pain, bloating, and nausea after eating."
make_prediction(patient_input)


1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 64ms/step
Predicted Disease: Subarachnoid Hemorrhage
Suggested Prescription: Antidepressants, psychotherapy.


In [31]:
patient_input = "I have continuous joint pain and stiffness, especially in the morning."
make_prediction(patient_input)


1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 61ms/step
Predicted Disease: Peripheral Neuropathy
Suggested Prescription: Blood sugar control; pain relievers; capsaicin cream.


In [32]:
patient_input = "I have difficulty breathing, wheezing, and chest tightness."
make_prediction(patient_input)


1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 61ms/step
Predicted Disease: Bronchial Asthma
Suggested Prescription: Low-salt diet; medication to reduce urine output.


In [None]:
patient_input = "I have itchy red rashes on my skin that have been spreading."
make_prediction(patient_input)


1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 66ms/step
Predicted Disease: Psoriasis
Suggested Prescription: Topical treatments and light therapy.
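
Some of the outputs above pair a disease with a prescription that belongs to a different condition, which can happen because the two outputs are predicted independently. One possible sanity check, sketched here as an assumption and not part of the original notebook, is to record which prescriptions accompany each disease in the training data and flag any predicted pair that never occurs there. The helper names are hypothetical; the three example rows are taken from the dataset preview shown earlier.

```python
from collections import defaultdict

def build_prescription_lookup(diseases, prescriptions):
    """Map each disease to the set of prescriptions paired with it in the data."""
    lookup = defaultdict(set)
    for disease, prescription in zip(diseases, prescriptions):
        lookup[disease].add(prescription)
    return lookup

def is_consistent(lookup, disease, prescription):
    """True if this disease/prescription pair appears in the training data."""
    return prescription in lookup.get(disease, set())

diseases = ["Hypothyroidism", "Mononucleosis", "Diabetes Mellitus"]
prescriptions = [
    "Levothyroxine to regulate thyroid hormone levels.",
    "Rest and hydration, ibuprofen for pain.",
    "Insulin therapy and lifestyle changes.",
]
lookup = build_prescription_lookup(diseases, prescriptions)

print(is_consistent(lookup, "Hypothyroidism",
                    "Levothyroxine to regulate thyroid hormone levels."))  # True
print(is_consistent(lookup, "Hypothyroidism",
                    "Insulin therapy and lifestyle changes."))             # False
```

In the notebook, the lookup could be built from 'data["Disease"]' and 'data["Prescription"]' and applied to the two values decoded inside 'make_prediction'.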
