<h3>Named Entity Recognition (NER) for Disease and Treatment Mapping</h3>
<h4>Maps the diseases and treatments mentioned in user-given text to the actual diseases and treatments in the dataset, using exact and fuzzy string matching</h4>

<h3>Unique treatments are extracted from the dataset and saved as CSV</h3>

In [1]:
import pandas as pd

# Load the dataset
df = pd.read_csv('symptom_precaution.csv')

# Precaution columns to collect treatments from
precaution_cols = ['Precaution_1', 'Precaution_2', 'Precaution_3', 'Precaution_4']

# Add every non-empty precaution to a set so each treatment is kept only once
# (dropna() skips the empty cells that pandas reads as NaN, which '$'.join() cannot handle)
unique_precautions = set()
for col in precaution_cols:
    unique_precautions.update(df[col].dropna().astype(str).str.strip())
unique_precautions.discard('')

# Convert the unique precautions to a DataFrame (sorted so the file is deterministic)
unique_precautions_df = pd.DataFrame(sorted(unique_precautions), columns=['Unique_Treatment'])

# Save the unique treatments to a new CSV file
unique_precautions_df.to_csv('unique_treatments.csv', index=False)

print("Unique treatments saved to 'unique_treatments.csv' successfully!")


Unique treatments saved to 'unique_treatments.csv' successfully!
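The cell above builds the set of unique precautions with pandas. The same logic can be sketched with only the standard `csv` module, which makes the blank-cell case explicit: a row with fewer than four precautions leaves empty cells (read by pandas as `NaN`), and those must be skipped rather than joined. The two-row `SAMPLE` is hypothetical data, not taken from `symptom_precaution.csv`, and `unique_treatments` is an illustrative name.

```python
import csv
import io

# Hypothetical sample in the same shape as symptom_precaution.csv.
# Precaution_4 is empty in both rows, and one value has a stray leading space.
SAMPLE = """Disease,Precaution_1,Precaution_2,Precaution_3,Precaution_4
Disease A,rest,drink water,keep calm,
Disease B,keep calm, rest,consult doctor,
"""

PRECAUTION_COLS = ["Precaution_1", "Precaution_2", "Precaution_3", "Precaution_4"]


def unique_treatments(csv_text):
    """Return the sorted set of non-empty, whitespace-stripped precautions."""
    seen = set()
    for row in csv.DictReader(io.StringIO(csv_text)):
        for col in PRECAUTION_COLS:
            value = (row.get(col) or "").strip()
            if value:  # skip blank cells instead of joining them
                seen.add(value)
    return sorted(seen)


print(unique_treatments(SAMPLE))
# ['consult doctor', 'drink water', 'keep calm', 'rest']
```

Using a set collapses repeats such as "keep calm" across diseases, and sorting gives a stable file order between runs.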


<h3>Building and Testing NER</h3>
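In the next cell, `generate_ngrams` relies on scikit-learn's `CountVectorizer` to split the input into word bigrams, which the fuzzy matcher then compares against. The sketch below approximates that step with the standard library to show what those bigrams look like: tokens are lowercased runs of two or more word characters (`CountVectorizer`'s default `token_pattern`), stop words are dropped *before* pairs are formed, and the result is sorted and unique. The small `STOP_WORDS` set is a hypothetical subset; `stop_words='english'` uses scikit-learn's much larger built-in list. Unlike `CountVectorizer`, which raises `ValueError` when no bigram can be formed, this sketch returns an empty list.

```python
import re

# Hypothetical, tiny stop-word subset for illustration only.
STOP_WORDS = {"if", "the", "a", "an", "and", "or", "to", "of", "is", "it", "you"}


def word_bigrams(text, stop_words=STOP_WORDS):
    """Approximate CountVectorizer(ngram_range=(2, 2)) on a single document."""
    # Lowercased tokens of 2+ word characters (CountVectorizer's default pattern)
    tokens = re.findall(r"(?u)\b\w\w+\b", text.lower())
    # Stop words are removed before the bigrams are built
    tokens = [t for t in tokens if t not in stop_words]
    pairs = {f"{a} {b}" for a, b in zip(tokens, tokens[1:])}
    # get_feature_names_out() returns the vocabulary in sorted order
    return sorted(pairs)


print(word_bigrams("Perform CPR if needed."))
# ['cpr needed', 'perform cpr']
```

Note that removing "if" makes "cpr needed" a bigram even though those words are not adjacent in the input.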

In [33]:
import pandas as pd
from fuzzywuzzy import process
from sklearn.feature_extraction.text import CountVectorizer

# Step 1: Load the known diseases and treatments from CSV
df = pd.read_csv('disease.csv')  # Replace with the path to your CSV file
disease_list = df['Disease'].dropna().tolist()  # Extract diseases into a list

dff = pd.read_csv('unique_treatments.csv')  # File written by the previous cell
treatment_list = dff['Unique_Treatment'].dropna().tolist()  # Extract treatments into a list

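# Step 2: Function to generate word n-grams (bigrams by default)
def generate_ngrams(text, n=2):
    """Return the sorted, unique word n-grams of `text`, with English stop words removed."""
    vectorizer = CountVectorizer(ngram_range=(n, n), stop_words='english')
    try:
        vectorizer.fit_transform([text])
    except ValueError:
        # "empty vocabulary": the text has too few non-stop words to form an n-gram
        return []
    return list(vectorizer.get_feature_names_out())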


# Step 3: Functions to extract diseases and treatments using exact and fuzzy matching
def extract_disease_from_input(user_input):
    """Return the known diseases mentioned in `user_input` as a comma-separated string."""
    # Normalize input text
    user_input = user_input.lower().strip()

    # Generate bigrams from the user input
    user_input_bigrams = generate_ngrams(user_input)

    # List to store matched diseases
    matched_disease = []

    for disease in disease_list:
        # First check whether the disease name appears verbatim in the input
        if disease.lower() in user_input:
            matched_disease.append(disease)
        else:
            # Otherwise fuzzy-match against the bigrams to tolerate misspellings or slight variations
            match = process.extractOne(disease.lower(), user_input_bigrams)
            if match and match[1] >= 80:  # Score threshold (0-100); adjust as needed
                matched_disease.append(disease)

    # Remove duplicates (disease.csv may repeat a disease) while keeping first-seen order
    matched_disease = list(dict.fromkeys(matched_disease))

    # Return the matched diseases as a single string, joined by commas
    return ', '.join(matched_disease)

def extract_treatment_from_input(user_input):
    """Return the known treatments mentioned in `user_input` as a list."""
    # Normalize input text
    user_input = user_input.lower().strip()

    # Generate bigrams from the user input
    user_input_bigrams = generate_ngrams(user_input)

    # List to store matched treatments
    matched_treatments = []

    for treatment in treatment_list:
        # First check whether the treatment appears verbatim in the input
        if treatment.lower() in user_input:
            matched_treatments.append(treatment)
        else:
            # Otherwise fuzzy-match against the bigrams to tolerate misspellings or slight variations
            match = process.extractOne(treatment.lower(), user_input_bigrams)
            if match and match[1] >= 90:  # Stricter threshold than for diseases; adjust as needed
                matched_treatments.append(treatment)

    # Remove duplicates while keeping first-seen order
    matched_treatments = list(dict.fromkeys(matched_treatments))

    # Return the matched treatments as a list
    return matched_treatments


# Example usage:
user_input = "Hello there, I’m really sorry to hear that you're feeling unwell. After reviewing your symptoms, it seems you may be dealing with 'Heart Attack'. I understand this might be worrying, but please know that we can take steps together to help you feel better. 😊💪Please follow the following steps to feel better and work towards your recovery: 1. Call an ambulance immediately if you or someone else is experiencing severe symptoms or if the situation is life-threatening. Time is critical in emergencies, so don’t hesitate to call for professional help.2. Chew or swallow aspirin if advised by a healthcare professional and if you are experiencing chest pain, as it may help prevent further damage to the heart in cases of a heart attack. However, do this only if there are no contraindications for you.3. Keep calm and try to remain composed. Staying calm helps you think clearly and take the necessary actions.4. Perform CPR if needed and if you are trained to do so. Chest compressions can help maintain blood flow until medical professionals arrive.Remember, recovery takes time, but you're not alone in this journey. Stay positive! 🌟"  # Example user input
extracted_disease = extract_disease_from_input(user_input)
extracted_treatment = extract_treatment_from_input(user_input)

print("Extracted Disease:", extracted_disease)
print("Extracted Treatment:", extracted_treatment)



Extracted Disease: Heart attack
Extracted Treatment: ['keep calm', 'perform CPR if needed']
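Both extraction functions above apply the same two-stage rule: an exact substring check first, then a fuzzy comparison of the candidate against every input bigram, accepted when the best score reaches a threshold (80 for diseases, 90 for treatments). The following is a minimal standard-library sketch of that rule. It swaps fuzzywuzzy's scorer for `difflib.SequenceMatcher.ratio()` scaled to 0-100 (`process.extractOne` defaults to `fuzz.WRatio`, so actual scores will differ) and skips stop-word removal; `match_entities`, `similarity`, and `bigrams` are hypothetical names.

```python
import re
from difflib import SequenceMatcher


def bigrams(text):
    """Word bigrams of `text` (no stop-word removal in this sketch)."""
    tokens = re.findall(r"\b\w\w+\b", text.lower())
    return [f"{a} {b}" for a, b in zip(tokens, tokens[1:])]


def similarity(a, b):
    """0-100 score from difflib; a stand-in for fuzzywuzzy's WRatio, not equivalent."""
    return 100 * SequenceMatcher(None, a, b).ratio()


def match_entities(text, candidates, threshold=80):
    """Return candidates found in `text` by exact substring or fuzzy bigram match."""
    text = text.lower().strip()
    grams = bigrams(text)
    matched = []
    for cand in candidates:
        c = cand.lower()
        if c in text:
            # Stage 1: the candidate appears verbatim
            matched.append(cand)
        elif grams and max(similarity(c, g) for g in grams) >= threshold:
            # Stage 2: the best fuzzy score against any bigram clears the threshold
            matched.append(cand)
    # Drop duplicates while keeping first-seen order
    return list(dict.fromkeys(matched))


print(match_entities("I think it is a heart atack", ["Heart attack", "Migraine"]))
# ['Heart attack']
```

The misspelled "heart atack" fails the substring check but scores about 96 against "heart attack", so it is caught by the fuzzy stage.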

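Treatments such as "perform CPR if needed" span more than two words, so when one is misspelled in the input, the fuzzy stage can only compare it with two-word fragments and the match depends mostly on the exact substring check. A possible alternative, sketched below under the same difflib assumption (not fuzzywuzzy's scorer), is to compare each candidate against windows with the same number of words as the candidate. `best_window_score` is a hypothetical helper, not part of the notebook.

```python
import re
from difflib import SequenceMatcher


def best_window_score(phrase, text, n=None):
    """Best 0-100 difflib score between `phrase` and any n-word window of `text`.

    Hypothetical helper: n defaults to the phrase's own word count, whereas the
    notebook always compares against bigrams (n=2).
    """
    phrase = phrase.lower()
    words = re.findall(r"\w+", text.lower())
    n = n or len(phrase.split())
    windows = [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]
    if not windows:
        return 0.0
    return max(100 * SequenceMatcher(None, phrase, w).ratio() for w in windows)


text = "Please perfom CPR if needed right now"  # note the typo "perfom"
print(round(best_window_score("perform CPR if needed", text), 1))  # 97.6 (4-word windows)
print(best_window_score("perform CPR if needed", text, n=2) < 90)  # True (bigrams only)
```

With the typo, the four-word window still clears the treatment threshold of 90, while every bigram stays below it under this scorer. The trade-off is one window size per distinct candidate length instead of a single bigram pass.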