<h3>Named Entity Recognition (NER) for Symptom Mapping</h3>
<h4>Maps the symptoms a user describes in free text to the canonical symptom labels in the dataset</h4>

<h3>Unique symptoms are extracted from the dataset and saved as a CSV file</h3>

In [3]:
import pandas as pd

# Load the CSV file
df = pd.read_csv('dataset.csv')

# List of all symptom columns
symptom_columns = [f'Symptom_{i}' for i in range(1, 18)]  # Generates Symptom_1 to Symptom_17

# Combine all symptom data into a single list
all_symptoms = df[symptom_columns].fillna('').values.ravel()  # Flatten all symptom data

# Clean symptoms by removing extra spaces and ignoring empty values
all_symptoms = [symptom.strip() for symptom in all_symptoms if symptom.strip()]

# Get unique symptoms
unique_symptoms = sorted(set(all_symptoms))  # Sort for an alphabetical list

# Save the unique symptoms to a CSV file
unique_symptoms_df = pd.DataFrame(unique_symptoms, columns=['Symptom'])  # Create DataFrame
unique_symptoms_df.to_csv('unique_symptoms.csv', index=False)  # Save to CSV without index

print("Unique symptoms have been saved to 'unique_symptoms.csv'.")


Unique symptoms have been saved to 'unique_symptoms.csv'.
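A note on the cleaning step above: `strip()` only trims leading and trailing whitespace, so a label with a space inside it (for example one shaped like `spotting_ urination`) would be saved unchanged. The sketch below shows one possible way to normalize such labels before de-duplicating. The helper name `normalize_label` is hypothetical and not part of the original notebook, and it assumes that whitespace next to an underscore is never meaningful. If downstream code looks labels up by their exact dataset spelling, the same normalization would have to be applied there too.

```python
import re

def normalize_label(label):
    """Trim a label and drop whitespace that sits next to an underscore.

    Hypothetical helper: assumes spaces around '_' are data-entry noise.
    """
    label = label.strip()
    # "spotting_ urination" -> "spotting_urination"
    return re.sub(r"\s*_\s*", "_", label)

# Toy values standing in for the flattened symptom columns
raw_symptoms = [" itching", "spotting_ urination", "", "itching "]

# Skip empty cells, normalize, then de-duplicate and sort
cleaned = sorted({normalize_label(s) for s in raw_symptoms if s.strip()})
print(cleaned)
```

Whether this is appropriate depends on how the labels are consumed later, so it is best treated as an optional extra cleaning pass.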


<h3>Building and testing the NER</h3>

In [5]:
import pandas as pd
from fuzzywuzzy import process
from sklearn.feature_extraction.text import CountVectorizer

# Step 1: Load symptoms from CSV (make sure it only contains the desired symptoms)
df = pd.read_csv('unique_symptoms.csv')  # Replace with the path to your CSV file
symptoms_list = df['Symptom'].dropna().tolist()  # Extract symptoms into a list

# Step 2: Function to generate n-grams (bigrams by default)
def generate_ngrams(text, n=2):
    """Generate word n-grams from text, with English stop words removed."""
    vectorizer = CountVectorizer(ngram_range=(n, n), stop_words='english')
    try:
        vectorizer.fit([text])
    except ValueError:
        # Raised when the text yields an empty vocabulary
        # (e.g. fewer than n non-stop-word tokens)
        return []
    return vectorizer.get_feature_names_out()

# Step 3: Function to extract symptoms using fuzzy matching and n-grams
def extract_symptoms_from_input(user_input):
    """
    Function to extract symptoms from user input using fuzzy matching and n-grams.
    Returns a string of matched symptoms, separated by commas.
    """
    # Normalize input text
    user_input = user_input.lower().strip()

    # Generate bigrams from the user input
    user_input_bigrams = generate_ngrams(user_input)

    # List to store matched symptoms
    matched_symptoms = []

    for symptom in symptoms_list:
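        # Check whether the symptom label appears verbatim in the input text.
        # Labels written with underscores (e.g. stomach_pain) will not, so
        # they fall through to the fuzzy match against the bigrams below.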
        if symptom.lower() in user_input:
            matched_symptoms.append(symptom)
        else:
            # Use fuzzy matching to allow for misspellings or slight variations
            match = process.extractOne(symptom.lower(), user_input_bigrams)
            if match and match[1] >= 80:  # Adjust threshold as needed
                matched_symptoms.append(symptom)

    # Post-processing: remove duplicates. set() does not preserve order;
    # further filtering of irrelevant matches could be added here.
    matched_symptoms = list(set(matched_symptoms))

    # Return the matched symptoms as a single string, joined by commas
    return ', '.join(matched_symptoms)

# Example usage:
user_input = "Hello, Doctor. I’ve been feeling quit heade uncomfortable lately. I’ve been having a lot of stomach pain, especially around my abdomen. It feels like a constant ache, and sometimes it gets sharp.Additionally, I’ve noticed burning micturition whenever I go to the bathroom. The pain is sharp, and it’s very uncomfortable.Lastly, there has been some spotting urination; I see a little bit of blood when I urinate, and I’m really worried about it.Could you help me figure out what’s going on?"  # Example user input
extracted_symptoms = extract_symptoms_from_input(user_input)

print("Extracted Symptoms:", extracted_symptoms)


Extracted Symptoms: burning_micturition, stomach_pain, spotting_ urination
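To make the matching idea easier to inspect without the third-party dependencies, here is a minimal standard-library sketch of the same approach: slide a two-word window over the input and compare each window with every symptom label. It is not the notebook's implementation. It swaps `fuzzywuzzy` for `difflib.SequenceMatcher` (so the scores are not identical), it does not remove stop words, and it assumes underscores in labels stand for spaces. The function names `word_ngrams`, `best_ratio` and `match_symptoms` are hypothetical.

```python
import re
from difflib import SequenceMatcher

def word_ngrams(text, n=2):
    """Return every run of n consecutive words in the lowercased text."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def best_ratio(label, candidates):
    """Highest difflib similarity (0-100) between a label and any candidate."""
    # Assumption: underscores in dataset labels stand for spaces
    target = re.sub(r"[_\s]+", " ", label.lower()).strip()
    scores = (round(100 * SequenceMatcher(None, target, c).ratio())
              for c in candidates)
    return max(scores, default=0)

def match_symptoms(text, labels, threshold=80):
    """Return the labels whose best bigram score reaches the threshold."""
    candidates = word_ngrams(text)
    # Sorted so the result order is deterministic
    return sorted(l for l in labels if best_ratio(l, candidates) >= threshold)

labels = ["stomach_pain", "burning_micturition", "skin_rash"]
found = match_symptoms("I have stomach pain and burning micturition", labels)
print(found)
```

Because the window size is fixed at two words, labels with one word or with three or more words would score poorly under this sketch; trying several values of `n` is one possible extension.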
