<a href="https://colab.research.google.com/github/AsianaHolloway/AssignmentDifferentialDiagnostics/blob/main/AssignmentDiffrentialDiagnostics.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Assignment: Low-Level Differential Diagnostic System

In this assignment, you will implement a low-level differential diagnostic system using Python. The goal of this assignment is to solidify your understanding of the system without using machine learning. The system will calculate the most probable diagnoses based on input symptoms, evoking strengths, frequencies, and, optionally, test results.



SQLAlchemy is a Python SQL toolkit and Object Relational Mapper (ORM) that gives application developers the full power and flexibility of SQL. It comes preinstalled on Google Colab, so you can skip the installation step if you are working there.

In [None]:
import os, json

json_path = "/content/symptom_database.json"  # or your Drive path
print("Exists?", os.path.exists(json_path))

with open(json_path, "r") as f:
    db = json.load(f)

print("Top-level keys (symptoms):", list(db.keys())[:5], "… total:", len(db))


Exists? True
Top-level keys (symptoms): ['Fever', 'Cough', 'Shortness of Breath', 'Headache', 'Fatigue'] … total: 13


In [None]:
#Uncomment to Install
# !pip install SQLAlchemy
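
If you are unsure whether SQLAlchemy is already available in your environment, a quick check like the one below can tell you before you decide to run the install command. This is only a minimal sketch; `is_installed` is a hypothetical helper name, not part of the assignment.

```python
import importlib.util

def is_installed(package_name):
    """Return True if the named top-level package can be found for import."""
    return importlib.util.find_spec(package_name) is not None

if is_installed("sqlalchemy"):
    print("SQLAlchemy is available; no install needed.")
else:
    print("SQLAlchemy not found; uncomment the pip command above.")
```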

Let's import all the necessary packages.

In [None]:
import json
import pprint
from sqlalchemy import create_engine, Column, Integer, String, Float, Table, ForeignKey
from sqlalchemy.orm import sessionmaker, relationship
from sqlalchemy.orm import declarative_base

## Step 1: Data Collection

Utilize the "symptom_database.json" file available on the Assignment Canvas Page. Please ensure you upload the file to Google Colab if you are using Colab for your work.

In [None]:
# Open the JSON file and load its contents into a Python dictionary
with open('symptom_database.json', 'r') as openfile:
    symptom_database = json.load(openfile)

The pprint module can "pretty-print" arbitrary Python data structures, such as the dictionary loaded from the JSON file.
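
Because the full database is long, it can help to limit how deep the printer descends. The sketch below uses the `depth` parameter of `pprint.pformat` on a small excerpt of the data to show only the top-level keys; the `sample` variable is an illustrative stand-in for the loaded file.

```python
import pprint

# Small excerpt of the database, reduced to two symptoms
sample = {
    "Abdominal Pain": [{"Condition": "Gastritis"}, {"Condition": "Appendicitis"}],
    "Chest Pain": [{"Condition": "Heart Attack"}],
}

# depth=1 keeps the top-level keys and collapses the nested lists
overview = pprint.pformat(sample, depth=1)
print(overview)
```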

In [None]:
pprint.pprint(symptom_database)

{'Abdominal Pain': [{'Class': 'Gastrointestinal Disorder',
                     'Condition': 'Gastritis',
                     'Evoking_Strength': 3,
                     'Frequency': 2},
                    {'Class': 'Gastrointestinal Disorder',
                     'Condition': 'Appendicitis',
                     'Evoking_Strength': 4,
                     'Frequency': 2}],
 'Chest Pain': [{'Class': 'Cardiovascular Disorder',
                 'Condition': 'Heart Attack',
                 'Evoking_Strength': 4,
                 'Frequency': 1},
                {'Class': 'Gastrointestinal Disorder',
                 'Condition': 'Gastroesophageal Reflux Disease (GERD)',
                 'Evoking_Strength': 3,
                 'Frequency': 2}],
 'Cough': [{'Class': 'Viral Infection',
            'Condition': 'COVID-19',
            'Evoking_Strength': 4,
            'Frequency': 3},
           {'Class': 'Viral Infection',
            'Condition': 'Common Cold',
            'Evoking_Str

Frequency:

1: Rare
2: Occasional
3: Common
4: Very Common


Evoking_Strength:

1: Low
2: Moderate
3: Medium
4: High

As you can observe, the dataset includes symptoms and their corresponding associated conditions. These conditions are categorized into various classes and include information about their evoking strength and frequency of occurrence.

***It's important to emphasize that this dataset is for illustrative purposes only and is artificially generated; it does not reflect scientific or clinical accuracy.***
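
To make the structure concrete, the following sketch flattens the nested dictionary into one row per symptom-condition pair and translates the numeric codes into the labels listed above. The sample is a small excerpt copied from the printed database; `FREQUENCY_LABELS`, `EVOKING_LABELS`, and `flatten` are hypothetical names introduced for illustration.

```python
# Ordinal scales as listed above
FREQUENCY_LABELS = {1: "Rare", 2: "Occasional", 3: "Common", 4: "Very Common"}
EVOKING_LABELS = {1: "Low", 2: "Moderate", 3: "Medium", 4: "High"}

# Small excerpt copied from the printed database
sample = {
    "Abdominal Pain": [
        {"Class": "Gastrointestinal Disorder", "Condition": "Gastritis",
         "Evoking_Strength": 3, "Frequency": 2},
        {"Class": "Gastrointestinal Disorder", "Condition": "Appendicitis",
         "Evoking_Strength": 4, "Frequency": 2},
    ],
}

def flatten(database):
    """Yield (symptom, condition, frequency label, evoking label) per entry."""
    for symptom, entries in database.items():
        for entry in entries:
            yield (
                symptom,
                entry["Condition"],
                FREQUENCY_LABELS[entry["Frequency"]],
                EVOKING_LABELS[entry["Evoking_Strength"]],
            )

rows = list(flatten(sample))
for row in rows:
    print(row)
```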

### Exercise 1a:  2 pts

Count the number of symptoms present in **symptom_database** and print the value.

In [None]:
import json

JSON_PATH = "symptom_database.json"  # or "/content/symptom_database.json"
with open(JSON_PATH, "r") as f:
    symptom_db = json.load(f)

# Each top-level key is a symptom
num_symptoms = len(symptom_db.keys())
print("Number of symptoms in the database:", num_symptoms)


Number of symptoms in the database: 13


### Exercise 1b: 3 pts

ABC Hospital relies on the information stored in the **symptom_database** for potential initial diagnoses. A patient arrives at the hospital with the symptom "Cough." Your task is to **write code that prints the potential initial conditions, one at a time, for the patient's condition.**

*Hint: You can use a loop to iterate over the conditions associated with "Cough" and print them sequentially.*

In [None]:
# Target symptom from the exercise
target_symptom = "Cough"

# Get the list of conditions linked to that symptom
conditions_for_symptom = symptom_db[target_symptom]

# Print each condition one by one
print(f"Potential initial conditions for symptom: {target_symptom}\n")
for condition in conditions_for_symptom:
    print(f"- {condition['Condition']} "
          f"(Class: {condition['Class']}, "
          f"Frequency: {condition['Frequency']}, "
          f"Evoking_Strength: {condition['Evoking_Strength']})")


Potential initial conditions for symptom: Cough

- COVID-19 (Class: Viral Infection, Frequency: 3, Evoking_Strength: 4)
- Common Cold (Class: Viral Infection, Frequency: 3, Evoking_Strength: 3)
- Bronchitis (Class: Respiratory Infection, Frequency: 2, Evoking_Strength: 3)


In these exercises, you may have found it awkward to extract information from data represented as JSON (a Python dictionary once loaded). JSON is versatile, but the dictionary is keyed by symptom, so any question that does not start from a symptom requires custom looping code. As an alternative, we can use a relational database queried with SQL, which offers robust querying capabilities and can significantly simplify extraction, reducing the need for custom code.
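
One way to see the limitation: looking up the conditions for a symptom is a single dictionary access, but the reverse question ("which symptoms point to this condition?") needs a scan over every entry. The sketch below shows that scan on an excerpt of the data; `symptoms_for_condition` is a hypothetical helper.

```python
# Excerpt copied from the printed database (only the field needed here)
sample = {
    "Abdominal Pain": [{"Condition": "Gastritis"}, {"Condition": "Appendicitis"}],
    "Chest Pain": [{"Condition": "Heart Attack"},
                   {"Condition": "Gastroesophageal Reflux Disease (GERD)"}],
}

def symptoms_for_condition(database, condition_name):
    """Scan every symptom's list to find those that mention the condition."""
    return [
        symptom
        for symptom, entries in database.items()
        if any(entry["Condition"] == condition_name for entry in entries)
    ]

print(symptoms_for_condition(sample, "Heart Attack"))
```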

## Step 2: Use SQLAlchemy to Create an SQL Database

Transform the symptom_database which is in JSON format into an SQL database using SQLAlchemy. Define appropriate table structures for symptoms and conditions, and establish a many-to-many relationship between them.

In [None]:
# Define the SQLAlchemy engine and session
engine = create_engine('sqlite:///symptom_database.db')
Session = sessionmaker(bind=engine)

In [None]:
# Open a session and create the declarative base class for the models
session = Session()

Base = declarative_base()

# Association table to represent the many-to-many relationship
symptom_condition_association = Table(
    'symptom_condition_association',
    Base.metadata,
    Column('symptom_id', Integer, ForeignKey('symptoms.id')),
    Column('condition_id', Integer, ForeignKey('conditions.id'))
)

class Symptom(Base):
    __tablename__ = 'symptoms'

    id = Column(Integer, primary_key=True)
    name = Column(String, nullable=False)
    conditions = relationship("Condition", secondary=symptom_condition_association)

class Condition(Base):
    __tablename__ = 'conditions'

    id = Column(Integer, primary_key=True)
    name = Column(String, nullable=False)
    class_name = Column(String, nullable=True)
    frequency = Column(Float, nullable=True)
    evoking_strength = Column(Float, nullable=True)

Base.metadata.create_all(engine)
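
For readers who want to see what these models correspond to at the SQL level, the sketch below builds a comparable three-table layout in an in-memory SQLite database with the standard-library `sqlite3` module, then runs the join that resolves a symptom to its conditions. The exact DDL that SQLAlchemy emits may differ; the rows are a small excerpt of the data shown earlier.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE symptoms (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE conditions (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    class_name TEXT,
    frequency REAL,
    evoking_strength REAL
);
CREATE TABLE symptom_condition_association (
    symptom_id INTEGER REFERENCES symptoms(id),
    condition_id INTEGER REFERENCES conditions(id)
);
INSERT INTO symptoms VALUES (1, 'Abdominal Pain'), (2, 'Chest Pain');
INSERT INTO conditions VALUES
    (1, 'Gastritis', 'Gastrointestinal Disorder', 2, 3),
    (2, 'Appendicitis', 'Gastrointestinal Disorder', 2, 4),
    (3, 'Heart Attack', 'Cardiovascular Disorder', 1, 4);
INSERT INTO symptom_condition_association VALUES (1, 1), (1, 2), (2, 3);
""")

# Two joins walk from the symptom, through the association table, to conditions
query = """
SELECT c.name
FROM symptoms AS s
JOIN symptom_condition_association AS a ON a.symptom_id = s.id
JOIN conditions AS c ON c.id = a.condition_id
WHERE s.name = ?
ORDER BY c.id
"""
names = [row[0] for row in conn.execute(query, ("Abdominal Pain",))]
print(names)
conn.close()
```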



Now that the tables are defined, let's add data to the database.

In [None]:
# Add data to the database
for symptom_name, conditions in symptom_database.items():
    # Check if the symptom already exists in the database
    existing_symptom = session.query(Symptom).filter_by(name=symptom_name).first()

    # If the symptom doesn't exist, create it
    if not existing_symptom:
        new_symptom = Symptom(name=symptom_name)
        session.add(new_symptom)
    else:
        new_symptom = existing_symptom

    # Add associated conditions to the symptom
    for condition_data in conditions:
        condition_name = condition_data["Condition"]
        class_name = condition_data.get("Class")
        frequency = condition_data.get("Frequency")
        evoking_strength = condition_data.get("Evoking_Strength")

        # Check if the condition already exists in the database
        existing_condition = session.query(Condition).filter_by(name=condition_name).first()

        # If the condition doesn't exist, create it
        if not existing_condition:
            new_condition = Condition(
                name=condition_name,
                class_name=class_name,
                frequency=frequency,
                evoking_strength=evoking_strength,
            )
            session.add(new_condition)
        else:
            new_condition = existing_condition

        # Add the condition to the symptom's conditions list
        if new_condition not in new_symptom.conditions:
            new_symptom.conditions.append(new_condition)

# Commit the changes to the database
session.commit()

# Close the session
session.close()
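
Note that the loop above creates each condition only once, so the `frequency` and `evoking_strength` stored for a condition come from the first symptom under which it appears. Whether the real file contains differing values for the same condition is not shown here; the sketch below is one way to check. The sample data is hypothetical and deliberately contains a conflict, and `find_conflicting_conditions` is an illustrative helper name.

```python
from collections import defaultdict

def find_conflicting_conditions(database):
    """Return conditions whose (Frequency, Evoking_Strength) pair varies by symptom."""
    seen = defaultdict(set)
    for entries in database.values():
        for entry in entries:
            seen[entry["Condition"]].add(
                (entry.get("Frequency"), entry.get("Evoking_Strength"))
            )
    return {name: pairs for name, pairs in seen.items() if len(pairs) > 1}

# Hypothetical data: "Condition A" has different values under each symptom
hypothetical_db = {
    "Symptom X": [{"Condition": "Condition A", "Frequency": 3, "Evoking_Strength": 4}],
    "Symptom Y": [{"Condition": "Condition A", "Frequency": 2, "Evoking_Strength": 4},
                  {"Condition": "Condition B", "Frequency": 1, "Evoking_Strength": 2}],
}

conflicts = find_conflicting_conditions(hypothetical_db)
print(conflicts)
```

Calling `find_conflicting_conditions(symptom_database)` on the loaded file would show whether this matters for the actual data.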

## Step 3: Querying

Print all conditions.

In [None]:
# Define the SQLAlchemy session

session = Session()

# Query all conditions from the database
all_conditions = session.query(Condition).all()

# Print the list of conditions
for condition in all_conditions:
    print(f"Condition: {condition.name}")
    print(f"Class: {condition.class_name}")
    print()

# Close the session
session.close()

Condition: Influenza
Class: Viral Infection

Condition: COVID-19
Class: Viral Infection

Condition: Common Cold
Class: Viral Infection

Condition: Bronchitis
Class: Respiratory Infection

Condition: Asthma
Class: Respiratory Disorder

Condition: Pneumonia
Class: Respiratory Infection

Condition: Migraine
Class: Neurological Disorder

Condition: Tension Headache
Class: Neurological Disorder

Condition: Sinusitis
Class: Respiratory Infection

Condition: Chronic Fatigue Syndrome
Class: Autoimmune Disorder

Condition: Anemia
Class: Hematological Disorder

Condition: Gastritis
Class: Gastrointestinal Disorder

Condition: Appendicitis
Class: Gastrointestinal Disorder

Condition: Food Poisoning
Class: Gastrointestinal Disorder

Condition: Morning Sickness
Class: Pregnancy-related

Condition: Gastroenteritis
Class: Gastrointestinal Disorder

Condition: Heart Attack
Class: Cardiovascular Disorder

Condition: Gastroesophageal Reflux Disease (GERD)
Class: Gastrointestinal Disorder

Condition: Rhe

Query the associated conditions for a symptom.

In [None]:
def query_symptoms_by_name(name_filter=None):
    session = Session()
    # Define the query
    query = session.query(Symptom)

    # Apply the name filter if provided
    if name_filter:
        query = query.filter(Symptom.name.ilike(f'%{name_filter}%'))

    # Query symptoms from the database
    symptoms = query.all()

    # Print the list of symptoms
    for symptom in symptoms:
        print(f"Symptom: {symptom.name}")
        for condition in symptom.conditions:
            print(f"Associated Condition: {condition.name}")

    # Close the session
    session.close()


symptom_name_filter = "Fever"  # Replace with the symptom name you want to filter
query_symptoms_by_name(symptom_name_filter)

Symptom: Fever
Associated Condition: Influenza
Associated Condition: COVID-19
Associated Condition: Common Cold


## Step 6: Implement the Diagnostic System

Create functions in Python to implement the diagnostic system. The system should take an array of symptoms as input and query the database to return the list of most probable diagnoses based on evoking strength and frequency. Implement a scoring mechanism to determine the order of probable diagnoses.




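Let's start with a simple scoring formula that uses only Evoking Strength. A condition's evoking strength is added to its score once for every input symptom associated with that condition:

$Score = EvokingStrength$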

In [None]:

def get_probable_diagnoses_simple(symptoms):
    session = Session()

    # Initialize a dictionary to store condition scores
    condition_scores = {}

    # Query all conditions from the database
    all_conditions = session.query(Condition).all()

    # Calculate scores for each condition based on symptoms
    for condition in all_conditions:
        condition_name = condition.name
        condition_evoking_strength = condition.evoking_strength or 1.0  # Default to 1.0 if evoking_strength is None

        # Calculate a score for the condition based on symptoms
        score = 0
        for symptom in symptoms:
            # Check if the symptom is associated with the current condition
            associated_symptom = session.query(Symptom).filter_by(name=symptom).first()
            if associated_symptom and condition in associated_symptom.conditions:
                score +=  condition_evoking_strength

        # Store the score for the condition
        condition_scores[condition_name] = score

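    # Sort conditions by score in ascending order, so the highest scores print last
    # (Exercise 2a asks you to return them in descending order instead)
    sorted_conditions = sorted(condition_scores.items(), key=lambda x: x[1], reverse=False)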

    # Close the session
    session.close()

    # Return the list of most probable diagnoses
    return sorted_conditions




In [None]:
input_symptoms = ["Fever", "Cough"]
probable_diagnoses = get_probable_diagnoses_simple(input_symptoms)

print("Most probable diagnoses:")
for condition, score in probable_diagnoses:
    print(f"Condition: {condition}, Score: {score}")

Most probable diagnoses:
Condition: Asthma, Score: 0
Condition: Pneumonia, Score: 0
Condition: Migraine, Score: 0
Condition: Tension Headache, Score: 0
Condition: Sinusitis, Score: 0
Condition: Chronic Fatigue Syndrome, Score: 0
Condition: Anemia, Score: 0
Condition: Gastritis, Score: 0
Condition: Appendicitis, Score: 0
Condition: Food Poisoning, Score: 0
Condition: Morning Sickness, Score: 0
Condition: Gastroenteritis, Score: 0
Condition: Heart Attack, Score: 0
Condition: Gastroesophageal Reflux Disease (GERD), Score: 0
Condition: Rheumatoid Arthritis, Score: 0
Condition: Osteoarthritis, Score: 0
Condition: Vertigo, Score: 0
Condition: Low Blood Pressure, Score: 0
Condition: Eczema, Score: 0
Condition: Contact Dermatitis, Score: 0
Condition: Urinary Tract Infection (UTI), Score: 0
Condition: Diabetes, Score: 0
Condition: Bronchitis, Score: 3.0
Condition: Influenza, Score: 4.0
Condition: Common Cold, Score: 6.0
Condition: COVID-19, Score: 8.0


### Exercise 2

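Now extend the function so that the score uses both Frequency and Evoking Strength. As before, the product is added once for every input symptom associated with the condition:

$Score = Frequency \times EvokingStrength$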
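
As a quick worked example of this formula, the sketch below scores the three conditions listed under "Cough" in the Exercise 1b output, for a patient reporting that single symptom. It works on plain dictionaries rather than the database, so it only illustrates the arithmetic and is not the exercise solution.

```python
# Values copied from the Exercise 1b output for the symptom "Cough"
cough_entries = [
    {"Condition": "COVID-19", "Frequency": 3, "Evoking_Strength": 4},
    {"Condition": "Common Cold", "Frequency": 3, "Evoking_Strength": 3},
    {"Condition": "Bronchitis", "Frequency": 2, "Evoking_Strength": 3},
]

# Score = Frequency * EvokingStrength for each condition
scores = {
    entry["Condition"]: entry["Frequency"] * entry["Evoking_Strength"]
    for entry in cough_entries
}
print(scores)
```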



#### Exercise 2a - 7 pts:


*   Adjust the get_probable_diagnoses_simple function to effectively incorporate both "Frequency" and "Evoking Strength" into the **scoring formula mentioned earlier**.


*   Extend the function to accept an **additional input parameter**, n, which represents the number of top diagnoses to be returned. For instance, if n is set to 5, the function should return the top 5 probable diagnoses based on the scoring formula.



*   Make sure that the list of probable diagnoses returned by the function is **sorted in descending order**, with the most likely diagnosis appearing at the top of the list.



In [None]:
from collections import defaultdict

def get_probable_diagnoses_simple(symptoms, n=10):
    """Rank conditions by Score = Frequency * EvokingStrength, added once per
    matched input symptom, and return the top-n (name, score) pairs."""

    # Clean inputs: drop empty entries and surrounding whitespace
    symptoms = [s.strip() for s in symptoms if s and s.strip()]
    if not symptoms:
        return []

    # Open a dedicated session instead of relying on a previously closed global one
    session = Session()
    try:
        # Look up the IDs of the input symptoms once
        sym_ids = {
            s.id
            for s in session.query(Symptom).filter(Symptom.name.in_(symptoms)).all()
        }
        if not sym_ids:
            return []

        condition_scores = defaultdict(float)

        for condition in session.query(Condition).all():
            freq = condition.frequency or 0
            evk = condition.evoking_strength or 0
            pair_score = freq * evk  # scoring formula required by the prompt

            # Find which symptoms are linked to this condition
            linked_symptom_ids = {
                sid
                for (sid,) in session.query(symptom_condition_association.c.symptom_id)
                .filter(symptom_condition_association.c.condition_id == condition.id)
                .all()
            }

            # Add the score once per matched input symptom
            matches = len(sym_ids & linked_symptom_ids)
            if matches:
                condition_scores[condition.name] += pair_score * matches
    finally:
        session.close()

    # Sort by score in descending order and return the top-n
    return sorted(condition_scores.items(), key=lambda kv: kv[1], reverse=True)[:n]

# Quick sanity test
print(get_probable_diagnoses_simple(["Fever", "Cough"], n=5))




[('COVID-19', 24.0), ('Common Cold', 18.0), ('Influenza', 12.0), ('Bronchitis', 6.0)]


#### Exercise 2b - 3 pts

Now, run the ***get_probable_diagnoses_extended*** function for a patient **presenting the symptoms "Fever," "Cough," "Headache," and "Shortness of Breath,"** and ensure that it returns the **top 4 probable diagnoses**.

In [None]:
# Input symptom set for 2b
example_symptoms = ["Fever", "Cough", "Headache", "Shortness of Breath"]

# The prompt asks for the function get_probable_diagnoses_extended and top 4 dx.
# We'll implement a thin wrapper that calls the simple function with n=4.
def get_probable_diagnoses_extended(symptoms, n=4):
    return get_probable_diagnoses_simple(symptoms, n=n)

top4 = get_probable_diagnoses_extended(example_symptoms, n=4)

# Display cleanly
import pandas as pd
df_results_2b = pd.DataFrame(top4, columns=["Condition", "Score"])
df_results_2b


     Condition  Score
0     COVID-19   36.0
1  Common Cold   18.0
2    Influenza   12.0
3     Migraine   12.0
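
In the result above, Influenza and Migraine share a score of 12.0. Python's `sorted` is stable, so tied conditions keep the order in which they were scored. If an order that does not depend on database row order is wanted, one option (an assumption, not something the assignment requires) is to break ties alphabetically. `rank_with_ties` is a hypothetical helper.

```python
def rank_with_ties(scores, n):
    """Sort by score descending, then by condition name ascending; keep the top n."""
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:n]

# Scores copied from the Exercise 2b output, deliberately listed out of order
scores_2b = {"Migraine": 12.0, "COVID-19": 36.0, "Influenza": 12.0, "Common Cold": 18.0}
ranked = rank_with_ties(scores_2b, n=4)
print(ranked)
```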
