# Gemma 3n: Clinical Diagnosis Support

## Problem Statement and Importance
According to BMJ Quality & Safety (a journal from the British Medical Journal group; the figure is also cited in reports from Johns Hopkins Medicine), approximately **795,000** people in America alone die or are permanently disabled each year due to misdiagnosis. Current medicine focuses on ***aggregates and averages*** and often neglects rarer diseases.

The disease this project focuses on is **Immune Thrombocytopenic Purpura** (ITP), which is a rare disease according to Johns Hopkins Medicine. It may also be referred to as ***Thrombocytopenia with Immune Thrombocytopenic Purpura***, since thrombocytopenia simply means having a low platelet count. Another common name is **Idiopathic Thrombocytopenic Purpura**. In short, ITP is a disease in which the body's immune system destroys its own platelets.

ITP was picked because of my own personal experience with how much damage misdiagnosis can cause. A member of my own family had ITP, was repeatedly **misdiagnosed, and had trouble getting treatment**. This became **near-fatal** when he broke his arm and had to undergo surgery, where his condition caused unexpected complications.

However, as this notebook demonstrates, Gemma 3n **can diagnose a rare disease**, one that amplifies nearly every injury to the extreme.

This project aims to ensure that other families like mine do not face the same situation, and that more families can plan ahead and be better informed about their diseases. This could help people keep themselves and their loved ones safe, rather than being surprised by complications in a life-or-death scenario.

Additionally, human doctors can have **human problems**: they can be fatigued, overloaded with work, and under stress. All of these factors can **make humans perform worse**.

Generative **AI doesn't have these problems**. Furthermore, using GenAI is ***cheaper, easier, and faster*** than manually going to a doctor, scheduling an appointment, and paying hefty prices or waiting in long lines.

## Step-by-Step Solution
1. Takes the patient file as a string (or as text entered in the textbox of the Gradio interface).
2. Uses **few-shot prompting**, giving examples so that the model returns a structured version of the patient file as JSON.
3. With the structured data, Gemma 3n analyzes everything under a ***persona prompt***, roleplaying as a medical professional to come up with **personalized diagnoses**.
4. Because this is generative AI, it is not ethically sound to simply output diagnoses with full confidence. That is why Gemma is specifically told to **give a *disclaimer*** about its limitations and to state that a real professional should be consulted.
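
The four steps above can be sketched as one small function. This is a minimal sketch, not the notebook's implementation: `generate` is a hypothetical stand-in for any callable that sends a prompt to Gemma 3n and returns its text reply, the prompts are abbreviated, and `DISCLAIMER` and `ensure_disclaimer` are illustrative names for a fallback that appends a disclaimer when the model's reply lacks one.

```python
from typing import Callable, Tuple

# Hypothetical fallback wording; the notebook leaves the exact text to the model.
DISCLAIMER = (
    "Disclaimer: this output was generated by an AI model and is not a medical "
    "diagnosis. Consult a qualified medical professional."
)


def ensure_disclaimer(diagnoses: str) -> str:
    """Append DISCLAIMER when the reply does not already mention a disclaimer."""
    if "disclaimer" in diagnoses.lower():
        return diagnoses
    return diagnoses.rstrip() + "\n\n" + DISCLAIMER


def two_step_diagnosis(
    patient_file: str, generate: Callable[[str], str]
) -> Tuple[str, str]:
    """Run the parse step, then the diagnosis step, on one patient file."""
    # Steps 1-2: ask the model to turn the free-text file into JSON.
    structured = generate("Parse this patient file into valid JSON:\n" + patient_file)
    # Step 3: ask for diagnoses under a medical-professional persona.
    diagnoses = generate(
        "You are a medical professional. List likely diagnoses for:\n" + structured
    )
    # Step 4: make sure the reply carries a disclaimer.
    return structured, ensure_disclaimer(diagnoses)
```

Passing the model call in as a parameter keeps the flow testable with a stub before the real model is loaded.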

## Framework
As discussed earlier, this project first **parses then analyzes** a patient file to make diagnoses.

This is done because the project not only has the potential to save lives through prediction, but also addresses a problem that plagues many applications of technology in healthcare: **unstructured data**. Once the data is **structured** in a standardized, easy-to-understand format, **it is easier to use and implement**.

### The Use of Structured Data
According to the National Institutes of Health, **80 percent** of medical data is **unstructured**. If this data is to be used to build more medical products, **especially for diagnosis prediction**, it needs to be in a ***standard, recognized, and understandable*** format. Currently, some data is written in rarely spoken languages, may never be translated, and can be very hard for professionals across the globe to use in products that can change lives.
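
One concrete part of a standard format is units. The few-shot examples in this notebook record values such as `"98 (F)"`, `"37 (C)"`, `"40 lbs"`, and `"90 kgs"`, so downstream code would need to bring them to a common unit. The sketch below assumes exactly those string shapes; `weight_to_kg` and `temperature_to_celsius` are hypothetical helpers that are not part of the notebook.

```python
import re
from typing import Optional

LB_TO_KG = 0.45359237  # Definition of the international avoirdupois pound


def weight_to_kg(value: str) -> Optional[float]:
    """Convert strings like '40 lbs' or '90 kgs' to kilograms (0.1 kg precision)."""
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*(lbs?|kgs?)\s*", value.lower())
    if match is None:
        return None  # Unrecognized shape: leave it to the caller to handle
    amount, unit = float(match.group(1)), match.group(2)
    if unit.startswith("lb"):
        amount *= LB_TO_KG
    return round(amount, 1)


def temperature_to_celsius(value: str) -> Optional[float]:
    """Convert strings like '98 (F)' or '37 (C)' to degrees Celsius (0.1 precision)."""
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*\(([fc])\)\s*", value.lower())
    if match is None:
        return None
    amount, unit = float(match.group(1)), match.group(2)
    if unit == "f":
        amount = (amount - 32) * 5 / 9
    return round(amount, 1)
```

Returning `None` for unrecognized strings, rather than guessing, keeps malformed values visible instead of silently converting them.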

This project **helps to solve that.** It **parses the file into JSON**, outputting ***structured* data** that the AI can then use for diagnosis.
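
Because the model returns text, the JSON usually has to be pulled out of the reply before other code can use it. The sketch below assumes the reply may wrap the JSON in a Markdown code fence, as the few-shot examples in this notebook do. `extract_json` is a hypothetical helper and is not part of the notebook's code.

```python
import json
import re
from typing import Optional

FENCE = "`" * 3  # Markdown code-fence marker, built here to avoid writing it literally


def extract_json(model_output: str) -> Optional[dict]:
    """Return the first JSON object found in model_output, or None if parsing fails."""
    # Prefer the contents of a fenced block (optionally tagged "json") when present.
    fenced = re.search(FENCE + r"(?:json)?\s*(.*?)" + FENCE, model_output, re.DOTALL)
    candidate = fenced.group(1) if fenced else model_output
    # Narrow to the outermost braces to drop any surrounding commentary.
    start, end = candidate.find("{"), candidate.rfind("}")
    if start == -1 or end <= start:
        return None
    try:
        parsed = json.loads(candidate[start:end + 1])
    except json.JSONDecodeError:
        return None
    return parsed if isinstance(parsed, dict) else None
```

A `None` result signals that the reply was not valid JSON, which a caller could use to retry the parsing step or fall back to the raw patient file.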

In [None]:
!pip install -q timm==1.0.17
!pip install -q transformers==4.53.2
!pip install -q gradio

In [None]:
# Imports
import kagglehub  # Used to download the model
# Model, processor, and tokenizer classes from Transformers
from transformers import AutoModelForCausalLM, AutoProcessor, AutoTokenizer
import gradio as gr  # Used to build a clean user interface that can also be served as a web application

In [None]:
# Download the model from Kaggle
# This path points to the model downloaded in the notebook
GEMMA_PATH = kagglehub.model_download("google/gemma-3n/transformers/gemma-3n-e2b-it")

# Set up the tokenizer, model, and processor
tokenizer = AutoTokenizer.from_pretrained(GEMMA_PATH, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(GEMMA_PATH, trust_remote_code=True)
processor = AutoProcessor.from_pretrained(GEMMA_PATH)

In [None]:
#Patient file for ITP patient
patient_file="""
Name: John Doe

DoB (mm/dd/yyyy): 07/31/1983

Gender: Male

Presents With
1. Open wounds not clotting
2. Ability to easily be bruised
3. Purple-Red spots all over the body
4. Started in childhood
5. Easily has nosebleeds

Relevant Measurements
Height: 5 feet 6 inches
Weight: 170 lbs
Platelet Count: 12000 per microlitre"""

In [None]:
#Defines diagnosis function
def diagnose_file_two_steps(patient_file, model, processor):
    #Uses few-shot prompting to make a structured JSON output
    JSON_req= """Parse a hospital patient's file into a valid JSON.

Example:
A 5 year old male patient has a common cold and has symptoms including a runny nose and sneezing. He has a temperature of 98 degrees Fahrenheit and an SpO2 of 98. He weighs 40 pounds.

JSON Response:


```
{
"age" : 5,
"gender": "male",
"disease": "common cold",
"symptoms": ["sneezing", "runny nose"],
"temperature": "98 (F)",
"SpO2": "98",
"weight" : "40 lbs"
}
```

Example: A 13 year old has a stomach bug, is vomiting, and has had extreme loss of appetite. They are running a temperature of 37 degrees Celsius. They are allergic to peanuts and are not on any current medications. They weigh 90 kilograms.

```
{
"age" : 13,
"disease": "stomach bug",
"symptoms": ["vomiting", "appetite loss"],
"temperature": "37 (C)",
"allergies" : ["peanuts"],
"current medications" : null,
"weight" : "90 kgs"
}
```

Example: "The patient is complaining of sudden onset of sharp chest pain radiating to the left arm, accompanied by sweating and dizziness. They report no known allergies, are 55 years old, and identify as male."
```
{
"age" : 55,
"disease": null,
"symptoms": ["radiating chest pain (left arm)","sweating","dizziness"],
"gender": "male",
"allergies" : null
}
```
"""
    #Sends the file and the JSON request to the model
    messages = [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": patient_file},
                {"type": "text", "text": JSON_req}
            ]
        }
    ]
    inputs = processor.apply_chat_template(
        messages,
        add_generation_prompt=True,
        tokenize=True,
        return_dict=True,
        return_tensors="pt"
    ).to(model.device, dtype=model.dtype)
    input_len = inputs["input_ids"].shape[-1]
    outputs = model.generate(**inputs, max_new_tokens=512, disable_compile=True) #Generates outputs
    data = processor.batch_decode(
        outputs[:, input_len:],
        skip_special_tokens=True,
        clean_up_tokenization_spaces=True
    )[0]
    objective="""You are a medical professional. Assume patients and professionals have read the JSON data. If the official diagnosis is not given (or disease is marked as null), generate potential diagnoses. State the specific disease, not just a category, symptom, tendency, or cause. Put your diagnoses in order of whatever you think is most likely, your first one should be the most likely. You have 512 tokens to do this. Only say the diagnoses and a disclaimer about being an AI, and how real professionals are needed, etc."""
    messages = [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": patient_file},
                {"type": "text", "text": data},
                {"type": "text", "text": objective}
            ]
        }
    ]
    
    inputs = processor.apply_chat_template(
        messages,
        add_generation_prompt=True,
        tokenize=True,
        return_dict=True,
        return_tensors="pt"
    ).to(model.device, dtype=model.dtype)
    input_len = inputs["input_ids"].shape[-1]
    
    outputs = model.generate(**inputs, max_new_tokens=512, disable_compile=True)
    diagnoses = processor.batch_decode(
        outputs[:, input_len:],
        skip_special_tokens=True,
        clean_up_tokenization_spaces=True
    )[0]
    return data, diagnoses

In [None]:
#Sets up Gradio interface
iface = gr.Interface(
    # This lambda function correctly passes all required arguments
    fn=lambda patient_text: diagnose_file_two_steps(patient_text, model, processor),
    
    inputs=gr.Textbox(lines=15, label="Patient File", value=patient_file),
    outputs=[
        gr.Markdown(label="Step 1: Generated JSON"),
        gr.Markdown(label="Step 2: Potential Diagnoses")
    ],
    title="Gemma 3n: Clinical Diagnosis Support"
)

print("This project tests whether Gemma can predict the diagnosis from the patient file without knowing the diagnosis in advance. \nThe correct diagnosis for the default patient file, which Gemma is not told, is Immune Thrombocytopenic Purpura (a.k.a. Idiopathic Thrombocytopenic Purpura). \nThe first diagnosis listed is the one the AI considers most likely.\n")

# Launch the app
iface.launch(share=True)