# Workflow of the Medical NLP Pipeline

This pipeline extracts structured medical insights from physician-patient conversations using Google Gemini AI. The workflow consists of the following steps:

1️⃣ File Upload       
	•	Users upload a .txt file containing the conversation.                              
	•	The file contents are read and decoded as UTF-8 text before analysis.
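
The read step can be sketched as follows. This is a minimal illustration, assuming the upload arrives as raw bytes; the BOM handling, the Latin-1 fallback, and the blank-line removal are assumptions added for the example, and `read_conversation` is a hypothetical helper name that does not appear in the app.

```python
def read_conversation(raw: bytes) -> str:
    """Decode uploaded bytes and tidy up whitespace (illustrative helper)."""
    try:
        # "utf-8-sig" decodes UTF-8 and drops a leading byte-order mark if present.
        text = raw.decode("utf-8-sig")
    except UnicodeDecodeError:
        # Assumed fallback: Latin-1 maps every byte, so this never raises.
        text = raw.decode("latin-1")
    # Trim each line and drop empty ones so the prompt stays compact.
    lines = [line.strip() for line in text.splitlines()]
    return "\n".join(line for line in lines if line)
```

In the Streamlit app this would be called on the bytes returned by the uploaded file's `read()` method.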

2️⃣ NLP Processing with Google Gemini API

The conversation is passed through three separate analyses, each a single request to the Gemini API:
	•	Medical NLP Summarization 🏥: Extracts patient details, symptoms, diagnosis, treatment, and prognosis in JSON format.   
	•	Sentiment & Intent Analysis 💬: Determines the emotional tone of the patient and the overall purpose of the consultation.   
	•	SOAP Notes Generation 📋: Structures clinical findings into Subjective, Objective, Assessment, and Plan (SOAP) format.
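
All three analyses follow the same pattern: build a prompt, send it to the model, and parse the reply as JSON. The sketch below shows that pattern in isolation. The names `PROMPTS`, `parse_json_reply`, and `run_analyses` are hypothetical, the prompt strings are shortened stand-ins for the real ones, and the model call is injected as a plain `generate` callable so the example runs without an API key.

```python
import json
import re
from typing import Callable

# Shortened stand-ins for the three prompts (assumption: the real prompts
# also include a JSON template describing the expected fields).
PROMPTS = {
    "summary": "Extract structured medical details in JSON format.",
    "sentiment": "Analyze sentiment & intent in JSON.",
    "soap": "Extract SOAP notes in JSON format.",
}


def parse_json_reply(reply: str) -> dict:
    """Strip Markdown code fences from a model reply and parse it as JSON."""
    cleaned = re.sub(r"```(?:json)?\s*|\s*```", "", reply).strip()
    try:
        return json.loads(cleaned)
    except json.JSONDecodeError:
        # Keep the raw text so the caller can inspect what went wrong.
        return {"error": "Invalid JSON format", "response": cleaned}


def run_analyses(conversation: str, generate: Callable[[str], str]) -> dict:
    """Run every analysis on the conversation and collect the parsed results."""
    results = {}
    for name, instruction in PROMPTS.items():
        prompt = f"{instruction}\n\nConversation:\n{conversation}"
        results[name] = parse_json_reply(generate(prompt))
    return results


def fake_generate(prompt: str) -> str:
    """Stand-in for the model: always returns a fenced JSON reply."""
    return '```json\n{"ok": true}\n```'


results = run_analyses("Doctor: How are you feeling?", fake_generate)
```

Passing the model call in as a function keeps the parsing logic testable offline; in the app, `generate` would wrap the Gemini request.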

3️⃣ Displaying Extracted JSON Output    
	•	The extracted details are displayed in JSON format using st.json() in Streamlit.  
	•	The structured output makes the results easy for healthcare professionals to interpret.

4️⃣ Downloading Results     
	•	Users can download the extracted JSON files for each processed result.    
	•	These structured outputs can be used for medical documentation, research, or AI model training.
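
The download step comes down to serializing each result dictionary to JSON text. A minimal sketch, assuming the result is a plain Python dictionary; `to_download_payload` is a hypothetical helper name, and the choice to keep non-ASCII characters unescaped is an assumption made for readability of the downloaded file.

```python
import json


def to_download_payload(result: dict) -> bytes:
    """Serialize a result dictionary to indented, UTF-8 encoded JSON."""
    # ensure_ascii=False keeps accented characters readable in the file.
    text = json.dumps(result, indent=2, ensure_ascii=False)
    return text.encode("utf-8")


payload = to_download_payload({"Symptoms": ["fièvre", "toux"]})
```

The resulting bytes can be handed to a download widget or written straight to disk.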

## Install the Prerequisites

In [3]:
!pip install streamlit google-generativeai pyngrok

Collecting streamlit
  Downloading streamlit-1.43.1-py2.py3-none-any.whl.metadata (8.9 kB)
Collecting pyngrok
  Downloading pyngrok-7.2.3-py3-none-any.whl.metadata (8.7 kB)
Collecting watchdog<7,>=2.1.5 (from streamlit)
  Downloading watchdog-6.0.0-py3-none-manylinux2014_x86_64.whl.metadata (44 kB)
Collecting pydeck<1,>=0.8.0b4 (from streamlit)
  Downloading pydeck-0.9.1-py2.py3-none-any.whl.metadata (4.1 kB)
Downloading streamlit-1.43.1-py2.py3-none-any.whl (9.7 MB)
Downloading pyngrok-7.2.3-py3-none-any.whl (23 kB)
Downloading pydeck-0.9.1-py2.py3-none-any.whl (6.9 MB)
Downloading watchdog-6.0.0-py3-none-manylinux2014_x86_64

In [11]:
!streamlit run app.py &> logs.txt &

In [13]:
!ngrok authtoken YOUR_NGROK_AUTHTOKEN

Authtoken saved to configuration file: /root/.config/ngrok/ngrok.yml


In [20]:
%%writefile app.py
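import streamlit as st
import google.generativeai as genai
import json
import os
import re

#  Configure Google Gemini API
#  The key is read from an environment variable instead of being hardcoded.
#  The variable name GEMINI_API_KEY is an assumption; use whatever name you set.
API_KEY = os.environ.get("GEMINI_API_KEY", "")
if not API_KEY:
    st.error("Set the GEMINI_API_KEY environment variable before running the app.")
    st.stop()
genai.configure(api_key=API_KEY)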

#  Function to clean JSON response
def clean_json_response(response_text):
    cleaned_text = re.sub(r"```json\s*|\s*```", "", response_text).strip()
    try:
        return json.loads(cleaned_text)
    except json.JSONDecodeError:
        return {"error": "Invalid JSON format", "response": cleaned_text}

#  Medical NLP Processing Functions
def medical_nlp_summarization(text):
    prompt = f"""
    Extract structured medical details in JSON format:
    {{
      "Patient_Name": "Full Name",
      "Symptoms": ["Symptom1", "Symptom2"],
      "Diagnosis": "Medical Diagnosis",
      "Treatment": ["Treatment1", "Treatment2"],
      "Current_Status": "Patient's Current Condition",
      "Prognosis": "Expected future outcome"
    }}

    Conversation:
    {text}
    """
    model = genai.GenerativeModel("gemini-1.5-pro")
    response = model.generate_content(prompt)
    return clean_json_response(response.text)

def sentiment_intent_analysis(text):
    prompt = f"""
    Analyze sentiment & intent in JSON:
    {{
      "Sentiment": "Emotion of the patient",
      "Intent": "Purpose of the conversation"
    }}

    Conversation:
    {text}
    """
    model = genai.GenerativeModel("gemini-1.5-pro")
    response = model.generate_content(prompt)
    return clean_json_response(response.text)

def generate_soap_notes(text):
    prompt = f"""
    Extract SOAP notes in JSON format:
    {{
      "Subjective": {{"Chief_Complaint": "Primary issue", "History_of_Present_Illness": "Condition details"}},
      "Objective": {{"Physical_Exam": "Findings", "Observations": "General observations"}},
      "Assessment": {{"Diagnosis": "Condition", "Severity": "Severity level"}},
      "Plan": {{"Treatment": "Recommended care", "Follow-Up": "Next steps"}}
    }}

    Conversation:
    {text}
    """
    model = genai.GenerativeModel("gemini-1.5-pro")
    response = model.generate_content(prompt)
    return clean_json_response(response.text)

#  Streamlit UI
st.title("🩺 Medical NLP Pipeline")
st.write("Upload a **.txt file** with a physician-patient conversation to extract structured details.")

uploaded_file = st.file_uploader("Upload a text file", type=["txt"])

if uploaded_file is not None:
    conversation_text = uploaded_file.read().decode("utf-8")

    # ✅ Display extracted text
    st.subheader("📝 Extracted Text")
    st.text_area("Raw Conversation", conversation_text, height=300)

    # ✅ Run API Calls & Show JSON Outputs
    with st.spinner("Extracting Medical Details..."):
        summary_output = medical_nlp_summarization(conversation_text)

    with st.spinner("Analyzing Sentiment & Intent..."):
        sentiment_output = sentiment_intent_analysis(conversation_text)

    with st.spinner("Generating SOAP Notes..."):
        soap_output = generate_soap_notes(conversation_text)

    #  Display JSON Outputs Properly
    st.subheader("📌 Medical NLP Summary")
    st.json(summary_output)

    st.subheader("💡 Sentiment & Intent")
    st.json(sentiment_output)

    st.subheader("📋 SOAP Notes")
    st.json(soap_output)

    #  Provide Download Options
    st.download_button("⬇ Download Summary", json.dumps(summary_output, indent=2), "summary.json", "application/json")
    st.download_button("⬇ Download Sentiment & Intent", json.dumps(sentiment_output, indent=2), "sentiment.json", "application/json")
    st.download_button("⬇ Download SOAP Notes", json.dumps(soap_output, indent=2), "soap_notes.json", "application/json")

Overwriting app.py
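
The summarization prompt in `app.py` asks for six fields, but the model is not guaranteed to return all of them. A minimal sketch of a completeness check follows, assuming the parsed reply is a dictionary. The field names come from the prompt template; `REQUIRED_SUMMARY_FIELDS` and `missing_summary_fields` are hypothetical names that are not part of the app.

```python
# Field names taken from the JSON template in the summarization prompt.
REQUIRED_SUMMARY_FIELDS = (
    "Patient_Name",
    "Symptoms",
    "Diagnosis",
    "Treatment",
    "Current_Status",
    "Prognosis",
)


def missing_summary_fields(summary: dict) -> list:
    """Return the required keys that are absent or empty in the model output."""
    return [key for key in REQUIRED_SUMMARY_FIELDS if not summary.get(key)]


missing = missing_summary_fields(
    {"Patient_Name": "Jane Doe", "Symptoms": [], "Diagnosis": "Flu"}
)
```

A non-empty result could be shown as a warning next to the JSON output so that incomplete extractions are not mistaken for complete ones.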


In [21]:
from pyngrok import ngrok

# Open the tunnel using 'addr' instead of 'port'
public_url = ngrok.connect(addr="8501", proto="http")

print(f"🚀 Streamlit is live at: {public_url}")

🚀 Streamlit is live at: NgrokTunnel: "https://c3a3-34-138-156-203.ngrok-free.app" -> "http://localhost:8501"
