In [1]:
pip install langchain_community

Collecting langchain_community
  Downloading langchain_community-0.3.16-py3-none-any.whl.metadata (2.9 kB)
Collecting dataclasses-json<0.7,>=0.5.7 (from langchain_community)
  Downloading dataclasses_json-0.6.7-py3-none-any.whl.metadata (25 kB)
Collecting httpx-sse<0.5.0,>=0.4.0 (from langchain_community)
  Downloading httpx_sse-0.4.0-py3-none-any.whl.metadata (9.0 kB)
Collecting langchain<0.4.0,>=0.3.16 (from langchain_community)
  Downloading langchain-0.3.17-py3-none-any.whl.metadata (7.1 kB)
Collecting langchain-core<0.4.0,>=0.3.32 (from langchain_community)
  Downloading langchain_core-0.3.33-py3-none-any.whl.metadata (6.3 kB)
Collecting langsmith<0.4,>=0.1.125 (from langchain_community)
  Downloading langsmith-0.3.2-py3-none-any.whl.metadata (14 kB)
Collecting pydantic-settings<3.0.0,>=2.4.0 (from langchain_community)
  Downloading pydantic_settings-2.7.1-py3-none-any.whl.metadata (3.5 kB)
Collecting marshmallow<4.0.0,>=3.18.0 (from dataclasses-json<0.7,>=0.5.7->langchain_communi

In [None]:
pip install ollama

# Examples

# Chain of thought

In [56]:
import ollama

# Define a function to create a medical AI agent with Chain of Thought reasoning and print reasoning steps
def create_medical_ai_agent_with_chain_of_thought(question, knowledge_base):
    # Prompt to guide DeepSeek AI through a step-by-step chain of thought for reasoning
    prompt = f"""
    You are a medical AI assistant. Your task is to answer medical questions by reasoning through them step by step. You should refer to the following knowledge base to help you reason through the answer.

    Knowledge base:
    {knowledge_base}

    The user has the following question: "{question}"

    Please reason through the answer carefully, explaining your thought process step by step before arriving at a conclusion.

    Begin by explaining the key symptoms, followed by possible treatments or management, and conclude with a final recommendation.
    """
    
    response = ollama.chat(model="deepseek-r1:1.5b", messages=[
        {"role": "user", "content": prompt}
    ])
    
    reasoning = response['message']['content']
    
    # Print each reasoning step separately
    print("Reasoning Steps:")
    steps = reasoning.split("Step")  # Split reasoning into steps based on the word "Step"
    
    for i, step in enumerate(steps):
        if step.strip():  # Avoid empty steps
            print(f"Step {i}: {step.strip()}")
    
    return reasoning

# Example usage: Medical AI agent answering a user query with Chain of Thought reasoning
medical_knowledge_base = """
- The human body has several organ systems, such as the cardiovascular, respiratory, and digestive systems.
- Hypertension (high blood pressure) can lead to serious health complications if untreated.
- Type 2 diabetes is a condition in which the body does not properly process glucose, leading to high blood sugar levels.
"""

user_question = "What are the symptoms of diabetes and how can it be managed?"
medical_ai_response = create_medical_ai_agent_with_chain_of_thought(user_question, medical_knowledge_base)

# Print the full response (which includes reasoning steps)
print("\nFull Response:")
print(medical_ai_response)


Reasoning Steps:
Step 0: <think>
Okay, so I need to figure out the symptoms of diabetes and how it can be managed. Let me start by recalling what I know about diabetes from my textbook.

Diabetes is a chronic condition where the body doesn't process sugar properly. There are two main types: type 1 (ketoseketic) and type 2 (glucose-glucagon imbalance). Type 2 is more common, especially in people with existing kidney problems or heart conditions.

Symptoms of diabetes include high blood pressure, irregular blood sugar levels, weight loss, fatigue, and difficulty breathing. It can also lead to severe complications like heart disease, kidney issues, and even cancer in some cases.

For treatment, lifestyle changes seem crucial. Things like exercise, diet control (more vegetables and fruits instead of sugary carbs), eating bland foods with lots of fiber, avoiding sugar intake, and staying hydrated are all part of managing diabetes. Medications include insulin for type 2 diabetes, blood press
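
The output above starts with a `<think>` tag because deepseek-r1 emits its reasoning inside a `<think>...</think>` block before the final answer, so splitting on the word "Step" puts the whole reasoning into one chunk. A minimal sketch of separating the two parts with a regular expression follows; `split_reasoning` and the sample string are hypothetical names and data used only for illustration.

```python
import re


def split_reasoning(content: str) -> tuple[str, str]:
    """Return (reasoning, answer) from a response that may contain a <think> block."""
    match = re.search(r"<think>(.*?)</think>", content, flags=re.DOTALL)
    if match is None:
        # No complete <think> block: treat the whole text as the answer.
        return "", content.strip()
    reasoning = match.group(1).strip()
    # Everything outside the <think> block is the final answer.
    answer = (content[:match.start()] + content[match.end():]).strip()
    return reasoning, answer


# Made-up sample text, not real model output.
sample = "<think>\nRecall the common symptoms first.\n</think>\n\nCommon symptoms include increased thirst."
reasoning, answer = split_reasoning(sample)
print("Reasoning:", reasoning)
print("Answer:", answer)
```

In the notebook this could be called as `split_reasoning(response['message']['content'])`, assuming the model closes its `<think>` block.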

# Patient dataframe

In [57]:
import requests

# Load the JSON file from the provided GitHub URL
url = "https://raw.githubusercontent.com/buithehai1994/EHR_data/refs/heads/main/gemini_embbedings/embedding/combined_embeddings_part_1.json"
response = requests.get(url)

# Check if the request was successful
if response.status_code == 200:
    combined_embeddings_part_1 = response.json()
else:
    # Stop here: the following cells cannot run without the embeddings
    raise RuntimeError(f"Failed to load the file. Status code: {response.status_code}")


In [58]:
import pandas as pd
# Convert the embedding dictionary to a DataFrame
embeddings_df = pd.DataFrame.from_dict(combined_embeddings_part_1, orient='index', columns=['explanation_embedding', 'symptom_embedding'])


In [59]:
import requests
# Load the JSON file containing the text data from the provided GitHub URL
url = "https://raw.githubusercontent.com/buithehai1994/EHR_data/refs/heads/main/gemini_embbedings/text/combined_text_part_1.json"
response = requests.get(url)

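# Check if the request was successful
if response.status_code == 200:
    text_data = response.json()
    
    # Convert the text data into a DataFrame
    text_df = pd.DataFrame.from_dict(text_data, orient='index')
else:
    # Without this branch a failed download only surfaces later as an undefined text_df
    raise RuntimeError(f"Failed to load the file. Status code: {response.status_code}")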

In [60]:
df = pd.concat([text_df[['explanation_text', 'symptom_text']], embeddings_df], axis=1)

In [61]:
df

Unnamed: 0,explanation_text,symptom_text,explanation_embedding,symptom_embedding
0,"Contracture, right hand.",The patient has a history of right hand injury...,"[-0.03271866, 0.071765654, -0.0060667074, -0.0...","[-0.009817047, 0.012372132, -0.012895934, -0.0..."
1,"Contracture, left hand.",The patient has a history of left hand injury ...,"[-0.023690319, 0.058072004, -0.006365649, -0.0...","[-0.030821104, 0.04130737, -0.005375788, -0.00..."
2,"Contracture, right knee.",The patient has a history of osteoarthritis in...,"[-0.011912893, 0.05268591, 0.04313283, -0.0315...","[0.009699812, -0.014461355, 0.0044916132, -0.0..."
3,"Ankylosis, left hip.",The patient has a history of ankylosing spondy...,"[0.026297545, -0.008600006, -0.0055727954, -0....","[0.010980713, -0.009944922, -0.033766493, 0.00..."
4,"Ankylosis, right knee.",The patient has a history of osteoarthritis in...,"[0.0062069264, 0.0075428705, 0.009734267, -0.0...","[0.028421467, -0.030635223, 0.0066009145, -0.0..."
...,...,...,...,...
95,Cervical disc disorder at C6-C7 level with rad...,The patient has a history of chronic neck pain...,"[0.027520735, -0.037193526, -0.03875883, 0.002...","[0.03314402, -0.043936115, -0.020325521, 0.016..."
96,"Other cervical disc displacement, unspecified ...",The patient has a history of chronic neck pain...,"[0.08419407, 0.017211054, -0.04339267, -0.0150...","[0.07503162, -0.03481984, -0.014823512, 0.0064..."
97,Other cervical disc displacement at C4-C5 level.,The patient has a history of chronic neck pain...,"[0.08260568, 0.033038124, -0.032680806, 0.0073...","[0.030560875, -0.029258406, -0.031536162, 0.03..."
98,"Other cervical disc degeneration, unspecified ...",The patient has a history of occasional neck p...,"[0.07661236, 0.022201793, -0.017057983, -0.008...","[0.06458364, -0.0045670103, -0.002989473, -0.0..."


In [62]:
df_copy=df.copy()

In [63]:
# Input symptom to find related explanations
input_symptom = "I have been experiencing tightness and stiffness in my right knee, particularly when I try to bend or straighten my leg. \
The area around my knee feels stiff, and I notice a loss of flexibility when walking or climbing stairs. I sometimes feel a pulling sensation, \
and my knee feels weak, making it difficult to fully extend or flex my leg. Additionally, I’ve noticed some mild swelling and discomfort after \
prolonged sitting or standing. I’m concerned this could be a sign of a more serious condition, \
and I’m looking for possible explanations and treatments that could help alleviate the discomfort and improve my knee's mobility."


In [64]:
import json
import requests

# Function to load the API key from the api.json file
def load_api_key():
    with open('api.json', 'r') as file:
        config = json.load(file)
        return config.get('API_KEY')

# Retrieve the API key from the JSON file
api_key = load_api_key()
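
Keeping the key in a JSON file works, but an environment variable avoids committing the key by accident. The sketch below checks an environment variable first and falls back to the file. The function name `load_api_key_from_env_or_file` and the default variable name `GEMINI_API_KEY` are assumptions made for this example, not something the notebook defines.

```python
import json
import os


def load_api_key_from_env_or_file(env_var: str = "GEMINI_API_KEY",
                                  path: str = "api.json") -> str:
    """Return the API key from the environment, falling back to a JSON file."""
    key = os.environ.get(env_var)
    if key:
        return key
    try:
        with open(path, "r") as file:
            key = json.load(file).get("API_KEY")
    except FileNotFoundError:
        key = None
    if not key:
        # Fail early with a clear message instead of passing None to the API client.
        raise RuntimeError(f"No API key found in ${env_var} or {path}")
    return key
```

Calling `api_key = load_api_key_from_env_or_file()` would then replace the call above.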

In [65]:
import google.generativeai as genai
import os
genai.configure(api_key=api_key) 

In [66]:
input_symptom_embedding = genai.embed_content(
    model="models/text-embedding-004",
    content=[input_symptom]  # Pass the symptom response as a list of texts
)
input_symptom_embedding = input_symptom_embedding['embedding'][0]

In [67]:
import numpy as np
# Convert 'symptom_embedding' to a list of numpy arrays for cosine similarity calculation
df_copy['symptom_embedding'] = df_copy['symptom_embedding'].apply(np.array)

In [68]:
from sklearn.metrics.pairwise import cosine_similarity

# Compute cosine similarity between input symptom embedding and each row in the DataFrame
similarities = cosine_similarity([input_symptom_embedding], df_copy['symptom_embedding'].tolist())[0]

# Find the top 5 indices based on highest cosine similarity
top_indices = np.argsort(similarities)[-5:][::-1]  # Sort in descending order to get top 5 similarities

# Retrieve the top 5 most similar rows from the DataFrame
top_5_similar_explanations = df_copy.iloc[top_indices]

In [69]:
top_5_similar_explanations

Unnamed: 0,explanation_text,symptom_text,explanation_embedding,symptom_embedding
23,Pain in right knee.,The patient has a history of occasional knee p...,"[0.029045172, -0.032945734, 0.005549423, -0.06...","[0.015254361, -0.07659199, -0.008384538, -0.03..."
2,"Contracture, right knee.",The patient has a history of osteoarthritis in...,"[-0.011912893, 0.05268591, 0.04313283, -0.0315...","[0.009699812, -0.014461355, 0.0044916132, -0.0..."
4,"Ankylosis, right knee.",The patient has a history of osteoarthritis in...,"[0.0062069264, 0.0075428705, 0.009734267, -0.0...","[0.028421467, -0.030635223, 0.0066009145, -0.0..."
10,"Effusion, right knee.",The patient has a history of osteoarthritis in...,"[0.010197979, -0.03559391, -0.015860373, -0.03...","[0.04640463, -0.09728943, -0.013180755, -0.016..."
11,"Effusion, left knee.",The patient has a history of osteoarthritis in...,"[0.013522673, -0.052507658, -0.013048703, -0.0...","[0.052097745, -0.08051235, 0.011335841, 0.0077..."
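
To make the ranking step explicit, here is a small NumPy-only sketch of what the similarity search computes: the cosine of the angle between the query vector and each stored vector, followed by a descending sort. `top_k_cosine` and the toy 2-D vectors are hypothetical. The sketch assumes no vector has zero length.

```python
import numpy as np


def top_k_cosine(query, matrix, k=5):
    """Return (indices, scores) of the k rows of `matrix` most similar to `query`."""
    query = np.asarray(query, dtype=float)
    matrix = np.asarray(matrix, dtype=float)
    # Cosine similarity = dot product divided by the product of the vector norms.
    scores = matrix @ query / (np.linalg.norm(matrix, axis=1) * np.linalg.norm(query))
    # argsort is ascending, so reverse it and keep the first k entries.
    order = np.argsort(scores)[::-1][:k]
    return order, scores[order]


# Toy data: four 2-D "embeddings" and a query pointing along the first axis.
toy_matrix = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0]])
indices, scores = top_k_cosine(np.array([1.0, 0.0]), toy_matrix, k=2)
print(indices, scores)
```

With the notebook's data, `matrix` would be `np.stack(df_copy['symptom_embedding'].values)` and the returned indices could be passed to `df_copy.iloc`.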


# Summarize insights with DeepSeek AI

In [70]:
insights_prompt = f"""
Given the following input symptom and the top 5 most similar explanations from a medical database, summarize key insights about the patient's condition, possible causes, and treatment suggestions:

### Input Symptom:
{input_symptom}

### Top 5 Similar Explanations:
1. Explanation: {top_5_similar_explanations.iloc[0]['explanation_text']}
   Symptom: {top_5_similar_explanations.iloc[0]['symptom_text']}
2. Explanation: {top_5_similar_explanations.iloc[1]['explanation_text']}
   Symptom: {top_5_similar_explanations.iloc[1]['symptom_text']}
3. Explanation: {top_5_similar_explanations.iloc[2]['explanation_text']}
   Symptom: {top_5_similar_explanations.iloc[2]['symptom_text']}
4. Explanation: {top_5_similar_explanations.iloc[3]['explanation_text']}
   Symptom: {top_5_similar_explanations.iloc[3]['symptom_text']}
5. Explanation: {top_5_similar_explanations.iloc[4]['explanation_text']}
   Symptom: {top_5_similar_explanations.iloc[4]['symptom_text']}

### Insights:
Summarize the patient's condition based on the similarities and suggest possible medical risks or treatments.
"""

In [71]:
import requests
import json

# Endpoint URL
url = f"https://generativelanguage.googleapis.com/v1beta/models/gemini-1.5-flash-latest:generateContent?key={api_key}"

# Function to interact with the Gemini API using the predefined insights prompt
def chat(insights_prompt):
    # Payload for the API request
    payload = {
        "contents": [
            {
                "parts": [
                    {"text": insights_prompt}
                ]
            }
        ]
    }

    # Headers for the request
    headers = {
        "Content-Type": "application/json"
    }

    # Send the request
    response = requests.post(url, headers=headers, data=json.dumps(payload))

    # Process and display the response
    if response.status_code == 200:
        response_json = response.json()
        candidates = response_json.get("candidates", [])
        if candidates:
            content_parts = candidates[0].get("content", {}).get("parts", [])
            if content_parts:
                for part in content_parts:
                    text = part.get("text", "")
                    formatted_text = text.replace("**", "")
                    print(formatted_text)
                    print("\n" + "-" * 80 + "\n")  # Add separators for readability
            else:
                print("No content parts found.")
        else:
            print("No candidates found in the response.")
    else:
        print(f"Error: {response.status_code}")
        print("Message:", response.text)
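
The function above prints the reply directly, which makes it hard to reuse the text in later cells. A possible variant, sketched here, returns the text instead. It relies only on the `candidates` / `content` / `parts` / `text` structure that the code above already reads. `extract_gemini_text` and the sample payload are hypothetical.

```python
def extract_gemini_text(response_json: dict) -> str:
    """Join the text parts of the first candidate, or return '' if none exist."""
    candidates = response_json.get("candidates", [])
    if not candidates:
        return ""
    parts = candidates[0].get("content", {}).get("parts", [])
    # Strip the markdown bold markers, as the printing version does.
    return "\n".join(part.get("text", "") for part in parts).replace("**", "")


# Hand-written payload with the same shape as the JSON parsed above.
sample_response = {
    "candidates": [
        {"content": {"parts": [{"text": "**Summary:** knee stiffness"},
                               {"text": "Next steps"}]}}
    ]
}
print(extract_gemini_text(sample_response))
```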


In [72]:
# Define input symptom and generate its embedding
input_symptom = "I have been experiencing tightness and stiffness in my right knee..."
input_symptom_embedding = genai.embed_content(
    model="models/text-embedding-004",
    content=[input_symptom]  # Pass as a list
)

# Extract the actual embedding vector
input_embedding = np.array(input_symptom_embedding['embedding'][0])

# Compute cosine similarity
similarities = cosine_similarity([input_embedding], np.stack(df_copy["symptom_embedding"].values))[0]

# Get the top 5 most similar explanations
top_5_indices = np.argsort(similarities)[::-1][:5]  
top_5_similar_explanations = df_copy.iloc[top_5_indices]

# Print results
top_5_similar_explanations[["explanation_text", "symptom_text"]]

Unnamed: 0,explanation_text,symptom_text
23,Pain in right knee.,The patient has a history of occasional knee p...
2,"Contracture, right knee.",The patient has a history of osteoarthritis in...
4,"Ankylosis, right knee.",The patient has a history of osteoarthritis in...
10,"Effusion, right knee.",The patient has a history of osteoarthritis in...
11,"Effusion, left knee.",The patient has a history of osteoarthritis in...
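
The insights prompt used throughout this notebook repeats one pair of lines per match, hard-coded for exactly five rows. A loop produces the same layout for any number of matches. This is only a sketch: `build_insights_prompt` is a hypothetical helper, and the symptom strings in the example are placeholders rather than rows from the dataset.

```python
def build_insights_prompt(input_symptom: str, matches: list) -> str:
    """Build the insights prompt from (explanation_text, symptom_text) pairs."""
    lines = [
        "### Input Symptom:",
        input_symptom,
        "",
        f"### Top {len(matches)} Similar Explanations:",
    ]
    for rank, (explanation, symptom) in enumerate(matches, start=1):
        lines.append(f"{rank}. Explanation: {explanation}")
        lines.append(f"   Symptom: {symptom}")
    lines += [
        "",
        "### Insights:",
        "Summarize the patient's condition based on the similarities "
        "and suggest possible medical risks or treatments.",
    ]
    return "\n".join(lines)


# Placeholder symptom texts; the real ones come from the DataFrame.
example_matches = [
    ("Pain in right knee.", "placeholder symptom text A"),
    ("Contracture, right knee.", "placeholder symptom text B"),
]
prompt = build_insights_prompt("Stiffness in my right knee.", example_matches)
print(prompt)
```

With the DataFrame from this notebook, the pairs could be built as `list(zip(top_5_similar_explanations['explanation_text'], top_5_similar_explanations['symptom_text']))`.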


In [73]:
# Proceed to generate insights using DeepSeek AI for reasoning
insights_prompt = f"""
### Input Symptom:
{input_symptom}

### Top 5 Similar Explanations:
1. Explanation: {top_5_similar_explanations.iloc[0]['explanation_text']}
   Symptom: {top_5_similar_explanations.iloc[0]['symptom_text']}
2. Explanation: {top_5_similar_explanations.iloc[1]['explanation_text']}
   Symptom: {top_5_similar_explanations.iloc[1]['symptom_text']}
3. Explanation: {top_5_similar_explanations.iloc[2]['explanation_text']}
   Symptom: {top_5_similar_explanations.iloc[2]['symptom_text']}
4. Explanation: {top_5_similar_explanations.iloc[3]['explanation_text']}
   Symptom: {top_5_similar_explanations.iloc[3]['symptom_text']}
5. Explanation: {top_5_similar_explanations.iloc[4]['explanation_text']}
   Symptom: {top_5_similar_explanations.iloc[4]['symptom_text']}

### Insights:
Summarize the patient's condition based on the similarities and suggest possible medical risks or treatments.
"""

# Use DeepSeek for reasoning
response = ollama.chat(model="deepseek-r1:1.5b", messages=[{"role": "user", "content": insights_prompt}])

# Print response
print(response)

model='deepseek-r1:1.5b' created_at='2025-01-30T21:49:46.4237434Z' done=True done_reason='stop' total_duration=239407490300 load_duration=315109700 prompt_eval_count=904 prompt_eval_duration=79931000000 eval_count=979 eval_duration=159149000000 message=Message(role='assistant', content='<think>\nAlright, let me tackle this query step by step. The user has provided a detailed scenario about someone experiencing tightness and stiffness in their right knee. They\'ve also given five possible explanations for this symptom, each with different medical approaches and outcomes.\n\nFirst, I need to analyze the symptoms and the top five explanations provided. It seems like all these explanations are related to the same underlying issue—knee pain or swelling—but they\'re categorized under different diagnoses: pain in knee, contracture, ankylosis, effusion, or muscle weakness.\n\nThe patient is female from the Caucasian population. That\'s useful for narrowing down possible conditions, especially 

In [74]:
print(response['message']['content'])

<think>
Alright, let me tackle this query step by step. The user has provided a detailed scenario about someone experiencing tightness and stiffness in their right knee. They've also given five possible explanations for this symptom, each with different medical approaches and outcomes.

First, I need to analyze the symptoms and the top five explanations provided. It seems like all these explanations are related to the same underlying issue—knee pain or swelling—but they're categorized under different diagnoses: pain in knee, contracture, ankylosis, effusion, or muscle weakness.

The patient is female from the Caucasian population. That's useful for narrowing down possible conditions, especially if there were any additional symptoms mentioned but it seems like the main issue is pain/condition with physical examination findings.

Looking at the possible causes:

1. **Pain in knee**: This could be osteoarthritis or a contracture, as both involve joint inflammation leading to discomfort.
2

# LangChain and Gemini

In [75]:
import requests
import json

# Endpoint URL for Gemini API
url = f"https://generativelanguage.googleapis.com/v1beta/models/gemini-1.5-flash-latest:generateContent?key={api_key}"

# Function to interact with the Gemini API using the predefined insights prompt
def chat_with_gemini(insights_prompt):
    # Payload for the API request
    payload = {
        "contents": [
            {
                "parts": [
                    {"text": insights_prompt}
                ]
            }
        ]
    }

    # Headers for the request
    headers = {
        "Content-Type": "application/json"
    }

    # Send the request to the Gemini API
    response = requests.post(url, headers=headers, data=json.dumps(payload))

    # Process and display the response
    if response.status_code == 200:
        response_json = response.json()
        candidates = response_json.get("candidates", [])
        if candidates:
            content_parts = candidates[0].get("content", {}).get("parts", [])
            if content_parts:
                for part in content_parts:
                    text = part.get("text", "")
                    formatted_text = text.replace("**", "")
                    print("Gemini Response:\n", formatted_text)
                    print("\n" + "-" * 80 + "\n")  # Add separators for readability
            else:
                print("No content parts found in Gemini response.")
        else:
            print("No candidates found in Gemini response.")
    else:
        print(f"Error: {response.status_code}")
        print("Message:", response.text)

# Define the insights prompt based on your user's input
user_input = "What are the possible causes and treatments for knee pain?"

insights_prompt = f"""
The user has asked the following medical question:
"{user_input}"

You are given a medical database with symptoms and explanations, as well as embeddings that represent the content in numerical form. The embeddings help capture the meaning of the content, which will allow for better matching and retrieval of relevant explanations based on similarity. These embeddings are not explicitly required for you to use, but they provide additional context for finding the most relevant information.

Here is the dataset with symptoms, explanations, and their corresponding embeddings:
{df[['symptom_text', 'explanation_text', 'symptom_embedding', 'explanation_embedding']].to_string(index=False)}

Embedding Generation:
You can also generate embeddings for any query using the code `genai.embed_content(model="models/text-embedding-004", content=[user_input])`. This will create an embedding for the user's query, which can be used to compare similarity with the embeddings in the dataset.

### Reasoning:
Your task is to:
1. Analyze the user's query and decide the next steps.
2. Search the provided dataset to find the top 5 most relevant explanations that match the user's query.
3. Summarize the key insights from these explanations, including possible causes, potential treatments, and next steps for the user.
"""

# Call Gemini API to compare the response
chat_with_gemini(insights_prompt)


Gemini Response:
 The user's query "What are the possible causes and treatments for knee pain?" is broad.  To provide a helpful response, we need to focus on knee-specific causes and treatments from the provided dataset.  The dataset, however, primarily contains information about hand, wrist, shoulder, hip, and other joint problems, with only a few entries related to the knee.  Therefore, a direct "top 5 most relevant explanations" approach is insufficient.  We will instead extract all knee-related information and synthesize a response.

Knee-related information from the dataset:

There are three entries related to the knee:

* Entry 1 (Contracture, right knee): This entry discusses osteoarthritis as a cause of knee contracture (stiffness and limited movement). Treatment involved physical therapy and NSAIDs.  The patient responded well.

* Entry 2 (Ankylosis, right knee):  This describes osteoarthritis progressing to ankylosis (complete bony fusion) of the right knee.  Treatment includ

# Gemini Reasoning Model

A reasoning model handles only one part of the task: applying logic to the data it is given and producing a decision. LangChain, by contrast, is a toolkit for building the complete workflow. It connects data sources, large language models (LLMs), and reasoning models, so the reasoning model becomes one step in a larger process that covers data collection, analysis, decision-making, and presenting the results in an actionable, readable form.
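
The idea of the reasoning model as one step in a larger workflow can be sketched in plain Python. This is not LangChain's API: `run_workflow`, `stub_retrieve`, and `stub_reason` are hypothetical stand-ins, with the stubs taking the place of the embedding search and the model call.

```python
from typing import Callable, List


def run_workflow(question: str,
                 retrieve: Callable[[str], List[str]],
                 reason: Callable[[str], str]) -> str:
    """Chain retrieval, prompt assembly, and reasoning into one call."""
    # 1. Data collection: fetch context relevant to the question.
    context = retrieve(question)
    # 2. Prompt assembly: combine the context and the question.
    prompt = ("Context:\n"
              + "\n".join(f"- {item}" for item in context)
              + f"\n\nQuestion: {question}")
    # 3. Reasoning: hand the assembled prompt to the model.
    return reason(prompt)


def stub_retrieve(question: str) -> List[str]:
    # Stand-in for the cosine-similarity search over the DataFrame.
    return ["Contracture, right knee.", "Effusion, right knee."]


def stub_reason(prompt: str) -> str:
    # Stand-in for a call to a reasoning model.
    return f"(stub) received {len(prompt.splitlines())} prompt lines"


result = run_workflow("What could cause knee stiffness?", stub_retrieve, stub_reason)
print(result)
```

Because each step is passed in as a function, swapping the stub for a real retriever or a real model call leaves the workflow itself unchanged.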

In [81]:
pip install google-genai

Collecting google-genai
  Downloading google_genai-0.7.0-py3-none-any.whl.metadata (23 kB)
Downloading google_genai-0.7.0-py3-none-any.whl (122 kB)
Installing collected packages: google-genai
Successfully installed google-genai-0.7.0
Note: you may need to restart the kernel to use updated packages.


In [84]:
from google.genai import Client

# Construct the prompt for insights using the symptom data and top explanations
insights_prompt = f"""
### Input Symptom:
{input_symptom}

### Top 5 Similar Explanations:
1. Explanation: {top_5_similar_explanations.iloc[0]['explanation_text']}
   Symptom: {top_5_similar_explanations.iloc[0]['symptom_text']}
2. Explanation: {top_5_similar_explanations.iloc[1]['explanation_text']}
   Symptom: {top_5_similar_explanations.iloc[1]['symptom_text']}
3. Explanation: {top_5_similar_explanations.iloc[2]['explanation_text']}
   Symptom: {top_5_similar_explanations.iloc[2]['symptom_text']}
4. Explanation: {top_5_similar_explanations.iloc[3]['explanation_text']}
   Symptom: {top_5_similar_explanations.iloc[3]['symptom_text']}
5. Explanation: {top_5_similar_explanations.iloc[4]['explanation_text']}
   Symptom: {top_5_similar_explanations.iloc[4]['symptom_text']}

### Insights:
Summarize the patient's condition based on the similarities and suggest possible medical risks or treatments.
"""

# Set up your configuration (if you need to include thoughts)
config = {'thinking_config': {'include_thoughts': True}}

# Initialize the client with your API key
client = Client(api_key=api_key, http_options={'api_version': 'v1alpha'})

# Generate content from the model (Gemini model in this case)
response = client.models.generate_content(
    model='gemini-2.0-flash-thinking-exp',
    contents=insights_prompt,
    config=config
)

# Print the first part of the response. With include_thoughts enabled this may be the
# thinking process; in the run below it is the final insights.
print(response.candidates[0].content.parts[0].text)

### Insights:
Based on the top 5 similar explanations provided, here's a summary of the patient's condition and possible medical risks or treatments:

**Summary of Patient's Condition:**

The patient is experiencing **tightness and stiffness in their right knee**, which aligns with symptoms described in all the provided explanations.  The common threads across the similar cases are:

* **Right Knee Involvement:**  Four out of five examples specifically mention right knee issues, suggesting the patient's symptom location is relevant and a common presentation.
* **Stiffness and Limited Range of Motion:**  Terms like "stiffness," "limited range of motion," and "contracture" are frequently used, directly mirroring the patient's input symptom.
* **Potential Underlying Osteoarthritis:**  Many of the examples mention a history of osteoarthritis in the knee, suggesting this could be a possible underlying condition contributing to the patient's symptoms.
* **Pain as an Associated Symptom:** Whi
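
Only the first part of the response is printed above. When thoughts are included, the response may contain several parts, and it can help to separate them. The sketch below assumes that each part exposes a boolean `thought` attribute next to `text`; check this against the installed google-genai version. `split_thought_parts` is a hypothetical helper, and the example uses stand-in objects rather than a real API response.

```python
from types import SimpleNamespace


def split_thought_parts(parts):
    """Return (thoughts, answer) by grouping parts on their `thought` flag."""
    thoughts, answers = [], []
    for part in parts:
        text = getattr(part, "text", None)
        if not text:
            continue  # Skip parts without text.
        # Assumption: thinking parts carry thought=True; other parts do not.
        if getattr(part, "thought", False):
            thoughts.append(text)
        else:
            answers.append(text)
    return "\n".join(thoughts), "\n".join(answers)


# Stand-in objects that mimic the assumed shape of response parts.
fake_parts = [
    SimpleNamespace(text="Compare the five matches.", thought=True),
    SimpleNamespace(text="### Insights: ...", thought=False),
]
thoughts, final_answer = split_thought_parts(fake_parts)
print("Thoughts:", thoughts)
print("Answer:", final_answer)
```

With a real response, the argument would be `response.candidates[0].content.parts`.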

# Using DeepSeek AI

In [51]:
df_copy.head()

Unnamed: 0,explanation_text,symptom_text,explanation_embedding,symptom_embedding
0,"Contracture, right hand.",The patient has a history of right hand injury...,"[-0.03271866, 0.071765654, -0.0060667074, -0.0...","[-0.009817047, 0.012372132, -0.012895934, -0.0..."
1,"Contracture, left hand.",The patient has a history of left hand injury ...,"[-0.023690319, 0.058072004, -0.006365649, -0.0...","[-0.030821104, 0.04130737, -0.005375788, -0.00..."
2,"Contracture, right knee.",The patient has a history of osteoarthritis in...,"[-0.011912893, 0.05268591, 0.04313283, -0.0315...","[0.009699812, -0.014461355, 0.0044916132, -0.0..."
3,"Ankylosis, left hip.",The patient has a history of ankylosing spondy...,"[0.026297545, -0.008600006, -0.0055727954, -0....","[0.010980713, -0.009944922, -0.033766493, 0.00..."
4,"Ankylosis, right knee.",The patient has a history of osteoarthritis in...,"[0.0062069264, 0.0075428705, 0.009734267, -0.0...","[0.028421467, -0.030635223, 0.0066009145, -0.0..."


In [52]:
import numpy as np
import pandas as pd
import json

# Function to safely convert embeddings
def safe_convert_embedding(embedding):
    if isinstance(embedding, str):  # Convert only if it's a string
        return np.array(json.loads(embedding))
    elif isinstance(embedding, list):  # Already a list, just convert to NumPy array
        return np.array(embedding)
    elif isinstance(embedding, np.ndarray):  # Already a NumPy array
        return embedding
    else:
        raise TypeError(f"Unexpected embedding format: {type(embedding)}")

# Apply conversion to the dataframe
df_copy["explanation_embedding"] = df_copy["explanation_embedding"].apply(safe_convert_embedding)
df_copy["symptom_embedding"] = df_copy["symptom_embedding"].apply(safe_convert_embedding)

In [53]:
# Define input symptom and generate its embedding
input_symptom = "I have been experiencing tightness and stiffness in my right knee..."
input_symptom_embedding = genai.embed_content(
    model="models/text-embedding-004",
    content=[input_symptom]  # Pass as a list
)

# Extract the actual embedding vector
input_embedding = np.array(input_symptom_embedding['embedding'][0])

# Compute cosine similarity
similarities = cosine_similarity([input_embedding], np.stack(df_copy["symptom_embedding"].values))[0]

# Get the top 5 most similar explanations
top_5_indices = np.argsort(similarities)[::-1][:5]  
top_5_similar_explanations = df_copy.iloc[top_5_indices]

# Print results
top_5_similar_explanations[["explanation_text", "symptom_text"]]

Unnamed: 0,explanation_text,symptom_text
23,Pain in right knee.,The patient has a history of occasional knee p...
2,"Contracture, right knee.",The patient has a history of osteoarthritis in...
4,"Ankylosis, right knee.",The patient has a history of osteoarthritis in...
10,"Effusion, right knee.",The patient has a history of osteoarthritis in...
11,"Effusion, left knee.",The patient has a history of osteoarthritis in...


In [54]:
# Proceed to generate insights using DeepSeek AI for reasoning
insights_prompt = f"""
### Input Symptom:
{input_symptom}

### Top 5 Similar Explanations:
1. Explanation: {top_5_similar_explanations.iloc[0]['explanation_text']}
   Symptom: {top_5_similar_explanations.iloc[0]['symptom_text']}
2. Explanation: {top_5_similar_explanations.iloc[1]['explanation_text']}
   Symptom: {top_5_similar_explanations.iloc[1]['symptom_text']}
3. Explanation: {top_5_similar_explanations.iloc[2]['explanation_text']}
   Symptom: {top_5_similar_explanations.iloc[2]['symptom_text']}
4. Explanation: {top_5_similar_explanations.iloc[3]['explanation_text']}
   Symptom: {top_5_similar_explanations.iloc[3]['symptom_text']}
5. Explanation: {top_5_similar_explanations.iloc[4]['explanation_text']}
   Symptom: {top_5_similar_explanations.iloc[4]['symptom_text']}

### Insights:
Summarize the patient's condition based on the similarities and suggest possible medical risks or treatments.
"""

# Use DeepSeek for reasoning
response = ollama.chat(model="deepseek-r1:1.5b", messages=[{"role": "user", "content": insights_prompt}])

# Print response
print(response)

model='deepseek-r1:1.5b' created_at='2025-01-30T21:38:08.1466646Z' done=True done_reason='stop' total_duration=132997281200 load_duration=84039600 prompt_eval_count=904 prompt_eval_duration=213000000 eval_count=814 eval_duration=132698000000 message=Message(role='assistant', content='<think>\nOkay, I\'m trying to help this user who has a right knee with tightness and stiffness. Let me look at their input and top 5 explanations to understand what they\'re dealing with.\n\nFirst, the patient is female and from the Caucasian population. That might not be too surprising, but it\'s good to know for medical information.\n\nLooking at the top five explanations, all of them are talking about joint pain in the right knee. The first one mentions physical therapy because of tenderness and swelling. The second explanation discusses contracture with limited motion. The third is about ankylosis, which means complete fusion of the joint without space to move. Fourth is effusion leading to more pain, 

In [55]:
print(response['message']['content'])

<think>
Okay, I'm trying to help this user who has a right knee with tightness and stiffness. Let me look at their input and top 5 explanations to understand what they're dealing with.

First, the patient is female and from the Caucasian population. That might not be too surprising, but it's good to know for medical information.

Looking at the top five explanations, all of them are talking about joint pain in the right knee. The first one mentions physical therapy because of tenderness and swelling. The second explanation discusses contracture with limited motion. The third is about ankylosis, which means complete fusion of the joint without space to move. Fourth is effusion leading to more pain, especially when weight-bearing. Fifth also has effusion but in the left knee.

So the common thread here is that all these explanations are talking about joint pain in the right knee. The key differences seem to be whether it's tenderness/symptoms, contracture, ankylosis (which seems like com

- Gemini lists a broader range of potential conditions (osteoarthritis, effusion, pain) and emphasizes conservative treatments such as RICE, NSAIDs, and physical therapy, along with diagnostic follow-ups.
- DeepSeek concentrates on structural and anatomical possibilities such as contracture, ankylosis, and effusion, and tailors treatment options to each condition, including possible surgical intervention.

Both models produce usable insights: Gemini gives a more general overview, while DeepSeek focuses on structural details and the potential severity of each condition. Either output is a useful starting point for further exploration, depending on the severity and specifics of the knee condition.