# **Project 2**

## **Business Use Case**

### **Problem Statement:**

In the fast-paced environment of healthcare centers, healthcare professionals often face the challenge of quick and accurate diagnosis of patients while managing an ever-increasing volume of medical information. Ensuring that healthcare providers have access to the latest and most comprehensive medical knowledge is crucial for improving patient outcomes and reducing the time needed to make informed decisions.

These professionals encounter several challenges daily, including:

- Information Overload: Medical professionals need to go through vast amounts of data and research to make accurate diagnoses and treatment plans. This can be overwhelming and time-consuming.
- Efficiency: For overall patient care and quality health outcomes, quick and accurate diagnosis is vital, especially in emergency situations.
- Access to Trusted Knowledge: In the ever-evolving healthcare industry, providing access to reliable and up-to-date medical information from renowned manuals and research papers is essential for maintaining high standards of care.


### **Objective:**

A renowned chain of hospitals has decided to leverage AI to build a state-of-the-art solution that helps healthcare professionals overcome these challenges. They have recruited you as an AI specialist and tasked you with building a RAG-based AI solution that uses renowned medical manuals as its knowledge base. This system will act as a proof of concept (POC) for an end product that assists healthcare professionals in making better, quicker, and more accurate diagnoses. The expected benefits are fewer diagnostic errors, less time spent on information retrieval, and more standardized care practices, which together lead to faster patient resolutions and better patient outcomes.

### **Questions:**

**1. Diagnostic Assistance**: "What are the common symptoms and treatments for pulmonary embolism?"

**2. Drug Information**: "Can you provide the trade names of medications used for treating hypertension?"

**3. Treatment Plans**: "What are the first-line options and alternatives for managing rheumatoid arthritis?"

**4. Specialty Knowledge**: "What are the diagnostic steps for suspected endocrine disorders?"

**5. Critical Care Protocols**: "What is the protocol for managing sepsis in a critical care unit?"

### **How This Application Empowers Professionals and Elevates Healthcare Organizations**

- **Enhanced Diagnostic Accuracy:**
Provides evidence-based insights for better diagnosis and treatment.

- **Time Efficiency:**
Instantly retrieves critical information, saving valuable time.

- **Improved Patient Care:**
Ensures informed decisions with up-to-date medical knowledge.

- **Cost-Effective Operations:**
Reduces redundant tests and consultation delays, lowering costs.

- **Knowledge Empowerment:**
Keeps doctors updated on the latest advancements.

- **Competitive Edge for the Hospital:**
Positions the hospital as a leader in healthcare innovation.

This collaboration between St. Bernard’s Medical Center and InnoviTech Solutions highlights the transformative potential of AI in revolutionizing healthcare.

## **1. Install and Import Required Libraries**

In [6]:
from azure.ai.ml import MLClient
from azure.ai.ml.entities import Environment
from azure.identity import DefaultAzureCredential

# Step 1: Authenticate and connect to the workspace
credential = DefaultAzureCredential()
ml_client = MLClient(
    credential,
    subscription_id="f7694127-2f65-4c53-aaef-a2750f7337d3",
    resource_group_name="defalt_resource_group",
    workspace_name="FitwellWorkspace"
)

# Step 2: Define environment
env_name = "denv"

custom_env = Environment(
    name=env_name,
    description="Custom environment with azureml-rag",
    image="mcr.microsoft.com/azureml/openmpi4.1.0-ubuntu20.04:20231010.v1",  # Compatible base image
    conda_file={
        "name": env_name,
        "channels": ["defaults"],
        "dependencies": [
            "python=3.10",
            "pip",
            {
                "pip": [
                    "azureml-rag[faiss,hugging_face]>=0.2.36",
                    "azure-ai-ml"
                ]
            }
        ],
    },
)

# Step 3: Register the environment
ml_client.environments.create_or_update(custom_env)
print(f"Registered environment: {env_name}")


Overriding of current TracerProvider is not allowed
Overriding of current LoggerProvider is not allowed
Overriding of current MeterProvider is not allowed
Attempting to instrument while already instrumented
Attempting to instrument while already instrumented
Attempting to instrument while already instrumented
Attempting to instrument while already instrumented
Attempting to instrument while already instrumented


Registered environment: denv


In [32]:
"""!conda deactivate
!conda create -n my_rag_env python=3.10 -y
!conda activate my_rag_env
!pip install azure-ai-ml[rag,promptflow] --upgrade --no-cache-dir
!pip install azure-ai-ml
!pip install -U 'azureml-rag[faiss,hugging_face]>=0.2.36'
!pip install azureml-rag==0.1.8
"""



CondaError: Run 'conda init' before 'conda deactivate'

Channels:
 - defaults
Platform: linux-64
Collecting package metadata (repodata.json): done
Solving environment: / done


    current version: 25.3.1
    latest version: 25.5.1

Please update conda by running

    $ conda update -n base -c defaults conda



## Package Plan ##

  environment location: /anaconda/envs/my_rag_env

  added / updated specs:
    - python=3.10


The following NEW packages will be INSTALLED:

  _libgcc_mutex      pkgs/main/linux-64::_libgcc_mutex-0.1-main 
  _openmp_mutex      pkgs/main/linux-64::_openmp_mutex-5.1-1_gnu 
  bzip2              pkgs/main/linux-64::bzip2-1.0.8-h5eee18b_6 
  ca-certificates    pkgs/main/linux-64::ca-certificates-2025.2.25-h06a4308_0 
  expat              pkgs/main/linux-64::expat-2.7.1-h6a678d5_0 
  ld_impl_linux-64   pkgs/main/linux-64::ld_impl_linux-64-2.40-h12ee557_0 
  libffi             pkgs/main/linux-64::libffi-3.4.4-h6a678d5_1 
  libgcc-ng     

In [7]:
import numpy
import pandas
print(numpy.__version__)
#print(pandas.__version__)



1.23.5


## **2. Configure Azure Machine Learning Workspace**

### Get client for AzureML Workspace

In [8]:
# Import necessary AzureML and authentication libraries
from azure.identity import DefaultAzureCredential, InteractiveBrowserCredential
from azure.ai.ml import MLClient
from azureml.core import Workspace

In [9]:
# Define workspace configuration (replace with your details)
workspace_config = {
    "subscription_id": "f7694127-2f65-4c53-aaef-a2750f7337d3",  # Replace with your Azure subscription ID
    "resource_group": "defalt_resource_group",    # Replace with your Azure resource group name
    "workspace_name": "FitwellWorkspace"     # Replace with your AzureML workspace name
}

In [10]:
%%writefile workspace.json
{
    "subscription_id": "f7694127-2f65-4c53-aaef-a2750f7337d3",
    "resource_group": "defalt_resource_group",
    "workspace_name": "FitwellWorkspace"
}

Overwriting workspace.json


In [11]:
%%writefile config.json
{
    "AZURE_OPENAI_ENDPOINT":"https://test-moadel-openai-12345.openai.azure.com/",
    "AZURE_OPENAI_KEY":"YOUR_AZURE_OPENAI_API_KEY_HERE",
    "AZURE_OPENAI_APIVERSION":"2025-01-01-preview",
    "CHATGPT_MODEL":"gpt-4o-mini"
}

Overwriting config.json


In [12]:
# Initialize credentials for Azure authentication
try:
    credential = DefaultAzureCredential()
    # Check if given credential can get token successfully.
    credential.get_token("https://management.azure.com/.default")
except Exception as ex:
    # Fall back to InteractiveBrowserCredential if DefaultAzureCredential does not work
    credential = InteractiveBrowserCredential()


from azure.ai.ml.identity import AzureMLOnBehalfOfCredential  # auto for ML notebooks

credential = AzureMLOnBehalfOfCredential()
# Initialize the MLClient to connect with AzureML
ml_client = MLClient.from_config(credential=credential, path="workspace.json")



# Create an AzureML Workspace object
ws = Workspace(
    subscription_id=ml_client.subscription_id,
    resource_group=ml_client.resource_group_name,
    workspace_name=ml_client.workspace_name,
)


# Verify the client and workspace details
print(ml_client)

Found the config file in: workspace.json

MLClient(credential=<azure.ai.ml.identity._credentials.aml_on_behalf_of.AzureMLOnBehalfOfCredential object at 0x75d3fd1424a0>,
         subscription_id=f7694127-2f65-4c53-aaef-a2750f7337d3,
         resource_group_name=defalt_resource_group,
         workspace_name=FitwellWorkspace)



## **3. Register the Reports Dataset as a Data Asset**

Register the dataset in AzureML for further processing.

In [13]:
# Import libraries for data registration
from azure.ai.ml.entities import Data
from azure.ai.ml.constants import AssetTypes
import zipfile
import os

# Path to the ZIP file containing the medical diagnosis manuals
zip_file_path = 'MedicalDiagnosisManuals.zip'

# Directory to extract the reports
extract_to_directory = './extracted_dataset_reports'
os.makedirs(extract_to_directory, exist_ok=True)

# Extract the ZIP file containing the reports
with zipfile.ZipFile(zip_file_path, 'r') as zip_ref:
    zip_ref.extractall(extract_to_directory)


In [15]:
from azure.identity import InteractiveBrowserCredential
ml_client = MLClient(
    DefaultAzureCredential(),
    subscription_id="f7694127-2f65-4c53-aaef-a2750f7337d3",
    resource_group_name="defalt_resource_group",
    workspace_name="FitwellWorkspace"
)



In [16]:

# Register the extracted data as a Data asset in AzureML
local_data_path = extract_to_directory
data_asset_name = "Medical-Manuals"
data_asset_description = "A collection of medical manuals used by St. Bernard's Medical Center for embedding generation and knowledge retrieval in the RAG system."

data_asset = Data(
    path=local_data_path,
    type=AssetTypes.URI_FOLDER,  # Registering as a folder URI
    description=data_asset_description,
    name=data_asset_name
)


# Use the MLClient to register the data asset
ml_client.data.create_or_update(data_asset)
print(f"Data asset '{data_asset.name}' registered successfully.")


Data asset 'Medical-Manuals' registered successfully.


## **4. Set Up Azure OpenAI Connection**

### Run the cells under _either_ heading (OpenAI or HuggingFace) to use the respective embedding model

#### **Note:**

When selecting an embedding model, ensure consistency between the embedding model used here and the one configured for your vectorstore retriever. Using mismatched models may result in dimension mismatches, leading to errors during vector retrieval. For example:

- If you use the `text-embedding-ada-002` model from Azure OpenAI for embedding creation, ensure the same model is specified for the vectorstore retriever.
- Similarly, if using a HuggingFace model like `all-mpnet-base-v2`, configure the vectorstore retriever with this model.

Maintaining alignment between the embedding model and vectorstore retriever is crucial for the proper functioning of your pipeline.
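One way to make this alignment harder to break is to record the embedding model name next to the index and check it before building the retriever. The following is a minimal sketch of that idea, not part of AzureML or LangChain: the sidecar file name `embedding_model.json` and both helper functions are hypothetical.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical sidecar file name; neither AzureML nor LangChain creates or reads it.
METADATA_FILE = "embedding_model.json"


def write_index_metadata(index_dir: str, model_name: str) -> Path:
    """Record which embedding model produced the index stored in index_dir."""
    path = Path(index_dir) / METADATA_FILE
    path.write_text(json.dumps({"embedding_model": model_name}))
    return path


def verify_index_metadata(index_dir: str, model_name: str) -> None:
    """Raise ValueError if the index was built with a different embedding model."""
    recorded = json.loads((Path(index_dir) / METADATA_FILE).read_text())["embedding_model"]
    if recorded != model_name:
        raise ValueError(
            f"Index was built with '{recorded}', but the retriever uses '{model_name}'."
        )


# Usage, with a temporary directory standing in for the index folder
with tempfile.TemporaryDirectory() as tmp:
    write_index_metadata(tmp, "sentence-transformers/all-mpnet-base-v2")
    verify_index_metadata(tmp, "sentence-transformers/all-mpnet-base-v2")  # passes silently
```

Calling `verify_index_metadata` with a different model name raises a `ValueError` before any vectors are compared, which is easier to diagnose than a dimension error raised during retrieval.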

#### OpenAI

In [17]:
# Azure OpenAI credentials and the ID of the deployed chat model are stored as
# key-value pairs in a JSON file
import json

with open('config.json', 'r') as az_creds:
    data = az_creds.read()

# Credentials to authenticate to the Azure OpenAI resource
creds = json.loads(data)

In [67]:
import sys
print("Python Executable:", sys.executable)
print("Python Path:", sys.path)

Python Executable: /anaconda/envs/azureml_py310_sdkv2/bin/python
Python Path: ['/anaconda/envs/azureml_py310_sdkv2/lib/python310.zip', '/anaconda/envs/azureml_py310_sdkv2/lib/python3.10', '/anaconda/envs/azureml_py310_sdkv2/lib/python3.10/lib-dynload', '', '/anaconda/envs/azureml_py310_sdkv2/lib/python3.10/site-packages', '/anaconda/envs/azureml_py310_sdkv2/lib/python3.10/site-packages/setuptools/_vendor', '/anaconda/envs/azureml_py310_sdkv2/lib/python3.10/site-packages/azureml/_project/vendor']


In [68]:
#pip list | grep azure-ai-ml
!pip list | grep azureml-rag
#pip list | grep promptflow

azureml-rag                                0.1.8


In [69]:
!pip install azureml-rag


Collecting azure-storage-blob<=12.19.0,>=12.5.0
  Using cached azure_storage_blob-12.19.0-py3-none-any.whl (394 kB)
Installing collected packages: azure-storage-blob
  Attempting uninstall: azure-storage-blob
    Found existing installation: azure-storage-blob 12.25.1
    Uninstalling azure-storage-blob-12.25.1:
      Successfully uninstalled azure-storage-blob-12.25.1
[31mERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
azure-storage-file-datalake 12.20.0 requires azure-storage-blob>=12.25.1, but you have azure-storage-blob 12.19.0 which is incompatible.
adlfs 2024.12.0 requires fsspec>=2023.12.0, but you have fsspec 2023.10.0 which is incompatible.[0m[31m
[0mSuccessfully installed azure-storage-blob-12.19.0


In [18]:
import azureml
print(dir(azureml))


['__builtins__', '__cached__', '__doc__', '__file__', '__loader__', '__name__', '__package__', '__path__', '__spec__', '_async', '_base_sdk_common', '_common', '_compute', '_file_utils', '_history', '_html', '_logging', '_model_management', '_project', '_restclient', '_run_impl', '_tracing', '_vendor', '_workspace', 'core', 'data', 'exceptions', 'history']


In [71]:
!which python

/anaconda/envs/azureml_py38/bin//python


In [None]:
from azureml.rag.utils.connections import get_connection_by_name_v2, create_connection_v2

# # Define the connection name for Azure OpenAI
aoai_connection_name = "Custom_AzureOpenAI_Connection"

# # If the connection doesn't exist, create a new one
target = "https://test-moadel-openai-12345.openai.azure.com/"  # Replace with your Azure OpenAI endpoint
key = "YOUR_AZURE_OPENAI_API_KEY_HERE"          # Replace with your Azure OpenAI API key
api_version = "2025-01-01-preview"    # Replace with the appropriate API version

aoai_connection = create_connection_v2(
    workspace=ws,
    name=aoai_connection_name,
    category="AzureOpenAI",
    target=target,
    auth_type="ApiKey",
    credentials={"key": key},
    metadata={"ApiType": "azure", "ApiVersion": api_version},
)

aoai_connection_id = aoai_connection["id"]

print(f"Azure OpenAI connection created or retrieved successfully: {aoai_connection_id}")

ModuleNotFoundError: No module named 'azureml.rag'

To use the Azure OpenAI `text-embedding-ada-002` model for embedding generation, ensure the model has been deployed and is ready for inference. [Follow these instructions](https://learn.microsoft.com/azure/cognitive-services/openai/how-to/create-resource?pivots=web-portal#deploy-a-model) to deploy the embedding model in your Azure OpenAI resource.

Once deployed, obtain the credentials for the text-embedding-ada-002 model and save them in your `config.json` file. This is necessary for the following code to execute successfully.

Ensure that the `config.json` file includes the appropriate keys:

- `AZURE_OPENAI_EMBEDDING_MODEL`: Name of the Azure OpenAI embedding model.
- `AZURE_OPENAI_EMBEDING_DEPLOYMENT`: Deployment name for the embedding model.

Finally, combine the deployment and model names into the URI format that the AzureML embeddings components expect as input.
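The key check and URI construction described above can be sketched as a small helper. This is an illustration under assumptions: `build_embeddings_uri` and the example deployment name are hypothetical, the key names are copied exactly as listed above, and the URI pattern is the `azure_open_ai://deployment/.../model/...` form used in this notebook.

```python
# Key names exactly as listed above for config.json
REQUIRED_KEYS = ("AZURE_OPENAI_EMBEDDING_MODEL", "AZURE_OPENAI_EMBEDING_DEPLOYMENT")


def build_embeddings_uri(creds: dict) -> str:
    """Validate the embedding keys and return the embeddings model URI."""
    missing = [key for key in REQUIRED_KEYS if not creds.get(key)]
    if missing:
        raise KeyError(f"config.json is missing: {', '.join(missing)}")
    deployment = creds["AZURE_OPENAI_EMBEDING_DEPLOYMENT"]
    model = creds["AZURE_OPENAI_EMBEDDING_MODEL"]
    return f"azure_open_ai://deployment/{deployment}/model/{model}"


# Hypothetical values standing in for the contents of config.json
example_creds = {
    "AZURE_OPENAI_EMBEDDING_MODEL": "text-embedding-ada-002",
    "AZURE_OPENAI_EMBEDING_DEPLOYMENT": "my-embedding-deployment",
}
print(build_embeddings_uri(example_creds))
# azure_open_ai://deployment/my-embedding-deployment/model/text-embedding-ada-002
```

Failing early on a missing key keeps a half-configured `config.json` from producing a malformed URI that would only surface later as a pipeline error.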

In [None]:
aoai_embedding_model_name = "text-embedding-ada-002" # creds["AZURE_OPENAI_EMBEDDING_MODEL"]
aoai_embedding_deployment_name = "test-moadel-openai-12345" # creds["AZURE_OPENAI_EMBEDING_DEPLOYMENT"]
embeddings_model_uri = f"azure_open_ai://deployment/{aoai_embedding_deployment_name}/model/{aoai_embedding_model_name}"
print(f"Embedding Model URI: {embeddings_model_uri}")
#https://test-moadel-openai-12345.openai.azure.com/
embeddings_model_url="https://test-moadel-openai-12345.openai.azure.com/openai/deployments/text-embedding-ada-002/embeddings?api-version=2023-05-15"

#### HuggingFace

In [None]:
embeddings_model_uri = "hugging_face://model/sentence-transformers/___________"   #Fill the blank with embedding model

## **5. Setup Pipeline to process data into Index**

### **Define Pipeline Components**

In [None]:

ml_registry = MLClient(credential=DefaultAzureCredential(), registry_name="azureml")

# Retrieve components for processing data, generating embeddings, and creating the FAISS index
crack_and_chunk_component = ml_registry.components.get(
    "llm_rag_crack_and_chunk", label="latest"
)
generate_embeddings_component = ml_registry.components.get(
    "llm_rag_generate_embeddings", label="latest"
)
create_faiss_index_component = ml_registry.components.get(
    "llm_rag_create_faiss_index", label="latest"
)
register_mlindex_component = ml_registry.components.get(
    "llm_rag_register_mlindex_asset", label="latest"
)


### **Build the AzureML Pipeline**

In [None]:
from azure.ai.ml import Input, Output
from azure.ai.ml.dsl import pipeline
from azure.ai.ml.entities._job.pipeline._io import PipelineInput
from typing import Optional

In [None]:
# Utility function for automatic compute configuration
def use_automatic_compute(component, instance_count=1, instance_type="Standard_NC4as_T4_v3"):
    """Configure a component to use automatic compute."""
    component.set_resources(
        instance_count=instance_count,
        instance_type=instance_type,
        properties={"compute_specification": {"automatic": True}},
    )
    return component


# Utility function to check if optional pipeline inputs are provided
def optional_pipeline_input_provided(input: Optional[PipelineInput]):
    """Check if optional pipeline inputs are provided."""
    return input is not None and input._data is not None

In [None]:
@pipeline(default_compute="tconn")    #Fill the blank with name of the pipeline and compute
def diagnosismanuals_to_faiss(
    data_asset_path: str,
    embeddings_model: str,
    asset_name: str,
    chunk_size: int = 1024,    # Provide the chunk size
    data_source_glob: str = None,
    document_path_replacement_regex: str = None,
    aoai_connection_id=None,
    embeddings_container=None,
):
    """Pipeline to process medical diagnosis manuals and create a FAISS vector index for knowledge retrieval."""
    
    # Step 1: Chunk data into smaller pieces
    crack_and_chunk = crack_and_chunk_component(
        input_data=Input(type="uri_folder", path=data_asset_path),  # Input data asset and fill the blank with proper type 
        input_glob=data_source_glob,
        chunk_size=chunk_size,
        document_path_replacement_regex=document_path_replacement_regex,
    )
    use_automatic_compute(crack_and_chunk)  # Apply compute configuration

    # Step 2: Generate embeddings for the data chunks
    generate_embeddings = generate_embeddings_component(
        chunks_source=crack_and_chunk.outputs.output_chunks,
        embeddings_container=embeddings_container,
        embeddings_model=embeddings_model,
    )
    use_automatic_compute(generate_embeddings)  # Apply compute configuration

    # Step 3: Create a FAISS vector index from embeddings
    create_faiss_index = create_faiss_index_component(
        embeddings=generate_embeddings.outputs.embeddings,    # Fill the blank with the proper function
    )
    use_automatic_compute(create_faiss_index)  # Apply compute configuration

    # Step 4: Register the FAISS index as an MLIndex asset
    register_mlindex = register_mlindex_component(
        storage_uri=create_faiss_index.outputs.index,    # Fill the blank with the proper function
        asset_name=asset_name
    )
    use_automatic_compute(register_mlindex) # Apply compute configuration
    
    return {
        "mlindex_asset_uri": create_faiss_index.outputs.index,
        "mlindex_asset_id": register_mlindex.outputs.asset_id,
    }

## **6. Submit the Pipeline**

This section covers how to instantiate the AzureML pipeline, configure its inputs, and submit it for execution. The pipeline processes the data, generates embeddings, creates a FAISS-based vector index, and registers the output as an AzureML asset.

#### **Note**
- When setting up the FAISS index asset in the pipeline, it is crucial to maintain a separate asset name for each embedding model used. This practice ensures that the FAISS index assets created with different models, such as HuggingFace or OpenAI embeddings, are clearly distinguished and managed independently. 
- The reason for this separation is to avoid any confusion or conflicts when downloading and using the FAISS index later in the pipeline. Each embedding model generates unique vector representations, and mismatches between the index and the model used for retrieval can lead to dimension mismatch errors or incorrect results. By assigning distinct asset names for each model, you can ensure the correct FAISS index is used for vector-based similarity searches, enabling a seamless and error-free workflow.
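A simple way to keep asset names distinct is to derive them from the embedding model URI. The sketch below assumes the URI ends with the model identifier, as in the examples in this notebook; `asset_name_for` and the base name `medical_manuals_faiss` are hypothetical, and the naming scheme is only one possible convention.

```python
import re


def asset_name_for(base_name: str, embeddings_model_uri: str) -> str:
    """Derive a distinct index asset name from the embedding model URI."""
    # Take the last path segment of the URI as the model identifier
    model_id = embeddings_model_uri.rstrip("/").split("/")[-1]
    # Replace runs of non-alphanumeric characters with underscores
    slug = re.sub(r"[^A-Za-z0-9]+", "_", model_id).strip("_").lower()
    return f"{base_name}_{slug}"


print(asset_name_for(
    "medical_manuals_faiss",
    "hugging_face://model/sentence-transformers/all-mpnet-base-v2",
))
# medical_manuals_faiss_all_mpnet_base_v2
```

With this convention, an index built from `text-embedding-ada-002` and one built from `all-mpnet-base-v2` end up under different asset names, so the download step cannot silently pick up the wrong one.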

In [None]:
# Define the asset name and data source glob pattern
asset_name = "finsights_faiss"  # Name for the FAISS index asset
data_source_glob = "**/*.pdf"  # Pattern to match input data files
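Before submitting the pipeline, it can help to check locally which files a pattern such as `**/*.pdf` would select. The sketch below uses `pathlib` glob semantics, which may differ in detail from how the AzureML component interprets the pattern; `matching_files` is a hypothetical helper, and the temporary directory stands in for the extracted dataset folder.

```python
import tempfile
from pathlib import Path


def matching_files(root: str, pattern: str = "**/*.pdf") -> list:
    """Return files under root that match the glob pattern, relative to root."""
    return sorted(
        p.relative_to(root).as_posix()
        for p in Path(root).glob(pattern)
        if p.is_file()
    )


# Usage with a small throwaway directory tree
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "manuals").mkdir()
    (Path(tmp) / "manuals" / "guide.pdf").write_bytes(b"")
    (Path(tmp) / "notes.txt").write_text("not a PDF")
    print(matching_files(tmp))
    # ['manuals/guide.pdf']
```

An empty result for the real extraction directory would suggest that the ZIP layout or the pattern needs adjusting before any compute is spent on the pipeline.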

In [None]:
# Get the input data asset path from the workspace datastore
datastore_path = ml_client.data.get(data_asset_name, version="1").path
print(f"Datastore path: {datastore_path}")

In [None]:
# Create the pipeline job by calling the defined pipeline function
pipeline_job = diagnosismanuals_to_faiss(
    embeddings_model=embeddings_model_uri,  # URI of the embeddings model
    #aoai_connection_id=aoai_connection_id,  # Connection ID for Azure OpenAI (optional)
    embeddings_container=Input(
        type="uri_folder",
        path=f"azureml://datastores/workspaceblobstore/paths/embeddings/{asset_name}"    
    ),  # Path for storing generated embeddings
    data_asset_path=Input(
        type="uri_folder",
        path=datastore_path
    ),  # Input data asset path
    chunk_size=1024,  # Size of chunks for processing
    data_source_glob=data_source_glob,  # Glob pattern for input files
    asset_name=asset_name  # Name of the MLIndex asset
)

In [None]:
# Add properties for better indexing and artifact tracking in the AzureML UI
pipeline_job.properties["azureml.mlIndexAssetName"] = asset_name
pipeline_job.properties["azureml.mlIndexAssetKind"] = "faiss"
pipeline_job.properties["azureml.mlIndexAssetSource"] = "Data asset"

In [None]:
# Submit the pipeline job for execution
submitted_pipeline = ml_client.jobs.create_or_update(pipeline_job)
print(f"Pipeline submitted successfully! Job ID: {submitted_pipeline.id}")

In [None]:
#!pip install numpy --no-binary :all:
!pip uninstall numpy -y
!pip uninstall pandas -y
!pip install numpy==1.26.4
!pip install pandas==2.2.0

In [None]:
# Stream the pipeline job logs for real-time monitoring
ml_client.jobs.stream(submitted_pipeline.name)

# **Information Retrieval and Response Generation Using LangChain-FAISS and Azure OpenAI**

## **1. Installing Required Libraries**

In [None]:
# Install the required LangChain and HuggingFace libraries
%pip install -U langchain-community
%pip install -U langchain-huggingface
%pip install -U langchain-openai

## **2. Setting Up Data Retrieval**

### **Downloading and Setting Up FAISS Index Assets**

In [None]:
# Import necessary utilities for artifact retrieval
import azure.ai.ml._artifacts._artifact_utilities as artifact_utils

# Retrieve the path to the latest FAISS index asset from Azure ML
data_info = ml_client.data.get(name=asset_name, label="_______________").path

# Download the FAISS index asset to a local directory
artifact_utils.download_artifact_from_aml_uri(
    uri=data_info,
    destination="_______________________",
    datastore_operation=ml_client.____________
)

# The FAISS index asset will be used for vector-based similarity search.

## **3. Loading the FAISS Index**

### **Loading the FAISS Index and Preparing the Retriever**

We load the FAISS index from the downloaded files and connect it to an embedding model. This embedding model ensures that queries are converted into vector space to match the stored documents effectively.

### **Note:**
When loading the FAISS index and setting up the retriever, it is crucial to use the same embedding model that was used during the creation of the FAISS index. This ensures that the dimensionality of the embeddings produced by the query matches the dimensionality of the embeddings stored in the FAISS index.

If a different embedding model is used between these two steps, a dimension mismatch will occur, leading to errors in retrieving relevant documents or performing similarity searches.
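Once the index is loaded, the dimensions can be compared explicitly instead of waiting for a retrieval error. The sketch below assumes the loaded LangChain FAISS vector store exposes the underlying FAISS index as `.index` with its dimensionality in `.d`, and that the embedding model provides `embed_query`. `assert_same_dimension` is a hypothetical helper, and the stand-in objects exist only so the example runs without a real index.

```python
from types import SimpleNamespace


def assert_same_dimension(vector_store, embedding_model, probe_text="dimension check"):
    """Raise ValueError if query embeddings and index vectors differ in size."""
    index_dim = vector_store.index.d  # dimensionality of the stored vectors
    query_dim = len(embedding_model.embed_query(probe_text))
    if index_dim != query_dim:
        raise ValueError(
            f"Index dimension is {index_dim}, but the embedding model produces {query_dim}."
        )
    return index_dim


# Stand-ins for a loaded vector store and an embedding model (illustration only)
fake_store = SimpleNamespace(index=SimpleNamespace(d=768))
fake_model = SimpleNamespace(embed_query=lambda text: [0.0] * 768)
print(assert_same_dimension(fake_store, fake_model))
# 768
```

For reference, `all-mpnet-base-v2` produces 768-dimensional vectors and `text-embedding-ada-002` produces 1536-dimensional vectors, so mixing the two fails this check immediately.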

In [None]:
# Path to the directory containing FAISS index files
index_folder_path = "___________________"

In [None]:
from langchain_huggingface import HuggingFaceEmbeddings

# Specify the embedding model used during FAISS index creation
embedding_model_name = "sentence-transformers/all-mpnet-base-v2"
embedding_model = HuggingFaceEmbeddings(model_name=embedding_model_name)

In [None]:
# from langchain_openai import AzureOpenAIEmbeddings

# # Specify the embedding model used during FAISS index creation
# embedding_model = AzureOpenAIEmbeddings(
#     model=creds["AZURE_OPENAI_EMBEDDING_MODEL"],
#     azure_endpoint= creds["AZURE_OPENAI_ENDPOINT"],
#     api_key= creds["AZURE_OPENAI_KEY"],
#     openai_api_version=creds["AZURE_OPENAI_APIVERSION"]
# )

In [None]:
from langchain_community.vectorstores import FAISS

# Load the FAISS index and associate it with the embedding model
retriever = FAISS.load_local(
    folder_path=index_folder_path,
    embeddings=embedding_model,
    allow_dangerous_deserialization=True  # Index metadata is unpickled on load; enable only for index files you trust
)

# The retriever is now ready to perform similarity searches.

## **4. Performing a Similarity Search**

In [None]:
# Define a query to test the retriever; pick any disease or medical condition

query = "_____________________________________________"

# Retrieve the top 3 most relevant documents
results = retriever.____________________(query, k=_______________)

# Display the results
for doc in results:
    print(f"Document: {doc.page_content}\nMetadata: {doc.metadata}")

# This step helps validate that the retriever is functioning as expected.
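To build intuition for what a top-k similarity search does, here is a conceptual sketch over toy two-dimensional vectors using Euclidean (L2) distance. It is not how FAISS is implemented, and the index in this notebook may use a different metric; `top_k_l2` and the sample vectors are illustrative only.

```python
import math


def top_k_l2(query, vectors, k=3):
    """Return the positions of the k vectors closest to query by L2 distance."""
    scored = sorted((math.dist(query, vec), idx) for idx, vec in enumerate(vectors))
    return [idx for _, idx in scored[:k]]


# Toy "document" vectors; real embeddings have hundreds of dimensions
docs = [[0.0, 0.0], [1.0, 1.0], [0.2, 0.1], [5.0, 5.0]]
print(top_k_l2([0.0, 0.0], docs, k=2))
# [0, 2]
```

The query vector is compared against every stored vector and the closest `k` are returned, which is why the query must be embedded with the same model as the documents.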

## **5. Creating the System and User Prompt Templates**

In [None]:
# Define the system prompt for the Azure OpenAI model
qna_system_message = """
    # Write your message here
"""
# Define the user message template
qna_user_message_template = """
###Context

# Write your context here

{context}  

###Question  
{question}  
"""

## **6. Generating the Response**

In [None]:
# Install the required packages
#!pip install openai==1.55.3 tiktoken==0.6 session-info --quiet

In [None]:
# Import required libraries
import json
import tiktoken
import pandas as pd
from openai import AzureOpenAI

In [None]:
# Load Azure OpenAI credentials
with open('_____________', 'r') as az_creds:
    data = az_creds._________()

creds = json._________(data)

In [None]:
# Initialize the Azure OpenAI client
client = AzureOpenAI(
    azure_endpoint=___________,
    api_key=____________,
    api_version=______________
)

In [None]:
def generate_rag_response(user_input):
    # Retrieve relevant document chunks
    relevant_document_chunks = retriever.______________(user_input, k=_____________)
    context_list = [d.__________ for d in relevant_document_chunks]

    # Combine document chunks into a single context
    context_for_query = ". ".join(context_list)

    # Compose the prompt
    prompt = [
        {'role': '__________', 'content': qna_system_message},
        {'role': '__________', 'content': qna_user_message_template.format(
            context=_______,
            question=___________
            )
        }
    ]

    # Generate the response using Azure OpenAI
    try:
        response = client.chat.completions.create(
            model=creds["CHATGPT_MODEL"],
            messages=prompt,
            temperature=___________
        )

        # Extract and print the model's response
        response = response.choices[____________].message.content.strip()
    except Exception as e:
        response = f'_____________________: \n {e}'


    print(response)
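The function above joins every retrieved chunk into one context string. If the retrieved chunks are long, the combined context can exceed what the chat model accepts, so one option is to cap it. The sketch below uses a character budget as a rough stand-in for a token budget; `build_context`, `max_chars`, and the separator are hypothetical and not part of the exercise template.

```python
def build_context(chunks, max_chars=6000, separator="\n\n"):
    """Join chunks in order, stopping before the character budget is exceeded."""
    selected, used = [], 0
    for chunk in chunks:
        # Account for the separator only when a chunk is already selected
        cost = len(chunk) + (len(separator) if selected else 0)
        if used + cost > max_chars:
            break
        selected.append(chunk)
        used += cost
    return separator.join(selected)


print(build_context(["aaa", "bbb", "ccc"], max_chars=8, separator="--"))
# aaa--bbb
```

Because retrievers return chunks in relevance order, stopping at the first chunk that does not fit keeps the most relevant material and drops the rest.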

#### Question 1: What is the protocol for managing sepsis in a critical care unit?

In [None]:
user_input = "__________________________________"    # Enter the question to be answered by the system here 
generate_rag_response(user_input)

#### Question 2: What are the common symptoms for appendicitis, and can it be cured via medicine? If not, what surgical procedure should be followed to treat it?

In [None]:
user_input = "__________________________________"   # Enter the question to be answered by the system here 
generate_rag_response(user_input)

#### Question 3: What are the effective treatments or solutions for addressing sudden patchy hair loss, commonly seen as localized bald spots on the scalp, and what could be the possible causes behind it?

In [None]:
user_input = "__________________________________"   # Enter the question to be answered by the system here 
generate_rag_response(user_input)

#### Question 4:  What treatments are recommended for a person who has sustained a physical injury to brain tissue, resulting in temporary or permanent impairment of brain function?

In [None]:
user_input = "__________________________________"   # Enter the question to be answered by the system here 
generate_rag_response(user_input)

#### Question 5: What are the necessary precautions and treatment steps for a person who has fractured their leg during a hiking trip, and what should be considered for their care and recovery?

In [None]:
user_input = "__________________________________"   # Enter the question to be answered by the system here 
generate_rag_response(user_input)

#### Power Ahead!