# Custom Chatbot Project

TODO: In this cell, write an explanation of which dataset you have chosen and why it is appropriate for this task

The chosen dataset is a medical question-and-answer dataset of about 16K rows covering susceptibility, symptoms, treatments, and prevention of diseases. It suits the task because each row pairs a question with its answer, so users can ask directly how to prevent a disease, who is most at risk, what the most common symptoms are, and other questions closely tied to health issues.

## Data Wrangling

TODO: In the cells below, load your chosen dataset into a `pandas` dataframe with a column named `"text"`. This column should contain all of your text data, separated into at least 20 rows.

In [1]:
#Import all relevant python libraries
import pandas as pd
import numpy as np
from pathlib import Path
import tiktoken

In [2]:
# Load the selected dataset
df = pd.read_csv("data/medDataset_processed.csv")


#Rename the Answer column to "text"
df.rename(columns={'Answer':'text'},inplace=True)
df.head()

| Unnamed: 0 | qtype | Question | text |
|---|---|---|---|
| 0 | susceptibility | Who is at risk for Lymphocytic Choriomeningiti... | LCMV infections can occur after exposure to fr... |
| 1 | symptoms | What are the symptoms of Lymphocytic Choriomen... | LCMV is most commonly recognized as causing ne... |
| 2 | susceptibility | Who is at risk for Lymphocytic Choriomeningiti... | Individuals of all ages who come into contact ... |
| 3 | exams and tests | How to diagnose Lymphocytic Choriomeningitis (... | During the first phase of the disease, the mos... |
| 4 | treatment | What are the treatments for Lymphocytic Chorio... | Aseptic meningitis, encephalitis, or meningoen... |


## Custom Query Completion

TODO: In the cells below, compose a custom query using your chosen dataset and retrieve results from an OpenAI `Completion` model. You may copy and paste any useful code from the course materials.

In [3]:
import openai
openai.api_key = "YOUR API KEY"
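A key written directly into a notebook is easy to leak when the notebook is shared. A minimal sketch of reading it from the environment instead, assuming the key is stored in a variable named `OPENAI_API_KEY`; the helper name `load_api_key` is hypothetical and not part of the course materials:

```python
import os


def load_api_key(var_name="OPENAI_API_KEY"):
    """Return the API key stored in the environment variable `var_name`.

    Raises RuntimeError with a readable message if the variable is unset or empty.
    The default variable name is an assumption for this sketch.
    """
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(f"Set the {var_name} environment variable first")
    return key
```

With this in place, the assignment in this cell could become `openai.api_key = load_api_key()`.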

In [4]:
# Create a tokenizer to align with the embeddings
tokenizer = tiktoken.get_encoding("cl100k_base")
token_limit = 1000

In [5]:
# Define a basic completion query using gpt-3.5-turbo-instruct as the selected model
def basic_completion_query(question):
    """Send the bare question to the model and return its answer text."""
    prompt = f"Question: {question}\nAnswer:"
    tokens = tokenizer.encode(prompt)  # Tokenize the prompt (the result is not used in this function)
    response = openai.Completion.create(
        model="gpt-3.5-turbo-instruct",
        prompt=prompt,
        max_tokens=500
    )
    return response.choices[0].text.strip()

# Define a function that generates the custom query
def custom_completion_query(question):
    """Send the question with a medical-assistant instruction prepended."""
    custom_context = "You're a medical assistant specializing in providing detailed and accurate answers to medical questions"
    prompt = f"{custom_context}\n\nQuestion: {question}\nAnswer:"
    tokens = tokenizer.encode(prompt)  # Tokenize the prompt (the result is not used in this function)
    response = openai.Completion.create(
        model="gpt-3.5-turbo-instruct",
        prompt=prompt,
        max_tokens=500
    )
    return response.choices[0].text.strip()
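The custom prompt above adds an instruction but no rows from the dataset, and `token_limit` is defined without being applied. One possible way to bring both in is sketched below. It is not the method used in this notebook: `build_context_prompt` and its parameters are hypothetical names, the rows are assumed to arrive already ordered by relevance (how they are ranked is outside this sketch), and the default token counter is a rough whitespace word count standing in for `tiktoken`.

```python
def build_context_prompt(question, context_rows, token_limit=1000,
                         count_tokens=lambda s: len(s.split())):
    """Build a prompt that places dataset rows ahead of the question.

    Rows are added in the given order until the next one would exceed the
    remaining token budget. Per-piece counts are summed, which only
    approximates the token count of the final prompt.
    """
    header = "Answer the question based on the context below.\n\nContext:\n"
    footer = f"\n\nQuestion: {question}\nAnswer:"
    # Reserve room for the fixed parts of the prompt first.
    budget = token_limit - count_tokens(header) - count_tokens(footer)
    selected = []
    for row in context_rows:
        cost = count_tokens(row)
        if cost > budget:
            break  # stop at the first row that no longer fits
        selected.append(row)
        budget -= cost
    return header + "\n".join(selected) + footer
```

In this notebook the counter could be swapped for the real tokenizer, for example `build_context_prompt(question, df["text"].tolist(), token_limit, lambda s: len(tokenizer.encode(s)))`, and the returned string passed as `prompt` to the completion call.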



## Custom Performance Demonstration

TODO: In the cells below, demonstrate the performance of your custom query using at least 2 questions. For each question, show the answer from a basic `Completion` model query as well as the answer from your custom query.

### Question 1

In [6]:
#Process question No.1
question1 = df['Question'][0]

basic_answer1 = basic_completion_query(question1)
custom_answer1 = custom_completion_query(question1)


# Display the results for the first question
print(f"Question: {question1}")
print("\n" + "="*50 + "\n")
print(f"Basic Answer: {basic_answer1}")
print("\n" + "="*50 + "\n")
print(f"Custom Answer: {custom_answer1}")


Question: Who is at risk for Lymphocytic Choriomeningitis (LCM)? ?


Basic Answer: People who are at risk for Lymphocytic Choriomeningitis (LCM) include those who come into contact with rodents, specifically the house mouse (Mus musculus), or their secretions. This can include pet mice, wild mice, or exposure to areas with high rodent populations such as storage facilities or agricultural settings. People with weakened immune systems, pregnant women, and infants are also at an increased risk for developing LCM.


Custom Answer: Individuals who come into contact with infected rodents or their urine, droppings, or saliva, as well as those with weakened immune systems, pregnant women, and newborns are at a higher risk for Lymphocytic Choriomeningitis (LCM). Additionally, individuals who live in crowded or unsanitary living conditions, or who engage in activities such as cleaning or working in areas where rodents are present, are also at an increased risk for LCM.
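The display code is repeated for every question, so it could be factored into a small helper. A minimal sketch, where `format_comparison` is a hypothetical name introduced here and the layout mirrors the `print` calls used above:

```python
def format_comparison(question, basic_answer, custom_answer, width=50):
    """Return the question and both answers separated by rule lines."""
    rule = "\n" + "=" * width + "\n"
    return "\n".join([
        f"Question: {question}",
        rule,
        f"Basic Answer: {basic_answer}",
        rule,
        f"Custom Answer: {custom_answer}",
    ])
```

Each question cell would then end with a single call such as `print(format_comparison(question1, basic_answer1, custom_answer1))`.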


### Question 2

In [7]:
#Process question No.2
question2 = df['Question'][10] 

basic_answer2 = basic_completion_query(question2)
custom_answer2 = custom_completion_query(question2)

# Display the results for the second question
print(f"Question: {question2}")
print("\n" + "="*50 + "\n")
print(f"Basic Answer: {basic_answer2}")
print("\n" + "="*50 + "\n")
print(f"Custom Answer: {custom_answer2}")


Question: How to prevent Parasites - Cysticercosis ?


Basic Answer: To prevent Parasites - Cysticercosis, it is important to practice good hygiene and sanitation habits, including:

1. Wash your hands frequently and thoroughly before handling food and after using the bathroom.

2. Cook meat to a safe internal temperature to kill any potential tapeworm larvae.

3. Avoid eating raw or undercooked meat, especially pork.

4. Wash and peel fruits and vegetables before eating them.

5. Avoid drinking untreated water, especially from rivers or lakes.

6. Maintain proper sanitation and hygiene practices when raising or slaughtering animals.

7. Keep your living space clean and free of pests.

8. If traveling to an area where Cysticercosis is common, avoid eating street food and make sure to drink only bottled or boiled water. 

9. Regularly deworm pets and avoid allowing them to eat raw or undercooked meat.

10. If you suspect you may have been exposed to Cysticercosis, seek medical attention