## Data Ingestion

In [19]:
# Import libraries and modules
import pandas as pd
import minsearch
from tqdm.auto import tqdm
from openai import OpenAI
from dotenv import load_dotenv
import os

load_dotenv()  # This loads the variables from .env into the environment

True

In [13]:
df = pd.read_csv('../App/data/Mental_Health_FAQ.csv')

df.isnull().sum()  # count missing values per column
df.head()
df.Question_ID.unique()  # 98 unique IDs
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 98 entries, 0 to 97
Data columns (total 3 columns):
 #   Column       Non-Null Count  Dtype 
---  ------       --------------  ----- 
 0   Question_ID  98 non-null     int64 
 1   Questions    98 non-null     object
 2   Answers      98 non-null     object
dtypes: int64(1), object(2)
memory usage: 2.4+ KB


In [6]:
documents = df.to_dict(orient='records')
import json
with open('../App/data/documents.json', 'w') as file:
    json.dump(documents, file)

In [7]:
print(len(documents))
documents[0]


98


{'Question_ID': 1590140,
 'Questions': 'What does it mean to have a mental illness?',
 'Answers': 'Mental illnesses are health conditions that disrupt a person’s thoughts, emotions, relationships, and daily functioning. They are associated with distress and diminished capacity to engage in the ordinary activities of daily life.\nMental illnesses fall along a continuum of severity: some are fairly mild and only interfere with some aspects of life, such as certain phobias. On the other end of the spectrum lie serious mental illnesses, which result in major functional impairment and interference with daily life. These include such disorders as major depression, schizophrenia, and bipolar disorder, and may require that the person receives care in a hospital.\nIt is important to know that mental illnesses are medical conditions that have nothing to do with a person’s character, intelligence, or willpower. Just as diabetes is a disorder of the pancreas, mental illness is a medical condit

### Minsearch

In [12]:
# I decided not to use keyword filtering: it was not helping the hit rate and was slightly lowering it

In [14]:
index = minsearch.Index(
    text_fields=['Answers'],
    keyword_fields=['Questions']
)

In [15]:
index

<minsearch.Index at 0x215c660b040>

In [16]:
index.fit(documents)

<minsearch.Index at 0x215c660b040>

In [17]:
index.text_fields

['Answers']
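
To make the idea of a text-field index more concrete, here is a toy TF-IDF search written in plain Python. It is only a sketch of the general technique, not minsearch's code: the helper names (`tokenize`, `build_toy_index`, `cosine`, `toy_search`) and the three sample documents are made up for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    # Lowercase, split on whitespace, and drop surrounding punctuation.
    tokens = (raw.strip('.,;:?!').lower() for raw in text.split())
    return [tok for tok in tokens if tok]


def build_toy_index(docs):
    # Compute smoothed IDF weights and one sparse TF-IDF vector per document.
    tokenized = [tokenize(doc) for doc in docs]
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    n_docs = len(docs)
    idf = {
        term: math.log((1 + n_docs) / (1 + freq)) + 1
        for term, freq in doc_freq.items()
    }
    vectors = [
        {term: count * idf[term] for term, count in Counter(tokens).items()}
        for tokens in tokenized
    ]
    return idf, vectors


def cosine(vec_a, vec_b):
    # Cosine similarity between two sparse vectors stored as dicts.
    dot = sum(weight * vec_b.get(term, 0.0) for term, weight in vec_a.items())
    norm_a = math.sqrt(sum(w * w for w in vec_a.values()))
    norm_b = math.sqrt(sum(w * w for w in vec_b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def toy_search(query, docs, idf, vectors, num_results=2):
    # Terms never seen in the corpus carry no weight, so they are skipped.
    counts = Counter(tok for tok in tokenize(query) if tok in idf)
    query_vec = {term: count * idf[term] for term, count in counts.items()}
    scored = sorted(
        ((cosine(query_vec, vec), i) for i, vec in enumerate(vectors)),
        reverse=True,
    )
    # Keep only documents that share at least one term with the query.
    return [docs[i] for score, i in scored[:num_results] if score > 0]


# Hypothetical sample documents, not taken from the FAQ dataset.
toy_docs = [
    "Symptoms can include changes in mood and sleep.",
    "Treatment often involves therapy or medication.",
    "Support groups connect people with shared experiences.",
]
toy_idf, toy_vectors = build_toy_index(toy_docs)
toy_results = toy_search("What are the symptoms?", toy_docs, toy_idf, toy_vectors)
print(toy_results)
```

Only the first toy document shares a term with the query, so it is the only one returned.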

## RAG Flow

In [20]:
client = OpenAI()

In [45]:
query = 'What are the symptoms of mental illness?'

In [46]:
def search(query):
    boost = {}

    results = index.search(
        query=query,
        filter_dict={},
        boost_dict=boost,
        num_results=5
    )

    return results

In [47]:
search(query)

[{'Question_ID': 1259439,
  'Questions': 'If I become involved in treatment what do I need to know?',
  'Answers': 'Beginning treatment is a big step for individuals and families and can be very overwhelming. It is important to continue involvement in the treatment process as much as possible. Some questions you will need to have answered include:\nWhat is known about the cause of this particular illness?\nAre there other diagnoses where these symptoms are common?\nDo you normally include a physical or neurological examination?\nAre there any additional tests or exams that you would recommend at this point?\nWould you advise an independent opinion from another psychiatrist at this point?\nWhat program of treatment is the most helpful with this diagnosis?\nWill this program involve services by other specialists? If so, who will be responsible for coordinating these services?\nWhat do you see as the family’s role in this program of treatment?\nHow much access will the family have to th

In [48]:
## LLM response
response = client.chat.completions.create(
    model='gpt-4o-mini',
    messages=[{"role": "user", "content": query}]
)

response.choices[0].message.content


'Symptoms of mental illness can vary widely depending on the specific disorder, but there are some common signs that may indicate the presence of a mental health condition. Here are some general symptoms:\n\n1. **Emotional Symptoms:**\n   - Persistent sadness or low mood\n   - Excessive worry or anxiety\n   - Irritability or mood swings\n   - Feelings of hopelessness or helplessness\n   - Emotional numbness or detachment\n\n2. **Cognitive Symptoms:**\n   - Difficulty concentrating or making decisions\n   - Memory problems\n   - Paranoia or unfounded fears\n   - Delusions or false beliefs\n   - Hallucinations (seeing or hearing things that are not there)\n\n3. **Behavioral Symptoms:**\n   - Social withdrawal or isolation\n   - Changes in appetite or weight\n   - Changes in sleep patterns (insomnia or excessive sleeping)\n   - Decline in work performance or academic achievement\n   - Increased use of substances like alcohol or drugs\n\n4. **Physical Symptoms:**\n   - Unexplained physical

In [49]:
print(_)

Symptoms of mental illness can vary widely depending on the specific disorder, but there are some common signs that may indicate the presence of a mental health condition. Here are some general symptoms:

1. **Emotional Symptoms:**
   - Persistent sadness or low mood
   - Excessive worry or anxiety
   - Irritability or mood swings
   - Feelings of hopelessness or helplessness
   - Emotional numbness or detachment

2. **Cognitive Symptoms:**
   - Difficulty concentrating or making decisions
   - Memory problems
   - Paranoia or unfounded fears
   - Delusions or false beliefs
   - Hallucinations (seeing or hearing things that are not there)

3. **Behavioral Symptoms:**
   - Social withdrawal or isolation
   - Changes in appetite or weight
   - Changes in sleep patterns (insomnia or excessive sleeping)
   - Decline in work performance or academic achievement
   - Increased use of substances like alcohol or drugs

4. **Physical Symptoms:**
   - Unexplained physical symptoms (headaches, sto

In [56]:
def build_prompt(query, search_results):
    prompt_template = """
    You're a mental health psychiatrist. Answer the QUESTION based on the CONTEXT from our mental questions and answer database.
    Use only the facts from the CONTEXT when answering the QUESTION.
    
    QUESTION: {question}
    
    CONTEXT:
    {context}
    """.strip()
    
    entry_template = """
    ANSWER: {Answers}
    """.strip()
    context = ""
    
    for doc in search_results:
        context = context + entry_template.format(**doc) + "\n\n"

    prompt = prompt_template.format(question=query, context=context).strip()
    return prompt
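
Because the template is indented inside the function, that indentation ends up in the prompt text (it is visible in the printed prompt in this notebook). One possible way to avoid it is `textwrap.dedent`, sketched below. The name `build_prompt_dedented` and the two sample results are hypothetical; the template wording is copied from the cell above.

```python
import textwrap

# Dedent once at module level so the template carries no leading spaces.
PROMPT_TEMPLATE = textwrap.dedent("""
    You're a mental health psychiatrist. Answer the QUESTION based on the CONTEXT from our mental questions and answer database.
    Use only the facts from the CONTEXT when answering the QUESTION.

    QUESTION: {question}

    CONTEXT:
    {context}
""").strip()


def build_prompt_dedented(query, search_results):
    # Join the retrieved answers with blank lines between them.
    context = "\n\n".join(f"ANSWER: {doc['Answers']}" for doc in search_results)
    return PROMPT_TEMPLATE.format(question=query, context=context)


# Hypothetical search results, only to show the resulting layout.
example_results = [{'Answers': 'First answer.'}, {'Answers': 'Second answer.'}]
example_prompt = build_prompt_dedented('What is stress?', example_results)
print(example_prompt)
```

The dedent is applied before `format`, so multi-line answers in the context are inserted unchanged.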

In [57]:
query='what are the symptoms of mental illness?'
search_results=search(query)
build_prompt(query, search_results)

"You're a mental health psychiatrist. Answer the QUESTION based on the CONTEXT from our mental questions and answer database.\n    Use only the facts from the CONTEXT when answering the QUESTION.\n    \n    QUESTION: what are the symptoms of mental illness?\n    \n    CONTEXT:\n    ANSWER: Beginning treatment is a big step for individuals and families and can be very overwhelming. It is important to continue involvement in the treatment process as much as possible. Some questions you will need to have answered include:\nWhat is known about the cause of this particular illness?\nAre there other diagnoses where these symptoms are common?\nDo you normally include a physical or neurological examination?\nAre there any additional tests or exams that you would recommend at this point?\nWould you advise an independent opinion from another psychiatrist at this point?\nWhat program of treatment is the most helpful with this diagnosis?\nWill this program involve services by other specialists? If

In [58]:
print(_)

You're a mental health psychiatrist. Answer the QUESTION based on the CONTEXT from our mental questions and answer database.
    Use only the facts from the CONTEXT when answering the QUESTION.
    
    QUESTION: what are the symptoms of mental illness?
    
    CONTEXT:
    ANSWER: Beginning treatment is a big step for individuals and families and can be very overwhelming. It is important to continue involvement in the treatment process as much as possible. Some questions you will need to have answered include:
What is known about the cause of this particular illness?
Are there other diagnoses where these symptoms are common?
Do you normally include a physical or neurological examination?
Are there any additional tests or exams that you would recommend at this point?
Would you advise an independent opinion from another psychiatrist at this point?
What program of treatment is the most helpful with this diagnosis?
Will this program involve services by other specialists? If so, who will 

In [59]:
def llm(prompt, model='gpt-4o-mini'):
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}]
    )
    
    return response.choices[0].message.content

In [60]:
query='What are the symptoms of mental illness?'
def rag(query, model='gpt-4o-mini'):
    search_results = search(query)
    prompt = build_prompt(query, search_results)
    #print(prompt)
    answer = llm(prompt, model=model)
    return answer



In [61]:
print(rag(query))

The symptoms of mental illness can vary widely depending on the specific condition, but generally, they can impact the way a person thinks, feels, behaves, or interacts with others. Common signs to be aware of include changes in mood, personality shifts, alterations in personal habits, and social withdrawal. In children under 18, these symptoms might be categorized as serious emotional disturbances (SEDs). It's crucial for parents and caregivers to observe any changes in their child's emotional state, as mental illnesses often first manifest in adolescence or early adulthood.


In [30]:
import pandas as pd

### Retrieval evaluation

In [62]:

df_questions = pd.read_csv('../App/data/ground-truth-data.csv')

In [63]:
ground_truth=df_questions.to_dict(orient='records')
len(ground_truth)

485

In [64]:
len(ground_truth)
for q in ground_truth[0:6]:
    print(q['question'])
type(ground_truth)

What are the different levels of severity for mental illnesses?
Can you explain how mental illnesses impact daily functioning?
Are mental illnesses linked to a person's character or intelligence?
What treatments are available for mental illnesses?
How effective is treatment for individuals with mental illness?
What percentage of mental health conditions develop by age 24?


list

In [70]:
ground_truth[0]

{'id': 1590140,
 'question': 'What are the different levels of severity for mental illnesses?'}

In [77]:
def hit_rate(relevance_total):
    cnt = 0

    for line in relevance_total:
        if True in line:
            cnt = cnt + 1

    return cnt / len(relevance_total)

def mrr(relevance_total):
    total_score = 0.0

    for line in relevance_total:
        for rank in range(len(line)):
            if line[rank]:
                # Count only the first relevant result for each query.
                total_score = total_score + 1 / (rank + 1)
                break

    return total_score / len(relevance_total)
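
A small worked example may help when reading the two metrics. The sketch below restates them compactly and applies them to four made-up relevance lists; `toy_relevance` is hypothetical data, not output from this notebook.

```python
def hit_rate(relevance_total):
    # Fraction of queries with at least one relevant result in the top k.
    return sum(any(line) for line in relevance_total) / len(relevance_total)


def mrr(relevance_total):
    # Mean of 1 / rank of the first relevant result (0 when there is none).
    total = 0.0
    for line in relevance_total:
        for rank, is_relevant in enumerate(line, start=1):
            if is_relevant:
                total += 1 / rank
                break
    return total / len(relevance_total)


toy_relevance = [
    [True, False, False],   # relevant at rank 1 -> reciprocal rank 1
    [False, False, True],   # relevant at rank 3 -> reciprocal rank 1/3
    [False, False, False],  # no relevant result -> contributes 0
    [False, True, False],   # relevant at rank 2 -> reciprocal rank 1/2
]

toy_hit_rate = hit_rate(toy_relevance)  # 3 of 4 queries have a hit: 0.75
toy_mrr = mrr(toy_relevance)            # (1 + 1/3 + 0 + 1/2) / 4
print(toy_hit_rate, toy_mrr)
```

Hit rate only asks whether the right document appears at all, while MRR also rewards placing it near the top.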

In [78]:
def minsearch_search(query):
    boost = {}

    results = index.search(
        query=query,
        filter_dict={},
        boost_dict=boost,
        num_results=5
    )

    return results

In [82]:
def evaluate(ground_truth, search_function):
    relevance_total = []

    for q in tqdm(ground_truth):
        doc_id = q['id']
        results = search_function(q)
        relevance = [d['Question_ID'] == doc_id for d in results]
        relevance_total.append(relevance)

    return {
        'hit_rate': hit_rate(relevance_total),
        'mrr': mrr(relevance_total),
    }

In [83]:
from tqdm.auto import tqdm

In [84]:
evaluate(ground_truth, lambda q: minsearch_search(q['question']))

  0%|          | 0/485 [00:00<?, ?it/s]

{'hit_rate': 0.9134020618556701, 'mrr': 0.7261168384879727}

### Finding the best parameters

In [85]:
df_validation = df_questions[:100]
df_test = df_questions[100:]
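
The notebook does not say how `ground-truth-data.csv` is ordered. If its rows are grouped by source document, taking the first 100 rows would cover only a few documents. A shuffled split is one alternative, sketched here on plain lists of records; `shuffled_split` and `toy_records` are hypothetical names and data.

```python
import random


def shuffled_split(records, n_validation, seed=1):
    # Shuffle a copy with a fixed seed so the split is reproducible.
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    return shuffled[:n_validation], shuffled[n_validation:]


# Hypothetical records: 4 documents with 5 questions each, grouped by id.
toy_records = [{'id': i // 5, 'question': f'question {i}'} for i in range(20)]
validation, test = shuffled_split(toy_records, n_validation=8)
print(len(validation), len(test))
```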

In [86]:
import random

def simple_optimize(param_ranges, objective_function, n_iterations=10):
    best_params = None
    best_score = float('-inf')  # We maximize the objective, so start from -inf.

    for _ in range(n_iterations):
        # Generate random parameters
        current_params = {}
        for param, (min_val, max_val) in param_ranges.items():
            if isinstance(min_val, int) and isinstance(max_val, int):
                current_params[param] = random.randint(min_val, max_val)
            else:
                current_params[param] = random.uniform(min_val, max_val)
        
        # Evaluate the objective function
        current_score = objective_function(current_params)
        
        # Update best if current is better
        if current_score > best_score:  # Maximizing: keep the higher score
            best_score = current_score
            best_params = current_params
    
    return best_params, best_score
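
To see the random-search idea in isolation, the sketch below runs a compact, seeded variant on a toy objective whose maximum is known (x = 2). `random_search` and `toy_objective` are hypothetical stand-ins and handle float ranges only; they are not the notebook's `simple_optimize`.

```python
import random


def random_search(param_ranges, objective_function, n_iterations=10, seed=0):
    # Sample parameters uniformly and keep the best-scoring sample (maximizing).
    rng = random.Random(seed)
    best_params, best_score = None, float('-inf')
    for _ in range(n_iterations):
        params = {
            name: rng.uniform(low, high)
            for name, (low, high) in param_ranges.items()
        }
        score = objective_function(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_objective(params):
    # Peaks at x = 2 with a score of 0; lower everywhere else.
    return -(params['x'] - 2.0) ** 2


best_params, best_score = random_search(
    {'x': (0.0, 4.0)}, toy_objective, n_iterations=200
)
print(best_params, best_score)
```

With enough samples the best `x` lands close to 2, but random search gives no guarantee of finding the exact optimum.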

In [87]:
gt_val = df_validation.to_dict(orient='records')

In [88]:
def minsearch_search(query, boost=None):
    if boost is None:
        boost = {}

    results = index.search(
        query=query,
        filter_dict={},
        boost_dict=boost,
        num_results=10  # top 10 here; the earlier evaluation used the top 5
    )

    return results

In [89]:
param_ranges = {
    'Answers': (0.0, 4.0),
}

def objective(boost_params):
    def search_function(q):
        return minsearch_search(q['question'], boost_params)

    results = evaluate(gt_val, search_function)
    return results['mrr']

In [90]:
simple_optimize(param_ranges, objective, n_iterations=20)

  0%|          | 0/100 [00:00<?, ?it/s]

({'Answers': 3.001079082237011}, 0.6187460317460317)

In [94]:
def minsearch_improved(query):
    boost = {
        'Answers': 3.001079,
    }

    results = index.search(
        query=query,
        filter_dict={},
        boost_dict=boost,
        num_results=5
    )

    return results

evaluate(ground_truth, lambda q: minsearch_improved(q['question']))

  0%|          | 0/485 [00:00<?, ?it/s]

{'hit_rate': 0.9134020618556701, 'mrr': 0.7261168384879727}
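
The boosted search scores exactly the same as the unboosted one. One plausible explanation, assuming the index combines per-field similarities as a boost-weighted sum, is that with a single text field a positive boost rescales every score by the same factor and so cannot change the ranking. The sketch below illustrates this with made-up similarity scores; `rank_documents` is a hypothetical helper, not a minsearch function.

```python
def rank_documents(field_scores, boost):
    # Assumed scoring rule: total = sum over fields of boost * similarity.
    weights = {field: boost.get(field, 1.0) for field in field_scores}
    n_docs = len(next(iter(field_scores.values())))
    totals = [
        sum(weights[field] * field_scores[field][i] for field in field_scores)
        for i in range(n_docs)
    ]
    # Return document positions ordered from highest to lowest total score.
    return sorted(range(n_docs), key=lambda i: totals[i], reverse=True)


# Hypothetical similarity scores for three documents.
single_field = {'Answers': [0.10, 0.45, 0.30]}
two_fields = {'Answers': [0.10, 0.45, 0.30], 'Questions': [0.90, 0.05, 0.30]}

single_plain = rank_documents(single_field, {})
single_boosted = rank_documents(single_field, {'Answers': 3.0})
two_plain = rank_documents(two_fields, {})
two_boosted = rank_documents(two_fields, {'Answers': 4.0})

print(single_plain, single_boosted)  # same order with one field
print(two_plain, two_boosted)        # order can change with two fields
```

Under that assumption, tuning a boost only becomes meaningful once the index has at least two text fields.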

## RAG evaluation

In [95]:
prompt2_template = """
You are an expert evaluator for a RAG system.
Your task is to analyze the relevance of the generated answer to the given question.
Based on the relevance of the generated answer, you will classify it
as "NON_RELEVANT", "PARTLY_RELEVANT", or "RELEVANT".

Here is the data for evaluation:

Question: {question}
Generated Answer: {answer_llm}

Please analyze the content and context of the generated answer in relation to the question
and provide your evaluation in parsable JSON without using code blocks:

{{
  "Relevance": "NON_RELEVANT" | "PARTLY_RELEVANT" | "RELEVANT",
  "Explanation": "[Provide a brief explanation for your evaluation]"
}}
""".strip()

In [96]:
len(ground_truth)

485

In [97]:
record = ground_truth[0]
record

{'id': 1590140,
 'question': 'What are the different levels of severity for mental illnesses?'}

In [98]:
record = ground_truth[0]
question = record['question']
answer_llm = rag(question)
print(answer_llm)

Mental illnesses fall along a continuum of severity. Some are fairly mild and may only interfere with certain aspects of life, such as specific phobias. On the other end of the spectrum are serious mental illnesses, which can result in significant functional impairment and disruption of daily life. Examples of these more severe conditions include major depression, schizophrenia, and bipolar disorder, which may require hospitalization for care. This indicates that the severity of mental illness can range from mild disturbances to major impairments, impacting a person's ability to engage in ordinary daily activities.


In [99]:
prompt = prompt2_template.format(question=question, answer_llm=answer_llm)
print(prompt)

You are an expert evaluator for a RAG system.
Your task is to analyze the relevance of the generated answer to the given question.
Based on the relevance of the generated answer, you will classify it
as "NON_RELEVANT", "PARTLY_RELEVANT", or "RELEVANT".

Here is the data for evaluation:

Question: What are the different levels of severity for mental illnesses?
Generated Answer: Mental illnesses fall along a continuum of severity. Some are fairly mild and may only interfere with certain aspects of life, such as specific phobias. On the other end of the spectrum are serious mental illnesses, which can result in significant functional impairment and disruption of daily life. Examples of these more severe conditions include major depression, schizophrenia, and bipolar disorder, which may require hospitalization for care. This indicates that the severity of mental illness can range from mild disturbances to major impairments, impacting a person's ability to engage in ordinary daily activitie

In [100]:
import json

In [101]:
df_sample = df_questions.sample(n=200, random_state=1)

In [102]:
sample = df_sample.to_dict(orient='records')

In [103]:
evaluations = []

for record in tqdm(sample):
    question = record['question']
    answer_llm = rag(question, model='gpt-3.5-turbo')

    prompt = prompt2_template.format(
        question=question,
        answer_llm=answer_llm
    )

    evaluation = llm(prompt)
    evaluation = json.loads(evaluation)

    evaluations.append((record, answer_llm, evaluation))

  0%|          | 0/200 [00:00<?, ?it/s]
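
The loop above calls `json.loads` directly, so a single malformed reply from the judge model would stop the whole run. A more tolerant parser is sketched below, assuming the reply is either bare JSON or JSON wrapped in a Markdown code fence; `parse_evaluation` is a hypothetical helper that returns `None` when the reply cannot be used.

```python
import json

FENCE = '`' * 3  # Markdown code-fence marker


def parse_evaluation(raw_text):
    # Return the parsed dict, or None if the reply is not usable JSON.
    text = raw_text.strip()
    if text.startswith(FENCE):
        # Drop the surrounding backticks and an optional "json" language tag.
        text = text.strip('`').strip()
        if text.lower().startswith('json'):
            text = text[4:]
    try:
        parsed = json.loads(text)
    except json.JSONDecodeError:
        return None
    if not isinstance(parsed, dict) or 'Relevance' not in parsed:
        return None
    return parsed


# Hypothetical judge replies, only to exercise the three cases.
plain_reply = '{"Relevance": "RELEVANT", "Explanation": "ok"}'
fenced_reply = (
    FENCE + 'json\n{"Relevance": "PARTLY_RELEVANT", "Explanation": "x"}\n' + FENCE
)
broken_reply = 'The answer looks relevant.'

print(parse_evaluation(plain_reply))
print(parse_evaluation(fenced_reply))
print(parse_evaluation(broken_reply))
```

Records whose evaluation comes back as `None` could then be skipped or retried instead of raising an exception mid-loop.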

In [108]:
df_eval = pd.DataFrame(evaluations, columns=['record', 'answer', 'evaluation'])
df_eval.head()

|   | record | answer | evaluation |
|---|--------|--------|------------|
| 0 | {'id': 1030153, 'question': 'How can I find a ... | To find a suitable mental health professional ... | {'Relevance': 'RELEVANT', 'Explanation': 'The ... |
| 1 | {'id': 6981545, 'question': 'Can you provide d... | The Bounce Back program is a guided self-help ... | {'Relevance': 'RELEVANT', 'Explanation': 'The ... |
| 2 | {'id': 9250044, 'question': 'What are some com... | Joining a support group for individuals experi... | {'Relevance': 'RELEVANT', 'Explanation': 'The ... |
| 3 | {'id': 1667863, 'question': 'Why is it importa... | It is important to seek help early for mental ... | {'Relevance': 'RELEVANT', 'Explanation': 'The ... |
| 4 | {'id': 4211025, 'question': 'What options are ... | Treatment options for individuals experiencing... | {'Relevance': 'RELEVANT', 'Explanation': 'The ... |


In [110]:
df_eval = pd.DataFrame(evaluations, columns=['record', 'answer', 'evaluation'])

df_eval['id'] = df_eval.record.apply(lambda d: d['id'])
df_eval['question'] = df_eval.record.apply(lambda d: d['question'])

df_eval['relevance'] = df_eval.evaluation.apply(lambda d: d['Relevance'])
df_eval['explanation'] = df_eval.evaluation.apply(lambda d: d['Explanation'])

del df_eval['record']
del df_eval['evaluation']

In [111]:
type(evaluations[0])

tuple

In [112]:
df_eval.relevance.value_counts()

relevance
RELEVANT           191
PARTLY_RELEVANT      5
NON_RELEVANT         4
Name: count, dtype: int64

In [113]:
df_eval.relevance.value_counts(normalize=True)

relevance
RELEVANT           0.955
PARTLY_RELEVANT    0.025
NON_RELEVANT       0.020
Name: proportion, dtype: float64

In [114]:
df_eval.to_csv('../App/data/rag-eval-gpt-3.5-turbo.csv', index=False)

In [115]:
evaluations_gpt4o_mini = []

for record in tqdm(sample):
    question = record['question']
    answer_llm = rag(question, model='gpt-4o-mini') 

    prompt = prompt2_template.format(
        question=question,
        answer_llm=answer_llm
    )

    evaluation = llm(prompt)
    evaluation = json.loads(evaluation)
    
    evaluations_gpt4o_mini.append((record, answer_llm, evaluation))

  0%|          | 0/200 [00:00<?, ?it/s]

In [120]:
df_eval = pd.DataFrame(evaluations_gpt4o_mini, columns=['record', 'answer', 'evaluation'])

df_eval['id'] = df_eval.record.apply(lambda d: d['id'])
df_eval['question'] = df_eval.record.apply(lambda d: d['question'])

df_eval['relevance'] = df_eval.evaluation.apply(lambda d: d['Relevance'])
df_eval['explanation'] = df_eval.evaluation.apply(lambda d: d['Explanation'])

del df_eval['record']
del df_eval['evaluation']

In [121]:
df_eval.relevance.value_counts()

relevance
RELEVANT           198
PARTLY_RELEVANT      2
Name: count, dtype: int64

In [122]:
df_eval.relevance.value_counts(normalize=True)

relevance
RELEVANT           0.99
PARTLY_RELEVANT    0.01
Name: proportion, dtype: float64

In [123]:

df_eval.to_csv('../App/data/rag-eval-gpt_4o_mini.csv', index=False)


In [124]:
import pandas as pd
df=pd.read_csv('../App/data/rag-eval-gpt_4o_mini.csv')

In [125]:
df.relevance.value_counts(normalize=True)

relevance
RELEVANT           0.99
PARTLY_RELEVANT    0.01
Name: proportion, dtype: float64