In [None]:
from nltk.translate.bleu_score import sentence_bleu
from rouge_score import rouge_scorer

In [2]:
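def calculate_f1(golden, generated):
    """Token-level F1 between a golden answer and a generated answer.

    Tokens are whitespace-split and compared as sets, so each distinct
    token counts once and matching is case- and punctuation-sensitive.
    """
    golden_tokens = set(golden.split())
    generated_tokens = set(generated.split())
    common_tokens = golden_tokens.intersection(generated_tokens)

    # No shared tokens: precision and recall are both zero.
    if not common_tokens:
        return 0.0

    precision = len(common_tokens) / len(generated_tokens)
    recall = len(common_tokens) / len(golden_tokens)
    f1 = 2 * (precision * recall) / (precision + recall)
    return f1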
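
`calculate_f1` above compares sets of raw whitespace tokens, so "Inspector," and "inspector" never match and a repeated word counts once. As a point of comparison, here is a minimal sketch of a count-based variant with light normalization. The names `normalize_text` and `token_f1_counts` are hypothetical and not part of this notebook, and only ASCII punctuation is stripped (curly quotes and dashes are left alone).

```python
import re
import string
from collections import Counter


def normalize_text(text):
    """Lowercase, drop ASCII punctuation, and collapse whitespace."""
    text = text.lower()
    text = text.translate(str.maketrans("", "", string.punctuation))
    return re.sub(r"\s+", " ", text).strip()


def token_f1_counts(golden, generated):
    """Token-level F1 where repeated tokens are counted (multiset overlap)."""
    golden_tokens = normalize_text(golden).split()
    generated_tokens = normalize_text(generated).split()
    if not golden_tokens or not generated_tokens:
        return 0.0

    # Counter & Counter keeps the minimum count of each shared token.
    overlap = sum((Counter(golden_tokens) & Counter(generated_tokens)).values())
    if overlap == 0:
        return 0.0

    precision = overlap / len(generated_tokens)
    recall = overlap / len(golden_tokens)
    return 2 * precision * recall / (precision + recall)
```

Scores from this variant are not comparable with the averages reported in this notebook, which were produced with the set-based `calculate_f1`.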

In [3]:
golden_answers = [
    "Whoever obstructs an Inspector in the exercise of any power conferred upon him by or under this Act, or disobeys the lawful authority of any Inspector, shall be punishable with imprisonment for a term which may extend to one year, or with fine which may extend to one lakh rupees, or with both.",
    "Shall be punishable with imprisonment for a term which shall not be less than three years or more than ten years and with fine which may extend to ten lakh rupees, and imprisonment for a term of less than three years.",
    "Drugs mentioned in SCHEDULE D.",
    "No fine defined in the corpus.",
    "The applicant must have a valid national identity card and passport. a person must first apply to the licensing authority in Form 8(A) or Form 8(B). The applicant must be a qualified pharmacist or medical practitioner. The applicant must have a valid and relevant drug sale license (DSL) from the pharmacy council."
]

#### LLAMA 3.2

*Chroma Retriever*

In [11]:
llama_chroma_answers = [
    "The text does not explicitly state the specific legal consequences for obstructing an officer during an investigation. However, based on the information provided in the document (Section 29 of The Drugs Act 1976), it can be inferred that obstruction of an inspector or other authority may result in punishment with imprisonment for a term not exceeding one year, or with fine not exceeding one lakh rupees, or both. However, to provide more specific guidance, I'll need to refer to the Code of Criminal Procedure, 1898 (Act V of 1898), which mentions that an offence punishable under Schedule-III other than an offence mentioned in clause (1) of that Schedule shall be non-cognizable. It's also worth noting that the penalties and consequences mentioned in Section 29 are for contraventions under The Drugs Act 1976, and not specifically for obstructing an officer during an investigation. Therefore, it would be best to consult with a legal professional or a relevant authority for more specific guidance on this matter.",
    "The penalty for exporting, importing, manufacturing, or selling spurious or unregistered drugs in Pakistan can be taken under the Drug Regulatory Authority of Pakistan Act 2012. As per the act, there are severe penalties and legal consequences for those involved in such activities. These include:* Imprisonment * Fine * Seizure of goods.",
    "According to the text, there is no specific list of drugs that are explicitly mentioned as being restricted to sale only with a prescription from a registered medical practitioner. However, it can be inferred that 'special medicinal products' such as: * Dietary supplements * Nutraceuticals * Cosmeceuticals * Contrast/dyes * Radiopharmaceuticals * Surgical devices * Implants * Alternative therapies are regulated by specific policies and procedures, and their use may require a prescription or authorization from a registered medical practitioner. Additionally, the text mentions that there are 'policies and procedures' for the procurement, control, and use of these special medicinal products, which suggests that they may be subject to certain restrictions or requirements.",
    "The question does not mention anything about imposing a fine specifically for labeling a drug with the words 'Physician Sample - Not for Sale'. However, I can provide some general information. According to Rule 6(a) of the Punjab-Drug-Sale-Rules, any person who sells or offers to sell without a licence is punishable with imprisonment for a term which may extend to three years and fine which may extend to twenty-five thousand rupees.",
    "According to the text, the prerequisites for obtaining a license to open and operate a pharmacy or medical store in Pakistan are: 1. The applicant must submit an application in Form 8(A) or Form 8(B) along with the required fee. 2. The fee for a pharmacy license is Rs. 3,000, and for a medical store license, it is Rs. 2,000 (for new applications). 3. For renewal of licenses: * Pharmacy license: Rs. 1,000 * Medical store license: Rs. 1,000 4. The applicant must pay 50% of the fee for changing the qualified person or obtaining a duplicate copy of the license. 5. The licensing authority may refuse an application if it is not expedient in public interest to grant a license Additionally, the following conditions must be met: * The applicant must meet the storage conditions and store the drugs properly (for medical stores). If the applicant intends to sell or distribute drugs at more than one place, they must apply for separate licenses for each location."
]

In [12]:
lcr_exact_matches = 0
lcr_f1_scores = []
lcr_bleu_scores = []
lcr_rouge_scores = []

In [13]:
# Build the ROUGE scorer once and reuse it for every answer pair.
scorer = rouge_scorer.RougeScorer(['rouge1', 'rougeL'], use_stemmer=True)

for golden, chatbot in zip(golden_answers, llama_chroma_answers):
    # Exact Match
    if golden.strip().lower() == chatbot.strip().lower():
        lcr_exact_matches += 1

    # F1 Score
    lcr_f1_scores.append(calculate_f1(golden, chatbot))

    # BLEU Score
    lcr_bleu_scores.append(sentence_bleu([golden.split()], chatbot.split()))

    # ROUGE Score
    lcr_rouge_scores.append(scorer.score(golden, chatbot))

# Accuracy (Exact Match Score)
accuracy = (lcr_exact_matches / len(golden_answers)) * 100

# Results Summary
print("Evaluation Metrics for LLAMA 3.2 - Chroma Retriever")
print("*" * 25)
print(f"Exact Match (EM) Score: {accuracy:.2f}%")
print(f"Average F1 Score: {sum(lcr_f1_scores) / len(lcr_f1_scores):.2f}")
print(f"Average BLEU Score: {sum(lcr_bleu_scores) / len(lcr_bleu_scores):.2f}")
rouge1_avg = sum([score['rouge1'].fmeasure for score in lcr_rouge_scores]) / len(lcr_rouge_scores)
rougeL_avg = sum([score['rougeL'].fmeasure for score in lcr_rouge_scores]) / len(lcr_rouge_scores)
print(f"Average ROUGE-1 Score: {rouge1_avg:.2f}")
print(f"Average ROUGE-L Score: {rougeL_avg:.2f}")

# Error Analysis
print("\nError Analysis:")
for idx, (golden, chatbot) in enumerate(zip(golden_answers, llama_chroma_answers), 1):
    if golden.strip().lower() != chatbot.strip().lower():
        print(f"Question {idx}:")
        print(f"  Golden Answer: {golden}")
        print(f"  Chatbot Answer: {chatbot}")
        print("  Issue: Potential mismatch or incomplete response\n")

Evaluation Metrics for LLAMA 3.2 - Chroma Retriever
*************************
Exact Match (EM) Score: 0.00%
Average F1 Score: 0.18
Average BLEU Score: 0.02
Average ROUGE-1 Score: 0.19
Average ROUGE-L Score: 0.13

Error Analysis:
Question 1:
  Golden Answer: Whoever obstructs an Inspector in the exercise of any power conferred upon him by or under this Act, or disobeys the lawful authority of any Inspector, shall be punishable with imprisonment for a term which may extend to one year, or with fine which may extend to one lakh rupees, or with both.
  Chatbot Answer: The text does not explicitly state the specific legal consequences for obstructing an officer during an investigation. However, based on the information provided in the document (Section 29 of The Drugs Act 1976), it can be inferred that obstruction of an inspector or other authority may result in punishment with imprisonment for a term not exceeding one year, or with fine not exceeding one lakh rupees, or both. However, to p

The hypothesis contains 0 counts of 2-gram overlaps.
Therefore the BLEU score evaluates to 0, independently of
how many N-gram overlaps of lower order it contains.
Consider using lower n-gram order or use SmoothingFunction()
The hypothesis contains 0 counts of 3-gram overlaps.
Therefore the BLEU score evaluates to 0, independently of
how many N-gram overlaps of lower order it contains.
Consider using lower n-gram order or use SmoothingFunction()
The hypothesis contains 0 counts of 4-gram overlaps.
Therefore the BLEU score evaluates to 0, independently of
how many N-gram overlaps of lower order it contains.
Consider using lower n-gram order or use SmoothingFunction()
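
The warnings above come from unsmoothed BLEU: the score is a geometric mean of 1- to 4-gram precisions, so a single order with zero matches drives the whole score to 0. NLTK's own remedy is to pass a `SmoothingFunction` method via the `smoothing_function` argument of `sentence_bleu`. The sketch below only illustrates the mechanism in plain Python and is not NLTK's implementation; `bleu_sketch` and its `epsilon` parameter are hypothetical names, with `epsilon` standing in for the small count that this style of smoothing substitutes for a zero match count.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Count the n-grams of order n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def modified_precisions(reference, hypothesis, max_n=4):
    """Return (matches, total) per order, clipping matches by reference counts."""
    stats = []
    for n in range(1, max_n + 1):
        hyp = ngram_counts(hypothesis, n)
        ref = ngram_counts(reference, n)
        matches = sum(min(count, ref[gram]) for gram, count in hyp.items())
        stats.append((matches, sum(hyp.values())))
    return stats


def bleu_sketch(reference, hypothesis, max_n=4, epsilon=0.0):
    """Single-reference BLEU with uniform weights and optional epsilon smoothing."""
    log_sum = 0.0
    for matches, total in modified_precisions(reference, hypothesis, max_n):
        if total == 0:
            return 0.0  # hypothesis is shorter than this n-gram order
        if matches == 0:
            if epsilon == 0.0:
                return 0.0  # one zero precision zeroes the geometric mean
            precision = epsilon / total
        else:
            precision = matches / total
        log_sum += math.log(precision) / max_n

    # Brevity penalty: penalize hypotheses shorter than the reference.
    ref_len, hyp_len = len(reference), len(hypothesis)
    penalty = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return penalty * math.exp(log_sum)


reference = "the fine may extend to one lakh rupees".split()
hypothesis = "rupees lakh one to extend".split()  # every word matches, no bigram does
print(bleu_sketch(reference, hypothesis))               # unsmoothed
print(bleu_sketch(reference, hypothesis, epsilon=0.1))  # smoothed
```

With smoothing the score stays small but nonzero, which makes per-question BLEU values easier to compare than a column of zeros.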


*BM25 Retriever*

In [15]:
bmans1 = """
The provided text does not mention anything about obstructing an officer during an investigation or its related legal consequences. It appears to be a document providing guidelines for reporting Adverse Events (AEs) and Adverse Reactions to Medicinal Products (ARMPs) through a mobile application, specifically Med Safety.

However, if you're looking for general information on the legal consequences of obstructing an officer during an investigation, it can vary depending on the jurisdiction and specific circumstances. In many places, obstructing an officer or interfering with their duties can be considered a crime, such as:

1. Obstruction of Justice: This is a federal offense in some countries that carries penalties such as fines, imprisonment, or both.
2. Interference with Law Enforcement: Some jurisdictions consider this a crime, punishable by law, and may result in fines or prison time.\n3. Resisting Arrest: If an individual resists arrest or attempts to flee from the scene, they may be charged with resisting arrest.

The specific penalties can range widely depending on the jurisdiction, circumstances, and the severity of the offense.
"""

bmans2 = """
According to the provided text snippet, there is no explicit mention of a specific penalty for exporting, importing, manufacturing, or selling spurious or unregistered drugs. However, the text does mention that the Authority may engage in regulating, enforcing, and monitoring advertisement rules and banning false advertisements.

In general, the regulations regarding the sale and distribution of pharmaceuticals, including penalties for spurious or unregistered drugs, are typically outlined by government agencies such as the Food and Drug Administration (FDA) or equivalent bodies in other countries. These regulations can vary depending on the jurisdiction, but they often include provisions for severe penalties for those involved in the unauthorized manufacture, sale, or distribution of pharmaceuticals.

Some possible penalties for exporting, importing, manufacturing, or selling spurious or unregistered drugs could include:

*   Fines and monetary penalties
*   Imprisonment or imprisonment terms
*   Revocation of licenses to manufacture or sell pharmaceuticals
*   Forfeiture of assets or profits derived from the sale of spurious or unregistered drugs

It's essential to consult specific laws, regulations, or guidelines in a particular country or region for accurate information on penalties related to the sale or distribution of spurious or unregistered drugs.
"""

bmans3 = """
The question is not explicitly stated in the text, but based on the context of the Adverse Events Reporting Guidelines for Healthcare Professionals, it can be inferred that the answer refers to therapeutic goods.

According to the guidelines, there is no specific mention of "drugs" being restricted to sale only with a prescription from a registered medical practitioner. However, section 7.1.2 mentions that products which are classified as Class A or Class B under the Drugs and Medicines Ordinance (DMO) should not be sold without a valid prescription.

In Pakistan, Class A and Class B drugs are typically those that require a prescription for sale due to their potential for abuse, dependency, or other safety concerns. Examples of such drugs might include controlled substances, such as opioids or sedatives, as well as certain medications used to treat psychiatric disorders or addictive diseases.

Examples of such restricted drugs in Pakistan would include:

* Opioids (e.g., morphine, oxycodone)* Benzodiazepines (e.g., diazepam, alprazolam)
* Stimulants (e.g., amphetamines, methylphenidate)
* Certain psychiatric medications (e.g., risperidone, aripiprazole)

Please note that this is not an exhaustive list, and the specific drugs restricted to sale only with a prescription from a registered medical practitioner may vary depending on local regulations and guidelines.
"""

bmans4 = """
The text does not provide information on the fine imposed for selling a drug labeled as "Physician Sample – Not for Sale". However, I can tell you that according to the DRA Guidelines, selling a drug labeled as "Physician Sample – Not for Sale" is considered an offense and is punishable under the law. The exact penalty or fine imposed would depend on the specific circumstances and the laws in place at the time of the incident.
"""

bmans5 = """
Unfortunately, this text does not appear to be related to the prerequisites for opening and operating a pharmacy or medical store. The text seems to be a pharmaceutical regulatory guideline document, specifically a set of guidelines for regulatory appeals.

However, I can try to provide some general information on the typical prerequisites for obtaining a license to open and operate a pharmacy or medical store.

In most countries, the prerequisites for opening and operating a pharmacy or medical store typically include:

1. Obtaining a pharmacy license or medical store registration from the relevant authorities.
2. Completing a training program or apprenticeship in pharmacy or medicine.
3. Passing a licensing examination to demonstrate proficiency in pharmacy or medical knowledge.
4. Meeting specific requirements related to education, experience, and character.
5. Having a valid business license and tax identification number.

Please note that these prerequisites may vary depending on the country, state, or region where you plan to open your pharmacy or medical store. It's essential to check with the relevant authorities for specific requirements and regulations.
"""

In [16]:
llama_bm25_answers = [
    bmans1,
    bmans2,
    bmans3,
    bmans4,
    bmans5
]

In [17]:
lbm_exact_matches = 0
lbm_f1_scores = []
lbm_bleu_scores = []
lbm_rouge_scores = []

In [18]:
# Build the ROUGE scorer once and reuse it for every answer pair.
scorer = rouge_scorer.RougeScorer(['rouge1', 'rougeL'], use_stemmer=True)

for golden, chatbot in zip(golden_answers, llama_bm25_answers):
    # Exact Match
    if golden.strip().lower() == chatbot.strip().lower():
        lbm_exact_matches += 1

    # F1 Score
    lbm_f1_scores.append(calculate_f1(golden, chatbot))

    # BLEU Score
    lbm_bleu_scores.append(sentence_bleu([golden.split()], chatbot.split()))

    # ROUGE Score
    lbm_rouge_scores.append(scorer.score(golden, chatbot))

# Accuracy (Exact Match Score)
accuracy = (lbm_exact_matches / len(golden_answers)) * 100

# Results Summary
print("Evaluation Metrics for LLAMA 3.2 - BM25 Retriever")
print("*" * 25)
print(f"Exact Match (EM) Score: {accuracy:.2f}%")
print(f"Average F1 Score: {sum(lbm_f1_scores) / len(lbm_f1_scores):.2f}")
print(f"Average BLEU Score: {sum(lbm_bleu_scores) / len(lbm_bleu_scores):.2f}")
rouge1_avg = sum([score['rouge1'].fmeasure for score in lbm_rouge_scores]) / len(lbm_rouge_scores)
rougeL_avg = sum([score['rougeL'].fmeasure for score in lbm_rouge_scores]) / len(lbm_rouge_scores)
print(f"Average ROUGE-1 Score: {rouge1_avg:.2f}")
print(f"Average ROUGE-L Score: {rougeL_avg:.2f}")

# Error Analysis
print("\nError Analysis:")
for idx, (golden, chatbot) in enumerate(zip(golden_answers, llama_bm25_answers), 1):
    if golden.strip().lower() != chatbot.strip().lower():
        print(f"Question {idx}:")
        print(f"  Golden Answer: {golden}")
        print(f"  Chatbot Answer: {chatbot}")
        print("  Issue: Potential mismatch or incomplete response\n")

Evaluation Metrics for LLAMA 3.2 - BM25 Retriever
*************************
Exact Match (EM) Score: 0.00%
Average F1 Score: 0.13
Average BLEU Score: 0.00
Average ROUGE-1 Score: 0.14
Average ROUGE-L Score: 0.09

Error Analysis:
Question 1:
  Golden Answer: Whoever obstructs an Inspector in the exercise of any power conferred upon him by or under this Act, or disobeys the lawful authority of any Inspector, shall be punishable with imprisonment for a term which may extend to one year, or with fine which may extend to one lakh rupees, or with both.
  Chatbot Answer: 
The provided text does not mention anything about obstructing an officer during an investigation or its related legal consequences. It appears to be a document providing guidelines for reporting Adverse Events (AEs) and Adverse Reactions to Medicinal Products (ARMPs) through a mobile application, specifically Med Safety.

However, if you're looking for general information on the legal consequences of obstructing an officer dur

The hypothesis contains 0 counts of 2-gram overlaps.
Therefore the BLEU score evaluates to 0, independently of
how many N-gram overlaps of lower order it contains.
Consider using lower n-gram order or use SmoothingFunction()
The hypothesis contains 0 counts of 3-gram overlaps.
Therefore the BLEU score evaluates to 0, independently of
how many N-gram overlaps of lower order it contains.
Consider using lower n-gram order or use SmoothingFunction()
The hypothesis contains 0 counts of 4-gram overlaps.
Therefore the BLEU score evaluates to 0, independently of
how many N-gram overlaps of lower order it contains.
Consider using lower n-gram order or use SmoothingFunction()
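
The evaluation cell is repeated almost verbatim for each retriever, with only the answer list and the variable prefix changing. One possible way to factor that out is sketched below; `evaluate_answers` and `exact_match` are hypothetical helpers, not part of this notebook. Metrics are passed in as plain callables, so in this notebook they could wrap `calculate_f1`, `sentence_bleu`, or `scorer.score` (for example `lambda g, c: scorer.score(g, c)['rouge1'].fmeasure`).

```python
from statistics import mean


def evaluate_answers(golden_answers, generated_answers, metrics):
    """Average each metric over aligned (golden, generated) answer pairs.

    `metrics` maps a display name to a callable that takes
    (golden, generated) and returns a float.
    """
    if len(golden_answers) != len(generated_answers):
        raise ValueError("answer lists must have the same length")
    pairs = list(zip(golden_answers, generated_answers))
    return {
        name: mean(metric(golden, generated) for golden, generated in pairs)
        for name, metric in metrics.items()
    }


def exact_match(golden, generated):
    """Same comparison as the notebook: case-insensitive, outer whitespace ignored."""
    return float(golden.strip().lower() == generated.strip().lower())


demo = evaluate_answers(
    ["Drugs mentioned in SCHEDULE D."],
    ["drugs mentioned in schedule d. "],
    {"Exact Match": exact_match},
)
print(demo)
```

Raising on a length mismatch is a deliberate choice here: `zip` would otherwise silently drop unmatched answers and the averages would cover fewer questions than intended.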


*Ensemble Retriever*

In [19]:
ens1 = """
The text provided does not specifically mention the legal consequences for obstructing an officer during an investigation. However, I can provide general information on this topic.

Obstructing an officer during an investigation is considered a serious offense in many jurisdictions and can result in various legal consequences. The specific penalties may vary depending on the jurisdiction, the severity of the obstruction, and other factors.

In some cases, obstructing an officer during an investigation may be charged as a misdemeanor or felony offense, and can carry penalties such as:
*   Fines
*   Imprisonment (including short-term or extended sentences)
*   Probation
*   Community service

Additionally, in some jurisdictions, obstructing an officer during an investigation may also result in additional consequences, such as:

*   Suspension of driver's license
*   Revocation of professional licenses
*   Mandatory attendance at a court-ordered anger management program or other treatment programs
*   Restitution to the victim(s)

It is essential to note that these are general examples and the specific penalties may vary depending on the jurisdiction, the severity of the obstruction, and other factors.

If you have concerns about obstructing an officer during an investigation, I recommend consulting with a qualified attorney or law enforcement professional for guidance.
"""

ens2 = """
The text does not specifically mention a penalty for exporting, importing, manufacturing, or selling spurious or unregistered drugs. However, it mentions that "each company shall provide a detailed summary of all expenditures incurred on institutions or healthcare professionals on account of marketing, honoraria, travel, subsistence expenses, grants and any other related financial transaction" to the relevant tax authorities and the Authority.

It is likely that exporting, importing, manufacturing, or selling spurious or unregistered drugs would be considered a breach of the rules and could result in penalties, but the specific penalty is not mentioned in this text.
"""

ens3 = """
According to the provided text, High Alert Medication is restricted to sale only with a prescription from a registered medical practitioner.
"""

ens4 = """
The question does not mention the imposition of a fine, but rather asks about the label on a sample of a drug. According to the rules, a sample of a drug may be provided to physicians or dentists or pharmacists or veterinarians or medical institutions in reasonable quantities and in reduced packings marked with the words "Physician Sample - Not for Sale".
"""

ens5 = """
According to the rules mentioned in the text, the prerequisites for obtaining a license to open and operate a pharmacy or medical store include:

1. The pharmacy or medical store must be registered with the relevant authorities.
2. The premises where the pharmacy or medical store is located must meet certain requirements, such as having proper infrastructure, facilities, and working conditions.
3. The licensee (the person or entity applying for the license) must have a valid prescription from a licensed medical practitioner.
4. The licensee must provide proof of qualifications, experience, and character to the licensing authority.
5. The pharmacy or medical store must comply with all relevant laws, regulations, and rules governing the sale and dispensing of pharmaceuticals and other medicinal products.

Additionally, certain documents such as an invoice and warranty shall bear the full name and address of the purchaser and shall be signed by the warrantor clearly indicating his name and shall be dated. The manufacturer, importer or seller of a drug must maintain record of purchase or sale of a drug and preserve the record for at least three years containing the required particulars.

It is also mentioned that certain substances specified in Schedule E and listed poisons should be stored in a specific manner to prevent leakage arising from ordinary risks of handling and transport.
"""

In [22]:
llama_ensemble_answers = [
    ens1,
    ens2,
    ens3,
    ens4,
    ens5
]

In [20]:
lens_exact_matches = 0
lens_f1_scores = []
lens_bleu_scores = []
lens_rouge_scores = []

In [23]:
# Build the ROUGE scorer once and reuse it for every answer pair.
scorer = rouge_scorer.RougeScorer(['rouge1', 'rougeL'], use_stemmer=True)

for golden, chatbot in zip(golden_answers, llama_ensemble_answers):
    # Exact Match
    if golden.strip().lower() == chatbot.strip().lower():
        lens_exact_matches += 1

    # F1 Score
    lens_f1_scores.append(calculate_f1(golden, chatbot))

    # BLEU Score
    lens_bleu_scores.append(sentence_bleu([golden.split()], chatbot.split()))

    # ROUGE Score
    lens_rouge_scores.append(scorer.score(golden, chatbot))

# Accuracy (Exact Match Score)
accuracy = (lens_exact_matches / len(golden_answers)) * 100

# Results Summary
print("Evaluation Metrics for LLAMA 3.2 - Ensemble Retriever")
print("*" * 25)
print(f"Exact Match (EM) Score: {accuracy:.2f}%")
print(f"Average F1 Score: {sum(lens_f1_scores) / len(lens_f1_scores):.2f}")
print(f"Average BLEU Score: {sum(lens_bleu_scores) / len(lens_bleu_scores):.2f}")
rouge1_avg = sum([score['rouge1'].fmeasure for score in lens_rouge_scores]) / len(lens_rouge_scores)
rougeL_avg = sum([score['rougeL'].fmeasure for score in lens_rouge_scores]) / len(lens_rouge_scores)
print(f"Average ROUGE-1 Score: {rouge1_avg:.2f}")
print(f"Average ROUGE-L Score: {rougeL_avg:.2f}")

# Error Analysis
print("\nError Analysis:")
for idx, (golden, chatbot) in enumerate(zip(golden_answers, llama_ensemble_answers), 1):
    if golden.strip().lower() != chatbot.strip().lower():
        print(f"Question {idx}:")
        print(f"  Golden Answer: {golden}")
        print(f"  Chatbot Answer: {chatbot}")
        print("  Issue: Potential mismatch or incomplete response\n")

Evaluation Metrics for LLAMA 3.2 - Ensemble Retriever
*************************
Exact Match (EM) Score: 0.00%
Average F1 Score: 0.13
Average BLEU Score: 0.00
Average ROUGE-1 Score: 0.14
Average ROUGE-L Score: 0.09

Error Analysis:
Question 1:
  Golden Answer: Whoever obstructs an Inspector in the exercise of any power conferred upon him by or under this Act, or disobeys the lawful authority of any Inspector, shall be punishable with imprisonment for a term which may extend to one year, or with fine which may extend to one lakh rupees, or with both.
  Chatbot Answer: 
The text provided does not specifically mention the legal consequences for obstructing an officer during an investigation. However, I can provide general information on this topic.

Obstructing an officer during an investigation is considered a serious offense in many jurisdictions and can result in various legal consequences. The specific penalties may vary depending on the jurisdiction, the severity of the obstruction, a

The hypothesis contains 0 counts of 2-gram overlaps.
Therefore the BLEU score evaluates to 0, independently of
how many N-gram overlaps of lower order it contains.
Consider using lower n-gram order or use SmoothingFunction()
The hypothesis contains 0 counts of 3-gram overlaps.
Therefore the BLEU score evaluates to 0, independently of
how many N-gram overlaps of lower order it contains.
Consider using lower n-gram order or use SmoothingFunction()
The hypothesis contains 0 counts of 4-gram overlaps.
Therefore the BLEU score evaluates to 0, independently of
how many N-gram overlaps of lower order it contains.
Consider using lower n-gram order or use SmoothingFunction()


#### Mistral 7b

*Ensemble Retriever - Equal Weights (0.5, 0.5)*

In [4]:
mistral_equal_answers = [
    "According to Section 12(1) of the Drugs Act, 1961, any person who obstructs or hinders an officer in the performance of his duties under this Act shall be punishable with imprisonment for a term which may extend to six months, or a fine, or both.",
    "The penalty for exporting, importing, manufacturing, or selling spurious or unregistered drugs is imprisonment for up to five years or a fine of up to Rs. 10 million, or both.",
    "Drugs specified in the Schedules B and D and a preparation containing such drug shall not be sold except on and in accordance with the prescription (original to be retained by the pharmacy or the medical store) of a registered medical practitioner.",
    "The fine for selling a drug labeled as “Physician Sample – Not for Sale” is not specified in the given document.",
    "Education and qualification: The owner or manager of the pharmacy or medical store must have a degree in pharmacy or a related field. Registration: The pharmacy or medical store must be registered with the relevant authorities, such as the Pakistan Medical and Dental Council (PM&DC) and the Drug Regulatory Authority of Pakistan (DRAP). Licensing: The pharmacy or medical store must obtain a license from the relevant authorities, such as the PM&DC and the DRAP, to operate legally. Facilities: The pharmacy or medical store must have adequate facilities, such as a clean and well-ventilated space, proper lighting, and adequate storage facilities for medicines and medical supplies. Equipment: The pharmacy or medical store must have the necessary equipment, such as scales, measuring cups, and dispensing equipment, to accurately measure and dispense medicines. Staff: The pharmacy or medical store must have qualified and trained staff, such as pharmacists and assistants, to dispense and advise on medicines. 7. Compliance: The pharmacy or medical store must comply with all relevant laws and regulations, such as the Pakistan Medical and Dental Council Act, 1962, and the Drug Regulatory Authority of Pakistan Act, 1990."
]

In [5]:
mf_exact_matches = 0
mf_f1_scores = []
mf_bleu_scores = []
mf_rouge_scores = []

In [6]:
# Build the ROUGE scorer once and reuse it for every answer pair.
scorer = rouge_scorer.RougeScorer(['rouge1', 'rougeL'], use_stemmer=True)

for golden, chatbot in zip(golden_answers, mistral_equal_answers):
    # Exact Match
    if golden.strip().lower() == chatbot.strip().lower():
        mf_exact_matches += 1

    # F1 Score
    mf_f1_scores.append(calculate_f1(golden, chatbot))

    # BLEU Score
    mf_bleu_scores.append(sentence_bleu([golden.split()], chatbot.split()))

    # ROUGE Score
    mf_rouge_scores.append(scorer.score(golden, chatbot))

# Accuracy (Exact Match Score)
accuracy = (mf_exact_matches / len(golden_answers)) * 100

# Results Summary
print("Evaluation Metrics for Mistral 7b - Ensemble Retriever (Equal Weights)")
print("*" * 25)
print(f"Exact Match (EM) Score: {accuracy:.2f}%")
print(f"Average F1 Score: {sum(mf_f1_scores) / len(mf_f1_scores):.2f}")
print(f"Average BLEU Score: {sum(mf_bleu_scores) / len(mf_bleu_scores):.2f}")
rouge1_avg = sum([score['rouge1'].fmeasure for score in mf_rouge_scores]) / len(mf_rouge_scores)
rougeL_avg = sum([score['rougeL'].fmeasure for score in mf_rouge_scores]) / len(mf_rouge_scores)
print(f"Average ROUGE-1 Score: {rouge1_avg:.2f}")
print(f"Average ROUGE-L Score: {rougeL_avg:.2f}")

# Error Analysis
print("\nError Analysis:")
for idx, (golden, chatbot) in enumerate(zip(golden_answers, mistral_equal_answers), 1):
    if golden.strip().lower() != chatbot.strip().lower():
        print(f"Question {idx}:")
        print(f"  Golden Answer: {golden}")
        print(f"  Chatbot Answer: {chatbot}")
        print("  Issue: Potential mismatch or incomplete response\n")

Evaluation Metrics for Mistral 7b - Ensemble Retriever (Equal Weights)
*************************
Exact Match (EM) Score: 0.00%
Average F1 Score: 0.30
Average BLEU Score: 0.05
Average ROUGE-1 Score: 0.30
Average ROUGE-L Score: 0.25

Error Analysis:
Question 1:
  Golden Answer: Whoever obstructs an Inspector in the exercise of any power conferred upon him by or under this Act, or disobeys the lawful authority of any Inspector, shall be punishable with imprisonment for a term which may extend to one year, or with fine which may extend to one lakh rupees, or with both.
  Chatbot Answer: According to Section 12(1) of the Drugs Act, 1961, any person who obstructs or hinders an officer in the performance of his duties under this Act shall be punishable with imprisonment for a term which may extend to six months, or a fine, or both.
  Issue: Potential mismatch or incomplete response

Question 2:
  Golden Answer: Shall be punishable with imprisonment for a term which shall not be less than thre

The hypothesis contains 0 counts of 3-gram overlaps.
Therefore the BLEU score evaluates to 0, independently of
how many N-gram overlaps of lower order it contains.
Consider using lower n-gram order or use SmoothingFunction()
The hypothesis contains 0 counts of 4-gram overlaps.
Therefore the BLEU score evaluates to 0, independently of
how many N-gram overlaps of lower order it contains.
Consider using lower n-gram order or use SmoothingFunction()
The hypothesis contains 0 counts of 2-gram overlaps.
Therefore the BLEU score evaluates to 0, independently of
how many N-gram overlaps of lower order it contains.
Consider using lower n-gram order or use SmoothingFunction()


*Ensemble Retriever - Varied Weights (0.7, 0.3)*

In [7]:
mistral_ensemble_answers = [
    "According to Section 18(1)(i) of the Act, an officer may seize any drug or other material if they believe it to be in violation of the Act or the rules. If a person obstructs an officer during an investigation, they may be charged with obstruction of justice under Section 18(1)(ii) of the Act. This offense is punishable by imprisonment for a term not exceeding six months, or a fine not exceeding one hundred thousand rupees, or both.",
    "The penalty for exporting, importing, manufacturing, or selling spurious or unregistered drugs is imprisonment for up to three years or a fine of up to PKR 5 million, or both.",
    "Drugs specified in the Schedules B and D and a preparation containing such drug shall not be sold except on and in accordance with the prescription (original to be retained by the pharmacy or the medical store) of a registered medical practitioner.",
    "No fine is imposed for selling a drug labeled as “Physician Sample – Not for Sale”.",
    "Completion of a recognized degree in pharmacy or a related field. Passing the Professional and Technical Competency Assessment (PTCA) conducted by the Pakistan Pharmacy Council (PPC). Obtaining a license from the provincial or federal government to operate a pharmacy or medical store. Complying with the regulations and guidelines set by the Drug Regulatory Authority of Pakistan (DRAP) and the PPC. Having adequate financial resources to invest in the establishment and operation of the pharmacy or medical store. Meeting the minimum standards set by the PPC and DRAP for the establishment and operation of a pharmacy or medical store."
]

In [8]:
mens_exact_matches = 0
mens_f1_scores = []
mens_bleu_scores = []
mens_rouge_scores = []

In [10]:
# Build the ROUGE scorer once and reuse it for every answer pair.
scorer = rouge_scorer.RougeScorer(['rouge1', 'rougeL'], use_stemmer=True)

for golden, chatbot in zip(golden_answers, mistral_ensemble_answers):
    # Exact Match
    if golden.strip().lower() == chatbot.strip().lower():
        mens_exact_matches += 1

    # F1 Score
    mens_f1_scores.append(calculate_f1(golden, chatbot))

    # BLEU Score
    mens_bleu_scores.append(sentence_bleu([golden.split()], chatbot.split()))

    # ROUGE Score
    mens_rouge_scores.append(scorer.score(golden, chatbot))

# Accuracy (Exact Match Score)
accuracy = (mens_exact_matches / len(golden_answers)) * 100

# Results Summary
print("Evaluation Metrics for Mistral 7b - Ensemble Retriever (0.7, 0.3)")
print("*" * 25)
print(f"Exact Match (EM) Score: {accuracy:.2f}%")
print(f"Average F1 Score: {sum(mens_f1_scores) / len(mens_f1_scores):.2f}")
print(f"Average BLEU Score: {sum(mens_bleu_scores) / len(mens_bleu_scores):.2f}")
rouge1_avg = sum([score['rouge1'].fmeasure for score in mens_rouge_scores]) / len(mens_rouge_scores)
rougeL_avg = sum([score['rougeL'].fmeasure for score in mens_rouge_scores]) / len(mens_rouge_scores)
print(f"Average ROUGE-1 Score: {rouge1_avg:.2f}")
print(f"Average ROUGE-L Score: {rougeL_avg:.2f}")

# Error Analysis
print("\nError Analysis:")
for idx, (golden, chatbot) in enumerate(zip(golden_answers, mistral_ensemble_answers), 1):
    if golden.strip().lower() != chatbot.strip().lower():
        print(f"Question {idx}:")
        print(f"  Golden Answer: {golden}")
        print(f"  Chatbot Answer: {chatbot}")
        print("  Issue: Potential mismatch or incomplete response\n")

Evaluation Metrics for Mistral 7b - Ensemble Retriever (0.7, 0.3)
*************************
Exact Match (EM) Score: 0.00%
Average F1 Score: 0.27
Average BLEU Score: 0.01
Average ROUGE-1 Score: 0.28
Average ROUGE-L Score: 0.20

Error Analysis:
Question 1:
  Golden Answer: Whoever obstructs an Inspector in the exercise of any power conferred upon him by or under this Act, or disobeys the lawful authority of any Inspector, shall be punishable with imprisonment for a term which may extend to one year, or with fine which may extend to one lakh rupees, or with both.
  Chatbot Answer: According to Section 18(1)(i) of the Act, an officer may seize any drug or other material if they believe it to be in violation of the Act or the rules. If a person obstructs an officer during an investigation, they may be charged with obstruction of justice under Section 18(1)(ii) of the Act. This offense is punishable by imprisonment for a term not exceeding six months, or a fine not exceeding one hundred thou
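
The five summaries are printed far apart in this notebook, so a side-by-side view may help. The sketch below copies the averages exactly as printed in the outputs above (rounded to two decimals) and sorts them by F1. Exact Match is omitted because it was 0.00% for every configuration. `reported` and `format_table` are hypothetical names introduced for this illustration.

```python
# Averages copied from the printed summaries in this notebook:
# (configuration, F1, BLEU, ROUGE-1, ROUGE-L)
reported = [
    ("LLAMA 3.2 - Chroma", 0.18, 0.02, 0.19, 0.13),
    ("LLAMA 3.2 - BM25", 0.13, 0.00, 0.14, 0.09),
    ("LLAMA 3.2 - Ensemble", 0.13, 0.00, 0.14, 0.09),
    ("Mistral 7b - Ensemble (0.5, 0.5)", 0.30, 0.05, 0.30, 0.25),
    ("Mistral 7b - Ensemble (0.7, 0.3)", 0.27, 0.01, 0.28, 0.20),
]


def format_table(rows):
    """Render the rows as a fixed-width table, highest F1 first."""
    header = f"{'Configuration':<36}{'F1':>6}{'BLEU':>6}{'R-1':>6}{'R-L':>6}"
    lines = [header, "-" * len(header)]
    for name, f1, bleu, rouge1, rouge_l in sorted(rows, key=lambda r: r[1], reverse=True):
        lines.append(f"{name:<36}{f1:>6.2f}{bleu:>6.2f}{rouge1:>6.2f}{rouge_l:>6.2f}")
    return "\n".join(lines)


print(format_table(reported))
```

These averages are taken over five questions only, so the ordering should be read as indicative rather than conclusive.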