# Self-Consistency Prompting

Self-consistency, introduced by Wang et al. (2022), is one of the more advanced techniques in prompt engineering.

It is designed to replace the greedy decoding typically used in chain-of-thought (CoT) prompting.

The core idea is to sample multiple diverse reasoning paths through few-shot CoT and then select the final answer that appears most often across those paths (a majority vote). This improves the effectiveness of CoT prompting, particularly on tasks that require arithmetic and commonsense reasoning.
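
The sample-then-vote idea can be sketched in a few lines of Python. This is a minimal illustration, not the paper's implementation: `extract_answer`, `self_consistent_answer`, and `make_fake_sampler` are hypothetical names, the `Answer:` marker is an assumed output convention, and the canned completions stand in for a real model sampled at a non-zero temperature.

```python
import itertools
import re
from collections import Counter
from typing import Callable, Optional, Tuple


def extract_answer(completion: str) -> Optional[str]:
    """Return the text after the last 'Answer:' marker, or None if there is none."""
    matches = re.findall(r"Answer:\s*(.+)", completion)
    return matches[-1].strip() if matches else None


def self_consistent_answer(
    sample_fn: Callable[[str], str], prompt: str, n: int = 5
) -> Tuple[Optional[str], float]:
    """Sample n completions and majority-vote over their extracted answers."""
    answers = []
    for _ in range(n):
        answer = extract_answer(sample_fn(prompt))
        if answer is not None:
            answers.append(answer)
    if not answers:
        return None, 0.0
    winner, votes = Counter(answers).most_common(1)[0]
    return winner, votes / len(answers)  # answer plus share of votes it received


def make_fake_sampler(completions):
    """Hypothetical stand-in for a model: cycles through canned completions."""
    pool = itertools.cycle(completions)
    return lambda prompt: next(pool)


# Made-up reasoning paths: two agree, one makes an arithmetic slip.
sampler = make_fake_sampler([
    "16 - 3 - 4 = 9 items remain, and 9 * 2 = 18. Answer: 18",
    "16 - 4 = 12 items remain, and 12 * 2 = 24. Answer: 24",
    "9 items remain at 2 each, so 9 * 2 = 18. Answer: 18",
])

answer, agreement = self_consistent_answer(sampler, "How much is earned?", n=3)
print(answer, round(agreement, 2))  # prints: 18 0.67
```

Returning the vote share alongside the answer makes it easy to flag low-agreement cases for review.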

## References:
* [Wang et al. (2022)](https://arxiv.org/abs/2203.11171)

## Running this code on MyBinder.org

Note: remember that you will need to **adjust CONFIG** with the **proper URL and API_KEY**!

[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/GenILab-FAU/prompt-eng/HEAD?urlpath=%2Fdoc%2Ftree%2Fprompt-eng%2Fself_consistency.ipynb)



In [None]:
##
## ZERO SHOT PROMPTING
##

from _pipeline import create_payload, model_req

#### (1) Adjust the inbound prompt, simulating inbound requests from users or other systems
MESSAGE = "What is 984 * log(2)"

#### (2) Adjust the Prompt Engineering Technique to be applied, simulating Workflow Templates

## @TODO 
PROMPT = MESSAGE 

#### (3) Configure the Model request, simulating Workflow Orchestration
# Documentation: https://github.com/ollama/ollama/blob/main/docs/api.md
payload = create_payload(target="ollama",
                         model="llama3.2:latest", 
                         prompt=PROMPT, 
                         temperature=1.0, 
                         num_ctx=100, 
                         num_predict=100)

### YOU DON'T NEED TO CONFIGURE ANYTHING ELSE FROM THIS POINT
# Send out to the model
elapsed, response = model_req(payload=payload)  # named `elapsed` to avoid confusion with the `time` module
print(response)
if elapsed:
    print(f'Time taken: {elapsed}s')

To calculate this, we need to know that the logarithm of 2 (base 10) is approximately 0.301.

So,

984 * log(2) ≈ 984 * 0.301
≈ 295.584
Time taken: 5.399s
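
The answer above comes from a single sampled reasoning path, and its last step does not hold up: 984 × 0.301 is 296.184, not 295.584. The prompt also leaves the base of the logarithm unspecified. A quick check with Python's standard `math` module (not part of the original notebook) gives reference values for both readings:

```python
import math

n = 984
base10 = n * math.log10(2)  # reading "log" as the common (base-10) logarithm
natural = n * math.log(2)   # reading "log" as the natural logarithm (math.log defaults to base e)
rounded = n * 0.301         # the rounded constant the model's answer used

print(f"984 * log10(2) = {base10:.3f}")
print(f"984 * ln(2)    = {natural:.3f}")
print(f"984 * 0.301    = {rounded:.3f}")
```

Slips like this in individual samples are what voting over several reasoning paths is meant to absorb.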


In [4]:
from _pipeline import create_payload, model_req
import numpy as np
import re

#### (1) Adjust the inbound prompt, simulating inbound requests from users or other systems
JOB_DESCRIPTION = """
We are hiring a Data Scientist with expertise in Python, Machine Learning, and SQL.
The candidate must have at least 3 years of experience and a background in statistical modeling.
Experience with cloud platforms (AWS, GCP) is a plus.
"""

RESUME = """
John Doe is a Data Scientist with 4 years of experience. Skilled in Python, Machine Learning,
and SQL. He has worked on predictive modeling projects and statistical analysis.
Familiar with AWS but lacks GCP experience.
"""

#### (2) Self-Consistency Prompting Technique
SELF_CONSISTENCY_PROMPT = f"""
You are an AI hiring assistant. Analyze the job description and resume multiple times using different reasoning paths. Compare answers and provide the most consistent evaluation.

### **Evaluation Criteria:**
- **Required skills missing:** -0.10 per skill
- **Preferred skills missing:** -0.05 per skill
- **Exceeds required experience:** +0.02 per extra year (max +0.05)
- **Perform 5 independent evaluations** and determine the most **consistent match score**.

---

**Job Description:**  
{JOB_DESCRIPTION}  

**Candidate Resume:**  
{RESUME}  

---

**Output Format:**  
1️⃣ **Evaluation Summary:**  
- Hard Skills: ✅ Matched / ❌ Not Matched  
- Soft Skills: ✅ Matched / ❌ Not Matched  
- Cloud Platforms: ✅ AWS / ❌ GCP  

2️⃣ **Match Score (numerical only, format as "Match Score: XX.XX%")**  
3️⃣ **Final Hiring Recommendation**  

Return the structured output exactly as specified.
"""

#### (3) Generate multiple responses and aggregate the most consistent result
def extract_match_score(response):
    """Extracts the numerical match score from model output, or None if absent."""
    # \W* skips the colon, whitespace, and any markdown emphasis, e.g. "**Match Score:** 85%"
    match = re.search(r"Match Score\W*(\d+(?:\.\d+)?)", response, re.IGNORECASE)
    if match:
        return float(match.group(1))  # Extracted score as float
    return None  # If no match is found

def get_consistent_score(prompt, attempts=5):
    """Runs multiple evaluations and selects the most consistent match score."""
    scores = []
    
    for _ in range(attempts):
        payload = create_payload(target="ollama",
                                 model="mistral",  
                                 prompt=prompt, 
                                 temperature=1.0,  # Sampling diversity
                                 num_ctx=768,  
                                 num_predict=350,  
                                 stream=False)  

        _, response = model_req(payload=payload)
        score = extract_match_score(response)  # Extract match score
        
        if score is not None:
            scores.append(score)

    if scores:
        final_score = np.median(scores)  # Median of the sampled scores: a robust stand-in for a majority vote
    else:
        final_score = "N/A"  # If no valid scores, return N/A
    
    return final_score

### Compute the final match score using self-consistency
final_match_score = get_consistent_score(SELF_CONSISTENCY_PROMPT, attempts=5)

### Display the result
print("\n=== AI Resume Screening Result (Self-Consistency) ===\n")
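# Append "%" only when a numeric score was recovered, so the fallback prints "N/A" rather than "N/A%"
suffix = "" if isinstance(final_match_score, str) else "%"
print(f"Final Match Score (Most Consistent Result): {final_match_score}{suffix}")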
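
The code above aggregates with the median, whereas self-consistency as described by Wang et al. takes a majority vote. For numeric scores the two can be reported side by side, together with the share of samples that agree with the winner. The sketch below is one possible way to do that; `aggregate_scores` is a hypothetical helper, and the scores are made up for illustration rather than taken from a model run.

```python
from collections import Counter
from statistics import median


def aggregate_scores(scores, ndigits=0):
    """Summarise sampled scores: median, most common rounded value, and agreement."""
    if not scores:
        return None
    # Round before voting so that near-identical scores fall into the same bucket.
    buckets = Counter(round(s, ndigits) for s in scores)
    mode, votes = buckets.most_common(1)[0]
    return {
        "median": median(scores),
        "mode": mode,
        "agreement": votes / len(scores),
    }


# Made-up scores for illustration only, not real model output.
summary = aggregate_scores([85.0, 85.0, 90.0, 85.0, 70.0])
print(summary)  # prints: {'median': 85.0, 'mode': 85.0, 'agreement': 0.6}
```

A low agreement value suggests the samples disagree enough that neither the median nor the mode should be trusted without a closer look.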

{'model': 'mistral', 'prompt': '\nYou are an AI hiring assistant. Analyze the job description and resume multiple times using different reasoning paths. Compare answers and provide the most consistent evaluation.\n\n### **Evaluation Criteria:**\n- **Required skills missing:** -0.10 per skill\n- **Preferred skills missing:** -0.05 per skill\n- **Exceeds required experience:** +0.02 per extra year (max +0.05)\n- **Perform 5 independent evaluations** and determine the most **consistent match score**.\n\n---\n\n**Job Description:**  \n\nWe are hiring a Data Scientist with expertise in Python, Machine Learning, and SQL.\nThe candidate must have at least 3 years of experience and a background in statistical modeling.\nExperience with cloud platforms (AWS, GCP) is a plus.\n  \n\n**Candidate Resume:**  \n\nJohn Doe is a Data Scientist with 4 years of experience. Skilled in Python, Machine Learning,\nand SQL. He has worked on predictive modeling projects and statistical analysis.\nFamiliar with