## Welcome to the Second Lab - Week 1, Day 3

Today we will work with lots of models! This is a way to get comfortable with APIs.

<table style="margin: 0; text-align: left; width:100%">
    <tr>
        <td style="width: 150px; height: 150px; vertical-align: middle;">
            <img src="../assets/stop.png" width="150" height="150" style="display: block;" />
        </td>
        <td>
            <h2 style="color:#ff7800;">Important point - please read</h2>
            <span style="color:#ff7800;">The way I collaborate with you may be different to other courses you've taken. I prefer not to type code while you watch. Rather, I execute Jupyter Labs, like this, and give you an intuition for what's going on. My suggestion is that you carefully execute this yourself, <b>after</b> watching the lecture. Add print statements to understand what's going on, and then come up with your own variations.<br/><br/>If you have time, I'd love it if you submit a PR for changes in the community_contributions folder - instructions in the resources. Also, if you have a GitHub account, use it to showcase your variations. Not only is this essential practice, but it demonstrates your skills to others, including perhaps future clients or employers...
            </span>
        </td>
    </tr>
</table>

In [1]:
# Start with imports - ask ChatGPT to explain any package that you don't know

import os
import json
from dotenv import load_dotenv
from openai import OpenAI
from anthropic import Anthropic
from IPython.display import Markdown, display

In [4]:
# Always remember to do this!
load_dotenv(override=True)

True

In [5]:
# Print the key prefixes to help with any debugging

openai_api_key = os.getenv('OPENAI_API_KEY')
anthropic_api_key = os.getenv('ANTHROPIC_API_KEY')
google_api_key = os.getenv('GOOGLE_API_KEY')
deepseek_api_key = os.getenv('DEEPSEEK_API_KEY')
groq_api_key = os.getenv('GROQ_API_KEY')

if openai_api_key:
    print(f"OpenAI API Key exists and begins {openai_api_key[:8]}")
else:
    print("OpenAI API Key not set")
    
if anthropic_api_key:
    print(f"Anthropic API Key exists and begins {anthropic_api_key[:7]}")
else:
    print("Anthropic API Key not set (and this is optional)")

if google_api_key:
    print(f"Google API Key exists and begins {google_api_key[:2]}")
else:
    print("Google API Key not set (and this is optional)")

if deepseek_api_key:
    print(f"DeepSeek API Key exists and begins {deepseek_api_key[:3]}")
else:
    print("DeepSeek API Key not set (and this is optional)")

if groq_api_key:
    print(f"Groq API Key exists and begins {groq_api_key[:4]}")
else:
    print("Groq API Key not set (and this is optional)")

OpenAI API Key exists and begins sk-proj-
Anthropic API Key exists and begins sk-ant-
Google API Key exists and begins AI
DeepSeek API Key exists and begins sk-
Groq API Key exists and begins gsk_


In [6]:
request = "Please come up with a challenging, nuanced question that I can ask a number of LLMs to evaluate their intelligence. "
request += "Answer only with the question, no explanation."
messages = [{"role": "user", "content": request}]

In [7]:
messages

[{'role': 'user',
  'content': 'Please come up with a challenging, nuanced question that I can ask a number of LLMs to evaluate their intelligence. Answer only with the question, no explanation.'}]

In [8]:
openai = OpenAI()
response = openai.chat.completions.create(
    model="gpt-4o-mini",
    messages=messages,
)
question = response.choices[0].message.content
print(question)


How would you approach designing a fair and unbiased algorithm for evaluating job applicants, considering factors like inherent cognitive biases, diverse backgrounds, and the need for equitable outcomes?


In [9]:
competitors = []
answers = []
messages = [{"role": "user", "content": question}]

In [10]:
# The API we know well

model_name = "gpt-4o-mini"

response = openai.chat.completions.create(model=model_name, messages=messages)
answer = response.choices[0].message.content

display(Markdown(answer))
competitors.append(model_name)
answers.append(answer)

Designing a fair and unbiased algorithm for evaluating job applicants is a multifaceted challenge that requires careful consideration of various aspects, including bias mitigation, representation, and equitable outcomes. Here’s a structured approach to tackle this problem:

### 1. **Define Criteria for Evaluation**

   - **Identify Job Requirements:** Clearly outline the key competencies, skills, and experiences that are essential for the position. This should be based on data-driven insights and organizational needs.
   - **Performance Metrics:** Establish what success looks like in the role, using historical performance data and stakeholder input.

### 2. **Data Collection and Representation**

   - **Diverse Data Sources:** Collect a comprehensive dataset that includes applicants from various backgrounds. Ensure representation of different demographic groups (gender, race, socioeconomic status, etc.).
   - **Bias Audit of Existing Data:** Analyze historical hiring data to identify any inherent biases in past hiring practices. This includes examining patterns that may disadvantage certain groups.

### 3. **Bias Mitigation**

   - **Blind Recruitment:** Implement anonymized applications where potentially biased information (names, addresses, graduation institutions) is removed to focus on qualifications and experience.
   - **Algorithm Fairness Techniques:** Utilize techniques like re-weighting, adversarial debiasing, and fairness constraints to optimize the algorithm for equitable outcomes.
   - **Regular Audits:** Establish a schedule for auditing the algorithm’s outcomes for any signs of bias against protected classes.

### 4. **Algorithm Design**

   - **Explainable AI:** Choose machine learning models that are interpretable. This allows stakeholders to understand decision-making processes and provides insights into potential biases.
   - **Multi-modal Input:** Consider various evaluation methods (technical assessments, interviews, role plays) to capture the full range of candidate attributes.

### 5. **Incorporating Human Judgment**

   - **Hybrid Approach:** Use the algorithm in conjunction with human judgment. Recruiters can have the final say, particularly in culturally sensitive or nuanced evaluations.
   - **Training for Recruiters:** Provide training on unconscious bias, diversity, and inclusion, and how to interpret algorithm outputs critically.

### 6. **Feedback Loop and Continuous Improvement**

   - **Iterative Testing:** Regularly test the algorithm against real-world outcomes and adjust as necessary. Monitor how well the algorithm predicts on-the-job success for different groups.
   - **Stakeholder Feedback:** Create channels for feedback from candidates and hiring managers to understand the experience and gather qualitative data.

### 7. **Transparency and Accountability**

   - **Clear Communication:** Keep all stakeholders informed about how the algorithm is designed, the data it uses, and how decisions are made.
   - **Metrics for Success:** Define success metrics that reflect both hiring goals and diversity/inclusion objectives, and report on these metrics publicly.

### 8. **Legal and Ethical Considerations**

   - **Compliance:** Ensure the algorithm complies with all relevant laws and regulations (e.g., EEOC guidelines in the U.S.) regarding hiring practices and fairness.
   - **Ethical Standards:** Establish and adhere to ethical principles guiding the use of AI in hiring, prioritizing fairness, accountability, and transparency.

By employing a comprehensive approach that encompasses data integrity, algorithm fairness, human involvement, and continuous improvement, it’s possible to design an innovative and equitable job applicant evaluation system. This process requires collaboration across stakeholders, ongoing assessment, and a commitment to learning from outcomes to continuously refine methods for inclusivity and fairness.

In [None]:
# Anthropic has a slightly different API, and max_tokens is required

model_name = "claude-3-7-sonnet-latest"

claude = Anthropic()
response = claude.messages.create(model=model_name, messages=messages, max_tokens=1000)
answer = response.content[0].text

display(Markdown(answer))
competitors.append(model_name)
answers.append(answer)

BadRequestError: Error code: 400 - {'type': 'error', 'error': {'type': 'invalid_request_error', 'message': 'Your credit balance is too low to access the Anthropic API. Please go to Plans & Billing to upgrade or purchase credits.'}}
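
A failed call like the one above stops the cell before anything is appended. One possible way to keep going when a provider is unavailable is to wrap each call in a small helper that records the answer only on success. This is a minimal sketch, not part of the original lab: `try_competitor`, `demo_competitors`, `demo_answers` and the stand-in callables are hypothetical names used for illustration.

```python
def try_competitor(model_name, ask, competitors, answers):
    """Call ask() and record its answer; skip the model if the call raises."""
    try:
        answer = ask()
    except Exception as exc:  # e.g. a BadRequestError when credit is too low
        print(f"Skipping {model_name}: {exc}")
        return None
    competitors.append(model_name)
    answers.append(answer)
    return answer


# Stand-in callables so the sketch runs without any API key (hypothetical data)
demo_competitors, demo_answers = [], []


def failing_call():
    raise RuntimeError("credit balance is too low")


try_competitor("model-a", lambda: "An answer", demo_competitors, demo_answers)
try_competitor("model-b", failing_call, demo_competitors, demo_answers)
print(demo_competitors)
```

In this notebook, `ask` could be something like `lambda: claude.messages.create(model=model_name, messages=messages, max_tokens=1000).content[0].text`, with the real `competitors` and `answers` lists passed in.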

In [12]:
gemini = OpenAI(api_key=google_api_key, base_url="https://generativelanguage.googleapis.com/v1beta/openai/")
model_name = "gemini-2.0-flash"

response = gemini.chat.completions.create(model=model_name, messages=messages)
answer = response.choices[0].message.content

display(Markdown(answer))
competitors.append(model_name)
answers.append(answer)

Designing a fair and unbiased algorithm for evaluating job applicants is a complex task requiring a multi-faceted approach. Here's a breakdown of how I would tackle it, focusing on minimizing biases and promoting equitable outcomes:

**1. Define Fairness & Success Metrics Clearly:**

*   **Explicitly Define Fairness:**  The first step is to move beyond vague concepts of fairness and define *what fairness means in this specific context*. This involves stakeholder input (HR, hiring managers, employees, potential applicants).  Examples of fairness definitions include:
    *   **Demographic Parity (Statistical Parity):**  The selection rate should be approximately the same for different demographic groups. This is a common, but potentially controversial, metric.
    *   **Equal Opportunity:**  Individuals with similar qualifications should have an equal chance of being selected, regardless of demographic group.  This requires defining "qualification" carefully.
    *   **Predictive Parity:**  The algorithm's predictions should be equally accurate across different groups. For example, the rate of correctly identifying qualified candidates should be the same for all groups.
    *   **Counterfactual Fairness:** An individual should receive the same outcome had they belonged to a different demographic group. This is a more advanced and computationally expensive approach.

*   **Define Success Metrics:**  Clearly define what constitutes "success" in the role.  These metrics should be measurable and relevant to job performance. They should also be validated to ensure they are not inherently biased (e.g., relying on communication styles that favor certain cultural backgrounds).

**2. Data Acquisition & Preprocessing - Focus on Minimizing Bias:**

*   **Data Audit:** Conduct a thorough audit of existing historical hiring data. Identify potential sources of bias, such as:
    *   **Selection Bias:** The data used to train the algorithm only reflects those who *were* hired, not those who *could have been* successful.
    *   **Label Bias:**  Performance evaluations or other labels might be biased, reflecting existing biases in the organization.
    *   **Measurement Bias:** The way data is collected or measured might systematically disadvantage certain groups.  (e.g., using tests written primarily for one culture)

*   **Data Augmentation/Balancing:**
    *   **Oversampling/Undersampling:**  If certain demographic groups are underrepresented in the training data, oversample them or undersample the overrepresented groups.  However, be careful not to introduce synthetic data that perpetuates stereotypes.
    *   **Synthetic Data Generation:** Consider creating synthetic data that fills in gaps in the data and mitigates biases, especially when demographic data is sparse.  Use techniques like Generative Adversarial Networks (GANs) carefully to avoid amplifying existing biases.

*   **Feature Selection & Engineering:**
    *   **Blind the Algorithm:**  Remove or anonymize protected attributes (e.g., gender, race, age, zip code) from the data used to train the model.  However, be aware of *proxy variables* (features that are highly correlated with protected attributes).
    *   **Focus on Job-Relevant Skills:**  Prioritize features that directly reflect skills and experience needed for the job, and de-emphasize features that are less relevant or potentially discriminatory (e.g., hobbies, school prestige).
    *   **Consider Alternative Credentials:**  Value skills and experience gained through alternative pathways (e.g., online courses, bootcamps, volunteer work) to create a more inclusive talent pool.
    *   **Debias Feature Representations:** Use techniques like adversarial debiasing or word embedding debiasing to reduce biases in feature representations, especially when dealing with textual data (e.g., resumes, cover letters).

**3. Algorithm Selection & Development:**

*   **Choose Appropriate Algorithms:**  Certain algorithms are more prone to bias than others. For example, highly complex models (like deep neural networks) can easily learn and amplify biases present in the data.  Consider using simpler, more transparent models, especially if explainability is important.
*   **Regularization Techniques:**  Apply regularization techniques (e.g., L1 or L2 regularization) to prevent the model from overfitting to biased patterns in the data.
*   **Adversarial Debiasing:** Train the algorithm to *minimize* its ability to predict protected attributes while *maximizing* its ability to predict job performance.
*   **Ensemble Methods:**  Combine multiple models trained on different subsets of the data or with different debiasing techniques to reduce the overall bias.

**4. Model Evaluation & Validation - Rigorous Bias Auditing:**

*   **Bias Auditing:** Continuously monitor the algorithm's performance for biases across different demographic groups.  Use metrics like:
    *   **Disparate Impact:**  Calculate the selection rate for different groups and compare them to see if there is a statistically significant difference.  (Often measured using the 80% rule).
    *   **False Positive/Negative Rates:**  Analyze whether the algorithm makes more errors for certain groups (e.g., falsely rejecting qualified candidates from underrepresented groups).
    *   **Calibration Analysis:**  Assess whether the algorithm's predicted probabilities accurately reflect the true probability of success for different groups.
*   **A/B Testing:**  Compare the performance of the algorithm to a traditional hiring process using A/B testing to evaluate its impact on diversity and inclusion.
*   **Explainable AI (XAI):**  Use XAI techniques (e.g., SHAP values, LIME) to understand which features are driving the algorithm's decisions and identify potential sources of bias.  This allows for a more transparent and accountable system.

**5. Implementation & Ongoing Monitoring:**

*   **Transparency & Explainability:**  Provide applicants with clear and understandable explanations of how the algorithm works and how their application was evaluated.
*   **Human Oversight:**  Don't rely solely on the algorithm.  Incorporate human review at key stages of the hiring process to ensure fairness and address any potential biases.  Provide training to hiring managers on how to interpret the algorithm's results and avoid perpetuating biases.
*   **Regular Monitoring & Re-Training:**  Continuously monitor the algorithm's performance and re-train it periodically with updated data to ensure it remains fair and accurate.
*   **Feedback Mechanisms:**  Establish a process for applicants to provide feedback on the hiring process, including the algorithm.
*   **Ethical Review Board:**  Establish an ethical review board to oversee the development and deployment of the algorithm and ensure it aligns with the organization's values.

**Important Considerations:**

*   **Context Matters:**  The specific techniques used to mitigate bias will depend on the specific job, the data available, and the organization's values.
*   **No Perfect Solution:**  It is impossible to eliminate all bias from an algorithm. The goal is to minimize bias as much as possible and ensure that the algorithm is fair and equitable.
*   **Legal Compliance:**  Ensure that the algorithm complies with all applicable laws and regulations regarding discrimination.
*   **Continuous Improvement:**  Fairness is an ongoing process, not a one-time fix. Continuously monitor the algorithm's performance and adapt it as needed to address any emerging biases.

By following these steps, you can create an algorithm that is more fair and unbiased than a traditional hiring process, and that promotes equitable outcomes for all applicants. It requires a commitment to ethical considerations, data analysis, and ongoing monitoring. Remember to always prioritize fairness and transparency throughout the entire process.


In [13]:
deepseek = OpenAI(api_key=deepseek_api_key, base_url="https://api.deepseek.com/v1")
model_name = "deepseek-chat"

response = deepseek.chat.completions.create(model=model_name, messages=messages)
answer = response.choices[0].message.content

display(Markdown(answer))
competitors.append(model_name)
answers.append(answer)

APIStatusError: Error code: 402 - {'error': {'message': 'Insufficient Balance', 'type': 'unknown_error', 'param': None, 'code': 'invalid_request_error'}}

In [14]:
groq = OpenAI(api_key=groq_api_key, base_url="https://api.groq.com/openai/v1")
model_name = "llama-3.3-70b-versatile"

response = groq.chat.completions.create(model=model_name, messages=messages)
answer = response.choices[0].message.content

display(Markdown(answer))
competitors.append(model_name)
answers.append(answer)


Designing a fair and unbiased algorithm for evaluating job applicants requires a multidisciplinary approach, involving expertise in machine learning, psychology, sociology, and ethics. Here's a comprehensive framework to help you create a fair and equitable algorithm:

**Pre-design phase**

1. **Define the problem and goals**: Identify the specific job requirements, the type of applicants you're looking for, and the desired outcomes. Ensure that the goals are aligned with the organization's values and diversity, equity, and inclusion (DEI) policies.
2. **Assemble a diverse team**: Bring together a team with diverse backgrounds, expertise, and perspectives to co-design the algorithm. This team should include data scientists, psychologists, sociologists, ethicists, and HR professionals.
3. **Conduct a thorough literature review**: Research existing algorithms, biases, and fairness metrics to understand the current state of the field.

**Data collection and preprocessing**

1. **Collect diverse and representative data**: Gather data from various sources, including job applications, resumes, cover letters, and assessments. Ensure that the data is representative of the diverse pool of applicants you're targeting.
2. **Remove identifiable information**: Anonymize the data by removing identifiable information such as names, ages, addresses, and other demographic characteristics that could introduce biases.
3. **Preprocess data**: Clean, transform, and normalize the data to prepare it for analysis.

**Algorithm development**

1. **Select a suitable algorithm**: Choose an algorithm that is transparent, explainable, and auditable, such as a decision tree or a logistic regression model.
2. **Use fairness metrics**: Integrate fairness metrics, such as demographic parity, equal opportunity, or calibration, to evaluate the algorithm's performance and identify potential biases.
3. **Regularization techniques**: Implement regularization techniques, such as L1 or L2 regularization, to reduce overfitting and prevent the algorithm from relying too heavily on any single feature.
4. **Feature selection**: Select features that are relevant to the job requirements and minimally correlated with protected characteristics (e.g., age, sex, ethnicity).
5. **Model interpretability**: Ensure that the algorithm is interpretable, allowing for the identification of features that contribute to the decision-making process.

**Bias detection and mitigation**

1. **Bias detection tools**: Utilize tools, such as fairness metrics, bias detection algorithms, or audits, to identify potential biases in the algorithm.
2. **Debiasing techniques**: Apply debiasing techniques, such as data preprocessing, feature engineering, or regularization, to mitigate identified biases.
3. **Human oversight**: Implement human oversight and review processes to detect and correct potential biases.

**Evaluation and validation**

1. **Evaluate the algorithm**: Assess the algorithm's performance using fairness metrics, accuracy, and other relevant metrics.
2. **Validate the results**: Validate the algorithm's results by comparing them to human evaluations or other fairness metrics.
3. **Iterate and refine**: Refine the algorithm based on the evaluation and validation results, and iterate until the desired level of fairness and accuracy is achieved.

**Deployment and monitoring**

1. **Deploy the algorithm**: Deploy the algorithm in a controlled environment, with clear guidelines and protocols for its use.
2. **Monitor performance**: Continuously monitor the algorithm's performance, fairness, and accuracy, and update it as needed.
3. **Transparency and explainability**: Provide transparency and explainability of the algorithm's decisions, ensuring that applicants and stakeholders understand the evaluation process.

**Addressing inherent cognitive biases**

1. **Awareness and education**: Educate the development team and users about inherent cognitive biases, such as confirmation bias, anchoring bias, or affinity bias.
2. **Design for diversity**: Incorporate diverse perspectives and ideas into the algorithm's design to minimize the impact of cognitive biases.
3. **Regular audits**: Conduct regular audits to detect and address potential biases introduced by cognitive biases.

**Diverse backgrounds and equitable outcomes**

1. **Inclusive design**: Design the algorithm to be inclusive of diverse backgrounds, cultures, and perspectives.
2. **Equitable outcomes**: Ensure that the algorithm prioritizes equitable outcomes, such as equal opportunities for underrepresented groups.
3. **Continuous improvement**: Continuously monitor and improve the algorithm to ensure that it promotes equitable outcomes and minimizes biases.

By following this framework, you can develop a fair and unbiased algorithm for evaluating job applicants, taking into account inherent cognitive biases, diverse backgrounds, and the need for equitable outcomes. Remember that fairness and bias are ongoing concerns, and continuous monitoring and improvement are essential to maintaining a fair and equitable algorithm.
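
The Gemini, DeepSeek and Groq cells above differ only in the API key, the base URL and the model name, because all three go through the OpenAI client. A rough sketch of how that could be captured in one table follows. The URLs and model names are copied from the cells above; `OPENAI_COMPATIBLE`, `configured_providers` and `example_env` are hypothetical names, and the key value is a placeholder.

```python
# (environment variable, base URL, model name), all taken from the cells above
OPENAI_COMPATIBLE = [
    ("GOOGLE_API_KEY", "https://generativelanguage.googleapis.com/v1beta/openai/", "gemini-2.0-flash"),
    ("DEEPSEEK_API_KEY", "https://api.deepseek.com/v1", "deepseek-chat"),
    ("GROQ_API_KEY", "https://api.groq.com/openai/v1", "llama-3.3-70b-versatile"),
]


def configured_providers(env):
    """Return (base_url, model_name) pairs for providers whose key is set in env."""
    return [(url, model) for key, url, model in OPENAI_COMPATIBLE if env.get(key)]


# Placeholder environment: only a Groq key is set
example_env = {"GROQ_API_KEY": "gsk_placeholder"}
for base_url, model in configured_providers(example_env):
    print(f"{model} -> {base_url}")
```

Passing `os.environ` instead of `example_env` would list the providers you actually have keys for; each pair can then be fed to `OpenAI(api_key=..., base_url=base_url)` exactly as in the cells above.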

## For the next cell, we will use Ollama

Ollama runs a local web service that gives an OpenAI compatible endpoint,  
and runs models locally using high performance C++ code.

If you don't have Ollama, install it by visiting https://ollama.com, pressing Download and following the instructions.

After it's installed, you should be able to visit http://localhost:11434 and see the message "Ollama is running".
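
The same check can be done from Python rather than the browser. This is a minimal sketch using only the standard library; `ollama_is_running` is a hypothetical helper name, and it assumes the default address given above and the "Ollama is running" message.

```python
import http.client
import urllib.error
import urllib.request


def ollama_is_running(base_url="http://localhost:11434", timeout=2.0):
    """Return True if the page at base_url contains 'Ollama is running'."""
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as resp:
            body = resp.read().decode("utf-8", errors="replace")
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Connection refused, timeout, or a non-HTTP reply: treat as not running
        return False
    return "Ollama is running" in body


print(ollama_is_running())
```

If this prints `False`, try `ollama serve` in a terminal and run the check again.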

You might need to restart Cursor (and maybe reboot). Then open a Terminal (control+\`) and run `ollama serve`

Useful Ollama commands (run these in the terminal, or with an exclamation mark in this notebook):

`ollama pull <model_name>` downloads a model locally  
`ollama ls` lists all the models you've downloaded  
`ollama rm <model_name>` deletes the specified model from your downloads

<table style="margin: 0; text-align: left; width:100%">
    <tr>
        <td style="width: 150px; height: 150px; vertical-align: middle;">
            <img src="../assets/stop.png" width="150" height="150" style="display: block;" />
        </td>
        <td>
            <h2 style="color:#ff7800;">Super important - ignore me at your peril!</h2>
            <span style="color:#ff7800;">The model called <b>llama3.3</b> is FAR too large for home computers - it's not intended for personal computing and will consume all your resources! Stick with the nicely sized <b>llama3.2</b> or <b>llama3.2:1b</b> and if you want larger, try llama3.1 or smaller variants of Qwen, Gemma, Phi or DeepSeek. See <a href="https://ollama.com/models">the Ollama models page</a> for a full list of models and sizes.
            </span>
        </td>
    </tr>
</table>

In [15]:
!ollama pull llama3.2

pulling manifest
pulling dde5aa3fc5ff... 100% ▕████████████████▏ 2.0 GB
pulling 966de95ca8a6... 100% ▕████████████████▏ 1.4 KB
pulling fcc5a6bec9da... 100% ▕████████████████▏ 7.7 KB
pulling a70ff7e570d9... 100% ▕████████████████▏ 6.0 KB

In [16]:
ollama = OpenAI(base_url='http://localhost:11434/v1', api_key='ollama')
model_name = "llama3.2"

response = ollama.chat.completions.create(model=model_name, messages=messages)
answer = response.choices[0].message.content

display(Markdown(answer))
competitors.append(model_name)
answers.append(answer)

Designing a fair and unbiased algorithm for evaluating job applicants requires a multi-faceted approach that addresses inherent cognitive biases, diverse backgrounds, and the need for equitable outcomes. Here's a step-by-step guide to help you create an algorithm that minimizes bias and ensures fairness:

1. **Define clear goals and criteria**: Establish specific evaluation criteria and goals for each position. Ensure these are measurable, objective, and relevant to the job requirements.
2. **Data collection and preprocessing**:
	* Gather data on applicants, including their resumes, cover letters, social media profiles, and any other relevant information.
	* Preprocess the data by removing irrelevant information, normalizing data formats, and converting text data into numerical representations (e.g., sentiment analysis).
3. **Identify bias-reducing techniques**:
	* Use de-biased resume screening tools that focus on relevant skills and qualifications rather than demographic information.
	* Implement algorithms that detect and mitigate biases in natural language processing (NLP) tasks, such as sentiment analysis and named entity recognition.
4. **Mitigate inherent cognitive biases**:
	* Use techniques like randomization or stratification to ensure diverse samples of applicants are evaluated.
	* Implement blind hiring practices, where the algorithm is not trained on identifiable information (e.g., names, ages).
5. **Incorporate diversity and inclusion metrics**:
	* Monitor and track data on applicant diversity, including demographics, education levels, and work experience.
	* Use algorithms that recognize and reward diversity in the evaluation process.
6. **Algorithmic audit and testing**: Regularly review and test your algorithm to ensure it is free from bias and produces equitable results. You can use techniques like:
	* Bias detection tools (e.g., AI Fairness 360, Demographic Prediction Tool)
	* Simulations of diverse scenarios to test the algorithm's fairness
7. **Regular evaluation and adaptation**:
	* Continuously monitor the performance of your algorithm and make adjustments as needed.
	* Include regular review of diversity metrics to ensure the pipeline is fair and inclusive.

Best Practices for Developing an Algorithmic Ecosystem:

1.  **Engage diverse stakeholders**: Involve HR, diversity experts, and representatives from underrepresented groups in the design and development process.
2.  **Establish metrics for fairness**: Use universally accepted metrics (e.g., accuracy metrics) to measure your algorithm's performance and biases.
3.  **Regularly review data quality**: Ensure high-quality data is used as input for decision-making.
4.  **Implement feedback mechanisms**: Create an open platform for applicants to provide feedback on the hiring process.

By incorporating these best practices, strategies, and algorithms that mitigate bias, you can develop a fair and unbiased algorithm that advances diversity and inclusion in your organization's hiring practices.

**Example: AI Model**

Here's an algorithm to help evaluate job applicants using AI models:

*   **Step 1:** Preprocess application data by normalizing skills, education level, and other relevant information into numerical representations.
*   **Step 2:** Train a machine learning (ML) model on the curated preprocessed data, which can identify relevant qualities, skills, and experience in job seekers.
*   **Key Metric**:

    *   90% accuracy rate for predicting candidate's ability to fit company culture
    *   60% diversity bonus when including applicants underrepresented groups

In [17]:
# So where are we?

print(competitors)
print(answers)


['gpt-4o-mini', 'gemini-2.0-flash', 'llama-3.3-70b-versatile', 'llama3.2']
['Designing a fair and unbiased algorithm for evaluating job applicants is a multifaceted challenge that requires careful consideration of various aspects, including bias mitigation, representation, and equitable outcomes. Here’s a structured approach to tackle this problem:\n\n### 1. **Define Criteria for Evaluation**\n\n   - **Identify Job Requirements:** Clearly outline the key competencies, skills, and experiences that are essential for the position. This should be based on data-driven insights and organizational needs.\n   - **Performance Metrics:** Establish what success looks like in the role, using historical performance data and stakeholder input.\n\n### 2. **Data Collection and Representation**\n\n   - **Diverse Data Sources:** Collect a comprehensive dataset that includes applicants from various backgrounds. Ensure representation of different demographic groups (gender, race, socioeconomic status, etc

In [18]:
# It's nice to know how to use "zip"
for competitor, answer in zip(competitors, answers):
    print(f"Competitor: {competitor}\n\n{answer}")


Competitor: gpt-4o-mini

Designing a fair and unbiased algorithm for evaluating job applicants is a multifaceted challenge that requires careful consideration of various aspects, including bias mitigation, representation, and equitable outcomes. Here’s a structured approach to tackle this problem:

### 1. **Define Criteria for Evaluation**

   - **Identify Job Requirements:** Clearly outline the key competencies, skills, and experiences that are essential for the position. This should be based on data-driven insights and organizational needs.
   - **Performance Metrics:** Establish what success looks like in the role, using historical performance data and stakeholder input.

### 2. **Data Collection and Representation**

   - **Diverse Data Sources:** Collect a comprehensive dataset that includes applicants from various backgrounds. Ensure representation of different demographic groups (gender, race, socioeconomic status, etc.).
   - **Bias Audit of Existing Data:** Analyze historical 

In [19]:
# Let's bring this together - note the use of "enumerate"

together = ""
for index, answer in enumerate(answers):
    together += f"# Response from competitor {index+1}\n\n"
    together += answer + "\n\n"

In [20]:
print(together)

# Response from competitor 1

Designing a fair and unbiased algorithm for evaluating job applicants is a multifaceted challenge that requires careful consideration of various aspects, including bias mitigation, representation, and equitable outcomes. Here’s a structured approach to tackle this problem:

### 1. **Define Criteria for Evaluation**

   - **Identify Job Requirements:** Clearly outline the key competencies, skills, and experiences that are essential for the position. This should be based on data-driven insights and organizational needs.
   - **Performance Metrics:** Establish what success looks like in the role, using historical performance data and stakeholder input.

### 2. **Data Collection and Representation**

   - **Diverse Data Sources:** Collect a comprehensive dataset that includes applicants from various backgrounds. Ensure representation of different demographic groups (gender, race, socioeconomic status, etc.).
   - **Bias Audit of Existing Data:** Analyze histor

In [21]:
judge = f"""You are judging a competition between {len(competitors)} competitors.
Each model has been given this question:

{question}

Your job is to evaluate each response for clarity and strength of argument, and rank them in order of best to worst.
Respond with JSON, and only JSON, with the following format:
{{"results": ["best competitor number", "second best competitor number", "third best competitor number", ...]}}

Here are the responses from each competitor:

{together}

Now respond with the JSON with the ranked order of the competitors, nothing else. Do not include markdown formatting or code blocks."""
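
A note on the doubled braces in the prompt above: inside an f-string, `{{` and `}}` produce literal `{` and `}`, while single braces insert a value. Here is a small standalone sketch; `count` is a hypothetical stand-in for `len(competitors)`.

```python
import json

count = 3  # hypothetical value standing in for len(competitors)

# Single braces interpolate; doubled braces come out as literal braces
template = f"""Rank {count} competitors.
Respond with JSON in this format:
{{"results": ["1", "2", "3"]}}"""

print(template)

# The last line of the rendered prompt is itself valid JSON
last_line = template.splitlines()[-1]
parsed = json.loads(last_line)
print(parsed["results"])
```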


In [22]:
print(judge)

You are judging a competition between 4 competitors.
Each model has been given this question:

How would you approach designing a fair and unbiased algorithm for evaluating job applicants, considering factors like inherent cognitive biases, diverse backgrounds, and the need for equitable outcomes?

Your job is to evaluate each response for clarity and strength of argument, and rank them in order of best to worst.
Respond with JSON, and only JSON, with the following format:
{"results": ["best competitor number", "second best competitor number", "third best competitor number", ...]}

Here are the responses from each competitor:

# Response from competitor 1

Designing a fair and unbiased algorithm for evaluating job applicants is a multifaceted challenge that requires careful consideration of various aspects, including bias mitigation, representation, and equitable outcomes. Here’s a structured approach to tackle this problem:

### 1. **Define Criteria for Evaluation**

   - **Identify J

In [23]:
judge_messages = [{"role": "user", "content": judge}]

In [24]:
# Judgement time!

openai = OpenAI()
response = openai.chat.completions.create(
    model="o3-mini",
    messages=judge_messages,
)
results = response.choices[0].message.content
print(results)


{"results": ["2", "1", "3", "4"]}


In [25]:
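# OK let's turn this into results!

# The judge replied with a JSON string; parse it into a Python dict
results_dict = json.loads(results)
ranks = results_dict["results"]
for index, result in enumerate(ranks):
    # Competitor numbers in the prompt are 1-based, so subtract 1 to index the list
    competitor = competitors[int(result)-1]
    print(f"Rank {index+1}: {competitor}")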

Rank 1: gemini-2.0-flash
Rank 2: gpt-4o-mini
Rank 3: llama-3.3-70b-versatile
Rank 4: llama3.2
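
The parsing step above trusts the judge to return clean JSON with every competitor listed exactly once. Models do not always comply, so a more defensive version might look like the sketch below. `parse_ranking` is a hypothetical helper, not part of the lab, and the fence-stripping logic is just one plausible way to handle a reply wrapped in a Markdown code block.

```python
import json

def parse_ranking(raw: str, n_competitors: int) -> list[int]:
    """Parse the judge's reply into a list of 1-based competitor numbers."""
    fence = "`" * 3  # a Markdown code fence marker
    text = raw.strip()
    # Remove a surrounding code fence if the model added one despite instructions
    if text.startswith(fence):
        text = text.strip("`").strip()
        if text.lower().startswith("json"):
            text = text[4:]
    ranks = [int(r) for r in json.loads(text)["results"]]
    # Every competitor number from 1..n must appear exactly once
    if sorted(ranks) != list(range(1, n_competitors + 1)):
        raise ValueError(f"Expected each of 1..{n_competitors} exactly once, got {ranks}")
    return ranks

print(parse_ranking('{"results": ["2", "1", "3", "4"]}', 4))
```

With this in place, the loop could iterate over the returned integers directly and index `competitors[rank - 1]`.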


<table style="margin: 0; text-align: left; width:100%">
    <tr>
        <td style="width: 150px; height: 150px; vertical-align: middle;">
            <img src="../assets/exercise.png" width="150" height="150" style="display: block;" />
        </td>
        <td>
            <h2 style="color:#ff7800;">Exercise</h2>
            <span style="color:#ff7800;">Which Agentic design pattern(s) did this lab use? Try updating it to add another one.
            </span>
        </td>
    </tr>
</table>

<table style="margin: 0; text-align: left; width:100%">
    <tr>
        <td style="width: 150px; height: 150px; vertical-align: middle;">
            <img src="../assets/business.png" width="150" height="150" style="display: block;" />
        </td>
        <td>
            <h2 style="color:#00bfff;">Commercial implications</h2>
            <span style="color:#00bfff;">Patterns like this one, which send a task to multiple models and then evaluate the results,
            are common when you need to improve the quality of your LLM responses. The approach applies to most
            business projects where accuracy is critical.
            </span>
        </td>
    </tr>
</table>