# Cohere Internship Applicant Evaluator

### Purpose
Use Cohere's Command-R LLM to assess my qualifications for Cohere's Machine Learning Intern/Co-op (Fall 2024) position.

### Steps
**1. Extract Job Posting Information**<br>
Summarize the job posting to include company mission, job responsibilities, and required/preferred skills.
<br>**2. Create a Checklist**<br>
Develop a checklist based on the job description brief to evaluate applicants and resumes.
<br>**3. Extract Resume and Personal Statement Information**<br>
Identify technical skills and work experience from the resume and personal statement.
<br>**4. Evaluate Qualification**<br>
Use the checklist to assess my resume and personal statement. If qualified, prepare interview questions to further evaluate suitability for the role.

In [1]:
pip install cohere guardrails-ai -q

Note: you may need to restart the kernel to use updated packages.


In [2]:
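# Import necessary packages
import os

import cohere
import guardrails as gd
from pydantic import BaseModel, Field
from rich import print
from typing import List

# Create a Cohere client. The API key is read from the environment rather than
# hard-coded; the variable name COHERE_API_KEY is an assumption, so set it before running.
co = cohere.Client(api_key=os.environ["COHERE_API_KEY"])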

#### Define the Output Schema

Define an output schema using Pydantic to specify the expected format and structure of LLM responses. Guardrails AI validates each response against that schema. If a response does not match the expected format or contains errors, Guardrails AI can take corrective measures, such as re-asking the model, to regenerate or adjust the response. This process yields consistent, structured output from LLMs.

Reference: [Cohere: Validating Outputs](https://docs.cohere.com/docs/validating-outputs#wrap-an-llm-call-with-the-guard-object)
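The validate-and-re-ask idea described above can be sketched without any API calls. This is a conceptual illustration only, not how Guardrails AI is implemented: `REQUIRED_FIELDS`, `find_errors`, `ask_with_retries`, and `fake_llm` are hypothetical names, and the schema is reduced to two fields of `JobPosting`.

```python
import json
from typing import Callable, Dict, List, Optional

# Hypothetical, simplified schema: field name -> expected Python type.
REQUIRED_FIELDS: Dict[str, type] = {"position": str, "skills": list}


def find_errors(raw: str) -> List[str]:
    """Return the problems found in a raw LLM reply (empty list if it is valid)."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return ["top-level value is not a JSON object"]
    errors = []
    for name, expected in REQUIRED_FIELDS.items():
        if name not in data:
            errors.append(f"missing field '{name}'")
        elif not isinstance(data[name], expected):
            errors.append(f"field '{name}' should be {expected.__name__}")
    return errors


def ask_with_retries(
    llm: Callable[[str], str], prompt: str, max_reasks: int = 2
) -> Optional[dict]:
    """Call `llm`, validate the reply, and re-ask with the errors appended."""
    current_prompt = prompt
    for _ in range(max_reasks + 1):
        raw = llm(current_prompt)
        errors = find_errors(raw)
        if not errors:
            return json.loads(raw)
        feedback = "; ".join(errors)
        current_prompt = (
            f"{prompt}\n\nYour previous reply was invalid: {feedback}. "
            "Return only corrected JSON."
        )
    return None  # every attempt failed validation


# Stand-in for a real model: the first reply is malformed, the second is valid.
_replies = iter(["position: intern", '{"position": "ML Intern", "skills": ["Python"]}'])


def fake_llm(prompt: str) -> str:
    return next(_replies)


result = ask_with_retries(fake_llm, "Extract the job posting as JSON.")
print(result)
```

In the notebook itself, the Pydantic models play the role of `REQUIRED_FIELDS` and the `Guard` object handles validation and re-asking.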

In [3]:
# JOB POSTING INFORMATION EXTRACTION
class JobPosting(BaseModel):
    position: str = Field(
        description="Name of the job position."
    )
    company_mission_values: str = Field(
        description="Overarching goals and principles of the hiring company, helping candidates align their values with the company. This can include information such as what the company does and the environment/culture."
    )
    job_description: List[str] = Field(
        description="Responsibilities and duties expected of the candidate in the role, giving a clear understanding of what the job entails."
    )
    skills: List[str] = Field(
        description="Skills candidates must possess to be considered for the position, specifically those pertaining to their technical background, including programming languages and technologies."
    )
        
        
# JOB POSTING CHECKLIST
class ValueAlignment(BaseModel):
    align: bool = Field(
        description="Value determining if the applicant aligns with company mission values."
    )
    reasoning: str = Field(
        description="Reasoning why the applicant does or does not align with the company's mission and values. Be sure to include specific examples."
    )
        
class JobDescriptionAlignment(BaseModel):
    align: bool = Field(
        description="Value determining if the applicant aligns with the job description of the role."
    )
    reasoning: str = Field(
        description="Reasoning why the applicant does or does not align with the role. Be sure to include specific examples based on certain points of the job description."
    )
        
class SkillRequirement(BaseModel):
    skill: str = Field(
        description="Name of the skill."
    )
    possess: bool = Field(
        description="Value determining if the applicant has this skill."
    )
    reasoning: str = Field(
        description="Reasoning why the applicant has or does not have this skill."
    )

class JobChecklist(BaseModel):
    company_mission_values: ValueAlignment = Field(
        description="Evaluates whether the applicant's values align with the mission and values of the company."
    )
    job_description: JobDescriptionAlignment = Field(
        description="Assesses whether the applicant's skills and experiences align with the details outlined in the job description."
    )
    skills: List[SkillRequirement] = Field(
        description="List of specific skills required for the role, along with an assessment of whether the applicant possesses each skill."
    )
        
# RESUME INFORMATION EXTRACTION
class Role(BaseModel):
    role: str = Field(
        description="Specific title of the job role."
    )
    company: str = Field(
        description="Name of the company."
    )
    duties: List[str] = Field(
        description="Tasks and responsibilities accomplished during the applicant's time at the company."
    )
          
class ResumeInformation(BaseModel):
    name: str = Field(
        description="Full name of the applicant."
    )
    skills: List[str] = Field(
        description="Technical skills of the applicant, such as programming languages, tools, and technologies they have used."
    )
    experience: List[Role] = Field(
        description="Applicant's professional work history, including details of roles held and responsibilities undertaken."
    )

   

In [4]:
# Information fields sent to the LLM, i.e., the job description, resume information, and personal statement

job_description = """
Machine Learning Intern/Co-op (Fall 2024)
Canada / London / New York City / San Francisco / TorontoInternships /Intern, Remote /Remote


Who are we?
Our mission is to scale intelligence to serve humanity. We’re training and deploying frontier models for developers and enterprises who are building AI systems to power magical experiences like semantic search, RAG, and agents. We believe that our work is instrumental to the widespread adoption of AI.

We obsess over what we build. Each one of us is responsible for contributing to increasing the capabilities of our models and the value they drive for our customers. We like to work hard and move fast to do what’s best for our customers.

Cohere is a team of researchers, engineers, designers, and more, who are passionate about their craft. Each person is the one of the best in the world at what they do. We believe that a diverse range of perspectives is a requirement for building great products.

Join us on our mission and shape the future!

Why this role?
Ship state of the art models to production.
Design and implement novel research ideas.
Build elegant training/deployment pipelines.
Join us at a pivotal moment, shape what we build and wear multiple hats as an intern!

Our recruitment process will begin in the upcoming weeks, and we will be carefully reviewing applications and assessing potential candidates for our internships. Should we find a suitable match with your qualifications and our requirements, we will be in touch to discuss the opportunity further and to advance your application to the next stage. Please apply by June 28th.

Please Note: To be eligible for this position you should be a student currently enrolled in a post-secondary program, available for a full-time 3-6 month internship, co-op, or research work term. We have offices in Toronto, San Francisco, New York, and London but embrace being remote-friendly! There are no restrictions on where you can be located for this role.
As a Machine Learning Intern, you will:
Design, train and improve upon cutting-edge models.
Help us develop new techniques to train and serve models safer, better, and faster.
Train extremely large-scale models on massive datasets.
Explore continual and active learning strategies for streaming data.
Learn from experienced senior machine learning technical staff.
Work closely with product teams to develop solutions.
You may be a good fit if you have these skills:
- Proficiency in Python and related ML frameworks such as Tensorflow, TF-Serving, JAX, and XLA/MLIR.
- Experience using large-scale distributed training strategies.
- Familiarity with autoregressive sequence models, such as Transformers.
- Strong communication and problem-solving skills.
- A demonstrated passion for applied NLP models and products.
- Bonus: experience writing kernels for GPUs using CUDA.
- Bonus: experience training on TPUs.
- Bonus: papers at top-tier venues (such as NeurIPS, ICML, ICLR, AIStats, MLSys, JMLR, AAAI, Nature, COLING, ACL, EMNLP).

If some of the above doesn’t line up perfectly with your experience, we still encourage you to apply! If you consider yourself a thoughtful worker, a lifelong learner, and a kind and playful team member, Cohere is the place for you.

We value and celebrate diversity and strive to create an inclusive work environment for all. We welcome applicants of all kinds and are committed to providing an equal opportunity process. Cohere provides accessibility accommodations during the recruitment process. Should you require any accommodation, please let us know and we will work with you to meet your needs.

Our Perks:
An open and inclusive culture and work environment 
Work closely with a team on the cutting edge of AI research 
Free daily lunch 
Full health and dental benefits, including a separate budget to take care of your mental health 
Personal enrichment benefits towards arts and culture, fitness and well-being
Remote-flexible, offices in Toronto, New York, San Francisco and London and coworking stipends
Paid vacation
"""

resume_description = """
Joshua Chang
joshuachang.me | j79chang@uwaterloo.ca | joshjchang | JoshChang8
SKILLS
Languages: Python, SQL, R, Java, C++, JavaScript, BASH, Git
Tools and Technologies: Cohere Command-R, TensorFlow, CUDA, JAX, Pandas, Open AI, Langfuse, Scikit-Learn, PyTorch, Git
Technical Skills: Natural Language Processing, RAG, Prompt Engineering, Generative AI, Statistical Analysis, Data Collection

EXPERIENCE
Machine Learning Engineer | Keplar.io                                                                                                                    Jan 2024 – Apr 2024	
●Developed a Constitutional AI Evaluator LLM by implementing Prompt Engineering methodologies, using a set of guiding principles for data quality assessment, emulating Reinforcement Learning from Human Feedback (RLHF)
●Leveraged Optimization by Prompting (OPRO) techniques to enhance LLM performance, increasing precision by ~10%
●Constructed a benchmarking suite of quantitative and qualitative tests to assess the performance of LLMs
●Utilized image generation models with ControlNet to generate product design variations, providing canny and depth image conditioning inputs to refine diffusion models 
Data Scientist | Quantolio Financial Technologies                                                                                             May 2023 – Aug 2023
●Led end-to-end feature development, conceptualizing, and implementing an interactive UI using Streamlit to showcase diverse portfolio optimization algorithms and visualizing cumulative returns through dynamic graph visualizations
●Containerized the application using Docker and orchestrated deployment on Heroku, ensuring accessibility for clients
●Communicated with clients and investors, effectively conveying product features and value propositions to foster strong relationships and drive investment interest
Data Engineer and Computer Vision Research Assistant | University of Waterloo                                   Jan 2023 – May 2023
●Utilize YOLOv5 object detection to detect the pitcher and analyze the 3D Pose Estimation of their biomechanics to improve performance and aid in injury prevention
●Aggregate and clean Hawkeye baseball data from SQL databases and AWS S3 storages to train various models 
Site Reliability Engineer | The Globe and Mail                                                                                                     Sept 2022 – Dec 2022
●Created an AWS Lambda function using various AWS services, such as CloudTrail, EventBridge, and IAM to scale Kinesis data streams to eliminate under and over utilization, saving costs on resources 
●Created monitoring and alerting for 250+ Airflow DAGs to catch 30+ anomalies by creating a Datadog metrics monitor provisioned using Terraform

PROJECTS
Job Qualification Evaluator                                                                                                 Cohere Command-R, Guardrails-AI, Jupyter	
●Developed a recruiter assessment tool using the Cohere Command-R LLM model to evaluate applicant suitability based on job descriptions and applicant resumes
●Utilized Guardrails AI to ensure validated and structured JSON responses from the LLM, improving the accuracy and reliability of LLM responses
Spotify Song Recommender                                                                                                                  Python, Pandas, NumPy, Jupyter	
●Designed and developed a Spotify recommendation application conducting comprehensive analyses on user’s playlists utilizing the Spotify API and delivering personalized recommendations based on mathematical similarity algorithms
●Collected, processed, and refined dataset of 1M+ songs using Pandas and NumPy to construct item-feature matrices for mathematical similarity measures
●Implemented multiprocessing techniques to enhance efficiency of CPU-bound operations, resulting in 35% reduction in computation time
                                                                                                                                                                                                                                                                                                           
EDUCATION
University of Waterloo Waterloo, ON
Candidate of Bachelor of Applied Science in Systems Design Engineering, Artificial Intelligence Option
•Relevant Coursework: Data Structures and Algorithms, Human Factors in Design
"""

job_description_checklist = """
{
    'position': 'Machine Learning Intern/Co-op',
    'company_mission_values': 'Scale intelligence to serve humanity. Train and deploy frontier models for developers building AI systems. Focus on semantic search, RAG and agents. Move fast, work hard and build great products as part of a diverse team.',
    'job_description': [
        'Ship state-of-the-art models to production',
        'Design and implement novel research ideas',
        'Build elegant training and deployment pipelines',
        'Work closely with product teams to develop solutions',
        'Train large-scale models',
        'Explore continual and active learning strategies',
        'Learn from senior technical staff'
    ],
    'skills': [
        'Python',
        'Tensorflow',
        'TF-Serving',
        'JAX',
        'XLA/MLIR',
        'Distributed training strategies',
        'Autoregressive sequence models',
        'Strong communication',
        'Applied NLP models',
        'CUDA',
        'TPUs',
        'Publishing research papers'
    ]
}
"""

personal_statement = """
I am a third year Systems Design Engineering student at the University of Waterloo with a strong passion for machine learning and AI. I have completed over four internships in diverse roles, including Machine Learning, Research, Data Science, Cloud Infrastructure, Data Engineering, and Backend Engineering. 

Most recently, I worked at a small start-up with significant project ownership, specializing in LLMs and Generative AI. My responsibilities included integrating recent advancements in LLMs, leveraging prompt engineering, fine-tuning techniques, and benchmarking to enhance the performance of LLMs. 

I also love to build projects! I recently built a recruiter assessment tool using Cohere’s Command-R LLM model to evaluate applicant suitability based on job descriptions and applicant resumes. 

In my next role, I aim to further explore the Machine Learning space, working with LLMs to expand my knowledge and their capabilities. I seek opportunities to work with experienced professionals in AI and NLP for mentorship and insights into industry and research trends.
"""

In [5]:
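# Prompt the Cohere LLM through a Guard object so the response is validated against the required schema
def run_prompt(response_format, prompt, prompt_params):
    """Run `prompt` against Command-R and return output validated against `response_format`.

    `response_format` is a Pydantic model, `prompt` is a template with ${...} placeholders,
    and `prompt_params` maps placeholder names to their values.
    """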
    # Initialize a Guard object from the Pydantic model 
    guard = gd.Guard.from_pydantic(response_format, prompt=prompt)
    # print(guard.base_prompt)

    # Wrap API call with `guard` object
    response = guard(
        co.chat,
        prompt_params=prompt_params,
        model='command-r')
    
    # outputs Guardrails logging, showing any validation errors and reasks 
    # print(guard.history.last.tree)

    return response.validated_output
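Each prompt passed to `run_prompt` uses `${name}` placeholders that are filled from `prompt_params`. A placeholder with no matching key can slip through unnoticed, so a quick check before calling the model may help. The sketch below assumes substitution behaves like Python's `string.Template.safe_substitute`, which leaves unknown placeholders in place, and that names starting with `gr.` are constants Guardrails expands itself. `unfilled_placeholders` is a hypothetical helper, not part of Guardrails.

```python
import re
from string import Template
from typing import Dict, Set

# Matches ${anything} and captures the name between the braces.
PLACEHOLDER = re.compile(r"\$\{([^}]+)\}")


def unfilled_placeholders(prompt: str, prompt_params: Dict[str, object]) -> Set[str]:
    """Return placeholder names in `prompt` that `prompt_params` does not supply.

    Names starting with "gr." are skipped, on the assumption that Guardrails
    expands those constants itself.
    """
    names = set(PLACEHOLDER.findall(prompt))
    return {n for n in names if not n.startswith("gr.") and n not in prompt_params}


prompt = (
    "Resume:\n${resume_description}\n\n"
    "Statement:\n${personal_statement}\n\n"
    "${gr.complete_xml_suffix_v2}"
)
params = {"resume_description": "Python, SQL"}

missing = unfilled_placeholders(prompt, params)
print(missing)

# safe_substitute fills what it can and leaves unknown placeholders untouched.
rendered = Template(prompt).safe_substitute(**params)
print(rendered)
```

Running the check on each prompt and its parameters before calling `run_prompt` makes a missing key visible immediately, instead of the model receiving a literal `${...}` string.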

In [6]:
# Extract information from the job posting: position, company mission and values, job description, and skills.

JOB_DESCRIPTION_PROMPT = """Provided is information from a job posting. Please extract a dictionary that contains the information about the job.

Job Description: 
${job_description}

${gr.complete_xml_suffix_v2}
"""

job_description = run_prompt(JobPosting, JOB_DESCRIPTION_PROMPT, {"job_description": job_description})
print(job_description)

In [7]:
# Convert information extracted from job description to a checklist to evaluate applicants 

JOB_DESCRIPTION_CHECKLIST_PROMPT = """Provided is a dictionary describing a job. Please use the dictionary to create a checklist that will be used to evaluate applicants applying for this role.
# Note: Make sure the number of skill objects in the checklist EQUALS the number of skills in the job description. Initialize all boolean values to False and all reasoning strings to empty.

${job_description_checklist}

${gr.complete_xml_suffix_v2}
"""

checklist = run_prompt(JobChecklist, JOB_DESCRIPTION_CHECKLIST_PROMPT, {"job_description_checklist": job_description_checklist})
print(checklist)
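The note in the prompt above asks for one blank checklist entry per skill, but nothing enforces it once the response comes back. A small check along these lines could catch a mismatch before the checklist is used. It is a sketch: `checklist_problems` and the sample dictionaries are hypothetical, and the checklist is assumed to be a plain dictionary in the `JobChecklist` shape.

```python
from typing import Dict, List


def checklist_problems(checklist: Dict, expected_skills: List[str]) -> List[str]:
    """List ways a freshly generated checklist deviates from the prompt's note."""
    problems = []
    entries = checklist.get("skills", [])
    if len(entries) != len(expected_skills):
        problems.append(f"expected {len(expected_skills)} skills, got {len(entries)}")
    for entry in entries:
        # A blank entry has possess=False and an empty reasoning string.
        if entry.get("possess") or entry.get("reasoning"):
            problems.append(f"skill '{entry.get('skill')}' is not blank")
    return problems


# Invented samples for illustration, not real model output.
blank = {
    "skills": [
        {"skill": "Python", "possess": False, "reasoning": ""},
        {"skill": "JAX", "possess": False, "reasoning": ""},
    ]
}
prefilled = {"skills": [{"skill": "Python", "possess": True, "reasoning": ""}]}

print(checklist_problems(blank, ["Python", "JAX"]))
print(checklist_problems(prefilled, ["Python", "JAX"]))
```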

In [8]:
# Extract the applicant's information (skills and experience) from their resume.

RESUME_DESCRIPTION_PROMPT = """Provided is an applicant's resume. Please extract a dictionary that contains relevant information about the applicant.

${resume_description}

${gr.complete_xml_suffix_v2}
"""

resume_description = run_prompt(ResumeInformation, RESUME_DESCRIPTION_PROMPT, {"resume_description": resume_description})
print(resume_description)


In [9]:
# Complete the checklist based on the applicant's extracted resume information and personal statement

CANDIDATE_EVALUATION_CHECKLIST_PROMPT = """Provided is a job description checklist along with an applicant’s information: extracted details of the applicant’s resume and their personal statement. Based on the candidate’s information, update the values of the Job Description Checklist.

Job Description Checklist:
${job_description_checklist}

Resume Description:
${resume_description}

Personal Statement:
${personal_statement}


${gr.complete_json_suffix_v2}
"""

# Every ${...} placeholder above needs a matching key in prompt_params.
# The blank checklist generated in In [7] is the one that gets filled in.
checklist_evaluation = run_prompt(
    JobChecklist,
    CANDIDATE_EVALUATION_CHECKLIST_PROMPT,
    {
        "job_description_checklist": checklist,
        "resume_description": resume_description,
        "personal_statement": personal_statement,
    },
)
print(checklist_evaluation)
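Once the checklist is filled in, a compact summary can make the result easier to scan than the full printout. The sketch below assumes the evaluation is a plain dictionary in the `JobChecklist` shape. `summarize_checklist` is a hypothetical helper and the sample values are invented for illustration, not real model output.

```python
from typing import Dict, List


def summarize_checklist(checklist: Dict) -> Dict:
    """Condense a filled-in checklist into alignment flags and skill coverage."""
    skills: List[Dict] = checklist.get("skills", [])
    met = [s["skill"] for s in skills if s.get("possess")]
    missing = [s["skill"] for s in skills if not s.get("possess")]
    return {
        "values_align": bool(checklist.get("company_mission_values", {}).get("align")),
        "role_align": bool(checklist.get("job_description", {}).get("align")),
        "skills_met": met,
        "skills_missing": missing,
        # Fraction of listed skills the applicant possesses.
        "skill_coverage": len(met) / len(skills) if skills else 0.0,
    }


# Invented sample in the JobChecklist shape, not real model output.
sample = {
    "company_mission_values": {"align": True, "reasoning": "..."},
    "job_description": {"align": True, "reasoning": "..."},
    "skills": [
        {"skill": "Python", "possess": True, "reasoning": "..."},
        {"skill": "JAX", "possess": True, "reasoning": "..."},
        {"skill": "CUDA", "possess": True, "reasoning": "..."},
        {"skill": "TPUs", "possess": False, "reasoning": "..."},
    ],
}

summary = summarize_checklist(sample)
print(summary)
```

The coverage ratio treats every skill equally, so bonus skills count as much as required ones. Weighting them differently would be a reasonable refinement.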


In [10]:
# Using the now filled-out checklist, determine with reasoning if the applicant should be interviewed. Also include potential questions you could ask the applicant in the interview. 

CANDIDATE_REFLECTION_PROMPT = f"""Review the applicant’s suitability based on the provided checklist, resume, and personal statement. Determine whether the applicant should be interviewed, and give reasons for your decision. If an interview is recommended, suggest questions tailored to the applicant’s experience, skills, and personal statement to assess their fit for the role.

Job Description Checklist:
{checklist_evaluation}

Resume Description:
{resume_description}

Personal Statement:
{personal_statement}
"""

response = co.chat(
    message=CANDIDATE_REFLECTION_PROMPT,
    model="command-r",  # set explicitly so this call uses the same model as run_prompt
    preamble="You are a skilled AI chatbot that helps recruiters at Cohere find suitable applicants during the screening process. You assess resumes and additional information to evaluate if applicants are a good fit for the available positions."
)

print(response.text)
