In [1]:
from dotenv import load_dotenv
load_dotenv()

True

In [2]:
job_desc = """Job Overview:

We are seeking a Machine Learning Scientist to join our Discovery Science ML team at Coursera, focusing on creating the next generation of personalised search and recommender  systems. The candidate will play an instrumental role in researching and developing state-of-the-art techniques for relevant, personalized, and context-aware search and recommendations — redefining the learning experience on our platform. In addition to helping build a robust IR system, this role requires keeping abreast of emerging trends and innovations in machine learning, information retrieval,  and online education.

Responsibilities:

Design, develop, deploy, and maintain advanced recommendations ranking models, leveraging machine learning techniques such as two tower models, natural language processing (NLP), label collection, learning-to-rank, user behavior analysis, & LLMs
Collaborate with cross-functional teams to align research goals with business needs and ensure successful deployment of innovative solutions into production.
Build and manage large-scale datasets, including corpora, relevance labels, and user interactions, utilizing tools and techniques for data collection, cleaning, and preprocessing.
Conduct thorough evaluations of recommendations  models using industry-standard metrics, analyze results, and provide insights for model improvement and business strategy.
Stay up-to-date with the latest trends in ML, recommender systems, search science, and information retrieval, frequently attending conferences, workshops, and engaging in collaborative research projects.
Contribute to Coursera's research efforts by publishing in top-tier conferences such as SIGIR, WWW, CIKM, and similar venues.
Basic Qualifications:

PhD or Master's degree in Computer Science, Information Retrieval, or closely related fields.
Demonstrated experience in developing advanced recommendations  models, incorporating techniques like natural language processing (NLP) and learning-to-rank algorithms.
Familiarity with information retrieval metrics, evaluation methodologies, and scalable search system architecture.
Track record of publishing research in top-tier conferences such as SIGIR, EMNLP, WWW, CIKM, or similar venues.
Preferred Qualifications:

Proficiency in programming languages and deep learning frameworks such as Python, TensorFlow, or PyTorch.
Experience in working with large-scale datasets and tools for data collection, cleaning, and preprocessing.
Familiarity with ML deployment in production environments and tools for version control, such as Git.
Proven ability to stay current with emerging research and technologies in the ML and recommendations domain.
Experience with MLOps, ML engineering
Experience collaborating with cross-functional teams and excellent communication abilities.
Passion for driving impact in the field of online education through innovative ML and recommendations techniques.
Familiarity with Coursera's platform and course offerings, as well as active participation in wider AI and Machine Learning communities, is a plus.
Familiarity with data science concepts, including the ability to design, implement, and analyze A/B tests in an online environment to optimize product performance and user experience."""

In [3]:
import os
from llama_index.core import SimpleDirectoryReader
from llama_index.core.node_parser import SentenceSplitter
from llama_index.core import Settings
from llama_index.llms.openai import OpenAI
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.core import SummaryIndex, VectorStoreIndex, Document
from llama_index.core.tools import QueryEngineTool
from llama_index.core.agent import FunctionCallingAgentWorker
from llama_index.core.agent import ParallelAgentRunner, AgentRunner

In [4]:
llm = OpenAI(model="gpt-4o-mini")
Settings.llm = llm
Settings.embed_model = OpenAIEmbedding(model="text-embedding-ada-002")

In [5]:
# load documents
documents = SimpleDirectoryReader(input_files=["full cv.pdf"]).load_data()
splitter = SentenceSplitter(chunk_size=1024)
cv_nodes = splitter.get_nodes_from_documents(documents)

documents = [Document(text=job_desc)]
splitter = SentenceSplitter(chunk_size=1024)
job_nodes = splitter.get_nodes_from_documents(documents)

In [6]:
cv_vector_index = VectorStoreIndex(cv_nodes)
cv_vector_query_engine = cv_vector_index.as_query_engine()
cv_vector_tool = QueryEngineTool.from_defaults(
    name="cv_vector_tool",
    query_engine=cv_vector_query_engine,
    description=(
        "Useful for retrieving specific context about user's CV/resume."
    ),
)

job_vector_index = VectorStoreIndex(job_nodes)
job_vector_query_engine = job_vector_index.as_query_engine()
job_vector_tool = QueryEngineTool.from_defaults(
    name="job_vector_tool",
    query_engine=job_vector_query_engine,
    description=(
        "Useful for retrieving specific context about job posting user is applying to."
    ),
)

In [7]:
cv_summary_index = SummaryIndex(cv_nodes)
cv_summary_query_engine = cv_summary_index.as_query_engine(
    response_mode="tree_summarize",
    use_async=True,
)
cv_summary_tool = QueryEngineTool.from_defaults(
    name="cv_summary_tool",
    query_engine=cv_summary_query_engine,
    description=(
        "Useful for summarization questions related to user's CV/resume."
    ),
)

job_summary_index = SummaryIndex(job_nodes)
job_summary_query_engine = job_summary_index.as_query_engine(
    response_mode="tree_summarize",
    use_async=True,
)
job_summary_tool = QueryEngineTool.from_defaults(
    name="job_summary_tool",
    query_engine=job_summary_query_engine,
    description=(
        "Useful for summarization questions related to job posting user is applying to."
    ),
)

In [8]:
prompt="Tailor the user's CV (user's CV summary provided in cv_summary_tool and specific questions about user's CV answered using cv_vector_tool) to be the best possible fit for the job description (job summary provided in job_summary_tool and specific questions about job answered using job_vector_tool). Do not invent information not in the user's CV."


# Baseline

In [9]:
agent_worker = FunctionCallingAgentWorker.from_tools(
    [
        cv_vector_tool,
        job_vector_tool,
        cv_summary_tool,
        job_summary_tool,
    ],
    llm=llm,
    verbose=True,
)
agent = AgentRunner(agent_worker, verbose=True)
response = agent.query(prompt)

> Running step df97e605-d5b6-44a9-ad7b-12b84f7f5599. Step input: Tailor the user's CV (user's CV summary provided in cv_summary_tool and specific questions about user's CV answered using cv_vector_tool) to be the best possible fit for the job description (job summary provided in job_summary_tool and specific questions about job answered using job_vector_tool). Do not invent information not in the user's CV.
Added user message to memory: Tailor the user's CV (user's CV summary provided in cv_summary_tool and specific questions about user's CV answered using cv_vector_tool) to be the best possible fit for the job description (job summary provided in job_summary_tool and specific questions about job answered using job_vector_tool). Do not invent information not in the user's CV.
=== Calling Function ===
Calling function: cv_summary_tool with args: {"input": "Please provide a summary of the user's CV."}
=== Function Output ===
The user has a strong background in software engineering and da

SyntaxError: 'return' outside function (3287600079.py, line 13)

# Compact response mode

In [12]:
from llama_index.core.query_engine import RetrieverQueryEngine

In [13]:
cv_summary_compact_query_engine = RetrieverQueryEngine.from_args(
    retriever=cv_summary_index.as_retriever(), response_mode="compact"
)
cv_summary_tool_c = QueryEngineTool.from_defaults(
    name="cv_summary_tool",
    query_engine=cv_summary_compact_query_engine,
    description=(
        "Useful for summarization questions related to user's CV/resume."
    ),
)

job_summary_compact_query_engine = RetrieverQueryEngine.from_args(
    retriever=job_summary_index.as_retriever(), response_mode="compact"
)
job_summary_tool_c = QueryEngineTool.from_defaults(
    name="job_summary_tool",
    query_engine=job_summary_compact_query_engine,
    description=(
        "Useful for summarization questions related to job posting user is applying to."
    ),
)

cv_vector_compact_query_engine = RetrieverQueryEngine.from_args(
    retriever=cv_vector_index.as_retriever(), response_mode="compact"
)
cv_vector_tool_c = QueryEngineTool.from_defaults(
    name="cv_vector_tool",
    query_engine=cv_vector_compact_query_engine,
    description=(
        "Useful for retrieving specific context about user's CV/resume."
    ),
)

job_vector_compact_query_engine = RetrieverQueryEngine.from_args(
    retriever=job_vector_index.as_retriever(), response_mode="compact"
)
job_vector_tool_c = QueryEngineTool.from_defaults(
    name="job_vector_tool",
    query_engine=job_vector_compact_query_engine,
    description=(
        "Useful for retrieving specific context about job posting user is applying to."
    ),
)

In [14]:
agent_worker = FunctionCallingAgentWorker.from_tools(
    [
        cv_vector_tool_c,
        job_vector_tool_c,
        cv_summary_tool_c,
        job_summary_tool_c,
    ],
    llm=llm,
    verbose=True,
)
agent = AgentRunner(agent_worker, verbose=True)
response = agent.query(prompt)

> Running step 001e5762-ec94-4ece-ae3d-dce19fd65238. Step input: Tailor the user's CV (user's CV summary provided in cv_summary_tool and specific questions about user's CV answered using cv_vector_tool) to be the best possible fit for the job description (job summary provided in job_summary_tool and specific questions about job answered using job_vector_tool). Do not invent information not in the user's CV.
Added user message to memory: Tailor the user's CV (user's CV summary provided in cv_summary_tool and specific questions about user's CV answered using cv_vector_tool) to be the best possible fit for the job description (job summary provided in job_summary_tool and specific questions about job answered using job_vector_tool). Do not invent information not in the user's CV.
=== Calling Function ===
Calling function: cv_summary_tool with args: {"input": "Please summarize the user's CV."}
=== Function Output ===
The user has a strong background in software engineering and data science,

'To tailor your CV for the Machine Learning Scientist position at Coursera, we will emphasize your relevant skills, experiences, and academic background that align with the job description. Here’s how you can adjust your CV:\n\n### CV Tailoring Suggestions\n\n1. **Objective Statement**:\n   - Include a brief objective that highlights your passion for enhancing online education through innovative machine learning techniques.\n\n2. **Education**:\n   - Emphasize your MSc in Computer Science with a focus on Intelligent Systems, mentioning any relevant coursework or projects related to machine learning and natural language processing (NLP).\n\n3. **Technical Skills**:\n   - Highlight your proficiency in Python, TensorFlow, and PyTorch, as these are essential for the role.\n   - Mention your experience with large language models (LLMs) and any specific libraries or frameworks you have used for developing recommendation systems.\n\n4. **Professional Experience**:\n   - **LivePerson**:\n     

## same result

# Context Augmented Agent

In [10]:
from llama_index.core import Document
from llama_index.agent.openai_legacy import ContextRetrieverOpenAIAgent

ModuleNotFoundError: No module named 'llama_index.agent.openai_legacy'

## Deprecated

# React Agent

In [22]:
context_prompt="Use the user's CV to provide a tailored CV (user's CV summary provided in cv_summary_tool and specific questions about user's CV answered using cv_vector_tool) to be the best possible fit for the job description (job description provided in context). Do not invent information not in the user's CV."


In [23]:
from llama_index.core.agent import ReActAgent

In [24]:
react_agent = ReActAgent.from_tools(
    [
        cv_vector_tool,
        cv_summary_tool
    ],
    llm=llm,
    verbose=True,
    context=job_desc
)

In [25]:
response = react_agent.query(context_prompt)

> Running step 316c0ddd-0c89-4bb7-8c2f-e816433de996. Step input: Use the user's CV to provide a tailored CV (user's CV summary provided in cv_summary_tool and specific questions about user's CV answered using cv_vector_tool) to be the best possible fit for the job description (job description provided in context). Do not invent information not in the user's CV.
[1;3;38;5;200mThought: I need to retrieve the user's CV summary to tailor it according to the job description provided. I'll use the cv_summary_tool for this purpose.
Action: cv_summary_tool
Action Input: {'input': 'Please provide the summary of my CV.'}
[0m[1;3;34mObservation: The CV outlines a professional journey in software engineering and data science, highlighting extensive experience in developing and implementing advanced technologies, particularly in natural language processing and data management. 

Currently, as a Software Engineer at LivePerson, the focus is on generating customer support knowledge articles using 

## ReAct agent seems to work better and more logically. However I could not find a combination of ReAct agent and parallel agent, perhaps because the reasoning is sequential.

# Multi Step Query Engine

In [27]:
from llama_index.core.indices.query.query_transform.base import (
    StepDecomposeQueryTransform,
)

step_decompose_transform = StepDecomposeQueryTransform(llm=llm, verbose=True)

## React

In [30]:
from llama_index.core.query_engine import MultiStepQueryEngine

cv_vector_ms_qe = MultiStepQueryEngine(
    query_engine=cv_vector_index.as_query_engine(),
    query_transform=step_decompose_transform,
    index_summary="Used to retrieve specific context about CV.",
)
cv_vector_ms_tool = QueryEngineTool.from_defaults(
    name="cv_vector_ms_tool",
    query_engine=cv_vector_ms_qe,
    description=(
        "Used to retrieve specific context about CV."
    ),
)

cv_summary_ms_qe = MultiStepQueryEngine(
    query_engine=cv_summary_index.as_query_engine(),
    query_transform=step_decompose_transform,
    index_summary="Used for summarisation questions about CV.",
)
cv_summary_ms_tool = QueryEngineTool.from_defaults(
    name="cv_summary_ms_tool",
    query_engine=cv_summary_ms_qe,
    description=(
        "Used for summarisation questions about CV."
    ),
)

In [92]:
react_ms_agent = ReActAgent.from_tools(
    [
        cv_summary_ms_tool,
        cv_vector_ms_tool
    ],
    llm=llm,
    verbose=True,
    context=job_desc
)

In [93]:
ms_context_prompt="Tailor the user's CV (summary of user's cv is provided in cv_summary_ms_tool and specific questions about user's cv answered using cv_vector_ms_tool) to be the best possible fit for the job description (job description is provided as context). Do not invent information not in the user's CV."


In [34]:
react_ms_response = react_ms_agent.query(ms_context_prompt)

> Running step 1f447f2b-7e09-406a-add3-4c59bf009030. Step input: Tailor the user's CV (summary of user's cv is provided in cv_summary_ms_tool and specific questions about user's cv answered using cv_vector_ms_tool) to be the best possible fit for the job description (job description is provided as context). Do not invent information not in the user's CV.
[1;3;38;5;200mThought: The current language of the user is: English. I need to use a tool to help me answer the question.
Action: cv_summary_ms_tool
Action Input: {'input': "Please provide a summary of the user's CV."}
[0m[1;3;33m> Current query: Please provide a summary of the user's CV.
[0m[1;3;38;5;200m> New query: What key experiences are highlighted in the user's CV?
[0m[1;3;33m> Current query: Please provide a summary of the user's CV.
[0m[1;3;38;5;200m> New query: What key experiences are highlighted in the user's CV?
[0m[1;3;33m> Current query: Please provide a summary of the user's CV.
[0m[1;3;38;5;200m> New query

## Low level API

In [102]:
react_ms_agent.reset()

In [104]:
task = react_ms_agent.create_task(ms_context_prompt)

In [114]:
step_output = react_ms_agent.run_step(task.task_id)

> Running step e9f1c931-038d-4a5f-8a1f-4e87c0f55b7e. Step input: None
[1;3;34mObservation: Error: Could not parse output. Please follow the thought-action-input format. Try again.
[0m

In [116]:
step_output.is_last

False

In [188]:
step_output = agent.finalize_response(task.task_id)

## Compare with normal query enginer

In [37]:
str(cv_vector_query_engine.query("What are the user's key skills and experiences related to machine learning, information retrieval, and recommender systems?"))

"The user has a solid foundation in machine learning, having studied various techniques such as supervised learning, unsupervised learning, neural networks, model evaluation, and optimization. They have also engaged in practical projects, including a narrative recommender system that utilized a Neo4J database to generate recommendations based on user activity and narrative attributes.\n\nIn the realm of information retrieval, the user has participated in advanced training focused on improving the relevancy of retrieved results, recognizing poor query results, and employing large language models (LLMs) to enhance queries. They have also learned about autonomous agent systems that intelligently navigate and analyze data, which is crucial for effective information retrieval.\n\nAdditionally, their professional development includes courses on generative AI and the application of LLMs, which further supports their expertise in machine learning and information retrieval. Overall, the user's 

## using the vector index directly misses some of the projects; when the multi step query enginer breaks the question into the 3 separate topics, propaganda detector and information search on news articles are also retrieved

## Agent Runner

In [40]:
from llama_index.core.query_engine import MultiStepQueryEngine

job_vector_ms_qe = MultiStepQueryEngine(
    query_engine=job_vector_index.as_query_engine(),
    query_transform=step_decompose_transform,
    index_summary="Used to retrieve specific context about job.",
)
job_vector_ms_tool = QueryEngineTool.from_defaults(
    name="job_vector_ms_tool",
    query_engine=job_vector_ms_qe,
    description=(
        "Used to retrieve specific context about job."
    ),
)

job_summary_ms_qe = MultiStepQueryEngine(
    query_engine=job_summary_index.as_query_engine(),
    query_transform=step_decompose_transform,
    index_summary="Used for summarisation questions about job.",
)
job_summary_ms_tool = QueryEngineTool.from_defaults(
    name="job_summary_ms_tool",
    query_engine=job_summary_ms_qe,
    description=(
        "Used for summarisation questions about job."
    ),
)

In [44]:
ms_prompt = "Tailor the user's CV (User's CV summary provided in cv_summary_ms_tool and specific questions about user's CV answered using cv_vector_ms_tool) to be the best possible fit for the job description (job summary provided in job_summary_ms_tool and specific questions about job answered using job_vector_ms_tool). Do not invent information not in the user's CV."

In [45]:
agent_worker = FunctionCallingAgentWorker.from_tools(
    [
        job_summary_ms_tool,
        cv_summary_ms_tool,
        job_vector_ms_tool,
        cv_vector_ms_tool
    ],
    llm=llm,
    verbose=True,
)
ms_agent = AgentRunner(agent_worker, verbose=True)
ms_response = agent.query(ms_prompt)

> Running step c4dcb26a-658a-434b-ac60-0d8d9929ffff. Step input: Tailor the user's CV (User's CV summary provided in cv_summary_ms_tool and specific questions about user's CV answered using cv_vector_ms_tool) to be the best possible fit for the job description (job summary provided in job_summary_ms_tool and specific questions about job answered using job_vector_ms_tool). Do not invent information not in the user's CV. Use the tools to get information about CV and job.
[1;3;38;5;200mThought: I need to gather information about the user's CV and the job description to tailor the CV effectively. I'll start by using the cv_summary_ms_tool to get a summary of the user's CV.
Action: cv_summary_ms_tool
Action Input: {'input': "User's CV summary"}
[0m[1;3;33m> Current query: User's CV summary
[0m[1;3;38;5;200m> New query: None
[0m[1;3;34mObservation: Empty Response
[0m> Running step dee1937a-bffe-45c5-b8b1-d935c5d9a273. Step input: None
[1;3;38;5;200mThought: It seems that I received 


KeyboardInterrupt



## Steps are less logical than ReAct agent. Multi step query engine seems to help but needs to be asked specifically about 'users cv summary' instead of 'cv summary'