In [2]:
import logging
import sys

# basicConfig already attaches a stdout handler to the root logger;
# adding a second StreamHandler would print every log record twice.
logging.basicConfig(stream=sys.stdout, level=logging.ERROR)

### Part 1: Make two knowledge bases

The first covers Tiara's skills and qualifications (resume and CV). The second covers roles that are good matches (job descriptions).

Large language model: Llama 3.1 8B (served through Ollama)

Embedding model: BAAI/bge-large-en-v1.5

In [5]:
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader, Settings
from llama_index.embeddings.huggingface import HuggingFaceEmbedding

In [6]:
from llama_index.llms.ollama import Ollama

In [7]:
# load documents

documents = SimpleDirectoryReader("data_tiara").load_data()

In [11]:
# set embedding model
# according to LangChain, "BGE models on the HuggingFace are the best open-source embedding models."

Settings.embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-large-en-v1.5")

In [12]:
# use Ollama to serve the 4-bit quantized Llama 3.1 8B instruct model; allow up to 360 s per request

Settings.llm = Ollama(model="llama3.1:8b-instruct-q4_0", request_timeout=360.0)

In [15]:
print(Settings.chunk_size, Settings.chunk_overlap)

1024 200


In [17]:
Settings.chunk_size = 256
Settings.chunk_overlap = 50
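
To make the effect of `chunk_size` and `chunk_overlap` concrete, here is a minimal sketch of fixed-window chunking with overlap. It is not LlamaIndex's splitter, which as far as I understand counts tokenizer tokens and tries to respect sentence boundaries. The function name `chunk_with_overlap` and the dummy token list are assumptions made for illustration only.

```python
def chunk_with_overlap(tokens, chunk_size, chunk_overlap):
    """Split a token list into windows of chunk_size that share chunk_overlap tokens."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # this window already reached the end of the document
    return chunks


# Hypothetical 600-token document, split with the settings used in this notebook.
tokens = [f"t{i}" for i in range(600)]
chunks = chunk_with_overlap(tokens, chunk_size=256, chunk_overlap=50)
print(len(chunks), [len(c) for c in chunks])
```

Smaller chunks give the retriever more, narrower passages to choose from, and the overlap keeps a sentence that straddles a boundary available in both neighbouring chunks.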

In [19]:
# build the vector index over Tiara's documents (chunks are embedded with Settings.embed_model)

index_tiara = VectorStoreIndex.from_documents(
    documents,
)
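
Conceptually, a vector index stores one embedding per chunk and answers a query by returning the chunks whose embeddings are most similar to the query's embedding. The sketch below illustrates that idea with a toy bag-of-words "embedding" and cosine similarity. It is an assumption-laden stand-in: the real index uses the BGE model set above, and the functions `embed`, `cosine`, and `top_k` and the placeholder chunk strings are invented for this example.

```python
import math
from collections import Counter


def embed(text):
    """Toy embedding: lowercase word counts (a stand-in for a real embedding model)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def top_k(query, chunks, k=2):
    """Return the k chunks most similar to the query."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]


# Placeholder chunks, not taken from the actual documents.
toy_chunks = [
    "education section with degree details",
    "experience section with modeling projects",
    "skills section listing python pandas numpy",
]
print(top_k("which python skills", toy_chunks, k=1))
```

The retrieved chunks are then passed to the LLM as context, which is why a query engine can answer questions about documents the model was never trained on.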

In [33]:
# query the index: relevant chunks are retrieved and the LLM answers from them

query_engine = index_tiara.as_query_engine()
response = query_engine.query("What did Tiara study?")
print(response)

Pathobiology and Molecular Medicine.


In [35]:
response = query_engine.query("Where did Tiara go for undergrad?")
print(response)

Michigan State University.


In [37]:
# load the job descriptions for the second knowledge base
# and build its vector index

docs2 = SimpleDirectoryReader("data_jd").load_data()
index_jd = VectorStoreIndex.from_documents(docs2)

In [39]:
# check 2nd index

query_engine = index_jd.as_query_engine()
response = query_engine.query("What are five most important skills for a clinical data scientist?")
print(response)

Data wrangling and cleaning, experience working with complex or nuanced datasets, fluency in Python and its essential data science tools (numpy, pandas), demonstrated experience in collaborative software development, and experience with clinical datasets in applied machine learning applications.


### Part 2: Set up multi-document agent.

Reference: https://docs.llamaindex.ai/en/stable/examples/agent/multi_document_agents/#building-multi-document-agents

In [41]:
from llama_index.core.tools import QueryEngineTool, ToolMetadata
from llama_index.core.agent import ReActAgent

# ReAct agent does both reasoning and acting

In [43]:
# set up a different engine for each knowledge base

tiara_engine = index_tiara.as_query_engine()
jobs_engine = index_jd.as_query_engine()

In [45]:
# set up query engine tools

query_engine_tools = [
    QueryEngineTool(
        query_engine=tiara_engine,
        metadata=ToolMetadata(
            name="tiara_quals",
            description=(
                "Provides information about Tiara's professional skills and qualifications. "
                "Use a detailed plain text question as input to the tool."
            ),
        ),
    ),
    QueryEngineTool(
        query_engine=jobs_engine,
        metadata=ToolMetadata(
            name="job_descriptions",
            description=(
                "Provides information about jobs that match Tiara's skills, qualifications, and preferences based on industry and location. "
                "Use a detailed plain text question as input to the tool."
            ),
        ),
    ),
]
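
A note on the multi-line `description` strings: Python joins adjacent string literals inside parentheses with no separator, so each fragment has to carry its own trailing space, otherwise the sentences run together in the text the agent reads. The short check below shows the difference; the example sentences are placeholders.

```python
# Adjacent literals are concatenated exactly as written, with nothing in between.
without_space = (
    "Provides information about a topic."
    "Use a detailed plain text question as input to the tool."
)
with_space = (
    "Provides information about a topic. "
    "Use a detailed plain text question as input to the tool."
)

print(without_space)
print(with_space)
```

Printing `tool.metadata.description` for each tool is a quick way to confirm the agent sees the wording you intended.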

In [47]:
# extra context passed to the agent alongside its ReAct instructions

context = """ \
    You are a recruiter trying to find roles that match Tiara's skills, qualifications, and preferences.\
    You MUST use at least one of the tools provided when answering a question.\
"""

# set up the agent

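# reuse the timeout from Settings.llm so long multi-step generations are not cut off
llm = Ollama(model="llama3.1:8b-instruct-q4_0", request_timeout=360.0)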

agent = ReActAgent.from_tools(
    query_engine_tools,
    llm=llm,
    verbose=True,
    context=context
)
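
With `verbose=True`, the agent prints each step as Thought / Action / Action Input lines followed by an Observation. The sketch below is a toy illustration of that loop's dispatch step: it parses one such step and calls the matching tool. It is not LlamaIndex's implementation; `parse_react_step`, `dispatch`, and the stub tools are hypothetical names made up for this example, and the step text only mimics the layout of the verbose output.

```python
import ast
import re


def parse_react_step(text):
    """Extract the tool name and its keyword arguments from one ReAct-style step."""
    action = re.search(r"^Action:\s*(\S+)", text, re.MULTILINE)
    action_input = re.search(r"^Action Input:\s*(\{.*\})", text, re.MULTILINE)
    if not action or not action_input:
        return None  # no tool call in this step, e.g. a final answer
    return action.group(1), ast.literal_eval(action_input.group(1))


def dispatch(step_text, tools):
    """Run the tool named in the step and return its result as the observation."""
    parsed = parse_react_step(step_text)
    if parsed is None:
        return None
    name, kwargs = parsed
    return tools[name](**kwargs)


# Stub tools standing in for the two query engines; they only echo the question.
def tiara_quals_stub(input):
    return f"[tiara_quals] {input}"


def job_descriptions_stub(input):
    return f"[job_descriptions] {input}"


stub_tools = {"tiara_quals": tiara_quals_stub, "job_descriptions": job_descriptions_stub}

step = (
    "Thought: I need to use a tool to help me answer the question.\n"
    "Action: tiara_quals\n"
    "Action Input: {'input': 'What are the professional qualifications?'}\n"
)
print(dispatch(step, stub_tools))
```

In the real agent the observation is appended to the conversation and the LLM is asked for its next Thought, repeating until it produces an answer instead of an Action.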

In [49]:
response = agent.chat("Is Tiara qualified for a job that requires twenty years of experience in software engineering?")
print('*****')
print(str(response))

> Running step 3f007c20-cfa5-4125-ae0d-d864c52c3112. Step input: Is Tiara qualified for a job that requires twenty years of experience in software engineering?
Thought: The current language of the user is: English. I need to use a tool to help me answer the question.
Action: tiara_quals
Action Input: {'input': "What are Tiara's professional skills and qualifications?"}
Observation: Tiara Rinjani Ahmad is a highly skilled data analyst with expertise in research, analysis, and portfolio management. She has experience working in pharmaceutical industry, biomedical research, and philanthropy settings. Her key skills include:

* Team-oriented work style
* Resourcefulness and adaptability
* Ability to learn new techniques quickly
* Experience with predictive modeling using patient-level data
* Strong literature research and synthesis skills
* Effective communication of complex information to non-medical audiences

In terms of qualifications, Tiara holds a Ph.D. in