In [1]:
from helper import get_openai_api_key
OPENAI_API_KEY = get_openai_api_key()

In [2]:
import nest_asyncio
nest_asyncio.apply()

In [3]:
# urls = [
#     "https://openreview.net/pdf?id=5atraF1tbg",
#     "https://openreview.net/pdf?id=YaEozn3y0G",
#     "https://openreview.net/pdf?id=P6NcRPb13w",
# ]

papers = [
    "privacy.pdf",
    "ml_topo.pdf",
    "ml4.pdf",
]

In [4]:
from utils import get_doc_tools
from pathlib import Path

paper_to_tools_dict = {}
for paper in papers:
    print(f"Getting tools for paper: {paper}")
    vector_tool, summary_tool = get_doc_tools(paper, Path(paper).stem)
    paper_to_tools_dict[paper] = [vector_tool, summary_tool]

Getting tools for paper: privacy.pdf
Getting tools for paper: ml_topo.pdf


Ignoring wrong pointing object 209 0 (offset 0)
Ignoring wrong pointing object 355 0 (offset 0)


Getting tools for paper: ml4.pdf


In [5]:
initial_tools = [t for paper in papers for t in paper_to_tools_dict[paper]]

In [6]:
from llama_index.llms.openai import OpenAI

llm = OpenAI(model="gpt-3.5-turbo")

In [7]:
len(initial_tools)

6
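
The count of 6 follows from the loop above: each of the three papers contributes one vector tool and one summary tool, and the comprehension flattens those pairs in paper order. Below is a minimal sketch with plain strings standing in for the real tool objects. The `vector_tool_<stem>` / `summary_tool_<stem>` naming is inferred from the agent logs in this notebook, and `fake_doc_tools` is a hypothetical stand-in for `get_doc_tools`, whose implementation is not shown here.

```python
from pathlib import Path

papers = ["privacy.pdf", "ml_topo.pdf", "ml4.pdf"]


def fake_doc_tools(file_path: str, name: str) -> tuple[str, str]:
    """Hypothetical stand-in for get_doc_tools: returns tool names only."""
    return f"vector_tool_{name}", f"summary_tool_{name}"


# One [vector, summary] pair per paper, keyed by file name.
paper_to_tools = {p: list(fake_doc_tools(p, Path(p).stem)) for p in papers}

# Flatten the per-paper pairs into one list, preserving paper order.
flat_tools = [t for p in papers for t in paper_to_tools[p]]

print(len(flat_tools))
print(flat_tools[:2])
```

With two tools per paper, the flat list grows linearly with the number of papers, which is why the later cells switch to retrieving tools per query.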

In [8]:
from llama_index.core.agent import FunctionCallingAgentWorker
from llama_index.core.agent import AgentRunner

agent_worker = FunctionCallingAgentWorker.from_tools(
    initial_tools,
    llm=llm,
    verbose=True,
)
agent = AgentRunner(agent_worker)

In [9]:
response = agent.query(
    "Tell me about the Topology used in Machine Learning, "
    "and then tell me about privacy"
)

Added user message to memory: Tell me about the Topology used in Machine Learning, and then tell me about privacy
=== Calling Function ===
Calling function: vector_tool_ml_topo with args: {"query": "Topology in Machine Learning"}
=== Function Output ===
Topology in machine learning is utilized in the proposed framework for enhanced fine-mapping in whole-genome bacterial studies. The framework incorporates genomic context derived from graph-structured data, specifically based on the compacted de Bruijn graph for an assembled pangenome. This approach aims to improve control for population structure and enhance interpretability by leveraging unique mappings between the encoded feature space and sequential representations that tag specific genomic loci.
=== Calling Function ===
Calling function: vector_tool_privacy with args: {"query": "Privacy"}
=== Function Output ===
The concept of privacy in the context provided relates to the dependency between the model's output and the data it was t

In [10]:
response = agent.query("Give me a summary of both Adjusting Machine Learning and Topology in Machine Learning")
print(str(response))

Added user message to memory: Give me a summary of both Adjusting Machine Learning and Topology in Machine Learning
=== Calling Function ===
Calling function: summary_tool_ml4 with args: {"input": "Adjusting Machine Learning"}
=== Function Output ===
Adjusting Machine Learning involves modifying existing ML decision-makers to produce fair algorithmic decisions that meet criteria such as equal counterfactual opportunity (eco) and counterfactual fairness (cf). These adjustments aim to correct biases in historical decisions while maintaining fidelity to the original data, ensuring fairness without the need for retraining the ML model. By incorporating causal models and criteria, the adjustments strive to provide fair decisions while preserving accuracy and addressing issues of bias and discrimination in decision-making processes.
=== Calling Function ===
Calling function: summary_tool_ml_topo with args: {"input": "Topology in Machine Learning"}
=== Function Output ===
Topology in Machine 

In [11]:
# urls = [
#     "https://openreview.net/pdf?id=VtmBAGCN7o",
#     "https://openreview.net/pdf?id=6PmJoRfdaK",
#     "https://openreview.net/pdf?id=LzPWWPAdY4",
#     "https://openreview.net/pdf?id=VTF8yNQM66",
#     "https://openreview.net/pdf?id=hSyW5go0v8",
#     "https://openreview.net/pdf?id=9WD9KwssyT",
#     "https://openreview.net/pdf?id=yV6fD7LYkF",
#     "https://openreview.net/pdf?id=hnrB5YHoYu",
#     "https://openreview.net/pdf?id=WbWtOYIzIK",
#     "https://openreview.net/pdf?id=c5pwL0Soay",
#     "https://openreview.net/pdf?id=TpD2aG1h0D"
# ]

papers = [
    "ml2.pdf",
    "ml3.pdf",
    "ml4.pdf",
    "ML_blockchain.pdf",
    "ML_packages.pdf",
    "ML_pipelines.pdf",
    "ml_topo.pdf",
    "privacy.pdf",
    "MachineLearning.pdf",
    "memorization.pdf",
    "ai.pdf"
]

In [12]:
from utils import get_doc_tools
from pathlib import Path

paper_to_tools_dict = {}
for paper in papers:
    print(f"Getting tools for paper: {paper}")
    vector_tool, summary_tool = get_doc_tools(paper, Path(paper).stem)
    paper_to_tools_dict[paper] = [vector_tool, summary_tool]

Getting tools for paper: ml2.pdf
Getting tools for paper: ml3.pdf


Ignoring wrong pointing object 209 0 (offset 0)
Ignoring wrong pointing object 355 0 (offset 0)


Getting tools for paper: ml4.pdf
Getting tools for paper: ML_blockchain.pdf
Getting tools for paper: ML_packages.pdf
Getting tools for paper: ML_pipelines.pdf
Getting tools for paper: ml_topo.pdf
Getting tools for paper: privacy.pdf
Getting tools for paper: MachineLearning.pdf
Getting tools for paper: memorization.pdf
Getting tools for paper: ai.pdf


In [13]:
all_tools = [t for paper in papers for t in paper_to_tools_dict[paper]]

In [14]:
# Build an object index over the tools so that only the most relevant ones
# are retrieved for each query, instead of passing all of them to the agent.
from llama_index.core import VectorStoreIndex
from llama_index.core.objects import ObjectIndex

obj_index = ObjectIndex.from_objects(
    all_tools,
    index_cls=VectorStoreIndex,
)

In [15]:
obj_retriever = obj_index.as_retriever(similarity_top_k=3)

In [16]:
tools = obj_retriever.retrieve(
    "Tell me about the formal definition of memorisation used in Machine Learning and Regularization"
)

In [17]:
tools[2].metadata

ToolMetadata(description='Useful for summarization questions related to ml4', name='summary_tool_ml4', fn_schema=<class 'llama_index.core.tools.types.DefaultToolFnSchema'>, return_direct=False)
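
Conceptually, the retriever ranks tools by how similar their metadata is to the query and keeps the top k. The sketch below illustrates that idea with a toy bag-of-words cosine similarity. It is not how `ObjectIndex` works internally (a `VectorStoreIndex` relies on embeddings), and `retrieve_tools`, `bow`, and `cosine` are hypothetical helpers. Only the `ml4` description is taken from the output above; the others are assumed to follow the same pattern.

```python
import math
import re
from collections import Counter


def bow(text: str) -> Counter:
    """Lowercased bag-of-words vector (toy replacement for an embedding)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve_tools(query: str, descriptions: dict[str, str], top_k: int = 3) -> list[str]:
    """Return the names of the top_k tools whose descriptions best match the query."""
    q = bow(query)
    ranked = sorted(
        descriptions,
        key=lambda name: cosine(q, bow(descriptions[name])),
        reverse=True,
    )
    return ranked[:top_k]


# Assumed descriptions, modelled on the ToolMetadata shown for ml4.
descriptions = {
    "summary_tool_memorization": "Useful for summarization questions related to memorization",
    "summary_tool_privacy": "Useful for summarization questions related to privacy",
    "summary_tool_ml4": "Useful for summarization questions related to ml4",
    "summary_tool_ml_topo": "Useful for summarization questions related to ml_topo",
}

selected = retrieve_tools("questions about memorization", descriptions, top_k=2)
print(selected)
```

Because ranking depends only on the tool metadata, a tool with a generic description can be retrieved for a query it cannot actually answer, so descriptive tool names and descriptions matter.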

In [18]:
from llama_index.core.agent import FunctionCallingAgentWorker
from llama_index.core.agent import AgentRunner

# Tools are retrieved per query through obj_retriever rather than passed in as a fixed list.
agent_worker = FunctionCallingAgentWorker.from_tools(
    tool_retriever=obj_retriever,
    llm=llm,
    system_prompt=(
        "You are an agent designed to answer queries over a set of given papers.\n"
        "Please always use the tools provided to answer a question. "
        "Do not rely on prior knowledge."
    ),
    verbose=True,
)
agent = AgentRunner(agent_worker)

In [19]:
response = agent.query(
    "Tell me about the formal definition of memorisation used "
    "in Machine Learning and Regularization"
)
print(str(response))

Added user message to memory: Tell me about the formal definition of memorisation used in Machine Learning and Regularization
=== Calling Function ===
Calling function: summary_tool_memorization with args: {"input": "Formal definition of memorisation in Machine Learning"}
=== Function Output ===
The formal definition of memorisation in Machine Learning is the impact a particular sample has on its own prediction, known as self-influence. It is quantified as the difference in performance on the sample when it is included in the training dataset compared to when it is not included in the dataset. This definition is context agnostic and can be applied to various learning settings by selecting a suitable performance metric.
=== Calling Function ===
Calling function: summary_tool_MachineLearning with args: {"input": "Formal definition of Regularization"}
=== Function Output ===
Regularization is a method in machine learning that involves adding a penalty term to the model's loss function. Thi
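
Python concatenates adjacent string literals without inserting a separator, so a query split across lines needs a trailing space on each fragment or an explicit join. A minimal sketch of both behaviours; `build_query` is a hypothetical helper, not part of LlamaIndex:

```python
def build_query(*parts: str) -> str:
    """Join query fragments with single spaces, ignoring stray whitespace."""
    return " ".join(p.strip() for p in parts if p.strip())


# Adjacent literals are concatenated verbatim: no space is inserted.
implicit = (
    "formal definition of memorisation used"
    "in Machine Learning"
)

# Joining explicitly makes the separator independent of how fragments end.
explicit = build_query(
    "formal definition of memorisation used",
    "in Machine Learning",
)

print(implicit)
print(explicit)
```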

In [20]:
response = agent.query(
    "Logistic Regression in Machine Learning. "
    "Analyze the abstract in each paper first."
)

Added user message to memory: Logistic Regression in Machine Learning. Analyze the abstract in each paper first.
=== Calling Function ===
Calling function: summary_tool_MachineLearning with args: {"input": "Logistic Regression in Machine Learning"}
=== Function Output ===
Logistic regression in machine learning is commonly used to predict paper acceptance in academic conferences based on factors such as average reviewer scores, author reputation, institutional bias, and visibility on platforms like arXiv. By analyzing these variables, logistic regression models provide insights into the decision-making process of paper reviews, identifying biases, correlations with impact, and trends over time. The models aim to understand how different factors influence the acceptance decisions, allowing for a more informed and data-driven approach to the review process.
=== Calling Function ===
Calling function: summary_tool_ml4 with args: {"input": "Logistic Regression in Machine Learning"}
=== Funct