In [1]:

# Part 1: LangGraph - Building Cyclic Applications with LangChain

# LangGraph is a tool that leverages LangChain Expression Language (LCEL) to build coordinated multi-actor and stateful applications that include cyclic behaviour.
# Why Cycles?

# In essence, we can think of a cycle in our graph as a more robust and customizable loop. It allows us to keep our application agent-forward while still retaining the powerful functionality of traditional loops.

# Because our graph can include cycles, we can also compose rather complex flows through it in a much more readable and natural fashion, effectively recreating application flowcharts in code in an almost 1-to-1 fashion.
# Why LangGraph?

# Beyond the agent-forward approach - we can easily compose and combine traditional "DAG" (directed acyclic graph) chains with powerful cyclic behaviour due to the tight integration with LCEL. This means it's a natural extension to LangChain's core offerings!
# Task 1: Dependencies

# We'll first install all our required libraries.

# !pip install -qU langchain langchain_openai langchain-community langgraph arxiv duckduckgo_search==5.3.1b1

# Task 2: Environment Variables

# We'll want to set both our OpenAI API key and our LangSmith environment variables.

# import os
# import getpass

# os.environ["OPENAI_API_KEY"] = getpass.getpass("OpenAI API Key:")

# from uuid import uuid4

# os.environ["LANGCHAIN_TRACING_V2"] = "true"
# os.environ["LANGCHAIN_PROJECT"] = f"- LangGraph - {uuid4().hex[0:8]}"
# os.environ["LANGCHAIN_API_KEY"] = getpass.getpass("LangSmith API Key: ")



In [2]:
# Task 3: Creating our Tool Belt
# We'll leverage:

#     Duck Duck Go Web Search
#     Arxiv
# from langchain_community.tools.ddg_search import DuckDuckGoSearchRun
# from langchain_community.tools.arxiv.tool import ArxivQueryRun

# tool_belt = [

#     DuckDuckGoSearchRun(),
#     ArxivQueryRun()
# ]

# tool_belt



In [3]:
# Model

# Now we can set up our model! We'll leverage the familiar OpenAI model suite for this example, but LangGraph is not tied to OpenAI and can be used with other providers' models. You might not find success with smaller models, though, so the recommendation is to stick with:

#     OpenAI's GPT-3.5 and GPT-4
#     Anthropic's Claude
#     Google's Gemini
# from langchain_openai import ChatOpenAI

# model = ChatOpenAI(model="gpt-4o", temperature=0)

# Now that we have our model set up, let's "put on the tool belt", which is to say: we'll bind our LangChain-formatted tools to the model in OpenAI's function-calling format.

# model = model.bind_tools(tool_belt)
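# For reference, bind_tools converts each tool into an OpenAI-style tool definition: a name, a description, and a JSON Schema for its arguments. The sketch below writes out roughly what that looks like for our two tools. The names duckduckgo_search and arxiv and the single query argument match the tool calls shown later in this notebook; make_tool_schema is a hypothetical helper and the description strings are placeholders, not the libraries' actual text.

```python
import json

def make_tool_schema(name: str, description: str) -> dict:
    """Hypothetical helper: an OpenAI-style function tool with one string 'query' argument."""
    return {
        "type": "function",
        "function": {
            "name": name,
            "description": description,
            "parameters": {
                "type": "object",
                "properties": {"query": {"type": "string"}},
                "required": ["query"],
            },
        },
    }

# Placeholder descriptions (assumption): the real ones come from each tool's own definition
tool_schemas = [
    make_tool_schema("duckduckgo_search", "Search the web with DuckDuckGo."),
    make_tool_schema("arxiv", "Search arXiv for scientific papers."),
]

# Roughly what travels to the API alongside the chat messages
payload = json.dumps({"tools": tool_schemas}, indent=2)
print(payload)
```

# The model never sees our Python classes, only these names, descriptions and schemas, which is why clear tool descriptions matter.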



In [4]:
# How does the model determine which tool to use?

# ANSWER

#     Automatic Tool Selection by the Model

#     When we call bind_tools, each tool's name, description, and argument schema are sent to the model alongside the conversation. The model itself reads the user's input, decides whether a tool is needed and which one, and returns that choice as tool_calls on its AIMessage, rather than any keyword parsing in our code making the decision.
#     For example, if the user's input involves searching for general information on the web, the model will likely choose DuckDuckGoSearchRun (exposed to the model as duckduckgo_search).
#     If the input is related to searching academic papers or scientific literature, it will likely choose ArxivQueryRun (exposed to the model as arxiv).

#     Explicit Tool Invocation

#     In some implementations, the developer might explicitly define when to use a particular tool based on conditions or user inputs. This might be done through conditional logic in the code.
#     For instance, if the input contains the term “arXiv” or something indicative of academic research, the code might route the query to ArxivQueryRun. Otherwise, it might default to DuckDuckGoSearchRun.

#     Decision-Making Logic

#     There might be decision-making logic that ranks tools based on their relevance to the query or scores them according to how well they match the user’s needs. The tool with the highest score is then selected.

# Here's an example of explicit, rule-based selection logic (the second approach above):

# def select_tool(user_input, tool_belt):
#     # Route anything mentioning arXiv to ArxivQueryRun; default to web search
#     if "arxiv" in user_input.lower():
#         return tool_belt[1]  # ArxivQueryRun
#     return tool_belt[0]      # DuckDuckGoSearchRun

# user_query = "Search arXiv for the QLoRA paper"
# selected_tool = select_tool(user_query, tool_belt)
# selected_tool.run(user_query)

# In this example, if the input is relevant to academic research (e.g., contains “arxiv”), ArxivQueryRun is selected. Otherwise, DuckDuckGoSearchRun is used by default.
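# The ranking idea from the Decision-Making Logic item can also be sketched in plain Python. This is a hypothetical keyword-overlap scorer (TOOL_KEYWORDS, score_tools and pick_tool are illustrative assumptions); it is not how bind_tools works, where the model makes the choice itself.

```python
import re

# Hypothetical keyword sets describing what each tool is good at
TOOL_KEYWORDS = {
    "arxiv": {"arxiv", "paper", "papers", "research", "study", "authors"},
    "duckduckgo_search": {"news", "current", "latest", "who", "today", "web"},
}

def score_tools(user_input: str) -> dict:
    """Score each tool by how many of its keywords appear in the input."""
    words = set(re.findall(r"[a-z0-9]+", user_input.lower()))
    return {name: len(words & keywords) for name, keywords in TOOL_KEYWORDS.items()}

def pick_tool(user_input: str, default: str = "duckduckgo_search") -> str:
    """Return the highest-scoring tool, or fall back to web search when nothing matches."""
    scores = score_tools(user_input)
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else default

print(pick_tool("Find the QLoRA paper and its authors"))               # arxiv
print(pick_tool("Who is the current captain of the Winnipeg Jets?"))  # duckduckgo_search
```

# Scoring generalises the if/else rule to many tools, but the keyword lists must be maintained by hand, which is exactly the work a tool-calling model takes off our plate.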
# Task 4: Putting the State in Stateful

# Earlier we used this phrasing:

# coordinated multi-actor and stateful applications

# So what does that "stateful" mean?

# To put it simply - we want to have some kind of object which we can pass around our application that holds information about what the current situation (state) is. Since our system will be constructed of many parts moving in a coordinated fashion - we want to be able to ensure we have some commonly understood idea of that state.

# LangGraph leverages a StateGraph, which uses an AgentState object to pass information between the various nodes of the graph.

# There are more options than what we'll see below, but this AgentState is a TypedDict with a single key, messages, whose value is a list of BaseMessages. The add_messages annotation tells LangGraph to append new messages to that list (rather than overwrite it) whenever a node returns an update.

# Let's think about a simple example to help understand exactly what this means (we'll simplify a great deal to try and clearly communicate what state is doing):

#     We initialize our state object:

#     {"messages" : []}

#     Our user submits a query to our application.

#     New State: HumanMessage(#1)
#     {"messages" : [HumanMessage(#1)]}

#     We pass our state object to an Agent node which is able to read the current state. It will use the last HumanMessage as input. It gets some kind of output which it will add to the state.

#     New State: AIMessage(#1, tool_calls=[{"name" : "WebSearchTool", ...}])
#     {"messages" : [HumanMessage(#1), AIMessage(#1, ...)]}

#     We pass our state object to a "conditional edge" (more on this later), which reads the last message in the state to determine whether we need to use a tool - which it can determine properly because of our provided object!

# from typing import TypedDict, Annotated
# from langgraph.graph.message import add_messages
# from langchain_core.messages import BaseMessage

# class AgentState(TypedDict):
#   # add_messages appends each node's returned messages instead of overwriting the list
#   messages: Annotated[list[BaseMessage], add_messages]
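# To make the append behaviour concrete, here is a plain-Python sketch of a reducer in the spirit of add_messages. It is a simplified stand-in (merge_messages and the dict-based messages are hypothetical): new messages are appended, and a message whose id matches an existing one replaces it. The real add_messages works on BaseMessage objects and handles more cases.

```python
def merge_messages(current: list, update: list) -> list:
    """Append new messages; replace any existing message that has the same id."""
    merged = list(current)  # never mutate the existing state in place
    index_by_id = {msg["id"]: i for i, msg in enumerate(merged)}
    for msg in update:
        if msg["id"] in index_by_id:
            merged[index_by_id[msg["id"]]] = msg
        else:
            index_by_id[msg["id"]] = len(merged)
            merged.append(msg)
    return merged

# Walk through the simplified example above
state = {"messages": []}
state["messages"] = merge_messages(
    state["messages"], [{"id": "h1", "type": "human", "content": "Who captains the Jets?"}]
)
state["messages"] = merge_messages(
    state["messages"], [{"id": "a1", "type": "ai", "tool_calls": ["WebSearchTool"]}]
)
print([m["type"] for m in state["messages"]])  # ['human', 'ai']
```

# Because nodes return only their new messages (e.g. {"messages" : [response]} in call_model), the reducer is what turns those partial updates into a growing conversation.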



In [5]:

# Task 5: It's Graphing Time!

# Now that we have state, and we have tools, and we have an LLM - we can finally start making our graph!

# Let's take a second to refresh ourselves about what a graph is in this context.

# Graphs, also called networks in some circles, are a collection of connected objects.

# The objects in question are typically called nodes, or vertices, and the connections are called edges.

# Let's look at a simple graph.

# image

# Here, we're using the coloured circles to represent the nodes and the yellow lines to represent the edges. In this case, we're looking at a fully connected graph, where each node is connected by an edge to every other node.
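# A fully connected (complete) graph on n nodes has n(n-1)/2 edges. A quick way to build one is to pair every node with every other node; the node names below are arbitrary.

```python
from itertools import combinations

nodes = ["A", "B", "C", "D"]

# One undirected edge for every unordered pair of distinct nodes
edges = list(combinations(nodes, 2))

# Adjacency list: each node maps to every other node
adjacency = {n: [m for m in nodes if m != n] for n in nodes}

print(len(edges))  # 6 == 4 * 3 / 2
```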

# If we were to think about nodes in the context of LangGraph - we would think of a function, or an LCEL runnable.

# If we were to think about edges in the context of LangGraph - we might think of them as "paths to take" or "where to pass our state object next".

# Let's create some nodes and expand on our diagram.

#     NOTE: Due to the tight integration with LCEL - we can comfortably create our nodes in an async fashion!

# from langgraph.prebuilt import ToolNode

# def call_model(state):
#   messages = state["messages"]
#   response = model.invoke(messages)
#   return {"messages" : [response]}

# tool_node = ToolNode(tool_belt)

# Now we have two nodes:

#     call_model is a node that will...well...call the model
#     tool_node is a node which can call a tool

# Let's start adding nodes! We'll update our diagram along the way to keep track of what this looks like!

# from langgraph.graph import StateGraph, END

# uncompiled_graph = StateGraph(AgentState)

# uncompiled_graph.add_node("agent", call_model)
# uncompiled_graph.add_node("action", tool_node)

# Let's look at what we have so far:

# image

# Next, we'll add our entrypoint. All our entrypoint does is indicate which node is called first.

# uncompiled_graph.set_entry_point("agent")

# image

# Now we want to build a "conditional edge" which will use the output state of a node to determine which path to follow.

# We can help conceptualize this by thinking of our conditional edge as a conditional in a flowchart!

# Notice how our function simply checks whether the last message contains any tool_calls.

# Then we create an edge where the origin node is our agent node and our destination node is either the action node or the END (finish the graph).

# It's worth highlighting that add_conditional_edges also accepts an optional third argument: a mapping from the possible outputs of the conditional function to node names. We don't need one here, because should_continue already returns either the name of a node ("action") or END, so LangGraph can route on those values directly.

# def should_continue(state):
#   last_message = state["messages"][-1]

#   if last_message.tool_calls:
#     return "action"

#   return END

# uncompiled_graph.add_conditional_edges(
#     "agent",
#     should_continue
# )
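# The routing rule for a conditional edge can be pictured in plain Python. resolve_next below is a hypothetical helper, not LangGraph's internals: it calls the condition function on the state and, if a mapping was supplied, translates the returned value through it; otherwise the value is treated as the next node's name, which is what happens with should_continue above. Messages are plain dicts here and END is a stand-in string.

```python
END = "__end__"  # stand-in for langgraph's END sentinel

def should_continue(state: dict) -> str:
    """Route to the tool node if the last message requested tools, else finish."""
    last_message = state["messages"][-1]
    return "action" if last_message.get("tool_calls") else END

def resolve_next(state: dict, condition, mapping=None) -> str:
    """Hypothetical helper: pick the next node from a condition function's output."""
    choice = condition(state)
    if mapping is None:
        return choice  # the return value is already a node name
    return mapping[choice]  # translate e.g. "continue" -> "action"

asks_for_tool = {"messages": [{"role": "ai", "tool_calls": [{"name": "arxiv"}]}]}
final_answer = {"messages": [{"role": "ai", "tool_calls": []}]}

print(resolve_next(asks_for_tool, should_continue))  # action
print(resolve_next(final_answer, should_continue))   # __end__

# The same routing, written with labels plus an explicit mapping instead
def continue_or_end(state: dict) -> str:
    return "continue" if state["messages"][-1].get("tool_calls") else "end"

mapping = {"continue": "action", "end": END}
print(resolve_next(asks_for_tool, continue_or_end, mapping))  # action
```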

# Let's visualize what this looks like.

# image

# Finally, we can add our last edge which will connect our action node to our agent node. This is because we always want our action node (which is used to call our tools) to return its output to our agent!

# uncompiled_graph.add_edge("action", "agent")

# Let's look at the final visualization.

# image

# All that's left to do now is to compile our workflow - and we're off!

# compiled_graph = uncompiled_graph.compile()
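# Conceptually, the compiled graph runs a loop: start at the entry point, run a node, merge its update into the state, then follow the (possibly conditional) edge until END. The plain-Python sketch below mimics that agent -> action -> agent cycle with a scripted fake model and fake tool. run_graph, fake_agent and fake_tool are hypothetical stand-ins, not LangGraph's implementation, and the max_steps guard is an assumption (LangGraph has its own recursion limit setting).

```python
END = "__end__"  # stand-in for langgraph's END

def fake_agent(state: dict) -> dict:
    """Scripted 'model': request a search first, then answer once a tool result exists."""
    if any(m["role"] == "tool" for m in state["messages"]):
        return {"messages": [{"role": "ai", "content": "Adam Lowry", "tool_calls": []}]}
    call = {"name": "duckduckgo_search"}
    return {"messages": [{"role": "ai", "content": "", "tool_calls": [call]}]}

def fake_tool(state: dict) -> dict:
    """Scripted tool node: answer each requested tool call with a canned result."""
    calls = state["messages"][-1]["tool_calls"]
    return {"messages": [{"role": "tool", "name": c["name"], "content": "search results"} for c in calls]}

def should_continue(state: dict) -> str:
    return "action" if state["messages"][-1]["tool_calls"] else END

NODES = {"agent": fake_agent, "action": fake_tool}
EDGES = {"action": "agent"}               # unconditional edge: action -> agent
CONDITIONAL = {"agent": should_continue}  # conditional edge out of agent

def run_graph(state: dict, entry: str = "agent", max_steps: int = 10) -> dict:
    """Run nodes, append their messages, and follow edges until END."""
    node, visited = entry, []
    for _ in range(max_steps):
        visited.append(node)
        update = NODES[node](state)
        state = {"messages": state["messages"] + update["messages"]}
        node = CONDITIONAL[node](state) if node in CONDITIONAL else EDGES[node]
        if node == END:
            return {**state, "path": visited}
    raise RuntimeError("step limit reached without hitting END")

result = run_graph({"messages": [{"role": "human", "content": "Who captains the Jets?"}]})
print(result["path"])                     # ['agent', 'action', 'agent']
print(result["messages"][-1]["content"])  # Adam Lowry
```

# The cycle is simply the action -> agent edge feeding back into a node that has a conditional exit; the loop ends only when the agent stops requesting tools.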



In [6]:

# Using Our Graph

# Now that we've created and compiled our graph - we can call it just as we'd call any other Runnable!

# Let's try out a few examples to see how it fares:

# from langchain_core.messages import HumanMessage

# inputs = {"messages" : [HumanMessage(content="Who is the current captain of the Winnipeg Jets?")]}

# async for chunk in compiled_graph.astream(inputs, stream_mode="updates"):
#     for node, values in chunk.items():
#         print(f"Receiving update from node: '{node}'")
#         print(values["messages"])
#         print("\n\n")

# Receiving update from node: 'agent'
# [AIMessage(content='', additional_kwargs={'tool_calls': [{'id': 'call_Zax1N7jhXQyRQ4pH884lDDvW', 'function': {'arguments': '{"query":"current captain of the Winnipeg Jets 2023"}', 'name': 'duckduckgo_search'}, 'type': 'function'}], 'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 25, 'prompt_tokens': 156, 'total_tokens': 181}, 'model_name': 'gpt-4o-2024-05-13', 'system_fingerprint': 'fp_157b3831f5', 'finish_reason': 'tool_calls', 'logprobs': None}, id='run-cc69bf54-a9fd-476a-9f35-2578df67fee4-0', tool_calls=[{'name': 'duckduckgo_search', 'args': {'query': 'current captain of the Winnipeg Jets 2023'}, 'id': 'call_Zax1N7jhXQyRQ4pH884lDDvW', 'type': 'tool_call'}], usage_metadata={'input_tokens': 156, 'output_tokens': 25, 'total_tokens': 181})]



# Receiving update from node: 'action'
# [ToolMessage(content="Posted September 12, 2023 9:29 am. Centre Adam Lowry was named the Winnipeg Jets new captain on Tuesday. Lowry is the third Jets captain since the team moved from Atlanta to Winnipeg in 2011. He follows Andrew Ladd and Blake Wheeler, who served as captain for five and six years respectively. Adam Lowry was named captain of the Winnipeg Jets on Tuesday. ... Sep 20, 2023. Latest News. Inside look at Vegas Golden Knights Aug 30, 2024. Vegas Golden Knights fantasy projections for 2024-25 When the Jets drafted Lowry in 2011's second round, the team's scouts raved about him as captain material. ... — Winnipeg Jets (@NHLJets) September 12, 2023. Cheveldayoff made his case. Lowry will follow Andrew Ladd and Blake Wheeler to serve as the third captain of the new Winnipeg Jets franchise. - Sep 12, 2023. After a season without a captain, the Winnipeg Jets have named ... September 12, 2023. There are not many honours in team sports bigger than being named captain. That honour was given to Winnipeg Jet forward Adam Lowry officially Tuesday morning as he becomes the ...", name='duckduckgo_search', tool_call_id='call_Zax1N7jhXQyRQ4pH884lDDvW')]



# Receiving update from node: 'agent'
# [AIMessage(content='As of September 12, 2023, Adam Lowry is the current captain of the Winnipeg Jets.', additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 23, 'prompt_tokens': 445, 'total_tokens': 468}, 'model_name': 'gpt-4o-2024-05-13', 'system_fingerprint': 'fp_157b3831f5', 'finish_reason': 'stop', 'logprobs': None}, id='run-cca535ca-accb-4ce1-b1c5-91937d7c3f2a-0', usage_metadata={'input_tokens': 445, 'output_tokens': 23, 'total_tokens': 468})]



# Let's look at what happened:

#     Our state object was populated with our request
#     The state object was passed into our entry point (agent node), and the agent node added an AIMessage to the state object and passed it along the conditional edge
#     The conditional edge received the state object, found tool_calls on the last message, and sent the state object to the action node
#     The action node ran the requested tool, added its result (a ToolMessage) to the state object, and passed it along the edge to the agent node
#     The agent node added a response to the state object and passed it along the conditional edge
#     The conditional edge received the state object, found no tool_calls on the last message, and passed the state object to END, where we see it output in the cell above!

# Now let's look at an example that uses multiple tools - all with the same flow!

# inputs = {"messages" : [HumanMessage(content="Search Arxiv for the QLoRA paper, then search each of the authors to find out their latest Tweet using DuckDuckGo.")]}

# async for chunk in compiled_graph.astream(inputs, stream_mode="updates"):
#     for node, values in chunk.items():
#         print(f"Receiving update from node: '{node}'")
#         if node == "action":
#             print(f"Tool Used: {values['messages'][0].name}")
#         print(values["messages"])
#         print("\n\n")

# Receiving update from node: 'agent'
# [AIMessage(content='', additional_kwargs={'tool_calls': [{'id': 'call_0ih8d3QwfztR5bRKJfuhGaB3', 'function': {'arguments': '{"query": "QLoRA"}', 'name': 'arxiv'}, 'type': 'function'}, {'id': 'call_KMGRJ9mUM1lS6TVIgSWLVo35', 'function': {'arguments': '{"query": "latest tweet Tim Dettmers"}', 'name': 'duckduckgo_search'}, 'type': 'function'}, {'id': 'call_h7vwN3bVsfBtQDezgGz4c86A', 'function': {'arguments': '{"query": "latest tweet Mike Lewis"}', 'name': 'duckduckgo_search'}, 'type': 'function'}, {'id': 'call_0OngtYNzyp3Q6LvAFDbt0kiY', 'function': {'arguments': '{"query": "latest tweet Sam Shleifer"}', 'name': 'duckduckgo_search'}, 'type': 'function'}, {'id': 'call_9z392ANKJUl318Asr9eKbawX', 'function': {'arguments': '{"query": "latest tweet Luke Zettlemoyer"}', 'name': 'duckduckgo_search'}, 'type': 'function'}], 'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 119, 'prompt_tokens': 173, 'total_tokens': 292}, 'model_name': 'gpt-4o-2024-05-13', 'system_fingerprint': 'fp_157b3831f5', 'finish_reason': 'tool_calls', 'logprobs': None}, id='run-a0f558a6-ecc1-49b3-96d9-fe27f7524078-0', tool_calls=[{'name': 'arxiv', 'args': {'query': 'QLoRA'}, 'id': 'call_0ih8d3QwfztR5bRKJfuhGaB3', 'type': 'tool_call'}, {'name': 'duckduckgo_search', 'args': {'query': 'latest tweet Tim Dettmers'}, 'id': 'call_KMGRJ9mUM1lS6TVIgSWLVo35', 'type': 'tool_call'}, {'name': 'duckduckgo_search', 'args': {'query': 'latest tweet Mike Lewis'}, 'id': 'call_h7vwN3bVsfBtQDezgGz4c86A', 'type': 'tool_call'}, {'name': 'duckduckgo_search', 'args': {'query': 'latest tweet Sam Shleifer'}, 'id': 'call_0OngtYNzyp3Q6LvAFDbt0kiY', 'type': 'tool_call'}, {'name': 'duckduckgo_search', 'args': {'query': 'latest tweet Luke Zettlemoyer'}, 'id': 'call_9z392ANKJUl318Asr9eKbawX', 'type': 'tool_call'}], usage_metadata={'input_tokens': 173, 'output_tokens': 119, 'total_tokens': 292})]



# Receiving update from node: 'action'
# Tool Used: arxiv
# [ToolMessage(content='Published: 2023-05-23\nTitle: QLoRA: Efficient Finetuning of Quantized LLMs\nAuthors: Tim Dettmers, Artidoro Pagnoni, Ari Holtzman, Luke Zettlemoyer\nSummary: We present QLoRA, an efficient finetuning approach that reduces memory usage\nenough to finetune a 65B parameter model on a single 48GB GPU while preserving\nfull 16-bit finetuning task performance. QLoRA backpropagates gradients through\na frozen, 4-bit quantized pretrained language model into Low Rank\nAdapters~(LoRA). Our best model family, which we name Guanaco, outperforms all\nprevious openly released models on the Vicuna benchmark, reaching 99.3% of the\nperformance level of ChatGPT while only requiring 24 hours of finetuning on a\nsingle GPU. QLoRA introduces a number of innovations to save memory without\nsacrificing performance: (a) 4-bit NormalFloat (NF4), a new data type that is\ninformation theoretically optimal for normally distributed weights (b) double\nquantization to reduce the average memory footprint by quantizing the\nquantization constants, and (c) paged optimziers to manage memory spikes. We\nuse QLoRA to finetune more than 1,000 models, providing a detailed analysis of\ninstruction following and chatbot performance across 8 instruction datasets,\nmultiple model types (LLaMA, T5), and model scales that would be infeasible to\nrun with regular finetuning (e.g. 33B and 65B parameter models). Our results\nshow that QLoRA finetuning on a small high-quality dataset leads to\nstate-of-the-art results, even when using smaller models than the previous\nSoTA. We provide a detailed analysis of chatbot performance based on both human\nand GPT-4 evaluations showing that GPT-4 evaluations are a cheap and reasonable\nalternative to human evaluation. Furthermore, we find that current chatbot\nbenchmarks are not trustworthy to accurately evaluate the performance levels of\nchatbots. A lemon-picked analysis demonstrates where Guanaco fails compared to\nChatGPT. 
We release all of our models and code, including CUDA kernels for\n4-bit training.\n\nPublished: 2024-05-27\nTitle: Accurate LoRA-Finetuning Quantization of LLMs via Information Retention\nAuthors: Haotong Qin, Xudong Ma, Xingyu Zheng, Xiaoyang Li, Yang Zhang, Shouda Liu, Jie Luo, Xianglong Liu, Michele Magno\nSummary: The LoRA-finetuning quantization of LLMs has been extensively studied to\nobtain accurate yet compact LLMs for deployment on resource-constrained\nhardware. However, existing methods cause the quantized LLM to severely degrade\nand even fail to benefit from the finetuning of LoRA. This paper proposes a\nnovel IR-QLoRA for pushing quantized LLMs with LoRA to be highly accurate\nthrough information retention. The proposed IR-QLoRA mainly relies on two\ntechnologies derived from the perspective of unified information: (1)\nstatistics-based Information Calibration Quantization allows the quantized\nparameters of LLM to retain original information accurately; (2)\nfinetuning-based Information Elastic Connection makes LoRA utilizes elastic\nrepresentation transformation with diverse information. Comprehensive\nexperiments show that IR-QLoRA can significantly improve accuracy across LLaMA\nand LLaMA2 families under 2-4 bit-widths, e.g., 4- bit LLaMA-7B achieves 1.4%\nimprovement on MMLU compared with the state-of-the-art methods. The significant\nperformance gain requires only a tiny 0.31% additional time consumption,\nrevealing the satisfactory efficiency of our IR-QLoRA. 
We highlight that\nIR-QLoRA enjoys excellent versatility, compatible with various frameworks\n(e.g., NormalFloat and Integer quantization) and brings general accuracy gains.\nThe code is available at https://github.com/htqin/ir-qlora.\n\nPublished: 2024-06-12\nTitle: Exploring Fact Memorization and Style Imitation in LLMs Using QLoRA: An Experimental Study and Quality Assessment Methods\nAuthors: Eugene Vyborov, Oleksiy Osypenko, Serge Sotnyk\nSummary: There are various methods for adapting LLMs to different domains. The most\ncommon methods are prompting, finetuning, and RAG. In this w', name='arxiv', tool_call_id='call_0ih8d3QwfztR5bRKJfuhGaB3'), ToolMessage(content='Tech Moves: AI researcher Yejin Choi leaves Univ. of Washington and Allen Institute for AI. by Todd Bishop & Taylor Soper on August 2, 2024. Yejin Choi, who was named a 2022 MacArthur Fellow and ... Magistrate Judge Thomas S. Hixson\'s protective order said the group of authors suing Meta for copyright infringement over its AI model cannot use the chat logs "and must return them to Meta or destroy them.". The authors, who include comedian Sarah Silverman and Pulitzer Prize-winner Andrew Sean Greer, argued that Meta waived its ... — Tim Dettmers is joining Ai2 as an AI researcher. Dettmers specializes in efficient deep learning at the intersection of machine learning, NLP, and computer systems with a focus on quantization ... Allen School Ph.D. student Tim Dettmers accepted the grand prize for QLoRA, a novel approach to finetuning pretrained models that significantly reduces the amount of GPU memory required — from over 780GB to less than 48GB — to finetune a 65B parameter model. With QLoRA, the largest publicly available models can be finetuned on a single ... Tim Dettmers, Ruslan A. Svirschevski, Vage Egiazarian, Denis Kuznedelev, ... 
(SpQR), a new compressed format and quantization technique that enables for the first time \\emph{near-lossless} compression of LLMs across model scales while reaching similar compression levels to previous methods. SpQR works by identifying and isolating \\emph{outlier ...', name='duckduckgo_search', tool_call_id='call_KMGRJ9mUM1lS6TVIgSWLVo35'), ToolMessage(content='By Christopher Beam. Oct. 6, 2023. A hike is Michael Lewis\'s interview format of choice. When he first met the FTX founder Sam Bankman-Fried in late 2021, he took the cargo-shorted chief ... Author Michael Lewis spent countless hours with FTX founder and alleged fraudster Sam Bankman-Fried. He details the cryptocurrency wunderkind\'s rise and fall in his new book, "Going Infinite.&quot; MCKINNEY — A "selfless servant" is how friends and family are remembering Michael Lewis, the North Texas pilot who died in a plane crash earlier this week in McKinney. Family members say Lewis ... Author Michael Lewis met with FTX founder Sam Bankman-Fried more than 100 times. Lewis breaks down the crypto superstar\'s rise and fall in his new book, "Going Infinite." By Andrew R. Chow. October 9, 2023 9:20 AM EDT. W hen the journalist Michael Lewis announced in May that he was writing a book on Sam Bankman-Fried after spending months with the FTX crypto mogul ...', name='duckduckgo_search', tool_call_id='call_h7vwN3bVsfBtQDezgGz4c86A'), ToolMessage(content='November 21, 2023 at 7:11 PM EST. Save. Tiger Global Management\'s Scott Shleifer, the partner overseeing a majority of the firm\'s assets and who helped launch its venture arm, is stepping down ... Scott Shleifer, the head of Tiger Global\'s $30bn-plus private equity business, will step down from his role at the hedge fund at the end of the year. The move will be a significant change for ... Tiger was founded by Coleman, a protégé of hedge fund pioneer Julian Robertson, in 2000. Shleifer joined two years later. This is not the firm\'s first major leadership change. 
Nov 29, 2023, 8:26 AM PST. Shleifer cofounded Tiger Global\'s private equity business nearly two decades ago Ben Gabbe/Getty Images. Billionaire Scott Shleifer stepped down from his role leading ... Scott Shleifer is stepping down as head of private equity at Tiger Global, one of the driving forces behind the tech "unicorn" boom, Axios has learned. Details: Shleifer will remain a senior advisor to Tiger, which was founded to manage hedge funds, while firm founder Chase Coleman will assume the chairman role on a new private equity ...', name='duckduckgo_search', tool_call_id='call_0OngtYNzyp3Q6LvAFDbt0kiY'), ToolMessage(content="Luke Zettlemoyer is a research manager and site lead for FAIR Seattle. He is also a Professor in the Allen School of Computer Science & Engineering at the University of Washington. His research is in empirical computational semantics, where the goal is to build models that recover representations of the meaning of natural language text. AUTHORS Written by Scott Yih Luke Zettlemoyer Mike Lewis Hannaneh Hajishirzi Kalpesh Krishna Mohit Iyyer Pang Wei Koh Sewon Min Xinxi Lyu Publisher EMNLP Research Topics Natural Language Processing (NLP) Yichao LuSrihari JayakumarDebojeet ChatterjeeMohsen MoslehpourPierce ChuangAbhay HarpaleVikas BhardwajDi Xu (SWE)Shicong ZhaoAnkit ... Today we're joined by Luke Zettlemoyer, professor at University of Washington and a research manager at Meta. In our conversation with Luke, we cover multimodal generative AI, the effect of data on models, and the significance of open source and open science. We explore the grounding problem, the need for visual grounding and embodiment in FILE - Washington Commanders wide receiver Luke McCaffrey, left, poses with his brother, Miami Dolphins offensive assistant Max McCaffrey on the field after an NFL pre-season football game, on Aug. 17, 2024, in Miami Gardens, Fla. Luke McCaffrey this weekend will become the latest member of his family to play in the NFL. 
Hibs sporting director Malky Mackay admits the club could delve into the free agent market after missing out to Luke McCowan to Celtic in a dramatic deadline day at Easter Road.. The Easter Road ...", name='duckduckgo_search', tool_call_id='call_9z392ANKJUl318Asr9eKbawX')]



# Receiving update from node: 'agent'
# [AIMessage(content='### QLoRA Paper on Arxiv\n**Title:** QLoRA: Efficient Finetuning of Quantized LLMs  \n**Authors:** Tim Dettmers, Artidoro Pagnoni, Ari Holtzman, Luke Zettlemoyer  \n**Summary:** QLoRA is an efficient finetuning approach that reduces memory usage enough to finetune a 65B parameter model on a single 48GB GPU while preserving full 16-bit finetuning task performance. The paper introduces several innovations to save memory without sacrificing performance, such as 4-bit NormalFloat (NF4), double quantization, and paged optimizers. The best model family, Guanaco, outperforms previous models on the Vicuna benchmark, reaching 99.3% of ChatGPT\'s performance level with only 24 hours of finetuning on a single GPU. The paper also provides a detailed analysis of instruction following and chatbot performance across various datasets and model scales.\n\n### Latest Tweets of Authors\n\n#### Tim Dettmers\n- **Latest Tweet:** Tim Dettmers is joining Ai2 as an AI researcher. Dettmers specializes in efficient deep learning at the intersection of machine learning, NLP, and computer systems with a focus on quantization.\n\n#### Mike Lewis\n- **Latest Tweet:** Author Michael Lewis met with FTX founder Sam Bankman-Fried more than 100 times. Lewis breaks down the crypto superstar\'s rise and fall in his new book, "Going Infinite."\n\n#### Sam Shleifer\n- **Latest Tweet:** Scott Shleifer, the head of Tiger Global\'s $30bn-plus private equity business, will step down from his role at the hedge fund at the end of the year. The move will be a significant change for the firm.\n\n#### Luke Zettlemoyer\n- **Latest Tweet:** Today we\'re joined by Luke Zettlemoyer, professor at University of Washington and a research manager at Meta. 
In our conversation with Luke, we cover multimodal generative AI, the effect of data on models, and the significance of open source and open science.', additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 426, 'prompt_tokens': 2452, 'total_tokens': 2878}, 'model_name': 'gpt-4o-2024-05-13', 'system_fingerprint': 'fp_157b3831f5', 'finish_reason': 'stop', 'logprobs': None}, id='run-75cca5fe-3661-48b6-a719-9ba043009707-0', usage_metadata={'input_tokens': 2452, 'output_tokens': 426, 'total_tokens': 2878})]





In [7]:
# LangSmith Evaluator
# Pre-processing for LangSmith

# To do a little bit more preprocessing, let's wrap our LangGraph agent in a simple chain.

# def convert_inputs(input_object):
#   return {"messages" : [HumanMessage(content=input_object["question"])]}

# def parse_output(input_state):
#   return input_state["messages"][-1].content

# agent_chain = convert_inputs | compiled_graph | parse_output

# agent_chain.invoke({"question" : "What is RAG?"})

# "RAG stands for Retrieval-Augmented Generation. It is a technique used in natural language processing (NLP) and machine learning to improve the performance of language models by combining retrieval-based methods with generative models. Here's a brief overview of how it works:\n\n1. **Retrieval**: In the first step, the system retrieves relevant documents or pieces of information from a large corpus or database. This is typically done using a retrieval model, such as BM25 or a dense retrieval model like DPR (Dense Passage Retrieval).\n\n2. **Augmentation**: The retrieved documents are then used to augment the input to the generative model. This means that the generative model has access to additional context or information that can help it produce more accurate and relevant responses.\n\n3. **Generation**: Finally, the generative model, such as GPT-3 or BERT, uses the augmented input to generate a response. The additional context provided by the retrieved documents helps the model generate more informed and contextually appropriate responses.\n\nRAG is particularly useful in scenarios where the generative model alone might not have enough information to produce a high-quality response, such as in open-domain question answering, conversational agents, and other applications requiring access to a large knowledge base."
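# The | operator in agent_chain feeds each step's output into the next; LCEL does this by wrapping plain functions such as convert_inputs and parse_output as runnables. The sketch below mimics the idea with a hypothetical Step class and a fake graph standing in for compiled_graph; it is not LCEL's implementation.

```python
class Step:
    """Hypothetical minimal runnable: wraps a function and supports `a | b` chaining."""

    def __init__(self, fn):
        self.fn = fn

    def __or__(self, other):
        other = other if isinstance(other, Step) else Step(other)
        return Step(lambda x: other.fn(self.fn(x)))

    def __ror__(self, other):
        # Lets a plain function appear on the left: func | Step(...)
        return Step(other) | self

    def invoke(self, x):
        return self.fn(x)

def convert_inputs(input_object):
    return {"messages": [{"role": "human", "content": input_object["question"]}]}

def fake_graph(state):
    # Stand-in for compiled_graph: append an AI reply to the state
    reply = {"role": "ai", "content": f"You asked: {state['messages'][-1]['content']}"}
    return {"messages": state["messages"] + [reply]}

def parse_output(state):
    return state["messages"][-1]["content"]

agent_chain = convert_inputs | Step(fake_graph) | parse_output
print(agent_chain.invoke({"question": "What is RAG?"}))  # You asked: What is RAG?
```

# Because a plain function has no | operator of its own, Python falls back to the right-hand object's __ror__, which is how a function can sit at the front of the chain.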

# Task 1: Creating An Evaluation Dataset

# Just as we saw last week, we'll want to create a dataset to test our Agent's ability to answer questions.

# In order to do this - we'll want to provide some questions and some answers. Let's look at how we can create such a dataset below.

# questions = [
#     "What optimizer is used in QLoRA?",
#     "What data type was created in the QLoRA paper?",
#     "What is a Retrieval Augmented Generation system?",
#     "Who authored the QLoRA paper?",
#     "What is the most popular deep learning framework?",
#     "What significant improvements does the LoRA system make?"
# ]

# answers = [
#     {"must_mention" : ["paged", "optimizer"]},
#     {"must_mention" : ["NF4", "NormalFloat"]},
#     {"must_mention" : ["ground", "context"]},
#     {"must_mention" : ["Tim", "Dettmers"]},
#     {"must_mention" : ["PyTorch", "TensorFlow"]},
#     {"must_mention" : ["reduce", "parameters"]},
# ]

# 🏗️ Activity #3:

# Please create a dataset in the above format with at least 5 questions.

# questions = [
#     # **ANSWER**
#     "What is Retrieval Augmented Generation (RAG)?",
#     "How does RAG differ from traditional language models?",
#     "What are the main components of a RAG system?",
#     "What are the benefits of using RAG?",
#     "What are some challenges associated with RAG?",
#     "In what applications is RAG commonly used?",
# ]

# answers = [
#     # **ANSWER**
#     {"must_mention" : ["retrieval", "generation"]},
#     {"must_mention" : ["retrieves", "external", "information"]},
#     {"must_mention" : ["retrieval system", "language model"]},
#     {"must_mention" : ["accurate", "contextually relevant"]},
#     {"must_mention" : ["retrieval effectiveness", "response generation", "efficiency"]},
#     {"must_mention" : ["question answering", "chatbot", "summarization"]},
# ]

# Now we can add our dataset to our LangSmith project using the following code which we saw last Thursday!

# from langsmith import Client

# client = Client()
# dataset_name = f"Retrieval Augmented Generation - Evaluation Dataset - {uuid4().hex[0:8]}"

# dataset = client.create_dataset(
#     dataset_name=dataset_name,
#     description="Questions about Retrieval Augmented Generation to evaluate our LangGraph agent."
# )

# client.create_examples(
#     inputs=[{"question" : q} for q in questions],
#     outputs=answers,
#     dataset_id=dataset.id,
# )

# ❓ Question #3:

# How are the correct answers associated with the questions?

#     NOTE: Feel free to indicate if this is problematic or not

# ANSWER

# The create_examples method of the LangSmith Client does this: it pairs inputs[i] with outputs[i] purely by list position. Matching by index is fragile, because if either list is reordered, or an item is added to one list and not the other, questions end up paired with the wrong answers. The documentation here - https://docs.smith.langchain.com/old/evaluation/faq/manage-datasets - shows other approaches that might be less problematic.
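# One way to reduce the risk is to keep each question next to its expected answer from the start, and to check that the lists line up before uploading. The sketch below is plain Python; build_examples is a hypothetical helper, and its inputs/outputs lists have the same shape as the ones passed to create_examples above.

```python
def build_examples(questions: list, answers: list) -> tuple:
    """Pair questions with answers, failing loudly if the lists differ in length."""
    if len(questions) != len(answers):
        raise ValueError(f"{len(questions)} questions but {len(answers)} answers")
    inputs = [{"question": q} for q in questions]
    outputs = list(answers)
    return inputs, outputs

# Safer still: author the pairs together so they cannot drift apart
qa_pairs = [
    ("What is Retrieval Augmented Generation (RAG)?", {"must_mention": ["retrieval", "generation"]}),
    ("What are the main components of a RAG system?", {"must_mention": ["retrieval system", "language model"]}),
]
inputs, outputs = build_examples([q for q, _ in qa_pairs], [a for _, a in qa_pairs])
print(inputs[1]["question"], "->", outputs[1]["must_mention"])
```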
# Task 2: Adding Evaluators

# Now we can add a custom evaluator to see if our responses contain the expected information.

# We'll be using a fairly naive exact-match process to determine if our response contains specific strings.

# from langsmith.evaluation import EvaluationResult, run_evaluator

# @run_evaluator
# def must_mention(run, example) -> EvaluationResult:
#     prediction = run.outputs.get("output") or ""
#     required = example.outputs.get("must_mention") or []
#     score = all(phrase in prediction for phrase in required)
#     return EvaluationResult(key="must_mention", score=score)

# ❓ Question #4:

# What are some ways you could improve this metric as-is?

#     NOTE: Alternatively you can suggest where gaps exist in this method.

# ANSWER

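# Because the check is an exact, case-sensitive substring match, it will likely report a much higher failure rate than the real one: an answer that says "Paged Optimizers" fails the "paged" check even though it is correct. It is also all-or-nothing, so an answer that mentions two of three required phrases scores the same as one that mentions none. A better way to do this would be to make an LLM call here to check whether the response contained the required information.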
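# Short of an LLM judge, two cheap improvements are to compare case-insensitively and to give partial credit instead of all-or-nothing. The function below is a plain-Python sketch of that scoring logic only (must_mention_score is hypothetical); in LangSmith it would sit inside an evaluator like must_mention above.

```python
def must_mention_score(prediction: str, required: list) -> float:
    """Fraction of required phrases found in the prediction, ignoring case."""
    if not required:
        return 1.0
    text = prediction.lower()
    hits = sum(1 for phrase in required if phrase.lower() in text)
    return hits / len(required)

answer = "QLoRA uses Paged Optimizers to handle memory spikes."
print(must_mention_score(answer, ["paged", "optimizer"]))                 # 1.0
print(must_mention_score(answer, ["paged", "optimizer", "NormalFloat"]))  # roughly 0.67
```

# The original check would score this answer as a failure, because "paged" does not appear with that exact capitalisation.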

# Now that we have created our custom evaluator - let's initialize our RunEvalConfig with it!

# from langchain.smith import RunEvalConfig, run_on_dataset

# eval_config = RunEvalConfig(
#     custom_evaluators=[must_mention],
# )

# Task 3: Evaluating

# All that is left to do is evaluate our agent's response!

# client.run_on_dataset(
#     dataset_name=dataset_name,
#     llm_or_chain_factory=agent_chain,
#     evaluation=eval_config,
#     verbose=True,
#     project_name=f"RAG Pipeline - Evaluation - {uuid4().hex[0:8]}",
#     project_metadata={"version": "1.0.0"},
# )

# View the evaluation results for project 'RAG Pipeline - Evaluation - 529f2d66' at:
# https://smith.langchain.com/o/be600afd-3795-5505-ad50-068c806b5628/datasets/647e0c09-17f1-4396-9496-11bef801de66/compare?selectedSessions=fca7eafc-7cdc-4f29-8b68-8982b3a03b45

# View all tests for Dataset Retrieval Augmented Generation - Evaluation Dataset - 54557797 at:
# https://smith.langchain.com/o/be600afd-3795-5505-ad50-068c806b5628/datasets/647e0c09-17f1-4396-9496-11bef801de66
# [------------------------------------------------->] 6/6

# Experiment Results:
#        feedback.must_mention  error  execution_time  run_id
# count                      6      0        6.000000  6
# unique                     2      0             NaN  6
# top                     True    NaN             NaN  b73a8408-ccf0-4fce-8f33-038fb31af340
# freq                       3    NaN             NaN  1
# mean                     NaN    NaN        5.405717  NaN
# std                      NaN    NaN        0.884489  NaN
# min                      NaN    NaN        4.312219  NaN
# 25%                      NaN    NaN        5.015214  NaN
# 50%                      NaN    NaN        5.350587  NaN
# 75%                      NaN    NaN        5.495624  NaN
# max                      NaN    NaN        6.973515  NaN