# Local Web Research Agent w/ Llama 3 8B

### [Llama 3 Release](https://llama.meta.com/llama3/)

### [Ollama Llama 3 Model](https://ollama.com/library/llama3)
---

![diagram](local_agent_diagram.png)

---
[Llama 3 Prompt Format](https://llama.meta.com/docs/model-cards-and-prompt-formats/meta-llama-3/)

### Special Tokens used with Meta Llama 3
* **<|begin_of_text|>**: This is equivalent to the BOS token
* **<|eot_id|>**: This signifies the end of the message in a turn.
* **<|start_header_id|>{role}<|end_header_id|>**: These tokens enclose the role for a particular message. The possible roles can be: system, user, assistant.
* **<|end_of_text|>**: This is equivalent to the EOS token. On generating this token, Llama 3 will cease to generate more tokens.

A prompt should contain a single system message, can contain multiple alternating user and assistant messages, and always ends with the last user message followed by the assistant header.
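
To make the token layout concrete, here is a minimal sketch that assembles a raw Llama 3 prompt string from a list of messages. `build_llama3_prompt` is a hypothetical helper, not part of LangChain or Ollama, and the blank line after each header follows the examples on Meta's prompt-format page. Ollama's chat API normally applies the model's template for you, so treat this as a way to inspect the format rather than something the agent below requires.

```python
from typing import Dict, List

ALLOWED_ROLES = {"system", "user", "assistant"}


def build_llama3_prompt(messages: List[Dict[str, str]]) -> str:
    """Join {"role", "content"} messages into one raw Llama 3 prompt string (hypothetical helper)."""
    parts = ["<|begin_of_text|>"]
    for message in messages:
        role = message["role"]
        if role not in ALLOWED_ROLES:
            raise ValueError(f"Unsupported role: {role!r}")
        # Each turn: role header, blank line, content, end-of-turn token.
        parts.append(
            f"<|start_header_id|>{role}<|end_header_id|>\n\n"
            f"{message['content'].strip()}<|eot_id|>"
        )
    # Finish with an open assistant header so the model writes the next turn.
    parts.append("<|start_header_id|>assistant<|end_header_id|>\n\n")
    return "".join(parts)


prompt = build_llama3_prompt([
    {"role": "system", "content": "You are a helpful research assistant."},
    {"role": "user", "content": "What are Apple's Q3 earnings?"},
])
print(prompt)
```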

In [19]:
# Displaying final output format
from IPython.display import display, Markdown, Latex
# LangChain Dependencies
from langchain.prompts import PromptTemplate
from langchain_core.output_parsers import JsonOutputParser, StrOutputParser
from langchain_community.chat_models import ChatOllama
from langchain_community.tools import DuckDuckGoSearchRun
from langchain_community.utilities import DuckDuckGoSearchAPIWrapper
from langgraph.graph import END, StateGraph
# For State Graph 
from typing_extensions import TypedDict
import os

In [20]:
# Environment Variables
os.environ['LANGCHAIN_TRACING_V2'] = 'true'
os.environ["LANGCHAIN_PROJECT"] = "L3 Research Agent"
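
LangSmith tracing also needs an API key, which the cell above does not set. A small check like the one below, assuming the key is supplied as `LANGCHAIN_API_KEY` in the environment, can catch a missing variable before any runs are sent. `missing_tracing_vars` is a hypothetical helper.

```python
import os
from typing import List, Mapping, Optional

# Variables this notebook relies on for LangSmith tracing (the API key is not set above).
TRACING_VARS = ("LANGCHAIN_TRACING_V2", "LANGCHAIN_API_KEY", "LANGCHAIN_PROJECT")


def missing_tracing_vars(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the tracing variables that are unset or empty in env (default: os.environ)."""
    env = os.environ if env is None else env
    return [name for name in TRACING_VARS if not env.get(name)]


missing = missing_tracing_vars()
if missing:
    print("LangSmith tracing may not work; missing:", ", ".join(missing))
```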

In [21]:
# Defining LLM
local_llm = 'llama3'
llama3 = ChatOllama(model=local_llm, temperature=0)
llama3_json = ChatOllama(model=local_llm, format='json', temperature=0)

In [22]:
# Web Search Tool
# pip install -U duckduckgo_search==5.3.0b4
# ^ if running into 202 rate limit error

wrapper = DuckDuckGoSearchAPIWrapper(max_results=15)
web_search_tool = DuckDuckGoSearchRun(api_wrapper=wrapper)

# Test Run
# resp = web_search_tool.invoke("home depot news")
# resp
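
If you keep hitting the DuckDuckGo rate limit mentioned above, one option is to wrap the tool call in a retry with exponential backoff. This is a generic sketch, not a feature of `DuckDuckGoSearchRun`. `call_with_backoff` is hypothetical, and it retries on any `Exception` by default, so narrow `retry_on` to the search library's rate-limit exception class if you know it.

```python
import random
import time
from typing import Any, Callable, Tuple, Type


def call_with_backoff(
    fn: Callable[..., Any],
    *args: Any,
    retries: int = 3,
    base_delay: float = 1.0,
    retry_on: Tuple[Type[BaseException], ...] = (Exception,),
    sleep: Callable[[float], None] = time.sleep,
    **kwargs: Any,
) -> Any:
    """Call fn(*args, **kwargs), retrying with exponential backoff on retry_on errors."""
    for attempt in range(retries + 1):
        try:
            return fn(*args, **kwargs)
        except retry_on:
            if attempt == retries:
                raise  # out of retries: re-raise the last error
            # Wait base_delay * 2**attempt seconds plus up to 10% random jitter.
            delay = base_delay * (2 ** attempt)
            sleep(delay + random.uniform(0, delay * 0.1))
```

Inside the `web_search` node, the call could then become `call_with_backoff(web_search_tool.invoke, search_query)`. The `sleep` parameter exists only so the waits can be stubbed out in tests.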

In [23]:
# Generation Prompt

generate_prompt = PromptTemplate(
    template="""
    
    <|begin_of_text|>
    
    <|start_header_id|>system<|end_header_id|> 
    
    You are an AI assistant for research question tasks that synthesizes web search results.
    Strictly use the following pieces of web search context to answer the question. If you don't know the answer, just say that you don't know.
    Keep the answer concise, but provide all of the details you can in the form of a research report.
    Only make direct references to material if provided in the context.
    
    <|eot_id|>
    
    <|start_header_id|>user<|end_header_id|>
    
    Question: {question} 
    Web Search Context: {context} 
    Answer: 
    
    <|eot_id|>
    
    <|start_header_id|>assistant<|end_header_id|>""",
    input_variables=["question", "context"],
)

# Chain
generate_chain = generate_prompt | llama3 | StrOutputParser()

# Test Run
# question = "How are you?"
# context = ""
# generation = generate_chain.invoke({"context": context, "question": question})
# print(generation)

In [24]:
# Router

router_prompt = PromptTemplate(
    template="""
    
    <|begin_of_text|>
    
    <|start_header_id|>system<|end_header_id|>
    
    You are an expert at routing a user question to either the generation stage or web search. 
    Use the web search for questions that require more context for a better answer, or recent events.
    Otherwise, you can skip and go straight to the generation phase to respond.
    You do not need to be stringent with the keywords in the question related to these topics.
    Give a binary choice 'web_search' or 'generate' based on the question. 
    Return a JSON object with a single key 'choice' and no preamble or explanation.
    
    Question to route: {question} 
    
    <|eot_id|>
    
    <|start_header_id|>assistant<|end_header_id|>
    
    """,
    input_variables=["question"],
)

# Chain
question_router = router_prompt | llama3_json | JsonOutputParser()

# Test Run
# question = "What's up?"
# print(question_router.invoke({"question": question}))
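
`JsonOutputParser` only turns the model's reply into a Python object; it does not check that the `choice` key exists or holds one of the two allowed values. The sketch below shows one way to validate a raw JSON reply, assuming that falling back to web search is the safer default when the reply is malformed. `parse_route` is a hypothetical helper operating on the raw string.

```python
import json

VALID_CHOICES = {"web_search", "generate"}


def parse_route(raw: str, default: str = "web_search") -> str:
    """Return a validated routing choice from the router's raw JSON reply."""
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError:
        return default  # not JSON at all
    choice = payload.get("choice") if isinstance(payload, dict) else None
    if isinstance(choice, str):
        # Tolerate stray whitespace and capitalization, e.g. " Web_Search".
        normalized = choice.strip().lower()
        if normalized in VALID_CHOICES:
            return normalized
    return default


print(parse_route('{"choice": "generate"}'))
print(parse_route("Sure! Here is my answer."))
```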

In [25]:
# Query Transformation

query_prompt = PromptTemplate(
    template="""
    
    <|begin_of_text|>
    
    <|start_header_id|>system<|end_header_id|> 
    
    You are an expert at crafting web search queries for research questions.
    More often than not, a user will ask a basic question that they wish to learn more about; however, it might not be in the best format.
    Reword their query to be the most effective web search string possible.
    Return a JSON object with a single key 'query' and no preamble or explanation.
    
    Question to transform: {question} 
    
    <|eot_id|>
    
    <|start_header_id|>assistant<|end_header_id|>
    
    """,
    input_variables=["question"],
)

# Chain
query_chain = query_prompt | llama3_json | JsonOutputParser()

# Test Run
# question = "What's happened recently with Macom?"
# print(query_chain.invoke({"question": question}))

In [26]:
# Graph State
class GraphState(TypedDict):
    """
    Represents the state of our graph.

    Attributes:
        question: the user's original question
        generation: the LLM's final answer
        search_query: the question rewritten for web search
        context: the web search results
    """
    question: str
    generation: str
    search_query: str
    context: str

# Node - Generate

def generate(state):
    """
    Generate answer

    Args:
        state (dict): The current graph state

    Returns:
        state (dict): New key added to state, generation, that contains LLM generation
    """
    
    print("Step: Generating Final Response")
    question = state["question"]
    context = state.get("context", "")  # absent when the router skips web search

    # Answer Generation
    generation = generate_chain.invoke({"context": context, "question": question})
    return {"generation": generation}

# Node - Query Transformation

def transform_query(state):
    """
    Transform user question to web search

    Args:
        state (dict): The current graph state

    Returns:
        state (dict): Appended search query
    """
    
    print("Step: Optimizing Query for Web Search")
    question = state['question']
    gen_query = query_chain.invoke({"question": question})
    # Fall back to the original question if the model omits the 'query' key
    search_query = gen_query.get("query") or question
    return {"search_query": search_query}


# Node - Web Search
def web_search(state):
    """
    Web search based on the question

    Args:
        state (dict): The current graph state

    Returns:
        state (dict): Appended web results to context
    """

    search_query = state['search_query']
    print(f'Step: Searching the Web for: "{search_query}"')
    
    # Web search tool call
    search_result = web_search_tool.invoke(search_query)
    return {"context": search_result}


# Conditional Edge, Routing

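def route_question(state):
    """
    Route the question to web search or generation.

    Args:
        state (dict): The current graph state

    Returns:
        str: Next node to call
    """

    print("Step: Routing Query")
    question = state['question']
    output = question_router.invoke({"question": question})
    choice = output.get("choice") if isinstance(output, dict) else None
    if choice == "generate":
        print("Step: Routing Query to Generation")
        return "generate"
    # Default to web search, including for unexpected router output, so this never
    # returns None (which would not match any entry in the path map below).
    if choice != "web_search":
        print(f"Step: Unexpected router choice {choice!r}; defaulting to Web Search")
    print("Step: Routing Query to Web Search")
    return "websearch"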

In [27]:
# Build the nodes
workflow = StateGraph(GraphState)
workflow.add_node("websearch", web_search)
workflow.add_node("transform_query", transform_query)
workflow.add_node("generate", generate)

# Build the edges
workflow.set_conditional_entry_point(
    route_question,
    {
        "websearch": "transform_query",
        "generate": "generate",
    },
)
workflow.add_edge("transform_query", "websearch")
workflow.add_edge("websearch", "generate")
workflow.add_edge("generate", END)

# Compile the workflow
local_agent = workflow.compile()
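
To see how the pieces fit together without calling Llama 3 or DuckDuckGo, the sketch below walks the same edges using stub nodes. It is a simplified model, not how LangGraph executes graphs internally: each node returns a partial update that overwrites matching keys in the state, which mirrors LangGraph's default behavior for `TypedDict` keys without reducers. All `fake_*` functions and `run_simplified_graph` are hypothetical.

```python
from typing import Callable, Dict

State = Dict[str, str]


# Hypothetical stubs standing in for the LLM- and tool-backed functions above.
def fake_route(state: State) -> str:
    return "websearch" if "earnings" in state["question"].lower() else "generate"


def fake_transform_query(state: State) -> State:
    return {"search_query": state["question"] + " report"}


def fake_web_search(state: State) -> State:
    return {"context": f"results for {state['search_query']}"}


def fake_generate(state: State) -> State:
    # context is absent on the direct-generation path, so read it with a default
    return {"generation": f"answer using: {state.get('context', '')}"}


NODES: Dict[str, Callable[[State], State]] = {
    "transform_query": fake_transform_query,
    "websearch": fake_web_search,
    "generate": fake_generate,
}
# Same wiring as the cell above: conditional entry, then fixed edges to END.
ENTRY_MAP = {"websearch": "transform_query", "generate": "generate"}
EDGES = {"transform_query": "websearch", "websearch": "generate", "generate": "END"}


def run_simplified_graph(question: str) -> State:
    state: State = {"question": question}
    current = ENTRY_MAP[fake_route(state)]
    while current != "END":
        update = NODES[current](state)
        state = {**state, **update}  # last write wins for each key
        current = EDGES[current]
    return state


print(run_simplified_graph("Apple Q3 earnings"))
print(run_simplified_graph("How are you?"))
```

On the direct `generate` path, `context` is never written, which is why the `generate` node should read it with a default rather than indexing it directly.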

In [28]:
from langsmith import traceable

@traceable 
def run_agent(query):
    output = local_agent.invoke({"question": query})
    print("=======")
    display(Markdown(output["generation"]))

In [29]:
# Test it out!
run_agent("What are Apple's Q3 earnings?")

Step: Routing Query
Step: Routing Query to Web Search
Step: Optimizing Query for Web Search
Step: Searching the Web for: "Apple Q3 earnings report"
Step: Generating Final Response


Based on the provided web search context, Apple's Q3 earnings are as follows:

* Quarterly revenue: $81.8 billion (down 1% year over year)
* Quarterly earnings per diluted share: $1.26 (up 5% year over year)

Note that these figures were announced by Apple on August 3, 2023, and the company's CEO Tim Cook and CFO Luca Maestri shared additional details during their Q3 2023 financial results call.

---
# Attaching Evals to Existing Runs

What if you have an existing application that's being traced, and you want to insert evaluations at specific parts of the operation?

### Creating a quick QA dataset to test against

In [30]:
from langsmith import Client

client = Client()

examples = [
    ("What are Apple's Q3 earnings?", "Apple today announced financial results for its fiscal 2023 third quarter ended July 1, 2023. The Company posted quarterly revenue of $81.8 billion, down 1 percent year over year, and quarterly earnings per diluted share of $1.26, up 5 percent year over year."),
    ("What are new apple products?", "Apple is refreshing both iPad Pro models with OLED screens, bringing a major update in display quality. There will be two models with screen sizes around 11 and 13 inches, and we are expecting design updates. With the switch to OLED, Apple is cutting down on thickness, and the new iPad Pro models will be much thinner. We're also expecting them to adopt the M3 chip for faster performance, and Apple is planning to debut a new Magic Keyboard that gives the iPad Pro a more Mac-like feel and a new Apple Pencil.  With the 2024 iPad Air refresh, we're getting two models for the first time. The smaller iPad Air will have a 10.9-inch display like the current iPad Air, while the larger version will have a 12.9-inch display like the current 12.9-inch iPad Pro. The iPad Air models will be more affordable than the iPad Pro models, and won't have \"Pro\" features like ProMotion refresh rates and OLED displays. Rumors are mixed on whether the iPad Air will get the M2 or the M3 chip, but either option will be an improvement over the M1 in the current model."),
]

dataset_name = "Apple - L3 Agent Testing"
if not client.has_dataset(dataset_name=dataset_name):
    dataset = client.create_dataset(dataset_name=dataset_name)
    inputs, outputs = zip(
        *[({"input": question}, {"expected": answer}) for question, answer in examples]
    )
    client.create_examples(inputs=inputs, outputs=outputs, dataset_id=dataset.id)
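
The `zip(*...)` line above is a compact way to split a list of pairs into two parallel tuples, which `create_examples` then receives as `inputs` and `outputs`. The sketch below reproduces just that transformation with placeholder questions and answers (named `sample_examples` so it does not overwrite the real `examples`). Note that an empty list would make the unpacking fail with a `ValueError`, since `zip()` with no arguments yields nothing to unpack.

```python
# Placeholder (question, reference answer) pairs for illustration only.
sample_examples = [
    ("Question A?", "Reference answer A."),
    ("Question B?", "Reference answer B."),
]

# Build one (inputs, outputs) pair per example; zip(*...) then transposes the pairs
# into a tuple of all input dicts and a tuple of all output dicts.
inputs, outputs = zip(
    *[({"input": q}, {"expected": a}) for q, a in sample_examples]
)

print(inputs)
print(outputs)
```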

### Defining Some Custom Evaluators

A few notes here: these evaluators use OpenAI models with structured output (function calling) to build quick LLM-as-judge evaluators, so an `OPENAI_API_KEY` must be set.

Also, make sure the code that walks `root_run.child_runs` matches your actual trace structure. In LangSmith, expand all runs in a trace to see exactly how the runs are named and nested.
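
Chaining `next(...)` calls, as the evaluators below do, raises `StopIteration` if any expected run is missing, for example when the router sends a question straight to `generate` and no `websearch` run exists. A recursive search that returns `None` makes that case easier to handle. The sketch below uses a hypothetical `FakeRun` dataclass with only `name` and `child_runs` so it runs without LangSmith; the evaluators below rely on real `Run` objects exposing the same two attributes, and the `or []` guards against `child_runs` being `None`.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class FakeRun:
    """Hypothetical stand-in for a LangSmith Run: only `name` and `child_runs`."""
    name: str
    child_runs: Optional[List["FakeRun"]] = field(default_factory=list)


def find_run(run: FakeRun, name: str) -> Optional[FakeRun]:
    """Pre-order depth-first search for the first descendant run with this name."""
    for child in run.child_runs or []:
        if child.name == name:
            return child
        found = find_run(child, name)
        if found is not None:
            return found
    return None


# A trace shaped like the one the evaluators below walk.
trace = FakeRun("<lambda>", [
    FakeRun("run_agent", [
        FakeRun("LangGraph", [FakeRun("transform_query"), FakeRun("websearch"), FakeRun("generate")]),
    ]),
])
search_run = find_run(trace, "websearch")
print(search_run.name if search_run else "no websearch run")
```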

In [31]:
from langsmith.evaluation import LangChainStringEvaluator, evaluate
from langsmith.schemas import Example, Run
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.pydantic_v1 import BaseModel, Field

# Search Tool Test
def search_retrieval(root_run: Run, example: Example) -> dict:
    """
    A simple evaluator that checks if the retrieved web search contains answer for the question
    """
    # Get documents and answer
    agent_run = next(run for run in root_run.child_runs if run.name == "run_agent")
    LangGraph = next(run for run in agent_run.child_runs if run.name == "LangGraph")
    search_run = next(run for run in LangGraph.child_runs if run.name == "websearch")
    context = search_run.outputs["context"]
    question = agent_run.inputs["query"]

    # Data model
    class GradeWebsearch(BaseModel):
        """Binary score for whether websearch contains question context."""

        binary_score: int = Field(description="Context contains answer to question, 1 or 0")

    # LLM with function call
    llm = ChatOpenAI(model="gpt-4o", temperature=0)
    structured_websearch_grader = llm.with_structured_output(GradeWebsearch)

    # Prompt
    system = """You are a grader assessing whether a web search contains the context needed to answer a user query. \n
        Give a binary score 1 or 0, where 1 means that the answer is in the web search results."""
    websearch_prompt = ChatPromptTemplate.from_messages(
        [
            ("system", system),
            ("human", "Web search: \n\n {context} \n\n Question: {question}"),
        ]
    )

    websearch_grader = websearch_prompt | structured_websearch_grader
    score = websearch_grader.invoke({"context": context, "question": question})
    return {"key": "websearch_verification", "score": int(score.binary_score)}

# Hallucination Test
def hallucination(root_run: Run, example: Example) -> dict:
    """
    A simple evaluator that checks to see the answer is grounded in the context
    """
    # Get documents and answer
    agent_run = next(run for run in root_run.child_runs if run.name == "run_agent")
    LangGraph = next(run for run in agent_run.child_runs if run.name == "LangGraph")
    search_run = next(run for run in LangGraph.child_runs if run.name == "websearch")
    context = search_run.outputs["context"]
    generation = LangGraph.outputs["generation"]

    # Data model
    class GradeHallucinations(BaseModel):
        """Binary score for hallucination present in generation answer."""

        binary_score: int = Field(description="Answer is grounded in the facts, 1 or 0")

    # LLM with function call
    llm = ChatOpenAI(model="gpt-4o", temperature=0)
    structured_llm_grader = llm.with_structured_output(GradeHallucinations)

    # Prompt
    system = """You are a grader assessing whether an LLM generation is grounded in / supported by a set of retrieved facts. \n
        Give a binary score 1 or 0, where 1 means that the answer is grounded in / supported by the set of facts."""
    hallucination_prompt = ChatPromptTemplate.from_messages(
        [
            ("system", system),
            ("human", "Set of facts: \n\n {context} \n\n LLM generation: {generation}"),
        ]
    )

    hallucination_grader = hallucination_prompt | structured_llm_grader
    score = hallucination_grader.invoke({"context": context, "generation": generation})
    return {"key": "answer_hallucination", "score": int(score.binary_score)}

### Running the Evaluation!

In [32]:
experiment_results = evaluate(
    lambda inputs: run_agent(inputs["input"]),
    data=dataset_name,
    evaluators=[search_retrieval, hallucination],
    experiment_prefix="websearch-test-1"
)

View the evaluation results for experiment: 'websearch-test-1-4ebcc119' at:
https://smith.langchain.com/o/ef6f5694-a2fa-5316-9158-12297cd17350/datasets/e301d2c7-3cfd-4a70-8ecf-2ea308bf9ad4/compare?selectedSessions=86b68fc3-c086-4a77-a730-7bb46c77028f





Step: Routing Query
Step: Routing Query
Step: Routing Query to Web Search
Step: Optimizing Query for Web Search
Step: Routing Query to Web Search
Step: Optimizing Query for Web Search
Step: Searching the Web for: "new Apple products"
Step: Searching the Web for: "Apple Q3 earnings report"
Step: Generating Final Response
Step: Generating Final Response


Based on the provided web search context, new Apple products include:

* iPad Air: Available in new blue and purple finishes, along with starlight and space gray, starting at $599 for the 11-inch model and $799 for the 13-inch model.
* iPhone 15 and iPhone 15 Plus: Feature a gorgeous new design, Dynamic Island, 48MP Main camera, and A16 Bionic chip. They will be available in five colors and have a USB-C connector, contoured edge, and durable color-infused back glass. Pre-orders begin on September 15, with availability starting on September 22.
* iPhone 15 Pro and iPhone 15 Pro Max: Available in four stunning new finishes, including black titanium, white titanium, blue titanium, and natural titanium. Pre-orders begin on September 15, with availability starting on September 22.
* Apple Watch Series 9: Available in 41mm and 45mm sizes in starlight, midnight, silver, (PRODUCT)RED, and a new pink aluminum case, as well as stainless steel in gold, silver, and graphite cases.
* Apple Pencil: A new, more affordable option with pixel-perfect accuracy, low latency, and tilt sensitivity for note taking, sketching, and more. It works with all iPad models that have a USB-C port, including iPad Pro, iPad Air, and iPad mini, and is available for purchase beginning in early November.
* Mac Studio: Receiving an update, including the silicon, replacing the M1 Max and M1 Ultra with the M2 Max and M2 Ultra.

Note: The article also mentions Apple's upcoming mixed reality headset, which can play back stereoscopic 3D video shot on iPhone 15 Pro.



Based on the provided web search context, Apple's Q3 earnings are as follows:

* Quarterly revenue: $81.8 billion, down 1% year over year
* Quarterly earnings per diluted share: $1.26, up 5% year over year

These figures were announced by Apple in its fiscal 2023 third-quarter earnings report, which was released on August 3, 2023.
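
Each evaluator above returns a dict with a `key` and a binary `score`, so averaging the scores per key gives a pass rate for each check. The sketch below does that aggregation in plain Python over a list of such dicts; the sample scores are made up for illustration, and `mean_scores` is a hypothetical helper rather than part of the LangSmith SDK.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Mapping


def mean_scores(results: Iterable[Mapping]) -> Dict[str, float]:
    """Average the 'score' values of evaluator results, grouped by their 'key'."""
    grouped: Dict[str, List[float]] = defaultdict(list)
    for result in results:
        grouped[result["key"]].append(result["score"])
    return {key: sum(scores) / len(scores) for key, scores in grouped.items()}


# Made-up results in the shape returned by search_retrieval and hallucination.
sample_results = [
    {"key": "websearch_verification", "score": 1},
    {"key": "websearch_verification", "score": 0},
    {"key": "answer_hallucination", "score": 1},
    {"key": "answer_hallucination", "score": 1},
]
print(mean_scores(sample_results))
```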