# L5: Long-context understanding


In [1]:
import warnings
warnings.filterwarnings('ignore')

## Load API keys and libraries

In [2]:
import os
from utils import get_llama_api_key, get_llama_base_url, get_together_api_key
from utils import get_hf_access_token, get_github_access_token

llama_api_key = get_llama_api_key()
llama_base_url = get_llama_base_url()
together_api_key = get_together_api_key()
hf_access_token = get_hf_access_token()
github_access_token = get_github_access_token()

from utils import llama4, llama4_together
from transformers import AutoTokenizer


In [3]:
model_id = "meta-llama/Llama-4-Scout-17B-16E-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)

In [4]:
prompt = "This is a test prompt."
num_tokens = len(tokenizer.encode(prompt))
print(f"Number of tokens: {num_tokens}")

Number of tokens: 7


## Large book

In [5]:
with open("files/war-and-peace.txt", "r", encoding='utf-8') as file:
    wp = file.read()
len(wp), len(tokenizer.encode(wp))

(3227578, 778678)
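
The two numbers above are the character count and the token count of the book. As a rough sketch of how they might be used, the snippet below computes the average characters per token and checks a token count against a context budget. The helper names `chars_per_token` and `fits_in_context` are hypothetical, and the 1,000,000-token window is only a placeholder: the real limit depends on the model and on the provider serving it.

```python
def chars_per_token(num_chars: int, num_tokens: int) -> float:
    """Average number of characters represented by one token."""
    if num_tokens <= 0:
        raise ValueError("num_tokens must be positive")
    return num_chars / num_tokens


def fits_in_context(num_tokens: int, context_window: int,
                    reserve_for_output: int = 0) -> bool:
    """True if the prompt plus the reserved output budget fits in the window."""
    return num_tokens + reserve_for_output <= context_window


# Counts reported by the cell above for War and Peace.
wp_chars, wp_tokens = 3_227_578, 778_678
ratio = chars_per_token(wp_chars, wp_tokens)
print(f"{ratio:.2f} characters per token")

# ASSUMPTION: 1,000,000 tokens is a placeholder limit, not a documented value.
print(fits_in_context(wp_tokens, context_window=1_000_000,
                      reserve_for_output=4_096))
```

In the notebook, `len(tokenizer.encode(text))` would supply `num_tokens` for any other document.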

<p style="background-color:#f7fff8; padding:15px; border-width:3px; border-color:#e0f0e0; border-style:solid; border-radius:6px"> 🚨
&nbsp; <b>Different Run Results:</b> The output generated by AI chat models can vary with each execution due to their dynamic, probabilistic nature. Don't be surprised if your results differ from those shown in the video.</p>

In [6]:
print(llama4_together(f"give me a summary of the book below: {wp}"))

The Project Gutenberg eBook of War and Peace by Leo Tolstoy is a classic novel that follows the lives of several aristocratic Russian families during the Napoleonic Wars. The story begins in July 1805 at a reception hosted by Anna Pavlovna Scherer, where the guests discuss the impending war with Napoleon. Prince Vasili Kuragin, a high-ranking and influential man, is the first to arrive at the reception. He is greeted by Anna Pavlovna, who is suffering from la grippe. The conversation turns to politics, and Anna Pavlovna expresses her concerns about Napoleon's intentions.

Meanwhile, Pierre Bezukhov, the illegitimate son of a wealthy count, arrives at the reception. He is a socially awkward but intelligent and observant young man. Anna Pavlovna is anxious about Pierre's presence and tries to keep him under observation. The conversation at the reception is lively, with the guests discussing politics, society, and the arts.

The novel then follows the lives of several characters, includin

## Multiple document summarization

In [7]:
from utils import pdf2text

papers = ["2407.16833.pdf", "2409.01666.pdf", "2411.03538.pdf", "2503.17407.pdf", "2404.06654.pdf"]
paper_texts = []
for n, paper in enumerate(papers):
  text = pdf2text(f"files/{paper}")
  paper_texts.append(f"Paper {n+1}:\n{text}")
  print(paper, len(text), len(tokenizer.encode(text)))

total_text = "\n\n".join(paper_texts)
print(f"""Total papers: {len(total_text)},
{len(tokenizer.encode(total_text))}""")

2407.16833.pdf 48830 13393
2409.01666.pdf 17822 4886
2411.03538.pdf 49248 13122
2503.17407.pdf 426588 108877
2404.06654.pdf 84506 28690
Total papers: 627047,
168986
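
The per-paper counts and the totals printed above can be cross-checked with a little arithmetic. The sketch below assumes only the numbers shown in the output and the `Paper N:\n` headers and `\n\n` separators used when building `total_text`. The character overhead should match those additions exactly. The token overhead is printed for reference only, since tokenizing the joined text is not guaranteed to equal the sum of tokenizing each paper separately.

```python
# Per-paper (characters, tokens) as printed above.
paper_counts = {
    "2407.16833.pdf": (48830, 13393),
    "2409.01666.pdf": (17822, 4886),
    "2411.03538.pdf": (49248, 13122),
    "2503.17407.pdf": (426588, 108877),
    "2404.06654.pdf": (84506, 28690),
}
reported_total_chars, reported_total_tokens = 627047, 168986

sum_chars = sum(chars for chars, _ in paper_counts.values())
sum_tokens = sum(tokens for _, tokens in paper_counts.values())

# Characters added by the "Paper N:\n" headers and the "\n\n" separators.
header_chars = sum(len(f"Paper {n + 1}:\n") for n in range(len(paper_counts)))
separator_chars = len("\n\n") * (len(paper_counts) - 1)

print("character overhead:", reported_total_chars - sum_chars,
      "expected:", header_chars + separator_chars)
print("token overhead:", reported_total_tokens - sum_tokens)
```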


In [8]:
print(llama4_together(f"""give me a summary of the five papers
below: {total_text}"""))

## Paper 1: Retrieval Augmented Generation or Long-Context LLMs? A Comprehensive Study and Hybrid Approach

The paper presents a comprehensive study comparing Retrieval Augmented Generation (RAG) and long-context Large Language Models (LLMs). The study aims to leverage the strengths of both approaches and proposes a hybrid method called SELF-ROUTE. SELF-ROUTE routes queries to RAG or long-context LLMs based on model self-reflection, significantly reducing computational cost while maintaining comparable performance to long-context LLMs.

## Paper 2: In Defense of RAG in the Era of Long-Context Language Models

This paper argues that Retrieval Augmented Generation (RAG) is still relevant in the era of long-context language models. The authors propose an order-preserving retrieval-augmented generation (OP-RAG) mechanism, which improves the performance of RAG for long-context question-answer applications. OP-RAG preserves the order of retrieved chunks in the original text, and the authors 

## Single-hop codebase question answering

In [9]:
from utils import download_and_extract_repo, get_py_files, write_files_to_text

repo_url = "https://github.com/meta-llama/llama-models"
repo_name = repo_url.rstrip('/').split('/')[-1]
extract_dir = f"./{repo_name}"

# The repo is already downloaded and extracted on the platform.
# To try this with another repo, set download_repo to True.
download_repo = False
if download_repo:
    download_and_extract_repo(repo_url, extract_dir)

py_files = get_py_files(extract_dir)
output_file = f"files/{repo_name}_files.txt"
write_files_to_text(py_files, output_file)

print(f"Output written to {output_file}")

Writing ./llama-models/llama-models-main/models/__init__.py
Writing ./llama-models/llama-models-main/models/checkpoint.py
Writing ./llama-models/llama-models-main/models/datatypes.py
Writing ./llama-models/llama-models-main/models/quantize_impls.py
Writing ./llama-models/llama-models-main/models/tokenizer_utils.py
Writing ./llama-models/llama-models-main/models/llama3/__init__.py
Writing ./llama-models/llama-models-main/models/llama3/args.py
Writing ./llama-models/llama-models-main/models/llama3/chat_format.py
Writing ./llama-models/llama-models-main/models/llama3/generation.py
Writing ./llama-models/llama-models-main/models/llama3/model.py
Writing ./llama-models/llama-models-main/models/llama3/tokenizer.py
Writing ./llama-models/llama-models-main/models/llama3/tool_utils.py
Writing ./llama-models/llama-models-main/models/llama3/multimodal/__init__.py
Writing ./llama-models/llama-models-main/models/llama3/multimodal/encoder_utils.py
Writing ./llama-models/llama-models-main/models/llama

In [10]:
!head -100 files/llama-models_files.txt

Path: ./llama-models/llama-models-main/models/__init__.py
Content:
# Copyright (c) Meta Platforms, Inc. and affiliates.
# All rights reserved.
#
# This source code is licensed under the terms described in the LICENSE file in
# top-level folder for each specific model found within the models/ directory at
# the top-level of this source tree.


Path: ./llama-models/llama-models-main/models/checkpoint.py
Content:
# Copyright (c) Meta Platforms, Inc. and affiliates.
# All rights reserved.
#
# This source code is licensed under the terms described in the LICENSE file in
# top-level folder for each specific model found within the models/ directory at
# the top-level of this source tree.

import concurrent.futures
import re
from pathlib import Path
from typing import Any, Dict, List, Optional, Union

import numpy as np
import torch
from fairscale.nn.model_parallel.initialize import get_model_parallel_rank, get_model_parallel_world_size


def map_mp_rank(old_mp_siz

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)


In [11]:
with open("files/llama-models_files.txt", "r", encoding='utf-8') as file:
    repo = file.read()

In [12]:
prompt = f"""Which file in the content below, which consists of a list
of source file path and its content, has the function _encode_image
defined? content:
{repo}"""
print(llama4_together(prompt))

To find which file has the function `_encode_image` defined, we need to search through the provided content.


## Step 1: Review the content
The content provided consists of a list of source file paths and their corresponding contents.


## Step 2: Identify the function definition
We are looking for the function `_encode_image`. This function seems to be related to encoding images.


## Step 3: Search for the function
Upon reviewing the content, we find that the function `_encode_image` is defined in the file `./llama-models/llama-models-main/models/llama4/chat_format.py`.


The final answer is: 
./llama-models/llama-models-main/models/llama4/chat_format.py
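
Because the concatenated file uses a simple `Path:` / `Content:` layout, an answer like the one above can be cross-checked with a plain text search. The following is a minimal sketch, not part of the lesson's `utils` module: `find_function_definitions` is a hypothetical helper, and the sample paths are made up for illustration. It would misfire if a source file itself contained a line starting with `Path: `.

```python
import re
from typing import List


def find_function_definitions(dump: str, function_name: str) -> List[str]:
    """Return paths of files in `dump` that contain `def <function_name>(`."""
    definition = re.compile(
        rf"^\s*(?:async\s+)?def\s+{re.escape(function_name)}\s*\(",
        re.MULTILINE,
    )
    # Splitting on the captured path yields [prefix, path1, body1, path2, body2, ...].
    parts = re.split(r"^Path: (.*)$", dump, flags=re.MULTILINE)
    matches = []
    for path, body in zip(parts[1::2], parts[2::2]):
        if definition.search(body):
            matches.append(path.strip())
    return matches


# Made-up sample in the same layout as the concatenated repo file.
sample_dump = """Path: ./demo/a.py
Content:
def helper():
    return 1

Path: ./demo/b.py
Content:
class Formatter:
    def _encode_image(self, image):
        return image
"""
print(find_function_definitions(sample_dump, "_encode_image"))
```

In the notebook, calling `find_function_definitions(repo, "_encode_image")` would list every file that defines the function, which can then be compared with the model's answer.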


## Analyze extensive user activity

In [13]:
from utils import get_pull_requests
import pickle
repo_owner = "meta-llama"
repo_name = "llama-cookbook"

# The pull requests are cached in a pickle file so this cell can load them locally instead of calling the GitHub API.
cache_file = "meta-llama_llama-cookbook_pull_requests.pkl"
load_cache_file = True

if load_cache_file:
    with open(cache_file, "rb") as f:
        pull_requests = pickle.load(f)
else:
    pull_requests = get_pull_requests(repo_owner, repo_name)
    with open(cache_file, "wb") as f:
        pickle.dump(pull_requests, f)

for pr in pull_requests:
    print(pr["number"], pr["title"])

962 Fix README links for use cases
959 Created an api inference script with its supporting documentation
958 add vertex notebooks for gcp
957 Updated notebook to use Llama API
956 Bootcamp week2 task
954 Adding Databricks RAG example recipe
951 Fix model string name for Llama-Prompt-Guard-2-86M in inference.py
950 Fix/update samsum
944 [Llama Tools] Getting started fixes for llama-prompt-ops
943 fix failing github actions
942 Llama 4 Fine tuning: updated docs to streamline environment setup and clarify command line arguments
941 [Llama Tools] Getting started guide for llama-prompt-ops 
940 Fix/GitHub action runner
939 fix links and references
937 Notebook for generating evals using sythetic data
935 [Hotfix] Pytest Workflow AndroidManifest.xml issue
932 Llama4 Fine-tuning using torchtune
931 [WIP] image grounding example
930 fixing readme links
929 Update README.md
928 Llama 4 api recipes
927 Llama 4 api release
926 Update android app
925 changed planning prompt, execution prompt, few 

In [14]:
import pickle

# Save the pull requests so later runs can load them from the local cache.
with open(f"{repo_owner}_{repo_name}_pull_requests.pkl", "wb") as f:
    pickle.dump(pull_requests, f)

In [15]:
len(pull_requests)

512

In [16]:
pull_requests[0]

{'url': 'https://api.github.com/repos/meta-llama/llama-cookbook/pulls/962',
 'id': 2584943236,
 'node_id': 'PR_kwDOJ8YnuM6aExqE',
 'html_url': 'https://github.com/meta-llama/llama-cookbook/pull/962',
 'diff_url': 'https://github.com/meta-llama/llama-cookbook/pull/962.diff',
 'patch_url': 'https://github.com/meta-llama/llama-cookbook/pull/962.patch',
 'issue_url': 'https://api.github.com/repos/meta-llama/llama-cookbook/issues/962',
 'number': 962,
 'state': 'closed',
 'locked': False,
 'title': 'Fix README links for use cases',
 'user': {'login': 'fbnav',
  'id': 82674371,
  'node_id': 'MDQ6VXNlcjgyNjc0Mzcx',
  'avatar_url': 'https://avatars.githubusercontent.com/u/82674371?v=4',
  'gravatar_id': '',
  'url': 'https://api.github.com/users/fbnav',
  'html_url': 'https://github.com/fbnav',
  'followers_url': 'https://api.github.com/users/fbnav/followers',
  'following_url': 'https://api.github.com/users/fbnav/following{/other_user}',
  'gists_url': 'https://api.github.com/users/fbnav/gist

In [17]:
from utils import get_pr_content

all_content = []
for pr in pull_requests:
    pr_number = pr['number']
    # A GitHub access token is needed to avoid the lower rate limit on unauthenticated requests.
    content = get_pr_content(repo_owner, repo_name,
                             pr_number, github_access_token)
    if content:
        all_content.append(f"PR #{pr_number}: {content}")

len(all_content)

512

In [18]:
all_pr_content = "\n\n".join(all_content)
len(all_pr_content), len(tokenizer.encode(all_pr_content))

(1051214, 260468)

In [19]:
query = "how many PRs are about android or iOS?"
print(llama4_together(f"""{query}
Below is all the PR info:
{all_pr_content}
"""))

## Step 1: Analyze the given PR information
The given information includes a list of pull requests (PRs) with their descriptions, which detail various changes, fixes, and additions to a repository. The task is to determine how many of these PRs are related to Android or iOS.

## Step 2: Identify relevant PRs
To solve this task, we need to go through each PR and identify if it mentions Android or iOS.

## Step 3: Count Android and iOS related PRs
After analyzing each PR, we find the following relevant PRs:
- PR #962: No mention of Android or iOS.
- PR #959: No mention of Android or iOS.
- PR #958: No mention of Android or iOS.
- PR #957: No mention of Android or iOS.
- PR #956: No mention of Android or iOS.
- PR #954: No mention of Android or iOS.
- PR #953: No mention of Android or iOS.
- PR #951: No mention of Android or iOS.
- PR #950: No mention of Android or iOS.
- PR #944: No mention of Android or iOS.
- PR #943: No mention of Android or iOS.
- PR #942: No mention of Android or iO
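
For a counting question like this one, a deterministic keyword pass can serve as a baseline to compare against the model's answer. The sketch below assumes each entry starts with `PR #<number>:`, as built in the loop that fills `all_content`. The helper `prs_mentioning_mobile` and the sample entries are hypothetical. A keyword match only shows that a PR mentions Android or iOS, not that it is mainly about them.

```python
import re
from typing import Iterable, List

# Whole-word match so that words such as "audios" are not counted as "ios".
MOBILE_PATTERN = re.compile(r"\b(android|ios)\b", re.IGNORECASE)
PR_NUMBER_PATTERN = re.compile(r"^PR #(\d+):")


def prs_mentioning_mobile(entries: Iterable[str]) -> List[int]:
    """Return the numbers of PR entries that mention Android or iOS."""
    numbers = []
    for entry in entries:
        header = PR_NUMBER_PATTERN.match(entry)
        if header and MOBILE_PATTERN.search(entry):
            numbers.append(int(header.group(1)))
    return numbers


# Made-up entries in the same "PR #<number>: <content>" shape.
sample_entries = [
    "PR #1: Update android app to the latest SDK",
    "PR #2: Fix README links",
    "PR #3: Add iOS demo and screenshots",
    "PR #4: Improve audios handling",
]
print(prs_mentioning_mobile(sample_entries))
```

In the notebook, `prs_mentioning_mobile(all_content)` would give a list whose length can be compared with the count the model reports.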

In [20]:
query = """how many PRs are about agents? give a one-sentence
summary of each, then a summary of all those agent PRs."""

print(llama4_together(f"""{query}
Below is all the PR info:
{all_pr_content}
"""))

There are 6 PRs about agents:

1. PR #825: Llama code/code review agents - This PR showcases how to use Llama for code review.
2. PR #706: Adds Llama 3.2 example on Modal with a fun experiment - This PR provides an example of using Llama 3.2 on Modal.
3. PR #618: [Recipe] Example featuring built-in tool calling capabilities - Wolfram Alpha, Interpreter, Brave Search - This PR demonstrates Llama 3.1's tool calling capabilities.
4. PR #597: Fix broken image link - This PR fixes a broken image link.
5. PR #570: bug fix - This PR fixes a bug in the Jupyter notebook langgraph-rag-agent.ipynb.
6. PR #825: Llama code/code review agents - This PR provides an example of using Llama for code review.

Here is a summary of all the agent PRs: These PRs provide examples and bug fixes for using Llama with various agents and tools, including code review, Modal, Wolfram Alpha, Interpreter, and Brave Search.
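
The response above claims 6 PRs but lists PR #825 twice, so it is worth checking a generated list before relying on its count. A minimal sketch of such a check follows. The function name `pr_numbers_in_list` is hypothetical, and the `response` string is an abbreviated copy of the numbered list from the output above.

```python
import re
from collections import Counter
from typing import List


def pr_numbers_in_list(response: str) -> List[int]:
    """Extract PR numbers from numbered list items such as '1. PR #825: ...'."""
    found = re.findall(r"^\s*\d+\.\s+PR #(\d+)", response, flags=re.MULTILINE)
    return [int(number) for number in found]


# Abbreviated copy of the model response shown above.
response = """There are 6 PRs about agents:

1. PR #825: Llama code/code review agents
2. PR #706: Adds Llama 3.2 example on Modal with a fun experiment
3. PR #618: [Recipe] Example featuring built-in tool calling capabilities
4. PR #597: Fix broken image link
5. PR #570: bug fix
6. PR #825: Llama code/code review agents
"""

numbers = pr_numbers_in_list(response)
counts = Counter(numbers)
duplicates = sorted(n for n, c in counts.items() if c > 1)
claimed = int(re.search(r"There are (\d+) PRs", response).group(1))

print(f"claimed={claimed}, listed={len(numbers)}, "
      f"unique={len(counts)}, duplicates={duplicates}")
```

A mismatch between the claimed and unique counts signals that the answer should be verified against the PR data itself.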
