# RAG Pipeline with BGE and Llama 3
---

In this notebook, we build a Retrieval-Augmented Generation (RAG) pipeline that enhances a language model with up-to-date news information. We use **BAAI/bge-base-en-v1.5** to generate embeddings and retrieve the most relevant BBC news articles based on a user query. The retrieved documents are then injected into a prompt sent to **meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo**, a model trained on data up to December 2023.

By combining semantic retrieval with generation, the system enables the LLM to produce responses grounded in recent events (including 2024), improving factual accuracy and contextual relevance.

# Table of Contents
- [ 1 - Introduction](#1)
  - [ 1.1 RAG architecture overview](#1-1)
  - [ 1.2 Importing the necessary libraries](#1-2)
- [ 2 - Loading the dataset](#2)
- [ 3 - Embedding generation](#3)
  - [ 3.1 Loading the Embedding Model](#3-1)
  - [ 3.2 Generating Document Embeddings](#3-2)
  - [ 3.3 Generating the final Prompt](#3-3)
- [ 4 - LLM calls](#4)

<a id='1'></a>
## 1 - Introduction

---

<a id='1-1'></a>
### 1.1 RAG Architecture Overview

Below is a simplified representation of a Retrieval-Augmented Generation (RAG) pipeline:

<div align="center">
  <img src="../src/assets/rag_overview.png" alt="RAG Overview" width="60%">
</div>

The system follows a structured workflow. A retriever first identifies the most relevant documents from the dataset based on a user query. The retrieved content is then formatted and injected into an augmented prompt. This enriched prompt is finally passed to the language model to generate a grounded response.

To evaluate the impact of retrieval, responses generated with the RAG pipeline are compared against responses produced without additional retrieved context. This comparison highlights how external knowledge influences factual accuracy and relevance in the model’s output.

<a id='1-2'></a>
### 1.2 Importing the necessary libraries

In [2]:
import os
import sys
from pathlib import Path
sys.path.extend([
    str(Path.cwd().parent),
    str(Path.cwd().parent / "src"),
])
import data
from sentence_transformers import SentenceTransformer
import bm25s
from utils.formatting import (
    pprint_json,
    read_dataframe,
    format_relevant_data,
)
from utils.rag_core import (
    query_news,
    build_embeddings_joblib,
    retrieve,
    get_relevant_data,
    generate_final_prompt,
    generate_with_single_input,
    generate_with_multiple_input,
    llm_call,
    display_widget,
)

<a id='2'></a>
## 2 - Loading the dataset

In [3]:
NEWS_DATA = read_dataframe(path=os.path.join(os.path.dirname(data.__file__), "news_data_dedup.csv"))

In [4]:
pprint_json(NEWS_DATA[9:11])

[
  {
    "guid": "5dae28f191cfd1047f67c409e616fc3f",
    "title": "Paris's Moulin Rouge loses windmill sails overnight",
    "description": "The cause of the sails' collapse from the roof of the world famous cabaret club is not yet clear.",
    "venue": "BBC",
    "url": "https://www.bbc.co.uk/news/world-europe-68895836",
    "published_at": "2024-04-25",
    "updated_at": "2024-04-26"
  },
  {
    "guid": "d2c3ff79d4e068911d05416ca061cd51",
    "title": "Ukraine uses longer-range US missiles for first time",
    "description": "Missiles secretly delivered this month have been used to strike Russian targets in Crimea, US media say.",
    "venue": "BBC",
    "url": "https://www.bbc.co.uk/news/world-europe-68893196",
    "published_at": "2024-04-25",
    "updated_at": "2024-04-26"
  }
]


Important fields are `title`, `description`, `url` and `published_at`. These fields will give good information to the LLM to answer the majority of questions with good enough data.

In [5]:
indices = [3, 6, 9]
pprint_json(query_news(indices=indices, dataset=NEWS_DATA))

[
  {
    "guid": "e696224ac208878a5cec8bdc9f97c632",
    "title": "Europe risks dying and faces big decisions - Macron",
    "venue": "BBC",
    "url": "https://www.bbc.co.uk/news/world-europe-68898887",
    "published_at": "2024-04-25",
    "updated_at": "2024-04-26"
  },
  {
    "guid": "4f585bad8f61b715fbafe2f022ab0ae8",
    "title": "Supreme Court divided on whether Trump has immunity",
    "description": "The justices discussed immunity, coups, pardons, Operation Mongoose - and the future of democracy.",
    "venue": "BBC",
    "url": "https://www.bbc.co.uk/news/world-us-canada-68901817",
    "published_at": "2024-04-25",
    "updated_at": "2024-04-26"
  },
  {
    "guid": "5dae28f191cfd1047f67c409e616fc3f",
    "title": "Paris's Moulin Rouge loses windmill sails overnight",
    "description": "The cause of the sails' collapse from the roof of the world famous cabaret club is not yet clear.",
    "venue": "BBC",
    "url": "https://www.bbc.co.uk/news/world-europe-68895836",
    "

<a id='3'></a>
## 3 - Embedding generation

<a id='3-1'></a> 
### 3.1 Loading the Embedding Model

In [6]:
model = SentenceTransformer("BAAI/bge-base-en-v1.5")

Loads the **[BAAI/bge-base-en-v1.5](https://huggingface.co/BAAI/bge-base-en-v1.5)** embedding model to generate 768-dimensional semantic text embeddings.

<a id='3-2'></a> 
### 3.2 Generating Document Embeddings

In [7]:
EMBEDDINGS = build_embeddings_joblib(
    dataset=NEWS_DATA,
    model=model,
    output_path="embeddings.joblib",   
    fields=["title", "description", "url", "published_at", "updated_at"],   
    batch_size=32,
    normalize_embeddings=True
)

Batches:   0%|          | 0/28 [00:00<?, ?it/s]

<a id='3-3'></a> 
### 3.3 Generating the final Prompt

#### Text Corpus

In [8]:
fields = ["title", "description", "url", "published_at", "updated_at"]
corpus = []
for item in NEWS_DATA:
    text_parts = []
    for field in fields:
        value = item.get(field, "")
        if value:
            text_parts.append(str(value))
    text = " ".join(text_parts).strip()[:493]
    corpus.append(text)

#### BM25 retriever

In [9]:
BM25_RETRIEVER = bm25s.BM25(corpus=corpus)
tokenized_data = bm25s.tokenize(corpus)
BM25_RETRIEVER.index(tokenized_data)

Split strings:   0%|          | 0/870 [00:00<?, ?it/s]

BM25S Count Tokens:   0%|          | 0/870 [00:00<?, ?it/s]

BM25S Compute Scores:   0%|          | 0/870 [00:00<?, ?it/s]

In [10]:
indices_bm25 = retrieve(query="Concerts in North America", model=model, embeddings=EMBEDDINGS, top_k = 5, method="bm25", BM25_RETRIEVER=BM25_RETRIEVER,corpus=corpus)

Split strings:   0%|          | 0/1 [00:00<?, ?it/s]

BM25S Retrieve:   0%|          | 0/1 [00:00<?, ?it/s]

In [11]:
indices_bm25

[715, 84, 656, 383, 628]

#### Embedding based retriever

In [12]:
indices_sem = retrieve(query="Concerts in North America", model=model, embeddings=EMBEDDINGS, top_k = 1)

In [13]:
indices_sem

[350]

#### Extracting relevant data using the semantic approach

In [14]:
query = "Greatest storms in the US"
relevant_data = get_relevant_data(query=query, model=model, embeddings=EMBEDDINGS, dataset=NEWS_DATA, top_k = 1)
pprint_json(relevant_data)

[
  {
    "guid": "3ca548fe82c3fcae2c4c0c635d03eb2e",
    "title": "Large tornado seen touching down in Nebraska",
    "description": "Severe and powerful storms have moved across several US states, leaving many experiencing power shortages.",
    "venue": "BBC",
    "url": "https://www.bbc.co.uk/news/world-us-canada-68860070",
    "published_at": "2024-04-26",
    "updated_at": "2024-04-28"
  }
]


#### Extracting relevant data using BM25 approach

In [15]:
query = "Greatest storms in the US"
relevant_data = get_relevant_data(query=query, model=model, embeddings=EMBEDDINGS, dataset=NEWS_DATA, top_k = 1,method="bm25", BM25_RETRIEVER=BM25_RETRIEVER,corpus=corpus)
pprint_json(relevant_data)

Split strings:   0%|          | 0/1 [00:00<?, ?it/s]

BM25S Retrieve:   0%|          | 0/1 [00:00<?, ?it/s]

[
  {
    "guid": "3ca548fe82c3fcae2c4c0c635d03eb2e",
    "title": "Large tornado seen touching down in Nebraska",
    "description": "Severe and powerful storms have moved across several US states, leaving many experiencing power shortages.",
    "venue": "BBC",
    "url": "https://www.bbc.co.uk/news/world-us-canada-68860070",
    "published_at": "2024-04-26",
    "updated_at": "2024-04-28"
  }
]


#### Extracting relevant data using hybrid search

In [16]:
query = "Greatest storms in the US"
relevant_data = get_relevant_data(query=query, model=model, embeddings=EMBEDDINGS, dataset=NEWS_DATA, top_k = 1,method="hybrid_rrf", BM25_RETRIEVER=BM25_RETRIEVER,corpus=corpus)
pprint_json(relevant_data)

Split strings:   0%|          | 0/1 [00:00<?, ?it/s]

BM25S Retrieve:   0%|          | 0/1 [00:00<?, ?it/s]

{458: 0.03278688524590164}
[
  {
    "guid": "3ca548fe82c3fcae2c4c0c635d03eb2e",
    "title": "Large tornado seen touching down in Nebraska",
    "description": "Severe and powerful storms have moved across several US states, leaving many experiencing power shortages.",
    "venue": "BBC",
    "url": "https://www.bbc.co.uk/news/world-us-canada-68860070",
    "published_at": "2024-04-26",
    "updated_at": "2024-04-28"
  }
]


#### generating the final prompt using semantic search

In [17]:
print(generate_final_prompt(query="Tell me about the US GDP in the past 3 years.", model=model, embeddings=EMBEDDINGS, dataset=NEWS_DATA, top_k=5, use_rag=True, prompt=None))

Answer the user query below. There will be provided additional information for you to compose your answer. The relevant information provided is from 2024 and it should be added as your overall knowledge to answer the query, you should not rely only on this information to answer the query, but add it to your overall knowledge.Query: Tell me about the US GDP in the past 3 years.
2024 News: {
  "guid": "60adcbc18cfa8fee177fbe0f25dd350c",
  "title": "America's Economy Is No. 1. That Means Trouble",
  "description": "If you want a single number to capture America’s economic stature, here it is: This year, the U.S. will account for 26.3% of the global gross domestic product, the highest in almost two decades. That’s based on the latest projections from the International Monetary Fund. According to the IMF, Europe’s share of world GDP has dropped 1.4 percentage points since 2018, and Japan’s by 2.1 points. The U.S. share, by contrast, is up 2.3 points.",
  "venue": "WSJ",
  "url": "https://ww

#### generating the final prompt using BM25 approach

In [18]:
print(generate_final_prompt(
       query="Tell me about the US GDP in the past 3 years.",
       model=model,
       embeddings=EMBEDDINGS,
       dataset=NEWS_DATA,
       top_k=5,
       use_rag=True,
       prompt=None,
       method="bm25s",
       BM25_RETRIEVER=BM25_RETRIEVER,
       corpus=corpus))

Answer the user query below. There will be provided additional information for you to compose your answer. The relevant information provided is from 2024 and it should be added as your overall knowledge to answer the query, you should not rely only on this information to answer the query, but add it to your overall knowledge.Query: Tell me about the US GDP in the past 3 years.
2024 News: {
  "guid": "60adcbc18cfa8fee177fbe0f25dd350c",
  "title": "America's Economy Is No. 1. That Means Trouble",
  "description": "If you want a single number to capture America’s economic stature, here it is: This year, the U.S. will account for 26.3% of the global gross domestic product, the highest in almost two decades. That’s based on the latest projections from the International Monetary Fund. According to the IMF, Europe’s share of world GDP has dropped 1.4 percentage points since 2018, and Japan’s by 2.1 points. The U.S. share, by contrast, is up 2.3 points.",
  "venue": "WSJ",
  "url": "https://ww

#### generating the final prompt using hybrid search approach

In [19]:
print(generate_final_prompt(
       query="Tell me about the US GDP in the past 3 years.",
       model=model,
       embeddings=EMBEDDINGS,
       dataset=NEWS_DATA,
       top_k=5,
       use_rag=True,
       prompt=None,
       method="hybrid_rrf",
       BM25_RETRIEVER=BM25_RETRIEVER,
       corpus=corpus))

Split strings:   0%|          | 0/1 [00:00<?, ?it/s]

BM25S Retrieve:   0%|          | 0/1 [00:00<?, ?it/s]

{772: 0.01639344262295082, 626: 0.03225806451612903, 754: 0.015873015873015872, 289: 0.03125, 752: 0.03125763125763126, 799: 0.01639344262295082, 630: 0.015384615384615385}
Answer the user query below. There will be provided additional information for you to compose your answer. The relevant information provided is from 2024 and it should be added as your overall knowledge to answer the query, you should not rely only on this information to answer the query, but add it to your overall knowledge.Query: Tell me about the US GDP in the past 3 years.
2024 News: {
  "guid": "33eb5444b0c15721298806596220f367",
  "title": "Do the GDP and Dow Reflect American Well-Being?",
  "description": "Do the GDP and Dow Reflect American Well-Being?",
  "venue": "WSJ",
  "url": "https://www.wsj.com/economy/gdp-and-the-dow-are-up-but-what-about-american-well-being-87f90e6d?mod=wknd_pos1",
  "published_at": "2024-04-25",
  "updated_at": "2024-04-26"
}
{
  "guid": "00c09867fdc33a9b6759e6b00218070b",
  "title

<a id='4'></a>
## 4 - LLM calls

In [20]:
output = generate_with_single_input(
    prompt="What is the capital of France?",
    together_api_key=os.getenv("TOGETHER_API_KEY")
    
)
print("Role:", output['role'])
print("Content:", output['content'])

Role: assistant
Content: The capital of France is Paris.


In [21]:
messages = [
    {'role': 'user', 'content': 'Hello, who won the FIFA world cup in 2018?'},
    {'role': 'assistant', 'content': 'France won the 2018 FIFA World Cup.'},
    {'role': 'user', 'content': 'Who was the captain?'}
]

output = generate_with_multiple_input(
    messages=messages,
    together_api_key=os.getenv("TOGETHER_API_KEY"),
    max_tokens=100
)

print("Role:", output['role'])
print("Content:", output['content'])

Role: assistant
Content: The captain of the French team that won the 2018 FIFA World Cup was Hugo Lloris.


In [22]:
query = "Tell me about the US GDP in the past 3 years."

#### llm call using semantic search

In [27]:
print(llm_call(query=query, model=model, embeddings=EMBEDDINGS, dataset=NEWS_DATA, top_k=5, use_rag=True, prompt=None))

Based on the provided news articles from 2024, I can provide some information about the US GDP in the past 3 years. However, please note that the provided news articles do not provide specific data on the US GDP for the past 3 years.

According to the International Monetary Fund (IMF), the US share of world GDP has increased by 2.3 percentage points since 2018, making it the largest economy in the world. This is based on the latest projections from the IMF.

The IMF also states that Europe's share of world GDP has dropped 1.4 percentage points since 2018, and Japan's share has dropped 2.1 percentage points.

In terms of the current state of the US economy, the news articles suggest that it is strong, with solid growth and a strong dollar. However, there are also concerns about the potential for stagflation, which could impact the economy.

It's worth noting that the news articles do not provide specific data on the US GDP for the past 3 years. To get a more accurate picture of the US G

#### llm call unig BM25 approach

In [28]:
print(llm_call(
      query=query,
      model=model,
      embeddings=EMBEDDINGS,
      dataset=NEWS_DATA,
      top_k=5,
      use_rag=True,
      prompt=None,
      method="bm25s",
      BM25_RETRIEVER=BM25_RETRIEVER,
      corpus=corpus))

Based on the provided news articles from 2024, I can provide some information about the US GDP in the past 3 years. However, please note that the provided news articles do not provide specific data on the US GDP for the past 3 years.

According to the International Monetary Fund (IMF), the US share of world GDP has increased by 2.3 percentage points since 2018, making it the largest economy in the world. This is based on the latest projections from the IMF.

The IMF also states that Europe's share of world GDP has dropped 1.4 percentage points since 2018, and Japan's share has dropped 2.1 percentage points.

In terms of the current state of the US economy, the news articles suggest that it is strong, with solid growth and a strong dollar. However, there are also concerns about the potential for stagflation, which could impact the economy.

It's worth noting that the news articles do not provide specific data on the US GDP for the past 3 years. To get a more accurate picture of the US G

#### llm call using hybrid approach

In [29]:
print(llm_call(
      query=query,
      model=model,
      embeddings=EMBEDDINGS,
      dataset=NEWS_DATA,
      top_k=5,
      use_rag=True,
      prompt=None,
      method="hybrid_rrf",
      BM25_RETRIEVER=BM25_RETRIEVER,
      corpus=corpus))

Split strings:   0%|          | 0/1 [00:00<?, ?it/s]

BM25S Retrieve:   0%|          | 0/1 [00:00<?, ?it/s]

{772: 0.01639344262295082, 626: 0.03225806451612903, 754: 0.015873015873015872, 289: 0.03125, 752: 0.03125763125763126, 799: 0.01639344262295082, 630: 0.015384615384615385}
Based on the provided news articles from 2024, I can provide some information about the US GDP in the past 3 years. However, please note that the provided articles do not provide specific data on the US GDP for the past 3 years.

According to the articles, the US economy is performing well, with the US accounting for 26.3% of the global gross domestic product (GDP) in 2024, the highest in almost two decades. This is based on projections from the International Monetary Fund (IMF).

The articles also mention that the US GDP has been growing, but at a slower pace, and that there are signs of stagflation, which is a combination of slow economic growth and persistent inflation.

However, the articles do not provide specific data on the US GDP for the past 3 years. To provide a more comprehensive answer, I would need to r

In [31]:
import ipywidgets as widgets
from IPython.display import display, Markdown
import numpy as np 

def display_widget(
    llm_call_func: callable,
    model: SentenceTransformer,
    embeddings: np.ndarray,
    dataset: list[dict],
    *,
    BM25_RETRIEVER=None,
    corpus=None,
    rrf_k: int = 60,
) -> None:
    def on_button_click(_):
        # Clear outputs
        out_sem.clear_output()
        out_bm25.clear_output()
        out_rrf.clear_output()
        out_no_rag.clear_output()
        status_output.clear_output()

        status_output.append_stdout("Generating...\n")

        query = query_input.value.strip()
        top_k = slider.value
        custom_prompt = prompt_input.value.strip() if prompt_input.value.strip() else None

        if not query:
            status_output.clear_output()
            status_output.append_stdout("Please enter a query.\n")
            return

        # --- Semantic (RAG) ---
        try:
            resp_sem = llm_call(
                query=query,
                model=model,
                embeddings=embeddings,
                dataset=dataset,
                top_k=top_k,
                use_rag=True,
                prompt=custom_prompt,
                method="semantic",
                BM25_RETRIEVER=BM25_RETRIEVER,
                corpus=corpus,
                rrf_k=rrf_k,
            )
        except Exception as e:
            resp_sem = f"**Error (Semantic):** {e}"

        # --- BM25 (RAG) ---
        try:
            resp_bm25 = llm_call(
                query=query,
                model=model,
                embeddings=embeddings,
                dataset=dataset,
                top_k=top_k,
                use_rag=True,
                prompt=custom_prompt,
                method="bm25",
                BM25_RETRIEVER=BM25_RETRIEVER,
                corpus=corpus,
                rrf_k=rrf_k,
            )
        except Exception as e:
            resp_bm25 = f"**Error (BM25):** {e}"

        # --- Hybrid RRF (RAG) ---
        try:
            resp_rrf = llm_call(
                query=query,
                model=model,
                embeddings=embeddings,
                dataset=dataset,
                top_k=top_k,
                use_rag=True,
                prompt=custom_prompt,
                method="hybrid_rrf",
                BM25_RETRIEVER=BM25_RETRIEVER,
                corpus=corpus,
                rrf_k=rrf_k,
            )
        except Exception as e:
            resp_rrf = f"**Error (RRF):** {e}"

        # --- Without RAG ---
        try:
            resp_no_rag = llm_call(
                query=query,
                model=model,
                embeddings=embeddings,
                dataset=dataset,
                top_k=top_k,
                use_rag=False,
                prompt=custom_prompt,
                method="semantic",  # irrelevant when use_rag=False, but harmless
                BM25_RETRIEVER=BM25_RETRIEVER,
                corpus=corpus,
                rrf_k=rrf_k,
            )
        except Exception as e:
            resp_no_rag = f"**Error (Without RAG):** {e}"

        with out_sem:
            display(Markdown(resp_sem))
        with out_bm25:
            display(Markdown(resp_bm25))
        with out_rrf:
            display(Markdown(resp_rrf))
        with out_no_rag:
            display(Markdown(resp_no_rag))

        status_output.clear_output()

    # Inputs
    query_input = widgets.Text(
        description="Query:",
        placeholder="Type your query here",
        layout=widgets.Layout(width="100%")
    )

    prompt_input = widgets.Textarea(
        description="Augmented prompt layout:",
        placeholder=("Optional custom prompt. Use {query} and {documents} placeholders.\n"
                     "Leave blank to use the default prompt builder."),
        layout=widgets.Layout(width="100%", height="90px"),
        style={"description_width": "initial"}
    )

    slider = widgets.IntSlider(
        value=5,
        min=1,
        max=20,
        step=1,
        description="Top K:",
        style={"description_width": "initial"}
    )

    submit_button = widgets.Button(
        description="Get Responses",
        button_style="",  # keep neutral
        layout=widgets.Layout(width="160px")
    )
    submit_button.on_click(on_button_click)

    status_output = widgets.Output()

    # Outputs (4 panels)
    out_sem = widgets.Output(layout={"border": "1px solid #ccc", "height": "320px", "overflow": "auto"})
    out_bm25 = widgets.Output(layout={"border": "1px solid #ccc", "height": "320px", "overflow": "auto"})
    out_rrf = widgets.Output(layout={"border": "1px solid #ccc", "height": "320px", "overflow": "auto"})
    out_no_rag = widgets.Output(layout={"border": "1px solid #ccc", "height": "320px", "overflow": "auto"})

    # Titles
    title_sem = widgets.HTML("<b>Semantic Search</b>")
    title_bm25 = widgets.HTML("<b>BM25 Search</b>")
    title_rrf = widgets.HTML("<b>Reciprocal Rank Fusion</b>")
    title_no_rag = widgets.HTML("<b>Without RAG</b>")

    # Layout: controls on top
    controls = widgets.VBox([
        query_input,
        prompt_input,
        widgets.HBox([slider, submit_button]),
        status_output
    ])

    # Layout: 2x2 grid
    cell_left_top = widgets.VBox([title_sem, out_sem])
    cell_right_top = widgets.VBox([title_bm25, out_bm25])
    cell_left_bottom = widgets.VBox([title_rrf, out_rrf])
    cell_right_bottom = widgets.VBox([title_no_rag, out_no_rag])

    grid = widgets.VBox([
        widgets.HBox([cell_left_top, cell_right_top], layout=widgets.Layout(justify_content="space-between")),
        widgets.HBox([cell_left_bottom, cell_right_bottom], layout=widgets.Layout(justify_content="space-between")),
    ])

    # Make columns consistent width
    for cell in [cell_left_top, cell_right_top, cell_left_bottom, cell_right_bottom]:
        cell.layout.width = "49%"

    display(widgets.HTML("""
    <style>
        .widget-label { font-size: 14px; }
        textarea { font-family: ui-monospace, SFMono-Regular, Menlo, Monaco, Consolas, "Liberation Mono", "Courier New", monospace; }
    </style>
    """))

    display(controls, grid)

In [None]:
display_widget(
    llm_call,
    model=model,
    embeddings=EMBEDDINGS,
    dataset=NEWS_DATA,
    BM25_RETRIEVER=BM25_RETRIEVER,
    corpus=corpus
)

HTML(value='\n    <style>\n        .widget-label { font-size: 14px; }\n        textarea { font-family: ui-mono…

VBox(children=(Text(value='', description='Query:', layout=Layout(width='100%'), placeholder='Type your query …

VBox(children=(HBox(children=(VBox(children=(HTML(value='<b>Semantic Search</b>'), Output(layout=Layout(border…

Split strings:   0%|          | 0/1 [00:00<?, ?it/s]

BM25S Retrieve:   0%|          | 0/1 [00:00<?, ?it/s]

Split strings:   0%|          | 0/1 [00:00<?, ?it/s]

BM25S Retrieve:   0%|          | 0/1 [00:00<?, ?it/s]