# Techniques for Improving the Effectiveness of RAG Systems

---

## Lesson 01: Exploring and Preparing your Dataset for Retrieval

Large language models (LLMs) are powerful tools for solving real-world problems--as long as they have the information necessary to help. Useful LLM-based systems need to be able to integrate new information reliably and quickly. Both of those goals can be achieved via retrieval-augmented generation (RAG): retrieving information from a data source and injecting it into the LLM's prompt.

This greatly simplifies the problem that the LLM has to solve. Instead of requiring the information to have been available during training, the LLM can simply "read" the sources it receives as context and use them to find a solution. 
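
As a minimal sketch of what "injecting" retrieved information into a prompt can look like, the snippet below assembles a prompt from a question and a list of retrieved passages. The function name `build_rag_prompt` and the prompt wording are illustrative assumptions, not part of this course's services.

```python
def build_rag_prompt(question, passages):
    """Assemble a prompt that places retrieved passages ahead of the question.

    Hypothetical helper for illustration; the wording of the instructions
    is an assumption, not a prescribed template.
    """
    # Number each passage so the model (or a reader) can refer back to it
    context = "\n\n".join(
        f"[{i}] {passage}" for i, passage in enumerate(passages, start=1)
    )
    return (
        "Answer the question using only the sources below.\n\n"
        f"Sources:\n{context}\n\n"
        f"Question: {question}"
    )


prompt = build_rag_prompt(
    "What port does the chunking service use?",
    ["The chunking service is available on port 5005."],
)
print(prompt)
```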

This course is a practical guide to building more effective RAG applications. RAG systems show great promise, but the simplest version of RAG--the one most often discussed in typical online tutorials--can struggle with anything other than basic queries. This course will show how your RAG system design can significantly improve performance at whatever task you choose.

**This notebook will focus on the chunker and LLM.**


<div style="text-align: center;">
<img src="img/01_overview.png" width="850" alt="architecture with router, chunker, and LLM highlighted">
</div>

---

## Data Sources: NVIDIA Tech Blog

Many of you will probably be looking to build a RAG system suited to your particular domain. We're no different, and so for this course, we'll be using NVIDIA materials as our data source. In particular, we'll be using HTML articles from the [NVIDIA Tech Blog](https://developer.nvidia.com/blog).

This data source contains thousands of articles written by NVIDIAns on a number of different topics. Some articles are deep technical walkthroughs with lots of code samples. Other articles are more like news pieces that announce a new SDK release or feature. There is a pretty wide range of NVIDIA-related content, because our technology is used in so many different industries.

We can do many different things with that data source--but first, we will explore it to figure out what kind of tasks we ultimately want our RAG application to accomplish.

---

## Data Exploration

Tech Blog posts are HTML pages from a WordPress blog. Each has a leading title and is often further divided into sections of one or more paragraphs. The posts contain a mix of regular text, code, and images.

Have a look at a few of these articles using these links: 
- https://developer.nvidia.com/blog/create-share-and-scale-enterprise-ai-workflows-with-nvidia-ai-workbench-now-in-beta/
- https://developer.nvidia.com/blog/improving-cuda-initialization-times-using-cgroups-in-certain-scenarios/
- https://developer.nvidia.com/blog/bringing-generative-ai-to-the-edge-with-nvidia-metropolis-microservices-for-jetson/

If we "embed" the text of each post, we get a floating-point vector with hundreds or even thousands of dimensions quantifying the "meaning" of each blog post. To visualize this, we can reduce the dimensionality of these vectors so each blog is now represented by a 3D point which we can easily plot. Note how the embeddings naturally cluster the blogs, and we've color coded the clusters. Orange might represent blogs in the realm of healthcare and life-sciences. Magenta might capture blogs within the realm of robotics, etc.
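
To make the idea of "close" embeddings concrete, here is a small sketch that compares vectors with cosine similarity, a common way to measure how alike two embeddings are. The four-dimensional vectors are made-up toy values for illustration; real embeddings come from a model and have far more dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy "embeddings" (hypothetical values, not produced by any model)
robotics_post_a = [0.9, 0.1, 0.0, 0.2]
robotics_post_b = [0.8, 0.2, 0.1, 0.3]
healthcare_post = [0.1, 0.9, 0.3, 0.0]

same_topic = cosine_similarity(robotics_post_a, robotics_post_b)
different_topic = cosine_similarity(robotics_post_a, healthcare_post)
print(f"same topic: {same_topic:.2f}, different topic: {different_topic:.2f}")
```

Posts whose vectors score close to 1.0 against each other are the ones that end up in the same cluster in the plot.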

<div style="text-align: center;">
<img src="img/constellation.gif" width="600" alt="Constellation">
</div>

---

## Download NVIDIA Tech Blogs

To save time and make sure we're all using data that isn't date-dependent, we've already downloaded 200 blog posts from the Wordpress API. The response from that API has been saved in the directory `data/techblogs/`.

If you'd like to download more posts, or more recent ones, at a later time, you can use the cells in this section to do so.

### Imports

Here we import the required Python libraries for downloading the NVIDIA Tech Blog data.

In [None]:
import math
import os
import json
import asyncio
import httpx
import time
import shutil
from datetime import datetime

### Restart the Services

To make sure you're starting this lesson with all your services in the correct state, please restart them by running the following cell.

In [None]:
!./restart.sh

### Create Download Function

The following function will download the latest blogs from the Wordpress API.

In [None]:
POSTS_PER_PAGE = 25  # using 100 can make the HTML response so long that the text gets truncated
MAX_PAGE = 8  # 8 pages x 25 posts per page = 200 posts total. Increase to download more articles

def download(session, headers, wp, data_dir):
    current_page = 1
    download_complete = False
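    now = datetime.now()
    # Zero-padded timestamp so file names are unambiguous and sort chronologically
    start_timecode = now.strftime("%Y%m%d%H%M%S")
    # Pad page numbers to the digit count of MAX_PAGE (log10 under-counts exact powers of 10)
    padding_width = len(str(MAX_PAGE))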

    print(f"Downloading up to {MAX_PAGE * POSTS_PER_PAGE} posts...")
    while (not download_complete) and (
        current_page <= MAX_PAGE
    ):  # <= because pages are 1-indexed
        response = session.get(
            f"https://{wp}/wp-json/wp/v2/posts?page={current_page}&per_page={POSTS_PER_PAGE}",
            headers=headers,
        )
        if response.status_code == 200:
            response_json = response.json()
            with open(
                os.path.join(
                    data_dir,
                    f"{start_timecode}_{str(current_page).zfill(padding_width)}.json",
                ),
                "w",
            ) as dump_file:
                json.dump(response_json, dump_file)

            print(f"Page {current_page}. Downloaded {len(response_json)} posts")

            if len(response_json) < POSTS_PER_PAGE:
                download_complete = True
                print(
                    f"Downloaded all ({POSTS_PER_PAGE * (current_page - 1) + len(response_json)} posts)"
                )
            else:
                current_page += 1

        else:
            print(
                f"Download of page {current_page} failed with status code {response.status_code}. {response.text}"
            )
            download_complete = True

### Create Data Directory

The `data/techblogs` directory already exists in this environment. The following cell defines the path to it, which is where the blog posts are downloaded.

In [None]:
data_dir = os.path.join(os.getcwd(), 'data', 'techblogs')

### Download NVIDIA Blog Posts

After the course is over, if you want to try downloading more articles, uncomment the code below and run the cells again. You can also, if you wish, change the `MAX_PAGE` constant above. If you do, be sure to rerun the cell that defines it before running the cell below.

In [None]:
# Uncomment the lines below if you want to re-download
# shutil.rmtree(data_dir)
# os.makedirs(data_dir, exist_ok=True)
# download(
#     session=httpx.Client(),
#     headers={
#         "user-agent": "Mozilla/5.0 (X11; Linux i686; rv:10.0) Gecko/20100101 Firefox/10.0",
#         "Accept-Encoding": "gzip, deflate, br",
#     },
#     wp="developer.nvidia.com/blog",
#     data_dir=data_dir,
# )

---

## Load and Organize Tech Blogs

Now let's organize all our downloaded data into a simple dictionary where the key is the URL, just for the purposes of easier access of a few examples.

In [None]:
# Match on the file extension rather than any occurrence of '.json' in the name
file_list = [x for x in sorted(os.listdir(data_dir)) if x.endswith('.json')]

techblogs_dict = {}

for i, filename in enumerate(file_list):
    with open(os.path.join(data_dir, filename), 'r') as in_file:
        data = json.load(in_file)
    for item in data:
        # skip items that do not link to developer.nvidia.com/blog or blogs.nvidia.com
        if not item['link'].startswith("https://developer.nvidia.com/blog"): # and not item['link'].startswith("https://blogs.nvidia.com"):
            continue
        document_title = item['title']['rendered']
        document_url = item['link']
        document_html = item['content']['rendered']
        document_date = item['date_gmt']
        document_date_modified = item['modified_gmt']

        techblogs_dict[document_url] = item


Here are a few of the URLs:

In [None]:
list(techblogs_dict.keys())[0:10]

---

## Explore Tech Blogs Data

Let's see what information we got back from the Wordpress API, before looking later in this notebook at chunking it.

We have the actual rendered HTML content, as well as some valuable metadata like when the article was published/modified, the author ID, etc.

In [None]:
example1 = techblogs_dict["https://developer.nvidia.com/blog/improving-cuda-initialization-times-using-cgroups-in-certain-scenarios/"]
example1

Here is a second example article we'd like to index.

In [None]:
example2 = techblogs_dict["https://developer.nvidia.com/blog/bringing-generative-ai-to-the-edge-with-nvidia-metropolis-microservices-for-jetson/"]
example2

---

## Importance of Chunking

If you haven't already, click on the links to those articles and get a sense visually of how much text there is and what style of text it is.

These articles can get fairly long! We might not want to send entire posts as context to our downstream LLM--for any given prompt to the LLM, there might be hundreds of posts that have pieces relevant to the response! Using full posts makes it harder for the LLM to find the right information and could increase costs (since API-based LLMs charge per token).

Instead, we can break these articles into chunks and index those chunks. Chunking is incredibly important for a RAG system, and chunking design can have a surprisingly large impact on RAG effectiveness.

Unfortunately, there isn't a one-size-fits-all chunking strategy that will work for every dataset and every downstream task. Some use cases need a lot of context within each chunk; others do best when many small chunks are aggregated together. You'll need to experiment to ultimately arrive at a strategy that yields the best performance. 

Some chunking strategies are quite clever (and also computationally expensive): they look for break points where the topic of the document changes.

For this lesson we'll go through some simpler ones via the chunking service we've prepared for you. This has a few basic strategies for chunking text and HTML based on a running counter of words--a convenient way to estimate how many chunks we can fit into a fixed-size LLM prompt. 
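
The estimate mentioned above is simple arithmetic. A minimal sketch, assuming the prompt size is expressed as a word budget and that some of it is reserved for the instructions and the question; the function name and the example numbers are hypothetical:

```python
def chunks_that_fit(prompt_word_budget, words_per_chunk, reserved_words=0):
    """Estimate how many chunks of a given size fit in a fixed-size prompt.

    All quantities are in words. `reserved_words` covers the instructions
    and the user question (an assumed allowance, not a measured value).
    """
    available = prompt_word_budget - reserved_words
    if available <= 0 or words_per_chunk <= 0:
        return 0
    return available // words_per_chunk


# Hypothetical numbers: a 3000-word budget, 500 words reserved, 300-word chunks
max_chunks = chunks_that_fit(3000, 300, reserved_words=500)
print(max_chunks)
```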

As a reminder, the source code for this service can be found by navigating to `chunking/src`.

---

## Introducing the Chunking Service through API Docs

You started the `chunking` microservice in *Lesson 00*.

In [None]:
!docker-compose logs chunking

As you can see, the chunking service is available on port 5005. Execute the following cell to generate a link to open it in a new browser tab.

In [None]:
%%js
var host = window.location.host;
var url = 'http://'+host+':5005';
element.innerHTML = '<a style="color:green;" target="_blank" href='+url+'>Click to open chunking service API docs.</a>';

The chunking service is a FastAPI Python web app, running with Uvicorn. FastAPI has built-in asynchronous support for endpoints, which generally lets it handle concurrent requests more efficiently than the more popular Flask framework. This will let us chunk more than one document at a time, making the data upload process faster.

FastAPI also has built-in, auto-generated API documentation (in a Swagger/OpenAPI format). If you visit the link you just generated in your browser, you should see an API docs page. If you click and expand the endpoint `/api/chunking`, you will see examples of the Request Body you'd send to the endpoint in order to use different chunking strategies. Each example has an `Example Description` explaining what is happening.

If you click the "Try it out" button and then the "Execute" button, you can hit the chunking API entirely through the web browser.

---

## Experimenting Programmatically with Chunking Service 

Using the API docs is nice for testing, but in order to send lots of requests at once, we want to hit the chunking API programmatically.

### Create API Client

We can accomplish this using the Python `httpx` library, which is very similar to the popular `requests` library but with better async support.

In [None]:
client = httpx.Client()
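
The synchronous client above is all this lesson needs. For sending many requests at once, one possible pattern is sketched below using `asyncio`. To keep the sketch runnable without the service, the network call is passed in as an async callable and replaced by a stand-in (`fake_post_json`); with `httpx`, that callable would wrap a `post` call on an `httpx.AsyncClient`. The names `chunk_many`, `fake_post_json`, and `max_concurrency` are assumptions made for illustration.

```python
import asyncio


async def chunk_many(post_json, request_bodies, max_concurrency=4):
    """Send several chunking requests concurrently, a few at a time.

    `post_json` is any async callable that takes a request body and returns
    the parsed response (hypothetical interface for this sketch).
    """
    semaphore = asyncio.Semaphore(max_concurrency)

    async def send_one(body):
        # The semaphore caps how many requests are in flight at once
        async with semaphore:
            return await post_json(body)

    # gather returns results in the same order as the request bodies
    return await asyncio.gather(*(send_one(body) for body in request_bodies))


async def fake_post_json(body):
    """Stand-in for a real HTTP call: waits briefly and echoes the input."""
    await asyncio.sleep(0.01)
    return [{"text": body["input_str"]}]


bodies = [
    {"strategy": "sentence", "input_type": "html", "input_str": f"<p>doc {i}</p>"}
    for i in range(5)
]
results = asyncio.run(chunk_many(fake_post_json, bodies))
print(len(results))
```

Inside a notebook, where an event loop is already running, use `await chunk_many(...)` in place of `asyncio.run(...)`.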

### Create API Request Function

Here we create a `chunk_request` function for making requests to the chunking API. Note: the chunking service that is already running for you is reachable at the `chunking` hostname.

In [None]:
chunking_url = "http://chunking:5005/api/chunking"

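def chunk_request(client, request_body):
    chunking_resp = client.post(chunking_url, json=request_body, timeout=30)
    # Fail loudly on HTTP errors instead of trying to parse an error body as chunks
    chunking_resp.raise_for_status()
    chunks = chunking_resp.json()
    return chunks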

### Sentence-by-Sentence Chunking

First we do a standard sentence-by-sentence chunking to see what that looks like.

In [None]:
item = example1

document_title = item["title"]["rendered"]
document_url = item["link"]
document_html = item["content"]["rendered"]
document_date = item["date_gmt"]
document_date_modified = item["modified_gmt"]


chunk_request(
    client,
    {
        "strategy": "sentence",
        "input_type": "html",
        "input_str": document_html,
        "additional_metadata": {
            "document_title": document_title,
            "document_url": document_url,
            "document_date": document_date,
            "document_date_modified": document_date_modified,
        },
    },
)

These chunks are probably too small to be useful--they often refer to information outside of the chunk that's important to understanding it. 

### Increase Minimum Chunk Size

Let's increase the minimum size of the chunks (measured in number of words) to 250. The chunking service will add sentences to each chunk until we exceed the minimum word count.

But what if the boundary we picked breaks up a meaningful segment of text? We can let chunks overlap, making it more likely that sentences are chunked with any important context in at least one of the chunks on either side of a boundary. Let's set the overlap to at least 50 words.

The chunking microservice uses the Python `spacy` package internally to count words.
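
To illustrate the behavior described above, here is a simplified sketch of sentence accumulation with a minimum word count and an overlap. It is not the chunking service's implementation: it splits sentences with a regular expression and counts words with `str.split`, whereas the service uses `spacy`, so its counts will differ. The function name `chunk_sentences` is hypothetical.

```python
import re


def chunk_sentences(text, min_words=250, overlap_words=50):
    """Group sentences into chunks of at least `min_words` words, repeating
    roughly `overlap_words` words of trailing sentences in the next chunk."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks, current = [], []
    new_sentences = 0  # sentences in `current` that are not carried-over overlap
    for sentence in sentences:
        current.append(sentence)
        new_sentences += 1
        if sum(len(s.split()) for s in current) >= min_words:
            chunks.append(" ".join(current))
            # Carry whole trailing sentences forward until the overlap is covered
            overlap, count = [], 0
            for s in reversed(current):
                if count >= overlap_words:
                    break
                overlap.insert(0, s)
                count += len(s.split())
            current, new_sentences = overlap, 0
    if new_sentences:
        # Leftover sentences form a final chunk that may be below min_words
        chunks.append(" ".join(current))
    return chunks


demo_text = " ".join(f"Sentence number {i} is here." for i in range(10))
demo_chunks = chunk_sentences(demo_text, min_words=12, overlap_words=5)
for demo_chunk in demo_chunks:
    print(len(demo_chunk.split()), demo_chunk)
```

Note that the final chunk in this sketch can fall short of the minimum, because the text simply runs out.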

In [None]:
chunks = chunk_request(
    client,
    {
        "strategy": "sentence",
        "chunk_min_words": 250,
        "chunk_overlap_words": 50,
        "input_type": "html",
        "input_str": document_html,
        "additional_metadata": {
            "document_title": document_title,
            "document_url": document_url,
            "document_date": document_date,
            "document_date_modified": document_date_modified,
        },
    },
)
chunks

In [None]:
print(len(chunks))

In [None]:
for i, chunk in enumerate(chunks):
    print(f"Chunk #{i}")
    print(f"Word Count: {sum(chunk['word_count'])}")
    # print(chunk["text"])
    print("==========")

If we inspect the word count in each chunk, we see that we don't always hit the minimum number of words we set as a parameter (250).

Check out Chunk #3 for example, which only has 21 words. Too many small chunks like this will "clog" our retrieval system with bad data. This forces us to retrieve a larger batch of results (top K has to be set to a larger number) and then an LLM (or person, if they're looking at the LLM's sources) would need to sift through too many bad matches.

Even if a small chunk is not helpful, it can still come up in a search if the document title or other metadata cause a close semantic/keyword search match. Small chunks are especially prone to this problem because they don't have enough content of their own to "dilute" the effect of metadata matches.

In [None]:
chunks[3]["text"]
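
Rather than scanning the printed counts by eye, undersized chunks can be listed programmatically. A small sketch, assuming each chunk carries a `word_count` list as in the response above; the sample data here is made up and the helper name is hypothetical.

```python
def find_small_chunks(chunks, min_words):
    """Return (index, word count) pairs for chunks below `min_words`."""
    small = []
    for i, chunk in enumerate(chunks):
        total = sum(chunk["word_count"])
        if total < min_words:
            small.append((i, total))
    return small


# Made-up stand-in for a chunking response
sample_chunks = [
    {"word_count": [120, 140]},
    {"word_count": [21]},
    {"word_count": [260]},
]
small_chunks = find_small_chunks(sample_chunks, min_words=250)
print(small_chunks)
```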

---

## Dealing with Code in Your Documents

This chunk is this small because the default behavior of the chunking microservice is to enforce boundaries between code and non-code sections in HTML. It will not combine a section of the article written in natural language with another section of only code. This little sentence comes right before a large code section in the article, so the chunking service ends the chunk right there.

Our use case therefore requires us to handle the presence of large code sections within our HTML. 

Lots of text documents at NVIDIA contain a mix of code and natural language, whether it's blog posts like these, SDK documentation, Git repository README markdown files, etc.

These sections of code are very different syntactically and grammatically from regular natural language text, and so an embedding model that has not been trained on code may not perform well with code present. For embedding models that are trained on code and natural language, it's also going to be important to delimit the code with the characters the embedding model was trained on. The chunking service uses triple backticks (```) to indicate a section of code.

The chunking service as written supports three strategies to deal with code. 
1. The default (`"code_behavior": "enforce_code_boundaries"`) is to enforce hard boundaries between code and non-code. This has the benefit of separation, but has the drawback that sometimes you will end up with awkward small chunks because of these boundaries.
2. The second option (`"code_behavior": "ignore_code_boundaries"`) is to just ignore the boundaries and lump code and non-code together, while still keeping the backticks as delimiters. This is a good option if your embedding model supports both code and non-code.
3. The third option (`"code_behavior": "remove_code_sections"`) is to remove the long only-code sections from the actual text that will be embedded, but store the code as metadata which can later be used. For example, the code can be supplied to an LLM that is generating a response based on the retrieval results it found by matching on the accompanying natural language.
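
The difference between the three options can be illustrated with a toy sketch. This is not the chunking service's code: it ignores word counts entirely and only shows how a sequence of prose and code components would be grouped under each setting. The function name, the `(text, only_code)` input format, and the output dictionary keys are assumptions for illustration.

```python
FENCE = "`" * 3  # triple backticks used to delimit code


def apply_code_behavior(components, code_behavior):
    """Group (text, only_code) components according to `code_behavior`."""

    def render(text, only_code):
        return f"{FENCE}\n{text}\n{FENCE}" if only_code else text

    if code_behavior == "enforce_code_boundaries":
        # Start a new chunk every time we switch between prose and code
        chunks, previous = [], None
        for text, only_code in components:
            if only_code != previous:
                chunks.append({"text": render(text, only_code), "code": []})
            else:
                chunks[-1]["text"] += "\n" + render(text, only_code)
            previous = only_code
        return chunks
    if code_behavior == "ignore_code_boundaries":
        # Keep everything together, with code still delimited by backticks
        joined = "\n".join(render(t, c) for t, c in components)
        return [{"text": joined, "code": []}]
    if code_behavior == "remove_code_sections":
        # Embed only the prose; keep the code alongside as metadata
        prose = "\n".join(t for t, c in components if not c)
        return [{"text": prose, "code": [t for t, c in components if c]}]
    raise ValueError(f"unknown code_behavior: {code_behavior}")


demo_components = [
    ("Run the following command.", False),
    ("nvidia-smi", True),
    ("The output lists each GPU.", False),
]
for behavior in (
    "enforce_code_boundaries",
    "ignore_code_boundaries",
    "remove_code_sections",
):
    print(behavior, len(apply_code_behavior(demo_components, behavior)))
```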

---

## Remove Only Code Sections

Let's try option 3, since for this lesson we will be using the `SentenceTransformers` embedding model [`e5-large-unsupervised`](https://huggingface.co/intfloat/e5-large-unsupervised), which was not trained on code. Additionally, this model has a maximum input length of 512 tokens, or roughly 380 typical words. To stay safely under that limit, we'll use a minimum of 250 words per chunk.

*Note: `e5-large-unsupervised` should only be used for English language text. There is a multilingual version of the e5 model on HuggingFace if you're interested.*
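
The headroom behind the choice of 250 words can be estimated with a little arithmetic. The sketch below assumes the ratio implied above (512 tokens for about 380 words); real tokenizers vary with the text, so treat the result as a rough estimate. The function names are hypothetical.

```python
import math

MAX_TOKENS = 512       # model limit stated above
TYPICAL_WORDS = 380    # approximate word equivalent stated above
TOKENS_PER_WORD = MAX_TOKENS / TYPICAL_WORDS  # assumed average ratio


def estimate_tokens(word_count):
    """Rough token estimate for a chunk with `word_count` words."""
    return math.ceil(word_count * TOKENS_PER_WORD)


def fits_in_model(word_count):
    """True if the estimated token count stays within the model limit."""
    return estimate_tokens(word_count) <= MAX_TOKENS


tokens_at_minimum = estimate_tokens(250)
headroom = MAX_TOKENS - tokens_at_minimum
print(tokens_at_minimum, headroom)
```

The headroom matters because chunks are built from whole sentences and overlap, so a chunk usually ends up somewhat larger than the minimum.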

In [None]:
chunks = chunk_request(
    client,
    {
        "strategy": "sentence",
        "code_behavior": "remove_code_sections",
        "chunk_min_words": 250,
        "chunk_overlap_words": 50,
        "input_type": "html",
        "input_str": document_html,
        "additional_metadata": {
            "document_title": document_title,
            "document_url": document_url,
            "document_date": document_date,
            "document_date_modified": document_date_modified,
        },
    },
)

print(len(chunks))

In [None]:
for i, chunk in enumerate(chunks):
    print(f"Chunk #{i}")
    # Note: word_count is a list with one word count per text component, including
    # components that are only code. Those are removed from the final text, so skip them here.
    print(f"Word Count: {sum(wc for wc, only_code in zip(chunk['word_count'], chunk['only_code']) if not only_code)}")
    print(chunk["text"])
    print("==========")

This is looking pretty good! Let's try our second example.

In [None]:
item = example2

document_title = item["title"]["rendered"]
document_url = item["link"]
document_html = item["content"]["rendered"]
document_date = item["date_gmt"]
document_date_modified = item["modified_gmt"]


chunks = chunk_request(
    client,
    {
        "strategy": "sentence",
        "code_behavior": "remove_code_sections",
        "chunk_min_words": 250,
        "chunk_overlap_words": 50,
        "input_type": "html",
        "input_str": document_html,
        "additional_metadata": {
            "document_title": document_title,
            "document_url": document_url,
            "document_date": document_date,
            "document_date_modified": document_date_modified,
        },
    },
)
for i, chunk in enumerate(chunks):
    print(f"Chunk #{i}")
    # Note: word_count is a list with one word count per text component, including
    # components that are only code. Those are removed from the final text, so skip them here.
    print(f"Word Count: {sum(wc for wc, only_code in zip(chunk['word_count'], chunk['only_code']) if not only_code)}")
    print(chunk["text"])
    print("==========")

---

## Handling Heading Sections

There are many further tweaks we could consider, beyond just changing the minimum words per chunk or number of overlap words. 

For example, we could choose to chunk by adding paragraphs instead of sentences until we hit the minimum number of words. Perhaps it doesn't make sense to break apart paragraphs, since the author saw them as one logical unit. 

Let's additionally consider the implicit structure provided by the headings. We could chunk only on the headings, though this would likely result in a similar issue of awkward small chunks for heading sections of irregular sizes (especially if we are taking code sections out of the embeddable text).

The chunking microservice lets us preserve which heading section a sentence/paragraph came from and insert that into the chunk text, helping the embedding model understand longer-range context for the chunk. This way, if a chunk starts in the middle of a particular section, it can still be interpreted alongside the title of that section. We just change `strategy` to `heading_section_sentence`.
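
Conceptually, the effect is similar to the following sketch, which prepends the section heading(s) to the chunk text before it is embedded. This only illustrates the idea; how the service actually formats the heading inside the chunk may differ, and the function name and sample strings are hypothetical.

```python
def with_heading_context(heading_titles, chunk_text):
    """Prefix a chunk with the heading(s) of the section it came from."""
    return "\n".join([*heading_titles, chunk_text])


# A chunk that starts mid-section still carries its section title
embeddable_text = with_heading_context(
    ["Deploying the application"],
    "After that, restart the service so the new configuration is loaded.",
)
print(embeddable_text)
```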

In [None]:
item = example2

document_title = item["title"]["rendered"]
document_url = item["link"]
document_html = item["content"]["rendered"]
document_date = item["date_gmt"]
document_date_modified = item["modified_gmt"]


chunks = chunk_request(
    client,
    {
        "strategy": "heading_section_sentence",
        "code_behavior": "remove_code_sections",
        "chunk_min_words": 250,
        "chunk_overlap_words": 50,
        "input_type": "html",
        "input_str": document_html,
        "additional_metadata": {
            "document_title": document_title,
            "document_url": document_url,
            "document_date": document_date,
            "document_date_modified": document_date_modified,
        },
    },
)
for i, chunk in enumerate(chunks):
    print(f"Chunk #{i}")
    # Note: word_count is a list with one word count per text component, including
    # components that are only code. Those are removed from the final text, so skip them here.
    print(f"Word Count: {sum(wc for wc, only_code in zip(chunk['word_count'], chunk['only_code']) if not only_code)}")
    print(chunk["text"])
    print("==========")

---

## Unstructured vs. Structured Text

Chunking structured text like HTML articles is different from chunking unstructured text like the transcript of a video. With structured text, we can take advantage of the implicit structure provided by the original authors. With unstructured text, we have to find other methods.

For structured text, we could use a recursive strategy that looks at the largest logical chunks of the HTML articles first, breaking HTML by the largest heading sections (since heading levels suggest hierarchy). Then if necessary, break into smaller heading sections (for example h3 headings nested under h2 headings). And then progressively break into paragraphs and so on.

Similarly, with unstructured text, we might assume that, for example, a presentation is composed of topics. We could then embed small chunks and combine adjacent ones with close enough embeddings (i.e., ones likely on the same topic) into larger chunks.
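
A minimal sketch of that merging idea follows. It assumes each small piece already has an embedding and merges a piece into the current chunk when its cosine similarity to the previous piece meets a threshold. The two-dimensional vectors, the threshold of 0.8, and the function names are illustrative assumptions.

```python
import math


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def merge_adjacent(pieces, embeddings, threshold=0.8):
    """Merge neighboring pieces whose embeddings are similar enough."""
    if not pieces:
        return []
    groups = [[pieces[0]]]
    for prev_vec, vec, piece in zip(embeddings, embeddings[1:], pieces[1:]):
        if cosine_similarity(prev_vec, vec) >= threshold:
            groups[-1].append(piece)  # likely the same topic: extend the chunk
        else:
            groups.append([piece])    # topic change: start a new chunk
    return [" ".join(group) for group in groups]


# Toy data: two pieces about one topic followed by two about another
pieces = ["GPUs speed up training.", "They also speed up inference.",
          "Lunch is at noon.", "Dinner is at six."]
toy_embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
merged = merge_adjacent(pieces, toy_embeddings)
print(merged)
```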

After this course is complete, you'll be well positioned to explore those techniques!

---

## Another Chunking Strategy: Summarization

When we started exploring the data, we were considering potential use cases. The chunking strategy we've selected so far gives us direct access to the contents of the articles, and it's well-suited for a question-answering (QA) task where we need to extract fine-grained details from a document to answer a user's question.

A good semantic search system enables an array of use cases, including a more general version of classic search: finding which articles are available on a given topic, without needing to know the relevant keywords.

For this kind of asset discovery task, the system doesn't need to know all the detailed information in each article--the kind that we were preserving in chunks--so instead we could summarize each article and store that summary as a chunk.

Plus, as long as we keep the original text in our database, we could simultaneously support full-article retrieval and question-answering.

To accomplish this we will do the following:
- First use the chunking service to split articles by heading and remove code sections
- Next, concatenate the non-code sections and send to an LLM for summarization
- Finally, concatenate all text (including the code sections) and store that as additional metadata.
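
Put together, the steps above could produce one record per article, roughly as sketched below. The record layout, the function name `build_summary_record`, and the stand-in summarizer are assumptions for illustration; the field names `text`, `text_components`, and `heading_section_title` follow the chunking response used in the cells that follow, and the sample data is made up.

```python
def build_summary_record(section_chunks, summarize):
    """Build a single summary chunk for an article.

    `summarize` is any callable that maps text to a summary (an LLM call in
    practice; a trivial stand-in is used in the demo below).
    """
    # Text without code sections is what gets summarized
    no_code = "\n".join(chunk["text"] for chunk in section_chunks)
    # Full text, code included, is kept as metadata for later use
    with_code = "\n".join(
        chunk["heading_section_title"][0] + "\n" + "\n".join(chunk["text_components"])
        for chunk in section_chunks
    )
    return {"text": summarize(no_code), "metadata": {"full_text": with_code}}


# Made-up stand-in for a heading_section chunking response
sample_sections = [
    {
        "heading_section_title": ["Overview"],
        "text": "CUDA initialization can be slow on large systems.",
        "text_components": ["CUDA initialization can be slow on large systems.", "nvidia-smi"],
    }
]
record = build_summary_record(
    sample_sections, summarize=lambda text: " ".join(text.split()[:5])
)
print(record["text"])
```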

---

## Split by Heading and Remove Code Sections

In [None]:
item = example1

document_title = item["title"]["rendered"]
document_url = item["link"]
document_html = item["content"]["rendered"]
document_date = item["date_gmt"]
document_date_modified = item["modified_gmt"]


chunks = chunk_request(
    client,
    {
        "strategy": "heading_section",
        "code_behavior": "remove_code_sections",
        "input_type": "html",
        "input_str": document_html,
        "additional_metadata": {
            "document_title": document_title,
            "document_url": document_url,
            "document_date": document_date,
            "document_date_modified": document_date_modified,
        },
    },
)
for i, chunk in enumerate(chunks):
    print(f"Chunk #{i}")
    # Note: word_count is a list with one word count per text component, including
    # components that are only code. Those are removed from the final text, so skip them here.
    print(
        f"Word Count: {sum(wc for wc, only_code in zip(chunk['word_count'], chunk['only_code']) if not only_code)}"
    )
    print(chunk["text"])
    print("==========")

---

## Concatenate Non-code Sections

In [None]:
clean_text_no_code = "\n".join([x["text"] for x in chunks])
print(clean_text_no_code)

---

## Concatenate Code and Non-code Sections

In [None]:
clean_text_with_code = "\n".join(
    x["heading_section_title"][0] + "\n" + "\n".join(x["text_components"])
    for x in chunks
)
print(clean_text_with_code)

---

## Use LLM to Summarize Blog Posts (Non-code Sections)

In order to summarize, we'll rely on making calls to an LLM.

### NeMo Inference Microservice Mixtral 8x7B

As a default for our LLM, we will use a local instance of Mistral's Mixtral 8x7B instruct model served via NIM. NVIDIA NIM, part of NVIDIA AI Enterprise, is a set of easy-to-use microservices designed to accelerate deployment of generative AI across your enterprise. This versatile runtime supports a broad spectrum of AI models--from open-source community models to NVIDIA AI Foundation models, as well as custom AI models. Leveraging industry-standard APIs, developers can build enterprise-grade AI applications with just a few lines of code. Built on robust foundations, including inference engines like Triton Inference Server, TensorRT, TensorRT-LLM, and PyTorch, NIM is engineered to facilitate AI inferencing at scale, whether on-premises or in the cloud.

To get started, users can experience the accelerated generative AI models on the API catalog. When ready to deploy, enterprises can export models with NVIDIA NIM, which is included with the NVIDIA AI Enterprise license, and run them anywhere, giving them ownership of their customizations and full control of their IP and AI application.

Here we import a `ChatOpenAI` instance of our local NIM Mixtral 8x7B model configured and ready for use with LangChain from an [`llms` helper file](llms.py).

In [None]:
from llms import llms

In [None]:
llm = llms.nim_mixtral_llm

### Optional Remote LLMs

Optionally, instead of using our local model, you can also use either NVIDIA AI Foundation's Mixtral 8x7B model or OpenAI's gpt-3.5-turbo.

For either of these two options, you'll need an API key. For more details about NVIDIA AI Foundation and obtaining a free API key, see [the notebook *NVIDIA AI Foundation.ipynb*](./NVIDIA%20AI%20Foundation.ipynb).

After obtaining an appropriate API key, uncomment the appropriate cell below, add your API key, and run the cell to set `llm` to the remote LLM you chose to work with.

#### NVIDIA AI Foundation Mixtral 8x7B

In [None]:
# from llms import set_api_key
# set_api_key('NVIDIA_API_KEY', '<your_nvidia_api_key>')
# llm = llms.nvai_mixtral_llm

#### OpenAI gpt-3.5-turbo

In [None]:
# from llms import set_api_key
# set_api_key('OPENAI_API_KEY', '<your_openai_api_key>')
# llm = llms.openai_gpt3_llm

---

## Test Model

Here we try out whichever model you've chosen to work with, using a simple prompt and LangChain.

In [None]:
# Message classes live in langchain_core, which this notebook already uses for prompts
from langchain_core.messages import (
    AIMessage,
    HumanMessage,
)

In [None]:
from langchain_core.prompts import ChatPromptTemplate

template = ChatPromptTemplate.from_messages(
    [("user", "{user_input}")]
)

messages = template.format_messages(
    user_input="Tell me a story about fishing."
)

generation: AIMessage = llm.invoke(messages)

In [None]:
print(generation.content)

In [None]:
# rough word count
len(generation.content.split())

---

## Use LLM to Summarize Blog Post

In [None]:
template = ChatPromptTemplate.from_messages(
    [("user", "Summarize the following article in 200 words or less:\n{user_input}")]
)

messages = template.format_messages(
    user_input=clean_text_no_code
)

generation: AIMessage = llm.invoke(messages)

print(generation.content)

In [None]:
# rough word count
len(generation.content.split())

---

## Recap

We have made it through Lesson 01!

To recap, we've got two different chunking strategies to use for preparing data for our search system. One is chunking the actual text of the articles using a running word count, while the other is summarizing the article and using that summary as the chunk. Each of these strategies is suited to its own task: raw text chunking for question-answering, and full-article summarization for asset discovery. 

In Lesson 02, we're going to launch our database container and start importing and searching through this data.