# INDEXATION AND INGESTION FIRST EXAMPLE

For this example we use a PDF file about Formula 1 cars (INTRODUCTION_TO_FORMULA_1_CAR.pdf). You can find it in the data folder of this learning pod's section: lp-llm/02-indexation-and-ingestion/data.

> Note that you'll need to add this file to your data folder if you want to run this notebook.

## INSTALLATION AND SETUP

The first step is to install LlamaIndex using pip.

In [9]:
%pip install llama-index



LlamaIndex uses OpenAI's models by default (for both the embeddings and the LLM), so you will need an OpenAI API key to use it.

However, LlamaIndex is flexible, and you can use other types of embeddings, such as Sentence Transformers or custom embeddings, by configuring it accordingly. We'll explore these other options in later examples.

In [10]:
import getpass
import os
os.environ['OPENAI_API_KEY'] = getpass.getpass("OpenAI API Key:")

OpenAI API Key:··········
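
If you re-run the notebook often, being prompted every time gets tedious. The sketch below is one possible variation, not part of the original notebook: it prompts only when the environment variable is missing. The helper name `ensure_api_key` and its `prompt` parameter are hypothetical and introduced here for illustration.

```python
import getpass
import os


def ensure_api_key(env_var: str = "OPENAI_API_KEY", prompt=getpass.getpass) -> str:
    """Return the key stored in env_var, prompting only when it is missing.

    `ensure_api_key` and `prompt` are illustrative names, not LlamaIndex API.
    """
    key = os.environ.get(env_var)
    if not key:
        # getpass hides the typed value, as in the cell above
        key = prompt("OpenAI API Key:")
        os.environ[env_var] = key
    return key
```

Calling `ensure_api_key()` in a cell sets `OPENAI_API_KEY` the first time and is a no-op afterwards.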


## FIRST LLAMAINDEX ACTIONS

To start using LlamaIndex, you first need to create a VectorStoreIndex object. This object holds the content you want to search through: the documents are split into chunks (nodes), and an embedding is generated for each one. You can create a VectorStoreIndex from a list of documents.

In this example, we'll use SimpleDirectoryReader, which loads the files it finds in a specified directory. There are many other ways to ingest data from various sources, such as APIs and databases. We'll start with this simple directory example and explore additional methods in later examples.

In [12]:
from llama_index.core import (
    VectorStoreIndex,
    Settings,
    StorageContext,
    load_index_from_storage,
    SimpleDirectoryReader
)

documents = SimpleDirectoryReader("data").load_data()
index = VectorStoreIndex.from_documents(documents, show_progress=True)
index.storage_context.persist(persist_dir="./storage")

Parsing nodes:   0%|          | 0/33 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/32 [00:00<?, ?it/s]

Note that we persist the index to disk so that you can load it later without having to re-index your documents. The persist method belongs to the index's storage context (index.storage_context), and persist_dir sets the directory where the index files are written.

To load the index later in the project, build a StorageContext from that same persist_dir and pass it to the load_index_from_storage function.
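
As a minimal sketch of that reload step, assuming the same `./storage` and `data` directories used in this notebook: the helper names `has_persisted_index` and `load_or_build_index` are hypothetical, and the "directory is not empty" check is a simplifying assumption rather than a LlamaIndex feature.

```python
import os

PERSIST_DIR = "./storage"  # same directory passed to persist() in this notebook


def has_persisted_index(persist_dir: str) -> bool:
    """Assumption: a non-empty directory means an index was persisted there."""
    return os.path.isdir(persist_dir) and len(os.listdir(persist_dir)) > 0


def load_or_build_index(persist_dir: str = PERSIST_DIR, data_dir: str = "data"):
    """Load the index from disk if present; otherwise build and persist it."""
    # Imported here so has_persisted_index works without llama-index installed
    from llama_index.core import (
        SimpleDirectoryReader,
        StorageContext,
        VectorStoreIndex,
        load_index_from_storage,
    )

    if has_persisted_index(persist_dir):
        # Rebuild the storage context from disk, then load the index from it
        storage_context = StorageContext.from_defaults(persist_dir=persist_dir)
        return load_index_from_storage(storage_context)

    # First run: index the documents and save the result for next time
    documents = SimpleDirectoryReader(data_dir).load_data()
    index = VectorStoreIndex.from_documents(documents)
    index.storage_context.persist(persist_dir=persist_dir)
    return index
```

With this in place, `index = load_or_build_index()` only generates embeddings on the first run; later runs read them back from disk.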

The next step is to create a Query Engine and perform queries on the indexed documents, which is the core functionality of an information retrieval system.

In [13]:
query_engine = index.as_query_engine()
response = query_engine.query("What is the text about?")
print(response)

The text provides information about the fuel injection system and turbocharger used in Formula 1 cars. It explains the importance of direct fuel injection for fuel efficiency and power delivery, as well as the function of a turbocharger in increasing engine power by utilizing exhaust gas energy.
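
The answer is generated from the chunks retrieved for the query, not from the whole PDF, which is why it focuses on only a couple of topics. To see which chunks were used, you can inspect `response.source_nodes`. The sketch below is illustrative: `summarize_sources` is a hypothetical helper, and the `file_name` and `page_label` metadata keys are an assumption about what the PDF reader attaches, so they are read with `.get()` and may come back as `None`.

```python
def summarize_sources(response, max_chars: int = 80):
    """Collect (file name, page label, score, snippet) for each retrieved chunk.

    `summarize_sources` is an illustrative helper, not part of LlamaIndex.
    """
    rows = []
    for item in response.source_nodes:
        meta = item.node.metadata
        # Keep a short, single-line preview of the chunk text
        snippet = item.node.get_content()[:max_chars].replace("\n", " ")
        rows.append((meta.get("file_name"), meta.get("page_label"), item.score, snippet))
    return rows


# In the notebook, after running the query above:
# for row in summarize_sources(response):
#     print(row)
```

If the answers look too narrow, one option to try is retrieving more chunks per query, for example `index.as_query_engine(similarity_top_k=5)`.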


In [14]:
response2 = query_engine.query("Tell me the different parts of the text and summary the various topics for each part. Do it in a list")
print(response2)

- Components Used in Formula 1 Car:
  - Engine
  - Radiator
  - Gearbox and Transmission
  - Air Intake and Airbox
  - Disc Brake
  - Car Floor (Plank)
  - Suspension
  - Tyres
  - Steering Wheel
  - Cockpit
  - Front Wings and Rear Wings
  - Oil and Coolant System
  - Fuel

- Safety Regulations for Formula 1 Cars:
  - Red light requirement at the rear of the car
  - Padding in the cockpit for head protection
  - Padding around the driver's legs
  - Removable seat for easy driver extraction
  - Wheel retention devices for wheel safety
