# Part 1: Basic RAG with LlamaIndex
<ul>
  <li><a target="_blank" href="https://docs.llamaindex.ai">LlamaIndex Docs</a> </li>
  <li><a target="_blank" href="https://llamahub.ai/">Llama Hub: third-party integrations with LlamaIndex</a> </li>
</ul>

In [30]:
from llama_index.core import Document, VectorStoreIndex
from llama_index.core.node_parser import SentenceSplitter
from llama_index.embeddings.openai import OpenAIEmbedding
import csv
import logging as log
import os

In [6]:
log.basicConfig(level=log.INFO,
                format='%(asctime)s [%(levelname)5s] %(message)s',
                datefmt='%H:%M:%S')

In [7]:
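GoogleApiEnvVar = 'GOOGLE_API_KEY'
assert GoogleApiEnvVar in os.environ
# OpenAIEmbedding and the query engine's default LLM call the OpenAI API,
# which reads its key from OPENAI_API_KEY
OpenAiApiEnvVar = 'OPENAI_API_KEY'
assert OpenAiApiEnvVar in os.environ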

In [12]:
TaiDatasetRootEnvVar='TAI_DATASET_ROOT'
assert TaiDatasetRootEnvVar in os.environ
TaiDatasetRoot = os.environ[TaiDatasetRootEnvVar]
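The `assert` checks above stop at the first missing variable, and Python skips `assert` statements entirely when run with `-O`. A minimal sketch of an alternative that reports every missing variable at once; the helper names `missing_env_vars` and `require_env_vars` are hypothetical, not part of any library.

```python
import os


def missing_env_vars(names, environ=os.environ):
    """Return the names in `names` that are unset or empty in `environ`."""
    return [name for name in names if not environ.get(name)]


def require_env_vars(names, environ=os.environ):
    """Raise a single error that lists every missing variable."""
    missing = missing_env_vars(names, environ)
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")


# Demo with a plain dict instead of os.environ, so no real variables are read.
fake_env = {'TAI_DATASET_ROOT': '/data'}
print(missing_env_vars(['GOOGLE_API_KEY', 'TAI_DATASET_ROOT'], fake_env))
# ['GOOGLE_API_KEY']
```

In the notebook itself, a call such as `require_env_vars([GoogleApiEnvVar, TaiDatasetRootEnvVar])` would check the real environment.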

In [13]:
datasetFile= os.path.join(TaiDatasetRoot, 'rag_ai_tutor', 'mini-llama-articles.txt')

## Load the dataset
### Load the dataset using the csv module

In [17]:
# newline='' lets the csv module handle line breaks inside quoted fields
with open(datasetFile, mode='r', encoding='utf-8', newline='') as csvFile:
    csvReader = csv.reader(csvFile, delimiter=',')
    next(csvReader)  # Skip the header row
    rows = list(csvReader)
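Indexing `row[1]` and `row[2]` by position works, but `csv.DictReader` can instead use the header row (the one skipped with `next()`) as field names. A minimal sketch on an in-memory sample; the column names `title`, `content`, `url` and `source` are assumptions, since the notebook never shows the dataset's header.

```python
import csv
import io

# In-memory stand-in for the dataset file. The header names are hypothetical;
# check the real file's first line before relying on them.
sample = io.StringIO(
    'title,content,url,source\n'
    'Example article,"Body text, with a comma inside quotes",https://example.com/a,blog\n'
)

reader = csv.DictReader(sample)  # the first row becomes the field names
sample_rows = list(reader)

print(reader.fieldnames)          # ['title', 'content', 'url', 'source']
print(sample_rows[0]['content'])  # Body text, with a comma inside quotes
print(sample_rows[0]['url'])      # https://example.com/a
```

Accessing fields by name keeps the code correct if the column order in the file ever changes.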

In [19]:
print(f'Number of rows:{len(rows)}')
for index,row in enumerate(rows):
    print(f'{index}: Num fields:{len(row)}')

Number of rows:14
0: Num fields:4
1: Num fields:4
2: Num fields:4
3: Num fields:4
4: Num fields:4
5: Num fields:4
6: Num fields:4
7: Num fields:4
8: Num fields:4
9: Num fields:4
10: Num fields:4
11: Num fields:4
12: Num fields:4
13: Num fields:4
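Printing one line per row is fine for 14 rows, but a summary scales better to larger files and makes malformed rows easy to spot. A minimal sketch using `collections.Counter`; the helper names and the toy rows are hypothetical.

```python
from collections import Counter


def field_count_summary(rows):
    """Map each field count to the number of rows that have it."""
    return Counter(len(row) for row in rows)


def malformed_rows(rows, expected):
    """Return the indices of rows whose field count is not `expected`."""
    return [i for i, row in enumerate(rows) if len(row) != expected]


# Toy data: the third row has only three fields.
toy_rows = [['a', 'b', 'c', 'd'], ['e', 'f', 'g', 'h'], ['i', 'j', 'k']]
print(field_count_summary(toy_rows))  # Counter({4: 2, 3: 1})
print(malformed_rows(toy_rows, 4))    # [2]
```

On the dataset loaded above, `field_count_summary(rows)` should report a single entry, `{4: 14}`, matching the printed output.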


### Create the vector index (embeddings)

In [23]:
# One Document per article: row[1] holds the article text, row[2] its URL
documents = [Document(text=row[1], metadata={"url": row[2]}) for row in rows]

Adapted from <a target="" href="https://docs.llamaindex.ai/en/stable/#getting-started">LlamaIndex Getting started</a>

In [32]:
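index = VectorStoreIndex.from_documents(
    documents,
    # Embed each chunk with OpenAI's text-embedding-3-small model
    embed_model=OpenAIEmbedding(model='text-embedding-3-small'),
    # Split documents into chunks of up to 768 tokens, overlapping by 64 tokens
    transformations=[SentenceSplitter(chunk_size=768, chunk_overlap=64)],
    show_progress=True,
)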

Parsing nodes:   0%|          | 0/14 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/60 [00:00<?, ?it/s]

23:11:08 [ INFO] HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
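`chunk_size=768` and `chunk_overlap=64` control how each article is cut into the nodes that get embedded (in the progress output above, the 14 articles produced 60 chunks to embed). Neighboring chunks share some text, so content near a boundary still appears with surrounding context in at least one chunk. A minimal sketch of overlapping fixed-size windows, counting whitespace-separated words instead of tokens; `SentenceSplitter` itself counts tokens and prefers to break at sentence boundaries, so this is a simplification, and `chunk_words` is a hypothetical helper.

```python
def chunk_words(text, chunk_size, chunk_overlap):
    """Split text into windows of at most `chunk_size` words, where
    consecutive windows share `chunk_overlap` words."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError('need 0 <= chunk_overlap < chunk_size')
    words = text.split()
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(' '.join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this window already reaches the end of the text
    return chunks


text = ' '.join(f'w{i}' for i in range(10))
for chunk in chunk_words(text, chunk_size=4, chunk_overlap=1):
    print(chunk)
# w0 w1 w2 w3
# w3 w4 w5 w6
# w6 w7 w8 w9
```

A larger overlap preserves more context across boundaries at the cost of more chunks, and therefore more embedding calls.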


In [34]:
queryIndex = index.as_query_engine()
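The two HTTP requests logged for the query below reflect how the query engine works: it embeds the question (the `/v1/embeddings` call), retrieves the stored chunks whose embeddings are most similar, and sends those chunks together with the question to the LLM (the `/v1/chat/completions` call). A minimal sketch of the retrieval step, assuming cosine similarity over toy vectors held in plain Python lists; it illustrates the idea and is not LlamaIndex's implementation. The number of chunks retrieved can be set with `index.as_query_engine(similarity_top_k=...)`.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, chunk_vecs, k=2):
    """Return (index, score) pairs for the k chunks most similar to the query."""
    scored = [(i, cosine(query_vec, v)) for i, v in enumerate(chunk_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy 3-dimensional "embeddings"; real ones come from the embedding model.
toy_chunks = [[1.0, 0.0, 0.0], [0.7, 0.7, 0.0], [0.0, 0.0, 1.0]]
toy_query = [1.0, 0.1, 0.0]
print([i for i, _ in top_k(toy_query, toy_chunks, k=2)])  # [0, 1]
```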

In [35]:
response = queryIndex.query("How many parameters has Llama 2?")
print(response)

23:11:25 [ INFO] HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
23:11:27 [ INFO] HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"


Llama 2 is available in four different model sizes: 7 billion, 13 billion, 34 billion, and 70 billion parameters.
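The answer above is generated by the LLM from the retrieved chunks rather than looked up directly, so its accuracy depends on what those chunks contain. A minimal sketch of that final step, in which retrieved chunks are pasted into a single prompt as context; `build_prompt` and its template wording are hypothetical and differ from the templates LlamaIndex's response synthesizer actually uses.

```python
def build_prompt(question, retrieved_chunks):
    """Combine retrieved chunks and the question into one prompt (hypothetical template)."""
    context = '\n---\n'.join(retrieved_chunks)  # separate chunks visibly
    return (
        'Answer the question using only the context below.\n'
        f'Context:\n{context}\n'
        f'Question: {question}\n'
        'Answer:'
    )


prompt = build_prompt(
    'How many parameters does Llama 2 have?',
    ['Chunk about Llama 2 model sizes.', 'Chunk about Llama 2 training data.'],
)
print(prompt)
```

Instructing the model to rely only on the supplied context is what grounds the answer in the indexed articles instead of the model's general knowledge.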
