It is best to be aware of these changes; the documentation I was initially referring to was outdated:
https://github.com/run-llama/llama_index/blob/main/docs/docs/changes/deprecated_terms.md

Reference Code: https://docs.llamaindex.ai/en/stable/examples/agent/openai_agent_with_query_engine/

Document Summary: https://docs.llamaindex.ai/en/stable/examples/index_structs/doc_summary/DocSummary/

List of vector stores: https://docs.llamaindex.ai/en/stable/module_guides/storing/vector_stores/

RAG agent: https://docs.llamaindex.ai/en/stable/use_cases/q_and_a/

Note: type 'exit' to stop chatting with the bot.
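The `exit` keyword ends the REPL that `agent.chat_repl()` starts in Part 2. The loop behind such a REPL can be sketched in plain Python. This is an illustration only, not llama_index's implementation; `chat_loop` and `respond` are hypothetical names, with `respond` standing in for the agent's chat call, and a list of strings replaces interactive input.

```python
from typing import Callable, Iterable, List


def chat_loop(user_inputs: Iterable[str], respond: Callable[[str], str]) -> List[str]:
    """Answer each message until the user types 'exit' (hypothetical sketch)."""
    replies = []
    for message in user_inputs:
        if message.strip() == "exit":
            break  # stop chatting; later messages are never answered
        replies.append(respond(message))
    return replies


# Usage with canned inputs and an echo "agent" in place of the real one
replies = chat_loop(["hello", "tell me about Red", "exit", "ignored"], lambda m: f"echo: {m}")
print(replies)  # ['echo: hello', 'echo: tell me about Red']
```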

In [1]:
%pip install -r requirements.txt --quiet

Note: you may need to restart the kernel to use updated packages.


## Part 1: Data Preparation

In [3]:
import re
from contextlib import ExitStack
from pathlib import Path

import pandas as pd
from dotenv import load_dotenv

# Load environment variables from the .env file (the OpenAI API key is stored there)
load_dotenv()

# Load the CSV file with the specified encoding
df = pd.read_csv('lyrics.csv', encoding='ISO-8859-1')
data_path = Path("data")
data_path.mkdir(exist_ok=True)

def sanitize_filename(name):
    # Remove characters that are not allowed in Windows file names
    return re.sub(r'[\\/*?:"<>|]', "", name)

# Group lyric lines by track; every track is appended to one text file per album
grouped = df.groupby(['album', 'track_n', 'track_title', 'artist', 'year'])
album_files = {}
# ExitStack closes every album file, even if an error interrupts the loop
with ExitStack() as stack:
    for (album, track_n, track_title, artist, year), group in grouped:
        sanitized_album_name = sanitize_filename(album.replace(' ', '_'))
        album_file_path = data_path / f"{sanitized_album_name}.txt"
        if sanitized_album_name not in album_files:
            album_files[sanitized_album_name] = stack.enter_context(
                open(album_file_path, "w", encoding='utf-8')
            )
        fp = album_files[sanitized_album_name]

        # Write the track details, then its lyrics, to the album file
        fp.write(
            f"Artist: {artist}\n"
            f"Album: {album}\n"
            f"Track Title: {track_title}\n"
            f"Track Number: {track_n}\n"
            f"Year: {year}\n"
            f"Lyric:\n"
        )
        for lyric in group['lyric']:
            fp.write(f"{lyric}\n")
        fp.write("\n\n")  # blank lines separate consecutive songs

print(f"Extracted and grouped lyrics by album into text files in the '{data_path}' directory.")

# Combine all album lyrics into one master file
master_file_path = data_path / "master_lyrics_by_album.txt"
with open(master_file_path, "w", encoding='utf-8') as master_fp:
    for album_file in sorted(data_path.glob("*.txt")):
        # Skip the master file itself; otherwise it would be read back into itself
        if album_file.name == master_file_path.name:
            continue
        master_fp.write(album_file.read_text(encoding='utf-8'))
        master_fp.write("\n\n")

print(f"All lyrics combined into '{master_file_path}'")


Extracted and grouped lyrics by album into text files in the 'data' directory.
All lyrics combined into 'data\master_lyrics_by_album.txt'
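The record layout that Part 1 writes can be checked without `lyrics.csv` or pandas. The sketch below swaps in the standard-library `csv` module and a few made-up rows; only the column names come from the notebook, and it builds each album's text in memory instead of writing files.

```python
import csv
import io
import re
from collections import defaultdict


def sanitize_filename(name: str) -> str:
    # Same pattern as the notebook: strip characters Windows forbids in file names
    return re.sub(r'[\\/*?:"<>|]', "", name)


# Made-up sample rows using the column names the notebook reads from lyrics.csv
sample_csv = """artist,album,track_title,track_n,lyric,year
Artist A,Demo Album?,Song X,1,first line,2010
Artist A,Demo Album?,Song X,1,second line,2010
Artist A,Demo Album?,Song Y,2,only line,2010
"""

# Collect lyric lines per track, mirroring df.groupby([...]) in the notebook
tracks = defaultdict(list)
for row in csv.DictReader(io.StringIO(sample_csv)):
    key = (row["album"], row["track_n"], row["track_title"], row["artist"], row["year"])
    tracks[key].append(row["lyric"])

# Build the text each album file would receive, keyed by sanitized album name
album_text = defaultdict(str)
for (album, track_n, track_title, artist, year), lyrics in tracks.items():
    name = sanitize_filename(album.replace(" ", "_"))
    album_text[name] += (
        f"Artist: {artist}\nAlbum: {album}\nTrack Title: {track_title}\n"
        f"Track Number: {track_n}\nYear: {year}\nLyric:\n"
    )
    album_text[name] += "".join(f"{line}\n" for line in lyrics) + "\n\n"

print(list(album_text))  # ['Demo_Album']
print(album_text["Demo_Album"])
```

Because `?` is stripped and spaces become underscores, "Demo Album?" maps to the file stem `Demo_Album`, and both tracks land in the same album text.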


## Part 2: Q & A Bot (based on RAG)
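`similarity_top_k=10` in the cell below asks the retriever for the 10 chunks whose embeddings are most similar to the query's embedding, and those chunks become the context for the answer. The ranking step can be sketched without any library, assuming cosine similarity; the chunk ids and 3-dimensional vectors here are made up (real `text-embedding-ada-002` vectors have 1536 dimensions), and `cosine` and `top_k` are hypothetical helper names.

```python
import math
from typing import Dict, List, Tuple


def cosine(a: List[float], b: List[float]) -> float:
    # Cosine similarity: dot product divided by the product of vector lengths
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query: List[float], chunks: Dict[str, List[float]], k: int) -> List[Tuple[str, float]]:
    # Score every chunk against the query and keep the k best
    scored = [(chunk_id, cosine(query, vec)) for chunk_id, vec in chunks.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy "embeddings" for three hypothetical chunks
chunks = {
    "album_a.txt#0": [0.9, 0.1, 0.0],
    "album_b.txt#0": [0.1, 0.9, 0.0],
    "album_c.txt#0": [0.5, 0.5, 0.1],
}
query = [1.0, 0.0, 0.0]
print([chunk_id for chunk_id, _ in top_k(query, chunks, k=2)])  # ['album_a.txt#0', 'album_c.txt#0']
```

If `k` exceeds the number of chunks, every chunk is returned, so a large `similarity_top_k` on a small corpus simply retrieves everything.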

In [4]:
import os

from llama_index.core import SimpleDirectoryReader, VectorStoreIndex
from llama_index.readers.file import FlatReader
from llama_index.embeddings.openai import OpenAIEmbedding, OpenAIEmbeddingModelType
from llama_index.llms.openai import OpenAI
from llama_index.core.tools import QueryEngineTool, ToolMetadata
from llama_index.agent.openai import OpenAIAgent

# API key (loaded from the .env file in the previous cell)
openai_api_key = os.getenv('OPENAI_API_KEY')

# OpenAI embedding model
embedding_model = OpenAIEmbedding(
    api_key=openai_api_key,
    model=OpenAIEmbeddingModelType.TEXT_EMBED_ADA_002
)

# OpenAI chat model; OpenAIAgent.from_tools takes an `llm` object, not a `model_name` string
llm = OpenAI(model="gpt-4o", api_key=openai_api_key)

# Use FlatReader to read the text files
file_extractor = {".txt": FlatReader()}

# Load the album files from ./data; the master file is excluded because it repeats every album
reader = SimpleDirectoryReader(
    "./data",
    file_extractor=file_extractor,
    exclude=["master_lyrics_by_album.txt"],
)
documents = reader.load_data()

# Build an in-memory vector index with the embedding model above, then persist it to disk
vector_index = VectorStoreIndex.from_documents(documents, embed_model=embedding_model)
vector_index.storage_context.persist(persist_dir="./storage/text_data")

# Query engine: retrieve the 10 most similar chunks and answer with the same LLM
query_engine = vector_index.as_query_engine(llm=llm, similarity_top_k=10)
query_engine_tool = QueryEngineTool(
    query_engine=query_engine,
    metadata=ToolMetadata(
        name="text_data",
        description="Provides creative answers based on the text data loaded."
    )
)

# Start chatting with the agent!
agent = OpenAIAgent.from_tools([query_engine_tool], llm=llm, verbose=True)
agent.chat_repl()

===== Entering Chat REPL =====
Type "exit" to exit.

Added user message to memory: Hi can you make up a song based on the Red album?
=== Calling Function ===
Calling function: text_data with args: {"input":"Red album"}
Got output: Taylor Swift's "Red" album was released in 2012 and features tracks like "State of Grace," "Red," "Starlight," "Begin Again," "Holy Ground," "Sad Beautiful Tragic," "The Lucky One," "22," "I Almost Do," "All Too Well," "The Last Time," "Everything Has Changed," "Treacherous," "I Knew You Were Trouble," and "Stay Stay Stay."

Assistant: Here is a made-up song based on Taylor Swift's "Red" album:

(Verse 1)
In the state of grace, we found our way
Painting the town red, under starlight's sway
Begin again, on holy ground we stand
A sad beautiful tragic love, like grains of sand

(Chorus)
We're the lucky ones, feeling 22
I almost do, but all too well I knew
The last time we danced, everything changed
In treacherous love, trouble stayed the same

(Verse 2)
I knew y