In [84]:
import os
from dotenv import load_dotenv, find_dotenv
_ = load_dotenv(find_dotenv())
openai_api_key = os.environ["OPENAI_API_KEY"]

## Previous name: GPT Index

## Home Page: Pitch
* Unleash the power of LLMs over your data
    * Data Ingestion
        * Unstructured data: PDF, Text, Video, Images, etc.
        * Structured data: Excel, SQL, etc.
        * Semi-structured data: APIs, Slack, Salesforce, Notion, etc.
    * Data Indexing
        * Store (save)
        * Index (find)
        * Integrate with vector stores and databases 
    * Query Interface
        * Accepts any input prompt over your data
        * Returns a knowledge-augmented response

## Home Page: Use Cases
* Document QA
* Data Augmented Chatbots
* Knowledge Agents
* Structured Analytics

## Home Page: Products
* LlamaIndex (Python)
* LlamaIndex.TS (TypeScript version)
* LlamaHub
    * Llama Packs
    * Data Loaders
    * Agent tools 
* SEC Insights: end-to-end app
* create-llama: CLI tool to scaffold a LlamaIndex app from the terminal

## Latest features
* [RAGs](https://github.com/run-llama/rags):
    * Build, customize, and use multiple ChatGPTs over your data, all with natural language.
    * RAGs is a Streamlit app that lets you create a RAG pipeline from a data source using natural language.
* [Llama Packs](https://llamahub.ai/). Interesting Llama Packs:
    * Resume screener
    * Gmail OpenAI agent
    * Deeplake multimodal retrieval
    * Sub-question Weaviate

## Documentation: structure
* Getting started
* Use cases
* Understanding LlamaIndex
    * Tutorial series 
* Optimizing
    * For when you already have a LlamaIndex app working and want to refine it further.
    * List of the first things you should try: embedding model, chunk size, customizations, etc.
    * Fine-tuning your model.
* Module guides
    * Guides to the individual components of LlamaIndex

## Documentation: Starter Tutorial

#### Load Private Document

#### Create Vector Database (LlamaIndex calls them "indexes")

#### QA over private document

#### Save the vector database on your computer

By default, this saves the data to the `storage` directory; pass a `persist_dir` parameter to change the location.
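
The usual reason to persist is to avoid rebuilding the index on every run. A minimal, dependency-free sketch of that "load if present, otherwise build and save" pattern follows. The helper `load_or_build` and the `toy_*` functions are hypothetical stand-ins so the sketch runs without an API key; the comments note which LlamaIndex calls would normally take their place.

```python
import json
import tempfile
from pathlib import Path


def load_or_build(persist_dir, build, save, load):
    """Reuse a persisted index if one exists; otherwise build it and persist it."""
    path = Path(persist_dir)
    if path.exists():
        # With LlamaIndex, `load` would wrap
        # load_index_from_storage(StorageContext.from_defaults(persist_dir=...)).
        return load(path), "loaded"
    index = build()    # e.g. VectorStoreIndex.from_documents(documents)
    save(index, path)  # e.g. index.storage_context.persist(persist_dir=...)
    return index, "built"


# Toy stand-ins (hypothetical) so the sketch is runnable offline.
def toy_build():
    return {"chunks": ["be good", "make something people want"]}


def toy_save(index, path):
    path.mkdir(parents=True)
    (path / "index.json").write_text(json.dumps(index))


def toy_load(path):
    return json.loads((path / "index.json").read_text())


with tempfile.TemporaryDirectory() as tmp:
    persist_dir = Path(tmp) / "storage"
    first_index, first_status = load_or_build(persist_dir, toy_build, toy_save, toy_load)
    second_index, second_status = load_or_build(persist_dir, toy_build, toy_save, toy_load)

print(first_status, second_status)  # built loaded
```

The second call skips the (normally expensive) build step because the directory already exists.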

## Documentation: High-Level Concepts

#### RAG
* Your data is loaded
* Your data is indexed: prepared for queries
* When you ask a question, LlamaIndex retrieves the most relevant data from the vector database and passes both your question and that data (called "the context") to the LLM, so the LLM can write a conversational answer.
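
The three steps above can be sketched in plain Python. This is only an illustration of the idea, not how LlamaIndex implements it: the bag-of-words `embed` function is a toy replacement for a real embedding model, and all function names here are made up for the example.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy 'embedding': a bag-of-words count vector (real systems use a model)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question, chunks, top_k=2):
    """Return the top_k chunks most similar to the question."""
    q = embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:top_k]


def build_prompt(question, context_chunks):
    """Combine the retrieved context and the question into one LLM prompt."""
    context = "\n".join(f"- {c}" for c in context_chunks)
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"


chunks = [
    "Craigslist prioritizes user satisfaction over maximizing profit.",
    "Octopart is a search engine for electronic parts.",
    "The weather in Boston is cold in winter.",
]
question = "What does Craigslist prioritize?"
context = retrieve(question, chunks, top_k=1)
prompt = build_prompt(question, context)
print(prompt)
```

The prompt that reaches the LLM contains only the chunk about Craigslist, which is the whole point of retrieval: the model answers from a small, relevant slice of your data.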

#### Stages within RAG
1. Loading
2. Indexing: convert data into embeddings and metadata
3. Storing: store your embeddings and metadata
4. Querying
    * sub-queries
    * multi-step queries
    * hybrid strategies
5. Evaluation: checking how accurate, faithful, and fast your responses to queries are

#### Important concepts within some of the previous stages
1. Loading
    * Document: container around a data source (PDF, API output, etc.).
    * Node: data chunk with metadata.
    * Connector or Reader: connects with data sources.
2. Indexing
    * Indexing: transformation and storage of data into embeddings with metadata in vector databases.
    * Embeddings: numerical representation of data.
4. Querying
    * Retrievers: define how to retrieve relevant context from an index when given a query. The retrieval strategy is key to the performance of the app.
    * Routers: determine which retriever will be used, based on the retriever's metadata and the query.
    * Node postprocessors: apply transformations, filtering, and re-ranking logic to nodes.
    * Response synthesizers: given a query and a set of retrieved text chunks, generate the conversational response from an LLM.
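
To make the querying concepts concrete, here is a toy sketch of a node postprocessor (a similarity cutoff) feeding a response synthesizer. The class and function names are invented for this example and `fake_llm` simply echoes its prompt; LlamaIndex provides its own components for each of these roles.

```python
from dataclasses import dataclass


@dataclass
class ScoredNode:
    text: str
    score: float  # similarity score assigned by the retriever


def similarity_cutoff(nodes, cutoff):
    """Postprocessor: drop nodes whose retrieval score is below the cutoff."""
    return [n for n in nodes if n.score >= cutoff]


def synthesize(question, nodes, llm):
    """Response synthesizer: hand the question and surviving chunks to the LLM."""
    context = "\n".join(n.text for n in nodes)
    return llm(f"Context:\n{context}\n\nQuestion: {question}")


def fake_llm(prompt):
    # Stand-in for a real LLM call: it just echoes the prompt it received.
    return prompt


retrieved = [
    ScoredNode("Be good.", 0.91),
    ScoredNode("Make something people want.", 0.78),
    ScoredNode("Unrelated footer text.", 0.12),
]
kept = similarity_cutoff(retrieved, cutoff=0.5)
answer = synthesize("What is the advice?", kept, fake_llm)
print([n.text for n in kept])
```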

#### Naming of the 3 main use cases
* Query Engines: ask questions about your data.
* Chat Engines: have a conversation with your data.
* Agents: automated decision makers.

## Documentation: Customization Tutorial

#### Starting point: basic RAG

In [71]:
#%pip install llama-index-vector-stores-chroma

#### Parse the document into smaller chunks

In [72]:
# Older ServiceContext API, kept for reference (the Settings object below replaces it):
# from llama_index import ServiceContext

# service_context = ServiceContext.from_defaults(chunk_size=1000)

from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.core.node_parser import SentenceSplitter
from llama_index.llms.openai import OpenAI
from llama_index.core import Settings

Settings.llm = OpenAI(model="gpt-3.5-turbo")
Settings.embed_model = OpenAIEmbedding(model="text-embedding-3-small")
Settings.node_parser = SentenceSplitter(chunk_size=512, chunk_overlap=20)
Settings.num_output = 512
Settings.context_window = 3900
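
To build intuition for `chunk_size` and `chunk_overlap`, here is a rough, stdlib-only sketch of a sliding-window splitter. It is an assumption-laden simplification: it counts words, whereas `SentenceSplitter` works on tokens and tries to respect sentence boundaries. The function name is made up for this example.

```python
def split_with_overlap(words, chunk_size, chunk_overlap):
    """Split a list of words into windows of `chunk_size` sharing `chunk_overlap` words."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(words[start:start + chunk_size])
        if start + chunk_size >= len(words):
            break  # the last window already reaches the end of the text
    return chunks


words = [f"w{i}" for i in range(10)]
chunks = split_with_overlap(words, chunk_size=4, chunk_overlap=1)
for chunk in chunks:
    print(chunk)
```

Each chunk starts with the last word of the previous one, so a sentence cut at a chunk boundary still appears with some of its surrounding context.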





In [73]:
# index = VectorStoreIndex.from_documents(
#     documents, 
#     service_context=service_context
# )

#### Use a different vector database

In [74]:
# import chromadb
# from llama_index.vector_stores import ChromaVectorStore
# from llama_index import StorageContext

# chroma_client = chromadb.PersistentClient()
# chroma_collection = chroma_client.create_collection("quickstart")
# vector_store = ChromaVectorStore(chroma_collection=chroma_collection)
# storage_context = StorageContext.from_defaults(
#     vector_store=vector_store
# )


# loading in chromadb 
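import chromadb

# SimpleDirectoryReader and VectorStoreIndex are used below, so import them here
from llama_index.core import SimpleDirectoryReader, StorageContext, VectorStoreIndex
from llama_index.vector_stores.chroma import ChromaVectorStore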


documents = SimpleDirectoryReader("data").load_data()

db = chromadb.PersistentClient(path="./chroma_db1")
chroma_collection = db.get_or_create_collection("quickstart")
vector_store = ChromaVectorStore(chroma_collection=chroma_collection)
storage_context = StorageContext.from_defaults(vector_store=vector_store)


index = VectorStoreIndex.from_documents(
    documents,
    storage_context=storage_context,
    embed_model=Settings.embed_model,
)





In [86]:
query_engine = index.as_query_engine(similarity_top_k=5)

In [87]:
query_engine = index.as_query_engine()
response = query_engine.query("summary of article?")
response

Response(response='The article discusses the importance of creating something that people want and not focusing solely on making money, highlighting the success of companies like Craigslist that prioritize user satisfaction over maximizing profits. It also shares a story about a startup, Octopart, facing challenges from a larger distributor due to their ethical approach to providing a valuable service. The article emphasizes the benefits of being benevolent in business, attracting talented individuals and support from various stakeholders by prioritizing doing good over profit.', source_nodes=[NodeWithScore(node=TextNode(id_='b97fc4ec-a34c-4d6f-86be-e3ec5c25a94f', embedding=None, metadata={'file_path': '/Users/myhome/Downloads/2.LlamaIndex Tutorials/data/be-good.txt', 'file_name': 'be-good.txt', 'file_type': 'text/plain', 'file_size': 16710, 'creation_date': '2025-01-19', 'last_modified_date': '2025-01-19'}, excluded_embed_metadata_keys=['file_name', 'file_type', 'file_size', 'creation

#### Use a different response mode

In [81]:
query_engine = index.as_query_engine(response_mode="tree_summarize")
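
As a rough mental model of `tree_summarize`, the retrieved chunks are summarized in groups, and the resulting summaries are summarized again until one answer remains. The sketch below shows only that recursive shape. `fan_out` is a made-up parameter standing in for however many chunks fit in one prompt, and `toy_summarize` replaces the LLM call.

```python
def tree_summarize(chunks, summarize, fan_out=2):
    """Repeatedly summarize groups of `fan_out` texts until a single text remains."""
    if not chunks:
        raise ValueError("need at least one chunk")
    if fan_out < 2:
        raise ValueError("fan_out must be at least 2 so each level shrinks")
    level = list(chunks)
    while len(level) > 1:
        level = [
            summarize(level[i:i + fan_out])
            for i in range(0, len(level), fan_out)
        ]
    return level[0]


def toy_summarize(texts):
    # Stand-in for an LLM call: join the inputs and wrap them so the tree shape is visible.
    return "(" + " + ".join(texts) + ")"


result = tree_summarize(["A", "B", "C", "D"], toy_summarize)
print(result)  # ((A + B) + (C + D))
```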

#### Stream the response back

In [88]:
query_engine = index.as_query_engine(streaming=True)
response = query_engine.query("In less than 100 words, what is the meaning of good according to the author?")
response.print_response_stream()

The author defines "good" as a strategic approach that serves as a compass for decision-making. It involves prioritizing the best interests of users, leading to exponential growth and success. Being "good" is highlighted as a valuable strategy in complex situations due to its stateless nature, akin to telling the truth. The concept of "good" is not presented in a sanctimonious manner but rather as a practical guide for making choices, forming strategies, and even designing software.
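
Streaming means the caller consumes the answer piece by piece instead of waiting for the full text. The sketch below imitates that with a plain generator; `stream_tokens` and `print_stream` are hypothetical helpers, not LlamaIndex functions. With a real streaming response you would iterate over the generator it exposes instead of `stream_tokens`.

```python
import sys


def stream_tokens(text):
    """Yield the response one whitespace-delimited token at a time."""
    for token in text.split():
        yield token + " "


def print_stream(token_gen, out=sys.stdout):
    """Write tokens as they arrive and return the assembled text."""
    received = []
    for token in token_gen:
        out.write(token)
        out.flush()  # show each token immediately rather than buffering
        received.append(token)
    out.write("\n")
    return "".join(received).strip()


full_text = print_stream(stream_tokens("Being good is a practical strategy."))
```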

#### Use a chatbot instead of a QA

In [90]:
query_engine = index.as_chat_engine()
response = query_engine.chat("In less than 100 words, what is the meaning of bad according to the author?")
print(response)



The author views being bad as deviating from the traditional notion of goodness, which is often linked to being quiet. They question the concept of goodness and recall being labeled as bad in their youth, indicating that being bad involves not conforming to conventional standards of good behavior.


In [91]:
response = query_engine.chat("Oh interesting, tell me more.")
print(response)

The author reflects on their childhood perception of being bad and how it contrasted with the idea of being good, which was linked to being quiet. They express skepticism towards the concept of goodness and note that their reputation is not primarily associated with being good, but rather with meaning well.
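
The follow-up "tell me more" only works because a chat engine keeps the conversation history and sends it along with each new message. A minimal sketch of that behavior, with a hypothetical `ToyChatEngine` class and a fake model in place of a real LLM:

```python
class ToyChatEngine:
    """Keeps the running conversation and passes all of it to the model each turn."""

    def __init__(self, llm):
        self.llm = llm
        self.history = []  # list of (role, text) pairs

    def chat(self, message):
        self.history.append(("user", message))
        reply = self.llm(self.history)
        self.history.append(("assistant", reply))
        return reply

    def reset(self):
        """Forget the conversation so the next message starts fresh."""
        self.history.clear()


def fake_llm(history):
    # Stand-in for a real model: reports how many user turns it can see.
    user_turns = sum(1 for role, _ in history if role == "user")
    return f"I can see {user_turns} user message(s)."


engine = ToyChatEngine(fake_llm)
first = engine.chat("What is the meaning of bad according to the author?")
second = engine.chat("Oh interesting, tell me more.")
print(second)  # I can see 2 user message(s).
```

A query engine, by contrast, treats every question independently, so a follow-up like "tell me more" would arrive with no context.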


## Documentation: The LlamaIndex Video Series
* Build a document chatbot from scratch
* Sub-questions
* Manage documents from a source that is constantly updating like Discord
* Combining SQL and Semantic Search

## Documentation: Use Cases
* QA
* Chatbots
* Agents
* Structured Data Extraction
* Multi-modal

## Documentation: Understanding (LlamaIndex vs. LangChain)
* Using LLMs
    * Different way of loading OpenAIEmbeddings than in LangChain
    * Similar approach to Prompt templates 
* Loading
    * Very interesting: multi-purpose loader
    * Splitter, chunk_size, chunk_overlap
    * Creating chunks (nodes) manually
    * Adding metadata to document (copied to nodes)
    * Loading connectors from LlamaHub
* Indexing
    * Index types:
        * Vector store index
            * Nodes and embeddings
            * Semantic search
            * Top K Retrieval
        * Summary index
            * If you want to summarize the document 
        * Knowledge graph index
            * If your data is a set of interconnected concepts (a "graph") 
* Storing
    * by default, indexed data is stored only in memory
    * creating embeddings is expensive
    * store to avoid the time and cost of re-indexing
    * save: .persist()
    * load persisted index: load_index_from_storage()
* Querying
    * the most significant part of an LLM App
    * stages: retrieval, postprocessing, response synthesis.
    * customizing the stages of querying.
* Putting it all together
    * advanced techniques
    * how to build a full-stack app
        * React + Flask API
* Observability: tracing and debugging.
    * Logging
    * Callbacks to help debug
    * One-click observability with eval tools offered by partners (W&B, etc)
* Evaluation.
    * Response evaluation
    * Retrieval evaluation
    * Analyzing the cost of your app
        * MockLLM to predict token usage
        * MockEmbedding
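
On the cost-analysis point: the idea behind predicting token usage is to count tokens before paying for any real LLM or embedding call. The sketch below is a back-of-the-envelope version of that idea only. It counts whitespace-separated words instead of real tokens, and `HYPOTHETICAL_PRICE_PER_1K` is a placeholder, not an actual price.

```python
HYPOTHETICAL_PRICE_PER_1K = 0.02  # placeholder value, not a real price


def estimate_tokens(text):
    """Very rough token count: whitespace-separated words (real tokenizers differ)."""
    return len(text.split())


def estimate_cost(chunks, price_per_1k_tokens):
    """Return (total_tokens, estimated_cost) for processing all chunks once."""
    total_tokens = sum(estimate_tokens(c) for c in chunks)
    return total_tokens, total_tokens / 1000 * price_per_1k_tokens


sample_chunks = ["one two three", "four five"]
total_tokens, cost = estimate_cost(sample_chunks, HYPOTHETICAL_PRICE_PER_1K)
print(total_tokens, cost)
```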

## Documentation: Optimizing
* Advanced prompt techniques
* Prompt engineering for RAG
* Advanced retrieval strategies
* Agentic strategies
    * OpenAI Agent
* Evaluation
* Fine-tuning
* Building performant RAG apps for production
    * General techniques
        * decoupling retrieval chunks vs synthesis chunks
        * structured retrieval for large document sets
        * dynamically retrieve chunks
        * optimize context embeddings
    * Long list of specific techniques
* Building RAG from scratch (lower-level)  
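
One way to picture "decoupling retrieval chunks vs synthesis chunks" is to match the query against small units (single sentences) and then hand the LLM a larger window around the best match. The sketch below is a toy version under that assumption: it uses word overlap instead of embeddings, and all names are invented for the example.

```python
import re


def build_sentence_windows(text, window=1):
    """Pair each sentence (used for matching) with its surrounding window (used for synthesis)."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    records = []
    for i, sentence in enumerate(sentences):
        lo = max(0, i - window)
        hi = min(len(sentences), i + window + 1)
        records.append({
            "match_on": sentence,
            "synthesize_with": " ".join(sentences[lo:hi]),
        })
    return records


def words(text):
    return set(re.findall(r"[a-z]+", text.lower()))


def retrieve_window(query, records):
    """Pick the sentence with the most word overlap, but return its wider window."""
    q = words(query)
    best = max(records, key=lambda r: len(q & words(r["match_on"])))
    return best["synthesize_with"]


text = (
    "Startups should be good. Craigslist puts users first. "
    "It still makes money. Octopart faced pushback."
)
records = build_sentence_windows(text, window=1)
context = retrieve_window("Who puts users first?", records)
print(context)
```

Matching on a single sentence keeps retrieval precise, while the wider window gives the LLM enough surrounding text to write a complete answer.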