# Example 1: Semantic Search

## Step 1: Set up Colab and install the required packages (if needed)

In [None]:
# Mount Google Drive in Colab and move into the project folder
from google.colab import drive
drive.mount("/content/drive")
%cd '/content/drive/My Drive/LlamaIndex/Example_list_index'

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).
/content/drive/My Drive/LlamaIndex/Example_list_index


In [None]:
!ls

data  data_2  llama_index  LlamaIndex.ipynb  neat_text.py  __pycache__


In [None]:
# !git clone https://github.com/jerryjliu/llama_index.git

In [None]:
!pip install llama_index
!pip install pypdf
!pip install openai
!pip install transformers
!pip install accelerate
!pip install sentence_transformers
!pip install chromadb
!pip install -U openai-whisper
!pip install pydub
!pip install einops



In [None]:
import openai
import torch
import chromadb
from llama_index import SimpleDirectoryReader, ServiceContext, ListIndex
from llama_index.llms import HuggingFaceLLM
from llama_index.vector_stores import ChromaVectorStore
from llama_index.storage.storage_context import StorageContext
from transformers import set_seed
from neat_text import neat_text  # local helper module (neat_text.py in this folder)

# Fix the random seeds (Python, NumPy, PyTorch) for reproducible runs
set_seed(42)

## Step 2: Load the documents

In [None]:
# The default LLM is OpenAI's GPT-3 text-davinci-003 model.
# NOTE: even if you use a different model (e.g., a Hugging Face model), still set an OpenAI API key;
# otherwise LlamaIndex may keep raising errors.
openai.api_key = "YOUR_OPENAI_API_KEY"  # placeholder: replace with your own key
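Hard-coding a key in a notebook stored on Google Drive makes it easy to leak. A minimal alternative, assuming you keep the key in the `OPENAI_API_KEY` environment variable (the variable the `openai` package reads by default), is a small helper; `get_openai_key` is a hypothetical name, not part of any library:

```python
import os


def get_openai_key(env_var: str = "OPENAI_API_KEY") -> str:
    """Return the API key stored in `env_var`, failing loudly if it is missing."""
    key = os.environ.get(env_var)
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable before running the notebook.")
    return key


# Hypothetical usage in this notebook, replacing the hard-coded placeholder:
# openai.api_key = get_openai_key()
```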

In [None]:
documents = SimpleDirectoryReader("data").load_data()
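`SimpleDirectoryReader` turns the files in `data/` into `Document` objects (text plus metadata); PDFs are parsed with the `pypdf` package installed above. The sketch below imitates only the plain-text part of that behaviour using the standard library; `Doc` and `load_text_dir` are hypothetical names, not LlamaIndex APIs:

```python
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class Doc:
    """Hypothetical stand-in for a LlamaIndex Document: text plus metadata."""
    text: str
    metadata: dict = field(default_factory=dict)


def load_text_dir(path: str, suffixes=(".txt", ".md")) -> list[Doc]:
    """Read every matching file directly inside `path` (non-recursive), sorted by name."""
    docs = []
    for file in sorted(Path(path).iterdir()):
        if file.is_file() and file.suffix.lower() in suffixes:
            docs.append(Doc(text=file.read_text(encoding="utf-8"),
                            metadata={"file_name": file.name}))
    return docs
```

Unlike the real reader, this skips any file type it does not recognise (for example `.pdf`).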

## Step 3: Define the ServiceContext and StorageContext

### Step 3(a): Define the ServiceContext (i.e., the LLM) if you want something other than the default

In [None]:
llm = HuggingFaceLLM(
    tokenizer_name="EleutherAI/pythia-12b",
    model_name="EleutherAI/pythia-12b",
    # Optional generation settings (left at their defaults here):
    # context_window=2000,
    # max_new_tokens=500,
    # tokenizer_kwargs={"max_length": 500, "padding": True, "truncation": True, "return_tensors": "pt"},
    # Load the weights in float16 to reduce GPU memory usage (intended for CUDA)
    model_kwargs={"torch_dtype": torch.float16},
)
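The `float16` setting matters because Pythia-12B is large. As a rough back-of-the-envelope check, assuming the nominal 12 billion parameters from the model name and counting the weights only (no activations or generation cache), halving the bytes per parameter halves the memory needed:

```python
def weight_memory_gb(n_params: float, bytes_per_param: int) -> float:
    """Approximate memory for model weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bytes_per_param / 1e9


N_PARAMS = 12e9  # nominal size taken from the model name "pythia-12b" (assumption)

fp32 = weight_memory_gb(N_PARAMS, 4)  # torch.float32: 4 bytes per parameter
fp16 = weight_memory_gb(N_PARAMS, 2)  # torch.float16: 2 bytes per parameter
print(f"float32: ~{fp32:.0f} GB, float16: ~{fp16:.0f} GB")  # float32: ~48 GB, float16: ~24 GB
```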




### Step 3(b): Define the StorageContext (i.e., the vector database) if you want something other than the default

In [None]:
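# Create a Chroma client.
# By default, Chroma runs purely in memory, so the collection is lost when the runtime restarts.
chroma_client = chromadb.Client()
# get_or_create_collection avoids an "already exists" error when this cell is re-run
chroma_collection = chroma_client.get_or_create_collection("data")
# Wrap the collection in a ChromaVectorStore so LlamaIndex can use it
vector_store = ChromaVectorStore(chroma_collection=chroma_collection)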
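Conceptually, a vector store such as Chroma keeps one embedding per chunk and answers a query by returning the chunks whose embeddings are closest to the query's embedding. The toy class below shows that idea with exact cosine similarity over made-up 3-dimensional vectors; `ToyVectorStore` is hypothetical and is not how Chroma works internally (real stores use much larger embeddings and typically approximate-nearest-neighbour indexes):

```python
import math


class ToyVectorStore:
    """Hypothetical in-memory vector store: keeps (id, embedding) pairs and
    returns the ids whose embeddings are most cosine-similar to a query."""

    def __init__(self):
        self._items: list[tuple[str, list[float]]] = []

    def add(self, item_id: str, embedding: list[float]) -> None:
        self._items.append((item_id, embedding))

    @staticmethod
    def _cosine(a: list[float], b: list[float]) -> float:
        dot = sum(x * y for x, y in zip(a, b))
        norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
        return dot / norm if norm else 0.0  # treat zero vectors as dissimilar

    def query(self, embedding: list[float], top_k: int = 2) -> list[str]:
        ranked = sorted(self._items, key=lambda item: self._cosine(embedding, item[1]), reverse=True)
        return [item_id for item_id, _ in ranked[:top_k]]


store = ToyVectorStore()
store.add("snap_history", [0.9, 0.1, 0.0])      # made-up embeddings for illustration
store.add("snap_eligibility", [0.2, 0.9, 0.1])
store.add("unrelated", [0.0, 0.1, 0.9])
print(store.query([1.0, 0.2, 0.0], top_k=1))  # ['snap_history']
```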


In [None]:
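# Use the Pythia LLM, a local embedding model, and chunks of up to 5000 tokens
service_context = ServiceContext.from_defaults(llm=llm, embed_model="local", chunk_size=5000)
storage_context = StorageContext.from_defaults(vector_store=vector_store)
# Note: a ListIndex keeps its nodes in the docstore; it is a VectorStoreIndex that writes embeddings to Chroma
index = ListIndex.from_documents(documents, service_context=service_context, storage_context=storage_context)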
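`chunk_size` sets how large each node (chunk) is, measured in tokens, so `chunk_size=5000` leaves documents in a few large chunks while the `chunk_size=100` used in Example 3 produces many small ones. A simplified splitter, assuming whitespace-separated words in place of model tokens and a hypothetical `chunk_tokens` helper, looks like this:

```python
def chunk_tokens(text: str, chunk_size: int, overlap: int = 0) -> list[str]:
    """Split text into chunks of at most `chunk_size` whitespace tokens.

    Consecutive chunks share `overlap` tokens. Words stand in for model
    tokens here; LlamaIndex itself counts tokens, not words.
    """
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    tokens = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(" ".join(tokens[start:start + chunk_size]))
        if start + chunk_size >= len(tokens):  # last chunk reached the end of the text
            break
    return chunks


text = " ".join(f"w{i}" for i in range(10))
print(chunk_tokens(text, chunk_size=4, overlap=1))
# ['w0 w1 w2 w3', 'w3 w4 w5 w6', 'w6 w7 w8 w9']
```

Larger chunks mean fewer nodes, and therefore fewer LLM calls when every node is read at query time.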


## Step 4. Query the data

In [None]:
# Query the index
query_engine = index.as_query_engine()
response = query_engine.query("Why was SNAP founded? Who was SNAP intended to serve when it was founded?")


Setting `pad_token_id` to `eos_token_id`:0 for open-end generation.
Setting `pad_token_id` to `eos_token_id`:0 for open-end generation.
Setting `pad_token_id` to `eos_token_id`:0 for open-end generation.
Setting `pad_token_id` to `eos_token_id`:0 for open-end generation.
Setting `pad_token_id` to `eos_token_id`:0 for open-end generation.
Setting `pad_token_id` to `eos_token_id`:0 for open-end generation.
Setting `pad_token_id` to `eos_token_id`:0 for open-end generation.
Setting `pad_token_id` to `eos_token_id`:0 for open-end generation.
Setting `pad_token_id` to `eos_token_id`:0 for open-end generation.
Setting `pad_token_id` to `eos_token_id`:0 for open-end generation.


In [None]:
print(neat_text(response))

The Supplemental Nutrition Assistance Program (SNAP) was founded in 1964 to help the poor.
It was intended to help the poor.
The question is:Why was SNAP founded?
We have the opportunity to refine the existing answer (only if needed) with some more context below.
eliminated categorical eligibility;
established statutory income eligibility guidelines at the poverty line;
established 10 categories of excluded income;
reduced the number of deductions used to calculate net income and established a standard deduction to take the place of eliminated deductions;
raised the general resource limit to $1,750;
established the fair market value (FMV) test for evaluating vehicles as resources;
penalized households whose heads voluntarily quit jobs;
restricted eligibility for students and aliens;
eliminated the requirement that households must have cooking facilities;
replaced store due bills with cash change up to 99 cents;
established the principle that stores must sell a substantial amount of sta
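The output above suggests how the ListIndex query engine builds its answer: it drafts an answer from the first chunk of context and then asks the LLM to refine that answer with each further chunk, which is why text from the refine prompt ("We have the opportunity to refine the existing answer...") leaked into the response. Each generation call also prints one `pad_token_id` warning. The sketch below is a simplified create-and-refine loop; `create_and_refine` and `fake_llm` are hypothetical stand-ins, and LlamaIndex's own prompts and chunk packing differ:

```python
from typing import Callable


def create_and_refine(question: str, chunks: list[str], llm: Callable[[str], str]) -> str:
    """Answer from the first chunk, then refine that answer with each later chunk.
    Makes exactly one LLM call per chunk."""
    if not chunks:
        raise ValueError("no context chunks to answer from")
    answer = llm(f"Context: {chunks[0]}\nQuestion: {question}\nAnswer:")
    for chunk in chunks[1:]:
        answer = llm(
            f"Question: {question}\nExisting answer: {answer}\n"
            f"Refine the existing answer (only if needed) using this context: {chunk}\nRefined answer:"
        )
    return answer


calls: list[str] = []


def fake_llm(prompt: str) -> str:
    """Stand-in LLM that records each prompt and reports how many calls it has seen."""
    calls.append(prompt)
    return f"answer v{len(calls)}"


print(create_and_refine("Why was SNAP founded?", ["chunk A", "chunk B", "chunk C"], fake_llm))  # answer v3
print(len(calls))  # 3
```

A model that echoes its prompt, as Pythia did here, returns refine instructions along with the answer, so a cleanup step (or a larger `chunk_size`, meaning fewer refine rounds) helps.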

# Example 2: Summarization

In [None]:
query_engine = index.as_query_engine(response_mode="simple_summarize")
response = query_engine.query("Summarize the goals of SNAP.")

Setting `pad_token_id` to `eos_token_id`:0 for open-end generation.
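With `response_mode="simple_summarize"`, LlamaIndex truncates the text chunks so they fit into a single prompt and makes one LLM call, which matches the single warning above; the trade-off is that detail can be lost to truncation. One plausible truncation scheme, sketched with words instead of tokens and a hypothetical `simple_summarize_prompt` helper, gives each chunk an equal share of the budget:

```python
def simple_summarize_prompt(question: str, chunks: list[str], max_words: int) -> str:
    """Build one prompt by truncating each chunk to an equal share of a word budget.

    Words stand in for model tokens; text beyond each chunk's share is dropped.
    """
    if not chunks:
        return f"Question: {question}\nAnswer:"
    share = max(1, max_words // len(chunks))
    truncated = [" ".join(chunk.split()[:share]) for chunk in chunks]
    context = "\n\n".join(truncated)
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"


print(simple_summarize_prompt("Summarize the goals of SNAP.", ["a b c d e", "f g h i j"], max_words=6))
```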


In [None]:
print(response)


SNAP is a federal program that provides food-purchasing assistance for low- and no-income people to help them maintain adequate nutrition and health. It is a federal aid program administered by the U.S. Department of Agriculture (USDA) under the Food and Nutrition Service (FNS), though benefits are distributed by specific departments of U.S. states (e.g., the Division of Social Services, the Department of Health and Human Services, etc.).

SNAP benefits supplied roughly 40 million Americans in 2018, at an expenditure of $57.1 billion.[2][3] Approximately 9.2% of American households obtained SNAP benefits at some point during 2017, with approximately 16.7% of all children living in households with SNAP benefits.[2] Beneficiaries and costs increased sharply with the Great Recession, peaked in 2013 and declined through 2017 as the economy recovered.[2] It is the largest nutrition program of the 15 administered by FNS and is a key component of the social safety net for low-income American

# Example 3: Synthesis over Heterogeneous Data

In [None]:
from llama_index import ListIndex
from llama_index.indices.composability import ComposableGraph
# Load a second document set and index it with much smaller chunks
documents_2 = SimpleDirectoryReader("data_2").load_data()
service_context = ServiceContext.from_defaults(llm=llm, embed_model="local", chunk_size=100)  # TODO: try chunk_size=200
index2 = ListIndex.from_documents(documents_2, service_context=service_context, storage_context=storage_context)
# Combine both indices under a top-level ListIndex; "summary1"/"summary2" are placeholder summaries to replace
graph = ComposableGraph.from_indices(ListIndex, [index, index2], index_summaries=["summary1", "summary2"], service_context=service_context)

In [None]:
query_engine = graph.as_query_engine(response_mode="simple_summarize")
response = query_engine.query("Summarize the goals of SNAP")

Setting `pad_token_id` to `eos_token_id`:0 for open-end generation.
Setting `pad_token_id` to `eos_token_id`:0 for open-end generation.
Setting `pad_token_id` to `eos_token_id`:0 for open-end generation.
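`ComposableGraph.from_indices(ListIndex, ...)` builds a root `ListIndex` on top of the two sub-indices. In a simplified model of the query, each sub-index produces its own answer and the root combines those partial answers in one more call, which is consistent with the three generation warnings above (two sub-indices plus the root). The sketch below is that model only; `query_two_level_list` and `fake_answer` are hypothetical, and it ignores the `index_summaries` the real graph attaches to each sub-index:

```python
from typing import Callable


def query_two_level_list(question: str,
                         sub_indexes: dict[str, list[str]],
                         answer: Callable[[str, list[str]], str]) -> str:
    """Answer against each sub-index's chunks, then answer once more over those partial answers."""
    partial_answers = [f"[{name}] {answer(question, chunks)}" for name, chunks in sub_indexes.items()]
    return answer(question, partial_answers)


calls: list[str] = []


def fake_answer(question: str, context: list[str]) -> str:
    """Stand-in for one simple_summarize LLM call: records the call and reports the context size."""
    calls.append(question)
    return f"answered from {len(context)} pieces"


result = query_two_level_list(
    "Summarize the goals of SNAP",
    {"index": ["chunk 1", "chunk 2"], "index2": ["chunk 3"]},
    fake_answer,
)
print(result, len(calls))  # answered from 2 pieces 3
```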


In [None]:
print(response)


SNAP is a federal program that provides food assistance to low-income individuals and families.
SNAP is administered by FNS, the Food and Nutrition Service of the U.S. Department of Agriculture.
FNS
Please view the other videos in the training series and read the associated training guide.

SNAP benefits supplied roughly 40 million Americans in 2018, at an expenditure of $57.1 billion.[2][3] Approximately 9.2% of American households obtained SNAP benefits at some point during 2017, with approximately 16.7% of all children living in households with SNAP benefits.[2] Beneficiaries and costs increased sharply with the Great Recession, peaked in 2013 and declined through 2017 as the economy recovered.[2] It is the largest nutrition program of the 15 administered by FNS and is a key component of the social safety net for low-income Americans.[4]

The amount of SNAP benefits received by a household depends on the household's size, income, and expenses. For most of its history, the
SNAP is a