This notebook shows how to insert documents into different GPT Index data structures after the index has been created.

To build the index at initialization instead, see `TestEssay.ipynb`.

In [None]:
# My OpenAI Key
import os
os.environ['OPENAI_API_KEY'] = "INSERT OPENAI KEY"
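As an alternative to pasting the key into the notebook, the following is a minimal sketch that reads the key from the environment and prompts only when it is missing. The helper name `ensure_openai_key` and its `prompt` parameter are hypothetical and are not part of any library.

```python
import os
from getpass import getpass
from typing import Callable


def ensure_openai_key(prompt: Callable[[str], str] = getpass) -> str:
    """Return the key from the environment, prompting only if it is missing.

    `ensure_openai_key` is a hypothetical helper written for this notebook.
    """
    key = os.environ.get("OPENAI_API_KEY")
    if not key:
        # getpass hides the typed value, so the key never appears in cell output
        key = prompt("OpenAI API key: ")
        os.environ["OPENAI_API_KEY"] = key
    return key
```

Calling `ensure_openai_key()` once at the top of the notebook would then take the place of the hardcoded assignment.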

## GPT List Insert

#### Data Prep
Split the data into sub-documents that we can insert into the index.

In [None]:
from llama_index.langchain_helpers.text_splitter import TokenTextSplitter
from llama_index import SimpleDirectoryReader, Document

In [15]:
document = SimpleDirectoryReader('data').load_data()[0]
text_splitter = TokenTextSplitter(separator=" ", chunk_size=2048, chunk_overlap=20)
text_chunks = text_splitter.split_text(document.text)
doc_chunks = [Document(t) for t in text_chunks]
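To make the roles of `chunk_size` and `chunk_overlap` concrete, here is a rough sketch of overlapping chunking. It assumes whitespace-separated words stand in for model tokens, and `split_with_overlap` is a hypothetical function, so it illustrates the idea rather than reproducing how `TokenTextSplitter` is implemented.

```python
from typing import List


def split_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> List[str]:
    """Split text into chunks of at most `chunk_size` words, repeating the last
    `chunk_overlap` words of each chunk at the start of the next one.

    Assumption: whitespace-separated words stand in for model tokens.
    """
    if chunk_size <= 0 or not 0 <= chunk_overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= chunk_overlap < chunk_size")
    words = text.split()
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this window already reached the end of the text
    return chunks


demo_chunks = split_with_overlap("a b c d e f g h i j", chunk_size=4, chunk_overlap=1)
print(demo_chunks)  # ['a b c d', 'd e f g', 'g h i j']
```

Each chunk starts with the last word of the previous one, which is the kind of shared context that the overlap of 20 in the cell above is meant to preserve between neighbouring chunks.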

#### Insert into Index and Query

In [16]:
from llama_index import GPTListIndex, SimpleDirectoryReader
from IPython.display import Markdown, display

In [17]:
# initialize blank list index
index = GPTListIndex([])

In [None]:
# insert new document chunks
for doc_chunk in doc_chunks:
    index.insert(doc_chunk)

In [19]:
# query
query_engine = index.as_query_engine()
response = query_engine.query("What did the author do growing up?")

> Starting query: What did the author do growing up?


In [21]:
display(Markdown(f"<b>{response}</b>"))

<b>

The author worked on writing and programming and also took classes at Harvard and RISD. The author also worked at a company called Interleaf and wrote a book on Lisp. The author decided to write another book on Lisp and also started a company called Viaweb with the goal of putting art galleries online. The company eventually pivoted to focus on creating software to build online stores. The author also worked on making ecommerce software in the second half of the 90s.

The author also worked on building the infrastructure of the web and wrote essays that were published online. The author also worked on spam filters and bought a building in Cambridge to use as an office. The author also had a dinner party every Thursday night.

The author also worked on marketing at a Boston VC firm, writing essays, and building the infrastructure of the web. The author also started the Y Combinator program to help fund startups.</b>
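The insert-then-query pattern used with the list index can be pictured with a toy model. This is only a conceptual sketch: `ToyListIndex` and `keep_matching` are hypothetical names, a simple word-overlap function stands in for the LLM call, and the assumption that a list index keeps nodes in insertion order and visits all of them at query time reflects its default behaviour as I understand it, not the library's actual code.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class ToyListIndex:
    """Hypothetical stand-in for a list index: nodes kept in insertion order."""

    nodes: List[str] = field(default_factory=list)

    def insert(self, text: str) -> None:
        # Inserting is just an append; existing nodes are left untouched.
        self.nodes.append(text)

    def query(self, question: str, refine: Callable[[str, str, str], str]) -> str:
        # Visit every node in order, letting `refine` update a running answer.
        answer = ""
        for text in self.nodes:
            answer = refine(answer, text, question)
        return answer


def keep_matching(answer: str, text: str, question: str) -> str:
    """Stand-in for an LLM call: keep chunks that share a word with the question."""
    shared = set(text.lower().split()) & set(question.lower().split())
    return f"{answer} {text}".strip() if shared else answer


toy_index = ToyListIndex()
for chunk in ["the author wrote stories", "lunch was soup", "the author programmed"]:
    toy_index.insert(chunk)

toy_answer = toy_index.query("what did the author do", keep_matching)
print(toy_answer)  # the author wrote stories the author programmed
```

Because every node contributes to the running answer in this model, an answer built this way tends to grow with the number of inserted chunks.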

## GPT Tree Insert

#### Data Prep
Split the data into sub-documents that we can insert into the index.

In [2]:
from llama_index.langchain_helpers.text_splitter import TokenTextSplitter
from llama_index import SimpleDirectoryReader, Document

In [2]:
document = SimpleDirectoryReader('data').load_data()[0]
text_splitter = TokenTextSplitter(separator=" ", chunk_size=256, chunk_overlap=20)
text_chunks = text_splitter.split_text(document.get_text())
doc_chunks = [Document(t) for t in text_chunks]

# NOTE: keep only the first 30 chunks to save on cost
doc_chunks = doc_chunks[:30]

#### Insert into Index and Query

In [3]:
from llama_index import GPTTreeIndex, SimpleDirectoryReader
from IPython.display import Markdown, display

In [4]:
# initialize blank tree index
tree_index = GPTTreeIndex([])

> Building index from nodes: 0 chunks


In [None]:
# insert new document chunks
for i, doc_chunk in enumerate(doc_chunks, start=1):
    # count from 1 so the final progress message reads N/N
    print(f"Inserting {i}/{len(doc_chunks)}")
    tree_index.insert(doc_chunk)

In [None]:
# query
query_engine = tree_index.as_query_engine()
response_tree = query_engine.query("What did the author do growing up?")

In [9]:
display(Markdown(f"<b>{response_tree}</b>"))

<b>The author wrote stories and tried to program computers.</b>