# **Connecting ChatGPT with Your Own Data using LlamaIndex [<sup>source</sup>](https://levelup.gitconnected.com/connecting-chatgpt-with-your-own-data-using-llamaindex-663844c06653)**

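```python
# Requirements
# LlamaIndex: a central interface for connecting your LLMs with external data.
# Note: this notebook uses the legacy llama_index API (GPTVectorStoreIndex,
# LLMPredictor, PromptHelper); later releases reorganized these classes,
# so an older llama_index release may be needed to run it unchanged.
! pip install llama_index

# LangChain: a framework for developing applications powered by language models
! pip install langchain

# pypdf: PDF parsing, for PDF files in the documents folder
! pip install pypdf

# python-dotenv: loads OPENAI_API_KEY from a .env file (imported below)
! pip install python-dotenv
```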


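Set the `OPENAI_API_KEY` environment variable. The cell below reads it from a `.env` file in the notebook's directory containing a line of the form `OPENAI_API_KEY=<your key>`.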

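```python
import os
from dotenv import load_dotenv

# load the variables defined in the .env file into the environment
load_dotenv()

# fail early if the key is still missing
assert os.getenv("OPENAI_API_KEY"), "OPENAI_API_KEY is not set"
```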

## Indexing the Documents

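```python
from llama_index import (SimpleDirectoryReader, GPTVectorStoreIndex,
                         LLMPredictor, PromptHelper, ServiceContext)
from langchain.chat_models import ChatOpenAI

# GPTVectorStoreIndex is the older name of VectorStoreIndex


def index_documents(folder):
    """Index the documents in `folder` so that ChatGPT can use them to answer questions."""

    max_input_size = 4096     # context window of the model, in tokens
    num_outputs = 512         # tokens reserved for the generated answer
    max_chunk_overlap = 20    # token overlap between adjacent chunks
    chunk_size_limit = 600    # maximum tokens per chunk

    prompt_helper = PromptHelper(max_input_size,
                                 num_outputs,
                                 max_chunk_overlap,
                                 chunk_size_limit=chunk_size_limit)

    llm_predictor = LLMPredictor(
        llm=ChatOpenAI(temperature=0.7,
                       model_name="gpt-3.5-turbo",
                       max_tokens=num_outputs)
    )

    # bundle the LLM and prompt settings so the index actually uses them
    service_context = ServiceContext.from_defaults(llm_predictor=llm_predictor,
                                                   prompt_helper=prompt_helper)

    documents = SimpleDirectoryReader(folder).load_data()

    index = GPTVectorStoreIndex.from_documents(
        documents,
        service_context=service_context)

    # write the index to disk so it can be reloaded without re-indexing
    index.storage_context.persist(persist_dir="resources/")


# forward slashes work on Windows, macOS and Linux
index_documents("resources/ingest/")
```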
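The constants in `index_documents()` share one token budget: the model's context window (`max_input_size`) has to hold both the prompt and the tokens reserved for the answer (`num_outputs`). The sketch below shows that arithmetic only as a rough illustration. The helpers `prompt_budget` and `chunks_that_fit` are hypothetical, not LlamaIndex functions, and they ignore the tokens taken by the question and the prompt template, which the real `PromptHelper` also has to account for.

```python
def prompt_budget(max_input_size: int, num_outputs: int) -> int:
    """Tokens left for the prompt after reserving room for the answer."""
    return max_input_size - num_outputs


def chunks_that_fit(max_input_size: int, num_outputs: int, chunk_size_limit: int) -> int:
    """Number of full-size chunks that fit in the prompt budget.

    Simplification: ignores the question and prompt-template tokens.
    """
    return prompt_budget(max_input_size, num_outputs) // chunk_size_limit


print(prompt_budget(4096, 512))          # 3584
print(chunks_that_fit(4096, 512, 600))   # 5
```

With the notebook's values, 3584 tokens remain for the prompt, which is room for at most five 600-token chunks before anything else is added.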



* The `PromptHelper` class helps fill in the prompt, split the text, and insert context information while respecting the model's token limits.

* You create an instance of the `LLMPredictor` class, a wrapper around an **LLMChain** from LangChain. It uses OpenAI's `gpt-3.5-turbo` chat model.

* The `SimpleDirectoryReader` class loads the documents from the folder passed to `index_documents()`; these are the documents that get indexed.

* The `from_documents()` method of `GPTVectorStoreIndex` splits the documents into chunks, computes an embedding for each chunk, and stores the results in a vector store.

* Once indexing is done, you persist the index to disk with `storage_context.persist()`.

* Persisting the index lets you query the LLM later without re-indexing the documents. The index is saved as several JSON files in `persist_dir` (here `resources/`), including `vector_store.json`, which holds the embeddings.
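To make the chunk-splitting idea concrete, here is a simplified sketch of splitting text into overlapping chunks. It is not LlamaIndex's splitter: it counts whitespace-separated words instead of model tokens, and the function name `split_with_overlap` is hypothetical. Its two parameters play the roles of `chunk_size_limit` and `max_chunk_overlap`.

```python
def split_with_overlap(text: str, chunk_size: int, overlap: int) -> list[str]:
    """Split text into chunks of at most `chunk_size` words, where each chunk
    repeats the last `overlap` words of the previous one.

    Words stand in for tokens here, which is a simplification.
    """
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this chunk already reaches the end of the text
    return chunks


sample = "one two three four five six seven eight nine ten"
for chunk in split_with_overlap(sample, chunk_size=4, overlap=1):
    print(chunk)
```

The overlap keeps a sentence that straddles a chunk boundary partly visible in both chunks, at the cost of some duplicated tokens.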

## Asking Questions

```python
from llama_index import StorageContext, load_index_from_storage


def my_chatGPT_bot(input_text):
    """Answer a question using the index persisted in resources/."""

    # rebuild the storage context from the persisted files and load the index;
    # without a service_context argument, default LLM settings are used here
    storage_context = StorageContext.from_defaults(persist_dir="resources/")
    index = load_index_from_storage(storage_context)

    # create a query engine to ask questions against the index
    query_engine = index.as_query_engine()
    response = query_engine.query(input_text)
    return response.response


my_chatGPT_bot('What is the current population of Singapore?')
```
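Conceptually, a vector-store query engine embeds the question, retrieves the stored chunks whose embeddings are most similar, and passes those chunks to the LLM as context. The sketch below illustrates only the retrieval step, using cosine similarity. It is a toy: the three-dimensional vectors are made up for illustration (real OpenAI embeddings have far more dimensions), and `top_k_chunks` is a hypothetical helper, not part of LlamaIndex.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k_chunks(query_vec: list[float],
                 store: list[tuple[str, list[float]]],
                 k: int = 2) -> list[str]:
    """Return the texts of the k stored chunks most similar to the query vector."""
    ranked = sorted(store,
                    key=lambda item: cosine_similarity(query_vec, item[1]),
                    reverse=True)
    return [text for text, _ in ranked[:k]]


# Toy "vector store": (chunk text, made-up embedding) pairs
store = [
    ("Singapore population statistics", [0.9, 0.1, 0.0]),
    ("Singapore climate overview", [0.2, 0.9, 0.1]),
    ("Recipe for chicken rice", [0.0, 0.2, 0.9]),
]
question_vec = [0.8, 0.3, 0.0]  # pretend embedding of the population question

print(top_k_chunks(question_vec, store, k=1))
```

Only the retrieved chunks, not the whole document collection, are sent to the model, which is what keeps the prompt within the token budget.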