# LlamaIndex

In this tutorial, we use `LlamaIndex` as the framework for building a retrieval-based QA system. `LlamaIndex` provides a unified interface to many large language models.

The idea of retrieval-based QA is simple: ahead of time, we build a similarity-based lookup structure over the documents, called an `index`. At query time, the index is searched for the documents most relevant to the question, and those documents are used to answer it.
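
To make the idea concrete, here is a minimal sketch of retrieval by similarity in plain Python. It is not how `LlamaIndex` is implemented: the bag-of-words `embed` function stands in for a neural embedding model, and the helper names (`build_index`, `retrieve`) and the sample sentences are made up for illustration.

```python
import math
from collections import Counter

def embed(text):
    """Toy embedding: lowercase word counts (a stand-in for a neural embedding model)."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def build_index(docs):
    """Done ahead of time: embed every document once and keep the vectors."""
    return [(doc, embed(doc)) for doc in docs]

def retrieve(index, query, top_k=1):
    """Done at query time: rank the stored documents by similarity to the query."""
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [doc for doc, _ in ranked[:top_k]]

# Hypothetical documents, for illustration only
docs = [
    "the library opens at nine in the morning",
    "the cafeteria serves lunch at noon",
    "the gym closes at ten in the evening",
]
index = build_index(docs)
best = retrieve(index, "when does the library open")
print(best)
```

In the rest of the tutorial, the retrieved text is additionally passed to the language model, which writes the final answer.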

In [1]:
# Comment this out if you are using puffer or tokyo
import os
os.environ['http_proxy']  = 'http://192.41.170.23:3128'
os.environ['https_proxy'] = 'http://192.41.170.23:3128'

In [2]:
# Print these versions, since the libraries are changing quickly at the moment
from watermark import watermark
print(watermark(packages="torch,langchain,transformers,llama_index"))

torch       : 1.13.0
langchain   : 0.0.171
transformers: 4.21.3
llama_index : 0.6.7



In [3]:
from transformers.utils import logging
logging.set_verbosity(40)

## 1. Define model

First, we load the large language model that will be used for question answering. We use `flan-t5-large`, an instruction-tuned T5 model that works quite well for this task.

The goal of LlamaIndex is to provide a toolkit of data structures that can organize external information in a manner that is easily compatible with the prompt limitations of an LLM. Therefore LLMs are always used to construct the final answer. Depending on the type of index being used, LLMs may also be used during index construction, insertion, and query traversal.

`LlamaIndex` uses `Langchain`'s `LLM` and `LLMChain` modules to define the underlying abstraction. It adds a wrapper class, `LLMPredictor`, for integration into `LlamaIndex`.

It also introduces a `PromptHelper` class, which lets the user explicitly set certain constraint parameters, such as the maximum input size (default is 4096 for davinci models), the number of generated output tokens, the maximum chunk overlap, and more.

By default, it uses OpenAI's `text-davinci-003` model, but you can customize the underlying LLM.
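
As a rough sketch of why these constraints matter: the prompt template, the retrieved context, and the generated answer all have to fit within the model's input limit. The function below is a hypothetical helper, not part of `LlamaIndex`, and the `prompt_tokens=40` value is an assumed size for the prompt template.

```python
def available_context_tokens(max_input_size, num_output, prompt_tokens):
    """Tokens left for retrieved context after reserving room for
    the prompt template and the generated answer."""
    remaining = max_input_size - num_output - prompt_tokens
    if remaining <= 0:
        raise ValueError("prompt and output leave no room for context")
    return remaining

# 4096-token window, 256 tokens reserved for the answer,
# and an assumed 40 tokens taken by the prompt template
budget = available_context_tokens(max_input_size=4096, num_output=256, prompt_tokens=40)
print(budget)  # 3800
```

Any retrieved text beyond this budget has to be split into smaller chunks or dropped before it reaches the model.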

In [4]:
from llama_index import LLMPredictor
from langchain.llms.base import LLM
from transformers import pipeline
import torch

In [6]:
from typing import Mapping, Optional, Any, List
from langchain.callbacks.manager import CallbackManagerForLLMRun

class CustomLLM(LLM):
    
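    n: int  # maximum number of characters of generated text to return
    model_name = "google/flan-t5-large"
    # device=1 selects the second GPU; use device=0 (first GPU) or device=-1 (CPU) to match your machine
    pipeline = pipeline("text2text-generation", model=model_name, device=1, model_kwargs={"torch_dtype":torch.bfloat16})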
        
    @property
    def _llm_type(self) -> str:
        return "custom"
    
    def _call(
        self,
        prompt: str,
        stop: Optional[List[str]] = None,
        run_manager: Optional[CallbackManagerForLLMRun] = None,
    ) -> str:
        if stop is not None:
            raise ValueError("stop kwargs are not permitted.")
        
        out = self.pipeline(prompt, max_length=9999)[0]["generated_text"]
        return out[:self.n]
    
    @property
    def _identifying_params(self) -> Mapping[str, Any]:
        """Get the identifying parameters."""
        return {"n": self.n}

In [7]:
llm_predictor = LLMPredictor(llm=CustomLLM(n=500))

## 2. Define data

Here, we use `SimpleDirectoryReader` to read every file in a folder and convert it into `LlamaIndex` documents.

In [8]:
from llama_index import SimpleDirectoryReader

documents = SimpleDirectoryReader('txt').load_data()

In [9]:
# documents

## 3. Embeddings

Next, we define the embedding model used to embed our documents. There are many options; here we simply use `HuggingFaceEmbeddings`.

Note: there are many good embedding models. For example, try `hkunlp/instructor-base`.

In [10]:
from langchain.embeddings.huggingface import HuggingFaceEmbeddings
from llama_index import LangchainEmbedding

hfemb = HuggingFaceEmbeddings()
embed_model = LangchainEmbedding(hfemb)

# Note: HuggingFaceEmbeddings uses sentence-transformers under the hood

## 4. Create index

Next, we combine everything into a `ServiceContext`, which specifies the language model, the embedding model, and the prompt helper.

Here the index is essentially a vector store: it holds an embedding for each document chunk, and similarity to the query is computed at query time.

PS: There are many other index types, such as tree and graph indexes. I have not yet fully understood the different types and their advantages and limitations. Please let me know if you figure it out!

In [11]:
from llama_index import ServiceContext, PromptHelper

max_input_size = 4096 #max input size
num_output     = 256 #number of output tokens
max_chunk_overlap = 20 #chunk overlap
prompt_helper = PromptHelper(max_input_size, num_output, max_chunk_overlap)

service_context = ServiceContext.from_defaults(llm_predictor=llm_predictor, embed_model=embed_model, prompt_helper=prompt_helper)
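
To illustrate what chunk overlap means, here is a minimal sketch of splitting a token sequence into overlapping chunks. `split_with_overlap` is a hypothetical helper written for this tutorial, not the splitter `LlamaIndex` uses internally, and the integer "tokens" are placeholders.

```python
def split_with_overlap(tokens, chunk_size, overlap):
    """Split a token list into chunks of at most chunk_size tokens,
    where each chunk repeats the last `overlap` tokens of the previous one."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # the last chunk already reaches the end
    return chunks

tokens = list(range(10))  # placeholder tokens 0..9
chunks = split_with_overlap(tokens, chunk_size=4, overlap=1)
print(chunks)  # [[0, 1, 2, 3], [3, 4, 5, 6], [6, 7, 8, 9]]
```

The overlap keeps a sentence that straddles a chunk boundary from being cut off from its context in both chunks.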

In [12]:
from llama_index import GPTVectorStoreIndex

index = GPTVectorStoreIndex.from_documents(documents, service_context=service_context)

## 5. Save and load

By default, the index is stored in memory. For production, we have to persist it to disk:

In [13]:
index.storage_context.persist(persist_dir = 'Store')

Now let's load it back from disk:

In [14]:
from llama_index import StorageContext, load_index_from_storage

# rebuild storage context
storage_context = StorageContext.from_defaults(persist_dir = 'Store')
# load index
index = load_index_from_storage(storage_context, service_context=service_context)

## 6. Query time!

Finally, we can query the index. Yay!

In [15]:
query_engine = index.as_query_engine()

In [16]:
response = query_engine.query("What is AIT")
print(response)

The Asian Institute of Technology


In [17]:
response = query_engine.query("What is AIT's vision")
print(response)

Transforming AIT to be a respectable international graduate institution whose research and education contribute to the development of Asia


In [18]:
response = query_engine.query("When was AIT found?")
print(response)

1959


In [19]:
response = query_engine.query("Where is AIT located at?")
print(response)

Bangkok, Thailand


In [20]:
response = query_engine.query("Hello, I want a coffee!")  #try irrelevant question
print(response)

None


In [21]:
response = query_engine.query("Is machine learning taught in AIT?")
print(response)

yes


In [22]:
response = query_engine.query("Is medicine taught in AIT?")
print(response)

no


In [23]:
response = query_engine.query("What is the preferred background to study in DSAI")
print(response) 

Computer Science/Computer Engineering/ICT


In [24]:
response = query_engine.query("What are the required courses in DSAI")
print(response) 

Data Modeling and Management Machine Learning Business Intelligence and Analytics Computer Programming for Data Science and Artificial Intelligence Artificial Intelligence: Natural Language Understanding Elective courses Artificial Intelligence: Knowledge Representation and Reasoning
