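To query the chatbot, follow these steps:
1. Install all necessary requirements
2. Get your personal Hugging Face access token
3. Load the Llama 2 7B Chat model from Hugging Face
4. Read from the stored Qdrant database
5. Query the model!

The remaining cells (loading the source data, building the indexes, and uploading them to Qdrant) only need to be run once, when the collection is first built.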

Installs all necessary requirements

In [None]:
!pip install llama-index
!pip install transformers accelerate bitsandbytes
!pip install llama-index-readers-web
!pip install llama-index-llms-huggingface
!pip install llama-index-embeddings-huggingface
!pip install llama-index-program-openai
!pip install llama-index-agent-openai
!pip install InstructorEmbedding
!pip install llama-index-vector-stores-qdrant qdrant_client
!pip install fastembed
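If an install fails silently, later cells break with confusing import errors. A minimal standard-library sketch that reports which modules cannot be found; the import names listed are our assumptions about each package's top-level module, and `missing_modules` is a hypothetical helper.

```python
import importlib.util

# Assumed top-level import names for the packages installed above
REQUIRED_MODULES = [
    "llama_index",
    "transformers",
    "accelerate",
    "bitsandbytes",
    "qdrant_client",
    "fastembed",
    "InstructorEmbedding",
]


def missing_modules(names):
    """Return the module names that cannot be located on the import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules(REQUIRED_MODULES)
print("All packages found" if not missing else f"Missing: {missing}")
```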

Loads our source data

In [None]:
from llama_index.core import SimpleDirectoryReader
documents = SimpleDirectoryReader("./data").load_data()

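Gets your personal Hugging Face access token (stored as the Colab secret HF_TOKEN). Llama 2 is a gated model, so the account that owns the token must have been granted access to it.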

In [2]:
from google.colab import userdata
hf_token = userdata.get('HF_TOKEN')
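`google.colab.userdata` only exists inside Colab. A minimal sketch, assuming you may also run the notebook elsewhere, that falls back to an `HF_TOKEN` environment variable; the helper name `get_hf_token` is hypothetical.

```python
import os


def get_hf_token(name: str = "HF_TOKEN") -> str:
    """Read a token from Colab secrets if available, else from the environment."""
    try:
        from google.colab import userdata  # only importable inside Colab
        return userdata.get(name)  # errors here (e.g. missing secret) are not swallowed
    except ImportError:
        pass  # not running in Colab: fall back to an environment variable
    token = os.environ.get(name)
    if not token:
        raise RuntimeError(f"Set {name} as a Colab secret or environment variable")
    return token
```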

Loads the Llama 2 7B Chat model from Hugging Face, quantized to 4 bits with bitsandbytes

In [None]:
import torch
from transformers import BitsAndBytesConfig
from llama_index.core.prompts import PromptTemplate
from llama_index.llms.huggingface import HuggingFaceLLM

# 4-bit NF4 quantization to reduce the GPU memory the 7B model needs
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,  # run computations in fp16
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,  # also quantize the quantization constants
)

llm = HuggingFaceLLM(
    model_name="meta-llama/Llama-2-7b-chat-hf",
    tokenizer_name="meta-llama/Llama-2-7b-chat-hf",
    # Wrap each query in Llama 2's [INST] chat format
    query_wrapper_prompt=PromptTemplate("<s> [INST] {query_str} [/INST] "),
    context_window=3900,  # slightly below Llama 2's 4096-token limit
    model_kwargs={"token": hf_token, "quantization_config": quantization_config},
    tokenizer_kwargs={"token": hf_token},
    device_map="auto",
)
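The `query_wrapper_prompt` above covers the single-turn case with no system prompt. A sketch of the multi-turn layout commonly documented for Llama 2 chat models, with an optional `<<SYS>>` block folded into the first user turn; `format_llama2_prompt` is a hypothetical helper, and `<s>`/`</s>` are written as literal text here for readability.

```python
def format_llama2_prompt(user_message, system_prompt=None, history=()):
    """Build a Llama 2 chat prompt string.

    history: earlier (user, assistant) pairs, oldest first.
    The optional system prompt is folded into the first user turn.
    """
    turns = list(history) + [(user_message, None)]
    parts = []
    for i, (user, assistant) in enumerate(turns):
        if i == 0 and system_prompt:
            user = f"<<SYS>>\n{system_prompt}\n<</SYS>>\n\n{user}"
        if assistant is None:
            # Final turn: leave the assistant slot open for the model to fill
            parts.append(f"<s>[INST] {user} [/INST]")
        else:
            parts.append(f"<s>[INST] {user} [/INST] {assistant} </s>")
    return "".join(parts)


print(format_llama2_prompt("Does Franklin offer online programs?", system_prompt="Answer briefly."))
```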

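Sets the global LLM and embedding model used for indexing and querying

In [None]:
from llama_index.core import Settings

# Use Llama 2 for generation and a local Instructor model for embeddings;
# the "local:" prefix loads the embedding model from Hugging Face onto this machine
Settings.llm = llm
Settings.embed_model = 'local:hkunlp/instructor-large'

# Sanity check: embed one short string and print the vector length
sample_embedding = Settings.embed_model.get_text_embedding("Franklin University")
print(len(sample_embedding))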
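Conceptually, a vector index embeds each chunk, embeds the query with the same model, and returns the chunks whose vectors are most similar. A dependency-free sketch of that retrieval step using cosine similarity on toy 2-D vectors; real instructor-large vectors are far longer, and the vector store (in-memory or Qdrant) performs this search for you.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, doc_vecs, k=2):
    """Return (index, score) pairs for the k document vectors most similar to the query."""
    scored = [(i, cosine_similarity(query_vec, v)) for i, v in enumerate(doc_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


docs = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
print(top_k([1.0, 0.1], docs, k=2))
```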

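Builds in-memory indexes from the source data: the vector index embeds every chunk, while the summary index keeps the chunks in order without embedding them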

In [None]:
from llama_index.core import VectorStoreIndex
from llama_index.core.indices import SummaryIndex

vector_index = VectorStoreIndex.from_documents(documents)
summary_index = SummaryIndex.from_documents(documents)
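Before embedding, `from_documents` splits each document into smaller overlapping chunks (nodes). A simplified sketch of overlapping chunking on whitespace-separated words; LlamaIndex's default splitter counts tokens and respects sentence boundaries, and `chunk_words` is a hypothetical name.

```python
def chunk_words(text, chunk_size=8, overlap=2):
    """Split text into windows of chunk_size words, each sharing overlap words with the previous one."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last window already reaches the end of the text
    return chunks


print(chunk_words("Franklin University offers online degree programs for working adults in many fields", 6, 2))
```

Overlap keeps a sentence that straddles a chunk boundary retrievable from either side, at the cost of storing some text twice.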

Tests a query against the in-memory vector index

In [None]:
import logging
import sys
from llama_index.core.response.notebook_utils import display_response

# Log LlamaIndex activity to the notebook output; basicConfig already
# attaches a stdout handler, so adding a second one would print every line twice
logging.basicConfig(stream=sys.stdout, level=logging.INFO)

query_engine = vector_index.as_query_engine(response_mode="compact")
response = query_engine.query("Does Franklin have veteran tuition benefits?")

display_response(response)
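With `response_mode="compact"`, the retrieved chunks are concatenated so that as many as fit go into each LLM prompt, which means fewer LLM calls than the `refine` mode. A rough sketch of that greedy packing, assuming word counts as a stand-in for a real tokenizer; `pack_chunks` is a hypothetical name.

```python
def pack_chunks(chunks, budget, count_tokens=lambda s: len(s.split())):
    """Greedily group chunks so each group's token count stays within budget.

    A single chunk larger than the budget is kept alone in its own group.
    """
    groups, current, used = [], [], 0
    for chunk in chunks:
        size = count_tokens(chunk)
        if current and used + size > budget:
            groups.append(current)  # close the full group, start a new one
            current, used = [], 0
        current.append(chunk)
        used += size
    if current:
        groups.append(current)
    return groups


print(pack_chunks(["a b c", "d e", "f g h i", "j"], budget=5))
```

Each group would become one prompt; with several groups, the answer from the first is refined with the next.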

Stores the indexed data in a Qdrant Cloud cluster (only needs to be run once)

In [None]:
from llama_index.vector_stores.qdrant import QdrantVectorStore
from qdrant_client import QdrantClient
from llama_index.core import StorageContext, VectorStoreIndex


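# Read the API key from a Colab secret instead of hard-coding it in the notebook
# (assumes a secret named QDRANT_API_KEY; userdata was imported in the token cell)
qdrant_client = QdrantClient(
    url='https://1d752ae2-4e0f-4101-ae0f-b59cd212e480.us-east4-0.gcp.cloud.qdrant.io',
    api_key=userdata.get('QDRANT_API_KEY'),
)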

vector_store = QdrantVectorStore(
    client=qdrant_client,
    collection_name="mycollection",
    enable_hybrid=True,
    batch_size=20
)

storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = VectorStoreIndex.from_documents(
    documents=documents,
    storage_context=storage_context
)
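The `batch_size=20` argument controls how many points are sent to Qdrant per upload request, which keeps individual requests small. The grouping itself is simple; a generic sketch with a hypothetical `batched` helper (Python 3.12+ also ships `itertools.batched`, which yields tuples instead of lists).

```python
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batched(items: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive lists of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    batch: List[T] = []
    for item in items:
        batch.append(item)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch  # final, possibly smaller, batch


print([len(b) for b in batched(range(45), 20)])
```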

Reads from the stored Qdrant database

In [None]:
from llama_index.core import Settings, VectorStoreIndex
from llama_index.vector_stores.qdrant import QdrantVectorStore
from qdrant_client import QdrantClient

# Use the same LLM and embedding model that were used to build the collection
Settings.llm = llm
Settings.embed_model = 'local:hkunlp/instructor-large'

qdrant_client = QdrantClient(
    url="https://1d752ae2-4e0f-4101-ae0f-b59cd212e480.us-east4-0.gcp.cloud.qdrant.io",
    api_key=userdata.get("QDRANT_API_KEY"),  # assumes a Colab secret named QDRANT_API_KEY
)

# Reconnect to the existing collection without re-embedding the documents
vector_store = QdrantVectorStore(client=qdrant_client, collection_name="mycollection", enable_hybrid=True)
index = VectorStoreIndex.from_vector_store(vector_store=vector_store)
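With `enable_hybrid=True`, the collection stores sparse (keyword-style) vectors alongside the dense embeddings, so a hybrid query can run both searches and merge the ranked results; in LlamaIndex, hybrid retrieval is typically requested at query time (for example with `vector_store_query_mode="hybrid"`). The sketch below shows one common merge strategy, min-max normalizing each result list and blending them with a weight `alpha`; the fusion function the Qdrant store actually applies may differ, and all names here are hypothetical.

```python
def min_max_normalize(scores):
    """Scale a {doc_id: score} mapping into the range [0, 1]."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def fuse_scores(dense, sparse, alpha=0.5, top_k=3):
    """Blend normalized dense and sparse scores; a doc missing from one list scores 0 there."""
    dense_n = min_max_normalize(dense)
    sparse_n = min_max_normalize(sparse)
    fused = {
        doc: alpha * dense_n.get(doc, 0.0) + (1 - alpha) * sparse_n.get(doc, 0.0)
        for doc in set(dense_n) | set(sparse_n)
    }
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)[:top_k]


dense_hits = {"a": 0.9, "b": 0.5, "c": 0.1}  # e.g. cosine similarities
sparse_hits = {"b": 12.0, "d": 3.0}  # e.g. keyword-match scores on a different scale
print(fuse_scores(dense_hits, sparse_hits))
```

Normalizing first matters because dense and sparse scores live on different scales; without it, the sparse scores would dominate.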

In [9]:
from llama_index.core.response.notebook_utils import display_response

# "context" mode retrieves relevant chunks for each message and passes them to the LLM
chat_engine = index.as_chat_engine(chat_mode="context")
prompt = input("Ask me a question about Franklin University!  ")
response = chat_engine.chat(prompt)

display_response(response)

Ask me a question about Franklin University!  How long does it take to complete a master's program in Computer Science at Franklin University?


**`Final Response:`** According to the information provided on the Franklin University website, the Master of Science in Computer Science program at Franklin University can be completed in as little as 12 months (1 year) of full-time study. However, the program can also be completed on a part-time basis, which can take up to 24 months (2 years) to complete.
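In "context" chat mode, the engine retrieves chunks for each user message, places them in a system message, and sends that together with the chat history to the LLM. A plain-Python sketch of that message assembly; the function name, message dict shape, and instruction wording are hypothetical and not LlamaIndex's actual template.

```python
def build_context_messages(question, retrieved_chunks, history=()):
    """Assemble chat messages: a system message with retrieved context,
    then prior turns, then the new user question."""
    context = "\n\n".join(retrieved_chunks)
    system = (
        "Answer using the context below. If the context does not contain "
        "the answer, say so.\n\n"
        f"Context:\n{context}"
    )
    messages = [{"role": "system", "content": system}]
    messages.extend(history)  # earlier user/assistant turns, oldest first
    messages.append({"role": "user", "content": question})
    return messages


for message in build_context_messages(
    "Does Franklin have veteran tuition benefits?",
    ["(retrieved chunk 1)", "(retrieved chunk 2)"],
):
    print(message["role"], "->", message["content"][:40])
```

Because retrieval runs on every turn, each answer is grounded in chunks relevant to the latest question, while the history keeps follow-up questions coherent.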