In [1]:
# !pip install llama-index  # LLM interface: connects an LLM to your private data
# !pip install langchain    # framework for building applications with LLMs
# !pip install jinja2

In [1]:
# !pip install ipywidgets

# Download your custom dataset
We are going to use a GitHub repository as our knowledge base. You can substitute your own custom dataset.

In [1]:
! git clone https://github.com/danishmeh/chatbottest/

Cloning into 'chatbottest'...


# Define the functions
The following code defines the two functions we need: one to construct the index and one to query it.

In [2]:
from llama_index import SimpleDirectoryReader, GPTSimpleVectorIndex, LLMPredictor, PromptHelper, ServiceContext
from langchain import OpenAI
import os
from IPython.display import Markdown, display

def construct_index(directory_path):  # directory_path: folder holding the source documents (here chatbottest/Data)
    # set maximum input size
    max_input_size = 4096
    # set number of output tokens
    num_outputs = 2000
    # set maximum chunk overlap
    max_chunk_overlap = 20
    # set chunk size limit
    chunk_size_limit = 600 

    # define LLM
    llm_predictor = LLMPredictor(llm=OpenAI(temperature=0.5, model_name="text-davinci-003", max_tokens=num_outputs))
    prompt_helper = PromptHelper(max_input_size, num_outputs, max_chunk_overlap, chunk_size_limit=chunk_size_limit)
 
    documents = SimpleDirectoryReader(directory_path).load_data()
    
    # bundle the LLM and prompt settings into a service context (API updated April 2023)
    service_context = ServiceContext.from_defaults(llm_predictor=llm_predictor, prompt_helper=prompt_helper)

    index = GPTSimpleVectorIndex.from_documents(
        documents, service_context=service_context
    )

    index.save_to_disk('index.json')  # persist the index (document chunks and their embedding vectors) as JSON

    return index

def ask_bot():
    index = GPTSimpleVectorIndex.load_from_disk('index.json')
    while True: 
        query = input("Ask your question: ")
        response = index.query(query, response_mode="compact")
        display(Markdown(f"Response: <b>{response.response}</b>"))
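
The loop in `ask_bot` keeps prompting until the kernel is interrupted. A minimal sketch of one way to add an exit condition follows. The names `chat_loop`, `answer_fn`, `read_fn`, `write_fn`, and `exit_commands` are hypothetical and not part of the original notebook; the answering function is passed in so the loop can be tried without an index.

```python
def chat_loop(answer_fn, read_fn=input, write_fn=print, exit_commands=("exit", "quit")):
    """Ask questions until the user types an exit command; return the number answered."""
    answered = 0
    while True:
        query = read_fn("Ask your question (or 'exit' to stop): ").strip()
        if query.lower() in exit_commands:
            break  # leave the loop cleanly instead of interrupting the kernel
        if not query:
            continue  # ignore empty input rather than sending it to the model
        write_fn(f"Response: {answer_fn(query)}")
        answered += 1
    return answered
```

In this notebook, `answer_fn` could be something like `lambda q: index.query(q, response_mode="compact").response`, with `index` loaded as in `ask_bot`.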
  


# Set OpenAI API Key
You need an OpenAI API key to run this code.

You can get one by [signing up](https://platform.openai.com/overview).

Then run the code below and paste your API key into the text input.

In [3]:
os.environ["OPENAI_API_KEY"] = input("Paste OpenAI API key here and enter:")
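
The cell above echoes the key as you type and stores it in the notebook output. As a hedged alternative, the sketch below reuses a key that is already in the environment and otherwise prompts with `getpass`, which hides the input. The helper name `ensure_api_key` and its parameters are hypothetical, not part of the original notebook.

```python
import os
from getpass import getpass

def ensure_api_key(env_name="OPENAI_API_KEY", prompt_fn=getpass):
    """Return the key from the environment, prompting (without echo) only if it is missing."""
    key = os.environ.get(env_name, "").strip()
    if not key:
        key = prompt_fn("Paste OpenAI API key here and press Enter: ").strip()
        if not key:
            raise ValueError("No API key provided")
        os.environ[env_name] = key  # make the key visible to libraries that read the environment
    return key
```

Calling `ensure_api_key()` once before constructing the index would have the same effect as the cell above.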

# Construct an index
Now we are ready to construct the index. This takes every file in the folder `chatbottest/Data`, splits it into chunks, and embeds each chunk with OpenAI's embeddings API.
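
To illustrate what splitting into chunks with an overlap means, here is a simplified sketch. It is not the library's actual splitter, which works on model tokens and respects text separators; it assumes whitespace-separated words stand in for tokens. The function name `chunk_tokens` is hypothetical, and the defaults mirror `chunk_size_limit = 600` and `max_chunk_overlap = 20` from `construct_index`.

```python
def chunk_tokens(tokens, chunk_size=600, overlap=20):
    """Split a token list into windows of chunk_size that share `overlap` tokens."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # this window already reaches the end of the text
    return chunks

# Assumption: words split on whitespace approximate tokens.
words = "one two three four five six seven eight nine ten".split()
for chunk in chunk_tokens(words, chunk_size=4, overlap=1):
    print(chunk)
```

Each chunk repeats the last word of the previous one, so text that straddles a chunk boundary still appears with some surrounding context.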

**Notice:** running this code will cost you credits on your OpenAI account ($0.02 for every 1,000 tokens). If you've just set up your account, the free credits that you have should be more than enough for this experiment.
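
The arithmetic behind the notice can be sketched as below. The rate is the figure quoted above and is treated as an assumption: actual prices depend on the model and change over time. The function name `estimate_cost` is hypothetical.

```python
def estimate_cost(total_tokens, usd_per_1k_tokens=0.02):
    """Estimate the charge in US dollars for a given number of tokens."""
    if total_tokens < 0:
        raise ValueError("total_tokens must be non-negative")
    return total_tokens / 1000 * usd_per_1k_tokens

print(f"${estimate_cost(1000):.4f}")  # prints $0.0200
```

The token counts that llama_index logs after each call can be passed to this function to get a rough running total.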

In [4]:
construct_index("chatbottest/Data")  # chunk, embed, and index the documents

INFO:llama_index.token_counter.token_counter:> [build_index_from_nodes] Total LLM token usage: 0 tokens
INFO:llama_index.token_counter.token_counter:> [build_index_from_nodes] Total embedding token usage: 706 tokens


<llama_index.indices.vector_store.vector_indices.GPTSimpleVectorIndex at 0x247ea8ed190>

# Ask questions
It's time to have fun and test our AI. Run the function that queries GPT and type your question into the input. 

If you've used the provided example data for your custom knowledge base, here are a few questions that you can ask:
1. How to connect to your techbot

In [5]:
ask_bot()

INFO:llama_index.token_counter.token_counter:> [query] Total LLM token usage: 748 tokens
INFO:llama_index.token_counter.token_counter:> [query] Total embedding token usage: 3 tokens


Response: <b>
Kzz01 is a coal mill fan actuator.</b>

INFO:llama_index.token_counter.token_counter:> [query] Total LLM token usage: 799 tokens
INFO:llama_index.token_counter.token_counter:> [query] Total embedding token usage: 5 tokens


Response: <b>and outputs of kzz01?

The inputs of kzz01 are setpoint (kspzz01) and feedback (kfpzz01). The outputs of kzz01 are 3phase breaker K01, setpoint (kspzz01) and feedback (kfpzz01).</b>

INFO:llama_index.token_counter.token_counter:> [query] Total LLM token usage: 785 tokens
INFO:llama_index.token_counter.token_counter:> [query] Total embedding token usage: 7 tokens


Response: <b>
The purpose of an actuator is to convert energy into motion in order to control a mechanical system. It is typically used to move or control a mechanism or system, such as opening a valve or moving a robot arm.</b>

INFO:llama_index.token_counter.token_counter:> [query] Total LLM token usage: 775 tokens
INFO:llama_index.token_counter.token_counter:> [query] Total embedding token usage: 4 tokens


Response: <b>
When the actuator is operated, it will move the fan blades to adjust the air flow in the coal mill. This will help to regulate the temperature and pressure in the coal mill.</b>

# Stop words (NLP)
Stop words are common words that are often removed from text before processing, because they usually carry little meaning on their own in natural language processing tasks.

Some common English stop words:
a, an, and, as, at, be, but, by, for, if, in, into, is, it, no, not, of, on, or,
such, that, the, their, then, there, these, they, this, to, was, will, with.

**Why remove stop words?**
Removing stop words reduces noise in the data, makes text analysis more efficient, and can improve the accuracy of natural language processing tasks such as sentiment analysis or topic modeling. Stop words carry little meaning in a given context and can be ambiguous, so removing them focuses the analysis on the more meaningful words in a text.
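
As a minimal sketch of stop word removal, the example below filters a sentence against the stop word list given above. The tokenizer is a simple regular expression chosen for illustration, and the names `STOP_WORDS` and `remove_stop_words` are hypothetical; NLP libraries ship their own, longer lists.

```python
import re

# The example stop words listed in the text above.
STOP_WORDS = {
    "a", "an", "and", "as", "at", "be", "but", "by", "for", "if", "in", "into",
    "is", "it", "no", "not", "of", "on", "or", "such", "that", "the", "their",
    "then", "there", "these", "they", "this", "to", "was", "will", "with",
}

def remove_stop_words(text, stop_words=STOP_WORDS):
    """Lowercase the text, split it into word tokens, and drop the stop words."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    return [token for token in tokens if token not in stop_words]

print(remove_stop_words("The purpose of an actuator is to convert energy into motion"))
# prints ['purpose', 'actuator', 'convert', 'energy', 'motion']
```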
