Tut1: A Customer Service Q&A 

> Traditionally, chatbot were bult for specific user intents, formed from sets of sample questions and their corresponding answers like a matching game. 

> The advent of LLMs significantly enhance chatbot functionality by linking broader intents with documents from a Knowledge Base (KB). 


![Buiding CS Chatbot Process](images/build_cs_chatbot_process.png)

### 0. Setup

In [1]:
# We'll use SeleniumURLLoader, which relies on the unstructured and selenium Python libraries
!pip install unstructured selenium 



In [2]:
from langchain.document_loaders import SeleniumURLLoader
from langchain.text_splitter import CharacterTextSplitter 
from langchain.embeddings.openai import OpenAIEmbeddings 
from langchain.vectorstores import DeepLake  
from langchain.chat_models import ChatOpenAI 
from langchain import PromptTemplate 

### 1. `Load` (Scrape) data using SeleniumURLLoader

In [3]:
urls = ['https://beebom.com/what-is-nft-explained/',
        'https://beebom.com/how-delete-spotify-account/',
        'https://beebom.com/how-download-gif-twitter/',
        'https://beebom.com/how-use-chatgpt-linux-terminal/',
        'https://beebom.com/how-delete-spotify-account/',
        'https://beebom.com/how-save-instagram-story-with-music/',
        'https://beebom.com/how-install-pip-windows/',
        'https://beebom.com/how-check-disk-usage-linux/']

In [4]:
loader = SeleniumURLLoader(urls=urls)
docs_not_splitted = loader.load()

### 2. `Split` to chunks

In [5]:
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
docs = text_splitter.split_documents(docs_not_splitted)

### 3. `Embed` & `Store` in Deep Lake

In [6]:
# create an instance of OpenAIEmbeddings
embeddings = OpenAIEmbeddings(model="text-embedding-ada-002") 

my_activelopp_org_id = 'bichpham102'
my_activelopp_dataset_name = 'langchain_course_customer_support'
dataset_path = f'hub://{my_activelopp_org_id}/{my_activelopp_dataset_name}'
# Initializes a connection to the Deep Lake dataset, using the path and an embedding function for processing the dataset.
db = DeepLake(dataset_path=dataset_path 
             ,embedding_function = embeddings) 

# add documents to the Deep Lake dataset 
db.add_documents(docs)

Your Deep Lake dataset has been successfully created!


Creating 105 embeddings in 1 batches of size 105:: 100%|██████████| 1/1 [00:23<00:00, 23.27s/it]

Dataset(path='hub://bichpham102/langchain_course_customer_support', tensors=['text', 'metadata', 'embedding', 'id'])

  tensor      htype       shape      dtype  compression
  -------    -------     -------    -------  ------- 
   text       text      (105, 1)      str     None   
 metadata     json      (105, 1)      str     None   
 embedding  embedding  (105, 1536)  float32   None   
    id        text      (105, 1)      str     None   





['0879cd5c-79c1-11ef-9798-0022485ac518',
 '0879cf0a-79c1-11ef-9798-0022485ac518',
 '0879d00e-79c1-11ef-9798-0022485ac518',
 '0879d0e0-79c1-11ef-9798-0022485ac518',
 '0879d1a8-79c1-11ef-9798-0022485ac518',
 '0879d270-79c1-11ef-9798-0022485ac518',
 '0879d32e-79c1-11ef-9798-0022485ac518',
 '0879d3f6-79c1-11ef-9798-0022485ac518',
 '0879d4b4-79c1-11ef-9798-0022485ac518',
 '0879d57c-79c1-11ef-9798-0022485ac518',
 '0879d63a-79c1-11ef-9798-0022485ac518',
 '0879d716-79c1-11ef-9798-0022485ac518',
 '0879d7e8-79c1-11ef-9798-0022485ac518',
 '0879d8c4-79c1-11ef-9798-0022485ac518',
 '0879d996-79c1-11ef-9798-0022485ac518',
 '0879da5e-79c1-11ef-9798-0022485ac518',
 '0879db30-79c1-11ef-9798-0022485ac518',
 '0879dbf8-79c1-11ef-9798-0022485ac518',
 '0879dcca-79c1-11ef-9798-0022485ac518',
 '0879dd88-79c1-11ef-9798-0022485ac518',
 '0879de46-79c1-11ef-9798-0022485ac518',
 '0879df0e-79c1-11ef-9798-0022485ac518',
 '0879dfd6-79c1-11ef-9798-0022485ac518',
 '0879e094-79c1-11ef-9798-0022485ac518',
 '0879e152-79c1-

### 4. A user’s query `retrieves` the most relevant segments from Deep Lake

In [7]:
# demo similarity search 
query = 'how to check disk usage in linux?'
docs = db.similarity_search(query) 
print(len(docs))    
print(docs[0].page_content)

4
Home > Tech > How to Check Disk Usage in Linux (4 Methods)

How to Check Disk Usage in Linux (4 Methods)

Beebom Staff

Updated: December 19, 2023

Comments 0

Share

Copied

There may be times when you need to download some important files or transfer some photos to your Linux system, but face a problem of insufficient disk space. You head over to your file manager to delete the large files which you no longer require, but you have no clue which of them are occupying most of your disk space. In this article, we will show some easy methods to check disk usage in Linux from both the terminal and the GUI application.

Table of Contents

Check Disk Space Using the df Command

In Linux, there are many commands to check disk usage, the most common being the df command. The df stands for “Disk Filesystem” in the command, which is a handy way to check the current disk usage and the available disk space in Linux. The syntax for the df command in Linux is as follows:

df <options> <file_syste

### 5. `Prompt` 

In [8]:
# let's write a prompt for a customer support chatbot that
# answer questions using information extracted from our db
template = """You are an exceptional customer support chatbot that gently answer questions.

You know the following context information.

{chunks_formatted}

Answer to the following question from a customer. Use only information from the previous context information. Do not invent stuff.

Question: {query}

Answer:"""

prompt = PromptTemplate(
    input_variables=["chunks_formatted", "query"],
    template=template,
)

The Full Pipeline

In [10]:
query = "How to check disk usage in linux?"

# retrieve 
docs = db.similarity_search(query) 
retrieved_chunks = [doc.page_content for doc in docs]

# format the prompt based on template above 
chunks_formatted = '\n\n'.join(retrieved_chunks)
prompt_formatted = prompt.format(chunks_formatted = chunks_formatted 
                                ,query = query)


In [16]:
# generate answer 
llm = ChatOpenAI(model_name='gpt-3.5-turbo', temperature=0)  # temperature is a hyperparameter that controls the randomness of the output, temperature=0 means deterministic output
answer = llm.predict(prompt_formatted) 
print(answer)

To check disk usage in Linux, you can use the df command in the terminal. The df command stands for "Disk Filesystem" and is a handy way to check the current disk usage and available disk space in Linux. Additionally, you can also use GUI tools like the GDU Disk Usage Analyzer and the Gnome Disks Tool to monitor disk usage in a more user-friendly way.
