In [1]:
from dotenv import load_dotenv

load_dotenv('../../.env')

True
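
The rest of the notebook depends on credentials loaded from the `.env` file. A quick check like the one below could confirm they are present before any network call is made. The variable names are assumptions based on what the libraries used later conventionally read (`ACTIVELOOP_TOKEN` for Deep Lake `hub://` datasets, `HUGGINGFACEHUB_API_TOKEN` for `HuggingFaceHub`), and `missing_env_vars` is a hypothetical helper, not part of any library.

```python
import os

# Assumed variable names: adjust them to match your own .env file.
REQUIRED_VARS = ["ACTIVELOOP_TOKEN", "HUGGINGFACEHUB_API_TOKEN"]


def missing_env_vars(names, env=None):
    """Return the names that are unset or empty in the given environment mapping."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]


missing = missing_env_vars(REQUIRED_VARS)
if missing:
    print("Missing environment variables:", ", ".join(missing))
else:
    print("All required environment variables are set.")
```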

# Import Libraries

In [6]:
from langchain.vectorstores import DeepLake
from langchain.text_splitter import CharacterTextSplitter
from langchain import HuggingFaceHub
from langchain.document_loaders import SeleniumURLLoader
from langchain import PromptTemplate
from langchain_community.embeddings import HuggingFaceEmbeddings

In [3]:
# we'll use information from the following articles
urls = ['https://beebom.com/what-is-nft-explained/',
        'https://beebom.com/how-delete-spotify-account/',
        'https://beebom.com/how-download-gif-twitter/',
        'https://beebom.com/how-use-chatgpt-linux-terminal/',
        'https://beebom.com/how-delete-spotify-account/',
        'https://beebom.com/how-save-instagram-story-with-music/',
        'https://beebom.com/how-install-pip-windows/',
        'https://beebom.com/how-check-disk-usage-linux/']
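
Note that the Spotify article appears twice in the list above, so its chunks would be scraped and embedded twice. If that is not intended, a minimal sketch for dropping repeated URLs while keeping their order could look like this. The name `unique_urls` is introduced here for illustration only; the outputs shown in the rest of this notebook were produced with the original list, so chunk counts would differ after deduplication.

```python
urls = ['https://beebom.com/what-is-nft-explained/',
        'https://beebom.com/how-delete-spotify-account/',
        'https://beebom.com/how-download-gif-twitter/',
        'https://beebom.com/how-use-chatgpt-linux-terminal/',
        'https://beebom.com/how-delete-spotify-account/',
        'https://beebom.com/how-save-instagram-story-with-music/',
        'https://beebom.com/how-install-pip-windows/',
        'https://beebom.com/how-check-disk-usage-linux/']

# dict keys keep insertion order, so this removes repeats without reordering
unique_urls = list(dict.fromkeys(urls))

print(len(urls), "->", len(unique_urls))  # 8 -> 7
```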

### 1: Split the documents into chunks and compute their embeddings

In [4]:
# use the selenium scraper to load the documents
loader = SeleniumURLLoader(urls=urls)
docs_not_splitted = loader.load()

# we split the documents into smaller chunks
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
docs = text_splitter.split_documents(docs_not_splitted)

Created a chunk of size 1226, which is longer than the specified 1000
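
The warning above is expected: the splitter cuts the text only at its separator, so a single paragraph longer than `chunk_size` is kept whole rather than broken mid-sentence. The sketch below illustrates that behaviour in plain Python. It is a simplified stand-in, not LangChain's implementation, and it assumes the separator is a blank line (`"\n\n"`); `split_on_separator` is a hypothetical name.

```python
def split_on_separator(text, chunk_size, separator="\n\n"):
    """Greedily merge separator-delimited pieces into chunks of at most
    chunk_size characters. A single piece longer than chunk_size is kept whole."""
    chunks, current = [], ""
    for piece in text.split(separator):
        candidate = piece if not current else current + separator + piece
        if len(candidate) <= chunk_size or not current:
            # the piece still fits, or it starts a new chunk (even if oversized)
            current = candidate
        else:
            # the piece does not fit: close the current chunk and start a new one
            chunks.append(current)
            current = piece
    if current:
        chunks.append(current)
    return chunks


# made-up paragraphs; the second one is longer than the chunk size
paragraphs = ["a" * 400, "b" * 1226, "c" * 300, "d" * 500]
text = "\n\n".join(paragraphs)

chunks = split_on_separator(text, chunk_size=1000)
print([len(c) for c in chunks])  # [400, 1226, 802]
```

The 1226-character paragraph ends up as its own oversized chunk, which is the situation the warning reports.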


In [8]:
print(docs[10])

page_content='Apart from that, critics warn that NFT is a bubble, and people who are buying a strange GIF or collecting a rare video clip at such a high cost will come crashing down. Experts say that paintings and rare collectibles do not hold value just because of the sheer artistry, but also because there is an established audience who want to own and collect rare paintings or an artwork that no other person has.\n\nExperts point out that those who are buying digital artwork are not paying huge sums because they appreciate art. Instead, they want to create a bubble to earn money by reselling it at a higher price. It’s worth noting that the digital world does not have a scarcity of artwork — unlike physical masterpieces — so the prices will likely come down once the bubble bursts.\n\nHow to Buy NFTs\n\nBuying an NFT is as simple as heading to any of the NFT marketplaces, and making a purchase. However, there are certain things you need to take into consideration before you make your f

In [7]:
# The embeddings are computed locally with a sentence-transformers model,
# so no OpenAI key is needed. Before executing the following code, make sure
# your Activeloop token is saved in the "ACTIVELOOP_TOKEN" environment variable.
embeddings = HuggingFaceEmbeddings(
    model_name='sentence-transformers/all-MiniLM-L6-v2',
    model_kwargs={'device': 'cpu'},
)

# create Deep Lake dataset
# TODO: use your organization id here. (by default, org id is your username)
my_activeloop_org_id = "thapabibek1129"
my_activeloop_dataset_name = "langchain_course_customer_support"
dataset_path = f"hub://{my_activeloop_org_id}/{my_activeloop_dataset_name}"
db = DeepLake(dataset_path=dataset_path, embedding_function=embeddings)

# add documents to our Deep Lake dataset
db.add_documents(docs)

Deep Lake Dataset in hub://thapabibek1129/langchain_course_customer_support already exists, loading from the storage


Creating 151 embeddings in 1 batches of size 151:: 100%|██████████| 1/1 [01:07<00:00, 67.91s/it]

Dataset(path='hub://thapabibek1129/langchain_course_customer_support', tensors=['embedding', 'id', 'metadata', 'text'])

  tensor      htype      shape      dtype  compression
  -------    -------    -------    -------  ------- 
 embedding  embedding  (151, 384)  float32   None   
    id        text      (151, 1)     str     None   
 metadata     json      (151, 1)     str     None   
   text       text      (151, 1)     str     None   





['58e683ba-c97a-11ee-89d3-a434d9523559',
 '58e683bb-c97a-11ee-b562-a434d9523559',
 '58e683bc-c97a-11ee-babe-a434d9523559',
 '58e683bd-c97a-11ee-bdf7-a434d9523559',
 '58e683be-c97a-11ee-8e27-a434d9523559',
 '58e683bf-c97a-11ee-b062-a434d9523559',
 '58e683c0-c97a-11ee-9384-a434d9523559',
 '58e683c1-c97a-11ee-863b-a434d9523559',
 '58e683c2-c97a-11ee-967a-a434d9523559',
 '58e683c3-c97a-11ee-9c77-a434d9523559',
 '58e683c4-c97a-11ee-bbd3-a434d9523559',
 '58e683c5-c97a-11ee-8071-a434d9523559',
 '58e683c6-c97a-11ee-b8dd-a434d9523559',
 '58e683c7-c97a-11ee-b1f4-a434d9523559',
 '58e683c8-c97a-11ee-98f4-a434d9523559',
 '58e683c9-c97a-11ee-8202-a434d9523559',
 '58e683ca-c97a-11ee-abd2-a434d9523559',
 '58e683cb-c97a-11ee-9a51-a434d9523559',
 '58e683cc-c97a-11ee-a839-a434d9523559',
 '58e683cd-c97a-11ee-aa3d-a434d9523559',
 '58e683ce-c97a-11ee-be26-a434d9523559',
 '58e683cf-c97a-11ee-9042-a434d9523559',
 '58e683d0-c97a-11ee-ae7d-a434d9523559',
 '58e683d1-c97a-11ee-b768-a434d9523559',
 '58e683d2-c97a-

In [9]:
# let's see the top relevant documents to a specific query
query = "how to check disk usage in linux?"
docs = db.similarity_search(query)
print(docs[0].page_content)

Home > Tech > How to Check Disk Usage in Linux (4 Methods)

How to Check Disk Usage in Linux (4 Methods)

Beebom Staff

Updated: December 19, 2023

Comments							
							
								0

Share

Copied

There may be times when you need to download some important files or transfer some photos to your Linux system, but face a problem of insufficient disk space. You head over to your file manager to delete the large files which you no longer require, but you have no clue which of them are occupying most of your disk space. In this article, we will show some easy methods to check disk usage in Linux from both the terminal and the GUI application.

Table of Contents

Display Disk Usage in Human Readable Format

Display Disk Occupancy of a Particular Type

Display Disk Usage in Human Readable Format

Display Disk Usage for a Particular Directory

Compare Disk Usage of Two Directories

Sorting Files based on File Size

Exclude Files Based on Their File Size

Exclude Files Based on their Types
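
Conceptually, the similarity search embeds the query with the same model and returns the stored chunks whose vectors are closest to it. The sketch below shows the idea with cosine similarity on toy 3-dimensional vectors (the real embeddings here have 384 dimensions). It is an illustration only: the metric and the number of results actually used by the vector store are configurable and may differ, and `cosine_similarity` and `top_k` are hypothetical helpers.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, doc_vecs, k=4):
    """Return the indices of the k document vectors most similar to the query."""
    scored = [(cosine_similarity(query_vec, vec), idx) for idx, vec in enumerate(doc_vecs)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [idx for _, idx in scored[:k]]


# toy vectors standing in for chunk embeddings
doc_vecs = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
query_vec = [1.0, 0.05, 0.0]

print(top_k(query_vec, doc_vecs, k=2))
```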


### 2: Craft a prompt for T5 or Mistral using the suggested strategies

In [10]:
# let's write a prompt for a customer support chatbot that
# answers questions using information extracted from our db
template = """You are an exceptional customer support chatbot that gently answer questions.

You know the following context information.

{chunks_formatted}

Answer to the following question from a customer. Use only information from the previous context information. Do not invent stuff.

Question: {query}

Answer:"""

prompt = PromptTemplate(
    input_variables=["chunks_formatted", "query"],
    template=template,
)
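
To see exactly what text the model will receive, it can help to fill the template by hand. With simple named placeholders like these, and assuming the default f-string style template format, the result should match what plain `str.format` produces. The chunks below are made-up placeholders, not real retrieved text.

```python
template = """You are an exceptional customer support chatbot that gently answer questions.

You know the following context information.

{chunks_formatted}

Answer to the following question from a customer. Use only information from the previous context information. Do not invent stuff.

Question: {query}

Answer:"""

# placeholder chunks for illustration only
example_chunks = ["(placeholder chunk 1)", "(placeholder chunk 2)"]

filled = template.format(
    chunks_formatted="\n\n".join(example_chunks),
    query="How to check disk usage in linux?",
)
print(filled)
```

Ending the prompt with `Answer:` leaves the model to continue from that point, which is why the generated text starts directly with the answer.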

### 3: Utilize the T5 or Mistral model for text generation (temperature 0 for T5, 0.5 for Mistral)

In [20]:
# Load Models

# FLAN-T5 with temperature 0
llm_t5 = HuggingFaceHub(
    repo_id='google/flan-t5-large',
    model_kwargs={'temperature': 0, 'max_length': 64, 'max_new_tokens': 128}
)

# Mistral-7B-Instruct with temperature 0.5 and a longer output budget
llm_mistral = HuggingFaceHub(
    repo_id='mistralai/Mistral-7B-Instruct-v0.2',
    model_kwargs={'temperature': 0.5, 'max_length': 64, 'max_new_tokens': 512}
)

In [21]:
# the full pipeline

# user question
query = "How to check disk usage in linux?"

# retrieve relevant chunks
docs = db.similarity_search(query)
retrieved_chunks = [doc.page_content for doc in docs]

# format the prompt
chunks_formatted = "\n\n".join(retrieved_chunks)
prompt_formatted = prompt.format(chunks_formatted=chunks_formatted, query=query)

print("------T5-------")
answer = llm_t5(prompt_formatted)
print(answer)

print("------Mistral-------")
answer = llm_mistral(prompt_formatted)
print(answer)


------T5-------
From both the terminal and the GUI application.
------Mistral-------
 There are several methods to check disk usage in Linux, both through the terminal and the GUI. One of the simplest methods is by using the df command, which displays the total and used space for each mounted file system. Another method is by using the Gnome Disk Utility or the GDU Disk Usage Analyzer tool, which provide a graphical representation of disk usage. To check the disk usage for a specific directory, use the 'du' command with the '-h' flag, followed by the directory name. For example, 'du -h /path/to/directory'. To display the sizes in human-readable format like in megabytes, gigabytes, etc., use the '-h' flag with the 'df' command, i.e., 'df -h'. To compare the disk usage of two directories, use the 'du' command followed by the directory names, i.e., 'du directory1 directory2'.


In [22]:
# the same pipeline, this time with a different question

# user question
query = "How to buy NFTs?"

# retrieve relevant chunks
docs = db.similarity_search(query)
retrieved_chunks = [doc.page_content for doc in docs]

# format the prompt
chunks_formatted = "\n\n".join(retrieved_chunks)
prompt_formatted = prompt.format(chunks_formatted=chunks_formatted, query=query)

print("------T5-------")
answer = llm_t5(prompt_formatted)
print(answer)

print("------Mistral-------")
answer = llm_mistral(prompt_formatted)
print(answer)


------T5-------
Heading to any of the NFT marketplaces, and making a purchase.
------Mistral-------
 To buy NFTs, you need to have a crypto wallet that supports NFTs, such as MetaMask. Once you have created your wallet, you can connect it to the platform where you want to buy your NFT, such as OpenSea or Rarible. After that, you can browse through the available NFTs, check their prices, and make a purchase using your cryptocurrency. Make sure to read the fine print regarding fees and other terms before making a purchase.
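
The two pipeline cells above repeat the same three steps: retrieve, format, generate. One possible way to avoid the repetition is a small helper that takes the retrieval and generation steps as callables. This is only a sketch: `answer_question`, `fake_retrieve`, `fake_generate` and `TEMPLATE` are hypothetical names, and the stand-in functions exist so the example runs without a vector store or a model.

```python
def answer_question(query, retrieve, generate, template):
    """Retrieve context chunks, fill the prompt template, and return the model's answer."""
    chunks = retrieve(query)
    prompt_text = template.format(chunks_formatted="\n\n".join(chunks), query=query)
    return generate(prompt_text)


# stand-ins so the sketch runs offline
def fake_retrieve(query):
    return ["Chunk about " + query, "Another chunk"]


def fake_generate(prompt_text):
    return "(model answer for a prompt of %d characters)" % len(prompt_text)


TEMPLATE = "Context:\n{chunks_formatted}\n\nQuestion: {query}\n\nAnswer:"

print(answer_question("How to buy NFTs?", fake_retrieve, fake_generate, TEMPLATE))
```

In this notebook, `retrieve` could be `lambda q: [d.page_content for d in db.similarity_search(q)]`, `generate` could be `llm_t5` or `llm_mistral`, and `template` could be the `template` string defined in step 2.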
