# Author

- Name: **Chaipat Suwannapoom**
- LinkedIn: https://www.linkedin.com/in/bchaipats/
- GitHub: https://github.com/bchaipats
- Medium: https://medium.com/@bchaipats
- Twitter: https://twitter.com/bchaipats

# Overview

In this notebook, we'll be creating a personal assistant powered by open-source LLMs, free of charge.

# Set up

In [1]:
# Install dependencies
!pip install \
    langchain==0.1.9 \
    gpt4all==2.2.1 \
    llama-index==0.10.13 \
    llama-index-embeddings-langchain==0.1.2 \
    llama-index-llms-langchain==0.1.3 \
    sentence-transformers==2.4.0

Collecting langchain==0.1.9
  Using cached langchain-0.1.9-py3-none-any.whl (816 kB)
Collecting gpt4all==2.2.1
  Using cached gpt4all-2.2.1-py3-none-macosx_10_15_universal2.whl (5.8 MB)
Collecting llama-index==0.10.13
  Using cached llama_index-0.10.13-py3-none-any.whl (5.6 kB)
Collecting llama-index-embeddings-langchain==0.1.2
  Using cached llama_index_embeddings_langchain-0.1.2-py3-none-any.whl (2.5 kB)
Collecting llama-index-llms-langchain==0.1.3
  Using cached llama_index_llms_langchain-0.1.3-py3-none-any.whl (4.6 kB)
Collecting sentence-transformers==2.4.0
  Using cached sentence_transformers-2.4.0-py3-none-any.whl (149 kB)
Collecting SQLAlchemy<3,>=1.4
  Using cached SQLAlchemy-2.0.27-cp311-cp311-macosx_11_0_arm64.whl (2.1 MB)
Collecting aiohttp<4.0.0,>=3.8.3
  Using cached aiohttp-3.9.3-cp311-cp311-macosx_11_0_arm64.whl (387 kB)
Collecting dataclasses-json<0.7,>=0.5.7
  Using cached dataclasses_json-0.6.4-py3-none-any.whl (28 kB)
Collecting jsonpatch<2.0,>=1.33
  Using cached j

# Run LLM on a personal laptop

If you care about data privacy (and cost), you might prefer running open-source LLMs privately on your own laptop instead of making API calls to external services like OpenAI.

GPT4All helps us achieve this. We can browse and download open-source LLMs from the [GPT4All homepage](https://gpt4all.io/index.html) onto our laptop. Here we pick Mistral-7B, since it requires only 8 GB of RAM, an amount most consumer-grade laptops have.

In [2]:
# Download an open-source LLM
!wget https://gpt4all.io/models/gguf/mistral-7b-openorca.Q4_0.gguf

--2024-02-28 09:57:30--  https://gpt4all.io/models/gguf/mistral-7b-openorca.Q4_0.gguf
Resolving gpt4all.io (gpt4all.io)... 104.26.0.159, 104.26.1.159, 172.67.71.169
Connecting to gpt4all.io (gpt4all.io)|104.26.0.159|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 4108927744 (3.8G)
Saving to: ‘mistral-7b-openorca.Q4_0.gguf’


2024-02-28 09:58:37 (56.0 MB/s) - Connection closed at byte 3861454848. Retrying.

--2024-02-28 09:58:38--  (try: 2)  https://gpt4all.io/models/gguf/mistral-7b-openorca.Q4_0.gguf
Connecting to gpt4all.io (gpt4all.io)|104.26.0.159|:443... connected.
HTTP request sent, awaiting response... 206 Partial Content
Length: 4108927744 (3.8G), 247472896 (236M) remaining
Saving to: ‘mistral-7b-openorca.Q4_0.gguf’

mistral-7b-openorca  94%[++++++++++++++++++  ]   3.63G  95.4KB/s    in 4m 10s  

2024-02-28 10:02:51 (137 KB/s) - Connection closed at byte 3896442880. Retrying.

--2024-02-28 10:02:53--  (try: 3)  https://gpt4all.io/models/gguf/mistral-7b-
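
The log above shows the connection dropping and `wget` retrying several times, so it can be worth confirming that the file is complete before loading it. Here is a minimal sketch, assuming the expected size is the `Length` value reported by the server in the log; `is_download_complete` is a hypothetical helper, not part of GPT4All.

```python
import os

# Expected size in bytes, taken from the "Length:" line in the wget log above
EXPECTED_BYTES = 4108927744


def is_download_complete(path: str, expected_bytes: int) -> bool:
    """Return True if the file exists and has exactly the expected size."""
    return os.path.isfile(path) and os.path.getsize(path) == expected_bytes


model_path = "./mistral-7b-openorca.Q4_0.gguf"
if is_download_complete(model_path, EXPECTED_BYTES):
    print("Model file looks complete.")
else:
    print("Model file is missing or incomplete; rerun wget with -c to resume.")
```

A size match does not prove the content is intact, but it catches the most likely failure here, a download that stopped early.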

In [3]:
from langchain_community.llms import GPT4All

# Load the model
llm = GPT4All(model='./mistral-7b-openorca.Q4_0.gguf')

# Define a sentence to complete
sentence = "The capital of Thailand is "

# Complete the sentence
response = llm.generate([sentence])
print(response.generations[0][0].text)

 Bangkok, which is also known as the City of Angels. It’s a bustling city with over 10 million people and it can be overwhelming at first glance. But don’t worry! I have put together this list of things to do in Bangkok for you so that your trip will be smooth sailing.



# Personal data

For simplicity, we'll use a plain text file as our data source. Later, we can just copy and paste text from a personal calendar, news, articles, books, etc. For example, here is a mock calendar for testing:

In [4]:
%%writefile ./data.txt
# data.txt

19 Dec - Go to the movie
20 Dec - Have a dinner with family
21 Dec - Go to the birthday party
22 Dec - Go to the dentist
23 Dec - Finish writing a draft for blog post

Overwriting ./data.txt


# Create necessary components
- **Embedding model** used to generate an embedding vector that represents the original text.
- **Prompt helper** used to handle the LLM's context window and token limits.
- **Sentence splitter** used to split the data into multiple chunks.
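
To make the idea of chunk size and chunk overlap concrete, here is a toy sketch of splitting with overlap. It counts whitespace-separated words rather than model tokens and ignores sentence boundaries, so it is a simplification and not how `SentenceSplitter` works internally; `split_with_overlap` is a hypothetical helper.

```python
def split_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Split text into chunks of at most chunk_size words.

    Consecutive chunks share chunk_overlap words so that context
    is not lost at chunk boundaries.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")

    words = text.split()
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once this chunk has reached the end of the text
        if start + chunk_size >= len(words):
            break
    return chunks


text = "one two three four five six seven eight nine ten"
for chunk in split_with_overlap(text, chunk_size=4, chunk_overlap=1):
    print(chunk)
```

With these toy values, each chunk starts with the last word of the previous one, which is the role `chunk_overlap` plays at a larger scale.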

In [5]:
from langchain_community.embeddings import HuggingFaceEmbeddings
from llama_index.embeddings.langchain import LangchainEmbedding
from llama_index.core.indices.prompt_helper import PromptHelper
from llama_index.core.node_parser import SentenceSplitter

# An embedding model that turns text into vector representations
embed_model = LangchainEmbedding(
    HuggingFaceEmbeddings(model_name="sentence-transformers/all-mpnet-base-v2")
)

# PromptHelper can help deal with LLM context window and token limitations
prompt_helper = PromptHelper(context_window=2048)

# SentenceSplitter used to split our data into multiple chunks
# Only a number of relevant chunks will be retrieved and fed into LLMs
node_parser = SentenceSplitter(chunk_size=300, chunk_overlap=20)



# Create a container service
LlamaIndex's global `Settings` object packs the large language model, the embedding model, the prompt helper and the node parser into a single handy container that the later steps read from.

In [6]:
from llama_index.core import Settings

In [7]:
Settings.llm = llm
Settings.embed_model = embed_model
Settings.prompt_helper = prompt_helper
Settings.node_parser = node_parser

# Build an index and a query engine

In [9]:
from llama_index.core import SimpleDirectoryReader
from llama_index.core import VectorStoreIndex

# Load data.txt into a document
document = SimpleDirectoryReader(input_files=['./data.txt']).load_data()

# Process data (chunking, embedding, indexing) and store them
index = VectorStoreIndex.from_documents(document)

# Build a query engine from the index
query_engine = index.as_query_engine()
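
Conceptually, a vector index answers a query by embedding it, comparing it with the stored chunk embeddings, and passing only the most similar chunks to the LLM. The sketch below illustrates that retrieval step with a toy bag-of-words "embedding" and cosine similarity. A real embedding model such as `all-mpnet-base-v2` produces dense vectors instead, and `embed`, `cosine_similarity` and `retrieve_top_k` are hypothetical helpers, not LlamaIndex APIs.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (a stand-in for a real model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[word] * b[word] for word in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve_top_k(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the query."""
    query_vec = embed(query)
    ranked = sorted(
        chunks,
        key=lambda chunk: cosine_similarity(query_vec, embed(chunk)),
        reverse=True,
    )
    return ranked[:k]


chunks = [
    "19 Dec - Go to the movie",
    "22 Dec - Go to the dentist",
    "23 Dec - Finish writing a draft for blog post",
]
print(retrieve_top_k("What's the plan for 22 Dec?", chunks, k=1))
```

Word counts only match exact words, which is why a learned embedding model is used in practice: it can place related phrases close together even when they share no words.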

In [10]:
response = query_engine.query('Give me my calendar.')
print(response)

Your calendar for December is as follows:
19 Dec - Go to the movie
20 Dec - Have a dinner with family
21 Dec - Go to the birthday party
22 Dec - Go to the dentist
23 Dec - Finish writing a draft for blog post


In [11]:
response = query_engine.query("What's the plan for 22 Dec?")
print(response)

22 Dec is planned as "Go to the dentist".


In [12]:
response = query_engine.query('When should I finish a writing draft?')
print(response)

23 Dec


# Summarize news

For text summarization, it is better to create a new index suited to the task: a **SummaryIndex**. It synthesizes an answer from all chunks of the data, whereas a **VectorStoreIndex** uses only the top-k most relevant chunks.

Credit: the following text is from this [news article](https://www.newsinlevels.com/products/grenade-in-the-face-level-3/).

In [13]:
%%writefile ./data.txt
# data.txt

Grenade in the face – level 3

A soldier is lucky to be alive after a 40-millimetre grenade accidentally fired from a grenade launcher and embedded itself in his face.

The terrible accident happened in Colombia and surgeons had to be especially careful with the operation.

The soldier could not be moved to hospital by helicopter due to the risk of the explosive detonating, so an ambulance transferred him in an eight-hour ride to Bogota.

The team of doctors had to improvise and operate outside the hospital to minimise impact on the building in case the grenade exploded. Luckily, the operation was a success and the soldier could reintegrate into his military unit and his family to make a full recovery.

Overwriting ./data.txt


In [14]:
from llama_index.core import SummaryIndex

# Load data.txt into a document
document = SimpleDirectoryReader(input_files=['./data.txt']).load_data()

# Split the document into chunks and store them in a list-based index
summary_index = SummaryIndex.from_documents(document)

# Build a summary engine from the index
summary_engine = summary_index.as_query_engine(response_mode="tree_summarize")
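
In rough terms, the `tree_summarize` response mode summarizes groups of chunks, then summarizes those summaries, and repeats until a single answer remains. The sketch below shows only that bottom-up shape. `tree_summarize` and `first_words` here are hypothetical stand-ins: `first_words` replaces the LLM call, and `fan_in` is an assumed parameter standing in for how many chunks fit into one prompt.

```python
from typing import Callable


def tree_summarize(chunks: list[str], summarize: Callable[[str], str], fan_in: int = 2) -> str:
    """Merge groups of chunks level by level until one summary remains."""
    if fan_in < 2:
        raise ValueError("fan_in must be at least 2")
    if not chunks:
        return ""

    level = list(chunks)
    while True:
        # Summarize each group of fan_in texts to form the next tree level
        level = [
            summarize(" ".join(level[i:i + fan_in]))
            for i in range(0, len(level), fan_in)
        ]
        if len(level) == 1:
            return level[0]


def first_words(text: str, limit: int = 12) -> str:
    """Stand-in for an LLM call: keep only the first few words."""
    return " ".join(text.split()[:limit])


chunks = [
    "A grenade embedded itself in a soldier's face.",
    "The accident happened in Colombia.",
    "An ambulance transferred him to Bogota.",
    "The operation was a success.",
]
print(tree_summarize(chunks, first_words))
```

Because every chunk feeds into the tree, the final answer can draw on the whole document, unlike the top-k retrieval used by the vector index.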

In [15]:
response = summary_engine.query("What is a summary of this collection of text?")
print(response)


A soldier in Colombia survived an accident where a grenade fired from a launcher embedded itself into his face. Due to the risk of explosion, he was transported by ambulance instead of helicopter and underwent surgery outside the hospital. The operation was successful, allowing him to recover and reintegrate with his military unit and family.
