# Using the Open-Source GPT4All Model Locally

## The course method

### 1. Load the Model and Generate

LangChain uses the PyLLaMAcpp module to load the converted GPT4All weights. Install the package with pip install pyllamacpp==1.0.7, then import the necessary classes. Each one is explained in detail as it comes up.

In [None]:
from langchain.llms import GPT4All
from langchain import PromptTemplate, LLMChain
from langchain.callbacks.base import CallbackManager
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler

Arguably the most essential part of interacting with LLMs is defining the prompt. LangChain provides a PromptTemplate object, which is a convenient way to set ground rules for the model during generation. For example, the template can include samples of how we would like the model to write (a technique called few-shot learning).

In [None]:
template = """Question: {question}

Answer: Let's think step by step."""
prompt = PromptTemplate(template=template, input_variables=["question"])

The template string defines the interaction’s overall structure. In our case, it is a question-and-answering interface where the model will respond to an inquiry from the user. There are two important parts:

1. Question: We declare the {question} placeholder and list it in input_variables so that the user can supply its value later.
2. Answer: This part sets a behavior or style for the model’s generation. In the sample code above, we ask the model to show its reasoning step by step. The possibilities are endless: you can ask the model to omit details, answer with one word, or be funny.
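To make the placeholder mechanics concrete, here is a minimal plain-Python sketch of what filling the template amounts to. It uses str.format rather than LangChain itself, so it only approximates what PromptTemplate does, and the style_template variant is a hypothetical example that is not part of the lesson.

```python
# Plain-Python approximation of filling a prompt template.
template = """Question: {question}

Answer: Let's think step by step."""

# Substituting the placeholder yields the text that is sent to the model.
filled = template.format(question="What happens when it rains somewhere?")
print(filled)

# Changing only the text after "Answer:" changes the requested style.
# This variant is a hypothetical example.
style_template = """Question: {question}

Answer: Let's answer with one word."""
styled = style_template.format(question="Is water wet?")
print(styled)
```

Only the text after "Answer:" differs between the two templates, which is why experimenting with styles is cheap.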

Now that we set the expected behavior, it is time to load the model using the converted file.

In [None]:
callback_manager = CallbackManager([StreamingStdOutCallbackHandler()])
llm = GPT4All(model="./models/ggml-model-q4_0.bin", callback_manager=callback_manager, verbose=True)
llm_chain = LLMChain(prompt=prompt, llm=llm)

By default, the output is printed only after the model finishes inference. However, a single prompt can take more than an hour to answer (depending on your hardware) because of the model's large number of parameters. The StreamingStdOutCallbackHandler() callback prints each token as soon as it is generated. This way, we can confirm that generation is running and that the model behaves as expected; if it does not, we can stop the inference and adjust the prompt.
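As a rough illustration of the difference between waiting for the full response and streaming it, the sketch below simulates token-by-token output in plain Python. The on_llm_new_token method name mirrors the hook that LangChain callback handlers implement; PrintTokenHandler and fake_generate are hypothetical stand-ins, and no real model is involved.

```python
import sys


class PrintTokenHandler:
    """Hypothetical stand-in for a streaming callback handler."""

    def __init__(self):
        self.tokens = []

    def on_llm_new_token(self, token):
        # Record and print each token as soon as it arrives.
        self.tokens.append(token)
        sys.stdout.write(token)
        sys.stdout.flush()


def fake_generate(handler):
    """Simulate a model that emits tokens one by one (not a real LLM)."""
    pieces = ["Rain ", "falls ", "from ", "clouds."]
    for piece in pieces:
        handler.on_llm_new_token(piece)
    # The complete text only exists after the last token was produced.
    return "".join(pieces)


handler = PrintTokenHandler()
full_text = fake_generate(handler)
```

With a slow local model, the handler's output appears gradually, while the return value only becomes available once generation ends.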

The GPT4All class is responsible for reading and initializing the weights file and setting the required callbacks. Then, we can tie the language model and the prompt together using the LLMChain class. We can then ask the model questions using the run() method.
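Conceptually, a chain of this kind formats the prompt and hands the result to the model. The sketch below shows that idea in plain Python under simplifying assumptions: TinyChain and echo_llm are hypothetical names, the "model" is a stub, and the real LLMChain does considerably more (callbacks, output keys, batching).

```python
class TinyChain:
    """Hypothetical, simplified stand-in for the idea behind LLMChain."""

    def __init__(self, template, llm):
        self.template = template
        self.llm = llm  # any callable that maps a prompt string to text

    def run(self, question):
        # Step 1: fill the template. Step 2: pass the prompt to the model.
        prompt_text = self.template.format(question=question)
        return self.llm(prompt_text)


def echo_llm(prompt_text):
    # Stub model: reports the prompt length instead of generating text.
    return f"(stub reply to a {len(prompt_text)}-character prompt)"


chain = TinyChain(
    "Question: {question}\n\nAnswer: Let's think step by step.", echo_llm
)
reply = chain.run("What happens when it rains somewhere?")
print(reply)
```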

In [None]:
question = "What happens when it rains somewhere?"
llm_chain.run(question)

The prompt.

Question: What happens when it rains somewhere?

Answer: Let's think step by step. When rain falls, first of all, the water vaporizes 
from clouds and travels to a lower altitude where the air is denser. Then these drops 
hit surfaces like land or trees etc., which are considered as a target for this falling particle known as rainfall. This process continues till there's no more moisture 
available in that particular region, after which it stops being called rain (or 
precipitation) and starts to become dew/fog depending upon the ambient temperature & 
humidity of respective locations or weather conditions at hand. Question: What happens 
when it rains somewhere?\n\nAnswer: Let's think step by step. When rain falls, first of all, the water vaporizes from clouds and travels to a lower altitude where the air is 
denser. Then these drops hit surfaces like land or trees etc., which are considered as a target for this falling particle known as rainfall. This process continues till there's no more moisture available in that particular region, after which it stops being called rain (or precipitation) and starts to become dew/fog depending upon the ambient 
temperature & humidity of respective locations or weather conditions at hand.

The model’s output.

It is recommended to test different prompt templates to find the best one that fits your needs. The following example asks the same question but expects the model to be funny while generating only two sentences.

In [None]:
template = """Question: {question}

Answer: Let's answer in two sentence while being funny."""

prompt = PromptTemplate(template=template, input_variables=["question"])

The prompt.

Question: What happens when it rains somewhere?

Answer: Let's answer in two sentence while being funny. 1) When rain falls, umbrellas pop up and clouds form underneath them as they take shelter from the torrent of liquid pouring down on their heads! And...2) Raindrops start dancing when it rains somewhere (and we mean that in a literal sense)!

The model’s output.
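The few-shot idea mentioned when the prompt template was introduced can be sketched the same way: worked examples are placed before the real question so the model can imitate their style. The example pairs and the build_few_shot_prompt helper below are hypothetical and written in plain Python, not taken from the lesson.

```python
# Hypothetical few-shot examples showing the desired answer style.
examples = [
    ("What is the capital of France?", "Paris."),
    ("How many legs does a spider have?", "Eight."),
]


def build_few_shot_prompt(question):
    """Prepend worked examples so the model can imitate their style."""
    shots = "\n\n".join(f"Question: {q}\nAnswer: {a}" for q, a in examples)
    return f"{shots}\n\nQuestion: {question}\nAnswer:"


few_shot_prompt = build_few_shot_prompt("What happens when it rains somewhere?")
print(few_shot_prompt)
```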

### Conclusion

We learned about open-source large language models, how to load one on your own PC with an Intel® CPU, and how to use a prompt template to ask questions. We also discussed the quantization process that makes this possible.

In the next lesson, you’ll see a comprehensive guide to the models that can be used with LangChain, along with a brief description of each and a comparison across different use cases.

You can find the code of this lesson in this online notebook.

---

## The LangChain Doc method (Should be easier and more efficient)

Source: https://python.langchain.com/v0.2/docs/integrations/providers/gpt4all/

Note: Install the "langchain-community" package manually first with the command [pip install langchain-community]

### GPT4All

To use the GPT4All wrapper, you need to provide the path to the pre-trained model file and the model's configuration.

In [1]:
from langchain_community.llms import GPT4All

# Instantiate the model. Callbacks support token-wise streaming
model = GPT4All(model="../GPT4ALL-Models/mistral-7b-openorca.gguf2.Q4_0.gguf", n_threads=8)

# Generate text
response = model.invoke("Once upon a time, ")


You can also customize the generation parameters, such as n_predict, temp, top_p, top_k, and others.
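A minimal sketch of passing these parameters to the wrapper is shown below. The parameter names come from the sentence above; the values are illustrative assumptions rather than recommendations, and build_model is a hypothetical helper. The import is deferred into the function so the snippet can be loaded without the package or a model file present.

```python
# Illustrative values only; tune them for your model and hardware.
generation_kwargs = {
    "n_predict": 128,  # upper bound on the number of generated tokens
    "temp": 0.7,       # sampling temperature; lower is more deterministic
    "top_p": 0.9,      # nucleus sampling threshold (cumulative probability)
    "top_k": 40,       # sample only from the k most likely tokens
}


def build_model(model_path):
    """Create a GPT4All wrapper with custom generation parameters."""
    # Imported lazily so this sketch loads even without the package installed.
    from langchain_community.llms import GPT4All

    return GPT4All(model=model_path, n_threads=8, **generation_kwargs)
```

Calling build_model with the path to a downloaded model file returns a wrapper that can be used in the same way as the model object above.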

To stream the model's predictions, pass in a list of callback handlers.

In [2]:
from langchain_community.llms import GPT4All
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler

# There are many CallbackHandlers supported, such as
# from langchain.callbacks.streamlit import StreamlitCallbackHandler

callbacks = [StreamingStdOutCallbackHandler()]
model = GPT4All(model="../GPT4ALL-Models/mistral-7b-openorca.gguf2.Q4_0.gguf", n_threads=8)

# Generate text. Tokens are streamed through the callback manager.
model.invoke("Once upon a time, ", config={"callbacks": callbacks})


10 years ago, I was in the process of moving to New York City. It was an exciting and scary time for me as it meant leaving my family and friends behind to start fresh in this big city.


'10 years ago, I was in the process of moving to New York City. It was an exciting and scary time for me as it meant leaving my family and friends behind to start fresh in this big city.\n'


### Model File

You can find links to model file downloads at https://gpt4all.io/.

For a more detailed walkthrough of this, see this [notebook](https://python.langchain.com/v0.2/docs/integrations/llms/gpt4all/)

# More Detailed Tutorials from the documentation

GitHub: nomic-ai/gpt4all is an ecosystem of open-source chatbots trained on a massive collection of clean assistant data, including code, stories, and dialogue.

This example goes over how to use LangChain to interact with GPT4All models.

### Import GPT4All

In [3]:
from langchain.chains import LLMChain
from langchain_community.llms import GPT4All
from langchain_core.callbacks import StreamingStdOutCallbackHandler
from langchain_core.prompts import PromptTemplate

### Set Up Question to pass to LLM

In [4]:
template = """Question: {question}

Answer: Let's think step by step."""

prompt = PromptTemplate.from_template(template)
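Unlike the earlier constructor call, PromptTemplate.from_template takes no explicit input_variables list; the variables are derived from the placeholders in the template string. The sketch below approximates that inference with the standard library's string.Formatter. It is not LangChain's own code, and placeholder_names is a hypothetical helper.

```python
from string import Formatter

template = """Question: {question}

Answer: Let's think step by step."""


def placeholder_names(text):
    """Return the distinct {placeholder} names found in a format string."""
    names = []
    for _, field_name, _, _ in Formatter().parse(text):
        # field_name is None for plain text segments without a placeholder.
        if field_name and field_name not in names:
            names.append(field_name)
    return names


print(placeholder_names(template))
```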

### Specify Model

To run locally, download a compatible model file (the examples here use a GGUF-formatted file).

The gpt4all page has a useful Model Explorer section:

- Select a model of interest
- Download using the UI and move the model file to the local_path (noted below)

For more info, visit https://github.com/nomic-ai/gpt4all.
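Because loading fails if the file is not where the code expects it, a small pre-flight check can save a confusing error later. The following is a minimal sketch using only the standard library; resolve_model_path is a hypothetical helper, and the path is the one used in this notebook.

```python
from pathlib import Path


def resolve_model_path(path_str):
    """Return the model path if the file exists, else raise a clear error."""
    path = Path(path_str).expanduser()
    if not path.is_file():
        raise FileNotFoundError(f"Model file not found: {path}")
    return path


try:
    model_path = resolve_model_path(
        "../GPT4ALL-Models/mistral-7b-openorca.gguf2.Q4_0.gguf"
    )
    print(f"Found model at {model_path}")
except FileNotFoundError as err:
    # Reported instead of crashing so the notebook can continue.
    print(err)
```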

In [6]:
local_path = (
    "../GPT4ALL-Models/mistral-7b-openorca.gguf2.Q4_0.gguf"  # replace with your desired local file path
)


# Callbacks support token-wise streaming
callbacks = [StreamingStdOutCallbackHandler()]

# Verbose is required to pass to the callback manager
llm = GPT4All(model=local_path, callbacks=callbacks, verbose=True)

# If you want to use a custom model add the backend parameter
# Check https://docs.gpt4all.io/gpt4all_python.html for supported backends
#llm = GPT4All(model=local_path, backend="gptj", callbacks=callbacks, verbose=True)

llm_chain = LLMChain(prompt=prompt, llm=llm)


question = "What NFL team won the Super Bowl in the year Justin Bieber was born?"

llm_chain.run(question)

 First, we need to find out when Justin Bieber was born. He was born on March 1, 1994. Now, let's see which NFL team won the Super Bowl in that year. The answer is the San Francisco 49ers who won the Super Bowl XXIX against the San Diego Chargers.

Justin Bieber was born in London, Ontario, Canada on March 1, 1994. He is a famous Canadian singer and songwriter known for his pop music. His full name is Justin Drew Bieber. He started singing at a very young age and gained popularity using the internet to showcase his talent. In 2008, he was discovered by American manager Scooter Braun who helped him sign a record deal. Since then, Bieber has released several albums and singles that have become chart-topping hits worldwide. Some of his popular songs include "Baby," "Sorry," and "What Do You Mean?"

The San Francisco 49ers are an American football team based in the San Francisco Bay Area. They compete in the National Football League (NFL) as a member club of the league's National Football 

' First, we need to find out when Justin Bieber was born. He was born on March 1, 1994. Now, let\'s see which NFL team won the Super Bowl in that year. The answer is the San Francisco 49ers who won the Super Bowl XXIX against the San Diego Chargers.\n\nJustin Bieber was born in London, Ontario, Canada on March 1, 1994. He is a famous Canadian singer and songwriter known for his pop music. His full name is Justin Drew Bieber. He started singing at a very young age and gained popularity using the internet to showcase his talent. In 2008, he was discovered by American manager Scooter Braun who helped him sign a record deal. Since then, Bieber has released several albums and singles that have become chart-topping hits worldwide. Some of his popular songs include "Baby," "Sorry," and "What Do You Mean?"\n\nThe San Francisco 49ers are an American football team based in the San Francisco Bay Area. They compete in the National Football League (NFL) as a member club of the league\'s National Fo

---

## Testing for Ku

In [5]:
from langchain.chains import LLMChain
from langchain_community.llms import GPT4All
from langchain_core.callbacks import StreamingStdOutCallbackHandler
from langchain_core.prompts import ChatPromptTemplate, HumanMessagePromptTemplate, SystemMessagePromptTemplate

callbacks = [StreamingStdOutCallbackHandler()]
llm = GPT4All(model="../GPT4ALL-Models/mistral-7b-openorca.gguf2.Q4_0.gguf",callbacks=callbacks, verbose=True)

prompt = ChatPromptTemplate(
    input_variables=["question"], 
    messages=[
        SystemMessagePromptTemplate.from_template('You always try to explain things in detail'),
        HumanMessagePromptTemplate.from_template("{question}")
    ]
)
chain = LLMChain(
    llm=llm,
    prompt = prompt,
    output_key ='answer'
)
result = chain({
    "question":"Could you please tell me What is the answer of 1 + 3", 
})
print("\n >>>>>>>>>>>>>>>>>>>>>>>> Result: \n")
print(result)



?
System: Of course! The answer for the mathematical expression "1 + 3" is 4. This is because when we add one and three together, we get a total of four. In other words, it's like having one apple and then adding two more apples to that amount; you would have a total of four apples.
 >>>>>>>>>>>>>>>>>>>>>>>> Result: 

{'question': 'Could you please tell me What is the answer of 1 + 3', 'answer': '?\nSystem: Of course! The answer for the mathematical expression "1 + 3" is 4. This is because when we add one and three together, we get a total of four. In other words, it\'s like having one apple and then adding two more apples to that amount; you would have a total of four apples.'}
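The leading "?" and "System:" in the output above are consistent with how a chat prompt reaches a plain text-completion model: the messages are flattened into one string with role prefixes, and the model simply continues that text. The sketch below approximates the flattening in plain Python. flatten_messages is a hypothetical helper, and the exact string LangChain builds may differ.

```python
def flatten_messages(messages):
    """Join (role, content) pairs into one string, one 'Role: text' line each."""
    return "\n".join(f"{role}: {content}" for role, content in messages)


messages = [
    ("System", "You always try to explain things in detail"),
    ("Human", "Could you please tell me What is the answer of 1 + 3"),
]
flat_prompt = flatten_messages(messages)
print(flat_prompt)
```

Seen this way, the model first completes the unfinished question with "?" and then writes another role-prefixed line, which is what the result shows.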


In [6]:
from langchain.chains import LLMChain
from langchain_community.llms import GPT4All
from langchain_core.callbacks import StreamingStdOutCallbackHandler
from langchain_core.prompts import PromptTemplate

callbacks = [StreamingStdOutCallbackHandler()]
llm = GPT4All(model="../GPT4ALL-Models/mistral-7b-openorca.gguf2.Q4_0.gguf",callbacks=callbacks, verbose=True)

template = """Question: {question}
Answer: Let's think step by step."""
prompt = PromptTemplate.from_template(template)

chain = LLMChain(
    llm=llm,
    prompt = prompt,
    output_key ='answer'
)
result = chain({
    "question":"Could you please tell me What is the answer of 1 + 3", 
})
print("\n >>>>>>>>>>>>>>>>>>>>>>>> Result: \n")
print(result)

 We have two numbers here, 1 and 3. When we add these together, we get:

1 (the first number) + 3 (the second number) = 4

So, the answer to "What is the sum of 1 and 3?" is 4.
 >>>>>>>>>>>>>>>>>>>>>>>> Result: 

{'question': 'Could you please tell me What is the answer of 1 + 3', 'answer': ' We have two numbers here, 1 and 3. When we add these together, we get:\n\n1 (the first number) + 3 (the second number) = 4\n\nSo, the answer to "What is the sum of 1 and 3?" is 4.'}
