### How Does GPT4All Work?

GPT4All is trained on top of Facebook’s LLaMA model, whose weights were released under a non-commercial license. Even so, running that architecture on a typical local PC is impractical without further optimization because of its large number of parameters (7 billion). The authors used two tricks to make fine-tuning and inference efficient.
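
To get a rough sense of why the parameter count matters, the sketch below estimates the memory needed just to hold 7 billion weights at different precisions. The bit widths are illustrative assumptions, and the estimate ignores activations and any extra metadata a real quantized file format stores, so the figures are rough lower bounds rather than measurements.

```python
def weight_memory_gb(num_params: int, bits_per_weight: int) -> float:
    """Approximate size of the raw weights in decimal gigabytes (1 GB = 1e9 bytes)."""
    total_bytes = num_params * bits_per_weight / 8
    return total_bytes / 1e9


NUM_PARAMS = 7_000_000_000  # the 7 billion parameters mentioned in the text

# Bit widths are assumptions chosen for illustration only.
for label, bits in [("32-bit float", 32), ("16-bit float", 16), ("4-bit quantized", 4)]:
    print(f"{label:>16}: {weight_memory_gb(NUM_PARAMS, bits):.1f} GB")
```

At 4 bits per weight the estimate is about 3.5 GB, which is in the same range as the roughly 4 GB weights file used in this notebook.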

### 1. Convert the Model

The first step is to download the weights and use a script from the llama.cpp repository to convert them from the old format to the new one. This step is required; otherwise, the LangChain library will not recognize the checkpoint file.

In [1]:
workingFolder=r'C:\Users\jfrancis\OneDrive - GalaxE. Solutions, Inc\GalaxE D Drive\AI Journey\Gen AI'

#### We need to download the weights file. You can either go to https://the-eye.eu/public/AI/models/nomic-ai/gpt4all/ and download the weights manually (make sure to pick the file whose name ends with `ggml.bin`), or use the following Python snippet, which streams the file and writes it to disk in small chunks.

In [None]:
import requests
from pathlib import Path
from tqdm import tqdm

local_path = workingFolder + '\\gpt4all-lora-quantized-ggml.bin'
Path(local_path).parent.mkdir(parents=True, exist_ok=True)

url = 'https://the-eye.eu/public/AI/models/nomic-ai/gpt4all/gpt4all-lora-quantized-ggml.bin'

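# Send a GET request to the URL; stream=True avoids loading the whole file into memory.
response = requests.get(url, stream=True)
# Stop early on an HTTP error instead of saving an error page as the weights file.
response.raise_for_status()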

# open the file in binary mode and write the contents of the response
# to it in chunks.
with open(local_path, 'wb') as f:
    for chunk in tqdm(response.iter_content(chunk_size=8192)):
        if chunk:
            f.write(chunk)
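
The chunked-write pattern used above can be tried without a network connection. The following is a minimal sketch, assuming an in-memory stream in place of the HTTP response; `copy_in_chunks` is a hypothetical helper name, not part of `requests`. It returns the number of bytes written, which could be compared with the size the server reports (when it sends a `Content-Length` header) to spot a truncated download.

```python
import io
from typing import BinaryIO


def copy_in_chunks(source: BinaryIO, destination: BinaryIO, chunk_size: int = 8192) -> int:
    """Copy source to destination chunk by chunk and return the bytes written."""
    bytes_written = 0
    while True:
        chunk = source.read(chunk_size)
        if not chunk:  # an empty read means the stream is exhausted
            break
        destination.write(chunk)
        bytes_written += len(chunk)
    return bytes_written


# Stand-in for a network response: 20,000 bytes of in-memory data.
payload = b"x" * 20_000
target = io.BytesIO()
written = copy_in_chunks(io.BytesIO(payload), target)
print(written)
```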

#### This process might take a while since the file is about 4 GB. Next, the downloaded file must be converted to the latest format. Start by cloning the llama.cpp repository with the following commands (you need to have git installed), then pass the downloaded file to the convert.py script and run it with a Python interpreter.


<br>git clone https://github.com/ggerganov/llama.cpp.git
<br>cd llama.cpp && git checkout 2b26469 && cd ..
<br>python3 llama.cpp/convert.py ./models/gpt4all-lora-quantized-ggml.bin

In [2]:
#pip install sentencepiece
#pip install langchain==0.0.152  
#pip install pyllamacpp==1.0.7

### 2. Load the Model and Generate

The LangChain library uses the PyLLaMAcpp module to load the converted GPT4All weights.

In [3]:
from langchain.llms import GPT4All
from langchain import PromptTemplate, LLMChain
from langchain.callbacks.base import CallbackManager
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler

In [4]:
template = """Question: {question}

Answer: Let's think step by step."""
prompt = PromptTemplate(template=template, input_variables=["question"])

#### The template string defines the overall structure of the interaction. In our case, it is a question-answering interface in which the model responds to a question from the user. There are two important parts:

- Question: We declare the {question} placeholder and list it in input_variables so that the user can fill it in later.
- Answer: This part sets a behavior or style for the model’s generation. In the sample code above, for example, we ask the model to show its reasoning step by step. Many other choices are possible: we could ask the model to omit details, answer with one word, or be funny.
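
To see what the model actually receives, the placeholder can be filled in by hand. This is a small sketch that uses plain `str.format` rather than LangChain itself, on the assumption that the library's default template style substitutes a simple `{question}` placeholder in the same way.

```python
template = """Question: {question}

Answer: Let's think step by step."""

# Fill the placeholder the way a user's question would be inserted.
filled = template.format(question="What happens when it rains somewhere?")
print(filled)
```

The text after `Answer:` is sent along with the question, so the model continues from that opening phrase.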

In [5]:
callback_manager = CallbackManager([StreamingStdOutCallbackHandler()])
llm = GPT4All(model=workingFolder + '\\models\\ggml-model-q4_0.bin', callback_manager=callback_manager, verbose=True)
llm_chain = LLMChain(prompt=prompt, llm=llm)

#### By default, the output is printed only after the model has finished its inference. However, a single prompt can take more than an hour to answer (depending on your hardware) because of the large number of parameters in the model. The StreamingStdOutCallbackHandler() callback prints each token as soon as it is generated. This way, we can confirm that generation is running and that the model behaves as expected; if it does not, we can stop the inference and adjust the prompt.
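
The idea behind a streaming callback can be illustrated without loading a model. The sketch below is a plain-Python analogy, not LangChain's implementation: `generate`, `stream_to_stdout`, and the token list are hypothetical stand-ins. The handler is called once per token, and the full text is still returned at the end.

```python
import sys
from typing import Callable, Iterable


def stream_to_stdout(token: str) -> None:
    """Print a token immediately, without waiting for the rest of the text."""
    sys.stdout.write(token)
    sys.stdout.flush()


def generate(tokens: Iterable[str], on_new_token: Callable[[str], None]) -> str:
    """Call the handler for every token as it arrives, then return the full text."""
    pieces = []
    for token in tokens:
        on_new_token(token)  # shown to the user right away
        pieces.append(token)
    return "".join(pieces)


# Hypothetical token sequence standing in for a slow model.
fake_tokens = ["Rain", " makes", " the", " ground", " wet", "."]
full_text = generate(fake_tokens, stream_to_stdout)
```

Because the text is both streamed and returned, a notebook cell that ends with such a call shows the answer twice: once from the handler and once as the cell's return value.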

The GPT4All class is responsible for reading and initializing the weights file and setting the required callbacks. We then tie the language model and the prompt together using the LLMChain class, which lets us ask the model questions through its run() method.

In [6]:
question = "What happens when it rains somewhere?"
llm_chain.run(question)

 Question: What happens when it rains somewhere?

Answer: Let's think step by step. Whenever there is rain, the ground receives a lot of water and becomes wetter than usual; this can lead to flooding if enough rain occurs in one particular area or region for an extended period of time (such as during hurricanes). The surface tension properties of droplets that fall from clouds also determine whether they will bead up on leafy vegetation, clothes or any other objects nearby. This effect is seen when it rains lightly and the leaves in a tree start to look shiny with dew because the water molecules have been attracted due their electrical charges at surface level of droplets.

" Question: What happens when it rains somewhere?\n\nAnswer: Let's think step by step. Whenever there is rain, the ground receives a lot of water and becomes wetter than usual; this can lead to flooding if enough rain occurs in one particular area or region for an extended period of time (such as during hurricanes). The surface tension properties of droplets that fall from clouds also determine whether they will bead up on leafy vegetation, clothes or any other objects nearby. This effect is seen when it rains lightly and the leaves in a tree start to look shiny with dew because the water molecules have been attracted due their electrical charges at surface level of droplets."

In [7]:
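template = """Question: {question}

Answer: Let's answer in two sentences while being funny."""

prompt = PromptTemplate(template=template, input_variables=["question"])
# Note: llm_chain was built with the earlier prompt. It keeps using that prompt
# until it is rebuilt with LLMChain(prompt=prompt, llm=llm).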

In [8]:
question = "What happens when it rains somewhere?"
llm_chain.run(question)

 Question: What happens when it rains somewhere?

Answer: Let's think step by step. First, rain can fall from the sky as precipitation in various forms such as drops or larger spheres of water called "raindrops". Rainfall may cause flooding on land and could be dangerous for humans if they happen to walk underneath it with no shelter available. On sea level surfaces like oceans, rivers or lakes rainwater can also erode the soil's particles leading to sedimentation processes that create layers of muddy material called "alluvial fans" deposited at river mouths where tidal forces are usually weakest but still affecting them through wind and currents. Rainfall on mountainsides causes water runoff which feeds into rivers or drains directly onto the sea, contributing to coastline sedimentation (erosion) as well. Additionally it can cause landsliding leading to mudflows if there is too much rain for soil conditions that may create a dangerous flow of debris downstream from eroding hillsides a

' Question: What happens when it rains somewhere?\n\nAnswer: Let\'s think step by step. First, rain can fall from the sky as precipitation in various forms such as drops or larger spheres of water called "raindrops". Rainfall may cause flooding on land and could be dangerous for humans if they happen to walk underneath it with no shelter available. On sea level surfaces like oceans, rivers or lakes rainwater can also erode the soil\'s particles leading to sedimentation processes that create layers of muddy material called "alluvial fans" deposited at river mouths where tidal forces are usually weakest but still affecting them through wind and currents. Rainfall on mountainsides causes water runoff which feeds into rivers or drains directly onto the sea, contributing to coastline sedimentation (erosion) as well. Additionally it can cause landsliding leading to mudflows if there is too much rain for soil conditions that may create a dangerous flow of debris downstream from eroding hillsi
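
The answer above still opens with "Let's think step by step." even though a new template was defined. A likely explanation is that `llm_chain` was created with the earlier prompt object, and assigning a new value to the name `prompt` does not change what an existing chain holds. The sketch below reproduces that behavior in plain Python; `TinyChain` is a hypothetical stand-in, not a LangChain class.

```python
class TinyChain:
    """Hypothetical stand-in for a chain that stores its prompt at construction time."""

    def __init__(self, prompt: str) -> None:
        self.prompt = prompt

    def render(self, question: str) -> str:
        return self.prompt.format(question=question)


question = "What happens when it rains somewhere?"

prompt = "Question: {question}\n\nAnswer: Let's think step by step."
chain = TinyChain(prompt)

# Rebinding the name `prompt` creates a new string; `chain` still holds the old one.
prompt = "Question: {question}\n\nAnswer: Let's answer in two sentences while being funny."
stale = chain.render(question)

# Building a new chain picks up the new prompt.
chain = TinyChain(prompt)
fresh = chain.render(question)

print(stale.endswith("step by step."), fresh.endswith("being funny."))
```

In the notebook, the corresponding step would be to run `llm_chain = LLMChain(prompt=prompt, llm=llm)` again after defining the new template and before calling `llm_chain.run(question)`.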