# Using open source models

Using OpenAI models costs money. In my project on summarizing headlines, it cost me $1.24 (about INR 100) to summarize the articles and create bulleted headlines for the top 15 news articles from NDTV.com. For comparison, a newspaper costs around 12 cents (INR 10).<br><br>
Hence, it makes more sense to use an open-source model such as GPT4All as an alternative.

### 1. Downloading the GPT4All model

<b>Manual download from browser</b>

The model can be manually downloaded from https://the-eye.eu/public/AI/models/nomic-ai/gpt4all/gpt4all-lora-quantized-ggml.bin <br><br>
Once the model is downloaded, create a new directory named "models" and move the model file into it.

<b>Download the files using python</b>

In [1]:
import os
import requests
from tqdm import tqdm

In [2]:
url='https://the-eye.eu/public/AI/models/nomic-ai/gpt4all/gpt4all-lora-quantized-ggml.bin' #This is the same url referred to earlier

In [3]:
os.makedirs('models', exist_ok=True)  # Create the "models" directory if it does not already exist

response=requests.get(url, stream=True) # Stream the response so the file arrives in chunks instead of being held in memory all at once
response.raise_for_status() # Stop early if the server returned an error status

path='./models/gpt4all-lora-quantized-ggml.bin' # Where the downloaded model is saved
with open(path,'wb') as f:
    for block in tqdm(response.iter_content(chunk_size=8192)): # tqdm shows progress, one iteration per chunk of up to 8192 bytes
        if block:
            f.write(block)


514266it [12:06, 708.20it/s] 
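The progress line above counts chunks rather than bytes. As a rough sanity check, and assuming every chunk is the full 8192 bytes requested from `iter_content` (the last one is usually shorter, so this is an upper bound), the chunk count can be turned into an approximate file size:

```python
# Approximate download size from the tqdm chunk counter shown above.
chunks_written = 514266   # iterations reported by tqdm
chunk_size = 8192         # bytes per chunk, as passed to iter_content

# Upper bound on the size: the final chunk may be shorter than chunk_size.
approx_bytes = chunks_written * chunk_size
approx_gib = approx_bytes / 1024**3

print(f"~{approx_bytes:,} bytes ({approx_gib:.2f} GiB)")
```

If the file on disk is far smaller than this estimate, the download was probably interrupted and should be repeated.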


### 2. Quantizing the model

Most models store their weights as 32-bit floating-point numbers, which take up a huge amount of memory. <a href="https://github.com/ggerganov">Georgi Gerganov</a> has created a library, llama.cpp, that allows us to run the LLaMA model using 4-bit integer quantization. This library is massively helpful.

<center>
<img src="images\5_Quantization.png">
<sub><i>Model Comparison(32 bit vs 8 bit vs 1 bit)</i></sub>
</center>

For more details on Quantization visit <a href="https://medium.com/@satya15july_11937/network-optimization-with-quantization-8-bit-vs-1-bit-af2fd716fcae">  Network Optimization with Quantization — 8 bit vs 1 bit </a>
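To make the idea concrete, here is a minimal sketch of 4-bit quantization in plain Python. It is a toy illustration under simplifying assumptions (one shared scale for all weights, symmetric rounding) and not the scheme llama.cpp actually implements; the function names and sample weights are made up for the example.

```python
def quantize_4bit(weights):
    """Map floats to integers in [-8, 7] using a single shared scale."""
    max_abs = max(abs(w) for w in weights)
    # The largest magnitude maps to 7; fall back to 1.0 for all-zero input.
    scale = max_abs / 7 if max_abs else 1.0
    quantized = [max(-8, min(7, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate floats from the 4-bit integers."""
    return [q * scale for q in quantized]


weights = [0.12, -0.53, 0.98, -1.40, 0.05]   # hypothetical float32 weights
quantized, scale = quantize_4bit(weights)
restored = dequantize(quantized, scale)

for original, q, approx in zip(weights, quantized, restored):
    print(f"{original:+.2f} -> {q:+d} -> {approx:+.2f}")
```

Each weight now needs 4 bits instead of 32, at the price of a rounding error of at most half the scale per weight.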

<b>NOTES:</b><br>To quantize the model, run the commands below from a command prompt (terminal) and not from a Python scripting environment / IDE / Jupyter notebook.<br>
You will also need to have the following installed:<br>
<ul><li><b>Git</b>: Git is a standalone tool rather than a Python package, so install it from https://git-scm.com or your system's package manager instead of pip.</li>
<li><b>Sentencepiece</b></li></ul>

`pip install sentencepiece`<br>
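Before starting, it can help to confirm that both prerequisites are visible from Python. The following is a small sketch using only the standard library; the helper name `check_prerequisites` is made up for this example.

```python
import importlib.util
import shutil


def check_prerequisites():
    """Report whether git is on the PATH and sentencepiece is importable."""
    return {
        "git": shutil.which("git") is not None,
        "sentencepiece": importlib.util.find_spec("sentencepiece") is not None,
    }


for name, available in check_prerequisites().items():
    print(f"{name}: {'found' if available else 'MISSING'}")
```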

**PROCESS:**<br>

1. Download Georgi Gerganov's llama.cpp repository from GitHub<br>
    `git clone https://github.com/ggerganov/llama.cpp.git`<br><br>
2. Navigate to the llama.cpp directory. We need a specific commit of the repository for this exercise - "2b26469"<br>
    `cd llama.cpp && git checkout 2b26469`<br><br>
3. Convert the data type of the model we downloaded earlier to 4-bit integers. The paths in this command are relative to the directory that contains both "llama.cpp" and "models", so return to that directory first (`cd ..`)<br>
    `python3 llama.cpp/convert.py ./models/gpt4all-lora-quantized-ggml.bin`

### 3. Run the model

1. To run the model, we need to install <b>'pyllamacpp'</b>. This is used by Langchain's GPT4All package to load the model from the path that we direct it to. If we do not install this library, we get an error:<br>
<b>ValidationError</b>: 1 validation error for GPT4All<br>
__root__<br>
  Could not import pyllamacpp python package. Please install it with `pip install pyllamacpp`. (type=value_error)

` !pip install pyllamacpp`<br><br>
2. If you are following the tutorial, you will be asked to upgrade langchain to v0.0.152 via `pip install -q langchain==0.0.152`. This is because the callback manager (used in the exercise below) is not part of older versions. Without the upgrade, we get an error:
`ImportError: cannot import name 'CallbackManager' from 'langchain.callbacks' (D:\miniconda3\envs\llms\lib\site-packages\langchain\callbacks\__init__.py)`

However, this breaks the model loading process with an error:<br><br>
`Gpt4all Model.__init__() got an unexpected keyword argument 'ggml_model' (type=type_error).`

This is solved by rolling back the version of the pygpt4all package (which langchain uses to load the model) as follows:<br>

`!pip install pygpt4all==v1.0.1 --force-reinstall`

<sub><a href="https://github.com/hwchase17/langchain/issues/3839#issuecomment-1534160447">Credit</a></sub>
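Since the errors above all come down to which package versions are installed, it is worth checking them before loading the model. This is a minimal sketch using the standard library's `importlib.metadata`; the helper name `installed_versions` is made up for this example.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Return {package: version string, or None if it is not installed}."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


# The three packages discussed in this section.
print(installed_versions(["langchain", "pygpt4all", "pyllamacpp"]))
```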

In [3]:
# !pip install -q langchain==0.0.152
# !pip install pyllamacpp==

In [4]:
# !pip install gpt4all

In [5]:
from langchain.llms import GPT4All
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain
from langchain.chains.summarize import load_summarize_chain
from langchain.callbacks import CallbackManager
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler

In [6]:
callback_manager=CallbackManager([StreamingStdOutCallbackHandler()])  # the handler must be an instance, not the class itself

#### a. Load the model

In [7]:
llm=GPT4All(model="models/ggml-model-q4_0.bin",callback_manager=callback_manager, verbose=True)  # assumes the converted model sits in the "models" directory created earlier

#### b. Create the prompt template

In [8]:
template1 = """Question: {question}

Answer: Let's think step by step."""

In [9]:
template2 = """Question: {question}

Answer: Let's answer in two sentence while being funny."""

In [10]:
prompt1=PromptTemplate(
    template=template1,
    input_variables=['question']
)

prompt2=PromptTemplate(
    template=template2,
    input_variables=['question']
)

In [11]:
question1='What happens when it rains somewhere?'

In [12]:
print(prompt1.format_prompt(question=question1))

text="Question: What happens when it rains somewhere?\n\nAnswer: Let's think step by step."
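For a template with a single placeholder like this one, the substitution is essentially what Python's built-in `str.format` does. The sketch below reproduces the formatted text without langchain, as an illustration of the mechanism rather than a description of `PromptTemplate`'s internals.

```python
# Same template text as template1 above.
template1 = """Question: {question}

Answer: Let's think step by step."""

# Fill the {question} placeholder with plain string formatting.
filled = template1.format(question="What happens when it rains somewhere?")
print(repr(filled))
```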


#### c. Create the chain

In [13]:
llmchain=LLMChain(llm=llm,prompt=prompt1)

#### d. Run the model

In [14]:
response=llmchain.run(question1)

In [15]:
print(response)

 Question: What happens when it rains somewhere?

Answer: Let's think step by step. First, let us consider the basics of rain. Raindrops form from clouds that have reached a moisture level sufficient to seed out as individual droplets or larger clusters (known as hydrometeors). Cloud condensation nuclei are responsible for providing surfaces upon which precipitation can condense into these tiny water particles, and there's always going to be some available if the air is cool enough. When it rains somewhere on Earth, clouds will form due to atmospheric instability or changes in wind direction/strength (or both), which then lead to more cumulus formation within those particular updraft regions where moisture can condense out of the upper atmosphere onto individual nuclei and eventually turn into rain. As precipitation falls through these clouds, it gets caught up by their momentum; after a while its motion slows enough for some particles or parcels that are denser than air to begin falli

In [16]:
print(prompt2.format_prompt(question=question1))

text="Question: What happens when it rains somewhere?\n\nAnswer: Let's answer in two sentence while being funny."


In [17]:
llmchain=LLMChain(llm=llm,prompt=prompt2)

In [18]:
response=llmchain.run(question1)

In [19]:
print(response)

 Question: What happens when it rains somewhere?

Answer: Let's answer in two sentence while being funny. 1) Water from the sky falls on Earth and creates puddles that make people wet, or even floods where they could get stuck if driving through them.2 )Rainfall is measured with rain gauges to monitor water levels of lakes, rivers etc for planning purposes in infrastructural development projects
  


#### Test for larger text

In [20]:
import pickle
from langchain.chains.summarize import load_summarize_chain
from langchain.docstore.document import Document
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.prompts import PromptTemplate

In [21]:
def isEnglish(s):
    """Return True only if s is pure ASCII (a rough proxy for English text)."""
    try:
        s.encode(encoding='utf-8').decode('ascii')
    except UnicodeDecodeError:
        return False
    else:
        return True
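Note that this check is really an ASCII test, so any text with accented or non-Latin characters is treated as "not English". A short sketch with made-up sample strings shows the behaviour (the function is repeated so the block runs on its own):

```python
def isEnglish(s):
    # Same logic as above: True only when every character is ASCII.
    try:
        s.encode(encoding='utf-8').decode('ascii')
    except UnicodeDecodeError:
        return False
    else:
        return True


samples = ["Schools reopen in Manipur", "नमस्ते", "café"]   # hypothetical inputs
checks = {text: isEnglish(text) for text in samples}
for text, is_ascii in checks.items():
    print(f"{text!r}: {is_ascii}")
```

An English sentence containing a single curly quote or accented letter would also fail the check.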

In [22]:
with open('data.pkl','rb') as f:
    news_dict=pickle.load(f)

In [23]:
llm=GPT4All(model="models/ggml-model-q4_0.bin",callback_manager=callback_manager, verbose=True)  # assumes the converted model sits in the "models" directory created earlier
summarizer = load_summarize_chain(llm, chain_type='refine', verbose=True)

In [24]:
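summary_dict = {}
errors=[]
for k, v in news_dict.items():
    # v[0] is the headline, v[1] and v[2] hold the article text, v[3] is the URL
    if 'No content' in v[2]:
        continue  # Skip articles whose body could not be scraped
    hl = v[0]
    # Join the two text fields and drop the blank lines between paragraphs
    sub_text=(v[1]+v[2]).replace('\n\n\n',"").replace('\n\n','')
    if not isEnglish(sub_text):
        # Non-ASCII text is passed on as UTF-8 encoded bytes
        sub_text=sub_text.encode("utf-8")
    d = Document(page_content=sub_text)
    result = summarizer.run([d])
    url = v[3]
    # Store the inputs, the summary, and both lengths for later comparison
    summary_dict[k] = (hl, url, sub_text, result, len(sub_text), len(result))



> Entering new RefineDocumentsChain chain...


> Entering new LLMChain chain...
Prompt after formatting:
Write a concise summary of the following: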


"The change, which saw state BJP chief Bandi Sanjay replaced with Kishan Reddy, was triggered by rising internal dissent within the party, sources say.G Kishan Reddy is believed to have a more agreeable demeanour towards the BRS.Hyderabad: The BJP's unexpected leadership change in Telangana on Tuesday marked a significant shift in the state's political landscape with less than six months to go before elections, but the makings of the announcement have been there for a while now.The change, which saw state BJP chief Bandi Sanjay replaced with Union Minister G Kishan Reddy, was triggered by rising internal dissent within the party, unusual for the typically disciplined BJP, sources say. This comes on the heels of the party's recent defeat in the Karnataka elections.Bandi Sanjay, often praised publicly by Prime 

In [25]:
for i in summary_dict:
    print('\033[1m',summary_dict[i][0],'\033[0m')
    print(summary_dict[i][3])
    print('\n')

 6 Months Before Telangana Polls, BJP's Big Change Suggests A Reset



 Red Magic 8S Pro+, Red Magic 8S Pro With Up to 24GB RAM, Snapdragon 8 Gen 2 SoC Launched: All Details



 Man Rapes Girl For 5 Years, Arrested In "Love Jihad" Case: Gujarat Cops



 UK Man Found Guilty Of Murdering Neighbour, Two Children In House Fire



 Why Sena And Not BJP? Sharad Pawar's Point-By-Point Rebuttal To Rebels



 "Petrol Will Be Sold At Rs 15 Per Litre If...": What Minister Nitin Gadkari Said



 Bigg Boss Star Palak Purswani Reveals Her Dad "Had A Heart Attack" After Her Break Up With Avinash Sachdev



 Urban, Rural India Similarly Polluted, But Mitigation Focuses On Cities: Report



 Schools Reopen In Manipur, Attendance Extremely Low On The First Day



 Rahul Gandhi's Big Attack On BJP Over Madhya Pradesh Urinating Incident



 Amid HDFC Merger, Deepak Parekh's Offer Letter Goes Viral. His Salary Was

<b> The GPT4All model fails completely on news-related summarization. </b>
<ul>
    <li>A large part of this could be because it has not been trained on India-specific context such as names, news, vocabulary, etc. However, it does do well on general topics, as shown above.</li>
<li>This could also be because, if the tokenization dictionary does not contain the proper nouns, it will mark almost all of them with the numerical representation of the <'UNK'> token and will hence be unable to find any relation between words that share the same token representation.</li>