
# Customizing Large Language Models with Additional Input

## Table of Contents

1. [Customizing Large Language Models](#introduction)
2. [Question-Answering LLMs](#qa)
3. [Setting up the Environment](#setup)
4. [Paper QA](#paper)
5. [Demo](#demo)

---
        


## 1. Customizing Large Language Models <a name="introduction"></a>

 Customizing Large Language Models (LLMs) with additional data is a powerful way
 to tailor their capabilities to specific tasks or domains. This process, often
 referred to as "fine-tuning," involves training the model on a new dataset that
 is related to the specific task at hand. The new data effectively guides the
 model to adjust its internal parameters and better align its language
 generation capabilities with the desired task. For instance, you might
 fine-tune a general-purpose language model on medical literature to create a
 model that excels at answering medical questions. Or you could fine-tune a
 model on customer support transcripts to create a chatbot that understands the
 specific language and issues related to a particular product or service.
 Fine-tuning allows us to leverage the power of LLMs that have been trained on
 vast amounts of data, while still creating models that are highly specialized
 and effective in specific domains or tasks.
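
As a toy illustration of what "adjusting internal parameters" means, the sketch below continues gradient descent on a one-parameter model using a small new dataset. It is not how an LLM is fine-tuned in practice (real fine-tuning updates millions or billions of weights with a deep-learning framework). The model `y = w * x`, both datasets, and the function name `train` are assumptions made purely for illustration.

```python
def train(w, data, lr=0.01, steps=200):
    """Run plain gradient descent on mean squared error for the model y = w * x."""
    for _ in range(steps):
        # d/dw of mean((w*x - y)^2) is mean(2 * (w*x - y) * x)
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w

# "Pre-training": a general dataset where y = 2 * x
general_data = [(x, 2.0 * x) for x in range(1, 6)]
w_pretrained = train(0.0, general_data)

# "Fine-tuning": start from the pre-trained weight and continue training
# on a small, task-specific dataset where y = 3 * x
task_data = [(1, 3.0), (2, 6.0), (3, 9.0)]
w_finetuned = train(w_pretrained, task_data, steps=50)

print(f"pre-trained w = {w_pretrained:.3f}, fine-tuned w = {w_finetuned:.3f}")
```

The point of the sketch is only that fine-tuning starts from already-learned parameters and moves them toward the new data, rather than training from scratch.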
 
## 2. Question-Answering LLMs <a name="qa"></a>

Question-Answering (QA) Large Language Models are a specialized application of LLMs that have been fine-tuned to answer questions based on provided context or broad knowledge learned during training. These models can interpret a wide range of questions and provide precise answers, making them extremely useful in applications like chatbots, virtual assistants, and customer service automation.

Some QA models are designed to generate answers based on a specific piece of text or a set of documents, while others can answer questions based on a broad range of general knowledge. The latter, known as "open-domain" QA models, can answer questions about virtually any topic, drawing on the vast amounts of information they were trained on. Examples of open-domain QA models include GPT-3 by OpenAI and T5 by Google. These models have significantly advanced the field of natural language understanding and opened up new possibilities for AI-powered question answering.


## 3. Setting up the Environment <a name="setup"></a>

Before we start coding, we need to install the necessary libraries and set an OpenAI API key. Run the following cells in your Jupyter notebook:
        

In [1]:
%pip install paper-qa        

Defaulting to user installation because normal site-packages is not writeable
Note: you may need to restart the kernel to use updated packages.


In [None]:
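import os
from getpass import getpass

api_key = os.environ.get("OPENAI_API_KEY")
if not api_key:
    # getpass hides the key as you type, so it is not echoed into the notebook output
    api_key = getpass("OPENAI_API_KEY is not defined, please enter it: ")
    os.environ["OPENAI_API_KEY"] = api_key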

In [3]:

import paperqa
print('PaperQA version:', paperqa.__version__)

# Required for Jupyter Notebook
import nest_asyncio
nest_asyncio.apply()    

PaperQA version: 3.5.0



## 4. Paper QA <a name="paper"></a>

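Paper QA is a minimal package for question answering over PDFs, HTML, or raw text files. It aims to give very good answers and to avoid hallucinations by grounding responses in the source documents with in-text citations.

Note that Paper QA does not fine-tune the model. The model's weights stay fixed; the documents you add are searched at query time and the relevant passages are placed in the prompt, an approach commonly called retrieval-augmented generation.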

By default, it uses [OpenAI Embeddings](https://platform.openai.com/docs/guides/embeddings) with a vector DB called [FAISS](https://github.com/facebookresearch/faiss) to embed and search documents. However, via [langchain](https://github.com/hwchase17/langchain) you can use open-source models or embeddings (see details below).

PaperQA uses the process shown below:

1. embed docs into vectors
2. embed query into vector
3. search for top k passages in docs
4. create summary of each passage relevant to query
5. put summaries into prompt
6. generate answer with prompt
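
The steps above can be sketched in plain Python. This is a minimal illustration, not Paper QA's implementation: the bag-of-words `embed` function stands in for a real embedding model, the linear scan stands in for a FAISS index, the summarization and generation steps (4 and 6) that would call an LLM are omitted, and all function names and example passages are hypothetical.

```python
import math
import re
from collections import Counter

def embed(text):
    """Toy embedding: a bag-of-words count vector (stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def top_k(query, passages, k=2):
    """Steps 1-3: embed passages and query, then return the k most similar passages."""
    q = embed(query)
    ranked = sorted(passages, key=lambda p: cosine(q, embed(p)), reverse=True)
    return ranked[:k]

def build_prompt(query, context):
    """Step 5: place the retrieved passages into the prompt that would be sent to the LLM."""
    joined = "\n".join(f"- {c}" for c in context)
    return f"Answer using only the context below.\n\nContext:\n{joined}\n\nQuestion: {query}"

passages = [
    "The Justice40 Initiative directs 40 percent of certain Federal investment benefits to disadvantaged communities.",
    "FAISS is a library for efficient similarity search over dense vectors.",
    "Super Bowl winners are decided in an annual NFL championship game.",
]
question = "What is the Justice40 initiative?"
prompt = build_prompt(question, top_k(question, passages, k=1))
print(prompt)
```

Because the answer is generated only from retrieved passages, a query against an empty document set has nothing to ground on, which is why the first query in the demo below reports insufficient information.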

## 5. Demo <a name="demo"></a>
        

### Before adding a document

In [6]:
from paperqa import Docs

# Pricing info for OpenAI models: https://openai.com/pricing#language-models
# Choose one model; the gpt-4 line is kept commented out as an alternative
# docs = Docs(llm='gpt-4')  # Better model, but more expensive
docs = Docs(llm='gpt-3.5-turbo')  # Faster and cheaper
answer = docs.query("What is Justice40 initiative?")
print(answer.formatted_answer)


Question: What is Justice40 initiative?

I cannot answer this question due to insufficient information.



### Add a document describing the Justice40 Initiative

In [7]:
import time
start = time.time()
docs.add_url('https://www.whitehouse.gov/environmentaljustice/justice40/')
end = time.time()
print(f'Adding the document took {(end-start):.2f} seconds')


Adding the document took 4.53 seconds


### After adding the document


In [8]:
start = time.time()
answer = docs.query("What is Justice40 initiative?")
end = time.time()
print(answer.formatted_answer)
print(f'Retrieval and query took {(end-start):.2f} seconds')


Question: What is Justice40 initiative?

The Justice40 Initiative is a commitment made by the Federal Government to ensure that 40 percent of the overall benefits of certain Federal investments flow to disadvantaged communities that are marginalized, underserved, and overburdened by pollution. It covers a range of investments including climate change, clean energy and energy efficiency, clean transit, affordable and sustainable housing, training and workforce development, remediation and reduction of legacy pollution, and the development of critical clean water and wastewater infrastructure. The initiative is a continuous effort to improve how government programs deliver benefits to disadvantaged communities and aims to address underinvestment, environmental injustice, and the climate crisis (The2023 chunk 21, The2023 chunk 22).

References

1. (The2023 chunk 21): "The White House." The White House, 2023, www.whitehouse.gov/environmentaljustice/justice40/.

2. (The2023 chunk 22): "The 

## Running a local model

You can run Paper QA with any local model supported by LangChain. Note that other tools for question answering over your own documents are also available, such as https://www.llamaindex.ai/ and https://github.com/imartinez/privateGPT.

In [8]:
%pip install llama-cpp-python
%pip install gpt4all

Defaulting to user installation because normal site-packages is not writeable
Note: you may need to restart the kernel to use updated packages.
Defaulting to user installation because normal site-packages is not writeable
Collecting gpt4all
  Obtaining dependency information for gpt4all from https://files.pythonhosted.org/packages/32/74/542dbc9e58cc92b07bfb8c38f7a4d4d9057da5b50a8379dd16304c6d49ec/gpt4all-1.0.8-py3-none-manylinux1_x86_64.whl.metadata
  Downloading gpt4all-1.0.8-py3-none-manylinux1_x86_64.whl.metadata (912 bytes)
Downloading gpt4all-1.0.8-py3-none-manylinux1_x86_64.whl (4.2 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 4.2/4.2 MB 64.9 MB/s eta 0:00:00
Installing collected packages: gpt4all
Successfully installed gpt4all-1.0.8
Note: you may need to restart the kernel to use updated packages.


In [8]:
from paperqa import Docs
from langchain.llms import LlamaCpp
from langchain.embeddings import LlamaCppEmbeddings
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler


# Pick ONE model file; the alternatives are kept commented out for reference.
# Make sure the model path is correct for your system!
#model_path="/global/cfs/cdirs/m4388/Project1-AI4EJ/Tutorials/models/alpaca-native-7B-ggml/ggml-model-q8_0.bin"
#model_path="/global/cfs/cdirs/m4388/Project1-AI4EJ/Tutorials/models/Llama-2-7B-Chat-GGML/llama-2-7b-chat.ggmlv3.q8_0.bin"
#model_path="/global/cfs/cdirs/m4388/Project1-AI4EJ/Tutorials/models/Llama-2-7B-Chat-GGML/llama-2-7b-chat.ggmlv3.q4_1.bin"
model_path="/global/cfs/cdirs/m4388/Project1-AI4EJ/Tutorials/models/Llama-2-7B-Chat-GGML/llama-2-7b-chat.ggmlv3.q2_K.bin"
#model_path="/global/cfs/cdirs/m4388/Project1-AI4EJ/Tutorials/models/ggml-gpt4all-l13b"

# Callbacks support token-wise streaming
callbacks = [StreamingStdOutCallbackHandler()]

# LlamaCpp model: the same model file is used for generation and for embeddings
llm = LlamaCpp(model_path=model_path, callbacks=callbacks, n_ctx=4096)
embeddings = LlamaCppEmbeddings(model_path=model_path)

# GPT4All model (alternative). To use it, also import GPT4All from langchain.llms
# and GPT4AllEmbeddings from langchain.embeddings.
# Verbose is required to pass to the callback manager
# llm = GPT4All(model=model_path, callbacks=callbacks, verbose=True)
# embeddings = GPT4AllEmbeddings(model_path=model_path)

# No documents have been added yet, so this query has nothing to retrieve from
docs = Docs(llm=llm, embeddings=embeddings)
answer = docs.query("What is Justice40 initiative?")
print(answer.formatted_answer)

llama.cpp: loading model from /global/cfs/cdirs/m4388/Project1-AI4EJ/Tutorials/models/Llama-2-7B-Chat-GGML/llama-2-7b-chat.ggmlv3.q2_K.bin
llama_model_load_internal: format     = ggjt v3 (latest)
llama_model_load_internal: n_vocab    = 32000
llama_model_load_internal: n_ctx      = 4096
llama_model_load_internal: n_embd     = 4096
llama_model_load_internal: n_mult     = 256
llama_model_load_internal: n_head     = 32
llama_model_load_internal: n_head_kv  = 32
llama_model_load_internal: n_layer    = 32
llama_model_load_internal: n_rot      = 128
llama_model_load_internal: n_gqa      = 1
llama_model_load_internal: rnorm_eps  = 1.0e-06
llama_model_load_internal: n_ff       = 11008
llama_model_load_internal: freq_base  = 10000.0
llama_model_load_internal: freq_scale = 1
llama_model_load_internal: ftype      = 10 (mostly Q2_K)
llama_model_load_internal: model size = 7B
llama_model_load_internal: ggml ctx size =    0.08 MB
llama_model_load_internal: using CUDA for GPU acceleration
llama_model_

Question: What is Justice40 initiative?

I cannot answer this question due to insufficient information.



llama_new_context_with_model: kv self size  =  512.00 MB
AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | VSX = 0 | 


In [None]:
import time
start = time.time()
docs.add_url('https://www.whitehouse.gov/environmentaljustice/justice40/')
end = time.time()
print(f'Adding the document took {(end-start):.2f} seconds')

answer = docs.query("What is Justice40 initiative?")
print(answer.formatted_answer)

In [9]:
from langchain import PromptTemplate, LLMChain
from langchain.llms import GPT4All
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler

local_path = "/global/cfs/cdirs/m4388/Project1-AI4EJ/Tutorials/models/ggml-gpt4all-l13b/ggml-gpt4all-l13b-snoozy.bin"
template = """Question: {question}

Answer: Let's think step by step."""

prompt = PromptTemplate(template=template, input_variables=["question"])
# Callbacks support token-wise streaming
callbacks = [StreamingStdOutCallbackHandler()]

# Verbose is required to pass to the callback manager
llm = GPT4All(model=local_path, callbacks=callbacks, verbose=True, n_threads=6)

# If you want to use a custom model add the backend parameter
# Check https://docs.gpt4all.io/gpt4all_python.html for supported backends
# llm = GPT4All(model=local_path, backend="gptj", callbacks=callbacks, verbose=True)
llm_chain = LLMChain(prompt=prompt, llm=llm)
question = "What NFL team won the Super Bowl in the year Justin Bieber was born?"

llm_chain.run(question)

Found model file at  /global/cfs/cdirs/m4388/Project1-AI4EJ/Tutorials/models/ggml-gpt4all-l13b/ggml-gpt4all-l13b-snoozy.bin


llama.cpp: loading model from /global/cfs/cdirs/m4388/Project1-AI4EJ/Tutorials/models/ggml-gpt4all-l13b/ggml-gpt4all-l13b-snoozy.bin
llama_model_load_internal: format     = ggjt v1 (latest)
llama_model_load_internal: n_vocab    = 32000
llama_model_load_internal: n_ctx      = 2048
llama_model_load_internal: n_embd     = 5120
llama_model_load_internal: n_mult     = 256
llama_model_load_internal: n_head     = 40
llama_model_load_internal: n_layer    = 40
llama_model_load_internal: n_rot      = 128
llama_model_load_internal: ftype      = 2 (mostly Q4_0)
llama_model_load_internal: n_ff       = 13824
llama_model_load_internal: n_parts    = 1
llama_model_load_internal: model size = 13B
llama_model_load_internal: ggml ctx size =  17.47 KB
llama_model_load_internal: mem required  = 3976.51 MB (+ 1608.00 MB per state)
error loading model: llama.cpp: tensor 'layers.9.attention_norm.weight' is missing from model
llama_init_from_file: failed to load model
LLAMA ERROR: failed to load model from /glo

Exception: Model not loaded

In [10]:
!CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip install --upgrade --force-reinstall llama-cpp-python --no-cache-dir

/bin/bash: pip: command not found
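
The `pip: command not found` error above comes from the shell that `!` spawns, which may not have `pip` on its `PATH`. One common workaround, sketched below, is to invoke pip as a module of the interpreter the kernel is running and to pass the build flags through the subprocess environment. The helper names `build_install_command` and `build_env` are hypothetical, and the actual install call is left commented out because it rebuilds the package from source.

```python
import os
import subprocess
import sys

def build_install_command(package):
    """Return a pip command bound to the current interpreter (sys.executable)."""
    return [sys.executable, "-m", "pip", "install",
            "--upgrade", "--force-reinstall", "--no-cache-dir", package]

def build_env(extra):
    """Copy the current environment and add build-time variables for this call only."""
    env = dict(os.environ)
    env.update(extra)
    return env

cmd = build_install_command("llama-cpp-python")
env = build_env({"CMAKE_ARGS": "-DLLAMA_CUBLAS=on", "FORCE_CMAKE": "1"})
print(" ".join(cmd))

# Uncomment to run the install (rebuilds llama-cpp-python from source):
# subprocess.run(cmd, env=env, check=True)
```

The cells that follow take a different route to the same goal: they set the two variables in `os.environ` and then use the `%pip` magic, which also installs into the kernel's own environment.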


In [2]:
import os
os.environ["CMAKE_ARGS"] = "-DLLAMA_CUBLAS=on"
os.environ["FORCE_CMAKE"] = "1"

In [3]:
%pip install --upgrade --force-reinstall llama-cpp-python --no-cache-dir

Defaulting to user installation because normal site-packages is not writeable
Collecting llama-cpp-python
  Downloading llama_cpp_python-0.1.77.tar.gz (1.6 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.6/1.6 MB 70.7 MB/s eta 0:00:00
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Collecting typing-extensions>=4.5.0 (from llama-cpp-python)
  Obtaining dependency information for typing-extensions>=4.5.0 from https://files.pythonhosted.org/packages/ec/6b/63cc3df74987c36fe26157ee12e09e8f9db4de771e0f3404263117e75b95/typing_extensions-4.7.1-py3-none-any.whl.metadata
  Downloading typing_extensions-4.7.1-py3-none-any.whl.metadata (3.1 kB)
Collecting numpy>=1.20.0 (from llama-cpp-python)
  Obtaining dependency information for numpy>=1.20.0 from https://files.pythonhosted.org/packages/32/6a/65dbc57a89078af9ff8bfcd4c0761a50