## GPT4All Basics

GPT4All is one of the few free, locally running LLMs cleared for commercial use.

**Note:** After installing `gpt4all` with pip, models are stored in `~/.cache/gpt4all/` by default.
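
To check which models have already been downloaded, that directory can be listed. This is a minimal sketch, assuming the default location from the note above and the `.bin` extension that appears in the load log later in this notebook. `list_cached_models` is a hypothetical helper, not part of `gpt4all`.

```python
from pathlib import Path

def list_cached_models(cache_dir=None):
    """Return sorted names of .bin files in the model cache (empty list if absent)."""
    # Assumption: the default cache location mentioned in the note above.
    directory = Path(cache_dir) if cache_dir else Path.home() / ".cache" / "gpt4all"
    if not directory.is_dir():
        return []
    return sorted(p.name for p in directory.glob("*.bin"))

print(list_cached_models())
```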

In [1]:
import gpt4all

In [3]:
gptj = gpt4all.GPT4All("ggml-gpt4all-j-v1.3-groovy")

Found model file.
gptj_model_load: loading model from '/Users/account/.cache/gpt4all/ggml-gpt4all-j-v1.3-groovy.bin' - please wait ...
gptj_model_load: n_vocab = 50400
gptj_model_load: n_ctx   = 2048
gptj_model_load: n_embd  = 4096
gptj_model_load: n_head  = 16
gptj_model_load: n_layer = 28
gptj_model_load: n_rot   = 64
gptj_model_load: f16     = 2
gptj_model_load: ggml ctx size = 5401.45 MB
gptj_model_load: kv self size  =  896.00 MB
gptj_model_load: ................................... done
gptj_model_load: model size =  3609.38 MB / num tensors = 285


In [6]:
messages = [{"role": "user", "content": "Name 3 colors"}]
gptj.chat_completion(messages)

### Instruction: 
            The prompt below is a question to answer, a task to complete, or a conversation 
            to respond to; decide which and write an appropriate response.
            
### Prompt: 
Name 3 colors
### Response:
 Blue, Green and Orange


{'model': 'ggml-gpt4all-j-v1.3-groovy',
 'usage': {'prompt_tokens': 239, 'completion_tokens': 23, 'total_tokens': 262},
 'choices': [{'message': {'role': 'assistant',
    'content': ' Blue, Green and Orange'}}]}

Storing the prompt in its own variable, separate from the message list, should make it easier to change.

In [7]:
prompt = "Name 3 colors"
messages = [{"role": "user", "content": prompt}]
gptj.chat_completion(messages)

### Instruction: 
            The prompt below is a question to answer, a task to complete, or a conversation 
            to respond to; decide which and write an appropriate response.
            
### Prompt: 
Name 3 colors
### Response:
 Blue, Green and Red


{'model': 'ggml-gpt4all-j-v1.3-groovy',
 'usage': {'prompt_tokens': 239, 'completion_tokens': 20, 'total_tokens': 259},
 'choices': [{'message': {'role': 'assistant',
    'content': ' Blue, Green and Red'}}]}
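
One detail worth checking in the two outputs above: the `completion_tokens` values look like character counts rather than model tokens. The sketch below compares the reported figure with `len()` of each recorded response. It covers only these two responses, so treat it as an observation about this run, not as a statement about how the library computes `usage`.

```python
# Responses and usage figures copied from the two outputs above.
recorded = [
    {"content": " Blue, Green and Orange", "completion_tokens": 23},
    {"content": " Blue, Green and Red", "completion_tokens": 20},
]

def matches_char_count(entry):
    """True if the reported completion_tokens equals the character count."""
    return len(entry["content"]) == entry["completion_tokens"]

for entry in recorded:
    print(repr(entry["content"]), len(entry["content"]), matches_char_count(entry))
```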

In [34]:
prompt = "Name 3 colors"
messages = [{"role": "user", "content": prompt}]
response = gptj.chat_completion(messages)
print(response['choices'][0]['message']['content'])

### Instruction: 
            The prompt below is a question to answer, a task to complete, or a conversation 
            to respond to; decide which and write an appropriate response.
            
### Prompt: 
Name 3 colors
### Response:
 Blue, Green and Red
 Blue, Green and Red


TODO: see if API documentation exists to have `GPT4All` output just the response 
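
Until that is settled, one workaround is to discard whatever the call prints and keep only its return value. This is a minimal sketch using the standard library's `contextlib.redirect_stdout`. The names `call_quietly`, `response_text`, and `fake_chat_completion` are hypothetical; the last one is a stand-in that mimics the print-then-return behavior seen above so the example runs without a model.

```python
import contextlib
import io

def call_quietly(fn, *args, **kwargs):
    """Call fn, discard what it writes to Python's sys.stdout, return its result."""
    with contextlib.redirect_stdout(io.StringIO()):
        return fn(*args, **kwargs)

def response_text(response):
    """Pull the assistant's text out of a chat_completion-style dict."""
    return response["choices"][0]["message"]["content"].strip()

# Hypothetical stand-in: prints the prompt, then returns a dict shaped like
# the recorded outputs above.
def fake_chat_completion(messages):
    print("### Prompt:", messages[0]["content"])
    return {"choices": [{"message": {"role": "assistant",
                                     "content": " Blue, Green and Red"}}]}

messages = [{"role": "user", "content": "Name 3 colors"}]
response = call_quietly(fake_chat_completion, messages)
print(response_text(response))
```

With the real model the call would be `call_quietly(gptj.chat_completion, messages)`. Whether that silences everything depends on how the installed version prints: `redirect_stdout` only captures writes made through Python's `sys.stdout`, so output emitted directly by native code would still appear.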

### Summarizing

In [4]:
# Wikipedia article on LLMs
text = "A large language model (LLM) is a language model consisting of a neural network with many parameters (typically billions of weights or more), trained on large quantities of unlabeled text using self-supervised learning or semi-supervised learning.[1] LLMs emerged around 2018 and perform well at a wide variety of tasks. This has shifted the focus of natural language processing research away from the previous paradigm of training specialized supervised models for specific tasks.[2]"

In [13]:
prompt = f"""
Summarize the following text delimited by triple backticks \ 
into two sentences.
```{text}```
"""
messages = [{"role": "user", "content": prompt}]
gptj.chat_completion(messages)

### Instruction: 
            The prompt below is a question to answer, a task to complete, or a conversation 
            to respond to; decide which and write an appropriate response.
            
### Prompt: 

Summarize the following text delimited by triple backticks \ 
into two sentences.
```A large language model (LLM) is a language model consisting of a neural network with many parameters (typically billions of weights or more), trained on large quantities of unlabeled text using self-supervised learning or semi-supervised learning.[1] LLMs emerged around 2018 and perform well at a wide variety of tasks. This has shifted the focus of natural language processing research away from the previous paradigm of training specialized supervised models for specific tasks.[2]```

### Response:
 The prompt is asking for a summary of the text that follows. The response should be in two sentences, with the text enclosed within triple backticks.```A large language model (LLM) is a type of arti

{'model': 'ggml-gpt4all-j-v1.3-groovy',
 'usage': {'prompt_tokens': 800,
  'completion_tokens': 566,
  'total_tokens': 1366},
 'choices': [{'message': {'role': 'assistant',
    'content': ' The prompt is asking for a summary of the text that follows. The response should be in two sentences, with the text enclosed within triple backticks.```A large language model (LLM) is a type of artificial intelligence that consists of a neural network with many parameters, trained on large quantities of unlabeled text using self-supervised learning or semi-supervised learning. It emerged in 2018 and has been shown to perform well at a wide range of tasks. This shift in focus has led to a decrease in the use of specialized supervised models for specific tasks.'}}]}

I gave the same prompt to `GPT-3.5` in my `GPT-3_basics.ipynb` notebook, and it did a significantly better job of following directions.
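
"Following directions" can be checked mechanically for the length constraint. The sketch below is a rough heuristic, assuming sentences end in `.`, `!`, or `?` followed by whitespace; it will miscount abbreviations such as "e.g." and text where a period is not followed by a space. `count_sentences` and `follows_length_rule` are hypothetical helpers, and the sample string is made up for illustration.

```python
import re

def count_sentences(text):
    """Rough sentence count: split on ., ! or ? followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return len([p for p in parts if p])

def follows_length_rule(summary, expected=2):
    """True if the summary has exactly the requested number of sentences."""
    return count_sentences(summary) == expected

# Made-up sample text, not a model response.
sample = "LLMs are large neural networks. They emerged around 2018."
print(count_sentences(sample), follows_length_rule(sample))
```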

In [40]:
prompt = f"""
Summarize the following text delimited by triple backticks. \ 
Use two bullet points. Be concise in your wording.
```{text}```
"""
messages = [{"role": "user", "content": prompt}]
gptj.chat_completion(messages)

### Instruction: 
            The prompt below is a question to answer, a task to complete, or a conversation 
            to respond to; decide which and write an appropriate response.
            
### Prompt: 

Summarize the following text delimited by triple backticks. \ 
Use two bullet points. Be concise in your wording.
```A large language model (LLM) is a language model consisting of a neural network with many parameters (typically billions of weights or more), trained on large quantities of unlabeled text using self-supervised learning or semi-supervised learning.[1] LLMs emerged around 2018 and perform well at a wide variety of tasks. This has shifted the focus of natural language processing research away from the previous paradigm of training specialized supervised models for specific tasks.[2]```

### Response:
 * A large language model (LLM) is a type of artificial intelligence that uses neural networks to process and analyze large amounts of text. It is trained on a vast am

{'model': 'ggml-gpt4all-j-v1.3-groovy',
 'usage': {'prompt_tokens': 832,
  'completion_tokens': 509,
  'total_tokens': 1341},
 'choices': [{'message': {'role': 'assistant',
    'content': ' * A large language model (LLM) is a type of artificial intelligence that uses neural networks to process and analyze large amounts of text. It is trained on a vast amount of unclassified text using self-supervised learning or semi-supervised learning. The LLM was first developed in 2018 and has since become a popular tool for various natural language processing tasks. The shift away from the traditional paradigm of training specialized supervised models for specific tasks has led to a new focus on LLMs.'}}]}

Changing the prompt from "sentences" to "bullets" didn't seem to help: the response is still one block of text behind a single bullet marker. There may be other prompt engineering nuances that get `GPT4All` to work better.
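
One nuance is visible in the echoed prompts above: they contain a literal `\ `. In a Python string, a backslash only acts as a line continuation when it is immediately followed by the newline; here it is followed by a space, so both characters are sent to the model. The sketch below assembles the same prompt from separate strings so no backslash is needed. `build_prompt` and `FENCE` are hypothetical names, and the sample text is shortened for illustration.

```python
FENCE = "`" * 3  # triple backticks, built this way so the example stays easy to read

def build_prompt(instructions, text):
    """Join instruction sentences with spaces, then append the delimited text."""
    return "\n" + " ".join(instructions) + "\n" + FENCE + text + FENCE + "\n"

prompt = build_prompt(
    [
        "Summarize the following text delimited by triple backticks.",
        "Use two bullet points.",
        "Be concise in your wording.",
    ],
    "LLMs emerged around 2018 and perform well at a wide variety of tasks.",
)
print(prompt)
```

Whether removing the stray characters changes the model's behavior is untested here; it only guarantees that the prompt sent matches the prompt intended.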

In [14]:
# Fictitious product review (generated by GPT-3)

review = """
I recently purchased an electric sit stand desk and \
I have to say, I'm pretty impressed. The desk is easy \
to adjust with just the touch of a button, and it's \
made a huge difference in my posture and overall \
comfort while working. The desk is also sturdy and \
well-made, so I don't have to worry about it wobbling \
or tipping over. Overall, I'm happy with my purchase \
and would recommend this desk to anyone looking for a \
comfortable and versatile workspace.
"""

In [36]:
prompt = f"""
Identify the following items from the review text: 
- Item purchased by reviewer
- Sentiment (positive or negative)
- 5 most important keywords discussed in the review text

The review is delimited with triple backticks. \
Format your response as a JSON object with \
"item", "sentiment", "keywords" as the keys.

If the information isn't present, use "unknown" \
as the value.

Format the keywords value as a list.

Review text: '''{review}'''
"""

messages = [{"role": "user", "content": prompt}]
response = gptj.chat_completion(messages)

### Instruction: 
            The prompt below is a question to answer, a task to complete, or a conversation 
            to respond to; decide which and write an appropriate response.
            
### Prompt: 

Identify the following items from the review text: 
- Item purchased by reviewer
- Sentiment (positive or negative)
- 5 most important keywords discussed in the review text

The review is delimited with triple backticks. Format your response as a JSON object with "item", "sentiment", "keywords" as the keys.

If the information isn't present, use "unknown" as the value.

Format the keywords value as a list.

Review text: '''
I recently purchased an electric sit stand desk and I have to say, I'm pretty impressed. The desk is easy to adjust with just the touch of a button, and it's made a huge difference in my posture and overall comfort while working. The desk is also sturdy and well-made, so I don't have to worry about it wobbling or tipping over. Overall, I'm happy with my pur

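Decent result given its struggles on the earlier prompts. Not sure what happened with the keywords. One possible factor: the prompt says the review is delimited with triple backticks, but the code actually wraps it in triple single quotes.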
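
Since the prompt asks for a JSON object, the response can be validated instead of judged by eye. This is a minimal sketch, assuming the model may wrap the JSON in extra text: it takes the span from the first `{` to the last `}`, parses it, and fills any missing key with "unknown", as the prompt requests. `parse_review_json` is a hypothetical helper, and `sample` is an invented response, not the model's actual output.

```python
import json

EXPECTED_KEYS = ("item", "sentiment", "keywords")

def parse_review_json(content):
    """Parse the first {...} span in content; return None if it is not valid JSON."""
    start, end = content.find("{"), content.rfind("}")
    if start == -1 or end <= start:
        return None
    try:
        data = json.loads(content[start:end + 1])
    except json.JSONDecodeError:
        return None
    # Missing keys fall back to "unknown", matching the prompt's instruction.
    return {key: data.get(key, "unknown") for key in EXPECTED_KEYS}

# Invented response text for illustration only.
sample = ' Here is the result: {"item": "electric sit stand desk", "sentiment": "positive"}'
print(parse_review_json(sample))
```

With the notebook's variables, the input would be `response['choices'][0]['message']['content']`. A `None` result or an "unknown" value then points at exactly which part of the instructions the model missed.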