# GPT4All - setup by downloading models

## Teaching LLM workflows with open-source models and GPT4All
GPT4All is a framework for downloading open-source models and running them from a Jupyter workflow within the limits of modest compute, such as a personal computer or the cloud-based server we use for instruction.

### Shared Filesystem 

When I taught with this material, I used this notebook to download models from Hugging Face into a shared read-write folder that students could access on the JupyterHub we used for teaching.

Your use case may vary:
- a shared read-write folder
- each student downloads their own models
- models downloaded to a local machine
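
Whichever setup applies, the notebook needs a single folder to read and write models. The sketch below picks the first folder that exists from a list of candidates. The helper name `pick_model_dir` is hypothetical (not part of GPT4All), and the two candidate paths are simply the hub and local folders used later in this notebook; adjust them to your own environment.

```python
from pathlib import Path


def pick_model_dir(candidates):
    """Return the first candidate that exists as a directory, or None."""
    for candidate in candidates:
        directory = Path(candidate).expanduser()
        if directory.is_dir():
            return directory
    return None


# Assumed candidates: a hub-wide shared folder first, then a local folder.
candidates = ["/home/jovyan/shared-rw", "shared-rw"]
model_dir = pick_model_dir(candidates)
print(model_dir if model_dir else "No model folder found; create one first.")
```

Listing the shared folder first means students on a hub reuse the models that are already there, while the same notebook still works on a laptop.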

In [8]:
# Ensure that your python environment has gpt4all package installed.
try:
    from gpt4all import GPT4All
except ImportError:
    %pip install gpt4all
    from gpt4all import GPT4All



## Which model to download
For teaching on a JupyterHub with CPU only, I was looking for **small models**:
 - ~1 billion parameters
 - quantized (weights stored at reduced precision, e.g. 4 bits each instead of 16)
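
To see why quantization matters on a small machine, here is a back-of-the-envelope size estimate. It is only a sketch: it counts the weights alone and assumes every weight uses exactly the stated number of bits, which real model files do not quite do.

```python
def estimated_size_gb(n_parameters, bits_per_weight):
    """Rough size of the weights alone, in gigabytes (10**9 bytes)."""
    return n_parameters * bits_per_weight / 8 / 1e9


n_parameters = 1.5e9  # e.g. a model with 1.5 billion parameters

for label, bits in [("16-bit", 16), ("8-bit", 8), ("4-bit", 4)]:
    print(f"{label}: ~{estimated_size_gb(n_parameters, bits):.2f} GB")
```

Real files come out somewhat larger than the 4-bit estimate (the 1.5B DeepSeek download later in this notebook is 1.07 GB), partly because quantized formats also store scaling factors alongside the weights.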


(This information is current as of May/June 2025, when this notebook was written; the landscape is changing rapidly.)


You can explore the available models at:
[Hugging Face Model List](https://huggingface.co/models)

GPT4All uses a subset of these models. Here is the description from their [documentation](https://docs.gpt4all.io/gpt4all_desktop/models.html#explore-models):

- Many LLMs are available at various sizes, quantizations, and licenses.

- LLMs with more parameters tend to be better at coherently responding to instructions

- LLMs with a smaller quantization (e.g. 4bit instead of 16bit) are much faster and less memory intensive, and tend to have slightly worse performance

- Licenses vary in their terms for personal and commercial use



The five that I picked to download are:
- `DeepSeek-R1-Distill-Qwen-1.5B-Q4_0.gguf`
- `Phi-3-mini-4k-instruct.Q4_0.gguf`
- `Llama-3.2-1B-Instruct-Q4_0.gguf`
- `qwen2-1_5b-instruct-q4_0.gguf`
- `mistral-7b-instruct-v0.1.Q4_0.gguf`




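The simplest way to download a model is to request it by name in GPT4All; if the file is not already in the model folder, it is downloaded.

You may see `llama_model_load: error loading model: error loading model vocabulary: unknown pre-tokenizer type: 'deepseek-r1-qwen'` for the DeepSeek model. The file still downloads, but the message means the llama.cpp backend in the installed `gpt4all` package does not recognize this model's tokenizer, so the model fails to load. It is unrelated to GPU access; a newer `gpt4all` release may be needed to run this model.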

## Let's check out our local filesystem path and where we will download the files

### Approach 1 -  if a Shared Hub is being used 

In [9]:
# This only worked for SP 25 instruction on Berkeley Datahub
#!ls /home/jovyan/_shared/econ148-readwrite

In [10]:
# Cal-ICOR workshop Hub?
#!ls /home/jovyan/shared-rw

### Approach 2 -  if a local machine is being used

In [11]:
#This is my local path to a directory called shared-rw
!ls shared-rw

course                        qwen2-1_5b-instruct-q4_0.gguf
gemma-2b-it.Q4_0.gguf


In [18]:
# or the full path ( this is on my laptop) 
!ls /Users/ericvandusen/Documents/GitHub/SmallLM-SP25/shared-rw

DeepSeek-R1-Distill-Qwen-1.5B-Q4_0.gguf orca-mini-3b-gguf2-q4_0.gguf
course                                  qwen2-1_5b-instruct-q4_0.gguf
gemma-2b-it.Q4_0.gguf


### Set the path where the models will download

In [19]:
path="/Users/ericvandusen/Documents/GitHub/SmallLM-SP25/shared-rw"

## Downloading the models

In [14]:
# Define the "model" object to which this notebook's code will send conversations & prompts
model = GPT4All(
    model_name="DeepSeek-R1-Distill-Qwen-1.5B-Q4_0.gguf",
    allow_download=True,
    model_path=path,
    verbose=True
)



Downloading: 100%|████████████████████████| 1.07G/1.07G [00:14<00:00, 73.3MiB/s]
Model downloaded to '/Users/ericvandusen/Documents/GitHub/SmallLM-SP25/shared-rw/DeepSeek-R1-Distill-Qwen-1.5B-Q4_0.gguf'
llama_model_load: error loading model: error loading model vocabulary: unknown pre-tokenizer type: 'deepseek-r1-qwen'
llama_load_model_from_file: failed to load model
LLAMA ERROR: failed to load model from /Users/ericvandusen/Documents/GitHub/SmallLM-SP25/shared-rw/DeepSeek-R1-Distill-Qwen-1.5B-Q4_0.gguf


In [17]:
# Define the "model" object to which this notebook's code will send conversations & prompts
model = GPT4All(
    model_name="orca-mini-3b-gguf2-q4_0.gguf",
    allow_download=True,
    model_path=path,
    verbose=True
)



Downloading: 100%|████████████████████████| 1.98G/1.98G [00:38<00:00, 50.9MiB/s]
Verifying: 100%|███████████████████████████| 1.98G/1.98G [00:02<00:00, 818MiB/s]
Model downloaded to '/Users/ericvandusen/Documents/GitHub/SmallLM-SP25/shared-rw/orca-mini-3b-gguf2-q4_0.gguf'
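
The `Verifying` line in the output above suggests the download is checked against a checksum, and the GPT4All model list shown later in this notebook has an `md5sum` column. If you copy model files between machines, you can run a similar check by hand. This is a minimal sketch using only the standard library, not GPT4All's own verification code; `file_md5` and `matches_md5` are hypothetical helper names.

```python
import hashlib


def file_md5(file_path, chunk_size=1024 * 1024):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(file_path, "rb") as handle:
        # Read 1 MiB at a time so multi-gigabyte models never sit fully in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_md5(file_path, expected_md5):
    """Return True if the file's MD5 equals the expected hex string."""
    return file_md5(file_path) == expected_md5.strip().lower()
```

For example, `matches_md5(os.path.join(path, "qwen2-1_5b-instruct-q4_0.gguf"), "a8c5a783105f87a481543d4ed7d7586d")` compares the local Qwen2 file against the `md5sum` listed for it in the model list. Some entries, such as the DeepSeek models, list a `sha256sum` instead.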


In [22]:
# Show models available in the Hub shared directory. Larger models may run slowly, or not at all.
import os
print("\n".join(os.listdir(path)))

course
qwen2-1_5b-instruct-q4_0.gguf
gemma-2b-it.Q4_0.gguf
DeepSeek-R1-Distill-Qwen-1.5B-Q4_0.gguf
orca-mini-3b-gguf2-q4_0.gguf
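
The plain listing above mixes model files with other entries such as the `course` folder. As a small sketch, the hypothetical helper below keeps only `.gguf` files and reports their sizes, which helps judge what will fit in memory. The `model_dir = "."` line is a placeholder; in this notebook you would pass the `path` variable instead.

```python
from pathlib import Path


def list_gguf_models(model_dir):
    """Return (filename, size in GB) for each .gguf file, largest first."""
    files = [
        p for p in Path(model_dir).iterdir()
        if p.is_file() and p.suffix == ".gguf"
    ]
    sizes = [(p.name, p.stat().st_size / 1e9) for p in files]
    return sorted(sizes, key=lambda item: item[1], reverse=True)


model_dir = "."  # placeholder: substitute the notebook's `path` variable
for name, size_gb in list_gguf_models(model_dir):
    print(f"{size_gb:6.2f} GB  {name}")
```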


In [24]:
!ls "{path}"

DeepSeek-R1-Distill-Qwen-1.5B-Q4_0.gguf orca-mini-3b-gguf2-q4_0.gguf
course                                  qwen2-1_5b-instruct-q4_0.gguf
gemma-2b-it.Q4_0.gguf


## Bonus: Searching for models from a database

 - We can go to the GPT4All model list
 - Load that list into a pandas DataFrame
 - Filter to pick the models we want


In [None]:
import requests
import pandas as pd

In [43]:
#Load JSON from the GPT4All models repository
#Small curated list
url = "https://gpt4all.io/models/models3.json"


In [44]:
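response = requests.get(url, timeout=30)
response.raise_for_status()  # stop on an HTTP error instead of parsing an error page
models = response.json()
# Convert to DataFrame
Models_df = pd.DataFrame(models)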


In [45]:
Models_df

Unnamed: 0,order,md5sum,name,filename,filesize,requires,ramrequired,parameters,quant,type,description,url,chatTemplate,systemPrompt,promptTemplate,sha256sum,systemMessage,removedIn,disableGUI,embeddingModel
0,a,a54c08a7b90e4029a8c2ab5b5dc936aa,Reasoner v1,qwen2.5-coder-7b-instruct-q4_0.gguf,4431390720,3.6.0,8,8 billion,q4_0,qwen2,"<ul><li>Based on <a href=""https://huggingface....",https://huggingface.co/Qwen/Qwen2.5-Coder-7B-I...,{{- '<|im_start|>system\n' }}\n{% if toolList|...,,,,,,,
1,aa,c87ad09e1e4c8f9c35a5fcef52b6f1c9,Llama 3 8B Instruct,Meta-Llama-3-8B-Instruct.Q4_0.gguf,4661724384,2.7.1,8,8 billion,q4_0,LLaMA3,<ul><li>Fast responses</li><li>Chat based mode...,https://gpt4all.io/models/gguf/Meta-Llama-3-8B...,{%- set loop_messages = messages %}\n{%- for m...,,<|start_header_id|>user<|end_header_id|>\n\n%1...,,,,,
2,aa1,,DeepSeek-R1-Distill-Qwen-7B,DeepSeek-R1-Distill-Qwen-7B-Q4_0.gguf,4444121056,3.8.0,8,7 billion,q4_0,deepseek,<p>The official Qwen2.5-Math-7B distillation o...,https://huggingface.co/bartowski/DeepSeek-R1-D...,{%- if not add_generation_prompt is defined %}...,,,5cd4ee65211770f1d99b4f6f4951780b9ef40e29314bd6...,,,,
3,aa2,,DeepSeek-R1-Distill-Qwen-14B,DeepSeek-R1-Distill-Qwen-14B-Q4_0.gguf,8544267680,3.8.0,16,14 billion,q4_0,deepseek,<p>The official Qwen2.5-14B distillation of De...,https://huggingface.co/bartowski/DeepSeek-R1-D...,{%- if not add_generation_prompt is defined %}...,,,906b3382f2680f4ce845459b4a122e904002b075238080...,,,,
4,aa3,,DeepSeek-R1-Distill-Llama-8B,DeepSeek-R1-Distill-Llama-8B-Q4_0.gguf,4675894112,3.8.0,8,8 billion,q4_0,deepseek,<p>The official Llama-3.1-8B distillation of D...,https://huggingface.co/bartowski/DeepSeek-R1-D...,{%- if not add_generation_prompt is defined %}...,,,0eb93e436ac8beec18aceb958c120d282cb2cf5451b231...,,,,
5,aa4,,DeepSeek-R1-Distill-Qwen-1.5B,DeepSeek-R1-Distill-Qwen-1.5B-Q4_0.gguf,1068807776,3.8.0,3,1.5 billion,q4_0,deepseek,<p>The official Qwen2.5-Math-1.5B distillation...,https://huggingface.co/bartowski/DeepSeek-R1-D...,{%- if not add_generation_prompt is defined %}...,,,b3af887d0a015b39fab2395e4faf682c1a81a6a3fd09a4...,,,,
6,b,27b44e8ae1817525164ddf4f8dae8af4,Llama 3.2 3B Instruct,Llama-3.2-3B-Instruct-Q4_0.gguf,1921909280,3.4.0,4,3 billion,q4_0,LLaMA3,<ul><li>Fast responses</li><li>Instruct model<...,https://huggingface.co/bartowski/Llama-3.2-3B-...,{{- bos_token }}\n{%- set date_string = strfti...,<|start_header_id|>system<|end_header_id|>\nCu...,<|start_header_id|>user<|end_header_id|>\n\n%1...,,,,,
7,c,48ff0243978606fdba19d899b77802fc,Llama 3.2 1B Instruct,Llama-3.2-1B-Instruct-Q4_0.gguf,773025920,3.4.0,2,1 billion,q4_0,LLaMA3,<ul><li>Fast responses</li><li>Instruct model<...,https://huggingface.co/bartowski/Llama-3.2-1B-...,{{- bos_token }}\n{%- set date_string = strfti...,<|start_header_id|>system<|end_header_id|>\nCu...,<|start_header_id|>user<|end_header_id|>\n\n%1...,,,,,
8,d,a5f6b4eabd3992da4d7fb7f020f921eb,Nous Hermes 2 Mistral DPO,Nous-Hermes-2-Mistral-7B-DPO.Q4_0.gguf,4108928000,2.7.1,8,7 billion,q4_0,Mistral,<strong>Good overall fast chat model</strong><...,https://huggingface.co/NousResearch/Nous-Herme...,{%- for message in messages %}\n {{- '<|im_...,,<|im_start|>user\n%1<|im_end|>\n<|im_start|>as...,,,,,
9,e,97463be739b50525df56d33b26b00852,Mistral Instruct,mistral-7b-instruct-v0.1.Q4_0.gguf,4108916384,2.5.0,8,7 billion,q4_0,Mistral,<strong>Strong overall fast instruction follow...,https://gpt4all.io/models/gguf/mistral-7b-inst...,{%- if messages[0]['role'] == 'system' %}\n ...,,[INST] %1 [/INST],,,,,


In [46]:
# Display the columns of the DataFrame
Models_df.columns

Index(['order', 'md5sum', 'name', 'filename', 'filesize', 'requires',
       'ramrequired', 'parameters', 'quant', 'type', 'description', 'url',
       'chatTemplate', 'systemPrompt', 'promptTemplate', 'sha256sum',
       'systemMessage', 'removedIn', 'disableGUI', 'embeddingModel'],
      dtype='object')

In [47]:
#dimensions of the DataFrame
Models_df.shape

(32, 20)

In [48]:
# Filter models that require less than 4 GB of RAM
# Convert 'ramrequired' to numeric and filter to ramrequired < 4
Models_df[Models_df["ramrequired"].astype(float) < 4]

Unnamed: 0,order,md5sum,name,filename,filesize,requires,ramrequired,parameters,quant,type,description,url,chatTemplate,systemPrompt,promptTemplate,sha256sum,systemMessage,removedIn,disableGUI,embeddingModel
5,aa4,,DeepSeek-R1-Distill-Qwen-1.5B,DeepSeek-R1-Distill-Qwen-1.5B-Q4_0.gguf,1068807776,3.8.0,3,1.5 billion,q4_0,deepseek,<p>The official Qwen2.5-Math-1.5B distillation...,https://huggingface.co/bartowski/DeepSeek-R1-D...,{%- if not add_generation_prompt is defined %}...,,,b3af887d0a015b39fab2395e4faf682c1a81a6a3fd09a4...,,,,
7,c,48ff0243978606fdba19d899b77802fc,Llama 3.2 1B Instruct,Llama-3.2-1B-Instruct-Q4_0.gguf,773025920,3.4.0,2,1 billion,q4_0,LLaMA3,<ul><li>Fast responses</li><li>Instruct model<...,https://huggingface.co/bartowski/Llama-3.2-1B-...,{{- bos_token }}\n{%- set date_string = strfti...,<|start_header_id|>system<|end_header_id|>\nCu...,<|start_header_id|>user<|end_header_id|>\n\n%1...,,,,,
26,v,e479e6f38b59afc51a470d1953a6bfc7,SBert,all-MiniLM-L6-v2-f16.gguf,45887744,2.5.0,1,40 million,f16,Bert,<strong>LocalDocs text embeddings model</stron...,https://gpt4all.io/models/gguf/all-MiniLM-L6-v...,,,,,,2.7.4,True,True
27,w,dd90e2cb7f8e9316ac3796cece9883b5,SBert,all-MiniLM-L6-v2.gguf2.f16.gguf,45949216,2.7.4,1,40 million,f16,Bert,<strong>LocalDocs text embeddings model</stron...,https://gpt4all.io/models/gguf/all-MiniLM-L6-v...,,,,,,3.0.0,,True
29,y,60ea031126f82db8ddbbfecc668315d2,Nomic Embed Text v1,nomic-embed-text-v1.f16.gguf,274290560,2.7.4,1,137 million,f16,Bert,nomic-embed-text-v1,https://gpt4all.io/models/gguf/nomic-embed-tex...,,,,,,,True,True
30,z,a5401e7f7e46ed9fcaed5b60a281d547,Nomic Embed Text v1.5,nomic-embed-text-v1.5.f16.gguf,274290560,2.7.4,1,137 million,f16,Bert,nomic-embed-text-v1.5,https://gpt4all.io/models/gguf/nomic-embed-tex...,,,,,,,True,True
31,zzz,a8c5a783105f87a481543d4ed7d7586d,Qwen2-1.5B-Instruct,qwen2-1_5b-instruct-q4_0.gguf,937532800,3.0,3,1.5 billion,q4_0,qwen2,<ul><li>Very fast responses</li><li>Instructio...,https://huggingface.co/Qwen/Qwen2-1.5B-Instruc...,{%- for message in messages %}\n {%- if loo...,<|im_start|>system\nBelow is an instruction th...,<|im_start|>user\n%1<|im_end|>\n<|im_start|>as...,,,,,
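
Note that the RAM filter above also returns the embedding models (SBert, Nomic Embed), which are not chat models. The sketch below applies the same RAM threshold to the raw list of dictionaries and skips entries flagged as embedding models. The helper name `small_chat_models` is hypothetical, and the sample rows are a hand-copied subset of the table above; the field types in the real JSON (for example `ramrequired` as a string) are an assumption.

```python
def small_chat_models(models, max_ram_gb=4):
    """Return filenames of non-embedding models needing less than max_ram_gb."""
    selected = []
    for entry in models:
        # Embedding models carry embeddingModel=True; chat models omit the key.
        if entry.get("embeddingModel") is True:
            continue
        if float(entry["ramrequired"]) < max_ram_gb:
            selected.append(entry["filename"])
    return selected


# Hand-copied subset of the model list, for illustration only.
sample = [
    {"filename": "Llama-3.2-1B-Instruct-Q4_0.gguf", "ramrequired": "2"},
    {"filename": "all-MiniLM-L6-v2-f16.gguf", "ramrequired": "1", "embeddingModel": True},
    {"filename": "mistral-7b-instruct-v0.1.Q4_0.gguf", "ramrequired": "8"},
]
print(small_chat_models(sample))
```

In the notebook you could call `small_chat_models(models)` on the list fetched earlier. A pandas equivalent would probably be to add `& (Models_df["embeddingModel"] != True)` to the existing filter condition.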
