In [None]:
# Using the GPT4All Python docs as a guide: https://docs.gpt4all.io/gpt4all_python/home.html

!pip install gpt4all

In [9]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

from gpt4all import GPT4All

In [11]:
model = GPT4All("Meta-Llama-3-8B-Instruct.Q4_0.gguf") # downloads / loads a 4.66GB LLM
with model.chat_session():
    print(model.generate("How can I run LLMs efficiently on my laptop?", max_tokens=1024))

Downloading: 100%|████████████████████████| 4.66G/4.66G [01:45<00:00, 44.2MiB/s]
Verifying: 100%|███████████████████████████| 4.66G/4.66G [00:07<00:00, 613MiB/s]


Large Language Models (LLMs) are powerful AI models that require significant computational resources to train and use. Running them efficiently on your laptop requires a combination of hardware, software, and technique optimizations. Here's a comprehensive guide to help you get the most out of your LLMs:

**Hardware Optimizations**

1. **CPU**: Choose a CPU with multiple cores (at least 4-6) for parallel processing.
2. **GPU**: If possible, use a laptop with a dedicated NVIDIA or AMD GPU, as they can significantly accelerate computations.
3. **RAM**: Ensure you have at least 16 GB of RAM to accommodate the model's memory requirements.

**Software Optimizations**

1. **Python version**: Use Python 3.x (e.g., 3.8) for better performance and compatibility with popular libraries like TensorFlow, PyTorch, or JAX.
2. **Libraries**: Choose optimized libraries:
	* For CPU-based computations: NumPy, SciPy, and scikit-learn are efficient choices.
	* For GPU-accelerated computations: cuDNN (for N

In [12]:
with model.chat_session():
    print(model.generate("How can I train this LLM on my dataset locally?", max_tokens=1024))

To fine-tune a Large Language Model (LLM) like BERT, RoBERTa, or XLNet on your local machine, you'll need to use a library that supports training transformer-based models. Here's a step-by-step guide:

**Prerequisites:**

1. **Python**: You should have Python 3.x installed.
2. **PyTorch**: Install PyTorch (version >= 1.9) using pip: `pip install torch`
3. **Transformers library**: Install the Transformers library (version >= 4.10) using pip: `pip install transformers`

**Step-by-Step Guide:**

### 1. Prepare your dataset

* Convert your text data into a format that can be used for training, such as JSON or CSV.
* Split your data into training (~80%), validation (~15%), and testing sets (5%).

### 2. Choose an LLM architecture

Select the pre-trained model you want to fine-tune from the Transformers library:
```python
import torch
from transformers import BertTokenizer, BertForSequenceClassification

# Load a pre-trained BERT model for sequence classification
tokenizer = BertTokenizer.f

In [13]:
with model.chat_session():
    print(model.generate("How can I use this LLM and my dataframe to make a book recommendation system?", max_tokens=1024))

What a great question!

To build a book recommendation system using a Large Language Model (LLM) like OpenAI's Codex or Google's BERT, you'll need to follow these general steps:

1. **Preprocess your data**: Prepare your dataset of books and their corresponding features (e.g., genres, authors, summaries). You can use libraries like Pandas for data manipulation.
2. **Train a model on the preprocessed data**: Use the LLM's API or a library like Hugging Face Transformers to train a model that takes in book metadata as input and predicts relevant books based on user preferences (e.g., genre, author).
3. **Create an inference pipeline**: Set up a system that can take in new user inputs (e.g., favorite authors, genres) and use the trained model to generate personalized book recommendations.
4. **Integrate with your dataframe**: Use the recommended books from step 3 to create a recommendation list for each user in your original dataset.

Here's a high-level example of how you can implement th

In [2]:
df = pd.read_csv('./data/cleaned_df.csv')

In [3]:
df.head()

Unnamed: 0,isbn13,isbn10,title,subtitle,authors,categories,thumbnail,description,published_year,average_rating,num_pages,ratings_count
0,9780002261982,0002261987,Spider's Web,A Novel,Charles Osborne;Agatha Christie,Detective and mystery stories,http://books.google.com/books/content?id=gA5GP...,A new 'Christie for Christmas' -- a full-lengt...,2000.0,3.83,241.0,5164.0
1,9780006380832,0006380832,Empires of the Monsoon,A History of the Indian Ocean and Its Invaders,Richard Hall,"Africa, East",http://books.google.com/books/content?id=MuPEQ...,Until Vasco da Gama discovered the sea-route t...,1998.0,4.41,608.0,65.0
2,9780006470229,000647022X,The Gap Into Madness,Chaos and Order,Stephen R. Donaldson,"Hyland, Morn (Fictitious character)",http://books.google.com/books/content?id=4oXav...,A new-cover reissue of the fourth book in the ...,1994.0,4.15,743.0,103.0
3,9780006499626,0006499627,Miss Marple,The Complete Short Stories,Agatha Christie,"Detective and mystery stories, English",http://books.google.com/books/content?id=a96qP...,"Miss Marple featured in 20 short stories, publ...",1997.0,4.2,359.0,6235.0
4,9780006551812,0006551815,'Tis,A Memoir,Frank McCourt,Ireland,http://books.google.com/books/content?id=Q3BhQ...,FROM THE PULIZER PRIZE-WINNING AUTHOR OF THE #...,2000.0,3.68,495.0,44179.0


In [22]:
# **Step 1: Preprocess data**

features = ['title', 'subtitle', 'categories', 'description', 'authors', 'average_rating', 'ratings_count']
features_df = df[features]

In [18]:
features_df.head()

Unnamed: 0,title,subtitle,categories,description,authors,average_rating,ratings_count
0,Spider's Web,A Novel,Detective and mystery stories,A new 'Christie for Christmas' -- a full-lengt...,Charles Osborne;Agatha Christie,3.83,5164.0
1,Empires of the Monsoon,A History of the Indian Ocean and Its Invaders,"Africa, East",Until Vasco da Gama discovered the sea-route t...,Richard Hall,4.41,65.0
2,The Gap Into Madness,Chaos and Order,"Hyland, Morn (Fictitious character)",A new-cover reissue of the fourth book in the ...,Stephen R. Donaldson,4.15,103.0
3,Miss Marple,The Complete Short Stories,"Detective and mystery stories, English","Miss Marple featured in 20 short stories, publ...",Agatha Christie,4.2,6235.0
4,'Tis,A Memoir,Ireland,FROM THE PULIZER PRIZE-WINNING AUTHOR OF THE #...,Frank McCourt,3.68,44179.0
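
A possible next preprocessing step, sketched under assumptions: join the text columns into one string per book so a tokenizer or an LLM prompt can consume it. The `build_book_text` helper, the `book_text` column name, and the `" | "` separator are hypothetical choices, not part of the notebook. The two rows are copied from the `head()` output so the block runs on its own.

```python
import pandas as pd

# Two rows copied from the head() output so the sketch runs standalone;
# in the notebook you would pass features_df instead.
sample_df = pd.DataFrame({
    "title": ["Spider's Web", "Empires of the Monsoon"],
    "subtitle": ["A Novel", "A History of the Indian Ocean and Its Invaders"],
    "categories": ["Detective and mystery stories", "Africa, East"],
    "authors": ["Charles Osborne;Agatha Christie", "Richard Hall"],
})

TEXT_COLUMNS = ["title", "subtitle", "categories", "authors"]

def build_book_text(frame: pd.DataFrame, columns=TEXT_COLUMNS) -> pd.Series:
    """Join the given text columns into one string per row (hypothetical helper)."""
    # Replace missing values with empty strings so the join never sees NaN.
    cleaned = frame[columns].fillna("").astype(str)
    # Skip empty fields so a missing subtitle leaves no dangling separator.
    return cleaned.apply(lambda row: " | ".join(v for v in row if v), axis=1)

sample_df["book_text"] = build_book_text(sample_df)
print(sample_df["book_text"].iloc[0])
```

Numeric columns such as `average_rating` and `ratings_count` are left out of the joined text here; whether to include them depends on how the text is used later.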


In [24]:
0.8*len(features_df)

1745.6000000000001
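
The result of 1745.6 implies `features_df` has 2182 rows, so an 80% training set is 1745 whole rows. A minimal sketch of turning that into a reproducible train/validation split, assuming a held-out set is wanted: the `split_frame` helper, the seed value, and the toy frame are illustrative and not from the notebook.

```python
import pandas as pd

def split_frame(frame: pd.DataFrame, train_frac: float = 0.8, seed: int = 42):
    """Return (train, validation) frames with no shared rows (hypothetical helper)."""
    # int() truncates, so 0.8 * 2182 = 1745.6 becomes 1745 rows.
    n_train = int(train_frac * len(frame))
    # A fixed random_state makes the sample repeatable across runs.
    train = frame.sample(n=n_train, random_state=seed)
    # Every row not sampled for training goes to validation.
    # This relies on the frame having a unique index.
    validation = frame.drop(train.index)
    return train, validation

# Toy frame standing in for features_df.
toy_df = pd.DataFrame({"title": [f"book {i}" for i in range(10)]})
train_df, val_df = split_frame(toy_df)
print(len(train_df), len(val_df))  # 8 2
```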

In [27]:
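# **Step 2: Train a model**

# If the Trainer import fails with the Keras 3 error shown below, install the
# compatibility package it names (`pip install tf-keras`) and rerun this cell
# (a kernel restart may be needed).
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer
from transformers import Trainer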

RuntimeError: Failed to import transformers.trainer because of the following error (look up to see its traceback):
Failed to import transformers.integrations.integration_utils because of the following error (look up to see its traceback):
Failed to import transformers.modeling_tf_utils because of the following error (look up to see its traceback):
Your currently installed version of Keras is Keras 3, but this is not yet supported in Transformers. Please install the backwards-compatible tf-keras package with `pip install tf-keras`.

In [20]:
model_name = 'bert-base-uncased'
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

tokenizer_config.json:   0%|          | 0.00/48.0 [00:00<?, ?B/s]

config.json:   0%|          | 0.00/570 [00:00<?, ?B/s]

vocab.txt:   0%|          | 0.00/232k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/466k [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/440M [00:00<?, ?B/s]

Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


In [25]:
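# Train the model on the book features dataframe

# Sample 1745 rows (about 80% of len(features_df)) as the training set
train_dataset = features_df.sample(1745)

# NOTE: Trainer needs tokenized examples with a `labels` field, not a raw
# DataFrame, so train_dataset must be converted before this will run.
trainer = Trainer(model=model, train_dataset=train_dataset, tokenizer=tokenizer)

trainer.train()  # train() uses the model and dataset already given to Trainer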

NameError: name 'Trainer' is not defined
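
Beyond the `NameError`, a sequence-classification model needs one text and one label per example, and `features_df` has no label column. One illustrative way to get there, sketched under a loud assumption: a binary "highly rated" label derived from `average_rating` with an arbitrary 4.0 cut-off. The threshold, the `to_labeled_records` helper, and the choice of text fields are all hypothetical, not something the notebook defines.

```python
import pandas as pd

# Rows copied from the head() output; in the notebook, use features_df.
books = pd.DataFrame({
    "title": ["Spider's Web", "Empires of the Monsoon", "'Tis"],
    "categories": ["Detective and mystery stories", "Africa, East", "Ireland"],
    "average_rating": [3.83, 4.41, 3.68],
})

RATING_THRESHOLD = 4.0  # assumption: arbitrary cut-off for a "highly rated" label

def to_labeled_records(frame: pd.DataFrame, threshold: float = RATING_THRESHOLD):
    """Build one {"text", "label"} dict per row (hypothetical helper)."""
    records = []
    for row in frame.itertuples(index=False):
        records.append({
            # Text the tokenizer would later turn into token IDs.
            "text": f"{row.title}. {row.categories}",
            # 1 if the book meets the rating threshold, else 0.
            "label": int(row.average_rating >= threshold),
        })
    return records

records = to_labeled_records(books)
print(records[1])
```

Each record's `text` would still have to pass through the tokenizer (for example `tokenizer(text, truncation=True)`) before `Trainer` could batch it. A classifier trained this way predicts the label, which is not the same thing as ranking books for a particular user.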

In [None]:
# **Step 3: Create an inference pipeline**

def get_recommendations(user_input, threshold):
    """
    Take in user input (e.g., favorite authors) and return recommended books.

    `threshold` is the minimum model score a book needs to be recommended.
    """
    # Convert user input to a format the model can understand
    user_features = ...  # e.g., tokenize text, create embeddings

    # Use the trained model to generate recommendations
    # (assumes `outputs` holds one score per row of df)
    outputs = model(user_features)
    predicted_books = [book for book in df.index if outputs[book] > threshold]

    return predicted_books
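
The template above leaves the feature conversion and model scoring unspecified. As a runnable stand-in, here is a sketch that swaps in plain word-overlap (Jaccard) similarity instead of a trained model. This is a different and much simpler technique than the fine-tuned BERT the steps describe. The names `tokenize`, `jaccard`, `recommend`, the 0.1 threshold, and the small `BOOKS` list (rows taken from the `head()` output) are illustrative assumptions.

```python
import re

# Four rows taken from the head() output; a real run would iterate over df.
BOOKS = [
    {"title": "Spider's Web", "authors": "Charles Osborne;Agatha Christie",
     "categories": "Detective and mystery stories"},
    {"title": "Empires of the Monsoon", "authors": "Richard Hall",
     "categories": "Africa, East"},
    {"title": "Miss Marple", "authors": "Agatha Christie",
     "categories": "Detective and mystery stories, English"},
    {"title": "'Tis", "authors": "Frank McCourt", "categories": "Ireland"},
]

def tokenize(text):
    """Lowercase the text and return its set of alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))

def jaccard(a, b):
    """Jaccard similarity: size of the intersection over size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def recommend(user_input, books=BOOKS, threshold=0.1):
    """Return titles whose author/category words overlap the input, best first."""
    query = tokenize(user_input)
    scored = []
    for book in books:
        book_words = tokenize(book["authors"] + " " + book["categories"])
        score = jaccard(query, book_words)
        if score > threshold:  # assumption: 0.1 is an arbitrary cut-off
            scored.append((score, book["title"]))
    # Highest similarity first.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [title for _, title in scored]

print(recommend("Agatha Christie mystery"))  # ['Miss Marple', "Spider's Web"]
```

Word overlap only matches exact words, so "mysteries" would not match "mystery"; embeddings or a trained model would be the natural upgrade once the training step works.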