# Running an LLM on your own laptop
In this notebook, we're going to learn how to run a Hugging Face LLM on our own machine.

## Download the LLM
First, we'll download the model files from the Hugging Face Hub one at a time with `hf_hub_download`, so that they end up in the local cache.

In [2]:
import os
from huggingface_hub import hf_hub_download

In [3]:
HUGGING_FACE_API_KEY = os.environ.get("HUGGING_FACE_API_KEY")

In [4]:
model_id = "lmsys/fastchat-t5-3b-v1.0"
filenames = [
    "pytorch_model.bin", "added_tokens.json", "config.json", "generation_config.json",
    "special_tokens_map.json", "spiece.model", "tokenizer_config.json"
]

In [5]:
# Download each file into the local Hugging Face cache and print where it landed
for filename in filenames:
    downloaded_model_path = hf_hub_download(
        repo_id=model_id,
        filename=filename,
        token=HUGGING_FACE_API_KEY
    )
    print(downloaded_model_path)

/Users/rmian/.cache/huggingface/hub/models--lmsys--fastchat-t5-3b-v1.0/snapshots/0b1da230a891854102d749b93f7ddf1f18a81024/pytorch_model.bin
/Users/rmian/.cache/huggingface/hub/models--lmsys--fastchat-t5-3b-v1.0/snapshots/0b1da230a891854102d749b93f7ddf1f18a81024/added_tokens.json
/Users/rmian/.cache/huggingface/hub/models--lmsys--fastchat-t5-3b-v1.0/snapshots/0b1da230a891854102d749b93f7ddf1f18a81024/config.json
/Users/rmian/.cache/huggingface/hub/models--lmsys--fastchat-t5-3b-v1.0/snapshots/0b1da230a891854102d749b93f7ddf1f18a81024/generation_config.json
/Users/rmian/.cache/huggingface/hub/models--lmsys--fastchat-t5-3b-v1.0/snapshots/0b1da230a891854102d749b93f7ddf1f18a81024/special_tokens_map.json
/Users/rmian/.cache/huggingface/hub/models--lmsys--fastchat-t5-3b-v1.0/snapshots/0b1da230a891854102d749b93f7ddf1f18a81024/spiece.model
/Users/rmian/.cache/huggingface/hub/models--lmsys--fastchat-t5-3b-v1.0/snapshots/0b1da230a891854102d749b93f7ddf1f18a81024/tokenizer_config.json
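
Before going offline, it can be worth confirming that every file really is in the cache. The sketch below is one way to do that. `missing_files` is a hypothetical helper written for this notebook, not part of `huggingface_hub`, and it assumes all files sit in the same snapshot directory, as the paths printed above suggest.

```python
from pathlib import Path


def missing_files(snapshot_dir, filenames):
    """Return the names in `filenames` that do not exist inside `snapshot_dir`."""
    snapshot_dir = Path(snapshot_dir)
    return [name for name in filenames if not (snapshot_dir / name).exists()]


# Hypothetical usage in this notebook, run after the download loop:
# missing_files(Path(downloaded_model_path).parent, filenames)
```

An empty list means every expected file was found in the snapshot directory.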


## Run the LLM
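Now let's try running the model. Before we do that, let's disable Wi-Fi, so we can be sure that everything is served from the files we just downloaded rather than from the network.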
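
Besides switching off Wi-Fi, the Hugging Face libraries can be told not to attempt network calls at all. The sketch below sets the `HF_HUB_OFFLINE` and `TRANSFORMERS_OFFLINE` environment variables. This is an addition to the original notebook, and it rests on the assumption that the variables are read when the libraries are first imported, so it is safest to run it in a fresh kernel before any Hugging Face import.

```python
import os

# Assumption: these must be set before huggingface_hub / transformers are
# first imported, otherwise the libraries may already have read the old values.
os.environ["HF_HUB_OFFLINE"] = "1"
os.environ["TRANSFORMERS_OFFLINE"] = "1"
```

Passing `local_files_only=True` to `from_pretrained` is another way to get a similar effect for a single call.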

In [7]:
# Ensure the current directory is in the Python path
import sys
import os
sys.path.append(os.getcwd())

In [11]:
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
from transformers import pipeline as hf_pipeline

# Load the tokenizer and model from the local Hugging Face cache
tokenizer = AutoTokenizer.from_pretrained(model_id, legacy=False, use_fast=False)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

# device=-1 runs the model on the CPU. Importing the factory as hf_pipeline keeps
# this cell re-runnable, because the variable no longer shadows the function.
pipeline = hf_pipeline("text2text-generation", model=model, device=-1, tokenizer=tokenizer, max_length=1000)

In [14]:
pipeline("What are competitors to Apache Kafka?")

[{'generated_text': 'Apache Kafka is a popular open source message broker that is used for real-time data streaming and streaming applications. It is a popular choice for companies that need to process large amounts of data quickly and efficiently. Some of the competitors to Apache Kafka include Apache Spark, Apache Kafka Streams, Apache Kafka Streams Streaming, Apache Kafka Streams Streaming, Apache Kafka Streams Streaming, Apache Kafka Streams Streaming, Apache Kafka Streams Streaming, Apache Kafka Streams Streaming, Apache Kafka Streams Streaming, Apache Kafka Streams Streaming, Apache Kafka Streams Streaming, Apache Kafka Streams Streaming, Apache Kafka Streams Streaming, Apache Kafka Streams Streaming, Apache Kafka Streams Streaming, Apache Kafka Streams Streaming, Apache Kafka Streams Streaming, Apache Kafka Streams Streaming, Apache Kafka Streams Streaming, Apache Kafka Streams Streaming, Apache Kafka Streams Streaming, Apache Kafka Streams Streaming, Apache Kafka Streams Stream

In [13]:
pipeline("""My name is Mark.
I have brothers called David and John and my best friend is Michael.
Using only the context above. Do you know if I have a sister?    
""")

[{'generated_text': "No,   I   don't   have   a   sister. \n"}]
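
The answer above comes back with runs of extra spaces between words and a trailing newline. If we want tidier text, one option is to collapse the whitespace after generation. This is a minimal sketch, and `clean_generated_text` is a hypothetical helper, not something provided by `transformers`.

```python
def clean_generated_text(text):
    """Collapse every run of whitespace into a single space and trim the ends."""
    return " ".join(text.split())


# Applied to the generated text shown above
raw = "No,   I   don't   have   a   sister. \n"
print(clean_generated_text(raw))
```

Note that this also flattens any intentional line breaks in the output, so it suits short answers better than formatted text.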