# Using Open Source LLMs Natively

Here we will briefly see how to use popular open source LLM APIs, including:

- Hugging Face Transformers

## Install Dependencies

In [1]:
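!pip install -qq transformers==4.47.0
!pip install -qq accelerate==1.1.0
!pip install -qq groq==0.13.0
# python-dotenv provides the `dotenv` module imported later in this notebook
!pip install -qq python-dotenv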

In [2]:
pip install -qq torch==2.7.1

Note: you may need to restart the kernel to use updated packages.


In [3]:
pip install -qq torch torchvision torchaudio

Note: you may need to restart the kernel to use updated packages.


## Get Hugging Face Access Token

To download or access models through Hugging Face's platform, you need an access token. You need to set up an account first, which is free of cost.

1. After creating your account, go to [Settings -> Access Tokens](https://huggingface.co/settings/tokens) and create a new access token with write permissions.

![](https://i.imgur.com/dtS6tFr.png)

2. Remember to __Save__ your key somewhere safe, because it is displayed only once, as shown below. Copy it into a secure local file so you can use it later. If you lose it, you can create a new key at any time.

![](https://i.imgur.com/NmZmpmw.png)

## Load Hugging Face Access Token


In [4]:
from dotenv import load_dotenv
import os

# Load variables (such as the access token) from a local .env file into the environment
load_dotenv()

True
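
The cell above only loads variables from a `.env` file; it does not show how the token is read afterwards. A minimal sketch of that step, assuming the token is stored under the name `HF_TOKEN` (an assumption, since the notebook does not show the `.env` contents) and using a hypothetical helper `get_hf_token`:

```python
import os


def get_hf_token(env=None, name="HF_TOKEN"):
    """Return the access token stored under `name`, or raise a clear error.

    `HF_TOKEN` is an assumed variable name; use whatever key your .env defines.
    """
    env = os.environ if env is None else env
    token = env.get(name, "").strip()
    if not token:
        raise RuntimeError(f"{name} is not set; add it to your .env file")
    return token


# Example with an explicit mapping, so no real secret is read or printed.
example_env = {"HF_TOKEN": "hf_example_token"}
token = get_hf_token(example_env)
print(f"Token loaded ({len(token)} characters)")
```

Printing only the length, rather than the token itself, avoids leaking the secret into saved notebook output.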

## Using LLMs Locally with Hugging Face

Use this approach if you want to download LLMs and run them entirely on your own machine, without sending your data to any external server. A GPU is strongly recommended: even the smaller language models are still large, and running them on a CPU is slow.
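
To get a feel for how large these models are in memory, the space needed just to hold the weights can be estimated as parameter count times bytes per parameter. A rough sketch, assuming about 1.1 billion parameters for a TinyLlama-sized model (taken from the model name, so approximate) and ignoring activations and the KV cache; `weight_memory_gb` is a hypothetical helper:

```python
# Bytes used per parameter for common data types.
BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2, "float16": 2}


def weight_memory_gb(num_params, dtype="bfloat16"):
    """Approximate memory (in GB, 1 GB = 1e9 bytes) to store the weights only."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


num_params = 1.1e9  # assumed: roughly 1.1B parameters
for dtype in ("float32", "bfloat16"):
    print(f"{dtype}: ~{weight_memory_gb(num_params, dtype):.1f} GB")
```

Under these assumptions the weights take roughly 4.4 GB in float32 and 2.2 GB in bfloat16, which is why half-precision loading is common for local use.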

Certain LLMs, like [Meta Llama 3.2 1B Instruct](https://huggingface.co/meta-llama/Llama-3.2-1B-Instruct), are gated, so make sure to apply for access as shown below; otherwise you will get an error when using the model.

![](https://i.imgur.com/M88MOu5.png)

## Load the LLM locally using Hugging Face

In [7]:
# Import necessary libraries
from transformers import AutoTokenizer, AutoModelForCausalLM
import transformers
import torch
# Define the model ID
model_id = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"
# Load the pre-trained tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_id)
# Load the pre-trained model
# Load the weights in 'bfloat16' to reduce memory use (no device is set here, so the model stays on the default device)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16
)

In [8]:
chat = [
    { "role": "user", "content": "Explain what is Generative AI in 2 bullet points" },
]
prompt = tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)
print(prompt)

<|user|>
Explain what is Generative AI in 2 bullet points</s>
<|assistant|>
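
The printed prompt shows the layout this tokenizer's chat template produces: a role tag on its own line, the message followed by `</s>`, and a trailing `<|assistant|>` tag that cues the model to answer. The sketch below reproduces that layout by hand purely for illustration. `render_chat` is a hypothetical helper that only mirrors the output printed above; real code should keep using `tokenizer.apply_chat_template`, because each model defines its own template.

```python
def render_chat(messages, add_generation_prompt=True):
    """Render messages in the layout seen in the printed prompt (illustrative only)."""
    parts = []
    for message in messages:
        # Each turn: role tag, newline, content, end-of-sequence marker, newline.
        parts.append(f"<|{message['role']}|>\n{message['content']}</s>\n")
    if add_generation_prompt:
        # The open assistant tag signals that it is the model's turn to respond.
        parts.append("<|assistant|>\n")
    return "".join(parts)


chat = [{"role": "user", "content": "Explain what is Generative AI in 2 bullet points"}]
print(render_chat(chat))
```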



Remember to always refer to the [__documentation__](https://huggingface.co/docs/transformers/v4.18.0/en/main_classes/text_generation#transformers.generation_utils.GenerationMixin.generate) where all the arguments of the generation pipeline are mentioned in detail. Most notably:

- **max_length:** The maximum total length of the sequence, counting the prompt tokens plus the generated tokens
- **max_new_tokens:** The maximum number of tokens to generate, ignoring the number of tokens in the prompt. Set either max_new_tokens or max_length, not both, since both cap the output length
- **do_sample:** Whether or not to use sampling. False means greedy decoding, which always picks the most likely next token (similar in effect to a temperature close to 0)
- **temperature:** A positive value used to modulate the next-token probabilities, typically between 0 and 1; it only takes effect when do_sample=True. Higher temperature means the results may vary more and be more creative
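
To see what `temperature` does, recall that the model's scores (logits) are divided by the temperature before a softmax turns them into probabilities. The sketch below uses made-up logits for three candidate tokens; the numbers and the helper `softmax_with_temperature` are illustrative assumptions, not values from any model.

```python
import math


def softmax_with_temperature(logits, temperature=1.0):
    """Convert logits to probabilities after dividing by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive; use greedy decoding instead")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.1]  # hypothetical scores for three tokens

for t in (0.5, 1.0, 1.5):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])

# Greedy decoding (do_sample=False) simply picks the highest-scoring token.
greedy_index = max(range(len(logits)), key=logits.__getitem__)
print("greedy pick:", greedy_index)
```

Lower temperatures concentrate probability on the top token, while higher ones flatten the distribution, which is why sampled outputs vary more as the temperature rises.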

In [9]:
inputs = tokenizer.encode(prompt, add_special_tokens=False, return_tensors="pt")
outputs = model.generate(input_ids=inputs.to(model.device), max_new_tokens=1000)
print(tokenizer.decode(outputs[0]))

<|user|>
Explain what is Generative AI in 2 bullet points</s> 
<|assistant|>
1. Generative AI is a type of artificial intelligence that can generate new ideas, concepts, and solutions based on data. It is a form of machine learning that uses algorithms to analyze large amounts of data and generate new insights or solutions.

2. Generative AI can be used in various industries, including finance, healthcare, marketing, and education. It can help businesses to identify new markets, develop new products, and improve customer experiences.

3. Generative AI can also be used to create new content, such as blog posts, social media posts, and videos. It can generate content based on user data or historical data, making it more relevant and engaging for the audience.

4. Generative AI can also be used to improve the quality of data analysis. By generating new insights and solutions based on data, it can help businesses to make more informed decisions and improve their data analysis processes.

5
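
The decoded text above repeats the prompt because, for causal language models, `generate` returns the prompt token IDs followed by the newly generated ones. Slicing off the first prompt-length IDs keeps only the completion. A minimal sketch with plain Python lists and made-up token IDs (the IDs and the helper `completion_ids` are hypothetical):

```python
def completion_ids(output_ids, prompt_length):
    """Drop the echoed prompt and keep only the newly generated token IDs."""
    return output_ids[prompt_length:]


prompt_ids = [101, 102, 103]         # made-up IDs standing in for the prompt
generated = [201, 202, 203, 204]     # made-up IDs standing in for the reply
output_ids = prompt_ids + generated  # prompt first, then the new tokens

print(completion_ids(output_ids, len(prompt_ids)))
```

With the tensors from the cell above, the same idea would read `tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True)`, where `skip_special_tokens=True` also hides markers such as `</s>`.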

### Pipelines make it easier to send prompts

You don't need to encode and decode your inputs and outputs every time.

In [10]:
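# The model and tokenizer are already loaded, so the pipeline reuses them as-is.
# Note that the logged device below is still "cpu" despite device_map="cuda".
llama_pipe = transformers.pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
    device_map="cuda",
)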

Device set to use cpu


In [11]:
chat = [
    { "role": "user", "content": "Explain what is Generative AI in 2 bullet points" },
]

In [12]:
response = llama_pipe(chat, max_new_tokens=1000)
print(response)

[{'generated_text': [{'role': 'user', 'content': 'Explain what is Generative AI in 2 bullet points'}, {'role': 'assistant', 'content': '1. Generative AI is a type of artificial intelligence that can generate new ideas, concepts, and solutions based on data. It is a form of machine learning that uses algorithms to analyze large amounts of data and generate new insights or solutions.\n\n2. Generative AI can be used in various industries, including finance, healthcare, marketing, and education. It can help businesses to identify new markets, develop new products, and improve customer experiences.\n\n3. Generative AI can also be used to create new content, such as blog posts, social media posts, and videos. It can generate content based on user data or historical data, making it more relevant and engaging for the audience.\n\n4. Generative AI can also be used to improve the quality of data analysis. By generating new insights and solutions based on data, it can help businesses to make more

In [13]:
print(response[0]["generated_text"][-1]['content'])

1. Generative AI is a type of artificial intelligence that can generate new ideas, concepts, and solutions based on data. It is a form of machine learning that uses algorithms to analyze large amounts of data and generate new insights or solutions.

2. Generative AI can be used in various industries, including finance, healthcare, marketing, and education. It can help businesses to identify new markets, develop new products, and improve customer experiences.

3. Generative AI can also be used to create new content, such as blog posts, social media posts, and videos. It can generate content based on user data or historical data, making it more relevant and engaging for the audience.

4. Generative AI can also be used to improve the quality of data analysis. By generating new insights and solutions based on data, it can help businesses to make more informed decisions and improve their data analysis processes.

5. Generative AI can also be used to create new products and services. By anal
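
The indexing in the last cell follows the shape of the pipeline's return value: a list with one result, whose `generated_text` holds the whole conversation, ending with the assistant's reply. A small sketch that makes this explicit, using a hand-written stand-in for `response` (the stand-in content and the helper `last_reply` are hypothetical):

```python
def last_reply(response):
    """Return the content of the final message in the first pipeline result."""
    conversation = response[0]["generated_text"]
    final_message = conversation[-1]
    if final_message["role"] != "assistant":
        raise ValueError("the conversation does not end with an assistant message")
    return final_message["content"]


# Stand-in with the same nesting as the printed response above.
response = [
    {
        "generated_text": [
            {"role": "user", "content": "Explain what is Generative AI in 2 bullet points"},
            {"role": "assistant", "content": "1. Generative AI is ..."},
        ]
    }
]
print(last_reply(response))
```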