# **🚀 Getting Started with DeepSeek-R1-Distill-Llama-8B**  
📌 **Copyright 2025, Denis Rothman**  

---

## **🚀 Installing and running DeepSeek-R1-Distill-Llama-8B**  

This notebook provides a **step-by-step guide** on how to **download and run DeepSeek-R1-Distill-Llama-8B**, with the model files stored in **Google Drive**. The version downloaded is an open-source distilled version of DeepSeek-R1, published on Hugging Face by Unsloth (https://unsloth.ai/), an LLM accelerator.

If you don't want to use Google Drive, you can install the artifacts on a local machine, a server, or a cloud server.

### **🔹 How to Get Started**  
1️⃣ **Install the model's artifacts** → Set `install_deepseek=True` and run all cells.  
2️⃣ **Restart the session** → Disconnect and start a new session.  
3️⃣ **Re-run the model** → Set `install_deepseek=False` and run all cells again.  
4️⃣ **Interact with the model** → Use it in a prompt session!  

⚠️ **System Requirements**  
✅ **GPU** – Minimum **16GB** VRAM required.  
✅ **Google Drive Space** – At least **20GB** free space.  
📌 **Educational Use Only** – For production, deploy artifacts on a **local or cloud server**.
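
The 16GB figure is consistent with a back-of-the-envelope estimate for the weights alone: roughly 8 billion parameters stored at 2 bytes each. The sketch below makes that arithmetic explicit. The parameter count of `8e9` is a rounded assumption taken from the "8B" in the model name, and `weight_memory_gib` is an illustrative helper, not part of the notebook. Activations and the attention cache add to this figure at run time.

```python
def weight_memory_gib(num_params: float, bytes_per_param: int = 2) -> float:
    """Approximate memory needed for the model weights alone, in GiB."""
    return num_params * bytes_per_param / 1024**3

# Assumption: ~8 billion parameters (rounded from the "8B" in the model name)
fp16_gib = weight_memory_gib(8e9, bytes_per_param=2)
fp32_gib = weight_memory_gib(8e9, bytes_per_param=4)
print(f"16-bit weights: ~{fp16_gib:.1f} GiB")
print(f"32-bit weights: ~{fp32_gib:.1f} GiB")
```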

---

## **📖 Table of Contents**  

### **1️⃣ Setting Up the DeepSeek Environment (Hugging Face)**  
✅ Checking GPU Activation  
📂 Mounting Google Drive  
⚙️ Installing the Hugging Face Environment  
🔄 Ensuring `install_deepseek=True` for First Run  
📌 Checking Transformers Version  

### **2️⃣ Downloading DeepSeek-R1-Distill-Llama-8B**  
📂 Verifying Download Path  

### **3️⃣ Running a DeepSeek Session**  
🔄 Setting `install_deepseek=False` for Second Run  
📌 Model Information  
💬 Running an Interactive Prompt Session  

---

### **💡 Ready to Use DeepSeek?**  
Follow the **installation steps**, ensure you have the required **hardware**, and launch your **interactive AI session** 🚀

This notebook was developed in Google Colab. Colab includes many pre-installed libraries and sets `/content/` as the default directory, meaning you can access files directly by their filename if you wish (e.g., `filename` instead of needing to specify `/content/filename`). This differs from local environments, where you'll often need to install libraries or specify full file paths.
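
As a small illustration of the working-directory point, the sketch below shows how a bare filename resolves against a default directory. The `resolve` helper is hypothetical and only mimics the lookup; it does not touch the filesystem.

```python
from pathlib import PurePosixPath

def resolve(path: str, working_dir: str = "/content") -> str:
    """Return the absolute path a relative name refers to from working_dir."""
    p = PurePosixPath(path)
    return str(p if p.is_absolute() else PurePosixPath(working_dir) / p)

print(resolve("filename"))            # relative name, resolved against /content
print(resolve("/content/filename"))   # already absolute, returned unchanged
```

On a local machine the same bare filename resolves against whatever directory the process was started in, which is why full paths are often safer outside Colab.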

# 1. Setting up DeepSeek Hugging Face environment

In [None]:
# Set install_deepseek to True to download and install R1 Distill Llama 8B locally
# Set install_deepseek to False to run an R1 session
install_deepseek=False

## Checking GPU activation

In [None]:
!nvidia-smi

Wed Mar  5 08:11:04 2025       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.15              Driver Version: 550.54.15      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|   0  NVIDIA A100-SXM4-40GB          Off |   00000000:00:04.0 Off |                    0 |
| N/A   34C    P0             46W /  400W |       0MiB /  40960MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
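
The table above can also be checked programmatically against the 16GB minimum. This is a minimal sketch that relies on the `--query-gpu` and `--format` options of `nvidia-smi`; the helper names `parse_gpu_line` and `list_gpus` are illustrative, and the code simply returns an empty list on machines without a GPU or driver.

```python
import subprocess

def parse_gpu_line(line: str) -> tuple[str, int]:
    """Split one 'name, memory' CSV line into (name, total memory in MiB)."""
    name, memory_mib = (field.strip() for field in line.rsplit(",", 1))
    return name, int(memory_mib)

def list_gpus() -> list[tuple[str, int]]:
    """Query nvidia-smi; return an empty list when no GPU or driver is found."""
    cmd = ["nvidia-smi", "--query-gpu=name,memory.total",
           "--format=csv,noheader,nounits"]
    try:
        out = subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
    except (OSError, subprocess.CalledProcessError):
        return []
    return [parse_gpu_line(line) for line in out.splitlines() if line.strip()]

MIN_VRAM_MIB = 16 * 1024  # the 16GB minimum stated in the requirements
for name, mib in list_gpus():
    status = "OK" if mib >= MIN_VRAM_MIB else "below the 16GB minimum"
    print(f"{name}: {mib} MiB ({status})")
```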
                                                

## Mount Google Drive

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [None]:
import os

# Define the cache directory in your Google Drive
cache_dir = '/content/drive/MyDrive/genaisys/HuggingFaceCache'

# Set environment variables to direct Hugging Face to use this cache directory
os.environ['TRANSFORMERS_CACHE'] = cache_dir
#os.environ['HF_DATASETS_CACHE'] = os.path.join(cache_dir, 'datasets')
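
Before downloading, it can help to confirm that the cache directory exists and that the 20GB mentioned in the requirements is actually free. The sketch below uses only the standard library; `prepare_cache_dir` and `free_space_gb` are hypothetical helpers, and the demo runs on a temporary directory as a stand-in for `cache_dir`. How accurately a mounted Drive folder reports free space is an assumption that has not been verified here.

```python
import os
import shutil
import tempfile

def free_space_gb(path: str) -> float:
    """Free space, in GB, on the filesystem that holds path."""
    return shutil.disk_usage(path).free / 1e9

def prepare_cache_dir(path: str, required_gb: float = 20.0) -> bool:
    """Create the directory if needed and report whether enough space is free."""
    os.makedirs(path, exist_ok=True)
    return free_space_gb(path) >= required_gb

# Stand-in for cache_dir so the sketch runs anywhere
demo_dir = os.path.join(tempfile.mkdtemp(), "HuggingFaceCache")
print(prepare_cache_dir(demo_dir, required_gb=0.0))
```

In this notebook the call would be `prepare_cache_dir(cache_dir)`, placed right after the Drive mount.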

## Installing the Hugging Face environment

Path in this notebook: drive/MyDrive/genaisys/


In [None]:
!pip install transformers==4.48.3
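
The table of contents lists a version check, so here is one way it could look. This sketch reads the installed version through the standard library's `importlib.metadata` rather than importing the package; `installed_version` is an illustrative helper name.

```python
from importlib.metadata import PackageNotFoundError, version

def installed_version(package: str):
    """Return the installed version string, or None if the package is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None

EXPECTED = "4.48.3"  # the version pinned by the pip command in this notebook
found = installed_version("transformers")
if found is None:
    print("transformers is not installed")
elif found != EXPECTED:
    print(f"transformers {found} is installed; this notebook pins {EXPECTED}")
else:
    print(f"transformers {found} matches the pinned version")
```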

# 2. DeepSeek download



In [None]:
from transformers import AutoTokenizer, AutoModelForCausalLM
import time
if install_deepseek:
  # Record the start time
  start_time = time.time()

  model_name = 'unsloth/DeepSeek-R1-Distill-Llama-8B'
  # Download (on the first run) and load the tokenizer and model
  tokenizer = AutoTokenizer.from_pretrained(model_name)
  model = AutoModelForCausalLM.from_pretrained(model_name, device_map='auto', torch_dtype='auto')

  # Record the end time
  end_time = time.time()

  # Calculate the elapsed time
  elapsed_time = end_time - start_time

  print(f"Time taken to download and load the model: {elapsed_time:.2f} seconds")

In [None]:
if install_deepseek:
  !ls -R /content/drive/MyDrive/genaisys/HuggingFaceCache
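
Beyond listing the files, the total size on disk is a quick sanity check that the download completed. The following is a minimal sketch using `os.walk`; `directory_size_gb` is a hypothetical helper, and it skips symbolic links because the Hugging Face cache can link snapshot files to a shared `blobs` folder.

```python
import os

def directory_size_gb(root: str) -> float:
    """Sum the sizes of regular files under root, in GB, skipping symlinks."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):  # avoid counting linked blobs twice
                total += os.path.getsize(path)
    return total / 1e9

cache_dir = '/content/drive/MyDrive/genaisys/HuggingFaceCache'  # path used in this notebook
if os.path.isdir(cache_dir):
    print(f"{directory_size_gb(cache_dir):.1f} GB in the cache directory")
```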

# 3. DeepSeek-R1-Distill-Llama-8B session

## Loading the model

In [None]:
import time
from transformers import AutoTokenizer, AutoModelForCausalLM
if not install_deepseek:
  # Define the path to the model directory
  model_path = '/content/drive/MyDrive/genaisys/HuggingFaceCache/models--unsloth--DeepSeek-R1-Distill-Llama-8B/snapshots/71f34f954141d22ccdad72a2e3927dddf702c9de'

  # Record the start time
  start_time = time.time()
  # Load the tokenizer and model from the specified path
  tokenizer = AutoTokenizer.from_pretrained(model_path, local_files_only=True)
  model = AutoModelForCausalLM.from_pretrained(model_path, device_map='auto', torch_dtype='auto', local_files_only=True)

  # Record the end time
  end_time = time.time()

  # Calculate the elapsed time
  elapsed_time = end_time - start_time

  print(f"Time taken to load the model: {elapsed_time:.2f} seconds")
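
The snapshot folder name in `model_path` is a commit hash, so the hard-coded path stops working if a different revision is downloaded. As a sketch of one alternative, the helper below looks up the snapshot folder instead of hard-coding it. `find_snapshot` is hypothetical; the `models--<org>--<name>/snapshots/<hash>` layout it assumes is the one visible in the path above.

```python
import os

def find_snapshot(cache_dir: str, repo_id: str) -> str:
    """Return the most recently modified snapshot directory for repo_id."""
    repo_dir = os.path.join(cache_dir, "models--" + repo_id.replace("/", "--"))
    snapshots_dir = os.path.join(repo_dir, "snapshots")
    candidates = [
        os.path.join(snapshots_dir, name)
        for name in os.listdir(snapshots_dir)
        if os.path.isdir(os.path.join(snapshots_dir, name))
    ]
    if not candidates:
        raise FileNotFoundError(f"No snapshot found under {snapshots_dir}")
    return max(candidates, key=os.path.getmtime)

cache_dir = '/content/drive/MyDrive/genaisys/HuggingFaceCache'  # path used in this notebook
if os.path.isdir(cache_dir):
    print(find_snapshot(cache_dir, 'unsloth/DeepSeek-R1-Distill-Llama-8B'))
```

The returned string could then be passed to `from_pretrained` in place of the literal `model_path`.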

In [None]:
if not install_deepseek:
  print(model.config)

## Prompt

In [None]:
if not install_deepseek:
  # strip() removes the leading newline and surrounding indentation from the prompt
  prompt="""
  Explain how a product designer could transform customer requirements for a traveling bag into a production plan.
  """.strip()

In [None]:
import time
if not install_deepseek:
  # Record the start time
  start_time = time.time()


  # Tokenize the input and move it to the device that holds the model
  inputs = tokenizer(prompt, return_tensors='pt').to(model.device)

  # Generate output with sampling and anti-repetition settings
  outputs = model.generate(
    **inputs,
    max_new_tokens=1200,
    do_sample=True,                     # Sample explicitly so temperature, top_p, and top_k are applied
    repetition_penalty=1.5,             # Values above 1.0 penalize tokens that already appeared
    no_repeat_ngram_size=3,             # Prevent repeating n-grams of size 3
    temperature=0.6,                    # Values below 1.0 reduce randomness
    top_p=0.9,                          # Nucleus sampling: keep the most probable tokens up to 0.9 cumulative probability
    top_k=50                            # Limit token selection to the 50 most probable tokens
  )

  # Decode and display the output
  generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)

  # Record the end time
  end_time = time.time()

  # Calculate the elapsed time
  elapsed_time = end_time - start_time

  print(f"Time taken to generate the response: {elapsed_time:.2f} seconds")
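
Elapsed time is easier to compare across runs when it is converted to a throughput figure. The sketch below shows the arithmetic; `tokens_per_second` is an illustrative helper, and the numbers passed to it are made up for the example, not measurements from this notebook.

```python
def tokens_per_second(prompt_tokens: int, total_tokens: int, elapsed_seconds: float) -> float:
    """Generated tokens per second, excluding the prompt tokens."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    new_tokens = total_tokens - prompt_tokens
    return new_tokens / elapsed_seconds

# Illustrative numbers only, not measurements from this notebook
print(f"{tokens_per_second(25, 1225, 60.0):.1f} tokens/s")
```

In the cell above, the prompt length would be `inputs['input_ids'].shape[1]` and the total length `outputs.shape[1]`, since the generated sequence includes the prompt tokens.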

In [None]:
import textwrap
if not install_deepseek:
  wrapped_text = textwrap.fill(generated_text, width=80)
  print(wrapped_text)
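
`textwrap.fill` treats its input as a single paragraph, so line breaks and numbered steps in the response are merged into one block. If the original structure matters, a variant that wraps line by line may read better. This is a sketch with a hypothetical helper, `wrap_paragraphs`, shown on a made-up sample string.

```python
import textwrap

def wrap_paragraphs(text: str, width: int = 80) -> str:
    """Wrap each line of text separately so existing line breaks survive."""
    wrapped = [textwrap.fill(line, width=width) if line.strip() else ""
               for line in text.splitlines()]
    return "\n".join(wrapped)

# Made-up sample standing in for generated_text
sample = "First paragraph of a response.\n\n1. A numbered step\n2. Another step"
print(wrap_paragraphs(sample, width=40))
```

In the cell above, `wrap_paragraphs(generated_text, width=80)` would be a drop-in replacement for the `textwrap.fill` call.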