# **🚀 Getting Started with DeepSeek-R1-Distill-Llama-8B**  
📌 **Copyright 2025, Denis Rothman**  

---

## **🚀 Installing and running DeepSeek-R1-Distill-Llama-8B**  

This notebook provides a **step-by-step guide** on how to **download and run DeepSeek-R1-Distill-Llama-8B**, with the model files stored in **Google Drive**. The version downloaded is an open-source distilled version of DeepSeek-R1, published on Hugging Face by Unsloth, an LLM accelerator: https://unsloth.ai/

If you don't want to use Google Drive, you can install the artifacts on a local machine, a server, or a cloud server.

### **🔹 How to Get Started**  
1️⃣ **Install the model's artifacts** → Set `install_deepseek=True` and run all cells.  
2️⃣ **Restart the session** → Disconnect and start a new session.  
3️⃣ **Re-run the model** → Set `install_deepseek=False` and run all cells again.  
4️⃣ **Interact with the model** → Use it in a prompt session!  

⚠️ **System Requirements**  
✅ **GPU** – Minimum **16GB** VRAM required.  
✅ **Google Drive Space** – At least **20GB** free space.  
📌 **Educational Use Only** – For production, deploy artifacts on a **local or cloud server**.
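
The requirements above can be checked before starting the download. The following is a minimal sketch, not part of the original notebook: the helper names (`free_disk_gib`, `gpu_vram_gib`, `check_requirements`) are hypothetical, the thresholds are the 16GB and 20GB figures stated above, and the GPU query assumes PyTorch is installed (it returns `None` otherwise).

```python
import shutil

GIB = 1024 ** 3  # bytes per GiB

def free_disk_gib(path):
    """Free space, in GiB, on the filesystem that holds `path`."""
    return shutil.disk_usage(path).free / GIB

def gpu_vram_gib():
    """Total memory of GPU 0 in GiB, or None if no CUDA GPU (or no PyTorch) is found."""
    try:
        import torch  # assumption: PyTorch is available in the runtime
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    return torch.cuda.get_device_properties(0).total_memory / GIB

def check_requirements(vram_gib, disk_gib, min_vram_gib=16, min_disk_gib=20):
    """Compare measured values with the minimums stated in this notebook."""
    return {
        "gpu": vram_gib is not None and vram_gib >= min_vram_gib,
        "disk": disk_gib >= min_disk_gib,
    }

if __name__ == "__main__":
    # "." is a placeholder; pass the directory that will hold the model files
    print(check_requirements(gpu_vram_gib(), free_disk_gib(".")))
```

A card sold as 16GB usually reports slightly less than 16 GiB of usable memory, so `min_vram_gib` may need to be lowered a little for such cards.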

---

## **📖 Table of Contents**  

### **1️⃣ Setting Up the DeepSeek Environment (Hugging Face)**  
✅ Checking GPU Activation  
📂 Mounting Google Drive  
⚙️ Installing the Hugging Face Environment  
🔄 Ensuring `install_deepseek=True` for First Run  
📌 Checking Transformers Version  

### **2️⃣ Downloading DeepSeek-R1-Distill-Llama-8B**  
📂 Verifying Download Path  

### **3️⃣ Running a DeepSeek Session**  
🔄 Setting `install_deepseek=False` for Second Run  
📌 Model Information  
💬 Running an Interactive Prompt Session  

---

### **💡 Ready to Use DeepSeek?**  
Follow the **installation steps**, ensure you have the required **hardware**, and launch your **interactive AI session** 🚀

# 1. Setting up the DeepSeek Hugging Face environment

In [1]:
# Set install_deepseek to True to download and install R1 Distill Llama 8B locally
# Set install_deepseek to False to run an R1 session
install_deepseek=False

## Checking GPU activation

In [2]:
!nvidia-smi

Wed Mar  5 08:11:04 2025       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.15              Driver Version: 550.54.15      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|   0  NVIDIA A100-SXM4-40GB          Off |   00000000:00:04.0 Off |                    0 |
| N/A   34C    P0             46W /  400W |       0MiB /  40960MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
                                                

## Mount Google Drive

In [3]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [4]:
import os

# Define the cache directory in your Google Drive
cache_dir = '/content/drive/MyDrive/genaisys/HuggingFaceCache'

# Set environment variables to direct Hugging Face to use this cache directory
os.environ['TRANSFORMERS_CACHE'] = cache_dir
#os.environ['HF_DATASETS_CACHE'] = os.path.join(cache_dir, 'datasets')
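
Recent Transformers releases print a deprecation warning for `TRANSFORMERS_CACHE` and recommend `HF_HOME` instead. The sketch below shows one possible alternative, offered as an assumption rather than as the notebook's method: it sets `HF_HUB_CACHE`, the `huggingface_hub` variable that points directly at the model cache, so the directory layout stays the same as the one used here. The helper name `use_hf_cache` is hypothetical, and the variable has to be set before `transformers` is imported.

```python
import os

def use_hf_cache(cache_dir):
    """Create `cache_dir` if needed and point the Hugging Face Hub cache at it."""
    os.makedirs(cache_dir, exist_ok=True)
    os.environ["HF_HUB_CACHE"] = cache_dir
    return cache_dir

# In this notebook the call would use the Drive path defined above,
# placed before any `import transformers`:
# use_hf_cache('/content/drive/MyDrive/genaisys/HuggingFaceCache')
```

Setting `HF_HOME` instead would place the models in a `hub` subdirectory of the given path, which changes the paths used later in this notebook.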

## Installing the Hugging Face environment

Path in this notebook: drive/MyDrive/genaisys/


In [5]:
!pip install transformers==4.48.3
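
The table of contents lists a Transformers version check. One way to perform it, sketched here with hypothetical helper names (`installed_version`, `matches_pin`), is to read the installed package metadata and compare it with the version pinned in this notebook (4.48.3).

```python
from importlib.metadata import PackageNotFoundError, version

def installed_version(package):
    """Return the installed version string of `package`, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None

def matches_pin(package, pinned):
    """True when the installed version equals the pinned one exactly."""
    return installed_version(package) == pinned

if __name__ == "__main__":
    print("transformers:", installed_version("transformers"))
    print("matches 4.48.3:", matches_pin("transformers", "4.48.3"))
```

If the check reports `None`, the package is not installed in the current runtime and the install cell needs to be run first.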


# 2. DeepSeek download



In [6]:
from transformers import AutoTokenizer, AutoModelForCausalLM
import time
if install_deepseek==True:
  # Record the start time
  start_time = time.time()

  model_name = 'unsloth/DeepSeek-R1-Distill-Llama-8B'
  # Download (on the first run) and load the tokenizer and model
  tokenizer = AutoTokenizer.from_pretrained(model_name)
  model = AutoModelForCausalLM.from_pretrained(model_name, device_map='auto', torch_dtype='auto')

  # Record the end time
  end_time = time.time()

  # Calculate the elapsed time
  elapsed_time = end_time - start_time

  print(f"Time taken to load the model: {elapsed_time:.2f} seconds")
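
The start/end timing pattern above is repeated in several cells of this notebook. As an optional sketch, it could be wrapped in a small context manager; the name `timed` is hypothetical and the workload below is a stand-in, not the model download.

```python
import time
from contextlib import contextmanager

@contextmanager
def timed(label):
    """Print how long the enclosed block took, prefixed with `label`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        print(f"{label}: {elapsed:.2f} seconds")

# Stand-in workload; in the notebook the `with` block would hold the
# from_pretrained calls or the generate call
with timed("Time taken to run the example"):
    total = sum(range(1000))
```

Using a distinct label per cell also avoids reusing the same message for different operations.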



In [7]:
if install_deepseek==True:
 !ls -R /content/drive/MyDrive/genaisys/HuggingFaceCache

/content/drive/MyDrive/genaisys/HuggingFaceCache:
models--unsloth--DeepSeek-R1-Distill-Llama-8B  version.txt

/content/drive/MyDrive/genaisys/HuggingFaceCache/models--unsloth--DeepSeek-R1-Distill-Llama-8B:
blobs  refs  snapshots

/content/drive/MyDrive/genaisys/HuggingFaceCache/models--unsloth--DeepSeek-R1-Distill-Llama-8B/blobs:
03910325923893259d090bfa92baa4088cd46573
0ab389d23c02726e56c53379f99a420035974a33
0c15378b8bf8af3ceaa5e7a81372996b5080fe2035fd304b491064f95b8625e2
0fd8120f1c6acddc268ebc2583058efaf699a771
13263c27b6e1c82a791559fc2fe27af0748060180c559220d45b93b5fffe239e
21b8ca8f9ab09417c124d32ba5b9a59bcd417c4594e41a48d3e869a8a328a021
49d6c171706a9c36a4ba5f358e9ce94c27557fa812da99aa5ef0961fcd35de3f
846e0e5df0c5f053a21f9390ceec3eabc52d06b3
afcf8b83f782748e77f548ee46a21ced225f3431
d91915040cfac999d8c55f4b5bc6e67367c065e3a7a4e4b9438ce1f256addd86

/content/drive/MyDrive/genaisys/HuggingFaceCache/models--unsloth--DeepSeek-R1-Distill-Llama-8B/refs:
main

/content/drive/MyDrive/genaisy

# 3. DeepSeek-R1-Distill-Llama-8B session

## Loading the model

In [13]:
import time
from transformers import AutoTokenizer, AutoModelForCausalLM
if install_deepseek==False:
  # Define the path to the model directory
  model_path = '/content/drive/MyDrive/genaisys/HuggingFaceCache/models--unsloth--DeepSeek-R1-Distill-Llama-8B/snapshots/71f34f954141d22ccdad72a2e3927dddf702c9de'

  # Record the start time
  start_time = time.time()
  # Load the tokenizer and model from the specified path
  tokenizer = AutoTokenizer.from_pretrained(model_path, local_files_only=True)
  model = AutoModelForCausalLM.from_pretrained(model_path, device_map='auto', torch_dtype='auto', local_files_only=True)

  # Record the end time
  end_time = time.time()

  # Calculate the elapsed time
  elapsed_time = end_time - start_time

  print(f"Time taken to load the model: {elapsed_time:.2f} seconds")

Loading checkpoint shards:   0%|          | 0/4 [00:00<?, ?it/s]

Time taken to load the model: 14.71 seconds
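
The `model_path` above hard-codes the snapshot hash, which changes whenever a different revision of the model is downloaded. In the Hugging Face cache layout listed earlier, `refs/main` is a small text file holding that hash, so the path can be resolved at run time. The sketch below assumes this layout; the helper name `resolve_snapshot` is hypothetical.

```python
from pathlib import Path

def resolve_snapshot(cache_dir, repo_id, revision="main"):
    """Return the snapshot directory that `revision` points to in a Hugging Face cache."""
    repo_dir = Path(cache_dir) / ("models--" + repo_id.replace("/", "--"))
    # refs/<revision> holds the commit hash of the downloaded snapshot
    commit = (repo_dir / "refs" / revision).read_text().strip()
    snapshot = repo_dir / "snapshots" / commit
    if not snapshot.is_dir():
        raise FileNotFoundError(f"No snapshot directory at {snapshot}")
    return snapshot

# In this notebook the call would look like:
# model_path = str(resolve_snapshot(
#     '/content/drive/MyDrive/genaisys/HuggingFaceCache',
#     'unsloth/DeepSeek-R1-Distill-Llama-8B'))
```

Another option is to pass the repository name together with `cache_dir=...` and `local_files_only=True` to `from_pretrained`, which lets the library locate the snapshot itself.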


In [15]:
if install_deepseek==False:
  print(model.config)

LlamaConfig {
  "_attn_implementation_autoset": true,
  "_name_or_path": "/content/drive/MyDrive/genaisys/HuggingFaceCache/models--unsloth--DeepSeek-R1-Distill-Llama-8B/snapshots/71f34f954141d22ccdad72a2e3927dddf702c9de",
  "architectures": [
    "LlamaForCausalLM"
  ],
  "attention_bias": false,
  "attention_dropout": 0.0,
  "bos_token_id": 128000,
  "eos_token_id": 128001,
  "head_dim": 128,
  "hidden_act": "silu",
  "hidden_size": 4096,
  "initializer_range": 0.02,
  "intermediate_size": 14336,
  "max_position_embeddings": 131072,
  "mlp_bias": false,
  "model_type": "llama",
  "num_attention_heads": 32,
  "num_hidden_layers": 32,
  "num_key_value_heads": 8,
  "pad_token_id": 128004,
  "pretraining_tp": 1,
  "rms_norm_eps": 1e-05,
  "rope_scaling": {
    "factor": 8.0,
    "high_freq_factor": 4.0,
    "low_freq_factor": 1.0,
    "original_max_position_embeddings": 8192,
    "rope_type": "llama3"
  },
  "rope_theta": 500000.0,
  "tie_word_embeddings": false,
  "torch_dtype": "bfloat1

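The configuration values printed above are enough to estimate how much memory the key/value cache needs during generation. The sketch below is a worked calculation, not a measurement: the function name is hypothetical, it covers a single sequence, and it assumes 2 bytes per stored value (a 16-bit dtype), which is an assumption because the `torch_dtype` field is cut off in the printed configuration.

```python
def kv_cache_bytes_per_token(num_hidden_layers, num_key_value_heads, head_dim, bytes_per_value=2):
    """Bytes needed to cache the keys and values of one token across all layers."""
    # Factor 2: one key vector and one value vector per KV head and layer
    return 2 * num_hidden_layers * num_key_value_heads * head_dim * bytes_per_value

# Values taken from the printed LlamaConfig
per_token = kv_cache_bytes_per_token(num_hidden_layers=32, num_key_value_heads=8, head_dim=128)
full_context = per_token * 131072  # max_position_embeddings

print(f"KV cache per token: {per_token / 1024:.0f} KiB")           # 128 KiB
print(f"KV cache at full context: {full_context / 1024**3:.0f} GiB")  # 16 GiB
```

Under these assumptions, the 1200 new tokens requested in the prompt cell below add roughly 150 MiB of cache, while a full 131072-token context would need about 16 GiB on top of the model weights.
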
## Prompt

In [10]:
if install_deepseek==False:
  prompt="""
  Explain how a product designer could transformer customer requirements for a traveling bag into a production plan.
  """

In [11]:
import time
if install_deepseek==False:
  # Record the start time
  start_time = time.time()


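  # Tokenize the input and move it to the device that holds the model
  inputs = tokenizer(prompt, return_tensors='pt').to(model.device)

  # Generate output with anti-repetition and sampling settings.
  # temperature, top_p, and top_k only take effect when sampling is enabled
  # (do_sample=True, either passed here or set in the model's generation config).
  outputs = model.generate(
    **inputs,
    max_new_tokens=1200,
    repetition_penalty=1.5,             # Penalize tokens that already appeared (1.0 = no penalty)
    no_repeat_ngram_size=3,             # Prevent repeating n-grams of size 3
    temperature=0.6,                    # Values below 1.0 reduce randomness
    top_p=0.9,                          # Nucleus sampling: keep the smallest token set with cumulative probability >= 0.9
    top_k=50                            # Limit token selection to the 50 most probable tokens
  )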

  # Decode and display the output
  generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)

  # Record the end time
  end_time = time.time()

  # Calculate the elapsed time
  elapsed_time = end_time - start_time

  print(f"Time taken to generate the response: {elapsed_time:.2f} seconds")

  #print(generated_text)

Time taken to generate the response: 48.78 seconds
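
To make the effect of `repetition_penalty` more concrete, the sketch below applies the commonly used rule in plain Python: the score of every token that has already appeared is divided by the penalty when positive and multiplied by it when negative. This is a simplified illustration with a hypothetical function name and toy scores, not the library's own code.

```python
def apply_repetition_penalty(logits, seen_token_ids, penalty):
    """Return a copy of `logits` with already-seen tokens made less likely."""
    adjusted = list(logits)
    for token_id in set(seen_token_ids):
        score = adjusted[token_id]
        # Positive scores are divided and negative scores multiplied,
        # so a penalized token always becomes less likely
        adjusted[token_id] = score * penalty if score < 0 else score / penalty
    return adjusted

logits = [3.0, -1.0, 0.5]  # toy scores for a three-token vocabulary
print(apply_repetition_penalty(logits, seen_token_ids=[0, 1], penalty=1.5))
# prints [2.0, -1.5, 0.5]
```

With a penalty of 1.5, every token that has already appeared, including very common ones, is pushed down at each step, which may affect output quality; values closer to 1.0 are milder.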


In [12]:
import textwrap
if install_deepseek==False:
  # Assuming 'generated_text' contains the text you want to format
  wrapped_text = textwrap.fill(generated_text, width=80)  # Adjust 'width' as needed

  print(wrapped_text)


   Explain how a product designer could transformer customer requirements for a
traveling bag into a production plan.       The process involves understanding
the user's needs, defining specifications through collaboration with
manufacturers and suppliers to ensure quality standards are met. Finally,
creating detailed blueprints that guide manufacturing.  Okay, so I need to
explain step-by-step how aproduct designer transformscustomer
requirementstoaproductionplanforatripplingbag.Let me think aboutthisprocessand
break it down in my mind first.I guess starting fromthebeginningwhen someone
wantsto createa newtraveling袋，they must talktosthe客户或者用户了解他们的具体需求。那么，这个过程是怎样的呢？
首先，我应该考虑与顾客进行沟通和理解其真实需要。这可能包括讨论该背包将用于何种场合，是轻便、耐用还是有其他特性。
接下来，或许会收集所有必要信息，如尺寸要求（宽度、高度）、重量限制，以及材料偏好。此外，还要关注功能方面的问题，比如是否带拉链，有没有内部 pockets
或者多条肩绑等设计元素。  然后，将这些整理成一个详细且清晰的产品规格说明书，以供制造商参考。在这个阶段，也很重要的是确保每一项都被准确定义，没有遗漏或误解的地
方。如果出现不清楚之处，最好的方法就是回去向客户澄清，所以这也是关键的一步。  之后，就可以开始在脑海中构思如何把这些函数转化为物理形态了。一位优秀的手工艺品店
主通常有一套工具箱，其中包含各种草图纸张，不同类型笔记本
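
The `textwrap.fill` call above treats the whole response as one paragraph, so any line breaks produced by the model are merged into a single block. As an optional sketch (the helper name `wrap_paragraphs` and the sample text are hypothetical), each line can be wrapped separately to keep the original breaks.

```python
import textwrap

def wrap_paragraphs(text, width=80):
    """Wrap each line of `text` separately so existing line breaks are kept."""
    wrapped = [textwrap.fill(line, width=width) for line in text.splitlines()]
    return "\n".join(wrapped)

sample = (
    "First paragraph of a response.\n\n"
    "Second paragraph, which is long enough to need wrapping at a narrow width."
)
print(wrap_paragraphs(sample, width=40))
```

In the notebook, `wrap_paragraphs(generated_text, width=80)` would replace the single `textwrap.fill` call.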