# Installation


In [1]:
!pip install transformers datasets peft torch



Import


In [2]:
import torch
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B")
model = AutoModelForCausalLM.from_pretrained("deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B")
device="cuda" if torch.cuda.is_available() else "cpu"
model=model.to(device)

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.
Sliding Window Attention is enabled but not implemented for `sdpa`; unexpected results may be encountered.


Generate Domain-Specific Document

In [3]:
text="""AI, short for Artificial Intelligence, is a field of technology focused on creating systems that can perform tasks typically requiring human intelligence. These tasks can include learning, problem-solving, language understanding, decision-making, and even recognizing patterns in data.
In essence, AI is about teaching machines to "think" and adapt in ways that mimic human reasoning. It's used in a v
"""

# Converting the text data into a Hugging Face dataset

In [4]:
from datasets import Dataset
sentences=text.split(".")
dataset=Dataset.from_dict({"text":sentences})

In [5]:
print(sentences)

['AI, short for Artificial Intelligence, is a field of technology focused on creating systems that can perform tasks typically requiring human intelligence', ' These tasks can include learning, problem-solving, language understanding, decision-making, and even recognizing patterns in data', '\nIn essence, AI is about teaching machines to "think" and adapt in ways that mimic human reasoning', " It's used in a v\n"]
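The printed list shows that `text.split(".")` keeps leading spaces and newlines on each fragment. A minimal sketch of a cleaner split follows; `split_sentences` is a hypothetical helper that is not part of the original notebook, and the sample string is made up for illustration.

```python
# Hypothetical helper, not part of the original notebook.
def split_sentences(text):
    """Split on '.', strip surrounding whitespace, and drop empty fragments."""
    fragments = (part.strip() for part in text.split("."))
    return [fragment for fragment in fragments if fragment]


# Made-up sample text, used only to illustrate the behaviour.
sample = "First sentence. Second sentence.\nThird one."
clean_sentences = split_sentences(sample)
print(clean_sentences)  # ['First sentence', 'Second sentence', 'Third one']
```

The result can be passed to `Dataset.from_dict({"text": clean_sentences})` in the same way as `sentences`.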


# Examine the data schema

In [6]:
print(type(dataset))

<class 'datasets.arrow_dataset.Dataset'>


In [7]:
dataset

Dataset({
    features: ['text'],
    num_rows: 4
})

Setup Tokenizer

In [8]:
def preprocess_function(examples):
    inputs=tokenizer(examples["text"],truncation=True,padding="max_length",max_length=512)

    inputs["labels"]=inputs["input_ids"].copy()
    return inputs

tokenized_dataset=dataset.map(preprocess_function,batched=True)


In [9]:
tokenized_dataset

Dataset({
    features: ['text', 'input_ids', 'attention_mask', 'labels'],
    num_rows: 4
})
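Because every example is padded to 512 tokens and `labels` is a straight copy of `input_ids`, the loss is also computed on padding positions. A common remedy is to set the labels at padded positions to -100, the value that the cross-entropy loss in Hugging Face causal language models ignores. The sketch below is an assumption about how this could be done here; `mask_padding_labels` and the toy token ids are hypothetical.

```python
# Label value ignored by the cross-entropy loss in Hugging Face causal LM models.
IGNORE_INDEX = -100


# Hypothetical helper, not part of the original notebook.
def mask_padding_labels(input_ids, attention_mask):
    """Copy input_ids into labels, replacing padded positions with IGNORE_INDEX."""
    return [
        [token if keep == 1 else IGNORE_INDEX for token, keep in zip(ids, mask)]
        for ids, mask in zip(input_ids, attention_mask)
    ]


# Toy batch: 0 stands in for the padding token id.
toy_input_ids = [[11, 12, 13, 0, 0], [21, 22, 0, 0, 0]]
toy_attention_mask = [[1, 1, 1, 0, 0], [1, 1, 0, 0, 0]]
labels = mask_padding_labels(toy_input_ids, toy_attention_mask)
print(labels)  # [[11, 12, 13, -100, -100], [21, 22, -100, -100, -100]]
```

Inside `preprocess_function`, the `.copy()` line could be swapped for `inputs["labels"] = mask_padding_labels(inputs["input_ids"], inputs["attention_mask"])`. The mask is used rather than the pad token id, so real tokens are kept even if the pad and end-of-sequence tokens share an id.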

In [10]:
tokenized_dataset["text"]

['AI, short for Artificial Intelligence, is a field of technology focused on creating systems that can perform tasks typically requiring human intelligence',
 ' These tasks can include learning, problem-solving, language understanding, decision-making, and even recognizing patterns in data',
 '\nIn essence, AI is about teaching machines to "think" and adapt in ways that mimic human reasoning',
 " It's used in a v\n"]

# Set up LoRA for efficient fine-tuning
LoRA does not shrink the model: the base weights stay frozen and only a small number of added low-rank adapter parameters are trained.

In [14]:
from peft import get_peft_model, LoraConfig,TaskType
lora_config= LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    bias="none",
    target_modules=["q_proj","v_proj"]
)
model = get_peft_model(model, lora_config)
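To get a feel for how few parameters this configuration trains, the adapter sizes can be worked out by hand. The sketch below assumes the layer shapes shown in the model printout in the Inference section (28 decoder layers, `q_proj` 1536 to 1536, `v_proj` 1536 to 256); `lora_param_count` is a hypothetical helper.

```python
# Hypothetical helper, not part of the original notebook.
def lora_param_count(in_features, out_features, rank):
    """Parameters in one LoRA adapter: A is (in_features x rank), B is (rank x out_features)."""
    return rank * (in_features + out_features)


RANK = 16        # r in the LoraConfig above
NUM_LAYERS = 28  # assumed from the model printout (layers 0-27)

q_proj_params = lora_param_count(1536, 1536, RANK)
v_proj_params = lora_param_count(1536, 256, RANK)
total_lora_params = NUM_LAYERS * (q_proj_params + v_proj_params)

print(q_proj_params)      # 49152
print(v_proj_params)      # 28672
print(total_lora_params)  # 2179072
```

That is roughly 2.2 million trainable parameters. On the wrapped model, `model.print_trainable_parameters()` reports the actual count and can be used to check this estimate.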

Configuration of Training Hyperparameters

In [17]:
from transformers import TrainingArguments,Trainer
training_args=TrainingArguments(
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,
    warmup_steps=200,
    num_train_epochs=150,
    learning_rate=2e-4,
    fp16=True,
    logging_steps=10,
    output_dir="outputs",
    report_to="none",
    remove_unused_columns=False
)
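It is worth checking how these settings interact with a dataset of only 4 rows. The sketch below assumes a single device, at least one optimizer update per epoch, and a linear warmup schedule; the variable names and `warmup_lr` are hypothetical, and the arithmetic is an approximation rather than the Trainer's exact scheduler.

```python
import math

num_examples = 4      # rows in the tokenized dataset
per_device_batch = 1  # per_device_train_batch_size
grad_accum = 8        # gradient_accumulation_steps
epochs = 150          # num_train_epochs
warmup_steps = 200    # warmup_steps
peak_lr = 2e-4        # learning_rate

# Batches per epoch, then optimizer updates per epoch (assumed to be at least one).
batches_per_epoch = math.ceil(num_examples / per_device_batch)
updates_per_epoch = max(batches_per_epoch // grad_accum, 1)
total_updates = updates_per_epoch * epochs


def warmup_lr(step):
    """Learning rate under a linear warmup, capped at peak_lr (simplified)."""
    return peak_lr * min(step / warmup_steps, 1.0)


print(total_updates)             # 150
print(warmup_lr(total_updates))  # still below peak_lr because warmup is longer than training
```

Under these assumptions the run performs 150 optimizer updates, which matches `global_step=150` in the training output. Since the warmup lasts 200 steps, the learning rate would still be ramping up when training ends and would never reach `2e-4`.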

Memory optimisation tricks: free up memory to avoid crashing the session

In [18]:
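import gc

import torch

# Move the model off the GPU before building the Trainer
model=model.to("cpu")
trainer=Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_dataset,
)
# Release unreferenced Python objects and cached GPU memory
gc.collect()
torch.cuda.empty_cache()
# torch.compile returns a wrapper around the same module; the Trainer above
# still holds the uncompiled model, and both share the same weights
model= torch.compile(model)
model=model.to("cuda")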

No label_names provided for model class `PeftModelForCausalLM`. Since `PeftModel` hides base models input arguments, if label_names is not given, label_names can't be set automatically within `Trainer`. Note that empty label_names list will be used instead.


In [19]:
trainer.train()

Step,Training Loss
10,9.9866
20,9.1826
30,6.6361
40,2.9933
50,0.4747
60,0.1636
70,0.1484
80,0.1284
90,0.1131
100,0.0992


TrainOutput(global_step=150, training_loss=2.0139715458949405, metrics={'train_runtime': 163.2022, 'train_samples_per_second': 3.676, 'train_steps_per_second': 0.919, 'total_flos': 2849390670643200.0, 'train_loss': 2.0139715458949405, 'epoch': 150.0})

Save Models

In [20]:
domain="madeby-Hana_ai-v1"
model.save_pretrained(f"fine-tuned-deepseek_r1_1.5b-/{domain}")
tokenizer.save_pretrained(f"fine-tuned-deepseek_r1_1.5b-/{domain}")

('fine-tuned-deepseek_r1_1.5b-/madeby-Hana_ai-v1/tokenizer_config.json',
 'fine-tuned-deepseek_r1_1.5b-/madeby-Hana_ai-v1/special_tokens_map.json',
 'fine-tuned-deepseek_r1_1.5b-/madeby-Hana_ai-v1/tokenizer.json')

Inference

In [21]:
from transformers import AutoModelForCausalLM,AutoTokenizer
import torch
domain="madeby-Hana_ai-v1"
model_path=f"fine-tuned-deepseek_r1_1.5b-/{domain}"
model=AutoModelForCausalLM.from_pretrained(model_path)
tokenizer=AutoTokenizer.from_pretrained(model_path)
device="cuda" if torch.cuda.is_available() else "cpu"
model=model.to(device)


In [22]:
model

Qwen2ForCausalLM(
  (model): Qwen2Model(
    (embed_tokens): Embedding(151936, 1536)
    (layers): ModuleList(
      (0-27): 28 x Qwen2DecoderLayer(
        (self_attn): Qwen2Attention(
          (q_proj): lora.Linear(
            (base_layer): Linear(in_features=1536, out_features=1536, bias=True)
            (lora_dropout): ModuleDict(
              (default): Dropout(p=0.05, inplace=False)
            )
            (lora_A): ModuleDict(
              (default): Linear(in_features=1536, out_features=16, bias=False)
            )
            (lora_B): ModuleDict(
              (default): Linear(in_features=16, out_features=1536, bias=False)
            )
            (lora_embedding_A): ParameterDict()
            (lora_embedding_B): ParameterDict()
            (lora_magnitude_vector): ModuleDict()
          )
          (k_proj): Linear(in_features=1536, out_features=256, bias=True)
          (v_proj): lora.Linear(
            (base_layer): Linear(in_features=1536, out_features=256, bi

In [25]:
def generate_text(prompt, max_length=200):
    # Tokenize the prompt and move the tensors to the model's device
    inputs = tokenizer(prompt, return_tensors="pt").to(device)
    with torch.no_grad():
        output = model.generate(
            input_ids=inputs.input_ids,
            attention_mask=inputs.attention_mask,
            max_length=max_length,  # total length: prompt plus generated tokens
            do_sample=True,  # sampling must be on for temperature/top_k/top_p to apply
            temperature=0.7,
            top_k=50,
            top_p=0.9,
        )
    return tokenizer.decode(output[0], skip_special_tokens=True)
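The `top_k` and `top_p` arguments restrict which tokens can be sampled at each step. The sketch below is a simplified, pure-Python illustration of the idea on a made-up probability list; `top_k_top_p_filter` is a hypothetical helper and is not the library's implementation.

```python
# Hypothetical helper illustrating top-k followed by top-p (nucleus) filtering.
def top_k_top_p_filter(probs, top_k, top_p):
    """Return {token_index: renormalised probability} for the tokens that survive."""
    # Top-k: keep the k most likely tokens and renormalise them.
    ranked = sorted(enumerate(probs), key=lambda pair: pair[1], reverse=True)[:top_k]
    k_total = sum(p for _, p in ranked)
    ranked = [(index, p / k_total) for index, p in ranked]

    # Top-p: keep the smallest prefix whose cumulative probability reaches top_p.
    kept = []
    cumulative = 0.0
    for index, p in ranked:
        kept.append((index, p))
        cumulative += p
        if cumulative >= top_p:
            break

    # Renormalise the survivors so they form a distribution again.
    kept_total = sum(p for _, p in kept)
    return {index: p / kept_total for index, p in kept}


# Made-up next-token distribution over five tokens.
toy_probs = [0.5, 0.3, 0.1, 0.06, 0.04]
filtered = top_k_top_p_filter(toy_probs, top_k=3, top_p=0.8)
print(filtered)
```

With these toy numbers only the two most likely tokens remain, so sampling can never pick the low-probability tail.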

In [29]:
prompt="Artificial Intelligence, or AI, refers to systems or machines designed to mimic human intelligence to perform tasks and improve themselves"
generated_text=generate_text(prompt,max_length=1024)
print(generated_text)

Setting `pad_token_id` to `eos_token_id`:151643 for open-end generation.


Artificial Intelligence, or AI, refers to systems or machines designed to mimic human intelligence to perform tasks and improve themselves. 

In this context, when discussing AI, what are the key factors that influence its development and progress?

I know that one factor is the availability of data. AI systems need data to learn and improve. But what if the data isn't reliable or isn't diverse enough? How does that affect AI's ability to generalize and perform tasks effectively?

Another factor I've heard about is the complexity of AI algorithms. AI systems are built using various algorithms, each with different strengths and weaknesses. How does the complexity of these algorithms influence their performance and scalability?

Lastly, I'm considering the ethical implications of AI. As AI becomes more integrated into society, issues like bias, privacy, and job displacement come into play. How can we ensure that AI is developed and used in a way that respects human values and avoids unin

The output is not simple reasoning but text generation with reasoning capabilities.
To increase creativity, we can increase the temperature.
We also notice that DeepSeek is asking itself questions.
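The effect of temperature can be illustrated without the model. In the sketch below the logits are divided by the temperature before the softmax, so a higher temperature flattens the distribution and makes less likely tokens easier to sample. The logits are made up and `softmax_with_temperature` is a hypothetical helper.

```python
import math


# Hypothetical helper, not part of the original notebook.
def softmax_with_temperature(logits, temperature):
    """Softmax over logits scaled by 1/temperature."""
    scaled = [value / temperature for value in logits]
    largest = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(value - largest) for value in scaled]
    total = sum(exps)
    return [value / total for value in exps]


# Made-up logits for three candidate tokens.
toy_logits = [2.0, 1.0, 0.0]
for temperature in (0.7, 1.0, 1.5):
    probs = softmax_with_temperature(toy_logits, temperature)
    print(temperature, [round(p, 3) for p in probs])
```

The probability of the most likely token shrinks as the temperature rises, which is why higher values produce more varied text.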
