Follow me on Twitter 🐦, connect with me on LinkedIn 🔗, and check out my GitHub 🐙. You won't be disappointed!

👉 Twitter: https://twitter.com/NdiranguMuturi1  
👉 LinkedIn: https://www.linkedin.com/in/isaac-muturi-3b6b2b237  
👉 GitHub: https://github.com/Isaac-Ndirangu-Muturi-749  

# TASK 1

# GENERATIVE AI: FINE-TUNING A HUGGING FACE TRANSFORMER MODEL FOR A CODE GENERATION APPLICATION

**Installation of Required Libraries**

In [None]:
!pip install -q -U torch
!pip install -q -U transformers
!pip install -q -U datasets
!pip install -q -U trl
!pip install -q -U git+https://github.com/huggingface/peft.git
!pip install -q -U bitsandbytes
!pip install -q -U accelerate
!pip install -q -U huggingface_hub
!pip install -q -U wandb
!pip install -q -U einops scipy

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
tokenizers 0.14.1 requires huggingface_hub<0.18,>=0.16.4, but you have huggingface-hub 0.18.0 which is incompatible.

**Git Credential Helper Configuration**

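Run this cell to configure the Git credential helper so that Git saves your credentials and reuses them when accessing repositories. Note that the `store` helper writes credentials to disk in plain text.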


In [None]:
!git config --global credential.helper store

**Hugging Face Hub Login**

Run the following cell to log in to your Hugging Face Hub account.


In [None]:
from huggingface_hub import notebook_login
notebook_login()

**Load the Training Dataset**

In this cell, we load the training dataset `theblackcat102/evol-codealpaca-v1` from the Hugging Face Hub using the `datasets` library, selecting the "train" split. The dataset is stored in the variable `dataset`, and printing it shows its columns and row count.


In [None]:
from datasets import load_dataset

# Load your dataset
dataset = load_dataset("theblackcat102/evol-codealpaca-v1", split="train")
print(dataset)

Dataset({
    features: ['instruction', 'output'],
    num_rows: 111272
})
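
Each row has an `instruction` and an `output` field. As a minimal sketch, the snippet below shows how a batch of such rows can be joined into single training strings using the same `### Question:` / `### Answer:` layout as the formatting function in the fine-tuning cell. The two rows in `toy_batch` are made up for illustration and are not taken from the dataset.

```python
# Hypothetical two-row batch shaped like a batched slice of the dataset:
# a dict mapping each column name to a list of values.
toy_batch = {
    "instruction": ["Add two numbers in Python.", "Reverse a string."],
    "output": ["print(2 + 3)", "print('abc'[::-1])"],
}


def format_batch(batch):
    """Join each instruction/output pair into one prompt string."""
    return [
        f"### Question: {question}\n ### Answer: {answer}"
        for question, answer in zip(batch["instruction"], batch["output"])
    ]


formatted = format_batch(toy_batch)
for text in formatted:
    print(text)
    print("-" * 20)
```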


**Fine Tuning**

This cell fine-tunes the Salesforce CodeGen model (`codegen-350M-mono`) with 4-bit NF4 quantization to reduce memory usage. Training uses TRL (Transformer Reinforcement Learning), whose `SFTTrainer` handles supervised fine-tuning, and PEFT (Parameter-Efficient Fine-Tuning), which trains small LoRA adapters instead of updating all model weights. The batch size and LoRA rank are kept small so that the model and dataset fit in limited GPU memory.
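
To make the memory argument concrete, here is a back-of-the-envelope sketch of how LoRA shrinks the number of trainable weights for one linear layer. A rank-`r` adapter on a `d_in × d_out` weight adds two matrices of shapes `d_in × r` and `r × d_out`. The layer shape 1024 × 3072 below is an assumption chosen for illustration, not a value read from the model's configuration.

```python
def full_params(d_in: int, d_out: int) -> int:
    """Weights in a dense d_in x d_out linear layer (bias ignored)."""
    return d_in * d_out


def lora_params(d_in: int, d_out: int, r: int) -> int:
    """Weights added by a rank-r LoRA adapter: A is d_in x r, B is r x d_out."""
    return r * (d_in + d_out)


# Assumed layer shape for illustration; the rank matches r=16 in the LoRA config.
d_in, d_out, rank = 1024, 3072, 16
dense = full_params(d_in, d_out)
adapter = lora_params(d_in, d_out, rank)

print(f"dense weights:   {dense:,}")
print(f"adapter weights: {adapter:,}")
print(f"adapter / dense: {adapter / dense:.2%}")
```

Under these assumptions the adapter holds about 2% as many weights as the layer it adapts, which is why only a small fraction of parameters needs gradients and optimizer state.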

In [None]:
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import SFTTrainer, DataCollatorForCompletionOnlyLM
from peft import LoraConfig
from transformers import BitsAndBytesConfig

# Load the desired model with quantization
model_name = "Salesforce/codegen-350M-mono"
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    trust_remote_code=True
)
model.config.use_cache = False

# Load the desired tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "right"

# Define the formatting function
def formatting_prompts_func(example):
    output_texts = []
    for i in range(len(example['instruction'])):
        text = f"### Question: {example['instruction'][i]}\n ### Answer: {example['output'][i]}"
        output_texts.append(text)
    return output_texts

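# Collator that masks everything before the response template, so that loss
# would be computed on the answer only. Note: it is defined here but not passed
# to SFTTrainer below (there is no data_collator argument), so this run trains
# on the full formatted text.
response_template = " ### Answer:"
collator = DataCollatorForCompletionOnlyLM(response_template, tokenizer=tokenizer)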

# Define your PEFT configuration
peft_config = LoraConfig(
    r=16,  # Reducing this value to 16 for memory optimization
    lora_alpha=16,
    lora_dropout=0.1,
    bias="none",
    task_type="CAUSAL_LM"
)

# Define TrainingArguments for optimal training
training_args = TrainingArguments(
    output_dir="./results",
    overwrite_output_dir=True,
    per_device_train_batch_size=2,  # Reducing batch size to 2 for memory optimization
    logging_steps=500,
    save_steps=1000,
    num_train_epochs=1,
    optim="paged_adamw_32bit",
    warmup_ratio=0.1,
    lr_scheduler_type="linear",
    fp16=True,
    max_grad_norm=0.3,
    max_steps=-1,  # -1 means the run length is set by num_train_epochs
    gradient_accumulation_steps=1,  # No accumulation: one optimizer step per batch of 2
)

# Create the SFTTrainer with training arguments
trainer = SFTTrainer(
    model,
    train_dataset=dataset,
    formatting_func=formatting_prompts_func,
    peft_config=peft_config,
    args=training_args,
    max_seq_length=512,
)

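# Pre-process the model by upcasting the layer norms to float32 for more stable training.
# CodeGen names its layer norms "ln_1" and "ln_f", so match on the module type
# rather than on "norm" appearing in the module name.
for name, module in trainer.model.named_modules():
    if isinstance(module, torch.nn.LayerNorm):
        module.to(torch.float32)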

# Train the model
trainer.train()


Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.

Step,Training Loss
100,2.1964
200,2.0652
300,1.9939
400,1.9663
500,2.0269
600,1.9295
700,1.9089
800,2.0329
900,2.0087
1000,1.9283


TrainOutput(global_step=55636, training_loss=1.7573995420853747, metrics={'train_runtime': 17617.2965, 'train_samples_per_second': 6.316, 'train_steps_per_second': 3.158, 'total_flos': 9.971404632332698e+16, 'train_loss': 1.7573995420853747, 'epoch': 1.0})
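
The numbers in `TrainOutput` can be cross-checked against the dataset size and batch settings. The sketch below assumes a single device, so the effective batch size is `per_device_train_batch_size × gradient_accumulation_steps`, and it treats `exp(train_loss)` as a rough perplexity figure.

```python
import math

num_rows = 111272           # from the dataset printout
batch_size = 2              # per_device_train_batch_size
grad_accum = 1              # gradient_accumulation_steps
train_runtime = 17617.2965  # seconds, from TrainOutput
train_loss = 1.7573995420853747

steps_per_epoch = math.ceil(num_rows / (batch_size * grad_accum))
samples_per_second = num_rows / train_runtime
steps_per_second = steps_per_epoch / train_runtime
perplexity = math.exp(train_loss)  # rough figure based on the run's average loss

print("optimizer steps in one epoch:", steps_per_epoch)
print("samples per second:", round(samples_per_second, 3))
print("steps per second:", round(steps_per_second, 3))
print("runtime in hours:", round(train_runtime / 3600, 2))
print("approx. training perplexity:", round(perplexity, 2))
```

The computed step count matches `global_step=55636`, which is consistent with the run covering the full dataset exactly once.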

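This code cell saves the trained model to a directory named "outputs". Because the trainer's model is a PEFT model, what gets written is the LoRA adapter (weights and configuration), not a full copy of the base model. The cell also unwraps the model first in case it was wrapped for distributed or parallel training.


In [None]:
# Unwrap the model if it was wrapped for distributed/parallel training
model_to_save = trainer.model.module if hasattr(trainer.model, 'module') else trainer.model

# Save the LoRA adapter weights and configuration
model_to_save.save_pretrained("outputs")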

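In this code cell, the LoRA configuration for PEFT (Parameter-Efficient Fine-Tuning) is loaded from the "outputs" directory using the `LoraConfig` class, and the model is then wrapped with `get_peft_model` using that configuration. Note that `LoraConfig.from_pretrained` reads only the adapter configuration (rank, alpha, dropout), not the trained adapter weights, and `get_peft_model` creates a new adapter from it. Loading saved adapter weights onto a freshly loaded base model is done with `PeftModel.from_pretrained`.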


In [None]:
from peft import get_peft_model

lora_config = LoraConfig.from_pretrained('outputs')
model = get_peft_model(model, lora_config)
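
For comparison, here is how the saved adapter could be reloaded in a fresh session with `PeftModel.from_pretrained`. This is a sketch, not the notebook's method: the function name `load_finetuned_model` is hypothetical, the base model is loaded without quantization for simplicity, and the adapter is assumed to live in `outputs` as saved above.

```python
def load_finetuned_model(base_name: str = "Salesforce/codegen-350M-mono",
                         adapter_dir: str = "outputs"):
    """Load the base model and attach the LoRA adapter saved in adapter_dir.

    Hypothetical helper: the imports are done lazily so that defining the
    function does not trigger any downloads or GPU work.
    """
    from peft import PeftModel
    from transformers import AutoModelForCausalLM

    # Load the unmodified base model (no quantization here, for simplicity).
    base_model = AutoModelForCausalLM.from_pretrained(base_name)
    # Attach the trained adapter weights stored alongside the adapter config.
    model = PeftModel.from_pretrained(base_model, adapter_dir)
    model.eval()  # inference mode: disables dropout in the LoRA layers
    return model
```

Calling `load_finetuned_model()` downloads the base checkpoint if it is not cached, so it is best run once per session.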

In this code cell, the model is moved to the GPU specified by `device` and used to generate a response for each prompt. `max_length=max_token_limit` caps the total sequence length (prompt plus generated tokens) at 100 tokens, and `no_repeat_ngram_size=2` prevents any two-token sequence from repeating. The decoded responses are then printed alongside their prompts.


In [None]:
device = "cuda:0"

# Move the model to the GPU
model.to(device)

prompts = [
    "Add 5 and 7.",
    "Multiply 3 by 9.",
    "Write code to print 'Hello, world!' in Python.",
    "Calculate the square root of 16.",
    "Find the result of 12 divided by 4.",
    "Write a Python program to check if a number is even or odd.",
]

# Initialize an empty list to store the model's responses
responses = []

# Maximum token limit for responses
max_token_limit = 100  # Adjust this limit as needed

# Loop through the prompts and generate responses
for prompt in prompts:
    inputs = tokenizer(prompt, return_tensors="pt").to(device)
    outputs = model.generate(**inputs, max_length=max_token_limit, num_return_sequences=1, no_repeat_ngram_size=2)
    response = tokenizer.decode(outputs[0], skip_special_tokens=True)
    responses.append(response)

# Print the responses
for i, response in enumerate(responses):
    print(f"PROMPT {i + 1}:\n{prompts[i]}\nRESPONSE:\n{response}\n")


Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


PROMPT 1:
Add 5 and 7.
RESPONSE:
Add 5 and 7.
#
print(f"The sum of the numbers is {sum(numbers)}")

n = int(input("Enter the number of elements: "))
for i in range(0, n):
    print("Element: ", end="")

    element = input()


PROMPT 2:
Multiply 3 by 9.
RESPONSE:
Multiply 3 by 9.

# def multiply(a, b):
    
def multiply_3(x, y):

    return x * y


print(multipy(3, 9))


PROMPT 3:
Write code to print 'Hello, world!' in Python.
RESPONSE:
Write code to print 'Hello, world!' in Python.
#
print('Hello', 'world!')

"""
Output:
Hello world!
 """


PROMPT 4:
Calculate the square root of 16.
RESPONSE:
Calculate the square root of 16.

# In[ ]:


import math
print(math.sqrt(16))



PROMPT 5:
Find the result of 12 divided by 4.
RESPONSE:
Find the result of 12 divided by 4.

# In[ ]:


def divide(x, y):
    return x / y
print(divide(12, 4))



PROMPT 6:
Write a Python program to check if a number is even or odd.
RESPONSE:
Write a Python program to check if a number is even or odd.

# def is_even(

**Conclusion:**

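Overall, the results are mixed. For the square root and division prompts (4 and 5), the model produced short, correct Python. For prompt 3 it printed `Hello world!` without the comma, and for prompt 2 it defined `multiply_3` but then called a misspelled `multipy`, which would raise a `NameError`.

Prompt 1 produced code unrelated to adding 5 and 7, and the response to prompt 6 stops after `# def is_even(`.

Every response also begins by repeating the prompt, and some continue with code beyond the requested answer.

In the next steps, we may want to train for more epochs to get better results.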
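
One possible way to address the echoed prompts and the trailing unrelated code is to query the model with the same `### Question:` / `### Answer:` layout it was fine-tuned on, and to keep only the text after the answer marker. The helpers below are a sketch under that assumption; `build_prompt` and `extract_answer` are hypothetical names, and the decoded string is a made-up example rather than real model output.

```python
ANSWER_MARKER = "### Answer:"


def build_prompt(question: str) -> str:
    """Wrap a question in the layout used by the training formatting function."""
    return f"### Question: {question}\n {ANSWER_MARKER}"


def extract_answer(decoded: str) -> str:
    """Return only the text after the first answer marker.

    If the marker is missing, the decoded text is returned unchanged.
    """
    _, marker, answer = decoded.partition(ANSWER_MARKER)
    return answer.strip() if marker else decoded


prompt = build_prompt("Add 5 and 7.")
# Made-up decoded output: generate() returns the prompt followed by the continuation.
decoded = prompt + " print(5 + 7)"
print(repr(prompt))
print(extract_answer(decoded))
```

In the generation loop, the wrapped prompt would be passed to the tokenizer, and `max_new_tokens` could replace `max_length` so that the limit applies to the generated part only.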



In [None]:
!pip install -q -U git+https://github.com/aiplanethub/genai-stack.git
!pip install -q -U langchain



In [None]:
import os
from getpass import getpass

api_key = getpass("Enter OpenAI API Key:")
os.environ['OPENAI_API_KEY'] = api_key


In [None]:
from genai_stack.stack.stack import Stack
from genai_stack.etl.langchain import LangchainETL
from genai_stack.embedding.langchain import LangchainEmbedding
from genai_stack.vectordb.chromadb import ChromaDB
from genai_stack.prompt_engine.engine import PromptEngine
from genai_stack.model.gpt3_5 import OpenAIGpt35Model
from genai_stack.retriever.langchain import LangChainRetriever
from genai_stack.memory.langchain import ConversationBufferMemory

In [None]:
# Create a list of websites for ETL
websites = [
    "https://github.com/DataTalksClub/machine-learning-zoomcamp",
]

# Load the pages with the WebBaseLoader
etl = LangchainETL.from_kwargs(
    name="WebBaseLoader",
    fields={"web_path": websites},
)

In [None]:
config = {
    "model_name": "sentence-transformers/all-mpnet-base-v2",
    "model_kwargs": {"device": "cpu"},
    "encode_kwargs": {"normalize_embeddings": False},
    }

embedding = LangchainEmbedding.from_kwargs(name="HuggingFaceEmbeddings", fields=config)
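
The `normalize_embeddings` flag controls whether vectors are scaled to unit length when they are encoded. As a small illustration in plain Python (the vectors are made up and far shorter than real sentence embeddings), cosine similarity on raw vectors equals the plain dot product on normalized ones:

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def normalize(v):
    """Scale v to unit Euclidean length."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def cosine(a, b):
    """Cosine similarity: dot product divided by the product of the lengths."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


u = [3.0, 4.0]  # made-up vectors for illustration
v = [4.0, 3.0]

print("raw dot product:     ", dot(u, v))
print("cosine similarity:   ", cosine(u, v))
print("dot of unit vectors: ", dot(normalize(u), normalize(v)))
```

With `normalize_embeddings` set to `False`, as here, whether vector length influences the ranking depends on the distance metric the vector store uses.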

In [None]:
chromadb = ChromaDB.from_kwargs()

In [None]:
llm = OpenAIGpt35Model.from_kwargs(parameters={"openai_api_key": api_key})

In [None]:
retriever = LangChainRetriever.from_kwargs()

Stack(
    etl=etl,
    embedding=embedding,
    vectordb=chromadb,
    model=llm,
    prompt_engine=PromptEngine.from_kwargs(should_validate=False),
    retriever=retriever,
    memory=ConversationBufferMemory.from_kwargs(),
    )

<genai_stack.stack.stack.Stack at 0x79649658f220>

In [None]:
etl.run()
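
Conceptually, once the ETL step has indexed the page, answering a question means finding the most relevant pieces of text and passing them to the language model as context. The following sketch imitates only that retrieval step in plain Python. It is not how genai-stack is implemented: word overlap stands in for embedding similarity, and the `chunks`, the question, and all function names are invented for illustration.

```python
def tokenize(text):
    """Lowercase, split on whitespace, and strip surrounding punctuation."""
    return {word.strip(".,?!:;").lower() for word in text.split()} - {""}


def overlap_score(question, chunk):
    """Jaccard overlap of the word sets (a stand-in for embedding similarity)."""
    q, c = tokenize(question), tokenize(chunk)
    return len(q & c) / len(q | c) if q | c else 0.0


def top_k(question, chunks, k=1):
    """Return the k chunks with the highest overlap score."""
    return sorted(chunks, key=lambda ch: overlap_score(question, ch), reverse=True)[:k]


def build_context_prompt(question, chunks, k=1):
    """Assemble the retrieved chunks and the question into one prompt string."""
    context = "\n".join(top_k(question, chunks, k))
    return f"Context:\n{context}\n\nQuestion: {question}"


# Invented chunks; a real pipeline would split the scraped page instead.
chunks = [
    "Bread is baked in an oven.",
    "Rivers flow toward the sea.",
    "Cats sleep for most of the day.",
]
question = "Where do rivers flow?"
print(build_context_prompt(question, chunks))
```

In the actual stack, the embedding model and the vector database take the place of `overlap_score` and `top_k`, and the assembled prompt is sent to the LLM.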

In [None]:

prompts = [
    "Could you provide an overview of ML Zoomcamp? What is its primary focus, and what can participants expect to learn from the program?",
    "What are the key concepts and topics covered in ML Zoomcamp's curriculum?",
    "To obtain a certificate from ML Zoomcamp, what are the specific requirements that participants need to fulfill?"
]

for prompt in prompts:
    response = retriever.retrieve(prompt)
    print("PROMPT:", prompt)
    print("ANSWER:", response['output'])
    print("\n")

PROMPT: Could you provide an overview of ML Zoomcamp? What is its primary focus, and what can participants expect to learn from the program?
ANSWER: ML Zoomcamp is a free online program offered by DataTalksClub that focuses on teaching participants about machine learning engineering. The program is designed to be completed in four months and covers various topics related to machine learning. Participants can expect to learn about the fundamentals of machine learning, regression and classification techniques, evaluation metrics, deploying machine learning models, decision trees and ensemble learning, neural networks and deep learning, serverless deep learning, and Kubernetes and TensorFlow serving. The program also includes hands-on projects and homework assignments to reinforce the learning. By the end of the program, participants will have gained practical skills in machine learning engineering and be able to apply their knowledge to real-world projects.


PROMPT: What are the key con