## Prerequisites

Before delving into the fine-tuning process, ensure that you have the following prerequisites in place:

1. **GPU**: [gemma-2b](https://huggingface.co/google/gemma-2b) can be fine-tuned on a T4 (available on the free tier of Google Colab), while [gemma-7b](https://huggingface.co/google/gemma-7b) requires an A100 GPU.
2. **Python Packages**: Make sure the necessary Python packages are installed. The `pip` commands are given in Step 2 below.

Let's begin by checking if your GPU is correctly detected:

In [None]:
!nvidia-smi

Wed Jun 19 11:54:32 2024       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.104.05             Driver Version: 535.104.05   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|   0  Tesla T4                       Off | 00000000:00:04.0 Off |                    0 |
| N/A   54C    P8              10W /  70W |      0MiB / 15360MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
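
The same check can be made from Python, which is handy when later cells should adapt to the hardware. The sketch below assumes PyTorch is the framework in use (it is imported later in this notebook); the helper name `describe_gpu` is made up for illustration.

```python
def describe_gpu() -> str:
    """Return a short description of the first CUDA device, if any."""
    try:
        import torch  # imported lazily so the check also works before installation
    except ImportError:
        return "PyTorch is not installed"
    if not torch.cuda.is_available():
        return "No CUDA GPU detected"
    props = torch.cuda.get_device_properties(0)
    return f"{props.name}: {props.total_memory / 1024**3:.1f} GiB"

print(describe_gpu())
```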
                                                                    

## Step 2 - Model loading
We'll load the model with 4-bit quantization (the QLoRA approach) to reduce memory usage. First, install the pinned package versions:


In [None]:
!pip3 install -q -U bitsandbytes==0.42.0
!pip3 install -q -U peft==0.8.2
!pip3 install -q -U trl==0.7.10
!pip3 install -q -U accelerate==0.27.1
!pip3 install -q -U datasets==2.17.0
!pip3 install -q -U transformers==4.38.0

In [None]:
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16
)
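
As a rough illustration of why 4-bit loading matters, the sketch below estimates the memory needed for the weights alone at different precisions. The parameter count is an assumption derived from the totals this notebook reports after attaching LoRA (total minus trainable parameters). The estimate ignores activations, optimizer state, quantization constants, and any layers that stay unquantized, so treat the numbers as a lower bound.

```python
def weight_memory_gib(num_params: int, bits_per_param: float) -> float:
    """Approximate memory needed to store the weights alone, in GiB."""
    return num_params * bits_per_param / 8 / 1024**3

# Assumed base parameter count: total (2,584,619,008) minus LoRA-trainable
# (78,446,592) parameters, both taken from the output later in this notebook.
BASE_PARAMS = 2_584_619_008 - 78_446_592

for label, bits in [("float32", 32), ("float16/bfloat16", 16), ("4-bit (nf4)", 4)]:
    print(f"{label:>18}: {weight_memory_gib(BASE_PARAMS, bits):.2f} GiB")
```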

Now we specify the model ID and load the model with our previously defined quantization configuration. Gemma is a gated model, so authenticate with the Hugging Face Hub first.

In [None]:
# if you are using google colab

import os
from google.colab import userdata
os.environ["HF_WRITE"] = userdata.get('HF_WRITE')

In [None]:
from huggingface_hub import notebook_login
notebook_login()


In [None]:
# model_id = "google/gemma-7b-it"
# model_id = "google/gemma-7b"
model_id = "google/gemma-2b-it"
# model_id = "google/gemma-2b"

model = AutoModelForCausalLM.from_pretrained(model_id, quantization_config=bnb_config, device_map={"":0})
tokenizer = AutoTokenizer.from_pretrained(model_id, add_eos_token=True)

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.





In [None]:
def get_completion(query: str, model, tokenizer) -> str:
  device = "cuda:0"

  prompt_template = """
  <start_of_turn>user
  Name all the skills present in the following description in a single list. Response should have only the skills, no other information or words. Skills should be keywords, each being no more than 3 words.
  Below text is the Description:

  {query}
  <end_of_turn>\n<start_of_turn>model
"""
  prompt = prompt_template.format(query=query)

  encodeds = tokenizer(prompt, return_tensors="pt", add_special_tokens=True)

  model_inputs = encodeds.to(device)


  generated_ids = model.generate(**model_inputs, max_new_tokens=1000, do_sample=True, pad_token_id=tokenizer.eos_token_id)
  # decoded = tokenizer.batch_decode(generated_ids)
  decoded = tokenizer.decode(generated_ids[0], skip_special_tokens=True)
  return (decoded)

In [None]:
query_text = '''SANCORP is seeking FTE Level II Data Scientist to support the office of DoD Chief Digital and Artificial Intelligence Office (CDAO) Chief Technology Officer (CTO). CDAO CTO requires support in multiple functional areas to ensure deliverables associated with the CDAO Architecture Council, CTO Federation, and CTO Future Architecture Activities. The mission of the CDAO CTO is to accelerate the DoD's adoption of data, analytics, and AI to improve decision making across all levels of the department. The following are examples of responsibilities:
          Support development of insider threat strategy in support of protecting CDAO technical offerings; balance short-term wins with long-term investments to progressively mature CDAO’s defenses against insider threats.
          Lead coordination of policy and strategy related to insider threats with industry partners and other DoD components.
          Lead exploration of data sources that are relevant to measuring, identifying, and defending against insider threats.
          Provide technical leadership in developing capabilities to detect insider threats among large user communities, leveraging combination of statistical, classical machine learning, and deep learning methods.
          Sancorp Consulting LLC shall, in its discretion, modify or adjust the position to meet Sancorp’s changing needs. This job description is not a contract and may be adjusted as deemed appropriate at Sancorp’s sole discretion.
          Sancorp Consulting, LLC, is an SDVOSB and SBA 8(a) company seeking highly motivated and qualified professionals and offer an attractive salary and benefits package that includes: Medical, Dental, life and Disability Insurance; 401K, and holidays to ensure the highest quality of life for our employees. Please visit our website for more information at www.sancorpconsulting.com.
          Sancorp Consulting, LLC is an equal opportunity employer. At Sancorp Consulting, LLC we are committed to providing equal employment opportunities (EEO) to all employees and applicants without regard to race color, religion, sex, national origin, age, disability, or any other protected characteristic as defined by applicable law. We strive to create an inclusive and diverse workplace where everyone feels valued, respected, and supported."""
          '''
result = get_completion(query=query_text, model=model, tokenizer=tokenizer)
print(result)

A decoder-only architecture is being used, but right-padding was detected! For correct generation results, please set `padding_side='left'` when initializing the tokenizer.



  user
  Name all the skills present in the following description in a single list. Response should have only the skills, no other information or words. Skills should be keywords, each being no more than 3 words.
  Below text is the Description:

  SANCORP is seeking FTE Level II Data Scientist to support the office of DoD Chief Digital and Artificial Intelligence Office (CDAO) Chief Technology Officer (CTO). CDAO CTO requires support in multiple functional areas to ensure deliverables associated with the CDAO Architecture Council, CTO Federation, and CTO Future Architecture Activities. The mission of the CDAO CTO is to accelerate the DoD's adoption of data, analytics, and AI to improve decision making across all levels of the department. The following are examples of responsibilities:
          Support development of insider threat strategy in support of protecting CDAO technical offerings; balance short-term wins with long-term investments to progressively mature CDAO’s defenses aga
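
The output above starts by echoing the prompt because, for decoder-only models, `generate` returns the prompt tokens followed by the newly generated ones. One way to return only the model's answer is to slice off the prompt before decoding. The sketch below shows the idea with plain lists standing in for token ids; `strip_prompt` is a hypothetical helper, not part of `transformers`.

```python
def strip_prompt(generated_ids, prompt_length):
    """Return only the tokens produced after the prompt."""
    return generated_ids[prompt_length:]

# Toy token ids: the first four belong to the prompt, the rest are new.
prompt_ids = [2, 106, 1645, 108]
generated_ids = prompt_ids + [7, 8, 9]
new_ids = strip_prompt(generated_ids, len(prompt_ids))
print(new_ids)  # [7, 8, 9]

# Inside get_completion, the same idea would look roughly like:
#   prompt_length = model_inputs["input_ids"].shape[1]
#   decoded = tokenizer.decode(generated_ids[0][prompt_length:], skip_special_tokens=True)
```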

## Step 3 - Load dataset for finetuning

### Let's Load the Dataset

For this tutorial, we will fine-tune Gemma-2B-IT to extract skill names from text, using a dataset that pairs skill statements with skill names.

Each record in the dataset should resemble the following, where `...keys` stands for the remaining columns, `RSD Name` holds the skill tag, and `Skill Statement` holds the task description:

```json
{
  ...keys,
  "RSD Name": "Skill 1",
  "Skill Statement": "Task 1"
}
```

In [None]:
from datasets import load_dataset
# Combined OSN taxonomy dataset consists of comp, ind, and pr occupations
dataset = load_dataset("Phanindra-max/osn_combined", split="train")
dataset

Dataset({
    features: ['Unnamed: 0', 'Canonical URL', 'RSD Name', 'Author', 'Skill Statement', 'Category', 'Keywords', 'Standards', 'Certifications', 'Occupation Major Groups', 'Occupation Minor Groups', 'Broad Occupations', 'Detailed Occupations', 'O*Net Job Codes', 'Employers', 'Alignment Name', 'Alignment URL', 'Alignment Framework'],
    num_rows: 932
})

In [None]:
df = dataset.to_pandas()
df.head(10)

Unnamed: 0.1,Unnamed: 0,Canonical URL,RSD Name,Author,Skill Statement,Category,Keywords,Standards,Certifications,Occupation Major Groups,Occupation Minor Groups,Broad Occupations,Detailed Occupations,O*Net Job Codes,Employers,Alignment Name,Alignment URL,Alignment Framework
0,0,https://osmt.wgu.edu/api/skills/e50fb44e-9a8b-...,Contextual Analysis,Western Governors University,Analyze a wide range of business contexts for ...,Business Ethics,Business Ethics; Professional_Ethics; Analysis...,ISTE_EdLeaders_5a; InTASC_3a; InTASC_3d; InTAS...,,11-0000; 13-0000; 15-0000; 25-0000; 37-0000; 3...,11-1000; 11-2000; 11-3000; 11-9000; 13-1000; 1...,11-1010; 11-1020; 11-2020; 11-3010; 11-3050; 1...,11-1011; 11-1021; 11-2022; 11-3012; 11-3051; 1...,,,Business Ethics,https://skills.emsidata.com/skills/KS1218P66BG...,
1,1,https://osmt.wgu.edu/api/skills/2c83604e-d247-...,Business Ethics Strategies Analysis,Western Governors University,Analyze business contexts for strategies to na...,Business Ethics,Business Ethics; Professional_Ethics; Analysis...,ISTE_EdLeaders_5a; InTASC_3a; InTASC_3d; InTAS...,,11-0000; 13-0000; 15-0000; 25-0000; 37-0000; 3...,11-1000; 11-2000; 11-3000; 11-9000; 13-1000; 1...,11-1010; 11-1020; 11-2020; 11-3010; 11-3050; 1...,11-1011; 11-1021; 11-2022; 11-3051; 11-3071; 1...,,,Business Ethics,https://skills.emsidata.com/skills/KS1218P66BG...,
2,2,https://osmt.wgu.edu/api/skills/ab1014bb-3d48-...,Business Context Ethics Analysis,Western Governors University,Analyze a wide range of business contexts for ...,Business Ethics,Business Ethics; Professional_Ethics; Analysis...,ISTE_EdLeaders_5a; InTASC_3a; InTASC_3d; InTAS...,,11-0000; 13-0000; 15-0000; 25-0000; 37-0000; 3...,11-1000; 11-2000; 11-3000; 11-9000; 13-1000; 1...,11-1010; 11-1020; 11-2020; 11-3010; 11-3050; 1...,11-1011; 11-1021; 11-2022; 11-3012; 11-3051; 1...,,,Business Ethics,https://skills.emsidata.com/skills/KS1218P66BG...,
3,3,https://osmt.wgu.edu/api/skills/df5d6e14-3df1-...,Create a Plan to Achieve Goals,Western Governors University,Create a plan to achieve self-motivated goals.,Self-Motivation,Self-Motivation; Social Emotional Learning (SE...,UDL_3.9,,15-0000,15-1200,15-1210; 15-1230; 15-1240; 15-1250; 15-1290,15-1211; 15-1231; 15-1232; 15-1244; 15-1245; 1...,,,Self-Motivation,https://skills.emsidata.com/skills/ESED820E606...,Lightcast Open Skills Library
4,4,https://osmt.wgu.edu/api/skills/b599cbbf-6a58-...,Identify the Benefits of Self-Motivated Goals,Western Governors University,Identify the benefits of achieving self-motiva...,Self-Motivation,Self-Motivation; Social Emotional Learning (SE...,UDL_3.9,,15-0000,15-1200,15-1240; 15-1250; 15-1290,15-1244; 15-1245; 15-1251; 15-1256; 15-1257; 1...,,,Self-Motivation,https://skills.emsidata.com/skills/ESED820E606...,
5,5,https://osmt.wgu.edu/api/skills/ad18fa7d-71cd-...,Create Self-Motivated Goals,Western Governors University,Create self-motivated goals.,Self-Motivation,Self-Motivation; Social Emotional Learning (SE...,,,15-0000,15-1200,15-1250,15-1251,,,Self-Motivation,https://skills.emsidata.com/skills/ESED820E606...,
6,6,https://osmt.wgu.edu/api/skills/6c564ce6-4f5e-...,Identify Self-Motivation Activities,Western Governors University,Identify activities that strengthen self-motiv...,Self-Motivation,Self-Motivation; Social Emotional Learning (SE...,,,15-0000,15-1200,15-1240; 15-1250; 15-1290,15-1244; 15-1245; 15-1251; 15-1256; 15-1257; 1...,,,Self-Motivation,https://skills.emsidata.com/skills/ESED820E606...,
7,7,https://osmt.wgu.edu/api/skills/be7d4c47-c0e6-...,Prevent Burnout,Western Governors University,Demonstrate strategies that prevent burnout.,Self-Motivation,Self-Motivation; Social Emotional Learning (SE...,,,15-0000,15-1200,15-1250,15-1251,,,Self-Motivation,https://skills.emsidata.com/skills/ESED820E606...,
8,8,https://osmt.wgu.edu/api/skills/1f05d4bf-5a49-...,Develop Action Plans,Western Governors University,Develop action plans designed to motivate and ...,Goal Oriented,21st_Century_Skills; Doing; SEL; Power_Skills_...,ISTE_Educators_4d; InTASC_3a; InTASC_3i; InTAS...,,11-0000; 13-0000; 15-0000; 17-0000; 19-0000; 2...,11-3000; 13-1000; 13-2000; 15-1200; 15-2000; 1...,11-3030; 13-1110; 13-1160; 13-2010; 13-2040; 1...,11-3031; 13-1111; 13-1161; 13-2011; 13-2041; 1...,,,Goal Oriented,https://skills.emsidata.com/skills/KSS30WWC1QZ...,
9,9,https://osmt.wgu.edu/api/skills/407c530e-56c5-...,Plan a Schedule in Advance,Western Governors University,Plan a schedule in advance to include necessar...,Time Management,Time Management; Social Emotional Learning (SE...,,,15-0000; 19-0000; 33-0000,15-1200; 15-2000; 19-4000; 33-3000,15-1210; 15-1220; 15-1230; 15-1240; 15-1250; 1...,15-1221; 15-1231; 15-1232; 15-1251; 15-1259; 1...,,,Time Management,https://skills.emsidata.com/skills/KS44175745H...,


Instruction fine-tuning: prepare the dataset in a "prompt" format that the model can learn from:
1. Use the function `generate_prompt` to build a prompt from each instruction and its expected output.
2. Shuffle the dataset.
3. Tokenize the dataset.

### Formatting the Dataset

Now, let's format the dataset in the required [Gemma instruction format](https://huggingface.co/google/gemma-7b-it).

> Many tutorials and blogs skip over this part, but I feel this is a really important step.

```
<start_of_turn>user What is your favorite condiment? <end_of_turn>
<start_of_turn>model Well, I'm quite partial to a good squeeze of fresh lemon juice. It adds just the right amount of zesty flavor to whatever I'm cooking up in the kitchen!<end_of_turn>
```

You can use the following code to format each record and add the result to the dataset as a `prompt` column:

In [None]:
def generate_prompt(data_point):
    """Build the training text from the task instruction, the skill statement, and the answer.

    :param data_point: dict: one dataset record
    :return: str: formatted prompt
    """
    prefix_text = 'Name all the skills present in the following description in a single list. Response should have only the skills, no other information or words. Skills should be keywords, each being no more than 3 words. Below text is the Description:\n\n'
    # Wrap the instruction and statement in a user turn and the skill name in a model turn.
    text = f"""<start_of_turn>user {prefix_text} {data_point["Skill Statement"]} <end_of_turn>\n<start_of_turn>model {data_point["RSD Name"]} <end_of_turn>"""
    return text

# add the "prompt" column in the dataset
text_column = [generate_prompt(data_point) for data_point in dataset]
dataset = dataset.add_column("prompt", text_column)
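
Because formatting mistakes are easy to miss, a quick sanity check on the generated prompts can help. The sketch below is a hypothetical helper (`check_prompt` is not part of any library) that verifies a prompt has one user turn followed by one model turn, each closed by `<end_of_turn>`, in the layout produced by `generate_prompt`.

```python
START, END = "<start_of_turn>", "<end_of_turn>"

def check_prompt(text: str) -> bool:
    """True if text holds exactly one closed user turn followed by one closed model turn."""
    if text.count(START) != 2 or text.count(END) != 2:
        return False
    user_pos = text.find(START + "user")
    model_pos = text.find(START + "model")
    # The user turn must come first and the text must end with a closing marker.
    return 0 <= user_pos < model_pos and text.rstrip().endswith(END)

good = "<start_of_turn>user List the skills. <end_of_turn>\n<start_of_turn>model Time Management <end_of_turn>"
bad = "<start_of_turn>user List the skills.\n<start_of_turn>model Time Management <end_of_turn>"
print(check_prompt(good), check_prompt(bad))  # True False
```

In the notebook, the same check could be run over the whole column with something like `all(check_prompt(p) for p in dataset["prompt"])`.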

Next, we shuffle the dataset and tokenize the prompts so the model can consume them.


In [None]:
dataset = dataset.shuffle(seed=1234)  # Shuffle dataset here
dataset = dataset.map(lambda samples: tokenizer(samples["prompt"]), batched=True)

Split the dataset into 80% for training and 20% for testing:

In [None]:
dataset = dataset.train_test_split(test_size=0.2)
train_data = dataset["train"]
test_data = dataset["test"]
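
With `test_size=0.2`, the 932 rows split into 745 training and 187 test examples, which matches the row counts reported in this notebook. The sketch below reproduces that arithmetic; the rounding rule (rounding the test set up) is an assumption that agrees with those counts, not a statement about the library's internals.

```python
import math

def split_sizes(n_rows: int, test_size: float) -> tuple[int, int]:
    """Return (n_train, n_test) for a fractional test_size, rounding the test set up."""
    n_test = math.ceil(n_rows * test_size)
    return n_rows - n_test, n_test

print(split_sizes(932, 0.2))  # (745, 187)
```

Passing a `seed` to `train_test_split` (for example `dataset.train_test_split(test_size=0.2, seed=1234)`) should make the split reproducible across runs.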

### After Formatting, We Should Get Something Like This

Each record keeps its original columns and gains a `prompt` column. For example (other columns omitted):

```json
{
  "RSD Name": "Prevent Burnout",
  "Skill Statement": "Demonstrate strategies that prevent burnout.",
  "prompt": "<start_of_turn>user Name all the skills present in the following description in a single list. Response should have only the skills, no other information or words. Skills should be keywords, each being no more than 3 words. Below text is the Description:\n\n Demonstrate strategies that prevent burnout. <end_of_turn>\n<start_of_turn>model Prevent Burnout <end_of_turn>"
}
```

When fine-tuning with SFT (**[Supervised Fine-tuning Trainer](https://huggingface.co/docs/trl/main/en/sft_trainer)**), we pass in only the `prompt` column of the dataset.

In [None]:
print(test_data)

Dataset({
    features: ['Unnamed: 0', 'Canonical URL', 'RSD Name', 'Author', 'Skill Statement', 'Category', 'Keywords', 'Standards', 'Certifications', 'Occupation Major Groups', 'Occupation Minor Groups', 'Broad Occupations', 'Detailed Occupations', 'O*Net Job Codes', 'Employers', 'Alignment Name', 'Alignment URL', 'Alignment Framework', 'prompt', 'input_ids', 'attention_mask'],
    num_rows: 187
})


## Step 4 - Apply LoRA
Here comes the magic with PEFT! We prepare the quantized model for training with the `prepare_model_for_kbit_training` method, then attach low-rank adapters (LoRA) with the `get_peft_model` utility function.

In [None]:
from peft import LoraConfig, PeftModel, prepare_model_for_kbit_training, get_peft_model
model.gradient_checkpointing_enable()
model = prepare_model_for_kbit_training(model)

In [None]:
print(model)

GemmaForCausalLM(
  (model): GemmaModel(
    (embed_tokens): Embedding(256000, 2048, padding_idx=0)
    (layers): ModuleList(
      (0-17): 18 x GemmaDecoderLayer(
        (self_attn): GemmaSdpaAttention(
          (q_proj): Linear4bit(in_features=2048, out_features=2048, bias=False)
          (k_proj): Linear4bit(in_features=2048, out_features=256, bias=False)
          (v_proj): Linear4bit(in_features=2048, out_features=256, bias=False)
          (o_proj): Linear4bit(in_features=2048, out_features=2048, bias=False)
          (rotary_emb): GemmaRotaryEmbedding()
        )
        (mlp): GemmaMLP(
          (gate_proj): Linear4bit(in_features=2048, out_features=16384, bias=False)
          (up_proj): Linear4bit(in_features=2048, out_features=16384, bias=False)
          (down_proj): Linear4bit(in_features=16384, out_features=2048, bias=False)
          (act_fn): GELUActivation()
        )
        (input_layernorm): GemmaRMSNorm()
        (post_attention_layernorm): GemmaRMSNorm()
     

In [None]:
import bitsandbytes as bnb
def find_all_linear_names(model):
  """Collect the names of all 4-bit linear layers to use as LoRA target modules."""
  cls = bnb.nn.Linear4bit  # layer type created by the 4-bit quantized load
  lora_module_names = set()
  for name, module in model.named_modules():
    if isinstance(module, cls):
      # Keep only the last component of the dotted name, e.g. "q_proj"
      names = name.split('.')
      lora_module_names.add(names[0] if len(names) == 1 else names[-1])
  # Exclude the output head (needed for 16-bit)
  lora_module_names.discard('lm_head')
  return list(lora_module_names)

In [None]:
modules = find_all_linear_names(model)
print(modules)

['up_proj', 'q_proj', 'v_proj', 'down_proj', 'gate_proj', 'o_proj', 'k_proj']
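
The helper above keeps only the last component of each module's dotted name, since PEFT matches `target_modules` entries against the end of module names. A dependency-free sketch of that step follows; the example name strings are illustrative and follow the module hierarchy printed earlier.

```python
def leaf_module_names(qualified_names):
    """Collapse dotted module paths to their unique final components."""
    return sorted({name.split(".")[-1] for name in qualified_names})

example_names = [
    "model.layers.0.self_attn.q_proj",
    "model.layers.0.self_attn.k_proj",
    "model.layers.1.self_attn.q_proj",
    "model.layers.1.mlp.down_proj",
]
print(leaf_module_names(example_names))  # ['down_proj', 'k_proj', 'q_proj']
```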


In [None]:
from peft import LoraConfig, get_peft_model

lora_config = LoraConfig(
    r=64,
    lora_alpha=32,
    target_modules=modules,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM"
)

model = get_peft_model(model, lora_config)
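
To make the role of `r` and `lora_alpha` concrete: LoRA keeps the original weight matrix W frozen and adds a low-rank update B·A scaled by `lora_alpha / r` (0.5 for the values above, under PEFT's default scaling). The pure-Python sketch below illustrates that computation on tiny matrices. It is a toy illustration with made-up values, not how PEFT implements it; it uses `alpha=1, r=2` to get the same 0.5 scaling.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

def lora_forward(W, A, B, x, alpha, r):
    """Compute W x + (alpha / r) * B (A x) without modifying W."""
    scaling = alpha / r
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    return [b + scaling * u for b, u in zip(base, update)]

W = [[1.0, 0.0, 2.0], [0.0, 1.0, 0.0]]   # frozen base weight (2 x 3)
A = [[1.0, 1.0, 0.0], [0.0, 0.0, 1.0]]   # trainable down-projection (r x 3)
B_zero = [[0.0, 0.0], [0.0, 0.0]]        # B starts at zero, so the adapter is a no-op
B_trained = [[1.0, 0.0], [0.0, 2.0]]     # made-up values after some training
x = [1.0, 2.0, 3.0]

print(lora_forward(W, A, B_zero, x, alpha=1, r=2))     # [7.0, 2.0]
print(lora_forward(W, A, B_trained, x, alpha=1, r=2))  # [8.5, 5.0]
```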

In [None]:
trainable, total = model.get_nb_trainable_parameters()
print(f"Trainable: {trainable} | total: {total} | Percentage: {trainable/total*100:.4f}%")

Trainable: 78446592 | total: 2584619008 | Percentage: 3.0351%
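
The trainable-parameter count can be reproduced by hand. For a linear layer with `in_features` inputs and `out_features` outputs, a rank-`r` adapter adds `r * (in_features + out_features)` parameters. Using the layer shapes printed earlier for the 18 decoder layers and `r=64`:

```python
R = 64
NUM_LAYERS = 18

# (in_features, out_features) per target module, from the printed model.
SHAPES = {
    "q_proj": (2048, 2048),
    "k_proj": (2048, 256),
    "v_proj": (2048, 256),
    "o_proj": (2048, 2048),
    "gate_proj": (2048, 16384),
    "up_proj": (2048, 16384),
    "down_proj": (16384, 2048),
}

per_layer = sum(R * (fan_in + fan_out) for fan_in, fan_out in SHAPES.values())
trainable = per_layer * NUM_LAYERS
print(trainable)  # 78446592
```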


## Step 5 - Run the training!

Setting the training arguments:
* For demo purposes, we run only a few training steps (`max_steps=20` in the `SFTTrainer` cell below), just to showcase how this integrates with existing tools in the HF ecosystem.

In [None]:
# import transformers

# tokenizer.pad_token = tokenizer.eos_token


# trainer = transformers.Trainer(
#     model=model,
#     train_dataset=train_data,
#     eval_dataset=test_data,
#     args=transformers.TrainingArguments(
#         per_device_train_batch_size=1,
#         gradient_accumulation_steps=4,
#         warmup_steps=0.03,
#         max_steps=100,
#         learning_rate=2e-4,
#         fp16=True,
#         logging_steps=1,
#         output_dir="outputs_mistral_b_finance_finetuned_test",
#         optim="paged_adamw_8bit",
#         save_strategy="epoch",
#     ),
#     data_collator=transformers.DataCollatorForLanguageModeling(tokenizer, mlm=False),
# )


### Fine-Tuning with QLoRA and Supervised Fine-Tuning

We're ready to fine-tune our model using QLoRA. For this tutorial, we'll use the `SFTTrainer` from the `trl` library for supervised fine-tuning. Ensure that you've installed the `trl` library as shown in Step 2.

In [None]:
#new code using SFTTrainer
import transformers

from trl import SFTTrainer

tokenizer.pad_token = tokenizer.eos_token
torch.cuda.empty_cache()

trainer = SFTTrainer(
    model=model,
    train_dataset=train_data,
    eval_dataset=test_data,
    dataset_text_field="prompt",
    peft_config=lora_config,
    args=transformers.TrainingArguments(
        per_device_train_batch_size=1,
        gradient_accumulation_steps=4,
        warmup_ratio=0.03,  # fraction of training steps used for learning-rate warmup
        max_steps=20,
        learning_rate=2e-4,
        logging_steps=1,
        output_dir="outputs",
        optim="paged_adamw_8bit",
        save_strategy="epoch",
    ),
    data_collator=transformers.DataCollatorForLanguageModeling(tokenizer, mlm=False),
)






## Let's start training

In [None]:
model.config.use_cache = False  # silence the warnings. Please re-enable for inference!
trainer.train()



| Step | Training Loss |
|------|---------------|
| 1 | 7.9591 |
| 2 | 7.7477 |
| 3 | 5.7764 |
| 4 | 4.1055 |
| 5 | 3.3998 |
| 6 | 2.8781 |
| 7 | 2.4152 |
| 8 | 2.239 |
| 9 | 1.9632 |
| 10 | 1.8563 |


Checkpoint destination directory outputs/checkpoint-20 already exists and is non-empty. Saving will proceed but saved results may be invalid.


TrainOutput(global_step=20, training_loss=2.710095316171646, metrics={'train_runtime': 83.5073, 'train_samples_per_second': 0.958, 'train_steps_per_second': 0.239, 'total_flos': 71662433120256.0, 'train_loss': 2.710095316171646, 'epoch': 0.11})
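
The reported `epoch` of 0.11 follows from the training arguments: each optimizer step consumes `per_device_train_batch_size × gradient_accumulation_steps` examples. A quick check, assuming a single GPU and 745 training examples (932 rows minus the 187 test rows):

```python
per_device_batch_size = 1
gradient_accumulation_steps = 4
max_steps = 20
train_examples = 932 - 187  # total rows minus test rows

effective_batch = per_device_batch_size * gradient_accumulation_steps
examples_seen = effective_batch * max_steps
epochs = examples_seen / train_examples
print(effective_batch, examples_seen, round(epochs, 2))  # 4 80 0.11
```

In other words, this demo run sees only about a tenth of the training set, so the loss values above are an early snapshot rather than a converged result.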

## Share adapters on the 🤗 Hub

In [None]:
new_model = "gemma-Code-Instruct-Finetune-test" #Name of the model you will be pushing to huggingface model hub

In [None]:
trainer.model.save_pretrained(new_model)



In [None]:
base_model = AutoModelForCausalLM.from_pretrained(
    model_id,
    low_cpu_mem_usage=True,
    return_dict=True,
    torch_dtype=torch.float16,
    device_map={"": 0},
)
merged_model= PeftModel.from_pretrained(base_model, new_model)
merged_model= merged_model.merge_and_unload()

# Save the merged model
merged_model.save_pretrained("merged_model",safe_serialization=True)
tokenizer.save_pretrained("merged_model")
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "right"
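
Conceptually, `merge_and_unload` folds each adapter into its base weight, so inference no longer needs the separate LoRA matrices. The toy sketch below shows the underlying arithmetic, W_merged = W + (lora_alpha / r) · B · A, on small lists with made-up values. The helper names are illustrative and this is not PEFT's actual implementation.

```python
def matmul(left, right):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*right)] for row in left]

def merge_lora(W, A, B, alpha, r):
    """Fold a LoRA update into the base weight: W + (alpha / r) * B @ A."""
    scaling = alpha / r
    delta = matmul(B, A)
    return [[w + scaling * d for w, d in zip(w_row, d_row)] for w_row, d_row in zip(W, delta)]

W = [[1.0, 0.0, 2.0], [0.0, 1.0, 0.0]]   # frozen base weight (2 x 3)
A = [[1.0, 1.0, 0.0], [0.0, 0.0, 1.0]]   # LoRA down-projection (r x 3)
B = [[1.0, 0.0], [0.0, 2.0]]             # LoRA up-projection (2 x r)

merged = merge_lora(W, A, B, alpha=1, r=2)
print(merged)  # [[1.5, 0.5, 2.0], [0.0, 1.0, 1.0]]
```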


In [27]:
# Push the model and tokenizer to the Hugging Face Model Hub
# merged_model.push_to_hub(new_model, use_temp_dir=False)
# tokenizer.push_to_hub(new_model, use_temp_dir=False)


## Test the Fine-tuned Model

In [28]:
text = '''As a Research Engineer at Whissle LLC, you will play a pivotal role in bringing our cutting-edge research to life. Your work will involve implementing and experimenting with the latest research techniques, and developing tools and infrastructure that streamline the transition of research into viable products.


Key Responsibilities:

    Experiment with and adopt the latest research techniques in AI and machine learning, open-source and our own research.
    Develop, maintain, and enhance benchmarks for evaluating AI performance.
    Implement and continuously refine our agent architecture to improve functionality and efficiency.
    Contribute to the creation of a seamless experimental framework, supporting the integration of research findings into product development.


Qualifications:

    Solid background in software engineering and ML development, with expertise in optimized deployment and inference of multi-modal AI solutions (video, audio, speech LLMs).
    Proficiency in Python and C++, with experience in machine learning frameworks and dependencies (e.g., torch, tensorrt, cuda).
    Experience with creating dev tools, which include hosting on-demand micro-services.
    Strong skills in version control, code reviews, and CI/CD practices.
    Familiarity with automated integration and deployment environments.
    Exceptional problem-solving abilities and innovative thinking.
    Excellent teamwork and communication skills.


Preferred Experience:

    Previous role as a research engineer at leading general AI companies like Netflix, Google AI, Anthropic, or OpenAI or in a leading AI research organization.
    Extensive exposure to cutting-edge AI research, especially in multi-modal AI technologies.
    Track record of contributing to AI advancements through publications, patents, or open-source projects.
'''

In [29]:
result = get_completion(query=text, model=model, tokenizer=tokenizer)
print(result)

A decoder-only architecture is being used, but right-padding was detected! For correct generation results, please set `padding_side='left'` when initializing the tokenizer.
`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`.



  user
  Name all the skills present in the following description in a single list. Response should have only the skills, no other information or words. Skills should be keywords, each being no more than 3 words.
  Below text is the Description:

  As a Research Engineer at Whissle LLC, you will play a pivotal role in bringing our cutting-edge research to life. Your work will involve implementing and experimenting with the latest research techniques, and developing tools and infrastructure that streamline the transition of research into viable products.


Key Responsibilities:

    Experiment with and adopt the latest research techniques in AI and machine learning, open-source and our own research.
    Develop, maintain, and enhance benchmarks for evaluating AI performance.
    Implement and continuously refine our agent architecture to improve functionality and efficiency.
    Contribute to the creation of a seamless experimental framework, supporting the integration of research find

In [30]:
result = get_completion(query=text, model=merged_model, tokenizer=tokenizer)
print(result)


A decoder-only architecture is being used, but right-padding was detected! For correct generation results, please set `padding_side='left'` when initializing the tokenizer.



  user
  Name all the skills present in the following description in a single list. Response should have only the skills, no other information or words. Skills should be keywords, each being no more than 3 words.
  Below text is the Description:

  As a Research Engineer at Whissle LLC, you will play a pivotal role in bringing our cutting-edge research to life. Your work will involve implementing and experimenting with the latest research techniques, and developing tools and infrastructure that streamline the transition of research into viable products.


Key Responsibilities:

    Experiment with and adopt the latest research techniques in AI and machine learning, open-source and our own research.
    Develop, maintain, and enhance benchmarks for evaluating AI performance.
    Implement and continuously refine our agent architecture to improve functionality and efficiency.
    Contribute to the creation of a seamless experimental framework, supporting the integration of research find