# Installing Python Packages

This cell uses the `pip` package manager to install the packages the notebook depends on: `datasets`, `transformers`, `trl`, `peft`, and `accelerate`.


In [None]:
%pip install datasets transformers trl peft
%pip install accelerate

# Importing Modules and Libraries in Python
## Modules and Libraries
- `os`: Provides access to operating-system functionality such as environment variables and file paths.
- `torch`: PyTorch itself, the tensor-computation and deep-learning library the model is built and trained with.
- `transformers`: Provides model architectures, pre-trained checkpoints, tokenizers, configuration classes (`AutoConfig`), and training utilities such as `TrainingArguments`.
- `datasets`: Loads datasets from the Hugging Face Hub and offers utilities for shuffling and selecting subsets.
- `trl`: Hugging Face's Transformer Reinforcement Learning library, which provides trainers for fine-tuning language models; this notebook uses its `SFTTrainer` (supervised fine-tuning trainer) to train on plain text.
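Several APIs used later in this notebook have changed between releases (for example, the `evaluation_strategy` argument of `TrainingArguments` was later renamed `eval_strategy`, and newer TRL versions changed how `SFTTrainer` receives its tokenizer and formatting function), so it can help to record which versions are installed. The following is a minimal sketch using only the standard library's `importlib.metadata`; the package list is simply the set installed or imported above.

```python
from importlib.metadata import PackageNotFoundError, version


def report_versions(packages):
    """Return {distribution name: installed version, or None if not installed}."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    for pkg, ver in report_versions(
        ["torch", "transformers", "datasets", "trl", "peft", "accelerate"]
    ).items():
        print(f"{pkg:>12}: {ver or 'not installed'}")
```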

In [None]:
import os
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline, AutoConfig, TrainingArguments, Trainer
from datasets import load_dataset
from trl import SFTTrainer

# Model Configuration

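In this section, we configure a language model with Hugging Face's `AutoConfig` class. The starting point is the configuration of `"microsoft/Phi-3-mini-128k-instruct"`, the instruction-tuned "mini" model of Microsoft's Phi-3 family with a 128K-token context window. Only the configuration is loaded, not the pretrained weights: it serves as an architectural template that is then shrunk into a much smaller model trained from scratch.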

## Initial Configuration Load

```python
config = AutoConfig.from_pretrained("microsoft/Phi-3-mini-128k-instruct", trust_remote_code=True)
```

- **Purpose**: Downloads the model's configuration (architecture hyperparameters only, no weights) from the Hugging Face Hub. `trust_remote_code=True` allows `transformers` to download and execute the custom Python configuration and modeling code stored in the model repository instead of a built-in class; enable it only for repositories you trust.

## Custom Configuration Adjustments

The subsequent lines modify several critical parameters of the model's configuration:

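```python
config.max_position_embeddings = 256
config.num_attention_heads = 8
config.num_key_value_heads = 8
config.num_hidden_layers = 3
config.tie_word_embeddings = True
config.hidden_size = 128
config.intermediate_size = 512
```

- **`max_position_embeddings`**: Sets the maximum sequence length the model supports to 256 tokens.
- **`num_attention_heads`**: Sets the number of attention heads to 8, giving each head `128 / 8 = 16` dimensions.
- **`num_key_value_heads`**: Sets the number of key/value heads to 8. The attention code supports grouped-query attention, in which several query heads share one key/value head; making it equal to `num_attention_heads` gives standard multi-head attention. It must divide `num_attention_heads`, and it has to be overridden because the inherited value was chosen for the full-size model.
- **`num_hidden_layers`**: Sets the model depth to 3 transformer decoder layers.
- **`tie_word_embeddings`**: Shares one weight matrix between the input token embeddings and the output (`lm_head`) projection. With a vocabulary of tens of thousands of tokens and a hidden size of only 128, the embeddings dominate the parameter count, so tying nearly halves the model size.
- **`hidden_size`**: Sets the model (embedding and residual stream) dimension to 128.
- **`intermediate_size`**: Sets the inner dimension of each feed-forward (MLP) block to 512, four times the hidden size.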
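To see why tying the embeddings matters at this scale, the parameter count can be estimated by hand. This is a sketch under stated assumptions: it assumes Phi-3's layer layout (a fused `qkv_proj` and a fused `gate_up_proj`, bias-free linear layers, and RMSNorm layers with a weight vector only), and it assumes the `vocab_size` of 32064 inherited from the Phi-3-mini configuration; in the notebook, pass `config.vocab_size` instead. The result should land close to what `count_model_params` reports, but the helper name and its arguments are hypothetical.

```python
def estimate_phi3_params(vocab_size, hidden_size, num_layers, num_heads,
                         num_kv_heads, intermediate_size, tie_word_embeddings):
    """Rough parameter count for a Phi-3-style decoder (assumed layout, see above)."""
    head_dim = hidden_size // num_heads
    # Attention: fused Q/K/V projection plus output projection (no biases assumed)
    attn = hidden_size * (num_heads + 2 * num_kv_heads) * head_dim
    attn += num_heads * head_dim * hidden_size
    # MLP: fused gate/up projection (2 * intermediate) plus down projection
    mlp = hidden_size * 2 * intermediate_size + intermediate_size * hidden_size
    # Two RMSNorm weight vectors per decoder layer
    norms = 2 * hidden_size
    per_layer = attn + mlp + norms
    embed = vocab_size * hidden_size
    total = embed + num_layers * per_layer + hidden_size  # + final RMSNorm
    if not tie_word_embeddings:
        total += vocab_size * hidden_size  # separate lm_head matrix
    return total


tied = estimate_phi3_params(vocab_size=32064, hidden_size=128, num_layers=3,
                            num_heads=8, num_kv_heads=8, intermediate_size=512,
                            tie_word_embeddings=True)
print(f"{tied:,}")  # 4,891,520
```

Under these assumptions the embedding matrix alone accounts for 4,104,192 of the roughly 4.9M parameters; an untied `lm_head` would add the same amount again (about 9.0M in total).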

## RoPE Scaling Factor Adjustment

The final segment of the code trims the rotary position embedding (RoPE) scaling factors that Phi-3's LongRoPE implementation uses to extend its context window:

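```python
required_length = config.hidden_size // config.num_attention_heads // 2
config.rope_scaling['long_factor'] = config.rope_scaling['long_factor'][:required_length]
config.rope_scaling['short_factor'] = config.rope_scaling['short_factor'][:required_length]
```

- **RoPE scaling factors**: LongRoPE rescales each rotary frequency by one entry of `rope_scaling['short_factor']` (used while sequences stay within the original pretraining window, `original_max_position_embeddings`) or `rope_scaling['long_factor']` (used beyond it). Rotary embeddings rotate pairs of dimensions within each attention head, so each list needs `head_dim / 2` entries, where `head_dim = hidden_size // num_attention_heads`. The checkpoint's lists were sized for the full model's head dimension, so they are trimmed to `128 // 8 // 2 = 8` entries; keeping the first 8 values is a pragmatic choice rather than a tuned one. With `max_position_embeddings = 256`, far below the original window, only the short factors should ever be used. Dividing by `num_key_value_heads` instead of `num_attention_heads` gives the same length only when the two head counts are equal.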
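The length requirement follows from how the rotary inverse frequencies are built. The sketch below is a simplified, pure-Python re-derivation, not the library's code: it assumes the standard RoPE formula `1 / base**(2i / head_dim)` with `base = 10000`, and the LongRoPE-style rule of dividing each frequency by its own factor. The factor values in the usage line are placeholders, not Phi-3's real factors.

```python
def longrope_inv_freq(head_dim, factors, base=10000.0):
    """Rotary inverse frequencies, each divided by its own LongRoPE-style factor."""
    n_freqs = head_dim // 2  # RoPE rotates pairs of dimensions: one frequency per pair
    if len(factors) != n_freqs:
        raise ValueError(f"expected {n_freqs} factors, got {len(factors)}")
    return [1.0 / (f * base ** (2 * i / head_dim)) for i, f in enumerate(factors)]


head_dim = 128 // 8                     # hidden_size // num_attention_heads
short_factor = [1.0] * (head_dim // 2)  # placeholder values for illustration
print(len(longrope_inv_freq(head_dim, short_factor)))  # 8
```

Passing the untrimmed lists from the full-size checkpoint would trip the length check, which is the mismatch the trimming step avoids.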


In [None]:
config = AutoConfig.from_pretrained("microsoft/Phi-3-mini-128k-instruct", trust_remote_code=True)

# Shrink the architecture into a tiny model that will be trained from scratch
config.max_position_embeddings = 256
config.num_attention_heads = 8
config.num_key_value_heads = 8       # must divide num_attention_heads
config.num_hidden_layers = 3
config.tie_word_embeddings = True    # share input embedding and lm_head weights
config.hidden_size = 128
config.intermediate_size = 512

# LongRoPE needs one short/long factor per rotary frequency: head_dim // 2
required_length = config.hidden_size // config.num_attention_heads // 2
config.rope_scaling['long_factor'] = config.rope_scaling['long_factor'][:required_length]
config.rope_scaling['short_factor'] = config.rope_scaling['short_factor'][:required_length]
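Because the configuration is edited attribute by attribute after loading, inconsistent values may not surface until the model is built or run. A small pre-flight check can catch them earlier. This sketch assumes three constraints on how Phi-3-style attention uses these fields: `hidden_size` splits evenly across `num_attention_heads`, `num_key_value_heads` divides `num_attention_heads` (grouped-query attention), and each LongRoPE factor list has `head_dim // 2` entries. The function name is hypothetical, and a `SimpleNamespace` stands in for the real config object.

```python
from types import SimpleNamespace


def check_attention_config(cfg):
    """Return a list of problems found in a Phi-3-style config (empty list means OK)."""
    problems = []
    if cfg.hidden_size % cfg.num_attention_heads:
        problems.append("hidden_size must be divisible by num_attention_heads")
    if cfg.num_attention_heads % cfg.num_key_value_heads:
        problems.append("num_attention_heads must be divisible by num_key_value_heads")
    head_dim = cfg.hidden_size // cfg.num_attention_heads
    rope = getattr(cfg, "rope_scaling", None) or {}
    for key in ("short_factor", "long_factor"):
        if key in rope and len(rope[key]) != head_dim // 2:
            problems.append(
                f"rope_scaling['{key}'] has {len(rope[key])} entries, expected {head_dim // 2}"
            )
    return problems


# Stand-in for the modified config; in the notebook, pass `config` directly
demo = SimpleNamespace(hidden_size=128, num_attention_heads=8, num_key_value_heads=8,
                       rope_scaling={"short_factor": [1.0] * 8, "long_factor": [1.0] * 8})
print(check_attention_config(demo))  # []
```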

# Function: Count Model Parameters

This function, `count_model_params`, is designed to calculate and display the total number of parameters and the number of trainable parameters within a given PyTorch model. It serves as a utility function to understand the complexity and training requirements of a neural network model.

## Description

The `count_model_params` function performs two primary calculations:
- **Total Parameters**: Calculates the total number of parameters across all layers of the model. It iterates through each parameter in the model, sums up the number of elements (size of the tensor) for each, and then totals these values.
- **Trainable Parameters**: Identifies and sums the number of parameters that require gradients, indicating they are part of the model that will be updated during training. This calculation filters out non-trainable parameters before summing their element counts.

## Purpose

This function is useful for debugging and model selection. The parameter count, and in particular the trainable count, gives a quick estimate of the memory and compute needed for training and confirms that configuration changes (such as tying embeddings or freezing layers) had the intended effect.


In [None]:
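def count_model_params(model):
    # Total number of elements across all parameter tensors
    total_params = sum(p.numel() for p in model.parameters())
    # Only parameters that receive gradients, i.e. are updated during training
    trainable_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
    print(f"Total parameters: {total_params:,} | Trainable parameters: {trainable_params:,}")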

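# Initializing the Model

Build a randomly initialized model from the modified configuration (no pretrained weights are loaded) and move it to the GPU.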

In [None]:
my_model = AutoModelForCausalLM.from_config(config, trust_remote_code=True)
my_model.to('cuda')

# Inspecting the Model

Print the model configuration and its parameter counts.

In [None]:
print(my_model.config)
count_model_params(my_model)

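# Loading the Tokenizer

The tokenizer is reused from the original Phi-3 checkpoint, so its token IDs fit within the `vocab_size` inherited from the Phi-3 configuration.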

In [None]:
t = AutoTokenizer.from_pretrained("microsoft/Phi-3-mini-128k-instruct", trust_remote_code=True)

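# Loading the Dataset

Load the train split of `HuggingFaceTB/cosmopedia-20k`, shuffle it with a fixed seed, and inspect its size and columns.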

In [None]:
dataset = load_dataset('HuggingFaceTB/cosmopedia-20k', split="train")
dataset = dataset.shuffle(seed=42)
print(f'Number of prompts: {len(dataset)}')
print(f'Column names are: {dataset.column_names}')

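# Formatting the Dataset

With packing disabled (the default), `SFTTrainer` in the TRL releases this notebook targets applies the formatting function to batches of examples and expects a list containing one training string per example. Here each example's `text` field is used unchanged, without any prompt template.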

In [None]:
def create_prompt_formats(sample):
    # `sample` is a batch: a dict mapping column names to lists of values.
    # Return one training string per example; the `text` column is used as-is.
    return list(sample['text'])
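The batched contract can be checked without TRL by calling the function on a hand-made batch in the dict-of-lists shape that a batched `datasets` map produces. This is a sketch: the example strings and the extra `other_column` are made up, and it assumes the batched calling convention described above.

```python
def create_prompt_formats(sample):
    # `sample` is a batch: a dict mapping column names to lists of values
    return list(sample["text"])


# Made-up two-example batch; columns other than `text` are ignored
batch = {
    "text": ["First synthetic textbook passage.", "Second synthetic textbook passage."],
    "other_column": ["a", "b"],
}
print(create_prompt_formats(batch))
# ['First synthetic textbook passage.', 'Second synthetic textbook passage.']
```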

# Training Configuration

The training hyperparameters are collected in a `TrainingArguments` object.

In [None]:
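args = TrainingArguments(
    per_device_train_batch_size=8,
    gradient_accumulation_steps=1,
    gradient_checkpointing=True,     # trades extra compute for lower activation memory
    learning_rate=2e-5,
    lr_scheduler_type="cosine",
    num_train_epochs=8,              # ignored: max_steps takes precedence when set
    max_steps=1000,
    fp16=True,                       # mixed-precision training; requires a CUDA GPU
    evaluation_strategy="steps",     # renamed to eval_strategy in newer transformers releases
    logging_steps=10,                # eval_steps defaults to this value when not set
    save_steps=1000,
    optim="paged_adamw_32bit",       # paged AdamW optimizer; requires bitsandbytes
    output_dir="pretrained-model",
    push_to_hub=False,
    report_to="none",
)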
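The arguments set neither `warmup_steps` nor `warmup_ratio`, so the learning rate should follow the plain cosine decay that `transformers` uses for `lr_scheduler_type="cosine"`: lr(t) = lr0 · ½(1 + cos(π t / T)) with T = `max_steps`. The function below is a standalone sketch of that formula (with an optional linear warmup) for inspecting the schedule; it is not the library's scheduler object.

```python
import math


def cosine_lr(step, max_steps, base_lr, warmup_steps=0):
    """Learning rate at `step` for linear warmup followed by half-cosine decay to zero."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, max_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


for step in (0, 250, 500, 750, 1000):
    print(step, f"{cosine_lr(step, max_steps=1000, base_lr=2e-5):.2e}")
```

With these settings the rate starts at 2e-5, passes 1e-5 halfway through, and reaches zero at step 1000.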

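# Preparing Training and Evaluation Subsets

The dataset was loaded with `split="train"`, so `dataset` is a single `Dataset` rather than a dictionary of splits. A 1,000-example training subset and a disjoint 10-example evaluation subset are taken from the shuffled data.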

In [None]:
# `dataset` is the shuffled train split loaded above; take two disjoint subsets
train_d = dataset.select(range(1000))
eval_d = dataset.select(range(1000, 1010))
print(train_d)
print(eval_d)
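With the subsets prepared, the training budget implied by `args` can be checked by hand. This sketch assumes a single GPU (`num_devices=1`) and approximates how `Trainer` counts optimizer steps (batches per epoch divided by the gradient-accumulation steps); the helper name is hypothetical. For 1,000 examples at an effective batch of 8, that gives 125 steps per epoch, so `max_steps=1000` amounts to exactly 8 epochs, the same value `num_train_epochs` requests. `max_tokens` is only an upper bound, since examples shorter than `max_seq_length` contribute fewer tokens.

```python
import math


def training_budget(num_examples, per_device_batch_size, grad_accum_steps,
                    num_devices, max_steps, max_seq_length):
    """Back-of-the-envelope training budget for a step-limited run."""
    effective_batch = per_device_batch_size * grad_accum_steps * num_devices
    batches_per_epoch = math.ceil(num_examples / (per_device_batch_size * num_devices))
    steps_per_epoch = max(batches_per_epoch // grad_accum_steps, 1)
    return {
        "effective_batch": effective_batch,
        "steps_per_epoch": steps_per_epoch,
        "epochs": max_steps / steps_per_epoch,
        "max_tokens": max_steps * effective_batch * max_seq_length,
    }


print(training_budget(num_examples=1000, per_device_batch_size=8, grad_accum_steps=1,
                      num_devices=1, max_steps=1000, max_seq_length=256))
```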

In [None]:
trainer = SFTTrainer(
    model=my_model,
    train_dataset=train_d,
    eval_dataset=eval_d,
    tokenizer=t,
    args=args,
    # dataset_text_field="text",
    max_seq_length=256,
    formatting_func=create_prompt_formats,
)

In [None]:
%pip install bitsandbytes

In [None]:
trainer.train()

In [None]:
trainer.save_model("my_model")

In [None]:
from huggingface_hub import notebook_login
notebook_login()

In [None]:
from huggingface_hub import create_repo, HfApi

api = HfApi()

# Credentials are taken from notebook_login() above, so no token is passed explicitly
create_repo(
    repo_id="Vortex4ai/phi-from-scratch",
    repo_type="model",
    exist_ok=True,
)

# Upload the saved model directory (absolute path from the original environment; adjust as needed)
api.upload_folder(
    folder_path="/teamspace/studios/this_studio/my_model",
    repo_id="Vortex4ai/phi-from-scratch",
)
