Skip to content

Bhooyas/GRPO_Training

Repository files navigation

GRPO Training for Liner Expression

GRPO Training

Python Hugging Face

A repository for doing GRPO Finetunning on Qwen2.5 0.5B model for solving linear eqautions. We generate custom dataset for this use case code for dataset generation can be found in this file

Infernece Using the trained model

The first step would be to clone the project using the following command: -

git clone https://github.com/Bhooyas/GRPO_Training.git

The next step is to install the requirements for the project. We do that using the following command: -

cd GRPO_Training
pip install -r requirements.txt

Then we can infer from the model using the following script: -

python run.py --model-name Bhooyas/Qwen2.5-0.5B-Instruct-linearexpression

This script will spin up gradio instance with chat ui where you can test the model. Some testing questions are as follows:

3x + 7 = 19

What is the capital on India?

Which is the 4th planet?

-56x + 9 = -47

Write python code for printing hello world.

Which planet is known as the red planet.

30x - 5 = 55

Training the Model

For training the model we use combination of SFT and GRPO Training with LoRA on the custom dataset created for Linear Equation solving.

The first step would be to clone the project using the following command: -

git clone https://github.com/Bhooyas/GRPO_Training.git

The next step is to install the requirements for the project. We do that using the following command: -

cd GRPO_Training
pip install -r requirements.txt

SFT Finetunning

The config for SFT Finetunning can be found in sft_config.yaml. We can run the traing using following command:

python train_sft.py --config sft_config.yaml

Note: This command may take some time to run based on the compute used.

GRPO Finetunning

The config for GRPO Finetunning can be found in grpo_config.yaml. We can run the traing using following command:

python train_grpo.py --config grpo_config.yaml

Note: This command may take some time to run based on the compute used.

Evalution

We can evaulate the model using following command:

python evaluate.py --config evaluate_config.yaml

Note: This command may take some time to run based on the compute used. You can list the models to evaluate on Linear Equation data in the evaluate_config.yaml

Results

The results for evaluation are as follows: Eavaluation

We can see a good jump in the results of the model.

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages