GRPO Training for Liner Expression

A repository for doing GRPO Finetunning on Qwen2.5 0.5B model for solving linear eqautions. We generate custom dataset for this use case code for dataset generation can be found in this file

Infernece Using the trained model

The first step would be to clone the project using the following command: -

git clone https://github.com/Bhooyas/GRPO_Training.git

The next step is to install the requirements for the project. We do that using the following command: -

cd GRPO_Training
pip install -r requirements.txt

Then we can infer from the model using the following script: -

python run.py --model-name Bhooyas/Qwen2.5-0.5B-Instruct-linearexpression

This script will spin up gradio instance with chat ui where you can test the model. Some testing questions are as follows:

3x + 7 = 19

What is the capital on India?

Which is the 4th planet?

-56x + 9 = -47

Write python code for printing hello world.

Which planet is known as the red planet.

30x - 5 = 55

Training the Model

For training the model we use combination of SFT and GRPO Training with LoRA on the custom dataset created for Linear Equation solving.

The first step would be to clone the project using the following command: -

git clone https://github.com/Bhooyas/GRPO_Training.git

The next step is to install the requirements for the project. We do that using the following command: -

cd GRPO_Training
pip install -r requirements.txt

SFT Finetunning

The config for SFT Finetunning can be found in sft_config.yaml. We can run the traing using following command:

python train_sft.py --config sft_config.yaml

Note: This command may take some time to run based on the compute used.

GRPO Finetunning

The config for GRPO Finetunning can be found in grpo_config.yaml. We can run the traing using following command:

python train_grpo.py --config grpo_config.yaml

Note: This command may take some time to run based on the compute used.

Evalution

We can evaulate the model using following command:

python evaluate.py --config evaluate_config.yaml

Note: This command may take some time to run based on the compute used. You can list the models to evaluate on Linear Equation data in the evaluate_config.yaml

Results

The results for evaluation are as follows:

We can see a good jump in the results of the model.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

GRPO Training for Liner Expression

Infernece Using the trained model

Training the Model

SFT Finetunning

GRPO Finetunning

Evalution

Results

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 1 Commit
.gitignore		.gitignore
README.md		README.md
dataset.py		dataset.py
evaluate.py		evaluate.py
evaluate_config.yaml		evaluate_config.yaml
grpo_config.yaml		grpo_config.yaml
llm_eval_grpo.png		llm_eval_grpo.png
requirements.txt		requirements.txt
run.py		run.py
sft_config.yaml		sft_config.yaml
train_grpo.py		train_grpo.py
train_sft.py		train_sft.py
utils.py		utils.py

Folders and files

Latest commit

History

Repository files navigation

GRPO Training for Liner Expression

Infernece Using the trained model

Training the Model

SFT Finetunning

GRPO Finetunning

Evalution

Results

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages