A repository for GRPO fine-tuning of the Qwen2.5 0.5B model to solve linear equations. We generate a custom dataset for this use case; the code for dataset generation can be found in this file.
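As a rough illustration of what such a dataset could look like, the sketch below generates equations of the form `ax + b = c` with integer solutions. It is not the repository's generation script: the function names, the `question`/`answer` keys, and the value ranges are all assumptions made for this example.

```python
import random


def generate_linear_equation(rng, max_coeff=60, max_solution=20):
    """Return (equation_text, solution) for a*x + b = c with an integer x.

    Hypothetical helper: the ranges and the text format are assumptions.
    """
    # Pick a non-zero coefficient so the equation always has one solution.
    a = rng.choice([n for n in range(-max_coeff, max_coeff + 1) if n != 0])
    x = rng.randint(-max_solution, max_solution)
    b = rng.randint(-max_coeff, max_coeff)
    c = a * x + b  # build the right-hand side from the known solution
    sign = "+" if b >= 0 else "-"
    return f"{a}x {sign} {abs(b)} = {c}", x


def generate_dataset(n, seed=0):
    """Return n question/answer records; seeded for reproducibility."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        question, answer = generate_linear_equation(rng)
        rows.append({"question": question, "answer": answer})
    return rows


if __name__ == "__main__":
    for row in generate_dataset(3):
        print(row)
```

Building `c` from a chosen `x` guarantees that every generated equation has an integer solution, which keeps answer checking simple.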
The first step is to clone the project using the following command:
    git clone https://github.com/Bhooyas/GRPO_Training.git
The next step is to install the requirements for the project. We do that using the following commands:
    cd GRPO_Training
    pip install -r requirements.txt
Then we can run inference with the model using the following script:
    python run.py --model-name Bhooyas/Qwen2.5-0.5B-Instruct-linearexpression
This script spins up a Gradio instance with a chat UI where you can test the model. Some test questions are as follows:
3x + 7 = 19
What is the capital of India?
Which is the 4th planet?
-56x + 9 = -47
Write python code for printing hello world.
Which planet is known as the red planet?
30x - 5 = 55
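To check the model's replies to the equation prompts above, it helps to have reference answers. The sketch below solves prompts written in the `ax + b = c` form used in this list; `solve_linear` and its regular expression are hypothetical helpers for this example, not part of the repository. Non-equation prompts return `None`.

```python
import re
from fractions import Fraction

# Matches text such as "3x + 7 = 19" or "-56x + 9 = -47" (assumed format).
EQUATION = re.compile(r"^\s*(-?\d+)x\s*([+-])\s*(\d+)\s*=\s*(-?\d+)\s*$")


def solve_linear(equation):
    """Solve a*x + b = c exactly; return None if the text is not such an equation."""
    match = EQUATION.match(equation)
    if match is None:
        return None
    a, sign, b, c = match.groups()
    a, b, c = int(a), int(b), int(c)
    if sign == "-":
        b = -b
    if a == 0:
        return None  # no unique solution
    return Fraction(c - b, a)  # exact arithmetic, no float rounding


if __name__ == "__main__":
    prompts = [
        "3x + 7 = 19",
        "-56x + 9 = -47",
        "30x - 5 = 55",
        "Which is the 4th planet?",
    ]
    for prompt in prompts:
        print(prompt, "->", solve_linear(prompt))
```

For the three equations in the list, the expected solutions are x = 4, x = 1, and x = 2 respectively.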
To train the model, we use a combination of SFT and GRPO training with LoRA on the custom dataset created for linear equation solving.
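The core idea of GRPO is that several completions are sampled for the same prompt and each one is scored relative to the rest of its group. The sketch below shows that normalisation step in plain Python. It is an illustration only, not the training code in this repository; the function name and the `eps` value are assumptions, and implementations differ on details such as sample versus population standard deviation.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-4):
    """Normalise the rewards of one prompt's completions within their group.

    advantage_i = (reward_i - group mean) / (group std + eps)
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption of this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    # Four sampled completions for one equation: two correct, two wrong.
    print(group_relative_advantages([1.0, 0.0, 0.0, 1.0]))
```

Completions that score above the group average get a positive advantage and those below get a negative one. When every completion in a group receives the same reward, all advantages are zero and that group contributes no learning signal.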
The first step is to clone the project using the following command:
    git clone https://github.com/Bhooyas/GRPO_Training.git
The next step is to install the requirements for the project. We do that using the following commands:
    cd GRPO_Training
    pip install -r requirements.txt
The config for SFT fine-tuning can be found in sft_config.yaml. We can run the training using the following command:
    python train_sft.py --config sft_config.yaml
Note: This command may take some time to run, depending on the compute used.
The config for GRPO fine-tuning can be found in grpo_config.yaml. We can run the training using the following command:
    python train_grpo.py --config grpo_config.yaml
Note: This command may take some time to run, depending on the compute used.
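GRPO training needs a reward signal for each sampled completion. For linear equations, one simple option is a correctness reward that compares the model's final answer with the known solution. The sketch below is a hypothetical example of such a reward, not the one used by train_grpo.py; the `x = <number>` answer format and the 1.0/0.0 scoring are assumptions.

```python
import re
from fractions import Fraction

# Finds statements such as "x = 4", "x=-3" or "x = 7/2" (assumed answer format).
ANSWER = re.compile(r"x\s*=\s*(-?\d+(?:/\d+)?)")


def correctness_reward(completion, solution):
    """Return 1.0 if the last 'x = ...' in the completion equals the solution."""
    matches = ANSWER.findall(completion)
    if not matches:
        return 0.0  # the completion never states a value for x
    try:
        predicted = Fraction(matches[-1])
    except ZeroDivisionError:
        return 0.0  # malformed answer such as "x = 1/0"
    return 1.0 if predicted == Fraction(solution) else 0.0


if __name__ == "__main__":
    print(correctness_reward("3x = 12, so x = 4", 4))
    print(correctness_reward("The answer is x = 5", 4))
```

Taking the last match treats intermediate steps such as `3x = 12` as working rather than as the answer. This is a simplification: a completion that restates an intermediate step after its final answer would be scored incorrectly.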
We can evaluate the model using the following command:
    python evaluate.py --config evaluate_config.yaml
Note: This command may take some time to run, depending on the compute used. You can list the models to evaluate on the linear equation data in evaluate_config.yaml.
The results for evaluation are as follows:

We can see a good jump in the results of the model.