SAFE-QAQ is an end-to-end framework for audio-text fraud detection that leverages reinforcement learning to enable slow-thinking decision-making. Below are instructions for setting up the environment, training the model, and running experiments.
This repository contains the source code for SAFE-QAQ, which consists of three main stages:
- Rule-Based Reinforcement Learning (Stage 1): Train a rule-based RL model.
- Rejection Sampling Fine-Tuning (RSFT) and Length-Constrained Reinforcement Learning (LCRL) (Stage 2): Refine the model using rejection sampling and LCRL techniques.
- Real-Time Fine-Tuning (Stage 3): Fine-tune the model for real-time inference.
The prompts for both real-time inference and training are defined in prompt.py.
To set up the environment, follow the installation instructions in the ms-swift documentation.
Run the following script to train the initial rule-based RL model:
```bash
bash run_swift_grpo_stage1.sh
```
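The reward definitions for this stage live in the training script and its referenced configuration and are not reproduced here. Purely as an illustration of what GRPO-style rule-based rewards typically look like (a verdict-accuracy term plus a format term), here is a minimal sketch; the tag names, labels, and weights are assumptions for illustration, not this repository's actual reward functions.

```python
import re

# Illustrative sketch only: GRPO-style rule-based rewards.
# Tag names (<think>/<answer>), labels, and weights are hypothetical.

def format_reward(response: str) -> float:
    """1.0 if the response follows a <think>...</think><answer>...</answer> layout, else 0.0."""
    pattern = r"^<think>.*?</think>\s*<answer>.*?</answer>$"
    return 1.0 if re.match(pattern, response.strip(), flags=re.DOTALL) else 0.0

def accuracy_reward(response: str, label: str) -> float:
    """1.0 if the final verdict inside <answer> matches the ground-truth label, else 0.0."""
    match = re.search(r"<answer>(.*?)</answer>", response, flags=re.DOTALL)
    if match is None:
        return 0.0
    return 1.0 if match.group(1).strip().lower() == label.strip().lower() else 0.0

def rule_based_reward(response: str, label: str) -> float:
    """Combine the rule rewards; the 0.5 weight on the format term is illustrative."""
    return accuracy_reward(response, label) + 0.5 * format_reward(response)
```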
- Rejection Sampling: Generate samples using:

  ```bash
  bash sample.sh
  ```

  Process the sampled data with:

  ```bash
  bash process_samples.sh
  ```

  (The filtering idea behind this step is sketched after this list.)

- Fine-Tuning with RSFT: Fine-tune the model using the processed data:

  ```bash
  bash run_swift_sft_stage2_RSFT.sh
  ```

- Length-Constrained Reinforcement Learning (LCRL): Further refine the model with LCRL:

  ```bash
  bash run_swift_grpo_stage2_LCRL.sh
  ```
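If you are unfamiliar with rejection sampling fine-tuning, the following sketch shows the general idea behind the sampling-and-filtering step: keep only sampled responses whose final verdict matches the ground-truth label, and reuse the survivors as SFT data. It assumes a JSONL file of samples with hypothetical field names; the actual logic is implemented by sample.sh and process_samples.sh.

```python
import json

# Illustrative sketch of rejection sampling: keep only samples whose
# predicted verdict matches the ground-truth label, and store the
# surviving (prompt, response) pairs as SFT data.
# Field names ("prompt", "response", "label", "predicted_label") and the
# JSONL layout are assumptions, not this repository's actual format.

def build_rsft_dataset(samples_path: str, output_path: str) -> None:
    kept = []
    with open(samples_path, "r", encoding="utf-8") as f:
        for line in f:
            sample = json.loads(line)
            # Rejection step: discard samples with an incorrect verdict.
            if sample["predicted_label"] != sample["label"]:
                continue
            kept.append({"messages": [
                {"role": "user", "content": sample["prompt"]},
                {"role": "assistant", "content": sample["response"]},
            ]})
    with open(output_path, "w", encoding="utf-8") as f:
        for record in kept:
            f.write(json.dumps(record, ensure_ascii=False) + "\n")
```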
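Similarly, the length constraint in LCRL can be pictured as a reward-shaping term that discourages overly long reasoning on top of the rule-based rewards from Stage 1. The sketch below is an assumption about how such a constraint might look; the token budget and penalty scale are illustrative, not the values used by this repository.

```python
# Illustrative sketch of a length-constrained reward: keep the rule-based
# reward and subtract a soft penalty when the response exceeds a token
# budget. The budget and scale are hypothetical.

def length_constrained_reward(base_reward: float, num_tokens: int,
                              budget: int = 512, scale: float = 0.001) -> float:
    """Penalize responses in proportion to how far they exceed the budget."""
    overflow = max(0, num_tokens - budget)
    return base_reward - scale * overflow
```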
Perform real-time fine-tuning by running:
```bash
bash run_swift_grpo_stage3.sh
```

- The prompt.py file contains the definitions of the prompts used during training and real-time inference.
- Ensure all dependencies are installed as per the ms-swift documentation before running the scripts.
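The exact prompt wording lives in prompt.py and is not reproduced here. Purely as an illustration of how such a file is often organized, here is one plausible layout, with a system prompt that asks for explicit reasoning before a verdict; all strings and names below are assumptions, not the repository's actual prompts.

```python
# Hypothetical layout of prompt definitions; the real prompts in prompt.py
# will differ. Names and wording are assumptions for illustration.

TRAIN_SYSTEM_PROMPT = (
    "You are an audio-text fraud detection assistant. Think step by step "
    "inside <think>...</think>, then give your final verdict (fraud / not "
    "fraud) inside <answer>...</answer>."
)

REALTIME_SYSTEM_PROMPT = (
    "You are an audio-text fraud detection assistant operating in real time. "
    "Reason briefly, then give your verdict inside <answer>...</answer>."
)

def build_user_prompt(transcript: str) -> str:
    """Wrap a call transcript (plus any audio-derived cues) into the user turn."""
    return f"Conversation transcript:\n{transcript}\n\nIs this call fraudulent?"
```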