# NeoFuzz

## About

NeoFuzz is a framework for generating and evaluating fuzzing inputs for the Lua interpreter using large language models.

The framework uses GPT Neo to learn the characteristics of Lua code from large-scale datasets through self-supervised learning (SSL), then applies Proximal Policy Optimization (PPO) to fine-tune the model toward generating syntactically and semantically valid Lua scripts.

The goal is to maximize code coverage and discover edge cases in the Lua interpreter by producing diverse and effective fuzzing inputs through reinforcement learning and language modeling.

## Environment Setup

1. Install system dependencies: `make install-deps`
2. Install pip dependencies into a new virtual environment: `make install-dev`
3. Use the created venv as your Jupyter kernel.
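
Once the kernel is selected, a quick sanity check can confirm that the notebook really runs inside a virtual environment. This is a minimal sketch using only the standard library; the helper name `in_virtualenv` is made up for illustration and is not part of NeoFuzz.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the running interpreter belongs to a virtual environment."""
    # Inside a venv, sys.prefix points at the venv directory while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


print("interpreter:", sys.executable)
print("running inside a venv:", in_virtualenv())
```

If the second line prints `False`, the notebook is most likely still using the system interpreter rather than the environment created by `make install-dev`.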

In [3]:
from src.classes.view.training_view import TrainingView
from src.classes.controller.training_env import TrainingEnv
from src.classes.enum.train_types import TrainType

from src.classes.log.logger import Logger
from src.classes.log.logging_env import LoggingEnv

# Initialize the logger
logger = Logger(LoggingEnv.DEV)

# Initialize the training environment
training_env = TrainingEnv()

# Start the training view
training_view = TrainingView(training_env)

## Data Preprocessing

Start the preprocessing step.

In [None]:
# Initialize training data
logger.info("Initializing training data...")
if training_view.init_data():
    logger.info("Training data initialized successfully.")
else:
    logger.error("Failed to initialize training data.")

## SSL Training


We start with the initial SSL training phase, which adapts the model to generating Lua code.

In [8]:
phase = TrainType.SSL

# Start SSL training
logger.info(f'Starting {phase.value} training...')

if training_view.start_training(phase):
    logger.info(f'{phase.value} training completed successfully.')
else:
    logger.error(f'Could not finish training for phase {phase.value}')
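
The start, log, and check pattern above is repeated for every training phase, so it could be wrapped in a small helper. The sketch below assumes only what the cells in this notebook show: `start_training(phase)` returns a truthy value on success and `phase.value` is a printable name. `run_phase`, `DemoPhase`, and `DemoView` are hypothetical stand-ins so the example runs on its own; they are not NeoFuzz classes.

```python
import logging
from enum import Enum


def run_phase(view, phase, log) -> bool:
    """Run one training phase and log the outcome; return True on success."""
    log.info(f"Starting {phase.value} training...")
    ok = bool(view.start_training(phase))
    if ok:
        log.info(f"{phase.value} training completed successfully.")
    else:
        log.error(f"Could not finish training for phase {phase.value}")
    return ok


class DemoPhase(Enum):
    # Hypothetical stand-in for TrainType
    SSL = "ssl"


class DemoView:
    # Hypothetical stand-in for TrainingView
    def start_training(self, phase) -> bool:
        return True


logging.basicConfig(level=logging.INFO)
run_phase(DemoView(), DemoPhase.SSL, logging.getLogger("demo"))
```

In the notebook itself, the call would presumably look like `run_phase(training_view, TrainType.SSL, logger)`, provided the project's `Logger` exposes `info` and `error` as used above.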

## Evaluate SSL Training

We can now determine the baseline validity of our generated samples.

In [None]:
from src.classes.view.train_eval_view import EvalView

train_eval_view = EvalView(training_env)

In [None]:
train_eval_view.evaluate_generations(TrainType.SSL)
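
To make "baseline validity" concrete, the sketch below computes the fraction of samples that pass a syntax check. It does not claim to mirror what `evaluate_generations` does internally. It assumes a `luac` binary on the `PATH` and uses its parse-only mode (`luac -p`); the function names are chosen for illustration.

```python
import shutil
import subprocess
import tempfile
from typing import Callable, Iterable


def luac_syntax_ok(source: str, luac: str = "luac") -> bool:
    """Return True if `luac -p` (parse only, no output file) accepts the script."""
    with tempfile.NamedTemporaryFile("w", suffix=".lua") as handle:
        handle.write(source)
        handle.flush()
        result = subprocess.run([luac, "-p", handle.name], capture_output=True)
    return result.returncode == 0


def validity_rate(samples: Iterable[str], is_valid: Callable[[str], bool]) -> float:
    """Fraction of samples accepted by the checker; 0.0 for an empty input."""
    samples = list(samples)
    if not samples:
        return 0.0
    return sum(1 for sample in samples if is_valid(sample)) / len(samples)


# Only run the demo when a Lua compiler is actually installed.
if shutil.which("luac"):
    demo = ["print('hello')", "local x = = 1"]
    print("syntactic validity:", validity_rate(demo, luac_syntax_ok))
```

A parse-only check covers syntactic validity only; semantic validity would require actually executing the script in the interpreter.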

## PPO Training - Structure

The next step is PPO fine-tuning, which aims to improve the syntactic validity and semantic correctness of the generated scripts.

In [None]:
phase = TrainType.PPO_STRUCTURE

if training_view.start_training(phase):
    logger.info(f'{phase.value} training completed successfully.')
else:
    logger.error(f'Could not finish training for phase {phase.value}')

## Evaluate PPO Training

Compare the generations against the SSL baseline.

In [None]:
train_eval_view.evaluate_generations(TrainType.PPO_STRUCTURE)

## AFL Evaluation

Before running the fuzzing evaluation, build the Docker container used for it.

In [None]:
!FUZZER=gcov_afl TARGET=lua ./submodules/magma_neo_fuzz/tools/captain/build_gcov_docker.sh

**Note: Make sure to set up crash handling for AFL++.**

To do so, run the following command in a root shell: `echo core > /proc/sys/kernel/core_pattern`
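
Before starting a long evaluation run, it may be worth confirming that the setting took effect. The following minimal sketch reads the file back and compares it to the value written by the command above; `core_pattern_ready` is a hypothetical helper, and the path is a parameter only to keep the function easy to test.

```python
from pathlib import Path


def core_pattern_ready(path: str = "/proc/sys/kernel/core_pattern") -> bool:
    """Return True if the kernel core pattern is the plain value `core`."""
    try:
        value = Path(path).read_text().strip()
    except OSError:
        # The file is missing or unreadable (for example on non-Linux hosts).
        return False
    return value == "core"


print("core_pattern configured for AFL++:", core_pattern_ready())
```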

In [4]:
from src.classes.view.afl_eval_view import AflEvalView

afl_eval_view = AflEvalView(training_env)

In [None]:
afl_eval_view.start_afl_eval_process(time_limit='5m')

## GPT Neo Evaluation

In [None]:
afl_eval_view.start_model_eval_process(t_limit='1m', t_type=TrainType.PPO_STRUCTURE)