DIGGER

Overview

In this work, we introduce a universal optimization framework, Digger, designed to test whether a sample has been learned by the target LLM. We conduct a thorough feature study of the characteristics of sample loss and of the rate at which sample loss changes as samples are learned by the LLM. Based on these characteristics, we formulate the difference in loss change as an indicator that distinguishes samples the LLM has learned from those it has not.
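To make the indicator concrete, the sketch below shows one way to compute a difference in loss change; it is an illustration under our assumptions, not the repository's code, and the array names and threshold tau are hypothetical. The intuition: a sample the target LLM has already learned leaves less room for its loss to drop under further fine-tuning than it does on a reference LLM that has never seen it.

  import numpy as np

  # Hypothetical per-sample loss arrays (all names are assumptions):
  #   target_before / target_after: loss on the target LLM before/after fine-tuning
  #   ref_before    / ref_after:    the same losses on the reference LLM
  def loss_change_difference(target_before, target_after, ref_before, ref_after):
      target_drop = target_before - target_after  # loss change on the target LLM
      ref_drop = ref_before - ref_after           # loss change on the reference LLM
      return ref_drop - target_drop               # large value -> likely already learned

  # Example decision with an assumed threshold tau:
  tau = 0.1
  scores = loss_change_difference(np.array([2.1]), np.array([2.0]),
                                  np.array([2.3]), np.array([1.6]))
  learned = scores > tau  # True where the sample appears to have been learned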

Dependencies

We developed the code on Windows and run it on Ubuntu 20.04. The code requires Python 3.10.9; the other packages it depends on (e.g., transformers) are listed in ./requirements.txt.

Usage

1. Data processing

Put the books, in txt format, into the specified directory; the script cuts out n paragraphs from each book, each paragraph of length l (a conceptual sketch follows the commands below).

  cd Datasets
  python build_samples_set.py
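As a rough illustration of what this step does, the sketch below cuts n word-level chunks of length l out of a plain-text book; the function name, path, and parameter values are assumptions, and build_samples_set.py may segment differently.

  # Sketch: cut n paragraphs of l words each out of a plain-text book.
  # All names and values below are assumptions for illustration.
  def cut_paragraphs(book_path, n=100, l=128):
      with open(book_path, encoding="utf-8") as f:
          words = f.read().split()
      # n non-overlapping chunks of l words each (skip incomplete tail chunks).
      return [" ".join(words[i * l:(i + 1) * l])
              for i in range(n) if (i + 1) * l <= len(words)]

  samples = cut_paragraphs("Datasets/books/example_book.txt", n=100, l=128)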

The sample set is then randomly split in fixed proportions, and a book list is constructed for each dataset (a conceptual sketch follows the command below).

  python build_dataset.py
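Conceptually, the split can be pictured as follows; the ratios, seed, and subset roles are assumptions, and build_dataset.py holds the actual configuration.

  import random

  # Sketch: randomly split the book list into proportional subsets.
  # Ratios and subset roles are assumptions for illustration.
  def split_books(book_names, ratios=(0.8, 0.1, 0.1), seed=0):
      names = list(book_names)
      random.Random(seed).shuffle(names)
      splits, start = [], 0
      for i, r in enumerate(ratios):
          # Give the final subset whatever remains to avoid rounding loss.
          end = len(names) if i == len(ratios) - 1 else start + int(r * len(names))
          splits.append(names[start:end])
          start = end
      return splits  # e.g., [fine-tune books, reference books, held-out books]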

2. Fine-tune LLM & get loss

  cd Digger
  python finetune.py

The model_type parameter of finetune.py selects which model to fine-tune:

  benchmark: the Reference LLM
  test: the vanilla-tuned LLM
  union: the Reference-tuned LLM

With the fine-tuned models in place, compute the per-sample losses:

  python get_loss.py
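Conceptually, this step measures the loss of a causal LM on each sample paragraph. The following is a minimal sketch, not the repository's script: the model name ("gpt2"), the example texts, and the output file name are all assumptions.

  import os
  import numpy as np
  import torch
  from transformers import AutoModelForCausalLM, AutoTokenizer

  # Sketch: per-sample loss of a causal LM (model choice is an assumption).
  tokenizer = AutoTokenizer.from_pretrained("gpt2")
  model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

  def sample_loss(text):
      ids = tokenizer(text, return_tensors="pt").input_ids
      with torch.no_grad():
          out = model(input_ids=ids, labels=ids)  # labels=ids yields the LM loss
      return out.loss.item()

  # Save one batch of losses as an npy file (file name is an assumption).
  os.makedirs("outputs", exist_ok=True)
  losses = np.array([sample_loss(p) for p in ["first paragraph ...", "second paragraph ..."]])
  np.save("outputs/losses_batch0.npy", losses)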

All loss values are saved in batches as .npy files in the outputs directory.
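To inspect the results, the batched files can be loaded back and concatenated. A minimal sketch; the outputs/*.npy layout is an assumption about how get_loss.py names its files.

  import glob
  import numpy as np

  # Load every batch written by get_loss.py and stack into one array.
  batches = [np.load(p) for p in sorted(glob.glob("outputs/*.npy"))]
  losses = np.concatenate(batches)
  print(losses.shape, losses.mean())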
