
Fine-Tuning Enhances Existing Mechanisms: A Case Study on Entity Tracking

This repository contains the code used for the experiments in the paper "Fine-Tuning Enhances Existing Mechanisms: A Case Study on Entity Tracking".

We study how fine-tuning affects the internal mechanisms implemented in language models. As a case study, we explore the entity tracking task in Llama-7B and its fine-tuned variants: Vicuna-7B, Goat-7B, and Float-7B.

Our findings suggest that fine-tuning enhances, rather than fundamentally alters, the mechanistic operation of the model.

Please see finetuning.baulab.info for more information.

Methods

To discover the underlying mechanism for the entity tracking task, we employed: 1) path patching (experiment_1/path_patching.py) and 2) desiderata-based component masking (experiment_2/DCM.py). Both methods are implemented using baukit and can easily be adapted to other tasks.
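To illustrate the core idea behind these interventions (not the repository's baukit implementation), here is a minimal toy sketch of activation patching: cache an intermediate activation from a "clean" forward pass, then splice it into a run on a "corrupted" input. The two-layer model and all names below are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical two-layer toy "model": h = relu(W1 @ x); y = W2 @ h
W1 = rng.standard_normal((4, 3))
W2 = rng.standard_normal((2, 4))

def forward(x, patch_hidden=None):
    """Run the toy model; optionally overwrite the hidden activation."""
    h = np.maximum(W1 @ x, 0.0)
    if patch_hidden is not None:
        h = patch_hidden  # splice in a cached activation from another run
    return W2 @ h, h

clean_x = rng.standard_normal(3)
corrupt_x = rng.standard_normal(3)

_, clean_h = forward(clean_x)                            # cache clean activation
patched_y, _ = forward(corrupt_x, patch_hidden=clean_h)  # patched corrupted run
clean_y, _ = forward(clean_x)

# Because the hidden layer fully determines the output in this toy model,
# patching the clean activation restores the clean output exactly.
assert np.allclose(patched_y, clean_y)
```

In the real experiments, the same substitution is performed on hidden states of specific attention heads inside the transformer, and the change in task accuracy measures how much the patched component matters.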

Moreover, to uncover why fine-tuned models perform better while employing the same mechanism, we introduce a novel approach called Cross-Model Activation Patching (CMAP), which patches activations across models to elucidate the enhanced mechanisms. The notebook experiment_3/cmap.ipynb demonstrates how to run the complete experiment.
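The cross-model idea can be sketched in the same toy setting (hypothetical weights, not the paper's models): run the base model, but substitute an intermediate activation computed by the fine-tuned model on the same input.

```python
import numpy as np

rng = np.random.default_rng(1)

# Two hypothetical toy models sharing one architecture: base and "fine-tuned"
W1_base, W2_base = rng.standard_normal((4, 3)), rng.standard_normal((2, 4))
W1_ft = W1_base + 0.1 * rng.standard_normal((4, 3))  # pretend fine-tuning nudged W1
W2_ft = W2_base.copy()                               # second layer left unchanged

def forward(W1, W2, x, patch_hidden=None):
    """Run a toy model; optionally overwrite its hidden activation."""
    h = np.maximum(W1 @ x, 0.0)
    if patch_hidden is not None:
        h = patch_hidden  # cross-model patch: activation from the other model
    return W2 @ h, h

x = rng.standard_normal(3)
_, h_ft = forward(W1_ft, W2_ft, x)                          # fine-tuned activation
y_cmap, _ = forward(W1_base, W2_base, x, patch_hidden=h_ft) # patched base run
y_ft, _ = forward(W1_ft, W2_ft, x)

# Since the layers above the patch are identical in this toy example, the
# patched base model reproduces the fine-tuned output exactly.
assert np.allclose(y_cmap, y_ft)
```

In the actual experiment, patching fine-tuned activations into the base Llama-7B tests whether the improvement is carried by enhanced versions of components the base model already has.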

Note: You need the weights of the LLaMA-7B model, which is distributed under a non-commercial license. If you do not already have access, use this form to request it.

Setup

To install all dependencies, run:

conda env create -f environment.yml
conda activate finetuning

How to Cite

@inproceedings{prakash2023fine,
  title={Fine-Tuning Enhances Existing Mechanisms: A Case Study on Entity Tracking},
  author={Prakash, Nikhil and Shaham, Tamar Rott and Haklay, Tal and Belinkov, Yonatan and Bau, David},
  booktitle={Proceedings of the 2024 International Conference on Learning Representations},
  note={arXiv:2402.14811},
  year={2024}
}
