The aim is to push the boundaries of LLMs' capabilities by enabling them to handle tasks that require broader context, which is otherwise constrained by their limited "memory" of past information.
1. Implementation of Self-Extend (LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning)
1.1 Overview
This work elicits the innate capacity of LLMs to handle long contexts without fine-tuning. Without it, inference with Large Language Models (LLMs) on long input sequences is restricted by the limited length of the training sequences.
The fundamental concept is to build bi-level attention information: group-level attention and neighbor-level attention. Both levels are computed with the original model's self-attention, so no training is necessary.
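As a conceptual illustration (a sketch of the mapping idea, not the library's internal code), the effect on relative positions can be written as a small function: distances inside the neighbor window keep their exact value, while larger distances are merged into groups and shifted so the two regimes meet at the window boundary.

def self_extend_relative_position(distance, group_size, neighbor_window):
    # Effective relative position seen by the frozen attention layers.
    if distance <= neighbor_window:
        # Neighbor-level attention: ordinary relative position.
        return distance
    # Group-level attention: floored (grouped) position, shifted so it stays
    # continuous with the neighbor level at the window boundary.
    return distance // group_size + neighbor_window - neighbor_window // group_size

# Example: with group_size=8 and neighbor_window=1024, a distance of 6000
# maps to 6000 // 8 + 1024 - 128 = 1646, well within a 4096-token window.
print(self_extend_relative_position(6000, 8, 1024))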
For the current Llama implementation, the Python packages used are:
transformers==4.38.2
flash_attn==2.5.6
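Assuming a standard pip environment (note that the flash_attn package is published on PyPI as flash-attn, and usually needs PyTorch and CUDA already installed to build), the pinned versions can be installed with:

pip install transformers==4.38.2 flash-attn==2.5.6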
Clone the repository to your machine and copy your modeling files into the cloned repo directory.
import SelfExtend
# Load your model, e.g., loaded_model = AutoModelForCausalLM.from_pretrained(model_path)
# group_size and window_size set the group size and the neighbor window, respectively.
SelfExtend.apply(loaded_model, group_size, window_size, enable_flash_attention=False)
# Inference, e.g., loaded_model.generate(...)
enable_flash_attention defaults to False; set enable_flash_attention=True if the model is loaded with FlashAttention enabled.
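For instance, a minimal sketch of that path (assuming a Llama-2 checkpoint and a GPU with FlashAttention support; the checkpoint name and hyperparameter values are placeholders):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

import SelfExtend

model_path = "meta-llama/Llama-2-7b-chat-hf"  # placeholder checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_path)
loaded_model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype=torch.float16,
    attn_implementation="flash_attention_2",  # requires flash_attn to be installed
).to("cuda")

group_size, window_size = 8, 1024  # placeholder values; see the sizing guidance below
SelfExtend.apply(loaded_model, group_size, window_size, enable_flash_attention=True)

inputs = tokenizer("A very long prompt ...", return_tensors="pt").to("cuda")
outputs = loaded_model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))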
We use passkey retrieval as an example to show how to use Self-Extend. You may check example.py:
python example.py
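For reference, passkey retrieval hides a short random code inside long filler text and asks the model to repeat it back. A rough, hypothetical sketch of such a prompt (independent of the repository's example.py) could look like:

import random

def build_passkey_prompt(num_filler_lines=400):
    # Hide a random 5-digit passkey in the middle of repetitive filler text.
    passkey = str(random.randint(10000, 99999))
    filler = "The grass is green. The sky is blue. The sun is yellow.\n" * num_filler_lines
    middle = len(filler) // 2
    prompt = (
        "There is a passkey hidden in the following text. Remember it.\n"
        + filler[:middle]
        + "The passkey is " + passkey + ". Remember it.\n"
        + filler[middle:]
        + "What is the passkey?"
    )
    return prompt, passkey

prompt, passkey = build_passkey_prompt()
# Feed `prompt` to the SelfExtend-patched model and check whether `passkey`
# appears in the generated answer.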
The ideas that follow are grounded in the implementation:
- With Llama-2 as the base model, group_size values of 2~64 are reasonable and neighbor_window values of 512~1536 are feasible. However, in many situations, smaller neighbor windows combined with larger group sizes work equally well.
- When selecting group_size and neighbor_window, the basic approach is to make sure the length of the input sequence stays within the maximum extended window size (for Llama-2, this is (4096 - neighbor_window) * group_size + neighbor_window).
- For an input sequence of length L, one could first try the smallest group size that still covers L, i.e. the smallest G satisfying G * (4096 - neighbor_window) + neighbor_window >= L, and then check whether a larger group size works better (see the sketch below).
SelfExtend is not overly sensitive to hyperparameter selection. One could use a representative task to find proper hyperparameters, or directly follow the empirical window-size inequality above.
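As a quick illustration of this sizing arithmetic (a sketch assuming Llama-2's 4096-token pretraining window; the example lengths are arbitrary):

import math

PRETRAIN_LEN = 4096  # Llama-2 pretraining context window

def extended_window(group_size, neighbor_window, pretrain_len=PRETRAIN_LEN):
    # Maximum extended window: (pretrain_len - neighbor_window) * group_size + neighbor_window
    return (pretrain_len - neighbor_window) * group_size + neighbor_window

def min_group_size(input_len, neighbor_window, pretrain_len=PRETRAIN_LEN):
    # Smallest group size whose extended window still covers the input length.
    return max(1, math.ceil((input_len - neighbor_window) / (pretrain_len - neighbor_window)))

# A 16,000-token input with neighbor_window=1024 needs group_size >= 5,
# which extends the window to (4096 - 1024) * 5 + 1024 = 16384 tokens.
print(min_group_size(16000, 1024))   # 5
print(extended_window(5, 1024))      # 16384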
- Gemma-7B has to be loaded in bfloat16, but Gemma-2B still works well with float16 (see the sketch after this list).
- If using transformers 4.36, the default attention class used by Llama is LlamaSdpaAttention rather than LlamaAttention. Be careful about this and make sure you replace the forward method of the correct class.
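A minimal sketch illustrating both caveats (checkpoint names are placeholders):

import torch
from transformers import AutoModelForCausalLM

# Gemma-7B: load in bfloat16; float16 is only reported to work for Gemma-2B.
gemma = AutoModelForCausalLM.from_pretrained("google/gemma-7b", torch_dtype=torch.bfloat16)

# Llama: check which attention class transformers actually instantiated, so the
# forward method of the matching class is the one that gets replaced.
llama = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")
print(type(llama.model.layers[0].self_attn))   # e.g. LlamaSdpaAttention vs. LlamaAttention
print(llama.config._attn_implementation)       # "sdpa", "eager", or "flash_attention_2"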