This Colab notebook demonstrates how to set up and run federated fine-tuning of a Large Language Model (LLM) with the Flower framework, using a pretrained LLaMA-family model (OpenLLaMA by default) and the Alpaca-GPT4 dataset. It follows the steps of the LLM FlowerTune example in the Flower repository.


In [3]:
# Federated LLM Fine-tuning with Flower - LLM FlowerTune

# Cell 1: Clone the repository and set up the necessary directory
!git clone --depth=1 https://github.com/adap/flower.git
!mv flower/examples/llm-flowertune . && rm -rf flower
%cd llm-flowertune



Cloning into 'flower'...
remote: Enumerating objects: 2356, done.
remote: Counting objects: 100% (2356/2356), done.
remote: Compressing objects: 100% (1886/1886), done.
remote: Total 2356 (delta 518), reused 1461 (delta 309), pack-reused 0
Receiving objects: 100% (2356/2356), 59.08 MiB | 31.25 MiB/s, done.
Resolving deltas: 100% (518/518), done.
/content/llm-flowertune
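
Before moving on, it can help to confirm that the working directory contains the files the later cells call. The sketch below is a minimal check; the file list is taken from the commands used in this notebook, not from an official manifest, and `missing_example_files` is a hypothetical helper name.

```python
from pathlib import Path

# Files invoked by later cells in this notebook (an assumption based on
# those commands, not an official list from the Flower repository).
EXPECTED_FILES = ("requirements.txt", "main.py", "test.py")


def missing_example_files(example_dir="."):
    """Return the expected files that are absent from example_dir."""
    root = Path(example_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


missing = missing_example_files()
print("All expected files found." if not missing else f"Missing: {missing}")
```

An empty result means the `mv` step placed the example where the remaining cells expect it.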


In [None]:



# Cell 2: Install dependencies
!pip install -r requirements.txt



Collecting flwr[rest,simulation]<2.0,>=1.8.0 (from -r requirements.txt (line 1))
  Downloading flwr-1.8.0-py3-none-any.whl (330 kB)
Collecting flwr-datasets>=0.0.2 (from -r requirements.txt (line 2))
  Downloading flwr_datasets-0.1.0-py3-none-any.whl (39 kB)
Collecting hydra-core==1.3.2 (from -r requirements.txt (line 3))
  Downloading hydra_core-1.3.2-py3-none-any.whl (154 kB)
Collecting trl==0.7.2 (from -r requirements.txt (line 4))
  Downloading trl-0.7.2-py3-none-any.whl (124 kB)
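
Once installation finishes, a quick way to confirm which versions were resolved is to query the package metadata. This is a minimal sketch using only the standard library; the distribution names are copied from the pip output above, and `installed_versions` is a hypothetical helper name.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names as they appear in the pip output above.
PACKAGES = ("flwr", "flwr-datasets", "hydra-core", "trl")


def installed_versions(packages=PACKAGES):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


for name, ver in installed_versions().items():
    print(f"{name}: {ver or 'not installed'}")
```

Any package reported as not installed suggests the install cell failed or was interrupted.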



## Running the Federated LLM Fine-Tuning
We will execute the main script to start the federated learning simulation. By default it fine-tunes a 4-bit quantized OpenLLaMA 7Bv2 model with 2 clients per round for 100 FL rounds. You can override these settings from the command line by appending `key=value` arguments to the command.

In [None]:

# Cell 3: Run the federated learning simulation with default settings
!python main.py
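
Settings are changed by appending dotted `key=value` arguments to this command, as the later cells do. The syntax resembles Hydra command-line overrides (`hydra-core` is pinned in `requirements.txt`). The sketch below illustrates the idea of merging such overrides into a nested config; it is not Hydra's implementation. The `DEFAULTS` layout and the helper names are assumptions made for illustration; only the values 4-bit and 100 rounds come from the text above.

```python
import copy

# Hypothetical defaults: the key layout is inferred from the override syntax,
# and "<default-model>" is a placeholder, not a real model identifier.
DEFAULTS = {
    "model": {"name": "<default-model>", "quantization": 4},
    "num_rounds": 100,
}


def parse_value(text):
    """Convert an override value to int or float when possible, else keep str."""
    for cast in (int, float):
        try:
            return cast(text)
        except ValueError:
            pass
    return text.strip('"')


def apply_overrides(config, overrides):
    """Return a copy of config with dotted key=value overrides applied."""
    result = copy.deepcopy(config)
    for item in overrides:
        dotted_key, _, raw = item.partition("=")
        *parents, leaf = dotted_key.split(".")
        node = result
        for part in parents:
            node = node.setdefault(part, {})
        node[leaf] = parse_value(raw)
    return result


cfg = apply_overrides(DEFAULTS, ["model.quantization=8", "num_rounds=50"])
print(cfg)
```

Each dotted key walks down one level of the nested config, so `model.quantization=8` changes only that entry and leaves the other defaults untouched.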


### Customize Model and Training Parameters
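You may want to try other configurations, such as a smaller model or a different quantization bit width. The command below switches to the 3B OpenLLaMA model with 8-bit quantization, reduces training to 50 rounds, and sets `fraction_fit` to 0.25.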


In [None]:


# Cell 4: Example of running with a different model and configuration
!python main.py model.name="openlm-research/open_llama_3b_v2" model.quantization=8 num_rounds=50 fraction_fit.fraction_fit=0.25
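
To see what a fit fraction of 0.25 means in practice, the sketch below computes how many clients would be sampled per round. It follows the rule used by FedAvg-style strategies, where the sample size is the truncated product of pool size and fraction, bounded below by a minimum. Whether this example's strategy uses exactly this rule is an assumption, and the pool size of 20 clients is a hypothetical value chosen for illustration.

```python
def clients_per_round(num_clients, fraction_fit, min_fit_clients=2):
    """Number of clients sampled for training in one round."""
    return max(int(num_clients * fraction_fit), min_fit_clients)


# num_clients=20 is an assumed pool size, not a value from this notebook.
print(clients_per_round(20, 0.25))
```

Raising the fraction increases the work per round, so runtime and GPU demand grow with it.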


## VRAM Consumption and Resource Management
VRAM consumption depends on the model size and the quantization setting. The `client_resources.num_gpus` option sets the fraction of a GPU assigned to each client, which determines how many clients can train concurrently on one device. The example below assigns 50% of the GPU to each client, so two clients can share a single GPU.


In [None]:


# Cell 5: Assign 50% of the GPU's VRAM to each client
!python main.py model.name="openlm-research/open_llama_3b_v2" model.quantization=4 client_resources.num_gpus=0.5
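
The arithmetic behind this setting can be sketched as follows. The first function estimates how many clients run at once for a given per-client GPU fraction; the second gives a rough figure for the memory taken by the quantized weights alone. Both helpers are hypothetical and illustrative: the weight estimate uses the nominal parameter count and excludes activations, adapter weights, optimizer state, and framework overhead, so real usage will be higher.

```python
def concurrent_clients(num_gpus_per_client, gpus_available=1.0):
    """How many clients fit on the available GPUs under fractional allocation."""
    # The small epsilon guards against floating-point error in the division.
    return int(gpus_available / num_gpus_per_client + 1e-9)


def weight_memory_gb(num_params_billion, quantization_bits):
    """Approximate memory for the quantized weights alone, in GB (1e9 bytes)."""
    return num_params_billion * quantization_bits / 8


print(concurrent_clients(0.5))
print(weight_memory_gb(3, 4))
```

If two clients do not fit in memory together, raising `client_resources.num_gpus` back toward 1.0 reduces concurrency.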



## Testing the Model
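After training, use the `test.py` script to query the fine-tuned model with your own questions. Replace the `--peft-path` placeholder with the directory that contains your trained model. This gives a quick check of whether the federated fine-tuned model produces useful answers.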


In [None]:

# Cell 6: Test the trained model with a custom question
!python test.py --peft-path=/path/to/trained-model-dir/ --question="What is the ideal 1-day plan in London?"
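
To try several questions in a row, the command above can be generated once per question. The sketch below only builds the command strings with shell-safe quoting, using the two flags shown in the cell above; `build_test_commands` is a hypothetical helper, the second question is an invented example, and the placeholder path is left as-is.

```python
import shlex


def build_test_commands(peft_path, questions):
    """Build one test.py invocation per question, with shell-safe quoting."""
    return [
        shlex.join(
            ["python", "test.py", f"--peft-path={peft_path}", f"--question={q}"]
        )
        for q in questions
    ]


questions = [
    "What is the ideal 1-day plan in London?",
    "How does federated learning differ from centralized training?",  # hypothetical
]
for cmd in build_test_commands("/path/to/trained-model-dir/", questions):
    print(cmd)
```

Each printed line can be pasted into a cell with a leading `!`, or passed to `subprocess.run` after `shlex.split`.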



## Conclusion
This notebook has walked you through setting up and running a federated LLM fine-tuning scenario using Flower. You can further experiment with different configurations and models to explore the capabilities of federated learning in privacy-preserving AI development.