Fine-tuning code generation models on separate clients holding different datasets, then applying federated learning techniques to learn a global fine-tuned model.
experiments/
This folder contains the Google Colab notebooks in which we conducted our model experiments.
- CodeParrot Experiments.ipynb: experiments on CodeParrot models with the IBM Project CodeNet and StarCoder datasets
- CodeGen Experiments.ipynb: experiments on the CodeGen model with the IBM Project CodeNet dataset
- MDPP Eval Dataset.ipynb: experiments on the DeciCoder model with the MBPP dataset
- Data and Visualizations [for Exp2].ipynb: creates visualizations used in the report (loss curves, etc.)
- Aggregate and Save.ipynb: tests for aggregating the model
- IBM Data Preview.ipynb: investigation of the IBM dataset
- model_sizes.ipynb: computes model sizes before/after quantization and trainable parameter counts before/after LoRA
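To give a feel for the parameter counts that model_sizes.ipynb reports, here is a rough sketch of the arithmetic: full fine-tuning trains every weight matrix, while LoRA freezes them and trains only a rank-r pair of adapter matrices per targeted weight. The helper names below are hypothetical illustrations, not functions from this repo.

```python
from math import prod

def full_finetune_params(layer_shapes):
    """Trainable parameters when every weight matrix is fine-tuned."""
    return sum(prod(shape) for shape in layer_shapes)

def lora_params(layer_shapes, rank):
    """Trainable parameters when each (d_out, d_in) weight is frozen and
    augmented with low-rank adapters A (d_out x r) and B (r x d_in)."""
    return sum(d_out * rank + rank * d_in for d_out, d_in in layer_shapes)

# One 768x768 attention projection with rank-8 LoRA:
full = full_finetune_params([(768, 768)])  # 589,824 trainable params
lora = lora_params([(768, 768)], rank=8)   # 12,288 trainable params (~2%)
```

The same arithmetic, summed over every adapted layer, is why LoRA cuts trainable parameters by orders of magnitude on billion-parameter models.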
helpers/
This folder contains the helper functions used across our experiments.
We standardized commonly used code (training loop, filtered generations, etc.).
Please refer to the docstrings of the functions themselves for more details.
- train.py: main training loop, which trains and saves models to local directories
- train_alt.py: alternative version of the training loop
- fl_impl.py: our implementation of FedAvg across the provided clients
- evaluation.py: helper functions for generation, HumanEval, ROUGE, etc.
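The core aggregation step of FedAvg, as implemented in fl_impl.py, is a data-weighted average of client parameters. A minimal dependency-free sketch of that step (the function name and plain-list parameters are illustrative assumptions; the real code operates on model weights):

```python
def fedavg(client_states, client_sizes):
    """FedAvg aggregation: average each parameter across clients,
    weighting every client by the number of training examples it holds."""
    total = sum(client_sizes)
    aggregated = {}
    for name in client_states[0]:
        aggregated[name] = [0.0] * len(client_states[0][name])
        for state, n_examples in zip(client_states, client_sizes):
            weight = n_examples / total
            for i, value in enumerate(state[name]):
                aggregated[name][i] += weight * value
    return aggregated

# Two clients; the second holds 3x as much data, so it dominates:
global_state = fedavg(
    [{"w": [0.0, 4.0]}, {"w": [2.0, 0.0]}],
    client_sizes=[1, 3],
)
# global_state["w"] == [1.5, 1.0]
```

After aggregation, the global state is broadcast back to the clients for the next round of local fine-tuning.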
Models
- CodeParrot-Small: https://huggingface.co/codeparrot/codeparrot-small
- CodeParrot: https://huggingface.co/codeparrot/codeparrot
- CodeGen: https://huggingface.co/Salesforce/codegen-2B-mono
- DeciCoder: https://huggingface.co/Deci/DeciCoder-1b
- StarCoder: https://huggingface.co/bigcode/starcoder
Datasets and benchmarks
- IBM Project CodeNet: https://developer.ibm.com/exchanges/data/all/project-codenet/
- HumanEval: https://github.com/openai/human-eval
- MBPP: https://huggingface.co/datasets/mbpp