coMind: Collaborative Machine Learning
For over a year we have been working at the intersection of machine learning and blockchain, which has led us to do extensive research into distributed machine learning algorithms. Federated averaging has a set of features that make it ideal for training models collaboratively while preserving the privacy of sensitive data. In this repository you can learn how to start training ML models in a federated setup.
What can you expect to find here?
We have developed a custom optimizer for TensorFlow to easily train neural networks in a federated way (note: every time we refer to "federated" here, we mean federated averaging).
What is federated machine learning? In short, it is a step forward from distributed learning that can improve performance and training times. In our tutorials we explain in depth how it works, so we definitely encourage you to have a look!
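To make the idea concrete, here is a minimal pure-Python sketch (not the repository's optimizer) of the averaging step itself: each worker trains locally, and the shared model becomes the average of the workers' weights, weighted by how much data each worker holds.

```python
def federated_average(worker_weights, sample_counts):
    """Average model weights across workers, weighted by local dataset size.

    worker_weights: one flat list of floats per worker
    sample_counts:  number of local training samples on each worker
    """
    total = sum(sample_counts)
    averaged = [0.0] * len(worker_weights[0])
    for weights, n in zip(worker_weights, sample_counts):
        for i, w in enumerate(weights):
            averaged[i] += w * (n / total)
    return averaged

# Two workers with different amounts of local data:
w_a = [1.0, 2.0]   # weights after local training on worker A (100 samples)
w_b = [3.0, 4.0]   # weights after local training on worker B (300 samples)
print(federated_average([w_a, w_b], [100, 300]))  # [2.5, 3.5]
```

Because worker B holds three times as much data, the averaged model sits three quarters of the way toward B's weights.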
In addition to this custom optimizer, you can find some tutorials and examples to help you get started with TensorFlow and federated learning: from a basic training example, where all the steps of a local classification model are shown, to more elaborate distributed and federated learning setups.
In this repository you will find 3 different types of files.
federated_averaging_optimizer.py: the custom optimizer we have created to implement federated averaging in TensorFlow.
Six Python scripts, from a basic local classifier up to advanced_federated_classifier.py: three basic and three advanced examples showing how to train and evaluate TensorFlow models in a local, distributed and federated way.
Basic Classifier.ipynb, Basic Distributed Classifier.ipynb and Basic Federated Classifier.ipynb: three IPython notebooks containing the three basic examples named above, with in-depth documentation to walk you through them.
Requirements
- Python 3
- TensorFlow
- matplotlib (for the examples and tutorials)
Download and open the notebooks with Jupyter or Google Colab. The notebook with the local training example, Basic Classifier.ipynb, and the local-training Python scripts such as advanced_classifier.py can be run right away. For the others you will need to open three different shells: one of them will execute the parameter server, and the other two the workers.
For example, to run basic_distributed_classifier.py:

1st shell (parameter server):
python3 basic_distributed_classifier.py --job_name=ps --task_index=0

2nd shell (first worker):
python3 basic_distributed_classifier.py --job_name=worker --task_index=0

3rd shell (second worker):
python3 basic_distributed_classifier.py --job_name=worker --task_index=1
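Internally, the two flags tell each process which member of the cluster it is. The actual cluster definition lives inside the scripts; the sketch below uses placeholder localhost addresses and only illustrates how --job_name and --task_index are typically parsed and resolved.

```python
import argparse

# Placeholder cluster layout: one parameter server and two workers on
# localhost. The repository's scripts define their own hosts and ports.
CLUSTER = {
    "ps": ["localhost:2222"],
    "worker": ["localhost:2223", "localhost:2224"],
}

def resolve_task(argv):
    """Parse --job_name/--task_index and return this process's address."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--job_name", choices=["ps", "worker"], required=True)
    parser.add_argument("--task_index", type=int, required=True)
    args = parser.parse_args(argv)
    return CLUSTER[args.job_name][args.task_index]

print(resolve_task(["--job_name=worker", "--task_index=1"]))  # localhost:2224
```

In the real scripts this information feeds a TensorFlow cluster specification so that the parameter server and workers can find each other.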
Follow the same steps for the other distributed and federated scripts.
Check sockets to find an implementation with Python sockets: the same idea as with MPI, but in this case we only need to know the public IP of the chief worker, and a custom hook will take care of the synchronization for us!
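The repository's socket hook has its own protocol; purely to illustrate the idea, here is a minimal sketch (assumed message format: a bare JSON list of floats, small enough to fit in one recv) where a worker sends its weights to the chief, which replies with the average.

```python
import json
import socket
import threading
import time

# Placeholder address; a real setup would use the chief worker's public IP.
CHIEF_ADDR = ("127.0.0.1", 50515)

def chief(own_weights, result):
    """Accept one worker, average its weights with ours, send the result back."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind(CHIEF_ADDR)
    srv.listen(1)
    conn, _ = srv.accept()
    worker_weights = json.loads(conn.recv(4096).decode())
    averaged = [(a + b) / 2 for a, b in zip(own_weights, worker_weights)]
    conn.sendall(json.dumps(averaged).encode())
    conn.close()
    srv.close()
    result["averaged"] = averaged

def worker(own_weights):
    """Send local weights to the chief and return the averaged model."""
    for _ in range(50):  # retry until the chief is listening
        try:
            sock = socket.create_connection(CHIEF_ADDR, timeout=5)
            break
        except OSError:
            time.sleep(0.1)
    sock.sendall(json.dumps(own_weights).encode())
    averaged = json.loads(sock.recv(4096).decode())
    sock.close()
    return averaged

# Chief holds [1.0, 3.0]; the single worker contributes [3.0, 5.0].
result = {}
t = threading.Thread(target=chief, args=([1.0, 3.0], result))
t.start()
print(worker([3.0, 5.0]))  # [2.0, 4.0]
t.join()
```

A production version would frame messages with a length prefix and handle many workers per round; this only shows why knowing the chief's address is enough for synchronization.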
Check this to see an easier implementation with Keras!
Check this script to see how to generate CIFAR-10 TFRecords.
Troubleshooting and Help
coMind has public Slack and Telegram channels, which are a great place to ask questions and discuss all things related to federated machine learning.
Bugs and Issues
Have a bug or an issue? Open a new issue here on GitHub or join our community in Slack or Telegram.
The Federated Averaging algorithm is explained in more detail in the following paper:
H. B. McMahan, E. Moore, D. Ramage, S. Hampson, and B. A. y Arcas. Communication-efficient learning of deep networks from decentralized data. In Conference on Artificial Intelligence and Statistics, 2017.
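The server-side update at the heart of that paper can be stated compactly: in each round, the K participating clients run local training starting from the current global weights, and the server replaces the global model with the data-weighted average of the clients' results.

```latex
w_{t+1} \leftarrow \sum_{k=1}^{K} \frac{n_k}{n}\, w_{t+1}^{k},
\qquad n = \sum_{k=1}^{K} n_k
```

Here $w_{t+1}^{k}$ are client $k$'s weights after its local training and $n_k$ is the size of its local dataset.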
The datasets used in these examples were:
Alex Krizhevsky. Learning Multiple Layers of Features from Tiny Images.
Han Xiao, Kashif Rasul, Roland Vollgraf. Fashion-MNIST: a Novel Image Dataset for Benchmarking Machine Learning Algorithms.
coMind is an open source project for training privacy-preserving federated deep learning models.