Personal "cheatsheet" repository for my ideal machine learning tech stack. I use this repository to play around with and get familiar with ML libraries, advanced Git and GitHub features, virtualization, and so on 🤓.
- IDE Plugins 🧰
- Machine Learning Libraries 🤖
- Environments 🌎
- CLI Utilities 👨‍💻
- High Performance Computing 🦾
- Git 🐱
- Python
- RainbowCSV
- Remote
- CoPilot
- GitLens
- Docker
- Jupyter
- Gitignore
- vscode-pdf
- GitToolBox
- CoPilot
- Docker
- NumPy - Math operations, manipulations, linear algebra and more.
- Pandas - Tabular data management.
- Matplotlib and Seaborn - All sorts of plots.
- OpenCV (cv2), Pillow, and scikit-image - Image manipulation.
PyTorch is currently one of the most widely used deep learning frameworks for Python.
Weights and Biases (W&B) makes it easy to track experiments, metrics, hyperparameters, and more in a single place.
PyTorch Lightning removes most of the usual PyTorch boilerplate, such as the train/val/test loops, the backward pass, and optimizer steps. It also makes it easy to use advanced PyTorch features (e.g. mixed precision or multi-GPU training) and other libraries (like W&B) by setting just a few optional parameters here and there.
albumentations implements a wide range of popular image augmentations, such as ColorJitter, ZoomBlur, and Gaussian noise.
Manipulation of tensors (reshaping, concatenating, ...) with einops is extremely intuitive and time-saving.
To quickly create interactive apps based on trained machine learning models, gradio and streamlit are among the most popular frameworks.
Conda makes it easy to create and share virtual environments. The command `conda env export > environment.yml` creates a .yml file from which an identical virtual environment can be recreated, e.g. with `conda env create -f environment.yml`.
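A minimal sketch of the same export/recreate round trip driven from Python, assuming `conda` is on the `PATH`. `conda env export`, its `--no-builds` flag (which omits platform-specific build strings), and `conda env create -f` are standard conda commands; the helper names are hypothetical.

```python
import shutil
import subprocess
from pathlib import Path


def export_cmd(no_builds: bool = True) -> list[str]:
    """Command equivalent to `conda env export` (stdout is written to a file below)."""
    cmd = ["conda", "env", "export"]
    if no_builds:
        # Omit platform-specific build strings to make the file more portable.
        cmd.append("--no-builds")
    return cmd


def create_cmd(path: str = "environment.yml") -> list[str]:
    """Command that recreates an environment from an exported file."""
    return ["conda", "env", "create", "-f", path]


def write_environment_file(path: str = "environment.yml", no_builds: bool = True) -> None:
    """Python equivalent of the shell redirection `conda env export > environment.yml`."""
    if shutil.which("conda") is None:
        raise RuntimeError("conda was not found on PATH")
    result = subprocess.run(export_cmd(no_builds), check=True,
                            capture_output=True, text=True)
    Path(path).write_text(result.stdout)
```

Exporting with `--no-builds` trades exact pinning for portability; when both machines share the same platform, the full export pins the environment more precisely.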
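Docker packages an application together with its system-level dependencies (OS libraries, packages, tools) into a container that runs isolated from the host while sharing the host's kernel.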
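As a sketch, the helper below assembles a typical `docker run` invocation for ML work: `--rm` removes the container on exit, `-it` attaches an interactive terminal, `-v` mounts a host directory, `-w` sets the working directory, and `--gpus all` exposes the host GPUs (this last flag assumes the NVIDIA Container Toolkit is installed). The function name, image tag, and paths are placeholders.

```python
from __future__ import annotations

import shlex


def docker_run_cmd(image: str, host_dir: str, container_dir: str = "/workspace",
                   gpus: bool = True, command: list[str] | None = None) -> list[str]:
    """Build a `docker run` command that mounts a project directory into a container."""
    cmd = ["docker", "run", "--rm", "-it"]  # remove container on exit; interactive TTY
    if gpus:
        # Requires the NVIDIA Container Toolkit on the host.
        cmd += ["--gpus", "all"]
    # Mount the host directory and use it as the working directory.
    cmd += ["-v", f"{host_dir}:{container_dir}", "-w", container_dir, image]
    cmd += list(command or [])
    return cmd


print(shlex.join(docker_run_cmd("pytorch/pytorch:latest", "/home/me/project",
                                command=["python", "train.py"])))
```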
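- `nvidia-smi` ➡️ Check the current status of NVIDIA cards (utilization, memory, running processes).
- `ps`, `top`, `htop` ➡️ Check currently running processes.
- `nvitop` ➡️ Like `nvidia-smi`, but interactive and easier to read.
- `tmux` ➡️ Terminal multiplexer: sessions can be detached, so jobs keep running after the SSH connection is closed.
- `~/.ssh/config` and `~/.ssh/authorized_keys` ➡️ Files that define host aliases with their connection settings, and the public SSH keys authorized to log in, respectively.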
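For scripting, `nvidia-smi` can also emit machine-readable CSV through `--query-gpu=<fields>` and `--format=csv,noheader,nounits`. A minimal sketch that parses that output into Python dicts; the selected fields and helper names are assumptions, and the parser is exercised on a made-up sample string rather than live output.

```python
import shutil
import subprocess

FIELDS = ["index", "name", "memory.used", "memory.total", "utilization.gpu"]
NUMERIC = {"index", "memory.used", "memory.total", "utilization.gpu"}


def parse_gpu_csv(text: str) -> list[dict]:
    """Parse `nvidia-smi --query-gpu=... --format=csv,noheader,nounits` output."""
    gpus = []
    for line in text.strip().splitlines():
        row = dict(zip(FIELDS, (value.strip() for value in line.split(","))))
        for key in NUMERIC:
            # With `nounits`, memory is a bare number in MiB and utilization in %;
            # unsupported fields show up as "[N/A]", which we map to None.
            row[key] = int(row[key]) if row[key].isdigit() else None
        gpus.append(row)
    return gpus


def query_gpus() -> list[dict]:
    """Query the local GPUs, or return an empty list if nvidia-smi is missing."""
    if shutil.which("nvidia-smi") is None:
        return []
    cmd = ["nvidia-smi", "--query-gpu=" + ",".join(FIELDS),
           "--format=csv,noheader,nounits"]
    out = subprocess.run(cmd, check=True, capture_output=True, text=True).stdout
    return parse_gpu_csv(out)


# Made-up sample output, so the parser can be tried without a GPU.
sample = "0, NVIDIA A100-SXM4-40GB, 1024, 40960, 37\n1, NVIDIA A100-SXM4-40GB, 0, 40960, 0\n"
for gpu in parse_gpu_csv(sample):
    print(gpu["index"], gpu["name"], f"{gpu['memory.used']}/{gpu['memory.total']} MiB",
          f"{gpu['utilization.gpu']}%")
```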
HPC clusters typically rely on a cluster management and job scheduling system. Slurm is a widely used one: it schedules jobs, handles priorities, organizes nodes into partitions, and much more. Cheatsheet files for Slurm are under the /slurm folder.
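A minimal sketch of generating a batch script with common `#SBATCH` directives (`--job-name`, `--partition`, `--time`, `--gres`, `--cpus-per-task`, `--output`, where `%j` expands to the job ID) and submitting it with `sbatch`. The partition name `gpu`, the resource amounts, and the `train.py` command are placeholders; real partition names and limits depend on the cluster.

```python
import shutil
import subprocess
import tempfile


def render_sbatch(job_name: str, command: str, partition: str = "gpu",
                  walltime: str = "02:00:00", gpus: int = 1, cpus: int = 8) -> str:
    """Return the text of a Slurm batch script with common #SBATCH directives."""
    directives = {
        "job-name": job_name,
        "partition": partition,          # cluster-specific name (placeholder)
        "time": walltime,                # wall-clock limit, HH:MM:SS
        "gres": f"gpu:{gpus}",           # generic resources, here GPUs per node
        "cpus-per-task": str(cpus),
        "output": f"{job_name}-%j.out",  # %j is replaced by the job ID
    }
    lines = ["#!/bin/bash"]
    lines += [f"#SBATCH --{key}={value}" for key, value in directives.items()]
    lines += ["", command, ""]
    return "\n".join(lines)


def submit(script: str) -> str:
    """Write the script to a temporary file and submit it with sbatch."""
    if shutil.which("sbatch") is None:
        raise RuntimeError("sbatch not found (not on a Slurm login node?)")
    with tempfile.NamedTemporaryFile("w", suffix=".sh", delete=False) as f:
        f.write(script)
    # On success, sbatch prints "Submitted batch job <id>".
    result = subprocess.run(["sbatch", f.name], check=True,
                            capture_output=True, text=True)
    return result.stdout.strip()


print(render_sbatch("resnet50", "python train.py --epochs 10"))
```

Once submitted, `squeue -u $USER` lists your pending and running jobs, and `scancel <jobid>` cancels one.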
It is well worth taking the time to go through most of GitHub's documentation at least once. Here are a few features to keep in mind.
Important commits can be tagged, which makes it easy to jump back to them later.
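A minimal sketch of the usual tagging workflow with standard git commands (`git tag -a`, `git checkout <tag>`, `git rev-parse`), run from Python in a throwaway repository. It assumes `git` is installed; the tag name `v1.0` and the helper names are illustrative.

```python
import subprocess
import tempfile

# Throwaway identity and no signing, so the demo works without any global git config.
CFG = ["-c", "user.name=Demo", "-c", "user.email=demo@example.com",
       "-c", "commit.gpgsign=false", "-c", "tag.gpgsign=false"]


def git(repo: str, *args: str) -> str:
    """Run a git command inside `repo` and return its stdout."""
    result = subprocess.run(["git", "-C", repo, *CFG, *args], check=True,
                            capture_output=True, text=True)
    return result.stdout.strip()


def demo() -> tuple[str, str]:
    """Tag a commit, move past it, then jump back to the tag."""
    with tempfile.TemporaryDirectory() as repo:
        git(repo, "init")
        git(repo, "commit", "--allow-empty", "-m", "baseline model")
        # Annotated tag on the current commit.
        git(repo, "tag", "-a", "v1.0", "-m", "baseline model")
        git(repo, "commit", "--allow-empty", "-m", "more experiments")
        # Jump back to the tagged commit (this detaches HEAD).
        git(repo, "checkout", "v1.0")
        head = git(repo, "rev-parse", "HEAD")
        tagged = git(repo, "rev-parse", "v1.0^{commit}")
        return head, tagged
```

Because checking out a tag detaches HEAD, `git switch -c <branch> v1.0` is handy for continuing work from there. Tags are not pushed by default: `git push origin v1.0` (or `git push --tags`) publishes them.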
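Git Large File Storage (LFS) allows you to push bigger files to a GitHub repository: Git keeps small pointer files, while the actual contents are stored on a separate server. Careful: there is a global usage quota per GitHub account that applies across all of its repositories.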
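Git LFS is typically enabled once per machine with `git lfs install`, and `git lfs track "<pattern>"` then records each pattern in `.gitattributes` as a line such as `*.ckpt filter=lfs diff=lfs merge=lfs -text`. The hypothetical helper below scans a directory for files above a size threshold and suggests such lines by file extension. The 50 MiB default is an assumption based on GitHub warning about files above 50 MiB and blocking files above 100 MiB.

```python
import tempfile
from pathlib import Path

# Attribute string that `git lfs track "<pattern>"` writes to .gitattributes.
LFS_ATTRS = "filter=lfs diff=lfs merge=lfs -text"


def lfs_patterns(root: str, threshold_bytes: int = 50 * 1024 * 1024) -> list[str]:
    """Suggest .gitattributes lines for extensions whose files exceed the threshold."""
    root_path = Path(root)
    suffixes = set()
    for path in root_path.rglob("*"):
        # Skip directories and anything inside the .git folder itself.
        if not path.is_file() or ".git" in path.relative_to(root_path).parts:
            continue
        # Files without an extension are ignored by this simple sketch.
        if path.suffix and path.stat().st_size > threshold_bytes:
            suffixes.add(path.suffix)
    return [f"*{suffix} {LFS_ATTRS}" for suffix in sorted(suffixes)]


with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "model.ckpt").write_bytes(b"\0" * 2048)
    (Path(tmp) / "notes.txt").write_text("small file")
    print(lfs_patterns(tmp, threshold_bytes=1024))
```

In practice, running `git lfs track "*.ckpt"` for each suggested extension is preferable to editing `.gitattributes` by hand.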
The `.github` directory helps keep the repository's landing page "clean" and can include:
- CONTRIBUTING.md ➡️ Guidelines to contribute to the repository.
- ISSUE_TEMPLATE.md ➡️ Template for issues.
- PULL_REQUEST_TEMPLATE.md ➡️ Template for pull requests.
- README.md ➡️ The repository's README file (i.e., this file).
- workflows ➡️ Directory containing the .yaml files for GitHub Actions.
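A small sketch that scaffolds the layout listed above with placeholder contents; the `scaffold_github_dir` name and the placeholder text are hypothetical. GitHub also recognizes most of these files in the repository root or in `docs/`.

```python
import tempfile
from pathlib import Path

# Placeholder contents; replace with the project's actual guidelines and templates.
FILES = {
    "CONTRIBUTING.md": "# Contributing\n\nHow to propose changes.\n",
    "ISSUE_TEMPLATE.md": "## Description\n\n## Steps to reproduce\n",
    "PULL_REQUEST_TEMPLATE.md": "## What does this PR do?\n\n## Checklist\n- [ ] Tests pass\n",
    "README.md": "# Project name\n",
}


def scaffold_github_dir(repo_root: str) -> list[str]:
    """Create .github/ with placeholder files; existing files are left untouched."""
    github = Path(repo_root) / ".github"
    (github / "workflows").mkdir(parents=True, exist_ok=True)
    for name, content in FILES.items():
        target = github / name
        if not target.exists():
            target.write_text(content)
    # Return every created entry relative to the repository root.
    return sorted(p.relative_to(repo_root).as_posix() for p in github.rglob("*"))


with tempfile.TemporaryDirectory() as tmp:
    for entry in scaffold_github_dir(tmp):
        print(entry)
```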
GitHub Actions runs custom workflows automatically when certain events occur (pull requests, pushes, opened issues, ...).
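A minimal sketch of a workflow that runs tests on every push and pull request. It is written from Python only to keep these examples in one language; the YAML string is what would live in `.github/workflows/tests.yml`. `actions/checkout` and `actions/setup-python` are official GitHub actions; the pinned major versions, the Python version, the `requirements.txt` file, and the `pytest` step are assumptions about the project.

```python
import tempfile
from pathlib import Path

# Workflow definition: triggered by pushes and pull requests, runs pytest on Ubuntu.
WORKFLOW = """\
name: tests

on:
  push:
  pull_request:

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - run: pip install -r requirements.txt pytest
      - run: pytest
"""


def write_workflow(repo_root: str, name: str = "tests.yml") -> Path:
    """Write the workflow under .github/workflows/, creating directories as needed."""
    path = Path(repo_root) / ".github" / "workflows" / name
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(WORKFLOW)
    return path


with tempfile.TemporaryDirectory() as tmp:
    print(write_workflow(tmp).relative_to(tmp).as_posix())
```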
GitHub Pages lets you host a static website for each GitHub repository.