Peutlefaire edited this page Nov 21, 2022 · 1 revision

ML-Tech-Cheatsheet 📄

Personal "cheatsheet" repository for my ideal machine learning tech stack. I use this repository to experiment and familiarize myself with ML libraries, advanced Git and GitHub features, virtualization and so on 🤓.

Table of Contents 📜

  1. IDE Plugins 🧰
    1. VSCode
    2. PyCharm
  2. Machine Learning Libraries 🤖
    1. The classics
    2. PyTorch, Lightning and W&B
    3. albumentations
    4. einops
    5. gradio and streamlit
  3. Environments 🌎
    1. conda
    2. docker
  4. CLI Utilities 👨‍💻
  5. High Performance Computing 🦾
    1. slurm
  6. Git 🐱
    1. Protected Branches
    2. Tags and Releases
    3. LFS
    4. Hidden Directory
    5. GitHub Actions
    6. GitHub Pages

IDE Plugins 🧰

VSCode

  • Python
  • RainbowCSV
  • Remote
  • CoPilot
  • GitLens
  • Docker
  • Jupyter
  • Gitignore
  • vscode-pdf

PyCharm

  • GitToolBox
  • CoPilot
  • Docker

Machine Learning Libraries 🤖

The classics

  • NumPy - Math operations, manipulations, linear algebra and more.
  • Pandas - Tabular data management.
  • Matplotlib and Seaborn - All sorts of plots.
  • OpenCV, Pillow, and scikit-image - Image manipulation.

PyTorch, Lightning and W&B

PyTorch is one of the most widely used deep learning frameworks for Python.

Weights and Biases (W&B) makes it easy to track experiments, metrics, hyperparameters and so on in a single place.

PyTorch Lightning gets rid of most of the usual PyTorch boilerplate code, like train/val/test loops, backward passes and optimizer steps. It also makes it easy to use powerful PyTorch features and other libraries (like W&B) by adding just a few optional parameters here and there.
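To make the "boilerplate" concrete, here is a minimal sketch of the manual loop that plain PyTorch requires and that Lightning takes over. The toy data, the one-layer model and the hyperparameters are assumptions chosen only for illustration.

```python
import torch
from torch import nn

torch.manual_seed(0)  # reproducible toy data

# Toy regression data (an assumption for illustration): y = 3x + 1 plus noise.
x = torch.linspace(-1, 1, 64).unsqueeze(1)
y = 3 * x + 1 + 0.05 * torch.randn_like(x)

model = nn.Linear(1, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.MSELoss()

losses = []
for epoch in range(200):
    optimizer.zero_grad()        # reset gradients from the previous step
    loss = loss_fn(model(x), y)  # forward pass
    loss.backward()              # backward pass
    optimizer.step()             # parameter update
    losses.append(loss.item())   # manual bookkeeping of the training loss
```

With Lightning, the zero_grad/backward/step lines and the epoch loop move into the framework, and the user code shrinks to describing the model, the loss and the optimizer.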

albumentations

albumentations implements a wide range of popular image augmentations, such as ColorJitter, ZoomBlur and GaussNoise.

einops

einops expresses tensor manipulations (reshaping, transposing, concatenating, ...) as short, readable patterns, which makes them intuitive to write and saves a lot of time.

gradio and streamlit

To quickly create interactive apps based on trained machine learning models, gradio and streamlit are among the most popular frameworks.

Environments 🌎

conda

Conda makes it easy to create and share virtual environments. The command conda env export > environment.yml writes a .yml file describing the active environment, which can then be used to recreate an identical environment elsewhere with conda env create -f environment.yml.

Docker

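Docker packages an application together with its dependencies and user-space environment into a container. Unlike a virtual machine, a container does not emulate a whole operating system: it shares the host kernel while keeping its own isolated filesystem.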

CLI Utilities 👨‍💻

  • nvidia-smi ➡️ Check the current status of NVIDIA GPUs.
  • ps, top, htop ➡️ Check currently running processes.
  • nvitop ➡️ Like nvidia-smi, but better.
  • tmux ➡️ Terminal multiplexer, makes it easy to detach from long-running jobs.
  • ~/.ssh/config and ~/.ssh/authorized_keys ➡️ Define host aliases and the SSH keys authorized to log in.

High Performance Computing 🦾

slurm

HPC clusters typically use a cluster management and job scheduling tool. Slurm can schedule jobs, handle priorities, define partitions and much more. Cheatsheet files for Slurm are under the /slurm folder.
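As a sketch of what a job submission looks like, the helper below renders a minimal sbatch script from Python. The function name render_sbatch, the partition name "gpu" and the command python train.py are hypothetical; the #SBATCH options themselves are standard Slurm flags, but the partitions and GPU resources available depend on the cluster.

```python
def render_sbatch(job_name, command, partition="gpu", gpus=1, time_limit="01:00:00"):
    """Return the text of a minimal sbatch submission script."""
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={job_name}",
        f"#SBATCH --partition={partition}",     # hypothetical partition name
        f"#SBATCH --gres=gpu:{gpus}",           # number of GPUs requested
        f"#SBATCH --time={time_limit}",         # wall-clock limit, HH:MM:SS
        f"#SBATCH --output={job_name}-%j.out",  # %j expands to the job ID
        "",
        command,
    ]
    return "\n".join(lines) + "\n"


# Hypothetical training job; write the result to a file and submit it with sbatch.
script = render_sbatch("train", "python train.py")
print(script)
```

Saving the output as, say, train.sh and running sbatch train.sh queues the job; squeue then shows its state.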

Git 🐱

Taking the time to go through most of GitHub's documentation at least once is very important. Here are a few features to keep in mind.

Protected Branches

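Protected branches enforce rules, such as required pull request reviews or passing status checks, before changes can be merged into important branches.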

Tags and Releases

Important commits can be tagged. Jumping back to a tagged commit is then as easy as checking out the tag by name.

LFS

Git Large File Storage (LFS) makes it possible to push larger files to a GitHub repository. Careful: the usage quota is global per GitHub account and is shared across repositories.

Hidden Directory

The .github directory keeps the landing page of the GitHub repository "clean" by holding repository-level files such as GitHub Actions workflows and issue or pull request templates.

GitHub Actions

GitHub Actions runs custom workflows automatically when they are triggered by events (pull requests, pushes, opened issues, ...).

GitHub Pages

GitHub Pages can host a website for each GitHub repository.