Nadavka-cmd/README.md

Nadav Kama — HPC & Infrastructure Engineer

Linux systems engineer specializing in HPC cluster administration, GPU compute infrastructure, and research platform engineering.

What I Work On

  • HPC Cluster Operations — Multi-node GPU cluster running Slurm, Open OnDemand, Rocky Linux 9, Active Directory/SSSD integration
  • GPU Infrastructure — Deployment and management of NVIDIA A5000, A6000, RTX 3090, P40 nodes; Apptainer/Singularity container workflows
  • Automation & IaC — AWX/Ansible playbooks for node provisioning, driver installation, cluster finalization
  • Storage — TrueNAS/ZFS NFS backends, XFS scratch filesystem management across compute nodes
  • Monitoring — Prometheus + Grafana + Loki/Promtail stack for cluster observability
  • Tooling — Python TUI tools (Textual) for Slurm administration, config sync, and scratch auditing

Stack

Linux (Rocky 9 / RHEL) · Slurm · Open OnDemand · Ansible · Python · Bash
Active Directory / SSSD · TrueNAS / ZFS · Prometheus · Grafana · Loki · NVIDIA CUDA

Location

HPC Infrastructure & Platform Engineer — Electrical & Computer Engineering Dept., Ben-Gurion University of the Negev

Pinned

  1. slurm-advisor (Public)

    Web-based Slurm job advisor — partition recommendations, queue pressure, pending job explainer, and GPU efficiency tracking for HPC clusters

    HTML 1

  2. hpc-admin-portal-demo (Public)

    Sanitized demo of a FastAPI-based HPC admin portal for Slurm, AWX automation, LDAP/AD workflows, storage views, quotas, and cluster operations.

    HTML 1

  3. hpc-admin-tui (Public)

    Terminal application for HPC administrators, used to manage and maintain the GPU cluster in the Electrical Engineering Department

    Python

  4. g-seff (Public)

    GPU job efficiency reporter for Slurm HPC clusters — tracks GPU, CPU, and memory utilization per job via Prometheus DCGM and sacct

    HTML