asaiacai/README.md

👋 I'm Andrew.

Currently

  • I'm CTO at Trainy (Y Combinator Summer 2023), where I help customers spin up infrastructure for managing GPU clusters at scale for generative AI training and serving applications. 🤖
  • The reference architecture I currently work on is Konduktor, a Kubernetes-based platform built from existing open-source tools. It makes it easier for ML engineers to scale out model training on GPU clusters and gives cluster administrators cloud-native tools to maintain the health of those clusters. 🚅
  • When I do get a free moment, I occasionally make small contributions to SkyPilot. I currently develop and maintain its DigitalOcean/Paperspace integration. In the summer of 2023, this project saved my life and got me GPUs when I needed them. I highly recommend it, and the team running it is awesome 😊 🛩️
  • Previously, I was an ML Engineer at Hive AI, where I led the ML engineering for a logo detection product as well as internal research on style transfer via textual inversion for text-to-image diffusion models.

In a previous life, I was a physics Ph.D. student 👨‍🔬 studying solid-state physics ⚛️ under Mike Crommie. You'll probably find a lot of my old research code here. My personal website is here; opinions are my own.

Pinned

  1. Trainy-ai/konduktor (Public)

    cluster/scheduler health monitoring for GPU jobs on k8s

    Python · 43 stars · 1 fork

  2. skypilot (Public)

    Forked from skypilot-org/skypilot

    SkyPilot: Run LLMs, AI, and Batch jobs on any cloud. Get maximum savings, highest GPU availability, and managed execution—all with a simple interface.

    Python

  3. Trainy-ai/nodify (Public)

    Profiling tools for distributed training

    HTML · 37 stars · 4 forks

  4. Trainy-ai/llm-atc (Public archive)

    Fine-tuning and serving LLMs on any cloud

    Python · 86 stars · 2 forks