# Understanding 3rd Party Packaging

## Summary

This template provides a starting point for ML/DL projects that use GPU acceleration. It relies on mainstream tools such as virtualenv, pip, Docker, PyTorch, and TensorFlow to configure a development environment with GPU access, isolate dependencies, and test GPU training.

## Top 4 Key Points

- Check that a virtualenv is active so packages are managed separately from the system Python

- Dockerfiles are included to build GPU container images

- PyTorch and TensorFlow tests validate that the GPU works

- Tools like BentoML and Hugging Face integrate nicely
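
The first and third points can be checked from Python itself. The following is a minimal sketch, not the template's own test code: the function names `in_virtualenv` and `gpu_report` are hypothetical, and the framework imports are guarded so the script still runs when PyTorch or TensorFlow is not installed.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the interpreter runs inside a venv or virtualenv."""
    # Inside a virtual environment sys.prefix points at the environment,
    # while sys.base_prefix still points at the base Python installation.
    # (Very old virtualenv releases used sys.real_prefix instead.)
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


def gpu_report() -> dict:
    """Report GPU visibility per framework; None means 'not installed'."""
    report = {"torch": None, "tensorflow": None}
    try:
        import torch

        report["torch"] = torch.cuda.is_available()
    except ImportError:
        pass
    try:
        import tensorflow as tf

        report["tensorflow"] = len(tf.config.list_physical_devices("GPU")) > 0
    except ImportError:
        pass
    return report


if __name__ == "__main__":
    print("virtualenv active:", in_virtualenv())
    print("GPU visible:", gpu_report())
```

A `False` from a framework that is installed usually points at a driver or CUDA mismatch rather than at the Python environment.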

## Reflection Questions

🔹 1. How could a Makefile streamline training model experiments?

A Makefile provides a standardized way to define commands as targets. In ML experiments, there are usually repetitive steps such as:

- setting up data,

- preprocessing,

- training,

- evaluating,

- cleaning checkpoints.

By writing these as Makefile rules (e.g., `make train`, `make eval`), you:

- avoid remembering long shell commands,

- enforce consistency across different runs,

- enable reproducibility by documenting the exact workflow.

This also makes collaboration easier: anyone on the team can run the same experiments with `make`.

🔹 2. Why isolate Python dependencies in virtualenvs and containers?

ML projects often rely on specific versions of TensorFlow, PyTorch, CUDA, scikit-learn, etc. If you install everything globally:

- version conflicts arise between projects,

- reproducing results becomes difficult,

- system packages might break.

Using virtualenv (or the built-in venv) gives project-level isolation of Python packages.
Using containers (e.g., Docker) goes further by isolating the OS userland as well, including system libraries and the CUDA toolkit. The NVIDIA kernel driver itself stays on the host and is exposed to the container at runtime, so the host driver still has to be compatible with the CUDA version in the image.

👉 This makes it far more likely that a training run on one machine (or in production) behaves the same on another.
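
One practical consequence of isolation is that the environment can be recorded exactly. The sketch below uses only the standard library to snapshot the active environment in a `pip freeze`-like form; the function names `environment_snapshot` and `as_requirements` are hypothetical and not part of the template.

```python
import platform
import sys
from importlib import metadata


def environment_snapshot() -> dict:
    """Collect interpreter and package versions for the active environment."""
    packages = {}
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:  # skip broken installs that lack a name
            packages[name.lower()] = dist.version
    return {
        "python": platform.python_version(),
        "in_virtualenv": sys.prefix != sys.base_prefix,
        "packages": dict(sorted(packages.items())),
    }


def as_requirements(snapshot: dict) -> str:
    """Render the snapshot as pinned 'name==version' lines."""
    return "\n".join(f"{n}=={v}" for n, v in snapshot["packages"].items())


if __name__ == "__main__":
    print(as_requirements(environment_snapshot()))
```

Storing this output next to each experiment's results makes it possible to rebuild the same package set later, inside a virtualenv or a container image.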

🔹 3. What role do GitHub Actions play in an ML Ops pipeline?

GitHub Actions act as the automation backbone of an ML pipeline:

- CI/CD for ML code: test code, linting, style checks.

- Data and model workflows: trigger training jobs when new data is pushed.

- Model validation: automatically evaluate models on benchmarks before merging.

- Deployment: push trained models to a registry or cloud service when tests pass.

This removes manual steps, reduces human error, and ensures continuous, reliable delivery of ML models.
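
The model validation step can be as simple as a script whose exit code decides whether the workflow passes. The following is a sketch under assumptions: the metric names, thresholds, and the functions `failed_checks` and `exit_code` are hypothetical, and a real pipeline would read the metrics from an evaluation job's output.

```python
def failed_checks(metrics: dict, minimums: dict) -> list:
    """Return a human-readable line for every metric below its minimum."""
    failures = []
    for name, minimum in minimums.items():
        value = metrics.get(name)
        if value is None:
            failures.append(f"{name}: missing")
        elif value < minimum:
            failures.append(f"{name}: {value:.3f} < {minimum:.3f}")
    return failures


def exit_code(metrics: dict, minimums: dict) -> int:
    """Map the check result to a process exit code (0 = pass, 1 = fail)."""
    failures = failed_checks(metrics, minimums)
    for line in failures:
        print("FAIL", line)
    return 1 if failures else 0
```

A workflow step would call something like `sys.exit(exit_code(metrics, minimums))`; because CI treats a non-zero exit code as a failed step, a model that regresses blocks the merge.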

🔹 4. How could BentoML serve models for low-latency requests?

BentoML is designed for model serving. It wraps trained models into a standardized API service (REST/gRPC) with:

- efficient model loading,

- optimized inference pipelines,

- autoscaling support.

For low-latency inference (e.g., fraud detection, personalized recommendations), BentoML:

- keeps the model in memory,

- handles concurrent requests,

- integrates with GPU/CPU optimizations,

- supports containerization for deployment.

This makes it production-ready without needing to manually write Flask/FastAPI wrappers.
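
To see why keeping the model in memory matters for latency, consider the load-once pattern below. This is plain Python, not BentoML's API: `get_model`, `handle_request`, and the toy weights are hypothetical stand-ins for what a serving framework does on your behalf.

```python
import functools
import time


@functools.lru_cache(maxsize=1)
def get_model():
    """Load the model once per process; later calls reuse the cached object."""
    time.sleep(0.05)  # stand-in for slow deserialization of real weights
    weights = {"amount": 0.8, "velocity": 0.5}  # hypothetical toy weights

    def predict(features: dict) -> float:
        # Weighted sum over known features; missing features count as 0.
        return sum(w * features.get(k, 0.0) for k, w in weights.items())

    return predict


def handle_request(features: dict) -> float:
    """Per-request work is inference only; loading cost is paid once."""
    return get_model()(features)
```

Only the first request pays the loading cost. A serving framework adds the parts this sketch leaves out, such as the HTTP layer, request concurrency, and packaging.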

🔹 5. When would fine-tuning a Hugging Face model be preferred over training from scratch?

Fine-tuning a pre-trained Hugging Face model (e.g., BERT, GPT-2, ViT) is preferred when:

- Data is limited → you leverage knowledge from massive pretraining corpora.

- Domain transfer is needed → e.g., adapting BERT to biomedical text, as BioBERT does by continuing pretraining on biomedical corpora before task fine-tuning, instead of training from scratch.

- Compute is limited → training from scratch on billions of tokens/images requires GPUs/TPUs at scale.

Training from scratch is only justified when:

- you have a huge, domain-specific dataset,

- or the existing pre-trained models are fundamentally mismatched with your task.
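
For the limited-data case, fine-tuning can be sketched in a few lines with the `transformers` library. This is a minimal illustration under assumptions, not a tuned recipe: the function name `fine_tune_distilbert` and its defaults are hypothetical, it trains on a single in-memory batch, and it downloads the `distilbert-base-uncased` checkpoint on first use. Imports sit inside the function so the module loads even where `torch` and `transformers` are not installed.

```python
def fine_tune_distilbert(texts, labels, num_labels=2, epochs=1, lr=5e-5):
    """Fine-tune DistilBERT for classification on one small batch.

    `labels` must be integers in the range 0..num_labels-1.
    Returns the model and the last training loss.
    """
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    name = "distilbert-base-uncased"
    tokenizer = AutoTokenizer.from_pretrained(name)
    # The pretrained encoder is reused; only the classification head is new.
    model = AutoModelForSequenceClassification.from_pretrained(
        name, num_labels=num_labels
    )

    device = "cuda" if torch.cuda.is_available() else "cpu"
    model.to(device)

    batch = tokenizer(
        list(texts), padding=True, truncation=True, return_tensors="pt"
    ).to(device)
    target = torch.tensor(list(labels), device=device)

    optimizer = torch.optim.AdamW(model.parameters(), lr=lr)
    model.train()
    loss_value = None
    for _ in range(epochs):
        optimizer.zero_grad()
        output = model(**batch, labels=target)  # loss computed by the model
        output.loss.backward()
        optimizer.step()
        loss_value = output.loss.item()
    return model, loss_value
```

A call such as `fine_tune_distilbert(["great", "terrible"], [1, 0])` runs one update step; a real run would iterate over mini-batches from a dataset and hold out data for evaluation.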

## Challenge Exercises

- Adjust hyperparameters in the PyTorch GPU test code

- Log GPU usage with `nvidia-smi` during model training

- Build a Docker container to run TensorFlow code

- Serve a scikit-learn model with BentoML locally

- Fine-tune a DistilBERT model on a small text corpus