## Key Terms

Microservice - A small, independently deployable service that encapsulates a single piece of reusable logic and runs in production environments.

Continuous Integration (CI) - The practice of frequently merging code changes into a shared repository and automatically building and testing changes to catch issues early.

Continuous Delivery - A development practice where incremental software changes can be reliably released at any time through automated deployments.

End-to-End MLOps - Fully automating the machine learning lifecycle, from model development through deployment and hosting (for example, on platforms like Hugging Face Spaces).

AWS App Runner - A fully managed service for deploying containerized web services and APIs.

Flask - A popular, lightweight Python web application framework.

Makefile - A file containing a set of directives used to automate building and managing a project.

Requirements File - A text file containing a list of Python package dependencies used by an application.

## Top 3 Key Points

1. MLOps inherits from DevOps and brings its automation practices to machine learning.
2. Whether a lightweight or heavy MLOps approach fits depends on the project's needs.
3. DevOps comes first, followed by DataOps, an MLOps platform, and business alignment.


# Reflection Questions: Short Answers


**1. How is MLOps different from traditional software engineering?**
MLOps focuses on managing the full ML lifecycle—data, models, and deployment—whereas traditional software engineering mainly manages code and application logic.

**2. When would you choose a lightweight vs heavy MLOps system?**
Lightweight for small projects or quick experiments; heavy for large, complex systems that need scalability, monitoring, and governance.

**3. What cultural changes are needed to adopt MLOps practices?**
Teams must embrace collaboration between data scientists, engineers, and operations, and adopt continuous improvement and automation mindsets.

**4. How could data poisoning threats impact an organization?**
Attackers who inject corrupted or malicious samples into training data can produce biased or deliberately manipulated models, which harm decision-making and erode trust.

**5. Why is business alignment important for MLOps success?**
Without business alignment, ML efforts risk producing models that don’t deliver real value or solve the right problems.

**6. What data storage approach best meets your model training needs?**
Choose based on data type and scale—structured data may suit relational databases, while large unstructured data needs object storage.

**7. Which maturity level best describes your team's current MLOps state?**
It depends on your practices: manual workflows indicate an early stage, while automated pipelines with monitoring indicate an advanced one.

**8. How could a centralized feature store help your model development process?**
It standardizes features, reduces duplication, and ensures consistency between training and serving environments.

**9. What cultural changes are needed to implement CI/CD pipelines?**
A shift toward automation, rapid iteration, and shared responsibility for code quality and deployment.

**10. Which MLOps platform provider is best suited to your applications?**
The one that matches your needs—AWS for scalability, GCP for AI tools, Azure for enterprise integration, or open-source for flexibility.

**11. What types of logic would work well packaged as a microservice?**
Reusable, independent tasks like prediction APIs, data validation, or preprocessing pipelines.

**12. How could you improve the CI/CD pipeline example?**
Add automated tests, monitoring, rollback strategies, and security checks to make it more robust.

**13. What other pre-trained models could you deploy besides those on Hugging Face?**
Models from TensorFlow Hub or PyTorch Hub, or hosted models accessed through APIs such as OpenAI's, depending on your task.

**14. How else could App Runner or Flask apps be triggered besides HTTP?**
Via event-driven triggers such as message queues, cron jobs, or cloud functions.

**15. Why is having a requirements.txt file important?**
It ensures consistent dependencies across environments, making projects reproducible and easier to share.




# Challenge Questions



### **Challenge 1: Diagram your own organizational MLOps landscape architecture**


```
┌──────────────────────────────────────┐
│ Business Use Cases / Apps            │
└───────────────────┬──────────────────┘
                    │
┌───────────────────▼──────────────────┐
│ Data Sources (DBs, APIs,             │
│ Data Lakes, Streaming, etc.)         │
└───────────────────┬──────────────────┘
                    │
┌───────────────────▼──────────────────┐
│ Data Ingestion & Storage             │
│ (ETL, Data Warehouse, Feature Store) │
└───────────────────┬──────────────────┘
                    │
┌───────────────────▼──────────────────┐
│ Model Development / Lab              │
│ (Notebooks, Experiment Mgmt,         │
│ Versioning, CI/CD)                   │
└───────────────────┬──────────────────┘
                    │
┌───────────────────▼──────────────────┐
│ Model Training & Eval                │
│ (Pipelines, AutoML, GPUs)            │
└───────────────────┬──────────────────┘
                    │
┌───────────────────▼──────────────────┐
│ Deployment & Serving Layer           │
│ (Microservices, APIs, Batch,         │
│ Real-time Inference)                 │
└───────────────────┬──────────────────┘
                    │
┌───────────────────▼──────────────────┐
│ Monitoring & Feedback Loops          │
│ (Drift Detection, Logging,           │
│ Retraining Triggers)                 │
└──────────────────────────────────────┘
```




### **Challenge 2: Interview teams to create your own maturity model assessment**

This one is less about coding and more about **process + framework**.
The idea: you “interview” different teams (data science, engineering, DevOps, product) and assess **where they stand** on MLOps practices.

Here’s a **simple maturity model (4 levels)** you can use in those interviews:

1. **Level 1 – Initial / Ad hoc**

   * Models built in notebooks, manual deployment.
   * No versioning, little collaboration.

2. **Level 2 – Repeatable / Basic Automation**

   * Code + data versioned (Git, DVC).
   * Basic CI/CD pipeline for ML models.
   * Manual monitoring.

3. **Level 3 – Defined / Standardized**

   * Central feature store.
   * Automated training + deployment pipelines.
   * Continuous monitoring for drift/performance.

4. **Level 4 – Optimized / Scalable**

   * End-to-end automation (data → training → deploy → retrain).
   * Governance, explainability, compliance.
   * Business-aligned metrics drive retraining.

---

💡 **How to “interview”:**

* Ask **data scientists** → How do you track experiments?
* Ask **DevOps** → How do you deploy ML models?
* Ask **engineers** → How do you monitor and test ML code?
* Ask **business team** → Are ML outputs tied to KPIs?
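
To turn interview answers into a repeatable score, the four levels above can be encoded as checklists. This is a minimal sketch under assumptions: the criterion names in `LEVEL_CRITERIA` are hypothetical paraphrases of Levels 2 to 4, and a team is placed at the highest level whose criteria (and every lower level's criteria) are all met.

```python
# Hypothetical checklist items paraphrasing Levels 2-4 above; Level 1 is the baseline.
LEVEL_CRITERIA = {
    2: ["code_and_data_versioned", "basic_ci_cd"],
    3: ["central_feature_store", "automated_training_and_deploy", "continuous_monitoring"],
    4: ["end_to_end_automation", "governance_and_compliance", "kpi_driven_retraining"],
}


def assess_maturity(findings):
    """Return the highest level whose criteria, and every lower level's, are all met."""
    level = 1
    for lvl in sorted(LEVEL_CRITERIA):
        if all(findings.get(item, False) for item in LEVEL_CRITERIA[lvl]):
            level = lvl
        else:
            break  # levels are cumulative: stop at the first unmet level
    return level


def gaps(findings, target_level):
    """List the criteria still missing to reach target_level."""
    return [
        item
        for lvl in sorted(LEVEL_CRITERIA)
        if lvl <= target_level
        for item in LEVEL_CRITERIA[lvl]
        if not findings.get(item, False)
    ]


# Example: answers collected from one team's interviews (hypothetical).
team_a = {"code_and_data_versioned": True, "basic_ci_cd": True, "central_feature_store": True}
print(assess_maturity(team_a), gaps(team_a, 3))
# prints: 2 ['automated_training_and_deploy', 'continuous_monitoring']
```

Treating the levels as cumulative is a design choice: a team with a feature store but no versioning still scores Level 1, which flags the missing foundation.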




# Challenge 3: Check out the ml-ops-ci-demo folder

# Challenge 4: Prototype a Feature Store 
Organize features into feature groups (one table per group), each keyed by `customer_id`. Example:

* **Customer Profile Features** → static attributes
  * `customer_id` (key)
  * `age`
  * `gender`
  * `signup_date`
* **Behavioral Features** → usage patterns
  * `avg_session_length`
  * `last_login_days`
  * `clicks_past_week`
* **Transaction Features** → spending history
  * `total_spent`
  * `avg_monthly_spend`
  * `purchase_count`

```sql
-- Customer Profile Features
CREATE TABLE customer_profile (
    customer_id VARCHAR PRIMARY KEY,
    age INT,
    gender VARCHAR,
    signup_date DATE
);

-- Behavioral Features
CREATE TABLE customer_behavior (
    customer_id VARCHAR PRIMARY KEY,
    avg_session_length FLOAT,
    last_login_days INT,
    clicks_past_week INT
);

-- Transaction Features
CREATE TABLE customer_transactions (
    customer_id VARCHAR PRIMARY KEY,
    total_spent FLOAT,
    avg_monthly_spend FLOAT,
    purchase_count INT
);
```
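
To show how a model would read these features at serving time, here is a minimal sketch that creates the three tables in an in-memory SQLite database and joins them on `customer_id` into one feature vector. SQLite, the `create_store`/`get_feature_vector` helpers, and the sample rows are assumptions for illustration; a production feature store would also track timestamps so training data can be assembled point-in-time correctly.

```python
import sqlite3

# The three feature groups above (SQLite accepts these type names).
SCHEMA = """
CREATE TABLE customer_profile (
    customer_id VARCHAR PRIMARY KEY, age INT, gender VARCHAR, signup_date DATE);
CREATE TABLE customer_behavior (
    customer_id VARCHAR PRIMARY KEY, avg_session_length FLOAT,
    last_login_days INT, clicks_past_week INT);
CREATE TABLE customer_transactions (
    customer_id VARCHAR PRIMARY KEY, total_spent FLOAT,
    avg_monthly_spend FLOAT, purchase_count INT);
"""

# Join every group on the shared key; LEFT JOIN keeps customers missing a group.
FEATURE_QUERY = """
SELECT customer_id, age, gender, signup_date,
       avg_session_length, last_login_days, clicks_past_week,
       total_spent, avg_monthly_spend, purchase_count
FROM customer_profile
LEFT JOIN customer_behavior USING (customer_id)
LEFT JOIN customer_transactions USING (customer_id)
WHERE customer_profile.customer_id = ?
"""


def create_store():
    """Create an empty in-memory store with the three feature tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def get_feature_vector(conn, customer_id):
    """Return all features for one customer as a dict, or None if unknown."""
    cur = conn.execute(FEATURE_QUERY, (customer_id,))
    row = cur.fetchone()
    if row is None:
        return None
    names = [col[0] for col in cur.description]
    return dict(zip(names, row))


if __name__ == "__main__":
    conn = create_store()
    # Hypothetical sample rows for illustration only.
    conn.execute("INSERT INTO customer_profile VALUES ('c1', 34, 'F', '2023-01-15')")
    conn.execute("INSERT INTO customer_behavior VALUES ('c1', 12.5, 2, 40)")
    conn.execute("INSERT INTO customer_transactions VALUES ('c1', 520.0, 43.3, 12)")
    print(get_feature_vector(conn, "c1"))
```

Keeping one table per group mirrors the layout above; the join happens only at read time, so each group can be refreshed on its own schedule.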



# Challenge 5: Basic MLOps Pipeline (Data → Training → Deployment)
## Check ml-ops-pipeline-demo
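
The data → training → deployment flow can be sketched in plain Python with no external libraries. This is a hypothetical stand-in, not the contents of `ml-ops-pipeline-demo`: the inline CSV, the one-feature least-squares model, and the `max_mse` quality gate are assumptions. For brevity it evaluates on the training rows; a real pipeline would score a held-out split.

```python
import csv
import io
import os
import pickle
import tempfile

# Hypothetical raw data: one feature x and a target y.
RAW_CSV = """x,y
1,2.1
2,3.9
3,6.2
4,7.8
5,10.1
"""


def ingest(raw_csv):
    """Data step: parse CSV text into (x, y) pairs, dropping malformed rows."""
    rows = []
    for rec in csv.DictReader(io.StringIO(raw_csv)):
        try:
            rows.append((float(rec["x"]), float(rec["y"])))
        except (KeyError, TypeError, ValueError):
            continue  # basic validation: skip rows that fail to parse
    return rows


def train(rows):
    """Training step: ordinary least squares fit of y = slope * x + intercept."""
    if len(rows) < 2:
        raise ValueError("need at least two rows to fit a line")
    n = len(rows)
    mean_x = sum(x for x, _ in rows) / n
    mean_y = sum(y for _, y in rows) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in rows)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in rows)
    slope = sxy / sxx
    return {"slope": slope, "intercept": mean_y - slope * mean_x}


def evaluate(model, rows):
    """Evaluation step: mean squared error on the given rows."""
    errs = [(model["slope"] * x + model["intercept"] - y) ** 2 for x, y in rows]
    return sum(errs) / len(errs)


def deploy(model, mse, out_dir, max_mse=1.0):
    """Deployment step: write the artifact only if it passes the quality gate."""
    if mse > max_mse:
        return None
    path = os.path.join(out_dir, "model.pkl")
    with open(path, "wb") as f:
        pickle.dump(model, f)
    return path


def run_pipeline(raw_csv, out_dir, max_mse=1.0):
    rows = ingest(raw_csv)
    model = train(rows)
    mse = evaluate(model, rows)
    return model, mse, deploy(model, mse, out_dir, max_mse)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(run_pipeline(RAW_CSV, tmp))
```

Because each stage is a separate function, a CI job could run them in order and fail the build whenever `deploy` returns `None`.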

# **Challenge 6: Diagram a lightweight MLOps workflow for a hobby project**

This one is **theory + diagram**, meant to show how a small-scale setup would look (for personal projects, prototypes, or class assignments).

## 🚀 Lightweight MLOps Workflow (Hobby Project)

```
      ┌─────────────────────┐
      │   Data Collection   │
      │ (CSV files, APIs)   │
      └─────────┬───────────┘
                │
      ┌─────────▼───────────┐
      │   Data Prep & EDA   │
      │ (Notebooks, Pandas) │
      └─────────┬───────────┘
                │
      ┌─────────▼───────────┐
      │   Training Script   │
      │ (scikit-learn, etc.)│
      └─────────┬───────────┘
                │
      ┌─────────▼───────────┐
      │ Save Model Artifact │
      │ (pickle, joblib)    │
      └─────────┬───────────┘
                │
      ┌─────────▼───────────┐
      │ Simple Deployment   │
      │ (Flask API)         │
      └─────────┬───────────┘
                │
      ┌─────────▼───────────┐
      │   Manual Testing    │
      │ (curl/Postman)      │
      └─────────────────────┘
```

---

## 🔑 Key Notes for Hobby Workflow

* **Data** → simple files (CSV, JSON, API dumps). No big data infra.
* **Training** → run in notebooks or a single script.
* **Model Storage** → save as `model.pkl` or `joblib` file.
* **Deployment** → Flask or FastAPI, maybe Docker if needed.
* **Testing** → manual (curl, Postman), not automated monitoring.
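
The "Simple Deployment (Flask API)" step usually amounts to loading the saved artifact once at startup and exposing a prediction endpoint. A minimal sketch, assuming Flask is installed and that the training script pickled a plain dict of linear `weights` and a `bias`; the `/predict` route, the payload shape, and the helper names are hypothetical.

```python
import pickle

from flask import Flask, jsonify, request

MODEL_PATH = "model.pkl"  # hypothetical artifact written by the training script


def save_model(model, path=MODEL_PATH):
    """Persist the model artifact (here a plain dict) with pickle."""
    with open(path, "wb") as f:
        pickle.dump(model, f)


def create_app(model_path=MODEL_PATH):
    """Build the Flask app, loading the model once at startup."""
    app = Flask(__name__)
    with open(model_path, "rb") as f:
        model = pickle.load(f)  # only unpickle artifacts you produced yourself

    @app.route("/predict", methods=["POST"])
    def predict():
        payload = request.get_json(silent=True)
        features = payload.get("features") if isinstance(payload, dict) else None
        n = len(model["weights"])
        if (not isinstance(features, list) or len(features) != n
                or not all(isinstance(x, (int, float)) for x in features)):
            return jsonify(error=f"expected 'features': list of {n} numbers"), 400
        score = model["bias"] + sum(w * x for w, x in zip(model["weights"], features))
        return jsonify(prediction=score)

    return app
```

Saved as a hypothetical `predict_api.py`, it could be started with `flask --app predict_api run` (Flask's CLI discovers the `create_app` factory) and tested manually with `curl -X POST -H "Content-Type: application/json" -d '{"features": [2.0, 1.0]}' http://127.0.0.1:5000/predict`.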




# Challenge 7: Set up a basic MLOps pipeline using open-source tools
## Check ml-ops-pipeline-demo (same demo as Challenge 5)

# Challenge 8: Interview DevOps teams on lessons for MLOps adoption
👉 Ask DevOps teams about automation, monitoring, and deployment pain points, then summarize the lessons, for example:

* Automate testing and deployment.
* Monitor for failures and drift.
* Build a CI/CD culture around ML.
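
"Monitor for drift" can start with a single statistic. The Population Stability Index (PSI) compares the binned distribution of a feature at training time with its distribution in production: PSI = Σᵢ (aᵢ - eᵢ) · ln(aᵢ / eᵢ), where eᵢ and aᵢ are the expected (baseline) and actual shares of values in bin i. This stdlib sketch assumes equal-width bins over the baseline's range and a small epsilon for empty bins; the sample data is hypothetical.

```python
import math


def psi(expected, actual, bins=10, eps=1e-6):
    """Population Stability Index of `actual` against the `expected` baseline.

    Bins are equal-width over the baseline's range; values outside it are
    clamped into the edge bins, and empty bins get `eps` to avoid log(0).
    """
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # guard against a constant baseline

    def shares(values):
        counts = [0] * bins
        for v in values:
            idx = min(max(int((v - lo) / width), 0), bins - 1)
            counts[idx] += 1
        return [max(c / len(values), eps) for c in counts]

    e, a = shares(expected), shares(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


baseline = [float(v) for v in range(100)]  # hypothetical training-time values
live = [v + 50 for v in baseline]          # hypothetical shifted production values
print(round(psi(baseline, baseline), 6), psi(baseline, live) > 0.25)
# prints: 0.0 True
```

A common rule of thumb reads PSI below 0.1 as little shift, 0.1 to 0.25 as moderate, and above 0.25 as significant; treat these cut-offs as heuristics to tune, not guarantees.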



# Challenge 9: Research real-world examples of data poisoning issues
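👉 Example answers:

* Microsoft’s Tay chatbot (2016) learned from its conversations on Twitter; coordinated users fed it offensive inputs until it began posting offensive content, and Microsoft took it offline within a day.
* Attackers can inject fake reviews into training data to bias recommendations.
* Poisoning can lead to reputational damage and faulty predictions.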



# Challenge 10: Analyze costs/benefits of ML for a business case
👉 **Benefits:** better predictions, automation, customer insights.

👉 **Costs:** infrastructure, talent, ongoing monitoring.

👉 **Answer:** Weigh the expected return on investment (ROI); don't build ML unless the benefits clearly outweigh the costs.
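
To make the ROI comparison concrete, a simple undiscounted ROI over a fixed horizon is (total benefit minus total cost) divided by total cost. The sketch below uses hypothetical figures and a hypothetical `ml_roi` helper; it ignores discounting and risk, which a real business case should include.

```python
def ml_roi(annual_benefit, build_cost, annual_run_cost, years=3):
    """Undiscounted ROI over `years`: (total benefit - total cost) / total cost."""
    total_cost = build_cost + annual_run_cost * years
    total_benefit = annual_benefit * years
    return (total_benefit - total_cost) / total_cost


# Hypothetical figures: one-off build cost (talent, setup), yearly running cost
# (infrastructure, monitoring), and yearly benefit (e.g., savings from automation).
roi = ml_roi(annual_benefit=200_000, build_cost=150_000, annual_run_cost=50_000)
print(f"3-year ROI: {roi:.0%}")  # prints: 3-year ROI: 100%
```

An ROI near or below zero argues against building; the yearly running cost term is a reminder that monitoring and retraining keep costing money after launch.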



# Challenge 11: Research other MLOps platforms to replace Hugging Face
👉 Examples: MLflow, Kubeflow, TFX (TensorFlow Extended), SageMaker, Azure ML, Google Vertex AI. Each has different strengths in orchestration, experiment tracking, or deployment.

# 🚀 Challenge 12: Containerize and Deploy the Flask Random Fruit Microservice

## 📝 Summary: Flask Random Fruit Microservice - Check ml-ops-microservice-demo

1. **Built a simple Flask app (`app.py`)** → returns a random fruit at `/fruit`.

   * Purpose: Demonstrate a lightweight microservice.

2. **Created `requirements.txt`** → listed dependencies (`flask`).

   * Purpose: Ensure environment reproducibility.

3. **Wrote a `Dockerfile`** → defined how to package the app into a container.

   * Purpose: Make the service portable and consistent.

4. **Built the Docker image** → `docker build -t flask-fruit .`.

   * Purpose: Bundle code + dependencies into a single image.

5. **Ran the container** → `docker run -p 5001:5000 flask-fruit`.

   * Purpose: Deploy the app locally inside Docker, mapping host port 5001 to container port 5000.

6. **Tested the endpoint** → `curl http://127.0.0.1:5001/fruit`.

   * Purpose: Verify the containerized service works as expected.
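
For reference, an `app.py` matching step 1 might look like the sketch below. It assumes Flask is installed, that the endpoint returns JSON, and uses a hypothetical fruit list; the actual code in `ml-ops-microservice-demo` may differ.

```python
import random

from flask import Flask, jsonify

app = Flask(__name__)

# Hypothetical fruit list; the demo's list may differ.
FRUITS = ["apple", "banana", "cherry", "mango", "orange"]


@app.route("/fruit")
def random_fruit():
    """Return one randomly chosen fruit as JSON."""
    return jsonify(fruit=random.choice(FRUITS))
```

If the Dockerfile starts the service with `python app.py`, the app must listen on all interfaces at the container port, for example by adding `app.run(host="0.0.0.0", port=5000)` under an `if __name__ == "__main__":` guard. `-p 5001:5000` then forwards host port 5001 to it, which is why the curl in step 6 targets port 5001.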


