# CI/CD for Machine Learning – Personal Notes

## 1. Introduction
This course covers **Continuous Integration (CI)** and **Continuous Delivery/Deployment (CD)** techniques tailored for machine learning workflows.

---

## 2. Software Development Life Cycle (SDLC)

### Definition
SDLC is a structured process for developing, deploying, and maintaining software applications.

### Key Stages
- **Build**: Compile source code into executable form.
- **Test**: Validate functionality and quality.
- **Deploy**: Release software into target environments.

---

## 3. SDLC in Machine Learning

### Unique Challenges
- ML models evolve with data; they are not static algorithms.
- Data engineering is resource-intensive: includes collection, transformation, storage, and serving.
- Integration with SDLC requires automation for speed and quality.

### Benefits of CI/CD in ML
- Streamlines delivery of high-quality ML software.
- Enables rapid prototyping and testing.
- Facilitates algorithm and hyperparameter exploration.
- Improves decision-making through faster iteration.

**Reference**: [Google Cloud – ML Lifecycle](https://cloud.google.com/blog/products/ai-machine-learning/making-the-machine-the-machine-learning-lifecycle)

---

## 4. What is CI/CD?

### Continuous Integration (CI)
- Automatically builds and tests code on integration into a shared repo.
- Prevents integration issues and ensures code stability.

### Continuous Delivery (CD)
- Automates delivery of code to production-like environments.
- Requires manual approval before deployment.

### Continuous Deployment (CD)
- Fully automates release to production without manual intervention.
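
To make the delivery/deployment distinction concrete, here is a minimal sketch of a GitHub Actions workflow (GHA is covered later in these notes). The file name, the `make test` command, and `deploy.sh` are hypothetical placeholders. It assumes a `production` environment has been created in the repository settings.

```yaml
# .github/workflows/delivery.yml (hypothetical file name)
name: Deliver model service

on:
  push:
    branches: [main]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Run tests
        run: make test          # hypothetical command

  deploy:
    needs: test                 # only runs after "test" succeeds
    runs-on: ubuntu-latest
    # If the "production" environment has required reviewers configured,
    # this job pauses until someone approves it (continuous delivery).
    # Without that protection rule it runs automatically
    # (continuous deployment).
    environment: production
    steps:
      - uses: actions/checkout@v4
      - name: Deploy
        run: ./deploy.sh        # hypothetical script
```

The same workflow file covers both practices; only the environment protection rule decides whether a human approves the release.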

---

## 5. CI/CD in Machine Learning

### Key Differences from Traditional Software
- ML = Code + Data -> both must be versioned.
- Experimentation requires tracking model performance and configurations.
- Reproducibility demands versioning of data, models, and code.
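
One way to version data, models, and code together is a DVC pipeline file. The sketch below is only illustrative: every script, data path, and parameter name is hypothetical, and it assumes a `params.yaml` file containing a `train.learning_rate` entry.

```yaml
# dvc.yaml: a minimal sketch; all script and file names are hypothetical
stages:
  preprocess:
    cmd: python preprocess.py
    deps:
      - preprocess.py
      - data/raw.csv
    outs:
      - data/processed.csv
  train:
    cmd: python train.py
    deps:
      - train.py
      - data/processed.csv
    params:
      - train.learning_rate     # read from params.yaml
    outs:
      - models/model.pkl
    metrics:
      - metrics.json:
          cache: false          # keep metrics in Git rather than the DVC cache
```

DVC hashes each `deps` and `outs` entry, so a stage is rerun only when its code, data, or parameters change.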

### Testing in ML CI
- Goes beyond unit tests: includes data preprocessing, training, and evaluation.
- Ensures pipeline reliability and model quality.
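
As a rough sketch of what "beyond unit tests" can look like, the workflow below runs unit tests and then exercises the whole pipeline on every pull request. The script names and the `--min-accuracy` flag are hypothetical; a real project would substitute its own entry points and thresholds.

```yaml
name: ML pipeline checks

on: pull_request

jobs:
  pipeline:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"   # quoted so YAML does not read it as a float
      - name: Install dependencies
        run: pip install -r requirements.txt
      - name: Unit tests
        run: pytest tests/
      - name: Preprocess, train, evaluate
        # Hypothetical scripts; evaluate.py is assumed to exit non-zero
        # when the model falls below the threshold, failing the job.
        run: |
          python preprocess.py
          python train.py
          python evaluate.py --min-accuracy 0.85
```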

### Deployment Considerations
- More complex than traditional software.
- Requires:
  - Model serving infrastructure
  - Performance monitoring
  - Update management and rollback strategies

---

## 6. Course Scope

Focus areas:
- Data preparation and versioning
- Model development and evaluation
- Hyperparameter tuning
- CI/CD integration across these stages

---

## 7. Summary

### SDLC Workflow
- Build -> Test -> Deploy

### CI/CD Benefits in ML
- **CI**: Frequent code merging, early bug detection
- **CD (Delivery)**: Manual approval before release
- **CD (Deployment)**: Fully automated release

### ML-Specific Enhancements
- Data/model versioning for reproducibility
- Automation for experimentation
- Full pipeline testing
- Reliable and rapid deployment

Continuous deployment automatically releases every change that passes the pipeline to production. Continuous delivery keeps every change in a releasable state but requires manual approval before the production deployment.

![image.png](attachment:3e1638c9-d138-42da-9357-d31700bafc0d.png)

Generic Workflow


# YAML for CI/CD in Machine Learning – Personal Notes

## 1. What is YAML?

### Definition
- YAML stands for **"YAML Ain't Markup Language"**
- A human-readable data serialization format used for:
  - Configuration files
  - Data exchange
  - Structured data representation

### Comparison
- Alternative to XML
- Comparable to JSON in functionality
- Designed for readability and simplicity

### Usage in CI/CD
- YAML is the backbone of configuration in tools like:
  - **GitHub Actions** (workflow orchestration)
  - **DVC** (pipeline stages and metadata)
- File extensions: `.yaml` or `.yml`

---

## 2. YAML Syntax

### Structure Rules
- Uses **indentation** and **line separation** to define hierarchy
- Indentation is space-based (no tabs allowed)
- Syntax errors often stem from inconsistent spacing

```yaml
name: Santosh
occupation: Instructor
# A full-line comment is valid
programming_languages:  # an inline comment is valid too
  python: advanced      # a space is required after the colon
  javascript: advanced
```


### Best Practices
- Use YAML-aware IDEs with validation support
- Comments begin with `#` and are ignored during parsing

---

## 3. YAML Scalars

### Supported Scalar Types
- **String**: quoted or unquoted; `"Rustam"` and `Rustam` are both strings
- **Number**: integer or float
- **Boolean**: `true` / `false` (unquoted)
- **Null**: `null` or `~`

### Notes
- Booleans and nulls must not be quoted to retain type
- Strings can be wrapped in `'single'` or `"double"` quotes when needed
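
A small sketch showing each scalar type side by side (the keys and values are made up for illustration):

```yaml
name: Rustam          # string (unquoted)
title: "Instructor"   # string (double-quoted)
epochs: 10            # integer
learning_rate: 0.001  # float
shuffle: true         # boolean
checkpoint: null      # null (could also be written as ~)
seed: "42"            # quoted, so this is the string "42", not a number
enabled: "true"       # quoted, so this is a string, not a boolean
```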

---

## 4. YAML Collections

### Sequences (Lists)
- Ordered elements
- **a. Block style**: uses hyphens
  ```yaml
  - item1
  - item2
  - item3
  ```
- **b. Flow style**: uses brackets
  ```yaml
  [item1, item2, item3]
  ```

### Mappings (Key-Value Pairs)
- Uniquely keyed values
- Syntax:
  ```yaml
  key1: value1
  key2:
    - nested1
    - nested2
  key3: [val1, val2, val3]
  ```

![image.png](attachment:f089ce70-4ec8-479a-8f0b-a56108899283.png)


![image.png](attachment:9fa891f4-202e-42ca-be75-9262776af238.png)


# GitHub Actions (GHA) – Personal Notes

## 1. What is GitHub Actions?
![image.png](attachment:1742f301-4612-4a96-930c-eafad069f3b9.png)
### Definition
- GitHub Actions (GHA) is GitHub’s built-in automation and CI/CD system.
- Enables automation of build, test, and deployment pipelines directly within GitHub repositories.
- A **pipeline** is a sequence of interconnected steps representing the flow of work and data.

### Analogy
- Similar to a car assembly line: each step performs a specific task (e.g., attach engine, paint).
- In GHA, each step automates a part of the software development lifecycle.

**Reference**: [Medium – CI/CD with GitHub Actions for Android](https://medium.com/empathyco/applying-ci-cd-using-github-actions-for-android-1231e40cc52f)

---

## 2. Core Components of GitHub Actions

### Event
- An **event** triggers the execution of a workflow.
- Examples:
  - `push` to a branch
  - `pull_request` opened
  - `issues` (e.g., an issue is opened)

### Workflow
- A **workflow** is a YAML-defined automated process.
- Stored in `.github/workflows/` directory.
- Can be triggered by:
  - Events
  - Manual triggers
  - Scheduled intervals
- Multiple workflows can exist in a repo:
  - One for testing PRs
  - One for deployment
  - One for issue labeling
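
The three kinds of trigger listed above can all appear in a single `on:` block. This is a sketch only; the branch name and schedule are arbitrary choices.

```yaml
on:
  push:
    branches: [main]            # event: push to the main branch
  pull_request:
    types: [opened, synchronize]
  workflow_dispatch:            # manual trigger from the Actions tab
  schedule:
    - cron: "0 6 * * 1"         # every Monday at 06:00 UTC
```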

### Steps and Actions
- A **step** is a unit of work executed in sequence.
- Steps share the same runner and can pass data between them.
- Examples:
  - Build application
  - Run tests
  - Execute shell scripts
- An **action** is a reusable application that performs a task.
  - Examples: `actions/checkout`, auto-commenting on PRs
![image.png](attachment:f1a3e195-9a37-435d-a37f-827024b89bfd.png)
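
Because steps in a job share a runner, one step can hand a value to a later one. A minimal sketch, assuming a step id of `compute` and a placeholder value:

```yaml
steps:
  - name: Compute a value
    id: compute
    # Writing name=value to $GITHUB_OUTPUT registers a step output
    run: echo "accuracy=0.91" >> "$GITHUB_OUTPUT"   # placeholder value
  - name: Use the value
    run: echo "Accuracy was ${{ steps.compute.outputs.accuracy }}"
```

Files written to the workspace by one step are also visible to later steps in the same job.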

### Jobs and Runners
- A **job** is a set of steps.
- Jobs run in parallel by default, each on its own runner.
- Jobs can declare dependencies on other jobs (`needs`), which forces them to run in order.
- All steps in a job run on the same **runner** (compute machine).
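
A sketch of how parallel and dependent jobs might be laid out. The job names and commands are hypothetical, and `ruff` and `pytest` are assumed to be installed on the runner.

```yaml
jobs:
  lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: ruff check .        # assumes ruff is installed

  test:
    runs-on: ubuntu-latest       # runs in parallel with "lint"
    steps:
      - uses: actions/checkout@v4
      - run: pytest              # assumes pytest is installed

  package:
    needs: [lint, test]          # waits for both jobs to succeed
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: python -m build     # assumes the "build" package is installed
```

Each job checks out the code again because jobs do not share a filesystem.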

---
![image.png](attachment:08478645-eea1-4b6a-a595-ce3aedde42b1.png)

## 3. Example Workflow

### Trigger
- A `push` event initiates the workflow.

### Job
- Runs on an Ubuntu Linux runner.

### Steps
```yaml
- name: Checkout code
  uses: actions/checkout@v4

- name: Run Python app
  run: python app.py
```
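
Putting the trigger, job, and steps together, a complete workflow file could look like the sketch below. The file name, workflow name, and job id are made up; `app.py` comes from the steps above.

```yaml
# .github/workflows/run-app.yml (hypothetical file name)
name: Run Python app

on: push                        # trigger

jobs:
  run-app:                      # hypothetical job id
    runs-on: ubuntu-latest      # Ubuntu Linux runner
    steps:
      - name: Checkout code
        uses: actions/checkout@v4
      - name: Run Python app
        run: python app.py
```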

![image.png](attachment:7b91431b-6067-4613-a0aa-ac558b56ad8f.png)


![image.png](attachment:760a7523-1175-42a7-b9bd-9fbc12a62494.png)


You can also make one job depend on another with the `needs` keyword.

# Intermediate YAML 

## 1. Overview

To work effectively with GitHub Actions and other CI/CD tools, a deeper understanding of YAML is required—especially for handling multiline strings, dynamic values, and multi-document structures.

---

## 2. Multiline Strings: Block Scalar Format

### Purpose
- Used to represent multi-line strings with preserved formatting.
- Common in:
  - Shell commands
  - Log messages
  - Configuration blocks

### Styles
- **Literal (`|`)**: preserves line breaks and indentation exactly.
- **Folded (`>`)**: collapses line breaks into spaces for wrapped text.
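
In GitHub Actions, the two styles typically show up as in the sketch below. The script names and the `SUMMARY` variable are hypothetical.

```yaml
steps:
  - name: Train and evaluate
    # Literal style: each line stays a separate shell command
    run: |
      python train.py
      python evaluate.py
  - name: Post a summary
    env:
      # Folded style: the three lines are joined into one line of text
      SUMMARY: >
        Training finished.
        Metrics are stored
        in metrics.json.
    run: echo "$SUMMARY"
```

Using `>` for the `run` block would fold both commands onto one line and break the script, which is why `|` is the usual choice for shell commands.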

---

## 3. Literal Style (`|`)

### Behavior
- Maintains all line breaks and indentation.
- Ideal for shell scripts or formatted logs.

### Example
```yaml
script: |
  echo "Starting process"

    indented line
  echo "Done"
```

---

## 4. Folded Style (`>`)

### Behavior
- Converts single line breaks into spaces.
- Preserves blank lines (as paragraph breaks) and more-indented lines.

### Example
```yaml
message: >
  This is a long message
  that will be folded into
  a single paragraph.
```

---

## 5. Chomping Indicators

### Purpose
- Control how trailing newlines are handled in block scalars.

### Modes
- **Clip** (default, no indicator): keeps a single trailing newline.
- **Strip** (`-`): removes all trailing newlines.
- **Keep** (`+`): retains all trailing newlines.

### Example
```yaml
# Strip: the value is "Line one\nLine two" (no trailing newline)
log: |-
  Line one
  Line two
```

```yaml
# Keep: the value is "Line one\nLine two\n\n" (the trailing blank line is retained)
log: |+
  Line one
  Line two

```

---

## 6. Dynamic Value Injection

### Description
- Not part of the standard YAML spec.
- Used by specific tools to inject runtime values.

### Syntax
- `${{ expression }}` or `$ENV_VAR`, depending on the tool

### Use Cases
- Referencing environment variables
- Accessing config values from other YAML sections

### Example

```yaml
database: ${{ secrets.DB_URL }}
```

**Note**: Support and syntax depend on the tool. GitHub Actions uses `${{ }}`, while Helm uses Go templates (`{{ }}`).

---

## 7. Multi-Document YAML

### Purpose
- Store multiple independent YAML documents in one file.
- Useful for grouping related configs or metadata.

### Syntax
- Use `---` to separate documents.

### Example
```yaml
---
name: Alice
age: 30
---
name: Bob
age: 40
occupation: Engineer
---
name: Carol
age: 25
```

**Reference**: yaml-multiline.info