## 📚 Table of Contents 
- [Data Ingestion](#data-ingestion)





# Data Ingestion 

# 🔗 Resource

For more on **Data Ingestion**, see this Coursera reading:

[Batch and Streaming Tools (Coursera)](https://www.coursera.org/learn/source-systems-data-ingestion-and-pipelines/supplement/YD08f/batch-and-streaming-tools)

---
Nearly all data originates as a **continuous stream of events** (e.g., button clicks, stock price changes, IoT sensor readings).  
Ingestion is therefore not a strict choice between batch and streaming: techniques fall along a **continuum** of frequency, from occasional bulk loads to processing each event as it arrives.

---

## 📈 Ingestion Frequencies

![Ingestion Frequencies](./images/ingestion_frequencies.png)

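| Frequency Type | Relative Frequency | Example Cadence                              |
|----------------|--------------------|----------------------------------------------|
| Batch          | Least frequent     | Hourly or daily loads                        |
| Micro-batch    | Frequent           | Every few seconds to minutes                 |
| Streaming      | Continuous         | Each event as it arrives (near real time)    |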

> The **choice of ingestion frequency** depends on:
> - The **source systems**
> - The **end use case**

---

# 🔌 Ways to Ingest Data

---

## 🗄️ From Databases

![Ingest from DB](./images/ways_to_ingest.png)

### 🔗 Using Connectors (JDBC/ODBC APIs)
- Pulls data directly from the source database through **standard drivers**.
- Ingestion can be triggered:
  - At regular intervals (e.g., every hour)
  - Once a threshold number of new records has accumulated

> JDBC (Java Database Connectivity) is a Java API, and ODBC (Open Database Connectivity) is a language-independent C API. Both give applications a standard interface for querying many different databases through database-specific drivers.
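
A minimal sketch of connector-style incremental pulls, assuming a hypothetical `orders` table with an increasing integer `id`. Python's built-in `sqlite3` module stands in for a JDBC/ODBC driver here, since it exposes the same kind of uniform connect/execute/fetch interface. A watermark (the last `id` seen) ensures each run only pulls new records; `ingest_if_threshold` shows the threshold trigger, and calling `fetch_new_rows` from a scheduler would give the regular-interval trigger.

```python
import sqlite3

# sqlite3 stands in for a JDBC/ODBC connection (hypothetical table: orders).

def fetch_new_rows(conn, last_seen_id, limit=1000):
    """Return rows newer than the watermark plus the updated watermark."""
    rows = conn.execute(
        "SELECT id, item FROM orders WHERE id > ? ORDER BY id LIMIT ?",
        (last_seen_id, limit),
    ).fetchall()
    new_watermark = rows[-1][0] if rows else last_seen_id
    return rows, new_watermark

def ingest_if_threshold(conn, last_seen_id, threshold):
    """Pull only once at least `threshold` new records are waiting."""
    (pending,) = conn.execute(
        "SELECT COUNT(*) FROM orders WHERE id > ?", (last_seen_id,)
    ).fetchone()
    if pending < threshold:
        return [], last_seen_id  # not enough new data yet; try again later
    return fetch_new_rows(conn, last_seen_id)

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, item TEXT)")
    conn.executemany("INSERT INTO orders (item) VALUES (?)", [("a",), ("b",)])

    watermark = 0
    rows, watermark = ingest_if_threshold(conn, watermark, threshold=3)
    print(rows, watermark)  # [] 0  (only 2 new rows so far)

    conn.execute("INSERT INTO orders (item) VALUES ('c')")
    rows, watermark = ingest_if_threshold(conn, watermark, threshold=3)
    print(rows, watermark)  # [(1, 'a'), (2, 'b'), (3, 'c')] 3
```

Persisting the watermark between runs (for example in a small state table) is what lets a scheduled job resume where it left off without re-reading the whole table.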

---

### 🔄 Using Ingestion Tools

- Example: **AWS Glue ETL**
- Automates extracting data from databases as a managed, serverless ETL job
- Runs ingestion **on a regular schedule** (e.g., via scheduled triggers)

---

## 📁 From Files

![Ingest via Files](./images/ingest_via_files.png)

### 🛠️ Manual File Download
- Receive file from external source
- Upload it manually to the system

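### 🔐 Secure File Transfer
- Files are pushed or pulled over encrypted protocols:
  - **SFTP**: SSH File Transfer Protocol
  - **SCP**: Secure Copy Protocol
- Managed option: **AWS Transfer Family** provides SFTP endpoints that store files in Amazon S3 or Amazon EFS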
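
Whether a file arrives by manual upload or over SFTP, a common follow-up step is to load everything in a landing folder and move each file aside so it is ingested only once. The sketch below assumes a local folder layout (`landing/` and `processed/`, both hypothetical) standing in for an SFTP drop directory or S3 prefix, and assumes CSV files with a header row.

```python
import csv
import shutil
import tempfile
from pathlib import Path

def ingest_landed_files(landing: Path, processed: Path) -> list[dict]:
    """Load every CSV in the landing folder, then move it aside so it is ingested once."""
    processed.mkdir(parents=True, exist_ok=True)
    records: list[dict] = []
    for path in sorted(landing.glob("*.csv")):  # deterministic order; ignores non-CSV files
        with path.open(newline="") as f:
            records.extend(csv.DictReader(f))  # each row becomes {column: value}
        shutil.move(str(path), str(processed / path.name))  # move after the file is closed
    return records

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        landing, processed = Path(tmp, "landing"), Path(tmp, "processed")
        landing.mkdir()
        # Simulate a file that arrived by manual upload or SFTP.
        (landing / "orders.csv").write_text("id,item\n1,a\n2,b\n")
        print(ingest_landed_files(landing, processed))
        # [{'id': '1', 'item': 'a'}, {'id': '2', 'item': 'b'}]
        print(ingest_landed_files(landing, processed))  # [] (already moved)
```

Moving processed files (rather than tracking names in memory) keeps reruns safe: a second pass over the same folder finds nothing new.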

---

## 📡 From Streaming Systems

![Streaming Ingestion](./images/ingest_via_streaming_systems.png)

- For **real-time or near-real-time** event ingestion
- Sources: event producers such as **IoT devices** and applications
- Sent to: message queues or streaming platforms (e.g., **Amazon Kinesis**, **Apache Kafka**)
- Consumed by: Downstream **event consumers**
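
The producer → platform → consumer flow can be sketched in-process, with Python's thread-safe `queue.Queue` standing in for Kinesis or Kafka. This is only an illustration of the pattern: a real streaming platform also persists, partitions, and retains events so that several consumers can read them independently. The sensor fields and the 30 °C alert threshold are hypothetical.

```python
import queue
import threading

def producer(q: queue.Queue, readings: list[float]) -> None:
    """Simulated IoT device: publishes one event per reading."""
    for seq, value in enumerate(readings):
        q.put({"sensor": "s1", "seq": seq, "temp_c": value})
    q.put(None)  # sentinel: tells the consumer the stream has ended

def consumer(q: queue.Queue, sink: list, alert_above: float = 30.0) -> None:
    """Downstream consumer: handles each event as soon as it is available."""
    while True:
        event = q.get()  # blocks until the next event arrives
        if event is None:
            break
        event["alert"] = event["temp_c"] > alert_above  # per-event processing
        sink.append(event)

def run_pipeline(readings: list[float]) -> list[dict]:
    q: queue.Queue = queue.Queue()
    sink: list[dict] = []
    worker = threading.Thread(target=consumer, args=(q, sink))
    worker.start()  # consumer runs concurrently with the producer
    producer(q, readings)
    worker.join()
    return sink

if __name__ == "__main__":
    print(run_pipeline([21.5, 35.0]))
    # [{'sensor': 's1', 'seq': 0, 'temp_c': 21.5, 'alert': False},
    #  {'sensor': 's1', 'seq': 1, 'temp_c': 35.0, 'alert': True}]
```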

---

# 🧠 Batching vs Streaming: Conceptual Continuum

Every event can be ingested either:
- **One-by-one** (→ **Streaming**)
- **Grouped together** (→ **Batch**)

### You can impose batch boundaries using:
- **Size** (e.g., 10 GB chunks)
- **Count** (e.g., every 1,000 events)
- **Time** (e.g., every 24 hours, every hour)
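
The three boundary types can be combined in one batcher that flushes when whichever limit is reached first. This is a minimal sketch under stated assumptions: events are raw `bytes`, flushed batches are simply collected in a list (a stand-in for writing to storage), and the defaults mirror the examples above (1,000 events, 10 GiB, 24 hours). The time boundary is only checked when an event arrives; a production batcher would also flush on a timer when traffic stops.

```python
import time

class Batcher:
    """Groups events into batches; flushes when ANY boundary (count, size, age) is hit."""

    def __init__(self, max_count=1000, max_bytes=10 * 1024**3,
                 max_age_s=24 * 3600.0, clock=time.monotonic):
        self.max_count = max_count  # count boundary, e.g. every 1,000 events
        self.max_bytes = max_bytes  # size boundary, here 10 GiB
        self.max_age_s = max_age_s  # time boundary, here 24 hours
        self.clock = clock
        self.batches = []           # flushed batches (stand-in for writes to storage)
        self._buf, self._buf_bytes, self._opened_at = [], 0, None

    def add(self, event: bytes) -> None:
        if not self._buf:
            self._opened_at = self.clock()  # batch window starts with its first event
        self._buf.append(event)
        self._buf_bytes += len(event)
        if (len(self._buf) >= self.max_count
                or self._buf_bytes >= self.max_bytes
                or self.clock() - self._opened_at >= self.max_age_s):
            self.flush()

    def flush(self) -> None:
        if self._buf:
            self.batches.append(self._buf)
            self._buf, self._buf_bytes, self._opened_at = [], 0, None


if __name__ == "__main__":
    b = Batcher(max_count=3)
    for e in [b"a", b"b", b"c", b"d"]:
        b.add(e)
    print(b.batches)  # [[b'a', b'b', b'c']]  (b'd' is still buffered)
    b.flush()
    print(b.batches)  # [[b'a', b'b', b'c'], [b'd']]

    s = Batcher(max_count=1)  # a batch of one event behaves like streaming
    s.add(b"e1")
    s.add(b"e2")
    print(s.batches)  # [[b'e1'], [b'e2']]
```

Setting `max_count=1` turns every event into its own batch, which is the batch end of the continuum meeting the streaming end.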

> 🌀 As batch boundaries shrink (smaller sizes, fewer events, shorter intervals), batch ingestion approaches streaming; a batch of a single event is effectively a stream.

---

# ⚖️ Choosing the Right Ingestion Pattern

Your choice depends on:
- 🔹 What kind of **source system** you're working with (API, DB, Stream)
- 🔹 What **latency** the business case demands
- 🔹 What the **API or system constraints** are (rate limits, payload size)
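
API constraints directly shape a batch pull. The sketch below assumes a hypothetical paginated endpoint (`fake_api_get`) that caps pages at 100 records, and a hypothetical limit of 2 requests per second; neither limit comes from a real API. `batch_pull` pages through results at the largest allowed page size and waits between calls to stay under the rate limit. The `clock` and `sleep` parameters are injectable only so the throttling can be tested without real waiting.

```python
import time

MAX_PAGE_SIZE = 100               # hypothetical payload cap imposed by the API
_FAKE_RECORDS = list(range(250))  # hypothetical data behind the endpoint

def fake_api_get(cursor: int, page_size: int):
    """Hypothetical paginated endpoint returning (items, next_cursor or None)."""
    if page_size > MAX_PAGE_SIZE:
        raise ValueError("page_size exceeds the API's payload limit")
    end = cursor + page_size
    next_cursor = end if end < len(_FAKE_RECORDS) else None
    return _FAKE_RECORDS[cursor:end], next_cursor

def batch_pull(fetch, page_size=MAX_PAGE_SIZE, max_requests_per_s=2.0,
               clock=time.monotonic, sleep=time.sleep):
    """Pull all pages, spacing requests so the rate limit is never exceeded."""
    min_interval = 1.0 / max_requests_per_s
    records, cursor, last_call = [], 0, None
    while cursor is not None:
        if last_call is not None:
            wait = min_interval - (clock() - last_call)
            if wait > 0:
                sleep(wait)  # throttle client-side rather than risk HTTP 429
        last_call = clock()
        items, cursor = fetch(cursor, page_size)
        records.extend(items)
    return records

if __name__ == "__main__":
    rows = batch_pull(fake_api_get)  # 3 requests, about 1 s total due to throttling
    print(len(rows))                 # 250
```

Throttling on the client keeps the job predictable; many real APIs also return retry hints on rate-limit errors, which a fuller implementation would honor.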

---

# 🧪 Practical Use Cases Coming Up

This module covers **two hands-on case studies**:
- **Batch ingestion from an API**
- **Streaming ingestion from Amazon Kinesis**

You'll work with real-world tools like:
- **AWS Glue**
- **Streaming platforms**
- **Secure file transfers**
- **Custom connectors**

---