# 📚 Table of Contents

- [Source Systems for Data Ingestion](#source-systems-for-data-ingestion)

## Source Systems for Data Ingestion

One of a data engineer's core responsibilities is extracting **raw data** from different source systems. This raw data can be **structured**, **semi-structured**, or **unstructured**, and it must be ingested before it can be processed downstream.

---

### 🧱 Types of Data

There are **three main categories** of data based on structure:

- **Structured Data**:  
  Data organized as tables of rows and columns.  
  *Example*: SQL tables, CSV files.

- **Semi-Structured Data**:  
  Data that does not fit a strict tabular form but still carries structure through tags or keys.  
  *Example*: JSON, XML.

- **Unstructured Data**:  
  Data with no predefined structure.  
  *Example*: Text, audio, video, images.

![Types of Data](./images/types_of_data.png)
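
To make the three categories concrete, here is a minimal Python sketch using only the standard library. The sample values (`csv_text`, `json_text`, `note`) are made up for illustration and do not come from any real system.

```python
import csv
import io
import json

# Structured: every row has the same columns (hypothetical sample data).
csv_text = "id,name,city\n1,Ada,London\n2,Linus,Helsinki\n"
rows = list(csv.DictReader(io.StringIO(csv_text)))

# Semi-structured: keys describe the data, but records may differ in shape.
json_text = '[{"id": 1, "name": "Ada", "tags": ["math"]}, {"id": 2, "name": "Linus"}]'
records = json.loads(json_text)

# Unstructured: free text with no predefined fields to query.
note = "Customer called about a late delivery and asked for a refund."

print(rows[0]["city"])                       # a column lookup works on every row
print([sorted(r.keys()) for r in records])   # the second record has no "tags" key
print(len(note.split()), "words of free text")
```

Note how the CSV rows can all be addressed by the same column names, while the JSON records need a check for optional keys, and the text has to be interpreted before any fields can be extracted from it.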

---

### 🔑 Example: Semi-Structured Data (JSON Format)

A common example of semi-structured data is **JSON (JavaScript Object Notation)**, which stores information as a collection of key-value pairs. A value can itself be an object or an array, so the pairs can be **nested** to represent complex data structures.

![Semi-Structured JSON Example](./images/semi_structured.png)
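
As a rough illustration of nested key-value pairs, the sketch below parses a small JSON document with Python's built-in `json` module. The order document and the `flatten` helper are hypothetical and exist only for this example.

```python
import json

# Hypothetical nested JSON document.
raw = """
{
  "order_id": 1001,
  "customer": {"name": "Ada", "address": {"city": "London", "zip": "N1"}},
  "items": [{"sku": "A-1", "qty": 2}, {"sku": "B-7", "qty": 1}]
}
"""
order = json.loads(raw)

# Nested values are reached by chaining keys and iterating over arrays.
city = order["customer"]["address"]["city"]
total_qty = sum(item["qty"] for item in order["items"])


def flatten(obj, prefix=""):
    """Flatten nested dicts into dotted keys; lists are kept as-is."""
    flat = {}
    for key, value in obj.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{path}."))
        else:
            flat[path] = value
    return flat


flat_order = flatten(order)
print(city, total_qty)
print(sorted(flat_order))
```

Flattening like this is one possible way to move nested data toward a tabular shape; arrays such as `items` usually need a separate decision (for example, one row per item).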

---

### 🗃️ Where Data is Stored

Depending on its structure and how it is produced, data lives in different kinds of source systems:

- **Structured / Semi-Structured**: Stored in relational or NoSQL databases.
- **Unstructured**: Stored as files (text, images, audio, etc.).
- **Streaming**: Real-time events emitted continuously by producers such as sensors or applications writing logs.

![Source Systems](./images/source_systems.png)
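
The three kinds of source system can be sketched in Python as follows. This is a simplified stand-in, not a production setup: an in-memory SQLite database plays the role of the database, a temporary file plays the role of file storage, and a generator plays the role of a streaming producer. All table, file, and field names are assumptions made for the example.

```python
import sqlite3
import tempfile
from pathlib import Path


def read_from_database():
    """Database source: query rows from an in-memory SQLite table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany(
        "INSERT INTO users (id, name) VALUES (?, ?)",
        [(1, "Ada"), (2, "Linus")],
    )
    rows = conn.execute("SELECT id, name FROM users ORDER BY id").fetchall()
    conn.close()
    return rows


def read_from_file():
    """File source: read raw bytes without assuming any structure."""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "note.txt"
        path.write_text("free-form text", encoding="utf-8")
        return path.read_bytes()


def sensor_events(n):
    """Streaming source: events arrive one at a time (simulated here)."""
    for i in range(n):
        yield {"sensor_id": "s-1", "seq": i, "temperature": 20.0 + i}


db_rows = read_from_database()
file_bytes = read_from_file()
events = list(sensor_events(3))
print(db_rows, file_bytes, events[-1])
```

The access pattern differs per source: a query returns a finite result set, a file is read as a whole or in chunks, and a stream is consumed event by event with no natural end.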

---

### 🗄️ Relational vs Non-Relational Databases

- **Relational Databases (SQL)**:  
  Store data in tables (rows and columns) with a predefined schema. Best for structured data.

- **Non-Relational Databases (NoSQL)**:  
  Store data as key-value pairs or documents rather than fixed tables. Good for semi-structured or nested data.

![Databases](./images/databases.png)
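
The difference between the two models can be sketched by storing the same information both ways. In this assumed example, SQLite (from Python's standard library) represents the relational side, and a plain Python dict stands in for a document store; a real NoSQL database would expose its own client API. The table names, keys, and values are hypothetical.

```python
import sqlite3

# Relational: data is split across fixed tables and recombined with a JOIN.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(id),
        total REAL NOT NULL
    );
    INSERT INTO customers VALUES (1, 'Ada');
    INSERT INTO orders VALUES (10, 1, 25.0), (11, 1, 40.0);
    """
)
relational_total = conn.execute(
    "SELECT SUM(o.total) FROM orders o "
    "JOIN customers c ON c.id = o.customer_id WHERE c.name = ?",
    ("Ada",),
).fetchone()[0]
conn.close()

# Document style: one nested record per key (a dict stands in for the store).
documents = {
    "customer:1": {
        "name": "Ada",
        "orders": [{"id": 10, "total": 25.0}, {"id": 11, "total": 40.0}],
    }
}
document_total = sum(o["total"] for o in documents["customer:1"]["orders"])

print(relational_total, document_total)
```

Both versions answer the same question. The relational version enforces column types and relationships up front, while the document version keeps related data together and lets each record vary in shape.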

---

### 🔁 Putting It All Together: Source System Ingestion

Whether it comes from **databases**, **files**, or **streaming systems**, data of every type eventually flows into the **ingestion pipeline**. These source systems are the starting point of the data engineering lifecycle.

![Source System Ingestion Overview](./images/source_system_ingestion.png)
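
One way to picture this convergence is a small ingestion function that wraps records from any source in a common envelope. The sketch below is illustrative only: the `ingest` function, the envelope fields (`source`, `ingested_at`, `payload`), and the in-memory `landing_zone` list are assumptions for this example, not a standard format.

```python
from datetime import datetime, timezone


def ingest(source, records):
    """Wrap each raw record in a common envelope (hypothetical format)."""
    for record in records:
        yield {
            "source": source,
            "ingested_at": datetime.now(timezone.utc).isoformat(),
            "payload": record,
        }


# Stand-ins for the three kinds of source system.
database_rows = [{"id": 1, "name": "Ada"}]
file_contents = ["free-form text from a file"]
stream_events = ({"seq": i} for i in range(2))

# A list stands in for the pipeline's landing area.
landing_zone = []
for source, records in [
    ("database", database_rows),
    ("file", file_contents),
    ("stream", stream_events),
]:
    landing_zone.extend(ingest(source, records))

for entry in landing_zone:
    print(entry["source"], entry["payload"])
```

Keeping the raw payload untouched and recording where and when it arrived is a common design choice, since it lets downstream steps decide how to parse each type of data.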