# Data Architecture Diagram

The data architecture diagram consists of five layers, from data sources at the top to consumption at the bottom:

## 1️⃣ Data Sources
- **Databases (SQL, NoSQL)**: Structured and semi-structured data from transactional and analytical databases.
- **APIs & Web Services**: Data retrieved from external APIs (e.g., REST, GraphQL) or internal microservices.
- **IoT & Streaming Data**: Real-time data from IoT devices, sensors, and streaming platforms.
- **Files & Logs**: Raw data stored in files such as CSV, JSON, logs, and flat files.
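Files and logs are often the simplest sources to read. Below is a minimal sketch that turns a CSV export and a JSON-lines log into lists of Python dicts, so that later layers see a uniform record shape. The sample inputs and function names are hypothetical and are only there for illustration.

```python
import csv
import io
import json

# Hypothetical inline samples standing in for a CSV export and a JSON-lines log file.
CSV_EXPORT = "order_id,amount\n1,19.99\n2,5.00\n"
JSON_LOG = '{"level": "INFO", "msg": "started"}\n{"level": "ERROR", "msg": "timeout"}\n'


def read_csv_source(text):
    """Parse CSV text into a list of dicts keyed by the header row (values stay strings)."""
    return list(csv.DictReader(io.StringIO(text)))


def read_jsonl_source(text):
    """Parse JSON-lines text (one JSON object per line), skipping blank lines."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


if __name__ == "__main__":
    print(read_csv_source(CSV_EXPORT))
    print(read_jsonl_source(JSON_LOG))
```

In a real pipeline the text would come from files or object storage rather than inline strings.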

## 2️⃣ Ingestion Layer
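- **Data Ingestion (ETL/ELT)**: Pipelines that move data from the sources into the platform. In ETL (Extract, Transform, Load), data is transformed before it is loaded. In ELT (Extract, Load, Transform), raw data is loaded first and then transformed inside the target system.
- In both cases, this is the layer where data is cleansed, transformed, and prepared for downstream processing.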
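The ETL order can be illustrated with a minimal sketch. An in-memory SQLite database stands in for the target store, and the raw extract is a hypothetical inline CSV that contains padded whitespace, a missing amount, and a duplicate row. The `payments` table and the function names are illustrative assumptions. An ELT variant would load the raw rows first and run the cleansing as SQL inside the target.

```python
import csv
import io
import sqlite3

# Hypothetical raw extract: padded whitespace, a row with no amount, and a duplicate.
RAW_CSV = "customer,amount\n alice ,10.50\nbob,\n alice ,10.50\ncarol,7.25\n"


def extract(text):
    """Extract: read raw rows from the source."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: trim names, drop incomplete rows, cast types, remove exact duplicates."""
    seen = set()
    clean = []
    for row in rows:
        name = (row.get("customer") or "").strip()
        amount = (row.get("amount") or "").strip()
        if not name or not amount:
            continue  # cleanse: skip incomplete records
        record = (name, float(amount))
        if record in seen:
            continue  # cleanse: skip exact duplicates
        seen.add(record)
        clean.append(record)
    return clean


def load(records, conn):
    """Load: write the cleaned records into the (hypothetical) target table."""
    conn.execute("CREATE TABLE IF NOT EXISTS payments (customer TEXT, amount REAL)")
    conn.executemany("INSERT INTO payments VALUES (?, ?)", records)
    conn.commit()


def run_etl(text):
    conn = sqlite3.connect(":memory:")  # stands in for a real warehouse or lake
    load(transform(extract(text)), conn)
    return conn


if __name__ == "__main__":
    conn = run_etl(RAW_CSV)
    print(conn.execute("SELECT customer, amount FROM payments ORDER BY customer").fetchall())
```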

## 3️⃣ Processing Layer
- **Batch Processing**: Periodic processing of large, bounded data sets (e.g., Apache Spark, Hadoop MapReduce).
- **Stream Processing**: Continuous, low-latency processing of data as it arrives (e.g., Apache Kafka with Kafka Streams, Azure Stream Analytics).
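The difference between the two modes can be shown in plain Python, without Spark or Kafka. Below is a minimal sketch with hypothetical sensor readings. The batch function sees the whole data set at once. The streaming generator consumes one event at a time and emits an updated result after each event, using a sliding window of recent values (the window size of 3 is an arbitrary choice).

```python
from collections import deque


def batch_average(values):
    """Batch: the full, bounded data set is available up front, so compute over all of it."""
    return sum(values) / len(values) if values else 0.0


def stream_moving_average(events, window=3):
    """Stream: consume events one by one and yield a moving average after each event,
    keeping only the most recent `window` readings in memory."""
    recent = deque(maxlen=window)
    for value in events:
        recent.append(value)
        yield sum(recent) / len(recent)


if __name__ == "__main__":
    readings = [10, 20, 30, 40]  # hypothetical sensor readings
    print(batch_average(readings))                  # 25.0
    print(list(stream_moving_average(readings)))    # [10.0, 15.0, 20.0, 30.0]
```

The streaming version never needs the full data set, which is why the same pattern works on an unbounded source such as a message queue.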

## 4️⃣ Storage Layer
- **Data Lake**: Stores raw, semi-structured, and unstructured data at scale.
- **Data Warehouse**: Structured storage optimized for analytics and reporting.
- **Data Lakehouse**: Hybrid storage combining features of Data Lakes and Data Warehouses.
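- **Data Mesh**: Not a storage technology but a decentralized organizational approach in which domain teams own and manage their own data, which may live in any of the storage options above.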
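Below is a minimal sketch of the lake/warehouse contrast using only the standard library. It assumes a local directory as the "lake" and an in-memory SQLite database as the "warehouse". The lake keeps raw, possibly differently shaped events as-is in date-partitioned JSON-lines files (schema-on-read). The warehouse loads a fixed-schema aggregate table (schema-on-write). The event data, the `date=...` partition naming, and the `daily_revenue` table are hypothetical.

```python
import json
import sqlite3
import tempfile
from collections import defaultdict
from pathlib import Path

# Hypothetical raw events; the second one carries an extra field (semi-structured).
EVENTS = [
    {"date": "2024-01-01", "user": "alice", "amount": 10.0},
    {"date": "2024-01-01", "user": "bob", "amount": 5.0, "coupon": "NEWYEAR"},
    {"date": "2024-01-02", "user": "alice", "amount": 7.5},
]


def write_to_lake(events, root):
    """Data lake: append raw events unchanged, partitioned by date (schema-on-read)."""
    for event in events:
        partition = Path(root) / f"date={event['date']}"
        partition.mkdir(parents=True, exist_ok=True)
        with open(partition / "events.jsonl", "a", encoding="utf-8") as f:
            f.write(json.dumps(event) + "\n")


def load_warehouse(events, conn):
    """Data warehouse: load a fixed-schema, aggregated table (schema-on-write)."""
    totals = defaultdict(float)
    for event in events:
        totals[event["date"]] += event["amount"]
    conn.execute("CREATE TABLE daily_revenue (day TEXT PRIMARY KEY, revenue REAL)")
    conn.executemany("INSERT INTO daily_revenue VALUES (?, ?)", sorted(totals.items()))
    conn.commit()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as lake_root:
        write_to_lake(EVENTS, lake_root)
        print(sorted(p.name for p in Path(lake_root).iterdir()))
    conn = sqlite3.connect(":memory:")
    load_warehouse(EVENTS, conn)
    print(conn.execute("SELECT day, revenue FROM daily_revenue ORDER BY day").fetchall())
```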

## 5️⃣ Consumption Layer
- **BI & Analytics**: Business Intelligence dashboards, reporting tools (e.g., Power BI, Tableau).
- **Machine Learning**: AI models trained on structured and unstructured data.
- **Operational Applications**: Embedded analytics and real-time data in business applications.
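On the consumption side, a BI tool typically issues aggregate SQL against the warehouse and renders the result as a chart or table. Below is a minimal sketch of such a query. An in-memory SQLite table with hypothetical sales rows stands in for the warehouse, and both function names are illustrative.

```python
import sqlite3


def build_sample_warehouse():
    """Create a small in-memory 'warehouse' table filled with hypothetical sales rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, product TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?, ?)",
        [("EU", "A", 100.0), ("EU", "B", 50.0), ("US", "A", 80.0), ("US", "A", 20.0)],
    )
    conn.commit()
    return conn


def revenue_by_region(conn):
    """BI-style query: total revenue per region, as a dashboard tile might display it."""
    return conn.execute(
        "SELECT region, SUM(amount) AS revenue FROM sales GROUP BY region ORDER BY revenue DESC"
    ).fetchall()


if __name__ == "__main__":
    print(revenue_by_region(build_sample_warehouse()))  # [('EU', 150.0), ('US', 100.0)]
```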

![Data Architecture](https://drive.google.com/uc?id=1HO27aUW6XpGjIoGfs-NwkZtgwoFwwVYL)



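The diagram can be generated with the Python code below, which uses the `graphviz` package. Rendering to PNG also requires the Graphviz system executables (such as `dot`) to be installed.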

```python
from graphviz import Digraph

def generate_data_architecture_diagram2():
    dot = Digraph("Data_Architecture")
    dot.attr(rankdir="TB")  # Top to Bottom layout

    # Data Sources
    dot.node("A1", "Databases (SQL, NoSQL)", shape="parallelogram", style="filled", fillcolor="lightblue")
    dot.node("A2", "APIs & Web Services", shape="parallelogram", style="filled", fillcolor="lightblue")
    dot.node("A3", "IoT & Streaming Data", shape="parallelogram", style="filled", fillcolor="lightblue")
    dot.node("A4", "Files & Logs", shape="parallelogram", style="filled", fillcolor="lightblue")

    # Ingestion Layer
    dot.node("B", "Data Ingestion (ETL/ELT)", shape="box", style="filled", fillcolor="lightgray")
    dot.edge("A1", "B", label="Structured Data")
    dot.edge("A2", "B", label="API Calls")
    dot.edge("A3", "B", label="Real-time Data")
    dot.edge("A4", "B", label="Flat Files & Logs")

    # Processing Layer
    dot.node("C", "Processing (Batch/Stream)", shape="box", style="filled", fillcolor="lightgray")
    dot.edge("B", "C", label="Transformed Data")

    # Storage Layer
    dot.node("D1", "Data Lake", shape="cylinder", style="filled", fillcolor="lightyellow")
    dot.node("D2", "Data Warehouse", shape="cylinder", style="filled", fillcolor="lightyellow")
    dot.node("D3", "Data Lakehouse", shape="cylinder", style="filled", fillcolor="lightyellow")
    dot.node("D4", "Data Mesh", shape="cylinder", style="filled", fillcolor="lightyellow")
    dot.edge("C", "D1", label="Raw & Semi-Structured Data")
    dot.edge("C", "D2", label="Structured & Aggregated Data")
    dot.edge("C", "D3", label="Unified Storage & Processing")
    dot.edge("C", "D4", label="Decentralized Data Management")

    # Consumption Layer
    dot.node("E1", "BI & Analytics", shape="ellipse", style="filled", fillcolor="lightgreen")
    dot.node("E2", "Machine Learning", shape="ellipse", style="filled", fillcolor="lightgreen")
    dot.node("E3", "Operational Applications", shape="ellipse", style="filled", fillcolor="lightgreen")
    dot.edge("D1", "E2", label="Big Data Processing")
    dot.edge("D2", "E1", label="Querying & Dashboards")
    dot.edge("D3", "E1", label="BI & AI Workloads")
    dot.edge("D4", "E3", label="Decentralized Use Cases")

    # Save Diagram (writes data_architecture2.png; cleanup=True deletes the intermediate DOT source file)
    dot.render("data_architecture2", format="png", cleanup=True)
    print("Data architecture diagram generated as 'data_architecture2.png'")


if __name__ == "__main__":
    generate_data_architecture_diagram2()
```



Data architecture diagram generated as 'data_architecture.png'
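Because every source feeds the single ingestion node `B`, every consumption node in the diagram should be reachable from every source. Below is a small sketch that checks this with breadth-first search. The `EDGES` list is a hand copy of the node IDs used in `generate_data_architecture_diagram2` (edge labels omitted), so it is an assumption that it stays in sync with the generator.

```python
from collections import deque

# Hand copy of the diagram's edges (node IDs only); keep in sync with the generator.
EDGES = [
    ("A1", "B"), ("A2", "B"), ("A3", "B"), ("A4", "B"),
    ("B", "C"),
    ("C", "D1"), ("C", "D2"), ("C", "D3"), ("C", "D4"),
    ("D1", "E2"), ("D2", "E1"), ("D3", "E1"), ("D4", "E3"),
]


def reachable_from(start, edges):
    """Breadth-first search returning the set of nodes reachable from `start`."""
    graph = {}
    for src, dst in edges:
        graph.setdefault(src, []).append(dst)
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


if __name__ == "__main__":
    for source in ["A1", "A2", "A3", "A4"]:
        print(source, {"E1", "E2", "E3"} <= reachable_from(source, EDGES))
```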
