# Lecture 28. Multi-hop Architecture

### Reference
  - [Databricks Glossary > Medallion Architecture](https://www.databricks.com/glossary/medallion-architecture)  



In this video, we will talk about **multi-hop architecture** in the *Lakehouse*.

You will:
- learn what an *incremental multi-hop pipeline* is, and
- understand its different layers, represented by *bronze*, *silver*, and *gold* tables.

## Introduction to Multi-Hop Architecture

A **multi-hop architecture**, also known as **Medallion architecture**, is a data design pattern used to logically organize data into multiple layers.

Its goal is to **incrementally improve the structure and the quality of the data as it flows through each layer** of the architecture.

### Layers of Multi-Hop Architecture

The **multi-hop architecture** usually consists of three layers: *bronze*, *silver*, and *gold*.

<div style="text-align: center;">
<img src="../../assets/images/Presentation-Images/Multi-Hop Architecture.jpg" style="width:640px" >
</div> 

- The **bronze table** contains raw data ingested from various sources, such as *JSON files*, *operational databases*, or *Kafka streams*.
  
- Then, the **silver table** provides a more refined view of the data. For example, data can be cleaned and filtered at this level. We can also join fields from various *bronze tables* to enrich our *silver records*.
  
- Lastly, the **gold table** provides business-level aggregations, often used for *reporting*, *dashboarding*, or even for *machine learning*.
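To make the three layers concrete, here is a minimal pure-Python sketch. It is not Databricks or Spark code. The order and customer records, the `to_silver` and `to_gold` helper names, and the "amount must be positive" quality rule are all hypothetical, chosen only to show raw data being cleaned, filtered, and enriched with a join, then aggregated into a business-level result.

```python
import json
from collections import defaultdict

# Bronze: raw records kept exactly as ingested (hypothetical JSON order events).
bronze_orders = [
    '{"order_id": 1, "customer_id": "c1", "amount": 30.0}',
    '{"order_id": 2, "customer_id": "c2", "amount": 12.5}',
    '{"order_id": 3, "customer_id": "c1", "amount": -5.0}',  # fails quality rule
    'not valid json',                                         # corrupt record
]
# A second bronze table (hypothetical), e.g. ingested from an operational database.
bronze_customers = [
    {"customer_id": "c1", "country": "FR"},
    {"customer_id": "c2", "country": "DE"},
]


def to_silver(raw_orders, customers):
    """Silver: parse, filter, and enrich raw orders with a customer field."""
    country_by_id = {c["customer_id"]: c["country"] for c in customers}
    silver = []
    for raw in raw_orders:
        try:
            order = json.loads(raw)
        except json.JSONDecodeError:
            continue  # drop records that cannot be parsed
        if order.get("amount", 0) <= 0:
            continue  # drop records that fail a simple quality rule
        # Join with the customers bronze table to enrich the record.
        order["country"] = country_by_id.get(order.get("customer_id"), "unknown")
        silver.append(order)
    return silver


def to_gold(silver_orders):
    """Gold: business-level aggregate, total amount per country."""
    totals = defaultdict(float)
    for order in silver_orders:
        totals[order["country"]] += order["amount"]
    return dict(totals)


silver_orders = to_silver(bronze_orders, bronze_customers)
gold_revenue = to_gold(silver_orders)
print(gold_revenue)  # {'FR': 30.0, 'DE': 12.5}
```

Note that the bronze lists are never modified: each layer produces a new, more refined dataset from the one before it.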

### Incremental Improvement of Data

So, as you can see, with this architecture, we incrementally improve the structure and the quality of data as it flows through each layer.

## Benefits of Multi-Hop Architecture

There are many benefits to using **multi-hop architecture**:

- Clearly, it is a **simple data model** that is easy to understand and implement.
  
- It enables **incremental ETL** (Extract, Transform, Load): each layer can process only the data that arrived since the last run, instead of reprocessing the entire dataset every time.
  
- It can **combine *streaming* and *batch* workloads in the same pipeline**, allowing each stage to be configured as a *batch* or *streaming* job.
  
- Additionally, it lets you **recreate your tables from *raw data* at any time**, because the bronze layer keeps the original data as it was ingested.
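The incremental-processing and rebuild benefits can be illustrated with a toy, pure-Python pipeline. This is a sketch under stated assumptions, not how Databricks implements it: the `Pipeline` class, its `checkpoint` offset, and the `run_increment` / `rebuild_from_raw` methods are hypothetical names. The bronze list is append-only, the checkpoint records how many bronze rows have already been processed, and a rebuild simply resets the downstream tables and replays all raw data.

```python
from dataclasses import dataclass, field


@dataclass
class Pipeline:
    """Toy bronze -> silver -> gold pipeline with a hypothetical offset checkpoint."""
    bronze: list = field(default_factory=list)  # append-only raw events
    silver: list = field(default_factory=list)  # cleaned events
    gold: dict = field(default_factory=dict)    # total amount per user
    checkpoint: int = 0                         # bronze records already processed

    def ingest(self, raw_events):
        # Bronze only ever appends; raw data is never modified.
        self.bronze.extend(raw_events)

    def run_increment(self):
        # Incremental ETL: process only bronze records added since the last run.
        new_events = self.bronze[self.checkpoint:]
        cleaned = [e for e in new_events if e.get("amount") is not None]
        self.silver.extend(cleaned)
        for e in cleaned:
            self.gold[e["user"]] = self.gold.get(e["user"], 0) + e["amount"]
        self.checkpoint = len(self.bronze)
        return len(new_events)

    def rebuild_from_raw(self):
        # Recreate downstream tables: drop them and replay every bronze record.
        self.silver, self.gold, self.checkpoint = [], {}, 0
        return self.run_increment()


p = Pipeline()
p.ingest([{"user": "a", "amount": 10}, {"user": "b", "amount": None}])
print(p.run_increment(), p.gold)     # 2 {'a': 10}
p.ingest([{"user": "a", "amount": 5}])
print(p.run_increment(), p.gold)     # 1 {'a': 15}
print(p.rebuild_from_raw(), p.gold)  # 3 {'a': 15}
```

Calling `run_increment` on a schedule resembles a batch job, while calling it continuously as data arrives resembles a streaming job; the same layer logic serves both. In Databricks, this progress tracking is normally handled by Structured Streaming checkpoints rather than a hand-written offset like the one above.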