# Introduction to dbt: 10 Must-Know Concepts For Data Engineers

## What is dbt and why should you care?

I recently read somewhere on the Internet that the first data scientist a company hires becomes a data engineer.

## Prerequisites

Only a few:
- __Basic to intermediate SQL__: if you know how to use the WHERE and GROUP BY clauses, you are good to go.
- __Familiarity with the terminal__: you should feel comfortable working in the terminal, using virtual environments and installing software with package managers like `pip` or `homebrew`.
- __Basics of data warehouses__: fundamental knowledge of data engineering is a giant plus. It doesn't have to be deep (you don't need to know [Kimball's four-step process](https://campus.datacamp.com/courses/introduction-to-data-warehousing/data-warehouse-data-modeling?ex=5)), but it should be enough to make sense of the common terms.
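
As a rough self-check for the SQL prerequisite, the query below filters rows with WHERE and then aggregates them with GROUP BY. It runs against an in-memory SQLite database from Python's standard library, used here only as a convenient stand-in for a real warehouse; the `orders` table and its columns are made up for illustration. If the SQL reads naturally to you, you know enough SQL for this guide.

```python
import sqlite3


def revenue_by_country(min_amount: float) -> list:
    """Total order amount per country, counting only orders >= min_amount."""
    con = sqlite3.connect(":memory:")  # throwaway in-memory database
    con.execute("CREATE TABLE orders (id INTEGER, country TEXT, amount REAL)")
    con.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [(1, "US", 120.0), (2, "US", 15.0), (3, "DE", 80.0), (4, "DE", 45.0), (5, "FR", 10.0)],
    )
    rows = con.execute(
        """
        SELECT country, SUM(amount) AS total
        FROM orders
        WHERE amount >= ?   -- filter individual rows first
        GROUP BY country    -- then aggregate the rows that are left
        ORDER BY total DESC
        """,
        (min_amount,),
    ).fetchall()
    con.close()
    return rows


print(revenue_by_country(20))  # [('DE', 125.0), ('US', 120.0)]
```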

If you don't meet these criteria but your boss (or you yourself) still wants you to learn dbt, you can use the following resources:

PASTE THE RESOURCES HERE

## What will this guide cover?

The open-source community loves dbt, so it has been integrated with nearly every tool that works with data. The result? Documentation so massive that even the quick start guides are longer than the docs of entire Python libraries.

So, my goal with this article is to introduce you to 10 core concepts of dbt, with a moderate amount of technical detail sprinkled in between. By the end of the tutorial, you should be able to open any page of the dbt docs and figure out what is going on.

Let's get started!

## 0. Data warehouse

The first concept we must familiarize ourselves with is the data warehouse. A warehouse is the central place where a company stores its data.

Companies build warehouses because they enable analytics and everything else you can do with data (ahem, structured data). Warehouses store historical data organized into tables and are optimized for fast querying and analysis.

There are many tools that implement data warehouses:
- PostgreSQL
- MySQL
- Snowflake
- BigQuery
- Redshift

and so on. 

dbt can connect to all these tools and help you perform transformations on the data within. This means that to use dbt, you should already have a warehouse (a database) populated with data. 

dbt doesn't collect or load data; it transforms data that is already in your warehouse. In other words, it handles the T in the ETL/ELT process (extract, transform, load) that is at the heart of every warehouse.
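
To make the "T" concrete, here is a minimal sketch of the pattern dbt automates, written in plain Python with SQLite (from the standard library) standing in for the warehouse. This is not dbt code: it only mimics the idea that a transformation is a SQL SELECT over data that has already been loaded, saved in the warehouse as a view. The `raw_orders` and `orders_cleaned` names are hypothetical.

```python
import sqlite3

con = sqlite3.connect(":memory:")  # stand-in for a real warehouse

# E + L: raw data lands in the warehouse as-is (inconsistent casing, amounts
# in cents, a test order). Loading tools handle this step, not dbt.
con.execute("CREATE TABLE raw_orders (id INTEGER, status TEXT, amount_cents INTEGER)")
con.executemany(
    "INSERT INTO raw_orders VALUES (?, ?, ?)",
    [(1, "Completed", 1250), (2, "completed", 800), (3, "RETURNED", 500), (4, "test", 0)],
)

# T: the transformation is just a SELECT over the loaded data. In dbt you
# write only this SELECT; dbt generates the CREATE VIEW / CREATE TABLE for you.
transform_sql = """
    SELECT
        id,
        LOWER(status)        AS status,
        amount_cents / 100.0 AS amount
    FROM raw_orders
    WHERE LOWER(status) != 'test'
"""
con.execute(f"CREATE VIEW orders_cleaned AS {transform_sql}")

print(con.execute("SELECT * FROM orders_cleaned ORDER BY id").fetchall())
# [(1, 'completed', 12.5), (2, 'completed', 8.0), (3, 'returned', 5.0)]
```

Note that `raw_orders` is never modified: the cleaned result is derived from it, so it can be rebuilt at any time.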

## 1. dbt Core vs. dbt Cloud

dbt is offered through two interfaces: __dbt Core__ and __dbt Cloud__.

dbt Core is an open-source library that implements most of the functionality of dbt. It has a command-line interface (the `dbt` command you will come to love) that you can use to manage data transformations in your projects. 

dbt Cloud is an enterprise solution for teams. On top of the CLI, dbt Cloud also provides a more user-friendly web-based IDE. With it, you don't have to worry as much about database connections and editing YAML files (as you will see in the coming sections).

dbt Cloud also offers additional features like job scheduling, advanced integrations and high-priority support.

Here is a table summarizing the differences between dbt Core and dbt Cloud:

![Comparison of dbt Core and dbt Cloud](attachment:a7c57259-f2aa-44ea-b4dd-95af66981802.png)

Despite dbt Cloud's extra features, we will cover dbt Core, as it is best suited for local projects, testing and learning. You can install it with `pip` on any OS (inside a virtual environment, of course).

I will use a Conda environment:

```
$ conda create -n learn_dbt python=3.11 -y
$ conda activate learn_dbt
$ pip install dbt-core dbt-<adapter_name>
```

Replace `<adapter_name>` with the adapter for the data platform you want to use. dbt Labs (the company behind dbt) and the open-source community maintain adapters for many data platforms.

In this article, we will use `dbt-duckdb`, which lets us connect to a DuckDB database. You can, however, use any of the adapters listed on [this page of dbt docs](https://docs.getdbt.com/docs/core/connect-data-platform/about-core-connections).

```
$ pip install dbt-core dbt-duckdb
```
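
If you want to confirm from Python that the packages landed in the active environment, the standard library's `importlib.metadata` is enough. This is a convenience sketch, not part of dbt; it only reports which distributions are installed and at what version (running `dbt --version` in the terminal gives similar information).

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, List, Optional


def installed_versions(packages: List[str]) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if missing."""
    found: Dict[str, Optional[str]] = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None  # not installed in the current environment
    return found


if __name__ == "__main__":
    for pkg, ver in installed_versions(["dbt-core", "dbt-duckdb"]).items():
        print(f"{pkg}: {ver or 'NOT INSTALLED'}")
```

Run it with the `learn_dbt` environment activated; a `NOT INSTALLED` line usually means `pip` installed into a different environment.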

That's it for initial setup!

## 2. dbt Projects

## 3. dbt Project profiles

## 4. dbt Models

## 5. dbt Commands

## 6. Important dbt YAML files

## 7. Hierarchy in dbt Models

## 8. Jinja Templating in dbt

## 9. dbt Tests

## 10. dbt Documentation

## A typical dbt workflow you can follow

## Conclusion and further resources