
Commit

docs: Tutorial small cleanup (meltano#6809)
* fix tutorial part 1's link to part 2

* minor wording updates
pnadolny13 committed Sep 29, 2022
1 parent f3fcbc2 commit e02c96f
Showing 3 changed files with 6 additions and 6 deletions.
2 changes: 1 addition & 1 deletion docs/src/_getting-started/part1.md
@@ -376,7 +376,7 @@ $ cat output/commits.jsonl
</div>
## Next Steps
- Next, head over to [Part 2: Loading extracted data into a target (currently inside the large Getting Started Tutorial)](/getting-started/#add-a-loader-to-send-data-to-a-destination).
+ Next, head over to [Part 2: Loading extracted data into a target (currently inside the large Getting Started Tutorial)](/getting-started/part2).
<script src="/js/termynal.js" data-termynal-container="#termy1|#termy2|#termy3|#termy4|#termy5|#termy6|#termy7|#termy8|#termy9|#termy10"></script>
8 changes: 4 additions & 4 deletions docs/src/_getting-started/part3.md
@@ -12,7 +12,7 @@ Throughout this tutorial, we’ll walk you through the creation of a end-to-end

In parts [1](/getting-started/part1) & [2](/getting-started/part2), we extracted data from GitHub and loaded it into a (local) PostgreSQL database. Now it is time to have more fun. We decide to load all attributes from the data we selected previously, and then build a model listing the different authors of commits to our repository.

- That means, in this part we're going to unleash dbt [(data build tool)](https://www.getdbt.com/) onto our data to transform it into meaningful information. Don't worry, you don't need to know anything about dbt, this tutorial is self-contained. You do not need to install dbt yourself, it works as a dbt plugin.
+ That means, in this part we're going to unleash dbt [(data build tool)](https://www.getdbt.com/) onto our data to transform it into meaningful information. Don't worry, you don't need to know anything about dbt, this tutorial is self-contained. You do not need to install dbt yourself, it works as a Meltano plugin.

<div class="notification is-success">
<p>If you're having trouble throughout this tutorial, you can always head over to the <a href="https://meltano.com/slack">Slack channel</a> to get help.</p>
@@ -61,7 +61,7 @@ INFO METRIC: {"type": "timer", "metric": [...]
Next, we add the dbt plugin to transform this data.

## Install and configure the postgres specific dbt transformer
- Dbt uses different [adapters](https://docs.getdbt.com/docs/supported-data-platforms) depending on the database/warehouse/platform you use. Meltano transformers match this pattern; in this case our transformer is `dbt-postgres`. As usual, you can use the `meltano add` command to add it to your project.
+ dbt uses different [adapters](https://docs.getdbt.com/docs/supported-data-platforms) depending on the database/warehouse/platform you use. Meltano transformers match this pattern; in this case our transformer is `dbt-postgres`. As usual, you can use the `meltano add` command to add it to your project.
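For reference, the step the (collapsed) termy block below walks through boils down to a single CLI call; a minimal sketch, assuming the `dbt-postgres` transformer name used throughout this tutorial:

```bash
# Add the Postgres-flavored dbt transformer; as noted below, this also pulls in a file bundle
meltano add transformer dbt-postgres
```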

<div class="termy">

@@ -100,7 +100,7 @@ To learn more about file bundle 'files-dbt-postgres', visit https://hub.meltano.
</div>

<br />
- As you can see, this adds both the transformer as well as a "file bundle" to your project. You can verify that this worked by viewing the newly populated directory `transform`.
+ As you can see, this adds both the transformer as well as a [file bundle](/concepts/plugins#file-bundles) to your project. You can verify that this worked by viewing the newly populated directory `transform`.
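A quick way to do that check from the project root; a trivial sketch, assuming nothing beyond the `transform` directory named above:

```bash
# List what the file bundle scaffolded into the project
ls transform/
```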

## Configure dbt
Configure the dbt-postgres transformer to use the same configuration as our target-postgres loader using `meltano config`:
@@ -123,7 +123,7 @@ $ meltano config dbt-postgres set schema analytics
</div>
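For orientation, a sketch of what that configuration step can look like. The connection values below are illustrative placeholders (only the `schema analytics` call appears in the hunk above), and the setting names assume `dbt-postgres` mirrors the usual Postgres connection settings:

```bash
# Point dbt-postgres at the same database that target-postgres loads into
# (placeholder values -- reuse whatever you configured for target-postgres in part 2)
meltano config dbt-postgres set host localhost
meltano config dbt-postgres set port 5432
meltano config dbt-postgres set user meltano
meltano config dbt-postgres set password password
meltano config dbt-postgres set dbname postgres
meltano config dbt-postgres set schema analytics
```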

## Add our source data to dbt
- The E(t)L pipeline run already added our source data into the schema `tap_github` as table `commits`. Dbt will need to know where to locate this dataa. Let's add that to dbt to work with:
+ The E(t)L pipeline run already added our source data into the schema `tap_github` as table `commits`. dbt will need to know where to locate this data. Let's add that to our dbt project:
```bash
mkdir transform/models/tap_github
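# --- Illustrative sketch, not part of this commit's diff: the kind of dbt source
# --- definition this step sets up, assuming the tap_github schema and commits
# --- table named above (the file name source.yml is an assumption).
cat > transform/models/tap_github/source.yml <<'EOF'
version: 2
sources:
  - name: tap_github
    schema: tap_github
    tables:
      - name: commits
EOF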
```
2 changes: 1 addition & 1 deletion docs/src/_getting-started/part4.md
@@ -79,7 +79,7 @@ These lines define the name "hide-github-mails" as the name of our mapping. We c
field_paths: ["author/email", "committer/email"]
type: "HASH"
```
- These lines define one transformation. We instruct to target the stream "commits", and therein the field "commit". We then use the field paths to navigate to the two emails we know are contained within this message and set the type to "HASH". Using "HASH" means we will still be able to tell whether two emails are the same, but not be able to read the email. They will be replaced with a random hash.
+ These lines define one transformation. We instruct to target the stream "commits", and therein the field "commit". We then use the field paths to navigate to the two emails we know are contained within this message and set the type to "HASH". Using "HASH" means we will still be able to tell whether two emails are the same, but not be able to read the email. They will be replaced with a SHA-256 hash of the email.
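To make the "same input, same digest" property concrete, a tiny illustration with throwaway addresses; this is just `sha256sum` at the shell, not the mapper itself:

```bash
# Identical emails always produce the same digest, so you can still group or join on them,
# but the original address cannot be read back out of the hash.
echo -n "jane@example.com" | sha256sum
echo -n "jane@example.com" | sha256sum   # same digest as the line above
echo -n "john@example.com" | sha256sum   # different address, different digest
```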

## Run the data integration (E(t)LT) pipeline
Now we're ready to run the data integration process with these modifications again. To do so, we'll need to clean up first, since we already ran the EL process in part 1. The primary key is still the same and as such the ingestion would fail.
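As a rough preview of where this section heads (the rest of the doc spells out the actual cleanup and run commands), a sketch of a run that slots the mapping in between extractor and loader, assuming the `hide-github-mails` name from above and the PostgreSQL loader from part 2:

```bash
# Illustrative only -- the tutorial's cleanup step comes first, then something like:
meltano run tap-github hide-github-mails target-postgres
```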
