12 changes: 7 additions & 5 deletions docs/2-getting-started/start-free-with-cloud.md
@@ -172,11 +172,13 @@ Set up CI/CD to automatically upload metadata and run validation checks on every

See the CI/CD sections for complete setup guides:

- [Setup CD](/7-cicd/setup-cd/)
- [Setup CI](/7-cicd/setup-ci/)

- GitHub integration configured
- Team plan subscription or free trial
- [Getting Started with CI/CD](../7-cicd/ci-cd-getting-started.md)
- GitHub CI/CD
- [Setup CI for GitHub](../7-cicd/github/setup-ci.md)
- [Setup CD for GitHub](../7-cicd/github/setup-cd.md)
- GitLab CI/CD
- [Setup CI for GitLab](../7-cicd/gitlab/setup-ci.md)
- [Setup CD for GitLab](../7-cicd/gitlab/setup-cd.md)

### Automation Benefits

91 changes: 91 additions & 0 deletions docs/7-cicd/ci-cd-getting-started.md
@@ -0,0 +1,91 @@
---
title: CI/CD Getting Started
---

# CI/CD Getting Started

Automate data validation in your development workflow. Catch data issues before they reach production with continuous integration and delivery built specifically for dbt projects.

## What you'll achieve

Set up automated workflows that:

- **Maintain current baselines** - Auto-update comparison baselines on every merge to main
- **Validate every PR/MR** - Run data validation checks automatically when changes are proposed
- **Prevent regressions** - Catch data quality issues before they reach production
- **Save team time** - Eliminate manual validation steps for every change

!!! note

    CI/CD automation requires the Recce Cloud Team plan. A free trial is available.

## Understanding CI vs CD

Recce uses both continuous integration and continuous delivery to automate data validation:

**Continuous Integration (CI)**

- **When**: Runs on every PR/MR update
- **Purpose**: Validates proposed changes against baseline
- **Benefit**: Catches issues before merge, with results in your PR/MR

**Continuous Delivery (CD)**

- **When**: Runs after merge to main branch
- **Purpose**: Updates your baseline Recce session with latest production state
- **Benefit**: Ensures future comparisons use current baseline

## Choose your platform

Recce integrates with both GitHub Actions and GitLab CI/CD.

Select your Git platform to get started:

### GitHub
If your dbt project uses GitHub:

1. [Setup CI](./github/setup-ci.md) - Auto-validate changes in every PR
2. [Setup CD](./github/setup-cd.md) - Auto-update baseline on merge to main

### GitLab
If your dbt project uses GitLab:

1. [Setup CI](./gitlab/setup-ci.md) - Auto-validate changes in every MR
2. [Setup CD](./gitlab/setup-cd.md) - Auto-update baseline on merge to main
3. [GitLab Personal Access Token Guide](./gitlab/gitlab-pat-guide.md) - Required for GitLab integration

## Prerequisites

Before setting up, ensure you have:

- **Recce Cloud account** with Team plan or free trial
- **Repository connected** to Recce Cloud ([setup guide](../2-getting-started/start-free-with-cloud.md#git-integration))
- **dbt artifacts** (`manifest.json` and `catalog.json`) from your project
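
A quick pre-flight check for the last prerequisite can be sketched in shell. The function name `check_dbt_artifacts` is hypothetical (it is not part of Recce or dbt), and the default `target` directory assumes dbt's default output location. The check only confirms that both files exist and are non-empty.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of Recce or dbt): confirm that a directory
# holds the two dbt artifacts listed in the prerequisites.
check_dbt_artifacts() {
  local dir="${1:-target}"   # assumption: dbt's default output directory
  local missing=0
  local name
  for name in manifest.json catalog.json; do
    # -s is true only when the file exists and is not empty
    if [ ! -s "${dir}/${name}" ]; then
      echo "missing or empty: ${dir}/${name}" >&2
      missing=1
    fi
  done
  return "${missing}"
}
```

Calling `check_dbt_artifacts target` after `dbt docs generate` returns a non-zero status if either file is absent, which makes it usable as a guard before any upload step.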

## Architecture overview

Both CI and CD workflows follow the same pattern:

1. **Trigger event** (merge to main, or PR/MR opened/updated)
2. **Generate dbt artifacts** (`dbt docs generate` or external source)
3. **Upload to Recce Cloud** (automatic via workflow action)
4. **Validation results** appear in Recce dashboard and PR/MR

<figure markdown>
![Recce CI/CD architecture](../assets/images/7-cicd/ci-cd.png){: .shadow}
<figcaption>Automated validation workflow for pull requests</figcaption>
</figure>

## Next steps

1. Choose your platform (GitHub or GitLab)
2. Start with CD setup to establish baseline updates
3. Add CI setup to enable PR/MR validation
4. Review [best practices](./best-practices-prep-env.md) for environment preparation

## Related workflows

After setting up CI/CD automation, explore these workflow guides:

- [Development workflow](./scenario-dev.md) - Validate changes during development
- [PR/MR review workflow](./scenario-pr-review.md) - Collaborate on validation results
- [Preset checks](./preset-checks.md) - Configure automatic validation checks
211 changes: 211 additions & 0 deletions docs/7-cicd/github/scenario-ci.md
@@ -0,0 +1,211 @@
---
title: Setup CI in Open Source
---

# Recce CI integration with GitHub Actions

Recce provides the `recce run` command for CI/CD pipelines. You can integrate Recce with GitHub Actions (or other CI tools) to compare data models between two environments whenever a pull request is created. The image below shows the basic architecture.

![ci/cd architecture](/assets/images/7-cicd/ci-cd.png){: .shadow}

The following guide demonstrates how to configure Recce in GitHub Actions.

## Prerequisites

Before integrating Recce with GitHub Actions, you will need to configure the following items:

- Set up **two environments** in your data warehouse. For example, one for base and another for pull request.

- Provide a **credentials profile** for both environments in your `profiles.yml` so that Recce can access your data warehouse. You can store the credentials directly in `profiles.yml` or reference environment variables.

- Set up the **data warehouse credentials** in your [GitHub repository secrets](https://docs.github.com/en/actions/reference/encrypted-secrets).
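
If the credentials are supplied through environment variables, a workflow step can fail early when one is missing. The following is a minimal sketch, not part of Recce or dbt: `require_env` is a hypothetical helper, and any variable names passed to it (for example `DBT_USER` and `DBT_PASSWORD`) are placeholders for whatever your `profiles.yml` actually reads.

```bash
#!/usr/bin/env bash
# Hypothetical helper: fail early when a credential variable is unset or empty.
# Variable names are supplied by the caller; none are defined by Recce or dbt.
require_env() {
  local name
  local missing=0
  for name in "$@"; do
    # printenv prints the value of an exported variable, or nothing if unset
    if [ -z "$(printenv "${name}")" ]; then
      echo "environment variable not set: ${name}" >&2
      missing=1
    fi
  done
  return "${missing}"
}
```

A step could call `require_env DBT_USER DBT_PASSWORD` before running dbt, so a missing repository secret surfaces as a clear message rather than a connection error.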

## Set up Recce with GitHub Actions

We suggest setting up two GitHub Actions workflows in your repository: one for the base environment and one for the PR environment.

- **Base environment workflow**: Triggered on every merge to the `main` branch. This ensures that base artifacts are readily available when a PR is opened.

- **PR environment workflow**: Triggered on every push to a pull request branch. This workflow compares the base models with the current PR environment.

### Base Workflow (Main Branch)

This workflow will perform the following actions:

1. Run dbt on the base environment
2. Upload the generated dbt artifacts to [GitHub workflow artifacts](https://docs.github.com/en/actions/using-workflows/storing-workflow-data-as-artifacts) for later use

```yaml
name: Recce CI Base Branch

on:
  workflow_dispatch:
  push:
    branches:
      - main

concurrency:
  group: recce-ci-base
  cancel-in-progress: true

jobs:
  build:
    runs-on: ubuntu-latest

    steps:
      - uses: actions/checkout@v3

      - name: Set up Python
        uses: actions/setup-python@v2
        with:
          python-version: "3.10.x"

      - name: Install dependencies
        run: |
          pip install -r requirements.txt

      - name: Run DBT
        run: |
          dbt deps
          dbt seed --target ${{ env.DBT_BASE_TARGET }}
          dbt run --target ${{ env.DBT_BASE_TARGET }}
          dbt docs generate --target ${{ env.DBT_BASE_TARGET }}
        env:
          DBT_BASE_TARGET: "prod"

      - name: Upload DBT Artifacts
        uses: actions/upload-artifact@v4
        with:
          name: target
          path: target/
```

!!! note

    Place the above file at `.github/workflows/dbt_base.yml`. The PR workflow in the next section references this path, so if you place the file elsewhere, update that reference accordingly.

### PR Workflow (Pull Request Branch)

This workflow will perform the following actions:

1. Run dbt on the PR environment.
2. Download the previously generated base artifacts from the base workflow.
3. Use Recce to compare the PR environment with the downloaded base artifacts.
<!-- 4. Use Recce to generate the summary of the current changes and post it as a comment on the pull request. Please refer to the [Recce Summary](./recce-summary.md) for more information. -->

````yaml
name: Recce CI PR Branch

on:
  pull_request:
    branches: [main]

jobs:
  check-pull-request:
    name: Check pull request by Recce CI
    runs-on: ubuntu-latest
    steps:
      - name: Checkout repository
        uses: actions/checkout@v3
        with:
          fetch-depth: 0
      - name: Merge Base Branch into PR
        uses: DataRecce/PR-Update@v1
        with:
          baseBranch: ${{ github.event.pull_request.base.ref }}
          autoMerge: false
      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: "3.10.x"
      - name: Install dependencies
        run: |
          pip install -r requirements.txt
          pip install recce
      - name: Prepare dbt Base environment
        run: |
          gh repo set-default ${{ github.repository }}
          base_branch=${{ github.base_ref }}
          run_id=$(gh run list --workflow ${WORKFLOW_BASE} --branch ${base_branch} --status success --limit 1 --json databaseId --jq '.[0].databaseId')
          echo "Download artifacts from run $run_id"
          gh run download ${run_id} -n target -D target-base
        env:
          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
          WORKFLOW_BASE: ".github/workflows/dbt_base.yml"
      - name: Prepare dbt Current environment
        run: |
          git checkout ${{ github.event.pull_request.head.sha }}
          dbt deps
          dbt seed --target ${{ env.DBT_CURRENT_TARGET }}
          dbt run --target ${{ env.DBT_CURRENT_TARGET }}
          dbt docs generate --target ${{ env.DBT_CURRENT_TARGET }}
        env:
          DBT_CURRENT_TARGET: "dev"

      - name: Run Recce CI
        run: |
          recce run --github-pull-request-url ${{ github.event.pull_request.html_url }}

      - name: Upload DBT Artifacts
        uses: actions/upload-artifact@v4
        with:
          name: target
          path: target/

      - name: Upload Recce State File
        uses: actions/upload-artifact@v4
        id: recce-artifact-uploader
        with:
          name: recce-state-file
          path: recce_state.json
````
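
The "Prepare dbt Base environment" step assumes that at least one successful base workflow run exists. When none does, `run_id` may come back empty or as the literal string `null` (an assumption about the `--jq` output that is worth verifying in your setup), and `gh run download` then fails with a less obvious error. A minimal guard can be sketched as follows; `valid_run_id` is a hypothetical helper, not part of `gh` or Recce.

```bash
#!/usr/bin/env bash
# Hypothetical guard: accept only a non-empty, all-digit run id.
valid_run_id() {
  case "${1:-}" in
    ''|null) return 1 ;;     # no successful base run was found
    *[!0-9]*) return 1 ;;    # anything containing a non-digit is not a run id
    *) return 0 ;;
  esac
}

# Possible use inside the workflow step, before `gh run download`:
#   if ! valid_run_id "${run_id}"; then
#     echo "No successful base run found; run the base workflow first." >&2
#     exit 1
#   fi
```

Because the base workflow also declares `workflow_dispatch`, it can be triggered manually once to produce the first set of base artifacts.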
<!--
- name: Prepare Recce Summary
id: recce-summary
run: |
recce summary recce_state.json > recce_summary.md
cat recce_summary.md >> $GITHUB_STEP_SUMMARY
echo '${{ env.NEXT_STEP_MESSAGE }}' >> recce_summary.md

# Handle the case when the recce summary is too long to be displayed in the GitHub PR comment
if [[ `wc -c recce_summary.md | awk '{print $1}'` -ge '65535' ]]; then
echo '# Recce Summary
The recce summary is too long to be displayed in the GitHub PR comment.
Please check the summary detail in the [Job Summary](${{github.server_url}}/${{github.repository}}/actions/runs/${{github.run_id}}) page.
${{ env.NEXT_STEP_MESSAGE }}' > recce_summary.md
fi

env:
NEXT_STEP_MESSAGE: |
## Next Steps
If you want to check more detail information about the recce result, please download the [artifact](${{ steps.recce-artifact-uploader.outputs.artifact-url }}) file and open it by [Recce](https://pypi.org/project/recce/) CLI.

### How to check the recce result
```bash
# Unzip the downloaded artifact file
tar -xf recce-state-file.zip

# Launch the recce server based on the state file
recce server --review recce_state.json

# Open the recce server http://localhost:8000 by your browser
```

- name: Comment on pull request
uses: thollander/actions-comment-pull-request@v2
with:
filePath: recce_summary.md
comment_tag: recce
-->


## Review the Recce State File

Review the downloaded Recce [state file](../../8-technical-concepts/state-file.md) with the following command:

```bash
recce server --review recce_state.json
```

In `--review` mode, the Recce server lets you review the comparison results for the data models between the base and current environments, including the row counts of modified data models.
<!-- and the results of any Recce [Preset Checks](./preset-checks.md). -->
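
The state file is uploaded by the PR workflow as the `recce-state-file` artifact, so it can be fetched with `gh run download <run-id> -n recce-state-file`, where `<run-id>` is a placeholder for the PR workflow run. The sketch below wraps the review command with a check that the file is actually present; `review_state_file` is a hypothetical helper, not a Recce command.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper: refuse to start the review server when the state file
# is missing or empty, then hand off to the documented command.
review_state_file() {
  local state_file="${1:-recce_state.json}"
  if [ ! -s "${state_file}" ]; then
    echo "state file not found or empty: ${state_file}" >&2
    return 1
  fi
  recce server --review "${state_file}"
}
```

Running `review_state_file recce_state.json` in the download directory starts the server only when the file is in place.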