Batch pipeline steps r90

This documentation is outdated!

🚧 The latest setup guidance for Snowplow can be found on the Snowplow documentation site.

This page refers to Snowplow R90


Dataflow diagram

Recovery steps

The table below summarizes the recovery actions to take when a given step in the dataflow diagram above fails.

Failed step	Recovery actions
1	If no files have been moved yet (`raw:processing` [A] is empty), rerun EmrEtlRunner as usual. If some files have already been moved, rerun EmrEtlRunner with the `--skip staging` option to continue processing those log files.
2	Rerun EmrEtlRunner with the `--skip staging` option.
3	Rerun EmrEtlRunner with the `--skip staging` option. Note: `enriched:bad` [D] and `enriched:error` [E] may already contain files produced by step 3, so rerunning EmrEtlRunner can produce duplicate `bad`/`error` files. This can matter if the `elasticsearch` step [8-9] is used to examine `bad` data [D]: the same data would appear with different timestamps from different EMR runs.
4	Delete the `enriched:good` files [F] and rerun EmrEtlRunner with the `--skip staging` option.
5	Delete the `enriched:good` files [F] and rerun EmrEtlRunner with the `--skip staging` option.
6	Delete the `enriched:good` files [F] and rerun EmrEtlRunner with the `--skip staging` option. Note: `enriched:bad` [D] and `shredded:bad` [H] may already contain files produced by steps 3 and 6 respectively, so rerunning EmrEtlRunner can produce duplicate `bad` files. This can matter if the `elasticsearch` step [8-9] is used to examine `bad` data ([D], [H]): the same data would appear with different timestamps from different EMR runs.
7	Delete the `enriched:good` [F] and `shredded:good` [K] files, then rerun EmrEtlRunner with the `--skip staging` option.
8	If duplicated `bad` data is not critical, rerun EmrEtlRunner with the `--skip staging,enrich,shred` option. If duplicated `bad` data is critical, instructions to come (#2593). WARNING: In R90+, if you pass `--skip shred` to EmrEtlRunner, RDB Loader does not load unstructured events and contexts. This issue is to be resolved in R92.
9	If duplicated `bad` data is not critical, rerun EmrEtlRunner with the `--skip staging,enrich,shred` option. If duplicated `bad` data is critical, instructions to come (#2593). WARNING: In R90+, if you pass `--skip shred` to EmrEtlRunner, RDB Loader does not load unstructured events and contexts. This issue is to be resolved in R92.
10	Rerun EmrEtlRunner with the `--skip staging,enrich,shred,elasticsearch` option. WARNING: In R90+, if you pass `--skip shred` to EmrEtlRunner, RDB Loader does not load unstructured events and contexts. This issue is to be resolved in R92.
11	The data load cannot result in a partial load because it uses `COMMIT`. However, if more than one data target is configured, rerun EmrEtlRunner with the successfully loaded target removed from the `config.yml` configuration file to retry loading the failed target. Note: If the failure occurred at the `analyze` stage, you can skip it with the `--skip staging,enrich,shred,archive_raw,rdb_load` option.
12	Rerun EmrEtlRunner with the `--skip staging,enrich,shred,archive_raw,rdb_load` option.
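
The table above can also be read as a lookup from failed step to recovery plan. The following is a minimal sketch of that lookup in Python, not part of Snowplow itself: the names `SKIP_BY_STEP`, `DELETE_BEFORE_RERUN` and `recovery_plan` are hypothetical, and the function only builds the `--skip` option string from the table. It does not invoke EmrEtlRunner or delete any files. Step 11 is deliberately left out because its recovery depends on which target failed.

```python
from typing import Dict, List

# Skip lists copied from the recovery table, keyed by failed step number.
# (Hypothetical helper data, not a Snowplow API.)
SKIP_BY_STEP: Dict[int, List[str]] = {
    1: ["staging"],  # only applies when some files were already moved
    2: ["staging"],
    3: ["staging"],
    4: ["staging"],
    5: ["staging"],
    6: ["staging"],
    7: ["staging"],
    8: ["staging", "enrich", "shred"],
    9: ["staging", "enrich", "shred"],
    10: ["staging", "enrich", "shred", "elasticsearch"],
    12: ["staging", "enrich", "shred", "archive_raw", "rdb_load"],
}

# Locations the table says to delete before rerunning.
DELETE_BEFORE_RERUN: Dict[int, List[str]] = {
    4: ["enriched:good"],
    5: ["enriched:good"],
    6: ["enriched:good"],
    7: ["enriched:good", "shredded:good"],
}


def recovery_plan(failed_step: int, raw_processing_empty: bool = False) -> dict:
    """Return the locations to delete and the --skip option for a failed step."""
    if failed_step == 11:
        # The table describes a manual procedure here (edit the targets, rerun).
        raise ValueError("step 11 needs manual handling; see the table")
    if failed_step not in SKIP_BY_STEP:
        raise ValueError(f"unknown step: {failed_step}")

    # Step 1 with nothing moved yet: rerun as usual, with no --skip option.
    if failed_step == 1 and raw_processing_empty:
        skip: List[str] = []
    else:
        skip = SKIP_BY_STEP[failed_step]

    return {
        "delete": DELETE_BEFORE_RERUN.get(failed_step, []),
        "skip_option": "--skip " + ",".join(skip) if skip else "",
    }


if __name__ == "__main__":
    print(recovery_plan(7))
    print(recovery_plan(1, raw_processing_empty=True))
```

Keeping the skip lists and the delete lists in separate tables mirrors how the rows are written: the delete action comes first, then the rerun. The notes about duplicated `bad` data and the `--skip shred` warning still have to be weighed by the operator, since the sketch does not encode them.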
