This repository was archived by the owner on May 18, 2021. It is now read-only.

Setting up alternative data stores

Yali Sassoon edited this page Feb 21, 2013 · 20 revisions

HOME > SNOWPLOW SETUP GUIDE > Step 4: setting up alternative data stores

  1. Overview
  2. Setting up SnowPlow to work with additional data stores

Overview

SnowPlow supports storing your data in multiple different data stores:

| Storage | Description | Status |
|---|---|---|
| S3 | Data is stored in the S3 file system, where it can be analysed using EMR (e.g. Hive, Pig, Mahout) | Production-ready |
| Infobright | An open source columnar database accessible via the MySQL JDBC driver (so compatible with a wide range of analytics tools). Optimized for performing OLAP analysis. Scales to terabytes | Production-ready |
| Redshift | A columnar database offered as a service on AWS. Optimized for performing OLAP analysis. Scales to petabytes | Coming soon |
| SkyDB | Open source database for analysis of behavioural / event data | Coming soon |

By setting up the EmrEtlRunner (in the previous step), you are already successfully loading your data into S3 where it is accessible to EMR for analysis.

If you wish to analyse your data using a wider range of tools (e.g. BI tools like ChartIO), you will want to load your data into a columnar database like Infobright, which these tools can query directly.

The StorageLoader is an application that makes it simple to keep an up-to-date copy of your data in multiple data stores, including Infobright. Setting up SnowPlow so that you can maintain a copy of your data in a database like Infobright is a two-step process:

  1. Create a database and table in Infobright for the data
  2. Set up the StorageLoader so that it regularly updates that table with the latest data from S3

Setting up SnowPlow to work with additional data stores

Select the appropriate option below to walk through the steps necessary to set up SnowPlow with each of the following data stores:

  1. Set up Redshift to work with SnowPlow
  2. Set up Infobright to work with SnowPlow
  3. Set up SkyDB to work with SnowPlow (coming soon)

After you have set up one or more of the above databases, you need to:

  • Set up the StorageLoader to regularly transfer SnowPlow data into your new store
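
The two-step pattern described in the overview (create a table, then regularly top it up with the latest data from S3) can be illustrated with a minimal sketch. This is not the StorageLoader's actual implementation: Python's built-in `sqlite3` stands in for Infobright, a dictionary (`FAKE_S3`) stands in for event files in S3, and the table, column, and function names are all hypothetical.

```python
import sqlite3

# Hypothetical stand-in for event files in S3: key -> rows of (event_id, page_url).
FAKE_S3 = {
    "events/2013-02-20.tsv": [("e1", "/home"), ("e2", "/pricing")],
    "events/2013-02-21.tsv": [("e3", "/home")],
}


def create_tables(conn):
    """Step 1: create the events table, plus a log of files already loaded."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS events (event_id TEXT PRIMARY KEY, page_url TEXT)"
    )
    conn.execute("CREATE TABLE IF NOT EXISTS loaded_files (s3_key TEXT PRIMARY KEY)")


def load_new_files(conn, bucket):
    """Step 2: load only files not seen on a previous run; return rows added."""
    seen = {row[0] for row in conn.execute("SELECT s3_key FROM loaded_files")}
    added = 0
    for key in sorted(bucket):
        if key in seen:
            continue
        with conn:  # one transaction per file, so a failed file is not half-loaded
            conn.executemany("INSERT INTO events VALUES (?, ?)", bucket[key])
            conn.execute("INSERT INTO loaded_files VALUES (?)", (key,))
        added += len(bucket[key])
    return added


conn = sqlite3.connect(":memory:")  # sqlite3 is only a stand-in for Infobright
create_tables(conn)
first_run = load_new_files(conn, FAKE_S3)
second_run = load_new_files(conn, FAKE_S3)  # nothing new, so nothing is reloaded
print(first_run, second_run)
```

Recording which files have been loaded is just one way to make a scheduled load safe to repeat; the real StorageLoader may track progress differently.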
