Setting up alternative data stores
Snowplow supports storing your data in three different data stores:
| Storage | Description | Status |
|---|---|---|
| S3 | Data is stored in the S3 file system, where it can be analysed using EMR (e.g. with Hive, Pig or Mahout) | Production-ready |
| Redshift | A columnar database offered as a service by Amazon Web Services. Optimized for performing OLAP analysis. Scales to petabytes | Production-ready |
| PostgreSQL | A popular, open-source RDBMS | Production-ready |
By setting up the EmrEtlRunner in the previous step, you are already loading your Snowplow event data into S3, where it is accessible to EMR for analysis.
If you wish to analyse your data using a wider range of tools (e.g. BI tools like ChartIO or Tableau, or statistical tools like R), you will want to load your data into a database like Amazon's Redshift or PostgreSQL to enable use of these tools.
The StorageLoader is an application that makes it simple to keep an up-to-date copy of your data in Redshift. Setting up Snowplow so that you can maintain a copy of your data in a database like Redshift is a two-step process:
- Create a database and table for the data in Amazon Redshift
- Set up the StorageLoader so that it regularly updates that table with the latest data from S3
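The two-step process above can be sketched in miniature. This is not the StorageLoader's code: it is a hypothetical Python illustration that uses the standard-library `sqlite3` module as a local stand-in for Redshift or PostgreSQL, and the `events` table, its three columns, the tab-separated batch format and the run IDs are all assumptions made for the example. Step 1 creates the table once; step 2 is the part you would run on a schedule, copying only batches that have not been loaded before.

```python
"""Hypothetical sketch of "create a table once, then load new data regularly".

Assumptions (illustrative only, not Snowplow's schema or loader):
- sqlite3 stands in for Redshift/PostgreSQL so the sketch runs locally.
- Each batch of events is a tab-separated string keyed by a run ID.
- The `events` table and its three columns are made up for the example.
"""
import sqlite3


def create_events_table(conn: sqlite3.Connection) -> None:
    """Step 1: create the target table (safe to call more than once)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS events ("
        " event_id TEXT PRIMARY KEY,"
        " collector_tstamp TEXT,"
        " event TEXT)"
    )
    # Bookkeeping table: which batches have already been copied.
    conn.execute("CREATE TABLE IF NOT EXISTS loaded_runs (run_id TEXT PRIMARY KEY)")
    conn.commit()


def load_new_batches(conn: sqlite3.Connection, batches: dict[str, str]) -> int:
    """Step 2: copy only batches not loaded before; return rows inserted."""
    inserted = 0
    for run_id in sorted(batches):
        already_loaded = conn.execute(
            "SELECT 1 FROM loaded_runs WHERE run_id = ?", (run_id,)
        ).fetchone()
        if already_loaded:
            continue  # skip batches copied on an earlier run
        rows = [line.split("\t") for line in batches[run_id].splitlines() if line]
        # One transaction per batch: rows and bookkeeping commit (or roll back) together.
        with conn:
            conn.executemany("INSERT INTO events VALUES (?, ?, ?)", rows)
            conn.execute("INSERT INTO loaded_runs VALUES (?)", (run_id,))
        inserted += len(rows)
    return inserted


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_events_table(conn)
    batches = {"run-a": "e1\tt1\tpage_view\ne2\tt2\tpage_ping\n"}
    print(load_new_batches(conn, batches))  # 2
    batches["run-b"] = "e3\tt3\tpage_view\n"
    print(load_new_batches(conn, batches))  # 1: run-a is skipped
```

Recording loaded batches in the same transaction as the rows is one way to make a scheduled load safe to rerun; the real StorageLoader may track progress differently.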
Currently, the only databases supported as storage targets for Snowplow data are Redshift and PostgreSQL. If you wish to use either of them as a storage target alongside S3, first set up either:
- Redshift, or
- PostgreSQL
Afterwards, you will need to set up the StorageLoader to regularly transfer Snowplow data into your new store.
All done? Then start analysing your data.
Note: We recommend running all Snowplow AWS operations through an IAM user with the bare minimum permissions required to run Snowplow. Please see our IAM user setup page for more information on doing this.
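As a rough illustration of what "bare minimum permissions" means, the hypothetical Python sketch below builds an IAM policy document that only allows listing one bucket and reading the objects in it. The bucket name is a placeholder, and the policy is deliberately an incomplete, read-only subset; see the IAM user setup page for the permissions Snowplow actually needs.

```python
import json


def read_only_bucket_policy(bucket: str) -> dict:
    """Build a least-privilege IAM policy for reading one S3 bucket.

    Illustrative subset only: NOT the full set of permissions Snowplow requires.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Bucket-level action: list the keys in this bucket only.
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": [f"arn:aws:s3:::{bucket}"],
            },
            {
                # Object-level action: read objects inside this bucket only.
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": [f"arn:aws:s3:::{bucket}/*"],
            },
        ],
    }


if __name__ == "__main__":
    # "my-snowplow-bucket" is a hypothetical placeholder name.
    print(json.dumps(read_only_bucket_policy("my-snowplow-bucket"), indent=2))
```

Scoping `s3:ListBucket` to the bucket ARN and `s3:GetObject` to the object ARN (`/*`) follows IAM's split between bucket-level and object-level actions, so each grant covers only what it needs.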
Home | About | Project | Setup Guide | Technical Docs | Copyright © 2012-2014 Snowplow Analytics Ltd
- [Step 1: Setup a Collector](setting-up-a-collector)
- Step 2a: Setup a Tracker
- Step 2b: Setup a Webhook
- [Step 3: Setup Enrich](setting-up-enrich)
- [Step 4: Setup alternative data stores](setting-up-alternative-data-stores)
    - [4.1: setup Redshift](setting-up-redshift)
    - [4.2: setup PostgreSQL](setting-up-postgresql)
    - [4.3: installing the StorageLoader](1-installing-the-storageloader)
    - [4.4: using the StorageLoader](2-using-the-storageloader)
    - [4.5: scheduling the StorageLoader](3-scheduling-the-storageloader)
    - [4.6: loading shredded types](4-Loading-shredded-types)
- [Step 5: Analyze your data!](<Getting started analyzing Snowplow data>)
