Releases · snowplow/snowplow

02 Sep 14:07

alexanderdean

0.9.7

6292529

Snowplow v0.9.7

A "tidy-up" release which fixes some important bugs, particularly:

A bug in 0.9.5 onwards which was preventing events containing multiple JSONs from being shredded successfully
Our Hive table definition falling behind Snowplow 0.9.6's enriched event format updates
A bug in EmrEtlRunner causing issues running Snowplow inside some VPC environments

Blog post

Trackers

Ruby Tracker: bumped git submodule to 0.3.0 (#939)
Java Tracker: bumped git submodule to 0.5.1 (#948)
Node.js Tracker: added git submodule. Version 0.1.0 (#949)
Fixed broken git submodule links, thanks @OAGr! (#957)

EmrEtlRunner

Bumped to 0.9.1
Fixed @jobflow.ec2_subnet_id not being set due to incorrect guard, thanks @rslifka! (#956)
Fixed bugs in --process-bucket (#973)
Renamed --process-bucket option to --process-enrich (#972)
Changed -s option for --skip to -x to prevent clash with -s for --start (#975)
Now allows shredding without prior enrichment (#927)
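
The short-option clash fixed in #975 can be reproduced in most option parsers. EmrEtlRunner itself is a Ruby application, so the following is only a Python argparse sketch that reuses the option names from the notes above. The parser setup, the program name and the example date format are illustrative assumptions, not EmrEtlRunner's code.

```python
import argparse


def build_parser(skip_flag):
    """Build a parser where --start uses -s and --skip uses `skip_flag`."""
    parser = argparse.ArgumentParser(prog="runner-sketch")  # hypothetical name
    parser.add_argument("-s", "--start", help="start date (format is an assumption)")
    parser.add_argument(skip_flag, "--skip", help="step to skip, e.g. shred")
    return parser


# Reusing -s for --skip is rejected as soon as the second option is registered.
try:
    build_parser("-s")
    clash = None
except argparse.ArgumentError as err:
    clash = str(err)

# With -x for --skip, both short options can be given on one command line.
args = build_parser("-x").parse_args(["-s", "2014-09-01", "-x", "shred"])
print(clash)
print(args.start, args.skip)
```

Giving --skip its own letter keeps both short forms usable, at the cost of a small break for anyone who scripted the old -s spelling.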

StorageLoader

Bumped to 0.3.2
Removed EMPTYASNULL for loading JSONs (#942)
Added missing targetUrl field to ad_impression JSON Path file, thanks @gisripa! (#951)
Made providing jsonpath_assets optional (#958)
Added support for cross-region Redshift COPY (#971)

Hive Storage

Bumped table-def.q to 0.2.0
Added and removed fields to synchronize with 0.9.6's enriched event format (#965)

Scala Hadoop Shred

Bumped to version 0.2.1
Fixed multiple JSONs not being shredded for a single row (#968)
Strengthened test suite (#967)


26 Jul 12:27

alexanderdean

0.9.6

3286f92

Snowplow v0.9.6

This release does four things:

It fixes some important bugs discovered in Snowplow 0.9.5, related to our new shredding functionality
It introduces new JSON-based configurations for Snowplow's existing enrichments
It extends our geo-IP lookup enrichment to support all five of MaxMind's commercial databases
It extends our referer-parsing enrichment to support a user-configurable list of internal domains
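
To illustrate the last point, here is a minimal Python sketch of how a list of internal domains might change referer classification. It is not the referer-parser library's API: the function name, the three labels and the example hosts are all assumptions made for illustration.

```python
from urllib.parse import urlparse


def classify_referer(referer_url, page_url, internal_domains=()):
    """Label a referer as "internal", "external" or "unknown".

    A referer counts as internal when its host equals the page's host or
    appears in `internal_domains` (a hypothetical, user-supplied list).
    """
    referer_host = urlparse(referer_url).hostname
    if not referer_host:
        return "unknown"
    page_host = urlparse(page_url).hostname
    internal = {d.lower() for d in internal_domains}
    if referer_host == page_host or referer_host in internal:
        return "internal"
    return "external"


# Without configuration, a sister subdomain looks like an outside referer.
print(classify_referer("http://blog.acme.com/post", "http://www.acme.com/"))
# Listing it as an internal domain changes the classification.
print(classify_referer("http://blog.acme.com/post", "http://www.acme.com/",
                       internal_domains=["blog.acme.com"]))
```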

Blog post

Java Tracker

Bumped git submodule to 0.4.0 (#892)

EmrEtlRunner

Bumped to 0.9.0
Passed etl_tstamp into Hadoop Enrich as an argument (#396)
Removed enrichment-specific code (#811)
Removed enrichment-specific parameters from config.yml.sample (#809)
Replaced enrichment-specific arguments in EmrEtlRunner (#808)
Removed %3D code following Scalding upgrade (#849)
Fixed contract on partition_by_run (#894)
Updated Bash script to support enrichments path (#916)

StorageLoader

Bumped to 0.3.1
Now looking in eu-west-1 region for s3://snowplow-hosted-assets (#895)
Updated combined Bash script to support enrichments path (#917)

Scala Hadoop Enrich

Bumped to 0.6.0
Bumped Scala to 2.10.4 (#912)
Bumped Scalding to 0.11.1 (#911)
Bumped Hadoop to 1.2.1 (#913)
Bumped to Scala Common Enrich 0.5.0 (#788)
Passed etl_tstamp into Scala Common Enrich (#817)
Removed event_vendor and ue_name and renamed ue_properties to unstruct_event (#835)
Removed %3D handling for compatibility with old Scalding Args (#850)
Added ability to download additional MaxMind databases (#885)
Added runHadoop and Tool.main tests (#914)

Scala Common Enrich

Bumped to 0.5.0
Bumped user-agent-utils version, thanks @pkallos! (#662)
Bumped referer-parser to 0.2.2 (#864)
Bumped httpclient to 4.3.3 (#897)
Bumped scala-maxmind-geoip to scala-maxmind-iplookups 0.1.0 (#882)
Stored etl_tstamp in new field in CanonicalOutput (#818)
Removed event_vendor and ue_name and renamed ue_properties to unstruct_event (#836)
Made referer parsing configurable with list of internal domains (#857)
Migrated configurable enrichments to new EnrichmentRegistry (#858)
Added validation of enrichments JSON (#807)
Replaced "anon_ip_quartets" with "anon_ip_octets" everywhere (#547)
Added ability to extract event_id from querystring (#723)
Extracted CanonicalInput's userId as network_userid, thanks @pkallos! (#855)
Added MaxMind region_name field (#873)
Added IP -> ISP lookup (#861)
Added IP -> organization lookup (#887)
Added IP -> domain lookup (#886)
Added IP -> net speed lookup (#889)
Added validation for transaction ID (#428)
Renamed Tests to Specs for consistency (#618)
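
The rename in #547 matches the usual terminology: an IPv4 address consists of four octets. As a rough Python sketch of octet-based anonymization, assuming the setting controls how many trailing octets are masked (the function name and the "x" placeholder are illustrative assumptions, not taken from the release notes):

```python
def anonymize_ipv4(ip, octets):
    """Mask the last `octets` octets of a dotted-quad IPv4 address."""
    parts = ip.split(".")
    if len(parts) != 4:
        raise ValueError(f"not a dotted-quad IPv4 address: {ip!r}")
    if not 0 <= octets <= 4:
        raise ValueError("octets must be between 0 and 4")
    keep = 4 - octets
    # Keep the leading octets and replace the rest with a placeholder.
    return ".".join(parts[:keep] + ["x"] * octets)


print(anonymize_ipv4("192.168.1.25", 1))
print(anonymize_ipv4("192.168.1.25", 2))
```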

Scala Hadoop Shred

Bumped to 0.2.0
Bumped to Scala Common Enrich 0.5.0 (#918)
Trailing empty fields no longer cause shredding for that row to fail (#921)
Updated column offsets for enriched events TSV (#915)
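
The notes do not say why trailing empty fields broke shredding. One common way this kind of bug arises on the JVM is that String.split with the default limit discards trailing empty strings, so a row that ends in empty columns looks shorter than expected. The Python sketch below shows a tolerant reader that pads such rows back to the expected width. The function name and the column count are hypothetical, and this is not the Scala Hadoop Shred implementation.

```python
def parse_enriched_row(line, expected_columns):
    """Split one TSV line and pad missing trailing fields with empty strings."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) > expected_columns:
        raise ValueError(
            f"expected at most {expected_columns} fields, got {len(fields)}"
        )
    # A row whose trailing empty fields were dropped upstream is still valid:
    # restore the missing columns instead of rejecting the row.
    return fields + [""] * (expected_columns - len(fields))


print(parse_enriched_row("web\t2014-07-26\n", 4))
```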

Redshift

Bumped table-def to 0.4.0
Added etl_tstamp to atomic.events (#819)
Removed event_vendor and ue_name and renamed ue_properties to unstruct_event (#834)
Added new MaxMind fields (#871)
Applied runlength encoding to all fields keyed off IP address (#883)
Migration script added for 0.3.0 to 0.4.0 (#838)

Postgres

Bumped table-def to 0.3.0
Added etl_tstamp to atomic.events (#820)
Removed event_vendor and ue_name and renamed ue_properties to unstruct_event (#833)
Added new MaxMind fields (#871)
Migration script added for 0.2.0 to 0.3.0 (#837)


09 Jul 14:12

alexanderdean

0.9.5

76b1fde

Snowplow v0.9.5

Now validates incoming event and context JSONs (using JSON Schema), and then automatically shreds those JSONs into dedicated tables in Amazon Redshift.
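
As a rough illustration of what "shredding" means here, the Python sketch below groups the JSONs attached to one event by their schema, producing one list of rows per destination table. It assumes each JSON is an envelope with "schema" and "data" keys and that rows are linked back to the parent event through a "root_id" column. Those names, the schema identifiers and the function itself are assumptions for illustration. The check is only structural, not real JSON Schema validation.

```python
import json
from collections import defaultdict


def shred(event_id, json_strings):
    """Group the JSONs attached to one event into per-schema row lists."""
    tables = defaultdict(list)
    for raw in json_strings:
        doc = json.loads(raw)
        # Structural check only; a real pipeline would validate `data`
        # against the JSON Schema that `schema` points to.
        if (not isinstance(doc, dict)
                or not isinstance(doc.get("schema"), str)
                or not isinstance(doc.get("data"), dict)):
            raise ValueError(f"not a schema/data envelope: {raw!r}")
        row = {"root_id": event_id}  # link back to the parent event
        row.update(doc["data"])
        tables[doc["schema"]].append(row)
    return dict(tables)


# One event carrying several JSONs, two of them for the same schema.
jsons = [
    '{"schema": "com.acme/link_click/1-0-0", "data": {"targetUrl": "http://a"}}',
    '{"schema": "com.acme/screen_view/1-0-0", "data": {"name": "home"}}',
    '{"schema": "com.acme/link_click/1-0-0", "data": {"targetUrl": "http://b"}}',
]
for schema, rows in shred("event-1", jsons).items():
    print(schema, rows)
```

Keying the output by schema is what lets each JSON type land in its own dedicated table while every row stays joinable to the parent event.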

Blog post

Trackers

Ruby Tracker: added git submodule. Version 0.1.0 (#645)
Java Tracker: added git submodule. Version 0.2.0 (#843)
JavaScript Tracker: bumped git submodule to 2.0.0 (#635)
Python Tracker: bumped git submodule to 0.4.0 (#634)

Scala Hadoop Shred

Added. Version 0.1.0

EmrEtlRunner

Bumped to 0.8.0
Updated S3DistCp steps to use new S3DistCpStep from Elasticity (#629)
Added --skip s3distcp option (#313)
Added ability to start Lingual in EmrEtlRunner (#623)
Added ability to start HBase in EmrEtlRunner (#622)
Improved load performance by switching ETL to write out to HDFS (#278)
Now invoking Scala Hadoop Shredder after main job (#644)
Added :iglu: section to config.yml for Scala Hadoop Shred (#814)
Updated to run Scala Hadoop Shred following Hadoop Enrich (#815)
Added --skip shred option (#659)

StorageLoader

Bumped to 0.3.0
Bumped Sluice to 0.2.1 (#881)
Added initial Ruby.contracts support (#391)
Updated config.yml to support shredding (#897)
Added ACCEPTINVCHARS to StorageLoader (#411)
Wrote JSON Path files for ad_* events (#642)
Wrote JSON Path file for link_click (#599)
Wrote JSON Path file for screen_view (#643)
Wrote JSON Path file for schema.org's WebPage (#772)
Added :jsonpath_assets: setting for StorageLoader (#606)
Added ability to load custom tables using JSON Paths (#607)
Added --skip shred option (#660)
Added :in: hint on StorageLoader configuration, thanks @joaolcorreia! (#755)

Redshift

Added Redshift DDL for ad_* events (#639)
Added Redshift DDL for link_click events (#600)
Added Redshift DDL for screen_view events (#640)
Added Redshift DDL for schema.org's WebPage (#771)

Looker Analytics

Wrote LookML for ad_* events (#605)
Wrote LookML for screen_view events (#637)
Wrote LookML for link_click events (#636)
Wrote LookML for schema.org's WebPage (#770)
Updated LookML to use liquid templating (#851)


30 May 11:47

alexanderdean

0.9.4

04b2fc0

Snowplow v0.9.4

Improvements to the Looker models bundled with Snowplow.

Blog post

Looker Analytics

New 'traffic_pulse' dashboard with globally configurable drill-down variables (#765)
Snowplow website specific dimensions and metrics removed: base model is now company-generic (#764)
Cleaner joining of data sets in Looker model (#763)
Dimensions and metrics renamed to make it clearer for an analyst getting started with the data (#761)
Added distkeys and sortkeys to derived tables to speed up query times (#696)
Derived tables now auto-generated when new data is loaded into atomic.events (#688)
'visits' renamed to 'sessions' (#762)
LookML models versioned using SchemaVer (#766)
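
For readers unfamiliar with SchemaVer: it writes a version as three hyphen-separated integers, MODEL-REVISION-ADDITION (for example 1-0-0). The release notes do not describe how the version is attached to the LookML models, so the Python sketch below only shows parsing and ordering such strings. The class and method names are illustrative.

```python
from typing import NamedTuple


class SchemaVer(NamedTuple):
    """A MODEL-REVISION-ADDITION version; tuples compare field by field."""
    model: int
    revision: int
    addition: int

    @classmethod
    def parse(cls, text):
        parts = text.split("-")
        if len(parts) != 3 or not all(p.isdigit() for p in parts):
            raise ValueError(f"not a SchemaVer string: {text!r}")
        return cls(*(int(p) for p in parts))

    def __str__(self):
        return f"{self.model}-{self.revision}-{self.addition}"


versions = [SchemaVer.parse(v) for v in ("1-1-0", "1-0-2", "2-0-0")]
print([str(v) for v in sorted(versions)])
```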

Redshift

Added reference_data.country_codes (#779)

Postgres

Added reference_data.country_codes (#781)


30 May 11:52

alexanderdean

0.9.3

ed7b3b9

Snowplow v0.9.3

Improvements to EmrEtlRunner and the Clojure Collector.

EmrEtlRunner

Bumped to 0.7.0
Bumped Sluice to 0.2.1 (#405)
Bumped Elasticity to 3.0.4 (#665)
Replaced hadoop_version setting with ami_version setting (#701)
Fixed handling of region, placement and ec2_subnet_id (#754)
Fixed regression where 0 files staged still kicks off EMR (#409)
Stopped Sluice file operation threads being killed by folders (#401)
Fixed disabling of Cascading error catching (#721)
Renamed Clojure Collector log files in processing bucket to support multiple instances (#717)
Added initial Ruby.contracts support into EmrEtlRunner (#392)
Updated to use the Ruby Logger (#194)
Updated so it's embeddable in other applications (#128)
Added ability to bundle as a JRuby fat jar (#674)
Added initial unit tests (#672)

Clojure Collector

Bumped to 0.6.0
Load balancer IP address getting stored in logs (#719)

Documentation

Removed all Snowplow tracking from READMEs, thanks @acinader! (#720)
Fixed (slightly) inconsistent EmrEtlRunner documentation, thanks @pvdb! (#749)


30 Apr 13:48

alexanderdean

0.9.2

dbe9e87

Snowplow v0.9.2

Rapid release to accommodate Amazon's April 29th update to the CloudFront Access Log file format

Blog post

Scala Hadoop Enrich

Bumped to 0.5.0
Bumped to Scala Common Enrich 0.3.0 (#699)
Bumped SBT to 0.13.2 (#702)
Bumped to using sbt-assembly 0.11.2 (#704)

Scala Common Enrich

Bumped to 0.4.0
Upgraded to support new and future CloudFront file formats (#698)
Bumped SBT to 0.13.2 (#703)

Scala Hadoop Bad Rows

Added. Version 0.1.0

Hive Storage

Added new unstructured fields to Hive table definition (#709)
Added raw page_url and page_referrer into Hive table (#710)
Added name_tracker field to Hive table (#711)


11 Apr 17:03

alexanderdean

0.9.1

9f47a6c

Snowplow v0.9.1

Initial support for custom unstructured events and custom contexts, plus a host of other small improvements.

Blog post

Scala Hadoop Enrich

Bumped to 0.4.0
Bumped to Scala Common Enrich 0.3.0 (#497)
Renamed AnonQuartets to AnonOctets (#498)
Renamed all Snowplow Hadoop Tests to Specs (#515)
Added page_url and page_referrer back into ETL's output (#483)

Scala Common Enrich

Bumped to 0.3.0
Bumped Argonaut to 6.0.3 (#620)
Added app and mob as valid platform codes, thanks @kinabalu! (#524)
Added support for remaining platform codes (#516)
Updated POJO in Scalding ETL to include new unstructured fields (#362)
Updated POJO in Scalding ETL to include name_tracker field (#595)
Extract evn from Tracker Protocol (#604)
Extract tna from Tracker Protocol (#616)
Extract and validate unstructured events (#142)
Extract and validate custom contexts (#426)
Reformat incoming event and context JSONs (#589)
Make sure to error a JSON if > length (#567)

EmrEtlRunner

Bumped to 0.6.0
Bumped Elasticity to 3.0.2 (#587)
Allowed AWS VPC selection in EmrEtlRunner (#581)
Set :visible_to_all_users to true for EMR jobs, thanks @smugryan! (#560)

Redshift

atomic-def script bumped to 0.3.0
Migration script added for 0.2.2 to 0.3.0
Added new unstructured fields to Redshift table definition (#361)
Changed distkey to be event_id, not domain_userid (#584)
Added raw page_url and page_referrer into Redshift table (#591)
Added name_tracker field to Redshift table (#594)
Converted Redshift varchar(38) for event IDs to char(36) (#282)
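
The char(36) width in #282 is consistent with event IDs being UUIDs in their canonical hyphenated form, which is always 36 characters (32 hexadecimal digits plus 4 hyphens). The notes do not state the ID format, so treat that as an assumption. A small Python check, with a hypothetical helper name:

```python
import uuid


def fits_char36(value):
    """Return True when `value` is a canonical UUID string of 36 characters."""
    try:
        parsed = uuid.UUID(value)
    except ValueError:
        return False
    # Reject non-canonical spellings (braces, missing hyphens, upper case).
    return str(parsed) == value and len(value) == 36


event_id = str(uuid.uuid4())
print(len(event_id), fits_char36(event_id))
```

A fixed-width column suits values whose length never varies, which is the case for canonical UUID strings.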

Postgres

atomic-def script bumped to 0.2.0
Migration script added for 0.1.x to 0.2.0
Added new unstructured fields to Postgres table definition (#359)
Added raw page_url and page_referrer into Postgres table (#592)
Added name_tracker field to Postgres table (#593)
Converted varchar(36) for event IDs to char(36) (#596)

StorageLoader

Bumped to 0.2.0
Added TIMEFORMAT 'auto' to StorageLoader to handle outlier dvce_timestamps (#427)

JavaScript Tracker

Bumped git submodule to 1.0.1 (#585)

Python Tracker

Added git submodule pointing to 0.1.0 (#586)


04 Feb 18:25

alexanderdean

0.9.0

789828b

Snowplow v0.9.0

Releasing initial beta of Amazon Kinesis support.

Blog post

Thrift Raw Event

Added. Version 0.1.0
Specified Thrift IDL for new raw event schema (#430)

Scala Stream Collector

Added. Version 0.1.0
Implemented new spray-can (Akka Http) Scala stream collector (#432)

Scala Kinesis Enrich

Added. Version 0.1.0
Implemented initial Kinesis-based enrichment (#460)

Scala Common Enrich

Bumped to 0.2.0
Added Thrift SnowplowRawEvent as a dependency to common-enrich (#475)
Added ability to read Thrift SnowplowRawEvent (#462)
Renamed CloudFront to Cloudfront in code (#495)
Renamed AnonQuartets to AnonOctets (#491)
Added raw -> CanonicalInput tests (#484)
Updated GET payload extraction to handle empty payloads (#502)

Git housekeeping

Changed git:// protocol in .gitmodules to https:// (#512)
Removed contrib-nodejs-collector from 2-collectors (#474)
Bumped JS Tracker submodule to 0.13.1 release (#511)


08 Jan 13:42

alexanderdean

0.8.13

4aab8b1

Snowplow v0.8.13

Releasing initial version of the Snowplow metadata model for Looker.

Blog post

Looker Analytics

Added. Version 0.1.0
Created Snowplow metadata model for Looker BI (www.looker.com) (#472)


07 Jan 14:16

alexanderdean

0.8.12

7d0389b

Snowplow v0.8.12

Various small improvements to our Scalding-based Enrichment process, plus some architectural re-work.

Blog post

Scala Hadoop Enrich

Bumped to 0.3.6
Bumped to SBT 0.13.0 (#404)
Bumped to using sbt-assembly 0.10.1 (#421)
Bumped to Scala 2.10.3 (#423)
Bumped to Scalding 0.8.11 (#422)
Upgraded useragent utils to 1.11 & moved to Maven dependency (#416)
Added test running back into sbt-assembly step (#420)
Updated copyright messages to be Snowplow not SnowPlow, and to 2014 not 2013 (#419)
Added ValidatedString as a type to package.scala (#328)
Added missing validation to stringToJByte (#408)
Missing page URI no longer interpreted as bad row (#399)
Updated CfRegex to reflect Cfcs(Cookie) can be empty (#410)
Numeric fields in tr_ and ti_ now parsed to doubles, not madeTsvSafe strings (#400)
Moved ETL core into separate project scala-enrich-common (#417)

Scala Common Enrich

Updated ETL versioning to include host and common versions (#448)

Postgres

Bumped cube-pages.sql to 0.1.1
Minor fix: cube_pages.complete referenced non-existent table cube_pages.basic, thanks @mrwalker! (#414)
