Releases: snowplow/snowplow

Snowplow v0.9.7

02 Sep 14:07

A "tidy-up" release which fixes some important bugs, particularly:

  1. A bug in 0.9.5 onwards which was preventing events containing multiple JSONs from being shredded successfully
  2. Our Hive table definition falling behind Snowplow 0.9.6's enriched event format updates
  3. A bug in EmrEtlRunner causing issues running Snowplow inside some VPC environments

Trackers

  • Ruby Tracker: bumped git submodule to 0.3.0 (#939)
  • Java Tracker: bumped git submodule to 0.5.1 (#948)
  • Node.js Tracker: added git submodule. Version 0.1.0 (#949)
  • Fixed broken git submodule links, thanks @OAGr! (#957)

EmrEtlRunner

  • Bumped to 0.9.1
  • Fixed @jobflow.ec2_subnet_id not being set due to incorrect guard, thanks @rslifka! (#956)
  • Fixed bugs in --process-bucket (#973)
  • Renamed --process-bucket option to --process-enrich (#972)
  • Changed the short option for --skip from -s to -x to prevent a clash with -s for --start (#975)
  • Now allows shredding without prior enrichment (#927)
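
The short-option clash fixed in #975 can be pictured with any option parser. The sketch below uses Python's argparse purely as an illustration: EmrEtlRunner is not written in Python, and the function names, help texts and argument values here are assumptions rather than its real code. Only the flag names (-s/--start, -x/--skip, --process-enrich) come from the notes above.

```python
import argparse


def build_parser():
    # Hypothetical parser that mirrors the flag names in these notes.
    parser = argparse.ArgumentParser(prog="emr-etl-runner-sketch")
    parser.add_argument("-s", "--start", help="start of the run (illustrative)")
    # -x rather than -s, so the short form no longer collides with --start
    parser.add_argument("-x", "--skip", help="steps to skip (illustrative)")
    parser.add_argument("--process-enrich", help="replaces --process-bucket")
    return parser


def short_flag_clashes():
    # Registering -s twice is rejected by argparse's default conflict handler.
    parser = argparse.ArgumentParser()
    parser.add_argument("-s", "--start")
    try:
        parser.add_argument("-s", "--skip")
    except argparse.ArgumentError:
        return True
    return False


args = build_parser().parse_args(["-s", "2014-09-01", "-x", "shred"])
print(args.start, args.skip)
print(short_flag_clashes())
```

With distinct short forms, both options can be given in the same invocation without ambiguity.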

StorageLoader

  • Bumped to 0.3.2
  • Removed EMPTYASNULL for loading JSONs (#942)
  • Added missing targetUrl field to ad_impression JSON Path file, thanks @gisripa! (#951)
  • Made providing jsonpath_assets optional (#958)
  • Added support for cross-region Redshift COPY (#971)

Hive Storage

  • Bumped table-def.q to 0.2.0
  • Added and removed fields to synchronize with 0.9.6's enriched event format (#965)

Scala Hadoop Shred

  • Bumped to version 0.2.1
  • Fixed multiple JSONs not being shredded for a single row (#968)
  • Strengthened test suite (#967)

Snowplow v0.9.6

26 Jul 12:27

This release does four things:

  1. It fixes some important bugs discovered in Snowplow 0.9.5, related to our new shredding functionality
  2. It introduces new JSON-based configurations for Snowplow's existing enrichments
  3. It extends our geo-IP lookup enrichment to support all five of MaxMind's commercial databases
  4. It extends our referer-parsing enrichment to support a user-configurable list of internal domains
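
To illustrate the idea behind item 4, here is a minimal sketch of treating a referer as internal when its host matches either the page's own host or a user-supplied list of internal domains. The function name, the returned labels and the example domains are assumptions; the actual referer-parsing enrichment may classify referers differently.

```python
from urllib.parse import urlparse


def referer_medium(referer_url, page_url, internal_domains=()):
    """Classify a referer as 'internal', 'external' or 'unknown' (labels assumed)."""
    ref_host = urlparse(referer_url).hostname
    page_host = urlparse(page_url).hostname
    if ref_host is None:
        return "unknown"
    # Same host as the page, or explicitly listed by the user.
    if ref_host == page_host or ref_host in internal_domains:
        return "internal"
    return "external"


print(referer_medium("https://shop.example.com/cart",
                     "https://www.example.com/checkout",
                     internal_domains=("shop.example.com",)))  # internal
```

Without the configurable list, a visit that moves between two domains owned by the same company would be counted as an external referral.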

Java Tracker

  • Bumped git submodule to 0.4.0 (#892)

EmrEtlRunner

  • Bumped to 0.9.0
  • Passed etl_tstamp into Hadoop Enrich as an argument (#396)
  • Removed enrichment-specific code (#811)
  • Removed enrichment-specific parameters from config.yml.sample (#809)
  • Replaced enrichment-specific arguments from EmrEtlRunner (#808)
  • Removed %3D code following Scalding upgrade (#849)
  • Fixed contract on partition_by_run (#894)
  • Updated Bash script to support enrichments path (#916)

StorageLoader

  • Bumped to 0.3.1
  • Now looking in eu-west-1 region for s3://snowplow-hosted-assets (#895)
  • Updated combined Bash script to support enrichments path (#917)

Scala Hadoop Enrich

  • Bumped to 0.6.0
  • Bumped Scala to 2.10.4 (#912)
  • Bumped Scalding to 0.11.1 (#911)
  • Bumped Hadoop to 1.2.1 (#913)
  • Bumped to Scala Common Enrich 0.5.0 (#788)
  • Passed etl_tstamp into Scala Common Enrich (#817)
  • Removed event_vendor and ue_name and renamed ue_properties to unstruct_event (#835)
  • Removed %3D handling for compatibility with old Scalding Args (#850)
  • Added ability to download additional MaxMind databases (#885)
  • Added runHadoop and Tool.main tests (#914)

Scala Common Enrich

  • Bumped to 0.5.0
  • Bumped user-agent-utils version, thanks @pkallos! (#662)
  • Bumped referer-parser to 0.2.2 (#864)
  • Bumped httpclient to 4.3.3 (#897)
  • Bumped scala-maxmind-geoip to scala-maxmind-iplookups 0.1.0 (#882)
  • Stored etl_tstamp in new field in CanonicalOutput (#818)
  • Removed event_vendor and ue_name and renamed ue_properties to unstruct_event (#836)
  • Made referer parsing configurable with list of internal domains (#857)
  • Migrated configurable enrichments to new EnrichmentRegistry (#858)
  • Added validation of enrichments JSON (#807)
  • Replaced "anon_ip_quartets" with "anon_ip_octets" everywhere (#547)
  • Added ability to extract event_id from querystring (#723)
  • Extracted CanonicalInput's userId as network_userid, thanks @pkallos! (#855)
  • Added MaxMind region_name field (#873)
  • Added IP -> ISP lookup (#861)
  • Added IP -> organization lookup (#887)
  • Added IP -> domain lookup (#886)
  • Added IP -> net speed lookup (#889)
  • Added validation for transaction ID (#428)
  • Renamed Tests to Specs for consistency (#618)
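
The rename from "anon_ip_quartets" to "anon_ip_octets" reflects that an IPv4 address is made up of four octets. As a hedged illustration of octet-based anonymization, the sketch below masks the last N octets of an address. The function name and the use of "x" as the mask character are assumptions, not a statement of how the enrichment is implemented.

```python
def anonymize_ip(ip, octets):
    """Mask the last `octets` octets of a dotted-quad IPv4 address."""
    parts = ip.split(".")
    if len(parts) != 4 or not 0 <= octets <= 4:
        raise ValueError("expected an IPv4 address and an octet count of 0-4")
    keep = 4 - octets
    # Keep the leading octets and replace the rest with a placeholder.
    return ".".join(parts[:keep] + ["x"] * octets)


print(anonymize_ip("192.168.1.42", 2))  # 192.168.x.x
```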

Scala Hadoop Shred

  • Bumped to 0.2.0
  • Bumped to Scala Common Enrich 0.5.0 (#918)
  • Trailing empty fields no longer cause shredding for that row to fail (#921)
  • Updated column offsets for enriched events TSV (#915)
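
One way a parser can tolerate rows whose trailing empty fields are missing is to pad them back to the expected width before reading columns by offset. The sketch below shows that idea only; the function name and column count are hypothetical, and it does not claim to reproduce the fix made in #921.

```python
def split_row(line, expected_columns):
    """Split a TSV line and pad missing trailing fields with empty strings."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) > expected_columns:
        raise ValueError("row has more columns than expected")
    # Missing trailing fields are treated as empty rather than as an error.
    return fields + [""] * (expected_columns - len(fields))


print(split_row("web\tpage_view", 4))  # ['web', 'page_view', '', '']
```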

Redshift

  • Bumped table-def to 0.4.0
  • Added etl_tstamp to atomic.events (#819)
  • Removed event_vendor and ue_name and renamed ue_properties to unstruct_event (#834)
  • Added new MaxMind fields (#871)
  • Applied runlength encoding to all fields keyed off IP address (#883)
  • Migration script added for 0.3.0 to 0.4.0 (#838)

Postgres

  • Bumped table-def to 0.3.0
  • Added etl_tstamp to atomic.events (#820)
  • Removed event_vendor and ue_name and renamed ue_properties to unstruct_event (#833)
  • Added new MaxMind fields (#871)
  • Migration script added for 0.2.0 to 0.3.0 (#837)

Snowplow v0.9.5

09 Jul 14:12

Now validates incoming event and context JSONs (using JSON Schema), and then automatically shreds those JSONs into dedicated tables in Amazon Redshift.

Trackers

  • Ruby Tracker: added git submodule. Version 0.1.0 (#645)
  • Java Tracker: added git submodule. Version 0.2.0 (#843)
  • JavaScript Tracker: bumped git submodule to 2.0.0 (#635)
  • Python Tracker: bumped Python Tracker git submodule to 0.4.0 (#634)

Scala Hadoop Shred

  • Added. Version 0.1.0

EmrEtlRunner

  • Bumped to 0.8.0
  • Updated S3DistCp steps to use new S3DistCpStep from Elasticity (#629)
  • Added --skip s3distcp option (#313)
  • Added ability to start Lingual in EmrEtlRunner (#623)
  • Added ability to start HBase in EmrEtlRunner (#622)
  • Improved load performance by switching ETL to write out to HDFS (#278)
  • Now invoking Scala Hadoop Shredder after main job (#644)
  • Added :iglu: section to config.yml for Scala Hadoop Shred (#814)
  • Updated to run Scala Hadoop Shred following Hadoop Enrich (#815)
  • Added --skip shred option (#659)

StorageLoader

  • Bumped to 0.3.0
  • Bumped Sluice to 0.2.1 (#881)
  • Added initial Ruby.contracts support (#391)
  • Updated config.yml to support shredding (#897)
  • Added ACCEPTINVCHARS to StorageLoader (#411)
  • Wrote JSON Path files for ad_* events (#642)
  • Wrote JSON Path file for link_click (#599)
  • Wrote JSON Path file for screen_view (#643)
  • Wrote JSON Path file for schema.org's WebPage (#772)
  • Added :jsonpath_assets: setting for StorageLoader (#606)
  • Added ability to load custom tables using JSON Paths (#607)
  • Added --skip shred option (#660)
  • Added :in: hint on StorageLoader configuration, thanks @joaolcorreia! (#755)

Redshift

  • Added Redshift DDL for ad_* events (#639)
  • Added Redshift DDL for link_click events (#600)
  • Added Redshift DDL for screen_view events (#640)
  • Added Redshift DDL for schema.org's WebPage (#771)

Looker Analytics

  • Wrote LookML for ad_* events (#605)
  • Wrote LookML for screen_view events (#637)
  • Wrote LookML for link_click events (#636)
  • Wrote LookML for schema.org's WebPage (#770)
  • Updated LookML to use liquid templating (#851)

Snowplow v0.9.4

30 May 11:47

Improvements to the Looker models bundled with Snowplow.

Looker Analytics

  • New 'traffic_pulse' dashboard with globally configurable drill-down variables (#765)
  • Snowplow website specific dimensions and metrics removed: base model is now company-generic (#764)
  • Cleaner joining of data sets in Looker model (#763)
  • Dimensions and metrics renamed to make it clearer for an analyst getting started with the data (#761)
  • Added distkeys and sortkeys to derived tables to speed up query times (#696)
  • Derived tables now auto-generated when new data is loaded into atomic.events (#688)
  • 'visits' renamed to 'sessions' (#762)
  • LookML models versioned using SchemaVer (#766)
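
For readers unfamiliar with SchemaVer, versions take the form MODEL-REVISION-ADDITION (for example 1-0-0). The following is a minimal sketch of parsing and ordering such versions; the class and function names are illustrative assumptions and are not part of the LookML models.

```python
from typing import NamedTuple


class SchemaVer(NamedTuple):
    model: int
    revision: int
    addition: int


def parse_schemaver(text):
    """Parse a MODEL-REVISION-ADDITION string such as '1-0-2'."""
    parts = text.split("-")
    if len(parts) != 3 or not all(p.isdigit() for p in parts):
        raise ValueError(f"not a SchemaVer: {text!r}")
    return SchemaVer(*(int(p) for p in parts))


# Named tuples compare field by field, so versions order naturally.
print(parse_schemaver("1-0-2") < parse_schemaver("1-1-0"))  # True
```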

Redshift

  • Added reference_data.country_codes (#779)

Postgres

  • Added reference_data.country_codes (#781)

Snowplow v0.9.3

30 May 11:52

Improvements to EmrEtlRunner and the Clojure Collector.

EmrEtlRunner

  • Bumped to 0.7.0
  • Bumped Sluice to 0.2.1 (#405)
  • Bumped Elasticity to 3.0.4 (#665)
  • Replaced hadoop_version setting with ami_version setting (#701)
  • Fixed handling of region, placement and ec2_subnet_id (#754)
  • Fixed regression where 0 files staged still kicks off EMR (#409)
  • Stopped Sluice file operation threads being killed by folders (#401)
  • Fixed disabling of Cascading error catching (#721)
  • Renamed Clojure Collector log files in processing bucket to support multiple instances (#717)
  • Added initial Ruby.contracts support into EmrEtlRunner (#392)
  • Updated to use the Ruby Logger (#194)
  • Updated so it's embeddable in other applications (#128)
  • Added ability to bundle as a JRuby fat jar (#674)
  • Added initial unit tests (#672)

Clojure Collector

  • Bumped to 0.6.0
  • Fixed load balancer IP address getting stored in logs (#719)

Documentation

  • Removed all Snowplow tracking from READMEs, thanks @acinader! (#720)
  • Fixed (slightly) inconsistent EmrEtlRunner documentation, thanks @pvdb! (#749)

Snowplow v0.9.2

30 Apr 13:48

Rapid release to accommodate Amazon's April 29th update to the CloudFront Access Log file format

Scala Hadoop Enrich

  • Bumped to 0.5.0
  • Bumped to Scala Common Enrich 0.3.0 (#699)
  • Bumped SBT to 0.13.2 (#702)
  • Bumped to using sbt-assembly 0.11.2 (#704)

Scala Common Enrich

  • Bumped to 0.4.0
  • Upgraded to support new and future CloudFront file formats (#698)
  • Bumped SBT to 0.13.2 (#703)

Scala Hadoop Bad Rows

  • Added. Version 0.1.0

Hive Storage

  • Added new unstructured fields to Hive table definition (#709)
  • Added raw page_url and page_referrer into Hive table (#710)
  • Added name_tracker field to Hive table (#711)

Snowplow v0.9.1

11 Apr 17:03

Initial support for custom unstructured events and custom contexts, plus a host of other small improvements.

Scala Hadoop Enrich

  • Bumped to 0.4.0
  • Bumped to Scala Common Enrich 0.3.0 (#497)
  • Renamed AnonQuartets to AnonOctets (#498)
  • Renamed all Snowplow Hadoop Tests to Specs (#515)
  • Added page_url and page_referrer back into ETL's output (#483)

Scala Common Enrich

  • Bumped to 0.3.0
  • Bumped Argonaut to 6.0.3 (#620)
  • Added app and mob as valid platform codes, thanks @kinabalu! (#524)
  • Added support for remaining platform codes (#516)
  • Updated POJO in Scalding ETL to include new unstructured fields (#362)
  • Updated POJO in Scalding ETL to include name_tracker field (#595)
  • Extract evn from Tracker Protocol (#604)
  • Extract tna from Tracker Protocol (#616)
  • Extract and validate unstructured events (#142)
  • Extract and validate custom contexts (#426)
  • Reformat incoming event and context JSONs (#589)
  • Make sure to error a JSON if > length (#567)

EmrEtlRunner

  • Bumped to 0.6.0
  • Bumped Elasticity to 3.0.2 (#587)
  • Allowed AWS VPC selection in EmrEtlRunner (#581)
  • Set :visible_to_all_users to true for EMR jobs, thanks @smugryan! (#560)

Redshift

  • atomic-def script bumped to 0.3.0
  • Migration script added for 0.2.2 to 0.3.0
  • Added new unstructured fields to Redshift table definition (#361)
  • Changed distkey to be event_id, not domain_userid (#584)
  • Added raw page_url and page_referrer into Redshift table (#591)
  • Added name_tracker field to Redshift table (#594)
  • Converted Redshift varchar(38) for event IDs to char(36) (#282)
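
A fixed-width char(36) column fits event IDs exactly if they are UUIDs in canonical textual form, which is 32 hexadecimal digits plus 4 hyphens. That the event IDs are canonical UUIDs is an assumption of this sketch, as is the helper name; it simply checks whether a value would fit such a column.

```python
import uuid


def is_canonical_uuid(value):
    """True if value is a 36-character UUID string in hyphenated form."""
    try:
        return len(value) == 36 and str(uuid.UUID(value)) == value.lower()
    except ValueError:
        return False


event_id = str(uuid.uuid4())
print(len(event_id))  # 36
print(is_canonical_uuid(event_id))  # True
```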

Postgres

  • atomic-def script bumped to 0.2.0
  • Migration script added for 0.1.x to 0.2.0
  • Added new unstructured fields to Postgres table definition (#359)
  • Added raw page_url and page_referrer into Postgres table (#592)
  • Added name_tracker field to Postgres table (#593)
  • Converted varchar(36) for event IDs to char(36) (#596)

StorageLoader

  • Bumped to 0.2.0
  • Added TIMEFORMAT 'auto' to StorageLoader to handle outlier dvce_timestamps (#427)

JavaScript Tracker

  • Bumped git submodule to 1.0.1 (#585)

Python Tracker

  • Added git submodule pointing to 0.1.0 (#586)

Snowplow v0.9.0

04 Feb 18:25

Releasing initial beta of Amazon Kinesis support.

Thrift Raw Event

  • Added. Version 0.1.0
  • Specified Thrift IDL for new raw event schema (#430)

Scala Stream Collector

  • Added. Version 0.1.0
  • Implemented new spray-can (Akka Http) Scala stream collector (#432)

Scala Kinesis Enrich

  • Added. Version 0.1.0
  • Implemented initial Kinesis-based enrichment (#460)

Scala Common Enrich

  • Bumped to 0.2.0
  • Added Thrift SnowplowRawEvent as a dependency to common-enrich (#475)
  • Added ability to read Thrift SnowplowRawEvent (#462)
  • Renamed CloudFront to Cloudfront in code (#495)
  • Renamed AnonQuartets to AnonOctets (#491)
  • Added raw -> CanonicalInput tests (#484)
  • Updated GET payload extraction to handle empty payloads (#502)

Git housekeeping

  • Changed git:// protocol in .gitmodules to https:// (#512)
  • Removed contrib-nodejs-collector from 2-collectors (#474)
  • Bumped JS Tracker submodule to 0.13.1 release (#511)

Snowplow v0.8.13

08 Jan 13:42

Releasing initial version of the Snowplow metadata model for Looker.

Looker Analytics

Snowplow v0.8.12

07 Jan 14:16

Various small improvements to our Scalding-based Enrichment process, plus some architectural re-work.

Scala Hadoop Enrich

  • Bumped to 0.3.6
  • Bumped to SBT 0.13.0 (#404)
  • Bumped to using sbt-assembly 0.10.1 (#421)
  • Bumped to Scala 2.10.3 (#423)
  • Bumped to Scalding 0.8.11 (#422)
  • Upgraded useragent utils to 1.11 & moved to Maven dependency (#416)
  • Added test running back into sbt-assembly step (#420)
  • Updated copyright messages to be Snowplow not SnowPlow, and to 2014 not 2013 (#419)
  • Added ValidatedString as a type to package.scala (#328)
  • Added missing validation to stringToJByte (#408)
  • Missing page URI no longer interpreted as bad row (#399)
  • Updated CfRegex to reflect Cfcs(Cookie) can be empty (#410)
  • Numeric fields in tr_ and ti_ now parsed to doubles, not madeTsvSafe strings (#400)
  • Moved ETL core into separate project scala-enrich-common (#417)

Scala Common Enrich

  • Updated ETL versioning to include host and common versions (#448)

Postgres

  • Bumped cube-pages.sql to 0.1.1
  • Minor fix: cube_pages.complete referenced non-existent table cube_pages.basic, thanks @mrwalker! (#414)