Releases: snowplow/snowplow

Snowplow v0.8.11

22 Oct 12:56

Extensive ETL improvements, including adding support for the recent changes to the CloudFront access log file format.

Hadoop ETL

  • Bumped to 0.3.5
  • Added Argonaut 6.0 as a dependency (#342)
  • Added fromTimestamp to EventEnrichments (#340)
  • Added makeTsvSafe to ConversionUtils (#338)
  • Added JsonUtils (#323)
  • Added support for 3 and 4 return values from MapTransformer (#324)
  • Updated GetJsonPayload to use Argonaut and renamed to JsonPayload (#339)
  • Added ability to mask IP addresses in ETL (#309)
  • refr_ and page_ fields now stored raw (#374)
  • Defensively fixed raw spaces in page and referer URLs (#346)
  • Fixed regression: single-encoded %s logic didn't account for % itself (#347)
  • Added unit tests for fixTabsNewlines (#332)
  • Tests now report the failing CanonicalOutput field (#325)
  • Now handling all fields double-encoded as per CloudFront post-14-September (#348)
  • Added support for 21 Oct CloudFront access log format (#384)
  • Added truncation to refr_term (#379)
  • Added truncation to se_label (#394)
  • Made all prior ME.identity fields TSV-safe (#395)
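
The TSV-safety and truncation items above can be illustrated with a minimal sketch. The function names, the choice of a single space as the replacement character, and the length limit are all assumptions made for illustration; the actual `makeTsvSafe` in ConversionUtils is Scala code and may behave differently.

```python
def make_tsv_safe(value: str) -> str:
    """Replace tabs and line breaks with spaces so the value stays in one TSV cell.

    Hypothetical helper: the replacement character is an assumption.
    """
    return value.replace("\t", " ").replace("\r", " ").replace("\n", " ")


def truncate(value: str, max_length: int) -> str:
    """Cut a field down to at most max_length characters (limit is illustrative)."""
    return value[:max_length]


if __name__ == "__main__":
    raw_label = "spring\tsale\nbanner"
    print(truncate(make_tsv_safe(raw_label), 10))
```

Making a field TSV-safe before truncating it keeps a stray tab or newline from splitting one event across columns or rows in the output file.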

EmrEtlRunner

  • Bumped to 0.5.0
  • Bumped Sluice to 0.1.5 (#96)
  • Bumped Elasticity to 2.6 (#345)
  • Enabled EMR Job Flow debugging for easier access to logs (#279)
  • ETL job no longer fails if there's no data for last run period (#296)
  • Empty processing dir check now works if dir contains 1 file (#326)
  • Added ability to mask IP addresses in ETL (#309)
  • Made the examples match what you get from git out of the box, thanks @shermozle (#331)
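
As a rough illustration of the IP address masking mentioned above (#309), one common approach replaces the trailing octets of an IPv4 address with a placeholder. This is only a sketch under that assumption: the function name, the `x` placeholder, and the "number of octets" parameter are hypothetical and are not taken from the Snowplow configuration.

```python
def mask_ip(ip: str, octets_to_mask: int) -> str:
    """Replace the last N octets of an IPv4 address with 'x' (hypothetical scheme)."""
    parts = ip.split(".")
    if len(parts) != 4 or not 0 <= octets_to_mask <= 4:
        raise ValueError("expected an IPv4 address and a mask count between 0 and 4")
    keep = 4 - octets_to_mask
    return ".".join(parts[:keep] + ["x"] * octets_to_mask)


if __name__ == "__main__":
    print(mask_ip("192.168.1.42", 1))
```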

StorageLoader

  • Bumped to 0.1.1
  • Bumped Sluice to 0.1.5 (#96)
  • Fixed bug where "" in fields acts as an escape character for Postgres, thanks @kingo55 (#329)
  • Added ability to --skip analyze (#335)
  • Moved VACUUM SORT ONLY to a --include step (#321)
  • Added COMPROWS to config and --include compupdate option (#344)
  • Changed Postgres VACUUM FULL to VACUUM (#357)
  • Added TRUNCATECOLUMNS for Redshift load (#360)
  • Added FILLRECORD to our Redshift COPY command (#380)
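
To show how the Redshift options named above fit together, here is a hedged sketch that assembles a COPY statement as a string. TRUNCATECOLUMNS, FILLRECORD, MAXERROR and COMPROWS are genuine Redshift COPY parameters, but the function, table name, S3 path and credential placeholders are assumptions for illustration and do not reproduce the command StorageLoader actually issues.

```python
def build_copy_statement(table: str, s3_path: str, credentials: str,
                         maxerror: int, comprows: int) -> str:
    """Assemble an illustrative Redshift COPY statement (not StorageLoader's own)."""
    return (
        f"COPY {table} FROM '{s3_path}' "
        f"CREDENTIALS '{credentials}' "
        "DELIMITER '\\t' "
        f"MAXERROR {maxerror} "
        "TRUNCATECOLUMNS "   # cut over-long values down to the column width
        "FILLRECORD "        # pad rows that are missing trailing columns
        f"COMPROWS {comprows};"
    )


if __name__ == "__main__":
    print(build_copy_statement(
        table="atomic.events",
        s3_path="s3://my-bucket/enriched/good/",  # placeholder path
        credentials="aws_access_key_id=<key>;aws_secret_access_key=<secret>",
        maxerror=1,
        comprows=200000,
    ))
```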

Postgres

  • Fixed error in recipes_basic.technology_mobile recipe (#397)

Snowplow v0.8.10

18 Oct 17:02

Adding recipes and cubes as SQL views for both Redshift and PostgreSQL. A few miscellaneous tidy-ups as well; see below for details.

Redshift

  • Bumped table-def to 0.2.2
  • Moved events table to a new atomic schema in atomic-def.sql (#301)
  • Added migration script for 0.2.1 to 0.2.2
  • Added SQL DDL to define Redshift recipes (#297)
  • Added SQL DDL to define Redshift cubes (#298)

Postgres

  • Bumped table-def to 0.1.1
  • Renamed table-def file to atomic-def.sql
  • Added migration script for 0.1.0 to 0.1.1
  • Moved NOT NULL constraint on event field to event_vendor field (#318)
  • Added SQL DDL to define Postgres recipes (#303)
  • Added SQL DDL to define Postgres cubes (#302)

Documentation

Snowplow v0.8.9

05 Sep 19:32

A release to handle the unannounced change which Amazon made to the CloudFront access log file format on 17th August (since reversed).

Hadoop ETL

  • Bumped to 0.3.4
  • Updated to handle singly-encoded %s in CloudFront querystring field (#333)

Snowplow v0.8.8

04 Aug 21:54

Adding Postgres support, re-adding HiveQL support, and also adding support for multiple storage targets.

Plus plenty of small improvements, bug fixes and simplifications.

JavaScript Tracker

  • Moved into own repo (#277)

Hadoop ETL

  • Bumped to 0.3.3
  • URL-decodes "%3D" to "=" to allow Hive-style directory names as arguments (#305)
  • Bumped referer-parser to 0.1.1 to fix java.lang.NullPointerException (#314)
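
The "%3D" decoding described above (#305) amounts to a narrow string substitution on the job's path arguments. The sketch below assumes only that one escape sequence is rewritten, as the release note states; the function name and example path are hypothetical.

```python
def decode_equals(arg: str) -> str:
    """Turn an encoded '%3D' back into '=' so Hive-style names like run=... survive."""
    return arg.replace("%3D", "=")


if __name__ == "__main__":
    print(decode_equals("s3://my-bucket/enriched/run%3D2013-08-04-21-54-00/"))
```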

EmrEtlRunner

  • Bumped to 0.4.0
  • Bumped Sluice to 0.0.7 (#299)
  • Removed :snowplow: section from config.yml.sample (#289)
  • Simplified EmrEtlRunner and its config (#287)
  • Added run= to timestamped ETL folder names (#294)
  • Updated "Jobflow started" stdout message to include jobflow ID (#315)
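
The `run=` folder naming mentioned above (#294) follows the Hive convention of `key=value` directory names. A minimal sketch of building such a folder name is shown below; the timestamp format and the base path are assumptions, since the release notes do not spell them out.

```python
from datetime import datetime


def run_folder(base: str, started_at: datetime) -> str:
    """Build a Hive-style run folder path; the timestamp format is an assumption."""
    stamp = started_at.strftime("%Y-%m-%d-%H-%M-%S")
    return f"{base.rstrip('/')}/run={stamp}/"


if __name__ == "__main__":
    print(run_folder("s3://my-bucket/enriched/good", datetime(2013, 8, 4, 21, 54, 0)))
```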

Hive ETL

  • Removed folder 3-enrich/hive-etl as no longer supported (#286)

Hive storage

  • Updated hive-storage scripts to work with current Redshift-format flatfile (#290)

Infobright storage

  • Removed folder 4-storage/infobright as not currently supported (#285)

Postgres storage

  • Added Postgres table definition in atomic schema (#160)

StorageLoader

  • Bumped to 0.1.0
  • Bumped Sluice to 0.0.7 (#300)
  • Removed code to delete Hive ETL's empty event files (#306)
  • Fixed bug where download path has to be set (even when using Redshift) (#280)
  • Optimized ANALYZE and VACUUM commands (#283)
  • Added MAXERROR as StorageLoader configuration value for Redshift (#273)
  • Added support for loading Postgres (#161)
  • Removed Infobright loading capability (#307)
  • Added support for loading into multiple storage targets (#311)

Snowplow v0.8.7

07 Jul 15:03

Predominantly bug fixes and tweaks to the JavaScript Tracker. Note that this is the last release where the JavaScript Tracker is part of the main snowplow/snowplow repository - it will shortly be moved into its own repo.

JavaScript Tracker

  • Bumped to 0.12.0
  • Fixed document reference to use documentAlias (#247)
  • Fixed bug with setCustomUrl (#267)
  • Changed ev_ to se_ for structured events (#197)
  • Fixed Firefox failure when "Always ask" set for cookies (#163)
  • Fixed bug in page ping functionality detected in IE 8 (#260)
  • Replaced forEach as not supported in IE 6-8 (#295)

EmrEtlRunner

  • Fixed bug in config.yml.sample (#291)

Arduino Tracker

  • Added git submodule link (#292)