Releases: snowplow/snowplow
Snowplow v0.9.7
A "tidy-up" release which fixes some important bugs, particularly:
- A bug in 0.9.5 onwards which was preventing events containing multiple JSONs from being shredded successfully
- Our Hive table definition falling behind Snowplow 0.9.6's enriched event format updates
- A bug in EmrEtlRunner causing issues running Snowplow inside some VPC environments
Trackers
- Ruby Tracker: bumped git submodule to 0.3.0 (#939)
- Java Tracker: bumped git submodule to 0.5.1 (#948)
- Node.js Tracker: added git submodule. Version 0.1.0 (#949)
- Fixed broken git submodule links, thanks @OAGr! (#957)
EmrEtlRunner
- Bumped to 0.9.1
- Fixed @jobflow.ec2_subnet_id not being set due to incorrect guard, thanks @rslifka! (#956)
- Fixed bugs in --process-bucket (#973)
- Renamed --process-bucket option to --process-enrich (#972)
- Changed the short option for --skip from -s to -x to prevent a clash with -s for --start (#975)
- Now allows shredding without prior enrichment (#927)
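The -s/-x change (#975) exists because a single short flag cannot name two options. EmrEtlRunner itself is written in Ruby. The sketch below uses Python's argparse purely to illustrate the general clash, and the parser, its program name and the option values are hypothetical.

```python
import argparse


def build_parser(skip_short_flag: str) -> argparse.ArgumentParser:
    """Build a hypothetical parser with both a --start and a --skip option."""
    parser = argparse.ArgumentParser(prog="emr-etl-runner-sketch")
    parser.add_argument("-s", "--start", help="start date")
    # With the default conflict handler, argparse raises ArgumentError here
    # if skip_short_flag reuses a short flag that is already registered.
    parser.add_argument(skip_short_flag, "--skip", help="steps to skip")
    return parser


def has_short_flag_clash(skip_short_flag: str) -> bool:
    """Return True if registering --skip with this short flag conflicts."""
    try:
        build_parser(skip_short_flag)
    except argparse.ArgumentError:
        return True
    return False


if __name__ == "__main__":
    print(has_short_flag_clash("-s"))  # -s is already taken by --start
    print(has_short_flag_clash("-x"))
    args = build_parser("-x").parse_args(["-s", "2014-06-01", "-x", "staging"])
    print(args.start, args.skip)
```

Moving --skip to -x leaves -s unambiguous for --start, which is the point of the rename.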
StorageLoader
- Bumped to 0.3.2
- Removed EMPTYASNULL for loading JSONs (#942)
- Added missing targetUrl field to ad_impression JSON Path file, thanks @gisripa! (#951)
- Made providing jsonpath_assets optional (#958)
- Added support for cross-region Redshift COPY (#971)
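Cross-region loading (#971) relies on the REGION parameter of Redshift's COPY command, which names the AWS region of the source S3 bucket when it differs from the cluster's region. The minimal sketch below composes such a statement. `build_copy_sql` and the table, bucket and JSON Paths file names are hypothetical, and this is not StorageLoader's actual code. EMPTYASNULL is left out, in line with #942.

```python
from typing import Optional


def build_copy_sql(table: str, s3_path: str, s3_region: str,
                   credentials: str, jsonpaths: Optional[str] = None) -> str:
    """Compose a Redshift COPY statement for one shredded JSON table.

    REGION names the AWS region of the source bucket, which Redshift needs
    when that bucket is not in the cluster's own region.
    """
    # Use an explicit JSON Paths file if given, otherwise Redshift's 'auto' mapping
    json_source = f"'{jsonpaths}'" if jsonpaths else "'auto'"
    return (
        f"COPY {table} FROM '{s3_path}' "
        f"CREDENTIALS '{credentials}' "
        f"REGION '{s3_region}' "
        f"JSON {json_source} "
        "TIMEFORMAT 'auto' ACCEPTINVCHARS;"
    )


print(build_copy_sql(
    table="atomic.com_snowplowanalytics_snowplow_link_click_1",
    s3_path="s3://example-shredded-bucket/run=2014-06-01/link_click/",
    s3_region="eu-west-1",
    credentials="aws_access_key_id=<key>;aws_secret_access_key=<secret>",
    jsonpaths="s3://example-assets-bucket/jsonpaths/link_click_1.json",
))
```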
Hive Storage
- Bumped table-def.q to 0.2.0
- Added and removed fields to synchronize with 0.9.6's enriched event format (#965)
Scala Hadoop Shred
Snowplow v0.9.6
This release does four things:
- It fixes some important bugs discovered in Snowplow 0.9.5, related to our new shredding functionality
- It introduces new JSON-based configurations for Snowplow's existing enrichments
- It extends our geo-IP lookup enrichment to support all five of MaxMind's commercial databases
- It extends our referer-parsing enrichment to support a user-configurable list of internal domains
Java Tracker
- Bumped git submodule to 0.4.0 (#892)
EmrEtlRunner
- Bumped to 0.9.0
- Passed etl_tstamp into Hadoop Enrich as an argument (#396)
- Removed enrichment-specific code (#811)
- Removed enrichment-specific parameters from config.yml.sample (#809)
- Replaced enrichment-specific arguments in EmrEtlRunner (#808)
- Removed %3D code following Scalding upgrade (#849)
- Fixed contract on partition_by_run (#894)
- Updated Bash script to support enrichments path (#916)
StorageLoader
- Bumped to 0.3.1
- Now looking in eu-west-1 region for s3://snowplow-hosted-assets (#895)
- Updated combined Bash script to support enrichments path (#917)
Scala Hadoop Enrich
- Bumped to 0.6.0
- Bumped Scala to 2.10.4 (#912)
- Bumped Scalding to 0.11.1 (#911)
- Bumped Hadoop to 1.2.1 (#913)
- Bumped to Scala Common Enrich 0.5.0 (#788)
- Passed etl_tstamp into Scala Common Enrich (#817)
- Removed event_vendor and ue_name and renamed ue_properties to unstruct_event (#835)
- Removed %3D handling for compatibility with old Scalding Args (#850)
- Added ability to download additional MaxMind databases (#885)
- Added runHadoop and Tool.main tests (#914)
Scala Common Enrich
- Bumped to 0.5.0
- Bumped user-agent-utils version, thanks @pkallos! (#662)
- Bumped referer-parser to 0.2.2 (#864)
- Bumped httpclient to 4.3.3 (#897)
- Bumped scala-maxmind-geoip to scala-maxmind-iplookups 0.1.0 (#882)
- Stored etl_tstamp in new field in CanonicalOutput (#818)
- Removed event_vendor and ue_name and renamed ue_properties to unstruct_event (#836)
- Made referer parsing configurable with list of internal domains (#857)
- Migrated configurable enrichments to new EnrichmentRegistry (#858)
- Added validation of enrichments JSON (#807)
- Replaced "anon_ip_quartets" with "anon_ip_octets" everywhere (#547)
- Added ability to extract event_id from querystring (#723)
- Extracted CanonicalInput's userId as network_userid, thanks @pkallos! (#855)
- Added MaxMind region_name field (#873)
- Added IP -> ISP lookup (#861)
- Added IP -> organization lookup (#887)
- Added IP -> domain lookup (#886)
- Added IP -> net speed lookup (#889)
- Added validation for transaction ID (#428)
- Renamed Tests to Specs for consistency (#618)
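The "quartets" to "octets" rename (#547) reflects that an IPv4 address is made of four 8-bit octets. A minimal sketch of octet-based IP anonymization follows; `anonymize_ip` is a hypothetical name, and replacing each hidden octet with "x" is an assumption about the output format rather than a confirmed description of Scala Common Enrich.

```python
def anonymize_ip(ip: str, octets: int) -> str:
    """Replace the last `octets` octets of an IPv4 address with "x"."""
    if not 0 <= octets <= 4:
        raise ValueError("octets must be between 0 and 4")
    parts = ip.split(".")
    if len(parts) != 4:
        return ip  # leave non-IPv4 values untouched in this sketch
    kept = parts[: 4 - octets]
    return ".".join(kept + ["x"] * octets)


print(anonymize_ip("37.157.33.123", 2))
```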
Scala Hadoop Shred
- Bumped to 0.2.0
- Bumped to Scala Common Enrich 0.5.0 (#918)
- Trailing empty fields no longer cause shredding for that row to fail (#921)
- Updated column offsets for enriched events TSV (#915)
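The note for #921 does not say what caused the failure. One well-known way for trailing empty fields to vanish on the JVM is `String.split`, which drops trailing empty strings unless it is called with a negative limit. The Python sketch below contrasts that behaviour with a split that keeps them. Both function names are hypothetical, and the field count is a parameter rather than the real width of the enriched-event TSV.

```python
from typing import List


def java_style_split(line: str, sep: str = "\t") -> List[str]:
    """Approximate Java's String.split(sep), default limit 0, for a literal sep."""
    if sep not in line:
        return [line]  # with no match, Java returns the whole input
    fields = line.split(sep)
    while fields and fields[-1] == "":
        fields.pop()  # a limit of 0 discards trailing empty strings
    return fields


def split_enriched_row(line: str, expected_fields: int) -> List[str]:
    """Split one enriched-event TSV line, keeping trailing empty fields."""
    fields = line.split("\t")  # Python's str.split keeps trailing empty strings
    if len(fields) != expected_fields:
        raise ValueError(f"expected {expected_fields} fields, got {len(fields)}")
    return fields
```

With the Java-style split, a row whose last columns are empty comes back short and fails a fixed field-count check; keeping the empties preserves the column offsets.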
Redshift
- Bumped table-def to 0.4.0
- Added etl_tstamp to atomic.events (#819)
- Removed event_vendor and ue_name and renamed ue_properties to unstruct_event (#834)
- Added new MaxMind fields (#871)
- Applied run-length encoding to all fields keyed off IP address (#883)
- Migration script added for 0.3.0 to 0.4.0 (#838)
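Run-length encoding (#883) stores a value once together with the number of consecutive rows that repeat it. Fields derived from the IP address, such as geo lookups, tend to repeat when rows from the same visitor sit next to each other, so they compress well. The sketch below only illustrates the idea; it is not how Redshift implements its RUNLENGTH encoding internally, and the function names are hypothetical.

```python
from itertools import groupby
from typing import List, Tuple


def run_length_encode(values: List[str]) -> List[Tuple[str, int]]:
    """Collapse consecutive repeats into (value, run_length) pairs."""
    return [(value, sum(1 for _ in run)) for value, run in groupby(values)]


def run_length_decode(runs: List[Tuple[str, int]]) -> List[str]:
    """Expand (value, run_length) pairs back into the original sequence."""
    return [value for value, count in runs for _ in range(count)]


print(run_length_encode(["GB", "GB", "GB", "FR", "GB"]))
```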
Postgres
Snowplow v0.9.5
Now validates incoming event and context JSONs (using JSON Schema), and then automatically shreds those JSONs into dedicated tables in Amazon Redshift.
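Shredding splits each self-describing JSON out of an event and routes it to a table chosen from its Iglu schema URI (`iglu:vendor/name/format/MODEL-REVISION-ADDITION`); a single event can carry several such JSONs in its contexts. The minimal sketch below groups them by target table. The naming rule (dots and camelCase turned into underscores, suffixed with the schema's MODEL) is an assumption modelled on tables like the schema.org WebPage one, the function names are hypothetical, and the JSON Schema validation step is omitted.

```python
import json
import re
from collections import defaultdict
from typing import Dict, List

# Iglu schema URI: iglu:<vendor>/<name>/<format>/<MODEL>-<REVISION>-<ADDITION>
IGLU_URI = re.compile(
    r"^iglu:(?P<vendor>[A-Za-z0-9_.-]+)/(?P<name>[A-Za-z0-9_-]+)/"
    r"(?P<format>[A-Za-z0-9_-]+)/(?P<model>[0-9]+)-[0-9]+-[0-9]+$"
)


def to_snake_case(text: str) -> str:
    """Turn camelCase or PascalCase (e.g. WebPage) into snake_case (web_page)."""
    return re.sub(r"([a-z0-9])([A-Z])", r"\1_\2", text).replace("-", "_").lower()


def table_name(schema_uri: str) -> str:
    """Map an Iglu schema URI to a table name (naming convention assumed)."""
    match = IGLU_URI.match(schema_uri)
    if match is None:
        raise ValueError(f"not an Iglu schema URI: {schema_uri}")
    vendor = match.group("vendor").replace(".", "_")
    return f"{to_snake_case(vendor)}_{to_snake_case(match.group('name'))}_{match.group('model')}"


def shred(contexts_json: str) -> Dict[str, List[dict]]:
    """Group every self-describing JSON in a contexts envelope by target table."""
    envelope = json.loads(contexts_json)
    rows: Dict[str, List[dict]] = defaultdict(list)
    for item in envelope["data"]:
        rows[table_name(item["schema"])].append(item["data"])
    return dict(rows)
```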
Trackers
- Ruby Tracker: added git submodule. Version 0.1.0 (#645)
- Java Tracker: added git submodule. Version 0.2.0 (#843)
- JavaScript Tracker: bumped git submodule to 2.0.0 (#635)
- Python Tracker: bumped git submodule to 0.4.0 (#634)
Scala Hadoop Shred
- Added. Version 0.1.0
EmrEtlRunner
- Bumped to 0.8.0
- Updated S3DistCp steps to use new S3DistCpStep from Elasticity (#629)
- Added --skip s3distcp option (#313)
- Added ability to start Lingual in EmrEtlRunner (#623)
- Added ability to start HBase in EmrEtlRunner (#622)
- Improved load performance by switching ETL to write out to HDFS (#278)
- Now invoking Scala Hadoop Shredder after main job (#644)
- Added :iglu: section to config.yml for Scala Hadoop Shred (#814)
- Updated to run Scala Hadoop Shred following Hadoop Enrich (#815)
- Added --skip shred option (#659)
StorageLoader
- Bumped to 0.3.0
- Bumped Sluice to 0.2.1 (#881)
- Added initial Ruby.contracts support (#391)
- Updated config.yml to support shredding (#897)
- Added ACCEPTINVCHARS to StorageLoader (#411)
- Wrote JSON Path files for ad_* events (#642)
- Wrote JSON Path file for link_click (#599)
- Wrote JSON Path file for screen_view (#643)
- Wrote JSON Path file for schema.org's WebPage (#772)
- Added :jsonpath_assets: setting for StorageLoader (#606)
- Added ability to load custom tables using JSON Paths (#607)
- Added --skip shred option (#660)
- Added :in: hint on StorageLoader configuration, thanks @joaolcorreia! (#755)
Redshift
- Added Redshift DDL for ad_* events (#639)
- Added Redshift DDL for link_click events (#600)
- Added Redshift DDL for screen_view events (#640)
- Added Redshift DDL for schema.org's WebPage (#771)
Looker Analytics
Snowplow v0.9.4
Improvements to the Looker models bundled with Snowplow.
Looker Analytics
- New 'traffic_pulse' dashboard with globally configurable drill-down variables (#765)
- Snowplow website specific dimensions and metrics removed: base model is now company-generic (#764)
- Cleaner joining of data sets in Looker model (#763)
- Dimensions and metrics renamed to make it clearer for an analyst getting started with the data (#761)
- Added distkeys and sortkeys to derived tables to speed up query times (#696)
- Derived tables now auto-generated when new data is loaded into atomic.events (#688)
- 'visits' renamed to 'sessions' (#762)
- LookML models versioned using SchemaVer (#766)
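SchemaVer (#766) writes versions as MODEL-REVISION-ADDITION, for example 1-0-2. A MODEL bump signals a breaking change, while REVISION and ADDITION bumps are progressively more backwards-compatible. The sketch below parses and orders such versions; the `SchemaVer` class and `same_model` helper are hypothetical, not part of the Looker models.

```python
from typing import NamedTuple


class SchemaVer(NamedTuple):
    """A SchemaVer version: MODEL-REVISION-ADDITION, e.g. 1-0-2."""

    model: int
    revision: int
    addition: int

    @classmethod
    def parse(cls, text: str) -> "SchemaVer":
        parts = text.split("-")
        if len(parts) != 3 or not all(part.isdigit() for part in parts):
            raise ValueError(f"not a SchemaVer string: {text!r}")
        return cls(*(int(part) for part in parts))

    def __str__(self) -> str:
        return f"{self.model}-{self.revision}-{self.addition}"

    def same_model(self, other: "SchemaVer") -> bool:
        """True when only REVISION or ADDITION differ, i.e. no MODEL change."""
        return self.model == other.model
```

Because the class is a tuple, versions compare field by field, so 1-0-2 sorts before 1-1-0, which sorts before 2-0-0.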
Redshift
- Added reference_data.country_codes (#779)
Postgres
- Added reference_data.country_codes (#781)
Snowplow v0.9.3
Improvements to EmrEtlRunner and the Clojure Collector.
EmrEtlRunner
- Bumped to 0.7.0
- Bumped Sluice to 0.2.1 (#405)
- Bumped Elasticity to 3.0.4 (#665)
- Replaced hadoop_version setting with ami_version setting (#701)
- Fixed handling of region, placement and ec2_subnet_id (#754)
- Fixed regression where 0 files staged still kicks off EMR (#409)
- Stopped Sluice file operation threads being killed by folders (#401)
- Fixed disabling of Cascading error catching (#721)
- Renamed Clojure Collector log files in processing bucket to support multiple instances (#717)
- Added initial Ruby.contracts support into EmrEtlRunner (#392)
- Updated to use the Ruby Logger (#194)
- Updated so it's embeddable in other applications (#128)
- Added ability to bundle as a JRuby fat jar (#674)
- Added initial unit tests (#672)
Clojure Collector
- Bumped to 0.6.0
- Fixed load balancer IP address being stored in logs (#719)
Documentation
Snowplow v0.9.2
Rapid release to accommodate Amazon's April 29th update to the CloudFront Access Log file format.
Scala Hadoop Enrich
- Bumped to 0.5.0
- Bumped to Scala Common Enrich 0.3.0 (#699)
- Bumped SBT to 0.13.2 (#702)
- Bumped to sbt-assembly 0.11.2 (#704)
Scala Common Enrich
- Bumped to 0.4.0
- Upgraded to support new and future CloudFront file formats (#698)
- Bumped SBT to 0.13.2 (#703)
Scala Hadoop Bad Rows
- Added. Version 0.1.0
Hive Storage
Snowplow v0.9.1
Initial support for custom unstructured events and custom contexts, plus a host of other small improvements.
Scala Hadoop Enrich
- Bumped to 0.4.0
- Bumped to Scala Common Enrich 0.3.0 (#497)
- Renamed AnonQuartets to AnonOctets (#498)
- Renamed all Snowplow Hadoop Tests to Specs (#515)
- Added page_url and page_referrer back into ETL's output (#483)
Scala Common Enrich
- Bumped to 0.3.0
- Bumped Argonaut to 6.0.3 (#620)
- Added app and mob as valid platform codes, thanks @kinabalu! (#524)
- Added support for remaining platform codes (#516)
- Updated POJO in Scalding ETL to include new unstructured fields (#362)
- Updated POJO in Scalding ETL to include name_tracker field (#595)
- Extracted evn from Tracker Protocol (#604)
- Extracted tna from Tracker Protocol (#616)
- Extracted and validated unstructured events (#142)
- Extracted and validated custom contexts (#426)
- Reformatted incoming event and context JSONs (#589)
- Now fails a JSON that exceeds the maximum allowed length (#567)
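Several of the items above read fields from the Snowplow Tracker Protocol. As one illustration, the protocol carries an unstructured event either as plain JSON in `ue_pr` or as URL-safe base64 in `ue_px`, and custom contexts likewise in `co` or `cx`. The sketch below decodes either form from a querystring. The function names are hypothetical, it skips the validation that Scala Common Enrich performs, and it is not the project's Scala code.

```python
import base64
import json
from typing import Dict, Optional, Tuple
from urllib.parse import parse_qs


def decode_b64url(value: str) -> str:
    """Decode URL-safe base64, restoring padding in case the tracker stripped it."""
    padded = value + "=" * (-len(value) % 4)
    return base64.urlsafe_b64decode(padded).decode("utf-8")


def extract_json_field(params: Dict[str, str], plain_key: str,
                       encoded_key: str) -> Optional[dict]:
    """Return the JSON carried in either the plain or the base64-encoded parameter."""
    if plain_key in params:
        raw = params[plain_key]
    elif encoded_key in params:
        raw = decode_b64url(params[encoded_key])
    else:
        return None
    return json.loads(raw)  # raises json.JSONDecodeError on malformed input


def extract_unstruct_and_contexts(querystring: str) -> Tuple[Optional[dict], Optional[dict]]:
    # parse_qs returns a list per key; this sketch keeps only the first value
    params = {key: values[0] for key, values in parse_qs(querystring).items()}
    return (extract_json_field(params, "ue_pr", "ue_px"),
            extract_json_field(params, "co", "cx"))
```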
EmrEtlRunner
- Bumped to 0.6.0
- Bumped Elasticity to 3.0.2 (#587)
- Allowed AWS VPC selection in EmrEtlRunner (#581)
- Set :visible_to_all_users to true for EMR jobs, thanks @smugryan! (#560)
Redshift
- atomic-def script bumped to 0.3.0
- Migration script added for 0.2.2 to 0.3.0
- Added new unstructured fields to Redshift table definition (#361)
- Changed distkey to be event_id, not domain_userid (#584)
- Added raw page_url and page_referrer into Redshift table (#591)
- Added name_tracker field to Redshift table (#594)
- Converted Redshift varchar(38) for event IDs to char(36) (#282)
Postgres
- atomic-def script bumped to 0.2.0
- Migration script added for 0.1.x to 0.2.0
- Added new unstructured fields to Postgres table definition (#359)
- Added raw page_url and page_referrer into Postgres table (#592)
- Added name_tracker field to Postgres table (#593)
- Converted varchar(36) for event IDs to char(36) (#596)
StorageLoader
- Bumped to 0.2.0
- Added TIMEFORMAT 'auto' to StorageLoader to handle outlier dvce_timestamps (#427)
JavaScript Tracker
- Bumped git submodule to 1.0.1 (#585)
Python Tracker
- Added git submodule pointing to 0.1.0 (#586)
Snowplow v0.9.0
Releasing initial beta of Amazon Kinesis support.
Thrift Raw Event
- Added. Version 0.1.0
- Specified Thrift IDL for new raw event schema (#430)
Scala Stream Collector
- Added. Version 0.1.0
- Implemented new spray-can (Akka Http) Scala stream collector (#432)
Scala Kinesis Enrich
- Added. Version 0.1.0
- Implemented initial Kinesis-based enrichment (#460)
Scala Common Enrich
- Bumped to 0.2.0
- Added Thrift SnowplowRawEvent as a dependency to common-enrich (#475)
- Added ability to read Thrift SnowplowRawEvent (#462)
- Renamed CloudFront to Cloudfront in code (#495)
- Renamed AnonQuartets to AnonOctets (#491)
- Added raw -> CanonicalInput tests (#484)
- Updated GET payload extraction to handle empty payloads (#502)
Git housekeeping
Snowplow v0.8.13
Releasing initial version of the Snowplow metadata model for Looker.
Looker Analytics
- Added. Version 0.1.0
- Created Snowplow metadata model for Looker BI (www.looker.com) (#472)
Snowplow v0.8.12
Various small improvements to our Scalding-based Enrichment process, plus some architectural re-work.
Scala Hadoop Enrich
- Bumped to 0.3.6
- Bumped to SBT 0.13.0 (#404)
- Bumped to using sbt-assembly 0.10.1 (#421)
- Bumped to Scala 2.10.3 (#423)
- Bumped to Scalding 0.8.11 (#422)
- Upgraded user-agent-utils to 1.11 and moved it to a Maven dependency (#416)
- Added test running back into sbt-assembly step (#420)
- Updated copyright messages to be Snowplow not SnowPlow, and to 2014 not 2013 (#419)
- Added ValidatedString as a type to package.scala (#328)
- Added missing validation to stringToJByte (#408)
- Missing page URI no longer interpreted as bad row (#399)
- Updated CfRegex to reflect Cfcs(Cookie) can be empty (#410)
- Numeric fields in tr_ and ti_ now parsed to doubles, not madeTsvSafe strings (#400)
- Moved ETL core into separate project scala-enrich-common (#417)
Scala Common Enrich
- Updated ETL versioning to include host and common versions (#448)