Releases: snowplow/snowplow

Release 92 Maiden Castle (2017-09-11)

11 Sep 08:58

Improving EmrEtlRunner

EmrEtlRunner

  • Release lock in case of no-op (#3396)
  • Treat archive_enriched and archive_shredded as separate steps (#3401)
  • Do not pass --skip shred to RDB Loader when skipping RDB Shredder (#3403)
  • Retrieve logs when the RDB Loader step hangs and is cancelled (#3399)
  • Ensure appropriate log level for RDB logs (#3369)
  • Unlink downloaded RDB logs (#3363)
  • Do not try to download non-existent RDB loader log files (#3405)
  • Rescue the intermittent RestClient::SSLCertificateNotVerified error (#2572)
  • Pass GZIP compression argument to S3DistCp as "gz" not "gzip" (#3415)
  • Update rdb_loader version in config.yml.sample to 0.13.0 (#3418)
  • Bump to 0.28.0 (#3404)
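
The "release lock in case of no-op" fix (#3396) can be illustrated with a minimal sketch of a file-based lock. This is not EmrEtlRunner's Ruby implementation: `file_lock`, `LockHeldError` and `run_pipeline` are hypothetical names, and the lock is assumed to be a plain file created atomically. The point it shows is that the lock is released on every exit path, including a run that finds nothing to do.

```python
import os
import tempfile
from contextlib import contextmanager


class LockHeldError(Exception):
    """Raised when another run already holds the lock (hypothetical name)."""


@contextmanager
def file_lock(path):
    # O_CREAT | O_EXCL makes creation atomic: it fails if the file exists.
    try:
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        raise LockHeldError(path) from None
    os.close(fd)
    try:
        yield
    finally:
        # Always release, including when the run turned out to be a no-op.
        os.remove(path)


def run_pipeline(lock_path, files_to_process):
    """Hypothetical run: returns "no-op" when there is nothing to process."""
    with file_lock(lock_path):
        if not files_to_process:
            return "no-op"
        return "processed %d files" % len(files_to_process)


if __name__ == "__main__":
    demo_lock = os.path.join(tempfile.mkdtemp(), "run.lock")
    print(run_pipeline(demo_lock, []))
    print(os.path.exists(demo_lock))
```

Because the early return sits inside the `with` block, the `finally` clause removes the lock file before the function returns, so a later run is not blocked by a stale lock.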

Documentation

  • Fix broken links in storage/postgres's README.md (#3390)

RDB Shredder and Loader

Release 91 Stonehenge (2017-08-17)

17 Aug 10:48

EmrEtlRunner robustness.

EmrEtlRunner

  • Use S3DistCp not Sluice for staging step (#276)
  • Add an S3DistCp step for the _SUCCESS file produced by RDB Shredder (#3137)
  • Add step to delete raw events from HDFS before shredding (#2545)
  • Use S3DistCp to move raw files from S3 to HDFS for all collector formats (#3136)
  • Add file- and Consul-based locking mechanism (#3352)
  • Move current behavior into a run command (#3104)
  • Add lint command which validates Iglu resolver and enrichments (#1946)
  • Add backend for a generate command (#3105)
  • Add --resume-from option (#3128)
  • Remove support for --start and --end flags (#3132)
  • Remove support for --process-enrich and --process-shred flags (#3365)
  • Handle run= sub-folders if resuming from shred (#2693)
  • Add "ongoing run" message on exit with return code 4 (#3129)
  • Add "no logs to process" message on exit with return code 3 (#2644)
  • Retrieve RDB loader logs only when it failed or the entire run was successful (#3361)
  • Bump rspec to 3.5.0 (#3116)
  • Bump to 0.27.0 (#3358)
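
A wrapper script that calls EmrEtlRunner may want to treat the two return codes listed above differently from a genuine failure. The sketch below is one way to do that. Only codes 3 ("no logs to process") and 4 ("ongoing run") come from these release notes; treating 0 as success and every other code as failure is an assumption, and `RunOutcome`, `classify_exit` and `should_run_next_stage` are hypothetical names.

```python
from enum import Enum


class RunOutcome(Enum):
    SUCCESS = "success"
    NO_LOGS = "no logs to process"   # return code 3 (#2644)
    ONGOING_RUN = "ongoing run"      # return code 4 (#3129)
    FAILURE = "failure"


def classify_exit(code):
    """Map a process return code to an outcome (0 = success is assumed)."""
    if code == 0:
        return RunOutcome.SUCCESS
    if code == 3:
        return RunOutcome.NO_LOGS
    if code == 4:
        return RunOutcome.ONGOING_RUN
    return RunOutcome.FAILURE


def should_run_next_stage(code):
    # Only continue to downstream jobs after a fully successful run.
    return classify_exit(code) is RunOutcome.SUCCESS


def should_alert(code):
    # Codes 3 and 4 are expected conditions, so they do not page anyone.
    return classify_exit(code) is RunOutcome.FAILURE
```

With this split, a scheduler can skip downstream jobs quietly on codes 3 and 4 and raise an alert only for other non-zero codes.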

Release 90 Lascaux (2017-07-26)

26 Jul 16:05

StorageLoader reboot.

Common

  • Update CI/CD to push S3 artifacts to all regional Hosted Assets buckets (#3242)
  • Add CI/CD to deploy RDB Loader to Snowplow Hosted Assets (#3025)
  • No longer bundle StorageLoader in Bintray download (#3024)

Event Manifest Populator

  • Support pre-R83 enriched events (#3293)
  • Bump to 0.1.1 (#3295)

EmrEtlRunner

  • Make targets loading consistent with enrichments (#3268)
  • Expose arbitrary EMR configuration options (#3255)
  • Add maximizeResourceAllocation option to EMR cluster configuration (#3253)
  • Move max attempts configuration to EMR cluster configuration (#3246)
  • Use Elasticity to specify Thrift-specific configuration (#3252)
  • Bump elasticity version to 6.0.12 (#3249)
  • Remove storage.download from config.yml.sample (#3265)
  • Add rdb_loader to config.yml.sample (#3266)
  • Add S3DistCp step to move enriched and shredded files to archive (#1777)
  • Add RDB Loader step for each target (#3121)
  • Bump to 0.26.0 (#3254)

RDB Loader

  • Remove StorageLoader (#3026)
  • Accept storage target JSONs on command-line (#3022)
  • Rewrite StorageLoader in Scala, removing file archiving step (#3023)
  • Fix eventual consistency problem (#3113)
  • Load all runs from shredded, not just the first run found (#2962)
  • Remove compupdate step (#3178)
  • Add logging around database load, analyze and vacuum (#2935)
  • Use Redshift-specific driver to connect to Redshift (#1830)

Storage

  • Storage: replace example Redshift storage target configuration with 2-0-0 (#3281)

Trackers

  • Java Tracker: bump git submodule to 0.8.2 (#3260)
  • Ruby Tracker: bump git submodule to 0.6.1 (#3264)
  • .NET Tracker: bump git submodule to 1.0.2 (#3258)
  • Python Tracker: bump git submodule to 0.8.0 (#3263)
  • Golang Tracker: bump git submodule to 1.1.0 (#3259)
  • Node.js Tracker: bump git submodule to 0.3.0 (#3262)
  • Android Tracker: bump git submodule to 0.6.2 (#3257)
  • JavaScript Tracker: bump git submodule to 2.8.0 (#3261)

Release 89 Plain of Jars (2017-06-12)

12 Jun 09:20

Ports the Snowplow batch pipeline to Spark.

Documentation

  • Fix incorrect hyphen underlining for R88 (#3198)

Common

  • Refactor CI/CD deploy scripts into one (#3100)
  • Update CI/CD to deploy Spark Enrich (#3069)
  • Refactor CI/CD "is release tag" scripts into one (#3101)
  • Update CI/CD to deploy RDB Shredder (#3038)
  • Fix travis build due to the changes to the precise image (#3210)
  • Build local Scala Common Enrich before publishing Kinesis-related artifacts (#3220)
  • Add Sonatype credentials to .travis.yml (#3217)
  • Bump Scala to 2.11 in .travis.yml (#3227)

Scala Common Enrich

  • Bump to 0.25.0 (#3089)
  • Bump scala-iglu-client to 0.5.0 (#3092)
  • Remove scala-util (#3054)
  • Get rid of deprecated erasure method calls (#3008)
  • Bump scalaz to 7.0.9 (#3055)
  • Bump scalding-args to 0.13.0 (#3058)
  • Bump specs2 to 2.3.13 (#3059)
  • Bump scalaz-specs2 to 0.2 (#3060)
  • Bump scala-forex to 0.5.0 (#3057)
  • Bump sbt to 0.13.13 (#3056)
  • Bump Scala to 2.11.11 (#3007)
  • Add Scala 2.11 cross-building (#3061)
  • Make EnrichedEvent Serializable (#3081)
  • Fix failing WeatherEnrichmentSpec expectation (#3205)
  • Remove ScalazArgs (#3209)
  • Upgrade to Java 8 (#3212)
  • Add CI/CD (#3216)

Spark Enrich

  • Bump to 1.9.0 (#3072)
  • Rename from Scala Hadoop Enrich (#3064)
  • Change the package from hadoop to spark (#3076)
  • Bump sbt-assembly to 0.14.3 (#3078)
  • Bump SBT to 0.13.13 (#3065)
  • Port from Scalding to Spark (#3067)
  • Bump scala-common-enrich to 0.25 (#3096)
  • Bump Scalaz to 7.0.9 (#3097)
  • Bump iglu-scala-client to 0.5.0 (#3098)
  • Bump specs2-core to 2.3.13 (#3099)
  • Bump Scala version to 2.11 (#3070)
  • Upgrade to Java 8 (#2381)
  • Fix SqlQueryEnrichmentCfLinesSpec (#3224)
  • Fix CurrencyConversionTransactionSpec (#3225)
  • Run the unit tests systematically in Travis (#3228)

EmrEtlRunner

  • Bump to 0.25.0 (#3039)
  • Update to run Spark Enrich instead of Scala Hadoop Enrich (#3066)
  • Update to run RDB Shredder instead of Scala Hadoop Shred (#3033)
  • Add ability to run Spark jobs (#641)
  • Replace hadoop_shred in config.yml.sample with rdb_shredder (#3035)
  • Bump elasticity version to 6.0.11 (#3053)
  • Use the Scalding step provided by Elasticity (#3052)
  • Replace hadoop_enrich in config.yml.sample with spark_enrich (#3068)
  • Bump AMI version in example config to 5.5.0 (#3207)

RDB Shredder

  • Bump to 0.12.0 (#3042)
  • Rename from Scala Hadoop Shred (#3031)
  • Move from 3-enrich to 4-storage (#3032)
  • Change the package to storage from enrich (#3036)
  • Port from Scalding to Spark (#3034)
  • Bump scala-common-enrich to 0.25 (#3091)
  • Bump iglu-scala-client to 0.5.0 (#3090)
  • Bump specs2-core to 2.3.13 (#3093)
  • Bump Scala version to 2.11 (#3071)
  • Upgrade to Java 8 (#3213)
  • Run the unit tests systematically in Travis (#3229)

StorageLoader

  • Bump to 0.11.0 (#3214)
  • Add support for Spark-based Shredder's directory structure (#3044)

Release 88 Angkor Wat (2017-04-27)

27 Apr 23:45

Introduces event de-duplication across different pipeline runs, powered by DynamoDB, along with an important refactoring of the batch pipeline configuration.

Documentation

  • Fix incorrect release date for R87 (#3126)

Common

  • Update copyright years in README (#3148)
  • Add CI/CD for EmrEtlRunner and StorageLoader (#3102)
  • Add CI/CD for Event Manifest Populator (#3170)
  • Add AWS staging credentials to .travis.yml (#3114)
  • Update script to sync ap-northeast-2 (Seoul) Snowplow Hosted Assets bucket (#3160)
  • Update READMEs' Markdown in accordance with CommonMark (#3157)

Event Manifest Populator

  • Add Spark job to backpopulate DynamoDB duplicate storage (#3158)

Scala Common Enrich

  • Bump to 0.24.1 (#3155)
  • Fix failing WeatherEnrichmentSpec expectation (#3154)

Scala Hadoop Shred

  • Bump to 0.11.0 (#3041)
  • Bump sbt-assembly to 0.14.4 (#3140)
  • Bump SBT to 0.13.13 (#2972)
  • Remove explicit jackson-databind dependency (#3138)
  • Add cross-batch natural deduplication (#2999)
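
Cross-batch natural deduplication (#2999) can be pictured with the following minimal sketch. It is not the Scala Hadoop Shred code: a plain dict stands in for the DynamoDB table, and the rule that a duplicate is an event whose `(event_id, event_fingerprint)` pair was first recorded by a different run is an assumption made for illustration. The function name and the `etl_tstamp` run identifier are likewise illustrative.

```python
def deduplicate_across_batches(events, manifest, etl_tstamp):
    """Drop events already recorded by an earlier run.

    manifest: dict standing in for the DynamoDB table (assumption); it maps
              (event_id, event_fingerprint) to the run that first saw the pair.
    etl_tstamp: identifier of the current run.
    """
    kept = []
    for event in events:
        key = (event["event_id"], event["event_fingerprint"])
        # Record the pair if it is new; otherwise fetch the run that owns it.
        first_seen = manifest.setdefault(key, etl_tstamp)
        if first_seen == etl_tstamp:
            # New pair, or the same run being retried: keep the event.
            kept.append(event)
    return kept
```

Keeping events whose pair was recorded by the same run lets a failed run be retried without losing its own events. Duplicates inside a single batch are not handled here; in this sketch they are assumed to be removed by a separate in-batch step.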

Storage

  • Add example storage target configuration JSONs (#2990)

StorageLoader

  • Bump to 0.10.0 (#3109)
  • Remove Northern Virginia endpoint for Postgres load (#3143)
  • Handle return code of 4 for EmrEtlRunner in snowplow-runner-and-loader.sh (#3139)
  • Use storage target JSONs instead of targets section in config.yml (#2992)
  • Replace table configuration property with schema (#2458)

EmrEtlRunner

  • Bump to 0.24.0 (#3040)
  • Update hadoop_shred version in config.yml.sample to 0.11.0 (#3197)
  • Add script to convert config.yml targets section into JSON format (#3135)
  • Remove targets section from config.yml.sample (#2989)
  • No longer use sources property when loading Elasticsearch (#2993)
  • Use storage target JSONs instead of targets section in config.yml (#2991)

Release 87 Chichen Itza

21 Feb 23:24

New features, stability enhancements and performance improvements for EmrEtlRunner and StorageLoader. As of this release, EmrEtlRunner lets you specify EBS volumes for your Hadoop worker nodes, and StorageLoader now writes to a dedicated manifest table to record each load.

EmrEtlRunner

  • Bump to 0.23.0 (#2960)
  • Bump JRuby version to 9.1.6.0 (#3050)
  • Bump Elasticity to 6.0.10 (#3013)
  • Remove AnonIpHash from contracts.rb (#2523)
  • Remove UnmatchedLzoFilesError check (#2740)
  • Use S3DistCp not Sluice for archive_raw step (#1977)
  • Add warning about the array of in buckets in config.yml (#2462)
  • Add dedicated return code of 4 for DirectoryNotEmptyError (#2546)
  • Add support for specifying EBS for Hadoop workers (#2950)
  • Add example EBS configuration to config.yml.sample (#3012)
  • Catch Elasticity ThrottlingExceptions while waiting for EMR (#3028)
  • Catch Elasticity ArgumentErrors while waiting for EMR (#3027)

StorageLoader

  • Bump to 0.9.0 (#2961)
  • Bump JRuby version to 9.1.6.0 (#3051)
  • Fix typo in S3Tasks.download_events (#2888)
  • Update manifest table as part of Redshift load transaction (#2280)

Redshift

  • Add manifest table (#2265)

Release 86 Petra

20 Dec 13:21

Brings in-batch synthetic deduplication and data-modeling improvements.

Common

  • Add AWS credentials to .travis.yml (#2963)
  • Add CI/CD for Scala Hadoop Enrich (#2982)
  • Add CI/CD for Scala Hadoop Shred (#2928)
  • Migrate Hadoop Event Recovery deployment to Release Manager (#2983)
  • Remove short-hostname addon from travis.yml (#2674)
  • Update script to sync us-east-2 (Ohio) Snowplow Hosted Assets bucket (#2986)
  • Update script to sync ca-central-1 (Montreal) Snowplow Hosted Assets bucket (#3004)
  • Update script to sync eu-west-2 (London) Snowplow Hosted Assets bucket (#3005)
  • Use AWS environment variables to sync Snowplow Hosted Assets buckets (#2985)

Scala Hadoop Shred

  • Bump to 0.10.0 (#2979)
  • Add general top-level exception handling (#2071)
  • Get the CustomPartitionSourceTest working with Hadoop 2.4 (#1960)
  • Fix omitted string interpolation (#2562)
  • Deduplicate event_ids with different event_fingerprints (synthetic duplicates) (#24)
  • Stop catching fatal errors (#1456)
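
Synthetic duplicates, as named in #24, are events that share an `event_id` but carry different `event_fingerprint` values. The sketch below shows one plausible way to resolve them within a batch: give each colliding event a fresh UUID and keep the old identifier alongside it. This resolution strategy, the `original_event_id` field and the function name are assumptions for illustration, not a description of the Scala Hadoop Shred implementation.

```python
import uuid
from collections import defaultdict


def resolve_synthetic_duplicates(events, new_id=lambda: str(uuid.uuid4())):
    """Return copies of events with colliding event_ids replaced.

    new_id is injectable so the behaviour can be tested deterministically.
    """
    # First pass: collect the distinct fingerprints seen for each event_id.
    fingerprints_by_id = defaultdict(set)
    for event in events:
        fingerprints_by_id[event["event_id"]].add(event["event_fingerprint"])

    resolved = []
    for event in events:
        if len(fingerprints_by_id[event["event_id"]]) > 1:
            # Same id, different payloads: treat as a synthetic duplicate.
            resolved.append({
                **event,
                "event_id": new_id(),
                "original_event_id": event["event_id"],  # hypothetical field
            })
        else:
            resolved.append(dict(event))
    return resolved
```

Events whose `event_id` maps to a single fingerprint pass through unchanged, so only genuine collisions are rewritten.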

Data Modeling

  • Add drill fields to web block (#2956)
  • Resolve issues with web model (#2954)
  • Restrict table scan on deduplication queries (#2929)
  • Add web model (#2925)
  • Delete example models (#2836)
  • Remove outdated recipes (#2626)

EmrEtlRunner

  • Update hadoop_shred version in config.yml.sample to 0.10.0 (#3003)

Release 85 Metamorphosis

15 Nov 18:04

One of our hackathon projects at our Berlin company away-week: initial Kafka support for Snowplow

Scala Stream Collector

  • Scala Stream Collector: bump to 0.9.0 (#2936)
  • Scala Stream Collector: add Kafka sink (#2937)
  • Scala Stream Collector: update config.hocon.sample to support Kafka (#2943)
  • Scala Stream Collector: move sink.kinesis.buffer to sink.buffer in config.hocon.sample (#2938)

Stream Enrich

  • Stream Enrich: bump to 0.10.0 (#2942)
  • Stream Enrich: add Kafka sink (#2939)
  • Stream Enrich: add Kafka source (#2941)
  • Stream Enrich: update config.hocon.sample to support Kafka (#2940)
  • Stream Enrich: fix incorrect parsing of S3 urls (#2921)

Release 84 Steller's Sea Eagle

07 Oct 15:24

Brings support for Elasticsearch 2.x to the Kinesis Elasticsearch Sink for both Transport and HTTP clients

Common

  • Common: standardise sbt-assembly settings (#2900)
  • Common: refactor Kinesis release CI/CD (#2887)
  • Common: update script to sync ap-south-1 (Mumbai) Snowplow Hosted Assets bucket (#2903)

Scala Stream Collector

  • Scala Stream Collector: bump to 0.8.0 (#2886)
  • Scala Stream Collector: add scala_ into artifact filename in Bintray (#2843)
  • Scala Stream Collector: use nuid query parameter value to set the 3rd party network id cookie (#2512)
  • Scala Stream Collector: configurable cookie path (#2528)
  • Scala Stream Collector: call Config.resolve() to resolve environment variables in hocon (#2879)

Stream Enrich

  • Stream Enrich: bump to 0.9.0 (#2728)
  • Stream Enrich: bump Scala Tracker to 0.3.0 (#2898)
  • Stream Enrich: bump Scala Common Enrich to 0.24.0 (#2729)
  • Stream Enrich: tolerate trailing slashes for paths in IP Lookups Enrichment configuration (#2744)
  • Stream Enrich: call Config.resolve() to resolve environment variables in hocon (#2878)

Kinesis Elasticsearch Sink

  • Kinesis Elasticsearch Sink: bump to 0.8.0 (#2885)
  • Kinesis Elasticsearch Sink: bump Scala Tracker to 0.3.0 (#2899)
  • Kinesis Elasticsearch Sink: allow parametrized timeouts for jest client (#2897)
  • Kinesis Elasticsearch Sink: take buffer configurations into account (#2895)
  • Kinesis Elasticsearch Sink: make error messages more helpful (#2896)
  • Kinesis Elasticsearch Sink: ensure field names do not contain any dots (#2894)
  • Kinesis Elasticsearch Sink: add support for Elasticsearch 2.x (#2525)
  • Kinesis Elasticsearch Sink: call Config.resolve() to resolve environment variables in hocon (#2880)

Redshift

  • StorageLoader: remove all JSON Path files (#2905)
  • Redshift: remove all Redshift DDL for Iglu Central schemas (#2904)

Release 83 Bald Eagle

06 Sep 21:38

Introduces our powerful new SQL Query Enrichment, long-awaited support for the EU Frankfurt AWS region, plus POST support for our Iglu webhook adapter

Scala Tracker

  • Bump git submodule to 0.3.0 (#2726)

ActionScript 3.0 Tracker

  • Bump git submodule to 0.3.0 (#2727)

Scala Common Enrich

  • Bump to 0.24.0 (#2715)
  • Add SQL Query Enrichment (#2321)
  • Add POST support to IgluAdapter (#1184)

Scala Hadoop Enrich

  • Bump to 1.8.0 (#2716)
  • Bump Scala Common Enrich to 0.24.0 (#2717)
  • Add test for SQL Query Enrichment (#2718)
  • Make resolver config in JobSpecHelpers injectable (#2825)

EmrEtlRunner

  • Bump to 0.22.0 (#2784)
  • Bump Ruby version to 2.2.3 (#2869)
  • Bump Sluice to 0.4.0 (#1708)
  • Bump Contracts to 0.9 (#2789)
  • Rebuild Gemfile.lock (#2872)
  • Add version recognition of currently installed commons-codec (#2735)
  • Update snowplow-ami4-bootstrap.sh to take optional commons-codec version argument (#2713)
  • Fix bug with double compression in shred step if enrich skipped (#2586)
  • Pass GZIP compression argument to S3DistCp as "gz" not "gzip" (#2679)
  • Update hadoop_enrich version in config.yml.sample to 1.8.0 (#2756)
  • Replace deprecated Dir.exists? with Dir.exist? (#2799)
  • Fix contract for fatal_with (#2810)
  • Use region-specific Snowplow Hosted Assets buckets (#2813)
  • Disable contract on build_fix_filenames due to Contracts issue #238 (#2828)

Storage

  • Add Kinesis S3 git submodule (#2706)

StorageLoader

  • Bump to 0.8.0 (#2785)
  • Bump Ruby version to 2.2.3 (#2870)
  • Bump Sluice to 0.4.0 (#2786)
  • Bump Contracts to 0.9 (#2790)
  • Add explicit mime-types dependency (#2805)
  • Rebuild Gemfile.lock (#2871)
  • Use Northern Virginia endpoint not global endpoint for us-east-1 (#2748)
  • Replace module_function everywhere with self (#2801)
  • Fix broken contracts (#2461)
  • Write JSON path for com.amazon.aws.lambda/s3_notification_event (#2590)
  • Write JSON path for com.snowplowanalytics.snowplow/application_foreground/jsonschema/1-0-0 (#2857)
  • Write JSON path for com.snowplowanalytics.snowplow/application_background/jsonschema/1-0-0 (#2856)
  • Write JSON path for com.snowplowanalytics.snowplow/application_error/jsonschema/1-0-0 (#2855)

Redshift

  • Add Redshift DDL for com.snowplowanalytics.snowplow/application_foreground/jsonschema/1-0-0 (#2854)
  • Add Redshift DDL for com.snowplowanalytics.snowplow/application_background/jsonschema/1-0-0 (#2853)
  • Add Redshift DDL for com.snowplowanalytics.snowplow/application_error/jsonschema/1-0-0 (#2852)
  • Add Redshift DDL for com.amazon.aws.lambda/s3_notification_event/jsonschema/1-0-0 (#2589)