Gobblin is a distributed big data integration framework (ingestion, replication, compliance, retention) for batch and streaming systems. Gobblin features integrations with Apache Hadoop, Apache Kafka, Salesforce, S3, MySQL, Google etc.
suvasude and htran1 [GOBBLIN-682] Create a new constructor for DatasetCleanerJob.[]
Closes #2554 from sv2000/datasetCleanerJob
Latest commit 670a7d0 Feb 21, 2019
Type Name Latest commit message Commit time
.github [GOBBLIN-168] Standardize Github PR template for Gobblin Jul 27, 2017
bin [GOBBLIN-545] Updating gobblin mapreduce dependecies Jul 30, 2018
buildSrc/src/main/groovy/org/apache/gobblin/gradle Updated package names, imports and shell scripts Jul 31, 2017
conf [GOBBLIN-453][GOBBLIN-452][GOBBLIN-190] Make the GAAS rest port confi… Jun 14, 2018
config/checkstyle [GOBBLIN-219] Added copyright header check and enforced it for existi… Aug 23, 2017
dev [GOBBLIN-578] Comparison to None should be 'expr is None' Sep 8, 2018
gobblin-admin [GOBBLIN-217][GOBBLIN-191][GOBBLIN-213] Fix gobblin-admin module to u… Aug 21, 2017
gobblin-api [GOBBLIN-676] Add record metadata support to the RecordEnvelope Feb 6, 2019
gobblin-audit Change package from gobblin to org.apache.gobblin for stray reference… Jul 31, 2017
gobblin-aws [GOBBLIN-533] upgrade helix to 0.8.1 Jul 13, 2018
gobblin-cluster [GOBBLIN-679] Refactor GobblinHelixTask metrics Feb 11, 2019
gobblin-compaction [GOBBLIN-561] Handle data completeness checks for data partitions wit… Aug 21, 2018
gobblin-config-management [GOBBLIN-609] Change config store to append to path part of URI Oct 10, 2018
gobblin-core-base [GOBBLIN-686] Enhance schema comparison Feb 20, 2019
gobblin-core [GOBBLIN-671] Close the underlying writer when a HiveWritableHdfsData… Jan 25, 2019
gobblin-data-management [GOBBLIN-645] Fix some typos as reading thru code Dec 6, 2018
gobblin-distribution [GOBBLIN-17] Add Elasticsearch writer (rest + transport) Sep 1, 2018
gobblin-docker Updated docker files to use org.apache.gobblin packages Jul 31, 2017
gobblin-docs [GOBBLIN-669] Configuration Properties Glossary section of Docs hard … Feb 21, 2019
gobblin-example [GOBBLIN-17] Add Elasticsearch writer (rest + transport) Sep 1, 2018
gobblin-hive-registration [GOBBLIN-583] Add support for writing the schema file using a staging… Sep 12, 2018
gobblin-metastore [GOBBLIN-666] Data too long for column 'property_key' Feb 21, 2019
gobblin-metrics-libs [GOBBLIN-673] Implement a FS based JobStatusRetriever for GaaS Flows. Feb 5, 2019
gobblin-modules [GOBBLIN-682] Create a new constructor for DatasetCleanerJob.[] Feb 21, 2019
gobblin-oozie/src/test/resources [GOBBLIN-222] Fix silent failure for loading incompatible state-store Aug 29, 2017
gobblin-rest-service [GOBBLIN-355] Add git scripts to publish to Nexus and generate signed… Jan 3, 2018
gobblin-restli [GOBBLIN-681] increase max size of job name Feb 8, 2019
gobblin-runtime-hadoop [GOBBLIN-186] Add support for using the Kerberos authentication plugi… Aug 9, 2017
gobblin-runtime [GOBBLIN-684] Ensure buffered messages are flushed before close() in K… Feb 20, 2019
gobblin-salesforce [GOBBLIN-587] Implement partition level lineage for fs based destination Sep 14, 2018
gobblin-service [GOBBLIN-687] Pass TopologySpec map to DagManager to allow reuse of Sp… Feb 20, 2019
gobblin-test-harness [GOBBLIN-680] Enhance error handling on task creation Feb 8, 2019
gobblin-test-utils [GOBBLIN-17] Add Elasticsearch writer (rest + transport) Sep 1, 2018
gobblin-test/resource Changed package from gobblin to org.apache.gobblin in docs and pull f… Jul 31, 2017
gobblin-tunnel [GOBBLIN-437] Disable flaky tests for distribution as well (which are… Mar 23, 2018
gobblin-utility [GOBBLIN-686] Enhance schema comparison Feb 20, 2019
gobblin-yarn [GOBBLIN-437] Disable flaky tests for distribution as well (which are… Mar 23, 2018
gradle [GOBBLIN-610] Add support for secure access to Git in GitMonitoringSe… Oct 15, 2018
ligradle/findbugs Change package from gobblin to org.apache.gobblin for stray reference… Jul 31, 2017
maven-nexus [GOBBLIN-355] Add helper script to publish archives on Nexus, and wir… Jan 3, 2018
maven-sonatype [GOBBLIN-569] Address the pylint warnings in github-pr-change-log.py Sep 4, 2018
travis [GOBBLIN-563] Upgrade to gradle 4.x Aug 15, 2018
.gitignore [GOBBLIN-485] AvroSchemaManager does not support using schema generat… May 3, 2018
.travis.yml Upgrade to java 8. (#1842) May 17, 2017
CHANGELOG.md [GOBBLIN-641] Updated CHANGELOG for 0.14.0 release Nov 30, 2018
DISCLAIMER [GOBBLIN-355] Added DISCLAIMER for Apache Incubation Jan 3, 2018
HEADER [GOBBLIN-355] Add HEADER as per release process Jan 3, 2018
LICENSE Add GLYPHICONS Halflings license reference in LICENSE file Jul 2, 2018
NOTICE Update NOTICE to comply with ASF policy Jun 20, 2018
README.md [GOBBLIN-669] Configuration Properties Glossary section of Docs hard … Feb 21, 2019
build.gradle [GOBBLIN-563] Upgrade to gradle 4.x Aug 15, 2018
defaultEnvironment.gradle Changed license to Apache 2.0 in source files for incubation Jan 6, 2017
gobblin-flavored-build.gradle Changed license to Apache 2.0 in source files for incubation Jan 6, 2017
gradle.properties [GOBBLIN-641] Reserve version 0.15.0 for next release Nov 30, 2018
gradlew [GOBBLIN-563] Upgrade to gradle 4.x Aug 15, 2018
gradlew.bat [GOBBLIN-563] Upgrade to gradle 4.x Aug 15, 2018
mkdocs.yml [GOBBLIN-482] Add http write documentation May 4, 2018
query_github_issues.py [GOBBLIN-577] pep-0020 - Readability counts Sep 10, 2018
readthedocs.yml Initial commit for mkdocs and readthedocs integration Mar 9, 2016
settings.gradle Adding service module Jan 31, 2017

README.md

Apache Gobblin

Apache Gobblin is a universal data ingestion framework for extracting, transforming, and loading large volumes of data from a variety of data sources (e.g., databases, REST APIs, FTP/SFTP servers, filers) onto Hadoop. Apache Gobblin handles the routine tasks common to all data ingestion ETLs, including job/task scheduling, task partitioning, error handling, state management, data quality checking, and data publishing. Gobblin ingests data from different data sources in the same execution framework and manages the metadata of those sources in one place. This, combined with other features such as auto scalability, fault tolerance, data quality assurance, extensibility, and the ability to handle data model evolution, makes Gobblin an easy-to-use, self-service, and efficient data ingestion framework.

Requirements

  • Java >= 1.8
  • gradle-wrapper.jar version 2.13

If building the distribution with tests turned on:

  • Maven version 3.5.3
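
The Java requirement above can be checked before building. The following is a minimal sketch, not part of the Gobblin tooling: the function names `java_major_version` and `detect_java_version` are hypothetical, and the script only checks the lower bound (Java 1.8) stated in the requirements. It assumes `java -version` prints a quoted version string, which may differ on unusual JVM distributions.

```shell
#!/bin/sh
# Hypothetical helper (not part of Gobblin): extract the major Java version.
# Legacy scheme "1.8.0_201" maps to 8; modern scheme "11.0.2" maps to 11.
java_major_version() {
  ver=$1
  case "$ver" in
    1.*) ver=${ver#1.} ;;   # drop the legacy "1." prefix
  esac
  # Keep only the leading run of digits.
  echo "${ver%%[!0-9]*}"
}

# Hypothetical helper: read the quoted version string from `java -version`,
# which writes to stderr (for example: java version "1.8.0_201").
detect_java_version() {
  java -version 2>&1 | sed -n 's/.*version "\([^"]*\)".*/\1/p' | head -n 1
}

if command -v java >/dev/null 2>&1; then
  major=$(java_major_version "$(detect_java_version)")
  if [ "${major:-0}" -ge 8 ]; then
    echo "Java $major: OK"
  else
    echo "Java 1.8 or newer is required" >&2
  fi
else
  echo "java not found on PATH" >&2
fi
```

The parsing is kept in its own function so it can be tried on a literal string, for example `java_major_version '1.8.0_201'`, without a JVM installed.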

Instructions to download gradle wrapper

Run one of the following commands to download gradle-wrapper.jar from the Gobblin Git repository into the gradle/wrapper directory.

wget --no-check-certificate -P gradle/wrapper https://github.com/apache/incubator-gobblin/raw/0.12.0/gradle/wrapper/gradle-wrapper.jar

(or)

curl --insecure -L https://github.com/apache/incubator-gobblin/raw/0.12.0/gradle/wrapper/gradle-wrapper.jar > gradle/wrapper/gradle-wrapper.jar

Alternatively, you can download it manually from: https://github.com/apache/incubator-gobblin/blob/0.12.0/gradle/wrapper/gradle-wrapper.jar

Make sure that you download it to the gradle/wrapper directory.
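
After downloading, it may be worth confirming that the file landed in the right place and is really a jar. The sketch below is an assumption-laden sanity check, not something the Gobblin build provides: the function name `check_wrapper_jar` is hypothetical, and the test only looks at the first two bytes, relying on the fact that jar files are ZIP archives and therefore begin with the signature "PK". It does not verify the wrapper version.

```shell
#!/bin/sh
# Hypothetical helper (not part of Gobblin): succeed only if
# DIR/gradle-wrapper.jar exists and starts with the ZIP signature "PK".
check_wrapper_jar() {
  jar="${1:-gradle/wrapper}/gradle-wrapper.jar"
  [ -f "$jar" ] || return 1
  [ "$(head -c 2 "$jar")" = "PK" ]
}

if check_wrapper_jar gradle/wrapper; then
  echo "gradle-wrapper.jar looks valid"
else
  echo "gradle-wrapper.jar is missing or not a jar; download it again" >&2
fi
```

A check like this would catch, for example, an HTML page saved under the jar's name instead of the binary file.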

Instructions to run Apache RAT (Release Audit Tool)

  1. Extract the archive file to your local directory.
  2. Download gradle-wrapper.jar (version 2.13) and place it in the gradle/wrapper folder. See 'Instructions to download gradle wrapper' above.
  3. Run ./gradlew rat. The report will be generated at build/rat/rat-report.html

Instructions to build the distribution

  1. Extract the archive file to your local directory.
  2. Download gradle-wrapper.jar (version 2.13) and place it in the gradle/wrapper folder. See 'Instructions to download gradle wrapper' above.
  3. Skip tests and build the distribution: run ./gradlew build -x findbugsMain -x test -x rat -x checkstyleMain. The distribution will be created in the build/gobblin-distribution/distributions directory.
  4. Alternatively, run tests and build the distribution (requires Maven): run ./gradlew build. The distribution will be created in the build/gobblin-distribution/distributions directory.
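
The two build variants in steps 3 and 4 differ only in the tasks they exclude. As a rough sketch, a small wrapper could select between them; the function name `gradle_build_args` and the mode names `skip-tests` and `full` are hypothetical and not part of the Gobblin build scripts. The Gradle arguments themselves are taken from the steps above.

```shell
#!/bin/sh
# Hypothetical helper (not part of Gobblin): print the Gradle arguments
# for the chosen build mode.
#   skip-tests -> build without findbugs, tests, RAT and checkstyle
#   full       -> plain build (runs tests; the steps above note it needs Maven)
gradle_build_args() {
  case "${1:-skip-tests}" in
    skip-tests) echo "build -x findbugsMain -x test -x rat -x checkstyleMain" ;;
    full)       echo "build" ;;
    *)          echo "unknown mode: $1" >&2; return 1 ;;
  esac
}

# Print the command instead of running it, so the sketch is safe to try.
echo "./gradlew $(gradle_build_args skip-tests)"
```

To actually build, the printed command can be run from the repository root, or the helper can be used directly as `./gradlew $(gradle_build_args full)`, left unquoted so the arguments are split into separate words.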

Quick Links

  • Documentation: Check out the Gobblin documentation for a complete description of Gobblin's features
  • Powered By: Check out the list of companies known to use Gobblin
  • Architecture: The Gobblin Architecture page has a full explanation of Gobblin's architecture
  • Getting Started with Gobblin: Refer to the Getting Started Guide on how to get started with Gobblin
  • Building Gobblin (from master branch): Refer to the page Building Gobblin for directions on how to build Gobblin
  • Javadocs: The full JavaDocs for each released version of Gobblin can be found here
  • Gobblin chat room: Gitter chat room for Gobblin developers and users here
  • Gobblin Issue Tracker can be found here