Commits on Oct 30, 2014
  1. PARQUET-119: add data_encodings to ColumnMetaData to enable dictionary based predicate push down
    
    To implement dictionary-based predicate push down, we need to know whether the writer fell back from dictionary encoding.
    If all data pages are dictionary encoded, we can use the dictionary for predicate push down.
    If not, we cannot.
    
    CC @nongli @rdblue @isnotinvain @tsdeng
    
    Author: julien <julien@twitter.com>
    
    Closes #16 from julienledem/data_encodings and squashes the following commits:
    
    3a60c6c [julien] typo
    46f7b7a [julien] update to stats based on feedback
    6474f58 [julien] Merge branch 'master' into data_encodings
    3529ccf [julien] make data_encodings optional
    709dd7c [julien] add data_encodings to ColumnMetaData to enable dictionary based predicate push down
    julienledem committed Oct 30, 2014
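
A minimal sketch of the check that the PARQUET-119 message describes, written in Python for brevity. The `ColumnChunkInfo` class, the `can_skip_chunk` helper, and the string encoding names are hypothetical stand-ins for illustration; only the field name `data_encodings` comes from the commit title, and its real shape in parquet.thrift may differ.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional

# Assumption: encodings are modelled as plain strings for this sketch.
DICTIONARY_ENCODINGS = frozenset({"PLAIN_DICTIONARY"})


@dataclass(frozen=True)
class ColumnChunkInfo:
    """Hypothetical, simplified view of one column chunk's metadata."""
    # Encodings used by the data pages; None when the writer did not record them.
    data_encodings: Optional[FrozenSet[str]]
    # Distinct values stored in the chunk's dictionary page.
    dictionary: FrozenSet[str]


def can_skip_chunk(chunk: ColumnChunkInfo, wanted: str) -> bool:
    """Return True only when the dictionary proves `wanted` is absent."""
    if not chunk.data_encodings:
        # Unknown or empty: we cannot tell whether fallback happened.
        return False
    if not chunk.data_encodings <= DICTIONARY_ENCODINGS:
        # At least one data page is not dictionary encoded (fallback),
        # so the dictionary does not list every value in the chunk.
        return False
    return wanted not in chunk.dictionary
```

The helper errs on the side of reading the chunk: it reports that a chunk can be skipped only when every data page is dictionary encoded and the value is missing from the dictionary.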
Commits on Oct 24, 2014
  1. PARQUET-24: port changes from parquet_mr

    Author: julien <julien@twitter.com>
    
    Closes #14 from julienledem/PARQUET_24 and squashes the following commits:
    
    a0efb8f [julien] port changes from parquet_mr
    julienledem committed Oct 24, 2014
Commits on Oct 1, 2014
  1. PARQUET-109: Update NOTICE, add binary LICENSE.

    This makes a minor change to NOTICE to match the maven-generated project
    name (Apache Parquet => Apache Parquet Format (Incubating)). There
    should be no more additions to NOTICE needed. The Thrift NOTICE has no
    additions beyond the standard Apache boilerplate and SLF4J has no NOTICE
    file. **Please double-check this**
    
    This also adds a LICENSE file in the resources that is included in the
    binary distribution. This LICENSE represents the contents of the binary
    distribution and includes the QOS.ch BSD license for SLF4J. There are no
    other licenses required. Although the LICENSE.txt in the thrift binary
    that is shaded includes other licenses, the files that they apply to are
    not included in either source or binary form because the shaded jar
    contains only the Thrift Java library. **Please double-check this**
    
    This also removes the Thrift NOTICE.txt and LICENSE.txt files from the
    binary distribution. The parquet-format NOTICE and LICENSE files
    represent the contents of the jar.
    
    Author: Ryan Blue <rblue@cloudera.com>
    
    Closes #15 from rdblue/PARQUET-109-licensing-fixes and squashes the following commits:
    
    3b34771 [Ryan Blue] PARQUET-109: Update NOTICE, add binary LICENSE.
    rdblue committed with julienledem Oct 1, 2014
Commits on Sep 16, 2014
  1. PARQUET-72: Fix NOTICE

    Author: Ryan Blue <rblue@cloudera.com>
    
    Closes #13 from rdblue/PARQUET-72-fix-notice and squashes the following commits:
    
    92d25c2 [Ryan Blue] PARQUET-72: Remove unnecessary entries in NOTICE.
    rdblue committed with julienledem Sep 16, 2014
Commits on Sep 10, 2014
  1. PARQUET-72: Add PGP keys to KEYS file.

    Author: Ryan Blue <rblue@cloudera.com>
    
    Closes #12 from rdblue/add-pgp-keys and squashes the following commits:
    
    2683b4e [Ryan Blue] PARQUET-72: Add PGP keys to KEYS file.
    rdblue committed with julienledem Sep 10, 2014
Commits on Sep 3, 2014
  1. PARQUET-72: Update POM to use ASF parent.

    Adds the org.apache:apache POM as the parent, which sets up the standard
    Apache release configuration, including repositories,
    distributionManagement, and a release profile. The existing Sonatype
    release info is still present and the maven-release-plugin is configured
    to use the Sonatype settings. To move to Apache, remove the
    maven-release-plugin configuration and the Sonatype profile,
    distributionManagement, and repositories.
    
    Removes settings that are supplied by the parent, like encoding
    properties and plugin versions. Overrides parent settings for compiler
    source and target versions. Updates scm links.
    
    Adds info for mailing lists and issue tracker.
    
    Author: Ryan Blue <rblue@cloudera.com>
    
    Closes #11 from rdblue/PARQUET-72-update-maven-for-release and squashes the following commits:
    
    b8b8048 [Ryan Blue] PARQUET-72: Update POM to use ASF parent.
    rdblue committed with julienledem Sep 3, 2014
  2. PARQUET-85: add license headers

    Author: julien <julien@twitter.com>
    
    Closes #10 from julienledem/fix_headers and squashes the following commits:
    
    e6922a0 [julien] add license headers
    julienledem committed with rdblue Sep 3, 2014
  3. PARQUET-72: Prepare for Apache release

    Add license headers and other documentation required by the ASF.
    
    This doesn't update the maven release configuration.
    
    Author: Ryan Blue <rblue@cloudera.com>
    
    Closes #6 from rdblue/PARQUET-72-prepare-apache-release and squashes the following commits:
    
    e48a607 [Ryan Blue] Adding NOTICE, DISCLAIMER, and KEYS.
    3d2ca06 [Ryan Blue] Add license headers and enable apache-rat-plugin.
    rdblue committed with julienledem Sep 3, 2014
Commits on Aug 30, 2014
  1. PARQUET-79: add a streaming Thrift API, to enable processing the metadata as we read it and skipping unnecessary fields.
    
    This pull request provides an API to read thrift in a streaming fashion.
    This enables ignoring fields that are not needed without loading them into memory.
    It also allows processing the data as it arrives instead of waiting until it is fully loaded in memory.
    
    Author: julien <julien@twitter.com>
    
    Closes #8 from julienledem/streaming_metadata and squashes the following commits:
    
    621769a [julien] cleanup refactoring
    a58913d [julien] rename add to consume
    e5c78fc [julien] #simplify
    cb386ce [julien] RIP TypedConsumerProvider, @tsdeng did not like you
    8dd801e [julien] Merge branch 'master' into streaming_metadata
    958726f [julien] javadoc; fix apis
    9be786a [julien] added simple readMetaData method
    bee937a [julien] refactor, cleanup
    6368bdc [julien] streaming thrift reader
    71c85de [julien] first stab
    julienledem committed with Dmitriy Ryaboy Aug 30, 2014
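
The idea behind PARQUET-79 can be illustrated with a small Python sketch. This is not the API added by the commit: `read_fields`, `fake_footer`, and the field names are hypothetical, and a generator of (name, value) pairs stands in for the Thrift protocol. A real streaming reader would skip unwanted fields on the wire without deserializing them, which this stand-in cannot show.

```python
from typing import Any, Callable, Dict, Iterable, Iterator, Tuple

Field = Tuple[str, Any]


def read_fields(fields: Iterable[Field],
                consumers: Dict[str, Callable[[Any], None]]) -> int:
    """Hand each field to its consumer as it arrives; count skipped fields."""
    skipped = 0
    for name, value in fields:
        consumer = consumers.get(name)
        if consumer is None:
            # No consumer registered: drop the field without keeping a reference.
            skipped += 1
            continue
        consumer(value)
    return skipped


def fake_footer() -> Iterator[Field]:
    """Hypothetical stream of footer fields, yielded one at a time."""
    yield "version", 1
    yield "num_rows", 1000
    yield "key_value_metadata", {"note": "not needed by this reader"}
    yield "created_by", "example writer"


seen: Dict[str, Any] = {}
skipped = read_fields(fake_footer(),
                      {"num_rows": lambda v: seen.update(num_rows=v)})
```

Here only `num_rows` is consumed; the other three fields pass through without being stored, so the caller never holds the whole structure in memory at once.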
Commits on Aug 28, 2014
  1. Typo

    s/metdata/metadata
    
    Author: Bruno P. Kinoshita <kinow@users.noreply.github.com>
    
    Closes #9 from kinow/patch-1 and squashes the following commits:
    
    266de2d [Bruno P. Kinoshita] Typo
    kinow committed with Dmitriy Ryaboy Aug 28, 2014
Commits on Aug 22, 2014
  1. PARQUET-11: Reduce memory pressure when reading footers

    based on https://github.com/apache/incubator-parquet-format/pull/2
    
    Author: julien <julien@twitter.com>
    Author: Dmitriy Ryaboy <dvryaboy@gmail.com>
    
    Closes #7 from julienledem/reduce_metadata_memory and squashes the following commits:
    
    96ff408 [julien] Merge branch 'master' into reduce_metadata_memory
    1c382cc [julien] implement delegate instead
    7664919 [Dmitriy Ryaboy] intern parquet metadata strings when reading them
    julienledem committed with tsdeng Aug 22, 2014
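
As a rough illustration of the "intern parquet metadata strings" commit listed under PARQUET-11, the Python sketch below uses `sys.intern` so that equal strings share one object. The helper `intern_paths` and the sample column names are hypothetical, and Python's interning is only a stand-in for whatever mechanism the Java change uses.

```python
import sys
from typing import List


def intern_paths(paths: List[str]) -> List[str]:
    """Replace each string with its interned copy so duplicates share storage."""
    return [sys.intern(p) for p in paths]


# Build strings at run time so that equal values start out as separate objects,
# as they would when the same column name is read from many footers.
raw = ["".join(["col_", str(i % 2)]) for i in range(6)]
shared = intern_paths(raw)

# Count the distinct string objects left after interning.
distinct_objects = len({id(s) for s in shared})
```

Six strings with two distinct values end up as two objects, which is the kind of saving that matters when the same column paths repeat across many row groups and files.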
Commits on Aug 7, 2014
  1. Add anchor links

    I'm starting work on a Haskell implementation of the Parquet format and wanted to get up to speed on the various encodings used.
    
    I find that the current Encodings.md document is quite confusing.
    
    This is a first pass at some clean-up.
    
    There were multiple references in the text that made no sense, as they were references to sections which appear to have been re-ordered in the document.
    
    I've rewritten those references as anchor links within the document to avoid future issues.
    
    Author: Chris Heller <hellertime@gmail.com>
    Author: Chris Heller <cheller@akamai.com>
    
    Closes #1 from hellertime/feature/improve-encoding-doc and squashes the following commits:
    
    c2e5f54 [Chris Heller] Link the delta encoding section
    c1e85d2 [Chris Heller] Add anchor links
    hellertime committed with julienledem Aug 7, 2014
  2. PARQUET-12: Add specs for new logical types.

    This adds the new logical types from #3 to the LogicalTypes.md specification.
    
    Author: Ryan Blue <rblue@cloudera.com>
    
    Closes #5 from rdblue/PARQUET-12-add-new-type-docs and squashes the following commits:
    
    be414fe [Ryan Blue] PARQUET-12: Add specs for new logical types.
    rdblue committed Aug 7, 2014
Commits on Aug 5, 2014
  1. PARQUET-58: Add PR merge tool

    This is a copy of the merge tool from incubator-parquet-mr. I've used it to merge a PR already, so it works.
    
    Author: Ryan Blue <rblue@cloudera.com>
    
    Closes #4 from rdblue/PARQUET-58-add-merge-tool and squashes the following commits:
    
    b06c32c [Ryan Blue] PARQUET-58: Add PR merge tool and docs.
    rdblue committed with julienledem Aug 5, 2014
Commits on Jul 30, 2014
  1. PARQUET-12: Add format support for additional converted types.

    Author: Jacques Nadeau <jacques@apache.org>
    
    Closes #3 from jacques-n/PARQUET-12 and squashes the following commits:
    
    7001502 [Jacques Nadeau] Remove micros implementations until everyone is agreed on micros versus nanos.
    b0e067c [Jacques Nadeau] PARQUET-12: Add format support for additional converted types.
    jacques-n committed with rdblue Jul 30, 2014
Commits on May 27, 2014
  1. Merge pull request #85 from Parquet/field_id

    add field_id in SchemaElement
    julienledem committed May 27, 2014
  2. fix id

    julienledem committed May 27, 2014
  3. Merge branch 'master' into field_id

    Conflicts:
    	src/thrift/parquet.thrift
    julienledem committed May 27, 2014
Commits on May 22, 2014
  1. Merge pull request #96 from tobym/master

    Fix typo in README
    dvryaboy committed May 22, 2014
  2. Fix typo in README

    tobym committed May 22, 2014
Commits on May 8, 2014
  1. Merge pull request #95 from Parquet/add_changelog

    add changelog
    julienledem committed May 8, 2014
  2. add changelog

    julienledem committed May 8, 2014
Commits on Apr 15, 2014
  1. Merge pull request #84 from Parquet/decimal_metadata

    Add metadata in the schema for storing decimals.
    nongli committed Apr 15, 2014
Commits on Mar 28, 2014
  1. Merge pull request #89 from egonina/stats_page_header

    Added statistics to the data page header
    nongli committed Mar 28, 2014
Commits on Mar 25, 2014
  1. Merge pull request #86 from dswang/master

    Fix minor formatting, correct some wording under the "Error recovery" se...
    nongli committed Mar 25, 2014
Commits on Feb 10, 2014
  1. Merge pull request #82 from Parquet/exclude_thrift_source_from_jar

    exclude thrift source from jar
    nongli committed Feb 10, 2014