Commits on Jan 11, 2017
  1. ARROW-421: [Python] Retain parent reference in PyBytesReader

    Pass Buffer to BufferReader so that zero-copy slices retain reference to PyBytesBuffer, which prevents the bytes object from being garbage collected prematurely. Also added some helper tools for inspecting Arrow Buffer objects in Python.
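The ownership idea in this commit can be illustrated with a minimal pure-Python sketch. It does not use pyarrow: `BytesOwner`, `SliceView`, and `Reader` are hypothetical stand-ins for PyBytesBuffer, a zero-copy slice, and the reader. A Python `memoryview` already keeps its bytes alive on its own, so the explicit `parent` attribute is there only to mirror what a C++ slice holding a raw pointer would need.

```python
import gc
import weakref


class BytesOwner:
    """Hypothetical stand-in for PyBytesBuffer: owns the underlying bytes."""

    def __init__(self, data: bytes):
        self.data = data


class SliceView:
    """Hypothetical zero-copy slice that keeps its parent alive."""

    def __init__(self, parent: BytesOwner, offset: int, length: int):
        # Strong reference to the parent: this is the point of the sketch.
        self.parent = parent
        # memoryview slicing does not copy the bytes.
        self.view = memoryview(parent.data)[offset:offset + length]

    def tobytes(self) -> bytes:
        return self.view.tobytes()


class Reader:
    """Hypothetical reader that hands out slices of a single owner."""

    def __init__(self, owner: BytesOwner):
        self.owner = owner

    def read(self, offset: int, length: int) -> SliceView:
        return SliceView(self.owner, offset, length)


owner = BytesOwner(b"hello arrow")
owner_ref = weakref.ref(owner)
reader = Reader(owner)
piece = reader.read(6, 5)

# Drop every reference except the one held by the slice.
del reader, owner
gc.collect()

owner_alive = owner_ref() is not None
piece_bytes = piece.tobytes()
```

In this sketch the owner survives only because the slice still refers to it; once `piece` is dropped, nothing keeps the owner alive.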
    
    Close #278
    
    Author: Wes McKinney <wes.mckinney@twosigma.com>
    
    Closes #279 from wesm/ARROW-421 and squashes the following commits:
    
    acf730e [Wes McKinney] Rename method
    50c195a [Wes McKinney] Fix accidental typo
    ef20185 [Wes McKinney] Pass Buffer to BufferReader so that zero-copy slices retain reference to PyBytesBuffer, which prevents the bytes object from being garbage collected prematurely
    wesm committed Jan 11, 2017
Commits on Jan 10, 2017
  1. ARROW-442: [Python] Inspect Parquet file metadata from Python

    I also made the Cython parquet extension "private" so that higher-level logic (e.g. the upcoming handling of multiple files) can live in pure Python, which doesn't need to be compiled.
    
    Requires PARQUET-828 for the test suite to pass.
    
    Author: Wes McKinney <wes.mckinney@twosigma.com>
    
    Closes #275 from wesm/ARROW-442 and squashes the following commits:
    
    a4255a2 [Wes McKinney] Add row group metadata accessor, add smoke tests
    75a11cf [Wes McKinney] Add more metadata accessor scaffolding, to be tested
    e59ca40 [Wes McKinney] Move parquet Cython wrapper to a private import, add parquet.py for high level logic
    wesm committed with xhochy Jan 10, 2017
Commits on Jan 8, 2017
  1. ARROW-438: [C++/Python] Implement zero-data-copy record batch and table concatenation.
    
    This also fixes a bug in ChunkedArray::Equals. This is caught by the Python test suite but would benefit from more C++ unit tests.
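A rough sketch of what zero-data-copy concatenation means for chunked columns, in plain Python rather than the Arrow API. `ChunkedColumn` and `concat_columns` are hypothetical names. The commit message does not say what the `ChunkedArray::Equals` bug was, so the `equals` method below only illustrates why equality on chunked data has to ignore chunk boundaries; it is not a reconstruction of the fix.

```python
from itertools import chain


class ChunkedColumn:
    """Hypothetical column stored as a list of chunks (plain lists here)."""

    def __init__(self, chunks):
        self.chunks = list(chunks)

    def __len__(self):
        return sum(len(chunk) for chunk in self.chunks)

    def equals(self, other):
        # Compare logical values, regardless of how they are chunked.
        if len(self) != len(other):
            return False
        left = chain.from_iterable(self.chunks)
        right = chain.from_iterable(other.chunks)
        return all(a == b for a, b in zip(left, right))


def concat_columns(columns):
    """Concatenate by collecting chunk references; no values are copied."""
    merged = []
    for column in columns:
        merged.extend(column.chunks)
    return ChunkedColumn(merged)


first = [1, 2, 3]
second = [4, 5]
combined = concat_columns([ChunkedColumn([first]), ChunkedColumn([second])])

# The result refers to the very same chunk objects as its inputs.
shares_chunks = combined.chunks[0] is first and combined.chunks[1] is second

# Same logical values, different chunk layout.
same_values = combined.equals(ChunkedColumn([[1, 2], [3, 4, 5]]))
```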
    
    Author: Wes McKinney <wes.mckinney@twosigma.com>
    
    Closes #274 from wesm/ARROW-438 and squashes the following commits:
    
    1f39568 [Wes McKinney] py3 compatibility
    2e76c5e [Wes McKinney] Implement arrow::ConcatenateTables and Python wrapper. Fix bug in ChunkedArray::Equals
    f3cb170 [Wes McKinney] Fix Cython compilation, verify pyarrow.Table.from_batches still works
    af28755 [Wes McKinney] Implement Table::FromRecordBatches
    
    Change-Id: I948b61d848c178edefad63465a74d9c303ad1f18
    wesm committed Jan 8, 2017
Commits on Jan 6, 2017
  1. ARROW-427: [C++] Implement dictionary array type

    After some thought, I decided it makes sense to store the reference to the dictionary values themselves in the data type object, similar to `CategoricalDtype` in pandas. This will be at least adequate for the Feather file format merge.
    
    In the IPC metadata, there is no explicit dictionary type -- an array can be dictionary encoded or not. On JIRA we've discussed adding a dictionary type flag indicating whether or not the dictionary values/categories are ordered (also called "ordinal") or unordered (also called "nominal"). That hasn't been done yet.
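The design described above, where the type object carries the dictionary values and the array carries only indices, might look roughly like this in plain Python. `DictType` and `DictArray` are hypothetical names, not the C++ classes, and nulls are left out to keep the sketch short.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class DictType:
    """Hypothetical type object: index type plus the dictionary values."""
    index_type: str
    dictionary: Tuple[str, ...]


@dataclass
class DictArray:
    """Hypothetical array: stores only integer indices into the type's dictionary."""
    dtype: DictType
    indices: List[int]

    def decode(self):
        # Materialize the logical values by looking each index up.
        return [self.dtype.dictionary[i] for i in self.indices]

    def validate(self):
        # Rudimentary check: every index must fall inside the dictionary.
        return all(0 <= i < len(self.dtype.dictionary) for i in self.indices)


colors = DictType(index_type="int8", dictionary=("red", "green", "blue"))
arr = DictArray(dtype=colors, indices=[2, 0, 0, 1])

decoded = arr.decode()
is_valid = arr.validate()
```

Two arrays that share the same `DictType` instance share one set of dictionary values, which is the property the pandas comparison points at.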
    
    Author: Wes McKinney <wes.mckinney@twosigma.com>
    
    Closes #268 from wesm/ARROW-427 and squashes the following commits:
    
    5ce3701 [Wes McKinney] cpplint
    a6c2896 [Wes McKinney] Revert T::Equals(const T& other) to EqualsExact to appease clang
    9a4edb5 [Wes McKinney] Implement rudimentary DictionaryArray::Validate
    9efe46b [Wes McKinney] Add tests, implementation for DictionaryArray::Equals and RangeEquals
    b06eb86 [Wes McKinney] Implement PrettyPrint for DictionaryArray
    17c70de [Wes McKinney] Refactor, compose shared_ptr<DataType> in DictionaryType
    b52b3a7 [Wes McKinney] Add rudimentary DictionaryType and DictionaryArray implementation for discussion
    wesm committed Jan 6, 2017
Commits on Jan 5, 2017
  1. ARROW-455: [C++] Add dtor to BufferOutputStream that calls Close()

    Since `Close()` can technically fail, it's better to call it yourself (and it's idempotent), but this will help avoid a common class of bugs in small-scale use cases.
    
    An alternative here is that we could remove all `Close()` calls from all destructors and possibly add a `DCHECK(!is_open_)` to the base dtor to force the user to close handles. The downside of this is that it makes RAII more difficult, so I'd prefer to leave the close-in-dtor even though it can fail in unusual scenarios.
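The trade-off can be sketched with a Python analogue, with the caveat that Python finalizers are less deterministic than C++ destructors. `SketchOutputStream` is a hypothetical class, not the Arrow API. It shows an idempotent `close()` that is safe to call explicitly and that also runs as a fallback when the object is destroyed.

```python
import gc


class SketchOutputStream:
    """Hypothetical in-memory output stream with close-on-destruction."""

    def __init__(self, sink):
        self._sink = sink          # list that receives the finished bytes
        self._parts = []
        self._is_open = True

    def write(self, data: bytes):
        if not self._is_open:
            raise ValueError("stream is closed")
        self._parts.append(data)

    def close(self):
        # Idempotent: a second call is a no-op.
        if not self._is_open:
            return
        self._is_open = False
        self._sink.append(b"".join(self._parts))

    def __del__(self):
        # Fallback for callers that forget to close; any failure here
        # cannot be reported to the caller, which is why closing
        # explicitly is still preferable.
        self.close()


sink = []

# Explicit close, called twice to show idempotence.
stream = SketchOutputStream(sink)
stream.write(b"abc")
stream.close()
stream.close()
after_explicit = list(sink)

# Forgotten close: the finalizer flushes the data instead.
forgotten = SketchOutputStream(sink)
forgotten.write(b"xyz")
del forgotten
gc.collect()
```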
    
    Author: Wes McKinney <wes.mckinney@twosigma.com>
    
    Closes #269 from wesm/ARROW-455 and squashes the following commits:
    
    821ee22 [Wes McKinney] Add dtor to BufferOutputStream that calls Close()
    wesm committed Jan 5, 2017
Commits on Jan 3, 2017
  1. ARROW-387: [C++] Verify zero-copy Buffer slices from BufferReader retain reference to parent Buffer
    
    This is stacked on top of the patch for ARROW-294, will rebase.
    
    Author: Wes McKinney <wes.mckinney@twosigma.com>
    
    Closes #266 from wesm/ARROW-387 and squashes the following commits:
    
    061ef8b [Wes McKinney] Verify BufferReader passes on ownership of parent buffer to zero-copy slices
    42a83a4 [Wes McKinney] Remove duplicated includes
    3928ab0 [Wes McKinney] Base MemoryMappedFile implementation on common OSFile interface. Add test case for ARROW-340.
    wesm committed with xhochy Jan 3, 2017
  2. ARROW-294: [C++] Do not use platform-dependent fopen/fclose functions for MemoryMappedFile
    
    Also adds a test case for ARROW-340.
    
    Author: Wes McKinney <wes.mckinney@twosigma.com>
    
    Closes #265 from wesm/ARROW-294 and squashes the following commits:
    
    42a83a4 [Wes McKinney] Remove duplicated includes
    3928ab0 [Wes McKinney] Base MemoryMappedFile implementation on common OSFile interface. Add test case for ARROW-340.
    wesm committed with xhochy Jan 3, 2017
  3. ARROW-108: [C++] Add Union implementation and IPC/JSON serialization tests
    
    Closes #206.
    
    Test cases for JSON read/write and dense union IPC still need to be added. Integration tests can happen in a subsequent PR, but the Java library does not support dense unions yet, so they would cover sparse unions only (i.e. no offsets vector).
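The difference between the two union modes mentioned above comes down to the offsets vector. The following plain-Python sketch uses hypothetical helper names and lets each type id index the child list directly, which is a simplification. In a sparse union every child has the same length as the union; in a dense union each child holds only its own values and an offsets vector says where to look.

```python
def sparse_union_get(type_ids, children, i):
    """Sparse: slot i of the union lives at slot i of the selected child."""
    return children[type_ids[i]][i]


def dense_union_get(type_ids, offsets, children, i):
    """Dense: slot i of the union lives at offsets[i] of the selected child."""
    return children[type_ids[i]][offsets[i]]


# Logical values of the union: [1, "a", 2, "b"]
type_ids = [0, 1, 0, 1]

# Sparse layout: both children are full length, with unused slots.
sparse_children = [
    [1, None, 2, None],
    [None, "a", None, "b"],
]

# Dense layout: children are compact, offsets point into them.
dense_children = [
    [1, 2],
    ["a", "b"],
]
dense_offsets = [0, 0, 1, 1]

sparse_values = [sparse_union_get(type_ids, sparse_children, i) for i in range(4)]
dense_values = [
    dense_union_get(type_ids, dense_offsets, dense_children, i) for i in range(4)
]
```

Both layouts decode to the same logical values; the dense form trades the extra offsets vector for children without unused slots.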
    
    Author: Wes McKinney <wes.mckinney@twosigma.com>
    
    Closes #264 from wesm/ARROW-108 and squashes the following commits:
    
    86c4191 [Wes McKinney] Fix valgrind error
    cdfc61d [Wes McKinney] Export UnionArray
    3edca1e [Wes McKinney] Implement basic JSON roundtrip for unions
    30b7188 [Wes McKinney] Add test case for dense union, implement RangeEquals for it
    4887fd2 [Wes McKinney] Move Windows stuff into a compatibility header, exclude from clang-format because of include order sensitivity
    5ca9c57 [Wes McKinney] Implement IPC/JSON serialization for unions. Test UnionMode::SPARSE example in IPC
    wesm committed with xhochy Jan 3, 2017