Skip to content

v2.0.0

Latest
Compare
Choose a tag to compare
@dongjoon-hyun dongjoon-hyun released this 08 Mar 21:20
· 140 commits to main since this release
46eb6ff

Milestone

Branch

This is a new major release which we cannot provide a changelog.

Summary of notable changes

ORC-1547: Spin-off ORC Format
ORC-1572: Use Apache ORC Format 1.0.0
ORC-1507: Support Java 21
ORC-1512: Drop Java 8/11 and make Java 17 by default
ORC-1577: Use ZSTD as the default compression
ORC-1430: Use Hadoop 3.3.5 shaded clients
ORC-1456: Update Hadoop to 3.3.6
ORC-1251: Use Hadoop Vectored IO
ORC-1463: Support brotli codec
ORC-1100: Support vcpkg
ORC-1620: Add Apple Silicon Test Coverage

New Feature

ORC-998: Refactor compression output buffer within OutStream for better portability
ORC-1088: Suport ZSTD_JNI and columnn compress to set compression level
ORC-1100: Support vcpkg
ORC-1251: Use Hadoop Vectored IO
ORC-1387: [C++] Support schema evolution from decimal to numeric/decimal
ORC-1440: Check for protobuf config based module
ORC-1463: Support brotli codec
ORC-1507: Use Zulu JDK distribution and switch from 21-ea to 21
ORC-1512: Drop Java 8/11 and make Java 17 by default
ORC-1531: Create orc-format module and repo
ORC-1545: Use orc-format 1.0.0-SNAPSHOT
ORC-1546: Use orc-format 1.0.0-alpha
ORC-1547: Spin-off ORC Format
ORC-1551: Use orc-format 1.0.0-beta
ORC-1572: Use Apache ORC Format 1.0.0
ORC-1585: [C++] Add orc-format_ep as a dependency of orc

Improvement

ORC-1459: Mark DataBuffer::size() and DataBuffer::capacity() as const
ORC-1460: specification: Clarify how dictionary entries are sorted
ORC-1461: Mark Int128::getHighBits() and Int128::getLowBits() as const
ORC-1472: Replace deprecated method in TestMurmur3.java
ORC-1479: Enhance example usage message to use Uber jar
ORC-1481: [C++] Better error message when TZDB is unavailable
ORC-1504: Add lower bound check in get API for DynamicIntArray
ORC-1506: Replacing deprecated valueOf() with recommended forNumber()
ORC-1509: Auto grant contributor role to first-time contributors
ORC-1520: Remove JDK 8 settings from pom
ORC-1567: Add the -ignoreExtension configuration to the sizes and count commands of orc-tools
ORC-1570: Add supportVectoredIO API to HadoopShimsCurrent and use it
ORC-1571: Supports displaying raw data size in the meta command of orc-tools
ORC-1577: Use ZSTD as the default compression
ORC-1580: Change default DataBuffer constructor to use reserve instead of resize
ORC-1595: Add a short-cut to skip tiny inputs for ZstdCodec.compress
ORC-1596: Remove redundant Zstd.isError JNI usage
ORC-1597: Set bloom filter fpp to 1%
ORC-1600: Reduce getStaticMemoryManager sync block in OrcFile
ORC-1601: Reduce get HadoopShims sync block in HadoopShimsFactory
ORC-1610: Reduce the number of hash computation in CuckooSetBytes
ORC-1613: Zstd decompression supports direct buffer
ORC-1631: Supports summary output in sizes command
ORC-1637: [C++] Port conan recipe from upstream conan center
ORC-1638: Avoid System.exit(0) in count command
ORC-1639: [C++] Reduce unnecessary compiler flags in CMake
ORC-1641: Remove sourceFileExcludes from maven-javadoc-plugin
ORC-1642: Avoid System.exit(0) in scan command
ORC-1593: Set orc.compression.zstd.level to 3 by default

Bug Fix

ORC-634: Fix the json output for double NaN and infinite
ORC-1455: [C++] Fix build failure on non-x86 with unused macro in CpuInfoUtil.cc
ORC-1473: Zero-copy zeroCopyReadRanges and releaseBuffer bugs
ORC-1476: Maven build fail with unsupported platform: protoc-3.17.3-osx-aarch_64.exe
ORC-1480: [C++] Build failed when the BUILD_CPP_ENABLE_METRICS is ON
ORC-1500: [C++] The partition field does not support English special characters
ORC-1528: When using the orc.min.disk.seek.size configuration to read extremely large ORC files, a java.nio.BufferOverflowException may occur.
ORC-1553: Reading information from Row group, where there are 0 records of SArg column
ORC-1563: Fix orc.bloom.filter.fpp default value and orc.compress notes of Spark and Hive config docs
ORC-1568: Use readDiskRanges if orc.use.zerocopy is enabled
ORC-1575: Use ASF Archive URL instead Download URL
ORC-1578: Fix SparkBenchmark according to SPARK-40918
ORC-1588: Fix incorrect Decimal assert in LeafFilterFactory
ORC-1602: [C++] limit compression block size

Task

ORC-1422: Setting version to 2.0.0-SNAPSHOT
ORC-1434: Remove org.apache.hadoop from dependabot.yml
ORC-1484: Use JIRA_ACCESS_TOKEN in merge_orc_pr.py
ORC-1485: Enable checkstyle checks for test classes
ORC-1486: Fix checkstyle violations for tests in orc-core module
ORC-1492: Fix checkstyle violations for tests in mapreduce, tools, bench modules
ORC-1496: Use iterator to suggest backporting branches
ORC-1515: Skip publishing orc-example module
ORC-1516: Fix minor typo in comments in IOUtils
ORC-1518: Remove findbugs folders
ORC-1529: Fix minor typos in pom.xml
ORC-1530: Rename variables in RecordReaderUtils.ChunkReader#create
ORC-1535: Remove generated Java docs from source tree
ORC-1536: Remove hive-storage-api link from maven-javadoc-plugin
ORC-1540: Remove MacOS 11 from GitHub Action CI
ORC-1542: Use Pattern Matching for instanceof (JEP-394)
ORC-1549: Update libhdfspp.tar.gz by adding #include <cstdint>
ORC-1569: Remove HadoopShimsPre2_3, HadoopShimsPre2_6, HadoopShimsPre2_7 classes
ORC-1579: Add ASF Generative Tooling Guidance to PR template
ORC-1591: Lower log level from INFO to DEBUG in *ReaderImpl/WriterImpl/PhysicalFsWriter
ORC-1592: Suppress KeyProvider missing log
ORC-1598: Close reader in orc-examples
ORC-1604: Deprecate non-utf8 bloom filter for Java writer

Test

ORC-1003: Recover java-examples-test
ORC-1409: Add stream order description in ORC spec.
ORC-1432: Add MacOS 13 GitHub Action Job
ORC-1474: Replaced deprecated getMinimum/Maximum in TestColumnStatistics
ORC-1475: [C++] ConvertColumnReader.TestConvertNumericToStringVariant fails when compiled with unsigned char
ORC-1477: Remove unused imports from Test classes
ORC-1478: Add Unit Test for org.apache.orc.impl.DynamicIntArray
ORC-1510: Fix package for TestOrcUtils and add more test cases
ORC-1541: Add Ubuntu 24.04 LTS Docker Test
ORC-1555: Simplify fedora37 docker image
ORC-1556: Add Rocky Linux 9 Docker Test
ORC-1557: Add GitHub Action CI for Docker Test
ORC-1558: Remove ubuntu22_jdk=21 and ubuntu22_jdk=21_cc=clang test combinations from docker/os-list.txt
ORC-1574: Update GitHub Action YAML files in branch-2.0
ORC-1586: Fix IllegalAccessError when SparkBenchmark runs on JDK17
ORC-1607: Fix testDoubleNaNAndInfinite to use TestFileDump.checkOutput
ORC-1614: Set ByteBuffer limit in TestBrotli test
ORC-1618: Disable building tests for snappy
ORC-1619: Add MacOS 14 to GitHub Action
ORC-1620: Add Apple Silicon Test Coverage
ORC-1621: Switch to oraclelinux9 from rocky9
ORC-1623: Use directOut.put(out) instead of directOut.put(out.array()) in TestZstd test
ORC-1630: Test using VectoredIO of hadoop to read ORC
ORC-1632: Add test for count command
ORC-1633: Add test for sizes command
ORC-1643: Add test for scan command

Build and Dependency Changes

ORC-870: Unpin and upgrade jmh to 1.37
ORC-1423: Bump build-helper-maven-plugin to 3.4.0
ORC-1424: Bump maven-assembly-plugin to 3.6.0
ORC-1425: Bump checkstyle to 10.11.0
ORC-1427: Use Hadoop 3.3.5 in tools module
ORC-1429: Upgrade Maven to 3.8.8
ORC-1430: Use Hadoop 3.3.5 shaded clients
ORC-1431: Use parquet to 1.13.1 in bench module
ORC-1437: Bump checkstyle to 10.12.0
ORC-1438: Bump auto-service to 1.1.0
ORC-1439: Bump guava to 32.0.0-jre
ORC-1442: Update guava to 32.0.1
ORC-1445: Bump snappy-java to 1.1.10.1 in bench module
ORC-1448: Bump auto-service to 1.1.1
ORC-1456: Update Hadoop to 3.3.6
ORC-1466: Bump junit to 5.10.0
ORC-1467: Upgrade commons-lang3 to 3.13.0
ORC-1468: Bump opencsv to 5.8
ORC-1469: Update guava to 32.1.2
ORC-1470: Update maven-shade-plugin to 3.5.0
ORC-1493: Bump byte-buddy to 1.14.6
ORC-1502: Upgrade Maven to 3.9.4
ORC-1508: Upgrade slf4j to 2.0.9
ORC-1513: Upgrade snappy to 1.1.10.4
ORC-1514: Remove zookeeper runtime dependency
ORC-1517: Bump snappy-java to 1.1.10.5 in bench module
ORC-1521: Bump com.google.guava:guava to 32.1.3-jre
ORC-1522: Bump commons-cli:commons-cli to 1.6.0
ORC-1523: Bump maven-checkstyle-plugin to 3.3.1
ORC-1524: Bump maven-shade-plugin to 3.5.1
ORC-1526: Bump spotbugs-maven-plugin to 4.8.1.0
ORC-1527: Bump junit to 5.10.1
ORC-1533: Upgrade commons-lang3 to 3.14.0
ORC-1534: Upgrade build-helper-maven-plugin to 3.5.0
ORC-1537: Unpin and upgrade spotless to 2.41.0
ORC-1538: Unpin and upgrade maven-dependency-plugin to 3.6.1
ORC-1543: Bump spotless-maven-plugin to 2.41.1
ORC-1544: Unpin and upgrade protobuf-java to 3.25.1
ORC-1550: Upgrade Maven to 3.9.6
ORC-1562: Bump com.google.guava:guava to 33.0.0-jre
ORC-1565: Bump slf4j.version to 2.0.10
ORC-1566: Make Brotli dependency as optional
ORC-1576: Upgrade spark.jackson.version to 2.15.2 in bench module
ORC-1581: Bump slf4j.version to 2.0.11
ORC-1582: Bump protobuf-java to 3.25.2
ORC-1605: Upgrade brotli4j to 1.16.0
ORC-1616: Upgrade aircompressor to 0.26
ORC-1624: Upgrade Spark to 3.5.1
ORC-1626: Upgrade Mockito to 5.10 and byte-buddy to 1.14.11
ORC-1627: Unpin scala-library
ORC-1628: Bump protobuf-java to 3.25.3

Documentation

ORC-994: Fix javadoc so that it doesn't put files into the source tree
ORC-1471: Updated README.md to use maven 3.8.8
ORC-1491: Update Python documentation with PyArrow 13.0.0 and Dask 2023.8.1
ORC-1503: Update README.md to use maven 3.9.4
ORC-1552: Update README.md to use maven 3.9.6
ORC-1564: Add Java ORC configuration documentation
ORC-1584: Remove README about Proto subdirectory
ORC-1587: Fix usage command of SparkBenchmark document
ORC-1599: Add zstd compression level and windowlog in Java configuration documentation
ORC-1612: Document available encodings at orc.compress
ORC-1625: Switch to oraclelinux9 from rocky9 in README