Releases: apache/amoro

v0.7.0-incubating

02 Aug 04:06

Highlight:

  • Support automatically generating and managing Tags and Branches for tables #1354
  • Support killing the running optimizing process #1862
  • Support detail page for optimizing process #2114
  • Support Spark-based optimizer #1812
  • Add metric collection for AMS #1528
  • Support managing tables with multiple formats under one Catalog #1061

Features:

  • Support automatically generating and managing Tags and Branches for tables #1354
  • Support killing the running optimizing process #1862
  • Support detail page for Optimizing Process #2114
  • Support Spark-based optimizer #1812
  • Add metric collection for AMS #1528
  • Support managing tables with multiple formats under one Catalog #1061
  • Make local terminal cores configurable #2492
  • Support reading encrypted iceberg data files #2401
  • Add basic authentication to REST API #2667

Improvements:

  • Improve major optimizing: compact segment files to target size #2330
  • Put the database username and password into Kubernetes Secrets #2291
  • Avoid releasing external optimizer instances through ams #2315
  • Mixed-format Table stats for Trino engine need to be calculated individually #2246
  • Add planning optimizing status for table #2290
  • Support computed columns for mixed-format tables in Flink DDL #1457
  • The Files page supports filtering by partition name and sorting by dictionary value. #2316
  • Support logging out of the AMS dashboard #2359
  • Support parallelized planning in one optimizer group #1951
  • Skip cleaning up dangling delete files for Iceberg V1 table #2222
  • Configure iceberg.worker.num-threads in the configuration file #2386
  • Support sorting the external Iceberg table list #1716
  • Add table filter for catalog properties #2310
  • Show the format version of Iceberg Table #2260
  • Make the maximum input file size for each optimizing thread configurable #2385
  • Use partition filter to speed up optimizing plan #2417
  • Add more configurable properties to the ams server #2333
  • Dispose of the current tasks after task deserialization error #2521
  • Paimon snapshot display on the page should be arranged in reverse order according to the snapshot time of the table. #2606
  • Add support for setting/altering mixed-format table properties with Flink SQL #2
  • Helm adds Kyuubi (as a terminal backend) and PostgreSQL-related conf #2306
  • Validate hive-site.xml when uploading a file for Hive-catalog #2520
  • Optimize the speed of searching for tables in the tables navigation bar #2215
  • Use CachingCatalog to reduce the time cost of IcebergCatalogWrapper#loadTable #1794
  • Enable return URL parameter when redirecting to login page #2637
  • Display more details on the page of Optimizing->Optimizers #2698
  • Improve the version info about the project #2813
  • AMS should use G1 GC by default #2830
  • Support different simple authorization user names for different catalogs #2918
  • Move admin username and password to Kubernetes secrets #2948
  • Add Tests for helm chart templates #2778
  • Support S3 storage for Paimon format catalog #2972
  • Support changing the language to Chinese for Dashboard #2958

BugFixes:

  • After disabling self-optimizing, the table status is not updated to idle #2250
  • Optimizing commit failed: org.apache.iceberg.exceptions.ValidationException: Missing required files to delete #2454
  • Periodic snapshots expiring tasks will be scheduled redundantly #2453
  • When using Flink API to write data into a mixed-iceberg table, it will update the table metadata to incorrect value #2525
  • Optimizing not triggered when partition value is null #2542
  • When expiring historical data, the latest snapshot generated by Amoro should not be used #2555
  • The process of deleting expired snapshots was interrupted by exception #2579
  • Loading Mixed-Iceberg format tables of internal catalog failed in Flink #2587
  • Failed to remove orphan files after dropping a partition field #2617
  • The Last Commit Time for the Paimon partitioned table in the File pages has the wrong zone #2607
  • If an exception occurs in externalCatalog.listTables during the exploreExternalCatalog process, exploreExternalCatalog will fail this time. #2638
  • Failed to view snapshot details #2614
  • Hive table name may be too long for AMS table identifiers #2649
  • Skip the deleted directories when cleaning orphan files #2619
  • Optimizing process stuck after a few failures #2623
  • Lack of database type mandatory checking, leading to NPE in existing MySQL config check #2733
  • Iceberg temporary snapshot tables should be excluded from data cleanups #2488
  • File format ORC does not support rolling new files for mixed_format #2560

Amoro-0.6.1

21 Feb 07:39

Highlight:

  • Improve the performance of the optimizer in cases where there are a large number of delete files

Improvements:

  • Improve optimizer's compaction performance in large delete scenarios #2266
  • Store max transaction id on the snapshot level for Mixed Format Table #1720
  • Reduce triggering of optimizing evaluation for native Iceberg format #2350
  • The mixed hive tables should be allowed to be dropped when the hive table has already been dropped #2286
  • Print GC date stamps #2416
  • Exclude kryo dependency from flink-optimizer #2437

BugFixes:

  • Fix data lost issue after optimizing for Mixed format tables #2253
  • After disabling self-optimizing, the table status is not updated to idle #2250
  • Optimizing status blocked at minor/major when restarting AMS #2279
  • Exception happened when using Flink DataStream API to write data into Mixed format tables #2271
  • Failure when creating a table and immediately retrieving Blocker #2289
  • Fix deleting Puffin files by mistake when cleaning orphan files #2320
  • A non-daemon thread still exists after the Spark job has finished #2336
  • SimpleDateFormat throws parsing errors when using multi-threading #2324
  • Dashboard displays wrong partition path when the table had multiple partition spec #1863
  • Fix the NullPointerException issue when modifying the self-optimizing.enabled configuration #2388
  • TableController set partition string value to null when getting file list for a non-partitioned table #2407
  • Fix loading the optimizing snapshot id of ChangeStore for Mixed Format tables #2430
  • Batch deletion of ChangeStore files did not take effect for the Mixed Format tables #2440
  • Fix File format is required when scanning file entries if no file format is specified in the file name #2441
  • Mixed Hive tables mistakenly delete hive files during expiring snapshots #2404
  • Using the wrong database name when switching to another Catalog #2413
  • Fix Mixed Format tables expiring all the snapshots with optimized sequence #2394
  • Add a default value of table-filters in spark scan builder to avoid NPE when comparing the table scan #2313
  • Failed to get table properties when BaseStore contains property watermark.base #2295
  • Show create table statement return incorrect SQL with Spark #2272
  • AMS will repeatedly trigger the same minor optimizing jobs when the bucket > self-optimizing.minor.trigger.file-count #2464
  • Periodic snapshots expiring tasks will be scheduled redundantly #2453

Amoro-0.6.0

06 Nov 11:59

Highlight:

  • Kubernetes Integration
    • Support running AMS in a Kubernetes environment
    • Support running Optimizer in a Kubernetes environment
  • Partition expiration
  • Paimon integration
    • Support viewing metadata information such as Schema, Properties, Files, Snapshots, Compactions, Operations of the Paimon table
    • Support executing Spark SQL supported by Paimon in the Terminal interface
  • Mixed format support ORC file format
  • Mixed format support flink-1.16 and flink-1.17

Features:

  • Integration with Kubernetes #1917
  • Support for expiring partitions in Iceberg tables #1758
  • Support integration with Apache Paimon #1269
  • Support ORC file format for Mixed format tables #740
  • Upgrading Flink version based on Mixed-format table #1983
  • Enrich the storage type registry when creating a catalog. #2007

Improvements:

  • Support s3:// and s3a:// protocols in the default distribution. #2009
  • Make health status checks available without logging in #2081
  • Separate local-optimizer and flink-optimizer into different modules and package them independently. #2008
  • Support list partitions via flink catalog #1940
  • Add conf/env.sh for jvm options configurations #2047
  • Change position delete index from set to bitmap of optimizing #2078
  • Critical vulnerabilities in the project dependencies need to be fixed #2040
  • Make AMS table sync with external catalogs multithreaded #2109
  • Change the default value of self-optimizing.major.trigger.duplicate-ratio #2225
  • Log4j 1.2 includes a SocketServer class that is vulnerable to deserialization of untrusted data #1548
  • Fix the format of thread name in DefaultTableService #2163

BugFixes:

  • Failed to optimize Mixed Format KeyedTable with Timestamp(without zone) column as Primary key #2091
  • An infinite loop occurs in merge optimization, and the number of input files is equal to the number of output files. #2090
  • Get incorrect partition field name when refreshing tables #2165
  • Catalog type always null in TerminalSession #2146
  • AMS server cannot be initialized in HA mode #2083
  • Delete sync tables when dropping external catalogs #2235
  • Data lost after optimizing in Mixed format #2253

Amoro-0.5.1

10 Oct 12:10

Changelog for v0.5.1

Features:

  • [Feature]: Show summary details of snapshots in the table transaction tab #1827
  • [Subtask]: Implement displaying metrics on the front end of AMS #1771
  • [Feature]: Support table filters to make AMS ignore tables that are not needed #1870
  • [Feature]: Support MOR with Flink Engine in SQL/Table API #1422
  • [Feature]: Add PostgreSQL RDS as system database #1966
  • [Subtask]: Support creating mixed-iceberg format tables in any catalog which support iceberg. #1336

Improvements:

  • [Improvement]: modify the parameter transfer method for flink optimize #1789
  • [Subtask]: Remove IcebergContentFile and related codes #1839
  • [Improvement]: Extract Spark Common Module to reuse code between different spark versions. #1449
  • [Improvement]: Show correct resource occupation for external optimizer #1845
  • [Improvement] Polish optimizing executor code #1895
  • [Improvement]: Remove AmsClient in UnkeyedTable and KeyedTable #1904
  • [Improvement]: add mysql schema init #1899
  • [Improvement]: Perform file filtering as early as possible during the optimizing plan process #1883
  • [Improvement] Add bounds of timestamp type for mixed-hive table #1920
  • [Improvement]: Flink Logstore supports 'group-offsets' and 'specific-offsets' Startup mode. #1931
  • [Improvement]: Using the parameters in TableConfiguration instead of from Table properties #1943
  • [Improvement]: Improve the triggering conditions for segment file rewriting #1953
  • [Improvement]: Helm Chart Support Overwrite JVM Optional #1962
  • [Subtask]: Do not cache DataFile in memory when evaluating tables. #1840
  • [Improvement]: Introduce a table property to set the task splitting size of self-optimizing #1935
  • doc: update dashboard readme #1984
  • [Improvement]: Flink optimizer configure managed memory explicitly in both the doc and the startup script #1907
  • [Improvement]: CI run check style first #1293
  • [Improvement]: Add basic configuration of spotless #1926
  • [Improvement]: The version of the Amoro in the Docker startup script should be the latest by default #2010
  • support running ams in jdk17 environment. #1896

BugFixes:

  • [Bug]: the historical metadata file has not been deleted #1864
  • [Bug]: Segment file first binpack strategy leads to inability to optimize SplitTasks #1866
  • [Bug]: The water mark is not displayed properly for tables without primary keys #1603
  • [Bug]: Fix unit conversion for parameters -msz to spillmap configuration #1893
  • [Bug]: Snapshot file not found after Flink ingestion job has been running for a while #1882
  • [Bug]: BasicUnkeyedTable does not Override the name method #1914
  • [Bug]: Can not set configuration when setting UGI #1912
  • [Bug]: If the optimizing of a Mixed Hive table fails to alter the Hive location, the files will be moved to the old Hive location. #1898
  • [Bug]: MAJOR Optimizing is running repeatedly #1924
  • [Bug]: The optimizing task has been in a suspended state all along. #1939
  • [Bug]: Failed to persist optimizing process records #1869
  • [Bug]: Mix-iceberg format with hive metastore not working on spark #1963
  • [Bug]: Mix-iceberg format with hive metastore not working on spark session catalog #1964
  • [Bug]: When using an HDFS cluster protected by Kerberos, the fs.hdfs.impl.disable.cache=false configuration is invalid, FileSystem is unable to hit the cache #1857
  • [Bug]: When using s3a schema, orphan files could not be deleted. #1981
  • [Bug]: Using mix iceberg in spark has some docs problems #1908
  • [Bug]: Avoid NPE during disposing ArcticServiceContainer #1985
  • [Bug]: Orphan files clean mistakenly deleted the Iceberg statistic file #1959
  • [Bug]: If the iceberg table has a primary key of date type, in upsert mode, after Amoro merge, the primary key duplicate data can be found #1979
  • [Bug]: If the projected schema of the Lookup Table Source doesn't include all primary keys and requires automatic addition #1891
  • [Bug]: ams-optimizer FLINK job expired oom due to frequent requests for Kerberos. #1980
  • [Bug]: Terminal is unable to create the iceberg table, when Kerberos authentication is enabled #1996
  • [Bug] Trino failed to query Amoro Mixed Format Table after restarting AMS #1977
  • [Bug]: The startup script has a judgment error #2006
  • [Bug]: Spark SQL using amoro spark runtime error class cannot find ResolveProcedures #2032
  • [Bug]: Dashboard can't show files page with kerberos and mixed-iceberg format in external catalog #2029
  • [Bug]: When a time value that is too large or too small is written, it will encounter a 'long overflow' exception. #2044
  • [Bug]: Ams Server always executes initSqlScript when using mysql8 #2049

Amoro-0.5.0

10 Aug 02:39

Changelog for v0.5.0

Features:

  • [Feature]: Support Spark 3.3 for Mixed format #261
  • [Feature]: Support Spark 3.2 for Mixed format #1383
  • [Feature]: Support MOR with Flink Engine in runtime batch mode for Mixed format #5
  • [Feature]: Support Iceberg on S3 or other non-hadoop storage system. #1476
  • [Feature]: Add self-optimizing detail in optimized tab of table  #1535
  • [Feature]: Support to manage optimizer groups in dashboard #1537
  • [Feature]: Remove the Independent DeleteFiles for the Iceberg Format Table #1628
  • [Feature]: Support lookup join on Mixed format tables with Flink #1322
  • [Subtask]: Make AMS as an Iceberg Catalog provider via implement RestCatalogAPI #1339

Improvements:

  • [Improvement]: AMS module refactoring in 0.5 #1372
  • [Improvement]: Add new task allocation rules to make the number of concurrency and read cost more balanced #1311
  • [Improvement]: Improve query speed for Mixed tables in Spark #1349
  • [Spark][Improvement]: Add scala code style check for spark module #1059
  • [Improvement][Spark]: Speed up commit process for Mixed Hive format tables #1350
  • [Subtask]: Introduce the concept of the InternalCatalog and ExternalCatalog #1544
  • [Improvement]: Optimize the description of Spark CreateTableLike #1340
  • [Improvement]: Improve the project documentation #1510

BugFixes:

  • [Bug]: The position-delete written by iceberg directly through the base table cannot take effect when read by Arctic #1347
  • [Bug]: When AMS is deployed HA, use Kerberos to access non-Kerberos zookeeper will throw an exception #1346
  • [Bug]: High CPU occupancy of AMS #1332
  • [Bug]: jacoco-maven-plugin report EOFException in Flink Module CI #1330
  • [Bug]:[Spark] select failed after update on keyed table for kerberos auth failed. #488
  • [Bug]: The columns '_transaction_id' and '_change_action' will be null when querying change table with spark #1385
  • [Bug]: Hive function can't be used when using ArcticSparkSessionCatalog under spark 3.3 #1397
  • [Bug]: Find duplicate records when enable rocksdb map #1751
  • [Bug]: when partition value contains spaces, optimize can't process correctly #1653
  • [Bug]: Build failed when change hadoop version to 3.2.1 #1646

Arctic-0.4.1

03 Apr 08:05

Changelog for v0.4.1

Features:

  • [Feature]: Self-Optimizing scan files from metadata instead of from file info cache #1093
  • [Subtask][Flink]: support pulsar read and write without consistency in Flink 1.12 #1007
  • [Feature]: A new design of resolving data conflicts without relying on AMS to generate TransactionId #994
  • [Feature]: Introduce a mechanism for concurrency control between Writing and Optimizing #985
  • [Feature][Spark]: Support drop partition #918
  • [Feature][SPARK]: Support truncate table #540
  • [Feature][Spark]: Support Merge Into for Spark3.x #395

Improvements:

  • [Improvement]: Self-Optimizing for Mixed Format tables should limit the file cnt for each Optimizing #1213
  • [Improvement][AMS]: Automatic retry optimizing if commit failed #1103
  • [Improvement]: Replace the keyword base to express the meaning of the lowest part of sth #1082
  • [Improvement][AMS]: Support set login user and login password in config yaml file #1081
  • [Improvement][AMS]: Optimize settings page and terminal page #1005
  • [Improvement][Flink]: Refactor Log-store Source to FLINK FLIP-27 API #969
  • [Improvement][Flink]: Support all logstore configuration items to be configured in table properties #933
  • [Improvement]: More elegant display of error messages in terminal #913

BugFixes:

  • [ARCTIC-1025][FLINK] Fix data duplication even if there is a primary key table with upsert enabled #1180
  • [Bug][Spark]: The orphan files cleanup of insert overwrite doesn't take effect for un-partitioned table. #1174
  • [Bug]: When reading partial fields from Logstore, the number of fields does not match #1171
  • [Bug]: Address the deserializing exception of the array type in the logstore #1111
  • [Bug][Core]: The PartitionPropertiesUpdate can't remove partition properties key #1107
  • [BUG]: The expire snapshot and the orphan file clean scan related files from metadata #1105
  • [BUG][AMS]: Automatic retry optimizing if commit failed #1103
  • [Bug]: Browser tab does not display Arctic's icon #1091
  • [Bug]: When the Metastore type is Hadoop, Terminal executes the spark sql without loading the configuration file #1090
  • [Bug][Spark]: The read and write authentication user are different in Spark when using mixed Iceberg format #1069
  • [Bug][Spark]: Insert overwrite select from view will throw exception #1066
  • [Bug]: When the flink jobs reads the arctic table for a while, the flink job fails #1063
  • [Bug]: Spark reads the timestamp field eight hours longer than the actual Flink engine writes timestamp #1062
  • [Bug]: KeyedTableScanTask confuse files from BaseStore and ChangeStore #1045
  • [Bug][Spark]: Create Table As Select should write to the base store. #1026
  • [Bug]: When using spark query type timestamp column failed #978
  • [Bug]: Flink sets watermark on Arctic table fields, but ArrayIndexOutOfBoundsException occurs when reading data #957
  • [Bug][Spark]: The data are written repeatedly after the Spark Executor failover #917
  • [Bug]: Spark refresh table error #620
  • [Bug]: Spark batch write failed w/ Already closed files for partition #613
  • [Bug][Flink]: reverse message order when retract message from message queue. #482

Arctic-0.4.0

06 Dec 05:50

Changelog for Arctic-0.4.0

New Features:

  • [Feature]: Support managing iceberg tables already existed #260
  • [Feature][Spark]: Standard auth utils method for Kyuubi #582
  • [Feature][AMS]: Show table info for native iceberg in AMS dashboard #531
  • [Feature][Spark]: Support duplicate-key-check when insert into and insert overwrite #484
  • [Feature][AMS]: Support add/modify by AMS Dashboard #412
  • [Feature][AMS]: Add setting page in AMS dashboard #407
  • [Feature]: Support table watermark to determine table freshness #394
  • [Feature]: AMS terminal uses Kyuubi as its backend SQL engine #262

Improvement:

  • [Improvement]: Hidden username and password in setting page #630
  • [Improvement]: Fix file upload when using derby as storage #622
  • [Improvement]: Expose arctic benchmark code and docker environment file #265
  • [Improvement]: Do some initialization when adding catalog #617
  • [Improvement]: Identify Hadoop and Hive version Arctic had supported #356
  • [Improvement]: Upgrade iceberg version to 0.13 or 0.14 #397
  • [Improvement]: Update the documentation in the Flink DML section #598
  • [Improvement][AMS]: Improve the high concurrency processing capability of the allocateTransactionId interface #393
  • [ARCTIC-557][FLINK] Doc catalog authorization configs #570
  • [Optimize] Risk of AMS memory usage and task execute timeout when processing a large number of small files #128

Bugfix:

  • [Bug]: Some bugs in the local test for version 0.4.0 bug #626
  • [Bug]: Including a line-through in the arctic HA property causes the catalog to load failed #515
  • [Bug][Flink]: Read duplicated data from a keyed table, even though the upsert mode is enabled. #592
  • [Bug]: Flink 1.15 quickstart failed for java.lang.ClassNotFoundException #563
  • [ARCTIC-561] When the flink job fails, the flink job will commit a wrong transaction id #568
  • [Bug]: For the Arctic keyed table, the out-of-order TransactionId lead to inconsistent data #479
  • [Bug][Flink]: Watermark may be lost if split read too fast. #486

Arctic-0.3.2-rc1

20 Oct 12:38

Changelog for Arctic-0.3.2-rc1

New Features:

  • Support Insert/Delete/Update in Spark. #173
  • Support Spark version 2.3. #18 #21

Improvement:

  • Unify the scan.startup.mode config values for both Logstore and Filestore. #441
  • Add initialization and upgrade SQL scripts for every version. #362
  • Check for data loss after optimizing. #336
  • Check if Hive table can be upgraded before upgrading. #312
  • Support MySQL 8.0 as system database for AMS. #48

Bugfix:

  • Changestore's schema may be incorrect after adding new columns to a Hive table. #497
  • Reading from Filestore may fail for an empty table. #475
  • Optimizing may be blocked after a failed commit. #464
  • Table list may be incorrect with multiple Hive catalogs. #402
  • Cannot create Hive tables with uppercase field names. #380
  • Cannot upgrade Hive tables with multiple partition columns. #352
  • Insert overwrite into a Hive table may fail in Spark. #334

Arctic-0.3.1-rc1

02 Sep 11:02
375546f

Changelog for Arctic-0.3.1-rc1

New Features:

  • Support hive table. #38
  • Arctic table support real-time dimension table join with Flink. #94
  • Support Flink version 1.15. #166
  • AMS supports high availability mode. #117

Improvement:

  • Support create table like with Spark. #123
  • AMS support list table DDL operations. #116
  • AMS support tables and database list filter. #115
  • AMS lists only transactions committed by users. #86

Bugfix:

  • Flink may throw java.lang.ArrayIndexOutOfBoundsException when writing into arctic table. #169
  • Optimizer may not work after multiple streaming update/delete commits. #183
  • Data may be lost after optimizing. #240

arctic-0.3.0-rc1

22 Jul 09:24
5bdcf29
Pre-release

arctic 0.3.0 rc1