Releases · apache/amoro

02 Aug 04:06

zhoujinsong

v0.7.0-incubating

d8e2784

v0.7.0-incubating Latest

Highlight:

Support automatically generating and managing Tags and Branches for tables #1354
Support killing the running optimizing process #1862
Support detail page for optimizing process #2114
Support Spark-based optimizer #1812
Add metric collection for AMS #1528
Support managing tables with multiple formats under one Catalog #1061

Features:

Support automatically generating and managing Tags and Branches for tables #1354
Support killing the running optimizing process #1862
Support detail page for optimizing process #2114
Support Spark-based optimizer #1812
Add metric collection for AMS #1528
Support managing tables with multiple formats under one Catalog #1061
Make local terminal cores configurable #2492
Support reading encrypted iceberg data files #2401
Add basic authentication to REST API #2667

Improvements:

Improve major optimizing: compact segment files to target size #2330
Put the database username and password into Kubernetes Secrets #2291
Avoid releasing external optimizer instances through AMS #2315
Mixed-format Table stats for Trino engine need to be calculated individually #2246
Add planning optimizing status for table #2290
Support computed columns for mixed-format tables in Flink DDL #1457
The Files page supports filtering by partition name and sorting by dictionary value. #2316
Support logging out of the AMS dashboard #2359
Support parallelized planning in one optimizer group #1951
Skip cleaning up dangling delete files for Iceberg V1 table #2222
Configure iceberg.worker.num-threads in the configuration file #2386
Support sorting the external Iceberg table list #1716
Add table filter for catalog properties #2310
Show the format version of Iceberg Table #2260
Make the maximum input file size for each optimizing thread configurable #2385
Use partition filter to speed up optimizing plan #2417
Add more configurable properties to the AMS server #2333
Dispose of the current tasks after task deserialization error #2521
Paimon snapshot display on the page should be arranged in reverse order according to the snapshot time of the table. #2606
Add support for setting/altering mixed-format table properties with Flink SQL #2
Helm adds Kyuubi (as a terminal backend) and PostgreSQL-related conf #2306
Validate hive-site.xml when uploading a file for Hive-catalog #2520
Optimize the speed of searching for tables in the tables navigation bar #2215
Use CachingCatalog to reduce the time cost of IcebergCatalogWrapper#loadTable #1794
Enable return URL parameter when redirecting to login page #2637
Display more details on the page of Optimizing->Optimizers #2698
Improve the version info about the project #2813
AMS should use G1 GC by default #2830
Support different simple authorization user names for different catalogs #2918
Move admin username and password to Kubernetes secrets #2948
Add Tests for helm chart templates #2778
Support S3 storage for Paimon format catalog #2972
Support changing the language to Chinese for Dashboard #2958

BugFixes:

After disabling self-optimizing, the table status is not updated to idle #2250
Optimizing commit failed: org.apache.iceberg.exceptions.ValidationException: Missing required files to delete #2454
Periodic snapshots expiring tasks will be scheduled redundantly #2453
When using Flink API to write data into a mixed-iceberg table, it will update the table metadata to incorrect value #2525
Optimizing not triggered when partition value is null #2542
When expiring historical data, the latest snapshot generated by Amoro should not be used #2555
The process of deleting expired snapshots was interrupted by exception #2579
Loading Mixed-Iceberg format tables of internal catalog failed in Flink #2587
Failed to remove orphan files after dropping a partition field #2617
The Last Commit Time for the Paimon partitioned table in the File pages has the wrong zone #2607
If an exception occurs in externalCatalog.listTables during the exploreExternalCatalog process, exploreExternalCatalog will fail this time. #2638
Failed to view snapshot details #2614
Hive table name may be too long for AMS table identifiers #2649
Skip the deleted directories when cleaning orphan files #2619
Optimizing process stuck after a few failures #2623
Lack of database type mandatory checking, leading to NPE in existing MySQL config check #2733
Iceberg temporary snapshot tables should be excluded from data cleanups #2488
File format ORC does not support rolling new files for mixed-format tables #2560

21 Feb 07:39

zhoujinsong

v0.6.1

0675723

Amoro-0.6.1

Highlight:

Improve the performance of the optimizer in cases where there are a large number of delete files

Improvements:

Improve optimizer's compaction performance in large delete scenarios #2266
Store max transaction id on the snapshot level for Mixed Format Table #1720
Reduce triggering of optimizing evaluation for native Iceberg format #2350
The mixed hive tables should be allowed to be dropped when the hive table has already been dropped #2286
Print GC date stamps #2416
Exclude kryo dependency from flink-optimizer #2437

BugFixes:

Fix data lost issue after optimizing for Mixed format tables #2253
After disabling self-optimizing, the table status is not updated to idle #2250
Optimizing status blocked at minor/major when restarting AMS #2279
Exception happened when using Flink DataStream API to write data into Mixed format tables #2271
Failure when creating a table and immediately retrieving Blocker #2289
Fix deleting Puffin files by mistake when cleaning orphan files #2320
A non-daemon thread still exists after the Spark job has finished #2336
SimpleDateFormat throws parsing errors when using multi-threading #2324
Dashboard displays wrong partition path when the table had multiple partition spec #1863
Fix the NullPointerException issue when modifying the self-optimizing.enabled configuration #2388
TableController set partition string value to null when getting file list for a non-partitioned table #2407
Fix loading the optimizing snapshot id of ChangeStore for Mixed Format tables #2430
Batch deletion of ChangeStore files did not take effect for the Mixed Format tables #2440
Fix File format is required when scanning file entries if no file format is specified in the file name #2441
Mixed Hive tables mistakenly delete hive files during expiring snapshots #2404
Using the wrong database name when switching to another Catalog #2413
Fix Mixed Format tables expiring all the snapshots with optimized sequence #2394
Add a default value of table-filters in spark scan builder to avoid NPE when comparing the table scan #2313
Failed to get table properties when BaseStore contains property watermark.base #2295
Show create table statement returns incorrect SQL with Spark #2272
AMS will repeatedly trigger the same minor optimizing jobs when the bucket > self-optimizing.minor.trigger.file-count #2464
Periodic snapshots expiring tasks will be scheduled redundantly #2453

06 Nov 11:59

shidayang

v0.6.0

2476e1d

Amoro-0.6.0

Highlight:

Kubernetes Integration
- Support running AMS in a Kubernetes environment
- Support running Optimizer in a Kubernetes environment
Partition expiration
Paimon integration
- Support viewing metadata information such as Schema, Properties, Files, Snapshots, Compactions, Operations of the Paimon table
- Support executing Spark SQL supported by Paimon in the Terminal interface
Mixed format supports ORC file format
Mixed format supports flink-1.16 and flink-1.17

Features:

Integration with Kubernetes #1917 
Support for expiring partitions in Iceberg tables #1758
Support integration with Apache Paimon #1269
Support ORC file format for Mixed format tables #740
Upgrading Flink version based on Mixed-format table #1983 
Enrich the storage type registry when creating a catalog. #2007

Improvements:

Support s3:// and s3a:// protocols in the default distribution. #2009
Make health status checks available without logging in #2081
Separate local-optimizer and flink-optimizer into different modules and package them independently. #2008
Support list partitions via flink catalog #1940
Add conf/env.sh for jvm options configurations #2047
Change position delete index from set to bitmap of optimizing #2078
Critical vulnerabilities in the project dependencies need to be fixed #2040
Make AMS table sync with external catalogs multithreaded #2109
Change the default value of self-optimizing.major.trigger.duplicate-ratio #2225
Log4j 1.2 includes a SocketServer class that is vulnerable to deserialization of untrusted data #1548
Fix the format of thread name in DefaultTableService #2163

BugFixes:

Failed to optimize Mixed Format KeyedTable with Timestamp(without zone) column as Primary key #2091
An infinite loop occurs in merge optimization, and the number of input files is equal to the number of output files. #2090
Get incorrect partition field name when refreshing tables #2165
Catalog type always null in TerminalSession #2146
AMS server cannot be initialized in HA mode #2083
Delete sync tables when drop external catalogs #2235
Data lost after optimizing in Mixed format #2253

10 Oct 12:10

wangtaohz

v0.5.1

8f554b1

Amoro-0.5.1

Changelog for v0.5.1

Features:

[Feature]: Show summary details of snapshots in the table transaction tab #1827
[Subtask]: Implement displaying metrics on the front end of AMS #1771
[Feature]: Support table filters to make AMS ignore tables that are not needed #1870
[Feature]: Support MOR with Flink Engine in SQL/Table API #1422
[Feature]: Add PostgreSQL RDS as system database #1966
[Subtask]: Support creating mixed-iceberg format tables in any catalog which supports iceberg. #1336

Improvements:

[Improvement]: modify the parameter transfer method for flink optimize #1789
[Subtask]: Remove IcebergContentFile and related codes #1839
[Improvement]: Extract Spark Common Module to reuse code between different spark versions. #1449
[Improvement]: Show correct resource occupation for external optimizer #1845
[Improvement] Polish optimizing executor code #1895
[Improvement]: Remove AmsClient in UnkeyedTable and KeyedTable #1904
[Improvement]: add mysql schema init #1899
[Improvement]: Perform file filtering as early as possible during the optimizing plan process #1883
[Improvement] Add bounds of timestamp type for mixed-hive table #1920
[Improvement]: Flink Logstore supports 'group-offsets' and 'specific-offsets' Startup mode. #1931
[Improvement]: Using the parameters in TableConfiguration instead of from Table properties #1943
[Improvement]: Improve the triggering conditions for segment file rewriting #1953
[Improvement]: Helm Chart Support Overwrite JVM Optional #1962
[Subtask]: Do not cache DataFile in memory when evaluating tables. #1840
[Improvement]: Introduce a table property to set the task splitting size of self-optimizing #1935
doc: update dashboard readme #1984
[Improvement]: Flink optimizer configure managed memory explicitly in both the doc and the startup script #1907
[Improvement]: CI run check style first #1293
[Improvement]: Add basic configuration of spotless #1926
[Improvement]: The version of the Amoro in the Docker startup script should be the latest by default #2010
support running ams in jdk17 environment. #1896

BugFixes:

[Bug]: the historical metadata file has not been deleted #1864
[Bug]: Segment file first binpack strategy leads to inability to optimize SplitTasks #1866
[Bug]: The water mark is not displayed properly for tables without primary keys #1603
[Bug]: Fix unit conversion for parameters -msz to spillmap configuration #1893
[Bug]: Snapshot file not found after Flink ingestion job has been running for a while #1882
[Bug]: BasicUnkeyedTable does not Override the name method #1914
[Bug]: Can not set configuration when setting UGI #1912
[Bug]: If the optimizing of a Mixed Hive table fails to alter the Hive location, the files will be moved to the old Hive location. #1898
[Bug]: MAJOR Optimizing is running repeatedly #1924
[Bug]: The optimizing task has been in a suspended state all along. #1939
[Bug]: Failed to persist optimizing process records #1869
[Bug]: Mix-iceberg format with hive metastore not working on spark #1963
[Bug]: Mix-iceberg format with hive metastore not working on spark session catalog #1964
[Bug]: When using an HDFS cluster protected by Kerberos, the fs.hdfs.impl.disable.cache=false configuration is invalid, FileSystem is unable to hit the cache #1857
[Bug]: When using s3a schema, orphan files could not be deleted. #1981
[Bug]: Using mix iceberg in spark has some docs problems #1908
[Bug]: Avoid NPE during disposing ArcticServiceContainer #1985
[Bug]: Orphan files clean mistakenly deleted the Iceberg statistic file #1959
[Bug]: If the iceberg table has a primary key of date type, in upsert mode, after Amoro merge, the primary key duplicate data can be found #1979
[Bug]: If the projected schema of the Lookup Table Source doesn't include all primary keys and requires automatic addition #1891
[Bug]: ams-optimizer FLINK job expired oom due to frequent requests for Kerberos. #1980
[Bug]: Terminal is unable to create the iceberg table, when Kerberos authentication is enabled #1996
[Bug] Trino failed to query Amoro Mixed Format Table after restarting AMS #1977
[Bug]: The startup script has a judgment error #2006
[Bug]: Spark SQL using amoro spark runtime error class cannot find ResolveProcedures #2032
[Bug]: Dashboard can't show files page with kerberos and mixed-iceberg format in external catalog #2029
[Bug]: When a time value that is too large or too small is written, it will encounter a 'long overflow' exception. #2044
[Bug]: Ams Server always executes initSqlScript when using mysql8 #2049

10 Aug 02:39

zhoujinsong

v0.5.0

281c391

Amoro-0.5.0

Changelog for v0.5.0

Features:

[Feature]: Support Spark 3.3 for Mixed format #261
[Feature]: Support Spark 3.2 for Mixed format #1383
[Feature]: Support MOR with Flink Engine in runtime batch mode for Mixed format #5
[Feature]: Support Iceberg on S3 or other non-hadoop storage system. #1476
[Feature]: Add self-optimizing detail in optimized tab of table #1535
[Feature]: Support to manage optimizer groups in dashboard #1537
[Feature]: Remove the Independent DeleteFiles for the Iceberg Format Table #1628
[Feature]: Support lookup join on Mixed format tables with Flink #1322
[Subtask]: Make AMS as an Iceberg Catalog provider via implement RestCatalogAPI #1339

Improvements:

[Improvement]: AMS module refactoring in 0.5 #1372
[Improvement]: Add new task allocation rules to make the number of concurrency and read cost more balanced #1311
[Improvement]: Improve query speed for Mixed tables in Spark #1349
[Spark][Improvement]: Add scala code style check for spark module #1059
[Improvement][Spark]: Speed up commit process for Mixed Hive format tables #1350
[Subtask]: Introduce the concept of the InternalCatalog and ExternalCatalog #1544
[Improvement]: Optimize the description of Spark CreateTableLike #1340
[Improvement]: Improve the project documentation #1510

BugFixes:

[Bug]: The position-delete written by iceberg directly through the base table cannot take effect when read by Arctic #1347
[Bug]: When AMS is deployed HA, use Kerberos to access non-Kerberos zookeeper will throw an exception #1346
[Bug]: High CPU occupancy of AMS #1332
[Bug]: jacoco-maven-plugin report EOFException in Flink Module CI #1330
[Bug]:[Spark] select failed after update on keyed table for kerberos auth failed. #488
[Bug]: The columns '_transaction_id' and '_change_action' will be null when querying the change table with Spark #1385
[Bug]: Hive function can't be used when using ArcticSparkSessionCatalog under spark 3.3 #1397
[Bug]: Find duplicate records when enable rocksdb map #1751
[Bug]: when partition value contains spaces, optimize can't process correctly #1653
[Bug]: Build failed when change hadoop version to 3.2.1 #1646

03 Apr 08:05

baiyangtx

v0.4.1

7acd749

Arctic-0.4.1

Changelog for v0.4.1

Features:

[Feature]: Self-Optimizing scan files from metadata instead of from file info cache #1093
[Subtask][Flink]: support pulsar read and write without consistency in Flink 1.12 #1007
[Feature]: A new design of resolving data conflicts without relying on AMS to generate TransactionId #994
[Feature]: Introduce a mechanism for concurrency control between Writing and Optimizing #985
[Feature][Spark]: Support drop partition #918
[Feature][SPARK]: Support truncate table #540
[Feature][Spark]: Support Merge Into for Spark3.x #395

Improvements:

[Improvement]: Self-Optimizing for Mixed Format tables should limit the file cnt for each Optimizing #1213
[Improvement][AMS]: Automatic retry optimizing if commit failed #1103
[Improvement]: Replace the keyword base to express the meaning of the lowest part of sth #1082
[Improvement][AMS]: Support set login user and login password in config yaml file #1081
[Improvement][AMS]: Optimize settings page and terminal page #1005
[Improvement][Flink]: Refactor Log-store Source to FLINK FLIP-27 API #969
[Improvement][Flink]: Support all logstore configuration items to be configured in table properties #933
[Improvement]: More elegant display of error messages in terminal #913

BugFixes:

[ARCTIC-1025][FLINK] Fix data duplication even if there is a primary key table with upsert enabled #1180
[Bug][Spark]: The orphan files cleanup of insert overwrite doesn't take effect for un-partitioned table. #1174
[Bug]: When reading partial fields from Logstore, the number of fields does not match #1171
[Bug]: Address the deserializing exception of the array type in the logstore #1111
[Bug][Core]: The PartitionPropertiesUpdate can't remove partition properties key #1107
[BUG]: The expire snapshot and the orphan file clean scan related files from metadata #1105
[BUG][AMS]: Automatic retry optimizing if commit failed #1103
[Bug]: Browser tab does not display Arctic's icon #1091
[Bug]: When the Metastore type is Hadoop, Terminal executes the spark sql without loading the configuration file #1090
[Bug][Spark]: The read and write authentication user are different in Spark when using mixed Iceberg format #1069
[Bug][Spark]: Insert overwrite select from view will throw exception #1066
[Bug]: When the Flink job reads the Arctic table for a while, the Flink job fails #1063
[Bug]: Spark reads the timestamp field eight hours longer than the actual Flink engine writes timestamp #1062
[Bug]: KeyedTableScanTask confuse files from BaseStore and ChangeStore #1045
[Bug][Spark]: Create Table As Select should write to the base store. #1026
[Bug]: When using spark query type timestamp column failed #978
[Bug]: Flink sets watermark on Arctic table fields, but ArrayIndexOutOfBoundsException occurs when reading data #957
[Bug][Spark]: The data are written repeatedly after the Spark Executor failover #917
[Bug]: Spark refresh table error #620
[Bug]: Spark batch write failed w/ Already closed files for partition #613
[Bug][Flink]: reverse message order when retract message from message queue. #482

06 Dec 05:50

YesOrNo828

v0.4.0

21091dd

Arctic-0.4.0

Changelog for Arctic-0.4.0

New Features:

[Feature]: Support managing iceberg tables already existed #260
[Feature][Spark]: Standard auth utils method for Kyuubi #582
[Feature][AMS]: Show table info for native iceberg in AMS dashboard #531
[Feature][Spark]: Support duplicate-key-check when insert into and insert overwrite #484
[Feature][AMS]: Support add/modify by AMS Dashboard #412
[Feature][AMS]: Add setting page in AMS dashboard #407
[Feature]: Support table watermark to determine table freshness #394
[Feature]: AMS terminal uses Kyuubi as its backend SQL engine #262

Improvement:

[Improvement]: Hidden username and password in setting page #630
[Improvement]: Fix file upload when using derby as storage #622
[Improvement]: Expose arctic benchmark code and docker environment file #265
[Improvement]: Do some initialization when adding catalog #617
[Improvement]: Identify Hadoop and Hive version Arctic had supported #356
[Improvement]: Upgrade iceberg version to 0.13 or 0.14 #397
[Improvement]: Update the documentation in the Flink DML section #598
[Improvement][AMS]: Improve the high concurrency processing capability of the allocateTransactionId interface #393
[ARCTIC-557][FLINK] Doc catalog authorization configs #570
[Optimize] Risk of AMS memory usage and task execute timeout when processing a large number of small files #128

Bugfix:

[Bug]: Some bugs in the local test for version 0.4.0 bug #626
[Bug]: Including a line-through in the arctic HA property causes the catalog to fail to load #515
[Bug][Flink]: Read duplicated data from a keyed table, even though the upsert mode is enabled. #592
[Bug]: Flink 1.15 quickstart failed for java.lang.ClassNotFoundException #563
[ARCTIC-561] When the flink job fails, the flink job will commit a wrong transaction id #568
[Bug]: For the Arctic keyed table, the out-of-order TransactionId leads to inconsistent data #479
[Bug][Flink]: Watermark may be lost if split read too fast. #486

20 Oct 12:38

zhoujinsong

v0.3.2-rc1

bb1fa59

Arctic-0.3.2-rc1

Changelog for Arctic-0.3.2-rc1

New Features:

Support Insert/Delete/Update in Spark. #173
Support Spark version 2.3. #18 #21

Improvement:

Unify the scan.startup.mode config values for both Logstore and Filestore. #441
Add initialization and upgrade SQL scripts for every version. #362
Check for data loss after optimizing. #336
Check whether a Hive table can be upgraded before upgrading. #312
Support MySQL 8.0 as system database for AMS. #48

Bugfix:

Changestore's schema may be incorrect after adding new columns to a Hive table. #497
Reading from Filestore may fail for an empty table. #475
Optimizing may be blocked after a commit fails. #464
Table list may be incorrect with multiple Hive catalogs. #402
Cannot create Hive tables with uppercase field names. #380
Cannot upgrade Hive tables with multiple partition columns. #352
Insert overwrite into a Hive table may fail in Spark. #334

02 Sep 11:02

zhoujinsong

v0.3.1-rc1

375546f

Arctic-0.3.1-rc1

Changelog for Arctic-0.3.1-rc1

New Features:

Support Hive tables. #38
Arctic tables support real-time dimension table join with Flink. #94
Support Flink version 1.15. #166
AMS supports high availability mode. #117

Improvement:

Support create table like with Spark. #123
AMS supports listing table DDL operations. #116
AMS supports filtering the table and database lists. #115
AMS lists only the transactions which users committed. #86

Bugfix:

Flink may throw java.lang.ArrayIndexOutOfBoundsException when writing into an Arctic table. #169
Optimizer may not work after multiple streaming update/delete commits. #183
Data may be lost after optimizing. #240

22 Jul 09:24

zhoujinsong

v0.3.0-rc1

5bdcf29

arctic-0.3.0-rc1 Pre-release

arctic 0.3.0 rc1
