
HIVE-28121: Use direct SQL for transactional altering table parameter #5129

Closed
wants to merge 3 commits

Conversation

lirui-apache
Contributor

What changes were proposed in this pull request?

Use direct SQL to update a table parameter transactionally, and check the number of affected rows to detect concurrent writes.
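The compare-and-swap pattern described above (update only if the current value still matches, then inspect the affected-row count) can be sketched outside Hive with Python and sqlite3; the table and column names mirror the metastore schema, but the code is purely illustrative, not the actual patch:

```python
# Illustrative sketch (not Hive code): compare-and-swap update of a table
# parameter, detecting concurrent writers via the affected-row count.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE "TABLE_PARAMS" ("TBL_ID" INTEGER, "PARAM_KEY" TEXT, "PARAM_VALUE" TEXT)')
conn.execute('INSERT INTO "TABLE_PARAMS" VALUES (1, ?, ?)', ("metadata_location", "v1"))

def update_param_with_expected_value(tbl_id, key, expected, new):
    # The UPDATE matches only if the stored value still equals `expected`;
    # rowcount == 0 means a concurrent writer committed first.
    cur = conn.execute(
        'UPDATE "TABLE_PARAMS" SET "PARAM_VALUE" = ? '
        'WHERE "TBL_ID" = ? AND "PARAM_KEY" = ? AND "PARAM_VALUE" = ?',
        (new, tbl_id, key, expected))
    return cur.rowcount

assert update_param_with_expected_value(1, "metadata_location", "v1", "v2") == 1  # first writer wins
assert update_param_with_expected_value(1, "metadata_location", "v1", "v3") == 0  # lost the race
```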

Why are the changes needed?

Maintain consistency in case of concurrent writes.

Does this PR introduce any user-facing change?

No

Is the change a dependency upgrade?

No

How was this patch tested?

Covered by existing tests

@lirui-apache
Contributor Author

@pvary Could you please have a look? I'm unable to reproduce the failed test, and I suppose the no-lock feature is disabled by default.

@Override
public int updateParameterWithExpectedValue(Table table, String key, String expectedValue, String newValue)
throws MetaException {
String dml = String.format(
Contributor

Could we do something like this:
https://www.datanucleus.org/products/accessplatform_6_0/jdo/query.html#jdoql_bulkupdate

I would like to avoid:

  • Building query strings from table parameters
  • Using a native connection

If using JDO queries doesn't work, then I would create a method in MetaStoreDirectSql for this, and throw an exception if directSql is not turned on.

Contributor Author

Setting datanucleus.query.jdoql.allowAll=true allows us to run UPDATE with a JDO query. I tried something like this, but it doesn't work:

    Map<String, String> newParams = new HashMap<>(table.getParameters());
    newParams.put(key, newValue);
    openTransaction();
    Query query = pm.newQuery("UPDATE org.apache.hadoop.hive.metastore.model.MTable " +
        "SET parameters=:newparams WHERE database.name == :dbname && tableName == :tblname && " +
        "parameters.containsEntry(:key, :expval)");
    int affectedRows = (int) query.executeWithMap(ImmutableMap.of(
        "newparams", newParams,
        "dbname", table.getDbName(),
        "tblname", table.getTableName(),
        "key", key,
        "expval", expectedValue
    ));

The error is

Caused by: org.datanucleus.store.rdbms.sql.expression.IllegalExpressionOperationException: Cannot perform operation "==" on org.datanucleus.store.rdbms.sql.expression.MapExpression@6f9c5048 and org.datanucleus.store.rdbms.sql.expression.MapLiteral@5114b7c7
	at org.datanucleus.store.rdbms.sql.expression.SQLExpression.eq(SQLExpression.java:381)
	at org.datanucleus.store.rdbms.sql.expression.MapExpression.eq(MapExpression.java:80)
	at org.datanucleus.store.rdbms.query.QueryToSQLMapper.compileUpdate(QueryToSQLMapper.java:1134)

I believe it's because MapExpression only supports eq with null literals in DN.

Contributor

That's sad 😢

Then we need to fall back to directsql.
Something like this:

    String queryText = "UPDATE \"TABLE_PARAMS\" SET \"PARAM_VALUE\" = ? " +
        "WHERE \"TBL_ID\" = ? AND \"PARAM_KEY\" = ? AND \"PARAM_VALUE\" = ?";
    try (QueryWrapper query = new QueryWrapper(pm.newQuery("javax.jdo.query.SQL", queryText))) {
      Object[] params = new Object[4];
      params[0] = ...;
      long res = query.executeWithArray(params);
      ...
    }


Is it OK if I change the code as below?

    String queryText = "UPDATE \"TABLE_PARAMS\" SET \"PARAM_VALUE\" = ? " +
        "WHERE \"TBL_ID\" IN (SELECT TBL_ID FROM TBLS JOIN DBS ON DBS.DB_ID = TBLS.DB_ID " +
        "WHERE NAME = ? AND TBL_NAME = ?) AND \"PARAM_KEY\" = ? AND \"PARAM_VALUE\" = ?";
    try (QueryWrapper query = new QueryWrapper(pm.newQuery("javax.jdo.query.SQL", queryText))) {
      Object[] params = new Object[5]; // five placeholders with the subquery
      params[0] = ...;
      long res = query.executeWithArray(params);
      ...
    }

Because I haven't cherry-picked HIVE-22234, I can't get the table ID.

Contributor Author

Yes, for old versions where Table object doesn't have the ID, I think we need to retrieve it from TBLS. I'm planning to do this in PRs for the release branches.

Contributor

@pvary pvary left a comment


LGTM

Let's see if there are any further comments


@lirui-apache
Contributor Author

@pvary Thanks for the reviews. One thing to mention is this change adds a new exception message for commit conflict: The table has been modified. Updating expected key %s affects %d rows. So Iceberg needs to check for this message during commit. Let me know if you think that's not good.

And btw the change has been verified with Postgres, MySQL and MS SQL Server. I didn't manage to get a working Oracle instance.

@pvary
Contributor

pvary commented Mar 30, 2024

> @pvary Thanks for the reviews. One thing to mention is this change adds a new exception message for commit conflict: The table has been modified. Updating expected key %s affects %d rows. So Iceberg needs to check for this message during commit. Let me know if you think that's not good.
>
> And btw the change has been verified with Postgres, MySQL and MS SQL Server. I didn't manage to get a working Oracle instance.

That’s a good point. If possible we should keep the old message, so we do not create even more confusion across versions...

The check there is like this:

        if (e.getMessage()
            .contains(
                "The table has been modified. The parameter value for key '"
                    + HiveTableOperations.METADATA_LOCATION_PROP
                    + "' is")) {
          throw new CommitFailedException(
              e, "The table %s.%s has been modified concurrently", database, tableName);
        }

Either we can issue a query to check the new value, or change it to "the parameter value for key ... is different"
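The message-matching check quoted above can be sketched in Python; `CommitFailedException` and the constant name here are stand-ins for the real Iceberg classes, purely to illustrate why the exact wording matters:

```python
# Illustrative sketch (not Iceberg code): conflict detection by matching the
# HMS exception message, as in the Java snippet quoted above.
class CommitFailedException(Exception):
    pass

# Stand-in for HiveTableOperations.METADATA_LOCATION_PROP
METADATA_LOCATION_PROP = "metadata_location"

def translate_alter_error(e, database, table_name):
    # Substring matching ties Iceberg's conflict detection to the exact HMS
    # message text, which is why the wording must stay stable across versions.
    marker = ("The table has been modified. The parameter value for key '"
              + METADATA_LOCATION_PROP + "' is")
    if marker in str(e):
        raise CommitFailedException(
            "The table %s.%s has been modified concurrently" % (database, table_name)) from e
    raise e
```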

@lirui-apache
Contributor Author

@pvary I have made the exception messages consistent. Let me know if you have any further comments.

@pvary
Contributor

pvary commented Apr 4, 2024

@lirui-apache: we need to fix the ci errors

@deniskuzZ
Member

unrelated test failure, restarted the build


sonarcloud bot commented Apr 10, 2024

Quality Gate passed

Issues
8 New issues
0 Accepted issues

Measures
0 Security Hotspots
No data about Coverage
No data about Duplication

See analysis details on SonarCloud

@pvary
Contributor

pvary commented Apr 10, 2024

I tried to merge using the GitHub UI.
The merge got stuck in the UI, but it succeeded based on the current head: 7378962

On the UI the PR is still not merged.

@deniskuzZ: Could you please take a look, if you have some time, to confirm that everything is OK?
Thanks,
Peter

@lirui-apache
Contributor Author

Thanks @pvary for reviewing and merging the PR. I'm closing it manually.

@pvary
Contributor

pvary commented Apr 11, 2024

@lirui-apache: Thanks for the report and the fix @lirui-apache!

@deniskuzZ: Which branches are likely to get new releases?

If we want to fix all of the places where the previous PR was released, we might want to add this to:

  • branch-4.0
  • branch-3
  • branch-3.1
  • branch-2
  • branch-2.3

@deniskuzZ
Member

deniskuzZ commented Apr 11, 2024

Hey @pvary,
the fix is in, looks good! Thanks @lirui-apache for the patch.

git log --grep "HIVE-28121"

We are planning to release 4.0.1 in a month; not sure about other branches. Let me add the proper label to the ticket.

@lirui-apache
Contributor Author

> @lirui-apache: Thanks for the report and the fix @lirui-apache!
>
> @deniskuzZ: Which branches are likely to get new releases?
>
> If we want to fix all of the places where the previous PR was released, we might want to add this to:
>
>   • branch-4.0
>   • branch-3
>   • branch-3.1
>   • branch-2
>   • branch-2.3

The fix versions of HIVE-26882 are 2.3.10 and 4.0.0-beta-1. Does that mean we don't need this in 3.x?

@deniskuzZ
Member

HIVE-26882 was merged to 3.x and 3.1, so whoever maintains these branches has to cherry-pick

@lirui-apache
Contributor Author

@deniskuzZ OK, then I'll create PRs for these branches.

@chenwyi2

When I use this patch, the error is:
Caused by: java.lang.RuntimeException: MetaException(message:You have specified an SQL statement ("UPDATE "TABLE_PARAMS" SET "PARAM_VALUE" = ? WHERE "TBL_ID" in (select TBL_ID from TBLS join DBS ON DBS.DB_ID=TBLS.DB_ID WHERE NAME=? and TBL_NAME=? ) AND "PARAM_KEY" = ? AND "PARAM_VALUE" = ?") that doesnt start with SELECT. This is invalid.)
at org.apache.iceberg.relocated.com.google.common.base.Throwables.propagate(Throwables.java:249)
at org.apache.iceberg.common.DynMethods$UnboundMethod.invoke(DynMethods.java:75)
at org.apache.iceberg.hive.MetastoreUtil.alterTable(MetastoreUtil.java:75)
at org.apache.iceberg.hive.HiveTableOperations.lambda$persistTable$2(HiveTableOperations.java:329)
at org.apache.iceberg.ClientPoolImpl.run(ClientPoolImpl.java:58)

I use MySQL, and my code is:

    String queryText = "UPDATE \"TABLE_PARAMS\" SET \"PARAM_VALUE\" = ? " + "WHERE \"TBL_ID\" in " +
            "(select TBL_ID from TBLS join DBS ON DBS.DB_ID=TBLS.DB_ID WHERE NAME=? and TBL_NAME=? ) " +
            "AND \"PARAM_KEY\" = ? AND \"PARAM_VALUE\" = ?";
    List<String> pms = new ArrayList<>();
    pms.add(newValue);
    pms.add(table.getDbName());
    pms.add(table.getTableName());
    pms.add(key);
    pms.add(expectedValue);

    Query queryParams = pm.newQuery("javax.jdo.query.SQL", queryText);
    return (long) executeWithArray(queryParams, pms.toArray(), queryText);

@pvary
Contributor

pvary commented Apr 19, 2024

@chenwyi2: Could you please share the exception from the metastore side, and your metastore version?

Thanks, Peter

@lirui-apache
Contributor Author

@chenwyi2 Could you check whether datanucleus.query.sql.allowAll=true is set on HMS?
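For reference, that property would typically be set in the metastore configuration file (e.g. hive-site.xml); a sketch, assuming the standard Hadoop-style configuration format:

```xml
<property>
  <name>datanucleus.query.sql.allowAll</name>
  <value>true</value>
  <!-- Allows non-SELECT statements (such as the UPDATE above) via JDO SQL queries -->
</property>
```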

@chenwyi2

> @chenwyi2: Could you please share the exception from the metastore side, and your metastore version?
>
> Thanks, Peter

2024-04-19 12:03:06,849 ERROR [pool-9-thread-1]: metastore.RetryingHMSHandler (RetryingHMSHandler.java:invokeInternal(201)) - MetaException(message:JDOQL Single-String query should always start with SELECT)
at org.apache.hadoop.hive.metastore.ObjectStore$GetHelper.handleDirectSqlError(ObjectStore.java:3763)
at org.apache.hadoop.hive.metastore.ObjectStore$GetHelper.run(ObjectStore.java:3717)
at org.apache.hadoop.hive.metastore.ObjectStore.updateParameterWithExpectedValue(ObjectStore.java:775)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.hive.metastore.RawStoreProxy.invoke(RawStoreProxy.java:97)
at com.sun.proxy.$Proxy24.updateParameterWithExpectedValue(Unknown Source)
at org.apache.hadoop.hive.metastore.HiveAlterHandler.alterTable(HiveAlterHandler.java:171)
at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.alter_table_core(HiveMetaStore.java:5262)
at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.alter_table_with_environment_context(HiveMetaStore.java:5227)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:147)
at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:108)
at com.sun.proxy.$Proxy25.alter_table_with_environment_context(Unknown Source)
at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$alter_table_with_environment_context.getResult(ThriftHiveMetastore.java:15448)
at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$alter_table_with_environment_context.getResult(ThriftHiveMetastore.java:15432)
at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
at org.apache.hadoop.hive.metastore.security.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor$1.run(HadoopThriftAuthBridge.java:636)
at org.apache.hadoop.hive.metastore.security.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor$1.run(HadoopThriftAuthBridge.java:631)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1762)
at org.apache.hadoop.hive.metastore.security.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor.process(HadoopThriftAuthBridge.java:631)
at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)

And my metastore version is 3.1.2.

@chenwyi2

Sorry, it's my fault: I didn't add the parameter to the metastore config, like in 7bca1b3. Now it works fine.

@pan3793
Member

pan3793 commented Apr 20, 2024

should this go branch-2.3 too? @pvary @sunchao

@pvary
Contributor

pvary commented Apr 20, 2024

@pan3793: It takes some time to do all the backports and tests. If you create the PR, I would be happy to review

@sunchao
Member

sunchao commented Apr 20, 2024

@pan3793 @pvary I already created a git tag for 2.3.10 RC0: https://github.com/apache/hive/releases/tag/release-2.3.10-rc0. Let me know how important it is to backport this PR. I can wait if you decide to do so.

@pan3793
Member

pan3793 commented Apr 20, 2024

to @pvary I opened #5204 to backport it to branch-2.3
to @lirui-apache I don't have the test suite to verify this patch, could you please help to verify it?
to @sunchao from the context, I think this is kind of a correctness issue. I think we can have an RC0 for Spark integration testing, and create RC1 if the Hive community decides to accept this backport.

@sunchao
Member

sunchao commented Apr 20, 2024

OK, sounds good. I just created 2.3.10 RC0 and started a vote thread. Please check out the candidate and see if there are any issues in the Spark integration.

cc @dongjoon-hyun too.

@dongjoon-hyun
Member

Thank you, @sunchao and all.
