
[SUPPORT] MultiWriter w/ DynamoDB - Unable to acquire lock, lock object null #4456

Closed
nochimow opened this issue Dec 27, 2021 · 17 comments
Labels
aws-support priority:major degraded perf; unable to move forward; potential bugs

Comments

@nochimow

Hello,

I'm currently trying the multi-writer feature using the DynamoDB lock provider.
I followed all the steps documented at https://hudi.apache.org/docs/concurrency_control/ and also set the hoodie.write.lock.dynamodb.billing_mode=PAY_PER_REQUEST config, thanks to @bhasudha's advice on Slack.

After that I ended up with the following error: org.apache.hudi.exception.HoodieLockException: Unable to acquire lock, lock object null

This error happens when trying to write to an existing Hudi table.
Since the documentation has no details on how the DynamoDB table must be created, I created a simple DynamoDB table with a single String field as the partition key.
On Slack there are other users with the same problem, also AWS Glue users.

Environment Description

  • Hudi version : 0.10
  • Spark version : AWS Glue 2.0
  • Storage (HDFS/S3/GCS..) : S3

Stacktrace

Caused by: org.apache.hudi.exception.HoodieException: Unable to acquire lock, lock object null
at org.apache.hudi.internal.DataSourceInternalWriterHelper.commit(DataSourceInternalWriterHelper.java:86)
at org.apache.hudi.spark3.internal.HoodieDataSourceInternalBatchWrite.commit(HoodieDataSourceInternalBatchWrite.java:93)
at org.apache.spark.sql.execution.datasources.v2.V2TableWriteExec.writeWithV2(WriteToDataSourceV2Exec.scala:371)
... 69 more
Caused by: org.apache.hudi.exception.HoodieLockException: Unable to acquire lock, lock object null
at org.apache.hudi.client.transaction.lock.LockManager.lock(LockManager.java:82)
at org.apache.hudi.client.transaction.TransactionManager.beginTransaction(TransactionManager.java:64)
at org.apache.hudi.client.AbstractHoodieWriteClient.commitStats(AbstractHoodieWriteClient.java:186)
at org.apache.hudi.client.AbstractHoodieWriteClient.commitStats(AbstractHoodieWriteClient.java:171)
at org.apache.hudi.internal.DataSourceInternalWriterHelper.commit(DataSourceInternalWriterHelper.java:83)
... 71 more

@nsivabalan
Contributor

nsivabalan commented Dec 27, 2021

@zhedoubushishi : Can you take a look at this issue please? Feel free to create a JIRA and work on adding more documentation around DynamoDB locks. Or we could also think about writing a blog that covers it end to end.

@nsivabalan nsivabalan added this to Awaiting Triage in GI Tracker Board via automation Dec 27, 2021
@zhedoubushishi
Contributor

Can you provide the code you used? And how did you create the DynamoDB table?

@nsivabalan
Contributor

@nochimow : a gentle reminder to respond to the question above. The commenter above is a Hudi committer who added the DynamoDB lock provider, so he should be able to help in your case.

@nochimow
Author

Hi there, my code basically reads some Avro files into a DataFrame, then we write that DataFrame to a Hudi table.
I'm using the following Hudi configs during the write (it's a Python job on AWS Glue 3.0):

"hoodie.datasource.write.keygenerator.class": "org.apache.hudi.keygen.ComplexKeyGenerator",
"hoodie.datasource.write.payload.class": "org.apache.hudi.common.model.DefaultHoodieRecordPayload",
"hoodie.datasource.hive_sync.partition_extractor_class": "org.apache.hudi.hive.MultiPartKeysValueExtractor",
"hoodie.table.name": table_name,
"hoodie.datasource.write.recordkey.field": IDX_COL,
"hoodie.datasource.write.partitionpath.field": pks,
"hoodie.datasource.write.hive_style_partitioning": "true",
"hoodie.datasource.write.precombine.field": tiebreaker,
"hoodie.datasource.write.operation": operation,
"hoodie.write.lock.provider": "org.apache.hudi.aws.transaction.lock.DynamoDBBasedLockProvider",
"hoodie.write.lock.dynamodb.table": ...,
"hoodie.write.lock.dynamodb.partition_key": ...,
"hoodie.write.lock.dynamodb.region": ...,
"hoodie.write.lock.dynamodb.billing_mode": "PAY_PER_REQUEST",

My DynamoDB table is a simple one with just the partition_key field as a String. Is there any recommendation on how the DynamoDB structure has to be?
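For reference, a minimal sketch (not from the thread) of the lock-related options above combined with the two OCC settings that appear in the working configs later in this thread (hoodie.write.concurrency.mode and hoodie.cleaner.policy.failed.writes); all argument values are placeholders, not the reporter's actual values:

```python
def dynamodb_lock_options(lock_table, partition_key, region):
    """Hudi write options for DynamoDB-based optimistic concurrency control.

    lock_table, partition_key, and region are placeholder arguments; pass
    your own DynamoDB lock table name, partition key value, and AWS region.
    """
    return {
        # OCC must be enabled alongside the lock provider.
        "hoodie.write.concurrency.mode": "optimistic_concurrency_control",
        "hoodie.cleaner.policy.failed.writes": "LAZY",
        # DynamoDB lock provider settings.
        "hoodie.write.lock.provider": "org.apache.hudi.aws.transaction.lock.DynamoDBBasedLockProvider",
        "hoodie.write.lock.dynamodb.table": lock_table,
        "hoodie.write.lock.dynamodb.partition_key": partition_key,
        "hoodie.write.lock.dynamodb.region": region,
        "hoodie.write.lock.dynamodb.billing_mode": "PAY_PER_REQUEST",
    }
```

These options can be merged into the write-config dict shown above before calling the Hudi writer.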

@xushiyan xushiyan added aws-support priority:major degraded perf; unable to move forward; potential bugs labels Jan 18, 2022
@nsivabalan
Contributor

@zhedoubushishi : When you get a chance, can you please follow up?

@zhedoubushishi
Contributor

Sorry for the late reply. If you set hoodie.write.lock.dynamodb.table to a table that does not exist, Hudi will just create one and set up the structure automatically. Or did you have to create the DynamoDB lock table yourself?
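As a rough illustration (an assumption based on the lock provider's defaults, not taken from the thread), the auto-created lock table uses a single String hash key; a sketch of an equivalent create-table request, with `lock_table_spec` being a hypothetical helper name:

```python
def lock_table_spec(table_name):
    """Request body for a DynamoDB table equivalent to the one Hudi
    auto-creates for locking: a single String hash key named "key"
    (attribute name is an assumption), pay-per-request billing."""
    return {
        "TableName": table_name,
        "KeySchema": [{"AttributeName": "key", "KeyType": "HASH"}],
        "AttributeDefinitions": [{"AttributeName": "key", "AttributeType": "S"}],
        "BillingMode": "PAY_PER_REQUEST",
    }

# To create the table yourself (requires AWS credentials):
#   import boto3
#   boto3.client("dynamodb", region_name="us-west-2").create_table(**lock_table_spec("hudi-locks"))
```

Letting Hudi create the table avoids any mismatch between a hand-made key schema and what the provider expects.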

@mainamit

I am facing the same issue. I created a DynamoDB table with a single column as per the config below, where the DynamoDB table and partition key are passed when writing the data.
So are you suggesting just passing the table name, and it will be created, so we don't have to specify the partition key?

#'hoodie.write.concurrency.mode' : 'optimistic_concurrency_control',
#'hoodie.cleaner.policy.failed.writes' : 'LAZY',
##'hoodie.write.lock.provider' : 'org.apache.hudi.hive.HiveMetastoreBasedLockProvider',
#'hoodie.write.lock.provider' : 'org.apache.hudi.aws.transaction.lock.DynamoDBBasedLockProvider',
#'hoodie.write.lock.hivemetastore.database' : 'test_db',
#'hoodie.write.lock.hivemetastore.table' : new_table_name,
#'hoodie.write.lock.dynamodb.table' : 'dynamo_db_table',
#'hoodie.write.lock.dynamodb.partition_key' : 'tablename',
#'hoodie.write.lock.dynamodb.region' : 'eu-central-1',
#'hoodie.write.lock.dynamodb.billing_mode' : 'PAY_PER_REQUEST',

@nsivabalan
Contributor

@zhedoubushishi : Can you follow up here please and help unblock the user?

@zhedoubushishi
Contributor

zhedoubushishi commented Feb 16, 2022

I couldn't reproduce this issue; this is the config I used:

.option("hoodie.write.concurrency.mode", "optimistic_concurrency_control")
.option("hoodie.cleaner.policy.failed.writes", "LAZY")
.option("hoodie.write.lock.provider", "org.apache.hudi.aws.transaction.lock.DynamoDBBasedLockProvider")
.option("hoodie.write.lock.dynamodb.table", "hudi")
.option("hoodie.write.lock.dynamodb.partition_key", tableName)
.option("hoodie.write.lock.dynamodb.endpoint_url", "dynamodb.us-west-2.amazonaws.com")
.option("hoodie.write.lock.dynamodb.region", "us-west-2")

I didn't create the DynamoDB table in advance.

@nochimow
Author

Since I had to roll back to Hudi 0.9 due to the Redshift Spectrum incompatibility, I cannot track this issue anymore. @mainamit, can you follow up on this issue since you also face the same problem?

@nsivabalan
Contributor

@mainamit : let us know if you were able to get it working. Feel free to close out the GitHub issue. If you are still facing the issue, do ping with more details. Wenning should be able to assist you.

@nsivabalan
Contributor

@mainamit : do you have any updates for us?

@mainamit

@nsivabalan I am also using 0.9 now and have worked around this by loading the data sequentially. If you have a working example of this, it would be great, but I cannot test with 0.10 due to constraints on my end.

@nsivabalan
Contributor

nsivabalan commented Mar 8, 2022

@mainamit : the only difference I see between yours and what @zhedoubushishi provided is .option("hoodie.write.lock.dynamodb.endpoint_url", "dynamodb.us-west-2.amazonaws.com"). Can you set the right value for endpoint_url and give it a try please?
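A minimal helper, assuming the standard public DynamoDB endpoint naming seen in the working config above (`dynamodb_endpoint` is a hypothetical name for illustration):

```python
def dynamodb_endpoint(region):
    """Build the public DynamoDB endpoint_url for a region, following the
    standard "dynamodb.<region>.amazonaws.com" pattern (an assumption for
    regions beyond the us-west-2 example shown in this thread)."""
    return f"dynamodb.{region}.amazonaws.com"
```

So a writer in eu-central-1 would pass "hoodie.write.lock.dynamodb.endpoint_url" set to dynamodb_endpoint("eu-central-1") alongside the matching region option.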

@nsivabalan
Contributor

Thanks! Closing this for now. Please reach out if you are looking for any more assistance.

@atharvai

I'm facing this exception regularly but at varying time periods. This is with Hudi v0.11, EMR 6.6, Spark 3.2.0. Here's a link to the details in the Hudi Slack: https://apache-hudi.slack.com/archives/C4P8Y739U/p1658319412980229?thread_ts=1658319412.980229&cid=C4P8Y739U

@koochiswathiTR

Hello @nsivabalan @zhedoubushishi ,

I am facing the same exception [Unable to acquire lock, lock object null].
Do we not need to create a DynamoDB table ahead of time? Will Hudi create the DynamoDB table automatically?
Please confirm.


7 participants