Add tool for migrating from local deep storage/Derby metadata #7598
Conversation
This helps users migrate segments stored in local deep storage to HDFS.

`--hadoopStorageDirectory`, `h`: The HDFS path that will hold the migrated segments
`-h` instead of `h`?
Fixed
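For context, a hedged sketch of how this option could be passed, reusing the micro-quickstart classpath from the example later in this conversation; the HDFS path is a placeholder, not a value taken from the doc:

```bash
# Hypothetical invocation; /druid/segments is a placeholder HDFS target path.
java -classpath "lib/*:conf/druid/single-server/micro-quickstart/_common" \
  org.apache.druid.cli.Main tools export-metadata \
  --hadoopStorageDirectory /druid/segments \
  -o /tmp/csv
```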
Thanks @jon-wei. Please consider my comments below. Also, I tested the command in the doc and it emitted the error below. Would you please check it?
ERROR StatusLogger No log4j2 configuration file found. Using default configuration: logging only errors to the console.
May 06, 2019 11:55:17 AM org.hibernate.validator.internal.util.Version <clinit>
INFO: HV000001: Hibernate Validator 5.1.3.Final
Exception in thread "main" java.lang.RuntimeException: com.google.inject.CreationException: Unable to create injector, see the following errors:
1) A binding to com.google.common.base.Supplier<org.apache.druid.server.audit.SQLAuditManagerConfig> was already configured at org.apache.druid.guice.JsonConfigProvider.bind(JsonConfigProvider.java:151) (via modules: com.google.inject.util.Modules$OverrideModule -> org.apache.druid.metadata.storage.mysql.MySQLMetadataStorageModule).
at org.apache.druid.guice.JsonConfigProvider.bind(JsonConfigProvider.java:151) (via modules: com.google.inject.util.Modules$OverrideModule -> org.apache.druid.metadata.storage.postgresql.PostgreSQLMetadataStorageModule)
2) A binding to org.apache.druid.server.audit.SQLAuditManagerConfig was already configured at org.apache.druid.guice.JsonConfigProvider.bind(JsonConfigProvider.java:152) (via modules: com.google.inject.util.Modules$OverrideModule -> org.apache.druid.metadata.storage.mysql.MySQLMetadataStorageModule).
at org.apache.druid.guice.JsonConfigProvider.bind(JsonConfigProvider.java:152) (via modules: com.google.inject.util.Modules$OverrideModule -> org.apache.druid.metadata.storage.postgresql.PostgreSQLMetadataStorageModule)
2 errors
at org.apache.druid.cli.GuiceRunnable.makeInjector(GuiceRunnable.java:71)
at org.apache.druid.cli.ExportMetadata.run(ExportMetadata.java:159)
at org.apache.druid.cli.Main.main(Main.java:118)
Caused by: com.google.inject.CreationException: Unable to create injector, see the following errors:
1) A binding to com.google.common.base.Supplier<org.apache.druid.server.audit.SQLAuditManagerConfig> was already configured at org.apache.druid.guice.JsonConfigProvider.bind(JsonConfigProvider.java:151) (via modules: com.google.inject.util.Modules$OverrideModule -> org.apache.druid.metadata.storage.mysql.MySQLMetadataStorageModule).
at org.apache.druid.guice.JsonConfigProvider.bind(JsonConfigProvider.java:151) (via modules: com.google.inject.util.Modules$OverrideModule -> org.apache.druid.metadata.storage.postgresql.PostgreSQLMetadataStorageModule)
2) A binding to org.apache.druid.server.audit.SQLAuditManagerConfig was already configured at org.apache.druid.guice.JsonConfigProvider.bind(JsonConfigProvider.java:152) (via modules: com.google.inject.util.Modules$OverrideModule -> org.apache.druid.metadata.storage.mysql.MySQLMetadataStorageModule).
at org.apache.druid.guice.JsonConfigProvider.bind(JsonConfigProvider.java:152) (via modules: com.google.inject.util.Modules$OverrideModule -> org.apache.druid.metadata.storage.postgresql.PostgreSQLMetadataStorageModule)
2 errors
at com.google.inject.internal.Errors.throwCreationExceptionIfErrorsExist(Errors.java:470)
at com.google.inject.internal.InternalInjectorCreator.initializeStatically(InternalInjectorCreator.java:155)
at com.google.inject.internal.InternalInjectorCreator.build(InternalInjectorCreator.java:107)
at com.google.inject.Guice.createInjector(Guice.java:99)
at com.google.inject.Guice.createInjector(Guice.java:73)
at com.google.inject.Guice.createInjector(Guice.java:62)
at org.apache.druid.initialization.Initialization.makeInjectorWithModules(Initialization.java:419)
at org.apache.druid.cli.GuiceRunnable.makeInjector(GuiceRunnable.java:68)
... 2 more
To use the tool, you can run the following command:

```bash
java -classpath "lib/*:conf/druid/single-server/micro-quickstart/_common" org.apache.druid.cli.Main tools export-metadata -o /tmp/csv
```
Maybe adding the below would make it clear where the current directory is:
`$ cd ${DRUID_ROOT}`
Added `cd ${DRUID_ROOT}`.
Example import commands for MySQL and PostgreSQL are shown below.

These example import commands expect `/tmp/csv` and its contents to be accessible from the server. For other options, such as importing from the client filesystem, please refer to the MySQL or PostgreSQL documentation.
example -> examples?
The usage here is fine
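To make the referenced import step concrete, here is a rough sketch of what server-side imports of one exported table might look like; the database name, user, table name, and CSV file name are assumptions for illustration and are not taken from the doc.

```bash
# MySQL: server-side load of an exported CSV (database, table, and path are placeholders).
mysql -u druid -p druid -e "LOAD DATA INFILE '/tmp/csv/druid_segments.csv' INTO TABLE druid_segments FIELDS TERMINATED BY ',' ENCLOSED BY '\"';"

# PostgreSQL: server-side COPY of the same file (database, table, and path are placeholders).
psql -U druid -d druid -c "COPY druid_segments FROM '/tmp/csv/druid_segments.csv' WITH (FORMAT csv);"
```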
- rules
- config
- datasource
- supervisors
It looks like migrating only these tables would be enough for now, but if we extend this tool to support other types of deep storage in the future, it would probably be worth including `audit`, `tasks`, and `tasklogs`.
agree, it would be useful to support those in the future
@@ -0,0 +1,169 @@
---
layout: doc_page
title: "Migrating Derby Metadata and Local Deep Storage"
You can have a non-HA cluster with non-local deep storage and a Derby metadata store. I think the metadata and deep storage migration should have separate docs.
Some food for thought:
- when a user goes from single server to cluster, they should first read the deep storage migration doc
- when a user goes from a non-HA cluster to an HA cluster, they should read the metadata store migration doc
Hm, there may be some redundancy in the content, but I'll look into splitting.
Redundancy is fine as long as it is clear to the user what they should do.
I made a separate page for the `export-metadata` tool and made separate pages for metadata/deep storage migration, which reference the tool doc page.

I fixed this, I had simplified the example command too much and left out
```bash
cd ${DRUID_ROOT}
java -classpath "lib/*" -Dlog4j.configurationFile=conf/druid/cluster/_common/log4j2.xml -Ddruid.extensions.directory="extensions" -Ddruid.extensions.loadList=[] org.apache.druid.cli.Main tools export-metadata --connectURI "jdbc:derby://localhost:1527/var/druid/metadata.db;" -o /tmp/csv
```
Would you please add `mkdir -p /tmp/csv` too? It looks like the output directory must exist before running this command.
Added that and a note about making sure the directory exists
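Putting the feedback in this thread together, the documented sequence would presumably read roughly as follows; the connect URI and paths follow the example above and may differ for a real deployment.

```bash
# Ensure the output directory exists before exporting (per the review note above).
mkdir -p /tmp/csv
cd ${DRUID_ROOT}
java -classpath "lib/*" \
  -Dlog4j.configurationFile=conf/druid/cluster/_common/log4j2.xml \
  -Ddruid.extensions.directory="extensions" \
  -Ddruid.extensions.loadList=[] \
  org.apache.druid.cli.Main tools export-metadata \
  --connectURI "jdbc:derby://localhost:1527/var/druid/metadata.db;" \
  -o /tmp/csv
```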
+1 after CI. Thank you @jon-wei!
This PR adds a new tool under `services` meant to assist with the following use case: users sometimes begin evaluating Druid with a simple deployment that uses local deep storage and Derby. After using the evaluation deployment for some time and ingesting some segments, they wish to move to MySQL or PostgreSQL and/or use a different deep storage, while keeping their old segments and ingestion setup.
This tool exports the contents of the following Druid tables: segments, rules, config, datasource, and supervisors.

These tables are chosen because they contain entities that are neither transient (such as task locks) nor historical (such as task logs), as the tool is intended for migrations where the user shuts the entire cluster down.
The tool also allows users to specify a new S3 bucket/key, HDFS path, or new local filesystem path, and the entries from the segments table will be rewritten with new load specs. This is to assist with deep storage migration.
Currently, only migration from local deep storage combined with Derby metadata is supported (the use case described above); the tool could later be expanded to handle other use cases.
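As a sketch of what the load spec rewriting means for the HDFS case, assuming the standard Druid `local` and `hdfs` load spec types; the exact field layout and paths are illustrative, not output captured from the tool:

```bash
# Illustrative only: a segment's loadSpec before and after specifying a new HDFS path
# via --hadoopStorageDirectory (paths are placeholders).
# Before (local deep storage): {"type":"local","path":"var/druid/segments/wiki/.../index.zip"}
# After  (HDFS deep storage):  {"type":"hdfs","path":"/druid/segments/wiki/.../index.zip"}
```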