Add tool for migrating from local deep storage/Derby metadata #7598

Merged

merged 8 commits into apache:master on May 7, 2019

Conversation

@jon-wei jon-wei commented May 6, 2019

This PR adds a new tool under services meant to assist with the following use case:

Users sometimes begin evaluating Druid with a simple deployment that uses local deep storage and Derby. After using the evaluation deployment for some time and ingesting some segments, they wish to move to MySQL or PostgreSQL and/or to different deep storage, while keeping their old segments and ingestion setup.

This tool exports the contents of the following Druid tables:

  • segments
  • rules
  • config
  • datasource
  • supervisors

These tables were chosen because they hold entities that are neither transient (like task locks) nor purely historical (like task logs), since the tool is intended for migrations where the user shuts the entire cluster down.

The tool also allows users to specify a new S3 bucket/key, HDFS path, or new local filesystem path, and the entries from the segments table will be rewritten with new load specs. This is to assist with deep storage migration.

Currently, only migration from local deep storage combined with Derby metadata is supported (the use case described above); the tool could later be expanded to handle other use cases.
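
For illustration, here is a minimal sketch of what an export-plus-rewrite invocation might look like. It reuses only flags that appear elsewhere in this PR (`--connectURI`, `--hadoopStorageDirectory`, `-o`); the HDFS path and the loadSpec shapes in the comments are assumptions for illustration, not output copied from the tool.

```bash
# Sketch only: flags are taken from examples later in this PR; paths are placeholders.
cd ${DRUID_ROOT}
java -classpath "lib/*" \
  -Ddruid.extensions.directory="extensions" \
  -Ddruid.extensions.loadList=[] \
  org.apache.druid.cli.Main tools export-metadata \
  --connectURI "jdbc:derby://localhost:1527/var/druid/metadata.db;" \
  --hadoopStorageDirectory "hdfs://namenode:9000/druid/segments" \
  -o /tmp/csv
# With --hadoopStorageDirectory set, a local loadSpec such as
#   {"type":"local","path":"var/druid/segments/..."}
# would be rewritten to an HDFS loadSpec under the new storage directory.
```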


This helps users migrate segments stored in local deep storage to HDFS.

`--hadoopStorageDirectory`, `h`: The HDFS path that will hold the migrated segments

-h instead of h ?

Contributor Author

Fixed

@jihoonson jihoonson left a comment

Thanks @jon-wei. Please consider my comments below. Also, I tested the command in the doc and it emitted the error below. Would you please check it?

```
ERROR StatusLogger No log4j2 configuration file found. Using default configuration: logging only errors to the console.
May 06, 2019 11:55:17 AM org.hibernate.validator.internal.util.Version <clinit>
INFO: HV000001: Hibernate Validator 5.1.3.Final
Exception in thread "main" java.lang.RuntimeException: com.google.inject.CreationException: Unable to create injector, see the following errors:

1) A binding to com.google.common.base.Supplier<org.apache.druid.server.audit.SQLAuditManagerConfig> was already configured at org.apache.druid.guice.JsonConfigProvider.bind(JsonConfigProvider.java:151) (via modules: com.google.inject.util.Modules$OverrideModule -> org.apache.druid.metadata.storage.mysql.MySQLMetadataStorageModule).
  at org.apache.druid.guice.JsonConfigProvider.bind(JsonConfigProvider.java:151) (via modules: com.google.inject.util.Modules$OverrideModule -> org.apache.druid.metadata.storage.postgresql.PostgreSQLMetadataStorageModule)

2) A binding to org.apache.druid.server.audit.SQLAuditManagerConfig was already configured at org.apache.druid.guice.JsonConfigProvider.bind(JsonConfigProvider.java:152) (via modules: com.google.inject.util.Modules$OverrideModule -> org.apache.druid.metadata.storage.mysql.MySQLMetadataStorageModule).
  at org.apache.druid.guice.JsonConfigProvider.bind(JsonConfigProvider.java:152) (via modules: com.google.inject.util.Modules$OverrideModule -> org.apache.druid.metadata.storage.postgresql.PostgreSQLMetadataStorageModule)

2 errors
	at org.apache.druid.cli.GuiceRunnable.makeInjector(GuiceRunnable.java:71)
	at org.apache.druid.cli.ExportMetadata.run(ExportMetadata.java:159)
	at org.apache.druid.cli.Main.main(Main.java:118)
Caused by: com.google.inject.CreationException: Unable to create injector, see the following errors:

1) A binding to com.google.common.base.Supplier<org.apache.druid.server.audit.SQLAuditManagerConfig> was already configured at org.apache.druid.guice.JsonConfigProvider.bind(JsonConfigProvider.java:151) (via modules: com.google.inject.util.Modules$OverrideModule -> org.apache.druid.metadata.storage.mysql.MySQLMetadataStorageModule).
  at org.apache.druid.guice.JsonConfigProvider.bind(JsonConfigProvider.java:151) (via modules: com.google.inject.util.Modules$OverrideModule -> org.apache.druid.metadata.storage.postgresql.PostgreSQLMetadataStorageModule)

2) A binding to org.apache.druid.server.audit.SQLAuditManagerConfig was already configured at org.apache.druid.guice.JsonConfigProvider.bind(JsonConfigProvider.java:152) (via modules: com.google.inject.util.Modules$OverrideModule -> org.apache.druid.metadata.storage.mysql.MySQLMetadataStorageModule).
  at org.apache.druid.guice.JsonConfigProvider.bind(JsonConfigProvider.java:152) (via modules: com.google.inject.util.Modules$OverrideModule -> org.apache.druid.metadata.storage.postgresql.PostgreSQLMetadataStorageModule)

2 errors
	at com.google.inject.internal.Errors.throwCreationExceptionIfErrorsExist(Errors.java:470)
	at com.google.inject.internal.InternalInjectorCreator.initializeStatically(InternalInjectorCreator.java:155)
	at com.google.inject.internal.InternalInjectorCreator.build(InternalInjectorCreator.java:107)
	at com.google.inject.Guice.createInjector(Guice.java:99)
	at com.google.inject.Guice.createInjector(Guice.java:73)
	at com.google.inject.Guice.createInjector(Guice.java:62)
	at org.apache.druid.initialization.Initialization.makeInjectorWithModules(Initialization.java:419)
	at org.apache.druid.cli.GuiceRunnable.makeInjector(GuiceRunnable.java:68)
	... 2 more
```

To use the tool, you can run the following command:

```bash
java -classpath "lib/*:conf/druid/single-server/micro-quickstart/_common" org.apache.druid.cli.Main tools export-metadata -o /tmp/csv
```

Contributor

Maybe adding the below would make it clear where the current directory is.

$ cd ${DRUID_ROOT}

Contributor Author

Added `cd ${DRUID_ROOT}`


Example import commands for MySQL and PostgreSQL are shown below.

These example import commands expect `/tmp/csv` and its contents to be accessible from the server. For other options, such as importing from the client filesystem, please refer to the MySQL or PostgreSQL documentation.

Contributor

example -> examples?

Contributor Author

The usage here is fine
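
To make the commands being referenced concrete, here is a sketch of what such server-side CSV imports typically look like. The `druid_segments` table name (Druid's default table prefix) and the `segments.csv` file name are assumptions for illustration, not text from this PR; as the doc snippet notes, `/tmp/csv` must be readable by the database server.

```bash
# Sketch: server-side imports; adjust paths, credentials, and table names as needed.
# MySQL: LOAD DATA INFILE reads the CSV on the database server.
mysql -u druid -p druid -e \
  "LOAD DATA INFILE '/tmp/csv/segments.csv' INTO TABLE druid_segments
   FIELDS TERMINATED BY ',' ENCLOSED BY '\"';"

# PostgreSQL: COPY ... FROM likewise reads the file on the database server.
psql -U druid -d druid -c \
  "COPY druid_segments FROM '/tmp/csv/segments.csv' WITH (FORMAT csv);"
```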

- rules
- config
- datasource
- supervisors

Contributor

It looks like migrating only these tables would be enough for now, but if we extend this tool to support other types of deep storage in the future, it would probably be worth including audit, tasks, and tasklogs.

Contributor Author

Agree, it would be useful to support those in the future.

@@ -0,0 +1,169 @@
---
layout: doc_page
title: "Migrating Derby Metadata and Local Deep Storage"

Contributor

You can have a non-HA cluster with non-local deep storage and a Derby metadata store.

I think the metadata and deep storage migration should have separate docs.

Contributor

Some food for thought: when a user goes from a single server to a cluster, they should first read the deep storage migration doc.

When a user goes from a non-HA cluster to an HA cluster, they should read the metadata store migration doc.

Contributor Author

Hm, there may be some redundancy in the contents, but I'll look into splitting.

Contributor

Redundancy is fine as long as it is clear to the user what they should do.

Contributor Author

I made a separate page for the export-metadata tool, and separate pages for metadata/deep storage migration that reference the tool doc page.

@jon-wei jon-wei commented May 7, 2019

@jihoonson

> Also, I tested the command in the doc and it emitted the error below. Would you please check it?

I fixed this; I had simplified the example command too much and left out `-Ddruid.extensions.loadList=[]`. Thanks!


```bash
cd ${DRUID_ROOT}
java -classpath "lib/*" -Dlog4j.configurationFile=conf/druid/cluster/_common/log4j2.xml -Ddruid.extensions.directory="extensions" -Ddruid.extensions.loadList=[] org.apache.druid.cli.Main tools export-metadata --connectURI "jdbc:derby://localhost:1527/var/druid/metadata.db;" -o /tmp/csv
```

@jihoonson jihoonson May 7, 2019

Would you please add `mkdir -p /tmp/csv` too? Looks like the output directory must exist before running this command.

Contributor Author

Added that and a note about making sure the directory exists
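
Assembled from this thread, the documented sequence presumably ends up as follows; the `java` command is copied from the block above, and `mkdir -p` is the addition discussed here.

```bash
# Create the export directory first; the tool expects it to exist.
mkdir -p /tmp/csv
cd ${DRUID_ROOT}
java -classpath "lib/*" -Dlog4j.configurationFile=conf/druid/cluster/_common/log4j2.xml -Ddruid.extensions.directory="extensions" -Ddruid.extensions.loadList=[] org.apache.druid.cli.Main tools export-metadata --connectURI "jdbc:derby://localhost:1527/var/druid/metadata.db;" -o /tmp/csv
```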

@jihoonson jihoonson left a comment

+1 after CI. Thank you @jon-wei!

@jon-wei jon-wei merged commit dadf6a2 into apache:master May 7, 2019
@jihoonson jihoonson added this to the 0.15.0 milestone May 16, 2019