Add tool for migrating from local deep storage/Derby metadata #7598

Merged

merged 8 commits into apache:master on May 7, 2019

Conversation

@jon-wei jon-wei commented May 6, 2019

This PR adds a new tool under services meant to assist with the following use case:

Users sometimes begin evaluating Druid with a simple deployment that uses local deep storage and Derby. After using the evaluation deployment for some time and ingesting some segments, they wish to move to MySQL or PostgreSQL and/or to different deep storage, while keeping their old segments and ingestion setup.

This tool exports the contents of the following Druid tables:

  • segments
  • rules
  • config
  • datasource
  • supervisors

These tables were chosen because they hold entities that are neither transient (like task locks) nor purely historical (like task logs), since the tool is intended for migrations where the user shuts the entire cluster down.

The tool also allows users to specify a new S3 bucket/key, HDFS path, or new local filesystem path, and the entries from the segments table will be rewritten with new load specs. This is to assist with deep storage migration.

Currently, only migration from local deep storage combined with Derby metadata is supported (the use case described above); the tool could later be expanded to handle other use cases.
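
For illustration, here is a minimal sketch of what an export-plus-rewrite invocation might look like. It reuses only flags that appear elsewhere in this PR (`--connectURI`, `--hadoopStorageDirectory`, `-o`); the HDFS path and the loadSpec shapes in the comments are assumptions for illustration, not output copied from the tool.

```bash
# Sketch only: flags are taken from examples later in this PR; paths are placeholders.
cd ${DRUID_ROOT}
java -classpath "lib/*" \
  -Ddruid.extensions.directory="extensions" \
  -Ddruid.extensions.loadList=[] \
  org.apache.druid.cli.Main tools export-metadata \
  --connectURI "jdbc:derby://localhost:1527/var/druid/metadata.db;" \
  --hadoopStorageDirectory "hdfs://namenode:9000/druid/segments" \
  -o /tmp/csv
# With --hadoopStorageDirectory set, a local loadSpec such as
#   {"type":"local","path":"var/druid/segments/..."}
# would be rewritten to an HDFS loadSpec under the new storage directory.
```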


This helps users migrate segments stored in local deep storage to HDFS.

`--hadoopStorageDirectory`, `h`: The HDFS path that will hold the migrated segments

-h instead of h ?

Contributor Author

Fixed

@jihoonson jihoonson left a comment

Thanks @jon-wei. Please consider my comments below. Also, I tested the command in the doc and it emitted the error below. Would you please check it?

```
ERROR StatusLogger No log4j2 configuration file found. Using default configuration: logging only errors to the console.
May 06, 2019 11:55:17 AM org.hibernate.validator.internal.util.Version <clinit>
INFO: HV000001: Hibernate Validator 5.1.3.Final
Exception in thread "main" java.lang.RuntimeException: com.google.inject.CreationException: Unable to create injector, see the following errors:

1) A binding to com.google.common.base.Supplier<org.apache.druid.server.audit.SQLAuditManagerConfig> was already configured at org.apache.druid.guice.JsonConfigProvider.bind(JsonConfigProvider.java:151) (via modules: com.google.inject.util.Modules$OverrideModule -> org.apache.druid.metadata.storage.mysql.MySQLMetadataStorageModule).
  at org.apache.druid.guice.JsonConfigProvider.bind(JsonConfigProvider.java:151) (via modules: com.google.inject.util.Modules$OverrideModule -> org.apache.druid.metadata.storage.postgresql.PostgreSQLMetadataStorageModule)

2) A binding to org.apache.druid.server.audit.SQLAuditManagerConfig was already configured at org.apache.druid.guice.JsonConfigProvider.bind(JsonConfigProvider.java:152) (via modules: com.google.inject.util.Modules$OverrideModule -> org.apache.druid.metadata.storage.mysql.MySQLMetadataStorageModule).
  at org.apache.druid.guice.JsonConfigProvider.bind(JsonConfigProvider.java:152) (via modules: com.google.inject.util.Modules$OverrideModule -> org.apache.druid.metadata.storage.postgresql.PostgreSQLMetadataStorageModule)

2 errors
	at org.apache.druid.cli.GuiceRunnable.makeInjector(GuiceRunnable.java:71)
	at org.apache.druid.cli.ExportMetadata.run(ExportMetadata.java:159)
	at org.apache.druid.cli.Main.main(Main.java:118)
Caused by: com.google.inject.CreationException: Unable to create injector, see the following errors:

1) A binding to com.google.common.base.Supplier<org.apache.druid.server.audit.SQLAuditManagerConfig> was already configured at org.apache.druid.guice.JsonConfigProvider.bind(JsonConfigProvider.java:151) (via modules: com.google.inject.util.Modules$OverrideModule -> org.apache.druid.metadata.storage.mysql.MySQLMetadataStorageModule).
  at org.apache.druid.guice.JsonConfigProvider.bind(JsonConfigProvider.java:151) (via modules: com.google.inject.util.Modules$OverrideModule -> org.apache.druid.metadata.storage.postgresql.PostgreSQLMetadataStorageModule)

2) A binding to org.apache.druid.server.audit.SQLAuditManagerConfig was already configured at org.apache.druid.guice.JsonConfigProvider.bind(JsonConfigProvider.java:152) (via modules: com.google.inject.util.Modules$OverrideModule -> org.apache.druid.metadata.storage.mysql.MySQLMetadataStorageModule).
  at org.apache.druid.guice.JsonConfigProvider.bind(JsonConfigProvider.java:152) (via modules: com.google.inject.util.Modules$OverrideModule -> org.apache.druid.metadata.storage.postgresql.PostgreSQLMetadataStorageModule)

2 errors
	at com.google.inject.internal.Errors.throwCreationExceptionIfErrorsExist(Errors.java:470)
	at com.google.inject.internal.InternalInjectorCreator.initializeStatically(InternalInjectorCreator.java:155)
	at com.google.inject.internal.InternalInjectorCreator.build(InternalInjectorCreator.java:107)
	at com.google.inject.Guice.createInjector(Guice.java:99)
	at com.google.inject.Guice.createInjector(Guice.java:73)
	at com.google.inject.Guice.createInjector(Guice.java:62)
	at org.apache.druid.initialization.Initialization.makeInjectorWithModules(Initialization.java:419)
	at org.apache.druid.cli.GuiceRunnable.makeInjector(GuiceRunnable.java:68)
	... 2 more
```

To use the tool, you can run the following command:

```bash
java -classpath "lib/*:conf/druid/single-server/micro-quickstart/_common" org.apache.druid.cli.Main tools export-metadata -o /tmp/csv
```

Contributor

Maybe adding the below would make it clear where the current directory is.

$ cd ${DRUID_ROOT}

Contributor Author

Added `cd ${DRUID_ROOT}`


Example import commands for MySQL and PostgreSQL are shown below.

These example import commands expect `/tmp/csv` and its contents to be accessible from the server. For other options, such as importing from the client filesystem, please refer to the MySQL or PostgreSQL documentation.

Contributor

example -> examples?

Contributor Author

The usage here is fine
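
To make the commands being referenced concrete, here is a sketch of what such server-side CSV imports typically look like. The `druid_segments` table name (Druid's default table prefix) and the `segments.csv` file name are assumptions for illustration, not text from this PR; as the doc snippet notes, `/tmp/csv` must be readable by the database server.

```bash
# Sketch: server-side imports; adjust paths, credentials, and table names as needed.
# MySQL: LOAD DATA INFILE reads the CSV on the database server.
mysql -u druid -p druid -e \
  "LOAD DATA INFILE '/tmp/csv/segments.csv' INTO TABLE druid_segments
   FIELDS TERMINATED BY ',' ENCLOSED BY '\"';"

# PostgreSQL: COPY ... FROM likewise reads the file on the database server.
psql -U druid -d druid -c \
  "COPY druid_segments FROM '/tmp/csv/segments.csv' WITH (FORMAT csv);"
```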

- rules
- config
- datasource
- supervisors

Contributor

It looks like migrating only these tables would be enough for now, but if we extend this tool to support other types of deep storage in the future, it would probably be worth including audit, tasks, and tasklogs.

Contributor Author

Agree, it would be useful to support those in the future.

@@ -0,0 +1,169 @@
---
layout: doc_page
title: "Migrating Derby Metadata and Local Deep Storage"

Contributor

You can have a non-HA cluster with non-local deep storage and a Derby metadata store.

I think the metadata and deep storage migration should have separate docs.

Contributor

Some food for thought: when a user goes from a single server to a cluster, they should first read the deep storage migration doc.

When a user goes from a non-HA cluster to an HA cluster, they should read the metadata store migration doc.

Contributor Author

Hm, there may be some redundancy in the contents, but I'll look into splitting.

Contributor

Redundancy is fine as long as it is clear to the user what they should do.

Contributor Author

I made a separate page for the export-metadata tool, and separate pages for metadata/deep storage migration that reference the tool doc page.

@jon-wei jon-wei commented May 7, 2019

@jihoonson

> Also, I tested the command in the doc and it emitted the error below. Would you please check it?

I fixed this; I had simplified the example command too much and left out `-Ddruid.extensions.loadList=[]`. Thanks!


```bash
cd ${DRUID_ROOT}
java -classpath "lib/*" -Dlog4j.configurationFile=conf/druid/cluster/_common/log4j2.xml -Ddruid.extensions.directory="extensions" -Ddruid.extensions.loadList=[] org.apache.druid.cli.Main tools export-metadata --connectURI "jdbc:derby://localhost:1527/var/druid/metadata.db;" -o /tmp/csv
```

@jihoonson jihoonson May 7, 2019

Would you please add `mkdir -p /tmp/csv` too? Looks like the output directory must exist before running this command.

Contributor Author

Added that and a note about making sure the directory exists
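
Assembled from this thread, the documented sequence presumably ends up as follows; the `java` command is copied from the block above, and `mkdir -p` is the addition discussed here.

```bash
# Create the export directory first; the tool expects it to exist.
mkdir -p /tmp/csv
cd ${DRUID_ROOT}
java -classpath "lib/*" -Dlog4j.configurationFile=conf/druid/cluster/_common/log4j2.xml -Ddruid.extensions.directory="extensions" -Ddruid.extensions.loadList=[] org.apache.druid.cli.Main tools export-metadata --connectURI "jdbc:derby://localhost:1527/var/druid/metadata.db;" -o /tmp/csv
```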

@jihoonson jihoonson left a comment

+1 after CI. Thank you @jon-wei!

@jon-wei jon-wei merged commit dadf6a2 into apache:master May 7, 2019
@jihoonson jihoonson added this to the 0.15.0 milestone May 16, 2019