[BEAM-9889] Populate local instance of InfluxDB with data #12167

Merged
kamilwu merged 2 commits into apache:master from copying-influxdb-backups-to-gcs on Jul 15, 2020


kamilwu
Contributor

@kamilwu kamilwu commented Jul 2, 2020

The goal of these changes is to make creating and modifying Grafana dashboards easier by populating the local InfluxDB database with real data.

Two workflows have been implemented:

  • a cronjob that makes a full backup of InfluxDB and uploads it to a public GCS bucket. The bucket has object versioning[1] enabled and a lifecycle rule[2] that keeps only the 14 most recent versions of the backup.
  • a docker-compose service that starts just after InfluxDB, downloads the latest version of the backup, and restores its content.

[1] https://cloud.google.com/storage/docs/object-versioning
[2] https://cloud.google.com/storage/docs/lifecycle
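
The bucket settings described above could be reproduced roughly as follows. This is a minimal sketch, not the configuration actually used for this PR: the `lifecycle.json` file name and the `APPLY` switch are hypothetical, and the rule assumes that "keep the 14 most recent versions" maps to a `Delete` action with `numNewerVersions: 14`.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Bucket name as it appears in this PR; replace it with your own bucket.
BUCKET="gs://apache-beam-testing-metrics"
LIFECYCLE_FILE="lifecycle.json"   # hypothetical file name

# Delete any object version that has at least 14 newer versions,
# so that only the 14 most recent versions remain.
cat > "$LIFECYCLE_FILE" <<'EOF'
{
  "rule": [
    {
      "action": {"type": "Delete"},
      "condition": {"numNewerVersions": 14}
    }
  ]
}
EOF

# Touch the bucket only when APPLY=1 is set and gsutil is installed
# (this requires write access to the bucket).
if [ "${APPLY:-0}" = "1" ] && command -v gsutil >/dev/null 2>&1; then
  gsutil versioning set on "$BUCKET"
  gsutil lifecycle set "$LIFECYCLE_FILE" "$BUCKET"
else
  echo "Dry run: wrote $LIFECYCLE_FILE, bucket not modified"
fi
```

Running the script without `APPLY=1` only writes the JSON file, which makes it safe to inspect the rule before applying it.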


@kamilwu
Contributor Author

kamilwu commented Jul 2, 2020

R: @Ardagan Could you find a moment to review this?

cc: @iemejia

Contributor

@Ardagan Ardagan left a comment


Nice addition that can make local development easier.


WORKDIR /

RUN gsutil cp gs://apache-beam-testing-metrics/influxdb-backup.tar.gz . && \
Contributor

Are these backups available to non-committers?

Contributor Author

Everyone on the Internet has read-only access to the bucket. The Grafana dashboards are already public, so this doesn't change anything in terms of security or privacy.

  - mountPath: /backup
    name: shared-data
- name: copy-to-gsc-bucket
  image: gcr.io/apache-beam-testing/gsutil
Contributor

As far as I understand, we overwrite the same backup each time. Given that backups are created daily and no one is paged when something goes wrong, the backup could easily end up corrupted as well.

Though it is OK for the current PR, it would be best to add some rotation so that backups from the last 1-2 weeks stay available.

Contributor Author

Yes, we overwrite the same backup each time, but the bucket has object versioning enabled. According to the doc:

You enable Object Versioning for a bucket. Once enabled:

Cloud Storage creates a noncurrent version of an object each time you perform an overwrite or delete of the live version

If you run gsutil ls -la gs://apache-beam-testing-metrics, you will see all versions of the backup that were created so far.
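
Once a generation number is known from that listing, an older backup could be fetched along these lines. This is a minimal sketch under stated assumptions: the `versioned_url` helper and the `GENERATION` variable are hypothetical names for illustration, and the object path is taken from the Dockerfile snippet in this PR. The `object#generation` form is how gsutil addresses a specific version of an object.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Object path taken from the Dockerfile snippet in this PR.
OBJECT="gs://apache-beam-testing-metrics/influxdb-backup.tar.gz"

# Hypothetical helper: build a version-specific URL (object#generation).
versioned_url() {
  printf '%s#%s\n' "$1" "$2"
}

# GENERATION is an assumption: copy a real value from `gsutil ls -a "$OBJECT"`.
GENERATION="${GENERATION:-}"

if [ -n "$GENERATION" ] && command -v gsutil >/dev/null 2>&1; then
  # Download that specific version into the current directory.
  gsutil cp "$(versioned_url "$OBJECT" "$GENERATION")" \
    "influxdb-backup-${GENERATION}.tar.gz"
else
  echo "Set GENERATION to a value listed by: gsutil ls -a $OBJECT"
fi
```

Without `GENERATION` set, the script only prints a hint and downloads nothing.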

@Ardagan
Contributor

Ardagan commented Jul 6, 2020

I'm not very familiar with InfluxDB operations. It might be worth another pair of eyes taking a look at the overall backup approach.

@kamilwu
Contributor Author

kamilwu commented Jul 8, 2020

It might be worth another pair of eyes taking a look at the overall backup approach.

R: @iemejia, I think you were interested in this change. Could you verify if everything is fine?

@Ardagan
Contributor

Ardagan commented Jul 14, 2020

I guess we can merge this, since there have been no more comments for quite some time.

@kamilwu kamilwu force-pushed the copying-influxdb-backups-to-gcs branch from 73dbfaf to 6bd4aa5 Compare July 15, 2020 09:09
@kamilwu
Contributor Author

kamilwu commented Jul 15, 2020

Yeah, let's merge it then. Thanks @Ardagan

@kamilwu kamilwu merged commit 9c6ceb8 into apache:master Jul 15, 2020
@kamilwu kamilwu deleted the copying-influxdb-backups-to-gcs branch July 15, 2020 09:17