Support Google Cloud Storage #501

Merged
merged 13 commits into main from d-kuro/gcs on Mar 24, 2023

Conversation

@d-kuro (Collaborator) commented on Feb 1, 2023

refs: #427

Add support for GCS to MOCO, based on the proposal.

We are using a mock GCS server for testing.
refs: https://github.com/fsouza/fake-gcs-server

To verify actual operation against GCP, the steps we performed locally are described below.

Steps

1. Create GCP secret

$ kubectl create secret generic example-sa-cred \
          --from-file=gcp_credentials.json=./gcp.json

2. Apply manifests

apiVersion: moco.cybozu.com/v1beta1
kind: MySQLCluster
metadata:
  namespace: default
  name: test
spec:
  replicas: 3
  backupPolicyName: daily
  podTemplate:
    spec:
      containers:
      - name: mysqld
        image: quay.io/cybozu/mysql:8.0.30
  volumeClaimTemplates:
  - metadata:
      name: mysql-data
    spec:
      accessModes: [ "ReadWriteOnce" ]
      resources:
        requests:
          storage: 1Gi
---
apiVersion: moco.cybozu.com/v1beta1
kind: BackupPolicy
metadata:
  name: daily
spec:
  schedule: "@daily"
  jobConfig:
    serviceAccountName: default
    env:
    - name: GOOGLE_APPLICATION_CREDENTIALS
      value: /var/secrets/google/gcp_credentials.json
    bucketConfig:
      bucketName: moco-test-backet
      endpointURL: https://storage.googleapis.com
      backendType: gcs
    workVolume:
      emptyDir: {}
    volumeMounts:
    - name: example-sa-cred
      mountPath: /var/secrets/google
    volumes:
    - name: example-sa-cred
      secret:
        secretName: example-sa-cred
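
To apply the manifests above, save them to a file and apply it with kubectl (the filename here is illustrative):

$ kubectl apply -f moco-gcs-test.yaml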

3. Create backup job

$ kubectl create job moco-backup-test --from=cronjob/moco-backup-test
job.batch/moco-backup-test created

4. Backup job logs
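
The logs below can be fetched with a standard kubectl invocation (assuming the job created in step 3):

$ kubectl logs job/moco-backup-test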

{"level":"info","ts":"2023-02-22T03:03:07Z","msg":"chosen source","index":1,"time":"20230222-030307","uuid":"e9730923-b25a-11ed-b451-2efa5c483f0a","binlog":"binlog.000001"}
Initializing...
Acquiring global read lock
Global read lock acquired
Initializing - done
Gathering information...
0 out of 4 schemas will be dumped and within them 0 tables, 0 views.
0 out of 11 users will be dumped.
Gathering information - done
All transactions have been started
Locking instance for backup
Global read lock has been released
Writing global DDL files
Writing users DDL
Writing schema metadata...
Writing DDL...
Writing table metadata...
Running data dump using 4 threads.
Dumping data...
Writing schema metadata - done
Writing DDL - done
Writing table metadata - done
Starting data dump
Dumping data - done
Dump duration: 00:00:00s
Total duration: 00:00:00s
Schemas dumped: 0
Tables dumped: 0
Uncompressed data size: 0 bytes
Compressed data size: 0 bytes
Compression ratio: 0.0
Rows written: 0
Bytes written: 0 bytes
Average uncompressed throughput: 0.00 B/s
Average compressed throughput: 0.00 B/s
{"level":"info","ts":"2023-02-22T03:03:07Z","msg":"work dir usage (full dump)","bytes":24576}
{"level":"info","ts":"2023-02-22T03:03:08Z","msg":"uploaded dump file","key":"moco/default/test/20230222-030307/dump.tar","bytes":10240}
{"level":"info","ts":"2023-02-22T03:03:08Z","msg":"backup finished successfully"}

5. Check GCP console

Screenshot 2023-02-22 12 09 15 (GCP console)
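
Alternatively, the upload can be checked from the command line; a quick sketch assuming gsutil is authenticated with the same service account:

$ gsutil ls gs://moco-test-backet/moco/default/test/20230222-030307/
gs://moco-test-backet/moco/default/test/20230222-030307/dump.tar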

@d-kuro force-pushed the d-kuro/gcs branch 5 times, most recently from f43b656 to b1e1cee on February 8, 2023 03:54
@d-kuro changed the title from "WIP: Support Google Cloud Storage" to "Support Google Cloud Storage" on Feb 22, 2023
@d-kuro requested a review from masa213f on February 22, 2023 03:45
@d-kuro self-assigned this on Feb 22, 2023
@d-kuro marked this pull request as ready for review on February 22, 2023 03:45
value: fake-gcs-server.default.svc:4443
bucketConfig:
  bucketName: moco
  endpointURL: http://fake-gcs-server.default.svc:4443
Contributor:

endpointURL is not used with the gcs backend. Please remove this field.

Collaborator (Author):

Thanks, I fixed it.
bf6744d

value: fake-gcs-server.default.svc:4443
bucketConfig:
  bucketName: moco
  endpointURL: http://fake-gcs-server.default.svc:4443
Contributor:

endpointURL is not used with the gcs backend. Please remove this field.

Collaborator (Author):

fixed: bf6744d

bucket := b.client.Bucket(b.name)

w := bucket.Object(key).NewWriter(ctx)
w.ChunkSize = int(decidePartSize(objectSize))
Contributor:

Does GCS have a limit on the maximum number of parts per upload?
If GCS has a limit that differs from Amazon S3's, we need to adjust accordingly.

Amazon S3 has the following limit, so MOCO adjusts the chunk size according to the backup file size. Please refer to #318.
https://docs.aws.amazon.com/AmazonS3/latest/userguide/qfacts.html

Maximum number of parts per upload: 10,000
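
For reference, a minimal sketch of this kind of adjustment (hypothetical; not MOCO's actual decidePartSize, and assuming S3's 10,000-part limit with a 16 MiB starting part size):

const (
	minPartSize = int64(16 << 20) // assumed starting part size: 16 MiB
	maxParts    = int64(10000)    // S3's documented maximum parts per upload
)

// decidePartSizeSketch doubles the part size until the whole object
// fits within maxParts parts.
func decidePartSizeSketch(objectSize int64) int64 {
	partSize := minPartSize
	for (objectSize+partSize-1)/partSize > maxParts {
		partSize *= 2
	}
	return partSize
}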

Collaborator (Author):

As far as I've researched, I haven't found any specific limit. However, specifying the chunk size involves a trade-off with memory, and the client library's default already seems well tuned. I think it's fine to leave the chunk size to the library's default and remove this code. What do you think?

The Go client library uses a buffer size that's equal to the chunk size. The buffer size must be a multiple of 256 KiB (256 x 1024 bytes). Larger buffer sizes typically make uploads faster, but note that there's a tradeoff between speed and memory usage. If you're running several resumable uploads concurrently, you should set Writer.ChunkSize to a value that's smaller than 16 MiB to avoid memory bloat.

https://cloud.google.com/storage/docs/resumable-uploads#go
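
To make the trade-off concrete, a minimal upload sketch against the cloud.google.com/go/storage client (illustrative only; not the PR's code). Writer.ChunkSize defaults to 16 MiB and, if set explicitly, must be a multiple of 256 KiB:

package example

import (
	"context"
	"io"

	"cloud.google.com/go/storage"
)

// upload streams r to gs://<bucket>/<key> as a resumable upload.
func upload(ctx context.Context, client *storage.Client, bucket, key string, r io.Reader) error {
	w := client.Bucket(bucket).Object(key).NewWriter(ctx)
	// Leaving w.ChunkSize at its default (16 MiB) buffers 16 MiB per upload;
	// lower it when running many uploads concurrently, e.g.:
	// w.ChunkSize = 8 << 20 // 8 MiB, a multiple of 256 KiB
	if _, err := io.Copy(w, r); err != nil {
		w.Close() // best effort; the upload has already failed
		return err
	}
	return w.Close() // Close finalizes the object and returns any upload error
}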

Collaborator (Author):

I added a comment about chunk size.
816f1e4

@d-kuro requested a review from masa213f on March 21, 2023 09:53
@masa213f (Contributor) left a comment:

LGTM. Thank you!

@masa213f merged commit 22e575d into main on Mar 24, 2023
@masa213f deleted the d-kuro/gcs branch on March 24, 2023 03:09