[Proposal] Add snapshot capability to the OpenSearch cluster. #278

Closed
prudhvigodithi opened this issue Sep 3, 2022 · 7 comments

@prudhvigodithi
Collaborator

prudhvigodithi commented Sep 3, 2022

Background:

As part of the next-phase roadmap, snapshot capability will be added to the operator so that OpenSearch cluster snapshots can be set up via the base YAML configuration.

Proposal:

snapshot: 
  type: s3
  snapshot_repo: <CUSTOM_NAME>
##User required settings to connect to s3
  settings:
    bucket: my-bucket
    another_setting: setting-value
    region: us-east-1
    base_path: os-snapshot

Design

With the custom plugin installation capability, it is now possible to install repository-s3 (pluginsList: ["repository-s3"]). Using this plugin, the first step is to add snapshot capability to the operator so that snapshots are stored in an S3 bucket.
Once a snapshot of type s3 is added to the YAML along with the user-configured settings, an API call will be invoked against the cluster.
Example:
curl -XPUT "https://localhost:9200/_snapshot/my_s3_repository_1?pretty" -H 'Content-Type: application/json' -d '{ "type": "s3", "settings": { "bucket": "opensearch-s3-snapshot", "region": "us-east-1", "base_path": "os-snapshot" } }'
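
The mapping from the proposed `snapshot` block to the register-repository call can be sketched as below. This is only an illustration in Python, not the operator's implementation; the function name `build_register_request` and the dictionary layout are assumptions that mirror the YAML keys in the proposal.

```python
import json


def build_register_request(snapshot: dict) -> tuple:
    """Translate a proposal-style `snapshot` block into an HTTP request.

    Returns (method, path, JSON body). The keys read here (type,
    snapshot_repo, settings) follow the proposed YAML; the helper
    itself is hypothetical.
    """
    repo_name = snapshot["snapshot_repo"]
    body = {
        "type": snapshot["type"],
        # Settings are passed through unchanged to the repository plugin.
        "settings": dict(snapshot.get("settings", {})),
    }
    return "PUT", f"/_snapshot/{repo_name}", json.dumps(body, sort_keys=True)


method, path, body = build_register_request(
    {
        "type": "s3",
        "snapshot_repo": "my_s3_repository_1",
        "settings": {
            "bucket": "opensearch-s3-snapshot",
            "region": "us-east-1",
            "base_path": "os-snapshot",
        },
    }
)
print(method, path)
print(body)
```

Passing `settings` through untouched keeps the sketch independent of any particular repository type, which matches the idea that only the settings differ between providers.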

Assumptions:

Sample AWS IAM policy to be added to the node role

{
  "Statement": [
    {
      "Action": [
        "s3:ListBucket",
        "s3:GetBucketLocation",
        "s3:ListBucketMultipartUploads",
        "s3:ListBucketVersions"
      ],
      "Effect": "Allow",
      "Resource": [
        "arn:aws:s3:::<BUCKET_NAME>"
      ]
    },
    {
      "Action": [
        "s3:GetObject",
        "s3:PutObject",
        "s3:DeleteObject",
        "s3:AbortMultipartUpload",
        "s3:ListMultipartUploadParts"
      ],
      "Effect": "Allow",
      "Resource": [
        "arn:aws:s3:::<BUCKET_NAME>/*"
      ]
    }
  ],
  "Version": "2012-10-17"
}

Snapshot Management and Shared responsibilities

  • The base snapshot setup will be created and done by the operator.
  • Users can set up a custom CronJob to take snapshots periodically, for example by calling PUT _snapshot/my_s3_repository_1/%3Csnapshot-%7Bnow%2Fd%7D%3E:
apiVersion: batch/v1
kind: CronJob
metadata:
  name: opensearch-snapshot-cron
spec:
  schedule: "@daily"
  concurrencyPolicy: Forbid
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: opensearch-snapshot-cron
            image: centos:7
            command:
            - /bin/bash
            args:
            - -c
            # The response could be checked with jq for {"accepted":true}.
            - 'curl -s -i -k -u admin:admin -XPUT "https://<SERVICE_NAME>:<PORT>/_snapshot/my_s3_repository_1/%3Csnapshot-%7Bnow%2Fd%7D%3E"'
          restartPolicy: OnFailure
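
The percent-encoded snapshot name in the CronJob URL is the date-math pattern `<snapshot-{now/d}>`. A small sketch of how that path could be produced, using only the Python standard library (the helper name `snapshot_path` is hypothetical):

```python
from urllib.parse import quote, unquote


def snapshot_path(repository: str, name_pattern: str = "<snapshot-{now/d}>") -> str:
    """Build the create-snapshot path with a percent-encoded snapshot name.

    safe="" forces "/" inside the date-math pattern to be encoded as %2F,
    so it is not mistaken for a path separator.
    """
    return f"/_snapshot/{quote(repository, safe='')}/{quote(name_pattern, safe='')}"


path = snapshot_path("my_s3_repository_1")
print(path)
# Decoding the last segment recovers the readable pattern.
print(unquote(path.rsplit("/", 1)[1]))
```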

Future Enhancement

@prudhvigodithi prudhvigodithi changed the title [Discussion] Add snapshot capability to the OpenSearch cluster. [Proposal] Add snapshot capability to the OpenSearch cluster. Sep 3, 2022
@max-frank
Contributor

max-frank commented Sep 5, 2022

If I may suggest, it might be worth making the proposed snapshot config accept a list. This way it would be possible to configure more than one snapshot policy, e.g., for snapshotting to different repositories (S3 + GCS) or snapshotting specific index patterns.

Also ideally the 4 repository plugins listed in the official docs should be supported (which should be easy enough given that only the repository settings would be different)

  • repository-azure
  • repository-gcs
  • repository-hdfs
  • repository-s3

@swoehrl-mw
Collaborator

My thoughts on this:

  • It should be possible to configure multiple repositories
  • At least s3, azure and gcs repositories should be supported so we cover the big 3 cloud providers
  • Not sure if this needs to be in the first iteration but we should offer users a way to automatically get regular snapshots (basically the operator should create the cronjob for them)
  • The operator should report an error to the user if the needed repository plugin is not in the plugin list
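
The plugin check from the last point could look roughly like the sketch below. It is written in Python purely for illustration and is not the operator's code; the type-to-plugin mapping is an assumption derived from the plugin names listed earlier in this thread, and `missing_plugins` is a hypothetical helper.

```python
# Assumed mapping from repository type to the plugin that provides it.
REQUIRED_PLUGIN = {
    "s3": "repository-s3",
    "azure": "repository-azure",
    "gcs": "repository-gcs",
    "hdfs": "repository-hdfs",
}


def missing_plugins(repositories: list, plugins_list: list) -> list:
    """Return one error message per repository whose plugin is not installed."""
    installed = set(plugins_list)
    errors = []
    for repo in repositories:
        plugin = REQUIRED_PLUGIN.get(repo["type"])
        if plugin is None:
            errors.append(f"repository {repo['name']}: unsupported type {repo['type']}")
        elif plugin not in installed:
            errors.append(f"repository {repo['name']}: plugin {plugin} is not in pluginsList")
    return errors


repos = [
    {"name": "my_s3_repository_1", "type": "s3"},
    {"name": "my_gcs_repository", "type": "gcs"},
]
for message in missing_plugins(repos, ["repository-s3"]):
    print(message)
```

Returning a list of messages rather than raising on the first problem lets every misconfigured repository be reported to the user at once.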

@prudhvigodithi
Collaborator Author

prudhvigodithi commented Sep 12, 2022

Hey @max-frank, thanks. As @swoehrl-mw mentioned, we can start with the s3, azure and gcs repositories. What I propose is to use snapshot: type: <cloud_provider> followed by the user-configured settings (s3 is used as the example above).
Also @swoehrl-mw, yes: to keep the initial rollout simple, I'm planning to add just the snapshot capability and leave the CronJob decision to the user. There is always an _sm policy that a user can configure from the dashboard, so the operator need not manage this overhead: https://opensearch.org/docs/latest/opensearch/snapshots/sm-api/. Rather than triggering an -XPUT API call from a CronJob, it is better to use a snapshot-management policy via the dashboard; failing that, the user can always create a CronJob to manage the snapshots. WDYT? @segalziv @idanl21 @dbason
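
For reference, a snapshot-management policy of the kind mentioned here might be built as in the following sketch. The field names (`creation.schedule.cron`, `snapshot_config.repository`) and the `_plugins/_sm/policies/<name>` path reflect my reading of the SM API documentation linked above and should be verified against it; the policy name `daily-snapshots` and the helper `build_sm_policy` are hypothetical.

```python
import json


def build_sm_policy(repository: str, cron_expression: str = "0 0 * * *",
                    timezone: str = "UTC") -> dict:
    """Build a minimal snapshot-management policy body (assumed schema)."""
    return {
        "description": f"Scheduled snapshots to {repository}",
        "creation": {
            # Cron schedule that decides when snapshots are created.
            "schedule": {"cron": {"expression": cron_expression, "timezone": timezone}}
        },
        # Repository must already be registered on the cluster.
        "snapshot_config": {"repository": repository},
    }


policy = build_sm_policy("my_s3_repository_1")
print("POST /_plugins/_sm/policies/daily-snapshots")
print(json.dumps(policy, indent=2))
```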

@swoehrl-mw
Collaborator

@prudhvigodithi: I suggest changing the config YAML structure a bit:

snapshot: 
  repository:
    name: <CUSTOM_NAME>
    type: s3
    settings:
      ##User required settings to connect to s3
      bucket: my-bucket
      another_setting: setting-value
      region: us-east-1
      base_path: os-snapshot

That way, if we later decide to add options to configure snapshot schedules and the like, we can add them under a key such as snapshot.schedules.

@max-frank
Contributor

Something to also consider in terms of backups is the new remote backend storage feature released with 2.3.

While the feature is still experimental for now, it would probably be a good idea to consider it during the design of the configuration here, since it is also based on repositories. This is just to make sure that whatever format is chosen here can also support the eventual addition of remote backend storage.

@ibotty
Contributor

ibotty commented Oct 19, 2022

One additional thing this needs for s3-compatible blob stores is setting the endpoint, etc. This has to be done in the opensearch.yml afaict.

https://opensearch.org/docs/latest/opensearch/snapshots/snapshot-restore#register-repository

swoehrl-mw added a commit that referenced this issue Apr 12, 2023
#### Related issue:
#278

#### Key Points:

- Added logic to configure the snapshot repo settings for the OpenSearch
cluster.
- Multiple snapshot repos can be configured at the same time.
- Added logic to run a k8s job to call OpenSearch API to add the user
configured snapshot repo settings.
- The setup expects the following prerequisites are met:
1) The related plugins (ex repository-s3) are installed using [add
plugins](https://github.com/Opster/opensearch-k8s-operator/blob/main/docs/userguide/main.md#add-plugins)
method.
2) The required roles/permissions for the backend cloud are pre-created.
3) Since adding a snapshot repo configuration should be done after all the
cluster nodes are up and ready, ensure the cluster is fully healthy
before adding the snapshot settings.
- Once the snapshot setting is added and applied, the user can create the
right policies from OpenSearch Dashboards (or via the snapshot management
API) to take snapshots to the repos configured by the operator.

#### Sample configuration:
NOTE: Add the `snapshot` setting and apply the config file, only after
the cluster is fully functional.
```
apiVersion: opensearch.opster.io/v1
kind: OpenSearchCluster
metadata:
  name: my-first-cluster
spec:
  security:
    tls:
       http:
         generate: true 
       transport:
         generate: true
         perNode: true
  general:
    snapshotRepositories: 
       - name: my_s3_repository_3
         type: s3
         settings:
          bucket: opensearch-s3-snapshot
          region: us-east-1
          base_path: os-snapshot_3
       - name: my_s3_repository_4
         type: s3
         settings:
          bucket: opensearch-s3-snapshot
          region: us-east-1
          base_path: os-snapshot_1
    httpPort: 9400
    serviceName: my-first-cluster
    version: 2.6.0
    pluginsList: ["repository-s3"]
    drainDataNodes: true
  dashboards:
    version: 2.6.0
    enable: true
    replicas: 1
  nodePools:
    - component: masters
      replicas: 3
      persistence:
        emptyDir: {}
      roles:
        - "data"
        - "cluster_manager"
```
@idanl21
Collaborator

idanl21 commented May 10, 2023

Added in the last release (v2.3.0) as a BETA feature. Closing the issue; please open a new one for any implementation work left for GA.
