---
products:
- Alauda DevOps
kind:
- Solution
---

# Harbor Registry Storage Migration: PVC to S3

## Issue

This guide provides step-by-step instructions for migrating Harbor registry data from PVC (Persistent Volume Claim) storage to S3-compatible storage. This migration helps improve scalability and reduces storage management overhead.

## Environment

This solution is compatible with Alauda Build of Harbor v2.12.z.

## Resolution

### Prerequisites

Before starting the migration, ensure you have:

- **Important**: A fully deployed Harbor instance with `read-only mode` enabled. To enable read-only mode, navigate in the Harbor web UI to `Administration → Configuration → System Settings → Repository Read Only`.
- **Important**: Since Harbor must remain in read-only mode during the migration, it is recommended to rehearse the process in a test environment first, estimate the migration time, and schedule a sufficient maintenance window.
- An S3-compatible storage service (MinIO, Ceph, AWS S3, etc.) with appropriate access credentials.
- A pre-created S3 bucket for storing Harbor registry data.
- Download and sync the rclone migration tool image to your internal registry for use in subsequent steps:

```txt
# Download URL for China Region
https://cloud.alauda.cn/attachments/knowledge/337969938/rclone-amd64.tgz
https://cloud.alauda.cn/attachments/knowledge/337969938/rclone-arm64.tgz

# Download URLs for Other Regions
https://cloud.alauda.io/attachments/knowledge/337969938/rclone-amd64.tgz
https://cloud.alauda.io/attachments/knowledge/337969938/rclone-arm64.tgz
```
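Because the maintenance window depends on how much registry data must be moved, a back-of-the-envelope estimate helps with planning. A minimal sketch (the 500 GiB data volume, 50 MiB/s throughput, and 30% small-object overhead factor are illustrative assumptions, not measured values):

```shell
# Rough maintenance-window estimate: data size / sustained throughput, padded
# by an overhead factor to account for many small blob objects.
data_gib=500          # registry data volume in GiB (assumption)
throughput_mib_s=50   # sustained rclone throughput in MiB/s (assumption)
awk -v g="$data_gib" -v t="$throughput_mib_s" \
  'BEGIN { printf "estimated hours: %.1f\n", g * 1024 / t * 1.3 / 3600 }'
# prints: estimated hours: 3.7
```

Measure the actual throughput in the test-environment rehearsal and substitute your own numbers.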

### S3 Region Configuration

#### How to Determine the Correct Region

Refer to your S3 provider's official documentation to determine the correct region for your service; most providers expose this information in their console or dashboard.

### Migration Process

#### Migrate Registry Data to S3

This section describes how to migrate existing Harbor registry data from PVC to S3 storage using rclone. The migration process includes:

1. **Data Synchronization**: Copy all registry data from PVC to S3
2. **Data Verification**: Verify the integrity of migrated data

> **Important**: `rclone sync` deletes any objects in the destination bucket that do not exist in the source. Use a new or empty bucket, or run the sync with `--dry-run` first to preview what would change.

Execute the following script to perform the migration:

```bash
export S3_HOST=http://xxxxx:xxx # S3 storage endpoint
export S3_PROVIDER=Minio # Configure based on S3 type. Supported providers: Minio, Ceph, AWS, etc. Refer to: https://rclone.org/docs/#configure
export S3_KEY_ID=xxxx
export S3_ACCESS_KEY=xxxxx
export S3_BUCKET=harbor # Create this bucket in S3 beforehand
export S3_REGION=us-east-1 # If your S3 service has no regions, this is not needed; if it does, set it and uncomment "region = $S3_REGION" in the config below
export SYNC_IMAGE=rclone/rclone:1.71.0 # Replace with your internal registry image
export HARBOR_REGISTRY_PVC=xxxxx
export HARBOR_NS=xxxxx

cat > sync-and-check-s3.yaml <<EOF
apiVersion: v1
data:
  rclone.conf: |-
    [harbor-s3]
    type = s3
    provider = $S3_PROVIDER
    env_auth = false
    access_key_id = $S3_KEY_ID
    secret_access_key = $S3_ACCESS_KEY
    endpoint = $S3_HOST
    acl = private
    # Add region configuration if your S3 service requires it
    # region = $S3_REGION
kind: ConfigMap
metadata:
  name: s3-config
  namespace: $HARBOR_NS
---
apiVersion: batch/v1
kind: Job
metadata:
  name: sync-and-check-s3
  namespace: $HARBOR_NS
spec:
  backoffLimit: 0
  template:
    spec:
      restartPolicy: Never
      automountServiceAccountToken: false
      initContainers:
        # Step 1: Sync data to S3
        - image: $SYNC_IMAGE
          imagePullPolicy: IfNotPresent
          name: sync-data
          args:
            - sync
            - /data
            - harbor-s3:$S3_BUCKET
            - --progress
            # For a trial run that only reports what would change, add: - --dry-run
          resources:
            limits:
              cpu: 4
              memory: 4Gi
            requests:
              cpu: 1
              memory: 1Gi
          volumeMounts:
            - mountPath: /root/.config/rclone/
              name: rclone-config
            - mountPath: /data
              name: data
      containers:
        # Step 2: Check/verify the sync
        - image: $SYNC_IMAGE
          imagePullPolicy: IfNotPresent
          name: check-sync
          args:
            - check
            - /data
            - harbor-s3:$S3_BUCKET
            - --one-way
            - --progress
          resources:
            limits:
              cpu: 4
              memory: 4Gi
            requests:
              cpu: 1
              memory: 1Gi
          volumeMounts:
            - mountPath: /root/.config/rclone/
              name: rclone-config
            - mountPath: /data
              name: data
      volumes:
        - configMap:
            name: s3-config
          name: rclone-config
        - name: data
          persistentVolumeClaim:
            claimName: $HARBOR_REGISTRY_PVC
EOF

kubectl apply -f sync-and-check-s3.yaml
```
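One detail of the script worth noting: the heredoc delimiter `EOF` is unquoted, so the shell substitutes the exported `$...` variables while writing the manifest; the variables must therefore be exported before `cat` runs. A minimal demonstration (file name and bucket value are arbitrary):

```shell
export S3_BUCKET=harbor
# Unquoted EOF: the shell expands $S3_BUCKET as it writes the file
cat > /tmp/heredoc-demo.yaml <<EOF
bucket: $S3_BUCKET
EOF
cat /tmp/heredoc-demo.yaml   # prints: bucket: harbor
```

Quoting the delimiter (`<<'EOF'`) would write the literal string `$S3_BUCKET` instead, which is not what this migration script wants.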

#### Migration Verification

Monitor the migration progress (optional):

```bash
kubectl logs -n $HARBOR_NS -l job-name=sync-and-check-s3 -c sync-data -f
```

A log entry containing `0 differences found` indicates successful synchronization:

```bash
export HARBOR_NS=xxxxx
kubectl logs -n $HARBOR_NS -l job-name=sync-and-check-s3 | grep "0 differences found"
Defaulted container "check-sync" out of: check-sync, sync-data (init)
2025/09/01 07:30:12 NOTICE: S3 bucket harbor: 0 differences found
```
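Because `grep` exits non-zero when the marker line is absent, the same check can gate a scripted cutover. A minimal sketch using a canned log line (in practice, pipe the `kubectl logs` output instead):

```shell
# Simulated check-sync output; replace with: kubectl logs -n $HARBOR_NS -l job-name=sync-and-check-s3
log='2025/09/01 07:30:12 NOTICE: S3 bucket harbor: 0 differences found'
if printf '%s\n' "$log" | grep -q "0 differences found"; then
  echo "verification passed"           # safe to proceed with the S3 cutover
else
  echo "verification FAILED - keep Harbor on PVC storage" >&2
fi
```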

#### Update Harbor Configuration to Use S3 Storage

After successfully migrating the data, update the Harbor configuration to use S3 storage instead of PVC. This step configures Harbor to read and write registry data directly from/to the S3 bucket.

Create a Kubernetes Secret containing S3 access credentials. The secret must include the following keys that Harbor registry expects:

- `REGISTRY_STORAGE_S3_ACCESSKEY`: Base64-encoded S3 access key
- `REGISTRY_STORAGE_S3_SECRETKEY`: Base64-encoded S3 secret key

```yaml
apiVersion: v1
data:
  REGISTRY_STORAGE_S3_ACCESSKEY: <base64-encoded-access-key>
  REGISTRY_STORAGE_S3_SECRETKEY: <base64-encoded-secret-key>
kind: Secret
metadata:
  name: s3-secret
  namespace: <harbor-namespace> # Replace with your Harbor namespace
type: Opaque
```
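The base64 values can be produced with `printf` piped to `base64`; use `printf '%s'` rather than `echo` so no trailing newline is encoded (the credential below is a placeholder, not a real key):

```shell
# Encode a placeholder access key for the Secret manifest
printf '%s' 'minioadmin' | base64   # prints: bWluaW9hZG1pbg==
```

Alternatively, `kubectl create secret generic s3-secret --from-literal=REGISTRY_STORAGE_S3_ACCESSKEY=... --from-literal=REGISTRY_STORAGE_S3_SECRETKEY=... --dry-run=client -o yaml` emits an equivalent manifest with the encoding done for you.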

Add the following content to the Harbor resource (note that any existing storage configuration other than the registry's must be preserved):

```yaml
apiVersion: operator.alaudadevops.io/v1alpha1
kind: Harbor
metadata:
  name: harbor
spec:
  helmValues:
    persistence:
      enabled: true
      # Add the following content
      imageChartStorage:
        disableredirect: true
        s3:
          existingSecret: s3-secret # A Secret containing the S3 access key and secret key
          bucket: harbor # Storage bucket created in S3 cluster
          region: us-east-1 # S3 region (required for AWS S3, optional for MinIO/Ceph)
          regionendpoint: http://xxxxx # S3 cluster access address; note that the access port must be included
          v4auth: true
          secure: false # Set to true if the endpoint uses HTTPS
          # skipverify: true # Uncomment if the HTTPS endpoint uses a self-signed certificate
        type: s3
      # END
```

### Verification and Testing

After completing the configuration update, verify that the migration was successful by testing Harbor functionality:

1. **Test Docker Operations**: Log in to Harbor locally and verify that docker push/pull operations work correctly
2. **Check Storage**: Confirm that new images are being stored in the S3 bucket
3. **Verify Existing Images**: Ensure that previously migrated images can still be pulled successfully