Prerequisites
Answering the following questions about the customer setup helps when planning project migration for a workspace:
-
How many groups of project migrations have been identified based on users/groups sharing and collaborating on projects?
-
Will a single user or team be migrated to CML in one shot or in batches? The recommendation is to migrate a user or team to the target CML in one shot. Otherwise, end users have to track where each project is running, because the same project cannot be used in the source and target CML simultaneously.
-
The list of users, teams, and projects to migrate needs to be tracked.
-
For each project:
- The size of each project should be known, to estimate the storage capacity needed on the bastion host.
- For batch migration, the total size of all projects in the batch needs to be considered when estimating storage capacity.
- Track how many models, applications, jobs, and sessions are running before migration. This is required for validation after migration. Validating and starting workloads is manual and is left to PS or the customer.
- If the customer is running third-party apps or services within a Source (CDSW/CML) application or model, they need to plan for testing these in the Target CML before starting project migration.
-
For each round of project migration
- Make sure workspace settings are in sync - https://github.com/cloudera/cmlutils/wiki/Prerequisites#migration-prerequisites
- Make sure Source (CDSW/CML) project files are backed up
-
Users / Groups
- How many total users in each Source (CDSW/CML) cluster need to be migrated?
- Will admins run the tool, or end users, or both? If admins run it, the project owner might need to be changed back to the original owner manually through the CML API after migration.
- Is the customer using Local Teams or CDP Teams in CDSW/CML that are not backed by an LDAP group? If so, it is the responsibility of admins to keep users and groups in sync after migration. To avoid this, make sure CDP Groups and CML Teams are always backed by an LDAP group.
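The planning questions above can be captured in a small inventory. The following is a minimal Python sketch and is not part of cmlutils: the `ProjectRecord` fields, the sample project names, and the 20% headroom are assumptions chosen for illustration. It totals project sizes for a batch to estimate bastion host storage, and compares the model, application, and job counts recorded before and after migration.

```python
from dataclasses import dataclass


@dataclass
class ProjectRecord:
    """Hypothetical per-project planning record (not a cmlutils type)."""
    name: str
    owner: str
    size_bytes: int
    models: int = 0
    applications: int = 0
    jobs: int = 0
    sessions: int = 0  # tracked for information; not compared below


def batch_storage_bytes(projects, headroom_percent=20):
    """Total size of all projects in a batch plus an assumed safety margin."""
    total = sum(p.size_bytes for p in projects)
    return total + total * headroom_percent // 100


def workload_mismatches(before, after):
    """Return {field: (before, after)} for workload counts that differ."""
    fields = ("models", "applications", "jobs")
    return {
        f: (getattr(before, f), getattr(after, f))
        for f in fields
        if getattr(before, f) != getattr(after, f)
    }


# Example batch with made-up projects.
batch = [
    ProjectRecord("churn-model", "alice", 2_000_000_000, models=1, jobs=3),
    ProjectRecord("forecasting", "bob", 1_000_000_000, applications=1),
]
print(batch_storage_bytes(batch))
```

An empty result from `workload_mismatches` means the recorded counts agree; any entry points at a workload type to check manually on the target.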
- Unix-like system (macOS or Linux).
- Connectivity to both Source and Target.
- rsync should be installed.
- Python version >=3.10.
- Sufficient disk space to hold project contents. For stronger security, the disk and/or file system should be encrypted.
- Set up custom CA certificates if required (see the Troubleshooting section).
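Some of the machine requirements above can be checked locally before starting. The following is a minimal Python sketch using only the standard library; the function names and the report keys are assumptions for illustration, and it does not check connectivity, disk encryption, or CA certificates.

```python
import shutil
import sys


def python_ok(version_info=sys.version_info, minimum=(3, 10)):
    """True if the interpreter meets the minimum Python version."""
    return tuple(version_info[:2]) >= minimum


def rsync_available():
    """True if an rsync executable is found on PATH."""
    return shutil.which("rsync") is not None


def has_free_space(path, required_bytes):
    """True if the file system holding `path` has at least required_bytes free."""
    return shutil.disk_usage(path).free >= required_bytes


def preflight(workdir=".", required_bytes=0):
    """Collect the local checks into one report (keys are illustrative)."""
    return {
        "unix_like": sys.platform in ("linux", "darwin"),
        "python>=3.10": python_ok(),
        "rsync": rsync_available(),
        "disk_space": has_free_space(workdir, required_bytes),
    }


if __name__ == "__main__":
    for check, passed in preflight().items():
        print(f"{check}: {'ok' if passed else 'MISSING'}")
```

Pass the estimated total project size as `required_bytes` to check the directory that will hold the project contents.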
-
Create a CML workspace on the Target with a configuration similar to the Source workspace.
To create a workspace on Public Cloud (AWS): https://docs.cloudera.com/machine-learning/cloud/workspaces/topics/ml-provision-workspaces.html
To create a workspace on Private Cloud: https://docs.cloudera.com/machine-learning/1.5.1/workspaces-privatecloud/topics/ml-pvc-provision-ml-workspace.html
-
All users, roles, and groups should be created on the target workspace before starting the migration. For more details, check the FAQs section.
-
All teams should be created on the target workspace before migration. The admin is expected to have prior information about the teams from the source workspace. To create teams, refer to this. Migration of projects created under a team context is supported.
-
All custom runtimes should be configured on the target workspace before running project migration. To create a new custom runtime, refer here; to add custom runtimes to the catalog, refer here.
-
The user migrating the project should be an Admin/Owner/Collaborator of the project. Please note that the user migrating the project will become the owner of the project in the target CML workspace. To change ownership of the project after migration, check the FAQs section.
-
The models/jobs/applications are created in a paused or stopped state in the destination workspace. It is recommended to halt all jobs/applications at the Source before migration to prevent any data corruption.
-
The workspace-level settings found under the "Site Administration" tab will not be migrated through the utility. Administrators will need to manually configure these settings in the Target workspace.
- User quota and custom quota settings.
- Runtime settings (Resource profile, Environment variables).
- Runtime addons are now supported in CML instead of the host mount feature of CDSW. (Applicable only if Source is CDSW)
- Security settings (authentication settings: local, LDAP, SAML, Kerberos configuration).
- General settings (SMTP settings)
- Please make sure that the rsync-enabled runtime image (cloudera/ml-runtime-workbench-python3.9-standard-rsync) is added to your CML workspace runtime catalog. If it is not, please ask your workspace admin to add it. The image is hosted on DockerHub. If the rsync image is not readily accessible, it can be built from the Dockerfile hosted here and pushed to any registry.
- Please make sure that your default SSH public key is added under your user settings. If not, add it in User Settings->Remote Editing->SSH public keys for session access. The default SSH public key should be available on your machine at ~/.ssh/<something>.pub. The SSH public key must be uploaded to both the source and target workspaces.
- If an SSH key pair doesn't already exist on your system, please create one. It is recommended to avoid setting a passphrase for the SSH key, because the utility establishes an SSH connection multiple times, and entering a passphrase each time makes automation tedious.
- Keep your Legacy API key at hand; you will need it during migration. To generate a new key, go to User Settings->API Keys->Legacy API key.
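For the SSH key item above, it can help to confirm which public keys exist locally before uploading one to both workspaces. The following is a minimal Python sketch; the function name `find_public_keys` is an assumption for illustration, and it only lists `*.pub` files without validating them.

```python
from pathlib import Path


def find_public_keys(ssh_dir=None):
    """Return the sorted *.pub files in ssh_dir (default: ~/.ssh)."""
    ssh_dir = Path(ssh_dir) if ssh_dir is not None else Path.home() / ".ssh"
    if not ssh_dir.is_dir():
        return []
    return sorted(p for p in ssh_dir.glob("*.pub") if p.is_file())


if __name__ == "__main__":
    keys = find_public_keys()
    if keys:
        for key in keys:
            print(key)
    else:
        print("No public key found; create a key pair first, e.g. with ssh-keygen.")
```

If the list is empty, a key pair can be created with a tool such as `ssh-keygen`, leaving the passphrase empty as recommended above.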