This repo: https://github.com/Azure-Samples/azure-cosmos-db-mongo-migration
- Reason(s) for the migration
- Other goals beyond moving the data to another DB
- Source Database
- Target Database(s)
- Number of databases and collections/containers
- Amount of data
- Network bandwidth to Azure
- Batch vs Streaming
- Global Locations / Regions
- Timeframe
- Other Requirements / Constraints
- Is it a verbatim migration?
- Do you want to optimize/refactor the design?
- Do you want to optimize for CosmosDB costs and Autoscaling?
- Large documents over 2MB
- Sharding / Horizontal Partitioning / Partition Keys
- CosmosDB/Mongo 4.x
- CosmosDB/Core API (aka - CosmosDB/SQL API)
- Other
- https://docs.microsoft.com/en-us/azure/cosmos-db/mongodb/mongodb-introduction
- https://docs.microsoft.com/en-us/azure/cosmos-db/mongodb/feature-support-42
- https://docs.microsoft.com/en-us/azure/cosmos-db/mongodb/feature-support-40
- https://docs.microsoft.com/en-us/azure/cosmos-db/mongodb/feature-support-36
- https://docs.microsoft.com/en-us/azure/cosmos-db/mongodb/feature-support-32
- Linux/Bash/Python
- DotNet
- Azure Data Factory
- Other
- The Application Developers
- The DevOps team
- Third-party Vendor, Partner, or Systems Integrator
- Microsoft
- mongoexport/mongoimport or mongodump/mongorestore
- Azure Data Factory
- Data Migration Service (DMS) - tutorial
- Data Migration Service (DMS) - features
- Data Migration Tool
- Pre-Migration Assessment
- Data Migration Assistant Notebook
- Also, the code-based migration process in this repository
This project framework provides the following features:
- MongoDB to CosmosDB Data Migration
- Optional Data Transformation
- Several migration approaches are supported, including:
  - mongoexport from the source database is used in all cases
  - optional verbatim migrations
  - optional mongoexport document wrangling/transformation
  - optional target database schema/collection refactoring
  - optional analysis of mongoexport files for candidate CosmosDB partition keys
  - loading CosmosDB with mongoimport, a DotNet client program, or Azure Data Factory (ADF)
- The process uses metadata-driven code generation to produce executable artifacts:
  - shell scripts
  - configuration files
  - Azure Data Factory JSON code files
- A user-edited mapping file maps source databases to target databases
  - this mapping file can itself be generated
- The intent is a highly automated migration process:
  - supports migrating dozens or hundreds of databases, and thousands of collections
  - zero human edits of the generated code/artifacts are necessary
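As an illustration of the metadata-driven approach, a mapping file pairs each source database/collection with a target, and a generator renders executable scripts from it. The field names and template below are hypothetical simplifications for illustration only; the repository itself uses Jinja2 templates under `m2c/templates` and its own metadata schema:

```python
from string import Template

# Hypothetical mapping: source db/collection -> target db/collection.
# The real project stores this in a user-editable metadata file;
# these field names are illustrative, not the project's actual schema.
mapping = [
    {"source_db": "olympics", "source_coll": "games",
     "target_db": "olympics", "target_coll": "games"},
    {"source_db": "openflights", "source_coll": "routes",
     "target_db": "travel", "target_coll": "routes"},
]

# A tiny stdlib template standing in for the repo's Jinja2 templates;
# it renders one mongoexport command per source collection.
export_tmpl = Template(
    "mongoexport --db $source_db --collection $source_coll "
    "--out data/mongoexports/$source_db/$source_coll.json\n"
)

def generate_export_script(mapping):
    """Render an executable shell script from the mapping metadata."""
    lines = ["#!/bin/bash\n"]
    lines += [export_tmpl.substitute(row) for row in mapping]
    return "".join(lines)

script = generate_export_script(mapping)
print(script)
```

The same pattern extends to the other generated artifacts (configuration files, ADF JSON), which is what makes zero-edit automation feasible.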
See the Documentation and clone this repository.
- git
- python 3
- mongo tooling - mongoexport, mongoimport, mongo shell
- Developer workstation of any OS (macOS, Windows, Linux)
- Azure CLI program, az
- Ubuntu Virtual Machine(s) in Azure to execute the majority of the process
- Docker for executing the Reference Application locally
- Your source-control system that is network-accessible to Azure VMs
- See the docs on workstation setup
Clone this GitHub repository to your Development workstation and/or VM with network access to your current MongoDB database(s).
$ git clone https://github.com/Azure-Samples/azure-cosmos-db-mongo-migration.git
Then copy this project to your Git or other source-control system.
It is assumed that your source-control system (GitHub, Azure DevOps, etc.) has network access to Azure VMs.
See the Documentation, Summary of Actions and Scripts to Execute
A Reference Application is included in this repository to show how to use this migration process.
The reference application database can be deployed either to a Docker-containerized MongoDB instance or to your development MongoDB instance.
See running MongoDB locally as a Docker container.
Directory m2c/ contains the implementation of this migration process, while directory reference_app/ contains a working example of this process.
├── docs <-- project documentation in markdown format
│
├── m2c <-- The root Mongo-To-Cosmos (m2c) implementation directory
│ ├── az <-- az CLI scripts for Azure Resource Provisioning
│ │ └── uvm <-- files related to the Ubuntu Linux Azure VM used in this process
│ │ └── scripts <-- setup scripts to execute on an Ubuntu Linux Azure VM
│ ├── dotnet_mongo_loader <-- C# program to load the target Cosmos/Mongo database
│ ├── plots <-- png visualizations created by mongoexport_pk_analyze.py
│ ├── pysrc <-- python source files
│ ├── templates <-- Python Jinja2 templates for code generation
│
└── reference_app
│
├── artifacts <-- the generated code artifacts in this directory and below
│ ├── adf <-- the generated Azure Data Factory artifacts
│ │ ├── dataset
│ │ ├── linkedService
│ │ └── pipeline
│ └── shell <-- generated shell scripts to execute on a linux VM
│ ├── data
│ ├── dotnet_mongo_loader <-- the DotNet loader program
│ ├── mongo <-- generated mongo scripts for CosmosDB target database
│
├── data
│ ├── metadata <-- metadata, mapping, and manifest files
│ └── mongoexports <-- the mongoexport files from the source database
│ ├── olympics <-- one subdirectory for each source database
│ └── openflights
│
└── mongo_docker <-- this directory can run the Mongo Docker container
├── data_wrangling <-- "private" directory for creating the loadable reference data
├── mongo <-- mongodb "ddl" scripts to create the reference databases
├── olympics <-- the data for the olympics reference database
│ ├── import_json
│ └── raw
└── openflights <-- the data for the openflights (travel) reference database
├── import_json
└── raw
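The partition-key analysis step (implemented in `m2c/pysrc`, with visualizations written to `m2c/plots`) works on mongoexport output, which contains one JSON document per line. A minimal sketch of the underlying idea, using illustrative sample documents and field names rather than the repository's actual code:

```python
import json
from collections import Counter

# Three lines as they might appear in a mongoexport .json file
# (one JSON document per line); the values are illustrative.
mongoexport_lines = [
    '{"_id": "1", "country": "GRE", "year": 1896}',
    '{"_id": "2", "country": "GRE", "year": 1906}',
    '{"_id": "3", "country": "USA", "year": 1904}',
]

def pk_candidate_stats(lines, field):
    """Count distinct values of a candidate partition-key field.

    High cardinality and low skew across values generally indicate
    a better CosmosDB partition-key candidate.
    """
    counts = Counter()
    for line in lines:
        doc = json.loads(line)
        if field in doc:
            counts[doc[field]] += 1
    return counts

print(pk_candidate_stats(mongoexport_lines, "country"))
# Counter({'GRE': 2, 'USA': 1})
```

Running this over each candidate field of a real mongoexport file yields the value distributions that the analysis step plots for partition-key selection.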