Skip to content

Commit

Permalink
Merge pull request #31 from coralproject/feat/CORL-2494
Browse files Browse the repository at this point in the history
[CORL 2494] Refactor
  • Loading branch information
wyattjoh committed Apr 25, 2022
2 parents e936e28 + a30397d commit 04e1c3b
Show file tree
Hide file tree
Showing 47 changed files with 1,224 additions and 1,850 deletions.
8 changes: 4 additions & 4 deletions .github/workflows/release.yml
Expand Up @@ -14,12 +14,12 @@ jobs:
# Note the fetch-depth: 0 option is required for the change log to
# work correctly with goreleaser.
fetch-depth: 0
- uses: actions/setup-go@v2
- uses: actions/setup-go@v3
with:
go-version: "^1.15.6"
- uses: golangci/golangci-lint-action@v2.5.2
go-version: "^1.18"
- uses: golangci/golangci-lint-action@v3.1.0
with:
version: v1.35
version: v1.45.2
- name: Unit testing
run: go test ./...
- uses: goreleaser/goreleaser-action@v2
Expand Down
10 changes: 5 additions & 5 deletions .github/workflows/test.yml
Expand Up @@ -11,16 +11,16 @@ jobs:
runs-on: ubuntu-18.04
steps:
- uses: actions/checkout@v2
- uses: actions/setup-go@v2
- uses: actions/setup-go@v3
with:
go-version: "^1.15.6"
- uses: golangci/golangci-lint-action@v2.5.2
go-version: "^1.18"
- uses: golangci/golangci-lint-action@v3.1.0
with:
version: v1.35
version: v1.45.2
- name: Unit testing
run: go test ./...
- name: Validate Generated Files
run: |
go install github.com/mailru/easyjson/easyjson@v0.7.7
go generate ./...
git diff --quiet || exit 1
git diff --quiet || exit 1
5 changes: 5 additions & 0 deletions .golangci.yml
@@ -0,0 +1,5 @@
issues:
exclude-rules:
- text: "unknown JSON option \"intern\""
linters:
- staticcheck
2 changes: 2 additions & 0 deletions .goreleaser.yml
Expand Up @@ -6,6 +6,7 @@ builds:
- linux
goarch:
- amd64
- arm64

archives:
- files:
Expand All @@ -15,6 +16,7 @@ release:
github:
owner: coralproject
name: coral-importer
prerelease: auto

checksum:
name_template: "checksums.txt"
Expand Down
248 changes: 129 additions & 119 deletions README.md
Expand Up @@ -7,6 +7,116 @@ to download a release of the `coral-importer` tool.

## Strategies


### Legacy

This import strategy is designed to migrate data from Coral ^4.12.0 to
^6.17.1.

This strategy requires you to stop any Coral instances that are
interacting with the MongoDB database that Coral uses. The first step to
this process is to dump the files from the remote MongoDB database to
your local machine to perform the migration:

```bash
#!/bin/bash

# If this script errors, stop.
set -e

# Set this to the TALK_MONGO_URL used by Coral.
export TALK_MONGO_URL="..."

# Set this to a folder where we'll export the documents from your ^4 database.
export CORAL_INPUT_DIRECTORY="$PWD/coral/input"

# Set this to a folder where we'll export the documents to be uploaded to your
# ^6 database.
export CORAL_OUTPUT_DIRECTORY="$PWD/coral/output"

# Make the directories used by this and the following tools.
mkdir -p "${CORAL_INPUT_DIRECTORY}" "${CORAL_OUTPUT_DIRECTORY}"

# Dump each collection to the export directory. This operation can take some
# time for larger data sets.
collections=(actions assets comments settings users)
for collection in ${collections[*]}
do
mongoexport --uri "$TALK_MONGO_URL" -c $collection -o "${CORAL_INPUT_DIRECTORY}/${collection}.json"
done

# Set this to the ID of your Tenant. If you're importing from a ^4 instance
# and do not have a Tenant ID, generate one using `uuidgen`.
export CORAL_TENANT_ID="c2440817-464e-4a8f-8851-24effd8fee9d"

# Set this to the ID of your Site. If you're importing from a ^4 instance
# and do not have a Site ID, generate one using `uuidgen`.
export CORAL_SITE_ID="3f183f3d-205f-41da-881a-e5089057b78f"

# This importer tool is designed to work with Coral at thw following migration
# version. This is the newest file in the
# https://github.com/coralproject/talk/tree/develop/src/core/server/services/migrate/migrations
# directory for your version of Coral.
export CORAL_MIGRATION_ID="1582929716101"

# Set this to the file location where you want to export your log files to.
export CORAL_LOG="$PWD/coral/logs.json"

# Run the importer tool in dry mode to perform document validation before we
# actually write any files. This may take some time and will use about 40% of
# the dataset's size in RAM to perform the validation.
coral-importer legacy --dryRun

# If the previous command completed successfully, then you can run it for real.
coral-importer legacy

# This should write all the output files to the `$CORAL_OUTPUT_DIRECTORY`
# directory. If you did not use SSO with your ^4 instance of Coral using
# plugins, you can continue below to the *Importing* section. Otherwise.

# Set this to the name of a directory we can write output files that have
# been mapped.
export CORAL_MAPPER_POST_DIRECTORY="$PWD/coral/post"

mkdir -p "${CORAL_MAPPER_POST_DIRECTORY}"

# If your custom SSO plugin saved the User ID in the Users `profiles` array
# as `profiles.id`, like the following:
#
# {
# "profiles": [
# {
# "provider": "my-auth",
# "id": "..."
# }
# ]
# }
#
# Then you should set the following so the mapper can grab the `profiles.id`
# from the profile and map it to a corresponding SSO profile for ^6.
# export CORAL_MAPPER_USERS_SSO_PROVIDER="my-auth"

# If a custom plugin wrote a users username to a field other than `username`,
# such as:
#
# {
# "metadata": {
# "displayName": "My Name"
# }
# }
#
# Then you should set the following so the mapper can grab the username from
# the other field.
# export CORAL_MAPPER_USERS_USERNAME="metadata.displayName"

# Run the importer tool in dry mode to perform document validation before we
# actually write any files.
coral-importer legacy --dryRun map

# If the previous command completed successfully, then you can run it for real.
coral-importer legacy map
```

### CSV

When importing vis CSV, each column must be provided, even if it is optional. In
Expand Down Expand Up @@ -51,7 +161,7 @@ iconv -f ISO88592 -t UTF8 < users.csv > users-clean.csv

#### Format

`data/csv/users.csv`:
**`data/csv/users.csv`**:

| # | Column | Type | Required | Description |
| --- | ---------- | ------- | -------- | -------------------------------------------------------------------------------------------------- |
Expand All @@ -62,7 +172,7 @@ iconv -f ISO88592 -t UTF8 < users.csv > users-clean.csv
| 4 | banned | boolean | no | Can be one of `true` or `false` (Default `false`). |
| 5 | created_at | string | no | [ISO8601](http://en.wikipedia.org/wiki/ISO_8601) formatted date string (Defaults to current date). |

`data/csv/stories.csv`:
**`data/csv/stories.csv`**:

| # | Column | Type | Required | Description |
| --- | ------------ | ------ | -------- | -------------------------------------------------------------------------------------------------------------------------------------- |
Expand All @@ -74,7 +184,7 @@ iconv -f ISO88592 -t UTF8 < users.csv > users-clean.csv
| 5 | closed_at | string | No | Date as a [ISO8601](http://en.wikipedia.org/wiki/ISO_8601) formatted date string when commenting was closed (Default is unset). |
| 6 | mode | string | No | Story mode, can be one of `COMMENTS`, `QA`, or `RATINGS_AND_REVIEWS` (Default `COMMENTS`) |

`data/csv/comments.csv`:
**`data/csv/comments.csv`**:

| # | Column | Type | Required | Description |
| --- | ---------- | ------ | -------- | -------------------------------------------------------------------------------------------------------------- |
Expand All @@ -87,137 +197,37 @@ iconv -f ISO88592 -t UTF8 < users.csv > users-clean.csv
| 6 | status | string | No | Status of the Comment, can be one of `APPROVED`, `REJECTED`, or `NONE` (Default's to `NONE`). |
| 7 | rating | number | No | Rating attached to the Comment |

### Legacy

```sh
DATABASE_NAME=coral
MONGO_CONTAINER_ID=mongo-export

# Make the export directory.
mkdir -p "export"

# Dump each collection to the export directory. This operation can take some
# time for larger data sets.
collections=(actions assets comments settings users)
for collection in ${collections[*]}
do
docker run --rm -ti --link $MONGO_CONTAINER_ID:mongo-export -v $PWD/export:/mnt/export mongo:4.2 mongoexport --host mongo-export -d $DATABASE_NAME -c $collection -o /mnt/export/${collection}.json
done

# This now provides the export files that can be processed by the importer.
TENANT_ID=c2440817-464e-4a8f-8851-24effd8fee9d
SITE_ID=3f183f3d-205f-41da-881a-e5089057b78f
INPUT=data/legacy
OUTPUT=database

coral-importer --quiet legacy --input $INPUT --tenantID $TENANT_ID --siteID $SITE_ID --output $OUTPUT 2> output.log
```

### Livefyre
## Importing

```sh
# Perform the import parsing operation.
TENANT_ID=c2440817-464e-4a8f-8851-24effd8fee9d
SITE_ID=3f183f3d-205f-41da-881a-e5089057b78f
USER_INPUT=data/livefyre/users.json
COMMENTS_INPUT=data/livefyre/comments.json
OUTPUT=database

coral-importer --quiet livefyre --users $USER_INPUT --comments $COMMENTS_INPUT --tenantID $TENANT_ID --siteID $SITE_ID --output $OUTPUT 2> output.log
```
# Set this to the MONGO_URL used by the new Coral ^6 instance. This should be
# different than the CORAL_MONGO_URI used by ^4.
export MONGO_URL="..."

## Importing
# Set this to a folder where we'll export the documents to be uploaded to your
# ^5 database.
export CORAL_OUTPUT_DIRECTORY="$PWD/coral/output"

```sh
# Upload the generated imports to MongoDB.
TARGET_MONGO_CONTAINER=mongo
DATABASE_NAME=coral
CONCURRENCY=$(sysctl -n hw.ncpu)
# This command should get the number of CPU's available on your machine,
# otherwise if it fails just set it to the number of CPU's manually.
export CONCURRENCY="$(sysctl -n hw.ncpu)"

# For each of these collections, import them into the new MongoDB database.
collections=(commentActions stories users comments)
for collection in ${collections[*]}
do
if [ ! -f $PWD/$OUTPUT/$collection.json ]
then
echo "$PWD/$OUTPUT/$collection.json does not exist, not importing $collection collection"
continue
fi

docker run --rm -ti -v $PWD:/mnt/import --link mongo:$TARGET_MONGO_CONTAINER mongo:4.2 mongoimport --uri=mongodb://mongo/$DATABASE_NAME --file=/mnt/import/$OUTPUT/$collection.json --collection $collection --numInsertionWorkers $CONCURRENCY
mongoimport --uri "$MONGO_URL" --file "${CORAL_OUTPUT_DIRECTORY}/$collection.json" --collection "$collection" --numInsertionWorkers $CONCURRENCY
done
```

If you're updating documents:

```sh
collections=(commentActions stories users comments)
for collection in ${collections[*]}
do
if [ ! -f $PWD/$OUTPUT/$collection.json ]
then
echo "$PWD/$OUTPUT/$collection.json does not exist, not importing $collection collection"
continue
fi

docker run --rm -ti -v $PWD:/mnt/import --link mongo:$TARGET_MONGO_CONTAINER mongo:4.2 mongoimport --uri=mongodb://mongo/$DATABASE_NAME --file=/mnt/import/$OUTPUT/$collection.json --collection $collection --numInsertionWorkers $CONCURRENCY --mode upsert --upsertFields id
done
```

## Tricks

Print all the active operations on your database with messages.

```js
db.currentOp()
.inprog.filter((op) => Boolean(op.msg))
.map((op) => ({ ns: op.ns, msg: op.msg, command: op.command }));
```

## Benchmarks

| | |
| ------------------ | ----- |
| Running importer | 1m30s |
| Importing Comments | 4m30s |
| Importing Stories | 4s |
| Importing Users | 30s |
| Rebuilding Indexes | 20m |

## Developing

```sh
collections=(actions assets comments settings users)
# For each of these collections, import them into the new MongoDB database.
collections=(commentActions stories users comments)
for collection in ${collections[*]}
do
head ${collection}.json > ${collection}_sample.json
head -n1 ${collection}.json > ${collection}_single.json
mongoimport --uri "$MONGO_URL" --file "${CORAL_OUTPUT_DIRECTORY}/$collection.json" --collection "$collection" --numInsertionWorkers $CONCURRENCY --mode upsert --upsertFields id
done
```

To update models run:
```sh
go generate ./...
```

To locally build an executable for testing use the output flag:
```sh
go build -o <location>
```

## Changelog

### v0.4.13
- (coral) handles `metadata.displayName` values if present on legacy users

### v0.4.3

- (coral) monotonic timestamps will no longer reset and cause a conflict if more
than 1000 are created in the same second.

### v0.4.1

- (coral) `createdAt` timestamps that are used by Coral as cursors now are
emitted as unique timestamps when not disabled with the
`--disableMonotonicCursorTimes` flag. This means that every timestamp emitted
that shares the same second time will automatically have it's ms time
incremented to prevent collisions.
```
24 changes: 0 additions & 24 deletions common/cli.go

This file was deleted.

0 comments on commit 04e1c3b

Please sign in to comment.