This repository has been archived by the owner. It is now read-only.

npm-registry Algolia connector

Replicates the full npmjs registry into Algolia and watches it for updates. The process can be killed at any moment, during either full replication or watching, and resumes where it left off on the next run.

Usage

Local

npm install
APPLICATION_ID='ALGOLIA_APPLICATION_ID' \
API_KEY='ALGOLIA_ADMIN_API_KEY' \
INDEX_PREFIX='npmjs-' \
CONFIG='{
  "NPM_REGISTRY": "https://skimdb.npmjs.com/registry",
  "PACKAGES_INDEXNAME": "registry",
  "REPLICATION_CONCURRENCY": 10000,
  "DOWNLOADS_CONCURRENCY": 100,
  "WATCH_CONCURRENCY": 1,
  "EXIT_AFTER": "5min"
}' \
./run
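CONFIG is a JSON object passed through the environment, and EXIT_AFTER is a human-readable duration (presumably how long the process runs before exiting). A minimal sketch of how a connector could read these values, assuming a small hand-written duration parser. The helper names `parseDuration` and `loadConfig`, the supported units, and the use of the example values above as defaults are all hypothetical, not taken from connector.js.

```javascript
// Hypothetical sketch: the real connector.js may read its config differently.

// Assumed defaults, copied from the example CONFIG above (not confirmed).
const DEFAULTS = {
  NPM_REGISTRY: "https://skimdb.npmjs.com/registry",
  PACKAGES_INDEXNAME: "registry",
  REPLICATION_CONCURRENCY: 10000,
  DOWNLOADS_CONCURRENCY: 100,
  WATCH_CONCURRENCY: 1,
  EXIT_AFTER: null, // null = never exit on its own (assumption)
};

// Milliseconds per unit for strings such as "100ms", "30s", "5min" or "2h".
const UNITS = { ms: 1, s: 1000, min: 60 * 1000, h: 60 * 60 * 1000 };

// Hypothetical helper: turns "5min" into 300000 milliseconds.
function parseDuration(value) {
  const match = /^(\d+)\s*(ms|s|min|h)$/.exec(String(value).trim());
  if (!match) throw new Error(`Invalid duration: ${value}`);
  return Number(match[1]) * UNITS[match[2]];
}

// Hypothetical helper: merges CONFIG (a JSON string) over the defaults and
// checks that the three required environment variables are present.
function loadConfig(env) {
  const overrides = env.CONFIG ? JSON.parse(env.CONFIG) : {};
  const config = { ...DEFAULTS, ...overrides };
  for (const key of ["APPLICATION_ID", "API_KEY", "INDEX_PREFIX"]) {
    if (!env[key]) throw new Error(`Missing environment variable ${key}`);
    config[key] = env[key];
  }
  config.exitAfterMs =
    config.EXIT_AFTER === null ? null : parseDuration(config.EXIT_AFTER);
  return config;
}

const config = loadConfig({
  APPLICATION_ID: "app-id",
  API_KEY: "admin-key",
  INDEX_PREFIX: "npmjs-",
  CONFIG: '{"EXIT_AFTER": "5min"}',
});
console.log(config.exitAfterMs); // 300000
```

Failing fast on a missing APPLICATION_ID, API_KEY or INDEX_PREFIX avoids starting a long replication that cannot write anywhere.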

Docker

Build it:

docker build -t npmjs-connector .

Run it:

docker run \
-e APPLICATION_ID='ALGOLIA_APPLICATION_ID' \
-e API_KEY='ALGOLIA_ADMIN_API_KEY' \
-e INDEX_PREFIX='npmjs-' \
-e CONFIG='{
  "NPM_REGISTRY": "https://skimdb.npmjs.com/registry",
  "PACKAGES_INDEXNAME": "registry",
  "REPLICATION_CONCURRENCY": 10000,
  "DOWNLOADS_CONCURRENCY": 100,
  "WATCH_CONCURRENCY": 1,
  "EXIT_AFTER": "5min"
}' \
npmjs-connector

Workflow

The goal is to be resilient to failures or interruptions of service without having to re-replicate everything.

  1. Get the last known sequence (lastSequence): either the one saved in the index or the registry's current one.
  2. Get the last known replicated package (replicateLastPackage). If none is found, browse the registry page by page, starting from the first package. If it holds the special "DONE" token, skip replication.
  3. Start replication at this package.
  4. On every replication loop, save replicateLastPackage.
  5. Once replication is done, save lastSequence and store the special "DONE" token in replicateLastPackage.
  6. Start the downloads job at downloadsLastPackage, or at the first package in the index if none is saved.
  7. After each downloads run, save downloadsLastPackage.
  8. Start watching from lastSequence.
  9. On every watch loop, save lastSequence.
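Steps 1 to 5 can be sketched as a resumable loop. This is a simplified, synchronous sketch, assuming an in-memory `state` object standing in for the checkpoints saved in the Algolia index, and a sorted array of package names standing in for the registry's paginated listing. The names `replicate`, `fetchPage`, `DONE` and `stopAfterPages` are hypothetical, not the connector's actual code.

```javascript
// Hypothetical sketch of steps 1 to 5. `state` stands in for the values the
// connector persists in the index; `registry` stands in for the npm registry.
const DONE = "__DONE__"; // special token meaning "full replication finished"

// Returns up to `size` package names sorted after `startAfter` (one "page").
function fetchPage(registry, startAfter, size) {
  return registry.packages
    .filter((name) => startAfter === null || name > startAfter)
    .slice(0, size);
}

function replicate(registry, state, indexed, pageSize, stopAfterPages = Infinity) {
  // Step 1: use the saved sequence if any, else the registry's current one.
  const lastSequence = state.lastSequence ?? registry.currentSequence;

  // Step 2: the DONE token means a previous run already finished replication.
  if (state.replicateLastPackage === DONE) return "skipped";

  // Step 3: resume after the last saved package, or start from the first one.
  let startAfter = state.replicateLastPackage ?? null;
  let pages = 0;
  for (;;) {
    const page = fetchPage(registry, startAfter, pageSize);
    if (page.length === 0) break;
    for (const name of page) indexed.add(name); // stands in for an Algolia write
    // Step 4: checkpoint after every loop, so a crash loses at most one page.
    startAfter = page[page.length - 1];
    state.replicateLastPackage = startAfter;
    if (++pages >= stopAfterPages) return "interrupted"; // simulates a kill
  }

  // Step 5: persist the sequence and mark replication as finished.
  state.lastSequence = lastSequence;
  state.replicateLastPackage = DONE;
  return "done";
}
```

Because the checkpoint is written after each page, a run that is killed mid-replication resumes after replicateLastPackage on the next call instead of starting over, and a run that finds "DONE" goes straight to the downloads and watch jobs.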

Once full replication is done, the download-count job and registry watching can run in parallel.
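Steps 6 to 9, with both jobs running in parallel, can be sketched as below. This sketch assumes in-memory stand-ins: `state` for the checkpoints saved in the index, `packages` for the sorted package names already in the index, `changes` for the registry's changes feed, and two caller-supplied async functions, `fetchCounts` and `applyChange`. All of these names are hypothetical; the real connector's downloads and watch code may be structured differently.

```javascript
// Hypothetical sketch of steps 6 to 9. None of these names come from connector.js.

// Steps 6 and 7: fetch download counts in batches, checkpointing each batch.
async function downloadsJob(packages, state, fetchCounts, batchSize) {
  // Step 6: start after downloadsLastPackage, or at the first indexed package.
  const start = state.downloadsLastPackage ?? null;
  const todo = packages.filter((name) => start === null || name > start);
  for (let i = 0; i < todo.length; i += batchSize) {
    const batch = todo.slice(i, i + batchSize);
    await fetchCounts(batch); // stands in for a download-counts request
    state.downloadsLastPackage = batch[batch.length - 1]; // step 7
  }
}

// Steps 8 and 9: apply changes newer than lastSequence, checkpointing each one.
async function watchJob(changes, state, applyChange) {
  for (const change of changes) {
    if (change.seq <= state.lastSequence) continue; // already applied
    await applyChange(change);
    state.lastSequence = change.seq; // step 9
  }
}

// Once replication is DONE, both jobs run concurrently. They write different
// checkpoint keys, so neither overwrites the other's progress.
function runAfterReplication(packages, changes, state, fetchCounts, applyChange, batchSize) {
  return Promise.all([
    downloadsJob(packages, state, fetchCounts, batchSize),
    watchJob(changes, state, applyChange),
  ]);
}
```

Keeping downloadsLastPackage and lastSequence as separate checkpoints is what lets the two jobs run side by side and each resume independently after an interruption.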