This repository has been archived by the owner. It is now read-only.
npm-registry Algolia connector

Replicates the full npmjs registry into an Algolia index and watches for updates. The process can be killed at any moment, during either full replication or watching, and picks up from its last saved position on the next start.

Usage

Local

npm install
APPLICATION_ID='ALGOLIA_APPLICATION_ID' \
API_KEY='ALGOLIA_ADMIN_API_KEY' \
INDEX_PREFIX='npmjs-' \
CONFIG='{
  "NPM_REGISTRY": "https://skimdb.npmjs.com/registry",
  "PACKAGES_INDEXNAME": "registry",
  "REPLICATION_CONCURRENCY": 10000,
  "DOWNLOADS_CONCURRENCY": 100,
  "WATCH_CONCURRENCY": 1,
  "EXIT_AFTER": "5min"
}' \
./run
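
To make the shape of these settings concrete, here is a minimal sketch of reading them from the environment. It is written in Python for illustration only and is not the connector's own code. The `load_config` helper, the use of the README's example values as defaults, and the derived `INDEX_NAME` key (prefix plus index name) are all assumptions.

```python
import json
import os

# Example values copied from this README. Treating them as defaults is an
# assumption; the connector's real defaults are not documented here.
DEFAULTS = {
    "NPM_REGISTRY": "https://skimdb.npmjs.com/registry",
    "PACKAGES_INDEXNAME": "registry",
    "REPLICATION_CONCURRENCY": 10000,
    "DOWNLOADS_CONCURRENCY": 100,
    "WATCH_CONCURRENCY": 1,
    "EXIT_AFTER": "5min",
}


def load_config(environ=None):
    """Hypothetical helper: merge the CONFIG JSON over the defaults."""
    env = os.environ if environ is None else environ

    # The Algolia credentials are passed as plain environment variables.
    missing = [name for name in ("APPLICATION_ID", "API_KEY") if not env.get(name)]
    if missing:
        raise ValueError("missing environment variables: " + ", ".join(missing))

    # CONFIG is a JSON object; keys it omits keep their default value.
    overrides = json.loads(env.get("CONFIG", "{}"))
    config = {**DEFAULTS, **overrides}

    # Assumption: the final index name is INDEX_PREFIX + PACKAGES_INDEXNAME.
    config["INDEX_NAME"] = env.get("INDEX_PREFIX", "") + config["PACKAGES_INDEXNAME"]
    return config
```

Passing a plain dictionary instead of the real environment, for example `load_config({"APPLICATION_ID": "id", "API_KEY": "key", "INDEX_PREFIX": "npmjs-"})`, makes the merge easy to try out without exporting any variables.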

Docker

Build it:

docker build -t npmjs-connector .

Run it:

docker run \
-e APPLICATION_ID='ALGOLIA_APPLICATION_ID' \
-e API_KEY='ALGOLIA_ADMIN_API_KEY' \
-e INDEX_PREFIX='npmjs-' \
-e CONFIG='{
  "NPM_REGISTRY": "https://skimdb.npmjs.com/registry",
  "PACKAGES_INDEXNAME": "registry",
  "REPLICATION_CONCURRENCY": 10000,
  "DOWNLOADS_CONCURRENCY": 100,
  "WATCH_CONCURRENCY": 1,
  "EXIT_AFTER": "5min"
}' \
npmjs-connector

Workflow

The goal is to be resilient to failures or interruptions of service without having to re-replicate everything.

  1. Get the last known lastSequence: either the repository's current sequence or the one stored in the index.
  2. Get the last known replicateLastPackage. If none is found, browse the repository (page by page) to find the first package. If it holds the special "DONE" token, skip replication.
  3. Start replication at this package.
  4. On every replication loop, save replicateLastPackage.
  5. Once replication is done, save the last known lastSequence and store the special "DONE" flag in replicateLastPackage.
  6. Start the download job at downloadsLastPackage, or at the first package of the index.
  7. On every download run, save downloadsLastPackage.
  8. Start watching from the last known lastSequence.
  9. On every watch loop, save lastSequence.
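
The steps above can be sketched as a single resumable pass. This is an illustrative Python model, not the connector's implementation. Plain dictionaries stand in for the npm registry, the Algolia index and the saved state; the change feed is a list whose position plays the role of the sequence number; and the `run_once` and `resume_point` names are hypothetical. The sketch also assumes that a stored lastSequence takes priority over the repository's current one, and that "DONE" is the literal value of the flag.

```python
from bisect import bisect_left

DONE = "DONE"  # assumed literal value of the special flag


def resume_point(names, bookmark):
    """Position of the bookmark in the sorted names, or 0 without a bookmark."""
    return bisect_left(names, bookmark) if bookmark else 0


def run_once(state, registry, index, downloads, changes, batch_size=2):
    """One pass over steps 1 to 9; every bookmark is saved as soon as it moves."""
    names = sorted(registry)

    # Step 1: stored sequence if any, else the registry's current position.
    sequence = state.get("lastSequence", len(changes))

    # Steps 2 to 5: full replication, skipped when the DONE flag is stored.
    if state.get("replicateLastPackage") != DONE:
        start = resume_point(names, state.get("replicateLastPackage"))
        for i in range(start, len(names), batch_size):
            batch = names[i:i + batch_size]
            for name in batch:
                index[name] = {"name": name, **registry[name]}
            state["replicateLastPackage"] = batch[-1]  # step 4
        state["lastSequence"] = sequence               # step 5
        state["replicateLastPackage"] = DONE

    # Steps 6 and 7: download counts, resuming at downloadsLastPackage.
    indexed = sorted(index)
    for name in indexed[resume_point(indexed, state.get("downloadsLastPackage")):]:
        index[name]["downloads"] = downloads.get(name, 0)
        state["downloadsLastPackage"] = name

    # Steps 8 and 9: replay changes recorded after the saved sequence.
    sequence = state["lastSequence"]
    for name, doc in changes[sequence:]:
        index.setdefault(name, {"name": name}).update(doc)
        sequence += 1
        state["lastSequence"] = sequence
```

Because each bookmark is written right after the work it describes, killing the process loses at most one batch. On restart the bookmarked package is processed again, which is harmless as long as writing a package to the index is idempotent.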

Download count and repo watching can be done in parallel once full replication is done.
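
As a rough illustration of that last point, the sketch below starts two jobs side by side only after the "DONE" flag is present. It is a Python model under stated assumptions: `start_background_jobs` is a hypothetical name, the two jobs are passed in as plain callables, and threads are just one possible way to run them concurrently.

```python
from concurrent.futures import ThreadPoolExecutor

DONE = "DONE"  # assumed literal value of the special flag


def start_background_jobs(state, download_job, watch_job):
    """Run both jobs concurrently, but only once full replication is done."""
    if state.get("replicateLastPackage") != DONE:
        raise RuntimeError("full replication has not finished yet")

    # Each job keeps its own bookmark (downloadsLastPackage or lastSequence),
    # so neither has to wait for the other.
    with ThreadPoolExecutor(max_workers=2) as pool:
        futures = [pool.submit(download_job), pool.submit(watch_job)]
        return [future.result() for future in futures]
```

The two jobs can share a process because they write to different bookmarks; the only ordering constraint is that both start after replication has stored its flag.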
