This project has been rewritten in Python 3, available here. Compared with this NodeJS version, the Python version is:
- faster: more than 2 times faster, even when reading the gzip file directly without extracting it
- more selective when deserializing: NodeJS deserializes every XML element and cannot deserialize only specific XML fields
- more stable: no NodeJS core crashes
- simpler and more readable: fewer lines, no callback hell or pyramid of doom
This project imports the Discogs XML dumps (https://data.discogs.com/) into Elasticsearch.
Compatible with Elasticsearch 7.0.0 or later.
Feel free to fork & upgrade.
This will create one index per file:
- artists
- masters
- releases
- Install the dependencies:
npm install
- Configure your Elasticsearch connection in config/config.json
Display the command-line help:
npm run import -- -h
usage: boot.js [-h] [-v] [-d DATE] [-f {artists,masters,releases,labels}]
Discogs XMLs to Elasticsearch importer
Optional arguments:
-h, --help Show this help message and exit.
-v, --version Show program's version number and exit.
-d DATE, --date DATE Specify a Discogs date file. ex: 20170101
-f {artists,masters,releases,labels}, --file {artists,masters,releases,labels}
Specify a Discogs index to import
Example command:
npm run import -- -d 20170101 -f masters
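The example above imports a single index. To import every index for one date, the command can be wrapped in a small shell loop. This is only a sketch: `import_all` is a hypothetical helper, not part of this project, and it skips `labels` because that import is not implemented yet.

```shell
# Hypothetical helper (not shipped with this project):
# import every available index for one Discogs dump date.
import_all() {
  dump_date=$1   # e.g. 20170101
  for index in artists masters releases; do
    # Stop at the first failing import so a broken run is easy to spot.
    npm run import -- -d "$dump_date" -f "$index" || return 1
  done
}
```

Usage, from the project root: `import_all 20170101`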
To do: implement the labels import:
- Add labels in config/config.json
- Add the mapping in src/elasticsearch/mappings/labels.mapping.json
Known issue:
- Memory leak on Debian 9 while downloading the Discogs files.
As a workaround, download and extract them manually, then run the import command:
cd downloads
wget https://discogs-data.s3-us-west-2.amazonaws.com/data/2020/discogs_20200901_artists.xml.gz
wget https://discogs-data.s3-us-west-2.amazonaws.com/data/2020/discogs_20200901_masters.xml.gz
wget https://discogs-data.s3-us-west-2.amazonaws.com/data/2020/discogs_20200901_releases.xml.gz
gunzip discogs_20200901_*
cd ..
npm run import -- -d 20200901 -f masters
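For other dump dates, the download URLs can be generated instead of typed by hand. The sketch below assumes the URL layout is the one seen in the three `wget` lines above (`data/<year>/discogs_<date>_<type>.xml.gz`); dumps for other dates may be hosted differently. `discogs_url` and `discogs_urls` are hypothetical helper names.

```shell
# Hypothetical helpers; the URL layout is inferred from the wget lines above.

# Print the download URL of one dump file.
discogs_url() {
  dump_date=$1            # e.g. 20200901
  kind=$2                 # artists, masters or releases
  year=${dump_date%????}  # drop the trailing MMDD, leaving the year
  printf 'https://discogs-data.s3-us-west-2.amazonaws.com/data/%s/discogs_%s_%s.xml.gz\n' \
    "$year" "$dump_date" "$kind"
}

# Print the URLs of all three dump files for one date, one per line.
discogs_urls() {
  for k in artists masters releases; do
    discogs_url "$1" "$k"
  done
}
```

Usage, from inside the downloads directory: `discogs_urls 20200901 | xargs -n 1 wget`, followed by the same `gunzip` and import steps as above.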