keyserver-elasticsearch

This is the documentation for https://keyserver-elasticsearch.daylightpirates.org/

It is an elasticsearch node that contains a recent dump of the SKS keyserver pool database (I maintain a keyserver in the pool). This pool is what GPG uses by default to fetch public keys when you run gpg --recv-key. The purpose of this project is to let people do data analysis on the keys in the pool.

NOTE: Do not rely on elasticsearch's _id for looking up keys! I will be refreshing the database with a new dump regularly, and each refresh creates completely new elasticsearch ids for the keys (it's easier to blow away the index and recreate it than to find and update each key by fingerprint).
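Since the ids change between dumps, a lookup has to go through a value stored in the key document itself. Here is a minimal sketch of building such a search body. The field name fingerprint is an assumption (check the openpgp.py output format for the actual field path), and the localhost:9200 address, the keyserver1 index name, and the FPR variable in the commented example are placeholders that match the local setup described further down.

```shell
# Build an elasticsearch "match" query body for one field/value pair.
# The field name is passed in because the real path of the fingerprint
# inside the key document is an assumption here, not a confirmed name.
fingerprint_query() {
    field="$1"
    value="$2"
    printf '{"query":{"match":{"%s":"%s"}}}\n' "$field" "$value"
}

# Example (hypothetical field name, local node and index assumed):
# fingerprint_query fingerprint "$FPR" | \
#     curl -s -X POST --data-binary @- http://localhost:9200/keyserver1/_search
```

A match query is used rather than a term query so that the search value goes through the same analysis as the indexed field, whatever mapping the index ended up with.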

Current dump from: Sun Apr 5 05:02:58 UTC 2015

Document format

The keys loaded into the index are based on the output from my openpgp.py script that converts pgp public keys to json.

https://github.com/diafygi/openpgp-python#output-formats

Quick links

Instructions for making your own

The server this runs on isn't super powerful, so if you want to run some heavy queries, you may want to set up a local copy on your system. Here's how.

Step 1: Set up elasticsearch on your local machine.

Step 2: Download openpgp.py.

mkdir ~/openpgp-python
cd ~/openpgp-python
wget -O openpgp.py https://raw.githubusercontent.com/diafygi/openpgp-python/master/openpgp.py

Step 3: Download the latest SKS keyserver dump (this will take a while, ~7GB).

mkdir ~/dump
cd ~/dump
wget -c -r -p -e robots=off --timestamping --level=1 --cut-dirs=3 \
--no-host-directories http://keyserver.mattrude.com/dump/current/

Step 4: Parse the keyserver dump into gzipped json files, split every 1000 lines (this will take several hours, ~16GB).

ls -1 ~/dump/*.pgp | \
xargs -I % sh -c "python ~/openpgp-python/openpgp.py --merge-public-keys '%' | \
split -l 1000 -d --filter 'gzip -9 > \$FILE.gz' - '%.json.'"
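To see what the split step produces before committing several hours to it, here is a small sketch that runs the same split options on stand-in data. It assumes GNU split (for --filter) and uses a temporary directory and a sample.json. prefix that are made up for the demo; the numbered lines stand in for parser output, which the pipeline treats as one key per line.

```shell
# Work in a throwaway directory so the demo leaves nothing in ~/dump.
demo_dir=$(mktemp -d)

# 2500 numbered lines stand in for the parser output.
# split sets $FILE to each chunk's name and pipes the chunk to the filter.
seq 1 2500 | \
split -l 1000 -d --filter 'gzip -9 > "$FILE.gz"' - "$demo_dir/sample.json."

# Count the chunk files and the lines in the last (partial) chunk.
chunk_count=$(ls -1 "$demo_dir" | wc -l)
last_chunk_lines=$(zcat "$demo_dir/sample.json.02.gz" | wc -l)
echo "$chunk_count chunks, last chunk has $last_chunk_lines lines"
```

Note that $FILE has to reach split unexpanded. Here the filter is in single quotes at the top level; when the command is nested inside a double-quoted sh -c string, the dollar sign needs a backslash.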

Step 5: Bulk index each gzip file into elasticsearch (this will take several hours, ~100GB).

ls -1 ~/dump/*.json.*.gz | \
xargs -I % sh -c "zcat '%' | \
sed '0~1 s/^/{ \"index\" : { \"_index\" : \"keyserver1\", \"_type\" : \"key\" } }\n/' | \
curl -X POST --data-binary @- http://localhost:9200/_bulk | \
{ cat -; echo ''; } >> ~/results.log"
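The sed expression is the least obvious part of that pipeline: the bulk endpoint expects an action line before every document, and sed inserts one ahead of each input line. A minimal sketch that shows the resulting body on two stand-in documents, assuming GNU sed (for the 0~1 address and \n in the replacement); the function name to_bulk_body and the sample documents are made up for illustration.

```shell
# Prefix every input line with a bulk "index" action line.
# Reads documents on stdin, one per line, and writes the bulk body to stdout.
to_bulk_body() {
    sed '0~1 s/^/{ "index" : { "_index" : "keyserver1", "_type" : "key" } }\n/'
}

# Two stand-in documents; real input would come from zcat on a chunk file.
printf '%s\n' '{"a": 1}' '{"a": 2}' | to_bulk_body
```

Running this on a chunk with zcat and piping it to head is a cheap way to check the body by eye before posting it to a node.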