Scripts to help with Elasticsearch sysadmin work. Uses elasticsearch-py.

(snapshots and repositories)

Create an Elasticsearch repository:

$ python createrepo --reponame repo1 --repopath /var/lib/elasticsearch/backup-repo-1
{u'acknowledged': True}

$ python createrepo --reponame repo2 --repopath /var/lib/elasticsearch/backup-repo-2
{u'acknowledged': True}
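As a rough illustration of what a command like createrepo might do through elasticsearch-py, here is a minimal sketch. The function name `create_fs_repo` is hypothetical, and the call signature assumes elasticsearch-py 7.x or older, where `snapshot.create_repository` accepts `repository` and `body`; the actual script may differ.

```python
def create_fs_repo(es, reponame, repopath):
    """Register a shared-filesystem ("fs") snapshot repository.

    `es` is expected to be an elasticsearch-py client. The function name
    is hypothetical and only illustrates the request shape.
    """
    # Same structure that `getrepos` prints back for each repository.
    body = {"type": "fs", "settings": {"location": repopath}}
    # Assumption: elasticsearch-py 7.x or older, where this call takes
    # `repository` and `body`; newer clients use different parameter names.
    return es.snapshot.create_repository(repository=reponame, body=body)
```

With a real client this would be called as `create_fs_repo(Elasticsearch(), "repo1", "/var/lib/elasticsearch/backup-repo-1")`.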

List repositories:

$ python getrepos
{u'repo1': {u'settings': {u'location': u'/var/lib/elasticsearch/backup-repo-1'},
            u'type': u'fs'},
 u'repo2': {u'settings': {u'location': u'/var/lib/elasticsearch/backup-repo-2'},
            u'type': u'fs'}}

Create snapshot

$ python createsnap --reponame repo1 --snapname snapshot2 --indices aips
{u'snapshot': {u'duration_in_millis': 102,
               u'end_time': u'2015-09-15T20:24:54.305Z',
               u'end_time_in_millis': 1442348694305,
               u'failures': [],
               u'indices': [u'aips'],
               u'shards': {u'failed': 0, u'successful': 5, u'total': 5},
               u'snapshot': u'snapshot2',
               u'start_time': u'2015-09-15T20:24:54.203Z',
               u'start_time_in_millis': 1442348694203,
               u'state': u'SUCCESS'}}
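The snapshot call above could be sketched roughly as follows. The helper name `create_snapshot` is hypothetical, and the signature assumes elasticsearch-py 7.x or older. Waiting for completion is also an assumption, chosen because the output above contains the finished snapshot details.

```python
def create_snapshot(es, reponame, snapname, indices=None):
    """Snapshot the given indices (or all indices when none are named).

    `es` is expected to be an elasticsearch-py client; the helper name is
    hypothetical.
    """
    body = {}
    if indices:
        # Elasticsearch accepts a comma-separated list of index names.
        body["indices"] = ",".join(indices)
    # Assumption: elasticsearch-py 7.x or older (`repository`, `snapshot`,
    # `body`). wait_for_completion=True makes the call return the final
    # snapshot state rather than just an acknowledgement.
    return es.snapshot.create(
        repository=reponame,
        snapshot=snapname,
        body=body,
        wait_for_completion=True,
    )
```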

Get snapshot information

$ python getsnap --reponame repo1 --snapname snapshot2
{u'snapshots': [{u'duration_in_millis': 102,
                 u'end_time': u'2015-09-15T20:24:54.305Z',
                 u'end_time_in_millis': 1442348694305,
                 u'failures': [],
                 u'indices': [u'aips'],
                 u'shards': {u'failed': 0, u'successful': 5, u'total': 5},
                 u'snapshot': u'snapshot2',
                 u'start_time': u'2015-09-15T20:24:54.203Z',
                 u'start_time_in_millis': 1442348694203,
                 u'state': u'SUCCESS'}]}
$ python getsnap --reponame repo1
{u'snapshots': [{u'duration_in_millis': 35,
                 u'end_time': u'2015-09-15T20:24:30.376Z',
                 u'end_time_in_millis': 1442348670376,
                 u'failures': [],
                 u'indices': [],
                 u'shards': {u'failed': 0, u'successful': 0, u'total': 0},
                 u'snapshot': u'snapshot1',
                 u'start_time': u'2015-09-15T20:24:30.341Z',
                 u'start_time_in_millis': 1442348670341,
                 u'state': u'SUCCESS'},
                {u'duration_in_millis': 102,
                 u'end_time': u'2015-09-15T20:24:54.305Z',
                 u'end_time_in_millis': 1442348694305,
                 u'failures': [],
                 u'indices': [u'aips'],
                 u'shards': {u'failed': 0, u'successful': 5, u'total': 5},
                 u'snapshot': u'snapshot2',
                 u'start_time': u'2015-09-15T20:24:54.203Z',
                 u'start_time_in_millis': 1442348694203,
                 u'state': u'SUCCESS'}]}
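The two getsnap invocations above (one snapshot versus every snapshot in a repository) could map onto a single client call, sketched below. The helper name `get_snapshots` is hypothetical and the signature assumes elasticsearch-py 7.x or older; `_all` is the Elasticsearch wildcard for all snapshots in a repository.

```python
def get_snapshots(es, reponame, snapname=None):
    """Return information about one snapshot, or all of them.

    `es` is expected to be an elasticsearch-py client; the helper name is
    hypothetical.
    """
    # "_all" asks Elasticsearch for every snapshot in the repository.
    target = snapname or "_all"
    # Assumption: elasticsearch-py 7.x or older (`repository`, `snapshot`).
    return es.snapshot.get(repository=reponame, snapshot=target)
```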

Restore snapshot

$ python restore --reponame repo1 --snapname snapshot2 --index aips --target aipsrestoretest2
{u'snapshot': {u'indices': [u'aipsrestoretest2'],
               u'shards': {u'failed': 0, u'successful': 5, u'total': 5},
               u'snapshot': u'snapshot2'}}

(scan)


Scan ES index transfers for backlog entries (status:backlog) and show count (number of items):

$ python scan_count -i transfers  -q '{"query": {"match": {"status": "backlog"}}}'


Scan ES index transfers for backlog entries and print the entries:

$ python scan_show -i transfers  -q '{"query": {"match": {"status": "backlog"}}}'
{u'_score': 0.0, u'_type': u'transferfile', u'_id': u'AVms3OUfHYlZdVHpuf8w', u'_source': {u'accessionid': u'2012-090', u'status': u'backlog', u'sipuuid': u'b1b60f4e-a336-4184-9c66-8a406ae92e50', u'created': 1484663862.353921, u'file_extension': u'jpg', u'filename': u'txsau_ms00418_00674.jpg', u'ingestdate': u'2017-01-17', u'relative_path': u'MarquisePhotosmd-b1b60f4e-a336-4184-9c66-8a406ae92e50/objects/Pride_Picnic_1994/jpegs/txsau_ms00418_00674.jpg', u'fileuuid': u'e92ef9ba-2836-4739-aa37-f0b5bb747e18', u'origin': u'6be772ad-b0d0-4271-9602-6a3f98ecada1'}, u'_index': u'transfers'}
{u'_score': 0.0, u'_type': u'transferfile', u'_id': u'AVms3OWoHYlZdVHpuf82', u'_source': {u'accessionid': u'2012-090', u'status': u'backlog', u'sipuuid': u'b1b60f4e-a336-4184-9c66-8a406ae92e50', u'created': 1484663862.353921, u'file_extension': u'log', u'filename': u'filenameCleanup.log', u'ingestdate': u'2017-01-17', u'relative_path': u'MarquisePhotosmd-b1b60f4e-a336-4184-9c66-8a406ae92e50/logs/filenameCleanup.log', u'fileuuid': u'', u'origin': u'6be772ad-b0d0-4271-9602-6a3f98ecada1'}, u'_index': u'transfers'}
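Counting and printing scan results can be sketched as below. The function names are hypothetical. The `scan` argument is meant to be `elasticsearch.helpers.scan`, passed in explicitly so the sketch can also run without a cluster; how the real scripts are structured is not shown here.

```python
import json


def find_hits(es, index, raw_query, scan):
    """Return an iterator over every hit matching a JSON query string.

    `scan` is expected to behave like elasticsearch.helpers.scan, which
    scrolls through all matching documents instead of a single page.
    """
    query = json.loads(raw_query)  # the -q argument is plain JSON
    return scan(es, index=index, query=query)


def count_hits(hits):
    """Count hits without keeping them in memory."""
    return sum(1 for _ in hits)


def show_hits(hits):
    """Print each hit on its own line."""
    for hit in hits:
        print(hit)
```

With a real client, `count_hits(find_hits(es, "transfers", raw_query, elasticsearch.helpers.scan))` would produce the kind of number that scan_count prints.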


Scan ES index transfers for backlog entries and delete the entries from the index:

$ python scan_del -i transfers  -q '{"query": {"match": {"status": "backlog"}}}'
{u'found': True, u'_type': u'transferfile', u'_id': u'AVms3OUfHYlZdVHpuf8w', u'_version': 3, u'_index': u'transfers'}
{u'found': True, u'_type': u'transferfile', u'_id': u'AVms3OWoHYlZdVHpuf82', u'_version': 3, u'_index': u'transfers'}
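The deletion step can be pictured as one delete request per scanned hit, as in this sketch. The helper name `delete_hits` is hypothetical. Passing `doc_type` assumes an older Elasticsearch and elasticsearch-py (7.x or earlier) in which documents still carry a `_type`, as they do in the output above.

```python
def delete_hits(es, hits):
    """Delete every scanned hit from its index and return how many were removed.

    `es` is expected to be an elasticsearch-py client and `hits` an iterable
    of scan results; the helper name is hypothetical.
    """
    deleted = 0
    for hit in hits:
        # Each hit carries the coordinates needed to address the document.
        # Assumption: a client version that still accepts `doc_type`.
        response = es.delete(
            index=hit["_index"], doc_type=hit["_type"], id=hit["_id"]
        )
        print(response)
        deleted += 1
    return deleted
```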

Check for backlog items after deletion:
$ python scan_count -i transfers  -q '{"query": {"match": {"status": "backlog"}}}'
0

(update search indices)

Apply the search document mappings found in Archivematica releases to existing documents using the Reindex API. This is the fastest method to update Archivematica search indices when upgrading the software and should guarantee a working Archivematica installation, but it will not populate the contents of new fields introduced in the new release.
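For readers unfamiliar with the Reindex API, the sketch below shows only the basic shape of a request that copies one index into another. It does not reproduce what the script does. The helper name and the destination index name in the usage note are hypothetical.

```python
import json


def reindex_request(source_index, dest_index):
    """Return the path and JSON body of a basic Reindex API request.

    The request would be sent with POST; the destination index should
    already exist with the desired mappings so that they are applied to
    the copied documents.
    """
    body = {"source": {"index": source_index}, "dest": {"index": dest_index}}
    return "/_reindex", json.dumps(body, sort_keys=True)
```

For example, `reindex_request("aips", "aips_new")` describes copying the `aips` index into a hypothetical `aips_new` index.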

Before using this script, please back up your search indices. The script assumes a default cluster address; adjust the es_url variable as needed.

Ensure that the Elasticsearch heap size is large enough to accommodate the size of the indices. The size can be adjusted in /etc/default/elasticsearch (Ubuntu) or /etc/sysconfig/elasticsearch (CentOS).

$ grep ES_JAVA_OPTS= /etc/default/elasticsearch
ES_JAVA_OPTS="-Xms2g -Xmx2g"

To find the right value, estimate the total size by listing all search indices. For example, run the following command, adjusting the address if your search cluster is reachable elsewhere:

$ curl -s -X GET 'http://localhost:9200/_cat/indices/%2A?v=&s=index:desc'
health status index         uuid                   pri rep docs.count docs.deleted store.size pri.store.size
yellow open   transfers     lYqkYjwZRy2XG8CP_3S3PQ   5   1          0            0      1.2kb          1.2kb
yellow open   transferfiles K5gnDZyOQz2JdIeZ6adJsQ   5   1          0            0      1.2kb          1.2kb
yellow open   aips          yAyK_koXThaZcWsBYfzN7w   5   1         17            0    101.4mb        101.4mb
yellow open   aipfiles      TVrrX8jkRhWWxGfvK_M6zg   5   1      11987            0      2.9gb          2.9gb

In the example above, the indices add up to roughly 3 GB, so we'd use:

ES_JAVA_OPTS="-Xms3g -Xmx3g"
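The arithmetic behind that value can be sketched as follows. The text does not state an exact rule, so this assumes the heap should be at least the sum of the store.size values, rounded up to a whole gigabyte. The function names are hypothetical.

```python
import math

# Elasticsearch reports sizes in binary units (1kb = 1024 bytes).
UNITS = {"b": 1, "kb": 1024, "mb": 1024 ** 2, "gb": 1024 ** 3, "tb": 1024 ** 4}


def parse_size(text):
    """Convert a size such as '101.4mb' into a number of bytes."""
    text = text.strip().lower()
    # Check two-letter suffixes first so 'kb' is not mistaken for 'b'.
    for suffix in ("kb", "mb", "gb", "tb", "b"):
        if text.endswith(suffix):
            return float(text[: -len(suffix)]) * UNITS[suffix]
    raise ValueError("unrecognised size: %r" % text)


def suggest_heap_opts(store_sizes):
    """Suggest ES_JAVA_OPTS for the given list of store.size values.

    Assumed rule of thumb: heap >= total index size, in whole gigabytes.
    """
    total_bytes = sum(parse_size(size) for size in store_sizes)
    gigabytes = max(1, int(math.ceil(total_bytes / float(1024 ** 3))))
    return 'ES_JAVA_OPTS="-Xms{0}g -Xmx{0}g"'.format(gigabytes)


# store.size column from the listing above.
print(suggest_heap_opts(["1.2kb", "1.2kb", "101.4mb", "2.9gb"]))
```

Treat the result as a starting point only; the right heap size also depends on the memory available on the server.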

To run the script, download it to the Archivematica server and run it as a user with sudo privileges. For instance:


chmod +x


In our example, this script took 11 minutes to complete. If it fails, check the logs (/var/log/elasticsearch.log). Most likely, the JVM ran out of heap memory. You can start over by restoring your backup or putting back the old indices.

More details can be found in the Archivematica upgrading docs.