
ahmia/ahmia-index


Ahmia index

The Ahmia search engine uses Elasticsearch to index content.

Installation

  • Install Elasticsearch 6.2+ from the official repository, following the official installation guide.
  • Install python3, python3-pip.
  • Install the required Python packages, preferably in a virtualenv, with:
pip install -r requirements.txt

Ensure that your default Python version is Python 3:

python --version

Configuration

example.env contains default values that should work out of the box. Copy it to .env to create your own instance of the environment settings:

cp example.env .env

Review the .env file to ensure that it fits your needs. Make any modifications needed there.
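When reviewing .env, it can help to check that no setting from example.env was dropped while editing. The sketch below assumes both files use simple shell-style KEY=VALUE lines with # comments. The project may load them differently, for example with a library such as python-dotenv. The helper names `parse_env` and `missing_keys` are hypothetical and not part of this repository.

```python
from typing import Dict, List


def parse_env(text: str) -> Dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and # comments.

    Assumption: .env uses simple shell-style KEY=VALUE lines; an optional
    leading "export " and one pair of matching quotes are stripped.
    """
    values: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key = key.strip()
        if key.startswith("export "):
            key = key[len("export "):].strip()
        value = value.strip()
        # Drop one pair of matching single or double quotes around the value.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "'\"":
            value = value[1:-1]
        values[key] = value
    return values


def missing_keys(example_text: str, env_text: str) -> List[str]:
    """Keys defined in example.env but absent from .env, in file order."""
    env = parse_env(env_text)
    return [key for key in parse_env(example_text) if key not in env]
```

For example, `missing_keys(Path("example.env").read_text(), Path(".env").read_text())` (with `from pathlib import Path`) returns an empty list when every default key is still present.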

Elasticsearch

The default configuration is enough to run the index in development mode. Here is a suggested, more robust configuration for production, which raises the file and memory-lock limits and sizes the JVM heap:

/etc/security/limits.conf

elasticsearch - nofile unlimited
elasticsearch soft memlock unlimited
elasticsearch hard memlock unlimited

/etc/elasticsearch/jvm.options

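As a general rule, set -Xms and -Xmx to the SAME value. That value should be no more than 50% of your total available RAM, and it should stay below the roughly 32 GB threshold above which the JVM stops using compressed object pointers.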

-Xms15g
-Xmx15g
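As a worked example of the sizing rule, the sketch below computes a whole-gigabyte heap size from total RAM: half of the RAM, capped below the compressed-pointer threshold. The cap of 26 GB is an assumed conservative value, because the exact cutoff varies by JVM and system. The function names are illustrative and not part of this repository.

```python
from typing import List

# Assumption: a conservative heap cap (GB) that stays well under the ~32 GB
# compressed-oops threshold; check the exact cutoff for your JVM.
DEFAULT_CAP_GB = 26


def heap_size_gb(total_ram_gb: int, cap_gb: int = DEFAULT_CAP_GB) -> int:
    """Half of total RAM in whole GB, never more than cap_gb."""
    if total_ram_gb < 2:
        raise ValueError("need at least 2 GB of RAM to size the heap")
    return min(total_ram_gb // 2, cap_gb)


def jvm_heap_options(total_ram_gb: int, cap_gb: int = DEFAULT_CAP_GB) -> List[str]:
    """Matching -Xms/-Xmx lines for jvm.options (both set to the same value)."""
    size = heap_size_gb(total_ram_gb, cap_gb)
    return [f"-Xms{size}g", f"-Xmx{size}g"]


if __name__ == "__main__":
    # A machine with 30 GB of RAM gets the -Xms15g / -Xmx15g values shown above.
    print("\n".join(jvm_heap_options(30)))
```

On a 128 GB machine this returns 26 GB rather than 64 GB. The rest of the memory is left to the operating system's file cache, which Elasticsearch relies on heavily.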

/etc/default/elasticsearch

MAX_OPEN_FILES=unlimited
MAX_LOCKED_MEMORY=unlimited

/etc/elasticsearch/elasticsearch.yml

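bootstrap.memory_lock: true
# Elasticsearch 5.0 renamed bootstrap.mlockall to bootstrap.memory_lock.
# Groovy scripting was removed in Elasticsearch 6.0, so the two settings
# below only apply to older versions; leave them out on 6.2+.
# script.engine.groovy.inline.update: on
# script.engine.groovy.inline.aggs: on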
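After restarting Elasticsearch, you can check that memory locking actually took effect. This is `bootstrap.memory_lock` in Elasticsearch 5.0+ and `bootstrap.mlockall` before that. The nodes info API reports `process.mlockall` for each node. The sketch below assumes a single unauthenticated node on localhost:9200, and the helper names are hypothetical.

```python
import json
import urllib.request

# Assumption: an unauthenticated node listening on localhost:9200.
NODES_URL = "http://localhost:9200/_nodes?filter_path=**.mlockall"


def all_nodes_locked(response: dict) -> bool:
    """True if every node in a _nodes response reports process.mlockall == true."""
    nodes = response.get("nodes", {})
    if not nodes:
        return False
    return all(node.get("process", {}).get("mlockall") is True for node in nodes.values())


def fetch_nodes(url: str = NODES_URL) -> dict:
    """GET the _nodes info API and decode the JSON body."""
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)
```

Call `all_nodes_locked(fetch_nodes())` once the service is up. A result of False means the memlock limits above are not reaching the process. On systemd-managed installs, Elastic's documentation describes setting `LimitMEMLOCK=infinity` in a unit override for that case.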

Start the service

# systemctl start elasticsearch
$ curl -XPUT 'http://localhost:9200/_all/_settings?preserve_existing=true' -H 'Content-Type: application/json' -d '{
  "index.max_result_window" : "30000"
}'
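Elasticsearch rejects a search whose `from + size` exceeds `index.max_result_window`, which defaults to 10000. The settings call above raises that limit to 30000 so that deep result pages still work. The sketch below shows the bound a client paging through results would run into. The function name is illustrative, and `query_string` is just one possible query type, not necessarily what Ahmia's frontend sends.

```python
# Value applied by the _settings call above; Elasticsearch's default is 10000.
MAX_RESULT_WINDOW = 30000


def page_body(text: str, page: int, size: int = 10, window: int = MAX_RESULT_WINDOW) -> dict:
    """Build a _search request body for a 1-based results page.

    Raises ValueError when from + size would exceed max_result_window,
    which Elasticsearch would otherwise reject with an error.
    """
    if page < 1 or size < 1:
        raise ValueError("page and size must be positive")
    offset = (page - 1) * size
    if offset + size > window:
        raise ValueError(f"from + size = {offset + size} exceeds max_result_window = {window}")
    # query_string searches the index's default fields; it is only an example query.
    return {"from": offset, "size": size, "query": {"query_string": {"query": text}}}
```

With 10 results per page, page 3000 (from = 29990) is the last one allowed under the raised limit. With the default window of 10000, paging would stop at page 1000.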

Init mappings

Do this the first time you run the index:

$ bash setup_index.sh

Alternatively, you can set up the indices manually, for example:

$ curl -XPUT -i "localhost:9200/tor-2018-01/" -H 'Content-Type: application/json' -d "@./mappings_tor.json"
$ curl -XPUT -i "localhost:9200/i2p-2018-01/" -H 'Content-Type: application/json' -d "@./mappings_i2p.json"
$ curl -XPUT -i "localhost:9200/tor-2018-02/" -H 'Content-Type: application/json' -d "@./mappings_tor.json"
$ curl -XPUT -i "localhost:9200/i2p-2018-02/" -H 'Content-Type: application/json' -d "@./mappings_i2p.json"
...
...
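The manual commands follow a `<network>-YYYY-MM` naming pattern: one tor and one i2p index per month, each created with its own mapping file. Below is a sketch that generates the same commands for a date range. It assumes that naming and those two mapping files, and it is not a reimplementation of `setup_index.sh`. The helper names are hypothetical.

```python
from datetime import date
from typing import List, Tuple

# Assumption: the naming used in the curl examples above
# (tor-YYYY-MM -> mappings_tor.json, i2p-YYYY-MM -> mappings_i2p.json).
NETWORKS = ("tor", "i2p")


def index_name(network: str, day: date) -> str:
    return f"{network}-{day.year:04d}-{day.month:02d}"


def months(start: date, end: date) -> List[date]:
    """First day of every month from start's month through end, inclusive."""
    current = date(start.year, start.month, 1)
    out: List[date] = []
    while current <= end:
        out.append(current)
        # Advance to the first day of the next month.
        if current.month == 12:
            current = date(current.year + 1, 1, 1)
        else:
            current = date(current.year, current.month + 1, 1)
    return out


def creation_plan(start: date, end: date) -> List[Tuple[str, str]]:
    """(index name, mapping file) pairs in the same order as the curl examples."""
    return [
        (index_name(net, month), f"./mappings_{net}.json")
        for month in months(start, end)
        for net in NETWORKS
    ]


if __name__ == "__main__":
    for name, mapping in creation_plan(date(2018, 1, 1), date(2018, 2, 1)):
        print(f"curl -XPUT -i \"localhost:9200/{name}/\" -H 'Content-Type: application/json' -d \"@{mapping}\"")
```

Running it prints the four commands listed above, in the same order.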

Keep the latest-tor and latest-i2p aliases pointed at the latest monthly indices

Run this when you first deploy, and then once per month:

$ python point_to_indexes.py

Filter some abuse sites

$ bash call_filtering.sh

Crontab

# Execute child abuse text filtering over the index every hour
30 * * * * cd /home/juha/ahmia-index && bash wrap_filtering.sh > ./crontab_filter.log 2>&1
# On the 1st of each month:
10 04 01 * * cd /home/juha/ahmia-index && python point_to_indexes.py --add > ./add_alias.log 2>&1
# On the 16th of each month:
10 04 16 * * cd /home/juha/ahmia-index && python point_to_indexes.py --rm > ./remove_alias.log 2>&1
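The cron entries above run `point_to_indexes.py --add` on the 1st and `--rm` on the 16th. This suggests a two-step rollover: the new month's index joins each alias first, and the previous month's index leaves it two weeks later. That reading is an assumption and is not taken from the script itself. Below is a sketch of matching request bodies for Elasticsearch's `POST /_aliases` API, with hypothetical helper names.

```python
import json
from datetime import date
from typing import Dict, List

NETWORKS = ("tor", "i2p")


def previous_month(day: date) -> date:
    """First day of the month before `day`."""
    return date(day.year - 1, 12, 1) if day.month == 1 else date(day.year, day.month - 1, 1)


def alias_actions(day: date, action: str) -> Dict[str, List[dict]]:
    """Build a POST /_aliases body (assumed rollover, not point_to_indexes.py).

    action="add": attach this month's index to latest-<network>.
    action="remove": detach last month's index from latest-<network>.
    """
    if action not in ("add", "remove"):
        raise ValueError("action must be 'add' or 'remove'")
    month = day if action == "add" else previous_month(day)
    return {
        "actions": [
            {action: {"index": f"{net}-{month.year:04d}-{month.month:02d}", "alias": f"latest-{net}"}}
            for net in NETWORKS
        ]
    }


if __name__ == "__main__":
    print(json.dumps(alias_actions(date(2018, 2, 1), "add"), indent=2))
```

Sending each body as one `_aliases` request applies its actions atomically. Between the 1st and the 16th, both months are searchable through the alias while the new index fills up.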

About

Ahmia's Elasticsearch index
