drkane/find-that-charity

Find that charity

Elasticsearch-powered search engine for charities and other non-profit organisations. It provides:

  • Import of data from nearly 20 UK sources, with duplicates matched to a single record.
  • An Elasticsearch index that can be queried.
  • Org-ids added to each organisation.
  • A reconciliation API for searching organisations, based on an optimised search query.
  • A facility for uploading a CSV of charity names and adding a best-guess charity number to each.
  • HTML pages for searching for charities.
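The README does not describe how duplicates are matched. One common approach, sketched here as an assumption rather than the project's method, is to give every record an org-id (using org-id.guide prefixes such as GB-CHC for the Charity Commission for England and Wales) and treat records that share or cross-reference an org-id as one organisation, grouping them with a union-find structure. The source keys in `PREFIXES`, the record format and the function names are hypothetical, and only a few registers are covered.

```python
from __future__ import annotations

# A few real org-id.guide prefixes for UK registers; the dict keys are hypothetical source names.
PREFIXES = {
    "ccew": "GB-CHC",       # Charity Commission for England and Wales
    "oscr": "GB-SC",        # Office of the Scottish Charity Regulator
    "ccni": "GB-NIC",       # Charity Commission for Northern Ireland
    "companies": "GB-COH",  # Companies House
}


def make_org_id(source: str, number: str) -> str:
    """Build an org-id such as 'GB-CHC-1234567' from a source key and a registration number."""
    return f"{PREFIXES[source]}-{number.strip().upper()}"


class UnionFind:
    """Disjoint-set structure over org-id strings."""

    def __init__(self) -> None:
        self.parent: dict[str, str] = {}

    def find(self, x: str) -> str:
        self.parent.setdefault(x, x)  # register unseen ids as their own root
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a: str, b: str) -> None:
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self.parent[root_b] = root_a


def group_duplicates(records: list[tuple[str, list[str]]]) -> list[set[str]]:
    """Group (org_id, linked_org_ids) records into sets, one set per organisation."""
    uf = UnionFind()
    for org_id, linked in records:
        uf.find(org_id)  # make sure records with no links still appear
        for other in linked:
            uf.union(org_id, other)
    groups: dict[str, set[str]] = {}
    for node in list(uf.parent):
        groups.setdefault(uf.find(node), set()).add(node)
    return sorted(groups.values(), key=sorted)


if __name__ == "__main__":
    charity = make_org_id("ccew", "1234567")  # made-up registration numbers
    company = make_org_id("companies", "01234567")
    scottish = make_org_id("oscr", "sc012345")
    print(group_duplicates([(charity, [company]), (company, []), (scottish, [])]))
```

Union-find makes the matching transitive: if a charity record links to a company and a second charity record links to the same company, all three end up in one group.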

Installation

  1. Clone the repository
  2. Create a virtual environment (python -m venv env)
  3. Activate the virtual environment (source env/bin/activate on Linux/macOS, env\Scripts\activate on Windows)
  4. Install requirements (pip install -r requirements.txt)
  5. Install PostgreSQL
  6. Start PostgreSQL
  7. Install Elasticsearch 7. You may need to increase its available memory (see the Dokku instructions below).
  8. Start Elasticsearch
  9. Create a .env file in the root directory, based on .env.example.
  10. Create the database tables (python ./manage.py migrate && python ./manage.py createcachetable)
  11. Import data on charities (python ./manage.py import_charities)
  12. Import data on non-profit companies (python ./manage.py import_companies)
  13. Import data on other non-profit organisations (python ./manage.py import_all)
  14. Add organisations to the Elasticsearch index (python ./manage.py es_index). Don't use the default search_index command, as it won't set up the aliases correctly.
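The README doesn't say exactly how es_index uses aliases. A common pattern, assumed here, is to build each import into a fresh index and then repoint a stable alias at it with a single POST /_aliases request; Elasticsearch applies all actions in that request atomically, so searches never hit a missing or half-built index. This sketch only builds the request body using the documented add/remove alias actions; the alias and index names are hypothetical.

```python
import json


def alias_swap_actions(alias, new_index, old_indices):
    """Return the body for POST /_aliases that moves `alias` onto `new_index`.

    Old indices are detached and the new one attached in the same request,
    which Elasticsearch applies atomically.
    """
    actions = [{"remove": {"index": old, "alias": alias}} for old in old_indices]
    actions.append({"add": {"index": new_index, "alias": alias}})
    return {"actions": actions}


if __name__ == "__main__":
    body = alias_swap_actions(
        alias="organisations",                   # hypothetical alias name
        new_index="organisations-20240101",      # hypothetical new index
        old_indices=["organisations-20231201"],  # hypothetical previous index
    )
    print(json.dumps(body, indent=2))
```

The application then always queries the alias, never a dated index name, which is why an indexing command that skips the alias step would leave searches pointing at the wrong (or no) index.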

Dokku Installation

1. Set up dokku server

SSH into the server and run:

# create app
dokku apps:create ftc

# postgres
sudo dokku plugin:install https://github.com/dokku/dokku-postgres.git postgres
dokku postgres:create ftc-db
dokku postgres:link ftc-db ftc

# elasticsearch
sudo dokku plugin:install https://github.com/dokku/dokku-elasticsearch.git elasticsearch
export ELASTICSEARCH_IMAGE="elasticsearch"
export ELASTICSEARCH_IMAGE_VERSION="7.7.1"
dokku elasticsearch:create ftc-es
dokku elasticsearch:link ftc-es ftc
# configure elasticsearch 7:
# https://github.com/dokku/dokku-elasticsearch/issues/72#issuecomment-510771763

# increase Elasticsearch memory (may be needed)
nano /var/lib/dokku/services/elasticsearch/ftc-es/config/jvm.options
# replace `-Xms512m` with `-Xms2g`
# replace `-Xmx512m` with `-Xmx2g`
# restart elasticsearch
dokku elasticsearch:restart ftc-es

# SSL
sudo dokku plugin:install https://github.com/dokku/dokku-letsencrypt.git
dokku config:set --no-restart ftc DOKKU_LETSENCRYPT_EMAIL=your@email.tld
dokku letsencrypt ftc
dokku letsencrypt:cron-job --add
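The heap change in step 1 can be scripted instead of done by hand in nano. This sketch rewrites every -Xms (initial heap) and -Xmx (maximum heap) line in jvm.options to the same size, since Elasticsearch recommends keeping the two equal. The path is the one used above; the function names are hypothetical, and writing the file needs root on the Dokku server.

```python
import re
from pathlib import Path

JVM_OPTIONS = Path("/var/lib/dokku/services/elasticsearch/ftc-es/config/jvm.options")

# Matches whole lines such as "-Xms512m" or "-Xmx512m"; commented lines ("## -Xms4g") are left alone.
HEAP_LINE = re.compile(r"^(-Xm[sx])\S+$", re.MULTILINE)


def set_heap(text, size="2g"):
    """Return jvm.options text with every -Xms/-Xmx line set to `size`."""
    return HEAP_LINE.sub(lambda m: f"{m.group(1)}{size}", text)


def update_file(path=JVM_OPTIONS, size="2g"):
    """Rewrite the heap settings in place."""
    path.write_text(set_heap(path.read_text(), size))
```

After calling update_file(), restart the service with `dokku elasticsearch:restart ftc-es` as in step 1.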

2. Add as a git remote and push

On your local machine:

git remote add dokku dokku@SERVER_HOST:ftc
git push dokku main:master

3. Set up and run the import

On the Dokku server, run:

# setup
dokku run ftc python ./manage.py migrate
dokku run ftc python ./manage.py createcachetable

# run import
dokku run ftc python ./manage.py charity_setup
dokku run ftc python ./manage.py import_charities
dokku run ftc python ./manage.py import_companies
dokku run ftc python ./manage.py import_all
dokku run ftc python ./manage.py es_index
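Step 3 runs seven commands in a fixed order, and later steps depend on earlier ones (the tables must exist before importing, and es_index indexes what the imports loaded). A small wrapper, sketched here and not part of the project, can run them in sequence and stop at the first failure. It assumes it runs on the Dokku server with the dokku CLI on PATH and the app named ftc; by default it only prints the commands, and passing --run executes them.

```python
import shlex
import subprocess
import sys

# Management commands from step 3, in the order they must run.
STEPS = [
    "migrate",
    "createcachetable",
    "charity_setup",
    "import_charities",
    "import_companies",
    "import_all",
    "es_index",
]


def build_commands(app="ftc"):
    """Return the argv list for each `dokku run` step."""
    return [["dokku", "run", app, "python", "./manage.py", step] for step in STEPS]


def run_all(commands, dry_run=True):
    """Print each command and, unless dry_run, execute it."""
    for argv in commands:
        print("$", shlex.join(argv))
        if not dry_run:
            # check=True raises CalledProcessError on a non-zero exit, halting the sequence
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    run_all(build_commands(), dry_run="--run" not in sys.argv)
```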

Server

The server uses Django. Run it with the following command:

python ./manage.py runserver

The server offers the following API endpoints:

  • /reconcile: a reconciliation service API conforming to the OpenRefine reconciliation API specification.

  • /charity/12345: look up information about a particular charity (here, charity number 12345)
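A minimal client for /reconcile, sketched from the OpenRefine reconciliation API rather than from this project's code: a batch of queries is sent as a JSON object in the `queries` form field, and the response maps each query key to a `result` list of candidates with `id`, `name`, `score` and `match`. The base URL assumes the development server on runserver's default port, and POST support is assumed because the specification allows it; the helper names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "http://127.0.0.1:8000/reconcile"  # assumed local runserver address


def build_queries(names, limit=3):
    """Key each name as q0, q1, ... (the style OpenRefine uses) with a result limit."""
    return {f"q{i}": {"query": name, "limit": limit} for i, name in enumerate(names)}


def reconcile(names, base_url=BASE_URL):
    """POST the batch and return the decoded JSON response."""
    data = urlencode({"queries": json.dumps(build_queries(names))}).encode()
    with urlopen(base_url, data=data) as resp:  # passing data= makes this a POST
        return json.load(resp)


def top_candidates(response):
    """Return {query_key: highest-scoring candidate, or None if there were no results}."""
    out = {}
    for key, payload in response.items():
        results = payload.get("result", [])
        out[key] = max(results, key=lambda r: r["score"]) if results else None
    return out
```

With the server running, `top_candidates(reconcile(["Oxfam", "Shelter"]))` would return the best guess for each name.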

Todo

The project is currently a proof of concept and needs some work to get up and running.

Priorities:

  • tests to ensure data is imported correctly
  • server tests
  • use the results of server/recon_test.py to produce the best reconciliation search query for the server (recon_test_7 seems the best at the moment)
  • a threshold for deciding when to use a result and when to discard it
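One possible shape for the threshold rule, offered as a hypothetical sketch: accept the top candidate only if its score clears an absolute threshold and beats the runner-up by a margin, so close calls between similarly named organisations are discarded (or left for manual review). The threshold and margin values are placeholders to be tuned against the recon_test.py results, not figures from the project.

```python
def choose(candidates, threshold=50.0, margin=5.0):
    """Return the accepted candidate, or None if the result should be discarded.

    candidates: list of dicts with at least a numeric "score" key.
    """
    if not candidates:
        return None
    ranked = sorted(candidates, key=lambda c: c["score"], reverse=True)
    best = ranked[0]
    if best["score"] < threshold:
        return None  # too weak a match overall
    if len(ranked) > 1 and best["score"] - ranked[1]["score"] < margin:
        return None  # ambiguous: runner-up is too close
    return best
```

The margin check matters because a strong score alone does not distinguish between, say, two branches of the same national charity with near-identical names.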

Future development:

  • upload a CSV file and reconcile each row with a charity
  • allow updating a charity with additional possible names
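Reconciling an uploaded CSV row by row could reuse the reconciliation query format. This sketch turns a CSV into batches of OpenRefine-style queries keyed by row index, so each result can be written back to the row it came from. It assumes a header row and a name column (the column name "name" is hypothetical), skips blank names, and uses an arbitrary batch size of 10 to keep each request small.

```python
import csv
import io
from itertools import islice


def query_batches(csv_file, column="name", batch_size=10):
    """Yield dicts of reconciliation queries keyed q<row index>, batch_size rows at a time."""
    rows = csv.DictReader(csv_file)
    numbered = ((i, (row.get(column) or "").strip()) for i, row in enumerate(rows))
    while True:
        chunk = list(islice(numbered, batch_size))
        if not chunk:
            return
        batch = {f"q{i}": {"query": name} for i, name in chunk if name}
        if batch:  # skip chunks where every name was blank
            yield batch


if __name__ == "__main__":
    sample = io.StringIO("name,town\nOxfam,Oxford\nShelter,London\n,Leeds\n")
    for batch in query_batches(sample, batch_size=2):
        print(batch)
```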

Testing

coverage run manage.py test && coverage html
python -m http.server -d htmlcov --bind 127.0.0.1 8001