This repository is the home of the UK Charity Activity Tags, a project to classify every UK registered charity using two classification taxonomies.
The project was a collaboration between NCVO Research, Dr Christopher Damm at the Centre for Regional Economic and Social Research (Sheffield Hallam University) and David Kane, an independent freelance researcher. The project was funded by Esmée Fairbairn Foundation.
The project started in Autumn 2020 with the first draft release of data in Summer 2021.
The classification and the results are licensed under a Creative Commons Attribution 4.0 International License.
The scripts included in the repository were created using Python 3.9. They are likely to work with other versions of Python too.
To use the python scripts, you'll need to install the required packages. The best way to do this is with a virtual environment:
```
python -m venv env        # creates a virtual environment in the ./env directory

# now activate the virtual environment
env\Scripts\activate      # (on Windows)
source env/bin/activate   # (on Unix/macOS)

pip install -r requirements.txt   # installs the requirements
```
You can then run the python scripts as described above - remember to activate the virtual environment every time you open a new terminal.
Dependencies are managed using pip-tools. First install it with:
```
python -m pip install pip-tools wheel setuptools
```
Then add any additional dependencies to `requirements.in`. Run `pip-compile` to create an updated `requirements.txt` file, and then run `pip-sync` to install the new requirements.
!! Important - don't edit the `requirements.txt` file directly; it should only be updated with `pip-compile`.
Data outputs from the project. The following resources are available:
- ICNP/TSO: `icnptso.csv`
- UK-CAT: `ukcat.csv`
These files show the charities that were manually classified as part of this project.
These files show the results of running automatic classification for UK-CAT and ICNP/TSO against the latest lists of active and inactive charities in the UK.
The UK-CAT classification uses a system of rules-based classification, as described in the methodology. The ICNP/TSO classification uses a machine learning model, whose predictions are overwritten by any manual classifications found in the sample.
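A minimal sketch of how rules-based tagging works. The tag codes and regular expressions below are invented for illustration; the real keyword rules live in `./data/ukcat.csv`.

```python
import re

# Hypothetical extract of the regex keyword rules: each UK-CAT tag
# is applied when its regular expression matches the charity's text.
RULES = {
    "AR101": re.compile(r"\b(art|arts|artist)s?\b", re.I),
    "AN101": re.compile(r"\b(animal|wildlife)s?\b", re.I),
}

def apply_tags(text: str) -> list[str]:
    """Return the UK-CAT tags whose regex matches the charity's text."""
    return [tag for tag, pattern in RULES.items() if pattern.search(text)]

print(apply_tags("Promoting the arts and supporting local artists"))
```

Because a charity's text can match several rules, one charity can receive multiple tags, which is why the output files contain multiple rows per charity.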
- `charities_active-ukcat.csv`
- `charities_inactive-ukcat.csv`
- `charities_active-icnptso.csv`
- `charities_inactive-icnptso.csv`
This directory contains the project documentation, which is turned into a website using mkdocs.
You can run a local version of the docs using `mkdocs serve`.
The website is generated using GitHub Actions.
These notebooks contain code for processing the data, such as the machine learning model for ICNP/TSO classification.
To run the notebooks from within the virtual environment, use the following commands (from veekaybee.github.io), after installing the dependencies above:
```
ipython kernel install --user --name=ukcat
jupyter notebook # or `jupyter lab`
```
This is a python module providing commands to fetch and apply the data from this project.
The commands are as follows:
```
python -m ukcat fetch charities
```
This will create two CSV files containing data on charities. The files will be created in the `./data/` folder: `./data/charities_active.csv` and `./data/charities_inactive.csv`.
These files are based on data from the Charity Commission for England and Wales, the Scottish Charity Regulator and the Charity Commission for Northern Ireland. Data is used under the Open Government Licence.
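As a quick sketch of consuming these files with the standard library. The column names below (`org_id`, `name`, `activities`) are placeholders; check the CSV headers for the real ones.

```python
import csv
import io

# Inline stand-in for ./data/charities_active.csv; the real file's
# columns will differ, so treat these headers as hypothetical.
sample = io.StringIO(
    "org_id,name,activities\n"
    "GB-CHC-123456,Example Trust,Supporting local communities\n"
    "GB-SC-654321,Sample Society,Advancing education\n"
)

rows = list(csv.DictReader(sample))
for row in rows:
    print(row["org_id"], row["name"])
```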
These are internal project scripts that fetch data from the Airtable bases used by the project. They are used to create the two files `./data/sample.csv` and `./data/ukcat.csv` that are already available in the repo. They will only work with the correct Airtable credentials.
To fetch data you need to set two environment variables containing the Airtable base ID and API key. The easiest way is to create a file called `.env` in this directory and add the following lines (with the correct values):
```
AIRTABLE_API_KEY=keyGOESHERE
AIRTABLE_BASE_ID=appGOESHERE
```
An example can be found in `.env-sample`.
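One way to load these variables in Python, as a sketch using only the standard library; the project itself may rely on a package such as python-dotenv instead.

```python
import os

def load_env(path: str = ".env") -> None:
    """Read KEY=VALUE lines from a .env-style file into os.environ."""
    with open(path) as f:
        for line in f:
            line = line.strip()
            # skip blank lines and comments
            if line and not line.startswith("#") and "=" in line:
                key, _, value = line.partition("=")
                os.environ.setdefault(key.strip(), value.strip())

# After load_env() runs, the scripts can read:
#   os.environ["AIRTABLE_API_KEY"] and os.environ["AIRTABLE_BASE_ID"]
```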
The commands to fetch the data are:
```
python -m ukcat fetch tags
python -m ukcat fetch sample
python -m ukcat fetch sample --table-name="Top charities" --save-location="./data/top2000.csv"
```
This script creates a logistic regression model for the ICNP/TSO categories, using the data found in `sample.csv` and `top2000.csv`, based on the parameters created in `./notebooks/icnptso-machine-learning-test.ipynb`.
```
python -m ukcat train icnptso
```
By default this will save the model as a pickle file to `./data/icnptso_ml_model.pkl`.
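A rough sketch of what such a training step looks like, using invented toy data and default parameters rather than the project's tuned model; it assumes scikit-learn is installed from `requirements.txt`.

```python
import pickle

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Invented toy examples: charity text -> ICNP/TSO-style category label.
texts = [
    "grants for medical research and hospitals",
    "hospice nursing and patient care",
    "village hall for local community events",
    "support for community residents and volunteers",
    "primary school education fund",
    "scholarships for university students",
]
labels = ["health", "health", "community", "community", "education", "education"]

# TF-IDF text features feeding a logistic regression classifier.
model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
model.fit(texts, labels)

# The real script pickles the fitted model to ./data/icnptso_ml_model.pkl.
blob = pickle.dumps(model)

# predict_proba supplies the per-category probability reported in the outputs.
probs = model.predict_proba(["bursaries for school pupils"])[0]
```

Pickling the whole pipeline (vectoriser plus classifier) means the saved file can be loaded later and applied directly to raw charity text.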
This script uses the regular expressions from `./data/ukcat.csv` to apply tags to a list of charities.
```
python -m ukcat apply ukcat --charity-csv "./data/charities_active.csv" -f "name" -f "activities"
python -m ukcat apply ukcat --charity-csv "./data/charities_inactive.csv" -f "name" -f "objects"
```
This will create the `charities_active-ukcat.csv` and `charities_inactive-ukcat.csv` files that are included in the `./data/` folder. Each file contains multiple rows per charity, showing the UK-CAT tags that have been applied based on the regular expression keywords.
You can include the name of the charity and the tag name by adding the `--add-names` option. You can also add "parent" codes into the same data by using the `--include-groups` option.
This script uses the machine learning model created in `./notebooks/icnptso-machine-learning-test.ipynb` to find the best ICNP/TSO category for a list of charities. If the model doesn't already exist, it will be created using the files `sample.csv` and `top2000.csv`.
```
python -m ukcat apply icnptso --charity-csv "./data/charities_active.csv" -f "name" -f "activities"
python -m ukcat apply icnptso --charity-csv "./data/charities_inactive.csv" -f "name" -f "objects"
```
This will create the `charities_active-icnptso.csv` and `charities_inactive-icnptso.csv` files that are included in the `./data/` folder. Each file gives a row per charity with the best estimated ICNP/TSO category, along with the model's estimated probability that the category is correct.
You can include the name of the charity and the tag name by adding the `--add-names` option.