Popit relationship fetcher and importer
This script pulls data from politikus.sinarproject.org and caches it in networkx to enable offline processing. The cache can then be saved into a Neo4j database for further processing and visualization.
Building and Installing
The project depends on the following tools and Python packages in order to build and install properly.
- Python 3.6 and up
- Neo4j: development targets Neo4j 4.1, but earlier versions should work.
- Poetry: follow the official installation instructions.
- Python wheel - you can install via pip
pip3 install wheel
- To generate graphs, Python needs to be built with Tk support, which requires the tk-dev package on Ubuntu.
- Clone this project
git clone https://github.com/Sinar/popit_relationship
cd popit_relationship
- Install and build the project
Install the built project with pip (the filename of the .whl file may vary). Please ensure your PATH is configured properly.
pip3 install ./dist/popit_relationship-0.1.0-py3-none-any.whl
If you are reinstalling after pulling the latest changes, add the --force-reinstall flag:
pip3 install --force-reinstall ./dist/popit_relationship-0.1.0-py3-none-any.whl
Most of the configuration is stored in the .env file; refer to .env.example for an example. Apart from NEO4J_URI, the script should work with the default settings.
- NEO4J_AUTH stores the username and password pair, separated by a forward slash (/)
- NEO4J_URI stores the URI to the Neo4j database
- ENDPOINT_API stores the endpoint API URI, which currently defaults to https://politikus.sinarproject.org/@search; the script should work with other similar APIs
- CRAWL_INTERVAL stores the time to wait between API calls (a default value is set)
- CACHE_PATH stores the path to the cache file (a default value is set)
The configuration environment variables can be overridden when executing the script (please refer to the usage examples below).
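As an illustration of how these settings fit together, the following minimal sketch reads them from an environment mapping and splits NEO4J_AUTH into its username and password. The helper name load_settings and the fallback values are hypothetical and chosen only for this example (the ENDPOINT_API default is the one quoted above); they are not taken from the project's code.

```python
import os


def load_settings(environ=None):
    """Collect the documented settings from an environment mapping.

    Hypothetical helper: the fallback values are placeholders for
    illustration, except ENDPOINT_API, whose default is quoted in this README.
    """
    env = os.environ if environ is None else environ

    # NEO4J_AUTH is "username/password"; split on the first slash only so
    # that a password containing "/" stays intact.
    username, _, password = env.get("NEO4J_AUTH", "neo4j/password").partition("/")

    return {
        "neo4j_uri": env.get("NEO4J_URI", "bolt://localhost:7687"),  # placeholder
        "neo4j_auth": (username, password),
        "endpoint_api": env.get(
            "ENDPOINT_API", "https://politikus.sinarproject.org/@search"
        ),
        # Placeholder value; the unit (seconds) is an assumption.
        "crawl_interval": float(env.get("CRAWL_INTERVAL", "1")),
        "cache_path": env.get("CACHE_PATH", "cache.gpickle"),  # placeholder
    }


if __name__ == "__main__":
    settings = load_settings({"NEO4J_AUTH": "neo4j/someOtherPassword"})
    print(settings["neo4j_auth"])
```

Passing a mapping explicitly, instead of always reading os.environ, keeps the helper easy to exercise without touching the real environment.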
After following the installation guide, and provided the Python environment is properly configured, a script named primport should be available. Sub-commands can then be issued for different tasks.
Configuration options can be overridden with environment variables, e.g. when running primport in Bash:
NEO4J_AUTH=neo4j/someOtherPassword primport reset db
- primport reset cache resets the cache file
- primport reset db clears the Neo4j database
- primport sync person fetches person data
- primport sync org fetches organization data
- primport sync post fetches post data
- primport sync membership fetches membership data
- primport sync relationship fetches relationship data
- primport sync ownership fetches data from the Ownership Control Statement API
- primport sync all fetches all of the above
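The sync sub-commands call the API repeatedly, waiting CRAWL_INTERVAL between calls. A rough sketch of such a throttled loop is shown below. The names crawl and fetch_page, as well as the page-numbering scheme, are hypothetical stand-ins rather than the project's actual implementation, and no real HTTP request is made.

```python
import time


def crawl(fetch_page, interval, sleep=time.sleep):
    """Yield items from successive pages, pausing `interval` between calls.

    `fetch_page(page_number)` is a hypothetical callable that returns a list
    of items, or an empty list once there is nothing left to fetch.
    """
    page = 0
    while True:
        items = fetch_page(page)
        if not items:
            break
        yield from items
        page += 1
        # Wait before the next API call so the server is not flooded.
        sleep(interval)


if __name__ == "__main__":
    pages = [["person-1", "person-2"], ["person-3"]]

    def fake_fetch(number):
        # Stand-in for an API call: serve canned pages, then an empty list.
        return pages[number] if number < len(pages) else []

    print(list(crawl(fake_fetch, interval=0)))
```

The sleep function is injectable so the loop can be exercised without actually waiting.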
primport visualize $node1 [$node2 $node3 ...] generates a graph from the cache that includes the given nodes ($node2, $node3, etc. are optional).
- Each $node is a URI to an entity.
- The maximum depth can be overridden by passing --depth, e.g. --depth=1 (a default value applies otherwise).
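To illustrate what a depth limit means for the generated graph, here is a minimal sketch of a depth-limited breadth-first traversal over a plain adjacency mapping. It is not the project's implementation; the function name neighbourhood and the sample URIs are made up for the example, and a plain dict stands in for the cached graph.

```python
from collections import deque


def neighbourhood(graph, start_nodes, depth):
    """Return every node reachable from `start_nodes` within `depth` hops.

    `graph` is a dict mapping a node to an iterable of its neighbours,
    a simplified stand-in for the cached graph.
    """
    seen = set(start_nodes)
    queue = deque((node, 0) for node in start_nodes)
    while queue:
        node, distance = queue.popleft()
        if distance == depth:
            continue  # do not expand beyond the maximum depth
        for neighbour in graph.get(node, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, distance + 1))
    return seen


if __name__ == "__main__":
    sample = {
        "uri:person-a": ["uri:org-b"],
        "uri:org-b": ["uri:person-c"],
        "uri:person-c": [],
    }
    print(sorted(neighbourhood(sample, ["uri:person-a"], depth=1)))
```

With several starting nodes, the result is the union of their neighbourhoods, which mirrors passing more than one $node on the command line.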
Saving to the database
primport save saves the cached data to the Neo4j database to allow further work.
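Conceptually, saving the cache means turning each cached edge into a write against Neo4j. The sketch below only builds a parameterized Cypher MERGE statement as a string and does not connect to a database. The Entity label, the RELATED relationship type, the property names and the helper edge_to_cypher are all assumptions made for illustration and may differ from what primport save actually writes.

```python
# Hypothetical Cypher template: the label, relationship type and property
# names are assumed for this example only.
MERGE_EDGE = (
    "MERGE (a:Entity {uri: $source}) "
    "MERGE (b:Entity {uri: $target}) "
    "MERGE (a)-[:RELATED {kind: $kind}]->(b)"
)


def edge_to_cypher(source, target, kind):
    """Return a (query, parameters) pair for one cached edge."""
    return MERGE_EDGE, {"source": source, "target": target, "kind": kind}


if __name__ == "__main__":
    query, params = edge_to_cypher("uri:person-a", "uri:org-b", "membership")
    print(query)
    print(params)
```

Using MERGE rather than CREATE keeps repeated saves from duplicating nodes, and passing values as parameters avoids building queries through string concatenation.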
Usage without installing the wheel package
- The script can be run without installing the wheel by replacing primport with poetry run python src/popit_relationship/primport.py, for example:
git clone https://github.com/Sinar/popit_relationship
cd popit_relationship
poetry install
poetry run python src/popit_relationship/primport.py reset db
Tests are run with pytest:
poetry run pytest