This repository contains a Django project that uses the SayIt component to provide transcripts of parliament debates in Visegrad and Balkan countries in a modern, searchable format.
See installation instructions in a separate document.
Adding of a new parliament
Each parliament is hosted on its own subdomain, e.g.
nrsr.sk.sayit.parldata.eu for National Council of Slovak Republic.
The SayIt component allows multiple subdomain based instances. All of them are stored in one database and share settings like collation (needed for language-dependent sorting), timezone or fulltext search configuration.
Because we need to set those settings individually for each parliament, the multi-instance functionality of SayIt cannot be used and we must implement it differently.
The WSGI application extracts the parliament from WSGIProcessGroup directive in the subdomain's virtual host section of the Apache configuration file and it imports parliament-specific settings. Thus WSGI processes for all subdomains share the same Django project where they add their specific settings on initialization.
The following steps are needed to add a new parliament:
Create a new database
sayit_<country_code>_<parliament_code>with collation settings corresponding to primary language of the parliament. Example:
CREATE DATABASE sayit_sk_nrsr WITH LC_CTYPE 'sk_SK.UTF-8' LC_COLLATE 'sk_SK.UTF-8' TEMPLATE template0 OWNER sayit;
When the required locale is missing in your system, create it first and restart database server:
$ sudo locale-gen xx_YY.UTF-8 $ sudo service postgresql restart
Copy one of the files in
/subdomainsdirectory and adjust its content for the new parliament.
Create database tables:
$ source /home/projects/.virtualenvs/sayit/bin/activate (sayit)$ ./manage.py syncdb --settings subdomains.<your-parliament> (sayit)$ deactivate
Connect to the new database and create additional indexes to speed-up regular data imports:
CREATE INDEX speeches_section_source_url on speeches_section(source_url); CREATE INDEX speeches_speech_source_url on speeches_speech(source_url);
/etc/apache2/sites-available/sayit.parldata.eu.conf, copy one of the
VirtualHostblocks and edit the copy to correspond with the new parliament. Then
$ sudo service apache2 reload
Create a new Google Analytics property within the existing GA account and set the two-digit code of the property in the
Based on the ElasticSearch indexing settings for the new parliament you may need to add some files for a new language to ElasticSearch config path (usually
/usr/share/elasticsearch/config) and restart it. Those files for some languages are in
/conf/elasticsearchsubdirectory in the repo.
Some useful resources on configuring languages in Elasticsearch:
Importing of data
Data are imported from
api.parldata.eu via commandline script
manage.py using the command
load_parldata and the subdomain
--settings option. Running the command without
specifying a subdomain imports data for all subdomains. The script must
be executed in virtual environment of the installation and as the user
running the webserver (because of Caching).
Quality of debates data at
api.parldata.eu for all parliaments may
be checked before initial import by a simple script
To initially import data for Slovak parliament subdomain:
$ source /home/projects/.virtualenvs/sayit/bin/activate (sayit)$ sudo -u www-data /home/projects/sayit/manage.py load_parldata --settings subdomains.sk_nrsr --initial
To load new data since the last import for all subdomains:
(sayit)$ sudo -u www-data /home/projects/sayit/manage.py load_parldata
Schedule the incremental update to be executed by Cron if regular updates are needed.
Some implementation notes
Web admin interface
Administration through web interface is disabled as well as logging in. Data can be manipulated only by the commands above.
SayIt templates that needed to be modified are duplicated from SayIt to
sayit_parladata_eu/templates directory and adjusted there. Those
templates override the original SayIt ones thanks to installed Django
SayIt uses SASS, Compass, and Foundation for its CSS. Minor tweaks for
this project are placed into a simple CSS file
manage.py collectstatic and
manage.py refresh_cache after any
All instances corresponding to the subdomains share the same codebase and
the same Django project. Each subdomain has its own
in Apache config file and its own settings in the
directory. The settings for a particular subdomain are loaded as follows:
The WSGI application extracts the parliament from WSGIProcessGroup
directive that is unique in each
VirtualHost block and it imports
settings for that parliament from
are some parliament-specific settings and then the main file with common
settings is imported in a way that passes the specific ones. The common
settings file loads private settings from
conf/private.yml file that
is not present in the repository.
The same settings loading is used in
manage.py, only the module with
parliament-specific settings is provided by
manage.py commands like
--settings directive is not needed.
Rendering of templates for long debates (hundreds of speeches) may take a long time. Because of that, caching is need.
Server-side caching on the filesystem is used for all section views and the speakers list. Pages are rendered into cache in advance by the import script for all imported or updated sections. Hence a user never waits for a template to render, the page is always served from cache.
The cache must be manually refreshed after any modification of application code that affects output of views or after any changes in CSS. Refresh the cache for all subdomains by Django command:
(sayit)$ sudo -u www-data /home/projects/sayit/manage.py refresh_cache
Django's FileBasedCache creates files accessible only by the user who created them. Because the cache is written by the import script and read by the webserver, both have to run as the same user. Therefore the import script and cache refreshment command must be executed as the webserver user, eg. www-data.