Project Empire is a set of tools designed to help with keeping information about assets and companies of inluential people or groups. It was originally created for business structures of former prime minister and owner of vast business empire Andrej Babiš, but is now generalized and can be used for anybody else.
Project was created by České centrum pro investigativní žurnalistiku, o.p.s. and co-funded by the European Commission in the framework of the Stars4Media pilot programme.
Currently running Project Empire wikis:
Person of interest | Project Empire wiki |
---|---|
Andrej Babiš | https://imperiumab.investigace.cz/ |
Daniel Křetínský | https://imperiumdk.investigace.cz/ |
If you have any questions related to this project, please contact it@investigace.cz.
- How Project Empire works?
- How to use Project Empire?
- Database documentation
- 0. Introduction sheet
- 1. Legal entities sheet
- 1.1. Legal entities owners sheet
- 1.2. Legal entities other relationships sheet
- 1.3. Legal entities sources sheet
- 1.4. Legal entities previous names sheet
- 1.5. Legal entities previous addresses sheet
- 1.6. Legal entities media mentions sheet
- 2. People sheet
- 2.1. People sources sheet
- 3. Subsidies sheet
- 3.1. Subsidies payments sheet
- 3.2. Subsidies sources sheet
- Adjusting structure
- Wiki documentation
- Scripts documentation
- License
There are 3 main parts to a working Project Empire. Database, which is a Google spreadsheet containing all the data about business structures of selected person of interest. Wiki, which makes it possible to publicly browse through the database. And scripts, which most importantly offer a way to push data from the database to wiki.
Let's show all the parts using demo data. We have put together tiny sample from our first database done for Andrej Babiš. Here are links for both the database and wiki:
- Demo database: https://docs.google.com/spreadsheets/d/1EJ-bP-qqTjZx2jcY6hfG3uAKvn6_FtCA-vqD4qVzn-k/edit
- Demo wiki: https://project-empire-demo.investigace.cz/
If you open the database, you can see that on the first sheet there is an introduction to that specific database. In our demo database we have 6 legal entities, 2 people and 2 subsidies. All the data is then organized in separate sheets.
If you are interested in the legal entities like companies, trusts, etc., you can browse them in the sheet 1. Legal entities. People are in sheet 2. People and subsidies in 3. Subsidies. The rest of sheets are additional information for each of these base data types.
One of the important parts of databases with business structures is keeping the relationships between legal entities and people. You can find these in sheets 1.1. Legal entities owners and 1.2. Legal entities other relationships. It is important to keep the data organized like this as the scripts rely on the structure.
Normally the database would not be publicly available, but only shared to people collaborating on the data. That's why part of Project Empire is also wiki, which allows publishing the database for anyone to browse through.
Project Empire wiki is a custom-configured MediaWiki which is prepared to be run via Docker anywhere you want — in the cloud or on your server. After installing, you can customize it yourself to present your data exactly how you want.
You can see that the demo wiki right now contains the same number of legal entities, people and subsidies as the database. And if you open for example the company AGROFERT, a.s., you can see, apart from the basic information, all the owner and other relationships the company has and media mentions from the database together at that one page. Plus offers links to the related pages. There lies the value of the wiki — it displays the data clearly and in easily browsable fashion.
But how you get the data from the database to the wiki? The data don't have to be manually updated, we have a script, which is takes the database spreadsheet and pushes its data to the wiki.
There are few different scripts as part of Project Empire, which all help with gathering data about business structures, but the most important script is definitely the one pushing the data from the database to the wiki.
The script is a Python command line script, so you have to be at least slightly versed in running command line scripts to be able to use it, even though we tried to simplify its usage as much as possible. The process is then that you clone this GitHub repository to your machine, set up Python virtual environment and just run the script.
And that's it. That's the whole Project Empire. If you want to install it and play with it yourself, you can find the documentation for it below.
First step in using Project Empire is to set up the database. Since the database is a simple Google spreadsheet, you can do that by making a copy of either the demo database spreadsheet or the empty database spreadsheet. You should be able to make a copy by opening the spreadsheet, clicking option File > Make a copy and filling information about your new spreadsheet.
- Demo database spreadsheet: https://docs.google.com/spreadsheets/d/1EJ-bP-qqTjZx2jcY6hfG3uAKvn6_FtCA-vqD4qVzn-k/edit
- Empty database spreadsheet: https://docs.google.com/spreadsheets/d/19syMW_V3G6AmG0yIHBZzys2RfBH4zsAbObO5FrhjB68/edit
After the copy is created, fill your person or group of interest on sheet 0. Introduction and then continue with updating data on the other sheets.
We recommend sharing the spreadsheet only to the specific people who will be collaborating on the data and not share it publicly as it may contain sensitive information.
Next step is installing Project Empire wiki. As mentioned before, wiki is a custom-configured MediaWiki which is prepared to be run via Docker. We chose Docker as that way it can be safely run the same even on different systems.
Start the installation by setting up a server preferrably running latest Ubuntu (at the time of writing the Ubuntu LTS version was 20.04), with docker and docker-compose installed, SSH access, public IPv4 address and reachable HTTP (80) and HTTPS (443) ports. It can be your own physical server as well as server set up at some cloud platform.
If you don't want to set up such server yourself, here is a tutorial how to set up such server on DigitalOcean.
Continue with picking the domain where you want the wiki to be run, you will need to update DNS records for it. The wiki is prepared to be run on either separate domain (e.g. project-empire-wiki.org) or subdomain (e.g. project-empire-wiki.example.org). In case of separate domain, you want to create two A records pointing to the public IPv4 address of server, one for the plain domain and one for www subdomain (wiki takes care of the redirecting then). In case of subdomain, you want one A record for that subdomain pointing to the public IPv4 address of the server.
Example: If the server has public IPv4 address 1.2.3.4 and you want the wiki to run at subdomain project-empire-wiki.example.org, add A record for project-empire-wiki.example.org with value 1.2.3.4.
After changing the DNS records please wait for them to propagate before continuing with the installation, because obtaining HTTPS certificates from Lets Encrypt depends on server being accessible at the picked domain or subdomain.
Optionally, you can create Amazon S3 bucket for backups of the wiki. Empire data in the wiki obviously does not have to be backed up, because they can always be repushed from database spreadsheet, but other wiki settings like uploaded images, custom pages, etc. would be gone if something happens to your server. Note that together with the bucket you need to create also user with the programmable access via Access key ID and Secret access key.
Even though this step is optional, wee strongly recommend creating Amazon S3 bucket for the backups.
When you have a server and, optionally, backup bucket set up, it is time for the actual installation of Project Empire wiki.
First, you will need to clone this repository to the server. For example in the home folder of the logged-in user run following:
$ git clone git@github.com:investigace/project-empire.git
That should create project-empire
folder with wiki
folder inside. Go to the wiki folder.
$ cd project-empire/wiki
Next is providing the configuration. Copy .env.example
and rename the copy to .env
. After that, edit .env
with editor of your choice and fill in your configuration. Here we will be using vim
.
$ cp .env.example .env
$ vim .env
Each of the configuration options in .env
is explained, so go through each of them and set them as you need. When finished, save the changes to the .env
file and close it.
Now we run the installation. We recommend running it as root, so type in following:
$ sudo su
$ docker-compose up
That should build images of all the wiki services, create docker containers and run them. Note that it can take few minutes to finish. If everything is fine, the output should end with similar to following:
At this point, the wiki is installed and running at your picked domain, but without correct certificates - if you navigate to the domain in the browser you should see warning. That is to be expected, because the certificates were just obtained from Lets Encrypt and nginx service needs to be restarted to load them. Since you cannot run the wiki in the terminal forever anyway, we will restart the wiki as daemon.
First hit Ctrl+C to stop current running services (you might have to wait a bit till they stop), And when that is done, type in following, which will start the services as daemon in background.
$ docker-compose up -d
Now if you navigate to the domain in your browser, you should see your Project Empire wiki without any warnings, ready to be used.
When you have wiki running, next step is to push there the data from the database. We have prepared for it script written in Python you have to run on your machine. Subsequent instructions expect *nix system, so if you are on Windows machine, run them in Windows Subsystem for Linux (WSL).
To be able to run the script, you first need to make sure you have Python of at least version 3.8 and have it available in terminal. To check both you can run following command to get the Python version:
$ python3 --version
Continue with cloning this repository to your machine and then open the scripts
folder in the repository
$ git clone git@github.com:investigace/project-empire.git
$ cd project-empire/scripts
We are using Python's virtual environment to separate all the libraries needed by scripts from other libraries you might have installed on your system. To initialize the virtual environment, run inside scripts
folder:
$ python3 -m venv .venv
$ . .venv/bin/activate
When inside virtual environment, install the libraries:
$ pip3 install -r requirements.txt
Now you should be set up to actually run the scripts. Note that next time you want to run the scripts, you don't need to fully initialize the virtual environment, it is enough to just run . .venv/bin/activate
.
The script for pushing data from the database currently cannot read data directly from Google spreadsheet, so to provide data you first need to download spreadsheet as Microsoft Excel file. Open your database Google spreadsheet, click option File > Download > Microsoft Excel (.xlsx)
and save the Excel file somewhere on your system.
Let's finally push the data. The script push_empire_database_to_wiki.py
takes 3 arguments, path to the database Excel file, domain of the wiki and wiki user to use. Here is an example how we run it for the demo:
$ ./push_empire_database_to_wiki.py ~/Downloads/Project\ Empire\ -\ Demo\ \(Andrej\ Babiš\).xls project-empire-demo.investigace.cz admin
After that the script asks for password of the wiki user. When correct, the script will compute all the needed changes to the wiki pages and asks whether to do the actual changes. Here is an example of output when the script was first run for the demo:
$ ./push_empire_database_to_wiki.py ~/Downloads/Project\ Empire\ -\ Demo\ \(Andrej\ Babiš\).xlsx project-empire-demo.investigace.cz admin
Password for user admin at Empire wiki https://project-empire-demo.investigace.cz/: ********************
Loaded Empire database: 6 legal entities, 2 people, 0 subsidies
Connected to Empire wiki
Preparing changes to be pushed...
Prepared following changes:
Pages to be created:
AB private trust I
AB private trust II
AGROFERT, a.s.
AGROFERT, a.s. (1994-2005)
Agrofert USA, Inc.
SynBiol, a.s.
Andrej Babiš
Zbyněk Průša
Pages to be updated:
Legal entities overview
People overview
Template:Empire summary table
Are you sure you want to push these to Empire MediaWiki? (y/n)
If the changes look correct to you, you can type in y
and that will push all the changes to wiki. Typing in n
or anything else will stop the script.
And that's it. Congratulations! After pushing is done, you should be able to see all the data in the wiki.
Note that you have to run the script anytime you want to publish updated data from the database to wiki.
We have built the Project Empire database structure to best accommodate all the different data we encountered when mapping business empires and we will explain the structure in the following sections. Scripts and wiki depend on this structure — sheets and columns having correct names and data being in specific format —, but we understand that the needs of your projects can vary and there are safe ways to adjust the structure without breaking any of the scripts and wiki features. Adjusting is explained right after all the database spreadsheet sheets.
- Demo database spreadsheet: https://docs.google.com/spreadsheets/d/1EJ-bP-qqTjZx2jcY6hfG3uAKvn6_FtCA-vqD4qVzn-k/edit
- Empty database spreadsheet: https://docs.google.com/spreadsheets/d/19syMW_V3G6AmG0yIHBZzys2RfBH4zsAbObO5FrhjB68/edit
This sheet is only informative and is not used by scripts or wiki at all. Feel free to remove or completely change as you desire.
First of the 3 main sheets, keeping information about legal entities like companies, trusts, etc. Every row in this sheet is one legal entity and its current details.
Column | Example value | Required | Explanation |
---|---|---|---|
Database identifier | AGROFERT, a.s. | Yes | Unique identification in this database. Should be name of the legal entity. When other sheets are referencing legal entity, they are referencing what is written in this column. If there are more legal entities with same name, we recommend to differentiate them by adding explanation to the parenthesis. Eg. if there are two Acme Ltd. companies and one is in United Kingdom and the other in New Zealand, then one could have identifier "Acme Ltd. (UK)" and the other "Acme Ltd. (NZ)" |
Legal entity type | Company | Yes | Type of legal entity. Whether it is company, trust, etc. Scripts and wiki do not rely on any specific values here, so feel free to differentiate legal entity types by any values you wish. |
Name | AGROFERT, a.s. | Yes | The actual current name of the legal entity. |
Country | CZ | Yes | Country where the legal entity is currently founded. Must be two-letter country code defined by standard ISO 3166-1 alpha-2. Eg. Germany is DE, France is FR, United States are US, etc. |
Identification number | 26185610 | No | Identification for the legal entity in its country. Can be only number, but also any text, eg. in Germany it can be "Stendal HRB 12345". |
Address | Pyšelská 2327/2, Chodov, 149 00 Praha 4 | No | Current address of the legal entity. Ideally full, but can be also partial, eg. only city. |
Foundation date | 2000-07-01 | No | The date when the legal entity was founded. When full date, it must be in ISO-8601 format, which is YYYY-MM-DD. But if you don't have the full date, it is ok to just have year here. |
Dissolution date | N/A | No | If the legal entity was dissolved, this is the date when the dissolution happened. When full date, it must be in ISO-8601 format, which is YYYY-MM-DD. But if you don't have the full date, it is ok to just have year here. If the company was not dissolved, we recommend to fill N/A. |
Other notes | Umbrella company for empire of Andrej Babiš | No | Add any other notes you want for the legal entity in this column. |
Sheet with information about legal entity owners. Every row in this sheet is one owner record for referenced legal entity. Ownerships are separate from other relationships as they are especially important when mapping the business structure and as the ownerships have the additional Owned percentage column.
Note that there are 2 types of ownership records you can define using this sheet. Either referencing one, which references some legal entity or person you have in the database, or partial one, which does not reference. The partial record can be useful eg. when you want full ownership history for some company, but don't actually want to track all the owners in the database, because some owners are not part of the business empire you are after.
Column | Example value | Required | Explanation |
---|---|---|---|
Owned legal entity reference | AGROFERT, a.s. | Yes | Legal entity which is being owned. Should have exactly the same value as Database identifier of some row in sheet 1. Legal entities. |
Owner legal entity or person reference | Andrej Babiš | No | If the owner is in the database, fill here Database identifier of either legal entity or person from the database. If the owner is not in the database for any reason, leave empty. |
Owner type | Person | Yes | Add "Legal entity" when the owner is a legal entity, "Person" when it is a person. |
Owner name | Andrej Babiš | Yes | Name of the owner, either legal entity name or person name. |
Owner country | CZ | No | Country of the owner. Must be two-letter country code defined by standard ISO 3166-1 alpha-2. Eg. Germany is DE, France is FR, United States are US, etc. |
Owner address | Pyšelská 2327/2, Chodov, 149 00 Praha 4 | No | Current address of the owner. Ideally full, but can be also partial, eg. only city. |
Owner legal entity identification number | 26185610 | No | If the owner is legal entity, this column is for identification number in its country. Can be only number, but also any text, eg. in Germany it can be "Stendal HRB 12345". |
Owner person date of birth | 2000-01-01 | No | If the owner is person, this column is for date of birth of that person. Must be in ISO-8601 format, which is YYYY-MM-DD. |
Owned percentage | 100 | No | How many percents does the owner owns. Add value from 0 to 100. But also textual values like "Majority" are allowed. |
Owned since date | 2005-05-30 | No | The date when the ownership started. When full date, it must be in ISO-8601 format, which is YYYY-MM-DD. But if you don't have the full date, it is ok to just have year here. |
Owned until date | 2017-02-02 | No | If the ownership ended, this is the date when that happened. When full date, it must be in ISO-8601 format, which is YYYY-MM-DD. But if you don't have the full date, it is ok to just have year here. If the ownership did not end, we recommend to fill N/A. |
Ownership details | In 2017 transferred ownership to trusts | No | Add any other notes you want for the ownership in this column. |
Sheet with information about other legal entity relationships than ownerships. Every row in this sheet is one relationship to some other legal entity or person, for example who was director of legal entity.
Note that there are 2 types of relationship records you can define using this sheet. Either referencing one, which references some legal entity or person you have in the database, or partial one, which does not reference. The partial record can be useful eg. when you want full relationship history for some company, but don't actually want to track all the related people and legal entities in the database, because some of those are not part of the business empire you are after.
Column | Example value | Required | Explanation |
---|---|---|---|
Legal entity reference | AGROFERT, a.s. | Yes | Legal entity for which this relationship is defined. Should have exactly the same value as Database identifier of some row in sheet 1. Legal entities. |
Related legal entity or person reference | Andrej Babiš | No | If the related legal entity or person is in the database, fill here Database identifier of their row in the database. If the legal entity or person is not in the database for any reason, leave empty. |
Related type | Person | Yes | Add "Legal entity" when the relationship is with a legal entity, "Person" when it is with a person. |
Related name | Andrej Babiš | Yes | Name of the related legal entity or person. |
Related country | CZ | No | Country of the related legal entity or person. Must be two-letter country code defined by standard ISO 3166-1 alpha-2. Eg. Germany is DE, France is FR, United States are US, etc. |
Related address | Pyšelská 2327/2, Chodov, 149 00 Praha 4 | No | Current address of the related legal entity or person. Ideally full, but can be also partial, eg. only city. |
Related legal entity identification number | 26185610 | No | If the relationship is with legal entity, this column is for identification number in its country. Can be only number, but also any text, eg. in Germany it can be "Stendal HRB 12345". |
Related person date of birth | 2000-01-01 | No | If the relationship is with person, this column is for date of birth of that person. Must be in ISO-8601 format, which is YYYY-MM-DD. |
Related since date | 2005-05-30 | No | The date when the relationship started. When full date, it must be in ISO-8601 format, which is YYYY-MM-DD. But if you don't have the full date, it is ok to just have year here. |
Related until date | 2017-02-02 | No | If the relationship ended, this is the date when that happened. When full date, it must be in ISO-8601 format, which is YYYY-MM-DD. But if you don't have the full date, it is ok to just have year here. If the relationship did not end, we recommend to fill N/A. |
Relationship details | Chairman of the board | No | Add any other notes you want for the relationship in this column. |
Sheet with sources of information about legal entities. If you don't need structured sources, it is enough to list sources in Other notes of legal entity. Every row in this sheet is one source used to gain information about a legal entity.
Column | Example value | Required | Explanation |
---|---|---|---|
Legal entity reference | AGROFERT, a.s. | Yes | Legal entity this source belongs to. Should have exactly the same value as Database identifier of some row in sheet 1. Legal entities. |
Source summary | Veřejný Rejstřík a Sbírka Listin - Ministerstvo Spravedlnosti České Republiky. Justice.cz. | No | Summary or name of the source document. |
Information gained from source | Company identifier, names, addresses, foundation date | No | What kind of information was obtained from the source. |
Source last checked date | 2020-07-06 | No | The date when the source was last checked. Must be in ISO-8601 format, which is YYYY-MM-DD. |
Source URL | https://or.justice.cz/ias/ui/rejstrik-firma.vysledky?subjektId=525681&typ=UPLNY | No | Full URL of the source document. |
Sheet for previous names of legal entities. You might not need previous names this much structured, but structure here can be useful when searching for historical records where the legal entity can be using some previous name.
Column | Example value | Required | Explanation |
---|---|---|---|
Legal entity reference | SynBiol, a.s. | Yes | Legal entity this previous name belongs to. Should have exactly the same value as Database identifier of some row in sheet 1. Legal entities. |
Previous name | SYNTHESIA a.s. | Yes | Previous name of the legal entity. |
Named since date | 2004-11-08 | No | The date the legal entity gained this name. When full date, it must be in ISO-8601 format, which is YYYY-MM-DD. But if you don't have the full date, it is ok to just have year here. |
Named until date | 2006-03-27 | No | The date the legal entity lost this name. When full date, it must be in ISO-8601 format, which is YYYY-MM-DD. But if you don't have the full date, it is ok to just have year here. |
Sheet for previous addresses of legal entities. You might not need previous addresses this much structured, but structure here can be useful when searching for historical records where the legal entity can be using some previous address.
Column | Example value | Required | Explanation |
---|---|---|---|
Legal entity reference | AGROFERT, a.s. | Yes | Legal entity this previous address belongs to. Should have exactly the same value as Database identifier of some row in sheet 1. Legal entities. |
Previous address | Pyšelská 1234, Chodov, 149 00 Praha 4 | Yes | Previous address of the legal entity. |
Address since date | 2004-11-08 | No | The date the legal entity gained this address. When full date, it must be in ISO-8601 format, which is YYYY-MM-DD. But if you don't have the full date, it is ok to just have year here. |
Address until date | 2006-03-27 | No | The date the legal entity lost this address. When full date, it must be in ISO-8601 format, which is YYYY-MM-DD. But if you don't have the full date, it is ok to just have year here. |
Sheet for information about mentions of legal entity in media. Every row in this sheet is one mention in media.
Column | Example value | Required | Explanation |
---|---|---|---|
Legal entity reference | AGROFERT, a.s. | Yes | Legal entity this media mention belongs to. Should have exactly the same value as Database identifier of some row in sheet 1. Legal entities. |
Summary of the media mention | Wikipedia page | No | Summary or name of the mention in media. |
Media last checked date | 2022-01-01 | No | The date when the mention was last checked. Must be in ISO-8601 format, which is YYYY-MM-DD. |
Media mention url | https://cs.wikipedia.org/wiki/Agrofert | No | Full URL of the mention in media. |
Second of the 3 main sheets, keeping information about people. Every row in this sheet is one person and their current details.
Column | Example value | Required | Explanation |
---|---|---|---|
Database identifier | Andrej Babiš | Yes | Unique identification in this database. Should be name of the person. When other sheets are referencing person, they are referencing what is written in this column. If there are more people with same name, we recommend to differentiate them by adding year of birth in the parenthesis. Eg. if there are two persons with name John Doe and one is born in 1950 and the other in 1980, then one would have identifier "John Doe (1950)" and the other "John Doe (1980)" |
Full name | Andrej Babiš | Yes | Full name of the person. Last name should be at the end. |
Nationality | CZ | No | Nationality of the person. Does not support multiple nationalities, so use Other notes to track nationalities of multinationals. Must be two-letter country code defined by standard ISO 3166-1 alpha-2. Eg. Germany is DE, France is FR, United States are US, etc. |
Date of birth | 2000-01-01 | No | The date of birth. Must be in ISO-8601 format, which is YYYY-MM-DD. Full date of birth is not published in the wiki, only the year. |
Residence country | CZ | No | Country of the person's residence. Must be two-letter country code defined by standard ISO 3166-1 alpha-2. Eg. Germany is DE, France is FR, United States are US, etc. |
Residence full address | Pyšelská 2327/2, Chodov, 149 00 Praha 4 | No | Current address of the person's residence. Ideally full, but can be also partial, eg. only city. Full address is not published in the wiki, only value in column Residence only city. |
Residence only city | Praha 4 | No | City from the current address of the person's residence. |
Other notes | Former prime minister of Czechia | No | Add any other notes you want for the person in this column. |
Sheet with sources of information about people. If you don't need structured sources, it is enough to list sources in Other notes of person. Every row in this sheet is one source used to gain information about a person.
Column | Example value | Required | Explanation |
---|---|---|---|
Person reference | Andrej Babiš | Yes | Person this source belongs to. Should have exactly the same value as Database identifier of some row in sheet 2. People. |
Source summary | AGROFERT INC. :: Massachusetts (US) :: OpenCorporates. Opencorporates.Com. | No | Summary or name of the source document. |
Information gained from source | Connection to AGROFERT INC. | No | What kind of information was obtained from the source. |
Source last checked date | 2020-07-06 | No | The date when the source was last checked. Must be in ISO-8601 format, which is YYYY-MM-DD. |
Source URL | https://opencorporates.com/companies/us_ma/383446187 | No | Full URL of the source document. |
Third of the 3 main sheets, keeping information about subsidies. Every row in this sheet is one subsidy and its details. Note that this sheet does not have any amount columns, because the subsidies are often split into more payments from different providers or in different years and we keep the amounts on the payments only.
Column | Example value | Required | Explanation |
---|---|---|---|
Database identifier | CEDR-30/04UL-2003 | Yes | Unique identification in this database. Should be ideally project code. When other sheets are referencing subsidy, they are referencing what is written in this column. If there are more subsidies with same project code, we recommend to differentiate them by adding something in the parenthesis. Eg. if there are two subsidies with code A-123 and one is from 2010 and the other from 2015, then one would have identifier "A-123 (2010)" and the other "A-123 (2015)" |
Receiving legal entity reference | AGROFERT, a.s. | Yes | Legal entity which received this subsidy. Should have exactly the same value as Database identifier of some row in sheet 1. Legal entities. |
Year | 2002 | No | Year in which the subsidy was received. Could be also range of years like "2007-2013". |
Project name | Zlepšení hygieny provozu při skladování drůbeže | No | Name of the subsidized project. |
Project code | 30/04UL-2003 | No | Code of the subsidized project. |
Programme name | SAPARD | No | Name of the programme the subsidy was received from. |
Programme code | SAPARD | No | Code of the programme the subsidy was received from. |
Notes | Signed in 2002, carried out in 2003. | No | Add any notes you want for the subsidy in this column. |
Sheet for keeping information about subsidy payments. Every row in this sheet is payment from one provider in given year belonging to some subsidy.
Column | Example value | Required | Explanation |
---|---|---|---|
Subsidy reference | CEDR-30/04UL-2003 | Yes | Subsidy this payment belongs to. Should have exactly the same value as Database identifier of some row in sheet 3. Subsidies. |
Provider | Czech Ministry of agriculture | No | Institution which provided this payment. |
Year | 2003 | No | Year the payment was made. |
Original currency | CZK | No | The original currency of the payment. Must be three-letter code defined by standard ISO 4217. Eg. Euro is EUR, US dollar is USD, Pound sterling is GBP, etc. |
Amount in original currency | 556,875.00 | No | Amount of the payment in the original currency. Must be either in the English notation (12,345.67) or without any notation (12345). |
Amount in EUR | 22,736.00 | No | Amount of the payment in Euro. Must be either in the English notation (12,345.67) or without any notation (12345). |
Notes | Using approximate exchange rate 24,49 CZK/EUR. | No | Add any notes you want for the payment in this column. |
Sheet with sources of information about subsidies. If you don't need structured sources, it is enough to list sources in Notes of subsidy. Every row in this sheet is one source used to gain information about a subsidy.
Column | Example value | Required | Explanation |
---|---|---|---|
Subsidy reference | CEDR-30/04UL-2003 | Yes | Subsidy this source belongs to. Should have exactly the same value as Database identifier of some row in sheet 3. Subsidies. |
Source summary | Subsidy record on Hlídač státu | No | Summary or name of the source document. |
Information gained from source | Full subsidy record | No | What kind of information was obtained from the source. |
Source last checked date | 2021-04-28 | No | The date when the source was last checked. Must be in ISO-8601 format, which is YYYY-MM-DD. |
Source URL | https://www.hlidacstatu.cz/Dotace/Detail/cedr-a85110a7e99bbff7370eebb382f90913c193db91 | No | Full URL of the source document. |
Scripts rely on the structure to be able to load the data from the database spreadsheet, but there are safe ways to adjust the structure. Scripts only need:
- sheets to have the exact names,
- columns to have the exact headers,
- headers to be on row 4 and data starting on the row 5.
Which means that you can do anything, which does not mess up these needs. You can add new sheets or add new columns as you please. You can reorder the columns in any of the sheets. Or you can for example safely use colors or different fonts.
If you want to restore wiki from S3 backup, there is s3-backup Docker service prepared for that. On the server, where you want to restore wiki from backup, prepare everything as you would when installing new wiki. That is: follow the steps from 2. Install wiki all the way up to command docker-compose up
— you don't want to run this command as that would install fresh new wiki.
Instead of running docker-compose up
, first start only database by running:
docker-compose up -d db
When the database is built and running, fire up s3-restore service, which will connect to the S3 bucket with backups, download the most recent backup and restore the wiki from it. Restoring in this situation is populating the database from database dump and restoring contents of mediawiki/shared
folder. Note that to make it work, you need to provide in .env
the credentials for accessing the correct S3 bucket.
Restore command:
docker-compose run --rm s3-restore
After the restore is done, just continue with the installation process by running docker-compose up
. That's it!
All the scripts are written in Python v3 and should be therefore runnable on any system where Python v3 can be installed. Library dependencides are defined in requirements.txt
file and we recommend installing them only in the virtual environment (venv) so the dependencies do not affect any other scripts you might have on your system. Before running any of the following scripts we therefore recommend running following to enter virtual environment and install the dependencies:
$ python3 -m venv .venv
$ . .venv/bin/activate
$ pip3 install -r requirements.txt
The scripts follow convention that if you run them without any arguments or with -h
/--help
option, help is printed with all the arguments and options of that script. Eg.:
$ ./push_empire_database_to_wiki.py
usage: push_empire_database_to_wiki.py [-h] DATABASE_EXCEL WIKI WIKI_USER
Push Empire database data to Empire wiki
positional arguments:
DATABASE_EXCEL Path to the Excel spreadsheet file with Empire database
WIKI Domain or full URL of Empire wiki, e.g. https://empirewiki.example.org/
WIKI_USER Username to use for login to Empire wiki
optional arguments:
-h, --help show this help message and exit
For pushing data from database spreadsheet to the wiki. Has 3 arguments: path to the database spreadsheet downloaded as Excel, wiki domain and wiki user name. When run, it prompts for password of wiki user.
Example run:
$ ./push_empire_database_to_wiki.py ~/Downloads/Project\ Empire\ -\ Demo\ \(Andrej\ Babiš\).xls project-empire-demo.investigace.cz admin
For fetching relationships of Czech companies from ARES database, which is managed by Czech Ministry of Finance. Fetches full history of ownerships and only current other relationships (like people on the board, etc.). Saves the result in relationships_of_cz_companies_from_ares.xlsx
Excel file in same structure as is used in database spreadsheet for easy copying.
Has 1 argument: path to the database spreadsheet downloaded as Excel. And offers 3 options: one to cache XML files from ARES in case you are running the scripts multiple times and don't want to download fresh XMLs every time. Second option to automatically confirm continuing after the database is loaded by script. And the last option is to only fetch for companies which does not have any relationships so far in the database.
Example run:
$ ./fetch_relationships_of_cz_companies_from_ares.py ~/Downloads/Project\ Empire\ -\ Demo\ \(Andrej\ Babiš\).xls
With the options:
$ ./fetch_relationships_of_cz_companies_from_ares.py ~/Downloads/Project\ Empire\ -\ Demo\ \(Andrej\ Babiš\).xls --cache-ares-xmls --yes --only-companies-without-relationships
For fetching subsidies of Czech companies from platform hlidacstatu.cz. Fetches all the subsidies and removes duplicate records as identified by hlidacstatu.cz. When run, it prompts for API key to hlidacstatu.cz, which you can obtain by creating an account there and then navigating to their API page. Saves the result in subsidies_of_cz_companies_from_hlidacstatu.xlsx
Excel file in same structure as is used in database spreadsheet for easy copying.
Has 1 argument: path to the database spreadsheet downloaded as Excel. Does not offer any additional options.
Example run:
$ ./fetch_subsidies_of_cz_companies_from_hlidacstatu.py ~/Downloads/Project\ Empire\ -\ Demo\ \(Andrej\ Babiš\).xls
Everything in this repository is licensed under GNU General Public License v3.0.