Skip to content

Tools designed to help with mapping huge business structures of influential people and groups

License

Notifications You must be signed in to change notification settings

investigace/project-empire

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

84 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Project Empire

Project Empire is a set of tools designed to help with keeping information about assets and companies of inluential people or groups. It was originally created for business structures of former prime minister and owner of vast business empire Andrej Babiš, but is now generalized and can be used for anybody else.

Project was created by České centrum pro investigativní žurnalistiku, o.p.s. and co-funded by the European Commission in the framework of the Stars4Media pilot programme.

Currently running Project Empire wikis:

Person of interest Project Empire wiki
Andrej Babiš https://imperiumab.investigace.cz/
Daniel Křetínský https://imperiumdk.investigace.cz/

If you have any questions related to this project, please contact it@investigace.cz.

Table of contents


How Project Empire works?

There are 3 main parts to a working Project Empire. Database, which is a Google spreadsheet containing all the data about business structures of selected person of interest. Wiki, which makes it possible to publicly browse through the database. And scripts, which most importantly offer a way to push data from the database to wiki.

Let's show all the parts using demo data. We have put together tiny sample from our first database done for Andrej Babiš. Here are links for both the database and wiki:

Database

If you open the database, you can see that on the first sheet there is an introduction to that specific database. In our demo database we have 6 legal entities, 2 people and 2 subsidies. All the data is then organized in separate sheets.

If you are interested in the legal entities like companies, trusts, etc., you can browse them in the sheet 1. Legal entities. People are in sheet 2. People and subsidies in 3. Subsidies. The rest of sheets are additional information for each of these base data types.

One of the important parts of databases with business structures is keeping the relationships between legal entities and people. You can find these in sheets 1.1. Legal entities owners and 1.2. Legal entities other relationships. It is important to keep the data organized like this as the scripts rely on the structure.

Normally the database would not be publicly available, but only shared to people collaborating on the data. That's why part of Project Empire is also wiki, which allows publishing the database for anyone to browse through.

Wiki

Project Empire wiki is a custom-configured MediaWiki which is prepared to be run via Docker anywhere you want — in the cloud or on your server. After installing, you can customize it yourself to present your data exactly how you want.

You can see that the demo wiki right now contains the same number of legal entities, people and subsidies as the database. And if you open for example the company AGROFERT, a.s., you can see, apart from the basic information, all the owner and other relationships the company has and media mentions from the database together at that one page. Plus offers links to the related pages. There lies the value of the wiki — it displays the data clearly and in easily browsable fashion.

But how you get the data from the database to the wiki? The data don't have to be manually updated, we have a script, which is takes the database spreadsheet and pushes its data to the wiki.

Scripts

There are few different scripts as part of Project Empire, which all help with gathering data about business structures, but the most important script is definitely the one pushing the data from the database to the wiki.

The script is a Python command line script, so you have to be at least slightly versed in running command line scripts to be able to use it, even though we tried to simplify its usage as much as possible. The process is then that you clone this GitHub repository to your machine, set up Python virtual environment and just run the script.

And that's it. That's the whole Project Empire. If you want to install it and play with it yourself, you can find the documentation for it below.

↑ Jump to Table of Contents


How to use Project Empire?

1. Start your database

First step in using Project Empire is to set up the database. Since the database is a simple Google spreadsheet, you can do that by making a copy of either the demo database spreadsheet or the empty database spreadsheet. You should be able to make a copy by opening the spreadsheet, clicking option File > Make a copy and filling information about your new spreadsheet.

After the copy is created, fill your person or group of interest on sheet 0. Introduction and then continue with updating data on the other sheets.

We recommend sharing the spreadsheet only to the specific people who will be collaborating on the data and not share it publicly as it may contain sensitive information.

2. Install wiki

Next step is installing Project Empire wiki. As mentioned before, wiki is a custom-configured MediaWiki which is prepared to be run via Docker. We chose Docker as that way it can be safely run the same even on different systems.

2.1. Set up server with docker and docker-compose

Start the installation by setting up a server preferrably running latest Ubuntu (at the time of writing the Ubuntu LTS version was 20.04), with docker and docker-compose installed, SSH access, public IPv4 address and reachable HTTP (80) and HTTPS (443) ports. It can be your own physical server as well as server set up at some cloud platform.

If you don't want to set up such server yourself, here is a tutorial how to set up such server on DigitalOcean.

2.2. Update DNS records

Continue with picking the domain where you want the wiki to be run, you will need to update DNS records for it. The wiki is prepared to be run on either separate domain (e.g. project-empire-wiki.org) or subdomain (e.g. project-empire-wiki.example.org). In case of separate domain, you want to create two A records pointing to the public IPv4 address of server, one for the plain domain and one for www subdomain (wiki takes care of the redirecting then). In case of subdomain, you want one A record for that subdomain pointing to the public IPv4 address of the server.

Example: If the server has public IPv4 address 1.2.3.4 and you want the wiki to run at subdomain project-empire-wiki.example.org, add A record for project-empire-wiki.example.org with value 1.2.3.4.

After changing the DNS records please wait for them to propagate before continuing with the installation, because obtaining HTTPS certificates from Lets Encrypt depends on server being accessible at the picked domain or subdomain.

2.3. (optional) Set up Amazon S3 bucket for backups

Optionally, you can create Amazon S3 bucket for backups of the wiki. Empire data in the wiki obviously does not have to be backed up, because they can always be repushed from database spreadsheet, but other wiki settings like uploaded images, custom pages, etc. would be gone if something happens to your server. Note that together with the bucket you need to create also user with the programmable access via Access key ID and Secret access key.

Even though this step is optional, wee strongly recommend creating Amazon S3 bucket for the backups.

2.4. Set up wiki using docker

When you have a server and, optionally, backup bucket set up, it is time for the actual installation of Project Empire wiki.

First, you will need to clone this repository to the server. For example in the home folder of the logged-in user run following:

$ git clone git@github.com:investigace/project-empire.git

That should create project-empire folder with wiki folder inside. Go to the wiki folder.

$ cd project-empire/wiki

Next is providing the configuration. Copy .env.example and rename the copy to .env. After that, edit .env with editor of your choice and fill in your configuration. Here we will be using vim.

$ cp .env.example .env
$ vim .env

Each of the configuration options in .env is explained, so go through each of them and set them as you need. When finished, save the changes to the .env file and close it.

Now we run the installation. We recommend running it as root, so type in following:

$ sudo su
$ docker-compose up

That should build images of all the wiki services, create docker containers and run them. Note that it can take few minutes to finish. If everything is fine, the output should end with similar to following:

At this point, the wiki is installed and running at your picked domain, but without correct certificates - if you navigate to the domain in the browser you should see warning. That is to be expected, because the certificates were just obtained from Lets Encrypt and nginx service needs to be restarted to load them. Since you cannot run the wiki in the terminal forever anyway, we will restart the wiki as daemon.

First hit Ctrl+C to stop current running services (you might have to wait a bit till they stop), And when that is done, type in following, which will start the services as daemon in background.

$ docker-compose up -d

Now if you navigate to the domain in your browser, you should see your Project Empire wiki without any warnings, ready to be used.

3. Push data from database to the wiki

When you have wiki running, next step is to push there the data from the database. We have prepared for it script written in Python you have to run on your machine. Subsequent instructions expect *nix system, so if you are on Windows machine, run them in Windows Subsystem for Linux (WSL).

To be able to run the script, you first need to make sure you have Python of at least version 3.8 and have it available in terminal. To check both you can run following command to get the Python version:

$ python3 --version

Continue with cloning this repository to your machine and then open the scripts folder in the repository

$ git clone git@github.com:investigace/project-empire.git
$ cd project-empire/scripts

We are using Python's virtual environment to separate all the libraries needed by scripts from other libraries you might have installed on your system. To initialize the virtual environment, run inside scripts folder:

$ python3 -m venv .venv
$ . .venv/bin/activate

When inside virtual environment, install the libraries:

$ pip3 install -r requirements.txt

Now you should be set up to actually run the scripts. Note that next time you want to run the scripts, you don't need to fully initialize the virtual environment, it is enough to just run . .venv/bin/activate.

The script for pushing data from the database currently cannot read data directly from Google spreadsheet, so to provide data you first need to download spreadsheet as Microsoft Excel file. Open your database Google spreadsheet, click option File > Download > Microsoft Excel (.xlsx) and save the Excel file somewhere on your system.

Let's finally push the data. The script push_empire_database_to_wiki.py takes 3 arguments, path to the database Excel file, domain of the wiki and wiki user to use. Here is an example how we run it for the demo:

$ ./push_empire_database_to_wiki.py ~/Downloads/Project\ Empire\ -\ Demo\ \(Andrej\ Babiš\).xls project-empire-demo.investigace.cz admin

After that the script asks for password of the wiki user. When correct, the script will compute all the needed changes to the wiki pages and asks whether to do the actual changes. Here is an example of output when the script was first run for the demo:

$ ./push_empire_database_to_wiki.py ~/Downloads/Project\ Empire\ -\ Demo\ \(Andrej\ Babiš\).xlsx project-empire-demo.investigace.cz admin
Password for user admin at Empire wiki https://project-empire-demo.investigace.cz/: ********************
Loaded Empire database: 6 legal entities, 2 people, 0 subsidies
Connected to Empire wiki
Preparing changes to be pushed...

Prepared following changes:

Pages to be created:
  AB private trust I
  AB private trust II
  AGROFERT, a.s.
  AGROFERT, a.s. (1994-2005)
  Agrofert USA, Inc.
  SynBiol, a.s.
  Andrej Babiš
  Zbyněk Průša

Pages to be updated:
  Legal entities overview
  People overview
  Template:Empire summary table

Are you sure you want to push these to Empire MediaWiki? (y/n)

If the changes look correct to you, you can type in y and that will push all the changes to wiki. Typing in n or anything else will stop the script.

And that's it. Congratulations! After pushing is done, you should be able to see all the data in the wiki.

Note that you have to run the script anytime you want to publish updated data from the database to wiki.

↑ Jump to Table of Contents


Database documentation

We have built the Project Empire database structure to best accommodate all the different data we encountered when mapping business empires and we will explain the structure in the following sections. Scripts and wiki depend on this structure — sheets and columns having correct names and data being in specific format —, but we understand that the needs of your projects can vary and there are safe ways to adjust the structure without breaking any of the scripts and wiki features. Adjusting is explained right after all the database spreadsheet sheets.

0. Introduction sheet

This sheet is only informative and is not used by scripts or wiki at all. Feel free to remove or completely change as you desire.

1. Legal entities sheet

First of the 3 main sheets, keeping information about legal entities like companies, trusts, etc. Every row in this sheet is one legal entity and its current details.

Column Example value Required Explanation
Database identifier AGROFERT, a.s. Yes Unique identification in this database. Should be name of the legal entity. When other sheets are referencing legal entity, they are referencing what is written in this column.

If there are more legal entities with same name, we recommend to differentiate them by adding explanation to the parenthesis. Eg. if there are two Acme Ltd. companies and one is in United Kingdom and the other in New Zealand, then one could have identifier "Acme Ltd. (UK)" and the other "Acme Ltd. (NZ)"
Legal entity type Company Yes Type of legal entity. Whether it is company, trust, etc. Scripts and wiki do not rely on any specific values here, so feel free to differentiate legal entity types by any values you wish.
Name AGROFERT, a.s. Yes The actual current name of the legal entity.
Country CZ Yes Country where the legal entity is currently founded. Must be two-letter country code defined by standard ISO 3166-1 alpha-2. Eg. Germany is DE, France is FR, United States are US, etc.
Identification number 26185610 No Identification for the legal entity in its country. Can be only number, but also any text, eg. in Germany it can be "Stendal HRB 12345".
Address Pyšelská 2327/2, Chodov, 149 00 Praha 4 No Current address of the legal entity. Ideally full, but can be also partial, eg. only city.
Foundation date 2000-07-01 No The date when the legal entity was founded. When full date, it must be in ISO-8601 format, which is YYYY-MM-DD. But if you don't have the full date, it is ok to just have year here.
Dissolution date N/A No If the legal entity was dissolved, this is the date when the dissolution happened. When full date, it must be in ISO-8601 format, which is YYYY-MM-DD. But if you don't have the full date, it is ok to just have year here. If the company was not dissolved, we recommend to fill N/A.
Other notes Umbrella company for empire of Andrej Babiš No Add any other notes you want for the legal entity in this column.

1.1. Legal entities owners sheet

Sheet with information about legal entity owners. Every row in this sheet is one owner record for referenced legal entity. Ownerships are separate from other relationships as they are especially important when mapping the business structure and as the ownerships have the additional Owned percentage column.

Note that there are 2 types of ownership records you can define using this sheet. Either referencing one, which references some legal entity or person you have in the database, or partial one, which does not reference. The partial record can be useful eg. when you want full ownership history for some company, but don't actually want to track all the owners in the database, because some owners are not part of the business empire you are after.

Column Example value Required Explanation
Owned legal entity reference AGROFERT, a.s. Yes Legal entity which is being owned. Should have exactly the same value as Database identifier of some row in sheet 1. Legal entities.
Owner legal entity or person reference Andrej Babiš No If the owner is in the database, fill here Database identifier of either legal entity or person from the database. If the owner is not in the database for any reason, leave empty.
Owner type Person Yes Add "Legal entity" when the owner is a legal entity, "Person" when it is a person.
Owner name Andrej Babiš Yes Name of the owner, either legal entity name or person name.
Owner country CZ No Country of the owner. Must be two-letter country code defined by standard ISO 3166-1 alpha-2. Eg. Germany is DE, France is FR, United States are US, etc.
Owner address Pyšelská 2327/2, Chodov, 149 00 Praha 4 No Current address of the owner. Ideally full, but can be also partial, eg. only city.
Owner legal entity identification number 26185610 No If the owner is legal entity, this column is for identification number in its country. Can be only number, but also any text, eg. in Germany it can be "Stendal HRB 12345".
Owner person date of birth 2000-01-01 No If the owner is person, this column is for date of birth of that person. Must be in ISO-8601 format, which is YYYY-MM-DD.
Owned percentage 100 No How many percents does the owner owns. Add value from 0 to 100. But also textual values like "Majority" are allowed.
Owned since date 2005-05-30 No The date when the ownership started. When full date, it must be in ISO-8601 format, which is YYYY-MM-DD. But if you don't have the full date, it is ok to just have year here.
Owned until date 2017-02-02 No If the ownership ended, this is the date when that happened. When full date, it must be in ISO-8601 format, which is YYYY-MM-DD. But if you don't have the full date, it is ok to just have year here. If the ownership did not end, we recommend to fill N/A.
Ownership details In 2017 transferred ownership to trusts No Add any other notes you want for the ownership in this column.

1.2. Legal entities other relationships sheet

Sheet with information about other legal entity relationships than ownerships. Every row in this sheet is one relationship to some other legal entity or person, for example who was director of legal entity.

Note that there are 2 types of relationship records you can define using this sheet. Either referencing one, which references some legal entity or person you have in the database, or partial one, which does not reference. The partial record can be useful eg. when you want full relationship history for some company, but don't actually want to track all the related people and legal entities in the database, because some of those are not part of the business empire you are after.

Column Example value Required Explanation
Legal entity reference AGROFERT, a.s. Yes Legal entity for which this relationship is defined. Should have exactly the same value as Database identifier of some row in sheet 1. Legal entities.
Related legal entity or person reference Andrej Babiš No If the related legal entity or person is in the database, fill here Database identifier of their row in the database. If the legal entity or person is not in the database for any reason, leave empty.
Related type Person Yes Add "Legal entity" when the relationship is with a legal entity, "Person" when it is with a person.
Related name Andrej Babiš Yes Name of the related legal entity or person.
Related country CZ No Country of the related legal entity or person. Must be two-letter country code defined by standard ISO 3166-1 alpha-2. Eg. Germany is DE, France is FR, United States are US, etc.
Related address Pyšelská 2327/2, Chodov, 149 00 Praha 4 No Current address of the related legal entity or person. Ideally full, but can be also partial, eg. only city.
Related legal entity identification number 26185610 No If the relationship is with legal entity, this column is for identification number in its country. Can be only number, but also any text, eg. in Germany it can be "Stendal HRB 12345".
Related person date of birth 2000-01-01 No If the relationship is with person, this column is for date of birth of that person. Must be in ISO-8601 format, which is YYYY-MM-DD.
Related since date 2005-05-30 No The date when the relationship started. When full date, it must be in ISO-8601 format, which is YYYY-MM-DD. But if you don't have the full date, it is ok to just have year here.
Related until date 2017-02-02 No If the relationship ended, this is the date when that happened. When full date, it must be in ISO-8601 format, which is YYYY-MM-DD. But if you don't have the full date, it is ok to just have year here. If the relationship did not end, we recommend to fill N/A.
Relationship details Chairman of the board No Add any other notes you want for the relationship in this column.

1.3. Legal entities sources sheet

Sheet with sources of information about legal entities. If you don't need structured sources, it is enough to list sources in Other notes of legal entity. Every row in this sheet is one source used to gain information about a legal entity.

Column Example value Required Explanation
Legal entity reference AGROFERT, a.s. Yes Legal entity this source belongs to. Should have exactly the same value as Database identifier of some row in sheet 1. Legal entities.
Source summary Veřejný Rejstřík a Sbírka Listin - Ministerstvo Spravedlnosti České Republiky. Justice.cz. No Summary or name of the source document.
Information gained from source Company identifier, names, addresses, foundation date No What kind of information was obtained from the source.
Source last checked date 2020-07-06 No The date when the source was last checked. Must be in ISO-8601 format, which is YYYY-MM-DD.
Source URL https://or.justice.cz/ias/ui/rejstrik-firma.vysledky?subjektId=525681&typ=UPLNY No Full URL of the source document.

1.4. Legal entities previous names sheet

Sheet for previous names of legal entities. You might not need previous names this much structured, but structure here can be useful when searching for historical records where the legal entity can be using some previous name.

Column Example value Required Explanation
Legal entity reference SynBiol, a.s. Yes Legal entity this previous name belongs to. Should have exactly the same value as Database identifier of some row in sheet 1. Legal entities.
Previous name SYNTHESIA a.s. Yes Previous name of the legal entity.
Named since date 2004-11-08 No The date the legal entity gained this name. When full date, it must be in ISO-8601 format, which is YYYY-MM-DD. But if you don't have the full date, it is ok to just have year here.
Named until date 2006-03-27 No The date the legal entity lost this name. When full date, it must be in ISO-8601 format, which is YYYY-MM-DD. But if you don't have the full date, it is ok to just have year here.

1.5. Legal entities previous addresses sheet

Sheet for previous addresses of legal entities. You might not need previous addresses this much structured, but structure here can be useful when searching for historical records where the legal entity can be using some previous address.

Column Example value Required Explanation
Legal entity reference AGROFERT, a.s. Yes Legal entity this previous address belongs to. Should have exactly the same value as Database identifier of some row in sheet 1. Legal entities.
Previous address Pyšelská 1234, Chodov, 149 00 Praha 4 Yes Previous address of the legal entity.
Address since date 2004-11-08 No The date the legal entity gained this address. When full date, it must be in ISO-8601 format, which is YYYY-MM-DD. But if you don't have the full date, it is ok to just have year here.
Address until date 2006-03-27 No The date the legal entity lost this address. When full date, it must be in ISO-8601 format, which is YYYY-MM-DD. But if you don't have the full date, it is ok to just have year here.

1.6. Legal entities media mentions sheet

Sheet for information about mentions of legal entity in media. Every row in this sheet is one mention in media.

Column Example value Required Explanation
Legal entity reference AGROFERT, a.s. Yes Legal entity this media mention belongs to. Should have exactly the same value as Database identifier of some row in sheet 1. Legal entities.
Summary of the media mention Wikipedia page No Summary or name of the mention in media.
Media last checked date 2022-01-01 No The date when the mention was last checked. Must be in ISO-8601 format, which is YYYY-MM-DD.
Media mention url https://cs.wikipedia.org/wiki/Agrofert No Full URL of the mention in media.

2. People sheet

Second of the 3 main sheets, keeping information about people. Every row in this sheet is one person and their current details.

Column Example value Required Explanation
Database identifier Andrej Babiš Yes Unique identification in this database. Should be name of the person. When other sheets are referencing person, they are referencing what is written in this column.

If there are more people with same name, we recommend to differentiate them by adding year of birth in the parenthesis. Eg. if there are two persons with name John Doe and one is born in 1950 and the other in 1980, then one would have identifier "John Doe (1950)" and the other "John Doe (1980)"
Full name Andrej Babiš Yes Full name of the person. Last name should be at the end.
Nationality CZ No Nationality of the person. Does not support multiple nationalities, so use Other notes to track nationalities of multinationals. Must be two-letter country code defined by standard ISO 3166-1 alpha-2. Eg. Germany is DE, France is FR, United States are US, etc.
Date of birth 2000-01-01 No The date of birth. Must be in ISO-8601 format, which is YYYY-MM-DD. Full date of birth is not published in the wiki, only the year.
Residence country CZ No Country of the person's residence. Must be two-letter country code defined by standard ISO 3166-1 alpha-2. Eg. Germany is DE, France is FR, United States are US, etc.
Residence full address Pyšelská 2327/2, Chodov, 149 00 Praha 4 No Current address of the person's residence. Ideally full, but can be also partial, eg. only city. Full address is not published in the wiki, only value in column Residence only city.
Residence only city Praha 4 No City from the current address of the person's residence.
Other notes Former prime minister of Czechia No Add any other notes you want for the person in this column.

2.1. People sources sheet

Sheet with sources of information about people. If you don't need structured sources, it is enough to list sources in Other notes of person. Every row in this sheet is one source used to gain information about a person.

Column Example value Required Explanation
Person reference Andrej Babiš Yes Person this source belongs to. Should have exactly the same value as Database identifier of some row in sheet 2. People.
Source summary AGROFERT INC. :: Massachusetts (US) :: OpenCorporates. Opencorporates.Com. No Summary or name of the source document.
Information gained from source Connection to AGROFERT INC. No What kind of information was obtained from the source.
Source last checked date 2020-07-06 No The date when the source was last checked. Must be in ISO-8601 format, which is YYYY-MM-DD.
Source URL https://opencorporates.com/companies/us_ma/383446187 No Full URL of the source document.

3. Subsidies sheet

Third of the 3 main sheets, keeping information about subsidies. Every row in this sheet is one subsidy and its details. Note that this sheet does not have any amount columns, because the subsidies are often split into more payments from different providers or in different years and we keep the amounts on the payments only.

Column Example value Required Explanation
Database identifier CEDR-30/04UL-2003 Yes Unique identification in this database. Should be ideally project code. When other sheets are referencing subsidy, they are referencing what is written in this column.

If there are more subsidies with same project code, we recommend to differentiate them by adding something in the parenthesis. Eg. if there are two subsidies with code A-123 and one is from 2010 and the other from 2015, then one would have identifier "A-123 (2010)" and the other "A-123 (2015)"
Receiving legal entity reference AGROFERT, a.s. Yes Legal entity which received this subsidy. Should have exactly the same value as Database identifier of some row in sheet 1. Legal entities.
Year 2002 No Year in which the subsidy was received. Could be also range of years like "2007-2013".
Project name Zlepšení hygieny provozu při skladování drůbeže No Name of the subsidized project.
Project code 30/04UL-2003 No Code of the subsidized project.
Programme name SAPARD No Name of the programme the subsidy was received from.
Programme code SAPARD No Code of the programme the subsidy was received from.
Notes Signed in 2002, carried out in 2003. No Add any notes you want for the subsidy in this column.

3.1. Subsidies payments sheet

Sheet for keeping information about subsidy payments. Every row in this sheet is payment from one provider in given year belonging to some subsidy.

Column Example value Required Explanation
Subsidy reference CEDR-30/04UL-2003 Yes Subsidy this payment belongs to. Should have exactly the same value as Database identifier of some row in sheet 3. Subsidies.
Provider Czech Ministry of agriculture No Institution which provided this payment.
Year 2003 No Year the payment was made.
Original currency CZK No The original currency of the payment. Must be three-letter code defined by standard ISO 4217. Eg. Euro is EUR, US dollar is USD, Pound sterling is GBP, etc.
Amount in original currency 556,875.00 No Amount of the payment in the original currency. Must be either in the English notation (12,345.67) or without any notation (12345).
Amount in EUR 22,736.00 No Amount of the payment in Euro. Must be either in the English notation (12,345.67) or without any notation (12345).
Notes Using approximate exchange rate 24,49 CZK/EUR. No Add any notes you want for the payment in this column.

3.2. Subsidies sources sheet

Sheet with sources of information about subsidies. If you don't need structured sources, it is enough to list sources in Notes of subsidy. Every row in this sheet is one source used to gain information about a subsidy.

Column Example value Required Explanation
Subsidy reference CEDR-30/04UL-2003 Yes Subsidy this source belongs to. Should have exactly the same value as Database identifier of some row in sheet 3. Subsidies.
Source summary Subsidy record on Hlídač státu No Summary or name of the source document.
Information gained from source Full subsidy record No What kind of information was obtained from the source.
Source last checked date 2021-04-28 No The date when the source was last checked. Must be in ISO-8601 format, which is YYYY-MM-DD.
Source URL https://www.hlidacstatu.cz/Dotace/Detail/cedr-a85110a7e99bbff7370eebb382f90913c193db91 No Full URL of the source document.

Adjusting structure

Scripts rely on the structure to be able to load the data from the database spreadsheet, but there are safe ways to adjust the structure. Scripts only need:

  1. sheets to have the exact names,
  2. columns to have the exact headers,
  3. headers to be on row 4 and data starting on the row 5.

Which means that you can do anything, which does not mess up these needs. You can add new sheets or add new columns as you please. You can reorder the columns in any of the sheets. Or you can for example safely use colors or different fonts.

↑ Jump to Table of Contents


Wiki documentation

Restore wiki from S3 backup

If you want to restore wiki from S3 backup, there is s3-backup Docker service prepared for that. On the server, where you want to restore wiki from backup, prepare everything as you would when installing new wiki. That is: follow the steps from 2. Install wiki all the way up to command docker-compose up — you don't want to run this command as that would install fresh new wiki.

Instead of running docker-compose up, first start only database by running:

docker-compose up -d db

When the database is built and running, fire up s3-restore service, which will connect to the S3 bucket with backups, download the most recent backup and restore the wiki from it. Restoring in this situation is populating the database from database dump and restoring contents of mediawiki/shared folder. Note that to make it work, you need to provide in .env the credentials for accessing the correct S3 bucket.

Restore command:

docker-compose run --rm s3-restore

After the restore is done, just continue with the installation process by running docker-compose up. That's it!

↑ Jump to Table of Contents


Scripts documentation

All the scripts are written in Python v3 and should be therefore runnable on any system where Python v3 can be installed. Library dependencides are defined in requirements.txt file and we recommend installing them only in the virtual environment (venv) so the dependencies do not affect any other scripts you might have on your system. Before running any of the following scripts we therefore recommend running following to enter virtual environment and install the dependencies:

$ python3 -m venv .venv
$ . .venv/bin/activate
$ pip3 install -r requirements.txt

The scripts follow convention that if you run them without any arguments or with -h/--help option, help is printed with all the arguments and options of that script. Eg.:

$ ./push_empire_database_to_wiki.py
usage: push_empire_database_to_wiki.py [-h] DATABASE_EXCEL WIKI WIKI_USER

Push Empire database data to Empire wiki

positional arguments:
  DATABASE_EXCEL  Path to the Excel spreadsheet file with Empire database
  WIKI            Domain or full URL of Empire wiki, e.g. https://empirewiki.example.org/
  WIKI_USER       Username to use for login to Empire wiki

optional arguments:
  -h, --help      show this help message and exit

push_empire_database_to_wiki.py

For pushing data from database spreadsheet to the wiki. Has 3 arguments: path to the database spreadsheet downloaded as Excel, wiki domain and wiki user name. When run, it prompts for password of wiki user.

Example run:

$ ./push_empire_database_to_wiki.py ~/Downloads/Project\ Empire\ -\ Demo\ \(Andrej\ Babiš\).xls project-empire-demo.investigace.cz admin

fetch_relationships_of_cz_companies_from_ares.py

For fetching relationships of Czech companies from ARES database, which is managed by Czech Ministry of Finance. Fetches full history of ownerships and only current other relationships (like people on the board, etc.). Saves the result in relationships_of_cz_companies_from_ares.xlsx Excel file in same structure as is used in database spreadsheet for easy copying.

Has 1 argument: path to the database spreadsheet downloaded as Excel. And offers 3 options: one to cache XML files from ARES in case you are running the scripts multiple times and don't want to download fresh XMLs every time. Second option to automatically confirm continuing after the database is loaded by script. And the last option is to only fetch for companies which does not have any relationships so far in the database.

Example run:

$ ./fetch_relationships_of_cz_companies_from_ares.py ~/Downloads/Project\ Empire\ -\ Demo\ \(Andrej\ Babiš\).xls

With the options:

$ ./fetch_relationships_of_cz_companies_from_ares.py ~/Downloads/Project\ Empire\ -\ Demo\ \(Andrej\ Babiš\).xls --cache-ares-xmls --yes --only-companies-without-relationships

fetch_subsidies_of_cz_companies_from_hlidacstatu.py

For fetching subsidies of Czech companies from platform hlidacstatu.cz. Fetches all the subsidies and removes duplicate records as identified by hlidacstatu.cz. When run, it prompts for API key to hlidacstatu.cz, which you can obtain by creating an account there and then navigating to their API page. Saves the result in subsidies_of_cz_companies_from_hlidacstatu.xlsx Excel file in same structure as is used in database spreadsheet for easy copying.

Has 1 argument: path to the database spreadsheet downloaded as Excel. Does not offer any additional options.

Example run:

$ ./fetch_subsidies_of_cz_companies_from_hlidacstatu.py ~/Downloads/Project\ Empire\ -\ Demo\ \(Andrej\ Babiš\).xls

↑ Jump to Table of Contents


License

Everything in this repository is licensed under GNU General Public License v3.0.

↑ Jump to Table of Contents