A Python-based web scraping tool that automates the extraction of Annual Financial Reports from the Belgian Financial Services and Markets Authority (FSMA) STORI database.
This scraper navigates the FSMA STORI website, retrieves a list of registered issuers, and automatically downloads their Annual Financial Reports published from 2011 onwards. For each document, it extracts the issuer's Legal Entity Identifier (LEI) and saves the PDF using a standardized naming convention:
```
IssuerName_LEI_AnnualReport_PublicationDate_EN.pdf
```
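As a rough illustration of this naming convention, the sketch below assembles a file name from its parts. The helper name `build_report_filename`, the replacement of whitespace with underscores (inferred from the `KBC_Group` sample file name), and the ISO date format are assumptions made for this example, not necessarily how `spider.py` builds the name.

```python
import re
from datetime import date


def build_report_filename(issuer_name: str, lei: str, published: date) -> str:
    """Build IssuerName_LEI_AnnualReport_PublicationDate_EN.pdf (illustrative helper)."""
    # Assumption: whitespace runs in the issuer name become single underscores.
    safe_name = re.sub(r"\s+", "_", issuer_name.strip())
    # Assumption: the publication date is written in ISO format (YYYY-MM-DD).
    return f"{safe_name}_{lei}_AnnualReport_{published.isoformat()}_EN.pdf"


print(build_report_filename("KBC Group", "5493008GNIDXL00JPR61", date(2024, 3, 28)))
# KBC_Group_5493008GNIDXL00JPR61_AnnualReport_2024-03-28_EN.pdf
```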
Note: The script is configured to process 5 issuers as a demonstration. You can modify this setting to scrape all available issuers or any custom number.
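### Step 1: Download the Project

Download the project ZIP file, extract it to a location of your choice, then navigate to the project folder:

```bash
cd WebSpider
```

### Step 2: Create and Activate a Virtual Environment

Windows (Git Bash/MINGW64):

```bash
python -m venv venv
source venv/Scripts/activate
```

Windows (Command Prompt):

```bat
python -m venv venv
venv\Scripts\activate
```

Windows (PowerShell): create the environment the same way, then run `venv\Scripts\Activate.ps1` (the `source` command is not available in PowerShell).

macOS/Linux:

```bash
python3 -m venv venv
source venv/bin/activate
```

Verify activation: you should see `(venv)` at the beginning of your command prompt.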
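If the prompt prefix is hidden or customized, activation can also be checked from Python itself. This is a minimal sketch, assuming the environment was created with the standard `venv` module; the function name `in_virtualenv` is made up for this example.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the interpreter runs inside a venv-style environment."""
    # Inside a venv, sys.prefix points at the environment while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


print("virtual environment active:", in_virtualenv())
```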
### Step 3: Install Dependencies

```bash
pip install -r requirements.txt
```

This installs:

- `selenium` - Web automation framework
- `webdriver-manager` - Automatic ChromeDriver management
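To confirm that both packages ended up in the active environment, a quick check along these lines may help. It is only a sketch: `missing_modules` is a hypothetical helper, and it looks the packages up by their import names (`webdriver-manager` is imported as `webdriver_manager`).

```python
from importlib.util import find_spec


def missing_modules(module_names):
    """Return the names from module_names that cannot be found for import."""
    return [name for name in module_names if find_spec(name) is None]


# Import names of the two packages installed from requirements.txt.
required = ["selenium", "webdriver_manager"]
print("missing:", missing_modules(required))
```

An empty list means both packages can be imported.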
### Step 4: Run the Scraper

```bash
python spider.py
```

The script will:

- Open Chrome browser and navigate to FSMA STORI
- Process 5 issuers (configurable)
- Download English Annual Financial Reports to the `Output/` folder
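The configurable issuer limit could be applied roughly as follows. This is a sketch under assumptions: `limit_issuers` is a hypothetical helper, and treating `None` as "no limit" is a choice made for this example; only the variable name `max_issuers_to_process` comes from `spider.py`.

```python
from itertools import islice


def limit_issuers(issuers, max_issuers_to_process=5):
    """Return at most max_issuers_to_process issuers; None means no limit."""
    if max_issuers_to_process is None:
        return list(issuers)
    return list(islice(issuers, max_issuers_to_process))


# Hypothetical issuer names, used only to show the effect of the limit.
sample = ["Issuer A", "Issuer B", "Issuer C", "Issuer D", "Issuer E", "Issuer F"]
print(limit_issuers(sample))        # first five entries
print(limit_issuers(sample, None))  # all six entries
```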
### Step 5: Check the Output

Check the downloaded files:

```bash
# View downloaded PDFs
ls Output/
```
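On systems without `ls` (for example Windows Command Prompt), the same check can be done from Python. A minimal sketch, assuming the reports sit directly in the `Output` folder; `list_reports` is a hypothetical helper name.

```python
from pathlib import Path


def list_reports(output_dir="Output"):
    """Return the sorted names of PDF files in output_dir (empty if it is missing)."""
    folder = Path(output_dir)
    if not folder.is_dir():
        return []
    return sorted(p.name for p in folder.glob("*.pdf"))


for name in list_reports():
    print(name)
```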
### Step 6: Deactivate Virtual Environment
When finished:
```bash
deactivate
```

Project structure:

```
WebSpider/
├── spider.py          # Main scraper script
├── requirements.txt   # Python dependencies
├── README.md          # This file
├── Output/            # Downloaded PDF reports (created on first run)
│   └── *.pdf
└── venv/              # Virtual environment (created during setup)
```
To process more than 5 issuers, edit `spider.py` (around line 281):

```python
max_issuers_to_process = 5  # Change this number to process more issuers
```

Downloaded reports with standardized naming:

```
Ageas_529900T6UXZT0XS8RS47_AnnualReport_2024-04-30_EN.pdf
KBC_Group_5493008GNIDXL00JPR61_AnnualReport_2024-03-28_EN.pdf
```
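Because the names are standardized, they can be split back into their parts for later processing. The sketch below is one way to do that; `parse_report_filename` and the regular expression are assumptions for this example, relying on the LEI being a 20-character alphanumeric code and the date being written as YYYY-MM-DD, as in the sample names.

```python
import re

# Assumption: LEI = 20 uppercase alphanumeric characters, date = YYYY-MM-DD.
FILENAME_RE = re.compile(
    r"^(?P<issuer>.+)_(?P<lei>[A-Z0-9]{20})_AnnualReport_"
    r"(?P<date>\d{4}-\d{2}-\d{2})_EN\.pdf$"
)


def parse_report_filename(filename):
    """Split a report file name into issuer, LEI and date, or return None."""
    match = FILENAME_RE.match(filename)
    return match.groupdict() if match else None


print(parse_report_filename("KBC_Group_5493008GNIDXL00JPR61_AnnualReport_2024-03-28_EN.pdf"))
# {'issuer': 'KBC_Group', 'lei': '5493008GNIDXL00JPR61', 'date': '2024-03-28'}
```

File names that do not follow the pattern yield `None`, which makes it easy to skip unrelated files in the output folder.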