FSMA STORI Web Scraper

A Python-based web scraping tool that automates the extraction of Annual Financial Reports from the Belgian Financial Services and Markets Authority (FSMA) STORI database.

📖 Overview

This scraper navigates the FSMA STORI website, retrieves a list of registered issuers, and automatically downloads their Annual Financial Reports published from 2011 onwards. For each document, it extracts the issuer's Legal Entity Identifier (LEI) and saves the PDF using a standardized naming convention:

IssuerName_LEI_AnnualReport_PublicationDate_EN.pdf
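As an illustration of the naming convention, the sketch below assembles a file name from its parts. The function name `build_report_filename` and the name-sanitizing rule are assumptions made for this example, not necessarily what `spider.py` does. The ISO date format is inferred from the example file names in the Output section.

```python
import re
from datetime import date


def build_report_filename(issuer_name: str, lei: str, published: date, lang: str = "EN") -> str:
    """Assemble IssuerName_LEI_AnnualReport_PublicationDate_EN.pdf."""
    # Assumption: any run of characters other than ASCII letters or digits
    # is collapsed into a single underscore to keep the name file-system safe.
    safe_name = re.sub(r"[^A-Za-z0-9]+", "_", issuer_name).strip("_")
    return f"{safe_name}_{lei}_AnnualReport_{published.isoformat()}_{lang}.pdf"


print(build_report_filename("KBC Group", "5493008GNIDXL00JPR61", date(2024, 3, 28)))
# KBC_Group_5493008GNIDXL00JPR61_AnnualReport_2024-03-28_EN.pdf
```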

Note: By default the script processes 5 issuers as a demonstration. To scrape all available issuers or a custom number, change max_issuers_to_process in spider.py (see Configuration).

🚀 Quick Start Guide

Step 1: Download and Extract

Download the project ZIP file and extract it to your desired location, then navigate to the project folder:

cd WebSpider

Step 2: Set Up Virtual Environment

Windows (Git Bash/MINGW64):

python -m venv venv
source venv/Scripts/activate

Windows (PowerShell):

python -m venv venv
venv\Scripts\Activate.ps1

Windows (Command Prompt):

python -m venv venv
venv\Scripts\activate

macOS/Linux:

python3 -m venv venv
source venv/bin/activate

Verify activation: You should see (venv) at the beginning of your command prompt.
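If the prompt prefix is hidden by your shell theme, activation can also be checked from Python itself. This is a minimal sketch. The helper name `in_virtualenv` is made up for this example, and the check relies on `sys.prefix` differing from `sys.base_prefix` inside an environment created with `python -m venv`.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the interpreter runs inside a venv-style environment."""
    # Inside a venv, sys.prefix points at the environment directory while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


print("virtual environment active:", in_virtualenv())
```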

Step 3: Install Dependencies

pip install -r requirements.txt

This installs:

selenium - Web automation framework
webdriver-manager - Automatic ChromeDriver management
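To confirm that both packages are importable from the active environment, a quick check along these lines may help. The helper name `missing_modules` is hypothetical. Note that the `webdriver-manager` distribution is imported as `webdriver_manager`.

```python
from importlib.util import find_spec

# Import names for the two packages listed above.
REQUIRED_MODULES = ("selenium", "webdriver_manager")


def missing_modules(names=REQUIRED_MODULES) -> list[str]:
    """Return the top-level modules from `names` that the import system cannot find."""
    return [name for name in names if find_spec(name) is None]


missing = missing_modules()
print("missing:", ", ".join(missing) if missing else "none")
```

If anything is reported as missing, re-run `pip install -r requirements.txt` with the virtual environment activated.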

Step 4: Run the Scraper

python spider.py

The script will:

Open Chrome browser and navigate to FSMA STORI
Process 5 issuers (configurable)
Download English Annual Financial Reports to the Output/ folder
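For readers curious how a Selenium script can send PDF downloads to a folder such as Output/, here is one possible setup. It is a sketch under assumptions: the function names `build_download_prefs` and `make_driver` are invented for this example, and `spider.py` may configure Chrome differently or download files another way.

```python
from pathlib import Path


def build_download_prefs(output_dir: str = "Output") -> dict:
    """Chrome preferences that save PDFs into output_dir without prompting."""
    target = Path(output_dir).resolve()
    target.mkdir(parents=True, exist_ok=True)
    return {
        "download.default_directory": str(target),
        "download.prompt_for_download": False,
        # Download PDFs instead of rendering them in Chrome's built-in viewer.
        "plugins.always_open_pdf_externally": True,
    }


def make_driver(output_dir: str = "Output"):
    """Start Chrome with the preferences applied (requires Chrome to be installed)."""
    # Imported lazily so build_download_prefs works without Selenium present.
    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service
    from webdriver_manager.chrome import ChromeDriverManager

    options = webdriver.ChromeOptions()
    options.add_experimental_option("prefs", build_download_prefs(output_dir))
    # webdriver-manager downloads a matching ChromeDriver and returns its path.
    service = Service(ChromeDriverManager().install())
    return webdriver.Chrome(service=service, options=options)
```

Calling `make_driver()` opens a Chrome window. Remember to call `driver.quit()` on the returned object when finished.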

Step 5: Review Results

Check the downloaded files:

# View downloaded PDFs
ls Output/
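On systems without `ls`, or to see file sizes as well, the same check can be done from Python. A minimal sketch, where `list_reports` is a hypothetical helper name:

```python
from pathlib import Path


def list_reports(output_dir: str = "Output") -> list[tuple[str, int]]:
    """Return (file name, size in bytes) for each PDF in output_dir, sorted by name."""
    folder = Path(output_dir)
    if not folder.is_dir():
        # Output/ is only created on the first run of the scraper.
        return []
    return sorted((p.name, p.stat().st_size) for p in folder.glob("*.pdf"))


for name, size in list_reports():
    print(f"{size / 1024:8.1f} KiB  {name}")
```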

Step 6: Deactivate Virtual Environment

When finished:
```bash
deactivate
```

📁 Project Structure

WebSpider/
├── spider.py                    # Main scraper script
├── requirements.txt             # Python dependencies
├── README.md                    # This file
├── Output/                      # Downloaded PDF reports (created on first run)
│   └── *.pdf
└── venv/                        # Virtual environment (created during setup)

⚙️ Configuration

To process more than 5 issuers, edit spider.py (around line 281):

max_issuers_to_process = 5  # Change this number to process more issuers
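A limit like this is typically applied by slicing the list of issuers before the main loop. The sketch below shows one way that could look. `select_issuers` is a hypothetical helper, and treating `None` as "process everything" is an assumption of this example rather than documented behavior of `spider.py`.

```python
from typing import Optional, Sequence


def select_issuers(issuers: Sequence[str], max_issuers_to_process: Optional[int] = 5) -> list[str]:
    """Return the first N issuers, or all of them when the limit is None."""
    if max_issuers_to_process is None:
        return list(issuers)
    if max_issuers_to_process < 0:
        raise ValueError("max_issuers_to_process must be non-negative")
    # Slicing past the end of the list is safe and simply returns every issuer.
    return list(issuers[:max_issuers_to_process])


demo = ["Issuer A", "Issuer B", "Issuer C"]
print(select_issuers(demo, 2))     # ['Issuer A', 'Issuer B']
print(select_issuers(demo, None))  # ['Issuer A', 'Issuer B', 'Issuer C']
```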

📊 Output

1. PDF Files (`Output/` folder)

Downloaded reports with standardized naming:

Ageas_529900T6UXZT0XS8RS47_AnnualReport_2024-04-30_EN.pdf
KBC_Group_5493008GNIDXL00JPR61_AnnualReport_2024-03-28_EN.pdf
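Because the names follow a fixed pattern, downstream scripts can split them back into their parts. The following sketch assumes the pattern shown above, with a 20-character alphanumeric LEI and an ISO-formatted date. `parse_report_filename` is a hypothetical helper and not part of `spider.py`.

```python
import re
from datetime import date
from typing import Optional

# Issuer name, 20-character LEI, fixed "AnnualReport" tag, ISO date, language code.
FILENAME_RE = re.compile(
    r"(?P<issuer>.+)_(?P<lei>[A-Z0-9]{20})_AnnualReport_"
    r"(?P<published>\d{4}-\d{2}-\d{2})_(?P<lang>[A-Z]{2})\.pdf"
)


def parse_report_filename(filename: str) -> Optional[dict]:
    """Split a report file name into its parts, or return None if it does not match."""
    match = FILENAME_RE.fullmatch(filename)
    if match is None:
        return None
    return {
        "issuer": match["issuer"],
        "lei": match["lei"],
        "published": date.fromisoformat(match["published"]),
        "lang": match["lang"],
    }


info = parse_report_filename("Ageas_529900T6UXZT0XS8RS47_AnnualReport_2024-04-30_EN.pdf")
print(info["issuer"], info["lei"], info["published"])
```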
