devils-angel/WebSpider

FSMA STORI Web Scraper

A Python-based web scraping tool that automates the extraction of Annual Financial Reports from the Belgian Financial Services and Markets Authority (FSMA) STORI database.

📖 Overview

This scraper navigates the FSMA STORI website, retrieves a list of registered issuers, and automatically downloads their Annual Financial Reports published from 2011 onwards. For each document, it extracts the issuer's Legal Entity Identifier (LEI) and saves the PDF using a standardized naming convention:

IssuerName_LEI_AnnualReport_PublicationDate_EN.pdf
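The naming convention can be illustrated with a small helper. This is only a sketch: the function name `build_report_filename` and the sanitization rule (runs of non-word characters become a single underscore) are assumptions made for illustration, not necessarily what spider.py does.

```python
import re
from datetime import date


def build_report_filename(issuer_name: str, lei: str, published: date) -> str:
    """Return IssuerName_LEI_AnnualReport_PublicationDate_EN.pdf.

    Hypothetical helper: the sanitization rule below is an assumption.
    """
    # Replace runs of characters that are unsafe in file names with "_".
    safe_name = re.sub(r"[^\w]+", "_", issuer_name).strip("_")
    # isoformat() yields YYYY-MM-DD, matching the example file names.
    return f"{safe_name}_{lei}_AnnualReport_{published.isoformat()}_EN.pdf"


if __name__ == "__main__":
    print(build_report_filename("KBC Group", "5493008GNIDXL00JPR61", date(2024, 3, 28)))
```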

Note: The script is configured to process 5 issuers as a demonstration. You can modify this setting to scrape all available issuers or any custom number.

🚀 Quick Start Guide

Step 1: Download and Extract

Download the project ZIP file and extract it to your desired location, then navigate to the project folder:

cd WebSpider

Step 2: Set Up Virtual Environment

Windows (Git Bash/MINGW64):

python -m venv venv
source venv/Scripts/activate

Windows (PowerShell):

python -m venv venv
.\venv\Scripts\Activate.ps1

Windows (Command Prompt):

python -m venv venv
venv\Scripts\activate

macOS/Linux:

python3 -m venv venv
source venv/bin/activate

Verify activation: You should see (venv) at the beginning of your command prompt.

Step 3: Install Dependencies

pip install -r requirements.txt

This installs:

  • selenium - Web automation framework
  • webdriver-manager - Automatic ChromeDriver management

Step 4: Run the Scraper

python spider.py

The script will:

  • Open Chrome browser and navigate to FSMA STORI
  • Process 5 issuers (configurable)
  • Download English Annual Financial Reports to the Output/ folder
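The browser setup behind these steps can be sketched as follows. This is a minimal illustration of how Selenium and webdriver-manager are commonly combined so that Chrome saves PDFs into Output/ instead of opening them in its viewer. The function names `chrome_download_prefs` and `make_driver` are hypothetical, and the actual setup in spider.py may differ.

```python
from pathlib import Path


def chrome_download_prefs(output_dir: str) -> dict:
    """Chrome preferences that send downloads to output_dir without prompting."""
    return {
        "download.default_directory": str(Path(output_dir).resolve()),
        "download.prompt_for_download": False,
        # Download PDFs instead of rendering them in Chrome's built-in viewer.
        "plugins.always_open_pdf_externally": True,
    }


def make_driver(output_dir: str = "Output"):
    """Start Chrome with a ChromeDriver fetched by webdriver-manager."""
    # Imported here so the prefs helper can be used without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service
    from webdriver_manager.chrome import ChromeDriverManager

    Path(output_dir).mkdir(parents=True, exist_ok=True)
    options = webdriver.ChromeOptions()
    options.add_experimental_option("prefs", chrome_download_prefs(output_dir))
    service = Service(ChromeDriverManager().install())
    return webdriver.Chrome(service=service, options=options)
```

Calling `make_driver()` opens a Chrome window; call `quit()` on the returned driver when done.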

Step 5: Review Results

Check the downloaded files:

# View downloaded PDFs
ls Output/

Step 6: Deactivate Virtual Environment

When finished:

deactivate

📁 Project Structure

WebSpider/
├── spider.py                    # Main scraper script
├── requirements.txt             # Python dependencies
├── README.md                    # This file
├── Output/                      # Downloaded PDF reports (created on first run)
│   └── *.pdf
└── venv/                        # Virtual environment (created during setup)

⚙️ Configuration

To process more than 5 issuers, edit spider.py (around line 281):

max_issuers_to_process = 5  # Change this number to process more issuers
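One way such a limit might be applied is sketched below. The helper `limit_issuers` and the convention that `None` means "process everything" are assumptions for illustration; check spider.py for how the setting is actually used.

```python
from itertools import islice
from typing import Iterable, List, Optional

max_issuers_to_process: Optional[int] = 5  # None would mean "all issuers" in this sketch


def limit_issuers(issuers: Iterable[str], limit: Optional[int]) -> List[str]:
    """Return at most `limit` issuers, or all of them when limit is None."""
    # islice accepts None as its stop value and then yields every item.
    return list(islice(issuers, limit))


if __name__ == "__main__":
    names = ["Issuer A", "Issuer B", "Issuer C"]  # placeholder names
    print(limit_issuers(names, 2))
    print(limit_issuers(names, None))
```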

📊 Output

1. PDF Files (Output/ folder)

Downloaded reports with standardized naming:

Ageas_529900T6UXZT0XS8RS47_AnnualReport_2024-04-30_EN.pdf
KBC_Group_5493008GNIDXL00JPR61_AnnualReport_2024-03-28_EN.pdf
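Because the names follow a fixed pattern, downloaded files can be parsed back into their parts. The sketch below assumes the LEI is always 20 uppercase alphanumeric characters and the date is in YYYY-MM-DD form, as in the examples above. `parse_report_filename` is a hypothetical helper, not part of spider.py.

```python
import re
from datetime import date
from typing import NamedTuple, Optional


class ReportName(NamedTuple):
    issuer: str
    lei: str
    published: date


# Issuer names may contain underscores, so the 20-character LEI anchors the split.
_PATTERN = re.compile(
    r"^(?P<issuer>.+)_(?P<lei>[A-Z0-9]{20})_AnnualReport_"
    r"(?P<published>\d{4}-\d{2}-\d{2})_EN\.pdf$"
)


def parse_report_filename(filename: str) -> Optional[ReportName]:
    """Split a report file name into issuer, LEI, and date; None if it does not match."""
    match = _PATTERN.match(filename)
    if match is None:
        return None
    try:
        published = date.fromisoformat(match["published"])
    except ValueError:  # e.g. month 13
        return None
    return ReportName(match["issuer"], match["lei"], published)
```

For example, looping over `Path("Output").glob("*.pdf")` and passing each `path.name` to the helper gives a quick inventory of what was downloaded.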
