DmitriySimb/financial-data-parser

Financial Data Parser

Description

Financial Data Parser is a Python CLI project for extracting financial indicators from Yahoo Finance.

The parser fetches a ticker's Yahoo Finance financials page, searches it for the requested field, and returns the extracted values. The repository also includes tools for testing, profiling, and terminal-based performance visualization.

The main goal of this project is to demonstrate a complete workflow:

environment setup → data parsing → testing → profiling → performance analysis → visualization

Features

  • command-line interface for requesting financial data
  • Yahoo Finance HTML parsing
  • extraction of financial fields such as Total Revenue
  • input validation and error handling
  • virtual environment setup script
  • pytest-based tests
  • cProfile performance profiling
  • conversion of profiling results into .dat files
  • terminal visualization with termgraph

Project Structure

financial-data-parser/  
├── data/  
│   ├── profiling_calls.dat  
│   └── profiling_time.dat  
├── profiling/  
│   ├── profiling-http.txt  
│   ├── profiling-ncalls.txt  
│   ├── pstats-cumulative.txt  
│   └── stats.prof
├── screenshots/
│   ├── calls.png
│   └── time.png
├── scripts/  
│   ├── build_termgraph_data.py  
│   ├── pstats_top5.py  
│   └── visualize.sh  
├── tests/  
│   └── test_finance_parser.py  
├── .gitignore  
├── finance_parser.py  
├── main.py  
├── README.md  
├── README_RUS.md  
├── requirements.txt  
└── setup_env.py  

Requirements

  • Python 3.x
  • pip

Project dependencies are listed in requirements.txt:

  • beautifulsoup4
  • pytest
  • termgraph

Installation

Create a virtual environment and install dependencies:

python3 setup_env.py

Activate the virtual environment:

source .venv/bin/activate

Usage

Run the parser from the project root:

python3 main.py MSFT "Total Revenue"

Example output:

('Total Revenue', '293,812,000', '281,724,000', '245,122,000', '211,915,000', '198,270,000')

Arguments:

  • MSFT — ticker symbol
  • Total Revenue — requested financial field
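
The values in the example output are comma-grouped strings, not numbers. If numeric processing is needed, a helper along these lines could convert them. This is only a sketch: `to_numbers` is a hypothetical name that is not part of the project, and it assumes every value after the field name is a plain integer with thousands separators.

```python
def to_numbers(row):
    """Convert ('Field', '1,234', ...) into a list of ints, skipping the field name."""
    _field, *values = row
    # Remove the thousands separators before converting each value.
    return [int(value.replace(",", "")) for value in values]


row = ("Total Revenue", "293,812,000", "281,724,000", "245,122,000")
print(to_numbers(row))  # [293812000, 281724000, 245122000]
```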

Main Files

main.py

Entry point of the project.

It reads command-line arguments, calls the parser, and prints the result.
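
An entry point of this shape might look roughly like the sketch below. The names `build_arg_parser` and `run` are hypothetical, and the parser function is passed in as `fetch` so the example stays self-contained. The actual main.py may read its arguments differently, for example directly from `sys.argv`.

```python
import argparse


def build_arg_parser():
    """Describe the two positional arguments shown in the Usage section."""
    parser = argparse.ArgumentParser(
        description="Print one financial field for a ticker."
    )
    parser.add_argument("ticker", help='ticker symbol, e.g. "MSFT"')
    parser.add_argument("field", help='field name, e.g. "Total Revenue"')
    return parser


def run(argv, fetch):
    """Parse argv, call fetch(ticker, field), print and return the result."""
    args = build_arg_parser().parse_args(argv)
    result = fetch(args.ticker, args.field)
    print(result)
    return result


def fake_fetch(ticker, field):
    # Stand-in for the real parser so the sketch runs without network access.
    return (field, "1", "2")


run(["MSFT", "Total Revenue"], fake_fetch)
```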

finance_parser.py

Contains the core parsing logic:

  • builds the Yahoo Finance URL
  • sends an HTTP request using urllib
  • parses HTML using BeautifulSoup
  • extracts the requested financial field
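
The four steps above can be sketched as follows. Several parts are assumptions rather than the project's code: the URL pattern, the User-Agent header, the function names, and the sample HTML are all illustrative. To stay free of dependencies, the sketch swaps BeautifulSoup for the standard library's `html.parser`, so it shows the idea of the extraction step rather than the project's actual selectors.

```python
import re
import urllib.parse
import urllib.request
from html.parser import HTMLParser

# Assumed URL pattern; the README does not show the one the project builds.
URL_TEMPLATE = "https://finance.yahoo.com/quote/{ticker}/financials"
NUMERIC = re.compile(r"^-?[\d,.]+$")


def build_url(ticker):
    return URL_TEMPLATE.format(ticker=urllib.parse.quote(ticker, safe=""))


def fetch_html(url, timeout=10):
    # A browser-like User-Agent is an assumption; some sites reject the default.
    request = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


class TextCollector(HTMLParser):
    """Collect non-empty text nodes in document order."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.chunks.append(text)


def extract_field(html, field):
    """Return (field, value, ...) using the numeric cells that follow the label."""
    collector = TextCollector()
    collector.feed(html)
    collector.close()
    if field not in collector.chunks:
        raise ValueError(f"field not found: {field!r}")
    start = collector.chunks.index(field) + 1
    values = []
    for chunk in collector.chunks[start:]:
        if not NUMERIC.match(chunk):
            break  # the next label starts a new row
        values.append(chunk)
    return (field, *values)


# Made-up markup for illustration; the real page is far more complex.
SAMPLE = """
<div><div>Total Revenue</div><div>100</div><div>90</div></div>
<div><div>Cost of Revenue</div><div>40</div><div>35</div></div>
"""
print(extract_field(SAMPLE, "Total Revenue"))  # ('Total Revenue', '100', '90')
```

Keeping the network call (`fetch_html`) separate from the extraction (`extract_field`) means the extraction can be tested against saved HTML without touching Yahoo Finance.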

setup_env.py

Creates a local virtual environment and installs dependencies from requirements.txt.
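
A setup script like this can be written with the standard library alone. The sketch below is a guess at the general approach, not the contents of setup_env.py. The function names are hypothetical, and the `.venv` directory name is taken from the activation command in the Installation section.

```python
import subprocess
import sys
import venv
from pathlib import Path


def venv_python(env_dir):
    """Return the interpreter path inside the environment for this platform."""
    if sys.platform == "win32":
        return Path(env_dir) / "Scripts" / "python.exe"
    return Path(env_dir) / "bin" / "python"


def create_env(env_dir=".venv", requirements="requirements.txt"):
    """Create the environment, then install the listed dependencies into it."""
    # with_pip=True bootstraps pip into the new environment.
    venv.create(env_dir, with_pip=True)
    subprocess.run(
        [str(venv_python(env_dir)), "-m", "pip", "install", "-r", requirements],
        check=True,  # raise CalledProcessError if pip fails
    )
```

Calling `create_env()` from the project root would then create `.venv` and install the packages. Running pip through the environment's own interpreter avoids installing into the system Python by accident.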

tests/test_finance_parser.py

Contains tests for the parser.

scripts/build_termgraph_data.py

Extracts useful profiling data and converts it into .dat files for termgraph.

scripts/visualize.sh

Draws terminal charts from .dat files.

scripts/pstats_top5.py

Reads stats.prof and prints the top functions sorted by cumulative time.
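
A report of this kind can be produced with the standard `pstats` module. The sketch below shows one way to do it and is not the script itself: `top_cumulative` and `busy_work` are hypothetical names. The example profiles a small function in memory so that it runs without an existing stats.prof file.

```python
import cProfile
import io
import pstats


def top_cumulative(source, limit=5):
    """Return a report of the `limit` entries with the highest cumulative time.

    `source` may be a stats file path (such as profiling/stats.prof)
    or a cProfile.Profile object.
    """
    stream = io.StringIO()
    stats = pstats.Stats(source, stream=stream)
    stats.strip_dirs().sort_stats("cumulative").print_stats(limit)
    return stream.getvalue()


def busy_work():
    return sum(i * i for i in range(10_000))


profiler = cProfile.Profile()
profiler.runcall(busy_work)
print(top_cumulative(profiler))
```

To read the project's saved data instead, pass the file path: `top_cumulative("profiling/stats.prof")`.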


Testing

Run tests:

pytest

The tests check:

  • that the parser returns a tuple
  • that the first returned value matches the requested field
  • that invalid ticker names raise an exception
  • that invalid field names raise an exception

Profiling

The project includes profiling results generated with cProfile.

Generate profiling output sorted by internal time (the -s tottime option sets this sort order explicitly):

python3 -m cProfile -s tottime main.py MSFT "Total Revenue" > profiling/profiling-http.txt

Generate profiling output sorted by number of calls:

python3 -m cProfile -s ncalls main.py MSFT "Total Revenue" > profiling/profiling-ncalls.txt

Generate binary profiling data for pstats:

python3 -m cProfile -o profiling/stats.prof main.py MSFT "Total Revenue"

Generate cumulative profiling report:

python3 scripts/pstats_top5.py > profiling/pstats-cumulative.txt

Building Data for Visualization

Convert profiling reports into .dat files:

python3 scripts/build_termgraph_data.py

This creates:

data/profiling_time.dat
data/profiling_calls.dat
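
The conversion step can be pictured as follows. This is a sketch, not the project's script, and it rests on two assumptions: the input is cProfile's text report, and the output is one whitespace-separated label and value per line, the layout used by termgraph's example data files. The name `to_dat_lines` is hypothetical and the sample rows are invented for illustration, not measurements from the project.

```python
import re

# Matches data rows of cProfile's text report:
# ncalls  tottime  percall  cumtime  percall  filename:lineno(function)
ROW = re.compile(
    r"^\s*(?P<ncalls>\d+)(?:/\d+)?\s+(?P<tottime>\d+\.\d+)\s+\d+\.\d+"
    r"\s+(?P<cumtime>\d+\.\d+)\s+\d+\.\d+\s+(?P<name>.+)$"
)


def to_dat_lines(report, column, limit=5):
    """Return 'label value' lines for the `limit` largest values of `column`."""
    rows = []
    for line in report.splitlines():
        match = ROW.match(line)
        if match:
            # Replace whitespace in the label so each line has two fields.
            label = re.sub(r"\s+", "_", match["name"].strip())
            rows.append((label, float(match[column])))
    rows.sort(key=lambda row: row[1], reverse=True)
    return [f"{label} {value:g}" for label, value in rows[:limit]]


# Invented sample rows in cProfile's layout.
SAMPLE = """\
   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        5    1.234    0.247    1.234    0.247 {method 'read' of '_ssl._SSLSocket' objects}
      120    0.010    0.000    0.015    0.000 re.py:250(compile)
"""
for line in to_dat_lines(SAMPLE, "tottime"):
    print(line)
```

Passing `"ncalls"` instead of `"tottime"` gives the data for a call-count chart from the same report.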

Terminal Visualization

Visualize profiling data:

sh scripts/visualize.sh profiling_time.dat

[Screenshot: termgraph chart of profiling_time.dat]

sh scripts/visualize.sh profiling_calls.dat

[Screenshot: termgraph chart of profiling_calls.dat]


Performance Analysis

Profiling results show that the application is I/O-bound.

The dominant cost is network communication: reading from the SSL socket accounts for most of the total execution time.

CPU-bound operations have a much smaller impact on performance:

  • HTML parsing
  • regex matching
  • list and dictionary operations

Conclusion:

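Optimizing the Python code will not significantly improve performance, because most of the run time is spent waiting on the network.
The main optimization direction is to make fewer network requests or, when several are needed, to run them in parallel.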
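
If several tickers had to be fetched, the requests could be overlapped with a thread pool, since threads spend most of an I/O-bound task waiting rather than computing. The sketch below is hypothetical and not part of the project: `fetch_many` is an invented name, and `fetch_one` stands for whatever function performs a single request.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_many(tickers, fetch_one, max_workers=4):
    """Call fetch_one(ticker) for each ticker concurrently; return {ticker: result}."""
    tickers = list(tickers)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so zip pairs each ticker with its result.
        return dict(zip(tickers, pool.map(fetch_one, tickers)))


def fake_fetch(ticker):
    # Stand-in for a real network call so the sketch runs offline.
    return (ticker, "Total Revenue")


print(fetch_many(["MSFT", "AAPL"], fake_fetch))
```

For a single request, as in the Usage example, there is nothing to parallelize. The gain appears only when multiple pages are needed.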


Skills Demonstrated

  • Python scripting
  • CLI application development
  • virtual environment setup
  • dependency management
  • HTTP requests with urllib
  • HTML parsing with BeautifulSoup
  • error handling
  • unit testing with pytest
  • profiling with cProfile and pstats
  • performance analysis
  • terminal visualization with termgraph
  • project structuring

Notes

This project relies on web scraping. The Yahoo Finance page structure may change over time, which can break parsing.

The project is intended for educational and portfolio purposes.
