Financial Data Parser is a Python CLI project for extracting financial indicators from Yahoo Finance.
The project parses a Yahoo Finance financials page, searches for a requested field, returns the extracted values, and includes additional tools for testing, profiling, and terminal-based performance visualization.
The main goal of this project is to demonstrate a complete workflow:
environment setup → data parsing → testing → profiling → performance analysis → visualization
- command-line interface for requesting financial data
- Yahoo Finance HTML parsing
- extraction of financial fields such as Total Revenue
- input validation and error handling
- virtual environment setup script
- pytest-based tests
- cProfile performance profiling
- conversion of profiling results into .dat files
- terminal visualization with termgraph
financial-data-parser/
├── data/
│ ├── profiling_calls.dat
│ └── profiling_time.dat
├── profiling/
│ ├── profiling-http.txt
│ ├── profiling-ncalls.txt
│ ├── pstats-cumulative.txt
│ └── stats.prof
├── screenshots/
│ ├── calls.png
│ └── time.png
├── scripts/
│ ├── build_termgraph_data.py
│ ├── pstats_top5.py
│ └── visualize.sh
├── tests/
│ └── test_finance_parser.py
├── .gitignore
├── finance_parser.py
├── main.py
├── README.md
├── README_RUS.md
├── requirements.txt
└── setup_env.py
- Python 3.x
- pip
Project dependencies are listed in requirements.txt:
- beautifulsoup4
- pytest
- termgraph
Create a virtual environment and install dependencies:
python3 setup_env.py
Activate the virtual environment:
source .venv/bin/activate
Run the parser from the project root:
python3 main.py MSFT "Total Revenue"
Example output:
('Total Revenue', '293,812,000', '281,724,000', '245,122,000', '211,915,000', '198,270,000')
Arguments:
- MSFT — ticker symbol
- Total Revenue — requested financial field
main.py is the entry point of the project.
It reads command-line arguments, calls the parser, and prints the result.
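The flow described above (read the arguments, call the parser, print the result) can be sketched as follows. This is a minimal illustration, not the project's actual main.py: the `main(argv, parse)` signature is hypothetical, and `parse` stands in for whatever function finance_parser.py exports, whose name is not given here. Passing the parser in as an argument keeps the sketch runnable without network access.

```python
import sys
from typing import Callable, Sequence


def main(argv: Sequence[str], parse: Callable[[str, str], tuple]) -> int:
    """Hypothetical entry point: parse TICKER and FIELD, print the result tuple."""
    if len(argv) != 3:
        print('usage: python3 main.py TICKER "FIELD NAME"', file=sys.stderr)
        return 2  # conventional exit code for bad command-line usage
    ticker, field = argv[1], argv[2]
    try:
        result = parse(ticker, field)
    except (ValueError, OSError) as exc:
        # Assumption: the parser signals bad input with ValueError; network
        # failures from urllib (URLError) are subclasses of OSError.
        print(f"error: {exc}", file=sys.stderr)
        return 1
    print(result)
    return 0
```

In a real script this would typically be wired up as `sys.exit(main(sys.argv, parse))`, with `parse` replaced by the project's own parser function.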
finance_parser.py contains the core parsing logic:
- builds the Yahoo Finance URL
- sends an HTTP request using urllib
- parses the HTML using BeautifulSoup
- extracts the requested financial field
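A self-contained sketch of those four steps follows. It swaps BeautifulSoup for the standard-library html.parser so it runs without third-party packages, and it assumes a simplified table markup (`<tr>`/`<td>` rows whose first cell is the field name); Yahoo Finance's real markup differs and changes over time. The URL template, the ticker pattern, and the User-Agent header are also assumptions, and every function and class name here is hypothetical.

```python
import re
from html.parser import HTMLParser
from urllib.request import Request, urlopen

# Assumed URL pattern for a ticker's financials page.
URL_TEMPLATE = "https://finance.yahoo.com/quote/{ticker}/financials"
# Assumed ticker format: letters, digits, dots and dashes (e.g. MSFT, BRK-B).
TICKER_RE = re.compile(r"[A-Za-z0-9.\-]{1,10}")


def build_request(ticker: str) -> Request:
    """Validate the ticker and build (without sending) the HTTP request."""
    if not TICKER_RE.fullmatch(ticker):
        raise ValueError(f"invalid ticker: {ticker!r}")
    url = URL_TEMPLATE.format(ticker=ticker.upper())
    # Assumption: a browser-like User-Agent is needed; urllib's default may be rejected.
    return Request(url, headers={"User-Agent": "Mozilla/5.0"})


def fetch_html(ticker: str, timeout: float = 10.0) -> str:
    """Download the page (assumed UTF-8); urllib errors propagate to the caller."""
    with urlopen(build_request(ticker), timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


class RowCollector(HTMLParser):
    """Collect the stripped text of <td>/<th> cells, one tuple per <tr> row."""

    def __init__(self) -> None:
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(tuple(self._row))
            self._row = None


def extract_field(html: str, field: str) -> tuple:
    """Return the first row whose first cell equals `field`, e.g. ('Total Revenue', ...)."""
    collector = RowCollector()
    collector.feed(html)
    for row in collector.rows:
        if row and row[0] == field:
            return row
    raise ValueError(f"field not found: {field!r}")
```

Combined, `extract_field(fetch_html("MSFT"), "Total Revenue")` mirrors the CLI call, although against the live page it only works if the markup actually matches the assumed table layout.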
setup_env.py creates a local virtual environment in .venv and installs dependencies from requirements.txt.
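A minimal sketch of such a setup script, using the standard-library venv module and then running pip inside the new environment. The function names and the `install` flag are hypothetical, and the real setup_env.py may be organized differently.

```python
import os
import subprocess
import venv
from pathlib import Path


def venv_python(env_dir: Path) -> Path:
    """Path of the interpreter inside a virtual environment (Windows or POSIX layout)."""
    if os.name == "nt":
        return env_dir / "Scripts" / "python.exe"
    return env_dir / "bin" / "python"


def setup_env(env_dir: str = ".venv", requirements: str = "requirements.txt",
              install: bool = True) -> Path:
    """Create env_dir and, if install is True, pip-install the requirements into it."""
    env_path = Path(env_dir)
    # Symlink the interpreter on POSIX, mirroring the `python -m venv` default;
    # with_pip bootstraps pip via ensurepip and is skipped when nothing is installed.
    venv.create(env_path, symlinks=(os.name != "nt"), with_pip=install)
    python = venv_python(env_path)
    if install:
        # Use the environment's own interpreter so packages land inside .venv.
        subprocess.run([str(python), "-m", "pip", "install", "-r", requirements],
                       check=True)
    return python
```

Calling `setup_env()` from the project root would produce the .venv directory that `source .venv/bin/activate` expects.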
tests/test_finance_parser.py contains the tests for the parser.
scripts/build_termgraph_data.py extracts timing and call-count data from the profiling reports and converts it into .dat files for termgraph.
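One possible way to do this conversion, sketched under assumptions: it parses the row layout of cProfile's text reports (ncalls, tottime, percall, cumtime, percall, function) and writes comma-separated `label,value` lines, which termgraph reads as one bar per line. The function names, the `metric` values, and the choice of which column feeds which .dat file are hypothetical; the real build_termgraph_data.py may select rows differently.

```python
from __future__ import annotations

import re
from pathlib import Path

# One data row of a cProfile text report:
#   ncalls  tottime  percall  cumtime  percall  filename:lineno(function)
# ncalls may look like "5/1" (total/primitive calls) for recursive functions.
ROW_RE = re.compile(
    r"^\s*(\d+)(?:/\d+)?\s+([\d.]+)\s+[\d.]+\s+[\d.]+\s+[\d.]+\s+(.+?)\s*$"
)


def parse_report(text: str) -> list[tuple[str, int, float]]:
    """Return (function label, ncalls, tottime) for every data row in the report."""
    rows = []
    for line in text.splitlines():
        match = ROW_RE.match(line)
        if match:
            ncalls, tottime, label = match.groups()
            rows.append((label, int(ncalls), float(tottime)))
    return rows


def to_dat(rows: list[tuple[str, int, float]], metric: str, top: int = 10) -> str:
    """Render the `top` largest rows for `metric` as 'label,value' lines."""
    index = {"ncalls": 1, "tottime": 2}[metric]
    best = sorted(rows, key=lambda row: row[index], reverse=True)[:top]
    # A comma inside a label would be read as an extra column, so replace it.
    lines = [f"{row[0].replace(',', ';')},{row[index]}" for row in best]
    return "\n".join(lines) + "\n"


def write_dat(report_path: str, dat_path: str, metric: str, top: int = 10) -> None:
    """Read a cProfile text report and write a termgraph .dat file."""
    text = Path(report_path).read_text(encoding="utf-8")
    Path(dat_path).write_text(to_dat(parse_report(text), metric, top), encoding="utf-8")
```

For example, `write_dat("profiling/profiling-http.txt", "data/profiling_time.dat", "tottime")` would chart the functions with the highest internal time.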
scripts/visualize.sh draws terminal charts from .dat files.
scripts/pstats_top5.py reads stats.prof and prints the top five functions sorted by cumulative time.
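Such a script can be very short. The sketch below uses the standard pstats module to load the binary profile, sort it by cumulative time, and print the first five entries. The function name `print_top` and its `stream` parameter are illustrative, not taken from the project.

```python
import pstats
from typing import Optional, TextIO


def print_top(prof_path: str = "profiling/stats.prof", limit: int = 5,
              stream: Optional[TextIO] = None) -> None:
    """Print the `limit` functions with the highest cumulative time."""
    # pstats falls back to sys.stdout when stream is None.
    stats = pstats.Stats(prof_path, stream=stream)
    # strip_dirs() shortens file paths so the report fits in a terminal.
    stats.strip_dirs().sort_stats("cumulative").print_stats(limit)
```

Calling `print_top()` from the project root writes the report to stdout, which can then be redirected to profiling/pstats-cumulative.txt.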
Run tests:
pytest
The tests check:
- that the parser returns a tuple
- that the first returned value matches the requested field
- that invalid ticker names raise an exception
- that invalid field names raise an exception
The project includes profiling results generated with cProfile.
Generate profiling output sorted by internal time:
python3 -m cProfile -s tottime main.py MSFT "Total Revenue" > profiling/profiling-http.txt
Generate profiling output sorted by number of calls:
python3 -m cProfile -s ncalls main.py MSFT "Total Revenue" > profiling/profiling-ncalls.txt
Generate binary profiling data for pstats:
python3 -m cProfile -o profiling/stats.prof main.py MSFT "Total Revenue"
Generate the cumulative profiling report:
python3 scripts/pstats_top5.py > profiling/pstats-cumulative.txt
Convert profiling reports into .dat files:
python3 scripts/build_termgraph_data.py
This creates:
data/profiling_time.dat
data/profiling_calls.dat
Visualize profiling data:
sh scripts/visualize.sh profiling_time.dat
sh scripts/visualize.sh profiling_calls.dat
Profiling results show that the application is I/O-bound.
The dominant cost is network communication:
- SSL socket reading accounts for most of the total execution time
CPU-bound operations have a much smaller impact on performance:
- HTML parsing
- regex matching
- list and dictionary operations
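To check a claim like "SSL socket reading dominates" directly against profiling/stats.prof, one can sum the internal time of the matching functions and divide by the profile's total. The sketch below does this with pstats; `time_share` is a hypothetical helper, and it reads the Stats object's `stats` and `total_tt` attributes directly.

```python
import pstats


def time_share(prof_path: str, needle: str) -> float:
    """Fraction of total internal time spent in functions whose name contains `needle`."""
    stats = pstats.Stats(prof_path)
    matched = 0.0
    # stats.stats maps (filename, lineno, funcname) -> (cc, nc, tottime, cumtime, callers)
    for (_, _, funcname), (_, _, tottime, _, _) in stats.stats.items():
        if needle in funcname:
            matched += tottime
    return matched / stats.total_tt if stats.total_tt else 0.0
```

For this project, `time_share("profiling/stats.prof", "_SSLSocket")` would report which fraction of internal time went to methods of the SSL socket object.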
Conclusion:
Optimizing the Python code itself will not significantly improve performance, because most of the run time is spent waiting on the network.
The main optimization direction is to reduce the number of network requests or to run them concurrently.
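When several tickers need to be fetched, the I/O-bound profile suggests running the requests concurrently. The sketch below uses a thread pool from concurrent.futures; `fetch_all` and the injected `fetch` callable (for example, a function that downloads one page) are hypothetical. Because one CLI invocation handles one ticker, this only pays off in batch use, and a small `max_workers` is a reasonable precaution in case the server throttles parallel requests.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable


def fetch_all(tickers: Iterable[str], fetch: Callable[[str], str],
              max_workers: int = 4) -> Dict[str, str]:
    """Run fetch(ticker) for every ticker in a thread pool; return ticker -> result."""
    tickers = list(tickers)
    # Threads suit I/O-bound work: while one thread blocks on a socket read,
    # the GIL is released and the other requests make progress.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() preserves input order, so zip pairs each ticker with its result.
        return dict(zip(tickers, pool.map(fetch, tickers)))
```

With four workers, four requests that each wait about 0.2 s finish in roughly 0.2 s instead of 0.8 s when run sequentially.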
- Python scripting
- CLI application development
- virtual environment setup
- dependency management
- HTTP requests with urllib
- HTML parsing with BeautifulSoup
- error handling
- unit testing with pytest
- profiling with cProfile and pstats
- performance analysis
- terminal visualization with termgraph
- project structuring
This project relies on web scraping. The Yahoo Finance page structure may change over time, which can break parsing.
The project is intended for educational and portfolio purposes.

