The crawler target of this project is the Taiwan Stock Exchange's Market Observation Post System (台灣證交所公開資訊觀測站, TWSE). It mainly uses the Python packages below to scrape financial statements online:
- Selenium
- Requests
- pandas
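
To give a feel for how these packages divide the work, here is a minimal sketch (the MOPS entry URL is real, but the link text, file names, and flow are hypothetical placeholders, not this project's actual code):

```python
import requests
import pandas as pd
from selenium import webdriver
from selenium.webdriver.common.by import By

# Selenium drives a real browser through the MOPS query pages...
driver = webdriver.Chrome()
driver.get("https://mops.twse.com.tw/")  # MOPS entry page

# ...until a direct PDF link is exposed; Requests then downloads it.
pdf_url = driver.find_element(By.PARTIAL_LINK_TEXT, "財報").get_attribute("href")  # hypothetical link text
response = requests.get(pdf_url, timeout=30)
with open("statement.pdf", "wb") as f:
    f.write(response.content)

# pandas handles the csv bookkeeping for inputs and outputs.
pd.DataFrame([{"url": pdf_url, "status": response.status_code}]).to_csv("log.csv", index=False)
driver.quit()
```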
- Prepare a .csv file as the input to this project. For the format of the csv file, please refer to `測試檔.csv`.
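  If you want a quick look at the expected input before running the project, here is a minimal sketch using pandas (the printed columns are whatever `測試檔.csv` actually defines; nothing about the schema is assumed here):

  ```python
  import pandas as pd

  # Inspect the sample input to see the exact column layout expected.
  df = pd.read_csv("./input/測試檔.csv")
  print(df.columns.tolist())
  print(df.head())
  ```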
- Enter the information about the financial statements you want to search for.
  - Option (1): Set your input csv file path directly in `InputLoader` to initialize an input object, as below:

    ```python
    loader = InputLoader("./input/測試檔.csv")
    ```

  - Option (2), more flexible: use `tkinter` to select the .csv file you want:

    ```python
    import tkinter as tk
    from tkinter import filedialog

    # Hide the root window and open a file dialog starting in ./input;
    # the selected path initializes the loader.
    root = tk.Tk()
    root.withdraw()
    file_path = filedialog.askopenfilename(initialdir="./input")
    loader = InputLoader(file_path)
    ```
- Type the cmd command lines below to prepare the required environment.
  - Build a virtual environment:

    ```
    python -m venv virtual
    ```

  - Activate the virtual environment:

    ```
    virtual\Scripts\activate
    ```

  - Install the required packages for this project:

    ```
    pip install -r requirements.txt
    ```
- Type `python main.py` in the terminal to run the project.
- After scraping, each financial statement, such as `2330_108年第四季中文版合併財報.pdf`, will be stored in a directory belonging to its company. The directory name is set to the stock's ticker number on the Taiwan stock market, such as `2330`.
- Users can also get an `output.csv` in `./output/輸出_{Input file name}` to access more information about each individual scrape.
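
For orientation, a run over the sample input might leave a layout like this (an illustrative sketch only; where the ticker directories sit relative to `./output` is an assumption based on the description above):

```
.
├── 2330/
│   └── 2330_108年第四季中文版合併財報.pdf
└── output/
    └── 輸出_測試檔/
        └── output.csv
```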
For the test file, we imitate some real-world conditions and provide 4 real requests and 6 fake requests (exceptions we want to catch) in the `測試檔.csv` file.
The project takes only 1~2 minutes, depending on your device, to return the correct results!
- The bundled chromedriver.exe is for Chrome version 99. Make sure your Chrome has been upgraded to that version before using this project; otherwise, download the chromedriver.exe that matches your current Chrome version.
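  If you need to point Selenium at a specific driver binary, here is a minimal sketch (assuming the Selenium 4 `Service` API and that chromedriver.exe sits in the project root; adjust the path to wherever this repo actually keeps it):

  ```python
  from selenium import webdriver
  from selenium.webdriver.chrome.service import Service

  # Use the bundled driver explicitly instead of relying on PATH lookup.
  service = Service(executable_path="./chromedriver.exe")
  driver = webdriver.Chrome(service=service)
  ```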
- Please set a time interval of about ten seconds, i.e. `time.sleep(10)` in Python, between any two scrapes in order to avoid anti-scraping measures.
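  As a sketch of what that throttling looks like in a scrape loop (both `scrape_one` and the ticker list are hypothetical stand-ins, for illustration only):

  ```python
  import time

  def scrape_one(ticker):
      """Hypothetical stand-in for one scrape against the MOPS site."""
      print("scraping", ticker)

  tickers = ["2330", "2317"]  # hypothetical tickers from the input csv

  for ticker in tickers:
      scrape_one(ticker)  # one scrape...
      time.sleep(10)      # ...then pause ~10 seconds before the next request
  ```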