domingo1021/Financial-Statement-Scraping

Financial statement scraping

Scraping target

The scraping target of this project is the Taiwan Stock Exchange (TWSE) Market Observation Post System (台灣證交所公開資訊觀測站).

Packages used

This project mainly uses the Python packages below to help departments scrape financial statements online.

  • Python selenium
  • HTTP requests
  • Pandas

Steps to use this project (first stage)

  1. Prepare a .csv file as input to this project. For the expected CSV format, refer to 測試檔.csv (the test file).

  2. Provide the information about the financial statements you want to search for.

  • Option (1): Pass your input CSV file path directly to InputLoader to initialize an input object, as below:
InputLoader("./input/測試檔.csv")
  • Option (2), more flexible: use tkinter to select the .csv file you want.
import tkinter as tk
from tkinter import filedialog

root = tk.Tk()
root.withdraw()
file_path = filedialog.askopenfilename(initialdir = "./input")

loader = InputLoader(file_path)
  1. Run the commands below in cmd to set up the required environment.

    • Create a virtual environment: python -m venv virtual

    • Activate the virtual environment: virtual\Scripts\activate

    • Install the packages this project requires: pip install -r requirements.txt

  2. Run python main.py in the terminal to start the project.

  3. After scraping, each financial statement, such as 2330_108年第四季中文版合併財報.pdf (the Chinese-language consolidated financial report for Q4 of ROC year 108), is stored in a directory belonging to its company. The directory name is the company's ticker number on the Taiwan stock market, e.g. 2330.

  4. An output CSV is also written to ./output/輸出_{Input file name} (輸出 means "output"), which gives more information about each individual scrape.
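
The storage layout described above can be sketched in Python as follows. This is only an illustration, not the project's actual code: the helper names statement_path and output_csv_path and the base_dir and output_dir parameters are assumptions introduced here. Only the naming patterns come from the description above, and treating {Input file name} as the file name including its extension is also an assumption.

```python
from pathlib import Path


def statement_path(ticker: str, title: str, base_dir: str = ".") -> Path:
    """Hypothetical helper: <base_dir>/<ticker>/<ticker>_<title>.pdf."""
    return Path(base_dir) / ticker / f"{ticker}_{title}.pdf"


def output_csv_path(input_csv: str, output_dir: str = "./output") -> Path:
    """Hypothetical helper: <output_dir>/輸出_<input file name>."""
    # Assumption: the input file name is reused with its extension.
    return Path(output_dir) / f"輸出_{Path(input_csv).name}"


if __name__ == "__main__":
    print(statement_path("2330", "108年第四季中文版合併財報"))
    print(output_csv_path("./input/測試檔.csv"))
```

Under these assumptions, each PDF lands in a directory named after its ticker, and the output CSV name is derived from the input file name.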

Result

For the test file, we imitate real-world conditions: 測試檔.csv contains 4 real requests and 6 fake requests (exceptions that we want to catch).

Depending on your device, this project takes only 1 to 2 minutes to return the correct results!

Reminder

  1. The bundled chromedriver.exe is built for Chrome version 99. Make sure your Chrome is upgraded to that version before using this project; otherwise, download the chromedriver.exe that matches your current Chrome version.
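
A quick way to check for a version mismatch is to compare the major version numbers of the driver and the browser. The sketch below is not part of this project: major_version and versions_match are hypothetical helpers, and the version strings in the example are illustrative. It assumes you paste in the text printed by chromedriver.exe --version and the version shown on Chrome's "About Chrome" page.

```python
import re


def major_version(version_text: str) -> int:
    """Extract the major version from text like 'ChromeDriver 99.0.4844.51 (...)'."""
    match = re.search(r"(\d+)\.\d+\.\d+\.\d+", version_text)
    if match is None:
        raise ValueError(f"no version number found in {version_text!r}")
    return int(match.group(1))


def versions_match(driver_text: str, chrome_text: str) -> bool:
    """Return True when the driver and browser share the same major version."""
    return major_version(driver_text) == major_version(chrome_text)


if __name__ == "__main__":
    # Illustrative strings; replace them with the values from your machine.
    print(versions_match("ChromeDriver 99.0.4844.51", "Google Chrome 99.0.4844.84"))
```

If the function returns False, the driver most likely needs to be replaced with one that matches the installed Chrome.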

  2. Please set a time interval of about ten seconds, i.e. time.sleep(10) in Python, between any two consecutive scrapes to avoid triggering anti-scraping measures.
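
The interval described above could be applied roughly as follows. This is a minimal sketch, not the project's implementation: scrape_with_interval and its scrape_one callback are hypothetical names, and the sleep parameter exists only so that the delay can be replaced when testing.

```python
import time
from typing import Callable, Iterable, List


def scrape_with_interval(
    tickers: Iterable[str],
    scrape_one: Callable[[str], str],
    interval_seconds: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Call scrape_one for each ticker, pausing between consecutive calls."""
    results = []
    for index, ticker in enumerate(tickers):
        if index > 0:
            # Wait before every scrape except the first one.
            sleep(interval_seconds)
        results.append(scrape_one(ticker))
    return results
```

Pausing only between requests, rather than after the last one, avoids a needless ten-second wait at the end of a run.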

About

Scrape Taiwan financial statement with Python
