TaiwanNewsCrawler

Cooperating with NTHU CTM (科技管理研究所) to collect news from top 3 news company in Taiwan

自由時報
中國時報
聯合新聞網

Introduction

This is a news crawler for 3 Taiwanese mainstream media. The crawlered media is listed below.

Media Type	Meida Name (CN)	Media Name (EN)	ID	Abbreviation
Print Media	自由時報	Liberty News	0	ltn
Print Media	聯合報	UDN News	1	udn
Print Media	中國時報	China Times	2	chinatimes

Usage

Import Model

from model import chinatimes_crawler, ltn_crawler, udn_crawler

Three crawler in model.py for different news company website:

chinatimes_crawler
ltn_crawler
udn_crawler

Constructor

## Build udn news crawler 聯合報
crawler1 = udn_crawler()
## Build ltn news crawler 自由時報
crawler2 = ltn_crawler()
## Build chinatimes news crawler 中時
crawler3 = chinatimes_crawler()

Crawling

Use search(keywords, pages_num, CSV) to retrieve the dataframe conatining news info within page_num. If CSV is set to True (default=False), the searching result will be save automatically as CSV file in the folder called search_result.

## search keywords: covid
## search first 100 pages results
## save the result as udn_covid_100.csv in search_result folder.
crawler1.search('covid', 100, CSV=True)
crawler2.search('covid', 100, CSV=True)
crawler3.search('covid', 100, CSV=True)

## search keywords: covid
## search first 100 results
## Do not save the result as .csv file
crawler1.search('covid', 100, CSV=False)
crawler2.search('covid', 100, CSV=False)
crawler3.search('covid', 100, CSV=False)

Result Example

Please refer to Demo.ipynb

TITLE	TIME	CATEGORY	DESCRIPTION	CONTENT	KEYWORDS	FROM	LINK
北市房仲傳染給新北客戶Ct值19.4 急匡列40人足跡公布	2022-01-25 15:23:00	udn	國內疫情持續延燒，北市日前出現不明感染源，北市男房仲（案18375）曾帶一名客戶（案1851...	國內疫情持續延燒，北市日前出現不明感染源，北市男房仲（案18375）曾帶一名客戶（案1851...	新冠肺炎,COVID-19	聯合新聞網	https://udn.com/news/story/120940/6059544

Name		Name	Last commit message	Last commit date
Latest commit History 31 Commits
search_result		search_result
Demo.ipynb		Demo.ipynb
LICENSE		LICENSE
README.md		README.md
model.py		model.py
utils.py		utils.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

TaiwanNewsCrawler

Cooperating with NTHU CTM (科技管理研究所) to collect news from top 3 news company in Taiwan

Introduction

Usage

Result Example

About

Releases

Packages

Languages

License

JamesLiao714/TaiwanNewsCrawler

Folders and files

Latest commit

History

Repository files navigation

TaiwanNewsCrawler

Cooperating with NTHU CTM (科技管理研究所) to collect news from top 3 news company in Taiwan

Introduction

Usage

Result Example

About

Topics

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages