chiehwen/ptt-crawler

ptt-crawler

Crawls articles from the PTT website.

usage:

scraping a specific PTT board:

lsc crawler.ls <board-name>

All posts are downloaded into the data//post/ folder. A data//post-list.json file keeps track of your download history, so you can interrupt a download at any time and resume it later.
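The resume behaviour described above can be illustrated with a minimal sketch. It is written in Python rather than the repository's LiveScript, and it is not the crawler's actual code: the layout of post-list.json (a mapping from post ID to a done flag) and the names `load_history`, `save_history`, `crawl`, and `fetch` are all assumptions made for illustration.

```python
import json
import os
import tempfile


def load_history(path):
    """Return the saved history dict, or an empty one if no file exists yet."""
    if not os.path.exists(path):
        return {}
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def save_history(path, history):
    """Write the history via a temp file so an interrupt cannot truncate it."""
    directory = os.path.dirname(path) or "."
    os.makedirs(directory, exist_ok=True)
    fd, tmp = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        json.dump(history, f, ensure_ascii=False, indent=2)
    os.replace(tmp, path)  # atomic rename over the old history file


def crawl(post_ids, history_path, fetch):
    """Fetch every post not yet marked done, saving progress after each one.

    `fetch` is a hypothetical callable that downloads a single post.
    Returns the list of post IDs fetched during this run.
    """
    history = load_history(history_path)
    fetched = []
    for post_id in post_ids:
        if history.get(post_id):
            continue  # already downloaded in an earlier run
        fetch(post_id)
        history[post_id] = True
        save_history(history_path, history)
        fetched.append(post_id)
    return fetched
```

Because the history is saved after every post, a run stopped partway through loses at most the post that was in flight. The next run skips everything already recorded.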

categorize authors by title:

lsc cat.ls <board-name>

food.ls: example of fetching articles for article generation.

home-sale.ls: example of categorizing articles by purpose.

id-stat.ls: analyzes users' standpoints and writes the output to data//id-stat.json.

id-stat-show.ls: shows user statistics and generates suspect.json.
