PttCrawler

This repo is a fork of wy36101299/PTTcrawler.
I rewrote it as a module so that it can be extended for further use.

Usage

Basic usage of the script:

python3 ptt_crawler.py [start-page] [end-page] [boardName]

The default value for both start-page and end-page is the last page of the board. The default value for boardName is Gossiping. Note that the order of the arguments matters.
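
The positional defaults described above could be resolved roughly as in the sketch below. This is an illustration under stated assumptions, not the script's actual code: `resolve_args` is a hypothetical helper, and `None` is used here as a stand-in for "the last page of the board", which the caller would still have to look up.

```python
from typing import List, Optional, Tuple


def resolve_args(argv: List[str]) -> Tuple[Optional[int], Optional[int], str]:
    """Fill in defaults for [start-page] [end-page] [boardName].

    Hypothetical helper. A page value of None means "use the last page
    of the board"; resolving it to a real page number is left to the caller.
    """
    start = int(argv[0]) if len(argv) > 0 else None
    end = int(argv[1]) if len(argv) > 1 else None
    board = argv[2] if len(argv) > 2 else "Gossiping"
    return start, end, board
```

Because the arguments are positional, this sketch only picks up a board name when both page numbers are given before it, which is one way to read "the order of the arguments matters".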

After parsing the pages from PTT, it generates output.json. The format is as follows:
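
    "a_ID": article ID,
    "b_作者": author name,
    "c_標題": title,
    "d_日期": post time,
    "e_ip": poster's IP address,
    "f_內文": body text,
    "g_推文": {
        "推文編號": {
            "狀態": 推 (push), 噓 (boo), or → (neutral),
            "留言內容": comment text,
            "留言時間": comment time,
            "留言者": commenter
        }
    },
    "h_推文總數": {
        "all": total number of comments,
        "噓": number of 噓 (boo) comments,
        "推": number of 推 (push) comments,
        "none": number of → (neutral) comments
    },
    "i_連結": original link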

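Once a record in this format has been loaded into a Python dict, its fields can be read by key. The sketch below is only an illustration: `summarize` is a hypothetical helper, the sample record (including its comment keys and link) is made up, and the counts under `h_推文總數` are assumed to be integers.

```python
from typing import Any, Dict


def summarize(record: Dict[str, Any]) -> Dict[str, Any]:
    """Return title, author, net score (pushes minus boos) and comment count."""
    counts = record["h_推文總數"]  # assumed to hold integer counts
    return {
        "title": record["c_標題"],
        "author": record["b_作者"],
        "score": counts["推"] - counts["噓"],
        "comments": len(record["g_推文"]),
    }


# Made-up sample record that follows the format above.
sample = {
    "a_ID": 1,
    "b_作者": "alice",
    "c_標題": "hello",
    "d_日期": "Mon Jan  1 00:00:00 2018",
    "e_ip": "127.0.0.1",
    "f_內文": "body",
    "g_推文": {
        "1": {"狀態": "推", "留言內容": "nice", "留言時間": "01/01 00:01", "留言者": "bob"},
        "2": {"狀態": "噓", "留言內容": "meh", "留言時間": "01/01 00:02", "留言者": "carol"},
    },
    "h_推文總數": {"all": 2, "噓": 1, "推": 1, "none": 0},
    "i_連結": "https://example.invalid/post",
}

summary = summarize(sample)
```

For the sample above, one push and one boo cancel out, so the net score is 0 with 2 comments.
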
Prerequisites

Python3

pipenv install
