AioPTTCrawler (PTT web crawler)

This is a Python package that crawls article data from PTT using asyncio.

Documentation

PyPI page

pip install AioPTTCrawler

from AioPTTCrawler import AioPTTCrawler
ptt_crawler = AioPTTCrawler()

Usage

Get data from PTT

ptt_crawler = AioPTTCrawler()

# Latest articles: crawl the board's most recent index pages (page_count pages)
BOARD = "Gossiping"
ptt_data = ptt_crawler.get_board_latest_articles(board=BOARD, page_count=10)

ptt_crawler = AioPTTCrawler()

# Articles by index range: crawl the board's index pages from start_index to end_index
BOARD = "Gossiping"
ptt_data = ptt_crawler.get_board_articles(board=BOARD, start_index=100, end_index=200)
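The start_index and end_index values appear to correspond to the numbered index pages that PTT's web interface serves at URLs of the form https://www.ptt.cc/bbs/<board>/index<n>.html. The sketch below lists the pages such a range would cover, assuming the range is inclusive on both ends. BASE_URL, OVER18_COOKIE, index_page_urls and index_page_requests are hypothetical names for illustration, not part of AioPTTCrawler. Age-restricted boards such as Gossiping only serve content to clients that send the over18=1 cookie, so the requests carry it.

```python
from urllib.request import Request

BASE_URL = "https://www.ptt.cc/bbs"
# Age-restricted boards such as Gossiping require this cookie on every request.
OVER18_COOKIE = {"Cookie": "over18=1"}


def index_page_urls(board: str, start_index: int, end_index: int) -> list[str]:
    """Hypothetical helper: index page URLs from start_index to end_index (assumed inclusive)."""
    if start_index > end_index:
        raise ValueError("start_index must not exceed end_index")
    return [f"{BASE_URL}/{board}/index{n}.html" for n in range(start_index, end_index + 1)]


def index_page_requests(board: str, start_index: int, end_index: int) -> list[Request]:
    """Hypothetical helper: build (but do not send) one request per index page."""
    return [Request(url, headers=OVER18_COOKIE) for url in index_page_urls(board, start_index, end_index)]


urls = index_page_urls("Gossiping", 100, 102)
print(urls[0])  # https://www.ptt.cc/bbs/Gossiping/index100.html
print(len(urls))  # 3
```

Building Request objects does not touch the network. Sending them concurrently is what the package itself handles.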

from datetime import datetime

ptt_crawler = AioPTTCrawler()

# Articles by post time: crawl articles posted between two datetimes
BOARD = "Gossiping"
ptt_data = ptt_crawler.get_article_by_datetime(
    BOARD,
    datetime(year=2022, month=10, day=1),
    datetime(year=2022, month=10, day=2),
)
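PTT article IDs such as M.1663144920.A.A6E follow the BBS file-naming convention in which the number after "M." is the Unix timestamp of the post. If that convention holds, a post time can be read straight from an ID without downloading the article. The sketch below shows this, and it also checks an ID against a half-open window [start, end). It assumes naive datetimes mean Taiwan time (UTC+8). post_time_from_id and in_window are hypothetical helpers, not the library's method, and whether get_article_by_datetime includes its end bound is not documented here.

```python
from datetime import datetime, timedelta, timezone

# Assumption: PTT times are Taiwan local time (UTC+8, no daylight saving).
TAIPEI = timezone(timedelta(hours=8))


def post_time_from_id(article_id: str) -> datetime:
    """Hypothetical helper: read the Unix timestamp embedded in an ID like M.1663144920.A.A6E."""
    prefix, timestamp, *_ = article_id.split(".")
    if prefix != "M":
        raise ValueError(f"unexpected article id: {article_id}")
    return datetime.fromtimestamp(int(timestamp), tz=TAIPEI)


def in_window(article_id: str, start: datetime, end: datetime) -> bool:
    """True if the post time falls in [start, end); naive bounds are read as Taiwan time."""
    posted = post_time_from_id(article_id).replace(tzinfo=None)
    return start <= posted < end


print(post_time_from_id("M.1663144920.A.A6E"))  # 2022-09-14 16:42:00+08:00
print(in_window("M.1663144920.A.A6E", datetime(2022, 10, 1), datetime(2022, 10, 2)))  # False
```

The ID time and the displayed post time can differ by a few seconds. In the sample article below, the header shows 16:41:58 and the ID decodes to 16:42:00.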

ptt_data is a PTTData object. To extract its data, call one of its methods, such as get_article_dict(), get_article_dataframe(), or get_article_list().

Get dicts from PTTData

article_dict = ptt_data.get_article_dict()
comment_dict = ptt_data.get_comment_dict()
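Both methods return plain lists of dicts, with fields as listed in the formats below. The standard library is enough for simple analysis. This minimal sketch tallies the 推 / 噓 / → reactions per article by matching each comment's article_id against the article's "article" key. The sample records are hand-written in the documented shape, and reaction_counts is a hypothetical helper.

```python
from collections import Counter, defaultdict

# Hand-written sample records shaped like the documented formats (illustrative values).
articles = [{"article": "M.1663144920.A.A6E", "article_title": "[公告] 批踢踢27週年活動宣導公告更新"}]
comments = [
    {"article_id": "M.1663144920.A.A6E", "tag": "推", "comment_order": 1},
    {"article_id": "M.1663144920.A.A6E", "tag": "→", "comment_order": 2},
    {"article_id": "M.1663144920.A.A6E", "tag": "噓", "comment_order": 3},
    {"article_id": "M.1663144920.A.A6E", "tag": "推", "comment_order": 4},
]


def reaction_counts(comment_list: list[dict]) -> dict[str, Counter]:
    """Hypothetical helper: count comment tags per article ID."""
    counts: dict[str, Counter] = defaultdict(Counter)
    for comment in comment_list:
        # Strip in case a tag keeps padding from the page HTML.
        counts[comment["article_id"]][comment["tag"].strip()] += 1
    return counts


counts = reaction_counts(comments)
for article in articles:
    # Articles store their ID under "article"; comments use "article_id".
    tally = counts[article["article"]]
    print(article["article_title"], tally["推"], tally["噓"], tally["→"])  # [公告] 批踢踢27週年活動宣導公告更新 2 1 1
```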

Article dict format

[
    {
        "article" : "Article's ID. ex: M.1663144920.A.A6E",
        "article_title" : "Article's title. ex: [公告] 批踢踢27週年活動宣導公告更新",
        "user_id" : "Author's ID. ex: ubcs",
        "user_name" : "Author's name. ex: (覺★青年超冒險蓋)",
        "board" : "BBS board. ex: Gossiping",
        "datetime" : "Post time. ex: Wed Sep 14 16:41:58 2022",
        "context" : "Content of the article. ex: PTT 27 周年活動開始囉，本篇為置底宣導，詳情參閱下面資料...",
        "ip_address" : "IP address. ex: 59.120.192.119",
        "comment_list" : [
            {"comment_dict"},
            {"comment_dict"},
        ]
    }, {"..."}
]
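Judging from the example value, the article's "datetime" field is a string in the C asctime layout, not a datetime object. The following is a minimal sketch for converting it. It assumes the field always uses that layout with English day and month names, and parse_article_time is a hypothetical helper.

```python
from datetime import datetime

# asctime-style layout seen in PTT article headers, e.g. "Wed Sep 14 16:41:58 2022".
# %a and %b assume English (C locale) day and month names.
PTT_ARTICLE_TIME = "%a %b %d %H:%M:%S %Y"


def parse_article_time(text: str) -> datetime:
    """Hypothetical helper: turn an article's "datetime" string into a naive datetime."""
    # Whitespace in the format matches runs of spaces, so "Oct  2" style padding also parses.
    return datetime.strptime(text.strip(), PTT_ARTICLE_TIME)


print(parse_article_time("Wed Sep 14 16:41:58 2022"))  # 2022-09-14 16:41:58
```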

Comment dict format

[
    {
        "article_id" : "Article's ID. ex: M.1663144920.A.A6E",
        "tag" : "Comment's reaction. ex: 推 (push), 噓 (boo), → (neutral)",
        "user_id" : "Commenter's ID. ex: bill403777",
        "comment_order" : "Order of the comment. ex: 1",
        "context" : "Content of the comment. ex: 錢",
        "datetime" : "Post time. ex: 09/14 16:42",
        "ip_address" : "IP address. ex: 27.53.96.42",
    }, {"..."}
]
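A comment's "datetime" (ex: 09/14 16:42) carries no year, so comparing it with article times needs a year from somewhere. This minimal sketch borrows the year from the parent article's post time. It assumes a comment is never posted before its article, and rolls a date over into a later year when needed, for example a 01/01 comment on a 12/31 article. parse_comment_time is a hypothetical helper, and the article time would come from the article's own "datetime" field.

```python
from datetime import datetime

COMMENT_FORMAT = "%Y/%m/%d %H:%M"


def parse_comment_time(text: str, article_time: datetime) -> datetime:
    """Hypothetical helper: place a year-less "MM/DD HH:MM" comment time on the calendar.

    Assumption: a comment is never posted before its article, so the earliest
    year that puts the comment at or after the article's post minute wins.
    """
    floor = article_time.replace(second=0, microsecond=0)
    stamp = text.strip()
    # Try the article's year first, then later years (covers year rollover and 02/29).
    for year in range(article_time.year, article_time.year + 5):
        try:
            candidate = datetime.strptime(f"{year}/{stamp}", COMMENT_FORMAT)
        except ValueError:
            continue  # e.g. 02/29 in a non-leap year
        if candidate >= floor:
            return candidate
    raise ValueError(f"cannot place comment time {text!r} after {article_time}")


print(parse_comment_time("09/14 16:42", datetime(2022, 9, 14, 16, 41, 58)))  # 2022-09-14 16:42:00
```

Prepending the year before calling strptime means the parser never has to guess a year for a date like 02/29.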

The example values above are taken from this article.

Comparison

Time used by the normal (synchronous) method versus the async method (unit: seconds)
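Most of a crawler's time is spent waiting for network responses. The async method gains by overlapping those waits: while one request waits for PTT, others proceed. The sketch below illustrates the idea with asyncio.sleep standing in for HTTP requests. It measures only simulated latency and does not reproduce the comparison figures. fake_fetch, the 0.05 s delay and the concurrency limit of 5 are arbitrary assumptions, not AioPTTCrawler's internals.

```python
import asyncio
import time

DELAY = 0.05  # assumed per-request latency in seconds (simulated)


async def fake_fetch(page: int) -> int:
    """Stand-in for one HTTP request: waits, then returns the page number."""
    await asyncio.sleep(DELAY)
    return page


async def crawl_sequential(pages: list[int]) -> list[int]:
    # Each request starts only after the previous one has finished.
    return [await fake_fetch(p) for p in pages]


async def crawl_concurrent(pages: list[int], limit: int = 5) -> list[int]:
    # The semaphore caps how many requests are in flight at once.
    semaphore = asyncio.Semaphore(limit)

    async def bounded(page: int) -> int:
        async with semaphore:
            return await fake_fetch(page)

    # gather runs the coroutines concurrently and keeps results in input order.
    return list(await asyncio.gather(*(bounded(p) for p in pages)))


def timed(crawl, pages: list[int]) -> tuple[list[int], float]:
    start = time.perf_counter()
    result = asyncio.run(crawl(pages))
    return result, time.perf_counter() - start


pages = list(range(10))
seq_result, seq_time = timed(crawl_sequential, pages)
con_result, con_time = timed(crawl_concurrent, pages)
print(f"sequential: {seq_time:.2f}s, concurrent: {con_time:.2f}s")
```

With ten pages, the sequential version waits roughly ten delays back to back. The concurrent one waits about two, because at most five requests are in flight at a time. A limit like this also keeps a crawler from flooding the site.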

Support

You can report bugs, ask for help, and discuss other issues on the project's GitHub Issues page.

Repository: DOUIF/aio-ptt-crawler

Folders and files

Latest commit

History

Repository files navigation

AioPTTCrawler (PTT 網路版爬蟲)

Documentation

Usage

get data from PTT

ptt_data is a PTTData object. To extract data you need to use get_article_dict(), get_article_dataframe(), get_article_list() etc

get dict from PTTData

use this article for example

Comparison

Used time difference between normal method and async method

(unit: second)

Support

About

Resources

License

Stars

Watchers

Forks

Languages