Fix crawling of Toutiao (headline) article URLs #536

Merged: 1 commit, merged on Aug 27, 2023

songzy12 (Collaborator) commented Aug 27, 2023

Fix #518.

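There are two main changes:

  1. By searching for "头条文章" (headline article), we found that the text can take two forms: "发布了头条文章" (published a headline article) or "我发表了头条文章" (I published a headline article).
  2. By inspecting the corresponding links, we found that headline article URLs now take the form https://weibo.com/ttarticle/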
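As a rough illustration of the two changes above, here is a minimal Python sketch. It is not the code merged in this PR: the names `TTARTICLE_MARKERS`, `is_ttarticle_text`, and `extract_ttarticle_url` are hypothetical, and the regex assumes the `https://weibo.com/ttarticle/` link appears verbatim in the page HTML.

```python
import re
from typing import Optional

# Hypothetical constant: the two phrasings observed in this PR.
# Both are needed, since "我发表了头条文章" does not contain "发布了头条文章".
TTARTICLE_MARKERS = ("发布了头条文章", "我发表了头条文章")

# Assumed pattern for the new headline-article URL form; stops at whitespace,
# quotes, or angle brackets so it can be run over raw HTML.
TTARTICLE_URL_RE = re.compile(r"https?://weibo\.com/ttarticle/[^\s\"'<>]+")


def is_ttarticle_text(text: str) -> bool:
    """Return True if the post text contains either known headline-article phrase."""
    return any(marker in text for marker in TTARTICLE_MARKERS)


def extract_ttarticle_url(html: str) -> Optional[str]:
    """Return the first weibo.com/ttarticle/ URL found in the HTML, or None."""
    match = TTARTICLE_URL_RE.search(html)
    return match.group(0) if match else None


if __name__ == "__main__":
    html = '<a href="https://weibo.com/ttarticle/p/show?id=2309404939358200529130">'
    print(is_ttarticle_text("我发表了头条文章:《小城人物志(2)》"))
    print(extract_ttarticle_url(html))
```

Matching on a fixed tuple of phrases keeps the check explicit; if Weibo introduces another wording, it only needs to be appended to the tuple.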

Tested with the following user IDs:

  1. 6045161833
    NgvSY9mW7,我发表了头条文章:《《小城人物志(2)》》 《小城人物志(2)》  ,https://weibo.com/ttarticle/p/show?id=2309404939358200529130 ,无,无,True,无,无,2023-08-27 14:36,李小李的iPhone 11,0,0,0

  2. 1955190431
    Ng2CODhDi,发布了头条文章:《關於不實報道的重要申明澄清》 關於不實報道的重要申明澄清  ,https://weibo.com/ttarticle/p/show?id=2309404938233426608435 ,无,无,True,无,无,2023-08-24 12:07,微博 weibo.com,42,15,10

dataabc merged commit 4b9d66a into dataabc:master on Aug 27, 2023 (1 check passed)

dataabc (Owner) commented Aug 27, 2023

Merged. Headline articles have indeed changed a lot. Nice fix, thanks.

songzy12 deleted the ttarticle branch on August 28, 2023, 05:01
Successfully merging this pull request may close this issue:

无法抓取到头条文章url地址,在网页源码中存在该地址 (Unable to crawl the headline article URL, even though the URL is present in the page source)