Some questions #5

Open
DoctorDream opened this issue Dec 22, 2020 · 7 comments

@DoctorDream

Thank you so much for this repository! I have spent nearly two weeks researching how to crawl tweets together with their replies, but none of the repositories I tried, such as TWINT, worked.
Do you know TWINT? I'm a developer from China. Even when I use a proxy, TWINT keeps reporting errors:

WARNING:root:Error retrieving https://twitter.com/: ReadTimeout(ReadTimeoutError("HTTPSConnectionPool(host='twitter.com', port=443): Read timed out. (read timeout=10)",),), retrying

I saw you said that Twitter has blocked all crawlers during this period. Is TWINT failing for that reason, or is it because I am in China and set up the proxy incorrectly?
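
One way to separate the two possibilities raised above is to test the proxy on its own, outside TWINT. The following is a minimal sketch using only the Python standard library; `check_proxy`, the proxy address, and the 30-second timeout are illustrative assumptions, not part of TWINT or Scweet.

```python
import urllib.error
import urllib.request


def check_proxy(proxy_url, target="https://twitter.com/", timeout=30):
    """Try to fetch `target` through `proxy_url` and describe what happened.

    Hypothetical helper: it only tells you whether the proxy path works,
    not whether a particular scraper is blocked.
    """
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    opener = urllib.request.build_opener(handler)
    try:
        with opener.open(target, timeout=timeout) as resp:
            return f"reachable (HTTP {resp.status})"
    except urllib.error.HTTPError as exc:
        # The proxy works, but the server refused or redirected the request.
        return f"reachable, but server answered HTTP {exc.code}"
    except OSError as exc:
        # URLError and socket timeouts are both subclasses of OSError.
        return f"unreachable: {exc}"


if __name__ == "__main__":
    # Assumed local proxy address; replace with your own.
    print(check_proxy("http://127.0.0.1:1080"))
```

If this reports "unreachable" even with a generous timeout, the proxy setup is the more likely culprit; if it reports "reachable" while the scraper still times out, the problem probably lies elsewhere.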

@Altimis
Owner

Altimis commented Dec 22, 2020

Hi @DoctorDream, thank you for your feedback. In fact, I'm not sure that TWINT works these days; at least it didn't work for me, and that's why I worked on Scweet. What I am sure about is that all API-based scrapers don't work, because Twitter changed its API to version 2. Did Scweet meet your requirements? What should I add to improve it?

@DoctorDream
Author

@Altimis
Thank you very much for your reply. Your program basically met my needs, but I ran into a few problems while using it.
I use a Twitter crawler to collect conversations for academic research, but Twitter's timeline-based structure has caused me some difficulties.
When I crawl tweets, two consecutive tweets may be replies to different tweets, which makes it impossible for me to assemble them into a dialogue.
Do you have a way to crawl tweets starting from the main tweet, just like browsing on the web?
Thank you very much for your enthusiasm!
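
The difficulty described above can be illustrated with a minimal sketch. It assumes each scraped tweet carries its own `id` and a `reply_to` field holding the id of the tweet it answers; those field names and the `build_dialogues` helper are hypothetical, and the thread does not say that Scweet's output includes them.

```python
from collections import defaultdict


def build_dialogues(tweets):
    """Turn a flat, timeline-ordered list of tweets into dialogues.

    Each tweet is a dict with the assumed keys 'id', 'reply_to'
    (None for a main tweet) and 'text'. Returns one list of texts
    per root-to-leaf reply chain.
    """
    by_id = {t["id"]: t for t in tweets}
    children = defaultdict(list)
    roots = []
    for t in tweets:
        parent = t["reply_to"]
        if parent is None or parent not in by_id:
            # Main tweets, and replies whose parent was not scraped,
            # each start their own dialogue.
            roots.append(t["id"])
        else:
            children[parent].append(t["id"])

    dialogues = []

    def walk(tweet_id, path):
        path = path + [by_id[tweet_id]["text"]]
        replies = children.get(tweet_id, [])
        if not replies:
            dialogues.append(path)
        for reply_id in replies:
            walk(reply_id, path)

    for root_id in roots:
        walk(root_id, [])
    return dialogues


# Consecutive timeline entries 3 and 4 answer different tweets.
timeline = [
    {"id": 1, "reply_to": None, "text": "A"},
    {"id": 2, "reply_to": None, "text": "B"},
    {"id": 3, "reply_to": 1, "text": "reply to A"},
    {"id": 4, "reply_to": 2, "text": "reply to B"},
    {"id": 5, "reply_to": 3, "text": "reply to the reply to A"},
]
dialogues = build_dialogues(timeline)
```

Without some parent reference like `reply_to`, the grouping cannot be recovered from timeline order alone, which is why scraping replies from the main tweet's own page is the question here.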

@Altimis
Owner

Altimis commented Dec 23, 2020

@DoctorDream If I understood correctly, you want to scrape the replies to every tweet, is that right? For example, for this tweet:
[image]
You want to click on the comments and gather all the replies (1.7k replies). If so, it may be a real challenge for Scweet. First, you may be required to sign in to view the replies to a given tweet. Second, the process may take too long, since the script needs to open the replies (click) and scroll to scrape all of them.

@DoctorDream
Author

@Altimis
Yes, that's what I mean.
For a given tweet, I don't have to collect all the replies. I just need the highly liked ones, because those replies tend to attract more attention.
I expect to spend weeks collecting data, so the time it takes won't have a big impact on me.
So, would it be feasible for you to implement this feature?
Thank you very much!
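
The selection described above, keeping only the most-liked replies, might look like the following sketch. The `likes` field, the `top_replies` helper, and the default cut-offs are assumptions made for illustration.

```python
def top_replies(replies, k=3, min_likes=0):
    """Return up to `k` replies with at least `min_likes` likes, most liked first.

    Each reply is a dict with the assumed keys 'text' and 'likes'.
    """
    kept = [r for r in replies if r["likes"] >= min_likes]
    return sorted(kept, key=lambda r: r["likes"], reverse=True)[:k]


replies = [
    {"text": "first", "likes": 2},
    {"text": "second", "likes": 150},
    {"text": "third", "likes": 40},
    {"text": "fourth", "likes": 0},
]
best = top_replies(replies, k=2, min_likes=1)
```

Because only a handful of replies per tweet are kept, a scraper would not need to scroll through the whole reply list if the page already shows popular replies first, though that ordering is not guaranteed.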

@Altimis
Owner

Altimis commented Dec 23, 2020

@DoctorDream I think it is possible. I'll work on that.

@Altimis
Owner

Altimis commented Dec 23, 2020

@DoctorDream I have a question for you. Will you already have the tweet_id of each tweet whose replies you want to scrape, or do you want to crawl all tweets and get their replies?

@DoctorDream
Author

@Altimis
Thank you very much!
Actually, I just need to crawl tweets together with their replies to form dialogues, so I don't need to crawl tweets with a specific tweet_id.
