
HTTP Error, Gives 404 but the URL is working #98

Open
sagefuentes opened this issue Sep 18, 2020 · 144 comments

@sagefuentes

Hi, I had a script running over the past weeks and earlier today it stopped working. I keep receiving HTTPError 404, but the provided link in the errors still brings me to a valid page.
The code is as follows (all mentioned variables are set, and debugging shows the error happens specifically in the TweetManager):

    tweetCriteria = got.manager.TweetCriteria().setQuerySearch(term)\
        .setMaxTweets(max_count)\
        .setSince(begin_timeframe)\
        .setUntil(end_timeframe)
    scraped_tweets = got.manager.TweetManager.getTweets(tweetCriteria)

The error message for this is the standard 404 error
"An error occured during an HTTP request: HTTP Error 404: Not Found
Try to open in browser:" followed by the valid link

As I have changed nothing in the folder, I am wondering if something has happened with my configuration more than anything else, but I'm also wondering whether others are experiencing this.

@alberto-valdes

alberto-valdes commented Sep 18, 2020

Hello @sagefuentes, I'm dealing with the exact same issue. I have also been downloading tweets for the past weeks, and it suddenly stopped working, giving me a 404 error with a valid link.

I've no idea what might be the cause...

@caiyishu

Same for me. I also suddenly encountered this problem today, but everything worked fine yesterday.

@taoyudong

I am dealing with the same issue here. This is something new today and is caused by some change or bug on the Twitter server side. If you run the command with debug=True, you can see that the URL used to get tweets is no longer available. Seeking a solution now.
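(For reference, a minimal sketch of turning on that debug output; this assumes the GetOldTweets3 0.0.11 API, and the query is only an example:)

    import GetOldTweets3 as got

    tweetCriteria = got.manager.TweetCriteria().setQuerySearch('example').setMaxTweets(1)
    # debug=True prints the request URL and headers before each HTTP call
    tweets = got.manager.TweetManager.getTweets(tweetCriteria, debug=True)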

@mwaters166

Also started having the same issue today.

@MithilaGuha

MithilaGuha commented Sep 18, 2020

I'm having the same issue as well! Does anyone have a solution for it?

@baraths92

Yes, I am having the same issue. I guess everyone is having it.

@alastairrushworth

I'm not sure if it is related to this issue, but some of the user_agents seem to be out of date

    user_agents = [
        'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:63.0) Gecko/20100101 Firefox/63.0',
        'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:62.0) Gecko/20100101 Firefox/62.0',
        'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:61.0) Gecko/20100101 Firefox/61.0',
        'Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:63.0) Gecko/20100101 Firefox/63.0',
        'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.77 Safari/537.36',
        'Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.77 Safari/537.36',
        'Mozilla/5.0 (Windows NT 6.1; Trident/7.0; rv:11.0) like Gecko',
        'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_6) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/12.0 Safari/605.1.15',
    ]
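(For context, a sketch of how a list like this is presumably used, picking one user agent at random per request:)

    import random

    # vary the request fingerprint by choosing a random User-Agent each time
    headers = {'User-Agent': random.choice(user_agents)}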

@stevedwards

same

@Sebastokratos42

Seems to be a "bigger" problem? Other scrapers are having problems too.
twintproject/twint#915 (comment)

@Daviey

Daviey commented Sep 18, 2020

Here is the output with debug enabled. It shows the actual URL being called, and it seems that Twitter has removed the /i/search/timeline endpoint. :(

https://twitter.com/i/search/timeline?vertical=news&q=from%3AREDACTED&src=typd&&include_available_features=1&include_entities=1&max_position=&reset_error_state=false
Host: twitter.com
User-Agent: Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.77 Safari/537.36
Accept: application/json, text/javascript, */*; q=0.01
Accept-Language: en-US,en;q=0.5
X-Requested-With: XMLHttpRequest
Referer: https://twitter.com/i/search/timeline?vertical=news&q=from%3AREDACTED&src=typd&&include_available_features=1&include_entities=1&max_position=&reset_error_state=false
Connection: keep-alive
An error occured during an HTTP request: HTTP Error 404: Not Found
Try to open in browser: https://twitter.com/search?q=%20from%3AREDACTED&src=typd

@danielo93

Same problem, damn

@inactivist

inactivist commented Sep 18, 2020

I'm not sure if it is related to this issue, but some of the user_agents seem to be out of date

I forked and created a branch to allow a user-specified UA; using samples from my current browser doesn't fix the problem.

I notice the search and referrer URL shown in --debug output (https://twitter.com/i/search/timeline) returns a 404 error:

$ GetOldTweets3 --username twitter --debug 
/home/inactivist/.local/bin/GetOldTweets3 --username twitter --debug
GetOldTweets3 0.0.11
Downloading tweets...
https://twitter.com/i/search/timeline?f=tweets&vertical=news&q=from%3Atwitter&src=typd&&include_available_features=1&include_entities=1&max_position=&reset_error_state=false
Host: twitter.com
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:63.0) Gecko/20100101 Firefox/63.0
Accept: application/json, text/javascript, */*; q=0.01
Accept-Language: en-US,en;q=0.5
X-Requested-With: XMLHttpRequest
Referer: https://twitter.com/i/search/timeline?f=tweets&vertical=news&q=from%3Atwitter&src=typd&&include_available_features=1&include_entities=1&max_position=&reset_error_state=false
Connection: keep-alive
An error occured during an HTTP request: HTTP Error 404: Not Found
Try to open in browser: https://twitter.com/search?q=%20from%3Atwitter&src=typd
$ curl -I https://twitter.com/i/search/timeline
HTTP/2 404 
[snip]

EDIT: The URL used for the internal search and the one shown in the exception message aren't the same...

@baraths92

I tried replacing https://twitter.com/i/search/timeline with https://twitter.com/search?. The 404 error is gone, but now there is a 400 Bad Request error.

@herdemo

herdemo commented Sep 18, 2020

Unfortunately I have the same problem. I hope we find a solution as soon as possible.

@inactivist

inactivist commented Sep 18, 2020

I tried replacing https://twitter.com/i/search/timeline with https://twitter.com/search?. The 404 error is gone, but now there is a 400 Bad Request error.

Switching to mobile.twitter.com/search and using a modern User-Agent header seems to get us past the 400 Bad Request error, but then we get Error parsing JSON...
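(A rough sketch of that probe using plain requests; the endpoint, query, and UA string here are illustrative guesses, not a fix:)

    import requests

    headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) '
                             'AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.111 Safari/537.36'}
    r = requests.get('https://mobile.twitter.com/search',
                     params={'q': 'from:twitter'}, headers=headers)
    print(r.status_code)  # no longer 400 with the modern UA
    try:
        r.json()
    except ValueError:
        print('Error parsing JSON: the response is HTML, not the JSON the scraper expects')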

@rennanharo

Same thing for me. I get an error 404 but the URL is working.

@shelu16

shelu16 commented Sep 19, 2020

I have the same issue.

@maldil

maldil commented Sep 19, 2020

I am experiencing the same issue. Any plans to fix it?

@alifzl

alifzl commented Sep 19, 2020

same issue, somebody help.

@chinmuxmaximus

Same issue. The same code was working a day ago; now it's giving a 404 error with a valid link.

@GabrielEspeschit

Maybe it has something to do with this: https://blog.twitter.com/developer/en_us/topics/tips/2020/understanding-the-new-tweet-payload.html

@rsafa

rsafa commented Sep 19, 2020

I am having the same issue. It was more robust than Tweepy. I hope we find a solution as soon as possible.

@herdemo

herdemo commented Sep 19, 2020

Maybe it has something to do with this: https://blog.twitter.com/developer/en_us/topics/tips/2020/understanding-the-new-tweet-payload.html

Unfortunately, the Twitter API does not fully meet our needs, because we need full-history search without any limitations. You can search only 5000 tweets a month with the Twitter API.
I hope GetOldTweets starts working again as soon as possible, otherwise I cannot complete my master's thesis.

@Fiyinfoluwa6

I have the same issue. Need some help here.

@GabrielEspeschit

GabrielEspeschit commented Sep 20, 2020

Maybe it has something to do with this: https://blog.twitter.com/developer/en_us/topics/tips/2020/understanding-the-new-tweet-payload.html

Unfortunately, the Twitter API does not fully meet our needs, because we need full-history search without any limitations. You can search only 5000 tweets a month with the Twitter API.
I hope GetOldTweets starts working again as soon as possible, otherwise I cannot complete my master's thesis.

I see! I'm fairly new to scraping, but I'm working on an end-of-course thesis about sentiment analysis and could really use some newer tweets to help me out.

I've been tinkering with GOT3's code a bit and got it to read the HTML of the search timeline, but it's mostly unformatted. Like I said, I have little experience with scraping, so I'm really struggling to parse it correctly. However, I will note my changes, for reference and for someone with more experience to pick up if they so wish:

  • updated user_agents (with the ones used by TWINT);

  • updated endpoint (/search?)

  • some updates to the URL structure:

        url = "https://twitter.com/search?"
        url += ("q=%%20%s&src=typd%s"
                "&include_available_features=1&include_entities=1&max_position=%s"
                "&reset_error_state=false")

        if not tweetCriteria.topTweets:
            url += "&f=live"

Edit: Forgot to say this. Sometimes the application gives me a 400: Bad Request; I run it again, and it outputs the HTML as described above.

@burakoglakci

burakoglakci commented Nov 14, 2020

Thank you so much @sufyanhamid, I'm happy it helped.
As far as I know, the bounding-box query cannot be run on snscrape, as it can in the Twitter Stream API. You can use the geocode query instead, as in the Twitter REST API.
Ex.
Ex.

import snscrape.modules.twitter as sntwitter

maxTweets = 3000
for i, tweet in enumerate(sntwitter.TwitterSearchScraper('geocode:40.682299,-73.944852,5mi + since:2020-10-31 until:2020-11-03 -filter:links -filter:replies').get_items()):
    if i > maxTweets:
        break

With this query, you can collect tweets within 5 miles of the point coordinate you specify. As far as I know, you can go up to 15 miles.

@sufyanhamid

@sbif

Hi guys!
I'm totally lost: how can I use snscrape to extract tweets from a user in a specific time window?
I'm a beginner with Python, and I have to do this for my thesis. I've been trying to extract this data for three weeks without success; I tried tweepy and then GetOldTweets3, and I've just discovered this new Twitter API limit...
Can somebody help me, please?

Use this query with snscrape:

import snscrape.modules.twitter as sntwitter
import csv

maxTweets = 3000

csvFile = open('place_result.csv', 'a', newline='', encoding='utf8')

csvWriter = csv.writer(csvFile)
csvWriter.writerow(['id','date','tweet'])

for i, tweet in enumerate(sntwitter.TwitterSearchScraper('from:@billgates + since:2015-12-02 until:2020-11-05 -filter:links -filter:replies').get_items()):
    if i > maxTweets:
        break
    csvWriter.writerow([tweet.id, tweet.date, tweet.content])
csvFile.close()

@burakoglakci Thanks for the reply. One more thing: what is the query to fetch the number of comments, retweets, and likes? And where can I learn how to write queries using sntwitter? Kindly share this as well.

@iuserea

iuserea commented Nov 15, 2020

@burakoglakci Thank you for sharing your code! But when I run it, the error below happens. My computer is in China, and I can reach Twitter only by using a VPN. Could you help me figure it out?

Error retrieving https://twitter.com/search?f=live&lang=en&q=deprem+%2B+place%3A5e02a0f0d91c76d2+%2B+since%3A2020-10-31+until%3A2020-11-03+-filter%3Alinks+-filter%3Areplies&src=spelling_expansion_revert_click: ConnectTimeout(MaxRetryError("HTTPSConnectionPool(host='twitter.com', port=443): Max retries exceeded with url: /search?f=live&lang=en&q=deprem+%2B+place%3A5e02a0f0d91c76d2+%2B+since%3A2020-10-31+until%3A2020-11-03+-filter%3Alinks+-filter%3Areplies&src=spelling_expansion_revert_click (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x0000020FFB1C8D30>, 'Connection to twitter.com timed out. (connect timeout=10)'))")), retrying

@Woolwit

Woolwit commented Nov 15, 2020

Anyone have a tip for getting all the tweets in an individual's timeline? I have managed to get user tweets (thank you @burakoglakci for your example) but would like to get the tweets the user retweets as well (tweet.retweetedTweet didn't get it). And for any other noobish coders out there, just in case this helps:

import snscrape.modules.twitter as sntwitter

maxTweets = 10

for i,tweet in enumerate(sntwitter.TwitterSearchScraper('from:@TwitterSupport').get_items()) :
        if i > maxTweets :
            break  
        print(f"the date is {tweet.date}")  
        print(f"the user name is {tweet.user.username}")
        print(f"the tweet content is {tweet.content}")
        print(f"the tweet rendered content is {tweet.renderedContent}")
        print(f"the outlinks are {tweet.outlinks}")  
        print(f"the tco outlinks are {tweet.tcooutlinks}") 
        print(f"the url is {tweet.url}")
        print(f"the retweeted tweet is  {tweet.retweetedTweet}")        
        print(f"the quoted tweet is  {tweet.quotedTweet}") 

@lorenzopetra96

@TamiresMonteiroCD @WelXingz @ahsanspark @Atoxal @SophieChowZZY
I think I solved the problem. I made a few changes to the code. I collect tweets using a word and location filter. I'm using Python 3.8.6 on Windows 10 and it works fine right now.

import snscrape.modules.twitter as sntwitter
import csv
maxTweets = 3000

#keyword = 'deprem'
#place = '5e02a0f0d91c76d2' #This geo_place string corresponds to İstanbul, Turkey on twitter.

#keyword = 'covid'
#place = '01fbe706f872cb32' This geo_place string corresponds to Washington DC on twitter.

#Open/create a file to append data to
csvFile = open('place_result.csv', 'a', newline='', encoding='utf8')

#Use csv writer
csvWriter = csv.writer(csvFile)
csvWriter.writerow(['id','date','tweet',]) 

for i,tweet in enumerate(sntwitter.TwitterSearchScraper('deprem + place:5e02a0f0d91c76d2 + since:2020-10-31 until:2020-11-03 -filter:links -filter:replies').get_items()):
        if i > maxTweets :
            break  
        csvWriter.writerow([tweet.id, tweet.date, tweet.content])
csvFile.close()

@burakoglakci thanks for sharing your experience and work with us!! It's really appreciated and it helps me a lot.
I want to ask: what would the query string be (using snscrape) if we want to get tweets by longitude and latitude? And how can we find the geo-location ID of any city/country on Twitter?
Thanks in advance :)

Hi!
First of all, I am so grateful for all the support, thank you.
I have a problem with place IDs. I need the Arizona and Florida place IDs but I cannot find them. Can anyone tell me how I can find these (and other place IDs), please?
Thanks in advance <3

@J-t-p

J-t-p commented Nov 16, 2020

I found another simple alternative in case people are having trouble with snscrape. It involves the requests and bs4 (Beautiful Soup) libraries:

import requests
from bs4 import BeautifulSoup

contents = requests.get("https://mobile.twitter.com/username")
soup = BeautifulSoup(contents.text, "html.parser")

tweets = soup.find_all("tr", {"class": "tweet-container"})
latest = tweets[0]
print(latest.text)

This will give you a list with the html for, if my counting is correct, the last 20 tweets from that account. Obviously, this will not be very useful if you need more than that, but if you don't, then this should work until GOT3 is fixed.

A few things to note: 1. You have to use the mobile link; it does not work with the normal link. (This code can still be run on a desktop computer even with the mobile link.) 2. You can use .text to print/store the tweet in a variable without all the HTML code.

As you can see, this code is very bare bones, so feel free to play around with it and add anything I missed or that you think would be useful.
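(For example, a small extension of the snippet above to print every scraped tweet, not just the latest:)

    # iterate over all ~20 rows found above instead of only tweets[0]
    for t in tweets:
        print(t.text.strip())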

@sufyanhamid

Anyone have a tip for getting all the tweets in an individual's timeline? I have managed to get user tweets (thank you @burakoglakci for your example) but would like to get the tweets the user retweets as well (tweet.retweetedTweet didn't get it). And for any other noobish coders out there, just in case this helps:

[quoting @Woolwit's snscrape example above]

@Woolwit Thanks for sharing more attributes of a tweet. Kindly also share the code/query for getting the number of likes, retweets, and comments.
Thanks in advance.

@burakoglakci

@TamiresMonteiroCD @WelXingz @ahsanspark @Atoxal @SophieChowZZY I think I solved the problem. I made a few changes to the code. I collect tweets using a word and location filter. I'm using Python 3.8.6 on Windows 10 and it works fine right now.

import snscrape.modules.twitter as sntwitter
import csv
maxTweets = 3000

#keyword = 'deprem'
#place = '5e02a0f0d91c76d2' #This geo_place string corresponds to İstanbul, Turkey on twitter.

#keyword = 'covid'
#place = '01fbe706f872cb32' This geo_place string corresponds to Washington DC on twitter.

#Open/create a file to append data to
csvFile = open('place_result.csv', 'a', newline='', encoding='utf8')

#Use csv writer
csvWriter = csv.writer(csvFile)
csvWriter.writerow(['id','date','tweet',]) 

for i,tweet in enumerate(sntwitter.TwitterSearchScraper('deprem + place:5e02a0f0d91c76d2 + since:2020-10-31 until:2020-11-03 -filter:links -filter:replies').get_items()):
        if i > maxTweets :
            break  
        csvWriter.writerow([tweet.id, tweet.date, tweet.content])
csvFile.close()

@burakoglakci thanks for sharing your experience and work with us!! It's really appreciated and it helps me a lot.
I want to ask: what would the query string be (using snscrape) if we want to get tweets by longitude and latitude? And how can we find the geo-location ID of any city/country on Twitter?
Thanks in advance :)

Hi!
First of all, I am so grateful for all the support, thank you.
I have a problem with place IDs. I need the Arizona and Florida place IDs but I cannot find them. Can anyone tell me how I can find these (and other place IDs), please?
Thanks in advance <3

Arizona USA id: a612c69b44b2e5da

Florida USA id: 4ec01c9dbc693497

To find these IDs, you have to run a geocode query on Twitter, e.g. geocode:34.684879,-111.699645,1mi. These coordinates let you search for a point location in Arizona; you can use any map service to get coordinates. Then click on the content of a tweet that appears as a result of this query. You should see Arizona, USA as the place name on that tweet; if not, check another tweet. After clicking on the place name, you will see the place ID in the link in the search bar.
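(A minimal sketch of plugging one of those place IDs into an snscrape query; the date range and tweet cap are examples:)

    import snscrape.modules.twitter as sntwitter

    # the Arizona place ID found via the geocode trick above
    for i, tweet in enumerate(sntwitter.TwitterSearchScraper('place:a612c69b44b2e5da since:2020-10-01 until:2020-11-01').get_items()):
        if i > 10:
            break
        print(tweet.date, tweet.content)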

@csbhakat

What is the code to get tweet likes and retweet count?
I have tried tweet.favorite_count and tweet.retweet_count, but no luck.

@jscas88

jscas88 commented Nov 19, 2020

Hello all!
I am a beginner with Python & coding in general.
Do you think GOT will be updated anytime soon in order to resume timeline scraping?
Also, how can we get more information out of the tweets currently extractable thanks to @burakoglakci and the use of snscrape? Is it possible to get the number of likes, replies, etc. of tweets, for example?
I used the following code and it works fine, thanks to all of you who offered an alternative to continue scraping Twitter 👍

import snscrape.modules.twitter as sntwitter
import csv

csvFile = open('place_result.csv', 'a', newline='', encoding='utf8')
csvWriter = csv.writer(csvFile)
csvWriter.writerow(['id','date','tweet'])
for i, tweet in enumerate(sntwitter.TwitterSearchScraper('from:@username + since:2009-01-01 until:2020-11-05 -filter:links -filter:replies').get_items()):
    csvWriter.writerow([tweet.id, tweet.date, tweet.content])
csvFile.close()

Change from:@username to a keyword or #hashtag to search by keyword as opposed to username.

Thanks to all who made this code available! Smooth program and helpful for my current project!

@burakoglakci

@csbhakat

https://medium.com/@jcldinco/downloading-historical-tweets-using-tweet-ids-via-snscrape-and-tweepy-5f4ecbf19032 describes a method for getting any tweet objects you want. I created a script for my own work, and I share it below. I hope it's useful :) You must have a Twitter developer account to use this method.

import pandas as pd
import tweepy
import csv

consumer_key = "aaaaaaaaaaaaaaaaaaaaa"
consumer_secret = "aaaaaaaaaaaaaaaaaaaaaaaa"
access_token = "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"
access_token_secret = "aaaaaaaaaaaaaaaaaaaaaaaaaa"

auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
api = tweepy.API(auth)

tweet_url = pd.read_csv("Your_Text_File.txt", index_col=None,
                        header=None, names=["links"])

af = lambda x: x["links"].split("/")[-1]
tweet_url['id'] = tweet_url.apply(af, axis=1)
tweet_url.head()

ids = tweet_url['id'].tolist()
total_count = len(ids)
chunks = (total_count - 1) // 50 + 1

def fetch_tw(ids):
    list_of_tw_status = api.statuses_lookup(ids, tweet_mode="extended")
    empty_data = pd.DataFrame()
    for status in list_of_tw_status:
        tweet_elem = {"date": status.created_at,
                      "tweet_id": status.id,
                      "tweet": status.full_text,
                      "User location": status.user.location,
                      "Retweet count": status.retweet_count,
                      "Like count": status.favorite_count,
                      "Source": status.source}
        empty_data = empty_data.append(tweet_elem, ignore_index=True)
    empty_data.to_csv("new_tweets.csv", mode="a")

for i in range(chunks):
    batch = ids[i*50:(i+1)*50]
    result = fetch_tw(batch)

@csbhakat

@burakoglakci:
Thanks for sharing this

[quoting @burakoglakci's tweepy script shared above]

@burakoglakci:
For this code, I need to get all the links and store them in the "Your_Text_File.txt" file, and based on those links the code will scrape the tweets, right?
Suppose I want to get all tweets from March 2020 to October 2020 for #amazon, how can I do that? Does your code help in that case?

@burakoglakci

[quoting the exchange above]

First, use snscrape to collect the tweets you want, including tweet IDs and links; you can collect them into a CSV or TXT file.

Then collect the tweet objects using this code. The code I shared here is based on tweepy: it queries using tweet IDs, then finds and collects the objects you want (likes, retweets).
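(A minimal sketch of that first step, writing tweet URLs to the text file the tweepy script reads; the #amazon query and the 1000-tweet cap are examples:)

    import snscrape.modules.twitter as sntwitter

    # one tweet URL per line; the tweepy script above splits each line on "/" to get the ID
    with open("Your_Text_File.txt", "w", encoding="utf8") as f:
        for i, tweet in enumerate(sntwitter.TwitterSearchScraper('#amazon since:2020-03-01 until:2020-10-31').get_items()):
            if i >= 1000:
                break
            f.write(tweet.url + "\n")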

@DV777

DV777 commented Nov 20, 2020

[quoting @burakoglakci's tweepy script shared above]

Thanks for your help @burakoglakci, I'd be lost without this.
The thing is, when collecting a timeline I do not get the retweets, replies, and likes of the account I am scraping, and I guess those tweepy parameters apply to tweets which have already been scraped. I tried to find a way to scrape the full activity of an account, but it seems quite hard. For example, even using the following code:

import snscrape.modules.twitter as sntwitter
import csv
csvFile = open('place_result.csv', 'a', newline='', encoding='utf8')
csvWriter = csv.writer(csvFile)
csvWriter.writerow(['id','date','tweet',]) 
for i,tweet in enumerate(sntwitter.TwitterSearchScraper('from:@username + since:2009-01-01 until:2020-11-05 -filter:links -filter:replies').get_items()):
    csvWriter.writerow([tweet.id, tweet.date, tweet.content])
csvFile.close()

I do not get the retweets / replies / likes made by the account, only its own created tweets. Is there a way to scrape the whole thing? Would you have a list of additional parameters I could add to the scraping?
Also, I do have Twitter API keys; the problem is that tweepy & the Twitter API only let me collect 3000 tweets at most when scraping an account's timeline, at least when I was using it in 2019. Is this still the case?

@burakoglakci

@DV777 Yes, the parameters fetched through tweepy apply to tweets that have already been scraped.

On snscrape, if you remove the -filter:replies parameter you can get replies. You can also collect retweets by removing the -filter:links parameter, but that mostly collects links to the main tweet. I don't know if there's a way to get the number of likes with snscrape.
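(For instance, a sketch of the same kind of search with those filters dropped, so replies come through; the username and date range are placeholders:)

    import snscrape.modules.twitter as sntwitter

    # no -filter:replies / -filter:links, so replies and link-carrying tweets are included
    for i, tweet in enumerate(sntwitter.TwitterSearchScraper('from:@username since:2020-01-01 until:2020-11-20').get_items()):
        if i > 50:
            break
        print(tweet.id, tweet.content)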

@sufyanhamid

@burakoglakci is there any way to find the longitude and latitude of tweets using snscrape?

@elizabethsong

I just used snscrape to get tweets for individual user accounts, filtering by like count. See code here: https://github.com/elizabethhh/Twitter-Data-Mining-Astro/blob/main/testastroold.py.

@DiameterEffect

I just used snscrape to get tweets for individual user accounts, filtering by like count. See code here: https://github.com/elizabethhh/Twitter-Data-Mining-Astro/blob/main/testastroold.py.

can it get over 200k tweets?

@vinaigre552

I just used snscrape to get tweets for individual user accounts, filtering by like count. See code here: https://github.com/elizabethhh/Twitter-Data-Mining-Astro/blob/main/testastroold.py.

Error retrieving https://twitter.com/search?f=live&lang=en&q=from%3A%40GeminiTerms+%2B+since%3A2015-12-02+until%3A2020-11-10-filter%3Areplies&src=spelling_expansion_revert_click: ReadTimeout(ReadTimeoutError("HTTPSConnectionPool(host='twitter.com', port=443): Read timed out. (read timeout=10)")), retrying
Have you encountered this problem? If so, how did you solve it?

@MartinBeckUT

I don't recommend using Tweepy with snscrape; it's not really efficient, since you're basically scraping twice. When you scrape with snscrape there's a tweet object you can interact with that has a lot of information and will cover most use cases. I wouldn't recommend using tweepy's api.statuses_lookup unless you need specific information only offered through tweepy.

For those still unsure about using snscrape, I wrote an article on scraping with snscrape that I hope clears up any confusion about using that library; there are also Python scripts and Jupyter notebooks I've created to build off of. I also have a picture in the article showing all the information accessible in snscrape's tweet object.
https://medium.com/better-programming/how-to-scrape-tweets-with-snscrape-90124ed006af
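(To illustrate the point about the tweet object, a sketch of pulling engagement counts directly from snscrape; this assumes a recent development build in which the tweet object exposes these count fields:)

    import snscrape.modules.twitter as sntwitter

    for i, tweet in enumerate(sntwitter.TwitterSearchScraper('from:@TwitterSupport').get_items()):
        if i > 10:
            break
        # count fields live on the tweet object itself; no second pass through tweepy needed
        print(tweet.id, tweet.likeCount, tweet.retweetCount, tweet.replyCount, tweet.quoteCount)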

@Woolwit

Woolwit commented Dec 3, 2020

Brilliant, thank you Martin!

@axaygaid

axaygaid commented Dec 5, 2020

So, is there any way to get historical tweets for a hashtag? Like the most popular hashtag for the word ripple, for example, from 2015?

Tweepy has a limit of one week's depth; I tried GOT but I have the same issue as here (404). Does anyone have another solution for building a database from historical tweets? :)

Thanks!

@MartinBeckUT

MartinBeckUT commented Dec 7, 2020

So, is there any way to get historical tweets for a hashtag? Like the most popular hashtag for the word ripple, for example, from 2015?

Tweepy has a limit of one week's depth; I tried GOT but I have the same issue as here (404). Does anyone have another solution for building a database from historical tweets? :)

Thanks!

Yes, refer to my article mentioned above, where I cover the basics of using snscrape instead, because GetOldTweets3 is basically obsolete due to changes in Twitter's API: https://medium.com/better-programming/how-to-scrape-tweets-with-snscrape-90124ed006af

In regards to your specific use case: with snscrape you just put whatever query you want inside the quotes in the TwitterSearchScraper method and adjust the since and until operators to whatever time range you want. I created a code snippet for you below. You can take out the i>500 check if you don't want to restrict the number of tweets and just want every single tweet.

import snscrape.modules.twitter as sntwitter
import pandas as pd

tweets_list2 = []

for i, tweet in enumerate(sntwitter.TwitterSearchScraper('#ripple since:2015-01-01 until:2016-01-01').get_items()):
    if i > 500:
        break
    tweets_list2.append([tweet.date, tweet.id, tweet.content, tweet.user.username])

tweets_df2 = pd.DataFrame(tweets_list2, columns=['Datetime', 'Tweet Id', 'Text', 'Username'])

@axaygaid

axaygaid commented Dec 7, 2020

[quoting @MartinBeckUT's reply and snippet above]

Hello,

Thanks for your precious answer! :) I tried your code and I still get an error, but now it seems to be my internet config? Do you have an idea how to fix it?

The error msg:
Error retrieving https://twitter.com/search?f=live&lang=en&q=%23ripple+since%3A2015-01-01+until%3A2016-01-01&src=spelling_expansion_revert_click: ConnectTimeout(MaxRetryError("HTTPSConnectionPool(host='twitter.com', port=443): Max retries exceeded with url: /search?f=live&lang=en&q=%23ripple+since%3A2015-01-01+until%3A2016-01-01&src=spelling_expansion_revert_click (Caused by ConnectTimeoutError(<urllib3.connection.VerifiedHTTPSConnection object at 0x7ffb694b28d0>, 'Connection to twitter.com timed out. (connect timeout=10)'))")), retrying

Also, when I tried this code on another laptop it works, even though it's the same config.

Thanks a lot!

@stefanocortinovis

stefanocortinovis commented Dec 7, 2020

Hey! For those struggling to use snscrape, I put together a little library to download tweets using snscrape/tweepy according to customizable queries. Although it's still a work in progress, check out this repo if you want to give it a try :)

@jis0324

jis0324 commented Feb 8, 2022

Hello there,
So what is the final solution to avoid the 404 error status?
Until yesterday, this page worked with Python requests, but from today it does not work for me and returns a 404 error status.

import requests

headers = {
    'Connection': 'keep-alive',
    'rtt': '300',
    'downlink': '0.4',
    'ect': '3g',
    'sec-ch-ua': '" Not;A Brand";v="99", "Google Chrome";v="97", "Chromium";v="97"',
    'sec-ch-ua-mobile': '?0',
    'sec-ch-ua-platform': '"Windows"',
    'Upgrade-Insecure-Requests': '1',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/97.0.4692.99 Safari/537.36',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
    'Sec-Fetch-Site': 'none',
    'Sec-Fetch-Mode': 'navigate',
    'Sec-Fetch-User': '?1',
    'Sec-Fetch-Dest': 'document',
    'Accept-Language': 'en-US,en;q=0.9,ko;q=0.8',
}

response = requests.get(
    'https://www.amazon.de/sp?marketplaceID=A1PA6795UKMFR9&seller=A135E02VGPPVQ&isAmazonFulfilled=1&ref=dp_merchant_link',
    headers=headers
)
print(response.status_code) # 404

I would really appreciate any help.
Regards.

@DiameterEffect

Hey! For those struggling to use snscrape, I put together a little library to download tweets using snscrape/tweepy according to customizable queries. Although it's still a work in progress, check out this repo if you want to give it a try :)
Hello, does this one get images and videos?

@jajalipiao

I am having the same issue. Does anyone have a solution for it?

@libbyseline

I am having Twitter API errors today, though the usernames I'm searching for appear to be working. Any solutions? I work in R/rtweet, specifically using the tweetbotornot2 package.
