Download stops after a lot of tweets #3
What dates did you expect to see in Keep in mind that tweets in |
This is the date of the first tweet: 19/02/2018 0:59. This is the date of the last tweet: 18/02/2018 9:03. Normally it should have finished at 18/02/2018 1:00. |
I haven't downloaded all the tweets since there are too many, but the first row is |
So do you have any ideas about why I have this issue? |
Are you using the generic version of GetOldTweets3, or have you changed some code? I've checked with |
I have only modified the exporter to include the lang parameter. Other than that the code is the same as yours. About the UTC, you are right; I am not sure why, but the script saves the dates in UTC+1. I thought it was normal, since that is the timezone I am in. |
I just tried again with python Exporter.py --lang en --querysearch "bitcoin" --since 2018-02-20 --until 2018-02-21 and the same happened.
The first tweet is 2018-02-20 23:59:53,jmauli,,0,0,"The Bitcoin is dropping going to enjoy short this down to XXXXX",,,,966100124899278848,https://twitter.com/jmauli/status/966100124899278848
The last tweet is 2018-02-20 07:11:38,LibertarianBee,,0,0,"@CoinWarz is not taking in consideration the TX fees that the miners are also receiving. #BCH has less TX than #BTC #Bitcoin.",,@CoinWarz,#BCH #BTC #Bitcoin,965846390088785921,https://twitter.com/LibertarianBee/status/965846390088785921
It downloaded 48757 tweets. |
I've tried several times and reproduced your error. I will look into it more deeply later. |
Twitter gave me 49877 as the total number of tweets within this period. |
Ok, thanks a lot! It is a pity that you cannot submit requests by the hour, since that would solve the issue. Maybe we can set in the script that after an x number of tweets, the script has to sleep for 3 minutes so the requests look more natural. |
I tried it a couple of days earlier. Seems like they removed |
Yeah, I tried yesterday as well, and it doesn't work. |
Btw, I've added |
Thanks a lot that is pretty good! Send me a message if you find the solution to the issue. I will try to test some options as well. |
Hello, I am leaving you a list of queries that stop during the download, consistently at around the same number of downloads. python Exporter.py --lang en --querysearch "bitcoin" --since 2018-02-18 --until 2018-02-19 It is very weird, because there are some other days, with more tweets, that have no problem. Some of these queries stop after 2 or 4 thousand tweets (not a big number). Just in case this helps to see and solve the issue. |
Hello! To add to the list of queries with issues: python Exporter.py --lang en --querysearch "bitcoin" --since 2018-03-02 --until 2018-03-03 | Number of tweets downloaded: 1186. The first query is relatively small, so I ran the Debug option; here is the log. I think this has something to do with the message of the tweet, or something that the script downloads, that makes it stop. The download fails consistently with these queries at the same point, while running other queries for full days (55000 messages) there is no issue. So I do not think Twitter is blocking the request; I think the program reads or scrapes something that makes it think it is finished with the query. |
Ok, thanks. The problem is this: each response has the
Feel free to make a pull request if you fix this issue. Thanks! |
I am going to try! However, it seems complicated (I only started learning Python 4 months ago...) |
Hi Jaime.
If I summarize the logic, this is what it would look like: i) If has_more_items is false, check if the date of the last tweet received is the same as the Since date. @JaimeBadiola, you already know my .NET code, which has all of these condition checks. My old Python version of the code is attached here. I hope this helps. |
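That first check can be sketched in Python. The function and field names below are illustrative stand-ins, not GetOldTweets3's actual internals:

```python
from datetime import date

def should_stop(has_more_items, last_tweet_date, since_date):
    """Condition (i): only trust a has_more_items=False reply when
    the download has actually reached the --since date."""
    if has_more_items:
        return False  # more pages to fetch, keep going
    # If the last tweet is still after --since, the stop is premature
    # and the request should be retried instead of ending the loop.
    return last_tweet_date <= since_date

# Genuine end of the range: stopping is correct.
assert should_stop(False, date(2018, 2, 18), date(2018, 2, 18))
# Premature stop mid-range (the symptom in this issue): keep retrying.
assert not should_stop(False, date(2018, 2, 20), date(2018, 2, 18))
```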
Are you guys sure Twitter doesn't block IPs? On my remote machine, after a certain point (maybe 1 million tweets in 1000 requests) all my responses come back as zero length (empty). But I can still query from my laptop. I'll be honest: I didn't fully understand the discussion on min_position, but I don't see how this could be the source of my problem. |
Please help me understand: what do you mean by 1 million tweets in 1000 requests? Are you talking about the Twitter API? Because there is no concept of requests in scraping. Also, while scraping, every Twitter URL call for a JSON download returns 20 tweets max. If you are using the APIs, then min_position does not apply to you.

About Twitter policies, check this out. There is a gray area when it comes to distinguishing between scraping and crawling; although both might look the same, they are different. But it depends on how Twitter defines it. In the TOS page there is nothing related to blocking of IP addresses. Second, blocking an IP means detecting the IP address, which is against Twitter's privacy policy. Third, IP blocking will not work if you are behind DNS where the IP is refreshed periodically, or on a public network, so IP blocking is basically not a good solution, and companies know it. When I was downloading tweets as part of my free assistance on orgneat.com, I never had an issue with my scraper being blocked.

To understand this, first we need to understand how the Twitter scraper program works. The program mimics a browser, simply scrolling down the web page in order to get the statuses (tweets). If Twitter blocked the program, Twitter would have to block all requests coming from your network/system, which basically means that if you opened Twitter.com you would not be able to see anything. While I was working on scraping requests from all around the world, a little research on how it works helped me a lot.

It makes sense that if you start downloading millions of tweets, there might be some issues, depending on various factors like Internet connection, glitches, Twitter's handling of requests, etc. Please note that a scraper is an extremely fast human scrolling down a page constantly, possibly every second. I do face the same issues now and then, so I have made conditional checks, following the logic I explained above.
I did this only because I was trying to assist many people with a free service, and wanted to provide a seamless request/delivery experience with the scraper running day and night unattended. |
Thanks for addressing the questions. I found that the --username query can get full data (e.g. --since 2018-01-01 --until 2018-12-22), but --querysearch (keyword: China tariff) only got as far as 9/13/2018. In that request, I downloaded 142,396 tweets. I did try multiple combinations but still was unable to reach beyond 9/13/2018. Is that memory related? Or IP address related? Manually scrolling the page does allow me to reach further. Any suggestions will be greatly appreciated!! |
@kho7, the issue is with |
Thanks for your reply; I learnt a great deal. I use the command line method: GetOldTweets3 --querysearch "China tariff" --since 2018-01-01 --until 2018-9-13 --output "tradewar02g.csv" Shall I modify TweetManager.py to change min_position? Thanks again, big time. |
Are you sure you understand what Rahul has written about min_position? |
I am testing this query, which only recovers 24 tweets: "python Exporter.py --lang en --querysearch "bitcoin" --since 2017-08-13 --until 2017-08-14" And the issue seems to be that the while loop stops here (line 67 of TweetManager.py): if len(json['items_html'].strip()) == 0: I tried to get the JSON response 10 times before breaking the while loop, but Twitter doesn't answer accordingly. Any ideas? |
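A retry wrapper along those lines might look like this; fetch_json is a hypothetical stand-in for whatever performs the request around line 67, not the library's actual function:

```python
import time

def get_items_html_with_retry(fetch_json, max_retries=10, delay=1.0):
    """Instead of breaking the while loop on the first empty
    items_html, retry a few times with a growing pause."""
    for attempt in range(max_retries):
        response = fetch_json()
        html = response.get('items_html', '').strip()
        if html:
            return html
        time.sleep(delay * attempt)  # 0s, 1s, 2s, ... between retries
    return ''  # still empty after max_retries attempts

# Fake fetcher: Twitter answers empty twice, then returns items.
replies = iter([{'items_html': ''}, {'items_html': '  '},
                {'items_html': '<li>tweet</li>'}])
print(get_items_html_with_retry(lambda: next(replies), delay=0))  # prints <li>tweet</li>
```

As this thread shows, retries alone do not always help; if the cursor itself is corrupted, Twitter keeps answering empty no matter how often you ask.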
One part I am trying to understand and apply is iv) if min_position starts with cm+, set min_position = "TWEET-" + tweet ID of the last tweet in the result + "-" + tweet ID of the first tweet in the result. Thanks again. |
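As a sketch, rule (iv) could look like the function below. The "cm+" and "TWEET-..." cursor formats are observations reported in this thread, not documented Twitter behavior:

```python
def repair_min_position(min_position, first_tweet_id, last_tweet_id):
    """Rule (iv): when the cursor comes back in the 'cm+...' form,
    rebuild it from the tweet IDs of the current result page."""
    if min_position.startswith('cm+'):
        # Last tweet ID first, then first tweet ID, per the summary.
        return 'TWEET-{}-{}'.format(last_tweet_id, first_tweet_id)
    return min_position  # cursor looks fine, keep it

# Healthy cursors pass through untouched.
assert repair_min_position('TWEET-111-222', '222', '111') == 'TWEET-111-222'
# A corrupted 'cm+' cursor is rebuilt from the page's tweet IDs.
assert repair_min_position('cm+abc', '966100124899278848',
                           '965846390088785921') == \
    'TWEET-965846390088785921-966100124899278848'
```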
What is the other Python script, @giulionf? |
Basically, a test script to check if it was working or not... I just set, manually, the parameters my other script was fetching. On my remote, it's working as well! Really strange...
|
I seemingly triggered this error querying one tweet at a time, <100 times a day, over the course of 2 weeks. From what I gather, that's much less volume, more spread out, than what others have reported here. I'm not grasping most of what is posted here. Will conducting test queries exacerbate the problem? Will switching networks or using a VPN help? |
I have a similar issue. For multiple queries the download stops at a certain number without any errors. Sometimes the number varies, but it always stops before reaching the until date. Thanks in advance! |
No, I wasn't able to correct the bug.
…On Tue, 3 Sep 2019 at 14:43, gghidiu wrote:
@JaimeBadiola have you managed to correct this bug? I am new to Python, so it would be very helpful if you could post the modified code here. |
@JaimeBadiola , have you found a working alternative then? |
What I did was to download all the data day by day, and if one day there was an error I would mark that day as missing data. In total I downloaded about 500 days, and a bit more than 20 were missing.
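That day-by-day approach can be sketched as follows; download_one_day is a placeholder for whatever invokes the exporter for a single day, not part of GetOldTweets3 itself:

```python
from datetime import date, timedelta

def download_by_day(start, end, download_one_day):
    """Walk [start, end) one day at a time. If a day's download
    fails (raises, or reports failure), mark it as missing data
    and move on instead of aborting the whole run."""
    missing = []
    day = start
    while day < end:
        try:
            ok = download_one_day(day, day + timedelta(days=1))
        except Exception:
            ok = False
        if not ok:
            missing.append(day)
        day += timedelta(days=1)
    return missing

# Fake downloader that fails on one known-bad day.
bad_day = date(2018, 2, 18)
missing = download_by_day(date(2018, 2, 17), date(2018, 2, 20),
                          lambda since, until: since != bad_day)
assert missing == [bad_day]  # only the bad day is marked missing
```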
|
Hey, I was able to resolve this problem when writing a crawler (in Scala) for work (so I can't just put it on GitHub). The gist of the problem is that sometimes the scroll cursor, used to compute the index into the stream of items returned by a query, becomes corrupted. I don't think this is volume dependent (i.e. this is not some kind of rate limiting mechanism). I was able to resolve it by saving the previous search cursor after each query and going back to using it if I suspect the search cursor I'm currently using is corrupted. Functionally, I implemented this with a pseudo-BFS where I kept the previous cursor in the explore queue until its child cursor executes a search with no errors. I've been planning to make a PR to this repo porting my solution, but I've just been busy. Let me know if you guys want it and I'll make it a priority. |
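The original was Scala and never published, so the following Python sketch is only a reconstruction of the description above: keep the last good cursor and fall back to it when the current one seems corrupted. search is a hypothetical callable, and a RuntimeError models a corrupted cursor:

```python
def crawl_with_fallback(search, first_cursor, max_fallbacks=3):
    """search(cursor) returns (tweets, next_cursor); next_cursor=None
    means the end. On failure, go back to the previously saved cursor
    and retry instead of ending the crawl."""
    results = []
    prev_cursor, cursor, fallbacks = None, first_cursor, 0
    while cursor is not None:
        try:
            tweets, next_cursor = search(cursor)
        except RuntimeError:
            if prev_cursor is None or fallbacks >= max_fallbacks:
                break  # nothing left to fall back to
            fallbacks += 1
            cursor = prev_cursor  # retry from the last good cursor
            continue
        # Re-fetched pages can repeat tweets, so de-duplicate.
        results.extend(t for t in tweets if t not in results)
        prev_cursor, cursor = cursor, next_cursor
    return results

# Fake search: cursor 'B' is corrupted on the first attempt only.
state = {'b_calls': 0}
def fake_search(cursor):
    if cursor == 'A':
        return ['t1'], 'B'
    state['b_calls'] += 1
    if state['b_calls'] == 1:
        raise RuntimeError('corrupted cursor')
    return ['t2'], None

assert crawl_with_fallback(fake_search, 'A') == ['t1', 't2']
```

The max_fallbacks cap matters: if the same cursor keeps producing a corrupted child, retrying forever would loop indefinitely.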
There is a workaround to download the tweets from a specific time. It partially solves the problem, since you can just continue downloading from where the program stopped. The idea is to convert the
For example, if the download stopped at 2016-08-24 19:38:13,
in my case the final query will look something like this:
since we have the max_id parameter, the --until becomes redundant. |
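One way to build such a max_id is from the timestamp embedded in Twitter's snowflake tweet IDs. The epoch constant below is the publicly known snowflake epoch; that a max_id: clause in the search query is what this workaround used is an assumption based on the description above:

```python
from datetime import datetime, timezone

# Publicly known epoch (in ms) used by Twitter's snowflake tweet IDs.
TWITTER_EPOCH_MS = 1288834974657

def max_id_for(dt):
    """Build a snowflake-style ID whose embedded timestamp is dt,
    suitable for a 'max_id:<id>' clause in the search query."""
    ms = int(dt.timestamp() * 1000)
    return (ms - TWITTER_EPOCH_MS) << 22

def id_timestamp(tweet_id):
    """Recover the UTC timestamp embedded in a snowflake ID."""
    ms = (tweet_id >> 22) + TWITTER_EPOCH_MS
    return datetime.fromtimestamp(ms / 1000, tz=timezone.utc)

# If the download stopped at 2016-08-24 19:38:13 (UTC assumed here),
# resume just below that instant:
stop = datetime(2016, 8, 24, 19, 38, 13, tzinfo=timezone.utc)
query = 'bitcoin max_id:{}'.format(max_id_for(stop))
assert id_timestamp(max_id_for(stop)) == stop  # round-trips exactly
```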
@aduriseti, it would be great if we could get the working version. |
|
Did you try running the same script multiple times in hopes of getting more tweets? I noticed I was getting fewer and fewer tweets loaded when I did this... I checked my task manager and the CPU was through the roof. There were also 20+ Python processes listed... If anyone has Alteryx, there is a Twitter app you can use to pull the data. |
I'm trying to make this search: But the result does not include all tweets. I'm getting just some of them, and it stops at some date around 2018-12-20. Can somebody help me? |
My workaround was to jump the days that I had issues with. So jump 2018-12-20 and keep downloading after that.
…On Fri, 15 Nov 2019 at 19:07, Rodrigo Borges Machado wrote:
I'm trying to make this search:
tweetCriteria = got.manager.TweetCriteria().setQuerySearch('CVE').setSince("2015-01-01").setUntil("2019-11-15")
|
That was my first idea; I will try it that way then... |
Could you please do this? |
I have not used Scala but I will be happy to try it. Thanks.
|
Hi guys, I've tried this and I only get 1 username/tweet back. How do I fix this? Do I have to add the max id? |
Hi, you set tweet equal to the first row when you selected the row with index 0 (the part that looks like [0]). Delete the [0] and you should be fine. |
On Jun 5, 2020, at 5:50 AM, joshkwannacode wrote:
import GetOldTweets3 as got
tweetCriteria = got.manager.TweetCriteria().setQuerySearch('#detroitrapper')\
.setUntil("2020-05-01")\
.setNear('Detroit,Michigan')\
.setSince("2020-04-03")\
.setMaxTweets(100)
tweet = got.manager.TweetManager.getTweets(tweetCriteria)[0]
print(tweet.username)
|
That does not work; I get an error saying list object has no attribute username. |
That's because you're trying to access a nonexistent attribute of a list of tweets. You should probably learn what a list is. |
Deleted my other post. Thanks man, I guess I didn't understand lists, lol. Edit: made a loop and it works. |
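For anyone else hitting this, the loop fix looks like the sketch below. FakeTweet just stands in for the tweet objects that got.manager.TweetManager.getTweets returns, so the example runs without the library:

```python
# getTweets returns a LIST of tweet objects; indexing with [0]
# keeps only the first result, so iterate over the list instead.
class FakeTweet:
    def __init__(self, username, text):
        self.username = username
        self.text = text

tweets = [FakeTweet('alice', 'first tweet'),
          FakeTweet('bob', 'second tweet')]

# tweets[0].username would show only 'alice'; the loop shows all.
for tweet in tweets:
    print(tweet.username, tweet.text)
```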
I am having trouble with GetOldTweets3 on my Mac. I can install it and run the command: But if I try any other command like then I cannot get it to work. It was working until today; I made no changes, but I am getting this error. If anyone has ideas, please share.
Downloading tweets...
During handling of the above exception, another exception occurred:
Traceback (most recent call last): Document is empty
Done. Output file generated "output_got.csv". |
I tried to download tweets with the query search 'bitcoin' since 2018-02-18 until 2018-02-19. The issue is that the script stopped before the end of the until parameter.
The log was too big to put it all, so I deleted the log of the first 31000 tweets.
You can find the log here
Can this be because Twitter detects a bot downloading a lot of tweets?