deleted.py #373
Comments
Actually you need to give deletes.py the JSON data for a tweet because it uses it to try to figure out where the tweet lives (or lived) on the web.
Does that help? I haven't used it in quite a while and wonder if it still works predictably after Twitter's infrastructure changes...
Wait, sorry, I'm confused. I have a text file of 20,000 or so user ids and I want to extract the tweets or accounts that have been deleted.
Ah yes, there is … If you are looking to see which user ids in your file have been deleted, you can use the twarc users command.
Would the file of tweet JSON data be the original data or would it be a dehydrated output of the original dataset? It was my understanding that dehydration output a TXT file, not a JSON file. I would use the twarc users command but I get several HTTP errors. Sorry for all of the questions, I'm very new to twarc.
Actually I misspoke: twarc users will throw an error if it is given a user id that no longer exists. To maximize the number of lookups, it does them in batches of 100. When one id in a batch fails, the whole batch lookup fails, so there's no real way for it to recover. In your case you are best off writing a little program to do the lookups one by one. I've added a utility called deleted_users.py which you can use to look up user ids, or users in tweet JSON data. It will only output the ids or tweets that have been deleted (no longer available for hydration from the Twitter API). Does this help at all?
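The one-by-one approach described above can be sketched roughly as follows. This is only an illustration, not the code in deleted_users.py: `lookup` is a hypothetical callable standing in for a single-id API request, assumed to raise `LookupError` when the id is no longer available, and `fake_lookup` with its `EXISTING` table is a stub that exists only to make the example runnable.

```python
from typing import Callable, Iterable, Iterator


def find_deleted(user_ids: Iterable[str],
                 lookup: Callable[[str], dict]) -> Iterator[str]:
    """Yield each id whose individual lookup fails.

    `lookup` is a hypothetical stand-in for one API request per id; it is
    assumed to raise LookupError when the account is no longer available.
    """
    for user_id in user_ids:
        try:
            lookup(user_id)
        except LookupError:
            # Only this id is affected; the remaining ids are still checked.
            yield user_id


# Stub data for illustration only: pretend these two accounts still exist.
EXISTING = {"12": {"id_str": "12"}, "34": {"id_str": "34"}}


def fake_lookup(user_id: str) -> dict:
    """Stub lookup: return the stored record or raise LookupError."""
    if user_id not in EXISTING:
        raise LookupError(user_id)
    return EXISTING[user_id]


deleted = list(find_deleted(["12", "99", "34", "77"], fake_lookup))
print(deleted)
```

Because each id gets its own request, a missing account only affects that one lookup, at the cost of one request per id instead of one per batch of 100.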
Whoops, didn't mean to close!
I'm trying it out now. The rate limit has been exceeded a couple of times so it's taking a while to finish running. I think that's understandable considering I have 20,000 tweets. Thank you for your help!
Oh good, I'm glad it is running. It will take some time since it needs to check each one instead of getting 100 at a time. It's basically a hundred times slower...
I've run it a couple of times on a dataset of 20,000 tweets and the output is blank which seems unlikely, but I suppose isn't impossible.
That is unusual. Let me try it out. Can you share the exact command you are using? I can test with my own dataset.
python utils/deleted_users.py 2016.jsonl > 2016_deleted_users.jsonl
Hmm, this seems to work fine for me on a file of 100 tweets (it found 5). Do you want to try it too?
I've got it. Let me try that and I'll get back to you.
Hmm it worked for me with that file. I also got 5.
Ok good, that means it's working. Did your job finish running? If it didn't, it's possible that the output was buffered and not written to the output file yet. Do you see any activity in the deleted_users.log file?
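If buffering is the suspect, one way to rule it out is to flush after every record. A minimal sketch, assuming the script writes one JSON object per line to standard output; `write_record` is a hypothetical helper, not a function from twarc.

```python
import json
import sys
from typing import TextIO


def write_record(record: dict, out: TextIO = sys.stdout) -> None:
    """Write one JSON line and flush it so it reaches the file right away."""
    out.write(json.dumps(record) + "\n")
    out.flush()


write_record({"id_str": "123"})
```

Running Python with the -u option, or setting PYTHONUNBUFFERED=1, should have a similar effect without changing the script.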
It seems to have finished but I don't have a 'deleted_users.log' file anywhere.
I saw that you were running the utility from the directory above utils.
The log file should be in there.
I used the command python utils/deleted_users.py tweets.jsonl > deleted_users.jsonl and it ran in my command line without issues but I don't have any log files.
Maybe you need to download the current version of deleted_users.py?
It's working now and in conjunction with deletes.py. Unfortunately, I still cannot get deleted.py to work. I'd really like to be able to analyze tweets that were deleted, not just users (and consequently their tweets).
Going through closing some old issues: The same functionality is now available using the new Batch Compliance API, which will process tweet IDs and give you back reasons for deletions.
What's the usage command for deleted.py? I've been using the command
python utils/deleted.py election_data.txt > election_deleted.jsonl
where election_data is the dehydration output of tweet ids from an election dataset. I keep getting this error:
Traceback (most recent call last):
  File "utils/deleted.py", line 31, in <module>
    for t in missing(tweets):
  File "utils/deleted.py", line 16, in missing
    tweet_ids = [t['id_str'] for t in tweets]
  File "utils/deleted.py", line 16, in <listcomp>
    tweet_ids = [t['id_str'] for t in tweets]
TypeError: 'int' object is not subscriptable
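The traceback is consistent with the input file holding bare tweet ids rather than tweet JSON: a line such as 791231231231231, presumably parsed as JSON, becomes an int, and indexing an int with ['id_str'] raises this TypeError. A minimal sketch of the mismatch follows; `tweet_id` is a hypothetical helper that accepts either form and is not how deleted.py is written, and the id value is made up.

```python
import json


def tweet_id(line: str) -> str:
    """Return the tweet id from a line holding tweet JSON or a bare id."""
    value = json.loads(line)
    if isinstance(value, dict):
        return value["id_str"]  # full tweet JSON
    return str(value)           # dehydrated id, parsed as an int


# A dehydrated line parses to an int, which cannot be subscripted.
parsed = json.loads("791231231231231")
try:
    parsed["id_str"]
except TypeError as err:
    print(err)

print(tweet_id('{"id_str": "791231231231231"}'))
print(tweet_id("791231231231231"))
```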