Import tweets from Twitter's new exported tweets files #1413

Open
ginatrapani opened this Issue Dec 16, 2012 · 21 comments

Owner

ginatrapani commented Dec 16, 2012

Great news: Twitter has begun rolling out the ability to export your entire archive of tweets from the site:

http://thenextweb.com/twitter/2012/12/16/twitter-has-started-rolling-out-the-option-to-download-all-your-tweets/

The archive contains a JSON and CSV file. Let's build an importer to suck these old tweets into ThinkUp. It can either be a standalone tool in the language of your choice or built into the ThinkUp interface. Looks like at least @ws has expressed an interest in working on this.

https://twitter.com/ws/status/280340267012333568

Who else?

ws commented Dec 16, 2012

A bit more technical info:

A Verge reader has been nice enough to share his archive with us so we have a base to work off of.

It looks like a bunch of JSON files sorted by month.
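For anyone who wants to poke at the files, here is a minimal Python sketch of reading per-month files like these. The `.js` extension, the month-sortable file names, and the optional JavaScript assignment in front of the JSON array are assumptions about the archive layout, not something confirmed in this thread; `load_month` and `load_archive` are hypothetical helper names.

```python
import json
from pathlib import Path


def load_month(path):
    """Parse one monthly file and return its list of tweet dicts.

    Assumption: the file holds a single JSON array, optionally preceded
    by a JavaScript assignment (e.g. `var x = `) and followed by `;`.
    """
    text = Path(path).read_text(encoding="utf-8")
    start = text.find("[")
    if start == -1:
        raise ValueError(f"no JSON array found in {path}")
    # Drop everything before the array and any trailing semicolon.
    payload = text[start:].rstrip().rstrip(";")
    return json.loads(payload)


def load_archive(directory):
    """Return the tweets from every monthly file, in file-name order.

    Assumption: file names sort chronologically, for example
    2012_11.js before 2012_12.js.
    """
    tweets = []
    for path in sorted(Path(directory).glob("*.js")):
        tweets.extend(load_month(path))
    return tweets
```

Calling `load_archive("path/to/monthly/files")` would then give one flat list of tweet dicts to feed into whatever does the database insert.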

ws commented Dec 16, 2012

OK, I'm going to begin working on this here as standalone code (just because I'm not totally familiar with ThinkUp's code architecture).

alper commented Dec 16, 2012

@ws

I did something similar a while back when these archives were available for Europeans (upon special request). I haven't had time to look into the new files yet, but maybe my hackish approach can help you get going quicker?

The gist: https://gist.github.com/2414715
The write-up: http://monsterswell.com/blog/2012/04/a-full-twitter-index-in-your-thinkup/

ws commented Dec 16, 2012

Awesome, that should be a huge help... thank you so much!

ninadsp commented Dec 17, 2012

I'll help with any testing. Following @ws's code.

KTamas commented Jan 19, 2013

Bump? I can help with my archive of ~44k tweets, if that is any help...

Also willing to kick the tires of anything you guys come up with.

tante commented Feb 14, 2013

I could supply my archive as well. I would love to integrate it into ThinkUp.

sdenike commented Mar 9, 2013

Any update on this? Would love to be able to pull in all my tweets.

ws commented Mar 9, 2013

Yes... I'm currently away from my development machine, but I'll remote into it and upload once I get to a semi-stable connection

sdenike commented Mar 10, 2013

Awesome, look forward to it!

ws commented Mar 12, 2013

Sorry for the delay, guys... I'm working with henriwatson (another TU contributor) to fix a few bugs and will push it live ASAP. It's been a hectic few weeks.

ws commented Mar 13, 2013

OK, it's to the point where it's at least stable. I ended up just using Meekro because MySQL is annoying and stupid.

Massive thanks to henriwatson for making my life easier.

Please back up your archive and database, then give me some feedback.

https://github.com/ws/tu-archive-importer

KTamas commented Mar 16, 2013

@ws I tried it. Couple of things:

  1. It spits out lots of warnings in the CLI if a tweet has no place/in_reply_to.
  2. The db->close() call at the end fails.
  3. It duplicates all tweets that already exist in the DB, i.e. the ones ThinkUp was able to scrape.

The third one is kind of a dealbreaker atm :) but this is a good start.
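To make points 1 and 3 concrete, here is a rough Python sketch of one way an importer could tolerate missing optional fields and skip tweets that are already stored. This is not the code from tu-archive-importer: it uses SQLite from the standard library so it runs on its own, the `posts` table and its column names are made up for the example and are not ThinkUp's real schema, and the `place` / `in_reply_to_status_id` keys are assumed to follow Twitter's API naming.

```python
import sqlite3


def tweet_to_row(tweet):
    """Flatten one tweet dict, tolerating absent optional fields."""
    place = tweet.get("place") or {}
    return (
        str(tweet["id"]),
        tweet.get("text", ""),
        tweet.get("in_reply_to_status_id"),  # None when not a reply
        place.get("full_name"),              # None when no place attached
    )


def import_tweets(conn, tweets):
    """Insert tweets, skipping any whose id is already in the table.

    Returns the number of rows actually added.
    """
    # Hypothetical table; the primary key is what makes duplicates detectable.
    conn.execute(
        """CREATE TABLE IF NOT EXISTS posts (
               post_id TEXT PRIMARY KEY,
               post_text TEXT,
               in_reply_to_post_id TEXT,
               place TEXT
           )"""
    )
    count_sql = "SELECT COUNT(*) FROM posts"
    before = conn.execute(count_sql).fetchone()[0]
    # OR IGNORE silently drops rows that would violate the primary key.
    conn.executemany(
        "INSERT OR IGNORE INTO posts VALUES (?, ?, ?, ?)",
        (tweet_to_row(t) for t in tweets),
    )
    conn.commit()
    return conn.execute(count_sql).fetchone()[0] - before
```

Running `import_tweets` twice on the same list adds nothing the second time. In MySQL, a unique key on the tweet ID column combined with `INSERT IGNORE` gives similar behaviour.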

ws commented Mar 16, 2013

@KTamas Thanks for the feedback!

  1. OK, I'll check that out.
  2. I'll also look into that.
  3. Derp, my bad. I'll fix that ASAP.

keith commented Mar 24, 2013

@KTamas @ws Just submitted a pull request for your issue 3: ws/tu-archive-importer#1

keith commented Mar 25, 2013

I have actually successfully imported all my tweets using the modifications to @ws's script in this PR: ws/tu-archive-importer#1

Any chance this feature will ever land in master?

Owner

ginatrapani commented Jul 8, 2015

We're not actively working on it, but I'd be happy to review a pull request for it!

The link that @keith posted led me to a fork of that script, dholowiski/tu-archive-importer, which handled the import admirably, so that could be used as a starting point.

Sadly, I do not have any time to hack on this right now, but I’ll keep it in mind for the future!
