Feature Request: Twitter Thread Archiver #345

Open
3 of 7 tasks
shimizurei opened this issue May 31, 2020 · 12 comments
Labels
status: idea-phase Work is tentatively approved and is being planned / laid out, but is not ready to be implemented yet why: functionality Intended to improve ArchiveBox functionality or features

Comments

@shimizurei

Can something like the Thread Reader App be incorporated into ArchiveBox?

Type

  • Propose a brand new feature

What is the problem that your feature request solves

We can save Twitter threads (NOT individual Twitter posts) as functionally complete articles.

Describe the ideal specific solution you'd want, and whether it fits into any broader scope of changes

A nice article-style PDF, like the one the Thread Reader App produces.

What hacks or alternative solutions have you tried to solve the problem?

ThreadReader App

How badly do you want this new feature?

  • It's an urgent deal-breaker, I can't live without it
  • It's important to add it in the near-mid term future
  • It would be nice to have eventually

  • I'm willing to contribute dev time / money to fix this issue
  • I like ArchiveBox so far / would recommend it to a friend
  • I've had a lot of difficulty getting ArchiveBox set up
@shimizurei shimizurei added why: functionality Intended to improve ArchiveBox functionality or features status: idea-phase Work is tentatively approved and is being planned / laid out, but is not ready to be implemented yet labels May 31, 2020
@pirate
Member

pirate commented Jun 1, 2020

Yeah, I've wanted this for a long time too. Other projects implement it as a content script that unrolls the thread inside headless Chrome before the snapshot is taken.

@mAAdhaTTah
Contributor

Would it be possible for the archiver to trigger the ThreadReader app to unroll it then archive the ThreadReader result?

@shimizurei
Author

Then it ends up depending on ThreadReader. What if ThreadReader becomes defunct tomorrow?

@mAAdhaTTah
Contributor

@shimizurei You'd have an archive of the ThreadReader page in your ArchiveBox.

@shimizurei
Author

If it's part of ArchiveBox's code, then its life depends on the maintainers of ArchiveBox. ThreadReader isn't open source, so if it goes down tomorrow, that's it. Everyone will be scrambling to find a replacement because the code is not easily available. Yes, you'll have the archives you already created, but you wouldn't be able to create any more.

@pirate
Member

pirate commented Dec 14, 2020

I'd rather do this via a python library, CLI tool, or puppeteer scripts (once our async playwright worker system is out).

Follow here for updates on puppeteer script support progress: #51

@akmadian

I would really like this feature, and I'm willing to contribute code to make it happen, if that's welcome.

@pirate
Member

pirate commented Nov 23, 2021

There are still a lot of structural blockers in ArchiveBox's design to running content scripts directly during archiving.

The most helpful approach might be to write a dedicated extractor in Python that dumps the unrolled thread to a nicer HTML file. Look for existing tools structured like youtube-dl but for Reddit and Twitter (does a thread-dl exist?), and then clone the YOUTUBEDL extractor code to get started.

@jpaulickcz
Copy link

I've been looking for a box with this functionality for a long while now, with no luck. The closest thing I have found to what I imagine is https://github.com/weskerfoot/TweetLog; however, it requires access to the developer API, which I don't have.

A regular thread (a sequence of tweets making up a mini article; my god, what happened to good ol' blogs?) can otherwise be archived quite easily with Thread Reader App: call https://threadreaderapp.com/thread/$TWIDENT.html, where $TWIDENT is the ID of any tweet that's part of the thread, and download the page a few minutes later. However, I am looking for something that can archive a tweet OR a thread, including all of the replies to one or more of the tweets in that thread.
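
For illustration, the URL scheme described above could be derived from an ordinary tweet URL roughly like this. This is only a sketch: the helper names `tweet_id_from_url` and `threadreader_url` are made up for this example, and it assumes tweet URLs contain a `/status/<digits>` segment. It only builds the URL; it does not check whether the service responds.

```python
import re
from urllib.parse import urlparse

# URL pattern quoted in the comment above; $TWIDENT becomes {tweet_id}.
THREADREADER_TEMPLATE = "https://threadreaderapp.com/thread/{tweet_id}.html"


def tweet_id_from_url(tweet_url: str) -> str:
    """Return the numeric status ID found in a tweet URL (hypothetical helper)."""
    path = urlparse(tweet_url).path
    match = re.search(r"/status/(\d+)", path)
    if match is None:
        raise ValueError(f"no tweet ID found in {tweet_url!r}")
    return match.group(1)


def threadreader_url(tweet_url: str) -> str:
    """Build the Thread Reader App URL for the thread containing this tweet."""
    return THREADREADER_TEMPLATE.format(tweet_id=tweet_id_from_url(tweet_url))


if __name__ == "__main__":
    print(threadreader_url("https://twitter.com/mitchellh/status/1615797167607939072"))
```

The resulting URL could then be handed to ArchiveBox like any other link, keeping in mind the delay of a few minutes mentioned above before the unrolled page is ready.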

@onemenzel

onemenzel commented Apr 27, 2022

ThreadReaderApp has been acquired by Twitter and shut down. I think a feasible approach would be to add a config option where a Twitter developer token can be entered, and then just download the thread and put it into a simple HTML file with one `<p>` paragraph tag per tweet, maybe with `<br>` for newlines.
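
The HTML half of that idea might look something like the sketch below. It assumes the tweet texts have already been fetched somehow (the API download step is not shown), and `render_thread_html` is a hypothetical name, not an existing ArchiveBox function.

```python
from html import escape
from typing import Iterable


def render_thread_html(tweets: Iterable[str], title: str = "Twitter thread") -> str:
    """Render tweet texts as a bare HTML page: one <p> per tweet, <br> per newline."""
    paragraphs = []
    for text in tweets:
        # Escape first so tweet text cannot inject markup, then add line breaks.
        safe = escape(text).replace("\n", "<br>\n")
        paragraphs.append(f"<p>{safe}</p>")
    body = "\n".join(paragraphs)
    return (
        "<!DOCTYPE html>\n"
        f'<html><head><meta charset="utf-8"><title>{escape(title)}</title></head>\n'
        f"<body>\n{body}\n</body></html>\n"
    )


if __name__ == "__main__":
    # Placeholder texts, not real tweets.
    print(render_thread_html(["First tweet\nwith a second line", "Second tweet"]))
```

Escaping before inserting the `<br>` tags keeps any `<` or `&` in a tweet from being interpreted as markup in the saved file.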

I myself would do it quick and dirty and just pretend the html was made by readability but I can understand if that’s too much of a hack to you 😃

I also think this feature is now more important than before because of the acquisition. Until now I just archived ThreadReaderApp links.

@pirate
Member

pirate commented May 3, 2022

How about Nitter?

https://twitter.com/ArchiveBoxApp -> https://nitter.net/ArchiveBoxApp
https://twitter.com/mitchellh/status/1615797167607939072 -> https://nitter.net/mitchellh/status/1615797167607939072
... etc
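
A minimal sketch of that rewrite, assuming the only change needed is swapping the hostname while keeping the path, query, and fragment. The function name `to_nitter` and the set of accepted Twitter hostnames are assumptions for this example; `nitter.net` is the instance used in the links above and can be swapped for another one.

```python
from urllib.parse import urlsplit, urlunsplit

# Hostnames treated as Twitter for this example (an assumption, not exhaustive).
TWITTER_HOSTS = {"twitter.com", "www.twitter.com", "mobile.twitter.com"}


def to_nitter(url: str, nitter_host: str = "nitter.net") -> str:
    """Return the Nitter equivalent of a Twitter URL; leave other URLs unchanged."""
    parts = urlsplit(url)
    if parts.hostname not in TWITTER_HOSTS:
        return url
    return urlunsplit(
        (parts.scheme, nitter_host, parts.path, parts.query, parts.fragment)
    )


if __name__ == "__main__":
    print(to_nitter("https://twitter.com/ArchiveBoxApp"))
    print(to_nitter("https://twitter.com/mitchellh/status/1615797167607939072"))
```

URLs on other hosts pass through untouched, so a rewrite like this could be applied to every incoming link before archiving.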

@pirate
Member

pirate commented Oct 20, 2023

FYI we already use Mercury (recently renamed Postlight) as an extractor, and they're rapidly adding extractors on their side for many different kinds of sites, so we should get these improvements with no effort required on the ArchiveBox side:

7 participants
@pirate @shimizurei @mAAdhaTTah @onemenzel @akmadian @jpaulickcz and others