This repository has been archived by the owner on May 31, 2023. It is now read-only.

Media Content #1

Closed
entr0p1 opened this issue May 10, 2021 · 1 comment

Comments

@entr0p1

entr0p1 commented May 10, 2021

Hi there,

First of all, thank you so much for developing what might be the most polished and scalable reddit archiver I've ever seen! I have more of a question than a problem, and figured this is the best way to ask in case anyone else has the same query.

How does your tool handle media content (e.g. images, videos, etc)? Are there links recorded?

It would be fantastic to be able to download and store the media as well in a future release (just off the top of my head you could use folders with the post ID and a generated UUID for each piece of media, then reference that in the DB).
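To make the folder idea concrete, here is a rough Python sketch of the layout. Everything in it is hypothetical: the `media_path` name, the `media` root directory, and the choice to keep the file extension from the URL are my own assumptions, not something the tool does today.

```python
import uuid
from pathlib import Path, PurePosixPath
from urllib.parse import urlparse


def media_path(root: Path, post_id: str, url: str) -> Path:
    """Build <root>/<post_id>/<uuid><ext> for one piece of media.

    Hypothetical helper: the directory layout and naming are assumptions.
    """
    # Keep the extension from the URL path if there is one (".jpg", ".mp4", or "").
    ext = PurePosixPath(urlparse(url).path).suffix
    # A random UUID avoids name clashes when one post has several media files.
    return root / post_id / f"{uuid.uuid4().hex}{ext}"


# Example: one file for post "abc123"; the resulting path is what the DB row would reference.
path = media_path(Path("media"), "abc123", "https://i.redd.it/example.jpg")
print(path)
```

The generated path (relative to the root) is the value that would be stored in the DB next to the post.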

There are some subreddits that are "at risk" and in need of archiving. At the moment we're just focusing on the text, but if we could get media as well that would be amazing.

Thanks again for this fantastic tool

@arshadrr
Owner

arshadrr commented May 10, 2021

Hi,

Yes, the links to media content are recorded, but right now the tool doesn't store the media itself. You can find details in schema.sql, which describes the output database (it's annotated with comments explaining the columns). I'll outline the relevant columns here; they're all in the posts table.

  1. When the is_self column is false, the post is not a self-post (a text-only post), which means it's either a link post (a post that links elsewhere, but not to another reddit post) or a crosspost (a post that links to another reddit post). What it points to is stored in the url column.
  2. When the url column doesn't point to another reddit post, the post isn't a crosspost, so it's a link post. It might link to some media (imgur, i.redd.it, v.redd.it, etc.) or to some other website (maybe a news article). This is the column where you'll find the links to media, whenever the post links to media rather than to some other website.
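The two rules above can be sketched in Python roughly like this. It's only an illustration: `classify_post` and both host sets are hypothetical names I made up here, the host lists are deliberately incomplete, and only the is_self and url columns come from schema.sql.

```python
from urllib.parse import urlparse

# Assumed, incomplete host lists; extend them for the subreddits you archive.
REDDIT_HOSTS = {"reddit.com", "www.reddit.com", "old.reddit.com"}
MEDIA_HOSTS = {"i.redd.it", "v.redd.it", "imgur.com", "i.imgur.com", "gfycat.com"}


def classify_post(is_self: bool, url: str) -> str:
    """Return 'self', 'crosspost', 'media' or 'link' for one row of the posts table."""
    if is_self:
        return "self"  # text-only post, nothing to follow
    host = (urlparse(url).hostname or "").lower()
    if host in REDDIT_HOSTS:
        return "crosspost"  # url points at another reddit post
    if host in MEDIA_HOSTS:
        return "media"  # link post whose target is hosted media
    return "link"  # link post to some other website


print(classify_post(False, "https://i.redd.it/example.jpg"))
```

Rows that come back as "media" are the ones whose url value you would hand to a downloader.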

Hope that helps, feel free to ask if you'd like me to elaborate/clarify.

Because reddit allows you to embed from all sorts of places (i.redd.it, v.redd.it, gfycat, imgur, etc.), I don't think there's a general solution to saving media, but I haven't really thought about it. Can you share what kind of media you're trying to save? I can try to look into it.

Also, thanks for the kind words, I really appreciate it :)
