Reddit Imgur Archive

Disclaimer

This is a personal project and is not affiliated with Imgur or Reddit.

This is a work in progress. I am not a professional programmer, so I am sure there are many things that could be done better. I am open to suggestions.

This was created in haste to automate my own personal archiving of Imgur albums. I am sharing it in case it is useful to anyone else.

It is not intended to be a polished, user-friendly application. It is a script that I run from the command line and keep changing on the fly, and it was not written with other users in mind.

I am sharing it as-is. I may update it from time to time, but I am not going to provide support for it. If you have questions, feel free to ask, but I may not be able to help you.

I am not responsible for any damage this script may cause to your computer or your data. Use at your own risk.

Description

This is a collection of functions cobbled together to collect archived Reddit submission data from the Pushshift API and compile a list of Imgur links from that data.

The links can then be downloaded with the script itself, or compiled in '.crawljob' files for use with JDownloader.
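To illustrate the link-compiling step, here is a minimal sketch. It assumes each submission is a dict with `url` and `selftext` fields, as in Reddit submission JSON. The function name `extract_imgur_urls` and the regular expression are hypothetical and are not taken from this repo's code.

```python
import re

# Illustrative pattern (an assumption, not the repo's own): matches links on
# imgur.com and its subdomains such as i.imgur.com.
IMGUR_URL_RE = re.compile(
    r"https?://(?:[a-z]+\.)?imgur\.com/[^\s)\]>\"']+", re.IGNORECASE
)


def extract_imgur_urls(submissions):
    """Return unique Imgur URLs from the submissions, in first-seen order."""
    seen = {}  # dict keys keep insertion order, so this doubles as an ordered set
    for submission in submissions:
        for field in ("url", "selftext"):
            text = submission.get(field) or ""  # tolerate missing or None fields
            for match in IMGUR_URL_RE.findall(text):
                # Drop trailing punctuation picked up from surrounding prose.
                seen.setdefault(match.rstrip(".,"), None)
    return list(seen)


if __name__ == "__main__":
    sample = [
        {"url": "https://i.imgur.com/abc123.jpg", "selftext": ""},
        {"url": "https://example.com/x", "selftext": "album: https://imgur.com/a/XyZ12, enjoy"},
    ]
    for link in extract_imgur_urls(sample):
        print(link)
```

Deduplicating while preserving order keeps the resulting list stable between runs, which makes it easier to compare archives over time.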

Dependencies

- Python 3.11
- Wastebin (for uploading lists of URLs that are then crawled by JDownloader)
- JDownloader2 (optional)
  - You can use the Docker image https://hub.docker.com/r/jlesage/jdownloader-2/ to run JDownloader in a container.

See requirements.txt for a list of required Python packages.

Usage

This is a collection of functions that can be used to collect Imgur links from archived Reddit submissions. It is not intended to be run as a standalone application.

Check the documentation for each function for more information.

One way to use this is to create a new Python file and import the functions you want to use.

See main.py for an example of how to use the functions.

Then run the file from the command line:

python main.py

NOTE: Many of the paths hardcoded in this repo presume that it runs in a devcontainer with the following volumes mounted:

- /app - This is where the code is stored
- /data - This is where the crawljob files and any other data that needs to be persisted are stored
- /folderwatch - This is where JDownloader watches for crawljob files

The devcontainer configuration is not yet included (I am still working on making a generic one).
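If you run the code outside such a container, one option is to make the directories overridable. The sketch below is only an assumption about how that could look. The environment variable names `ARCHIVE_DATA_DIR` and `ARCHIVE_FOLDERWATCH_DIR` and the helper `resolve_dir` are hypothetical and do not exist in this repo.

```python
import os
from pathlib import Path


def resolve_dir(env_var, default):
    """Return the directory named by env_var, or the default if it is unset."""
    return Path(os.environ.get(env_var, default))


# Defaults mirror the volume layout described above; the variable names are
# made up for this example.
DATA_DIR = resolve_dir("ARCHIVE_DATA_DIR", "/data")
FOLDERWATCH_DIR = resolve_dir("ARCHIVE_FOLDERWATCH_DIR", "/folderwatch")

if __name__ == "__main__":
    print(f"data: {DATA_DIR}")
    print(f"folderwatch: {FOLDERWATCH_DIR}")
```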

Workflow

My workflow while developing this script was:

1. Extract a list of subreddits from my multireddits using the function reddit.get_multireddit_subreddits()
2. Use the function reddit.archive_subreddit to archive the submissions from each subreddit
3. Use the function imgur.write_imgur_urls_from_subreddit_to_file to write the Imgur links to a file
4. Use the function imgur.create_crawljob_file_from_imgur_urls to create a crawljob file for JDownloader2
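The last step can be sketched roughly as follows. This is not the code behind imgur.create_crawljob_file_from_imgur_urls, whose signature is not shown in this README. The helper names `build_crawljob` and `write_crawljob` are hypothetical. The property names (`text`, `packageName`, `downloadFolder`, `enabled`, `autoStart`, `autoConfirm`) reflect my understanding of JDownloader's Folder Watch format, so check them against your JDownloader version.

```python
from pathlib import Path


def build_crawljob(urls, package_name, download_folder):
    """Render one key=value crawljob entry per URL, separated by blank lines."""
    entries = []
    for url in urls:
        entries.append(
            "\n".join(
                [
                    f"text={url}",
                    f"packageName={package_name}",
                    f"downloadFolder={download_folder}",
                    "enabled=TRUE",
                    "autoStart=TRUE",
                    "autoConfirm=TRUE",
                ]
            )
        )
    return "\n\n".join(entries) + "\n"


def write_crawljob(urls, package_name, download_folder, watch_dir):
    """Write <package_name>.crawljob into watch_dir and return its path."""
    watch_dir = Path(watch_dir)
    watch_dir.mkdir(parents=True, exist_ok=True)
    target = watch_dir / f"{package_name}.crawljob"
    target.write_text(
        build_crawljob(urls, package_name, download_folder), encoding="utf-8"
    )
    return target


if __name__ == "__main__":
    print(build_crawljob(["https://imgur.com/a/XyZ12"], "pics", "/data/pics"))
```

Keeping the rendering separate from the file write makes it possible to inspect the generated text before dropping it into the folder that JDownloader watches.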

Dragonatorul/reddit_imgur_archive

Folders and files

Latest commit

History

Repository files navigation

Reddit Imgur Archive

Disclaimer

Description

Dependencies

Usage

Workflow

About

Topics

Resources

License

Stars

Watchers

Forks

Languages