amoghmc/data_scraper_for_reddit

Data-scraper for Reddit

A data scraper for Reddit, built with Java and SQLite.

Table of Contents
  1. About The Project
  2. Getting Started
  3. Usage
  4. Contributing
  5. License
  6. Acknowledgments

About The Project

  • Data-Scraper for Reddit is a GUI-based tool that lets users scrape data from Reddit through Reddit's official API
  • The GUI is implemented with Java Swing, and SQLite is used to store user data
  • Features 10+ filters and a dedicated sorting panel
  • Results are filtered from up to 250 pages
  • Final results can be saved in CSV format
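
As an illustration of the CSV export mentioned above, the sketch below shows one way a result row could be escaped and joined, following the usual RFC 4180 quoting convention. The class `CsvSketch`, its methods, and the column names are hypothetical and are not taken from this project's source; the actual export code may work differently.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper for illustration only; not part of the project's code.
class CsvSketch {

    // Wrap a field in quotes if it contains a comma, a quote, or a line break,
    // and double any embedded quotes.
    static String escape(String field) {
        boolean needsQuotes = field.contains(",") || field.contains("\"")
                || field.contains("\n") || field.contains("\r");
        if (needsQuotes) {
            return "\"" + field.replace("\"", "\"\"") + "\"";
        }
        return field;
    }

    // Join the escaped fields of one row with commas.
    static String toCsvLine(List<String> fields) {
        return fields.stream().map(CsvSketch::escape).collect(Collectors.joining(","));
    }

    public static void main(String[] args) {
        // Assumed column names, chosen for the example.
        System.out.println(toCsvLine(List.of("title", "score", "num_comments")));
        System.out.println(toCsvLine(List.of("He said \"hi\", then left", "1200", "340")));
    }
}
```

Quoting only the fields that need it keeps the file readable in a text editor while still loading correctly in spreadsheet software.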

Built With

  • Java
  • SQLite
  • Apache Maven

Getting Started

Prerequisites

  • Java

Linux/Mac

You can install Java through Homebrew:

brew install java 

Windows

Download Java 18 from Oracle and follow the installer's instructions. Make sure you add the installation directory to your system path.

Installation

  1. Go to Reddit and create a free API app
  2. Choose a name, such as "dataScraper"
  3. Select "script" as the app type
  4. Leave the description and about URL blank
  5. Use http://localhost:8080 as the redirect URL
  6. Create the app
  7. Download the latest jar file from Releases
  8. Move the jar file to a new folder
  9. Run it from the terminal with the following command:
    java -jar RedditDataScraper-[VERSION].jar 
  10. Click on "Register a new account"
  11. Enter your details, using the client_id and client_secret from your API app
  12. For more details, see OAuth2
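
To show where the client_id and client_secret end up, here is a minimal sketch of the token request that Reddit documents for "script" apps: a POST to the access token endpoint with HTTP Basic authentication and a password grant. Whether this tool uses exactly this flow internally is an assumption; the class `TokenRequestSketch`, its methods, the User-Agent string, and the placeholder credentials are all hypothetical. The example only builds the request and does not send it.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical sketch; not taken from the project's source.
class TokenRequestSketch {

    static final String TOKEN_URL = "https://www.reddit.com/api/v1/access_token";

    // HTTP Basic credentials: base64("client_id:client_secret").
    static String basicAuth(String clientId, String clientSecret) {
        String pair = clientId + ":" + clientSecret;
        return "Basic " + Base64.getEncoder().encodeToString(pair.getBytes(StandardCharsets.UTF_8));
    }

    // Form body for the password grant used by script apps.
    static String formBody(String username, String password) {
        return "grant_type=password"
                + "&username=" + URLEncoder.encode(username, StandardCharsets.UTF_8)
                + "&password=" + URLEncoder.encode(password, StandardCharsets.UTF_8);
    }

    static HttpRequest build(String clientId, String clientSecret, String username, String password) {
        return HttpRequest.newBuilder(URI.create(TOKEN_URL))
                .header("Authorization", basicAuth(clientId, clientSecret))
                .header("Content-Type", "application/x-www-form-urlencoded")
                // Reddit asks API clients to send a descriptive User-Agent; this value is made up.
                .header("User-Agent", "dataScraper-sketch/0.1")
                .POST(HttpRequest.BodyPublishers.ofString(formBody(username, password)))
                .build();
    }

    public static void main(String[] args) {
        // Placeholder credentials; replace with the values from your own API app.
        HttpRequest request = build("my_client_id", "my_client_secret", "my_user", "my_password");
        System.out.println(request.method() + " " + request.uri());
        // To actually send it, pass the request to java.net.http.HttpClient#send.
    }
}
```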

Usage

Examples:

  • Selecting all posts whose title contains the keywords "putin" and "ukraine" from the subreddit "worldnews"

example_1 CSV Result

  • Selecting all posts from the subreddit "Coronavirus" that contain numbers in the title. Results must also have a minimum score of 1000 and a minimum comment count of 100. Results are finally sorted by title.

example_1 CSV Result

  • Selecting all controversial posts from all subreddits.

example_1 CSV Result
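
The second example above (numbers in the title, minimum score, minimum comment count, sorted by title) can be expressed as a short stream pipeline. This is only a sketch of the filtering logic under assumed names: the `Post` record, the `select` method, and the sample data are hypothetical and do not reflect how the tool models posts internally.

```java
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical sketch of the filter and sort steps; not the project's code.
class FilterSketch {

    // Assumed minimal post model (records need Java 16 or newer).
    record Post(String title, int score, int numComments) {}

    static List<Post> select(List<Post> posts, int minScore, int minComments) {
        return posts.stream()
                // Keep posts whose title contains at least one digit.
                .filter(p -> p.title().chars().anyMatch(Character::isDigit))
                .filter(p -> p.score() >= minScore)
                .filter(p -> p.numComments() >= minComments)
                // Sort the remaining posts alphabetically by title.
                .sorted(Comparator.comparing(Post::title))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Made-up sample data.
        List<Post> posts = List.of(
                new Post("Vaccine rollout reaches 50 countries", 2400, 310),
                new Post("New guidance published", 5000, 900),
                new Post("Cases drop 12% this week", 1500, 120),
                new Post("7 things to know", 800, 400));
        select(posts, 1000, 100).forEach(p -> System.out.println(p.title()));
    }
}
```

Chaining one `filter` call per criterion mirrors the idea of independent filters in the GUI: each can be added or dropped without touching the others.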

Contributing

Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.

If you have a suggestion that would make this better, please fork the repo and create a pull request. You can also simply open an issue with the tag "enhancement". Don't forget to give the project a star! Thanks again!

  1. Fork the Project
  2. Create your Feature Branch (git checkout -b feature/AmazingFeature)
  3. Commit your Changes (git commit -m 'Add some AmazingFeature')
  4. Push to the Branch (git push origin feature/AmazingFeature)
  5. Open a Pull Request

License

Distributed under the MIT License. See LICENSE.md for more information.

Acknowledgments
