Scrapes webpages in (almost) any web format, then sends discord notifications based on extracted data and customizable logic. Node/Typescript/Discord.js, Redis DB

EllAchE/discord_site_monitor

Discord Site Monitor and Parser

Scrapes webpages in (almost) any web format, then sends discord notifications based on extracted data and customizable logic.


Credit

Some code borrowed from Noel Vissers' "Site-watcher" project.

Features

  • Supports a variety of formats, including RSS, PDF, HTML, JSON and more.
  • Extracts specific values from a site and triggers alerts based on those values.
  • Tracks multiple sites at once.
  • Checks on a specified interval (cron job, currently configured in the source code).
  • Tracked sites can be updated by editing the underlying JSON or via bot commands.
  • Open source!

Setup

  1. Create a Discord bot at discord.com/developers/applications. A tutorial can be found here.
  2. Install the npm packages and compile the TypeScript project.

Configuring the bot:

  1. Open the .env file.
  2. Add DISCORDJS_BOT_TOKEN= followed by your Discord bot's token. You can get the token from discord.com/developers/applications.
  3. To change the command prefix (default "s!"), edit ./src/types.ts (export const PREFIX = 's!';). Note that the command examples in this README are written with "!".
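
As an illustration of how a prefix like PREFIX is commonly applied to incoming messages, here is a minimal sketch. The parseCommand helper and the ParsedCommand shape are hypothetical and not taken from the project source; the real message handling may differ.

```typescript
// Hypothetical helper, for illustration only.
const PREFIX = 's!'; // same default value as in ./src/types.ts

interface ParsedCommand {
  name: string;   // command name without the prefix, e.g. "add"
  args: string[]; // remaining whitespace-separated tokens
}

// Returns null when the message does not start with the prefix
// or contains nothing after it.
function parseCommand(content: string, prefix: string = PREFIX): ParsedCommand | null {
  if (!content.startsWith(prefix)) return null;
  const tokens = content.slice(prefix.length).trim().split(/\s+/);
  const name = tokens[0];
  if (!name) return null;
  return { name: name.toLowerCase(), args: tokens.slice(1) };
}
```

This sketch splits on whitespace only, so a quoted argument such as "body > div > h1" would need extra handling.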

Usage

  1. Invite the bot to your Discord server by replacing 123456789012345678 in the following link with your bot's client ID: https://discord.com/oauth2/authorize?client_id=123456789012345678&scope=bot&permissions=8.
  2. Create a site config file at src/json/sites.json, following the example in sample-sites.json. Note: since the original version of this bot, major refactors have changed what constitutes a valid site file. JSON arrays must now be saved as strings so that Redis can store and retrieve them properly. Redis arrays are not used, so that numeric indices are preserved wherever they are set.
  3. Run the bot (with node). As an alternative to editing sites.json, add a website with the !add <URL> command (still needs to be implemented).
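
The note about saving JSON arrays as strings can be sketched as follows. This is an assumption about the general approach, not the project's actual storage code: toRedisFields, arrayFromField and the selectors field are hypothetical names.

```typescript
// Hypothetical helpers; the project's real Redis layer is not shown in this README.
type StoredFields = Record<string, string>;

// Flattens a site object into string fields. Non-string values
// (arrays, null, numbers, booleans) are stored as JSON text.
function toRedisFields(site: Record<string, unknown>): StoredFields {
  const out: StoredFields = {};
  for (const key of Object.keys(site)) {
    const value = site[key];
    if (value === undefined) continue; // nothing to store
    out[key] = typeof value === 'string' ? value : JSON.stringify(value);
  }
  return out;
}

// Reads an array back from its stringified form; numeric indices are preserved.
function arrayFromField(field: string): unknown[] {
  const parsed: unknown = JSON.parse(field);
  return Array.isArray(parsed) ? parsed : [];
}
```

Storing the array as one string keeps element positions intact, which matches the stated reason for avoiding Redis arrays.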

For all other options, see Commands.

Commands

!help

Show all the available commands.

Parameters
None.


!add <URL>

Adds a website to the list.

Parameters
Required:
URL The URL of the site you want to track.

Example
!add https://google.com/ This tracks changes on https://google.com/.
Note that some sites, including google.com, have dynamic elements (like ads) that cause a change every time the site is checked. To filter out these dynamic elements, pass a CSS selector as a second parameter.

!add https://example.com/ "body > div > h1" This tracks changes in the h1 heading of the site https://example.com/.


!remove <INDEX>

Removes a website from the list.

Parameters
Required:
INDEX The index of the site you want to remove. Use !list to see the number of each site. Note: the numbers shown by !list are 1-indexed, but !remove expects a zero-indexed value, so subtract 1 from the listed number.

Example
!remove 0 This removes the first site in the list (!list).
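
The off-by-one between !list and !remove can be captured in a small conversion. The toRemoveIndex helper below is hypothetical, a sketch of the arithmetic rather than code from the bot.

```typescript
// Hypothetical helper: converts the 1-based number shown by !list
// into the 0-based index that !remove expects.
function toRemoveIndex(listNumber: number, siteCount: number): number {
  if (!Number.isInteger(listNumber) || listNumber < 1 || listNumber > siteCount) {
    throw new RangeError(`List number must be an integer between 1 and ${siteCount}`);
  }
  return listNumber - 1;
}
```

For example, the site shown as number 1 by !list is removed with !remove 0.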


!list

Sends the list of websites being watched.

Parameters
None.

Example
!list This sends the list of websites being watched.


!listv

Sends a verbose message with details for each of the websites being watched.

Parameters
None.

Example
!listv This sends a verbose list of websites being watched. The verbose output includes the full JSON configuration for each site.


!update

Manually updates the sites that are being watched.

Parameters
None.

Example
!update This manually updates the sites that are being watched.
If a site is updated, it will push the standard update message to the default update channel.


!interval <MINUTES>

Set the interval/refresh rate of the watcher. Default 5 minutes.

Parameters
MINUTES The interval in minutes (minimum of 1, maximum of 60).

Example
!interval 10 Sets the interval to 10 minutes.
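
Since the watcher is scheduled with cron, an interval in minutes has to become a cron expression at some point. The sketch below shows one way to do that under the stated 1 to 60 minute limits; intervalToCron is a hypothetical name, and the bot's actual scheduling code may build its schedule differently.

```typescript
// Hypothetical helper: maps an interval in minutes to a standard
// five-field cron expression (minute hour day-of-month month day-of-week).
function intervalToCron(minutes: number): string {
  if (!Number.isInteger(minutes) || minutes < 1 || minutes > 60) {
    throw new RangeError('Interval must be an integer between 1 and 60 minutes');
  }
  // The minute field only covers 0-59, so 60 minutes is written as "at minute 0 of every hour".
  if (minutes === 60) return '0 * * * *';
  return `*/${minutes} * * * *`;
}
```

A step such as */7 restarts at minute 0 of each hour, so intervals that do not divide 60 evenly produce one shorter gap per hour.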


!start

Starts the watcher with the specified interval (the watcher is on by default, with an interval of 5 minutes).
Scheduling uses cron.

Parameters
None.

Example
!start This starts the watcher with the specified interval.


!stop

Stops the watcher from automatically checking the tracked websites. Watcher can be resumed with !start.

Parameters
None.

Example
!stop This stops the watcher from automatically checking the tracked websites.

Contribute

This project is not actively maintained, but if you think of some interesting use cases, let me know and we can see about collaborating.

License

This project is licensed under the MIT License. See the LICENSE file for details.

Creating a config/sites file

A config/site object looks something like this:

{
  "id": "jobless",
  "url": "https://www.dol.gov/ui/data.pdf",
  "contentSelector": "body",
  "lastChecked": "7/22/2021, 3:09:44 AM",
  "lastUpdated": "7/22/2021, 3:09:44 AM",
  "regex": "(?<= initial claims was ).*(?=, a ..crease)",
  "hash": "412adf44f97b7ac387ae276edbd1b8c3",
  "match": "360,000",
  "sendAnyChange": "true",
  "index": null,
  "format": "pdf"
}

The arguments to care about are:

  • id: Identifier for the site that is being tracked.
  • url: The address of the data source.
  • contentSelector: Used for CSS queries, retrieving nested JSON, or whatever is appropriate for the case that you've defined.
  • regex: Used to clean up the string retrieved by the earlier logic.
  • sendAnyChange: If true, sends the result of any update (yes or no). If false, sends an alert only if the designated condition is met. You would have to define a null return (or false/undefined) for the failure condition to avoid triggering the send.
  • index: When format is cssall, selects the nth matched element.
  • format: Options are currently json, pdf, rss, css and cssall. They are what they say; cssall simply indicates that the query selector will return multiple elements, so index is needed.
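
To make the interplay of regex, match and sendAnyChange more concrete, here is a minimal sketch using the sample values above. The SiteConfig subset, extractMatch and shouldNotify are hypothetical names, and the rule that an unchanged value never triggers a notification is an assumption, not confirmed behavior of the bot.

```typescript
// Hypothetical subset of the config object shown above.
interface SiteConfig {
  id: string;
  regex: string;         // applied to the text extracted from the source
  match: string | null;  // value found on the previous check
  sendAnyChange: string; // stored as the string "true" or "false" in the sample
}

// Applies the configured regex and returns the first match, or null if none.
function extractMatch(text: string, regex: string): string | null {
  const found = new RegExp(regex).exec(text);
  return found?.[0] ?? null;
}

// Assumed decision rule: no notification if nothing changed; otherwise
// notify on any change when sendAnyChange is "true", and only on a
// non-null result when it is not.
function shouldNotify(site: SiteConfig, newMatch: string | null): boolean {
  if (newMatch === site.match) return false;
  if (site.sendAnyChange === 'true') return true;
  return newMatch !== null;
}
```

With the sample regex, a text containing "initial claims was 360,000, a decrease" yields the match "360,000", which is the value stored in the match field above.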

WIP: many features have been added or adapted since this page was last updated.
