Released v4.0: API-support, Database-caching, Improved models, Type-hints & More

- New: Using type-checking wherever possible, allowing for more object-oriented functionality, a better API and less spaghetti-code
- New: Get entry from database (e.g., by original_url), allowing for a new way to find canonicals from historic data. It's used as caching and as a back-up.
- New: UrlMeta model for URLs (meta-info).
- Tweak: The Link object now contains 4 objects: The amp_canonical (type: Canonical), canonical (type: Canonical), canonicals (type: array of Canonicals) and origin (type: UrlMeta)
- Tweak: Meta-redirect, guess-and-check and database canonical-finding methods will now only run if the canonical isn't found in the first run, which saves some resources and should improve the accuracy
- Tweak: Streamlined the way entries get saved to the database
- Tweak: Added default values for DEBUG_LEVEL and MAX_DEPTH in static.txt
- Tweak: Merged generate_reply functions for the bots and online
- Tweak: Only update a data file if the value to be added is new
- Tweak: Added canonical_type to database model and logic
- Tweak: Moved some files and methods around
- Tweak: Updated README.md, comment & DM templates, FUNDING.yml (GitHub Sponsor & crypto-options) etc.
- Fixed: Database services now allow for both boolean and NULL values
- Fixed: requirements.txt displayed the wrong version number for Tweepy
- Removed: Link properties: url_clean and url_clean_is_valid, canonical_alt, canonical_alt_domain and canonicals_solved have been removed or replaced with different solutions.

Signed-off-by: KilledMufasa <jsvanderburgh2@gmail.com>
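
The reworked `Link` model described in the changelog above could look roughly like the sketch below. Only the four member names and their types (`amp_canonical`, `canonical`, `canonicals`, `origin`) and the existence of a `canonical_type` value come from the commit message. All other fields (`url`, `domain`, `is_amp`) and the use of dataclasses are assumptions made for illustration, not the project's actual code.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class UrlMeta:
    # Meta-info about a URL; these field names are assumed, not taken from the repo
    url: Optional[str] = None
    domain: Optional[str] = None
    is_amp: Optional[bool] = None


@dataclass
class Canonical:
    # A candidate canonical; canonical_type is assumed to record which method found it
    url: Optional[str] = None
    canonical_type: Optional[str] = None
    is_amp: Optional[bool] = None


@dataclass
class Link:
    # The four members named in the changelog
    origin: Optional[UrlMeta] = None
    canonical: Optional[Canonical] = None
    amp_canonical: Optional[Canonical] = None
    canonicals: List[Canonical] = field(default_factory=list)


# Usage: record the original URL, then attach a canonical once one is found
link = Link(origin=UrlMeta(url="https://example.com/amp/story", domain="example.com", is_amp=True))
found = Canonical(url="https://example.com/story", canonical_type="rel", is_amp=False)
link.canonicals.append(found)
link.canonical = found
```

Keeping every candidate in `canonicals` while promoting one to `canonical` would let calling code such as `run_bot()` test `link.canonical` or `link.amp_canonical` without re-running the finding logic.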
KilledMufasa committed Nov 8, 2021
1 parent aa49e54 commit 0725d6d
Showing 30 changed files with 723 additions and 574 deletions.
12 changes: 9 additions & 3 deletions .github/FUNDING.yml
@@ -1,5 +1,11 @@
# Hi there! The bot and website cost approximately €8.26 a month to host and while that might not seem like much,
# it adds up. All donations will ONLY be used to pay for hosting. You can specify any amount you want, but please
# keep in mind that I only want to try to cover some of the costs. Thank you so much!
# The bots, website and API cost about 10 euros (12 dollars) per month to host.
# I will use all donations strictly to break even.
# You can donate any amount via PayPal, GitHub Sponsors or with crypto (see comments below). Thank you so much!

# Bitcoin (BTC): 1GsspnGwbaXfMP2P6t9Hr5oQwCYZdsPHr
# Cardano (ADA): DdzFFzCqrht1gHfopZ7ddXfJFz9tXkhQERc6dzfP71Ve9NoJYk4jQ1wtW1LNCWokMPoDZ7xr7YvHqvt82tG3MsEukkkcQyvUxrwjLWqx
# Dogecoin (DOGE): D8T2QaHiyUSNRvbu2D4L1W44Ge8NPtpPgy
# BNB, ETH & Binance Smart Chain (BEP20): 0xa705c939c7537984f41e0ad07c5dc3e60ca53691

github: KilledMufasa
custom: https://www.paypal.com/cgi-bin/webscr?cmd=_s-xclick&hosted_button_id=EU6ZFKTVT9VH2
90 changes: 52 additions & 38 deletions README.md
@@ -1,56 +1,70 @@
![#AmputatorBot](/img/amputatorbot_logo_banner.png)

TL;DR: AmputatorBot is a Reddit bot that replies to comments and submissions containing AMP URLs with the canonical URL.
TL;DR: [AmputatorBot](https://github.com/KilledMufasa/AmputatorBot) is a highly-specialised [Reddit](https://www.reddit.com/user/AmputatorBot)
and [Twitter](https://twitter.com/AmputatorBot) bot that automatically replies to comments, submissions
and tweets containing AMP URLs with the canonical link(s). It's also available as a
[website](https://www.amputatorbot.com/) and [REST API](https://documenter.getpostman.com/view/12422626/UVC3kTs2), but these aren't open sourced here.

**[FAQ, About & Why](https://www.reddit.com/r/AmputatorBot/comments/ehrq3z/why_did_i_build_amputatorbot/)**
[**FAQ, About & Why**](https://www.reddit.com/r/AmputatorBot/comments/ehrq3z/why_did_i_build_amputatorbot/)

## Features

![#AmputatorBot demo](/img/amputatorbot_demo.png)

Features include:
- Automatically create required log and datafiles
- Scan for comments, submissions, mentions or tweets
- Check each item against specified criteria
- Strip URLs of artifacts
- Check URLs for AMP links
- Find canonicals using 9 different methods
- Calculate which canonical is 'best'
- Return 2 canonicals if the canonicals are from more than 1 domain
- Return an AMP-canonical if the real canonical can't be found
- Generate and send automatic replies to AMP items with the canonical(s) and some info
- Automatically keep track of bans and contributor statuses
- Keep track of items interacted with
- Let users opt-out and opt-back-in
- Send DMs when summoned by users
- Log both locally and to a MySQL database

Please note:
- AmputatorBot works automatically in a select number of subreddits
- AmputatorBot won't work in subreddits where it is banned or forbidden
- The online version of AmputatorBot can be found at [AmputatorBot.com](https://www.amputatorbot.com/)
- You can find the changelog [here](https://www.reddit.com/r/AmputatorBot/comments/ch9fxp/changelog_of_amputatorbot/)!
### Main features:
- **10 specialised canonical-finding methods, allowing for an accuracy rate of over 97%**. For example, by:
  - Scanning the HTML contents
  - Detecting and following redirects
  - Guessing, and then checking article similarity with [newspaper](https://github.com/codelucas/newspaper/)
  - … and many more!
- Detect AMP links using 14 patterns, and reply to items containing them with the canonical link and some info
- Compare and test canonicals and pick the best
- Stream Reddit comments, submissions, inbox messages and Tweets
- Extensively tested using a (private) database of over 140,000 AMP links and their canonicals, which also serves as a cache
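
As an illustration of the "scanning the HTML contents" method listed above, here is a minimal sketch that looks for a `<link rel="canonical">` tag, which AMP pages normally declare in their `<head>`. It uses only the standard library's `html.parser`. The bot's real implementation is not part of this diff, so the class and function names below are hypothetical.

```python
from html.parser import HTMLParser
from typing import Optional


class CanonicalLinkParser(HTMLParser):
    """Collects the href of the first <link rel="canonical"> tag (hypothetical helper)."""

    def __init__(self) -> None:
        super().__init__()
        self.canonical: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        # Ignore everything except <link> tags, and stop after the first match
        if tag != "link" or self.canonical is not None:
            return
        attributes = dict(attrs)
        # rel may hold several space-separated values, or no value at all
        rel_values = (attributes.get("rel") or "").lower().split()
        if "canonical" in rel_values and attributes.get("href"):
            self.canonical = attributes["href"]


def find_rel_canonical(html: str) -> Optional[str]:
    """Return the canonical URL declared in the HTML, or None if there is none."""
    parser = CanonicalLinkParser()
    parser.feed(html)
    return parser.canonical


sample = '<html amp><head><link rel="canonical" href="https://example.com/story"></head></html>'
print(find_rel_canonical(sample))  # prints https://example.com/story
```

A declared canonical can itself be another AMP URL, which is presumably why the feature list also mentions comparing and testing candidates before picking one.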

### Nice bonuses:
- Detect unique URLs with [URLExtract](https://github.com/lipoja/URLExtract) and strip them of any artifacts
- Object-oriented, allowing for a handy, free and publicly available API
- Allow users to opt out and undo this
- Send DMs when summoned by a user

### Good to know:
- Bans, contributor statuses and items interacted with are tracked automatically
- Subreddits need to opt in to AmputatorBot
- Log and datafiles are automatically generated

### See also:
- Online version (recommended): [AmputatorBot.com](https://www.amputatorbot.com/)
- Free and publicly available REST API to convert AMP URLs to canonical links: [API Documentation](https://documenter.getpostman.com/view/12422626/UVC3kTs2) & [Postman](https://www.postman.com/amputatorbot)
- User-oriented, simplified changelog: [Changelog](https://www.reddit.com/r/AmputatorBot/comments/ch9fxp/changelog_of_amputatorbot/)
- Community & Subreddit: [r/AmputatorBot](https://www.reddit.com/r/AmputatorBot/)
## Set up

1. Clone the repository
2. Run `pip install -r requirements.txt` to install dependencies
3. Fill in and change the required values in static.txt (see /static)
4. Change the filename of static.txt to .py
5. Choose which script(s) you want to run (check_comments.py, check_inbox.py or check_submissions.py)
6. Set the 'settings' in the run_bot function of the script. Set everything (guess_and_check, reply_to_post, write_to_database) to False when starting out.
7. Run the script - All logs and required datafiles will be automatically and dynamically created. In /data: allowed_subreddits.txt, comments_failed.txt, comments_success.txt, disallowed_mods.txt, disallowed_subreddits.txt, disallowed_users.txt, mentions_failed.txt, mentions.success.txt, np_subreddits.txt, problematic_domains.txt, submissions_failed.txt, submissions_success.txt and in /logs: check_comments_X.X.log, check_inbox_X.X.log and check_submissions_X.X.log.
8. Stop the script to see and edit the newly generated files. Odds are you want to add subreddits to allowed_subreddits.txt (for example: ,subreddit1,subreddit2)
9. Re-run the script
3. Change the filename of `static.txt` to `.py` (see `/static`)
4. Configure the application by tweaking `static.py` (required)
5. Choose which `check_[...].py` script to run
6. Configure the script's settings in `run_bot()`. Set everything (`use_gac`, `reply_to_item`, `save_to_database`) to `False` when starting out. Consider deleting or disabling the database canonical method.
7. Run the script. All logs and required datafiles should be created automatically.
8. Stop the script.
9. Check out the new files in `/data` and edit them to your liking. Odds are you want to add subreddits to `allowed_subreddits.txt` (e.g.: `,sub1,sub2`)
10. Re-run the script
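
The data files in `/data` store comma-separated values with a leading comma, as in the `,sub1,sub2` example above. The sketch below shows one way such a file could be read and appended to, skipping values that are already present, in the spirit of the `unique=True` argument that this commit passes to `update_local_data`. The helper names and the temporary directory are assumptions; the project's own data handler is not shown in this diff.

```python
import tempfile
from pathlib import Path
from typing import List


def read_values(path: Path) -> List[str]:
    """Return the non-empty comma-separated values stored in a data file."""
    if not path.exists():
        return []
    raw = path.read_text(encoding="utf-8")
    return [value.strip() for value in raw.split(",") if value.strip()]


def append_value(path: Path, value: str, unique: bool = True) -> bool:
    """Append ',value' to the file; skip it when unique is set and the value already exists."""
    if unique and value in read_values(path):
        return False
    with path.open("a", encoding="utf-8") as handle:
        handle.write(f",{value}")
    return True


# Usage with a throwaway file standing in for data/allowed_subreddits.txt
data_file = Path(tempfile.mkdtemp()) / "allowed_subreddits.txt"
append_value(data_file, "sub1")
append_value(data_file, "sub2")
append_value(data_file, "sub1")  # already present, so nothing is written
print(data_file.read_text(encoding="utf-8"))  # prints ,sub1,sub2
```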

## Support the project

**.. By summoning the bot**: If you've spotted an AMP URL on Reddit and [u/AmputatorBot](https://www.reddit.com/u/AmputatorBot/) seems absent, you can summon the bot by mentioning [u/AmputatorBot](https://www.reddit.com/u/AmputatorBot/) in a reply to the comment or submission containing the AMP URL. You'll receive a confirmation through PM. For more details, check out [this post](https://www.reddit.com/r/AmputatorBot/comments/cchly3/you_can_now_summon_amputatorbot/)!

**.. By giving feedback**: Most of the new features were made after suggestions from you guys, so hit me up if you have any feedback! You can contact me on Reddit, [fill an issue](https://github.com/KilledMufasa/AmputatorBot/issues) or [make a pull request](https://github.com/KilledMufasa/AmputatorBot/issues).

**.. By sponsoring**: The bot and website cost approximately €8.26 a month to host and while that might not seem like much, it adds up. All donations will be used ONLY to pay for hosting. You can specify any amount you want, but please keep in mind that I only want to try to cover some of the costs. Thank you so much! - [https://www.paypal.com/cgi-bin/webscr?cmd=_s-xclick&hosted_button_id=EU6ZFKTVT9VH2](https://www.paypal.com/cgi-bin/webscr?cmd=_s-xclick&hosted_button_id=EU6ZFKTVT9VH2)
- **Summon AmputatorBot** on Reddit, like so: [u/AmputatorBot](https://www.reddit.com/u/AmputatorBot/). For more info, [see here](https://www.reddit.com/r/AmputatorBot/comments/cchly3/you_can_now_summon_amputatorbot/).
- **Give feedback**: Most new features and improvements are directly influenced by your feedback, so hit me up if you have any. [Contact me on Reddit](https://www.reddit.com/message/compose/?to=Killed_Mufasa) or [file an issue](https://github.com/KilledMufasa/AmputatorBot/issues).
- **Star**: By starring the project here on GitHub, we can reach more folks and unlock new options. It also gives me something to brag about :p
- **Contribute**: [Pull requests](https://github.com/KilledMufasa/AmputatorBot/issues) are a great way to contribute directly to the code and functionality.
- **Spread the word**: In the end, the only goal of AmputatorBot is to allow people to have an informed choice. You can help by simply spreading the word!

**.. By spreading the word**: In the end, the only goal of AmputatorBot is to allow people to have an informed choice. You can help by spreading the word in whatever way you deem the most appropriate.
### Sponsor
The bots, website and API cost about 10 euros (12 dollars) per month to host. I will use all donations strictly to break even. You can donate any amount via PayPal or with crypto. Thank you so much!
- **PayPal**: [https://www.paypal.com/cgi-bin/webscr?cmd=_s-xclick&hosted_button_id=EU6ZFKTVT9VH2](https://www.paypal.com/cgi-bin/webscr?cmd=_s-xclick&hosted_button_id=EU6ZFKTVT9VH2)
- **Bitcoin (BTC)**: 1GsspnGwbaXfMP2P6t9Hr5oQwCYZdsPHr
- **Cardano (ADA)**: DdzFFzCqrht1gHfopZ7ddXfJFz9tXkhQERc6dzfP71Ve9NoJYk4jQ1wtW1LNCWokMPoDZ7xr7YvHqvt82tG3MsEukkkcQyvUxrwjLWqx
- **Dogecoin (DOGE)**: D8T2QaHiyUSNRvbu2D4L1W44Ge8NPtpPgy
- **BNB, ETH & Binance Smart Chain (BEP20)**: 0xa705c939c7537984f41e0ad07c5dc3e60ca53691

**From the bottom of my heart, thank you so much for the tremendous support you've given me and AmputatorBot <3**
**From the bottom of my heart, huge thanks for the tremendous support! <3**
27 changes: 9 additions & 18 deletions check_comments.py
@@ -18,26 +18,25 @@

import sys
import traceback
from datetime import datetime
from time import sleep

from prawcore import Forbidden

from datahandlers.local_datahandler import update_local_data
from datahandlers.remote_datahandler import add_data, get_engine_session
from datahandlers.remote_datahandler import save_entry
from helpers import logger
from helpers.comment_generator import generate_reply
from helpers.criteria_checker import check_criteria
from helpers.utils import get_urls, get_urls_info, check_if_banned
from helpers.reddit.reddit_comment_generator import generate_reply
from helpers.reddit.reddit_utils import check_if_banned
from helpers.utils import get_urls_info, get_urls
from models import stream
from models.item import Item
from models.type import Type

log = logger.get_log(sys)


# Run the bot
def run_bot(type=Type.COMMENT, guess_and_check=False, reply_to_item=True, write_to_database=True):
def run_bot(type=Type.COMMENT, use_gac=False, reply_to_item=True, save_to_database=True):
# Get the stream instance (contains session, type and data)
s = stream.get_stream(type)
log.info("Set up new stream")
@@ -68,16 +67,16 @@ def run_bot(type=Type.COMMENT, guess_and_check=False, reply_to_item=True, write_
log.info(f"{i.id} in r/{i.subreddit} meets criteria")
# Get the urls from the body and try to find the canonicals
urls = get_urls(i.body)
i.links = get_urls_info(urls, guess_and_check)
i.links = get_urls_info(urls, use_gac)

# If a canonical was found, generate a reply, otherwise log a warning
if any(link.canonical for link in i.links) or any(link.amp_canonical for link in i.links):
# Generate a reply
reply_text, reply_canonical_text = generate_reply(
links=i.links,
stream_type=s.type,
np_subreddits=s.np_subreddits,
item_type=i.type,
links=i.links,
subreddit=i.subreddit)

# Try to post the reply
@@ -97,7 +96,7 @@ def run_bot(type=Type.COMMENT, guess_and_check=False, reply_to_item=True, write_
# Check if AmputatorBot is banned in the subreddit
is_banned = check_if_banned(i.subreddit)
if is_banned:
update_local_data("disallowed_subreddits", i.subreddit)
update_local_data("disallowed_subreddits", i.subreddit, unique=True)
s.disallowed_subreddits.append(i.subreddit)

# If no canonicals were found, log the failed attempt
@@ -106,15 +105,7 @@ def run_bot(type=Type.COMMENT, guess_and_check=False, reply_to_item=True, write_
update_local_data("comments_failed", i.id)
s.comments_failed.append(i.id)

# If write_to_database is enabled, make a new entry for every URL
if write_to_database:
for link in i.links:
if link.is_amp:
add_data(session=get_engine_session(),
entry_type=type.value,
handled_utc=datetime.now().strftime('%Y-%m-%d %H:%M:%S'),
original_url=link.url_clean,
canonical_url=link.canonical)
save_entry(save_to_database=save_to_database, entry_type=type.value, links=i.links)


while True:
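
The last hunk replaces the inline database loop with a single `save_entry(...)` call. The body of `save_entry` is not part of this diff, so the following is only a sketch of what such a helper might do, modelled on the removed loop: skip everything when saving is disabled, then write one row per AMP link. The `LinkRow` class, the explicit `connection` parameter, the `entries` table and the use of SQLite instead of the project's MySQL database are all assumptions for the sake of a runnable example.

```python
import sqlite3
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, Optional


@dataclass
class LinkRow:
    # Hypothetical, flattened stand-in for the project's Link model
    original_url: str
    is_amp: bool
    canonical_url: Optional[str] = None


def save_entry(connection: sqlite3.Connection, save_to_database: bool,
               entry_type: str, links: Iterable[LinkRow]) -> int:
    """Insert one row per AMP link and return the number of rows written."""
    if not save_to_database:
        return 0
    # Same timestamp format as the removed loop in check_comments.py
    handled_utc = datetime.now().strftime('%Y-%m-%d %H:%M:%S')
    rows = [(entry_type, handled_utc, link.original_url, link.canonical_url)
            for link in links if link.is_amp]
    connection.executemany(
        "INSERT INTO entries (entry_type, handled_utc, original_url, canonical_url) "
        "VALUES (?, ?, ?, ?)", rows)
    connection.commit()
    return len(rows)


# Usage with an in-memory SQLite database (assumed schema)
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE entries (entry_type TEXT, handled_utc TEXT, "
                   "original_url TEXT, canonical_url TEXT)")
links = [LinkRow("https://example.com/amp/story", True, "https://example.com/story"),
         LinkRow("https://example.com/plain", False)]
written = save_entry(connection, save_to_database=True, entry_type="comment", links=links)
```

Moving the `save_to_database` check into the helper keeps each `check_*.py` script down to one call, which matches the "Streamlined the way entries get saved to the database" item in the commit message.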