spamming (possibly due to rss feed url redirect) #86

Closed
plowsof opened this issue Jul 15, 2024 · 15 comments

plowsof commented Jul 15, 2024

After hosting a nostrss instance without issue, I ran into a serious spam problem, visible here under "number of posts" (503k):
https://primal.net/p/5ccf00d8a2b98785a14026e2891c7ec814eb68510dc7f3c5f57629562ca0eeea

I am hosting it for a third party, so nothing changed on my end other than the RSS URL nostrss points at, which began redirecting to another URL:

https://www.revuo-xmr.com/atom.xml -> https://www.revuo-xmr.com/index.xml

Is there a way for me to debug what's going on? For example, by running it in a dry-run mode?

@rottenwheel

We also have this open issue with Primal Caching service for what it is worth.

Asone (Owner) commented Jul 16, 2024

Hi!

First of all, thanks for using nostrss!

What you describe looks a bit odd at first glance. In nostrss, the cache mechanism is the following: each feed gets a job with a UUID, and we create a map for this job in which we store the feed item IDs (which, in the raw RSS feed, can be the guid).
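Roughly, the idea looks like this (an illustrative sketch with hypothetical names, not nostrss's actual types):

```rust
use std::collections::{HashMap, HashSet};

// Hypothetical model of the cache described above.
struct FeedCache {
    // job UUID -> ids (guids) of items already broadcast for that feed
    entries: HashMap<String, HashSet<String>>,
}

impl FeedCache {
    // Returns true if the item was unseen, i.e. it should be broadcast.
    fn mark_seen(&mut self, job_uuid: &str, item_id: &str) -> bool {
        self.entries
            .entry(job_uuid.to_string())
            .or_default()
            .insert(item_id.to_string())
    }
}

fn main() {
    let mut cache = FeedCache { entries: HashMap::new() };
    assert!(cache.mark_seen("job-uuid", "guid-1"));  // new item -> broadcast
    assert!(!cache.mark_seen("job-uuid", "guid-1")); // already cached -> skip
}
```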

Nostrss does not currently have a dry-run mode, but give me a day or two and I'll release a version with this option so we can perform checks on your case.

Meanwhile, if you have a local relay, you can configure nostrss to use only that relay and run the program in DEBUG mode. The logs will be far more verbose (maybe too much?), but they might provide some hints about what is happening.

If you could share the config you used for the feed, so I can also check the cron pattern on your instance, that'd be great. (Do not forget to remove any sensitive data like private keys.)

Asone added the "bug", "enhancement" and "good first issue" labels Jul 16, 2024
Asone self-assigned this Jul 16, 2024
Asone (Owner) commented Jul 16, 2024

Also, for what it's worth, you can check the stats for the account profile here.

It says the account has published 33 posts.

Note also that, in case of spam, most relays have:

  • a limit on the number of events they will process in parallel
  • an event-publishing rate limit, included to avoid DDoS

When getting spammed, most relays will simply ignore the events received from the spamming source.

Asone removed the "enhancement" label Jul 16, 2024
Asone (Owner) commented Jul 17, 2024

@rottenwheel @plowsof: after investigating a bit, it seems the problem comes from the cache size used for the feed.

The program lets you define a DEFAULT_CACHE_SIZE env var which, if not provided, falls back to 100 items to be mapped in the program's memory.
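For example (the value here is illustrative, assuming nostrss reads it from the environment as env.dist suggests):

```
# .env -- raise the default in-memory cache capacity (falls back to 100 if unset)
DEFAULT_CACHE_SIZE=500
```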

When nostrss gets instantiated, a snapshot of the existing items in the feed is made and stored in memory. Later, on each scheduler tick, when the feed gets re-fetched, the items of the re-fetched XML document get matched against that memory.

The cache_size can also be defined individually for each feed you want to broadcast.

This is done in order to avoid bloating the program's memory for feeds with tremendous numbers of items.

Most RSS feeds limit the number of items they render, since they usually provide the latest items rather than a full index of a website's items. Looking at the content of the feed URL you provided in the issue, it seems there are more than 100 items, which means the program has been trying to broadcast, over and over, the same items that were never retained in memory.

If the cron rule you defined checks the feed very often and no cache size has been set for the program or the feed, this means that every time the parser job runs, the 102 uncached items get broadcast again in a very short time, which can be read as DDoS or spamming behaviour.
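To make the failure mode concrete, here is a small illustrative model (not nostrss's actual code; the item count of 202 is an assumption consistent with the numbers above):

```rust
// A dedup cache bounded at `capacity` remembers the first ids it sees and
// silently drops the rest, so the dropped items look "new" on every tick.
fn items_to_broadcast(feed_items: &[String], cache: &mut Vec<String>, capacity: usize) -> usize {
    let mut fresh = 0;
    for id in feed_items {
        if !cache.contains(id) {
            fresh += 1; // treated as new -> gets broadcast
            if cache.len() < capacity {
                cache.push(id.clone()); // retained: won't be broadcast again
            }
            // else: over capacity, never remembered -> "new" again next tick
        }
    }
    fresh
}

fn main() {
    let feed: Vec<String> = (0..202).map(|i| format!("item-{i}")).collect();
    let mut cache = Vec::new();
    println!("tick 1: {}", items_to_broadcast(&feed, &mut cache, 100)); // 202 posts
    println!("tick 2: {}", items_to_broadcast(&feed, &mut cache, 100)); // 102 posts, every tick
}
```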

You can take this fixture as an example of how to define the cache size for a feed, or look at env.dist to change the default cache size.

As a side note, I would advise using a cron pattern that avoids re-fetching the feed too often, unless the RSS feed gets updated very regularly (e.g. Wikipedia's RSS feeds). For most feeds, a cron pattern that performs a check every 5, 10 or 15 minutes is fine, e.g. * 1/5 * * * * * for checking every 5 minutes.

plowsof (Author) commented Jul 18, 2024

Thank you for the quick responses/advice and for investigating why this happened, @Asone. So this issue is not reporting a bug in nostrss, but rather a feature request for a dry-run mode, which you immediately added.

After setting an appropriate cache_size and a sane cron pattern, I'm confident this won't happen again. I can confirm later today.

Asone (Owner) commented Jul 18, 2024

The dry-run mode is almost ready to be released. The feature itself has been merged into the main branch (see #88). I'm taking the opportunity of this new release to update dependencies, but I still have to fix a few things caused by those updated deps. Once done, I'll release 1.0.3 with the dry-run mode included.

Asone removed the "bug" label Jul 20, 2024
Asone (Owner) commented Jul 20, 2024

Version 1.0.3 with dry-run is out! 🎉 Closing the issue. Thanks again for using nostrss 🙏

Asone closed this as completed Jul 20, 2024
@rottenwheel

It happened again upon reactivation with the new nostrss version; check for yourself on Nostr with the npub in the OP. Wonder what it might be... Every 10 minutes it would go through the 200 entries and post them, one by one.

@rottenwheel

@Asone ^.

plowsof (Author) commented Jul 23, 2024

Current config settings, if it helps:

```json
[{
        "id": "revuoxmr",
        "name": "Revuo Monero (XMR)",
        "url": "https://www.revuo-xmr.com/index.xml",
        "schedule": "* 1/10 * * * *",
        "template": "template-file.template",
        "cache_size": 200
}]
```

Asone reopened this Aug 4, 2024
Asone (Owner) commented Aug 4, 2024

Hi,

The RSS feed you're consuming has more than 200 items, so you hit the same problem. Set the cache size to 1000 and you should be good to go.
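For example, taking the config shared above and only raising cache_size:

```json
[{
        "id": "revuoxmr",
        "name": "Revuo Monero (XMR)",
        "url": "https://www.revuo-xmr.com/index.xml",
        "schedule": "* 1/10 * * * *",
        "template": "template-file.template",
        "cache_size": 1000
}]
```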

Asone (Owner) commented Aug 7, 2024

@rottenwheel @plowsof I'll close the issue again, as it was a config issue.

Asone closed this as completed Aug 7, 2024
@rottenwheel

@Asone Thanks. Question: mind explaining something I might be missing here? It looks like nostrss won't scale, and will cause spam for anyone who uses it at some point or another.

For instance, you say to set the cache size to 1000; what happens after the 1001st message?

@rottenwheel

@Asone any new comments? ^

plowsof (Author) commented Sep 12, 2024

Perhaps there is a use case I'm not seeing for a fixed cache. I know people may want to take an entire RSS feed and sync it somewhere else, so the fixed cache would indeed help there if we only wanted to mirror the last 500 items.

Feature request? My K.I.S.S. method is only aware of the most recent RSS feed item, and it stores that actual feed item in a text file. If the one feed item we have saved is not equal to the most recent item, broadcast the new one, then write it to the text file for comparison later. Not perfect, but you will only be wrong once, every so often. A sketch of that approach is below.
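A minimal sketch of that idea (a hypothetical helper, not part of nostrss):

```rust
use std::fs;

// Remember only the newest item id and broadcast when it changes.
fn broadcast_if_new(newest_item_id: &str, state_file: &str) -> std::io::Result<()> {
    let last_seen = fs::read_to_string(state_file).unwrap_or_default();
    if last_seen.trim() != newest_item_id {
        println!("broadcasting {newest_item_id}"); // stand-in for the real publish call
        fs::write(state_file, newest_item_id)?; // persist for the next comparison
    }
    Ok(())
}

fn main() -> std::io::Result<()> {
    // id of the newest item parsed from the feed (illustrative value)
    broadcast_if_new("newest-item-guid", "last_item.txt")
}
```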
