robots.txt blocking search engines from indexing the page #4128

Closed
XANi opened this issue Apr 13, 2023 · 8 comments
Assignees
Labels
enhancement New feature or request security

Comments

XANi commented Apr 13, 2023

Is your feature request related to a problem? Please describe

So I was looking for docs about what a given metric does (I still have no idea what exactly it does) and Google showed this:

Describe the solution you'd like

While I realize it's up to the user not to expose their database to the internet, people will use it for their home stuff and random things, so this will keep happening, and the fact that someone's random test instance shows up in search results when looking for docs isn't great.

The first page of that search's results contains 3 private servers in total:

Edit: removed screenshots to avoid exposing the public addresses of someone's instances

XANi added the enhancement label Apr 13, 2023
zekker6 self-assigned this Apr 18, 2023
zekker6 added a commit that referenced this issue Apr 18, 2023
…dexing

This handler will instruct search engines that indexing is not allowed for the content exposed to the internet. This should help to address issues like #4128 when instances are exposed to the internet without authentication.
Contributor

zekker6 commented Apr 18, 2023

Hi @XANi, thank you for the report!

In #4143 we will add an endpoint that serves /robots.txt in all VictoriaMetrics components. This will instruct search engines not to index content available at exposed instances.

Also, could you please share the name of the metric you've been trying to understand so that I can give you a bit more info? (The screenshots were removed to avoid showing the IPs of those servers in a public issue tracker.)
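
For readers wondering what such an endpoint amounts to, here is a minimal sketch using only Go's standard library. It is not the handler from #4143; the names `robotsBody`, `robotsHandler`, `newMux` and `fetch` are hypothetical and chosen only for illustration. It serves a fixed body that asks every crawler to stay away from every path, and exercises it in memory with `httptest`.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// robotsBody asks every crawler ("*") not to crawl any path ("/").
const robotsBody = "User-agent: *\nDisallow: /\n"

// robotsHandler serves the fixed robots.txt body as plain text.
// Hypothetical name; not taken from the VictoriaMetrics source.
func robotsHandler(w http.ResponseWriter, r *http.Request) {
	w.Header().Set("Content-Type", "text/plain; charset=utf-8")
	fmt.Fprint(w, robotsBody)
}

// newMux registers the handler at /robots.txt on a fresh mux.
func newMux() *http.ServeMux {
	mux := http.NewServeMux()
	mux.HandleFunc("/robots.txt", robotsHandler)
	return mux
}

// fetch performs an in-memory GET against the mux and returns
// the response status code and body.
func fetch(path string) (int, string) {
	req := httptest.NewRequest(http.MethodGet, path, nil)
	rec := httptest.NewRecorder()
	newMux().ServeHTTP(rec, req)
	return rec.Code, rec.Body.String()
}

func main() {
	code, body := fetch("/robots.txt")
	fmt.Printf("%d\n%s", code, body)
}
```

Note that robots.txt is only a request to well-behaved crawlers. It does not restrict access, so an instance exposed without authentication is still reachable by anyone.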

zekker6 added a commit that referenced this issue Apr 18, 2023
…dexing (#4143)

This handler will instruct search engines that indexing is not allowed for the content exposed to the internet. This should help to address issues like #4128 when instances are exposed to the internet without authentication.
Author

XANi commented Apr 18, 2023

Yeah I should've blurred those, sorry.

I found what I needed in the changelog (vm_next_retention_seconds), although I'm still not sure what "indexdb rotation" entails.

But in general it took a bit of guessing about some stats as there isn't (or I haven't found) "a list of every metric every component produces and what they do". Usually when searching for something I hit changelog or source code.

Contributor

zekker6 commented Apr 18, 2023

> I found what I needed in the changelog (vm_next_retention_seconds), although I'm still not sure what "indexdb rotation" entails.

VictoriaMetrics keeps indexes to speed up queries; they hold a mapping of metric names to their internal representation. Those indexes are stored in IndexDB, which is basically a separate set of files used internally.
IndexDB rotation refers to the process of rebuilding the index contents from the currently available data and replacing the indexes currently in use. The internal implementation of IndexDB performs rotation once per -retentionPeriod.

> But in general it took a bit of guessing about some stats as there isn't (or I haven't found) "a list of every metric every component produces and what they do". Usually when searching for something I hit changelog or source code.

Right, we don't have a doc which would cover this. The most useful sources for this info would be our documentation and GitHub issues, I guess. If you think it would be better to have such a doc available, feel free to file a feature request, and we will consider adding it.

Going to close this issue as completed now, since the initial question has been addressed.

zekker6 closed this as completed Apr 18, 2023
Author

XANi commented Apr 18, 2023

> VictoriaMetrics keeps indexes to speed up queries; they hold a mapping of metric names to their internal representation. Those indexes are stored in IndexDB, which is basically a separate set of files used internally.
> IndexDB rotation refers to the process of rebuilding the index contents from the currently available data and replacing the indexes currently in use. The internal implementation of IndexDB performs rotation once per -retentionPeriod.

So if retention is, say, five years, entries that fell out of retention are only removed once every five years? Like, if there are records that will expire in a month when the rotation fires, they will stay in there for a total of 9 years 11 months? I guess at worst it's only a 2x waste of space.

Contributor

zekker6 commented Apr 19, 2023

> So if retention is, say, five years, entries that fell out of retention are only removed once every five years?

Yes, this is right.

> Like, if there are records that will expire in a month when the rotation fires, they will stay in there for a total of 9 years 11 months? I guess at worst it's only a 2x waste of space.

Yes, that is right as well. Also, note that IndexDB only stores the mapping between time series IDs and the respective metric names, so it usually takes a fraction of the overall data size.
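
The arithmetic confirmed above can be sketched in a few lines of Go. This is only the simplified model from this thread (index entries are dropped only at a rotation, and rotations happen once per retention period), not VictoriaMetrics code; `indexEntryLifetimeMonths` and its parameters are hypothetical names.

```go
package main

import "fmt"

// indexEntryLifetimeMonths returns how many months an index entry stays on
// disk under the simplified model from this thread: entries are dropped only
// at a rotation, and a rotation fires once per retention period.
// ageAtRotationMonths is the entry's age when a rotation fires.
func indexEntryLifetimeMonths(retentionMonths, ageAtRotationMonths int) int {
	if ageAtRotationMonths >= retentionMonths {
		// Already out of retention, so it is dropped at this rotation.
		return ageAtRotationMonths
	}
	// Still inside retention, so it survives until the next rotation.
	return ageAtRotationMonths + retentionMonths
}

func main() {
	// Five-year retention; the entry expires one month after the rotation.
	total := indexEntryLifetimeMonths(60, 59)
	fmt.Printf("%d years %d months\n", total/12, total%12)
}
```

With a 60-month retention and an entry that is 59 months old at rotation time, the result is 119 months, which matches the "9 years 11 months" figure in the question and stays just under twice the retention period.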

valyala pushed a commit that referenced this issue May 8, 2023
…dexing (#4143)

This handler will instruct search engines that indexing is not allowed for the content exposed to the internet. This should help to address issues like #4128 when instances are exposed to the internet without authentication.
Collaborator

valyala commented May 18, 2023

FYI, starting from v1.91.0, all VictoriaMetrics components serve /robots.txt, which disallows web crawlers from indexing any of their pages.
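
If you want to confirm that a given instance answers this way, a simplified check of a fetched robots.txt body could look like the sketch below. `disallowsAll` is a hypothetical helper, not part of VictoriaMetrics, and it deliberately ignores `Allow` rules and path wildcards; it only looks for a `Disallow: /` rule inside a group that applies to `User-agent: *`.

```go
package main

import (
	"fmt"
	"strings"
)

// disallowsAll reports whether body contains a "Disallow: /" rule inside a
// group that applies to "User-agent: *". Simplified: Allow rules and path
// wildcards are ignored.
func disallowsAll(body string) bool {
	inWildcardGroup := false
	prevWasUserAgent := false
	for _, line := range strings.Split(body, "\n") {
		// Strip trailing comments and surrounding whitespace (including \r).
		if i := strings.Index(line, "#"); i >= 0 {
			line = line[:i]
		}
		key, value, ok := strings.Cut(strings.TrimSpace(line), ":")
		if !ok {
			continue // blank line or a line without a "key: value" shape
		}
		key = strings.ToLower(strings.TrimSpace(key))
		value = strings.TrimSpace(value)
		if key == "user-agent" {
			// Consecutive User-agent lines belong to the same group.
			if !prevWasUserAgent {
				inWildcardGroup = false
			}
			if value == "*" {
				inWildcardGroup = true
			}
			prevWasUserAgent = true
			continue
		}
		prevWasUserAgent = false
		if key == "disallow" && value == "/" && inWildcardGroup {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(disallowsAll("User-agent: *\nDisallow: /\n"))
}
```

In practice the body would come from an HTTP GET of /robots.txt on the instance in question.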

Collaborator

valyala commented May 19, 2023

FYI, the bugfix has been backported to v1.87.6 LTS release.

Collaborator

valyala commented May 19, 2023

FYI, the bugfix has also been backported to v1.79.13 LTS release.
