corrupt titles since 2.0.4 upgrade #559

Closed
anarcat opened this issue Jan 30, 2022 · 3 comments

Comments

anarcat commented Jan 30, 2022

there's something weird about the titles in the database...

image

third line is in chinese (i think?), 阿纳猫-阿纳猫 is not the title of that page.

i don't believe that was a problem before...

anarcat commented Jan 31, 2022

also you'll notice that some titles are okay... but not all of them, which is even weirder.

arp242 (Owner) commented Feb 1, 2022

This happens when visitors view a page through Google Translate or a similar tool: the tool also translates the title, and the translated title is what gets sent to GoatCounter.

I think you updated from 1.x to 2.x? That upgrade includes quite a few database changes that make things faster and use less disk space. Previously, every pageview stored the title:

created_at           | path  | title   | ..
2020-01-01 15:00:00  | /path | Hello!  | ..
2020-01-01 15:10:00  | /path | Hello!  | ..

That's obviously quite redundant and really adds up on disk space after you've got millions of pageviews, so in 2.x it's:

created_at           | path_id
2020-01-01 15:00:00  | 1
2020-01-01 15:10:00  | 1

And then the paths table stores the path and title once.
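A minimal sketch of that layout, using Python's sqlite3 with an in-memory database. The table name `hits` and the `path_id` key on `paths` are assumptions for illustration; this is a simplified schema, not GoatCounter's actual one, which has more columns.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    -- Hypothetical, simplified schema; not GoatCounter's actual DDL.
    CREATE TABLE paths (
        path_id INTEGER PRIMARY KEY,
        path    TEXT NOT NULL UNIQUE,
        title   TEXT NOT NULL
    );
    CREATE TABLE hits (
        created_at TEXT    NOT NULL,
        path_id    INTEGER NOT NULL REFERENCES paths (path_id)
    );
""")

# One row in paths holds the path and its single title.
db.execute("INSERT INTO paths (path_id, path, title) VALUES (1, '/path', 'Hello!')")

# Each pageview only stores a reference to that row.
db.executemany(
    "INSERT INTO hits (created_at, path_id) VALUES (?, ?)",
    [("2020-01-01 15:00:00", 1), ("2020-01-01 15:10:00", 1)],
)

# A join recovers the 1.x-style rows without storing the title per pageview.
rows = db.execute("""
    SELECT hits.created_at, paths.path, paths.title
    FROM hits JOIN paths USING (path_id)
    ORDER BY hits.created_at
""").fetchall()
```

The trade-off is that a path can only have one title at a time, which is why a single wrong title shows up for every pageview of that path.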

The way this works: for new pages it uses the first title it sees, and after that it only switches to a different title once it has seen that title ten times in a row. The counter is cached in memory, so e.g. a restart will clear it.
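The rule above could be sketched roughly like this in Python. This is an illustration only, not GoatCounter's code: the `TitleTracker` class and its method names are hypothetical, and it assumes "ten times in a row" means ten consecutive sightings of the same new title, with any other title resetting the streak.

```python
class TitleTracker:
    """Keeps one title per path; switches after `threshold` consecutive
    sightings of the same different title. Hypothetical sketch."""

    def __init__(self, threshold=10):
        self.threshold = threshold
        self.titles = {}   # path -> stored title (the database, in the real thing)
        self.pending = {}  # path -> (candidate title, streak length); memory only

    def see(self, path, title):
        """Record one pageview and return the title now stored for the path."""
        if path not in self.titles:
            # New page: the first title seen wins.
            self.titles[path] = title
            return title
        if title == self.titles[path]:
            # Stored title seen again: any streak for a candidate is broken.
            self.pending.pop(path, None)
            return title
        candidate, count = self.pending.get(path, (title, 0))
        if candidate != title:
            # A different candidate starts a fresh streak.
            candidate, count = title, 0
        count += 1
        if count >= self.threshold:
            self.titles[path] = title
            self.pending.pop(path, None)
        else:
            self.pending[path] = (candidate, count)
        return self.titles[path]

    def restart(self):
        """Simulate a process restart: in-memory streaks are lost."""
        self.pending.clear()


tracker = TitleTracker()
tracker.see("/path", "Hello!")
for _ in range(9):
    tracker.see("/path", "你好!")    # nine sightings: still "Hello!"
current = tracker.see("/path", "你好!")  # tenth in a row: title switches
```

Under this rule an occasional translated pageview never replaces the stored title, but a wrong title picked up during the migration needs ten uninterrupted correct pageviews to be replaced.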

The problem here is that when updating from 1.x to 2.x GoatCounter doesn't really "know" which title is the correct one. One reason I changed this (aside from performance/space) is that there's now one canonical title. I looked a bit at getting the "most frequently used title" during the migration, but that was incredibly slow, so it just uses the last one (or first one? I forgot). It's unfortunate because sometimes it gets the wrong title, but it was the best I could figure out without having migrations for larger sites take days, and it should correct itself eventually.
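To show why the choice matters, here is a small Python sketch that contrasts the two strategies on 1.x-style rows. The function names and sample data are made up for illustration, and it says nothing about the performance of the real migration, which ran as SQL over the full pageview table.

```python
from collections import Counter


def last_title(rows):
    """Pick the last title seen per path; rows are (path, title) in time order."""
    titles = {}
    for path, title in rows:
        titles[path] = title  # later rows overwrite earlier ones
    return titles


def most_frequent_title(rows):
    """Pick the title that occurs most often per path."""
    counts = {}
    for path, title in rows:
        counts.setdefault(path, Counter())[title] += 1
    return {path: c.most_common(1)[0][0] for path, c in counts.items()}


# Hypothetical pageviews: the final visitor used a translation tool.
rows = [
    ("/path", "Hello!"),
    ("/path", "Hello!"),
    ("/path", "你好!"),
]

by_last = last_title(rows)
by_frequency = most_frequent_title(rows)
```

With this data, `by_last` ends up with the translated title while `by_frequency` keeps `Hello!`, which matches the symptom of some titles being fine and others not.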

There isn't a facility to edit the titles manually, but you can change it in the database if you want (paths.title).
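A manual fix could look something like the sketch below, shown against a throwaway in-memory SQLite database so it is safe to run. Only the `paths` table and its `path` and `title` columns come from the description above; the `path_id` column and the sample values are assumptions. A real database would need a backup first and, if it holds more than one site, probably an extra filter on the site.

```python
import sqlite3

# Stand-in database with a hypothetical, simplified paths table.
db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE paths (path_id INTEGER PRIMARY KEY, path TEXT, title TEXT)"
)
db.execute("INSERT INTO paths (path, title) VALUES ('/path', '阿纳猫-阿纳猫')")

# Overwrite the wrong title for one path; `with db` commits on success
# and rolls back if the statement raises.
with db:
    cur = db.execute(
        "UPDATE paths SET title = ? WHERE path = ?",
        ("Hello!", "/path"),
    )

updated = cur.rowcount  # number of rows changed; worth checking before trusting the fix
fixed = db.execute(
    "SELECT title FROM paths WHERE path = ?", ("/path",)
).fetchone()[0]
```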

anarcat commented Feb 1, 2022 via email

anarcat closed this as completed Feb 5, 2022