Should have a scheduled task for pruning log_topics #5657

Closed
sbulen opened this issue May 6, 2019 · 10 comments · Fixed by #5729

@sbulen
Contributor

sbulen commented May 6, 2019

Description

Question: Should we have a scheduled task for pruning log_topics?

I believe 2.1 behaves the same as 2.0, in that log_topics can grow out of hand. For 2.0, there is a script, shared informally in a forum post, to clean that up.

Additional information/references

https://www.simplemachines.org/community/index.php?topic=212330.msg1667071#msg1667071

@MissAllSunday
Contributor

Sure, I'm in favor of it. Seems simple enough to be incorporated.

@Sesquipedalian
Member

Makes sense to me.

@live627
Contributor

live627 commented May 8, 2019 via email

@sbulen
Contributor Author

sbulen commented May 9, 2019

I wasn't aware of this... I was trying to make a fresh copy of my DB for testing, but had issues, in part, because this table had millions of rows in it...

@jdarwood007
Member

To be honest, if this is going to become a scheduled task, I recommend that:

  1. It be disabled by default for upgrades; new installs can have it installed.
  2. It run on a 15-minute schedule.
  3. It limit how many entries it tries to clean up at a time.

Otherwise, large forums with millions of entries could be killed.
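
As an illustration of the "limit how many at a time" point, here is a minimal sketch in Python with SQLite, not SMF's actual PHP code. The table layout, the `prune_batch` helper, and the idea of selecting rows by member id are all assumptions made for the example. Each call deletes at most `batch_size` rows, so a single run of a scheduled task stays cheap even when millions of rows are waiting.

```python
import sqlite3


def prune_batch(conn, member_ids, batch_size):
    """Delete at most batch_size log_topics rows for the given members.

    Returns the number of rows deleted, so the caller can tell when the
    backlog is empty. (Hypothetical helper, not part of SMF.)
    """
    if not member_ids:
        return 0
    placeholders = ",".join("?" * len(member_ids))
    cur = conn.execute(
        "DELETE FROM log_topics WHERE rowid IN ("
        f"SELECT rowid FROM log_topics WHERE id_member IN ({placeholders}) LIMIT ?)",
        (*member_ids, batch_size),
    )
    conn.commit()
    return cur.rowcount


# Demo schema: a simplified stand-in for the real table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE log_topics ("
    "id_member INTEGER, id_topic INTEGER, id_msg INTEGER, "
    "PRIMARY KEY (id_member, id_topic))"
)
conn.executemany(
    "INSERT INTO log_topics VALUES (?, ?, ?)",
    [(1, t, t) for t in range(10)] + [(2, t, t) for t in range(3)],
)

# Each loop iteration stands in for one run of the scheduled task.
deleted_per_run = []
while True:
    n = prune_batch(conn, [1], batch_size=4)
    deleted_per_run.append(n)
    if n == 0:
        break

remaining = conn.execute("SELECT COUNT(*) FROM log_topics").fetchone()[0]
```

Member 1's ten rows are removed over several small runs, while member 2's rows are left alone.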

@sbulen
Contributor Author

sbulen commented May 10, 2019

Agreed. I'd go a step further & say that folks should have to specifically enable it for upgrades & installs. It should not "just run". But I do think it's important to have as an option.

Though I do have a suspicion that most very large forums already know about it...

sbulen changed the title from "Question: Should we have a scheduled task for pruning log_topics?" to "Should have a scheduled task for pruning log_topics" on May 30, 2019
Sesquipedalian added this to the Final milestone on Jun 6, 2019

@sbulen
Contributor Author

sbulen commented Jun 20, 2019

I've been reading code & testing ideas here...

The utility shared online (link above):
This utility goes through log_topics and, for each unique member found, calls the SMF MarkBoardRead function in Subs-Boards.php. A log_boards record & a log_mark_read record are stored for every board for every member. So although this empties out log_topics, it also results in (members found) x (boards) x 2 other records being stored.

Note this is exactly what our 'Mark all messages read' button does at the bottom of our board index. I have 242 boards on my forum. Every time a user presses that button, 484 records are stored/updated.

The utility does this for all members who have ever viewed a topic. So... On my test forum with 2.5M log_topics records, this utility gets rid of all log_topics records. But... It creates ~8M log_boards & log_mark_read records. Making the problem 3x worse.
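
The record count described above is simple arithmetic, sketched here in Python. The function name `records_written` is hypothetical, chosen only for this example.

```python
def records_written(members, boards):
    """Rows stored when every board is marked read for every member:
    one log_boards row plus one log_mark_read row per member per board."""
    return members * boards * 2


# One member pressing 'Mark all messages read' on a 242-board forum.
per_button_press = records_written(members=1, boards=242)
```

With one member and 242 boards this gives the 484 records mentioned above, and the total grows linearly with the number of members the utility touches.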

What's in SVN:
The SVN code is out of date. It relies on timestamps on these records that don't exist in 2.0 or 2.1. (I suspect at one point we realized that msg_id is effectively a timestamp.)

The algorithm is simple and clean - for each unique member/board found in log_topics it updates log_mark_read & clears out log_topics. I've confirmed this approach works and decreases the record count, since it replaces multiple topic records with one board record.
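
A rough sketch of that per-member/board collapse, in Python with SQLite rather than SMF's PHP. The simplified schema and the `collapse_topics_to_boards` helper are assumptions made for the example. In particular, using the highest id_msg a member has read in a board as the board-level marker is my guess at how the collapse would work, not something confirmed by the SVN code.

```python
import sqlite3


def collapse_topics_to_boards(conn):
    """Replace per-topic read markers with one per-board marker per member.

    Returns the number of (member, board) pairs written. Hypothetical helper.
    """
    # One row per (member, board): the highest message id read in that board.
    pairs = conn.execute(
        """
        SELECT lt.id_member, t.id_board, MAX(lt.id_msg)
        FROM log_topics AS lt
        JOIN topics AS t ON t.id_topic = lt.id_topic
        GROUP BY lt.id_member, t.id_board
        """
    ).fetchall()
    for id_member, id_board, max_msg in pairs:
        existing = conn.execute(
            "SELECT id_msg FROM log_mark_read WHERE id_member = ? AND id_board = ?",
            (id_member, id_board),
        ).fetchone()
        # Never move an existing board marker backwards.
        new_msg = max_msg if existing is None else max(existing[0], max_msg)
        conn.execute(
            "INSERT OR REPLACE INTO log_mark_read VALUES (?, ?, ?)",
            (id_member, id_board, new_msg),
        )
    conn.execute("DELETE FROM log_topics")
    conn.commit()
    return len(pairs)


# Demo schema: simplified stand-ins for the real tables.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE topics (id_topic INTEGER PRIMARY KEY, id_board INTEGER);
    CREATE TABLE log_topics (id_member INTEGER, id_topic INTEGER, id_msg INTEGER,
                             PRIMARY KEY (id_member, id_topic));
    CREATE TABLE log_mark_read (id_member INTEGER, id_board INTEGER, id_msg INTEGER,
                                PRIMARY KEY (id_member, id_board));
    """
)
conn.executemany("INSERT INTO topics VALUES (?, ?)", [(1, 10), (2, 10), (3, 20)])
conn.executemany(
    "INSERT INTO log_topics VALUES (?, ?, ?)",
    [(1, 1, 5), (1, 2, 9), (1, 3, 7), (2, 1, 4)],
)

collapsed = collapse_topics_to_boards(conn)
mark_read = conn.execute(
    "SELECT id_member, id_board, id_msg FROM log_mark_read "
    "ORDER BY id_member, id_board"
).fetchall()
topics_left = conn.execute("SELECT COUNT(*) FROM log_topics").fetchone()[0]
```

Here four topic rows become three board rows. The saving grows with the number of topics a member has read per board.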

Proposal:
These record counts get out of hand, and for no good reason. If a user hasn't been around in a year, and returns and navigates the forum, everything will appear new anyway. Why keep ANY log_boards, log_topics & log_mark_read data around for folks who have been inactive for a while? It makes no sense at all.

I suggest a 2-tier approach:

  • Tier 1 - anybody who hasn't logged on in 365 days (a setting) or more - Mark everything UNREAD, by deleting all log_boards, log_topics & log_mark_read for that user. Just delete 'em...
  • Tier 2 - anybody left who hasn't logged on in 90 days (a setting) or more - Emulate the SVN code. For every member/board found in log_topics, update log_mark_read & remove the log_topics records.

I am testing a script that does this and the 2.5M mostly useless records turn into about 250K real records.
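
For illustration, the tier split and the Tier 1 delete could look roughly like the following Python/SQLite sketch. This is not the script mentioned above. The `members.last_login` column (a Unix timestamp), the simplified log tables, and the helper names are all assumptions. Tier 2 members are only identified here; the per-board collapse itself is left out.

```python
import sqlite3

DAY = 86400  # seconds
LOG_TABLES = ("log_boards", "log_topics", "log_mark_read")


def split_into_tiers(conn, now, tier1_days=365, tier2_days=90):
    """Return (tier1_ids, tier2_ids) based on time since last login."""
    tier1_cutoff = now - tier1_days * DAY
    tier2_cutoff = now - tier2_days * DAY
    tier1, tier2 = [], []
    rows = conn.execute(
        "SELECT id_member, last_login FROM members ORDER BY id_member"
    ).fetchall()
    for id_member, last_login in rows:
        if last_login <= tier1_cutoff:
            tier1.append(id_member)   # inactive 365+ days: delete everything
        elif last_login <= tier2_cutoff:
            tier2.append(id_member)   # inactive 90+ days: collapse to boards
    return tier1, tier2


def purge_tier1(conn, member_ids):
    """Tier 1: drop all read markers, so everything shows as unread."""
    for table in LOG_TABLES:
        conn.executemany(
            f"DELETE FROM {table} WHERE id_member = ?",
            [(m,) for m in member_ids],
        )
    conn.commit()


# Demo with simplified tables; id_ref stands in for the board/topic id.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE members (id_member INTEGER PRIMARY KEY, last_login INTEGER)")
for table in LOG_TABLES:
    conn.execute(f"CREATE TABLE {table} (id_member INTEGER, id_ref INTEGER, id_msg INTEGER)")

now = 1_000 * DAY
conn.executemany(
    "INSERT INTO members VALUES (?, ?)",
    [(1, now - 400 * DAY), (2, now - 120 * DAY), (3, now - 10 * DAY)],
)
for table in LOG_TABLES:
    conn.executemany(f"INSERT INTO {table} VALUES (?, ?, ?)", [(m, 1, 1) for m in (1, 2, 3)])

tier1, tier2 = split_into_tiers(conn, now)
purge_tier1(conn, tier1)
left = {
    table: [r[0] for r in conn.execute(f"SELECT id_member FROM {table} ORDER BY id_member")]
    for table in LOG_TABLES
}
```

Both thresholds are parameters, matching the "(a setting)" notes in the two tiers.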

MY BIGGEST QUESTION:
Is log_boards needed at all? It appears the read/unread logic does all of its real work based on log_mark_read & log_topics.

In my test environments, if I truncate log_boards, all the "new" statuses still work perfectly. I haven't yet found a meaningful use of the log_boards data. Feels like I'm missing something...

@sbulen
Contributor Author

sbulen commented Jun 26, 2019

I have a proof of concept that I've been testing out there:
https://github.com/sbulen/sjrbTools/blob/master/smf_read_inds_maint.php

This does the full two-stage approach listed above. At the moment, it works for both 2.0 & 2.1.
A final 2.1-specific version should keep log_topics records that have the unwatched field set.

This cleans up millions of records in a way that is virtually unseen by the users. I've run it in my production forum.

@live627
Contributor

live627 commented Aug 21, 2019

> Tier 1 - anybody who hasn't logged on in 365 days (a setting) or more - Mark everything UNREAD, by deleting all log_boards, log_topics & log_mark_read for that user. Just delete 'em...

This won't piss anyone off at all. /s

@sbulen
Contributor Author

sbulen commented Aug 21, 2019

It's configurable.

IF there's been any traffic on the forum at all, most posts will truly be unread anyway.
