Board List Rewrite #456
Okay, so a review of my local commits so far. There are many technical challenges as a part of this task. 8chan deals in a lot of big data.
Everything I'm about to show is non-final and an early preview.
Let's start with the front-end since it's easiest to describe. I've taken a few tips from other imageboards and have arranged something imageboard-esque while adding in new features and organizing information.
This is a complete rewrite. Nothing has been preserved from the original boards.php.
There are two parts to this:
- The full board.php view.
- The board-search.php JSON return.
The full board view takes several seconds (under 5) to load on my computer using my big data copy of 8chan. board-search.php takes almost no time at all. This makes me wonder if the caching is the slowest part of the process.
So far, this feature degrades gracefully. No-JS clients can use this form. I will be adding in a JS widget for easy searching, but this isn't done.
Second half is the back end and there's a lot more to be discussed. I will try to break this apart into several parts.
Data is now recorded in board_stats, a new table. This table records:
- Posts made.
- Post IDs that were created (array, serialized)
- Unique IP count for all posts.
- Hashed IP (using less_ip) for posts that were created. (array, serialized, uniques)
This is stored on the table using a dual index: stat_uri and stat_hour. stat_hour is a unix timestamp rounded down to the nearest 3600 (the turn of the last hour) in gmtime.
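The hour rounding described above can be sketched as follows. This is a minimal Python illustration, not the actual PHP from the commits; the function name `stat_hour` is borrowed from the column name purely for readability.

```python
SECONDS_PER_HOUR = 3600

def stat_hour(unix_time: int) -> int:
    """Round a Unix timestamp down to the start of its hour.

    Unix time counts seconds from the epoch in UTC, so the result
    does not depend on the server's local timezone.
    """
    return unix_time - (unix_time % SECONDS_PER_HOUR)

# 1427891696 is 2015-04-01 12:34:56 UTC; its bucket is 12:00:00 UTC.
bucket = stat_hour(1427891696)
print(bucket)  # 1427889600
```

Every post made within the same hour on the same board maps to the same (stat_uri, stat_hour) pair, which is what makes the dual index usable as a row key.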
Below is an example of my board_stats table.
This allows for much faster, much more precise enumerating of board activity. It is automatically updated as posts are created and will allow for perfect historic records to be kept. We can use this information to build graphs showing board activity over time. Posts that are deleted or pruned will not affect historic records.
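A rough sketch of how one post could be folded into its hourly row, using an in-memory dict in place of the real table. The field names and the `hash_ip` helper are assumptions made for illustration; `hash_ip` only stands in for less_ip, whose actual behavior is not shown in this thread.

```python
import hashlib

def hash_ip(ip: str) -> str:
    # Hypothetical stand-in for less_ip; the real hashing is not shown here.
    return hashlib.sha256(ip.encode()).hexdigest()[:16]

def record_post(stats: dict, uri: str, post_id: int, ip: str, unix_time: int) -> dict:
    """Add one post to the (board, hour) bucket it belongs to."""
    hour = unix_time - (unix_time % 3600)
    row = stats.setdefault(
        (uri, hour),
        {"post_count": 0, "post_ids": [], "ip_hashes": set()},
    )
    row["post_count"] += 1
    row["post_ids"].append(post_id)
    row["ip_hashes"].add(hash_ip(ip))  # a set keeps only unique hashes
    return row

stats = {}
record_post(stats, "b", 1, "203.0.113.5", 1427889600)
record_post(stats, "b", 2, "203.0.113.5", 1427889700)   # same hour, same IP
record_post(stats, "b", 3, "198.51.100.7", 1427893200)  # next hour
```

Because rows are only ever appended to at post time, deleting or pruning a post later leaves the bucket untouched, which is the property the historic records rely on.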
To help facilitate more brute numbering I've also added a posts_total column to the boards table which is also kept up-to-date as posts are created. This is so we can avoid having to count those fucking horrible posts_x tables, which is the least reliable mechanism for doing anything ever.
Now, while these tables and tools are great, there is a lot of old data on 8chan that needs to be migrated. I've added a tools/migrate_board_stats.php. This tool does a lot of shit and is crucial for the upgrade.
- Adds the boards.posts_total column.
- Sets the value for each posts_total cell as being the AUTO_INCREMENT value of their respective posts_x table (minus 1). This is the most accurate way of determining how many posts a board has had.
- Adds the board_stats table.
- Goes through every post on the system and retroactively adds them to the board_stats table. This cannot fill in missing information, but it will make the board_stats table as accurate as possible with what you have. If no posts have ever been deleted on your board (small board, small website, huge max post number, etc) it will be perfectly accurate. The margin of error only goes up as posts are deleted or pruned.
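The backfill logic can be sketched like this, under stated assumptions: post IDs start at 1, the table's next AUTO_INCREMENT value is known, and only surviving posts can be read. The function and variable names are hypothetical and this is not the code in tools/migrate_board_stats.php.

```python
from collections import defaultdict

def migrate_board(surviving_posts, next_auto_increment):
    """Backfill hourly buckets from the posts that still exist.

    surviving_posts: iterable of (post_id, unix_time) pairs.
    next_auto_increment: the table's next AUTO_INCREMENT value.
    """
    # Total posts ever made is the next ID minus one (assumes IDs start at 1).
    posts_total = next_auto_increment - 1

    per_hour = defaultdict(list)
    for post_id, unix_time in surviving_posts:
        per_hour[unix_time - (unix_time % 3600)].append(post_id)

    # Posts that were deleted or pruned cannot be recovered.
    recovered = sum(len(ids) for ids in per_hour.values())
    missing = posts_total - recovered
    return posts_total, dict(per_hour), missing

# Posts 3 and 4 were deleted before the migration ran.
posts = [(1, 1427889600), (2, 1427889900), (5, 1427893300)]
posts_total, per_hour, missing = migrate_board(posts, next_auto_increment=6)
print(posts_total, missing)  # 5 2
```

The `missing` count is the margin of error mentioned above: it is zero when nothing has ever been deleted and grows with every deleted or pruned post.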
Latest commit lays the groundwork for js/board-directory.js which will handle all JS-enabled searching.
Look!! It's got a loading graphic!
@ctrlcctrlv Important note: I forgot to mention this before but this is REALLY IMPORTANT and potentially disastrous.
ecfe072 In this commit, I switch the way that timestamps are decided for posts.
Previously, we relied on the MySQL NOW() function to default an undefined $post['time'] to the UNIX timecode produced by MySQL.
Now, $post['time'] is hard-defined prior to commit as the output of PHP time(), which is the UNIX timecode for GMT+0 (timezone independent).
If 8chan's database does not use GMT+0 as its timezone, all posts will have incorrect timestamps.
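To illustrate why a timezone mismatch matters, the sketch below shows the same wall-clock reading mapping to different Unix times depending on which zone it is assumed to be in. This is a general illustration in Python, not the code from the commit, and the helper name is hypothetical; how the database actually derives its value is not shown in this thread.

```python
from datetime import datetime, timedelta, timezone

def epoch_if_read_as(wall_clock: datetime, utc_offset_hours: int) -> int:
    """Interpret a naive wall-clock time as being at the given UTC offset."""
    tz = timezone(timedelta(hours=utc_offset_hours))
    return int(wall_clock.replace(tzinfo=tz).timestamp())

wall = datetime(2015, 4, 1, 12, 0, 0)    # "12:00" with no zone attached
as_utc = epoch_if_read_as(wall, 0)       # read as GMT+0
as_minus5 = epoch_if_read_as(wall, -5)   # read as GMT-5

print(as_utc)              # 1427889600
print(as_minus5 - as_utc)  # 18000, i.e. five hours apart
```

A Unix timestamp taken directly from the system clock avoids this ambiguity, because it is defined relative to UTC regardless of the server's configured zone.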
Latest commit adds tag weight. Tags are sorted by number of boards using them and sized (between 75% and 175% font size) based on how many active users have been in those boards. The example image I'm attaching shows that with all present data (meaning no time constraint), because time constraints will not give any meaningful results on a database with 1 active user.
In a live database, the idea is that the most common tags will rise to the top, and the trending tags (like if /gamergatehq/ or /egy/ has a huge burst in traffic) will become huge.
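A minimal sketch of the tag weighting, assuming a linear scale between the least and most active tags. The 75% and 175% bounds come from the description above; the linear mapping, the function names, and the sample numbers are assumptions, since the actual formula is not given in this thread.

```python
def font_size_pct(active_users, lo, hi, min_pct=75.0, max_pct=175.0):
    """Map an active-user count onto the font-size range (linear, assumed)."""
    if hi <= lo:
        return min_pct  # every tag has the same activity
    share = (active_users - lo) / (hi - lo)
    return min_pct + share * (max_pct - min_pct)

def rank_tags(tags):
    """tags: dict of tag -> (board_count, active_users).

    Returns (tag, font size %) pairs ordered by board count, highest first.
    """
    if not tags:
        return []
    users = [u for _, u in tags.values()]
    lo, hi = min(users), max(users)
    ordered = sorted(tags, key=lambda t: tags[t][0], reverse=True)
    return [(t, font_size_pct(tags[t][1], lo, hi)) for t in ordered]

# Made-up sample data for illustration only.
sample = {"games": (12, 40), "politics": (8, 240), "anime": (20, 140)}
print(rank_tags(sample))
# [('anime', 125.0), ('games', 75.0), ('politics', 175.0)]
```

Ordering and size are deliberately independent here: a tag used by few boards can sit low in the list and still render large if those boards are busy.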
Additionally, two smaller changes:
- SFW icon removed in favor of a blue briefcase Font Awesome icon.
- Flag representation of language has been changed in favor of language code. (Why?)
AJAX loading is now working and committed. I didn't want to implement an infinite scroller, though. I can revisit this but for now it's a button.
And that's all. The only item not taken care of is language searching / indexing, which is now a part of #458.
I have bugfixing and testing to be done but this is looking good.
Flag representation of language has been changed in favor of language code.
humans aren't computers. at least use the full language name
@Cipherwraith Good eye. I'm going to make a new issue as this is out and people have raised other issues.





(This issue is assigned to me and is being recorded as a reference.)
The board list, accessible at boards.html, is a master index of public boards on 8chan. 8chan is infinitely expanding, and as such, needs a list that can handle n = ∞ boards. There are two parts to this task.
Backend
Frontend