
Releases: cutonbuminband/rcounting

Add analysis dependencies

24 Mar 07:21

The analysis code depends on more packages than I had listed in the configuration file; this release fixes that.

Manually handle More Comments instances

24 Mar 07:05

This is a bugfix release to handle changed reddit/praw behaviour since the middle of February. The "continue this thread" links on reddit (and the corresponding MoreComments instances in praw) are sometimes unmoored from their proper place in the chain, appearing at the bottom of the page instead of under the comment they apply to. That's caused the existing praw logic for handling these comments to break.

Luckily, each comment has enough information to let me manually reconstruct where it should go in the comment tree, and since I am rebuilding this tree anyway, I can just manually expand every MoreComments instance I come across, and add it to the tree myself.
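A minimal sketch of the reconstruction step, assuming the comments (including those from every expanded MoreComments instance) have already been fetched into a flat list. In praw, a comment's `parent_id` is the fullname of its parent (`t1_` prefix for a comment, `t3_` for the submission), so stripping the prefix and grouping on it rebuilds the tree no matter what order the comments arrive in. The `Comment` class below is a hypothetical stand-in, not praw's own model.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Comment:
    # Hypothetical stand-in for a praw Comment, keeping only the fields needed here.
    id: str
    parent_id: str  # fullname of the parent: "t1_<id>" (comment) or "t3_<id>" (submission)
    body: str


def build_tree(comments):
    """Map each parent id (prefix stripped) to the list of its direct replies.

    Order of the input does not matter, so comments from a MoreComments
    instance that appeared at the bottom of the page still land under
    the right parent.
    """
    children = defaultdict(list)
    for comment in comments:
        parent = comment.parent_id.split("_", 1)[1]  # drop the "t1_"/"t3_" prefix
        children[parent].append(comment)
    return children
```

Looking up `tree[submission_id]` then gives the top-level comments, and `tree[comment.id]` the replies to any comment.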

Improve error handling

02 Mar 07:20

Sometimes things go wrong; that's OK. We can learn from that and move on.

This release makes it easier for the various parts of the r/counting tools to recover from errors, in particular when reddit starts returning unexpected things.

Making a release here to push to PyPI

10 Jan 11:45

v0.5.4

10 Jan 11:29

This release makes the rcounting tools better able to handle the different aliases that some counters use, or have used.
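One simple way to handle aliases, sketched under the assumption that a hand-maintained mapping from each alias to a canonical name is available (the `ALIASES` entries below are placeholders, not real counters), is to normalise every author name before tallying counts.

```python
from collections import Counter

# Hypothetical alias table: each alternative username maps to the counter's canonical name.
ALIASES = {
    "old_username": "current_username",
    "throwaway_counter": "current_username",
}


def canonical_name(username: str) -> str:
    """Return the canonical name for a counter, or the name itself if it has no alias."""
    return ALIASES.get(username, username)


def counts_per_counter(authors):
    """Tally counts per counter, merging counts made under different aliases."""
    return Counter(canonical_name(author) for author in authors)
```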

Fix directory errors

10 Jan 09:59

This is mainly a bugfix release, to correct two errors:

  • The database schema has changed, so the integer id of submissions is no longer calculated and stored; that means we should use the timestamp of the submissions as the order rather than the non-existent integer id.
  • The logic for finding known threads in the thread directory broke when a new thread reached a get, so it was added to the directory two (or even three) times. I've included a hacky fix, but a more robust approach should probably be used.
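The two fixes can be illustrated with a small sketch. It assumes each submission carries a Unix timestamp in `created_utc` (as praw submissions do); the `Submission` record and both helper names are hypothetical, and `add_unique` is a generic de-duplication guard rather than the hacky fix mentioned above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Submission:
    # Hypothetical minimal record; created_utc is the submission's Unix timestamp.
    id: str
    created_utc: float


def latest_first(submissions):
    """Order submissions newest first by timestamp, since no integer id is stored."""
    return sorted(submissions, key=lambda s: s.created_utc, reverse=True)


def add_unique(directory_ids, new_ids):
    """Append ids to the directory list, skipping any that are already present."""
    seen = set(directory_ids)
    result = list(directory_ids)
    for submission_id in new_ids:
        if submission_id not in seen:
            seen.add(submission_id)
            result.append(submission_id)
    return result
```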

On top of that, new side threads have been added, and the threshold for when a side thread is classed as inactive has been reduced to 5 counts within the last month.
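A sketch of the activity check, assuming a side thread is classed as inactive when it has fewer than 5 counts in the last month, and taking "the last month" to mean the last 30 days. `is_inactive` is a hypothetical helper name.

```python
from datetime import datetime, timedelta

MIN_RECENT_COUNTS = 5        # threshold quoted in the release notes
WINDOW = timedelta(days=30)  # assumption: "the last month" taken as 30 days


def is_inactive(count_times, now: datetime) -> bool:
    """Return True if fewer than MIN_RECENT_COUNTS counts fall inside the window."""
    cutoff = now - WINDOW
    recent = sum(1 for t in count_times if t >= cutoff)
    return recent < MIN_RECENT_COUNTS
```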

Adding GitHub Actions for fun and profit

28 Oct 15:25

This release is something of a bookkeeping release. Since 0.5.1 the text of the automatic ftf post has been greatly improved, the aliases and ignored counters have been updated, and a bug in the calculation of the total counts in the no repeating digits thread has been found and fixed.

The requirements have been bumped to a later version, since praw was nagging about that. That meant some fiddling around with the pre-commit hooks, since previous versions of black are incompatible with the latest version of click. It also surfaced a number of deprecation warnings in the existing code, which have been fixed.

Finally, two workflows have been added for GitHub Actions, one of which should deploy new releases automatically to PyPI. This release is made partly to test that out.

Automatically update the thread directory and archive

05 Aug 18:04

This release builds on and completes the work that was started in 0.2.3.

The program can now update the directory pages fully automatically, with no intermediate files that have to be manually uploaded.

In particular, this means that all tables in the directory are updated when python3 update_thread_directory.py is called. The threads in the Top 25 Long Running Side Threads table are sorted by the total number of counts.

Additionally, any new threads are added to a table at the bottom of the page. Any threads which have been archived or revived are also taken care of: the archived threads are moved from the directory to the archive, and revived threads are moved from the archive to the new threads table. The new threads table is sorted alphabetically.
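The table shuffling described above can be sketched as follows. The `Row` record and its `archived` flag are hypothetical, the Top 25 table is not capped at 25 rows here, and the function only illustrates the moves and sort orders rather than reproducing the project's code.

```python
from dataclasses import dataclass


@dataclass
class Row:
    # Hypothetical directory row: side thread name, total counts, and whether it is archived.
    name: str
    total_count: int
    archived: bool


def rebuild_tables(long_running, new, archive):
    """Move archived and revived threads between tables and re-sort them."""
    to_archive = [r for r in long_running + new if r.archived]
    revived = [r for r in archive if not r.archived]
    # Long-running table: live threads only, largest total first.
    long_running = sorted(
        (r for r in long_running if not r.archived),
        key=lambda r: r.total_count,
        reverse=True,
    )
    # New threads table: live new threads plus revived ones, alphabetical.
    new = sorted([r for r in new if not r.archived] + revived, key=lambda r: r.name.lower())
    # Archive: threads still archived plus the newly archived ones.
    archive = [r for r in archive if r.archived] + to_archive
    return long_running, new, archive
```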

The list of side threads and their associated rules has been greatly expanded. For each side thread it knows about, the program keeps track of

  • What a count in the side thread looks like
  • What special rules there are for the side thread
  • How to calculate the total number of counts in the side thread
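The three kinds of rule listed above could be bundled per side thread roughly like this. Everything here is hypothetical: the `SideThread` class, the regular expression for a count, the `wait_n` rule (a counter may not count again until `n` others have counted) and the fixed 1000 counts per thread used for the total.

```python
import re
from dataclasses import dataclass
from typing import Callable, Sequence


def wait_n(n: int) -> Callable[[Sequence[str], str], bool]:
    """Illustrative rule: an author may count only if absent from the last n counters (n >= 1)."""
    def rule(previous_authors: Sequence[str], author: str) -> bool:
        return author not in previous_authors[-n:]
    return rule


@dataclass
class SideThread:
    name: str
    count_regex: str                               # what a count looks like
    is_valid: Callable[[Sequence[str], str], bool]  # special rules for the thread
    counts_per_thread: int = 1000                  # assumption: fixed-length threads

    def looks_like_count(self, body: str) -> bool:
        return re.match(self.count_regex, body.strip()) is not None

    def total_counts(self, completed_threads: int, counts_in_current: int) -> int:
        """Total counts, assuming every completed thread has the same length."""
        return completed_threads * self.counts_per_thread + counts_in_current


wait2 = SideThread("wait 2", r"\d[\d,]*", wait_n(2))
```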

On top of adding new threads, these rules have also been made more specific for many of the existing threads. In particular, the code now handles revivals of existing threads well: revivals are no longer counted when finding the total number of comments, in keeping with existing practice.

Finding the right comment in new threads has been made easier by the addition of an option to search for the deepest comment on a submission, instead of trying to follow a chain down. This is a slower but more robust approach to getting to the right place, and should probably be included as a fallback for all the other threads if the program gets stuck. The thread walking code itself has been made more robust by allowing it to move past a deleted comment if it's the only reply.
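The two strategies can be contrasted in a short sketch. The `Node` class, the `"[deleted]"` marker and the `is_count` predicate are assumptions for illustration: `walk_down` follows the first reply that looks like a count and steps over a lone deleted reply that has replies of its own, while `deepest_comment` visits every comment and returns the most deeply nested one.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Node:
    # Hypothetical local comment: its text and its direct replies, oldest first.
    body: str
    replies: List["Node"] = field(default_factory=list)

    @property
    def deleted(self) -> bool:
        return self.body == "[deleted]"


def walk_down(node: Node, is_count: Callable[[Node], bool]) -> Node:
    """Follow the chain, taking the first reply that looks like a count at each level."""
    while True:
        nxt = next((r for r in node.replies if is_count(r)), None)
        if nxt is None and len(node.replies) == 1:
            only = node.replies[0]
            if only.deleted and only.replies:
                nxt = only  # step over a lone deleted comment
        if nxt is None:
            return node
        node = nxt


def deepest_comment(node: Node) -> Node:
    """Slower fallback: visit every comment and return the most deeply nested one."""
    best, best_depth = node, 0
    stack = [(node, 0)]
    while stack:
        current, depth = stack.pop()
        if depth > best_depth:
            best, best_depth = current, depth
        stack.extend((reply, depth + 1) for reply in current.replies)
    return best
```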

0.2.3

28 Jun 12:26

This release adds functionality to update the directory of side threads found at www.reddit.com/r/counting/wiki/directory. It roughly follows these steps:

  1. It gets all submissions to r/counting made within the last six months and finds a link to the parent submission of each submission. It uses this information to construct a chain for each thread from parent submission to child submission.
  2. It reads each row of each table in the directory and extracts
      • Which side thread it is, based on the link to the first thread
      • What the previous submission, comment and total count were
  3. It then uses the chain of submissions to find the latest submission of each thread type, and walks down the comments on each submission to the latest one. At each level of comments it goes to the first valid reply based on
      • A per-thread rule describing when a count is valid
      • A per-thread rule describing what looks like a count (to skip over mid-thread conversations)
  4. If the latest submission is not the same as the previous one, it tries to update the total count field.
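Step 1 can be sketched as a dictionary walk. The `parent_of` mapping (each submission id to the id of the submission it links back to, or `None`) is an assumed input format; any submission whose parent falls outside the six-month window starts a new chain, and the last element of each chain is the latest submission of that thread. Each parent is assumed to have at most one child.

```python
from typing import Dict, List, Optional


def build_chains(parent_of: Dict[str, Optional[str]]) -> List[List[str]]:
    """Turn child -> parent links into chains ordered from oldest to newest submission."""
    # Invert the links, keeping only parents that are themselves in the window.
    child_of = {parent: child for child, parent in parent_of.items() if parent in parent_of}
    chains = []
    for submission, parent in parent_of.items():
        if parent in parent_of:
            continue  # has a known parent, so it is not the start of a chain
        chain = [submission]
        while chain[-1] in child_of:
            chain.append(child_of[chain[-1]])
        chains.append(chain)
    return chains
```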

Once it's done all that, it outputs:

  • directory.md: The updated directory in markdown format

It optionally also outputs the following two files, if they are non-empty:

  • archived_threads.md: A table of all threads which were in the existing directory, but where no non-archived submissions were found in their chain
  • new_threads.txt: A list of all submissions made to r/counting which did not match any threads already found in the directory.

The rows corresponding to archived threads are not included in directory.md.

In order to do all of this, a side_threads module has been added, which keeps track of the rules for all known side threads. These can be retrieved by calling the get_side_thread(<name>) function, and the known side thread names are listed in side_threads.ini.

A lot of changes have also happened behind the scenes in terms of how the code is structured and information is passed around.

First public release

09 Jun 14:27
Add option to query pushshift instead of the reddit API

Pushshift is a huge archive of reddit posts (comments + submissions) which is
more suitable for our purposes than querying the reddit api. Since it is an
archive, it doesn't need to include functionality for adding posts, comments or
votes, and it focuses just on searching for posts and returning them.

In particular, it has quite a nice endpoint at `'submission/comment_ids'` which
returns the comment ids of all comments on a submission. We use this, combined
with the ability to retrieve comments by id in bulk to pull down a complete copy
of the thread we are interested in. In theory, we can get 1000 comments at once,
but doing this gives weird errors, so 100 is used instead. Typically, there's
still a speedup of more than a factor of 2 even with this setting.
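A minimal sketch of the batching, assuming a caller-supplied `fetch_batch` function (hypothetical here) that takes a list of comment ids and returns the corresponding comments from whichever bulk-lookup endpoint is in use. Keeping the fetch behind a parameter lets the batch size of 100 be changed in one place and lets the sketch run offline.

```python
from typing import Callable, Iterator, List, TypeVar

T = TypeVar("T")


def chunked(ids: List[str], size: int = 100) -> Iterator[List[str]]:
    """Yield successive batches of at most `size` ids."""
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


def fetch_all(ids: List[str], fetch_batch: Callable[[List[str]], List[T]], size: int = 100) -> List[T]:
    """Fetch every comment by id, one batch of at most `size` ids per request."""
    comments: List[T] = []
    for batch in chunked(ids, size):
        comments.extend(fetch_batch(batch))
    return comments
```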

In order to proceed from here, it was necessary to add a bunch of code to
recreate post and thread objects locally, so that the thread walking code is
able to work with the offline archive. This has the benefit of making much of
the other code simpler, since the semantics of these objects are clearer than
the strings and lists which were passed around before.

The logging code has been updated to make use of this last feature, combined
with the power of pandas dataframes.