Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add controversial ranking #3205

Conversation

iByteABit256
Copy link
Contributor

@iByteABit256 iByteABit256 commented Jun 19, 2023

Closes #2515
Added controversy rank property to posts and comments, and ability to sort by it.

Out of the ones previously discussed, I used the controversy_rank_ratio method, but this is still open for debate.
Any input is welcome.

@lionirdeadman
Copy link

I can't read this code but this should be disabled on clients (web and api) if the instance doesn't allow downvotes as this ranking method becomes useless.

Is this taken into account like the issue mentions?

@iByteABit256
Copy link
Contributor Author

iByteABit256 commented Jun 20, 2023

My thinking is that this should be done on the front end if possible, I've drafted a PR but I haven't written the hiding logic yet.
I think it would become unnecessarily messy overall if done on the back end.

The front end is going to do that job any way, so it doesn't really matter if the controversy rank exists or not in the API responses, does it?

Comment on lines 11 to 19
if downvotes <= 0 or upvotes <= 0 then
return 0;
else
return floor((upvotes + downvotes) * case when upvotes > downvotes then downvotes::float / upvotes::float else upvotes::float / downvotes::float end);
end if;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One thing that came up in #2515 was what the sorting algorithm looks like. The algorithm used by Reddit (in 2015) looks to me like it would work better. Comparison here: https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=400c37593a339dcdb53851e685ecaf75

Note that I changed this from the one in the issue to use a float for the Reddit one, which I think works better for controversy.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I updated the rest of them as well. I think as long as there is a maximum diagonal where values grow according to how close they are to the diagonal, then it's good enough. The Reddit one is quite larger in the diagonal since it raises the activity to the power of the ratio instead of multiplying, so it might avoid some incorrect sorts in close situations, but both are fine I think. My thinking was that since we're going to be running this function pretty heavily, we might as well make it as performant as possible, and powers are pretty bad in that sense.

@@ -347,6 +347,9 @@ impl<'a> CommentQuery<'a> {

query = match self.sort.unwrap_or(CommentSortType::Hot) {
CommentSortType::Hot => query.then_order_by(comment_aggregates::hot_rank.desc()),
CommentSortType::Controversial => {
query.then_order_by(comment_aggregates::controversy_rank.desc())

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we don't do the Reddit method of controversy calculation, should this add a second ORDER BY for activity as @dcormier suggested? Same applies for the other places that do this ordering.

@@ -100,6 +100,7 @@ impl<'a> PersonQuery<'a> {
SortType::Hot | SortType::Active | SortType::TopAll => {
query.order_by(person_aggregates::comment_score.desc())
}
SortType::Controversial => query.order_by(person_aggregates::comment_score.asc()),

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't look right to me, should we maintain a person_aggregates::controversy_score?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In what sense is a person controversial though? I find that idea pretty controversial itself :P

I just added that as a dumb fallback since there probably won't be such an option ever implemented in the UI, I could also put it along with SortType::Hot even if it's quite a different concept.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah right, so this was added because the same SortType is used here both for posts and for persons, that seems less than ideal. You could add it to the match pattern above together with hot, active, and top all, which also don't really apply to a person.

Might be better though to split that out, so that there's a PostSortType and PersonSortType, alongside the existing CommentSortType. That's a bigger change though 🙂 WDYT?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree it would make more sense to add a separate sort type for person, and then also get rid of the unused options (newcomments, hot, active)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah that would be quite better

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made PersonSortType like this:

pub enum PersonSortType {
  New,
  Old,
  MostComments,
  TopDay,
  TopWeek,
  TopMonth,
  TopYear,
  TopAll,
}

Do we need the same with CommunitySortType? If so, what values should be removed from it?

Copy link
Member

@dessalines dessalines left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The main issue with this PR, is that

scheduler.every(CTimeUnits::minutes(5)).run(move || {
update_hot_ranks(&mut conn_2, true);
update_controversy_ranks(&mut conn_2, true);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's the main issue with this PR, that should get you to rethink it. The only reason hot ranks need to be recalculated, is because its a decaying time function. I forget the term in SQL, but its essentially based on an unstable function.

A controversial sort, which works only on upvotes, downvotes, and score, is not that. It can be done entirely with vote triggers.

Just like the post_aggregates.score column, you need to update the vote triggers to run whatever logic necessary to update that controversial_rank column.

@Nutomic
Copy link
Member

Nutomic commented Jun 20, 2023

I can't read this code but this should be disabled on clients (web and api) if the instance doesn't allow downvotes as this ranking method becomes useless.

If votes are disabled, that comes back with LocalSite, and clients can hide this sort option accordingly.

@Nutomic Nutomic changed the title Closes #2515 Add controversial ranking Jun 22, 2023
@iByteABit256 iByteABit256 marked this pull request as draft June 23, 2023 17:53
@iByteABit256
Copy link
Contributor Author

Marking as draft for now, will open it again once I divide the Sort types and do some more testing

SortType::TopYear => PersonSortType::TopYear,
SortType::TopMonth => PersonSortType::TopMonth,
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bit awkward that this is needed, but I guess there is no other way to make search all work.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think the same should be done with CommunitySortType or is it too much?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It probably should, but you don't need to do that in this PR unless you want to. A lot of the Post sorts don't make sense for communities either.

@@ -35,7 +35,7 @@ impl fmt::Display for PostId {
/// The person id.
pub struct PersonId(pub i32);

#[derive(Debug, Copy, Clone, Hash, Eq, PartialEq, Serialize, Deserialize, Default)]
#[derive(Debug, Copy, Clone, Hash, Eq, PartialEq, PartialOrd, Serialize, Deserialize, Default)]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PartialOrd seems to be unused

@iByteABit256 iByteABit256 marked this pull request as ready for review June 30, 2023 23:35
crates/db_views_actor/src/person_view.rs Outdated Show resolved Hide resolved
crates/db_schema/src/lib.rs Outdated Show resolved Hide resolved
crates/db_schema/src/lib.rs Outdated Show resolved Hide resolved
New,
Old,
MostComments,
TopDay,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See comment above, but yeah I think you can get rid of all these Top sorts, and just add CommentScore

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the post_to_person_sort_type util, does it make sense to use CommentScore as a default so there aren't so many mappings because of the SortType's Top sorts?

pub fn post_to_person_sort_type(sort: SortType) -> PersonSortType {
  match sort {
    SortType::Active | SortType::Hot | SortType::Controversial => PersonSortType::CommentScore,
    SortType::New | SortType::NewComments => PersonSortType::New,
    SortType::MostComments => PersonSortType::MostComments,
    SortType::Old => PersonSortType::Old,
    _ => PersonSortType::CommentScore,
  }
}

SortType::TopYear => PersonSortType::TopYear,
SortType::TopMonth => PersonSortType::TopMonth,
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It probably should, but you don't need to do that in this PR unless you want to. A lot of the Post sorts don't make sense for communities either.

@iByteABit256 iByteABit256 force-pushed the lemmy-2515-controversial-posts-and-comments branch from c3a7270 to 11579d1 Compare July 3, 2023 15:55
@iByteABit256
Copy link
Contributor Author

@dessalines I've made the changes you requested in case you hadn't noticed

@dessalines
Copy link
Member

My bad for being so slow to get to this, I apologize.

@iByteABit256
Copy link
Contributor Author

No worries 😅

Copy link
Member

@dessalines dessalines left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perfect, thanks!

@dessalines dessalines merged commit c890797 into LemmyNet:main Jul 26, 2023
1 check passed
Nutomic pushed a commit that referenced this pull request Jul 27, 2023
* Added controversy rank property to posts and comments, and ability to sort by it

* Triggers instead of schedules tasks, integer -> double, TODO: comments don't seem to get updated with floats, divide SortTypes

* Created PersonSortType

* PersonSortType::MostComments case

* Removed unused PartialOrd trait

* Added new person sort type mappings

* SortType -> PersonSortType

* fixes

* cargo fmt

* fixes after merge with main

* Fixed bug in controversy rank trigger, removed TopX sorts from PersonSortType and added CommentScore instead

* Uncovered enum case

* clippy

* reset translation changes

* translations

* translations

* Added additional hot ordering on controversial posts and comments

* featured local and featured community added to controversy rank index, additional order_by removed (?), added post_score and post_count to PersonSortType

* Woodpecker rerun

* cargo fmt

* woodpecker rerun

* fixed controversy_rank order

* fix

* Readded migration as latest, removed second update statement for setting controversy rank
dessalines added a commit that referenced this pull request Jul 27, 2023
@phiresky
Copy link
Collaborator

I guess I'm kinda late with this.. but since controversy_rank seems purely computed from other columns in the same table, it looks like this could be done with a GENERATED ALWAYS AS controversial_rank(upvotes, downvotes) column which would have avoided the trigger changes and the expensive migration

@iByteABit256
Copy link
Contributor Author

iByteABit256 commented Jul 30, 2023

Didn't know about GENERATED ALWAYS AS, it would have been quite better indeed :/

It would store less data that way too

The good thing is that if the votes don't change that often for a large amount of posts, then those calculations will be avoided using triggers

@MV-GH MV-GH mentioned this pull request Oct 14, 2023
4 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Controversial posts and comments
6 participants