Feature Request: Stackranking results based on recent dates/views #9

Open
cilles opened this issue Jul 5, 2022 · 5 comments


cilles commented Jul 5, 2022

When there are different results displaying for the same websites, I am getting searches that display older articles at the top (1 week or more, sometimes months ago), and newer articles from the past day or so at the bottom.

e.g. $site=u.today,boost=4 would show articles with the following dates (in order):

  • 3 weeks ago
  • 2 weeks ago
  • 1 day ago
  • 2 days ago

If the goggle has just one website, this wouldn't be as much of a problem. But when I have other sites with the same boost ranking, it will also show older results for them as well, so I'll have to scroll past ~8 results before getting the more recent articles from those same websites.

If there were an attribute to specify how the results for individual sites are displayed (recent date vs views/hits), it would help to more finely tune search results.

For more news-oriented goggles, results could be based on most recent dates. But it would also give the option to specify more static websites like wiki or informational sites, based on views/hits.

Collaborator

remusao commented Aug 18, 2022

Hi @cilles,

Thanks a lot for reaching out and writing detailed feedback. This is a great idea! If we were to add such a mechanism, how would you see it being implemented? Do you have some ideas on the syntax that would make sense for the instructions? This could be a binary switch like $recencyboost or maybe something more generic like $boost=recency to adjust the boost based on recency of the result (dynamically; I'd need to check how feasible that is, though). Wdyt?
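
One way to picture the `$boost=recency` idea is as a multiplier that shrinks as a result gets older. The sketch below is only an illustration: the exponential half-life model, the function name `recency_boost`, and the `half_life_days` parameter are assumptions made here, not anything Goggles is known to implement.

```python
from datetime import datetime, timedelta


def recency_boost(base_boost: float, published: datetime, now: datetime,
                  half_life_days: float = 7.0) -> float:
    """Scale base_boost by an exponential decay on the result's age.

    Hypothetical model: the boost halves every `half_life_days` days.
    """
    age_days = max((now - published).total_seconds() / 86400.0, 0.0)
    return base_boost * 0.5 ** (age_days / half_life_days)


# The four articles from the original report, all from a site with boost=4.
now = datetime(2022, 8, 18)
ages_in_days = {"3 weeks ago": 21, "2 weeks ago": 14, "1 day ago": 1, "2 days ago": 2}
scores = {
    label: recency_boost(4.0, now - timedelta(days=days), now)
    for label, days in ages_in_days.items()
}
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)
```

With this model the newest article keeps almost the full boost, while a three-week-old article keeps one eighth of it, so the order from the report would be reversed.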


devidw commented Aug 18, 2022

I think it would be powerful if time were not bound to the boost action but could be used generally as an option to filter down an instruction, like in a search operation.

Taken from Brave search: [screenshot: search brave]

One or more options to define a time span from today into the past, as well as the option to define a custom date range from X to Y would be fantastic.

Really useful when building event based goggles, news goggles, etc.

Syntax Thoughts

  • $timeto Some option to define the end point in time, defaults to today/now
  • $timefrom Another option to define the start date time, defaults to the earliest result, any time in the past
  • $timedur An option to set a time duration, possibly in ISO 8601 duration format. Because the default end date is now, this could express one week in the past or one month in the past, but also any custom duration.

Examples

Boost News article on specific topic inside specific timespan.

corona$site=nytimes.com,intitle,boost,timefrom=2021-10-15T12:00:00,timeto=2022-01-20T12:00:00

Exclude old entries, show only past year:

$discard
! Period 1 Year, see https://en.wikipedia.org/wiki/ISO_8601#Durations
$timedur=P1Y
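
To make the proposed syntax concrete, here is a minimal sketch of how an instruction line with the suggested options could be split into a pattern and an options map. It assumes the `pattern$option,option=value` shape used in the examples above; `parse_instruction` is a hypothetical helper, and the real Goggles parser may handle escaping and validation differently.

```python
from typing import Dict, Optional, Tuple


def parse_instruction(line: str) -> Tuple[str, Dict[str, Optional[str]]]:
    """Split 'pattern$opt1,opt2=value' into (pattern, {option: value or None}).

    Simplification: the first '$' separates pattern from options, and
    option values contain neither ',' nor '='.
    """
    pattern, _, raw_options = line.partition("$")
    options: Dict[str, Optional[str]] = {}
    for item in raw_options.split(","):
        if not item:
            continue  # tolerate empty segments such as a trailing comma
        key, has_value, value = item.partition("=")
        options[key] = value if has_value else None
    return pattern, options


pattern, options = parse_instruction(
    "corona$site=nytimes.com,intitle,boost,"
    "timefrom=2021-10-15T12:00:00,timeto=2022-01-20T12:00:00"
)
print(pattern, options)
```

Flag-style options such as `intitle` map to `None`, while `timefrom`, `timeto`, and `timedur` keep their raw string values for a later date-handling step.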

Author

cilles commented Aug 19, 2022

I agree with a lot of what @devidw suggested, however I also see value in adding a couple other instructions on top of his suggestion.

$timeto, $timefrom, and $timedur would allow for very fine tuning of results from specific sites (and could lead to some really cool goggles), however I feel that 2 additional, more generic instructions such as $timesort and $pagesort could appeal a bit more to the masses in creating new goggles.

Syntax Thoughts

  • $timesort Allows users to choose generic time-based sorting of the following options

    • newest Returning search results based on the most recent date containing some or part of the search phrase.
    • oldest Returning search results based on the oldest date containing some or part of the search phrase.
  • $pagesort Allows users to choose a generic relevancy-based sorting of the following options

    • relevant Returning search results based on pages matching most or all of the search phrase.
    • popular Returning search results based on the most popular pages yielding some or part of the search phrase.

The default, if not specified by the user would be

  • $timesort=newest
  • $pagesort=relevant

Conflicting Instructions

In terms of a combination of both suggestions having conflicting instructions, I would stack rank them as the following:

$timeto / $timefrom > $timedur > $pagesort > $timesort

This would allow for the instructions to become more generic as you become less specific with what you are searching for.

  • $timeto/$timefrom would allow for results based on a specific time range.
  • $timedur would allow for more generic results based on a time range of X amount of days.
  • $timesort would allow for the most generic time sorting, only sorting in order of the oldest or newest results.

In terms of what takes precedence between $timesort and $pagesort, I feel that $pagesort should take the cake. If not specifying a specific time range or lookback time, then the next most pressing search option would be relevance/popularity over time.
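
The rule that $pagesort wins over $timesort can be read as a two-level sort: order by the page criterion first and use time only to break ties. The sketch below illustrates that reading; the `Result` fields `relevance` and `popularity` are hypothetical scores invented for the example, not data that Brave Search is known to expose.

```python
from dataclasses import dataclass
from datetime import date
from typing import List


@dataclass
class Result:
    title: str
    published: date
    relevance: float  # hypothetical relevance score
    popularity: int   # hypothetical view count


def rank(results: List[Result], pagesort: str = "relevant",
         timesort: str = "newest") -> List[Result]:
    """Order by the $pagesort criterion, breaking ties with $timesort."""
    if pagesort == "relevant":
        page_key = lambda r: r.relevance
    else:
        page_key = lambda r: r.popularity
    # Sort by the lower-precedence key first. Python's sort is stable, so
    # the second sort keeps this time order among results that tie.
    by_time = sorted(results, key=lambda r: r.published,
                     reverse=(timesort == "newest"))
    return sorted(by_time, key=page_key, reverse=True)


results = [
    Result("a", date(2022, 1, 1), relevance=0.9, popularity=10),
    Result("b", date(2022, 6, 1), relevance=0.5, popularity=50),
    Result("c", date(2022, 3, 1), relevance=0.5, popularity=50),
]
print([r.title for r in rank(results)])
```

Here "a" comes first despite being the oldest, because relevance outranks time; "b" and "c" tie on relevance, so the newer "b" precedes "c" under the default `timesort=newest`.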

Examples

Default Search Without Suggested Instructions

Without specifying any of the above suggested instructions, the following would yield the most relevant search results for "replacing a civic headlight" as $pagesort=relevant and $timesort=newest would be the defaults (with $pagesort taking precedence over $timesort) and it would attempt to yield results matching most or all of the search phrase.

$site=youtube.com

This would match how goggles currently perform without any of the suggested instructions by me or @devidw.

Most Popular Videos of the 2010's

This instruction would yield the most popular videos within the set time range of 2010 - 2020, with $timeto and $timefrom taking precedence over $pagesort as we don't care about popular videos outside of the set time range.

$site=youtube.com,timefrom=2010-01-01,timeto=2019-12-31,pagesort=popular

Popular News This Week

If trying to yield the most popular news results from the NYT for this week, using the following it will look for results from this week, then sort based on popularity:

$site=newyorktimes.com,timedur=7d,pagesort=popular

Caveat

The only foreseeable problem I am noticing with this would be wanting to change the stack ranking of instructions to better suit a goggle. For example, if I wanted a "time machine" goggle that always looked for the oldest results first, it would not be possible as $pagesort will always take precedence over $timesort. Unless you are specifying a specific time range, there is no way to yield generic results based on the oldest results. A possible workaround would be functionality to change stack ranking of instructions, though that's getting into a much deeper discussion.

But to at least explore the concept, if there were a means of moving the precedence of an instruction up or down using - or +, one might be able to solve this by using either of the following:

$site=newyorktimes.com,-pagesort=relevant,timesort=oldest

or

$site=newyorktimes.com,+timesort=oldest

Which in theory would then allow for $timesort > $pagesort. In this regard, you could move any instruction up or down as much as you desire in the ranking.

So in another example, let's say I have the following:

$site=newyorktimes.com,boost=3

Using that instruction it will display more results for the NYT even if the results are less relevant than other sites (even with $pagesort=relevant as the default, it ranks lower in the stack ranking). But if we take the overall stack ranking of

$discard > $boost > $downrank > $timeto / $timefrom > $timedur > $pagesort > $timesort

We could select more relevant results by moving $pagesort up in the ranking, and only boost a site if the results yielded are relevant:

$site=newyorktimes.com,boost=3,++++pagesort=relevant

By moving $pagesort to the 2nd highest precedence, only superseded by $discard, and thus making the precedence ranking

$discard > $pagesort > $boost > $downrank > $timeto / $timefrom > $timedur > $timesort

or...

$site=newyorktimes.com,----boost=3

Which would ultimately place $pagesort > $boost by changing the precedence ranking to

$discard > $downrank > $timeto / $timefrom > $timedur > $pagesort > $boost > $timesort
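
The + and - prefixes described above can be sketched as moving one entry up or down in an ordered list, one position per sign. This is only an illustration of the proposal; `DEFAULT_PRECEDENCE` and `shift_precedence` are hypothetical names, and clamping at the ends of the list is an assumption made here.

```python
from typing import List

# Overall stack ranking from the comment, highest precedence first.
DEFAULT_PRECEDENCE = [
    "discard", "boost", "downrank", "timeto/timefrom",
    "timedur", "pagesort", "timesort",
]


def shift_precedence(order: List[str], name: str, signs: str) -> List[str]:
    """Return a copy of `order` with `name` moved one slot per sign.

    '+' moves the instruction up (toward index 0), '-' moves it down.
    Moves past either end of the list are clamped (an assumption).
    """
    new_order = list(order)
    index = new_order.index(name)
    delta = signs.count("-") - signs.count("+")
    target = min(max(index + delta, 0), len(new_order) - 1)
    new_order.insert(target, new_order.pop(index))
    return new_order


print(" > ".join(shift_precedence(DEFAULT_PRECEDENCE, "pagesort", "++++")))
print(" > ".join(shift_precedence(DEFAULT_PRECEDENCE, "boost", "----")))
```

The two printed rankings correspond to the `++++pagesort=relevant` and `----boost=3` examples: the first places pagesort directly after discard, the second places boost between pagesort and timesort.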


devidw commented Aug 19, 2022

Having these sorting options on top of the available and suggested time options would be an incredible enrichment @cilles.


I could also imagine having the $timedur option reacting dynamically to both the $timefrom and $timeto option.

  • If $timefrom and $timeto are not set in the goggle instruction line, then $timeto is equal to now and the duration goes back in time.
  • The same applies if only $timeto is specified: the duration goes back in time from $timeto.
  • If on the other hand $timefrom is set, the duration will go the given time duration into the future instead of the past.
  • If both $timefrom and $timeto are available, the duration should be invalid and ignored or marked as error.

Examples

! One month in the past from today/now
$timedur=P1M

! One month in the past from 22nd February 2022
$timedur=P1M,timeto=2022-02-22

! One month in the future from 22nd February 2022
$timedur=P1M,timefrom=2022-02-22

! Should throw an error
$timedur,timefrom,timeto
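
The four rules for combining $timedur with $timefrom and $timeto can be sketched as a small resolver that returns a (start, end) window. This is a rough illustration under stated assumptions: `parse_duration` and `resolve_window` are hypothetical helpers, only a subset of ISO 8601 durations is handled, and years and months are approximated as 365 and 30 days because `timedelta` has no calendar-aware units.

```python
import re
from datetime import datetime, timedelta
from typing import Optional, Tuple

_DURATION = re.compile(
    r"^P(?:(\d+)Y)?(?:(\d+)M)?(?:(\d+)W)?(?:(\d+)D)?"
    r"(?:T(?:(\d+)H)?(?:(\d+)M)?(?:(\d+)S)?)?$"
)


def parse_duration(text: str) -> timedelta:
    """Parse a subset of ISO 8601 durations (approximation: Y=365d, M=30d)."""
    match = _DURATION.match(text)
    if not match or not any(match.groups()):
        raise ValueError(f"invalid duration: {text!r}")
    years, months, weeks, days, hours, minutes, seconds = (
        int(group or 0) for group in match.groups()
    )
    return timedelta(days=365 * years + 30 * months + 7 * weeks + days,
                     hours=hours, minutes=minutes, seconds=seconds)


def resolve_window(now: datetime, timedur: Optional[str] = None,
                   timefrom: Optional[datetime] = None,
                   timeto: Optional[datetime] = None
                   ) -> Tuple[Optional[datetime], datetime]:
    """Return (start, end); a start of None means 'earliest result'."""
    if timedur is not None and timefrom is not None and timeto is not None:
        raise ValueError("timedur cannot be combined with both timefrom and timeto")
    if timedur is None:
        return timefrom, timeto or now
    span = parse_duration(timedur)
    if timefrom is not None:
        return timefrom, timefrom + span  # duration runs forward from timefrom
    end = timeto or now
    return end - span, end                # duration runs backward from the end


now = datetime(2022, 8, 19)
print(resolve_window(now, "P1M"))
print(resolve_window(now, "P1M", timeto=datetime(2022, 2, 22)))
print(resolve_window(now, "P1M", timefrom=datetime(2022, 2, 22)))
```

Raising an error for the over-specified case is one of the two behaviours suggested above; silently ignoring $timedur would be the other.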

In combination with sorting as suggested by @cilles this would open an entire new world:

! Give me all security incidents from the last week sorted from oldest to newest:
$discard
security-incident$intitle,timedur=P1W,timesort=oldest

Extending with date offsets

Inspired by @cilles's thoughts on modifications using + and -, I am thinking of also allowing offset periods/durations in the $timefrom and $timeto options to define offsets into the past or future relative to the current date. You would be able to set a time period instead of a fixed date as the value, and the interpreter would determine the date dynamically relative to the current date and time.

! Results from yesterday only
! Since $timeto is used we go into the past with the offset and duration
! Minus one day, duration for one day into the past
$discard
$timedur=P1D,timeto=P1D

! Results from 2 days and 1 hour in the future relative to the date 1 month ago
! Since $timefrom is used with a duration value we go one month back in time
! But the duration will go into the future not the past
! Minus one month, duration of plus 2 days and 1 hour
$discard
$timedur=P2DT1H,timefrom=P1M
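
The offset idea can be sketched by letting an endpoint be either a fixed ISO date or a duration that is subtracted from the current time. The helpers below (`parse_duration`, `resolve_endpoint`, `offset_window`) are hypothetical, months and years are approximated as 30 and 365 days, and the two-day-one-hour duration is written as `P2DT1H` because ISO 8601 puts a `T` before time components.

```python
import re
from datetime import datetime, timedelta
from typing import Optional, Tuple

_DURATION = re.compile(
    r"^P(?:(\d+)Y)?(?:(\d+)M)?(?:(\d+)W)?(?:(\d+)D)?"
    r"(?:T(?:(\d+)H)?(?:(\d+)M)?(?:(\d+)S)?)?$"
)


def parse_duration(text: str) -> timedelta:
    """Parse a subset of ISO 8601 durations (approximation: Y=365d, M=30d)."""
    match = _DURATION.match(text)
    if not match or not any(match.groups()):
        raise ValueError(f"invalid duration: {text!r}")
    y, mo, w, d, h, mi, s = (int(group or 0) for group in match.groups())
    return timedelta(days=365 * y + 30 * mo + 7 * w + d,
                     hours=h, minutes=mi, seconds=s)


def resolve_endpoint(value: str, now: datetime) -> datetime:
    """A value starting with 'P' is an offset into the past; else a fixed date."""
    if value.startswith("P"):
        return now - parse_duration(value)
    return datetime.fromisoformat(value)


def offset_window(now: datetime, timedur: str,
                  timefrom: Optional[str] = None,
                  timeto: Optional[str] = None) -> Tuple[datetime, datetime]:
    """Resolve (start, end) when exactly one endpoint is given with a duration."""
    if (timefrom is None) == (timeto is None):
        raise ValueError("give exactly one of timefrom or timeto")
    span = parse_duration(timedur)
    if timefrom is not None:
        start = resolve_endpoint(timefrom, now)
        return start, start + span  # duration runs forward from timefrom
    end = resolve_endpoint(timeto, now)
    return end - span, end          # duration runs backward from timeto


now = datetime(2022, 8, 19, 12, 0)
print(offset_window(now, "P1D", timeto="P1D"))       # the "yesterday" example
print(offset_window(now, "P2DT1H", timefrom="P1M"))  # the "one month ago" example
```

Read literally, the "yesterday" example yields the 24 hours that ended 24 hours ago rather than the previous calendar day; snapping to midnight would be a separate design decision.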

@walking-octopus

This makes me dream of a search engine that would allow users to upload tiny self-contained Lisp programs for ranking... Of course, not many people know Lisp, this might end up with a DDoS attack, and it might not be very easy or efficient to embed Lisp, but still, one can dream.
