# News item curation and scoring demonstration

The hero area of the BC Randonneurs home page should feature newsletter articles, chosen algorithmically.  Criteria for selecting articles should include:

- **Freshness:** Older articles should be automatically dropped after some time.
- **Quality:** We want to show articles that are interesting and well written.
- **Inspiration:** Images of majestic scenery would help to enhance the home page.
- **Diversity:** Images and articles that feature underrepresented populations may encourage reluctant prospective members to join.

The freshness of each item can automatically be inferred from its publication date.  For the other criteria, I propose that the newsletter editor should assign a *rating* to each article, ranging from 0 (✩✩✩✩✩) to 5 (⭐️⭐️⭐️⭐️⭐️), when uploading it to the server.

When serving the home page, the server would calculate a *score* for each article as follows:
$$ S = \frac{A}{1.5 ^ R - 1} $$
where
$$
\begin{align*}
  S &= \textrm{Score} \\
  A &= \textrm{Age of article (days since publication)} \\
  R &= \textrm{Rating (integer from 0 to 5 inclusive)}
\end{align*}
$$

The articles with the $n$ lowest scores would then be featured.
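
As a rough illustration of the formula and the selection rule, here is a minimal standalone sketch.  The names `score` and `pick_featured` and the sample articles are hypothetical, not part of the demo code.  Plain Python raises an error on division by zero, so this sketch assumes that a rating of 0 should map to an infinite score.

```python
import math

def score(age_days: float, rating: int) -> float:
    """Irrelevance score S = A / (1.5**R - 1); lower is better."""
    denominator = 1.5 ** rating - 1
    if denominator == 0:
        # Assumption: a rating of 0 gets an infinite score, so it ranks last.
        return math.inf
    return age_days / denominator

def pick_featured(items, n):
    """Return the n (title, age_days, rating) tuples with the lowest scores."""
    return sorted(items, key=lambda item: score(item[1], item[2]))[:n]

# Hypothetical articles: (title, age in days, rating)
items = [("A", 30, 5), ("B", 3, 1), ("C", 10, 3), ("D", 1, 0)]
featured = pick_featured(items, 2)
print([title for title, _, _ in featured])
```

In this sample, a 30-day-old five-star article still outranks a 3-day-old one-star article, which is the intended effect of the rating.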

To understand the formula, you can think of the score as a kind of exponential decay with age, where the rate of decay depends on the rating:
$$ S' = e^{- \frac{A}{1.5 ^ R - 1}} $$

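Each additional ⭐️ awarded to an article gives it roughly 1.5× the sustaining power (somewhat more at low ratings, where the $- 1$ term matters most).  The $- 1$ in the denominator ensures that an article with $R = 0$ ranks behind every rated article: the denominator is zero, so the score is infinite (or undefined on the day of publication, when $A = 0$).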
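
To make "sustaining power" concrete, the sketch below computes how many days an article of each rating takes to reach a given score, using $A = S \cdot (1.5^R - 1)$.  The threshold `target_score` is an arbitrary value chosen only for illustration.

```python
# Days needed to reach a given score S, for each rating R: A = S * (1.5**R - 1)
target_score = 10  # arbitrary threshold, for illustration only
days_to_reach = {r: target_score * (1.5 ** r - 1) for r in range(6)}

# Gain in sustaining power from each additional star (ratings 2 through 5)
ratios = {r: days_to_reach[r] / days_to_reach[r - 1] for r in range(2, 6)}

for r, days in days_to_reach.items():
    print(f"R={r}: {days:g} days")
for r, ratio in ratios.items():
    print(f"R={r - 1} -> R={r}: x{ratio:.3f}")
```

The ratios come out at 2.5, 1.9, about 1.71, and about 1.62, so the per-star gain is above 1.5× for the ratings in use and shrinks toward 1.5× as the rating grows.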

Since $S' = e^{-S}$ is a strictly decreasing function of $S$, ranking articles by highest $S'$ gives the same order as ranking them by lowest $S$.  We can therefore take the computational shortcut of skipping the exponentiation and picking the lowest scores $S$ directly.
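
The equivalence of the two rankings can be checked with a quick sketch.  The score values below are hypothetical.

```python
import math

# Hypothetical values of S for four articles
s_values = {"A": 4.55, "B": 6.0, "C": 4.21, "D": 52.8}

# Rank by lowest S ...
by_lowest_s = sorted(s_values, key=lambda k: s_values[k])
# ... and by highest S' = exp(-S)
by_highest_s_prime = sorted(s_values, key=lambda k: math.exp(-s_values[k]), reverse=True)

print(by_lowest_s == by_highest_s_prime)
```

Working with $S$ directly also sidesteps floating-point underflow: in double precision, `math.exp(-S)` becomes exactly `0.0` once $S$ exceeds roughly 745, so very stale articles would all tie under $S'$.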

---
Here is the core of the code for the demo.

In [1]:
from datetime import date
import numpy as np
import pandas as pd
from ipywidgets import DatePicker, IntSlider, interactive

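def scores(item: pd.Series, /, end_date: np.datetime64) -> pd.Series:
    """Daily scores for one news item (one row of df) over at most the 91 days up to end_date."""
    # Start at the publication date, or 91 days before end_date, whichever is later.
    start = max(item.pubdate, pd.Timestamp(end_date) - pd.Timedelta(days=91))
    dates = pd.date_range(start, end=end_date, freq='D', inclusive='both')
    ages = dates - item.pubdate
    # A rating of 0 makes the denominator zero, so the score is inf (NaN at age 0).
    return pd.Series(ages.days / (1.5 ** item.rating - 1), index=dates)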

def display_results(d: date, n: int):
    end_date = np.datetime64(d)
    # One row per news item, one column per day up to and including end_date.
    score_data = df.apply(scores, end_date=end_date, axis=1)
    #score_data.set_index(df.title, inplace=True)
    # Plot each item's score over time on a logarithmic y-axis.
    score_data.transpose().plot(logy=True, ylabel="Irrelevance score (lower is better)", legend='reverse', figsize=(15, 8))
    # Show the n items with the lowest scores on end_date.
    display(score_data.transpose().tail(1).transpose().sort_values(by=end_date).head(n))

Now we import the data.  If you edit the input file `news_items.csv`, you will need to run the notebook again (⏩) to reload the data.

In [2]:
df = pd.read_csv('news_items.csv',
    index_col=['title', 'author'],
    parse_dates=['pubdate'],
)
df

| title | author | pubdate | rating |
|---|---|---|---|
| Williams Lake NDTR 1000 | Dara Poon | 2024-10-02 | 5 |
| Bucaneer Report (Permanent #38) | Barry Monaghan | 2024-10-03 | 2 |
| Ranchland Randonée | Graham Fishlock | 2024-10-07 | 3 |
| Roll Up Yer Sleeves (Permanent #145) | Bob Goodison | 2024-11-04 | 3 |
| Croy Questionnaire: Anna Bonga & Mike Hagen | Mike Croy | 2024-11-15 | 5 |
| Perma-Light Bridal Falls - October | Karen Smith | 2024-11-17 | 0 |
| Perma-Light Bridal Falls - November | Karen Smith | 2024-11-18 | 2 |
| Remembrance Day Ride Report | Murray Tough | 2024-11-18 | 3 |
| Remembrance Day Ride Video | Rob Nygren | 2024-11-19 | 4 |
| Shuswap Shorelines | Bob Goodison | 2024-12-01 | 3 |


Pick a date, and the widget below computes the score of each news item on that date.  The $n$ items with the lowest scores would be featured on the home page.  The visualization at the bottom shows the evolution of the scores over time (plotted on a semilog scale for clarity).
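
For readers without a running notebook, the selection can also be sketched non-interactively.  This is a simplified stand-in for the widget, not the demo code itself: the helper `top_items` is hypothetical, the rows are copied from the data shown above, and 2024-12-08 is just an arbitrary example date.  It assumes that items published after the chosen date should be left out.

```python
import pandas as pd

# (title, pubdate, rating), copied from the news items listed above
rows = [
    ("Williams Lake NDTR 1000", "2024-10-02", 5),
    ("Bucaneer Report (Permanent #38)", "2024-10-03", 2),
    ("Ranchland Randonée", "2024-10-07", 3),
    ("Roll Up Yer Sleeves (Permanent #145)", "2024-11-04", 3),
    ("Croy Questionnaire: Anna Bonga & Mike Hagen", "2024-11-15", 5),
    ("Perma-Light Bridal Falls - October", "2024-11-17", 0),
    ("Perma-Light Bridal Falls - November", "2024-11-18", 2),
    ("Remembrance Day Ride Report", "2024-11-18", 3),
    ("Remembrance Day Ride Video", "2024-11-19", 4),
    ("Shuswap Shorelines", "2024-12-01", 3),
]
items = pd.DataFrame(rows, columns=["title", "pubdate", "rating"])
items["pubdate"] = pd.to_datetime(items["pubdate"])

def top_items(items: pd.DataFrame, on_date: str, n: int) -> pd.DataFrame:
    """Return the n items with the lowest scores on the given date."""
    age_days = (pd.Timestamp(on_date) - items["pubdate"]).dt.days
    # Rating 0 gives a zero denominator; pandas turns that into inf (or NaN at age 0).
    score = age_days / (1.5 ** items["rating"] - 1)
    published = age_days >= 0  # assumption: skip items not yet published
    return items.assign(score=score)[published].sort_values("score").head(n)

top = top_items(items, "2024-12-08", 5)
print(top[["title", "score"]])
```

On that date the five featured items would be Shuswap Shorelines, the Croy Questionnaire, the Remembrance Day Ride Video, the Remembrance Day Ride Report, and Williams Lake NDTR 1000, in that order.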

In [3]:
interactive(display_results,
    d=DatePicker(description="Date", value=date.today()),
    n=IntSlider(description="Top", value=5, min=1, max=10),
)
