## Recommendation Engine

**Goal:** Translate historical NYT bestseller dynamics into clear, rule-based book recommendations that can later be surfaced in Streamlit and explained via an LLM.

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

In [2]:
DATA_PATH = "../data/processed/history/nyt_history_weekly.csv"

df = pd.read_csv(
    DATA_PATH,
    parse_dates=["published_date", "bestsellers_date"]
)

df.head()


Unnamed: 0,published_date,bestsellers_date,list_name,title,author,primary_isbn13,publisher,rank,rank_last_week,weeks_on_list,amazon_product_url,book_image,description
0,2025-07-13,2025-06-28,Hardcover Fiction,DON'T LET HIM IN,Lisa Jewell,9781668033876,Atria,1,0,1,https://www.amazon.com/dp/1668033879?tag=thene...,https://static01.nyt.com/bestsellers/images/97...,A man with dark secrets in his past may cause ...
1,2025-07-13,2025-06-28,Hardcover Fiction,ATMOSPHERE,Taylor Jenkins Reid,9780593158715,Ballantine,2,1,4,https://www.amazon.com/dp/0593158717?tag=thene...,https://static01.nyt.com/bestsellers/images/97...,"In the summer of 1980, Joan Goodwin begins tra..."
2,2025-07-13,2025-06-28,Hardcover Fiction,A MOTHER'S LOVE,Danielle Steel,9780593498736,Delacorte,3,0,1,https://www.amazon.com/dp/0593498739?tag=thene...,https://static01.nyt.com/bestsellers/images/97...,After her handbag is stolen during a trip to P...
3,2025-07-13,2025-06-28,Hardcover Fiction,NEVER FLINCH,Stephen King,9781668089330,Scribner,4,3,5,https://www.amazon.com/dp/1668089335?tag=thene...,https://static01.nyt.com/bestsellers/images/97...,Holly Gibney does double duty by helping head ...
4,2025-07-13,2025-06-28,Hardcover Fiction,BURY OUR BONES IN THE MIDNIGHT SOIL,V.E. Schwab,9781250320520,Tor,5,2,3,https://www.amazon.com/dp/1250320526?tag=thene...,https://static01.nyt.com/bestsellers/images/97...,Stories set in Santo Domingo de la Calzada in ...


In [3]:
df = df.sort_values(["title", "published_date"]).reset_index(drop=True)

# rank_change: current - last_week (negative = improvement)
# note: rank_last_week == 0 usually means "new this week"
df["rank_change"] = df["rank"] - df["rank_last_week"].replace(0, np.nan)

df[["title","published_date","rank","rank_last_week","rank_change"]].head(10)


Unnamed: 0,title,published_date,rank,rank_last_week,rank_change
0,A FORBIDDEN ALCHEMY,2025-07-20,5,0,
1,A MOTHER'S LOVE,2025-07-13,3,0,
2,A MOTHER'S LOVE,2025-07-20,15,3,12.0
3,ALCHEMISED,2025-10-12,1,0,
4,ALCHEMISED,2025-10-19,3,1,2.0
5,ALCHEMISED,2025-10-26,5,3,2.0
6,ALCHEMISED,2025-11-02,5,5,0.0
7,ALCHEMISED,2025-11-09,5,5,0.0
8,ALCHEMISED,2025-11-16,5,5,0.0
9,ALCHEMISED,2025-11-23,3,5,-2.0
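
To make the sign convention concrete, here is a minimal sketch on a toy frame. The three rows are copied from the output above; the `toy` name is illustrative only. Replacing 0 with NaN keeps debut weeks from registering as a rank change.

```python
import numpy as np
import pandas as pd

# Toy rows copied from the output above; `toy` is an illustrative name.
toy = pd.DataFrame({
    "title": ["A MOTHER'S LOVE", "A MOTHER'S LOVE", "ALCHEMISED"],
    "rank": [3, 15, 3],
    "rank_last_week": [0, 3, 5],
})

# 0 means "not on the list last week", so treat it as missing, not as rank 0.
toy["rank_change"] = toy["rank"] - toy["rank_last_week"].replace(0, np.nan)

print(toy)
```

The debut row gets NaN, the drop from #3 to #15 gets +12, and the climb from #5 to #3 gets -2.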


### Build “Trending Up” candidates

Category 1:

What is the goal of this category?

- We want to identify books that are gaining momentum right now: not just popular overall, but improving compared to last week. This roughly mimics how real recommendation systems surface trending items.

A book is Trending Up if:

1. It appears on the list for at least 2 weeks (so it’s not a one-off)
2. It is not new this week (rank_last_week != 0)
3. Its rank improved this week (rank_change < 0)

Candidates are then ranked by biggest improvement, then better current rank, then most weeks on list.

Latest weekly snapshot per title

In [4]:
latest_per_title = (
    df.sort_values(["title", "published_date"])
      .groupby("title", as_index=False)
      .tail(1)
)

latest_per_title = latest_per_title.reset_index(drop=True)
latest_per_title.head()

Unnamed: 0,published_date,bestsellers_date,list_name,title,author,primary_isbn13,publisher,rank,rank_last_week,weeks_on_list,amazon_product_url,book_image,description,rank_change
0,2025-07-20,2025-07-05,Hardcover Fiction,A FORBIDDEN ALCHEMY,Stacey McEwan,9781668076187,Saga,5,0,1,https://www.amazon.com/dp/1668076187?tag=thene...,https://static01.nyt.com/bestsellers/images/97...,"As war looms, Nina Harrow must make a fate-alt...",
1,2025-07-20,2025-07-05,Hardcover Fiction,A MOTHER'S LOVE,Danielle Steel,9780593498736,Delacorte,15,3,2,https://www.amazon.com/dp/0593498739?tag=thene...,https://static01.nyt.com/bestsellers/images/97...,After her handbag is stolen during a trip to P...,12.0
2,2026-01-18,2026-01-03,Hardcover Fiction,ALCHEMISED,SenLinYu,9780593972700,Del Rey,2,5,15,https://www.amazon.com/dp/0593972708?tag=thene...,https://static01.nyt.com/bestsellers/images/97...,"After the war, an imprisoned alchemist is sent...",-3.0
3,2025-10-26,2025-10-11,Hardcover Fiction,ALCHEMY OF SECRETS,Stephanie Garber,9781250789150,Flatiron,6,0,1,https://www.amazon.com/dp/125078915X?tag=thene...,https://static01.nyt.com/bestsellers/images/97...,Holland St. James must retrieve an ancient obj...,
4,2025-10-05,2025-09-20,Hardcover Fiction,AMONG THE BURNING FLOWERS,Samantha Shannon,9781639738861,Bloomsbury,4,0,1,https://www.amazon.com/dp/1639736018?tag=thene...,https://static01.nyt.com/bestsellers/images/97...,In this installment of the Roots of Chaos seri...,


Appearance count per title

In [5]:
title_counts = df.groupby("title").size().rename("total_appearances")
latest_per_title = latest_per_title.merge(title_counts, on="title", how="left")
latest_per_title[["title","published_date","rank","rank_last_week","rank_change","total_appearances"]].head()

Unnamed: 0,title,published_date,rank,rank_last_week,rank_change,total_appearances
0,A FORBIDDEN ALCHEMY,2025-07-20,5,0,,1
1,A MOTHER'S LOVE,2025-07-20,15,3,12.0,2
2,ALCHEMISED,2026-01-18,2,5,-3.0,14
3,ALCHEMY OF SECRETS,2025-10-26,6,0,,1
4,AMONG THE BURNING FLOWERS,2025-10-05,4,0,,1
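
Note that `total_appearances` counts rows in this dataset, while `weeks_on_list` is the running count reported in the data, and the two can disagree: ALCHEMISED has 14 appearances here but 15 weeks on the list in its latest row. A minimal sketch for flagging such titles, assuming a mismatch means weeks are missing from the history file (other causes are possible); `flag_missing_weeks` is a hypothetical helper, not part of the notebook.

```python
import pandas as pd

def flag_missing_weeks(latest: pd.DataFrame) -> pd.DataFrame:
    """Return titles whose weeks_on_list exceeds the number of rows we hold."""
    out = latest.copy()
    out["missing_weeks"] = out["weeks_on_list"] - out["total_appearances"]
    cols = ["title", "weeks_on_list", "total_appearances", "missing_weeks"]
    return out.loc[out["missing_weeks"] > 0, cols]

# Values copied from the outputs above.
sample = pd.DataFrame({
    "title": ["A MOTHER'S LOVE", "ALCHEMISED"],
    "weeks_on_list": [2, 15],
    "total_appearances": [2, 14],
})
flagged = flag_missing_weeks(sample)
print(flagged)
```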


Filter Trending Up

In [6]:
trending_up = latest_per_title[
    (latest_per_title["total_appearances"] >= 2) &
    (latest_per_title["rank_last_week"] != 0) &
    (latest_per_title["rank_change"].notna()) &
    (latest_per_title["rank_change"] < 0)
].copy()

trending_up.shape

(6, 15)

Sort & select columns

In [7]:
trending_up = trending_up.sort_values(
    by=["rank_change", "rank", "weeks_on_list"],
    ascending=[True, True, False]
)

trending_up_display = trending_up[[
    "title", "author", "published_date", "rank", "rank_last_week",
    "rank_change", "weeks_on_list", "publisher"
]].reset_index(drop=True)

trending_up_display.head(15)

Unnamed: 0,title,author,published_date,rank,rank_last_week,rank_change,weeks_on_list,publisher
0,AN INSIDE JOB,Daniel Silva,2025-08-24,8,13,-5.0,4,Harper
1,ALCHEMISED,SenLinYu,2026-01-18,2,5,-3.0,15,Del Rey
2,BROKEN COUNTRY,Clare Leslie Hall,2025-10-05,11,13,-2.0,26,Simon & Schuster
3,BRIMSTONE,Callie Hart,2026-01-18,5,6,-1.0,7,Forever
4,QUICKSILVER,Callie Hart,2026-01-18,9,10,-1.0,11,Forever
5,GONE BEFORE GOODBYE,Reese Witherspoon and Harlan Coben,2026-01-18,10,11,-1.0,12,Grand Central


In [None]:
trending_up_display["reason"] = (
    "Rank improved by " + (-trending_up_display["rank_change"]).astype(int).astype(str) +
    " vs last week (from #" + trending_up_display["rank_last_week"].astype(int).astype(str) +
    " to #" + trending_up_display["rank"].astype(int).astype(str) + ")."
)

trending_up_display.head(10)

Unnamed: 0,title,author,published_date,rank,rank_last_week,rank_change,weeks_on_list,publisher,reason
0,AN INSIDE JOB,Daniel Silva,2025-08-24,8,13,-5.0,4,Harper,Rank improved by 5 vs last week (from #13 to #8).
1,ALCHEMISED,SenLinYu,2026-01-18,2,5,-3.0,15,Del Rey,Rank improved by 3 vs last week (from #5 to #2).
2,BROKEN COUNTRY,Clare Leslie Hall,2025-10-05,11,13,-2.0,26,Simon & Schuster,Rank improved by 2 vs last week (from #13 to #...
3,BRIMSTONE,Callie Hart,2026-01-18,5,6,-1.0,7,Forever,Rank improved by 1 vs last week (from #6 to #5).
4,QUICKSILVER,Callie Hart,2026-01-18,9,10,-1.0,11,Forever,Rank improved by 1 vs last week (from #10 to #9).
5,GONE BEFORE GOODBYE,Reese Witherspoon and Harlan Coben,2026-01-18,10,11,-1.0,12,Grand Central,Rank improved by 1 vs last week (from #11 to #...


- These are books whose rank improved in their most recent appearance compared with the week before, excluding new entries.
- We prioritize bigger improvements and strong current rank.
- This method surfaces “momentum” rather than long-term dominance.
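
Because `latest_per_title` keeps each title's last appearance, "last week" is relative to that row's own date: the AN INSIDE JOB row is from 2025-08-24 and BROKEN COUNTRY from 2025-10-05, while the others are from 2026-01-18. If the category should reflect only the current list, one option is to filter to the most recent `published_date` first. A minimal sketch, assuming the same column names; `trending_up_this_week` is a hypothetical helper, not part of the notebook.

```python
import pandas as pd

def trending_up_this_week(latest: pd.DataFrame) -> pd.DataFrame:
    """Keep only improving, non-debut titles from the most recent list date."""
    current_week = latest["published_date"].max()
    mask = (
        (latest["published_date"] == current_week)
        & (latest["rank_last_week"] != 0)
        & (latest["rank"] < latest["rank_last_week"])
    )
    return latest.loc[mask].sort_values(
        ["rank_change", "rank", "weeks_on_list"],
        ascending=[True, True, False],
    )

# Rows copied from the Trending Up output above.
sample = pd.DataFrame({
    "title": ["AN INSIDE JOB", "ALCHEMISED", "BRIMSTONE"],
    "published_date": pd.to_datetime(["2025-08-24", "2026-01-18", "2026-01-18"]),
    "rank": [8, 2, 5],
    "rank_last_week": [13, 5, 6],
    "rank_change": [-5.0, -3.0, -1.0],
    "weeks_on_list": [4, 15, 7],
})
current = trending_up_this_week(sample)
print(current["title"].tolist())
```

With this filter, AN INSIDE JOB drops out because its last appearance predates the latest list.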

### Consistent Top Performers

What is the goal of this category?

- This category highlights books that are reliably popular over time, not just temporarily spiking. These books don’t just trend — they stay popular.

A Consistent Top Performer is a book that:

- Appears many weeks on the list
- Maintains a strong average rank
- Does not fluctuate wildly

Aggregate weekly performance per title

In [9]:
consistent_df = (
    df.groupby("title")
      .agg(
          total_appearances=("published_date", "count"),
          avg_rank=("rank", "mean"),
          median_rank=("rank", "median"),
          max_weeks_on_list=("weeks_on_list", "max")
      )
      .reset_index()
)
consistent_df.head()

Unnamed: 0,title,total_appearances,avg_rank,median_rank,max_weeks_on_list
0,A FORBIDDEN ALCHEMY,1,5.0,5.0,1
1,A MOTHER'S LOVE,2,9.0,9.0,2
2,ALCHEMISED,14,4.357143,5.0,15
3,ALCHEMY OF SECRETS,1,6.0,6.0,1
4,AMONG THE BURNING FLOWERS,1,4.0,4.0,1


Filter for consistency

In [10]:
MIN_WEEKS = 5
TOP_RANK_THRESHOLD = 10

consistent_df = consistent_df[
    (consistent_df["total_appearances"] >= MIN_WEEKS) &
    (consistent_df["avg_rank"] <= TOP_RANK_THRESHOLD)
].copy()  # copy so later column assignments don't target a filtered view
consistent_df.head()

Unnamed: 0,title,total_appearances,avg_rank,median_rank,max_weeks_on_list
2,ALCHEMISED,14,4.357143,5.0,15
10,ATMOSPHERE,14,4.928571,3.5,17
15,BRIMSTONE,6,3.5,4.0,7
17,BUCKEYE,5,9.2,7.0,5
28,EXIT STRATEGY,5,9.0,9.0,5


Sort by reliability

In [None]:
consistent_df = consistent_df.sort_values(
    by=["median_rank", "total_appearances", "avg_rank"],
    ascending=[True, False, True]
)
consistent_df.head()

Unnamed: 0,title,total_appearances,avg_rank,median_rank,max_weeks_on_list
116,THE WIDOW,10,2.0,1.5,11
108,THE SECRET OF SECRETS,16,3.0,2.5,17
10,ATMOSPHERE,14,4.928571,3.5,17
43,KATABASIS,6,6.0,3.5,6
15,BRIMSTONE,6,3.5,4.0,7


In [13]:
consistent_df["recommendation_reason"] = (
    "Appeared on the bestseller list for "
    + consistent_df["total_appearances"].astype(str)
    + " weeks with a median rank of #"
    # format as "1.5" / "5" instead of truncating 1.5 to 1 with astype(int)
    + consistent_df["median_rank"].map("{:g}".format)
)
consistent_df["recommendation_reason"].head()


116    Appeared on the bestseller list for 10 weeks w...
108    Appeared on the bestseller list for 16 weeks w...
10     Appeared on the bestseller list for 14 weeks w...
43     Appeared on the bestseller list for 6 weeks wi...
15     Appeared on the bestseller list for 6 weeks wi...
Name: recommendation_reason, dtype: object

Top 10

In [14]:
TOP_N = 10
consistent_top_recommendations = consistent_df.head(TOP_N)

consistent_top_recommendations


Unnamed: 0,title,total_appearances,avg_rank,median_rank,max_weeks_on_list,recommendation_reason
116,THE WIDOW,10,2.0,1.5,11,Appeared on the bestseller list for 10 weeks w...
108,THE SECRET OF SECRETS,16,3.0,2.5,17,Appeared on the bestseller list for 16 weeks w...
10,ATMOSPHERE,14,4.928571,3.5,17,Appeared on the bestseller list for 14 weeks w...
43,KATABASIS,6,6.0,3.5,6,Appeared on the bestseller list for 6 weeks wi...
15,BRIMSTONE,6,3.5,4.0,7,Appeared on the bestseller list for 6 weeks wi...
53,NOT QUITE DEAD YET,5,5.6,4.0,5,Appeared on the bestseller list for 5 weeks wi...
2,ALCHEMISED,14,4.357143,5.0,15,Appeared on the bestseller list for 14 weeks w...
62,REMAIN,6,7.0,7.0,6,Appeared on the bestseller list for 6 weeks wi...
63,RETURN OF THE SPIDER,6,7.5,7.0,7,Appeared on the bestseller list for 6 weeks wi...
17,BUCKEYE,5,9.2,7.0,5,Appeared on the bestseller list for 5 weeks wi...


- By filtering for minimum list appearances and strong average ranks, we capture reliable, sustained popularity.
- Median rank is used alongside average rank to reduce the impact of short-term rank fluctuations.
- These titles represent safe recommendations for readers who prefer proven bestsellers.
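
The filter above covers the first two criteria but does not measure the third ("does not fluctuate wildly"). One way to approximate it is the standard deviation of weekly rank per title. A minimal sketch on invented data; the `rank_std` column, the `MAX_RANK_STD` cutoff, and the `add_rank_volatility` helper are assumptions for illustration, not values from the analysis.

```python
import pandas as pd

MAX_RANK_STD = 3.0  # hypothetical cutoff; tune against the real data

def add_rank_volatility(history: pd.DataFrame) -> pd.DataFrame:
    """Aggregate per title and add the sample standard deviation of weekly rank."""
    return (
        history.groupby("title")
        .agg(
            total_appearances=("rank", "count"),
            avg_rank=("rank", "mean"),
            rank_std=("rank", "std"),  # NaN for titles with a single appearance
        )
        .reset_index()
    )

# Invented titles and ranks, for illustration only.
history = pd.DataFrame({
    "title": ["STEADY"] * 4 + ["VOLATILE"] * 4,
    "rank": [3, 4, 3, 4, 1, 15, 2, 14],
})
stats = add_rank_volatility(history)
stable = stats[stats["rank_std"] <= MAX_RANK_STD]
print(stable)
```

Titles with only one appearance have an undefined standard deviation and are excluded by the comparison, which is consistent with the `MIN_WEEKS` filter.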

### High Momentum / Rising Books

This category identifies books that are actively climbing the bestseller list, not just performing well overall. The signals it looks for are:

- Rank improvement over time (week-over-week)
- Negative rank_change values → book moved up the list
- Multiple improvements across weeks (not a one-off jump)

Sort

In [16]:
df_sorted = df.sort_values(
    ["title", "published_date"]
).copy()
df_sorted.head()

Unnamed: 0,published_date,bestsellers_date,list_name,title,author,primary_isbn13,publisher,rank,rank_last_week,weeks_on_list,amazon_product_url,book_image,description,rank_change
0,2025-07-20,2025-07-05,Hardcover Fiction,A FORBIDDEN ALCHEMY,Stacey McEwan,9781668076187,Saga,5,0,1,https://www.amazon.com/dp/1668076187?tag=thene...,https://static01.nyt.com/bestsellers/images/97...,"As war looms, Nina Harrow must make a fate-alt...",
1,2025-07-13,2025-06-28,Hardcover Fiction,A MOTHER'S LOVE,Danielle Steel,9780593498736,Delacorte,3,0,1,https://www.amazon.com/dp/0593498739?tag=thene...,https://static01.nyt.com/bestsellers/images/97...,After her handbag is stolen during a trip to P...,
2,2025-07-20,2025-07-05,Hardcover Fiction,A MOTHER'S LOVE,Danielle Steel,9780593498736,Delacorte,15,3,2,https://www.amazon.com/dp/0593498739?tag=thene...,https://static01.nyt.com/bestsellers/images/97...,After her handbag is stolen during a trip to P...,12.0
3,2025-10-12,2025-09-27,Hardcover Fiction,ALCHEMISED,SenLinYu,9780593972700,Del Rey,1,0,1,https://www.amazon.com/dp/0593972708?tag=thene...,https://static01.nyt.com/bestsellers/images/97...,"After the war, an imprisoned alchemist is sent...",
4,2025-10-19,2025-10-04,Hardcover Fiction,ALCHEMISED,SenLinYu,9780593972700,Del Rey,3,1,2,https://www.amazon.com/dp/0593972708?tag=thene...,https://static01.nyt.com/bestsellers/images/97...,"After the war, an imprisoned alchemist is sent...",2.0


Rank change (week-over-week)

In [17]:
df_sorted["rank_change"] = (
    df_sorted.groupby("title")["rank"].diff()
)
df_sorted["rank_change"].head()

0     NaN
1     NaN
2    12.0
3     NaN
4     2.0
Name: rank_change, dtype: float64

- rank_change < 0 → rank improved (moved up)
- rank_change > 0 → rank worsened
- NaN → first appearance (no previous week)
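
One caveat: `groupby("title")["rank"].diff()` compares each row with the title's previous appearance in the data, which is the previous week only if no weeks are missing. A minimal sketch that also computes the gap in days so non-consecutive rows can be masked; the `days_since_prev` and `rank_change_weekly` column names are assumptions, and the toy title and dates are invented.

```python
import pandas as pd

# Invented data: the third row is three weeks after the second.
toy = pd.DataFrame({
    "title": ["BOOK A"] * 3,
    "published_date": pd.to_datetime(["2025-07-13", "2025-07-20", "2025-08-10"]),
    "rank": [5, 3, 9],
}).sort_values(["title", "published_date"])

toy["rank_change"] = toy.groupby("title")["rank"].diff()
toy["days_since_prev"] = toy.groupby("title")["published_date"].diff().dt.days

# Keep rank_change only where the previous appearance was exactly one week earlier.
toy["rank_change_weekly"] = toy["rank_change"].where(toy["days_since_prev"] == 7)
print(toy)
```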

Momentum per book

In [19]:
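# NOTE: dropna() removes each title's first appearance, so `appearances`
# counts week-over-week changes, and avg_rank / best_rank exclude the debut
# week (ALCHEMISED debuted at #1 but shows best_rank 2 below).
momentum_df = (
    df_sorted
    .dropna(subset=["rank_change"])
    .groupby("title")
    .agg(
        appearances=("rank", "count"),
        avg_rank=("rank", "mean"),
        total_rank_change=("rank_change", "sum"),
        best_rank=("rank", "min")
    )
    .reset_index()
)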
momentum_df.head()


Unnamed: 0,title,appearances,avg_rank,total_rank_change,best_rank
0,A MOTHER'S LOVE,1,15.0,12.0,15
1,ALCHEMISED,13,4.615385,1.0,2
2,AN INSIDE JOB,3,9.0,7.0,6
3,ATMOSPHERE,13,5.153846,13.0,1
4,BILLION-DOLLAR RANSOM,1,11.0,5.0,11
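
Dropping the NaN rows before aggregating also drops each title's debut week from `avg_rank` and `best_rank`. To keep the debut week while still summing only defined changes, one option is to aggregate without `dropna`, since `sum` and `count` skip NaN by default. A minimal sketch using the first four ALCHEMISED ranks shown earlier; the `transitions` column name is an assumption.

```python
import pandas as pd

# Ranks copied from the first four ALCHEMISED rows shown earlier.
toy = pd.DataFrame({
    "title": ["ALCHEMISED"] * 4,
    "rank": [1, 3, 5, 5],
})
toy["rank_change"] = toy.groupby("title")["rank"].diff()

momentum = (
    toy.groupby("title")
    .agg(
        appearances=("rank", "count"),         # all weeks, debut included
        transitions=("rank_change", "count"),  # count() ignores the NaN debut row
        avg_rank=("rank", "mean"),
        total_rank_change=("rank_change", "sum"),
        best_rank=("rank", "min"),
    )
    .reset_index()
)
print(momentum)
```

Here `best_rank` is 1, reflecting the debut at #1, and `transitions` is one less than `appearances`.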


Rising books

In [22]:
rising_books = momentum_df[
    (momentum_df["appearances"] >= 2) &     # at least 2 week-over-week changes (3+ list appearances)
    (momentum_df["total_rank_change"] < 0)  # net upward movement
].sort_values("total_rank_change")

In [None]:
rising_books.head(10)


Unnamed: 0,title,appearances,avg_rank,total_rank_change,best_rank
38,THE CORRESPONDENT,8,6.875,-11.0,1
23,MONA'S EYES,5,7.6,-6.0,4
13,DUNGEON CRAWLER CARL,8,12.375,-3.0,8
45,THE KNIGHT AND THE MOTH,2,12.5,-2.0,12


- Only four titles show a net improvement in rank after the filter, which suggests that sustained climbs are rare in this dataset.
- “THE CORRESPONDENT” stands out with the largest net improvement (11 places) and a peak rank of #1.
- The other three titles improve by only 2 to 6 places in total, which is consistent with gradual growth rather than sharp spikes, although a net figure alone cannot rule out a single large jump.
- Books with fewer appearances but negative rank change may represent early-stage momentum worth monitoring.
- This category is best suited for “Trending Now” recommendations, highlighting books that are actively gaining popularity rather than already established.
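
Because the per-title differences telescope, `total_rank_change` equals the last rank minus the first rank, so it measures net movement and cannot distinguish a single jump from several improving weeks. If "multiple improvements across weeks" matters, counting the improving weeks is one option. A minimal sketch on invented data; `improving_weeks` is a hypothetical column name.

```python
import pandas as pd

# Invented data: both titles move from #12 to #6, by different paths.
toy = pd.DataFrame({
    "title": ["ONE JUMP"] * 4 + ["STEADY CLIMB"] * 4,
    "rank": [12, 12, 12, 6, 12, 10, 8, 6],
})
toy["rank_change"] = toy.groupby("title")["rank"].diff()

summary = (
    toy.groupby("title")
    .agg(
        first_rank=("rank", "first"),
        last_rank=("rank", "last"),
        total_rank_change=("rank_change", "sum"),
        # number of weeks in which the rank improved
        improving_weeks=("rank_change", lambda s: int((s < 0).sum())),
    )
    .reset_index()
)
print(summary)
```

Both toy titles have the same net change of -6, but only one of them improves in more than one week.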

---