# Poppy Universe – Layer 3: Star Matrix Model

Welcome to the **Poppy Universe Layer 3 – Star Matrix notebook**!  
The star dataset is assumed to be correct at this point, so this notebook focuses on **building a matrix-based recommendation model** from simulated user interactions with star types. It is a **sandbox environment** for testing collaborative filtering before the engine consumes the results.

> Note: This notebook currently uses **simulated user interactions** to test the Star matrix.  
> Once we have enough real interactions, the same pipeline will process actual user data for production recommendations.

---

## Goals

1. **Prepare interaction data for matrix factorization**  
   - Map users to star types  
   - Include weighted interactions (views, clicks, favorites, ratings)  
   - Normalize scores for ML input

2. **Build the User × Star_Type matrix**  
   - Users in rows, star types in columns  
   - Populate with interaction strengths  

3. **Perform matrix factorization / prediction**  
   - Generate predicted scores for each user × star_type  
   - Save intermediate CSV for engine integration

4. **Analyze results**  
   - Identify top star types per user  
   - Visualize patterns across users and star types
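
Goal 1 mentions normalizing scores for ML input. A minimal sketch of min-max scaling is shown here, assuming strengths lie on the 1–5 scale; the helper name `min_max_scale` and its `low`/`high` parameters are hypothetical and are not used elsewhere in this notebook.

```python
import pandas as pd

def min_max_scale(strength: pd.Series, low: float = 1.0, high: float = 5.0) -> pd.Series:
    """Map strengths from [low, high] onto [0, 1] (hypothetical helper)."""
    return (strength - low) / (high - low)

strengths = pd.Series([1, 3, 5, 4])
scaled = min_max_scale(strengths)
print(scaled.tolist())  # [0.0, 0.5, 1.0, 0.75]
```

One caveat if this were combined with the factorization step: that step treats 0 as "no interaction", so a scaled strength of 0.0 would be skipped. Scaling onto a range that excludes 0 would avoid the collision.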

---

## Folder & File References

- **../../Input_Data/Stars.csv** → Star dataset  
- **../../../Input_Data/MF_Semantic_Type_Interactions.csv** → Simulated user interaction dataset (fallback when `backend_df` is not injected)  
- **../Files/Layer3_Star_Predictions.csv** → Final predictions for the engine  
- **Plots/** → Optional heatmaps or visualizations

---

> Note: This notebook covers only **the star component** of Layer 3. Planets and moons will get separate notebooks, and the results will be merged later.


## 0) Imports

In [1]:
import pandas as pd
import numpy as np
import os
from datetime import datetime, timedelta

import matplotlib.pyplot as plt
import seaborn as sns

from sklearn.preprocessing import MinMaxScaler
from sklearn.decomposition import TruncatedSVD

## 1) Load Data

In [3]:
# --- Load interaction dataset ---
# 'backend_df' is injected via papermill by the master notebook if backend data passed the checks
try:
    interactions = backend_df
    print("Using backend-provided interactions")
except NameError:
    # fallback to CSV if running standalone
    interactions = pd.read_csv("../../../Input_Data/MF_Semantic_Type_Interactions.csv")
    print("Using simulated CSV interactions")

# Ensure Timestamp is datetime
interactions['Timestamp'] = pd.to_datetime(interactions['Timestamp'])

# Preview
interactions.head()

Using simulated CSV interactions


|   | Interaction_ID | User_ID | Category_Type | Category_Value | Strength | Timestamp |
|---|---|---|---|---|---|---|
| 0 | 1 | 86 | Star | K | 4 | 2025-12-05 18:42:31.865100 |
| 1 | 2 | 50 | Planet | Terrestrial | 1 | 2025-11-28 04:57:19.070873 |
| 2 | 3 | 96 | Planet | Dwarf Planet | 1 | 2025-12-04 01:06:16.214604 |
| 3 | 4 | 54 | Planet | Ice Giant | 3 | 2025-11-20 07:14:48.239208 |
| 4 | 5 | 22 | Planet | Ice Giant | 5 | 2025-11-14 01:40:03.455302 |


**Explanation:**  
This cell loads the user × type interaction data (simulated unless `backend_df` is injected) and previews it. The key columns are:

- `Interaction_ID`: unique identifier for each interaction  
- `User_ID`: the user who performed the interaction  
- `Category_Type`: the kind of object the interaction belongs to (e.g., `Star`, `Planet`)  
- `Category_Value`: the specific value within the category (e.g., `K` for a star, `Dwarf Planet` for a planet)  
- `Strength`: numerical interaction strength (1–5), used as the matrix factorization target  
- `Timestamp`: when the interaction occurred  

This is the base data for computing features such as user-type preferences, recency-weighted strengths, and the semantic matrices for the third layer of the recommendation engine.
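
Before filtering, it can help to confirm that the table actually has the columns listed above. The following is a minimal sketch, not part of the pipeline: `validate_interactions` and `REQUIRED_COLUMNS` are hypothetical names, and the 1–5 range check assumes the `Strength` scale described in the explanation.

```python
import pandas as pd

# Column names taken from the preview above; the helper itself is hypothetical.
REQUIRED_COLUMNS = [
    "Interaction_ID", "User_ID", "Category_Type",
    "Category_Value", "Strength", "Timestamp",
]

def validate_interactions(df: pd.DataFrame) -> pd.DataFrame:
    """Return a checked copy of the interaction table, or raise ValueError."""
    missing = [c for c in REQUIRED_COLUMNS if c not in df.columns]
    if missing:
        raise ValueError(f"Missing columns: {missing}")
    out = df.copy()
    out["Timestamp"] = pd.to_datetime(out["Timestamp"])
    # Assumption: valid strengths lie on the 1-5 scale.
    if not out["Strength"].between(1, 5).all():
        raise ValueError("Strength values must lie in the range 1-5")
    return out

# Two rows copied from the preview (timestamps shortened).
sample = pd.DataFrame({
    "Interaction_ID": [1, 2],
    "User_ID": [86, 50],
    "Category_Type": ["Star", "Planet"],
    "Category_Value": ["K", "Terrestrial"],
    "Strength": [4, 1],
    "Timestamp": ["2025-12-05 18:42:31", "2025-11-28 04:57:19"],
})
checked = validate_interactions(sample)
print(checked.dtypes)
```

Failing early here gives a clearer error than a `KeyError` inside the pivot step.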

## 2) Filter out planet and moon data

In [4]:
# Keep only rows where Category_Type is "Star"
star_interactions = interactions[interactions['Category_Type'] == 'Star']

star_interactions.head()

|   | Interaction_ID | User_ID | Category_Type | Category_Value | Strength | Timestamp |
|---|---|---|---|---|---|---|
| 0 | 1 | 86 | Star | K | 4 | 2025-12-05 18:42:31.865100 |
| 6 | 7 | 25 | Star | B | 4 | 2025-12-05 08:54:21.193153 |
| 7 | 8 | 75 | Star | A | 4 | 2025-11-18 02:15:05.960719 |
| 11 | 12 | 69 | Star | F | 5 | 2025-12-03 00:49:45.966382 |
| 13 | 14 | 32 | Star | A | 2 | 2025-12-04 16:32:54.008194 |


## 3) Create User × Category Matrix

In [5]:
# Pivot: rows = users, cols = category values, values = max strength (or sum/mean if multiple)
user_category_matrix = star_interactions.pivot_table(
    index='User_ID', 
    columns='Category_Value', 
    values='Strength', 
    aggfunc='max',   # could also be sum or mean
    fill_value=0     # fills missing interactions with 0
)

# Move User_ID from the index into a regular column (later cells rely on this)
user_category_matrix = user_category_matrix.reset_index()

print(user_category_matrix.head())


Category_Value  User_ID  A  B  F  G  K  M  O
0                     1  5  5  2  4  5  5  4
1                     2  3  5  4  4  5  5  5
2                     3  3  3  5  5  4  5  4
3                     4  4  5  4  4  5  5  5
4                     5  5  4  4  5  5  2  3
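
The pivot above uses `aggfunc='max'` and notes that `sum` or `mean` are alternatives. The toy example below sketches how that choice changes the matrix when a user has several interactions with the same star type. The data and the `build_matrix` helper are made up for illustration.

```python
import pandas as pd

# Toy data: user 1 interacted with star type G twice (strengths 2 and 5).
toy = pd.DataFrame({
    "User_ID":        [1, 1, 1, 2],
    "Category_Value": ["G", "G", "K", "G"],
    "Strength":       [2, 5, 3, 4],
})

def build_matrix(df: pd.DataFrame, aggfunc: str) -> pd.DataFrame:
    """Pivot interactions into a User x Category matrix with the given aggregation."""
    return df.pivot_table(
        index="User_ID",
        columns="Category_Value",
        values="Strength",
        aggfunc=aggfunc,
        fill_value=0,  # missing user/type pairs become 0
    )

matrix_max = build_matrix(toy, "max")    # strongest single interaction
matrix_mean = build_matrix(toy, "mean")  # average strength
matrix_sum = build_matrix(toy, "sum")    # total; can exceed the 1-5 scale

print(matrix_max, matrix_mean, matrix_sum, sep="\n\n")
```

For user 1 and type G this gives 5 with `max`, 3.5 with `mean`, and 7 with `sum`. Only `max` and `mean` keep values on the original 1–5 scale.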


## 4) Matrix Factorization with SGD
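
The cell in this section can be read as gradient descent on a regularized squared-error objective. The formulas below are reconstructed from the update lines in the code, not stated by the notebook. $U_i$ and $V_j$ denote rows of `U` and `V`, and the sum runs only over entries with $R_{ij} > 0$.

```latex
\min_{U,V} \sum_{(i,j):\, R_{ij} > 0}
  \left[ \left(R_{ij} - U_i \cdot V_j\right)^2
  + \frac{\beta}{2}\left(\lVert U_i \rVert^2 + \lVert V_j \rVert^2\right) \right]

e_{ij} = R_{ij} - U_i \cdot V_j, \qquad
U_i \leftarrow U_i + \alpha\left(2\, e_{ij} V_j - \beta U_i\right), \qquad
V_j \leftarrow V_j + \alpha\left(2\, e_{ij} U_i - \beta V_j\right)
```

Here $\alpha$ is the learning rate (`alpha`) and $\beta$ the regularization weight (`beta`). Entries are visited in a fixed row-by-row order and are not shuffled, and the $V_j$ step uses the value of $U_i$ that was just updated.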

In [6]:
# Convert pivot table to numpy array (exclude User_ID column)
R = user_category_matrix.drop('User_ID', axis=1).values
num_users, num_items = R.shape
K = 3  # number of latent features

# Initialize user and item latent matrices randomly
np.random.seed(42)
U = np.random.rand(num_users, K)  # Users × Features
V = np.random.rand(num_items, K)  # Items × Features

# Hyperparameters
alpha = 0.01   # learning rate
beta = 0.02    # regularization term
iterations = 1000

# SGD loop
for it in range(iterations):
    for i in range(num_users):
        for j in range(num_items):
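            if R[i, j] > 0:  # zeros come from fill_value=0 and are treated as unobserved
                # Predicted strength: dot product of the user and item latent vectors
                pred = U[i, :].dot(V[j, :])
                # Prediction error for this entry
                e_ij = R[i, j] - pred
                # Gradient step with L2 regularization.
                # Note: the V update reads the already-updated U[i, :].
                U[i, :] += alpha * (2 * e_ij * V[j, :] - beta * U[i, :])
                V[j, :] += alpha * (2 * e_ij * U[i, :] - beta * V[j, :])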

# Reconstruct approximate matrix
R_hat = U.dot(V.T)

print("Original matrix:\n", R)
print("Approximated matrix:\n", R_hat)

Original matrix:
 [[5 5 2 4 5 5 4]
 [3 5 4 4 5 5 5]
 [3 3 5 5 4 5 4]
 [4 5 4 4 5 5 5]
 [5 4 4 5 5 2 3]
 [5 5 5 5 5 5 5]
 [5 5 5 5 5 5 5]
 [4 5 5 5 3 5 0]
 [5 5 5 5 5 5 5]
 [5 4 5 1 5 5 4]
 [5 5 5 5 5 5 3]
 [3 5 5 5 5 5 5]
 [4 3 5 5 4 5 5]
 [4 5 5 4 5 5 4]
 [5 2 5 3 5 3 5]
 [5 0 3 5 4 5 4]
 [5 5 4 0 5 5 3]
 [4 5 5 0 5 5 5]
 [5 5 5 5 4 5 4]
 [5 4 4 5 5 4 5]
 [5 4 5 5 5 5 5]
 [4 0 5 5 5 5 4]
 [5 5 5 5 5 5 5]
 [4 5 5 5 4 4 4]
 [5 5 4 5 4 4 5]
 [5 5 5 5 5 5 5]
 [5 5 0 0 5 5 5]
 [5 5 5 5 5 5 5]
 [5 4 4 5 5 5 4]
 [4 5 5 4 5 5 5]
 [5 5 4 5 5 5 5]
 [5 5 0 5 4 5 5]
 [5 0 5 4 5 0 5]
 [3 5 5 4 5 5 5]
 [5 5 2 0 5 4 5]
 [0 5 5 5 5 5 5]
 [5 5 4 4 5 5 5]
 [4 0 5 5 5 5 5]
 [5 4 5 5 5 4 5]
 [4 0 5 4 5 5 5]
 [5 5 3 4 4 5 4]
 [2 2 5 5 3 2 5]
 [5 4 5 4 5 5 5]
 [5 3 5 5 5 5 5]
 [5 5 5 4 5 5 5]
 [5 3 4 5 5 3 4]
 [5 4 5 5 5 5 4]
 [4 5 5 5 4 5 5]
 [5 1 3 4 4 5 5]
 [4 4 5 5 5 4 5]
 [5 5 5 5 5 5 5]
 [5 4 2 5 5 5 5]
 [5 4 5 5 5 4 4]
 [5 4 5 4 5 4 4]
 [5 4 5 5 5 3 5]
 [5 4 5 3 4 4 5]
 [5 3 5 5 5 5 5]
 [5 5 5 5 5 5

## 5) Convert Approximated Matrix Back to DataFrame

In [7]:
# Convert R_hat back to DataFrame
R_hat_df = pd.DataFrame(R_hat, columns=user_category_matrix.columns[1:])  # skip User_ID
R_hat_df['User_ID'] = user_category_matrix['User_ID'].values
# Optional: reorder columns so User_ID is first
cols = ['User_ID'] + [c for c in R_hat_df.columns if c != 'User_ID']
R_hat_df = R_hat_df[cols]

R_hat_df.head()


| Category_Value | User_ID | A | B | F | G | K | M | O |
|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 5.261151 | 4.628608 | 3.639428 | 3.149804 | 4.789282 | 4.88213 | 3.897102 |
| 1 | 2 | 3.834061 | 5.071705 | 4.671911 | 4.235918 | 4.116431 | 5.227182 | 4.415666 |
| 2 | 3 | 3.190381 | 4.3748 | 4.894918 | 4.908559 | 3.67774 | 4.098439 | 4.2706 |
| 3 | 4 | 4.489275 | 5.047683 | 4.541268 | 4.132878 | 4.522362 | 5.181424 | 4.415533 |
| 4 | 5 | 5.105428 | 3.062459 | 4.024846 | 4.637619 | 4.64083 | 2.260102 | 3.661674 |
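
One way to judge how well the reconstruction from section 4 fits is the root-mean-square error over the observed (non-zero) entries. The sketch below repeats the same per-entry updates on a small toy matrix. The `factorize` and `observed_rmse` helpers are hypothetical, and the sketch uses `np.random.default_rng` where the notebook uses `np.random.seed`, so its numbers are not comparable to the notebook's.

```python
import numpy as np

def factorize(R, k=3, alpha=0.01, beta=0.02, iterations=1000, seed=42):
    """Factorize R ~ U @ V.T with the per-entry update rule from section 4."""
    rng = np.random.default_rng(seed)
    num_users, num_items = R.shape
    U = rng.random((num_users, k))  # Users x Features
    V = rng.random((num_items, k))  # Items x Features
    rows, cols = np.nonzero(R)      # zeros are treated as unobserved
    for _ in range(iterations):
        for i, j in zip(rows, cols):
            e_ij = R[i, j] - U[i] @ V[j]
            U[i] += alpha * (2 * e_ij * V[j] - beta * U[i])
            V[j] += alpha * (2 * e_ij * U[i] - beta * V[j])
    return U, V

def observed_rmse(R, R_hat):
    """Root-mean-square error over the observed (non-zero) entries only."""
    mask = R > 0
    return float(np.sqrt(np.mean((R[mask] - R_hat[mask]) ** 2)))

# Toy matrix, not the notebook's data.
R_toy = np.array([
    [5, 3, 0, 1],
    [4, 0, 0, 1],
    [1, 1, 0, 5],
    [1, 0, 0, 4],
    [0, 1, 5, 4],
], dtype=float)

U_toy, V_toy = factorize(R_toy)
R_hat_toy = U_toy @ V_toy.T
print("RMSE on observed entries:", observed_rmse(R_toy, R_hat_toy))
```

Because the model is fit on the same entries it is scored on, this is a training error. A held-out split would be needed to estimate how well unseen user × star type pairs are predicted.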


## 6) Save Predicted Matrix to CSV

In [8]:
# Save as CSV for master notebook
R_hat_df.to_csv('../Files/Layer3_Star_Predictions.csv', index=False)

# Optional: preview
print(R_hat_df.head())

Category_Value  User_ID         A         B         F         G         K  \
0                     1  5.261151  4.628608  3.639428  3.149804  4.789282   
1                     2  3.834061  5.071705  4.671911  4.235918  4.116431   
2                     3  3.190381  4.374800  4.894918  4.908559  3.677740   
3                     4  4.489275  5.047683  4.541268  4.132878  4.522362   
4                     5  5.105428  3.062459  4.024846  4.637619  4.640830   

Category_Value         M         O  
0               4.882130  3.897102  
1               5.227182  4.415666  
2               4.098439  4.270600  
3               5.181424  4.415533  
4               2.260102  3.661674  
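
Goal 4 calls for identifying the top star types per user. A minimal sketch of that step follows, using the first three rows of the preview above as input. The `top_star_types` helper and the `Star_Type`/`Score` column names are hypothetical.

```python
import pandas as pd

# Values copied from the first three rows of the preview above.
preds = pd.DataFrame({
    "User_ID": [1, 2, 3],
    "A": [5.261151, 3.834061, 3.190381],
    "B": [4.628608, 5.071705, 4.374800],
    "F": [3.639428, 4.671911, 4.894918],
    "G": [3.149804, 4.235918, 4.908559],
    "K": [4.789282, 4.116431, 3.677740],
    "M": [4.882130, 5.227182, 4.098439],
    "O": [3.897102, 4.415666, 4.270600],
})

def top_star_types(df: pd.DataFrame, n: int = 3) -> pd.DataFrame:
    """Return the n highest-scoring star types per user in long format."""
    long = df.melt(id_vars="User_ID", var_name="Star_Type", value_name="Score")
    long = long.sort_values(["User_ID", "Score"], ascending=[True, False])
    return long.groupby("User_ID").head(n).reset_index(drop=True)

top3 = top_star_types(preds, n=3)
print(top3)
```

For these rows the top three are A, M, K for user 1; M, B, F for user 2; and G, F, B for user 3. Some predicted scores exceed 5, so a consumer that expects the 1–5 scale may want to clip them first.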
