# Poppy Universe – Simulated User × Type Interaction Data (Layer 3)

This notebook generates **synthetic semantic interaction data** for the third layer of the recommendation engine.  
Unlike layer 2 (which tracks object-level popularity), layer 3 focuses on **type-level patterns**, similar to Netflix’s genre matrices.

---

## Goals

1. **Generate type-based interaction data**
   - Users interact with *categories*, not specific objects  
     - Star Types → O/B/A/F/G/K/M  
     - Planet Types → Terrestrial / Gas Giant / Ice Giant / Dwarf  
     - Moon Parent Planet → All planets that have moons (e.g., Earth, Jupiter, etc.)

2. **Create a matrix-friendly dataset**
   - One row = one interaction: `(User_ID, Category_Type, Category_Value, Strength, Timestamp)`
   - Strength ∈ [1–5] → usable for matrix factorization

3. **Keep behavior realistic**
   - Users have type preferences  
   - Repeated interactions  
   - More recent activity gets more weight  
   - Higher chance to interact with “big categories” (e.g., Gas Giants > Dwarf Planets)

4. **Export dataset for C# MF layer**
   - CSV output  
   - Clean schema  
   - No dependency on object-level datasets

---

## Folder Structure

- **Data/Stars/Stars.csv** → Star dataset  
- **Data/Solar_System/Planets.csv** → Planet dataset  
- **Data/Solar_System/Moons.csv** → Moon dataset  
- **Data/processed/** → For merged or cleaned interaction datasets  
- **Input_Data/csv/** → Simulated user interactions CSV  
- **Outputs/figures/** → Plots showing interaction patterns 

---


> Note: This dataset **ONLY powers Layer 3’s semantic ranking logic** (matrix factorization).  
The actual recommendation engine still runs in C#.


In [None]:
# Imports
import os 
import pandas as pd
import numpy as np
from datetime import datetime, timedelta

## Configuration
Set the number of users, interactions, types, and weights

In [12]:
num_users = 100
num_interactions = 8000   # type interactions → fewer than object interactions

# Star types (spectral classes)
star_types = ['O', 'B', 'A', 'F', 'G', 'K', 'M']

# Planet types
planet_types = ['Terrestrial', 'Gas Giant', 'Ice Giant', 'Dwarf Planet']

# Moon parent-planet categories (fictional + real)
moon_parents = [
    'Earth', 'Mars', 'Jupiter', 'Saturn', 'Uranus', 'Neptune', 'Pluto', 'Eris', 'Haumea', 'Makemake'
]

category_types = ['Star', 'Planet', 'Moon']

# Weighted selection (more common types get more interactions)
category_weights = [0.45, 0.4, 0.15]  

## Probability configs

In [13]:
# Some users have a preferred category type (like Netflix genre affinity)
user_preferences = {}
for uid in range(1, num_users + 1):
    if np.random.random() < 0.35:
        user_preferences[uid] = np.random.choice(category_types)

# Each user stores past interacted values per category
memory = {
    uid: {
        'Star': set(),
        'Planet': set(),
        'Moon': set()
    } for uid in range(1, num_users + 1)
}

# Type-specific value lists
value_map = {
    'Star': star_types,
    'Planet': planet_types,
    'Moon': moon_parents
}

# Repeat-interaction probability
repeat_prob = 0.35

# Timestamp hour distribution
hour_probs = np.array([0.02]*6 + [0.035]*6 + [0.06]*6 + [0.085]*6)
hour_probs = hour_probs / hour_probs.sum()

## GENERATE INTERACTIONS

In [14]:
data = []

for i in range(1, num_interactions + 1):
    user_id = np.random.randint(1, num_users + 1)
    
    # If user prefers a category, bias toward it
    if user_id in user_preferences and np.random.random() < 0.4:
        chosen_cat = user_preferences[user_id]
    else:
        chosen_cat = np.random.choice(category_types, p=category_weights)
    
    # Repeat interaction behavior
    seen_values = list(memory[user_id][chosen_cat])
    if seen_values and np.random.random() < repeat_prob:
        chosen_val = np.random.choice(seen_values)
    else:
        chosen_val = np.random.choice(value_map[chosen_cat])
        memory[user_id][chosen_cat].add(chosen_val)
    
    # Interaction strength (1–5, matrix-friendly)
    strength = np.random.choice([1, 2, 3, 4, 5], p=[0.1, 0.15, 0.25, 0.25, 0.25])
    
    # Timestamp
    days_ago = min(np.random.exponential(scale=12), 45)
    hour = int(np.random.choice(range(24), p=hour_probs))
    timestamp = datetime.now() - timedelta(
        days=days_ago,
        hours=hour,
        minutes=np.random.randint(0, 60)
    )
    
    data.append([
        i,
        user_id,
        chosen_cat,
        chosen_val,
        strength,
        timestamp
    ])

## Create DataFrame
Convert the interaction list into a DataFrame with appropriate column names.

In [15]:
cols = ['Interaction_ID', 'User_ID', 'Category_Type', 'Category_Value', 'Strength', 'Timestamp']
df_semantic = pd.DataFrame(data, columns=cols)
df_semantic.head()

Unnamed: 0,Interaction_ID,User_ID,Category_Type,Category_Value,Strength,Timestamp
0,1,86,Star,K,4,2025-12-05 18:42:31.865100
1,2,50,Planet,Terrestrial,1,2025-11-28 04:57:19.070873
2,3,96,Planet,Dwarf Planet,1,2025-12-04 01:06:16.214604
3,4,54,Planet,Ice Giant,3,2025-11-20 07:14:48.239208
4,5,22,Planet,Ice Giant,5,2025-11-14 01:40:03.455302


## Save to CSV
Save the silumated interaction data as a CSV file for use in the recommendation engine.

In [None]:
df_semantic.to_csv(os.path.join(os.path.dirname(__file__), '../Input_Data/MF_Semantic_Type_Interactions.csv'), index=False)
print("CSV generated: Semantic_Type_Interactions.csv")

CSV generated: Semantic_Type_Interactions.csv


- Users interact with **types**, not individual objects  
- Perfect for matrix factorization (User × Category matrix)  
- Users show consistent preference patterns  
- Category diversity ensures embeddings don’t collapse  
- Timestamps skew recent (as real users behave)  
- Interaction strength 1–5 → training target for MF  
- Balanced between repeats and discoveries  