# Poppy Universe – Simulated User Interaction Data

Welcome to the **Poppy Universe user interaction notebook**!  
The star, planet, and moon datasets are already complete and correct. This notebook focuses on **generating realistic simulated user interaction data** for testing and exploring ML models in Python. It is a **sandbox environment** for development: once enough real user data exists, the models will be trained on that instead. The recommendation engine itself runs separately in C#.

---

## Goals

1. **Generate simulated user interactions**  
   - Simulate clicks, views, favorites, and ratings for stars, planets, and moons  
   - Include timestamps and varied user activity patterns  
   - Provide a starting dataset since we currently have no real user interactions  

2. **Explore interaction patterns**  
   - Count interactions per object and per user  
   - Analyze rating distributions and engagement trends  
   - Check for realistic patterns, e.g., some users more active than others  

3. **Prepare data for ML modeling**  
   - Ensure IDs and object types match the object datasets  
   - Build a CSV file that can be fed into Python ML models later  
   - Validate that popularity trends and user behaviors look reasonable  
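
The ID check in goal 3 could be sketched as follows. This is a minimal illustration, not the notebook's own validation code: the helper name `find_invalid_ids` and the toy `sample` frame are hypothetical, and the valid ID sets are assumed to be contiguous ranges matching the configured object counts rather than values read from the object CSVs.

```python
import pandas as pd


def find_invalid_ids(interactions, valid_ids):
    """Return rows whose (Object_Type, Object_ID) pair is not in valid_ids.

    valid_ids maps an object type name to the set of IDs known for that type.
    Unknown object types are treated as invalid.
    """
    is_valid = pd.Series(
        [
            oid in valid_ids.get(otype, set())
            for otype, oid in zip(interactions["Object_Type"], interactions["Object_ID"])
        ],
        index=interactions.index,
        dtype=bool,
    )
    return interactions[~is_valid]


# Assumed ID ranges (1..N per type); in practice these would come from the object datasets.
valid_ids = {
    "Star": set(range(1, 201)),
    "Planet": set(range(1, 101)),
    "Moon": set(range(1, 101)),
}

# Hypothetical sample: row 1 has an out-of-range planet ID, row 3 an unknown type.
sample = pd.DataFrame(
    {
        "Object_Type": ["Star", "Planet", "Moon", "Comet"],
        "Object_ID": [200, 101, 5, 1],
    }
)

invalid = find_invalid_ids(sample, valid_ids)
print(invalid)
```

An empty result means every interaction points at a known object; any returned rows should be fixed before the CSV is handed to a model.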

---

## Folder & File References

- **Data/Objects/Stars.csv** → Star dataset  
- **Data/Objects/Planets.csv** → Planet dataset  
- **Data/Objects/Moons.csv** → Moon dataset  
- **Data/processed/** → For merged or cleaned interaction datasets  
- **Outputs/csv/** → Simulated user interactions CSV  
- **Outputs/figures/** → Plots showing interaction patterns  

---

> Note: This notebook is focused **on creating and exploring the simulated user interactions** as a starting point. Later, once we have enough real user data, the models will be trained on that. The actual recommendation engine runs separately in C#.


In [None]:
# Imports
import os
import pandas as pd
import numpy as np
from datetime import datetime, timedelta

## Configuration
Set the number of users, objects, and interactions, and define the possible interaction types and their sampling weights.
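
Because the weights are passed to NumPy as probabilities, they must be non-negative and sum to 1. The sketch below shows one way to check that and to make sampling reproducible. The helper `check_weights` and the seed value `42` are assumptions for illustration; the notebook itself does not set a seed.

```python
import numpy as np

interaction_types = ["View", "Click", "Favorite", "Rate"]
interaction_weights = [0.5, 0.3, 0.15, 0.05]


def check_weights(labels, weights):
    """Raise ValueError unless weights form a probability vector over labels."""
    if len(labels) != len(weights):
        raise ValueError("labels and weights must have the same length")
    if any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative")
    if not np.isclose(sum(weights), 1.0):
        raise ValueError(f"weights must sum to 1, got {sum(weights)}")


check_weights(interaction_types, interaction_weights)

# A seeded Generator (the seed is arbitrary) gives the same draws on every run.
rng = np.random.default_rng(42)
draws = rng.choice(interaction_types, size=10_000, p=interaction_weights)

# Empirical share of each interaction type; should sit close to the weights.
shares = {t: float(np.mean(draws == t)) for t in interaction_types}
print(shares)
```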

In [220]:
# ===== CONFIG =====
num_users = 100
num_stars = 200
num_planets = 100
num_moons = 100
num_interactions = 10000

object_types = ['Star', 'Planet', 'Moon']
interaction_types = ['View', 'Click', 'Favorite', 'Rate']
interaction_weights = [0.5, 0.3, 0.15, 0.05]  # assumed sampling probabilities per type; must sum to 1
rating_scale = [1, 2, 3, 4, 5]

## Generate Object IDs
Assign unique IDs for stars, planets, and moons to link interactions.

In [221]:
# ===== Generate object IDs =====
star_ids = np.arange(1, num_stars+1)
planet_ids = np.arange(1, num_planets+1)
moon_ids = np.arange(1, num_moons+1)
object_id_map = {'Star': star_ids, 'Planet': planet_ids, 'Moon': moon_ids}

## Generate Interactions
Create simulated interactions by randomly sampling users, object types, object IDs, interaction types, ratings (if applicable), and timestamps. Timestamps are clustered toward recent activity using an exponential distribution.
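
As an isolated illustration of the timestamp step, the sketch below draws a days-ago offset from an exponential distribution and then sets the hour of day directly with `datetime.replace`, so the hour weights shape the time of day rather than acting as an extra offset. The function `sample_timestamp`, its parameter names, the fixed `now` value, and the seed are assumptions made for this example.

```python
from datetime import datetime, timedelta

import numpy as np

# Hour-of-day weights with an evening peak, normalized to sum to 1.
hour_probs = np.array([0.02] * 6 + [0.03] * 6 + [0.05] * 6 + [0.065] * 6)
hour_probs = hour_probs / hour_probs.sum()


def sample_timestamp(rng, now, hour_probs, scale_days=10.0, max_days=30.0):
    """Draw one past timestamp: recent days are likelier, hours follow hour_probs."""
    days_ago = min(float(rng.exponential(scale=scale_days)), max_days)
    hour = int(rng.choice(24, p=hour_probs))
    minute = int(rng.integers(0, 60))
    ts = (now - timedelta(days=days_ago)).replace(hour=hour, minute=minute)
    if ts > now:
        # Setting the hour can push a same-day draw into the future; move it back a day.
        ts -= timedelta(days=1)
    return ts


rng = np.random.default_rng(0)          # arbitrary seed for reproducibility
now = datetime(2025, 11, 20, 12, 0, 0)  # fixed reference time for the example
stamps = [sample_timestamp(rng, now, hour_probs) for _ in range(5000)]

# Share of timestamps falling between 18:00 and 23:59.
evening_share = float(np.mean([ts.hour >= 18 for ts in stamps]))
print(f"evening share: {evening_share:.2f}")
```

With these weights the evening block carries 0.39 / 0.99 of the probability mass, so the printed share should land near 0.39.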

In [222]:
# ===== Generate interactions =====
interactions = []

# Pre-calculate rating probabilities
rating_probs = [0.05, 0.1, 0.2, 0.3, 0.35]

# Hour-of-day weights with an evening peak (24 entries); the raw values sum to 0.99,
# so normalize them to sum to exactly 1.0
hour_probs = np.array([0.02]*6 + [0.03]*6 + [0.05]*6 + [0.065]*6)
hour_probs = hour_probs / hour_probs.sum()

# Weight object type selection by number of objects (so each individual object gets similar attention)
total_objects = num_stars + num_planets + num_moons
object_type_weights = [num_stars/total_objects, num_planets/total_objects, num_moons/total_objects]

# Create some user preference patterns (some users prefer certain object types)
user_preferences = {}
for user_id in range(1, num_users + 1):
    if np.random.random() < 0.3:
        user_preferences[user_id] = np.random.choice(object_types)

# Track user interaction history BY OBJECT TYPE for realistic repeat behavior
user_interacted_objects = {uid: {obj_type: set() for obj_type in object_types} for uid in range(1, num_users + 1)}

for i in range(1, num_interactions + 1):
    user_id = np.random.randint(1, num_users + 1)
    
    # Users with preferences interact MORE with their preferred type, but not exclusively
    if user_id in user_preferences and np.random.random() < 0.25:
        pref_type = user_preferences[user_id]
        other_types = [t for t in object_types if t != pref_type]
        obj_type = np.random.choice(
            [pref_type] + other_types,
            p=[0.5, 0.25, 0.25]
        )
    else:
        obj_type = np.random.choice(object_types, p=object_type_weights)  # Weighted by object count
    
    # 30% chance to re-interact with a previously seen object OF THE SAME TYPE
    if user_interacted_objects[user_id][obj_type] and np.random.random() < 0.3:
        obj_id = int(np.random.choice(list(user_interacted_objects[user_id][obj_type])))  # Ensure Python int
    else:
        obj_id = int(np.random.choice(object_id_map[obj_type]))  # Ensure Python int
        user_interacted_objects[user_id][obj_type].add(obj_id)
    
    interaction_type = np.random.choice(interaction_types, p=interaction_weights)

    if interaction_type == 'Rate':
        rating = np.random.choice(rating_scale, p=rating_probs)
    else:
        rating = np.nan

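    # Timestamp: exponential days-ago offset (capped at 30 days), then set the
    # hour of day from hour_probs so that activity actually peaks in the evening
    days_ago = min(np.random.exponential(scale=10), 30)
    hour = int(np.random.choice(range(24), p=hour_probs))
    now = datetime.now()
    timestamp = (now - timedelta(days=days_ago)).replace(
        hour=hour,
        minute=int(np.random.randint(0, 60))
    )
    if timestamp > now:  # keep simulated activity in the past
        timestamp -= timedelta(days=1)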

    interactions.append([i, user_id, obj_type, obj_id, interaction_type, rating, timestamp])

## Create DataFrame
Convert the interaction list into a DataFrame with appropriate column names.
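
Since only `Rate` interactions carry a rating, the rating column mixes integers with missing values and pandas stores it as floats by default. One option, sketched below under the assumption that integer ratings and categorical type columns are wanted downstream, is to cast explicitly. The two rows are made-up illustrative values and `df_typed` is a hypothetical name.

```python
import numpy as np
import pandas as pd

cols = ['Interaction_ID', 'User_ID', 'Object_Type', 'Object_ID',
        'Interaction_Type', 'Interaction_Rating', 'Timestamp']

# Illustrative rows only: one interaction without a rating, one with.
rows = [
    [1, 10, 'Moon', 79, 'View', np.nan, pd.Timestamp('2025-10-27 14:24:22')],
    [2, 6, 'Planet', 47, 'Rate', 4, pd.Timestamp('2025-11-17 18:42:21')],
]

df_typed = pd.DataFrame(rows, columns=cols).astype({
    'Interaction_Rating': 'Int64',   # nullable integer: keeps 4 as 4, missing as <NA>
    'Object_Type': 'category',
    'Interaction_Type': 'category',
})

print(df_typed.dtypes)
```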

In [223]:
# ===== Create DataFrame =====
cols = ['Interaction_ID','User_ID','Object_Type','Object_ID','Interaction_Type','Interaction_Rating','Timestamp']
df = pd.DataFrame(interactions, columns=cols)
df.head()

| | Interaction_ID | User_ID | Object_Type | Object_ID | Interaction_Type | Interaction_Rating | Timestamp |
|---|---|---|---|---|---|---|---|
| 0 | 1 | 10 | Moon | 79 | View | NaN | 2025-10-27 14:24:22.944331 |
| 1 | 2 | 6 | Planet | 47 | View | NaN | 2025-11-17 18:42:21.193449 |
| 2 | 3 | 31 | Star | 52 | View | NaN | 2025-10-27 22:54:05.245487 |
| 3 | 4 | 76 | Moon | 92 | View | NaN | 2025-11-14 04:45:09.154196 |
| 4 | 5 | 75 | Planet | 36 | View | NaN | 2025-11-08 22:50:31.860243 |


## Save to CSV
Save the simulated interaction data as a CSV file for use in the recommendation engine.

In [None]:
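# ===== Save to CSV =====
# __file__ is undefined in a plain Jupyter kernel, so fall back to the working directory
base_dir = os.path.dirname(__file__) if '__file__' in globals() else os.getcwd()
df.to_csv(os.path.join(base_dir, '../Input_Data/Simulated_User_Interactions.csv'), index=False)
print('CSV generated: Simulated_User_Interactions.csv')


CSV generated: Simulated_User_Interactions.csv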


### Summary of Patterns
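- Users are sampled uniformly, so per-user activity differs only through random variation (about 100 interactions per user on average).
- **User preferences**: ~30% of users have a preferred object type (Star/Planet/Moon). For 25% of their interactions the type is drawn with probabilities 0.5 (preferred) / 0.25 / 0.25; otherwise it is drawn in proportion to object counts. The net boost is modest: a Planet-preferring user picks planets about 31% of the time versus the 25% baseline.
- **Repeat behavior**: each interaction has a 30% chance of deliberately revisiting an object of the same type that the user has already seen. Fresh draws can also land on seen objects, so the observed repeat share is somewhat higher.
- **Rating bias**: all users share one rating distribution in this version; per-user rating tendencies (for example, a consistent offset of ~0.5 points) are not implemented in the generation code.
- **Temporal patterns**: hour sampling weights the evening (6 PM - midnight) most heavily, and the days-ago offset follows an exponential distribution (scale 10 days, capped at 30) that favors recent interactions.
- `View` is the most common interaction, followed by `Click`, `Favorite`, and `Rate`.
- Ratings skew high: 4 and 5 stars account for 65% of ratings, and 1 star for only 5%.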
- Enough structure to test collaborative filtering, user segmentation, and temporal recommendation logic.
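
A quick way to check patterns like these against generated data is sketched below. The helper `summarize_interactions` and the six-row toy frame are hypothetical; the column names follow the DataFrame built earlier in this notebook.

```python
import numpy as np
import pandas as pd


def summarize_interactions(df):
    """Compute a few headline statistics for an interaction table."""
    return {
        # Number of interactions per user
        "per_user": df.groupby("User_ID").size(),
        # Share of each interaction type
        "type_share": df["Interaction_Type"].value_counts(normalize=True),
        # Share of rows that repeat an earlier (user, type, object) combination
        "repeat_share": float(
            df.duplicated(subset=["User_ID", "Object_Type", "Object_ID"]).mean()
        ),
        # Mean rating over rated interactions (NaN values are skipped)
        "mean_rating": float(df["Interaction_Rating"].mean()),
    }


# Toy data for illustration only.
toy = pd.DataFrame(
    {
        "User_ID": [1, 1, 1, 2, 2, 3],
        "Object_Type": ["Star", "Star", "Moon", "Planet", "Planet", "Star"],
        "Object_ID": [5, 5, 2, 7, 8, 5],
        "Interaction_Type": ["View", "Click", "View", "View", "Rate", "Rate"],
        "Interaction_Rating": [np.nan, np.nan, np.nan, np.nan, 4, 5],
    }
)

summary = summarize_interactions(toy)
print(summary["per_user"])
print(summary["type_share"])
print(summary["repeat_share"], summary["mean_rating"])
```

Running the same function on the full simulated table gives numbers that can be compared directly with the configured weights and probabilities.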