# Poppy Universe – Simulated Interaction Data for Neural Network Layer (Layer 4)

This notebook generates **synthetic user × category interactions** designed
specifically for **Layer 4**, the neural network–driven semantic recommender.

Layer 3 (Matrix Factorization) used a clean User × Type matrix.
But **Layer 4 models deeper patterns** using embeddings + an MLP.

---

## Goals (Layer 4 – Neural Network Data)

1. **Generate training-ready interactions**
    - (User_ID, Category_Type, Category_Value, Strength, Timestamp)
    - Encodable into embeddings for NN input

2. **Preserve meaningful structure**
    - Users form habits (repeat interactions)
    - Some users have category preferences
    - Interactions skew toward the more heavily weighted category types (stars and planets)

3. **NN-friendly target variable**
    - Strength ∈ {1, 2, 3, 4, 5} → regression or 5-class classification

4. **Produce a realistic, diverse dataset**
    - NN learns: user embeddings, type embeddings, temporal patterns, affinity
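
As a hedged illustration of the third goal, the same `Strength` column can be framed either way. The helper names `to_regression_target` and `to_class_index` are hypothetical and not part of this notebook, and scaling to [0, 1] is only one possible choice for the regression target.

```python
import numpy as np

def to_regression_target(strength):
    """Scale Strength from 1..5 to a float in [0, 1] (an assumed convention)."""
    strength = np.asarray(strength, dtype=np.float32)
    return (strength - 1.0) / 4.0

def to_class_index(strength):
    """Shift Strength from 1..5 to class indices 0..4 for a 5-way classifier."""
    return np.asarray(strength, dtype=np.int64) - 1

strength = np.array([1, 3, 5])
y_reg = to_regression_target(strength)   # continuous target
y_cls = to_class_index(strength)         # categorical target
```

The regression form keeps the ordering of strengths, while the classification form treats the five levels as unordered labels.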

---

## Folder Structure (Layer 4 I/O)

 - **Input_Data/csv/** → Saved NN training dataset  
 - **Output_Data/figures/** → Optional plots 

---

> Note: Layer 4 learns **semantic preference patterns** via a neural network.  
It forms embeddings for: User_ID, Category_Type, Category_Value, Hour, Day.
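
A minimal sketch of how those five inputs might be turned into integer indices for embedding lookups. The function `encode_for_embeddings` and the `_idx` column names are hypothetical, and reading "Day" as day of week is an assumption, since the notebook does not say which notion of day Layer 4 uses.

```python
import pandas as pd

def encode_for_embeddings(df):
    """Map categorical columns to integer ids and derive Hour/Day indices."""
    out = pd.DataFrame(index=df.index)
    vocab_sizes = {}
    for col in ["User_ID", "Category_Type", "Category_Value"]:
        # factorize assigns 0..n-1 in sorted order of the observed values
        codes, uniques = pd.factorize(df[col], sort=True)
        out[col + "_idx"] = codes
        vocab_sizes[col] = len(uniques)
    ts = pd.to_datetime(df["Timestamp"])
    out["Hour_idx"] = ts.dt.hour        # 0..23
    out["Day_idx"] = ts.dt.dayofweek    # assumption: 0 = Monday .. 6 = Sunday
    vocab_sizes["Hour"] = 24
    vocab_sizes["Day"] = 7
    return out, vocab_sizes

sample = pd.DataFrame({
    "User_ID": [50, 74, 24],
    "Category_Type": ["Star", "Star", "Moon"],
    "Category_Value": ["M", "A", "Saturn"],
    "Timestamp": ["2025-11-30 02:02:11", "2025-12-06 20:59:09", "2025-10-26 01:39:58"],
})
encoded, vocab_sizes = encode_for_embeddings(sample)
```

The returned `vocab_sizes` gives the number of rows each embedding table would need for the data it was fitted on.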


In [None]:
# Imports
import os
import pandas as pd
import numpy as np
from datetime import datetime, timedelta

## Configuration
Set the number of users and interactions, the category vocabularies, and the category weights.

In [27]:
num_users = 120
num_interactions = 12000   # NN benefits from more data than MF

# Category Types
category_types = ['Star', 'Planet', 'Moon']

# Category Values (embeddable vocabularies)
star_types = ['O', 'B', 'A', 'F', 'G', 'K', 'M']
planet_types = ['Terrestrial', 'Gas Giant', 'Ice Giant', 'Dwarf Planet']
moon_parents = [
    'Earth', 'Mars', 'Jupiter', 'Saturn', 'Uranus', 'Neptune',
    'Pluto', 'Eris', 'Haumea', 'Makemake'
]

value_map = {
    'Star': star_types,
    'Planet': planet_types,
    'Moon': moon_parents
}

# Sampling weights, in the same order as category_types (stars + planets dominate)
category_weights = [0.4, 0.4, 0.2]

## User Preference Bias

In [28]:
# Some users prefer a specific category
user_preferences = {}
for uid in range(1, num_users + 1):
    if np.random.random() < 0.40:   # 40% of users have a preference
        user_preferences[uid] = np.random.choice(category_types)

## Memory (for repeat interactions)

In [29]:
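# Per-user record of the values already seen in each category type
memory = {
    uid: {cat: set() for cat in category_types}
    for uid in range(1, num_users + 1)
}

# Probability of revisiting an already-seen value instead of drawing a new one
repeat_prob = 0.33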

## Timestamp Distributions

In [30]:
# Hour-of-day weights in four six-hour blocks (00-05, 06-11, 12-17, 18-23), normalized to sum to 1
hour_probs = np.array([0.02]*6 + [0.04]*6 + [0.07]*6 + [0.12]*6)
hour_probs /= hour_probs.sum()
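
To make the effect of the normalization concrete, the short check below sums the probability mass in each six-hour block. The raw weights add up to 1.5, so the blocks end up with roughly 8%, 16%, 28%, and 48% of the mass. The variable `block_share` is introduced here for illustration only.

```python
import numpy as np

hour_probs = np.array([0.02]*6 + [0.04]*6 + [0.07]*6 + [0.12]*6)
hour_probs /= hour_probs.sum()

# Share of probability mass in each six-hour block (00-05, 06-11, 12-17, 18-23)
block_share = hour_probs.reshape(4, 6).sum(axis=1)
```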

## Generate Interactions

In [31]:
data = []

for i in range(1, num_interactions + 1):
    user_id = np.random.randint(1, num_users + 1)

    # Bias toward user preference
    if user_id in user_preferences and np.random.random() < 0.45:
        chosen_cat = user_preferences[user_id]
    else:
        chosen_cat = np.random.choice(category_types, p=category_weights)

    # Repeat interaction behavior
    seen_vals = list(memory[user_id][chosen_cat])
    if seen_vals and np.random.random() < repeat_prob:
        chosen_val = np.random.choice(seen_vals)
    else:
        chosen_val = np.random.choice(value_map[chosen_cat])
        memory[user_id][chosen_cat].add(chosen_val)

    # NN regression/classification target
    strength = np.random.choice([1, 2, 3, 4, 5],
                                p=[0.10, 0.15, 0.25, 0.25, 0.25])

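    # Timestamp generation: step back in days, then set the hour of day from
    # hour_probs (subtracting `hour` as an offset would not follow that distribution)
    days_ago = min(np.random.exponential(scale=14), 60)
    hour = int(np.random.choice(range(24), p=hour_probs))

    # The extra day keeps every timestamp in the past after the hour is replaced
    timestamp = (datetime.now() - timedelta(days=days_ago + 1)).replace(
        hour=hour,
        minute=int(np.random.randint(0, 60))
    )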

    data.append([
        i,
        user_id,
        chosen_cat,
        chosen_val,
        strength,
        timestamp
    ])
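
The loop above mixes two sources of category choice, so the expected category shares differ slightly from `category_weights`. The sketch below works through that arithmetic under two assumptions: every user is equally likely to be drawn, and preferred categories are assigned uniformly, as in the preference cell. The names `p_has_pref`, `p_use_pref`, `p_branch`, and `expected_share` are introduced here for illustration.

```python
import numpy as np

category_types = ['Star', 'Planet', 'Moon']
category_weights = np.array([0.4, 0.4, 0.2])

p_has_pref = 0.40   # fraction of users assigned a preferred category
p_use_pref = 0.45   # chance such a user follows the preference on a given draw

# Probability that a single interaction takes the preference branch
p_branch = p_has_pref * p_use_pref

# Preferred categories are uniform, so the branch contributes 1/3 to each type;
# the remaining draws follow category_weights
expected_share = p_branch / len(category_types) + (1 - p_branch) * category_weights
```

Under these assumptions the expected shares are about 38.8% Star, 38.8% Planet, and 22.4% Moon. A single run will deviate from this, because the realized shares depend on which users happen to receive which preference.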

## Create DataFrame

In [32]:
cols = [
    'Interaction_ID',
    'User_ID',
    'Category_Type',
    'Category_Value',
    'Strength',
    'Timestamp'
]

df_nn = pd.DataFrame(data, columns=cols)
df_nn.head()

|   | Interaction_ID | User_ID | Category_Type | Category_Value | Strength | Timestamp |
|---|---|---|---|---|---|---|
| 0 | 1 | 50 | Star | M | 1 | 2025-11-30 02:02:11.902417 |
| 1 | 2 | 74 | Star | A | 5 | 2025-12-06 20:59:09.242640 |
| 2 | 3 | 24 | Moon | Saturn | 2 | 2025-10-26 01:39:58.938016 |
| 3 | 4 | 101 | Planet | Dwarf Planet | 3 | 2025-11-10 03:54:02.183020 |
| 4 | 5 | 110 | Star | M | 1 | 2025-11-25 05:24:18.392856 |
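
Before saving, it may be worth confirming that every row pairs a value with the right category type and keeps `Strength` in range. The sketch below is one way to do that. `find_invalid_rows` is a hypothetical helper, and the small `sample` frame (with one deliberately bad row) stands in for `df_nn`.

```python
import pandas as pd

value_map = {
    'Star': ['O', 'B', 'A', 'F', 'G', 'K', 'M'],
    'Planet': ['Terrestrial', 'Gas Giant', 'Ice Giant', 'Dwarf Planet'],
    'Moon': ['Earth', 'Mars', 'Jupiter', 'Saturn', 'Uranus', 'Neptune',
             'Pluto', 'Eris', 'Haumea', 'Makemake'],
}

def find_invalid_rows(df, value_map):
    """Return rows whose value is outside its type's vocabulary or whose Strength is not in 1..5."""
    valid_value = pd.Series(
        [v in value_map.get(t, []) for t, v in zip(df['Category_Type'], df['Category_Value'])],
        index=df.index,
    )
    valid_strength = df['Strength'].between(1, 5)   # inclusive on both ends
    return df[~(valid_value & valid_strength)]

sample = pd.DataFrame({
    'Category_Type': ['Star', 'Moon', 'Planet'],
    'Category_Value': ['M', 'G', 'Dwarf Planet'],   # 'G' is a star type, not a moon parent
    'Strength': [1, 4, 3],
})
bad = find_invalid_rows(sample, value_map)
```

On the generated data, calling `find_invalid_rows(df_nn, value_map)` should return an empty frame if the loop behaved as intended.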


## Save Output

In [None]:
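base_dir = os.path.dirname(__file__) if '__file__' in globals() else os.getcwd()  # __file__ is usually undefined in notebook kernels
df_nn.to_csv(os.path.join(base_dir, '../Input_Data/NN_Semantic_Interactions.csv'), index=False)
print("CSV generated: NN_Semantic_Interactions.csv")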

CSV generated: NN_Semantic_Interactions.csv


# Summary (Layer 4 – Neural Net Data)

- Fully synthetic dataset for neural recommender training  
- User embeddings + type embeddings + value embeddings  
- Strength value supports regression or classification  
- Temporal features available (hour/day for embedding)  
- Realistic user behavior:
   - Preferences  
   - Repeats  
   - Popularity biases  
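
To show how the embedding inputs listed above could feed an MLP, here is a NumPy-only forward pass. It is a sketch, not the Layer 4 architecture: the embedding size, hidden size, random initialization, and the seven-value `Day` vocabulary (day of week) are all assumptions, and the weights are untrained. The vocabulary sizes for users, types, and values follow this notebook's configuration (120 users, 3 types, 7 + 4 + 10 = 21 values).

```python
import numpy as np

rng = np.random.default_rng(0)

# Vocabulary sizes from the notebook config; Day = 7 assumes day of week
vocab_sizes = {"User_ID": 120, "Category_Type": 3, "Category_Value": 21, "Hour": 24, "Day": 7}
embed_dim, hidden_dim = 8, 16   # assumed sizes, chosen only for illustration

# One randomly initialized (untrained) lookup table per input feature
tables = {name: rng.normal(0, 0.1, size=(n, embed_dim)) for name, n in vocab_sizes.items()}
W1 = rng.normal(0, 0.1, size=(embed_dim * len(tables), hidden_dim))
b1 = np.zeros(hidden_dim)
W2 = rng.normal(0, 0.1, size=(hidden_dim, 1))
b2 = np.zeros(1)

def forward(batch):
    """batch maps each feature name to an integer index array of shape (n,)."""
    # Look up each feature's embedding and concatenate along the feature axis
    x = np.concatenate([tables[name][batch[name]] for name in tables], axis=1)
    h = np.maximum(x @ W1 + b1, 0.0)    # ReLU hidden layer
    return (h @ W2 + b2).ravel()        # one strength score per row

batch = {
    "User_ID": np.array([0, 49]),
    "Category_Type": np.array([0, 2]),
    "Category_Value": np.array([6, 12]),
    "Hour": np.array([2, 20]),
    "Day": np.array([6, 5]),
}
pred = forward(batch)
```

For the 5-class framing, the final layer would output five scores per row instead of one.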

This dataset now powers your Layer 4 Neural Network in the Poppy Universe engine.
Train your NN directly on this CSV.

---