# Relational Deep Learning

## Relational Tables to Graphs

Given the tables:

- Users

    | user_id |
    | ------- |
    | U1      |
    | U2      |

- Products

    | product_id |
    | ---------- |
    | P1         |
    | P2         |

- Sales

    | sale_id | user_id | product_id | price |
    | ------- | ------- | ---------- | ----- |
    | S1      | U1      | P1         | 20    |
    | S2      | U1      | P2         | 80    |
    | S3      | U2      | P1         | 15    |

1. Construct the relational entity graph. What are the nodes and edges?

1. What node types and edge types exist?

1. Where does the `price` live in the graph representation?
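
One way to check an answer to the questions above is to build the graph by hand. The following is a minimal sketch, assuming the usual convention of one node per table row and one edge per foreign-key reference; the edge-type names `made_by` and `of` are hypothetical labels chosen for illustration.

```python
# Toy tables from the exercise, one dict per row.
users = [{"user_id": "U1"}, {"user_id": "U2"}]
products = [{"product_id": "P1"}, {"product_id": "P2"}]
sales = [
    {"sale_id": "S1", "user_id": "U1", "product_id": "P1", "price": 20},
    {"sale_id": "S2", "user_id": "U1", "product_id": "P2", "price": 80},
    {"sale_id": "S3", "user_id": "U2", "product_id": "P1", "price": 15},
]

# One node per row, keyed by (node_type, primary_key).
# Non-key columns are stored as node features.
nodes = {}
for row in users:
    nodes[("user", row["user_id"])] = {}
for row in products:
    nodes[("product", row["product_id"])] = {}
for row in sales:
    nodes[("sale", row["sale_id"])] = {"price": row["price"]}

# One edge per foreign-key reference in Sales.
# "made_by" and "of" are hypothetical edge-type names.
edges = []
for row in sales:
    sale = ("sale", row["sale_id"])
    edges.append((sale, "made_by", ("user", row["user_id"])))
    edges.append((sale, "of", ("product", row["product_id"])))

print(len(nodes), "nodes,", len(edges), "edges")
```

Under these assumptions, users and products are never linked directly; every connection between them passes through a sale node.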

## SQL Join Operation

Consider the SQL query:

```sql
SELECT *
FROM Users U
JOIN Sales S ON U.user_id = S.user_id
```

1. What does this join correspond to in the graph? (Hint: Think about the "SQL joins vs Graph edges" slide #49.)

1. Suppose we change the join condition to `JOIN Sales S ON U.signup_month = S.month`, where `U.signup_month` is the month in which the user signed up and `S.month` is the month in which the sale took place. How does the graph change in this case?

1. Suppose we are interested in the following task: "Which products are bought together by the same user?" We intend to use this information to recommend product B to users who bought product A. Sketch the SQL join for this task (the idea is enough; exact syntax is not required).

1. Considering the task in (3), how many GNN layers do we need?
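
As one possible sketch of the co-purchase idea, the query below joins `Sales` with itself on `user_id`, run with Python's built-in `sqlite3` module on the toy data. The table aliases `A` and `B`, the `A.product_id < B.product_id` filter (which keeps each unordered pair once), and the in-memory database are all choices made for this illustration, not part of the exercise.

```python
import sqlite3

# In-memory database holding only the Sales table from the exercise.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Sales (sale_id TEXT, user_id TEXT, product_id TEXT, price INTEGER)"
)
conn.executemany(
    "INSERT INTO Sales VALUES (?, ?, ?, ?)",
    [("S1", "U1", "P1", 20), ("S2", "U1", "P2", 80), ("S3", "U2", "P1", 15)],
)

# Self-join: pair up two sales made by the same user.
# The inequality drops self-pairs and keeps each unordered pair once.
co_purchases = conn.execute(
    """
    SELECT A.user_id, A.product_id, B.product_id
    FROM Sales A
    JOIN Sales B ON A.user_id = B.user_id
    WHERE A.product_id < B.product_id
    ORDER BY A.user_id
    """
).fetchall()
conn.close()

print(co_purchases)
```

In graph terms, each returned row traces a path from one product through a sale, a user, and a second sale to another product.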

## What about AGG?

We consider a 1-layer GNN operating on the entity graph.

Initial node features:

- User nodes: $h_u^{(0)} = 1$
- Sale nodes: $h_s^{(0)} = \text{price}$
- Product nodes: ignored for now

Aggregation rule (simple, no activation):

$$h_u^{(1)} = W \cdot \sum_{s \in \mathcal{N}(u)} h_s^{(0)}$$

with scalar $W = 0.1$.
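
As a worked instance on the Sales table above, assuming each user's neighborhood $\mathcal{N}(u)$ consists of the sale nodes that reference that user through `user_id`:

$$h_{U1}^{(1)} = 0.1 \cdot (20 + 80) = 10, \qquad h_{U2}^{(1)} = 0.1 \cdot 15 = 1.5$$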

1. What classical SQL feature does this GNN layer resemble?

1. What would change if we replaced `SUM` with `MEAN`? (Hint: consider the node degrees)

1. Construct two different purchase histories that result in the same $h_u^{(1)}$. What does this tell you about the expressive power of this GNN layer?
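
To experiment with these questions, the layer can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the function name `aggregate`, the `mode` argument, the purchase history `[50, 50]`, and the convention that an empty neighborhood yields `0.0` are all assumptions introduced here.

```python
W = 0.1  # scalar weight from the exercise


def aggregate(prices, mode="sum"):
    """Linear one-layer update for a user node.

    `prices` are the features of the neighboring sale nodes.
    Returning 0.0 for an empty neighborhood is an assumption of this sketch.
    """
    if not prices:
        return 0.0
    total = sum(prices)
    pooled = total if mode == "sum" else total / len(prices)
    return W * pooled


history_u1 = [20, 80]   # U1's sales in the exercise
history_u2 = [15]       # U2's sales in the exercise
history_alt = [50, 50]  # hypothetical history, not in the exercise

sum_u1 = aggregate(history_u1)
sum_alt = aggregate(history_alt)
mean_u1 = aggregate(history_u1, mode="mean")
mean_u2 = aggregate(history_u2, mode="mean")

print(sum_u1, sum_alt, mean_u1, mean_u2)
```

Comparing `sum_u1` with `sum_alt`, and the `sum` outputs with the `mean` outputs, gives concrete numbers to reason about for the last two questions.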

## Temporal Tasks

We consider the same entity graph as before, but now each sale has a timestamp. Each sale node $s$ has:

- feature: $h_s^{(0)} = \text{price}$
- timestamp: $\tau_s$

We want to predict whether a user will churn at time $t$.

1. For a fixed user $u$ and prediction time $t$, which sale nodes should be included in the neighborhood $\mathcal{N}_t(u)$?

1. Suppose we mistakenly include all sales of a user. What would be the issue?

1. Given our task, we believe that recent purchases should matter more than older ones. The current aggregation rule is:

    $$h_u^{(1)} = W \cdot \sum_{s \in \mathcal{N}_t(u)} h_s^{(0)}$$

    which ignores timestamps.

    Propose a modified aggregation rule that takes timestamps into account and assigns higher importance to more recent sales. Briefly explain how your aggregation captures recency.
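
One possible answer, among many, weights each sale by an exponential decay in its age $t - \tau_s$. The sketch below assumes that $\mathcal{N}_t(u)$ contains only sales with $\tau_s \le t$; the function name `temporal_aggregate`, the decay rate `DECAY`, and the example prices and timestamps are hypothetical values chosen for illustration.

```python
import math

W = 0.1      # scalar weight from the exercise
DECAY = 0.5  # hypothetical decay rate; larger values forget old sales faster


def temporal_aggregate(sales, t, decay=DECAY):
    """Recency-weighted sum over sales that happened at or before time t.

    `sales` is a list of (price, timestamp) pairs for one user.
    Each price is scaled by exp(-decay * (t - timestamp)).
    """
    total = 0.0
    for price, tau in sales:
        if tau <= t:  # assumption: sales after the prediction time are excluded
            total += math.exp(-decay * (t - tau)) * price
    return W * total


# Hypothetical timestamps attached to the three prices from the exercise.
history = [(20, 1.0), (80, 3.0), (15, 5.0)]
h_user = temporal_aggregate(history, t=4.0)

print(h_user)
```

With this weighting, a sale made exactly at time $t$ keeps its full price, and the contribution of older sales shrinks smoothly toward zero. Learned alternatives, such as attention over time differences, would serve the same purpose.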