# Knowledge Graphs

## Warm-up: What is actually learned?

In KG embedding models like TransE:

1. What are the trainable parameters?

1. Suppose a new entity $e_{\text{new}}$ is added to the KG after training. Can TransE produce an embedding for it without retraining? Why or why not?

1. Let's assume that you observe one triple $(h, r, e_{\text{new}})$. Could you heuristically assign an embedding to $e_{\text{new}}$? What are the limitations? Hint: Think of the translation $\mathbf{h} + \mathbf{r} \approx \mathbf{t}$.

1. Given a trained TransE model and a query $(h, r, ?)$, inference is performed by computing the query embedding $\mathbf{q} = \mathbf{h} + \mathbf{r}$. How can we get a semantic understanding of what the embedding vector $\mathbf{q}$ corresponds to?
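
For experimenting with the last question, one common way to interpret a query embedding is to compare it against every entity embedding and rank the entities by distance. The following is a minimal sketch under stated assumptions: the entity names, the relation `livesIn`, all coordinates, and the helper `rank_tails` are hypothetical and chosen by hand, not taken from a trained model.

```python
import math

# Hypothetical toy embeddings; a trained model would supply these.
entity_emb = {
    "Berlin": (2.0, 1.0),
    "Paris": (2.0, 2.0),
    "Alice": (1.0, 0.0),
}
relation_emb = {"livesIn": (1.0, 1.0)}  # hypothetical relation


def rank_tails(head, relation):
    """Rank all entities by Euclidean distance to q = h + r (closest first)."""
    h = entity_emb[head]
    r = relation_emb[relation]
    q = tuple(hi + ri for hi, ri in zip(h, r))  # query embedding
    return sorted(entity_emb, key=lambda e: math.dist(q, entity_emb[e]))


ranking = rank_tails("Alice", "livesIn")
print(ranking)
```

The ranking only says which known entities lie closest to $\mathbf{q}$; whether the top entry is a correct answer still depends on how well the model was trained.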

## TransE Mechanics

Consider the following 2D embeddings:

- $\mathbf{h} = (1, 0)$
- $\mathbf{r} = (1, 1)$
- $\mathbf{t}_1 = (2, 1)$
- $\mathbf{t}_2 = (2, 2)$

1. Compute the TransE score $f_r(h,t) = -\lVert \mathbf{h} + \mathbf{r} - \mathbf{t} \rVert_2$ for $t_1$ and $t_2$. Which triple is more plausible?

1. Assume that relation $r$ is symmetric (e.g., "siblingOf"), meaning that $(h,r,t_1)$ and $(t_1,r,h)$ are both true. Write down the TransE equations implied by symmetry. What problem arises?

1. Assume that relation $r$ is 1-to-N (e.g., "studentOf"): $(h,r,t_1)$ and $(h,r,t_2)$ are true with $t_1 \neq t_2$. Write the TransE equations for both triples. What geometric constraint does this impose on $t_1$ and $t_2$?

1. Suppose we increase the embedding dimension from $k=2$ to $k=100$. Does this resolve the issues in (2) and (3)? Justify mathematically.
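
To check hand computations in this section, the scoring function can be sketched in a few lines of plain Python. This is a minimal sketch rather than a reference implementation: the helper name `transe_score` is made up for illustration, and the vectors are the 2D embeddings listed at the start of the section.

```python
import math


def transe_score(h, r, t):
    """TransE score f_r(h, t) = -||h + r - t||_2 (higher means more plausible)."""
    translated = tuple(hi + ri for hi, ri in zip(h, r))  # h + r
    return -math.dist(translated, tuple(t))  # negative Euclidean distance to t


# Embeddings from the exercise above.
h = (1.0, 0.0)
r = (1.0, 1.0)
t1 = (2.0, 1.0)
t2 = (2.0, 2.0)

score_t1 = transe_score(h, r, t1)
score_t2 = transe_score(h, r, t2)
print("f_r(h, t1) =", score_t1)
print("f_r(h, t2) =", score_t2)
```

The same helper can be reused for the symmetric case by also evaluating `transe_score(t1, r, h)` and comparing it with `transe_score(h, r, t1)`.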

## Path Queries

Consider a knowledge graph with the following entity types:

- Person
- Company
- City
- University
- Country

and the following directed relations:

- worksAt(Person → Company)
- locatedIn(Company → City)
- studiedAt(Person → University)
- locatedIn(University → City)
- basedIn(City → Country)
- foundedBy(Company → Person)

1. For each of the following natural-language queries, write the formal path query.

    - $Q_1$: Which company does Alice work at?
    - $Q_2$: In which city is the company where Alice works located?
    - $Q_3$: In which country is the company where Alice works based?
    - $Q_4$: Which people work at companies located in Berlin?
    - $Q_5$: Which people studied at universities located in the same city as the company where Alice works?
    - $Q_6$: Which people studied at universities located in cities where companies founded by Bob are based?
    - $Q_7$: Which people both studied at universities in Berlin and work at companies based in Germany?

1. Which of the queries in (1) can TransE represent?

1. What if we simply use $-\mathbf{r}$ to represent the inverse of a relation $r$? Would that work?
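
For experimenting with the last two questions, a path query can be embedded in TransE by adding the relation vectors along the path to the anchor entity's embedding. The sketch below assumes hand-picked 2D embeddings (all names and coordinates are hypothetical), and the helpers `embed_path` and `nearest_entity` are made up for illustration. Passing `sign=-1` implements the $-\mathbf{r}$ idea purely as a hypothesis to test, not as an endorsed solution.

```python
import math

# Hypothetical 2D embeddings chosen by hand for illustration only.
entity_emb = {
    "Alice": (0.0, 0.0),
    "Acme": (1.0, 0.0),
    "Berlin": (1.0, 1.0),
    "Germany": (2.0, 1.0),
}
relation_emb = {
    "worksAt": (1.0, 0.0),
    "locatedIn": (0.0, 1.0),
    "basedIn": (1.0, 0.0),
}


def embed_path(anchor, path):
    """Embed a path query as anchor + sum of signed relation vectors.

    `path` is a list of (relation, sign) pairs: sign=+1 follows the relation
    forward, sign=-1 tries the "-r" idea for traversing it backward.
    """
    q = list(entity_emb[anchor])
    for relation, sign in path:
        r = relation_emb[relation]
        q = [qi + sign * ri for qi, ri in zip(q, r)]
    return tuple(q)


def nearest_entity(q):
    """Return the entity whose embedding is closest to q."""
    return min(entity_emb, key=lambda e: math.dist(q, entity_emb[e]))


# Q3-style query: Alice -worksAt-> ? -locatedIn-> ? -basedIn-> ?
q3 = embed_path("Alice", [("worksAt", 1), ("locatedIn", 1), ("basedIn", 1)])
print(nearest_entity(q3))

# Q4-style query, traversed backward from Berlin with negated relations.
q4 = embed_path("Berlin", [("locatedIn", -1), ("worksAt", -1)])
print(nearest_entity(q4))
```

These toy embeddings satisfy every translation exactly and contain only one person and one company, so the sketch shows the mechanics only. It says nothing about how negated relations behave in a trained model, especially for relations that are not one-to-one.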