# PyDough Knowledge Graph Creation: From Relational Data to Graph-Based Metadata

## **Introduction**

This tutorial shows how to build a PyDough knowledge graph from relational data by converting a relational database schema into PyDough metadata. The example schema used throughout is written for SQLite.

This tutorial takes a hands-on approach, walking you through the entire workflow of creating a PyDough knowledge graph. Along the way, we will integrate key concepts from the PyDough metadata documentation to ensure a thorough understanding of how different components interact.  

The guide is structured as follows:
- Understanding the Data Schema
- Defining Collections and Properties
- Establishing Simple Relationships (Joins)
- Implementing Compound Relationships
- Validating the Graph Structure

The next cell loads the PyDough Jupyter extension and imports PyDough so that we can inspect the graph at several points later in the tutorial.

In [1]:
%load_ext pydough.jupyter_extensions

import pydough


## **Understanding the Data Schema**

To construct our knowledge graph in PyDough, we first need to define the underlying relational database schema. In this tutorial, we will use a university database that models departments, professors, students, courses, and study partnerships. The schema contains one-to-one, one-to-many, many-to-many, and self-referencing relationships, so that we can show how each kind of relationship is represented in a PyDough graph.

### Relational Database Schema

Below is the SQL schema that we will use as the foundation for our PyDough knowledge graph:

```sql
-- One-to-Many (Department → Professors)
CREATE TABLE Departments (
    department_id INT PRIMARY KEY,
    department_name VARCHAR(100) NOT NULL,
    founded DATE
);

CREATE TABLE Professors (
    professor_id INT PRIMARY KEY,
    name VARCHAR(100) NOT NULL,
    department_id INT,
    is_active BOOLEAN,
    FOREIGN KEY (department_id) REFERENCES Departments(department_id)
);

-- One-to-One (Professor ↔ Office)
CREATE TABLE ProfessorOffices (
    professor_id INT PRIMARY KEY,
    office_number INT NOT NULL,
    building VARCHAR(100) NOT NULL,
    FOREIGN KEY (professor_id) REFERENCES Professors(professor_id)
);

-- Many-to-Many (Students ↔ Courses, Through Enrollments)
CREATE TABLE Students (
    student_id INT PRIMARY KEY,
    name VARCHAR(100) NOT NULL
);

CREATE TABLE Courses (
    course_id INT PRIMARY KEY,
    course_name VARCHAR(100) NOT NULL
);

CREATE TABLE Enrollments (
    student_id INT,
    course_id INT,
    enrollment_date DATE,
    grade INT,
    PRIMARY KEY (student_id, course_id),
    FOREIGN KEY (student_id) REFERENCES Students(student_id),
    FOREIGN KEY (course_id) REFERENCES Courses(course_id)
);
```

### Schema Overview and Explanation

This schema models the university system with the following entities and relationships:

- **One-to-Many:** Each department can have multiple professors. The `Professors` table contains a foreign key `department_id` referencing the `Departments` table.
- **One-to-One:** Each professor has exactly one office. The `ProfessorOffices` table uses `professor_id` as both the primary key and foreign key, ensuring a one-to-one relationship.
- **Many-to-Many:** Students enroll in multiple courses, and courses have multiple students. The `Enrollments` table serves as a bridge table to connect `Students` and `Courses`.
- **Self-referencing Many-to-Many:** Students can form study partnerships with other students. The `StudentStudyPartner` table creates self-referencing relationships, ensuring students can be partners but not with themselves.

This structured schema will serve as the basis for transforming relational data into a PyDough knowledge graph, which we will cover in the next sections.
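
Before writing any metadata, it can help to confirm that the schema loads in SQLite and that the foreign keys are what we expect. The following is a minimal sketch using Python's standard `sqlite3` module on an in-memory database. It covers only three of the tables, and the helper name `foreign_keys` is ours, not part of PyDough.

```python
import sqlite3

# Subset of the tutorial schema, enough to exercise both foreign keys.
SCHEMA = """
CREATE TABLE Departments (
    department_id INT PRIMARY KEY,
    department_name VARCHAR(100) NOT NULL,
    founded DATE
);
CREATE TABLE Professors (
    professor_id INT PRIMARY KEY,
    name VARCHAR(100) NOT NULL,
    department_id INT,
    is_active BOOLEAN,
    FOREIGN KEY (department_id) REFERENCES Departments(department_id)
);
CREATE TABLE ProfessorOffices (
    professor_id INT PRIMARY KEY,
    office_number INT NOT NULL,
    building VARCHAR(100) NOT NULL,
    FOREIGN KEY (professor_id) REFERENCES Professors(professor_id)
);
"""


def foreign_keys(conn, table):
    """Return (column, referenced_table, referenced_column) per foreign key."""
    rows = conn.execute(f"PRAGMA foreign_key_list({table})").fetchall()
    # Row layout: (id, seq, table, from, to, on_update, on_delete, match)
    return [(row[3], row[2], row[4]) for row in rows]


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
professor_fks = foreign_keys(conn, "Professors")
office_fks = foreign_keys(conn, "ProfessorOffices")
print(professor_fks)
print(office_fks)
```

Each foreign key listed here is a candidate for a simple relationship in the PyDough graph.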

## **Defining Collections and Properties**

### Understanding the Metadata Structure

In PyDough, knowledge graphs are defined using JSON metadata files. These files store structured data representing the entities, attributes, and relationships of a dataset. A metadata file consists of:
- **Graphs:** Logical groupings of related collections.
- **Collections:** Corresponding to SQL tables, collections represent entities.
- **Properties:** The attributes of each collection and its relationships with other collections.

The most basic structure for our example is shown below:

```json
{
    "University": {
        "Students": {...},
        "Professors": {...},
        "Courses": {...},
        "Departments": {...},
        "Enrollments": {...},
        "ProfessorOffices": {...}
    }
}
```

As you can see, the top level names the graph and the level below it lists the graph's collections. Because each graph is a separate top-level entry, the same metadata file can define multiple logical datasets.

### Defining Collections

Each SQL table corresponds to a collection in PyDough. Initially, we define empty collections with:

- **type**: Always set to `"simple_table"` for standard SQL tables.
- **table_path**: The SQL table name and location.
- **unique_properties**: The primary key(s) of the table. This can be:
  - A single unique attribute: `"unique_properties": ["attribute"]`
  - A combination of attributes that together form a unique key, written as a nested list: `"unique_properties": [["attribute1", "attribute2"]]`
- **properties**: Initially left empty. This will be defined in the next step.

Here is how the metadata graph would look after defining the collections based on SQL tables:

In [None]:
{
    "University": {
        "Departments": {
            "type": "simple_table",
            "table_path": "main.departments",
            "unique_properties": ["department_id"],
            "properties": {}
        },
        "Professors": {
            "type": "simple_table",
            "table_path": "main.professors",
            "unique_properties": ["professor_id"],
            "properties": {}
        },
        "ProfessorOffices": {
            "type": "simple_table",
            "table_path": "main.professor_offices",
            "unique_properties": ["professor_id"],
            "properties": {}
        },
        "Students": {
            "type": "simple_table",
            "table_path": "main.students",
            "unique_properties": ["student_id"],
            "properties": {}
        },
        "Courses": {
            "type": "simple_table",
            "table_path": "main.courses",
            "unique_properties": ["course_id"],
            "properties": {}
        },
        "Enrollments": {
            "type": "simple_table",
            "table_path": "main.enrollments",
            "unique_properties": [["student_id", "course_id"]],
            "properties": {}
        },
        "StudentStudyPartner": {
            "type": "simple_table",
            "table_path": "main.student_study_partner",
            "unique_properties": [["student_id", "partner_id"]],
            "properties": {}
        }
    }
}

At this stage, collections exist but lack properties.
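
Writing these skeletons by hand is repetitive, so they can also be generated. The sketch below is a hypothetical helper (`empty_collection` is our name, not a PyDough function) that builds the same structure from a table path and its primary-key columns, using the nested-list form for composite keys as in the listing above.

```python
import json


def empty_collection(table_path, primary_key):
    """Build a collection entry with no properties yet."""
    # A single-column key is listed by name; a composite key is one inner list.
    if len(primary_key) == 1:
        unique = [primary_key[0]]
    else:
        unique = [list(primary_key)]
    return {
        "type": "simple_table",
        "table_path": table_path,
        "unique_properties": unique,
        "properties": {},
    }


skeleton = {
    "University": {
        "Departments": empty_collection("main.departments", ["department_id"]),
        "Enrollments": empty_collection(
            "main.enrollments", ["student_id", "course_id"]
        ),
    }
}
print(json.dumps(skeleton, indent=4))
```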

### Defining Properties

Each property in a collection corresponds to a specific characteristic of an entity. In PyDough, properties can be categorized into:

- **Attributes**: These represent SQL table columns, storing direct data about an entity.
- **Relationships**: These define connections between collections, linking entities through joins or compound relationships.

#### Attributes

At this stage, we focus on defining **attributes**, which map directly to SQL table columns. Each attribute includes:

- **type**: Always `"table_column"` for a SQL table column.
- **column_name**: The SQL column name.
- **data_type**: The data type.
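
The three fields above can be produced mechanically from a column name and its SQL type. The following sketch assumes the type mapping used in this tutorial (`INT` to `int32`, `VARCHAR` to `string`, `DATE` to `date`, `BOOLEAN` to `bool`). It is not an official PyDough conversion table, and `table_column` is a hypothetical helper name.

```python
# Mapping taken from the choices made in this tutorial (an assumption,
# not an exhaustive or official list).
SQL_TO_PYDOUGH = {
    "INT": "int32",
    "VARCHAR": "string",
    "DATE": "date",
    "BOOLEAN": "bool",
}


def table_column(column_name, sql_type):
    """Build a table_column property from a SQL column declaration."""
    # Drop any length suffix, e.g. "VARCHAR(100)" -> "VARCHAR".
    base_type = sql_type.split("(")[0].strip().upper()
    return {
        "type": "table_column",
        "column_name": column_name,
        "data_type": SQL_TO_PYDOUGH[base_type],
    }


department_name = table_column("department_name", "VARCHAR(100)")
founded = table_column("founded", "DATE")
print(department_name)
print(founded)
```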

Now, we add properties to collections:


In [None]:
{
    "University": {
        "Departments": {
            "type": "simple_table",
            "table_path": "main.departments",
            "unique_properties": ["department_id"],
            "properties": {
                "department_id": {"type": "table_column", "column_name": "department_id", "data_type": "int32"},
                "department_name": {"type": "table_column", "column_name": "department_name", "data_type": "string"},
                "founded": {"type": "table_column", "column_name": "founded", "data_type": "date"}
            }
        },
        "Professors": {
            "type": "simple_table",
            "table_path": "main.professors",
            "unique_properties": ["professor_id"],
            "properties": {
                "professor_id": {"type": "table_column", "column_name": "professor_id", "data_type": "int32"},
                "name": {"type": "table_column", "column_name": "name", "data_type": "string"},
                "department_id": {"type": "table_column", "column_name": "department_id", "data_type": "int32"},
                "is_active": {"type": "table_column", "column_name": "is_active", "data_type": "bool"}
            }
        },
        "ProfessorOffices": {
            "type": "simple_table",
            "table_path": "main.professor_offices",
            "unique_properties": ["professor_id"],
            "properties": {
                "professor_id": {"type": "table_column", "column_name": "professor_id", "data_type": "int32"},
                "office_number": {"type": "table_column", "column_name": "office_number", "data_type": "int32"},
                "building": {"type": "table_column", "column_name": "building", "data_type": "string"}
            }
        },
        "Students": {
            "type": "simple_table",
            "table_path": "main.students",
            "unique_properties": ["student_id"],
            "properties": {
                "student_id": {"type": "table_column", "column_name": "student_id", "data_type": "int32"},
                "name": {"type": "table_column", "column_name": "name", "data_type": "string"}
            }
        },
        "Courses": {
            "type": "simple_table",
            "table_path": "main.courses",
            "unique_properties": ["course_id"],
            "properties": {
                "course_id": {"type": "table_column", "column_name": "course_id", "data_type": "int32"},
                "course_name": {"type": "table_column", "column_name": "course_name", "data_type": "string"}
            }
        },
        "Enrollments": {
            "type": "simple_table",
            "table_path": "main.enrollments",
            "unique_properties": [["student_id", "course_id"]],
            "properties": {
                "student_id": {"type": "table_column", "column_name": "student_id", "data_type": "int32"},
                "course_id": {"type": "table_column", "column_name": "course_id", "data_type": "int32"}
            }
        }
    }
}

If we paste this into a graph.json file and load it, we can inspect the graph's current structure with PyDough utilities. This check can be repeated at any point in the process.

In [6]:
pydough.active_session.load_metadata_graph("../metadata/graph.json", "University")
graph = pydough.active_session.metadata
print(pydough.explain_structure(graph))

Structure of PyDough graph: University

  Courses
  ├── course_id
  └── course_name

  Departments
  ├── department_id
  ├── department_name
  └── founded

  Enrollments
  ├── course_id
  └── student_id

  ProfessorOffices
  ├── building
  ├── office_number
  └── professor_id

  Professors
  ├── department_id
  ├── is_active
  ├── name
  └── professor_id

  StudentStudyPartner
  ├── partner_id
  └── student_id

  Students
  ├── name
  └── student_id
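
Since the metadata is plain JSON, some mistakes can be caught with the standard library before the file is handed to PyDough. The sketch below is our own sanity check, not PyDough's validation. It parses the text with `json.loads`, which rejects malformed JSON such as trailing commas, and reports any `unique_properties` entry that has no matching property.

```python
import json


def find_key_problems(graph_json_text, graph_name):
    """Return messages for unique properties that are not defined."""
    graph = json.loads(graph_json_text)[graph_name]
    problems = []
    for name, collection in graph.items():
        for entry in collection["unique_properties"]:
            # An entry is either one column name or a list of column names.
            columns = [entry] if isinstance(entry, str) else entry
            for column in columns:
                if column not in collection["properties"]:
                    problems.append(
                        f"{name}: unique property '{column}' is not defined"
                    )
    return problems


# Small made-up sample: Enrollments is missing its course_id property.
sample = """
{
    "University": {
        "Enrollments": {
            "type": "simple_table",
            "table_path": "main.enrollments",
            "unique_properties": [["student_id", "course_id"]],
            "properties": {
                "student_id": {"type": "table_column", "column_name": "student_id", "data_type": "int32"}
            }
        }
    }
}
"""
problems = find_key_problems(sample, "University")
print(problems)
```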


## **Simple Relationships**

In PyDough, a simple relationship is a property that maps one collection directly to another, mirroring a foreign key constraint in the relational database. Simple relationships are used to define one-to-one and one-to-many connections between collections.

Each simple relationship is a **property** defined using:

- **type**: `"simple_join"`, indicating that this is a direct relationship.
- **other_collection_name**: The name of the related collection.
- **singular**: A boolean indicating whether the relationship is one-to-one (`true`) or one-to-many (`false`).
- **no_collisions**: A boolean ensuring uniqueness of the relationship. It is `true` when the reverse relationship is singular.
- **keys**: A mapping of columns between the two collections.
- **reverse_relationship_name**: The name given to the reverse relationship.
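
The fields listed above can be assembled with a small builder. This is a sketch only: `simple_join` is a hypothetical helper, and it assumes each local column maps to exactly one column in the other collection, which holds for every relationship in this tutorial.

```python
def simple_join(other_collection, keys, singular, no_collisions, reverse_name):
    """Build a simple_join property.

    `keys` maps a column of this collection to the matching column
    of the other collection.
    """
    return {
        "type": "simple_join",
        "other_collection_name": other_collection,
        "singular": singular,
        "no_collisions": no_collisions,
        # The metadata format stores the matching columns as a list.
        "keys": {local: [remote] for local, remote in keys.items()},
        "reverse_relationship_name": reverse_name,
    }


# The office-to-professor relationship used in the next section.
professor = simple_join(
    "Professors",
    {"professor_id": "professor_id"},
    singular=True,
    no_collisions=True,
    reverse_name="office",
)
print(professor)
```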

### One-to-One Relationship:

A **one-to-one relationship** means that each record in a table corresponds to exactly one record in another table. In our schema, each professor has a unique office.

```sql
CREATE TABLE Professors (
    professor_id INT PRIMARY KEY,
    name VARCHAR(100) NOT NULL,
    department_id INT,
    is_active BOOLEAN,
    FOREIGN KEY (department_id) REFERENCES Departments(department_id)
);

CREATE TABLE ProfessorOffices (
    professor_id INT PRIMARY KEY,
    office_number INT NOT NULL,
    building VARCHAR(100) NOT NULL,
    FOREIGN KEY (professor_id) REFERENCES Professors(professor_id)
);
```

In PyDough, we represent this as a **simple relationship**, defined in `"ProfessorOffices"` as a property called **professor**.


In [None]:
{
    "Professors": {
      "type": "simple_table",
      "table_path": "main.professors",
      "unique_properties": ["professor_id"],
      "properties": {
        "professor_id": {"type": "table_column", "column_name": "professor_id", "data_type": "int32"},
        "name": {"type": "table_column", "column_name": "name", "data_type": "string"},
        "department_id": {"type": "table_column", "column_name": "department_id", "data_type": "int32"},
        "is_active": {"type": "table_column", "column_name": "is_active", "data_type": "bool"}
      }
    },
    "ProfessorOffices": {
      "type": "simple_table",
      "table_path": "main.professor_offices",
      "unique_properties": ["professor_id"],
      "properties": {
        "professor_id": {"type": "table_column", "column_name": "professor_id", "data_type": "int32"},
        "office_number": {"type": "table_column", "column_name": "office_number", "data_type": "int32"},
        "building": {"type": "table_column", "column_name": "building", "data_type": "string"},

        "professor": {
          "type": "simple_join",
          "other_collection_name": "Professors",
          "singular": true,
          "no_collisions": true,
          "keys": { "professor_id": ["professor_id"] },
          "reverse_relationship_name": "office"

        }
      }
    }
  }

**Explanation:**
- The `ProfessorOffices` collection includes the `professor` relationship, linking each office to a single professor.
- The `Professors` collection does not contain an explicit relationship, as PyDough automatically infers it from `ProfessorOffices`.
- **singular: true** ensures that each office is associated with only one professor.
- **no_collisions: true** means that no two offices point to the same professor, so the reverse relationship (`office`) is also singular.

Here we can check the structure after adding the relationship in our graph.json document.

In [7]:
pydough.active_session.load_metadata_graph("../metadata/graph.json", "University")
graph = pydough.active_session.metadata
print(pydough.explain_structure(graph))

Structure of PyDough graph: University

  Courses
  ├── course_id
  └── course_name

  Departments
  ├── department_id
  ├── department_name
  └── founded

  Enrollments
  ├── course_id
  └── student_id

  ProfessorOffices
  ├── building
  ├── office_number
  ├── professor_id
  └── professor [one member of Professors] (reverse of Professors.office)

  Professors
  ├── department_id
  ├── is_active
  ├── name
  ├── professor_id
  └── office [one member of ProfessorOffices] (reverse of ProfessorOffices.professor)

  StudentStudyPartner
  ├── partner_id
  └── student_id

  Students
  ├── name
  └── student_id


The output shows a one-to-one relationship between `Professors` and `ProfessorOffices` that can be followed in both directions.

Alternatively, we could define the relationship in the `Professors` collection instead of `ProfessorOffices`, and it would still work correctly. In this case, PyDough would infer the reverse relationship in `ProfessorOffices`. 

### One-to-Many Relationship:

A **one-to-many** relationship occurs when a single record in one table relates to multiple records in another. In SQL, this is commonly represented by a foreign key.

For example, in our SQL schema, each department has multiple professors, but each professor belongs to only one department:

```sql
CREATE TABLE Departments (
    department_id INT PRIMARY KEY,
    department_name VARCHAR(100) NOT NULL,
    founded DATE
);

CREATE TABLE Professors (
    professor_id INT PRIMARY KEY,
    name VARCHAR(100) NOT NULL,
    department_id INT,
    is_active BOOLEAN,
    FOREIGN KEY (department_id) REFERENCES Departments(department_id)
);
```

In PyDough, this is represented using a simple relationship on `"Professors"` as a property called **department**. It is very similar to the one-to-one case, but with different `singular` and `no_collisions` values:

In [None]:
{
  "Departments": {
    "type": "simple_table",
    "table_path": "main.departments",
    "unique_properties": ["department_id"],
    "properties": {
        "department_id": {"type": "table_column", "column_name": "department_id", "data_type": "int32"},
        "department_name": {"type": "table_column", "column_name": "department_name", "data_type": "string"},
        "founded": {"type": "table_column", "column_name": "founded", "data_type": "date"}
    }
},
"Professors": {
    "type": "simple_table",
    "table_path": "main.professors",
    "unique_properties": ["professor_id"],
    "properties": {
        "professor_id": {"type": "table_column", "column_name": "professor_id", "data_type": "int32"},
        "name": {"type": "table_column", "column_name": "name", "data_type": "string"},
        "department_id": {"type": "table_column", "column_name": "department_id", "data_type": "int32"},
        "is_active": {"type": "table_column", "column_name": "is_active", "data_type": "bool"},
        
        "department": {
            "type": "simple_join",
            "other_collection_name": "Departments",
            "singular": true,
            "no_collisions": false,
            "keys": { "department_id": ["department_id"] },
            "reverse_relationship_name": "professors"
        }

    }
  }
}

**Explanation:**
- The `Professors` collection defines a `department` relationship, which connects each professor to a single `Departments` record.
- The relationship is not explicitly defined in `Departments` because it is automatically inferred from `Professors`.
- **singular: true** in `Professors` ensures that each professor belongs to only one department.
- **no_collisions: false** allows multiple professors to reference the same department.

Alternatively, we could define the relationship in the `Departments` collection instead of `Professors`. In this case, we would add a `professors` property to `Departments`, making the relationship plural. However, this means:

- **singular: false** – A department can have multiple professors.
- **no_collisions: true** – Each professor belongs to only one department, ensuring uniqueness from the reverse perspective.

Both approaches correctly define the relationship, but choosing which one to use depends on how we prefer to structure the data in PyDough.
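
The two alternatives mirror each other: going from one side to the other swaps `singular` and `no_collisions` and inverts the key mapping. The sketch below makes that rule explicit. It is inferred from the two definitions described in this section rather than taken from PyDough's implementation, and `reverse_simple_join` is a hypothetical helper.

```python
def reverse_simple_join(collection_name, property_name, prop):
    """Rewrite a simple_join as it would be declared on the other collection.

    Returns (owning collection, property name, property definition).
    """
    # Invert the key mapping: each remote column maps back to the local one.
    inverted_keys = {}
    for local_column, remote_columns in prop["keys"].items():
        for remote_column in remote_columns:
            inverted_keys.setdefault(remote_column, []).append(local_column)
    reverse = {
        "type": "simple_join",
        "other_collection_name": collection_name,
        # The two cardinality flags trade places in the reverse direction.
        "singular": prop["no_collisions"],
        "no_collisions": prop["singular"],
        "keys": inverted_keys,
        "reverse_relationship_name": property_name,
    }
    return prop["other_collection_name"], prop["reverse_relationship_name"], reverse


department = {
    "type": "simple_join",
    "other_collection_name": "Departments",
    "singular": True,
    "no_collisions": False,
    "keys": {"department_id": ["department_id"]},
    "reverse_relationship_name": "professors",
}
owner, name, professors = reverse_simple_join("Professors", "department", department)
print(owner, name, professors)
```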

Here we can check the structure after adding the relationship in our graph.json document.

In [28]:
pydough.active_session.load_metadata_graph("../metadata/graph.json", "University")
graph = pydough.active_session.metadata
print(pydough.explain_structure(graph))

Structure of PyDough graph: University

  Courses
  ├── course_id
  └── course_name

  Departments
  ├── department_id
  ├── department_name
  ├── founded
  └── professors [multiple Professors] (reverse of Professors.department)

  Enrollments
  ├── course_id
  └── student_id

  ProfessorOffices
  ├── building
  ├── office_number
  ├── professor_id
  └── professor [one member of Professors] (reverse of Professors.office)

  Professors
  ├── department_id
  ├── is_active
  ├── name
  ├── professor_id
  ├── department [one member of Departments] (reverse of Departments.professors)
  └── office [one member of ProfessorOffices] (reverse of ProfessorOffices.professor)

  Students
  ├── name
  └── student_id


Here is how the whole graph.json looks after adding the simple relationships:

In [None]:
{
  "University": {
      "Departments": {
          "type": "simple_table",
          "table_path": "main.departments",
          "unique_properties": ["department_id"],
          "properties": {
              "department_id": {"type": "table_column", "column_name": "department_id", "data_type": "int32"},
              "department_name": {"type": "table_column", "column_name": "department_name", "data_type": "string"},
              "founded": {"type": "table_column", "column_name": "founded", "data_type": "date"}
          }
      },
      "Professors": {
          "type": "simple_table",
          "table_path": "main.professors",
          "unique_properties": ["professor_id"],
          "properties": {
              "professor_id": {"type": "table_column", "column_name": "professor_id", "data_type": "int32"},
              "name": {"type": "table_column", "column_name": "name", "data_type": "string"},
              "department_id": {"type": "table_column", "column_name": "department_id", "data_type": "int32"},
              "is_active": {"type": "table_column", "column_name": "is_active", "data_type": "bool"},
              "department": {
                  "type": "simple_join",
                  "other_collection_name": "Departments",
                  "singular": true,
                  "no_collisions": false,
                  "keys": { "department_id": ["department_id"] },
                  "reverse_relationship_name": "professors"
              }
          }
      },
      "ProfessorOffices": {
          "type": "simple_table",
          "table_path": "main.professor_offices",
          "unique_properties": ["professor_id"],
          "properties": {
            "professor_id": {"type": "table_column", "column_name": "professor_id", "data_type": "int32"},
            "office_number": {"type": "table_column", "column_name": "office_number", "data_type": "int32"},
            "building": {"type": "table_column", "column_name": "building", "data_type": "string"},
            "professor": {
              "type": "simple_join",
              "other_collection_name": "Professors",
              "singular": true,
              "no_collisions": true,
              "keys": { "professor_id": ["professor_id"] },
              "reverse_relationship_name": "office"
            }
          }
      },
      "Students": {
          "type": "simple_table",
          "table_path": "main.students",
          "unique_properties": ["student_id"],
          "properties": {
              "student_id": {"type": "table_column", "column_name": "student_id", "data_type": "int32"},
              "name": {"type": "table_column", "column_name": "name", "data_type": "string"}
          }
      },
      "Courses": {
          "type": "simple_table",
          "table_path": "main.courses",
          "unique_properties": ["course_id"],
          "properties": {
              "course_id": {"type": "table_column", "column_name": "course_id", "data_type": "int32"},
              "course_name": {"type": "table_column", "column_name": "course_name", "data_type": "string"}
          }
      },
      "Enrollments": {
          "type": "simple_table",
          "table_path": "main.enrollments",
          "unique_properties": [["student_id", "course_id"]],
          "properties": {
              "student_id": {"type": "table_column", "column_name": "student_id", "data_type": "int32"},
              "course_id": {"type": "table_column", "column_name": "course_id", "data_type": "int32"}
          }
        }
      }
    }


## **Implementing Compound Relationships in PyDough**

Now that all the attributes and simple relationships are defined, we can continue with the compound relationships.

### Many-to-Many Relationships: 

A **many-to-many** relationship means that multiple records in one table relate to multiple records in another. In a relational database, this is managed using a **bridge table** that connects both entities.

In our schema, `Students`, `Courses`, and `Enrollments` form such a relationship:

```sql
-- Students Table
CREATE TABLE Students (
    student_id INT PRIMARY KEY,
    name VARCHAR(100) NOT NULL
);

-- Courses Table
CREATE TABLE Courses (
    course_id INT PRIMARY KEY,
    course_name VARCHAR(100) NOT NULL
);

-- Enrollments Table (Bridge Table)
CREATE TABLE Enrollments (
    student_id INT,
    course_id INT,
    PRIMARY KEY (student_id, course_id),
    FOREIGN KEY (student_id) REFERENCES Students(student_id),
    FOREIGN KEY (course_id) REFERENCES Courses(course_id)
);
```

To create this **many-to-many** relationship in PyDough, we need:

1. A simple relationship in Students that connects to Enrollments.
2. A simple relationship in Enrollments that connects to Courses.
3. A compound relationship in Students that allows direct access to Courses through Enrollments.


#### Step 1: Adding a Simple Relationship in Students. 

The `Students` collection needs a simple relationship to `Enrollments`, which links a student to all of their enrollments. We do this with an **enrollments** property on `Students`.

In [None]:
{
 "Students": {
    "type": "simple_table",
    "table_path": "main.students",
    "unique_properties": ["student_id"],
    "properties": {
        "student_id": {"type": "table_column", "column_name": "student_id", "data_type": "int32"},
        "name": {"type": "table_column", "column_name": "name", "data_type": "string"},

        "enrollments": {
            "type": "simple_join",
            "other_collection_name": "Enrollments",
            "singular": false,
            "no_collisions": true,
            "keys": {"student_id": ["student_id"]},
            "reverse_relationship_name": "student"
            }
            
        }
    }
}

**Explanation**

- Students now has an **enrollments** property.
- The property **enrollments** links each student to multiple enrollments.
- **singular: false** → One student can have multiple enrollments.
- **no_collisions: true** → Each enrollment belongs to only one student.

#### Step 2: Adding a Simple Relationship in Enrollments.

Now, we add a simple relationship in `Enrollments` to connect it to `Courses`. This is done with a **course** property on the `Enrollments` table.  

In [None]:
{
    "Enrollments": {
        "type": "simple_table",
        "table_path": "main.enrollments",
        "unique_properties": [["student_id", "course_id"]],
        "properties": {
            "student_id": {"type": "table_column", "column_name": "student_id", "data_type": "int32"},
            "course_id": {"type": "table_column", "column_name": "course_id", "data_type": "int32"},

            "course": {
                "type": "simple_join",
                "other_collection_name": "Courses",
                "singular": true,
                "no_collisions": false,
                "keys": {"course_id": ["course_id"]},
                "reverse_relationship_name": "enrollments"
                }
            }
        }
    }

**Explanation**

- Enrollments now has a **course** property.
- The property **course** links each enrollment to one specific course.
- **singular: true** → Each enrollment belongs to exactly one course.
- **no_collisions: false** → Multiple enrollments can reference the same course, allowing many students to enroll in the same course.


#### Step 3: Adding a Compound Relationship in Students.

Now that we have connected `Students` to `Enrollments` using a **simple relationship** and `Enrollments` to `Courses` using another simple relationship, we can now create a direct link from `Students` to `Courses` using a **compound relationship**.

A **compound relationship** in PyDough allows us to combine two relationships into one, effectively "skipping" the intermediary table (`Enrollments`) when querying.

#### Components of a Compound Relationship
A compound relationship consists of:
- **primary_property**: The first simple relationship that connects the current collection (`Students`) to an intermediary collection (`Enrollments`).
- **secondary_property**: The second simple relationship that connects the intermediary collection (`Enrollments`) to another target collection (`Courses`).
- **singular**: Whether each record matches at most one record in the target collection. Since a student can enroll in multiple courses, this is **false**.
- **no_collisions**: Whether each target record is matched by at most one record of the current collection. Since a course can have multiple students, this is **false**.
- **inherited_properties**: Allows transferring properties from `Enrollments` to `Students` when accessing courses. This can include metadata from Enrollments like enrollment_date, grade, or any other relevant details.
- **reverse_relationship_name**: The name of the reverse relationship in `Courses` that allows querying  students from courses.

Here, we apply the compound relationship as the **courses** property on the `Students` table.


In [None]:
{
        "Students": {
            "type": "simple_table",
            "table_path": "main.students",
            "unique_properties": ["student_id"],
            "properties": {
                "student_id": {"type": "table_column", "column_name": "student_id", "data_type": "int32"},
                "name": {"type": "table_column", "column_name": "name", "data_type": "string"},
                "enrollments": {
                    "type": "simple_join",
                    "other_collection_name": "Enrollments",
                    "singular": false,
                    "no_collisions": true,
                    "keys": {"student_id": ["student_id"]},
                    "reverse_relationship_name": "student"
                },
                "courses": {
                    "type": "compound",
                    "primary_property": "enrollments",
                    "secondary_property": "course",
                    "singular": false,
                    "no_collisions": false,
                    "inherited_properties": {},
                    "reverse_relationship_name": "students"
                    
                }
            }
        }
    }

**Explanation**

- Primary Property: enrollments → Connects `Students` to `Enrollments`.
- Secondary Property: course → Connects `Enrollments` to `Courses`.
- Directly links students to courses, bypassing `Enrollments`.
- singular: false → A student can enroll in multiple courses.
- no_collisions: false → A course can have multiple students, maintaining the many-to-many structure.

With that, we have established the structure for a compound relationship between `Students` and `Courses`. 
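
In relational terms, following `courses` from a student corresponds roughly to a two-step join through the bridge table. The sketch below illustrates that idea with Python's standard `sqlite3` module and a few made-up rows. It shows the relational meaning only and is not the SQL that PyDough generates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Students (student_id INT PRIMARY KEY, name VARCHAR(100) NOT NULL);
    CREATE TABLE Courses (course_id INT PRIMARY KEY, course_name VARCHAR(100) NOT NULL);
    CREATE TABLE Enrollments (
        student_id INT,
        course_id INT,
        PRIMARY KEY (student_id, course_id),
        FOREIGN KEY (student_id) REFERENCES Students(student_id),
        FOREIGN KEY (course_id) REFERENCES Courses(course_id)
    );
    -- Made-up sample rows for illustration only.
    INSERT INTO Students VALUES (1, 'Ana'), (2, 'Ben');
    INSERT INTO Courses VALUES (10, 'Algebra'), (20, 'Biology');
    INSERT INTO Enrollments VALUES (1, 10), (1, 20), (2, 10);
    """
)

# Step 1 (enrollments): Students -> Enrollments on student_id.
# Step 2 (course): Enrollments -> Courses on course_id.
pairs = conn.execute(
    """
    SELECT s.name, c.course_name
    FROM Students AS s
    JOIN Enrollments AS e ON e.student_id = s.student_id
    JOIN Courses AS c ON c.course_id = e.course_id
    ORDER BY s.name, c.course_name
    """
).fetchall()
print(pairs)
# [('Ana', 'Algebra'), ('Ana', 'Biology'), ('Ben', 'Algebra')]
```

Ana appears with two courses and Algebra appears with two students, which is why both `singular` and `no_collisions` are `false` for this relationship.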

Now we can check the graph structure to see the compound relationship.

In [27]:
pydough.active_session.load_metadata_graph("../metadata/graph.json", "University")
graph = pydough.active_session.metadata
print(pydough.explain_structure(graph))

Structure of PyDough graph: University

  Courses
  ├── course_id
  ├── course_name
  ├── enrollments [multiple Enrollments] (reverse of Enrollments.course)
  └── students [multiple Students] (reverse of Students.courses)

  Departments
  ├── department_id
  ├── department_name
  ├── founded
  └── professors [multiple Professors] (reverse of Professors.department)

  Enrollments
  ├── course_id
  ├── student_id
  ├── course [one member of Courses] (reverse of Courses.enrollments)
  └── student [one member of Students] (reverse of Students.enrollments)

  ProfessorOffices
  ├── building
  ├── office_number
  ├── professor_id
  └── professor [one member of Professors] (reverse of Professors.office)

  Professors
  ├── department_id
  ├── is_active
  ├── name
  ├── professor_id
  ├── department [one member of Departments] (reverse of Departments.professors)
  └── office [one member of ProfessorOffices] (reverse of ProfessorOffices.professor)

  Students
  ├── name
  ├── student_id
  ├── c

In this structure, a compound relationship is used to efficiently connect `Students` to `Courses`, skipping the intermediate `Enrollments` table. This maintains the many-to-many relationship while simplifying data access.

- `Students` now directly reference `Courses` through the property **courses**, even though the relationship is stored in `Enrollments`.

- `Courses` can access their enrolled students via the property **students**, effectively creating a bidirectional connection between `Students` and `Courses`.
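The reverse names in the printed structure can be traced back to the metadata. The sketch below is plain Python over a trimmed copy of the graph dictionary, not PyDough's own resolution logic. The helpers `target_collection` and `reverse_properties` are hypothetical, and the sketch assumes every primary and secondary property is declared directly on its collection rather than being a reverse property itself.

```python
# Trimmed copy of the metadata: only the relationship properties are kept.
graph = {
    "Students": {"properties": {
        "enrollments": {"type": "simple_join",
                        "other_collection_name": "Enrollments",
                        "reverse_relationship_name": "student"},
        "courses": {"type": "compound",
                    "primary_property": "enrollments",
                    "secondary_property": "course",
                    "reverse_relationship_name": "students"},
    }},
    "Courses": {"properties": {}},
    "Enrollments": {"properties": {
        "course": {"type": "simple_join",
                   "other_collection_name": "Courses",
                   "reverse_relationship_name": "enrollments"},
    }},
}


def target_collection(graph, collection, prop_name):
    """Return the collection that a relationship property points to."""
    prop = graph[collection]["properties"][prop_name]
    if prop["type"] == "simple_join":
        return prop["other_collection_name"]
    if prop["type"] == "compound":
        # Hop 1: the primary property leaves the current collection.
        middle = target_collection(graph, collection, prop["primary_property"])
        # Hop 2: the secondary property is looked up on the middle collection.
        return target_collection(graph, middle, prop["secondary_property"])
    raise ValueError(f"{collection}.{prop_name} is not a relationship")


def reverse_properties(graph):
    """Map each collection to the reverse properties that point back at it."""
    reverses = {}
    for collection, meta in graph.items():
        for name, prop in meta["properties"].items():
            if prop["type"] in ("simple_join", "compound"):
                target = target_collection(graph, collection, name)
                reverse_name = prop["reverse_relationship_name"]
                reverses.setdefault(target, {})[reverse_name] = f"{collection}.{name}"
    return reverses


print(reverse_properties(graph)["Courses"])
# {'students': 'Students.courses', 'enrollments': 'Enrollments.course'}
```

This matches the two relationship entries listed under `Courses` in the printed structure.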

## **Validating the Graph Structure**

Here is the complete **graph.json**, with all the relationships defined so far:

In [None]:
{
    "University": {
        "Departments": {
            "type": "simple_table",
            "table_path": "main.departments",
            "unique_properties": ["department_id"],
            "properties": {
                "department_id": {"type": "table_column", "column_name": "department_id", "data_type": "int32"},
                "department_name": {"type": "table_column", "column_name": "department_name", "data_type": "string"},
                "founded": {"type": "table_column", "column_name": "founded", "data_type": "date"}
            }
        },
        "Professors": {
            "type": "simple_table",
            "table_path": "main.professors",
            "unique_properties": ["professor_id"],
            "properties": {
                "professor_id": {"type": "table_column", "column_name": "professor_id", "data_type": "int32"},
                "name": {"type": "table_column", "column_name": "name", "data_type": "string"},
                "department_id": {"type": "table_column", "column_name": "department_id", "data_type": "int32"},
                "is_active": {"type": "table_column", "column_name": "is_active", "data_type": "bool"},
                "department": {
                    "type": "simple_join",
                    "other_collection_name": "Departments",
                    "singular": true,
                    "no_collisions": false,
                    "keys": { "department_id": ["department_id"] },
                    "reverse_relationship_name": "professors"
                }
            }
        },
        "ProfessorOffices": {
            "type": "simple_table",
            "table_path": "main.professor_offices",
            "unique_properties": ["professor_id"],
            "properties": {
                "professor_id": {"type": "table_column", "column_name": "professor_id", "data_type": "int32"},
                "office_number": {"type": "table_column", "column_name": "office_number", "data_type": "int32"},
                "building": {"type": "table_column", "column_name": "building", "data_type": "string"},
                "professor": {
                    "type": "simple_join",
                    "other_collection_name": "Professors",
                    "singular": true,
                    "no_collisions": true,
                    "keys": { "professor_id": ["professor_id"] },
                    "reverse_relationship_name": "office"
                }
            }
        },
        "Students": {
            "type": "simple_table",
            "table_path": "main.students",
            "unique_properties": ["student_id"],
            "properties": {
                "student_id": {"type": "table_column", "column_name": "student_id", "data_type": "int32"},
                "name": {"type": "table_column", "column_name": "name", "data_type": "string"},
                "enrollments": {
                    "type": "simple_join",
                    "other_collection_name": "Enrollments",
                    "singular": false,
                    "no_collisions": true,
                    "keys": {"student_id": ["student_id"]},
                    "reverse_relationship_name": "student"
                },
                "courses": {
                    "type": "compound",
                    "primary_property": "enrollments",
                    "secondary_property": "course",
                    "singular": false,
                    "no_collisions": false,
                    "inherited_properties": {},
                    "reverse_relationship_name": "students"
                }
            }
        },
        "Courses": {
            "type": "simple_table",
            "table_path": "main.courses",
            "unique_properties": ["course_id"],
            "properties": {
                "course_id": {"type": "table_column", "column_name": "course_id", "data_type": "int32"},
                "course_name": {"type": "table_column", "column_name": "course_name", "data_type": "string"}
            }
        },
        "Enrollments": {
            "type": "simple_table",
            "table_path": "main.enrollments",
            "unique_properties": [["student_id", "course_id"]],
            "properties": {
                "student_id": {"type": "table_column", "column_name": "student_id", "data_type": "int32"},
                "course_id": {"type": "table_column", "column_name": "course_id", "data_type": "int32"},
                "course": {
                    "type": "simple_join",
                    "other_collection_name": "Courses",
                    "singular": true,
                    "no_collisions": false,
                    "keys": {"course_id": ["course_id"]},
                    "reverse_relationship_name": "enrollments"
                }
            }
        }
    }
}


And here we have the final structure.

In [26]:
pydough.active_session.load_metadata_graph("../metadata/graph.json", "University")
graph = pydough.active_session.metadata
print(pydough.explain_structure(graph))

Structure of PyDough graph: University

  Courses
  ├── course_id
  ├── course_name
  ├── enrollments [multiple Enrollments] (reverse of Enrollments.course)
  └── students [multiple Students] (reverse of Students.courses)

  Departments
  ├── department_id
  ├── department_name
  ├── founded
  └── professors [multiple Professors] (reverse of Professors.department)

  Enrollments
  ├── course_id
  ├── student_id
  ├── course [one member of Courses] (reverse of Courses.enrollments)
  └── student [one member of Students] (reverse of Students.enrollments)

  ProfessorOffices
  ├── building
  ├── office_number
  ├── professor_id
  └── professor [one member of Professors] (reverse of Professors.office)

  Professors
  ├── department_id
  ├── is_active
  ├── name
  ├── professor_id
  ├── department [one member of Departments] (reverse of Departments.professors)
  └── office [one member of ProfessorOffices] (reverse of ProfessorOffices.professor)

  Students
  ├── name
  ├── student_id
  ├── c

With this structure, the University Knowledge Graph in PyDough is now fully defined, incorporating one-to-one, one-to-many, and many-to-many relationships. Each entity and its relationships have been thoroughly validated, ensuring correctness and consistency.
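One way to read the flag pairs used throughout the graph is sketched below. It assumes that `singular` says whether one record of the declaring collection matches at most one record of the other collection, and that `no_collisions` says whether no two records of the declaring collection can match the same record on the other side. The helper name `cardinality` is hypothetical.

```python
def cardinality(singular, no_collisions):
    """Describe a relationship from the side of the collection that declares it."""
    if singular and no_collisions:
        return "one-to-one"
    if singular:
        return "many-to-one"
    if no_collisions:
        return "one-to-many"
    return "many-to-many"


print(cardinality(True, True))    # ProfessorOffices.professor -> one-to-one
print(cardinality(True, False))   # Professors.department -> many-to-one
print(cardinality(False, False))  # Students.courses -> many-to-many
```

Under this reading, a many-to-one property such as `Professors.department` shows up in the reverse direction as the one-to-many `Departments.professors`.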

Validation Checklist:
- Unique Identifiers: Every collection has correctly assigned primary keys, ensuring data integrity.
- Simple Relationships: All one-to-one and one-to-many relationships have been established in the correct direction, avoiding redundancy.
- Many-to-Many Relationships: Enrollments (Students ↔ Courses) and Study Partnerships (Students ↔ Students) correctly model many-to-many relationships using an intermediate table with compound relationships where applicable.
- Reverse Relationships: All reverse relationship names are distinct from their corresponding properties, avoiding conflicts.
- Correct use of `singular` and `no_collisions`: Each relationship correctly declares whether it is one-to-one, one-to-many, or many-to-many, ensuring accurate data modeling.
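Parts of this checklist can be automated before the file is loaded. The following is a rough sketch in plain Python, not a PyDough API: `validate_graph` is a hypothetical helper that takes the dictionary stored under the graph name (here, the value of `"University"`) and checks unique properties, join keys, and reverse-name collisions. It assumes a compound relationship's primary property is declared directly on the same collection, so it may report a false positive when the primary property is itself a reverse relationship.

```python
def validate_graph(graph):
    """Return a list of problems found; an empty list means none were detected."""
    problems = []
    for coll_name, coll in graph.items():
        props = coll["properties"]
        # Unique identifiers: every listed name must be a declared property.
        for entry in coll.get("unique_properties", []):
            names = entry if isinstance(entry, list) else [entry]
            for name in names:
                if name not in props:
                    problems.append(f"{coll_name}: unknown unique property {name!r}")
        for prop_name, prop in props.items():
            where = f"{coll_name}.{prop_name}"
            if prop["type"] == "simple_join":
                other = prop["other_collection_name"]
                if other not in graph:
                    problems.append(f"{where}: unknown collection {other!r}")
                    continue
                other_props = graph[other]["properties"]
                # Join keys must exist on both sides of the relationship.
                for key, matches in prop["keys"].items():
                    if key not in props:
                        problems.append(f"{where}: unknown key {key!r}")
                    for match in matches:
                        if match not in other_props:
                            problems.append(f"{where}: {other} has no {match!r}")
                # The reverse name must not clash with a declared property.
                reverse = prop["reverse_relationship_name"]
                if reverse in other_props:
                    problems.append(f"{where}: reverse name {reverse!r} collides in {other}")
            elif prop["type"] == "compound":
                if prop["primary_property"] not in props:
                    problems.append(f"{where}: primary property not declared here")
    return problems


# Small hypothetical excerpt of the graph used to exercise the checks.
graph = {
    "Departments": {
        "unique_properties": ["department_id"],
        "properties": {"department_id": {"type": "table_column"}},
    },
    "Professors": {
        "unique_properties": ["professor_id"],
        "properties": {
            "professor_id": {"type": "table_column"},
            "department_id": {"type": "table_column"},
            "department": {
                "type": "simple_join",
                "other_collection_name": "Departments",
                "keys": {"department_id": ["department_id"]},
                "reverse_relationship_name": "professors",
            },
        },
    },
}

print(validate_graph(graph))  # []
```

To run it against the real file, the graph could be read with `json.load` and the value under `"University"` passed in.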