# Subqueries and Query Patterns

**Subqueries** are queries nested inside other queries. In DataJoint, subqueries emerge naturally when you use query expressions as restriction conditions. This chapter explores common patterns for answering complex questions using composed queries.

## Understanding Subqueries in DataJoint

In DataJoint, you create subqueries by using one query expression to restrict another. The restriction operators accept query expressions as conditions: `&` keeps the rows that have a match (a semijoin), and `-` keeps the rows that have none (an antijoin).

### Basic Concept

```python
# Outer query restricted by inner query (subquery)
result = OuterTable & InnerQuery
```

The `InnerQuery` acts as a subquery: a row of `OuterTable` is kept when it matches at least one row of `InnerQuery` on the attributes the two share, which are typically the primary key attributes of `OuterTable`.
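
To make the matching rule concrete, here is a minimal plain-Python sketch of what restriction by a query expression computes. It does not use DataJoint: rows are modeled as dictionaries, and the helper names (`semijoin`, `antijoin`) and the toy data are assumptions made for illustration.

```python
def _shared_attrs(outer_rows, inner_rows):
    """Attribute names present in both collections (empty if either is empty)."""
    if not outer_rows or not inner_rows:
        return []
    return sorted(set(outer_rows[0]) & set(inner_rows[0]))


def _match_keys(rows, attrs):
    """Project each row onto `attrs` and collect the resulting tuples."""
    return {tuple(row[a] for a in attrs) for row in rows}


def semijoin(outer_rows, inner_rows):
    """Model of `Outer & Inner`: keep outer rows that match some inner row."""
    attrs = _shared_attrs(outer_rows, inner_rows)
    keys = _match_keys(inner_rows, attrs)
    return [row for row in outer_rows if tuple(row[a] for a in attrs) in keys]


def antijoin(outer_rows, inner_rows):
    """Model of `Outer - Inner`: keep outer rows that match no inner row."""
    attrs = _shared_attrs(outer_rows, inner_rows)
    keys = _match_keys(inner_rows, attrs)
    return [row for row in outer_rows if tuple(row[a] for a in attrs) not in keys]


# Hypothetical toy data: student 1 has two enrollments, student 2 has none.
students = [
    {'student_id': 1, 'name': 'Ada'},
    {'student_id': 2, 'name': 'Ben'},
    {'student_id': 3, 'name': 'Cy'},
]
enrollments = [
    {'student_id': 1, 'course': 1410},
    {'student_id': 1, 'course': 2420},
    {'student_id': 3, 'course': 1410},
]

enrolled = semijoin(students, enrollments)    # students 1 and 3
unenrolled = antijoin(students, enrollments)  # student 2
```

Student 1 appears only once in `enrolled` despite having two enrollments, and the result carries only the outer table's attributes. Both properties distinguish a semijoin from an ordinary join.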

## Pattern 1: Existence Check (IN)

Find entities that have related records in another table.

### Pattern

```python
# Find A where matching B exists
result = A & B
```

### Example: Students with Enrollments

```python
# Find all students who are enrolled in at least one course
enrolled_students = Student & Enroll
```

**SQL Equivalent**:
```sql
SELECT * FROM student
WHERE student_id IN (SELECT student_id FROM enroll);
```

### Example: Students with Math Majors

```python
# Find students majoring in math
math_students = Student & (StudentMajor & {'dept': 'MATH'})
```

**SQL Equivalent**:
```sql
SELECT * FROM student
WHERE student_id IN (
    SELECT student_id FROM student_major WHERE dept = 'MATH'
);
```

## Pattern 2: Non-Existence Check (NOT IN)

Find entities that do NOT have related records in another table.

### Pattern

```python
# Find A where no matching B exists
result = A - B
```

### Example: Students Without Enrollments

```python
# Find students who are not enrolled in any course
unenrolled_students = Student - Enroll
```

**SQL Equivalent**:
```sql
SELECT * FROM student
WHERE student_id NOT IN (SELECT student_id FROM enroll);
```

### Example: Students Without Math Courses

```python
# Find students who have never taken a math course
no_math_students = Student - (Enroll & {'dept': 'MATH'})
```

**SQL Equivalent**:
```sql
SELECT * FROM student
WHERE student_id NOT IN (
    SELECT student_id FROM enroll WHERE dept = 'MATH'
);
```
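
The SQL equivalents for Patterns 1 and 2 can be tried out without a DataJoint server. The sketch below runs them against an in-memory SQLite database from the Python standard library; SQLite stands in for the MySQL backend here, and the table contents are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE student (student_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE enroll (student_id INTEGER, dept TEXT, course INTEGER);
    INSERT INTO student VALUES (1, 'Ada'), (2, 'Ben'), (3, 'Cy');
    INSERT INTO enroll VALUES (1, 'MATH', 1210), (1, 'CS', 1410), (3, 'CS', 1410);
""")


def student_ids(where_clause):
    """Return the sorted student IDs selected by the given WHERE clause."""
    sql = f"SELECT student_id FROM student WHERE {where_clause} ORDER BY student_id"
    return [row[0] for row in conn.execute(sql)]


# Pattern 1 (IN): students with at least one enrollment
enrolled = student_ids("student_id IN (SELECT student_id FROM enroll)")

# Pattern 2 (NOT IN): students with no enrollment at all
unenrolled = student_ids("student_id NOT IN (SELECT student_id FROM enroll)")

# Pattern 2 with a restricted subquery: students with no MATH enrollment
no_math = student_ids(
    "student_id NOT IN (SELECT student_id FROM enroll WHERE dept = 'MATH')"
)
```

With this data, `enrolled` and `unenrolled` split the student table into two disjoint parts, which is a quick sanity check worth running on any `&` / `-` pair.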

## Pattern 3: Multiple Conditions (AND)

Find entities that satisfy multiple conditions simultaneously.

### Pattern

```python
# Find A where both B1 and B2 conditions are met
result = (A & B1) & B2
# Or equivalently
result = A & B1 & B2
```

### Example: People Speaking Both Languages

```python
# Find people who speak BOTH English AND Spanish
english_speakers = Person & (Fluency & {'lang_code': 'en'})
spanish_speakers = Person & (Fluency & {'lang_code': 'es'})
bilingual = english_speakers & spanish_speakers
```

**SQL Equivalent**:
```sql
SELECT * FROM person
WHERE person_id IN (
    SELECT person_id FROM fluency WHERE lang_code = 'en'
)
AND person_id IN (
    SELECT person_id FROM fluency WHERE lang_code = 'es'
);
```
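
A frequent slip with this pattern is to apply both conditions to the same `Fluency` row, for instance by chaining `Fluency & {'lang_code': 'en'} & {'lang_code': 'es'}`. No single row can hold two language codes, so that result is always empty. The plain-Python sketch below, with made-up rows, contrasts the row-level mistake with the entity-level intersection the pattern relies on.

```python
# Hypothetical fluency rows: one row per (person, language) pair
fluency = [
    {'person_id': 1, 'lang_code': 'en'},
    {'person_id': 1, 'lang_code': 'es'},
    {'person_id': 2, 'lang_code': 'en'},
    {'person_id': 3, 'lang_code': 'es'},
]

# Row-level AND: asks for one row that is both 'en' and 'es' (impossible)
both_on_one_row = [
    row for row in fluency
    if row['lang_code'] == 'en' and row['lang_code'] == 'es'
]

# Entity-level AND: build one set of people per condition, then intersect
english = {row['person_id'] for row in fluency if row['lang_code'] == 'en'}
spanish = {row['person_id'] for row in fluency if row['lang_code'] == 'es'}
bilingual = english & spanish
```

Only person 1 ends up in `bilingual`, while `both_on_one_row` stays empty. This is why each condition is first lifted to the `Person` level before the two are combined.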

### Example: Students with Major AND Current Enrollment

```python
# Find students who have declared a major AND are enrolled this term
active_declared = (Student & StudentMajor) & (Enroll & CurrentTerm)
```

## Pattern 4: Either/Or Conditions (OR)

Find entities that satisfy at least one of multiple conditions.

### Pattern Using List Restriction

For simple OR on the same attribute:

```python
# Find A where condition1 OR condition2
result = A & [condition1, condition2]
```

### Pattern Using Union

For OR across different relationships:

```python
# Find A where B1 OR B2 condition is met
result = (A & B1) + (A & B2)
```

### Example: Students in Multiple States

```python
# Find students from California OR New York (simple OR)
coastal_students = Student & [{'home_state': 'CA'}, {'home_state': 'NY'}]

# Or using SQL syntax
coastal_students = Student & 'home_state IN ("CA", "NY")'
```

### Example: People Speaking Either Language

```python
# Find people who speak English OR Spanish (cross-relationship OR)
english_speakers = Person & (Fluency & {'lang_code': 'en'})
spanish_speakers = Person & (Fluency & {'lang_code': 'es'})
either_language = english_speakers + spanish_speakers
```
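
The union behaves like a set union: a person who satisfies both conditions appears once, not twice. The plain-Python sketch below uses made-up rows to show this. It also shows that when both conditions apply to the same table, an OR inside the subquery selects the same people, which loosely corresponds to restricting `Fluency` with a list of conditions.

```python
# Hypothetical fluency rows: person 1 speaks both English and Spanish
fluency = [
    {'person_id': 1, 'lang_code': 'en'},
    {'person_id': 1, 'lang_code': 'es'},
    {'person_id': 2, 'lang_code': 'en'},
    {'person_id': 3, 'lang_code': 'ja'},
]

english = {row['person_id'] for row in fluency if row['lang_code'] == 'en'}
spanish = {row['person_id'] for row in fluency if row['lang_code'] == 'es'}

# OR as a union of two entity sets (the `+` form)
either_by_union = english | spanish

# OR applied inside the subquery (the list-restriction form)
either_by_row_filter = {
    row['person_id'] for row in fluency if row['lang_code'] in ('en', 'es')
}
```

The union form remains the more general one, because it also works when the two conditions come from different tables.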

## Pattern 5: Exclusion with Condition

Find entities that have some relationship but NOT a specific variant of it.

### Pattern

```python
# Find A where B exists but B with specific condition does not
result = (A & B) - (B & specific_condition)
```

### Example: Non-Fluent Speakers

```python
# Find people who speak Japanese but are NOT fluent
japanese_speakers = Person & (Fluency & {'lang_code': 'ja'})
fluent_japanese = Person & (Fluency & {'lang_code': 'ja', 'fluency_level': 'fluent'})
non_fluent_japanese = japanese_speakers - fluent_japanese
```

**SQL Equivalent**:
```sql
SELECT * FROM person
WHERE person_id IN (
    SELECT person_id FROM fluency WHERE lang_code = 'ja'
)
AND person_id NOT IN (
    SELECT person_id FROM fluency 
    WHERE lang_code = 'ja' AND fluency_level = 'fluent'
);
```

### Example: Students with Incomplete Grades

```python
# Find students enrolled in current term without grades yet
currently_enrolled = Student & (Enroll & CurrentTerm)
graded_this_term = Student & (Grade & CurrentTerm)
awaiting_grades = currently_enrolled - graded_this_term
```

## Pattern 6: All-or-Nothing (Universal Quantification)

Find entities whose related records ALL meet a condition. This is equivalent to requiring that at least one related record exists and that none of them fails the condition.

### Pattern: All Match

```python
# Find A where ALL related B satisfy condition
# Equivalent to: A with B, minus A with B that doesn't satisfy condition
result = (A & B) - (B - condition)
```

### Example: All-A Students

```python
# Find students who have received ONLY 'A' grades (no non-A grades)
students_with_grades = Student & Grade
students_with_non_a = Student & (Grade - {'grade': 'A'})
all_a_students = students_with_grades - students_with_non_a
```

**SQL Equivalent**:
```sql
SELECT * FROM student
WHERE student_id IN (SELECT student_id FROM grade)
AND student_id NOT IN (
    SELECT student_id FROM grade WHERE grade <> 'A'
);
```
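
The existence term (`Student & Grade`) matters. Without it, students with no grades at all would pass the "no non-A grades" test trivially. The plain-Python sketch below uses made-up data to show the difference.

```python
# Hypothetical data: student 4 has no grades at all
students = {1, 2, 3, 4}
grades = [(1, 'A'), (1, 'A'), (2, 'A'), (2, 'B'), (3, 'C')]  # (student_id, grade)

students_with_grades = {sid for sid, _ in grades}
students_with_non_a = {sid for sid, grade in grades if grade != 'A'}

# Pattern 6: has grades, and none of them is a non-A
all_a_students = students_with_grades - students_with_non_a

# Dropping the existence term also admits students with no grades
no_non_a_students = students - students_with_non_a
```

Here `all_a_students` contains only student 1, while `no_non_a_students` also includes student 4. Which of the two is wanted depends on the question being asked, so it is worth deciding explicitly.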

### Example: Languages with Only Fluent Speakers

```python
# Find languages where all speakers are fluent (no non-fluent speakers)
languages_with_speakers = Language & Fluency
languages_with_non_fluent = Language & (Fluency - {'fluency_level': 'fluent'})
all_fluent_languages = languages_with_speakers - languages_with_non_fluent
```

## Pattern 7: Reverse Perspective

Sometimes you need to flip the perspective—instead of asking about entities, ask about their related entities.

### Example: Languages Without Speakers

```python
# Find languages that no one speaks
languages_spoken = Language & Fluency
unspoken_languages = Language - languages_spoken
```

### Example: Courses Without Enrollments

```python
# Find courses with no students enrolled this term
courses_with_enrollment = Course & (Enroll & CurrentTerm)
empty_courses = Course - courses_with_enrollment
```

### Example: Departments Without Majors

```python
# Find departments that have no declared majors
departments_with_majors = Department & StudentMajor
departments_without_majors = Department - departments_with_majors
```

## Examples from the University Database

### Example 1: Students with Ungraded Enrollments

Find students enrolled in the current term who haven't received grades yet:

```python
# Students enrolled this term
enrolled_current = Student & (Enroll & CurrentTerm)

# Students with grades this term
graded_current = Student & (Grade & CurrentTerm)

# Students awaiting grades
awaiting_grades = enrolled_current - graded_current
```

### Example 2: Students in Specific Courses

```python
# Students enrolled in Introduction to CS (CS 1410)
cs_intro_students = Student & (Enroll & {'dept': 'CS', 'course': 1410})

# Students who have taken both CS 1410 and CS 2420
cs_1410 = Student & (Enroll & {'dept': 'CS', 'course': 1410})
cs_2420 = Student & (Enroll & {'dept': 'CS', 'course': 2420})
both_courses = cs_1410 & cs_2420
```

### Example 3: High-Performing Students

```python
# Students with only A or B grades (no C or below)
students_with_grades = Student & Grade
students_with_low_grades = Student & (Grade & 'grade NOT IN ("A", "B")')
honor_roll = students_with_grades - students_with_low_grades
```

## Self-Referencing Patterns

Some tables reference themselves through foreign keys, creating hierarchies like management structures or prerequisite chains.

### Management Hierarchy Example

Consider a schema where employees can report to other employees:

```python
@schema
class Employee(dj.Manual):
    definition = """
    employee_id : int
    ---
    name : varchar(60)
    """

@schema
class ReportsTo(dj.Manual):
    definition = """
    -> Employee
    ---
    -> Employee.proj(manager_id='employee_id')
    """
```

### Finding Managers

```python
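# Employees who have direct reports (are managers).
# proj() always keeps the primary key, so rename `employee_id` out of the way
# (to an arbitrary name, here `report_id`) before renaming `manager_id` onto it.
managers = Employee & ReportsTo.proj(report_id='employee_id', employee_id='manager_id')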
```

### Finding Top-Level Managers

```python
# Employees who don't report to anyone
top_managers = Employee - ReportsTo
```

### Finding Non-Managers

```python
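# Employees with no direct reports.
# proj() always keeps the primary key, so rename `employee_id` out of the way
# (to an arbitrary name, here `report_id`) before renaming `manager_id` onto it.
non_managers = Employee - ReportsTo.proj(report_id='employee_id', employee_id='manager_id')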
```
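
The three hierarchy queries can be checked against a small plain-Python model. In the sketch below, the `reports_to` dictionary plays the role of the `ReportsTo` table, mapping each `employee_id` to its `manager_id`. The IDs are made up for illustration.

```python
# Hypothetical hierarchy: 1 is the top manager; 2 and 3 report to 1; 4 and 5 report to 2
employees = {1, 2, 3, 4, 5}
reports_to = {2: 1, 3: 1, 4: 2, 5: 2}  # employee_id -> manager_id

# Managers: employees whose ID appears as someone's manager_id
managers = employees & set(reports_to.values())

# Top-level managers: employees with no row in reports_to
top_managers = employees - set(reports_to)

# Non-managers: employees whose ID never appears as a manager_id
non_managers = employees - set(reports_to.values())
```

The manager and non-manager queries match on the `manager_id` column, while the top-level query matches on `employee_id`. The same distinction is what the renaming projection expresses in the DataJoint versions.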

## Building Queries Systematically

Complex queries are best built incrementally. Follow this approach:

### Step 1: Identify the Target Entity

What type of entity do you want in your result?

### Step 2: List the Conditions

What criteria must the entities satisfy?

### Step 3: Build Each Condition as a Query

Create separate query expressions for each condition.

### Step 4: Combine with Appropriate Operators

- Use `&` for AND conditions
- Use `-` for NOT conditions
- Use `+` for OR conditions across different paths

### Step 5: Test Incrementally

Verify each intermediate result.

### Example: Building a Complex Query

**Goal**: Find CS majors who are enrolled this term but haven't received any grades yet.

```python
# Step 1: Target entity is Student
# Step 2: Conditions:
#   - Has CS major
#   - Enrolled in current term
#   - No grades in current term

# Step 3: Build each condition
cs_majors = Student & (StudentMajor & {'dept': 'CS'})
enrolled_current = Student & (Enroll & CurrentTerm)
graded_current = Student & (Grade & CurrentTerm)

# Step 4: Combine (parenthesized because Python's `-` binds tighter than `&`)
result = (cs_majors & enrolled_current) - graded_current

# Step 5: Verify counts
print(f"CS majors: {len(cs_majors)}")
print(f"Enrolled current term: {len(enrolled_current)}")
print(f"CS majors enrolled, no grades: {len(result)}")
```
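
When combining conditions in Step 4, keep Python's operator precedence in mind: the arithmetic operators `-` and `+` bind tighter than `&`. The sketch below uses plain Python sets as stand-ins for the three query expressions (the IDs are made up) to show how an unparenthesized expression is grouped.

```python
# Hypothetical student IDs standing in for the three query expressions
cs_majors = {1, 2, 3}
enrolled_current = {2, 3, 4}
graded_current = {3}

# Without parentheses, Python evaluates the subtraction first
unparenthesized = cs_majors & enrolled_current - graded_current
as_python_groups_it = cs_majors & (enrolled_current - graded_current)

# The grouping that matches the left-to-right reading
left_to_right = (cs_majors & enrolled_current) - graded_current
```

For this particular combination of `&` and `-`, both groupings select the same entities, so the result is unaffected. That is not true in general, for example when `+` is mixed with `&`, so explicit parentheses are the safer habit.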

## Summary of Patterns

| Pattern | DataJoint | SQL Equivalent |
|---------|-----------|----------------|
| Existence (IN) | `A & B` | `WHERE id IN (SELECT ...)` |
| Non-existence (NOT IN) | `A - B` | `WHERE id NOT IN (SELECT ...)` |
| AND (both conditions) | `A & B1 & B2` | `WHERE ... AND ...` |
| OR (either condition) | `(A & B1) + (A & B2)` | `WHERE ... OR ...` |
| Exclusion | `(A & B) - (B & condition)` | `WHERE id IN (...) AND id NOT IN (...)` |
| Universal (all match) | `(A & B) - (B - condition)` | `WHERE id IN (...) AND id NOT IN (SELECT ... WHERE NOT condition)` |

Key principles:
1. **Build incrementally** — construct complex queries from simpler parts
2. **Test intermediate results** — verify each step before combining
3. **Think in sets** — restriction filters sets, not individual records
4. **Primary key is preserved** — restrictions never change the entity type

## Practice Exercises

### Exercise 1: Existence

**Task**: Find all departments that have at least one student major.

```python
active_departments = Department & StudentMajor
```

### Exercise 2: Non-Existence

**Task**: Find students who have never taken a biology course.

```python
no_bio = Student - (Enroll & {'dept': 'BIOL'})
```

### Exercise 3: AND Conditions

**Task**: Find students who major in MATH AND have taken at least one CS course.

```python
math_majors = Student & (StudentMajor & {'dept': 'MATH'})
took_cs = Student & (Enroll & {'dept': 'CS'})
math_majors_with_cs = math_majors & took_cs
```

### Exercise 4: All-A Students

**Task**: Find students who have received only 'A' grades.

```python
has_grades = Student & Grade
has_non_a = Student & (Grade - {'grade': 'A'})
all_a = has_grades - has_non_a
```

### Exercise 5: Complex Query

**Task**: Find departments where every declared major has a GPA of 3.0 or higher.

```python
# Students with GPA (computed via aggregation)
student_gpa = Student.aggr(
    Course * Grade * LetterGrade,
    gpa='SUM(points * credits) / SUM(credits)'
)

# Students with low GPA
low_gpa_students = student_gpa & 'gpa < 3.0'

# Departments with low-GPA students
depts_with_low_gpa = Department & (StudentMajor & low_gpa_students)

# Departments where all students have GPA >= 3.0
all_high_gpa_depts = (Department & StudentMajor) - depts_with_low_gpa
```
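
The logic of this solution can be traced with a small plain-Python model. The sketch below computes a credit-weighted GPA per student and then applies the universal-quantification pattern to departments. All values and the data layout are made up for illustration.

```python
from collections import defaultdict

# Hypothetical graded courses: (student_id, credits, grade points)
graded = [(1, 3, 4.0), (1, 4, 3.0), (2, 3, 2.0), (2, 3, 3.0), (3, 4, 4.0)]
majors = {1: 'CS', 2: 'CS', 3: 'MATH'}  # student_id -> dept
departments = {'CS', 'MATH', 'PHYS'}    # PHYS has no declared majors

# Credit-weighted GPA: SUM(points * credits) / SUM(credits) per student
weighted_points = defaultdict(float)
total_credits = defaultdict(int)
for student_id, credits, points in graded:
    weighted_points[student_id] += points * credits
    total_credits[student_id] += credits
gpa = {sid: weighted_points[sid] / total_credits[sid] for sid in total_credits}

# Students below the threshold, and the departments they major in
low_gpa_students = {sid for sid, value in gpa.items() if value < 3.0}
depts_with_low_gpa = {majors[sid] for sid in low_gpa_students if sid in majors}

# Departments with majors, minus those with any low-GPA major
depts_with_majors = departments & set(majors.values())
all_high_gpa_depts = depts_with_majors - depts_with_low_gpa
```

In this model, only `MATH` remains: `CS` is removed because student 2 has a GPA of 2.5, and `PHYS` never enters because it has no majors. A student with no graded courses has no GPA at all in this model and therefore cannot disqualify a department, a case worth checking for when adapting the approach.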

:::{seealso}
For more subquery examples, see the [University Queries](../80-examples/016-university-queries.ipynb) example.
:::