# Query Operators: The Foundation of Database Queries

**Relational Algebra** is the formal mathematical foundation for database querying. It provides a systematic way to manipulate data using a small set of fundamental operations. Understanding these operators is essential for mastering database queries in both DataJoint and SQL.

## What is Relational Algebra?

Relational algebra is a **formal language** for representing and manipulating relational data. It treats data as mathematical relations (tables) and provides operations that transform these relations into new relations. This mathematical foundation ensures that database queries are:

- **Precise**: Each operation has a well-defined mathematical meaning
- **Composable**: Operations can be chained together systematically  
- **Optimizable**: The mathematical properties enable query optimization
- **Universal**: The same concepts apply across different database systems

## The Core Query Operators

Following the lecture, we focus on the five fundamental operations that form the building blocks of database queries:

### 1. **Restriction** (`&` in DataJoint, `WHERE` in SQL)
- **Purpose**: Selects rows that satisfy specific conditions
- **Mathematical concept**: Filters tuples based on predicates
- **Key insight**: Restriction never changes the primary key or entity type

```python
# DataJoint
english_speakers = Person & (Fluency & {'lang_code': 'ENG'})
```

```sql
-- SQL equivalent
SELECT DISTINCT p.*
FROM person p
WHERE p.person_id IN (
    SELECT f.person_id FROM fluency f WHERE f.lang_code = 'ENG'
);
```
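The SQL form of this restriction can be tried without a DataJoint server by running it against an in-memory SQLite database. The following is a minimal sketch; the table contents (`Ana`, `Ben`, `Chloe` and their fluency rows) are invented for illustration and the tables are simplified versions of the ones used in this chapter.

```python
import sqlite3

# Hypothetical sample data, not taken from the lecture
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (person_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE fluency (
        person_id INTEGER,
        lang_code TEXT,
        PRIMARY KEY (person_id, lang_code)
    );
    INSERT INTO person VALUES (1, 'Ana'), (2, 'Ben'), (3, 'Chloe');
    INSERT INTO fluency VALUES (1, 'ENG'), (1, 'SPA'), (2, 'JPN'), (3, 'ENG');
""")

# Restrict person by the existence of a matching fluency row
english_speakers = conn.execute("""
    SELECT p.person_id, p.name
    FROM person p
    WHERE p.person_id IN (
        SELECT f.person_id FROM fluency f WHERE f.lang_code = 'ENG'
    )
    ORDER BY p.person_id
""").fetchall()

print(english_speakers)  # [(1, 'Ana'), (3, 'Chloe')]
```

The result has the same columns as `person` and each person appears at most once, even though the restriction consulted a second table.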

### 2. **Projection** (`.proj()` in DataJoint, `SELECT` in SQL)
- **Purpose**: Manipulates columns by selecting, renaming, or computing new attributes
- **Mathematical concept**: Transforms attributes while preserving entity integrity
- **Key insight**: Primary key is always preserved, enabling algebraic closure

```python
# DataJoint
people_with_age = Person.proj(..., age='TIMESTAMPDIFF(YEAR, date_of_birth, NOW())')
```

```sql
-- SQL equivalent
SELECT person_id, name, date_of_birth,
       TIMESTAMPDIFF(YEAR, date_of_birth, NOW()) AS age
FROM person;
```

### 3. **Join** (`*` in DataJoint, `JOIN` in SQL)
- **Purpose**: Combines related data from multiple tables
- **Mathematical concept**: Creates Cartesian products filtered by matching conditions
- **Key insight**: Foreign key relationships enable automatic joins, so no explicit `ON` clause is needed in DataJoint

```python
# DataJoint (automatic join via foreign keys)
person_fluency = Person * Fluency
```

```sql
-- SQL equivalent
SELECT p.*, f.*
FROM person p
JOIN fluency f ON p.person_id = f.person_id;
```
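The phrase "Cartesian product filtered by matching conditions" can be made concrete with a small pure-Python sketch. The helper `natural_join` and the sample rows are hypothetical and only illustrate the idea; this is not how DataJoint or a database engine implements joins.

```python
from itertools import product

def natural_join(left, right):
    """Pair every left row with every right row, keeping pairs that
    agree on all attribute names the two rows share."""
    joined = []
    for l_row, r_row in product(left, right):
        shared = l_row.keys() & r_row.keys()
        if all(l_row[a] == r_row[a] for a in shared):
            joined.append({**l_row, **r_row})
    return joined

# Hypothetical sample data
person = [
    {"person_id": 1, "name": "Ana"},
    {"person_id": 2, "name": "Ben"},
]
fluency = [
    {"person_id": 1, "lang_code": "ENG"},
    {"person_id": 1, "lang_code": "SPA"},
    {"person_id": 2, "lang_code": "JPN"},
]

person_fluency = natural_join(person, fluency)
for row in person_fluency:
    print(row)
```

Of the six possible pairings, only the three that agree on `person_id` survive, so each joined row describes one person-language pair.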

### 4. **Union** (`+` in DataJoint, `UNION` in SQL)
- **Purpose**: Combines tuples from compatible relations
- **Mathematical concept**: Set union operation
- **Key insight**: Relations must have compatible schemas

```python
# DataJoint
all_speakers = english_speakers + spanish_speakers
```

```sql
-- SQL equivalent
SELECT * FROM english_speakers
UNION
SELECT * FROM spanish_speakers;
```
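Because union is a set operation, an entity that appears in both inputs appears only once in the result. A minimal sketch with Python sets, using invented `(person_id, name)` tuples, shows the effect:

```python
# Hypothetical sample relations as sets of (person_id, name) tuples
english_speakers = {(1, "Ana"), (3, "Chloe")}
spanish_speakers = {(1, "Ana"), (4, "Dev")}

# Set union: Ana speaks both languages but is listed once
all_speakers = english_speakers | spanish_speakers

print(sorted(all_speakers))  # [(1, 'Ana'), (3, 'Chloe'), (4, 'Dev')]
```

Both inputs have the same tuple layout, which is the "compatible schemas" requirement in miniature.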

### 5. **Aggregation** (`.aggr()` in DataJoint, `GROUP BY` in SQL)
- **Purpose**: Summarizes data using functions like COUNT, SUM, AVG
- **Mathematical concept**: Groups tuples and applies aggregate functions
- **Key insight**: Reduces multiple tuples to summary statistics

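```python
# DataJoint: aggr is called on the grouping entity (Language) and takes
# the table to be summarized (Fluency) as its first argument
language_counts = Language.aggr(Fluency, language_count='COUNT(*)')
```

```sql
-- SQL equivalent
SELECT lang_code, COUNT(*) AS language_count
FROM fluency
GROUP BY lang_code;
```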
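The group-then-summarize mechanism can be sketched in plain Python. The sample `fluency` rows are invented, and the two explicit steps below are only an illustration of what `GROUP BY` with `COUNT(*)` computes, not of how a database executes it.

```python
from collections import defaultdict

# Hypothetical sample data
fluency = [
    {"person_id": 1, "lang_code": "ENG"},
    {"person_id": 1, "lang_code": "SPA"},
    {"person_id": 2, "lang_code": "JPN"},
    {"person_id": 3, "lang_code": "ENG"},
]

# Step 1: group rows by the grouping attribute
groups = defaultdict(list)
for row in fluency:
    groups[row["lang_code"]].append(row)

# Step 2: reduce each group to one summary value (here COUNT(*))
language_counts = {lang: len(rows) for lang, rows in groups.items()}

print(language_counts)  # {'ENG': 2, 'SPA': 1, 'JPN': 1}
```

Four input rows collapse to three summary rows, one per distinct `lang_code`.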

## Algebraic Closure: The Key Principle

**Algebraic closure** is the fundamental property that makes relational algebra powerful. It means:

> **The result of any relational operation is itself a relation**

This property enables:

### 1. **Composition of Operations**
```python
# Each step produces a relation that can be used in the next step
step1 = Person                                    # Relation
step2 = step1 & 'date_of_birth > "1990-01-01"'   # Still a relation
step3 = step2.proj(age='TIMESTAMPDIFF(...)')     # Still a relation
step4 = step3 & 'age > 25'                       # Still a relation
```
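To see why closure makes this chaining possible, here is a toy model in which restriction and projection both return the same kind of object they were applied to. The `Relation` class, the `birth_year` attribute, and the fixed reference year 2025 are assumptions made for this sketch; DataJoint's real query expressions build SQL rather than filtering Python lists.

```python
class Relation:
    """Toy relation: a primary key plus a list of row dicts."""

    def __init__(self, primary_key, rows):
        self.primary_key = tuple(primary_key)
        self.rows = list(rows)

    def __and__(self, predicate):
        # Restriction: same attributes, same primary key, fewer rows
        return Relation(self.primary_key, [r for r in self.rows if predicate(r)])

    def proj(self, *attrs, **computed):
        # Projection: always keep the primary key, then named and computed attributes
        keep = self.primary_key + tuple(a for a in attrs if a not in self.primary_key)
        rows = []
        for r in self.rows:
            new_row = {a: r[a] for a in keep}
            new_row.update({name: fn(r) for name, fn in computed.items()})
            rows.append(new_row)
        return Relation(self.primary_key, rows)


# Hypothetical sample data
person = Relation(("person_id",), [
    {"person_id": 1, "name": "Ana", "birth_year": 1985},
    {"person_id": 2, "name": "Ben", "birth_year": 1995},
    {"person_id": 3, "name": "Chloe", "birth_year": 2005},
])

step2 = person & (lambda r: r["birth_year"] > 1990)        # still a Relation
step3 = step2.proj(age=lambda r: 2025 - r["birth_year"])   # still a Relation
step4 = step3 & (lambda r: r["age"] > 25)                  # still a Relation

print(step4.primary_key, step4.rows)  # ('person_id',) [{'person_id': 2, 'age': 30}]
```

Every intermediate value supports the same `&` and `.proj()` operations, and the primary key travels unchanged through all four steps.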

### 2. **Infinite Chaining**
```python
# You can chain operations indefinitely. Parentheses are required because
# method calls bind more tightly than `&`.
complex_query = (Person & condition1 & condition2 & condition3).proj(...).aggr(...)
```

### 3. **Consistent Entity Types**
```python
# All of these are still "Person" relations
original = Person
filtered = Person & condition
projected = Person.proj(...)
computed = Person.proj(age='TIMESTAMPDIFF(...)')
```

## Entity Integrity: The Foundation

As the lecture stressed, **entity integrity** is crucial for understanding query operators:

### The Three Questions of Entity Integrity
When designing queries, you must be able to answer:

1. **How do I prevent duplicate records?** - Ensure unique representation
2. **How do I prevent entities sharing the same record?** - Maintain 1:1 correspondence  
3. **How do I match entities?** - Find corresponding records

### Primary Key Preservation
**Critical principle**: Query operators preserve entity integrity by maintaining primary keys:

```python
# Restriction preserves primary key
people = Person                                    # Primary key: person_id
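# Person has no stored `age` attribute, so the condition computes it from date_of_birth
adults = Person & 'TIMESTAMPDIFF(YEAR, date_of_birth, NOW()) > 18'   # Primary key: person_id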

# Projection preserves primary key  
names = Person.proj('name')                       # Primary key: person_id
ages = Person.proj(age='TIMESTAMPDIFF(...)')     # Primary key: person_id
```
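The value of keeping the primary key becomes visible when two entities share an attribute value. The sketch below uses an in-memory SQLite table with invented rows: a bare SQL `SELECT name` returns rows that can no longer be told apart, while including `person_id` keeps one distinguishable row per person.

```python
import sqlite3

# Hypothetical sample data: two different people share the name 'Ana'
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (person_id INTEGER PRIMARY KEY, name TEXT);
    INSERT INTO person VALUES (1, 'Ana'), (2, 'Ana'), (3, 'Ben');
""")

names_only = conn.execute(
    "SELECT name FROM person ORDER BY person_id"
).fetchall()
with_key = conn.execute(
    "SELECT person_id, name FROM person ORDER BY person_id"
).fetchall()

print(names_only)  # [('Ana',), ('Ana',), ('Ben',)]
print(with_key)    # [(1, 'Ana'), (2, 'Ana'), (3, 'Ben')]

# Without the key, two rows are identical; with it, every row is unique
print(len(set(names_only)), len(set(with_key)))  # 2 3
```

This is the situation the projection rule guards against: the result of `Person.proj('name')` still identifies each person.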

## Order of Operations: Critical Understanding

The lecture emphasized that **order matters** in query construction:

### DataJoint Approach (Separate Operations)
```python
# Clear separation of concerns
people = Person & 'date_of_birth > "1990-01-01"'     # Restriction first
people_with_age = people.proj(age='TIMESTAMPDIFF(...)')  # Projection second
adults = people_with_age & 'age > 25'                 # Restriction using computed attribute
```

### SQL Approach (Combined Operations)
```sql
-- WHERE executes before SELECT internally
SELECT name, TIMESTAMPDIFF(YEAR, date_of_birth, NOW()) AS age
FROM person
WHERE date_of_birth > '1990-01-01'    -- Can use original columns
  AND TIMESTAMPDIFF(YEAR, date_of_birth, NOW()) > 25;  -- Must repeat calculation
```
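One way to avoid repeating the calculation in SQL is a derived table (subquery), which mirrors the project-then-restrict sequence of the DataJoint version. The sketch below runs on in-memory SQLite with invented rows. Two assumptions are made so the example is reproducible: SQLite has no `TIMESTAMPDIFF`, so age is approximated as a year difference, and the reference year is fixed at 2025 instead of using the current date.

```python
import sqlite3

# Hypothetical sample data
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (
        person_id INTEGER PRIMARY KEY,
        name TEXT,
        date_of_birth TEXT
    );
    INSERT INTO person VALUES
        (1, 'Ana',   '1985-03-02'),
        (2, 'Ben',   '1995-07-15'),
        (3, 'Chloe', '2005-11-30');
""")

adults = conn.execute("""
    SELECT name, age
    FROM (
        -- inner query: restrict on a stored column, then compute age once
        SELECT name,
               2025 - CAST(strftime('%Y', date_of_birth) AS INTEGER) AS age
        FROM person
        WHERE date_of_birth > '1990-01-01'
    ) AS people_with_age
    WHERE age > 25          -- outer query: restrict on the computed attribute
    ORDER BY name
""").fetchall()

print(adults)  # [('Ben', 30)]
```

The inner query plays the role of `people_with_age`, and the outer `WHERE` corresponds to the final restriction on `age`.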

### Common Mistake
```python
# WRONG: Restricting on a computed attribute before it exists
# (Person & 'age > 25').proj(age='TIMESTAMPDIFF(...)')

# CORRECT: Compute attribute first, then use it
people_with_age = Person.proj(age='TIMESTAMPDIFF(...)')
adults = people_with_age & 'age > 25'
```

## Working with the Languages Database

The lecture's main example, a languages database, demonstrates the operators:

```python
import datajoint as dj
schema = dj.Schema('languages_demo')

@schema
class Person(dj.Manual):
    definition = """
    person_id : int
    ---
    name : varchar(60)
    date_of_birth : date
    """

@schema
class Language(dj.Lookup):
    definition = """
    lang_code : char(4)
    ---
    language : varchar(30)
    """
    contents = [
        ('ENG', 'English'),
        ('SPA', 'Spanish'),
        ('JPN', 'Japanese')
    ]

@schema
class Fluency(dj.Manual):
    definition = """
    -> Person
    -> Language
    ---
    fluency_level : enum('beginner', 'intermediate', 'fluent')
    """
```

### Example: Systematic Query Building

```python
# Step 1: Restriction - Find English speakers
english_speakers = Person & (Fluency & {'lang_code': 'ENG'})

# Step 2: Projection - Add age calculation
english_speakers_with_age = english_speakers.proj(
    'name',
    age='TIMESTAMPDIFF(YEAR, date_of_birth, NOW())'
)

# Step 3: Restriction - Find adults
adult_english_speakers = english_speakers_with_age & 'age >= 18'

# Step 4: Projection - Clean output
result = adult_english_speakers.proj('name', 'age')
```

## Operator Relationships and Dependencies

Understanding how operators work together:

### 1. **Restriction + Projection**
```python
# Restrict first, then project
filtered = Person & condition
projected = filtered.proj(...)

# Or chain together (parentheses needed: `.proj` would otherwise bind to `condition`)
result = (Person & condition).proj(...)
```

### 2. **Projection + Restriction** 
```python
# Project first to create computed attributes, then restrict
with_computed = Person.proj(age='TIMESTAMPDIFF(...)')
filtered = with_computed & 'age > 25'
```

### 3. **Join + Restriction**
```python
# Join tables, then restrict
joined = Person * Fluency
filtered = joined & {'lang_code': 'ENG'}
```

## SQL Translation Patterns

Every DataJoint operator has systematic SQL equivalents:

| DataJoint Pattern | SQL Equivalent |
|-------------------|----------------|
| `Table & condition` | `SELECT * FROM table WHERE condition` |
| `Table.proj('col')` | `SELECT primary_key, col FROM table` |
| `Table.proj(computed='expr')` | `SELECT primary_key, expr AS computed FROM table` |
| `Table1 * Table2` | `SELECT * FROM table1 JOIN table2 ON ...` |
| `Table1 + Table2` | `SELECT * FROM table1 UNION SELECT * FROM table2` |
| `Table.aggr(Other, count='COUNT(*)')` | `SELECT primary_key, COUNT(*) AS count FROM table JOIN other ON ... GROUP BY primary_key` |

## Best Practices from the Lecture

### 1. **Think Systematically**
- Start with the simplest operation
- Build complexity incrementally
- Verify each step before proceeding

### 2. **Understand Entity Integrity**
- Always consider primary key preservation
- Maintain 1:1 correspondence between records and entities
- Use foreign key relationships for joins

### 3. **Master Order of Operations**
- Compute attributes before using them in restrictions
- Understand when attributes are available
- Test your logic incrementally

### 4. **Use Algebraic Closure**
- Chain operations naturally
- Remember that each result is a valid relation
- Build complex queries from simple building blocks

## Summary

Query operators form the mathematical foundation of database querying. Key takeaways:

1. **Relational algebra** provides precise, mathematical operations on data
2. **Algebraic closure** enables composition of complex queries from simple operations
3. **Entity integrity** ensures consistent representation of real-world entities
4. **Order of operations** determines when attributes are available for use
5. **Systematic thinking** leads to correct, maintainable queries

Master these operators and you can express a wide range of database queries as compositions of a few fundamental operations. The mathematical foundation helps keep your queries correct, optimizable, and maintainable.