- a DBMS has a query processor, which includes a query evaluation engine
- the query processor translates every SQL query submitted by an application to a concrete evaluation plan
- the evaluation engine then executes the plan by interacting with the storage manager layer
- the storage manager is responsible for fetching rows from tables, inserting and updating rows, enforcing integrity constraints, and performing concurrency control and managing transactions

<img src="img/Snip20191104_114.png"/>

# Query Processing

<img src="img/Snip20191104_115.png" width=80%/>

1. Parsing and translation
2. Optimization
3. Evaluation

# Query Optimization and Evaluation

- one relational algebra expression may have many equivalent expressions
- each relational algebra operation can be evaluated using one of several different algorithms
    - there are many ways to evaluate a given relational algebra expression
- an **evaluation plan** is an annotated expression specifying the detailed evaluation strategy for a given query
    - e.g., join instructor with teaches first, then join with course
    - e.g., use an index on salary to find instructors with salary < 75000
- optimizing a query entails reasoning about the following
    - equivalent relational algebra expressions for the query
    - possible evaluation plans for each candidate RA expression
    - cost of each candidate evaluation plan
- assuming that we can settle on a meaningful definition of *cost*, optimizing a query entails finding the equivalent RA expression and evaluation plan that *minimize the cost*
- **Cost** is **estimated** in practice using statistical information recorded in the system catalog, for example:
    - \# of tuples in a table, # of entries in an index
    - size of each tuple or index entry, height of a B+tree index
    - **cardinality** of an attribute: # of distinct attribute values stored
    - **selectivity** of an index: cardinality of indexed attribute(s) / total # of index entries
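
As a rough illustration of the statistics listed above, the following sketch computes the tuple count, the cardinality of an attribute, and the selectivity (using the definition given above) for a tiny made-up `instructor` table. The sample rows and the helper name `column_stats` are assumptions for illustration only; a real DBMS keeps such figures in its catalog instead of recomputing them for every query.

```python
def column_stats(rows, attribute):
    """Return (num_tuples, cardinality, selectivity) for one attribute.

    selectivity = cardinality / number of index entries, assuming the
    index holds exactly one entry per row.
    """
    values = [row[attribute] for row in rows]
    num_tuples = len(values)
    cardinality = len(set(values))  # number of distinct values
    selectivity = cardinality / num_tuples if num_tuples else 0.0
    return num_tuples, cardinality, selectivity


# Hypothetical sample data, not taken from a real catalog.
instructor = [
    {"ID": 1, "dept_name": "Finance", "salary": 90000},
    {"ID": 2, "dept_name": "Finance", "salary": 72000},
    {"ID": 3, "dept_name": "Physics", "salary": 95000},
    {"ID": 4, "dept_name": "Physics", "salary": 87000},
    {"ID": 5, "dept_name": "Music", "salary": 40000},
    {"ID": 6, "dept_name": "Music", "salary": 65000},
]

print(column_stats(instructor, "dept_name"))  # low selectivity: many duplicates
print(column_stats(instructor, "ID"))         # selectivity 1.0: every value is unique
```

A selectivity close to 1 means that an index lookup narrows the search to very few rows, which is what makes an index attractive to the optimizer.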



# Engineer's Dilemma

- adding indexes can speed up some queries drastically
- indexing slows down insertions and updates
- looking up an index repeatedly during a join is not necessarily faster than scanning the inner relation
    - e.g., indexed nested-loop join vs. block nested-loop join
- the DBMS may create the most important indexes automatically, and it may not offer many choices of index structure
- it is difficult to outsmart a good query optimizer, which has access to detailed statistics about tables and uses elaborate optimization algorithms
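
To make the point about indexed vs. block nested-loop joins concrete, here is a minimal sketch of a textbook-style cost model. It counts block transfers only and ignores seeks and buffering. The function names and all the numbers are assumptions chosen for illustration, not measurements from a real system.

```python
def block_nested_loop_cost(b_outer, b_inner):
    """Worst-case block transfers: the inner relation is scanned once
    per outer block, and the outer relation is read once."""
    return b_outer * b_inner + b_outer


def indexed_nested_loop_cost(b_outer, n_outer, lookup_cost):
    """Block transfers: read the outer relation once, then perform one
    index lookup (costing lookup_cost blocks) per outer tuple."""
    return b_outer + n_outer * lookup_cost


# Case 1 (hypothetical): large outer relation, small inner relation.
# Repeated index lookups lose to simply scanning the inner relation.
bnl_1 = block_nested_loop_cost(b_outer=1000, b_inner=50)
inl_1 = indexed_nested_loop_cost(b_outer=1000, n_outer=100_000, lookup_cost=4)

# Case 2 (hypothetical): tiny outer relation, huge inner relation.
# A handful of index lookups beats scanning the inner relation.
bnl_2 = block_nested_loop_cost(b_outer=2, b_inner=10_000)
inl_2 = indexed_nested_loop_cost(b_outer=2, n_outer=100, lookup_cost=4)

print(bnl_1, inl_1)  # block nested-loop is cheaper here
print(bnl_2, inl_2)  # indexed nested-loop is cheaper here
```

Which strategy wins depends on the relation sizes and the cost of a lookup, and this is the kind of trade-off the optimizer evaluates using catalog statistics.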

## Properly Designing Good Physical Schemas
- use a basic understanding of query evaluation to make an educated guess as to which index or indexes might benefit the most important queries
- inspect the evaluation plan for a given query to understand how the query optimizer uses an index
- determine the impact of adding an index by measuring performance differences empirically
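
The plan-inspection step can be tried out with SQLite, which ships with Python. The sketch below uses SQLite's `EXPLAIN QUERY PLAN` to compare the plan for one query before and after adding an index. The table contents, the index name `idx_instructor_dept`, and the helper `query_plan` are assumptions for illustration; other systems expose plans through their own `EXPLAIN` variants, and their output looks different.

```python
import sqlite3


def query_plan(conn, sql):
    """Return the human-readable detail column of each plan row."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return "\n".join(row[-1] for row in rows)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE instructor ("
    "ID INTEGER PRIMARY KEY, name TEXT, dept_name TEXT, salary REAL)"
)
# Hypothetical data: every tenth instructor works in Finance.
conn.executemany(
    "INSERT INTO instructor (name, dept_name, salary) VALUES (?, ?, ?)",
    [
        ("instr%d" % i, "Finance" if i % 10 == 0 else "Physics", 50000 + i)
        for i in range(1000)
    ],
)

sql = "SELECT name FROM instructor WHERE dept_name = 'Finance'"

plan_before = query_plan(conn, sql)  # no usable index: full table scan
conn.execute("CREATE INDEX idx_instructor_dept ON instructor (dept_name)")
plan_after = query_plan(conn, sql)   # the optimizer can now use the index

print("before:", plan_before)
print("after: ", plan_after)
```

Reading the plan shows whether the optimizer actually picked up the index. Measuring the difference in running time on realistic data is still a separate step.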


# Index Extensions

- **index extension**: when the DBMS automatically appends the primary key to each secondary index entry
    - example: a secondary index on department.building has entries of the form <building, dept_name>
- advantages
    - make it possible to physically relocate table rows without having to update all secondary indexes
    - enable efficient evaluation of queries that refer to both the primary key and non-key attributes
- disadvantages
    - retrieving a department record given the building requires an additional primary index lookup
    - the secondary index becomes larger and less likely to remain buffered in main memory


# Covering Indexes

- **Covered Query**: a query that can be evaluated using indexes only, without accessing the tables
- **Covering Index** (with respect to a covered query): an index that is used to evaluate the covered query

- example: `SELECT ID FROM instructor WHERE dept_name = 'Finance' AND salary < 80000`
    - suppose that a secondary B+tree index is defined on <dept_name, salary>; due to index extension, the entries of this index are triples of the form (dept_name, salary, ID), so the index is a covering index for the above query
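
The covering-index example above can be reproduced approximately in SQLite. This is a sketch under some assumptions: `ID` is declared `INTEGER PRIMARY KEY`, so it is the row identifier that SQLite stores in every index entry (which plays the role of index extension here), and the index name `idx_dept_salary` and the helper `plan` are made up for the demonstration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE instructor ("
    "ID INTEGER PRIMARY KEY, name TEXT, dept_name TEXT, salary REAL)"
)
conn.execute("CREATE INDEX idx_dept_salary ON instructor (dept_name, salary)")


def plan(sql):
    """Return the detail text of SQLite's plan for the given query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return "\n".join(row[-1] for row in rows)


# Needs only dept_name, salary, and ID: all present in the index entries.
covered = plan(
    "SELECT ID FROM instructor "
    "WHERE dept_name = 'Finance' AND salary < 80000"
)

# Also needs name, which is stored only in the table rows.
not_covered = plan(
    "SELECT name FROM instructor "
    "WHERE dept_name = 'Finance' AND salary < 80000"
)

print(covered)      # reported as a COVERING INDEX search
print(not_covered)  # uses the same index, but must also visit the table
```

Selecting `name` instead of `ID` is enough to make the query uncovered, because the index entries no longer contain every attribute the query needs.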



# Index Size

- the smaller the index, the more likely it is to remain buffered in main memory and the lower the cost of accessing that index
- techniques to reduce the size of an index
    - *shorten the primary key*
        - instead of making a VARCHAR the primary key, create a shorter fixed-length **surrogate key** by adding an auto-increment ID attribute to department (replacing the VARCHAR key with an integer)
        - if index extension is used, this optimization also benefits all secondary indexes on department
    - *index only a prefix of a column*
        - instead of indexing the VARCHAR attribute in full, index only the first n characters
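
Prefix indexing trades index size against how well the index keys distinguish rows. The sketch below is a back-of-the-envelope calculation in plain Python, not tied to any DBMS: for a given prefix length n it reports how many distinct keys remain and how many characters the keys occupy in total. The department names and the helper `prefix_stats` are assumptions for illustration.

```python
def prefix_stats(values, n):
    """Return (distinct_prefixes, total_key_chars) when only the first
    n characters of each value are stored in the index."""
    prefixes = [value[:n] for value in values]
    return len(set(prefixes)), sum(len(p) for p in prefixes)


# Hypothetical department names.
dept_names = [
    "Biology", "Biochemistry", "Comp. Sci.", "Comp. Eng.",
    "Finance", "History", "Physics", "Physiology",
]

full_chars = sum(len(name) for name in dept_names)
print("full keys:", len(set(dept_names)), "distinct,", full_chars, "chars")

for n in (3, 5, 7):
    distinct, chars = prefix_stats(dept_names, n)
    print("prefix", n, ":", distinct, "distinct,", chars, "chars")
```

A short prefix keeps the index small but maps several values to the same key, so more candidate rows must be checked against the table. A reasonable choice is the shortest prefix whose number of distinct keys stays close to that of the full column.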
        