# Lab 10: Indices & Optimisation II

### 2. Execution Plans

In [1]:
%load_ext sql
%config SqlMagic.displaylimit = 30
%sql postgresql+psycopg://bank:bank@postgres/bank_index

3. Consider two queries: one to obtain the accounts with a balance equal to €1000, and another to obtain the maximum balance.

4. Run the queries and note the time it takes the system to execute each command.

In [2]:
%%sql

EXPLAIN (ANALYZE, BUFFERS)
    SELECT account_number FROM account WHERE balance = 1000;

QUERY PLAN
Seq Scan on account (cost=0.00..1931.00 rows=20 width=10) (actual time=0.201..12.735 rows=22 loops=1)
Filter: (balance = '1000'::numeric)
Rows Removed by Filter: 99978
Buffers: shared hit=681
Planning:
Buffers: shared hit=33
Planning Time: 0.116 ms
Execution Time: 12.746 ms


Note the time it takes the system to execute this command.

In [3]:
%%sql

EXPLAIN (ANALYZE, BUFFERS)
    SELECT MAX(balance) FROM account;

QUERY PLAN
Aggregate (cost=1931.00..1931.01 rows=1 width=32) (actual time=12.302..12.303 rows=1 loops=1)
Buffers: shared hit=681
-> Seq Scan on account (cost=0.00..1681.00 rows=100000 width=4) (actual time=0.003..4.475 rows=100000 loops=1)
Buffers: shared hit=681
Planning:
Buffers: shared hit=19
Planning Time: 0.109 ms
Execution Time: 12.338 ms


Note the time it takes the system to execute this command.

5. Create an index for the balance column with the command:

In [4]:
%%sql

CREATE INDEX balance_idx ON account(balance);

What kind of index is it? Is this index primary or secondary? Why?
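
One way to check what was actually created, sketched here under the assumption that the standard `pg_indexes` catalog view is readable by the `bank` user. The `indexdef` column reproduces the full `CREATE INDEX` statement, including the access method that PostgreSQL picked when none was specified.

```sql
-- List the indexes defined on the account table.
-- indexdef shows the access method after USING
-- (btree is PostgreSQL's default when none is given).
SELECT indexname, indexdef
FROM pg_indexes
WHERE tablename = 'account';
```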

6. Repeat step 4 and note the time. For both queries, how do you explain the possible time difference?

In [5]:
%%sql

EXPLAIN (ANALYZE, BUFFERS)
    SELECT account_number FROM account WHERE balance = 1000;

QUERY PLAN
Bitmap Heap Scan on account (cost=4.57..74.54 rows=20 width=10) (actual time=0.078..0.103 rows=22 loops=1)
Recheck Cond: (balance = '1000'::numeric)
Heap Blocks: exact=21
Buffers: shared hit=21 read=3
-> Bitmap Index Scan on balance_idx (cost=0.00..4.57 rows=20 width=0) (actual time=0.072..0.073 rows=22 loops=1)
Index Cond: (balance = '1000'::numeric)
Buffers: shared read=3
Planning:
Buffers: shared hit=19 read=1
Planning Time: 0.236 ms


In [6]:
%%sql

EXPLAIN (ANALYZE, BUFFERS)
    SELECT MAX(balance) FROM account;

QUERY PLAN
Result (cost=0.45..0.46 rows=1 width=32) (actual time=0.090..0.091 rows=1 loops=1)
Buffers: shared hit=2 read=2
InitPlan 1 (returns $0)
-> Limit (cost=0.42..0.45 rows=1 width=4) (actual time=0.088..0.088 rows=1 loops=1)
Buffers: shared hit=2 read=2
-> Index Only Scan Backward using balance_idx on account (cost=0.42..2862.42 rows=100000 width=4) (actual time=0.087..0.087 rows=1 loops=1)
Index Cond: (balance IS NOT NULL)
Heap Fetches: 0
Buffers: shared hit=2 read=2
Planning Time: 0.086 ms


7. Delete the index created in step 5:

In [7]:
%%sql

DROP INDEX balance_idx;

8. Create a HASH index for the balance column with the command:

In [8]:
%%sql

CREATE INDEX balance_idx ON account USING HASH(balance);

9. Repeat step 4 and note the time. How do you explain the possible time difference?

In [9]:
%%sql

EXPLAIN (ANALYZE, BUFFERS)
    SELECT account_number FROM account WHERE balance = 1000;

QUERY PLAN
Bitmap Heap Scan on account (cost=4.16..74.12 rows=20 width=10) (actual time=0.014..0.037 rows=22 loops=1)
Recheck Cond: (balance = '1000'::numeric)
Heap Blocks: exact=21
Buffers: shared hit=23
-> Bitmap Index Scan on balance_idx (cost=0.00..4.15 rows=20 width=0) (actual time=0.009..0.009 rows=22 loops=1)
Index Cond: (balance = '1000'::numeric)
Buffers: shared hit=2
Planning:
Buffers: shared hit=17
Planning Time: 0.084 ms


In [10]:
%%sql

EXPLAIN (ANALYZE, BUFFERS)
    SELECT MAX(balance) FROM account;

QUERY PLAN
Aggregate (cost=1931.00..1931.01 rows=1 width=32) (actual time=13.094..13.095 rows=1 loops=1)
Buffers: shared hit=681
-> Seq Scan on account (cost=0.00..1681.00 rows=100000 width=4) (actual time=0.006..4.550 rows=100000 loops=1)
Buffers: shared hit=681
Planning Time: 0.072 ms
Execution Time: 13.115 ms
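
The plans above show the hash index serving the equality query but not `MAX(balance)`. PostgreSQL hash indexes support only the `=` operator, so they cannot provide ordering or range access. A minimal sketch for confirming this with a range predicate, assuming the hash version of `balance_idx` still exists; the bounds 1000 and 2000 are arbitrary illustration values.

```sql
-- With only a hash index on balance, a range predicate cannot use it,
-- so the plan is expected to fall back to a Seq Scan on account.
EXPLAIN (ANALYZE, BUFFERS)
    SELECT account_number FROM account
    WHERE balance BETWEEN 1000 AND 2000;
```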


10. Delete the index created in step 8:

In [11]:
%%sql

DROP INDEX balance_idx;

### 3. Query Optimisation

Given the following table:

In [None]:
%%sql

DROP TABLE IF EXISTS employee;
    
CREATE TABLE employee (
  eid INTEGER PRIMARY KEY, 
  ename VARCHAR(40) NOT NULL,
  address VARCHAR(255) NOT NULL, 
  salary NUMERIC(12,4) NOT NULL, 
  bdate DATE NOT NULL
);

Which indices would you create to make each of the following queries more efficient, assuming that each of them is run frequently?
_Tip_: Write down the SQL query first, then analyse which indices would be most advantageous.
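
Because the table is created empty, the plans for these queries are only informative once it holds data. The following is a hedged sketch for filling it with synthetic rows. The row count, the name and address patterns, the salary range, and the date range are all arbitrary assumptions and not part of the lab.

```sql
-- Insert 100000 synthetic employees (every value here is made up).
INSERT INTO employee (eid, ename, address, salary, bdate)
SELECT g,
       'employee_' || g::text,
       'address_' || g::text,
       round((1000 + random() * 5000)::numeric, 2),
       DATE '1960-01-01' + (random() * 15000)::integer
FROM generate_series(1, 100000) AS g;

-- Refresh planner statistics so row estimates reflect the new data.
ANALYZE employee;
```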

a) What are the identifier, name, and address of employees born within a given range of dates?

In [None]:
%%sql

EXPLAIN (ANALYZE, BUFFERS)
    SELECT eid, ename, address, age(bdate) FROM employee
    WHERE bdate BETWEEN '1988-01-01' AND '1993-12-31';

In [None]:
%%sql

CREATE INDEX bdate_idx ON employee USING BTREE(bdate);

In [None]:
%%sql

EXPLAIN (ANALYZE, BUFFERS)
    SELECT eid, ename, address, age(bdate) FROM employee
    WHERE bdate BETWEEN '1988-01-01' AND '1993-12-31';

In [None]:
%%sql

DROP INDEX bdate_idx;

b) What are the identifier and address of employees with a given name?
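
A possible approach, sketched by analogy with (a): the predicate is an equality on `ename`, so either a hash or a B-tree index on that column is a candidate. The index name `ename_idx` and the literal `'Alice'` are hypothetical.

```sql
-- Equality lookup on ename; a hash index supports only "=",
-- which is all this query needs.
CREATE INDEX ename_idx ON employee USING HASH(ename);

EXPLAIN (ANALYZE, BUFFERS)
    SELECT eid, address FROM employee
    WHERE ename = 'Alice';

-- Clean up so later experiments start from the same state.
DROP INDEX ename_idx;
```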

c) What is the maximum salary for employees?
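
A possible approach, based on what step 6 of the previous section showed for `MAX(balance)`: a B-tree index keeps its keys ordered, so the maximum can be read from one end of the index instead of scanning the table. The index name `salary_idx` is hypothetical.

```sql
-- An ordered (B-tree) index lets MAX be answered from the end of the index.
CREATE INDEX salary_idx ON employee USING BTREE(salary);

EXPLAIN (ANALYZE, BUFFERS)
    SELECT MAX(salary) FROM employee;

-- Clean up so later experiments start from the same state.
DROP INDEX salary_idx;
```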

d) What is the average salary of employees by age?
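
One possible formulation, offered as a sketch rather than a reference answer: age in whole years is derived from `age(bdate)` and used as the grouping key. Since every row contributes to the result, the benefit of an index is less clear-cut here. A composite B-tree index on `(bdate, salary)` is one candidate, because it contains both columns the query reads and may allow an index-only scan, although the planner may still prefer a sequential scan. The index name `bdate_salary_idx` and the column aliases are hypothetical.

```sql
-- Composite index covering both columns used by the query.
CREATE INDEX bdate_salary_idx ON employee USING BTREE(bdate, salary);

EXPLAIN (ANALYZE, BUFFERS)
    SELECT EXTRACT(YEAR FROM age(bdate)) AS age_years,
           AVG(salary) AS avg_salary
    FROM employee
    GROUP BY EXTRACT(YEAR FROM age(bdate));

-- Clean up so later experiments start from the same state.
DROP INDEX bdate_salary_idx;
```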