# Query performance
#### Execution plan, role of index, query optimizer
#### 'with', 'windows' and 'full text index'
#### Misc and new solutions to 5.4 and 5.5

In [1]:
!docker container ls -a

CONTAINER ID        IMAGE               COMMAND                  CREATED             STATUS              PORTS                               NAMES
1df11d80e18a        mysql               "docker-entrypoint.s…"   7 hours ago         Up 7 hours          33060/tcp, 0.0.0.0:3366->3306/tcp   inject_mysql
2cb3e46e87ce        mysql               "docker-entrypoint.s…"   45 hours ago        Up 45 hours         0.0.0.0:3306->3306/tcp, 33060/tcp   my_mysql


In [2]:
%%bash
docker run \
--rm \
--name my_mysql \
-v $(pwd)/mysql_databasefiles:/var/lib/mysql \
-v $(pwd)/mysql_databasefiles/xmlimport/xmlimport.cnf:/etc/mysql/conf.d/xmlimport.cnf \
-p 3306:3306 \
-e MYSQL_ROOT_PASSWORD=deterentysker!42snapsnap \
-d \
mysql
echo "MySQLRunning"

2cb3e46e87cec115ffe1bcdef5a21463fa4848cb6fadef21e8c834b167c7b4b1
MySQLRunning


In [2]:
import sys
import mysql.connector

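def rootconnect():
    """Open a root connection to the coffeeflow database; returns None on failure."""
    try:
        pw = 'deterentysker!42snapsnap'
        conn = mysql.connector.connect(host='localhost', database='coffeeflow', user='root', password=pw)
        conn.autocommit = True
        return conn
    except Exception as ex:
        print(str(ex), file=sys.stderr)
        return None


conn = rootconnect()

def sqlQuery(sqlString):
    """Run a statement and return all result rows (None if it failed)."""
    global conn
    cursor = None  # keeps the finally block safe if no cursor was created
    try:
        if conn is None or not conn.is_connected():
            conn = rootconnect()
        cursor = conn.cursor()
        cursor.execute(sqlString)
        res = cursor.fetchall()
        return res
    except Exception as ex:
        print(str(ex), file=sys.stderr)
    finally:
        if cursor is not None:
            cursor.close()

def sqlDo(sqlString):
    """Run a statement for its side effects and return any warnings."""
    global conn
    cursor = None  # keeps the finally block safe if no cursor was created
    try:
        if conn is None or not conn.is_connected():
            conn = rootconnect()
        cursor = conn.cursor()
        cursor.execute(sqlString)
        res = cursor.fetchwarnings()
        return res
    except Exception as ex:
        print(str(ex), file=sys.stderr)
    finally:
        if cursor is not None:
            cursor.close()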

"Done"    

'Done'

# 'WITH' to avoid sub-queries

### Number of orders by city of office

```mysql
with bigtable as (
  select offices.city, orders.orderNumber
  from offices, orders, customers, employees
  where orders.customerNumber = customers.customerNumber and
    customers.salesRepEmployeeNumber = employees.employeeNumber and
    employees.officeCode = offices.officeCode
)
select city, count(orderNumber) as total
from bigtable
group by city
order by total
```


### You can have multiple variable names
```mysql
WITH
  cte1 AS (SELECT a, b FROM table1),
  cte2 AS (SELECT c, d FROM table2)
SELECT b, d FROM cte1 JOIN cte2
WHERE cte1.a = cte2.c;
```

More info on [WITH Syntax (Common Table Expressions)](https://dev.mysql.com/doc/refman/8.0/en/with.html)
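
To see that the `WITH` form and the subquery form are the same query written two ways, here is a minimal sketch. It assumes an in-memory SQLite database (Python's `sqlite3`) as a stand-in for MySQL, and the four tiny tables and their rows are made up for illustration; only the table and column names follow the classicmodels query above.

```python
import sqlite3

# In-memory SQLite as a stand-in for MySQL (assumption); data is made up.
db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE offices   (officeCode TEXT PRIMARY KEY, city TEXT);
CREATE TABLE employees (employeeNumber INTEGER PRIMARY KEY, officeCode TEXT);
CREATE TABLE customers (customerNumber INTEGER PRIMARY KEY, salesRepEmployeeNumber INTEGER);
CREATE TABLE orders    (orderNumber INTEGER PRIMARY KEY, customerNumber INTEGER);
INSERT INTO offices   VALUES ('1', 'Paris'), ('2', 'Tokyo');
INSERT INTO employees VALUES (10, '1'), (20, '2');
INSERT INTO customers VALUES (100, 10), (200, 20), (300, 10);
INSERT INTO orders    VALUES (1, 100), (2, 100), (3, 200), (4, 300);
""")

# The inner query: one row per order, tagged with the city of the sales office.
joined = """
  SELECT offices.city, orders.orderNumber
  FROM orders
  JOIN customers ON orders.customerNumber = customers.customerNumber
  JOIN employees ON customers.salesRepEmployeeNumber = employees.employeeNumber
  JOIN offices   ON employees.officeCode = offices.officeCode
"""

# 'WITH' style: name the inner query first, then use it like a table.
with_style = db.execute(f"""
WITH bigtable AS ({joined})
SELECT city, COUNT(orderNumber) AS total FROM bigtable
GROUP BY city ORDER BY total
""").fetchall()

# Subquery style: the same inner query nested in the FROM clause.
subquery_style = db.execute(f"""
SELECT city, COUNT(orderNumber) AS total FROM ({joined}) AS bigtable
GROUP BY city ORDER BY total
""").fetchall()

print(with_style)      # [('Tokyo', 1), ('Paris', 3)]
print(subquery_style)  # same rows
```

Both forms return the same rows; the `WITH` form just lets you read the query top to bottom.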

### Subquery style (thanks, David)
```mysql
  SELECT posts.Id, 
    JSON_OBJECT('userName', users.DisplayName, 
        'text', posts.Body, 'score', posts.Score, 
        'answers', JSON_ARRAYAGG(posts_answers.Answer)) AS answer_data
  FROM posts
  INNER JOIN users ON users.Id = posts.OwnerUserId
  INNER JOIN 
          (SELECT posts.ParentId, 
              JSON_OBJECT('userName', users.DisplayName, 
              'text', posts.Body, 'score', posts.Score) AS Answer
            FROM posts
            INNER JOIN users ON users.Id = posts.OwnerUserId
            WHERE posts.PostTypeId = 2) as posts_answers
      ON posts.Id = posts_answers.ParentId
  WHERE posts.PostTypeId = 1
  GROUP BY posts.Id;
```


## Your turn - Rewrite David's query using 'with'-style

In [5]:
# Exercise 4
# Make a materialized view that has JSON objects with questions and their answers, but no comments.
# Both the question and each of the answers must have the display name of the user, the text body, and the score.
sqlDo("""
DROP VIEW IF EXISTS view_questions_and_answers;
CREATE VIEW view_questions_and_answers AS
with 
    posts_with_usernames as 
        (select posts.Id, PostTypeId, Score, DisplayName, ParentId, Body
            from posts, users
            where posts.OwnerUserId = users.Id and posts.PostTypeId in (1,2)),
    questions as (select * from posts_with_usernames where PostTypeId = 1),
    answers as (select * from posts_with_usernames where PostTypeId = 2),
    q_and_a as (select
        questions.Id as id,
        questions.DisplayName as qName, 
        questions.Score as qScore,
        questions.Body as qBody,
        answers.DisplayName as aName,
        answers.Score as aScore,
        answers.Body as aBody
        from questions inner join answers on questions.id = answers.ParentId)
select id, 
    JSON_OBJECT('name', qName, 'score', qScore, 'body', qBody) as question,
    json_arrayagg(JSON_OBJECT('name', aName, 'score', aScore, 'body', aBody)) as answer
from q_and_a
group by id;
""")

In [11]:
sqlQuery("select * from view_questions_and_answers limit 1")
"Ugly when printed"

'Ugly when printed'

# Query performance & explain

Consider this query:
```mysql
SELECT Id, Reputation, DisplayName, UpVotes FROM users where UpVotes>50;
```


### Open it in MySQL Workbench

Run it and select "Execution Plan".

![](images/ExecutionPlan.png)

```mysql
SELECT Id, Reputation, DisplayName, UpVotes FROM users where UpVotes>50;
```

![](images/SimplePlan.png)
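
The same kind of plan can be inspected from code. The sketch below is not MySQL: it assumes SQLite (Python's `sqlite3`) as a stand-in, uses SQLite's `EXPLAIN QUERY PLAN`, and fills a made-up `users` table with generated rows. The index name `idx_users_upvotes` is hypothetical. It illustrates the role of an index: the plan changes from a full scan to an index search once an index on `UpVotes` exists.

```python
import sqlite3

# In-memory SQLite as a stand-in for MySQL (assumption); rows are generated.
db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE users (
    Id INTEGER PRIMARY KEY, Reputation INTEGER, DisplayName TEXT, UpVotes INTEGER)""")
db.executemany(
    "INSERT INTO users VALUES (?, ?, ?, ?)",
    [(i, i * 10, f"user{i}", i % 100) for i in range(1, 1001)],
)

query = "SELECT Id, Reputation, DisplayName, UpVotes FROM users WHERE UpVotes > 50"

def plan(sql):
    # The fourth column of each EXPLAIN QUERY PLAN row is the readable detail.
    return [row[3] for row in db.execute("EXPLAIN QUERY PLAN " + sql)]

plan_without_index = plan(query)   # expected to be a full table scan

# Hypothetical index name, chosen for this sketch.
db.execute("CREATE INDEX idx_users_upvotes ON users (UpVotes)")
plan_with_index = plan(query)      # expected to search via the new index

print(plan_without_index)
print(plan_with_index)
```

The exact wording of the plan text differs between SQLite versions, and MySQL's `EXPLAIN` output looks different again, but the before/after contrast is the same idea as the Workbench diagrams.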

## A larger query
From last time...

```mysql
select
    customers.customerNumber,
    customers.customerName,
	SUM(quantityOrdered * priceEach) AS totalPrice,
	SUM(payments.amount) AS totalPaid
from customers
inner join orders on orders.customerNumber = customers.customerNumber
inner join orderdetails on orders.orderNumber = orderdetails.orderNumber
inner join payments on payments.customerNumber = customers.customerNumber
group by customers.customerNumber
```

# Your turn

### Try to change the order of the joins, and see how that changes the plan
### Try to change the order of the selected columns, and see how that changes the plan
### Try to change the 'on' clause by adding 1 to each side `... +1 = 1+...`

### What did we learn?

### Try to change the order of the joins, and see how that changes the plan
### Try to change the order of the selected columns, and see how that changes the plan
### Try to change the 'on' clause by adding 1 to each side `... +1 = 1+...`
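
One way to check the `+1` experiment outside Workbench is sketched below. It assumes SQLite (Python's `sqlite3`) as a stand-in for MySQL, an empty made-up `customers` table, and a hypothetical index name. The point it illustrates is general: once the indexed column is wrapped in an expression, the optimizer can no longer look the value up in the index.

```python
import sqlite3

# In-memory SQLite as a stand-in for MySQL (assumption).
db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE customers (customerNumber INTEGER, customerName TEXT);
CREATE INDEX idx_customers_number ON customers (customerNumber);  -- hypothetical name
""")

def plan(sql):
    # Join the readable detail column of every EXPLAIN QUERY PLAN row.
    return " | ".join(row[3] for row in db.execute("EXPLAIN QUERY PLAN " + sql))

# Plain comparison on the indexed column: the index can be used.
plain = plan("SELECT customerName FROM customers WHERE customerNumber = 103")

# Logically the same condition, but the column is hidden inside an expression.
wrapped = plan("SELECT customerName FROM customers WHERE customerNumber + 1 = 1 + 103")

print(plain)
print(wrapped)
```

In this sketch the first plan mentions the index and the second falls back to scanning the table. Compare with what MySQL Workbench shows for the join version.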

# Back to Exercise 5.4 - Materialized view - two approaches

In [49]:
# Question 4
sqlDo("""
drop procedure if exists updateQuestionAndAnswer;
create procedure updateQuestionAndAnswer()
BEGIN
    drop table if exists materialized_view_questions_and_answers;
    CREATE TABLE materialized_view_questions_and_answers
    AS SELECT * FROM view_questions_and_answers;
END;
call updateQuestionAndAnswer();
""")

In [13]:
# Another implementation
sqlDo("""
drop procedure if exists updateQuestionAndAnswer2;
create procedure updateQuestionAndAnswer2()
BEGIN
    REPLACE INTO materialized_view_questions_and_answers 
    SELECT * FROM view_questions_and_answers;
END;
""")

# Which is faster?
Try both in MySQL Workbench.
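
Speed is not the only difference between the two procedures. In MySQL, `REPLACE` only replaces rows when the table has a primary key or unique index; without one it behaves like a plain `INSERT`. A table made with `CREATE TABLE ... AS SELECT` has no such key. The sketch below shows the effect, assuming SQLite (Python's `sqlite3`) as a stand-in, where `REPLACE INTO` follows the same rule. The table names and rows are made up.

```python
import sqlite3

# In-memory SQLite as a stand-in for MySQL (assumption); data is made up.
db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE source (id INTEGER, body TEXT);
INSERT INTO source VALUES (1, 'q1'), (2, 'q2');

-- Copy without any key, as CREATE TABLE ... AS SELECT produces.
CREATE TABLE mv_no_key AS SELECT * FROM source;

-- Copy with a primary key on id.
CREATE TABLE mv_with_key (id INTEGER PRIMARY KEY, body TEXT);
INSERT INTO mv_with_key SELECT * FROM source;
""")

def refresh(table):
    # Same shape as the body of updateQuestionAndAnswer2.
    db.execute(f"REPLACE INTO {table} SELECT * FROM source")
    return db.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]

rows_no_key = refresh("mv_no_key")      # rows are appended: 4
rows_with_key = refresh("mv_with_key")  # rows are replaced: 2

print(rows_no_key, rows_with_key)
```

So the `REPLACE` approach needs a key on the materialized table to avoid duplicates, and even then it does not delete rows that have disappeared from the view.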

## Exercise 5.5 json_table

> Using the materialized view from exercise 4, create a stored procedure with one parameter keyword, which returns all posts where the keyword appears at least once, and where at least two comments mention the keyword as well.

In [50]:
# Question 5
sqlDo("""
drop procedure if exists findPostsAndCommentsByKey;

create procedure findPostsAndCommentsByKey(in keyword varchar(100))
begin
declare key_rx varchar(100);
set key_rx = concat('.{0,5}',keyword,'.{0,5}');
select id,
       count(rowid) as count,
       regexp_substr(any_value(question), key_rx)  as Question,
       group_concat(regexp_substr(answer, key_rx) separator ' / ')  as Answer
from materialized_view_questions_and_answers,
     json_table(answer,'$[*]'
       columns(  rowid for ordinality , ans json path '$.body')) as answers
where ans regexp key_rx and (question->'$.body' regexp key_rx)
group by id
having count>=2
order by count desc, id asc;
end
""") 

In [63]:
sqlQuery("""
call findPostsAndCommentsByKey('machine')
""")[0:4]

[(2508,
  7,
  'esso machine that',
  'p>My machine work / p>My machine work / p>My machine work / p>My machine work / p>My machine work / p>My machine work / p>My machine work'),
 (3408,
  4,
  'esso machine</a>.',
  'this machine and  / this machine and  / this machine and  / this machine and '),
 (2766,
  3,
  'esso machines a w',
  'esso machines.</p / esso machines.</p / esso machines.</p'),
 (3412,
  3,
  ' new machine).  G',
  ' the machine and  /  the machine and  /  the machine and ')]

# Functional dependency
### Notice `any_value(question)` in:
```mysql
select id,
       count(rowid) as count,
       regexp_substr(any_value(question), key_rx)  as Question,
       group_concat(regexp_substr(answer, key_rx) separator ' / ')  as Answer
from materialized_view_questions_and_answers,
     json_table(answer,'$[*]'
       columns(  rowid for ordinality , ans json path '$.body')) as answers
where ans regexp key_rx and (question->'$.body' regexp key_rx)
group by id
having count>=2
order by count desc, id asc;
```

# Removing it gives an error when executed
## Why?
## What can be done to solve it?


# Your turn - fix it

# Windowing functions

## Particularly useful for producing data for analysis

```mysql
SELECT 
	users.DisplayName as Name, 
    posts.Title as Title,
    count(*) OVER (PARTITION BY posts.OwnerUserId) as totalPosts
FROM posts inner join users on posts.OwnerUserId = users.Id
where posts.PostTypeId=1;
```

(See the result in MySQL Workbench.)
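
The difference from `GROUP BY` is that a window function keeps every row and adds the aggregate next to it. A minimal sketch, assuming SQLite 3.25 or newer (Python's `sqlite3`) as a stand-in for MySQL, with three made-up posts:

```python
import sqlite3

# In-memory SQLite as a stand-in for MySQL (assumption); data is made up.
db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE posts (Id INTEGER PRIMARY KEY, OwnerUserId INTEGER, Title TEXT);
INSERT INTO posts VALUES
  (1, 7, 'Grind size?'),
  (2, 7, 'Water temperature?'),
  (3, 9, 'Best beans?');
""")

# Window function: one output row per post, each with its owner's total.
windowed = db.execute("""
SELECT Title, COUNT(*) OVER (PARTITION BY OwnerUserId) AS totalPosts
FROM posts
ORDER BY Id
""").fetchall()

# GROUP BY: one output row per owner, the individual titles are gone.
grouped = db.execute("""
SELECT OwnerUserId, COUNT(*) AS totalPosts
FROM posts
GROUP BY OwnerUserId
ORDER BY OwnerUserId
""").fetchall()

print(windowed)  # [('Grind size?', 2), ('Water temperature?', 2), ('Best beans?', 1)]
print(grouped)   # [(7, 2), (9, 1)]
```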

# Larger query
```mysql
SELECT 
    users.DisplayName as Name, 
    posts.Title as Title,
    year(posts.CreationDate) as Year,
    month(posts.CreationDate) as Month,
    count(*) OVER (PARTITION BY year(posts.CreationDate) ) as postInYear,
    count(*) OVER (PARTITION BY month(posts.CreationDate) ) as postInMonth,
    count(*) OVER (PARTITION BY users.Id ) as postByUser
FROM posts inner join users on posts.OwnerUserId = users.Id
where posts.PostTypeId=1
order by posts.CreationDate;
```

# Running aggregation

```mysql
SELECT
	users.DisplayName as Name, 
    posts.Title as Title,
    posts.Score,
    year(posts.CreationDate) as Year,
    monthname(posts.CreationDate) as Month,
    avg(Score) OVER (PARTITION BY users.DisplayName order by posts.CreationDate) as avgScoreByNow
FROM posts inner join users on posts.OwnerUserId = users.Id
where posts.PostTypeId=1
order by posts.CreationDate;
```
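
Adding `ORDER BY` inside `OVER (...)` is what turns the aggregate into a running one: each row sees only the rows of its partition up to and including itself. A small sketch with made-up data, assuming SQLite 3.25 or newer (Python's `sqlite3`) as a stand-in for MySQL and a flattened `posts` table that already carries the user name:

```python
import sqlite3

# In-memory SQLite as a stand-in for MySQL (assumption); data is made up.
db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE posts (Name TEXT, CreationDate TEXT, Score INTEGER);
INSERT INTO posts VALUES
  ('ann', '2020-01-01', 4),
  ('ann', '2020-02-01', 2),
  ('ann', '2020-03-01', 6),
  ('bob', '2020-01-15', 10);
""")

# Running average per user, in order of creation date.
running = db.execute("""
SELECT Name, CreationDate, Score,
       AVG(Score) OVER (PARTITION BY Name ORDER BY CreationDate) AS avgScoreByNow
FROM posts
ORDER BY Name, CreationDate
""").fetchall()

for row in running:
    print(row)
# ('ann', '2020-01-01', 4, 4.0)
# ('ann', '2020-02-01', 2, 3.0)
# ('ann', '2020-03-01', 6, 4.0)
# ('bob', '2020-01-15', 10, 10.0)
```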



# Activity 
```mysql
SELECT
	users.DisplayName as Name, 
    posts.Title as Title,
    posts.Score,
    year(posts.CreationDate) as Year,
    monthname(posts.CreationDate) as Month,
    datediff( posts.CreationDate, first_value( posts.CreationDate) 
		OVER (PARTITION by users.DisplayName order by posts.CreationDate)) as DaysSinceFirst
FROM posts inner join users on posts.OwnerUserId = users.Id
where posts.PostTypeId=1
order by users.DisplayName, posts.CreationDate;
```

Notice that a window function can be used inside a larger expression in the select list, here as an argument to `datediff`.

# Days between posts

```mysql
SELECT
	users.DisplayName as Name, 
    posts.Title as Title,
    posts.Score,
    date(posts.CreationDate) as Date,
    COALESCE(datediff( posts.CreationDate, lag( posts.CreationDate, 1) 
        OVER (PARTITION by users.DisplayName 
              ORDER BY posts.CreationDate
              )),0) as DaysSincePrevious
FROM posts inner join users on posts.OwnerUserId = users.Id
where posts.PostTypeId=1
order by users.DisplayName, posts.CreationDate;
```
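
`first_value` and `lag` answer two different questions: days since the user's first post versus days since the user's previous post. The sketch below puts them side by side on made-up data. It assumes SQLite 3.25 or newer (Python's `sqlite3`) as a stand-in for MySQL; SQLite has no `datediff`, so the difference of two `julianday` values is used instead.

```python
import sqlite3

# In-memory SQLite as a stand-in for MySQL (assumption); data is made up.
db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE posts (Name TEXT, CreationDate TEXT);
INSERT INTO posts VALUES
  ('ann', '2020-01-01'),
  ('ann', '2020-01-11'),
  ('ann', '2020-01-31');
""")

gaps = db.execute("""
SELECT CreationDate,
       -- distance to the first post in the partition
       CAST(julianday(CreationDate)
            - julianday(FIRST_VALUE(CreationDate) OVER w) AS INTEGER) AS DaysSinceFirst,
       -- distance to the previous post; LAG is NULL on the first row, hence COALESCE
       COALESCE(CAST(julianday(CreationDate)
            - julianday(LAG(CreationDate, 1) OVER w) AS INTEGER), 0) AS DaysSincePrevious
FROM posts
WINDOW w AS (PARTITION BY Name ORDER BY CreationDate)
ORDER BY CreationDate
""").fetchall()

for row in gaps:
    print(row)
# ('2020-01-01', 0, 0)
# ('2020-01-11', 10, 10)
# ('2020-01-31', 30, 20)
```

The named window (`WINDOW w AS ...`) is also available in MySQL 8.0 and avoids repeating the same `PARTITION BY ... ORDER BY` clause.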

# Your turn

### 1. Compute a running sum of payments per customer. Use the classicmodels database. Only include Norwegian customers

### 2. Compute running sum of orders for Norwegian customers

### 3. Compute running balance for Norwegian customers