# SQL Queries and Calculations
### Working with a database in SQL

## Database Queries and Calculations

### Introduction

We want you to use the data in the provided database to determine which of our clients is the most profitable, and to make recommendations to each of our clients that can increase our bottom line.

That is, we want to identify client spend, the most profitable times of day, and the most profitable advert types. With this, we can make analytical assessments and data-driven recommendations that help our clients increase their bottom line.




### Learning Objectives

In this section we will introduce you to: 
- Basic SQL queries
- Aggregations
- Sorting and filtering
- Analytical thinking  

### Background

TV advertising is a highly effective way to spread a brand and build awareness of a product, and even to generate direct responses through some kind of CTA (call to action), e.g. _"SMS your name to 34445 now for a quote."_


<div> <img src="big_sale.png" width="800"> </div>

Well... that's _advertising_ in general, I suppose. TV followed radio, which outpaced print in achieving widespread engagement, and even in the face of digital and social advertising, TV advertising is still relevant. So how do companies choose how to advertise on TV?

### Flow of this notebook

We are going to import the database we need, discuss the domain knowledge required to tackle the problem, and then introduce the SQL data types that matter here, before taking an analytical approach and diving into our database.

#### From importing, to analytics...

<div> <img src="flow.png" width="1000"> </div>


#### Case study objective

Our objective is to investigate and report back on our most profitable clients, and to make recommendations to our other clients based on what is most profitable.

### Basic Queries

SQL interacts with databases! Suppose we have a table called `students`, and we want to retrieve the top 5 performers from the year 2021, based on their `final_marks`. Can we string together the commands below to do this?

<div> <img src="Basic_queries.jpeg" width="1000"> </div>
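
If you'd like to check your answer, here is one way those commands could fit together, sketched with Python's built-in `sqlite3` module rather than the `%%sql` magic. The `students` table, its columns (`student_name`, `year`, `final_marks`) and all of its rows are hypothetical, made up purely so the query has something to run against.

```python
import sqlite3

# In-memory database with a hypothetical `students` table (made-up rows).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (student_name TEXT, year INTEGER, final_marks REAL)")
conn.executemany(
    "INSERT INTO students VALUES (?, ?, ?)",
    [
        ("Ava", 2021, 91), ("Ben", 2021, 78), ("Cal", 2021, 85),
        ("Dee", 2021, 66), ("Eli", 2021, 88), ("Fay", 2021, 95),
        ("Gus", 2020, 99),  # a different year, so the WHERE clause drops it
    ],
)

# SELECT the columns, FROM the table, WHERE filters, ORDER BY sorts, LIMIT trims.
top_five = conn.execute(
    """
    SELECT student_name, final_marks
    FROM students
    WHERE year = 2021
    ORDER BY final_marks DESC
    LIMIT 5
    """
).fetchall()

for name, marks in top_five:
    print(name, marks)
```

Note the order of operations: `WHERE` removes rows from other years before `ORDER BY` sorts what is left, and `LIMIT` then keeps only the first 5 sorted rows.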

### Legibility

<div> <img src="Legibility.jpeg" width="1000"> </div>

## Section 1: Imports

#### Imports

```python
%load_ext sql
```


```sql
%%sql
sqlite:///magenta_media.db
```

Let's have a look at the tables in our database:

```sql
%%sql

SELECT *
FROM sqlite_master
WHERE type = 'table'
```

| type | name | tbl_name | rootpage | sql |
| :- | :- | :- | :- | :- |
| table | test_table | test_table | 5 | `CREATE TABLE test_table ( "index" BIGINT, "0" TEXT )` |
| table | ad_details_test | ad_details_test | 7 | `CREATE TABLE ad_details_test ( ad_id TEXT, creative_id TEXT, channel TEXT, programme TEXT, duration BIGINT, cost FLOAT, date DATE, time TIME )` |
| table | ad_details | ad_details | 2 | `CREATE TABLE "ad_details" ( "ad_id" VARCHAR(100), "creative_id" VARCHAR(100), "date" DATE, "programme" VARCHAR(100), "channel" VARCHAR(100), "time" TIME, "duration" INTEGER, "cost" REAL, PRIMARY KEY("ad_id"), FOREIGN KEY("creative_id") REFERENCES "creative" ("creative_id") )` |
| table | client | client | 3 | `CREATE TABLE "client" ( "client_id" VARCHAR(100), "client_name" VARCHAR(100), "client_start_date" DATETIME, "client_terms" VARCHAR(100), PRIMARY KEY("client_id") )` |
| table | creative | creative | 9 | `CREATE TABLE "creative" ( "creative_id" VARCHAR(100), "creative_name" VARCHAR(100), "client_id" VARCHAR(100), "creative_duration" INTEGER, PRIMARY KEY("creative_id"), FOREIGN KEY("client_id") REFERENCES "client" ("client_id") )` |


## Section 2: Domain knowledge

We need reliable data, and a lot of it, in order to support this process. The database is critical during `booking`, to ensure communication is perfect across all parties. In this case, we can imagine being in the _Reporting_ stage of the media booking cycle, where we want to identify relevant findings in our data.

### With that in mind, do we understand each of these tables?

<div> <img src="Magenta_ERD.png" width="500"> </div>



Let's look at the `ad_details` table:

```sql
%%sql

SELECT *
FROM ad_details
LIMIT 5
```

| ad_id | creative_id | date | programme | channel | time | duration | cost |
| :- | :- | :- | :- | :- | :- | :- | :- |
| MAG10001 | HOM/602/030 | 2022-05-01 | CAPTAIN HOLT | SUPER-E | 17:00:00.000000 | 30 | 2000.0 |
| MAG10002 | HOM/602/030 | 2022-05-01 | DOUG JUDY | SUPER-E | 18:00:00.000000 | 30 | 2000.0 |
| MAG10003 | HOM/602/030 | 2022-05-10 | SANTIAGO | SUPER-E | 14:00:00.000000 | 30 | 2000.0 |
| MAG10004 | HOM/602/030 | 2022-05-09 | SANTIAGO | SUPER-E | 14:30:00.000000 | 30 | 2000.0 |
| MAG10005 | HOM/602/030 | 2022-05-11 | STORIES OF RACHEL | SUPER-V | 14:00:00.000000 | 30 | 2000.0 |


Let's look at the `client` table:

```sql
%%sql

SELECT *
FROM client
```

| client_id | client_name | client_start_date | client_terms |
| :- | :- | :- | :- |
| CL_001 | Homelander | 1994-03-16 | Package applicable discount not active. |
| CL_002 | Queen Maeve | 1986-03-26 | Package applicable discount not active. |
| CL_003 | Starlight | 1996-05-12 | Package applicable discount not active. |


Let's look at the `creative` table:

```sql
%%sql

SELECT *
FROM creative
```

| creative_id | creative_name | client_id | creative_duration |
| :- | :- | :- | :- |
| HOM/602/030 | The Name of the Game | CL_001 | 30 |
| MAE/500/020 | Cherry | CL_002 | 30 |
| STA/101/030 | Get Some | CL_003 | 30 |
| STA/102/010 | The Female of the Species | CL_003 | 30 |


## Section 3: SQL Data Types

#### SQL has 3 kinds of data that we will be dealing with
- Numeric
  - Integer (`INTEGER`, `BIGINT`)
  - Decimal and floating point (e.g. `NUMERIC(10,2)`, `REAL`)
- Text
  - Variable character (`VARCHAR(100)`)
  - Char (`CHAR(50)`)
- Datetime (in this SQLite database, stored as ISO-8601 text such as `'2022-05-01'`)
  - `DATETIME`
  - `DATE`
  - `TIME`

#### Operations
- Numeric
  - Mathematical operations (addition, multiplication, etc.)
  - Summary statistics (frequently using `GROUP BY` statements)
- Text
  - Similarity / containing a substring (`LIKE '%text%'`)
  - Concatenation (SQLite uses `||`; many other dialects use `CONCAT`)
  - Case (`lower` / `upper`)
  - Substring (`substr` in SQLite; `left` / `right` in dialects such as MySQL, PostgreSQL or SQL Server)
- Datetime
  - Range comparisons (`BETWEEN`, `<`, `>`, `=`)
  - Substring (`substr` in SQLite; `left` / `right` in other dialects)
  - Datetime properties (`date`, `time`, `strftime('%Y', col)`, `strftime('%H:%M', col)`, etc.)
  - Manipulations, e.g. `SELECT datetime('now','-1 day','localtime');`
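
To see a few of these operations side by side, here is a minimal sketch using Python's built-in `sqlite3` module. It runs against literal values only (no tables), so the strings are illustrative values borrowed from the tables above rather than query results from `magenta_media.db`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

row = conn.execute(
    """
    SELECT
        'SUPER' || '-' || 'E'                AS joined,       -- concatenation with ||
        upper('captain holt')                AS upper_case,
        lower('SANTIAGO')                    AS lower_case,
        substr('HOM/602/030', 1, 3)          AS first_three,  -- start at position 1, take 3
        substr('HOM/602/030', -3)            AS last_three,   -- negative start counts from the end
        strftime('%Y', '2022-05-12')         AS year_part,
        strftime('%H:%M', '17:00:00.000000') AS hour_minute,
        date('2022-05-12', '-1 day')         AS day_before,
        '2022-05-12' BETWEEN '2022-05-01' AND '2022-05-31' AS in_may  -- 1 means true
    """
).fetchone()
print(row)
# ('SUPER-E', 'CAPTAIN HOLT', 'santiago', 'HOM', '030', '2022', '17:00', '2022-05-11', 1)
```

`substr` counts positions from 1, and the date functions accept the same `YYYY-MM-DD` and `HH:MM:SS` text formats that appear in our tables.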

---

**EXAMPLES**

Let's work through a few example problems.

**1A: From the `ad_details` table, can we determine the number of distinct ads on air?**

```sql
%%sql

SELECT COUNT(DISTINCT creative_id)
FROM ad_details
```

| COUNT(DISTINCT creative_id) |
| :- |
| 4 |


The `DISTINCT` keyword finds the *unique* entries. Without it, `COUNT(creative_id)` would count every row that has a `creative_id` in the `ad_details` table (which comes to 23!), so each ad would be counted once per airing.
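
A quick way to see the difference is to run both counts on a tiny table. This sketch uses Python's `sqlite3` module with a cut-down, hypothetical `ad_details` table (only two columns and five made-up airings), not the real data:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ad_details (ad_id TEXT PRIMARY KEY, creative_id TEXT)")
# Hypothetical rows: three airings of one creative, two of another.
conn.executemany(
    "INSERT INTO ad_details VALUES (?, ?)",
    [("A1", "HOM/602/030"), ("A2", "HOM/602/030"), ("A3", "HOM/602/030"),
     ("A4", "MAE/500/020"), ("A5", "MAE/500/020")],
)

all_rows, distinct_ads = conn.execute(
    "SELECT COUNT(creative_id), COUNT(DISTINCT creative_id) FROM ad_details"
).fetchone()
print(all_rows, distinct_ads)  # 5 2
```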

---

**1B: From the `ad_details` table, can we determine the number of distinct ads on air, on or after 12th of May?**


```sql
%%sql

SELECT COUNT(DISTINCT creative_id)
FROM ad_details
WHERE date >= '2022-05-12'
```

| COUNT(DISTINCT creative_id) |
| :- |
| 2 |


Again, note how we make use of the `DISTINCT` keyword to help us here. Now that we've introduced a condition, we include a `WHERE` clause and compare our date with a comparison operator (`>=`). The `date` column is declared as `DATE`, but SQLite has no dedicated date storage class, so the values are actually stored as text in ISO-8601 format (`'YYYY-MM-DD'`). Because that format sorts in calendar order, comparing against the quoted string `'2022-05-12'` works. Try taking out the quotation marks and see what happens when you run the query. Why do you think this happens?
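
One way to check how SQLite is storing these values is the built-in `typeof()` function. The sketch below builds a small, hypothetical version of `ad_details` (only `ad_id` and `date`, with made-up rows) in an in-memory database using Python's `sqlite3` module:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Declared as DATE, but SQLite has no separate date storage class.
conn.execute("CREATE TABLE ad_details (ad_id TEXT, date DATE)")
conn.executemany(
    "INSERT INTO ad_details VALUES (?, ?)",
    [("A1", "2022-05-01"), ("A2", "2022-05-12"), ("A3", "2022-05-20")],
)

storage = conn.execute("SELECT DISTINCT typeof(date) FROM ad_details").fetchall()
print(storage)  # [('text',)]

# ISO-8601 strings compare character by character, which matches calendar order.
on_or_after = conn.execute(
    "SELECT ad_id FROM ad_details WHERE date >= '2022-05-12' ORDER BY ad_id"
).fetchall()
print(on_or_after)  # [('A2',), ('A3',)]
```

This text comparison only gives calendar order when every date uses the same zero-padded `YYYY-MM-DD` format.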

---

**2: If we assume May's spending is repeated every month (i.e. × 12), what would the total annual revenue be? Remember, all of the data we have is from the month of May.**

```sql
%%sql

SELECT sum(cost) AS "May Revenue", sum(cost) * 12 AS "Annual Revenue"
FROM ad_details
```

| May Revenue | Annual Revenue |
| :- | :- |
| 34666.0 | 415992.0 |


Note how we've used *aliases* here, for the column names. This is to rename the columns in our output to make them more intuitive and understandable. If we didn't have these aliases, we'd simply have column headings corresponding to the calculation we've used in our `SELECT` statement. While these might still make sense to us, to someone else trying to read our query, it might not be so intuitive.
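
To see exactly which headings a reader would get, we can inspect the column names a query returns. This is a sketch with Python's `sqlite3` module, where `cursor.description` holds one entry per output column; the two cost values are made up:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ad_details (ad_id TEXT, cost REAL)")
conn.executemany("INSERT INTO ad_details VALUES (?, ?)", [("A1", 2000.0), ("A2", 1500.0)])

cur = conn.execute(
    'SELECT sum(cost) AS "May Revenue", sum(cost) * 12 AS "Annual Revenue" FROM ad_details'
)
headers = [col[0] for col in cur.description]  # the column names the reader will see
values = cur.fetchone()
print(headers, values)  # ['May Revenue', 'Annual Revenue'] (3500.0, 42000.0)
```

Double quotes are the standard SQL way to quote an identifier such as an alias containing a space; single quotes normally denote string values.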

---

**3: Which clients joined prior to 1994?**

```sql
%%sql

SELECT client_name, client_start_date, strftime('%Y', client_start_date) as start_year
FROM client
WHERE strftime('%Y', client_start_date) < '1994'
```

| client_name | client_start_date | start_year |
| :- | :- | :- |
| Queen Maeve | 1986-03-26 | 1986 |


This is a nice example of using `strftime` (which you may have come across during the Python sprint) to extract part of the date stored in our table. Could we write our `WHERE` clause differently, without having to extract the year from our date?

---

**4: What is the creative_id of the ad, which mentions 'Species'?**

```sql
%%sql

SELECT creative_id, creative_name
FROM creative
WHERE creative_name LIKE '%species%'
```

| creative_id | creative_name |
| :- | :- |
| STA/102/010 | The Female of the Species |


The `LIKE` operator searches text for patterns/substrings that we specify. In this case, we've looked for the word 'species' and used *wildcards* with it. The two wildcards used with the `LIKE` operator are the percent sign (`%`) and the underscore (`_`). The `%` wildcard represents zero, one or multiple characters, while `_` represents exactly one character. The way we've used it above tells SQL that we are looking for any occurrence of 'species', surrounded by any number of characters. Notice that it still matched 'Species' with a capital S: in SQLite, `LIKE` is case-insensitive for ASCII characters by default, whereas some other databases (e.g. PostgreSQL) treat `LIKE` as case-sensitive. The table below shows a few examples of how wildcards could be used:

[Source](https://www.w3schools.com/sql/sql_like.asp)

| LIKE Operator | Description |
| :- | :- | 
| WHERE CustomerName LIKE 'a%' | Finds any values that start with "a" | 
| WHERE CustomerName LIKE '%a' | Finds any values that end with "a" |
| WHERE CustomerName LIKE '%or%' | Finds any values that have "or" in any position |
| WHERE CustomerName LIKE '_r%' | Finds any values that have "r" in the second position |
| WHERE CustomerName LIKE 'a_%' | Finds any values that start with "a" and are at least 2 characters in length |
| WHERE CustomerName LIKE 'a__%' | Finds any values that start with "a" and are at least 3 characters in length |
| WHERE CustomerName LIKE 'a%o' | Finds any values that start with "a" and end with "o" |
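
Here is a minimal sketch for experimenting with these patterns, using Python's `sqlite3` module and the four rows of the `creative` table shown earlier (loaded into an in-memory database, so `magenta_media.db` itself is not touched). `names_like` is a hypothetical helper defined just for this example:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE creative (creative_id TEXT, creative_name TEXT)")
conn.executemany(
    "INSERT INTO creative VALUES (?, ?)",
    [("HOM/602/030", "The Name of the Game"), ("MAE/500/020", "Cherry"),
     ("STA/101/030", "Get Some"), ("STA/102/010", "The Female of the Species")],
)

def names_like(pattern):
    """Return creative names matching a LIKE pattern, sorted alphabetically."""
    rows = conn.execute(
        "SELECT creative_name FROM creative WHERE creative_name LIKE ? ORDER BY creative_name",
        (pattern,),
    ).fetchall()
    return [name for (name,) in rows]

print(names_like("%species%"))  # lower-case pattern still matches 'Species'
print(names_like("The%"))       # starts with 'The'
print(names_like("_et%"))       # 'et' in the second and third positions
print(names_like("C_____"))     # 'C' followed by exactly five characters
```

Passing the pattern as a `?` parameter keeps the wildcards inside the value rather than pasting text into the SQL string.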

---

## Section 4: Aggregations, Filtering, Sorting

Objective: We want to determine which ad types are the most profitable, which channels are the most profitable, and who our "prized" client is! Why do these things matter? Because they are under our control, so we can influence them to make the company more profitable.

The command below (`PRAGMA table_info`) gives us details about whichever table we specify (here, the `ad_details` table). We can see each column's name, its data type, whether it is declared `NOT NULL` (`notnull`), its default value (`dflt_value`), and whether it is part of the primary key (`pk`).

```sql
%%sql

pragma table_info(ad_details)
```

| cid | name | type | notnull | dflt_value | pk |
| :- | :- | :- | :- | :- | :- |
| 0 | ad_id | VARCHAR(100) | 0 | | 1 |
| 1 | creative_id | VARCHAR(100) | 0 | | 0 |
| 2 | date | DATE | 0 | | 0 |
| 3 | programme | VARCHAR(100) | 0 | | 0 |
| 4 | channel | VARCHAR(100) | 0 | | 0 |
| 5 | time | TIME | 0 | | 0 |
| 6 | duration | INTEGER | 0 | | 0 |
| 7 | cost | REAL | 0 | | 0 |

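The same information can be read programmatically. This sketch recreates the `ad_details` column declarations from `magenta_media.db` in an in-memory database with Python's `sqlite3` module (leaving out the foreign key, since the `creative` table isn't created here) and picks out the column types and primary key:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Same column declarations as ad_details in magenta_media.db (FOREIGN KEY omitted).
conn.execute(
    """
    CREATE TABLE ad_details (
        ad_id VARCHAR(100), creative_id VARCHAR(100), date DATE,
        programme VARCHAR(100), channel VARCHAR(100), time TIME,
        duration INTEGER, cost REAL,
        PRIMARY KEY (ad_id)
    )
    """
)

# Each row is (cid, name, type, notnull, dflt_value, pk).
info = conn.execute("PRAGMA table_info(ad_details)").fetchall()
column_types = {name: col_type for _, name, col_type, _, _, _ in info}
primary_key = [name for _, name, _, _, _, pk in info if pk]
print(column_types["cost"], primary_key)  # REAL ['ad_id']
```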

#### Now that we have a database, we need to identify business-relevant information

- Understand RELATIONSHIPS in the database
- Find a specific value / text
- GROUP BY clients
- ORDER BY highest costs
- Make numeric CALCULATIONS 
- Use ALIASES to create intuitive outputs


Example 1: Find the average cost to clients of airing each ad

```sql
%%sql
-- Write your query here
```

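One possible approach, sketched with Python's `sqlite3` module on a few hypothetical rows (not the real `magenta_media.db` data). It assumes "each ad" means each `creative_id`, so we group by it and average the cost of its airings:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ad_details (ad_id TEXT, creative_id TEXT, cost REAL)")
# Hypothetical airings, for illustration only.
conn.executemany(
    "INSERT INTO ad_details VALUES (?, ?, ?)",
    [("A1", "HOM/602/030", 2000.0), ("A2", "HOM/602/030", 1200.0),
     ("A3", "MAE/500/020", 1500.0)],
)

avg_costs = conn.execute(
    """
    SELECT creative_id, ROUND(AVG(cost), 2) AS avg_cost_per_airing
    FROM ad_details
    GROUP BY creative_id
    ORDER BY avg_cost_per_airing DESC
    """
).fetchall()
print(avg_costs)  # [('HOM/602/030', 1600.0), ('MAE/500/020', 1500.0)]
```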

Example 2: Find the ads which bring in the highest revenue (i.e. the total cost per ad is our revenue)

```sql
%%sql
-- Write your query here
```

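A sketch of one approach, again with Python's `sqlite3` module and made-up airing costs. It goes a step further than the earlier examples by joining `ad_details` to `creative` through the `creative_id` foreign key, so the output shows creative names instead of IDs:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE creative (creative_id TEXT PRIMARY KEY, creative_name TEXT, client_id TEXT)")
conn.execute("CREATE TABLE ad_details (ad_id TEXT PRIMARY KEY, creative_id TEXT, cost REAL)")
conn.executemany(
    "INSERT INTO creative VALUES (?, ?, ?)",
    [("HOM/602/030", "The Name of the Game", "CL_001"), ("MAE/500/020", "Cherry", "CL_002")],
)
# Hypothetical airing costs, for illustration only.
conn.executemany(
    "INSERT INTO ad_details VALUES (?, ?, ?)",
    [("A1", "HOM/602/030", 2000.0), ("A2", "HOM/602/030", 2000.0),
     ("A3", "MAE/500/020", 3000.0)],
)

# Total cost per creative is our revenue from that ad; highest first.
revenue_per_ad = conn.execute(
    """
    SELECT c.creative_name, SUM(a.cost) AS total_revenue
    FROM ad_details AS a
    JOIN creative AS c ON c.creative_id = a.creative_id
    GROUP BY c.creative_id, c.creative_name
    ORDER BY total_revenue DESC
    """
).fetchall()
print(revenue_per_ad)  # [('The Name of the Game', 4000.0), ('Cherry', 3000.0)]
```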

Example 3: Find the channels which are the most expensive (in ZAR, using $1 = R18.85, rounded to cents, i.e. 2 decimal places)

```sql
%%sql
-- Write your query here
```

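One possible sketch, using Python's `sqlite3` module with hypothetical rows. "Most expensive" could mean the average cost per airing or the total spend on a channel, so this version computes both in ZAR (multiplying by the 18.85 rate given in the exercise and rounding to 2 decimal places) and sorts by the average:

```python
import sqlite3

USD_TO_ZAR = 18.85  # exchange rate given in the exercise

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ad_details (ad_id TEXT, channel TEXT, cost REAL)")
# Hypothetical airings, for illustration only.
conn.executemany(
    "INSERT INTO ad_details VALUES (?, ?, ?)",
    [("A1", "SUPER-E", 2000.0), ("A2", "SUPER-E", 1000.0), ("A3", "SUPER-V", 2500.0)],
)

channel_costs = conn.execute(
    """
    SELECT channel,
           ROUND(AVG(cost) * ?, 2) AS avg_cost_zar,
           ROUND(SUM(cost) * ?, 2) AS total_cost_zar
    FROM ad_details
    GROUP BY channel
    ORDER BY avg_cost_zar DESC
    """,
    (USD_TO_ZAR, USD_TO_ZAR),
).fetchall()
print(channel_costs)  # [('SUPER-V', 47125.0, 47125.0), ('SUPER-E', 28275.0, 56550.0)]
```

Note how the ranking depends on the definition: SUPER-V costs more per airing here, while SUPER-E has the larger total spend.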

Example 5: (This one is a little more difficult: you haven't necessarily come across `CASE WHEN` statements just yet, but you can give it a shot!) Use `CASE WHEN` statements to determine which adverts have had a `total_operational_cost` under $20,000, where:

`total_operational_cost` = airtime cost + production cost of $10,000

```sql
%%sql
-- Write your query here
```

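A possible sketch with Python's `sqlite3` module. It assumes (our assumption, not stated in the exercise) that the 10,000 production cost is paid once per creative, so each creative's total operational cost is the sum of its airtime costs plus 10,000, compared against the 20,000 threshold. The rows are hypothetical:

```python
import sqlite3

PRODUCTION_COST = 10_000  # assumed one-off production cost per creative

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ad_details (ad_id TEXT, creative_id TEXT, cost REAL)")
# Hypothetical airings, for illustration only.
conn.executemany(
    "INSERT INTO ad_details VALUES (?, ?, ?)",
    [("A1", "HOM/602/030", 6000.0), ("A2", "HOM/602/030", 6000.0),
     ("A3", "MAE/500/020", 4000.0)],
)

flags = conn.execute(
    """
    SELECT creative_id,
           SUM(cost) + ? AS total_operational_cost,
           CASE
               WHEN SUM(cost) + ? < 20000 THEN 'under $20,000'
               ELSE '$20,000 or more'
           END AS cost_band
    FROM ad_details
    GROUP BY creative_id
    ORDER BY creative_id
    """,
    (PRODUCTION_COST, PRODUCTION_COST),
).fetchall()
print(flags)
# [('HOM/602/030', 22000.0, '$20,000 or more'), ('MAE/500/020', 14000.0, 'under $20,000')]
```

The `CASE` expression repeats `SUM(cost) + ?` rather than referring to the `total_operational_cost` alias, because a column alias is not reliably available to other expressions in the same `SELECT` list.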

## Section 5: Findings. Conclusions. Recommendations. 

Take your findings, and translate them into value.

- Most profitable ad?
- Most profitable channel?
- Should clients have more creatives?
- ...?

...What data would we want in order to properly advise our clients in their best interests?

...How can we incentivise clients? Think about a mock marketing strategy.