<img src = "https://images2.imgbox.com/60/09/VFwl5LOq_o.jpg" width="400">

# 3. Aggregate Functions
---

This chapter teaches you how to use aggregate functions to summarize data and gain useful insights. You'll also learn about arithmetic in SQL and how to use aliases to make your results more readable.

In [1]:
# %pip install ipython-sql

In [2]:
%load_ext sql

In [3]:
%sql sqlite:///data/database.db

'Connected: @data/database.db'

## Aggregate functions
---

Often, you will want to perform some calculation on the data in a database. SQL provides a few functions, called aggregate functions, to help you out with this.

For example,

`SELECT AVG(budget)
 FROM films`

gives you the average value from the **budget** column of the **films** table. Similarly, the `MAX()` function returns the highest budget:

`SELECT MAX(budget)
 FROM films`

The `SUM()` function returns the result of adding up the numeric values in a column:

`SELECT SUM(budget)
 FROM films`

You can probably guess what the `MIN()` function does! Now it's your turn to try out some SQL functions.
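
If you'd like to see all four functions side by side, here is a minimal sketch using Python's built-in `sqlite3` module. It does not use the course database: the tiny in-memory `films` table and its budgets are made up for illustration. Note that one budget is deliberately left as `NULL`, since aggregate functions skip `NULL` values.

```python
import sqlite3

# Hypothetical in-memory table; titles and budgets are invented for this sketch.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE films (title TEXT, budget INTEGER)")
conn.executemany(
    "INSERT INTO films VALUES (?, ?)",
    [("A", 100), ("B", 300), ("C", None), ("D", 200)],
)

# Each aggregate collapses the whole column into a single value.
# NULL budgets are ignored, so AVG divides by 3 here, not 4.
row = conn.execute(
    "SELECT AVG(budget), MAX(budget), SUM(budget), MIN(budget) FROM films"
).fetchone()
avg_budget, max_budget, sum_budget, min_budget = row

print(row)  # (200.0, 300, 600, 100)
```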

### Instructions
Use the `SUM()` function to get the total duration of all films.

In [4]:
%%sql

SELECT SUM( duration )
FROM   films 

 * sqlite:///data/database.db
Done.


SUM( duration )
534882


Get the average duration of all films.

In [5]:
%%sql

SELECT AVG( duration )
FROM   films 

 * sqlite:///data/database.db
Done.


AVG( duration )
107.94793138244198


Get the duration of the shortest film.

In [6]:
%%sql

SELECT MIN( duration )
FROM   films 

 * sqlite:///data/database.db
Done.


MIN( duration )
7


Get the duration of the longest film.

In [7]:
%%sql

SELECT MAX( duration )
FROM   films

 * sqlite:///data/database.db
Done.


MAX( duration )
334


## Aggregate functions practice
---

Aggregate functions are important to understand, so let's get some more practice!

Use the `SUM()` function to get the total amount grossed by all films.

In [8]:
%%sql

SELECT SUM( gross )
FROM   films 

 * sqlite:///data/database.db
Done.


SUM( gross )
202515840134


Get the average amount grossed by all films.

In [9]:
%%sql

SELECT AVG( gross )
FROM   films 

 * sqlite:///data/database.db
Done.


AVG( gross )
48705108.25733526


Get the amount grossed by the worst-performing film.

In [10]:
%%sql

SELECT MIN( gross )
FROM   films 

 * sqlite:///data/database.db
Done.


MIN( gross )
162


Get the amount grossed by the best-performing film.

In [11]:
%%sql

SELECT MAX( gross )
FROM   films 

 * sqlite:///data/database.db
Done.


MAX( gross )
936627416


## Combining aggregate functions with WHERE
---

Aggregate functions can be combined with the `WHERE` clause to gain further insights from your data.

For example, to get the total budget of movies made in the year 2010 or later:

`SELECT SUM(budget)
 FROM films
 WHERE release_year >= 2010`
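
To see how the filter changes the result, here is a small sketch with Python's built-in `sqlite3` module. The three-row `films` table is hypothetical and only meant to show that `WHERE` removes rows before the aggregate is computed.

```python
import sqlite3

# Hypothetical in-memory data; not the course database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE films (title TEXT, release_year INTEGER, budget INTEGER)"
)
conn.executemany(
    "INSERT INTO films VALUES (?, ?, ?)",
    [("Old", 1999, 50), ("Mid", 2010, 70), ("New", 2015, 30)],
)

# Without WHERE, every row contributes to the sum.
total_all = conn.execute("SELECT SUM(budget) FROM films").fetchone()[0]

# With WHERE, rows are filtered first and only the survivors are summed.
total_recent = conn.execute(
    "SELECT SUM(budget) FROM films WHERE release_year >= 2010"
).fetchone()[0]

print(total_all)     # 150
print(total_recent)  # 100
```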

Use the `SUM()` function to get the total amount grossed by all films made in the year 2000 or later.

In [12]:
%%sql

SELECT SUM( gross )
FROM   films
WHERE  release_year >= 2000 

 * sqlite:///data/database.db
Done.


SUM( gross )
150900926358


Get the average amount grossed by all films whose titles start with the letter 'A'.

In [13]:
%%sql

SELECT AVG( gross )
FROM   films
WHERE  title LIKE 'A%' 

 * sqlite:///data/database.db
Done.


AVG( gross )
47893236.42248062


Get the amount grossed by the worst-performing film in 1994.

In [14]:
%%sql

SELECT MIN( gross )
FROM   films
WHERE  release_year = 1994 

 * sqlite:///data/database.db
Done.


MIN( gross )
125169


Get the amount grossed by the best-performing film between 2000 and 2012, inclusive.

In [15]:
%%sql

SELECT MAX( gross )
FROM   films
WHERE  release_year BETWEEN 2000 AND 2012 

 * sqlite:///data/database.db
Done.


MAX( gross )
760505847


## A note on arithmetic
---

In addition to using aggregate functions, you can perform basic arithmetic with symbols like `+`, `-`, `*`, and `/`.

So, for example, this gives a result of 12:

`SELECT (4 * 3)`

However, the following gives a result of 1:

`SELECT (4 / 3)`

What's going on here?

In many SQL dialects, including the SQLite database used in this notebook and PostgreSQL, dividing an integer by an integer gives you an integer back: the fractional part is discarded. So be careful when dividing!

If you want more precision when dividing, you can add decimal places to your numbers. For example,

`SELECT (4.0 / 3.0) AS result`

gives you the result you would expect: approximately `1.333`.
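
You can check this behavior yourself with a short sketch using Python's built-in `sqlite3` module (the same SQLite engine this notebook connects to). The variable names are just for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Integer * integer stays an integer.
int_product = conn.execute("SELECT 4 * 3").fetchone()[0]

# Integer / integer is integer division in SQLite: the fraction is dropped.
int_quotient = conn.execute("SELECT 4 / 3").fetchone()[0]

# If both operands have a decimal point, you get a floating-point result.
real_quotient = conn.execute("SELECT 4.0 / 3.0").fetchone()[0]

# One decimal operand is enough to switch to floating-point division.
mixed_quotient = conn.execute("SELECT 4 / 3.0").fetchone()[0]

print(int_product)    # 12
print(int_quotient)   # 1
print(real_quotient)  # roughly 1.333
print(mixed_quotient) # roughly 1.333
```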

What is the result of `SELECT (10 / 3)`?

In [16]:
%%sql

SELECT (10 / 3)

 * sqlite:///data/database.db
Done.


(10 / 3)
3


## It's AS simple AS aliasing
---

You may have noticed in the first exercise of this chapter that the column name of your result was just the function call you wrote (for example, `SUM( duration )`). For example,

`SELECT MAX(budget)
 FROM films`

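gives you a result with one column whose name is derived from the function. In PostgreSQL the column is named **max**; in the SQLite database used in this notebook it is the expression text, `MAX(budget)`. But what if you use two functions like this?

`SELECT MAX(budget), MAX(duration)
 FROM films`

In PostgreSQL you'd then have two columns both named **max**, and even SQLite's expression-style names are awkward to read and to refer to. That isn't very useful!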

To avoid situations like this, SQL allows you to do something called aliasing. Aliasing simply means you assign a temporary name to something. To alias, you use the `AS` keyword, which you've already seen earlier in this course.

In the example above, we could use aliases to make the result clearer:

`SELECT MAX(budget) AS max_budget,
        MAX(duration) AS max_duration
 FROM films`

Aliases are helpful for making results more readable!
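
To see what aliasing does to the column names, here is a minimal sketch with Python's built-in `sqlite3` module. The two-row `films` table is hypothetical. The sketch reads the names from `cursor.description`; the exact names SQLite picks for unaliased expressions are not guaranteed, which is one more reason to alias.

```python
import sqlite3

# Hypothetical in-memory data; not the course database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE films (budget INTEGER, duration INTEGER)")
conn.executemany("INSERT INTO films VALUES (?, ?)", [(100, 90), (250, 120)])

# cursor.description holds one tuple per result column; item 0 is its name.
unaliased = conn.execute("SELECT MAX(budget), MAX(duration) FROM films")
unaliased_names = [col[0] for col in unaliased.description]

aliased = conn.execute(
    "SELECT MAX(budget) AS max_budget, MAX(duration) AS max_duration FROM films"
)
aliased_names = [col[0] for col in aliased.description]
aliased_row = aliased.fetchone()

print(unaliased_names)  # engine-chosen names; their exact form is not guaranteed
print(aliased_names)    # ['max_budget', 'max_duration']
print(aliased_row)      # (250, 120)
```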

### Instructions
Get the title and net profit (the amount a film grossed, minus its budget) for all films. Alias the net profit as `net_profit`.

In [17]:
%%sql

SELECT title,
       ( gross - budget ) AS net_profit
FROM   films
LIMIT  10

 * sqlite:///data/database.db
Done.


title,net_profit
Intolerance: Love's Struggle Throughout the Ages,
Over the Hill to the Poorhouse,2900000.0
The Big Parade,
Metropolis,-5973565.0
Pandora's Box,
The Broadway Melody,2429000.0
Hell's Angels,
A Farewell to Arms,
42nd Street,1861000.0
She Done Him Wrong,


Get the title and duration in hours for all films. The duration is in minutes, so you'll need to divide by 60.0 to get the duration in hours. Alias the duration in hours as `duration_hours`.

In [18]:
%%sql

SELECT title,
       ( duration / 60.0 ) AS duration_hours
FROM   films
LIMIT  10 

 * sqlite:///data/database.db
Done.


title,duration_hours
Intolerance: Love's Struggle Throughout the Ages,2.05
Over the Hill to the Poorhouse,1.8333333333333333
The Big Parade,2.5166666666666666
Metropolis,2.4166666666666665
Pandora's Box,1.8333333333333333
The Broadway Melody,1.6666666666666667
Hell's Angels,1.6
A Farewell to Arms,1.3166666666666669
42nd Street,1.4833333333333334
She Done Him Wrong,1.1


Get the average duration in hours for all films, aliased as `avg_duration_hours`.

In [19]:
%%sql

SELECT AVG( duration ) / 60.0 AS avg_duration_hours
FROM   films 

 * sqlite:///data/database.db
Done.


avg_duration_hours
1.7991321897073662


## Even more aliasing
---

Let's practice your newfound aliasing skills some more before moving on!

Recall: SQL assumes that if you divide an integer by an integer, you want to get an integer back.

This means that the following will erroneously result in **400.0**:

`SELECT 45 / 10 * 100.0`

This is because **45 / 10** evaluates to an integer (**4**), and not a decimal number like we would expect.

So when you're dividing, make sure at least one of your numbers has a decimal place:

`SELECT 45 * 100.0 / 10`

The above now gives the correct answer of **450.0** since the numerator (**45 * 100.0**) of the division is now a decimal!
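
Here is a quick sketch of both orderings, run against SQLite through Python's built-in `sqlite3` module. The variable names are only illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Evaluated left to right: 45 / 10 is integer division (4), then 4 * 100.0.
wrong_order = conn.execute("SELECT 45 / 10 * 100.0").fetchone()[0]

# Multiplying by 100.0 first makes the numerator a decimal before dividing.
right_order = conn.execute("SELECT 45 * 100.0 / 10").fetchone()[0]

print(wrong_order)  # 400.0
print(right_order)  # 450.0
```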

### Instructions
Get the percentage of `people` who are no longer alive. Alias the result as `percentage_dead`. Remember to use `100.0` and not `100`!

In [20]:
%%sql

SELECT COUNT( deathdate ) * 100.0 / COUNT(*) AS percentage_dead
FROM   people 

 * sqlite:///data/database.db
Done.


percentage_dead
9.372394902941526
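
The query above works because `COUNT(deathdate)` counts only rows where `deathdate` is not `NULL`, while `COUNT(*)` counts every row. A minimal sketch with Python's built-in `sqlite3` module, assuming (as in the exercise) that living people have a `NULL` deathdate; the three-row `people` table is made up for illustration:

```python
import sqlite3

# Hypothetical in-memory data; not the course database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT, deathdate TEXT)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?)",
    [("A", "1990-01-01"), ("B", None), ("C", None)],
)

# COUNT(column) skips NULLs; COUNT(*) counts all rows.
count_dead, count_all = conn.execute(
    "SELECT COUNT(deathdate), COUNT(*) FROM people"
).fetchone()

# Multiplying by 100.0 keeps the fraction; multiplying by 100 would truncate.
pct_float = conn.execute(
    "SELECT COUNT(deathdate) * 100.0 / COUNT(*) FROM people"
).fetchone()[0]
pct_int = conn.execute(
    "SELECT COUNT(deathdate) * 100 / COUNT(*) FROM people"
).fetchone()[0]

print(count_dead, count_all)  # 1 3
print(pct_float)              # roughly 33.33
print(pct_int)                # 33
```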


Get the number of years between the newest film and oldest film. Alias the result as `difference`.

In [21]:
%%sql

SELECT MAX( release_year ) - MIN( release_year ) AS difference
FROM   films 

 * sqlite:///data/database.db
Done.


difference
100


Get the number of decades the `films` table covers. Alias the result as `number_of_decades`. Enclose the numerator of your fraction in parentheses so that the subtraction is evaluated before the division.

In [22]:
%%sql

SELECT ( MAX( release_year ) - MIN( release_year ) ) / 10 AS number_of_decades
FROM   films 

 * sqlite:///data/database.db
Done.


number_of_decades
10
