# **Intermediate SQL**

# Course Description

SQL is the most popular language for turning raw data stored in a database into actionable insights. Using a database of films made around the world, this course covers:
✓ How to filter and compare data
✓ How to use aggregate functions to summarize data
✓ How to sort and group your data
✓ How to present your data cleanly using tools such as rounding and aliasing

Accompanied at every step with hands-on practice queries, this course teaches you everything you need to know to analyze data using your own SQL code today!


# **Chapter 1: Selecting Data**

In this first chapter, you’ll learn how to query a films database and select the data needed to answer questions about the movies and actors. You'll also understand how SQL code is executed and formatted.


# **Querying a database**

# Learning to COUNT()
You saw how to use COUNT() in the video. Do you remember what it returns?

Here is a query counting film_id. Select the answer below that correctly describes what the query will return.

SELECT COUNT(film_id) AS count_film_id
FROM reviews;
Run the query in the console to test your theory!

## Instructions

Possible answers


The number of unique films in the reviews table.

**The number of records containing a film_id.**

The total number of records in the reviews table.

The sum of the film_id field.

# Practice with COUNT()
As you've seen, COUNT(*) tells you how many records are in a table. However, if you want to count the number of non-missing values in a particular field, you can call COUNT() on just that field.
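
To see the difference on a small scale, here is a minimal sketch using Python's built-in sqlite3 module and a made-up three-row people table. The rows are hypothetical and not taken from the course database. One birthdate is missing, so the two counts differ.

```python
import sqlite3

# Hypothetical mini people table; the rows are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT, birthdate TEXT)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?)",
    [("Person A", "1970-01-01"), ("Person B", None), ("Person C", "1985-06-30")],
)

# COUNT(*) counts every record; COUNT(birthdate) skips the missing value.
count_records, count_birthdate = conn.execute(
    "SELECT COUNT(*), COUNT(birthdate) FROM people"
).fetchone()
print(count_records, count_birthdate)  # 3 2
```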

Let's get some practice with COUNT()! You can look at the data in the tables throughout these exercises by clicking on the table name in the console.

## Instructions 1/3

Count the total number of records in the people table, aliasing the result as count_records.

In [None]:
-- Count the number of records in the people table
SELECT COUNT(*) AS count_records
FROM people;

In [None]:
count_records
8397


## Instructions 2/3

Count the number of records with a birthdate in the people table, aliasing the result as count_birthdate.

In [None]:
-- Count the number of birthdates in the people table
SELECT COUNT(birthdate) AS count_birthdate
FROM people;

In [None]:
count_birthdate
6152


## Instructions 3/3

Count the records for languages and countries in the films table; alias as count_languages and count_countries.

In [None]:
-- Count the languages and countries represented in the films table
SELECT COUNT(language) AS count_languages, COUNT(country) AS count_countries
FROM films;

In [None]:
count_languages	count_countries
4957	4966


# SELECT DISTINCT
Often query results will include many duplicate values. You can use the DISTINCT keyword to select the unique values from a field.
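
As a rough illustration, the sketch below runs DISTINCT and COUNT(DISTINCT ...) against a tiny in-memory table through Python's sqlite3 module. The table contents are invented for the example. It shows that a missing country appears among the distinct values but is not included in the distinct count.

```python
import sqlite3

# Hypothetical mini films table; the rows are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE films (title TEXT, country TEXT)")
conn.executemany(
    "INSERT INTO films VALUES (?, ?)",
    [("Film A", "Italy"), ("Film B", "Italy"),
     ("Film C", "Indonesia"), ("Film D", None)],
)

# DISTINCT keeps one row per value, including one row for the missing value.
unique_countries = {
    row[0] for row in conn.execute("SELECT DISTINCT country FROM films")
}

# COUNT(DISTINCT ...) counts only the non-missing unique values.
count_distinct_countries = conn.execute(
    "SELECT COUNT(DISTINCT country) FROM films"
).fetchone()[0]

print(len(unique_countries))     # 3
print(count_distinct_countries)  # 2
```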

This might be useful if, for example, you're interested in knowing which languages are represented in the films table. See if you can find out what countries are represented in this table with the following exercises.

## Instructions 1/2

Return the unique countries represented in the films table using DISTINCT.

In [None]:
-- Return the unique countries from the films table
SELECT DISTINCT country
FROM films;


In [None]:
country
null
Soviet Union
Indonesia
Italy
.
.
.



## Instructions 2/2

Return the number of unique countries represented in the films table, aliased as count_distinct_countries.

In [None]:
-- Count the distinct countries from the films table
SELECT COUNT(DISTINCT country) AS count_distinct_countries
FROM films;

In [None]:
count_distinct_countries
64

# **Query execution**

# Debugging errors
Debugging is an essential skill for all coders, and it comes from making many mistakes and learning from them.

In this exercise, you'll be given some buggy code that you'll need to fix.

## Instructions 1/3

Debug and fix the SQL query provided.

In [None]:
-- Debug this code
SELECT certification
FROM films
LIMIT 5;

In [None]:
certification
Not Rated
null
Not Rated
Not Rated
Not Rated



## Instructions 2/3

Find the two errors in this code; the same error has been repeated twice.

In [None]:
-- Debug this code
SELECT film_id, imdb_score, num_votes
FROM reviews;

In [None]:
film_id	imdb_score	num_votes
3934	7.1	203461
3405	6.4	149998
478	3.2	8465
.
.
.


## Instructions 3/3

Find the two bugs in this final query.

In [None]:
-- Debug this code
SELECT COUNT(birthdate) AS count_birthdays
FROM people;

In [None]:
count_birthdays
6152

# **SQL style**
# Formatting
Readable code is highly valued in the coding community and professional settings. Without proper formatting, code and results can be difficult to interpret. You'll often be working with other people that need to understand your code or be able to explain your results, so having a solid formatting habit is essential.

In this exercise, you'll correct poorly written code to better adhere to SQL style standards.

## Instructions

Adjust the sample code so that it is in line with standard practices.


In [None]:
-- Rewrite this query
SELECT person_id, role
FROM roles
LIMIT 10;

In [None]:
person_id	role
1630	director
4843	actor
5050	actor
8175	actor
3000	director
4019	actor
5274	actor
7449	actor
1459	actor
3929	actor


# **Chapter 2: Filtering Records**
# Using WHERE with numbers
Filtering with WHERE allows you to analyze your data better. You may have a dataset that includes a range of different movies, and you need to do a case study on the most notable films with the biggest budgets. In this case, you'll want to filter your data to a specific budget range.
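
When filtering on a numeric range, the choice of comparison operator decides whether the boundary value itself is kept. The sketch below is one way to check this, assuming a made-up three-row reviews table in an in-memory SQLite database. The helper count_where is a hypothetical name introduced only for this example.

```python
import sqlite3

# Hypothetical mini reviews table; the rows are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE reviews (film_id INTEGER, num_votes INTEGER)")
conn.executemany(
    "INSERT INTO reviews VALUES (?, ?)",
    [(1, 99999), (2, 100000), (3, 250000)],
)

def count_where(condition):
    """Count the reviews rows that satisfy a hard-coded WHERE condition."""
    query = f"SELECT COUNT(*) FROM reviews WHERE {condition}"
    return conn.execute(query).fetchone()[0]

more_than = count_where("num_votes > 100000")   # excludes exactly 100,000
at_least = count_where("num_votes >= 100000")   # includes exactly 100,000
print(more_than, at_least)  # 1 2
```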

Now it's your turn to use the WHERE clause to filter numeric values!

## Instructions 1/3

Select the film_id and imdb_score from the reviews table and filter on scores higher than 7.0.

In [None]:
-- Select film_ids and imdb_score with an imdb_score over 7.0
SELECT film_id, imdb_score
FROM reviews
WHERE imdb_score > 7.0;


In [None]:
film_id	imdb_score
3934	7.1
74	7.6
1254	8

.
.
.


## Instructions 2/3

Select the film_id and facebook_likes of the first ten records with less than 1000 likes from the reviews table.


In [None]:
-- Select film_ids and facebook_likes for ten records with less than 1000 likes
SELECT film_id, facebook_likes
FROM reviews
WHERE facebook_likes < 1000
LIMIT 10;


In [None]:
film_id	facebook_likes
3405	0
478	491
74	930
740	0
2869	689
1181	0
2020	0
2312	912
1820	872
831	975


## Instructions 3/3

Count how many records have a num_votes of at least 100,000; use the alias films_over_100K_votes.

In [None]:
-- Count the records with at least 100,000 votes
SELECT COUNT(num_votes) AS films_over_100K_votes
FROM reviews
WHERE num_votes >= 100000;

In [None]:
films_over_100k_votes
1211

# Using WHERE with text
WHERE can also filter string values.

Imagine you are part of an organization that gives cinematography awards, and you have several international categories. Before you confirm an award for every language listed in your dataset, it may be worth seeing if there are enough films of a specific language to make it a fair competition. If there is only one movie or a significant skew, it may be worth considering a different way of giving international awards.

Let's try this out!

## Instructions

Select and count the language field using the alias count_spanish.
Apply a filter to select only Spanish from the language field.

In [None]:
-- Count the Spanish-language films
SELECT COUNT(language) AS count_spanish
FROM films
WHERE language = 'Spanish';

In [None]:
count_spanish
40

# **Multiple criteria**
# Using AND
The following exercises combine AND and OR with the WHERE clause. Using these operators together strengthens your queries and analyses of data.

You will apply these new skills now on the films database.

## Instructions 1/3

Select the title and release_year for all German-language films released before 2000.

In [None]:
-- Select the title and release_year for all German-language films released before 2000
SELECT *
FROM films
WHERE language = 'German' AND release_year < 2000;

In [None]:
id	title	release_year	country	duration	language	certification	gross	budget
4	Metropolis	1927	Germany	145	German	Not Rated	26435	6000000
5	Pandora's Box	1929	Germany	110	German	Not Rated	9950	null
124	The Torture Chamber of Dr. Sadism	1967	West Germany	80	German	M	null	null
287	Das Boot	1981	West Germany	293	German	R	11433134	14000000
1110	Run Lola Run	1998	Germany	81	German	R	7267324	3500000
1176	Aimee & Jaguar	1999	Germany	125	German	null


## Instructions 2/3

Update the query from the previous step to show German-language films released after 2000 rather than before.

In [None]:
-- Update the query to see all German-language films released after 2000
SELECT *
FROM films
WHERE release_year > 2000
	AND language = 'German';

In [None]:
id	title	release_year	country	duration	language	certification	gross	budget
1952	Good Bye Lenin!	2003	Germany	121	German	R	4063859	4800000
2130	Downfall	2004	Germany	178	German	R	5501940	13500000
2224	Summer Storm	2004	Germany	98	German	R	95016	2700000
2709	The Lives of Others	2006	Germany	137	German	R	11284657	2000000
3100	The Baader Meinhof Complex	2008	Germany	184	German	R	476270	20000000
3143	The Wave	2008	Germany	107	German	null	null	5000000
3220	Cargo	2009	Switzerland	112	German	null	null	4500000
3346	Soul Kitchen	2009	Germany	99	German	null	274385	4000000
3412	The White Ribbon	2009	Germany	144	German	R	2222647	12000000



# Using OR
This time you'll write a query to get the title and release_year of films released in 1990 or 1999, which were in English or Spanish and took in more than $2,000,000 gross.

It looks like a lot, but you can build the query up one step at a time to get comfortable with the underlying concept in each step. Let's go!
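
One detail worth checking as you combine conditions is that AND is evaluated before OR, so parentheses change the result. The following sketch, which assumes a made-up four-row films table in an in-memory SQLite database and a hypothetical helper named titles, compares the two forms.

```python
import sqlite3

# Hypothetical mini films table; the rows are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE films (title TEXT, release_year INTEGER, language TEXT)")
conn.executemany("INSERT INTO films VALUES (?, ?, ?)", [
    ("Film A", 1990, "English"),
    ("Film B", 1990, "German"),
    ("Film C", 1999, "Spanish"),
    ("Film D", 1999, "German"),
])

def titles(where_clause):
    """Return the titles matching a hard-coded WHERE clause, sorted by title."""
    query = f"SELECT title FROM films WHERE {where_clause} ORDER BY title"
    return [row[0] for row in conn.execute(query)]

# Without parentheses, the AND pair is evaluated first.
ungrouped = titles(
    "release_year = 1990 OR release_year = 1999 AND language = 'Spanish'"
)
# With parentheses, each OR group acts as a single condition.
grouped = titles(
    "(release_year = 1990 OR release_year = 1999) "
    "AND (language = 'English' OR language = 'Spanish')"
)
print(ungrouped)  # ['Film A', 'Film B', 'Film C']
print(grouped)    # ['Film A', 'Film C']
```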

## Instructions 1/3

Select the title and release_year for films released in 1990 or 1999 using only WHERE and OR.

In [None]:
-- Find the title and year of films from 1990 or 1999
SELECT title, release_year
FROM films
WHERE release_year = 1990 OR release_year = 1999;

In [None]:
title	release_year
Arachnophobia	1990
Back to the Future Part III	1990
Child's Play 2	1990
.
.
.


## Instructions 2/3

Filter the records to only include English or Spanish-language films.

In [None]:
SELECT title, release_year
FROM films
WHERE (release_year = 1990 OR release_year = 1999)
-- Add a filter to see only English or Spanish-language films
AND (language = 'Spanish' OR language = 'English');

In [None]:
title	release_year
Arachnophobia	1990
Back to the Future Part III	1990
Child's Play 2	1990
.
.
.



## Instructions 3/3

Finally, restrict the query to only return films worth more than $2,000,000 gross.

In [None]:
SELECT title, release_year
FROM films
WHERE (release_year = 1990 OR release_year = 1999)
	AND (language = 'English' OR language = 'Spanish')
-- Filter films with more than $2,000,000 gross
	AND gross > 2000000;

In [None]:
title	release_year
Arachnophobia	1990
Back to the Future Part III	1990
Child's Play 2	1990
.
.
.

# Using BETWEEN
Let's use BETWEEN with AND on the films database to get the title and release_year of all Spanish-language films released between 1990 and 2000 (inclusive) with budgets over $100 million.
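
To make "inclusive" concrete, here is a small sketch that compares BETWEEN with the equivalent pair of comparisons. It assumes a made-up five-row films table held in an in-memory SQLite database.

```python
import sqlite3

# Hypothetical mini films table; the rows are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE films (title TEXT, release_year INTEGER)")
conn.executemany("INSERT INTO films VALUES (?, ?)", [
    ("Film A", 1989), ("Film B", 1990), ("Film C", 1995),
    ("Film D", 2000), ("Film E", 2001),
])

# BETWEEN keeps both endpoints.
between_years = [row[0] for row in conn.execute(
    "SELECT release_year FROM films "
    "WHERE release_year BETWEEN 1990 AND 2000 ORDER BY release_year"
)]
# The same filter written with explicit comparisons.
explicit_years = [row[0] for row in conn.execute(
    "SELECT release_year FROM films "
    "WHERE release_year >= 1990 AND release_year <= 2000 ORDER BY release_year"
)]
print(between_years)  # [1990, 1995, 2000]
```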

We have broken the problem into smaller steps so that you can build the query as you go along!

## Instructions 1/4

Select the title and release_year of all films released between 1990 and 2000 (inclusive) using BETWEEN.

In [None]:
-- Select the title and release_year for films released between 1990 and 2000
SELECT title, release_year
FROM films
WHERE release_year BETWEEN 1990 AND 2000;

In [None]:
title	release_year
Arachnophobia	1990
Back to the Future Part III	1990
.
.
.


## Instructions 2/4

Build on your previous query to select only films with a budget over $100 million.

In [None]:
SELECT title, release_year
FROM films
WHERE release_year BETWEEN 1990 AND 2000
-- Narrow down your query to films with budgets > $100 million
	AND budget > 100000000;

In [None]:
title	release_year
Terminator 2: Judgment Day	1991
True Lies	1994
Waterworld	1995
.
.



## Instructions 3/4

Now, restrict the query to only return Spanish-language films.

In [None]:
SELECT title, release_year
FROM films
WHERE release_year BETWEEN 1990 AND 2000
	AND budget > 100000000
-- Restrict the query to only Spanish-language films
	AND language = 'Spanish';

In [None]:
title	release_year
Tango	1998



## Instructions 4/4

Finally, amend the query to include all Spanish-language or French-language films with the same criteria.

In [None]:
SELECT title, release_year
FROM films
WHERE release_year BETWEEN 1990 AND 2000
	AND budget > 100000000
-- Amend the query to include Spanish or French-language films
	AND (language = 'Spanish' OR language = 'French');

In [None]:
title	release_year
Les couloirs du temps: Les visiteurs II	1998
Tango	1998


# **Filtering text**

# LIKE and NOT LIKE
The LIKE and NOT LIKE operators can be used to find records that either match or do not match a specified pattern, respectively. They can be coupled with the wildcards % and _. The % will match zero or many characters, and _ will match a single character.
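
The sketch below tries the three patterns used in this section on six names that appear in the outputs in these notes, loaded into an in-memory SQLite table. One assumption to keep in mind: SQLite's LIKE ignores case for ASCII letters by default, which may differ from the course's database, so the sample avoids names where case would matter. The helper names_where is a hypothetical name for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT)")
conn.executemany("INSERT INTO people VALUES (?)", [
    ("B.J. Novak",), ("Babak Najafi",), ("Ara Celi",),
    ("Aramis Knight",), ("50 Cent",), ("Álex Angulo",),
])

def names_where(condition):
    """Return the set of names matching a hard-coded WHERE condition."""
    query = f"SELECT name FROM people WHERE {condition}"
    return {row[0] for row in conn.execute(query)}

starts_with_b = names_where("name LIKE 'B%'")        # % matches any run of characters
second_letter_r = names_where("name LIKE '_r%'")     # _ matches exactly one character
not_starting_a = names_where("name NOT LIKE 'A%'")   # 'Á' is not the same character as 'A'

print(sorted(starts_with_b))    # ['B.J. Novak', 'Babak Najafi']
print(sorted(second_letter_r))  # ['Ara Celi', 'Aramis Knight']
```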

This is useful when you want to filter text, but not to an exact word.

Do the following exercises to gain some practice with these keywords.

## Instructions 1/3

Select the names of all people whose names begin with 'B'.

In [None]:
-- Select the names that start with B
SELECT name
FROM people
WHERE name LIKE 'B%';

In [None]:
name
B.J. Novak
Babak Najafi
Babar Ahmed
Bahare Seddiqi
.
.



## Instructions 2/3


Select the names of people whose names have 'r' as the second letter.

In [None]:
SELECT name
FROM people
-- Select the names that have r as the second letter
WHERE name LIKE '_r%';

In [None]:
name
Ara Celi
Aramis Knight
Arben Bajraktaraj
Arcelia Ramírez
.
.
.


## Instructions 3/3

Select the names of people whose names don't start with 'A'.

In [None]:
SELECT name
FROM people
-- Select names that don't start with A
WHERE name NOT LIKE 'A%';

In [None]:
name
50 Cent
Álex Angulo
Álex de la Iglesia
Ángela Molina
.
.
.

# WHERE IN
You now know you can query multiple conditions using the IN operator and a set of parentheses. It is a valuable piece of code that helps us keep our queries clean and concise.
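
As a quick check that IN is shorthand for a chain of OR conditions, the sketch below runs both forms against a made-up four-row films table in an in-memory SQLite database. The helper titles is a hypothetical name for this example.

```python
import sqlite3

# Hypothetical mini films table; the rows are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE films (title TEXT, release_year INTEGER, duration INTEGER)")
conn.executemany("INSERT INTO films VALUES (?, ?, ?)", [
    ("Film A", 1990, 130), ("Film B", 1990, 100),
    ("Film C", 1995, 150), ("Film D", 2000, 121),
])

def titles(where_clause):
    """Return the titles matching a hard-coded WHERE clause, sorted by title."""
    query = f"SELECT title FROM films WHERE {where_clause} ORDER BY title"
    return [row[0] for row in conn.execute(query)]

with_in = titles("release_year IN (1990, 2000) AND duration > 120")
with_or = titles("(release_year = 1990 OR release_year = 2000) AND duration > 120")
print(with_in)  # ['Film A', 'Film D']
```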

Try using the IN operator yourself!

## Instructions 1/3

Select the title and release_year of all films released in 1990 or 2000 that were longer than two hours.

In [None]:
-- Find the title and release_year for all films over two hours in length released in 1990 or 2000
SELECT title, release_year
FROM films
WHERE release_year IN (1990, 2000) AND duration > 120;

In [None]:
title	release_year
Dances with Wolves	1990
Die Hard 2	1990
Ghost	1990
.
.
.


## Instructions 2/3

Select the title and language of all films in English, Spanish, or French using IN.

In [None]:
-- Find the title and language of all films in English, Spanish, and French
SELECT title, language
FROM films
WHERE language IN ('English', 'Spanish', 'French');

In [None]:
title	language
The Broadway Melody	English
Hell's Angels	English
A Farewell to Arms	English
....


## Instructions 3/3

Select the title, certification and language of all films certified NC-17 or R that are in English, Italian, or Greek.

In [None]:
-- Find the title, certification, and language of all films certified NC-17 or R that are in English, Italian, or Greek
SELECT title, certification, language
FROM films
WHERE certification IN ('NC-17', 'R') AND language IN ('English', 'Italian', 'Greek');

# Combining filtering and selecting
Time for a little challenge. So far, your SQL vocabulary from this course includes COUNT(), DISTINCT, LIMIT, WHERE, OR, AND, BETWEEN, LIKE, NOT LIKE, and IN. In this exercise, you will try to use some of these together. Writing more complex queries will be standard for you as you become a qualified SQL programmer.

As this query will be a little more complicated than what you've seen so far, we've included a bit of code to get you started. You will be using DISTINCT here too because, surprise, there are two movies named 'Hamlet' in this dataset!

Follow the instructions to find out what 90's films we have in our dataset that would be suitable for English-speaking teens.

## Instructions

Count the unique titles from the films database and use the alias provided.
Filter to include only movies with a release_year from 1990 to 1999, inclusive.
Add another filter narrowing your query down to English-language films.
Add a final filter to select only films with 'G', 'PG', 'PG-13' certifications.

In [None]:
-- Count the unique titles
SELECT COUNT(DISTINCT title) AS nineties_english_films_for_teens
FROM films
-- Filter release_year to between 1990 and 1999
WHERE release_year >= 1990 AND release_year <= 1999
-- Filter to English-language films
AND language = 'English'
-- Narrow it down to G, PG, and PG-13 certifications
AND certification IN ('G', 'PG', 'PG-13');

In [None]:
nineties_english_films_for_teens
310

# **NULL values**
# Practice with NULLs
Well done. Now that you know what NULL means and what it's used for, it's time for some more practice!

Let's explore the films table again to better understand what data you have.
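
Before you start, it may help to see why missing values are tested with IS NULL rather than with an equals sign. The sketch below loads two rows that appear in the outputs earlier in these notes (Metropolis with a budget, Pandora's Box without one) into an in-memory SQLite table. The comparison budget = NULL never evaluates to true, so it returns no rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE films (title TEXT, budget INTEGER)")
conn.executemany(
    "INSERT INTO films VALUES (?, ?)",
    [("Metropolis", 6000000), ("Pandora's Box", None)],
)

# IS NULL finds the record with the missing budget.
is_null = [r[0] for r in conn.execute(
    "SELECT title FROM films WHERE budget IS NULL")]
# A comparison with NULL is never true, so nothing is returned.
equals_null = [r[0] for r in conn.execute(
    "SELECT title FROM films WHERE budget = NULL")]
# COUNT() on a field counts only its non-missing values.
count_budget = conn.execute("SELECT COUNT(budget) FROM films").fetchone()[0]

print(is_null)       # ["Pandora's Box"]
print(equals_null)   # []
print(count_budget)  # 1
```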

## Instructions 1/2

Select the title of every film that doesn't have a budget associated with it and use the alias no_budget_info.

In [None]:
-- List all film titles with missing budgets
SELECT title AS no_budget_info
FROM films
WHERE budget IS NULL;

In [None]:
no_budget_info
Pandora's Box
The Prisoner of Zenda
.
.
.


## Instructions 2/2

Count the number of films with a language associated with them and use the alias count_language_known.

In [None]:
-- Count the number of films we have language data for
SELECT COUNT(language) AS count_language_known
FROM films
WHERE language IS NOT NULL;

In [None]:
count_language_known
4957

# **Chapter 3: Aggregate Functions**
SQL allows you to zoom in and out to better understand an entire dataset, its subsets, and its individual records. You'll learn to summarize data using aggregate functions and perform basic arithmetic calculations inside queries to gain insights into what makes a successful film.

# **Summarizing data**

# Practice with aggregate functions
Now let's try extracting summary information from a table using these new aggregate functions. Summarizing is helpful in real life when extracting top-line details from your dataset. Perhaps you'd like to know how old the oldest film in the films table is, what the most expensive film is, or how many films you have listed.
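
Here is a minimal sketch of the four aggregates on a tiny in-memory SQLite table. Three durations are taken from the outputs earlier in these notes, and the fourth row, with a missing duration, is hypothetical. It is there to show that the aggregates skip missing values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE films (title TEXT, duration INTEGER)")
conn.executemany("INSERT INTO films VALUES (?, ?)", [
    ("Metropolis", 145), ("Pandora's Box", 110),
    ("Run Lola Run", 81), ("Hypothetical Film", None),
])

# The missing duration is ignored, so AVG divides by 3 rather than 4.
total_duration, average_duration, shortest_film, longest_film = conn.execute(
    "SELECT SUM(duration), AVG(duration), MIN(duration), MAX(duration) FROM films"
).fetchone()
print(total_duration, average_duration, shortest_film, longest_film)
# 336 112.0 81 145
```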

Now it's your turn to get more insights about the films table!

## Instructions 1/4

Use the SUM() function to calculate the total duration of all films and alias with total_duration.

In [None]:
-- Query the sum of film durations
SELECT SUM(duration) AS total_duration
FROM films;

In [None]:
total_duration
534882


## Instructions 2/4

Calculate the average duration of all films and alias with average_duration.

In [None]:
-- Calculate the average duration of all films
SELECT AVG(duration) AS average_duration
FROM films;

In [None]:
average_duration
107.9479313824419778


## Instructions 3/4

Find the most recent release_year in the films table, aliasing as latest_year.

In [None]:
-- Find the latest release_year
SELECT MAX(release_year) AS latest_year
FROM films;

In [None]:
latest_year
2016


## Instructions 4/4

Find the duration of the shortest film and use the alias shortest_film.

In [None]:
-- Find the duration of the shortest film
SELECT MIN(duration) AS shortest_film
FROM films;

In [None]:
shortest_film
7

# **Summarizing subsets**
# Combining aggregate functions with WHERE
When combining aggregate functions with WHERE, you get a powerful tool that allows you to get more granular with your insights, for example, to get the total budget of movies made from the year 2010 onwards.

This combination is useful when you only want to summarize a subset of your data. In your film-industry role, as an example, you may like to summarize each certification category to compare how they each perform or if one certification has a higher average budget than another.

Let's see what insights you can gain about the financials in the dataset.

## Instructions 1/4

Use SUM() to calculate the total gross for all films made in the year 2000 or later, and use the alias total_gross.

In [None]:
-- Calculate the sum of gross from the year 2000 or later
SELECT SUM(gross) AS total_gross
FROM films
WHERE release_year >= 2000;

In [None]:
total_gross
150900926358


## Instructions 2/4

Calculate the average amount grossed by all films whose titles start with the letter 'A' and alias with avg_gross_A.

In [None]:
-- Calculate the average gross of films that start with A
SELECT AVG(gross) AS avg_gross_A
FROM films
WHERE title LIKE 'A%';

In [None]:
avg_gross_a
47893236.422480620155


## Instructions 3/4

Calculate the lowest gross film in 1994 and use the alias lowest_gross.

In [None]:
-- Calculate the lowest gross film in 1994
SELECT MIN(gross) AS lowest_gross
FROM films
WHERE release_year = 1994;

In [None]:
lowest_gross
125169


## Instructions 4/4

Calculate the highest gross film between 2000 and 2012, inclusive, and use the alias highest_gross.

In [None]:
-- Calculate the highest gross film released between 2000-2012
SELECT MAX(gross) AS highest_gross
FROM films
WHERE release_year >= 2000 AND release_year <= 2012;

In [None]:
highest_gross
760505847

# Using ROUND()
Aggregate functions work great with numerical values; however, these results can sometimes get unwieldy when dealing with long decimal values. Luckily, SQL provides you with the ROUND() function to tame these long decimals.

If asked to give the average budget of your films, ten decimal places is not necessary. Instead, you can round to two decimal places to create results that make more sense for currency.
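
A small sketch of the idea, using three facebook_likes values that appear in the outputs earlier in these notes, loaded into an in-memory SQLite table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE reviews (film_id INTEGER, facebook_likes INTEGER)")
conn.executemany(
    "INSERT INTO reviews VALUES (?, ?)",
    [(3405, 0), (478, 491), (74, 930)],
)

# The raw average has a long decimal tail; ROUND(..., 1) keeps one decimal place.
raw_avg, rounded_avg = conn.execute(
    "SELECT AVG(facebook_likes), ROUND(AVG(facebook_likes), 1) FROM reviews"
).fetchone()
print(rounded_avg)  # 473.7
```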

Now you try!

## Instructions

Calculate the average facebook_likes to one decimal place and assign to the alias, avg_facebook_likes.

In [None]:
-- Round the average number of facebook_likes to one decimal place
SELECT ROUND(AVG(facebook_likes), 1) AS avg_facebook_likes
FROM reviews;


In [None]:
avg_facebook_likes
7802.9

# ROUND() with a negative parameter
A useful thing you can do with ROUND() is have a negative number as the decimal place parameter. This can come in handy if your manager only needs to know the average number of facebook_likes to the hundreds since granularity below one hundred likes won't impact decision making.
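
The sketch below illustrates what rounding to the nearest thousand means. One caveat: SQLite, used here through Python's sqlite3 module, treats a negative second argument to ROUND() as zero, so the example emulates the negative parameter by scaling the value down, rounding, and scaling it back up. Python's own round() accepts a negative number of digits and serves as a cross-check. The three budgets appear in the outputs earlier in these notes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE films (title TEXT, budget INTEGER)")
conn.executemany("INSERT INTO films VALUES (?, ?)", [
    ("Metropolis", 6000000), ("Run Lola Run", 3500000), ("Das Boot", 14000000),
])

avg_budget = conn.execute("SELECT AVG(budget) FROM films").fetchone()[0]

# Emulate rounding to -3 places: divide by 1000, round, multiply by 1000.
avg_budget_thousands = conn.execute(
    "SELECT ROUND(AVG(budget) / 1000.0) * 1000 FROM films"
).fetchone()[0]

print(round(avg_budget, -3))  # 7833000.0
print(avg_budget_thousands)   # 7833000.0
```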

Social media plays a significant role in determining success. If a movie trailer is posted and barely gets any likes, the movie itself may not be successful. Remember how 2020's "Sonic the Hedgehog" movie got a revamp after the public saw the trailer?

Let's apply this to other parts of the dataset and see what the benchmark is for movie budgets so, in the future, it's clear whether the film is above or below budget.

## Instructions

Calculate the average budget from the films table, aliased as avg_budget_thousands, and round to the nearest thousand.

In [None]:
-- Calculate the average budget rounded to the thousands
SELECT ROUND(AVG(budget), -3) AS avg_budget_thousands
FROM films;


In [None]:
avg_budget_thousands
39903000

# **Aliasing and arithmetic**

# Using arithmetic
SQL arithmetic comes in handy when your table is missing a metric you want to review. Suppose you have some data on movie ticket sales, but the table only has fields for ticket price and discount. In that case, you could combine these by subtracting the discount from the ticket price to get the amount the film-goer paid.

You have seen that SQL can act strangely when dividing integers. What is the result if you divide a discount of two dollars by the paid_price of ten dollars to get the discount percentage?

## Instructions

Possible answers


2

0.222

**0**

0.2
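
The integer-division behavior can be reproduced in a couple of lines. This sketch uses an in-memory SQLite database, which also truncates the result when both operands are integers. Writing one operand with a decimal point gives the expected fraction.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Integer / integer drops the fractional part; integer / decimal keeps it.
integer_division, decimal_division = conn.execute(
    "SELECT 2 / 10, 2 / 10.0"
).fetchone()
print(integer_division, decimal_division)  # 0 0.2
```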

# Aliasing with functions
Aliasing can be a lifesaver, especially as we start to do more complex SQL queries with multiple criteria. Aliases help you keep your code clean and readable. For example, if you want to find the MAX() value of several fields without aliasing, you'll end up with the result with several columns called max and no idea which is which. You can fix this with aliasing.

Now, it's over to you to clean up the following queries.

## Instructions 1/3

Select the title and duration in hours for all films and alias as duration_hours; since the current durations are in minutes, you'll need to divide duration by 60.0.

In [None]:
-- Calculate the title and duration_hours from films
SELECT title, duration/60.0 AS duration_hours
FROM films;


## Instructions 2/3

Calculate the percentage of people who are no longer alive and alias the result as percentage_dead.

In [None]:
-- Calculate the percentage of people who are no longer alive
SELECT COUNT(deathdate) * 100.0 / COUNT(*) AS percentage_dead
FROM people;

In [None]:
percentage_dead
9.3723949029415267


## Instructions 3/3

Find how many decades (period of ten years) the films table covers by using MIN() and MAX(); alias as number_of_decades.

In [None]:
-- Find the number of decades in the films table
SELECT (MAX(release_year) - MIN(release_year)) / 10.0 AS number_of_decades
FROM films;

In [None]:
number_of_decades
10.0000000000000000

# Rounding results
You found some valuable insights in the previous exercise, but many of the results were inconveniently long. We forgot to round! We won't make you redo them all; however, you'll update the worst offender in this exercise.

## Instructions

Update the query by adding ROUND() around the calculation and round to two decimal places.

In [None]:
-- Round duration_hours to two decimal places
SELECT title, ROUND(duration / 60.0, 2) AS duration_hours
FROM films;

In [None]:
title	                                              duration_hours
Intolerance: Love's Struggle Throughout the Ages	   2.05
.
.
.


# **Chapter 4: Sorting and Grouping**

This final chapter teaches you how to sort and group data. These skills will take your analyses to a new level by helping you uncover critical business insights and identify trends and performance. You'll get hands-on experience to determine which films performed the best and how movie durations and budgets changed over time.

# **Sorting results**
# Sorting single fields
Now that you understand how ORDER BY works, you'll put it into practice. In this exercise, you'll work on sorting single fields only. This can be helpful to extract quick insights such as the top-grossing or top-scoring film.

The following exercises will help you gain further insights into the film database.

## Instructions 1/2

Select the name of each person in the people table, sorted alphabetically.

In [None]:
-- Select name from people and sort alphabetically
SELECT name
FROM people
ORDER BY name;

In [None]:
name
50 Cent
A. Michael Baldwin


## Instructions 2/2

Select the title and duration for every film, from longest duration to shortest.

In [None]:
-- Select the title and duration from longest to shortest film
SELECT title, duration
FROM films
ORDER BY duration DESC;

In [None]:
title	                        duration
Destiny	                      null
Should've Been Romeo	        null
.
.
.

# Sorting multiple fields
ORDER BY can also be used to sort on multiple fields. It will sort by the first field specified, then sort by the next, and so on. As an example, you may want to sort the people data by age and keep the names in alphabetical order.
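
For instance, the sketch below sorts four films first by certification and then, within each certification, by release year from newest to oldest. The four rows come from the outputs earlier in these notes and are loaded into an in-memory SQLite table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE films (title TEXT, release_year INTEGER, certification TEXT)"
)
conn.executemany("INSERT INTO films VALUES (?, ?, ?)", [
    ("Downfall", 2004, "R"),
    ("Run Lola Run", 1998, "R"),
    ("Metropolis", 1927, "Not Rated"),
    ("Good Bye Lenin!", 2003, "R"),
])

# DESC applies only to release_year; certification stays in ascending order.
sorted_titles = [row[0] for row in conn.execute(
    "SELECT title FROM films ORDER BY certification, release_year DESC"
)]
print(sorted_titles)
# ['Metropolis', 'Downfall', 'Good Bye Lenin!', 'Run Lola Run']
```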

Try using ORDER BY to sort multiple columns.

## Instructions 1/2

Select the release_year, duration, and title of films ordered by their release year and duration, in that order.

In [None]:
-- Select the release year, duration, and title sorted by release year and duration
SELECT release_year, duration, title
FROM films
ORDER BY release_year, duration;

In [None]:
release_year	  duration	         title
1916	           123	              Intolerance: Love's Struggle Throughout the Ages


## Instructions 2/2

Select the certification, release_year, and title from films ordered first by certification (alphabetically) and second by release year, starting with the most recent year.

In [None]:
-- Select the certification, release year, and title sorted by certification and release year
SELECT certification, release_year, title
FROM films
ORDER BY certification, release_year DESC;

In [None]:
certification	        release_year	        title
Approved	            1967	                In Cold Blood
Approved	            1967	                You Only Live Twice
.
.
.

# **Grouping data**
# GROUP BY single fields
GROUP BY is a SQL keyword that allows you to group and summarize results with the additional use of aggregate functions. For example, films can be grouped by the certification and language before counting the film titles in each group. This allows you to see how many films had a particular certification and language grouping.
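
A minimal sketch of grouping, assuming a made-up three-row films table in an in-memory SQLite database: each release year becomes one output row, and the aggregate functions are computed per group.

```python
import sqlite3

# Hypothetical mini films table; the rows are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE films (title TEXT, release_year INTEGER, duration INTEGER)")
conn.executemany("INSERT INTO films VALUES (?, ?, ?)", [
    ("Film A", 1954, 100), ("Film B", 1954, 120), ("Film C", 1988, 90),
])

# One row per release_year, with a count and an average for each group.
per_year = conn.execute(
    "SELECT release_year, COUNT(title) AS film_count, AVG(duration) AS avg_duration "
    "FROM films "
    "GROUP BY release_year "
    "ORDER BY release_year"
).fetchall()
print(per_year)  # [(1954, 2, 110.0), (1988, 1, 90.0)]
```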

In the following steps, you'll summarize other groups of films to learn more about the films in your database.

## Instructions 1/2

Select the release_year and count of films released in each year aliased as film_count.

In [None]:
-- Find the release_year and film_count of each year
SELECT release_year, COUNT(title) AS film_count
FROM films
GROUP BY release_year;

In [None]:
release_year	      film_count
1954	               5
1988	               31
.
.
.


## Instructions 2/2

Select the release_year and average duration aliased as avg_duration of all films, grouped by release_year.

In [None]:
-- Find the release_year and average duration of films for each year
SELECT release_year, AVG(duration) AS avg_duration
FROM films
GROUP BY release_year;


In [None]:
release_year	    avg_duration
1954	            140.6000000000000000
1988	            107.0000000000000000
.
.
.

# GROUP BY multiple fields
GROUP BY becomes more powerful when used across multiple fields or combined with ORDER BY and LIMIT.

Perhaps you're interested in learning about budget changes throughout the years in individual countries. You'll use grouping in this exercise to look at the maximum budget for each country in each year there is data available.

## Instructions

Select the release_year, country, and the maximum budget aliased as max_budget for each year and each country; sort your results by release_year and country.

In [None]:
-- Find the release_year, country, and max_budget, then group and order by release_year and country
SELECT release_year, country, MAX(budget) AS max_budget
FROM films
GROUP BY release_year, country
ORDER BY release_year, country;

In [None]:
release_year	    country	            max_budget
1965	            UK	                9000000
2000	            Argentina	          1500000

# **Filtering grouped data**
# Filter with HAVING
Your final keyword is HAVING. It works similarly to WHERE in that it is a filtering clause, with the difference that HAVING filters grouped data.
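
The following sketch shows HAVING acting on groups rather than on individual rows. It assumes a made-up five-row films table in an in-memory SQLite database, and the threshold of 2 is chosen only to suit this small sample.

```python
import sqlite3

# Hypothetical mini films table; the rows are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE films (title TEXT, country TEXT, certification TEXT)")
conn.executemany("INSERT INTO films VALUES (?, ?, ?)", [
    ("Film A", "USA", "G"), ("Film B", "USA", "PG"), ("Film C", "USA", "R"),
    ("Film D", "Germany", "R"), ("Film E", "Germany", "R"),
])

# The count is computed per country first; HAVING then keeps or drops whole groups.
varied_countries = conn.execute(
    "SELECT country, COUNT(DISTINCT certification) AS certification_count "
    "FROM films "
    "GROUP BY country "
    "HAVING COUNT(DISTINCT certification) > 2"
).fetchall()
print(varied_countries)  # [('USA', 3)]
```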

Filtering grouped data can be especially handy when working with a large dataset. When working with thousands or even millions of rows, HAVING will allow you to filter for just the group of data you want, such as films over two hours in length!

Practice using HAVING to find out which countries (or country) have the most varied film certifications.

## Instructions

Select country from the films table, and get the distinct count of certification aliased as certification_count.
Group the results by country.
Filter the unique count of certifications to only results greater than 10.

In [None]:
-- Select the country and distinct count of certification as certification_count
SELECT country, COUNT(DISTINCT certification) AS certification_count
FROM films
-- Group by country
GROUP BY country
-- Filter results to countries with more than 10 different certifications
HAVING COUNT(DISTINCT certification) > 10;

In [None]:
country	      certification_count
USA	           12

# HAVING and sorting
Filtering and sorting go hand in hand: ordering your filtered results makes them easier to interpret.

Let's see this magic at work by writing a query showing what countries have the highest average film budgets.

## Instructions

Select the country and the average budget as average_budget, rounded to two decimal places, from films.
Group the results by country.
Filter the results to countries with an average budget of more than one billion (1000000000).
Sort by descending order of the average_budget.

In [None]:
-- Select the country and average_budget from films
SELECT country, ROUND(AVG(budget), 2) AS average_budget
FROM films
-- Group by country
GROUP BY country
-- Filter to countries with an average_budget of more than one billion
HAVING AVG(budget) > 1000000000
-- Order by descending order of the aggregated budget
ORDER BY average_budget DESC;

In [None]:
country	               average_budget
South Korea	           1383960000.00
Hungary	               1260000000.00

# All together now
It's time to use much of what you've learned in one query! This is good preparation for using SQL in the real world where you'll often be asked to write more complex queries since some of the basic queries can be answered by playing around in spreadsheet applications.

In this exercise, you'll write a query that returns the average budget and gross earnings for films each year after 1990 if the average budget is greater than 60 million.

This will be a big query, but you can handle it!

## Instructions 1/4

Select the release_year for each film in the films table, filter for records released after 1990, and group by release_year.

In [None]:
-- Select the release_year for films released after 1990 grouped by year
SELECT release_year
FROM films
WHERE release_year > 1990
GROUP BY release_year;

In [None]:
release_year
2008
1991
.
.
.


## Instructions 2/4

Modify the query to include the average budget aliased as avg_budget and average gross aliased as avg_gross for the results we have so far.

In [None]:
-- Modify the query to also list the average budget and average gross
SELECT release_year, AVG(budget) AS avg_budget, AVG(gross) AS avg_gross
FROM films
WHERE release_year > 1990
GROUP BY release_year;

In [None]:
release_year	        avg_budget	                  avg_gross
2008	                 41804885.572139303483	      44573509.378109452736


## Instructions 3/4

Modify the query once more so that only years with an average budget of greater than 60 million are included.

In [None]:
SELECT release_year, AVG(budget) AS avg_budget, AVG(gross) AS avg_gross
FROM films
WHERE release_year > 1990
GROUP BY release_year
-- Modify the query to see only years with an avg_budget of more than 60 million
HAVING AVG(budget) > 60000000;

In [None]:
release_year	          avg_budget	                      avg_gross
2005	                  70323938.231527093596	          41159143.290640394089
2006	                  93968929.577464788732	          39237855.953703703704


## Instructions 4/4

Finally, order the results from highest to lowest average gross and limit the output to one row.

In [None]:
SELECT release_year, AVG(budget) AS avg_budget, AVG(gross) AS avg_gross
FROM films
WHERE release_year > 1990
GROUP BY release_year
HAVING AVG(budget) > 60000000
-- Order the results from highest to lowest average gross and limit to one
ORDER BY avg_gross DESC
LIMIT 1;

In [None]:
release_year	      avg_budget	               avg_gross
2005	              70323938.231527093596	     41159143.290640394089