# Organizing Data

## Scenario
The database operations team has created a relational database named world that contains three tables: city, country, and countrylanguage.<br> Your job is to help write a few queries that group records for analysis by using both the GROUP BY and OVER clauses.

## Lab overview and objectives
This lab demonstrates how to use some common database functions with the GROUP BY and OVER clauses.<br>

After completing this lab, you should be able to:<br>

1. Use the GROUP BY clause with the aggregate function SUM()
2. Use the OVER clause with the RANK() window function
3. Use the OVER clause with the aggregate function SUM() and the RANK() window function

When you start the lab, the following resources are already created for you:<br>
<img src="img/architecture-start (6).jpg" style="height:400px">  <br>
A Command Host instance and world database containing three tables

By the end of this lab, you will have used both the GROUP BY and OVER clauses with some common database functions:<br>
<img src="img/architecture-end (6).jpg" style = "height:400px"> <br>
The diagram shows a lab user connected to a database instance, along with some commonly used SQL clauses and database functions.

## Task 2: Query the world database
In this task, you query the world database using various SELECT statements and database functions.

12. To show the existing databases, enter the following command in the terminal.<br> 
>>SHOW DATABASES;

Verify that a database named world is available. If the world database is not available, contact your instructor.

13. To review the table schema, data, and number of rows in the country table, enter the following query.
>>SELECT * <br>
>>FROM world.country;

14. To return a list of records where the Region is Australia and New Zealand, run the following query.<br> This query includes an ORDER BY clause (introduced in a previous lab) that arranges the results by Population in descending order.
>>SELECT Region, Name, Population<br> 
>>FROM world.country<br> 
>>WHERE Region = 'Australia and New Zealand'<br> 
>>ORDER BY Population DESC;

You can use the GROUP BY clause to group related records together.<br> The following example first filters the records with a WHERE condition so that only rows where the Region is Australia and New Zealand remain.<br> The remaining records are then grouped by using a GROUP BY clause.<br> Finally, the SUM() function is applied to each group to generate a total population for that region.<br> Run the following query in your terminal.

>>SELECT Region, SUM(Population)<br>
FROM world.country<br> 
WHERE Region = 'Australia and New Zealand'<br> 
GROUP BY Region<br> 
ORDER BY SUM(Population) DESC;

This query returns the SUM() of the Population for the Australia and New Zealand region.<br> Because the WHERE clause filters by Region, only the Australia and New Zealand records are aggregated, and the result is a single row.
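To see how GROUP BY collapses each group into a single row, the following minimal sketch runs the same kind of query against more than one region. It is an illustration under stated assumptions: it uses Python's built-in sqlite3 module with an in-memory table instead of the lab's world database, and the region names, country names, and populations are invented sample values, not the real data in world.country.

```python
import sqlite3

# In-memory SQLite table standing in for world.country (assumption: not the lab database).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE country (Region TEXT, Name TEXT, Population INTEGER)")

# Hypothetical sample rows; the names and populations are invented.
rows = [
    ("Region A", "Alpha", 100),
    ("Region A", "Beta", 300),
    ("Region B", "Gamma", 50),
]
conn.executemany("INSERT INTO country VALUES (?, ?, ?)", rows)

# GROUP BY returns one row per region; SUM() totals the population within each group.
totals = conn.execute(
    """
    SELECT Region, SUM(Population)
    FROM country
    GROUP BY Region
    ORDER BY SUM(Population) DESC
    """
).fetchall()

for region, total in totals:
    print(region, total)
```

With these sample rows, three input records become two output rows, one per region, and the individual country names are no longer available in the result.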

The following example uses a window function to generate a running total: each row shows the sum of its own Population and the Population of every row that precedes it in the window.<br> The OVER() clause partitions the records by Region and orders them by Population, and the SUM() function aggregates the records within that window.<br> The output displays the population of each country alongside a running total for the region.<br> Run the following query in your terminal.

>>SELECT Region, Name, Population, SUM(Population) OVER(PARTITION BY Region ORDER BY Population) AS 'Running Total'<br> 
FROM world.country<br> 
WHERE Region = 'Australia and New Zealand';
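Unlike GROUP BY, a window function keeps every input row and adds the aggregate as an extra column. The sketch below shows this for a running total. It is an illustration only: it uses Python's built-in sqlite3 module (window functions need SQLite 3.25 or later) with an in-memory table rather than the lab's world database, the rows are invented sample values, and the alias RunningTotal is chosen for this example.

```python
import sqlite3

# In-memory SQLite table standing in for world.country (assumption: not the lab database).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE country (Region TEXT, Name TEXT, Population INTEGER)")

# Hypothetical sample rows, all in one region; populations are invented.
conn.executemany(
    "INSERT INTO country VALUES (?, ?, ?)",
    [("Region A", "Alpha", 100), ("Region A", "Beta", 300), ("Region A", "Gamma", 50)],
)

# SUM() OVER (...) adds up Population from the smallest row to the current row.
running = conn.execute(
    """
    SELECT Name, Population,
           SUM(Population) OVER (PARTITION BY Region ORDER BY Population) AS RunningTotal
    FROM country
    ORDER BY Population
    """
).fetchall()

for name, population, running_total in running:
    print(name, population, running_total)
```

All three rows are returned, and the last running total (450) equals the total population of the sample region.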

The following query partitions the records by Region and orders them by Population with the OVER() clause.<br>  This query also includes the RANK() function to generate a rank number indicating the position of each record within its partition.<br>  Because the window is ordered by Population in ascending order, rank 1 goes to the country with the smallest population.<br>  The RANK() function is useful when dealing with large groups of records.<br>  Run the following query in your terminal.

>>SELECT Region, Name, Population, SUM(Population) OVER(PARTITION BY Region ORDER BY Population) AS 'Running Total', RANK() OVER(PARTITION BY Region ORDER BY Population) AS 'Ranked'<br> 
FROM world.country<br>
WHERE Region = 'Australia and New Zealand';
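One detail worth knowing about RANK() is how it treats ties: rows with equal values receive the same rank, and the next rank is skipped. The following sketch demonstrates this under stated assumptions. It uses Python's built-in sqlite3 module (SQLite 3.25 or later) with an in-memory table instead of the lab's world database, and the rows are invented sample values that deliberately include two equal populations.

```python
import sqlite3

# In-memory SQLite table standing in for world.country (assumption: not the lab database).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE country (Region TEXT, Name TEXT, Population INTEGER)")

# Hypothetical sample rows; Alpha and Delta share the same invented population.
conn.executemany(
    "INSERT INTO country VALUES (?, ?, ?)",
    [
        ("Region A", "Alpha", 100),
        ("Region A", "Beta", 300),
        ("Region A", "Gamma", 50),
        ("Region A", "Delta", 100),
    ],
)

# RANK() numbers rows by ascending Population; tied rows share a rank.
ranked = conn.execute(
    """
    SELECT Name, Population,
           RANK() OVER (PARTITION BY Region ORDER BY Population) AS Ranked
    FROM country
    ORDER BY Population, Name
    """
).fetchall()

for name, population, rank in ranked:
    print(name, population, rank)
```

Here Alpha and Delta both receive rank 2, and Beta receives rank 4 because rank 3 is skipped.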

## Challenge
Write a query to rank the countries in each region by their population from largest to smallest.<br>

Decide whether to use the GROUP BY or the OVER clause, and whether to use the SUM() or the RANK() function.

>>SELECT Region, Name, Population, RANK() OVER(PARTITION BY Region ORDER BY Population DESC) AS 'Ranked'<br> 
FROM world.country<br> 
ORDER BY Region, Ranked;
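When a ranking query covers more than one region, the PARTITION BY clause makes the rank restart at 1 in each region. The sketch below illustrates that behavior with a descending order. It is an illustration under stated assumptions: it uses Python's built-in sqlite3 module (SQLite 3.25 or later) with an in-memory table rather than the lab's world database, and the regions, names, and populations are invented sample values.

```python
import sqlite3

# In-memory SQLite table standing in for world.country (assumption: not the lab database).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE country (Region TEXT, Name TEXT, Population INTEGER)")

# Hypothetical sample rows in two regions; populations are invented.
conn.executemany(
    "INSERT INTO country VALUES (?, ?, ?)",
    [
        ("Region A", "Alpha", 100),
        ("Region A", "Beta", 300),
        ("Region B", "Gamma", 50),
        ("Region B", "Delta", 70),
    ],
)

# The rank is computed separately per region, largest population first.
ranked = conn.execute(
    """
    SELECT Region, Name, Population,
           RANK() OVER (PARTITION BY Region ORDER BY Population DESC) AS Ranked
    FROM country
    ORDER BY Region, Ranked
    """
).fetchall()

for region, name, population, rank in ranked:
    print(region, name, population, rank)
```

In the output, Beta is rank 1 in Region A and Delta is rank 1 in Region B, which shows that each partition is ranked independently.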

## Conclusion
Congratulations! You have now successfully:<br>

1. Used the GROUP BY clause with the aggregate function SUM()
2. Used the OVER clause with the RANK() window function
3. Used the OVER clause with the aggregate function SUM() and the RANK() window function