<div align="right" style=" font-size: 80%; text-align: center; margin: 0 auto">
<img
 src="https://raw.githubusercontent.com/Explore-AI/Pictures/master/alx-courses/aice/assets/Content_page_banner_blue_dots.png"
 alt="ALX Content Header"
 class="full-width-image"
/>
</div>

# Grouping with a CASE statement

In this notebook, we learn how to use the `CASE` statement to categorise and group data.

> ⚠️ This notebook will not run on Google Colab because it cannot connect to a local database. Please make sure that this notebook is running on the same local machine as your MySQL Workbench installation and MySQL `united_nations` database.

## Learning objectives

By the end of this train, you should:
- Know how to categorise data using `CASE` statements.
- Know how to combine `CASE` statements with aggregate functions for enhanced data summarisation.
- Understand how to use the `GROUP BY` clause with the `CASE` statement.


## Connecting to our MySQL database

Using our `Access_to_Basic_Services` table created in MySQL Workbench, we want to answer some questions on the range of our dataset. We can apply the same queries in MySQL Workbench and in this notebook if we connect to our MySQL server. Since we have a MySQL database, we can connect to it using mysql and pymysql.

In [1]:
# Load and activate the SQL extension to allow us to execute SQL in a Jupyter notebook. 
# If you get an error here, make sure that mysql and pymysql are installed correctly. 

%load_ext sql

In [2]:
# Establish a connection to the local database using the '%sql' magic command.
# Replace 'password' with your own connection password. The database name here is `united_nations`.
# If you get an error here, please make sure the database name and password are correct.

%sql mysql+pymysql://root:password@localhost:3306/united_nations
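If the connection password contains characters such as `@` or `:`, it may be misread as part of the URL. The sketch below shows one way to build the connection string safely. The helper name `build_mysql_url` and the example password are hypothetical and only serve as an illustration.

```python
from urllib.parse import quote_plus


def build_mysql_url(user, password, host="localhost", port=3306, database="united_nations"):
    """Hypothetical helper: return a connection URL for the pymysql driver."""
    # quote_plus escapes characters such as '@' and ':' that would
    # otherwise be interpreted as URL delimiters.
    return f"mysql+pymysql://{quote_plus(user)}:{quote_plus(password)}@{host}:{port}/{database}"


# Example password made up for illustration only.
url = build_mysql_url("root", "p@ss:word")
print(url)
```

The resulting string has the same shape as the URL passed to `%sql` above, with the special characters in the password percent-encoded.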

## Exercise

The following table specifies which countries belong to each Regional Economic Community (REC):

| Regional Economic Community | Countries |
|----------------------------|------------|
| SADC                       | Angola, Botswana, Comoros, Democratic Republic of Congo, Eswatini, Lesotho, Madagascar, Malawi, Mauritius, Mozambique, Namibia, Seychelles, South Africa, United Republic Tanzania, Zambia, Zimbabwe |
| UMA                        | Algeria, Libya, Mauritania, Morocco, Tunisia |
| ECOWAS                     | Benin, Burkina Faso, Cabo Verde, Cote d’Ivoire, Gambia, Ghana, Guinea, Guinea-Bissau, Liberia, Mali, Niger, Nigeria, Senegal, Sierra Leone, Togo |
| Not Classified             | Other countries not listed above |

We will use this table when constructing our queries.

### 1. Identify regions in Africa

Construct a query that selects only the regions falling within Africa. Use the `LIKE` operator to identify records where the `Region` contains `Africa`.

In [None]:
# Add your code here

### 2. Classify SADC countries

Use a `CASE` statement to classify whether or not an African country belongs to SADC. In other words, if the `Country_name` is one of the SADC countries listed in the table above, it should be classified as `SADC`. Otherwise, it must be classified as `Not Classified`.

The query should return three columns:
- An alias, `Regional_economic_community`, that contains the results of the `CASE` statement
- `Country_name`
- `Pct_managed_drinking_water_services`

In [None]:
# Add your code here

### 3. Classify UMA and ECOWAS countries

Extend the `CASE` statement to include classifications for the UMA and ECOWAS RECs. Classify the countries based on the table above.


In [None]:
# Add your code here

### 4. Calculate the minimum, average, and maximum percentages of managed drinking water services for each REC

Use the `MIN()`, `AVG()`, and `MAX()` aggregate functions on the `Pct_managed_drinking_water_services` column to obtain the minimum, average, and maximum percentages of managed drinking water services for each REC.

Make sure to group the results by the REC using the `GROUP BY` clause to specify how the data should be grouped for the aggregate calculations.


In [None]:
# Add your code here

## Solutions

### 1. Identify regions in Africa

In [3]:
%%sql

SELECT *
FROM united_nations.Access_to_Basic_Services
WHERE Region LIKE '%Africa%'
LIMIT 5; -- Remove this line to see the full list

 * mysql+pymysql://root:***@localhost:3306/united_nations
5 rows affected.


| Region | Sub_region | Country_name | Time_period | Pct_access_to_water | Pct_managed_sanitation_services | Est_population_in_millions | Est_gdp_in_billions | Land_area | Pct_unemployment |
|--------|------------|--------------|-------------|---------------------|---------------------------------|----------------------------|---------------------|-----------|------------------|
| Northern Africa and Western Asia | Northern Africa | Algeria | 2015 | 92.0 | 85.0 | 39.543154 | 165.98 | 2381741.0 | 11.21 |
| Northern Africa and Western Asia | Northern Africa | Algeria | 2016 | 93.0 | 85.33 | 40.339329 | 160.03 | 2381741.0 | 10.2 |
| Northern Africa and Western Asia | Northern Africa | Algeria | 2017 | 93.0 | 84.67 | 41.136546 | 170.1 | 2381741.0 | 12.0 |
| Northern Africa and Western Asia | Northern Africa | Algeria | 2018 | 93.0 | 84.67 | 41.927007 | 174.91 | 2381741.0 | |
| Northern Africa and Western Asia | Northern Africa | Algeria | 2019 | 93.33 | 84.67 | 42.705368 | 171.77 | 2381741.0 | |
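The position of the `%` wildcards decides what `LIKE` matches. The sketch below uses Python's built-in `sqlite3` module as a stand-in for MySQL so that it runs without a database server. The table name `services` and its three rows are toy data for illustration, not rows taken from `Access_to_Basic_Services`.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE services (Region TEXT, Country_name TEXT)")
conn.executemany(
    "INSERT INTO services VALUES (?, ?)",
    [
        ("Northern Africa and Western Asia", "Algeria"),
        ("Sub-Saharan Africa", "Angola"),
        ("Central and Southern Asia", "India"),
    ],
)

# '%Africa%' matches 'Africa' anywhere in the string.
contains = conn.execute(
    "SELECT Country_name FROM services WHERE Region LIKE '%Africa%' ORDER BY Country_name"
).fetchall()

# '%Africa' only matches strings that end with 'Africa'.
ends_with = conn.execute(
    "SELECT Country_name FROM services WHERE Region LIKE '%Africa' ORDER BY Country_name"
).fetchall()

print(contains)   # [('Algeria',), ('Angola',)]
print(ends_with)  # [('Angola',)]
```

Using `%` on both sides is what lets the solution pick up regions such as `Northern Africa and Western Asia`, where `Africa` appears in the middle of the name.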


### 2. Classify SADC countries

In [5]:
%%sql

SELECT
	CASE 
		WHEN Country_name IN ('Angola', 'Botswana', 'Comoros', 'Democratic Republic of Congo', 'Eswatini', 
						 'Lesotho', 'Madagascar', 'Malawi', 'Mauritius', 'Mozambique', 'Namibia', 
						 'Seychelles', 'South Africa', 'United Republic Tanzania', 'Zambia', 'Zimbabwe')
			THEN 'SADC'
		ELSE 'Not Classified'
	END AS Regional_economic_community,
    Country_name,
    Pct_access_to_water
FROM united_nations.Access_to_Basic_Services
WHERE Region LIKE '%Africa%'
LIMIT 5; -- Remove this line to see the entire result set

 * mysql+pymysql://root:***@localhost:3306/united_nations
5 rows affected.


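| Regional_economic_community | Country_name | Pct_access_to_water |
|-----------------------------|--------------|---------------------|
| Not Classified | Algeria | 92.0 |
| Not Classified | Algeria | 93.0 |
| Not Classified | Algeria | 93.0 |
| Not Classified | Algeria | 93.0 |
| Not Classified | Algeria | 93.33 |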
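The `ELSE` branch is what gives unmatched countries the `Not Classified` label. Without it, a `CASE` expression returns `NULL` for rows that match no `WHEN` branch. The sketch below illustrates this with Python's `sqlite3` module as a stand-in for MySQL. The `services` table and the shortened country list are toy assumptions.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE services (Country_name TEXT)")
conn.executemany(
    "INSERT INTO services VALUES (?)",
    [("Algeria",), ("Angola",), ("Zambia",)],
)

# CASE with an ELSE branch: every row receives a label.
with_else = conn.execute(
    """
    SELECT Country_name,
           CASE
               WHEN Country_name IN ('Angola', 'Zambia') THEN 'SADC'
               ELSE 'Not Classified'
           END AS Regional_economic_community
    FROM services
    ORDER BY Country_name
    """
).fetchall()

# The same CASE without ELSE: unmatched rows come back as NULL (None in Python).
without_else = conn.execute(
    """
    SELECT Country_name,
           CASE
               WHEN Country_name IN ('Angola', 'Zambia') THEN 'SADC'
           END AS Regional_economic_community
    FROM services
    ORDER BY Country_name
    """
).fetchall()

print(with_else)     # [('Algeria', 'Not Classified'), ('Angola', 'SADC'), ('Zambia', 'SADC')]
print(without_else)  # [('Algeria', None), ('Angola', 'SADC'), ('Zambia', 'SADC')]
```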


### 3. Classify UMA and ECOWAS countries

In [6]:
%%sql

SELECT
	CASE 
		WHEN Country_name IN ('Angola', 'Botswana', 'Comoros', 'Democratic Republic of Congo', 'Eswatini', 
						 'Lesotho', 'Madagascar', 'Malawi', 'Mauritius', 'Mozambique', 'Namibia', 
						 'Seychelles', 'South Africa', 'United Republic Tanzania', 'Zambia', 'Zimbabwe')
			THEN 'SADC'
            
		WHEN Country_name IN ('Algeria', 'Libya', 'Mauritania', 'Morocco', 'Tunisia')
			THEN 'UMA'
            
        WHEN Country_name IN ('Benin', 'Burkina Faso', 'Cabo Verde', 'Cote d’Ivoire', 'Gambia', 'Ghana', 'Guinea', 
							'Guinea-Bissau', 'Liberia', 'Mali', 'Niger', 'Nigeria', 'Senegal', 'Sierra Leone', 'Togo')
			THEN 'ECOWAS'
            
		ELSE 'Not Classified'
	END AS Regional_economic_community,
	Country_name,
	Pct_access_to_water
FROM united_nations.Access_to_Basic_Services
WHERE Region LIKE '%Africa%'
LIMIT 5; -- Remove this line to see the entire result set

 * mysql+pymysql://root:***@localhost:3306/united_nations
5 rows affected.


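| Regional_economic_community | Country_name | Pct_access_to_water |
|-----------------------------|--------------|---------------------|
| UMA | Algeria | 92.0 |
| UMA | Algeria | 93.0 |
| UMA | Algeria | 93.0 |
| UMA | Algeria | 93.0 |
| UMA | Algeria | 93.33 |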
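When a `CASE` expression has several `WHEN` branches, they are evaluated in order and the first match wins. This matters if a country name appears in more than one list. The sketch below demonstrates the behaviour with Python's `sqlite3` module as a stand-in for MySQL. The `services` table, the shortened lists, and the label `Second list` are hypothetical.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE services (Country_name TEXT)")
conn.executemany(
    "INSERT INTO services VALUES (?)",
    [("Algeria",), ("Angola",), ("Kenya",), ("Zambia",)],
)

# 'Zambia' appears in both lists on purpose: only the first matching
# WHEN branch is applied, so it is labelled 'SADC'.
labels = conn.execute(
    """
    SELECT Country_name,
           CASE
               WHEN Country_name IN ('Angola', 'Zambia') THEN 'SADC'
               WHEN Country_name IN ('Zambia', 'Kenya') THEN 'Second list'
               ELSE 'Not Classified'
           END AS label
    FROM services
    ORDER BY Country_name
    """
).fetchall()

print(labels)
# [('Algeria', 'Not Classified'), ('Angola', 'SADC'), ('Kenya', 'Second list'), ('Zambia', 'SADC')]
```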


### 4. Calculate the minimum, average, and maximum percentages of managed drinking water services for each REC

To group the countries by their REC, the `GROUP BY` clause must repeat the same `CASE` expression that appears in the `SELECT` clause. When we group, every non-aggregated expression in the `SELECT` clause, such as the `CASE` expression that determines the REC, must also appear in the `GROUP BY` clause so that the data are grouped correctly. We also remove `Country_name` from the returned columns because it is neither aggregated nor part of the `GROUP BY` clause. Note that the solution also adds an `EAC` category, which is not listed in the exercise table.
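The grouping rule described above can be tried on toy data. In this sketch the `CASE` expression is stored once in a Python string and interpolated into both the `SELECT` and `GROUP BY` clauses, which makes the repetition explicit. It uses `sqlite3` as a stand-in for MySQL, and the `services` table and its percentages are made-up values, not figures from the dataset.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE services (Country_name TEXT, Pct_access_to_water REAL)")
conn.executemany(
    "INSERT INTO services VALUES (?, ?)",
    [("Angola", 50.0), ("Zambia", 60.0), ("Algeria", 90.0)],  # toy values
)

# The CASE expression is written once and reused in SELECT and GROUP BY.
case_expr = """
    CASE
        WHEN Country_name IN ('Angola', 'Zambia') THEN 'SADC'
        ELSE 'Not Classified'
    END
"""

query = f"""
    SELECT {case_expr} AS Regional_economic_community,
           MIN(Pct_access_to_water),
           AVG(Pct_access_to_water),
           MAX(Pct_access_to_water)
    FROM services
    GROUP BY {case_expr}
    ORDER BY Regional_economic_community
"""

summary = conn.execute(query).fetchall()
print(summary)
# [('Not Classified', 90.0, 90.0, 90.0), ('SADC', 50.0, 55.0, 60.0)]
```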

In [9]:
%%sql

SELECT
	CASE 
		WHEN Country_name IN ('Angola', 'Botswana', 'Comoros', 'Democratic Republic of Congo', 'Eswatini', 
						 'Lesotho', 'Madagascar', 'Malawi', 'Mauritius', 'Mozambique', 'Namibia', 
						 'Seychelles', 'South Africa', 'United Republic Tanzania', 'Zambia', 'Zimbabwe')
			THEN 'SADC'
            
		WHEN Country_name IN ('Algeria', 'Libya', 'Mauritania', 'Morocco', 'Tunisia')
			THEN 'UMA'
            
        WHEN Country_name IN ('Benin', 'Burkina Faso', 'Cabo Verde', 'Cote d’Ivoire', 'Gambia', 'Ghana', 'Guinea', 
							'Guinea-Bissau', 'Liberia', 'Mali', 'Niger', 'Nigeria', 'Senegal', 'Sierra Leone', 'Togo')
			THEN 'ECOWAS'

        WHEN Country_name IN ('Burundi','Democratic Republic of the Congo (DRC)','Kenya','Rwanda','South Sudan',
                              'Tanzania','Uganda','Somalia')
            THEN 'EAC'
            
		ELSE 'Not Classified'
	END AS Regional_economic_community,
	MIN(Pct_access_to_water) AS Min_pct_managed_drinking_water_services,
	AVG(Pct_access_to_water) AS Avg_pct_managed_drinking_water_services,
	MAX(Pct_access_to_water) AS Max_pct_managed_drinking_water_services
FROM united_nations.Access_to_Basic_Services
WHERE Region LIKE '%Africa%'
GROUP BY CASE 
			WHEN Country_name IN ('Angola', 'Botswana', 'Comoros', 'Democratic Republic of Congo', 'Eswatini', 
						 'Lesotho', 'Madagascar', 'Malawi', 'Mauritius', 'Mozambique', 'Namibia', 
						 'Seychelles', 'South Africa', 'United Republic Tanzania', 'Zambia', 'Zimbabwe')
			THEN 'SADC'
            
			WHEN Country_name IN ('Algeria', 'Libya', 'Mauritania', 'Morocco', 'Tunisia')
			THEN 'UMA'
            
			WHEN Country_name IN ('Benin', 'Burkina Faso', 'Cabo Verde', 'Cote d’Ivoire', 'Gambia', 'Ghana', 'Guinea', 
							'Guinea-Bissau', 'Liberia', 'Mali', 'Niger', 'Nigeria', 'Senegal', 'Sierra Leone', 'Togo')
			THEN 'ECOWAS'

            WHEN Country_name IN ('Burundi','Democratic Republic of the Congo (DRC)','Kenya','Rwanda','South Sudan',
                              'Tanzania','Uganda','Somalia')
            THEN 'EAC'
		ELSE 'Not Classified'
	END;


 * mysql+pymysql://root:***@localhost:3306/united_nations
5 rows affected.


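| Regional_economic_community | Min_pct_managed_drinking_water_services | Avg_pct_managed_drinking_water_services | Max_pct_managed_drinking_water_services |
|-----------------------------|-----------------------------------------|-----------------------------------------|-----------------------------------------|
| EAC | 46.33 | 59.87888929578993 | 70.33 |
| ECOWAS | 53.33 | 70.78928593226841 | 87.33 |
| Not Classified | 38.33 | 83.06698564593302 | 100.0 |
| SADC | 50.33 | 75.81304875815788 | 100.0 |
| UMA | 66.67 | 88.23300018310547 | 100.0 |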
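Repeating a long `CASE` expression in both clauses is easy to get wrong when the lists change. One possible alternative, sketched here, is to classify the rows in a derived table and then group by the resulting column, so the `CASE` is written only once. The sketch uses `sqlite3` as a stand-in for MySQL. The `services` table, the alias `classified`, and the percentages are illustrative assumptions.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE services (Country_name TEXT, Pct_access_to_water REAL)")
conn.executemany(
    "INSERT INTO services VALUES (?, ?)",
    [("Angola", 50.0), ("Zambia", 60.0), ("Algeria", 90.0)],  # toy values
)

# The inner query labels each row; the outer query aggregates per label.
query = """
    SELECT rec,
           MIN(pct) AS min_pct,
           AVG(pct) AS avg_pct,
           MAX(pct) AS max_pct
    FROM (
        SELECT CASE
                   WHEN Country_name IN ('Angola', 'Zambia') THEN 'SADC'
                   ELSE 'Not Classified'
               END AS rec,
               Pct_access_to_water AS pct
        FROM services
    ) AS classified
    GROUP BY rec
    ORDER BY rec
"""

summary = conn.execute(query).fetchall()
print(summary)
# [('Not Classified', 90.0, 90.0, 90.0), ('SADC', 50.0, 55.0, 60.0)]
```

Both forms should give the same summary. The derived-table form trades an extra level of nesting for a single copy of the classification logic.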


### Summary

We now have a summarised report by the regional economic community, showcasing the minimum, average, and maximum values of managed drinking water services for each.


<div align="center" style=" font-size: 80%; text-align: center; margin: 0 auto">
<img src="https://raw.githubusercontent.com/Explore-AI/Pictures/refs/heads/master/ALX_banners/ALX_Navy.png" style="width:100px" />
</div>