# Climate Change and Impacts in Africa

According to the [United Nations](https://www.un.org/en/climatechange/what-is-climate-change), climate change refers to long-term shifts in temperatures and weather patterns. Such shifts can be natural, due to changes in the sun’s activity or large volcanic eruptions. But since the 1800s, **human activities** have been the main driver of climate change, primarily due to the burning of fossil fuels like coal, oil, and gas.

The consequences of climate change now include, among others, intense droughts, water scarcity, severe fires, rising sea levels, flooding, melting polar ice, catastrophic storms, and declining biodiversity.

You work for a non-governmental organization tasked with reporting the state of climate change in Africa at the upcoming African Union Summit. The head of analytics has provided you with the [IEA-EDGAR CO2 dataset](https://docs.google.com/spreadsheets/d/1cNhVUPKYP79AayGJp89_tXCJmHoxQO4cwiaseSziwbY/edit#gid=191680117), which you will clean, combine, and analyze to create a report on the state of climate change in Africa. You will also provide insights on the impact of climate change on African regions (with four countries, one from each African region, as case studies).

## Dataset

*The dataset, IEA-EDGAR CO2, is a component of the EDGAR (Emissions Database for Global Atmospheric Research) Community GHG database version 7.0 (2022) including or based on data from IEA (2021) Greenhouse Gas Emissions from Energy, www.iea.org/statistics, as modified by the Joint Research Centre. The data source was the [EDGARv7.0_GHG website](https://edgar.jrc.ec.europa.eu/dataset_ghg70) provided by Crippa et al. (2022), with [DOI](https://data.europa.eu/doi/10.2904/JRC_DATASET_EDGAR).*

The dataset contains three sheets (`IPCC 2006`, `IPCC 1996`, and `TOTALS BY COUNTRY`) on the amount of CO2 (a greenhouse gas) generated by countries between 1970 and 2021. **You can download the dataset from your workspace or inspect the dataset directly [here](https://docs.google.com/spreadsheets/d/1cNhVUPKYP79AayGJp89_tXCJmHoxQO4cwiaseSziwbY/edit#gid=191680117)**.

### TOTALS BY COUNTRY SHEET

This sheet contains the annual CO2 (kt) produced in each country between 1970 and 2021. The relevant columns in this sheet are:

| Columns | Description |
| ------- | ------------|
| `C_group_IM24_sh` | The region of the world |
| `Country_code_A3` | The country code |
| `Name`            | The name of the country |
| `Y_1970` to `Y_2021` | The annual amount of CO2 (kt), one column per year from 1970 to 2021 |


### IPCC 2006

This sheet contains the amount of CO2 by country and the industry responsible. The relevant columns in this sheet are:

| Columns | Description |
| ------- | ------------|
| `C_group_IM24_sh` | The region of the world |
| `Country_code_A3` | The country code |
| `Name`            | The name of the country |
| `Y_1970` to `Y_2021` | The annual amount of CO2 (kt), one column per year from 1970 to 2021 |
| `ipcc_code_2006_for_standard_report_name` | The industry responsible for generating CO2 |

## Instructions

The head of analytics in your organization has specifically asked you to do the following:


1. Clean and tidy the datasets. 
2. Create a line plot to show the trend of `CO2` levels across the African regions.
3. Determine the relationship between time (`Year`) and `CO2` levels across the African regions.
4. Determine if there is a significant difference in the `CO2` levels among the African regions.
5. Determine the most common (top 5) industries in each African region.
6. Determine the industry responsible for the largest amount of CO2 (on average) in each African region.
7. Predict the `CO2` levels (for each African region) in the year 2025.
8. Determine if `CO2` levels affect annual `temperature` in the selected African countries.

In [1]:
# Setup
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

# Add other packages you need

# The sheet names containing our datasets
sheet_names = ['IPCC 2006', 'TOTALS BY COUNTRY']
# The column names of the dataset start at row 11,
# so skip the first 10 rows
datasets = pd.read_excel('IEA_EDGAR_CO2_1970-2021.xlsx', sheet_name = sheet_names, skiprows = 10)
ipcc_2006 = datasets['IPCC 2006']
totals_by_country = datasets['TOTALS BY COUNTRY']

# Read the temperatures dataset covering four African countries,
# one from each African region:
# Nigeria:    West Africa
# Ethiopia:   East Africa
# Tunisia:    North Africa
# Mozambique: Southern Africa
temperatures = pd.read_csv('temperatures.csv')

In [2]:
# We need only the African regions
# These are the datasets (including temperatures) you'll use
african_regions = ['Eastern_Africa', 'Western_Africa', 'Southern_Africa', 'Northern_Africa']
# .copy() returns independent frames, so later edits don't warn about modifying a slice
ipcc_2006_africa = ipcc_2006[ipcc_2006.C_group_IM24_sh.isin(african_regions)].copy()
totals_by_country_africa = totals_by_country[totals_by_country.C_group_IM24_sh.isin(african_regions)].copy()

## Instruction 1: Clean and tidy the datasets

### Tasks

- Rename `C_group_IM24_sh` to `Region`, `Country_code_A3` to `Code`, and `ipcc_code_2006_for_standard_report_name` to `Industry` in the corresponding African datasets.
- Melt `Y_1970` to `Y_2021` into two columns, `Year` and `CO2`. Drop rows where `CO2` is missing.
- Drop `IPCC_annex`, `ipcc_code_2006_for_standard_report`, and `Substance` from the corresponding datasets.
- Convert `Year` to `int` type.

### Hints

- Use `df.rename()` method to rename columns.
- You might find `pd.melt()` useful.
- You might also find `df.drop()` useful.
- Use `df.column.astype(type)` to convert a column from one type to another.
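
The hints above can be tried on a toy table first. The following is a minimal sketch, assuming a made-up two-country frame (`wide`) that only mimics the `Y_<year>` layout of the EDGAR sheets; the values are invented and the real sheets carry more identifier columns.

```python
import pandas as pd

# Toy wide table that mimics the sheet layout (values are made up)
wide = pd.DataFrame({
    "Name": ["A", "B"],
    "Y_1970": [1.0, None],
    "Y_1971": [2.0, 3.0],
})

long = (
    # One row per (Name, Year) pair instead of one column per year
    pd.melt(wide, id_vars=["Name"], var_name="Year", value_name="CO2")
    # Remove rows without a CO2 measurement
    .dropna(subset=["CO2"])
    # Turn the label "Y_1970" into the integer 1970
    .assign(Year=lambda d: d["Year"].str.replace("Y_", "", regex=False).astype(int))
)
print(long)
```

Chaining `melt`, `dropna`, and `assign` keeps every step on a fresh frame, which avoids in-place edits on a filtered view.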

In [None]:
# Your code here (for the learner)



In [5]:
# DO NOT MODIFY THIS CELL

# Run this cell to determine if you've done the above correctly
# If there are no error messages, you are correct :)

# Check if the columns have been renamed
# and if the year columns have been melted into Year and CO2
test_cols_1 = [c.lower() for c in ipcc_2006_africa.columns]
test_cols_2 = [c.lower() for c in totals_by_country_africa.columns]

test_cols_1.sort()
test_cols_2.sort()

assert test_cols_1 == ['co2', 'code', 'fossil_bio', 'industry', 'name', 'region', 'year'],\
    "Have you renamed 'ipcc_2006_africa' columns?"

assert test_cols_2 == ['co2', 'code', 'name', 'region', 'year'], "Have you renamed 'totals_by_country_africa' columns?"


# Check if columns were dropped
assert len(ipcc_2006_africa.columns) == 7,\
    "Have you dropped `IPCC_annex`, `ipcc_code_2006_for_standard_report`, and `Substance` from ipcc_2006_africa dataset?"

assert len(totals_by_country_africa.columns) == 5,\
    "Have you dropped `IPCC_annex`, and `Substance` from totals_by_country_africa dataset?"


# Check that rows with missing CO2 have been dropped
assert ipcc_2006_africa[ipcc_2006_africa.CO2.isnull()].size == 0, "Did you drop the rows with missing CO2?"
assert totals_by_country_africa[totals_by_country_africa.CO2.isnull()].size == 0, "Did you drop the rows with missing CO2?"

# Check that Year column is an integer
assert ipcc_2006_africa.Year.dtype == int, "Have you converted the Year column to an integer?"
assert totals_by_country_africa.Year.dtype == int, "Have you converted the Year column to an integer?"

In [4]:
# Instruction 1 Solution
# NOTE: THE SOLUTION CODE WON'T BE IN THE SAME NOTEBOOK (OF A LEARNER). 
# WE'LL PLACE IT IN A SEPARATE solutions.py file BEFORE PUBLISHING!

# rename columns (reassign rather than use inplace=True, which can raise
# SettingWithCopyWarning on frames produced by filtering)
ipcc_2006_africa = ipcc_2006_africa.rename(columns={'C_group_IM24_sh': 'Region', 'Country_code_A3': 'Code',
                                                    'ipcc_code_2006_for_standard_report_name': 'Industry'})

totals_by_country_africa = totals_by_country_africa.rename(columns={'C_group_IM24_sh': 'Region', 'Country_code_A3': 'Code'})

# drop columns
ipcc_2006_africa = ipcc_2006_africa.drop(columns=['IPCC_annex', 'ipcc_code_2006_for_standard_report', 'Substance'])
totals_by_country_africa = totals_by_country_africa.drop(columns=['IPCC_annex', 'Substance'])


# Melt and clean Year column
def melt_clean(df):
    # year columns look like Y_1970 ... Y_2021; every other column identifies the row
    value_vars = [c for c in df.columns if c.startswith('Y_')]
    id_vars = [c for c in df.columns if c not in value_vars]

    # melt
    long = pd.melt(df, id_vars=id_vars, value_vars=value_vars, var_name='Year', value_name='CO2')

    # drop rows where co2 is missing
    long = long.dropna(subset=['CO2']).reset_index(drop=True)

    # convert year to integer
    long['Year'] = long['Year'].str.replace('Y_', '', regex=False).astype(int)

    return long


ipcc_2006_africa = melt_clean(ipcc_2006_africa)
totals_by_country_africa = melt_clean(totals_by_country_africa)

## Instruction 2: Create a line plot to show the trend of `CO2` levels across the African regions

### Tasks

In [None]:
# Your code here

In [None]:
# tests

In [None]:
# Solutions
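
One possible approach to this instruction is sketched below. It assumes the tidy columns `Region`, `Year`, and `CO2` produced in Instruction 1, runs on a small synthetic frame (`toy`, with invented values) rather than the real data, and averages CO2 over the countries of a region, which is an assumption; summing per region would be an equally reasonable reading.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; not needed inside a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Synthetic stand-in for the tidy totals_by_country_africa table
toy = pd.DataFrame({
    "Region": ["Northern_Africa"] * 4 + ["Western_Africa"] * 4,
    "Name": ["P", "P", "Q", "Q"] * 2,
    "Year": [2000, 2001, 2000, 2001] * 2,
    "CO2": [10.0, 12.0, 20.0, 22.0, 5.0, 6.0, 7.0, 8.0],
})

# Average CO2 across the countries of each region, per year
trend = toy.groupby(["Region", "Year"], as_index=False)["CO2"].mean()

# One line per region
fig, ax = plt.subplots(figsize=(8, 4))
for region, grp in trend.groupby("Region"):
    ax.plot(grp["Year"], grp["CO2"], marker="o", label=region)
ax.set_xlabel("Year")
ax.set_ylabel("Mean CO2 (kt)")
ax.set_title("CO2 levels across African regions")
ax.legend(title="Region")
```

Since the notebook already imports seaborn, `sns.lineplot` with `hue="Region"` would likely give a similar figure in a single call.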

## Instruction 3: Determine the relationship between time (`Year`) and `CO2` levels across the African regions

### Tasks

In [None]:
# Your code here

In [None]:
# tests

In [None]:
# Solutions
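
A simple way to quantify the relationship is a correlation coefficient per region. The sketch below assumes the tidy `Region`, `Year`, and `CO2` columns and uses a synthetic frame (`toy`) with invented values; Pearson correlation is chosen here only for illustration.

```python
import pandas as pd

# Synthetic data: one region rising steadily, one falling steadily
toy = pd.DataFrame({
    "Region": ["Northern_Africa"] * 4 + ["Southern_Africa"] * 4,
    "Year": [2000, 2001, 2002, 2003] * 2,
    "CO2": [1.0, 2.0, 3.0, 4.0, 8.0, 6.0, 4.0, 2.0],
})

# Pearson correlation between Year and CO2 within each region
corr_by_region = pd.Series({
    region: grp["Year"].corr(grp["CO2"])
    for region, grp in toy.groupby("Region")
})
print(corr_by_region)
```

A coefficient near +1 or -1 indicates a strong linear trend over time. For trends that are monotonic but not linear, passing `method="spearman"` to `Series.corr` may be a better fit.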

## Instruction 4: Determine if there is a significant difference in the `CO2` levels among the African regions

### Tasks

In [None]:
# Your code here

In [None]:
# tests

In [None]:
# Solutions
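
A one-way ANOVA is one candidate test for this instruction. The sketch below is illustrative only: it assumes SciPy is available, uses a synthetic frame (`toy`) with invented values, and takes a 0.05 significance level as an assumption. ANOVA also presumes roughly normal groups with similar variances, which should be checked on the real data.

```python
import pandas as pd
from scipy import stats

# Synthetic CO2 samples for three regions (values are made up)
toy = pd.DataFrame({
    "Region": ["Eastern_Africa"] * 4 + ["Northern_Africa"] * 4 + ["Western_Africa"] * 4,
    "CO2": [1.0, 2.0, 3.0, 2.0, 50.0, 52.0, 49.0, 51.0, 10.0, 11.0, 9.0, 10.0],
})

# One array of CO2 values per region
samples = [grp["CO2"].to_numpy() for _, grp in toy.groupby("Region")]

# Null hypothesis: all regions share the same mean CO2
f_stat, p_value = stats.f_oneway(*samples)

alpha = 0.05  # assumed significance level
print(f"F = {f_stat:.1f}, p = {p_value:.3g}")
print("significant difference" if p_value < alpha else "no significant difference")
```

If the assumptions look doubtful, `scipy.stats.kruskal` offers a rank-based alternative with the same calling pattern. A significant result only says that at least one region differs; a post-hoc test is needed to say which.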

## Instruction 5: Determine the most common (top 5) industries in each African region

### Tasks

In [None]:
# Your code here

In [None]:
# tests

In [None]:
# Solutions
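
Reading "most common" as "appearing in the most rows" (an assumption), the task reduces to counting rows per `Region` and `Industry` pair and keeping the five largest counts in each region. The sketch below uses a synthetic frame (`toy`) with made-up industry labels.

```python
import pandas as pd

# Synthetic stand-in for the tidy ipcc_2006_africa table
toy = pd.DataFrame({
    "Region": ["Northern_Africa"] * 5 + ["Western_Africa"] * 3,
    "Industry": ["Cement", "Cement", "Cement", "Road", "Road",
                 "Road", "Road", "Cement"],
})

# Number of rows per (Region, Industry), largest counts first within each region
counts = (
    toy.groupby(["Region", "Industry"])
    .size()
    .reset_index(name="Count")
    .sort_values(["Region", "Count"], ascending=[True, False])
)

# Keep at most five industries per region
top5 = counts.groupby("Region").head(5).reset_index(drop=True)
print(top5)
```

The toy data has only two industries per region, so `head(5)` returns both; on the real sheet it would cut each region down to five rows.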

## Instruction 6: Determine the industry responsible for the largest amount of CO2 (on average) in each African region

### Tasks

In [None]:
# Your code here

In [None]:
# tests

In [None]:
# Solutions
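
One way to approach this is to average `CO2` per `Region` and `Industry`, then pick the row with the highest mean in each region. The sketch below assumes the tidy column names from Instruction 1 and uses a synthetic frame (`toy`) with invented values and labels.

```python
import pandas as pd

# Synthetic stand-in for the tidy ipcc_2006_africa table
toy = pd.DataFrame({
    "Region": ["Northern_Africa"] * 4 + ["Western_Africa"] * 4,
    "Industry": ["Cement", "Cement", "Road", "Road"] * 2,
    "CO2": [10.0, 20.0, 30.0, 50.0, 9.0, 11.0, 1.0, 3.0],
})

# Mean CO2 for every (Region, Industry) pair
mean_co2 = toy.groupby(["Region", "Industry"], as_index=False)["CO2"].mean()

# idxmax gives, per region, the row label holding the largest mean
top_rows = mean_co2.groupby("Region")["CO2"].idxmax()
max_co2_industries = mean_co2.loc[top_rows].reset_index(drop=True)
print(max_co2_industries)
```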

## Instruction 7: Predict the `CO2` levels (for each African region) in the year 2025

### Tasks

In [None]:
# Your code here

In [None]:
# tests

In [None]:
# Solutions
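
As a baseline, a straight line can be fitted to each region's history and extrapolated to 2025. This is a minimal sketch rather than a recommended model: it assumes a linear trend, uses `np.polyfit` on a synthetic frame (`toy`) with invented values, and ignores uncertainty in the forecast.

```python
import numpy as np
import pandas as pd

# Synthetic regional series (values are made up)
toy = pd.DataFrame({
    "Region": ["Northern_Africa"] * 3 + ["Western_Africa"] * 3,
    "Year": [2019, 2020, 2021] * 2,
    "CO2": [100.0, 110.0, 120.0, 40.0, 42.0, 44.0],
})

target_year = 2025
predictions = {}
for region, grp in toy.groupby("Region"):
    years = grp["Year"].to_numpy(dtype=float)
    co2 = grp["CO2"].to_numpy(dtype=float)
    # Centre the years so the least-squares fit stays well conditioned
    centre = years.mean()
    slope, intercept = np.polyfit(years - centre, co2, deg=1)
    predictions[region] = slope * (target_year - centre) + intercept

print(predictions)
```

On the real data it would be worth holding out the most recent years to check how well the linear assumption extrapolates before trusting a 2025 figure.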

## Instruction 8: Determine if `CO2` levels affect annual `temperature` in the selected African countries

### Tasks

In [None]:
# Your code here

In [None]:
# tests

In [None]:
# Solutions
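
A simple linear regression of temperature on CO2 is one way to start on this instruction. The sketch below is heavily hedged: the layout of `temperatures.csv` is not shown in this notebook, so the wide `Year` plus one-column-per-country shape used here is hypothetical, as are all values. It also assumes SciPy is available.

```python
import pandas as pd
from scipy import stats

# Hypothetical tidy CO2 totals for one case-study country (values are made up)
co2 = pd.DataFrame({
    "Name": ["Nigeria"] * 5,
    "Year": [2000, 2001, 2002, 2003, 2004],
    "CO2": [100.0, 110.0, 120.0, 130.0, 140.0],
})

# Hypothetical temperature table: the real file may be shaped differently
temps = pd.DataFrame({
    "Year": [2000, 2001, 2002, 2003, 2004],
    "Nigeria": [26.8, 26.9, 27.1, 27.2, 27.3],
})

# Reshape temperatures to long form so both tables share Name and Year keys
temps_long = temps.melt(id_vars="Year", var_name="Name", value_name="Temperature")
joined = co2.merge(temps_long, on=["Name", "Year"], how="inner")

# Regress annual temperature on CO2
result = stats.linregress(joined["CO2"], joined["Temperature"])
print(f"slope = {result.slope:.4f}, p = {result.pvalue:.3g}")
```

A small p-value for the slope indicates an association between the two series. It does not by itself show that CO2 causes the temperature change, since both can trend with time.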