
### GroupBy

1. Compute the **total CO₂ emissions per continent** per year by joining `emissions` with `continents`.
2. Find the **top 3 entities by average annual emissions** within each continent.
3. For each continent, calculate the **percentage share of emissions** contributed by the top country vs. the rest for the year 2000.
4. Within each continent, identify the **first and last year** in which emissions were reported and the difference between them.
5. For each decade, compute the **mean, min, and max emissions** per continent.
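
As a minimal sketch of the pattern behind tasks 4 and 5, assuming a tidy frame with hypothetical columns `continent`, `year`, and `emissions` (the values are made up, not taken from the dataset), and reading "the difference between them" as the span in years:

```python
import pandas as pd

# Toy data; column names and values are illustrative assumptions.
toy = pd.DataFrame({
    "continent": ["Asia", "Asia", "Asia", "Europe", "Europe", "Europe"],
    "year": [1995, 2001, 2009, 1998, 2003, 2004],
    "emissions": [10.0, 20.0, 40.0, 5.0, 7.0, 9.0],
})

# Derive the decade by flooring the year to a multiple of 10.
toy["decade"] = toy["year"] // 10 * 10

# Named aggregation: one output column per statistic.
decade_stats = (
    toy.groupby(["continent", "decade"])["emissions"]
    .agg(mean="mean", min="min", max="max")
    .reset_index()
)

# First and last reported year per continent, plus the span between them.
year_span = toy.groupby("continent")["year"].agg(first_year="min", last_year="max")
year_span["span"] = year_span["last_year"] - year_span["first_year"]
```

Named aggregation keeps the result columns flat, which tends to be easier to merge or plot than the MultiIndex columns produced by passing a list of functions.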

---

### Custom `.apply`

6. Write a function that returns a row’s **emission growth rate vs. the previous year** and apply it within each entity group.
7. Within each continent group, apply a function that extracts the **entity with the largest single-year emission spike**.
8. Apply a custom function to compute the **rolling 5-year average** of emissions for each entity.
9. Create a custom apply that flags whether an entity’s **emissions in a given year were above or below its long-term median**.
10. For each continent, apply a function that computes a **dictionary of summary stats** (`min`, `max`, `range`, `std`) across all entities.
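
One way tasks 6 and 9 could be approached is sketched below on made-up data. The column names `entity`, `year`, and `emissions` and the helper names `growth_rate` and `above_median` are assumptions for illustration. Passing `group_keys=False` is intended to keep the original row index on the result, so it can be assigned straight back as a column.

```python
import pandas as pd

# Toy data; names and values are illustrative assumptions.
toy = pd.DataFrame({
    "entity": ["A", "A", "A", "B", "B", "B"],
    "year": [2000, 2001, 2002, 2000, 2001, 2002],
    "emissions": [100.0, 110.0, 121.0, 50.0, 40.0, 60.0],
})
# Rows must be in year order within each entity for shift() to mean "previous year".
toy = toy.sort_values(["entity", "year"])


def growth_rate(s: pd.Series) -> pd.Series:
    """Relative change vs. the previous row; the first row of a group is NaN."""
    return s / s.shift(1) - 1


def above_median(s: pd.Series) -> pd.Series:
    """True where a value exceeds the group's own median."""
    return s > s.median()


grouped = toy.groupby("entity", group_keys=False)["emissions"]
toy["growth"] = grouped.apply(growth_rate)
toy["above_median"] = grouped.apply(above_median)
```

Because each function receives one entity's series at a time, the first year of entity `B` is not compared against the last year of entity `A`.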

---

### Melt & Pivot

11. Reshape the emissions dataset so that **years become columns** and emissions are values, with one row per entity.
12. Melt the reshaped table back to **long format** and ensure it matches the original dataset.
13. Pivot the data to show a **continent × year matrix**, with total emissions as values.
14. Create a pivot table showing **average emissions per continent per decade**, then melt it back into tidy form.
15. Make a wide table with **entities as rows and continents as columns**, where each value is that entity’s total emissions over all years.
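
The round trip in tasks 11 and 12 can be sketched as follows, assuming hypothetical columns `entity`, `year`, and `emissions` and made-up values. The check at the end only passes once row order, index, and the dtype of `year` are restored, which is why those steps are spelled out.

```python
import pandas as pd

# Toy long-format data; names and values are illustrative assumptions.
long_df = pd.DataFrame({
    "entity": ["A", "A", "B", "B"],
    "year": [2000, 2001, 2000, 2001],
    "emissions": [1.0, 2.0, 3.0, 4.0],
})

# Long -> wide: one row per entity, one column per year.
wide = long_df.pivot(index="entity", columns="year", values="emissions")

# Wide -> long: melt the year columns back into a single column.
back = (
    wide.reset_index()
    .melt(id_vars="entity", var_name="year", value_name="emissions")
    .sort_values(["entity", "year"])
    .reset_index(drop=True)
)
# Make sure the year column ends up as integers, as in the original frame.
back["year"] = back["year"].astype("int64")

round_trip_ok = back.equals(long_df)
```

`pivot` raises an error when an index/column pair occurs more than once, so the tasks that aggregate (13 and 14) would typically use `pivot_table` with an `aggfunc` instead.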

---


In [4]:
import pandas as pd 

In [5]:
emissions = pd.read_csv('annual-co-emissions-by-region.csv')
continents = pd.read_csv('continents-according-to-our-world-in-data.csv')

In [6]:
emissions.shape
emissions.info()
emissions.describe()
emissions.columns
emissions.head()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 29137 entries, 0 to 29136
Data columns (total 4 columns):
 #   Column                Non-Null Count  Dtype  
---  ------                --------------  -----  
 0   Entity                29137 non-null  object 
 1   Code                  23497 non-null  object 
 2   Year                  29137 non-null  int64  
 3   Annual CO₂ emissions  29137 non-null  float64
dtypes: float64(1), int64(1), object(2)
memory usage: 910.7+ KB


Unnamed: 0,Entity,Code,Year,Annual CO₂ emissions
0,Afghanistan,AFG,1949,14656.0
1,Afghanistan,AFG,1950,84272.0
2,Afghanistan,AFG,1951,91600.0
3,Afghanistan,AFG,1952,91600.0
4,Afghanistan,AFG,1953,106256.0


In [7]:
conts = ['Africa','Europe','Asia','North America', 'South America', 'Antarctica', 'Australia']

In [8]:
continents.info()
continents.describe()
continents.head()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 272 entries, 0 to 271
Data columns (total 4 columns):
 #   Column                           Non-Null Count  Dtype 
---  ------                           --------------  ----- 
 0   Entity                           272 non-null    object
 1   Code                             257 non-null    object
 2   Year                             272 non-null    int64 
 3   World regions according to OWID  272 non-null    object
dtypes: int64(1), object(3)
memory usage: 8.6+ KB


Unnamed: 0,Entity,Code,Year,World regions according to OWID
0,Afghanistan,AFG,2023,Asia
1,Aland Islands,ALA,2023,Europe
2,Albania,ALB,2023,Europe
3,Algeria,DZA,2023,Africa
4,American Samoa,ASM,2023,Oceania


In [9]:
emissions = emissions.merge(continents, on='Code', how='left', suffixes=('_em', '_cont'))

In [10]:
emissions.rename(columns={'Annual CO₂ emissions':'annual_emissions', 'World regions according to OWID':'continent'}, inplace=True)


In [47]:
emissions

Unnamed: 0,Entity_em,Code,Year_em,annual_emissions,Entity_cont,Year_cont,continent
0,Afghanistan,AFG,1949,14656.0,Afghanistan,2023.0,Asia
1,Afghanistan,AFG,1950,84272.0,Afghanistan,2023.0,Asia
2,Afghanistan,AFG,1951,91600.0,Afghanistan,2023.0,Asia
3,Afghanistan,AFG,1952,91600.0,Afghanistan,2023.0,Asia
4,Afghanistan,AFG,1953,106256.0,Afghanistan,2023.0,Asia
...,...,...,...,...,...,...,...
108092,Zimbabwe,ZWE,2019,10262950.0,Zimbabwe,2023.0,Africa
108093,Zimbabwe,ZWE,2020,8494503.0,Zimbabwe,2023.0,Africa
108094,Zimbabwe,ZWE,2021,10203630.0,Zimbabwe,2023.0,Africa
108095,Zimbabwe,ZWE,2022,10424940.0,Zimbabwe,2023.0,Africa
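
The merged frame's index runs past 108,000, although `emissions` had 29,137 rows before the merge and a left join against a one-row-per-code lookup would leave that count unchanged. A likely cause: pandas matches null keys against each other in a merge, so every emissions row without a `Code` (5,640 of them, per `info()`) is paired with every continents row without a `Code` (15 of them). Assuming the non-null codes in `continents` are unique, that gives 23,497 + 5,640 × 15 = 108,097 rows, which is in line with the index values shown above. It would also explain why aggregates such as `OECD (GCP)` appear under every continent in the later cells. A minimal sketch on made-up data, with one possible guard (the `Region X`/`Region Y` rows and the `lookup` name are hypothetical):

```python
import pandas as pd

# Made-up miniature versions of the two frames; None marks a missing code.
em = pd.DataFrame({
    "Entity": ["Kenya", "Africa (GCP)", "OECD (GCP)"],
    "Code": ["KEN", None, None],
    "annual_emissions": [1.0, 100.0, 60.0],
})
cont = pd.DataFrame({
    "Entity": ["Kenya", "Region X", "Region Y"],  # hypothetical code-less rows
    "Code": ["KEN", None, None],
    "continent": ["Africa", "Europe", "Asia"],
})

# Naive left join: null keys match each other, so code-less rows multiply.
naive = em.merge(cont, on="Code", how="left", suffixes=("_em", "_cont"))

# Guarded join: drop null keys on the right and let pandas verify that
# each remaining code appears at most once there.
lookup = cont.dropna(subset=["Code"])[["Code", "continent"]]
guarded = em.merge(lookup, on="Code", how="left", validate="many_to_one")
```

In the guarded version the aggregate rows keep a missing `continent`, so they drop out of any `groupby('continent')` rather than being counted once per code-less region.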


In [12]:
emissions.groupby(['continent'])['annual_emissions'].sum()

continent
Africa           1.711756e+13
Asia             1.766659e+13
Europe           7.734052e+13
North America    9.028917e+12
Oceania          2.215523e+10
South America    8.578828e+12
Name: annual_emissions, dtype: float64
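
Task 1 asks for totals per continent per year, whereas the cell above sums over all years. A sketch of the two-key version, on made-up values and reusing the column names of the merged frame (`continent`, `Year_em`, `annual_emissions`):

```python
import pandas as pd

# Toy data with the merged frame's column names; values are made up.
toy = pd.DataFrame({
    "continent": ["Africa", "Africa", "Africa", "Asia", "Asia"],
    "Year_em": [2000, 2000, 2001, 2000, 2001],
    "annual_emissions": [1.0, 2.0, 4.0, 10.0, 20.0],
})

# Group on both keys; as_index=False keeps them as ordinary columns.
per_year = toy.groupby(["continent", "Year_em"], as_index=False)["annual_emissions"].sum()
```

The result has one row per continent-year pair, which is also the shape the continent × year pivot in task 13 starts from.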

In [51]:
avg_emissions = emissions.groupby(['continent', 'Entity_em'])['annual_emissions'].mean().reset_index()
avg_emissions

Unnamed: 0,continent,Entity_em,annual_emissions
0,Africa,Africa,3.756870e+08
1,Africa,Africa (GCP),3.022745e+08
2,Africa,Algeria,4.873787e+07
3,Africa,Angola,9.666763e+06
4,Africa,Asia,2.195719e+09
...,...,...,...
360,South America,South America (GCP),2.663666e+08
361,South America,Suriname,1.693519e+06
362,South America,Upper-middle-income countries,2.899448e+09
363,South America,Uruguay,2.791476e+06


In [52]:
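# Sort within each continent, then keep the three largest averages per group.
# This avoids groupby.apply, which emits a deprecation warning here in recent pandas.
top_3_per_continent = (
    avg_emissions.sort_values(['continent', 'annual_emissions'], ascending=[True, False])
    .groupby('continent').head(3).reset_index(drop=True)
)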

top_3_per_continent


Unnamed: 0,continent,Entity_em,annual_emissions
0,Africa,OECD (GCP),5631716000.0
1,Africa,Non-OECD (GCP),4499258000.0
2,Africa,High-income countries,4161000000.0
3,Asia,OECD (GCP),5631716000.0
4,Asia,Non-OECD (GCP),4499258000.0
5,Asia,High-income countries,4161000000.0
6,Europe,OECD (GCP),5631716000.0
7,Europe,Non-OECD (GCP),4499258000.0
8,Europe,High-income countries,4161000000.0
9,North America,OECD (GCP),5631716000.0
