# Group Project part 01

#### Deadline for the code submission: October 10th at 08:59 am CET

#### Reminder
- your group is the one assigned to you by the University.
- one goal of this project is to learn how to work as a group, which is the standard in the tech industry. Therefore you need to resolve group issues on your own, as a group.
- if you did not manage to resolve the group issues on your own, you need to escalate to the teacher early, not last minute.
- if the group splits, it would result in a 0 for the whole group.

**Penalty for unexcused absence or lateness**:
- If you are absent or late on presentation day without an official excuse, you will receive 0 for the presentation part of the group project.
- If you are late without an official excuse and can still make it to the presentation of your team, you will still receive 0 for the presentation part of the group project.

## Objective
In this project, you utilise your skills to :
- collect data through multiple APIs and open source datasets, for both quantitative and qualitative data
- merge data from different sources
- describe and analyse datasets
- uncover patterns, insights
- calculate aggregated measures, statistics
- create compelling data visualisations
- write clean code
- tell a story and convince your audience

Each group can pick one and one only scenario among the following ones.

Be mindful to pick a topic that enables enough data collection and analysis in order to showcase all the skills gathered during the course, listed above.

### Scenario 01: Become a Business Manager

Your task is to design a local business that leverages data from various APIs to make informed, strategic decisions. Whether you're launching a street food stand, a drink shop, or another local venture, your team will gather and analyze relevant data —such as foot traffic, weather patterns, customer trends, or competitor insights— to shape your business plan. Your final deliverable will be a data-supported report and/or presentation to a management board, demonstrating how your findings guide key decisions in operations, marketing, or product offerings. The ultimate goal: to optimize performance and increase the chances of business success. Will your business thrive in today’s data-driven world?
Examples:
- lemonade stands business
- food truck business
- delivery service

### Scenario 02: Fact Check Popular beliefs

You are part of a fact-checking research team investigating common beliefs, trending opinions, or viral social media claims (e.g. “drinking lemon water boosts metabolism” or “blue light ruins your sleep”). Your goal is to dig into reliable sources, data, and expert opinions to determine whether these beliefs hold up under scrutiny. Use data to challenge or prove real-world claims with clear, persuasive insights. Drawing on research, statistics, and visual evidence, your team will present a well-supported explanation to help your audience separate fact from fiction.

You may also choose to divide the group into two sides—one defending the belief and the other challenging it—before presenting your findings in a debate or side-by-side analysis.

Examples:
- Electric cars are always better for the environment
- Areas with more green space have better physical and mental health outcomes.
- Does public sentiment on social media predict stock market trends?

## 01 - Getting Ready: first questions

Depending on the scenario you picked, please consider the following questions to help you get started.

### Scenario 01: Become a Business Manager

   - What kind of business do we run? What do we sell ? The choice of the business must be original and unique to your group.
   - How do we name our business?
   - When do we operate? Is it an all-year-round business or a seasonal one? If so, which seasons? Which months / weeks / days / hours of the day do we operate?
   - Where do we operate? In which countries / cities are we currently active ? Where do we want to develop in the future ? Determine where to set up your business stand based on weather conditions, local attractions, or events.
   - Which datasets will assist us in making our business the most successful?

### Scenario 02: Fact-Check a popular belief

- What specific belief or claim do you want to investigate ?
- Why is this belief important or worth fact-checking ?
- What evidence or data supports or contradicts the belief ?
- Will you split the team into two group (in favor / against) ?
- What real-world impact does this belief have on people ?
- What are the consequences if people continue believing or acting on this (true or false) idea ?

## 02 - Collect data from multiple APIs, the more the merrier

Integrate with as many APIs as you can e.g.:
- OpenWeatherMap API
- Google Maps,
- TripAdvisor,
- News API,
- Yelp,
- Wikipedia,
- Booking,
- Amadeus Travel API,
- Foursquare,
- etc. (make your own research and be original!)

Each API can provide different types of information. Pick the ones that best suit your scenario.

After collecting all the data you need, save them.

•⁠  ⁠What specific belief or claim do you want to investigate ?

We aim to investigate the correlation between government funds spend on combatting climate change and the actual effect of these funds measured in the CO2 emission of countries over time.

•⁠  ⁠Why is this belief important or worth fact-checking ?

This belief matters, because a significant amount of public funding is going toward climate change initiatives. Understanding whether these initiatives are truly effective helps ensure accountability, smart use of resources, and real environmental progress. Fact-checking this belief can also shape how policies are developed and whether the public supports future climate budgets.


•⁠  ⁠What evidence or data supports or contradicts the belief ?
Data from the OECD and the World Bank show a mixed picture. In some countries, higher government spending on climate programs has led to lower emissions and more renewable energy use. But in others, similar investments haven’t made much difference. This suggests that money alone isn’t always enough — how it’s used matters. Because of this, it’s too early to take a clear stance, and both datasets are needed to understand the full story.


•⁠  ⁠Will you split the team into two group (in favor / against) ?

Yes, this topic is so complex that we will have Pro and Contra arguments in our final product.

•⁠  ⁠What real-world impact does this belief have on people ?

This belief shapes how people see the government's role in fighting climate change and most importantly, how willing they are to support it through taxes or public programs. It also affects things like job opportunities in clean energy, the cost of electricity, and how quickly we can move toward a low-carbon future.

•⁠  ⁠What are the consequences if people continue believing or acting on this (true or false) idea ?

True: Ongoing funding could speed up climate action, protect people’s health, and help build a stronger, more sustainable economy.

False: Money could be wasted, public trust could take a hit, and real progress on climate solutions might be delayed—leading to even greater environmental and economic costs down the line.

In [3]:
import wbgapi as wb
import pandas as pd

energy_indicators = [
    "EG.EGY.PRIM.PP.KD", "EG.ELC.ACCS.RU.ZS", "EG.ELC.ACCS.UR.ZS", "EG.ELC.ACCS.ZS",
    "EG.ELC.COAL.ZS", "EG.ELC.FOSL.ZS", "EG.ELC.HYRO.ZS", "EG.ELC.LOSS.ZS",
    "EG.ELC.NGAS.ZS", "EG.ELC.NUCL.ZS", "EG.ELC.PETR.ZS", "EG.ELC.RNEW.ZS",
    "EG.ELC.RNWX.KH", "EG.ELC.RNWX.ZS", "EG.FEC.RNEW.ZS", "EG.GDP.PUSE.KO.PP",
    "EG.GDP.PUSE.KO.PP.KD", "EG.IMP.CONS.ZS", "EG.USE.COMM.CL.ZS", "EG.USE.COMM.FO.ZS",
    "EG.USE.COMM.GD.PP.KD", "EG.USE.CRNW.ZS", "EG.USE.ELEC.KH.PC", "EG.USE.PCAP.KG.OE"
]

worldb_energy_df = wb.data.DataFrame(
        energy_indicators,
        time=range(2000, 2020),
        skipBlanks=True,
        labels=True,
        columns='series'
    ).reset_index()

worldb_energy_df.head(50)

ModuleNotFoundError: No module named 'wbgapi'

In [None]:
envghg_indicators = [
    "EN.CLC.DRSK.XQ", "EN.CLC.MDAT.ZS",
    "EN.GHG.ALL.LU.MT.CE.AR5", "EN.GHG.ALL.MT.CE.AR5", "EN.GHG.ALL.PC.CE.AR5", "EN.GHG.CH4.AG.MT.CE.AR5",
    "EN.GHG.CO2.AG.MT.CE.AR5", "EN.GHG.CO2.BU.MT.CE.AR5", "EN.GHG.CO2.FE.MT.CE.AR5",
    "EN.GHG.CO2.IC.MT.CE.AR5", "EN.GHG.CO2.IP.MT.CE.AR5", "EN.GHG.CO2.LU.DF.MT.CE.AR5", "EN.GHG.CO2.LU.FL.MT.CE.AR5",
    "EN.GHG.CO2.LU.MT.CE.AR5", "EN.GHG.CO2.LU.OL.MT.CE.AR5", "EN.GHG.CO2.LU.OS.MT.CE.AR5", "EN.GHG.CO2.MT.CE.AR5",
    "EN.GHG.CO2.PC.CE.AR5", "EN.GHG.CO2.PI.MT.CE.AR5", "EN.GHG.CO2.RT.GDP.KD", "EN.GHG.CO2.RT.GDP.PP.KD",
    "EN.GHG.CO2.TR.MT.CE.AR5", "EN.GHG.CO2.WA.MT.CE.AR5", "EN.GHG.CO2.ZG.AR5", "EN.GHG.FGAS.IP.MT.CE.AR5",
    "EN.GHG.TOT.ZG.AR5"
]

worldb_envghg_df = wb.data.DataFrame(
        envghg_indicators,
        time=range(2000, 2020),
        skipBlanks=True,
        labels=True,
        columns='series'
    ).reset_index()

worldb_envghg_df.head(50)

In [None]:
envfin_indicators = [
    "NY.GDP.MKTP.CD",
    "NY.GDP.MKTP.CN",
    "NY.GDP.MKTP.KD.ZG",
    "SP.POP.TOTL",
    "NY.ADJ.DCO2.CD",
    "NY.ADJ.DCO2.GN.ZS",
    "NY.ADJ.DPEM.CD",
    "NY.ADJ.DPEM.GN.ZS",
    "DT.NFL.UNEP.CD"
]

worldb_envfin_df = wb.data.DataFrame(
        envfin_indicators,
        time=range(2000, 2020),
        skipBlanks=True,
        labels=True,
        columns='series'
    ).reset_index()

worldb_envfin_df.head(50)

In [2]:
import pandas as pd
import os
os.getcwd()
path = '/Users/CedricESMT/ESMT/Term 1/Data wrangling/data_wrangling_team_pluto'

worldb_energy_df = pd.read_csv(path + '/data/worldb_energy.csv')
worldb_envfin_df = pd.read_csv(path + '/data/worldb_envfin.csv')
worldb_envghg_df = pd.read_csv(path + '/data/worldb_envghg.csv')


In [3]:
worldb_energy_df['economy_Time'] = worldb_energy_df['economy'].astype(str) + '_' + worldb_energy_df['Time'].astype(str)

cols = ['economy_Time'] + [col for col in worldb_energy_df.columns if col != 'economy_Time']
worldb_energy_df = worldb_energy_df[cols]

worldb_energy_df.head()

worldb_envfin_df['economy_Time'] = worldb_envfin_df['economy'].astype(str) + '_' + worldb_envfin_df['Time'].astype(str)

cols = ['economy_Time'] + [col for col in worldb_envfin_df.columns if col != 'economy_Time']
worldb_envfin_df = worldb_envfin_df[cols]

worldb_envfin_df.head()

worldb_envghg_df['economy_Time'] = worldb_envghg_df['economy'].astype(str) + '_' + worldb_envghg_df['Time'].astype(str)

cols = ['economy_Time'] + [col for col in worldb_envghg_df.columns if col != 'economy_Time']
worldb_envghg_df = worldb_envghg_df[cols]

worldb_envghg_df.head()

Unnamed: 0.1,economy_Time,Unnamed: 0,economy,time,Country,Time,EN.CLC.DRSK.XQ,EN.CLC.MDAT.ZS,EN.GHG.ALL.LU.MT.CE.AR5,EN.GHG.ALL.MT.CE.AR5,...,EN.GHG.CO2.MT.CE.AR5,EN.GHG.CO2.PC.CE.AR5,EN.GHG.CO2.PI.MT.CE.AR5,EN.GHG.CO2.RT.GDP.KD,EN.GHG.CO2.RT.GDP.PP.KD,EN.GHG.CO2.TR.MT.CE.AR5,EN.GHG.CO2.WA.MT.CE.AR5,EN.GHG.CO2.ZG.AR5,EN.GHG.FGAS.IP.MT.CE.AR5,EN.GHG.TOT.ZG.AR5
0,ZMB_2011,0,ZMB,YR2011,Zambia,2011,3.75,,-19.8048,18.7092,...,3.2873,0.227687,0.0349,0.18841,0.068259,1.3439,,8.563408,0.0001,20.846413
1,YEM_2011,1,YEM,YR2011,"Yemen, Rep.",2011,2.25,,45.4791,47.3661,...,23.0001,0.833854,5.2967,0.418006,,6.657,,223.612342,2.1099,170.57839
2,VEN_2011,2,VEN,YR2011,"Venezuela, RB",2011,2.75,,-41.5935,257.1569,...,172.9572,5.926193,26.3558,,,50.8393,0.0153,68.609273,4.0841,57.45219
3,VUT_2011,3,VUT,YR2011,Vanuatu,2011,2.0,,-6.2252,0.7118,...,0.1796,0.737656,0.0407,0.255844,0.217536,0.1006,,75.219512,,50.677392
4,USA_2011,4,USA,YR2011,United States,2011,3.5,,5507.7404,6509.5922,...,5307.9147,17.021305,2245.8501,0.319856,0.282532,1633.5907,0.0033,6.503141,187.8107,4.836258


In [9]:
worldb_df_merge_one = pd.merge(worldb_energy_df, worldb_envfin_df, on="economy_Time", how="left")
worldb_df_merged = pd.merge(worldb_df_merge_one, worldb_envghg_df, on="economy_Time", how="inner")

worldb_df_merged.shape

(5160, 75)

In [15]:
worldb_df_final = worldb_df_merged.drop(columns=['Unnamed: 0_x','economy_x', 'time_x', 'Country_x', 'Time_x', 'economy_y', 'time_y', 'Country_y', 'Time_y', 'economy', 'time', 'Country', 'Time'], axis=1)
worldb_df_final.shape

(5160, 62)

In [16]:
column_names = worldb_df_final.columns.tolist()
print(column_names)

['economy_Time', 'EG.EGY.PRIM.PP.KD', 'EG.ELC.ACCS.RU.ZS', 'EG.ELC.ACCS.UR.ZS', 'EG.ELC.ACCS.ZS', 'EG.ELC.COAL.ZS', 'EG.ELC.FOSL.ZS', 'EG.ELC.HYRO.ZS', 'EG.ELC.LOSS.ZS', 'EG.ELC.NGAS.ZS', 'EG.ELC.NUCL.ZS', 'EG.ELC.PETR.ZS', 'EG.ELC.RNEW.ZS', 'EG.ELC.RNWX.KH', 'EG.ELC.RNWX.ZS', 'EG.FEC.RNEW.ZS', 'EG.GDP.PUSE.KO.PP', 'EG.GDP.PUSE.KO.PP.KD', 'EG.IMP.CONS.ZS', 'EG.USE.COMM.CL.ZS', 'EG.USE.COMM.FO.ZS', 'EG.USE.COMM.GD.PP.KD', 'EG.USE.CRNW.ZS', 'EG.USE.ELEC.KH.PC', 'EG.USE.PCAP.KG.OE', 'Unnamed: 0_y', 'DT.NFL.UNEP.CD', 'NY.ADJ.DCO2.CD', 'NY.ADJ.DCO2.GN.ZS', 'NY.ADJ.DPEM.CD', 'NY.ADJ.DPEM.GN.ZS', 'NY.GDP.MKTP.CD', 'NY.GDP.MKTP.CN', 'NY.GDP.MKTP.KD.ZG', 'SP.POP.TOTL', 'Unnamed: 0', 'EN.CLC.DRSK.XQ', 'EN.CLC.MDAT.ZS', 'EN.GHG.ALL.LU.MT.CE.AR5', 'EN.GHG.ALL.MT.CE.AR5', 'EN.GHG.ALL.PC.CE.AR5', 'EN.GHG.CH4.AG.MT.CE.AR5', 'EN.GHG.CO2.AG.MT.CE.AR5', 'EN.GHG.CO2.BU.MT.CE.AR5', 'EN.GHG.CO2.FE.MT.CE.AR5', 'EN.GHG.CO2.IC.MT.CE.AR5', 'EN.GHG.CO2.IP.MT.CE.AR5', 'EN.GHG.CO2.LU.DF.MT.CE.AR5', 'EN.GHG.CO2.LU

In [17]:
worldb_data = worldb_df_final.rename(columns={
    'EG.EGY.PRIM.PP.KD': 'Energy intensity level of primary energy (MJ/$2017 PPP GDP)',
    'EG.ELC.ACCS.RU.ZS': 'Access to electricity, rural (% of rural population)',
    'EG.ELC.ACCS.UR.ZS': 'Access to electricity, urban (% of urban population)',
    'EG.ELC.ACCS.ZS': 'Access to electricity (% of population)',
    'EG.ELC.COAL.ZS': 'Electricity production from coal sources (% of total)',
    'EG.ELC.FOSL.ZS': 'Electricity production from oil, gas and coal sources (% of total)',
    'EG.ELC.HYRO.ZS': 'Electricity production from hydroelectric sources (% of total)',
    'EG.ELC.LOSS.ZS': 'Electric power transmission and distribution losses (% of output)',
    'EG.ELC.NGAS.ZS': 'Electricity production from natural gas sources (% of total)',
    'EG.ELC.NUCL.ZS': 'Electricity production from nuclear sources (% of total)',
    'EG.ELC.PETR.ZS': 'Electricity production from oil sources (% of total)',
    'EG.ELC.RNEW.ZS': 'Renewable electricity output (% of total electricity output)',
    'EG.ELC.RNWX.KH': 'Electricity production from renewable sources, excluding hydroelectric (kWh)',
    'EG.ELC.RNWX.ZS': 'Electricity production from renewable sources, excluding hydroelectric (% of total)',
    'EG.FEC.RNEW.ZS': 'Renewable energy consumption (% of total final energy consumption)',
    'EG.GDP.PUSE.KO.PP': 'GDP per unit of energy use (PPP $ per kg of oil equivalent)',
    'EG.GDP.PUSE.KO.PP.KD': 'GDP per unit of energy use (constant 2021 PPP $ per kg of oil equivalent)',
    'EG.IMP.CONS.ZS': 'Energy imports, net (% of energy use)',
    'EG.USE.COMM.CL.ZS': 'Alternative and nuclear energy (% of total energy use)',
    'EG.USE.COMM.FO.ZS': 'Fossil fuel energy consumption (% of total)',
    'EG.USE.COMM.GD.PP.KD': 'Energy use (kg of oil equivalent) per $1,000 GDP (constant 2021 PPP)',
    'EG.USE.CRNW.ZS': 'Combustible renewables and waste (% of total energy)',
    'EG.USE.ELEC.KH.PC': 'Electric power consumption (kWh per capita)',
    'EG.USE.PCAP.KG.OE': 'Energy use (kg of oil equivalent per capita)',
    'DT.NFL.UNEP.CD': 'Net official flows from UN agencies, UNEP (current US$)',
    'NY.ADJ.DCO2.CD': 'Adjusted savings: carbon dioxide damage (current US$)',
    'NY.ADJ.DCO2.GN.ZS': 'Adjusted savings: carbon dioxide damage (% of GNI)',
    'NY.ADJ.DPEM.CD': 'Adjusted savings: particulate emission damage (current US$)',
    'NY.ADJ.DPEM.GN.ZS': 'Adjusted savings: particulate emission damage (% of GNI)',
    'NY.GDP.MKTP.CD': 'GDP (current US$)',
    'NY.GDP.MKTP.CN': 'GDP (current LCU)',
    'NY.GDP.MKTP.KD.ZG': 'GDP growth (annual %)',
    'SP.POP.TOTL': 'Population, total',
    'EN.CLC.DRSK.XQ': 'Disaster risk reduction progress score (1-5 scale; 5=best)',
    'EN.CLC.MDAT.ZS': 'Droughts, floods, extreme temperatures (% of population, average 1990-2009)',
    'EN.GHG.ALL.LU.MT.CE.AR5': 'Total greenhouse gas emissions including LULUCF (Mt CO2e)',
    'EN.GHG.ALL.MT.CE.AR5': 'Total greenhouse gas emissions excluding LULUCF (Mt CO2e)',
    'EN.GHG.ALL.PC.CE.AR5': 'Total greenhouse gas emissions excluding LULUCF per capita (t CO2e/capita)',
    'EN.GHG.CH4.AG.MT.CE.AR5': 'Methane (CH4) emissions from Agriculture (Mt CO2e)',
    'EN.GHG.CO2.AG.MT.CE.AR5': 'Carbon dioxide (CO2) emissions from Agriculture (Mt CO2e)',
    'EN.GHG.CO2.BU.MT.CE.AR5': 'Carbon dioxide (CO2) emissions from Building (Energy) (Mt CO2e)',
    'EN.GHG.CO2.FE.MT.CE.AR5': 'Carbon dioxide (CO2) emissions from Fugitive Emissions (Energy) (Mt CO2e)',
    'EN.GHG.CO2.IC.MT.CE.AR5': 'Carbon dioxide (CO2) emissions from Industrial Combustion (Energy) (Mt CO2e)',
    'EN.GHG.CO2.IP.MT.CE.AR5': 'Carbon dioxide (CO2) emissions from Industrial Processes (Mt CO2e)',
    'EN.GHG.CO2.LU.DF.MT.CE.AR5': 'Carbon dioxide (CO2) net fluxes from LULUCF - Deforestation (Mt CO2e)',
    'EN.GHG.CO2.LU.FL.MT.CE.AR5': 'Carbon dioxide (CO2) net fluxes from LULUCF - Forest Land (Mt CO2e)',
    'EN.GHG.CO2.LU.MT.CE.AR5': 'Carbon dioxide (CO2) net fluxes from LULUCF - Total excluding non-tropical fires (Mt CO2e)',
    'EN.GHG.CO2.LU.OL.MT.CE.AR5': 'Carbon dioxide (CO2) net fluxes from LULUCF - Other Land (Mt CO2e)',
    'EN.GHG.CO2.LU.OS.MT.CE.AR5': 'Carbon dioxide (CO2) net fluxes from LULUCF - Organic Soil (Mt CO2e)',
    'EN.GHG.CO2.MT.CE.AR5': 'Carbon dioxide (CO2) emissions (total) excluding LULUCF (Mt CO2e)',
    'EN.GHG.CO2.PC.CE.AR5': 'Carbon dioxide (CO2) emissions excluding LULUCF per capita (t CO2e/capita)',
    'EN.GHG.CO2.PI.MT.CE.AR5': 'Carbon dioxide (CO2) emissions from Power Industry (Energy) (Mt CO2e)',
    'EN.GHG.CO2.RT.GDP.KD': 'Carbon intensity of GDP (kg CO2e per constant 2015 US$ of GDP)',
    'EN.GHG.CO2.RT.GDP.PP.KD': 'Carbon intensity of GDP (kg CO2e per 2021 PPP $ of GDP)',
    'EN.GHG.CO2.TR.MT.CE.AR5': 'Carbon dioxide (CO2) emissions from Transport (Energy) (Mt CO2e)',
    'EN.GHG.CO2.WA.MT.CE.AR5': 'Carbon dioxide (CO2) emissions from Waste (Mt CO2e)',
    'EN.GHG.CO2.ZG.AR5': 'Carbon dioxide (CO2) emissions (total) excluding LULUCF (% change from 1990)',
    'EN.GHG.FGAS.IP.MT.CE.AR5': 'F-gases emissions from Industrial Processes (Mt CO2e)',
    'EN.GHG.TOT.ZG.AR5': 'Total greenhouse gas emissions excluding LULUCF (% change from 1990)'
})

column_names = worldb_df.columns.tolist()
print(column_names)

['economy_Time', 'Unnamed: 0_x', 'EG.EGY.PRIM.PP.KD', 'EG.ELC.ACCS.RU.ZS', 'EG.ELC.ACCS.UR.ZS', 'EG.ELC.ACCS.ZS', 'EG.ELC.COAL.ZS', 'EG.ELC.FOSL.ZS', 'EG.ELC.HYRO.ZS', 'EG.ELC.LOSS.ZS', 'EG.ELC.NGAS.ZS', 'EG.ELC.NUCL.ZS', 'EG.ELC.PETR.ZS', 'EG.ELC.RNEW.ZS', 'EG.ELC.RNWX.KH', 'EG.ELC.RNWX.ZS', 'EG.FEC.RNEW.ZS', 'EG.GDP.PUSE.KO.PP', 'EG.GDP.PUSE.KO.PP.KD', 'EG.IMP.CONS.ZS', 'EG.USE.COMM.CL.ZS', 'EG.USE.COMM.FO.ZS', 'EG.USE.COMM.GD.PP.KD', 'EG.USE.CRNW.ZS', 'EG.USE.ELEC.KH.PC', 'EG.USE.PCAP.KG.OE', 'Unnamed: 0_y', 'DT.NFL.UNEP.CD', 'NY.ADJ.DCO2.CD', 'NY.ADJ.DCO2.GN.ZS', 'NY.ADJ.DPEM.CD', 'NY.ADJ.DPEM.GN.ZS', 'NY.GDP.MKTP.CD', 'NY.GDP.MKTP.CN', 'NY.GDP.MKTP.KD.ZG', 'SP.POP.TOTL', 'Unnamed: 0', 'EN.CLC.DRSK.XQ', 'EN.CLC.MDAT.ZS', 'EN.GHG.ALL.LU.MT.CE.AR5', 'EN.GHG.ALL.MT.CE.AR5', 'EN.GHG.ALL.PC.CE.AR5', 'EN.GHG.CH4.AG.MT.CE.AR5', 'EN.GHG.CO2.AG.MT.CE.AR5', 'EN.GHG.CO2.BU.MT.CE.AR5', 'EN.GHG.CO2.FE.MT.CE.AR5', 'EN.GHG.CO2.IC.MT.CE.AR5', 'EN.GHG.CO2.IP.MT.CE.AR5', 'EN.GHG.CO2.LU.DF.MT.CE.AR5'

In [18]:
worldb_data.head(50)

Unnamed: 0,economy_Time,Energy intensity level of primary energy (MJ/$2017 PPP GDP),"Access to electricity, rural (% of rural population)","Access to electricity, urban (% of urban population)",Access to electricity (% of population),Electricity production from coal sources (% of total),"Electricity production from oil, gas and coal sources (% of total)",Electricity production from hydroelectric sources (% of total),Electric power transmission and distribution losses (% of output),Electricity production from natural gas sources (% of total),...,Carbon dioxide (CO2) emissions (total) excluding LULUCF (Mt CO2e),Carbon dioxide (CO2) emissions excluding LULUCF per capita (t CO2e/capita),Carbon dioxide (CO2) emissions from Power Industry (Energy) (Mt CO2e),Carbon intensity of GDP (kg CO2e per constant 2015 US$ of GDP),Carbon intensity of GDP (kg CO2e per 2021 PPP $ of GDP),Carbon dioxide (CO2) emissions from Transport (Energy) (Mt CO2e),Carbon dioxide (CO2) emissions from Waste (Mt CO2e),Carbon dioxide (CO2) emissions (total) excluding LULUCF (% change from 1990),F-gases emissions from Industrial Processes (Mt CO2e),Total greenhouse gas emissions excluding LULUCF (% change from 1990)
0,ZWE_2019,14.35,28.3,85.4,46.7,48.223229,48.223229,49.66611,18.387789,0.0,...,11.1692,0.731382,4.9028,0.539034,0.22198,2.2368,,-35.841275,0.4795,-17.405963
1,ZWE_2018,13.64,26.4,85.4,45.4,44.113566,44.113566,53.879816,20.706586,0.0,...,12.27,0.816126,5.3006,0.554661,0.228416,2.6232,,-29.518,0.4345,-12.927973
2,ZWE_2017,13.62,24.2,85.5,44.0,44.802013,45.43769,52.549331,22.910873,0.0,...,10.5854,0.714627,5.0041,0.502482,0.206928,1.9961,,-39.19477,0.3836,-18.811867
3,ZWE_2016,14.18,21.9,85.5,42.5,53.974607,56.776152,41.195142,22.426166,0.0,...,11.2663,0.771649,5.9458,0.560124,0.230666,1.8381,,-35.283508,0.3419,-17.29795
4,ZWE_2015,14.59,10.9,81.2,33.7,46.771702,47.29688,51.385027,17.032231,0.0,...,12.6895,0.881276,6.7615,0.635649,0.261768,2.3409,,-27.108285,0.2997,-12.060964
5,ZWE_2014,14.69,7.7,83.4,32.3,43.831655,44.40012,54.163758,16.445597,0.0,...,12.5977,0.886702,6.7737,0.642282,0.264499,2.5505,,-27.635607,0.2712,-15.211948
6,ZWE_2013,14.79,18.9,85.5,40.6,45.430809,46.088773,52.177546,17.785901,0.0,...,12.7159,0.907384,6.717,0.663719,0.273327,2.8374,,-26.956637,0.2488,-8.787702
7,ZWE_2012,14.61,23.8,85.4,44.0,39.027039,39.645998,58.529699,19.252905,0.0,...,12.74,0.921993,5.5053,0.678206,0.279293,2.5546,,-26.8182,0.2421,-9.386864
8,ZWE_2011,16.48,14.1,83.2,36.9,41.547619,42.294372,56.287879,17.705628,0.0,...,15.8457,1.165517,8.8828,0.984116,0.40527,2.2545,,-8.978269,0.2374,-0.988995
9,ZWE_2010,17.83,15.8,85.3,38.9,31.626194,32.178617,66.739556,19.749108,0.0,...,10.065,0.753563,4.2801,0.713824,0.293961,1.3179,,-42.18408,0.2358,-19.985443


In [21]:
worldb_data.to_csv(path + '/data/worldb_data.csv', index=False)

## 03 - Collect data from open source data sources

The dataset must align with your end-goal and serve its purpose.

- governement websites
- statistics institutes,
- etc.

At the end of this step, you should have collected both quantitative and qualitative data.