# Group Project part 01

#### Deadline for the code submission: October 10th at 08:59 am CET

#### Reminder
- your group is the one assigned to you by the University.
- one goal of this project is to learn how to work as a group, which is the standard in the tech industry. Therefore you need to resolve group issues on your own, as a group.
- if you cannot resolve the group issues on your own, escalate to the teacher early, not at the last minute.
- if the group splits, the whole group will receive a 0.

**Penalty for unexcused absence or lateness**:
- If you are absent or late on presentation day without an official excuse, you will receive 0 for the presentation part of the group project.
- If you are late without an official excuse and can still make it to the presentation of your team, you will still receive 0 for the presentation part of the group project.

## Objective
In this project, you will use your skills to:
- collect data through multiple APIs and open source datasets, for both quantitative and qualitative data
- merge data from different sources
- describe and analyse datasets
- uncover patterns, insights
- calculate aggregated measures, statistics
- create compelling data visualisations
- write clean code
- tell a story and convince your audience

Each group must pick exactly one of the following scenarios.

Pick a topic that allows enough data collection and analysis to showcase all the skills listed above.

### Scenario 01: Become a Business Manager

Your task is to design a local business that leverages data from various APIs to make informed, strategic decisions. Whether you're launching a street food stand, a drink shop, or another local venture, your team will gather and analyze relevant data (such as foot traffic, weather patterns, customer trends, or competitor insights) to shape your business plan. Your final deliverable will be a data-supported report and/or presentation to a management board, demonstrating how your findings guide key decisions in operations, marketing, or product offerings. The ultimate goal: to optimize performance and increase the chances of business success. Will your business thrive in today’s data-driven world?

Examples:
- lemonade stands business
- food truck business
- delivery service

### Scenario 02: Fact-Check Popular Beliefs

You are part of a fact-checking research team investigating common beliefs, trending opinions, or viral social media claims (e.g. “drinking lemon water boosts metabolism” or “blue light ruins your sleep”). Your goal is to dig into reliable sources, data, and expert opinions to determine whether these beliefs hold up under scrutiny. Use data to challenge or prove real-world claims with clear, persuasive insights. Drawing on research, statistics, and visual evidence, your team will present a well-supported explanation to help your audience separate fact from fiction.

You may also choose to divide the group into two sides—one defending the belief and the other challenging it—before presenting your findings in a debate or side-by-side analysis.

Examples:
- Electric cars are always better for the environment
- Areas with more green space have better physical and mental health outcomes.
- Does public sentiment on social media predict stock market trends?

## 01 - Getting Ready: first questions

Depending on the scenario you picked, please consider the following questions to help you get started.

### Scenario 01: Become a Business Manager

   - What kind of business do we run? What do we sell? The choice of business must be original and unique to your group.
   - What do we name our business?
   - When do we operate? Is it a year-round business or a seasonal one? If seasonal, which seasons? Which months / weeks / days / hours of the day do we operate?
   - Where do we operate? In which countries / cities are we currently active? Where do we want to expand in the future? Determine where to set up your business stand based on weather conditions, local attractions, or events.
   - Which datasets will help us make our business as successful as possible?

### Scenario 02: Fact-Check a Popular Belief

- What specific belief or claim do you want to investigate?

We aim to investigate the correlation between government funds spent on combating climate change and the actual effect of those funds, measured as countries' CO2 emissions over time.

- Why is this belief important or worth fact-checking?

This belief matters because a significant amount of public funding goes toward climate change initiatives. Understanding whether these initiatives are truly effective helps ensure accountability, smart use of resources, and real environmental progress. Fact-checking this belief can also shape how policies are developed and whether the public supports future climate budgets.

- What evidence or data supports or contradicts the belief?

Data from the OECD and the World Bank show a mixed picture. In some countries, higher government spending on climate programs has led to lower emissions and more renewable energy use. In others, similar investments haven’t made much difference. This suggests that money alone isn’t always enough: how it is used matters. Because of this, it’s too early to take a clear stance, and both datasets are needed to understand the full story.

- Will you split the team into two groups (in favor / against)?

Yes. The topic is complex enough that our final product will present arguments both for and against.

- What real-world impact does this belief have on people?

This belief shapes how people see the government's role in fighting climate change and, most importantly, how willing they are to support it through taxes or public programs. It also affects things like job opportunities in clean energy, the cost of electricity, and how quickly we can move toward a low-carbon future.

- What are the consequences if people continue believing or acting on this (true or false) idea?

True: Ongoing funding could speed up climate action, protect people’s health, and help build a stronger, more sustainable economy.

False: Money could be wasted, public trust could take a hit, and real progress on climate solutions might be delayed, leading to even greater environmental and economic costs down the line.
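
The correlation question stated in the answers above can be made concrete with a Pearson coefficient. The following is a minimal sketch only: the function name `pearson_r` and the two short series are illustrative assumptions (the numbers are made up, not taken from any dataset), and a real analysis would compute this per country on the merged data.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of co-deviations and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Made-up illustrative numbers, NOT real data.
spending = [1.0, 2.0, 3.0, 4.0, 5.0]
emissions = [10.0, 9.0, 8.5, 7.0, 6.5]
r = pearson_r(spending, emissions)
print(f"r = {r:.3f}")
```

A coefficient close to -1 or +1 only describes how closely the two series move together. It does not show that spending caused the change in emissions, which is why the qualitative context is needed as well.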

## 02 - Collect data from multiple APIs, the more the merrier

Integrate with as many APIs as you can, for example:
- OpenWeatherMap API
- Google Maps
- TripAdvisor
- News API
- Yelp
- Wikipedia
- Booking
- Amadeus Travel API
- Foursquare
- etc. (do your own research and be original!)

Each API can provide different types of information. Pick the ones that best suit your scenario.

After collecting all the data you need, save it to disk.
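
One possible way to save collected API data is to write each source to its own JSON file together with the retrieval time, so the analysis can be rerun without calling the API again. This is a sketch under assumptions: the helper names `save_raw` and `load_raw`, the default folder `data/raw`, and the file layout are all hypothetical choices, not part of the assignment.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def save_raw(records, name, out_dir="data/raw"):
    """Write JSON-serialisable records to <out_dir>/<name>.json and return the path."""
    out_path = Path(out_dir)
    out_path.mkdir(parents=True, exist_ok=True)
    payload = {
        "source": name,
        # Store the retrieval time (UTC) so stale downloads are easy to spot.
        "retrieved_at": datetime.now(timezone.utc).isoformat(),
        "records": records,
    }
    target = out_path / f"{name}.json"
    target.write_text(json.dumps(payload, ensure_ascii=False, indent=2), encoding="utf-8")
    return target


def load_raw(path):
    """Read back only the records stored by save_raw."""
    return json.loads(Path(path).read_text(encoding="utf-8"))["records"]
```

For example, `save_raw(articles, "wikipedia")` would store a list of dictionaries under `data/raw/wikipedia.json`, and `load_raw` returns that list again.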

In [2]:
!pip install wikipedia==1.4.0
!pip install wbgapi

Collecting wikipedia==1.4.0
  Downloading wikipedia-1.4.0.tar.gz (27 kB)
  Preparing metadata (setup.py) ... done
Building wheels for collected packages: wikipedia
  Building wheel for wikipedia (setup.py) ... done
  Created wheel for wikipedia: filename=wikipedia-1.4.0-py3-none-any.whl size=11678 sha256=03acb07e62910e8bce2e01eb753dec60c2cbbd3ef57c7655b60cef28c303e4d2
  Stored in directory: /root/.cache/pip/wheels/63/47/7c/a9688349aa74d228ce0a9023229c6c0ac52ca2a40fe87679b8
Successfully built wikipedia
Installing collected packages: wikipedia
Successfully installed wikipedia-1.4.0
Collecting wbgapi
  Downloading wbgapi-1.0.12-py3-none-any.whl.metadata (13 kB)
Downloading wbgapi-1.0.12-py3-none-any.whl (36 kB)
Installing collected packages: wbgapi
Successfully installed wbgapi-1.0.12


In [3]:
from google.colab import drive
drive.mount('/content/drive')

import wbgapi as wb
import pandas as pd
import wikipedia

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


### Using the World Bank API
We use the World Bank API to retrieve a rich set of quantitative data, including GHG emissions, financial data on climate fund budgets and spending, as well as GDP and demographic data.

In [23]:
energy_indicators = [
    "EG.EGY.PRIM.PP.KD", "EG.ELC.ACCS.RU.ZS", "EG.ELC.ACCS.UR.ZS", "EG.ELC.ACCS.ZS",
    "EG.ELC.COAL.ZS", "EG.ELC.FOSL.ZS", "EG.ELC.HYRO.ZS", "EG.ELC.LOSS.ZS",
    "EG.ELC.NGAS.ZS", "EG.ELC.NUCL.ZS", "EG.ELC.PETR.ZS", "EG.ELC.RNEW.ZS",
    "EG.ELC.RNWX.KH", "EG.ELC.RNWX.ZS", "EG.FEC.RNEW.ZS", "EG.GDP.PUSE.KO.PP",
    "EG.GDP.PUSE.KO.PP.KD", "EG.IMP.CONS.ZS", "EG.USE.COMM.CL.ZS", "EG.USE.COMM.FO.ZS",
    "EG.USE.COMM.GD.PP.KD", "EG.USE.CRNW.ZS", "EG.USE.ELEC.KH.PC", "EG.USE.PCAP.KG.OE"
]

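# Download the energy indicators for all economies, one column per indicator.
worldb_energy_df = wb.data.DataFrame(
    energy_indicators,
    time=range(2000, 2020),  # the end of range() is exclusive, so the last year is 2019
    skipBlanks=True,
    labels=True,  # adds the readable Country and Time columns
    columns='series'
).reset_index()

# Save the download; index=False avoids writing the row index as an extra unnamed column.
worldb_energy_df.to_csv('/content/drive/MyDrive/Group_Project/data/worldb_energy.csv', index=False)

worldb_energy_df.head(5)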

Unnamed: 0,economy,time,Country,Time,EG.EGY.PRIM.PP.KD,EG.ELC.ACCS.RU.ZS,EG.ELC.ACCS.UR.ZS,EG.ELC.ACCS.ZS,EG.ELC.COAL.ZS,EG.ELC.FOSL.ZS,...,EG.FEC.RNEW.ZS,EG.GDP.PUSE.KO.PP,EG.GDP.PUSE.KO.PP.KD,EG.IMP.CONS.ZS,EG.USE.COMM.CL.ZS,EG.USE.COMM.FO.ZS,EG.USE.COMM.GD.PP.KD,EG.USE.CRNW.ZS,EG.USE.ELEC.KH.PC,EG.USE.PCAP.KG.OE
0,ZWE,YR2019,Zimbabwe,2019,14.35,28.3,85.4,46.7,48.223229,48.223229,...,81.0,7.791175,7.993887,21.200162,5.69,0.0,125.095585,46.695835,520.778492,412.16542
1,ZWE,YR2018,Zimbabwe,2018,13.64,26.4,85.4,45.4,44.113566,44.113566,...,79.7,5.872757,8.026014,23.868661,6.49,0.0,124.594856,43.476042,641.65957,445.175095
2,ZWE,YR2017,Zimbabwe,2017,13.62,24.2,85.5,44.0,44.802013,45.43769,...,82.0,16.927992,8.297782,22.227173,5.54,0.0,120.514128,46.623946,542.717959,416.196233
3,ZWE,YR2016,Zimbabwe,2016,14.18,21.9,85.5,42.5,53.974607,56.776152,...,81.7,6.60894,7.905079,20.515526,4.15,0.0,126.500944,45.686486,511.770516,423.185509
4,ZWE,YR2015,Zimbabwe,2015,14.59,10.9,81.2,33.7,46.771702,47.29688,...,80.8,5.83502,7.422367,18.295154,6.57,51.176466,134.727916,42.781337,552.537872,453.579543


In [22]:
envghg_indicators = [
    "EN.CLC.DRSK.XQ", "EN.CLC.MDAT.ZS",
    "EN.GHG.ALL.LU.MT.CE.AR5", "EN.GHG.ALL.MT.CE.AR5", "EN.GHG.ALL.PC.CE.AR5", "EN.GHG.CH4.AG.MT.CE.AR5",
    "EN.GHG.CO2.AG.MT.CE.AR5", "EN.GHG.CO2.BU.MT.CE.AR5", "EN.GHG.CO2.FE.MT.CE.AR5",
    "EN.GHG.CO2.IC.MT.CE.AR5", "EN.GHG.CO2.IP.MT.CE.AR5", "EN.GHG.CO2.LU.DF.MT.CE.AR5", "EN.GHG.CO2.LU.FL.MT.CE.AR5",
    "EN.GHG.CO2.LU.MT.CE.AR5", "EN.GHG.CO2.LU.OL.MT.CE.AR5", "EN.GHG.CO2.LU.OS.MT.CE.AR5", "EN.GHG.CO2.MT.CE.AR5",
    "EN.GHG.CO2.PC.CE.AR5", "EN.GHG.CO2.PI.MT.CE.AR5", "EN.GHG.CO2.RT.GDP.KD", "EN.GHG.CO2.RT.GDP.PP.KD",
    "EN.GHG.CO2.TR.MT.CE.AR5", "EN.GHG.CO2.WA.MT.CE.AR5", "EN.GHG.CO2.ZG.AR5", "EN.GHG.FGAS.IP.MT.CE.AR5",
    "EN.GHG.TOT.ZG.AR5"
]

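# Download the greenhouse gas and climate indicators for all economies, one column per indicator.
worldb_envghg_df = wb.data.DataFrame(
    envghg_indicators,
    time=range(2000, 2020),  # the end of range() is exclusive, so the last year is 2019
    skipBlanks=True,
    labels=True,  # adds the readable Country and Time columns
    columns='series'
).reset_index()

# Save the download; index=False avoids writing the row index as an extra unnamed column.
worldb_envghg_df.to_csv('/content/drive/MyDrive/Group_Project/data/worldb_envghg.csv', index=False)

worldb_envghg_df.head(5)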


Unnamed: 0,economy,time,Country,Time,EN.CLC.DRSK.XQ,EN.CLC.MDAT.ZS,EN.GHG.ALL.LU.MT.CE.AR5,EN.GHG.ALL.MT.CE.AR5,EN.GHG.ALL.PC.CE.AR5,EN.GHG.CH4.AG.MT.CE.AR5,...,EN.GHG.CO2.MT.CE.AR5,EN.GHG.CO2.PC.CE.AR5,EN.GHG.CO2.PI.MT.CE.AR5,EN.GHG.CO2.RT.GDP.KD,EN.GHG.CO2.RT.GDP.PP.KD,EN.GHG.CO2.TR.MT.CE.AR5,EN.GHG.CO2.WA.MT.CE.AR5,EN.GHG.CO2.ZG.AR5,EN.GHG.FGAS.IP.MT.CE.AR5,EN.GHG.TOT.ZG.AR5
0,ZMB,YR2011,Zambia,2011,3.75,,-19.8048,18.7092,1.295849,3.4788,...,3.2873,0.227687,0.0349,0.18841,0.068259,1.3439,,8.563408,0.0001,20.846413
1,YEM,YR2011,"Yemen, Rep.",2011,2.25,,45.4791,47.3661,1.717227,5.011,...,23.0001,0.833854,5.2967,0.418006,,6.657,,223.612342,2.1099,170.57839
2,VEN,YR2011,"Venezuela, RB",2011,2.75,,-41.5935,257.1569,8.811205,27.967,...,172.9572,5.926193,26.3558,,,50.8393,0.0153,68.609273,4.0841,57.45219
3,VUT,YR2011,Vanuatu,2011,2.0,,-6.2252,0.7118,2.923515,0.3468,...,0.1796,0.737656,0.0407,0.255844,0.217536,0.1006,,75.219512,,50.677392
4,USA,YR2011,United States,2011,3.5,,5507.7404,6509.5922,20.874819,244.5116,...,5307.9147,17.021305,2245.8501,0.319856,0.282532,1633.5907,0.0033,6.503141,187.8107,4.836258


In [18]:
envfin_indicators = [
    "NY.GDP.MKTP.CD",
    "NY.GDP.MKTP.CN",
    "NY.GDP.MKTP.KD.ZG",
    "SP.POP.TOTL",
    "NY.ADJ.DCO2.CD",
    "NY.ADJ.DCO2.GN.ZS",
    "NY.ADJ.DPEM.CD",
    "NY.ADJ.DPEM.GN.ZS",
    "DT.NFL.UNEP.CD"
]

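# Download the GDP, population and financial indicators for all economies, one column per indicator.
worldb_envfin_df = wb.data.DataFrame(
    envfin_indicators,
    time=range(2000, 2020),  # the end of range() is exclusive, so the last year is 2019
    skipBlanks=True,
    labels=True,  # adds the readable Country and Time columns
    columns='series'
).reset_index()

# Save the download; index=False avoids writing the row index as an extra unnamed column.
worldb_envfin_df.to_csv('/content/drive/MyDrive/Group_Project/data/worldb_envfin.csv', index=False)

worldb_envfin_df.head(5)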

Unnamed: 0,economy,time,Country,Time,DT.NFL.UNEP.CD,NY.ADJ.DCO2.CD,NY.ADJ.DCO2.GN.ZS,NY.ADJ.DPEM.CD,NY.ADJ.DPEM.GN.ZS,NY.GDP.MKTP.CD,NY.GDP.MKTP.CN,NY.GDP.MKTP.KD.ZG,SP.POP.TOTL
0,ZWE,YR2019,Zimbabwe,2019,,479560100.0,2.231775,347206600.0,1.615829,25715660000.0,84913100.0,-6.33245,15271368.0
1,ZWE,YR2018,Zimbabwe,2018,,485032900.0,1.450331,520956700.0,1.55775,34156060000.0,27859600.0,5.009921,15034452.0
2,ZWE,YR2017,Zimbabwe,2017,,386892700.0,2.481075,283331000.0,1.816952,51074730000.0,25619900.0,4.734411,14812482.0
3,ZWE,YR2016,Zimbabwe,2016,,395745300.0,2.198269,349918700.0,1.943713,20548760000.0,8223700.0,0.755794,14600294.0
4,ZWE,YR2015,Zimbabwe,2015,,432226200.0,2.45515,337453500.0,1.916818,19963060000.0,7989300.0,1.779854,14399013.0


### Using the Wikipedia API for qualitative context
We will use the Wikipedia API to gather qualitative background information that complements our quantitative World Bank datasets. Collecting this contextual data will help explain trends and provide narratives about how and why emissions changed in specific countries.

In [24]:
wikipedia.set_lang('en')

topics = [
"Climate change",
"Climate change mitigation",
"Climate change adaptation",
"United Nations Framework Convention on Climate Change",
"Kyoto Protocol",
"Paris Agreement",
"Nationally determined contribution",
"Loss and damage (climate change)",
"Loss and Damage Fund",
"Climate finance",
"Green Climate Fund",
"The Adaptation Fund",
"Global Environment Facility",
"European Union Emissions Trading System",
"Emissions trading",
"Carbon price",
"Carbon tax",
"Carbon emission trading",
"EU Carbon Border Adjustment Mechanism",
"European Green Deal",
"Fit for 55",
"Carbon budget",
"Global Carbon Project",
"List of countries by carbon dioxide emissions",
"List of countries by carbon dioxide emissions per capita",
"List of countries by greenhouse gas emissions",
"List of countries by greenhouse gas emissions per capita",
"List of countries by carbon intensity",
"Fossil fuel subsidies",
"Greenhouse gas emissions from agriculture",
"Climate Change Performance Index",
"Climate change in Germany"
]

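articles = []

# Fetch the summary of each topic; a failed lookup is stored as None instead of stopping the loop.
for t in topics:
    try:
        s = wikipedia.summary(t)
    except Exception:
        s = None
    articles.append({'title': t, 'extract': s})

# Print every summary for a quick manual check.
for a in articles:
    print('===', a['title'], '===')
    print(a['extract'])
    print()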

=== Climate change ===
Present-day climate change includes both global warming—the ongoing increase in global average temperature—and its wider effects on Earth's climate system. Climate change in a broader sense also includes previous long-term changes to Earth's climate. The current rise in global temperatures is driven by human activities, especially fossil fuel (coal, oil and natural gas) burning since the Industrial Revolution. Fossil fuel use, deforestation, and some agricultural and industrial practices release greenhouse gases. These gases absorb some of the heat that the Earth radiates after it warms from sunlight, warming the lower atmosphere. Carbon dioxide, the primary gas driving global warming, has increased in concentration by about 50% since the pre-industrial era to levels not seen for millions of years.
Climate change has an increasingly large impact on the environment. Deserts are expanding, while heat waves and wildfires are becoming more common. Amplified warming i
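
The loop above stores `None` when a lookup fails, which hides the reason for the failure. A possible variant, sketched here under assumptions, also records the error message per topic. The names `collect_summaries` and `fake_fetch` are hypothetical; `fake_fetch` is an offline stand-in so the loop can be run without network access, and in the notebook `wikipedia.summary` would be passed instead.

```python
def collect_summaries(topics, fetch):
    """Fetch a summary per topic and return (articles, failures).

    fetch is any callable that takes a title and returns text, for example
    wikipedia.summary. It is passed in so the loop can be tested offline.
    """
    articles, failures = [], {}
    for title in topics:
        try:
            extract = fetch(title)
        except Exception as exc:  # broad on purpose: lookup and network errors alike
            extract = None
            failures[title] = f"{type(exc).__name__}: {exc}"
        articles.append({"title": title, "extract": extract})
    return articles, failures


# Offline stand-in for wikipedia.summary, used only to demonstrate the loop.
def fake_fetch(title):
    if title == "Missing page":
        raise LookupError("no such page")
    return f"Summary of {title}"


articles, failures = collect_summaries(["Paris Agreement", "Missing page"], fake_fetch)
print(failures)
```

Checking `failures` after the run shows which topics need a corrected title before their extracts are used in the analysis.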

## 03 - Collect data from open source data sources

The dataset must align with your end-goal and serve its purpose.

- government websites
- statistics institutes
- etc.

At the end of this step, you should have collected both quantitative and qualitative data.

In [21]:
# Source: https://climatedata.imf.org/datasets/3fb1ed30d3394574b3145246846023b1/explore
environment_imf = pd.read_csv('/content/drive/MyDrive/Group_Project/data/Environment_IMF.csv')
environment_imf.head(5)


Unnamed: 0,ObjectId,Country,ISO2,ISO3,Indicator,Source,CTS Code,CTS Name,CTS Full Descriptor,Unit,...,2012,2013,2014,2015,2016,2017,2018,2019,2020,2021
0,1,Albania,AL,ALB,Environmental Taxes,Organisation for Economic Co-operation and Dev...,ECGTE,Environmental Taxes,"Environment, Climate Change, Government Policy...",Domestic Currency,...,,,,43993140000.0,47813790000.0,47548580000.0,51145590000.0,53415650000.0,,
1,2,Albania,AL,ALB,Environmental Taxes,Organisation for Economic Co-operation and Dev...,ECGTE,Environmental Taxes,"Environment, Climate Change, Government Policy...",Percent of GDP,...,,,,3.067206,3.247163,3.066373,3.124865,3.157133,,
2,3,Albania,AL,ALB,Taxes on Energy (including fuel for transport),Organisation for Economic Co-operation and Dev...,ECGTEN,Taxes on Energy (Including Fuel for Transport),"Environment, Climate Change, Government Policy...",Domestic Currency,...,,,,37741110000.0,40945620000.0,40400040000.0,43521820000.0,45165300000.0,,
3,4,Albania,AL,ALB,Taxes on Energy (including fuel for transport),Organisation for Economic Co-operation and Dev...,ECGTEN,Taxes on Energy (Including Fuel for Transport),"Environment, Climate Change, Government Policy...",Percent of GDP,...,,,,2.631314,2.780726,2.605369,2.659072,2.669496,,
4,5,Albania,AL,ALB,Taxes on Pollution,Organisation for Economic Co-operation and Dev...,ECGTEP,Taxes on Pollution,"Environment, Climate Change, Government Policy...",Domestic Currency,...,,,,1782069000.0,1879970000.0,1941324000.0,2226251000.0,2625011000.0,,


In [20]:
# Source: https://www.climatewatchdata.org/ghg-emissions?chartType=area&end_year=2022&regions=WORLD&start_year=1990
ghg_emissions = pd.read_csv('/content/drive/MyDrive/Group_Project/data/ghg-emissions.csv')
ghg_emissions.head(5)

Unnamed: 0,iso,Country/Region,unit,1990,1991,1992,1993,1994,1995,1996,...,2013,2014,2015,2016,2017,2018,2019,2020,2021,2022
0,CHN,China,MtCO2e,3049.87,3196.9,3338.53,3581.08,3751.23,4148.03,4164.03,...,11175.26,11237.61,11085.85,11135.02,11376.55,11868.7,12109.18,12263.69,12852.14,12851.84
1,USA,United States,MtCO2e,5472.89,5424.7,5506.44,5615.49,5709.34,5777.03,5947.73,...,5795.2,5845.56,5735.74,5804.37,5755.15,5972.88,5851.35,5305.85,5591.52,5670.87
2,IND,India,MtCO2e,1126.56,1181.45,1208.14,1244.03,1293.11,1361.15,1410.07,...,2947.22,3134.57,3179.0,3247.88,3367.94,3506.66,3507.03,3326.2,3584.74,3805.03
3,RUS,Russia,MtCO2e,2618.14,2550.02,2382.89,2186.5,1952.47,1876.57,1832.29,...,1603.53,1596.32,1572.12,1697.9,1746.26,1840.91,1844.3,1759.19,1939.93,1886.05
4,BRA,Brazil,MtCO2e,1675.44,1696.97,1707.24,1717.63,1734.63,1761.52,1764.33,...,1410.09,1450.3,1437.66,1524.43,1550.01,1509.63,1533.43,1537.1,1598.81,1597.22
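
The emissions file above has one column per year, while the World Bank tables have one row per country and year. Before the sources can be merged, the wide layout needs to be reshaped into a long one. The sketch below shows the idea without any dependencies; the function name `wide_to_long` and its parameters are hypothetical, and the two sample values are copied from the preview above. On a DataFrame, pandas `melt` performs the same reshaping.

```python
def wide_to_long(rows, id_keys, value_name="value"):
    """Turn rows with one column per year into one row per (ids, year)."""
    long_rows = []
    for row in rows:
        ids = {key: row[key] for key in id_keys}
        for column, value in row.items():
            # Only four-digit column names are treated as years.
            if column in id_keys or not (column.isdigit() and len(column) == 4):
                continue
            if value in ("", None):
                continue  # skip missing observations
            long_rows.append({**ids, "year": int(column), value_name: float(value)})
    return long_rows


# Two values copied from the ghg-emissions preview (China, 1990 and 1991).
sample = [{"iso": "CHN", "Country/Region": "China", "unit": "MtCO2e",
           "1990": "3049.87", "1991": "3196.9"}]
long_rows = wide_to_long(sample, id_keys=("iso", "Country/Region", "unit"))
for entry in long_rows:
    print(entry)
```

Once every source is in this long form, the country code and the year can serve as the join keys.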


In [19]:
# Source: https://www.climatepolicyinitiative.org/resources/data-visualizations/global-landscape-of-climate-finance-data-dashboard/
cpi_global_climate_financing = pd.read_csv('/content/drive/MyDrive/Group_Project/data/CPI_Global_Climate_Finance_Data.csv')
cpi_global_climate_financing.head(5)

Unnamed: 0,Development_Status_Origin,Region_Destination,SIDs_Destination,Development_Status_Destination,Institution_Type_Layer1,Institution_Type_Layer2,Domestic_International,Instrument,Use,Sector,Sub_Sector,Year,Value
0,Advanced,Central Asia & Eastern Europe,False,Advanced,Private,Commercial FIs,Domestic,Balance sheet financing (debt portion),Mitigation,Energy systems,Power & Heat Generation,2023,0.638006
1,Advanced,Central Asia & Eastern Europe,False,Advanced,Private,Commercial FIs,Domestic,Project-level market rate debt,Mitigation,Energy systems,Power & Heat Generation,2023,0.097378
2,Advanced,Central Asia & Eastern Europe,False,Advanced,Private,Commercial FIs,Domestic,Project-level market rate debt,Mitigation,Waste,Other/Unspecified,2023,0.008648
3,Advanced,Central Asia & Eastern Europe,False,Advanced,Private,Commercial FIs,Domestic,Project-level market rate debt,Dual benefit,"Agriculture, Forestry, Other land uses and Fis...",Forestry,2023,0.000562
4,Advanced,Central Asia & Eastern Europe,False,Advanced,Private,Commercial FIs,Domestic,Project-level market rate debt,Dual benefit,Water & wastewater,Other/Unspecified,2023,0.000337
