# API Quest
## Oslo

# HYPOTHESIS
- Rich countries have more Nobel Prizes
    - Nobel prize winners immigrate towards rich countries
    - Nobel prize winners immigrate towards stable countries
- Countries of birth / early education have more impact than countries of higher education
- Nobel Prizes Laureates are getting younger
- Nobel Prizes are awarded more to international teams than before

- Gender Differences: Is there a significant difference in the gender ratio among Nobel Prize winners? Has this changed over time?
- Geographic Distribution: In which countries or regions are Nobel Prize winners predominantly located? Has this distribution changed over time?
- Age of Winners: What is the age distribution of Nobel Prize winners? Are there any noticeable trends in age?
- Publications: Are there specific journals where Nobel Prize winners’ research is commonly published? How influential are these journals?

## HYPOTHESIS 1
- Men are over represented in Nobel Prizes

# DATA SOURCES

1. **Nobel Laureates Data**
	- **Nobel Prize Official Data**
	  - Description: Comprehensive information on all Nobel laureates, including their age, nationality, affiliation, prize category, and motivation.
	  - Link: [Nobel Prize Official Website](https://www.nobelprize.org/organization/developer-zone-2/)
	  - API: [Nobel Prize API](https://www.nobelprize.org/organization/developer-zone-2/)
	- **Kaggle Nobel Laureates Dataset**
	  - Description: A dataset compiled from the Nobel Prize official data, available in CSV format for easy analysis.
	  - Link: [Kaggle Nobel Prize Dataset](https://www.kaggle.com/datasets/imdevskp/nobel-prize/data)

3. **Economic Indicators**
	- **World Bank GDP Data**
	  - Description: GDP per capita and other economic indicators for countries worldwide.
	  - Link: [World Bank GDP per Capita](https://data.worldbank.org/indicator/NY.GDP.PCAP.CD)
	- **Heritage Foundation Index of Economic Freedom**
	  - Description: Measures economic freedom in countries across 12 quantitative and qualitative factors.
	  - Link: [Index of Economic Freedom](https://www.heritage.org/index/)

4. **Education Expenditure and Statistics**
	- **UNESCO Education Data**
	  - Description: Data on government expenditure on education as a percentage of GDP and total government expenditure.
	  - Link: [UNESCO Education Expenditure](http://data.uis.unesco.org/)
	- **OECD Education Statistics**
	  - Description: Detailed statistics on education spending, enrollment rates, and educational attainment among OECD countries.
	  - Link: [OECD Education at a Glance](https://www.oecd.org/education/education-at-a-glance/)

5. **Gender Statistics**
	- **UNESCO Gender Parity Index**
	  - Description: Data on gender parity in education and literacy rates.
	  - Link: [UNESCO Gender Equality Data](http://data.uis.unesco.org/)
	- **World Bank Gender Data Portal**
	  - Description: Comprehensive data on gender equality indicators globally.
	  - Link: [World Bank Gender Data](https://datatopics.worldbank.org/gender/)

18. **Demographic and Socioeconomic Data**
	 - **United Nations Educational, Scientific and Cultural Organization (UNESCO) Institute for Statistics**
		- Description: Data on education, literacy rates, and demographic factors.
		- Link: [UNESCO UIS Data](http://data.uis.unesco.org/)
	 - **OECD Social and Welfare Statistics**
		- Description: Indicators on social protection, income inequality, and more.
		- Link: [OECD Social Data](https://www.oecd.org/social/soc/)



## Selected data sources

1. Nobel API
2. https://uis.unesco.org/
3. https://databank.worldbank.org/source/world-development-indicators

In [355]:
#imports
import os
import json
import requests
import pandas as pd
from dotenv import load_dotenv
import seaborn as sb

In [356]:
#settings
pd.set_option('display.max_colwidth', 900)
pd.set_option('display.max_rows', 1000)

In [357]:
#load env
load_dotenv()
token = os.getenv('TOKEN')
print(token)

test


In [358]:
#TODO: Get the data from the API
enrollment_df = pd.read_csv('sources/school_enrolment_gender.csv')
enrollment_df.head()

laureates_url = 'https://api.nobelprize.org/2.1/laureates'

In [359]:
def flatten(dictionnary, prefix=''):
    flattened = pd.json_normalize(dictionnary)

    if prefix:
        flattened = flattened.add_prefix(prefix + '.')
    
    for column in flattened.columns:
        sample = flattened[column].iloc[0]

        if isinstance(sample, list) and len(sample) > 0 and isinstance(sample[0], dict):

            inner_dict = flattened[column].apply(
                lambda x: x[0] if isinstance(x, list) and len(x) > 0 else None)

            flattened = pd.concat([flattened, flatten(inner_dict, column)])
            flattened.drop(column, axis=1, inplace=True)

    return flattened


flattened = flatten(raw_data['laureates'])
display(flattened.head())
display(pd.DataFrame(flattened.columns))

Unnamed: 0,id,fileName,gender,sameAs,knownName.en,knownName.se,givenName.en,givenName.se,familyName.en,familyName.se,...,nobelPrizes.affiliations.countryNow.latitude,nobelPrizes.affiliations.countryNow.longitude,nobelPrizes.affiliations.continent.en,nobelPrizes.affiliations.locationString.en,nobelPrizes.affiliations.locationString.no,nobelPrizes.affiliations.locationString.se,nobelPrizes.links.rel,nobelPrizes.links.href,nobelPrizes.links.action,nobelPrizes.links.types
0,745,spence,male,"[https://www.wikidata.org/wiki/Q157245, https://en.wikipedia.org/wiki/Michael_Spence]",A. Michael Spence,A. Michael Spence,A. Michael,A. Michael,Spence,Spence,...,,,,,,,,,,
1,102,bohr,male,"[https://www.wikidata.org/wiki/Q103854, https://en.wikipedia.org/wiki/Aage_Bohr]",Aage N. Bohr,Aage N. Bohr,Aage N.,Aage N.,Bohr,Bohr,...,,,,,,,,,,
2,779,ciechanover,male,"[https://www.wikidata.org/wiki/Q233205, https://en.wikipedia.org/wiki/Aaron_Ciechanover]",Aaron Ciechanover,Aaron Ciechanover,Aaron,Aaron,Ciechanover,Ciechanover,...,,,,,,,,,,
3,259,klug,male,"[https://www.wikidata.org/wiki/Q190626, https://en.wikipedia.org/wiki/Aaron_Klug]",Aaron Klug,Aaron Klug,Aaron,Aaron,Klug,Klug,...,,,,,,,,,,
4,1004,gurnah,male,"[https://www.wikidata.org/wiki/Q317877, https://en.wikipedia.org/wiki/Abdulrazak_Gurnah]",Abdulrazak Gurnah,Abdulrazak Gurnah,Abdulrazak,Abdulrazak,Gurnah,Gurnah,...,,,,,,,,,,


Unnamed: 0,0
0,id
1,fileName
2,gender
3,sameAs
4,knownName.en
5,knownName.se
6,givenName.en
7,givenName.se
8,familyName.en
9,familyName.se


In [360]:
#export_csv = flattened.to_csv(r'./sources/nobel_laureates.csv', index = None, header=True)

In [361]:
def get_all_laureates():
    offset = 0
    limit = 25
    max = 1000
    all_laureates = []
    
    while offset < max:
        url = f"{laureates_url}?offset={offset}&limit={limit}"
        response = requests.get(url)
        data = response.json()
        
        flattened = flatten(data['laureates'])
        all_laureates.append(flattened)
        offset += limit

    return get_all_laureates()

laureates_df = get_all_laureates()
laureates_df