# API Quest
## Oslo

# HYPOTHESES
- Rich countries have more Nobel Prizes
    - Nobel Prize winners migrate to rich countries
    - Nobel Prize winners migrate to stable countries
- Country of birth / early education has more impact than country of higher education
- Nobel Prize laureates are getting younger
- Nobel Prizes are awarded more often to international teams than before

- Gender Differences: Is there a significant difference in the gender ratio among Nobel Prize winners? Has this changed over time?
- Geographic Distribution: In which countries or regions are Nobel Prize winners predominantly located? Has this distribution changed over time?
- Age of Winners: What is the age distribution of Nobel Prize winners? Are there any noticeable trends in age?
- Publications: Are there specific journals where Nobel Prize winners’ research is commonly published? How influential are these journals?
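
As a starting point for the age question, a laureate's age at the time of the award can be approximated as award year minus birth year. The sketch below assumes the record layout of the Nobel Prize API, with a `birth.date` string (an assumption, not checked here) and an `awardYear` per prize. The `age_at_award` helper and the sample records are made up for illustration.

```python
def age_at_award(laureate):
    """Return approximate ages (award year minus birth year), one per prize."""
    birth_date = laureate.get('birth', {}).get('date')
    if not birth_date:
        return []  # organisations and incomplete records have no birth date
    birth_year = int(birth_date[:4])  # only the year is used, so partial dates still work
    return [int(prize['awardYear']) - birth_year
            for prize in laureate.get('nobelPrizes', [])]


# Illustrative sample records that mimic the assumed API layout
sample_laureates = [
    {'birth': {'date': '1867-11-07'},
     'nobelPrizes': [{'awardYear': '1903'}, {'awardYear': '1911'}]},
    {'nobelPrizes': [{'awardYear': '1917'}]},  # no birth date
]

ages = [age for laureate in sample_laureates for age in age_at_award(laureate)]
print(ages)
```

Because only the year is used, the result can be off by one for laureates whose birthday falls after the award date.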

## HYPOTHESIS 1
- Men are overrepresented among Nobel Prize laureates
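
One way to test this hypothesis is to compute the share of each gender per decade of award. The sketch below works on plain `(award_year, gender)` pairs and uses only the standard library. The `gender_share_by_decade` helper is hypothetical, the gender labels `'male'` and `'female'` are assumed, and the rows are made up, not real laureate data.

```python
from collections import Counter, defaultdict


def gender_share_by_decade(rows):
    """Return {decade: {gender: share}} for (award_year, gender) pairs."""
    counts = defaultdict(Counter)
    for year, gender in rows:
        decade = (int(year) // 10) * 10
        counts[decade][gender] += 1
    return {
        decade: {g: n / sum(c.values()) for g, n in c.items()}
        for decade, c in sorted(counts.items())
    }


# Made-up rows for illustration only, not real laureate data
rows = [('1903', 'male'), ('1903', 'female'), ('1905', 'male'), ('1909', 'male'),
        ('2018', 'female'), ('2019', 'male')]
shares = gender_share_by_decade(rows)
print(shares)
```

Records without a gender (organisations, for example) would need to be filtered out or counted under their own label before calling the helper.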

# DATA SOURCES

1. **Nobel Laureates Data**
	- **Nobel Prize Official Data**
	  - Description: Comprehensive information on all Nobel laureates, including their age, nationality, affiliation, prize category, and motivation.
	  - Link: [Nobel Prize Official Website](https://www.nobelprize.org/organization/developer-zone-2/)
	  - API: [Nobel Prize API](https://www.nobelprize.org/organization/developer-zone-2/)
	- **Kaggle Nobel Laureates Dataset**
	  - Description: A dataset compiled from the Nobel Prize official data, available in CSV format for easy analysis.
	  - Link: [Kaggle Nobel Prize Dataset](https://www.kaggle.com/datasets/imdevskp/nobel-prize/data)

2. **Economic Indicators**
	- **World Bank GDP Data**
	  - Description: GDP per capita and other economic indicators for countries worldwide.
	  - Link: [World Bank GDP per Capita](https://data.worldbank.org/indicator/NY.GDP.PCAP.CD)
	- **Heritage Foundation Index of Economic Freedom**
	  - Description: Measures economic freedom in countries across 12 quantitative and qualitative factors.
	  - Link: [Index of Economic Freedom](https://www.heritage.org/index/)

3. **Education Expenditure and Statistics**
	- **UNESCO Education Data**
	  - Description: Data on government expenditure on education as a percentage of GDP and total government expenditure.
	  - Link: [UNESCO Education Expenditure](http://data.uis.unesco.org/)
	- **OECD Education Statistics**
	  - Description: Detailed statistics on education spending, enrollment rates, and educational attainment among OECD countries.
	  - Link: [OECD Education at a Glance](https://www.oecd.org/education/education-at-a-glance/)

4. **Gender Statistics**
	- **UNESCO Gender Parity Index**
	  - Description: Data on gender parity in education and literacy rates.
	  - Link: [UNESCO Gender Equality Data](http://data.uis.unesco.org/)
	- **World Bank Gender Data Portal**
	  - Description: Comprehensive data on gender equality indicators globally.
	  - Link: [World Bank Gender Data](https://datatopics.worldbank.org/gender/)

5. **Demographic and Socioeconomic Data**
	- **United Nations Educational, Scientific and Cultural Organization (UNESCO) Institute for Statistics**
	  - Description: Data on education, literacy rates, and demographic factors.
	  - Link: [UNESCO UIS Data](http://data.uis.unesco.org/)
	- **OECD Social and Welfare Statistics**
	  - Description: Indicators on social protection, income inequality, and more.
	  - Link: [OECD Social Data](https://www.oecd.org/social/soc/)



## Selected data sources

1. Nobel API
2. https://uis.unesco.org/
3. https://databank.worldbank.org/source/world-development-indicators

In [242]:
# imports
import os
import json
import requests
import pandas as pd
import seaborn as sns
from dotenv import load_dotenv
from IPython.display import display

In [243]:
#settings
pd.set_option('display.max_colwidth', 900)

In [None]:
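# load env
load_dotenv()
token = os.getenv('TOKEN')
# Report only whether the token was found; avoid printing the secret itself
print('TOKEN loaded' if token else 'TOKEN not set')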

In [None]:
#TODO: Get the data from the API
enrollment_df = pd.read_csv('sources/school_enrolment_gender.csv')
enrollment_df

In [None]:
# Fetch laureates from the Nobel Prize API and keep a few fields per laureate
url = "https://api.nobelprize.org/2.1/laureates"
response = requests.get(url)
response.raise_for_status()  # stop early on an HTTP error
data = response.json()

laureate_infos = {}
for laureate in data['laureates']:
    # Organisations appear to carry 'orgName' instead of 'knownName'
    name = laureate.get('knownName') or laureate.get('orgName') or {}
    # Any missing level (birth, place, country) yields None instead of a KeyError
    country = laureate.get('birth', {}).get('place', {}).get('country', {}).get('en')
    first_prize = laureate['nobelPrizes'][0]
    laureate_infos[laureate['id']] = {
        'Name': name.get('en'),
        'Gender': laureate.get('gender'),
        'Country': country,
        'Prize_year': first_prize['awardYear'],
        'Prize_category': first_prize['category']['en'],
        # 'Prize_affiliations': first_prize['affiliations'][0]['name']['en'],
    }
print(laureate_infos)
df = pd.DataFrame.from_dict(laureate_infos, orient='index')
print(df)
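
The request above fetches a single response, which may hold only one page of laureates. The sketch below shows one way to collect all pages, assuming the endpoint accepts `offset` and `limit` query parameters (an assumption to check against the API documentation). The helper names and the page size of 100 are arbitrary choices. It uses only the standard library so it also runs outside the notebook; `requests.get(url, params=...)` would work equally well.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "https://api.nobelprize.org/2.1/laureates"


def fetch_page(offset, limit):
    """Fetch one page of laureates (assumes `offset`/`limit` query parameters)."""
    query = urlencode({'offset': offset, 'limit': limit})
    with urlopen(f"{BASE_URL}?{query}", timeout=30) as response:
        return json.load(response)


def fetch_all_laureates(get_page=fetch_page, page_size=100):
    """Collect laureates page by page until a short page signals the end."""
    laureates = []
    offset = 0
    while True:
        batch = get_page(offset, page_size).get('laureates', [])
        laureates.extend(batch)
        if len(batch) < page_size:
            return laureates
        offset += page_size
```

Calling `fetch_all_laureates()` would then return one list of laureate records that can be passed to `pd.json_normalize`. The page-fetching function is a parameter so the loop can be tried with a stub before hitting the network.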

In [None]:
raw_data = response.json()
nobels = pd.json_normalize(raw_data['laureates'])
display(nobels.head())


In [None]:

""" def flatten_column(df, column):
    flattened = pd.json_normalize(df[column][0])
    return flattened

def flatten(dictionnary):
    flattened = pd.json_normalize(dictionnary)
    return flattened

def flatten_old(dictionnary):
    flattened = pd.json_normalize(dictionnary)
    
    for column in flattened.columns:
        sample = flattened[column].iloc[0]
        
        if isinstance(sample, list) and isinstance(sample[0], dict):
            pd.concat([flattened, flatten(sample[0])])
    display(flattened)      
    return flattened 
    
    display(flatten_column(nobels, 'sameAs'))
display(flatten_column(flatten_column(nobels, 'nobelPrizes'), 'affiliations')) 

def flatten_nobels(df):
    final_df = df.copy()
    
    for column in final_df.columns:
        sample = final_df[column].iloc[0]
        
        if isinstance(sample, list) and isinstance(sample[0], dict):
            final_df = pd.concat([final_df, flatten(sample[0])], axis=1)
     
    return final_df """

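def flatten(records):
    """Flatten a list of (possibly nested) dicts into a DataFrame.

    Columns that hold lists of dicts are reduced to their first element,
    flattened recursively, and prefixed with the parent column name.
    """
    flattened = pd.json_normalize(records)

    for column in list(flattened.columns):
        # Use the first non-empty list in the column to decide whether to expand it
        sample = next((x for x in flattened[column] if isinstance(x, list) and x), None)
        if sample is not None and isinstance(sample[0], dict):
            # Keep only the first element; rows without a list get an empty dict
            first = [x[0] if isinstance(x, list) and x else {} for x in flattened[column]]
            # The prefix avoids duplicate column names across nesting levels
            nested = flatten(first).add_prefix(f'{column}.')
            flattened = pd.concat([flattened.drop(columns=column), nested], axis=1)

    return flattened


flattened = flatten(raw_data['laureates'])
display(flattened)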
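
The `flatten` helper keeps only the first element of each list of dicts, so a laureate's second prize is dropped. As an alternative, `pd.json_normalize` can produce one row per prize through its `record_path` and `meta` parameters. The sketch below uses made-up records that mimic the nested layout of the API response; the `prize.` prefix is an arbitrary choice.

```python
import pandas as pd

# Made-up records that mimic the nested layout of the API response
sample = [
    {'id': '1', 'knownName': {'en': 'Laureate A'},
     'nobelPrizes': [{'awardYear': '1903', 'category': {'en': 'Physics'}},
                     {'awardYear': '1911', 'category': {'en': 'Chemistry'}}]},
    {'id': '2', 'knownName': {'en': 'Laureate B'},
     'nobelPrizes': [{'awardYear': '1950', 'category': {'en': 'Literature'}}]},
]

prizes = pd.json_normalize(
    sample,
    record_path='nobelPrizes',         # one output row per prize
    meta=['id', ['knownName', 'en']],  # laureate-level fields repeated on each row
    record_prefix='prize.',            # keeps prize columns apart from laureate columns
)
print(prizes)
```

On real data some records may lack a `meta` key (organisations without `knownName`, for example); passing `errors='ignore'` fills those values with NaN instead of raising a `KeyError`.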

In [249]:

# to_csv returns None when given a path, so there is nothing to assign
flattened.to_csv('./sources/nobel_laureates.csv', index=False)