# Example: how to create a list of universities in a country

This example is based on the OpenAlex (https://openalex.org/) API. 

OpenAlex is a free and open catalog of the world's scholarly papers, researchers, journals, and institutions — along with all the ways they're connected to one another.

Using OpenAlex, you can build your own scholarly search engine, recommender service, or knowledge graph. You can help manage research by tracking citation impact, spotting promising new research areas, and identifying and promoting work from underrepresented groups. And you can do research on research itself, in areas like bibliometrics, science and technology studies, and science of science policy.

For more info refer to https://openalex.org/about
For more info on the API refer to https://docs.openalex.org/

### Import needed packages

In [1]:
import requests
import csv
import sys
# Read raw bytes so the input can be decoded explicitly (Python 3 str has no .decode)
for raw_line in sys.stdin.buffer:
    # Decode what you receive:
    line = raw_line.decode('iso8859-1')

    # Work with Unicode internally:
    line = line.upper()

    # Encode what you send:
    sys.stdout.buffer.write(line.encode('utf-8'))

### Build the query
The query is expressed as a URL.

In [7]:
query_url = 'https://api.openalex.org/institutions?filter=country_code:BR,works_count:%3E500,type:education&per-page=200'

In this case we ask openalex.org for all available institutions (/institutions), then filter by country, Brazil (country_code:BR), keep only institutions with more than 500 works (works_count:%3E500, where %3E is the URL-encoded ">") to exclude very small or online-only institutions, and filter by type (type:education) to get universities only. Finally, per-page=200 limits each page of results to 200 institutions.

Note: the entities are sorted by publications per year. 
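
The same URL can also be assembled from its parts, which makes it easier to change the country or the threshold. This is a minimal sketch: the helper name build_query_url and its parameters are illustrative and not part of the OpenAlex API.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.openalex.org/institutions"

def build_query_url(country_code, min_works, inst_type="education", per_page=200):
    """Assemble an institutions query URL from individual filter values (hypothetical helper)."""
    # Filters are comma-separated key:value pairs; '>' means "greater than".
    filters = ",".join([
        f"country_code:{country_code}",
        f"works_count:>{min_works}",
        f"type:{inst_type}",
    ])
    # safe=":," leaves the filter separators readable; '>' is still encoded as %3E.
    query = urlencode({"filter": filters, "per-page": per_page}, safe=":,")
    return f"{BASE_URL}?{query}"

print(build_query_url("BR", 500))
# https://api.openalex.org/institutions?filter=country_code:BR,works_count:%3E500,type:education&per-page=200
```

Calling build_query_url("IT", 500) would produce the equivalent query for Italy.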



### Make the query and explore results

In [8]:
res_json = requests.get(query_url).json()



The response has three sections 

In [9]:
res_json.keys()

dict_keys(['meta', 'results', 'group_by'])

'meta' contains info about the query

In [10]:
res_json['meta']

{'count': 300, 'db_response_time_ms': 74, 'page': 1, 'per_page': 200}

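Here 'count' is the total number of institutions matching the filter (300), while 'per_page' is the maximum number returned in one response (200), so this first page does not hold every match.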
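
When 'count' is larger than 'per_page', the remaining institutions sit on later pages. The sketch below shows one way to collect them, assuming the API accepts a page query parameter alongside per-page. The helper name fetch_all_pages, its get_json argument, and the max_pages safety limit are all illustrative choices.

```python
def fetch_all_pages(base_url, get_json=None, max_pages=50):
    """Collect 'results' from successive pages of a list query (hypothetical helper)."""
    if get_json is None:
        # Default fetcher: a plain GET that raises on HTTP errors.
        def get_json(url):
            import requests  # imported lazily so a custom fetcher needs no network stack
            response = requests.get(url, timeout=30)
            response.raise_for_status()
            return response.json()

    results = []
    for page in range(1, max_pages + 1):
        data = get_json(f"{base_url}&page={page}")
        results.extend(data["results"])
        # Stop when a page is empty or we hold as many rows as 'meta.count' reports.
        if not data["results"] or len(results) >= data["meta"]["count"]:
            break
    return results
```

With the query above, all_unis = fetch_all_pages(query_url) would return one flat list that can be used in place of res_json['results']. Passing a custom get_json makes the function easy to try out without network access.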

The 'results' section contains the actual results. It is a list of dictionaries, each with the following keys:

In [11]:
print(res_json['results'][0].keys())

dict_keys(['id', 'ror', 'display_name', 'country_code', 'type', 'homepage_url', 'image_url', 'image_thumbnail_url', 'display_name_acronyms', 'display_name_alternatives', 'works_count', 'cited_by_count', 'ids', 'geo', 'international', 'associated_institutions', 'counts_by_year', 'x_concepts', 'works_api_url', 'updated_date', 'created_date'])


Let's now extract some info and add to a list.

In [12]:
# Header row, followed by one row per university
it_unis = [['name', 'local name', 'homepage', 'Alternative names', 'image url', 'country', 'city', 'key subjects', 'key subject scores', 'institutions']]
for uni in res_json['results']:
    try:
        # Portuguese display name, since the query targets Brazil
        local_name = uni['international']['display_name']['pt']
    except (KeyError, TypeError):
        # No translation available for this institution
        local_name = ''
    key_subjects = [concept['display_name'] for concept in uni['x_concepts']]
    key_subject_score = [concept['score'] for concept in uni['x_concepts']]
    related_institutions = [inst['display_name'] for inst in uni['associated_institutions']]

    it_unis.append((uni['display_name'], local_name, uni['homepage_url'], uni['display_name_alternatives'], uni['image_url'], uni['geo']['country'], uni['geo']['city'], key_subjects, key_subject_score, related_institutions))
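
Nested fields such as the local name or the city may be missing for some institutions. One way to handle this without a try/except per field is a small lookup helper. This is a sketch only: get_nested is a hypothetical name, and the sample record is a trimmed, made-up stand-in for one entry of res_json['results'].

```python
def get_nested(record, *keys, default=''):
    """Walk nested dictionaries, returning `default` if any level is missing or None."""
    current = record
    for key in keys:
        if not isinstance(current, dict) or current.get(key) is None:
            return default
        current = current[key]
    return current

# Illustrative record with only the fields used here; 'international' is absent on purpose.
sample = {
    'display_name': 'Universidade de São Paulo',
    'geo': {'city': 'São Paulo'},
    'international': None,
}

print(get_nested(sample, 'geo', 'city'))                          # São Paulo
print(repr(get_nested(sample, 'international', 'display_name', 'pt')))  # ''
```

In the loop above, get_nested(uni, 'geo', 'city') could stand in for uni['geo']['city'] so that a record without location data yields an empty cell instead of raising an error.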

The first row is a header

In [13]:
print(it_unis[0])

['name', 'local name', 'homepage', 'Alternative names', 'image url', 'country', 'city', 'key subjects', 'key subject scores', 'institutions']


Each following row is a university

In [14]:
print(it_unis[1])

('Universidade de São Paulo', 'Universidade de São Paulo', 'http://www5.usp.br/en/', [], 'https://upload.wikimedia.org/wikipedia/commons/2/2f/Webysther_20170627_-_Bras%C3%A3o_USP.svg', 'Brazil', 'São Paulo', ['Biology', 'Medicine', 'Chemistry', 'Physics', 'Internal medicine', 'Genetics', 'Biochemistry', 'Pathology', 'Mathematics', 'Psychology', 'Computer science', 'Organic chemistry', 'Engineering', 'Quantum mechanics'], [64.8, 56.8, 44.4, 36.5, 35.8, 31.1, 31.0, 24.9, 24.2, 24.0, 23.8, 23.4, 22.7, 20.9], ['Clinics Hospital of Ribeirão Preto', 'Hospital das Clínicas da Faculdade de Medicina da Universidade de São Paulo', 'Instituto Butantan'])


We can now save the results to file (.csv)

In [15]:
with open("brazilian_uni.csv", "w", newline="", encoding="utf-8") as my_csv:
    csvWriter = csv.writer(my_csv,delimiter=',')
    csvWriter.writerows(it_unis)
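
Several columns hold Python lists (key subjects, scores, associated institutions), and csv.writer stores those as their string representation, e.g. "['Biology', 'Medicine']". If plain text cells are preferred, the lists can be joined before writing. The sketch below is one possible approach: flatten_row and the '; ' separator are illustrative choices, and an in-memory buffer stands in for the file so the round trip can be checked.

```python
import csv
import io

def flatten_row(row, sep='; '):
    """Join list/tuple cells into one string so each CSV cell holds plain text."""
    return [
        sep.join(str(item) for item in cell) if isinstance(cell, (list, tuple)) else cell
        for cell in row
    ]

# Two rows shaped like the data above, trimmed to three columns
rows = [
    ['name', 'key subjects', 'key subject scores'],
    ('Universidade de São Paulo', ['Biology', 'Medicine'], [64.8, 56.8]),
]

buffer = io.StringIO()  # stands in for a file opened with newline=''
csv.writer(buffer).writerows(flatten_row(row) for row in rows)

# Read the data back to confirm what a consumer of the file would see
buffer.seek(0)
round_trip = list(csv.reader(buffer))
print(round_trip[1])
# ['Universidade de São Paulo', 'Biology; Medicine', '64.8; 56.8']
```

The same idea applies to the real data by passing flatten_row(row) for each row of the list to csvWriter.writerows.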

Or use the stk to write them in MI

In [None]:
# TODO