<a href="https://colab.research.google.com/github/adong-hood/cs200/blob/main/ch_6_3.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Reading

In [None]:
from google.colab import drive
drive.mount('/content/drive')

<ul>
    <li><a href = "https://nextjournal.com/sdanisch/cartographic-visualization">Cartographic Visualization</a></li>
</ul>
    

### This section covers a few large concepts that work together in one big example:
<ul>
    <li>Using a web API to get information</li>
    <li>Applying a function to a data set and getting a new column</li>
    <li>Mapping data using Altair</li>
</ul>

## Preliminaries

In [1]:
import altair as alt
import pandas as pd
from vega_datasets import data
import requests
# For Jupyter Notebook only:
# alt.renderers.enable('notebook')

Note: Vega Datasets is a common repository for example datasets used by Vega-related projects. Vega is a visualization grammar, a declarative language for creating, saving, and sharing interactive visualization designs. With Vega, you can describe the visual appearance and interactive behavior of a visualization in a JSON format, and generate web-based views using Canvas or SVG.

## Warmup with the book example

Let’s take on the seemingly simple task of plotting some of the country data on a map like we did in Google Sheets earlier.

Altair provides us with the facility to make a blank map. The counties data that is passed to the chart is the data needed to create and outline the map.

### altair.topo_feature
<code>altair.topo_feature(url, feature, **kwargs)</code>
<p>A convenience function for extracting features from a topojson url</p>

Parameters:
<ul>
    <li><code>url</code> (string): A URL from which to load the data set.</li>
    <li><code>feature</code> (string): The name of the TopoJSON object set to convert to a GeoJSON feature collection. For example, in a map of the United States, there may be an object set named “counties”. Using the feature property, we can extract this set and generate a GeoJSON feature object for each county.</li>
    <li><code>**kwargs</code>: additional keywords passed to TopoDataFormat</li>
</ul>



In [None]:
counties = alt.topo_feature(data.us_10m.url, 'counties')

In [None]:
alt.Chart(counties).mark_geoshape().project(
    type='albersUsa').properties(
    width=500,
    height=300
)

Next, we want to graph the unemployment data by county.

In [None]:
unemp_data = data.unemployment.url
unemp_data

'https://cdn.jsdelivr.net/npm/vega-datasets@v1.29.0/data/unemployment.tsv'

In [None]:
unemp_data = pd.read_csv(unemp_data,sep='\t')
unemp_data.head()

Unnamed: 0,id,rate
0,1001,0.097
1,1003,0.091
2,1005,0.134
3,1007,0.121
4,1009,0.099


Using the <code>transform_lookup</code> method, we can arrange for the id in the geographic data to be matched against the id in our unemp_data data frame.

In [None]:
alt.Chart(counties).mark_geoshape(
).encode(
    color='rate:Q'
).transform_lookup(
    lookup='id',
    from_=alt.LookupData(unemp_data, 'id', ['rate'])
).project(
    type='albersUsa'
).properties(
    width=500,
    height=300,
    title='Unemployment by County'
)
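To make the lookup step more concrete, here is a plain-Python sketch of what `transform_lookup` does conceptually: for each geographic feature, find the row with the same `id` and copy over its `rate`. The three rates are the first rows of the table shown earlier; the feature list and the id 9999 are made up for illustration. This is only an analogy and not how Vega-Lite is implemented.

```python
# Geographic side: each county shape carries an id.
# 9999 is a made-up id that has no matching data row.
features = [{'id': 1001}, {'id': 1003}, {'id': 1005}, {'id': 9999}]

# Data side: the values to look up, keyed by the same id.
rates = {1001: 0.097, 1003: 0.091, 1005: 0.134}

# For each feature, attach the matching rate (None when there is no match).
joined = [{**feature, 'rate': rates.get(feature['id'])} for feature in features]

for row in joined:
    print(row)
```

A feature without a match ends up with no rate, which is why a county or country with missing data cannot be colored.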

## Part 1 - Adding country codes to our data.


Let's try a more complicated case where we do not have all the data readily available to us. This time we will plot the average incomes of countries on a map.

### Read in country income data

This data set contains the average income for a few countries, but it does not have each country's numeric code, which <code>transform_lookup</code> needs in order to place the income values on a map.

Our goal is to add each country's unique numeric code to this data set so that we can map the values.

In [None]:
income_data = pd.read_csv('http://pluto.hood.edu/~dong/datasets/country_income.csv')
print(income_data.shape)
income_data.head()

(35, 8)


Unnamed: 0,LOCATION,INDICATOR,SUBJECT,MEASURE,FREQUENCY,TIME,Value,Flag Codes
0,AUS,AVWAGE,TOT,USD,A,2017,49125.86723,
1,AUT,AVWAGE,TOT,USD,A,2017,50348.9402,
2,BEL,AVWAGE,TOT,USD,A,2017,49674.99505,
3,CAN,AVWAGE,TOT,USD,A,2017,47621.84365,
4,CZE,AVWAGE,TOT,USD,A,2017,25372.04296,


### Look up country code for one country

We need to look up each country's numeric code online.
<p>We imported the requests module earlier; now we request what we want and save the result as <code>res</code>.</p>


In [None]:
# request
res = requests.get('https://restcountries.com/v2/alpha/aus')
res.status_code

200

<p>The <code>get</code> method does the following: it sends a request to https://restcountries.com/v2/alpha/aus, and the website returns the information for that country in JSON format. The parts of the URL path are:</p>
<ul>
<!--li>/rest - technically REST stands for REpresentational State Transfer. This uses the HTTP protocol to ask for and respond with data.</li-->
<li>/v2 - this is version 2 of this website’s protocol</li>
<li>/alpha - This tells the website that the next part of the path is the three-letter code for the country.</li>
<li>/aus - This can be any valid three-letter country code, for example usa.</li>
</ul>
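The URL pieces described above can be assembled with a small helper. This is a minimal sketch: the function name `build_alpha_url` and the `version` parameter are made up for illustration, and the code only builds the string without sending a request.

```python
BASE = 'https://restcountries.com'

def build_alpha_url(country_code, version='v2'):
    """Join the base address, the API version, /alpha, and the country code."""
    return f'{BASE}/{version}/alpha/{country_code}'

print(build_alpha_url('aus'))
print(build_alpha_url('usa'))
```

The resulting string is what would be passed to `requests.get`.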

In [None]:
res.text

'{"name":"Australia","topLevelDomain":[".au"],"alpha2Code":"AU","alpha3Code":"AUS","callingCodes":["61"],"capital":"Canberra","altSpellings":["AU"],"subregion":"Australia and New Zealand","region":"Oceania","population":25687041,"latlng":[-27.0,133.0],"demonym":"Australian","area":7692024.0,"gini":34.4,"timezones":["UTC+05:00","UTC+06:30","UTC+07:00","UTC+08:00","UTC+09:30","UTC+10:00","UTC+10:30","UTC+11:30"],"nativeName":"Australia","numericCode":"036","flags":{"svg":"https://flagcdn.com/au.svg","png":"https://flagcdn.com/w320/au.png"},"currencies":[{"code":"AUD","name":"Australian dollar","symbol":"$"}],"languages":[{"iso639_1":"en","iso639_2":"eng","name":"English","nativeName":"English"}],"translations":{"br":"Aostralia","pt":"Austrália","nl":"Australië","hr":"Australija","fa":"استرالیا","de":"Australien","es":"Australia","fr":"Australie","ja":"オーストラリア","it":"Australia","hu":"Ausztrália"},"flag":"https://flagcdn.com/au.svg","cioc":"AUS","independent":true}'

In [None]:
temp = res.json()
temp

{'name': 'Australia',
 'topLevelDomain': ['.au'],
 'alpha2Code': 'AU',
 'alpha3Code': 'AUS',
 'callingCodes': ['61'],
 'capital': 'Canberra',
 'altSpellings': ['AU'],
 'subregion': 'Australia and New Zealand',
 'region': 'Oceania',
 'population': 25687041,
 'latlng': [-27.0, 133.0],
 'demonym': 'Australian',
 'area': 7692024.0,
 'gini': 34.4,
 'timezones': ['UTC+05:00',
  'UTC+06:30',
  'UTC+07:00',
  'UTC+08:00',
  'UTC+09:30',
  'UTC+10:00',
  'UTC+10:30',
  'UTC+11:30'],
 'nativeName': 'Australia',
 'numericCode': '036',
 'flags': {'svg': 'https://flagcdn.com/au.svg',
  'png': 'https://flagcdn.com/w320/au.png'},
 'currencies': [{'code': 'AUD', 'name': 'Australian dollar', 'symbol': '$'}],
 'languages': [{'iso639_1': 'en',
   'iso639_2': 'eng',
   'name': 'English',
   'nativeName': 'English'}],
 'translations': {'br': 'Aostralia',
  'pt': 'Austrália',
  'nl': 'Australië',
  'hr': 'Australija',
  'fa': 'استرالیا',
  'de': 'Australien',
  'es': 'Australia',
  'fr': 'Australie',
  'ja'

In [None]:
temp['numericCode']

'036'
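Note that the numeric code comes back as a string with a leading zero. The sketch below uses a shortened, hand-copied piece of the response text (most fields omitted) to show the two conversions involved: JSON text to a Python dictionary, which is roughly what `res.json()` does for us, and string to integer.

```python
import json

# A shortened copy of the response text shown above (most fields omitted).
text = '{"name": "Australia", "alpha3Code": "AUS", "numericCode": "036"}'

info = json.loads(text)       # JSON text -> Python dictionary
code = info['numericCode']    # still a string
print(type(code), code)

number = int(code)            # converting to int drops the leading zero
print(type(number), number)
```

Converting to an integer matters later, because the lookup on the map matches numeric ids rather than zero-padded strings.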

### Adding country code to income data

<p>We now implement a function that takes a country's three-letter code and fetches its numeric code from the web using the Web API. This allows us to add the numeric code for every country easily.</p>

In [None]:
#function, end point v2.
def look_up_code(country_code):
    address = 'https://restcountries.com/v2/alpha/'+country_code
    res = requests.get(address)
    country_info = res.json()
    country_num = country_info['numericCode']
    return int(country_num)

<p><code>map</code> is a method of a Series, so we use the syntax df.myColumn.map(function). This applies the function we pass as a parameter to each element of the series and constructs a brand new series. We then store that new series in a new column named <code>countrycode</code>.</p>

In [None]:
#map and add another column
income_data['countrycode'] = income_data.LOCATION.map(look_up_code)
print(income_data.shape)
income_data.head()


(35, 9)


Unnamed: 0,LOCATION,INDICATOR,SUBJECT,MEASURE,FREQUENCY,TIME,Value,Flag Codes,countrycode
0,AUS,AVWAGE,TOT,USD,A,2017,49125.86723,,36
1,AUT,AVWAGE,TOT,USD,A,2017,50348.9402,,40
2,BEL,AVWAGE,TOT,USD,A,2017,49674.99505,,56
3,CAN,AVWAGE,TOT,USD,A,2017,47621.84365,,124
4,CZE,AVWAGE,TOT,USD,A,2017,25372.04296,,203
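Because `map` calls the function once per row, a column that repeats the same code would trigger the same web request several times. One possible refinement, sketched here under assumptions, is to cache results with `functools.lru_cache`. To keep the example runnable without network access, `fake_look_up_code` and the `FAKE_CODES` dictionary are hypothetical stand-ins for the real web lookup.

```python
import pandas as pd
from functools import lru_cache

# Hypothetical offline stand-in for the web service.
FAKE_CODES = {'AUS': 36, 'AUT': 40, 'BEL': 56}
calls = []  # records every time the "service" is actually contacted

@lru_cache(maxsize=None)
def fake_look_up_code(country_code):
    calls.append(country_code)
    return FAKE_CODES[country_code]

# AUS appears twice on purpose.
df = pd.DataFrame({'LOCATION': ['AUS', 'AUT', 'AUS', 'BEL']})
df['countrycode'] = df.LOCATION.map(fake_look_up_code)

print(df)
print('lookups performed:', len(calls))
```

The new column has one value per row, but the second AUS is answered from the cache rather than by another lookup.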


In [None]:
# country with no income data
income_data[income_data['LOCATION'] == 'BRA']

Unnamed: 0,LOCATION,INDICATOR,SUBJECT,MEASURE,FREQUENCY,TIME,Value,Flag Codes,countrycode


# Part 2 - Mapping Income Data

## 2.1 Getting a Blank Map

In [2]:
# get the countries objects.
countries = alt.topo_feature(data.world_110m.url, 'countries')
print(type(countries))

<class 'altair.vegalite.v5.schema.core.UrlData'>


The code above extracts the geographic features of the <code>countries</code> object from the TopoJSON file at the specified URL. Each country is identified by its numeric code. Let us draw the world map by passing <code>countries</code> to Altair.

Projections map from a data domain (spatial position) to a visual range (pixel position). We can also specify projection parameters, such as scale (zoom level) and translate (panning), to customize the projection settings.

In [None]:
#blank map
alt.Chart(countries).mark_geoshape(
    fill='#666666',
    stroke='white'
).properties(
    width=750,
    height=450
).project('equirectangular')

## 2.2 Adding Income Data to the Map

To encode income data in the geoshapes, we again use the <code>transform_lookup</code> method, this time matching the <code>id</code> of each country in the geographic data against the <code>countrycode</code> column of our income data frame.

In [None]:
alt.Chart(countries).mark_geoshape(stroke='black', strokeWidth=0.5).encode(
    tooltip = 'LOCATION:N',
    color=alt.Color('Value:Q', scale=alt.Scale(scheme='plasma'))
).transform_lookup(
    lookup='id',
    from_=alt.LookupData(income_data, 'countrycode', ['Value', 'LOCATION'])
).project(
    type = 'equirectangular'
).properties(
    width=750,
    height=450,
    title = "Income by Country"
)

The large blank areas indicate that we have income data for only a small number of countries (35 rows in this data set).

# Your implementation of section 6.3 starts from here...

Clearly mark the question number and the map.

In [4]:
wd = pd.read_csv('http://pluto.hood.edu/~dong/datasets/world_countries.csv')
print(wd.columns)
wd.head(2)


Index(['Country', 'Code', 'Region', 'Population', 'Area', 'Pop. Density',
       'Coastline', 'Net migration', 'Infant mortality', 'GDP', 'Literacy',
       'Phones', 'Arable', 'Crops', 'Other', 'Climate', 'Birthrate',
       'Deathrate', 'Agriculture', 'Industry', 'Service'],
      dtype='object')


Unnamed: 0,Country,Code,Region,Population,Area,Pop. Density,Coastline,Net migration,Infant mortality,GDP,...,Phones,Arable,Crops,Other,Climate,Birthrate,Deathrate,Agriculture,Industry,Service
0,Afghanistan,AFG,ASIA (EX. NEAR EAST),31056997,647500,48.0,0.0,23.06,163.07,700.0,...,3.2,12.13,0.22,87.65,1.0,46.6,20.34,0.38,0.24,0.38
1,Albania,ALB,EASTERN EUROPE,3581655,28748,124.6,1.26,-4.93,21.52,4500.0,...,71.2,21.09,4.42,74.49,3.0,15.11,5.22,0.232,0.188,0.579


In [None]:
#use a different end point.
code = 'USA'
url = f'https://restcountries.com/v3.1/alpha?codes={code}'
res = requests.get(url)
res.status_code

200

In [None]:
print(res.json()[0]['ccn3'])
res.json()

840


[{'name': {'common': 'United States',
   'official': 'United States of America',
   'nativeName': {'eng': {'official': 'United States of America',
     'common': 'United States'}}},
  'tld': ['.us'],
  'cca2': 'US',
  'ccn3': '840',
  'cca3': 'USA',
  'cioc': 'USA',
  'independent': True,
  'status': 'officially-assigned',
  'unMember': True,
  'currencies': {'USD': {'name': 'United States dollar', 'symbol': '$'}},
  'idd': {'root': '+1',
   'suffixes': ['201',
    '202',
    '203',
    '205',
    '206',
    '207',
    '208',
    '209',
    '210',
    '212',
    '213',
    '214',
    '215',
    '216',
    '217',
    '218',
    '219',
    '220',
    '224',
    '225',
    '227',
    '228',
    '229',
    '231',
    '234',
    '239',
    '240',
    '248',
    '251',
    '252',
    '253',
    '254',
    '256',
    '260',
    '262',
    '267',
    '269',
    '270',
    '272',
    '274',
    '276',
    '281',
    '283',
    '301',
    '302',
    '303',
    '304',
    '305',
    '307',
    '3

In [None]:
# Lookup function using the v3.1 endpoint (the newer link).
def look_up_code_v2(code):
    address = f'https://restcountries.com/v3.1/alpha?codes={code}'
    res = requests.get(address)
    # Check if the request was successful (status code 200)
    if res.status_code == 200:
        # This endpoint returns a list, so take the first match
        country_info = res.json()[0]
        country_num = country_info['ccn3']
        return int(country_num)
    else:
        # Handle cases where the API request fails
        print(f"Error looking up code for {code}: Status code {res.status_code}")
        # Return None so the failed row can be found later;
        # alternatively: raise ValueError(f"Invalid country code: {code}")
        return None

In [None]:
#wd['countrycode'] = wd.Country.map(look_up_country)
wd['countrycode'] = wd.Code.map(look_up_code_v2)
print(wd.shape)
wd.head()

(224, 22)


Unnamed: 0,Country,Code,Region,Population,Area,Pop. Density,Coastline,Net migration,Infant mortality,GDP,...,Arable,Crops,Other,Climate,Birthrate,Deathrate,Agriculture,Industry,Service,countrycode
0,Afghanistan,AFG,ASIA (EX. NEAR EAST),31056997,647500,48.0,0.0,23.06,163.07,700.0,...,12.13,0.22,87.65,1.0,46.6,20.34,0.38,0.24,0.38,4
1,Albania,ALB,EASTERN EUROPE,3581655,28748,124.6,1.26,-4.93,21.52,4500.0,...,21.09,4.42,74.49,3.0,15.11,5.22,0.232,0.188,0.579,8
2,Algeria,DZA,NORTHERN AFRICA,32930091,2381740,13.8,0.04,-0.39,31.0,6000.0,...,3.22,0.25,96.53,1.0,17.14,4.61,0.101,0.6,0.298,12
3,American Samoa,ASM,OCEANIA,57794,199,290.4,58.29,-20.71,9.27,8000.0,...,10.0,15.0,75.0,2.0,22.46,3.27,,,,16
4,Andorra,AND,WESTERN EUROPE,71201,468,152.1,0.0,6.6,4.05,19000.0,...,2.22,0.0,97.78,3.0,8.71,6.25,,,,20
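Since `look_up_code_v2` returns `None` when a request fails, it may be worth checking the new column for gaps before mapping. The sketch below uses a made-up Series in which the third lookup failed; the code `XXX` and the variable names are hypothetical.

```python
import pandas as pd

# Hypothetical lookup results: the third code failed and came back as None.
codes = pd.Series([4, 8, None], index=['AFG', 'ALB', 'XXX'])

print(codes.isna().sum())                   # number of failed lookups
print(codes[codes.isna()].index.tolist())   # which codes failed

# A nullable integer dtype keeps whole numbers and marks the gap as <NA>.
clean = codes.astype('Int64')
print(clean)
```

Without the conversion, a single missing value turns the whole column into floating-point numbers.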


In [6]:
# yet another end point. only pull out the needed information.
url = 'https://restcountries.com/v3.1/independent?status=true&fields=ccn3,cca3'
res_all = requests.get(url)
res_all.status_code

200

In [7]:
res_all.json()[0]

{'ccn3': '308', 'cca3': 'GRD'}

In [8]:
country_df = pd.DataFrame(res_all.json())
country_df.head()

Unnamed: 0,ccn3,cca3
0,308,GRD
1,756,CHE
2,694,SLE
3,348,HUN
4,52,BRB
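With all codes in one data frame, a single merge could replace the row-by-row web lookups. This is a minimal sketch using made-up miniature versions of `wd` and `country_df` (named `wd_small` and `codes` here); the Hungary and Barbados codes are taken from the output above. Because this endpoint is restricted to independent countries, a territory such as American Samoa may have no match.

```python
import pandas as pd

# Made-up miniature versions of wd and country_df, for illustration only.
wd_small = pd.DataFrame({
    'Country': ['Hungary', 'Barbados', 'American Samoa'],
    'Code': ['HUN', 'BRB', 'ASM'],
})
codes = pd.DataFrame({'ccn3': ['348', '052'], 'cca3': ['HUN', 'BRB']})

# A left join keeps every row of wd_small; unmatched codes get missing values.
merged = wd_small.merge(codes, left_on='Code', right_on='cca3', how='left')

# ccn3 arrives as text, so convert it to a nullable integer column.
merged['countrycode'] = pd.to_numeric(merged['ccn3']).astype('Int64')
print(merged[['Country', 'Code', 'countrycode']])
```

The merge needs only one web request in total, at the cost of leaving unmatched rows to be handled separately.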
