# Working with vector data: Exercises

**Author**: Andrea Ballatore (Birkbeck, University of London)

**Abstract**: Learn how to load, process, and save geospatial vector data using various formats.

## Setup
Run this cell to check that your environment is set up correctly. It should print 'env ok'; you can ignore any warnings.

In [9]:
# check environment
import os
print("Conda env:", os.environ['CONDA_DEFAULT_ENV'])
assert os.environ['CONDA_DEFAULT_ENV'] == 'geoprogv1'
# spatial libraries 
import fiona as fi
import geopandas
import pandas as pd
import pysal as sal

print('env ok')

Conda env: geoprogv1
env ok


-----
## Exercises

When you are in doubt about how a package or function works, consult the Python documentation (https://docs.python.org/3.9/) and search **Google** for the relevant package documentation. `geopandas` is the main package used in these exercises.

### a.
Consider these two datasets: world country borders and World Bank indicators. First, remove Antarctica and project the geometries to Eckert IV (plot the geometries to make sure the result is correct). Second, select the World Bank data for 2019. Reuse and adapt the code from the lecture.
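As a hint, the three operations can be sketched on toy data. This is only a sketch under assumptions, not the reference solution: the `name` column and the two box geometries are made up for illustration, and the real boundary file may label countries differently. Eckert IV is requested here through a PROJ string (the same projection is commonly catalogued as `ESRI:54012`).

```python
import geopandas
import pandas as pd
from shapely.geometry import box

# Toy stand-in for the country boundaries in WGS84 lon/lat.
# 'name' is an assumed column name; check countries.columns on the real data.
toy_countries = geopandas.GeoDataFrame(
    {"name": ["Italy", "Antarctica"]},
    geometry=[box(7, 37, 18, 47), box(-180, -90, 180, -60)],
    crs="EPSG:4326",
)

# 1. Drop Antarctica with a boolean mask
no_antarctica = toy_countries[toy_countries["name"] != "Antarctica"]

# 2. Reproject to Eckert IV (equal-area) using a PROJ string
projected = no_antarctica.to_crs("+proj=eck4 +lon_0=0 +datum=WGS84 +units=m")
# projected.plot() would draw the result for a visual check

# 3. Keep only the 2019 rows of a toy indicator table
toy_wb = pd.DataFrame({"country_code": ["ITA", "ITA"], "year": [2018, 2019]})
toy_wb_2019 = toy_wb[toy_wb["year"] == 2019]
```

After reprojection the coordinates are in metres rather than degrees, which is a quick way to confirm that `to_crs` had an effect.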

In [10]:
countries = geopandas.read_file("data/natural_earth_world_boundaries_50m_2018.geojson")
print(len(countries))
wb_df = pd.read_csv('data/world_bank_indicators_2014_2019.tsv', sep='\t')
print(len(wb_df))

# enter your code here

241
1584


### b.
With the data produced in the previous step, focus on these two demographic variables in 2019: `sp_pop_0014_to_zs` (percentage of people between 0 and 14 years of age) and `sp_pop_65up_to_zs` (percentage of people aged 65 or older). Extract only these two indicators to a new data frame, keeping the country code to identify each country. Using the `merge` function, join this data frame with the country boundaries.
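The column selection and the join can be sketched with plain `pandas` on toy data. The values below are invented, and `iso_a3` is an assumed name for the country-code column in the boundaries; inspect the real columns before adapting this. When the left-hand table is a `GeoDataFrame`, `merge` returns a `GeoDataFrame`, so the geometries are preserved.

```python
import pandas as pd

# Toy 2019 indicator table (values invented for illustration)
wb_2019 = pd.DataFrame({
    "country_code": ["ITA", "NGA", "XKX"],
    "sp_pop_0014_to_zs": [13.0, 43.0, None],
    "sp_pop_65up_to_zs": [23.0, 3.0, None],
    "ny_gdp_mktp_cd": [2.0e12, 4.5e11, 7.9e9],
})

# Toy stand-in for the boundaries; 'iso_a3' is an assumed column name
toy_countries = pd.DataFrame({
    "iso_a3": ["ITA", "NGA", "FRA"],
    "continent": ["Europe", "Africa", "Europe"],
})

# Keep only the country code and the two demographic indicators
demo = wb_2019[["country_code", "sp_pop_0014_to_zs", "sp_pop_65up_to_zs"]]

# Left join: every boundary row is kept, unmatched countries get NaN
merged = toy_countries.merge(
    demo, how="left", left_on="iso_a3", right_on="country_code"
)
```

A left join is one possible choice here: it keeps countries without indicator data, which will then show up as missing values in later plots.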

In [17]:
# enter your code here
wb_df

Unnamed: 0,country_code,gb_xpd_rsdv_gd_zs,iq_cpa_envr_xq,ms_mil_xpnd_gd_zs,ny_gdp_mktp_cd,ny_gdp_pcap_pp_cd,ny_gnp_mktp_pp_cd,sh_sta_airp_p5,sh_sta_mmrt_ne,sh_sta_traf_p5,sh_xpd_chex_gd_zs,si_pov_dday,si_pov_mdim,si_pov_mdim_xq,sp_dyn_le00_in,sp_dyn_tfrt_in,sp_pop_0014_to_zs,sp_pop_65up_to_zs,year
0,ABW,,,,2.765363e+09,36444.262057,3.641308e+09,,,,,,,,75.583,1.834,19.111724,11.707171,2014
1,AFG,,2.5,1.298013,2.048489e+10,2069.424642,6.905035e+10,,,,9.528878,,,,62.966,5.163,45.640589,2.451575,2014
2,AGO,,,4.698455,1.457122e+11,8179.296007,2.069808e+11,,,,2.434129,,,,58.776,5.864,47.177159,2.303044,2014
3,ALB,,,1.346516,1.322814e+10,11259.246206,3.282990e+10,,,,5.503493,1.6,,,77.813,1.688,19.220282,12.180692,2014
4,AND,,,,3.271808e+09,,,,,,5.979125,,,,,,,,2014
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1579,XKX,,3.5,0.824281,7.926134e+09,11870.797576,2.178237e+10,,,,,,,,,,,,2019
1580,YEM,,2.5,,2.258108e+10,,,,,,,,,,,,39.223375,2.902141,2019
1581,ZAF,,,0.976948,3.514316e+11,13034.164661,7.417382e+11,,,,,,,,,,28.968325,5.415256,2019
1582,ZMB,,3.5,1.211039,2.330977e+10,3624.024939,6.361639e+10,,,,,,,,,,44.462509,2.115315,2019


### c. 
With the data produced in the previous steps, focus on European countries (condition: `continent=='Europe'`). Generate two bar charts: the percentage of young people (0-14) in each country, ordered from high to low, and the percentage of senior people (65+) in each country, ordered from high to low. Use the `seaborn` package.
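One way to get an ordered bar chart is to sort the data frame first and let `seaborn` draw the bars in row order. The sketch below uses three invented rows; only the indicator column name comes from the dataset.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Toy European subset (values invented for illustration)
europe = pd.DataFrame({
    "country_code": ["ITA", "IRL", "FRA"],
    "sp_pop_0014_to_zs": [13.0, 20.5, 17.8],
})

# Sort from high to low; string categories are drawn in order of appearance
young_sorted = europe.sort_values("sp_pop_0014_to_zs", ascending=False)

fig, ax = plt.subplots(figsize=(6, 3))
sns.barplot(
    data=young_sorted,
    x="country_code",
    y="sp_pop_0014_to_zs",
    color="steelblue",
    ax=ax,
)
ax.set_xlabel("Country")
ax.set_ylabel("Population aged 0-14 (%)")
fig.tight_layout()
```

The same pattern applies to `sp_pop_65up_to_zs`; with many countries, rotating the tick labels or swapping `x` and `y` for horizontal bars may improve legibility.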

In [12]:
# enter your code here

### d. 
With the data produced in the previous steps, plot the distribution of percentage of young and senior people using histograms at the global level. Then produce histograms specific to countries in Europe and Africa. You should produce 6 separate histograms either with `matplotlib` or `seaborn` (see https://seaborn.pydata.org/tutorial/distributions.html).
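A loop over regions and variables avoids writing six near-identical plotting calls. The following sketch draws the six histograms as panels of a single `matplotlib` figure; the data are invented and the `continent` column is assumed to come from the merged boundaries.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Toy merged table (values invented for illustration)
toy = pd.DataFrame({
    "continent": ["Europe", "Europe", "Africa", "Africa", "Asia", "Asia"],
    "sp_pop_0014_to_zs": [13.0, 17.8, 43.0, 40.2, 25.1, 30.4],
    "sp_pop_65up_to_zs": [23.0, 20.4, 3.0, 2.7, 9.5, 6.1],
})

# One subset per row of the figure
subsets = {
    "World": toy,
    "Europe": toy[toy["continent"] == "Europe"],
    "Africa": toy[toy["continent"] == "Africa"],
}
# One variable per column of the figure
variables = {
    "sp_pop_0014_to_zs": "% aged 0-14",
    "sp_pop_65up_to_zs": "% aged 65+",
}

fig, axes = plt.subplots(len(subsets), len(variables), figsize=(8, 9))
for row, (region, df) in enumerate(subsets.items()):
    for col, (var, label) in enumerate(variables.items()):
        ax = axes[row, col]
        ax.hist(df[var].dropna(), bins=10)  # skip missing values
        ax.set_title(f"{region}: {label}")
fig.tight_layout()
```

Passing `sharex="col"` to `plt.subplots` would put each variable on a common x-axis, which makes the regional distributions easier to compare.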

In [13]:
# enter your code here

### e. 
With the data produced in the previous steps, produce two choropleth maps showing the percentage of young and senior people in each country, using `.plot()` from `geopandas`. Choose the colours and the binning strategy appropriately. Save the results as PDF files in the `tmp` folder.
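The sketch below shows one possible binning strategy: quantile classes computed with `pandas.qcut` and then mapped as categories. The geometries, codes, and values are invented, and the output file name is hypothetical. `geopandas` can also bin directly through the `scheme` argument of `.plot()`, which relies on the optional `mapclassify` package.

```python
import os

import geopandas
import matplotlib.pyplot as plt
import pandas as pd
from shapely.geometry import box

# Toy countries as three adjacent squares (values invented for illustration)
toy = geopandas.GeoDataFrame(
    {
        "country_code": ["AAA", "BBB", "CCC"],
        "sp_pop_0014_to_zs": [14.0, 27.0, 44.0],
    },
    geometry=[box(0, 0, 1, 1), box(1, 0, 2, 1), box(2, 0, 3, 1)],
)

# Quantile binning into three ordered classes
toy["young_class"] = pd.qcut(
    toy["sp_pop_0014_to_zs"], q=3, labels=["low", "medium", "high"]
)

fig, ax = plt.subplots(figsize=(6, 3))
toy.plot(column="young_class", categorical=True, cmap="YlGn", legend=True, ax=ax)
ax.set_axis_off()

# Save as PDF in the tmp folder (created if missing); file name is hypothetical
os.makedirs("tmp", exist_ok=True)
out_path = os.path.join("tmp", "young_choropleth_sketch.pdf")
fig.savefig(out_path)
```

A sequential colour map such as `YlGn` suits a percentage that runs from low to high; the number of classes is a judgment call that depends on the distribution seen in the histograms.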

In [14]:
# enter your code here

### f. 
Using political borders of European countries in 1914, calculate the overlap of the German Empire and the Austro-Hungarian Empire in 1914 with current countries (as of 2018). Use a `for` loop or a function to avoid repeating the same code.
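One reading of "overlap" is the percentage of each current country's area that falls inside a historical empire; this interpretation is an assumption. The sketch below wraps the calculation in a function and loops over the empires, using invented square geometries and a hypothetical `name` column. With real data, both layers should first be projected to the same equal-area CRS so that the areas are meaningful.

```python
import geopandas
from shapely.geometry import box

# Toy current countries (planar toy coordinates, 'name' is an assumed column)
modern = geopandas.GeoDataFrame(
    {"name": ["A", "B"]},
    geometry=[box(0, 0, 2, 2), box(2, 0, 4, 2)],
)
# Toy historical polities
empires = geopandas.GeoDataFrame(
    {"name": ["Empire X", "Empire Y"]},
    geometry=[box(1, 0, 3, 2), box(3, 0, 5, 2)],
)


def overlap_pct(empire_geom, countries):
    """Return, per country, the percentage of its area inside empire_geom."""
    shared = countries.geometry.intersection(empire_geom).area
    return (100 * shared / countries.geometry.area).round(1)


# Reuse the same function for every empire instead of repeating the code
results = {}
for _, empire in empires.iterrows():
    pct = overlap_pct(empire["geometry"], modern)
    results[empire["name"]] = dict(zip(modern["name"], pct))
```

In this toy layout, Empire X covers half of both A and B, while Empire Y covers none of A and half of B.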

In [15]:
import gzip
# read the gzipped GeoJSON and close the file handle afterwards
with gzip.open('data/europe_boundaries_1914.geojson.gz', 'rb') as f:
    europe14_df = geopandas.read_file(f)

# enter your code here

### g. 
Plot the results from the previous step as bar charts, sorting them from largest to smallest percentage of overlap (see https://seaborn.pydata.org/generated/seaborn.barplot.html).

In [16]:
# enter your code here

End of notebook