download admin area cod (common operational dataset) for any country #5

Open
andysouth opened this issue Feb 18, 2020 · 19 comments

@andysouth

andysouth commented Feb 18, 2020

I want to write some code allowing users to download the admin area COD for any country and admin level.

I'm coming up against a few inconsistencies in HDX tags and formats that make this tricky.

I've put all this in one issue for now, in case there is a better way that you can point me to. I can break these up into individual issues if that helps.

What I'm trying to do:

  1. write a query that returns a single dataset for the admin area COD
  2. identify the shapefile resource (assuming that is the commonest format)
  3. get list of layers
  4. identify and download layer for a specified admin level

Current issues (for examples, see the code below):

  1. I'm struggling to get a query that reliably returns just the one dataset
  2. shapefiles are sometimes tagged as 'zipped shapefile', sometimes as 'zipped shapefiles'
  3. sometimes the zipfile contains a subfolder that stops it being opened by sf

Thanks.

    iso3clow <- 'nga'
    #iso3clow <- 'mli'
    level <- 2

    #nigeria does return single result
    #mali returns two datasets first one is population
    querytext <- paste0('vocab_Topics:("common operational dataset - cod" AND "gazetteer" NOT "baseline population") AND groups:', iso3clow)

    rhdx::set_rhdx_config()
    datasets_list <- rhdx::search_datasets(fq = querytext)

    #query needs to return a single dataset (with multiple resources)
    ds <- datasets_list[[1]]

    #get list of resources
    list_of_rs <- rhdx::get_resources(ds)
    list_of_rs

    #selecting resource
    #nigeria "zipped shapefiles"
    #mali "zipped shapefile"
    ds_id <- which( rhdx::get_formats(ds) %in% c("zipped shapefiles","zipped shapefile"))

    rs <- rhdx::get_resource(ds, ds_id)

    # find which layers in file
    mlayers <- rhdx::get_resource_layers(rs, download_folder=getwd())


    #error for nigeria
    #<HDX Resource> aa69f07b-ed8e-456a-9233-b20674730be6
    #Name: nga_adm_osgof_20190417_SHP.zip
    #Format: ZIPPED SHAPEFILES
    #Error: This (spatial) data format is not yet supported
    #in hdx resources.r
    # supported_geo_format <- c("geojson", "zipped shapefile", "zipped geodatabase",
    #                           "zipped geopackage", "kmz", "zipped kml")
    #added "zipped shapefiles" option to supported_geo_format in my local branch of rhdx
    #now I get
    #Cannot open data source /vsizip/C:/rsprojects/afriadmin/nga_adm_osgof_20190417_shp.zip
    #Error in CPL_get_layers(dsn, options, do_count) : Open failed.
    #can I open a layer from the downloaded file directly ?
    #using default should open the first layer
    sflayer <- rhdx::read_resource(rs, download_folder=getwd())
    plot(sf::st_geometry(sflayer))
    #no this also fails
    #seemingly because there is a subfolder within the zip
    #aha, nigeria is in a folder within the zip and mali isn't so nigeria fails and mali works
    #is there a way of detecting and dealing with this ?

    # later read layer using layername
    # this relies on all country layers having adm* in their names    
    layername <- mlayers$name[ grep(paste0("adm",level),mlayers$name) ]

    sflayer <- rhdx::read_resource(rs, layer=layername, download_folder=getwd())
    
    #test plotting
    plot(sf::st_geometry(sflayer)) 
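
The two string-matching steps in the code above (picking the shapefile resource despite the singular/plural format name, and picking the layer for one admin level) can be sketched as small helpers. This is only a sketch: `is_zipped_shapefile()` and `pick_admin_layer()` are hypothetical names, not part of rhdx, and the layer names in the usage note are invented for illustration.

```r
# Hypothetical helpers, not part of rhdx.

# TRUE for "zipped shapefile", "zipped shapefiles", "ZIPPED SHAPEFILES", etc.
is_zipped_shapefile <- function(formats) {
  grepl("^zipped shapefiles?$", trimws(tolower(formats)))
}

# Return the layer names that contain "adm<level>" where the level is not
# followed by another digit, so level 2 does not also match "adm20".
# Assumption: every country's layers carry adm<level> in their names.
pick_admin_layer <- function(layer_names, level) {
  pattern <- paste0("adm", level, "([^0-9]|$)")
  layer_names[grepl(pattern, layer_names, ignore.case = TRUE)]
}
```

With these, the resource index could be found with `which(is_zipped_shapefile(rhdx::get_formats(ds)))`, and the layer with `pick_admin_layer(mlayers$name, level)`. For made-up names such as `c("xxx_adm1", "xxx_adm2")`, level 2 would select only the second.
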
@dickoa
Owner

dickoa commented Feb 19, 2020

Hi Andy,

I also think that having a way to quickly get the admin-AB COD of each country would be awesome. I'm no longer working with OCHA, but I can still ask them to fix issues such as inconsistent file format names and zip files with sub-folders, and also ask how to get a list of all CODs. I think they maintain a list of all admin-AB CODs.

I will come back to you ASAP.
Thanks again

@andysouth
Author

Thank you Ahmadou,
I'm also happy to correspond with OCHA people if that helps.

@dickoa
Owner

dickoa commented Feb 19, 2020

I will also explore other options using CKAN facets in parallel to solve this issue.
You can contact the OCHA FIS team at ocha-fis-data at un dot org (you'll probably reach Tom Haythornthwaite, who's in charge of vetting all COD-AB).
Let me know how it goes
Thanks

@hayttom

hayttom commented Feb 19, 2020

Andy, I think I now understand your approach a bit better than I did when I just emailed you about live services; I do see now how you want to read from HDX systematically and that needn't involve the live services. Sorry.

Regarding the situation you discovered where the geodatabases are not arranged uniformly, that is down to me as I prepare them and I suppose I should standardize them. Which would be the preferred arrangement? Could your systems report which do not conform?

@SimonbJohnson

@andysouth - does the source have to be HDX? There is a smaller subset (about 60 countries) on the COD ITOS service.

Here is my example Python code to download as GeoJSON: https://github.com/simonbjohnson/cod_topo

I also convert them to TopoJSON and standardise the attribute names.

@andysouth
Author

Thanks @hayttom !
To consume from R at step 3 above it is easier if the shapefiles are not in a subfolder in the zip (i.e. following current Mali rather than Nigeria).
For step 1, it would be good to have a tag (or tags) that occurs only once per country to identify the COD admin boundaries. It is close currently, but some countries, e.g. Mali, return more than one record. It may be that the query above could be improved to avoid that.
Yes, I can run some code to report on which countries conform.
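
A conformance check along these lines could be sketched as follows. It is a rough sketch under assumptions: the function names are hypothetical, it only looks at zip entry names, and it assumes zip entries use forward slashes for subfolders. The `/vsizip/` prefix is GDAL's virtual file system for reading inside zip archives, which also offers a way to open a shapefile that sits in a subfolder.

```r
# Hypothetical helpers for auditing downloaded zip files; not part of rhdx.

# Given the entry names of a zip archive, return the .shp entries.
shp_entries <- function(entry_names) {
  entry_names[grepl("\\.shp$", entry_names, ignore.case = TRUE)]
}

# TRUE if the archive has shapefiles and all of them sit at the top level
# (the current Mali layout); FALSE if any is inside a subfolder.
shapefiles_at_top_level <- function(entry_names) {
  shp <- shp_entries(entry_names)
  length(shp) > 0 && !any(grepl("/", shp, fixed = TRUE))
}

# Build GDAL /vsizip/ paths pointing at each shapefile, wherever it sits.
vsizip_paths <- function(zipfile, entry_names) {
  paste0("/vsizip/", zipfile, "/", shp_entries(entry_names))
}

# Usage with a real download (zipfile is the path to a local zip file):
# entries <- utils::unzip(zipfile, list = TRUE)$Name
# shapefiles_at_top_level(entries)
# sflayer <- sf::st_read(vsizip_paths(zipfile, entries)[1])
```

Running `shapefiles_at_top_level()` over the zip of each country would give the list of datasets that do not conform.
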

@andysouth
Author

Thanks @SimonbJohnson
The source doesn't have to be HDX but we are aiming to cover as many countries as possible, which is why I started with HDX. I'll have a look into your live service code.

@hayttom

hayttom commented Feb 19, 2020

Hi Andy,

I guess we have inconsistencies in the HDX zip file arrangement for shapefiles and also for geodatabases - gulp. Fortunately this is something I can work on without disrupting the dataset URLs or even the HDX resources - just by fixing the content. I'm adding it to the checklist I'm following for my own internal audit.

I'm not sure I follow the hope for a unique COD admin boundary instance: the COD tag is not just for admin boundaries, so Mali legitimately has eight COD datasets. Are we on the same page? Notwithstanding this, we do encounter some cases of a country having more than one admin boundaries COD, but it's not supposed to happen. The problem arises when some of my colleagues in country or regional offices get too enthusiastic.

@andysouth
Author

Hi Tom,
Good to hear the structure is fixable without too much disruption.

For the tags, I'd like to be able to write a query that returns just the admin boundaries.

So far with my query below
    #nigeria does return single result
    #mali returns two datasets first one is population
    querytext <- paste0('vocab_Topics:("common operational dataset - cod" AND "gazetteer" NOT "baseline population") AND groups:', iso3clow)

Is there anything I can add to the Mali query to exclude the first record below? Alternatively, if that record had a 'baseline population' tag, that would work too.

    [[1]]
    ce21c7db-d8f0-40f8-adc2-452d2d2d105c
    Title: Mali administrative level 0-3 population statistics
    Name: population-projection-2018-of-mali-admin-levels-3-disaggregated-by-sex
    Date: 03/07/2018
    Tags (up to 5): common operational dataset - cod, gazetteer
    Locations (up to 5): mli
    Resources (up to 5): Mali_Population_communes_sexe_2018.xls, mli_pop_adm0.csv, mli_pop_adm1.csv, mli_pop_adm2.csv, mli_pop_adm3.csv

    [[2]]
    d2ec62bb-5a93-436d-8297-88b3ee9b6818
    Title: Mali administrative level 0-3 boundaries
    Name: administrative-boundaries-cod-mli
    Date: 06/01/2015
    Tags (up to 5): common operational dataset - cod, gazetteer, geodata
    Locations (up to 5): mli
    Resources (up to 5): MLI COD-AB 2019_08_07.pdf, MLI_AdminBoundaries_TabularData.xlsx, mli_adm_1m_dnct_2019_SHP.zip, mli_adm_1m_dnct_2019_EMF.zip, mli_adm_1m_dnct_2019_KMZ.zip
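
Judging only from the two records listed above, the boundaries dataset carries a "geodata" tag that the population dataset lacks, so one possible workaround is to require that tag in the query. The sketch below is untested against HDX: `build_cod_ab_query()` and `is_boundaries_title()` are hypothetical helpers, and whether every country's admin boundaries COD is tagged "geodata" is an assumption.

```r
# Hypothetical helpers, not part of rhdx.

# Build the fq string used above, requiring extra tags (default "geodata")
# instead of excluding "baseline population".
build_cod_ab_query <- function(iso3c, extra_tags = "geodata") {
  tags <- c("common operational dataset - cod", "gazetteer", extra_tags)
  tag_clause <- paste0('"', tags, '"', collapse = " AND ")
  paste0("vocab_Topics:(", tag_clause, ") AND groups:", tolower(iso3c))
}

# Fallback filter on dataset titles, assuming boundary datasets are titled
# like "... administrative level 0-3 boundaries".
is_boundaries_title <- function(titles) {
  grepl("boundaries", titles, ignore.case = TRUE)
}

# Usage:
# datasets_list <- rhdx::search_datasets(fq = build_cod_ab_query("mli"))
```

The title filter is a second line of defence for countries where the tags alone still return more than one record.
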

@hayttom

hayttom commented Feb 20, 2020

Hi Andy,

Regarding "To consume from R at step 3 above it is easier if the shapefiles are not in a subfolder in the zip (i.e. following current Mali rather than Nigeria)": I have re-arranged Nigeria, updated our SOP, and will work through the rest of our CODs, starting with Africa.

Sincerely, Tom

@dickoa
Owner

dickoa commented Feb 20, 2020

Thanks a lot @hayttom for this.

@andysouth
Author

Many thanks @hayttom,

I just tried downloading the Nigeria shapefile with rhdx and directly from the HDX website, and it doesn't seem to be working yet. This was the error message when I clicked the download button on the HDX website:

image

@SimonbJohnson

@andysouth - I do recommend seeing whether the ITOS service matches your requirements, as it will save you a lot of time.

Full list of supported files:
https://github.com/SimonbJohnson/cod_topo/blob/master/itos_service.csv

Example script to download all countries (Python):
https://github.com/SimonbJohnson/cod_topo/blob/master/download.py

Script I used to standardise attributes (it also converts to the format I need):
https://github.com/SimonbJohnson/cod_topo/blob/master/convert.py

The resulting GeoJSON library:
https://github.com/SimonbJohnson/cod_topo/tree/master/geoms/geojson

@andysouth
Author

Many thanks @SimonbJohnson
That does sound useful.
I'll look into it over the next couple of days.

@dickoa
Owner

dickoa commented Feb 20, 2020

I think @SimonbJohnson's code can be packaged into a nice R data package to serve COD-AB.
I can also have a quick look this weekend and start something. Thanks a lot @SimonbJohnson for this.
@andysouth I pushed a minor change to directly support "zipped shapefiles"; you can now read the Nigeria zipped shapefiles directly with it.

@hayttom

hayttom commented Feb 21, 2020

@ALL, now that Ahmadou has made that fix, and given my other tasks, I won't be continuing to make the COD shapefile zipfile arrangements uniform, except in new cases or when other ad hoc adjustments are necessary. It's been a good learning experience, but our support must focus on the live services. Unfortunately all our 50 strategic countries (except Jordan) are now fulfilled, so we do not expect to be expanding COD coverage in Africa.

@andysouth
Author

Thank you @hayttom, that's understandable.

@dickoa want to collaborate on creating the new package? I'll have a look over the weekend too. :-)

@dickoa
Owner

dickoa commented Feb 21, 2020

@andysouth I would love to collaborate on this.
Thanks

@andysouth
Author

I started experimenting with the standardised GeoJSON boundaries created by @SimonbJohnson.

The R code is temporarily here:
https://github.com/afrimapr/afriadmin/blob/master/R/hdxlive.r

Here is an atlas comparing the hdxlive boundaries for Africa to GADM (the former are likely to be more recent):
https://rpubs.com/southmapr/579418


4 participants