
# Countryside Survey data-api demo

[This R notebook](https://github.com/NERC-CEH/data-api-examples/blob/master/cs-r.Rmd) demonstrates how to get data from the Countryside Survey data-api.

Run it in an R environment of your choice, for example RStudio, or in a browser at [rstudio.cloud](https://rstudio.cloud).

## Find out more about the Countryside Survey API

- Full information about Countryside Survey data: https://catalogue.ceh.ac.uk/documents/2069de82-619d-4751-9904-aec8500d07e6
- Information about the Countryside Survey data-api, with more examples: https://data-eidc.ceh.ac.uk/docs/cs


In [1]:
## Load dependencies (install them first with install.packages() if needed)
library(jsonlite) # tools to work with the json data coming back from data-api
library(httr) # http request library, needed to communicate with the data-api
library(glue) # useful string templating library


## Add your authorisation token
Create a token with [Data Api Auth](https://data-eidc.ceh.ac.uk/authentication) for the following licences: LICENCE_OGL.

You will now have a token that looks something like `30dd8762241695749a32d2910d354695`.

Enter it when prompted by this code block:

In [2]:
token = readline(prompt = 'Enter authorisation token:')

Enter authorisation token:8bd84eb7531fcb18d725393b3bb8990a
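
When the code runs non-interactively (for example with `Rscript`), `readline()` returns an empty string without waiting for input. A minimal sketch of one alternative, assuming the token has been stored in an environment variable. The variable name `CS_API_TOKEN` and the helper `get_token()` are hypothetical and not part of the data-api:

```r
# Hypothetical helper: read the token from an environment variable,
# falling back to an interactive prompt when the variable is unset.
# The variable name CS_API_TOKEN is an assumption made for this sketch.
get_token <- function(env_var = "CS_API_TOKEN") {
  value <- Sys.getenv(env_var, unset = "")
  if (nzchar(value)) {
    return(value)
  }
  if (interactive()) {
    return(readline(prompt = "Enter authorisation token:"))
  }
  stop(sprintf("No token found: set the %s environment variable", env_var))
}

# Usage: set the variable (normally done outside R, e.g. in .Renviron),
# here with the example token from the text above
Sys.setenv(CS_API_TOKEN = "30dd8762241695749a32d2910d354695")
token <- get_token()
```

Keeping the token out of the notebook in this way also avoids it being saved in the cell output.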


## Set the base URL

This is the base URL of the Countryside Survey data-api.

In [3]:
url = 'https://data-eidc.ceh.ac.uk/1.0/2069de82-619d-4751-9904-aec8500d07e6'


## Set up functions for handling data-api requests

These functions send the requests for data and receive the responses.

They use R's httr package to do this.

In [4]:
# The /metadata endpoint
# Metadata is returned in json format
# query: this is an optional object of key-value pairs for refining the query
cs_metadata <- function(query = NULL) {
  response = GET(
    url = glue::glue('{url}/metadata'),
    add_headers(Authorization = glue::glue('bearer {token}')),
    query = query
  )
  # Pull out the content from the response and turn into a string, 
  # this will be a string representation of the json data
  return(rawToChar(response$content))
}
# The /metadata/key/{key} endpoint
# Metadata is returned in json format
# key: this is the key that you want metadata for
cs_keyvalues <- function(key) {
  response = GET(
    url = glue::glue('{url}/metadata/key/{key}'),
    add_headers(Authorization = glue::glue('bearer {token}'))
  )
  return(rawToChar(response$content))
}
# The /data endpoint
# Data is returned in csv format
# bodyjson: this is the body of the request that defines what data you want
cs_data <- function(bodyjson) {
  response = POST(
    url = glue::glue('{url}/data'),
    add_headers(
      Authorization = glue::glue('bearer {token}'),
      Accept = 'text/csv',
      'Content-Type' = 'application/json'
    ),
    body = bodyjson
  )
  return(rawToChar(response$content))
}
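
The functions above return the response body whatever the HTTP status, so an expired token or a malformed query may only surface later as a confusing parse error. A minimal sketch of a check that could run before the body is returned. `cs_body_or_stop()` is a hypothetical helper, and it relies only on the `status_code` and `content` fields that httr response objects carry:

```r
# Hypothetical helper: return the response body as a string, or stop with
# the HTTP status code when the request failed (status 400 or above).
cs_body_or_stop <- function(response) {
  body <- rawToChar(response$content)
  if (response$status_code >= 400) {
    stop(sprintf("data-api request failed with HTTP status %d: %s",
                 response$status_code, body))
  }
  body
}

# Stand-in objects with the same two fields as an httr response, so the
# helper can be tried without calling the API. The bodies are invented.
ok_response  <- list(status_code = 200L, content = charToRaw('["2007"]'))
bad_response <- list(status_code = 401L, content = charToRaw("Unauthorized"))

ok_body <- cs_body_or_stop(ok_response)
error_message <- tryCatch(cs_body_or_stop(bad_response),
                          error = function(e) conditionMessage(e))
print(ok_body)
print(error_message)
```

In `cs_metadata()`, for instance, the final line could then become `return(cs_body_or_stop(response))`.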


## Metadata Request

The Countryside Survey Data API has a metadata endpoint (`/metadata`) that describes the CSV files available for the different surveys. It can be used to automate your work and to include new Countryside Survey data as it becomes available.

The metadata is derived from the [EIDC catalogue entry](https://catalogue.ceh.ac.uk/documents/2069de82-619d-4751-9904-aec8500d07e6).

A request can return all of the metadata, or it can be refined with metadata keys to return only the subset you need (e.g. `years=2007`). The following shows both types of query; the metadata keys are described later.


In [5]:
# This gets all the metadata for all the Countryside Survey csv files
# The data is returned as a json string by the previously defined cs_metadata() function.
# prettify() allows a quick view of the data in a nice format.
prettify(cs_metadata())
# This gets just the vegetation survey metadata for the year 2007.
prettify(cs_metadata(list(tags = "veget", years = "2007")))

[
    {
        "acronym": "ebha",
        "columns": [
            {
                "name": "YEAR",
                "type": "int64"
            },
            {
                "name": "LAND_CLASS",
                "type": "int64"
            },
            {
                "name": "BROAD_HABITAT",
                "type": "string"
            },
            {
                "name": "BROAD_HABITAT_NAME",
                "type": "string"
            },
            {
                "name": "LAND_CLASS_AREA",
                "type": "float64"
            },
            {
                "name": "MEAN_ESTIMATE",
                "type": "float64"
            },
            {
                "name": "LOWER_ESTIMATE",
                "type": "float64"
            },
            {
                "name": "UPPER_ESTIMATE",
                "type": "float64"
            },
            {
                "name": "WORDS",
                "type": "string"
            },
            {
            

[
    {
        "acronym": "vp",
        "columns": [
            {
                "name": "YEAR",
                "type": "int64"
            },
            {
                "name": "SQUARE",
                "type": "string"
            },
            {
                "name": "PLOT",
                "type": "string"
            },
            {
                "name": "AMALG_PTYPE",
                "type": "string"
            },
            {
                "name": "BRC_NUMBER",
                "type": "float64"
            },
            {
                "name": "BRC_NAMES",
                "type": "string"
            },
            {
                "name": "NEST_LEVEL",
                "type": "float64"
            },
            {
                "name": "ZERO_COVER",
                "type": "float64"
            },
            {
                "name": "FIRST_COVER",
                "type": "float64"
            },
            {
                "name": "TOTAL_COVER",
     

## Metadata key request

Here is a breakdown of the metadata keys you saw when running the code in the previous example. They allow the metadata request to be filtered to refine what is returned (e.g. `years=2007`).

key | description
 ---|---
`acronym` | Shorthand name for the dataset
`columns` | All the columns in the dataset, with the data type of each column; quicker than fetching the data and interrogating it
`files` | The files that make up the dataset, useful if you want to know where the data originally came from; also describes the columns of the data
`parent` | The parent dataset that this dataset belongs to, which you can search for in the EIDC catalogue
`parent_title` | The title of the parent dataset
`tag` | A common tag for datasets which share similar data; this value is derived from the catalogue
`title` | The title of the dataset in the catalogue
`uid` | The id of the dataset, which you can use to find it in the catalogue
`words` | Keywords from the title, which may help in understanding the dataset without reading all the metadata
`year` | The year the data was collected
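
To make these keys concrete, the sketch below parses an abbreviated copy of the metadata output shown earlier (only `acronym` and the first two `columns` entries are kept) and reads the column names from it. Passing `simplifyVector = FALSE` is a choice made for this sketch, to keep the result as plain nested lists; it is not something the data-api requires.

```r
library(jsonlite)

# Abbreviated copy of one metadata record; a real response from
# cs_metadata() carries more keys and more columns than this.
metadata_json <- '[
  {
    "acronym": "ebha",
    "columns": [
      {"name": "YEAR", "type": "int64"},
      {"name": "LAND_CLASS", "type": "int64"}
    ]
  }
]'

# Parse into nested R lists: one list element per dataset
metadata <- fromJSON(metadata_json, simplifyVector = FALSE)

# Collect the column names and types of the first dataset
first <- metadata[[1]]
column_names <- vapply(first$columns, function(col) col$name, character(1))
column_types <- vapply(first$columns, function(col) col$type, character(1))

print(first$acronym)
print(data.frame(name = column_names, type = column_types))
```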

To see what values are available for each key, use the metadata key endpoint (/metadata/key/{key}) as shown in the examples below.


In [6]:
# Using the function defined earlier, get all the years in the data.
prettify(cs_keyvalues('year'))
# Using the function defined earlier, get all the tags in the data.
prettify(cs_keyvalues('tag'))

[
    "1978",
    "1984",
    "1990",
    "2000",
    "2007"
]
 

[
    "featur",
    "habitat",
    "landscap",
    "mite",
    "river",
    "soil",
    "veget",
    "wat"
]
 

## Data endpoint

The data endpoint (`/data`) returns data in CSV format. It automatically combines matching data and attempts to merge datasets using an outer join on their common column names.
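
To illustrate what an outer join on common column names does, here is a minimal sketch using base R's `merge()`. The two data frames, the `PH` column and all values are invented for the example, and the API's own merge logic may differ in detail:

```r
# Two made-up datasets that share the columns YEAR and SQUARE
habitat <- data.frame(
  YEAR = c(2007, 2007),
  SQUARE = c("A", "B"),
  BROAD_HABITAT = c("Arable", "Woodland"),
  stringsAsFactors = FALSE
)
soil <- data.frame(
  YEAR = c(2007, 2007),
  SQUARE = c("B", "C"),
  PH = c(6.1, 5.4),
  stringsAsFactors = FALSE
)

# merge() joins on the shared column names by default; all = TRUE keeps
# unmatched rows from both sides (a full outer join) and fills gaps with NA
combined <- merge(habitat, soil, all = TRUE)
print(combined)
```

Square `B` appears in both inputs and so gets values in every column, while squares `A` and `C` each keep their row with `NA` in the column they lack.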

A simple example follows. More complex examples of data querying, written in Python, can be found in [these GitHub examples](https://github.com/NERC-CEH/data-api-examples/blob/master/cs-python3-cs.ipynb).

### Example

This example pulls out some data and prints some values.

Pull out data for 'linear and point features' for 2007.  Just get the columns: SQUARE, LAND_CLASS, FEATURE, MEAN_ESTIMATE
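
The request body is plain JSON, so it can help to print it before sending. The sketch below builds a body with the tag and year only and shows the effect of `auto_unbox = TRUE`: unnamed `list()` values stay JSON arrays, while a bare string becomes a JSON scalar. That the API expects arrays here is an assumption inferred from the request used in this notebook:

```r
library(jsonlite)

# Unnamed lists are serialised as JSON arrays, even with one element
as_arrays <- toJSON(
  list(metadata = list(tags = list("featur"), years = list("2007"))),
  auto_unbox = TRUE
)

# Bare length-one strings are unboxed into JSON scalars instead
as_scalars <- toJSON(
  list(metadata = list(tags = "featur", years = "2007")),
  auto_unbox = TRUE
)

print(as_arrays)
print(as_scalars)
```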


In [9]:
# Set up the json filter that needs passing into the request
json_body <- jsonlite::toJSON(
  list(
    metadata = list(
      tags=list('featur'), 
      years=list('2007'),
      columns=list('SQUARE','LAND_CLASS','FEATURE','MEAN_ESTIMATE')
    )
  ), auto_unbox = TRUE
)
# Get the CS data using the cs_data() function previously defined and read it into a data frame
df <- read.table(text = cs_data(json_body), sep =",", header = TRUE, stringsAsFactors = FALSE)
# Have a look at some of the data
unique(df[,"SQUARE", drop=FALSE])
unique(df[,"FEATURE", drop=FALSE])
mean(df[,"MEAN_ESTIMATE"], na.rm = TRUE)

row | SQUARE (`<chr>`)
 ---|---
1 | 
316 | XDEYXT
434 | YDSEWL
860 | YDGWZL
1001 | EYAJMO
1327 | PVFUKI
2099 | USETNA
2273 | NPPRGB
2538 | RXBGEW
2712 | UAFKOG


row | FEATURE (`<chr>`)
 ---|---
1 | Total
2 | Hedges
3 | Wall
4 | Line of Trees + Fence
5 | Line of Trees
6 | Bank/Grass Strip
7 | Fence
316 | 
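
Once the CSV text is in a data frame, base R is enough for simple summaries. A minimal sketch, using a small hand-written CSV string in place of a real `cs_data()` response. The rows and `MEAN_ESTIMATE` values are invented for illustration and are not survey results:

```r
# Invented CSV text standing in for the string returned by cs_data()
csv_text <- "SQUARE,LAND_CLASS,FEATURE,MEAN_ESTIMATE
XDEYXT,1,Hedges,2.0
XDEYXT,1,Wall,1.0
YDSEWL,2,Hedges,4.0
YDSEWL,2,Wall,NA"

df_sample <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# Mean of MEAN_ESTIMATE per feature type; the formula interface of
# aggregate() drops rows with a missing value by default (na.omit)
by_feature <- aggregate(
  MEAN_ESTIMATE ~ FEATURE,
  data = df_sample,
  FUN = mean
)
print(by_feature)
```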
