# Open Data Science for Ornithology, Based on the British Trust for Ornithology (BTO) Bird Atlas 2007-11

#### The purpose of this Python notebook is to promote the work of the BTO and their contribution to open data science using the [BTO atlas](https://www.bto.org/volunteer-surveys/birdatlas).


#### The BTO atlas was a survey that covered all 10-km squares in the UK and Ireland, in both winter and the breeding season, between 2007 and 2011. Fieldwork was completed following the breeding season of 2011, and the data were analysed and published in 2014. Details on the [methods](https://www.bto.org/volunteer-surveys/birdatlas/methods) and [results](https://www.bto.org/volunteer-surveys/birdatlas/results) are available.

Any research outputs based on the BTO atlas must respect its publication under a Creative Commons Attribution-NonCommercial v4.0 licence (CC-BY-NC). See here for full details of what is permitted: https://creativecommons.org/licenses/by-nc/4.0/

_Citation: Gillings, S., Balmer, D.E., Caffrey, B.J., Downie, I.S., Gibbons, D.W., Lack, P.C., Reid, J.B., Sharrock, J.T.R., Swann, R.L. & Fuller, R.J. (in press) Breeding and wintering bird distributions in Britain and Ireland from citizen science bird atlases. Global Ecology and Biogeography._

_BTO, 02 February 2019_

The author of this Python notebook was not involved in the research. The author is employed by [Open Data Services Coop](http://www.opendataservices.coop/), a cooperative whose mission is to promote open data. Please consider [joining the BTO](https://www.bto.org/support-us) or making a donation to them.

------------------------------------------------

## The Data

The Atlas was produced from 16 million records; further details on the size of the databases can be found [here](https://www.bto.org/volunteer-surveys/birdatlas/results/data).

A download link for the dataset can be found [here](https://www.bto.org/our-science/publications/peer-reviewed-papers/breeding-and-wintering-bird-distributions-britain-and).


In [1]:
# Download the zip file from the BTO site:
# https://www.bto.org/sites/default/files/atlas_open_data_files.zip
# Create a folder in Downloads named "BTO_atlas_open_data"
# and extract the 5 files into it.

# Note: reading the data into pandas requires encoding="ISO-8859-1".

import pandas as pd

# Folder holding the extracted CSV files, and the encoding they all share
DATA_DIR = '../Downloads/BTO_atlas_open_data/'
ENCODING = 'ISO-8859-1'

species_lookup = pd.read_csv(DATA_DIR + 'species_lookup.csv', encoding=ENCODING)
distributions = pd.read_csv(DATA_DIR + 'distributions.csv', encoding=ENCODING)
grid_square_coordinates = pd.read_csv(DATA_DIR + 'grid_square_coordinates_lookup.csv', encoding=ENCODING)
distribution_changes = pd.read_csv(DATA_DIR + 'distribution_changes.csv', encoding=ENCODING)
percent_benchmark_species_detected = pd.read_csv(DATA_DIR + 'percent_benchmark_species_detected.csv', encoding=ENCODING)
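
The manual extraction step described in the comments above could also be scripted. The following is a minimal sketch, assuming the zip file has already been downloaded locally; the function name `extract_atlas_zip` is hypothetical, and the internal layout of the archive (for example, whether the CSV files sit inside a subfolder) has not been checked here.

```python
import zipfile
from pathlib import Path


def extract_atlas_zip(zip_path, target_dir):
    """Extract every CSV file in the archive into target_dir.

    Returns the sorted list of extracted file paths. Non-CSV members
    of the archive are skipped.
    """
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    extracted = []
    with zipfile.ZipFile(zip_path) as archive:
        for name in archive.namelist():
            if name.lower().endswith(".csv"):
                archive.extract(name, target)
                extracted.append(target / name)
    return sorted(extracted)
```

For instance, `extract_atlas_zip('atlas_open_data_files.zip', '../Downloads/BTO_atlas_open_data')` would be expected to leave the CSV files where the `pd.read_csv` calls above look for them, provided the archive stores them at its top level.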

# Percent Benchmark Species Detected

The percent benchmark species detected data are explored in the cells below.
Both breeding (`ba`) and wintering (`wa`) benchmark species percentages are available for 2010. For the earlier periods only one of the two is available: breeding for 1970 and 1990, and wintering for 1980. The `tenkm` variable refers to the alphabetic OS grid system shown in Figure 1. Each `tenkm` code contains two letters, which identify a 100-km square, followed by two digits, which locate the 10-km square within it, as described in the [Ordnance Survey guide to the National Grid](https://www.ordnancesurvey.co.uk/docs/support/guide-to-nationalgrid.pdf). For example, Land's End, the south-west tip of England, lies within 10-km square SW32.

![map squares](https://www.bto.org/sites/default/files/u36/nationalgrid_0.jpg)
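
To make the `tenkm` notation concrete, here is a minimal sketch that converts a code such as `SW32` into the easting and northing (in metres) of the south-west corner of the square. The function name `tenkm_to_easting_northing` is hypothetical. It assumes the standard two-letter lettering of the British National Grid, so it may not apply to any squares in the dataset that use a different convention, such as Irish grid squares.

```python
def tenkm_to_easting_northing(tenkm):
    """Return (easting, northing) in metres for the south-west corner
    of a 10-km square given as two letters plus two digits, e.g. 'SW32'.
    """
    letters = "ABCDEFGHJKLMNOPQRSTUVWXYZ"  # the grid alphabet omits the letter I
    code = tenkm.strip().upper()
    valid = (
        len(code) == 4
        and code[0] in letters
        and code[1] in letters
        and code[2:].isascii()
        and code[2:].isdigit()
    )
    if not valid:
        raise ValueError(f"not a two-letter, two-digit grid code: {tenkm!r}")

    first = letters.index(code[0])   # selects the 500-km block
    second = letters.index(code[1])  # selects the 100-km square within that block

    # Index of the 100-km square, counted from the grid's false origin
    e100 = ((first - 2) % 5) * 5 + (second % 5)
    n100 = (19 - (first // 5) * 5) - (second // 5)

    # First digit is the easting, second digit the northing, in 10-km steps
    easting = e100 * 100_000 + int(code[2]) * 10_000
    northing = n100 * 100_000 + int(code[3]) * 10_000
    return easting, northing


print(tenkm_to_easting_northing("SW32"))  # (130000, 20000)
```

The `grid_square_coordinates_lookup.csv` file supplied with the data is likely the more reliable source of coordinates; this sketch is only meant to show how the letters and digits combine.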


In [2]:
# here are the first few rows of the dataset

percent_benchmark_species_detected.head()

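|  | tenkm | pbench_ba1970 | pbench_wa1980 | pbench_ba1990 | pbench_ba2010 | pbench_wa2010 |
|---|---|---|---|---|---|---|
| 0 | HP40 | 97 | 42 | 85 | 76 | 34 |
| 1 | HP50 | 100 | 100 | 100 | 100 | 98 |
| 2 | HP51 | 88 | 62 | 95 | 81 | 73 |
| 3 | HP60 | 100 | 97 | 97 | 100 | 100 |
| 4 | HP61 | 97 | 95 | 98 | 100 | 100 |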
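
The column names in this output follow a regular pattern: a `pbench_` prefix, a two-letter code (`ba` for breeding, `wa` for wintering) and a four-digit year. As a small illustration, the sketch below splits a column name into those parts. The helper `parse_benchmark_column` is hypothetical and relies only on the naming pattern visible in the columns shown here.

```python
import re

# Meaning of the two-letter codes, as described in the text above
SEASONS = {"ba": "breeding", "wa": "wintering"}


def parse_benchmark_column(name):
    """Split a name like 'pbench_wa1980' into ('wintering', 1980).

    Returns None for columns that do not follow the pattern (e.g. 'tenkm').
    """
    match = re.fullmatch(r"pbench_(ba|wa)(\d{4})", name)
    if match is None:
        return None
    return SEASONS[match.group(1)], int(match.group(2))


print(parse_benchmark_column("pbench_wa1980"))  # ('wintering', 1980)
```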


In [3]:
# setting the tenkm variable as the index

percent_benchmark_species_detected = percent_benchmark_species_detected.set_index('tenkm')

In [4]:
percent_benchmark_species_detected.head()

| tenkm | pbench_ba1970 | pbench_wa1980 | pbench_ba1990 | pbench_ba2010 | pbench_wa2010 |
|---|---|---|---|---|---|
| HP40 | 97 | 42 | 85 | 76 | 34 |
| HP50 | 100 | 100 | 100 | 100 | 98 |
| HP51 | 88 | 62 | 95 | 81 | 73 |
| HP60 | 100 | 97 | 97 | 100 | 100 |
| HP61 | 97 | 95 | 98 | 100 | 100 |


In [5]:
# looking at a single location, (e.g. Land's End = 'SW32') 

percent_benchmark_species_detected.loc['SW32', :]


pbench_ba1970    100
pbench_wa1980    100
pbench_ba1990    100
pbench_ba2010    100
pbench_wa2010    100
Name: SW32, dtype: int64
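
A `.loc` lookup like the one above raises a `KeyError` when the requested square is not in the index. The sketch below shows one possible way to guard against that. It uses only the five rows printed earlier in this notebook rather than the full file, and the helper name `benchmark_for_square` is hypothetical.

```python
import pandas as pd

# Sample rows copied from the head() output earlier in the notebook;
# the full file contains many more squares.
sample = pd.DataFrame(
    {
        "tenkm": ["HP40", "HP50", "HP51", "HP60", "HP61"],
        "pbench_ba1970": [97, 100, 88, 100, 97],
        "pbench_wa1980": [42, 100, 62, 97, 95],
        "pbench_ba1990": [85, 100, 95, 97, 98],
        "pbench_ba2010": [76, 100, 81, 100, 100],
        "pbench_wa2010": [34, 98, 73, 100, 100],
    }
).set_index("tenkm")


def benchmark_for_square(frame, tenkm):
    """Return the row for one 10-km square, or None if it is not in the index."""
    if tenkm not in frame.index:
        return None
    return frame.loc[tenkm]


row = benchmark_for_square(sample, "HP51")
missing = benchmark_for_square(sample, "SW32")  # SW32 is not among the sample rows
```

With the full `percent_benchmark_species_detected` frame in place of `sample`, the same helper would return the Land's End row shown above.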