TO DO:
- compile list of birds into one df
- Consider geospatial uses of data
- Value counts on bird species to get top 10 or so to track over years
- Merge birds table with USFWS table to get specific region & download doc off data.gov website
- Separate birds tables into USFWS regions & not
- create new column for seasons of birding

IF YOU HAVE TIME:
- try to webscrape polygons off of USFWS website & group by wildlife region

COMPLETED:
- Write a README
- Look into mixed data type columns in birding data - will read in as str due to "X" in some rows
- Look into “sensitive species” list
- Figure out how to read in specific tabs in the xlsx files for FIA
- Separate observation date into year column to group by
- Clean tables to drop extraneous columns
- Figure out what years to restrict data
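
The mixed-type note above can be illustrated with a small sketch. It assumes the affected column is `OBSERVATION COUNT` and that "X" marks a species that was seen but not counted; the sample rows are made up, and the `observation_count` and `count_not_recorded` names are hypothetical.

```python
import io
import pandas as pd

# Hypothetical tab-separated sample; 'OBSERVATION COUNT' is assumed to be the
# column where "X" appears in some rows.
raw = io.StringIO(
    "COMMON NAME\tOBSERVATION COUNT\n"
    "American Crow\t3\n"
    "Bufflehead\tX\n"
    "Blue Jay\t12\n"
)

# Read the column as str up front so pandas does not have to guess the type
sample = pd.read_csv(raw, sep='\t', dtype={'OBSERVATION COUNT': str})

# Coerce to numbers; "X" cannot be parsed and becomes NaN
sample['observation_count'] = pd.to_numeric(sample['OBSERVATION COUNT'], errors='coerce')

# Keep the information that the species was present but not counted
sample['count_not_recorded'] = sample['OBSERVATION COUNT'].eq('X')
```

Declaring the dtype at read time is one way to avoid a mixed-type column; the numeric copy can then be used for sums while the flag column preserves the "X" rows.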

General notes:
- Years to look at: 2007-2016
- 1st choice: group by wildlife region; else group by county

In [1]:
import pandas as pd
import numpy as np
import geopandas as gpd
import matplotlib.pyplot as plt
%matplotlib inline

In [None]:
# reading in birds, 2007 to 2016
birds = pd.read_csv('../data/eBird_2007_to_2016_TN/eBird_2007_to_2016_TN.txt', sep='\t')

# cleaning birds df
birds = birds[['GLOBAL UNIQUE IDENTIFIER', 'OBSERVATION DATE', 'TAXONOMIC ORDER', 'CATEGORY', 'COMMON NAME', 'SCIENTIFIC NAME', 'AGE/SEX', 'COUNTRY', 'STATE', 'COUNTY', 'IBA CODE', 'BCR CODE', 'USFWS CODE', 'LATITUDE', 'LONGITUDE', 'OBSERVER ID', 'TRIP COMMENTS']]
birds.columns = ['global_unique_identifier', 'observation_date', 'tax_order', 'category', 'common_name', 'scientific_name', 'age_sex', 'country', 'state', 'county', 'iba_code', 'bcr_code', 'usfws_code', 'lat', 'long', 'observer_id', 'trip_comments']

# creating additional column for observation year (first four characters of the date string)
birds['observation_year'] = birds['observation_date'].str[:4].astype(int)
birds.head()
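
The season column listed in the to-do notes could be derived the same way as the year column. This is a minimal sketch, assuming meteorological seasons and the `YYYY-MM-DD` date format seen in the data; the `season_from_dates` helper, the `MONTH_TO_SEASON` mapping, and the sample rows are hypothetical.

```python
import pandas as pd

# Meteorological seasons keyed by month number (an assumption; birding
# seasons such as migration windows may need different boundaries).
MONTH_TO_SEASON = {
    12: 'winter', 1: 'winter', 2: 'winter',
    3: 'spring', 4: 'spring', 5: 'spring',
    6: 'summer', 7: 'summer', 8: 'summer',
    9: 'fall', 10: 'fall', 11: 'fall',
}

def season_from_dates(dates: pd.Series) -> pd.Series:
    """Map a Series of YYYY-MM-DD strings to season labels."""
    months = pd.to_datetime(dates, format='%Y-%m-%d').dt.month
    return months.map(MONTH_TO_SEASON)

# Hypothetical sample rows standing in for the birds df
sample = pd.DataFrame({'observation_date': ['2007-01-18', '2010-04-02', '2016-10-30']})
sample['season'] = season_from_dates(sample['observation_date'])
```

On the real table this would be called as `season_from_dates(birds['observation_date'])`. Note that a December sighting is labelled winter of the same calendar year, which matters if seasons are later grouped together with `observation_year`.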

In [23]:
# reading in sensitive species list
sensitive_species = pd.read_csv('../data/sensitive_species_2000_2020_TN.txt', sep='\t')
sensitive_species = sensitive_species[['GLOBAL UNIQUE IDENTIFIER', 'OBSERVATION DATE', 'TAXONOMIC ORDER', 'CATEGORY', 'COMMON NAME', 'SCIENTIFIC NAME', 'AGE/SEX', 'COUNTRY', 'STATE', 'COUNTY', 'IBA CODE', 'BCR CODE', 'USFWS CODE', 'LATITUDE', 'LONGITUDE', 'OBSERVER ID', 'TRIP COMMENTS']]
sensitive_species.columns = ['global_unique_identifier', 'observation_date', 'tax_order', 'category', 'common_name', 'scientific_name', 'age_sex', 'country', 'state', 'county', 'iba_code', 'bcr_code', 'usfws_code', 'lat', 'long', 'observer_id', 'trip_comments']

# creating additional column for observation year
sensitive_species['observation_year'] = sensitive_species['observation_date'].str[:4].astype(int)

# restricting to relevant years (2007-2016, inclusive)
sensitive_species = (
    sensitive_species
    .loc[sensitive_species['observation_year'].between(2007, 2016)]
    .sort_values('observation_year')
    .reset_index(drop=True)
)
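
Both eBird files go through the same column selection, renaming, and year extraction, so the steps could live in one function, which also makes the "compile list of birds into one df" to-do item a single `pd.concat`. This is a sketch, not the notebook's actual code: `clean_ebird`, `COLUMN_MAP`, `_demo_frame`, and the `sensitive` flag are hypothetical names, and the demo frames contain made-up rows.

```python
import pandas as pd

# Raw eBird column -> cleaned name, following the renaming used in the cells above
COLUMN_MAP = {
    'GLOBAL UNIQUE IDENTIFIER': 'global_unique_identifier',
    'OBSERVATION DATE': 'observation_date',
    'TAXONOMIC ORDER': 'tax_order',
    'CATEGORY': 'category',
    'COMMON NAME': 'common_name',
    'SCIENTIFIC NAME': 'scientific_name',
    'AGE/SEX': 'age_sex',
    'COUNTRY': 'country',
    'STATE': 'state',
    'COUNTY': 'county',
    'IBA CODE': 'iba_code',
    'BCR CODE': 'bcr_code',
    'USFWS CODE': 'usfws_code',
    'LATITUDE': 'lat',
    'LONGITUDE': 'long',
    'OBSERVER ID': 'observer_id',
    'TRIP COMMENTS': 'trip_comments',
}

def clean_ebird(raw: pd.DataFrame, first_year: int = 2007, last_year: int = 2016) -> pd.DataFrame:
    """Select and rename columns, add observation_year, keep first_year..last_year."""
    df = raw[list(COLUMN_MAP)].rename(columns=COLUMN_MAP)
    df['observation_year'] = df['observation_date'].str[:4].astype(int)
    keep = df['observation_year'].between(first_year, last_year)  # inclusive on both ends
    return df.loc[keep].reset_index(drop=True)

def _demo_frame(dates):
    """Build a made-up raw frame with every expected column (demo only)."""
    data = {col: [''] * len(dates) for col in COLUMN_MAP}
    data['OBSERVATION DATE'] = dates
    return pd.DataFrame(data)

birds_demo = clean_ebird(_demo_frame(['2007-01-18', '2016-05-01']))
sensitive_demo = clean_ebird(_demo_frame(['2003-06-10', '2012-09-09']))

# Stack both tables and remember which file each row came from
all_birds = pd.concat(
    [birds_demo.assign(sensitive=False), sensitive_demo.assign(sensitive=True)],
    ignore_index=True,
)
```

If the two files turn out to share observations, a `drop_duplicates(subset='global_unique_identifier')` after the concat would be needed; whether they overlap has not been checked here.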

In [4]:
# importing tn counties shape
tn_counties = gpd.read_file('../data/county/tncounty.shp')

In [25]:
# reading in USFWS codes
usfws_codes = pd.read_csv('../data/eBird_2007_to_2016_TN/USFWSCodes.txt', sep='\t')

# cleaning usfws codes df
usfws_codes = usfws_codes.reset_index()
usfws_codes['usfws_name'] = np.where(usfws_codes['USFWS NAME'].isnull(), usfws_codes['USFWS CODE'], usfws_codes['USFWS NAME'])
usfws_codes = usfws_codes[['index','usfws_name']]
usfws_codes.columns = ['usfws_code', 'usfws_name']



Unnamed: 0,global_unique_identifier,observation_date,tax_order,category,common_name,scientific_name,age_sex,country,state,county,iba_code,bcr_code,usfws_code,lat,long,observer_id,trip_comments,observation_year
0,URN:CornellLabOfOrnithology:EBIRD:OBS36173336,2007-01-18,21054,species,American Crow,Corvus brachyrhynchos,,United States,Tennessee,Anderson,,28,,36.116386,-84.110001,obsr104960,,2007
1,URN:CornellLabOfOrnithology:EBIRD:OBS269070598,2007-01-16,21054,species,American Crow,Corvus brachyrhynchos,,United States,Tennessee,Anderson,,28,,36.116386,-84.110001,obsr242764,,2007
2,URN:CornellLabOfOrnithology:EBIRD:OBS36173326,2007-01-18,11697,species,American Kestrel,Falco sparverius,,United States,Tennessee,Anderson,,28,,36.116386,-84.110001,obsr104960,,2007
3,URN:CornellLabOfOrnithology:EBIRD:OBS36173330,2007-01-18,689,species,Bufflehead,Bucephala albeola,,United States,Tennessee,Anderson,,28,,36.116386,-84.110001,obsr104960,,2007
4,URN:CornellLabOfOrnithology:EBIRD:OBS269070604,2007-01-16,689,species,Bufflehead,Bucephala albeola,,United States,Tennessee,Anderson,,28,,36.116386,-84.110001,obsr242764,,2007


In [28]:
# creating subset of birds sighted within usfws polygon:
birds_usfws = birds.loc[~birds.usfws_code.isna()]

# joining birds_usfws with usfws codes
birds_usfws = pd.merge(birds_usfws, usfws_codes, on='usfws_code', how='left')

birds_usfws.shape

(100948, 19)
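
A left join keeps every sighting even when its code has no match, so it can be worth checking how many rows came back without a region name. The sketch below uses the `validate` and `indicator` options of `pd.merge` on made-up stand-ins for `birds_usfws` and `usfws_codes`; the code values and refuge names are hypothetical.

```python
import pandas as pd

# Hypothetical stand-ins for birds_usfws and usfws_codes
sightings = pd.DataFrame({
    'common_name': ['Bald Eagle', 'Mallard', 'Killdeer'],
    'usfws_code': ['USFWS_1', 'USFWS_2', 'USFWS_9'],
})
codes = pd.DataFrame({
    'usfws_code': ['USFWS_1', 'USFWS_2'],
    'usfws_name': ['Refuge A', 'Refuge B'],
})

merged = pd.merge(
    sightings, codes,
    on='usfws_code', how='left',
    validate='many_to_one',  # raises MergeError if a code is duplicated in codes
    indicator=True,          # adds a _merge column: 'both' or 'left_only'
)

# Codes that appear in the sightings but not in the lookup table
unmatched = merged.loc[merged['_merge'] == 'left_only', 'usfws_code'].unique()
```

With `validate='many_to_one'`, the row count of the merged table is guaranteed to equal that of the left table, which makes the `shape` check above easier to interpret.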

In [31]:
# creating separate list of sightings outside of USFWS regions
# .copy() makes this an independent DataFrame, so the column assignment below
# does not trigger SettingWithCopyWarning
birds_outside_usfws = birds.loc[birds.usfws_code.isna()].copy()
# drop() returns a new DataFrame, so the result has to be assigned back
birds_outside_usfws = birds_outside_usfws.drop(columns=['usfws_code', 'iba_code'])
birds_outside_usfws['within_park'] = False

birds_outside_usfws.head()

Unnamed: 0,global_unique_identifier,observation_date,tax_order,category,common_name,scientific_name,age_sex,country,state,county,bcr_code,lat,long,observer_id,trip_comments,observation_year,within_park
0,URN:CornellLabOfOrnithology:EBIRD:OBS36173336,2007-01-18,21054,species,American Crow,Corvus brachyrhynchos,,United States,Tennessee,Anderson,28,36.116386,-84.110001,obsr104960,,2007,False
1,URN:CornellLabOfOrnithology:EBIRD:OBS269070598,2007-01-16,21054,species,American Crow,Corvus brachyrhynchos,,United States,Tennessee,Anderson,28,36.116386,-84.110001,obsr242764,,2007,False
2,URN:CornellLabOfOrnithology:EBIRD:OBS36173326,2007-01-18,11697,species,American Kestrel,Falco sparverius,,United States,Tennessee,Anderson,28,36.116386,-84.110001,obsr104960,,2007,False
3,URN:CornellLabOfOrnithology:EBIRD:OBS36173330,2007-01-18,689,species,Bufflehead,Bucephala albeola,,United States,Tennessee,Anderson,28,36.116386,-84.110001,obsr104960,,2007,False
4,URN:CornellLabOfOrnithology:EBIRD:OBS269070604,2007-01-16,689,species,Bufflehead,Bucephala albeola,,United States,Tennessee,Anderson,28,36.116386,-84.110001,obsr242764,,2007,False


## EDA

In [32]:
birds.common_name.value_counts().head(20)

common_name
Northern Cardinal          100385
American Crow               87583
Carolina Chickadee          83442
Blue Jay                    80864
Carolina Wren               79536
Tufted Titmouse             75834
Mourning Dove               74332
American Robin              71385
Red-bellied Woodpecker      64634
Northern Mockingbird        60299
American Goldfinch          57098
Downy Woodpecker            55495
Eastern Towhee              53208
European Starling           52633
Eastern Bluebird            51842
Song Sparrow                50361
Canada Goose                44015
Great Blue Heron            43547
House Finch                 40041
White-breasted Nuthatch     39263
Name: count, dtype: int64

In [33]:
birds_usfws.common_name.value_counts().head(30)

common_name
Great Blue Heron            2399
American Crow               2171
Northern Cardinal           2002
Killdeer                    1945
Canada Goose                1807
Carolina Wren               1753
Mallard                     1750
Red-bellied Woodpecker      1603
Bald Eagle                  1579
Tufted Titmouse             1572
Turkey Vulture              1523
Double-crested Cormorant    1506
Blue Jay                    1469
Ring-billed Gull            1453
Carolina Chickadee          1451
Red-winged Blackbird        1415
Eastern Bluebird            1334
Pied-billed Grebe           1317
Great Egret                 1266
Mourning Dove               1252
Gadwall                     1227
Downy Woodpecker            1107
Red-tailed Hawk             1097
American Coot               1049
White-throated Sparrow      1029
American Goldfinch          1028
Song Sparrow                1027
Northern Flicker             951
Pileated Woodpecker          941
Eastern Towhee               93
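
The two rankings above are raw counts, and the refuge subset is much smaller than the full table, so comparing shares rather than counts may show more clearly which species are over-represented inside USFWS regions. A minimal sketch with made-up names standing in for `birds.common_name` and `birds_usfws.common_name`; the `comparison` table and its column names are hypothetical.

```python
import pandas as pd

# Hypothetical stand-ins for birds.common_name and birds_usfws.common_name
all_names = pd.Series(
    ['Northern Cardinal'] * 5 + ['Great Blue Heron'] * 2 + ['Bald Eagle'] * 1
)
refuge_names = pd.Series(
    ['Great Blue Heron'] * 2 + ['Bald Eagle'] * 1 + ['Northern Cardinal'] * 1
)

# normalize=True returns each species' fraction of all sightings
share_all = all_names.value_counts(normalize=True)
share_refuge = refuge_names.value_counts(normalize=True)

# Align the two Series on species name; species absent from refuges get 0
comparison = pd.DataFrame({'share_all': share_all, 'share_refuge': share_refuge}).fillna(0)

# ratio > 1 means the species makes up a larger share of refuge sightings
comparison['ratio'] = comparison['share_refuge'] / comparison['share_all']
comparison = comparison.sort_values('ratio', ascending=False)
```

Because the refuge sightings are a subset of all sightings, `share_all` is never zero for a species that appears in the refuge column, so the division is safe in that setting.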

In [34]:
# count of sightings by observation year & common name
birds.groupby(['observation_year', 'common_name']).count()

observation_year,common_name,global_unique_identifier,observation_date,tax_order,category,scientific_name,age_sex,country,state,county,iba_code,bcr_code,usfws_code,lat,long,observer_id,trip_comments
2007,Acadian Flycatcher,160,160,160,160,160,0,160,160,160,81,160,46,160,160,160,66
2007,Accipiter sp.,2,2,2,2,2,0,2,2,2,0,2,0,2,2,2,2
2007,Alder Flycatcher,7,7,7,7,7,0,7,7,7,3,7,0,7,7,7,4
2007,Allen's Hummingbird,2,2,2,2,2,0,2,2,2,0,2,0,2,2,2,1
2007,American Avocet,20,20,20,20,20,0,20,20,20,3,20,1,20,20,20,3
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2016,teal sp.,2,2,2,2,2,0,2,2,2,0,2,0,2,2,2,0
2016,tern sp.,4,4,4,4,4,0,4,4,4,2,4,2,4,4,4,1
2016,thrush sp.,6,6,6,6,6,0,6,6,6,1,6,0,6,6,6,1
2016,vireo sp.,7,7,7,7,7,0,7,7,7,6,7,0,7,7,7,5
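
`count()` returns one non-null count per column, which is why the table above repeats nearly the same number across each row. For the to-do item of tracking the top species over the years, `size()` gives a single count per group. The sketch below runs on a made-up miniature of the birds df; `top_n`, `top_species`, and `yearly_counts` are hypothetical names.

```python
import pandas as pd

# Hypothetical miniature of the birds df
sample = pd.DataFrame({
    'observation_year': [2007, 2007, 2007, 2008, 2008, 2008, 2008],
    'common_name': ['Blue Jay', 'Blue Jay', 'Mallard',
                    'Blue Jay', 'Mallard', 'Mallard', 'Killdeer'],
})

top_n = 2  # the notes suggest roughly 10 for the full data
top_species = sample['common_name'].value_counts().head(top_n).index

yearly_counts = (
    sample[sample['common_name'].isin(top_species)]
    .groupby(['observation_year', 'common_name'])
    .size()                 # one count per (year, species) group
    .unstack(fill_value=0)  # rows: years, columns: species
)
```

The resulting table has one row per year and one column per species, so `yearly_counts.plot()` would draw one line per species. Raw sighting counts also grow with the number of observers, so dividing by the yearly total may be worth considering before reading trends into them.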


### attempting webscraping:

come back to later

In [None]:
# come back to later!!
# download Selenium webdriver

In [None]:
# # importing beautiful soup
# import requests
# from bs4 import BeautifulSoup as BS

In [None]:
# # establishing connection
# URL = 'https://www.fws.gov/refuge/tennessee/map'
# response = requests.get(URL)

In [None]:
# # checking connection
# response.status_code

In [None]:
# soup = BS(response.text, 'html.parser')  # name the parser explicitly

In [None]:
# soup.find_all('path', attrs={'class': 'leaflet-interactive'})