TO DO:
- Decide which years to restrict the data to
- Consider geospatial uses of the data
- Run value counts on bird species to get the top 10 or so to track over the years
- Merge birds table with USFWS table to get the specific region & download the doc from the data.gov website
- Separate birds table into sightings inside USFWS regions & sightings outside them
- Create new column for birding seasons
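
The seasons item above could be sketched as a small helper that maps a `YYYY-MM-DD` `observation_date` string to a season. The meteorological season boundaries (December to February as winter, and so on) and the name `season_for_date` are assumptions for illustration; the notes do not settle on a definition.

```python
from datetime import date

# Assumed meteorological seasons (Northern Hemisphere); swap in another
# definition (e.g. migration windows) if that suits the analysis better.
SEASON_BY_MONTH = {
    12: "winter", 1: "winter", 2: "winter",
    3: "spring", 4: "spring", 5: "spring",
    6: "summer", 7: "summer", 8: "summer",
    9: "fall", 10: "fall", 11: "fall",
}


def season_for_date(observation_date: str) -> str:
    """Return the season name for a 'YYYY-MM-DD' date string."""
    month = date.fromisoformat(observation_date).month
    return SEASON_BY_MONTH[month]


print(season_for_date("2020-03-01"))
```

On the real table this could be applied with something like `short_birds['observation_date'].map(season_for_date)`, assuming every date is a non-null string.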

COMPLETED:

<b>X</b> Write a README

<b>X</b> Look into mixed data type columns in birding data - will read in as str due to "X" in some rows

<b>X</b> Look into “sensitive species” list

<b>X</b> Figure out how to read in specific tabs in the xlsx files for FIA

<b>X</b> Separate observation date into year column to group by

<b>X</b> Clean tables to drop extraneous columns

General notes:
- 1st choice: group by wildlife region; else group by county
- try to webscrape polygons off of USFWS website

In [1]:
import pandas as pd
import numpy as np
import geopandas as gpd
import matplotlib.pyplot as plt
%matplotlib inline

In [21]:
# reading in sensitive species list
sensitive_species = pd.read_csv('../data/sensitive_species_2000_2020_TN.txt', sep='\t')
sensitive_species = sensitive_species[['GLOBAL UNIQUE IDENTIFIER', 'OBSERVATION DATE', 'TAXONOMIC ORDER', 'CATEGORY', 'COMMON NAME', 'SCIENTIFIC NAME', 'AGE/SEX', 'COUNTRY', 'STATE', 'COUNTY', 'IBA CODE', 'BCR CODE', 'USFWS CODE', 'LATITUDE', 'LONGITUDE', 'OBSERVER ID', 'TRIP COMMENTS']]
sensitive_species.columns = ['global_unique_identifier', 'observation_date', 'tax_order', 'category', 'common_name', 'scientific_name', 'age_sex', 'country', 'state', 'county', 'iba_code', 'bcr_code', 'usfws_code', 'lat', 'long', 'observer_id', 'trip_comments']

# creating additional column for observation year
sensitive_species['observation_year'] = [x[:4] for x in sensitive_species.observation_date]

sensitive_species.head(30)

Unnamed: 0,global_unique_identifier,observation_date,tax_order,category,common_name,scientific_name,age_sex,country,state,county,iba_code,bcr_code,usfws_code,lat,long,observer_id,trip_comments,observation_year
0,URN:CornellLabOfOrnithology:EBIRD:OBS891731152,2020-03-01,8964,species,Long-eared Owl,Asio otus,,United States,Tennessee,Dyer,,26,,36.104154,-89.569913,obsr926054,,2020
1,URN:CornellLabOfOrnithology:EBIRD:OBS865274217,2020-02-03,8964,species,Long-eared Owl,Asio otus,,United States,Tennessee,Dyer,,26,,36.104154,-89.569913,obsr167616,,2020
2,URN:CornellLabOfOrnithology:EBIRD:OBS886003275,2020-02-13,8964,species,Long-eared Owl,Asio otus,,United States,Tennessee,Dyer,,26,,36.104154,-89.569913,obsr163792,,2020
3,URN:CornellLabOfOrnithology:EBIRD:OBS1031134565,2020-03-01,8964,species,Long-eared Owl,Asio otus,,United States,Tennessee,Dyer,,26,,36.104154,-89.569913,obsr910205,,2020
4,URN:CornellLabOfOrnithology:EBIRD:OBS1009893767,2020-11-08,8964,species,Long-eared Owl,Asio otus,,United States,Tennessee,Davidson,,24,,36.152192,-86.928407,obsr212351,"Searching for Northern Saw-whet Owl, but had s...",2020
5,URN:CornellLabOfOrnithology:EBIRD:OBS699417110,2000-02-28,8964,species,Long-eared Owl,Asio otus,,United States,Tennessee,Obion,US-TN_2831,27,,36.252372,-88.980503,obsr167616,Mark Greene and David Pitts observers. I met D...,2000
6,URN:CornellLabOfOrnithology:EBIRD:OBS1048174965,2020-12-18,8964,species,Long-eared Owl,Asio otus,,United States,Tennessee,Lake,,26,,36.200235,-89.523296,obsr167616,,2020
7,URN:CornellLabOfOrnithology:EBIRD:OBS869886141,2020-02-22,8964,species,Long-eared Owl,Asio otus,,United States,Tennessee,Dyer,,26,,36.104154,-89.569913,obsr167616,Birding with Victor Stoll & Tammy Griffey.,2020
8,URN:CornellLabOfOrnithology:EBIRD:OBS897749935,2020-02-15,8964,species,Long-eared Owl,Asio otus,,United States,Tennessee,Dyer,,26,,36.104154,-89.569913,obsr714514,,2020
9,URN:CornellLabOfOrnithology:EBIRD:OBS866062794,2020-02-17,8964,species,Long-eared Owl,Asio otus,,United States,Tennessee,Dyer,,26,,36.104154,-89.569913,obsr347082,,2020
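
The year column above is built by slicing the first four characters of each date string, which leaves it as text. If a numeric year is more convenient for sorting or plotting, parsing the dates is one alternative. A minimal sketch on a made-up three-row frame (the `observation_month` column is an extra added here for illustration):

```python
import pandas as pd

# Made-up sample with the same date format as the eBird export.
df = pd.DataFrame({"observation_date": ["2020-03-01", "2000-02-28", "2020-12-18"]})

# Parse to datetime once, then derive integer year and month columns.
parsed = pd.to_datetime(df["observation_date"], format="%Y-%m-%d")
df["observation_year"] = parsed.dt.year
df["observation_month"] = parsed.dt.month
print(df)
```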


In [3]:
# importing tn counties shape
tn_counties = gpd.read_file('../data/county/tncounty.shp')

In [5]:
# reading in USFWS codes
usfws_codes = pd.read_csv('../data/eBird_2010_to_2014_TN/USFWSCodes.txt', sep='\t')

# cleaning usfws codes df
usfws_codes = usfws_codes.reset_index()
usfws_codes['usfws_name'] = np.where(usfws_codes['USFWS NAME'].isnull(), usfws_codes['USFWS CODE'], usfws_codes['USFWS NAME'])
usfws_codes = usfws_codes[['index','usfws_name']]
usfws_codes.columns = ['usfws_code', 'usfws_name']

In [22]:
# reading in birds 2010 to 2014
# low_memory=False infers each column's dtype from the whole file, which avoids the mixed-type DtypeWarning
birds_2010_2014 = pd.read_csv('../data/eBird_2010_to_2014_TN/eBird_2010_to_2014_TN.txt', sep='\t', low_memory=False)

# cleaning short birds df
# .copy() makes short_birds an independent DataFrame, so adding a column below does not trigger SettingWithCopyWarning
short_birds = birds_2010_2014[['GLOBAL UNIQUE IDENTIFIER', 'OBSERVATION DATE', 'TAXONOMIC ORDER', 'CATEGORY', 'COMMON NAME', 'SCIENTIFIC NAME', 'AGE/SEX', 'COUNTRY', 'STATE', 'COUNTY', 'IBA CODE', 'BCR CODE', 'USFWS CODE', 'LATITUDE', 'LONGITUDE', 'OBSERVER ID', 'TRIP COMMENTS']].copy()
short_birds.columns = ['global_unique_identifier', 'observation_date', 'tax_order', 'category', 'common_name', 'scientific_name', 'age_sex', 'country', 'state', 'county', 'iba_code', 'bcr_code', 'usfws_code', 'lat', 'long', 'observer_id', 'trip_comments']

# creating additional column for observation year
short_birds['observation_year'] = [x[:4] for x in short_birds.observation_date]
short_birds.head()


Unnamed: 0,global_unique_identifier,observation_date,tax_order,category,common_name,scientific_name,age_sex,country,state,county,iba_code,bcr_code,usfws_code,lat,long,observer_id,trip_comments,observation_year
0,URN:CornellLabOfOrnithology:EBIRD:OBS81849899,2010-01-19,21054,species,American Crow,Corvus brachyrhynchos,,United States,Tennessee,Anderson,,28,,35.958338,-84.246758,obsr18758,beautiful warm sunny day,2010
1,URN:CornellLabOfOrnithology:EBIRD:OBS82004086,2010-01-22,21054,species,American Crow,Corvus brachyrhynchos,,United States,Tennessee,Anderson,,28,,36.217869,-84.086437,obsr18758,"Chilly, overcast, mist. After not seeing a wi...",2010
2,URN:CornellLabOfOrnithology:EBIRD:OBS80663800,2010-01-02,21054,species,American Crow,Corvus brachyrhynchos,,United States,Tennessee,Anderson,,28,,35.996341,-84.163513,obsr54372,All within Haw Ridge Park and within Knoxville...,2010
3,URN:CornellLabOfOrnithology:EBIRD:OBS82003744,2010-01-22,21054,species,American Crow,Corvus brachyrhynchos,,United States,Tennessee,Anderson,,28,,36.116386,-84.110001,obsr18758,,2010
4,URN:CornellLabOfOrnithology:EBIRD:OBS287169932,2010-01-15,21054,species,American Crow,Corvus brachyrhynchos,,United States,Tennessee,Anderson,,28,,36.007052,-84.262025,obsr276568,,2010


In [30]:
# creating subset of birds sighted within usfws polygon:
birds_usfws = short_birds.loc[~short_birds.usfws_code.isna()]

# joining birds_usfws with usfws codes
birds_usfws = pd.merge(birds_usfws, usfws_codes, on='usfws_code', how='left')
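
As a rough check on the join above, the `indicator` argument of `pd.merge` can flag sightings whose `usfws_code` has no match in the codes table. The tiny frames below are made-up stand-ins for `birds_usfws` and `usfws_codes`, and the code values and refuge names are hypothetical.

```python
import pandas as pd

# Made-up stand-ins for the real tables; codes and names are hypothetical.
birds = pd.DataFrame({
    "common_name": ["Bald Eagle", "Mallard", "Killdeer"],
    "usfws_code": ["A1", "B2", "Z9"],
})
codes = pd.DataFrame({
    "usfws_code": ["A1", "B2"],
    "usfws_name": ["Refuge A", "Refuge B"],
})

# indicator=True adds a `_merge` column; for a left join each row is
# marked 'both' (match found) or 'left_only' (no match in `codes`).
merged = pd.merge(birds, codes, on="usfws_code", how="left", indicator=True)

# Codes that appear in the sightings but not in the lookup table.
unmatched = merged.loc[merged["_merge"] == "left_only", "usfws_code"].unique()
print(unmatched)
```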

## EDA

In [38]:
birds_outside_usfws

Unnamed: 0,global_unique_identifier,observation_date,tax_order,category,common_name,scientific_name,age_sex,country,state,county,iba_code,bcr_code,usfws_code,lat,long,observer_id,trip_comments,observation_year
0,URN:CornellLabOfOrnithology:EBIRD:OBS81849899,2010-01-19,21054,species,American Crow,Corvus brachyrhynchos,,United States,Tennessee,Anderson,,28,,35.958338,-84.246758,obsr18758,beautiful warm sunny day,2010
1,URN:CornellLabOfOrnithology:EBIRD:OBS82004086,2010-01-22,21054,species,American Crow,Corvus brachyrhynchos,,United States,Tennessee,Anderson,,28,,36.217869,-84.086437,obsr18758,"Chilly, overcast, mist. After not seeing a wi...",2010
2,URN:CornellLabOfOrnithology:EBIRD:OBS80663800,2010-01-02,21054,species,American Crow,Corvus brachyrhynchos,,United States,Tennessee,Anderson,,28,,35.996341,-84.163513,obsr54372,All within Haw Ridge Park and within Knoxville...,2010
3,URN:CornellLabOfOrnithology:EBIRD:OBS82003744,2010-01-22,21054,species,American Crow,Corvus brachyrhynchos,,United States,Tennessee,Anderson,,28,,36.116386,-84.110001,obsr18758,,2010
4,URN:CornellLabOfOrnithology:EBIRD:OBS287169932,2010-01-15,21054,species,American Crow,Corvus brachyrhynchos,,United States,Tennessee,Anderson,,28,,36.007052,-84.262025,obsr276568,,2010
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1359372,URN:CornellLabOfOrnithology:EBIRD:OBS287920849,2014-12-30,26356,species,White-breasted Nuthatch,Sitta carolinensis,,United States,Tennessee,Wilson,,24,,36.205244,-86.290054,obsr411020,,2014
1359373,URN:CornellLabOfOrnithology:EBIRD:OBS287276272,2014-12-27,26356,species,White-breasted Nuthatch,Sitta carolinensis,,United States,Tennessee,Wilson,,24,,36.205244,-86.290054,obsr411020,,2014
1359374,URN:CornellLabOfOrnithology:EBIRD:OBS284780790,2014-12-10,5421,form,American Coot,Fulica americana,,United States,Tennessee,Wilson,,24,,36.289976,-86.445100,obsr439294,"<br />Submitted from BirdLog World for iOS, ve...",2014
1359375,URN:CornellLabOfOrnithology:EBIRD:OBS284636538,2014-12-08,5421,form,American Coot,Fulica americana,,United States,Tennessee,Wilson,,24,,36.277631,-86.510124,obsr550500,,2014


In [41]:
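# creating separate table of sightings outside of USFWS regions
# drop() returns a new DataFrame, so the result is assigned back
birds_outside_usfws = short_birds.loc[short_birds.usfws_code.isna()]
birds_outside_usfws = birds_outside_usfws.drop(columns=['usfws_code', 'iba_code'])
birds_outside_usfws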

Unnamed: 0,global_unique_identifier,observation_date,tax_order,category,common_name,scientific_name,age_sex,country,state,county,bcr_code,lat,long,observer_id,trip_comments,observation_year
0,URN:CornellLabOfOrnithology:EBIRD:OBS81849899,2010-01-19,21054,species,American Crow,Corvus brachyrhynchos,,United States,Tennessee,Anderson,28,35.958338,-84.246758,obsr18758,beautiful warm sunny day,2010
1,URN:CornellLabOfOrnithology:EBIRD:OBS82004086,2010-01-22,21054,species,American Crow,Corvus brachyrhynchos,,United States,Tennessee,Anderson,28,36.217869,-84.086437,obsr18758,"Chilly, overcast, mist. After not seeing a wi...",2010
2,URN:CornellLabOfOrnithology:EBIRD:OBS80663800,2010-01-02,21054,species,American Crow,Corvus brachyrhynchos,,United States,Tennessee,Anderson,28,35.996341,-84.163513,obsr54372,All within Haw Ridge Park and within Knoxville...,2010
3,URN:CornellLabOfOrnithology:EBIRD:OBS82003744,2010-01-22,21054,species,American Crow,Corvus brachyrhynchos,,United States,Tennessee,Anderson,28,36.116386,-84.110001,obsr18758,,2010
4,URN:CornellLabOfOrnithology:EBIRD:OBS287169932,2010-01-15,21054,species,American Crow,Corvus brachyrhynchos,,United States,Tennessee,Anderson,28,36.007052,-84.262025,obsr276568,,2010
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1359372,URN:CornellLabOfOrnithology:EBIRD:OBS287920849,2014-12-30,26356,species,White-breasted Nuthatch,Sitta carolinensis,,United States,Tennessee,Wilson,24,36.205244,-86.290054,obsr411020,,2014
1359373,URN:CornellLabOfOrnithology:EBIRD:OBS287276272,2014-12-27,26356,species,White-breasted Nuthatch,Sitta carolinensis,,United States,Tennessee,Wilson,24,36.205244,-86.290054,obsr411020,,2014
1359374,URN:CornellLabOfOrnithology:EBIRD:OBS284780790,2014-12-10,5421,form,American Coot,Fulica americana,,United States,Tennessee,Wilson,24,36.289976,-86.445100,obsr439294,"<br />Submitted from BirdLog World for iOS, ve...",2014
1359375,URN:CornellLabOfOrnithology:EBIRD:OBS284636538,2014-12-08,5421,form,American Coot,Fulica americana,,United States,Tennessee,Wilson,24,36.277631,-86.510124,obsr550500,,2014


In [29]:
short_birds.common_name.value_counts().head(20)

common_name
Northern Cardinal         48842
American Crow             41626
Carolina Chickadee        40837
Blue Jay                  39305
Carolina Wren             39079
Mourning Dove             36616
Tufted Titmouse           36309
American Robin            35295
Northern Mockingbird      31399
Red-bellied Woodpecker    30536
American Goldfinch        27511
Downy Woodpecker          26749
Eastern Towhee            26523
European Starling         26250
Eastern Bluebird          26141
Song Sparrow              25308
Canada Goose              21227
Great Blue Heron          20367
House Finch               19358
White-throated Sparrow    18016
Name: count, dtype: int64

In [32]:
birds_usfws.common_name.value_counts().head(30)

common_name
Great Blue Heron            1118
Killdeer                     946
American Crow                927
Northern Cardinal            858
Canada Goose                 847
Mallard                      781
Carolina Wren                718
Double-crested Cormorant     681
Red-winged Blackbird         657
Tufted Titmouse              638
Bald Eagle                   625
Eastern Bluebird             621
Turkey Vulture               616
Red-bellied Woodpecker       608
Pied-billed Grebe            598
Great Egret                  587
Carolina Chickadee           587
Ring-billed Gull             575
Blue Jay                     569
Mourning Dove                550
Gadwall                      525
American Coot                515
Red-tailed Hawk              461
Song Sparrow                 450
Downy Woodpecker             430
American Goldfinch           426
White-throated Sparrow       416
Eastern Phoebe               410
Northern Mockingbird         406
Belted Kingfisher            39

In [28]:
# count of sightings by observation year & common name
short_birds.groupby(['observation_year', 'common_name']).count()

observation_year,common_name,global_unique_identifier,observation_date,tax_order,category,scientific_name,age_sex,country,state,county,iba_code,bcr_code,usfws_code,lat,long,observer_id,trip_comments
2010,Acadian Flycatcher,255,255,255,255,255,1,255,255,255,106,255,5,255,255,255,113
2010,Accipiter sp.,3,3,3,3,3,0,3,3,3,0,3,0,3,3,3,1
2010,Alder Flycatcher,18,18,18,18,18,0,18,18,18,15,18,0,18,18,18,5
2010,Alder/Willow Flycatcher (Traill's Flycatcher),5,5,5,5,5,0,5,5,5,4,5,4,5,5,5,3
2010,American Avocet,6,6,6,6,6,0,6,6,6,3,6,3,6,6,6,3
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2014,tern sp.,2,2,2,2,2,0,2,2,2,0,2,0,2,2,2,1
2014,thrush sp.,22,22,22,22,22,0,22,22,22,1,22,0,22,22,22,6
2014,vireo sp.,3,3,3,3,3,0,3,3,3,0,3,0,3,3,3,0
2014,waterfowl sp.,1,1,1,1,1,0,1,1,1,0,1,0,1,1,1,0
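
The table above counts non-null values in every column, so most columns repeat the same number. For tracking the most common species over the years, one count per (year, species) pair may be enough. A minimal sketch on made-up data, where the frame `sightings` and the cutoff `top_n` are assumptions:

```python
import pandas as pd

# Made-up sample shaped like short_birds (only the columns needed here).
sightings = pd.DataFrame({
    "observation_year": ["2010", "2010", "2010", "2011", "2011", "2011"],
    "common_name": ["American Crow", "American Crow", "Blue Jay",
                    "American Crow", "Blue Jay", "Mallard"],
})

top_n = 2  # the to-do list suggests roughly 10 for the real data

# Species with the most sightings across all years.
top_species = sightings["common_name"].value_counts().head(top_n).index

# One row per year, one column per top species, values = sighting counts.
yearly_counts = (
    sightings[sightings["common_name"].isin(top_species)]
    .groupby(["observation_year", "common_name"])
    .size()
    .unstack(fill_value=0)
)
print(yearly_counts)
```

The wide layout (years as rows, species as columns) is chosen because it can be passed straight to `DataFrame.plot()` to draw one line per species.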


In [None]:
# in thousand acres
land_by_ownership_class

### attempting webscraping:

come back to later

In [None]:
# come back to later!!
# download Selenium webdriver

In [None]:
# # importing beautiful soup
# import requests
# from bs4 import BeautifulSoup as BS

In [None]:
# # establishing connection
# URL = 'https://www.fws.gov/refuge/tennessee/map'
# response = requests.get(URL)

In [None]:
# # checking connection
# response.status_code

In [None]:
# soup = BS(response.text, 'html.parser')

In [None]:
# soup.find_all('path', attrs={'class': 'leaflet-interactive'})
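
Leaflet usually draws its `leaflet-interactive` paths in the browser with JavaScript, so the HTML returned by a plain `requests.get` may not contain them, which is likely why the Selenium note above exists. Once rendered page source is available, the path outlines could be pulled out with the standard library alone. The HTML string below is a made-up sample, not the real USFWS page, and `PathCollector` is a hypothetical helper.

```python
from html.parser import HTMLParser


class PathCollector(HTMLParser):
    """Collect the `d` attribute of <path> tags that carry a given CSS class."""

    def __init__(self, css_class):
        super().__init__()
        self.css_class = css_class
        self.paths = []

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        classes = (attr_map.get("class") or "").split()
        if tag == "path" and self.css_class in classes and "d" in attr_map:
            self.paths.append(attr_map["d"])


# Made-up sample markup standing in for a rendered page source.
sample_html = """
<svg>
  <path class="leaflet-interactive" d="M10 10L20 10L20 20z"></path>
  <path class="other" d="M0 0L1 1"></path>
</svg>
"""

collector = PathCollector("leaflet-interactive")
collector.feed(sample_html)
print(collector.paths)
```

The `d` strings are likely to be screen coordinates rather than latitude and longitude, so they would still need converting before they could be used as polygons alongside `tn_counties`.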