TO DO:
- look into uses for the migration dfs
- double check work for notebook!
- export all csvs!
- have a lil party

COMPLETED:
- Write a README
- Look into mixed data type columns in birding data - will read in as str due to "X" in some rows
- Look into “sensitive species” list
- Figure out how to read in specific tabs in the xlsx files for FIA
- Separate observation date into year column to group by
- Clean tables to drop extraneous columns
- Figure out what years to restrict data
- compile list of birds into one df
- Merge birds table with USFWS table to get specific region & download doc off data.gov website
- Separate birds tables into USFWS regions & not
- create new column for seasons of birding
- Value counts on bird species to get top 10
- Create subset of birds in top 10 & group by year
- Replace all USFWS with IBA
- Read in IBA df & join with non-null IBA
- home in on species of interest
- Create subsets of 50-75 percentile birds
- Get count of unique bird observer ids per year
- Get stat for count of birds per unique bird observer id per year

General notes:
- Years to look at: 2007-2016
- Group by Important Bird Areas (IBA)

In [1]:
import pandas as pd
import numpy as np
import geopandas as gpd
from shapely.geometry import Point
# import matplotlib.pyplot as plt
# %matplotlib inline
# import folium

In [2]:
# reading in IBA map
iba_map = gpd.read_file('../data/iba_map.geojson')
iba_map = iba_map[['SITE_ID', 'SITE_NAME', 'STATE', 'LATITUDE', 'LONGITUDE', 'geometry']]
iba_map.columns = ['iba_code', 'iba_name', 'state', 'latitude', 'longitude', 'geometry']
iba_map = iba_map.loc[iba_map.state == 'Tennessee']

# converting iba code column to int to get rid of '.0'
iba_map = iba_map.astype({'iba_code':int})

In [3]:
# iba code & name subset
iba_names = iba_map[['iba_code', 'iba_name']]
iba_names.head(3)

Unnamed: 0,iba_code,iba_name
1185,2831,Hop-In Refuge
1186,2832,Tigrett Wildlife Management Area
1187,2833,Hatchie National Wildlife Refuge


In [4]:
# reading in birds, 2007 to 2016
birds = pd.read_csv('../data/eBird_2007_to_2016_TN/eBird_2007_to_2016_TN.txt', sep='\t')

# cleaning birds df
birds = birds[['GLOBAL UNIQUE IDENTIFIER', 'OBSERVATION DATE', 'CATEGORY', 'COMMON NAME', 'SCIENTIFIC NAME', 'STATE', 'COUNTY', 'IBA CODE', 'USFWS CODE', 'LATITUDE', 'LONGITUDE', 'OBSERVER ID', 'TRIP COMMENTS']]
birds.columns = ['global_unique_identifier', 'observation_date', 'category', 'common_name', 'scientific_name', 'state', 'county', 'iba_code', 'usfws_code', 'lat', 'long', 'observer_id', 'trip_comments']
birds = birds.loc[birds.category == 'species']
# creating additional column for observation year
birds['observation_year'] = [x[:4] for x in birds.observation_date]
birds = birds.astype({'observation_year':int})
birds.shape


(2725988, 14)

In [5]:
# reading in sensitive species list
sensitive_species = pd.read_csv('../data/sensitive_species_2000_2020_TN.txt', sep='\t')
sensitive_species = sensitive_species[['GLOBAL UNIQUE IDENTIFIER', 'OBSERVATION DATE', 'CATEGORY', 'COMMON NAME', 'SCIENTIFIC NAME', 'STATE', 'COUNTY', 'IBA CODE', 'BCR CODE', 'USFWS CODE', 'LATITUDE', 'LONGITUDE', 'OBSERVER ID', 'TRIP COMMENTS']]
sensitive_species.columns = ['global_unique_identifier', 'observation_date', 'category', 'common_name', 'scientific_name', 'state', 'county', 'iba_code', 'bcr_code', 'usfws_code', 'lat', 'long', 'observer_id', 'trip_comments']

# creating additional column for observation year
sensitive_species['observation_year'] = [x[:4] for x in sensitive_species.observation_date]
sensitive_species = sensitive_species.astype({'observation_year':int})

# restricting to relevant years
sensitive_species = sensitive_species.loc[(sensitive_species.observation_year < 2017)&(sensitive_species.observation_year > 2006)].sort_values('observation_year').reset_index(drop=True)
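
The year extraction and year filter above can also be written with pandas datetime parsing. This is a minimal sketch on a made-up toy frame (the `toy` and `in_range` names and all values are hypothetical); it assumes the dates are ISO `YYYY-MM-DD` strings, as in the eBird rows shown in this notebook.

```python
import pandas as pd

# toy frame standing in for the eBird export (values are made up)
toy = pd.DataFrame({
    'observation_date': ['2005-06-01', '2007-01-16', '2016-12-12', '2018-03-04'],
    'common_name': ['Bald Eagle', 'Bald Eagle', 'Sandhill Crane', 'Yellow Warbler'],
})

# parse the strings once, then read the year from the datetime accessor
toy['observation_date'] = pd.to_datetime(toy['observation_date'], format='%Y-%m-%d')
toy['observation_year'] = toy['observation_date'].dt.year

# between() is inclusive on both ends by default, so this keeps 2007 through 2016
in_range = toy.loc[toy['observation_year'].between(2007, 2016)].reset_index(drop=True)
```

Parsing the column once also makes `.dt.month` available later without slicing strings again.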

In [6]:
# reading in USFWS codes
usfws_codes = pd.read_csv('../data/eBird_2007_to_2016_TN/USFWSCodes.txt', sep='\t')

# cleaning usfws codes df
usfws_codes = usfws_codes.reset_index()
usfws_codes['usfws_name'] = np.where(usfws_codes['USFWS NAME'].isnull(), usfws_codes['USFWS CODE'], usfws_codes['USFWS NAME'])
usfws_codes = usfws_codes[['index','usfws_name']]
usfws_codes.columns = ['usfws_code', 'usfws_name']
usfws_codes

Unnamed: 0,usfws_code,usfws_name
0,USFWS_1,FEATHERSTONE NATIONAL WILDLIFE REFUGE
1,USFWS_2,ARCHIE CARR NATIONAL WILDLIFE REFUGE
2,USFWS_3,BALD KNOB NATIONAL WILDLIFE REFUGE
3,USFWS_4,GREEN CAY NATIONAL WILDLIFE REFUGE
4,USFWS_5,SEAL BEACH NATIONAL WILDLIFE REFUGE
...,...,...
828,USFWS_829,SUSQUEHANNA NATIONAL WILDLIFE REFUGE
829,USFWS_830,TULE LAKE NATIONAL WILDLIFE REFUGE
830,USFWS_831,MISSISSIPPI SANDHILL CRANE NATIONAL WILDLIFE R...
831,USFWS_832,HANSON COUNTY WATERFOWL PRODUCTION AREA


### Adding columns for grouping

In [7]:
# creating seasons column: spring migration, fall migration, and offseason

# creating month column
birds['observation_month'] = [x[5:7] for x in birds.observation_date]
birds = birds.astype({'observation_month':int})

# spring: March to May
# fall: Sept to Oct

# categorizing season based on observation month
for index, row in birds.iterrows():
    if (row['observation_month'] <= 5)&(row['observation_month'] >= 3):
        birds.loc[index, 'season'] = 'spring migration'
    elif (row['observation_month'] <= 10)&(row['observation_month'] >= 9):
        birds.loc[index, 'season'] = 'fall migration'
    else:
        birds.loc[index, 'season'] = 'offseason'

In [8]:
birds_by_season = birds.groupby('season').count()[['global_unique_identifier']]
birds_by_season.columns = ['count']
birds_by_season = birds_by_season.reset_index()
birds_by_season

Unnamed: 0,season,count
0,fall migration,406743
1,offseason,1255983
2,spring migration,1063262
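
Looping over roughly 2.7 million rows with `iterrows` is slow. A vectorized version of the same season rule might look like the sketch below, which uses `np.select` on a toy frame (the `toy` frame and its values are made up for illustration). The month ranges are the ones from the comments above: March to May for spring, September to October for fall.

```python
import numpy as np
import pandas as pd

# toy months standing in for birds.observation_month
toy = pd.DataFrame({'observation_month': [1, 3, 5, 6, 9, 10, 11]})

month = toy['observation_month']

# conditions are checked in order; rows matching none get the default label
conditions = [month.between(3, 5), month.between(9, 10)]
labels = ['spring migration', 'fall migration']
toy['season'] = np.select(conditions, labels, default='offseason')
```

The result should match the loop row for row, since both apply the same inclusive month ranges.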


In [9]:
# subset of fall & spring migration
fall_birds = birds.loc[birds.season == 'fall migration']
spring_birds = birds.loc[birds.season == 'spring migration']

In [None]:
# more to do with migration?

## Species of interest: Bald Eagle, Sandhill Crane, Baltimore Oriole, Indigo Bunting, Ruby-throated Hummingbird, Yellow Warbler, Summer Tanager

In [26]:
# creating birds of interest df
birds_of_interest = birds.loc[birds.common_name.isin(['Sandhill Crane', 'Bald Eagle', 'Baltimore Oriole', 'Indigo Bunting', 'Ruby-throated Hummingbird', 'Yellow Warbler', 'Summer Tanager'])]

# creating separate count list
birds_of_interest_count = birds_of_interest.groupby(['observation_year', 'common_name']).count()[['global_unique_identifier']].reset_index()
birds_of_interest_count.columns = ['observation_year', 'common_name', 'count']

# adding total count per year as a column
total_birds =  birds_of_interest_count.groupby('observation_year').sum()[['count']].reset_index()
total_birds.columns=['observation_year', 'total_count_by_year']

# merging back
birds_of_interest_count = pd.merge(birds_of_interest_count, total_birds, on='observation_year', how='inner')

# adding perc column
birds_of_interest_count['perc_total_sightings'] = round(birds_of_interest_count['count']/birds_of_interest_count.total_count_by_year*100,2)

# total_birds
birds_of_interest_count

Unnamed: 0,observation_year,common_name,count,total_count_by_year,perc_total_sightings
0,2007,Bald Eagle,167,2073,8.06
1,2007,Baltimore Oriole,94,2073,4.53
2,2007,Indigo Bunting,688,2073,33.19
3,2007,Ruby-throated Hummingbird,631,2073,30.44
4,2007,Sandhill Crane,91,2073,4.39
...,...,...,...,...,...
65,2016,Indigo Bunting,5160,17259,29.90
66,2016,Ruby-throated Hummingbird,4046,17259,23.44
67,2016,Sandhill Crane,830,17259,4.81
68,2016,Summer Tanager,2672,17259,15.48


In [27]:
# creating geometry column for birds of interest
# .copy() makes the subset independent of birds, which avoids the SettingWithCopyWarning
birds_of_interest = birds_of_interest.copy()
birds_of_interest['geometry'] = birds_of_interest.apply(lambda x: Point((float(x.long),
                                                                  float(x.lat))),
                                                 axis=1)
birds_of_interest


Unnamed: 0,global_unique_identifier,observation_date,category,common_name,scientific_name,state,county,iba_code,usfws_code,lat,long,observer_id,trip_comments,observation_year,observation_month,season,geometry
70,URN:CornellLabOfOrnithology:EBIRD:OBS178964395,2007-01-16,species,Bald Eagle,Haliaeetus leucocephalus,Tennessee,Blount,,,35.817535,-84.115276,obsr58986,,2007,1,offseason,POINT (-84.1152763 35.8175348)
71,URN:CornellLabOfOrnithology:EBIRD:OBS525915355,2007-01-01,species,Bald Eagle,Haliaeetus leucocephalus,Tennessee,Blount,,,35.546197,-84.059025,obsr58986,,2007,1,offseason,POINT (-84.05902469999999 35.5461975)
363,URN:CornellLabOfOrnithology:EBIRD:OBS525916943,2007-01-02,species,Sandhill Crane,Antigone canadensis,Tennessee,Blount,,,35.808211,-84.027954,obsr58986,,2007,1,offseason,POINT (-84.02795399999999 35.8082108)
364,URN:CornellLabOfOrnithology:EBIRD:OBS178963764,2007-01-16,species,Sandhill Crane,Antigone canadensis,Tennessee,Blount,,,35.808211,-84.027954,obsr58986,,2007,1,offseason,POINT (-84.02795399999999 35.8082108)
543,URN:CornellLabOfOrnithology:EBIRD:OBS36032974,2007-01-13,species,Bald Eagle,Haliaeetus leucocephalus,Tennessee,Campbell,,,36.307447,-84.214080,obsr18758,photographed the Ross''s Goose,2007,1,offseason,POINT (-84.2140801 36.3074474)
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2811989,URN:CornellLabOfOrnithology:EBIRD:OBS448673919,2016-12-12,species,Sandhill Crane,Antigone canadensis,Tennessee,Weakley,,,36.351668,-88.840521,obsr167616,,2016,12,offseason,POINT (-88.840521 36.351668)
2812654,URN:CornellLabOfOrnithology:EBIRD:OBS448220315,2016-12-10,species,Ruby-throated Hummingbird,Archilochus colubris,Tennessee,Williamson,,,36.016033,-86.964869,obsr378275,,2016,12,offseason,POINT (-86.9648687 36.0160333)
2812664,URN:CornellLabOfOrnithology:EBIRD:OBS449182124,2016-12-14,species,Sandhill Crane,Antigone canadensis,Tennessee,Williamson,,,35.948203,-86.777315,obsr215915,,2016,12,offseason,POINT (-86.7773151 35.9482031)
2813273,URN:CornellLabOfOrnithology:EBIRD:OBS448479574,2016-12-09,species,Sandhill Crane,Antigone canadensis,Tennessee,Wilson,,,36.040910,-86.351945,obsr673187,<br>Submitted from eBird Android 1.3,2016,12,offseason,POINT (-86.3519453 36.0409101)


## Davidson County birds (downtown Nashville)

In [56]:
# interested in 50-75 percentile
nashville_birds = birds.loc[birds.county == 'Davidson']
nashville_birds_count = nashville_birds.common_name.value_counts().reset_index()
nashville_birds_count = nashville_birds_count.loc[(nashville_birds_count['count']>33)&(nashville_birds_count['count']<120)]

# creating list of populations of focus
nashville_birds_list = nashville_birds_count.common_name.to_list()

# saving back to nashville birds
nashville_birds = nashville_birds.loc[nashville_birds.common_name.isin(nashville_birds_list)].copy()

# grouping by year & common name for count
nash_birds_count = nashville_birds.groupby(['observation_year', 'common_name']).count()[['global_unique_identifier']].reset_index()
nash_birds_count.columns = ['observation_year', 'common_name', 'count']

# adding geometry column for all nash bird sightings
nashville_birds['geometry'] = nashville_birds.apply(lambda x: Point((float(x.long),
                                                                  float(x.lat))),
                                                 axis=1)

In [29]:
nashville_birds_count.describe()

Unnamed: 0,count
count,42.0
mean,68.785714
std,25.930052
min,37.0
25%,46.0
50%,61.0
75%,90.75
max,117.0
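
The count cutoffs in the Davidson County cell are hard-coded. Assuming they were read off the 50th and 75th percentiles of the per-species counts, they could instead be computed with `Series.quantile`, as in this sketch on made-up data (species names `A` to `H`, and the names `toy`, `species_counts`, `band` and `species_in_band`, are hypothetical).

```python
import pandas as pd

# made-up sightings: species A seen once, B twice, ... H eight times
toy = pd.DataFrame({
    'common_name': ['A'] * 1 + ['B'] * 2 + ['C'] * 3 + ['D'] * 4
                   + ['E'] * 5 + ['F'] * 6 + ['G'] * 7 + ['H'] * 8
})

# sightings per species
species_counts = toy['common_name'].value_counts()

# 50th and 75th percentile of the per-species counts
lower, upper = species_counts.quantile([0.50, 0.75])

# strict inequalities, mirroring the > and < used in the cell above
band = species_counts[(species_counts > lower) & (species_counts < upper)]
species_in_band = band.index.tolist()
```

Computing the bounds this way keeps the band correct if the year range or county changes.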


## IBA

In [30]:
# creating subset of birds sighted within IBAs
# .copy() makes the subset independent of birds, which avoids the SettingWithCopyWarning
iba_birds = birds.loc[~birds.iba_code.isna()].copy()
# keeping only the numeric part of codes such as 'US-TN_2874'
iba_birds['iba_code'] = [x[6:10] for x in iba_birds.iba_code]
iba_birds = iba_birds.astype({'iba_code':int})
iba_birds = pd.merge(iba_birds, iba_names, on='iba_code', how='left')
iba_birds = iba_birds[['global_unique_identifier', 'observation_date', 'category', 'common_name', 'scientific_name', 'state', 'county', 'iba_code', 'lat', 'long', 'observer_id', 'trip_comments', 'observation_year', 'iba_name']]
iba_birds.head(3)


Unnamed: 0,global_unique_identifier,observation_date,category,common_name,scientific_name,state,county,iba_code,lat,long,observer_id,trip_comments,observation_year,iba_name
0,URN:CornellLabOfOrnithology:EBIRD:OBS178183365,2007-01-20,species,American Crow,Corvus brachyrhynchos,Tennessee,Blount,2865,35.604167,-83.784691,obsr58986,,2007,Southern Blue Ridge
1,URN:CornellLabOfOrnithology:EBIRD:OBS178183148,2007-01-06,species,American Crow,Corvus brachyrhynchos,Tennessee,Blount,2865,35.604167,-83.784691,obsr58986,,2007,Southern Blue Ridge
2,URN:CornellLabOfOrnithology:EBIRD:OBS178182618,2007-01-04,species,American Crow,Corvus brachyrhynchos,Tennessee,Blount,2865,35.604167,-83.784691,obsr58986,,2007,Southern Blue Ridge
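
The fixed slice `x[6:10]` only works while every code has a four-digit suffix at the same position. A more tolerant sketch, assuming the codes always end in an underscore followed by digits (as in `US-TN_2874`), pulls the number out with a regular expression. The `toy` and `in_iba` frames are made up for illustration.

```python
import pandas as pd

# toy codes; None stands in for sightings outside any IBA
toy = pd.DataFrame({'iba_code': ['US-TN_2874', None, 'US-TN_2865']})

# keep rows with a code; .copy() so the assignment below acts on an independent frame
in_iba = toy.loc[toy['iba_code'].notna()].copy()

# capture the digits after the final underscore and convert them to int
in_iba['iba_code'] = (in_iba['iba_code']
                      .str.extract(r'_(\d+)$', expand=False)
                      .astype(int))
```

With integer codes on both sides, the merge against `iba_names` works as before.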


In [31]:
# creating separate list of sightings outside of IBA regions
# .copy() makes the subset independent of birds, which avoids the SettingWithCopyWarning
birds_outside_iba = birds.loc[birds.iba_code.isna()].copy()
birds_outside_iba['within_iba'] = False
# keeping only the needed columns (this also drops usfws_code and iba_code)
birds_outside_iba = birds_outside_iba[['global_unique_identifier', 'observation_date', 'category', 'common_name', 'scientific_name', 'state', 'county', 'lat', 'long', 'observer_id', 'trip_comments', 'observation_year', 'within_iba']]

birds_outside_iba.head()


Unnamed: 0,global_unique_identifier,observation_date,category,common_name,scientific_name,state,county,lat,long,observer_id,trip_comments,observation_year,within_iba
0,URN:CornellLabOfOrnithology:EBIRD:OBS36173336,2007-01-18,species,American Crow,Corvus brachyrhynchos,Tennessee,Anderson,36.116386,-84.110001,obsr104960,,2007,False
1,URN:CornellLabOfOrnithology:EBIRD:OBS269070598,2007-01-16,species,American Crow,Corvus brachyrhynchos,Tennessee,Anderson,36.116386,-84.110001,obsr242764,,2007,False
2,URN:CornellLabOfOrnithology:EBIRD:OBS36173326,2007-01-18,species,American Kestrel,Falco sparverius,Tennessee,Anderson,36.116386,-84.110001,obsr104960,,2007,False
3,URN:CornellLabOfOrnithology:EBIRD:OBS36173330,2007-01-18,species,Bufflehead,Bucephala albeola,Tennessee,Anderson,36.116386,-84.110001,obsr104960,,2007,False
4,URN:CornellLabOfOrnithology:EBIRD:OBS269070604,2007-01-16,species,Bufflehead,Bucephala albeola,Tennessee,Anderson,36.116386,-84.110001,obsr242764,,2007,False


## USFWS

In [32]:
# creating subset of birds sighted within usfws polygon:
birds_usfws = birds.loc[~birds.usfws_code.isna()]

# joining birds_usfws with usfws codes
birds_usfws = pd.merge(birds_usfws, usfws_codes, on='usfws_code', how='left')

birds_usfws.head(3)

Unnamed: 0,global_unique_identifier,observation_date,category,common_name,scientific_name,state,county,iba_code,usfws_code,lat,long,observer_id,trip_comments,observation_year,observation_month,season,usfws_name
0,URN:CornellLabOfOrnithology:EBIRD:OBS36528641,2007-01-29,species,American Black Duck,Anas rubripes,Tennessee,Decatur,US-TN_2874,USFWS_723,35.688577,-88.031288,obsr56053,"Mostly fair, wind L-M, -2 to -1C",2007,1,offseason,TENNESSEE NATIONAL WILDLIFE REFUGE
1,URN:CornellLabOfOrnithology:EBIRD:OBS36528636,2007-01-29,species,American Crow,Corvus brachyrhynchos,Tennessee,Decatur,US-TN_2874,USFWS_723,35.688577,-88.031288,obsr56053,"Mostly fair, wind L-M, -2 to -1C",2007,1,offseason,TENNESSEE NATIONAL WILDLIFE REFUGE
2,URN:CornellLabOfOrnithology:EBIRD:OBS36528647,2007-01-29,species,American Robin,Turdus migratorius,Tennessee,Decatur,US-TN_2874,USFWS_723,35.688577,-88.031288,obsr56053,"Mostly fair, wind L-M, -2 to -1C",2007,1,offseason,TENNESSEE NATIONAL WILDLIFE REFUGE


In [33]:
# creating separate list of sightings outside of USFWS regions
# .copy() makes the subset independent of birds, which avoids the SettingWithCopyWarning
birds_outside_usfws = birds.loc[birds.usfws_code.isna()].copy()
# note: drop() returns a new DataFrame; the result is not assigned, so usfws_code and iba_code remain in the output below
birds_outside_usfws.drop(columns=['usfws_code', 'iba_code'])
birds_outside_usfws['within_park'] = False

birds_outside_usfws.head()


Unnamed: 0,global_unique_identifier,observation_date,category,common_name,scientific_name,state,county,iba_code,usfws_code,lat,long,observer_id,trip_comments,observation_year,observation_month,season,within_park
0,URN:CornellLabOfOrnithology:EBIRD:OBS36173336,2007-01-18,species,American Crow,Corvus brachyrhynchos,Tennessee,Anderson,,,36.116386,-84.110001,obsr104960,,2007,1,offseason,False
1,URN:CornellLabOfOrnithology:EBIRD:OBS269070598,2007-01-16,species,American Crow,Corvus brachyrhynchos,Tennessee,Anderson,,,36.116386,-84.110001,obsr242764,,2007,1,offseason,False
2,URN:CornellLabOfOrnithology:EBIRD:OBS36173326,2007-01-18,species,American Kestrel,Falco sparverius,Tennessee,Anderson,,,36.116386,-84.110001,obsr104960,,2007,1,offseason,False
3,URN:CornellLabOfOrnithology:EBIRD:OBS36173330,2007-01-18,species,Bufflehead,Bucephala albeola,Tennessee,Anderson,,,36.116386,-84.110001,obsr104960,,2007,1,offseason,False
4,URN:CornellLabOfOrnithology:EBIRD:OBS269070604,2007-01-16,species,Bufflehead,Bucephala albeola,Tennessee,Anderson,,,36.116386,-84.110001,obsr242764,,2007,1,offseason,False


## EDA

## Top 10 lists

#### with IBA code

In [34]:
# creating subset called outside iba top 10, grouping by year & common name
outside_iba_top10 = birds_outside_iba.loc[birds_outside_iba.common_name.isin(['Northern Cardinal', 'American Crow', 'Carolina Chickadee', 'Blue Jay', 'Carolina Wren', 'Tufted Titmouse', 'Mourning Dove', 'American Robin', 'Red-bellied Woodpecker', 'Northern Mockingbird'])]
outside_iba_top10 = outside_iba_top10.groupby(['observation_year', 'common_name']).count()[['global_unique_identifier']].reset_index()
outside_iba_top10.columns = ['observation_year', 'common_name', 'count']

# adding total count per year as a column
total_birds = outside_iba_top10.groupby('observation_year').sum()[['count']]
total_birds = total_birds.reset_index()
total_birds.columns=['observation_year', 'total_count_by_year']
# merging back
outside_iba_top10 = pd.merge(outside_iba_top10, total_birds, on='observation_year', how='inner')

# adding perc column
outside_iba_top10['perc_total_sightings'] = round(outside_iba_top10['count']/outside_iba_top10.total_count_by_year*100,2)

outside_iba_top10

Unnamed: 0,observation_year,common_name,count,total_count_by_year,perc_total_sightings
0,2007,American Crow,1784,18343,9.73
1,2007,American Robin,1455,18343,7.93
2,2007,Blue Jay,2055,18343,11.20
3,2007,Carolina Chickadee,2031,18343,11.07
4,2007,Carolina Wren,2010,18343,10.96
...,...,...,...,...,...
95,2016,Mourning Dove,14434,150702,9.58
96,2016,Northern Cardinal,19397,150702,12.87
97,2016,Northern Mockingbird,11260,150702,7.47
98,2016,Red-bellied Woodpecker,12827,150702,8.51
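
The species names in the cell above are typed in by hand. Assuming they came from a `value_counts` ranking (as the "Value counts on bird species to get top 10" note suggests), the lists could be derived directly, which keeps them in sync with the data. The helper name `ranked_species` and the toy data are hypothetical.

```python
import pandas as pd

def ranked_species(df, start, stop):
    """Return the common names ranked start..stop-1 (0-based) by number of sightings."""
    counts = df['common_name'].value_counts()  # sorted from most to least sighted
    return counts.iloc[start:stop].index.tolist()

# toy sightings standing in for birds_outside_iba
toy = pd.DataFrame({
    'common_name': ['Blue Jay'] * 3 + ['American Crow'] * 2 + ['Mallard']
})

top2 = ranked_species(toy, 0, 2)    # analogous to a top 10 with (0, 10)
next1 = ranked_species(toy, 2, 3)   # analogous to ranks 11-20 with (10, 20)
```

On the real tables the calls would be `ranked_species(birds_outside_iba, 0, 10)` and `ranked_species(birds_outside_iba, 10, 20)`. Ties in the counts may be ordered arbitrarily.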


In [35]:
# creating subset called outside iba next top 10, grouping by year & common name
outside_iba_next_top10 = birds_outside_iba.loc[birds_outside_iba.common_name.isin(['American Goldfinch', 'Downy Woodpecker', 'Eastern Towhee', 'European Starling', 'Eastern Bluebird', 'Song Sparrow', 'Canada Goose', 'Great Blue Heron', 'House Finch', 'White-breasted Nuthatch'])]
outside_iba_next_top10 = outside_iba_next_top10.groupby(['observation_year', 'common_name']).count()[['global_unique_identifier']].reset_index()
outside_iba_next_top10.columns = ['observation_year', 'common_name', 'count']

# adding total count per year as a column
total_birds = outside_iba_next_top10.groupby('observation_year').sum()[['count']]
total_birds = total_birds.reset_index()
total_birds.columns=['observation_year', 'total_count_by_year']
# merging back
outside_iba_next_top10 = pd.merge(outside_iba_next_top10, total_birds, on='observation_year', how='inner')

# adding perc column
outside_iba_next_top10['perc_total_sightings'] = round(outside_iba_next_top10['count']/outside_iba_next_top10.total_count_by_year*100,2)

outside_iba_next_top10

Unnamed: 0,observation_year,common_name,count,total_count_by_year,perc_total_sightings
0,2007,American Goldfinch,1726,11814,14.61
1,2007,Canada Goose,901,11814,7.63
2,2007,Downy Woodpecker,1314,11814,11.12
3,2007,Eastern Bluebird,1330,11814,11.26
4,2007,Eastern Towhee,1296,11814,10.97
...,...,...,...,...,...
95,2016,European Starling,10576,91569,11.55
96,2016,Great Blue Heron,7937,91569,8.67
97,2016,House Finch,8895,91569,9.71
98,2016,Song Sparrow,9296,91569,10.15


In [36]:
# creating subset called iba top 10, grouping by year & common name
iba_top10 = iba_birds.loc[iba_birds.common_name.isin(['Great Blue Heron', 'American Crow', 'Northern Cardinal', 'Downy Woodpecker', 'American Robin', 'Carolina Wren', 'Carolina Chickadee', 'Red-bellied Woodpecker', 'Blue Jay', 'Tufted Titmouse'])]
iba_top10 = iba_top10.groupby(['observation_year', 'common_name']).count()[['global_unique_identifier']].reset_index()
iba_top10.columns = ['observation_year', 'common_name', 'count']

# adding total count per year as a column
total_birds = iba_top10.groupby('observation_year').sum()[['count']]
total_birds = total_birds.reset_index()
total_birds.columns=['observation_year', 'total_count_by_year']
# merging back
iba_top10 = pd.merge(iba_top10, total_birds, on='observation_year', how='inner')

# adding perc column
iba_top10['perc_total_sightings'] = round(iba_top10['count']/iba_top10.total_count_by_year*100,2)

iba_top10

Unnamed: 0,observation_year,common_name,count,total_count_by_year,perc_total_sightings
0,2007,American Crow,246,2059,11.95
1,2007,American Robin,175,2059,8.50
2,2007,Blue Jay,167,2059,8.11
3,2007,Carolina Chickadee,250,2059,12.14
4,2007,Carolina Wren,238,2059,11.56
...,...,...,...,...,...
95,2016,Downy Woodpecker,1770,23326,7.59
96,2016,Great Blue Heron,1898,23326,8.14
97,2016,Northern Cardinal,2846,23326,12.20
98,2016,Red-bellied Woodpecker,2195,23326,9.41


In [37]:
# creating subset called iba next top 10 (11-20), grouping by year & common name
iba_next_top10 = iba_birds.loc[iba_birds.common_name.isin(['Pileated Woodpecker', 'Canada Goose', 'American Goldfinch', 'Mourning Dove', 'Eastern Towhee', 'Eastern Bluebird', 'White-breasted Nuthatch', 'Mallard', 'Indigo Bunting', 'Red-eyed Vireo'])]
iba_next_top10 = iba_next_top10.groupby(['observation_year', 'common_name']).count()[['global_unique_identifier']].reset_index()
iba_next_top10.columns = ['observation_year', 'common_name', 'count']

# adding total count per year as a column
total_birds = iba_next_top10.groupby('observation_year').sum()[['count']]
total_birds = total_birds.reset_index()
total_birds.columns=['observation_year', 'total_count_by_year']
# merging back
iba_next_top10 = pd.merge(iba_next_top10, total_birds, on='observation_year', how='inner')

# adding perc column
iba_next_top10['perc_total_sightings'] = round(iba_next_top10['count']/iba_next_top10.total_count_by_year*100,2)
iba_next_top10

Unnamed: 0,observation_year,common_name,count,total_count_by_year,perc_total_sightings
0,2007,American Goldfinch,156,1430,10.91
1,2007,Canada Goose,169,1430,11.82
2,2007,Eastern Bluebird,160,1430,11.19
3,2007,Eastern Towhee,106,1430,7.41
4,2007,Indigo Bunting,131,1430,9.16
...,...,...,...,...,...
95,2016,Mallard,1412,13977,10.10
96,2016,Mourning Dove,1511,13977,10.81
97,2016,Pileated Woodpecker,1595,13977,11.41
98,2016,Red-eyed Vireo,1244,13977,8.90
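
The four cells above repeat the same steps: filter to a species list, count per year and species, add the yearly total, and compute the share. One way to fold them into a single function is sketched below. The function name `yearly_species_share` and the toy data are hypothetical. It uses `size()` (row count) where the cells above use `count()` on `global_unique_identifier`; the two agree only if that column has no missing values.

```python
import pandas as pd

def yearly_species_share(df, species):
    """Count sightings per year and species, plus each species' share of the yearly total."""
    subset = df.loc[df['common_name'].isin(species)]
    counts = (subset.groupby(['observation_year', 'common_name'])
                    .size()
                    .reset_index(name='count'))
    # transform('sum') broadcasts each year's total back onto its rows, so no merge is needed
    counts['total_count_by_year'] = counts.groupby('observation_year')['count'].transform('sum')
    counts['perc_total_sightings'] = (counts['count'] / counts['total_count_by_year'] * 100).round(2)
    return counts

# toy sightings (values are made up)
toy = pd.DataFrame({
    'observation_year': [2007, 2007, 2007, 2007, 2008],
    'common_name': ['Blue Jay', 'Blue Jay', 'Blue Jay', 'Mallard', 'Mallard'],
})
shares = yearly_species_share(toy, ['Blue Jay', 'Mallard'])
```

Each of the four tables would then be a single call, for example `yearly_species_share(iba_birds, [...])` with the matching species list.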


#### list of iba birds for map

In [38]:
# creating subsets of iba birds with all sightings of top 10

# .copy() makes each subset independent of iba_birds, which avoids the SettingWithCopyWarning
iba_top10_list = iba_birds.loc[iba_birds.common_name.isin(['Great Blue Heron', 'American Crow', 'Northern Cardinal', 'Downy Woodpecker', 'American Robin', 'Carolina Wren', 'Carolina Chickadee', 'Red-bellied Woodpecker', 'Blue Jay', 'Tufted Titmouse'])].copy()
iba_top10_list['geometry'] = iba_top10_list.apply(lambda x: Point((float(x.long),
                                                                  float(x.lat))),
                                                 axis=1)

# creating subsets of iba birds with all sightings of next top 10

iba_next_top10_list = iba_birds.loc[iba_birds.common_name.isin(['Pileated Woodpecker', 'Canada Goose', 'American Goldfinch', 'Mourning Dove', 'Eastern Towhee', 'Eastern Bluebird', 'White-breasted Nuthatch', 'Mallard', 'Indigo Bunting', 'Red-eyed Vireo'])].copy()
iba_next_top10_list['geometry'] = iba_next_top10_list.apply(lambda x: Point((float(x.long),
                                                                  float(x.lat))),
                                                 axis=1)
# joining two dfs into top 20 list
iba_top20 = pd.concat([iba_top10_list, iba_next_top10_list])


#### list of birds outside iba for map

In [41]:
# creating subsets of outside iba birds with all sightings of top 10

# .copy() makes each subset independent of birds_outside_iba, which avoids the SettingWithCopyWarning
outside_iba_top10_list = birds_outside_iba.loc[birds_outside_iba.common_name.isin(['Northern Cardinal', 'American Crow', 'Carolina Chickadee', 'Blue Jay', 'Carolina Wren', 'Tufted Titmouse', 'Mourning Dove', 'American Robin', 'Red-bellied Woodpecker', 'Northern Mockingbird'])].copy()
outside_iba_top10_list['geometry'] = outside_iba_top10_list.apply(lambda x: Point((float(x.long),
                                                                  float(x.lat))),
                                                 axis=1)

# creating subsets of outside iba birds with all sightings of next top 10

outside_iba_next_top10_list = birds_outside_iba.loc[birds_outside_iba.common_name.isin(['American Goldfinch', 'Downy Woodpecker', 'Eastern Towhee', 'European Starling', 'Eastern Bluebird', 'Song Sparrow', 'Canada Goose', 'Great Blue Heron', 'House Finch', 'White-breasted Nuthatch'])].copy()
outside_iba_next_top10_list['geometry'] = outside_iba_next_top10_list.apply(lambda x: Point((float(x.long),
                                                                  float(x.lat))),
                                                 axis=1)

# joining two dfs into top 20 list
outside_iba_top20 = pd.concat([outside_iba_top10_list, outside_iba_next_top10_list])


#### with USFWS code

In [42]:
# # creating subset called outside usfws top 10, grouping by year & common name
# outside_usfws_top10 = birds_outside_usfws.loc[birds_outside_usfws.common_name.isin(['Northern Cardinal', 'American Crow', 'Carolina Chickadee', 'Blue Jay', 'Carolina Wren', 'Tufted Titmouse', 'Mourning Dove', 'American Robin', 'Red-bellied Woodpecker', 'Northern Mockingbird'])]
# outside_usfws_top10 = outside_usfws_top10.groupby(['observation_year', 'common_name']).count()[['global_unique_identifier']].reset_index()
# outside_usfws_top10.columns = ['observation_year', 'common_name', 'count']

# # adding total count per year as a column
# total_birds = outside_usfws_top10.groupby('observation_year').sum()[['count']]
# total_birds = total_birds.reset_index()
# total_birds.columns=['observation_year', 'total_count_by_year']
# # merging back
# outside_usfws_top10 = pd.merge(outside_usfws_top10, total_birds, on='observation_year', how='inner')

# # adding perc column
# outside_usfws_top10['perc_total_sightings'] = round(outside_usfws_top10['count']/outside_usfws_top10.total_count_by_year*100,2)

# outside_usfws_top10
# # .loc[outside_usfws_top10.common_name == 'Northern Mockingbird']

In [43]:
# # creating subset called outside usfws next top 10, grouping by year & common name
# outside_usfws_next_top10 = birds_outside_usfws.loc[birds_outside_usfws.common_name.isin(['American Goldfinch', 'Downy Woodpecker', 'Eastern Towhee', 'European Starling', 'Eastern Bluebird', 'Song Sparrow', 'Canada Goose', 'Great Blue Heron', 'House Finch', 'White-breasted Nuthatch'])]
# outside_usfws_next_top10 = outside_usfws_next_top10.groupby(['observation_year', 'common_name']).count()[['global_unique_identifier']].reset_index()
# outside_usfws_next_top10.columns = ['observation_year', 'common_name', 'count']

# # adding total count per year as a column
# total_birds = outside_usfws_next_top10.groupby('observation_year').sum()[['count']]
# total_birds = total_birds.reset_index()
# total_birds.columns=['observation_year', 'total_count_by_year']
# # merging back
# outside_usfws_next_top10 = pd.merge(outside_usfws_next_top10, total_birds, on='observation_year', how='inner')

# # adding perc column
# outside_usfws_next_top10['perc_total_sightings'] = round(outside_usfws_next_top10['count']/outside_usfws_next_top10.total_count_by_year*100,2)

# outside_usfws_next_top10

In [44]:
# # creating subset called usfws top 10, grouping by year & common name
# usfws_top10 = birds_usfws.loc[birds_usfws.common_name.isin(['Great Blue Heron', 'American Crow', 'Northern Cardinal', 'Killdeer', 'Canada Goose', 'Carolina Wren', 'Mallard', 'Red-bellied Woodpecker', 'Bald Eagle', 'Tufted Titmouse'])]
# usfws_top10 = usfws_top10.groupby(['observation_year', 'common_name']).count()[['global_unique_identifier']].reset_index()
# usfws_top10.columns = ['observation_year', 'common_name', 'count']

# # adding total count per year as a column
# total_birds = usfws_top10.groupby('observation_year').sum()[['count']]
# total_birds = total_birds.reset_index()
# total_birds.columns=['observation_year', 'total_count_by_year']
# # merging back
# usfws_top10 = pd.merge(usfws_top10, total_birds, on='observation_year', how='inner')

# # adding perc column
# usfws_top10['perc_total_sightings'] = round(usfws_top10['count']/usfws_top10.total_count_by_year*100,2)

# usfws_top10

In [45]:
# # creating subset called usfws next top 10 (11-20), grouping by year & common name
# usfws_next_top10 = birds_usfws.loc[birds_usfws.common_name.isin(['Turkey Vulture', 'Double-crested Cormorant', 'Blue Jay', 'Ring-billed Gull', 'Carolina Chickadee', 'Red-winged Blackbird', 'Eastern Bluebird', 'Pied-billed Grebe', 'Great Egret', 'Mourning Dove'])]
# usfws_next_top10 = usfws_next_top10.groupby(['observation_year', 'common_name']).count()[['global_unique_identifier']].reset_index()
# usfws_next_top10.columns = ['observation_year', 'common_name', 'count']

# # adding total count per year as a column
# total_birds = usfws_next_top10.groupby('observation_year').sum()[['count']]
# total_birds = total_birds.reset_index()
# total_birds.columns=['observation_year', 'total_count_by_year']
# # merging back
# usfws_next_top10 = pd.merge(usfws_next_top10, total_birds, on='observation_year', how='inner')

# # adding perc column
# usfws_next_top10['perc_total_sightings'] = round(usfws_next_top10['count']/usfws_next_top10.total_count_by_year*100,2)
# usfws_next_top10

### Unique observer id count

In [46]:
# total number of unique birders 2007-2016
len(birds.observer_id.unique())

6419
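
A quick note on the count above: `len(series.unique())` treats a missing value as its own entry, while `Series.nunique()` skips missing values by default. The sketch below uses a made-up series (the ids are hypothetical, not from the birding data) to show the difference. Whether `observer_id` actually contains any missing values is an assumption that would need checking.

```python
import pandas as pd

# toy observer ids, one of them missing (hypothetical values)
observer_ids = pd.Series(['obs1', 'obs2', 'obs1', None])

# unique() keeps the missing value as a distinct entry
with_missing = len(observer_ids.unique())

# nunique() drops missing values by default (dropna=True)
without_missing = observer_ids.nunique()

print(with_missing, without_missing)
```

If the two numbers match on the real column, either form gives the same birder count.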

In [47]:
# for all birds

# count of unique birders per year
birders_per_year = birds.groupby('observation_year')['observer_id'].unique().reset_index()
birders_per_year['count_birders'] = birders_per_year['observer_id'].str.len()

# getting count of sightings per year
birds_per_year = birds.groupby('observation_year').count()[['global_unique_identifier']].reset_index()
birds_per_year.columns = ['observation_year', 'count_birds']
birds_and_birders = pd.merge(birders_per_year, birds_per_year, on='observation_year', how='inner')
birds_and_birders['birds_per_birder'] = round(birds_and_birders.count_birds/birds_and_birders.count_birders,2)
birds_and_birders = birds_and_birders[['observation_year', 'count_birders', 'count_birds', 'birds_per_birder']]
birds_and_birders

Unnamed: 0,observation_year,count_birders,count_birds,birds_per_birder
0,2007,376,75357,200.42
1,2008,418,83373,199.46
2,2009,496,105835,213.38
3,2010,637,122670,192.57
4,2011,740,157554,212.91
5,2012,990,246335,248.82
6,2013,1244,331428,266.42
7,2014,1579,457302,289.61
8,2015,1889,533285,282.31
9,2016,2164,612849,283.2


In [48]:
# for IBA birds

# count of unique birders per year
birders_per_year = iba_birds.groupby('observation_year')['observer_id'].unique().reset_index()
birders_per_year['count_birders'] = birders_per_year['observer_id'].str.len()

# getting count of sightings per year
birds_per_year = iba_birds.groupby('observation_year').count()[['global_unique_identifier']].reset_index()
birds_per_year.columns = ['observation_year', 'count_birds']
birds_and_birders = pd.merge(birders_per_year, birds_per_year, on='observation_year', how='inner')
birds_and_birders['birds_per_birder'] = round(birds_and_birders.count_birds/birds_and_birders.count_birders,2)
iba_birds_and_birders = birds_and_birders[['observation_year', 'count_birders', 'count_birds', 'birds_per_birder']]
iba_birds_and_birders

Unnamed: 0,observation_year,count_birders,count_birds,birds_per_birder
0,2007,137,11683,85.28
1,2008,166,12904,77.73
2,2009,166,22781,137.23
3,2010,211,21803,103.33
4,2011,276,30147,109.23
5,2012,367,44223,120.5
6,2013,482,53587,111.18
7,2014,593,69653,117.46
8,2015,684,92578,135.35
9,2016,764,111285,145.66


In [57]:
# for all nashville birds (no restriction)

all_nashville_birds = birds.loc[birds.county == 'Davidson']

# count of unique birders per year
birders_per_year = all_nashville_birds.groupby('observation_year')['observer_id'].unique().reset_index()
birders_per_year['count_birders'] = birders_per_year['observer_id'].str.len()

# getting count of sightings per year
birds_per_year = all_nashville_birds.groupby('observation_year').count()[['global_unique_identifier']].reset_index()
birds_per_year.columns = ['observation_year', 'count_birds']
birds_and_birders = pd.merge(birders_per_year, birds_per_year, on='observation_year', how='inner')
birds_and_birders['birds_per_birder'] = round(birds_and_birders.count_birds/birds_and_birders.count_birders,2)
all_nash_birds_and_birders = birds_and_birders[['observation_year', 'count_birders', 'count_birds', 'birds_per_birder']]
all_nash_birds_and_birders

Unnamed: 0,observation_year,count_birders,count_birds,birds_per_birder
0,2007,50,1760,35.2
1,2008,66,3544,53.7
2,2009,72,14607,202.88
3,2010,106,14476,136.57
4,2011,116,23809,205.25
5,2012,148,30073,203.2
6,2013,222,29462,132.71
7,2014,239,34825,145.71
8,2015,333,50832,152.65
9,2016,381,58557,153.69


In [58]:
# for nashville birds (restricted to 3rd quartile)

# count of unique birders per year
birders_per_year = nashville_birds.groupby('observation_year')['observer_id'].unique().reset_index()
birders_per_year['count_birders'] = birders_per_year['observer_id'].str.len()

# getting count of sightings per year
birds_per_year = nashville_birds.groupby('observation_year').count()[['global_unique_identifier']].reset_index()
birds_per_year.columns = ['observation_year', 'count_birds']
birds_and_birders = pd.merge(birders_per_year, birds_per_year, on='observation_year', how='inner')
birds_and_birders['birds_per_birder'] = round(birds_and_birders.count_birds/birds_and_birders.count_birders,2)
nash_birds_and_birders = birds_and_birders[['observation_year', 'count_birders', 'count_birds', 'birds_per_birder']]
nash_birds_and_birders

Unnamed: 0,observation_year,count_birders,count_birds,birds_per_birder
0,2007,10,25,2.5
1,2008,17,57,3.35
2,2009,23,195,8.48
3,2010,23,110,4.78
4,2011,32,277,8.66
5,2012,36,339,9.42
6,2013,53,379,7.15
7,2014,56,409,7.3
8,2015,80,618,7.72
9,2016,73,480,6.58
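
The four cells above repeat the same steps on different dataframes. A minimal sketch of a shared helper follows, assuming each input has the `observation_year`, `observer_id` and `global_unique_identifier` columns used above. The function name `birds_per_birder` and the toy data are hypothetical. It uses `nunique`, which ignores missing observer ids, so it could differ slightly from the `unique()` plus `.str.len()` approach if any ids are missing.

```python
import pandas as pd

def birds_per_birder(df: pd.DataFrame) -> pd.DataFrame:
    """Per-year unique observers, sightings, and sightings per observer."""
    summary = (
        df.groupby('observation_year')
          .agg(
              # number of distinct birders reporting in the year
              count_birders=('observer_id', 'nunique'),
              # number of sighting records in the year
              count_birds=('global_unique_identifier', 'count'),
          )
          .reset_index()
    )
    summary['birds_per_birder'] = (
        summary.count_birds / summary.count_birders
    ).round(2)
    return summary

# tiny made-up frame, only to show the shape of the result
toy = pd.DataFrame({
    'observation_year': [2007, 2007, 2007, 2008],
    'observer_id': ['obs1', 'obs1', 'obs2', 'obs3'],
    'global_unique_identifier': ['a', 'b', 'c', 'd'],
})
toy_summary = birds_per_birder(toy)
print(toy_summary)
```

With a helper like this, each table could come from one call, e.g. `birds_per_birder(iba_birds)`, and each result would keep its own name instead of reusing `birds_and_birders`.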


#### list of sensitive species for map

In [62]:
# building a shapely Point (longitude, latitude) for each sensitive species row
sensitive_species['geometry'] = sensitive_species.apply(
    lambda x: Point(float(x.long), float(x.lat)),
    axis=1
)

## Export CSVs:

In [63]:
## TOP 10 LISTS/COUNTS

# iba top 10
iba_top10.to_csv('iba_top10.csv')

# iba next top 10
iba_next_top10.to_csv('iba_next_top10.csv')

# outside iba top 10
outside_iba_top10.to_csv('outside_iba_top10.csv')

# outside iba next top 10
outside_iba_next_top10.to_csv('outside_iba_next_top10.csv')

# nash birds count
nash_birds_count.to_csv('nash_birds_count.csv')

# birds of interest count
birds_of_interest_count.to_csv('birds_of_interest_count.csv')

## MISC

# birds by season
birds_by_season.to_csv('birds_by_season.csv')

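# birds per birder
# NOTE: `birds_and_birders` is reassigned in cells 48, 57 and 58, so at this point it holds
# the restricted Nashville table, not the all-birds table from cell 47.
# Re-run cell 47 right before this line to export the all-birds table.
birds_and_birders.to_csv('birds_per_birder.csv')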

# iba birds per birder
iba_birds_and_birders.to_csv('iba_birds_per_birder.csv')

# restricted nashville birds per birder
nash_birds_and_birders.to_csv('res_nash_birds_per_birder.csv')

# all nashville birds per birder
all_nash_birds_and_birders.to_csv('all_nash_birds_per_birder.csv')

## MAPS

# iba top 20 for map
iba_top20.to_csv('iba_top20_map.csv')

# outside iba top 20 for map
outside_iba_top20.to_csv('outside_iba_top20_map.csv')

# birds of interest for map
birds_of_interest.to_csv('birds_of_interest_map.csv')

# sensitive species list for map
sensitive_species.to_csv('sensitive_species_map.csv')

# nashville birds for map
nashville_birds.to_csv('nashville_birds_map.csv')
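
The export cell above could also be driven by a dictionary of names and dataframes. The sketch below is one way to do that, not the notebook's actual method: `export_all` and the demo frame are hypothetical. It passes `index=False`, assuming the row index is not needed downstream, which keeps an `Unnamed: 0` column from appearing when the CSV is read back.

```python
import tempfile
from pathlib import Path

import pandas as pd

def export_all(frames: dict, out_dir) -> list:
    """Write each dataframe to <out_dir>/<name>.csv and return the paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for name, frame in frames.items():
        path = out_dir / f'{name}.csv'
        # index=False leaves the row index out of the file
        frame.to_csv(path, index=False)
        written.append(path)
    return written

# toy frame standing in for one of the real tables
demo = {
    'birds_per_birder': pd.DataFrame(
        {'observation_year': [2007], 'count_birders': [376]}
    ),
}

# writing to a temporary folder so the demo leaves nothing behind
with tempfile.TemporaryDirectory() as tmp:
    paths = export_all(demo, tmp)
    round_trip = pd.read_csv(paths[0])

print(round_trip.columns.tolist())
```

In the notebook, the dictionary would map each file name (e.g. `'iba_top10'`) to its dataframe, and `out_dir` could be the current folder.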

### attempting webscraping:

Unfinished; come back to this later.

In [None]:
# come back to later!!
# download Selenium webdriver

In [None]:
# # importing beautiful soup
# import requests
# from bs4 import BeautifulSoup as BS

In [None]:
# # establishing connection
# URL = 'https://www.fws.gov/refuge/tennessee/map'
# response = requests.get(URL)

In [None]:
# # checking connection
# response.status_code

In [None]:
# # naming the parser explicitly avoids bs4's "no parser specified" warning
# soup = BS(response.text, 'html.parser')

In [None]:
# soup.find_all('path', attrs={'class':'leaflet-interactive'})