**Second notebook in analysis** 

**cleans site metadata**

In [1]:
import os
import pandas as pd
import numpy as np
datapath = '../data'
bombusMeta = pd.read_csv(os.path.join(datapath, 'BombusMetadata.tsv'), sep = '\t')

# Clean site metadata

- create sites for samples captured within the same geographic area
- define separate sets of sites with 5 km and 10 km diameters

## Helper Functions

In [2]:
from math import sin, cos, sqrt, atan2, radians

# approximate radius of earth in km
R = 6373.0

def getDistance(pos1, pos2): 
    '''
    gets the distance between two coordinate pairs in km
    takes 2 lat lon pairs and returns a float
    '''
    
    lat1, lon1 = map(radians, pos1)
    lat2, lon2 = map(radians, pos2)
    dlon = lon2 - lon1
    dlat = lat2 - lat1

    a = sin(dlat / 2)**2 + cos(lat1) * cos(lat2) * sin(dlon / 2)**2
    c = 2 * atan2(sqrt(a), sqrt(1 - a))

    return R * c

def checkSite(sites, dist, df, latCol, lonCol, siteCol):
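    '''
    checks that the max distance between points in the given sites
    is less than a threshold (km)
    takes sites, distance threshold, df, and column names
    takes sites as a list so that multiple can be combined
    returns (False, 0) if fewer than two of the sites have coordinates,
    (False, d) at the first pair farther apart than dist,
    otherwise (True, max distance)
    '''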
    maxd=0
    df2 = df[df[siteCol].isin(sites)].dropna(subset = [latCol, lonCol])
    if len(df2[siteCol].unique()) <2:
        return (False, 0)
    
    df2.reset_index(inplace=True)
    for index, row in df2.iterrows():
        for index2, row2 in df2.iloc[index:].iterrows():
            #print((row[latCol],row[lonCol]),(row2[latCol],row2[lonCol]))
            d = getDistance((row[latCol],row[lonCol]),(row2[latCol],row2[lonCol]))
            if d >dist:
                return (False, d)
            if d >maxd:
                maxd = d
            
    return (True, maxd)
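
A quick sanity check of the haversine helper can be sketched as follows. This is a standalone re-statement of the same formula under the hypothetical name `haversine_km`, with illustrative coordinates that are not taken from the dataset: one degree of latitude should come out at roughly 111 km, and a point compared with itself should be 0 km away.

```python
from math import radians, sin, cos, sqrt, atan2

R_KM = 6373.0  # same approximate Earth radius as the helper above


def haversine_km(pos1, pos2):
    """Great-circle distance in km between two (lat, lon) pairs given in degrees."""
    lat1, lon1 = map(radians, pos1)
    lat2, lon2 = map(radians, pos2)
    dlat, dlon = lat2 - lat1, lon2 - lon1
    a = sin(dlat / 2) ** 2 + cos(lat1) * cos(lat2) * sin(dlon / 2) ** 2
    return R_KM * 2 * atan2(sqrt(a), sqrt(1 - a))


# Illustrative points only: one degree of latitude along the same meridian.
one_degree = haversine_km((44.0, -69.0), (45.0, -69.0))
# Identical points should be exactly 0 km apart.
same_point = haversine_km((44.0, -69.0), (44.0, -69.0))
print(round(one_degree, 1), same_point)
```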

## 5 km cutoff

All samples have a collection site recorded by field research assistants. We want to cluster or split these into categorical collection sites. There is a lot of redundancy that should be easy to collapse (e.g., separate site names for different Colby College parking lots). We take the maximum foraging range of worker bees to be 5 km (Goulson, 2010) and want to form sites with diameters of 5 km while respecting potentially meaningful differences between the sites recorded by research assistants.

**make a new site column**

In [3]:
bombusMeta[['site_collapsed']] = bombusMeta[['collection_site']]

**Check if any pairs of sites lend themselves to collapsing (all samples within 5 km of each other)**

In [8]:
sites = bombusMeta['site_collapsed'].unique()
ignore = np.hstack((sites[44:49],sites[-7]))
for i in range(len(sites)):
    if not (sites[i] in ignore):
        for j in range(i+1, len(sites)):
            if not (sites[j] in ignore):
                #print(sites[j])
                collapse, maxd = checkSite([sites[i],sites[j]],
                                           5, bombusMeta, 'Latitude', 'Longitude', 'site_collapsed')
                if collapse:
                    print(sites[i], sites[j], maxd)

Eustis Parking Lot; Colby College Eustis Parking Lot; Colby College  0.3687506999934061
Eustis Parking Lot; Colby College Colby Observatory 0.533342815423434
Eustis Parking Lot; Colby College Colby Heights Steps 0.36279221087044566
Eustis Parking Lot; Colby College Dana walkway; Colby College 0.27846650956433433
Eustis Parking Lot; Colby College Colby College; Pierce Dorm 0.33061499034233516
Eustis Parking Lot; Colby College Quarry Rd; Waterville 2.2823447489135518
Allen Island; Garden Allen Island 0.6205811432910221
Allen Island; Garden Allen Island; Trail 0
Allen Island; Garden Garden; Allen Island 0.060593773942190746
Allen Island; Garden Allen Island North tip 0
Allen Island; Garden Allen Island Garden 0.008017975258743365
Port Clyde Tidal River; St. George 2.9209017130889414
Port Clyde Herrier Gut 0.5288453515674207
Kingfield Kingfield, ME 0
Kingfield New Portland 4.603317269339028
Stratton Stratton, ME 0
Eustis Parking Lot; Colby College  Colby Observatory 0.4522266924282057
Eust

**Check if any sites need splitting**

In [9]:
sites = bombusMeta['site_collapsed'].unique()
for site in sites:
    collapse, maxd = checkSite([site],5, bombusMeta, 'Latitude', 'Longitude', 'site_collapsed')
    if not collapse:
        print(site, maxd)

Eustis Parking Lot; Colby College 0
Allen Island; Garden 0
Port Clyde 0
Eustis; ME 0
Kingfield 0
Stratton 0
Eustis Parking Lot; Colby College  0
Kingfield, ME 0
Stratton, ME 0
nan 0
Johnson Dorm 0
Carrabassett Valley, ME 0
Cherryfield, ME 0
Harrington, ME 0
Great Wass Island 0
Jonesport, ME 0
Vinalhaven 0
Allen Island 0
Seal Harbor Road 0
Owl's Head 0
Solon, ME 0
Jackman, ME 0
West Forks, ME 0
Caratunk, ME 0
Moose River, ME 0
Blue Mountain 0
Bar Harbor 0
2 cal6 st Waterville 0
Rockland 0
Peaks Island 0
Swans Island 0
Hancork Point, ME 0
Eustis  0
East Hancock 0
Swans Island; Roadside 0
Allen Island; Trail 0
Swans Island; Quarry 0
Schooner Head Rd 0
Baileyville Big Stop 0
Monhegan Island 0
Bingham 0
Little Pembroke Brook 0
Agamont Park 0
Beddington 0
Manset Union Church 0
Aurora 0
Round The Island Rd 0
Seboomook Lake  0
Arey Neck Road 0
Brown Family Farm 0
Love Lake Road 0
West Forks 0
Jackman 0
New Portland 0
Northeast Somerset 0
Pequot Rd 0
Asticou 0
Rumford 0
Garden; Allen Island 0
C
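
With a single-site list, `checkSite` always takes its early-return branch (fewer than two distinct sites) and yields `(False, 0)`, which is why every site above is printed with 0. A minimal sketch of a within-site check follows, assuming the goal is the largest pairwise distance inside one site. `max_within_site_km`, `haversine_km`, and the toy data are hypothetical and not part of the notebook.

```python
from itertools import combinations
from math import radians, sin, cos, sqrt, atan2

import pandas as pd

R_KM = 6373.0  # same approximate Earth radius as the helper above


def haversine_km(pos1, pos2):
    """Great-circle distance in km between two (lat, lon) pairs given in degrees."""
    lat1, lon1 = map(radians, pos1)
    lat2, lon2 = map(radians, pos2)
    dlat, dlon = lat2 - lat1, lon2 - lon1
    a = sin(dlat / 2) ** 2 + cos(lat1) * cos(lat2) * sin(dlon / 2) ** 2
    return R_KM * 2 * atan2(sqrt(a), sqrt(1 - a))


def max_within_site_km(df, site, lat_col, lon_col, site_col):
    """Largest distance between any two distinct coordinates recorded at one site."""
    pts = df.loc[df[site_col] == site, [lat_col, lon_col]].dropna().drop_duplicates()
    coords = list(pts.itertuples(index=False, name=None))
    # default=0.0 covers sites with zero or one distinct coordinate
    return max((haversine_km(p, q) for p, q in combinations(coords, 2)), default=0.0)


# Toy data (made up): site A spans 0.1 degrees of latitude, site B is a single point.
toy = pd.DataFrame({
    'site_collapsed': ['A', 'A', 'B', 'B'],
    'Latitude': [44.00, 44.10, 45.00, 45.00],
    'Longitude': [-69.00, -69.00, -68.00, -68.00],
})

flagged = []
for site in toy['site_collapsed'].unique():
    spread = max_within_site_km(toy, site, 'Latitude', 'Longitude', 'site_collapsed')
    if spread > 5:  # site is wider than the 5 km cutoff and may need splitting
        flagged.append(site)
print(flagged)
```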

**Check if any sites have overlap** (some bees within 5 km of each other, but not all)

In [10]:
sites = bombusMeta['site_collapsed'].unique()
#sites = ['MDI_Bar_Harbor','MDI_Northeast_Harbor', 'MDI_Acadia']
# loop through sites
for i in range(len(sites)):
    dfi = bombusMeta[(bombusMeta.site_collapsed == sites[i])].dropna(subset = ['Latitude','Longitude'])
    dfi.reset_index(inplace=True)
    # list of sites that are close to given site
    check = []
    # inner loop for each bee at outer site
    for index, row in dfi.iterrows():
        # loop for sites not already checked
        for j in range(i+1,len(sites)):
            if not (sites[j] in check):
                dfj = bombusMeta[(bombusMeta.site_collapsed == sites[j])].dropna(subset = ['Latitude','Longitude'])
                dfj.reset_index(inplace=True)
                for indexj, rowj in dfj.iterrows():
                    d = getDistance((row['Latitude'],row['Longitude']),(rowj['Latitude'],rowj['Longitude']))
                    #print(d)
                    if d < 5:
                        check.append(sites[j])
                        break
    print(sites[i], check)
    

Eustis Parking Lot; Colby College ['Eustis Parking Lot; Colby College ', 'Colby Observatory', 'Colby Heights Steps', 'Dana walkway; Colby College', 'Colby College; Pierce Dorm', 'Quarry Rd; Waterville']
Allen Island; Garden ['Allen Island', 'Allen Island; Trail', 'Garden; Allen Island', 'Allen Island North tip', 'Allen Island Garden']
Port Clyde ['Tidal River; St. George', 'Herrier Gut', 'Rockland']
Eustis; ME ['Eustis ']
Kingfield ['Kingfield, ME', 'New Portland']
Stratton ['Stratton, ME']
Eustis Parking Lot; Colby College  ['Colby Observatory', 'Colby Heights Steps', 'Dana walkway; Colby College', 'Colby College; Pierce Dorm', 'Quarry Rd; Waterville']
Kingfield, ME ['New Portland']
Stratton, ME []
nan []
Johnson Dorm []
Carrabassett Valley, ME []
Cherryfield, ME []
Harrington, ME []
Great Wass Island ['Great Wass Island Bridge', 'Great Wass Island Middle', 'Great Wass Island Tip']
Jonesport, ME ['Great Wass Bridge', 'Great Wass Island Bridge']
Vinalhaven ['Round The Island Rd', 'Arey
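
The nested `iterrows` loops above compare every bee with every other bee, even when many bees share identical coordinates. One possible way to cut the work is to reduce each site to its distinct coordinates first. The sketch below uses only the standard library. `overlapping_sites`, `haversine_km`, and the toy records are hypothetical, and rows with missing coordinates are assumed to have been dropped beforehand.

```python
from collections import defaultdict
from math import radians, sin, cos, sqrt, atan2

R_KM = 6373.0  # same approximate Earth radius as the helper above


def haversine_km(pos1, pos2):
    """Great-circle distance in km between two (lat, lon) pairs given in degrees."""
    lat1, lon1 = map(radians, pos1)
    lat2, lon2 = map(radians, pos2)
    dlat, dlon = lat2 - lat1, lon2 - lon1
    a = sin(dlat / 2) ** 2 + cos(lat1) * cos(lat2) * sin(dlon / 2) ** 2
    return R_KM * 2 * atan2(sqrt(a), sqrt(1 - a))


def overlapping_sites(records, threshold_km):
    """Map each site to the other sites that have any point closer than threshold_km.

    records is an iterable of (site, lat, lon) tuples.
    """
    points = defaultdict(set)
    for site, lat, lon in records:
        points[site].add((lat, lon))  # the set keeps each distinct coordinate once

    overlaps = {}
    for site, pts in points.items():
        overlaps[site] = sorted(
            other
            for other, other_pts in points.items()
            if other != site
            and any(haversine_km(p, q) < threshold_km for p in pts for q in other_pts)
        )
    return overlaps


# Toy records (made up): A and B are about 3 km apart, C is over 100 km away.
toy = [
    ('A', 44.00, -69.00), ('A', 44.00, -69.00),
    ('B', 44.03, -69.00),
    ('C', 45.00, -69.00),
]
result = overlapping_sites(toy, 5)
print(result)
```

With the notebook's data, the records could presumably be built from `bombusMeta[['site_collapsed', 'Latitude', 'Longitude']].dropna().itertuples(index=False, name=None)`. Note that this would also drop rows whose site name is missing.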

**Fill in values where missing**
- A few islands have samples with missing coordinates

In [4]:
bombusMeta.loc[bombusMeta.site_collapsed == 'Monhegan Island', ['Latitude', 'Longitude']] = [43.764, -69.314]
bombusMeta.loc[bombusMeta.site_collapsed == 'Hancork Point, ME', ['Latitude', 'Longitude']] = [44.491123,
                                                                                               -68.233403]
bombusMeta.loc[bombusMeta.site_collapsed == 'Peaks Island', ['Latitude', 'Longitude']] = [43.662035,
                                                                                          -70.185537]
bombusMeta.loc[bombusMeta.site_collapsed == 'Swans Island', ['Latitude', 'Longitude']] = [44.180175,
                                                                                          -68.420100]
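# NOTE: 'Allen_Island' and 'MDI_Bar_Harbor' appear to be collapsed names assigned in the
# next cell, so these two fills likely only take effect if this cell is re-run after that step
bombusMeta.loc[(bombusMeta.site_collapsed == 'Allen_Island') & (bombusMeta['Longitude'].isna()),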
               ['Latitude', 'Longitude']] = [43.8755,-69.3113]
bombusMeta.loc[(bombusMeta.site_collapsed == 'MDI_Bar_Harbor') & (bombusMeta['Longitude'].isna()),
               ['Latitude', 'Longitude']] = [44.387874, -68.204671]
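
The per-site assignments above could also be driven from a lookup table. The sketch below is one possible arrangement, not the notebook's method: `site_coords` and the toy frame are hypothetical (the two coordinate pairs are copied from the cell above), and unlike the first four assignments above it only fills rows whose coordinates are missing, leaving recorded coordinates untouched.

```python
import numpy as np
import pandas as pd

# Hypothetical lookup table; coordinates copied from the assignments above.
site_coords = {
    'Monhegan Island': (43.764, -69.314),
    'Peaks Island': (43.662035, -70.185537),
}

# Toy frame (made up): two rows lack coordinates, one already has them.
toy = pd.DataFrame({
    'site_collapsed': ['Monhegan Island', 'Peaks Island', 'Peaks Island'],
    'Latitude': [np.nan, np.nan, 43.66],
    'Longitude': [np.nan, np.nan, -70.19],
})

for site, (lat, lon) in site_coords.items():
    # Only touch rows for this site where both coordinates are missing.
    missing = (
        (toy['site_collapsed'] == site)
        & toy['Latitude'].isna()
        & toy['Longitude'].isna()
    )
    toy.loc[missing, ['Latitude', 'Longitude']] = [lat, lon]

print(toy)
```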

**Collapse Sites**

In [5]:
#colby sites
bombusMeta[['site_collapsed']] = bombusMeta[['site_collapsed']].replace(['Eustis Parking Lot; Colby College ',
                                        'Johnson Dorm','Colby Observatory',
                                       'Colby Heights Steps','Dana walkway; Colby College',
                                       'Colby College; Pierce Dorm','Eustis Parking Lot; Colby College',
                                        '2 cal6 st Waterville','Quarry Rd; Waterville'],
                                                                        'Colby_College')
# palermo sites
bombusMeta[['site_collapsed']] = bombusMeta[['site_collapsed']].replace(['271 Parameter Hill Rd',
                                '271 Parmenter Hill Rd; Palermo; ME','344 N Palermo Rd; Palermo; ME'],
                                                                        'Palermo_ME')
#allen island sites
bombusMeta[['site_collapsed']] = bombusMeta[['site_collapsed']].replace(['Allen Island','Allen Island Garden',
                                    'Allen Island North tip','Allen Island; Garden','Allen Island; Trail',
                                                                        'Garden; Allen Island'],
                                                                        'Allen_Island')
# hancock sites
bombusMeta[['site_collapsed']] = bombusMeta[['site_collapsed']].replace(['Hancork Point, ME','Animal Hospital'],
                                                                        'Hancock_ME')
# vinalhaven sites
bombusMeta[['site_collapsed']] = bombusMeta[['site_collapsed']].replace(['Arey Neck Road',
                                        'Pequot Rd','Round The Island Rd', 'Vinalhaven'],
                                                                        'Vinalhaven_ME')
# solon /bingham sites
bombusMeta[['site_collapsed']] = bombusMeta[['site_collapsed']].replace(['Bingham','Solon, ME'],
                                                                        'Solon_ME')
# brooks /jackson sites
bombusMeta[['site_collapsed']] = bombusMeta[['site_collapsed']].replace(['Brooks','Jackson'],
                                                                        'Brooks_ME')
# jonesport
bombusMeta[['site_collapsed']] = bombusMeta[['site_collapsed']].replace(['Jonesport, ME','Great Wass Bridge'],
                                                                        'Jonesport_ME')
# great wass bridge
bombusMeta[['site_collapsed']] = bombusMeta[['site_collapsed']].replace(['Great Wass Island Bridge'],
                                                                        'Great_Wass_Bridge')
# great wass island
bombusMeta[['site_collapsed']] = bombusMeta[['site_collapsed']].replace(['Great Wass Island Middle',
                                                                         'Great Wass Island Tip'],
                                                                        'Great_Wass_Island')
# ellsworth sites
bombusMeta[['site_collapsed']] = bombusMeta[['site_collapsed']].replace(['Knowlton Park',
                                                                         'Hancock'],
                                                                        'Ellsworth_ME')
#  jackman 
bombusMeta[['site_collapsed']] = bombusMeta[['site_collapsed']].replace(['Jackman',
                                                                         'Jackman, ME'],
                                                                        'Jackman_ME')
# kingfield
bombusMeta[['site_collapsed']] = bombusMeta[['site_collapsed']].replace(['Kingfield',
                                                                         'Kingfield, ME'],
                                                                        'Kingfield_ME')
# monhegan island
bombusMeta[['site_collapsed']] = bombusMeta[['site_collapsed']].replace(['Monhegan',
                                                                         'Monhegan Island'],
                                                                        'Monhegan_Island')
# stratton 
bombusMeta[['site_collapsed']] = bombusMeta[['site_collapsed']].replace(['Stratton',
                                                                         'Stratton, ME'],
                                                                        'Stratton_ME')
# entirely missing site names 
bombusMeta.loc[bombusMeta.ID_number =='44',['site_collapsed']] = ['Palermo_ME']
bombusMeta.loc[bombusMeta.ID_number =='37',['collection_site','site_collapsed']] = ['Belgrade_Lakes','Belgrade_Lakes']


# fix blue mtn copying error (affects 2 bees)
bombusMeta.loc[bombusMeta.site_collapsed == 'Blue Mountain', ['Latitude','Longitude']] = [44.7129,-70.4183]
# same error with castle island 
bombusMeta.loc[bombusMeta.site_collapsed == 'Castle IslandRd', ['Latitude','Longitude',
                                                    'site_collapsed']] = [44.5108,-69.9050, 'Belgrade_Lakes']

# a missing comma previously fused two IDs into a single string that matched nothing
natanis_pond_ids = ['TWM180615-009','TWM180615-010','MY180615-012','MY180615-010']
bombusMeta.loc[bombusMeta.ID_number.isin(natanis_pond_ids),'site_collapsed'] = 'Natanis_Pond'

great_wass_ids = ['FJ190628-002','FJ190628-001','JL190628-001','JL190628-002']
bombusMeta.loc[bombusMeta.ID_number.isin(great_wass_ids),'site_collapsed'] = 'Great_Wass_Island'
# copying error
bombusMeta.loc[bombusMeta.ID_number=='FJ190628-003',['Latitude','Longitude']] = [44.4808,-67.5948]
#no site
bangor_ids = ['JL190716-012','JL190716-011']
bombusMeta.loc[bombusMeta.ID_number.isin(bangor_ids),'site_collapsed'] = 'Bangor_ME'
amherst_ids = ['JL190716-010','FJ190716-013','FJ190716-012','JL190716-009']
bombusMeta.loc[bombusMeta.ID_number.isin(amherst_ids),'site_collapsed'] = 'Amherst_ME'
beddington_ids = ['FJ190716-010','FJ190716-011','JL190716-008']
bombusMeta.loc[bombusMeta.ID_number.isin(beddington_ids),'site_collapsed'] = 'Beddington_ME'
pickerel_lake_ids = ['FJ190716-009','FJ190716-006','JL190716-006','FJ190716-008','JL190716-007','FJ190716-007']
bombusMeta.loc[bombusMeta.ID_number.isin(pickerel_lake_ids),'site_collapsed'] = 'Pickerel_Lake'
wesley_ids = ['JL190716-005','JL190716-004','FJ190716-005','FJ190716-004','FJ190716-003',
              'FJ190716-002','JL190716-003','JL190716-001','JL190716-002']
bombusMeta.loc[bombusMeta.ID_number.isin(wesley_ids),'site_collapsed'] = 'Wesley_ME'
bombusMeta.loc[bombusMeta.ID_number=='FJ190716-001','site_collapsed'] = 'Crawford_ME'

# make it easier to work with 
bombusMeta.loc[bombusMeta.site_collapsed.isna(),'site_collapsed'] = 'No_Site'

bombusMeta.loc[bombusMeta.site_collapsed=='Eustis; ME','site_collapsed'] = 'Natanis_Pond'

bombusMeta.loc[bombusMeta.site_collapsed=='Great Wass Island','site_collapsed'] = 'Great_Wass_Island'

bombusMeta[['site_collapsed']] = bombusMeta[['site_collapsed']].replace(['Herrier Gut',
                                                                         'Port Clyde'],
                                                                        'Coast_Port_Clyde_ME')

bombusMeta[['site_collapsed']] = bombusMeta[['site_collapsed']].replace(['Tidal River; St. George',
                                                                         'Rockland'],
                                                                        'Port_Clyde_ME')

# fix lat lon
bombusMeta.loc[bombusMeta.site_collapsed=="Owl's Head",['Latitude','Longitude',
                                            'site_collapsed']] = [44.091529, -69.045848,'Owls_Head_ME']


bombusMeta[['site_collapsed']] = bombusMeta[['site_collapsed']].replace(['Kingfield_ME',
                                                                         'New Portland'],
                                                                        'Kingfield_ME')

bombusMeta[['site_collapsed']] = bombusMeta[['site_collapsed']].replace('Albion Center',
                                                                        'Albion_ME')
bombusMeta[['site_collapsed']] = bombusMeta[['site_collapsed']].replace('Aurora',
                                                                        'Aurora_ME')

bombusMeta[['site_collapsed']] = bombusMeta[['site_collapsed']].replace('Baileyville Big Stop',
                                                                        'Baring_ME')
bombusMeta[['site_collapsed']] = bombusMeta[['site_collapsed']].replace(['Seboomook Lake ','Moose River, ME'],
                                                                        'Seboomook_Lake_ME')
bombusMeta[['site_collapsed']] = bombusMeta[['site_collapsed']].replace('Owls neck',
                                                                        'Owls_Head_ME')
bombusMeta[['site_collapsed']] = bombusMeta[['site_collapsed']].replace(['Swans Island',
                                                    'Swans Island; Quarry', 'Swans Island; Roadside'],
                                                                        'Swans_Island_ME')
bombusMeta[['site_collapsed']] = bombusMeta[['site_collapsed']].replace(['The Forks','West Forks'],
                                                                        'The_Forks_ME')
bombusMeta[['site_collapsed']] = bombusMeta[['site_collapsed']].replace(['Northeast Somerset','West Forks, ME'],
                                                                        'S_Jackman_ME')


bombusMeta.loc[bombusMeta.collection_site.isin(["Agamont Park",'Bar Harbor']),
               'site_collapsed'] = 'MDI_Bar_Harbor'
bombusMeta.loc[bombusMeta.collection_site.isin(['Asticou']),
                                               'site_collapsed'] = 'MDI_Northeast_Harbor'
bombusMeta.loc[bombusMeta.collection_site.isin(['Brown Family Farm','Schooner Head Rd']),
               'site_collapsed'] = 'MDI_Acadia'



bombusMeta.loc[(bombusMeta['Latitude']==44.2969) &(bombusMeta['Longitude'] == -68.2406),
               'site_collapsed'] = 'MDI_Northeast_Harbor'
                        
bombusMeta[['site_collapsed']] = bombusMeta[['site_collapsed']].replace(['Manset Union Church'],
                                                                        'MDI_Southwest_Harbor')  
# a missing comma previously fused two IDs into a single string that matched nothing
southwest_ids = ['JL190724-006','PF190724-007','JL190724-005','FJ190724-006',
                'PF190724-008','JL190724-004']
bombusMeta.loc[bombusMeta.ID_number.isin(southwest_ids),'site_collapsed'] = 'MDI_Southwest_Harbor'

bombusMeta.loc[bombusMeta.collection_site.isin(['Zekes Point Rd']),
               'site_collapsed'] = 'Zekes_Point_Vinalhaven'
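
The repeated `replace` calls above could be condensed into a single mapping from raw names to collapsed names. This is a sketch of that idea, not the notebook's actual code: `site_map` and the toy frame are hypothetical, and only a few of the pairs from the cell above are reproduced. Names missing from the mapping pass through unchanged.

```python
import pandas as pd

# Hypothetical mapping; a small subset of the raw -> collapsed pairs used above.
site_map = {
    'Kingfield': 'Kingfield_ME',
    'Kingfield, ME': 'Kingfield_ME',
    'Stratton': 'Stratton_ME',
    'Stratton, ME': 'Stratton_ME',
}

# Toy frame (made up) with one site name that is not in the mapping.
toy = pd.DataFrame({'collection_site': ['Kingfield', 'Stratton, ME', 'Caratunk, ME']})

# Series.replace with a dict swaps matching values and leaves the rest as they are.
toy['site_collapsed'] = toy['collection_site'].replace(site_map)
print(toy)
```

Keeping the mapping in one dictionary makes it easier to spot duplicated or conflicting entries than scanning a long sequence of separate calls.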

## 10 km cutoff

- same procedure as above, but with a 10 km site diameter

In [6]:
bombusMeta[['site_collapsed_10k']] = bombusMeta[['site_collapsed']]

**Check if anything can be collapsed**

In [13]:
sites = bombusMeta['site_collapsed_10k'].unique()
for i in range(len(sites)):
    for j in range(len(sites)):
        #print(sites[j])
        if i != j:
            collapse, maxd = checkSite([sites[i],sites[j]],
                                        10, bombusMeta, 'Latitude', 'Longitude', 'site_collapsed_10k')
            if collapse:
                print(sites[i], sites[j], maxd)

Allen_Island Coast_Port_Clyde_ME 7.851797410153128
Coast_Port_Clyde_ME Allen_Island 7.851797410153128
Coast_Port_Clyde_ME Port_Clyde_ME 5.331222075586153
Stratton_ME Eustis  6.715834839005879
Harrington, ME Harrington 8.060345543514314
Great_Wass_Island Jonesport_ME 9.119554830570904
Great_Wass_Island Great_Wass_Bridge 6.542313247328654
Jonesport_ME Great_Wass_Island 9.119554830570904
Jonesport_ME Great_Wass_Bridge 2.6501523638248266
Vinalhaven_ME Zekes_Point_Vinalhaven 8.076712874994902
Crawford_ME Wesley_ME 6.062259719887744
Crawford_ME Love Lake Road 5.110442929837136
Wesley_ME Crawford_ME 6.062259719887744
MDI_Northeast_Harbor MDI_Southwest_Harbor 7.41102270877501
MDI_Northeast_Harbor MDI_Acadia 9.524782179304294
Pickerel_Lake Little Pembroke Brook 6.0698948808829485
Port_Clyde_ME Coast_Port_Clyde_ME 5.331222075586153
MDI_Bar_Harbor MDI_Acadia 6.865534181265851
Beddington_ME Beddington 5.530428943196188
MDI_Southwest_Harbor MDI_Northeast_Harbor 7.41102270877501
Eustis  Stratton_ME 

**Check if anything overlaps**

In [14]:
sites = bombusMeta['site_collapsed_10k'].unique()
#sites = ['MDI_Bar_Harbor','MDI_Northeast_Harbor', 'MDI_Acadia']
# loop through sites
for i in range(len(sites)):
    # get things at site
    dfi = bombusMeta[(bombusMeta.site_collapsed_10k == sites[i])].dropna(subset = ['Latitude','Longitude'])
    dfi.reset_index(inplace=True)
    # list of sites that are close to given site
    check = []
    # inner loop for each bee at outer site
    for index, row in dfi.iterrows():
        # loop for sites not already checked
        for j in range(len(sites)):
            if not ((i==j) or (sites[j] in check)):
                dfj = bombusMeta[(bombusMeta.site_collapsed_10k == sites[j])].dropna(subset = ['Latitude','Longitude'])
                dfj.reset_index(inplace=True)
                for indexj, rowj in dfj.iterrows():
                    d = getDistance((row['Latitude'],row['Longitude']),(rowj['Latitude'],rowj['Longitude']))
                    #print(d)
                    if d < 10:
                        check.append(sites[j])
                        break
    print(sites[i], check)
    

Colby_College []
Allen_Island ['Coast_Port_Clyde_ME', 'Port_Clyde_ME']
Coast_Port_Clyde_ME ['Allen_Island', 'Port_Clyde_ME']
Natanis_Pond []
Kingfield_ME []
Stratton_ME ['Eustis ']
No_Site []
Carrabassett Valley, ME []
Cherryfield, ME []
Harrington, ME ['Harrington']
Great_Wass_Island ['Jonesport_ME', 'Great_Wass_Bridge']
Jonesport_ME ['Great_Wass_Island', 'Great_Wass_Bridge']
Vinalhaven_ME ['Zekes_Point_Vinalhaven']
Seal Harbor Road ['Port_Clyde_ME']
Owls_Head_ME []
Solon_ME []
Jackman_ME []
S_Jackman_ME []
Caratunk, ME []
Seboomook_Lake_ME []
Blue Mountain []
Crawford_ME ['Wesley_ME', 'Love Lake Road']
Wesley_ME ['Crawford_ME']
MDI_Northeast_Harbor ['MDI_Southwest_Harbor', 'MDI_Acadia']


KeyboardInterrupt: 

**Collapse Sites**

In [7]:
# merge neighbouring sites into their 10 km site
bombusMeta[['site_collapsed_10k']] = bombusMeta[['site_collapsed_10k']].replace("Zekes_Point_Vinalhaven",
                                                                        'Vinalhaven_ME')
bombusMeta[['site_collapsed_10k']] = bombusMeta[['site_collapsed_10k']].replace("Beddington",
                                                                        'Beddington_ME')
bombusMeta[['site_collapsed_10k']] = bombusMeta[['site_collapsed_10k']].replace("Waldo",
                                                                        'Brooks_ME')
bombusMeta[['site_collapsed_10k']] = bombusMeta[['site_collapsed_10k']].replace("Eustis ",
                                                                        'Stratton_ME')
bombusMeta[['site_collapsed_10k']] = bombusMeta[['site_collapsed_10k']].replace("Beals_ME",
                                                                        'Great_Wass_Island')
bombusMeta[['site_collapsed_10k']] = bombusMeta[['site_collapsed_10k']].replace(["Harrington","Harrington, ME"],
                                                                        'Harrington_ME')
bombusMeta[['site_collapsed_10k']] = bombusMeta[['site_collapsed_10k']].replace("Coast_Port_Clyde_ME",
                                                                        'Port_Clyde_ME')
bombusMeta[['site_collapsed_10k']] = bombusMeta[['site_collapsed_10k']].replace("Little Pembroke Brook",
                                                                        'Pickerel_Lake')
bombusMeta[['site_collapsed_10k']] = bombusMeta[['site_collapsed_10k']].replace("Love Lake Road",
                                                                        'Crawford_ME')
bombusMeta[['site_collapsed_10k']] = bombusMeta[['site_collapsed_10k']].replace("MDI_Acadia",
                                                                        'MDI_Bar_Harbor')
bombusMeta[['site_collapsed_10k']] = bombusMeta[['site_collapsed_10k']].replace("Great_Wass_Bridge",
                                                                        'Great_Wass_Island')


**Look at yearly distribution of sites**

In [8]:
pd.crosstab(bombusMeta['site_collapsed_10k'], bombusMeta['Year'])

| site_collapsed_10k | 2017 | 2018 | 2019 |
| --- | --- | --- | --- |
| Albion_ME | 1 | 0 | 0 |
| Allen_Island | 58 | 63 | 49 |
| Amherst_ME | 0 | 0 | 4 |
| Aurora_ME | 0 | 4 | 0 |
| Bangor_ME | 0 | 0 | 2 |
| Baring_ME | 0 | 4 | 0 |
| Beddington_ME | 0 | 4 | 3 |
| Belgrade_Lakes | 6 | 0 | 0 |
| Blue Mountain | 0 | 0 | 10 |
| Brooks_ME | 3 | 0 | 0 |


**Save metadata**

In [9]:
bombusMeta.to_csv(os.path.join(datapath, 'BombusMetadata.tsv'), sep = "\t", index = False)