# Snowshoe hares at Bonanza Creek Experimental Forest

This dataset contains snowshoe hare densities at 5 locales in the Tanana Valley. The data were collected from 1999 to 2002. The dataset does not contain sensitive data, and there is no publication associated with it.

Kielland, K., F.S. Chapin, R.W. Ruess, and Bonanza Creek LTER. 2017. Snowshoe hare physical data in Bonanza Creek Experimental Forest: 1999-Present ver 22. Environmental Data Initiative. 
https://doi.org/10.6073/pasta/03dce4856d79b91557d8e6ce2cbcdc14 
Accessed 2024-10-17

In [27]:
import pandas as pd
import numpy as np

![Snowshoe hare](https://upload.wikimedia.org/wikipedia/commons/thumb/8/8a/SNOWSHOE_HARE_%28Lepus_americanus%29_%285-28-2015%29_quoddy_head%2C_washington_co%2C_maine_-01_%2818988734889%29.jpg/1452px-SNOWSHOE_HARE_%28Lepus_americanus%29_%285-28-2015%29_quoddy_head%2C_washington_co%2C_maine_-01_%2818988734889%29.jpg?20170313021652)

Photo by: ALAN SCHMIERER

In [2]:
hares = pd.read_csv('https://portal.edirepository.org/nis/dataviewer?packageid=knb-lter-bnz.55.22&entityid=f01f5d71be949b8c700b6ecd1c42c701')

In [3]:
# Dimensions
hares.shape

(3380, 14)

In [4]:
# Data types
hares.dtypes

date           object
time           object
grid           object
trap           object
l_ear          object
r_ear          object
sex            object
age            object
weight        float64
hindft        float64
notes          object
b_key         float64
session_id      int64
study          object
dtype: object

In [5]:
# Number of missing (NaN) values in each column
hares.isna().sum()

date             0
time          3116
grid             0
trap            12
l_ear           48
r_ear          169
sex            352
age           2111
weight         535
hindft        1747
notes         3137
b_key           47
session_id       0
study          163
dtype: int64

In [6]:
# Summary statistics for the numeric columns, including min and max of weight and hind foot length
hares.describe()

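|       | weight      | hindft     | b_key      | session_id |
|-------|-------------|------------|------------|------------|
| count | 2845.0      | 1633.0     | 3333.0     | 3380.0     |
| mean  | 1346.081547 | 130.872627 | 500.640864 | 53.232249  |
| std   | 345.160112  | 16.155295  | 299.421121 | 33.171355  |
| min   | 0.0         | 60.0       | 1.0        | 1.0        |
| 25%   | 1180.0      | 128.0      | 235.0      | 22.0       |
| 50%   | 1400.0      | 135.0      | 464.0      | 55.0       |
| 75%   | 1580.0      | 140.0      | 755.0      | 82.0       |
| max   | 2365.0      | 160.0      | 1034.0     | 113.0      |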


In [9]:
print("Unique values for study include", hares['study'].unique())
print("Unique values for trap include", hares['trap'].unique())
print("Unique values for sex include", hares['sex'].unique())
#print("Unique values for notes include", hares['notes'].unique())

Unique values for study include ['Population' 'Collar' nan 'Metabolic' 'Metabolic/Collar']
Unique values for trap include ['1A' '2C' '2D' '2E' '3B' '3D' '4A' '4B' '4C' '4E' '5A' '5C' '5D' '5E'
 '10C' '1C' '1E' '2A' '2B' '3C' '3E' '5B' '6A' '6B' '6C' '7B' '7C' '7E'
 '8A' '8B' '8E' '9A' '9D' '1D' '6E' '7D' '8C' '8D' '9B' '3A' '10B' '1B'
 '7A' '9E' '4D' '10A' '6D' '9C' '10D' '10E' '10b' '2a' '2b' '2d' '3b' '4a'
 '4c' '4e' '5b' '6c' '7a' '7b' '7d' '7e' '8e' '9a' '1b' '2c' '2e' '3c'
 '1e' '3e' '5d' '3d' '4d' '7c' '8c' '10c' '1c' '1d' '9d' '5e' '6a' '8a'
 '8b' '6b' '10e' '6e' nan '4b' '5c' '9c' '10a' '5a' '9b' '9e' '6d' '1a'
 '3a' '10d' '8d' '4f' '5f' '3f' '2f' '2g' '5g' '4g' '1g' '7f' '6f' '6g'
 '3g' '4c ' '4e ' '1e ' '1b ' '2b ' '6b ' '2c ' '5c ' '4b ']
Unique values for sex include [nan 'M' 'F' '?' 'F?' 'M?' 'pf' 'm' 'f' 'f?' 'm?' 'f ' 'm ']


In [8]:
hares.tail()

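|      | date     | time     | grid   | trap | l_ear | r_ear | sex | age | weight | hindft | notes | b_key | session_id | study      |
|------|----------|----------|--------|------|-------|-------|-----|-----|--------|--------|-------|-------|------------|------------|
| 3375 | 8/8/2002 | 18:00:00 | bonrip | 1b   | 1201  | 1202  | NaN | NaN | 1400.0 | NaN    | NaN   | 63.0  | 64         | Population |
| 3376 | 8/8/2002 | 6:00:00  | bonrip | 4b   | 1201  | 1202  | NaN | NaN | NaN    | NaN    | NaN   | 63.0  | 64         | Population |
| 3377 | 8/7/2002 | NaN      | bonrip | 4b   | 1217  | 1218  | NaN | NaN | 1000.0 | 134.0  | NaN   | 69.0  | 64         | Population |
| 3378 | 8/8/2002 | NaN      | bonrip | 6d   | 1217  | 1218  | NaN | NaN | 990.0  | NaN    | NaN   | 69.0  | 64         | Population |
| 3379 | 8/6/2002 | NaN      | bonrip | 4b   | 1058  | 1060  | M   | NaN | 1460.0 | 119.0  | NaN   | 32.0  | 64         | Population |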


| Sex |     Definitions     |
|-----|---------------------|
|  m  |         male        |
|  f  |        female       |
|  m? | male (not confirmed)|

In [19]:
hares['sex'].value_counts()

sex
F     1161
M      730
f      556
m      515
?       40
F?      10
f        4
m        4
f?       3
M?       2
m?       2
pf       1
Name: count, dtype: int64

In [21]:
# Include NaN value counts
hares['sex'].value_counts(dropna = False)

sex
F      1161
M       730
f       556
m       515
NaN     352
?        40
F?       10
f         4
m         4
f?        3
M?        2
m?        2
pf        1
Name: count, dtype: int64

The values in the `sex` column do not match the codes declared in the metadata (`m`, `f`, `m?`). One potential cause is that multiple data collectors did not agree on a coding convention; another is data-entry errors. Some values are effectively duplicates that differ only in case or trailing whitespace: for example, `f?` and `F?` appear to mean the same thing but are recorded as separate values.

In [26]:
hares['sex'].str.lower().str.strip().value_counts(dropna = False)

sex
f      1721
m      1249
NaN     352
?        40
f?       13
m?        4
pf        1
Name: count, dtype: int64
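To check which raw spellings collapse into each normalized code, one option is to group the raw values by their cleaned form. The sketch below uses a small hypothetical sample of codes (not the full dataset), and the names `raw`, `clean`, and `variants` are illustrative.

```python
import pandas as pd

# Hypothetical sample of raw codes like those in the value counts above
raw = pd.Series(["F", "f", "f ", "M", "m", "m ", "F?", "f?", "?"])

# Same normalization as in the cell above: lowercase, then strip whitespace
clean = raw.str.lower().str.strip()

# For each cleaned code, list the distinct raw spellings that map to it
variants = raw.groupby(clean).unique()
print(variants)
```

Applied to `hares['sex']`, the same pattern would show every raw variant behind `f`, `m`, `f?`, and so on.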

## Brainstorm:
We could use string detection (for example `str.contains` in pandas) to find values containing "f" or "m" and combine them into a single category each. Any value that is just "?" would be assigned NaN.
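A minimal sketch of this idea, assuming that uncertain codes such as `f?`, `m?`, and `pf` should be grouped with their base sex (that grouping is an assumption, not something the metadata confirms). The sample values and the names `sex`, `sex_clean`, `is_female`, and `is_male` are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical sample covering the kinds of codes seen in the value counts
sex = pd.Series(["F", "m", "f ", "M?", "?", "pf", np.nan, "f?"])

# Normalize case and surrounding whitespace first
sex_clean = sex.str.lower().str.strip()

# Flag entries that contain "f" or "m"; missing values count as no match
is_female = sex_clean.str.contains("f", na=False)
is_male = sex_clean.str.contains("m", na=False)

# Start from all-missing, then fill in the matched categories
sex_simple = pd.Series(np.nan, index=sex.index, dtype="object")
sex_simple[is_female] = "female"
sex_simple[is_male] = "male"

print(sex_simple.tolist())
```

Values that are only "?" or already missing match neither pattern, so they stay NaN.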

In [39]:
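# One boolean mask per category. Note that "f_" and "m_" match no value in
# the data, so only the exact codes "F"/"f" and "M"/"m" are selected here;
# variants such as "f " (trailing space), "f?" or "pf" fall through to the default.
condition = [hares['sex'].isin(["F", "f", "f_"]),
            hares['sex'].isin(["M", "m", "m_"])]

# Labels assigned where the corresponding mask is True
gender = ["female", "male"]

# Rows matching neither mask receive the default
hares['sex_simple'] = np.select(condition, gender, default = np.nan)

print(hares['sex_simple'])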

0        nan
1       male
2       male
3        nan
4        nan
        ... 
3375     nan
3376     nan
3377     nan
3378     nan
3379    male
Name: sex_simple, Length: 3380, dtype: object


In [41]:
hares.groupby('sex_simple').weight.mean()

sex_simple
female    1366.920372
male      1352.145553
nan       1176.511111
Name: weight, dtype: float64
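The `nan` row in this output appears to be the text `'nan'` rather than a true missing value, which is why `groupby` keeps it as a group. One possible alternative is to map the cleaned codes through a dictionary, so unmatched codes become real NaN and are dropped from the grouping by default. The sketch below uses hypothetical toy data, and the names `toy`, `labels`, and `mean_weight` are illustrative.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the hares table
toy = pd.DataFrame({
    "sex": ["F", "f ", "M", "m", "?", np.nan],
    "weight": [1400.0, 1300.0, 1500.0, 1100.0, 900.0, 1000.0],
})

# Map cleaned codes to labels; codes not in the dict become real NaN
labels = {"f": "female", "m": "male"}
toy["sex_simple"] = toy["sex"].str.lower().str.strip().map(labels)

# groupby drops NaN keys by default, so only female and male remain
mean_weight = toy.groupby("sex_simple")["weight"].mean()
print(mean_weight)
```

Passing `dropna=False` to `groupby` would keep the missing group if its mean weight is of interest.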

In [None]:
import pandas as pd
import numpy as np

hares = pd.read_csv('https://portal.edirepository.org/nis/dataviewer?packageid=knb-lter-bnz.55.22&entityid=f01f5d71be949b8c700b6ecd1c42c701')

hares['sex'].str.lower().str.strip().value_counts(dropna = False)

condition = [hares['sex'].isin(["F", "f", "f_"]),
            hares['sex'].isin(["M", "m", "m_"])]

gender = ["female", "male"]

hares['sex_simple'] = np.select(condition, gender, default = np.nan)

print(hares['sex_simple'])

hares.groupby('sex_simple').weight.mean()