# Snowshoe hares at Bonanza Creek Experimental Forest
### Week 3 discussion section

### 1. Archive exploration

This dataset contains capture-recapture records of the physical attributes of snowshoe hares in the Bonanza Creek Experimental Forest, collected from 1999 to 2012.

Citation: Kielland, K., F.S. Chapin, R.W. Ruess, and Bonanza Creek LTER. 2017. Snowshoe hare physical data in Bonanza Creek Experimental Forest: 1999-Present ver 22. Environmental Data Initiative. https://doi.org/10.6073/pasta/03dce4856d79b91557d8e6ce2cbcdc14 (Accessed 2025-10-16).

![Snowshoe hare: Photograph by Alan Schmierer](https://upload.wikimedia.org/wikipedia/commons/thumb/8/8a/SNOWSHOE_HARE_%28Lepus_americanus%29_%285-28-2015%29_quoddy_head%2C_washington_co%2C_maine_-01_%2818988734889%29.jpg/1452px-SNOWSHOE_HARE_%28Lepus_americanus%29_%285-28-2015%29_quoddy_head%2C_washington_co%2C_maine_-01_%2818988734889%29.jpg?20170313021652)

### 3. Data loading and preliminary exploration

In [62]:
import pandas as pd
import numpy as np

In [80]:
url = 'https://pasta.lternet.edu/package/data/eml/knb-lter-bnz/55/22/f01f5d71be949b8c700b6ecd1c42c701'

hares = pd.read_csv(url)

In [13]:
hares.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3380 entries, 0 to 3379
Data columns (total 14 columns):
 #   Column      Non-Null Count  Dtype  
---  ------      --------------  -----  
 0   date        3380 non-null   object 
 1   time        264 non-null    object 
 2   grid        3380 non-null   object 
 3   trap        3368 non-null   object 
 4   l_ear       3332 non-null   object 
 5   r_ear       3211 non-null   object 
 6   sex         3028 non-null   object 
 7   age         1269 non-null   object 
 8   weight      2845 non-null   float64
 9   hindft      1633 non-null   float64
 10  notes       243 non-null    object 
 11  b_key       3333 non-null   float64
 12  session_id  3380 non-null   int64  
 13  study       3217 non-null   object 
dtypes: float64(3), int64(1), object(10)
memory usage: 369.8+ KB


In [14]:
hares.isna().sum()

date             0
time          3116
grid             0
trap            12
l_ear           48
r_ear          169
sex            352
age           2111
weight         535
hindft        1747
notes         3137
b_key           47
session_id       0
study          163
dtype: int64

In [21]:
hares[['weight', 'hindft']].max()

weight    2365.0
hindft     160.0
dtype: float64

In [20]:
hares[['weight', 'hindft']].min()

weight     0.0
hindft    60.0
dtype: float64

In [40]:
hares['sex'].unique()

array([nan, 'M', 'F', '?', 'F?', 'M?', 'pf', 'm', 'f', 'f?', 'm?', 'f ',
       'm '], dtype=object)

Question to explore: is there an association between weight and hind foot length?
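
One way to start on this question is a Pearson correlation over the rows where both measurements are present. The sketch below uses a small made-up data frame (`toy`) rather than the real hare data, so the numbers only illustrate the mechanics. The column names `weight` and `hindft` follow the dataset; everything else is an assumption for illustration.

```python
import numpy as np
import pandas as pd

# Toy data standing in for hares[['weight', 'hindft']] (values are invented)
toy = pd.DataFrame({
    "weight": [1370.0, 1430.0, np.nan, 1710.0, 1890.0, 2170.0],
    "hindft": [130.0, np.nan, 135.0, 140.0, 145.0, 150.0],
})

# Keep only rows where both measurements are present
complete = toy.dropna(subset=["weight", "hindft"])

# Pearson correlation between the two columns (ranges from -1 to 1)
r = complete["weight"].corr(complete["hindft"])
print(f"n = {len(complete)}, r = {r:.2f}")
```

With the real data, swapping `toy` for `hares` should work the same way. Roughly half of the `hindft` values are missing, so the correlation would rest on a subset of the captures.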

### 4. Detecting messy values

| Value   | Description|
|---------|------------|
| m       |  male      |
| f       | female     |
| m?      | male not confirmed |
| f?      | female not confirmed |

In [58]:
hares['sex'].value_counts()

sex
F     1161
M      730
f      556
m      515
?       40
F?      10
f        4
m        4
f?       3
M?       2
m?       2
pf       1
Name: count, dtype: int64

In [59]:
hares['sex'].value_counts(dropna = False)

sex
F      1161
M       730
f       556
m       515
NaN     352
?        40
F?       10
f         4
m         4
f?        3
M?        2
m?        2
pf        1
Name: count, dtype: int64

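The values in the `sex` column don't match the metadata, possibly because citizen-science entries vary slightly, sampling methods were unclear, or data were entered incorrectly. The column mixes uppercase and lowercase codes (`M` and `m`, `F` and `f`), includes entries with trailing spaces (`'m '`, `'f '`), and contains codes the metadata does not list (`?` and `pf`).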

In [61]:
# Count distinct non-missing values; several differ only in case or spacing
hares['sex'].nunique()

12

### 5. Brainstorm

Remove whitespace, convert to lowercase, and drop the unconfirmed values (`m?` and `f?`).
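
The brainstormed steps can be sketched with pandas string methods. This is a minimal illustration on a toy series (`sex`, invented here to mimic the kinds of entries in `hares['sex']`), not the cleaning used in the next section. Treating `?` and `pf` as missing alongside `m?` and `f?` is an assumption made for the example.

```python
import numpy as np
import pandas as pd

# Toy series mimicking the kinds of entries seen in hares['sex']
sex = pd.Series([np.nan, "M", "F", "?", "F?", "m ", "f ", "pf", "m?"])

# Strip surrounding whitespace and lowercase; missing values stay missing
cleaned = sex.str.strip().str.lower()

# Keep only confirmed 'm' / 'f'; every other code becomes missing
cleaned = cleaned.where(cleaned.isin(["m", "f"]))

print(cleaned.value_counts(dropna=False))
```

Normalizing first and filtering second keeps the list of accepted codes short: only `m` and `f` need to be spelled out.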

### 6. Clean values

In [81]:
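# Alternative: x = hares['sex']; conditions = [x.isin(['m', ...]), x.isin(['f', ...])]

# Note: 'm_' and 'f_' never occur in the column, so the trailing-space
# entries ('m ' and 'f ') fall through to the default
conditions = [
    ((hares['sex'] == 'm') | (hares['sex'] == 'm_') | (hares['sex'] == 'M')),
    ((hares['sex'] == 'f') | (hares['sex'] == 'f_') | (hares['sex'] == 'F')),
]
choices = ['Male', 'Female']
# Rows matching neither condition get the default, which shows up as the
# string 'nan' in the grouped output of section 7
hares['simple_sex'] = np.select(conditions, choices, default=np.nan)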
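
The `isin` variant mentioned in the comment above could look like the sketch below. It runs on a toy frame, not the real data, and it deliberately differs from the cell above in two ways that are assumptions of this example: whitespace is stripped before matching, and unmatched rows end up as true missing values instead of a text label. The placeholder `"unknown"` is a hypothetical name chosen for illustration.

```python
import numpy as np
import pandas as pd

# Toy frame; the real column is hares['sex']
toy = pd.DataFrame({"sex": ["M", "m ", "F", "f", "f?", np.nan]})

# Strip whitespace so 'm ' matches 'm'
x = toy["sex"].str.strip()
conditions = [x.isin(["m", "M"]), x.isin(["f", "F"])]
choices = ["Male", "Female"]

# A string default keeps np.select from mixing str and float values
labels = pd.Series(
    np.select(conditions, choices, default="unknown"), index=toy.index
)

# Swap the placeholder for a true missing value
toy["simple_sex"] = labels.mask(labels == "unknown")
print(toy)
```

Because the unmatched rows are real missing values here, `groupby` would leave them out by default instead of reporting them as a third group.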

In [85]:
hares.head(10)

| | date | time | grid | trap | l_ear | r_ear | sex | age | weight | hindft | notes | b_key | session_id | study | simple_sex |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 11/26/1998 | | bonrip | 1A | 414D096A08 | | | | 1370.0 | 160.0 | | 917.0 | 51 | Population | |
| 1 | 11/26/1998 | | bonrip | 2C | 414D320671 | | M | | 1430.0 | | | 936.0 | 51 | Population | Male |
| 2 | 11/26/1998 | | bonrip | 2D | 414D103E3A | | M | | 1430.0 | | | 921.0 | 51 | Population | Male |
| 3 | 11/26/1998 | | bonrip | 2E | 414D262D43 | | | | 1490.0 | 135.0 | | 931.0 | 51 | Population | |
| 4 | 11/26/1998 | | bonrip | 3B | 414D2B4B58 | | | | 1710.0 | 150.0 | | 933.0 | 51 | Population | |
| 5 | 11/26/1998 | | bonrip | 3D | 414D193011 | | F | | 1890.0 | 145.0 | | 926.0 | 51 | Population | Female |
| 6 | 11/26/1998 | | bonrip | 4A | 414D0F5B3D | | | | 2170.0 | 140.0 | | 920.0 | 51 | Population | |
| 7 | 11/26/1998 | | bonrip | 4B | 414D12350C | | | | 2170.0 | | | 923.0 | 51 | Population | |
| 8 | 11/26/1998 | | bonrip | 4C | 414D197C34 | | M | | 1510.0 | 134.0 | | 927.0 | 51 | Population | Male |
| 9 | 11/26/1998 | | bonrip | 4E | 414D1D0559 | | M | | 1590.0 | | | 928.0 | 51 | Population | Male |


### 7. Calculate mean weight

In [84]:
hares.groupby(by = "simple_sex")['weight'].mean()

simple_sex
Female    1366.920372
Male      1352.145553
nan       1176.511111
Name: weight, dtype: float64

On average, female snowshoe hares in this dataset weigh about 15 g more than males (1366.9 g vs. 1352.1 g).
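
A difference in means is easier to judge next to the group sizes and spread. The sketch below shows one way to get those alongside the mean, using an invented toy frame (the weights are not real measurements) with the same column names as the notebook.

```python
import numpy as np
import pandas as pd

# Toy data; values are invented for illustration
toy = pd.DataFrame({
    "simple_sex": ["Female", "Female", "Male", "Male", np.nan],
    "weight": [1400.0, 1300.0, 1350.0, np.nan, 1200.0],
})

# Mean, number of non-missing weights, and standard deviation per group;
# rows with a missing group key are dropped by groupby by default
summary = toy.groupby("simple_sex")["weight"].agg(["mean", "count", "std"])
print(summary)
```

Note that `count` reports only non-missing weights, so it can be smaller than the number of rows in each group.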

### 8. Workflow

In [1]:
import pandas as pd
import numpy as np

# Import data
url = 'https://pasta.lternet.edu/package/data/eml/knb-lter-bnz/55/22/f01f5d71be949b8c700b6ecd1c42c701'

hares = pd.read_csv(url)

# Clean the sex column
conditions = [
    ((hares['sex'] == 'm') | (hares['sex'] == 'm_')|(hares['sex'] == 'M')),
    ((hares['sex'] == 'f') | (hares['sex'] == 'f_')| (hares['sex'] == 'F')),
]
choices = ['Male', 'Female']
hares['simple_sex'] = np.select(conditions, choices, default=np.nan)

# Calculate mean weight
hares.groupby(by = "simple_sex")['weight'].mean()

simple_sex
Female    1366.920372
Male      1352.145553
nan       1176.511111
Name: weight, dtype: float64