In your notebook: use a markdown cell to add a brief description of the dataset, including a citation, date of access, and a link to the archive.

## brief description of the dataset
- The data were collected through capture-recapture studies of snowshoe hares at 5 locales in the Tanana Valley, from Tok in the east to Clear in the west. Sampling ran from 1999 to 2002. The data set has 14 columns. In the study population, the researchers were not able to detect declines in apparent survival during periods of declining density.

## citation
- Kielland, K., F.S. Chapin, R.W. Ruess, and Bonanza Creek LTER. 2017. Snowshoe hare physical data in Bonanza Creek Experimental Forest: 1999-Present ver 22. Environmental Data Initiative. https://doi.org/10.6073/pasta/03dce4856d79b91557d8e6ce2cbcdc14 (Accessed 2025-10-17).

## date of access
- 10/16/2025

## link to the archive
- https://portal.edirepository.org/nis/mapbrowse?packageid=knb-lter-bnz.55.22

# 2. Adding an image


![Snowshoe hare](https://upload.wikimedia.org/wikipedia/commons/8/8a/SNOWSHOE_HARE_%28Lepus_americanus%29_%285-28-2015%29_quoddy_head%2C_washington_co%2C_maine_-01_%2818988734889%29.jpg)

Copyright: Publicly available under Creative Commons CC0 1.0 Public Domain

ALAN SCHMIERER, Set 72157600401137773, ID 18988734889, Original title SNOWSHOE HARE (Lepus americanus) (5-28-2015) quoddy head, washington co, maine -01


# 3. Data loading and preliminary exploration

In [35]:
# Import Libraries
import pandas as pd
import numpy as np

# Read in Data
hares = pd.read_csv("https://pasta.lternet.edu/package/data/eml/knb-lter-bnz/55/22/f01f5d71be949b8c700b6ecd1c42c701")
hares.head()

Unnamed: 0,date,time,grid,trap,l_ear,r_ear,sex,age,weight,hindft,notes,b_key,session_id,study
0,11/26/1998,,bonrip,1A,414D096A08,,,,1370.0,160.0,,917.0,51,Population
1,11/26/1998,,bonrip,2C,414D320671,,M,,1430.0,,,936.0,51,Population
2,11/26/1998,,bonrip,2D,414D103E3A,,M,,1430.0,,,921.0,51,Population
3,11/26/1998,,bonrip,2E,414D262D43,,,,1490.0,135.0,,931.0,51,Population
4,11/26/1998,,bonrip,3B,414D2B4B58,,,,1710.0,150.0,,933.0,51,Population


In [4]:
hares.dtypes

date           object
time           object
grid           object
trap           object
l_ear          object
r_ear          object
sex            object
age            object
weight        float64
hindft        float64
notes          object
b_key         float64
session_id      int64
study          object
dtype: object
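
The `date` column is read in as `object`, meaning plain strings. Below is a minimal sketch of converting such a column to datetimes. It uses a small made-up frame in place of `hares` and assumes that the month/day/year layout seen in `hares.head()` holds for every row.

```python
import pandas as pd

# Hypothetical stand-in for the `date` column of `hares`
sample = pd.DataFrame({"date": ["11/26/1998", "7/1/2011", "9/11/2012"]})

# Assumption: every date follows month/day/year, as in the rows shown above
sample["date"] = pd.to_datetime(sample["date"], format="%m/%d/%Y")

# With a datetime dtype, the sampling window can be checked directly
print(sample["date"].dtype)
print(sample["date"].min(), sample["date"].max())
```

If some rows used a different layout, `pd.to_datetime` would raise an error with this explicit `format`, which is a useful early warning.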

In [5]:
hares.shape

(3380, 14)

In [7]:
hares.isna().sum()

date             0
time          3116
grid             0
trap            12
l_ear           48
r_ear          169
sex            352
age           2111
weight         535
hindft        1747
notes         3137
b_key           47
session_id       0
study          163
dtype: int64
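
Raw counts such as 3137 missing `notes` out of 3380 rows can be easier to compare as proportions. The sketch below shows one way to do that on a small made-up frame; `sample` and its values are hypothetical stand-ins for `hares`.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for `hares`
sample = pd.DataFrame({
    "weight": [1370.0, np.nan, 1430.0, 1490.0],
    "age": [np.nan, np.nan, "j", np.nan],
})

# isna() gives booleans; their column-wise mean is the fraction of missing values
missing_share = sample.isna().mean().sort_values(ascending=False)
print(missing_share)
```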

In [8]:
#check maximum weight value
hares["weight"].max()

2365.0

In [9]:
#check minimum weight value
hares["weight"].min()

0.0

In [10]:
# check maximum hind foot value
hares["hindft"].max()

160.0

In [12]:
# check minimum hind foot value
hares["hindft"].min()

60.0
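
The four min/max checks above can also be collected in a single call. The sketch below uses a made-up frame (`sample` is hypothetical) and additionally flags rows with a weight of 0, on the assumption that such a value is a data entry issue and not a real measurement.

```python
import pandas as pd

# Hypothetical stand-in for the numeric columns of `hares`
sample = pd.DataFrame({
    "weight": [0.0, 1370.0, 2365.0],
    "hindft": [60.0, 135.0, 160.0],
})

# One call returns the minimum and maximum of both columns
ranges = sample[["weight", "hindft"]].agg(["min", "max"])
print(ranges)

# Assumption: a weight of 0 cannot be a real measurement, so flag it for review
suspect = sample[sample["weight"] <= 0]
print(suspect)
```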

In [17]:
hares["notes"].unique()

array([nan, 'No right ear tag', 'Escapee', 'Mortality', 'Mortality ',
       'Old tag lost in L ear',
       'Bunny escaped before second ear tag was added',
       'Rabbit too bloody, released', 'R Front Foot Injured',
       'L Hind Leg Injured',
       'Left Front Foot Injured by Mink. Mink Still Around, Not Shy',
       'Injured Bunny, Released, No Tags', 'Died after release',
       'Dead in trap', 'Dead', 'non-pregnant',
       'pregnant (2 peanut sized babies)', 'pregnant', 'Pregnant',
       'Pregnant; last collar was chewed off',
       '149.074 recapture; collar loose, removed and replaced; non-pregnant',
       'previous collar was chewed off',
       '149.013 came off/removed; replaced',
       '149.033 recapture; collar loose, removed and replaced',
       'previous collar fell off',
       'collar previously chewed off (put back on the same bunny!)',
       'collar broke off, caught in cage', 'dead in trap',
       '149.754 recapture; no VHF signal, removed and replaced',

In [18]:
hares["sex"].unique()

array([nan, 'M', 'F', '?', 'F?', 'M?', 'pf', 'm', 'f', 'f?', 'm?', 'f ',
       'm '], dtype=object)

## Study question:
Is there a correlation between snowshoe hare weight and foot size?
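
One possible way to approach this question is a Pearson correlation between `weight` and `hindft`. The sketch below runs on a small made-up frame, so the resulting number says nothing about the real data. It only illustrates the calls, under the assumption that rows missing either measurement should be left out.

```python
import numpy as np
import pandas as pd

# Hypothetical values standing in for hares[["weight", "hindft"]]
sample = pd.DataFrame({
    "weight": [1370.0, 1430.0, np.nan, 1490.0, 1710.0],
    "hindft": [160.0, np.nan, 130.0, 135.0, 150.0],
})

# Keep only rows where both measurements are present
paired = sample.dropna(subset=["weight", "hindft"])

# Pearson correlation coefficient between the two columns
r = paired["weight"].corr(paired["hindft"])
print(len(paired), round(r, 3))
```

Printing the number of paired rows alongside the coefficient matters here, since `hindft` is missing in 1747 of the 3380 rows.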

| Value | Description |
| ------| ----------- |
| f     | female      |
| m     | male        |
| ?     | unconfirmed |
| p     | unknown     |

In [20]:
# How many times does each unique value in sex appear?
hares["sex"].value_counts()

sex
F     1161
M      730
f      556
m      515
?       40
F?      10
f        4
m        4
f?       3
M?       2
m?       2
pf       1
Name: count, dtype: int64

In [22]:
# Count each value again, this time including NaNs (dropna=False); 352 values are missing
hares["sex"].value_counts(dropna=False)

sex
F      1161
M       730
f       556
m       515
NaN     352
?        40
F?       10
f         4
m         4
f?        3
M?        2
m?        2
pf        1
Name: count, dtype: int64

Do the values in the sex column correspond to the values declared in the metadata?
- No, there are a lot of extra values not stated in the metadata.

What could have been potential causes for multiple codes?
- Poor training given to data collectors

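Are there seemingly repeated values? If so, what could be the cause?
- Yes. `f` and `m` each appear twice in the value counts because some entries carry a trailing space (`'f '`, `'m '`), and upper- and lowercase codes (`F`/`f`, `M`/`m`) are counted separately. There are also 4 duplicated rows in the df, which seem to be due to accidental recaptures of the same hare.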
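
The comparison with the metadata can also be done programmatically. The sketch below assumes the four codes from the metadata table above and uses a short made-up Series (`sex` is a hypothetical stand-in for `hares["sex"]`) built from codes that appear in the `unique()` output.

```python
import pandas as pd

# Codes declared in the metadata table above
declared = {"f", "m", "?", "p"}

# Hypothetical stand-in for hares["sex"], using codes seen in unique()
sex = pd.Series(["M", "F", "f", "m", "f ", "pf", "?", None])

# Observed codes that the metadata does not declare
observed = set(sex.dropna().unique())
undeclared = sorted(observed - declared)
print(undeclared)
```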


In [30]:
hares[hares.duplicated()]

Unnamed: 0,date,time,grid,trap,l_ear,r_ear,sex,age,weight,hindft,notes,b_key,session_id,study
2893,7/1/2011,,bonbs,10a,,,,,,,juvenile,,23,Population
2894,7/1/2011,,bonbs,10a,,,,,,,juvenile,,23,Population
2895,7/1/2011,,bonbs,10a,,,,,,,juvenile,,23,Population
3071,9/11/2012,,bonbs,10d,b2834,b2835,f,j,840.0,114.0,,838.0,31,Population


In [31]:
hares["sex"].unique()

array([nan, 'M', 'F', '?', 'F?', 'M?', 'pf', 'm', 'f', 'f?', 'm?', 'f ',
       'm '], dtype=object)

Instructions
- Gather the `sex` column of `hares`
- Use `np.select()` to map the values in the column to standard names
- f and F = female, m and M = male, anything else (including ?) = unknown

In [36]:
# Conditions to select from.
# Note: "m_" and "f_" do not match the trailing-space codes "m " and "f "
# returned by unique(), so those rows fall through to the default.
conditions = [
    hares["sex"].isin(["m", "M", "m_"]),
    hares["sex"].isin(["f", "F", "f_"]),
]

# Choices corresponding to each condition, in order
choice = ["male", "female"]

# Otherwise sex is "unknown"
default = "unknown"

# Use `np.select()` to evaluate the conditions row by row and store the result in a new column
hares["sex_simple"] = np.select(conditions, choice, default=default)

In [37]:
# Check that this worked and only the desired values remain
hares["sex_simple"].value_counts()

sex_simple
female     1717
male       1245
unknown     418
Name: count, dtype: int64
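
The `unique()` output lists `'f '` and `'m '` with a trailing space, so exact matching against a fixed list can miss them. A possible alternative, sketched here on a made-up frame (`sample` is hypothetical), is to strip whitespace and lowercase the codes before mapping. Applied to `hares`, this would give counts that differ slightly from those shown above.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for hares["sex"], using codes seen in unique()
sample = pd.DataFrame({"sex": ["M", "m ", "F", "f ", "f?", "pf", "?", np.nan]})

# Strip surrounding whitespace and lowercase so "m ", "M" and "m" all match
clean = sample["sex"].str.strip().str.lower()

conditions = [clean.isin(["m"]), clean.isin(["f"])]
choices = ["male", "female"]

# Anything else (missing, "?", "f?", "pf") falls back to "unknown"
sample["sex_simple"] = np.select(conditions, choices, default="unknown")
print(sample["sex_simple"].value_counts())
```

Uncertain codes such as `f?` are deliberately left as unknown in this sketch; treating them as female or male would be a separate decision.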

In [38]:
# Mean of every numeric column, grouped by simplified sex
hares.groupby("sex_simple").mean(numeric_only=True)

sex_simple,weight,hindft,b_key,session_id
female,1366.920372,131.011161,482.617407,57.75597
male,1352.145553,133.38301,490.851406,49.228112
unknown,1176.511111,103.469697,615.119681,46.576555
