## 9.1 Data description & 9.2 Add an image

### You should have your own description of the data

![Snowshoe hare (Lepus americanus)](https://upload.wikimedia.org/wikipedia/commons/thumb/f/fc/SNOWSHOE_HARE_%28Lepus_americanus%29_%288-20-13%29_stunner_c_g%2C_n-w_conejos_co%2C_co_%282%29_%289592453799%29.jpg/660px-SNOWSHOE_HARE_%28Lepus_americanus%29_%288-20-13%29_stunner_c_g%2C_n-w_conejos_co%2C_co_%282%29_%289592453799%29.jpg)

In [2]:
# Import pandas and NumPy
import pandas as pd
import numpy as np

## 9.3 Data loading

In [12]:
# Read the hare data directly from the EDI data portal and preview the first five rows
hares = pd.read_csv('https://portal.edirepository.org/nis/dataviewer?packageid=knb-lter-bnz.55.22&entityid=f01f5d71be949b8c700b6ecd1c42c701')
hares.head()

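|   | date | time | grid | trap | l_ear | r_ear | sex | age | weight | hindft | notes | b_key | session_id | study |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 11/26/1998 | NaN | bonrip | 1A | 414D096A08 | NaN | NaN | NaN | 1370.0 | 160.0 | NaN | 917.0 | 51 | Population |
| 1 | 11/26/1998 | NaN | bonrip | 2C | 414D320671 | NaN | M | NaN | 1430.0 | NaN | NaN | 936.0 | 51 | Population |
| 2 | 11/26/1998 | NaN | bonrip | 2D | 414D103E3A | NaN | M | NaN | 1430.0 | NaN | NaN | 921.0 | 51 | Population |
| 3 | 11/26/1998 | NaN | bonrip | 2E | 414D262D43 | NaN | NaN | NaN | 1490.0 | 135.0 | NaN | 931.0 | 51 | Population |
| 4 | 11/26/1998 | NaN | bonrip | 3B | 414D2B4B58 | NaN | NaN | NaN | 1710.0 | 150.0 | NaN | 933.0 | 51 | Population |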


## 9.4 Metadata exploration

## 9.5 Detecting messy values

In [6]:
# Use `value_counts()` to count each unique value in the sex column (NaN is excluded by default)
hares.sex.value_counts()

sex
F     1161
M      730
f      556
m      515
?       40
F?      10
f        4
m        4
f?       3
M?       2
m?       2
pf       1
Name: count, dtype: int64

In [7]:
# Check whether the column contains missing values (it does: this prints True)
print(hares.sex.hasnans)

# Pass dropna=False so that NaN is counted as its own category
hares.sex.value_counts(dropna=False)


True


sex
F      1161
M       730
f       556
m       515
NaN     352
?        40
F?       10
f         4
m         4
f?        3
M?        2
m?        2
pf        1
Name: count, dtype: int64
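
The NaN count can also be cross-checked directly. The following is a minimal sketch on a small made-up Series (the values are illustrative stand-ins, not the hare data), showing how `isna().sum()` relates to the totals reported by `value_counts()` with and without `dropna=False`.

```python
import numpy as np
import pandas as pd

# Hypothetical toy column standing in for hares.sex
sex = pd.Series(['F', 'M', np.nan, 'f', np.nan, '?'])

# isna() flags missing entries; summing the booleans counts them
n_missing = int(sex.isna().sum())

# value_counts() skips NaN unless dropna=False is passed
n_counted_default = int(sex.value_counts().sum())
n_counted_all = int(sex.value_counts(dropna=False).sum())

print(n_missing, n_counted_default, n_counted_all)
```

The difference between the two `value_counts()` totals should equal the number of missing entries, which is a cheap sanity check before cleaning.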

In [8]:
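# Get the unique values for the sex column; unlike the printed counts above,
# this shows the exact strings, revealing the trailing spaces in 'f ' and 'm '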
hares.sex.unique()

array([nan, 'M', 'F', '?', 'F?', 'M?', 'pf', 'm', 'f', 'f?', 'm?', 'f ',
       'm '], dtype=object)
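
Trailing spaces like those in `'f '` and `'m '` are easy to miss in printed counts. One possible way to flag them, sketched here on a made-up Series rather than the hare data, is to compare each value with its stripped version.

```python
import numpy as np
import pandas as pd

# Hypothetical toy column; note the trailing spaces in 'f ' and 'm '
sex = pd.Series(['F', 'f', 'f ', 'M', 'm ', 'F?', np.nan])

# A value has padding if stripping whitespace changes it.
# NaN never equals itself, so missing entries are excluded explicitly.
has_padding = sex.notna() & (sex != sex.str.strip())

# Show the offending values with repr() so the spaces are visible
padded_values = sex[has_padding].tolist()
print([repr(v) for v in padded_values])
```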

## 9.6 Clean values

In [9]:
# create a new column called `sex_simple` where
#   'F','f', and 'f ' get assigned to 'female'
#   'M','m', and 'm ' get assigned to 'male'
#   anything else gets assigned np.nan
# HINT: use np.select like we did on Monday

conditions = [(hares.sex == 'F') | (hares.sex == 'f') | (hares.sex == 'f '),
              (hares.sex == 'M') | (hares.sex == 'm') | (hares.sex == 'm ')]

choices = ['female', 'male']

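# Caution: because the choices are strings, the np.nan default was coerced to the
# string 'nan' in the NumPy version that produced the output below, so the
# "missing" group is a text label, not a true NaN
hares['sex_simple'] = np.select(conditions, choices, default=np.nan)

# Check the counts of unique values in the new `sex_simple` column
hares.sex_simple.value_counts(dropna=False)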


sex_simple
female    1721
male      1249
nan        410
Name: count, dtype: int64
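
The lowercase `nan` in the output above is the string `'nan'`, not a missing value. If true missing values are wanted, one alternative to `np.select` (a different technique from the one the exercise hints at) is to normalize the strings and look them up in a dictionary with `Series.map`, which leaves unmatched entries as real NaN. This is only a sketch on made-up data, and the `labels` dictionary is a hypothetical name.

```python
import numpy as np
import pandas as pd

# Hypothetical toy column standing in for hares.sex
sex = pd.Series(['F', 'f ', 'M', 'm', '?', 'pf', np.nan])

# Hypothetical lookup table from normalized codes to labels
labels = {'f': 'female', 'm': 'male'}

# Strip whitespace, lowercase, then map; anything not in `labels` becomes NaN
sex_simple = sex.str.strip().str.lower().map(labels)

print(sex_simple.value_counts(dropna=False))
```

Because the unmatched entries are genuine NaN here, `isna()`, `dropna()`, and `groupby` treat them as missing rather than as a third category.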

## 9.7 Calculate mean weight

In [10]:
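# Calculate the mean weight for each `sex_simple` group
# (the 'nan' row appears because that label is the string 'nan', not a true missing value)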
hares.groupby('sex_simple').weight.mean()

sex_simple
female    1365.164792
male      1349.935542
nan       1193.364055
Name: weight, dtype: float64
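
For comparison, when the group key holds real NaN values, `groupby` drops that group by default and keeps it only if `dropna=False` is passed. A minimal sketch on a made-up DataFrame (the weights are invented for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with true missing values in the group key
toy = pd.DataFrame({
    'sex_simple': ['female', 'female', 'male', np.nan, np.nan],
    'weight': [1400.0, 1300.0, 1350.0, 1000.0, 1200.0],
})

# Default behaviour: rows whose key is NaN are left out of the result
means_default = toy.groupby('sex_simple')['weight'].mean()

# dropna=False keeps the NaN key as its own group
means_with_missing = toy.groupby('sex_simple', dropna=False)['weight'].mean()

print(means_default)
print(means_with_missing)
```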