# Gator data

The Florida Fish and Wildlife Conservation Commission keeps track of [gators killed by hunters](http://myfwc.com/wildlifehabitats/managed/alligator/harvest/data-export/). A cut of this data lives in `../data/gators.csv`.

Let's take a look!

In [None]:
# import pandas
import pandas as pd

In [None]:
# create a data frame
df = pd.read_csv('../data/gators.csv')

In [None]:
# check it out with head()
df.head()

In [None]:
# get the info()
df.info()

In [None]:
# what's the year range?
df['Year'].unique()
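
`unique()` lists every distinct year but leaves you to eyeball the endpoints. A minimal sketch of getting the range directly, using a made-up `toy` frame (not the real gator data) so it runs on its own:

```python
import pandas as pd

# Toy stand-in for the real data; these years are invented for illustration
toy = pd.DataFrame({'Year': [2002, 2000, 2001, 2002, 2000]})

# min() and max() give the endpoints of the range directly
first_year = toy['Year'].min()
last_year = toy['Year'].max()

# value_counts() shows how many records fall in each year
records_per_year = toy['Year'].value_counts().sort_index()

print('Years covered: {} to {}'.format(first_year, last_year))
print(records_per_year)
```

On the real data frame, the same calls would go against `df['Year']`.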

In [None]:
# let's peep the carcass size values to get the pattern
df['Carcass Size'].unique()

In [None]:
# let's coerce the values in the date column to native dates

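# first, we need to remove records that have only a single space in the date column
# we found these by carefully reading the error messages that cropped up
# when we first tried to coerce the date column without removing these records
# .copy() gives us an independent data frame, so the column assignments
# further down don't trigger a SettingWithCopyWarning
df_with_date = df[df['Harvest Date'] != ' '].copy()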

# how many did we discard?
discarded = len(df) - len(df_with_date)
print('Discarded {} records.'.format(discarded))

# now coerce column values with to_datetime()
# https://pandas.pydata.org/pandas-docs/stable/generated/pandas.to_datetime.html
df_with_date['Harvest Date'] = pd.to_datetime(df_with_date['Harvest Date'], format='%m-%d-%Y')
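
An alternative to filtering out the blank dates by hand is to let `to_datetime()` mark anything it can't parse as `NaT` with `errors='coerce'`, then drop those rows. This is only a sketch on a made-up `toy` frame; the sample dates are assumptions that follow the `%m-%d-%Y` format used above.

```python
import pandas as pd

# Toy stand-in for the real data; one record has just a space for a date
toy = pd.DataFrame({'Harvest Date': ['09-15-2001', ' ', '10-01-2002']})

# errors='coerce' turns unparseable values into NaT instead of raising
parsed = pd.to_datetime(toy['Harvest Date'], format='%m-%d-%Y', errors='coerce')

# count what failed to parse before throwing it away
n_unparsed = parsed.isna().sum()
print('Could not parse {} records.'.format(n_unparsed))

# swap in the parsed dates and keep only the rows that have one
toy_with_date = toy.assign(**{'Harvest Date': parsed}).dropna(subset=['Harvest Date'])
print(toy_with_date)
```

The trade-off: coercing is more forgiving, but it also hides *why* a value failed, so it's worth inspecting the unparsed rows before discarding them.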

In [None]:
# check the output with head()
df_with_date.head()

In [None]:
'''
Now let's create a new column that stores each gator's length in a single unit: inches

We're going to write a function to do these steps:
    - given a row of data, capture the feet and inch values in the carcass size column
    - multiply feet by 12
    - add that to the inch value and return the result

We will then call this function on every row of the data frame using the apply() method
'''

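def get_inches(row):
    '''Given a row of data, return the carcass size as total inches.'''
    carcass_size = row['Carcass Size']
    # everything before 'ft.' is the feet value, everything after is the inches
    ft_, in_ = carcass_size.split('ft.')
    inches = int(in_.replace('in.', '').strip())
    feet = int(ft_.strip())
    return inches + (feet * 12)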

df_with_date['length_in'] = df_with_date.apply(get_inches, axis=1)
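
The same feet-times-12-plus-inches arithmetic can also be done on the whole column at once with a regular expression, which skips the row-by-row `apply()`. A rough sketch on a made-up `toy` frame; the sample strings assume the values look like `9 ft. 2 in.`, so check them against the output of `unique()` above before reusing the pattern.

```python
import pandas as pd

# Toy stand-in for the real data; the size strings are an assumed format
toy = pd.DataFrame({'Carcass Size': ['9 ft. 2 in.', '11 ft. 10 in.', '6 ft. 0 in.']})

# capture the feet and inch numbers into two named columns
pattern = r'(?P<feet>\d+)\s*ft\.\s*(?P<inches>\d+)\s*in\.'
parts = toy['Carcass Size'].str.extract(pattern)

# extract() returns strings, so convert before doing math
parts = parts.astype(int)

# multiply feet by 12 and add the inches
toy['length_in'] = parts['feet'] * 12 + parts['inches']

print(toy)
```

Values that don't match the pattern come back as `NaN` from `extract()`, and the `astype(int)` step would then raise an error, which is a useful signal that the data has a format you haven't accounted for.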

In [None]:
# check the output with head()
df_with_date.head()

In [None]:
# sort by length descending, check it out with head()
df_with_date.sort_values('length_in', ascending=False).head()

In [None]:
# get average length harvested by year
# (mean is the default aggregation for pivot_table, but let's be explicit)
length_by_year = pd.pivot_table(df_with_date, values='length_in', index=['Year'], aggfunc='mean')

print(length_by_year)
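
For comparison, `groupby()` can produce the same yearly average and a few other summary numbers in one pass. A minimal sketch on a made-up `toy` frame; the lengths are invented for illustration, not real harvest figures.

```python
import pandas as pd

# Toy stand-in for the real data; years and lengths are invented
toy = pd.DataFrame({
    'Year': [2000, 2000, 2001, 2001, 2001],
    'length_in': [100, 110, 90, 120, 150],
})

# group by year, then compute several statistics on the length column
summary = toy.groupby('Year')['length_in'].agg(['mean', 'median', 'count', 'max'])

print(summary)
```

Seeing the count next to the mean helps flag years where the average rests on only a handful of records.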

In [None]:
# what else?