# Gator data

The Florida Fish and Wildlife Conservation Commission keeps track of [gators killed by hunters](http://myfwc.com/wildlifehabitats/managed/alligator/harvest/data-export/). A cut of this data lives in `../data/gators.csv`.

Let's take a look!

In [1]:
# import pandas
import pandas as pd

In [2]:
# create a data frame
df = pd.read_csv('../data/gators.csv')

In [None]:
# check it out with head()
df.head()

### Check it out

First, let's take a look at our data and examine some of the column values that we might be interested in analyzing. We're already starting to think about the questions we want this data to help us answer.

In [None]:
# get the info()
df.info()

In [None]:
# what's the year range, with counts?
df['Year'].value_counts()

In [None]:
# let's also peep the carcass size values to get the pattern
df['Carcass Size'].unique()

In [None]:
'''
Let's create a new column to get the gator's length in one consistent unit: inches

We're going to write a function to do these steps:
    - given a row of data, capture the feet and inch values in the carcass size column
    - multiply feet by 12
    - add that to the inch value and return the result

We will then call this function on the data frame using the apply() method
https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.apply.html
'''

def get_inches(row):
    carcass_size = row['Carcass Size']
    ft_, in_ = carcass_size.split('ft.')
    inches = int(in_.replace('in.', '').strip())
    feet = int(ft_.strip())
    return inches + (feet * 12)

df['length_in'] = df.apply(get_inches, axis=1)

👉 Learn more about functions in [this notebook](../appendix/Functions.ipynb).
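
If you'd rather skip the row-by-row function, here's a rough sketch of the same feet-to-inches math done on whole columns at once with `str.extract()`. The sample values below are made up for illustration, and the pattern assumes every value looks like `9 ft. 2 in.`, so check it against the output of `unique()` before trusting it on the real data.

```python
import pandas as pd

# hypothetical sample values -- the real column may be formatted a little differently
sample = pd.DataFrame({'Carcass Size': ['9 ft. 2 in.', '11 ft. 10 in.', '6 ft. 0 in.']})

# capture the feet and inch digits in two named groups, which become column names
parts = sample['Carcass Size'].str.extract(r'(?P<feet>\d+)\s*ft\.\s*(?P<inches>\d+)\s*in\.')

# convert both columns to integers, then do the arithmetic on the whole columns
parts = parts.astype(int)
sample['length_in'] = parts['feet'] * 12 + parts['inches']

print(sample)
```

Any value that doesn't match the pattern comes back as `NaN`, and `astype(int)` will then raise an error, which is a handy signal that the data has a format you didn't expect.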

In [None]:
# check the output with head()
df.head()

In [None]:
# sort by length descending, check it out with head()
df.sort_values('length_in', ascending=False).head()

In [None]:
# get average length harvested by year
length_by_year = pd.pivot_table(df, values='length_in', index=['Year'])

print(length_by_year)
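
Worth knowing: `pivot_table()` averages the values by default, which is why we didn't have to ask for a mean. Here's a small sketch, using a made-up toy data frame rather than the gator data, showing that a `groupby()` with `mean()` gets you the same numbers.

```python
import pandas as pd

# hypothetical toy data standing in for the gator data frame
toy = pd.DataFrame({
    'Year': [2000, 2000, 2001, 2001],
    'length_in': [100, 110, 90, 120],
})

# pivot_table() uses the mean as its default aggregation
by_pivot = pd.pivot_table(toy, values='length_in', index=['Year'])

# the same calculation, spelled out with groupby()
by_group = toy.groupby('Year')['length_in'].mean()

print(by_pivot)
print(by_group)
```

The pivot table hands back a data frame and the groupby hands back a series, but the averages match.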

### Treating dates as dates

This data includes the date on which each gator was killed, but the date values are being stored as strings. If we want to do some time-based analysis -- comparing the gator hunt by month, or whatever -- we'd want to deal directly with native dates. The conversion will choke on missing or malformed values, which means we need to check for nulls first. According to `info()`, which we used earlier, this column doesn't have any null values.

🤔HMMMMMMMMM OK let's run with it.

Noting the format (month-day-year), what happens when we use the [`to_datetime()`](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.to_datetime.html) function to convert the date strings into native datetime values?

In [3]:
df['Harvest Date'] = pd.to_datetime(df['Harvest Date'], format='%m-%d-%Y')

ValueError: time data ' ' does not match format '%m-%d-%Y' (match)

Womp womp. Looks like some of the values in that column are a single space, which pandas does not read as null. That's why `info()` told us nothing was missing. We'll need to filter those rows out before converting.

In [None]:
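# remove some data that just has a space in the date column
# .copy() gives us an independent data frame, so changing or adding
# columns later won't trigger a SettingWithCopyWarning
df_with_date = df[df['Harvest Date'] != ' '].copy()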

# how many did we discard?
# you would want to note this in your story
discarded = len(df) - len(df_with_date)
print('Discarded {} records.'.format(discarded))

# NOW we can coerce column values with to_datetime()
df_with_date['Harvest Date'] = pd.to_datetime(df_with_date['Harvest Date'], format='%m-%d-%Y')

In [None]:
# check the output with head()
df_with_date.head()

👉 [Read more about date formatting here](https://docs.python.org/3/library/datetime.html); also, bookmark [this handy website](http://strftime.org/).
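
Another way to handle the blanks, sketched here on a made-up three-row sample rather than the real file: pass `errors='coerce'` to `to_datetime()`, which turns anything that doesn't match the format into `NaT` (the datetime version of null) instead of raising an error. Then you can count and drop the nulls.

```python
import pandas as pd

# hypothetical sample: two good dates and one blank, like the value that tripped us up
sample = pd.DataFrame({'Harvest Date': ['09-15-2014', ' ', '10-01-2015']})

# errors='coerce' turns anything that doesn't match the format into NaT
parsed = pd.to_datetime(sample['Harvest Date'], format='%m-%d-%Y', errors='coerce')

# count the values that failed to parse
print('Could not parse {} records.'.format(parsed.isna().sum()))

# swap in the parsed dates and keep only the rows with a real date
clean = sample.assign(**{'Harvest Date': parsed}).dropna(subset=['Harvest Date'])
print(clean)
```

One caution: coercing hides every kind of bad value, not just blanks, so it's worth eyeballing the rows that came back as `NaT` before you throw them away.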

### Gator hunt by month

[According to](http://myfwc.com/media/310257/Alligator-processors.pdf) the Florida Fish and Wildlife Conservation Commission, the gator hunt season is in the fall:

![gatorhunt](../img/gatorhunt.png "gatorhunt")

Let's look at the totals by month:
- Create a new column for the month
- Do value counts by month

In [None]:
# pull the month number out of each date with the .dt accessor
df_with_date['month'] = df_with_date['Harvest Date'].dt.month

In [None]:
# count the gators killed in each month, in calendar order
df_with_date['month'].value_counts().sort_index()

In [None]:
# what else?