# Gator hunt

The Florida Fish and Wildlife Conservation Commission keeps track of [gators killed by hunters](http://myfwc.com/wildlifehabitats/managed/alligator/harvest/data-export/). A cut of this data lives in `../data/gators.csv`.

Let's take a look.

In [None]:
# import pandas


In [None]:
# read in the CSV


In [None]:
# check the output with `.head()`
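
If you don't have the file handy, here's a minimal sketch of the three cells above. It reads from an in-memory string via `io.StringIO` instead of `../data/gators.csv`. The column names (`Year`, `Carcass Size`, `Harvest Date`) are assumptions about the real FWC export, and the rows are hypothetical stand-ins, not real records.

```python
import io

import pandas as pd

# Hypothetical rows standing in for ../data/gators.csv;
# the column names are assumptions about the real export
csv_text = """Year,Carcass Size,Harvest Date
2010,10 ft. 4 in.,09-12-2010
2011,7 ft. 11 in.,10-01-2011
2011,12 ft. 0 in.,08-20-2011
"""

# with the real file you'd write: df = pd.read_csv('../data/gators.csv')
df = pd.read_csv(io.StringIO(csv_text))

# first five rows (here there are only three)
print(df.head())
```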


### Check it out

First, let's take a look at our data and examine some of the column values that we might be interested in analyzing. We're already starting to think about the questions we want this data to help us answer.

In [None]:
# get the info()


In [None]:
# what's the year range, with counts?


In [None]:
# let's also peep the carcass size values to get the pattern
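
A sketch of those three checks on a few hypothetical rows. The column names `Year` and `Carcass Size` are assumptions. `sort_index()` after `value_counts()` orders the counts by year instead of by frequency, which makes the range easier to read.

```python
import pandas as pd

# Hypothetical sample data, not the real FWC export
df = pd.DataFrame({
    'Year': [2010, 2011, 2011, 2012],
    'Carcass Size': ['10 ft. 4 in.', '7 ft. 11 in.', '12 ft. 0 in.', '8 ft. 6 in.'],
})

# column names, non-null counts and dtypes; the sizes are stored as strings
df.info()

# year range, then counts per year in year order
print(df['Year'].min(), df['Year'].max())
year_counts = df['Year'].value_counts().sort_index()
print(year_counts)

# a handful of distinct carcass size values, to see the "{} ft. {} in." pattern
print(df['Carcass Size'].unique()[:10])
```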


### Come up with a list of questions

- What's the longest gator in our data?
- Average length by year?
- How many gators are killed by month?

### Write a function to calculate gator length in inches

Right now, the value for the gator's length is a string following this pattern: `{} ft. {} in.`.

Let's create a new column that holds each gator's length as a single number in a consistent unit: inches.

We're going to write a function to do these steps:
- Given a row of data, capture the feet and inch values in the carcass size column -- we can split the string on 'ft.' and clean up each piece from there
- Multiply feet by 12
- Add that number to the inch value
- `return` the result

We'll call this function on the data frame using the [`.apply()`](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.apply.html) method, passing `axis=1` so the function receives one row at a time rather than one column.

In [None]:
# define a function called `get_inches` that accepts one row as an argument

    # grab the carcass size value

    # split the value on 'ft.'

    # get the feet value, strip whitespace, coerce to integer

    # get the inch value, replace 'in.', strip whitespace, coerce to integer

    # return inches plus feet*12


In [None]:
# create a new column, 'length_in', and fill it by applying our function to every row using `.apply()` with axis=1
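
Here's one way the function and the `.apply()` call could look, as a sketch. It assumes the column is called `Carcass Size`, which is a guess at the real header. It also assumes every value follows the `{} ft. {} in.` pattern exactly, so a value that doesn't match will raise an error. The sample rows are hypothetical. The name uses an underscore because hyphens aren't allowed in Python identifiers.

```python
import pandas as pd


def get_inches(row):
    """Convert a carcass size string like '10 ft. 4 in.' to total inches."""
    # grab the carcass size value (column name is an assumption)
    size = row['Carcass Size']

    # split the value on 'ft.' -> ['10 ', ' 4 in.']
    feet_str, inch_str = size.split('ft.')

    # get the feet value, strip whitespace, coerce to integer
    feet = int(feet_str.strip())

    # get the inch value, replace 'in.', strip whitespace, coerce to integer
    inches = int(inch_str.replace('in.', '').strip())

    # return inches plus feet*12
    return inches + feet * 12


# hypothetical sample rows; the real ones come from the CSV
df = pd.DataFrame({'Carcass Size': ['10 ft. 4 in.', '7 ft. 11 in.', '12 ft. 0 in.']})

# axis=1 hands get_inches one row (a Series) at a time
df['length_in'] = df.apply(get_inches, axis=1)
print(df.head())
```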


👉 Learn more about functions in [this notebook](../reference/Functions.ipynb).

In [None]:
# check the output with head()


In [None]:
# sort by length descending, check it out with head()
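
Calling `sort_values()` with `ascending=False` puts the longest gators at the top. The sketch below also swaps in a different technique for building the length column. Instead of a row-wise `.apply()`, it pulls both numbers out at once with a regular expression via `Series.str.extract()`, which is typically faster on large frames. The regex assumes the `{} ft. {} in.` pattern. Values that don't match come back as `NaN` instead of raising an error. The rows, including the `'unknown'` one, are hypothetical.

```python
import pandas as pd

# hypothetical sample rows, including one value that doesn't fit the pattern
df = pd.DataFrame({'Carcass Size': ['10 ft. 4 in.', '7 ft. 11 in.', '12 ft. 0 in.', 'unknown']})

# capture the feet and inch digits into two columns named 'ft' and 'in'
parts = df['Carcass Size'].str.extract(r'(?P<ft>\d+)\s*ft\.\s*(?P<in>\d+)\s*in\.')

# extract() returns strings; convert each column to numbers (non-matches stay NaN)
parts = parts.apply(pd.to_numeric)
df['length_in'] = parts['ft'] * 12 + parts['in']

# longest gators first; NaN rows sort to the end by default
longest = df.sort_values('length_in', ascending=False)
print(longest.head())
```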


### Count by year

Our friend `value_counts()` is _on it_.

In [None]:
# get value counts by year


### Average length by year

To get the average length of gators by year, we'll run a [pivot table](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.pivot_table.html).

👉 For more details on creating pivot tables, [see this notebook](../reference/Grouping%20data%20in%20pandas.ipynb#Pivot-tables).

In [None]:
# get average length harvested by year
# values is 'length_in'
# index is Year
# aggfunc is 'mean'


In [None]:
# see results
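
A minimal sketch of that pivot table on hypothetical rows that already have a `length_in` column. The same averages are also computed with `groupby()` for comparison. The pivot table comes back as a DataFrame with a `length_in` column, and the `groupby()` version returns a Series.

```python
import pandas as pd

# hypothetical rows with lengths already converted to inches
df = pd.DataFrame({
    'Year': [2010, 2010, 2011, 2011, 2011],
    'length_in': [124, 96, 95, 144, 110],
})

# average length harvested by year
by_year = pd.pivot_table(df, values='length_in', index='Year', aggfunc='mean')
print(by_year)

# the same numbers via groupby, as a Series instead of a DataFrame
print(df.groupby('Year')['length_in'].mean())
```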


### Treating dates as dates

This data include the date on which the gator was killed, but the date values are being stored as strings. If we want to do some time-based analysis -- comparing the gator hunt by month, or whatever -- we'd want to deal directly with native dates.

Noting the format (month-day-year), let's use the [`pd.to_datetime()`](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.to_datetime.html) function to convert the dates into native date objects. We'll tell pandas to use the [correct date specification](http://strftime.org/) and to coerce errors to null values (`NaT`, pandas' "not a time") rather than throw a giant exception.

In [None]:
# format is %m-%d-%Y
# errors='coerce'


In [None]:
# check the output with `head()`


If you want to double-check that the data type is correct, you can access the `dtypes` attribute.

In [None]:
# check dtypes
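
A sketch of the conversion and the dtype check together. It assumes the date column is named `Harvest Date` and overwrites it in place. Both choices are yours to make in the real notebook. The dates, including one deliberately bad value, are hypothetical.

```python
import pandas as pd

# hypothetical month-day-year strings, plus one value that won't parse
df = pd.DataFrame({'Harvest Date': ['09-12-2010', '10-01-2011', 'not a date']})
print(df.dtypes)  # before: the dates are plain strings

# explicit format; errors='coerce' turns anything unparseable into NaT
df['Harvest Date'] = pd.to_datetime(df['Harvest Date'], format='%m-%d-%Y', errors='coerce')
print(df.head())
print(df.dtypes)  # after: a datetime64 dtype
```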


[You can read more about date formatting here](https://docs.python.org/3/library/datetime.html).

### Gator hunt by month

[According to](http://myfwc.com/media/310257/Alligator-processors.pdf) the Florida Fish and Wildlife Conservation Commission, the gator hunt season is in the fall:

![gatorhunt](../img/gatorhunt.png "gatorhunt")

Let's look at the totals by month:
- Create a new column for the month using a [lambda function](https://docs.python.org/3/tutorial/controlflow.html#lambda-expressions)
- Do value counts by month

In [None]:
# create a new column, 'month', and fill it by applying a lambda function to extract the month


In [None]:
# check unique values in our new column


In [None]:
# do value_counts on the month column and sort_index()
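
A sketch of the month steps on a few hypothetical dates, assuming the converted column is called `Harvest Date`. The lambda receives one `Timestamp` at a time and returns its `.month`. The built-in `df['Harvest Date'].dt.month` accessor gives the same result without a lambda.

```python
import pandas as pd

# hypothetical harvest dates, already converted to datetimes
df = pd.DataFrame({
    'Harvest Date': pd.to_datetime(
        ['09-12-2010', '10-01-2011', '09-30-2011', '08-20-2011'], format='%m-%d-%Y'
    ),
})

# the lambda pulls the month number out of each Timestamp
df['month'] = df['Harvest Date'].apply(lambda x: x.month)

print(df['month'].unique())

# counts per month, ordered by month number rather than by count
month_counts = df['month'].value_counts().sort_index()
print(month_counts)
```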


What if we wanted to get a count by month _by year_? Pivot tables to the rescue, again.

We'll give the `pd.pivot_table()` function five things:
- `df` specifies what data frame we're pivoting
- `index='month'` specifies the column whose values become the rows
- `columns='Year'` specifies the column whose values become the columns
- `aggfunc='count'` tells pandas how to aggregate the data -- we want to count the values
- `values='length_in'` specifies the column of data to apply the aggregation to -- we're going to count up every record of a carcass that has a length

In [None]:
# create a pivot table called by_month_by_year


In [None]:
# check the output
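
A minimal sketch of that pivot table using the five arguments above, on hypothetical rows. Any month/year pair with no records comes out as `NaN`. Because of those `NaN`s, the counts are stored as floats.

```python
import pandas as pd

# hypothetical rows: one per harvested gator
df = pd.DataFrame({
    'Year': [2010, 2010, 2011, 2011, 2011],
    'month': [9, 10, 9, 9, 8],
    'length_in': [124, 96, 95, 144, 110],
})

by_month_by_year = pd.pivot_table(
    df,
    index='month',       # one row per month
    columns='Year',      # one column per year
    aggfunc='count',     # count the records in each cell
    values='length_in',  # ...that have a length
)
print(by_month_by_year)
```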


Those `NaN`s show up wherever a month had no recorded kills in a given year. I have OCD, and having them mixed in with our numbers gives me a case of the dang fantods. Let's use the `.fillna()` method to replace those with `0`.

In [None]:
# run fillna(0) on it
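
A sketch of the cleanup on the same kind of hypothetical rows. `.fillna(0)` returns a new DataFrame, so assign the result if you want to keep it. Once the `NaN`s are gone, `.astype(int)` turns the float counts back into whole numbers. You can also skip the extra step by passing `fill_value=0` to `pd.pivot_table()` when you build the table.

```python
import pandas as pd

# hypothetical rows: one per harvested gator
df = pd.DataFrame({
    'Year': [2010, 2010, 2011, 2011, 2011],
    'month': [9, 10, 9, 9, 8],
    'length_in': [124, 96, 95, 144, 110],
})

by_month_by_year = pd.pivot_table(df, index='month', columns='Year',
                                  aggfunc='count', values='length_in')

# fillna returns a new table; reassign it, then convert the float counts to ints
filled = by_month_by_year.fillna(0).astype(int)
print(filled)

# alternatively, fill empty cells while the pivot table is being built
filled_at_build = pd.pivot_table(df, index='month', columns='Year',
                                 aggfunc='count', values='length_in', fill_value=0)
print(filled_at_build)
```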
