# Gator hunt

The Florida Fish and Wildlife Conservation Commission keeps track of [gators killed by hunters](http://myfwc.com/wildlifehabitats/managed/alligator/harvest/data-export/). A cut of this data lives in `../data/gators.csv`.

Let's take a look.

In [None]:
# import pandas


In [None]:
# read in the csv


In [None]:
# check the output with `head()`


### Check it out

First, let's take a look at our data and examine some of the column values that we might be interested in analyzing. We're already starting to think about the questions we want this data to help us answer.
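Here's a rough sketch of what those first checks might look like. It runs on a tiny made-up data frame rather than the real file, and the column names `Year` and `Carcass Size` are assumptions, so swap in whatever `head()` shows you for the actual data.

```python
import pandas as pd

# Toy stand-in for the real file. The column names `Year` and `Carcass Size`
# are assumptions for illustration, not confirmed names from gators.csv.
df = pd.DataFrame({
    'Year': [2000, 2000, 2001, 2001, 2001],
    'Carcass Size': ['9 ft. 2 in.', '11 ft. 5 in.', '8 ft. 0 in.',
                     '9 ft. 2 in.', '10 ft. 1 in.'],
})

# column names, non-null counts and data types
df.info()

# number of records per year, sorted by year instead of by count
year_counts = df['Year'].value_counts().sort_index()
print(year_counts)

# the distinct carcass size strings, so we can eyeball the pattern
sizes = df['Carcass Size'].unique()
print(sizes)
```

Sorting the counts by index is optional; it just makes the year range easier to read than the default most-common-first order.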

In [None]:
# check the output with `info()`


In [None]:
# what's the year range, with counts? use `value_counts()` to find out


In [None]:
# let's also use `unique()` to get the carcass size values to see the pattern


### Come up with a list of questions

- What's the longest gator in our data?
- Average length by year?
- How many gators are killed by month?

### Write a function to calculate gator length in inches

Right now, the value for the gator's length is a string following this pattern: `{} ft. {} in.`.

Let's create a new column to get the gator's length in a constant, numeric value: inches.

We're going to write a function to do these steps:
- Given a row of data, capture the feet and inch values in the carcass size column -- we can split the string on 'ft.' and clean up each piece from there
- Multiply feet by 12
- Add that number to the inch value
- `return` the result

We'll call this function on the data frame using the [`.apply()`](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.apply.html) method, passing `axis=1` so that it receives one row at a time.
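The steps above could be sketched something like this. It's one possible version, not the only way to do it: the function name `get_inches`, the new column name `length_in` and the source column name `Carcass Size` are all assumptions, and the toy data is clean, so there's no handling for blank or oddly formatted values.

```python
import pandas as pd

# Toy data; `Carcass Size` is an assumed column name
df = pd.DataFrame({
    'Carcass Size': ['9 ft. 2 in.', '11 ft. 5 in.', '8 ft. 0 in.'],
})


def get_inches(row):
    """Convert a '{} ft. {} in.' string in a row into total inches."""
    # grab the carcass size value
    carcass_size = row['Carcass Size']

    # split on 'ft.', e.g. '9 ft. 2 in.' becomes ['9 ', ' 2 in.']
    size_split = carcass_size.split('ft.')

    # the first piece is the feet
    feet = int(size_split[0].strip())

    # the second piece is the inches, once the 'in.' text is removed
    inches = int(size_split[1].replace('in.', '').strip())

    # return the total length in inches
    return feet * 12 + inches


# axis=1 hands the function one row at a time
df['length_in'] = df.apply(get_inches, axis=1)

# sort longest first to find the biggest gator
longest_first = df.sort_values('length_in', ascending=False)
print(longest_first.head())
```

If the real column has missing or malformed sizes, `int()` will raise an error, so the function would need a guard for those rows.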

In [None]:
# define a function that accepts a row of data from our dataframe

    # grab the carcass size value

    # use `split()` to separate the feet and inches numbers

    # turn the first number in that list -- the feet -- into a number

    # turn the second number in that list -- the inches -- into a number, after removing the "in." text

    # return a constant inches value

# create a new column that applies that function to every row


👉 Learn more about functions in [this notebook](../appendix/Functions.ipynb).

In [None]:
# check the output with head()


In [None]:
# sort by length descending, check it out with head()


### Count by year

Our friend `value_counts()` is _on it_.

In [None]:
# let's do a simple count by year


### Average length by year

To get the average length of gators by year, we'll run a [pivot table](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.pivot_table.html).
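A minimal sketch of that pivot table, using made-up numbers. The column names `Year` and `length_in` are assumptions; use whatever your year column and your new inches column are actually called.

```python
import pandas as pd

# Toy data; `Year` and `length_in` are assumed column names
df = pd.DataFrame({
    'Year': [2000, 2000, 2001, 2001],
    'length_in': [110, 130, 96, 100],
})

# one row per year, with the mean of the length column as the value
avg_by_year = pd.pivot_table(df,
                             index='Year',
                             values='length_in',
                             aggfunc='mean')

print(avg_by_year)
```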

In [None]:
# run a pivot table to get average length harvested by year


In [None]:
# print that table

### Treating dates as dates

This data includes the date on which each gator was killed, but the date values are stored as strings. If we want to do some time-based analysis -- comparing the gator hunt by month, or whatever -- we'd want to work with native dates, which means converting those strings and watching out for null or blank values along the way.

Noting the format (month-day-year), let's see what happens when we use the [`to_datetime()`](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.to_datetime.html) method to convert the dates into native date objects.

In [None]:
# attempt to create a new datetime column for harvest date using the format we see


This error is telling us that some of the values in the `Harvest Date` column are spaces. Let's try again, but this time, we'll pass an _additional_ argument to the `to_datetime()` method: `errors='coerce'`. In other words, if you run into problems converting a value into a date, coerce that value into a null value.
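To see the difference, here's a small sketch on toy data with one blank value. The exact format string `%m-%d-%Y` (hyphen-separated) and the new column name `harvest_date` are assumptions; match the format to what you actually see in `Harvest Date`.

```python
import pandas as pd

# Toy data with one value that is just a space
df = pd.DataFrame({'Harvest Date': ['09-22-2000', ' ', '10-01-2001']})

# First attempt: strict parsing. With this toy data, the blank value
# should not match the format, and pandas raises a ValueError.
try:
    pd.to_datetime(df['Harvest Date'], format='%m-%d-%Y')
    strict_failed = False
except ValueError as err:
    strict_failed = True
    print('strict parse failed:', err)

# Second attempt: anything that can't be parsed becomes NaT (a null date)
df['harvest_date'] = pd.to_datetime(df['Harvest Date'],
                                    format='%m-%d-%Y',
                                    errors='coerce')

print(df)

# the new column should show up as a datetime type
print(df.dtypes)
```

Keep in mind that `errors='coerce'` quietly nulls out _every_ value it can't parse, not just the blanks, so it's worth counting the nulls afterward to make sure you haven't lost more dates than you expected.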

In [None]:
# let's do the same thing but fix that error by using `coerce`


In [None]:
# check the output with `head()`


If you want to double-check that the data type is correct, you can access the `dtypes` attribute.

In [None]:
# check the dtypes


👉 [Read more about date formatting here](https://docs.python.org/3/library/datetime.html); also, bookmark [this handy website](http://strftime.org/).

### Gator hunt by month

[According to](http://myfwc.com/media/310257/Alligator-processors.pdf) the Florida Fish and Wildlife Conservation Commission, the gator hunt season is in the fall:

![gatorhunt](../img/gatorhunt.png "gatorhunt")

Let's look at the totals by month:
- Create a new column for the month
- Do value counts by month
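Those two steps might look something like this on toy data. The column names `harvest_date` and `harvest_month` are assumptions for illustration.

```python
import pandas as pd

# Toy data, including one blank that gets coerced to a null date
df = pd.DataFrame({
    'Harvest Date': ['09-22-2000', ' ', '10-01-2001', '09-03-2001'],
})
df['harvest_date'] = pd.to_datetime(df['Harvest Date'],
                                    format='%m-%d-%Y',
                                    errors='coerce')

# grab the month from each date; null dates come through as NaN,
# which is why the months end up as floats (9.0) instead of ints
df['harvest_month'] = df['harvest_date'].apply(lambda x: x.month)

# check the unique values
print(df['harvest_month'].unique())

# count carcasses by month; value_counts() skips the nulls by default
month_counts = df['harvest_month'].value_counts()
print(month_counts)
```

The `.dt` accessor (`df['harvest_date'].dt.month`) would get you the same month values without a lambda, if you prefer.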

In [None]:
# use a lambda function to grab the month value into its own column 


In [None]:
# check the unique values


In [None]:
# use value counts to get carcasses by month for all years


What if we wanted to get a count by month _by year_? Pivot tables to the rescue, again.
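Here's a sketch of one way to set that up, with months down the side and years across the top. The column names are assumptions, and the choice of `Carcass Size` as the column to count is arbitrary: any column without nulls would do.

```python
import pandas as pd

# Toy data; `Year`, `harvest_month` and `Carcass Size` are assumed names
df = pd.DataFrame({
    'Year': [2000, 2000, 2001, 2001, 2001],
    'harvest_month': [9, 9, 9, 10, 10],
    'Carcass Size': ['9 ft. 2 in.', '11 ft. 5 in.', '8 ft. 0 in.',
                     '9 ft. 2 in.', '10 ft. 1 in.'],
})

# rows are months, columns are years, values are record counts
by_month_by_year = pd.pivot_table(df,
                                  index='harvest_month',
                                  columns='Year',
                                  values='Carcass Size',
                                  aggfunc='count')

print(by_month_by_year)
```

Any month-and-year combination with no records shows up as `NaN` rather than `0`.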

In [None]:
# use a pivot table to get carcasses by month by year


I have OCD and those `NaN`s mixed in with our numbers give me a case of the dang fantods. Let's use the `.fillna()` method to replace those with `0`.
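A quick sketch of the cleanup, using a small hand-built table as a stand-in for the month-by-year pivot (the numbers and the `harvest_month` label are made up):

```python
import numpy as np
import pandas as pd

# Hand-built stand-in for a pivot table with a gap in it
counts = pd.DataFrame(
    {2000: [2.0, np.nan], 2001: [1.0, 2.0]},
    index=pd.Index([9, 10], name='harvest_month'),
)

# replace the nulls with zeroes, then turn the floats back into whole numbers
filled = counts.fillna(0).astype(int)

print(filled)
```

The `astype(int)` step is optional; it's only there because counts read better as `2` than `2.0`. Another route is to pass `fill_value=0` to `pivot_table()` in the first place.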

In [None]:
# fill nulls with zeroes
