# [Seattle Pet Names][1]

Seattle's open data portal publishes a dataset of registered pets [here](https://data.seattle.gov/Community/Seattle-Pet-Licenses/jguv-t9rb). It doesn't include the sex or age of the animal, but it does keep the license issue date, the animal's name, species, breed, and zip code. This should open up some fun explorations!

h/t to [Jacqueline Nolis](https://twitter.com/skyetetra/status/1093737135847309312) for sharing this data!

Two articles examined the most popular pet names in 2018: one for [Seattle](https://seattle.curbed.com/2019/1/2/18165658/seattle-popular-pet-names-2018) specifically, and another for [Australia](https://www.countryliving.com/uk/wildlife/pets/a25302522/2018s-popular-pet-names-bella-luna-max/).

[1]: https://github.com/rfordatascience/tidytuesday/tree/master/data/2019/2019-03-26

## Data Dictionary

|variable           |class     |description |
|:------------------|:---------|:-----------|
|license_issue_date |date      | Date the animal was registered with Seattle |
|license_number     |numeric   | Unique license number |
|animals_name       |character | Animal's name |
|species            |character | Animal's species (dog, cat, goat, etc.) |
|primary_breed      |character | Primary breed of the animal |
|secondary_breed    |character | Secondary breed if mixed |
|zip_code           |numeric   | Zip code the animal is registered under |

## Setup

In [None]:
%matplotlib inline

import matplotlib as mpl
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

sns.set()

%config InlineBackend.figure_formats = ['retina']

## Get the data!

In [None]:
url = "https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2019/2019-03-26/seattle_pets.csv"
seattle_pets = pd.read_csv(
    url,
    parse_dates=['license_issue_date'],
    dtype={'species': 'category'})

### Data types, shape, etc.

In [None]:
seattle_pets.info(verbose=True)

There are 52,519 observations in total. A handful of values are missing in the `animals_name` and `zip_code` fields, and about half of the `secondary_breed` entries are NA.
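To put numbers on the missing values, one option is to count NAs per column and express them as a share of all rows. This is only a sketch: the `sample` frame below is made up to mimic the dataset's columns, and `na_summary` is a name chosen for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical rows that mimic the dataset's columns; not real licenses.
sample = pd.DataFrame({
    "animals_name": ["Bella", None, "Max", "Luna"],
    "species": ["Dog", "Cat", "Dog", "Cat"],
    "primary_breed": ["Poodle", "Domestic Shorthair", "Beagle", "Siamese"],
    "secondary_breed": [None, "Mix", None, None],
    "zip_code": [98101, 98115, np.nan, 98103],
})

# Count missing values per column and express them as a share of all rows.
na_summary = pd.DataFrame({
    "n_missing": sample.isna().sum(),
    "share_missing": sample.isna().mean(),
}).sort_values("n_missing", ascending=False)

print(na_summary)
```

Swapping `sample` for `seattle_pets` should give the same kind of table for the real data.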

In [None]:
# include='all' so the categorical and text columns are summarized too
seattle_pets.describe(include='all')

Looks like there are only 4 different species in total, which is why we parsed that column as a categorical. Let's see what the breakdown is between the species.

In [None]:
seattle_pets['species'].value_counts(normalize=True)

Roughly two-thirds of all registered pets are dogs and one-third are cats, plus just a handful of goats and pigs (weird). Now, let's take a look at the breed breakdown.

In [None]:
seattle_pets['primary_breed'].value_counts()[:15]
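The same counting approach works for names, which is what the articles in the intro looked at. Here is a minimal sketch of the top names per species. The `pets` frame is invented sample data, and `name_counts` / `top_names` are illustrative names, not part of the dataset.

```python
import pandas as pd

# Hypothetical sample; the real data would come from seattle_pets.
pets = pd.DataFrame({
    "animals_name": ["Lucy", "Lucy", "Max", "Luna", "Luna", "Luna", "Oliver", None],
    "species": ["Dog", "Dog", "Dog", "Cat", "Cat", "Cat", "Cat", "Dog"],
})

# Count how often each name occurs within each species (unnamed pets dropped).
name_counts = (
    pets.dropna(subset=["animals_name"])
    .groupby(["species", "animals_name"])
    .size()
    .reset_index(name="count")
)

# Sort by count within each species and keep the two most common names.
top_names = (
    name_counts.sort_values(["species", "count"], ascending=[True, False])
    .groupby("species")
    .head(2)
    .reset_index(drop=True)
)

print(top_names)
```

Sorting first and then taking `head(n)` per group keeps the result as a flat table, which is handy for plotting later.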

And, the distribution of license issue dates:

In [None]:
seattle_pets['license_issue_date'].hist(bins=30);

In [None]:
seattle_pets['license_issue_date'].dt.year.value_counts().sort_index(ascending=False)
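If yearly counts are too coarse, the same idea can be applied per month by converting the dates to monthly periods. A small sketch with made-up dates follows; `issued` stands in for `seattle_pets['license_issue_date']`.

```python
import pandas as pd

# Hypothetical issue dates standing in for the real column.
issued = pd.Series(pd.to_datetime([
    "2017-11-03", "2018-01-15", "2018-01-20",
    "2018-03-02", "2018-03-30", "2018-03-31",
]), name="license_issue_date")

# Licenses issued per calendar month, in chronological order.
per_month = issued.dt.to_period("M").value_counts().sort_index()

# Licenses issued per year, for comparison with the cell above.
per_year = issued.dt.year.value_counts().sort_index()

print(per_month)
print(per_year)
```

Months with no licenses simply don't appear in `per_month`; reindexing against a full `pd.period_range` would be one way to make those gaps explicit.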

In [None]:
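from matplotlib.ticker import StrMethodFormatter

# Share of each primary breed among cats and dogs (top 10)
ax = seattle_pets.loc[seattle_pets['species'].isin(['Cat', 'Dog']), 'primary_breed'] \
    .value_counts(normalize=True)[:10] \
    .sort_values(ascending=True) \
    .plot(kind='barh')
# Format the axis in a separate step so `ax` keeps the Axes (the chained call returns None)
ax.xaxis.set_major_formatter(StrMethodFormatter('{x:.1%}'));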

In [None]:
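# Top 3 primary breeds for cats and for dogs
seattle_pets[seattle_pets['species'].isin(['Cat', 'Dog'])] \
    .groupby(['species', 'primary_breed'], observed=True) \
    .size() \
    .reset_index(name='count') \
    .sort_values(['species', 'count'], ascending=[True, False]) \
    .groupby('species', observed=True) \
    .head(3)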

In [None]:
# Breed counts per species; observed=True skips the unused Goat/Pig categories
df = seattle_pets[seattle_pets['species'].isin(['Cat', 'Dog'])] \
    .groupby(['species', 'primary_breed'], observed=True) \
    .size() \
    .reset_index(name='count')
df

In [None]:
g = sns.FacetGrid(df, col='species')

In [None]:
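g = sns.FacetGrid(df, col='species')

# Debugging helper: print whatever FacetGrid.map hands to the plotting function
def func(*args, **kwargs):
    print(args)
    print(kwargs)
    #plt.scatter(*args, **kwargs)

# No column names are given here, so no column data is passed to func
g.map(func);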