# More Pandas

### Introduction
You have decided to start your own animal shelter, but first you want a sense of what that will entail and more information for your planning. In this lecture, we'll look at a real data set collected by the Austin Animal Center over several years, using our pandas skills from the last lecture and learning some new ones to explore it.

#### Our goals today are to be able to:

- Apply and use `.map()` and `.applymap()` from the Pandas library
- Explain what a groupby object is and split a DataFrame using `.groupby()`
- Explain lambda functions and use them on a DataFrame
- Reshape a DataFrame using joins, merges, pivoting, stacking, and melting
- Use one-hot encoding to make use of categorical variables

#### Getting started

Let's take a moment to download and examine the [Austin Animal Center data set](https://data.austintexas.gov/Health-and-Community-Services/Austin-Animal-Center-Outcomes/9t4d-g238/data). What kinds of questions can we ask of this data, and what kinds of information can we get back?

Let's take a look at the data:

In [61]:
import numpy as np
import pandas as pd
# Assumes the downloaded CSV is in the current working directory.
animals = pd.read_csv('Austin_Animal_Center_Outcomes.csv')
animals.head()

,Animal ID,Name,DateTime,MonthYear,Date of Birth,Outcome Type,Outcome Subtype,Animal Type,Sex upon Outcome,Age upon Outcome,Breed,Color
0,A789027,Lennie,02/17/2019 11:44:00 AM,02/17/2019 11:44:00 AM,02/13/2017,Adoption,,Dog,Neutered Male,2 years,Chihuahua Shorthair Mix,Cream
1,A720371,Moose,02/13/2016 05:59:00 PM,02/13/2016 05:59:00 PM,10/08/2015,Adoption,,Dog,Neutered Male,4 months,Anatol Shepherd/Labrador Retriever,Buff
2,A674754,,03/18/2014 11:47:00 AM,03/18/2014 11:47:00 AM,03/12/2014,Transfer,Partner,Cat,Intact Male,6 days,Domestic Shorthair Mix,Orange Tabby
3,A689724,*Donatello,10/18/2014 06:52:00 PM,10/18/2014 06:52:00 PM,08/01/2014,Adoption,,Cat,Neutered Male,2 months,Domestic Shorthair Mix,Black
4,A680969,*Zeus,08/05/2014 04:59:00 PM,08/05/2014 04:59:00 PM,06/03/2014,Adoption,,Cat,Neutered Male,2 months,Domestic Shorthair Mix,White/Orange Tabby


What do we notice about this dataset?

In [62]:
animals.isnull()

,Animal ID,Name,DateTime,MonthYear,Date of Birth,Outcome Type,Outcome Subtype,Animal Type,Sex upon Outcome,Age upon Outcome,Breed,Color
0,False,False,False,False,False,False,True,False,False,False,False,False
1,False,False,False,False,False,False,True,False,False,False,False,False
2,False,True,False,False,False,False,False,False,False,False,False,False
3,False,False,False,False,False,False,True,False,False,False,False,False
4,False,False,False,False,False,False,True,False,False,False,False,False
...,...,...,...,...,...,...,...,...,...,...,...,...
114913,False,True,False,False,False,False,False,False,False,False,False,False
114914,False,False,False,False,False,False,True,False,False,False,False,False
114915,False,True,False,False,False,False,True,False,False,False,False,False
114916,False,False,False,False,False,False,True,False,False,False,False,False


In [63]:
animals.isnull().sum()

Animal ID               0
Name                36075
DateTime                0
MonthYear               0
Date of Birth           0
Outcome Type            6
Outcome Subtype     62943
Animal Type             0
Sex upon Outcome        4
Age upon Outcome       27
Breed                   0
Color                   0
dtype: int64

In [64]:
animals.fillna("Unknown")

,Animal ID,Name,DateTime,MonthYear,Date of Birth,Outcome Type,Outcome Subtype,Animal Type,Sex upon Outcome,Age upon Outcome,Breed,Color
0,A789027,Lennie,02/17/2019 11:44:00 AM,02/17/2019 11:44:00 AM,02/13/2017,Adoption,Unknown,Dog,Neutered Male,2 years,Chihuahua Shorthair Mix,Cream
1,A720371,Moose,02/13/2016 05:59:00 PM,02/13/2016 05:59:00 PM,10/08/2015,Adoption,Unknown,Dog,Neutered Male,4 months,Anatol Shepherd/Labrador Retriever,Buff
2,A674754,Unknown,03/18/2014 11:47:00 AM,03/18/2014 11:47:00 AM,03/12/2014,Transfer,Partner,Cat,Intact Male,6 days,Domestic Shorthair Mix,Orange Tabby
3,A689724,*Donatello,10/18/2014 06:52:00 PM,10/18/2014 06:52:00 PM,08/01/2014,Adoption,Unknown,Cat,Neutered Male,2 months,Domestic Shorthair Mix,Black
4,A680969,*Zeus,08/05/2014 04:59:00 PM,08/05/2014 04:59:00 PM,06/03/2014,Adoption,Unknown,Cat,Neutered Male,2 months,Domestic Shorthair Mix,White/Orange Tabby
...,...,...,...,...,...,...,...,...,...,...,...,...
114913,A760365,Unknown,10/18/2017 01:27:00 PM,10/18/2017 01:27:00 PM,10/17/2016,Transfer,Partner,Cat,Intact Female,1 year,Domestic Shorthair Mix,Silver Tabby
114914,A767465,Loco,03/01/2018 06:28:00 PM,03/01/2018 06:28:00 PM,03/01/2014,Return to Owner,Unknown,Dog,Neutered Male,4 years,Chihuahua Shorthair Mix,Black/Cream
114915,A774386,Unknown,06/23/2018 11:59:00 AM,06/23/2018 11:59:00 AM,04/07/2018,Adoption,Unknown,Cat,Neutered Male,2 months,Domestic Shorthair Mix,Brown Tabby
114916,A772554,Muneca,05/21/2018 12:59:00 PM,05/21/2018 12:59:00 PM,11/01/2012,Return to Owner,Unknown,Dog,Spayed Female,5 years,Norfolk Terrier Mix,Tan


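Note that `.fillna()` returns a *new* DataFrame rather than changing `animals`. Since we didn't save the result, `animals` still has its missing values, as the next cell shows.

In [65]: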
animals.fillna(np.nan)


,Animal ID,Name,DateTime,MonthYear,Date of Birth,Outcome Type,Outcome Subtype,Animal Type,Sex upon Outcome,Age upon Outcome,Breed,Color
0,A789027,Lennie,02/17/2019 11:44:00 AM,02/17/2019 11:44:00 AM,02/13/2017,Adoption,,Dog,Neutered Male,2 years,Chihuahua Shorthair Mix,Cream
1,A720371,Moose,02/13/2016 05:59:00 PM,02/13/2016 05:59:00 PM,10/08/2015,Adoption,,Dog,Neutered Male,4 months,Anatol Shepherd/Labrador Retriever,Buff
2,A674754,,03/18/2014 11:47:00 AM,03/18/2014 11:47:00 AM,03/12/2014,Transfer,Partner,Cat,Intact Male,6 days,Domestic Shorthair Mix,Orange Tabby
3,A689724,*Donatello,10/18/2014 06:52:00 PM,10/18/2014 06:52:00 PM,08/01/2014,Adoption,,Cat,Neutered Male,2 months,Domestic Shorthair Mix,Black
4,A680969,*Zeus,08/05/2014 04:59:00 PM,08/05/2014 04:59:00 PM,06/03/2014,Adoption,,Cat,Neutered Male,2 months,Domestic Shorthair Mix,White/Orange Tabby
...,...,...,...,...,...,...,...,...,...,...,...,...
114913,A760365,,10/18/2017 01:27:00 PM,10/18/2017 01:27:00 PM,10/17/2016,Transfer,Partner,Cat,Intact Female,1 year,Domestic Shorthair Mix,Silver Tabby
114914,A767465,Loco,03/01/2018 06:28:00 PM,03/01/2018 06:28:00 PM,03/01/2014,Return to Owner,,Dog,Neutered Male,4 years,Chihuahua Shorthair Mix,Black/Cream
114915,A774386,,06/23/2018 11:59:00 AM,06/23/2018 11:59:00 AM,04/07/2018,Adoption,,Cat,Neutered Male,2 months,Domestic Shorthair Mix,Brown Tabby
114916,A772554,Muneca,05/21/2018 12:59:00 PM,05/21/2018 12:59:00 PM,11/01/2012,Return to Owner,,Dog,Spayed Female,5 years,Norfolk Terrier Mix,Tan
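
`.fillna()` also accepts a dictionary that maps column names to fill values, so each column can get its own placeholder. Here is a minimal sketch on a small hypothetical DataFrame (the label 'None recorded' is made up for illustration); it also confirms that the original frame keeps its missing values.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with gaps in two of the shelter's columns.
toy = pd.DataFrame({
    'Name': ['Lennie', np.nan, 'Moose'],
    'Outcome Subtype': [np.nan, 'Partner', np.nan],
})

# One fill value per column; the result is a new DataFrame.
filled = toy.fillna({'Name': 'Unknown', 'Outcome Subtype': 'None recorded'})

print(toy.isnull().sum().sum())     # 3: the original still has its NaNs
print(filled.isnull().sum().sum())  # 0
print(filled['Name'].tolist())      # ['Lennie', 'Unknown', 'Moose']
```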


### 1. Applying and using map and applymap from the Pandas library

The Pandas library has several useful tools built in. Let's explore some of them.

#### DataFrame.applymap() and Series.map()

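The `.applymap()` method takes a function as input and applies it to every entry in the DataFrame. (In pandas 2.1 and later, `.applymap()` is deprecated in favor of `DataFrame.map()`, which behaves the same way.)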

In [66]:
animals.applymap(str).head()

,Animal ID,Name,DateTime,MonthYear,Date of Birth,Outcome Type,Outcome Subtype,Animal Type,Sex upon Outcome,Age upon Outcome,Breed,Color
0,A789027,Lennie,02/17/2019 11:44:00 AM,02/17/2019 11:44:00 AM,02/13/2017,Adoption,,Dog,Neutered Male,2 years,Chihuahua Shorthair Mix,Cream
1,A720371,Moose,02/13/2016 05:59:00 PM,02/13/2016 05:59:00 PM,10/08/2015,Adoption,,Dog,Neutered Male,4 months,Anatol Shepherd/Labrador Retriever,Buff
2,A674754,,03/18/2014 11:47:00 AM,03/18/2014 11:47:00 AM,03/12/2014,Transfer,Partner,Cat,Intact Male,6 days,Domestic Shorthair Mix,Orange Tabby
3,A689724,*Donatello,10/18/2014 06:52:00 PM,10/18/2014 06:52:00 PM,08/01/2014,Adoption,,Cat,Neutered Male,2 months,Domestic Shorthair Mix,Black
4,A680969,*Zeus,08/05/2014 04:59:00 PM,08/05/2014 04:59:00 PM,06/03/2014,Adoption,,Cat,Neutered Male,2 months,Domestic Shorthair Mix,White/Orange Tabby
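
To see the element-wise behavior on something small, here is a sketch using a hypothetical two-column DataFrame. It picks `DataFrame.map` or `applymap` depending on which one the installed pandas provides, and it shows a side effect worth remembering: `str` turns a missing value into the text `'nan'`.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1, 2], 'b': [3.5, np.nan]})

# Element-wise application: str() is called on every cell.
# pandas >= 2.1 calls this DataFrame.map; older versions only have applymap.
if hasattr(pd.DataFrame, 'map'):
    as_text = df.map(str)
else:
    as_text = df.applymap(str)

print(as_text.loc[0, 'a'])  # '1'
print(as_text.loc[1, 'b'])  # 'nan': the missing value became text
```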


The `.map()` method does the same for a Series: it takes a function (or a dictionary of replacements) and applies it to every entry in the Series.

In [67]:
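# Split each ID on 'A' and store the two pieces as new columns.
# Every ID starts with 'A', so the first piece is an empty string:
# 'Animal ID Prefix' comes out blank, as the output below shows.
animals[['Animal ID Prefix', 'Animal ID Num']] = (
    animals['Animal ID'].str.split('A', expand=True)
)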

In [68]:
animals.head()

,Animal ID,Name,DateTime,MonthYear,Date of Birth,Outcome Type,Outcome Subtype,Animal Type,Sex upon Outcome,Age upon Outcome,Breed,Color,Animal ID Prefix,Animal ID Num
0,A789027,Lennie,02/17/2019 11:44:00 AM,02/17/2019 11:44:00 AM,02/13/2017,Adoption,,Dog,Neutered Male,2 years,Chihuahua Shorthair Mix,Cream,,789027
1,A720371,Moose,02/13/2016 05:59:00 PM,02/13/2016 05:59:00 PM,10/08/2015,Adoption,,Dog,Neutered Male,4 months,Anatol Shepherd/Labrador Retriever,Buff,,720371
2,A674754,,03/18/2014 11:47:00 AM,03/18/2014 11:47:00 AM,03/12/2014,Transfer,Partner,Cat,Intact Male,6 days,Domestic Shorthair Mix,Orange Tabby,,674754
3,A689724,*Donatello,10/18/2014 06:52:00 PM,10/18/2014 06:52:00 PM,08/01/2014,Adoption,,Cat,Neutered Male,2 months,Domestic Shorthair Mix,Black,,689724
4,A680969,*Zeus,08/05/2014 04:59:00 PM,08/05/2014 04:59:00 PM,06/03/2014,Adoption,,Cat,Neutered Male,2 months,Domestic Shorthair Mix,White/Orange Tabby,,680969
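
Because every ID starts with 'A', splitting on 'A' leaves nothing in front of it, which is why `Animal ID Prefix` is empty above. Here is a sketch of two alternatives on a few IDs from the table: slicing by position with the `.str` accessor, or `str.extract` with a regular expression. The pattern assumes IDs are always letters followed by digits, an assumption that holds for the rows shown here.

```python
import pandas as pd

ids = pd.Series(['A789027', 'A720371', 'A674754'])

# Splitting on 'A' yields an empty first piece, since 'A' is the first character.
pieces = ids.str.split('A', expand=True)
print(pieces.iloc[0].tolist())  # ['', '789027']

# Slicing by position keeps the letter and gives the digits as integers.
prefix = ids.str[0]
num = ids.str[1:].astype(int)

# Alternative: a regex with one group for leading letters, one for digits.
parts = ids.str.extract(r'^([A-Za-z]+)(\d+)$')
print(parts)
```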


In [69]:
# Now: How can we convert the Animal ID Num column to integers?

animals['Animal ID Num'] = animals['Animal ID Num'].map(int)

Or we could have just used the `.astype()` method:

In [70]:
animals['Animal ID Num'] = animals['Animal ID Num'].astype(int)

#### Anonymous Functions (Lambda Abstraction)

Simple functions can be defined right in the function call. This is called 'lambda abstraction'; the function thus defined has no name and hence is "anonymous".
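
A lambda is just a shorter way to write a one-expression function. This sketch passes the same doubling logic to `.map()` both ways (the function name `double` is ours, for illustration), and also shows the vectorized `nums * 2`, which gives the same result and is usually the faster choice for simple arithmetic.

```python
import pandas as pd

nums = pd.Series([789027, 720371])

def double(x):
    """Named function: return twice its input."""
    return x * 2

named = nums.map(double)               # pass a named function
anonymous = nums.map(lambda x: x * 2)  # same logic, no name needed

# For plain arithmetic, the vectorized form skips the Python-level loop.
vectorized = nums * 2
print(anonymous.tolist())  # [1578054, 1440742]
```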

In [71]:
animals['Animal ID Num'].map(lambda x: x*2)[:4]

0    1578054
1    1440742
2    1349508
3    1379448
Name: Animal ID Num, dtype: int64

**Exercise: Use an anonymous function to add 'approximately' in front of the entries in Age upon Outcome**

In [72]:
# Your code here!
animals['Age upon Outcome'] = animals['Age upon Outcome'].map(lambda x: 'Approximately ' + str(x))
animals.head()

,Animal ID,Name,DateTime,MonthYear,Date of Birth,Outcome Type,Outcome Subtype,Animal Type,Sex upon Outcome,Age upon Outcome,Breed,Color,Animal ID Prefix,Animal ID Num
0,A789027,Lennie,02/17/2019 11:44:00 AM,02/17/2019 11:44:00 AM,02/13/2017,Adoption,,Dog,Neutered Male,Approximately 2 years,Chihuahua Shorthair Mix,Cream,,789027
1,A720371,Moose,02/13/2016 05:59:00 PM,02/13/2016 05:59:00 PM,10/08/2015,Adoption,,Dog,Neutered Male,Approximately 4 months,Anatol Shepherd/Labrador Retriever,Buff,,720371
2,A674754,,03/18/2014 11:47:00 AM,03/18/2014 11:47:00 AM,03/12/2014,Transfer,Partner,Cat,Intact Male,Approximately 6 days,Domestic Shorthair Mix,Orange Tabby,,674754
3,A689724,*Donatello,10/18/2014 06:52:00 PM,10/18/2014 06:52:00 PM,08/01/2014,Adoption,,Cat,Neutered Male,Approximately 2 months,Domestic Shorthair Mix,Black,,689724
4,A680969,*Zeus,08/05/2014 04:59:00 PM,08/05/2014 04:59:00 PM,06/03/2014,Adoption,,Cat,Neutered Male,Approximately 2 months,Domestic Shorthair Mix,White/Orange Tabby,,680969


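What went wrong? `Age upon Outcome` has 27 missing values (see `animals.isnull().sum()` above). A missing value is a float NaN, so the expression `'Approximately ' + x` raises a `TypeError` for those rows unless `x` is cast to a string first. Casting with `str()` avoids the error, but every missing age becomes the text 'Approximately nan' (the goat in the Livestock output below is one example). How can we fix it?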

In [73]:
#cast string
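
One fix is to skip missing values instead of converting them: `Series.map()` has an `na_action='ignore'` option that leaves NaN entries untouched. A sketch on a small hypothetical Series (it is meant for a fresh, un-prefixed column; applying it to `animals['Age upon Outcome']` now would add a second 'Approximately'):

```python
import numpy as np
import pandas as pd

ages = pd.Series(['2 years', np.nan, '6 days'])

# str(x) avoids the TypeError, but the missing age becomes the text 'nan'.
with_str = ages.map(lambda x: 'Approximately ' + str(x))

# na_action='ignore' passes only non-missing values to the function,
# so the gap stays a real NaN.
skip_missing = ages.map(lambda x: 'Approximately ' + x, na_action='ignore')
print(skip_missing.tolist())
```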

### 2. Methods for Re-Organizing DataFrames: .groupby()

Those of you familiar with SQL have probably used the GROUP BY command. (And if you haven't, you'll see it very soon!) Pandas has this, too.

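The `.groupby()` method splits a DataFrame into groups according to the values in one or more columns, which makes it especially useful for applying aggregate functions (counts, means, maxima) to each group. The result of calling it is a *groupby object*: it records how the rows are split, but nothing is computed until we ask for something, such as an aggregate.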

In [74]:
a = animals.groupby('Animal Type')
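
Creating `a` above computes nothing by itself. The split-apply-combine pattern is easier to see on a small hypothetical DataFrame (the `Age (years)` column is made up for illustration):

```python
import pandas as pd

toy = pd.DataFrame({
    'Animal Type': ['Dog', 'Cat', 'Dog', 'Cat', 'Bird'],
    'Age (years)': [2, 1, 4, 3, 1],
})

grouped = toy.groupby('Animal Type')  # a DataFrameGroupBy; nothing computed yet
print(type(grouped).__name__)         # DataFrameGroupBy

# Split-apply-combine: one mean (and one row count) per group.
mean_age = grouped['Age (years)'].mean()
counts = grouped.size()
print(mean_age)
print(counts)
```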

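#### .groups and .get_group()

The `.groups` attribute is a dictionary mapping each group label to the index labels of the rows in that group; `.get_group()` returns the rows of a single group as a DataFrame.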

In [75]:
animals.groupby('Animal Type').groups

{'Bird': Int64Index([   142,    350,    546,   1414,   1474,   1645,   1994,   2017,
               2297,   2358,
             ...
             113276, 113396, 113529, 113859, 114057, 114328, 114611, 114661,
             114759, 114761],
            dtype='int64', length=539),
 'Cat': Int64Index([     2,      3,      4,      5,      6,      7,      8,     11,
                 16,     23,
             ...
             114888, 114891, 114893, 114899, 114904, 114905, 114906, 114913,
             114915, 114917],
            dtype='int64', length=43294),
 'Dog': Int64Index([     0,      1,      9,     12,     13,     14,     15,     17,
                 18,     19,
             ...
             114900, 114901, 114902, 114903, 114908, 114910, 114911, 114912,
             114914, 114916],
            dtype='int64', length=65186),
 'Livestock': Int64Index([   524,   1515,  20531,  24586,  25785,  49127,  49838,  53593,
              60238,  64375,  75397,  79772,  80123,  83588,  90862,  9112

In [76]:
animals.groupby('Animal Type').get_group('Livestock')

,Animal ID,Name,DateTime,MonthYear,Date of Birth,Outcome Type,Outcome Subtype,Animal Type,Sex upon Outcome,Age upon Outcome,Breed,Color,Animal ID Prefix,Animal ID Num
524,A795191,Loki,05/18/2019 03:37:00 PM,05/18/2019 03:37:00 PM,11/17/2018,Return to Owner,,Livestock,Intact Male,Approximately 5 months,Pig,White,,795191
1515,A668167,,11/30/2013 12:18:00 PM,11/30/2013 12:18:00 PM,05/28/2013,Return to Owner,,Livestock,Intact Female,Approximately 6 months,Pig Mix,Black/White,,668167
20531,A673651,,03/11/2014 02:39:00 PM,03/11/2014 02:39:00 PM,02/28/2013,Adoption,Foster,Livestock,Neutered Male,Approximately 1 year,Pig Mix,Black/White,,673651
24586,A718910,,01/27/2016 12:00:00 AM,01/27/2016 12:00:00 AM,01/09/2015,Transfer,Partner,Livestock,Intact Male,Approximately 1 year,Pig Mix,White,,718910
25785,A803469,,09/08/2019 08:00:00 AM,09/08/2019 08:00:00 AM,09/01/2017,Return to Owner,,Livestock,Intact Female,Approximately 2 years,Pygmy,Tan/Black,,803469
49127,A811675,,01/08/2020 09:41:00 AM,01/08/2020 09:41:00 AM,01/07/2018,,,Livestock,Intact Female,Approximately nan,Goat,Black/White,,811675
49838,A701250,,05/11/2015 12:00:00 AM,05/11/2015 12:00:00 AM,04/26/2013,Transfer,Partner,Livestock,Intact Female,Approximately 2 years,Pig Mix,Pink,,701250
53593,A674214,,03/29/2014 02:00:00 PM,03/29/2014 02:00:00 PM,02/22/2014,Adoption,Foster,Livestock,Unknown,Approximately 5 weeks,Pig Mix,Black,,674214
60238,A715047,,12/07/2015 12:00:00 AM,12/07/2015 12:00:00 AM,10/30/2014,Transfer,Partner,Livestock,Unknown,Approximately 1 year,Goat Mix,Brown,,715047
64375,A663228,,10/03/2013 10:59:00 AM,10/03/2013 10:59:00 AM,09/15/2008,Transfer,Partner,Livestock,Intact Male,Approximately 5 years,Miniature,Liver/Cream,,663228
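
The same two tools on a three-row hypothetical DataFrame, where the output is small enough to read in full:

```python
import pandas as pd

toy = pd.DataFrame({
    'Animal Type': ['Dog', 'Cat', 'Dog'],
    'Name': ['Lennie', 'Zeus', 'Moose'],
})
g = toy.groupby('Animal Type')

# .groups: group label -> index labels of that group's rows.
groups = {label: [int(i) for i in idx] for label, idx in g.groups.items()}
print(groups)  # {'Cat': [1], 'Dog': [0, 2]}

# .get_group(): the rows of one group, as a DataFrame.
dogs = g.get_group('Dog')
print(dogs['Name'].tolist())  # ['Lennie', 'Moose']
```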


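#### Aggregating

Calling an aggregate method such as `.mean()`, `.max()`, or `.std()` on a groupby object computes it separately for each group. Here we take the standard deviation of our only numeric column, `Animal ID Num`. (The spread of ID numbers isn't a meaningful statistic, but it shows the mechanics.) Selecting the column explicitly matters in newer versions of pandas, where trying to aggregate the text columns too can raise a `TypeError`.

In [78]:
animals.groupby('Animal Type')[['Animal ID Num']].std()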

Animal Type,Animal ID Num
Bird,46031.824323
Cat,49467.991669
Dog,59539.292821
Livestock,52471.760994
Other,42036.699046
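
`.agg()` applies several aggregate functions in one pass. A sketch with a made-up numeric column (`Days in Shelter` is not part of the real data set), since ID numbers aren't meaningful to summarize:

```python
import pandas as pd

toy = pd.DataFrame({
    'Animal Type': ['Dog', 'Cat', 'Dog', 'Cat'],
    'Days in Shelter': [3, 10, 5, 20],
})

# Several aggregates at once: one row per group, one column per function.
summary = toy.groupby('Animal Type')['Days in Shelter'].agg(['count', 'mean', 'max'])
print(summary)
```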


#### Datetime Objects

Pandas stores dates and times in a special `datetime64` data type, which supports comparisons and date arithmetic. We can convert a column of appropriately formatted date strings to this type simply by calling `pd.to_datetime()`.

In [87]:
animals['Date of Birth'] = pd.to_datetime(animals['Date of Birth'])

**Exercise: Find the latest date of birth per animal type.**

In [89]:
# Date of Birth was converted to datetime above, so .max() compares
# actual dates rather than strings. Group by Animal Type and take the max.
animals.groupby('Animal Type')['Date of Birth'].max()



Animal Type
Bird        2019-11-02
Cat         2020-01-09
Dog         2020-01-04
Livestock   2018-11-17
Other       2019-12-21
Name: Date of Birth, dtype: datetime64[ns]
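
Converted dates support arithmetic. As a sketch, here are two rows from the table above (Lennie and Moose), with the outcome time converted too; subtracting gives each animal's age at outcome in days, a more precise alternative to the text in `Age upon Outcome`. The format strings assume month/day/year, which matches the rows shown, and the new column names are our own.

```python
import pandas as pd

toy = pd.DataFrame({
    'DateTime': ['02/17/2019 11:44:00 AM', '02/13/2016 05:59:00 PM'],
    'Date of Birth': ['02/13/2017', '10/08/2015'],
})

# An explicit format string removes any ambiguity about month/day order.
toy['DateTime'] = pd.to_datetime(toy['DateTime'], format='%m/%d/%Y %I:%M:%S %p')
toy['Date of Birth'] = pd.to_datetime(toy['Date of Birth'], format='%m/%d/%Y')

# Subtracting datetimes gives Timedeltas; .dt.days keeps the whole days.
toy['Age in Days'] = (toy['DateTime'] - toy['Date of Birth']).dt.days

# The .dt accessor also exposes calendar parts such as the year.
toy['Birth Year'] = toy['Date of Birth'].dt.year
print(toy[['Age in Days', 'Birth Year']])
```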

### 3. Reshaping a DataFrame

#### .pivot()

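Those of you familiar with Excel have probably used pivot tables. Pandas has similar functionality. `.pivot()` reshapes a DataFrame: the unique values of the `columns` column become new columns, the `values` column fills in the cells, and the rows come from `index` (the existing index, if none is given). Because we keep the original index here, each row has a value under only one animal type. Note that `.pivot()` does no aggregation and raises an error if an index/column pair occurs more than once; `pd.pivot_table()`, which aggregates duplicates, is the closer match to an Excel pivot table.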

In [90]:
animals.pivot(values='Age upon Outcome', columns='Animal Type').head(10)

Animal Type,Bird,Cat,Dog,Livestock,Other
0,,,Approximately 2 years,,
1,,,Approximately 4 months,,
2,,Approximately 6 days,,,
3,,Approximately 2 months,,,
4,,Approximately 2 months,,,
5,,Approximately 2 years,,,
6,,Approximately 7 years,,,
7,,Approximately 2 days,,,
8,,Approximately 9 months,,,
9,,,Approximately 2 years,,
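
A sketch of the difference between `.pivot()` and `pd.pivot_table()` on hypothetical outcomes (the IDs are made up): two dogs share the (Dog, Adoption) pair, so `.pivot()` fails, while `pivot_table` counts them.

```python
import pandas as pd

toy = pd.DataFrame({
    'Animal ID': ['A1', 'A2', 'A3', 'A4', 'A5'],
    'Animal Type': ['Dog', 'Cat', 'Dog', 'Cat', 'Dog'],
    'Outcome Type': ['Adoption', 'Transfer', 'Adoption', 'Adoption', 'Transfer'],
})

# .pivot() cannot place two values in the same (Dog, Adoption) cell.
try:
    toy.pivot(index='Animal Type', columns='Outcome Type', values='Animal ID')
    pivot_failed = False
except ValueError:
    pivot_failed = True

# pivot_table aggregates the duplicates; here it counts animals per cell.
counts = pd.pivot_table(toy, index='Animal Type', columns='Outcome Type',
                        values='Animal ID', aggfunc='count', fill_value=0)
print(counts)
```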


### 4. Methods for Combining DataFrames: .join(), .merge(), .concat(), .melt()

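#### .join()

`.join()` combines two DataFrames on their indexes, performing a left join by default. Here we first move the shared `age` column into the index of each DataFrame.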

In [91]:
toy1 = pd.DataFrame([[63, 142], [33, 47]], columns=['age', 'HP'])
toy2 = pd.DataFrame([[63, 100], [33, 200]], columns=['age', 'MP'])

In [92]:
toy1

,age,HP
0,63,142
1,33,47


In [93]:
toy2

,age,MP
0,63,100
1,33,200


In [100]:
toy1.set_index('age').join(toy2.set_index('age'))

age,HP,MP
63,142,100
33,47,200
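
The join type only matters when the indexes don't match perfectly. A sketch with a hypothetical `toy3` in which age 33 is missing and age 25 has no partner in `toy1`:

```python
import pandas as pd

toy1 = pd.DataFrame([[63, 142], [33, 47]], columns=['age', 'HP'])
# Hypothetical variant of toy2: no row for age 33, an extra row for age 25.
toy3 = pd.DataFrame([[63, 100], [25, 300]], columns=['age', 'MP'])

left = toy1.set_index('age').join(toy3.set_index('age'))                # how='left' (default)
inner = toy1.set_index('age').join(toy3.set_index('age'), how='inner')  # matches only
outer = toy1.set_index('age').join(toy3.set_index('age'), how='outer')  # every age
print(left)
```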


For more on this method, check out the [doc](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.join.html)!

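#### .merge()

`.merge()` combines DataFrames on the values in their columns, much like a SQL JOIN. When the key columns have different names, `left_on` and `right_on` specify them, and `how` chooses the join type (`'inner'`, `'left'`, `'right'`, or `'outer'`; the default is `'inner'`).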

In [109]:
ds_chars = pd.read_csv('../pandas2_seattle-ds/ds_chars.csv', index_col=0)
ds_chars

,name,HP,home_state
0,greg,200,WA
1,miles,200,WA
2,alan,170,TX
3,alison,300,DC
4,rachel,200,TX


In [110]:
states = pd.read_csv('../pandas2_seattle-ds/states.csv', index_col=0)
states

,state,nickname,capital
0,WA,evergreen,Olympia
1,TX,alamo,Austin
2,DC,district,Washington
3,OH,buckeye,Columbus
4,OR,beaver,Salem


In [111]:
ds_chars.merge(states, left_on='home_state', right_on='state', how='inner')

,name,HP,home_state,state,nickname,capital
0,greg,200,WA,WA,evergreen,Olympia
1,miles,200,WA,WA,evergreen,Olympia
2,alan,170,TX,TX,alamo,Austin
3,rachel,200,TX,TX,alamo,Austin
4,alison,300,DC,DC,district,Washington
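
With `how='inner'`, Ohio and Oregon silently disappear because no character lives there. A sketch that recreates the two tables shown above inline (so it runs without the CSV files) and uses `how='right'` with `indicator=True` to see which rows matched:

```python
import pandas as pd

ds_chars = pd.DataFrame({
    'name': ['greg', 'miles', 'alan', 'alison', 'rachel'],
    'HP': [200, 200, 170, 300, 200],
    'home_state': ['WA', 'WA', 'TX', 'DC', 'TX'],
})
states = pd.DataFrame({
    'state': ['WA', 'TX', 'DC', 'OH', 'OR'],
    'nickname': ['evergreen', 'alamo', 'district', 'buckeye', 'beaver'],
    'capital': ['Olympia', 'Austin', 'Washington', 'Columbus', 'Salem'],
})

# how='right' keeps every state; indicator=True adds a _merge column
# saying whether each row matched ('both') or came from one side only.
all_states = ds_chars.merge(states, left_on='home_state', right_on='state',
                            how='right', indicator=True)
unmatched = all_states.loc[all_states['_merge'] == 'right_only', 'state']
print(unmatched.tolist())
```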


#### pd.concat()

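This function takes a *list* of pandas objects as its first argument.

N.B. In older versions of pandas, the cell below produces a `FutureWarning`. The two DataFrames have different columns, so pandas sorts the combined column labels (hence the alphabetical column order in the output) and warns that a future version will stop sorting by default; passing `sort=False` accepts the new behavior. Current versions no longer sort by default.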

In [112]:
ds_full = pd.concat([ds_chars, states])
ds_full

... of pandas will change to not sort by default. To accept the future behavior, pass 'sort=False'.


,HP,capital,home_state,name,nickname,state
0,200.0,,WA,greg,,
1,200.0,,WA,miles,,
2,170.0,,TX,alan,,
3,300.0,,DC,alison,,
4,200.0,,TX,rachel,,
0,,Olympia,,,evergreen,WA
1,,Austin,,,alamo,TX
2,,Washington,,,district,DC
3,,Columbus,,,buckeye,OH
4,,Salem,,,beaver,OR


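Like many other pandas operations, `pd.concat()` takes an `axis` parameter. Here it specifies whether to concatenate the DataFrames *row-wise* (`axis=0`, stacking rows) or *column-wise* (`axis=1`, placing them side by side). The default is `axis=0`, so let's override that! Keep in mind that `axis=1` lines rows up by index label, not by any shared key: in the output, miles (WA) sits next to TX only because both rows have index 1. To match characters to their home states, `.merge()` is the right tool.

In [114]:
ds_full = pd.concat([ds_chars, states], axis=1)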
ds_full

,name,HP,home_state,state,nickname,capital
0,greg,200,WA,WA,evergreen,Olympia
1,miles,200,WA,TX,alamo,Austin
2,alan,170,TX,DC,district,Washington
3,alison,300,DC,OH,buckeye,Columbus
4,rachel,200,TX,OR,beaver,Salem
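
The index alignment is easiest to see when the index labels only partly overlap. A sketch with two hypothetical frames:

```python
import pandas as pd

chars = pd.DataFrame({'name': ['greg', 'alan']}, index=[0, 1])
places = pd.DataFrame({'state': ['TX', 'WA']}, index=[1, 2])

# axis=1 pairs rows by index label; labels missing from one side get NaN.
side_by_side = pd.concat([chars, places], axis=1)
print(side_by_side)
```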


#### pd.melt()

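`pd.melt()` "unpivots" a DataFrame from wide to long format: each cell becomes its own row, with the original column name in a `variable` column and the cell's contents in a `value` column. Any columns passed as `id_vars` are kept as identifiers instead of being melted.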

In [115]:
pd.melt(ds_full)

,variable,value
0,name,greg
1,name,miles
2,name,alan
3,name,alison
4,name,rachel
5,HP,200
6,HP,200
7,HP,170
8,HP,300
9,HP,200
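
Melting everything, as above, loses track of which character each value belongs to. A sketch with `id_vars` on a small hypothetical frame (the `MP` column is made up), followed by `.pivot()` to reverse the melt:

```python
import pandas as pd

wide = pd.DataFrame({
    'name': ['greg', 'alan'],
    'HP': [200, 170],
    'MP': [100, 50],
})

# id_vars stays as an identifier; HP and MP are stacked into variable/value.
long = pd.melt(wide, id_vars='name', value_vars=['HP', 'MP'])
print(long)

# .pivot() reverses the melt (rows come back sorted by name).
back = long.pivot(index='name', columns='variable', values='value')
print(back)
```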


### 5. Making Use of Categories: One-Hot Encoding

Pandas has a one-hot encoder called `get_dummies()`, which is good for exploratory data analysis (EDA).

This might be good to use if we're in the **data-understanding** stage (Stage 2) of our CRISP-DM process.

We can call it on a DataFrame as a whole or on a Series (column).

In [116]:
# dtype=int gives the 0/1 columns shown below; pandas >= 2.0 defaults to True/False.
pd.get_dummies(animals['Animal Type'], dtype=int)

,Bird,Cat,Dog,Livestock,Other
0,0,0,1,0,0
1,0,0,1,0,0
2,0,1,0,0,0
3,0,1,0,0,0
4,0,1,0,0,0
...,...,...,...,...,...
114913,0,1,0,0,0
114914,0,0,1,0,0
114915,0,1,0,0,0
114916,0,0,1,0,0


If, however, we're in a later stage of the process and are preparing, say, a data pipeline, `pandas.get_dummies()` falls short: it builds columns only from the categories present in whatever data it is given, so encoding new data can produce a different set of columns.

In practice, then, we will **not** use `pandas.get_dummies()` for modeling. The Scikit-Learn library (`sklearn`, included with your Anaconda installation) has a `OneHotEncoder` class whose fitted objects persist: an encoder remembers the categories it learned and applies the same encoding to any new data. This makes it much better suited to production environments, so it's good to get in the habit of using it.

Ultimately, we will use **many** tools from sklearn.
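
A sketch of the problem and the fix, using hypothetical "training" data and later "new" data that includes a category the training data never had:

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

train = pd.DataFrame({'Animal Type': ['Dog', 'Cat', 'Bird']})
new = pd.DataFrame({'Animal Type': ['Dog', 'Livestock']})

# get_dummies builds columns from whatever data it is handed.
print(list(pd.get_dummies(train['Animal Type']).columns))  # ['Bird', 'Cat', 'Dog']
print(list(pd.get_dummies(new['Animal Type']).columns))    # ['Dog', 'Livestock']

# A fitted encoder reuses the categories it learned from `train`;
# handle_unknown='ignore' encodes an unseen category as all zeros.
ohe = OneHotEncoder(handle_unknown='ignore')
ohe.fit(train)
encoded = ohe.transform(new).toarray()
print(encoded)
```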

In [117]:
from sklearn.preprocessing import OneHotEncoder

In [118]:
ohe = OneHotEncoder()

In [119]:
ohe.fit(animals[['Animal Type']])

OneHotEncoder(categorical_features=None, categories=None, drop=None,
              dtype=<class 'numpy.float64'>, handle_unknown='error',
              n_values=None, sparse=True)

Now that the `OneHotEncoder` has been fitted to our data, it has newly available attributes and methods. In particular, it has access to the different categories that we're replacing:

In [120]:
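# get_feature_names() was removed in scikit-learn 1.2;
# in scikit-learn >= 1.0, use ohe.get_feature_names_out() instead.
ohe.get_feature_names()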

array(['x0_Bird', 'x0_Cat', 'x0_Dog', 'x0_Livestock', 'x0_Other'],
      dtype=object)

We'll have much more to say about `sklearn` syntax and about Python's object structure. But let's now transform our data to see what the new table looks like:

In [121]:
ohe.transform(animals[['Animal Type']])

<114918x5 sparse matrix of type '<class 'numpy.float64'>'
	with 114918 stored elements in Compressed Sparse Row format>

To save storage space, the result is a **sparse matrix**, which stores only the nonzero entries, but we can "re-inflate" it if we want to see it in tabular form:

In [122]:
types_encoded = ohe.transform(animals[['Animal Type']]).todense()
types_encoded

matrix([[0., 0., 1., 0., 0.],
        [0., 0., 1., 0., 0.],
        [0., 1., 0., 0., 0.],
        ...,
        [0., 1., 0., 0., 0.],
        [0., 0., 1., 0., 0.],
        [0., 1., 0., 0., 0.]])

Let's put it into a DataFrame:

In [124]:
# In scikit-learn >= 1.0, use ohe.get_feature_names_out() here instead.
types_df = pd.DataFrame(types_encoded, columns=ohe.get_feature_names())
types_df.head()