# Lesson 4: Manipulating DataFrames with pandas

These notes are based on the sequence that concepts are introduced in the DataCamp lesson of the same name. They are most useful as a reference, after you've completed the DataCamp lesson. They are not intended to replace that lesson.

[View this lesson on DataCamp](https://learn.datacamp.com/courses/manipulating-dataframes-with-pandas)

## Chapter 1

### Indexing & Slicing DataFrames 

| Concept/Command | Operation |
|----------|--------------------------------------------------------------|
| `df['column']['row']`      |  selects a single value from the DataFrame (df); note the square brackets around each label    |
| `.loc`      |  · uses labels: `['row', 'column']` | 
|  | · Can use labels for slicing; the right end point is inclusive |
|  | · Can also use lists instead of slices |
|  | · `['b':'a':-1]` slices from 'b' back to 'a', in reverse order     |
| `.iloc`      |  · uses integer positions | 
|  | · `[1:5]` first number is inclusive, second number is exclusive |
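
The slicing rules in the table can be sketched on a tiny DataFrame. The frame `df` and its labels are made up for this illustration and are not part of the DataCamp lesson:

```python
import pandas as pd

# Small hypothetical frame used only for this illustration.
df = pd.DataFrame({'x': [10, 20, 30, 40], 'y': [1, 2, 3, 4]},
                  index=['a', 'b', 'c', 'd'])

# Label slice with .loc: both end points are included.
loc_slice = df.loc['a':'c', 'x']          # rows a, b, c

# Position slice with .iloc: the stop position is excluded.
iloc_slice = df.iloc[0:2, 0]              # rows at positions 0 and 1

# Reverse label slice: start at 'c' and step backwards to 'a'.
reversed_slice = df.loc['c':'a':-1, 'x']  # rows c, b, a

# A list of labels selects exactly those rows and columns, in the order given.
picked = df.loc[['d', 'a'], ['y', 'x']]

print(loc_slice.tolist())        # [10, 20, 30]
print(iloc_slice.tolist())       # [10, 20]
print(reversed_slice.tolist())   # [30, 20, 10]
```

Note that `.loc['a':'c']` returns three rows while `.iloc[0:2]` returns two, even though both start at the first row.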

### Practice:

Import `numpy` as `np`, and `pandas` as `pd`:

In [3]:
import numpy as np
import pandas as pd

The DataFrame below contains the mean number of hugs, kisses, and barks, per week over the last two months, from five puppies:

In [4]:
# Run this cell as is to define the DataFrame "puppies" 

puppies = pd.DataFrame({'hugs': [5, 8, 2, 10, 8],
                      'kisses': [7, 9, 4, 13, 8],
                      'barks': [21, 15, 33, 5, 32]},
                       index=['Rascal', 'Coco', 'Benji', 'Zinnia', 'Greta'])

Print the DataFrame `puppies`:

In [5]:
print(puppies)

        hugs  kisses  barks
Rascal     5       7     21
Coco       8       9     15
Benji      2       4     33
Zinnia    10      13      5
Greta      8       8     32


Just for fun, output the DataFrame `puppies` without `print` (i.e., just type the DataFrame's name) to view it as a formatted table:

Beauty. Print its dimensions using the attribute `.shape` (no parentheses):

Thought question: In the output immediately above, which number represents rows and which represents columns?

In the next few cells, answer each respective question by slicing and printing portions of the `puppies` DataFrame:

In [4]:
# How many kisses did Zinnia give?

print(puppies['kisses']['Zinnia'])

13


In [5]:
# How many kisses did Greta give?



In [6]:
# How many hugs did Rascal give?



In [7]:
# How many hugs did Benji give?



In [8]:
# How many times did Coco bark?



In [9]:
# How many times did Zinnia bark?



In the next few cells, practice using `.loc` to answer each respective question:

In [10]:
# How many hugs did Benji give?

print(puppies.loc['Benji', 'hugs'])

2


In [11]:
# How many hugs did Rascal give?



In [12]:
# How many kisses did Coco give?



In [13]:
# How many times did Zinnia bark?



In [5]:
# How many hugs and kisses did Zinnia give?

print(puppies.loc['Zinnia' , ['hugs', 'kisses']])

hugs      10
kisses    13
Name: Zinnia, dtype: int64


In [15]:
# How many kisses did Greta give, and how many times did she bark?



In [16]:
# How many hugs did Rascal and Coco each give?



In [17]:
# How many times did Rascal, Benji, and Greta bark?



In [18]:
# How many hugs did Coco, Zinnia, and Greta each give, and how many times did each of them bark?



In the next few cells, practice using `.iloc` to answer each respective question (note: remember Python's zero-indexing):

In [6]:
# How many hugs did Benji give?

print(puppies.iloc[2 , 0])

2


In [20]:
# How many kisses did Coco give?



In [21]:
# How many times did Rascal bark?



In [22]:
# How many kisses did Zinnia and Greta each give, and how many times did they each bark?



In [23]:
# How many hugs did Rascal give, kisses did Benji give, and times did Benji bark?




### Filtering DataFrames
|           |                                       |
|-----------|---------------------------------------|
| `Boolean Filters` |   Can be combined using standard logical operators | 
|  | · `&` (*and*)  |
|  | · <code>&#124;</code> (*or*) |
| `.all()`  |  returns `True` for each column in which every value is non-zero |
| `.any()`  |  returns `True` for each column that has at least one non-zero value |
| `.isnull()`  |  identifies NaN values (often chained with `.all()` or `.any()`) |
| `.notnull()`  |  identifies non-NaN values (often chained with `.all()` or `.any()`) |
| `.dropna(how='any')`  |  drops rows that contain at least one NaN |
| `.dropna(how='all')`  |  drops rows in which every value is NaN |
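
Here is a minimal sketch of the filtering tools in the table. The DataFrame `df` and its values are hypothetical and chosen only so that each method gives a different answer:

```python
import numpy as np
import pandas as pd

# Hypothetical data for illustration only.
df = pd.DataFrame({'a': [1, 2, 0],
                   'b': [4, 5, 6],
                   'c': [np.nan, np.nan, np.nan],
                   'd': [0.5, np.nan, 1.5]})

# Combine Boolean filters; each condition needs its own parentheses.
both = df[(df['a'] > 0) & (df['b'] < 6)]      # rows 0 and 1
either = df[(df['a'] == 0) | (df['b'] == 4)]  # rows 0 and 2

# .all() / .any() reduce each column to a single Boolean.
all_nonzero = df[['a', 'b']].all()   # a: False (it contains a 0), b: True
any_nonzero = df[['a', 'b']].any()   # a: True, b: True

# Chain with .isnull() to find columns that are entirely or partly NaN.
all_nan_cols = df.isnull().all()     # True only for 'c'
any_nan_cols = df.isnull().any()     # True for 'c' and 'd'

# how='any' drops a row with at least one NaN; how='all' only if every value is NaN.
dropped_any = df.dropna(how='any')   # every row has a NaN in 'c', so no rows remain
dropped_all = df.dropna(how='all')   # no row is entirely NaN, so all rows remain
```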

### Practice:

Run the following cell to add a new column, `tricks`, to the DataFrame `puppies`:

In [6]:
puppies['tricks'] = [1, 6, 0, 9, 0]

Print the updated `puppies` DataFrame:

View the DataFrame in all its formatted glory by typing its name and running the cell without `print`:

In the next few cells, use Boolean filters to answer the following questions:

In [7]:
# Which puppies gave more than 6 hugs?

print(puppies[puppies['hugs'] > 6])

        hugs  kisses  barks  tricks
Coco       8       9     15       6
Zinnia    10      13      5       9
Greta      8       8     32       0


In [26]:
# Which puppies gave fewer than 6 kisses?



In [27]:
# Which puppies barked more than 10 times, and fewer than 20 times?



In [28]:
# Which puppies did any tricks?



Use `.all()` to see which columns have all non-zero values (i.e., contain no zeros):

Use `.any()` to see which columns have any non-zero values:

Thought question: Look at the Boolean value for 'tricks' in the two outputs above. Why are they different?

## Chapter 2

|           |                                       |
|-----------|---------------------------------------|
| Vectorized functions |   Built-in functions that operate on whole columns at once, which is more efficient than looping in Python     |
| | `.map()` transforms a Series element-wise, according to a Python dictionary (or function) |
| | `.apply()` applies a Python function along an axis of a DataFrame (to each column by default, or to each row with `axis=1`) |
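
To make the difference between `.map()` and `.apply()` concrete, here is a small sketch. The DataFrame and the `lookup` dictionary are hypothetical and not taken from the lesson:

```python
import pandas as pd

# Hypothetical data for illustration.
df = pd.DataFrame({'tricks': [1, 0, 3], 'barks': [21, 33, 5]},
                  index=['p1', 'p2', 'p3'])

# Series.map with a dictionary: each value is looked up as a key.
# Values that are missing from the dictionary become NaN.
lookup = {0: 'No', 1: 'Yes'}
mapped = df['tricks'].map(lookup)            # 'Yes', 'No', NaN

# Series.map also accepts a function, which avoids listing every key.
mapped_fn = df['tricks'].map(lambda n: 'Yes' if n > 0 else 'No')

# DataFrame.apply calls the function once per column (axis=0, the default)...
col_ranges = df.apply(lambda col: col.max() - col.min())
# ...or once per row with axis=1.
row_totals = df.apply(lambda row: row['tricks'] + row['barks'], axis=1)

# A vectorized expression gives the same row totals without a Python-level loop.
row_totals_vec = df['tricks'] + df['barks']
```

When a vectorized expression like the last line is available, it is usually the simpler choice; `.apply()` is more of a fallback for logic that can't be expressed that way.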

|           |                                       |
|-----------|---------------------------------------|
| Indexes |   · Immutable: individual entries can't be modified, so to change the index you have to replace the whole thing     |
| | · Homogeneous in data type |
| | `.index` can be indexed and sliced much like a Python list |
| | `.index.name` assigns a name label to an index |
| | · Can set the index by assigning a column (or list) to `df.index` |
| Hierarchical Indexes | · Multi-level index |
| | · create using `.set_index()` with an ordered list of column labels as input |
| | `.sort_index()` use to sort the index | 
| | `slice(None)` use for the outer levels when trying to slice only innermost dimensions of the hierarchical index |
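
The index behaviours listed above can be checked with a short sketch. The DataFrame here is a made-up example, not one of the lesson's datasets:

```python
import pandas as pd

# Hypothetical data for illustration.
df = pd.DataFrame({'score': [3, 1, 2]}, index=['x', 'y', 'z'])

# The index supports list-style positional access and slicing.
first_label = df.index[0]        # 'x'
last_two = df.index[1:]          # the labels 'y' and 'z'

# Changing a single entry is not allowed, because an Index is immutable.
try:
    df.index[0] = 'a'
    changed = True
except TypeError:
    changed = False

# Replacing the whole index at once is fine.
df.index = [label.upper() for label in df.index]
df.index.name = 'label'

# sort_index returns a new, sorted DataFrame unless inplace=True is passed.
df_desc = df.sort_index(ascending=False)
```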


### Show & Tell (...or, Tell & Show, I guess):

We're going to use `.map` with a dictionary. First, run the next cell to create a clunky new dictionary (we'll explore better ways later). We'll call it `trick_map` rather than `dict`, so that we don't overwrite Python's built-in `dict` type:

In [8]:
trick_map = {0:'No', 1:'Yes', 2:'Yes', 3:'Yes', 4:'Yes', 5:'Yes', 6:'Yes', 7:'Yes', 8:'Yes', 9:'Yes'}

Add a new column by using `.map` to look up each `tricks` value in the dictionary:

In [9]:
puppies['trickster'] = puppies['tricks'].map(trick_map)

Output the `puppies` DataFrame (without `print`) to see your new column:

In [10]:
puppies

        hugs  kisses  barks  tricks trickster
Rascal     5       7     21       1       Yes
Coco       8       9     15       6       Yes
Benji      2       4     33       0        No
Zinnia    10      13      5       9       Yes
Greta      8       8     32       0        No


Which column is the index column? Let's print it using `.index` (note, no `()` this time):

In [11]:
print(puppies.index)

Index(['Rascal', 'Coco', 'Benji', 'Zinnia', 'Greta'], dtype='object')


To ensure that this DataFrame is easily understood by anyone who views it, let's assign a name to our index column by chaining `.index` and `.name`. The column contains the puppies' names, so let's name it 'name':

In [12]:
puppies.index.name = 'name'

Output the DataFrame again and see the difference. Note the position of the column headers now:

Print the column names using `.columns`. Note that the output includes the column names of all the data columns, and not the index column entitled 'name':

OK, let's say that what we really want to do is to sort the puppies into two key categories--those who do tricks, and those who don't--while maintaining all of the other information. The 'trickster' column is handy, but it would be nice to reorganize the data in a more meaningful format. Here's where we benefit from a hierarchical index. <br><br>
Let's use `.reset_index()` to convert the index column ('name') into a data column just like the rest. For reasons we'll discuss later, let's assign the resulting DataFrame to a new name `pups_hier`:

In [13]:
pups_hier = puppies.reset_index()

Output the DataFrame `pups_hier`:

Use `.index` again to print the new index values:

Use `.columns` again to print the updated column names:

Cool. Time to get all hierarchical. We'll use `.set_index()`, and pass it an ordered list of column names. The first column name in the list becomes the outermost index level, and the second becomes the inner level. We ultimately want all the names sorted by trickery. We'll include the `inplace=True` argument to update the current DataFrame (versus outputting a new one that we'd have to reassign to another variable name):

In [14]:
pups_hier.set_index(['trickster', 'name'], inplace=True)
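
As a side-by-side sketch of the two styles mentioned above (reassigning the result versus `inplace=True`), using a hypothetical DataFrame with made-up column names:

```python
import pandas as pd

# Hypothetical data for illustration.
df = pd.DataFrame({'outer': ['x', 'x', 'y'],
                   'inner': ['a', 'b', 'a'],
                   'value': [1, 2, 3]})

# Without inplace, set_index returns a new DataFrame and leaves `df` untouched.
reassigned = df.set_index(['outer', 'inner'])
columns_before = list(df.columns)            # still ['outer', 'inner', 'value']

# With inplace=True, `df` itself is modified and the call returns None.
result = df.set_index(['outer', 'inner'], inplace=True)

print(columns_before)          # ['outer', 'inner', 'value']
print(result)                  # None
print(list(df.index.names))    # ['outer', 'inner']
```

Both routes end with the same two-level index; the reassignment style just keeps the original around under its old name.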

Output `pups_hier` again to see the change:

We now have the hierarchical index we want, but it's cluttered. Let's tidy it up with `.sort_index()`, including the argument `inplace=True`:

In [15]:
pups_hier.sort_index(inplace=True)

Output `pups_hier` to see the change:

This is good, but because we're mainly interested in the tricksters, it would be more intuitive to display them above the non-tricksters. Let's sort the index again, this time with the argument `ascending=False` and again with `inplace=True` to lock it in:

In [17]:
pups_hier.sort_index(ascending=False, inplace=True)

Output `pups_hier` one last time:

Niiiice. Now that's an informative DataFrame. Now let's slice out some rows:

Let's use `.loc` to produce the following outputs: 

In [19]:
# All the tricksters
# Note the format: DataFrame.loc['rows', 'columns']...it just so happens that we want all the columns, 
# so we can leave columns unspecified. Alternatively, though more verbosely, we could write `pups_hier.loc['Yes', :]`

pups_hier.loc['Yes']

        hugs  kisses  barks  tricks
name
Zinnia    10      13      5       9
Rascal     5       7     21       1
Coco       8       9     15       6


In [20]:
# Both non-tricksters

pups_hier.loc['No']

       hugs  kisses  barks  tricks
name
Greta     8       8     32       0
Benji     2       4     33       0


In [21]:
# Just Zinnia
# Note the format: DataFrame.loc['outer_index', 'inner_index']...and again, columns unspecified. 
# If we were to specify columns this time, we'd need to group the index levels together with parentheses 
# like this: DataFrame.loc[('Yes', 'Zinnia'), :]

pups_hier.loc['Yes', 'Zinnia']

hugs      10
kisses    13
barks      5
tricks     9
Name: (Yes, Zinnia), dtype: int64

In [22]:
# Just Benji

pups_hier.loc['No', 'Benji']

hugs       2
kisses     4
barks     33
tricks     0
Name: (No, Benji), dtype: int64

Using the cells above as reference, complete the next three on your own:

In [None]:
# Just Coco



In [None]:
# Just Rascal



In [None]:
# Just Greta



Cool. Now let's select some specific values from single rows:

In [23]:
# Zinnia kisses
# Note the format: DataFrame.loc[('outer_index', 'inner_index'), 'column']

pups_hier.loc[('Yes', 'Zinnia'), 'kisses']

13

In [24]:
# Greta hugs

pups_hier.loc[('No', 'Greta'), 'hugs']

8

Think you've got it? Go!

In [None]:
# Rascal tricks



In [132]:
# Coco hugs



In [None]:
# Zinnia tricks



Let's get multiple, but not all, values from single rows. We'll do the first one together, then you go:

In [25]:
# Zinnia hugs and kisses
# Note the same overall format as before: DataFrame.loc[rows, columns]
# ...just with parentheses to group row info and column info.
# Note the specific format: DataFrame.loc[('outer_index', 'inner_index'), ('column', 'other_column')]

pups_hier.loc[('Yes', 'Zinnia'), ('hugs', 'kisses')]

hugs      10
kisses    13
Name: (Yes, Zinnia), dtype: int64

In [None]:
# Rascal hugs and tricks



In [None]:
# Coco kisses and tricks



In [None]:
# Benji hugs and kisses



Cool. Now let's do two whole rows at a time from the same outer index:

In [26]:
# Zinnia and Rascal
# Note the format: DataFrame.loc[('outer_index', ['inner_index', 'other_inner_index']), 'column']
# Here, we have to specify the columns to ensure the commas will be correctly interpreted.

pups_hier.loc[('Yes', ['Zinnia','Rascal']), :]

                  hugs  kisses  barks  tricks
trickster name
Yes       Zinnia    10      13      5       9
          Rascal     5       7     21       1


Follow this convention and do some on your own:

In [None]:
# Zinnia and Coco



In [None]:
# Greta and Benji



Now let's yoink full rows from different outer indexes:

In [27]:
# Zinnia and Benji

pups_hier.loc[[('Yes', 'Zinnia'), ('No','Benji')]]

                  hugs  kisses  barks  tricks
trickster name
Yes       Zinnia    10      13      5       9
No        Benji      2       4     33       0


In [60]:
# Rascal and Greta



In [63]:
# Coco and Benji



In [64]:
# Challenge:
# Coco, Greta, and Benji



Don't worry about slicing specific column data from specific rows within different outer indexes at this point. You've sliced plenty for now. <br> Just one last nugget of information before moving on--the `slice(None)` argument. Read and run the following two cells: 

In [28]:
# This works:
# Zinnia and Benji

pups_hier.loc[[('Yes', 'Zinnia'), ('No','Benji')]]

                  hugs  kisses  barks  tricks
trickster name
Yes       Zinnia    10      13      5       9
No        Benji      2       4     33       0


In [29]:
# And this does the same thing, using `slice(None)` to bypass the outer index, 
# and then passing an ordered list of row index names.

pups_hier.loc[(slice(None), ['Zinnia', 'Benji']), :]

                  hugs  kisses  barks  tricks
trickster name
Yes       Zinnia    10      13      5       9
No        Benji      2       4     33       0
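
As an aside, pandas also provides `pd.IndexSlice`, which lets you write the same kind of selection with `:` instead of `slice(None)`. This is a sketch on a hypothetical two-level DataFrame (the names `group`, `name`, and `value` are made up), and it is not something the lesson itself requires:

```python
import pandas as pd

# Hypothetical two-level frame for illustration.
df = pd.DataFrame({'group': ['Yes', 'Yes', 'No', 'No'],
                   'name': ['a', 'b', 'c', 'd'],
                   'value': [1, 2, 3, 4]}).set_index(['group', 'name']).sort_index()

idx = pd.IndexSlice

# Two equivalent selections: every outer label, inner labels 'a' and 'c'.
with_slice_none = df.loc[(slice(None), ['a', 'c']), :]
with_index_slice = df.loc[idx[:, ['a', 'c']], :]

print(with_slice_none.equals(with_index_slice))   # True
```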



## Chapter 3
### Reshaping DataFrames

|           |                                       |
|-----------|---------------------------------------|
| `.pivot()` | allows for reshaping of the data |
| | `index=` specify which column to use as the index |
| | `columns=` specify which column's values become the new column labels |
| | `values=` specify which variable will populate the values in the cells |
| `.unstack()` | moves a level of the index (the innermost level by default) to the columns |
| | `level=` can be a str (level name) or an integer (level position) |
| `.stack()` | moves a level of the hierarchical columns to the index | 
| | `level=` (same as above) |
| `.swaplevel()` | used to switch index levels, input index numbers |
| `pd.melt()` | unpivots a DataFrame from wide to long format, which can undo a pivot |
| | `id_vars=` specify columns to remain in reshaped df |
| | `value_vars=` specify which columns to turn into values |
| | `value_name=` assign name to value column |
| `.pivot_table()` | summarizes one variable as a function of 2 other variables, aggregating the values when an index/column pair occurs more than once |
| | Same arguments as with `.pivot()` |
| | `aggfunc=` argument to specify aggregation functions |
| | `margins=True` adds row and column margins (labelled `All`), computed with `aggfunc` |
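
A compact sketch of `.pivot()`, `pd.melt()`, and `.pivot_table()` working together. The `city`/`day`/`sales` data is invented for this example:

```python
import pandas as pd

# Hypothetical long-format data: one row per (city, day) pair.
long_df = pd.DataFrame({'city': ['A', 'A', 'B', 'B'],
                        'day': ['Mon', 'Tue', 'Mon', 'Tue'],
                        'sales': [3, 5, 2, 7]})

# pivot: one row per city, one column per day, cells filled from 'sales'.
wide = long_df.pivot(index='city', columns='day', values='sales')

# melt: go back from wide to long. reset_index turns 'city' into a regular column first.
melted = pd.melt(wide.reset_index(), id_vars='city',
                 value_vars=['Mon', 'Tue'], value_name='sales')

# pivot_table: like pivot, but it aggregates and can add margins.
table = long_df.pivot_table(index='city', columns='day', values='sales',
                            aggfunc='sum', margins=True)

print(wide.loc['A', 'Tue'])       # 5
print(table.loc['All', 'All'])    # 17
```

The melted frame holds the same four sales values as `long_df`, though the row order may differ from the original.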


### Practice:

Let's zoom in on our tricks data: <br>
First, we'll reset the index with `.reset_index()` and assign the result to a new variable name so we don't overwrite our original. We're going to come back to it in a bit.

In [30]:
pups_tricks = pups_hier.reset_index()

Output `pups_tricks` to check it:

In [31]:
pups_tricks

  trickster    name  hugs  kisses  barks  tricks
0       Yes  Zinnia    10      13      5       9
1       Yes  Rascal     5       7     21       1
2       Yes    Coco     8       9     15       6
3        No   Greta     8       8     32       0
4        No   Benji     2       4     33       0


Second, we'll use `.pivot()` with appropriate arguments to isolate our data of interest and set an index at the same time:

In [33]:
pups_tricks.pivot(index='trickster', columns='name', values='tricks')

name       Benji  Coco  Greta  Rascal  Zinnia
trickster
No           0.0   NaN    0.0     NaN     NaN
Yes          NaN   6.0    NaN     1.0     9.0


Thought question: 'Sup with the NaNs? Why do you think they're there, and how do they differ from the `0.0` values? <br>
Hint: Refer to the respective values in the `tricks` column prior to the pivot.

Let's return to the full `pups_hier` DataFrame. We're going to `.unstack()` the two index levels one at a time to see the differences. <br>
Output it, as is, to view the "before", then run the next two cells to view the "afters":

In [43]:
# As is:

pups_hier

                  hugs  kisses  barks  tricks
trickster name
Yes       Zinnia    10      13      5       9
          Rascal     5       7     21       1
          Coco       8       9     15       6
No        Greta      8       8     32       0
          Benji      2       4     33       0


In [56]:
# Unstack `pups_hier` with the argument `level=0`, and assign it to a variable named `pups_hier_0`.
# Output `pups_hier_0`:

pups_hier_0 = pups_hier.unstack(level=0)
pups_hier_0

          hugs       kisses       barks       tricks
trickster   No   Yes     No   Yes    No   Yes     No  Yes
name
Benji      2.0   NaN    4.0   NaN  33.0   NaN    0.0  NaN
Coco       NaN   8.0    NaN   9.0   NaN  15.0    NaN  6.0
Greta      8.0   NaN    8.0   NaN  32.0   NaN    0.0  NaN
Rascal     NaN   5.0    NaN   7.0   NaN  21.0    NaN  1.0
Zinnia     NaN  10.0    NaN  13.0   NaN   5.0    NaN  9.0


Thought question: Note what happened to the 'trickster' index column above. Where did it go?

In [57]:
# Unstack `pups_hier` with the argument `level=1`, and assign it to a variable named `pups_hier_1`.
# Output `pups_hier_1`:

pups_hier_1 = pups_hier.unstack(level=1)
pups_hier_1

           hugs                                kisses                              barks                               tricks
name       Benji  Coco  Greta  Rascal  Zinnia  Benji  Coco  Greta  Rascal  Zinnia  Benji  Coco  Greta  Rascal  Zinnia  Benji  Coco  Greta  Rascal  Zinnia
trickster
No           2.0   NaN    8.0     NaN     NaN    4.0   NaN    8.0     NaN     NaN   33.0   NaN   32.0     NaN     NaN    0.0   NaN    0.0     NaN     NaN
Yes          NaN   8.0    NaN     5.0    10.0    NaN   9.0    NaN     7.0    13.0    NaN  15.0    NaN    21.0     5.0    NaN   6.0    NaN     1.0     9.0


Thought question: Note what happened to the 'name' index column above. Where did it go?

To practice the opposite, try to put `pups_hier_0`'s and `pups_hier_1`'s unstacked column back into its index using `.stack()`. Experiment with the `level=0` and `level=1` arguments (same format as used with `.unstack()` above):

In [81]:
# Stack pups_hier_0's unstacked column back into the index, level 0:



In [83]:
# Stack pups_hier_0's unstacked column back into the index, level 1:



In [88]:
# Stack pups_hier_1's unstacked column back into the index, level 0:



In [87]:
# Stack pups_hier_1's unstacked column back into the index, level 1:



Which output from the four cells above came the closest to the original `pups_hier` DataFrame?

Thought question: How does it still differ, and what additional method and argument would fix it? <br>
Hint: We've used it in a previous question above.

One output above came close, but wasn't quite right. Run the cell below and see if you can figure out what each of the chained methods and their arguments (particularly `.swaplevel(0,1)`) is contributing to the output:

In [92]:
pups_hier_0.stack(level=1).swaplevel(0,1).sort_index(ascending=False)

                  hugs  kisses  barks  tricks
trickster name
Yes       Zinnia  10.0    13.0    5.0     9.0
          Rascal   5.0     7.0   21.0     1.0
          Coco     8.0     9.0   15.0     6.0
No        Greta    8.0     8.0   32.0     0.0
          Benji    2.0     4.0   33.0     0.0


## Chapter 4

### Grouping Data

|           |                                       |
|-----------|---------------------------------------|
| `.groupby()` | · Input is a column name (or a list of column names) |
| | · Can then use `[]` to select the columns on which to perform the aggregation |
| | · Usually chained with an aggregation/reduction method |
| `.agg()` | can do several aggregations by passing them as input in the form of a list | 
| `.transform()` | used with `.groupby()` to apply a function to each group independently, returning a result aligned with the original rows |
| `.apply()` | when used with `.groupby()`, performs a function on each of the groups, then recombines the data back into a Series or df |
| `.filter()` | remove whole groups of rows from a df based on a Boolean condition |
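
The following sketch runs each of the grouping tools in the table on a small, made-up DataFrame (the `team` and `points` columns are hypothetical), so you can compare the shape of what each one returns:

```python
import pandas as pd

# Hypothetical data for illustration.
df = pd.DataFrame({'team': ['r', 'r', 'b', 'b', 'b'],
                   'points': [2, 4, 1, 1, 1]})

grouped = df.groupby('team')['points']

# Aggregation: one value per group.
totals = grouped.sum()                       # b: 3, r: 6

# Several aggregations at once via a list.
summary = grouped.agg(['min', 'max'])

# transform: returns a result aligned to the original rows
# (here, each row's points minus its own group's mean).
centered = grouped.transform(lambda s: s - s.mean())

# apply: run an arbitrary function on each group, then recombine the results.
spread = grouped.apply(lambda s: s.max() - s.min())   # b: 0, r: 2

# filter: keep only the rows belonging to groups that pass the test.
big_teams = df.groupby('team').filter(lambda g: g['points'].sum() > 3)
```

`totals`, `summary`, and `spread` have one row per group, `centered` has one row per original row, and `big_teams` keeps only the original rows of team `r`.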


## Show & Tell

Let's tinker with the commands above by first creating a copy of our `puppies` DataFrame with `.copy()`, then adding a column called `size` to the copy:

In [135]:
puppies_more = puppies.copy()
puppies_more['size'] = ['big', 'medium', 'small', 'small', 'small']
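
A word of caution on copying: plain assignment (`new = old`) only gives the same DataFrame a second name, whereas `.copy()` creates an independent one. A minimal sketch with hypothetical variable names:

```python
import pandas as pd

# Hypothetical data for illustration.
original = pd.DataFrame({'n': [1, 2]})

alias = original                 # a second name for the same object
independent = original.copy()    # a separate DataFrame

alias['flag'] = [True, False]        # also shows up in `original`
independent['other'] = ['x', 'y']    # does not touch `original`

print(list(original.columns))    # ['n', 'flag']
```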

Output our new `puppies_more` DataFrame to view the change: 

In [136]:
puppies_more

        hugs  kisses  barks  tricks trickster    size
name
Rascal     5       7     21       1       Yes     big
Coco       8       9     15       6       Yes  medium
Benji      2       4     33       0        No   small
Zinnia    10      13      5       9       Yes   small
Greta      8       8     32       0        No   small


Now let's explore some `.groupby()` action. Read and run the following cells for a variety of applications:

In [152]:
# Let's group the dogs by size and add up how many hugs, kisses, barks, and tricks came from each dog size collectively:

puppies_more.groupby('size').sum(numeric_only=True)

        hugs  kisses  barks  tricks
size
big        5       7     21       1
medium     8       9     15       6
small     20      25     70       9


In [153]:
# Let's group the dogs by trickery (whether they do tricks or not) and add up how many hugs, kisses, barks, 
# and tricks came from each trickery type collectively:

puppies_more.groupby('trickster').sum(numeric_only=True)

           hugs  kisses  barks  tricks
trickster
No           10      12     65       0
Yes          23      29     41      16


In [176]:
# Let's group the dogs by size again and isolate the sum of barks (only) that came from each size group:

puppies_more.groupby('size')['barks'].sum()

size
big       21
medium    15
small     70
Name: barks, dtype: int64

In [166]:
# Let's group the dogs by size yet again and isolate the sum of barks and tricks that came from each size group 
# (note the additional square brackets):

puppies_more.groupby('size')[['barks', 'tricks']].sum()

        barks  tricks
size
big        21       1
medium     15       6
small      70       9


In [169]:
# Let's get fancy and group the dogs by both size and trickery, and output the number of 
# hugs and kisses that came from each trickery level of each size group in a multi-index DataFrame:

puppies_more.groupby(['size', 'trickster'])[['hugs', 'kisses']].sum()

                  hugs  kisses
size   trickster
big    Yes           5       7
medium Yes           8       9
small  No           10      12
       Yes          10      13


Thought question: Why are there no 'No' inner index entries for 'big' or 'medium' size dogs? <br>
Hint: Refer back to the full DataFrame.

In [175]:
# Let's get extra fancy and group the dogs by trickery and size (note the order of those two) and output the maximum 
# number of hugs, kisses, barks, and tricks recorded by any single dog in each group:

puppies_more.groupby(['trickster', 'size'])[['hugs', 'kisses', 'barks', 'tricks']].max()

                  hugs  kisses  barks  tricks
trickster size
No        small      8       8     33       0
Yes       big        5       7     21       1
          medium     8       9     15       6
          small     10      13      5       9


Thought question: Why are there no 'big' or 'medium' inner index entries for non-tricksters? <br>
Hint: Refer back to the full DataFrame.

Let's continue the fanciness with multiple aggregations:

In [182]:
# Let's group the dogs by size, isolate hugs and kisses, and use the `.agg()` method to output 
# the min() and max() that came from each size group:

puppies_more.groupby('size')[['hugs', 'kisses']].agg(['max', 'min'])

       hugs     kisses
        max min    max min
size
big       5   5      7   7
medium    8   8      9   9
small    10   2     13   4
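
Beyond a list of function names, `.agg()` also accepts a dictionary (different aggregations per column) and keyword-style "named aggregation". This goes slightly past what the table above covers, so treat it as an optional extra; the data and column names are hypothetical:

```python
import pandas as pd

# Hypothetical data for illustration.
df = pd.DataFrame({'kind': ['a', 'a', 'b'],
                   'x': [1, 3, 5],
                   'y': [2.0, 4.0, 6.0]})

# A dictionary maps each column to the aggregation(s) to run on it.
per_column = df.groupby('kind').agg({'x': 'sum', 'y': ['min', 'max']})

# Named aggregation: output_name=(input_column, function) gives flat column names.
named = df.groupby('kind').agg(x_total=('x', 'sum'), y_max=('y', 'max'))

print(per_column.loc['a', ('x', 'sum')])   # 4
print(list(named.columns))                 # ['x_total', 'y_max']
```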


Now you:

In [178]:
# Group the dogs by size, isolate barks and tricks, and use the `.agg()` method to output 
# the min() and max() that came from each size group:



In [179]:
# Group the dogs by trickery, isolate hugs and tricks, and use the `.agg()` method to output 
# the max() and sum() that came from each trickery group:



In [184]:
# Challenge:
# Group the dogs by trickery, isolate kisses and barks, and use the `.agg()` method to output 
# the mean() and std() (i.e., standard deviation) that came from each trickery group:

