In [None]:
import pandas as pd
import numpy as np

# Discussion 3

### Announcements:
- Project 1 due Saturday 1/27
- Lab 03 due Monday 1/29
- Saturday office hours (CSE 2217)
    - Dylan 12:00 - 2:30
    - Jasmine 4:30 - 6:00
- Lab solutions going up on Ed
    - Link also on course website

## SettingWithCopyWarning 

A warning you're likely to run into at some point is the SettingWithCopyWarning. It's often harmless, but it can mean an assignment didn't change the DataFrame you expected, and it is good practice to write code that doesn't trigger warnings.

Run the code below to see an example of how it happens and how to prevent it.

In [None]:
warn_df = pd.DataFrame({"Movie Title": ["Spider-Man: Across the Spider-Verse",
                                        "Scott Pilgrim vs. the World",
                                        "Monty Python and the Holy Grail",
                                        "Joker",
                                        "Fight Club"],
                        "Release Year": [2023,2010,1975,2019,1999],
                        "Rating": ["PG","PG-13","PG","R","R"],
                        "Pretty Visuals": [True,True,False,True,False],
                        "Funny": [False,True,True,False,False]})
warn_df

In [None]:
# Mask for 'Rating' to 'PG'.
is_pg = warn_df["Rating"] == "PG"

# Apply filter to DataFrame.
warn_df_pg = warn_df[is_pg]
warn_df_pg

In [None]:
# Add a new column on if I would show the movie to a kid.
warn_df_pg["Would Show to a Kid"] = [True, False]
warn_df_pg

### Oh no!

The code above raised a warning even though `warn_df_pg` looks correct. What happened?

Selecting rows by putting a boolean Series inside brackets is called boolean indexing, or masking: you can picture `is_pg` covering up the non-PG rows, which is why I called it a mask earlier. `warn_df[is_pg]` gives us the PG movies from `warn_df`, but pandas doesn't always guarantee whether an indexing operation returns a **view** (a window onto the original data) or a **copy** (new, independent data). What it does keep track of is that `warn_df_pg` was derived from `warn_df`.

So when you then modify `warn_df_pg`, pandas faces an ambiguity: did you want to change only this filtered DataFrame, or did you want to change the matching rows of the original `warn_df`?

With boolean indexing the result is actually a copy, so the new column is added to `warn_df_pg` only and `warn_df` is left untouched. That is probably what you wanted in most cases. However, in case you meant to edit the original, pandas raises the SettingWithCopyWarning to let you know your change may not reach it!
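
A quick way to check what actually happened is to look at both DataFrames after the assignment. The sketch below uses a small made-up `movies` DataFrame as a stand-in for `warn_df` and records warnings with Python's `warnings` module instead of letting them print. Whether a SettingWithCopyWarning shows up in the recorded list may depend on your pandas version, so that line has no expected output.

```python
import warnings

import pandas as pd

# Hypothetical stand-in for warn_df: three made-up movies, two rated PG.
movies = pd.DataFrame({"Title": ["A", "B", "C"],
                       "Rating": ["PG", "R", "PG"]})

# Boolean indexing, the same pattern as warn_df[is_pg].
pg_movies = movies[movies["Rating"] == "PG"]

# Record warnings in a list instead of printing them.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    pg_movies["Kid Friendly"] = [True, False]

print("Kid Friendly" in pg_movies.columns)  # True: the filtered DataFrame changed
print("Kid Friendly" in movies.columns)     # False: the original did not
# Which warnings (if any) were recorded may depend on the pandas version.
print([w.category.__name__ for w in caught])
```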

To avoid this warning, be explicit: call `.copy()` if you want an independent DataFrame, or use `.loc[]` on the original if you want to change the original. Then pandas doesn't need to guess what you meant.

In [None]:
# Solution A: Explicitly set on a copy using .copy().
is_pg = warn_df["Rating"] == "PG"
copy_df = warn_df[is_pg].copy()
copy_df["Would Show to a Kid"] = [True, False]
copy_df

In [None]:
# Solution B: Explicitly set on the original using .loc[].
# Note that this edits the original warn_df, not a copy!
# Rows outside the mask (the non-PG movies) get a missing value (NaN) in the new column.
is_pg = warn_df["Rating"] == "PG"
warn_df.loc[is_pg, "Would Show to a Kid"] = [True, False]
warn_df

## Working With `groupby()`
<br/>
<div>
<img src="https://i.imgflip.com/8ddsrh.jpg" width="300"/>
</div>
<br/>

Once you've grouped a DataFrame, there are many ways to work with the groups. The simplest are built-in aggregation methods such as `count()`, `sum()`, and `mean()`, but we can also use `transform()`, `apply()`, or `agg()` to run custom functions on each group.
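
As a minimal illustration of the built-in methods, here is a hypothetical `zoo` DataFrame with a few lifespans borrowed from the animal table below. Each method collapses every group to a single value.

```python
import pandas as pd

# Hypothetical mini dataset: lifespans of three animals in two groups.
zoo = pd.DataFrame({"who": ["cute", "cute", "weird"],
                    "lifespan": [10, 5, 18]})

by_who = zoo.groupby("who")["lifespan"]

print(by_who.count())  # per group: cute -> 2, weird -> 1
print(by_who.sum())    # per group: cute -> 15, weird -> 18
print(by_who.mean())   # per group: cute -> 7.5, weird -> 18.0
```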

In [None]:
df = pd.DataFrame({"animal": ["Manta Ray",
                              "Quokka",
                              "Rain Frog",
                              "Binturong",
                              "Sailfish",
                              "Sturgeon",
                              "Rhino",
                              "Platypus"],
                   "who": ["water_thing", "cute", "cute", "weird", "water_thing", "water_thing", "weird", "weird"],
                   "weight (lbs)": [6600, 6, 0.025, 60, 120, 800, 1600, 3],
                   "lifespan": [30, 10, 5, 18, 5, 100, 50, 15]
                  }).set_index("animal")
df

In [None]:
def diffs(x):
    # Print each input pandas passes in, then return its range (max - min).
    print("\tSingle Iteration Input: ")
    print(x)
    print("-"*40)
    return x.max() - x.min()

In [None]:
df.groupby("who").mean()

### .transform()

Use `.transform()` when you want a group-wise calculation returned for every row: the result keeps the original DataFrame's index (one output row per input row) instead of collapsing to one row per group.
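
A common use of `.transform()` is attaching a group statistic to every row, for example to compare each animal with its group's average. This sketch uses a hypothetical `zoo` DataFrame (a few rows borrowed from `df`); `group_mean` and `vs_group` are illustrative column names.

```python
import pandas as pd

# Hypothetical mini dataset in the same layout as df above.
zoo = pd.DataFrame({"who": ["cute", "cute", "weird"],
                    "lifespan": [10, 5, 18]},
                   index=["Quokka", "Rain Frog", "Binturong"])

# transform("mean") returns one value per row, aligned to zoo's index,
# so it can be assigned straight back as a new column.
zoo["group_mean"] = zoo.groupby("who")["lifespan"].transform("mean")

# How far each animal's lifespan is from its group's average.
zoo["vs_group"] = zoo["lifespan"] - zoo["group_mean"]
print(zoo)
```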

In [None]:
df.groupby("who").transform('mean')

In [None]:
df.groupby("who").transform(diffs)

### .apply()

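Use `.apply()` when your function needs a whole group at once: each group is passed in as a DataFrame with all of its (selected) columns, so the function can combine several columns.

*Note that `DataFrame.apply()` without `groupby()` has different behavior and parameters: it applies a function along each row or column.*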
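
For example, the sketch below uses a hypothetical `lbs_per_year()` function that divides weight by lifespan and averages the result within each group. It reads two columns of the same group, which works because `.apply()` passes the group as a DataFrame. The `zoo` data reuses a few rows from the animal table; selecting the columns first keeps the `who` column out of each group.

```python
import pandas as pd

# Hypothetical mini dataset using values from the animal table above.
zoo = pd.DataFrame({"who": ["cute", "cute", "weird", "weird"],
                    "weight (lbs)": [6, 0.025, 60, 1600],
                    "lifespan": [10, 5, 18, 50]})

def lbs_per_year(group):
    # `group` is a DataFrame holding both selected columns for one group.
    return (group["weight (lbs)"] / group["lifespan"]).mean()

result = zoo.groupby("who")[["weight (lbs)", "lifespan"]].apply(lbs_per_year)
print(result)
```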

In [None]:
# Selecting the columns only avoids a warning;
# the output is the same if you don't select them explicitly.
df.groupby("who")[["weight (lbs)", "lifespan"]].apply(np.mean)

In [None]:
df.groupby("who").apply(diffs)

### .agg()

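Use `.agg()` when you want to reduce each group to a single value per column. A custom function passed to `.agg()` receives one column of one group at a time (a Series), and `.agg()` can also run several operations at once, such as a list of functions or a different function for each column.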
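
As an example of different operations per column, this sketch uses pandas' named-aggregation syntax (`new_column=(existing_column, function)`) on a hypothetical `zoo` DataFrame. The output names `heaviest` and `avg_lifespan` are arbitrary.

```python
import pandas as pd

# Hypothetical mini dataset using values from the animal table above.
zoo = pd.DataFrame({"who": ["cute", "cute", "weird", "weird"],
                    "weight (lbs)": [6, 0.025, 60, 1600],
                    "lifespan": [10, 5, 18, 50]})

# One output column per keyword: max weight and mean lifespan per group.
summary = zoo.groupby("who").agg(
    heaviest=("weight (lbs)", "max"),
    avg_lifespan=("lifespan", "mean"),
)
print(summary)
```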

In [None]:
df.groupby("who").agg("mean")

In [None]:
df.groupby("who").agg(diffs)

### Some special uses of .agg() and .apply()
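
Whether a call in this section works comes down to what each method passes to your function. The sketch below checks this with a hypothetical `make_recorder()` helper that logs the type of each input it receives (the `zoo` data is also made up). A custom function given to `.agg()` gets one column of one group at a time (a Series), `.apply()` gets a whole group (a DataFrame), and `.transform()` first runs the function column by column and may also try it on the whole group to pick a faster path. That is why a function that looks up two columns by name only works with `.apply()`.

```python
import pandas as pd

# Hypothetical mini dataset using values from the animal table above.
zoo = pd.DataFrame({"who": ["cute", "cute", "weird", "weird"],
                    "weight (lbs)": [6, 0.025, 60, 1600],
                    "lifespan": [10, 5, 18, 50]})
cols = ["weight (lbs)", "lifespan"]

def make_recorder(log):
    # Hypothetical helper: returns a function that adds the type name of its
    # input to `log`, then returns the range (max - min) like diffs() does.
    def record_input(x):
        log.add(type(x).__name__)
        return x.max() - x.min()
    return record_input

seen = {"agg": set(), "apply": set(), "transform": set()}
zoo.groupby("who")[cols].agg(make_recorder(seen["agg"]))
zoo.groupby("who")[cols].apply(make_recorder(seen["apply"]))
zoo.groupby("who")[cols].transform(make_recorder(seen["transform"]))
print(seen)
```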

In [None]:
df.groupby("who").agg(["mean", diffs])
# df.groupby("who").transform(["mean", diffs]) # Error!
# df.groupby("who").apply(["mean", diffs]) # Error!

In [None]:
def diff_cols(x):
    return x["weight (lbs)"].mean() - x["lifespan"].mean()

df.groupby("who").apply(diff_cols)
# df.groupby("who").transform(diff_cols) # Error!
# df.groupby("who").agg(diff_cols) # Error!

## Bad Boolean Zen

Something small that I see in a number of students' code...

If an operation evaluates to `True` or `False`, you do not then have to check if the output is `True` to return `True`, or `False` to return `False`. Instead, you can generally just return the operation output directly.

Below, we define two functions that return `True` if a value is less than 10 and `False` otherwise. `is_small_bad()` is an example of bad boolean zen, while `is_small_good()` fixes it.

As a rule of thumb, double-check your work whenever you write `return True` or `return False` directly. It doesn't guarantee that your function has bad boolean zen, but it can be a sign of it.
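
The same rule covers negated and combined conditions: `not`, `and`, `or`, and chained comparisons already evaluate to booleans, so there is nothing to wrap in an `if`. The function names below are hypothetical, chosen to mirror `is_small_good()`.

```python
# Bad boolean zen: the branches just restate the condition.
def is_big_bad(n):
    if n < 10:
        return False
    else:
        return True

# Good boolean zen: return the negated condition directly.
def is_big_good(n):
    return not (n < 10)

# Good boolean zen: a chained comparison, same as (0 <= n) and (n < 10).
def is_single_digit(n):
    return 0 <= n < 10

print(is_big_bad(5), is_big_good(5))            # False False
print(is_single_digit(7), is_single_digit(12))  # True False
```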

In [None]:
# Bad boolean zen
def is_small_bad(n):
    if (n < 10) == True:
        return True
    else:
        return False
    
# Good boolean zen
def is_small_good(n):
    return n < 10

print(is_small_bad(5))
print(is_small_good(5))