<a id='section0'></a>
### Table of Contents

1. [Filter A DataFrame Based On A Condition](#section1)
2. [Filter with More than One Condition (AND)](#section2)
3. [Filter with More than One Condition (OR)](#section3)
4. [The .isin() Method](#section4)
5. [The .isnull() and .notnull() Methods](#section5)
6. [The .between() Method](#section6)
7. [The .duplicated() Method](#section7)
8. [The .drop_duplicates() Method](#section8)
9. [The .unique() and .nunique() Methods](#section9)


In [2]:
import pandas as pd

In [3]:
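# Parse the two date/time columns as datetimes while reading the file
df = pd.read_csv("../employees.csv", parse_dates = ["Start Date", "Last Login Time"])
# Note: astype("bool") converts missing values (NaN) to True
df["Senior Management"] = df["Senior Management"].astype("bool")
# Gender has only a few distinct values, so store it as a category
df["Gender"] = df["Gender"].astype("category")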
df.head(3)

Unnamed: 0,First Name,Gender,Start Date,Last Login Time,Salary,Bonus %,Senior Management,Team
0,Douglas,Male,1993-08-06,2021-10-03 12:42:00,97308,6.945,True,Marketing
1,Thomas,Male,1996-03-31,2021-10-03 06:53:00,61933,4.17,True,
2,Maria,Female,1993-04-23,2021-10-03 11:17:00,130590,11.858,False,Finance


## Filter A `DataFrame` Based On A Condition
<a id='section1'></a>
[Index](#section0)

In [None]:
df = pd.read_csv("employees.csv", parse_dates = ["Start Date", "Last Login Time"])
df["Senior Management"] = df["Senior Management"].astype("bool")
df["Gender"] = df["Gender"].astype("category")
df.head(3)
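
This section loads the data but does not show a filter yet. The following is a minimal sketch of single-condition filtering, assuming a small hypothetical DataFrame named `people` that stands in for `employees.csv` (the rows are copied from the `head(3)` output above).

```python
import pandas as pd

# Hypothetical stand-in for employees.csv so the example runs on its own.
people = pd.DataFrame({
    "First Name": ["Douglas", "Thomas", "Maria"],
    "Gender": ["Male", "Male", "Female"],
    "Salary": [97308, 61933, 130590],
})

# A comparison on a column yields a boolean Series, one value per row.
is_male = people["Gender"] == "Male"

# Indexing the DataFrame with that Series keeps the rows where it is True.
males = people[is_male]

# Other comparison operators work the same way.
high_earners = people[people["Salary"] > 100000]
```

Storing the condition in a variable first, as the later sections do with `mask1` and `mask2`, keeps the filtering line short and makes the condition reusable.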

## Filter with More than One Condition (AND)
<a id='section2'></a>
[Index](#section0)

In [None]:
df = pd.read_csv("employees.csv", parse_dates = ["Start Date", "Last Login Time"])
df["Senior Management"] = df["Senior Management"].astype("bool")
df["Gender"] = df["Gender"].astype("category")
df.head(3)

In [None]:
mask1 = df["Gender"] == "Male"
mask2 = df["Team"] == "Marketing"

df[mask1 & mask2]
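
The two masks can also be written inline. A minimal sketch, assuming a hypothetical four-row DataFrame named `people`: each comparison needs its own parentheses because `&` binds more tightly than `==`, and Python's `and` keyword cannot be used because it raises a `ValueError` on a Series.

```python
import pandas as pd

# Hypothetical sample rows; Team is missing for Thomas.
people = pd.DataFrame({
    "First Name": ["Douglas", "Thomas", "Maria", "Jerry"],
    "Gender": ["Male", "Male", "Female", "Male"],
    "Team": ["Marketing", None, "Finance", "Finance"],
})

# Parenthesize each condition before combining them with &.
male_marketing = people[(people["Gender"] == "Male") & (people["Team"] == "Marketing")]

# A row with a missing Team never satisfies the equality test.
thomas_matches = bool(((people["First Name"] == "Thomas") & (people["Team"] == "Marketing")).any())
```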

## Filter with More than One Condition (OR)
<a id='section3'></a>
[Index](#section0)

In [None]:
df = pd.read_csv("employees.csv", parse_dates = ["Start Date", "Last Login Time"])
df["Senior Management"] = df["Senior Management"].astype("bool")
df["Gender"] = df["Gender"].astype("category")
df.head(3)

In [None]:
mask1 = df["Senior Management"]
mask2 = df["Start Date"] < "1990-01-01"

df[mask1 | mask2]

In [None]:
mask1 = df["First Name"] == "Robert"
mask2 = df["Team"] == "Client Services"
mask3 = df["Start Date"] > "2016-06-01"

df[(mask1 & mask2) | mask3]
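
The parentheses in the last cell determine which rows come back. A minimal sketch, assuming a hypothetical four-row DataFrame named `people`, that compares the two possible groupings:

```python
import pandas as pd

# Hypothetical sample rows chosen so the two groupings differ.
people = pd.DataFrame({
    "First Name": ["Robert", "Robert", "Maria", "Jerry"],
    "Team": ["Client Services", "Finance", "Client Services", "Legal"],
    "Start Date": pd.to_datetime(
        ["2010-05-01", "2017-01-15", "2012-03-09", "2018-02-20"]
    ),
})

mask1 = people["First Name"] == "Robert"
mask2 = people["Team"] == "Client Services"
mask3 = people["Start Date"] > "2016-06-01"

# Roberts in Client Services, plus anyone who started after 2016-06-01.
and_first = people[(mask1 & mask2) | mask3]

# Roberts who are in Client Services or started after 2016-06-01.
or_first = people[mask1 & (mask2 | mask3)]
```

Here `and_first` contains rows 0, 1 and 3, while `or_first` contains only rows 0 and 1. Without any parentheses, `mask1 & mask2 | mask3` is evaluated like the first form, because `&` has higher precedence than `|`.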

## The `.isin()` Method
<a id='section4'></a>
[Index](#section0)

In [4]:
df = pd.read_csv("../employees.csv", parse_dates = ["Start Date", "Last Login Time"])
df["Senior Management"] = df["Senior Management"].astype("bool")
df["Gender"] = df["Gender"].astype("category")
df.head(3)

Unnamed: 0,First Name,Gender,Start Date,Last Login Time,Salary,Bonus %,Senior Management,Team
0,Douglas,Male,1993-08-06,2021-10-04 12:42:00,97308,6.945,True,Marketing
1,Thomas,Male,1996-03-31,2021-10-04 06:53:00,61933,4.17,True,
2,Maria,Female,1993-04-23,2021-10-04 11:17:00,130590,11.858,False,Finance


In [None]:
mask1 = df["Team"] == "Legal"
mask2 = df["Team"] == "Sales"
mask3 = df["Team"] == "Product"

df[mask1 | mask2 | mask3]

In [7]:
# .isin() returns a boolean Series: True where Team is one of the listed values
df["Team"].isin(["Legal", "Sales", "Product"])

0      False
1      False
2      False
3      False
4      False
       ...  
995    False
996    False
997     True
998    False
999     True
Name: Team, Length: 1000, dtype: bool

In [8]:
mask1 = df["Team"].isin(["Legal", "Sales", "Product"])
df[mask1]

Unnamed: 0,First Name,Gender,Start Date,Last Login Time,Salary,Bonus %,Senior Management,Team
5,Dennis,Male,1987-04-18,2021-10-04 01:35:00,115163,10.125,False,Legal
6,Ruby,Female,1987-08-17,2021-10-04 16:20:00,65476,10.012,True,Product
11,Julie,Female,1997-10-26,2021-10-04 15:19:00,102508,12.637,True,Legal
13,Gary,Male,2008-01-27,2021-10-04 23:40:00,109831,5.831,False,Sales
15,Lillian,Female,2016-06-05,2021-10-04 06:09:00,59414,1.256,False,Product
...,...,...,...,...,...,...,...,...
981,James,Male,1993-01-15,2021-10-04 17:19:00,148985,19.280,False,Legal
985,Stephen,,1983-07-10,2021-10-04 20:10:00,85668,1.909,False,Legal
989,Justin,,1991-02-10,2021-10-04 16:58:00,38344,3.794,False,Legal
997,Russell,Male,2013-05-20,2021-10-04 12:39:00,96914,1.421,False,Product
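
The mask returned by `.isin()` can be inverted with `~` to select everything outside the list. A minimal sketch, assuming a hypothetical five-value Series named `teams`:

```python
import pandas as pd

# Hypothetical Team values, including one missing entry.
teams = pd.Series(["Legal", "Marketing", None, "Sales", "Finance"], name="Team")
wanted = ["Legal", "Sales", "Product"]

# True where the value is one of the wanted teams.
in_wanted = teams.isin(wanted)

# ~ flips every boolean, selecting the values outside the list.
not_in_wanted = ~in_wanted
others = teams[not_in_wanted]
```

Note that the missing value at position 2 is not in the list, so it ends up in `others` along with `Marketing` and `Finance`.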


## The `.isnull()` and `.notnull()` Methods
<a id='section5'></a>
[Index](#section0)

In [None]:
df = pd.read_csv("employees.csv", parse_dates = ["Start Date", "Last Login Time"])
df["Senior Management"] = df["Senior Management"].astype("bool")
df["Gender"] = df["Gender"].astype("category")
df.head(3)

In [None]:
mask = df["Team"].isnull()

df[mask]

In [None]:
condition = df["Gender"].notnull()

df[condition]
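
A minimal sketch of both methods on a hypothetical three-row DataFrame named `people`, including a count of the missing values. `.isna()` and `.notna()` are aliases for the same methods.

```python
import numpy as np
import pandas as pd

# Hypothetical sample rows; Thomas has no Team recorded.
people = pd.DataFrame({
    "First Name": ["Douglas", "Thomas", "Maria"],
    "Team": ["Marketing", np.nan, "Finance"],
})

# Rows where Team is missing.
missing_team = people[people["Team"].isnull()]

# Rows where Team is present.
has_team = people[people["Team"].notnull()]

# True counts as 1 when summed, so this is the number of missing values.
n_missing = int(people["Team"].isnull().sum())
```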

## The `.between()` Method
<a id='section6'></a>
[Index](#section0)

In [None]:
df = pd.read_csv("employees.csv", parse_dates = ["Start Date", "Last Login Time"])
df["Senior Management"] = df["Senior Management"].astype("bool")
df["Gender"] = df["Gender"].astype("category")
df.head(3)

In [None]:
df[df["Salary"].between(60000, 70000)]

In [None]:
df[df["Bonus %"].between(2.0, 5.0)]
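
By default `.between()` includes both endpoints. A minimal sketch, assuming a hypothetical Series named `salaries`, that checks this against the equivalent pair of comparisons:

```python
import pandas as pd

# Hypothetical salaries placed just outside, on, and inside the bounds.
salaries = pd.Series([59999, 60000, 65000, 70000, 70001])

# Both 60000 and 70000 count as inside the range.
via_between = salaries.between(60000, 70000)

# The same mask spelled out with comparison operators.
via_operators = (salaries >= 60000) & (salaries <= 70000)
```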

In [None]:
df[df["Start Date"].between("1991-01-01", "1992-01-01")]

In [None]:
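# The time-only strings are parsed with today's date, the same date pandas
# assigned to "Last Login Time" when the CSV was read in this session
df[df["Last Login Time"].between("08:30AM", "12:00PM")]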
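
The cell above relies on the column and the bounds sharing the same date. An alternative that ignores the date entirely is to compare `datetime.time` values. This is a hedged sketch, not the notebook's own method, and it assumes a hypothetical Series named `logins`:

```python
import datetime as dt

import pandas as pd

# Hypothetical login timestamps; the date part is irrelevant here.
logins = pd.Series(
    pd.to_datetime(["2021-10-04 06:53", "2021-10-04 11:17", "2021-10-04 12:42"]),
    name="Last Login Time",
)

# .dt.time strips the date and leaves datetime.time objects.
login_times = logins.dt.time

# Keep logins between 08:30 and 12:00, whatever the date.
morning = login_times.between(dt.time(8, 30), dt.time(12, 0))
morning_logins = logins[morning]
```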

## The `.duplicated()` Method
<a id='section7'></a>
[Index](#section0)

In [None]:
df = pd.read_csv("employees.csv", parse_dates = ["Start Date", "Last Login Time"])
df["Senior Management"] = df["Senior Management"].astype("bool")
df["Gender"] = df["Gender"].astype("category")
df.sort_values("First Name", inplace = True)
df.head(3)

In [None]:
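# keep=False flags every occurrence of a repeated name;
# ~ inverts the mask, leaving only names that appear exactly once
mask = ~df["First Name"].duplicated(keep = False)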
df[mask]
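
The `keep` parameter controls which occurrences are flagged as duplicates. A minimal sketch comparing its three settings, assuming a hypothetical Series named `names`:

```python
import pandas as pd

# Hypothetical sorted names: Aaron twice, Maria once, Thomas three times.
names = pd.Series(["Aaron", "Aaron", "Maria", "Thomas", "Thomas", "Thomas"])

# Default keep="first": the first occurrence is not flagged.
dup_after_first = names.duplicated()

# keep="last": the last occurrence is not flagged.
dup_before_last = names.duplicated(keep="last")

# keep=False: every occurrence of a repeated value is flagged.
dup_any = names.duplicated(keep=False)

# Inverting the keep=False mask leaves values that occur exactly once.
appear_once = names[~dup_any]
```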

## The `.drop_duplicates()` Method
<a id='section8'></a>
[Index](#section0)

In [None]:
df = pd.read_csv("employees.csv", parse_dates = ["Start Date", "Last Login Time"])
df["Senior Management"] = df["Senior Management"].astype("bool")
df["Gender"] = df["Gender"].astype("category")
df.sort_values("First Name", inplace = True)
df.head(3)

In [None]:
len(df)

In [None]:
# Instead of sum: use len() on the de-duplicated DataFrame

len(df.drop_duplicates())

In [None]:
df.drop_duplicates(subset = ["First Name"], keep = False)

In [None]:
df.drop_duplicates(subset = ["First Name", "Team"], inplace = True)

In [None]:
df.head(2)

In [None]:
len(df)
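
A minimal sketch of how `subset` and `keep` change the result, assuming a hypothetical four-row DataFrame named `people`. Without `inplace = True`, each call returns a new DataFrame and leaves the original untouched.

```python
import pandas as pd

# Hypothetical rows: rows 0 and 1 are identical, row 3 shares only the name.
people = pd.DataFrame({
    "First Name": ["Aaron", "Aaron", "Maria", "Aaron"],
    "Team": ["Legal", "Legal", "Sales", "Sales"],
})

# No subset: a row is dropped only if every column matches an earlier row.
no_full_dupes = people.drop_duplicates()

# subset: compare only First Name and keep the first row for each name.
first_per_name = people.drop_duplicates(subset=["First Name"])

# keep=False: drop every row whose name occurs more than once.
names_once = people.drop_duplicates(subset=["First Name"], keep=False)

# Two-column subset: the (First Name, Team) pair must repeat to be dropped.
first_per_pair = people.drop_duplicates(subset=["First Name", "Team"])
```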

## The `.unique()` and `.nunique()` Methods
<a id='section9'></a>
[Index](#section0)

In [10]:
df = pd.read_csv("../employees.csv", parse_dates = ["Start Date", "Last Login Time"])
df["Senior Management"] = df["Senior Management"].astype("bool")
df["Gender"] = df["Gender"].astype("category")
df.head(3)

Unnamed: 0,First Name,Gender,Start Date,Last Login Time,Salary,Bonus %,Senior Management,Team
0,Douglas,Male,1993-08-06,2021-10-04 12:42:00,97308,6.945,True,Marketing
1,Thomas,Male,1996-03-31,2021-10-04 06:53:00,61933,4.17,True,
2,Maria,Female,1993-04-23,2021-10-04 11:17:00,130590,11.858,False,Finance


In [11]:
# Not displayed: a cell shows only the value of its last expression
df["Gender"].unique()

df["Team"].unique()

array(['Marketing', nan, 'Finance', 'Client Services', 'Legal', 'Product',
       'Engineering', 'Business Development', 'Human Resources', 'Sales',
       'Distribution'], dtype=object)

In [12]:
len(df["Team"].unique())

11

In [14]:
df["Team"].nunique(dropna = True)

10

In [15]:
df["Team"].nunique(dropna = False)

11