# Filtering
Pandas makes it simple to filter data so that we keep the rows we want and discard the rest. We've already seen basic filtering with `.loc`, which we used to select specific columns.

In [1]:
'''
+ Ex. 1: The condition below creates a 'filter mask'. It's a Series of boolean values, where True means the row at that index meets the criteria and will be returned, whilst False 
means it doesn't and will be discarded. For index 0 (the first row) the mask is True because that user's first name is 'Kevin'. 
For index 1 (the second row) it's False since the 'first' value isn't 'Kevin'.

You can then apply the filter to your data frame, which gives you back a data frame containing only the matching rows. You could also use `.loc`. Remember that .loc is used for getting 
rows by label, but passing in a Series of booleans works as well. 

'''
import pandas as pd
users = {
  'first': ["Kevin", "Abby"],
  'last': ["Nguyen", "Wendel"],
  'email': ["knguyen44@gmail.com", "abbyWendel@outlook.com"]
}
usersDF = pd.DataFrame(users)


myFilter = (usersDF["first"] == "Kevin")


filteredUsersDF = usersDF[myFilter] # or you could do: usersDF.loc[myFilter]

filteredUsersDF

'''
+ Ex. 2: Using AND, OR and NOT with filters. In pandas these are written as &, | and ~.
Each comparison must be wrapped in its own parentheses, because & and | bind more tightly than ==.
1. Find all users with the name Lebron James.
2. Find all users whose name isn't Lebron James.
3. Find all users with the first name Kyrie or Michael.
'''
andFilter = (usersDF["first"] == "Lebron") & (usersDF["last"] == "James")
negateFilter = ~((usersDF["first"] == "Lebron") & (usersDF["last"] == "James"))
orFilter = (usersDF["first"] == "Kyrie") | (usersDF["first"] == "Michael")


Unnamed: 0,first,last,email
0,Kevin,Nguyen,knguyen44@gmail.com

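The users above contain no Lebron, Kyrie or Michael, so the Ex. 2 filters would return empty frames on that data. Here is a minimal sketch of the same three filters on a small sample where each one matches something. The `players` frame and its rows are made up purely for illustration.

```python
import pandas as pd

# Hypothetical sample data, chosen so that every filter matches at least one row.
players = pd.DataFrame({
    "first": ["Lebron", "Kyrie", "Michael", "Lebron"],
    "last": ["James", "Irving", "Jordan", "Smith"],
})

# Each comparison gets its own parentheses: & and | bind more tightly than ==.
is_lebron_james = (players["first"] == "Lebron") & (players["last"] == "James")
not_lebron_james = ~is_lebron_james
kyrie_or_michael = (players["first"] == "Kyrie") | (players["first"] == "Michael")

lebron_rows = players[is_lebron_james]        # only the Lebron James row
other_rows = players[not_lebron_james]        # every other row, including Lebron Smith
kyrie_michael_rows = players[kyrie_or_michael]
```

Note that negating the combined mask keeps "Lebron Smith", since only the full first-and-last match is excluded.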

In [None]:
'''
+ Ex. 3: Let's take a look at some real-world examples.

1. Find all programmers with a salary above a certain threshold, returning only a few columns: the country they're from, the languages they work with, and their actual salary.
2. Find all rows whose 'Country' value is in our list.
3. The 'LanguageWorkedWith' column holds values that look like "Java;JavaScript;Python". Akin to 'LIKE' in SQL, we can filter on whether the column matches a certain regex pattern. This is useful if we want 
to do something like finding all developers where 'Python' is one of their languages. Our filter does exactly that, and 'na=False' means that any row whose 'LanguageWorkedWith'
value is NaN is treated as not matching, so it is skipped.
'''

import pandas as pd
csvPath = "../data/survey_results_public.csv"
df = pd.read_csv(csvPath, index_col="Respondent")
highSalaryFilter = (df["ConvertedComp"] > 70000)
df.loc[highSalaryFilter, ["Country", "LanguageWorkedWith", "ConvertedComp"]]

targetCountries = ["United States", "Germany", "Canada"]
countriesFilter = df["Country"].isin(targetCountries)
df.loc[countriesFilter, ["Country", "LanguageWorkedWith", "ConvertedComp"]]  # .loc is indexed with [], not called with ()


# Column name matches the one used in the salary query above ("LanguageWorkedWith")
languagesFilter = df["LanguageWorkedWith"].str.contains("Python", na=False)
df.loc[languagesFilter, ["Country", "LanguageWorkedWith", "ConvertedComp"]]
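
Since the survey CSV may not be at hand, the sketch below reproduces the three Ex. 3 filters on a tiny stand-in frame. The rows are invented for illustration and only the column names follow the cell above. It also shows, as an assumption about how you might want to use them, that the masks can be combined with & before being passed to `.loc`.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the survey data; the values are made up.
survey = pd.DataFrame(
    {
        "Country": ["United States", "India", "Germany", "Canada"],
        "LanguageWorkedWith": ["Java;Python", "Python;SQL", np.nan, "C#;JavaScript"],
        "ConvertedComp": [95000.0, 20000.0, 72000.0, np.nan],
    },
    index=pd.Index([1, 2, 3, 4], name="Respondent"),
)

# A comparison against NaN is False, so respondent 4 is excluded here.
high_salary = survey["ConvertedComp"] > 70000

# isin checks membership in a list of allowed values.
in_target_country = survey["Country"].isin(["United States", "Germany", "Canada"])

# na=False makes a missing language list count as "does not contain Python".
uses_python = survey["LanguageWorkedWith"].str.contains("Python", na=False)

# Masks can be combined before selecting rows and columns with .loc[...].
python_in_target = survey.loc[in_target_country & uses_python, ["Country", "ConvertedComp"]]
```

Without `na=False`, the mask would contain NaN for respondent 3, and a mask with missing values cannot be used directly for indexing.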