# Pandas Tutorial
### Filtering - Using Conditionals to Filter Rows and Columns
- **Source:** Corey Schafer - [Python Pandas Playlist](https://www.youtube.com/playlist?list=PL-osiE80TeTsWmV9i9c58mdDCSskIFdDS)

- **Data:** [Stack Overflow Developer Survey 2021](https://insights.stackoverflow.com/survey)

In [1]:
import pandas as pd

### Creating a sample DataFrame

In [2]:
people = {
    "first": ["Parampreet", "Corey", "Anant"],
    "last": ["Singh", "Schafer", "Luthra"],
    "email": ["ParampreetSingh@gmail.com", "CoreySchafer@gmail.com", "AnantLuthra@gmail.com"],
}

In [3]:
df = pd.DataFrame(people)
df

,first,last,email
0,Parampreet,Singh,ParampreetSingh@gmail.com
1,Corey,Schafer,CoreySchafer@gmail.com
2,Anant,Luthra,AnantLuthra@gmail.com


In [4]:
# Build a boolean mask: True where the last name is "Singh"
filter_last = (df["last"] == "Singh")
filter_last

0     True
1    False
2    False
Name: last, dtype: bool

In [5]:
df[filter_last]

,first,last,email
0,Parampreet,Singh,ParampreetSingh@gmail.com


In [6]:
# Same as above, with the condition written inline
df[df["last"] == "Singh"]

,first,last,email
0,Parampreet,Singh,ParampreetSingh@gmail.com


In [7]:
# Third way: pass the boolean mask to .loc
df.loc[filter_last]

,first,last,email
0,Parampreet,Singh,ParampreetSingh@gmail.com


In [8]:
# .loc also accepts a column label, so rows and columns are selected in one step
df.loc[filter_last, "email"]

0    ParampreetSingh@gmail.com
Name: email, dtype: object
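
The cells above all rely on the same idea: a comparison on a column produces a boolean Series (a mask), and that mask selects the rows where it is `True`. As a minimal sketch on the same sample data, the mask can also be summed to count matches, and `.loc` can take a list of columns alongside it. The names `mask`, `n_matches`, and `subset` are illustrative and not part of the original notebook.

```python
import pandas as pd

people = {
    "first": ["Parampreet", "Corey", "Anant"],
    "last": ["Singh", "Schafer", "Luthra"],
    "email": ["ParampreetSingh@gmail.com", "CoreySchafer@gmail.com", "AnantLuthra@gmail.com"],
}
df = pd.DataFrame(people)

# A comparison returns one True/False value per row.
mask = df["last"] == "Singh"

# True counts as 1, so summing the mask gives the number of matching rows.
n_matches = int(mask.sum())

# .loc takes the row mask first and the wanted columns second.
subset = df.loc[mask, ["first", "email"]]
print(n_matches)
print(subset)
```

Passing the mask to `.loc` rather than to plain `df[...]` is what makes the column selection possible in the same call.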

### And `&`, Or `|`, Not `~` logical operators

In [9]:
# And operator (&): both conditions must be True
df[(df["first"] == "Parampreet") & (df["last"] == "Singh")]

,first,last,email
0,Parampreet,Singh,ParampreetSingh@gmail.com


In [10]:
# Or operator (|): at least one condition must be True
df[(df["last"] == "Singh") | (df["last"] == "Luthra")]

,first,last,email
0,Parampreet,Singh,ParampreetSingh@gmail.com
2,Anant,Luthra,AnantLuthra@gmail.com


In [11]:
# Not operator (tilde sign ~): inverts the mask
df[~(df["last"] == "Singh")]

,first,last,email
1,Corey,Schafer,CoreySchafer@gmail.com
2,Anant,Luthra,AnantLuthra@gmail.com
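
The parentheses around each comparison in the cells above are not optional decoration. In Python, `&` and `|` bind more tightly than `==`, so each comparison has to be wrapped before the masks are combined. The keywords `and`, `or`, and `not` cannot stand in for the operators either, because they ask for a single truth value of a whole Series. The sketch below uses a cut-down copy of the sample DataFrame; `both` and `and_error` are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({
    "first": ["Parampreet", "Corey", "Anant"],
    "last": ["Singh", "Schafer", "Luthra"],
})

# Parenthesized comparisons are combined element by element.
both = df[(df["first"] == "Parampreet") & (df["last"] == "Singh")]

# The `and` keyword needs one True/False for the whole Series,
# which pandas refuses to guess and reports as a ValueError.
try:
    (df["first"] == "Parampreet") and (df["last"] == "Singh")
    and_error = None
except ValueError as exc:
    and_error = type(exc).__name__

print(both)
print(and_error)
```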


### DataFrames of the Stack Overflow Developer Survey 2021 results

In [12]:
df = pd.read_csv("./data/survey_results_public.csv", index_col="ResponseId")
schema_df = pd.read_csv("./data/survey_results_schema.csv", index_col="qname")

In [13]:
# Display every column instead of truncating wide DataFrames
pd.set_option("display.max_columns", df.shape[1])

In [14]:
df.head()

ResponseId,MainBranch,Employment,Country,US_State,UK_Country,EdLevel,Age1stCode,LearnCode,YearsCode,YearsCodePro,DevType,OrgSize,Currency,CompTotal,CompFreq,LanguageHaveWorkedWith,LanguageWantToWorkWith,DatabaseHaveWorkedWith,DatabaseWantToWorkWith,PlatformHaveWorkedWith,PlatformWantToWorkWith,WebframeHaveWorkedWith,WebframeWantToWorkWith,MiscTechHaveWorkedWith,MiscTechWantToWorkWith,ToolsTechHaveWorkedWith,ToolsTechWantToWorkWith,NEWCollabToolsHaveWorkedWith,NEWCollabToolsWantToWorkWith,OpSys,NEWStuck,NEWSOSites,SOVisitFreq,SOAccount,SOPartFreq,SOComm,NEWOtherComms,Age,Gender,Trans,Sexuality,Ethnicity,Accessibility,MentalHealth,SurveyLength,SurveyEase,ConvertedCompYearly
1,I am a developer by profession,"Independent contractor, freelancer, or self-em...",Slovakia,,,"Secondary school (e.g. American high school, G...",18 - 24 years,Coding Bootcamp;Other online resources (ex: vi...,,,"Developer, mobile",20 to 99 employees,EUR European Euro,4800.0,Monthly,C++;HTML/CSS;JavaScript;Objective-C;PHP;Swift,Swift,PostgreSQL;SQLite,SQLite,,,Laravel;Symfony,,,,,,PHPStorm;Xcode,Atom;Xcode,MacOS,Call a coworker or friend;Visit Stack Overflow...,Stack Overflow,Multiple times per day,Yes,A few times per month or weekly,"Yes, definitely",No,25-34 years old,Man,No,Straight / Heterosexual,White or of European descent,None of the above,None of the above,Appropriate in length,Easy,62268.0
2,I am a student who is learning to code,"Student, full-time",Netherlands,,,"Bachelor’s degree (B.A., B.S., B.Eng., etc.)",11 - 17 years,"Other online resources (ex: videos, blogs, etc...",7.0,,,,,,,JavaScript;Python,,PostgreSQL,,,,Angular;Flask;Vue.js,,Cordova,,Docker;Git;Yarn,Git,Android Studio;IntelliJ;Notepad++;PyCharm,,Windows,Visit Stack Overflow;Google it,Stack Overflow,Daily or almost daily,Yes,Daily or almost daily,"Yes, definitely",No,18-24 years old,Man,No,Straight / Heterosexual,White or of European descent,None of the above,None of the above,Appropriate in length,Easy,
3,"I am not primarily a developer, but I write co...","Student, full-time",Russian Federation,,,"Bachelor’s degree (B.A., B.S., B.Eng., etc.)",11 - 17 years,"Other online resources (ex: videos, blogs, etc...",,,,,,,,Assembly;C;Python;R;Rust,Julia;Python;Rust,SQLite,SQLite,Heroku,,Flask,Flask,NumPy;Pandas;TensorFlow;Torch/PyTorch,Keras;NumPy;Pandas;TensorFlow;Torch/PyTorch,,,IPython/Jupyter;PyCharm;RStudio;Sublime Text;V...,IPython/Jupyter;RStudio;Sublime Text;Visual St...,MacOS,Visit Stack Overflow;Google it;Watch help / tu...,Stack Overflow;Stack Exchange,Multiple times per day,Yes,Multiple times per day,"Yes, definitely",Yes,18-24 years old,Man,No,Prefer not to say,Prefer not to say,None of the above,None of the above,Appropriate in length,Easy,
4,I am a developer by profession,Employed full-time,Austria,,,"Master’s degree (M.A., M.S., M.Eng., MBA, etc.)",11 - 17 years,,,,"Developer, front-end",100 to 499 employees,EUR European Euro,,Monthly,JavaScript;TypeScript,JavaScript;TypeScript,,,,,Angular;jQuery,Angular;jQuery,,,,,,,Windows,Call a coworker or friend;Visit Stack Overflow...,Stack Overflow,Daily or almost daily,Yes,Daily or almost daily,Neutral,No,35-44 years old,Man,No,Straight / Heterosexual,White or of European descent,I am deaf / hard of hearing,,Appropriate in length,Neither easy nor difficult,
5,I am a developer by profession,"Independent contractor, freelancer, or self-em...",United Kingdom of Great Britain and Northern I...,,England,"Master’s degree (M.A., M.S., M.Eng., MBA, etc.)",5 - 10 years,Friend or family member,17.0,10.0,"Developer, desktop or enterprise applications;...","Just me - I am a freelancer, sole proprietor, ...",GBP\tPound sterling,,,Bash/Shell;HTML/CSS;Python;SQL,Bash/Shell;HTML/CSS;Python;SQL,Elasticsearch;PostgreSQL;Redis,Cassandra;Elasticsearch;PostgreSQL;Redis,,,Flask,Flask,Apache Spark;Hadoop;NumPy;Pandas,Hadoop;NumPy;Pandas,Docker;Git;Kubernetes;Yarn,Docker;Git;Kubernetes;Yarn,Atom;IPython/Jupyter;Notepad++;PyCharm;Vim,Atom;IPython/Jupyter;Notepad++;PyCharm;Vim;Vis...,Linux-based,Visit Stack Overflow;Go for a walk or other ph...,Stack Overflow;Stack Exchange,Daily or almost daily,Yes,A few times per week,"Yes, somewhat",No,25-34 years old,Man,No,,White or of European descent,None of the above,,Appropriate in length,Easy,


In [15]:
schema_df.head()

qname,qid,question,force_resp,type,selector
S0,QID16,"<div><span style=""font-size:19px;""><strong>Hel...",False,DB,TB
MetaInfo,QID12,Browser Meta Info,False,Meta,Browser
S1,QID1,"<span style=""font-size:22px; font-family: aria...",False,DB,TB
MainBranch,QID2,Which of the following options best describes ...,True,MC,SAVR
Employment,QID24,Which of the following best describes your cur...,False,MC,MAVR


In [16]:
df.columns

Index(['MainBranch', 'Employment', 'Country', 'US_State', 'UK_Country',
       'EdLevel', 'Age1stCode', 'LearnCode', 'YearsCode', 'YearsCodePro',
       'DevType', 'OrgSize', 'Currency', 'CompTotal', 'CompFreq',
       'LanguageHaveWorkedWith', 'LanguageWantToWorkWith',
       'DatabaseHaveWorkedWith', 'DatabaseWantToWorkWith',
       'PlatformHaveWorkedWith', 'PlatformWantToWorkWith',
       'WebframeHaveWorkedWith', 'WebframeWantToWorkWith',
       'MiscTechHaveWorkedWith', 'MiscTechWantToWorkWith',
       'ToolsTechHaveWorkedWith', 'ToolsTechWantToWorkWith',
       'NEWCollabToolsHaveWorkedWith', 'NEWCollabToolsWantToWorkWith', 'OpSys',
       'NEWStuck', 'NEWSOSites', 'SOVisitFreq', 'SOAccount', 'SOPartFreq',
       'SOComm', 'NEWOtherComms', 'Age', 'Gender', 'Trans', 'Sexuality',
       'Ethnicity', 'Accessibility', 'MentalHealth', 'SurveyLength',
       'SurveyEase', 'ConvertedCompYearly'],
      dtype='object')

In [17]:
# Mask for respondents whose ConvertedCompYearly is above 70,000
high_salary = (df["ConvertedCompYearly"] > 70_000)

In [18]:
df.loc[high_salary, ["Country", "LanguageHaveWorkedWith", "ConvertedCompYearly"]].head()

ResponseId,Country,LanguageHaveWorkedWith,ConvertedCompYearly
13,Germany,C;C++;Java;Perl;Ruby,77290.0
19,Singapore,C++;Python,160932.0
25,Germany,C++;HTML/CSS;Java;JavaScript;Kotlin;Node.js;Ty...,77831.0
27,Switzerland,C++;Python,81319.0
32,Israel,Bash/Shell;Go;Java;Node.js;Python;Scala;SQL,122580.0
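
Several rows in the `df.head()` output have no `ConvertedCompYearly` value. A comparison against a missing value (`NaN`) evaluates to `False`, so those respondents simply drop out of the `high_salary` mask rather than causing an error. A small sketch of that behavior follows; the three-row `sample` frame is made up for illustration (only the first row mirrors a value from the output above), and it does not read the survey file.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of the survey data, including one missing salary.
sample = pd.DataFrame(
    {
        "Country": ["Germany", "India", "Canada"],
        "ConvertedCompYearly": [77290.0, np.nan, 50000.0],
    },
    index=pd.Index([13, 14, 15], name="ResponseId"),
)

# NaN > 70_000 is False, so the row with a missing salary is excluded.
high_salary = sample["ConvertedCompYearly"] > 70_000
result = sample.loc[high_salary, ["Country", "ConvertedCompYearly"]]
print(high_salary.tolist())
print(result)
```

If missing salaries need to be inspected separately, `sample["ConvertedCompYearly"].isna()` gives a mask for exactly those rows.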


In [19]:
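# isin() compares whole values, so each name must be spelled exactly as it appears in the Country column
countries = ["United States", "India", "United Kingdom", "Germany", "Canada"]
filt = df["Country"].isin(countries)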

In [20]:
df.loc[filt, "Country"]

ResponseId
9          India
13       Germany
18        Canada
21       Germany
23         India
          ...   
83416    Germany
83418      India
83425    Germany
83433     Canada
83438     Canada
Name: Country, Length: 19148, dtype: object
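
Because `isin()` matches whole values, a name in the list that is spelled differently from the data matches nothing and raises no warning. The sketch below shows one way to check for that, and how `~` turns the same mask into an exclusion filter. The `country` Series holds invented values (including a placeholder long name), and `unmatched` and `excluded` are illustrative names.

```python
import pandas as pd

# Invented values standing in for the survey's Country column.
country = pd.Series(
    ["India", "Germany", "Canada", "United Kingdom (long official name)"],
    name="Country",
)
countries = ["India", "United Kingdom", "Germany"]

# Exact, whole-value matching: the short "United Kingdom" finds nothing.
filt = country.isin(countries)

# Names from the list that never occur in the data.
present = set(country)
unmatched = [c for c in countries if c not in present]

# Inverting the mask keeps every row NOT in the list.
excluded = country[~filt]

print(filt.tolist())
print(unmatched)
print(excluded.tolist())
```

Running a check like `unmatched` before trusting the filtered row count is a cheap way to catch spelling mismatches.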

In [21]:
# str.contains() does a substring search; na=False turns missing answers into False instead of NaN
python_programmers = df["LanguageHaveWorkedWith"].str.contains("Python", na=False)
python_programmers

ResponseId
1        False
2         True
3         True
4        False
5         True
         ...  
83435    False
83436    False
83437     True
83438     True
83439    False
Name: LanguageHaveWorkedWith, Length: 83439, dtype: bool

In [22]:
df.loc[python_programmers, "LanguageHaveWorkedWith"]

ResponseId
2                                        JavaScript;Python
3                                 Assembly;C;Python;R;Rust
5                           Bash/Shell;HTML/CSS;Python;SQL
6        C;C#;C++;HTML/CSS;Java;JavaScript;Node.js;Powe...
10                                              C++;Python
                               ...                        
83430               HTML/CSS;PHP;PowerShell;Python;SQL;VBA
83431          APL;Clojure;LISP;Python;Ruby;SQL;TypeScript
83432    C#;Dart;HTML/CSS;Java;JavaScript;Kotlin;Node.j...
83437                                   Groovy;Java;Python
83438                 Bash/Shell;JavaScript;Node.js;Python
Name: LanguageHaveWorkedWith, Length: 39792, dtype: object
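
One caveat with `str.contains()` is that it searches for a substring anywhere in the cell. That is fine for `"Python"` here, but a search for `"Java"` would also match `"JavaScript"`. The sketch below contrasts the plain substring search with a pattern anchored on the `;` separators used in this column. The `langs` Series is invented sample data, and the regular expression is one possible approach rather than the method used in the original notebook.

```python
import pandas as pd

# Invented answers in the same semicolon-separated format, with one missing value.
langs = pd.Series(["JavaScript;Python", "Java;Kotlin", None, "C++;Python"])

# Plain substring search: "JavaScript" counts as a hit for "Java".
has_java_substring = langs.str.contains("Java", na=False, regex=False)

# Anchored pattern: "Java" must sit between separators or at the ends of the string.
has_java_exact = langs.str.contains(r"(?:^|;)Java(?:;|$)", na=False)

print(has_java_substring.tolist())
print(has_java_exact.tolist())
```

In both calls `na=False` keeps the result a clean boolean mask, so it can be passed straight to `df.loc[...]`.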