## Pandas Tutorial

### DataFrame & Series Basics - Selecting Rows & Columns

- **Source**: Corey Schafer - [Python Pandas Playlist](https://www.youtube.com/playlist?list=PL-osiE80TeTsWmV9i9c58mdDCSskIFdDS)
- **Data**: [Stack Overflow Developer Survey 2021](https://insights.stackoverflow.com/survey)

In [1]:
import pandas as pd

### Creating a dataframe

In [2]:
people = {
    "first": ["Joey", "Chandler", "Ross"],
    "last": ["Tribiyani", "Bing", "Geller"],
    "email": ["howudoing@gmail.com", "waphaaa@gmail.com", "hiiii@gmail.com"]
}

In [3]:
df = pd.DataFrame(people)
df

Unnamed: 0,first,last,email
0,Joey,Tribiyani,howudoing@gmail.com
1,Chandler,Bing,waphaaa@gmail.com
2,Ross,Geller,hiiii@gmail.com


In [4]:
# Accessing the values (rows) of a single column
# It returns a Series
df["email"]

0    howudoing@gmail.com
1      waphaaa@gmail.com
2        hiiii@gmail.com
Name: email, dtype: object

In [5]:
# A Series is a 1-dimensional labeled array;
# here it holds the values of a single column, one per row
type(df["email"])

pandas.core.series.Series

In [6]:
df.email    # same as df["email"]

0    howudoing@gmail.com
1      waphaaa@gmail.com
2        hiiii@gmail.com
Name: email, dtype: object
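
Dot access is convenient but has limits. The sketch below uses a made-up frame (`demo` and its column names are assumptions for illustration, not part of the notebook's data) to show that a column whose name collides with a DataFrame method, or contains a space, is only reachable with brackets.

```python
import pandas as pd

# Hypothetical frame: "count" clashes with DataFrame.count, "first name" has a space
demo = pd.DataFrame({
    "count": [1, 2, 3],
    "first name": ["Joey", "Chandler", "Ross"],
})

# Dot access finds the DataFrame.count method, not the column
print(callable(demo.count))         # True

# Bracket access always returns the column
print(demo["count"].tolist())       # [1, 2, 3]
print(demo["first name"].tolist())  # ['Joey', 'Chandler', 'Ross']
```

For that reason bracket access is usually the safer default.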

In [7]:
# Accessing multiple columns of a DataFrame
# The result has multiple columns, so it is not a Series but a DataFrame (a subset of the original)
df[["last", "email"]]

Unnamed: 0,last,email
0,Tribiyani,howudoing@gmail.com
1,Bing,waphaaa@gmail.com
2,Geller,hiiii@gmail.com
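
The difference between single and double brackets can be checked directly. A minimal sketch, using a small stand-in frame (`demo` is a hypothetical name, not part of the notebook):

```python
import pandas as pd

# Hypothetical stand-in for the frame used in this notebook
demo = pd.DataFrame({
    "first": ["Joey", "Chandler", "Ross"],
    "email": ["howudoing@gmail.com", "waphaaa@gmail.com", "hiiii@gmail.com"],
})

single = demo["email"]    # a column label -> Series
double = demo[["email"]]  # a list of labels -> DataFrame, even with one column

print(type(single).__name__)       # Series
print(type(double).__name__)       # DataFrame
print(single.shape, double.shape)  # (3,) (3, 1)
```

The inner brackets are an ordinary Python list of column names, so a one-element list still yields a DataFrame.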


In [8]:
# Returns all the column names
df.columns

Index(['first', 'last', 'email'], dtype='object')

### Accessing data using `iloc`

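- It returns data based on integer location (`iloc`), i.e. by position
- Slicing includes the start but excludes the stop, as in Python and NumPy
- When a single row is selected, the returned Series is indexed by the column names
- When a single column is selected, the returned Series keeps the DataFrame's row index (here, the default integer index)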
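
The behaviour of `iloc` can be checked on a small frame. This is a sketch with a hypothetical `demo` frame whose row labels are strings, so positions and labels cannot be confused:

```python
import pandas as pd

# Hypothetical frame; string row labels make positions easy to tell apart from labels
demo = pd.DataFrame(
    {"first": ["Joey", "Chandler", "Ross"], "last": ["Tribiyani", "Bing", "Geller"]},
    index=["a", "b", "c"],
)

row = demo.iloc[0]     # one row -> Series indexed by the column names
col = demo.iloc[:, 0]  # one column -> Series indexed by the row labels
part = demo.iloc[0:2]  # positions 0 and 1; the stop position 2 is excluded

print(list(row.index))   # ['first', 'last']
print(list(col.index))   # ['a', 'b', 'c']
print(list(part.index))  # ['a', 'b']
```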

In [9]:
# The index of the returned Series is the column names
df.iloc[0]

first                   Joey
last               Tribiyani
email    howudoing@gmail.com
Name: 0, dtype: object

In [10]:
# getting multiple rows using iloc
df.iloc[[0, 2]]

Unnamed: 0,first,last,email
0,Joey,Tribiyani,howudoing@gmail.com
2,Ross,Geller,hiiii@gmail.com


In [11]:
# Getting multiple rows of a specified column
df.iloc[[0, 2], 2]

0    howudoing@gmail.com
2        hiiii@gmail.com
Name: email, dtype: object

In [12]:
# Getting multiple rows of multiple columns
df.iloc[[0, 2], [0, 1]]

Unnamed: 0,first,last
0,Joey,Tribiyani
2,Ross,Geller


In [13]:
# Slicing works like NumPy slicing (the stop position is excluded)
df.iloc[0:,1:]

Unnamed: 0,last,email
0,Tribiyani,howudoing@gmail.com
1,Bing,waphaaa@gmail.com
2,Geller,hiiii@gmail.com


### Accessing data using `loc`

- It returns data based on labels (`loc`) or a boolean array
- Integers are interpreted as labels of the index, not as positions
- Slicing includes both start and stop
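
With the default index, labels and positions coincide, which hides how `loc` works. The sketch below assumes a hypothetical frame `demo` with the index labels 10, 20 and 30 to make the label-based behaviour visible:

```python
import pandas as pd

# Hypothetical frame with a non-default integer index (10, 20, 30)
demo = pd.DataFrame(
    {
        "first": ["Joey", "Chandler", "Ross"],
        "last": ["Tribiyani", "Bing", "Geller"],
        "email": ["howudoing@gmail.com", "waphaaa@gmail.com", "hiiii@gmail.com"],
    },
    index=[10, 20, 30],
)

by_label = demo.loc[10]                      # the row labeled 10 (position 0)
inclusive = demo.loc[10:20, "first":"last"]  # both end labels are included
by_position = demo.iloc[0:1]                 # for contrast: stop position excluded
bing_email = demo.loc[demo["last"] == "Bing", "email"]  # boolean array selects rows

print(by_label["first"])    # Joey
print(inclusive.shape)      # (2, 2)
print(len(by_position))     # 1
print(bing_email.tolist())  # ['waphaaa@gmail.com']
```

With this index, `demo.loc[0]` would raise a `KeyError`, because no row carries the label 0.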

In [14]:
df

Unnamed: 0,first,last,email
0,Joey,Tribiyani,howudoing@gmail.com
1,Chandler,Bing,waphaaa@gmail.com
2,Ross,Geller,hiiii@gmail.com


In [15]:
# Accessing the row labeled 0 (the first row here)
# NOTE: 0 is treated as an index label, not as an integer position
df.loc[0]

first                   Joey
last               Tribiyani
email    howudoing@gmail.com
Name: 0, dtype: object

In [16]:
# Accessing multiple rows of the dataframe
df.loc[[0, 1]]

Unnamed: 0,first,last,email
0,Joey,Tribiyani,howudoing@gmail.com
1,Chandler,Bing,waphaaa@gmail.com


In [17]:
# Accessing multiple rows of specific column
df.loc[[0, 1], "email"]

0    howudoing@gmail.com
1      waphaaa@gmail.com
Name: email, dtype: object

In [18]:
# Label-based slicing: unlike NumPy slicing, the stop label is included
df.loc[1:, "last":]

Unnamed: 0,last,email
1,Bing,waphaaa@gmail.com
2,Geller,hiiii@gmail.com


## DataFrames from the Stack Overflow Developer Survey 2021

In [19]:
df = pd.read_csv("data/survey_results_public.csv")
schema_df = pd.read_csv("data/survey_results_schema.csv")

In [20]:
pd.set_option("display.max_columns", 48)
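
`pd.set_option` changes the setting for the rest of the session. If the wider display is only needed for one cell, `pd.option_context` is one alternative; the sketch below only reads the option back to show that the change is temporary (the variable names are illustrative):

```python
import pandas as pd

before = pd.get_option("display.max_columns")

# The option is changed only inside the with-block
with pd.option_context("display.max_columns", 48):
    inside = pd.get_option("display.max_columns")

after = pd.get_option("display.max_columns")

print(inside)           # 48
print(after == before)  # True
```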

In [21]:
df.head()

Unnamed: 0,ResponseId,MainBranch,Employment,Country,US_State,UK_Country,EdLevel,Age1stCode,LearnCode,YearsCode,YearsCodePro,DevType,OrgSize,Currency,CompTotal,CompFreq,LanguageHaveWorkedWith,LanguageWantToWorkWith,DatabaseHaveWorkedWith,DatabaseWantToWorkWith,PlatformHaveWorkedWith,PlatformWantToWorkWith,WebframeHaveWorkedWith,WebframeWantToWorkWith,MiscTechHaveWorkedWith,MiscTechWantToWorkWith,ToolsTechHaveWorkedWith,ToolsTechWantToWorkWith,NEWCollabToolsHaveWorkedWith,NEWCollabToolsWantToWorkWith,OpSys,NEWStuck,NEWSOSites,SOVisitFreq,SOAccount,SOPartFreq,SOComm,NEWOtherComms,Age,Gender,Trans,Sexuality,Ethnicity,Accessibility,MentalHealth,SurveyLength,SurveyEase,ConvertedCompYearly
0,1,I am a developer by profession,"Independent contractor, freelancer, or self-em...",Slovakia,,,"Secondary school (e.g. American high school, G...",18 - 24 years,Coding Bootcamp;Other online resources (ex: vi...,,,"Developer, mobile",20 to 99 employees,EUR European Euro,4800.0,Monthly,C++;HTML/CSS;JavaScript;Objective-C;PHP;Swift,Swift,PostgreSQL;SQLite,SQLite,,,Laravel;Symfony,,,,,,PHPStorm;Xcode,Atom;Xcode,MacOS,Call a coworker or friend;Visit Stack Overflow...,Stack Overflow,Multiple times per day,Yes,A few times per month or weekly,"Yes, definitely",No,25-34 years old,Man,No,Straight / Heterosexual,White or of European descent,None of the above,None of the above,Appropriate in length,Easy,62268.0
1,2,I am a student who is learning to code,"Student, full-time",Netherlands,,,"Bachelor’s degree (B.A., B.S., B.Eng., etc.)",11 - 17 years,"Other online resources (ex: videos, blogs, etc...",7.0,,,,,,,JavaScript;Python,,PostgreSQL,,,,Angular;Flask;Vue.js,,Cordova,,Docker;Git;Yarn,Git,Android Studio;IntelliJ;Notepad++;PyCharm,,Windows,Visit Stack Overflow;Google it,Stack Overflow,Daily or almost daily,Yes,Daily or almost daily,"Yes, definitely",No,18-24 years old,Man,No,Straight / Heterosexual,White or of European descent,None of the above,None of the above,Appropriate in length,Easy,
2,3,"I am not primarily a developer, but I write co...","Student, full-time",Russian Federation,,,"Bachelor’s degree (B.A., B.S., B.Eng., etc.)",11 - 17 years,"Other online resources (ex: videos, blogs, etc...",,,,,,,,Assembly;C;Python;R;Rust,Julia;Python;Rust,SQLite,SQLite,Heroku,,Flask,Flask,NumPy;Pandas;TensorFlow;Torch/PyTorch,Keras;NumPy;Pandas;TensorFlow;Torch/PyTorch,,,IPython/Jupyter;PyCharm;RStudio;Sublime Text;V...,IPython/Jupyter;RStudio;Sublime Text;Visual St...,MacOS,Visit Stack Overflow;Google it;Watch help / tu...,Stack Overflow;Stack Exchange,Multiple times per day,Yes,Multiple times per day,"Yes, definitely",Yes,18-24 years old,Man,No,Prefer not to say,Prefer not to say,None of the above,None of the above,Appropriate in length,Easy,
3,4,I am a developer by profession,Employed full-time,Austria,,,"Master’s degree (M.A., M.S., M.Eng., MBA, etc.)",11 - 17 years,,,,"Developer, front-end",100 to 499 employees,EUR European Euro,,Monthly,JavaScript;TypeScript,JavaScript;TypeScript,,,,,Angular;jQuery,Angular;jQuery,,,,,,,Windows,Call a coworker or friend;Visit Stack Overflow...,Stack Overflow,Daily or almost daily,Yes,Daily or almost daily,Neutral,No,35-44 years old,Man,No,Straight / Heterosexual,White or of European descent,I am deaf / hard of hearing,,Appropriate in length,Neither easy nor difficult,
4,5,I am a developer by profession,"Independent contractor, freelancer, or self-em...",United Kingdom of Great Britain and Northern I...,,England,"Master’s degree (M.A., M.S., M.Eng., MBA, etc.)",5 - 10 years,Friend or family member,17.0,10.0,"Developer, desktop or enterprise applications;...","Just me - I am a freelancer, sole proprietor, ...",GBP\tPound sterling,,,Bash/Shell;HTML/CSS;Python;SQL,Bash/Shell;HTML/CSS;Python;SQL,Elasticsearch;PostgreSQL;Redis,Cassandra;Elasticsearch;PostgreSQL;Redis,,,Flask,Flask,Apache Spark;Hadoop;NumPy;Pandas,Hadoop;NumPy;Pandas,Docker;Git;Kubernetes;Yarn,Docker;Git;Kubernetes;Yarn,Atom;IPython/Jupyter;Notepad++;PyCharm;Vim,Atom;IPython/Jupyter;Notepad++;PyCharm;Vim;Vis...,Linux-based,Visit Stack Overflow;Go for a walk or other ph...,Stack Overflow;Stack Exchange,Daily or almost daily,Yes,A few times per week,"Yes, somewhat",No,25-34 years old,Man,No,,White or of European descent,None of the above,,Appropriate in length,Easy,


In [22]:
df.tail()

Unnamed: 0,ResponseId,MainBranch,Employment,Country,US_State,UK_Country,EdLevel,Age1stCode,LearnCode,YearsCode,YearsCodePro,DevType,OrgSize,Currency,CompTotal,CompFreq,LanguageHaveWorkedWith,LanguageWantToWorkWith,DatabaseHaveWorkedWith,DatabaseWantToWorkWith,PlatformHaveWorkedWith,PlatformWantToWorkWith,WebframeHaveWorkedWith,WebframeWantToWorkWith,MiscTechHaveWorkedWith,MiscTechWantToWorkWith,ToolsTechHaveWorkedWith,ToolsTechWantToWorkWith,NEWCollabToolsHaveWorkedWith,NEWCollabToolsWantToWorkWith,OpSys,NEWStuck,NEWSOSites,SOVisitFreq,SOAccount,SOPartFreq,SOComm,NEWOtherComms,Age,Gender,Trans,Sexuality,Ethnicity,Accessibility,MentalHealth,SurveyLength,SurveyEase,ConvertedCompYearly
83434,83435,I am a developer by profession,Employed full-time,United States of America,Texas,,"Bachelor’s degree (B.A., B.S., B.Eng., etc.)",11 - 17 years,"Other online resources (ex: videos, blogs, etc...",6,5,"Developer, back-end",20 to 99 employees,USD\tUnited States dollar,160500.0,Yearly,Clojure;Kotlin;SQL,Clojure,Oracle;SQLite,SQLite,AWS,AWS,,,,,Docker;Git,Git;Kubernetes,IntelliJ;Sublime Text;Vim;Visual Studio Code,Sublime Text;Vim,MacOS,Call a coworker or friend;Google it,Stack Overflow;Stack Exchange,A few times per week,No,,"No, not at all",No,25-34 years old,Man,No,Straight / Heterosexual,White or of European descent,None of the above,I have a concentration and/or memory disorder ...,Appropriate in length,Easy,160500.0
83435,83436,I am a developer by profession,"Independent contractor, freelancer, or self-em...",Benin,,,"Bachelor’s degree (B.A., B.S., B.Eng., etc.)",11 - 17 years,"Other online resources (ex: videos, blogs, etc...",4,2,"Developer, full-stack","Just me - I am a freelancer, sole proprietor, ...",XOF\tWest African CFA franc,200000.0,Monthly,,,Firebase;MariaDB;MySQL;PostgreSQL;Redis;SQLite,Firebase;MariaDB;MongoDB;MySQL;PostgreSQL;Redi...,,,Django;jQuery;Laravel;React.js;Ruby on Rails,Django;Express;jQuery;Laravel;React.js;Ruby on...,Flutter;Qt,,Git;Unity 3D;Unreal Engine,Docker;Git;Kubernetes,Android Studio;Eclipse;Emacs;IntelliJ;NetBeans...,Emacs;IntelliJ;PHPStorm;PyCharm;RStudio;Sublim...,Linux-based,Call a coworker or friend;Visit Stack Overflow...,Stack Overflow;Stack Exchange,Multiple times per day,Yes,I have never participated in Q&A on Stack Over...,"Yes, somewhat",No,18-24 years old,Man,No,Straight / Heterosexual,Black or of African descent,None of the above,None of the above,Appropriate in length,Easy,3960.0
83436,83437,I am a developer by profession,Employed full-time,United States of America,New Jersey,,"Secondary school (e.g. American high school, G...",11 - 17 years,School,10,4,Data scientist or machine learning specialist;...,"10,000 or more employees",USD\tUnited States dollar,1800.0,Weekly,Groovy;Java;Python,Java;Python,DynamoDB;Elasticsearch;MongoDB;PostgreSQL;Redis,DynamoDB;Redis,AWS;Google Cloud Platform,AWS,FastAPI;Flask,FastAPI;Flask,Hadoop;Keras;NumPy;Pandas,Apache Spark;Hadoop;Keras;NumPy;Pandas;TensorFlow,Ansible;Docker;Git;Terraform,Docker;Git;Kubernetes;Terraform,Android Studio;Eclipse;IntelliJ;IPython/Jupyte...,IntelliJ;IPython/Jupyter;Notepad++;Vim,Windows,Call a coworker or friend;Visit Stack Overflow...,Stack Overflow;Stack Exchange,A few times per week,Yes,I have never participated in Q&A on Stack Over...,"No, not really",No,25-34 years old,Man,No,,White or of European descent,None of the above,None of the above,Appropriate in length,Neither easy nor difficult,90000.0
83437,83438,I am a developer by profession,Employed full-time,Canada,,,"Bachelor’s degree (B.A., B.S., B.Eng., etc.)",11 - 17 years,Online Courses or Certification;Books / Physic...,5,3,"Developer, back-end",20 to 99 employees,CAD\tCanadian dollar,90000.0,Monthly,Bash/Shell;JavaScript;Node.js;Python,Go;Rust,Cassandra;Elasticsearch;MongoDB;PostgreSQL;Redis,,Heroku,AWS;DigitalOcean,Django;Express;Flask;React.js,,NumPy;Pandas;TensorFlow;Torch/PyTorch,NumPy;Pandas;TensorFlow;Torch/PyTorch,Ansible;Docker;Git;Terraform,Kubernetes;Terraform,PyCharm;Sublime Text,,MacOS,Call a coworker or friend;Visit Stack Overflow...,Stack Overflow,A few times per month or weekly,Yes,Less than once per month or monthly,"No, not really",No,25-34 years old,Man,No,Straight / Heterosexual,White or of European descent,None of the above,I have a mood or emotional disorder (e.g. depr...,Appropriate in length,Neither easy nor difficult,816816.0
83438,83439,I am a developer by profession,Employed full-time,Brazil,,,"Professional degree (JD, MD, etc.)",11 - 17 years,School,14,4,"Developer, front-end;Developer, full-stack;Dev...",I don’t know,BRL\tBrazilian real,7700.0,Monthly,Delphi;Elixir;HTML/CSS;Java;JavaScript,Elixir;HTML/CSS;Java;JavaScript;Node.js;PHP;SQ...,Oracle;PostgreSQL,Elasticsearch;MongoDB;MySQL;Oracle;PostgreSQL;...,Microsoft Azure,AWS,Angular;Spring,Express;Laravel;Spring;Symfony,,,Docker;Git,Docker;Git;Kubernetes,IntelliJ;Visual Studio Code,IntelliJ;PHPStorm;Visual Studio Code,Linux-based,Call a coworker or friend;Visit Stack Overflow...,Stack Overflow;Stack Exchange;Stack Overflow f...,A few times per week,Yes,A few times per week,"Yes, somewhat",No,18-24 years old,Man,No,Straight / Heterosexual,Hispanic or Latino/a/x,None of the above,None of the above,Appropriate in length,Easy,21168.0


In [23]:
schema_df.head()

Unnamed: 0,qid,qname,question,force_resp,type,selector
0,QID16,S0,"<div><span style=""font-size:19px;""><strong>Hel...",False,DB,TB
1,QID12,MetaInfo,Browser Meta Info,False,Meta,Browser
2,QID1,S1,"<span style=""font-size:22px; font-family: aria...",False,DB,TB
3,QID2,MainBranch,Which of the following options best describes ...,True,MC,SAVR
4,QID24,Employment,Which of the following best describes your cur...,False,MC,MAVR


In [24]:
df.shape

(83439, 48)

In [25]:
schema_df.shape

(48, 6)

In [26]:
df.columns

Index(['ResponseId', 'MainBranch', 'Employment', 'Country', 'US_State',
       'UK_Country', 'EdLevel', 'Age1stCode', 'LearnCode', 'YearsCode',
       'YearsCodePro', 'DevType', 'OrgSize', 'Currency', 'CompTotal',
       'CompFreq', 'LanguageHaveWorkedWith', 'LanguageWantToWorkWith',
       'DatabaseHaveWorkedWith', 'DatabaseWantToWorkWith',
       'PlatformHaveWorkedWith', 'PlatformWantToWorkWith',
       'WebframeHaveWorkedWith', 'WebframeWantToWorkWith',
       'MiscTechHaveWorkedWith', 'MiscTechWantToWorkWith',
       'ToolsTechHaveWorkedWith', 'ToolsTechWantToWorkWith',
       'NEWCollabToolsHaveWorkedWith', 'NEWCollabToolsWantToWorkWith', 'OpSys',
       'NEWStuck', 'NEWSOSites', 'SOVisitFreq', 'SOAccount', 'SOPartFreq',
       'SOComm', 'NEWOtherComms', 'Age', 'Gender', 'Trans', 'Sexuality',
       'Ethnicity', 'Accessibility', 'MentalHealth', 'SurveyLength',
       'SurveyEase', 'ConvertedCompYearly'],
      dtype='object')

In [27]:
df["Gender"]

0        Man
1        Man
2        Man
3        Man
4        Man
        ... 
83434    Man
83435    Man
83436    Man
83437    Man
83438    Man
Name: Gender, Length: 83439, dtype: object

In [28]:
# .value_counts() returns a Series of the unique values with their counts
df["Gender"].value_counts()

Man                                                                                   74817
Woman                                                                                  4120
Prefer not to say                                                                      1442
Non-binary, genderqueer, or gender non-conforming                                       690
Or, in your own words:                                                                  413
Man;Or, in your own words:                                                              268
Man;Non-binary, genderqueer, or gender non-conforming                                   252
Woman;Non-binary, genderqueer, or gender non-conforming                                 147
Man;Woman                                                                                41
Non-binary, genderqueer, or gender non-conforming;Or, in your own words:                 21
Man;Woman;Non-binary, genderqueer, or gender non-conforming                     
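
Two optional arguments of `value_counts` are often useful with survey data: `dropna=False` keeps missing answers in the tally, and `normalize=True` returns proportions instead of counts. A small sketch on made-up answers (the `answers` Series is an assumption for illustration, not survey data):

```python
import pandas as pd

# Made-up answers, including one missing value
answers = pd.Series(["Man", "Woman", "Man", None, "Man"], name="Gender")

counts = answers.value_counts()                    # missing values are excluded by default
with_missing = answers.value_counts(dropna=False)  # missing values get their own row
shares = answers.value_counts(normalize=True)      # proportions of the non-missing answers

print(counts["Man"])       # 3
print(with_missing.sum())  # 5
print(shares["Man"])       # 0.75
```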

In [29]:
df.loc[0, "Gender"]

'Man'

In [30]:
df.loc[:2, "Age":"MentalHealth"]

Unnamed: 0,Age,Gender,Trans,Sexuality,Ethnicity,Accessibility,MentalHealth
0,25-34 years old,Man,No,Straight / Heterosexual,White or of European descent,None of the above,None of the above
1,18-24 years old,Man,No,Straight / Heterosexual,White or of European descent,None of the above,None of the above
2,18-24 years old,Man,No,Prefer not to say,Prefer not to say,None of the above,None of the above
