# Python Machine Learning - Mastering Pandas
<hr>

### YouTube Playlist
<a href="https://youtube.com/playlist?list=PL-osiE80TeTsWmV9i9c58mdDCSskIFdDS">Pandas Tutorial Playlist</a>

### Notebook Index

- [Section 1 - Importing Data Using Pandas](#section_1)
- [Section 2 - DataFrames and Row/Column Selection](#section_2)
- [Section 3 - Indexing with Pandas DataFrames](#section_3)
- [Section 4 - Filtering Using Conditionals with Pandas](#section_4)
- [Section 5 - Updating Rows and Columns - Modifying Data Within DataFrames](#section_5)
- [Section 6 - Add/Remove Rows and Columns from DataFrame](#section_6)
- [Section 7 - Sorting Data](#section_7)
- [Section 8 - Grouping and Aggregating - Analyzing and Exploring Your Data](#section_8)
- [Section 9 - Cleaning Data - Casting Datatypes and Handling Missing Values](#section_9)
- [Section 10 - Working with Dates and Time Series Data](#section_10)
- [Section 11 - Reading/Writing Data to Different Sources - Excel, JSON, SQL, etc.](#section_11)

<hr>
<a id='section_1'></a>

### Section 1 - Importing Data Using Pandas
- <a href="https://youtu.be/ZyhVh-qRZPA">YouTube Tutorial</a>

<br>
Import the Pandas library.

In [1]:
import pandas as pd

<br>
Import the survey results from a CSV file.<br>
Import the schema, which contains the question text behind every column in the dataset.<br>

In [2]:
df = pd.read_csv("developer_survey_2019/survey_results_public.csv")

schema_df = pd.read_csv('developer_survey_2019/survey_results_schema.csv')
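<br>
The survey files are not included here, so the following minimal sketch reads an inline CSV string through <b>io.StringIO</b> instead of a file path; the column names and values are made up for illustration. <b>pd.read_csv</b> accepts a file-like object as well as a path.

```python
import io

import pandas as pd

# Hypothetical CSV contents standing in for survey_results_public.csv
csv_text = """Respondent,Hobbyist,Country
1,Yes,United Kingdom
2,No,Bosnia and Herzegovina
3,Yes,Thailand
"""

# read_csv accepts any file-like object, not only a path on disk
sample_df = pd.read_csv(io.StringIO(csv_text))
print(sample_df.shape)  # (3, 3)
```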

<br>
<b>pd.set_option</b> changes the maximum number of rows and columns Pandas displays before truncating the output.

In [3]:
pd.set_option('display.max_columns', 85) # Show up to 85 columns before truncating the display
pd.set_option('display.max_rows', 85) # Show up to 85 rows before truncating the display
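<br>
<b>set_option</b> changes the setting for the rest of the session. As a sketch, the current value can be read back with <b>pd.get_option</b>, and a change can be limited to a single block with the <b>pd.option_context</b> context manager (the value 200 is arbitrary):

```python
import pandas as pd

# Read the current setting
original = pd.get_option('display.max_rows')

# Temporarily allow more rows; the previous value is restored when the block exits
with pd.option_context('display.max_rows', 200):
    inside = pd.get_option('display.max_rows')

after = pd.get_option('display.max_rows')
print(inside, after == original)  # 200 True
```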

<br>
Display the first five rows of the dataset (<b>head()</b> returns five rows by default).

In [4]:
df.head()

,Respondent,MainBranch,Hobbyist,OpenSourcer,OpenSource,Employment,Country,Student,EdLevel,UndergradMajor,EduOther,OrgSize,DevType,YearsCode,Age1stCode,YearsCodePro,CareerSat,JobSat,MgrIdiot,MgrMoney,MgrWant,JobSeek,LastHireDate,LastInt,FizzBuzz,JobFactors,ResumeUpdate,CurrencySymbol,CurrencyDesc,CompTotal,CompFreq,ConvertedComp,WorkWeekHrs,WorkPlan,WorkChallenge,WorkRemote,WorkLoc,ImpSyn,CodeRev,CodeRevHrs,UnitTests,PurchaseHow,PurchaseWhat,LanguageWorkedWith,LanguageDesireNextYear,DatabaseWorkedWith,DatabaseDesireNextYear,PlatformWorkedWith,PlatformDesireNextYear,WebFrameWorkedWith,WebFrameDesireNextYear,MiscTechWorkedWith,MiscTechDesireNextYear,DevEnviron,OpSys,Containers,BlockchainOrg,BlockchainIs,BetterLife,ITperson,OffOn,SocialMedia,Extraversion,ScreenName,SOVisit1st,SOVisitFreq,SOVisitTo,SOFindAnswer,SOTimeSaved,SOHowMuchTime,SOAccount,SOPartFreq,SOJobs,EntTeams,SOComm,WelcomeChange,SONewContent,Age,Gender,Trans,Sexuality,Ethnicity,Dependents,SurveyLength,SurveyEase
0,1,I am a student who is learning to code,Yes,Never,The quality of OSS and closed source software ...,"Not employed, and not looking for work",United Kingdom,No,Primary/elementary school,,"Taught yourself a new language, framework, or ...",,,4.0,10,,,,,,,,,,,,,,,,,,,,,,,,,,,,,HTML/CSS;Java;JavaScript;Python,C;C++;C#;Go;HTML/CSS;Java;JavaScript;Python;SQL,SQLite,MySQL,MacOS;Windows,Android;Arduino;Windows,Django;Flask,Flask;jQuery,Node.js,Node.js,IntelliJ;Notepad++;PyCharm,Windows,I do not use containers,,,Yes,"Fortunately, someone else has that title",Yes,Twitter,Online,Username,2017,A few times per month or weekly,Find answers to specific questions;Learn how t...,3-5 times per week,Stack Overflow was much faster,31-60 minutes,No,,"No, I didn't know that Stack Overflow had a jo...","No, and I don't know what those are",Neutral,Just as welcome now as I felt last year,Tech articles written by other developers;Indu...,14.0,Man,No,Straight / Heterosexual,,No,Appropriate in length,Neither easy nor difficult
1,2,I am a student who is learning to code,No,Less than once per year,The quality of OSS and closed source software ...,"Not employed, but looking for work",Bosnia and Herzegovina,"Yes, full-time","Secondary school (e.g. American high school, G...",,Taken an online course in programming or softw...,,"Developer, desktop or enterprise applications;...",,17,,,,,,,I am actively looking for a job,I've never had a job,,,Financial performance or funding status of the...,"Something else changed (education, award, medi...",,,,,,,,,,,,,,,,,C++;HTML/CSS;Python,C++;HTML/CSS;JavaScript;SQL,,MySQL,Windows,Windows,Django,Django,,,Atom;PyCharm,Windows,I do not use containers,,Useful across many domains and could change ma...,Yes,Yes,Yes,Instagram,Online,Username,2017,Daily or almost daily,Find answers to specific questions;Learn how t...,3-5 times per week,Stack Overflow was much faster,11-30 minutes,Yes,A few times per month or weekly,"No, I knew that Stack Overflow had a job board...","No, and I don't know what those are","Yes, somewhat",Just as welcome now as I felt last year,Tech articles written by other developers;Indu...,19.0,Man,No,Straight / Heterosexual,,No,Appropriate in length,Neither easy nor difficult
2,3,"I am not primarily a developer, but I write co...",Yes,Never,The quality of OSS and closed source software ...,Employed full-time,Thailand,No,"Bachelor’s degree (BA, BS, B.Eng., etc.)",Web development or web design,"Taught yourself a new language, framework, or ...",100 to 499 employees,"Designer;Developer, back-end;Developer, front-...",3.0,22,1,Slightly satisfied,Slightly satisfied,Not at all confident,Not sure,Not sure,"I’m not actively looking, but I am open to new...",1-2 years ago,Interview with people in peer roles,No,"Languages, frameworks, and other technologies ...",I was preparing for a job search,THB,Thai baht,23000.0,Monthly,8820.0,40.0,There's no schedule or spec; I work on what se...,Distracting work environment;Inadequate access...,Less than once per month / Never,Home,Average,No,,"No, but I think we should",Not sure,I have little or no influence,HTML/CSS,Elixir;HTML/CSS,PostgreSQL,PostgreSQL,,,,Other(s):,,,Vim;Visual Studio Code,Linux-based,I do not use containers,,,Yes,Yes,Yes,Reddit,In real life (in person),Username,2011,A few times per week,Find answers to specific questions;Learn how t...,6-10 times per week,They were about the same,,Yes,Less than once per month or monthly,Yes,"No, I've heard of them, but I am not part of a...",Neutral,Just as welcome now as I felt last year,Tech meetups or events in your area;Courses on...,28.0,Man,No,Straight / Heterosexual,,Yes,Appropriate in length,Neither easy nor difficult
3,4,I am a developer by profession,No,Never,The quality of OSS and closed source software ...,Employed full-time,United States,No,"Bachelor’s degree (BA, BS, B.Eng., etc.)","Computer science, computer engineering, or sof...",Taken an online course in programming or softw...,100 to 499 employees,"Developer, full-stack",3.0,16,Less than 1 year,Very satisfied,Slightly satisfied,Very confident,No,Not sure,I am not interested in new job opportunities,Less than a year ago,"Write code by hand (e.g., on a whiteboard);Int...",No,"Languages, frameworks, and other technologies ...",I was preparing for a job search,USD,United States dollar,61000.0,Yearly,61000.0,80.0,There's no schedule or spec; I work on what se...,,Less than once per month / Never,Home,A little below average,No,,"No, but I think we should",Developers typically have the most influence o...,I have little or no influence,C;C++;C#;Python;SQL,C;C#;JavaScript;SQL,MySQL;SQLite,MySQL;SQLite,Linux;Windows,Linux;Windows,,,.NET,.NET,Eclipse;Vim;Visual Studio;Visual Studio Code,Windows,I do not use containers,Not at all,"Useful for decentralized currency (i.e., Bitcoin)",Yes,SIGH,Yes,Reddit,In real life (in person),Username,2014,Daily or almost daily,Find answers to specific questions;Pass the ti...,1-2 times per week,Stack Overflow was much faster,31-60 minutes,Yes,Less than once per month or monthly,Yes,"No, and I don't know what those are","No, not really",Just as welcome now as I felt last year,Tech articles written by other developers;Indu...,22.0,Man,No,Straight / Heterosexual,White or of European descent,No,Appropriate in length,Easy
4,5,I am a developer by profession,Yes,Once a month or more often,"OSS is, on average, of HIGHER quality than pro...",Employed full-time,Ukraine,No,"Bachelor’s degree (BA, BS, B.Eng., etc.)","Computer science, computer engineering, or sof...",Taken an online course in programming or softw...,"10,000 or more employees","Academic researcher;Developer, desktop or ente...",16.0,14,9,Very dissatisfied,Slightly dissatisfied,Somewhat confident,Yes,No,I am not interested in new job opportunities,Less than a year ago,"Write any code;Write code by hand (e.g., on a ...",No,"Industry that I'd be working in;Languages, fra...",I was preparing for a job search,UAH,Ukrainian hryvnia,,,,55.0,There is a schedule and/or spec (made by me or...,Being tasked with non-development work;Inadequ...,A few days each month,Office,A little above average,"Yes, because I see value in code review",,"Yes, it's part of our process",Not sure,I have little or no influence,C++;HTML/CSS;Java;JavaScript;Python;SQL;VBA,HTML/CSS;Java;JavaScript;SQL;WebAssembly,Couchbase;MongoDB;MySQL;Oracle;PostgreSQL;SQLite,Couchbase;Firebase;MongoDB;MySQL;Oracle;Postgr...,Android;Linux;MacOS;Slack;Windows,Android;Docker;Kubernetes;Linux;Slack,Django;Express;Flask;jQuery;React.js;Spring,Flask;jQuery;React.js;Spring,Cordova;Node.js,Apache Spark;Hadoop;Node.js;React Native,IntelliJ;Notepad++;Vim,Linux-based,"Outside of work, for personal projects",Not at all,,Yes,Also Yes,Yes,Facebook,In real life (in person),Username,I don't remember,Multiple times per day,Find answers to specific questions,More than 10 times per week,Stack Overflow was much faster,,Yes,A few times per month or weekly,"No, I knew that Stack Overflow had a job board...","No, I've heard of them, but I am not part of a...","Yes, definitely",Just as welcome now as I felt last year,Tech meetups or events in your area;Courses on...,30.0,Man,No,Straight / Heterosexual,White or of European descent;Multiracial,No,Appropriate in length,Easy


<br>
<b>shape</b> returns the number of rows and columns as a tuple. In this case, the DataFrame has 88,883 rows and 85 columns.

In [5]:
df.shape

(88883, 85)

<br>
<b>.info()</b> gives more in-depth information about each column: its name, the number of non-null values, and its data type.

In [6]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 88883 entries, 0 to 88882
Data columns (total 85 columns):
 #   Column                  Non-Null Count  Dtype  
---  ------                  --------------  -----  
 0   Respondent              88883 non-null  int64  
 1   MainBranch              88331 non-null  object 
 2   Hobbyist                88883 non-null  object 
 3   OpenSourcer             88883 non-null  object 
 4   OpenSource              86842 non-null  object 
 5   Employment              87181 non-null  object 
 6   Country                 88751 non-null  object 
 7   Student                 87014 non-null  object 
 8   EdLevel                 86390 non-null  object 
 9   UndergradMajor          75614 non-null  object 
 10  EduOther                84260 non-null  object 
 11  OrgSize                 71791 non-null  object 
 12  DevType                 81335 non-null  object 
 13  YearsCode               87938 non-null  object 
 14  Age1stCode              87634 non-null
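<br>
The Non-Null Count column reports how many values in each column are present. As a sketch, the same numbers come from <b>count()</b>, and the complementary number of missing values per column from <b>isna().sum()</b>; the small DataFrame below is made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample with some missing answers
sample = pd.DataFrame({
    'Hobbyist': ['Yes', 'No', 'Yes'],
    'OrgSize': [np.nan, '100 to 499 employees', np.nan],
})

non_null = sample.count()       # non-null values per column, as in info()
missing = sample.isna().sum()   # missing values per column
print(non_null['OrgSize'], missing['OrgSize'])  # 1 2
```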

In [7]:
schema_df

,Column,QuestionText
0,Respondent,Randomized respondent ID number (not in order ...
1,MainBranch,Which of the following options best describes ...
2,Hobbyist,Do you code as a hobby?
3,OpenSourcer,How often do you contribute to open source?
4,OpenSource,How do you feel about the quality of open sour...
5,Employment,Which of the following best describes your cur...
6,Country,In which country do you currently reside?
7,Student,"Are you currently enrolled in a formal, degree..."
8,EdLevel,Which of the following best describes the high...
9,UndergradMajor,What was your main or most important field of ...


<br>
<hr>

<a id='section_2'></a>
### Section 2 - DataFrames and Row/Column Selection
- <a href="https://youtu.be/zmdjNSmRXF4">YouTube Tutorial</a>

<br>
The following examples use plain Python dictionaries. The first dictionary describes a single person.

In [8]:
person = {
    "first": "Corey",
    "last": "Schafer",
    "email": "coreychafer@gmail.com"
}

<br>
In the next dictionary, each value is a list instead of a single string. This allows multiple records to be stored in one dictionary, much like an Excel table: each key ('first', 'last', 'email') acts as a column name, and its list holds that column's values. The values are retrieved by key. This structure is similar to a <b>Pandas DataFrame</b>, which stores data in tabular form.

In [9]:
people = {
    "first": ["Corey"],
    "last": ["Chafer"],
    "email": ["coreychafer@gmail.com"]
}

<br>
This is what it looks like with multiple entries:

In [10]:
people = {
    "first": ["Corey","Jane", "John"],
    "last": ["Chafer", "Doe", "Doe"],
    "email": ["Corey@gmail.com","jane@gmail.com","john@gmail.com"]
}
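<br>
In this layout every list must have the same length, and the i-th element of each list belongs to the same record. A small sketch that rebuilds each record by zipping the lists together, which mirrors how the rows of the resulting DataFrame line up:

```python
people = {
    "first": ["Corey", "Jane", "John"],
    "last": ["Chafer", "Doe", "Doe"],
    "email": ["Corey@gmail.com", "jane@gmail.com", "john@gmail.com"],
}

# Every column must have the same length to form a table
assert len({len(values) for values in people.values()}) == 1

# Pair up the i-th element of every list to rebuild each record (row)
records = [dict(zip(people.keys(), row)) for row in zip(*people.values())]
print(records[1])  # {'first': 'Jane', 'last': 'Doe', 'email': 'jane@gmail.com'}
```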

<br>
Access all the email addresses in the dictionary by key.

In [11]:
people['email']

['Corey@gmail.com', 'jane@gmail.com', 'john@gmail.com']

<br>
Convert the Python dictionary to a Pandas DataFrame.
<br>
A <b>Pandas DataFrame</b> consists of rows and columns. The data is stored in a tabular format similar to <b>Excel</b>.

In [12]:
df = pd.DataFrame(people)
df

,first,last,email
0,Corey,Chafer,Corey@gmail.com
1,Jane,Doe,jane@gmail.com
2,John,Doe,john@gmail.com


In [13]:
df['email']

0    Corey@gmail.com
1     jane@gmail.com
2     john@gmail.com
Name: email, dtype: object

<br>
Each column of a DataFrame is a <b>Series</b>: a one-dimensional array of values with an index. A Series holds the rows of a single column, whereas a DataFrame (like the dictionary of lists) is two-dimensional, with both rows and columns.

In [14]:
type(df['email'])

pandas.core.series.Series
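<br>
One way to see the 1D/2D difference is the <b>ndim</b> attribute. In this sketch, single brackets return a Series, while a list of column names (double brackets) returns a DataFrame, even when the list holds only one name.

```python
import pandas as pd

df = pd.DataFrame({
    "first": ["Corey", "Jane", "John"],
    "email": ["Corey@gmail.com", "jane@gmail.com", "john@gmail.com"],
})

col = df['email']      # single brackets -> Series (1D)
sub = df[['email']]    # list of names -> DataFrame (2D)

print(type(col).__name__, col.ndim)  # Series 1
print(type(sub).__name__, sub.ndim)  # DataFrame 2
```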

<br>
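You can also use dot notation to access a column in a Pandas DataFrame. It is best to use brackets, though, because a column name may conflict with a built-in DataFrame attribute or method, and dot notation does not work for column names that contain spaces.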

In [15]:
df.email

0    Corey@gmail.com
1     jane@gmail.com
2     john@gmail.com
Name: email, dtype: object
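<br>
A minimal example of the conflict, using a hypothetical column named <b>count</b>: dot notation returns the DataFrame's built-in <b>count</b> method rather than the column, while brackets return the column.

```python
import pandas as pd

df = pd.DataFrame({'count': [3, 5, 7]})

print(callable(df.count))    # True: this is the DataFrame.count method, not the column
print(df['count'].tolist())  # [3, 5, 7]: brackets always select the column
```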

<br>
To access multiple columns, pass a list of column names inside the brackets (hence the double brackets).

In [16]:
df[['last','first']]

,last,first
0,Chafer,Corey
1,Doe,Jane
2,Doe,John


In [17]:
df.columns

Index(['first', 'last', 'email'], dtype='object')

<br>
Select a single row by its integer position with <b>iloc</b>.

In [18]:
df.iloc[0]

first              Corey
last              Chafer
email    Corey@gmail.com
Name: 0, dtype: object

<br>
Select multiple rows by passing a list of integer positions.

In [19]:
df.iloc[[0,2]]

,first,last,email
0,Corey,Chafer,Corey@gmail.com
2,John,Doe,john@gmail.com


<br>
Select multiple rows and one column. With <b>iloc</b>, the column is also given by integer position (2 is the <b>email</b> column).

In [20]:
df.iloc[[0,1],2]

0    Corey@gmail.com
1     jane@gmail.com
Name: email, dtype: object

In [21]:
df.loc[0]

first              Corey
last              Chafer
email    Corey@gmail.com
Name: 0, dtype: object

In [22]:
df.loc[[0,1]]

,first,last,email
0,Corey,Chafer,Corey@gmail.com
1,Jane,Doe,jane@gmail.com


<br>
Column names can be used with <b>loc</b>, but not with <b>iloc</b>, which only accepts integer positions.

In [23]:
df.loc[[0,1],'email']

0    Corey@gmail.com
1     jane@gmail.com
Name: email, dtype: object

<br>
Select multiple rows and columns by label.<br>
Both <b>loc</b> and <b>iloc</b> select rows and columns.<br>
<b>loc</b> uses index labels and column names, while <b>iloc</b> uses integer positions.

In [24]:
df.loc[[0,1],['email','last']]

,email,last
0,Corey@gmail.com,Chafer
1,jane@gmail.com,Doe
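<br>
With the default index (0, 1, 2), labels and positions happen to coincide, so <b>loc</b> and <b>iloc</b> look alike. The sketch below uses made-up index labels (10, 20, 30) to show the difference.

```python
import pandas as pd

df = pd.DataFrame(
    {"first": ["Corey", "Jane", "John"], "last": ["Chafer", "Doe", "Doe"]},
    index=[10, 20, 30],  # hypothetical non-default index labels
)

print(df.loc[20, 'first'])  # Jane: the row whose label is 20
print(df.iloc[1, 0])        # Jane: second row, first column, by position
print(df.iloc[0]['first'])  # Corey: first row by position
```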


In [25]:
df = pd.read_csv('developer_survey_2019/survey_results_public.csv')

In [26]:
df.shape

(88883, 85)

In [27]:
df.columns

Index(['Respondent', 'MainBranch', 'Hobbyist', 'OpenSourcer', 'OpenSource',
       'Employment', 'Country', 'Student', 'EdLevel', 'UndergradMajor',
       'EduOther', 'OrgSize', 'DevType', 'YearsCode', 'Age1stCode',
       'YearsCodePro', 'CareerSat', 'JobSat', 'MgrIdiot', 'MgrMoney',
       'MgrWant', 'JobSeek', 'LastHireDate', 'LastInt', 'FizzBuzz',
       'JobFactors', 'ResumeUpdate', 'CurrencySymbol', 'CurrencyDesc',
       'CompTotal', 'CompFreq', 'ConvertedComp', 'WorkWeekHrs', 'WorkPlan',
       'WorkChallenge', 'WorkRemote', 'WorkLoc', 'ImpSyn', 'CodeRev',
       'CodeRevHrs', 'UnitTests', 'PurchaseHow', 'PurchaseWhat',
       'LanguageWorkedWith', 'LanguageDesireNextYear', 'DatabaseWorkedWith',
       'DatabaseDesireNextYear', 'PlatformWorkedWith',
       'PlatformDesireNextYear', 'WebFrameWorkedWith',
       'WebFrameDesireNextYear', 'MiscTechWorkedWith',
       'MiscTechDesireNextYear', 'DevEnviron', 'OpSys', 'Containers',
       'BlockchainOrg', 'BlockchainIs', 'BetterLife'

<br>
Selecting a single column.

In [28]:
df['Student']

0                    No
1        Yes, full-time
2                    No
3                    No
4                    No
              ...      
88878                No
88879               NaN
88880               NaN
88881               NaN
88882    Yes, full-time
Name: Student, Length: 88883, dtype: object

<br>
Count how many times each unique value appears in the column. Missing values (NaN) are not counted by default.

In [29]:
df["Student"].value_counts()

No                65816
Yes, full-time    15769
Yes, part-time     5429
Name: Student, dtype: int64
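<br>
<b>value_counts</b> also accepts <b>normalize=True</b> to return proportions instead of counts, and <b>dropna=False</b> to count missing values as their own category. A sketch on a few made-up answers:

```python
import pandas as pd

# Hypothetical answers, including one missing value
s = pd.Series(['No', 'Yes, full-time', 'No', None])

counts = s.value_counts()                    # NaN excluded by default
shares = s.value_counts(normalize=True)      # proportions instead of counts
with_missing = s.value_counts(dropna=False)  # missing values counted too

print(counts['No'], len(counts), len(with_missing))  # 2 2 3
```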

<br>
Display all values for the first row.

In [30]:
df.loc[0]

Respondent                                                                1
MainBranch                           I am a student who is learning to code
Hobbyist                                                                Yes
OpenSourcer                                                           Never
OpenSource                The quality of OSS and closed source software ...
Employment                           Not employed, and not looking for work
Country                                                      United Kingdom
Student                                                                  No
EdLevel                                           Primary/elementary school
UndergradMajor                                                          NaN
EduOther                  Taught yourself a new language, framework, or ...
OrgSize                                                                 NaN
DevType                                                                 NaN
YearsCode   

<br>
Display the value in the first row of the <b>Student</b> column.

In [31]:
df.loc[0,'Student']

'No'

<br>
Display the values in the first three rows of the <b>Student</b> column.

In [32]:
df.loc[[0,1,2],'Student']

0                No
1    Yes, full-time
2                No
Name: Student, dtype: object

<br>
Use slicing to select rows 0 through 2. With <b>loc</b>, the end of the slice is inclusive.

In [33]:
df.loc[0:2,'Student']

0                No
1    Yes, full-time
2                No
Name: Student, dtype: object

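<br>
Slicing also works on columns: select rows 0 through 2 and every column from <b>Student</b> through <b>YearsCode</b>, with both ends included.

In [34]: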
df.loc[0:2,'Student':'YearsCode']

,Student,EdLevel,UndergradMajor,EduOther,OrgSize,DevType,YearsCode
0,No,Primary/elementary school,,"Taught yourself a new language, framework, or ...",,,4.0
1,"Yes, full-time","Secondary school (e.g. American high school, G...",,Taken an online course in programming or softw...,,"Developer, desktop or enterprise applications;...",
2,No,"Bachelor’s degree (BA, BS, B.Eng., etc.)",Web development or web design,"Taught yourself a new language, framework, or ...",100 to 499 employees,"Designer;Developer, back-end;Developer, front-...",3.0
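<br>
By contrast, <b>iloc</b> follows normal Python slicing and excludes the end position. A sketch on a small made-up DataFrame:

```python
import pandas as pd

df = pd.DataFrame({'Student': ['No', 'Yes, full-time', 'No', 'No']})

by_label = df.loc[0:2, 'Student']  # labels 0, 1 and 2 (end included)
by_position = df.iloc[0:2, 0]      # positions 0 and 1 (end excluded)

print(len(by_label), len(by_position))  # 3 2
```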


<br>
<hr>

<a id='section_3'></a>
### Section 3 - Indexing with Pandas DataFrames
- <a href="https://youtu.be/W9XjRYFkkyw">YouTube Tutorial</a>

In [35]:
df_people = pd.DataFrame(people)

<br>
Set the index of the DataFrame with the <b>.set_index()</b> method. The <b>inplace=True</b> argument modifies <b>df_people</b> directly instead of returning a modified copy, so the change persists.

In [36]:
df_people.set_index('email', inplace=True)

In [37]:
df_people

email,first,last
Corey@gmail.com,Corey,Chafer
jane@gmail.com,Jane,Doe
john@gmail.com,John,Doe


In [38]:
df_people.index

Index(['Corey@gmail.com', 'jane@gmail.com', 'john@gmail.com'], dtype='object', name='email')
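<br>
Without <b>inplace=True</b>, <b>set_index</b> leaves the original DataFrame untouched and returns a new one, so the result has to be assigned. Many pandas users prefer this reassignment style because it can be chained with other methods. A minimal sketch:

```python
import pandas as pd

df_people = pd.DataFrame({
    "first": ["Corey", "Jane", "John"],
    "email": ["Corey@gmail.com", "jane@gmail.com", "john@gmail.com"],
})

indexed = df_people.set_index('email')  # returns a new DataFrame

print(list(df_people.columns))  # ['first', 'email']: original unchanged
print(indexed.index.name)       # email
```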

<br>
You can use the new index labels to select rows with <b>loc</b>. <b>df_people.loc[0]</b> no longer works because 0 is not an index label any more: the index has been updated to use the email column.

In [39]:
df_people.loc['Corey@gmail.com']

first     Corey
last     Chafer
Name: Corey@gmail.com, dtype: object
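<br>
A small sketch of what happens when <b>loc</b> is given a label that is not in the index: it raises a <b>KeyError</b>, while position-based <b>iloc</b> still works. The two rows are a shortened copy of the example data.

```python
import pandas as pd

df_people = pd.DataFrame(
    {"first": ["Corey", "Jane"], "last": ["Chafer", "Doe"]},
    index=pd.Index(["Corey@gmail.com", "jane@gmail.com"], name="email"),
)

# loc looks up labels, and 0 is not a label in this index
try:
    df_people.loc[0]
    raised = False
except KeyError:
    raised = True

print(raised)                      # True
print(df_people.iloc[0]['first'])  # Corey: iloc still works by position
```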

<br>
Reset the index back to the default integer index. The email values move back into a regular column.

In [40]:
df_people.reset_index(inplace=True)

In [41]:
df_people

,email,first,last
0,Corey@gmail.com,Corey,Chafer
1,jane@gmail.com,Jane,Doe
2,john@gmail.com,John,Doe


<br>
Set the index when importing the data, rather than afterwards, by passing <b>index_col</b> to <b>read_csv</b>.

In [42]:
df = pd.read_csv('developer_survey_2019/survey_results_public.csv', index_col='Respondent')

In [43]:
df.loc[1]

MainBranch                           I am a student who is learning to code
Hobbyist                                                                Yes
OpenSourcer                                                           Never
OpenSource                The quality of OSS and closed source software ...
Employment                           Not employed, and not looking for work
Country                                                      United Kingdom
Student                                                                  No
EdLevel                                           Primary/elementary school
UndergradMajor                                                          NaN
EduOther                  Taught yourself a new language, framework, or ...
OrgSize                                                                 NaN
DevType                                                                 NaN
YearsCode                                                                 4
Age1stCode  
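<br>
A sketch of <b>index_col</b> on a small inline CSV (the contents are made up to mimic the survey file, which is not included here). The chosen column becomes the index and is removed from the regular columns, so rows can be looked up by respondent ID with <b>loc</b>.

```python
import io

import pandas as pd

# Hypothetical CSV contents standing in for survey_results_public.csv
csv_text = """Respondent,Hobbyist,Country
1,Yes,United Kingdom
2,No,Bosnia and Herzegovina
"""

sample_df = pd.read_csv(io.StringIO(csv_text), index_col='Respondent')

print(sample_df.index.name)         # Respondent
print(sample_df.loc[1, 'Country'])  # United Kingdom
```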

In [44]:
df_schema = pd.read_csv('developer_survey_2019/survey_results_schema.csv', index_col='Column')

In [45]:
df_schema

Column,QuestionText
Respondent,Randomized respondent ID number (not in order ...
MainBranch,Which of the following options best describes ...
Hobbyist,Do you code as a hobby?
OpenSourcer,How often do you contribute to open source?
OpenSource,How do you feel about the quality of open sour...
Employment,Which of the following best describes your cur...
Country,In which country do you currently reside?
Student,"Are you currently enrolled in a formal, degree..."
EdLevel,Which of the following best describes the high...
UndergradMajor,What was your main or most important field of ...


<br>
Long text is truncated in the DataFrame display. To view the full text, select a single cell by passing both the row and the column label to <b>loc</b>.

In [46]:
df_schema.loc['Employment','QuestionText']

'Which of the following best describes your current employment status?'
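<br>
For a single cell, <b>.at</b> is a label-based alternative to <b>loc</b>. Alternatively, the truncation can be lifted for printed output with the <b>display.max_colwidth</b> option; passing <b>None</b> to remove the limit works in recent pandas versions (older versions used -1), so treat that detail as version-dependent. A sketch with a one-row stand-in for <b>df_schema</b>:

```python
import pandas as pd

question = ("Which of the following best describes your current "
            "employment status?")
df_schema = pd.DataFrame({'QuestionText': [question]},
                         index=pd.Index(['Employment'], name='Column'))

cell = df_schema.at['Employment', 'QuestionText']  # full string, never truncated

# Show full column text in printed output only inside this block
with pd.option_context('display.max_colwidth', None):
    print(df_schema)
```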

<br>
Sort the index in descending order.

In [47]:
df_schema.sort_index(inplace=True, ascending=False)

In [48]:
df_schema

Column,QuestionText
YearsCodePro,How many years have you coded professionally (...
YearsCode,"Including any education, how many years have y..."
WorkWeekHrs,"On average, how many hours per week do you work?"
WorkRemote,How often do you work remotely?
WorkPlan,How structured or planned is your work?
WorkLoc,Where would you prefer to work?
WorkChallenge,"Of these options, what are your greatest chall..."
WelcomeChange,"Compared to last year, how welcome do you feel..."
WebFrameWorkedWith,Which of the following web frameworks have you...
WebFrameDesireNextYear,Which of the following web frameworks have you...
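<br>
<b>sort_index</b> sorts in ascending order by default; <b>ascending=False</b> reverses it. A sketch with a few made-up labels, using the form that returns a new DataFrame instead of <b>inplace=True</b>:

```python
import pandas as pd

df = pd.DataFrame({'QuestionText': ['q1', 'q2', 'q3']},
                  index=['WorkLoc', 'Age', 'Country'])

print(df.sort_index().index.tolist())                 # ['Age', 'Country', 'WorkLoc']
print(df.sort_index(ascending=False).index.tolist())  # ['WorkLoc', 'Country', 'Age']
```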


<br>
<hr>
<a id='section_4'></a>

### Section 4 - Filtering Using Conditionals with Pandas
- <a href="https://youtu.be/Lw2rlcxScZY">YouTube Tutorial</a>

In [49]:
df_people

,email,first,last
0,Corey@gmail.com,Corey,Chafer
1,jane@gmail.com,Jane,Doe
2,john@gmail.com,John,Doe


<br>
Filter using a mask: a boolean Series with one True/False value per row. Saving the condition to a variable lets you reuse it to filter the DataFrame later.

In [50]:
filt = (df_people['last'] == 'Doe')
filt

0    False
1     True
2     True
Name: last, dtype: bool

<br>
Passing the mask inside brackets returns only the rows where it is True. Writing the condition inline, as in <b>df_people[(df_people['last'] == 'Doe')]</b>, has the same effect but is harder to read and to update later.

In [51]:
df_people[filt]

,email,first,last
1,jane@gmail.com,Jane,Doe
2,john@gmail.com,John,Doe


<br>
Passing the mask to <b>loc</b> gives the same result.

In [52]:
df_people.loc[filt]

,email,first,last
1,jane@gmail.com,Jane,Doe
2,john@gmail.com,John,Doe


<br>
With <b>loc</b>, the first argument selects rows and the second selects columns. Here the mask picks the rows and <b>'email'</b> picks the column.

In [53]:
df_people.loc[filt, 'email']

1    jane@gmail.com
2    john@gmail.com
Name: email, dtype: object

<br>
Combine multiple conditions with <b>&</b> (and), wrapping each condition in parentheses.

In [54]:
filt = (df_people['last'] == 'Doe') & (df_people['first'] == 'Jane')

In [55]:
df_people.loc[filt]

,email,first,last
1,jane@gmail.com,Jane,Doe
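<br>
The <b>|</b> operator works the same way for "or". The parentheses are required because <b>&</b> and <b>|</b> bind more tightly than <b>==</b> in Python. A sketch on the same three people:

```python
import pandas as pd

df_people = pd.DataFrame({
    "first": ["Corey", "Jane", "John"],
    "last": ["Chafer", "Doe", "Doe"],
})

# Rows where the last name is Chafer OR the first name is John
filt = (df_people['last'] == 'Chafer') | (df_people['first'] == 'John')
print(df_people.loc[filt, 'first'].tolist())  # ['Corey', 'John']
```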


<br>
The tilde (<b>~</b>) negates the mask. In this case, it returns every row except those matching <b>(df_people['last'] == 'Doe') & (df_people['first'] == 'Jane')</b>.

In [56]:
df_people.loc[~filt]

,email,first,last
0,Corey@gmail.com,Corey,Chafer
2,john@gmail.com,John,Doe
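<br>
<b>~</b> can negate any mask. As a sketch, combined with <b>isin</b> (which checks each value for membership in a list), it keeps the rows whose value is not in the list:

```python
import pandas as pd

df_people = pd.DataFrame({
    "first": ["Corey", "Jane", "John"],
    "last": ["Chafer", "Doe", "Doe"],
})

in_list = df_people['first'].isin(['Jane', 'John'])
print(df_people.loc[~in_list, 'first'].tolist())  # ['Corey']
```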


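<br>
The same technique works on the survey data. The following mask selects respondents whose <b>ConvertedComp</b> (annual compensation converted to USD) is above 60,000.

In [57]: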
high_salary = (df['ConvertedComp'] > 60000)

In [58]:
df.loc[high_salary]

Respondent,MainBranch,Hobbyist,OpenSourcer,OpenSource,Employment,Country,Student,EdLevel,UndergradMajor,EduOther,OrgSize,DevType,YearsCode,Age1stCode,YearsCodePro,CareerSat,JobSat,MgrIdiot,MgrMoney,MgrWant,JobSeek,LastHireDate,LastInt,FizzBuzz,JobFactors,ResumeUpdate,CurrencySymbol,CurrencyDesc,CompTotal,CompFreq,ConvertedComp,WorkWeekHrs,WorkPlan,WorkChallenge,WorkRemote,WorkLoc,ImpSyn,CodeRev,CodeRevHrs,UnitTests,PurchaseHow,PurchaseWhat,LanguageWorkedWith,LanguageDesireNextYear,DatabaseWorkedWith,DatabaseDesireNextYear,PlatformWorkedWith,PlatformDesireNextYear,WebFrameWorkedWith,WebFrameDesireNextYear,MiscTechWorkedWith,MiscTechDesireNextYear,DevEnviron,OpSys,Containers,BlockchainOrg,BlockchainIs,BetterLife,ITperson,OffOn,SocialMedia,Extraversion,ScreenName,SOVisit1st,SOVisitFreq,SOVisitTo,SOFindAnswer,SOTimeSaved,SOHowMuchTime,SOAccount,SOPartFreq,SOJobs,EntTeams,SOComm,WelcomeChange,SONewContent,Age,Gender,Trans,Sexuality,Ethnicity,Dependents,SurveyLength,SurveyEase
4,I am a developer by profession,No,Never,The quality of OSS and closed source software ...,Employed full-time,United States,No,"Bachelor’s degree (BA, BS, B.Eng., etc.)","Computer science, computer engineering, or sof...",Taken an online course in programming or softw...,100 to 499 employees,"Developer, full-stack",3,16,Less than 1 year,Very satisfied,Slightly satisfied,Very confident,No,Not sure,I am not interested in new job opportunities,Less than a year ago,"Write code by hand (e.g., on a whiteboard);Int...",No,"Languages, frameworks, and other technologies ...",I was preparing for a job search,USD,United States dollar,61000.0,Yearly,61000.0,80.00,There's no schedule or spec; I work on what se...,,Less than once per month / Never,Home,A little below average,No,,"No, but I think we should",Developers typically have the most influence o...,I have little or no influence,C;C++;C#;Python;SQL,C;C#;JavaScript;SQL,MySQL;SQLite,MySQL;SQLite,Linux;Windows,Linux;Windows,,,.NET,.NET,Eclipse;Vim;Visual Studio;Visual Studio Code,Windows,I do not use containers,Not at all,"Useful for decentralized currency (i.e., Bitcoin)",Yes,SIGH,Yes,Reddit,In real life (in person),Username,2014,Daily or almost daily,Find answers to specific questions;Pass the ti...,1-2 times per week,Stack Overflow was much faster,31-60 minutes,Yes,Less than once per month or monthly,Yes,"No, and I don't know what those are","No, not really",Just as welcome now as I felt last year,Tech articles written by other developers;Indu...,22.0,Man,No,Straight / Heterosexual,White or of European descent,No,Appropriate in length,Easy
6,"I am not primarily a developer, but I write co...",Yes,Never,The quality of OSS and closed source software ...,Employed full-time,Canada,No,"Bachelor’s degree (BA, BS, B.Eng., etc.)",Mathematics or statistics,Taken an online course in programming or softw...,,Data or business analyst;Data scientist or mac...,13,15,3,Very satisfied,Slightly satisfied,Very confident,No,Yes,I am not interested in new job opportunities,1-2 years ago,Write any code;Complete a take-home project;In...,No,Financial performance or funding status of the...,I heard about a job opportunity (from a recrui...,CAD,Canadian dollar,40000.0,Monthly,366420.0,15.00,There's no schedule or spec; I work on what se...,,A few days each month,Home,A little above average,No,,"Yes, it's not part of our process but the deve...",Not sure,I have little or no influence,Java;R;SQL,Python;Scala;SQL,MongoDB;PostgreSQL,PostgreSQL,Android;Google Cloud Platform;Linux;Windows,Android;Google Cloud Platform;Linux;Windows,,,Hadoop,Hadoop;Pandas;TensorFlow;Unity 3D,Android Studio;Eclipse;PyCharm;RStudio;Visual ...,Windows,I do not use containers,Not at all,,No,Yes,No,YouTube,In real life (in person),Login,2011,A few times per month or weekly,Find answers to specific questions,Less than once per week,Stack Overflow was slightly faster,60+ minutes,Yes,I have never participated in Q&A on Stack Over...,"No, I knew that Stack Overflow had a job board...","No, and I don't know what those are","No, not really",Just as welcome now as I felt last year,Tech articles written by other developers;Indu...,28.0,Man,No,Straight / Heterosexual,East Asian,No,Too long,Neither easy nor difficult
9,I am a developer by profession,Yes,Once a month or more often,The quality of OSS and closed source software ...,Employed full-time,New Zealand,No,Some college/university study without earning ...,"Computer science, computer engineering, or sof...",Taken an online course in programming or softw...,10 to 19 employees,"Database administrator;Developer, back-end;Dev...",12,11,4,Slightly satisfied,Slightly satisfied,Somewhat confident,No,Not sure,"I’m not actively looking, but I am open to new...",Less than a year ago,Write any code;Interview with people in peer r...,Yes,Financial performance or funding status of the...,I was preparing for a job search,NZD,New Zealand dollar,138000.0,Yearly,95179.0,32.00,There is a schedule and/or spec (made by me or...,Being tasked with non-development work;Inadequ...,Less than once per month / Never,Office,A little above average,"Yes, because I see value in code review",12.0,"Yes, it's not part of our process but the deve...",Not sure,I have some influence,Bash/Shell/PowerShell;C#;HTML/CSS;JavaScript;P...,Bash/Shell/PowerShell;C;HTML/CSS;JavaScript;Ru...,DynamoDB;PostgreSQL;SQLite,PostgreSQL;Redis;SQLite,AWS;Docker;Heroku;Linux;MacOS;Slack,AWS;Docker;Heroku;Linux;MacOS;Slack;Other(s):,Express;Ruby on Rails;Other(s):,Express;Ruby on Rails;Other(s):,Node.js;Unity 3D,Node.js,Vim,MacOS,Development;Testing;Production,Not at all,An irresponsible use of resources,No,SIGH,Yes,Twitter,In real life (in person),Username,2013,Daily or almost daily,Find answers to specific questions;Contribute ...,3-5 times per week,They were about the same,,Yes,Less than once per month or monthly,"No, I knew that Stack Overflow had a job board...","No, and I don't know what those are","Yes, somewhat",Just as welcome now as I felt last year,,23.0,Man,No,Bisexual,White or of European descent,No,Appropriate in length,Neither easy nor difficult
13,I am a developer by profession,Yes,Less than once a month but more than once per ...,"OSS is, on average, of HIGHER quality than pro...",Employed full-time,United States,No,"Master’s degree (MA, MS, M.Eng., MBA, etc.)","Computer science, computer engineering, or sof...",Taken an online course in programming or softw...,10 to 19 employees,Data or business analyst;Database administrato...,17,11,8,Very satisfied,Very satisfied,,,,I am not interested in new job opportunities,3-4 years ago,Complete a take-home project;Interview with pe...,Yes,"Languages, frameworks, and other technologies ...",I was preparing for a job search,USD,United States dollar,90000.0,Yearly,90000.0,40.00,There is a schedule and/or spec (made by me or...,"Meetings;Non-work commitments (parenting, scho...",All or almost all the time (I'm full-time remote),Home,A little above average,"Yes, because I see value in code review",5.0,"No, but I think we should",Developers and management have nearly equal in...,I have a great deal of influence,Bash/Shell/PowerShell;HTML/CSS;JavaScript;PHP;...,Bash/Shell/PowerShell;HTML/CSS;JavaScript;Rust...,Couchbase;DynamoDB;Firebase;MySQL,Firebase;MySQL;Redis,Android;AWS;Docker;IBM Cloud or Watson;iOS;Lin...,Android;AWS;Docker;IBM Cloud or Watson;Linux;S...,Angular/Angular.js;ASP.NET;Express;jQuery;Vue.js,Express;Vue.js,Node.js;Xamarin,Node.js;TensorFlow,Vim;Visual Studio;Visual Studio Code;Xcode,Windows,Development;Testing;Production,Not at all,"Useful for decentralized currency (i.e., Bitcoin)",Yes,Yes,Yes,Twitter,In real life (in person),Username,2011,Multiple times per day,Find answers to specific questions,More than 10 times per week,Stack Overflow was much faster,11-30 minutes,Yes,Less than once per month or monthly,Yes,"No, I've heard of them, but I am not part of a...",Neutral,Somewhat more welcome now than last year,Tech articles written by other developers;Cour...,28.0,Man,No,Straight / Heterosexual,White or of European descent,Yes,Appropriate in length,Easy
16,I am a developer by profession,Yes,Never,The quality of OSS and closed source software ...,Employed full-time,United Kingdom,No,"Master’s degree (MA, MS, M.Eng., MBA, etc.)",,Taken an online course in programming or softw...,100 to 499 employees,"Developer, full-stack",10,17,3,Very satisfied,Slightly satisfied,Somewhat confident,No,No,"I’m not actively looking, but I am open to new...",3-4 years ago,Interview with people in senior / management r...,Yes,"Languages, frameworks, and other technologies ...",I heard about a job opportunity (from a recrui...,GBP,Pound sterling,29000.0,Monthly,455352.0,40.00,There is a schedule and/or spec (made by me or...,Being tasked with non-development work;Distrac...,A few days each month,Home,Average,No,,"No, but I think we should",Developers and management have nearly equal in...,I have some influence,Bash/Shell/PowerShell;C#;HTML/CSS;JavaScript;T...,C#;HTML/CSS;JavaScript;TypeScript;WebAssembly;...,MongoDB;Microsoft SQL Server;MySQL,Elasticsearch;MongoDB;Microsoft SQL Server;SQLite,,AWS;Google Cloud Platform;Microsoft Azure,Angular/Angular.js;ASP.NET;jQuery,Angular/Angular.js;ASP.NET;React.js,.NET;.NET Core;Node.js,.NET Core;Node.js;React Native,Visual Studio;Visual Studio Code,Windows,I do not use containers,Not at all,A passing fad,No,SIGH,No,YouTube,Online,Username,2010,Multiple times per day,Find answers to specific questions;Learn how t...,Less than once per week,Stack Overflow was much faster,11-30 minutes,Yes,Less than once per month or monthly,Yes,"No, I've heard of them, but I am not part of a...","Yes, somewhat",Just as welcome now as I felt last year,Tech articles written by other developers;Indu...,26.0,Man,No,Straight / Heterosexual,White or of European descent,No,Appropriate in length,Neither easy nor difficult
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
88877,I am a developer by profession,Yes,Less than once per year,"OSS is, on average, of HIGHER quality than pro...",Employed full-time,United States,No,"Bachelor’s degree (BA, BS, B.Eng., etc.)","Computer science, computer engineering, or sof...",Taken an online course in programming or softw...,500 to 999 employees,Data scientist or machine learning specialist;...,31,18,28,Very satisfied,Very satisfied,Very confident,Yes,Yes,"I’m not actively looking, but I am open to new...",More than 4 years ago,Interview with people in senior / management r...,Yes,"Industry that I'd be working in;Languages, fra...",I heard about a job opportunity (from a recrui...,USD,United States dollar,239000.0,Weekly,2000000.0,45.00,There is a schedule and/or spec (made by me or...,Meetings;Not enough people for the workload,Less than once per month / Never,Office,Far above average,"Yes, because I see value in code review",5.0,"No, but I think we should",Developers and management have nearly equal in...,I have some influence,Bash/Shell/PowerShell;C;Clojure;HTML/CSS;Java;...,Bash/Shell/PowerShell;Clojure;HTML/CSS;Java;Ja...,Oracle,Oracle,Docker;iOS;Kubernetes;Linux;Slack;Windows;Othe...,Other(s):,Other(s):,Other(s):,Apache Spark,Apache Spark,Atom;Emacs;IntelliJ;IPython / Jupyter;PyCharm;...,Windows,Development;Testing;Production,Implementing cryptocurrency-based products,An irresponsible use of resources,Yes,Yes,No,Facebook,Online,Screen Name,,A few times per month or weekly,Find answers to specific questions,Less than once per week,Stack Overflow was much faster,11-30 minutes,Yes,I have never participated in Q&A on Stack Over...,"No, I knew that Stack Overflow had a job board...","No, and I don't know what those are","Yes, somewhat",Somewhat more welcome now than last year,Tech articles written by other developers;Cour...,48.0,Man,No,Straight / Heterosexual,South Asian,Yes,Too long,Neither easy nor difficult
88878,I am a developer by profession,Yes,Less than once per year,The quality of OSS and closed source software ...,Employed full-time,United States,No,"Bachelor’s degree (BA, BS, B.Eng., etc.)","Computer science, computer engineering, or sof...",Taken an online course in programming or softw...,20 to 99 employees,"Developer, back-end;Developer, front-end;Devel...",12,14,3,Very satisfied,Very satisfied,Very confident,Yes,Yes,I am not interested in new job opportunities,Less than a year ago,"Write any code;Write code by hand (e.g., on a ...",No,"Languages, frameworks, and other technologies ...",I was preparing for a job search,USD,United States dollar,130000.0,Yearly,130000.0,40.00,There is a schedule and/or spec (made by me or...,"Non-work commitments (parenting, school work, ...",A few days each month,Office,Far above average,"Yes, because I see value in code review",3.0,"No, and I'm glad we don't",Developers and management have nearly equal in...,I have some influence,HTML/CSS;JavaScript;Scala;TypeScript,JavaScript;Rust;Scala;TypeScript,PostgreSQL,PostgreSQL,Slack,Slack,React.js;Other(s):,React.js;Other(s):,Node.js,Node.js,IntelliJ;Sublime Text;Visual Studio Code,MacOS,Production,Not at all,"Useful for decentralized currency (i.e., Bitcoin)",Yes,Yes,Yes,Twitter,Online,Username,2010,Multiple times per day,Find answers to specific questions;Learn how t...,3-5 times per week,Stack Overflow was much faster,0-10 minutes,Yes,A few times per week,Yes,"No, I've heard of them, but I am not part of a...","Yes, somewhat",Just as welcome now as I felt last year,Tech articles written by other developers;Indu...,26.0,Man,No,Straight / Heterosexual,South Asian,No,Appropriate in length,Easy
88879,I am a developer by profession,Yes,Never,The quality of OSS and closed source software ...,Employed full-time,Finland,No,"Master’s degree (MA, MS, M.Eng., MBA, etc.)","Computer science, computer engineering, or sof...","Taught yourself a new language, framework, or ...",20 to 99 employees,"Developer, desktop or enterprise applications;...",17,16,7,Slightly satisfied,Neither satisfied nor dissatisfied,Not at all confident,Not sure,I am already a manager,"I’m not actively looking, but I am open to new...",More than 4 years ago,Complete a take-home project;Interview with pe...,No,"Languages, frameworks, and other technologies ...",I had a negative experience or interaction at ...,EUR,European Euro,6000.0,Monthly,82488.0,37.75,There is a schedule and/or spec (made by me or...,Being tasked with non-development work;Distrac...,Less than once per month / Never,Home,Far above average,"Yes, because I see value in code review",10.0,"Yes, it's part of our process",Developers and management have nearly equal in...,I have little or no influence,Bash/Shell/PowerShell;C++;Python,C++,,,Android;Linux;Windows,Linux;Windows,,,,TensorFlow;Unity 3D;Unreal Engine,Android Studio;Notepad++;Vim;Visual Studio,Windows,I do not use containers,Not at all,A passing fad,Yes,,,YouTube,Neither,Username,I don't remember,A few times per month or weekly,Find answers to specific questions,Less than once per week,Stack Overflow was slightly faster,60+ minutes,No,,"No, I didn't know that Stack Overflow had a jo...","No, and I don't know what those are","No, not at all",Not applicable - I did not use Stack Overflow ...,,34.0,Man,No,Straight / Heterosexual,White or of European descent,No,Appropriate in length,Easy
88881,I am a developer by profession,Yes,Once a month or more often,"OSS is, on average, of HIGHER quality than pro...",Employed full-time,Austria,No,"Master’s degree (MA, MS, M.Eng., MBA, etc.)","Computer science, computer engineering, or sof...",Taken an online course in programming or softw...,100 to 499 employees,"Developer, full-stack;Engineer, site reliability",18,17,9,Neither satisfied nor dissatisfied,Neither satisfied nor dissatisfied,Very confident,No,No,I am not interested in new job opportunities,More than 4 years ago,Write any code;Complete a take-home project;In...,No,Office environment or company culture;Diversit...,"Something else changed (education, award, medi...",EUR,European Euro,60000.0,Yearly,68745.0,39.00,There is a schedule and/or spec (made by me or...,Distracting work environment;Not enough people...,A few days each month,Office,A little below average,"Yes, because I see value in code review",10.0,"Yes, it's part of our process","The CTO, CIO, or other management purchase new...",I have little or no influence,Bash/Shell/PowerShell;Go;HTML/CSS;Java;JavaScr...,Bash/Shell/PowerShell;Go;HTML/CSS;JavaScript;P...,PostgreSQL;Redis,Elasticsearch;PostgreSQL;Redis,Docker;Kubernetes;Linux;MacOS;Microsoft Azure;...,Docker;Google Cloud Platform;iOS;Kubernetes;Li...,Django;React.js,Angular/Angular.js;Django;React.js,Ansible,Ansible;Node.js,Emacs;Vim;Visual Studio Code,MacOS,Development;Testing;Production;Outside of work...,Non-currency applications of blockchain,Useful for immutable record keeping outside of...,No,Yes,Yes,,Online,Login,2008,A few times per month or weekly,Find answers to specific questions,More than 10 times per week,Stack Overflow was slightly faster,11-30 minutes,Yes,Less than once per month or monthly,Yes,"No, I've heard of them, but I am not part of a...",Neutral,,,37.0,Man,No,Straight / Heterosexual,White or of European descent,No,Appropriate in length,Easy


In [59]:
countries = ['United States', 'Canada', 'India']
filt = df['Country'].isin(countries)

In [60]:
df.loc[filt, 'Country':'OrgSize']

Respondent,Country,Student,EdLevel,UndergradMajor,EduOther,OrgSize
4,United States,No,"Bachelor’s degree (BA, BS, B.Eng., etc.)","Computer science, computer engineering, or sof...",Taken an online course in programming or softw...,100 to 499 employees
6,Canada,No,"Bachelor’s degree (BA, BS, B.Eng., etc.)",Mathematics or statistics,Taken an online course in programming or softw...,
8,India,,"Bachelor’s degree (BA, BS, B.Eng., etc.)","Computer science, computer engineering, or sof...","Taught yourself a new language, framework, or ...",
10,India,No,"Master’s degree (MA, MS, M.Eng., MBA, etc.)",,,"10,000 or more employees"
12,Canada,"Yes, full-time",Some college/university study without earning ...,Mathematics or statistics,Taken an online course in programming or softw...,
...,...,...,...,...,...,...
85182,Canada,,,,,
85642,United States,No,Associate degree,"Information systems, information technology, o...",Taken an online course in programming or softw...,"Just me - I am a freelancer, sole proprietor, ..."
86012,India,No,"Master’s degree (MA, MS, M.Eng., MBA, etc.)","Another engineering discipline (ex. civil, ele...","Taught yourself a new language, framework, or ...",100 to 499 employees
88282,United States,No,Some college/university study without earning ...,"Computer science, computer engineering, or sof...","Taught yourself a new language, framework, or ...",
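
The isin filter above can be tried on a tiny stand-in DataFrame. This is a minimal sketch: the survey DataFrame and its values are hypothetical, not taken from the real survey file. The label slice "Country":"Student" inside .loc includes both end columns, and the ~ operator inverts the mask to select every other country.

```python
import pandas as pd

# Hypothetical sample data standing in for the survey DataFrame
survey = pd.DataFrame(
    {
        "Country": ["United States", "Germany", "India", "Canada"],
        "Student": ["No", "No", "Yes, full-time", "No"],
        "OrgSize": ["100 to 499 employees", None, None, "20 to 99 employees"],
    },
    index=pd.Index([4, 7, 8, 12], name="Respondent"),
)

countries = ["United States", "Canada", "India"]
filt = survey["Country"].isin(countries)  # True where Country is in the list

# Matching rows; the label-based column slice includes both endpoints
selected = survey.loc[filt, "Country":"Student"]

# ~ inverts the mask: respondents from every other country
others = survey.loc[~filt, "Country"]

print(selected)
print(others)
```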


In [61]:
df['LanguageWorkedWith']

Respondent
1                          HTML/CSS;Java;JavaScript;Python
2                                      C++;HTML/CSS;Python
3                                                 HTML/CSS
4                                      C;C++;C#;Python;SQL
5              C++;HTML/CSS;Java;JavaScript;Python;SQL;VBA
                               ...                        
88377                        HTML/CSS;JavaScript;Other(s):
88601                                                  NaN
88802                                                  NaN
88816                                                  NaN
88863    Bash/Shell/PowerShell;HTML/CSS;Java;JavaScript...
Name: LanguageWorkedWith, Length: 88883, dtype: object

<br>
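Use the <b>str.contains()</b> string method to check whether each cell contains the text "Python". <b>na=False</b> treats missing values as non-matches (False); without it, missing values stay NaN and the result cannot be used as a boolean filter in <b>.loc</b>.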

In [62]:
filt = df['LanguageWorkedWith'].str.contains('Python', na=False)

In [63]:
df.loc[filt, 'LanguageWorkedWith']

Respondent
1                          HTML/CSS;Java;JavaScript;Python
2                                      C++;HTML/CSS;Python
4                                      C;C++;C#;Python;SQL
5              C++;HTML/CSS;Java;JavaScript;Python;SQL;VBA
8        Bash/Shell/PowerShell;C;C++;HTML/CSS;Java;Java...
                               ...                        
84539    Bash/Shell/PowerShell;C;C++;HTML/CSS;Java;Java...
85738      Bash/Shell/PowerShell;C++;Python;Ruby;Other(s):
86566      Bash/Shell/PowerShell;HTML/CSS;Python;Other(s):
87739             C;C++;HTML/CSS;JavaScript;PHP;Python;SQL
88212                           HTML/CSS;JavaScript;Python
Name: LanguageWorkedWith, Length: 36443, dtype: object
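
A minimal sketch of why na=False matters, using a hypothetical Series with one missing answer (the langs data is made up). On an object column like this one, str.contains leaves the missing entry as NaN unless na=False is given, and only the filled mask is a clean True/False filter.

```python
import numpy as np
import pandas as pd

# Hypothetical sample resembling the LanguageWorkedWith column (one missing answer)
langs = pd.Series(
    ["HTML/CSS;Java;Python", "C++;SQL", np.nan, "Python;R"],
    index=pd.Index([1, 2, 3, 4], name="Respondent"),
    dtype=object,
)

# Without na=..., the missing answer is kept as NaN in the result
raw = langs.str.contains("Python")

# na=False treats the missing answer as "does not contain Python"
filt = langs.str.contains("Python", na=False)
python_users = langs.loc[filt]

# Number of respondents who listed Python
count = int(filt.sum())
print(python_users)
print(count)
```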

<br>
<hr>
<a id='section_5'></a>

### Section 5 - Updating Rows and Columns - Modifying Data Within DataFrames
- <a href="https://youtu.be/DCDe29sIKcE">YouTube Tutorial</a>

In [64]:
df_people.columns

Index(['email', 'first', 'last'], dtype='object')

In [65]:
df_people.columns = ["Email", "First Name", "Last Name"]
df_people.columns

Index(['Email', 'First Name', 'Last Name'], dtype='object')

In [66]:
df_people.columns = [x.upper() for x in df_people.columns]
df_people.columns

Index(['EMAIL', 'FIRST NAME', 'LAST NAME'], dtype='object')

This list comprehension converts every column name to upper case.

In [67]:
df_people.columns = df_people.columns.str.replace(" ","_")
df_people.columns

Index(['EMAIL', 'FIRST_NAME', 'LAST_NAME'], dtype='object')

<b>columns.str.replace(" ", "_")</b> replaces the spaces in each column name with underscores.

In [68]:
df_people.rename(columns={"FIRST_NAME": "First", "LAST_NAME": "Last"}, inplace=True)
df_people.columns

Index(['EMAIL', 'First', 'Last'], dtype='object')

In [69]:
df_people

Unnamed: 0,EMAIL,First,Last
0,Corey@gmail.com,Corey,Chafer
1,jane@gmail.com,Jane,Doe
2,john@gmail.com,John,Doe


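<b>rename()</b> takes a dictionary that maps old column names to new ones and changes only the columns listed. <b>inplace=True</b> modifies df_people directly instead of returning a new DataFrame.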
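
The three renaming approaches shown above, side by side on a hypothetical one-row DataFrame (people is an illustrative stand-in for df_people). Assigning to .columns replaces every name, the .str methods transform all names at once, and rename changes only the keys listed in its dictionary.

```python
import pandas as pd

# Hypothetical one-row DataFrame with the same column names as df_people
people = pd.DataFrame({"email": ["a@x.com"], "first": ["Ann"], "last": ["Lee"]})

# 1. Replace all names at once (the list length must equal the number of columns)
people.columns = ["Email", "First Name", "Last Name"]

# 2. Chain vectorized string methods on the column Index
people.columns = people.columns.str.upper().str.replace(" ", "_")

# 3. Rename only selected columns; without inplace=True a new DataFrame is returned
renamed = people.rename(columns={"FIRST_NAME": "First", "LAST_NAME": "Last"})

print(list(people.columns))
print(list(renamed.columns))
```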

In [70]:
df_people.loc[2]

EMAIL    john@gmail.com
First              John
Last                Doe
Name: 2, dtype: object

In [71]:
df_people.loc[2] = ['john@email.com','John', 'Smith']
df_people

Unnamed: 0,EMAIL,First,Last
0,Corey@gmail.com,Corey,Chafer
1,jane@gmail.com,Jane,Doe
2,john@email.com,John,Smith


In [72]:
df_people.loc[2,['Last','EMAIL']] = ['Doe','Johnatan@email.com']

In [73]:
df_people

Unnamed: 0,EMAIL,First,Last
0,Corey@gmail.com,Corey,Chafer
1,jane@gmail.com,Jane,Doe
2,Johnatan@email.com,John,Doe


In [74]:
df_people.loc[2,'Last'] = 'Smith'

In [75]:
df_people

Unnamed: 0,EMAIL,First,Last
0,Corey@gmail.com,Corey,Chafer
1,jane@gmail.com,Jane,Doe
2,Johnatan@email.com,John,Smith


In [76]:
filt = (df_people['EMAIL'] == 'Johnatan@email.com')
df_people.loc[filt, 'Last'] = 'Al'
df_people

Unnamed: 0,EMAIL,First,Last
0,Corey@gmail.com,Corey,Chafer
1,jane@gmail.com,Jane,Doe
2,Johnatan@email.com,John,Al


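The examples above update values with <b>.loc</b>: a whole row (<b>.loc[2]</b>), selected columns of a row (<b>.loc[2, ['Last', 'EMAIL']]</b>), a single cell (<b>.loc[2, 'Last']</b>), and every row that matches a boolean filter.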
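
A compact sketch of the same .loc update patterns on hypothetical data shaped like df_people. The comment at the end reflects pandas' general advice to assign through a single .loc call rather than through chained indexing.

```python
import pandas as pd

# Hypothetical data shaped like df_people
people = pd.DataFrame(
    {
        "email": ["corey@gmail.com", "jane@gmail.com", "john@gmail.com"],
        "first": ["Corey", "Jane", "John"],
        "last": ["Chafer", "Doe", "Doe"],
    }
)

# Single cell: row label 2, column "last"
people.loc[2, "last"] = "Smith"

# Several columns of one row: values are matched to the listed columns in order
people.loc[0, ["first", "email"]] = ["Cory", "cory@gmail.com"]

# Every row matching a boolean filter, in one assignment
filt = people["last"] == "Doe"
people.loc[filt, "email"] = "unknown@gmail.com"

# Avoid chained indexing such as people[filt]["last"] = "X": the assignment
# may go to a temporary copy and leave `people` unchanged.
print(people)
```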

In [77]:
df_people.columns = [x.lower() for x in df_people.columns]

In [78]:
df_people.columns

Index(['email', 'first', 'last'], dtype='object')

In [79]:
df_people['email'] = df_people['email'].str.lower()

In [80]:
df_people

Unnamed: 0,email,first,last
0,corey@gmail.com,Corey,Chafer
1,jane@gmail.com,Jane,Doe
2,johnatan@email.com,John,Al


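<b>str.lower()</b> converts every email address to lower case; assigning the result back to the column makes the change permanent.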

In [81]:
df_people['email'].apply(len)

0    15
1    14
2    18
Name: email, dtype: int64

<b>apply(len)</b> returns the number of characters in each email address.

In [82]:
def update_email(email):
    return email.upper()

In [83]:
df_people['email'] = df_people['email'].apply(update_email)

In [84]:
df_people

Unnamed: 0,email,first,last
0,COREY@GMAIL.COM,Corey,Chafer
1,JANE@GMAIL.COM,Jane,Doe
2,JOHNATAN@EMAIL.COM,John,Al


<b>Series.apply()</b> calls update_email on each value and returns a new Series, which is then assigned back to the email column.

In [85]:
df_people['email'] = df_people['email'].apply(lambda x: x.lower())
df_people

Unnamed: 0,email,first,last
0,corey@gmail.com,Corey,Chafer
1,jane@gmail.com,Jane,Doe
2,johnatan@email.com,John,Al


A lambda is a small anonymous function written inline: <b>lambda x: x.lower()</b> behaves like a function defined with def that returns x.lower(). It is a more compact alternative to defining update_email separately.
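
To make the def versus lambda comparison concrete, here is a minimal sketch on a hypothetical Series of emails. All three approaches produce the same values; the vectorized .str.lower() is often the simplest choice for plain string operations, while apply is useful when the logic does not map onto a built-in .str method.

```python
import pandas as pd

# Hypothetical sample emails
emails = pd.Series(["COREY@GMAIL.COM", "Jane@Gmail.com"])

# A named function defined with def
def to_lower(email):
    return email.lower()

via_def = emails.apply(to_lower)

# The same logic as an inline, anonymous lambda (single expression, implicit return)
via_lambda = emails.apply(lambda email: email.lower())

# For simple string operations, the vectorized .str accessor needs no custom function
via_str = emails.str.lower()

print(list(via_def))
```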

In [86]:
df_people['email'].apply(len)

0    15
1    14
2    18
Name: email, dtype: int64

In [87]:
df_people.apply(len)

email    3
first    3
last     3
dtype: int64

When <b>apply</b> is called on a DataFrame, the function receives one column (as a Series) at a time by default, so <b>len</b> returns the number of rows in each column.

In [88]:
df_people.apply(len, axis='columns')

0    3
1    3
2    3
dtype: int64

With <b>axis='columns'</b>, the function receives one row at a time, so <b>len</b> returns the number of columns in each row.

In [89]:
df_people.apply(pd.Series.min)

email    corey@gmail.com
first              Corey
last                  Al
dtype: object

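<b>pd.Series.min</b> returns the smallest value in each column. For strings this is the first value in lexicographic (character code) order, which is case-sensitive: upper-case letters sort before lower-case ones.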
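
A minimal sketch of how the axis argument changes what apply passes to the function, using a small hypothetical DataFrame. The last line illustrates the case-sensitive string ordering behind pd.Series.min.

```python
import pandas as pd

# Hypothetical numeric data
scores = pd.DataFrame({"a": [3, 1, 2], "b": [10, 30, 20]})

# Default axis: the function receives each column as a Series
col_min = scores.apply(pd.Series.min)  # a -> 1, b -> 10

# axis="columns": the function receives each row as a Series
row_total = scores.apply(lambda row: row["a"] + row["b"], axis="columns")

# String minimum is case-sensitive: "C" (code 67) sorts before "a" (code 97)
first_name = pd.Series(["corey", "Cap", "al"]).min()

print(col_min)
print(row_total)
print(first_name)
```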

In [90]:
df_people.applymap(len)

Unnamed: 0,email,first,last
0,15,5,6
1,14,4,3
2,18,4,2


In [91]:
df_people.applymap(str.lower)

Unnamed: 0,email,first,last
0,corey@gmail.com,corey,chafer
1,jane@gmail.com,jane,doe
2,johnatan@email.com,john,al


<b>.applymap</b> is a DataFrame method that applies a function to every individual element and returns a new DataFrame; df_people itself is not modified. In pandas 2.1 and later, applymap is deprecated in favor of the equivalent <b>DataFrame.map</b>.
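
Because applymap is deprecated in newer pandas, a version-tolerant sketch can pick DataFrame.map when it exists and fall back to applymap otherwise. The names DataFrame below is hypothetical sample data.

```python
import pandas as pd

# Hypothetical sample data
names = pd.DataFrame({"first": ["Corey", "Jane"], "last": ["Chafer", "Doe"]})

# DataFrame.map (pandas >= 2.1) is the element-wise method that replaced applymap;
# fall back to applymap on older versions.
elementwise = names.map if hasattr(names, "map") else names.applymap

lengths = elementwise(len)        # character count of every cell
lowered = elementwise(str.lower)  # lower-cased copy of every cell

print(lengths)
print(lowered)
```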

In [92]:
df_people['first'].map({'Corey': 'Chris','Jane':'Mary'})

0    Chris
1     Mary
2      NaN
Name: first, dtype: object

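<b>Series.map</b> with a dictionary substitutes each value according to the mapping. Any value not found in the dictionary becomes NaN, which is why "John" in the last row was replaced with NaN. Like the calls above, it returns a new Series and leaves df_people unchanged, as the next cell shows.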

In [93]:
df_people

Unnamed: 0,email,first,last
0,corey@gmail.com,Corey,Chafer
1,jane@gmail.com,Jane,Doe
2,johnatan@email.com,John,Al


In [94]:
df_people['first'].replace({'Corey': 'Chris','Jane':'Mary'})

0    Chris
1     Mary
2     John
Name: first, dtype: object

In [95]:
df_people

Unnamed: 0,email,first,last
0,corey@gmail.com,Corey,Chafer
1,jane@gmail.com,Jane,Doe
2,johnatan@email.com,John,Al
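
The difference between map and replace is easiest to see side by side. In this sketch (hypothetical data mirroring the first column), map turns unmatched values into NaN, replace keeps them, and chaining fillna(first) after map is one way to get replace-like behavior from map.

```python
import pandas as pd

# Hypothetical data mirroring df_people['first']
first = pd.Series(["Corey", "Jane", "John"])
mapping = {"Corey": "Chris", "Jane": "Mary"}

mapped = first.map(mapping)        # unmatched values become NaN
replaced = first.replace(mapping)  # unmatched values are kept

# Fill the gaps left by map with the original values
mapped_kept = first.map(mapping).fillna(first)

print(mapped)
print(replaced)
```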


<br>
<hr>
<a id='section_6'></a>

### Section 6 - Add/Remove Rows and Columns from DataFrame
- <a href="https://youtu.be/HQ6XO9eT-fc">YouTube Tutorial</a>

In [96]:
df_people['first'] + ' ' + df_people['last']

0    Corey Chafer
1        Jane Doe
2         John Al
dtype: object

In [97]:
df_people['full_name'] = df_people['first'] + ' ' + df_people['last']
df_people

Unnamed: 0,email,first,last,full_name
0,corey@gmail.com,Corey,Chafer,Corey Chafer
1,jane@gmail.com,Jane,Doe,Jane Doe
2,johnatan@email.com,John,Al,John Al


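Concatenate two string columns with <b>+</b>. Assigning the result to a column name that does not exist yet (full_name) adds it as a new column.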

In [98]:
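df_people.drop(columns=['first','last'], inplace=True)

<b>drop(columns=[...])</b> removes the listed columns; <b>inplace=True</b> modifies df_people directly instead of returning a copy.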

In [99]:
df_people['full_name'].str.split(' ')

0    [Corey, Chafer]
1        [Jane, Doe]
2         [John, Al]
Name: full_name, dtype: object

In [100]:
df_people['full_name'].str.split(' ', expand=True)

Unnamed: 0,0,1
0,Corey,Chafer
1,Jane,Doe
2,John,Al


In [101]:
df_people[['first','last']] = df_people['full_name'].str.split(' ', expand=True)

In [102]:
df_people

Unnamed: 0,email,full_name,first,last
0,corey@gmail.com,Corey Chafer,Corey,Chafer
1,jane@gmail.com,Jane Doe,Jane,Doe
2,johnatan@email.com,John Al,John,Al


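<b>str.split(' ')</b> returns a list for each row, while <b>expand=True</b> spreads the parts into separate columns. Assigning that result to df_people[['first','last']] adds the two columns back to the DataFrame.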
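
A sketch of the add/drop/split round trip on hypothetical data, with one multi-word first name added to show a pitfall: without n=1, "Mary Ann Lee" would split into three parts, expand=True would return three columns, and assigning them to the two columns ['first', 'last'] would raise a ValueError. With n=1, split stops at the first space, so everything after it lands in last.

```python
import pandas as pd

# Hypothetical data, including a two-word first name
people = pd.DataFrame(
    {"first": ["Corey", "Jane", "Mary Ann"], "last": ["Chafer", "Doe", "Lee"]}
)

# Add a column by concatenating two string columns
people["full_name"] = people["first"] + " " + people["last"]

# Remove the original columns (drop returns a new DataFrame, reassigned here)
people = people.drop(columns=["first", "last"])

# Split back into two columns; n=1 splits only at the first space
people[["first", "last"]] = people["full_name"].str.split(" ", n=1, expand=True)

print(people)
```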

In [103]:
df_people.append({'first':'Tony'}, ignore_index=True)

Unnamed: 0,email,full_name,first,last
0,corey@gmail.com,Corey Chafer,Corey,Chafer
1,jane@gmail.com,Jane Doe,Jane,Doe
2,johnatan@email.com,John Al,John,Al
3,,,Tony,


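<b>append</b> adds a single record, passed as a dictionary, to the end of the DataFrame. <b>ignore_index=True</b> is required when appending a dictionary; it gives the new row the next integer index. Columns missing from the dictionary are filled with NaN. The result is a new DataFrame, so df_people itself is unchanged, as the next cell shows. Note: DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0; in current versions use <b>pd.concat</b> instead.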
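
Because DataFrame.append no longer exists in pandas 2.0, the single-record append above can be written with pd.concat. A minimal sketch, assuming hypothetical sample data: the record is wrapped in a one-row DataFrame so its missing columns are filled with NaN.

```python
import pandas as pd

# Hypothetical sample data
people = pd.DataFrame({"first": ["Corey", "Jane"], "last": ["Chafer", "Doe"]})

# Wrap the new record in a one-row DataFrame; columns it lacks become NaN
new_row = pd.DataFrame([{"first": "Tony"}])

# pd.concat returns a new DataFrame; ignore_index=True renumbers rows 0..n-1
people_plus = pd.concat([people, new_row], ignore_index=True)

print(people_plus)
```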

In [104]:
df_people

Unnamed: 0,email,full_name,first,last
0,corey@gmail.com,Corey Chafer,Corey,Chafer
1,jane@gmail.com,Jane Doe,Jane,Doe
2,johnatan@email.com,John Al,John,Al


In [105]:
people = {
    "email": ["Cap@gmail.com","iron@gmail.com","god@gmail.com"],
    "first": ["Cap","Iron", "God"],
    "last": ["America", "Man", "Thunder"]
}

df2 = pd.DataFrame(people)

In [106]:
df_people.append(df2)

Unnamed: 0,email,full_name,first,last
0,corey@gmail.com,Corey Chafer,Corey,Chafer
1,jane@gmail.com,Jane Doe,Jane,Doe
2,johnatan@email.com,John Al,John,Al
0,Cap@gmail.com,,Cap,America
1,iron@gmail.com,,Iron,Man
2,god@gmail.com,,God,Thunder


In [107]:
df_people.append(df2, ignore_index=True)

Unnamed: 0,email,full_name,first,last
0,corey@gmail.com,Corey Chafer,Corey,Chafer
1,jane@gmail.com,Jane Doe,Jane,Doe
2,johnatan@email.com,John Al,John,Al
3,Cap@gmail.com,,Cap,America
4,iron@gmail.com,,Iron,Man
5,god@gmail.com,,God,Thunder


In [108]:
df_people = df_people.append(df2, ignore_index=True)
df_people

Unnamed: 0,email,full_name,first,last
0,corey@gmail.com,Corey Chafer,Corey,Chafer
1,jane@gmail.com,Jane Doe,Jane,Doe
2,johnatan@email.com,John Al,John,Al
3,Cap@gmail.com,,Cap,America
4,iron@gmail.com,,Iron,Man
5,god@gmail.com,,God,Thunder


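Appending one DataFrame to another keeps the original index labels by default, so the result contains duplicate labels (0, 1, 2 appear twice in the output of df_people.append(df2)). <b>ignore_index=True</b> renumbers the rows from 0. Because append returns a new DataFrame, the result must be assigned back to df_people to keep it.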
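
The effect of ignore_index can be reproduced with pd.concat, which replaces append in pandas 2.0 and later. In this sketch (hypothetical two-row DataFrames), the default keeps the duplicate labels, so .loc[0] returns two rows, while ignore_index=True renumbers them.

```python
import pandas as pd

# Hypothetical two-row DataFrames
people = pd.DataFrame({"first": ["Corey", "Jane"]})
others = pd.DataFrame({"first": ["Cap", "Iron"]})

stacked = pd.concat([people, others])                       # labels 0, 1, 0, 1
renumbered = pd.concat([people, others], ignore_index=True)  # labels 0, 1, 2, 3

# Duplicate labels: .loc[0] matches two rows
dup = stacked.loc[0]

# concat never modifies its inputs; reassign to keep the combined result
people = renumbered
print(people)
```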

In [109]:
df_people.drop(index=4)

Unnamed: 0,email,full_name,first,last
0,corey@gmail.com,Corey Chafer,Corey,Chafer
1,jane@gmail.com,Jane Doe,Jane,Doe
2,johnatan@email.com,John Al,John,Al
3,Cap@gmail.com,,Cap,America
5,god@gmail.com,,God,Thunder


In [110]:
df_people.drop(index=4, inplace=True)

In [111]:
df_people

Unnamed: 0,email,full_name,first,last
0,corey@gmail.com,Corey Chafer,Corey,Chafer
1,jane@gmail.com,Jane Doe,Jane,Doe
2,johnatan@email.com,John Al,John,Al
3,Cap@gmail.com,,Cap,America
5,god@gmail.com,,God,Thunder


In [112]:
filt = df_people['last']=='Doe'
df_people.drop(index=df_people[filt].index)

Unnamed: 0,email,full_name,first,last
0,corey@gmail.com,Corey Chafer,Corey,Chafer
2,johnatan@email.com,John Al,John,Al
3,Cap@gmail.com,,Cap,America
5,god@gmail.com,,God,Thunder


In [113]:
df_people

Unnamed: 0,email,full_name,first,last
0,corey@gmail.com,Corey Chafer,Corey,Chafer
1,jane@gmail.com,Jane Doe,Jane,Doe
2,johnatan@email.com,John Al,John,Al
3,Cap@gmail.com,,Cap,America
5,god@gmail.com,,God,Thunder


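<b>drop(index=...)</b> removes rows by index label. Passing df_people[filt].index drops every row that matches the filter (here, the single row whose last name is Doe). Without inplace=True the original DataFrame is unchanged, as the last cell shows.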
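
Dropping the matching rows and keeping the non-matching ones with an inverted filter give the same result. A minimal sketch on hypothetical data:

```python
import pandas as pd

# Hypothetical sample data
people = pd.DataFrame(
    {"first": ["Corey", "Jane", "John"], "last": ["Chafer", "Doe", "Al"]}
)
filt = people["last"] == "Doe"

# drop() removes the rows whose index labels match the filter
dropped = people.drop(index=people[filt].index)

# Equivalent: keep only the rows where the filter is False
kept = people[~filt]

print(dropped)
```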

<br>
<hr>
<a id='section_7'></a>

### Section 7 - Sorting Data
- <a href="https://youtu.be/T11QYVfZoD0">YouTube Tutorial</a>

In [114]:
df_people

Unnamed: 0,email,full_name,first,last
0,corey@gmail.com,Corey Chafer,Corey,Chafer
1,jane@gmail.com,Jane Doe,Jane,Doe
2,johnatan@email.com,John Al,John,Al
3,Cap@gmail.com,,Cap,America
5,god@gmail.com,,God,Thunder


In [115]:
df_people.sort_values(by='last', ascending=False)

Unnamed: 0,email,full_name,first,last
5,god@gmail.com,,God,Thunder
1,jane@gmail.com,Jane Doe,Jane,Doe
0,corey@gmail.com,Corey Chafer,Corey,Chafer
3,Cap@gmail.com,,Cap,America
2,johnatan@email.com,John Al,John,Al


In [116]:
df_people

Unnamed: 0,email,full_name,first,last
0,corey@gmail.com,Corey Chafer,Corey,Chafer
1,jane@gmail.com,Jane Doe,Jane,Doe
2,johnatan@email.com,John Al,John,Al
3,Cap@gmail.com,,Cap,America
5,god@gmail.com,,God,Thunder


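Sorting by one column. <b>sort_values</b> returns a sorted copy, so df_people keeps its original order unless the result is assigned back or inplace=True is used.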
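
A related detail when sorting text: the default order is case-sensitive, which matters for values such as "Cap@gmail.com" next to lower-case emails. The sketch below uses hypothetical data and the key argument of sort_values (available since pandas 1.1) to compare lower-cased values instead.

```python
import pandas as pd

# Hypothetical emails with mixed case
emails = pd.DataFrame({"email": ["corey@gmail.com", "Cap@gmail.com", "bob@gmail.com"]})

# Default string sort is case-sensitive: upper-case "C" sorts before "b" and "c"
default_order = emails.sort_values("email")

# key= transforms the values before comparing, here to lower case
case_insensitive = emails.sort_values("email", key=lambda s: s.str.lower())

print(default_order)
print(case_insensitive)
```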

In [117]:
df_people.sort_values(by=['first','last'], ascending=False)

Unnamed: 0,email,full_name,first,last
2,johnatan@email.com,John Al,John,Al
1,jane@gmail.com,Jane Doe,Jane,Doe
5,god@gmail.com,,God,Thunder
0,corey@gmail.com,Corey Chafer,Corey,Chafer
3,Cap@gmail.com,,Cap,America


Sorting by multiple columns: rows are ordered by the first column in the list, and the next column is used only to break ties.

In [118]:
df_people.sort_values(by=['first','last'], ascending=[False,True])

Unnamed: 0,email,full_name,first,last
2,johnatan@email.com,John Al,John,Al
1,jane@gmail.com,Jane Doe,Jane,Doe
5,god@gmail.com,,God,Thunder
0,corey@gmail.com,Corey Chafer,Corey,Chafer
3,Cap@gmail.com,,Cap,America


Passing a list to <b>ascending</b> sets the sort order for each column separately: here first is sorted descending and last ascending.
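
A minimal sketch on hypothetical data where the tie-breaking is visible: two rows share the first name John, so their relative order is decided by last. sort_index then restores the original row order.

```python
import pandas as pd

# Hypothetical data with a tie in "first"
people = pd.DataFrame(
    {"first": ["John", "Jane", "John"], "last": ["Doe", "Doe", "Al"]}
)

# "first" descending; rows with the same first name are ordered by "last" ascending
by_two = people.sort_values(by=["first", "last"], ascending=[False, True])

# sort_index puts the rows back in index order
restored = by_two.sort_index()

print(by_two)
```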

In [119]:
df_people.sort_values(by=['first','last'], ascending=[False,True], inplace=True)

In [120]:
df_people

Unnamed: 0,email,full_name,first,last
2,johnatan@email.com,John Al,John,Al
1,jane@gmail.com,Jane Doe,Jane,Doe
5,god@gmail.com,,God,Thunder
0,corey@gmail.com,Corey Chafer,Corey,Chafer
3,Cap@gmail.com,,Cap,America


In [121]:
df_people.sort_index()

Unnamed: 0,email,full_name,first,last
0,corey@gmail.com,Corey Chafer,Corey,Chafer
1,jane@gmail.com,Jane Doe,Jane,Doe
2,johnatan@email.com,John Al,John,Al
3,Cap@gmail.com,,Cap,America
5,god@gmail.com,,God,Thunder


In [122]:
df_people['last'].sort_values()

2         Al
3    America
0     Chafer
1        Doe
5    Thunder
Name: last, dtype: object

In [123]:
df['ConvertedComp'].nlargest(10)

Respondent
58      2000000.0
102     2000000.0
166     2000000.0
436     2000000.0
452     2000000.0
491     2000000.0
539     2000000.0
770     2000000.0
789     2000000.0
1232    2000000.0
Name: ConvertedComp, dtype: float64

<b>nlargest(10)</b> returns the 10 largest values in the column together with their index labels (here, Respondent IDs).
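
A short sketch of nlargest and its counterparts on a hypothetical Series built from a few ConvertedComp values. With the default keep='first', ties at the cut-off are resolved by order of appearance; keep='all' keeps every tied row, so the result can be longer than n. DataFrame.nlargest(n, column) works the same way but returns whole rows.

```python
import pandas as pd

# Hypothetical sample of compensation values indexed by respondent ID
comp = pd.Series(
    [90000.0, 2000000.0, 130000.0, 2000000.0, 68745.0],
    index=pd.Index([13, 58, 88878, 102, 88881], name="Respondent"),
    name="ConvertedComp",
)

top2 = comp.nlargest(2)                  # ties resolved by first occurrence
bottom2 = comp.nsmallest(2)              # the two smallest values
top1_all = comp.nlargest(1, keep="all")  # keeps every row tied at the cut-off

# DataFrame version: whole rows ranked by one column
frame = comp.to_frame()
top_rows = frame.nlargest(2, "ConvertedComp")

print(top2)
print(bottom2)
```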

In [124]:
df.nlargest(10, 'ConvertedComp')

Respondent,MainBranch,Hobbyist,OpenSourcer,OpenSource,Employment,Country,Student,EdLevel,UndergradMajor,EduOther,OrgSize,DevType,YearsCode,Age1stCode,YearsCodePro,CareerSat,JobSat,MgrIdiot,MgrMoney,MgrWant,JobSeek,LastHireDate,LastInt,FizzBuzz,JobFactors,ResumeUpdate,CurrencySymbol,CurrencyDesc,CompTotal,CompFreq,ConvertedComp,WorkWeekHrs,WorkPlan,WorkChallenge,WorkRemote,WorkLoc,ImpSyn,CodeRev,CodeRevHrs,UnitTests,PurchaseHow,PurchaseWhat,LanguageWorkedWith,LanguageDesireNextYear,DatabaseWorkedWith,DatabaseDesireNextYear,PlatformWorkedWith,PlatformDesireNextYear,WebFrameWorkedWith,WebFrameDesireNextYear,MiscTechWorkedWith,MiscTechDesireNextYear,DevEnviron,OpSys,Containers,BlockchainOrg,BlockchainIs,BetterLife,ITperson,OffOn,SocialMedia,Extraversion,ScreenName,SOVisit1st,SOVisitFreq,SOVisitTo,SOFindAnswer,SOTimeSaved,SOHowMuchTime,SOAccount,SOPartFreq,SOJobs,EntTeams,SOComm,WelcomeChange,SONewContent,Age,Gender,Trans,Sexuality,Ethnicity,Dependents,SurveyLength,SurveyEase
58,I am a developer by profession,Yes,Once a month or more often,"OSS is, on average, of LOWER quality than prop...",Employed full-time,United States,No,"Bachelor’s degree (BA, BS, B.Eng., etc.)","Computer science, computer engineering, or sof...",Received on-the-job training in software devel...,,"Developer, back-end;Developer, desktop or ente...",28,19,23,Very satisfied,Very satisfied,Very confident,Yes,No,I am not interested in new job opportunities,1-2 years ago,,Yes,Office environment or company culture;Remote w...,Re-entry into the workforce,USD,United States dollar,113000.0,Weekly,2000000.0,40.0,There's no schedule or spec; I work on what se...,Being tasked with non-development work;Non-wor...,"Less than half the time, but at least one day ...",Home,A little above average,"Yes, because I see value in code review",1.0,"No, but I think we should",Developers and management have nearly equal in...,I have a great deal of influence,C#;Java;SQL,C#;F#;Java;Kotlin;SQL,Microsoft SQL Server;Oracle;SQLite,Microsoft SQL Server;Oracle;SQLite,Android;Windows,Android;Raspberry Pi;Windows,ASP.NET;jQuery,Angular/Angular.js;ASP.NET;jQuery,.NET,Hadoop;.NET;.NET Core;Node.js;Puppet;Xamarin,Android Studio;Visual Studio,Windows,I do not use containers,,,Yes,Yes,Yes,I don't use social media,In real life (in person),Login,I don't remember,Multiple times per day,Find answers to specific questions,More than 10 times per week,Stack Overflow was much faster,,Yes,Less than once per month or monthly,"No, I knew that Stack Overflow had a job board...","No, and I don't know what those are","No, not really",Just as welcome now as I felt last year,,47.0,Man,No,Straight / Heterosexual,White or of European descent,Yes,,Easy
102,I am a developer by profession,No,Never,The quality of OSS and closed source software ...,Employed full-time,United States,No,"Bachelor’s degree (BA, BS, B.Eng., etc.)","Computer science, computer engineering, or sof...","Taught yourself a new language, framework, or ...","5,000 to 9,999 employees","Developer, full-stack",8,29,5,Slightly satisfied,Slightly satisfied,Somewhat confident,No,No,"I’m not actively looking, but I am open to new...",1-2 years ago,"Write any code;Write code by hand (e.g., on a ...",Yes,Office environment or company culture;Opportun...,"Something else changed (education, award, medi...",USD,United States dollar,67800.0,Weekly,2000000.0,40.0,There is a schedule and/or spec (made by me or...,Being tasked with non-development work;Distrac...,Less than once per month / Never,Office,Average,No,,"No, but I think we should",Not sure,I have some influence,C#;HTML/CSS;JavaScript;SQL;TypeScript,C;C++;Elixir;Go;Ruby;WebAssembly,Microsoft SQL Server,MongoDB;PostgreSQL;SQLite,Microsoft Azure,AWS;Kubernetes;Microsoft Azure,ASP.NET;jQuery;React.js;Other(s):,Angular/Angular.js;Vue.js,.NET;.NET Core,Node.js,Notepad++;Visual Studio;Visual Studio Code,Windows,I do not use containers,Non-currency applications of blockchain,Useful for immutable record keeping outside of...,No,Yes,Yes,I don't use social media,In real life (in person),Username,2012,Daily or almost daily,Find answers to specific questions;Learn how t...,1-2 times per week,Stack Overflow was much faster,60+ minutes,Yes,Less than once per month or monthly,Yes,"No, I've heard of them, but I am not part of a...",Neutral,Just as welcome now as I felt last year,Tech articles written by other developers;Indu...,37.0,Man,No,Straight / Heterosexual,White or of European descent,Yes,Too long,Easy
166,I am a developer by profession,Yes,Less than once per year,The quality of OSS and closed source software ...,Employed full-time,United States,No,"Bachelor’s degree (BA, BS, B.Eng., etc.)","A social science (ex. anthropology, psychology...",Participated in a full-time developer training...,20 to 99 employees,"Developer, back-end;Developer, front-end;Devel...",7,15,6,Slightly satisfied,Slightly satisfied,Very confident,No,Not sure,"I’m not actively looking, but I am open to new...",3-4 years ago,Write any code;Complete a take-home project;So...,No,Financial performance or funding status of the...,I had a negative experience or interaction at ...,USD,United States dollar,137000.0,Weekly,2000000.0,45.0,There is a schedule and/or spec (made by me or...,Distracting work environment;Not enough people...,Less than once per month / Never,Home,A little above average,"Yes, because I see value in code review",8.0,"Yes, it's part of our process","The CTO, CIO, or other management purchase new...",I have some influence,Bash/Shell/PowerShell;Go;HTML/CSS;Java;JavaScr...,Bash/Shell/PowerShell;HTML/CSS;Java;JavaScript...,DynamoDB;Elasticsearch;MongoDB;PostgreSQL;Redi...,PostgreSQL;Redis;SQLite,AWS;Docker;Linux,AWS;Docker;iOS;Kubernetes;Linux,jQuery;React.js;Ruby on Rails,React.js;Ruby on Rails,,,Vim,Linux-based,Development;Testing;Production,Not at all,A passing fad,Yes,SIGH,Yes,Twitter,Online,Username,2011,A few times per month or weekly,Find answers to specific questions,Less than once per week,Stack Overflow was slightly faster,11-30 minutes,Yes,Less than once per month or monthly,"No, I knew that Stack Overflow had a job board...","No, I've heard of them, but I am not part of a...","No, not at all",Just as welcome now as I felt last year,,30.0,Man,No,Straight / Heterosexual,White or of European descent,No,Appropriate in length,Neither easy nor difficult
436,I am a developer by profession,Yes,Less than once per year,The quality of OSS and closed source software ...,Employed full-time,United States,No,"Bachelor’s degree (BA, BS, B.Eng., etc.)","Computer science, computer engineering, or sof...",Taken an online course in programming or softw...,20 to 99 employees,"Database administrator;Developer, back-end;Dev...",20,18,17,Slightly satisfied,Slightly satisfied,Somewhat confident,Yes,I am already a manager,"I’m not actively looking, but I am open to new...",3-4 years ago,"Write any code;Write code by hand (e.g., on a ...",No,Specific department or team I'd be working on;...,"My job status changed (promotion, new job, etc.)",USD,United States dollar,85000.0,Weekly,2000000.0,45.0,There is a schedule and/or spec (made by me or...,Lack of support from management;Meetings;Not e...,A few days each month,Office,Far above average,"Yes, because I see value in code review",,"No, but I think we should","The CTO, CIO, or other management purchase new...",I have some influence,Bash/Shell/PowerShell;HTML/CSS;Java;JavaScript...,Bash/Shell/PowerShell;Go;HTML/CSS;JavaScript;P...,Microsoft SQL Server;MySQL;Redis;SQLite,Couchbase;MySQL;Oracle;Redis,Android;AWS;Docker;Google Cloud Platform;Linux...,Android;Arduino;Docker;Google Cloud Platform;L...,jQuery;Laravel;React.js;Vue.js,Laravel;React.js;Vue.js,Node.js,Node.js;React Native;TensorFlow,Android Studio;Atom;Sublime Text;Visual Studio...,Windows,Development;Testing;Production,Not at all,Useful across many domains and could change ma...,Yes,SIGH,Yes,Reddit,Neither,Username,2010,Daily or almost daily,Find answers to specific questions;Contribute ...,3-5 times per week,They were about the same,,Yes,A few times per month or weekly,"No, I didn't know that Stack Overflow had a jo...","No, and I don't know what those are","Yes, somewhat",Just as welcome now as I felt last year,Tech articles written by other developers;Tech...,38.0,Man,No,Straight / Heterosexual,White or of European descent,Yes,Too long,Easy
452,I am a developer by profession,Yes,Never,The quality of OSS and closed source software ...,Employed full-time,United States,"Yes, full-time",I never completed any formal education,,Taken an online course in programming or softw...,100 to 499 employees,"Database administrator;Developer, back-end",7,28,7,Very satisfied,Very satisfied,Very confident,No,No,"I’m not actively looking, but I am open to new...",Less than a year ago,Solve a brain-teaser style puzzle;Interview wi...,No,Specific department or team I'd be working on;...,"Something else changed (education, award, medi...",USD,United States dollar,75000.0,Weekly,2000000.0,40.0,There is a schedule and/or spec (made by me or...,Being tasked with non-development work;Meeting...,A few days each month,Home,Average,"Yes, because I see value in code review",5.0,"Yes, it's part of our process",Developers and management have nearly equal in...,I have little or no influence,SQL;VBA,Python;SQL,Microsoft SQL Server,MongoDB;Microsoft SQL Server;Oracle;PostgreSQL,Windows,AWS;Linux;Microsoft Azure;Windows,,ASP.NET,.NET,.NET,Notepad++;Visual Studio,Windows,I do not use containers,Not at all,,Yes,"Fortunately, someone else has that title",What?,Facebook,In real life (in person),Screen Name,2013,Daily or almost daily,Find answers to specific questions;Learn how t...,1-2 times per week,Stack Overflow was slightly faster,0-10 minutes,Yes,Multiple times per day,Yes,"No, I've heard of them, but I am not part of a...","Yes, definitely",Somewhat more welcome now than last year,Tech articles written by other developers;Cour...,35.0,Man,No,,White or of European descent,No,Appropriate in length,Easy
491,I am a developer by profession,Yes,Less than once per year,,Employed full-time,United States,No,"Bachelor’s degree (BA, BS, B.Eng., etc.)","Computer science, computer engineering, or sof...",Taken an online course in programming or softw...,"10,000 or more employees","Developer, full-stack;Developer, mobile",4,18,Less than 1 year,Very satisfied,Very satisfied,Very confident,No,Not sure,I am not interested in new job opportunities,Less than a year ago,"Write any code;Write code by hand (e.g., on a ...",No,Specific department or team I'd be working on;...,I was preparing for a job search,USD,United States dollar,160000.0,Weekly,2000000.0,45.0,There is a schedule and/or spec (made by me or...,Distracting work environment;Inadequate access...,Less than once per month / Never,Office,Average,"Yes, because I see value in code review",2.0,"Yes, it's part of our process",Not sure,I have little or no influence,Java;Objective-C,Java;Objective-C;Swift,,,Android;iOS,Android;iOS,,,,React Native;Unity 3D,Android Studio;IntelliJ;Vim;Xcode,MacOS,"Outside of work, for personal projects",,A passing fad,Yes,SIGH,What?,YouTube,In real life (in person),Username,2014,A few times per week,Find answers to specific questions;Learn how t...,1-2 times per week,They were about the same,,No,,Yes,"No, I've heard of them, but I am not part of a...",Neutral,Just as welcome now as I felt last year,Tech articles written by other developers;Tech...,22.0,Man,No,Straight / Heterosexual,Hispanic or Latino/Latina;White or of European...,No,Appropriate in length,Neither easy nor difficult
539,I am a developer by profession,No,Less than once a month but more than once per ...,"OSS is, on average, of HIGHER quality than pro...",Employed full-time,United States,No,Some college/university study without earning ...,Mathematics or statistics,Taken an online course in programming or softw...,"10,000 or more employees",Data scientist or machine learning specialist;...,22,14,20,Very satisfied,Slightly satisfied,Very confident,Yes,No,"I’m not actively looking, but I am open to new...",1-2 years ago,Write any code;Complete a take-home project;In...,Yes,"Languages, frameworks, and other technologies ...",I was preparing for a job search,USD,United States dollar,140000.0,Weekly,2000000.0,50.0,There is a schedule and/or spec (made by me or...,Being tasked with non-development work;Non-wor...,A few days each month,Office,A little above average,"Yes, because I see value in code review",10.0,"Yes, it's not part of our process but the deve...","The CTO, CIO, or other management purchase new...",I have little or no influence,Bash/Shell/PowerShell;JavaScript;Python;SQL,Go;JavaScript;Python;TypeScript,PostgreSQL;SQLite,Couchbase;MongoDB;PostgreSQL;Redis;SQLite,AWS;Docker;Linux;MacOS;Slack,AWS;Docker;Google Cloud Platform;Linux;MacOS,Express;Flask;React.js,Flask;React.js,Node.js;Pandas;TensorFlow;Torch/PyTorch,Node.js;Torch/PyTorch,PyCharm;Vim;Visual Studio Code,MacOS,Development;Testing;Production,Not at all,A passing fad,No,Also Yes,Yes,Reddit,In real life (in person),Username,I don't remember,Daily or almost daily,Find answers to specific questions;Learn how t...,3-5 times per week,Stack Overflow was much faster,0-10 minutes,Yes,I have never participated in Q&A on Stack Over...,"No, I knew that Stack Overflow had a job board...","No, and I don't know what those are","No, not really",Just as welcome now as I felt last year,Courses on technologies you're interested in,40.0,Man,No,Straight / Heterosexual,White or of European descent,Yes,Appropriate in length,Easy
770,I am a developer by profession,Yes,Less than once per year,The quality of OSS and closed source software ...,Employed full-time,United States,No,"Master’s degree (MA, MS, M.Eng., MBA, etc.)","A humanities discipline (ex. literature, histo...",Taken an online course in programming or softw...,"1,000 to 4,999 employees","Developer, full-stack",5,17,3,Very satisfied,Very satisfied,Somewhat confident,No,Not sure,I am not interested in new job opportunities,1-2 years ago,Interview with people in peer roles;Interview ...,No,Office environment or company culture;Remote w...,"Something else changed (education, award, medi...",USD,United States dollar,68000.0,Weekly,2000000.0,38.0,There is a schedule and/or spec (made by me or...,Being tasked with non-development work;Non-wor...,All or almost all the time (I'm full-time remote),Home,Average,"Yes, because I see value in code review",6.0,"Yes, it's part of our process",Developers and management have nearly equal in...,I have little or no influence,Go;HTML/CSS;JavaScript;Python;Ruby;SQL,Go;HTML/CSS;Python;Ruby;TypeScript;WebAssembly,MariaDB;MySQL;SQLite,Elasticsearch;MariaDB;PostgreSQL,AWS;Docker;Linux;Raspberry Pi;Slack,Arduino;AWS;Docker;Linux;Raspberry Pi;Slack,Angular/Angular.js;jQuery;Ruby on Rails,Angular/Angular.js;Django;Ruby on Rails,Node.js,Apache Spark;TensorFlow;Torch/PyTorch,PyCharm;RubyMine;Sublime Text,Linux-based,Development;Testing,Non-currency applications of blockchain,,No,SIGH,Yes,Reddit,Online,Username,2011,Daily or almost daily,Find answers to specific questions;Learn how t...,1-2 times per week,Stack Overflow was slightly faster,31-60 minutes,Yes,I have never participated in Q&A on Stack Over...,Yes,"No, I've heard of them, but I am not part of a...",Neutral,Just as welcome now as I felt last year,Tech articles written by other developers;Tech...,29.0,Man,No,Straight / Heterosexual,White or of European descent,No,Appropriate in length,Easy
789,I am a developer by profession,Yes,Less than once per year,The quality of OSS and closed source software ...,Employed full-time,United States,"Yes, full-time","Master’s degree (MA, MS, M.Eng., MBA, etc.)","Computer science, computer engineering, or sof...",Taken an online course in programming or softw...,"10,000 or more employees","Developer, back-end;Engineer, site reliability",12,9,6,Slightly dissatisfied,Neither satisfied nor dissatisfied,Not at all confident,Yes,Not sure,"I’m not actively looking, but I am open to new...",3-4 years ago,"Write any code;Write code by hand (e.g., on a ...",No,Specific department or team I'd be working on;...,I had a negative experience or interaction at ...,USD,United States dollar,180000.0,Monthly,2000000.0,40.0,There is a schedule and/or spec (made by me or...,Lack of support from management;Toxic work env...,A few days each month,Office,A little above average,"Yes, because I see value in code review",2.0,"Yes, it's not part of our process but the deve...",Developers and management have nearly equal in...,I have little or no influence,C#;F#;Python,F#;Go;Python;Rust,Elasticsearch;Other(s):,Other(s):,Docker;Microsoft Azure,Docker,ASP.NET,,.NET;.NET Core,.NET Core;TensorFlow,IntelliJ;Notepad++;Visual Studio;Visual Studio...,MacOS,Development;Testing;Production,Non-currency applications of blockchain,An irresponsible use of resources,No,"Fortunately, someone else has that title",Yes,Facebook,Online,Handle,2009,Less than once per month or monthly,Find answers to specific questions,Less than once per week,Stack Overflow was slightly faster,31-60 minutes,Yes,I have never participated in Q&A on Stack Over...,"No, I knew that Stack Overflow had a job board...","No, I've heard of them, but I am not part of a...","No, not really",Just as welcome now as I felt last year,Tech articles written by other developers;Cour...,31.0,Woman,No,Bisexual,South Asian,No,Appropriate in length,Neither easy nor difficult
1232,I am a developer by profession,No,Never,"OSS is, on average, of HIGHER quality than pro...",Employed full-time,United States,No,"Bachelor’s degree (BA, BS, B.Eng., etc.)","Computer science, computer engineering, or sof...",Taken an online course in programming or softw...,2-9 employees,"Developer, front-end;Developer, game or graphi...",4,15,2,Slightly satisfied,Slightly dissatisfied,Somewhat confident,Yes,Yes,"I’m not actively looking, but I am open to new...",1-2 years ago,"Write code by hand (e.g., on a whiteboard);Sol...",No,Industry that I'd be working in;Specific depar...,"Something else changed (education, award, medi...",USD,United States dollar,800000.0,Weekly,2000000.0,40.0,There is a schedule and/or spec (made by me or...,Distracting work environment;Non-work commitme...,A few days each month,Office,A little below average,"Yes, because I see value in code review",5.0,"No, but I think we should",Developers and management have nearly equal in...,I have little or no influence,HTML/CSS;JavaScript;SQL,Java;Kotlin;Objective-C;Python;Swift;TypeScrip...,MySQL,MongoDB,,Android;AWS;iOS;MacOS;Windows,React.js,React.js,Node.js;Other(s):,Node.js;Other(s):,Visual Studio Code,MacOS,I do not use containers,Non-currency applications of blockchain,Useful across many domains and could change ma...,Yes,Yes,What?,WeChat 微信,In real life (in person),Username,2014,A few times per week,Find answers to specific questions,1-2 times per week,Stack Overflow was slightly faster,11-30 minutes,Yes,Less than once per month or monthly,Yes,"No, and I don't know what those are","Yes, somewhat",Just as welcome now as I felt last year,Tech articles written by other developers,25.0,Man,No,Straight / Heterosexual,East Asian,No,Appropriate in length,Neither easy nor difficult


In [125]:
df['ConvertedComp'].nsmallest(10)

Respondent
280     0.0
293     0.0
722     0.0
1105    0.0
1501    0.0
1685    0.0
1782    0.0
2019    0.0
2095    0.0
2665    0.0
Name: ConvertedComp, dtype: float64

Gets the 10 smallest values in the column, labelled by respondent ID (`nlargest()` does the opposite).

<br>
<hr>
<a id='section_8'></a>

### Section 8 - Grouping and Aggregating - Analyzing and Exploring Your Data
- <a href="https://youtu.be/txMdrV1Ut64">YouTube Tutorial</a>

In [126]:
df['ConvertedComp'].median()

57287.0

Gets the median of the column, ignoring NaN (missing) values.
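The NaN handling is controlled by the `skipna` parameter, which defaults to `True`. A quick sketch on a made-up Series shows the difference:

```python
import numpy as np
import pandas as pd

comp = pd.Series([40000.0, np.nan, 60000.0, 90000.0, np.nan])

# Default skipna=True: NaN values are dropped, so this is the median of 40000, 60000, 90000
print(comp.median())
# skipna=False: any NaN in the data makes the result NaN
print(comp.median(skipna=False))
```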

In [127]:
df.median(numeric_only=True)

CompTotal        62000.0
ConvertedComp    57287.0
WorkWeekHrs         40.0
CodeRevHrs           4.0
Age                 29.0
dtype: float64

Goes through the entire DataFrame and calculates the median of every numeric column.
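Older pandas versions silently dropped text columns here, but pandas 2.0 and later raise a `TypeError` instead, so passing `numeric_only=True` is the portable way to get one median per numeric column. A sketch with a small, made-up DataFrame:

```python
import numpy as np
import pandas as pd

df_small = pd.DataFrame({
    "Country": ["Canada", "India", "Spain", "Canada"],
    "ConvertedComp": [70000.0, 12000.0, np.nan, 95000.0],
    "Age": [29.0, 24.0, 35.0, 41.0],
})

# Only numeric columns are used; the text column "Country" is skipped
# and NaN values are ignored within each column
medians = df_small.median(numeric_only=True)
print(medians)
```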

In [128]:
df.describe()

Unnamed: 0,CompTotal,ConvertedComp,WorkWeekHrs,CodeRevHrs,Age
count,55945.0,55823.0,64503.0,49790.0,79210.0
mean,551901400000.0,127110.7,42.127197,5.084308,30.336699
std,73319260000000.0,284152.3,37.28761,5.513931,9.17839
min,0.0,0.0,1.0,0.0,1.0
25%,20000.0,25777.5,40.0,2.0,24.0
50%,62000.0,57287.0,40.0,4.0,29.0
75%,120000.0,100000.0,44.75,6.0,35.0
max,1e+16,2000000.0,4850.0,99.0,99.0


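Gets more in-depth summary statistics (count, mean, standard deviation, min, quartiles, max) for every numeric column. Note that `count` only counts non-missing values.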
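By default `describe()` on a DataFrame only summarises numeric columns. Calling it on a single text column returns a different summary (count, number of unique values, most common value and its frequency). A sketch on made-up data:

```python
import pandas as pd

df_small = pd.DataFrame({
    "WorkWeekHrs": [40.0, 45.0, 38.0, 60.0],
    "Hobbyist": ["Yes", "No", "Yes", "Yes"],
})

# Numeric summary: the text column "Hobbyist" is left out
num_summary = df_small.describe()
# Text summary for one column: count, unique, top (most common value), freq
text_summary = df_small["Hobbyist"].describe()
print(num_summary)
print(text_summary)
```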

In [129]:
df['Hobbyist'].value_counts()

Yes    71257
No     17626
Name: Hobbyist, dtype: int64

Counts how many times each unique value appears in the column, sorted from most to least common.
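`value_counts()` leaves out missing answers by default. Passing `dropna=False` counts NaN as its own entry, which is handy for seeing how many respondents skipped a question. A sketch on a made-up Series:

```python
import numpy as np
import pandas as pd

hobbyist = pd.Series(["Yes", "No", "Yes", np.nan, "Yes"], name="Hobbyist")

counts = hobbyist.value_counts()                       # NaN excluded by default
counts_with_nan = hobbyist.value_counts(dropna=False)  # NaN counted as its own entry
print(counts)
print(counts_with_nan)
```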

In [130]:
df['SocialMedia'].value_counts()

Reddit                      14374
YouTube                     13830
WhatsApp                    13347
Facebook                    13178
Twitter                     11398
Instagram                    6261
I don't use social media     5554
LinkedIn                     4501
WeChat 微信                     667
Snapchat                      628
VK ВКонта́кте                 603
Weibo 新浪微博                     56
Youku Tudou 优酷                 21
Hello                          19
Name: SocialMedia, dtype: int64

In [131]:
df['SocialMedia'].value_counts(normalize=True)

Reddit                      0.170233
YouTube                     0.163791
WhatsApp                    0.158071
Facebook                    0.156069
Twitter                     0.134988
Instagram                   0.074150
I don't use social media    0.065777
LinkedIn                    0.053306
WeChat 微信                   0.007899
Snapchat                    0.007437
VK ВКонта́кте               0.007141
Weibo 新浪微博                  0.000663
Youku Tudou 优酷              0.000249
Hello                       0.000225
Name: SocialMedia, dtype: float64

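Passing `normalize=True` returns each value's share of the total (a proportion between 0 and 1) instead of its raw count. Because missing values are dropped, the shares are out of the non-missing answers only (e.g. Reddit: 14374 of 84437 answers ≈ 0.170).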
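To read these proportions as percentages, multiply by 100 and round. A minimal sketch on a made-up Series, where one of the five answers is missing, so the shares are out of the remaining four:

```python
import numpy as np
import pandas as pd

social = pd.Series(["Reddit", "YouTube", "Reddit", np.nan, "Twitter"], name="SocialMedia")

shares = social.value_counts(normalize=True)  # proportions of the 4 non-missing answers
percent = (shares * 100).round(1)             # same shares expressed as percentages
print(percent)
```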

In [132]:
df['Country']

Respondent
1                United Kingdom
2        Bosnia and Herzegovina
3                      Thailand
4                 United States
5                       Ukraine
                  ...          
88377                    Canada
88601                       NaN
88802                       NaN
88816                       NaN
88863                     Spain
Name: Country, Length: 88883, dtype: object

In [133]:
df['Country'].value_counts()

United States                       20949
India                                9061
Germany                              5866
United Kingdom                       5737
Canada                               3395
                                    ...  
Tonga                                   1
Niger                                   1
Saint Vincent and the Grenadines        1
North Korea                             1
Chad                                    1
Name: Country, Length: 179, dtype: int64

In [134]:
country_grp = df.groupby('Country')

In [135]:
country_grp.get_group('United States')

Respondent,MainBranch,Hobbyist,OpenSourcer,OpenSource,Employment,Country,Student,EdLevel,UndergradMajor,EduOther,OrgSize,DevType,YearsCode,Age1stCode,YearsCodePro,CareerSat,JobSat,MgrIdiot,MgrMoney,MgrWant,JobSeek,LastHireDate,LastInt,FizzBuzz,JobFactors,ResumeUpdate,CurrencySymbol,CurrencyDesc,CompTotal,CompFreq,ConvertedComp,WorkWeekHrs,WorkPlan,WorkChallenge,WorkRemote,WorkLoc,ImpSyn,CodeRev,CodeRevHrs,UnitTests,PurchaseHow,PurchaseWhat,LanguageWorkedWith,LanguageDesireNextYear,DatabaseWorkedWith,DatabaseDesireNextYear,PlatformWorkedWith,PlatformDesireNextYear,WebFrameWorkedWith,WebFrameDesireNextYear,MiscTechWorkedWith,MiscTechDesireNextYear,DevEnviron,OpSys,Containers,BlockchainOrg,BlockchainIs,BetterLife,ITperson,OffOn,SocialMedia,Extraversion,ScreenName,SOVisit1st,SOVisitFreq,SOVisitTo,SOFindAnswer,SOTimeSaved,SOHowMuchTime,SOAccount,SOPartFreq,SOJobs,EntTeams,SOComm,WelcomeChange,SONewContent,Age,Gender,Trans,Sexuality,Ethnicity,Dependents,SurveyLength,SurveyEase
4,I am a developer by profession,No,Never,The quality of OSS and closed source software ...,Employed full-time,United States,No,"Bachelor’s degree (BA, BS, B.Eng., etc.)","Computer science, computer engineering, or sof...",Taken an online course in programming or softw...,100 to 499 employees,"Developer, full-stack",3,16,Less than 1 year,Very satisfied,Slightly satisfied,Very confident,No,Not sure,I am not interested in new job opportunities,Less than a year ago,"Write code by hand (e.g., on a whiteboard);Int...",No,"Languages, frameworks, and other technologies ...",I was preparing for a job search,USD,United States dollar,61000.0,Yearly,61000.0,80.0,There's no schedule or spec; I work on what se...,,Less than once per month / Never,Home,A little below average,No,,"No, but I think we should",Developers typically have the most influence o...,I have little or no influence,C;C++;C#;Python;SQL,C;C#;JavaScript;SQL,MySQL;SQLite,MySQL;SQLite,Linux;Windows,Linux;Windows,,,.NET,.NET,Eclipse;Vim;Visual Studio;Visual Studio Code,Windows,I do not use containers,Not at all,"Useful for decentralized currency (i.e., Bitcoin)",Yes,SIGH,Yes,Reddit,In real life (in person),Username,2014,Daily or almost daily,Find answers to specific questions;Pass the ti...,1-2 times per week,Stack Overflow was much faster,31-60 minutes,Yes,Less than once per month or monthly,Yes,"No, and I don't know what those are","No, not really",Just as welcome now as I felt last year,Tech articles written by other developers;Indu...,22.0,Man,No,Straight / Heterosexual,White or of European descent,No,Appropriate in length,Easy
13,I am a developer by profession,Yes,Less than once a month but more than once per ...,"OSS is, on average, of HIGHER quality than pro...",Employed full-time,United States,No,"Master’s degree (MA, MS, M.Eng., MBA, etc.)","Computer science, computer engineering, or sof...",Taken an online course in programming or softw...,10 to 19 employees,Data or business analyst;Database administrato...,17,11,8,Very satisfied,Very satisfied,,,,I am not interested in new job opportunities,3-4 years ago,Complete a take-home project;Interview with pe...,Yes,"Languages, frameworks, and other technologies ...",I was preparing for a job search,USD,United States dollar,90000.0,Yearly,90000.0,40.0,There is a schedule and/or spec (made by me or...,"Meetings;Non-work commitments (parenting, scho...",All or almost all the time (I'm full-time remote),Home,A little above average,"Yes, because I see value in code review",5.0,"No, but I think we should",Developers and management have nearly equal in...,I have a great deal of influence,Bash/Shell/PowerShell;HTML/CSS;JavaScript;PHP;...,Bash/Shell/PowerShell;HTML/CSS;JavaScript;Rust...,Couchbase;DynamoDB;Firebase;MySQL,Firebase;MySQL;Redis,Android;AWS;Docker;IBM Cloud or Watson;iOS;Lin...,Android;AWS;Docker;IBM Cloud or Watson;Linux;S...,Angular/Angular.js;ASP.NET;Express;jQuery;Vue.js,Express;Vue.js,Node.js;Xamarin,Node.js;TensorFlow,Vim;Visual Studio;Visual Studio Code;Xcode,Windows,Development;Testing;Production,Not at all,"Useful for decentralized currency (i.e., Bitcoin)",Yes,Yes,Yes,Twitter,In real life (in person),Username,2011,Multiple times per day,Find answers to specific questions,More than 10 times per week,Stack Overflow was much faster,11-30 minutes,Yes,Less than once per month or monthly,Yes,"No, I've heard of them, but I am not part of a...",Neutral,Somewhat more welcome now than last year,Tech articles written by other developers;Cour...,28.0,Man,No,Straight / Heterosexual,White or of European descent,Yes,Appropriate in length,Easy
22,I am a developer by profession,Yes,Less than once per year,"OSS is, on average, of HIGHER quality than pro...",Employed full-time,United States,No,Some college/university study without earning ...,,Taken an online course in programming or softw...,"10,000 or more employees","Data or business analyst;Designer;Developer, b...",35,12,18,Slightly satisfied,Very dissatisfied,Somewhat confident,No,No,"I’m not actively looking, but I am open to new...",More than 4 years ago,Interview with people in senior / management r...,No,Industry that I'd be working in;Financial perf...,I had a negative experience or interaction at ...,USD,United States dollar,103000.0,Yearly,103000.0,40.0,There is a schedule and/or spec (made by me or...,Being tasked with non-development work;Meeting...,"Less than half the time, but at least one day ...",Home,Average,No,,"No, but I think we should","The CTO, CIO, or other management purchase new...",I have little or no influence,Bash/Shell/PowerShell;C++;HTML/CSS;JavaScript;...,Bash/Shell/PowerShell;C++;HTML/CSS;JavaScript;...,Elasticsearch;MySQL;Oracle;Redis,Elasticsearch;MySQL;Oracle;Redis,Docker;Linux;Raspberry Pi;Windows,Docker;Linux;Raspberry Pi;Windows,Angular/Angular.js;Ruby on Rails,Angular/Angular.js;Ruby on Rails,Node.js,Node.js,Sublime Text;Visual Studio;Visual Studio Code,Windows,"Outside of work, for personal projects",Not at all,,Yes,Yes,Yes,Instagram,Online,Username,I don't remember,Daily or almost daily,Find answers to specific questions,3-5 times per week,Stack Overflow was much faster,0-10 minutes,Yes,A few times per week,Yes,"No, and I don't know what those are","Yes, somewhat",Just as welcome now as I felt last year,Tech articles written by other developers;Indu...,47.0,Man,No,Straight / Heterosexual,White or of European descent,Yes,Appropriate in length,Easy
23,I am a developer by profession,Yes,Less than once per year,The quality of OSS and closed source software ...,Employed full-time,United States,No,"Bachelor’s degree (BA, BS, B.Eng., etc.)","Information systems, information technology, o...",Taken an online course in programming or softw...,"10,000 or more employees","Developer, full-stack",3,19,1,Slightly satisfied,Slightly satisfied,Very confident,No,Not sure,"I’m not actively looking, but I am open to new...",Less than a year ago,"Write any code;Write code by hand (e.g., on a ...",No,Opportunities for professional development;How...,I was preparing for a job search,USD,United States dollar,69000.0,Yearly,69000.0,40.0,There is a schedule and/or spec (made by me or...,Distracting work environment;Meetings;Non-work...,A few days each month,Office,Average,"Yes, because I see value in code review",8.0,"Yes, it's part of our process",Developers and management have nearly equal in...,I have little or no influence,Bash/Shell/PowerShell;HTML/CSS;JavaScript;Pyth...,Bash/Shell/PowerShell;Go;HTML/CSS;Java;JavaScr...,Oracle;SQLite,Couchbase;DynamoDB;Elasticsearch;Firebase;Oracle,Docker;Google Cloud Platform,Docker;iOS;Slack,React.js;Ruby on Rails,Express;React.js;Ruby on Rails;Vue.js,,React Native;TensorFlow,Visual Studio Code,MacOS,Development;Testing;Production,,Useful for immutable record keeping outside of...,Yes,SIGH,Yes,Reddit,In real life (in person),Username,2014,Multiple times per day,Find answers to specific questions;Learn how t...,6-10 times per week,They were about the same,,Yes,I have never participated in Q&A on Stack Over...,Yes,"No, I've heard of them, but I am not part of a...","No, not really",Just as welcome now as I felt last year,Tech articles written by other developers;Tech...,22.0,Man,No,Straight / Heterosexual,Black or of African descent,No,Appropriate in length,Easy
26,I am a developer by profession,Yes,Less than once per year,The quality of OSS and closed source software ...,Employed full-time,United States,No,Some college/university study without earning ...,"Computer science, computer engineering, or sof...","Taught yourself a new language, framework, or ...","10,000 or more employees","Designer;Developer, back-end;Developer, deskto...",12,8,8,Very satisfied,Very satisfied,,,,"I’m not actively looking, but I am open to new...",Less than a year ago,Interview with people in peer roles;Interview ...,No,Remote work options;Diversity of the company o...,I was preparing for a job search,USD,United States dollar,114000.0,Yearly,114000.0,40.0,There is a schedule and/or spec (made by me or...,Being tasked with non-development work;Meeting...,"Less than half the time, but at least one day ...",Home,Far above average,"Yes, because I see value in code review",2.0,"Yes, it's not part of our process but the deve...",Developers typically have the most influence o...,I have a great deal of influence,Bash/Shell/PowerShell;C++;C#;HTML/CSS;JavaScri...,C#;HTML/CSS;JavaScript;Objective-C;Ruby;SQL;Sw...,Microsoft SQL Server;MySQL;Redis;SQLite,Microsoft SQL Server;MySQL;Redis;SQLite,AWS;Docker;Linux;MacOS;Microsoft Azure;Windows...,Android;Docker;iOS;Linux;MacOS;Microsoft Azure...,Angular/Angular.js;ASP.NET;Drupal;Express;jQue...,Angular/Angular.js;ASP.NET,.NET;.NET Core;Node.js;Xamarin,.NET;.NET Core;Node.js,Notepad++;Sublime Text;Vim;Visual Studio;Xcode,MacOS,Development;Testing,Not at all,A passing fad,Yes,SIGH,Yes,I don't use social media,In real life (in person),Username,2008,Daily or almost daily,Find answers to specific questions;Learn how t...,3-5 times per week,Stack Overflow was much faster,11-30 minutes,Yes,Less than once per month or monthly,Yes,"No, I've heard of them, but I am not part of a...",Neutral,Just as welcome now as I felt last year,,34.0,Man,No,Gay or Lesbian,,No,Appropriate in length,Easy
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
78292,,No,Once a month or more often,"OSS is, on average, of HIGHER quality than pro...","Independent contractor, freelancer, or self-em...",United States,No,"Other doctoral degree (Ph.D, Ed.D., etc.)","A health science (ex. nursing, pharmacy, radio...",Completed an industry certification program (e...,"Just me - I am a freelancer, sole proprietor, ...",Academic researcher,42,14,31,,,,,,,,,,,,,,,,,,,,,,,,,,,,Bash/Shell/PowerShell;C;Python,Bash/Shell/PowerShell;C;Python,SQLite,SQLite,Linux;Raspberry Pi;Other(s):,Linux;Raspberry Pi;Other(s):,,,Chef,,Emacs;IPython / Jupyter,Linux-based,I do not use containers,,Useful for immutable record keeping outside of...,No,Yes,Yes,I don't use social media,In real life (in person),,2013,A few times per week,Find answers to specific questions,Less than once per week,The other resource was slightly faster,11-30 minutes,Not sure / can't remember,,"No, I didn't know that Stack Overflow had a jo...","No, and I don't know what those are","No, not really",Somewhat less welcome now than last year,,60.0,Man,No,Straight / Heterosexual,White or of European descent,Yes,Too long,Neither easy nor difficult
82717,,No,Less than once per year,The quality of OSS and closed source software ...,"Not employed, but looking for work",United States,No,"Secondary school (e.g. American high school, G...",,,,,Less than 1 year,,Less than 1 year,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Android;Windows,Android;Microsoft Azure;Windows,,,,,,MacOS,Testing,,,No,SIGH,Yes,Facebook,In real life (in person),Username,2018,Less than once per month or monthly,Find answers to specific questions,Less than once per week,,60+ minutes,No,,"No, I knew that Stack Overflow had a job board...","No, I've heard of them, but I am not part of a...",Not sure,,Industry news about technologies you're intere...,44.0,Man,No,Straight / Heterosexual,White or of European descent,Yes,Appropriate in length,Neither easy nor difficult
83397,,Yes,Less than once per year,,"Not employed, but looking for work",United States,No,"Bachelor’s degree (BA, BS, B.Eng., etc.)","Computer science, computer engineering, or sof...",Taken an online course in programming or softw...,,,12,9,Less than 1 year,,,,,,,,,,,,,,,,,,,,,,,,,,,,HTML/CSS;JavaScript;Python;SQL,C;C++;C#;Go;Java;JavaScript;Python;R;Ruby;SQL;...,,,Android;Arduino;Slack,Android;Arduino;Docker;iOS;Raspberry Pi;Slack,Flask,Django;Drupal;Flask;jQuery;React.js,,Chef;Torch/PyTorch,Eclipse;IPython / Jupyter;Sublime Text,MacOS,I do not use containers,,,,SIGH,Yes,,,Handle,I don't remember,A few times per week,Find answers to specific questions;Learn how t...,3-5 times per week,They were about the same,,Not sure / can't remember,,Yes,"No, and I don't know what those are","No, not at all",Just as welcome now as I felt last year,,27.0,Woman,No,Bisexual,White or of European descent,No,Appropriate in length,Easy
85642,,No,Less than once per year,"OSS is, on average, of LOWER quality than prop...","Independent contractor, freelancer, or self-em...",United States,No,Associate degree,"Information systems, information technology, o...",Taken an online course in programming or softw...,"Just me - I am a freelancer, sole proprietor, ...",Designer;Marketing or sales professional,20,7,Less than 1 year,,,,,,,,,,,,,,,,,,,,,,,,,,,,Go;HTML/CSS,,,,,,,,,,Visual Studio Code,Windows,I do not use containers,,Useful for immutable record keeping outside of...,No,SIGH,Yes,,In real life (in person),Handle,2008,Less than once per month or monthly,Find answers to specific questions,Less than once per week,Stack Overflow was slightly faster,60+ minutes,Yes,I have never participated in Q&A on Stack Over...,"No, I knew that Stack Overflow had a job board...","No, and I don't know what those are","No, not at all",Just as welcome now as I felt last year,Tech articles written by other developers;Indu...,34.0,"Non-binary, genderqueer, or gender non-conforming",,Bisexual;Gay or Lesbian,White or of European descent,No,Appropriate in length,Easy


In [136]:
country_grp.get_group('United States').median(numeric_only=True)

CompTotal        102000.0
ConvertedComp    110000.0
WorkWeekHrs          40.0
CodeRevHrs            4.0
Age                  31.0
dtype: float64

In [137]:
country_grp['SocialMedia'].value_counts().loc['United States']

SocialMedia
Reddit                      5700
Twitter                     3468
Facebook                    2844
YouTube                     2463
I don't use social media    1851
Instagram                   1652
LinkedIn                    1020
WhatsApp                     609
Snapchat                     326
WeChat 微信                     93
VK ВКонта́кте                  9
Weibo 新浪微博                     8
Hello                          2
Youku Tudou 优酷                 1
Name: SocialMedia, dtype: int64

In [138]:
country_grp['ConvertedComp'].median()

Country
Afghanistan                               6222.0
Albania                                  10818.0
Algeria                                   7878.0
Andorra                                 160931.0
Angola                                    7764.0
                                          ...   
Venezuela, Bolivarian Republic of...      6384.0
Viet Nam                                 11892.0
Yemen                                    11940.0
Zambia                                    5040.0
Zimbabwe                                 19200.0
Name: ConvertedComp, Length: 179, dtype: float64

In [139]:
country_grp['ConvertedComp'].median().loc['United States']

110000.0

In [140]:
country_grp['ConvertedComp'].agg(['median','mean']).loc[['United States', 'Canada']]

Country,median,mean
United States,110000.0,249546.254589
Canada,68705.0,134018.564909


In [141]:
country_grp['ConvertedComp'].agg(['median','mean']).loc['United States']

median    110000.000000
mean      249546.254589
Name: United States, dtype: float64

In [142]:
filt = df['Country'] == 'United States'
df.loc[filt, 'LanguageWorkedWith'].str.contains('Python').sum()

10083

Find out how many respondents in each country have worked with Python.

In [143]:
country_grp['LanguageWorkedWith'].apply(lambda x: x.str.contains('Python').sum())

Country
Afghanistan                              8
Albania                                 23
Algeria                                 40
Andorra                                  0
Angola                                   2
                                        ..
Venezuela, Bolivarian Republic of...    28
Viet Nam                                78
Yemen                                    3
Zambia                                   4
Zimbabwe                                14
Name: LanguageWorkedWith, Length: 179, dtype: int64

This is the same count as the United States cell above, computed for every country at once: apply runs the lambda on each country's group.
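The grouped count works because `apply` hands each country's slice of `LanguageWorkedWith` to the lambda as a Series of `;`-separated strings. A minimal self-contained sketch on a small, made-up DataFrame (the rows and the names `sample` and `sample_grp` are hypothetical, not survey data). Passing `na=False` to `str.contains` treats blank answers as "not Python" explicitly, instead of relying on `sum()` skipping NaN; the counts come out the same either way.

```python
import numpy as np
import pandas as pd

# Hypothetical sample shaped like the survey columns used above
sample = pd.DataFrame({
    'Country': ['A', 'A', 'A', 'B', 'B'],
    'LanguageWorkedWith': ['Python;SQL', 'C;Java', np.nan, 'Go;Python', 'Python'],
})

sample_grp = sample.groupby('Country')

# Each group's column arrives in the lambda as a Series; missing answers count as False
python_counts = sample_grp['LanguageWorkedWith'].apply(
    lambda x: x.str.contains('Python', na=False).sum()
)
print(python_counts)
```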

In [144]:
country_resp = df['Country'].value_counts()
country_resp

United States                       20949
India                                9061
Germany                              5866
United Kingdom                       5737
Canada                               3395
                                    ...  
Tonga                                   1
Niger                                   1
Saint Vincent and the Grenadines        1
North Korea                             1
Chad                                    1
Name: Country, Length: 179, dtype: int64

In [145]:
country_python_yes = country_grp['LanguageWorkedWith'].apply(lambda x: x.str.contains('Python').sum())
python_per = (country_python_yes/country_resp)*100
country_python_yes

Country
Afghanistan                              8
Albania                                 23
Algeria                                 40
Andorra                                  0
Angola                                   2
                                        ..
Venezuela, Bolivarian Republic of...    28
Viet Nam                                78
Yemen                                    3
Zambia                                   4
Zimbabwe                                14
Name: LanguageWorkedWith, Length: 179, dtype: int64

In [146]:
python_df = pd.concat([country_resp,country_python_yes,python_per],axis='columns')
python_df

Unnamed: 0,Country,LanguageWorkedWith,0
United States,20949,10083,48.131176
India,9061,3105,34.267741
Germany,5866,2451,41.783157
United Kingdom,5737,2384,41.554820
Canada,3395,1558,45.891016
...,...,...,...
Tonga,1,0,0.000000
Niger,1,1,100.000000
Saint Vincent and the Grenadines,1,0,0.000000
North Korea,1,0,0.000000


In [147]:
python_df.rename(columns={'Country':'N_Respon','LanguageWorkedWith':'Know_Python',0:'Percent'},inplace=True)
python_df.sort_values(by=['N_Respon','Percent'],ascending=[False,False]).head(10)

Unnamed: 0,N_Respon,Know_Python,Percent
United States,20949,10083,48.131176
India,9061,3105,34.267741
Germany,5866,2451,41.783157
United Kingdom,5737,2384,41.55482
Canada,3395,1558,45.891016
France,2391,1054,44.081974
Brazil,1948,767,39.373717
Poland,1922,751,39.073881
Australia,1903,790,41.5134
Netherlands,1852,767,41.414687


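The table above ranks countries by number of respondents and shows the percentage of each country's respondents who have worked with Python. Calling .mean() on it averages each column across all 179 countries.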
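The rename step depends on the column names that `concat` happens to produce. In the pandas version used in this notebook, `value_counts()` named its result after the column (`Country`), which is why the rename maps `'Country'` to `'N_Respon'`; pandas 2.0 and later name it `count`, so that key would silently stop matching. A sketch that sidesteps this by naming the columns directly with `keys=` (the small Series below are hypothetical stand-ins for `country_resp` and `country_python_yes`):

```python
import pandas as pd

# Hypothetical stand-ins for country_resp and country_python_yes
resp = pd.Series({'United States': 4, 'India': 2, 'Niger': 1})
python_yes = pd.Series({'United States': 2, 'India': 1, 'Niger': 1})
percent = python_yes / resp * 100

# keys= names the resulting columns, so no version-dependent rename is needed
summary = pd.concat([resp, python_yes, percent], axis='columns',
                    keys=['N_Respon', 'Know_Python', 'Percent'])
summary = summary.sort_values(by=['N_Respon', 'Percent'], ascending=[False, False])
print(summary)
```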

In [148]:
python_df.mean()

N_Respon       495.815642
Know_Python    203.592179
Percent         35.750651
dtype: float64
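The Percent value from `.mean()` is an unweighted average of the country percentages, so a country with one respondent counts as much as one with 20,949. A sketch comparing that with the overall share of all respondents who have worked with Python, using hypothetical numbers in the same shape as `python_df`:

```python
import pandas as pd

# Hypothetical per-country counts, shaped like python_df above
summary = pd.DataFrame(
    {'N_Respon': [100, 1], 'Know_Python': [40, 1]},
    index=['Big country', 'Tiny country'],
)
summary['Percent'] = summary['Know_Python'] / summary['N_Respon'] * 100

# Unweighted: every country counts equally, regardless of respondents
unweighted = summary['Percent'].mean()

# Weighted: share of all respondents who have worked with Python
overall = summary['Know_Python'].sum() / summary['N_Respon'].sum() * 100

print(unweighted, overall)
```

Here the single respondent in the tiny country pulls the unweighted mean up to 70%, while the overall share stays near 40.6%.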

<br>
<hr>
<a id='section_9'></a>

### Section 9 - Cleaning Data - Casting Datatypes and Handling Missing Values
- <a href="https://youtu.be/KdmPHEnPJPs">YouTube Tutorial</a>

In [152]:
import numpy as np
people = {
    'first':['Corey', 'Jane', 'John', 'Chris', np.nan, None, 'NA'],
    'last':['Schafer','Doe','Doe','Schafer', np.nan, np.nan, 'Missing'],
    'email':['CoreyMSchafer@gmail.com','JaneDoe@email.com','JohnDoe@email.com',None,np.nan,'Anonymous@email.com','NA'],
    'age':['33','55','63','36',None,None,'Missing']
}
df2 = pd.DataFrame(people)

In [153]:
df2

Unnamed: 0,first,last,email,age
0,Corey,Schafer,CoreyMSchafer@gmail.com,33
1,Jane,Doe,JaneDoe@email.com,55
2,John,Doe,JohnDoe@email.com,63
3,Chris,Schafer,,36
4,,,,
5,,,Anonymous@email.com,
6,NA,Missing,NA,Missing


In [154]:
df2.dropna()

Unnamed: 0,first,last,email,age
0,Corey,Schafer,CoreyMSchafer@gmail.com,33
1,Jane,Doe,JaneDoe@email.com,55
2,John,Doe,JohnDoe@email.com,63
6,NA,Missing,NA,Missing


In [155]:
df2.dropna(axis='index',how='any')

Unnamed: 0,first,last,email,age
0,Corey,Schafer,CoreyMSchafer@gmail.com,33
1,Jane,Doe,JaneDoe@email.com,55
2,John,Doe,JohnDoe@email.com,63
6,NA,Missing,NA,Missing


These are the default arguments for <b>.dropna()</b>. axis='index' drops rows that contain missing values; axis='columns' would drop columns instead. how='any' drops a row (or column) if any of its values is missing, while how='all' drops it only if every value is missing.
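To see the axis and how options side by side, here is a minimal sketch on a small hypothetical DataFrame (the column names `a`, `b`, and `empty` are made up for illustration) with one all-missing row and one all-missing column:

```python
import numpy as np
import pandas as pd

# Hypothetical frame: row 1 is entirely NaN, and so is column 'empty'
sample = pd.DataFrame({
    'a': [1.0, np.nan, 3.0],
    'b': [np.nan, np.nan, 6.0],
    'empty': [np.nan, np.nan, np.nan],
})

rows_any = sample.dropna(axis='index', how='any')    # drop rows with any NaN
rows_all = sample.dropna(axis='index', how='all')    # drop only all-NaN rows
cols_all = sample.dropna(axis='columns', how='all')  # drop only all-NaN columns

print(rows_any)
print(rows_all)
print(cols_all)
```

Because every row has a NaN in `empty`, how='any' removes all of them, which is why how='all' or a subset is often the safer choice.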

In [156]:
df2.dropna(axis='index',how='any', subset=['email'])

Unnamed: 0,first,last,email,age
0,Corey,Schafer,CoreyMSchafer@gmail.com,33
1,Jane,Doe,JaneDoe@email.com,55
2,John,Doe,JohnDoe@email.com,63
5,,,Anonymous@email.com,
6,NA,Missing,NA,Missing


This removes any row that has a missing value in the email column.

In [157]:
df2.dropna(axis='index',how='all', subset=['last','email'])

Unnamed: 0,first,last,email,age
0,Corey,Schafer,CoreyMSchafer@gmail.com,33
1,Jane,Doe,JaneDoe@email.com,55
2,John,Doe,JohnDoe@email.com,63
3,Chris,Schafer,,36
5,,,Anonymous@email.com,
6,NA,Missing,NA,Missing


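This removes only the rows that are missing both last and email. Row 3 is kept because its last name is present, even though its email is missing.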
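The difference between how='any' and how='all' within a subset is easiest to see on three rows: both values present, only email missing, and both missing. A minimal sketch with hypothetical data:

```python
import numpy as np
import pandas as pd

# Hypothetical rows: both present, only email missing, both missing
sample = pd.DataFrame({
    'last': ['Doe', 'Schafer', np.nan],
    'email': ['JaneDoe@email.com', np.nan, np.nan],
})

# how='any': drop a row if last OR email is missing
kept_any = sample.dropna(how='any', subset=['last', 'email'])
# how='all': drop a row only if last AND email are both missing
kept_all = sample.dropna(how='all', subset=['last', 'email'])

print(list(kept_any.index))
print(list(kept_all.index))
```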

In [159]:
df2.replace('NA',np.nan, inplace=True)
df2.replace('Missing', np.nan, inplace=True)

In [160]:
df2

Unnamed: 0,first,last,email,age
0,Corey,Schafer,CoreyMSchafer@gmail.com,33
1,Jane,Doe,JaneDoe@email.com,55
2,John,Doe,JohnDoe@email.com,63
3,Chris,Schafer,,36
4,,,,
5,,,Anonymous@email.com,
6,,,,


In [161]:
df2.dropna(axis='index',how='any')

Unnamed: 0,first,last,email,age
0,Corey,Schafer,CoreyMSchafer@gmail.com,33
1,Jane,Doe,JaneDoe@email.com,55
2,John,Doe,JohnDoe@email.com,63


In [163]:
df2.isna()

Unnamed: 0,first,last,email,age
0,False,False,False,False
1,False,False,False,False
2,False,False,False,False
3,False,False,True,False
4,True,True,True,True
5,True,True,False,True
6,True,True,True,True


In [165]:
df2.fillna('0')

Unnamed: 0,first,last,email,age
0,Corey,Schafer,CoreyMSchafer@gmail.com,33
1,Jane,Doe,JaneDoe@email.com,55
2,John,Doe,JohnDoe@email.com,63
3,Chris,Schafer,0,36
4,0,0,0,0
5,0,0,Anonymous@email.com,0
6,0,0,0,0


In [166]:
df2.dtypes

first    object
last     object
email    object
age      object
dtype: object

Check the data type of each column: all four are object (strings), so age cannot be averaged yet.

In [170]:
df2['age'] = df2['age'].astype(float)

In [173]:
df2.dtypes

first     object
last      object
email     object
age      float64
dtype: object

In [174]:
df2['age'].mean()

46.75

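Casting age to float makes numeric operations such as .mean() possible. float is used instead of int because the column still contains NaN, and astype(int) raises an error on missing values.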
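astype(float) only succeeds here because the 'NA' and 'Missing' strings were replaced with NaN first; on the raw column it would raise a ValueError. An alternative sketch uses `pd.to_numeric(..., errors='coerce')`, which turns anything unparseable into NaN in one step. This assumes you are happy to treat every non-numeric entry as missing; the Series below is hypothetical, mirroring the raw age column.

```python
import numpy as np
import pandas as pd

# Hypothetical age column with the same mix of values as df2 before cleaning
ages = pd.Series(['33', '55', '63', '36', None, np.nan, 'Missing'])

# Strings that cannot be parsed ('Missing') become NaN instead of raising
ages_num = pd.to_numeric(ages, errors='coerce')

print(ages_num.dtype)
print(ages_num.mean())  # NaN values are skipped by mean()
```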

In [175]:
na_vals = ['NA','Missing']
df = pd.read_csv("developer_survey_2019/survey_results_public.csv", na_values=na_vals)

Passing na_values to read_csv converts these markers on input: any NA or Missing value in the file is read as NaN.
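'NA' is already in pandas' default list of missing-value markers, so only 'Missing' strictly needs to be listed; na_values adds to the defaults rather than replacing them (unless keep_default_na=False). A small sketch using hypothetical CSV text in place of the survey file:

```python
import io
import pandas as pd

# Hypothetical CSV text standing in for the survey file
csv_text = """name,age
Corey,33
Jane,Missing
John,NA
"""

# 'NA' is caught by the defaults; 'Missing' is added via na_values
sample = pd.read_csv(io.StringIO(csv_text), na_values=['Missing'])

print(sample['age'].isna().tolist())
print(sample['age'].dtype)
```

Because both text markers become NaN, the age column is parsed as numbers straight away.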

In [177]:
df['YearsCode'].head(10)

0      4
1    NaN
2      3
3      3
4     16
5     13
6      6
7      8
8     12
9     12
Name: YearsCode, dtype: object

In [181]:
df['YearsCode'].unique()

array(['4', nan, '3', '16', '13', '6', '8', '12', '2', '5', '17', '10',
       '14', '35', '7', 'Less than 1 year', '30', '9', '26', '40', '19',
       '15', '20', '28', '25', '1', '22', '11', '33', '50', '41', '18',
       '34', '24', '23', '42', '27', '21', '36', '32', '39', '38', '31',
       '37', 'More than 50 years', '29', '44', '45', '48', '46', '43',
       '47', '49'], dtype=object)

In [182]:
df['YearsCode'] = df['YearsCode'].replace('Less than 1 year', 0)
df['YearsCode'] = df['YearsCode'].replace('More than 50 years', 51)

In [184]:
df['YearsCode'].astype(float).mean()

11.662114216834588

We listed the unique values in the column to find the two non-numeric answers and replaced both strings with numbers. Finally, we cast the column to float to calculate its mean.
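The same cleanup can be done in a single replace call with a mapping, assigning the result back instead of using inplace=True on a selected column (recent pandas versions warn that such chained in-place calls may not update the original DataFrame). A sketch on hypothetical YearsCode values; mapping 'More than 50 years' to 51 is an approximation, as in the cell above.

```python
import numpy as np
import pandas as pd

# Hypothetical YearsCode values, including the two text answers
years = pd.Series(['4', np.nan, '3', 'Less than 1 year', 'More than 50 years'])

# One replace call with a mapping handles both text answers, then cast to float
years_num = years.replace({'Less than 1 year': '0',
                           'More than 50 years': '51'}).astype(float)

print(years_num.mean())  # (4 + 3 + 0 + 51) / 4
```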

<br>
<hr>
<a id='section_10'></a>

### Section 10 - Working with Dates and Time Series Data
- <a href="https://youtu.be/UFuo7EHI8zc">YouTube Tutorial</a>

In [192]:
df3 = pd.read_csv('ETH_1h.csv')
df3

Unnamed: 0,Date,Symbol,Open,High,Low,Close,Volume
0,2020-03-13 08-PM,ETHUSD,129.94,131.82,126.87,128.71,1940673.93
1,2020-03-13 07-PM,ETHUSD,119.51,132.02,117.10,129.94,7579741.09
2,2020-03-13 06-PM,ETHUSD,124.47,124.85,115.50,119.51,4898735.81
3,2020-03-13 05-PM,ETHUSD,124.08,127.42,121.63,124.47,2753450.92
4,2020-03-13 04-PM,ETHUSD,124.85,129.51,120.17,124.08,4461424.71
...,...,...,...,...,...,...,...
23669,2017-07-01 03-PM,ETHUSD,265.74,272.74,265.00,272.57,1500282.55
23670,2017-07-01 02-PM,ETHUSD,268.79,269.90,265.00,265.74,1702536.85
23671,2017-07-01 01-PM,ETHUSD,274.83,274.93,265.00,268.79,3010787.99
23672,2017-07-01 12-PM,ETHUSD,275.01,275.01,271.00,274.83,824362.87


In [193]:
df3.shape

(23674, 7)

In [194]:
df3.dtypes

Date       object
Symbol     object
Open      float64
High      float64
Low       float64
Close     float64
Volume    float64
dtype: object

In [188]:
# df3['Date'] = pd.to_datetime(df3['Date'])  # may fail: no explicit format given

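This line only works if the dates are stored in a standard format that pandas can infer. Strings like 2020-03-13 08-PM need an explicit format, which the next cell provides.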
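In the format string '%Y-%m-%d %I-%p', %Y-%m-%d is the date, %I is the hour on a 12-hour clock (01 to 12), and %p is the AM/PM marker. A small sketch on two strings written in the file's layout:

```python
import pandas as pd

# Two strings in the same layout as the Date column of ETH_1h.csv
raw = pd.Series(['2020-03-13 08-PM', '2017-07-01 11-AM'])

# %I is the 12-hour clock hour and %p is AM/PM, so '08-PM' becomes 20:00
parsed = pd.to_datetime(raw, format='%Y-%m-%d %I-%p')

print(parsed.tolist())
```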

In [196]:
df3['Date'] = pd.to_datetime(df3['Date'], format='%Y-%m-%d %I-%p')
df3

Unnamed: 0,Date,Symbol,Open,High,Low,Close,Volume
0,2020-03-13 20:00:00,ETHUSD,129.94,131.82,126.87,128.71,1940673.93
1,2020-03-13 19:00:00,ETHUSD,119.51,132.02,117.10,129.94,7579741.09
2,2020-03-13 18:00:00,ETHUSD,124.47,124.85,115.50,119.51,4898735.81
3,2020-03-13 17:00:00,ETHUSD,124.08,127.42,121.63,124.47,2753450.92
4,2020-03-13 16:00:00,ETHUSD,124.85,129.51,120.17,124.08,4461424.71
...,...,...,...,...,...,...,...
23669,2017-07-01 15:00:00,ETHUSD,265.74,272.74,265.00,272.57,1500282.55
23670,2017-07-01 14:00:00,ETHUSD,268.79,269.90,265.00,265.74,1702536.85
23671,2017-07-01 13:00:00,ETHUSD,274.83,274.93,265.00,268.79,3010787.99
23672,2017-07-01 12:00:00,ETHUSD,275.01,275.01,271.00,274.83,824362.87


In [198]:
df3.loc[0, 'Date'].day_name()

'Friday'

In [201]:
from datetime import datetime  # pd.datetime was removed in pandas 2.0
d_parser = lambda x: datetime.strptime(x, '%Y-%m-%d %I-%p')
df3 = pd.read_csv('ETH_1h.csv', parse_dates=['Date'], date_parser=d_parser)
df3['Date']

0       2020-03-13 20:00:00
1       2020-03-13 19:00:00
2       2020-03-13 18:00:00
3       2020-03-13 17:00:00
4       2020-03-13 16:00:00
                ...        
23669   2017-07-01 15:00:00
23670   2017-07-01 14:00:00
23671   2017-07-01 13:00:00
23672   2017-07-01 12:00:00
23673   2017-07-01 11:00:00
Name: Date, Length: 23674, dtype: datetime64[ns]

This parses the Date column into datetime values while the CSV is being read, using the same format string as before.
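In pandas 2.0 and later, the date_parser argument is deprecated and read_csv accepts a date_format string instead. A sketch assuming pandas 2.0 or newer, with hypothetical CSV text in the same layout as ETH_1h.csv:

```python
import io
import pandas as pd

# Hypothetical rows in the same layout as ETH_1h.csv
csv_text = """Date,Symbol,Close
2020-03-13 08-PM,ETHUSD,128.71
2020-03-13 07-PM,ETHUSD,129.94
"""

# date_format (pandas >= 2.0) replaces the deprecated date_parser callable
prices = pd.read_csv(io.StringIO(csv_text), parse_dates=['Date'],
                     date_format='%Y-%m-%d %I-%p')

print(prices['Date'])
```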

In [203]:
df3['Date'].dt.day_name()

0          Friday
1          Friday
2          Friday
3          Friday
4          Friday
           ...   
23669    Saturday
23670    Saturday
23671    Saturday
23672    Saturday
23673    Saturday
Name: Date, Length: 23674, dtype: object

In [205]:
df3['DayOfWeek'] = df3['Date'].dt.day_name()
df3

Unnamed: 0,Date,Symbol,Open,High,Low,Close,Volume,DayOfWeek
0,2020-03-13 20:00:00,ETHUSD,129.94,131.82,126.87,128.71,1940673.93,Friday
1,2020-03-13 19:00:00,ETHUSD,119.51,132.02,117.10,129.94,7579741.09,Friday
2,2020-03-13 18:00:00,ETHUSD,124.47,124.85,115.50,119.51,4898735.81,Friday
3,2020-03-13 17:00:00,ETHUSD,124.08,127.42,121.63,124.47,2753450.92,Friday
4,2020-03-13 16:00:00,ETHUSD,124.85,129.51,120.17,124.08,4461424.71,Friday
...,...,...,...,...,...,...,...,...
23669,2017-07-01 15:00:00,ETHUSD,265.74,272.74,265.00,272.57,1500282.55,Saturday
23670,2017-07-01 14:00:00,ETHUSD,268.79,269.90,265.00,265.74,1702536.85,Saturday
23671,2017-07-01 13:00:00,ETHUSD,274.83,274.93,265.00,268.79,3010787.99,Saturday
23672,2017-07-01 12:00:00,ETHUSD,275.01,275.01,271.00,274.83,824362.87,Saturday


In [206]:
df3['Date'].min()

Timestamp('2017-07-01 11:00:00')

In [207]:
df3['Date'].max()

Timestamp('2020-03-13 20:00:00')

In [208]:
df3['Date'].max() - df3['Date'].min()

Timedelta('986 days 09:00:00')

Subtracting the earliest date from the latest returns a Timedelta: the data spans 986 days and 9 hours.

In [212]:
filt = (df3['Date'] >= '2019') & (df3['Date'] < '2020')
df3.loc[filt]

Unnamed: 0,Date,Symbol,Open,High,Low,Close,Volume,DayOfWeek
1749,2019-12-31 23:00:00,ETHUSD,128.33,128.69,128.14,128.54,440678.91,Tuesday
1750,2019-12-31 22:00:00,ETHUSD,128.38,128.69,127.95,128.33,554646.02,Tuesday
1751,2019-12-31 21:00:00,ETHUSD,127.86,128.43,127.72,128.38,350155.69,Tuesday
1752,2019-12-31 20:00:00,ETHUSD,127.84,128.34,127.71,127.86,428183.38,Tuesday
1753,2019-12-31 19:00:00,ETHUSD,128.69,128.69,127.60,127.84,1169847.84,Tuesday
...,...,...,...,...,...,...,...,...
10504,2019-01-01 04:00:00,ETHUSD,130.75,133.96,130.74,131.96,2791135.37,Tuesday
10505,2019-01-01 03:00:00,ETHUSD,130.06,130.79,130.06,130.75,503732.63,Tuesday
10506,2019-01-01 02:00:00,ETHUSD,130.79,130.88,129.55,130.06,838183.43,Tuesday
10507,2019-01-01 01:00:00,ETHUSD,131.62,131.62,130.77,130.79,434917.99,Tuesday


This selects all rows from 2019: timestamps on or after 2019-01-01 00:00 and before 2020-01-01 00:00.
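The string comparisons work because pandas converts '2019' and '2020' to the timestamps 2019-01-01 00:00 and 2020-01-01 00:00. An equivalent filter uses the .dt.year accessor; a sketch on hypothetical timestamps around the year boundaries:

```python
import pandas as pd

# Hypothetical hourly timestamps around the 2019 boundaries
prices = pd.DataFrame({
    'Date': pd.to_datetime(['2018-12-31 23:00', '2019-01-01 00:00',
                            '2019-12-31 23:00', '2020-01-01 00:00']),
    'Close': [133.0, 131.0, 128.5, 129.0],
})

# Keep only rows whose timestamp falls in calendar year 2019
filt = prices['Date'].dt.year == 2019

# Same result as the string-based range used above
string_filt = (prices['Date'] >= '2019') & (prices['Date'] < '2020')

print(prices.loc[filt])
```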

In [221]:
df3.set_index('Date', inplace=True)

In [222]:
df3['High'].resample('D').max()

Date
2017-07-01    279.99
2017-07-02    293.73
2017-07-03    285.00
2017-07-04    282.83
2017-07-05    274.97
               ...  
2020-03-09    208.65
2020-03-10    206.28
2020-03-11    202.98
2020-03-12    195.64
2020-03-13    148.00
Freq: D, Name: High, Length: 987, dtype: float64

In [224]:
df3.resample('W').agg({'Close':'mean','High':'max','Low':'min','Volume':'sum'})

Date,Close,High,Low,Volume
2017-07-02,268.202162,293.73,253.23,8.084631e+07
2017-07-09,261.062083,285.00,231.25,2.246746e+08
2017-07-16,195.698393,240.33,130.26,5.017750e+08
2017-07-23,212.783750,249.40,153.25,7.221637e+08
2017-07-30,203.309524,229.99,178.03,2.657305e+08
...,...,...,...,...
2020-02-16,255.198452,290.00,216.31,3.912867e+08
2020-02-23,265.321905,287.13,242.36,3.067838e+08
2020-03-01,236.373988,278.13,209.26,3.693920e+08
2020-03-08,229.817619,253.01,196.00,2.736569e+08


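This aggregates the hourly data by week, applying a different function to each column: the mean Close, the highest High, the lowest Low, and the total Volume.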
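'W' is shorthand for weekly bins that end on Sunday ('W-SUN'), and each row is labelled with that Sunday's date, which is why the first row above is 2017-07-02 even though the data starts on Saturday 2017-07-01. A sketch on 72 hours of hypothetical data running from Saturday 2017-07-01 through Monday 2017-07-03:

```python
import pandas as pd

# Hypothetical hourly data: 72 hours starting Saturday 2017-07-01 00:00
idx = pd.date_range('2017-07-01', periods=72, freq='60min')
prices = pd.DataFrame({'Close': range(72), 'Volume': 1.0}, index=idx)

# Saturday and Sunday fall in the week labelled 2017-07-02; Monday starts the next week
weekly = prices.resample('W').agg({'Close': 'mean', 'Volume': 'sum'})
print(weekly)
```

The Volume sums (48 and 24 hours) make the bin boundaries visible: the whole of Sunday belongs to the week that ends on it.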

<br>
<hr>
<a id='section_11'></a>

### Section 11 - Reading/Writing Data to Different Sources - Excel, JSON, SQL, Etc
- <a href="https://youtu.be/N6hyN6BW6ao">YouTube Tutorial</a>