# Notes for YouTube Python Tutorials
## Python Pandas Tutorial
https://m.youtube.com/watch?v=ZyhVh-qRZPA&list=PL-osiE80TeTsWmV9i9c58mdDCSsklFdDS&index=2&t=247s

## Python Pandas Tutorial (Part 1): Getting Started with Data Analysis - Installation and Loading Data

Sample data was downloaded from the following link.<br>
https://insights.stackoverflow.com/survey

In [1]:
import pandas as pd


df = pd.read_csv('Python Data/developer_survey_2019/survey_results_public.csv')

In [2]:
df.shape # It is an attribute, not a method, so no parentheses are needed.
# 88883 rows and 85 columns

(88883, 85)
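Since `shape` is a plain tuple attribute, it can be unpacked into rows and columns. A minimal sketch on a small made-up DataFrame (`toy` is hypothetical, not the survey data):

```python
import pandas as pd

# Hypothetical toy data standing in for the survey DataFrame
toy = pd.DataFrame({"Respondent": [1, 2, 3], "Hobbyist": ["Yes", "No", "Yes"]})

rows, cols = toy.shape  # shape is an attribute, so no ()
print(rows, cols)       # 3 2
print(len(toy))         # len() of a DataFrame is its number of rows: 3
```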

In [3]:
df.info() # Shows column names, non-null counts, and dtypes.

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 88883 entries, 0 to 88882
Data columns (total 85 columns):
Respondent                88883 non-null int64
MainBranch                88331 non-null object
Hobbyist                  88883 non-null object
OpenSourcer               88883 non-null object
OpenSource                86842 non-null object
Employment                87181 non-null object
Country                   88751 non-null object
Student                   87014 non-null object
EdLevel                   86390 non-null object
UndergradMajor            75614 non-null object
EduOther                  84260 non-null object
OrgSize                   71791 non-null object
DevType                   81335 non-null object
YearsCode                 87938 non-null object
Age1stCode                87634 non-null object
YearsCodePro              74331 non-null object
CareerSat                 72847 non-null object
JobSat                    70988 non-null object
MgrIdiot                  61
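`info()` prints its summary instead of returning it. If you need the non-null counts and dtypes as objects, `notna().sum()` and `dtypes` give roughly the same information. A sketch on hypothetical toy data:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with one missing value in MainBranch
toy = pd.DataFrame({
    "Respondent": [1, 2, 3],
    "MainBranch": ["I am a developer by profession", np.nan,
                   "I am a student who is learning to code"],
})

non_null = toy.notna().sum()  # non-null count per column, like the info() listing
print(non_null)
print(toy.dtypes)             # one dtype per column
```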

In [4]:
# Show at most 5 columns; wider DataFrames are truncated with '...'.
# Setting it to 85 would show all columns in this case.
# By default, pandas shows at most 20 columns.
pd.set_option('display.max_columns', 5)
pd.set_option('display.max_rows', 10) # Show at most 10 rows.
df

,Respondent,MainBranch,...,SurveyLength,SurveyEase
0,1,I am a student who is learning to code,...,Appropriate in length,Neither easy nor difficult
1,2,I am a student who is learning to code,...,Appropriate in length,Neither easy nor difficult
2,3,"I am not primarily a developer, but I write co...",...,Appropriate in length,Neither easy nor difficult
3,4,I am a developer by profession,...,Appropriate in length,Easy
4,5,I am a developer by profession,...,Appropriate in length,Easy
...,...,...,...,...,...
88878,88377,,...,Appropriate in length,Easy
88879,88601,,...,,
88880,88802,,...,,
88881,88816,,...,,
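`set_option` changes the setting for the rest of the session. To widen the display for a single output only, `pd.option_context` sets options temporarily and restores the previous values when the `with` block ends. A sketch using a made-up 30-column DataFrame:

```python
import pandas as pd

# Hypothetical wide DataFrame: 3 rows, 30 columns named c0..c29
wide = pd.DataFrame([[i * j for j in range(30)] for i in range(3)],
                    columns=[f"c{j}" for j in range(30)])

before = pd.get_option("display.max_columns")
with pd.option_context("display.max_columns", 30, "display.max_rows", 10):
    inside = pd.get_option("display.max_columns")
    print(wide)  # all 30 columns are shown inside the block
after = pd.get_option("display.max_columns")  # back to the earlier value
```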


In [5]:
df.head() # By default, it shows the first 5 rows.
# df.tail() shows the last 5 rows.

,Respondent,MainBranch,...,SurveyLength,SurveyEase
0,1,I am a student who is learning to code,...,Appropriate in length,Neither easy nor difficult
1,2,I am a student who is learning to code,...,Appropriate in length,Neither easy nor difficult
2,3,"I am not primarily a developer, but I write co...",...,Appropriate in length,Neither easy nor difficult
3,4,I am a developer by profession,...,Appropriate in length,Easy
4,5,I am a developer by profession,...,Appropriate in length,Easy


In [6]:
df.head(1)
# df.tail(1)

,Respondent,MainBranch,...,SurveyLength,SurveyEase
0,1,I am a student who is learning to code,...,Appropriate in length,Neither easy nor difficult


In [7]:
schema_df = pd.read_csv('Python Data/developer_survey_2019/survey_results_schema.csv')
schema_df

,Column,QuestionText
0,Respondent,Randomized respondent ID number (not in order ...
1,MainBranch,Which of the following options best describes ...
2,Hobbyist,Do you code as a hobby?
3,OpenSourcer,How often do you contribute to open source?
4,OpenSource,How do you feel about the quality of open sour...
...,...,...
80,Sexuality,Which of the following do you currently identi...
81,Ethnicity,Which of the following do you identify as? Ple...
82,Dependents,"Do you have any dependents (e.g., children, el..."
83,SurveyLength,How do you feel about the length of the survey...


## Python Pandas Tutorial (Part 2): DataFrame and Series Basics - Selecting Rows and Columns

### Small Sample Example

In [8]:
# Sample: convert a dict to a DataFrame
people = {
    "first": ["Corey", "Jane", "John"],
    "last": ["Schafer", "Doe", "Doe"],
    "email": ["CoreyMSchafer@gmail.com", "JaneDoe@email.com", "JohnDoe@email.com"]
}

# people_df = pd.DataFrame(people) # In older pandas/Python versions, dict columns could come out in a different order; passing columns= makes the order explicit.
people_df = pd.DataFrame(people, columns=['first', 'last', 'email'])
people_df

,first,last,email
0,Corey,Schafer,CoreyMSchafer@gmail.com
1,Jane,Doe,JaneDoe@email.com
2,John,Doe,JohnDoe@email.com
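The same DataFrame can also be built from a list of row dictionaries (one dict per person), which is a common shape for JSON-like data. A sketch reusing the values from `people`; `records` and `people_from_rows` are names chosen for illustration:

```python
import pandas as pd

# One dict per row instead of one list per column
records = [
    {"first": "Corey", "last": "Schafer", "email": "CoreyMSchafer@gmail.com"},
    {"first": "Jane", "last": "Doe", "email": "JaneDoe@email.com"},
    {"first": "John", "last": "Doe", "email": "JohnDoe@email.com"},
]
people_from_rows = pd.DataFrame(records, columns=["first", "last", "email"])
print(people_from_rows)
```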


In [9]:
people_df['email'] # Recommended

0    CoreyMSchafer@gmail.com
1          JaneDoe@email.com
2          JohnDoe@email.com
Name: email, dtype: object

In [10]:
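# Same as above, using attribute (dot) access; it fails for column names with spaces or names that clash with DataFrame attributes.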
people_df.email

0    CoreyMSchafer@gmail.com
1          JaneDoe@email.com
2          JohnDoe@email.com
Name: email, dtype: object

In [11]:
type(people_df['email']) # It is a Series, which is a one-dimensional array.
# A DataFrame is a container for multiple Series objects.

pandas.core.series.Series
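Single brackets return a Series, while a list of column names (even a list with one name) returns a DataFrame. A quick check, rebuilding `people_df` so the snippet runs on its own:

```python
import pandas as pd

people_df = pd.DataFrame({
    "first": ["Corey", "Jane", "John"],
    "last": ["Schafer", "Doe", "Doe"],
    "email": ["CoreyMSchafer@gmail.com", "JaneDoe@email.com", "JohnDoe@email.com"],
})

one_col = people_df["email"]       # single brackets -> Series (1-D)
one_col_df = people_df[["email"]]  # list of names -> DataFrame with one column (2-D)
print(type(one_col).__name__, one_col.shape)        # Series (3,)
print(type(one_col_df).__name__, one_col_df.shape)  # DataFrame (3, 1)
```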

In [12]:
# Return multiple columns
people_df[['last', 'email']] # Pass a list of column names, hence the inner [].

,last,email
0,Schafer,CoreyMSchafer@gmail.com
1,Doe,JaneDoe@email.com
2,Doe,JohnDoe@email.com


In [13]:
people_df.columns

Index(['first', 'last', 'email'], dtype='object')

In [14]:
# iloc ==> access rows by integer location
people_df.iloc[0] # Returns a Series

first                      Corey
last                     Schafer
email    CoreyMSchafer@gmail.com
Name: 0, dtype: object

In [15]:
# Return multiple rows
people_df.iloc[[0, 1]]

,first,last,email
0,Corey,Schafer,CoreyMSchafer@gmail.com
1,Jane,Doe,JaneDoe@email.com


In [16]:
# Show selected rows and columns
# With iloc, row and column selectors must both be integer positions (column 2 is 'email').
people_df.iloc[[0, 1], 2]

0    CoreyMSchafer@gmail.com
1          JaneDoe@email.com
Name: email, dtype: object

In [17]:
# loc selects by label.
# The rows have no custom labels yet, so the default integer labels 0, 1, 2 are used.
people_df.loc[[0, 1], ['email', 'last']]

,email,last
0,CoreyMSchafer@gmail.com,Schafer
1,JaneDoe@email.com,Doe


### Real World Example

In [18]:
df['Hobbyist']
# Equivalent: df.loc[:, 'Hobbyist']

0        Yes
1         No
2        Yes
3         No
4        Yes
        ... 
88878    Yes
88879     No
88880     No
88881     No
88882    Yes
Name: Hobbyist, Length: 88883, dtype: object

In [19]:
# Count how many times each value occurs
df['Hobbyist'].value_counts()

Yes    71257
No     17626
Name: Hobbyist, dtype: int64
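`value_counts(normalize=True)` returns proportions instead of counts, which is handy for survey answers. A sketch on a few made-up answers rather than the real column:

```python
import pandas as pd

# Hypothetical answers; the real column has 88,883 of them
hobbyist = pd.Series(["Yes", "No", "Yes", "Yes"], name="Hobbyist")

counts = hobbyist.value_counts()                # Yes 3, No 1
shares = hobbyist.value_counts(normalize=True)  # Yes 0.75, No 0.25
print(counts)
print(shares)
```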

In [20]:
# Select a single value by row label and column name.
df.loc[0, 'Hobbyist']

'Yes'

In [21]:
df.loc[0:2, 'Hobbyist':'Employment'] # Label slices include both ends: rows 0-2, columns Hobbyist through Employment.

,Hobbyist,OpenSourcer,OpenSource,Employment
0,Yes,Never,The quality of OSS and closed source software ...,"Not employed, and not looking for work"
1,No,Less than once per year,The quality of OSS and closed source software ...,"Not employed, but looking for work"
2,Yes,Never,The quality of OSS and closed source software ...,Employed full-time
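Slicing works differently in the two indexers: `loc` slices by label and includes the end label, while `iloc` slices by position and, like a Python list, excludes the end position. A sketch on hypothetical toy data:

```python
import pandas as pd

# Hypothetical toy data with the default 0..3 index
toy = pd.DataFrame({"Hobbyist": ["Yes", "No", "Yes", "No"],
                    "OpenSourcer": ["Never", "Never", "Once a month", "Never"]})

by_label = toy.loc[0:2, "Hobbyist"]     # labels 0, 1, 2 (end label included)
by_position = toy.iloc[0:2, 0]          # positions 0, 1 (end position excluded)
print(len(by_label), len(by_position))  # 3 2
```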


## Python Pandas Tutorial (Part 3): Indexes - How to Set, Reset, and Use Indexes

### Small Sample Example

In [22]:
# Set email as the index. set_index returns a new DataFrame, so people_df itself is unchanged.
people_df.set_index('email')
people_df

,first,last,email
0,Corey,Schafer,CoreyMSchafer@gmail.com
1,Jane,Doe,JaneDoe@email.com
2,John,Doe,JohnDoe@email.com


In [23]:
# Set index and modify dataframe
people_df.set_index('email', inplace=True)
people_df

email,first,last
CoreyMSchafer@gmail.com,Corey,Schafer
JaneDoe@email.com,Jane,Doe
JohnDoe@email.com,John,Doe


In [24]:
people_df.index

Index(['CoreyMSchafer@gmail.com', 'JaneDoe@email.com', 'JohnDoe@email.com'], dtype='object', name='email')

In [25]:
# Use loc
people_df.loc['CoreyMSchafer@gmail.com', 'last']

'Schafer'

In [26]:
# iloc still works
people_df.iloc[0]

first      Corey
last     Schafer
Name: CoreyMSchafer@gmail.com, dtype: object

In [27]:
# Remove the custom index; email becomes a regular column again.
people_df.reset_index(inplace=True)
people_df # The default integer index is back.

,email,first,last
0,CoreyMSchafer@gmail.com,Corey,Schafer
1,JaneDoe@email.com,Jane,Doe
2,JohnDoe@email.com,John,Doe
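Instead of `inplace=True`, the result can be assigned to a variable; many pandas users prefer this because the change is explicit. A sketch (the names `indexed` and `restored` are just for illustration):

```python
import pandas as pd

people_df = pd.DataFrame({
    "first": ["Corey", "Jane", "John"],
    "last": ["Schafer", "Doe", "Doe"],
    "email": ["CoreyMSchafer@gmail.com", "JaneDoe@email.com", "JohnDoe@email.com"],
})

indexed = people_df.set_index("email")  # new DataFrame; people_df is unchanged
print(indexed.loc["JaneDoe@email.com", "first"])  # Jane

restored = indexed.reset_index()  # email moves back to a regular column
print(list(restored.columns))     # ['email', 'first', 'last']
```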


### Real World Example

In [28]:
# Method 1: Read the data, then set Respondent as the index (as in the people_df example above).
# Method 2: Set the index directly when reading the CSV.
df = pd.read_csv('Python Data/developer_survey_2019/survey_results_public.csv',
                 index_col='Respondent')
df.head()

Respondent,MainBranch,Hobbyist,...,SurveyLength,SurveyEase
1,I am a student who is learning to code,Yes,...,Appropriate in length,Neither easy nor difficult
2,I am a student who is learning to code,No,...,Appropriate in length,Neither easy nor difficult
3,"I am not primarily a developer, but I write co...",Yes,...,Appropriate in length,Neither easy nor difficult
4,I am a developer by profession,No,...,Appropriate in length,Easy
5,I am a developer by profession,Yes,...,Appropriate in length,Easy


In [29]:
schema_df = pd.read_csv('Python Data/developer_survey_2019/survey_results_schema.csv',
                        index_col='Column')
schema_df.loc['Hobbyist']

QuestionText    Do you code as a hobby?
Name: Hobbyist, dtype: object
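To try `index_col` without the survey files, a small in-memory CSV behaves the same way. The CSV text below is a made-up two-row excerpt shaped like `survey_results_schema.csv`, not the real file:

```python
import io

import pandas as pd

# Made-up two-row excerpt shaped like the schema file
csv_text = """Column,QuestionText
Respondent,Randomized respondent ID number
Hobbyist,Do you code as a hobby?
"""

schema = pd.read_csv(io.StringIO(csv_text), index_col="Column")
print(schema.loc["Hobbyist", "QuestionText"])  # Do you code as a hobby?
```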

In [30]:
schema_df.loc['MgrIdiot', 'QuestionText'] # Select the single cell to see the full, untruncated text.

'How confident are you that your manager knows what they’re doing?'

In [31]:
# Sort by index
schema_df.sort_index()

Column,QuestionText
Age,What is your age (in years)? If you prefer not...
Age1stCode,At what age did you write your first line of c...
BetterLife,Do you think people born today will have a bet...
BlockchainIs,Blockchain / cryptocurrency technology is prim...
BlockchainOrg,How is your organization thinking about or imp...
...,...
WorkPlan,How structured or planned is your work?
WorkRemote,How often do you work remotely?
WorkWeekHrs,"On average, how many hours per week do you work?"
YearsCode,"Including any education, how many years have y..."


In [32]:
schema_df.sort_index(ascending=False)

Column,QuestionText
YearsCodePro,How many years have you coded professionally (...
YearsCode,"Including any education, how many years have y..."
WorkWeekHrs,"On average, how many hours per week do you work?"
WorkRemote,How often do you work remotely?
WorkPlan,How structured or planned is your work?
...,...
BlockchainOrg,How is your organization thinking about or imp...
BlockchainIs,Blockchain / cryptocurrency technology is prim...
BetterLife,Do you think people born today will have a bet...
Age1stCode,At what age did you write your first line of c...


In [33]:
schema_df # The dataframe is still unsorted.

Column,QuestionText
Respondent,Randomized respondent ID number (not in order ...
MainBranch,Which of the following options best describes ...
Hobbyist,Do you code as a hobby?
OpenSourcer,How often do you contribute to open source?
OpenSource,How do you feel about the quality of open sour...
...,...
Sexuality,Which of the following do you currently identi...
Ethnicity,Which of the following do you identify as? Ple...
Dependents,"Do you have any dependents (e.g., children, el..."
SurveyLength,How do you feel about the length of the survey...


In [34]:
schema_df.sort_index(inplace=True) # Sort and modify the dataframe
schema_df

Column,QuestionText
Age,What is your age (in years)? If you prefer not...
Age1stCode,At what age did you write your first line of c...
BetterLife,Do you think people born today will have a bet...
BlockchainIs,Blockchain / cryptocurrency technology is prim...
BlockchainOrg,How is your organization thinking about or imp...
...,...
WorkPlan,How structured or planned is your work?
WorkRemote,How often do you work remotely?
WorkWeekHrs,"On average, how many hours per week do you work?"
YearsCode,"Including any education, how many years have y..."


## Python Pandas Tutorial (Part 4): Filtering - Using Conditionals to Filter Rows and Columns

### Small Sample Example

In [35]:
people_df = pd.DataFrame(people, columns=['first', 'last', 'email'])
people_df

,first,last,email
0,Corey,Schafer,CoreyMSchafer@gmail.com
1,Jane,Doe,JaneDoe@email.com
2,John,Doe,JohnDoe@email.com


In [36]:
filt = (people_df['last'] == 'Doe')
filt

0    False
1     True
2     True
Name: last, dtype: bool

In [37]:
people_df[filt]

,first,last,email
1,Jane,Doe,JaneDoe@email.com
2,John,Doe,JohnDoe@email.com


In [38]:
people_df.loc[filt] # A boolean Series also works with loc (recommended).

,first,last,email
1,Jane,Doe,JaneDoe@email.com
2,John,Doe,JohnDoe@email.com


In [39]:
people_df.loc[filt, 'email']

1    JaneDoe@email.com
2    JohnDoe@email.com
Name: email, dtype: object

In [40]:
# & for AND
filt = (people_df['last'] == 'Doe') & (people_df['first'] == 'John')
people_df.loc[filt, 'email']

2    JohnDoe@email.com
Name: email, dtype: object

In [41]:
# | for OR
filt = (people_df['last'] == 'Schafer') | (people_df['first'] == 'John')
people_df.loc[filt, 'email']

0    CoreyMSchafer@gmail.com
2          JohnDoe@email.com
Name: email, dtype: object

In [42]:
# ~ negates the filter (NOT)
people_df.loc[~filt, 'email']

1    JaneDoe@email.com
Name: email, dtype: object

### Real World Example

In [43]:
high_salary = (df['ConvertedComp'] > 70000)
df.loc[high_salary]

Respondent,MainBranch,Hobbyist,...,SurveyLength,SurveyEase
6,"I am not primarily a developer, but I write co...",Yes,...,Too long,Neither easy nor difficult
9,I am a developer by profession,Yes,...,Appropriate in length,Neither easy nor difficult
13,I am a developer by profession,Yes,...,Appropriate in length,Easy
16,I am a developer by profession,Yes,...,Appropriate in length,Neither easy nor difficult
22,I am a developer by profession,Yes,...,Appropriate in length,Easy
...,...,...,...,...,...
88876,I am a developer by profession,Yes,...,Appropriate in length,Easy
88877,I am a developer by profession,Yes,...,Too long,Neither easy nor difficult
88878,I am a developer by profession,Yes,...,Appropriate in length,Easy
88879,I am a developer by profession,Yes,...,Appropriate in length,Easy


In [44]:
df.loc[high_salary, ['Country', 'LanguageWorkedWith', 'ConvertedComp']]

Respondent,Country,LanguageWorkedWith,ConvertedComp
6,Canada,Java;R;SQL,366420.0
9,New Zealand,Bash/Shell/PowerShell;C#;HTML/CSS;JavaScript;P...,95179.0
13,United States,Bash/Shell/PowerShell;HTML/CSS;JavaScript;PHP;...,90000.0
16,United Kingdom,Bash/Shell/PowerShell;C#;HTML/CSS;JavaScript;T...,455352.0
22,United States,Bash/Shell/PowerShell;C++;HTML/CSS;JavaScript;...,103000.0
...,...,...,...
88876,United States,Bash/Shell/PowerShell;C#;HTML/CSS;Java;Python;...,180000.0
88877,United States,Bash/Shell/PowerShell;C;Clojure;HTML/CSS;Java;...,2000000.0
88878,United States,HTML/CSS;JavaScript;Scala;TypeScript,130000.0
88879,Finland,Bash/Shell/PowerShell;C++;Python,82488.0
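Comparisons with missing values evaluate to `False`, so respondents without a `ConvertedComp` value are excluded by `high_salary`, but they are *included* by its negation `~high_salary`. A sketch with made-up salaries:

```python
import numpy as np
import pandas as pd

# Made-up salaries, one of them missing
comp = pd.Series([366420.0, np.nan, 50000.0, 95179.0], name="ConvertedComp")

high_salary = comp > 70000
print(high_salary.tolist())         # [True, False, False, True]: NaN compares as False
print(comp[~high_salary].tolist())  # the NaN row shows up in the negated filter

# Exclude missing values explicitly if "not high" should mean "known and <= 70000"
not_high_known = comp.notna() & ~high_salary
```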


In [45]:
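# Keep only respondents from the selected countries.
# Note: the output below was produced with the typo 'United Status', which matches nothing,
# so respondents from the United States are missing from it.
countries = ['United States', 'India', 'United Kingdom', 'Germany', 'Canada']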
filt = df['Country'].isin(countries)
df.loc[filt, 'Country']

Respondent
1        United Kingdom
6                Canada
8                 India
10                India
12               Canada
              ...      
84539    United Kingdom
85182            Canada
85961    United Kingdom
86012             India
88377            Canada
Name: Country, Length: 24059, dtype: object
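`isin` silently ignores values that never occur, so a misspelling such as `'United Status'` matches nothing instead of raising an error. Comparing the list with the values actually present catches this; sketched on a made-up `Country` column:

```python
import pandas as pd

# Made-up Country column
country = pd.Series(["United Kingdom", "Canada", "United States", "India"], name="Country")
countries = ["United Status", "India", "United Kingdom", "Germany", "Canada"]

# Entries from the list that never appear in the column (typos or absent countries)
missing = sorted(set(countries) - set(country.unique()))
print(missing)  # ['Germany', 'United Status']
print(country[country.isin(countries)].tolist())  # United States is not matched
```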

In [46]:
df['LanguageWorkedWith']

Respondent
1                          HTML/CSS;Java;JavaScript;Python
2                                      C++;HTML/CSS;Python
3                                                 HTML/CSS
4                                      C;C++;C#;Python;SQL
5              C++;HTML/CSS;Java;JavaScript;Python;SQL;VBA
                               ...                        
88377                        HTML/CSS;JavaScript;Other(s):
88601                                                  NaN
88802                                                  NaN
88816                                                  NaN
88863    Bash/Shell/PowerShell;HTML/CSS;Java;JavaScript...
Name: LanguageWorkedWith, Length: 88883, dtype: object

In [47]:
# LanguageWorkedWith holds semicolon-separated strings such as "C++;HTML/CSS;Python",
# so check whether "Python" appears as a substring. na=False treats missing values as no match.
filt = df['LanguageWorkedWith'].str.contains('Python', na=False)
df.loc[filt, 'LanguageWorkedWith']

Respondent
1                          HTML/CSS;Java;JavaScript;Python
2                                      C++;HTML/CSS;Python
4                                      C;C++;C#;Python;SQL
5              C++;HTML/CSS;Java;JavaScript;Python;SQL;VBA
8        Bash/Shell/PowerShell;C;C++;HTML/CSS;Java;Java...
                               ...                        
84539    Bash/Shell/PowerShell;C;C++;HTML/CSS;Java;Java...
85738      Bash/Shell/PowerShell;C++;Python;Ruby;Other(s):
86566      Bash/Shell/PowerShell;HTML/CSS;Python;Other(s):
87739             C;C++;HTML/CSS;JavaScript;PHP;Python;SQL
88212                           HTML/CSS;JavaScript;Python
Name: LanguageWorkedWith, Length: 36443, dtype: object
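Substring matching can over-match: `str.contains('Java')` would also match `JavaScript`. Since the column is semicolon-separated, one option is to split it first and test for the exact name. A sketch on made-up rows:

```python
import numpy as np
import pandas as pd

# Made-up rows, including one missing value
langs = pd.Series(["HTML/CSS;JavaScript", "C;Java;SQL", np.nan], name="LanguageWorkedWith")

substring = langs.str.contains("Java", na=False)  # also True for JavaScript
# split(";") gives a list per row (NaN stays NaN), then test exact membership
exact = langs.str.split(";").apply(lambda items: isinstance(items, list) and "Java" in items)
print(substring.tolist())  # [True, True, False]
print(exact.tolist())      # [False, True, False]
```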

## Python Pandas Tutorial (Part 5): Updating Rows and Columns - Modifying Data Within DataFrames

### Small Sample Example

In [48]:
# Change column names
people_df.columns = ['first_name', 'last_name', 'email']
# Drawback: the full list of column names must be provided.
people_df

,first_name,last_name,email
0,Corey,Schafer,CoreyMSchafer@gmail.com
1,Jane,Doe,JaneDoe@email.com
2,John,Doe,JohnDoe@email.com


In [49]:
# Use list comprehensions for column names
people_df.columns = [x.upper() for x in people_df.columns]
people_df

,FIRST_NAME,LAST_NAME,EMAIL
0,Corey,Schafer,CoreyMSchafer@gmail.com
1,Jane,Doe,JaneDoe@email.com
2,John,Doe,JohnDoe@email.com


In [50]:
# Another way
people_df.columns = people_df.columns.str.lower()
people_df

,first_name,last_name,email
0,Corey,Schafer,CoreyMSchafer@gmail.com
1,Jane,Doe,JaneDoe@email.com
2,John,Doe,JohnDoe@email.com


In [51]:
people_df.columns = [x.replace('_', ' ') for x in people_df.columns]
people_df

,first name,last name,email
0,Corey,Schafer,CoreyMSchafer@gmail.com
1,Jane,Doe,JaneDoe@email.com
2,John,Doe,JohnDoe@email.com


In [52]:
# Another way
people_df.columns = people_df.columns.str.replace(' ', '_')
people_df

,first_name,last_name,email
0,Corey,Schafer,CoreyMSchafer@gmail.com
1,Jane,Doe,JaneDoe@email.com
2,John,Doe,JohnDoe@email.com


In [53]:
# Only change a few column names
people_df.rename(columns={'first_name': 'first', 'last_name': 'last'}, inplace=True)
people_df

,first,last,email
0,Corey,Schafer,CoreyMSchafer@gmail.com
1,Jane,Doe,JaneDoe@email.com
2,John,Doe,JohnDoe@email.com


In [54]:
# Update values
people_df.loc[2] = ['John', 'Smith', 'JohnSmith@email.com']
# Drawback: values for every column must be provided.
people_df

,first,last,email
0,Corey,Schafer,CoreyMSchafer@gmail.com
1,Jane,Doe,JaneDoe@email.com
2,John,Smith,JohnSmith@email.com


In [55]:
# Update only the entries that need to change.
people_df.loc[2, ['last', 'email']] = ['Doe', 'JohnDoe@email.com']
people_df

,first,last,email
0,Corey,Schafer,CoreyMSchafer@gmail.com
1,Jane,Doe,JaneDoe@email.com
2,John,Doe,JohnDoe@email.com


In [56]:
filt = (people_df['email'] == 'JohnDoe@email.com')
people_df[filt]['last'] # Chained indexing can read the value, but it CANNOT reliably change the data.
# Does NOT work: people_df[filt]['last'] = 'Smith' (it assigns to a temporary copy and triggers a warning)

2    Doe
Name: last, dtype: object

In [57]:
# Use loc to change the value selected by the filter
people_df.loc[filt, 'last'] = 'Smith'
people_df

,first,last,email
0,Corey,Schafer,CoreyMSchafer@gmail.com
1,Jane,Doe,JaneDoe@email.com
2,John,Smith,JohnDoe@email.com
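A self-contained check of the `loc` approach: with a boolean filter as the row selector, assignment changes the original DataFrame, and a list of columns can be updated in one call. The chained version is left out because its exact behaviour (a warning, possibly no update) depends on the pandas version. The value `'Johnny'` is made up:

```python
import pandas as pd

people_df = pd.DataFrame({
    "first": ["Corey", "Jane", "John"],
    "last": ["Schafer", "Doe", "Doe"],
    "email": ["CoreyMSchafer@gmail.com", "JaneDoe@email.com", "JohnDoe@email.com"],
})

filt = people_df["email"] == "JohnDoe@email.com"
people_df.loc[filt, "last"] = "Smith"                         # row filter and column in one loc call
people_df.loc[filt, ["first", "last"]] = ["Johnny", "Smith"]  # several columns at once
print(people_df.loc[2].tolist())
```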


In [58]:
people_df['email'] = [x.upper() for x in people_df['email']]
people_df

,first,last,email
0,Corey,Schafer,COREYMSCHAFER@GMAIL.COM
1,Jane,Doe,JANEDOE@EMAIL.COM
2,John,Smith,JOHNDOE@EMAIL.COM


In [59]:
people_df['email'] = people_df['email'].str.lower()
people_df

,first,last,email
0,Corey,Schafer,coreymschafer@gmail.com
1,Jane,Doe,janedoe@email.com
2,John,Smith,johndoe@email.com
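One practical difference between the list comprehension and the `.str` accessor: `.str` methods skip missing values and leave NaN in place, while calling `.upper()` on a NaN (a float) raises `AttributeError`. A sketch with one made-up missing email:

```python
import numpy as np
import pandas as pd

# Made-up emails with one missing value
emails = pd.Series(["CoreyMSchafer@gmail.com", np.nan, "JohnDoe@email.com"])

lowered = emails.str.lower()  # NaN stays NaN
print(lowered.tolist())

try:
    [x.upper() for x in emails]  # NaN is a float, which has no .upper()
except AttributeError as exc:
    print("list comprehension failed:", exc)
```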


#### Four Popular Methods (apply, map, applymap, replace) to Change Multiple Values

##### apply (works for series or series from dataframe)

In [60]:
# apply works on both a DataFrame and a Series.
# Here it works on a Series, so len is applied to each email string.
people_df['email'].apply(len)

0    23
1    17
2    17
Name: email, dtype: int64

In [61]:
def update_email(email):
    return email.upper()

people_df['email'] = people_df['email'].apply(update_email) # Pass in function without ().
people_df

,first,last,email
0,Corey,Schafer,COREYMSCHAFER@GMAIL.COM
1,Jane,Doe,JANEDOE@EMAIL.COM
2,John,Smith,JOHNDOE@EMAIL.COM


In [62]:
people_df['email'] = people_df['email'].apply(lambda x: x.lower())
people_df

,first,last,email
0,Corey,Schafer,coreymschafer@gmail.com
1,Jane,Doe,janedoe@email.com
2,John,Smith,johndoe@email.com


In [63]:
# Work on a DataFrame
# It does NOT apply the function to every value; it calls it once per column (each column is a Series).
people_df.apply(len)
# For the email column, this is the same as:
# len(people_df['email'])

first    3
last     3
email    3
dtype: int64

In [64]:
people_df.apply(len, axis='columns') # Pass each row to the function; returns one value per row

0    3
1    3
2    3
dtype: int64

In [65]:
# For numeric columns, this returns the minimum.
# For string columns, it returns the smallest value in lexicographic order (uppercase sorts before lowercase).
people_df.apply(pd.Series.min)

first                      Corey
last                         Doe
email    coreymschafer@gmail.com
dtype: object

In [66]:
people_df.apply(lambda x: x.min()) # x is a series object.

first                      Corey
last                         Doe
email    coreymschafer@gmail.com
dtype: object
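`apply` with `axis='columns'` is also a common way to compute a value from several columns of the same row. A sketch that builds a full name; the `full_name` column is hypothetical, not part of the tutorial:

```python
import pandas as pd

people_df = pd.DataFrame({
    "first": ["Corey", "Jane", "John"],
    "last": ["Schafer", "Doe", "Smith"],
})

# Each row arrives as a Series indexed by column name
people_df["full_name"] = people_df.apply(lambda row: f"{row['first']} {row['last']}", axis="columns")
print(people_df["full_name"].tolist())  # ['Corey Schafer', 'Jane Doe', 'John Smith']

# For simple string joins, vectorized concatenation gives the same values
same = people_df["first"] + " " + people_df["last"]
```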

##### applymap (applies to every element in a DataFrame) (only works for DataFrames)

In [67]:
people_df.applymap(len)

,first,last,email
0,5,7,23
1,4,3,17
2,4,5,17


In [68]:
people_df.applymap(str.lower) # Raises an error if any value is not a string (e.g., numerical data).

,first,last,email
0,corey,schafer,coreymschafer@gmail.com
1,jane,doe,janedoe@email.com
2,john,smith,johndoe@email.com
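Version note: in pandas 2.1, `DataFrame.applymap` was deprecated and renamed to `DataFrame.map`, which behaves the same way. A sketch that runs on either side of that change by using whichever method exists:

```python
import pandas as pd

people_df = pd.DataFrame({
    "first": ["Corey", "Jane", "John"],
    "last": ["Schafer", "Doe", "Smith"],
})

# DataFrame.map exists from pandas 2.1; older versions only have applymap
elementwise = people_df.map if hasattr(pd.DataFrame, "map") else people_df.applymap
lengths = elementwise(len)  # len applied to every single cell
print(lengths)
```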


##### map (substituting each value in a series with another value) (only works for series)

In [69]:
# Values that are not in the mapping become NaN.
people_df['first'].map({'Corey': 'Chris', 'Jane': 'Mary'})

0    Chris
1     Mary
2      NaN
Name: first, dtype: object

##### replace (keeps values that are not in the mapping instead of turning them into NaN)

In [70]:
people_df['first'].replace({'Corey': 'Chris', 'Jane': 'Mary'})

0    Chris
1     Mary
2     John
Name: first, dtype: object
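If you want `map`'s dict lookup but also want to keep unmatched values, filling the NaNs from the original column gives the same result as `replace` for this data. A sketch:

```python
import pandas as pd

first = pd.Series(["Corey", "Jane", "John"], name="first")
mapping = {"Corey": "Chris", "Jane": "Mary"}

mapped = first.map(mapping)              # John -> NaN
kept = first.map(mapping).fillna(first)  # NaN filled back from the original values
replaced = first.replace(mapping)
print(kept.tolist())                     # ['Chris', 'Mary', 'John']
```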

### Real World Example

In [71]:
df.rename(columns={'ConvertedComp': 'SalaryUSD'}, inplace=True)
df['SalaryUSD']

Respondent
1            NaN
2            NaN
3         8820.0
4        61000.0
5            NaN
          ...   
88377        NaN
88601        NaN
88802        NaN
88816        NaN
88863        NaN
Name: SalaryUSD, Length: 88883, dtype: float64

In [72]:
# map converts the whole column at once: every 'Yes' becomes True and every 'No' becomes False.
df['Hobbyist'] = df['Hobbyist'].map({'Yes': True, 'No': False})
df

Respondent,MainBranch,Hobbyist,...,SurveyLength,SurveyEase
1,I am a student who is learning to code,True,...,Appropriate in length,Neither easy nor difficult
2,I am a student who is learning to code,False,...,Appropriate in length,Neither easy nor difficult
3,"I am not primarily a developer, but I write co...",True,...,Appropriate in length,Neither easy nor difficult
4,I am a developer by profession,False,...,Appropriate in length,Easy
5,I am a developer by profession,True,...,Appropriate in length,Easy
...,...,...,...,...,...
88377,,True,...,Appropriate in length,Easy
88601,,False,...,,
88802,,False,...,,
88816,,False,...,,
