# Pandas Tutorial
### Updating Rows and Columns - Modifying Data Within DataFrame
**Source:**

Corey Schafer - [Python Pandas Playlist](https://www.youtube.com/playlist?list=PL-osiE80TeTsWmV9i9c58mdDCSskIFdDS)

```python
import pandas as pd
```

### Creating a sample DataFrame

```python
people = {
    "first": ["Parampreet", "Corey", "Anant"],
    "last": ["Singh", "Schafer", "Luthra"],
    "email": ["ParampreetSingh@gmail.com", "CoreySchafer@gmail.com", "AnantLuthra@gmail.com"]
}
```

```python
df = pd.DataFrame(people)
df
```

|   | first | last | email |
|---|---|---|---|
| 0 | Parampreet | Singh | ParampreetSingh@gmail.com |
| 1 | Corey | Schafer | CoreySchafer@gmail.com |
| 2 | Anant | Luthra | AnantLuthra@gmail.com |

```python
df.columns
```

```
Index(['first', 'last', 'email'], dtype='object')
```

### Changing column names
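1. Assign a new list of names to `df.columns` (one name per column, in order).
2. Call `df.rename()` and pass a mapping for only the columns you want to change:
   - either `mapper=dict` together with `axis=1`, or `columns=dict`;
   - the mapping can also be a function (for example `str.lower`), which is applied to every label;
   - row labels are renamed the same way, with `axis=0` or `index=dict/function`.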
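
A minimal sketch of the `rename()` options listed above, run on a small illustrative DataFrame rather than `people`. Without `inplace=True`, each `rename()` call returns a new DataFrame and leaves `df` unchanged:

```python
import pandas as pd

# Small illustrative DataFrame (not the `people` data above)
df = pd.DataFrame({"first": ["Corey"], "last": ["Schafer"]})

# Dictionary mapper on the columns axis; same result as columns={"first": "first_name"}
renamed = df.rename(mapper={"first": "first_name"}, axis=1)

# Function mapper: called on every column label
upper = df.rename(columns=str.upper)

# Row labels are renamed the same way through index= (or axis=0)
relabelled = df.rename(index={0: "row_a"})

print(list(renamed.columns))   # ['first_name', 'last']
print(list(upper.columns))     # ['FIRST', 'LAST']
print(list(relabelled.index))  # ['row_a']
```

Labels that do not appear in the dictionary keep their original names.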

```python
# Changing column names
df.columns = ["first_name", "last_name", "email"]
df
```

|   | first_name | last_name | email |
|---|---|---|---|
| 0 | Parampreet | Singh | ParampreetSingh@gmail.com |
| 1 | Corey | Schafer | CoreySchafer@gmail.com |
| 2 | Anant | Luthra | AnantLuthra@gmail.com |

```python
df.columns = [i.capitalize() for i in df.columns]
df
```

|   | First_name | Last_name | Email |
|---|---|---|---|
| 0 | Parampreet | Singh | ParampreetSingh@gmail.com |
| 1 | Corey | Schafer | CoreySchafer@gmail.com |
| 2 | Anant | Luthra | AnantLuthra@gmail.com |

```python
# Changing specific columns
df.rename(columns={"First_name": "first", "Last_name": "last"}, inplace=True)
df
```

|   | first | last | Email |
|---|---|---|---|
| 0 | Parampreet | Singh | ParampreetSingh@gmail.com |
| 1 | Corey | Schafer | CoreySchafer@gmail.com |
| 2 | Anant | Luthra | AnantLuthra@gmail.com |

```python
# Changing column names using a function
df.rename(columns=str.lower, inplace=True)
df
```

|   | first | last | email |
|---|---|---|---|
| 0 | Parampreet | Singh | ParampreetSingh@gmail.com |
| 1 | Corey | Schafer | CoreySchafer@gmail.com |
| 2 | Anant | Luthra | AnantLuthra@gmail.com |

### Changing rows

```python
# Changing a whole row
df.loc[1] = ["John", "Smith", "JohnSmith@gmail.com"]
```

```python
# Changing a specific column of a row
df.loc[1, "email"] = "SmithJohn@gmail.com"
```

```python
df
```

|   | first | last | email |
|---|---|---|---|
| 0 | Parampreet | Singh | ParampreetSingh@gmail.com |
| 1 | John | Smith | SmithJohn@gmail.com |
| 2 | Anant | Luthra | AnantLuthra@gmail.com |

```python
# We can also use `.at` for accessing/changing a single value
df.at[1, "email"] = "JohnSmith@gmail.com"
df
```

|   | first | last | email |
|---|---|---|---|
| 0 | Parampreet | Singh | ParampreetSingh@gmail.com |
| 1 | John | Smith | JohnSmith@gmail.com |
| 2 | Anant | Luthra | AnantLuthra@gmail.com |

```python
# A boolean filter can select the rows to change with `.loc`
# (`.at` only accepts a single row label and a single column label)
df.loc[df["first"] == "John", "email"] = "JohnSmith@email.com"
df
```

|   | first | last | email |
|---|---|---|---|
| 0 | Parampreet | Singh | ParampreetSingh@gmail.com |
| 1 | John | Smith | JohnSmith@email.com |
| 2 | Anant | Luthra | AnantLuthra@gmail.com |

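### DON'T: Change values without `loc`, `iloc` or `at`
```python
df[df["first"] == "John"]["email"] = "JohnSmith@email.com"
```
- This is chained indexing: the filter `df[df["first"] == "John"]` returns a copy, so the assignment changes only that temporary copy, which is discarded immediately. The original `df` is not updated, and pandas usually emits a warning (such as `SettingWithCopyWarning`).
- For more, check [this](https://pandas.pydata.org/docs/user_guide/indexing.html#returning-a-view-versus-a-copy)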
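
A small sketch, with made-up data, contrasting the chained assignment above with a single `.loc` call. The warning pandas raises for the chained version is silenced here only to keep the output readable; the exact warning class depends on the pandas version:

```python
import warnings

import pandas as pd

df = pd.DataFrame({"first": ["Corey", "John"],
                   "email": ["corey@example.com", "old@example.com"]})
mask = df["first"] == "John"

# Chained indexing: df[mask] builds a new object, so the assignment lands on it
with warnings.catch_warnings():
    warnings.simplefilter("ignore")  # silence pandas' warning about this pattern
    df[mask]["email"] = "john@example.com"
after_chained = df.loc[mask, "email"].item()

# One .loc call with both the row filter and the column label updates df itself
df.loc[mask, "email"] = "john@example.com"
after_loc = df.loc[mask, "email"].item()

print(after_chained)  # old@example.com
print(after_loc)      # john@example.com
```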

```python
# Changing all rows of a specific column
df["email"] = df["email"].str.lower()
df
```

|   | first | last | email |
|---|---|---|---|
| 0 | Parampreet | Singh | parampreetsingh@gmail.com |
| 1 | John | Smith | johnsmith@email.com |
| 2 | Anant | Luthra | anantluthra@gmail.com |

### Use `.apply()` to call a function on each value of a Series
1. Returns a Series (when the function returns one value per element)
2. Extra positional arguments can be passed with the `args` parameter, like this:
   - `Series.apply(func, args=("hello", 2))`
3. Extra keyword arguments are forwarded to the function, like this:
   - `Series.apply(func, name="hello")`
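
A sketch of the `args` and keyword-argument forms; `decorate` is a hypothetical helper whose first parameter receives each element of the Series, while the remaining parameters come from `args` and the extra keyword:

```python
import pandas as pd

names = pd.Series(["Corey", "John"])

# Hypothetical helper: `value` is the element, the rest are extra arguments
def decorate(value, prefix, repeat, suffix=""):
    return f"{prefix}{value * repeat}{suffix}"

# args= supplies the extra positional arguments that follow the element
with_args = names.apply(decorate, args=("Hi ", 2))
# Any other keyword argument is forwarded to the function
with_kwargs = names.apply(decorate, args=("Hi ", 1), suffix="!")

print(with_args.tolist())    # ['Hi CoreyCorey', 'Hi JohnJohn']
print(with_kwargs.tolist())  # ['Hi Corey!', 'Hi John!']
```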

```python
df["email"].apply(len)
```

```
0    25
1    19
2    21
Name: email, dtype: int64
```

```python
# Changing first names to uppercase using `apply`
df["first"] = df["first"].apply(lambda name: name.upper())
df
```

|   | first | last | email |
|---|---|---|---|
| 0 | PARAMPREET | Singh | parampreetsingh@gmail.com |
| 1 | JOHN | Smith | johnsmith@email.com |
| 2 | ANANT | Luthra | anantluthra@gmail.com |

```python
df["first"] = [i.capitalize() for i in df["first"]]
df
```

|   | first | last | email |
|---|---|---|---|
| 0 | Parampreet | Singh | parampreetsingh@gmail.com |
| 1 | John | Smith | johnsmith@email.com |
| 2 | Anant | Luthra | anantluthra@gmail.com |

#### If we use `.apply()` on a DataFrame, the function is applied to each column (passed in as a Series)
- When the function returns a single value, the result is a Series with the column names as its index and the function's return values as its values.
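
A sketch on illustrative numeric data showing how `axis` decides whether the function receives columns or rows:

```python
import pandas as pd

# Illustrative numeric data
scores = pd.DataFrame({"math": [90, 70], "art": [60, 80]}, index=["ann", "bob"])

# axis=0 (default): each column arrives as a Series; result is indexed by column name
col_max = scores.apply(lambda col: col.max())
# axis=1: each row arrives as a Series; result is indexed by row label
row_max = scores.apply(lambda row: row.max(), axis=1)

print(list(col_max.index), col_max.tolist())  # ['math', 'art'] [90, 80]
print(list(row_max.index), row_max.tolist())  # ['ann', 'bob'] [90, 80]
```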

```python
df.apply(len)  # length (number of rows) of each column
```

```
first    3
last     3
email    3
dtype: int64
```

```python
df.loc[2, "email"] = "anantluthra@gmail.com"
df
```

|   | first | last | email |
|---|---|---|---|
| 0 | Parampreet | Singh | parampreetsingh@gmail.com |
| 1 | John | Smith | johnsmith@email.com |
| 2 | Anant | Luthra | anantluthra@gmail.com |

```python
# With axis=1, the function is applied to each row instead
df.apply(len, axis=1)
```

```
0    3
1    3
2    3
dtype: int64
```

```python
df.apply(pd.Series.min)  # minimum value of each column
```

```
first                    Anant
last                    Luthra
email    anantluthra@gmail.com
dtype: object
```

### Use `.map()` to substitute values in a Series
- Accepts a function, a dictionary or a Series
- Returns a Series
- With a dictionary or Series, values that have no match become NaN
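
A sketch of the three kinds of argument `.map()` accepts, on a standalone Series holding the same first names; only the dictionary and Series forms can leave unmatched values as NaN:

```python
import pandas as pd

first = pd.Series(["Parampreet", "John", "Anant"])

# Dictionary: elements missing from the keys become NaN
from_dict = first.map({"Parampreet": "Param", "John": "Jimmy"})
# Series: its index is the lookup key, its values are the substitutes
from_series = first.map(pd.Series({"John": "Jimmy", "Anant": "Ant"}))
# Function: called on every element, so nothing is left unmatched
from_func = first.map(len)

print(from_dict.tolist())    # ['Param', 'Jimmy', nan]
print(from_series.tolist())  # [nan, 'Jimmy', 'Ant']
print(from_func.tolist())    # [10, 4, 5]
```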

```python
df
```

|   | first | last | email |
|---|---|---|---|
| 0 | Parampreet | Singh | parampreetsingh@gmail.com |
| 1 | John | Smith | johnsmith@email.com |
| 2 | Anant | Luthra | anantluthra@gmail.com |

```python
# Using a dictionary
df["first"].map({"Parampreet": "Param", "John": "Jimmy"})
```

```
0    Param
1    Jimmy
2      NaN
Name: first, dtype: object
```

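### Use `.replace()` to replace values in a Series
- `to_replace`: the value(s) to look for
- `value`: the value(s) to put in their place (not needed when `to_replace` is a dictionary of `{old: new}` pairs)
- Returns a Series
- Values without a match are kept unchanged instead of becoming NaN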
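
A sketch of the two calling styles of `.replace()` on the same values; both leave `Anant`, which has no replacement, unchanged:

```python
import pandas as pd

first = pd.Series(["Parampreet", "John", "Anant"])

# Dictionary of {old: new}; value= is not needed in this form
via_dict = first.replace({"Parampreet": "Param", "John": "Jimmy"})
# to_replace and value as two lists of equal length, matched by position
via_lists = first.replace(to_replace=["Parampreet", "John"], value=["Param", "Jimmy"])

print(via_dict.tolist())           # ['Param', 'Jimmy', 'Anant']
print(via_lists.equals(via_dict))  # True
```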

```python
df["first"].replace({"Parampreet": "Param", "John": "Jimmy"})
```

```
0    Param
1    Jimmy
2    Anant
Name: first, dtype: object
```

### Use `.applymap()` to apply a function to a DataFrame elementwise
1. Returns a DataFrame
2. Since pandas 2.1, `applymap()` is deprecated in favor of the equivalent `DataFrame.map()`
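
A version-tolerant sketch of the same elementwise call, assuming pandas 2.1 or later provides `DataFrame.map` and falling back to `applymap` on older releases:

```python
import pandas as pd

df = pd.DataFrame({"first": ["Parampreet", "John"], "last": ["Singh", "Smith"]})

# pandas >= 2.1 provides DataFrame.map; older versions only have applymap
elementwise = df.map if hasattr(pd.DataFrame, "map") else df.applymap
lengths = elementwise(len)

print(lengths.values.tolist())  # [[10, 5], [4, 5]]
```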

```python
df
```

|   | first | last | email |
|---|---|---|---|
| 0 | Parampreet | Singh | parampreetsingh@gmail.com |
| 1 | John | Smith | johnsmith@email.com |
| 2 | Anant | Luthra | anantluthra@gmail.com |

```python
# Length of each element
df.applymap(len)
```

|   | first | last | email |
|---|---|---|---|
| 0 | 10 | 5 | 25 |
| 1 | 4 | 5 | 19 |
| 2 | 5 | 6 | 21 |

### DataFrames of the Stack Overflow Developer Survey 2021 results

```python
df = pd.read_csv("./data/survey_results_public.csv", index_col="ResponseId")
schema_df = pd.read_csv("./data/survey_results_schema.csv", index_col="qname")
```

```python
# Display every column of df instead of truncating them
pd.set_option("display.max_columns", df.shape[1])
```

```python
df.head()
```

| ResponseId | MainBranch | Employment | Country | US_State | UK_Country | EdLevel | Age1stCode | LearnCode | YearsCode | YearsCodePro | DevType | OrgSize | Currency | CompTotal | CompFreq | LanguageHaveWorkedWith | LanguageWantToWorkWith | DatabaseHaveWorkedWith | DatabaseWantToWorkWith | PlatformHaveWorkedWith | PlatformWantToWorkWith | WebframeHaveWorkedWith | WebframeWantToWorkWith | MiscTechHaveWorkedWith | MiscTechWantToWorkWith | ToolsTechHaveWorkedWith | ToolsTechWantToWorkWith | NEWCollabToolsHaveWorkedWith | NEWCollabToolsWantToWorkWith | OpSys | NEWStuck | NEWSOSites | SOVisitFreq | SOAccount | SOPartFreq | SOComm | NEWOtherComms | Age | Gender | Trans | Sexuality | Ethnicity | Accessibility | MentalHealth | SurveyLength | SurveyEase | ConvertedCompYearly |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | I am a developer by profession | Independent contractor, freelancer, or self-em... | Slovakia |  |  | Secondary school (e.g. American high school, G... | 18 - 24 years | Coding Bootcamp;Other online resources (ex: vi... |  |  | Developer, mobile | 20 to 99 employees | EUR European Euro | 4800.0 | Monthly | C++;HTML/CSS;JavaScript;Objective-C;PHP;Swift | Swift | PostgreSQL;SQLite | SQLite |  |  | Laravel;Symfony |  |  |  |  |  | PHPStorm;Xcode | Atom;Xcode | MacOS | Call a coworker or friend;Visit Stack Overflow... | Stack Overflow | Multiple times per day | Yes | A few times per month or weekly | Yes, definitely | No | 25-34 years old | Man | No | Straight / Heterosexual | White or of European descent | None of the above | None of the above | Appropriate in length | Easy | 62268.0 |
| 2 | I am a student who is learning to code | Student, full-time | Netherlands |  |  | Bachelor’s degree (B.A., B.S., B.Eng., etc.) | 11 - 17 years | Other online resources (ex: videos, blogs, etc... | 7.0 |  |  |  |  |  |  | JavaScript;Python |  | PostgreSQL |  |  |  | Angular;Flask;Vue.js |  | Cordova |  | Docker;Git;Yarn | Git | Android Studio;IntelliJ;Notepad++;PyCharm |  | Windows | Visit Stack Overflow;Google it | Stack Overflow | Daily or almost daily | Yes | Daily or almost daily | Yes, definitely | No | 18-24 years old | Man | No | Straight / Heterosexual | White or of European descent | None of the above | None of the above | Appropriate in length | Easy |  |
| 3 | I am not primarily a developer, but I write co... | Student, full-time | Russian Federation |  |  | Bachelor’s degree (B.A., B.S., B.Eng., etc.) | 11 - 17 years | Other online resources (ex: videos, blogs, etc... |  |  |  |  |  |  |  | Assembly;C;Python;R;Rust | Julia;Python;Rust | SQLite | SQLite | Heroku |  | Flask | Flask | NumPy;Pandas;TensorFlow;Torch/PyTorch | Keras;NumPy;Pandas;TensorFlow;Torch/PyTorch |  |  | IPython/Jupyter;PyCharm;RStudio;Sublime Text;V... | IPython/Jupyter;RStudio;Sublime Text;Visual St... | MacOS | Visit Stack Overflow;Google it;Watch help / tu... | Stack Overflow;Stack Exchange | Multiple times per day | Yes | Multiple times per day | Yes, definitely | Yes | 18-24 years old | Man | No | Prefer not to say | Prefer not to say | None of the above | None of the above | Appropriate in length | Easy |  |
| 4 | I am a developer by profession | Employed full-time | Austria |  |  | Master’s degree (M.A., M.S., M.Eng., MBA, etc.) | 11 - 17 years |  |  |  | Developer, front-end | 100 to 499 employees | EUR European Euro |  | Monthly | JavaScript;TypeScript | JavaScript;TypeScript |  |  |  |  | Angular;jQuery | Angular;jQuery |  |  |  |  |  |  | Windows | Call a coworker or friend;Visit Stack Overflow... | Stack Overflow | Daily or almost daily | Yes | Daily or almost daily | Neutral | No | 35-44 years old | Man | No | Straight / Heterosexual | White or of European descent | I am deaf / hard of hearing |  | Appropriate in length | Neither easy nor difficult |  |
| 5 | I am a developer by profession | Independent contractor, freelancer, or self-em... | United Kingdom of Great Britain and Northern I... |  | England | Master’s degree (M.A., M.S., M.Eng., MBA, etc.) | 5 - 10 years | Friend or family member | 17.0 | 10.0 | Developer, desktop or enterprise applications;... | Just me - I am a freelancer, sole proprietor, ... | GBP\tPound sterling |  |  | Bash/Shell;HTML/CSS;Python;SQL | Bash/Shell;HTML/CSS;Python;SQL | Elasticsearch;PostgreSQL;Redis | Cassandra;Elasticsearch;PostgreSQL;Redis |  |  | Flask | Flask | Apache Spark;Hadoop;NumPy;Pandas | Hadoop;NumPy;Pandas | Docker;Git;Kubernetes;Yarn | Docker;Git;Kubernetes;Yarn | Atom;IPython/Jupyter;Notepad++;PyCharm;Vim | Atom;IPython/Jupyter;Notepad++;PyCharm;Vim;Vis... | Linux-based | Visit Stack Overflow;Go for a walk or other ph... | Stack Overflow;Stack Exchange | Daily or almost daily | Yes | A few times per week | Yes, somewhat | No | 25-34 years old | Man | No |  | White or of European descent | None of the above |  | Appropriate in length | Easy |  |

```python
# Rename the `ConvertedCompYearly` column to `SalaryUSD`
df.rename(columns={"ConvertedCompYearly": "SalaryUSD"}, inplace=True)
```

```python
df.columns
```

```
Index(['MainBranch', 'Employment', 'Country', 'US_State', 'UK_Country',
       'EdLevel', 'Age1stCode', 'LearnCode', 'YearsCode', 'YearsCodePro',
       'DevType', 'OrgSize', 'Currency', 'CompTotal', 'CompFreq',
       'LanguageHaveWorkedWith', 'LanguageWantToWorkWith',
       'DatabaseHaveWorkedWith', 'DatabaseWantToWorkWith',
       'PlatformHaveWorkedWith', 'PlatformWantToWorkWith',
       'WebframeHaveWorkedWith', 'WebframeWantToWorkWith',
       'MiscTechHaveWorkedWith', 'MiscTechWantToWorkWith',
       'ToolsTechHaveWorkedWith', 'ToolsTechWantToWorkWith',
       'NEWCollabToolsHaveWorkedWith', 'NEWCollabToolsWantToWorkWith', 'OpSys',
       'NEWStuck', 'NEWSOSites', 'SOVisitFreq', 'SOAccount', 'SOPartFreq',
       'SOComm', 'NEWOtherComms', 'Age', 'Gender', 'Trans', 'Sexuality',
       'Ethnicity', 'Accessibility', 'MentalHealth', 'SurveyLength',
       'SurveyEase', 'SalaryUSD'],
      dtype='object')
```

```python
df["Age1stCode"]
```

```
ResponseId
1        18 - 24 years
2        11 - 17 years
3        11 - 17 years
4        11 - 17 years
5         5 - 10 years
             ...      
83435    11 - 17 years
83436    11 - 17 years
83437    11 - 17 years
83438    11 - 17 years
83439    11 - 17 years
Name: Age1stCode, Length: 83439, dtype: object
```

```python
df["Age1stCode"].unique()
```

```
array(['18 - 24 years', '11 - 17 years', '5 - 10 years', '25 - 34 years',
       '35 - 44 years', 'Younger than 5 years', '45 - 54 years',
       '55 - 64 years', nan, 'Older than 64 years'], dtype=object)
```

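```python
# Replace "11 - 17 years" with "Schooling years".
# Assign the result back: calling replace(..., inplace=True) on df["Age1stCode"]
# is chained assignment and may not update df in recent pandas versions.
df["Age1stCode"] = df["Age1stCode"].replace({"11 - 17 years": "Schooling years"})
```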

```python
df["Age1stCode"]
```

```
ResponseId
1          18 - 24 years
2        Schooling years
3        Schooling years
4        Schooling years
5           5 - 10 years
              ...       
83435    Schooling years
83436    Schooling years
83437    Schooling years
83438    Schooling years
83439    Schooling years
Name: Age1stCode, Length: 83439, dtype: object
```

```python
df.Trans.unique()
```

```
array(['No', 'Prefer not to say', 'Yes', nan, 'Or, in your own words:'],
      dtype=object)
```

```python
# Changing Trans values:
# - "No", "Prefer not to say" -> False
# - "Yes" -> True
# - "Or, in your own words:" (not in the dictionary) -> NaN
df["Trans"] = df["Trans"].map({"No": False, "Prefer not to say": False, "Yes": True})
```

```python
df["Trans"].value_counts(dropna=False)
```

```
False    79039
NaN       3365
True      1035
Name: Trans, dtype: int64
```