## Updating Rows and Columns 

In [1]:
people = {
    "first": ["Corey", "Jane", 'John'],
    "last": ["Schafer", "Doe", "Doe"],
    "email": ["CoreyMSchafer@gmail.com", "JaneDoe@gmail.com", "JohnDoe@gmail.com"]
}

In [2]:
import pandas as pd

In [3]:
df = pd.DataFrame(people)

In [4]:
df

Unnamed: 0,first,last,email
0,Corey,Schafer,CoreyMSchafer@gmail.com
1,Jane,Doe,JaneDoe@gmail.com
2,John,Doe,JohnDoe@gmail.com


### Modifying Data Within DataFrame

In [5]:
df.columns

Index(['first', 'last', 'email'], dtype='object')

#### Updating Columns in the dataframe
* We want column name **first_name** instead of **first**

In [6]:
# One way to rename columns is to assign to the columns attribute.
# Assigning to columns replaces every column name at once,
# so use it only when you are supplying a name for every column.
df.columns = ['first_name', 'last_name', 'email']

In [7]:
df

Unnamed: 0,first_name,last_name,email
0,Corey,Schafer,CoreyMSchafer@gmail.com
1,Jane,Doe,JaneDoe@gmail.com
2,John,Doe,JohnDoe@gmail.com


In [8]:
# Using a list comprehension to uppercase every column name
df.columns = [x.upper() for x in df.columns]

In [9]:
df

Unnamed: 0,FIRST_NAME,LAST_NAME,EMAIL
0,Corey,Schafer,CoreyMSchafer@gmail.com
1,Jane,Doe,JaneDoe@gmail.com
2,John,Doe,JohnDoe@gmail.com


In [10]:
df.columns = df.columns.str.replace('_', ' ')

In [11]:
df

Unnamed: 0,FIRST NAME,LAST NAME,EMAIL
0,Corey,Schafer,CoreyMSchafer@gmail.com
1,Jane,Doe,JaneDoe@gmail.com
2,John,Doe,JohnDoe@gmail.com


In [12]:
df.columns = [x.lower() for x in df.columns]
df.columns = df.columns.str.replace(' ', '_')
df

Unnamed: 0,first_name,last_name,email
0,Corey,Schafer,CoreyMSchafer@gmail.com
1,Jane,Doe,JaneDoe@gmail.com
2,John,Doe,JohnDoe@gmail.com
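The two clean-up steps above can also be chained in a single statement with the `.str` accessor on the column index. This is a minimal sketch on a small stand-in frame (the sample data here is illustrative, not the notebook's `df`):

```python
import pandas as pd

# Stand-in frame with upper-case, space-separated column names
df = pd.DataFrame({
    "FIRST NAME": ["Corey"],
    "LAST NAME": ["Schafer"],
    "EMAIL": ["CoreyMSchafer@gmail.com"],
})

# Lowercase every column name, then swap spaces for underscores
df.columns = df.columns.str.lower().str.replace(" ", "_")

cleaned_columns = list(df.columns)
print(cleaned_columns)
```

Both approaches rename every column; the chained version avoids the intermediate assignment.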


* All of the methods above apply to every column at once.
* If we only want to change some columns, we can use the **rename method** and pass in a dictionary that maps the old names to the new names.
    * The output of the rename method shows the renamed columns.
    * But by default rename returns a new data frame, so the original data frame is unchanged.
    * To apply the changes to the data frame itself, we need to set **inplace=True** (or assign the result back to `df`).

In [13]:
df.rename(columns={'first_name': 'first', 'last_name': 'last'}, inplace=True)

In [14]:
df

Unnamed: 0,first,last,email
0,Corey,Schafer,CoreyMSchafer@gmail.com
1,Jane,Doe,JaneDoe@gmail.com
2,John,Doe,JohnDoe@gmail.com
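The point about `rename` not modifying the data frame by default can be checked with a short sketch. The frame below is a small stand-in, and the variable names `renamed`, `original_columns`, and `renamed_columns` are only for illustration:

```python
import pandas as pd

df = pd.DataFrame({
    "first_name": ["Corey", "Jane"],
    "last_name": ["Schafer", "Doe"],
})

# Without inplace=True, rename returns a new DataFrame and leaves df untouched
renamed = df.rename(columns={"first_name": "first", "last_name": "last"})
original_columns = list(df.columns)
renamed_columns = list(renamed.columns)

# Assigning the result back is an alternative to inplace=True
df = df.rename(columns={"first_name": "first", "last_name": "last"})
reassigned_columns = list(df.columns)
```

Columns that are not listed in the dictionary keep their current names.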


#### Updating data in the rows

In [15]:
# Replace the whole row for John Doe so that his last name becomes Smith
df.loc[2] = ['John', 'Smith', 'JohnSmith@email.com']

In [16]:
df

Unnamed: 0,first,last,email
0,Corey,Schafer,CoreyMSchafer@gmail.com
1,Jane,Doe,JaneDoe@gmail.com
2,John,Smith,JohnSmith@email.com


* We want to change the last name and email only

In [17]:
df.loc[2, ['last', 'email']] = ['Doe', 'JohnDoe@email.com']

In [18]:
df

Unnamed: 0,first,last,email
0,Corey,Schafer,CoreyMSchafer@gmail.com
1,Jane,Doe,JaneDoe@gmail.com
2,John,Doe,JohnDoe@email.com


* Updating a single value

In [19]:
df.loc[2, 'last'] = 'Smith'

In [20]:
df

Unnamed: 0,first,last,email
0,Corey,Schafer,CoreyMSchafer@gmail.com
1,Jane,Doe,JaneDoe@gmail.com
2,John,Smith,JohnDoe@email.com


* Using **.at** indexer instead of **.loc**

In [21]:
df.at[2, 'last'] = 'Doe'

In [22]:
df

Unnamed: 0,first,last,email
0,Corey,Schafer,CoreyMSchafer@gmail.com
1,Jane,Doe,JaneDoe@gmail.com
2,John,Doe,JohnDoe@email.com


In [23]:
filt = (df['email'] == 'JohnDoe@email.com')
df[filt]['last'] = 'Smith'

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df[filt]['last'] = 'Smith'


* This gives a warning -> **SettingWithCopyWarning**
* The change we tried to make also did not take effect, because `df[filt]` returns a copy and the assignment modifies that copy.

In [24]:
filt = (df['email'] == 'JohnDoe@email.com')
df.loc[filt, 'last'] = 'Smith'
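The difference between the two assignments can be verified directly. The following is a rough sketch on a stand-in frame; the warning filter is there only so the chained assignment runs quietly, and the exact warning class raised depends on the pandas version:

```python
import warnings

import pandas as pd

df = pd.DataFrame({
    "last": ["Schafer", "Doe", "Doe"],
    "email": ["CoreyMSchafer@gmail.com", "JaneDoe@gmail.com", "JohnDoe@email.com"],
})
filt = df["email"] == "JohnDoe@email.com"

# Chained indexing: df[filt] is a copy, so the assignment never reaches df
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    df[filt]["last"] = "Smith"
after_chained = df.loc[2, "last"]

# A single .loc call selects rows and column together and writes into df
df.loc[filt, "last"] = "Smith"
after_loc = df.loc[2, "last"]

print(after_chained, after_loc)
```

Only the `.loc` version changes the value stored in the data frame.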

* Updating multiple rows of data

In [27]:
# Update all the values in the email column to lowercase
df['email'] = df['email'].str.lower()

In [28]:
df

Unnamed: 0,first,last,email
0,Corey,Schafer,coreymschafer@gmail.com
1,Jane,Doe,janedoe@gmail.com
2,John,Smith,johndoe@email.com


* Four methods for updating multiple rows of data
    * apply: calls a function on our values. It works on both data frame and series objects.
    * applymap: applies a function to every individual element in the data frame. It only works on data frames; series objects don't have an applymap method.
    * map: substitutes each value in a series with another value. It only works on series objects. When a dictionary is passed, values that are not substituted are converted to NaN.
    * replace: replaces the values passed in as a dictionary, but unlike map it leaves the remaining values unchanged instead of converting them to NaN.

* ### apply

In [48]:
# Using apply on a series: it applies a function to every value in the series
# We want the length of each email address
df['email'].apply(len)

0    23
1    17
2    17
Name: email, dtype: int64

In [31]:
# Create a simple function that returns the uppercase version of an email
# This function is simple, but it can be as complex as needed
def update_email(email):
    return email.upper()

In [33]:
# Here we get back a series of our email addresses in uppercase
# But this doesn't change the values stored in the data frame
df['email'].apply(update_email)

0    COREYMSCHAFER@GMAIL.COM
1          JANEDOE@GMAIL.COM
2          JOHNDOE@EMAIL.COM
Name: email, dtype: object

In [34]:
# So we assign the result of apply back to the column
df['email'] = df['email'].apply(update_email)

In [36]:
# Instead of defining a named function, we can pass a lambda function to .apply()
df['email'] = df['email'].apply(lambda x: x.lower())

In [37]:
df

Unnamed: 0,first,last,email
0,Corey,Schafer,coreymschafer@gmail.com
1,Jane,Doe,janedoe@gmail.com
2,John,Smith,johndoe@email.com


In [39]:
# Using apply on a data frame: it runs a function on each row or column of the data frame
# For comparison, apply on a series still runs the function on each value:
df['email'].apply(len)

0    23
1    17
2    17
Name: email, dtype: int64

In [43]:
# If we use .apply on the entire data frame, it doesn't apply len to every value in the data frame
# It applies len to each series in the data frame, which by default means each column,
# so df.apply(len) outputs the number of values/rows in each column
# We can get the same result for one column manually with len(df['email'])
# Setting axis='columns' applies the function to each row instead, giving the number of values in each row
df.apply(len, axis='columns')

0    3
1    3
2    3
dtype: int64
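Because the notebook's frame has three rows and three columns, both axis settings happen to return 3. A sketch with a non-square stand-in frame (two rows, three columns, made-up emails) makes the difference visible:

```python
import pandas as pd

df = pd.DataFrame({
    "first": ["Corey", "Jane"],
    "last": ["Schafer", "Doe"],
    "email": ["corey@example.com", "jane@example.com"],
})

# Default axis: len is called on each column Series, so each result is the row count
per_column = df.apply(len)

# axis='columns': len is called on each row Series, so each result is the column count
per_row = df.apply(len, axis="columns")

print(per_column.tolist())  # [2, 2, 2]
print(per_row.tolist())     # [3, 3]
```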

In [45]:
# Because we have string values in the data frame,
# we get the value that comes first in alphabetical order in each column.
# With numerical data we would get the minimum value of each column instead.
df.apply(pd.Series.min)

first                      Corey
last                         Doe
email    coreymschafer@gmail.com
dtype: object

In [47]:
df.apply(lambda x: x.min())

first                      Corey
last                         Doe
email    coreymschafer@gmail.com
dtype: object
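To illustrate the numerical case mentioned in the comments, here is a small sketch with a hypothetical `scores` frame (the column names and values are invented for the example):

```python
import pandas as pd

# Hypothetical numeric data, not part of the original notebook
scores = pd.DataFrame({
    "math": [90, 72, 85],
    "physics": [65, 88, 79],
})

# Default axis: minimum of each column
column_min = scores.apply(pd.Series.min)

# axis='columns': minimum of each row
row_min = scores.apply(lambda row: row.min(), axis="columns")

print(column_min.to_dict())  # {'math': 72, 'physics': 65}
print(row_min.tolist())      # [65, 72, 79]
```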

* ### applymap

In [50]:
# Using applymap on a data frame: it applies a function to every single element in the data frame
# We want the length of every element in the data frame
df.applymap(len)

Unnamed: 0,first,last,email
0,5,7,23
1,4,3,17
2,4,5,17


In [51]:
df.applymap(str.lower)

Unnamed: 0,first,last,email
0,corey,schafer,coreymschafer@gmail.com
1,jane,doe,janedoe@gmail.com
2,john,smith,johndoe@email.com
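A note on versions: pandas 2.1 added `DataFrame.map` as the element-wise method and deprecated `applymap`, so newer installs may warn about `applymap` or not provide it at all. A minimal sketch that works either way, assuming a stand-in frame of strings:

```python
import pandas as pd

df = pd.DataFrame({
    "first": ["Corey", "Jane"],
    "last": ["Schafer", "Doe"],
})

# Prefer DataFrame.map where it exists (pandas 2.1+), else fall back to applymap
if hasattr(pd.DataFrame, "map"):
    lengths = df.map(len)
else:
    lengths = df.applymap(len)

print(lengths.values.tolist())  # [[5, 7], [4, 3]]
```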


* ### map

In [54]:
# Using map on a series: it substitutes each value in the series with another value
# Only works with series objects
# We want to substitute a couple of our first names
# In the output we get the substituted values, but the values we didn't substitute were converted to NaN
# If we don't want that to happen, use the replace method instead.
df['first'].map({'Corey': 'Chris', 'Jane': 'Mary'})

0    Chris
1     Mary
2      NaN
Name: first, dtype: object

In [57]:
# Using replace: it replaces the values passed in as a dictionary
# but, unlike map, leaves the rest of the values unchanged instead of converting them to NaN.

df['first'] = df['first'].replace({'Corey': 'Chris', 'Jane': 'Mary'})

In [58]:
df

Unnamed: 0,first,last,email
0,Chris,Schafer,coreymschafer@gmail.com
1,Mary,Doe,janedoe@gmail.com
2,John,Smith,johndoe@email.com
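The contrast between `map` and `replace` can be put side by side. The sketch below uses a stand-in series; the last line shows one possible way to keep unmapped values while still using `map`, by filling the gaps from the original series:

```python
import pandas as pd

first = pd.Series(["Corey", "Jane", "John"], name="first")
mapping = {"Corey": "Chris", "Jane": "Mary"}

# map: names missing from the dictionary become NaN
mapped = first.map(mapping)

# replace: names missing from the dictionary are left as they are
replaced = first.replace(mapping)

# map plus fillna: fall back to the original value where map produced NaN
mapped_keep = first.map(mapping).fillna(first)

print(mapped.isna().tolist())  # [False, False, True]
print(replaced.tolist())       # ['Chris', 'Mary', 'John']
```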
