# Updating Rows and Columns in Dataframe
* Rename all the column names
* Rename a single column name
* Rename column names to lowercase/uppercase letters
* Replace spaces or underscores in column names
* Change all the values of a row, referenced by its index label
* Change specific values of a row, referenced by its index label
* Convert/replace multiple values in a column
* Functions:
    * apply 
    * applymap
    * map
    * replace


In [43]:
# Importing pandas library and loading data
import pandas as pd
df = pd.read_csv(r"C:\Users\battih\Desktop\Personal\Python\data\extracted_data\survey_results_public.csv")
schema_df = pd.read_csv(r"C:\Users\battih\Desktop\Personal\Python\data\extracted_data\survey_results_schema.csv")

In [44]:
people = {
    'first' : ['Huzefa', 'Jane', 'John'],
    'last' : ['Battiwala', 'Doe', 'Doe'],
    'email' : ['battih@gmail.com', 'johndoe@gmail.com', 'janedoe@gmail.com']
}

dict_df = pd.DataFrame(people)

Rename all the column names

In [45]:
dict_df.columns = ['first name', 'last name','email']
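
Assigning a list to `columns` needs exactly one name per column. As a minimal sketch (the one-row frame is illustrative, not the survey data), `DataFrame.set_axis` does the same relabelling but returns a new DataFrame, and a list of the wrong length is rejected with a `ValueError`:

```python
import pandas as pd

# Illustrative one-row frame
people = pd.DataFrame({'first': ['Huzefa'], 'last': ['Battiwala'],
                       'email': ['battih@gmail.com']})

# set_axis returns a relabelled copy; `people` keeps its original names
renamed = people.set_axis(['first name', 'last name', 'email'], axis=1)
print(list(renamed.columns))  # ['first name', 'last name', 'email']

# Assigning a list with the wrong number of names raises a ValueError
mismatch_caught = False
try:
    people.columns = ['first name', 'last name']
except ValueError as err:
    mismatch_caught = True
    print('ValueError:', err)
```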

Rename the column names to lowercase/uppercase letters

In [46]:
dict_df.columns = [x.upper() for x in dict_df.columns]

In [47]:
dict_df

| | FIRST NAME | LAST NAME | EMAIL |
|---|---|---|---|
| 0 | Huzefa | Battiwala | battih@gmail.com |
| 1 | Jane | Doe | johndoe@gmail.com |
| 2 | John | Doe | janedoe@gmail.com |
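
The list comprehension works, but pandas also offers shorter forms. A small sketch on an illustrative frame: the `.str` accessor on the column `Index`, or `rename` with a function that is applied to every column label:

```python
import pandas as pd

# Illustrative frame with upper-case column names
df = pd.DataFrame({'FIRST NAME': ['Huzefa'], 'LAST NAME': ['Battiwala']})

# Vectorised string method on the column Index
lower_cols = df.columns.str.lower()

# rename also accepts a function, called once per column label
lowered = df.rename(columns=str.lower)
print(list(lowered.columns))  # ['first name', 'last name']
```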


Replace spaces in column names with underscores

In [48]:
dict_df.columns = dict_df.columns.str.replace(' ', '_')
dict_df

| | FIRST_NAME | LAST_NAME | EMAIL |
|---|---|---|---|
| 0 | Huzefa | Battiwala | battih@gmail.com |
| 1 | Jane | Doe | johndoe@gmail.com |
| 2 | John | Doe | janedoe@gmail.com |
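
The `.str` methods on the column `Index` can be chained, and the same call reverses the change (underscores back to spaces). A sketch assuming the same three column names; `regex=False` is passed so the pattern is always treated as a literal character:

```python
import pandas as pd

# Empty frame carrying the upper-case, space-separated names from above
df = pd.DataFrame(columns=['FIRST NAME', 'LAST NAME', 'EMAIL'])

# Lower-case first, then replace spaces with underscores
snake = df.columns.str.lower().str.replace(' ', '_', regex=False)
print(list(snake))   # ['first_name', 'last_name', 'email']

# The reverse direction: underscores -> spaces
spaced = snake.str.replace('_', ' ', regex=False)
print(list(spaced))  # ['first name', 'last name', 'email']
```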


Rename specific columns with `rename`

In [49]:
dict_df.rename(columns={'FIRST_NAME':'first', 'LAST_NAME':'last', 'EMAIL':'email'},inplace=True)
dict_df

| | first | last | email |
|---|---|---|---|
| 0 | Huzefa | Battiwala | battih@gmail.com |
| 1 | Jane | Doe | johndoe@gmail.com |
| 2 | John | Doe | janedoe@gmail.com |
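
Two details of `rename` are easy to miss: without `inplace=True` it returns a new DataFrame, and dictionary keys that match no column are silently ignored. A sketch on an illustrative frame (the misspelled key `FRIST_NAME` is deliberate), using `errors='raise'` to surface such typos:

```python
import pandas as pd

df = pd.DataFrame({'FIRST_NAME': ['Huzefa'], 'EMAIL': ['battih@gmail.com']})

# Without inplace=True, rename returns a new DataFrame
renamed = df.rename(columns={'FIRST_NAME': 'first'})
print(list(renamed.columns))  # ['first', 'EMAIL']
print(list(df.columns))       # ['FIRST_NAME', 'EMAIL'] (unchanged)

# errors='raise' turns a key that matches no column into a KeyError
typo_caught = False
try:
    df.rename(columns={'FRIST_NAME': 'first'}, errors='raise')
except KeyError as err:
    typo_caught = True
    print('KeyError:', err)
```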


Change all the values of a row, referenced by its index label

In [50]:
dict_df.loc[2] = ['John', 'Smith', 'johnsmith@email.com']

In [51]:
dict_df

| | first | last | email |
|---|---|---|---|
| 0 | Huzefa | Battiwala | battih@gmail.com |
| 1 | Jane | Doe | johndoe@gmail.com |
| 2 | John | Smith | johnsmith@email.com |
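
`.loc[label] = [...]` overwrites a row when the label exists; when it does not, pandas appends a new row ("setting with enlargement"). A small sketch on an illustrative two-row frame; the list must contain one value per column:

```python
import pandas as pd

df = pd.DataFrame({'first': ['Huzefa', 'Jane'], 'last': ['Battiwala', 'Doe']})

# Label 1 exists, so the whole row is overwritten
df.loc[1] = ['Jane', 'Smith']

# Label 2 does not exist yet, so .loc appends a new row
df.loc[2] = ['John', 'Doe']
print(df)
```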


Change specific values of a row, referenced by its index label

In [52]:
dict_df.loc[2,['last','email']] = ['Doe', 'johndoe@email.com']
dict_df

| | first | last | email |
|---|---|---|---|
| 0 | Huzefa | Battiwala | battih@gmail.com |
| 1 | Jane | Doe | johndoe@gmail.com |
| 2 | John | Doe | johndoe@email.com |
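
The row part of `.loc` can also be a label slice, and unlike Python list slicing a `.loc` slice includes its end label. A sketch on an illustrative frame that sets one column for rows 0 through 1:

```python
import pandas as pd

df = pd.DataFrame({'first': ['Huzefa', 'Jane', 'John'],
                   'last': ['Battiwala', 'Doe', 'Doe']})

# Label slices in .loc are inclusive, so rows 0 and 1 are both updated
df.loc[0:1, 'last'] = 'Smith'
print(df['last'].tolist())  # ['Smith', 'Smith', 'Doe']
```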


Change a single value with `at` (one row label and one column label)

In [53]:
dict_df.at[2,'last'] = 'Doe'
dict_df

| | first | last | email |
|---|---|---|---|
| 0 | Huzefa | Battiwala | battih@gmail.com |
| 1 | Jane | Doe | johndoe@gmail.com |
| 2 | John | Doe | johndoe@email.com |
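
`.at` reads or writes exactly one cell by labels; its positional counterpart is `.iat`. A short sketch on an illustrative frame:

```python
import pandas as pd

df = pd.DataFrame({'first': ['Huzefa', 'Jane', 'John'],
                   'last': ['Battiwala', 'Doe', 'Doe']})

# .at: row label 2, column label 'last'
df.at[2, 'last'] = 'Smith'
print(df.at[2, 'last'])  # Smith

# .iat: row position 0, column position 1 (the 'last' column)
print(df.iat[0, 1])      # Battiwala
```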


Change values in rows selected by a boolean filter

In [54]:
filt = (dict_df['email'] == 'johndoe@email.com')
dict_df[filt]['last'] = 'Smith'

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
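
Chained indexing (`dict_df[filt]['last'] = ...`) first builds a new DataFrame with `dict_df[filt]` and then sets `last` on that temporary object, so `dict_df` itself is not modified; pandas flags this with the `SettingWithCopyWarning` shown above. Passing the filter and the column to a single `.loc` call updates `dict_df` directly:
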
In [55]:
filt = (dict_df['email'] == 'johndoe@email.com')
dict_df.loc[filt,'last'] = 'Smith'

In [56]:
dict_df

| | first | last | email |
|---|---|---|---|
| 0 | Huzefa | Battiwala | battih@gmail.com |
| 1 | Jane | Doe | johndoe@gmail.com |
| 2 | John | Smith | johndoe@email.com |
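
A self-contained sketch of the difference between the two cells above, on an illustrative two-row frame. Warnings are silenced only because their exact type depends on the pandas version; the point is that the chained form leaves the frame unchanged while `.loc` does not:

```python
import warnings

import pandas as pd

df = pd.DataFrame({'last': ['Doe', 'Doe'],
                   'email': ['janedoe@email.com', 'johndoe@email.com']})
filt = df['email'] == 'johndoe@email.com'

# Chained indexing: the assignment lands on the temporary df[filt]
with warnings.catch_warnings():
    warnings.simplefilter('ignore')
    df[filt]['last'] = 'Smith'
unchanged = df['last'].tolist()
print(unchanged)  # ['Doe', 'Doe']

# One .loc call with mask and column modifies df itself
df.loc[filt, 'last'] = 'Smith'
print(df['last'].tolist())  # ['Doe', 'Smith']
```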


In [57]:
dict_df.loc[filt,'email'] = 'JohnDoe@email.com'

Converting/replacing multiple values in a column


In [58]:
dict_df['email'] = dict_df['email'].str.lower()
dict_df

| | first | last | email |
|---|---|---|---|
| 0 | Huzefa | Battiwala | battih@gmail.com |
| 1 | Jane | Doe | johndoe@gmail.com |
| 2 | John | Smith | johndoe@email.com |


In [59]:
dict_df['email'] = dict_df['email'].str.replace('gmail','email')
dict_df

| | first | last | email |
|---|---|---|---|
| 0 | Huzefa | Battiwala | battih@email.com |
| 1 | Jane | Doe | johndoe@email.com |
| 2 | John | Smith | johndoe@email.com |
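
Whether the pattern in `str.replace` is a literal string or a regular expression is controlled by `regex`. Since pandas 2.0 the default is `regex=False` (older versions defaulted to `True`), so passing it explicitly keeps the behaviour the same across versions. A sketch on illustrative addresses:

```python
import pandas as pd

emails = pd.Series(['battih@gmail.com', 'johndoe@gmail.com'])

# Literal replacement: '.' is just a dot here
literal = emails.str.replace('.com', '.org', regex=False)
print(literal.tolist())  # ['battih@gmail.org', 'johndoe@gmail.org']

# Regex replacement: any domain ending in .com becomes email.com
regexed = emails.str.replace(r'@\w+\.com$', '@email.com', regex=True)
print(regexed.tolist())  # ['battih@email.com', 'johndoe@email.com']
```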


### Apply function: `apply` calls a function on every value of a Series (on a DataFrame, it is called once per column, or once per row with `axis=1`)

In [60]:
# Using the built-in len function to get the length of each string in the email column
dict_df['email'].apply(len)

0    16
1    17
2    17
Name: email, dtype: int64
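
For string columns, `.str.len()` is the vectorised equivalent of `apply(len)`. A sketch with an illustrative missing value: `apply(len)` would raise a `TypeError` on `None`, whereas `.str.len()` returns a missing value for it:

```python
import pandas as pd

emails = pd.Series(['battih@email.com', 'johndoe@email.com', None])

# Vectorised length; the missing entry stays missing instead of raising
lengths = emails.str.len()
print(lengths)
```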

In [61]:
# Using an anonymous (lambda) function to convert the values in the email column to upper case
dict_df['email'] = dict_df['email'].apply(lambda x: x.upper())
dict_df

| | first | last | email |
|---|---|---|---|
| 0 | Huzefa | Battiwala | BATTIH@EMAIL.COM |
| 1 | Jane | Doe | JOHNDOE@EMAIL.COM |
| 2 | John | Smith | JOHNDOE@EMAIL.COM |


In [62]:
# Using a user-defined function to convert the values in the email column to lower case
def lower_email(email):
    return email.lower()
dict_df['email'] = dict_df['email'].apply(lower_email)
dict_df

| | first | last | email |
|---|---|---|---|
| 0 | Huzefa | Battiwala | battih@email.com |
| 1 | Jane | Doe | johndoe@email.com |
| 2 | John | Smith | johndoe@email.com |
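
A user-defined function can take extra parameters; `Series.apply` forwards them through `args=(...)`. The helper `change_domain` below is hypothetical, written only for this sketch:

```python
import pandas as pd

emails = pd.Series(['battih@gmail.com', 'johndoe@gmail.com'])

# Hypothetical helper: keep the user part, swap in a new domain
def change_domain(email, new_domain):
    user = email.split('@')[0]
    return f'{user}@{new_domain}'

# 'email.com' is passed as the second positional argument
updated = emails.apply(change_domain, args=('email.com',))
print(updated.tolist())  # ['battih@email.com', 'johndoe@email.com']
```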


In [63]:
# DataFrame.apply passes each column to the function as a Series, so x.min() returns the smallest value in each column
dict_df.apply(lambda x: x.min())

first              Huzefa
last            Battiwala
email    battih@email.com
dtype: object
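
With `axis=1`, `DataFrame.apply` passes each row instead of each column, which is useful for combining values across columns. A sketch on an illustrative frame that builds a full name per row:

```python
import pandas as pd

df = pd.DataFrame({'first': ['Huzefa', 'Jane'], 'last': ['Battiwala', 'Doe']})

# Default axis=0: one call per column
col_min = df.apply(lambda col: col.min())
print(col_min.tolist())    # ['Huzefa', 'Battiwala']

# axis=1: one call per row; the row is a Series indexed by column name
full_name = df.apply(lambda row: row['first'] + ' ' + row['last'], axis=1)
print(full_name.tolist())  # ['Huzefa Battiwala', 'Jane Doe']
```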

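### Applymap: `applymap` applies a function that accepts and returns a scalar to every element of a DataFrame.

Since pandas 2.1, `applymap` is deprecated in favor of `DataFrame.map`, which works the same way; calling `applymap` on those versions emits a deprecation warning.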

In [64]:
dict_df.applymap(len)

| | first | last | email |
|---|---|---|---|
| 0 | 6 | 9 | 16 |
| 1 | 4 | 3 | 17 |
| 2 | 4 | 5 | 17 |


In [65]:
dict_df.applymap(str.lower)

| | first | last | email |
|---|---|---|---|
| 0 | huzefa | battiwala | battih@email.com |
| 1 | jane | doe | johndoe@email.com |
| 2 | john | smith | johndoe@email.com |


Difference between apply and applymap:
`apply` on a DataFrame passes each whole column (or each row with `axis=1`) to the function as a Series, whereas `applymap` passes each individual element. On a Series, `apply` already works element by element, so `applymap` exists only on DataFrames.
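
The difference shows up clearly with `len`. A sketch on an illustrative frame; it uses `DataFrame.map` when available (pandas 2.1 and later) and falls back to `applymap` otherwise:

```python
import pandas as pd

df = pd.DataFrame({'first': ['Huzefa', 'Jane', 'John'],
                   'last': ['Battiwala', 'Doe', 'Smith']})

# apply: len receives a whole column, so it returns the number of rows
col_lengths = df.apply(len).tolist()
print(col_lengths)   # [3, 3]

# element-wise: len receives each cell, so it measures each string
elementwise = df.map if hasattr(pd.DataFrame, 'map') else df.applymap
cell_lengths = elementwise(len).values.tolist()
print(cell_lengths)  # [[6, 9], [4, 3], [4, 5]]
```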

### Map: `map` substitutes each value in a Series using a dictionary, another Series, or a function

In [66]:
dict_df['first'].map({'Huzefa':'Chris','Jane':'Marry'})

0    Chris
1    Marry
2      NaN
Name: first, dtype: object
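
Besides a dictionary, `map` accepts a lookup Series (its index holds the old values, its values the new ones) or a function. A sketch with illustrative names:

```python
import pandas as pd

first = pd.Series(['Huzefa', 'Jane', 'John'])

# Lookup Series: index = old value, value = new value
nicknames = pd.Series({'Huzefa': 'Chris', 'Jane': 'Mary'})
mapped = first.map(nicknames)
print(mapped.tolist())  # 'John' has no entry, so it becomes NaN

# A function is applied to every value
lowered = first.map(str.lower)
print(lowered.tolist())  # ['huzefa', 'jane', 'john']
```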

### Replace: `replace` substitutes values matching a string, regex, list, dictionary, Series, number, etc. in a Series or DataFrame

In [67]:
dict_df['first'].replace({'Huzefa':'Chris','Jane':'Marry'})

0    Chris
1    Marry
2     John
Name: first, dtype: object

Difference between replace and map: when given a dictionary, `map` turns every value that is not a key into `NaN`, whereas `replace` leaves such values unchanged.
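
One way to see the difference: filling the gaps that `map` leaves with the original values gives the same result as `replace` for a plain dictionary. A sketch with illustrative names:

```python
import pandas as pd

first = pd.Series(['Huzefa', 'Jane', 'John'])
mapping = {'Huzefa': 'Chris', 'Jane': 'Mary'}

via_map = first.map(mapping)          # unmatched 'John' -> NaN
via_replace = first.replace(mapping)  # unmatched 'John' kept as-is

# Filling map's NaN gaps with the original values matches replace here
via_map_filled = via_map.fillna(first)
print(via_replace.tolist())     # ['Chris', 'Mary', 'John']
print(via_map_filled.tolist())  # ['Chris', 'Mary', 'John']
```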

In [68]:
# Renaming the ConvertedComp column to SalaryUSD in the survey DataFrame
df.rename(columns={'ConvertedComp':'SalaryUSD'},inplace=True)

In [69]:
df['SalaryUSD']

0            NaN
1            NaN
2         8820.0
3        61000.0
4            NaN
          ...   
88878        NaN
88879        NaN
88880        NaN
88881        NaN
88882        NaN
Name: SalaryUSD, Length: 88883, dtype: float64

In [70]:
# Mapping the 'Yes'/'No' answers in the Hobbyist column to True/False
df['Hobbyist'] = df['Hobbyist'].map({'Yes':True,'No':False})
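
This mapping gives a pure boolean column only when every answer is exactly `'Yes'` or `'No'`; any other value (including a missing answer) becomes `NaN`. A sketch on an illustrative Series with one missing answer:

```python
import pandas as pd

hobbyist = pd.Series(['Yes', 'No', 'Yes', None])

as_bool = hobbyist.map({'Yes': True, 'No': False})
# The missing answer is not a key, so it becomes NaN and the
# result can no longer be stored with a plain bool dtype
print(as_bool.dtype)
print(as_bool.isna().sum())  # 1
```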

In [71]:
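# Without parentheses this returns the bound method itself (shown below), not the counts;
# use df['Hobbyist'].value_counts() to count the True/False values
df['Hobbyist'].value_counts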

<bound method IndexOpsMixin.value_counts of 0         True
1        False
2         True
3        False
4         True
         ...  
88878     True
88879    False
88880    False
88881    False
88882     True
Name: Hobbyist, Length: 88883, dtype: bool>
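
Called with parentheses, `value_counts()` returns the number of occurrences of each unique value; `normalize=True` returns proportions instead. A sketch on an illustrative boolean Series, not the survey data:

```python
import pandas as pd

hobbyist = pd.Series([True, False, True, True])

# Counts per unique value
counts = hobbyist.value_counts()
print(counts)

# Proportions per unique value
shares = hobbyist.value_counts(normalize=True)
print(shares)
```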