Resources:
    
1. https://towardsdatascience.com/renaming-columns-in-a-pandas-dataframe-1d909360ddc6

2. https://www.listendata.com/2020/09/How-to-rename-columns-in-Pandas.html

3. https://www.youtube.com/watch?v=CXzmULB5rjI


- In data analysis, we may work with a dataset that has no column names, whose column names contain unwanted characters (e.g. spaces), or whose columns we simply want to give better names. All of these cases require renaming columns in a Pandas DataFrame.

- In this article, you’ll learn 5 different approaches to doing that. The article is structured as follows:
    
1. Passing a list of names to columns attribute

2. Using rename() function

3. Renaming columns while reading a CSV file

4. Using columns.str.replace() method

5. Renaming columns via set_axis()

In [1]:
import pandas as pd
import numpy as np
import datetime as dt
import seaborn as sns

In [2]:
titanic = sns.load_dataset('titanic')

In [3]:
titanic.head()

Unnamed: 0,survived,pclass,sex,age,sibsp,parch,fare,embarked,class,who,adult_male,deck,embark_town,alive,alone
0,0,3,male,22.0,1,0,7.25,S,Third,man,True,,Southampton,no,False
1,1,1,female,38.0,1,0,71.2833,C,First,woman,False,C,Cherbourg,yes,False
2,1,3,female,26.0,0,0,7.925,S,Third,woman,False,,Southampton,yes,True
3,1,1,female,35.0,1,0,53.1,S,First,woman,False,C,Southampton,yes,False
4,0,3,male,35.0,0,0,8.05,S,Third,man,True,,Southampton,no,True


In [4]:
titanic.columns

Index(['survived', 'pclass', 'sex', 'age', 'sibsp', 'parch', 'fare',
       'embarked', 'class', 'who', 'adult_male', 'deck', 'embark_town',
       'alive', 'alone'],
      dtype='object')

## 1. Passing a list of names to columns attribute


In [6]:
subset=titanic.loc[:,['survived', 'pclass', 'sex', 'age']]
subset

Unnamed: 0,survived,pclass,sex,age
0,0,3,male,22.0
1,1,1,female,38.0
2,1,3,female,26.0
3,1,1,female,35.0
4,0,3,male,35.0
...,...,...,...,...
886,0,2,male,27.0
887,1,1,female,19.0
888,0,3,female,
889,1,1,male,26.0


In [7]:
subset.columns=['Alive','class','Sex','Age']

In [8]:
subset

Unnamed: 0,Alive,class,Sex,Age
0,0,3,male,22.0
1,1,1,female,38.0
2,1,3,female,26.0
3,1,1,female,35.0
4,0,3,male,35.0
...,...,...,...,...
886,0,2,male,27.0
887,1,1,female,19.0
888,0,3,female,
889,1,1,male,26.0


However, a disadvantage of this approach is that we need to provide names for all columns even if we want to rename only some of them. Otherwise, we get a “ValueError: Length mismatch”.
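The length mismatch described above can be reproduced with a minimal sketch. The small `demo` frame is a made-up stand-in for the four-column subset, so the example runs without downloading the titanic dataset.

```python
import pandas as pd

# Stand-in for the four-column subset used above (values are illustrative).
demo = pd.DataFrame({
    "survived": [0, 1],
    "pclass": [3, 1],
    "sex": ["male", "female"],
    "age": [22.0, 38.0],
})

try:
    # Only two names for four columns: pandas refuses the assignment.
    demo.columns = ["Alive", "class"]
except ValueError as err:
    message = str(err)
    print(message)

# The failed assignment leaves the original labels in place.
print(list(demo.columns))
```

Because the assignment fails before anything is changed, the frame can still be used with its original column names afterwards.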


## 2. Using rename() function


In [9]:
titanic

Unnamed: 0,survived,pclass,sex,age,sibsp,parch,fare,embarked,class,who,adult_male,deck,embark_town,alive,alone
0,0,3,male,22.0,1,0,7.2500,S,Third,man,True,,Southampton,no,False
1,1,1,female,38.0,1,0,71.2833,C,First,woman,False,C,Cherbourg,yes,False
2,1,3,female,26.0,0,0,7.9250,S,Third,woman,False,,Southampton,yes,True
3,1,1,female,35.0,1,0,53.1000,S,First,woman,False,C,Southampton,yes,False
4,0,3,male,35.0,0,0,8.0500,S,Third,man,True,,Southampton,no,True
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
886,0,2,male,27.0,0,0,13.0000,S,Second,man,True,,Southampton,no,True
887,1,1,female,19.0,0,0,30.0000,S,First,woman,False,B,Southampton,yes,True
888,0,3,female,,1,2,23.4500,S,Third,woman,False,,Southampton,no,False
889,1,1,male,26.0,0,0,30.0000,C,First,man,True,C,Cherbourg,yes,True


In [10]:
titanic.rename(columns={"survived":"Alive",'pclass':'class'})

Unnamed: 0,Alive,class,sex,age,sibsp,parch,fare,embarked,class.1,who,adult_male,deck,embark_town,alive,alone
0,0,3,male,22.0,1,0,7.2500,S,Third,man,True,,Southampton,no,False
1,1,1,female,38.0,1,0,71.2833,C,First,woman,False,C,Cherbourg,yes,False
2,1,3,female,26.0,0,0,7.9250,S,Third,woman,False,,Southampton,yes,True
3,1,1,female,35.0,1,0,53.1000,S,First,woman,False,C,Southampton,yes,False
4,0,3,male,35.0,0,0,8.0500,S,Third,man,True,,Southampton,no,True
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
886,0,2,male,27.0,0,0,13.0000,S,Second,man,True,,Southampton,no,True
887,1,1,female,19.0,0,0,30.0000,S,First,woman,False,B,Southampton,yes,True
888,0,3,female,,1,2,23.4500,S,Third,woman,False,,Southampton,no,False
889,1,1,male,26.0,0,0,30.0000,C,First,man,True,C,Cherbourg,yes,True
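Two details of the rename() call above are easy to miss: it returns a new DataFrame rather than modifying titanic, and mapping 'pclass' to 'class' produces a second column with a label that already exists. The sketch below illustrates both on a small made-up frame; `demo` and the replacement name `ticket_class` are assumptions chosen for illustration.

```python
import pandas as pd

# Illustrative frame that, like titanic, already has a 'class' column.
demo = pd.DataFrame({
    "survived": [0, 1],
    "pclass": [3, 1],
    "class": ["Third", "First"],
})

renamed = demo.rename(columns={"survived": "Alive", "pclass": "class"})

# rename() returned a new object; the original labels are unchanged.
print(list(demo.columns))     # ['survived', 'pclass', 'class']
print(list(renamed.columns))  # ['Alive', 'class', 'class']

# Two columns now share the label 'class'.
has_duplicates = renamed.columns.duplicated().any()
print(has_duplicates)

# Choosing a target name that is not already taken avoids the ambiguity.
safe = demo.rename(columns={"survived": "Alive", "pclass": "ticket_class"})
print(list(safe.columns))
```

To keep the result, assign it back (for example `demo = demo.rename(...)`) or pass `inplace=True`.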


## 2.2 Rename columns using a function


- To convert column names into uppercase, we can pass the str.upper function

In [14]:
titanic.rename(columns=str.upper)

Unnamed: 0,SURVIVED,PCLASS,SEX,AGE,SIBSP,PARCH,FARE,EMBARKED,CLASS,WHO,ADULT_MALE,DECK,EMBARK_TOWN,ALIVE,ALONE
0,0,3,male,22.0,1,0,7.2500,S,Third,man,True,,Southampton,no,False
1,1,1,female,38.0,1,0,71.2833,C,First,woman,False,C,Cherbourg,yes,False
2,1,3,female,26.0,0,0,7.9250,S,Third,woman,False,,Southampton,yes,True
3,1,1,female,35.0,1,0,53.1000,S,First,woman,False,C,Southampton,yes,False
4,0,3,male,35.0,0,0,8.0500,S,Third,man,True,,Southampton,no,True
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
886,0,2,male,27.0,0,0,13.0000,S,Second,man,True,,Southampton,no,True
887,1,1,female,19.0,0,0,30.0000,S,First,woman,False,B,Southampton,yes,True
888,0,3,female,,1,2,23.4500,S,Third,woman,False,,Southampton,no,False
889,1,1,male,26.0,0,0,30.0000,C,First,man,True,C,Cherbourg,yes,True


In [15]:
titanic.rename(columns=str.lower)

Unnamed: 0,survived,pclass,sex,age,sibsp,parch,fare,embarked,class,who,adult_male,deck,embark_town,alive,alone
0,0,3,male,22.0,1,0,7.2500,S,Third,man,True,,Southampton,no,False
1,1,1,female,38.0,1,0,71.2833,C,First,woman,False,C,Cherbourg,yes,False
2,1,3,female,26.0,0,0,7.9250,S,Third,woman,False,,Southampton,yes,True
3,1,1,female,35.0,1,0,53.1000,S,First,woman,False,C,Southampton,yes,False
4,0,3,male,35.0,0,0,8.0500,S,Third,man,True,,Southampton,no,True
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
886,0,2,male,27.0,0,0,13.0000,S,Second,man,True,,Southampton,no,True
887,1,1,female,19.0,0,0,30.0000,S,First,woman,False,B,Southampton,yes,True
888,0,3,female,,1,2,23.4500,S,Third,woman,False,,Southampton,no,False
889,1,1,male,26.0,0,0,30.0000,C,First,man,True,C,Cherbourg,yes,True


- We can also create a custom function and pass it to the columns argument.


In [17]:
def to_upper_case(name):
    return name.upper()

titanic.rename(columns=to_upper_case).head()

Unnamed: 0,SURVIVED,PCLASS,SEX,AGE,SIBSP,PARCH,FARE,EMBARKED,CLASS,WHO,ADULT_MALE,DECK,EMBARK_TOWN,ALIVE,ALONE
0,0,3,male,22.0,1,0,7.25,S,Third,man,True,,Southampton,no,False
1,1,1,female,38.0,1,0,71.2833,C,First,woman,False,C,Cherbourg,yes,False
2,1,3,female,26.0,0,0,7.925,S,Third,woman,False,,Southampton,yes,True
3,1,1,female,35.0,1,0,53.1,S,First,woman,False,C,Southampton,yes,False
4,0,3,male,35.0,0,0,8.05,S,Third,man,True,,Southampton,no,True


- We can also use a lambda expression:

In [18]:
titanic.rename(columns=lambda x : x.upper())

Unnamed: 0,SURVIVED,PCLASS,SEX,AGE,SIBSP,PARCH,FARE,EMBARKED,CLASS,WHO,ADULT_MALE,DECK,EMBARK_TOWN,ALIVE,ALONE
0,0,3,male,22.0,1,0,7.2500,S,Third,man,True,,Southampton,no,False
1,1,1,female,38.0,1,0,71.2833,C,First,woman,False,C,Cherbourg,yes,False
2,1,3,female,26.0,0,0,7.9250,S,Third,woman,False,,Southampton,yes,True
3,1,1,female,35.0,1,0,53.1000,S,First,woman,False,C,Southampton,yes,False
4,0,3,male,35.0,0,0,8.0500,S,Third,man,True,,Southampton,no,True
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
886,0,2,male,27.0,0,0,13.0000,S,Second,man,True,,Southampton,no,True
887,1,1,female,19.0,0,0,30.0000,S,First,woman,False,B,Southampton,yes,True
888,0,3,female,,1,2,23.4500,S,Third,woman,False,,Southampton,no,False
889,1,1,male,26.0,0,0,30.0000,C,First,man,True,C,Cherbourg,yes,True


Passing a function is useful when many or all columns need to follow the same naming convention.
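As a sketch of applying one naming convention to every column, the example below passes a custom function to rename(). The helper `to_snake_case` and the messy column names are hypothetical, chosen only for illustration.

```python
import re

import pandas as pd


def to_snake_case(name):
    """Hypothetical helper: trim, lower-case, and join words with underscores."""
    name = name.strip().lower()
    # Collapse every run of non-alphanumeric characters into one underscore.
    return re.sub(r"[^0-9a-z]+", "_", name).strip("_")


demo = pd.DataFrame({
    " Account ID": [1, 2],
    "UK City ": ["London", "Oxford"],
    "Sign-up Date": ["2020-01-01", "2020-02-01"],
})

cleaned = demo.rename(columns=to_snake_case)
print(list(cleaned.columns))  # ['account_id', 'uk_city', 'sign_up_date']
```

The function is called once per column label, so the same rule is applied everywhere without listing the columns by hand.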

## 3. Using read_csv() with names argument


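new_names = ['ID', 'Survived', 'Class', 'Name', 'Sex']  # hypothetical names for the first 5 columns
df = pd.read_csv(
    'data/titanic/train.csv',
    names=new_names,           # Rename columns
    header=0,                  # Drop the existing header row
    usecols=[0,1,2,3,4],       # Read the first 5 columns
)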
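The call above depends on a local file, so here is a self-contained sketch of the same pattern that reads from an in-memory string instead. The CSV contents and the names in `new_names` are made up for illustration.

```python
import io

import pandas as pd

# In-memory stand-in for a CSV file; the contents are illustrative only.
csv_text = """PassengerId,Survived,Pclass,Name,Sex,Age
1,0,3,Braund,male,22
2,1,1,Cumings,female,38
"""

new_names = ["ID", "Survived", "Class", "Name", "Sex"]

df_csv = pd.read_csv(
    io.StringIO(csv_text),
    names=new_names,          # labels for the columns that are kept
    header=0,                 # treat the first row as a header and discard it
    usecols=[0, 1, 2, 3, 4],  # keep only the first five columns
)
print(df_csv)
```

Without `header=0`, the original header row would be read as the first row of data, because passing `names` tells pandas that the labels come from the list rather than from the file.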

## 4. Using columns.str.replace() method


In [20]:
df = pd.DataFrame(
    { "account id": [1, 2], "uk city": ['London', 'Oxford']}, 
    index=['A1', 'A2'],
)
df

Unnamed: 0,account id,uk city
A1,1,London
A2,2,Oxford


In [21]:
df.columns = df.columns.str.replace(' ', '_')

In [22]:
df

Unnamed: 0,account_id,uk_city
A1,1,London
A2,2,Oxford


The same method is available for the index via df.index.str.replace().
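A minimal sketch of the index variant follows. The frame and its row labels are assumptions for illustration; `regex=False` is passed so the space is treated as a literal character.

```python
import pandas as pd

demo = pd.DataFrame({"account_id": [1, 2]}, index=["row A1", "row A2"])

# Replace spaces in the row labels, exactly as was done for the column labels.
demo.index = demo.index.str.replace(" ", "_", regex=False)
print(list(demo.index))  # ['row_A1', 'row_A2']
```

As with columns, str.replace() returns a new Index, so the result has to be assigned back to `demo.index`.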


## 5. Renaming columns via set_axis()


In [24]:
subset.set_axis(['Survived', 'Class', 'Sex', 'Age'], axis=1)

Unnamed: 0,Survived,Class,Sex,Age
0,0,3,male,22.0
1,1,1,female,38.0
2,1,3,female,26.0
3,1,1,female,35.0
4,0,3,male,35.0
...,...,...,...,...
886,0,2,male,27.0
887,1,1,female,19.0
888,0,3,female,
889,1,1,male,26.0


- Assigning to the columns attribute is the easiest approach, but it requires us to provide names for all columns even if we want to rename only some of them. It is sensible when working on a DataFrame with just a couple of columns.

- When reading a CSV file, it may be more sensible to rename columns using read_csv() with the names argument.

- When you want to rename some selected columns, the rename() function is the best choice.

- columns.str.replace() is useful only when you want to replace characters. Note that passing a custom function to rename() can do the same.

- Lastly, set_axis() can also change column names by replacing all labels on the column axis.
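To make the comparison in the summary concrete, the sketch below replaces spaces with underscores in three ways and checks that the resulting labels agree. The frame mirrors the small account/city example from section 4; the variable names are illustrative.

```python
import pandas as pd

demo = pd.DataFrame({"account id": [1, 2], "uk city": ["London", "Oxford"]})

# 1. columns.str.replace(): assign the new Index back to a copy.
via_str = demo.copy()
via_str.columns = via_str.columns.str.replace(" ", "_", regex=False)

# 2. rename() with a function applied to each label.
via_rename = demo.rename(columns=lambda name: name.replace(" ", "_"))

# 3. set_axis() with the full list of new labels.
via_set_axis = demo.set_axis(["account_id", "uk_city"], axis=1)

print(list(via_str.columns))
print(list(via_rename.columns))
print(list(via_set_axis.columns))
```

All three produce the same labels; they differ in whether every name must be spelled out (set_axis) or whether a rule is applied to each label (str.replace and rename).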

In [28]:

df = pd.DataFrame({'A': [11, 21, 31],
                   'B': [12, 22, 32],
                   'C': [13, 23, 33]},
                  index=['ONE', 'TWO', 'THREE'])
df

Unnamed: 0,A,B,C
ONE,11,12,13
TWO,21,22,23
THREE,31,32,33


In [29]:
print(df.add_prefix('X_'))

       X_A  X_B  X_C
ONE     11   12   13
TWO     21   22   23
THREE   31   32   33


In [30]:
print(df.add_suffix('_X'))


       A_X  B_X  C_X
ONE     11   12   13
TWO     21   22   23
THREE   31   32   33


In [31]:
print(df.set_axis(['Row_1', 'Row_2', 'Row_3'], axis=0))

        A   B   C
Row_1  11  12  13
Row_2  21  22  23
Row_3  31  32  33


In [32]:
print(df.set_axis(['Row_1', 'Row_2', 'Row_3'], axis='index'))

        A   B   C
Row_1  11  12  13
Row_2  21  22  23
Row_3  31  32  33


In [33]:
print(df.set_axis(['Col_1', 'Col_2', 'Col_3'], axis=1))

       Col_1  Col_2  Col_3
ONE       11     12     13
TWO       21     22     23
THREE     31     32     33


In [34]:
print(df.set_axis(['Col_1', 'Col_2', 'Col_3'], axis='columns'))

       Col_1  Col_2  Col_3
ONE       11     12     13
TWO       21     22     23
THREE     31     32     33


In [35]:
print(df.set_axis(['Row_1', 'Row_2', 'Row_3']))

        A   B   C
Row_1  11  12  13
Row_2  21  22  23
Row_3  31  32  33


In [41]:
nycflights = pd.read_csv("https://raw.githubusercontent.com/JackyP/testing/master/datasets/nycflights.csv",usecols=range(1,17))

In [42]:
nycflights

Unnamed: 0,year,month,day,dep_time,dep_delay,arr_time,arr_delay,carrier,tailnum,flight,origin,dest,air_time,distance,hour,minute
0,2013,1,1,517.0,2.0,830.0,11.0,UA,N14228,1545,EWR,IAH,227.0,1400,5.0,17.0
1,2013,1,1,533.0,4.0,850.0,20.0,UA,N24211,1714,LGA,IAH,227.0,1416,5.0,33.0
2,2013,1,1,542.0,2.0,923.0,33.0,AA,N619AA,1141,JFK,MIA,160.0,1089,5.0,42.0
3,2013,1,1,544.0,-1.0,1004.0,-18.0,B6,N804JB,725,JFK,BQN,183.0,1576,5.0,44.0
4,2013,1,1,554.0,-6.0,812.0,-25.0,DL,N668DN,461,LGA,ATL,116.0,762,5.0,54.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
336771,2013,9,30,,,,,9E,,3393,JFK,DCA,,213,,
336772,2013,9,30,,,,,9E,,3525,LGA,SYR,,198,,
336773,2013,9,30,,,,,MQ,N535MQ,3461,LGA,BNA,,764,,
336774,2013,9,30,,,,,MQ,N511MQ,3572,LGA,CLE,,419,,


In [43]:
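df = pd.read_csv("https://raw.githubusercontent.com/JackyP/testing/master/datasets/nycflights.csv")  # full file, keeps the unnamed index column
df.columns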

Index(['Unnamed: 0', 'year', 'month', 'day', 'dep_time', 'dep_delay',
       'arr_time', 'arr_delay', 'carrier', 'tailnum', 'flight', 'origin',
       'dest', 'air_time', 'distance', 'hour', 'minute'],
      dtype='object')

In [47]:
df.rename(columns={"year":'years'},inplace=True)

In [48]:
df

Unnamed: 0.1,Unnamed: 0,years,month,day,dep_time,dep_delay,arr_time,arr_delay,carrier,tailnum,flight,origin,dest,air_time,distance,hour,minute
0,1,2013,1,1,517.0,2.0,830.0,11.0,UA,N14228,1545,EWR,IAH,227.0,1400,5.0,17.0
1,2,2013,1,1,533.0,4.0,850.0,20.0,UA,N24211,1714,LGA,IAH,227.0,1416,5.0,33.0
2,3,2013,1,1,542.0,2.0,923.0,33.0,AA,N619AA,1141,JFK,MIA,160.0,1089,5.0,42.0
3,4,2013,1,1,544.0,-1.0,1004.0,-18.0,B6,N804JB,725,JFK,BQN,183.0,1576,5.0,44.0
4,5,2013,1,1,554.0,-6.0,812.0,-25.0,DL,N668DN,461,LGA,ATL,116.0,762,5.0,54.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
336771,336772,2013,9,30,,,,,9E,,3393,JFK,DCA,,213,,
336772,336773,2013,9,30,,,,,9E,,3525,LGA,SYR,,198,,
336773,336774,2013,9,30,,,,,MQ,N535MQ,3461,LGA,BNA,,764,,
336774,336775,2013,9,30,,,,,MQ,N511MQ,3572,LGA,CLE,,419,,


In [49]:
df.rename(columns={"year":'years','month':'months'},inplace=True)

In [50]:
df.columns.str.replace("_","-")

Index(['Unnamed: 0', 'years', 'months', 'day', 'dep-time', 'dep-delay',
       'arr-time', 'arr-delay', 'carrier', 'tailnum', 'flight', 'origin',
       'dest', 'air-time', 'distance', 'hour', 'minute'],
      dtype='object')

In [53]:
df.rename(columns={df.columns[0]:'col1'})

Unnamed: 0,col1,years,months,day,dep_time,dep_delay,arr_time,arr_delay,carrier,tailnum,flight,origin,dest,air_time,distance,hour,minute
0,1,2013,1,1,517.0,2.0,830.0,11.0,UA,N14228,1545,EWR,IAH,227.0,1400,5.0,17.0
1,2,2013,1,1,533.0,4.0,850.0,20.0,UA,N24211,1714,LGA,IAH,227.0,1416,5.0,33.0
2,3,2013,1,1,542.0,2.0,923.0,33.0,AA,N619AA,1141,JFK,MIA,160.0,1089,5.0,42.0
3,4,2013,1,1,544.0,-1.0,1004.0,-18.0,B6,N804JB,725,JFK,BQN,183.0,1576,5.0,44.0
4,5,2013,1,1,554.0,-6.0,812.0,-25.0,DL,N668DN,461,LGA,ATL,116.0,762,5.0,54.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
336771,336772,2013,9,30,,,,,9E,,3393,JFK,DCA,,213,,
336772,336773,2013,9,30,,,,,9E,,3525,LGA,SYR,,198,,
336773,336774,2013,9,30,,,,,MQ,N535MQ,3461,LGA,BNA,,764,,
336774,336775,2013,9,30,,,,,MQ,N511MQ,3572,LGA,CLE,,419,,


In [57]:
df.columns=["Col"+str(i) for i in range(1, 18)]
df.columns

Index(['Col1', 'Col2', 'Col3', 'Col4', 'Col5', 'Col6', 'Col7', 'Col8', 'Col9',
       'Col10', 'Col11', 'Col12', 'Col13', 'Col14', 'Col15', 'Col16', 'Col17'],
      dtype='object')

In [62]:
trend=pd.read_csv('SearchTrendData.csv')
trend.head()

Unnamed: 0,Week,eLearning,DataScience,MachineLearning,ArtificialIntelligence,DeepLearning
0,20/09/2015,32,9,11,11,4
1,27/09/2015,35,10,11,11,4
2,4/10/2015,38,10,12,11,4
3,11/10/2015,37,9,12,10,4
4,18/10/2015,38,9,12,10,4


In [64]:
trend.rename(columns={'MachineLearning':'ML','DeepLearning':'DL'})

Unnamed: 0,Week,eLearning,DataScience,ML,ArtificialIntelligence,DL
0,20/09/2015,32,9,11,11,4
1,27/09/2015,35,10,11,11,4
2,4/10/2015,38,10,12,11,4
3,11/10/2015,37,9,12,10,4
4,18/10/2015,38,9,12,10,4
...,...,...,...,...,...,...
256,16/08/2020,41,36,40,22,14
257,23/08/2020,49,39,40,21,14
258,30/08/2020,62,37,40,21,14
259,6/09/2020,74,38,42,24,14


In [65]:
trend.rename_axis(index='Date', columns='Topics')

Topics,Week,eLearning,DataScience,MachineLearning,ArtificialIntelligence,DeepLearning
Date,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
0,20/09/2015,32,9,11,11,4
1,27/09/2015,35,10,11,11,4
2,4/10/2015,38,10,12,11,4
3,11/10/2015,37,9,12,10,4
4,18/10/2015,38,9,12,10,4
...,...,...,...,...,...,...
256,16/08/2020,41,36,40,22,14
257,23/08/2020,49,39,40,21,14
258,30/08/2020,62,37,40,21,14
259,6/09/2020,74,38,42,24,14


In [66]:
trend.replace(to_replace=[4,14,21],value=[0,'what','replace'])

Unnamed: 0,Week,eLearning,DataScience,MachineLearning,ArtificialIntelligence,DeepLearning
0,20/09/2015,32,9,11,11,0
1,27/09/2015,35,10,11,11,0
2,4/10/2015,38,10,12,11,0
3,11/10/2015,37,9,12,10,0
4,18/10/2015,38,9,12,10,0
...,...,...,...,...,...,...
256,16/08/2020,41,36,40,22,what
257,23/08/2020,49,39,40,replace,what
258,30/08/2020,62,37,40,replace,what
259,6/09/2020,74,38,42,24,what
