# Pandas rename columns

This is a notebook for the Medium article [Pandas rename columns](https://bindichen.medium.com/renaming-columns-in-a-pandas-dataframe-1d909360ddc6).

Please check out the article for instructions.

**License**: [BSD 2-Clause](https://opensource.org/licenses/BSD-2-Clause)

In [1]:
import pandas as pd

In [2]:
def load_data(): 
    df_all = pd.read_csv('data/titanic/train.csv')
    # Take a subset
    return df_all.loc[:, ['PassengerId', 'Pclass', 'Name', 'Sex']]


In [3]:
df = load_data()

In [4]:
df.head()

Unnamed: 0,PassengerId,Pclass,Name,Sex
0,493,1,"Molson, Mr. Harry Markland",male
1,53,1,"Harper, Mrs. Henry Sleeper (Myna Haxtun)",female
2,388,2,"Buss, Miss. Kate",female
3,192,2,"Carbines, Mr. William",male
4,687,3,"Panula, Mr. Jaako Arnold",male


## 1. Passing a list of names to the `columns` attribute

In [6]:
# Access column names
df.columns

Index(['PassengerId', 'Pclass', 'Name', 'Sex'], dtype='object')

In [7]:
# Rename columns
df.columns = ['Id', 'Class', 'Name', 'Sex']
df.head()

Unnamed: 0,Id,Class,Name,Sex
0,493,1,"Molson, Mr. Harry Markland",male
1,53,1,"Harper, Mrs. Henry Sleeper (Myna Haxtun)",female
2,388,2,"Buss, Miss. Kate",female
3,192,2,"Carbines, Mr. William",male
4,687,3,"Panula, Mr. Jaako Arnold",male


A disadvantage of this approach is that we must provide names for all columns, even if we only want to rename some of them.

In [36]:
# Raises ValueError: Length mismatch, because only 2 names are given for 4 columns
df.columns = ['Id', 'Class']

ValueError: Length mismatch: Expected axis has 4 elements, new values have 2 elements
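
One way to keep assigning to `columns` while renaming only a few names is to build the full list from the existing names. The following is a minimal sketch, assuming a small made-up DataFrame in place of the Titanic subset; the `mapping` dict is a hypothetical helper, not part of the original notebook.

```python
import pandas as pd

# Small stand-in for the Titanic subset (illustrative values only)
df = pd.DataFrame({
    'PassengerId': [1, 2],
    'Pclass': [3, 1],
    'Name': ['A', 'B'],
    'Sex': ['male', 'female'],
})

# Hypothetical mapping holding only the names we want to change
mapping = {'PassengerId': 'Id', 'Pclass': 'Class'}

# Names missing from the mapping are kept as-is, so the list length always matches
df.columns = [mapping.get(col, col) for col in df.columns]
print(df.columns.tolist())
```

This avoids the length mismatch, although `rename()` with a dictionary is usually the more direct tool for the same job.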

## 2. Using the `rename()` method

### 2.1 Rename columns using a dictionary mapping

In [8]:
df = load_data()

In [9]:
# Rename columns
#   PassengerId  ->  Id
#   Pclass       ->  Class
df.rename(
    columns={'PassengerId': 'Id', 'Pclass': 'Class'},
    inplace=True,
)

In [10]:
df.head()

Unnamed: 0,Id,Class,Name,Sex
0,493,1,"Molson, Mr. Harry Markland",male
1,53,1,"Harper, Mrs. Henry Sleeper (Myna Haxtun)",female
2,388,2,"Buss, Miss. Kate",female
3,192,2,"Carbines, Mr. William",male
4,687,3,"Panula, Mr. Jaako Arnold",male
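
Two details of dictionary-based renaming are worth seeing in isolation: without `inplace=True` the call returns a new DataFrame, and keys that match no column are ignored unless `errors='raise'` is passed. The sketch below uses a tiny made-up DataFrame, and the misspelled key `'Passenger_Id'` is deliberate.

```python
import pandas as pd

# Illustrative data, not the Titanic file
df = pd.DataFrame({'PassengerId': [1, 2], 'Pclass': [3, 1]})

# Without inplace=True, rename() returns a new DataFrame and leaves df untouched
renamed = df.rename(columns={'PassengerId': 'Id', 'Pclass': 'Class'})
print(df.columns.tolist())
print(renamed.columns.tolist())

# By default, a key that matches no column is silently ignored
unchanged = df.rename(columns={'Passenger_Id': 'Id'})
print(unchanged.columns.tolist())

# errors='raise' turns the same typo into a KeyError
try:
    df.rename(columns={'Passenger_Id': 'Id'}, errors='raise')
    typo_detected = False
except KeyError:
    typo_detected = True
print(typo_detected)
```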


### 2.2 Rename columns using a function

In [11]:
df = load_data()

In [12]:
# Pass the `str.lower` function to lowercase column names
df.rename(columns=str.lower).head()

Unnamed: 0,passengerid,pclass,name,sex
0,493,1,"Molson, Mr. Harry Markland",male
1,53,1,"Harper, Mrs. Henry Sleeper (Myna Haxtun)",female
2,388,2,"Buss, Miss. Kate",female
3,192,2,"Carbines, Mr. William",male
4,687,3,"Panula, Mr. Jaako Arnold",male


In [13]:
# Custom function
def to_upper_case(string):
    return string.upper()

df.rename(columns=to_upper_case).head()

Unnamed: 0,PASSENGERID,PCLASS,NAME,SEX
0,493,1,"Molson, Mr. Harry Markland",male
1,53,1,"Harper, Mrs. Henry Sleeper (Myna Haxtun)",female
2,388,2,"Buss, Miss. Kate",female
3,192,2,"Carbines, Mr. William",male
4,687,3,"Panula, Mr. Jaako Arnold",male


In [14]:
# Use lambda expression
df.rename(columns=lambda s: s.upper()).head()

Unnamed: 0,PASSENGERID,PCLASS,NAME,SEX
0,493,1,"Molson, Mr. Harry Markland",male
1,53,1,"Harper, Mrs. Henry Sleeper (Myna Haxtun)",female
2,388,2,"Buss, Miss. Kate",female
3,192,2,"Carbines, Mr. William",male
4,687,3,"Panula, Mr. Jaako Arnold",male
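
Any callable that takes one name and returns one name can be passed to `columns`. As a further sketch, the hypothetical helper `to_snake_case` below (not part of the original notebook) converts CamelCase names to snake_case on a small made-up DataFrame.

```python
import re

import pandas as pd

def to_snake_case(name):
    # Insert an underscore wherever a lowercase letter or digit
    # is directly followed by an uppercase letter, then lowercase the result
    return re.sub(r'(?<=[a-z0-9])(?=[A-Z])', '_', name).lower()

# Illustrative data with the same column names as the Titanic subset
df = pd.DataFrame({
    'PassengerId': [1],
    'Pclass': [3],
    'Name': ['A'],
    'Sex': ['male'],
})

renamed = df.rename(columns=to_snake_case)
print(renamed.columns.tolist())
```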


### 2.3 Rename index

In [15]:
df = pd.DataFrame(
    { "ID": [1, 2], "City": ['London', 'Oxford']}, 
    index=['A1', 'A2'],
)
df

Unnamed: 0,ID,City
A1,1,London
A2,2,Oxford


In [16]:
# Use dict mapping
df.rename({ 'A1': 'B1', 'A2': 'B2'})

Unnamed: 0,ID,City
B1,1,London
B2,2,Oxford


In [17]:
# Use a function; axis=0 applies it to the index labels
df.rename(str.lower, axis=0)

Unnamed: 0,ID,City
a1,1,London
a2,2,Oxford
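
`rename()` also accepts `index=` and `columns=` keywords, so both axes can be renamed in one call. This is a minimal sketch on the same two-row DataFrame; the new labels `'B1'` and `'Town'` are arbitrary choices for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {"ID": [1, 2], "City": ['London', 'Oxford']},
    index=['A1', 'A2'],
)

# Rename one index label and one column in a single call;
# labels missing from the mappings ('A2', 'ID') stay unchanged
renamed = df.rename(index={'A1': 'B1'}, columns={'City': 'Town'})
print(renamed)

# index=... is equivalent to passing the mapper with axis=0
same = df.rename({'A1': 'B1'}, axis=0)
print(same.index.tolist())
```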


## 3. Using `read_csv()` with the `names` argument

In [18]:
# Preview
df = pd.read_csv('data/titanic/train.csv')
df.head()

Unnamed: 0,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
0,493,0,1,"Molson, Mr. Harry Markland",male,55.0,0,0,113787,30.5,C30,S
1,53,1,1,"Harper, Mrs. Henry Sleeper (Myna Haxtun)",female,49.0,1,0,PC 17572,76.7292,D33,C
2,388,1,2,"Buss, Miss. Kate",female,36.0,0,0,27849,13.0,,S
3,192,0,2,"Carbines, Mr. William",male,19.0,0,0,28424,13.0,,S
4,687,0,3,"Panula, Mr. Jaako Arnold",male,14.0,4,1,3101295,39.6875,,S


In [19]:
new_names = ['ID', 'Survived', 'Class', 'Name', 'Sex']

df = pd.read_csv(
    'data/titanic/train.csv', 
    names=new_names,           # Rename columns
    header=0,                  # Treat the first row as a header and replace it with `names`
    usecols=[0,1,2,3,4],       # Read the first 5 columns
)
df.head()

Unnamed: 0,ID,Survived,Class,Name,Sex
0,493,0,1,"Molson, Mr. Harry Markland",male
1,53,1,1,"Harper, Mrs. Henry Sleeper (Myna Haxtun)",female
2,388,1,2,"Buss, Miss. Kate",female
3,192,0,2,"Carbines, Mr. William",male
4,687,0,3,"Panula, Mr. Jaako Arnold",male
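
The role of `header=0` is easier to see on a tiny in-memory file. The sketch below assumes a made-up three-column CSV held in a string (via `io.StringIO`) rather than `data/titanic/train.csv`, and compares reading it with and without `header=0`.

```python
import io

import pandas as pd

# Made-up CSV text standing in for the real file
csv_text = "PassengerId,Survived,Pclass\n1,0,3\n2,1,1\n"
new_names = ['ID', 'Survived', 'Class']

# header=0: the first line is treated as a header and replaced by new_names
replaced = pd.read_csv(io.StringIO(csv_text), names=new_names, header=0)
print(replaced)

# No header argument: the original header line is read as an ordinary data row
kept = pd.read_csv(io.StringIO(csv_text), names=new_names)
print(kept)
```

Without `header=0`, the old header ends up as the first data row, which also forces every column to be read as strings.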


## 4. Using `columns.str.replace`

In [20]:
df = pd.DataFrame(
    { "account id": [1, 2], "uk city": ['London', 'Oxford']}, 
    index=['A1', 'A2'],
)
df

Unnamed: 0,account id,uk city
A1,1,London
A2,2,Oxford


In [21]:
df.columns = df.columns.str.replace(' ', '_')

In [22]:
df.head()

Unnamed: 0,account_id,uk_city
A1,1,London
A2,2,Oxford
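
Because `df.columns.str` exposes the usual vectorised string methods, several clean-up steps can be chained. The following is a sketch under the assumption that the column names carry stray whitespace and mixed case; the names are invented for illustration.

```python
import pandas as pd

# Made-up messy column names
df = pd.DataFrame({" Account ID ": [1, 2], "UK  City": ['London', 'Oxford']})

df.columns = (
    df.columns
    .str.strip()                            # drop leading/trailing whitespace
    .str.lower()                            # lowercase everything
    .str.replace(r'\s+', '_', regex=True)   # collapse runs of whitespace into one underscore
)
print(df.columns.tolist())
```

Passing `regex=True` explicitly makes the intent clear, since the default for this parameter has differed between pandas versions.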


## 5. Renaming columns via `set_axis()`

In [24]:
df = load_data()
df.head()

Unnamed: 0,PassengerId,Pclass,Name,Sex
0,493,1,"Molson, Mr. Harry Markland",male
1,53,1,"Harper, Mrs. Henry Sleeper (Myna Haxtun)",female
2,388,2,"Buss, Miss. Kate",female
3,192,2,"Carbines, Mr. William",male
4,687,3,"Panula, Mr. Jaako Arnold",male


In [25]:
df.set_axis(
    ['Id', 'Class', 'Name', 'Sex'], 
    axis=1,
).head()

Unnamed: 0,Id,Class,Name,Sex
0,493,1,"Molson, Mr. Harry Markland",male
1,53,1,"Harper, Mrs. Henry Sleeper (Myna Haxtun)",female
2,388,2,"Buss, Miss. Kate",female
3,192,2,"Carbines, Mr. William",male
4,687,3,"Panula, Mr. Jaako Arnold",male
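
Since `set_axis()` returns a new DataFrame, it fits into a method chain, which direct assignment to `columns` does not. A minimal sketch on made-up data, where the filter on `Class` is only there to show a following step in the chain:

```python
import pandas as pd

# Illustrative data, not the Titanic file
df = pd.DataFrame({'PassengerId': [1, 2], 'Pclass': [3, 1]})

result = (
    df
    .set_axis(['Id', 'Class'], axis=1)      # rename all columns, returns a new DataFrame
    .loc[lambda d: d['Class'] == 1]         # later steps can use the new names
)
print(result)

# The original DataFrame keeps its column names
print(df.columns.tolist())
```

Like assigning to `columns`, `set_axis()` needs a name for every column.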
