Link to Medium blog post: https://towardsdatascience.com/how-to-add-a-new-column-to-an-existing-pandas-dataframe-310a8e7baf8f

First, let’s create an example DataFrame that we’ll reference throughout this guide to demonstrate a few concepts related to adding columns to pandas frames.

In [24]:
import pandas as pd

df = pd.DataFrame({
    'colA':[True, False, False], 
    'colB': [1, 2, 3],
})
print(df)

    colA  colB
0   True     1
1  False     2
2  False     3


Now let’s assume that we need to insert a new column called colC that should contain the values 'a', 'b' and 'c' for indices 0, 1 and 2 respectively.



In [25]:
s = pd.Series(['a', 'b', 'c'], index=[0, 1, 2])
print(s)

0    a
1    b
2    c
dtype: object


### Using simple assignment

The easiest way to insert a new column is to simply assign the values of your Series into the existing frame:

In [26]:
df['colC'] = s.values
print(df)

    colA  colB colC
0   True     1    a
1  False     2    b
2  False     3    c


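Note that assigning s.values works because the values are placed by position, so the index of the Series is ignored. If you assign the Series itself, pandas aligns it on the index: DataFrame rows whose labels are missing from the Series get NaN, and Series labels that do not exist in the DataFrame are dropped. For example,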



In [27]:
df['colC'] = pd.Series(['a', 'b', 'c'], index=[1, 2, 3])
print(df)

    colA  colB colC
0   True     1  NaN
1  False     2    a
2  False     3    b
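The two behaviours can be compared side by side. The following is a minimal sketch that uses a fresh frame named demo and a Series named shifted (both hypothetical names, chosen so that the running df is left untouched):

```python
import pandas as pd

# Hypothetical stand-ins for the guide's df and s
demo = pd.DataFrame({'colA': [True, False, False], 'colB': [1, 2, 3]})
shifted = pd.Series(['a', 'b', 'c'], index=[1, 2, 3])

# Assigning the Series aligns on index labels:
# label 0 has no match (NaN) and label 3 is silently dropped.
demo['aligned'] = shifted

# Assigning the raw values ignores the labels and fills by position.
demo['positional'] = shifted.values

print(demo)
```

If the lengths differ, the positional form raises a ValueError, whereas the aligned form quietly fills the gaps with NaN.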


### Using assign()

The pandas.DataFrame.assign() method can be used when you need to insert multiple new columns into a DataFrame or overwrite the values of existing columns. Like simple assignment, it aligns a Series on the index, so pass the underlying values (for example s.values) when you need to ignore the index of the column to be added.

The method will return a new DataFrame object (a copy) containing all the original columns in addition to new ones:

In [28]:
e = pd.Series([1.0, 3.0, 2.0], index=[0, 2, 1])
s = pd.Series(['a', 'b', 'c'], index=[0, 1, 2])
df.assign(colC=s.values, colB=e.values)

    colA  colB colC
0   True   1.0    a
1  False   3.0    b
2  False   2.0    c


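Always remember that with assign():

- a Series is aligned on the index of the DataFrame; the index of the column to be added is ignored only when you pass its raw values (e.g. s.values)
- all existing columns that are re-assigned will be overwritten in the returned copy, while the original DataFrame is left unchanged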
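How assign() treats the index can be checked directly. This is a small sketch under the assumption of a fresh three-row frame; the names demo, by_label and by_position are hypothetical:

```python
import pandas as pd

demo = pd.DataFrame({'colA': [True, False, False], 'colB': [1, 2, 3]})
e = pd.Series([1.0, 3.0, 2.0], index=[0, 2, 1])

by_label = demo.assign(colB=e)            # Series: aligned on index labels
by_position = demo.assign(colB=e.values)  # raw values: labels ignored

print(by_label['colB'].tolist())     # [1.0, 2.0, 3.0]
print(by_position['colB'].tolist())  # [1.0, 3.0, 2.0]
print(demo['colB'].tolist())         # [1, 2, 3] (the original is untouched)
```

Because assign() returns a copy, the result has to be bound to a name (or re-assigned to the original variable) for the change to persist.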

### Using insert()

Alternatively, you can use pandas.DataFrame.insert(). This method modifies the DataFrame in place and is useful when you need to insert a new column at a specific position.

For example, to add colD to the end of the DataFrame:

In [17]:
df.insert(len(df.columns), 'colD', s.values)
print(df)

    colA  colB colC colD
0   True     1  NaN    a
1  False     2    a    b
2  False     3    b    c


To insert colE between colA and colB:

In [19]:
df.insert(1, 'colE', s.values)
print(df)

    colA colE  colB colC colD
0   True    a     1  NaN    a
1  False    b     2    a    b
2  False    c     3    b    c
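Hard-coding the position works for small frames, but the position can also be looked up by column name. A minimal sketch, assuming unique column names and a fresh frame called demo (a hypothetical name):

```python
import pandas as pd

demo = pd.DataFrame({'colA': [True, False, False], 'colB': [1, 2, 3]})

# Find the integer position of colB, then insert the new column just before it.
# get_loc returns a plain integer only when the column name is unique.
position = demo.columns.get_loc('colB')
demo.insert(position, 'colE', ['a', 'b', 'c'])

print(demo.columns.tolist())  # ['colA', 'colE', 'colB']
```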


Additionally, insert() can even be used to add a duplicate column name. By default, a ValueError is raised when a column already exists in the DataFrame:

In [20]:
df.insert(1, 'colC', s.values)

ValueError: cannot insert colC, already exists

However, if you pass allow_duplicates=True to the insert() method, the DataFrame will have two columns with the same name:

In [22]:
df.insert(1, 'colC', s.values, allow_duplicates=True)
print(df)

    colA colC colE  colB colC colD
0   True    a    a     1  NaN    a
1  False    b    b     2    a    b
2  False    c    c     3    b    c
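Duplicate names change how column selection behaves, which is worth checking before relying on them. The sketch below uses a hypothetical frame named demo to show what selecting a duplicated name returns:

```python
import pandas as pd

demo = pd.DataFrame({'colA': [True, False, False], 'colC': ['x', 'y', 'z']})
demo.insert(1, 'colC', ['a', 'b', 'c'], allow_duplicates=True)

# Selecting a duplicated name returns a DataFrame with every matching column,
# not a single Series.
selected = demo['colC']
print(type(selected).__name__, selected.shape)  # DataFrame (3, 2)

# Positional selection still picks out exactly one column.
first_colC = demo.iloc[:, 1]
print(first_colC.tolist())  # ['a', 'b', 'c']
```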


### Using concat()

Finally, the pandas.concat() function can also be used to concatenate a new column to a DataFrame by passing axis=1. It returns a new DataFrame which is the result of the concatenation.

In [29]:
df = pd.concat([df, s.rename('colC')], axis=1)
print(df)

    colA  colB colC colC
0   True     1  NaN    a
1  False     2    a    b
2  False     3    b    c


The above operation concatenates the Series with the original DataFrame by aligning on the index. In most cases, you should use concat() only if the indices of the objects to be concatenated match. If the indices don’t match, the result contains the union of all indices, with NaN wherever an object has no value for a label:

In [30]:
s = pd.Series(['a', 'b', 'c'], index=[10, 20, 30])
df = pd.concat([df, s.rename('colC')], axis=1)
print(df)


     colA  colB colC colC colC
0    True   1.0  NaN    a  NaN
1   False   2.0    a    b  NaN
2   False   3.0    b    c  NaN
10    NaN   NaN  NaN  NaN    a
20    NaN   NaN  NaN  NaN    b
30    NaN   NaN  NaN  NaN    c
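The union behaviour comes from the default join='outer' of concat(). As a sketch, assuming a fresh frame named demo and a partially overlapping Series named partial (both hypothetical names), join='inner' keeps only the labels shared by every object:

```python
import pandas as pd

demo = pd.DataFrame({'colA': [True, False, False], 'colB': [1, 2, 3]})
partial = pd.Series(['a', 'b', 'c'], index=[1, 2, 3], name='colC')

outer = pd.concat([demo, partial], axis=1)                # default: union of labels
inner = pd.concat([demo, partial], axis=1, join='inner')  # shared labels only

print(outer.index.tolist())  # [0, 1, 2, 3]
print(inner.index.tolist())  # [1, 2]
```

Neither variant is a substitute for matching indices: the outer join introduces NaN values and the inner join drops rows.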


### Changing the index of the column to be added

One of the trickiest parts of adding new columns to DataFrames is the index. You should be careful, as each of the methods we discussed in this guide may handle indices in a different way.

If the index of the new column has no special meaning and you don’t want it to be taken into account when the column is inserted, you can set the index of the Series to be the same as the index of the DataFrame.
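When the Series already exists, its labels can be replaced rather than rebuilding it from scratch. A minimal sketch using Series.set_axis(), assuming the Series and the frame have the same length; the names demo and letters are hypothetical:

```python
import pandas as pd

demo = pd.DataFrame({'colA': [True, False, False]}, index=[10, 20, 30])
letters = pd.Series(['a', 'b', 'c'])  # default index 0, 1, 2

# Re-label the Series with the frame's index so that plain assignment lines up.
# set_axis returns a new Series and raises if the lengths differ.
demo['colC'] = letters.set_axis(demo.index)

print(demo)
```

Without the re-labelling step, the assignment would align on labels 0, 1 and 2, none of which exist in demo, and colC would contain only NaN.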

In [32]:
# The number of values must equal len(df.index), i.e. df must have three rows here
s = pd.Series(['a', 'b', 'c'], index=df.index)