
https://www.gormanalysis.com/blog/python-pandas-for-your-grandpa-3-4-dataframe-basic-operations/

In this section, we’ll go over some basic DataFrame operations: inserting columns, deleting columns, and modifying existing data.

Inserting new columns into an existing DataFrame is easy. For example, if you have a DataFrame like this

In [1]:
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'a': [2, 3, 11, 13],
    'b': ['fox', 'rabbit', 'hound', 'rabbit']
})
print(df)
##     a       b
## 0   2     fox
## 1   3  rabbit
## 2  11   hound
## 3  13  rabbit

    a       b
0   2     fox
1   3  rabbit
2  11   hound
3  13  rabbit


you can insert a new column, ‘c’, using df['c'] and setting it equal to a list, Series, NumPy array, or scalar. A list or array must have one value per row, while a scalar is repeated for every row.

In [2]:
df['c'] = [1, 0, 1, 2]
print(df)
##     a       b  c
## 0   2     fox  1
## 1   3  rabbit  0
## 2  11   hound  1
## 3  13  rabbit  2

    a       b  c
0   2     fox  1
1   3  rabbit  0
2  11   hound  1
3  13  rabbit  2
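
The list case is shown above; the other value types behave a little differently, so here is a minimal sketch of them. The column names from_array, from_series and too_short are made up for illustration. The main thing to notice is that a Series is matched up by index label rather than by position.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'a': [2, 3, 11, 13],
    'b': ['fox', 'rabbit', 'hound', 'rabbit']
})

# NumPy array: assigned by position, so it needs one value per row
df['from_array'] = np.array([10, 20, 30, 40])

# Series: aligned on index labels, not position.
# Row 0 gets 2.5, row 3 gets 1.5, and rows 1 and 2 become NaN.
df['from_series'] = pd.Series([1.5, 2.5], index=[3, 0])

# A list with the wrong number of values is rejected with a ValueError
length_error = None
try:
    df['too_short'] = [1, 2]
except ValueError as err:
    length_error = err

print(df)
print(length_error)
```

If you want a Series to be inserted by position instead, one option is to pass its underlying values, for example with to_numpy().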


Note that you can’t use dot notation to create a new column. Writing df.d = 1 sets an ordinary Python attribute on the DataFrame object instead of adding a column, so you have to use square brackets, like df['d'] = 1.

In [3]:
df['d'] = 1
print(df)
##     a       b  c  d
## 0   2     fox  1  1
## 1   3  rabbit  0  1
## 2  11   hound  1  1
## 3  13  rabbit  2  1

    a       b  c  d
0   2     fox  1  1
1   3  rabbit  0  1
2  11   hound  1  1
3  13  rabbit  2  1
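
To see the difference between the two notations for yourself, here is a small sketch on a fresh DataFrame. The variable has_d_after_dot is just a helper name for this example.

```python
import pandas as pd

df = pd.DataFrame({'a': [2, 3, 11, 13]})

# Dot notation: this only attaches an attribute to the object
df.d = 1
has_d_after_dot = 'd' in df.columns
print(has_d_after_dot)  # False, no column was created

# Square brackets: this actually inserts the column
df['d'] = 1
print('d' in df.columns)  # True
```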


You can also combine columns to create a new column. For example, you could create column ‘e’ as the sum of ‘a’ and ‘c’ like this

In [4]:
df['e'] = df.a + df.c
print(df)
##     a       b  c  d   e
## 0   2     fox  1  1   3
## 1   3  rabbit  0  1   3
## 2  11   hound  1  1  12
## 3  13  rabbit  2  1  15

    a       b  c  d   e
0   2     fox  1  1   3
1   3  rabbit  0  1   3
2  11   hound  1  1  12
3  13  rabbit  2  1  15
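
If you would rather not modify df in place, pandas also has DataFrame.assign(), which returns a new DataFrame with the extra column. This is a sketch of an alternative, not what the cell above does; with_e is a name chosen for the example.

```python
import pandas as pd

df = pd.DataFrame({
    'a': [2, 3, 11, 13],
    'c': [1, 0, 1, 2]
})

# assign() returns a copy with column 'e' added; df itself is left unchanged
with_e = df.assign(e=df['a'] + df['c'])

print(with_e)
print(list(df.columns))
```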


You can also create or update column values using boolean indexing. For example, we could update ‘d’ to equal 0 where ‘b’ is ‘rabbit’ by doing

In [5]:
df.loc[df.b == 'rabbit', 'd'] = 0
print(df)
##     a       b  c  d   e
## 0   2     fox  1  1   3
## 1   3  rabbit  0  0   3
## 2  11   hound  1  1  12
## 3  13  rabbit  2  0  15

    a       b  c  d   e
0   2     fox  1  1   3
1   3  rabbit  0  0   3
2  11   hound  1  1  12
3  13  rabbit  2  0  15
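
The example above updates a column that already exists. The same pattern can also create a column, as in this sketch, where the column name big and the threshold of 5 are made up for illustration. Rows that don't match the condition are left as NaN.

```python
import pandas as pd

df = pd.DataFrame({
    'a': [2, 3, 11, 13],
    'b': ['fox', 'rabbit', 'hound', 'rabbit']
})

# 'big' does not exist yet, so it is created;
# only rows where a > 5 (rows 2 and 3) receive a value
df.loc[df['a'] > 5, 'big'] = 'yes'

print(df)
```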


Deleting columns is also pretty straightforward. If you wanted to delete columns ‘a’ and ‘c’, just do

In [7]:
df.drop(columns=['a', 'c'], inplace=True)
print(df)
##         b  d   e
## 0     fox  1   3
## 1  rabbit  0   3
## 2   hound  1  12
## 3  rabbit  0  15

        b  d   e
0     fox  1   3
1  rabbit  0   3
2   hound  1  12
3  rabbit  0  15
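
One thing to watch for: because inplace=True changes df itself, running the same drop a second time raises a KeyError, since ‘a’ and ‘c’ are already gone. Here is a minimal sketch of two ways around that, using a smaller DataFrame and the made-up names trimmed and again. The errors='ignore' argument tells drop() to skip labels it can't find.

```python
import pandas as pd

df = pd.DataFrame({
    'a': [2, 3],
    'b': ['fox', 'rabbit'],
    'c': [1, 0]
})

# Without inplace=True, drop() returns a new DataFrame and leaves df untouched,
# so this line is safe to re-run
trimmed = df.drop(columns=['a', 'c'])

# Dropping columns that are already missing normally raises a KeyError;
# errors='ignore' skips any labels that aren't present
again = trimmed.drop(columns=['a', 'c'], errors='ignore')

print(list(df.columns))
print(list(trimmed.columns))
print(list(again.columns))
```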