# Add, change and delete data

For many data sets, you may want to perform a transformation based on the values in an array, a Series or a DataFrame column. As an example, we look at the first six Unicode characters:

In [1]:
import numpy as np
import pandas as pd

In [2]:
df = pd.DataFrame(
    {
        "Code": ["U+0000", "U+0001", "U+0002", "U+0003", "U+0004", "U+0005"],
        "Decimal": [0, 1, 2, 3, 4, 5],
        "Octal": ["001", "002", "003", "004", "004", "005"],
        "Key": ["NUL", "Ctrl-A", "Ctrl-B", "Ctrl-C", "Ctrl-D", "Ctrl-E"],
    }
)

df

Unnamed: 0,Code,Decimal,Octal,Key
0,U+0000,0,1,NUL
1,U+0001,1,2,Ctrl-A
2,U+0002,2,3,Ctrl-B
3,U+0003,3,4,Ctrl-C
4,U+0004,4,4,Ctrl-D
5,U+0005,5,5,Ctrl-E


## Add data

Suppose you want to add a column that assigns each character to the `C0` or `C1` set of control codes:

In [3]:
control_code = {
    "u+0000": "C0",
    "u+0001": "C0",
    "u+0002": "C0",
    "u+0003": "C0",
    "u+0004": "C0",
    "u+0005": "C0",
}

The `map` method of a Series accepts a function or a dict-like object that contains the mapping. Here we have a small problem: the keys in `control_code` are lower case, but the codes in our DataFrame are upper case. Therefore, we first need to convert each value to lower case using the `str.lower` method:

In [4]:
lowercased = df["Code"].str.lower()

lowercased

0    u+0000
1    u+0001
2    u+0002
3    u+0003
4    u+0004
5    u+0005
Name: Code, dtype: object

In [5]:
df["Control code"] = lowercased.map(control_code)

df

Unnamed: 0,Code,Decimal,Octal,Key,Control code
0,U+0000,0,1,NUL,C0
1,U+0001,1,2,Ctrl-A,C0
2,U+0002,2,3,Ctrl-B,C0
3,U+0003,3,4,Ctrl-C,C0
4,U+0004,4,4,Ctrl-D,C0
5,U+0005,5,5,Ctrl-E,C0


We could also have passed a function that does all the work:

In [6]:
df["Code"].map(lambda x: control_code[x.lower()])

0    C0
1    C0
2    C0
3    C0
4    C0
5    C0
Name: Code, dtype: object

Using `map` is a convenient way to perform element-wise transformations and other data cleaning operations.
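
The two variants differ in how they treat values that are missing from the mapping: with a dict, `map` returns `NaN` for them, whereas the lambda shown above raises a `KeyError`. The following is a minimal sketch of this behaviour; the code `U+0080` is deliberately left out of the mapping, and the `"unknown"` fallback label is only an assumption for illustration:

```python
import pandas as pd

codes = pd.Series(["U+0000", "U+0005", "U+0080"])

# "u+0080" is intentionally missing from the mapping
control_code = {"u+0000": "C0", "u+0005": "C0"}

# Passing a dict: values without a matching key become NaN
mapped = codes.str.lower().map(control_code)

# Passing a function that indexes the dict: a missing key raises KeyError
try:
    codes.map(lambda x: control_code[x.lower()])
except KeyError as err:
    missing = err.args[0]

# dict.get with a default avoids the exception (the label is hypothetical)
with_default = codes.map(lambda x: control_code.get(x.lower(), "unknown"))
```

Which behaviour is preferable depends on whether unmapped values should be silently marked as missing or reported as an error.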

## Change data

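The [replace](https://pandas.pydata.org/docs/reference/api/pandas.Series.replace.html) method can be used to replace certain values with others. By default, it only replaces values that match as a whole, so `"Man"` leaves `"Manpower"` unchanged; with `regex=True`, the pattern is also matched inside strings.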

In [7]:
s = pd.Series(["Manpower", "man-made", np.nan])

In [8]:
s.replace("Man", "Personal")

0    Manpower
1    man-made
2         NaN
dtype: object

In [9]:
s.replace("[Mm]an", "Personal", regex=True)

0    Personalpower
1    Personal-made
2              NaN
dtype: object

In [10]:
s.replace(["[Mm]an", np.nan], ["Personal", 0], regex=True)

0    Personalpower
1    Personal-made
2                0
dtype: object

In [11]:
s.replace(["[Mm]an", np.nan], ["Personal", len(s)], regex=True)

0    Personalpower
1    Personal-made
2                3
dtype: object
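
As a brief sketch of some related spellings: `replace` also accepts a dict that maps patterns to replacements, and for pure string substitution `Series.str.replace` can be used instead. The replacement word `"Workforce"` is only an illustrative choice:

```python
import numpy as np
import pandas as pd

s = pd.Series(["Manpower", "man-made", np.nan])

# Without regex=True, only values that match as a whole are replaced
exact = s.replace("Manpower", "Workforce")

# A dict maps each pattern to its replacement
by_dict = s.replace({"[Mm]an": "Personal"}, regex=True)

# The string accessor performs the same substitution and leaves NaN untouched
via_str = s.str.replace("[Mm]an", "Personal", regex=True)
```

The dict form can be easier to read than two parallel lists when several patterns are replaced at once.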

<div class="alert alert-block alert-info">
    
**See also:**

* [Managing missing data with pandas](../../clean-prep/nulls.ipynb)
</div>

## Delete data

Deleting one or more entries from an axis is easy if you already have an index array or a list without these entries.

To delete duplicates, see [Deduplicating data](../../clean-prep/deduplicate.ipynb).

Since building such an index may require a bit of set logic, the `drop` method offers a shortcut: it returns a new object without the specified values:

In [12]:
rng = np.random.default_rng()
s = pd.Series(rng.normal(size=7))

s

0   -1.166964
1    1.777849
2    1.382118
3    1.336918
4    0.757578
5   -0.530292
6    2.097554
dtype: float64

In [13]:
new = s.drop(2)

new

0   -1.166964
1    1.777849
3    1.336918
4    0.757578
5   -0.530292
6    2.097554
dtype: float64

In [14]:
new = s.drop([2, 3])

new

0   -1.166964
1    1.777849
4    0.757578
5   -0.530292
6    2.097554
dtype: float64
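
For comparison, the set-logic route mentioned above could look roughly like the following sketch. It uses fixed values instead of random numbers so that the result is reproducible, and it also shows the `errors="ignore"` option of `drop` for labels that do not exist:

```python
import pandas as pd

s = pd.Series([10.0, 11.0, 12.0, 13.0, 14.0])

# Set logic: build the index of labels to keep, then select them
keep = s.index.difference([2, 3])
via_loc = s.loc[keep]

# drop gives the same result in one step and leaves s unchanged
via_drop = s.drop([2, 3])

# Dropping a label that does not exist raises KeyError by default ...
try:
    s.drop(99)
    raised = False
except KeyError:
    raised = True

# ... unless errors="ignore" is passed
unchanged = s.drop(99, errors="ignore")
```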

With DataFrames, index values can be deleted on both axes. To illustrate this, we first create an example DataFrame:

In [15]:
data = {
    "Code": ["U+0000", "U+0001", "U+0002", "U+0003", "U+0004", "U+0005"],
    "Decimal": [0, 1, 2, 3, 4, 5],
    "Octal": ["001", "002", "003", "004", "004", "005"],
    "Key": ["NUL", "Ctrl-A", "Ctrl-B", "Ctrl-C", "Ctrl-D", "Ctrl-E"],
}

df = pd.DataFrame(data)

df

Unnamed: 0,Code,Decimal,Octal,Key
0,U+0000,0,1,NUL
1,U+0001,1,2,Ctrl-A
2,U+0002,2,3,Ctrl-B
3,U+0003,3,4,Ctrl-C
4,U+0004,4,4,Ctrl-D
5,U+0005,5,5,Ctrl-E


In [16]:
df.drop([0, 1])

Unnamed: 0,Code,Decimal,Octal,Key
2,U+0002,2,3,Ctrl-B
3,U+0003,3,4,Ctrl-C
4,U+0004,4,4,Ctrl-D
5,U+0005,5,5,Ctrl-E


You can also remove values from the columns by passing `axis=1` or `axis='columns'`:

In [17]:
df.drop("Decimal", axis=1)

Unnamed: 0,Code,Octal,Key
0,U+0000,1,NUL
1,U+0001,2,Ctrl-A
2,U+0002,3,Ctrl-B
3,U+0003,4,Ctrl-C
4,U+0004,4,Ctrl-D
5,U+0005,5,Ctrl-E
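
As an alternative to `axis`, `drop` also accepts the keywords `index` and `columns`, which name the axis explicitly. A minimal sketch with a shortened version of the example data:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Code": ["U+0000", "U+0001", "U+0002"],
        "Decimal": [0, 1, 2],
        "Key": ["NUL", "Ctrl-A", "Ctrl-B"],
    }
)

# These two calls are equivalent
by_axis = df.drop("Decimal", axis=1)
by_keyword = df.drop(columns="Decimal")

# Rows and columns can be removed in a single call
both = df.drop(index=[0], columns=["Decimal"])
```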


Many methods such as `drop` that change the size or shape of a Series or DataFrame can manipulate an object in place without returning a new object:

In [18]:
df.drop(0, inplace=True)

df

Unnamed: 0,Code,Decimal,Octal,Key
1,U+0001,1,2,Ctrl-A
2,U+0002,2,3,Ctrl-B
3,U+0003,3,4,Ctrl-C
4,U+0004,4,4,Ctrl-D
5,U+0005,5,5,Ctrl-E


<div class="alert alert-block alert-warning">

**Warning:**

Be careful with the `inplace` parameter, as the data will be irretrievably deleted from the original object.
</div>
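
One way to guard against such a loss is to keep a copy or to assign the result of `drop` to a new name instead of using `inplace=True`. The following sketch uses a small DataFrame and the hypothetical names `backup` and `smaller`:

```python
import pandas as pd

df = pd.DataFrame({"Decimal": [0, 1, 2]})

# Keep an independent copy before modifying df in place
backup = df.copy()

# With inplace=True, drop modifies df and returns None
result = df.drop(0, inplace=True)

# Without inplace, the original stays intact and a new object is returned
smaller = backup.drop(0)
```

After these steps, `df` and `smaller` contain the same rows, while `backup` still holds the complete data.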

<div class="alert alert-block alert-info">

**See also:**

* [Deduplicate data](../../clean-prep/deduplicate.ipynb)
</div>