### Wrong Data
"Wrong data" does not have to be an empty cell or a wrongly formatted value; it can simply be incorrect, for example if someone registered "199" instead of "1.99".

Sometimes you can spot wrong data by looking at the data set, because you have an expectation of what it should be.

If you take a look at our data set, you can see that in row 7, the duration is 450, but for all the other rows the duration is between 30 and 60.

It doesn't have to be wrong, but taking into consideration that this is the data set of someone's workout sessions, we can conclude that this person did not work out for 450 minutes.
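
The same kind of check can be done in code instead of by eye. The following is a minimal sketch, not part of the original example: it uses a small inline sample in place of `data.csv`, and the bounds `LOW` and `HIGH` are assumed values for what a plausible workout duration looks like.

```python
import pandas as pd

# Small inline sample standing in for data.csv (assumption for illustration).
df = pd.DataFrame({
    "Duration": [60, 60, 60, 45, 45, 60, 60, 450, 30, 60],
    "Pulse": [110, 117, 103, 109, 117, 102, 110, 104, 109, 98],
})

# Assumed plausible range for a workout, in minutes.
LOW, HIGH = 0, 120

# between() is inclusive on both ends by default; ~ inverts the mask,
# so we keep only the rows that fall outside the expected range.
suspicious = df[~df["Duration"].between(LOW, HIGH)]
print(suspicious)
```

With this sample, only the row with a duration of 450 is flagged. Whether a flagged value is really wrong still has to be judged from what you know about the data.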

### Replacing Values
One way to fix wrong values is to replace them with something else.

In our example, it is most likely a typo, and the value should be "45" instead of "450", so we could just insert "45" in row 7:

In [1]:
import pandas as pd
df = pd.read_csv('data.csv')

# Set "Duration" = 45 in row 7:
df.loc[7, 'Duration'] = 45
print(df.to_string())

    Duration        Date  Pulse  Maxpulse  Calories
0         60   12/1/2020    110       130     409.1
1         60   12/2/2020    117       145     479.0
2         60   12/3/2020    103       135     340.0
3         45   12/4/2020    109       175     282.4
4         45   12/5/2020    117       148     406.0
5         60   12/6/2020    102       127     300.0
6         60   12/7/2020    110       136     374.0
7         45   12/8/2020    104       134     253.3
8         30   12/9/2020    109       133     195.1
9         60  12/10/2020     98       124     269.0
10        60  12/11/2020    103       147     329.3
11        60  12/12/2020    100       120     250.7
12        60  12/12/2020    100       120     250.7
13        60  12/13/2020    106       128     345.3
14        60  12/14/2020    104       132     379.3
15        60  12/15/2020     98       123     275.0
16        60  12/16/2020     98       120     215.2
17        60  12/17/2020    100       120     300.0
18        45

For small data sets you might be able to replace the wrong data one by one, but not for big data sets.

To replace wrong data for larger data sets you can create some rules, e.g. set some boundaries for legal values, and replace any values that are outside of the boundaries.

In [2]:
# Loop through all values in the "Duration" column.
# If the value is higher than 50, set it to 50:
for x in df.index:
    if df.loc[x,'Duration'] > 50:
        df.loc[x,'Duration'] = 50
        
print(df.to_string())

    Duration        Date  Pulse  Maxpulse  Calories
0         50   12/1/2020    110       130     409.1
1         50   12/2/2020    117       145     479.0
2         50   12/3/2020    103       135     340.0
3         45   12/4/2020    109       175     282.4
4         45   12/5/2020    117       148     406.0
5         50   12/6/2020    102       127     300.0
6         50   12/7/2020    110       136     374.0
7         45   12/8/2020    104       134     253.3
8         30   12/9/2020    109       133     195.1
9         50  12/10/2020     98       124     269.0
10        50  12/11/2020    103       147     329.3
11        50  12/12/2020    100       120     250.7
12        50  12/12/2020    100       120     250.7
13        50  12/13/2020    106       128     345.3
14        50  12/14/2020    104       132     379.3
15        50  12/15/2020     98       123     275.0
16        50  12/16/2020     98       120     215.2
17        50  12/17/2020    100       120     300.0
18        45
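
The loop above works, but pandas can apply the same rule to the whole column at once. The sketch below shows two equivalent ways to do that, assuming the same upper boundary of 50. The inline sample data and the names `capped` and `clipped` are illustrative only.

```python
import pandas as pd

# Small inline sample standing in for data.csv (assumption for illustration).
df = pd.DataFrame({"Duration": [60, 45, 450, 30]})

# Option 1: select the offending rows with a boolean mask and overwrite them.
capped = df.copy()
capped.loc[capped["Duration"] > 50, "Duration"] = 50

# Option 2: clip() limits every value to the given upper boundary.
clipped = df["Duration"].clip(upper=50)

print(capped)
print(clipped)
```

Both options leave values at or below the boundary untouched. On large data sets they are usually much faster than a Python-level loop over `df.index`.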

### Removing Rows
Another way of handling wrong data is to remove the rows that contain wrong data.

This way you do not have to find out what to replace them with, and there is a good chance you do not need them to do your analyses.

In [3]:
# Delete rows where "Duration" is higher than 120:
for x in df.index:
    if df.loc[x, 'Duration'] > 120:
        df.drop(x, inplace=True)
print(df.to_string())

    Duration        Date  Pulse  Maxpulse  Calories
0         50   12/1/2020    110       130     409.1
1         50   12/2/2020    117       145     479.0
2         50   12/3/2020    103       135     340.0
3         45   12/4/2020    109       175     282.4
4         45   12/5/2020    117       148     406.0
5         50   12/6/2020    102       127     300.0
6         50   12/7/2020    110       136     374.0
7         45   12/8/2020    104       134     253.3
8         30   12/9/2020    109       133     195.1
9         50  12/10/2020     98       124     269.0
10        50  12/11/2020    103       147     329.3
11        50  12/12/2020    100       120     250.7
12        50  12/12/2020    100       120     250.7
13        50  12/13/2020    106       128     345.3
14        50  12/14/2020    104       132     379.3
15        50  12/15/2020     98       123     275.0
16        50  12/16/2020     98       120     215.2
17        50  12/17/2020    100       120     300.0
18        45
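
Rows can also be removed without a loop by keeping only the rows that pass the rule. This is a minimal sketch using an inline sample in place of `data.csv`. The names `cleaned` and `cleaned_reset` are illustrative, and the limit of 120 is the same boundary used in the loop above.

```python
import pandas as pd

# Small inline sample standing in for data.csv (assumption for illustration).
df = pd.DataFrame({
    "Duration": [60, 45, 450, 30],
    "Pulse": [110, 109, 104, 109],
})

# Keep only the rows where "Duration" is at most 120.
# Note: a missing (NaN) Duration compares as False here, so such rows
# would be dropped as well, unlike in the loop version.
cleaned = df[df["Duration"] <= 120]

# Optionally renumber the rows so the index has no gaps.
cleaned_reset = cleaned.reset_index(drop=True)

print(cleaned)
print(cleaned_reset)
```

The filter returns a new DataFrame and leaves `df` unchanged, which makes it easy to compare the data before and after the cleanup.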