# Replace, Fill, Error
Copyright (c) Microsoft Corporation. All rights reserved.<br>
Licensed under the MIT License.

You can use the methods in this notebook to change values in your dataset.

* <a href='#replace'>replace</a> - use this method to replace a value with another value. You can also use it to replace null with a value, or a value with null.
* <a href='#error'>error</a> - use this method to replace a value with an error.
* <a href='#fill_nulls'>fill_nulls</a> - this method lets you fill all nulls in a column with a certain value.
* <a href='#fill_errors'>fill_errors</a> - this method lets you fill all errors in a column with a certain value.

## Setup

In [1]:
import azureml.dataprep as dprep

In [2]:
dflow = dprep.read_csv('../data/crime-spring.csv')
dflow.head(5)

Unnamed: 0,ID,Case Number,Date,Block,IUCR,Primary Type,Description,Location Description,Arrest,Domestic,...,Ward,Community Area,FBI Code,X Coordinate,Y Coordinate,Year,Updated On,Latitude,Longitude,Location
0,10498554,HZ239907,4/15/2016 23:56,007XX E 111TH ST,1153,DECEPTIVE PRACTICE,FINANCIAL IDENTITY THEFT OVER $ 300,OTHER,False,False,...,9,50,11,1183356.0,1831503.0,2016,5/11/2016 15:48,41.69283384,-87.60431945,"(41.692833841, -87.60431945)"
1,10516598,HZ258664,4/15/2016 17:00,082XX S MARSHFIELD AVE,890,THEFT,FROM BUILDING,RESIDENCE,False,False,...,21,71,6,1166776.0,1850053.0,2016,5/12/2016 15:48,41.74410697,-87.66449429,"(41.744106973, -87.664494285)"
2,10519196,HZ261252,4/15/2016 10:00,104XX S SACRAMENTO AVE,1154,DECEPTIVE PRACTICE,FINANCIAL IDENTITY THEFT $300 AND UNDER,RESIDENCE,False,False,...,19,74,11,,,2016,5/12/2016 15:50,,,
3,10519591,HZ261534,4/15/2016 9:00,113XX S PRAIRIE AVE,1120,DECEPTIVE PRACTICE,FORGERY,RESIDENCE,False,False,...,9,49,10,,,2016,5/13/2016 15:51,,,
4,10534446,HZ277630,4/15/2016 10:00,055XX N KEDZIE AVE,890,THEFT,FROM BUILDING,"SCHOOL, PUBLIC, BUILDING",False,False,...,40,13,6,,,2016,5/25/2016 15:59,,,


In [3]:
dflow = dflow.to_datetime('Date', ['%m/%d/%Y %H:%M'])
dflow = dflow.to_number(['IUCR', 'District', 'FBI Code'])
dflow.head(5)

Unnamed: 0,ID,Case Number,Date,Block,IUCR,Primary Type,Description,Location Description,Arrest,Domestic,...,Ward,Community Area,FBI Code,X Coordinate,Y Coordinate,Year,Updated On,Latitude,Longitude,Location
0,10498554,HZ239907,2016-04-15 23:56:00,007XX E 111TH ST,1153.0,DECEPTIVE PRACTICE,FINANCIAL IDENTITY THEFT OVER $ 300,OTHER,False,False,...,9,50,11.0,1183356.0,1831503.0,2016,5/11/2016 15:48,41.69283384,-87.60431945,"(41.692833841, -87.60431945)"
1,10516598,HZ258664,2016-04-15 17:00:00,082XX S MARSHFIELD AVE,890.0,THEFT,FROM BUILDING,RESIDENCE,False,False,...,21,71,6.0,1166776.0,1850053.0,2016,5/12/2016 15:48,41.74410697,-87.66449429,"(41.744106973, -87.664494285)"
2,10519196,HZ261252,2016-04-15 10:00:00,104XX S SACRAMENTO AVE,1154.0,DECEPTIVE PRACTICE,FINANCIAL IDENTITY THEFT $300 AND UNDER,RESIDENCE,False,False,...,19,74,11.0,,,2016,5/12/2016 15:50,,,
3,10519591,HZ261534,2016-04-15 09:00:00,113XX S PRAIRIE AVE,1120.0,DECEPTIVE PRACTICE,FORGERY,RESIDENCE,False,False,...,9,49,10.0,,,2016,5/13/2016 15:51,,,
4,10534446,HZ277630,2016-04-15 10:00:00,055XX N KEDZIE AVE,890.0,THEFT,FROM BUILDING,"SCHOOL, PUBLIC, BUILDING",False,False,...,40,13,6.0,,,2016,5/25/2016 15:59,,,


## Replace <a id='replace'></a>

### String
Use `replace` to swap a string value with another string value.

In [4]:
dflow = dflow.replace('Primary Type', 'THEFT', 'STOLEN')
head = dflow.head(5)
head

Unnamed: 0,ID,Case Number,Date,Block,IUCR,Primary Type,Description,Location Description,Arrest,Domestic,...,Ward,Community Area,FBI Code,X Coordinate,Y Coordinate,Year,Updated On,Latitude,Longitude,Location
0,10498554,HZ239907,2016-04-15 23:56:00,007XX E 111TH ST,1153.0,DECEPTIVE PRACTICE,FINANCIAL IDENTITY THEFT OVER $ 300,OTHER,False,False,...,9,50,11.0,1183356.0,1831503.0,2016,5/11/2016 15:48,41.69283384,-87.60431945,"(41.692833841, -87.60431945)"
1,10516598,HZ258664,2016-04-15 17:00:00,082XX S MARSHFIELD AVE,890.0,STOLEN,FROM BUILDING,RESIDENCE,False,False,...,21,71,6.0,1166776.0,1850053.0,2016,5/12/2016 15:48,41.74410697,-87.66449429,"(41.744106973, -87.664494285)"
2,10519196,HZ261252,2016-04-15 10:00:00,104XX S SACRAMENTO AVE,1154.0,DECEPTIVE PRACTICE,FINANCIAL IDENTITY THEFT $300 AND UNDER,RESIDENCE,False,False,...,19,74,11.0,,,2016,5/12/2016 15:50,,,
3,10519591,HZ261534,2016-04-15 09:00:00,113XX S PRAIRIE AVE,1120.0,DECEPTIVE PRACTICE,FORGERY,RESIDENCE,False,False,...,9,49,10.0,,,2016,5/13/2016 15:51,,,
4,10534446,HZ277630,2016-04-15 10:00:00,055XX N KEDZIE AVE,890.0,STOLEN,FROM BUILDING,"SCHOOL, PUBLIC, BUILDING",False,False,...,40,13,6.0,,,2016,5/25/2016 15:59,,,


Use `replace` to remove a specific string value from the column by replacing it with null. Note that pandas displays null values as `None`.

In [5]:
dflow = dflow.replace('Primary Type', 'DECEPTIVE PRACTICE', None)
head = dflow.head(5)
head

Unnamed: 0,ID,Case Number,Date,Block,IUCR,Primary Type,Description,Location Description,Arrest,Domestic,...,Ward,Community Area,FBI Code,X Coordinate,Y Coordinate,Year,Updated On,Latitude,Longitude,Location
0,10498554,HZ239907,2016-04-15 23:56:00,007XX E 111TH ST,1153.0,,FINANCIAL IDENTITY THEFT OVER $ 300,OTHER,False,False,...,9,50,11.0,1183356.0,1831503.0,2016,5/11/2016 15:48,41.69283384,-87.60431945,"(41.692833841, -87.60431945)"
1,10516598,HZ258664,2016-04-15 17:00:00,082XX S MARSHFIELD AVE,890.0,STOLEN,FROM BUILDING,RESIDENCE,False,False,...,21,71,6.0,1166776.0,1850053.0,2016,5/12/2016 15:48,41.74410697,-87.66449429,"(41.744106973, -87.664494285)"
2,10519196,HZ261252,2016-04-15 10:00:00,104XX S SACRAMENTO AVE,1154.0,,FINANCIAL IDENTITY THEFT $300 AND UNDER,RESIDENCE,False,False,...,19,74,11.0,,,2016,5/12/2016 15:50,,,
3,10519591,HZ261534,2016-04-15 09:00:00,113XX S PRAIRIE AVE,1120.0,,FORGERY,RESIDENCE,False,False,...,9,49,10.0,,,2016,5/13/2016 15:51,,,
4,10534446,HZ277630,2016-04-15 10:00:00,055XX N KEDZIE AVE,890.0,STOLEN,FROM BUILDING,"SCHOOL, PUBLIC, BUILDING",False,False,...,40,13,6.0,,,2016,5/25/2016 15:59,,,


### Numeric
Use `replace` to swap a numeric value with another numeric value.

In [6]:
dflow = dflow.replace('District', 5, 1)
head = dflow.head(5)
head

Unnamed: 0,ID,Case Number,Date,Block,IUCR,Primary Type,Description,Location Description,Arrest,Domestic,...,Ward,Community Area,FBI Code,X Coordinate,Y Coordinate,Year,Updated On,Latitude,Longitude,Location
0,10498554,HZ239907,2016-04-15 23:56:00,007XX E 111TH ST,1153.0,,FINANCIAL IDENTITY THEFT OVER $ 300,OTHER,False,False,...,9,50,11.0,1183356.0,1831503.0,2016,5/11/2016 15:48,41.69283384,-87.60431945,"(41.692833841, -87.60431945)"
1,10516598,HZ258664,2016-04-15 17:00:00,082XX S MARSHFIELD AVE,890.0,STOLEN,FROM BUILDING,RESIDENCE,False,False,...,21,71,6.0,1166776.0,1850053.0,2016,5/12/2016 15:48,41.74410697,-87.66449429,"(41.744106973, -87.664494285)"
2,10519196,HZ261252,2016-04-15 10:00:00,104XX S SACRAMENTO AVE,1154.0,,FINANCIAL IDENTITY THEFT $300 AND UNDER,RESIDENCE,False,False,...,19,74,11.0,,,2016,5/12/2016 15:50,,,
3,10519591,HZ261534,2016-04-15 09:00:00,113XX S PRAIRIE AVE,1120.0,,FORGERY,RESIDENCE,False,False,...,9,49,10.0,,,2016,5/13/2016 15:51,,,
4,10534446,HZ277630,2016-04-15 10:00:00,055XX N KEDZIE AVE,890.0,STOLEN,FROM BUILDING,"SCHOOL, PUBLIC, BUILDING",False,False,...,40,13,6.0,,,2016,5/25/2016 15:59,,,


### Date
Use `replace` to swap in a new Date for an existing Date in the data.

In [7]:
from datetime import datetime, timezone
dflow = dflow.replace('Date',
                      datetime(2016, 4, 15, 9, 0, tzinfo=timezone.utc),
                      datetime(2018, 7, 4, 0, 0, tzinfo=timezone.utc))
head = dflow.head(5)
head

Unnamed: 0,ID,Case Number,Date,Block,IUCR,Primary Type,Description,Location Description,Arrest,Domestic,...,Ward,Community Area,FBI Code,X Coordinate,Y Coordinate,Year,Updated On,Latitude,Longitude,Location
0,10498554,HZ239907,2016-04-15 23:56:00,007XX E 111TH ST,1153.0,,FINANCIAL IDENTITY THEFT OVER $ 300,OTHER,False,False,...,9,50,11.0,1183356.0,1831503.0,2016,5/11/2016 15:48,41.69283384,-87.60431945,"(41.692833841, -87.60431945)"
1,10516598,HZ258664,2016-04-15 17:00:00,082XX S MARSHFIELD AVE,890.0,STOLEN,FROM BUILDING,RESIDENCE,False,False,...,21,71,6.0,1166776.0,1850053.0,2016,5/12/2016 15:48,41.74410697,-87.66449429,"(41.744106973, -87.664494285)"
2,10519196,HZ261252,2016-04-15 10:00:00,104XX S SACRAMENTO AVE,1154.0,,FINANCIAL IDENTITY THEFT $300 AND UNDER,RESIDENCE,False,False,...,19,74,11.0,,,2016,5/12/2016 15:50,,,
3,10519591,HZ261534,2018-07-04 00:00:00,113XX S PRAIRIE AVE,1120.0,,FORGERY,RESIDENCE,False,False,...,9,49,10.0,,,2016,5/13/2016 15:51,,,
4,10534446,HZ277630,2016-04-15 10:00:00,055XX N KEDZIE AVE,890.0,STOLEN,FROM BUILDING,"SCHOOL, PUBLIC, BUILDING",False,False,...,40,13,6.0,,,2016,5/25/2016 15:59,,,
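
Because the `Date` column was converted with `to_datetime` during setup, the cell above passes `datetime` objects rather than strings. The standard-library sketch below shows how the raw text `4/15/2016 9:00` from row 3 parses, under the same format string, to the value used as the first `datetime` argument. Treating the parsed value as UTC is an assumption made here to mirror the `tzinfo=timezone.utc` in the cell above; the sketch does not call `azureml.dataprep` and says nothing about how the library handles time zones internally.

```python
from datetime import datetime, timezone

# Raw text of the Date cell in row 3, as shown in the first preview.
raw = '4/15/2016 9:00'

# Same format string that the setup cell passes to to_datetime.
parsed = datetime.strptime(raw, '%m/%d/%Y %H:%M')

# Assumption for this sketch: interpret the parsed value as UTC.
parsed_utc = parsed.replace(tzinfo=timezone.utc)

# The value to find, written exactly as in the replace call.
find = datetime(2016, 4, 15, 9, 0, tzinfo=timezone.utc)

print(parsed_utc == find)  # True
print(parsed == find)      # False: a naive datetime never equals an aware one
```

In plain Python, a naive `datetime` and a timezone-aware one never compare equal, which is one reason to be explicit about `tzinfo` when building comparison values.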


## Error <a id='error'></a>

The `error` method replaces matching values with Error values. Pass it the column, the value to find, and the error code to attach to each Error it creates.

In [8]:
dflow = dflow.error('IUCR', 890, 'Invalid value')
head = dflow.head(5)
head

Unnamed: 0,ID,Case Number,Date,Block,IUCR,Primary Type,Description,Location Description,Arrest,Domestic,...,Ward,Community Area,FBI Code,X Coordinate,Y Coordinate,Year,Updated On,Latitude,Longitude,Location
0,10498554,HZ239907,2016-04-15 23:56:00,007XX E 111TH ST,1153,,FINANCIAL IDENTITY THEFT OVER $ 300,OTHER,False,False,...,9,50,11.0,1183356.0,1831503.0,2016,5/11/2016 15:48,41.69283384,-87.60431945,"(41.692833841, -87.60431945)"
1,10516598,HZ258664,2016-04-15 17:00:00,082XX S MARSHFIELD AVE,"azureml.dataprep.native.DataPrepError(""'Invali...",STOLEN,FROM BUILDING,RESIDENCE,False,False,...,21,71,6.0,1166776.0,1850053.0,2016,5/12/2016 15:48,41.74410697,-87.66449429,"(41.744106973, -87.664494285)"
2,10519196,HZ261252,2016-04-15 10:00:00,104XX S SACRAMENTO AVE,1154,,FINANCIAL IDENTITY THEFT $300 AND UNDER,RESIDENCE,False,False,...,19,74,11.0,,,2016,5/12/2016 15:50,,,
3,10519591,HZ261534,2018-07-04 00:00:00,113XX S PRAIRIE AVE,1120,,FORGERY,RESIDENCE,False,False,...,9,49,10.0,,,2016,5/13/2016 15:51,,,
4,10534446,HZ277630,2016-04-15 10:00:00,055XX N KEDZIE AVE,"azureml.dataprep.native.DataPrepError(""'Invali...",STOLEN,FROM BUILDING,"SCHOOL, PUBLIC, BUILDING",False,False,...,40,13,6.0,,,2016,5/25/2016 15:59,,,


## Fill Nulls <a id='fill_nulls'></a>

Use the `fill_nulls` method to replace all null values in the specified columns with another value. This is similar to the pandas `fillna()` method.

In [9]:
dflow = dflow.fill_nulls('Primary Type', 'N/A')
head = dflow.head(5)
head

Unnamed: 0,ID,Case Number,Date,Block,IUCR,Primary Type,Description,Location Description,Arrest,Domestic,...,Ward,Community Area,FBI Code,X Coordinate,Y Coordinate,Year,Updated On,Latitude,Longitude,Location
0,10498554,HZ239907,2016-04-15 23:56:00,007XX E 111TH ST,1153,N/A,FINANCIAL IDENTITY THEFT OVER $ 300,OTHER,False,False,...,9,50,11.0,1183356.0,1831503.0,2016,5/11/2016 15:48,41.69283384,-87.60431945,"(41.692833841, -87.60431945)"
1,10516598,HZ258664,2016-04-15 17:00:00,082XX S MARSHFIELD AVE,"azureml.dataprep.native.DataPrepError(""'Invali...",STOLEN,FROM BUILDING,RESIDENCE,False,False,...,21,71,6.0,1166776.0,1850053.0,2016,5/12/2016 15:48,41.74410697,-87.66449429,"(41.744106973, -87.664494285)"
2,10519196,HZ261252,2016-04-15 10:00:00,104XX S SACRAMENTO AVE,1154,N/A,FINANCIAL IDENTITY THEFT $300 AND UNDER,RESIDENCE,False,False,...,19,74,11.0,,,2016,5/12/2016 15:50,,,
3,10519591,HZ261534,2018-07-04 00:00:00,113XX S PRAIRIE AVE,1120,N/A,FORGERY,RESIDENCE,False,False,...,9,49,10.0,,,2016,5/13/2016 15:51,,,
4,10534446,HZ277630,2016-04-15 10:00:00,055XX N KEDZIE AVE,"azureml.dataprep.native.DataPrepError(""'Invali...",STOLEN,FROM BUILDING,"SCHOOL, PUBLIC, BUILDING",False,False,...,40,13,6.0,,,2016,5/25/2016 15:59,,,
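
For comparison, the equivalent operation in pandas might look like the sketch below. The three-row frame is hypothetical: it borrows the `ID` and `Primary Type` values of the first three rows as they stood before the fill, and it runs entirely in pandas without touching the `dflow` object.

```python
import pandas as pd

# Hypothetical three-row frame standing in for the first rows of the dataset.
df = pd.DataFrame({
    'ID': [10498554, 10516598, 10519196],
    'Primary Type': [None, 'STOLEN', None],
})

# fillna returns a new Series, so assign it back to keep the result.
df['Primary Type'] = df['Primary Type'].fillna('N/A')

print(df['Primary Type'].tolist())  # ['N/A', 'STOLEN', 'N/A']
```

As in the `fill_nulls` cell above, where the result is assigned back to `dflow`, the filled values only persist here because the returned Series is assigned back to the column.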


## Fill Errors <a id='fill_errors'></a>

Use the `fill_errors` method to replace all error values in columns with another value.

In [10]:
dflow = dflow.fill_errors('IUCR', -1)
head = dflow.head(5)
head

Unnamed: 0,ID,Case Number,Date,Block,IUCR,Primary Type,Description,Location Description,Arrest,Domestic,...,Ward,Community Area,FBI Code,X Coordinate,Y Coordinate,Year,Updated On,Latitude,Longitude,Location
0,10498554,HZ239907,2016-04-15 23:56:00,007XX E 111TH ST,1153.0,N/A,FINANCIAL IDENTITY THEFT OVER $ 300,OTHER,False,False,...,9,50,11.0,1183356.0,1831503.0,2016,5/11/2016 15:48,41.69283384,-87.60431945,"(41.692833841, -87.60431945)"
1,10516598,HZ258664,2016-04-15 17:00:00,082XX S MARSHFIELD AVE,-1.0,STOLEN,FROM BUILDING,RESIDENCE,False,False,...,21,71,6.0,1166776.0,1850053.0,2016,5/12/2016 15:48,41.74410697,-87.66449429,"(41.744106973, -87.664494285)"
2,10519196,HZ261252,2016-04-15 10:00:00,104XX S SACRAMENTO AVE,1154.0,N/A,FINANCIAL IDENTITY THEFT $300 AND UNDER,RESIDENCE,False,False,...,19,74,11.0,,,2016,5/12/2016 15:50,,,
3,10519591,HZ261534,2018-07-04 00:00:00,113XX S PRAIRIE AVE,1120.0,N/A,FORGERY,RESIDENCE,False,False,...,9,49,10.0,,,2016,5/13/2016 15:51,,,
4,10534446,HZ277630,2016-04-15 10:00:00,055XX N KEDZIE AVE,-1.0,STOLEN,FROM BUILDING,"SCHOOL, PUBLIC, BUILDING",False,False,...,40,13,6.0,,,2016,5/25/2016 15:59,,,
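
The round trip from `error` to `fill_errors` can be pictured with a small plain-Python model. This is only a conceptual sketch: `ErrorValue`, `mark_errors`, and `fill_error_values` are hypothetical names invented for illustration, not part of `azureml.dataprep`, and the `None` entry in the sample list is added to show that a null is a different thing from an error.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ErrorValue:
    """Hypothetical stand-in for a cell-level error; not the library's DataPrepError."""
    code: str
    original: object


def mark_errors(values, find, code):
    """Replace every value equal to `find` with an ErrorValue carrying `code`."""
    return [ErrorValue(code, v) if v == find else v for v in values]


def fill_error_values(values, fill):
    """Replace every ErrorValue with `fill`; nulls (None) are left untouched."""
    return [fill if isinstance(v, ErrorValue) else v for v in values]


# Sample IUCR-like column; the None entry is hypothetical.
iucr = [1153.0, 890.0, None, 1120.0, 890.0]

marked = mark_errors(iucr, 890, 'Invalid value')
filled = fill_error_values(marked, -1)

print(filled)  # [1153.0, -1, None, 1120.0, -1]
```

In this model the null survives both steps, which matches the split in the notebook between `fill_nulls` for nulls and `fill_errors` for errors.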
