# Add Column using Expression

DataPrep can add a new column to a dataflow by evaluating an expression over existing columns. The examples here add two new columns to the input data, each derived from the _Case Number_ column.

In [1]:
import azureml.dataprep as dprep

In [2]:
# Load the sample crime data and preview the first three rows
dataflow = dprep.read_csv(path=r'data\crime0-10.csv')
dataflow.head(3)

Unnamed: 0,ID,Case Number,Date,Block,IUCR,Primary Type,Description,Location Description,Arrest,Domestic,...,Ward,Community Area,FBI Code,X Coordinate,Y Coordinate,Year,Updated On,Latitude,Longitude,Location
0,10140490,HY329907,07/05/2015 11:50:00 PM,050XX N NEWLAND AVE,820,THEFT,$500 AND UNDER,STREET,False,False,...,41,10,06,1129230.0,1933315.0,2015,07/12/2015 12:42:46 PM,41.973309466,-87.800174996,"(41.973309466, -87.800174996)"
1,10139776,HY329265,07/05/2015 11:30:00 PM,011XX W MORSE AVE,460,BATTERY,SIMPLE,STREET,False,True,...,49,1,08B,1167370.0,1946271.0,2015,07/12/2015 12:42:46 PM,42.008124017,-87.65955018,"(42.008124017, -87.65955018)"
2,10140270,HY329253,07/05/2015 11:20:00 PM,121XX S FRONT AVE,486,BATTERY,DOMESTIC BATTERY SIMPLE,STREET,False,True,...,9,53,08B,,,2015,07/12/2015 12:42:46 PM,,,


#### `substring(start, length)`
Add a new column _Case Category_ using the `substring(start, length)` expression to extract the two-character prefix from the _Case Number_ column. Here `start` is `0` (the first character) and `length` is `2`.

In [3]:
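# Take 2 characters starting at position 0 of 'Case Number'
substring_expression = dprep.col('Case Number').substring(0, 2)
# Place the new column immediately after 'Case Number'
case_category = dataflow.add_column(new_column_name='Case Category',
                                    prior_column='Case Number',
                                    expression=substring_expression)
case_category.head(3)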

Unnamed: 0,ID,Case Number,Case Category,Date,Block,IUCR,Primary Type,Description,Location Description,Arrest,...,Ward,Community Area,FBI Code,X Coordinate,Y Coordinate,Year,Updated On,Latitude,Longitude,Location
0,10140490,HY329907,HY,07/05/2015 11:50:00 PM,050XX N NEWLAND AVE,820,THEFT,$500 AND UNDER,STREET,False,...,41,10,06,1129230.0,1933315.0,2015,07/12/2015 12:42:46 PM,41.973309466,-87.800174996,"(41.973309466, -87.800174996)"
1,10139776,HY329265,HY,07/05/2015 11:30:00 PM,011XX W MORSE AVE,460,BATTERY,SIMPLE,STREET,False,...,49,1,08B,1167370.0,1946271.0,2015,07/12/2015 12:42:46 PM,42.008124017,-87.65955018,"(42.008124017, -87.65955018)"
2,10140270,HY329253,HY,07/05/2015 11:20:00 PM,121XX S FRONT AVE,486,BATTERY,DOMESTIC BATTERY SIMPLE,STREET,False,...,9,53,08B,,,2015,07/12/2015 12:42:46 PM,,,
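
To make the result above concrete, here is a minimal plain-Python sketch of what the expression appears to compute for each row, judging from the output shown: a zero-based start position and a character count. The `substring` helper below is a hypothetical stand-in written for illustration; it is not part of DataPrep, and it does not use the DataPrep engine.

```python
def substring(value, start, length=None):
    """Hypothetical stand-in for the substring expression (not a DataPrep API).

    Returns `length` characters of `value` beginning at zero-based `start`,
    or everything from `start` onward when `length` is omitted.
    """
    if length is None:
        return value[start:]
    return value[start:start + length]


# The three Case Number values from the preview above
case_numbers = ['HY329907', 'HY329265', 'HY329253']

# Equivalent of substring(0, 2): keep the first two characters
case_categories = [substring(c, 0, 2) for c in case_numbers]
print(case_categories)
```

For these three rows the list contains `'HY'` three times, matching the _Case Category_ column in the output.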


#### `substring(start)`
Add a new column _Case Id_ using the `substring(start)` expression to extract just the numeric part of the _Case Number_ column (everything from position 2 onward), then convert it to a numeric type.

In [4]:
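# Take everything from position 2 to the end of 'Case Number'
substring_expression2 = dprep.col('Case Number').substring(2)
# Place the new column immediately after 'Case Number'
case_id = dataflow.add_column(new_column_name='Case Id',
                              prior_column='Case Number',
                              expression=substring_expression2)
# The extracted value is still a string; convert it to a number
case_id = case_id.to_number('Case Id')
case_id.head(3)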

Unnamed: 0,ID,Case Number,Case Id,Date,Block,IUCR,Primary Type,Description,Location Description,Arrest,...,Ward,Community Area,FBI Code,X Coordinate,Y Coordinate,Year,Updated On,Latitude,Longitude,Location
0,10140490,HY329907,329907.0,07/05/2015 11:50:00 PM,050XX N NEWLAND AVE,820,THEFT,$500 AND UNDER,STREET,False,...,41,10,06,1129230.0,1933315.0,2015,07/12/2015 12:42:46 PM,41.973309466,-87.800174996,"(41.973309466, -87.800174996)"
1,10139776,HY329265,329265.0,07/05/2015 11:30:00 PM,011XX W MORSE AVE,460,BATTERY,SIMPLE,STREET,False,...,49,1,08B,1167370.0,1946271.0,2015,07/12/2015 12:42:46 PM,42.008124017,-87.65955018,"(42.008124017, -87.65955018)"
2,10140270,HY329253,329253.0,07/05/2015 11:20:00 PM,121XX S FRONT AVE,486,BATTERY,DOMESTIC BATTERY SIMPLE,STREET,False,...,9,53,08B,,,2015,07/12/2015 12:42:46 PM,,,
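
The two steps above (drop the prefix, then convert to a number) can be sketched in plain Python as follows. This is only an illustration of the per-row result, assuming the conversion yields floating-point values as the `329907.0` in the output suggests; it does not show how DataPrep handles values that cannot be parsed.

```python
# The three Case Number values from the preview above
case_numbers = ['HY329907', 'HY329265', 'HY329253']

# Equivalent of substring(2): everything from zero-based position 2 onward
case_id_strings = [c[2:] for c in case_numbers]

# Equivalent of the numeric conversion step, assuming a float result
case_ids = [float(s) for s in case_id_strings]

print(case_id_strings)
print(case_ids)
```

The intermediate list holds strings such as `'329907'`; only after the conversion do the values become numbers that can be sorted or compared numerically.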
