
# Add Column using Expression for Manipulating Stream Identifiers


With Azure ML Data Prep, you can add a new column to a Dataflow with `Dataflow.add_column`, using a Data Prep expression to compute its value from existing columns. For more examples, see [add column using expression](./add-column-using-expression.ipynb).
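Conceptually, `add_column` evaluates an expression against each row and places the result in a new column immediately after `prior_column`. The plain-Python sketch below illustrates only that placement behavior on rows stored as dicts. `add_column_sketch` is a hypothetical helper written for illustration; it is not part of Data Prep and says nothing about how Data Prep executes a Dataflow.

```python
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]


def add_column_sketch(rows: List[Row],
                      new_column_name: str,
                      prior_column: str,
                      expression: Callable[[Row], Any]) -> List[Row]:
    """Hypothetical helper: return new rows with `new_column_name` inserted
    right after `prior_column`, its value computed by `expression`."""
    result = []
    for row in rows:
        if prior_column not in row:
            raise KeyError(prior_column)
        new_row: Row = {}
        for name, value in row.items():
            new_row[name] = value
            if name == prior_column:
                # Evaluate the expression against the original row's values.
                new_row[new_column_name] = expression(row)
        result.append(new_row)
    return result


rows = [{'Stream Path': '/data/2019/04/23/file.csv', 'Size': 10}]
out = add_column_sketch(rows, 'Path Length', 'Stream Path',
                        lambda r: len(r['Stream Path']))
print(list(out[0].keys()))  # ['Stream Path', 'Path Length', 'Size']
```

The input rows are left untouched; each call produces new rows, which loosely mirrors how each Data Prep step returns a new Dataflow instead of modifying the existing one.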
Here we add columns derived from the path/identifier of the stream that is read at the beginning of the Dataflow.

```python
import azureml.dataprep as dprep
```

#### `RegEx.extract_record()`
Using the `RegEx.extract_record()` expression, add a new record column "Stream Date Record", which maps each named capturing group in the regex to the value it matched.

```python
dflow_regex_extract_record = dprep.auto_read_file('../data/stream-path.csv')
# Raw string so backslash escapes such as \d reach the regex engine unchanged.
regex = dprep.RegEx(r'\/(?<year>\d{4})\/(?<month>\d{2})\/(?<day>\d{2})\/')
# Insert the extracted record right after the 'Stream Path' column.
dflow_regex_extract_record = dflow_regex_extract_record.add_column(
    new_column_name='Stream Date Record',
    prior_column='Stream Path',
    expression=regex.extract_record(dflow_regex_extract_record['Stream Path']))
dflow_regex_extract_record.head(5)
```
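To see what a record built from named groups looks like, here is a rough plain-Python analogue using the standard `re` module. Note the syntax difference: Python's `re` names groups with `(?P<name>...)`, not the `(?<name>...)` form used in the Data Prep pattern above. The sample path is made up, and returning `None` when nothing matches is this sketch's own choice, not necessarily what Data Prep produces.

```python
import re
from typing import Dict, Optional

# Python's re module spells named groups (?P<name>...).
STREAM_DATE = re.compile(r'/(?P<year>\d{4})/(?P<month>\d{2})/(?P<day>\d{2})/')


def extract_record(path: str) -> Optional[Dict[str, str]]:
    """Return a dict mapping each named group to its matched text, or None."""
    match = STREAM_DATE.search(path)
    return match.groupdict() if match else None


print(extract_record('https://example.com/logs/2019/04/23/events.csv'))
# {'year': '2019', 'month': '04', 'day': '23'}
```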

#### `create_datetime()`
Using the `create_datetime()` expression, add a new column "Stream Date" containing datetime values constructed from the year, month, and day values extracted from the record column "Stream Date Record".

```python
# Pull the individual fields out of the record column.
year = dprep.col('year', dflow_regex_extract_record['Stream Date Record'])
month = dprep.col('month', dflow_regex_extract_record['Stream Date Record'])
day = dprep.col('day', dflow_regex_extract_record['Stream Date Record'])
dflow_create_datetime = dflow_regex_extract_record.add_column(new_column_name='Stream Date',
                                                              prior_column='Stream Date Record',
                                                              expression=dprep.create_datetime(year, month, day))
dflow_create_datetime.head(5)
```
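The `dprep.col(name, record)` calls pull individual fields out of the record before `create_datetime` combines them. As a plain-Python illustration of the same idea (a hypothetical helper, not Data Prep code), the extracted strings are converted to integers and passed to `datetime`. An impossible date such as month 13 raises `ValueError` here; Data Prep may report such values differently.

```python
from datetime import datetime
from typing import Dict


def create_datetime_from_record(record: Dict[str, str]) -> datetime:
    """Hypothetical helper: build a datetime from string 'year', 'month', 'day' fields."""
    return datetime(int(record['year']), int(record['month']), int(record['day']))


record = {'year': '2019', 'month': '04', 'day': '23'}
print(create_datetime_from_record(record))  # 2019-04-23 00:00:00
```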

#### `create_http_stream_info()`
Using the `create_http_stream_info()` expression, add a new column "HttpStream" containing Stream Info values constructed from string values that represent HTTP URLs.

```python
dflow_urls = dprep.read_csv('../data/urls.csv')
dflow_urls.head(5)
```

The URLs in the 'Url' column are currently plain strings; they do not represent the data that an HTTP GET request to each URL would return.

In Data Prep, binary streams are represented by Stream Info values. A Stream Info holds the path or identifier needed to access data at a location, together with the kind of location (for example an HTTP address, a local file, or a blob).
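To make the idea concrete, the sketch below models a Stream Info as a small immutable value pairing a location kind with an identifier. The `StreamInfoSketch` class, its field names, the `'Http'` label, and `to_http_stream_infos` are all hypothetical stand-ins chosen for illustration, not the Data Prep classes. Constructing one performs no request; it only records where the data lives.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class StreamInfoSketch:
    """Hypothetical stand-in for a Stream Info: where data lives, not the data."""
    handler: str              # kind of location, e.g. 'Http' (assumed label)
    resource_identifier: str  # path or URL used to reach the data


def to_http_stream_infos(urls: List[str]) -> List[StreamInfoSketch]:
    """Hypothetical helper: wrap each URL string; no HTTP request is made here."""
    return [StreamInfoSketch('Http', url) for url in urls]


infos = to_http_stream_infos(['https://example.com/a.csv', 'https://example.com/b.csv'])
print(infos[0])
# StreamInfoSketch(handler='Http', resource_identifier='https://example.com/a.csv')
```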

To create an HTTP Stream Info from a URL in a column, use the `create_http_stream_info` function.

```python
dflow_streaminfos = dflow_urls.add_column(expression=dprep.create_http_stream_info(dflow_urls['Url']),
                                          new_column_name='HttpStream',
                                          prior_column='Url')
dflow_streaminfos.head(5)
```

The data at those URLs can now be read by renaming the Stream Info column ('HttpStream') to 'Path' and then adding a `parse_*`/`read_*` step for the appropriate file format.

```python
# Rename the Stream Info column to 'Path' before adding the parse step.
dflow_url_paths = dflow_streaminfos.rename_columns({'HttpStream': 'Path'})
dflow_url_read = dflow_url_paths.parse_delimited(
    separator=',', headers_mode=dprep.PromoteHeadersMode.CONSTANTGROUPED,
    encoding=dprep.FileEncoding.UTF8, quoting=False,
    skip_rows=0, skip_mode=dprep.SkipMode.NONE, comment=None)
dflow_url_read.head(5)
```
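As we understand it, `PromoteHeadersMode.CONSTANTGROUPED` takes the first row of the first file as the header and assumes every other file shares that header, so each later file's first row is dropped. The sketch below imitates that reading with the standard `csv` module on in-memory strings, so it runs without network access. The helper name and sample data are hypothetical, and this interpretation of the mode is an assumption worth checking against the Data Prep reference.

```python
import csv
import io
from typing import Dict, List


def read_constant_grouped(files: List[str]) -> List[Dict[str, str]]:
    """Hypothetical helper: parse several CSV texts that share one header row.

    The header comes from the first non-empty file; the first row of every
    later file is assumed to repeat it and is skipped.
    """
    records: List[Dict[str, str]] = []
    header: List[str] = []
    for text in files:
        rows = list(csv.reader(io.StringIO(text)))
        if not rows:
            continue  # an empty file contributes nothing
        if not header:
            header = rows[0]
        # Skip each file's first row: the header itself, or an assumed repeat of it.
        records.extend(dict(zip(header, row)) for row in rows[1:])
    return records


part1 = 'id,value\n1,a\n2,b\n'
part2 = 'id,value\n3,c\n'
print(read_constant_grouped([part1, part2]))
# [{'id': '1', 'value': 'a'}, {'id': '2', 'value': 'b'}, {'id': '3', 'value': 'c'}]
```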
