# External References
Copyright (c) Microsoft Corporation. All rights reserved.<br>
Licensed under the MIT License.

In addition to opening existing Dataflows in code and modifying them, you can also create and persist Dataflows that reference another Dataflow that has been persisted to a Data Prep package. When such a referencing Dataflow is executed, it dynamically loads the referenced Data Prep package, executes the referenced Dataflow, and then executes its own steps on the result.
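
To make that execution order concrete, here is a minimal stdlib-only model of the idea. It is not the `azureml.dataprep` API: the JSON "package" format, the `run_flow` and `apply_step` helpers, and the step names are all hypothetical stand-ins. The referencing flow stores only a package path and a flow name. The package file is read each time the flow runs, the referenced flow's steps run first, and the referencing flow's own steps run last.

```python
import json
import os
import tempfile


def apply_step(rows, op, arg):
    """Apply one hypothetical transformation step to a list of numbers."""
    if op == "filter_min":  # keep values >= arg
        return [r for r in rows if r >= arg]
    if op == "add":  # add arg to every value
        return [r + arg for r in rows]
    raise ValueError(f"unknown step: {op}")


def run_flow(flow):
    """Execute a hypothetical flow: a source or a reference, followed by steps."""
    if "reference" in flow:
        pkg_path, name = flow["reference"]
        # The package is read here, at execution time, not when the flow is defined.
        with open(pkg_path) as f:
            package = json.load(f)
        # Run the referenced flow first (it may itself contain a reference).
        rows = run_flow(package[name])
    else:
        rows = list(flow["source"])
    # Then run this flow's own steps on the result.
    for op, arg in flow.get("steps", []):
        rows = apply_step(rows, op, arg)
    return rows


# Persist a named flow to a JSON "package" (a stand-in for a .dprep file).
pkg_path = os.path.join(tempfile.gettempdir(), "reference_sketch.json")
with open(pkg_path, "w") as f:
    json.dump({"FWF": {"source": [5, 1, 8, 3, 9], "steps": [["filter_min", 3]]}}, f)

# The referencing flow stores only the package path and the flow name.
referencing = {"reference": (pkg_path, "FWF"), "steps": [["add", 100]]}
print(run_flow(referencing))  # [105, 108, 103, 109]
```

Keeping only a (path, name) pair in the referencing flow is what lets the referenced logic live in one place and be shared by any number of downstream flows.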

To demonstrate, we will create a Dataflow that loads and transforms some data. After that, we will persist this Dataflow to a Data Prep package.

In [1]:
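import azureml.dataprep as dprep
import tempfile
import os

# Load the crime data and drop rows that have an error in any of the last three columns.
dflow = dprep.auto_read_file('../data/crime.txt')
dflow = dflow.drop_errors(['Column7', 'Column8', 'Column9'], dprep.ColumnRelationship.ANY)
# Name the Dataflow so that other Dataflows can reference it by name.
dflow = dflow.set_name('FWF')
# Wrap the Dataflow in a package and save it to a temporary location.
pkg = dprep.Package(dflow)
pkg_path = os.path.join(tempfile.gettempdir(), 'package.dprep')
pkg = pkg.save(pkg_path)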

Now that we have a package file, we can create a new Dataflow that references it.

In [2]:
# Create a Dataflow that references the Dataflow named 'FWF' in the saved package.
dflow_new = dprep.Dataflow.reference(dprep.ExternalReference(pkg_path, 'FWF'))
dflow_new.head(5)

| | Column1 | Column2 | Column3 | Column4 | Column5 | Column6 | Column7 | Column8 | Column9 |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 10140490.0 | HY | 329907.0 | 7/5/2015 | 23:50 | 50.0 | XX | N | NEWLAND AVE 820 THEFT |
| 1 | 10139776.0 | HY | 329265.0 | 7/5/2015 | 23:30 | 11.0 | XX | W | MORSE AVE 460 BATTERY |
| 2 | 10140270.0 | HY | 329253.0 | 7/5/2015 | 23:20 | 121.0 | XX | S | FRONT AVE 486 BATTERY |
| 3 | 10139885.0 | HY | 329308.0 | 7/5/2015 | 23:19 | 51.0 | XX | W | DIVISION ST 610 BURGLARY |
| 4 | 10140379.0 | HY | 329556.0 | 7/5/2015 | 23:00 | 12.0 | XX | W | LAKE ST 930 MOTOR VEHICLE THEFT |


When executed, the new Dataflow returns the same results as the one we saved in the package. Because the reference is resolved at execution time rather than when `dflow_new` is created, any update to the package file becomes visible the next time the referencing Dataflow is executed.

In [3]:
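# Keep only the first 5 records and overwrite the package file with the updated Dataflow.
dflow = dflow.take(5)
pkg = dprep.Package(dflow)
pkg.save(pkg_path)

# Re-execute the unchanged referencing Dataflow, asking for up to 10 rows.
dflow_new.head(10)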

| | Column1 | Column2 | Column3 | Column4 | Column5 | Column6 | Column7 | Column8 | Column9 |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 10140490.0 | HY | 329907.0 | 7/5/2015 | 23:50 | 50.0 | XX | N | NEWLAND AVE 820 THEFT |
| 1 | 10139776.0 | HY | 329265.0 | 7/5/2015 | 23:30 | 11.0 | XX | W | MORSE AVE 460 BATTERY |
| 2 | 10140270.0 | HY | 329253.0 | 7/5/2015 | 23:20 | 121.0 | XX | S | FRONT AVE 486 BATTERY |
| 3 | 10139885.0 | HY | 329308.0 | 7/5/2015 | 23:19 | 51.0 | XX | W | DIVISION ST 610 BURGLARY |
| 4 | 10140379.0 | HY | 329556.0 | 7/5/2015 | 23:00 | 12.0 | XX | W | LAKE ST 930 MOTOR VEHICLE THEFT |


As we can see, even though we did not modify `dflow_new` and requested 10 rows, it now returns only 5 records, because the package was updated with the Dataflow that resulted from calling `dflow.take(5)`.
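
The difference between this late binding and copying the data up front can be modeled in a few lines of plain Python. This is a hypothetical sketch, not how `azureml.dataprep` is implemented: the JSON package file, `save_package`, `load_rows`, and the `Reference` class are illustrative stand-ins. A snapshot taken before the package is updated keeps the old records, while the reference re-reads the package on every call and sees only the 5 records that remain.

```python
import json
import os
import tempfile

PKG_PATH = os.path.join(tempfile.gettempdir(), "late_binding_sketch.json")


def save_package(name, rows):
    """Overwrite the hypothetical package file with one named result set."""
    with open(PKG_PATH, "w") as f:
        json.dump({name: rows}, f)


def load_rows(name):
    """Read the named result set from the package file as it is right now."""
    with open(PKG_PATH) as f:
        return json.load(f)[name]


class Reference:
    """Keeps only the flow name; the package file is re-read on every head() call."""

    def __init__(self, name):
        self.name = name

    def head(self, n):
        return load_rows(self.name)[:n]


save_package("FWF", list(range(1, 21)))  # 20 records
snapshot = load_rows("FWF")              # early binding: copied once, now
ref = Reference("FWF")                   # late binding: resolved when executed

# Analogous to dflow.take(5) followed by pkg.save(pkg_path).
save_package("FWF", load_rows("FWF")[:5])

print(snapshot[:10])  # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
print(ref.head(10))   # [1, 2, 3, 4, 5]
```

The trade-off is the usual one for late binding: downstream consumers pick up fixes to the shared package automatically, but they also pick up any change, so the package file effectively becomes a contract between the Dataflows that use it.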