# Opening and Saving Packages
Copyright (c) Microsoft Corporation. All rights reserved.<br>
Licensed under the MIT License.

Once you have built a dataflow, you can save it as a package to a DPREP file. Saving persists all of the information in the dataflow, including the steps you have added, the examples and programs from by-example steps, and any computed aggregations.

You can also open DPREP files to access any dataflows in those packages.

A DPREP package (and DPREP file) can contain multiple dataflows, and each dataflow in a package must have a unique name.
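
Because names must be unique within a package, it can help to check a list of candidate names before building one. The following is a minimal sketch in plain Python, assuming the names are available as strings. `find_duplicate_names` is a hypothetical helper written for this illustration and is not part of the `azureml.dataprep` API.

```python
from collections import Counter


def find_duplicate_names(names):
    """Return the names that occur more than once, in first-seen order."""
    counts = Counter(names)
    # dict.fromkeys keeps first-seen order while dropping repeats.
    return [name for name in dict.fromkeys(names) if counts[name] > 1]


# Hypothetical dataflow names, used only for illustration.
candidate_names = ['crime0-10', 'New-Crime', 'crime0-10']
duplicates = find_duplicate_names(candidate_names)
print(duplicates)
```

An empty result means every name is distinct, so the dataflows could be placed in the same package without a naming conflict.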

## Open

Use the `open()` method of the Package class to load an existing DPREP file. You can then index into the Package by name to access a particular dataflow.

In [1]:
import os
package_path = os.path.join(os.getcwd(), 'data', 'crime0-10.dprep')

In [2]:
from azureml.dataprep.api.package import Package

In [3]:
p = Package.open(package_path)
t = p['crime0-10']
h = t.head(100)
h

Unnamed: 0,ID,Case Number,Date,Block,IUCR,Primary Type,Description,Location Description,Arrest,Domestic,...,Ward,Community Area,FBI Code,X Coordinate,Y Coordinate,Year,Updated On,Latitude,Longitude,Location
0,10140490.0,HY329907,2015-07-05 23:50:00,050XX N NEWLAND AVE,820.0,THEFT,$500 AND UNDER,STREET,False,False,...,41.0,10.0,06,1129230.0,1933315.0,2015.0,2015-07-12 12:42:46,41.973309,-87.800175,"(41.973309466, -87.800174996)"
1,10139776.0,HY329265,2015-07-05 23:30:00,011XX W MORSE AVE,460.0,BATTERY,SIMPLE,STREET,False,True,...,49.0,1.0,08B,1167370.0,1946271.0,2015.0,2015-07-12 12:42:46,42.008124,-87.65955,"(42.008124017, -87.65955018)"
2,10140270.0,HY329253,2015-07-05 23:20:00,121XX S FRONT AVE,486.0,BATTERY,DOMESTIC BATTERY SIMPLE,STREET,False,True,...,9.0,53.0,08B,,,2015.0,2015-07-12 12:42:46,,,
3,10139885.0,HY329308,2015-07-05 23:19:00,051XX W DIVISION ST,610.0,BURGLARY,FORCIBLE ENTRY,SMALL RETAIL STORE,False,False,...,37.0,25.0,05,1141721.0,1907465.0,2015.0,2015-07-12 12:42:46,41.902152,-87.754883,"(41.902152027, -87.754883404)"
4,10140379.0,HY329556,2015-07-05 23:00:00,012XX W LAKE ST,930.0,MOTOR VEHICLE THEFT,THEFT/RECOVERY: AUTOMOBILE,STREET,False,False,...,27.0,28.0,07,1168413.0,1901632.0,2015.0,2015-07-12 12:42:46,41.88561,-87.657009,"(41.885610142, -87.657008701)"
5,10140868.0,HY330421,2015-07-05 22:54:00,118XX S PEORIA ST,1320.0,CRIMINAL DAMAGE,TO VEHICLE,VEHICLE NON-COMMERCIAL,False,False,...,34.0,53.0,14,1172409.0,1826485.0,2015.0,2015-07-12 12:42:46,41.679311,-87.644545,"(41.6793109, -87.644545209)"
6,10139762.0,HY329232,2015-07-05 22:42:00,026XX W 37TH PL,1020.0,ARSON,BY FIRE,VACANT LOT/LAND,False,False,...,12.0,58.0,09,1159436.0,1879658.0,2015.0,2015-07-12 12:42:46,41.825501,-87.690578,"(41.825500607, -87.690578042)"
7,10139722.0,HY329228,2015-07-05 22:30:00,016XX S CENTRAL PARK AVE,1811.0,NARCOTICS,POSS: CANNABIS 30GMS OR LESS,ALLEY,True,False,...,24.0,29.0,18,1152687.0,1891389.0,2015.0,2015-07-12 12:42:46,41.857828,-87.715029,"(41.857827814, -87.715028789)"
8,10139774.0,HY329209,2015-07-05 22:15:00,048XX N ASHLAND AVE,1310.0,CRIMINAL DAMAGE,TO PROPERTY,APARTMENT,False,False,...,46.0,3.0,14,1164821.0,1932394.0,2015.0,2015-07-12 12:42:46,41.9701,-87.669324,"(41.970099796, -87.669324377)"
9,10139697.0,HY329177,2015-07-05 22:10:00,058XX S ARTESIAN AVE,1320.0,CRIMINAL DAMAGE,TO VEHICLE,ALLEY,False,False,...,16.0,63.0,14,1160997.0,1865851.0,2015.0,2015-07-12 12:42:46,41.78758,-87.685233,"(41.787580282, -87.685233078)"


## Edit

After a dataflow is loaded, it can be edited further as needed. In this example, a filter is added that drops every row whose `Description` is exactly `SIMPLE`.

In [4]:
from azureml.dataprep.api.expressions import col

In [5]:
t = t.filter(col('Description') != 'SIMPLE')
h = t.head(100)
h

Unnamed: 0,ID,Case Number,Date,Block,IUCR,Primary Type,Description,Location Description,Arrest,Domestic,...,Ward,Community Area,FBI Code,X Coordinate,Y Coordinate,Year,Updated On,Latitude,Longitude,Location
0,10140490.0,HY329907,2015-07-05 23:50:00,050XX N NEWLAND AVE,820.0,THEFT,$500 AND UNDER,STREET,False,False,...,41.0,10.0,06,1129230.0,1933315.0,2015.0,2015-07-12 12:42:46,41.973309,-87.800175,"(41.973309466, -87.800174996)"
1,10140270.0,HY329253,2015-07-05 23:20:00,121XX S FRONT AVE,486.0,BATTERY,DOMESTIC BATTERY SIMPLE,STREET,False,True,...,9.0,53.0,08B,,,2015.0,2015-07-12 12:42:46,,,
2,10139885.0,HY329308,2015-07-05 23:19:00,051XX W DIVISION ST,610.0,BURGLARY,FORCIBLE ENTRY,SMALL RETAIL STORE,False,False,...,37.0,25.0,05,1141721.0,1907465.0,2015.0,2015-07-12 12:42:46,41.902152,-87.754883,"(41.902152027, -87.754883404)"
3,10140379.0,HY329556,2015-07-05 23:00:00,012XX W LAKE ST,930.0,MOTOR VEHICLE THEFT,THEFT/RECOVERY: AUTOMOBILE,STREET,False,False,...,27.0,28.0,07,1168413.0,1901632.0,2015.0,2015-07-12 12:42:46,41.88561,-87.657009,"(41.885610142, -87.657008701)"
4,10140868.0,HY330421,2015-07-05 22:54:00,118XX S PEORIA ST,1320.0,CRIMINAL DAMAGE,TO VEHICLE,VEHICLE NON-COMMERCIAL,False,False,...,34.0,53.0,14,1172409.0,1826485.0,2015.0,2015-07-12 12:42:46,41.679311,-87.644545,"(41.6793109, -87.644545209)"
5,10139762.0,HY329232,2015-07-05 22:42:00,026XX W 37TH PL,1020.0,ARSON,BY FIRE,VACANT LOT/LAND,False,False,...,12.0,58.0,09,1159436.0,1879658.0,2015.0,2015-07-12 12:42:46,41.825501,-87.690578,"(41.825500607, -87.690578042)"
6,10139722.0,HY329228,2015-07-05 22:30:00,016XX S CENTRAL PARK AVE,1811.0,NARCOTICS,POSS: CANNABIS 30GMS OR LESS,ALLEY,True,False,...,24.0,29.0,18,1152687.0,1891389.0,2015.0,2015-07-12 12:42:46,41.857828,-87.715029,"(41.857827814, -87.715028789)"
7,10139774.0,HY329209,2015-07-05 22:15:00,048XX N ASHLAND AVE,1310.0,CRIMINAL DAMAGE,TO PROPERTY,APARTMENT,False,False,...,46.0,3.0,14,1164821.0,1932394.0,2015.0,2015-07-12 12:42:46,41.9701,-87.669324,"(41.970099796, -87.669324377)"
8,10139697.0,HY329177,2015-07-05 22:10:00,058XX S ARTESIAN AVE,1320.0,CRIMINAL DAMAGE,TO VEHICLE,ALLEY,False,False,...,16.0,63.0,14,1160997.0,1865851.0,2015.0,2015-07-12 12:42:46,41.78758,-87.685233,"(41.787580282, -87.685233078)"
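
The filter compares the whole `Description` value, which is why the row with `DOMESTIC BATTERY SIMPLE` remains in the output while the row with `SIMPLE` is gone. The plain-Python sketch below mirrors that comparison on three values copied from the first table. It only illustrates the semantics and does not use the dataflow API.

```python
# Description values taken from the first three rows of the unfiltered output.
descriptions = ['$500 AND UNDER', 'SIMPLE', 'DOMESTIC BATTERY SIMPLE']

# Whole-value inequality, analogous to col('Description') != 'SIMPLE'.
kept = [d for d in descriptions if d != 'SIMPLE']
print(kept)
```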


## Save

Use the `save()` method of the Package class to write out the DPREP file. To create a Package, pass a list of dataflow objects to its constructor. In this example, the edited dataflow is first given a new name with `set_name()`.

In [6]:
import tempfile
import uuid
# Build a unique path in the system temp directory using public APIs only.
temp_dir = tempfile.gettempdir()
temp_file_name = uuid.uuid4().hex
temp_package_path = os.path.join(temp_dir, temp_file_name + '.dprep')

In [7]:
t = t.set_name('New-Crime')
p_to_save = Package([t])
p_to_save = p_to_save.save(temp_package_path)

## Round-trip

The saved package can be loaded back in and its dataflow used as before. Here the edited dataflow is opened by its new name and converted to a pandas DataFrame.

In [8]:
p_to_open = Package.open(temp_package_path)
t_to_open = p_to_open['New-Crime']
df = t_to_open.to_pandas_dataframe()
df

Unnamed: 0,ID,Case Number,Date,Block,IUCR,Primary Type,Description,Location Description,Arrest,Domestic,...,Ward,Community Area,FBI Code,X Coordinate,Y Coordinate,Year,Updated On,Latitude,Longitude,Location
0,10140490.0,HY329907,2015-07-05 23:50:00,050XX N NEWLAND AVE,820.0,THEFT,$500 AND UNDER,STREET,False,False,...,41.0,10.0,06,1129230.0,1933315.0,2015.0,2015-07-12 12:42:46,41.973309,-87.800175,"(41.973309466, -87.800174996)"
1,10140270.0,HY329253,2015-07-05 23:20:00,121XX S FRONT AVE,486.0,BATTERY,DOMESTIC BATTERY SIMPLE,STREET,False,True,...,9.0,53.0,08B,,,2015.0,2015-07-12 12:42:46,,,
2,10139885.0,HY329308,2015-07-05 23:19:00,051XX W DIVISION ST,610.0,BURGLARY,FORCIBLE ENTRY,SMALL RETAIL STORE,False,False,...,37.0,25.0,05,1141721.0,1907465.0,2015.0,2015-07-12 12:42:46,41.902152,-87.754883,"(41.902152027, -87.754883404)"
3,10140379.0,HY329556,2015-07-05 23:00:00,012XX W LAKE ST,930.0,MOTOR VEHICLE THEFT,THEFT/RECOVERY: AUTOMOBILE,STREET,False,False,...,27.0,28.0,07,1168413.0,1901632.0,2015.0,2015-07-12 12:42:46,41.88561,-87.657009,"(41.885610142, -87.657008701)"
4,10140868.0,HY330421,2015-07-05 22:54:00,118XX S PEORIA ST,1320.0,CRIMINAL DAMAGE,TO VEHICLE,VEHICLE NON-COMMERCIAL,False,False,...,34.0,53.0,14,1172409.0,1826485.0,2015.0,2015-07-12 12:42:46,41.679311,-87.644545,"(41.6793109, -87.644545209)"
5,10139762.0,HY329232,2015-07-05 22:42:00,026XX W 37TH PL,1020.0,ARSON,BY FIRE,VACANT LOT/LAND,False,False,...,12.0,58.0,09,1159436.0,1879658.0,2015.0,2015-07-12 12:42:46,41.825501,-87.690578,"(41.825500607, -87.690578042)"
6,10139722.0,HY329228,2015-07-05 22:30:00,016XX S CENTRAL PARK AVE,1811.0,NARCOTICS,POSS: CANNABIS 30GMS OR LESS,ALLEY,True,False,...,24.0,29.0,18,1152687.0,1891389.0,2015.0,2015-07-12 12:42:46,41.857828,-87.715029,"(41.857827814, -87.715028789)"
7,10139774.0,HY329209,2015-07-05 22:15:00,048XX N ASHLAND AVE,1310.0,CRIMINAL DAMAGE,TO PROPERTY,APARTMENT,False,False,...,46.0,3.0,14,1164821.0,1932394.0,2015.0,2015-07-12 12:42:46,41.9701,-87.669324,"(41.970099796, -87.669324377)"
8,10139697.0,HY329177,2015-07-05 22:10:00,058XX S ARTESIAN AVE,1320.0,CRIMINAL DAMAGE,TO VEHICLE,ALLEY,False,False,...,16.0,63.0,14,1160997.0,1865851.0,2015.0,2015-07-12 12:42:46,41.78758,-87.685233,"(41.787580282, -87.685233078)"


In [9]:
if os.path.isfile(temp_package_path):
    os.remove(temp_package_path)
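
As an alternative to building a temporary path by hand and deleting the file afterwards, the standard library's `tempfile.TemporaryDirectory` removes the directory and its contents when the `with` block exits. The following is a minimal sketch under stated assumptions: the file name `new-crime.dprep` is illustrative, and the empty-file write is a placeholder standing in for the `save()` call.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as scratch_dir:
    scratch_package_path = os.path.join(scratch_dir, 'new-crime.dprep')
    # Placeholder for the save step: create an empty file so the path exists.
    with open(scratch_package_path, 'wb'):
        pass
    existed_inside = os.path.isfile(scratch_package_path)

# The directory and everything in it are removed once the block exits.
exists_after = os.path.exists(scratch_package_path)
print(existed_inside, exists_after)
```

With this pattern, any work that needs the saved file, such as reopening the package, has to happen inside the `with` block.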