# QC: Rename Trade Statistics Vendor Inventory


## About
- Interactive workflow for renaming the files in a vendor inventory from DRS-id-based filenames to filenames that reflect each original digital object's owner-supplied name (OSN).
- **Created:** 2023/02/03
- **Updated:** 2023/02/03

### Globals

In [9]:
# path to local util code module
g_util_module_path = '../util'

# mets file
g_mets_file = '../data/trade_statistics/trade_statistics.xml'

# digital object osn inventory file
g_do_osn_inventory = '../data/trade_statistics/005825557.csv'

# mapped inventory file
g_mapped_vendor_inventory = './outputs/trade_statistics/mapped_vendor_inventory.csv'

# data directory for vendor files to rename
g_data_directory = '../data/trade_statistics1'

Add the local util module path to Python's module search path (`sys.path`) so the notebook can import it

In [10]:
import sys
if g_util_module_path not in sys.path:
    sys.path.append(g_util_module_path)

### Modules

In [11]:
import pandas as pd
import pprint
import util # local module

## Rename Vendor Files

### Load and Process METS File (`XML` format)

In [12]:
# print function documentation
print('{}'.format(util.mets_to_dataframe.__doc__))

# load the mets file
mets_df = util.mets_to_dataframe(g_mets_file)

# print number of files
print('Num files: {}'.format(len(mets_df)))

# display result
display(mets_df)


    Read and extract information about files from an XML METS file.

    Parameter
    ---------
    filename : str
        Full path to METS file.

    Return
    ------
    DataFrame

    
Num files: 1586


| | @id | file_type | @mimetype | mets_url | filename |
|---|---|---|---|---|---|
| 0 | img_44319541 | image | image/jpeg | image/44319541.jpg | 44319541.jpg |
| 1 | img_44319542 | image | image/jpeg | image/44319542.jpg | 44319542.jpg |
| 2 | img_44319543 | image | image/jpeg | image/44319543.jpg | 44319543.jpg |
| 3 | img_44319544 | image | image/jpeg | image/44319544.jpg | 44319544.jpg |
| 4 | img_44319545 | image | image/jpeg | image/44319545.jpg | 44319545.jpg |
| ... | ... | ... | ... | ... | ... |
| 1581 | csv_44319948_a | csv | text/csv | csv/44319948_a.csv | 44319948_a.csv |
| 1582 | csv_44319948_b | csv | text/csv | csv/44319948_b.csv | 44319948_b.csv |
| 1583 | csv_44319949 | csv | text/csv | csv/44319949.csv | 44319949.csv |
| 1584 | csv_44319950_a | csv | text/csv | csv/44319950_a.csv | 44319950_a.csv |
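
For readers without access to the `util` module, the following is a minimal sketch of how file entries might be pulled out of a METS file using only the standard library. It is not the actual `util.mets_to_dataframe` implementation. It assumes the file entries live in standard `mets:file` elements with a `mets:FLocat` child; the sample XML, the ids in it, and the `list_mets_files` helper are all made up for illustration.

```python
import posixpath
import xml.etree.ElementTree as ET

# Standard METS and XLink namespace URIs
NS = {
    "mets": "http://www.loc.gov/METS/",
    "xlink": "http://www.w3.org/1999/xlink",
}
XLINK_HREF = "{http://www.w3.org/1999/xlink}href"

# Tiny hypothetical METS fragment (not taken from trade_statistics.xml)
SAMPLE_METS = """<mets:mets xmlns:mets="http://www.loc.gov/METS/" xmlns:xlink="http://www.w3.org/1999/xlink">
  <mets:fileSec>
    <mets:fileGrp>
      <mets:file ID="img_0001" MIMETYPE="image/jpeg">
        <mets:FLocat LOCTYPE="URL" xlink:href="image/0001.jpg"/>
      </mets:file>
      <mets:file ID="csv_0001_a" MIMETYPE="text/csv">
        <mets:FLocat LOCTYPE="URL" xlink:href="csv/0001_a.csv"/>
      </mets:file>
    </mets:fileGrp>
  </mets:fileSec>
</mets:mets>"""


def list_mets_files(xml_text):
    """Return one dict per mets:file element (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    rows = []
    for file_el in root.iterfind(".//mets:file", NS):
        flocat = file_el.find("mets:FLocat", NS)
        href = flocat.get(XLINK_HREF) if flocat is not None else None
        rows.append({
            "@id": file_el.get("ID"),
            "@mimetype": file_el.get("MIMETYPE"),
            "mets_url": href,
            # the filename is the last segment of the METS URL
            "filename": posixpath.basename(href) if href else None,
        })
    return rows


for row in list_mets_files(SAMPLE_METS):
    print(row)
```

The resulting list of dicts could be passed to `pd.DataFrame(...)` to get a table shaped like the one above, minus the `file_type` column, whose derivation is not shown in this notebook.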


### Create the vendor inventory
- Built from the METS DataFrame produced above. Assumes that vendor filenames are based on DRS ids.

In [13]:
# print function documentation
print('{}'.format(util.create_vendor_inventory.__doc__))

# create the vendor inventory based upon the mets dataframe
vendor_inventory_df = util.create_vendor_inventory(mets_df, drsids=True, path=g_data_directory)

# print the number of files in the inventory
print('Num files: {}'.format(len(vendor_inventory_df)))

# display inventory
display(vendor_inventory_df)


    Given a DataFrame of METS information (retrieved from 'mets_to_dataframe'),
    process the metadata and return a DataFrame of information about the files
    that can be easily compared to other file inventories.

    Parameters
    ----------
    mets_df : DataFrame
        Output of call to mets_to_dataframe
    drsids : bool
        Parse and output DRS ids (default, True)
        Assumes that mets_df contains DRS ids
    path : str (optional)
        Full path to directory of data files
    
    Return
    ------
    DataFrame
    
Num files: 1586


| | file_type | mimetype | filename | filepath | path | drs_id |
|---|---|---|---|---|---|---|
| 0 | image | image/jpeg | 44319541.jpg | ../data/trade_statistics1/image/44319541.jpg | ../data/trade_statistics1 | 44319541 |
| 1 | image | image/jpeg | 44319542.jpg | ../data/trade_statistics1/image/44319542.jpg | ../data/trade_statistics1 | 44319542 |
| 2 | image | image/jpeg | 44319543.jpg | ../data/trade_statistics1/image/44319543.jpg | ../data/trade_statistics1 | 44319543 |
| 3 | image | image/jpeg | 44319544.jpg | ../data/trade_statistics1/image/44319544.jpg | ../data/trade_statistics1 | 44319544 |
| 4 | image | image/jpeg | 44319545.jpg | ../data/trade_statistics1/image/44319545.jpg | ../data/trade_statistics1 | 44319545 |
| ... | ... | ... | ... | ... | ... | ... |
| 1581 | csv | text/csv | 44319948_a.csv | ../data/trade_statistics1/csv/44319948_a.csv | ../data/trade_statistics1 | 44319948 |
| 1582 | csv | text/csv | 44319948_b.csv | ../data/trade_statistics1/csv/44319948_b.csv | ../data/trade_statistics1 | 44319948 |
| 1583 | csv | text/csv | 44319949.csv | ../data/trade_statistics1/csv/44319949.csv | ../data/trade_statistics1 | 44319949 |
| 1584 | csv | text/csv | 44319950_a.csv | ../data/trade_statistics1/csv/44319950_a.csv | ../data/trade_statistics1 | 44319950 |
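
The `drs_id` column appears to be the leading run of digits in each vendor filename, with any `_a`/`_b` part suffix dropped. The sketch below shows one way such a split could be done. The `split_drs_filename` helper and its regular expression are assumptions based only on the filenames visible in the table, not the logic inside `util.create_vendor_inventory`.

```python
import re
from pathlib import PurePosixPath

# Assumed pattern: digits, then an optional "_<letters>" part suffix
_DRS_STEM = re.compile(r"^(?P<drs_id>\d+)(?P<part>_[a-z]+)?$")


def split_drs_filename(filename):
    """Split a vendor filename into (drs_id, part_suffix, extension).

    Hypothetical helper; the DRS id is returned as a string.
    """
    p = PurePosixPath(filename)
    match = _DRS_STEM.match(p.stem)
    if match is None:
        raise ValueError(f"not a DRS-id-based filename: {filename!r}")
    return match.group("drs_id"), match.group("part") or "", p.suffix


print(split_drs_filename("44319541.jpg"))
print(split_drs_filename("44319948_a.csv"))
```

Raising on non-matching names, rather than silently skipping them, makes unexpected vendor filenames visible before any renaming takes place.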


### Load the digital object owner-supplied name (OSN) inventory

In [14]:
# read the digital object osn inventory file
do_osn_inventory_df = pd.read_csv(g_do_osn_inventory, header=0)

#### Map vendor inventory files to owner-supplied filenames

In [15]:
# reload util so that any edits made to the local module since it was first imported take effect
import importlib
importlib.reload(util)

# print function documentation
print('{}'.format(util.map_drs_vendor_inventory.__doc__))

# map the drs vendor inventory
df = util.map_drs_vendor_inventory(vendor_inventory_df, do_osn_inventory_df)

# write mapped vendor inventory to a file
df.to_csv(g_mapped_vendor_inventory, index=False)

display(df)


    Given a vendor inventory that uses DRS ids and an inventory 
    of matching content that uses owner-supplied names, generate an
    inventory of new names based upon owner-supplied names.

    Parameters
    ----------
    vendor_inventory_df : DataFrame
        Vendor inventory that uses DRS ids
    do_osn_inventory_df : DataFrame
        Digital object inventory that uses owner-supplied names

    Return
    ------
    DataFrame
    


| | file_type | mimetype | filename | filepath | path | drs_id | filename_osn | filepath_osn |
|---|---|---|---|---|---|---|---|---|
| 0 | image | image/jpeg | 44319541.jpg | ../data/trade_statistics1/image/44319541.jpg | ../data/trade_statistics1 | 44319541 | 005825557_pt1_00001.innodata.jpg | ../data/trade_statistics1/image/005825557_pt1_... |
| 1 | image | image/jpeg | 44319542.jpg | ../data/trade_statistics1/image/44319542.jpg | ../data/trade_statistics1 | 44319542 | 005825557_pt1_00002.innodata.jpg | ../data/trade_statistics1/image/005825557_pt1_... |
| 2 | image | image/jpeg | 44319543.jpg | ../data/trade_statistics1/image/44319543.jpg | ../data/trade_statistics1 | 44319543 | 005825557_pt1_00003.innodata.jpg | ../data/trade_statistics1/image/005825557_pt1_... |
| 3 | image | image/jpeg | 44319544.jpg | ../data/trade_statistics1/image/44319544.jpg | ../data/trade_statistics1 | 44319544 | 005825557_pt1_00004.innodata.jpg | ../data/trade_statistics1/image/005825557_pt1_... |
| 4 | image | image/jpeg | 44319545.jpg | ../data/trade_statistics1/image/44319545.jpg | ../data/trade_statistics1 | 44319545 | 005825557_pt1_00005.innodata.jpg | ../data/trade_statistics1/image/005825557_pt1_... |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 1581 | csv | text/csv | 44319948_a.csv | ../data/trade_statistics1/csv/44319948_a.csv | ../data/trade_statistics1 | 44319948 | 005825557_pt3_00138_a.innodata.csv | ../data/trade_statistics1/csv/005825557_pt3_00... |
| 1582 | csv | text/csv | 44319948_b.csv | ../data/trade_statistics1/csv/44319948_b.csv | ../data/trade_statistics1 | 44319948 | 005825557_pt3_00138_b.innodata.csv | ../data/trade_statistics1/csv/005825557_pt3_00... |
| 1583 | csv | text/csv | 44319949.csv | ../data/trade_statistics1/csv/44319949.csv | ../data/trade_statistics1 | 44319949 | 005825557_pt3_00139.innodata.csv | ../data/trade_statistics1/csv/005825557_pt3_00... |
| 1584 | csv | text/csv | 44319950_a.csv | ../data/trade_statistics1/csv/44319950_a.csv | ../data/trade_statistics1 | 44319950 | 005825557_pt3_00140_a.innodata.csv | ../data/trade_statistics1/csv/005825557_pt3_00... |
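
Judging from the `filename_osn` column, each new name seems to combine the owner-supplied name matched by DRS id, the original part suffix (such as `_a`), an `.innodata` infix, and the original extension. The sketch below reproduces that pattern under stated assumptions: the `build_osn_filename` helper, the `osn_by_drs_id` lookup, and the idea that the OSN inventory stores names without the `.innodata` infix are all guesses from the displayed output, not the behaviour of `util.map_drs_vendor_inventory`.

```python
from pathlib import PurePosixPath


def build_osn_filename(vendor_filename, osn_by_drs_id, infix=".innodata"):
    """Build an OSN-based filename for a DRS-id-based vendor filename.

    Hypothetical helper. Raises KeyError if the DRS id has no OSN match.
    """
    p = PurePosixPath(vendor_filename)
    # "44319948_a" -> ("44319948", "_", "a"); "44319541" -> ("44319541", "", "")
    drs_id, sep, part = p.stem.partition("_")
    osn = osn_by_drs_id[drs_id]
    return f"{osn}{sep}{part}{infix}{p.suffix}"


# Lookup values copied from two rows of the mapped inventory shown above
osn_by_drs_id = {
    "44319541": "005825557_pt1_00001",
    "44319948": "005825557_pt3_00138",
}

print(build_osn_filename("44319541.jpg", osn_by_drs_id))
print(build_osn_filename("44319948_a.csv", osn_by_drs_id))
```

Letting an unmatched DRS id raise `KeyError` is a deliberate choice in this sketch: a file with no owner-supplied name should stop the workflow rather than be renamed to something incomplete.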


#### Rename vendor files

In [None]:
# print function documentation
print('{}'.format(util.rename_vendor_files.__doc__))

# rename files
status = util.rename_vendor_files(df)
print(status)
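
Because renaming is destructive, it can help to preview the operation first. The sketch below is one possible shape for a cautious rename loop with a dry-run mode. The `rename_pairs` helper and its status strings are hypothetical and are not the implementation or return format of `util.rename_vendor_files`.

```python
import os


def rename_pairs(pairs, dry_run=True):
    """Rename each (source, target) path pair and report what happened.

    Hypothetical helper. With dry_run=True nothing on disk is changed.
    Returns a list of (source, target, status) tuples.
    """
    results = []
    for src, dst in pairs:
        if not os.path.isfile(src):
            status = "missing source"
        elif os.path.exists(dst):
            # never overwrite an existing file
            status = "target exists"
        elif dry_run:
            status = "would rename"
        else:
            os.rename(src, dst)
            status = "renamed"
        results.append((src, dst, status))
    return results
```

Assuming the mapped inventory holds full source and target paths in its `filepath` and `filepath_osn` columns, a preview could be run with `rename_pairs(zip(df['filepath'], df['filepath_osn']))`, followed by a second call with `dry_run=False` once every status reads `would rename`.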

**End document.**