# P2P (Procure to Pay) Data Analysis & Visualization, Machine Learning Predictive Analytics using Julia Language

This is **Part - 3** of 3 ERP Data analysis notebooks.
- Part 1 - General Ledger, Data Science Basics
- Part 2 - General Ledger Data Analysis & Visualization
- Part 3 - P2P (Procure to Pay) Data Analysis & Visualization

**Related blogs:**
    
- [Web-scrapping, Web automation using Julia Language](https://amit-shukla.medium.com/web-scrapping-web-automation-using-julia-language-2c473db84fbc)
- Working with ODBC, ORM, XML, JSON, PDF, TXT, CSV, XLS
- Working with PDF documents, Image Scanner, OCR Reader

**Target Audience:** This notebook is meant for ERP consultants, IT developers, Finance, Supply Chain, HR & CRM managers, executive leaders, or anyone curious about implementing data science concepts in the ERP space.

+ **Author:** Amit Shukla
+ **Contact:** info@elishconsulting.com

In parts 1 and 2 of this three-part series, we covered the basics and details of the ERP Finance data model, learned the basics of the DataFrames.jl package, and performed detailed ERP data analysis with visualizations.

In this part 3 notebook, we will continue by analyzing Supply Chain data from the Procure to Pay (P2P) perspective, often referred to as Buy to Pay (B2P).

## adding Packages

In [40]:
using Pkg
Pkg.add("DataFrames")
Pkg.add("Dates")
Pkg.add("CategoricalArrays")
Pkg.add("Interact")
Pkg.add("WebIO")
Pkg.add("CSV")
Pkg.add("XLSX")
Pkg.add("DelimitedFiles")
Pkg.add("Distributions")
Pkg.build("WebIO")
Pkg.status();

[32m[1m      Status[22m[39m `~/.julia/environments/v1.7/Project.toml`
 [90m [336ed68f] [39mCSV v0.10.3
 [90m [54eefc05] [39mCascadia v1.0.1
 [90m [324d7699] [39mCategoricalArrays v0.10.5
 [90m [8f4d0f93] [39mConda v1.7.0
 [90m [a93c6f00] [39mDataFrames v1.3.2
 [90m [31c24e10] [39mDistributions v0.25.53
 [90m [e30172f5] [39mDocumenter v0.27.15
 [90m [8f5d6c58] [39mEzXML v1.1.0
 [90m [708ec375] [39mGumbo v0.8.0
 [90m [cd3eb016] [39mHTTP v0.9.17
 [90m [7073ff75] [39mIJulia v1.23.2
 [90m [c601a237] [39mInteract v0.10.4
 [90m [0f8b85d8] [39mJSON3 v1.9.4
 [90m [b9914132] [39mJSONTables v1.0.3
 [90m [4d0d745f] [39mPDFIO v0.1.13
 [90m [c3e4b0f8] [39mPluto v0.18.4
 [90m [2dfb63ee] [39mPooledArrays v1.4.0
 [90m [438e738f] [39mPyCall v1.93.1
 [90m [88034a9c] [39mStringDistances v0.11.2
 [90m [a2db99b7] [39mTextAnalysis v0.7.3
 [90m [05625dda] [39mWebDriver v0.1.2
 [90m [0f1e0344] [39mWebIO v0.8.17
 [90m [fdbf4ff8] [39mXLSX v0.7.9
 [90m [ade2ca70]

[32m[1m   Resolving[22m[39m package versions...
[32m[1m  No Changes[22m[39m to `~/.julia/environments/v1.7/Project.toml`
[32m[1m  No Changes[22m[39m to `~/.julia/environments/v1.7/Manifest.toml`
[32m[1m   Resolving[22m[39m package versions...
[32m[1m  No Changes[22m[39m to `~/.julia/environments/v1.7/Project.toml`
[32m[1m  No Changes[22m[39m to `~/.julia/environments/v1.7/Manifest.toml`
[32m[1m   Resolving[22m[39m package versions...
[32m[1m  No Changes[22m[39m to `~/.julia/environments/v1.7/Project.toml`
[32m[1m  No Changes[22m[39m to `~/.julia/environments/v1.7/Manifest.toml`
[32m[1m   Resolving[22m[39m package versions...
[32m[1m  No Changes[22m[39m to `~/.julia/environments/v1.7/Project.toml`
[32m[1m  No Changes[22m[39m to `~/.julia/environments/v1.7/Manifest.toml`
[32m[1m   Resolving[22m[39m package versions...
[32m[1m  No Changes[22m[39m to `~/.julia/environments/v1.7/Project.toml`
[32m[1m  No Changes[22m[39m to `~/.juli

In [41]:
using DataFrames, Dates, Interact, CategoricalArrays, WebIO, CSV, XLSX, DelimitedFiles, Distributions

*For the rest of this blog, I will assume you have added all packages and imported them into the current namespace/notebook scope.*

--- 
## Supply Chain Data Model
We already covered DataFrames and the ERP Finance data model in the Part 1 and Part 2 notebooks. In the section below, let's recreate all Supply Chain DataFrames to continue with advanced analytics and visualization.

#### Dimensions

- Item master, Item Attribs, Item Costing

    **UNSPSC:** The United Nations Standard Products and Services Code® (UNSPSC®) is a global classification system used to classify products and services.
    
    **GUDID:** The Global Unique Device Identification Database (GUDID) is a database administered by the FDA that will serve as a reference catalog for every device with a unique device identifier (UDI).

    **GTIN:** Global Trade Item Number (GTIN) can be used by a company to uniquely identify all of its trade items. GS1 defines trade items as products or services that are priced, ordered or invoiced at any point in the supply chain.

    **GMDN:** The Global Medical Device Nomenclature (GMDN) is a comprehensive set of terms, within a structured category hierarchy, which name and group ALL medical device products including implantables, medical equipment, consumables, and diagnostic devices.
    
    
- Vendor master, Vendor Attribs, Vendor Costing
- Customer/Buyer/Procurement Officer Attribs
- Ship-to, warehouse, storage & inventory locations
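
UNSPSC codes are hierarchical: an 8-digit code reads as four 2-digit levels (segment, family, class, commodity), and each parent level is written with trailing zeros. The sketch below splits a code into those levels. The helper name `unspsc_levels` and the sample code are illustrative assumptions, not part of the dataset used in this notebook.

```julia
# Split an 8-digit UNSPSC code into its four hierarchy levels.
# `unspsc_levels` is a hypothetical helper name used only for illustration.
function unspsc_levels(code::AbstractString)
    length(code) == 8 && all(isdigit, code) ||
        throw(ArgumentError("expected an 8-digit UNSPSC code, got \"$code\""))
    return (segment   = rpad(code[1:2], 8, '0'),
            family    = rpad(code[1:4], 8, '0'),
            class     = rpad(code[1:6], 8, '0'),
            commodity = code)
end

levels = unspsc_levels("43211503")   # sample code chosen for illustration
println(levels.segment, " > ", levels.family, " > ", levels.class, " > ", levels.commodity)
```

The parent codes produced this way can be matched against the parent rows of a UNSPSC file to walk up the hierarchy.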

#### Transactions

-   PurchaseOrder
-   MSR - Material Service Request
-   Voucher
-   Invoice
-   Receipt
-   Shipment
-   Sales, Revenue
-   Travel, Expense, TimeCard
-   Accounting Lines

## Item master

In [2]:
###############################
## create SUPPLY CHAIN DATA ###
###############################
# Item master, Item Attribs, Item Costing ##
#       UNSPSC, GUDID, GTIN, GMDN
############################################

##########
# UNSPSC #
##########
# UNSPSC file can be downloaded from this link https://www.ungm.org/Public/UNSPSC
xf = XLSX.readxlsx("sampleData/UNGM_UNSPSC_12-Apr-2022.xlsx")
# xf will display names of sheets and rows with data
# let's read this data in to a DataFrame

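# the command below reads xlsx data into a DataFrame but does not pick up column labels
# df = DataFrame(XLSX.readdata("UNGM_UNSPSC_09-Apr-2022..xlsx", "UNSPSC", "A1:D12988"), :auto)
# note: the file name used here ("09-Apr", double dot) differs from the one opened above ("12-Apr");
# both must match the file you actually downloaded
dfUNSPSC = DataFrame(XLSX.readtable("sampleData/UNGM_UNSPSC_09-Apr-2022..xlsx", "UNSPSC")...)
# the ... operator splats the tuple (data, column_labels) into the DataFrame constructor
# (this is the XLSX.jl v0.7 return value; from v0.8 on, readtable returns a table that is passed to DataFrame without splatting)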

# replace missing values with an integer 99999
replace!(dfUNSPSC."Parent key", missing => 99999)
size(dfUNSPSC)

# let's export this clean csv, we'll load this into database
CSV.write("UNSPSC.csv", dfUNSPSC)

# # remember to empty dataFrame after usage
# # Julia will flush it out automatically after session,
# # but often ERP data gets bulky during session
# Base.summarysize(dfUNSPSC)
# empty!(dfUNSPSC)
# Base.summarysize(dfUNSPSC)

14-element Vector{String}:
 ".ipynb_checkpoints"
 "1-installation.ipynb"
 "FuzzyWuzzy_NLP.ipynb"
 "Image_Scanner_Reader_OCR.ipynb"
 "JuliaDataFrames-Part-1.ipynb"
 "JuliaDataFrames-Part-2.ipynb"
 "JuliaDataFrames-Part-3.ipynb"
 "SampleData"
 "UNSPSC.csv"
 "WebScrapper.ipynb"
 "Working_with_ORM-XML_JSON_Parser.ipynb"
 "images"
 "setup local machine, iPad, Andr" ⋯ 23 bytes ⋯ "ang Data Science computing.html"
 "setup local machine, iPad, Andr" ⋯ 22 bytes ⋯ "Lang Data Science computing.pdf"

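The `replace!` call in the UNSPSC cell swaps `missing` for a sentinel value in place. As a small sketch on a plain vector (the numbers are made up), here is that variant next to a broadcasted `coalesce`, which returns a new vector whose element type no longer allows `missing`.

```julia
# Toy "Parent key" column with missing values (made-up numbers)
parentKey = Union{Missing, Int}[10, missing, 30, missing]

# Option 1: in-place replacement on a copy; element type stays Union{Missing, Int}
replaced = replace!(copy(parentKey), missing => 99999)

# Option 2: broadcasted coalesce returns a new vector with a concrete Int element type
filled = coalesce.(parentKey, 99999)

println(replaced)
println(eltype(replaced), " vs ", eltype(filled))
```

Which one fits depends on whether later steps still need to distinguish "no parent" from a real key; the sentinel 99999 is the value used in the cell above.
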
In [2]:
##########
# GUDID ##
##########
# The complete list of GUDID Data Elements and descriptions can be found at this link.
# https://www.fda.gov/media/120974/download
# The complete GUDID Database (delimited version) download (250+MB)
# https://accessgudid.nlm.nih.gov/release_files/download/AccessGUDID_Delimited_Full_Release_20220401.zip
# let's extract all GUDID files in a folder
# readdir(pwd())
# readdir("sampleData/GUDID")
# since these files are in txt (delimited) format, we'll use the DelimitedFiles package

########################
## large txt files #####
## read one at a time ##
########################

# data, header = readdlm("sampleData/GUDID/contacts.txt", '|', header=true)
# dfGUDIDcontacts = DataFrame(data, vec(header))

# data, header = readdlm("sampleData/GUDID/identifiers.txt", '|', header=true)
# dfGUDIDidentifiers = DataFrame(data, vec(header))

data, header = readdlm("sampleData/GUDID/device.txt", '|', header=true)
dfGUDIDdevice = DataFrame(data, vec(header))

# # remember to empty dataFrame after usage
# # Julia will flush it out automatically after session,
# # but often ERP data gets bulky during session
# Base.summarysize(dfGUDIDcontacts),Base.summarysize(dfGUDIDidentifiers),Base.summarysize(dfGUDIDdevice)
# empty!(dfGUDIDcontacts)
# empty!(dfGUDIDidentifiers)
# empty!(dfGUDIDdevice)
# Base.summarysize(dfGUDIDcontacts),Base.summarysize(dfGUDIDidentifiers),Base.summarysize(dfGUDIDdevice)

| Row | PrimaryDI | publicDeviceRecordKey | publicVersionStatus | deviceRecordStatus |
| --- | --- | --- | --- | --- |
|  | Any | Any | Any | Any |
| 1 | 846468020071 | 3b9dc245-4402-48b5-aff0-8ae4187f46e5 | Update | Published |
| 2 | 846468020064 | ad12b359-bfe3-4c0d-88da-4ee898f60009 | Update | Published |
| 3 | 846468020057 | 56f01051-273c-43a2-9451-12d6468f1e11 | Update | Published |
| 4 | 846468020040 | d11bb977-56c4-413b-adad-f1183708e484 | Update | Published |
| 5 | 846468020033 | f65b67b6-c828-4923-b759-313875487489 | Update | Published |
| 6 | 846468020026 | e1a03cd1-dae1-4de3-a680-cb2f9bb1aaec | Update | Published |
| 7 | 846468020019 | 46d86100-7844-4626-92dc-9cf400c81f25 | Update | Published |
| 8 | 846468020002 | 40dbe60d-0f3f-422f-b7cd-4699c8d187e0 | Update | Published |
| 9 | 846468019990 | ca90fcf1-40d2-40f3-855f-e630a79ab4a0 | Update | Published |
| 10 | 846468019983 | a1ec6893-0254-43cf-a389-aabf664d46ee | Update | Published |
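
The GUDID cell relies on `readdlm(...; header=true)` returning a `(data, header)` pair in which the header is a 1 × n matrix, which is why it is flattened with `vec(header)` before being handed to `DataFrame`. A minimal sketch on an in-memory sample (the two columns and values are made up) shows the shapes involved.

```julia
using DelimitedFiles

# Tiny pipe-delimited sample standing in for device.txt (values are made up)
sample = """
PrimaryDI|deviceRecordStatus
111|Published
222|Published
"""

data, header = readdlm(IOBuffer(sample), '|', header=true)

println(size(data))     # shape of the body
println(size(header))   # header comes back as a one-row matrix
colnames = vec(header)  # flattened into a vector of column names
println(colnames)
```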


In [12]:
# dfGUDIDdevice has 3,308,327 rows (see size below),
# let's split it into 6 smaller files
# so that it can be loaded into an RDBMS easily
size(dfGUDIDdevice)
# CSV.write("dfGUDIDdevice_1.csv", dfGUDIDdevice[1:500000,:])
# CSV.write("dfGUDIDdevice_2.csv", dfGUDIDdevice[500001:1000000,:])
# CSV.write("dfGUDIDdevice_3.csv", dfGUDIDdevice[1000001:1500000,:])
# CSV.write("dfGUDIDdevice_4.csv", dfGUDIDdevice[1500001:2000000,:])
# CSV.write("dfGUDIDdevice_5.csv", dfGUDIDdevice[2000001:2500000,:])
# CSV.write("dfGUDIDdevice_6.csv", dfGUDIDdevice[2500001:3308327,:])

(3308327, 34)
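
The hard-coded row ranges above can also be generated. The sketch below uses `Iterators.partition` to build fixed-size row ranges; note that, unlike the six files above where the last file absorbs the remainder, this produces a seventh, smaller chunk. The `CSV.write` call is left commented out because it needs the loaded `dfGUDIDdevice`.

```julia
# Row ranges for fixed-size chunks; the last chunk holds the remainder
nrows = 3_308_327          # row count reported by size(dfGUDIDdevice)
chunkSize = 500_000
chunks = collect(Iterators.partition(1:nrows, chunkSize))

for (i, rows) in enumerate(chunks)
    println("dfGUDIDdevice_$(i).csv  rows ", first(rows), ":", last(rows))
    # CSV.write("dfGUDIDdevice_$(i).csv", dfGUDIDdevice[rows, :])
end
```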

In [24]:
##########
# GTIN ###
##########

# xf = XLSX.readxlsx("SampleData/DS_GTIN_ALL.xlsx")
# xf will display names of sheets and rows with data
# let's read this data in to a DataFrame

# the command below reads xlsx data into a DataFrame but does not pick up column labels
# df = DataFrame(XLSX.readdata("SampleData/DS_GTIN_ALL.xlsx", "Worksheet", "A14:E143403"), :auto)
# note: readtable treats first_row as the header row, so the record in row 14 becomes the column labels in the output below;
# pass header=false (or column_labels=[...]) if row 14 holds data
dfGTIN = DataFrame(XLSX.readtable("sampleData/DS_GTIN_ALL.xlsx", "Worksheet"; first_row=14)...)
# the ... operator splats the tuple (data, column_labels) into the DataFrame constructor

# replace missing values with an integer 99999
# replace!(dfUNSPSC."Parent key", missing => 99999)
# size(dfUNSPSC)

# let's export this clean csv, we'll load this into database
# CSV.write("UNSPSC.csv", dfUNSPSC)
# readdir(pwd())

# # remember to empty dataFrame after usage
# # Julia will flush it out automatically after session,
# # but often ERP data gets bulky during session
# Base.summarysize(dfGTIN)
# empty!(dfGTIN)
# Base.summarysize(dfGTIN)

| Row | 150621010 | 10603295507444 | ATTUNE FB TIB BASE SZ 10 POR | Orthopaedics | EA |
| --- | --- | --- | --- | --- | --- |
|  | Any | Any | Any | Any | Any |
| 1 | 254505412-12 | 10603295480709 | ATTUNE RP PS TRL SZ 9 7MM | Orthopaedics | EA |
| 2 | L20409-13 | 10603295258117 | BROACH CORAIL AMT 9 | Orthopaedics | EA |
| 3 | 257004050-12 | 10603295490449 | C-STEM AMT SZ0-1 HI NECK | Orthopaedics | EA |
| 4 | 254505411-12 | 10603295480693 | ATTUNE RP PS TRL SZ 9 6MM | Orthopaedics | EA |
| 5 | L20416-13 | 10603295258186 | BROACH CORAIL AMT 16 | Orthopaedics | EA |
| 6 | 257004000-12 | 10603295490432 | C-STEM AMT SZ0-1 STD NECK | Orthopaedics | EA |
| 7 | 254505410-12 | 10603295480648 | ATTUNE RP PS TRL SZ 9 5MM | Orthopaedics | EA |
| 8 | L20414-13 | 10603295258162 | BROACH CORAIL AMT 14 | Orthopaedics | EA |
| 9 | 257004300-12 | 10603295490494 | C-STEM AMT 6-7 STD NECK | Orthopaedics | EA |
| 10 | 254505413-12 | 10603295480716 | ATTUNE RP PS TRL SZ 9 8MM | Orthopaedics | EA |
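
A GTIN ends in a check digit defined by GS1, so the 14-digit values in the second column can be sanity-checked before loading them into a database. The sketch below is one way to do that in plain Julia. The function names `gtin_check_digit` and `is_valid_gtin` are illustrative and not part of any package used in this notebook.

```julia
# GS1 check digit: weights 3, 1, 3, 1, ... are applied to the data digits
# starting from the rightmost one; the check digit brings the sum to a multiple of 10.
function gtin_check_digit(data::AbstractString)
    total = 0
    for (i, c) in enumerate(reverse(data))
        total += parse(Int, c) * (isodd(i) ? 3 : 1)
    end
    return mod(10 - mod(total, 10), 10)
end

# A code is accepted when it is 8, 12, 13 or 14 digits long and its last digit
# equals the check digit computed from the preceding digits.
function is_valid_gtin(code::AbstractString)
    (length(code) in (8, 12, 13, 14) && all(isdigit, code)) || return false
    return gtin_check_digit(code[1:end-1]) == parse(Int, code[end])
end

println(is_valid_gtin("10603295507444"))   # value taken from the output above
println(is_valid_gtin("10603295507445"))   # same code with the last digit altered
```

Since the columns above are typed `Any`, values read from the sheet would need `string(...)` applied before being passed to `is_valid_gtin`.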


In [25]:
##########
# GMDN ###
##########

## GMDN data is not available

# # remember to empty dataFrame after usage
# # Julia will flush it out automatically after session,
# # but often ERP data gets bulky during session
# Base.summarysize(dfGMDN)
# empty!(dfGMDN)
# Base.summarysize(dfGMDN)

## Vendor Master

In [54]:
#################
# Vendor master #
#################
# create Vendor Master from GUDID dataset
# show(first(dfGUDIDdevice,5), allcols=true)
# show(first(dfGUDIDdevice[:,[:brandName, :catalogNumber, :dunsNumber, :companyName, :rx, :otc]],5), allcols=true)
# names(dfGUDIDdevice)
# dfVendor = unique(dfGUDIDdevice[:,[:brandName, :catalogNumber, :dunsNumber, :companyName, :rx, :otc]])
# dfVendor = unique(dfGUDIDdevice[:,[:companyName]]) # 7574 unique vendors
dfVendor = unique(dfGUDIDdevice[:,[:brandName, :dunsNumber, :companyName, :rx, :otc]])
# dfVendor is a good dataset: it has 216k rows for 7574 unique vendors

# # remember to empty dataFrame after usage
# # Julia will flush it out automatically after session,
# # but often ERP data gets bulky during session
# Base.summarysize(dfVendor)
# empty!(dfVendor)
# Base.summarysize(dfVendor)

| Row | brandName | dunsNumber | companyName |
| --- | --- | --- | --- |
|  | Any | Any | Any |
| 1 | QUANTUM™* SPINAL FIXATION SYSTEM | 793384496 | Pioneer Surgical Technology, Inc. |
| 2 | BULLET-TIP PEEK IBF SYSTEM | 793384496 | Pioneer Surgical Technology, Inc. |
| 3 | Streamline® TL Spinal Fixation System | 793384496 | Pioneer Surgical Technology, Inc. |
| 4 | Cequence Anterior Cervical Plate | 793384496 | Pioneer Surgical Technology, Inc. |
| 5 | CEQUENCE | 793384496 | Pioneer Surgical Technology, Inc. |
| 6 | Cequence | 793384496 | Pioneer Surgical Technology, Inc. |
| 7 | STREAMLINE®* TL SPINAL FIXATION SYSTEM | 793384496 | Pioneer Surgical Technology, Inc. |
| 8 | Streamline TL Spinal Fixation System | 793384496 | Pioneer Surgical Technology, Inc. |
| 9 | Songer® Spinal Cable System | 793384496 | Pioneer Surgical Technology, Inc. |
| 10 | Streamline® MIS Spinal Fixation System | 793384496 | Pioneer Surgical Technology, Inc. |
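
The sample rows show why `unique` leaves so many rows per vendor: `CEQUENCE` and `Cequence` are different strings, so they count as different brands. If that distinction is not wanted, one possible approach (an assumption about the desired cleanup, not something the notebook does) is to normalize case and whitespace before de-duplicating.

```julia
# Brand names copied from the sample output above
brands = ["CEQUENCE", "Cequence", "Cequence Anterior Cervical Plate"]

# Normalize: trim surrounding whitespace and upper-case before de-duplicating.
# `normalize_brand` is a hypothetical helper name.
normalize_brand(s) = uppercase(strip(s))

rawCount = length(unique(brands))
normalized = unique(normalize_brand.(brands))

println(rawCount, " raw brands, ", length(normalized), " after normalization")
println(normalized)
```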


## Location Master

In [4]:
data, header = readdlm("sampleData/uscities.csv", ',', header=true)
dfLocation = DataFrame(data, vec(header))

# # remember to empty dataFrame after usage
# # Julia will flush it out automatically after session,
# # but often ERP data gets bulky during session
# Base.summarysize(dfLocation)
# empty!(dfLocation)
# Base.summarysize(dfLocation)

| Row | city | city_ascii | state_id | state_name | county_fips | county_name |
| --- | --- | --- | --- | --- | --- | --- |
|  | Any | Any | Any | Any | Any | Any |
| 1 | New York | New York | NY | New York | 36061 | New York |
| 2 | Los Angeles | Los Angeles | CA | California | 6037 | Los Angeles |
| 3 | Chicago | Chicago | IL | Illinois | 17031 | Cook |
| 4 | Miami | Miami | FL | Florida | 12086 | Miami-Dade |
| 5 | Dallas | Dallas | TX | Texas | 48113 | Dallas |
| 6 | Philadelphia | Philadelphia | PA | Pennsylvania | 42101 | Philadelphia |
| 7 | Houston | Houston | TX | Texas | 48201 | Harris |
| 8 | Atlanta | Atlanta | GA | Georgia | 13121 | Fulton |
| 9 | Washington | Washington | DC | District of Columbia | 11001 | District of Columbia |
| 10 | Boston | Boston | MA | Massachusetts | 25025 | Suffolk |
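
County FIPS codes are five digits, and the value for Los Angeles shows as `6037` above because the leading zero of `06037` is lost when the field is parsed as a number. A minimal sketch of restoring the fixed width, using three values from the output above:

```julia
# county_fips values as they appear after parsing (leading zero lost for California)
countyFips = [36061, 6037, 17031]

# Convert to strings and left-pad with zeros to the 5-digit FIPS width
padded = lpad.(string.(countyFips), 5, '0')
println(padded)
```

Keeping the code as a string also avoids accidental arithmetic on what is really an identifier.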


In [36]:
readdir("sampleData/GUDID")

9-element Vector{String}:
 "contacts.txt"
 "device.txt"
 "deviceSizes.txt"
 "environmentalConditions.txt"
 "gmdnTerms.txt"
 "identifiers.txt"
 "premarketSubmissions.txt"
 "productCodes.txt"
 "sterilizationMethodTypes.txt"

## Organization Master

In [5]:
dfOrgMaster = DataFrame(
    ENTITY=repeat(["HeadOffice"], inner=8),
    GROUP=repeat(["Operations"], inner=8),
    DEPARTMENT=["Procurement","Procurement","Procurement","Procurement","Procurement","HR","HR","MFG"],
    UNIT=["Sourcing","Sourcing","Maintenance","Support","Services","Helpdesk","ServiceCall","IT"])

| Row | ENTITY | GROUP | DEPARTMENT | UNIT |
| --- | --- | --- | --- | --- |
|  | String | String | String | String |
| 1 | HeadOffice | Operations | Procurement | Sourcing |
| 2 | HeadOffice | Operations | Procurement | Sourcing |
| 3 | HeadOffice | Operations | Procurement | Maintenance |
| 4 | HeadOffice | Operations | Procurement | Support |
| 5 | HeadOffice | Operations | Procurement | Services |
| 6 | HeadOffice | Operations | HR | Helpdesk |
| 7 | HeadOffice | Operations | HR | ServiceCall |
| 8 | HeadOffice | Operations | MFG | IT |


--- 

## creating complete Supply Chain Data Model DataFrames
Now that we have created the Supply Chain attributes (chartfields/dimensions):

- item master
- vendor master
- location master
- org Hierarchy

let's use the chartfields above to create the following Supply Chain transactions:

-   MSR - Material Service request
-   PurchaseOrder
-   Voucher
-   Invoice
-   Receipt
-   Shipment
-   Sales, Revenue
-   Travel, Expense, TimeCard
-   Accounting Lines

## MSR - Material Service request

In [38]:
sampleSize = 1000 # number of rows, scale as needed

dfMSR = DataFrame(
    UNIT = rand(dfOrgMaster.UNIT, sampleSize),
    MSR_DATE=rand(collect(Date(2020,1,1):Day(1):Date(2022,5,1)), sampleSize),
    FROM_UNIT = rand(dfOrgMaster.UNIT, sampleSize),
    TO_UNIT = rand(dfOrgMaster.UNIT, sampleSize),
    GUDID = rand(dfGUDIDdevice.PrimaryDI, sampleSize),
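    # QTY is an integer quantity, as in the other transaction tables;
    # the sample output shown below was captured when QTY was drawn from dfOrgMaster.UNIT
    QTY = rand(1:150, sampleSize));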
first(dfMSR, 5)

| Row | UNIT | MSR_DATE | FROM_UNIT | TO_UNIT | GUDID | QTY |
| --- | --- | --- | --- | --- | --- | --- |
|  | String | Date | String | String | Any | String |
| 1 | Sourcing | 2020-08-12 | Helpdesk | Sourcing | 8806189833680 | Support |
| 2 | Sourcing | 2022-03-09 | Helpdesk | IT | 24026704518511 | IT |
| 3 | Services | 2020-09-28 | Sourcing | Maintenance | 15019517261912 | ServiceCall |
| 4 | IT | 2020-05-11 | Support | Sourcing | 190746044397 | Services |
| 5 | Services | 2021-03-02 | Sourcing | Sourcing | 650551143666 | ServiceCall |
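
Because every transaction table here is built with `rand`, each run of the notebook produces different rows. If repeatable sample data is wanted, one option is to pass an explicitly seeded random number generator to `rand`. The sketch below assumes a `MersenneTwister` with an arbitrary seed and a three-unit subset of `dfOrgMaster.UNIT`.

```julia
using Random, Dates

dates = collect(Date(2020,1,1):Day(1):Date(2022,5,1))
units = ["Sourcing", "Maintenance", "Support"]    # subset of dfOrgMaster.UNIT

# Draw from a generator created with a fixed seed (the seed value is arbitrary)
rngA = MersenneTwister(2022)
sampleA = (rand(rngA, units, 5), rand(rngA, dates, 5))

# A second generator with the same seed repeats the same draws
rngB = MersenneTwister(2022)
sampleB = (rand(rngB, units, 5), rand(rngB, dates, 5))

println(sampleA == sampleB)
```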


## Purchase Order

In [72]:
sampleSize = 1000 # number of rows, scale as needed

dfPO = DataFrame(
    UNIT = rand(dfOrgMaster.UNIT, sampleSize),
    PO_DATE=rand(collect(Date(2020,1,1):Day(1):Date(2022,5,1)), sampleSize),
    VENDOR=rand(unique(dfVendor.companyName), sampleSize),
    GUDID = rand(dfGUDIDdevice.PrimaryDI, sampleSize),
    QTY = rand(1:150, sampleSize),
    UNIT_PRICE = rand(Normal(100, 2), sampleSize)
    );
show(first(dfPO, 5),allcols=true)

5×6 DataFrame
 Row │ UNIT         PO_DATE     VENDOR                    GUDID           QTY    UNIT_PRICE
     │ String       Date        Any                       Any             Int64  Float64
─────┼──────────────────────────────────────────────────────────────────────────────────────
   1 │ ServiceCall  2020-12-08  Nisco Co.,Ltd             4019702889710      39    101.154
   2 │ Sourcing     2021-11-22  ALYK, INC.                613994826619       33     96.4384
   3 │ Helpdesk     2020-12-15  Tribofilm Research, Inc.  699753511200       35     98.1189
   4 │ ServiceCall  2022-01-19  FIRST CALL, INC.          4035324027330     112     99.62
   5 │ Sourcing     2020-05-04  APIRA SCIENCE, INC.       10885862269799    110     97.3548
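
With `QTY` and `UNIT_PRICE` in place, a natural next step is a line amount and a total per unit. The sketch below uses a small hand-made DataFrame rather than `dfPO`, so the values are illustrative; the column names `AMOUNT` and `TOTAL_AMOUNT` are assumptions.

```julia
using DataFrames

# Hand-made purchase order lines (values are illustrative, not from dfPO)
po = DataFrame(UNIT = ["Sourcing", "Sourcing", "IT"],
               QTY = [2, 3, 4],
               UNIT_PRICE = [10.0, 20.0, 2.5])

# Line amount = quantity × unit price, rounded to two decimals
transform!(po, [:QTY, :UNIT_PRICE] => ByRow((q, p) -> round(q * p; digits=2)) => :AMOUNT)

# Total amount per unit
byUnit = combine(groupby(po, :UNIT), :AMOUNT => sum => :TOTAL_AMOUNT)
println(byUnit)
```

The same two calls should apply unchanged to `dfPO`, since it carries the same three columns.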

## Voucher Invoice

In [78]:
sampleSize = 1000 # number of rows, scale as needed

dfVCHR = DataFrame(
    UNIT = rand(dfOrgMaster.UNIT, sampleSize),
    VCHR_DATE=rand(collect(Date(2020,1,1):Day(1):Date(2022,5,1)), sampleSize),
    STATUS=rand(["Closed","Paid","Open","Cancelled","Exception"], sampleSize),
    VENDOR_INVOICE_NUM = rand(10001:9999999, sampleSize),
    VENDOR=rand(unique(dfVendor.companyName), sampleSize),
    GUDID = rand(dfGUDIDdevice.PrimaryDI, sampleSize),
    QTY = rand(1:150, sampleSize),
    UNIT_PRICE = rand(Normal(100, 2), sampleSize)
    );
show(first(dfVCHR, 5),allcols=true)

5×8 DataFrame
 Row │ UNIT         VCHR_DATE   STATUS     VENDOR_INVOICE_NUM  VENDOR                            GUDID           QTY    UNIT_PRICE
     │ String       Date        String     Int64               Any                               Any             Int64  Float64
─────┼─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
   1 │ Maintenance  2020-08-17  Paid                  8914668  HEALTHDENT'L, LLC                 M684EIC7S17R1     108    101.018
   2 │ Services     2021-08-07  Open                  2740714  CERECOR INC.                      3596010562425     116     98.2086
   3 │ Maintenance  2021-10-12  Paid                  9911763  PRODUITS DENTAIRES PIERRE ROLLAN  15019517251111     21     99.5375
   4 │ Sourcing     2020-01-12  Paid  
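
Note that `rand(10001:9999999, sampleSize)` samples with replacement, so two vouchers can occasionally receive the same `VENDOR_INVOICE_NUM`. If distinct numbers matter for later joins, one simple option is to reject repeats while drawing. The helper `unique_ids` below is hypothetical and only a sketch; it suits cases where the number of draws is small relative to the range.

```julia
using Random

# Draw n distinct integers from `range` by rejecting repeats.
# `unique_ids` is a hypothetical helper, not part of any package used here.
function unique_ids(rng::AbstractRNG, range::AbstractUnitRange{Int}, n::Integer)
    n <= length(range) ||
        throw(ArgumentError("cannot draw $n distinct values from $(length(range))"))
    seen = Set{Int}()
    ids = Int[]
    while length(ids) < n
        id = rand(rng, range)
        if !(id in seen)
            push!(seen, id)
            push!(ids, id)
        end
    end
    return ids
end

invoiceNums = unique_ids(MersenneTwister(78), 10001:9999999, 1000)
println(length(invoiceNums), " ids, all distinct: ", allunique(invoiceNums))
```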

## SALES

In [80]:
sampleSize = 1000 # number of rows, scale as needed

dfREVENUE = DataFrame(
    UNIT = rand(dfOrgMaster.UNIT, sampleSize),
    SALES_DATE=rand(collect(Date(2020,1,1):Day(1):Date(2022,5,1)), sampleSize),
    STATUS=rand(["Sold","Pending","Hold","Cancelled","Exception"], sampleSize),
    SALES_RECEIPT_NUM = rand(10001:9999999, sampleSize),
    CUSTOMER=rand(unique(dfVendor.companyName), sampleSize),
    GUDID = rand(dfGUDIDdevice.PrimaryDI, sampleSize),
    QTY = rand(1:150, sampleSize),
    UNIT_PRICE = rand(Normal(100, 2), sampleSize)
    );
show(first(dfREVENUE, 5),allcols=true)

5×8 DataFrame
 Row │ UNIT         SALES_DATE  STATUS     SALES_RECEIPT_NUM  CUSTOMER                           GUDID             QTY    UNIT_PRICE
     │ String       Date        String     Int64              Any                                Any               Int64  Float64
─────┼───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
   1 │ Support      2021-04-03  Pending              4775538  CUSTOM ORTHOPAEDIC SOLUTIONS, IN…  380652274807         80    100.989
   2 │ IT           2021-06-15  Cancelled            8054668  PaloDEx Group Oy                   10653160319528       83     98.3144
   3 │ ServiceCall  2021-09-16  Exception            5556389  AUS SYSTEMS PTY. LIMITED.          652221017635         46    100.07
   4 │ IT           2021-10-

## SHIPMENT, RECEIPT

In [82]:
sampleSize = 1000 # number of rows, scale as needed

dfSHIPRECEIPT = DataFrame(
    UNIT = rand(dfOrgMaster.UNIT, sampleSize),
    SHIP_DATE=rand(collect(Date(2020,1,1):Day(1):Date(2022,5,1)), sampleSize),
    STATUS=rand(["Shipped","Returned","In process","Cancelled","Exception"], sampleSize),
    SHIPMENT_NUM = rand(10001:9999999, sampleSize),
    CUSTOMER=rand(unique(dfVendor.companyName), sampleSize),
    GUDID = rand(dfGUDIDdevice.PrimaryDI, sampleSize),
    QTY = rand(1:150, sampleSize),
    UNIT_PRICE = rand(Normal(100, 2), sampleSize)
    );
show(first(dfSHIPRECEIPT, 5),allcols=true)

5×8 DataFrame
 Row │ UNIT         SHIP_DATE   STATUS     SHIPMENT_NUM  CUSTOMER                  GUDID              QTY    UNIT_PRICE
     │ String       Date        String     Int64         Any                       Any                Int64  Float64
─────┼──────────────────────────────────────────────────────────────────────────────────────────────────────────────────
   1 │ ServiceCall  2022-04-26  Shipped         3617596  ARUM DENTISTRY Co., Ltd.  M572TPS020143N011     14    101.686
   2 │ ServiceCall  2022-02-28  Returned        9763965  EnviteC-Wismar GmbH       8806344370869         53     98.116
   3 │ Support      2020-06-18  Exception       6876443  Xvivo Perfusion AB        8800057255178         10     96.342
   4 │ Sourcing     2021-08-28  Returned        2856203  Airway Company, The       840159910191          6