# ML4GL: Machine Learning for General Ledger using the Julia Language

This is **Part 2** of 3 ERP data analysis notebooks.
- Part 1 - General Ledger, Data Science Basics
- Part 2 - General Ledger Data Analysis & Visualization
- Part 3 - P2P (Procure to Pay) Data Analysis & Visualization

**Related blogs:**
    
- [Web-scrapping, Web automation using Julia Language](https://amit-shukla.medium.com/web-scrapping-web-automation-using-julia-language-2c473db84fbc)
- Working with ODBC, ORM, XML, JSON, PDF, TXT, CSV, XLS
- Working with PDF documents, Image Scanner, OCR Reader

**Target Audience:** This notebook is meant for ERP consultants, IT developers, Finance, Supply chain, HR & CRM managers, executive leaders, or anyone curious to implement data science concepts in the ERP space.

+ **Author:** Amit Shukla
+ **Contact:** info@elishconsulting.com

In Part 1 of this three-notebook series, we covered the basics of the ERP Finance data model and learned the basics of the DataFrames.jl package to perform ERP data analysis.

In this Part 2 notebook, we continue to analyze Finance data. We will cover visualization, what-if analysis, and machine learning and statistical modeling to predict an organization's financial growth in terms of its financial statements (Balance Sheet, Income Statement and Cash Flow Statement).

## adding Packages

In [80]:
using Pkg
Pkg.add("DataFrames")
Pkg.add("Dates")
Pkg.add("CategoricalArrays")
Pkg.add("Interact")
Pkg.add("WebIO")
Pkg.build("WebIO")
using DataFrames, Dates, Interact, CategoricalArrays, WebIO
Pkg.status();

[32m[1m      Status[22m[39m `~/.julia/environments/v1.7/Project.toml`
 [90m [54eefc05] [39mCascadia v1.0.1
 [90m [324d7699] [39mCategoricalArrays v0.10.5
 [90m [a93c6f00] [39mDataFrames v1.3.2
 [90m [8f5d6c58] [39mEzXML v1.1.0
 [90m [708ec375] [39mGumbo v0.8.0
 [90m [cd3eb016] [39mHTTP v0.9.17
 [90m [7073ff75] [39mIJulia v1.23.2
 [90m [c601a237] [39mInteract v0.10.4
 [90m [0f8b85d8] [39mJSON3 v1.9.4
 [90m [b9914132] [39mJSONTables v1.0.3
 [90m [4d0d745f] [39mPDFIO v0.1.13
 [90m [c3e4b0f8] [39mPluto v0.18.4
 [90m [2dfb63ee] [39mPooledArrays v1.4.0
 [90m [88034a9c] [39mStringDistances v0.11.2
 [90m [a2db99b7] [39mTextAnalysis v0.7.3
 [90m [05625dda] [39mWebDriver v0.1.2
 [90m [0f1e0344] [39mWebIO v0.8.17
 [90m [ade2ca70] [39mDates


[32m[1m   Resolving[22m[39m package versions...
[32m[1m  No Changes[22m[39m to `~/.julia/environments/v1.7/Project.toml`
[32m[1m  No Changes[22m[39m to `~/.julia/environments/v1.7/Manifest.toml`
[32m[1m   Resolving[22m[39m package versions...
[32m[1m  No Changes[22m[39m to `~/.julia/environments/v1.7/Project.toml`
[32m[1m  No Changes[22m[39m to `~/.julia/environments/v1.7/Manifest.toml`
[32m[1m   Resolving[22m[39m package versions...
[32m[1m  No Changes[22m[39m to `~/.julia/environments/v1.7/Project.toml`
[32m[1m  No Changes[22m[39m to `~/.julia/environments/v1.7/Manifest.toml`
[32m[1m   Resolving[22m[39m package versions...
[32m[1m  No Changes[22m[39m to `~/.julia/environments/v1.7/Project.toml`
[32m[1m  No Changes[22m[39m to `~/.julia/environments/v1.7/Manifest.toml`
[32m[1m   Resolving[22m[39m package versions...
[32m[1m  No Changes[22m[39m to `~/.julia/environments/v1.7/Project.toml`
[32m[1m  No Changes[22m[39m to `~/.juli

*For the rest of this blog, I assume you have added all packages and imported them into the current namespace/notebook scope.*

--- 
## Finance Data Model
We already covered DataFrames and the ERP Finance data model in the Part 1 notebook. In the section below, let's recreate all Finance DataFrames so we can continue with advanced analytics and visualization.

In [81]:
# create dummy data
accountsDF = DataFrame(
    ENTITY = "Apple Inc.",
    AS_OF_DATE=Date("1900-01-01", dateformat"y-m-d"),
    ID = 11000:1000:45000,
    CLASSIFICATION=repeat([
        "OPERATING_EXPENSES","NON-OPERATING_EXPENSES", "ASSETS","LIABILITIES","NET_WORTH","STATISTICS","REVENUE"
                ], inner=5),
    CATEGORY=[
        "Travel","Payroll","non-Payroll","Allowance","Cash",
        "Facility","Supply","Services","Investment","Misc.",
        "Depreciation","Gain","Service","Retired","Fault.",
        "Receipt","Accrual","Return","Credit","ROI",
        "Cash","Funds","Invest","Transfer","Roll-over",
        "FTE","Members","Non_Members","Temp","Contractors",
        "Sales","Merchant","Service","Consulting","Subscriptions"],
    STATUS="A",
    DESCR=repeat([
    "operating expenses","non-operating expenses","assets","liability","net-worth","stats","revenue"], inner=5),
    ACCOUNT_TYPE=repeat(["E","E","A","L","N","S","R"],inner=5));

# DEPARTMENT Chartfield
deptDF = DataFrame(
    AS_OF_DATE=Date("2000-01-01", dateformat"y-m-d"), 
    ID = 1100:100:1500,
    CLASSIFICATION=["SALES","HR", "IT","BUSINESS","OTHERS"],
    CATEGORY=["sales","human_resource","IT_Staff","business","others"],
    STATUS="A",
    DESCR=[
    "Sales & Marketing","Human Resource","Information Technology","Business leaders","other temp"
        ],
    DEPT_TYPE=["S","H","I","B","O"]);

# LOCATION Chartfield
locationDF = DataFrame(
    AS_OF_DATE=Date("2000-01-01", dateformat"y-m-d"), 
    ID = 11:1:22,
    CLASSIFICATION=repeat([
        "Region A","Region B", "Region C"], inner=4),
    CATEGORY=repeat([
        "Region A","Region B", "Region C"], inner=4),
    STATUS="A",
    DESCR=[
"Boston","New York","Philadelphia","Cleveland","Richmond",
"Atlanta","Chicago","St. Louis","Minneapolis","Kansas City",
"Dallas","San Francisco"],
    LOC_TYPE="Physical");

# creating Ledger
ledgerDF = DataFrame(
            LEDGER = String[], FISCAL_YEAR = Int[], PERIOD = Int[], ORGID = String[],
            OPER_UNIT = String[], ACCOUNT = Int[], DEPT = Int[], LOCATION = Int[],
            POSTED_TOTAL = Float64[]
            );

# create 2020 Period 1-12 Actuals Ledger 
l = "Actuals";
fy = 2020;
for p = 1:12
    for i = 1:10^5
        push!(ledgerDF, (l, fy, p, "ABC Inc.", rand(locationDF.CATEGORY),
            rand(accountsDF.ID), rand(deptDF.ID), rand(locationDF.ID), rand()*10^8))
    end
end

# create 2021 Period 1-4 Actuals Ledger 
l = "Actuals";
fy = 2021;
for p = 1:4
    for i = 1:10^5
        push!(ledgerDF, (l, fy, p, "ABC Inc.", rand(locationDF.CATEGORY),
            rand(accountsDF.ID), rand(deptDF.ID), rand(locationDF.ID), rand()*10^8))
    end
end

# create 2021 Period 1-12 Budget Ledger 
l = "Budget";
fy = 2021;
for p = 1:12
    for i = 1:10^5
        push!(ledgerDF, (l, fy, p, "ABC Inc.", rand(locationDF.CATEGORY),
            rand(accountsDF.ID), rand(deptDF.ID), rand(locationDF.ID), rand()*10^8))
    end
end

# the ledger DataFrame now has 2.8 million rows (28 period loads of 10^5 rows each)
size(ledgerDF)

# rename dimensions columns for innerjoin
df_accounts = rename(accountsDF, :ID => :ACCOUNTS_ID, :CLASSIFICATION => :ACCOUNTS_CLASSIFICATION, 
    :CATEGORY => :ACCOUNTS_CATEGORY, :DESCR => :ACCOUNTS_DESCR);
df_dept = rename(deptDF, :ID => :DEPT_ID, :CLASSIFICATION => :DEPT_CLASSIFICATION, 
    :CATEGORY => :DEPT_CATEGORY, :DESCR => :DEPT_DESCR);
df_location = rename(locationDF, :ID => :LOCATION_ID, :CLASSIFICATION => :LOCATION_CLASSIFICATION,
    :CATEGORY => :LOCATION_CATEGORY, :DESCR => :LOCATION_DESCR);

# join Ledger accounts chartfield with accounts chartfield dataframe to pull all accounts fields
# join Ledger dept chartfield with dept chartfield dataframe to pull all dept fields
# join Ledger location chartfield with location chartfield dataframe to pull all location fields
df_ledger = innerjoin(
                innerjoin(
                    innerjoin(ledgerDF, df_accounts, on = [:ACCOUNT => :ACCOUNTS_ID], makeunique=true),
                    df_dept, on = [:DEPT => :DEPT_ID], makeunique=true), df_location,
                on = [:LOCATION => :LOCATION_ID], makeunique=true);

# note how the joined ledger DataFrame (df_ledger) now has 28 columns, including all joined chartfield columns
size(df_accounts),size(df_dept),size(df_location), size(ledgerDF), size(df_ledger)

function periodToQtr(x)
    if x ∈ 1:3
        return 1
    elseif x ∈ 4:6
        return 2
    elseif x ∈ 7:9
        return 3
    else return 4
    end
end

# now we will use this function to transform a new column
transform!(df_ledger, :PERIOD => ByRow(periodToQtr) => :QTR)

# let's create one more generic function, which converts a number to USD currency
function numToCurrency(x)
        return string("USD ",round(x/10^6; digits = 2), " million")
end

transform!(df_ledger, :POSTED_TOTAL => ByRow(numToCurrency) => :TOTAL)
df_ledger[1:5,["POSTED_TOTAL","TOTAL"]]
"df_ledger_size after transformation is: ", size(df_ledger)

("df_ledger_size after transformation is: ", (2800000, 30))
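The two helper functions above can also be written more compactly. The following is a minimal sketch, not the notebook's own code: the names `period_to_qtr` and `num_to_currency` are illustrative, and the quarter helper assumes periods run from 1 to 12 (unlike `periodToQtr`, it does not fold larger period numbers into quarter 4).

```julia
# Map a fiscal period (1-12) to its quarter using ceiling division:
# periods 1-3 -> 1, 4-6 -> 2, 7-9 -> 3, 10-12 -> 4.
period_to_qtr(p::Integer) = cld(p, 3)

# Format an amount as USD millions, rounded to two decimals.
num_to_currency(x::Real) = string("USD ", round(x / 10^6; digits = 2), " million")

quarters = period_to_qtr.(1:12)          # broadcast over all twelve periods
label = num_to_currency(12_345_678.9)    # a made-up amount for illustration
```

Both functions can be passed to `ByRow` in `transform!` in the same way as the originals.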

## GL BalanceSheet, IncomeStatement & CashFlow
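The interactive cells in this section filter a DataFrame named `gdf_plot`, which is not defined anywhere in this notebook. Because later cells divide its `TOTAL` column by a number, it is presumably a grouped summary of the ledger with a numeric total. The sketch below shows one way such a summary could be built; the grouping columns, the made-up rows and the name `gdf_demo` are assumptions, not the author's definition.

```julia
using DataFrames

# Tiny stand-in for df_ledger (made-up rows, same column names as the notebook).
df_demo = DataFrame(
    LEDGER = ["Actuals", "Actuals", "Budget"],
    FISCAL_YEAR = [2021, 2021, 2021],
    QTR = [2, 2, 2],
    OPER_UNIT = ["Region B", "Region B", "Region B"],
    ACCOUNTS_CLASSIFICATION = ["ASSETS", "ASSETS", "ASSETS"],
    DEPT_CLASSIFICATION = ["IT", "IT", "IT"],
    LOCATION_DESCR = ["Boston", "Boston", "Boston"],
    POSTED_TOTAL = [100.0, 50.0, 70.0],
)

# Assumed definition: one row per ledger/year/quarter/region/account/dept/location,
# with POSTED_TOTAL summed into a numeric TOTAL column.
group_cols = [:LEDGER, :FISCAL_YEAR, :QTR, :OPER_UNIT,
              :ACCOUNTS_CLASSIFICATION, :DEPT_CLASSIFICATION, :LOCATION_DESCR]
gdf_demo = combine(groupby(df_demo, group_cols), :POSTED_TOTAL => sum => :TOTAL)
```

On the real data, the same `combine(groupby(...))` call would be applied to `df_ledger` instead of `df_demo`.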

### Balance Sheet (Interactive)

In [83]:
@manipulate for ld = Dict("Actuals"=> "Actuals", "Budget" => "Budget"), 
                rg = Dict("Region A"=> "Region A", "Region B" => "Region B", "Region C" => "Region C"),
                yr = slider(2020:1:2022; value=2021),
                qtr = 1:1:4
    
    @show ld, rg, yr, qtr
    
select(gdf_plot[(
    (gdf_plot.FISCAL_YEAR .== yr)
    .&
    (gdf_plot.QTR .== qtr)
    .&
    (gdf_plot.LEDGER .== ld)
    .&
    (gdf_plot.OPER_UNIT .== rg)
    ),:],
        :OPER_UNIT => :Org,
        :FISCAL_YEAR => :FY,
        :QTR => :Qtr,
        :ACCOUNTS_CLASSIFICATION => :Accounts,
        :DEPT_CLASSIFICATION => :Dept,
        # :LOCATION_CLASSIFICATION => :Region,
        :LOCATION_DESCR => :Loc,
        :TOTAL => :TOTAL)
end

(ld, rg, yr, qtr) = ("Actuals", "Region B", 2021, 2)




### Income Statement (Interactive)

In [74]:
@manipulate for ld = Dict("Actuals"=> "Actuals", "Budget" => "Budget"), 
                rg = Dict("Region A"=> "Region A", "Region B" => "Region B", "Region C" => "Region C"),
                yr = slider(2020:1:2022; value=2021),
                qtr = 1:1:4
    
    @show ld, rg, yr, qtr
    
select(gdf_plot[(
    (gdf_plot.FISCAL_YEAR .== yr)
    .&
    (gdf_plot.QTR .== qtr)
    .&
    (gdf_plot.LEDGER .== ld)
    .&
    (gdf_plot.OPER_UNIT .== rg)
    .&
    (in.(gdf_plot.ACCOUNTS_CLASSIFICATION, Ref(["ASSETS", "LIABILITIES", "REVENUE","NET_WORTH"])))
    ),:],
        :OPER_UNIT => :Org,
        :FISCAL_YEAR => :FY,
        :QTR => :Qtr,
        :ACCOUNTS_CLASSIFICATION => :Accounts,
        :DEPT_CLASSIFICATION => :Dept,
        # :LOCATION_CLASSIFICATION => :Region,
        :LOCATION_DESCR => :Loc,
        :TOTAL => :TOTAL)
end

(ld, rg, yr, qtr) = ("Actuals", "Region B", 2021, 2)


### Cash Flow Statement (Interactive)

In [75]:
@manipulate for ld = Dict("Actuals"=> "Actuals", "Budget" => "Budget"), 
                rg = Dict("Region A"=> "Region A", "Region B" => "Region B", "Region C" => "Region C"),
                yr = slider(2020:1:2022; value=2021),
                qtr = 1:1:4
    
    @show ld, rg, yr, qtr
    
select(gdf_plot[(
    (gdf_plot.FISCAL_YEAR .== yr)
    .&
    (gdf_plot.QTR .== qtr)
    .&
    (gdf_plot.LEDGER .== ld)
    .&
    (gdf_plot.OPER_UNIT .== rg)
    .&
    (in.(gdf_plot.ACCOUNTS_CLASSIFICATION, Ref(["NON-OPERATING_EXPENSES","OPERATING_EXPENSES"])))
    ),:],
        :OPER_UNIT => :Org,
        :FISCAL_YEAR => :FY,
        :QTR => :Qtr,
        :ACCOUNTS_CLASSIFICATION => :Accounts,
        :DEPT_CLASSIFICATION => :Dept,
        # :LOCATION_CLASSIFICATION => :Region,
        :LOCATION_DESCR => :Loc,
        :TOTAL => :TOTAL)
end

(ld, rg, yr, qtr) = ("Actuals", "Region B", 2021, 2)
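The row filters in the three statement cells combine element-wise comparisons with `.&` and use `in.(..., Ref(...))` to test membership in a list. The following sketch shows the same pattern on plain vectors; the vectors are made-up stand-ins for the `gdf_plot` columns.

```julia
# Hypothetical columns standing in for gdf_plot.ACCOUNTS_CLASSIFICATION and gdf_plot.QTR.
classification = ["ASSETS", "REVENUE", "OPERATING_EXPENSES", "NON-OPERATING_EXPENSES"]
qtr = [2, 2, 2, 3]

expense_classes = ["NON-OPERATING_EXPENSES", "OPERATING_EXPENSES"]

# Ref(...) makes the list behave as a single value under broadcasting,
# so each classification is tested against the whole list.
is_expense = in.(classification, Ref(expense_classes))

# .& combines the element-wise conditions into one Boolean mask.
mask = is_expense .& (qtr .== 2)

selected = classification[mask]
```

Without `Ref`, broadcasting would try to pair the two vectors element by element, which fails here because their lengths differ.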


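In [18]:
# The cells below are Pluto notebook cells. @bind is provided by Pluto itself,
# Select and Slider come from PlutoUI, and @df comes from StatsPlots,
# so this code is meant to run in Pluto rather than in Jupyter.
using PlutoUI, Plots, StatsPlots
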
md"""
## Ledger Visual
"""

# ╔═╡ 4e16f723-6c84-49c9-ad64-8f7b81bcc568
@bind ld_p Select(["Actuals", "Budget"])

# ╔═╡ c57d4b85-f157-43e7-85b6-d10af1c9cc9c
@bind yr_p Slider(2020:1:2021, default=2021, show_value=true)

# ╔═╡ 97d23b1b-927a-471c-9e0c-9eafead92167
@bind rg_p Select(["Region A", "Region B", "Region C"])

# ╔═╡ a431b45c-209d-4b31-ab03-e762958b095d
@bind ldescr Select(unique(locationDF.DESCR))

# ╔═╡ 95302cab-0f88-4b93-88b8-47ce4af894fb
@bind adescr Select(unique(accountsDF.CLASSIFICATION))

# ╔═╡ 7d9b6029-e502-4065-9fa2-ef8f4da39021
@bind ddescr Select(unique(deptDF.CLASSIFICATION))

# ╔═╡ 90408ebd-ce89-48eb-ba66-5e56db44b8a2
begin
	plot_data = gdf_plot[(
		(gdf_plot.FISCAL_YEAR .== yr_p)
		.&
		(gdf_plot.LEDGER .== ld_p)
		.&
		(gdf_plot.OPER_UNIT .== rg_p)
		.&
		(gdf_plot.LOCATION_DESCR .== ldescr)
		.&
		(gdf_plot.DEPT_CLASSIFICATION .== ddescr)
		.&
		(gdf_plot.ACCOUNTS_CLASSIFICATION .== adescr))
		, :];
	# @df plot_data scatter(:QTR, :TOTAL/10^8, title = "Finance Ledger Data", xlabel="Quarter", ylabel="Total (in USD million)", label="$ld_p Total by $yr_p for $rg_p")
	@df plot_data plot(:QTR, :TOTAL/10^8, title = "Finance Ledger Data", xlabel="Quarter", ylabel="Total (in USD million)", 
		label=[
			"$ld_p by $yr_p for $rg_p $ldescr $adescr $ddescr"
			],
		lw=3)
end

# ╔═╡ 90049f35-f30b-4101-a8cf-1bd17f217998
md"""
## Actuals vs Budget comparison
"""

# ╔═╡ 9c220649-67e7-4adc-b1b6-be2feabe2313
begin
	plot_data_a = gdf_plot[(
		(gdf_plot.FISCAL_YEAR .== yr_p)
		.&
		(gdf_plot.LEDGER .== "Actuals")
		.&
		(gdf_plot.OPER_UNIT .== rg_p)
		.&
		(gdf_plot.LOCATION_DESCR .== ldescr)
		.&
		(gdf_plot.DEPT_CLASSIFICATION .== ddescr)
		.&
		(gdf_plot.ACCOUNTS_CLASSIFICATION .== adescr))
		, :];
	# @df plot_data scatter(:QTR, :TOTAL/10^8, title = "Finance Ledger Data", xlabel="Quarter", ylabel="Total (in USD million)", label="$ld_p Total by $yr_p for $rg_p")
	plot_data_b = gdf_plot[(
		(gdf_plot.FISCAL_YEAR .== yr_p)
		.&
		(gdf_plot.LEDGER .== "Budget")
		.&
		(gdf_plot.OPER_UNIT .== rg_p)
		.&
		(gdf_plot.LOCATION_DESCR .== ldescr)
		.&
		(gdf_plot.DEPT_CLASSIFICATION .== ddescr)
		.&
		(gdf_plot.ACCOUNTS_CLASSIFICATION .== adescr))
		, :];
	# @df plot_data scatter(:QTR, :TOTAL/10^8, title = "Finance Ledger Data", xlabel="Quarter", ylabel="Total (in USD million)", label="$ld_p Total by $yr_p for $rg_p")
	@df plot_data_a plot(:QTR, :TOTAL/10^8, title = "Finance Ledger Data", xlabel="Quarter", ylabel="Total (in USD million)", 
		label=[
			"Actuals by $yr_p for $rg_p $ldescr $adescr $ddescr"
			],
		lw=3)
	@df plot_data_b plot!(:QTR, :TOTAL/10^8, title = "Finance Ledger Data", xlabel="Quarter", ylabel="Total (in USD million)", 
		label=[
			"Budget by $yr_p for $rg_p $ldescr $adescr $ddescr"
			],
		lw=3)
end

# ╔═╡ bea7c89e-78ca-4a68-8487-f10f91ee6449
md"""
raw data in table format
"""

# ╔═╡ 3f6f4feb-3425-4f89-814d-4ebb7334d6c6
plot_data

# ╔═╡ b370f5c2-c6cc-4f2c-9d68-98a0dab6db3b
begin
	@df gdf_plot scatter(:QTR, :ACCOUNTS_CLASSIFICATION, :TOTAL/10^8, title = "Finance Ledger Data", xlabel="Quarter", ylabel="Total (in USD million)", 
		label=[
			"$ld_p by $yr_p for $rg_p $ldescr for $ddescr"
			],
		lw=3)
end

# ╔═╡ 9313e997-cead-48e0-aac9-784708c4221c
begin
	@df gdf_plot scatter(:QTR, :DEPT_CLASSIFICATION, :TOTAL/10^8, title = "Finance Ledger Data", xlabel="Quarter", ylabel="Total (in USD million)", 
		label=[
			"$ld_p by $yr_p for $rg_p $ldescr for $adescr"
			],
		lw=3)
end

# ╔═╡ 5bd9fabf-52c6-4d5b-a871-74791237f5f4
md"""
## what-if, would, could, should
	Region A is merged with Region B
	Employees resume work from the office: by what % will Travel amounts increase?
	% of Office supply expenses given to Employees for home office setup
	would Region A Cash Flow Investment have returned 7% ROI
	would Region B have received Government/investor funding
	could have increased IT operating expenses by 5%
	could have reduced HR temp staff
	
	should have paid vendor invoices on time to receive rebate
	should have applied for a loan to increase production
	should have retired a particular Asset
"""

# ╔═╡ 60ce8d4c-433d-4ce0-918b-7b8512749fb3
md"""
## Real-time TimeSeries, StatsModel predictions
	Predict Operating and non-operating expense for year
	Predict Actuals to Budget variance and FORECAST
	using SARIMA model to predict "Region A" NET-WORTH
"""

# ╔═╡ c93c46dc-1b18-43fc-babe-d8bc81c38d5c
md"""
# Supply chain Dashboard - live inventory
	below is an example dashboard (image) built in Pluto
	This dashboard uses OnlineStats.jl for "real-time" updates
![Supply Chain Dashboard](https://github.com/AmitXShukla/AmitXShukla.github.io/raw/master/blogs/PlutoCon/scm.png)
"""

md"""
# Feature Requests
Pluto as an Enterprise Reporting tool.
Pluto provides a cohesive, real-time, reactive data wrangling, transformation, reporting & analytics framework for big data / ERP data sets.
	Cloud/on-premise server deployment
	PIN - live KPI Reports like TOC (Floating fluid content)
	Integrate Pluto with BI tools like Microsoft Power BI, Tableau etc.
	Drill-through, drill-down functionalities
	linking variables for easy navigation
"""


LoadError: LoadError: UndefVarError: @bind not defined
in expression starting at In[18]:7
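One of the what-if questions listed in the cell above, "could have increased IT operating expenses by 5%", can be sketched in plain Julia together with an Actuals-to-Budget variance. The amounts below are made up for illustration, and the helper `what_if` is a hypothetical name, not part of the notebook.

```julia
# Made-up quarterly IT operating expenses (USD) and the matching budget.
actuals = [1_000_000.0, 1_200_000.0, 900_000.0, 1_100_000.0]
budget  = [1_100_000.0, 1_100_000.0, 1_000_000.0, 1_150_000.0]

# Scale every amount by a percentage change, e.g. pct = 5 for +5%.
what_if(amounts, pct) = amounts .* (1 + pct / 100)

scenario = what_if(actuals, 5)

# Variance against budget: positive means over budget.
variance_actuals  = sum(actuals) - sum(budget)
variance_scenario = sum(scenario) - sum(budget)
```

With these numbers the actuals come in under budget, while the 5% scenario ends up over budget, which is the kind of comparison a what-if report would surface.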

## using interactive visualization

In [19]:
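a = [5]       # create a one-element array
b = copy(a)   # b is a shallow copy of a, not an alias
a === b       # false: a and b are different objects (this value is not displayed)
a, b          # both hold the same contents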
# b = [6]
# a
# A1[1] === A2[1]                 # true
# A3 = deepcopy(A1)
# A1[1] === A3[1]  

([5], [5])
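The cell above touches on the difference between assignment, `copy` and `deepcopy`. A minimal sketch of how the three behave, using only Base Julia (the variable names are illustrative):

```julia
a = [5]
b = a          # assignment: b is another name for the same array
c = copy(a)    # copy: a new array with the same elements

same_object  = a === b   # identical object
equal_values = a == c    # same contents
distinct     = a === c   # different objects

b[1] = 6       # mutating through b also changes a, but not c

# copy is shallow: nested arrays are still shared; deepcopy duplicates them too.
nested  = [[1, 2], [3]]
shallow = copy(nested)
deep    = deepcopy(nested)
nested[1][1] = 99
```

After the last line, `shallow[1][1]` reflects the change while `deep[1][1]` keeps its original value.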

## data analysis - would, could, should

In [20]:
# plot 3 chartfields (requires Plots.jl: Pkg.add("Plots"))
using Plots
# count rows per classification once for each chartfield
acct_counts = combine(groupby(accountsDF, :CLASSIFICATION), nrow)
dept_counts = combine(groupby(deptDF, :CLASSIFICATION), nrow)
loc_counts = combine(groupby(locationDF, :CLASSIFICATION), nrow)
p1 = plot(acct_counts.nrow, acct_counts.CLASSIFICATION, seriestype = :scatter, label = "# of accounts by classification", xlabel = "# of accounts", ylabel = "Class", xlims = (0, 5.5))
p2 = plot(dept_counts.nrow, dept_counts.CLASSIFICATION, seriestype = :scatter, label = "# of dept by classification", xlabel = "# of depts", ylabel = "Class", xlims = (0, 2))
p3 = plot(loc_counts.nrow, loc_counts.CLASSIFICATION, seriestype = :scatter, label = "# of locations by classification", xlabel = "# of locations", ylabel = "Class", xlims = (1, 6.5))
plot(p1, p2, p3, layout = (3, 1), legend = false)


--- 
## creating complete Supply Chain Data Model DataFrames
Now that we have a handle on DataFrame basics, let's create the other chartfields/dimensions and build a complete Supply Chain DataFrame.

view, @view, copy and deepcopy
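As a small illustration of the difference between a view and a copy, here is a sketch on a plain vector; the variable names and values are made up. The same idea applies to the `view`/`@view` functions on DataFrames.

```julia
ledger_amounts = [10.0, 20.0, 30.0, 40.0]

v = @view ledger_amounts[1:2]   # a window onto the original data
c = ledger_amounts[1:2]         # plain indexing with a range makes a copy

ledger_amounts[1] = 99.0        # change the parent array

first_in_view = v[1]            # the view sees the change
first_in_copy = c[1]            # the copy does not
```

Views avoid duplicating data, which matters for a ledger with millions of rows, but any change to the parent shows up in the view.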

normal/gaussian distribution

## type systems
- ledger
- subledger
- accounting
- chartfields

- category
- typeof
- subtypes
- supertype
- eltypes // eltype.(eachcol(accountsDF)) # displays type of each column
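The type-related functions in the list above can be tried on a few built-in types. This is a minimal sketch using Base Julia and the `InteractiveUtils` standard library; the example vectors are illustrative stand-ins for ledger columns.

```julia
using InteractiveUtils   # provides subtypes outside the REPL

t    = typeof(1.5)               # concrete type of a value
st   = supertype(Float64)        # parent type in the hierarchy
subs = subtypes(AbstractFloat)   # direct child types

# eltype gives the element type of a collection, e.g. a ledger amount column.
posted_total = [10.5, 20.0, 30.25]
account_ids  = 11000:1000:45000
et_amounts = eltype(posted_total)
et_ids     = eltype(account_ids)
```

On a DataFrame, `eltype.(eachcol(df))` applies the same `eltype` call to every column.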


#### reading dataframe
show
eachcol
describe
eltype

## transformation
- select
- transform
- combine
- unique rows
- group by
- order by
- sort
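The transformation verbs listed above map onto DataFrames.jl functions roughly as follows. This is a sketch on a made-up mini ledger, not the notebook's data; all variable names are illustrative.

```julia
using DataFrames

# Made-up mini ledger for illustration.
df = DataFrame(
    DEPT = ["IT", "HR", "IT", "HR", "IT"],
    PERIOD = [1, 1, 2, 2, 2],
    POSTED_TOTAL = [100.0, 40.0, 60.0, 10.0, 60.0],
)

# select: keep two columns
picked = select(df, :DEPT, :POSTED_TOTAL)
# transform: add a derived column
with_k = transform(df, :POSTED_TOTAL => ByRow(x -> x / 1000) => :TOTAL_K)
# unique rows: drop exact duplicates
deduped = unique(df)
# group by + combine: total per department
by_dept = combine(groupby(df, :DEPT), :POSTED_TOTAL => sum => :TOTAL)
# order by / sort: largest total first
ordered = sort(by_dept, :TOTAL, rev = true)
```

Each call returns a new DataFrame; the versions ending in `!` (for example `transform!`, as used earlier in this notebook) modify the DataFrame in place.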


mapcols
broadcasting
regex
match
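Broadcasting and regular-expression matching can be sketched with Base Julia alone. The example strings follow the "USD ... million" format produced earlier in this notebook; the variable names are illustrative.

```julia
# Broadcasting: the dot applies a function or operator element by element.
amounts = [1_500_000.0, 250_000.0]
in_millions = amounts ./ 10^6

# Regex: pull the numeric part out of a formatted total such as "USD 12.35 million".
m = match(r"USD (\d+\.\d+) million", "USD 12.35 million")
value = m === nothing ? missing : parse(Float64, m.captures[1])

# occursin with a case-insensitive regex, broadcast over a vector of category names.
# Ref(...) keeps the pattern as a single value while the names are iterated.
categories = ["Payroll", "non-Payroll", "Travel"]
is_payroll = occursin.(Ref(r"payroll"i), categories)
```

`match` returns `nothing` when the pattern is not found, which is why the result is checked before parsing.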


#group by

Hello Friends,
In this video, we will discuss everything one needs to know about Julia DataFrames to perform a detailed ERP data analysis.

In case you are not familiar with the Julia Language, it is one of the newer languages for Data Science, comparable to R and Python. It is a newer language which runs like C and walks like Python.

I'm not going to discuss R vs Python vs Julia; I think each language has pros and cons. Please don't waste your time on pointless PowerPoint comparisons, especially when it's easier to just pick up these languages and start coding. Once you get the hang of a programming language, there comes a time when you will know which language meets your needs.

In this blog, we will discuss the following topics.

1. about ERP data analysis
2. why Julia Language
3. Julia & package Installation
4. using Julia Data Frames for data analysis
5. Data Visualization
6. other packages like online stats, ODBC, JuliaDB
7. Data Cleansing, Wrangling, Masking & Analysis
