I still can't load the Fannie Mae data.
I have written some example code, but it runs for more than 5 hours and still fails.
The data requires a login to download and is 135 GB in size.
```julia
using Distributed, Statistics
addprocs(6)
@time @everywhere using JuliaDB, Dagger

datapath = "c:/data/Performance_All/"
ifiles = joinpath.(datapath, readdir(datapath))

colnames = ["loan_id", "monthly_rpt_prd", "servicer_name", "last_rt", "last_upb",
    "loan_age", "months_to_legal_mat", "adj_month_to_mat", "maturity_date", "msa",
    "delq_status", "mod_flag", "zero_bal_code", "zb_dte", "lpi_dte", "fcc_dte",
    "disp_dt", "fcc_cost", "pp_cost", "ar_cost", "ie_cost", "tax_cost", "ns_procs",
    "ce_procs", "rmw_procs", "o_procs", "non_int_upb", "prin_forg_upb_fhfa",
    "repch_flag", "prin_forg_upb_oth", "transfer_flg"];

fsz = (x -> stat(x).size).(ifiles)
nchunks = ceil.(fsz ./ (250 * 1024 * 1024))

mkcmd(in, out, nchunks) = "split $in -n l/$(Int(nchunks)) -d $out"

open("c:/data/script", "w") do f
    for m in mkcmd.(ifiles, "/c/data/Performance_All_split/" .* readdir(datapath), nchunks)[nchunks .>= 2]
        write(f, m * "\n")
    end
end

#####################################################################
############## execute the above to split the csvs into smaller csvs
#####################################################################

const fmtypes = [
    String,
    Union{String, Missing}, Union{String, Missing},
    Union{Float64, Missing}, Union{Float64, Missing}, Union{Float64, Missing},
    Union{Float64, Missing}, Union{Float64, Missing},
    Union{String, Missing}, Union{String, Missing}, Union{String, Missing},
    Union{String, Missing}, Union{String, Missing}, Union{String, Missing},
    Union{String, Missing}, Union{String, Missing}, Union{String, Missing},
    Union{Float64, Missing}, Union{Float64, Missing}, Union{Float64, Missing},
    Union{Float64, Missing}, Union{Float64, Missing}, Union{Float64, Missing},
    Union{Float64, Missing}, Union{Float64, Missing}, Union{Float64, Missing},
    Union{Float64, Missing}, Union{Float64, Missing},
    Union{String, Missing},
    Union{Float64, Missing},
    Union{String, Missing}]

# datapath = "C:/data/Performance_All_split"
datapath = "C:/data/ok"
ifiles = joinpath.(datapath, readdir(datapath))

# takes more than 5 hours and then fails.
@time jll = loadtable(
    ifiles,
    output = "c:/data/fm.jldb\\",
    delim = '|',
    header_exists = false,
    filenamecol = "filename",
    chunks = length(ifiles),
    # type_detect_rows = 20000,
    colnames = colnames,
    colparsers = fmtypes,
    indexcols = ["loan_id", "monthly_rpt_prd"])

using JuliaDB
@time a = load("c:/data/fm.jldb/")
```
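One way to isolate the failure faster (a sketch only, not verified against this data set) is to run the same `loadtable` call on a single split file first, so any `colparsers`/`colnames` mismatch surfaces in minutes instead of after a 5-hour run. This reuses the `colnames`, `fmtypes`, and `ifiles` variables defined above:

```julia
using JuliaDB

# Try one split file with the identical parsing options; if this fails,
# the problem is in the column spec rather than in the distributed load.
testfile = ifiles[1]
@time small = loadtable(
    testfile,
    delim = '|',
    header_exists = false,
    colnames = colnames,
    colparsers = fmtypes)
```

If a single file parses cleanly, the issue may instead be memory or output-path related in the multi-chunk run.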