GRN inference error when getting nodes. #29

Closed
koenvandenberge opened this issue May 2, 2020 · 5 comments · Fixed by #30

@koenvandenberge

koenvandenberge commented May 2, 2020

Hi,

I'm trying to estimate a gene regulatory network (GRN) from an expression matrix that can be found here.

However, I get an error in the first step (get_nodes), even when I use uniform widths as the discretizer. Any ideas on how to fix this?

# Include packages

using NetworkInference
using LightGraphs
using GraphPlot


dataset_name = string("ExpressionData.csv")
algorithm = PIDCNetworkInference()
threshold = 0.15
@time genes = get_nodes(dataset_name, discretizer = "uniform_width");

ERROR: ArgumentError: collection must be non-empty
Stacktrace:
 [1] _extrema_itr(::typeof(identity), ::Array{Float64,1}) at ./operators.jl:472
 [2] _extrema_dims at ./multidimensional.jl:1601 [inlined]
 [3] #extrema#434 at ./multidimensional.jl:1588 [inlined]
 [4] extrema at ./multidimensional.jl:1588 [inlined]
 [5] get_bin_ids!(::Array{Float64,1}, ::String, ::Int64, ::Array{Int64,1}) at /Users/koenvandenberge/.julia/packages/InformationMeasures/fdfJk/src/Discretization.jl:107
 [6] Node(::Array{Any,2}, ::String, ::String, ::Int64) at /Users/koenvandenberge/.julia/packages/NetworkInference/z8pnG/src/common.jl:32
 [7] get_nodes(::String; delim::Bool, discretizer::String, estimator::String, number_of_bins::Int64) at /Users/koenvandenberge/.julia/packages/NetworkInference/z8pnG/src/infer_network.jl:35
 [8] get_nodes at /Users/koenvandenberge/.julia/packages/NetworkInference/z8pnG/src/infer_network.jl:26 [inlined]
 [9] top-level scope at ./util.jl:175

Note that reading the dataset using CSV works:

julia> dataset = CSV.read(dataset_name)
19×2001 DataFrames.DataFrame. Omitted printing of 1993 columns
│ Row │ Column1 │ E37_5_927   │ E42_7_69    │ E20_7_209   │ E70_2_163  │ E107_6_328 │ E131_7_61  │ E135_3_524 │
│     │ String  │ Float64     │ Float64     │ Float64     │ Float64    │ Float64    │ Float64    │ Float64    │
├─────┼─────────┼─────────────┼─────────────┼─────────────┼────────────┼────────────┼────────────┼────────────┤
│ 1   │ DMRT1   │ 0.00239255  │ 0.0228187   │ 1.7351      │ 1.15105    │ 1.89054    │ 0.00972208 │ 2.0532     │
│ 2   │ FGF9    │ 0.0188525   │ 0.0372795   │ 1.98033     │ 4.01631e-5 │ 1.99057    │ 0.0488233  │ 0.114901   │
│ 3   │ RSPO1   │ 2.20423     │ 1.69426     │ 0.0148483   │ 1.31783    │ 0.00808689 │ 2.02124    │ 0.0190963  │
│ 4   │ DHH     │ 0.000755998 │ 0.0187628   │ 1.75397     │ 0.010211   │ 1.2898     │ 0.0404907  │ 0.017903   │
│ 5   │ CTNNB1  │ 2.77642     │ 2.05514     │ 0.00463818  │ 0.511992   │ 0.004185   │ 2.0759     │ 0.0189984  │
│ 6   │ PGD2    │ 0.00813081  │ 0.0119802   │ 2.12718     │ 0.00934665 │ 2.15162    │ 0.004154   │ 0.0196394  │
│ 7   │ WT1mKTS │ 2.19433     │ 1.3142      │ 1.99798     │ 1.64531    │ 1.7993     │ 2.43534    │ 1.92463    │
⋮
│ 12  │ AMH     │ 0.000717303 │ 0.0275178   │ 1.60484     │ 0.152546   │ 1.29728    │ 0.00579285 │ 0.0252601  │
│ 13  │ NR0B1   │ 2.39157     │ 1.60968     │ 0.0143284   │ 2.28907    │ 0.333701   │ 1.50397    │ 2.16528    │
│ 14  │ NR5A1   │ 0.00334859  │ 0.0156287   │ 1.5703      │ 0.724982   │ 2.08801    │ 0.0162079  │ 1.12717    │
│ 15  │ WT1pKTS │ 0.272286    │ 0.0113008   │ 2.5884      │ 2.27792    │ 1.65995    │ 0.169417   │ 2.32935    │
│ 16  │ FOXL2   │ 1.4524      │ 1.62472     │ 0.000722346 │ 0.00980684 │ 0.00119802 │ 2.0794     │ 0.00196965 │
│ 17  │ UGR     │ 0.0465009   │ 0.000586337 │ 0.00702597  │ 0.0109601  │ 0.0931931  │ 0.00770349 │ 0.00754363 │
│ 18  │ SOX9    │ 0.0166821   │ 0.00586449  │ 2.19017     │ 0.228339   │ 1.24644    │ 0.0035871  │ 1.42638    │
│ 19  │ GATA4   │ 2.30659     │ 2.01715     │ 2.02201     │ 2.2878     │ 1.69821    │ 1.75037    │ 2.0366     │

@koenvandenberge
Author

This could be fixed using @time genes = get_nodes(dataset_name, delim=',', discretizer = "uniform_width");

However, with a bigger dataset, the following error occurs:

julia> @time genes = get_nodes(dataset_name, delim=',');
ERROR: ArgumentError: indexed assignment with a single value to many locations is not supported; perhaps use broadcasting `.=` instead?
Stacktrace:
 [1] setindex_shape_check(::Int64, ::Int64) at ./indices.jl:258
 [2] macro expansion at ./multidimensional.jl:779 [inlined]
 [3] _unsafe_setindex!(::IndexLinear, ::Array{Int64,1}, ::Int64, ::UnitRange{Int64}) at ./multidimensional.jl:774
 [4] _setindex! at ./multidimensional.jl:769 [inlined]
 [5] setindex! at ./abstractarray.jl:1073 [inlined]
 [6] get_bin_ids!(::Array{Float64,1}, ::String, ::Int64, ::Array{Int64,1}) at /Users/koenvandenberge/.julia/packages/InformationMeasures/fdfJk/src/Discretization.jl:111
 [7] Node(::Array{Any,2}, ::String, ::String, ::Int64) at /Users/koenvandenberge/.julia/packages/NetworkInference/z8pnG/src/common.jl:32
 [8] get_nodes(::String; delim::Char, discretizer::String, estimator::String, number_of_bins::Int64) at /Users/koenvandenberge/.julia/packages/NetworkInference/z8pnG/src/infer_network.jl:35
 [9] top-level scope at ./util.jl:175

@Tchanders
Owner

Hi,

> This could be fixed using @time genes = get_nodes(dataset_name, delim=',', discretizer = "uniform_width");

Glad you got this working; the delimiter defaults to tab, so comma-separated files need delim=','.
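As an illustration of why the default tab delimiter led to the first error, here is a minimal sketch using Julia's standard DelimitedFiles library on a small hypothetical CSV string (not the reporter's data). It assumes, without checking the NetworkInference.jl source, that get_nodes reads the file in a broadly similar way. The point is only that a comma-separated line read with a tab delimiter stays a single field, leaving no numeric values to discretize, and that `extrema` on an empty vector throws, as in the first stack trace.

```julia
using DelimitedFiles

# Hypothetical toy data in the same layout as ExpressionData.csv:
# a header row, then one gene per row with its name in the first column.
csv_text = "gene,c1,c2,c3\nDMRT1,0.1,1.7,1.2\nFGF9,0.0,2.0,0.1\n"

# Parsed with a tab delimiter, each comma-separated line stays one field.
tab_parsed = readdlm(IOBuffer(csv_text), '\t')
# Parsed with the correct delimiter, the values are split into columns.
comma_parsed = readdlm(IOBuffer(csv_text), ',')

println(size(tab_parsed))    # (3, 1)
println(size(comma_parsed))  # (3, 4)

# Dropping the gene-name column from a tab-parsed row leaves an empty vector,
# and taking the extrema of an empty vector throws an error.
row_values = Vector{Float64}(tab_parsed[2, 2:end])
try
    extrema(row_values)
catch err
    println(typeof(err))
end
```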

For the error with the bigger dataset: how big is the dataset, and does the error still occur without @time?

@koenvandenberge
Author

koenvandenberge commented May 4, 2020

It is fairly large, though not huge: ~16K cells and roughly as many genes.
The error also occurs without @time.
I have shared the dataset here for reference.

@koenvandenberge
Author

Hi, just checking in to see whether you have any ideas on how this problem might be solved.

Thanks,
Koen

Tchanders added a commit that referenced this issue Jul 26, 2020
Implicit broadcasting was deprecated in Julia 0.7.

Closes #29
@Tchanders
Owner

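@koenvandenberge Apologies for the delay, and thanks for reporting. The error was caused by a deprecation in Julia 0.7: assigning a single value to many indexed locations without `.=` (implicit broadcasting) was deprecated there and is an error from Julia 1.0 onward.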
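A sketch of the kind of change involved follows. `uniform_width_bin_ids!` is a hypothetical function written for illustration, not the InformationMeasures.jl source. It only assumes that uniform-width binning needs a special case when every value is identical (the minimum equals the maximum, so the bin width would be zero) and that this case assigns one bin id to the whole vector. Writing that assignment as `bin_ids[1:end] = 1` raises the same ArgumentError as in the second stack trace; `bin_ids .= 1` broadcasts the scalar instead.

```julia
# Hypothetical uniform-width binning, for illustration only;
# NOT the InformationMeasures.jl implementation.
function uniform_width_bin_ids!(bin_ids::Vector{Int}, values::Vector{Float64},
                                number_of_bins::Int)
    lo, hi = extrema(values)
    if lo == hi
        # Every value is identical, so everything goes into bin 1.
        # `bin_ids[1:end] = 1` throws an ArgumentError on Julia >= 1.0;
        # the dotted assignment broadcasts the scalar instead.
        bin_ids .= 1
        return bin_ids
    end
    width = (hi - lo) / number_of_bins
    for (i, x) in enumerate(values)
        # min(...) keeps x == hi in the last bin instead of bin number_of_bins + 1
        bin_ids[i] = min(floor(Int, (x - lo) / width) + 1, number_of_bins)
    end
    return bin_ids
end

# The non-dotted form fails with the error from the stack trace above.
ids = zeros(Int, 3)
try
    ids[1:3] = 1
catch err
    println(err isa ArgumentError)  # true
end

println(uniform_width_bin_ids!(zeros(Int, 3), [2.0, 2.0, 2.0], 5))  # [1, 1, 1]
println(uniform_width_bin_ids!(zeros(Int, 3), [0.0, 0.5, 1.0], 2))  # [1, 2, 2]
```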

After the fix, calling @time genes = get_nodes(dataset_name, delim=',', discretizer = "uniform_width"); on the linked dataset ran without errors.

The size wasn't the problem; the error only occurred when the dataset contained genes whose values were all identical.
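On a version from before the fix, one way to check whether a dataset contains such genes is to look for rows whose values are all equal. The sketch below uses a hypothetical helper, `constant_genes` (not part of NetworkInference.jl), and assumes the layout shown in the DataFrame above: a header row, one gene per row, and the gene name in the first column. Dropping the reported genes before calling get_nodes is one possible workaround, since a constant gene carries no information about the others anyway.

```julia
using DelimitedFiles

# Hypothetical helper: names of genes (rows) whose values are all identical.
# Assumes a header row and gene names in the first column.
function constant_genes(path::AbstractString; delim::Char = ',')
    data = readdlm(path, delim)
    names = String[]
    for i in 2:size(data, 1)          # row 1 is the header
        vals = data[i, 2:end]         # column 1 holds the gene name
        if !isempty(vals) && all(==(vals[1]), vals)
            push!(names, string(data[i, 1]))
        end
    end
    return names
end
```

For example, `constant_genes("ExpressionData.csv")` would list any such genes in the file from this issue, assuming it follows that layout.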


2 participants