Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

addDCElement is too slow #18

Closed
mpadge opened this issue May 24, 2022 · 3 comments
Closed

addDCElement is too slow #18

mpadge opened this issue May 24, 2022 · 3 comments
Assignees
Labels
enhancement New feature or request user support
Milestone

Comments

@mpadge
Copy link

mpadge commented May 24, 2022

I've got code that loops over a list of metadata terms which can't be know in advance and does something like this, acting on a DCEntry object called dcmi:

    for (i in seq (nrow (terms))) {
        dc_fn <- # code to match `addDC...` fn from `atom4R`
        value <- data [[terms$value [i]]]
        do.call (dcmi [[dc_fn]], list (value))
    }

The problem is this is the slowest bit of my code by far, because all of the addDCElement() calls take so long. This in turn seems to be because of these lines:

list_of_classes <- list_of_classes[sapply(list_of_classes, function(x){
clazz <- try(eval(parse(text=x)),silent=TRUE)
if(is(clazz, "try-error")) clazz <- try(eval(parse(text=paste0("atomR::",x))),silent=TRUE)
r6Predicate <- class(clazz)[1]=="R6ClassGenerator"
if(!r6Predicate) return(FALSE)
atomObjPredicate <- FALSE
superclazz <- clazz
while(!atomObjPredicate && !is.null(superclazz)){
clazz_fields <- names(superclazz)
if(!is.null(clazz_fields)) if(length(clazz_fields)>0){
if("get_inherit" %in% clazz_fields){
superclazz <- superclazz$get_inherit()
atom4RPredicate <- FALSE
if("parent_env" %in% clazz_fields) atom4RPredicate <- environmentName(superclazz$parent_env)=="atom4R"
atomObjPredicate <- superclazz$classname == classname && atom4RPredicate
}else{
break
}
}
}
return(atomObjPredicate)
})]

For a single term, that can end up looping around 300 times, each time doing a full eval(parse()) call. Would it not be possible to restructure that entire call as its own reference function that did not accept classname as a dynamic parameter, rather did the single trawl over everything and dumped c(classname, atom4RPredicate) to create a single reference table. That call could easily be memoised for instant recall, and then the list_of_classes for a single classname could also be instantly extracted. Given my estimates for a few terms of 300 or so loops of that funciton times unknown numbers of terms going into my code like the above, it should be possible to achieve a speed-up here of O(1000).

@eblondel
Copy link
Owner

eblondel commented May 25, 2022

Yes, this is well known issue that I have to look into very soon. I had already improved this in similar package methods (eg geometa) and I have to see if this can be applied here. I didn't know about 'memoise' and I will look into it, thanks for that. Also something you have to know is that the 'add' methods check if there is alread a value, and will return FALSE if already available, but since some elements consist in complex objects (and not simple string values) the comparison is done by comparing the out xml.

An immediate approach, faster, that is actually implemented in the decode function, is not to use the util adders (and then much less convenient), but directly set the fields with the appropriate elements. Example:

dc <- DCEntry$new()
dc[["contributor"]] <- lapply(c("Bob", "Peter", "Bunny"), function(x){DCContributor$new( value = x)})

On this basis, I will support new setters as talked in #14

@mpadge
Copy link
Author

mpadge commented May 25, 2022

Great stuff - thanks!

@eblondel eblondel self-assigned this May 30, 2022
@eblondel eblondel added enhancement New feature or request user support labels May 30, 2022
@eblondel eblondel added this to the 0.3 milestone May 30, 2022
eblondel added a commit that referenced this issue May 30, 2022
@eblondel
Copy link
Owner

@mpadge I've slightly improved the code, first by fetching DC rdf vocabulary by default, (previously done at each DC element creation), secondly by simplifying each DC element class by aligning with the R6 class writing convention used in the package, for what concern native DC elements (for which we don't really need to look into all class inheritance). Together with that, i've provided setters for lists of elements . See #14

@eblondel eblondel closed this as completed Jun 5, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request user support
Projects
None yet
Development

No branches or pull requests

2 participants