I'd like to be able to dynamically add a new rule to the ExtractorEngine.
I know how to do this with readSimpleFile(), but if the rule in question requires resources (e.g. Word2Vec) or variables, then it must use readMasterFile(). But this rereads all resources and variables each time it is called. And if the Word2Vec file is large, this can be very slow.
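To make the cost concrete, here is a minimal, self-contained sketch of the behaviour being asked for: load the expensive resources once, keep the resulting configuration, and compile every later rule against it. None of these names come from Odin; ResourceLoader, CachedConfig and RuleCompiler are hypothetical stand-ins, and "compiling" a rule is reduced to variable interpolation.

```scala
// Hypothetical stand-ins for illustration only; these are NOT Odin classes.
final case class CachedConfig(
  resources: Map[String, String],
  variables: Map[String, String]
)

object ResourceLoader {
  // Counts how often the (pretend) expensive load runs.
  var loads: Int = 0

  // Stand-in for reading something large, such as an embeddings file.
  def load(paths: Map[String, String]): Map[String, String] = {
    loads += 1
    paths.map { case (name, path) => name -> s"loaded:$path" }
  }
}

final class RuleCompiler(
  resourcePaths: Map[String, String],
  variables: Map[String, String]
) {
  // Built lazily on first use, then reused for every later rule.
  lazy val config: CachedConfig =
    CachedConfig(ResourceLoader.load(resourcePaths), variables)

  // "Compile" one rule against the cached config.
  // Here that only means replacing ${name} placeholders with their values.
  def compile(rule: String): String =
    config.variables.foldLeft(rule) { case (text, (name, value)) =>
      text.replace("${" + name + "}", value)
    }
}
```

With this shape, calling compile several times on the same RuleCompiler triggers the load only on the first call; adding a rule later costs one interpolation pass, not a resource reload.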
I tried to extend RuleReader and then override readMasterFile() so that it can optionally take an existing OdinConfig:
/**
  * Custom Rule Reader that *stores* the [[OdinConfig]]
  */
class CustomRuleReader(
  actions: Actions,
  charset: Charset
) extends RuleReader(actions, charset) {

  // empty config
  var config = OdinConfig(resources = OdinResourceManager(Map.empty))

  /**
    * non-private form of readResources
    */
  def readResources(data: Map[String, Any]): OdinResourceManager = {
    //println(s"resources: ${data.get("resources")}")
    val resourcesMap: Map[String, String] = data.get("resources") match {
      case Some(m: JMap[_, _]) => m.asScala.map(pair => (pair._1.toString, pair._2.toString)).toMap
      case _ => Map.empty
    }
    OdinResourceManager(resourcesMap)
  }

  /**
    * non-private form of readTaxonomy
    */
  def readTaxonomy(data: Any): Taxonomy = data match {
    case t: Collection[_] => Taxonomy(t.asInstanceOf[Collection[Any]])
    case path: String =>
      val url = mkURL(path)
      val source = Source.fromURL(url)
      val input = source.mkString
      source.close()
      val yaml = new Yaml(new Constructor(classOf[Collection[Any]]))
      val data = yaml.load(input).asInstanceOf[Collection[Any]]
      Taxonomy(data)
  }

  /**
    * non-private form of readRules
    */
  def readRules(
    rules: Collection[JMap[String, Any]],
    config: OdinConfig
  ): Seq[Rule] = {
    // return Rule objects
    rules.asScala.toSeq.flatMap { r =>
      val m = r.asScala.toMap
      if (m contains "import") {
        // import rules from a file and return them
        importRules(m, config)
      } else {
        // gets a label and returns it and all its hypernyms
        val expand: String => Seq[String] = label => config.taxonomy match {
          case Some(t) => t.hypernymsFor(label)
          case None => Seq(label)
        }
        // interpolates a template variable with ${variableName} notation
        // note that $variableName is not supported and $ can't be escaped
        val template: Any => String = a => replaceVars(a.toString, config.variables)
        // return the rule (in a Seq because this is a flatMap)
        Seq(mkRule(m, expand, template, config))
      }
    }
  }

  /**
    * non-private form of importRules
    */
  def importRules(
    data: Map[String, Any],
    config: OdinConfig
  ): Seq[Rule] = {
    // apply variable substitutions to import
    val path = {
      val p = data("import").toString
      val res = replaceVars(p, config.variables)
      res
    }
    val url = mkURL(path)
    val source = Source.fromURL(url)
    val input = source.mkString // slurp
    source.close()
    // read rules and vars from file
    val (jRules: Collection[JMap[String, Any]], localVars: Map[String, String]) = try {
      // try to read file with rules and optional vars by trying to read a JMap
      val yaml = new Yaml(new Constructor(classOf[JMap[String, Any]]))
      val data = yaml.load(input).asInstanceOf[JMap[String, Any]].asScala.toMap
      // read list of rules
      val jRules = data("rules").asInstanceOf[Collection[JMap[String, Any]]]
      // read optional vars
      val localVars = getVars(data)
      (jRules, localVars)
    } catch {
      case e: ConstructorException =>
        // try to read file with a list of rules by trying to read a Collection of JMaps
        val yaml = new Yaml(new Constructor(classOf[Collection[JMap[String, Any]]]))
        val jRules = yaml.load(input).asInstanceOf[Collection[JMap[String, Any]]]
        (jRules, Map.empty)
    }
    // variables specified by the call to `import`
    val importVars = getVars(data)
    // variable scope:
    // - an imported file may define its own variables (`localVars`)
    // - the importer file can define variables (`importerVars`) that override `localVars`
    // - a call to `import` can include variables (`importVars`) that override `importerVars`
    val updatedVars = localVars ++ config.variables ++ importVars
    val newConf = config.copy(variables = updatedVars)
    readRules(jRules, newConf)
  }

  /**
    * non-private form of cleanVar
    */
  def cleanVar(s: String): String = {
    val clean = s.
      replaceAll("\\$\\{\\s+", "\\$\\{").
      replaceAll("\\s+\\}", "\\}")
    clean
  }

  /**
    * non-private form of replaceVars
    */
  def replaceVars(s: String, vars: Map[String, String]): String = {
    val valuesMap = vars.asJava
    // NOTE: StrSubstitutor is NOT threadsafe
    val sub = new StrSubstitutor(valuesMap)
    // allow for recursive substitution
    sub.setEnableSubstitutionInVariables(true)
    val clean = cleanVar(s)
    sub.replace(clean)
  }

  /**
    * Override of readMasterFile that *also stores* the [[OdinConfig]]
    */
  override def readMasterFile(input: String): Vector[Extractor] = {
    val yaml = new Yaml(new Constructor(classOf[JMap[String, Any]]))
    val master = yaml.load(input).asInstanceOf[JMap[String, Any]].asScala.toMap
    val taxonomy = master.get("taxonomy").map(readTaxonomy)
    val vars = getVars(master)
    val resources = readResources(master)
    val jRules = master("rules").asInstanceOf[Collection[JMap[String, Any]]]
    val graph = getGraph(master)
    // keep the config so that later calls can reuse the loaded resources
    this.config = OdinConfig(taxonomy = taxonomy, resources = resources, variables = vars, graph = graph)
    val rules = readRules(jRules, this.config)
    mkExtractors(rules)
  }

  /**
    * alternative form of readSimpleFile that takes in an already-loaded Config
    */
  def readSimpleFile(input: String, loadedConfig: OdinConfig): Vector[Extractor] = {
    val yaml = new Yaml(new Constructor(classOf[Collection[JMap[String, Any]]]))
    val jRules = yaml.load(input).asInstanceOf[Collection[JMap[String, Any]]]
    // no resources are re-read here: the rules are built against the config passed in
    val rules = readRules(jRules, loadedConfig)
    mkExtractors(rules)
  }
}
You'll notice the most important part is the additional form of readSimpleFile() that takes an existing OdinConfig (the one stored by the overridden readMasterFile()) and therefore does not need to re-read the resources. But because many of the important methods in RuleReader are private, I had to port essentially the entire class. While this is doable, I'm not keen on diverging so drastically from the codebase.
So my first question is: is there a good reason for making most of the methods in RuleReader private?
And if there is, my next question is whether you have suggestions on how to handle this.
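For reference, here is a rough sketch of what relaxed visibility would allow, assuming the helper methods were protected instead of private. BaseReader, CachingReader and ReaderConfig are hypothetical names that only mirror the shape of the problem; they are not Odin's RuleReader, and rule construction is again reduced to variable interpolation.

```scala
// Hypothetical classes illustrating private vs. protected; NOT Odin code.
final case class ReaderConfig(variables: Map[String, String])

class BaseReader {
  // protected: subclasses may call this helper, outside callers may not.
  // If it were private, a subclass would have to copy it wholesale.
  protected def readRules(lines: Seq[String], config: ReaderConfig): Seq[String] =
    lines.map { line =>
      config.variables.foldLeft(line) { case (text, (name, value)) =>
        text.replace("${" + name + "}", value)
      }
    }

  // Builds a fresh config on every call.
  def readMasterFile(lines: Seq[String], variables: Map[String, String]): Seq[String] =
    readRules(lines, ReaderConfig(variables))
}

class CachingReader extends BaseReader {
  // The most recently built config, if any.
  private var lastConfig: Option[ReaderConfig] = None

  // Same behaviour as the parent, but remembers the config it built.
  override def readMasterFile(lines: Seq[String], variables: Map[String, String]): Seq[String] = {
    val config = ReaderConfig(variables)
    lastConfig = Some(config)
    readRules(lines, config)
  }

  // New entry point: reuse the stored config instead of rebuilding it.
  def readSimpleFile(lines: Seq[String]): Seq[String] =
    readRules(lines, lastConfig.getOrElse(ReaderConfig(Map.empty)))
}
```

The subclass adds only the caching logic and one new method; everything else is inherited, which is the kind of small divergence I was hoping for.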