unable to dynamically load a new rule requiring resources or variables #309

Closed
michaelcapizzi opened this issue Feb 18, 2019 · 1 comment
michaelcapizzi commented Feb 18, 2019

I'd like to be able to dynamically add a new rule to the ExtractorEngine.

I know how to do this with readSimpleFile(), but if the rule in question requires resources (e.g. Word2Vec) or variables, it has to go through readMasterFile(), which rereads all resources and variables every time it is called. If the Word2Vec file is large, this can be very slow.

I tried extending RuleReader so that it stores the OdinConfig built by readMasterFile() and can then read additional rules against that existing OdinConfig:

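// NOTE: imports assumed to mirror those used by RuleReader itself;
// adjust the StrSubstitutor package if your version uses commons-lang3
import java.nio.charset.Charset
import java.util.{Collection, Map => JMap}

import scala.collection.JavaConverters._
import scala.io.Source

import org.apache.commons.text.StrSubstitutor
import org.clulab.odin._
import org.clulab.odin.impl._
import org.yaml.snakeyaml.Yaml
import org.yaml.snakeyaml.constructor.{Constructor, ConstructorException}

/**
  * Custom Rule Reader that *stores* the [[OdinConfig]]
  */
class CustomRuleReader(
                      actions: Actions,
                      charset: Charset
                      ) extends RuleReader(actions, charset) {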

  // empty config
  var config = OdinConfig(resources = OdinResourceManager(Map.empty))

  /**
    * non-private form of readResources
    */
  def readResources(data: Map[String, Any]): OdinResourceManager = {
    //println(s"resources: ${data.get("resources")}")
    val resourcesMap: Map[String, String] = data.get("resources") match {
      case Some(m: JMap[_, _]) => m.asScala.map(pair => (pair._1.toString, pair._2.toString)).toMap
      case _ => Map.empty
    }
    OdinResourceManager(resourcesMap)
  }

  /**
    * non-private form of readTaxonomy
    */
  def readTaxonomy(data: Any): Taxonomy = data match {
    case t: Collection[_] => Taxonomy(t.asInstanceOf[Collection[Any]])
    case path: String =>
      val url = mkURL(path)
      val source = Source.fromURL(url)
      val input = source.mkString
      source.close()
      val yaml = new Yaml(new Constructor(classOf[Collection[Any]]))
      val data = yaml.load(input).asInstanceOf[Collection[Any]]
      Taxonomy(data)
  }

  /**
    * non-private form of readRules
    */
  def readRules(
                 rules: Collection[JMap[String, Any]],
                 config: OdinConfig
               ): Seq[Rule] = {

    // return Rule objects
    rules.asScala.toSeq.flatMap { r =>
      val m = r.asScala.toMap
      if (m contains "import") {
        // import rules from a file and return them
        importRules(m, config)
      } else {
        // gets a label and returns it and all its hypernyms
        val expand: String => Seq[String] = label => config.taxonomy match {
          case Some(t) => t.hypernymsFor(label)
          case None => Seq(label)
        }
        // interpolates a template variable with ${variableName} notation
        // note that $variableName is not supported and $ can't be escaped
        val template: Any => String = a => replaceVars(a.toString, config.variables)
        // return the rule (in a Seq because this is a flatMap)
        Seq(mkRule(m, expand, template, config))
      }
    }

  }

  /**
    * non-private form of importRules
    */
  def importRules(
                   data: Map[String, Any],
                   config: OdinConfig
                 ): Seq[Rule] = {
    // apply variable substitutions to import
    val path = {
      val p = data("import").toString
      val res = replaceVars(p, config.variables)
      res
    }
    val url = mkURL(path)
    val source = Source.fromURL(url)
    val input = source.mkString // slurp
    source.close()
    // read rules and vars from file
    val (jRules: Collection[JMap[String, Any]], localVars: Map[String, String]) = try {
      // try to read file with rules and optional vars by trying to read a JMap
      val yaml = new Yaml(new Constructor(classOf[JMap[String, Any]]))
      val data = yaml.load(input).asInstanceOf[JMap[String, Any]].asScala.toMap
      // read list of rules
      val jRules = data("rules").asInstanceOf[Collection[JMap[String, Any]]]
      // read optional vars
      val localVars = getVars(data)
      (jRules, localVars)
    } catch {
      case e: ConstructorException =>
        // try to read file with a list of rules by trying to read a Collection of JMaps
        val yaml = new Yaml(new Constructor(classOf[Collection[JMap[String, Any]]]))
        val jRules = yaml.load(input).asInstanceOf[Collection[JMap[String, Any]]]
        (jRules, Map.empty)
    }
    // variables specified by the call to `import`
    val importVars = getVars(data)
    // variable scope:
    // - an imported file may define its own variables (`localVars`)
    // - the importer file can define variables (`importerVars`) that override `localVars`
    // - a call to `import` can include variables (`importVars`) that override `importerVars`
    val updatedVars = localVars ++ config.variables ++ importVars
    val newConf = config.copy(variables = updatedVars)
    readRules(jRules, newConf)
  }

  /**
    * non-private form of cleanVar
    */
  def cleanVar(s: String): String = {
    val clean = s.
      replaceAll("\\$\\{\\s+", "\\$\\{").
      replaceAll("\\s+\\}", "\\}")
    clean
  }

  /**
    * non-private form of replaceVars
    */
  def replaceVars(s: String, vars: Map[String, String]): String = {
    val valuesMap = vars.asJava
    // NOTE: StrSubstitutor is NOT threadsafe
    val sub = new StrSubstitutor(valuesMap)
    // allow for recursive substitution
    sub.setEnableSubstitutionInVariables(true)
    val clean = cleanVar(s)
    sub.replace(clean)
  }

  /**
    * Override of readMasterFile that *also stores* the [[OdinConfig]]
    */
  override def readMasterFile(input: String): Vector[Extractor] = {
    val yaml = new Yaml(new Constructor(classOf[JMap[String, Any]]))
    val master = yaml.load(input).asInstanceOf[JMap[String, Any]].asScala.toMap
    val taxonomy = master.get("taxonomy").map(readTaxonomy)
    val vars = getVars(master)
    val resources = readResources(master)
    val jRules = master("rules").asInstanceOf[Collection[JMap[String, Any]]]
    val graph = getGraph(master)
    this.config = OdinConfig(taxonomy = taxonomy, resources = resources, variables = vars, graph = graph)
    val rules = readRules(jRules, this.config)

    mkExtractors(rules)
  }

  /**
    * alternative form of readSimpleFile that takes in an already-loaded Config
    */
  def readSimpleFile(input: String, loadedConfig: OdinConfig): Vector[Extractor] = {
    val yaml = new Yaml(new Constructor(classOf[Collection[JMap[String, Any]]]))
    val jRules = yaml.load(input).asInstanceOf[Collection[JMap[String, Any]]]
    // no resources are read here; reuse the ones already held by loadedConfig
    val rules = readRules(jRules, loadedConfig)
    mkExtractors(rules)
  }
}
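
To make the variable handling in importRules and cleanVar above easier to follow, here is a minimal, self-contained sketch that uses only the Scala standard library. The object name `VarScopeSketch` and the helper `mergeVars` are hypothetical names introduced for illustration; `cleanVar` uses the same two regexes as the code above. It shows that `++` on maps is right-biased, so variables passed with the `import` call win over the importer's variables, which in turn win over the imported file's own variables.

```scala
object VarScopeSketch {
  // Same regexes as cleanVar above: strips whitespace just inside ${ ... }
  def cleanVar(s: String): String =
    s.replaceAll("\\$\\{\\s+", "\\$\\{").replaceAll("\\s+\\}", "\\}")

  // Hypothetical helper: on key collisions the right-hand map wins,
  // so precedence is importVars > importerVars > localVars
  def mergeVars(
    localVars: Map[String, String],
    importerVars: Map[String, String],
    importVars: Map[String, String]
  ): Map[String, String] = localVars ++ importerVars ++ importVars

  def main(args: Array[String]): Unit = {
    val merged = mergeVars(
      localVars = Map("label" -> "Local", "priority" -> "1"),
      importerVars = Map("label" -> "Importer"),
      importVars = Map("label" -> "Import")
    )
    println(merged("label"))         // prints "Import"
    println(merged("priority"))      // prints "1"
    println(cleanVar("${ label }"))  // prints "${label}"
  }
}
```

Keys that are defined only in the imported file (such as `priority` here) survive the merge untouched; only colliding keys are overridden.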

The most important part is the second form of readSimpleFile(), which takes an existing OdinConfig and therefore does not need to re-read the resources. But because many of the important methods in RuleReader are private, I basically had to port the entire class. While this is doable, I'm not keen on diverging so drastically from the codebase.
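
The load-once, reuse-later idea can be sketched independently of Odin. Everything in the block below is a hypothetical stand-in, not part of the library: `LoadedConfig` plays the role of OdinConfig, `CachingReader` the role of the custom reader, and `loadResources` the role of the expensive Word2Vec load. It only illustrates the shape of the approach, under the assumption that rules read later need nothing beyond what the stored config already holds.

```scala
object ConfigReuseSketch {
  // Hypothetical stand-in for OdinConfig: the parts that are costly to build
  final case class LoadedConfig(
    resources: Map[String, String],
    variables: Map[String, String]
  )

  // Hypothetical reader that counts how often the expensive load runs
  class CachingReader(loadResources: () => Map[String, String]) {
    var loadCount = 0
    var config: LoadedConfig = LoadedConfig(Map.empty, Map.empty)

    // Analogous to readMasterFile: pays the full cost and stores the result
    def readMaster(variables: Map[String, String]): Unit = {
      loadCount += 1
      config = LoadedConfig(loadResources(), variables)
    }

    // Analogous to readSimpleFile(input, loadedConfig): substitutes variables
    // from the config it is given and never triggers a reload
    def readSimple(rule: String, loaded: LoadedConfig): String =
      loaded.variables.foldLeft(rule) { case (acc, (name, value)) =>
        acc.replace("${" + name + "}", value)
      }
  }

  def main(args: Array[String]): Unit = {
    // "path/to/vectors.txt" is a placeholder, not a real resource
    val reader = new CachingReader(() => Map("embeddings" -> "path/to/vectors.txt"))
    reader.readMaster(Map("trigger" -> "bind"))
    println(reader.readSimple("pattern: ${trigger}", reader.config))
    println(reader.readSimple("pattern: ${trigger}s", reader.config))
    println(reader.loadCount) // prints 1: resources were loaded only once
  }
}
```

Passing the config explicitly, rather than reading it from a field inside `readSimple`, keeps it obvious at the call site which resources and variables a new rule is compiled against.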

So my first question is: "Is there a good reason for making most of the methods in RuleReader private?"

And if there is, my next question is how you would suggest handling this.

@kwalcock
Member

@michaelcapizzi, you may be interested in #717.
