Common Code Structure

Chris Lu edited this page Oct 24, 2015 · 1 revision

Static code in a dynamic world

Golang is statically compiled. As far as I know, there is no plan to add a feature that dynamically moves computation closures across servers.

So Glow uses this strategy: one binary, three running modes: standalone mode, driver mode, and task mode.

  register_one_or_more_flows  // shared by all modes
  ...
  if options "-glow.flow.id" and "-glow.taskGroup.id" are passed in {
     run in task mode
  } else if option "-glow" is passed in {
     run in driver mode
  } else {
     run in standalone mode
  }

This way, all the flows are statically registered. Given a "-glow.flow.id", the binary running in task mode knows which flow to execute.
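The mode check above can be sketched with the standard `flag` package. This is only an illustration, not Glow's actual implementation: the flag names come from the text, but their types (integers for the two ids, a boolean for `-glow`), the `-1` "not set" defaults, and the `selectMode` helper are assumptions.

```go
package main

import (
	"flag"
	"fmt"
	"io"
	"os"
)

// selectMode is a hypothetical helper that mirrors the pseudocode above.
// Flag types and the -1 "not set" defaults are assumptions of this sketch.
func selectMode(args []string) (string, error) {
	fs := flag.NewFlagSet("glow-sketch", flag.ContinueOnError)
	fs.SetOutput(io.Discard) // keep parse errors out of the example output
	flowID := fs.Int("glow.flow.id", -1, "flow to execute (task mode)")
	taskGroupID := fs.Int("glow.taskGroup.id", -1, "task group to execute (task mode)")
	driver := fs.Bool("glow", false, "run as driver")
	if err := fs.Parse(args); err != nil {
		return "", err
	}
	switch {
	case *flowID >= 0 && *taskGroupID >= 0:
		return "task", nil
	case *driver:
		return "driver", nil
	default:
		return "standalone", nil
	}
}

func main() {
	mode, err := selectMode(os.Args[1:])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(2)
	}
	fmt.Println("running in", mode, "mode")
}
```

The task-mode check comes first because a task process may also carry the driver's original options, including `-glow`.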

Simple case without driver-specific code

If you have just one flow, without any driver-specific logic, everything can go directly in main().

  func main(){
    flag.Parse()    // required for distributed execution
    flow.New().Map(...).Reduce(...).Join(...).Run()
  }

For many data processing jobs, this is enough.

Note: when the binary starts in task mode, all of the original driver's environment variables and command-line options are passed to it.
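To make the note concrete, here is a minimal sketch of how a task command line could be assembled from the driver's own options and environment. The `taskCommand` helper is hypothetical, and how Glow actually delivers and starts the binary on another machine is not shown here; only the two task flag names are taken from this page.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
)

// taskCommand is a hypothetical helper: it builds a command that re-runs
// the given binary with the driver's original options and environment,
// plus the two task-mode flags named earlier on this page.
func taskCommand(binary string, driverArgs, driverEnv []string, flowID, taskGroupID int) *exec.Cmd {
	args := append([]string{}, driverArgs...) // copy so the caller's slice is untouched
	args = append(args,
		fmt.Sprintf("-glow.flow.id=%d", flowID),
		fmt.Sprintf("-glow.taskGroup.id=%d", taskGroupID),
	)
	cmd := exec.Command(binary, args...)
	cmd.Env = append([]string{}, driverEnv...) // same environment as the driver
	return cmd
}

func main() {
	cmd := taskCommand(os.Args[0], os.Args[1:], os.Environ(), 0, 1)
	// Only print the command; actually starting it is left out of this sketch.
	fmt.Println(cmd.Args)
}
```

Because the task sees the same options as the driver, any flag the shared code reads after flag.Parse() has the same value in both modes.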

The case with driver-specific code

If we need to do additional things in driver mode, such as feeding input channels or reading from output channels, we need to mark where the shared code stops and where driver mode and task mode each go their own way.

We need to add this magic line:

  func main(){
    flag.Parse()
    f1 := flow.New().Map(...).Reduce(...).Join(...) // register one flow; shared by all modes
    flow.Ready()                     // the magic line!
    // start driver specific logic
    ...
    // run in driver mode
    f1.Run()
  }

Actually, the function name "Ready" may not convey its meaning well. If you have any suggestions, please let me know.

The flow.Ready() function checks which mode the binary should run in. In task mode, it runs the specified task and then stops. In driver mode or standalone mode, it simply returns and execution continues.
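The behavior described above can be sketched as follows. This is a stand-in, not Glow's code: the `ready` function, the `flowFunc` type, the use of negative ids to mean "no task flags passed", and the injected `exit` function are all assumptions made so the example is self-contained.

```go
package main

import (
	"fmt"
	"os"
)

// flowFunc stands in for a registered flow; the real type in Glow differs.
type flowFunc func(taskGroupID int)

// ready is a hypothetical stand-in for flow.Ready(). A negative id means
// "no task flags were passed" (an assumption of this sketch). exit is
// injected so the behavior can be exercised without ending the process.
func ready(flows []flowFunc, flowID, taskGroupID int, exit func(code int)) {
	if flowID < 0 || taskGroupID < 0 {
		return // driver or standalone mode: continue with the caller's code
	}
	if flowID >= len(flows) {
		fmt.Fprintln(os.Stderr, "unknown flow id", flowID)
		exit(1)
		return
	}
	flows[flowID](taskGroupID) // task mode: run only the requested task...
	exit(0)                    // ...and stop before any driver-specific code
}

func main() {
	flows := []flowFunc{
		func(tg int) { fmt.Println("flow 0 running task group", tg) },
	}
	ready(flows, -1, -1, os.Exit) // no task flags: returns immediately
	fmt.Println("driver-specific logic runs here")
}
```

The point of the pattern is that everything before the call is shared by all modes, and everything after it is never reached in task mode.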

Idiomatic way for all cases

When a Glow program starts, it should register the flow, or a list of flows. Golang's init() functions are a good fit for this, because they run before main() in every mode. So Glow's idiomatic way is to create the flows in init() functions.

  // file1.go
  var f1 *flow.FlowContext
  func init(){
    f1 = flow.New().Map(...).Reduce(...).Join(...)
  }
  
  // file2.go
  var f2 *flow.FlowContext
  func init(){
    f2 = flow.New().Map(...).Reduce(...).Join(...)
  }
  
  // main.go
  func main(){
    flag.Parse()
    flow.Ready()
    // starts driver specific logic
    ...
    // complicated logic, such as if/else, for loop, etc.
    if something {
      f1.Run()
    }else{
      f2.Run()
    }
  }
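The reason init() works well here is a Go language guarantee: all init() functions in a package run before main(). The sketch below shows that with a hypothetical `registry` slice and `register` helper standing in for Glow's flow registration. Both init() functions are placed in one file only to keep the example self-contained (Go allows several init() functions per file); in a real program they would live in separate files as shown above.

```go
package main

import "fmt"

// registry is a hypothetical stand-in for the list of flows Glow keeps.
var registry []string

// register returns the index of the new entry, which plays the role of
// a flow id in this sketch.
func register(name string) int {
	registry = append(registry, name)
	return len(registry) - 1
}

var f1, f2 int

// Within one file, init functions run in the order they appear.
func init() { f1 = register("flow one") }
func init() { f2 = register("flow two") }

func main() {
	// Both init functions have already run, so both flows exist
	// no matter which branch the code below would take.
	fmt.Println(f1, registry[f1]) // prints "0 flow one"
	fmt.Println(f2, registry[f2]) // prints "1 flow two"
}
```

Since the same binary registers the same flows in the same order every time it starts, an id handed out in driver mode refers to the same flow when the binary later starts in task mode.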