Chris Lu edited this page Nov 1, 2015 · 3 revisions

Use struct

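Many other systems favor a database-oriented row/column approach when dealing with data. However, that approach loses type information: column types are only checked at run time.

With Glow, it's totally fine to use plain Go structs and keep the compiler's type checking.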

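  // Assumes: import "github.com/chrislusf/glow/flow"
  type Product struct {
    Id   int
    Name string
  }
  type Rating struct {
    ProductId int
    Score     int
  }

  // Input channels carrying typed structs.
  productChan := make(chan Product)
  ratingChan := make(chan Rating)

  f := flow.New()
  // Key each product by its id.
  prod := f.Channel(productChan).Map(func(p Product) (int, Product) {
    return p.Id, p
  })
  // Key each rating by the id of the product it rates.
  rating := f.Channel(ratingChan).Map(func(r Rating) (int, Rating) {
    return r.ProductId, r
  })
  // Join on the shared key; the callback receives the typed structs.
  prod.Join(rating).Map(func(id int, p Product, r Rating) {
    ...
  })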
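To make the type-information point concrete, here is a minimal sketch in plain Go. It does not use Glow's API; `scoreFromRow` and `scoreFromStruct` are hypothetical helpers invented for illustration. The row-style record needs run-time checks on column position and type, while the struct version gets those checks from the compiler.

```go
package main

import (
	"errors"
	"fmt"
)

// Rating mirrors the struct used in the flow above.
type Rating struct {
	ProductId int
	Score     int
}

// scoreFromRow (hypothetical) reads the score from a row-style record.
// Column count and column type can only be verified at run time.
func scoreFromRow(row []interface{}) (int, error) {
	if len(row) < 2 {
		return 0, errors.New("row has too few columns")
	}
	score, ok := row[1].(int)
	if !ok {
		return 0, fmt.Errorf("column 1 is %T, not int", row[1])
	}
	return score, nil
}

// scoreFromStruct (hypothetical) reads the same value from a struct.
// A misspelled field or a wrong type would not compile.
func scoreFromStruct(r Rating) int {
	return r.Score
}

func main() {
	// The score was stored as a string by mistake; only the
	// run-time check in scoreFromRow can catch this.
	row := []interface{}{42, "5"}
	if _, err := scoreFromRow(row); err != nil {
		fmt.Println("row error:", err)
	}

	fmt.Println("struct score:", scoreFromStruct(Rating{ProductId: 42, Score: 5}))
}
```

With structs, the mistake in `row` above cannot be written at all: `Rating{ProductId: 42, Score: "5"}` is rejected by the compiler.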



Standalone Mode vs Distributed Mode

In many other frameworks, the standalone mode is mostly for learning or debugging, not for real production usage.

With Glow, the standalone mode runs efficiently on Go's idiomatic goroutines and channels, and skips steps that only the distributed mode needs, e.g., serializing/deserializing, network hops, and complicated scheduling. The standalone mode may actually be faster than the distributed mode running on 2 to 3 nodes.
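As a rough illustration of why in-process execution is cheap, here is a sketch in plain Go. It is not Glow's implementation; `sumScores` is a hypothetical function, and it assumes `workers` is at least 1. Structs travel between goroutines over channels as ordinary in-memory values, so nothing is serialized and nothing crosses the network.

```go
package main

import (
	"fmt"
	"sync"
)

type Rating struct {
	ProductId int
	Score     int
}

// sumScores (hypothetical) fans ratings out to `workers` goroutines and
// returns the total score per product. Assumes workers >= 1.
func sumScores(ratings []Rating, workers int) map[int]int {
	in := make(chan Rating)
	partials := make(chan map[int]int)

	// Each worker aggregates the ratings it receives into a local map.
	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			local := make(map[int]int)
			for r := range in {
				local[r.ProductId] += r.Score
			}
			partials <- local
		}()
	}

	// Feed the input, then signal the workers that no more data is coming.
	go func() {
		for _, r := range ratings {
			in <- r
		}
		close(in)
	}()

	// Close the results channel once every worker has reported.
	go func() {
		wg.Wait()
		close(partials)
	}()

	// Merge the per-worker partial sums.
	totals := make(map[int]int)
	for p := range partials {
		for id, s := range p {
			totals[id] += s
		}
	}
	return totals
}

func main() {
	ratings := []Rating{{1, 5}, {2, 3}, {1, 4}}
	fmt.Println(sumScores(ratings, 3))
}
```

A distributed run of the same aggregation would additionally have to encode each `Rating`, ship it to another machine, and decode it there, which is the overhead the standalone mode avoids.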

In many cases, your data is not that large, really! I recommend starting with Glow's standalone mode. At the very least, it is easier to debug and to analyze for performance.
