
Walkthrough #4

adamakhtar opened this issue Jul 26, 2012 · 15 comments

@adamakhtar
Member

I'll give this a shot, but instead of using the cuba-app example I'm going to use a much simpler one, like this:

### hello.rb

require "cuba"

Cuba.define do
  on get do
    on "hello" do
      res.write "Hello world!"
    end
  end
end

and a corresponding config.ru

# cat config.ru
require "./hello"

run Cuba
@adamakhtar
Member Author

On the console, you start your Cuba app with

rackup config.ru

which evaluates config.ru, which in turn requires hello.rb.

Cuba.define is called with a block. If we take a look at the define method, we see

  def self.define(&block)
    app.run new(&block)
  end

where app is a class method

  def self.app
    @app ||= Rack::Builder.new
  end

which lazily creates a Rack::Builder object and memoizes it in @app, so every call to app returns the same builder.

Recap of Rack: what's Rack::Builder?

Rack::Builder implements a small DSL to iteratively construct Rack applications.

It provides three instance methods to construct a Rack app: run, use and map.

run lets you specify the app that Rack::Builder will wrap.
use lets you stack middleware on top of the app you specified with run.
map lets you mount various apps and middleware at specific paths.
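To make run and use concrete, here is a minimal sketch that runs without the rack gem. TinyBuilder and AddHeader are hypothetical names of my own, not Rack classes, and this is a simplification of the idea rather than Rack::Builder's actual code: use records middleware, run records the endpoint, and to_app wraps the endpoint from the inside out so the first use ends up outermost.

```ruby
# Hypothetical stand-in for Rack::Builder (not Rack's real implementation).
class TinyBuilder
  def initialize(&block)
    @middleware = []
    @app = nil
    instance_eval(&block) if block # lets the block call use/run directly
  end

  # Remember a middleware class plus any extra constructor arguments.
  def use(klass, *args)
    @middleware << [klass, args]
  end

  # Remember the innermost app (anything that responds to #call).
  def run(app)
    @app = app
  end

  # Wrap from the inside out, so the first `use` becomes the outermost layer.
  def to_app
    @middleware.reverse.inject(@app) { |inner, (klass, args)| klass.new(inner, *args) }
  end
end

# Hypothetical middleware that adds one response header.
class AddHeader
  def initialize(app, name, value)
    @app, @name, @value = app, name, value
  end

  def call(env)
    status, headers, body = @app.call(env)
    [status, headers.merge(@name => @value), body]
  end
end

app = TinyBuilder.new do
  use AddHeader, "X-Outer", "1"
  use AddHeader, "X-Inner", "2"
  run lambda { |env| [200, { "Content-Type" => "text/plain" }, ["Hello"]] }
end.to_app

status, headers, body = app.call({})
```

The call returns the endpoint's tuple with both headers merged in, each layer having wrapped the one inside it.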

@cyx shared a link to an article (http://cyrildavid.com/articles/2012/02/16/composing-apps)
and gave an example of how to use map to compose an app made up of several other apps.

map "/blog" do
  run Blog
end

map "/support" do
  run Support
end

map "/docs" do
  run Docs
end

Here we use Rack::Builder to create a company website split into three separate apps: blog, docs and support.
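Roughly what map does under the hood, as a stdlib-only sketch: tiny_url_map and reporter are hypothetical helpers (Rack's real implementation is Rack::URLMap, which handles more cases, such as host matching). The matched prefix moves from PATH_INFO into SCRIPT_NAME, the same trick that comes up again later in the thread for Cuba's nested on calls.

```ruby
# Hypothetical simplification of URL mapping: mount apps on path prefixes.
def tiny_url_map(mapping)
  # Check longer prefixes first so a more specific mount would win.
  entries = mapping.sort_by { |prefix, _app| -prefix.length }

  lambda do |env|
    path = env["PATH_INFO"].to_s
    # A prefix matches the whole path or a path followed by "/".
    prefix, app = entries.find { |pre, _app| path == pre || path.start_with?("#{pre}/") }
    return [404, { "Content-Type" => "text/plain" }, ["Not Found"]] unless app

    # Move the mount point from PATH_INFO into SCRIPT_NAME for the sub-app.
    app.call(env.merge(
      "SCRIPT_NAME" => "#{env['SCRIPT_NAME']}#{prefix}",
      "PATH_INFO"   => path[prefix.length..-1]
    ))
  end
end

# Hypothetical sub-app that just reports what it sees.
def reporter(name)
  lambda do |env|
    [200, { "Content-Type" => "text/plain" },
     ["#{name}: SCRIPT_NAME=#{env['SCRIPT_NAME']} PATH_INFO=#{env['PATH_INFO']}"]]
  end
end

site = tiny_url_map(
  "/blog"    => reporter("blog"),
  "/support" => reporter("support"),
  "/docs"    => reporter("docs")
)

site.call("SCRIPT_NAME" => "", "PATH_INFO" => "/blog/posts/1")
```

The blog app sees only "/posts/1" in PATH_INFO, so it can route as if it were mounted at the root.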

Note that calling run inside a Rack::Builder block doesn't actually start a server or anything; it simply records the app to be wrapped.
To get a server started you would do something like this (note the handler's port option is spelled :Port):

Rack::Handler::Mongrel.run Rack::Builder.new { run lambda { |env| [200, { "Content-Type" => "text/plain" }, ["Hello"]] } }, :Port => 9879

Here's an example of Rack::Builder in use, from the Rack docs (http://rack.rubyforge.org/doc/classes/Rack/Builder.html):

app = Rack::Builder.new {
  use Rack::CommonLogger
  use Rack::ShowExceptions
  map "/lobster" do
    use Rack::Lint
    run Rack::Lobster.new
  end
}

OK, continuing.

@adamakhtar
Member Author

So we've got ourselves a Rack::Builder object that is yet to be configured.

So back in

  def self.define(&block)
    app.run new(&block)
  end

This is where we pass our app to Rack::Builder.

First, Cuba creates an instance of itself, passing in the block we defined in hello.rb:

Cuba.define do
  on get do
    on "hello" do
      res.write "Hello world!"
    end
  end
end

Cuba#initialize simply stores that block in the instance variable @blk (and sets @captures to an empty array):

  def initialize(&blk)
    @blk = blk
    @captures = []
  end

So now Rack::Builder has a Cuba object. But to be a valid Rack app, it must respond to call. Which it does!

  def call(env)
    dup.call!(env)
  end
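A cut-down sketch of the same shape, so you can see the stored block actually being run. TinyCuba is a hypothetical class, not Cuba itself: it skips Rack::Request, Cuba::Response and the catch(:halt), and simply returns whatever the block evaluates to.

```ruby
# Hypothetical TinyCuba mirroring the shape of Cuba#initialize, #call and #call!.
class TinyCuba
  attr_reader :env

  def initialize(&blk)
    @blk = blk # stored now, executed later, once per request
  end

  # Rack only needs #call(env) returning [status, headers, body].
  def call(env)
    dup.call!(env)
  end

  def call!(env)
    @env = env
    # instance_eval runs the stored block with self set to this copy,
    # which is why the block can call `env` (and, in Cuba, `on`, `res`...).
    instance_eval(&@blk)
  end
end

app = TinyCuba.new do
  [200, { "Content-Type" => "text/plain" }, ["You asked for #{env['PATH_INFO']}"]]
end

app.call("PATH_INFO" => "/hello")
```

Because of the dup, the instance stored in the builder never gets @env set; only the per-request copy does.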

@adamakhtar
Member Author

So in summary,

requiring hello.rb results in a Cuba instance being created, with the block passed to define stored away inside it.

So at run Cuba, I guess our app is handed to the server and simply waits for a request to come in.

When a request does hit the server, Rack handles it and calls our app, passing the environment hash.

So, assuming a request for "/hello",

our Cuba#call is called, which immediately duplicates itself (self being a Cuba instance) and calls call! on the copy.

I'm not sure why it's used here, but I usually use dup when I want to preserve the values of an object:

copy_obj = original_object.dup

a_method_which_could_change_stuff_in(copy_obj)

# original_object is still the way I want it!
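My guess (not confirmed anywhere in this thread) is that dup is used so that the @env, @req and @res that call! sets belong to a per-request copy rather than to the one shared Cuba instance. The hypothetical Handler class below stores per-request state in an instance variable and compares calling the shared object directly with calling a dup of it:

```ruby
# Hypothetical handler keeping per-request state in an ivar, like call! does.
class Handler
  def initialize(use_dup)
    @use_dup = use_dup
  end

  def call(env)
    (@use_dup ? dup : self).call!(env)
  end

  def call!(env)
    @body ||= []                              # meant to be a per-request buffer
    @body << "Hello from #{env['PATH_INFO']}"
    @body.dup
  end
end

shared = Handler.new(false)
shared.call("PATH_INFO" => "/a")
leaky = shared.call("PATH_INFO" => "/b")   # still holds the "/a" line too

copied = Handler.new(true)
copied.call("PATH_INFO" => "/a")
clean = copied.call("PATH_INFO" => "/b")   # only the "/b" line
```

dup is shallow: ivars already set on the original (such as @blk) are shared by reference with the copy. That is harmless as long as call! assigns fresh objects rather than mutating shared ones.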

Anyway, onwards!

  def call(env)
    dup.call!(env)
  end

  def call!(env)
    @env = env
    @req = Rack::Request.new(env)
    @res = Cuba::Response.new

    # This `catch` statement will either receive a
    # rack response tuple via a `halt`, or will
    # fall back to issuing a 404.
    #
    # When it `catch`es a throw, the return value
    # of this whole `call!` method will be the
    # rack response tuple, which is exactly what we want.
    catch(:halt) do
      instance_eval(&@blk)

      res.status = 404
      res.finish
    end
  end

Here we see call! store the env and create a Rack::Request object (a handy helper from Rack that provides methods such as get? and post?) and a Cuba::Response object.

Then on to the meat and potatoes.

We start with catch(:halt). If you're not familiar with catch and its counterpart throw, they're similar to raise and rescue, but whereas the latter are used for error situations, the former are used simply to control the flow of a program.

@cyx mentioned it's kind of similar to a goto statement. Avdi Grimm wrote a short, excellent explanation of it at RubyLearning.

There's no sign of its friend throw here, but we'll be meeting it shortly. In the meantime, catch sits waiting for it while the contents of the block it was passed are executed.
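Here is the catch/throw flow in isolation, as a small sketch: halt matches the one-line Cuba#halt quoted further down, while handle and its "/hello" check are made-up stand-ins for instance_eval(&@blk). Whatever is thrown becomes the value of catch; if nothing is thrown, the block's last expression (here, a 404 tuple) is returned instead.

```ruby
# Throwing :halt jumps straight back to the matching catch with a payload.
def halt(response)
  throw :halt, response
end

# Hypothetical request handler shaped like call!.
def handle(path)
  catch(:halt) do
    # Stand-in for instance_eval(&@blk): a "route" that may halt early.
    halt([200, { "Content-Type" => "text/plain" }, ["Hello world!"]]) if path == "/hello"

    # Only reached when nothing above threw :halt.
    [404, { "Content-Type" => "text/plain" }, []]
  end
end

handle("/hello")   # the tuple passed to halt
handle("/missing") # the 404 fallback
```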

And it's finally here that the DSL we used to create our app gets executed, with

instance_eval(&@blk)

@adamakhtar
Member Author

on get do
  on "hello" do
    res.write "Hello world!"
  end
end

So here's the on method:

  def on(*args, &block)
    try do
      @captures = []
      return unless args.all? { |arg| match(arg) }

      yield(*captures)

      halt res.finish
    end
  end

Immediately we see a call to Cuba#try (not Rails' #try). I don't fully understand this method, but I know it's something that helps with mapping things to paths (I think?). I'd love it if someone could explain how it works. For our simple app I don't think it does anything we need to care about, so I'm assuming the contents of the block passed to it are simply executed.

@captures gets set to [].

Now we come to

return unless args.all? { |arg| match(arg) }

Well, what are our args?

on get do...

So we have one: get. It's not a string or symbol but actually the result of calling a method named get:

def get; req.get? end

get simply returns true if the client's request was a GET request. In our case it was, so the boolean true is what on receives.

So args.all? { |arg| match(arg) }

becomes [true].all? { |arg| match(arg) }

So what does match(true) return?

  def match(matcher, segment = "([^\\/]+)")
    case matcher
    when String then consume(matcher.gsub(/:\w+/, segment))
    when Regexp then consume(matcher)
    when Symbol then consume(segment)
    when Proc   then matcher.call
    else
      matcher
    end
  end

In this case true is none of the four types in the case expression, so match simply returns it unchanged: true.

return unless args.all? { |arg| match(arg) }

This means the return above is skipped.
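A stripped-down sketch of that dispatch, keeping only the Proc branch and the fall-through branch. match here is a hypothetical simplification of Cuba's method, and matches? is my own helper standing in for the args.all? line. It also shows that all? stops at the first falsy argument, so later matchers are never evaluated:

```ruby
# Hypothetical, simplified match: only the Proc and fall-through branches.
def match(matcher)
  case matcher
  when Proc then matcher.call
  else matcher # true, false, nil... come straight back unchanged
  end
end

# Stand-in for `args.all? { |arg| match(arg) }` inside on.
def matches?(*args)
  args.all? { |arg| match(arg) }
end

matches?(true)                                # what `on get` sees for a GET
matches?(false)                               # e.g. `on post` during a GET
matches?(false, -> { raise "never called" })  # all? stops at the first falsy value
```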

Phew.

So next is

yield(*captures)

First, @captures was unaltered in our case. If you look back at the case expression in #match, you'll notice that for the String, Regexp and Symbol branches #consume is called, and that method can modify @captures. For now it's empty.

We then yield to the inner block of our DSL:

    on "hello" do
      res.write "Hello world!"
    end

which again calls on... yup, it's one of those Inception-style things, but bear with me.

Ignoring try again, we check whether the args ("hello") match.

So back in match we see

when String then consume(........

To keep things short: consume basically tests whether "hello" matches the start of the request's path and, if it does, tries to extract any params and store them in @captures (I'm not sure captures is for params; just a hunch).
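A standalone sketch of what a consume-style helper could look like. This is my own reconstruction, modeled on the path-shifting behaviour @ericgj describes further down the thread, not Cuba's verbatim source; env and captures are passed in explicitly here instead of being instance state. It anchors the pattern at the start of PATH_INFO, requires a "/" or the end of the string right after it (so "hello" won't match "/hellothere"), moves the matched piece into SCRIPT_NAME, and pushes extra groups, such as the one produced for :id, onto captures:

```ruby
# Hypothetical consume: match `pattern` at the start of PATH_INFO and shift it.
def consume(env, captures, pattern)
  md = env["PATH_INFO"].match(/\A\/(#{pattern})(\/|\z)/)
  return false unless md

  path, *vars = md.captures
  trailing = vars.pop # the "/" (or "") that followed the matched piece

  env["SCRIPT_NAME"] += "/#{path}"
  env["PATH_INFO"] = "#{trailing}#{md.post_match}"
  captures.concat(vars) # e.g. the value matched for ":id"
  true
end

# ":id"-style placeholders become a capture group, as in match's String branch.
segment = "([^\\/]+)"
env = { "SCRIPT_NAME" => "", "PATH_INFO" => "/users/1/posts" }
captures = []

consume(env, captures, "users/:id".gsub(/:\w+/, segment))
# env["SCRIPT_NAME"] is now "/users/1", env["PATH_INFO"] is "/posts", captures is ["1"]
```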

The result of match is truthy, so again the return is skipped and we go back to

yield(*captures), where finally

res.write "Hello world!" is called, writing "Hello world!" to the body of @res.

Unlike the previous levels, this innermost block doesn't call on again, so we return to:

      yield(*captures) #we just finished here

      halt res.finish
    end
  end

So we finally call halt res.finish,

where res.finish returns

  def finish
    [@status, @headers, @body]
  end

and that tuple is passed to halt as the response:

def halt(response)
    throw :halt, response
end

And here we finally get to meet catch's counterpart, mentioned earlier: throw, with its payload, the response.

We are immediately taken out of this Inception headbanger, back to the catch in call!(env):

def call!(env)

    ...
    # This `catch` statement will either receive a
    # rack response tuple via a `halt`, or will
    # fall back to issuing a 404.

    catch(:halt) do
      instance_eval(&@blk)

      res.status = 404
      res.finish
    end
end

catch receives our throw, which was buried deep in the bowels of that instance_eval, and returns its payload without executing the res.status = 404 code.

So call returns the response tuple, as Rack expects.

@adamakhtar
Member Author

In summary the DSL appears to act like a huge case statement.

It compares every 'on' expression with the request and if it doesn't match moves on to the next sibling 'on'. However, if it's true it descends down into the matched on via its block and continues to check if any of the nested 'on' expressions match the request. At some point we end up with a block of code intended to be the result.

I guess the reason we have the catch and throw system is to avoid unnecessary processing. Imagine if it were like this:

on get, "hello" do # two arguments here for brevity's sake, but perfectly valid in Cuba
  do_something
end

on post, "blah" do
  do_something_else
end

on get, "blahblah" do
  do_something_else!
end

If the request was a GET to "/hello", Cuba would find the correct code to run at the very first on. Without catch and throw, though, it would go on to check all the other routes unnecessarily.

I'm not sure whether that's a massive performance penalty, but still, it's nice to be lean.
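To put a number on the "unnecessary checks", here is a hypothetical flat route table mirroring the three on blocks above, with a counter. ROUTES and dispatch are made up for illustration; real Cuba nests its checks rather than looping over a table. With an early throw only the first route is examined; without it, all three are:

```ruby
# Hypothetical route table mirroring the three `on` blocks above.
ROUTES = [["GET", "hello"], ["POST", "blah"], ["GET", "blahblah"]]

# Returns [matched_route, number_of_routes_checked].
def dispatch(verb, path, early_exit)
  checks = 0
  found = nil
  result = catch(:halt) do
    ROUTES.each do |route_verb, route_path|
      checks += 1
      next unless route_verb == verb && route_path == path
      # With early exit, jump straight back to catch (like halt does).
      throw :halt, "#{route_verb} #{route_path}" if early_exit
      # Otherwise remember the first hit and keep looping.
      found ||= "#{route_verb} #{route_path}"
    end
    found
  end
  [result, checks]
end

dispatch("GET", "hello", true)  # => ["GET hello", 1]
dispatch("GET", "hello", false) # => ["GET hello", 3]
```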

Maybe someone can confirm this as being the primary reason or just a nice side effect.

Right that's me done.

Some things I'd like to know: what is @captures for, and what is the reason for the #try method?

@ericgj
Member

ericgj commented Jul 26, 2012

wow, @RoboDisco, nice job.

re. #try, see the point cyx makes in #3. The way nested on calls work, at least for the regexp/string/symbol matchers, is to progressively 'eat' the URL by shifting the matching piece from PATH_INFO into SCRIPT_NAME, and leaving the remainder in PATH_INFO. That way, a route nested inside another matches on the remainder of the path, rather than the whole path. So effectively when you have something like

# GET /users/1/posts
on 'users/:id' do

  # Here, PATH_INFO = "/posts", rather than "/users/1/posts", 
  # so that the following route matches properly
  on 'posts' do
      # ...   
  end

end

But the downside to mutating the path like this, is that you would be left with inconsistent state after the nested route finished, if it didn't get reset afterwards. I can't think of a great example, but say you have some Rack middleware that comes into play after your app finishes, and needs to access PATH_INFO. Then you want that to be reset to the full path, not the last matching piece of the path. So this is the function of #try: by wrapping your route handler in #try, the path gets reset before finishing (halt).

  # @private Used internally by #on to ensure that SCRIPT_NAME and
  #          PATH_INFO are reset to their proper values.
  def try
    script, path = env["SCRIPT_NAME"], env["PATH_INFO"]

    yield

  ensure
    env["SCRIPT_NAME"], env["PATH_INFO"] = script, path
  end
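The key property of #try is that ensure runs however the block is left, including when halt's throw unwinds through it. A small sketch using the same save/ensure/restore shape, except that env is passed in as a plain hash (an assumption made for the sake of a standalone example; Cuba reads it from its env method):

```ruby
# Save SCRIPT_NAME/PATH_INFO, run the block, and always restore them.
def try(env)
  script, path = env["SCRIPT_NAME"], env["PATH_INFO"]
  yield
ensure
  # Runs on normal return, on exceptions, and when a throw (halt) unwinds past us.
  env["SCRIPT_NAME"], env["PATH_INFO"] = script, path
end

env = { "SCRIPT_NAME" => "", "PATH_INFO" => "/users/1/posts" }

catch(:halt) do
  try(env) do
    # Pretend a matcher consumed "/users/1", then the route halted.
    env["SCRIPT_NAME"] = "/users/1"
    env["PATH_INFO"] = "/posts"
    throw :halt, :done
  end
end

env # back to SCRIPT_NAME "" and PATH_INFO "/users/1/posts"
```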

@cyx

cyx commented Jul 26, 2012

Hi Eric,

For your example, there should be no inconsistent state.

# GET /users/1/posts
on 'users/:id' do

  # Here, PATH_INFO = "/posts", rather than "/users/1/posts", 
  # so that the following route matches properly
  on "posts" do
     # PATH_INFO=""
  end

  # PATH_INFO back to /posts
end

# PATH_INFO back to /users/1/posts

Thanks,
cyx


@ericgj
Member

ericgj commented Jul 26, 2012

I think the Readme does a good job explaining captures, but they are basically very similar to Sinatra's - they are pieces of the URL that get matched and then passed into your route handler as parameters.

Except that Cuba has two kinds of captures that Sinatra doesn't have -

  1. on get, extension("css") do |basename| end will give you basename = "example" from the path GET /example.css
  2. on post, "foo", param("a"), param("b"), param("c") do |a,b,c| end will give you a, b, c = "1", "2", "3" from the path POST /foo?a=1&b=2&c=3 (or likewise if the params come from a form). Basically saving you the work of assigning local variables for each param within your handler.

(Note I haven't tested any of this, this is just going on what the Readme says and what it looks like in the code, the syntax may not be exactly right.)
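In the same untested spirit, here is a hypothetical sketch of how such matchers can work: each helper returns a lambda that on calls, and a successful lambda pushes a value onto captures, which on then yields to the block. TinyRouter and its method bodies are my own illustration modeled on the README description, not Cuba's implementation, and the HTTP-method check is left out.

```ruby
# Hypothetical router: each matcher is a lambda that returns truthy on
# success and may push a value onto @captures along the way.
class TinyRouter
  def initialize(path, params)
    @path = path       # stands in for PATH_INFO
    @params = params   # stands in for query/form params
    @captures = []
  end

  # Matches "/<basename>.<ext>" and captures the basename.
  def extension(ext)
    lambda do
      md = @path.match(/\A\/([^\/]+?)\.#{ext}\z/)
      md && @captures.push(md[1])
    end
  end

  # Succeeds if the param is present and captures its value.
  def param(key)
    lambda do
      value = @params[key]
      value && @captures.push(value)
    end
  end

  # Run every matcher; if all succeed, yield the captures to the block.
  def on(*matchers)
    @captures = []
    return nil unless matchers.all?(&:call)
    yield(*@captures)
  end
end

css = TinyRouter.new("/example.css", {})
css.on(css.extension("css")) { |basename| basename } # => "example"

form = TinyRouter.new("/foo", { "a" => "1", "b" => "2", "c" => "3" })
form.on(form.param("a"), form.param("b"), form.param("c")) { |a, b, c| [a, b, c] } # => ["1", "2", "3"]
```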

@ericgj
Member

ericgj commented Jul 26, 2012

@cyx, exactly, what I meant was (but wasn't very clear): if you didn't have the try wrapper, you'd be left with inconsistent state. Thus the need for try, which robodisco was asking about.

So it seems like the only bits of state that get mutated by the framework (and reset appropriately for nested routes), are the SCRIPT_NAME and PATH_INFO, and @captures. Is that right?

@adamakhtar
Member Author

Thanks @cyx and @ericgj, that helps out a lot. There are a few more things I'm not sure of, but a second walkthrough should clear things up.

One thing, however: I see you mentioning the word 'state' a lot. Is this a Cuba 'thing' or a Rack 'thing'? I feel a bit stupid asking :-) but, as I said in the beginning, there's no such thing as a stupid question. What other constants like SCRIPT_NAME are there relating to state?

@cyx

cyx commented Jul 26, 2012

Hi Erik,

Here's all the stuff that gets manipulated:

req - when you change SCRIPT_NAME and PATH_INFO, it technically changes.
res - when you write the response

and yes @captures, but this is more internal state rather than something that the user should know.

@RoboDisco - it's more a Rack thing. More examples of state related to rack can be seen in middleware,
which use env a lot, env["rack.session"] is the most common example, and last I checked the warden
middleware uses env a lot too.

Thanks,
cyx

Yes @captures, SCRIPT_NAME and PATH_INFO are the only ones manipulated.
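A small illustration of env as shared state. FakeSession is a made-up middleware (real session middleware such as Rack::Session::Cookie does much more); it writes into env["rack.session"] before calling the app, and the app reads from that same hash:

```ruby
# Hypothetical middleware that puts a value into env for the app to read.
class FakeSession
  def initialize(app)
    @app = app
  end

  def call(env)
    env["rack.session"] ||= { "user" => "guest" } # made-up session contents
    @app.call(env)
  end
end

app = FakeSession.new(lambda { |env|
  [200, { "Content-Type" => "text/plain" }, ["Hi #{env['rack.session']['user']}"]]
})

env = {}
app.call(env) # the app read what the middleware wrote into the shared env hash
```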

@adamakhtar
Member Author

Just realised I never mentioned everyone in this issue. Better late than never so....

oi @codereading/readers just done a walkthrough - come and check it out!

@adamakhtar
Member Author

thanks @cyx!

@cyx

cyx commented Jul 26, 2012

Hi @RoboDisco,

Good eye regarding the throw as a performance improvement. It's not much, but it was something like a 2-3ms improvement, depending on the number of on statements you have in your app (we tried with around 10 that time).

We used to do it differently and changed it somewhere along 2.x. Here's the sketch commit that I made 10 months ago: fe467d2

Thanks,
cyx

@cyx

cyx commented Jul 26, 2012

Aside from the performance improvement, it was also a refactoring, since it used to be that run depended on a throw, and the normal flow didn't. We kind of hit two birds with one stone by making throw :halt, tuple the de facto way to tell Cuba, "OK, we're done, here's the response". Overall we're still happy with it.
