Added a pure-ruby version of the parser for times when you can't compile the C or Java extensions. You should avoid using it if at all possible because it is 32 times slower (and has some other problems, too)!

Squashed commit of the following:

commit d82359a74e543a6c25f5783983926c117045e0bb
Author: Jason Garber <jg@jasongarber.com>
Date:   Mon Mar 16 14:44:20 2009 -0400

    Update with best code style for pureruby (result of optimization was F1)

commit aa01f61fb075a50d44c29f0831c6b2cb3ee90bee
Author: Jason Garber <jg@jasongarber.com>
Date:   Mon Mar 16 13:17:32 2009 -0400

    Fix latex double-escaping some characters in pureruby parser

commit 7a09a7c25b2087d9b6f658b3d57443668cc41ef8
Author: Jason Garber <jg@jasongarber.com>
Date:   Mon Mar 16 09:08:08 2009 -0400

    Update profiler so rake pureruby optimize works

commit ffd93efc1228be4e245bde536518f60bd47aa6be
Author: Jason Garber <jg@jasongarber.com>
Date:   Sun Mar 15 12:26:46 2009 -0400

    cleanup unneeded eof variable

commit dc1b0f89d1ff868c285ad1d0eaa2e494757950ea
Author: Jason Garber <jg@jasongarber.com>
Date:   Sun Mar 15 11:46:30 2009 -0400

    Fix custom tag fallback

commit e10c378f57a232915efc42f8d78859aae55b7a0a
Author: Jason Garber <jg@jasongarber.com>
Date:   Sun Mar 15 11:35:41 2009 -0400

    Fixed attributes getting escaped multiple times because of passing by reference

commit 20c50513f521299d1917fcefa7fb2aaaf149117e
Author: Jason Garber <jg@jasongarber.com>
Date:   Sun Mar 15 11:13:12 2009 -0400

    fix list continuation (hope JRuby works; can't test from this machine)

commit 5000560e28945910d41b580bd3bb01fb75167536
Author: Jason Garber <jg@jasongarber.com>
Date:   Sun Mar 15 10:31:03 2009 -0400

    Fix attribute parser running the wrong machine

commit 71d7f9d1c0fcd654b39c41de3a341e8f6d95c400
Author: Jason Garber <jg@jasongarber.com>
Date:   Sun Mar 15 10:12:03 2009 -0400

    Fix TRANSFORM macro

commit 9824be7410ef9224c80227105059e732f9a908a3
Author: Jason Garber <jg@jasongarber.com>
Date:   Sun Mar 15 10:02:42 2009 -0400

    Fixed comparison error when @reg was not set

commit 0f32766aa906bad9514198eab9694cee369d7cd5
Author: Jason Garber <jg@jasongarber.com>
Date:   Sun Mar 15 09:43:56 2009 -0400

    Fix backup attributes not being set in regs properly.

commit 65a45406a8b81b2be5c43b8fb172b6129214e9d4
Author: Jason Garber <jg@jasongarber.com>
Date:   Sun Mar 15 09:16:07 2009 -0400

    Fixed definition lists

commit 8441dc220d351264f6f7357eb681402d17d4e0fe
Author: Jason Garber <jg@jasongarber.com>
Date:   Sun Mar 15 08:55:35 2009 -0400

    Fix capture of title/alt

commit c6b44a228c5d248e91fefa61da7da350279d10a3
Author: Jason Garber <jg@jasongarber.com>
Date:   Sun Mar 15 08:27:59 2009 -0400

    Fix ignore action

commit 091fb62632ce21ef75fbd3575a185c7427461aa8
Author: Jason Garber <jg@jasongarber.com>
Date:   Sun Mar 15 03:04:35 2009 -0400

    Fixed lists and lookaheads, among other things.

commit 6d38837050d593db29d40d059e884ea3e0fd5b21
Author: Jason Garber <jg@jasongarber.com>
Date:   Fri Mar 13 16:41:20 2009 -0400

    Fix it up just a bit.

commit 96a54a7188c73c670d11e75e4177de4a39e944bb
Author: Jason Garber <jg@jasongarber.com>
Date:   Fri Mar 13 15:36:27 2009 -0400

    Switch back to instances, but with class methods for invoking.  Nearly working now.

commit 2437f528ffc5ca1f30f9774c2c8fe025e6216468
Author: Jason Garber <jg@jasongarber.com>
Date:   Thu Mar 12 17:16:34 2009 -0400

    Converted scanners to class methods & class variables. Nearly works.

commit 3d7c554e7cfca60c7f7a1fcc69b1cbfce54038d6
Author: Jason Garber <jg@jasongarber.com>
Date:   Thu Mar 12 14:32:51 2009 -0400

    Almost working.  Maybe should convert them to class methods, though.

commit c375474f4f2773458cb0b4b50c6ea5c3a49cca71
Author: Jason Garber <jg@jasongarber.com>
Date:   Tue Mar 10 16:42:32 2009 -0400

    Convert the rest to ruby and hook lots of things up.  Still not working, though.

commit 1b379c147b0b90aa314117014308f3d215b1c3b0
Author: Jason Garber <jg@jasongarber.com>
Date:   Mon Mar 9 12:32:53 2009 -0400

    wip

commit 3393bb4a9a0e59cb2455f68844e93b9ec4ba82b2
Author: Jason Garber <jg@jasongarber.com>
Date:   Mon Mar 9 09:31:47 2009 -0400

    wip (15 minutes)

commit ebfbe336bff92ec85fa7d7a5602f4670b5598606
Author: Jason Garber <jg@jasongarber.com>
Date:   Fri Mar 6 08:22:50 2009 -0500

    Add files that were missing before.  WIP on parser macros. (15 min)

commit e7a92338f3c2417b763bb9f8c8e75a733bb453f7
Author: Jason Garber <jg@jasongarber.com>
Date:   Fri Mar 6 08:10:46 2009 -0500

    Converted everything to pure Ruby.  Haven't tested yet, though. (100 minutes)
jgarber committed Mar 16, 2009
1 parent 0142eb1 commit 11f8f8a
Showing 21 changed files with 775 additions and 198 deletions.
2 changes: 2 additions & 0 deletions .gitignore
@@ -5,9 +5,11 @@ ext/redcloth_scan/*.so
ext/redcloth_scan/*.jar
ext/redcloth_scan/*.class
ext/redcloth_scan/*.java
ext/redcloth_scan/redcloth_*.rb
lib/*.bundle
lib/*.so
lib/*.jar
lib/redcloth_scan.rb
doc/rdoc/*
tmp/*
pkg/*
2 changes: 2 additions & 0 deletions CHANGELOG
@@ -1,5 +1,7 @@
=== Edge

* Added a pure-ruby version of the parser for times when you can't compile the C or Java extensions. You should avoid using it if at all possible because it is 32 times slower (and has some other problems, too)! [Jason Garber]

* Ignore spaces and tabs on blank lines between blocks. #120 [Jason Garber]

* Allow HTML tags with quoted attributes to be inside link text. To do this, I had to remove the possibility that attributes in HTML tags could have spaces around the equals sign or unquoted attributes. This change also greatly expands the complexity of the state machine, so compilation takes a long time. Sorry. [Jason Garber]
5 changes: 5 additions & 0 deletions Manifest
@@ -6,15 +6,19 @@ ext/redcloth_scan/extconf.rb
ext/redcloth_scan/redcloth.h
ext/redcloth_scan/redcloth_attributes.c.rl
ext/redcloth_scan/redcloth_attributes.java.rl
ext/redcloth_scan/redcloth_attributes.rb.rl
ext/redcloth_scan/redcloth_attributes.rl
ext/redcloth_scan/redcloth_common.c.rl
ext/redcloth_scan/redcloth_common.java.rl
ext/redcloth_scan/redcloth_common.rb.rl
ext/redcloth_scan/redcloth_common.rl
ext/redcloth_scan/redcloth_inline.c.rl
ext/redcloth_scan/redcloth_inline.java.rl
ext/redcloth_scan/redcloth_inline.rb.rl
ext/redcloth_scan/redcloth_inline.rl
ext/redcloth_scan/redcloth_scan.c.rl
ext/redcloth_scan/redcloth_scan.java.rl
ext/redcloth_scan/redcloth_scan.rb.rl
ext/redcloth_scan/redcloth_scan.rl
extras/ragel_profiler.rb
lib/case_sensitive_require/RedCloth.rb
@@ -26,6 +30,7 @@ lib/redcloth/formatters/latex_entities.yml
lib/redcloth/textile_doc.rb
lib/redcloth/version.rb
lib/redcloth.rb
lib/tasks/pureruby.rake
Manifest
Rakefile
README
63 changes: 56 additions & 7 deletions Rakefile
@@ -2,12 +2,13 @@ require 'lib/redcloth/version'
require 'rubygems'
gem 'echoe', '>= 3.0.1'
require 'echoe'
Dir["#{File.dirname(__FILE__)}/lib/tasks/*.rake"].sort.each { |ext| load(ext) }

e = Echoe.new('RedCloth', RedCloth::VERSION.to_s) do |p|
p.summary = RedCloth::DESCRIPTION
p.author = "Jason Garber"
p.email = 'redcloth-upwards@rubyforge.org'
p.clean_pattern += ['ext/redcloth_scan/**/*.{bundle,so,obj,pdb,lib,def,exp,c,o,xml,class,jar,java}', 'lib/*.{bundle,so,o,obj,pdb,lib,def,exp,jar}', 'ext/redcloth_scan/Makefile']
p.clean_pattern += ['ext/redcloth_scan/**/*.{bundle,so,obj,pdb,lib,def,exp,c,o,xml,class,jar,java}', 'lib/*.{bundle,so,o,obj,pdb,lib,def,exp,jar}', 'ext/redcloth_scan/**/redcloth_*.rb', 'lib/redcloth_scan.rb', 'ext/redcloth_scan/Makefile']
p.url = "http://redcloth.org"
p.project = "redcloth"
p.rdoc_pattern = ['README', 'COPING', 'CHANGELOG', 'lib/**/*.rb', 'doc/**/*.rdoc']
@@ -20,10 +21,14 @@ e = Echoe.new('RedCloth', RedCloth::VERSION.to_s) do |p|
p.platform = 'x86-mswin32-60'
elsif Platform.java?
p.platform = 'universal-java'
elsif RUBY_PLATFORM == 'pureruby'
p.platform = 'ruby'
end

if RUBY_PLATFORM =~ /mingw|mswin|java/
p.need_tar_gz = false
elsif RUBY_PLATFORM == 'pureruby'
p.need_gem = false
else
p.need_zip = true
p.need_tar_gz = true
@@ -36,6 +41,8 @@ e = Echoe.new('RedCloth', RedCloth::VERSION.to_s) do |p|
self.files += ['lib/redcloth_scan.so']
when /java/
self.files += ['lib/redcloth_scan.jar']
when 'pureruby'
self.files += ['lib/redcloth_scan.rb']
else
self.files += %w[attributes inline scan].map {|f| "ext/redcloth_scan/redcloth_#{f}.c"}
end
@@ -45,7 +52,9 @@ e = Echoe.new('RedCloth', RedCloth::VERSION.to_s) do |p|

end

#### Pre-compiled extensions for alternative platforms
def remove_other_platforms
Dir["lib/redcloth_scan.{bundle,so,jar,rb}"].each { |file| rm file }
end

def move_extensions
Dir["ext/**/*.{bundle,so,jar}"].each { |file| mv file, "lib/" }
@@ -75,6 +84,7 @@ when /mingw/
ruby "-I. extconf.rb"
system(PLATFORM =~ /mswin/ ? 'nmake' : 'make')
end
remove_other_platforms
move_extensions
rm "#{ext}/rbconfig.rb"
end
@@ -86,9 +96,19 @@ when /java/
sources = FileList["#{ext}/**/*.java"].join(' ')
sh "javac -target 1.5 -source 1.5 -d #{ext} #{java_classpath_arg} #{sources}"
sh "jar cf lib/redcloth_scan.jar -C #{ext} ."
remove_other_platforms
move_extensions
end

when /pureruby/
filename = "lib/redcloth_scan.rb"
file filename => FileList["#{ext}/redcloth_scan.rb", "#{ext}/redcloth_inline.rb", "#{ext}/redcloth_attributes.rb"] do |task|

remove_other_platforms
sources = task.prerequisites.join(' ')
sh "cat #{sources} > #{filename}"
end

else
filename = "#{ext}/redcloth_scan.#{Config::CONFIG['DLEXT']}"
file filename => FileList["#{ext}/redcloth_scan.c", "#{ext}/redcloth_inline.c", "#{ext}/redcloth_attributes.c"]
@@ -97,8 +117,21 @@ end
task :compile => [filename]

def ragel(target_file, source_file)
host_language = (target_file =~ /java$/) ? "J" : "C"
code_style = (host_language == "C") ? " -" + (@code_style || "T0") : ""
host_language = case target_file
when /java$/
"J"
when /rb$/
"R"
else
"C"
end
preferred_code_style = case host_language
when "R"
"F1"
else
"T0"
end
code_style = " -" + (@code_style || preferred_code_style)
ensure_ragel_version(target_file) do
sh %{ragel #{source_file} -#{host_language}#{code_style} -o #{target_file}}
end
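A minimal standalone sketch of how the revised `ragel` method appears to pick its flags, assuming the new `case` branches replace the old ternaries. The helper names `host_language_flag`, `preferred_code_style`, and `ragel_command` are hypothetical and only illustrate the selection logic; the sketch builds the command string without running Ragel.

```ruby
# Hypothetical helper: maps a target file name to Ragel's host-language flag.
def host_language_flag(target_file)
  case target_file
  when /java$/ then "J"
  when /rb$/   then "R"
  else              "C"
  end
end

# Hypothetical helper: the default code style per host language.
# The commit message reports F1 as the result of optimizing the Ruby target.
def preferred_code_style(host_language)
  host_language == "R" ? "F1" : "T0"
end

# Builds the same command string shape the Rakefile passes to `sh`.
# `code_style` plays the role of @code_style set by the optimize task.
def ragel_command(target_file, source_file, code_style = nil)
  lang  = host_language_flag(target_file)
  style = code_style || preferred_code_style(lang)
  "ragel #{source_file} -#{lang} -#{style} -o #{target_file}"
end

puts ragel_command("ext/redcloth_scan/redcloth_scan.rb",
                   "ext/redcloth_scan/redcloth_scan.rb.rl")
```

Passing an explicit third argument mirrors what the `optimize` task does when it sets `@code_style` before each run.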
@@ -129,23 +162,39 @@ file "#{ext}/RedclothAttributes.java" => ["#{ext}/redcloth_attributes.java.rl",
ragel "#{ext}/RedclothAttributes.java", "#{ext}/redcloth_attributes.java.rl"
end

# Ragel-generated pureruby files
file "#{ext}/redcloth_scan.rb" => ["#{ext}/redcloth_scan.rb.rl", "#{ext}/redcloth_scan.rl", "#{ext}/redcloth_common.rb.rl", "#{ext}/redcloth_common.rl"] do
ragel "#{ext}/redcloth_scan.rb", "#{ext}/redcloth_scan.rb.rl"
end
file "#{ext}/redcloth_inline.rb" => ["#{ext}/redcloth_inline.rb.rl", "#{ext}/redcloth_inline.rl", "#{ext}/redcloth_common.rb.rl", "#{ext}/redcloth_common.rl"] do
ragel "#{ext}/redcloth_inline.rb", "#{ext}/redcloth_inline.rb.rl"
end
file "#{ext}/redcloth_attributes.rb" => ["#{ext}/redcloth_attributes.rb.rl", "#{ext}/redcloth_attributes.rl", "#{ext}/redcloth_common.rb.rl", "#{ext}/redcloth_common.rl"] do
ragel "#{ext}/redcloth_attributes.rb", "#{ext}/redcloth_attributes.rb.rl"
end


#### Optimization

# C/Ruby code styles
RAGEL_CODE_GENERATION_STYLES = {
'T0' => "Table driven FSM (default)",
'T1' => "Faster table driven FSM",
'F0' => "Flat table driven FSM",
'F1' => "Faster flat table-driven FSM",
'F1' => "Faster flat table-driven FSM"
}
# C only code styles
RAGEL_CODE_GENERATION_STYLES.merge!({
'G0' => "Goto-driven FSM",
'G1' => "Faster goto-driven FSM",
'G2' => "Really fast goto-driven FSM"
}
}) if RUBY_PLATFORM !~ /pureruby/
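Because this page shows the hunk without its +/- markers, old and new lines sit side by side. The following is a hedged sketch of what the style table seems to look like after the change: the table-driven styles are always offered, and the goto-driven ones are merged in only when the platform is not `pureruby`. The constant names `SHARED_STYLES` and `C_ONLY_STYLES` and the method `code_generation_styles` are assumptions made for illustration; the Rakefile itself uses a single `RAGEL_CODE_GENERATION_STYLES` hash.

```ruby
# Styles listed under "C/Ruby code styles" in the Rakefile.
SHARED_STYLES = {
  'T0' => "Table driven FSM (default)",
  'T1' => "Faster table driven FSM",
  'F0' => "Flat table driven FSM",
  'F1' => "Faster flat table-driven FSM"
}.freeze

# Styles listed under "C only code styles".
C_ONLY_STYLES = {
  'G0' => "Goto-driven FSM",
  'G1' => "Faster goto-driven FSM",
  'G2' => "Really fast goto-driven FSM"
}.freeze

# `platform` stands in for RUBY_PLATFORM so the logic can be tried
# without redefining that constant.
def code_generation_styles(platform)
  styles = SHARED_STYLES.dup
  styles.merge!(C_ONLY_STYLES) if platform !~ /pureruby/
  styles
end

puts code_generation_styles('pureruby').keys.sort.join(', ')
```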

desc "Find the fastest code generation style for Ragel"
task :optimize do
require 'extras/ragel_profiler'
results = []

RAGEL_CODE_GENERATION_STYLES.each do |style, name|
@code_style = style
profiler = RagelProfiler.new(style + " " + name)
@@ -162,7 +211,7 @@ task :optimize do
profiler.measure(:test) do
Rake::Task['test'].invoke
end
profiler.ext_size(ext_so)
profiler.ext_size(filename)

end
puts RagelProfiler.results
13 changes: 9 additions & 4 deletions ext/redcloth_scan/redcloth.h
@@ -51,6 +51,8 @@ VALUE red_pass_code(VALUE, VALUE, VALUE, ID);
/* parser macros */
#define CLEAR_REGS() regs = rb_hash_new();
#define RESET_REG() reg = NULL
#define MARK() reg = p;
#define MARK_B() bck = p;
#define CAT(H) rb_str_cat(H, ts, te-ts)
#define CLEAR(H) H = STR_NEW2("")
#define RSTRIP_BANG(H) rb_funcall(H, rb_intern("rstrip!"), 0)
@@ -62,19 +64,19 @@ VALUE red_pass_code(VALUE, VALUE, VALUE, ID);
#define PARSE_ATTR(A) red_parse_attr(self, regs, ID2SYM(rb_intern(A)))
#define PARSE_LINK_ATTR(A) red_parse_link_attr(self, regs, ID2SYM(rb_intern(A)))
#define PARSE_IMAGE_ATTR(A) red_parse_image_attr(self, regs, ID2SYM(rb_intern(A)))
#define PASS_CODE(H, A, T, O) rb_str_append(H, red_pass_code(self, regs, ID2SYM(rb_intern(A)), rb_intern(T)))
#define PASS_CODE(H, A, T) rb_str_append(H, red_pass_code(self, regs, ID2SYM(rb_intern(A)), rb_intern(T)))
#define ADD_BLOCK() \
rb_str_append(html, red_block(self, regs, block, refs)); \
extend = Qnil; \
CLEAR(block); \
CLEAR_REGS()
#define ADD_EXTENDED_BLOCK() rb_str_append(html, red_block(self, regs, block, refs)); CLEAR(block);
#define END_EXTENDED() extend = Qnil; CLEAR_REGS();
#define IS_NOT_EXTENDED() NIL_P(extend)
#define ADD_BLOCKCODE() rb_str_append(html, red_blockcode(self, regs, block)); CLEAR(block); CLEAR_REGS()
#define ADD_EXTENDED_BLOCKCODE() rb_str_append(html, red_blockcode(self, regs, block)); CLEAR(block);
#define ASET(T, V) rb_hash_aset(regs, ID2SYM(rb_intern(T)), STR_NEW2(V));
#define AINC(T) red_inc(regs, ID2SYM(rb_intern(T)));
#define INC(N) N++;
#define SET_ATTRIBUTES() \
SET_ATTRIBUTE("class_buf", "class"); \
SET_ATTRIBUTE("id_buf", "id"); \
@@ -141,6 +143,9 @@ VALUE red_pass_code(VALUE, VALUE, VALUE, ID);
#define STORE_LINK_ALIAS() \
rb_hash_aset(refs_found, rb_hash_aref(regs, ID2SYM(rb_intern("text"))), rb_hash_aref(regs, ID2SYM(rb_intern("href"))))
#define CLEAR_LIST() list_layout = rb_ary_new()
#define SET_LIST_TYPE(T) list_type = T;
#define NEST() nest ++;
#define RESET_NEST() nest = 0;
#define LIST_ITEM() \
int aint = 0; \
VALUE aval = rb_ary_entry(list_index, nest-1); \
@@ -152,9 +157,9 @@ VALUE red_pass_code(VALUE, VALUE, VALUE, ID);
if (nest > RARRAY_LEN(list_layout)) \
{ \
sprintf(listm, "%s_open", list_type); \
if (list_continue == 1) \
if (!NIL_P(rb_hash_aref(regs, ID2SYM(rb_intern("list_continue"))))) \
{ \
list_continue = 0; \
rb_hash_aset(regs, ID2SYM(rb_intern("list_continue")), Qnil); \
rb_hash_aset(regs, ID2SYM(rb_intern("start")), rb_ary_entry(list_index, nest-1)); \
} \
else \
63 changes: 63 additions & 0 deletions ext/redcloth_scan/redcloth_attributes.rb.rl
@@ -0,0 +1,63 @@
#
# redcloth_attributes.rb.rl
#
# Copyright (C) 2009 Jason Garber
#

%%{

machine redcloth_attributes;
include redcloth_common "redcloth_common.rb.rl";
include redcloth_attributes "redcloth_attributes.rl";

}%%

module RedCloth
class RedclothAttributes < BaseScanner
def self.redcloth_attributes(str)
self.new.redcloth_attributes(str)
end

def self.redcloth_link_attributes(str)
self.new.redcloth_link_attributes(str)
end

def redcloth_attribute_parser(cs, data)
@data = data + "\0"
@regs = {}
@p = 0
@pe = @data.length

%% write init; #%

@cs = cs

%% write exec; #%

return @regs
end

def redcloth_attributes(str)
self.cs = self.redcloth_attributes_en_inline
return redcloth_attribute_parser(cs, str)
end

def redcloth_link_attributes(str)
self.cs = self.redcloth_attributes_en_link_says;
return redcloth_attribute_parser(cs, str)
end

def initialize
%%{
variable data @data;
variable p @p;
variable pe @pe;
variable cs @cs;
variable ts @ts;
variable te @te;

write data nofinal;
}%%
end
end
end
2 changes: 2 additions & 0 deletions ext/redcloth_scan/redcloth_common.c.rl
@@ -14,5 +14,7 @@
action starts_phrase {
p == orig_p || *(p-1) == '\r' || *(p-1) == '\n' || *(p-1) == '\f' || *(p-1) == ' '
}
action extended { !NIL_P(extend) }
action not_extended { NIL_P(extend) }

}%%;
2 changes: 2 additions & 0 deletions ext/redcloth_scan/redcloth_common.java.rl
@@ -14,5 +14,7 @@
action starts_phrase {
p == orig_p || data[(p-1)] == '\r' || data[(p-1)] == '\n' || data[(p-1)] == '\f' || data[(p-1)] == ' '
}
action extended { !extend.isNil() }
action not_extended { extend.isNil() }

}%%;
20 changes: 20 additions & 0 deletions ext/redcloth_scan/redcloth_common.rb.rl
@@ -0,0 +1,20 @@
%%{

machine redcloth_common;
include redcloth_common "redcloth_common.rl";

action esc { rb_str_cat_escaped(@block, @ts, @te); }
action esc_pre { rb_str_cat_escaped_for_preformatted(@block, STR_NEW(@ts, @te-@ts)); }
action ignore { @block << @textile_doc.ignore(@regs); }

# conditionals
action starts_line {
@p == 0 || @data[(@p-1), 1] == "\r" || @data[(@p-1), 1] == "\n" || @data[(@p-1), 1] == "\f"
}
action starts_phrase {
@p == 0 || @data[(@p-1), 1] == "\r" || @data[(@p-1), 1] == "\n" || @data[(@p-1), 1] == "\f" || @data[(@p-1), 1] == " "
}
action extended { !@extend.nil? }
action not_extended { @extend.nil? }

}%%;
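The `starts_line` and `starts_phrase` conditions above can be read as ordinary Ruby predicates. A small sketch follows, with the hypothetical names `starts_line?` and `starts_phrase?` and explicit `data` and `p` arguments in place of the scanner's instance variables. The two-argument form `data[p - 1, 1]` returns a one-character String on both Ruby 1.8 and later versions, whereas a single integer index returned a character code on 1.8; that is presumably why the original uses it.

```ruby
# Characters that end a line for the purposes of the scanner conditions.
LINE_BREAKS = ["\r", "\n", "\f"].freeze

# True at the start of input or right after a line-break character.
def starts_line?(data, p)
  p == 0 || LINE_BREAKS.include?(data[p - 1, 1])
end

# Same as starts_line?, but a preceding space also counts.
def starts_phrase?(data, p)
  starts_line?(data, p) || data[p - 1, 1] == " "
end

text = "one\ntwo three"
puts starts_line?(text, 4)    # position of "two", just after "\n"
puts starts_phrase?(text, 8)  # position of "three", just after " "
```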
6 changes: 3 additions & 3 deletions ext/redcloth_scan/redcloth_common.rl
@@ -2,8 +2,8 @@

machine redcloth_common;

action A { reg = p; }
action B { bck = p; }
action A { MARK(); }
action B { MARK_B(); }
action T { STORE("text"); }
action X { CLEAR_REGS(); RESET_REG(); }
action cat { CAT(block); }
@@ -34,7 +34,7 @@
S = ( S_CSPN | S_RSPN )* ;
C = ( C_CLAS | C_STYL | C_LNGE )* ;
D = ( D_HEADER ) ;
N_CONT = "_" %{ list_continue = 1; };
N_CONT = "_" %{ ASET("list_continue", "true"); };
N_NUM = digit+ >A %{ STORE("start"); };
N = ( N_CONT | N_NUM )? ;
PUNCT = ( "!" | '"' | "#" | "$" | "%" | "&" | "'" | "," | "-" | "." | "/" | ":" | ";" | "=" | "?" | "\\" | "^" | "`" | "|" | "~" | "[" | "]" | "(" | ")" | "<" ) ;
4 changes: 2 additions & 2 deletions ext/redcloth_scan/redcloth_inline.c.rl
@@ -29,7 +29,7 @@ red_parse_attr(VALUE self, VALUE regs, VALUE ref)
{
VALUE txt = rb_hash_aref(regs, ref);
VALUE new_regs = redcloth_attributes(self, txt);
return rb_funcall(regs, rb_intern("update"), 1, new_regs);
return rb_funcall(regs, rb_intern("merge!"), 1, new_regs);
}

VALUE
@@ -38,7 +38,7 @@ red_parse_link_attr(VALUE self, VALUE regs, VALUE ref)
VALUE txt = rb_hash_aref(regs, ref);
VALUE new_regs = red_parse_title(redcloth_link_attributes(self, txt), ref);

return rb_funcall(regs, rb_intern("update"), 1, new_regs);
return rb_funcall(regs, rb_intern("merge!"), 1, new_regs);
}

VALUE
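The two hunks above swap `update` for `merge!` in the `rb_funcall` calls. In Ruby's core `Hash`, `update` and `merge!` are documented as the same in-place merge, so the change reads as a naming choice rather than a behavior change. A short sketch of the equivalent Ruby-level operation, using a hypothetical helper name `merge_parsed_attributes`:

```ruby
# Hypothetical helper: merges freshly parsed attributes into the existing
# registers hash in place and returns that same hash, as merge! does.
def merge_parsed_attributes(regs, new_regs)
  regs.merge!(new_regs)
end

regs   = { :text => "RedCloth", :href => "/old" }
merged = merge_parsed_attributes(regs, { :href => "/new", :title => "Home" })

puts merged.equal?(regs)  # the receiver itself is returned, not a copy
puts regs[:href]          # values from new_regs win on key collisions
```

Because the merge mutates the receiver, any caller holding a reference to `regs` sees the new keys too.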