Integration master performance #20

Merged
merged 59 commits into master on Mar 9, 2012

2 participants

@MattiSG
Owner

This is it. Could you please run `ant test` on your Linux machines to make sure this can be merged into master? Can't wait… ;)

@MattiSG and others added some commits on Mar 5, 2012
@MattiSG Heavy index and deletion management refactor: abstracted everything in `NodeMappedObject`.

Indexes are now implicitly named after the class name of the specific NMO, or use an overriding `INDEX_KEY`, instead of explicitly naming them all the time.
Deletion now uses an `onDelete` hook instead of each class specifying deletion on its own.
5ecfe8b
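
The refactor described in 5ecfe8b can be pictured roughly as follows. This is only a sketch of the general pattern, not the project's code: `NodeMappedObjectSketch`, `WordSketch`, `DefinitionSketch`, `indexKey()` and the `log` list are hypothetical names, and no graph database is involved. It assumes the index name defaults to the concrete class name unless a subclass supplies its own `INDEX_KEY`, and that `delete()` calls an overridable `onDelete` hook.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the project's NodeMappedObject (assumption: no real database here).
abstract class NodeMappedObjectSketch {
    // Records what happened, so the behaviour can be inspected without a database.
    static final List<String> log = new ArrayList<>();

    // By default, the index is named after the concrete class.
    String indexKey() {
        return getClass().getSimpleName();
    }

    // Hook for subclass-specific cleanup; does nothing by default.
    protected void onDelete() {
    }

    // Shared deletion logic: run the hook first, then drop the object from its index.
    final void delete() {
        onDelete();
        log.add("removed from index " + indexKey());
    }
}

// Uses the implicit index name ("WordSketch").
class WordSketch extends NodeMappedObjectSketch {
}

// Overrides the index name and adds its own cleanup step.
class DefinitionSketch extends NodeMappedObjectSketch {
    static final String INDEX_KEY = "definitions";

    @Override
    String indexKey() {
        return INDEX_KEY;
    }

    @Override
    protected void onDelete() {
        log.add("detaching definition from its word");
    }
}
```

With this shape, subclasses never name their index at each call site and never reimplement deletion; they only override what differs.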
@MattiSG Now using new deletion management within Definitions.
Corrected `onDelete` visibility.
8737ad6
@MattiSG Implemented `toString` on `NodeMappedObject`, removed the asterisk on `MutableWord`'s one.
6305b22
@MattiSG Added `LexicalCategory` class. ce8a8c2
@MattiSG Added tests and logic for `LexicalRelation` parsing. 189416c
@MattiSG Abstracted indexing transactions management into NMO. 7a3c9f3
@MattiSG Added lexical categories tests. 33730cc
@MattiSG Improved all deletion tests. bc4683a
@MattiSG Added lexical categories management to words. e1fa0c6
@MattiSG Minor MutableWord documentation improvement. 8df53bc
@MattiSG Split LexicalCategory in Mutable / immutable classes. 515c91c
@MattiSG `MutableWord.create` -> `MutableWord(String)` 5757f79
@MattiSG Corrected `DefinitionTest` imports. 18982d2
@MattiSG Updated `LexicalCategory` tests. 5e846e3
@MattiSG Minor documentation improvements. 03dae1e
@MattiSG Corrected `MutableLexicalCategoryTest` inheritance. 51189b4
@MattiSG `LexicalCategory` deletion won't work, so it will simply not be allowed for the moment. Use cases are minimal.
d76d6ac
@MattiSG Changed indexing method to associate `pattern -> true` rather than `indexName -> pattern`, to allow multiple elements indexing.
17c5aa3
@MattiSG Added `LazyPatternsManager` class. a533b4f
@MattiSG `LEXICALCATEGORY` -> `LEXICAL_CATEGORY` cb851b7
@MattiSG Added exception stacktrace printing in case of parser exception. a180ccd
@MattiSG Added models parsing! 50b2f49
@MattiSG Added lexical categories printing in `Main`. 9a6af92
@MattiSG Diverse corrections to parser and lazy patterns registration. 1dcaea1
@MattiSG Replaced `{{-verb-}}` by `{{-nom-}}` to have some test data. 04379da
@MattiSG Lexical categories parsing works. c0fa648
@MattiSG Minor performance improvement: not logging `|` pattern parameters separator.
fdbaf20
@MattiSG Improved performance by listing trashed sections. fcba519
@MattiSG Much improved natures parsing performance by trashing most common untreated sections.
d3c9dc5
@MattiSG Corrected <TRASH> to exit on section end. 95e6b9a
@MattiSG Corrected <PRONUNCIATION> to stop generating false mismatched bracketing.
f15e9a6
@MattiSG Corrected <PATTERN> to stop logging "unexpected" messages on table markers (`(-)`).
08918f0
@MattiSG Added natures (lexical categories) parsing.
Added needed helper methods to `NodeMappedObject`.

Merge branch 'refs/heads/master' into natures

Conflicts:
	src/edu/unice/polytech/kis/semwiktionary/SemWiktionary.java
	src/edu/unice/polytech/kis/semwiktionary/database/Relation.java
	src/edu/unice/polytech/kis/semwiktionary/model/MutableWord.java
	src/edu/unice/polytech/kis/semwiktionary/parser/WikimediaDump.jflex
	test/edu/unice/polytech/kis/semwiktionary/model/WordTest.java
	test/edu/unice/polytech/kis/semwiktionary/parser/ParserTest.java
	test/resources/frwiktionary-test-extracts.xml
06f5238
@MattiSG Merged `<TRASH>` and `<CONTENT_WORD>` into `<WORD_ENTRY>`. Corrected language leaks after `<TRASH>`.
af60420
@MattiSG Added support and tests for `&quot;` HTML entity. 8008a0b
@MattiSG Activated HTML entities matching in `<SECTION>` and other states by preventing catch-alls to match `&`.
a804fca
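
For readers unfamiliar with the entities mentioned in 8008a0b and a804fca, here is a minimal Java sketch of what decoding them means. It is an illustration only: the project handles entities inside its JFlex lexer rules, whereas `EntityDecoderSketch` is a hypothetical helper working on plain strings. Only `&quot;` and `&lt;` appear in this pull request; `&gt;` and `&amp;` are included on the assumption that the other predefined XML entities may occur in the dump as well.

```java
// Hypothetical helper; the real parser does this in lexer rules, not with String.replace.
class EntityDecoderSketch {
    static String decode(String text) {
        return text
            .replace("&quot;", "\"")
            .replace("&lt;", "<")
            .replace("&gt;", ">")
            // "&amp;" goes last so that "&amp;lt;" decodes to "&lt;" rather than "<".
            .replace("&amp;", "&");
    }
}
```

For instance, `EntityDecoderSketch.decode("&quot;nom&quot;")` yields the word wrapped in plain double quotes.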
@MattiSG Trying to optimize performance by caching indexes. ae50d75
@MattiSG Making `find` methods static, for the sake of performance :( 2669037
@MattiSG Removing `tick` (individual word parse time), replacing nanoseconds by milliseconds.

(20% of parse time spent in nanos calculation according to profiler)
d73b9b1
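
The idea behind d73b9b1 can be sketched like this: rather than timing every word with nanosecond precision, take one millisecond reading per batch of words. This is a guess at the general approach, not the project's implementation; `BatchTimerSketch`, `batchSize` and `wordParsed` are hypothetical names, and the clock value is passed in as a parameter so the logic stays deterministic.

```java
// Hypothetical batch timer: one clock reading per batch instead of one per word.
class BatchTimerSketch {
    private final int batchSize;
    private int parsedWords = 0;
    private long batchStartMillis;

    BatchTimerSketch(int batchSize, long nowMillis) {
        this.batchSize = batchSize;
        this.batchStartMillis = nowMillis;
    }

    // Call once per parsed word.
    // Returns the elapsed milliseconds when a batch completes, or -1 otherwise.
    long wordParsed(long nowMillis) {
        parsedWords++;
        if (parsedWords % batchSize != 0) {
            return -1;
        }
        long elapsed = nowMillis - batchStartMillis;
        batchStartMillis = nowMillis; // the next batch starts now
        return elapsed;
    }
}
```

In real use the caller would pass `System.currentTimeMillis()`, and could skip even that call between batches by checking the word count first.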
@MattiSG Adding time calculations to `FREQUENCY` regular parser messager. 9c751e5
@MattiSG Adding a missing transaction validation, removing useless imports. b5b819b
@MattiSG Much improved documentation for `NodeMappedObject`. 1ba35b1
@MattiSG Went back to using `INDEX_KEY -> key` indexing rather than the `key -> true` (revert 17c5aa3) due to exponential indexing complexity.
2441142
@MattiSG Corrected Mediawiki exit fallbacks. Definitions immediately followed by `</text>` are now stored.

Greatly minimized automaton by putting fallbacks at the end of the parser.
Passes "berdeller".
35a7ad5
@MattiSG Improved states documentation. 8987ef5
@MattiSG Added heuristics for decoding HTML entities.
Passes "primitive".
cfb0c1d
@MattiSG Corrected space matching after `&lt;` entity.
Passes "cérémonie".
5fd1456
@MattiSG Corrected and enhanced time estimations. 001e043
@MattiSG Corrected and improved time calculation. f616d69
@MattiSG Simplified and improved time calculation. 5eb3892
@MattiSG Added sections to natures blacklist. aa126e7
@RubixR4 Created JUnit test for WordEntry. 6fd27c0
@RubixR4 Created JUnit test for WordEntry. 97699e3
@MattiSG Integrated natures and master. This is the wannabe v0.3.0. 11325ef
@MattiSG Corrected natures printing in main executable. 902c11b
@MattiSG Corrected test for "fauchaisons" and "fédéralismes" (<WORD_ENTRY> trash state leaking).
432cbfc
@MattiSG Corrected parser to save definitions when entry ends (`</text>`) after a definition.
03e374c
@MattiSG Corrected model parsing on Linux.
Made "Modèle" length calculated, instead of hardcoding a unicode char's byte size.
9be6c72
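
One plausible reading of 9be6c72 is that the byte size of "è" depends on the encoding in use, so a hardcoded offset that works on one platform breaks on another. The sketch below illustrates that pitfall in plain Java. `ModelPrefixSketch` and `modelName` are hypothetical, and the assumption that model page titles look like `Modèle:<name>` is made for illustration only.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper showing why the prefix length should be computed, not hardcoded.
class ModelPrefixSketch {
    // "Modèle", written with an escape so the source file encoding does not matter.
    static final String PREFIX = "Mod\u00e8le";

    // Length in characters: independent of any encoding.
    static int lengthInChars() {
        return PREFIX.length();
    }

    // Length in bytes: depends on the charset.
    static int lengthInBytes(Charset charset) {
        return PREFIX.getBytes(charset).length;
    }

    // Strips the prefix and the following ':' using the computed length.
    static String modelName(String title) {
        return title.substring(PREFIX.length() + 1);
    }
}
```

The prefix is 6 characters long, but takes 7 bytes in UTF-8 and 6 bytes in ISO-8859-1, which is the kind of difference a hardcoded byte size would get wrong.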
@MattiSG Updated README: lexical categories are now indeed offered, as well as related vocabulary.
45e037e
@MattiSG merged commit 45e037e into master on Mar 9, 2012