compromise uses semver, and pushes to npm frequently (GitHub releases occasionally):
- Major is a breaking API change - method or response changes that can cause runtime errors.
- Minor is a behaviour change - tagging or grammar changes.
- Patch is an obvious, non-controversial bugfix.
While all major releases should be reviewed, our only two large releases are v6 in 2016 and v12 in 2019. Others have been mostly incremental, or niche.
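As an illustration of the policy above only, here is a minimal JavaScript sketch that maps each kind of change to a version bump. The `bumpVersion` helper and its `'breaking' | 'behaviour' | 'bugfix'` labels are hypothetical. They are not part of compromise or of any npm tooling. The sketch assumes plain `MAJOR.MINOR.PATCH` strings with no pre-release or build suffixes.

```javascript
// Hypothetical helper illustrating the versioning policy; not part of compromise.
// Assumes plain 'MAJOR.MINOR.PATCH' strings (no '-beta' or '+build' suffixes).
function bumpVersion(version, changeType) {
  const [major, minor, patch] = version.split('.').map(Number);
  if (![major, minor, patch].every(Number.isInteger)) {
    throw new Error(`not a MAJOR.MINOR.PATCH version: ${version}`);
  }
  switch (changeType) {
    case 'breaking': // method or response changes that can cause runtime errors
      return `${major + 1}.0.0`;
    case 'behaviour': // tagging or grammar changes
      return `${major}.${minor + 1}.0`;
    case 'bugfix': // obvious, non-controversial fixes
      return `${major}.${minor}.${patch + 1}`;
    default:
      throw new Error(`unknown change type: ${changeType}`);
  }
}

console.log(bumpVersion('1.2.3', 'breaking')); // '2.0.0'
console.log(bumpVersion('1.2.3', 'behaviour')); // '1.3.0'
console.log(bumpVersion('1.2.3', 'bugfix')); // '1.2.4'
```

A major or minor bump resets the lower fields to zero, as semver requires.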
Major release - see Release Notes for full details
- [breaking] - remove .parent() and .parents() chain (use .all() instead)
- [breaking] - remove @titleCase alias (use @isTitleCase)
- [breaking] - remove '.get()' alias - use '.eq()'
- [breaking] - remove .json(0) shorthand - use .json()[0]
- [breaking] - remove .tagger() - use .compute('tagger')
- [breaking] - remove .export() -> .load() - use .json() -> nlp(json)
- [breaking] - remove nlp.clone()
- [breaking] - remove deprecated .join()
- [breaking] - remove deprecated .lists()
- [breaking] - remove deprecated .segment()
- [breaking] - remove .sentences().toParticiple() & .verbs().toParticiple()
- [breaking] - remove .nouns().toPossessive() & .nouns().hasPlural()
- [breaking] - remove array support in match methods (use .match().match() instead)
- [breaking] - refactor .out('freq') output format (use .compute('freq').terms().unique().json() instead)
- [breaking] - change .json() result format for subsets
- [change] merge re-used capture-group names in one match
- [change] drop support for undocumented empty '.split()' methods - which used to split the parent
- [change] subtle changes to .text('fmt') formats
- [change] @hasContraction is no longer secretly greedy; use @hasContraction{2}
- [change] .and() now does a set 'union' operation of results (no overlaps)
- [change] bestTag is now .compute('tagRank')
- [change] .sort() is no longer in-place (it's now immutable)
- [change] drop undocumented options param to .replaceWith() method
- [change] add match-group as 2nd param to split methods
- [change] remove #FutureTense tag - which is not really a thing in english
- [change] .unique() no longer mutates parent
- [change] .normalize() inputs cleanup
- [change] drop agreement parameters in .numbers() methods
- [change] - less-magical money parsing - nlp('50 cents').money().get() is no longer 0.5
- [change] - .find() does not return undefined on an empty result anymore
- [change] - fuzzy matches must now be wrapped in tildes, like ~this~
- [new] .union(), .intersection(), .difference() and .complement() methods
- [new] .confidence() method - approximate tagging confidence score for arbitrary selections
- [new] .settle() - remove overlaps in matches
- [new] .isDoc() - helper method for comparing two views
- [new] .none() - helper method for returning an empty view of the document
- [new] .toView() method - drop back to a normal Class instance
- [new] .grow(), .growLeft() and .growRight() methods
- [new] add punctuation match support via pre/post params
- [new] add ambiguous empty .map() state as 2nd param
- [fix] - regex backtracking issue #847 (thanks @srubin)
- misc tagging fixes
- update deps
- [fix] - verbphrase conjugation fixes
- [fix] - verbphrase tagger fixes
- [fix] - url tagging regex improvements (thanks Axay!)
- update deps
- plugin-releases: dates
- [fix] - obscure runtime error in capture-groups
- update deps
- plugin-releases: typeahead
- [change] - use babel default build target (drop ie11 polyfill)
- [change] - don't compile esm build with babel anymore
- [fix] - sentence conjugation fixes
- [fix] - improvements to phrasal verbs
- [change] - keep tokenization for some more dashed suffixes like 'snail-like'
- plugin-releases: dates, numbers, sentences
- [change] - tokenize '2 - 5' as NumberRange, like '2-5' is
- [fix] - edge-cases for URLs with numbers
- [fix] - some sentences.toPastTense() fixes
- [fix] - 'n weekends from now' math in plugin-date
- plugin-releases: dates, sentences
- [fix] - support more time-ranges
- plugin-releases: dates@2.0.2
- [new] - support Time-range like '3pm-4pm'
- [change] - cleanup some unicode regexes
- plugin-releases: dates
- [fix] - match syntax tokenization fix
- [change] - improved performance monitoring
- [fix] - support complicated regular-expressions in match syntax
- improved performance testing
- [fix] - support matching implicit terms in (or|blocks)
- [change] - add #Timezone tag (from date-plugin)
- [change] - add many more cities and regions
- [change] - #Date terms can still be a #Conjunction
- [new] - #Imperative tag and .verbs().isImperative() method
- [fix] - some tagger issues
- update deps
- plugin-releases: dates
- [new] - #Fraction tag and improved fraction support (thanks Jakeii!)
- [fix] - edge-case match issues with ! syntax
- [change] - update deps
- updates for compromise-dates@1.4.0, compromise-numbers@1.2.0
- [fix] - fix weird ordering issue with named exports #815
- [fix] - typescript issue
- [fix] - matches over a contraction
- [new] - add 'implicit' text output
- [new] - World.addConjugations() method
- [new] - World.addPlurals() method
- [new] - start compromise-penn-tags plugin
- [new] - add fuzzy option to match commands
- [new] - support multiple-word matches in OR matches (a|b|foo bar|c)
- [change] (internal) - rename 'oneOf' match syntax to 'fastOr'
- [change] - use new export maps format
- [fix] - conjugations fixes #800
- [fix] - tokenization fixes #801
- [change] improved support for fractions in numbers-plugin #793
- [change] remove zero-width characters in normalized output #759
- [change] improved Person tagging with particles #794
- [change] improved i18n Person names
- [change] tagger+tokenization fixes
- [change] remove empty results from .out('array') #795
- [change] .tokenize() runs any postProcess() scripts from plugins
- [change] improved support for lowercase acronyms
- [change] - support years like '97
- [change] - change tokenizer for '20-aug'
- [change] - update deps of all plugins
- [fix] - NumberRange tagging issue #795
- [fix] - improved support for ordinal number ranges
- [fix] - improved regex support in match-syntax
- [fix] - improved support for softmatch syntax #797
- [fix] - better handling of {0,n} match syntax
- [new] - new plugin strict-match
- [new] - set NounPhrase, VerbPhrase tags in nlp-sentences plugin
- [new] - .phrases() method in nlp-sentences plugin
- [new] - support .append(doc) and .prepend(doc)
- [new] - values.normalize() method
- [change] many misc tagging fixes
- 'if' is now a #Preposition
- possessive pronouns are #Pronoun and #Possessive
- more phrasal verbs
- make #Participle tag #PastTense
- favor #PastTense over #Participle interpretation in tagger
- [change] @hasHyphen returns false for sentence dashes
- a lot more testing
- [new] first attempt at verbs().subject() method
- [change] avoid conjugating imperative tense - 'please close the door'
- [change] misc tagging fixes #786
- [change] .nouns() results split on quotations #783
- [change] NumberRange must be < 4 digits #735
- [change] reduction in #Person tag false-positives
- [new] add .parseMatch() method for pre-parsing match statements
- [change] stop including adverbs and some auxiliaries in .conjugate() results
- [change] .append() and .prepend() on an empty doc now creates a new doc
- [new] add verbs().toParticiple() method (add to observables/verb)
- [new] add sentences().toParticiple() method (add to observables/verb)
- [fix] some verb-tagging issues
- [fix] contractions issue in .clone()
- [fix] try harder to retain modal-verbs in conjugation - 'i should drive' no-longer becomes 'i will drive'
- fix for offset issue #771
- fix for {min,max} syntax #767
- typescript fixes
- update deps
- support unicode spaces for #759
- major improvements to compromise-plugin-dates (1.0.0)
- bugfixes (conjugation and tagging) 752, 737, 725, 751, 743, 748, 755, 758, 706, 761
- support tokenized array as input
- update deps
- bugfix updates to plugin-sentences and plugin-dates
- deprecate .money() and favour overloaded method in compromise-numbers plugin
- add .percentages() and .fractions() to compromise-numbers plugin
- add .hasAfter() and .hasBefore() methods
- change handling of slashes
- add .world() method to constructor
- add more abbreviations
- fix regex backtracking #739
- tokenize build:
  - remove conjugation and inflection data
  - remove conjugation and inflection functions
- remove sourcemap from build process (too big)
- improvements to .numbers().units()
- fix for linked-list runtime error #744 with contractions
- fix verbs.json() runtime-error
- improve empty .lists() methods
- allow custom tag colors
- test new github action workflow
- significant (~30%) speed up of parsing
- change sensitivity of input in .lookup() for major speed improvements
- improved typescript types
- subtle changes to internal caching
- adds 'oneOf' match syntax param
- fixes [word?] syntax parsing
major changes to .export() and [capture] group match-syntax.
- [breaking] move .export() and .load() methods to plugin (compromise-export)
  - change .export() format - this hasn't worked properly since v12 (mis-parsed contractions), see #669
- [breaking] split compromise-output into compromise-html and compromise-hash plugins
- [breaking] .match('foo [bar]') no longer returns 'bar' (use .match('foo [bar]', 0))
- [breaking] capture groups are no longer merged. .match('[foo] [bar]') returns two groups accessible with the new .groups() function
- [breaking] change .sentences() method to return only full sentences of matches (use .all() instead)
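Here is a toy sketch of the capture-group behaviour described above, in plain JavaScript. It is not compromise's matcher or API. The `toyMatch` and `parsePattern` functions, the literal-words-only patterns, and the `{ match, groups }` return shape are all hypothetical. The sketch only contrasts three things: the full match, a single group fetched by index, and several groups kept separate rather than merged.

```javascript
// Toy illustration only: patterns are literal words, with [brackets] marking
// capture groups. Not compromise's matcher; names and return shapes are hypothetical.
function parsePattern(pattern) {
  const terms = [];
  let groupCount = 0;
  let current = null; // index of the currently open group, or null
  for (let word of pattern.trim().split(/\s+/)) {
    if (word.startsWith('[')) {
      current = groupCount++;
      word = word.slice(1);
    }
    const closes = word.endsWith(']');
    if (closes) word = word.slice(0, -1);
    terms.push({ word, group: current });
    if (closes) current = null;
  }
  return terms;
}

// Return the first contiguous run of words equal to the pattern, or null.
function toyMatch(words, pattern) {
  const terms = parsePattern(pattern);
  for (let start = 0; start + terms.length <= words.length; start++) {
    if (!terms.every((t, i) => words[start + i] === t.word)) continue;
    const groups = [];
    terms.forEach((t, i) => {
      if (t.group === null) return;
      (groups[t.group] = groups[t.group] || []).push(words[start + i]);
    });
    return {
      match: words.slice(start, start + terms.length).join(' '),
      groups: groups.map((g) => g.join(' ')), // one entry per group, never merged
    };
  }
  return null;
}

const words = 'foo bar baz'.split(' ');
const res = toyMatch(words, 'foo [bar]');
console.log(res.match); // 'foo bar' - the whole match, not just the group
console.log(res.groups[0]); // 'bar' - group 0 has to be asked for explicitly
console.log(toyMatch(words, '[foo] [bar]').groups); // [ 'foo', 'bar' ]
```

In compromise itself, the corresponding calls are .match('foo [bar]', 0) and .groups(), as listed in the notes above. The toy does no tagging, so #Tags, wildcards and optional terms are not supported.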
modifications:
- [fix] - nlp.clone() - hasn't worked properly, since v12. (@Drache93)
- [fix] - issues with greedy capture [*] and [.+] (@Drache93) 💛
- add whitespace properties (pre+post) to default json output (suppress with .json({ whitespace: false }))
- .lookup({ key: val }) with an object now returns an object back ({val: Doc})
- add nlp constructor as a third param to .extend()
- support lexicon object param in tokenize - .tokenize('my word', { word: 'tag' })
- clean-up of scripts and tooling
- improved typescript types
- add support for some French contractions like j'aime -> je aime
- allow null results in .map() function
- better typescript support
- allow longer acronyms
- [fix] - offset length issue
- [new] - add new named-match syntax, with .groups() method (@Drache93)
- [new] - add nlp.fromJSON() method
- [new] - add a new compromise-tokenize.js build, without the tagger or data included.
- prefer @titleCase instead of #TitleCase tag
- update dependencies
- fix case-sensitive paths
- fix greedy-start match condition regression #651
- fix single period sentence runtime error
- fix potentially-unsafe regexes
- improved tagging for '-ed' verbs (#616)
- improve support for auxiliary-pastTense ('was lifted') verb-phrases
- more robust number-tagging regexes
- setup typescript types for plugins #661 (thanks @Drache93!)
- verb conjugation and tagger bugfixes
- disambiguate between acronyms & yelling
- fix 'aint' contraction
- make Doc.world writable
- update deps
- more tests
- fix shared period with acronym at end of sentence
- fix some mis-classification of contraction
- fix over-active emoji regex
- tag 'cookin', 'hootin' as Gerund
- support unicode single-quote symbols in contractions
- improved splitting in .nouns()
- add .nouns().adjectives() method
- add concat param to .pre() and .post()
- allow ellipses at start of term "....so" in @hasEllipses
- fix matches with optional-end foo?$ match syntax
- add typescript types for subsets
- add 'sideEffect:false' flag to build
- considerable speedup (20%) in tagger
- ensure trimming of whitespace for root/clean/reduced text formats
- fix client-side logging
- more flexible params to replace() and replaceWith()
- see Release Notes
- support singular units in .value()
- .quotations() no longer returns repeated results for nested quotes
- simplify quotation tagset
- .out('normal') no longer includes quotes or trailing-possessives
- improve .debug() on client-side
- better honorific support, add honorifics feature to .normalize()
- ellipses bugfixes
- replace unicode chars in .normalize() by default
- acronyms().stripPeriods() and acronyms().addPeriods()
- tag professions as #Actor
- add more behaviours to .normalize()
- support match-results as inputs to .match() and .not()
- support some US-state abbreviations like 'Phoenix AZ'
- add nouns().toPossessive()
- ngrams now remove empty-terms in contractions - fixes counting issue #476
- expose internal sentences().isQuestion() method
- .join() as an alias for .flatten()
- slightly different behavior for wildcards in capture-groups pull/472
- .possessives() subset + #Possessive tagging fixes
- hide massive world output for console.log of a term
- improve quotations() method
- add .parentheses() method
- add 'nickname' support to .people()
- 'will be #Adjective' now tagged as Copula
- include adverbs in verb conjugation (more) consistently
- sentences().toContinuous() and verbs().toGerund()
- some more aliases for jquery-like methods api
- move getPunctuation, setPunctuation from .sentence to main Text method
- rename internal endPunctuation to getPunctuation
- more consistent cardinal/ordinal tagging for values
- add #Abbreviation tag
- add #ProperNoun tag
- fixes for noun inflection
- include old ending punctuation in a .replace() cmd
- almost-double the support for first-names
- changes to bestTag method
- rolls-back some aggressive JustesonKatz stuff
- better support for emdash numberRange
- 'can't' contraction bugfix
- fix for dates().toShortForm()
- add #Multiple Values tag, and changes to how invalid numbers like 'sixty fifteen hundred' are understood
- better em-dash/en-dash support
- better conjugate implicit verbs inside contractions - "i'm", "we've"
- nouns().articles() method
- neighborhoods as #Place
- support more complex noun-phrases with JustesonKatz in .nouns()
- support for persistent lexicon/tagset changes
- addTags, addWords, addRegs, addPlurals, addConjugations methods to extend native data
- .plugin() method to wrap all of these into one
- (removal of .packWords() method)
- (removal of
- more .organizations() matches
- regex support in .match() - nlp('it is waaaay cool').match('/aaa/').out() // 'waaaay'
- improved apostrophe-s disambiguation
- support whitespace before sentence boundary
- improved QuestionWord tagging, some .questions() without a question-mark
- phrasalVerb conjugation
- new #Activity tag for Gerunds as nouns 'walking is fun'
- change ngram params to an object {size:int, max:int}
- implement '[]' capture-group syntax in .match()
- bring back map, filter, foreach and reduce methods
- set .words() as alias for .terms()
- people().firstNames(), people().lastNames()
- split-out comma-separated adverbs
- fix for '.watch' reserved word in efrt
- improved places() parsing
- improved {min,max} match syntax
- new .out('match') method
- quiet addition of .pack() and .unpack() for owen
- move internal lexicon around, to support new format in v11
- added states & provinces as #Region
- added #Comparable tag for adjectives that conjugate
- add increment/decrement/add/subtract methods to .values()
- add units(), noUnits() methods to .values()
- 'uncountable' nouns are no longer assumed to be singular
- money tag is no longer always a value
- improved tagging of VerbPhrase and Condition
- fixes to contractions in sentence-changes - "i'm going -> i went"
- several verb conjugation fixes
- accept Terms & Result objects in .match() and .replace()
- new Percent tag
- lump more units in with .values()
- .trim() method
- adjective tagging fixes
- some new .out() methods
- fix return format of .isPlural(), so it acts like a match filter
- less-greedy date tagging & ambiguous month fixes
- cleanup & rename some .value() methods
- change lumping behaviour of lexicon terms with multiple words
- keep more former tags after a term replace method
- new .random() method
- new .lessThan(), .greaterThan(), .equalTo() methods
- new prefix/suffix/infix matches with _ffix syntax
- tag() supports a sequence of tags for a sequence of terms
- .match 'range' queries now use a real match - #Adverb{2,4}
- new .before() and .after() match methods
- removes .lexicon() method for many-lexicons concept
- changes params of .replaceWith() method to a 'keyTags' boolean
- improved .debug() and logging on client-side
- pretty-real filesize reduction by swapping es6 classes for es5 inheritance
- rename Term.tag object to Term.tags so the .tag() method can work more consistently throughout
- fix 'Auxillary' tag typo to 'Auxiliary'
- optimisation of .match(), and tagset - significant speedup!
- adds .tagger() method and cleanup extra params
- adds wordStart and wordEnd offsets to .out('offset') for whitespace+punctuation
- new .has() method for faster lookups
- add nlp.out('index') method, 12 bugs
- add nlp.tokenize() method for disabling pos-tagging of input
- less-ambitious date-parsing of nl-date forms
- filesize reduction using efrt data structure (254k -> 214k)
- fix for IE9
- weee! big change! npm package rename
- builds now using browserify + derequire()
- re-written term-lumper logic
- new nlp.lexicon({word:'POS'}) flow
- be consistent with text.normal(), term.all_forms(), text.word_count()
- text.normal() includes sentence-terminators, like periods etc.
- airport codes support, helper methods for specific POS
- newlines split sentences
- Text methods now return this, instead of array of sentences
- more-sensible responses for invalid, non-string inputs
- 14 PRs, with fixes for currencies, pluralization, conjugation
- Value.to_text() new method, fix "Posessive" POS typo
- return of the text.spot() method (Re:#107)
- more aggressive lumping of dates, like 'last week of february'
- whitespace reproduction in .text() methods
- move negate from sentence to verb & statement
- rename 'implicit' to 'expansion' for smarter contractions
- added readable-compression to adj, verbs (121kb -> 117kb)
- hyphenated words are normalized into spaces
- grammar-aware match & replace functions
- Statement & Question classes
- split ngram, locale, and syllables into plugins in separate repo
- es6 classes, babel building
- better test coverage
- ngram uses term tokenization, so that 'Tony Hawk' is one term, and not two
- more organized pos rules
- Pos tagging is done implicitly now once nlp.Text is run
- Entity spotting is split into .people(), .place(), .organisations()
- unicode normalisation is killed
- opaque two-letter tags are gone
- plugin support
- passive tense detection
- lexicon can be augmented third-party
- date parsing results are different
- smarter handling of ambiguous contractions ("he's" -> ["he is", "he has"])
- added name genders and beginning of co-reference resolution ('Tony' -> 'he') API.
- small breaking change on Noun.is_plural and Noun.is_entity, affording significant pos() speedup. Bumped major version for these changes.
- Phrasal verbs ('step up'), firstnames and .people()
- Major file-size reduction through refactoring
- New NER choosing algorithm, better capitalisation logic, consolidated tests
- Sentence class methods, client-side demos