30 Jan 22:04

kbenoit

77c5596

CRAN v1.4.0

Bug fixes and stability enhancements

Fixed a bug in dfm_compress() and dfm_group() that changed or deleted the docvars attributes of dfm objects (#1506).
Fixed a bug in textplot_xray() that caused incorrect facet labels when a pattern contained multiple list elements or values (#1514).
kwic() now correctly returns the pattern associated with each match as the "keywords" attribute, for all pattern types (#1515).
Implemented some improvements in efficiency and computation of unusual edge cases for textstat_simil() and textstat_dist().

New features

textstat_lexdiv() now works on tokens objects, not just dfm objects. New measures of lexical diversity include MATTR (the Moving-Average Type-Token Ratio, Covington & McFall 2010) and MSTTR (Mean Segmental Type-Token Ratio).
New function tokens_split() allows splitting single tokens into multiple tokens based on a pattern match. (#1500)
New function tokens_chunk() allows splitting tokens into new documents of equally-sized "chunks". (#1520)
New function textstat_entropy() computes entropy for a dfm across feature or document margins.
The documentation for textstat_readability() is vastly improved, now detailing all formulas and providing full references.
New function dfm_match() allows a user to specify the features in a dfm according to a fixed vector of feature names, including those of another dfm. Replaces dfm_select(x, pattern) where pattern was a dfm.
A new argument vertex_labelsize added to textplot_network() to allow more precise control of label sizes, either globally or individually.
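
The two measures added to textstat_lexdiv() above are both averages of type-token ratios taken over fixed-size spans of a token sequence. The following is a minimal Python sketch of the textbook definitions, not quanteda's implementation. The function names, the `window` and `segment` parameter names, and the choice to drop a trailing partial segment in MSTTR are assumptions made for illustration.

```python
from typing import Sequence


def ttr(tokens: Sequence[str]) -> float:
    """Type-token ratio: distinct tokens divided by total tokens."""
    return len(set(tokens)) / len(tokens)


def mattr(tokens: Sequence[str], window: int) -> float:
    """Mean TTR over every overlapping window of `window` tokens."""
    if not 1 <= window <= len(tokens):
        raise ValueError("window must be between 1 and len(tokens)")
    n_windows = len(tokens) - window + 1
    total = sum(ttr(tokens[i:i + window]) for i in range(n_windows))
    return total / n_windows


def msttr(tokens: Sequence[str], segment: int) -> float:
    """Mean TTR over consecutive, non-overlapping segments.

    Assumption of this sketch: a trailing partial segment is dropped.
    """
    n_segments = len(tokens) // segment if segment >= 1 else 0
    if n_segments == 0:
        raise ValueError("segment must be between 1 and len(tokens)")
    total = sum(
        ttr(tokens[i * segment:(i + 1) * segment]) for i in range(n_segments)
    )
    return total / n_segments
```

For example, `mattr(["a", "a", "b", "b"], 2)` averages the ratios of the windows `a a`, `a b`, and `b b`, while `msttr` with the same arguments averages only the two non-overlapping segments `a a` and `b b`.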

Behaviour changes

tokens.tokens(x, remove_hyphens = TRUE), where x was generated with remove_hyphens = FALSE, now handles the tokens similarly to how tokens.character(x, remove_hyphens = TRUE) would have handled the same text as character input. (#1498)

19 Nov 20:01

kbenoit

v1.3.14

11f07a3

CRAN v1.3.14

quanteda v.1.3.14

Bug fixes and stability enhancements

Improved the robustness of textstat_keyness() (#1482).
Improved the accuracy of sparsity reporting for the print method of a dfm (#1473).

New Features

Added the following measures to textstat_lexdiv(): Yule's K, Simpson's D, and Herdan's Vm.
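
As a rough illustration of two of these measures, the sketch below computes Yule's K and Simpson's D from raw token frequencies using their standard textbook definitions. It is not quanteda's code, the function names are hypothetical, and Herdan's Vm is left out.

```python
from collections import Counter
from typing import Sequence


def yules_k(tokens: Sequence[str]) -> float:
    """Yule's K = 10^4 * (sum of squared type frequencies - N) / N^2."""
    n = len(tokens)
    freqs = Counter(tokens).values()
    return 1e4 * (sum(f * f for f in freqs) - n) / (n * n)


def simpsons_d(tokens: Sequence[str]) -> float:
    """Simpson's D = sum over types of f * (f - 1) / (N * (N - 1)).

    Requires at least two tokens.
    """
    n = len(tokens)
    freqs = Counter(tokens).values()
    return sum(f * (f - 1) for f in freqs) / (n * (n - 1))
```

Both measures grow as a few types come to dominate the text, so higher values indicate lower lexical diversity.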

01 Nov 21:25

kbenoit

v1.3.13

437bc06

CRAN v1.3.13

Bug fixes and stability enhancements

Fixed a bug causing incorrect counting in fcm(x, ordered = TRUE). (#1413) Also allowed window to be of size 1 (formerly the minimum was 2).
Fixed deprecation warnings from adding a dfm as docvars; doing so now imports the feature names as docvar names automatically. (related to #1417)
Fixed behaviour from tokens(x, what = "fasterword", remove_separators = TRUE) so that it correctly splits words separated by \n and \t characters. (#1420)
Added error checking for functions taking dfm inputs in case a dfm has empty features (#1419).
For textstat_readability(), fixed a bug in Dale-Chall-based measures and in the Spache word list measure. These were caused by an incorrect lookup mechanism but also by limited implementation of the wordlists. The new wordlists include all of the variations called for in the original measures, but using fast fixed matching. (#1410)
Fixed problems with basic dfm operations (rowMeans(), rowSums(), colMeans(), colSums()) caused by not having access to the Matrix package methods. (#1428)
Fixed a problem in textplot_scale1d() when the input is a predicted wordscores object with se.fit = TRUE (#1440).
Improved the stability of textplot_network(). (#1460)

New Features

Added new argument intermediate to textstat_readability(x, measure, intermediate = FALSE), which if TRUE returns intermediate quantities used in the computation of readability statistics. Useful for verification or direct use of the intermediate quantities.
Added a new separator argument to kwic() to allow a user to define which characters will be added between tokens returned from a keywords in context search. (#1449)
Reimplemented textstat_dist() and textstat_simil() in C++ for enhanced performance. (#1210)
Added a tokens_sample() function (#1478).

Behaviour changes

Removed the Hamming distance method from textstat_dist() (#1443), based on the reasoning in #1442.
Removed the "chisquared" and "chisquared2" distance measures from textstat_simil(). (#1442)

05 Oct 20:08

kbenoit

v1.3.10

2f4af96

(not accepted by CRAN 😞) v1.3.10

Prepared for and submitted to CRAN; this is the version current at the publication of the JOSS article about quanteda.

05 Jun 18:43

kbenoit

v1.3.0

4ee90db

CRAN v1.3.0

New Features

Added to = "tripletlist" output type for convert(), to convert a dfm into a simple triplet list. (#1321)
Added tokens_tortl() and char_tortl() to add markers for right-to-left language tokens and character objects. (#1322)
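
A simple triplet list stores only the non-zero cells of a matrix as parallel vectors of row, column, and value. The Python sketch below shows the idea on a small dense count matrix. The helper name, the key names `document`, `feature`, and `frequency`, and the row-by-row ordering are assumptions made for illustration and may differ from what convert() returns.

```python
from typing import Dict, List, Sequence


def to_triplets(
    matrix: Sequence[Sequence[int]],
    docnames: Sequence[str],
    featnames: Sequence[str],
) -> Dict[str, List]:
    """Return the non-zero cells of a document-feature matrix as triplets."""
    document: List[str] = []
    feature: List[str] = []
    frequency: List[int] = []
    for i, row in enumerate(matrix):
        for j, value in enumerate(row):
            if value != 0:  # zero cells are simply not stored
                document.append(docnames[i])
                feature.append(featnames[j])
                frequency.append(value)
    return {"document": document, "feature": feature, "frequency": frequency}
```

Because document-feature matrices are typically very sparse, the three vectors are usually much shorter than the full matrix would be.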

Behaviour changes

Improved corpus.kwic() by adding new arguments split_context and extract_keyword.
dfm_remove(x, selection = anydfm) is now equivalent to dfm_remove(x, selection = featnames(anydfm)). (#1320)
Improved consistency of predict.textmodel_nb() returns, and added type = argument. (#1329)

Bug fixes

Fixed a bug in textmodel_affinity() that caused failure when the input dfm had been compiled with tolower = FALSE. (#1338)
Fixed a bug affecting tokens_lookup() and dfm_lookup() when nomatch is used. (#1347)
Fixed a problem whereby NA texts created a "document" (or tokens) containing "NA". (#1372)

16 Apr 12:23

kbenoit

v1.2.0

57e19ac

CRAN v1.2.0

New Features

Added an nsentence() method for spacyr parsed objects. (#1289)

Bug fixes and stability enhancements

Fixed a bug in nsyllable() that incorrectly handled cased words, and returned wrong names with use.names = TRUE. (#1282)
Fixed the overwriting of summary.character() caused by previous import of the network package namespace. (#1285)
dfm_smooth() now correctly sets the smooth value in the dfm (#1274). Arithmetic operations on dfm objects are now much more consistent and do not drop attributes of the dfm, as sometimes happened with earlier versions.

Behaviour changes

tokens_toupper() and tokens_tolower() no longer remove unused token types. Solves #1278.
dfm_trim() now takes more options, and these are implemented more consistently. min_termfreq and max_termfreq have replaced min_count and max_count, and these can be modified using a termfreq_type argument. (Similar options are implemented for docfreq_type.) Solves #1253, #1254.
textstat_simil() and textstat_dist() now take valid dfm indexes for the relevant margin for the selection argument. Previously, this could also be a direct vector or matrix for comparison, but this is no longer allowed. Solves #1266.
Improved performance for dfm_group() (#1295).

08 Mar 10:19

kbenoit

v1.1.1

050c8d0

CRAN v1.1.1

Changed the default number of threads to 2.

06 Mar 15:21

kbenoit

v1.1.0

9866f3b

CRAN v1.1.0

New Features

Added as.dfm() methods for tm DocumentTermMatrix and TermDocumentMatrix objects. (#1222)
predict.textmodel_wordscores() now includes an include_reftexts argument to exclude training texts from the predicted model object (#1229). The default behaviour is include_reftexts = TRUE, producing the same behaviour as existed before the introduction of this argument. This allows rescaling based on the reference documents (since rescaling requires prediction on the reference documents) but provides an easy way to exclude the reference documents from the predicted quantities.
textplot_wordcloud() now uses code entirely internal to quanteda, instead of using the wordcloud package.

Bug fixes and stability enhancements

Eliminated unnecessary dependency on the digest package.
Updated the vignette title to be less generic.
Improved the robustness of dfm_trim() and dfm_weight() for previously weighted dfm objects and when supplied thresholds are proportions instead of counts. (#1237)
Fixed a problem in summary.corpus(x, n = 101) when ndoc(x) > 100 (#1242).
Fixed a problem in predict.textmodel_wordscores(x, rescaling = "mv") that always reset the reference values for rescaling to the first and second documents (#1251).
Issues in the color generation and labels for textplot_keyness() are now resolved (#1233).

Performance improvements

textmodel methods are now exported, to facilitate extension packages for other textmodel methods (e.g. wordshoal).

Behaviour changes

Changed the default in textmodel_wordfish() to sparse = FALSE, in response to #1216.
dfm_group() now preserves docvars that are constant for the group aggregation (#1228).
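
The grouping rule for docvars described above can be pictured as follows: counts are summed within each group, and a document variable survives only if every document in the group shares the same value. This Python sketch illustrates that rule on plain dictionaries. It is not quanteda's implementation, and the function name and data layout are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence, Tuple


def group_documents(
    counts: Sequence[Dict[str, int]],
    docvars: Sequence[Dict[str, object]],
    groups: Sequence[Hashable],
) -> Tuple[Dict[Hashable, Dict[str, int]], Dict[Hashable, Dict[str, object]]]:
    """Sum feature counts per group and keep only group-constant docvars."""
    summed: Dict[Hashable, Dict[str, int]] = defaultdict(lambda: defaultdict(int))
    members: Dict[Hashable, List[Dict[str, object]]] = defaultdict(list)
    for row, dv, g in zip(counts, docvars, groups):
        for feat, n in row.items():
            summed[g][feat] += n
        members[g].append(dv)

    kept: Dict[Hashable, Dict[str, object]] = {}
    for g, rows in members.items():
        first = rows[0]
        # A docvar is kept only if every document in the group has it
        # with the same value as the first document.
        kept[g] = {
            k: v
            for k, v in first.items()
            if all(k in r and r[k] == v for r in rows)
        }
    return {g: dict(feats) for g, feats in summed.items()}, kept
```

In the sketch, a variable such as a party label that is identical for all speeches in a group is carried over, while a per-document variable such as a speech date is dropped.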

29 Jan 09:31

kbenoit

v1.0.0

d071cc8

CRAN v1.0.0

New Features

Added vertex_labelfont to textplot_network().
Added textmodel_lsa() for Latent Semantic Analysis models.
Added textmodel_affinity() for the Perry and Benoit (2017) class affinity scaling model.
Added Chinese stopwords.
Added a pkgdown vignette for applications in the Chinese language.
Added textplot_network() function.
The stopwords() function and the associated internal data object data_char_stopwords have been removed from quanteda, and replaced by equivalent functionality in the stopwords package.
Added tokens_subset(), now consistent with other *_subset() functions (#1149).

Bug fixes and stability enhancements

Performance has been improved for fcm() and for textmodel_wordfish().
dfm() now correctly passes through all ... arguments to tokens(). (#1121)
All dfm_*() functions now work correctly with empty dfm objects. (#1133)
Fixed a bug in dfm_weight() for named weight vectors. (#1150)
Fixed a bug preventing textplot_influence() from working (#1116).

Behaviour Changes

The convenience wrappers to convert() are simplified and no longer exported. To convert a dfm, convert() is now the only official function.
nfeat() replaces nfeature(), which is now deprecated. (#1134)
textmodel_wordshoal() has been removed, and relocated to a new package (wordshoal).
The generic wrapper function textmodel(), which used to be a gateway to specific textmodel_*() functions, has been removed.
(Most of) the textmodel_*() functions have been reimplemented to make their behaviour consistent with the lm/glm() families of models, including especially how the predict, summary, and coef methods work (#1007, #108).
The GitHub home for the repository has been moved to https://github.com/quanteda/quanteda.

13 Nov 11:23

kbenoit

v0.99.22

e890ff0

CRAN v0.99.22

New Features

tokens_select() has a new window argument, permitting selection within an asymmetric window around the pattern of selection. (#521)
tokens_replace() now allows token types to be substituted directly and quickly.
Added a spacy_parse method for corpus objects. Also restored quanteda methods for spacyr spacy_parsed objects.
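
To illustrate what selection within an asymmetric window means, the Python sketch below keeps every token that matches a pattern, plus a given number of tokens before and after each match. It is a conceptual sketch only. The function name, the exact-match rule, and the `(before, after)` tuple form are assumptions, not the tokens_select() interface.

```python
from typing import Collection, List, Sequence, Tuple


def select_with_window(
    tokens: Sequence[str],
    pattern: Collection[str],
    window: Tuple[int, int] = (0, 0),
) -> List[str]:
    """Keep matching tokens plus `before`/`after` neighbours of each match."""
    before, after = window
    keep = [False] * len(tokens)
    for i, tok in enumerate(tokens):
        if tok in pattern:
            start = max(0, i - before)
            stop = min(len(tokens), i + after + 1)
            for j in range(start, stop):
                keep[j] = True  # overlapping windows simply merge
    return [tok for tok, k in zip(tokens, keep) if k]
```

With tokens `a b c d e`, pattern `c`, and a window of one token before and two after, the sketch keeps `b c d e`.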

Bug fixes and stability enhancements

Improved documentation for textmodel_nb() (#1010), and made output quantities from the fitted NB model regular matrix objects instead of Matrix classes.

Behaviour Changes

All of the deprecated functions are now removed. (#991)
tokens_group() is now significantly faster.
The deprecated "list of characters" tokenize() function and all methods associated with the tokenizedTexts object types have been removed.
Added convenience functions for keeping tokens or features: tokens_keep(), dfm_keep(), and fcm_keep(). (#1037)
textmodel_NB() has been replaced by textmodel_nb().
