The package now offers a simplified and seamless workflow for dictionary-based sentiment analysis: The weigh()-method has been implemented for the classes count and count_bundle. Via inheritance, it is also available for the partition- and partition_bundle-classes. In addition, a new summary()-method for partition-class objects has been introduced. If the object has been weighed, the list that is returned will include a report on weights. An example explains the workflow.
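The idea behind weighing counts with a dictionary can be sketched in base R. This is an illustration of the general technique only, not the implementation of the weigh()-method: the data frames `counts` and `dict`, their column names and all values are made up for the example.

```r
# Hypothetical token counts, as a count operation might report them.
counts <- data.frame(
  word = c("good", "bad", "oil", "excellent"),
  count = c(3L, 2L, 10L, 1L),
  stringsAsFactors = FALSE
)

# Hypothetical sentiment dictionary: one weight per dictionary term.
dict <- data.frame(
  word = c("good", "bad", "excellent"),
  weight = c(0.5, -0.7, 0.9),
  stringsAsFactors = FALSE
)

# Left join: every counted token is kept, dictionary terms get their weight.
weighed <- merge(counts, dict, by = "word", all.x = TRUE)

# Tokens that are not in the dictionary are treated as neutral here.
weighed$weight[is.na(weighed$weight)] <- 0

# A simple aggregate score: the sum of count times weight.
score <- sum(weighed$count * weighed$weight)
print(weighed)
print(score)
```

Treating tokens that are missing from the dictionary as neutral (weight 0) is one possible choice; dropping them before aggregating would yield the same sum.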
The partition_bundle-method for context-objects has been reworked entirely (and is working again);
a new partition-method for context-objects has been introduced. Both steps are intended for workflows for dictionary-based sentiment analysis.
The highlight()-method is now implemented for class kwic. You can highlight words in the neighborhood of a node that are part of a dictionary.
A new knit_print()-method for textstat- and kwic-objects offers a seamless inclusion of analyses in Rmarkdown documents.
A coerce()-method to turn a kwic-object into an htmlwidget has been singled out from the show()-method for kwic-objects. Now it is possible to generate an htmlwidget from a kwic object, and to include the widget in an Rmarkdown document.
A new coerce()-method turns textstat-objects into an htmlwidget (DataTable), which is very useful for Rmarkdown documents such as slides.
A new argument height for the html()-method makes it possible to define a scroll box. This is useful for embedding fulltext output in an Rmarkdown document.
The partition_bundle-class, rather than inheriting from bundle-class directly, will now inherit from the count_bundle-class
The use()-function is limited now to activating the corpus in data packages. Having introduced the session registry, switching registry directories is not needed any more.
The as.regions()-function has been turned into an as.regions()-method to have a more generic tool.
Some refactoring of the context-method, so that full use of data.table speeds up things.
The highlight()-method accepts the terms to be highlighted via three dots (...);
an explicit list is no longer necessary.
A new as.character()-method for kwic-class objects is introduced.
The size_coi-slot (coi for corpus of interest) of the context-object included the node; the node (i.e. matches for queries) is excluded now from the count of size_coi.
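The effect of excluding the node from the count can be illustrated with a small base R calculation. This is a sketch under stated assumptions, not the package's code: the match regions, the window sizes `left` and `right`, and all variable names are hypothetical, and overlapping windows and corpus boundaries are ignored.

```r
# Hypothetical match regions: start and end corpus position of each hit.
matches <- data.frame(start = c(10L, 50L), end = c(10L, 51L))

# Hypothetical context window: tokens to the left and right of the node.
left <- 5L
right <- 5L

# Number of tokens in each window, node included.
window_sizes <- (matches$end + right) - (matches$start - left) + 1L

# Number of tokens covered by the node (the match) itself.
node_sizes <- matches$end - matches$start + 1L

# Old behavior: node counted as part of the corpus of interest.
size_with_node <- sum(window_sizes)

# New behavior: node excluded from the size of the corpus of interest.
size_without_node <- sum(window_sizes - node_sizes)

print(c(with_node = size_with_node, without_node = size_without_node))
```

In this made-up example, the first match covers one token and the second covers two, so the two sizes differ by three tokens.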
When calling use(), the registry directory is reset for CQP, so that the corpora in the package that have been activated can be used with CQP syntax.
The script configure.win has been removed so that installation works on Windows without an installation of Rtools.
Bug removed from s_attributes()-method for partition-objects: "fast track" was activated without preconditions.
Bug removed that would swallow metadata/s-attributes to be displayed in kwic-output after highlighting.
As a matter of consistency, the argument meta has been renamed to s_attributes for the kwic()-method for context-objects, and for the enrich()-method for kwic-objects.
To avoid confusion (with argument s_attributes), the argument s_attribute to check for integrity within
a struc has been renamed to boundary.
Documentation for kwic-objects has been reworked thoroughly.
new as.list,bundle-method for convenience, to access slot objects
as.bundle is more generic now, so that any kind of object can be coerced to a bundle now
as.speeches-method turned into function that allows partition and corpus as input
sAttributes,partition-method in line with RcppCWB requirements (no negative values of strucs)
count repaired for multiple p-attributes
bug removed causing a crash for as.markdown-method when cutoff is larger than number of tokens
polmineR will now work with a temporary registry in the temporary session directory
a (new) registry_move() function is used to copy files to the tmp registry
the (new) registry() function will get the temporary registry directory
the use() function will add the registry file of a package to the tmp registry
bug removed that prevented the name<- method from working properly for bundle objects
new partition_bundle,partition_bundle-method introduced
naming of methods and functions, classes and most arguments moved to snake_case, maintaining backwards compatibility
utility function getObjects not exported any more
for count,partition_bundle-method, column 'partition' will be a character vector now (not factor)
new argument 'type' added to partition_bundle
new method 'get_type' introduced to make getting corpus type more robust
bug removed that has caused a crash when cutoff is larger than number of tokens in a partition when calling get_token_stream
count-method will now return count-object if query is NULL, making it easier to write pipes
upon loading the package, check that data directories are set correctly in registry files to make sure that sample data in pre-compiled packages can be used
sample corpus GermaParlMini added to the package (replacing suggested package polmineR.sampleCorpus)
configuration mechanism added to set path to data directory in registry file upon installation
class hits now inherits from class 'textstat', exposing a set of generic functions (such as dim, nrow etc.); slot 'dt' changed to 'stat' for this purpose
count,partitionBundle and hits,partitionBundle: cqp parameter added
RegistryFile class replaced by a set of lightweight functions (corpus_...)
encode-method moved to cwbtools package
getTerms,character-method and terms,partition-method merged
examples using EUROPARL corpus have been replaced by REUTERS corpus (including vignette)
param id2str has been renamed to decode in all functions to avoid unwanted behavior
robust indexing of bundle objects for subsetting
optional settings have been cleaned
reliance on cwb command line tools removed
encoding issue with names of partitionBundle solved
functionality of matches-method (breakdown of frequencies of matches) integrated
into count-method (new param breakdown)
corpus REUTERS included (as data for testsuite)
adjust data directory of REUTERS corpus upon loading package
a pkgdown-generated website is included in the docs directory
consistent use of .message helper function to make shiny app work
bug removed for count-method when options("polmineR.cwb-lexdecode") is TRUE and options("polmineR.Rcpp") is FALSE
if CORPUS_REGISTRY is not defined, the registry directory in the package will
be used, making REUTERS corpus available
getSettings-function removed, was not sufficiently useful, and was superseded by
new class 'count' introduced to organize results from count operations
at startup, default template is assigned for corpora without explicitly defined templates to make read() work in a basic fashion
new cpos,hits-method to support highlight method
tooltips-method to reorder functionality of html/highlight/tooltip-methods
param charoffset added to html-method
coerce-method from partition to json and vice versa, potentially useful for storing partitions
sAttributes2cpos to work properly with nested xml
partition,partition-method reworked to work properly with nested XML
encoding of return value of sAttributes will be locale
references added to methods count, kwic, cooccurrences, features.
as.DocumentTermMatrix,character-method reworked to allow for subsetting and divergence of
strucs and struc_str
html,partition-method has new option beautify, to remove whitespace before interpunctuation
output error removed in html,partition-method (that misinterprets `` as code block)
the class Corpus now has a slot sAttribute to keep/manage a data.table with corpus positions and struc values, and there is a new partition,Corpus-method. In combination, it will be a lot faster to derive a partition, particularly if you need to do that repeatedly.
a new function install.cwb() provides a convenient way to install CWB in the package
added a missing encoding conversion for the count method
class 'Regions' renamed to class 'regions' as a matter of consistency
data type of slot cpos of class 'regions' is a matrix now
reworked and improved documentation for decode- and encode-methods
new functions copy.corpus and rename.corpus
as.DocumentTermMatrix-method checks for strucs with value -1
improved as.speeches-method: reordering of speeches, default values
blapply-method: verbose output will be suppressed if progress is TRUE
applying stoplists and positivelists working again for context-method
matches-method to learn about matches for CQP queries replacing frequencies-method
Rework of enrich-method, including documentation.
param 'neighbor' dropped from kwic,context-method; params positivelist and negativelist offer equivalent functionality
highlight-method for (newly exported) kwic-method (for validation purposes)
performance improvement for partitionBundle,character-method
a new Labels class and label method for generating test data
bug removed for partitionBundle,character-method, and performance improved
Improved explanation of the installation procedure for Mac in the package vignette
for context-method: param sAttribute working again to check boundaries of match regions
sample-method for objects of class kwic and context
kwic, cpos, and context method will accept queries of length > 1
use-function and resetRegistry-function reworked
more explicit startup message to get info about version, registry and interface
encoding issues solved for size-method, hits-method and dispersion-method
use-function will now work for users working with polmineR.Rcpp as interface
new installed.corpora() convenience function to list all data packages with corpora
view-method and show-method for cooccurrences-objects now successfully redirect
output to RStudio viewer
data.table-style indexing of objects inheriting from textstat-class
for windows compatibility, as.corpusEnc/as.nativeEnc for encoding conversion
performance gain for size-method by using polmineR.Rcpp
dissect-method dropped (replaced by size)
improved documentation of size-method
labels for cooccurrences-output
cooccurrencesBundle-class and cooccurrence-method for bundle restored
as.data.table for cooccurrencesBundle-class
count-method for whole corpus for pAttribute > 1
functionality of meta-method merged into sAttributes-method (meta-method dropped)
speed improvements for generating html output for reading
previously unexported highlight-method now exported, and more robust than before (using xml2)
progress bars for multicore operations now generated by pbapply package
starting to use testthat for unit testing
updated documentation of partition-method.
documentation of hits-method improved
use-method: default value for pkg is NULL (return to default registry), function more robust
Rework for parsing the registry
rework of templates, which are part of options now (see ?setTemplate, ?getTemplate)
experimental use of polmineR.Rcpp-package for fast counts for whole corpus
new convenience function install.corpus to install CWB corpus wrapped into R data package
adjustments to make package compatible with polmineR.shiny
cpos-method to get hits more robust if there are no matches for string
hits-method removes NAs
compare-method renamed to features-method
warnings caused by startup on windows removed
size-method now allows for a param 'sAttribute'
hits-method reworked, allows for named query vectors
first version that can be installed on windows
rcqp package moved to suggests, to facilitate installation
more generic implementation of as.markdown-method to prepare use of templates
LICENSE file updated
getTokenStream,character-method: new default behavior for params left and right
use of templates for as.markdown-method
Regions and TokenStream class (not for frontend use, so far)
getTermFrequencies-method merged into count-method
Corpus class introduced
decode- and encode-methods introduced
refactoring of context-method to prepare more consistent usage
progress bar for context-method (using blapply)
progress bar for partitionBundle (using blapply)
more coherent naming of parameters in partitionBundle-method
partitionBundle,character-method debugged and more robust
usage of blapply in as.speeches-method
hits-method: parameter cqp defaults to FALSE, size defaults to FALSE
new parameter cqp for dispersion-method
aggregation for dispersion-method when length(sAttribute) == 1
bugfix for ngrams-method, sample code for the method
configure file removed to avoid unwanted bugs
this is the first version that passes all CRAN tests and that is available via CRAN
the 'rcqp' package remains the interface to the CWB, but usage of rcqp functions is wrapped into a new CQI.rcqp (R6) class. CQI.perl and CQI.cqpserver are introduced as alternative interfaces to prepare portability to Windows systems
code in the vignette and method examples will be executed conditionally, if rcqp and the polmineR.sampleCorpus are available
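A common way to run example code only when optional packages are installed is a check with `requireNamespace()`. The following is a minimal sketch of that pattern, not necessarily the exact guard used in the vignette and the examples; the variable name `has_deps` and the placeholder messages are illustrative.

```r
# Check whether both optional packages are installed, without attaching them.
has_deps <- requireNamespace("rcqp", quietly = TRUE) &&
  requireNamespace("polmineR.sampleCorpus", quietly = TRUE)

if (has_deps) {
  # Example code that needs the CWB interface and the sample corpus goes here.
  message("rcqp and polmineR.sampleCorpus available: running examples")
} else {
  # Skip quietly so that checks pass on systems without the optional packages.
  message("rcqp or polmineR.sampleCorpus missing: skipping examples")
}
```

Because `requireNamespace()` returns FALSE instead of throwing an error when a package is missing, the guard itself never fails.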