Commit

version 1.0.1
jonthegeek authored and cran-robot committed Mar 3, 2022
1 parent 87e85f3 commit c8ad573
Showing 6 changed files with 25 additions and 20 deletions.
10 changes: 5 additions & 5 deletions DESCRIPTION
@@ -1,6 +1,6 @@
 Package: piecemaker
 Title: Tools for Preparing Text for Tokenizers
-Version: 1.0.0
+Version: 1.0.1
 Authors@R: c(
 person(given = "Jon",
 family = "Harmon",
@@ -20,18 +20,18 @@ Description: Tokenizers break text into pieces that are more usable by machine
 provides those shared steps, along with a simple tokenizer.
 License: Apache License (>= 2)
 Encoding: UTF-8
-RoxygenNote: 7.1.1
+RoxygenNote: 7.1.2
 URL: https://github.com/macmillancontentscience/piecemaker
 BugReports: https://github.com/macmillancontentscience/piecemaker/issues
 Suggests: testthat (>= 3.0.0)
 Config/testthat/edition: 3
-Imports: purrr, rlang (>= 0.4.2), stringi, stringr
+Imports: rlang (>= 0.4.2), stringi, stringr
 Depends: R (>= 2.10)
 NeedsCompilation: no
-Packaged: 2021-08-05 22:01:09 UTC; jonth
+Packaged: 2022-03-03 14:07:56 UTC; jonth
 Author: Jon Harmon [aut, cre] (<https://orcid.org/0000-0003-4781-4346>),
 Jonathan Bratt [aut] (<https://orcid.org/0000-0003-2859-0076>),
 Bedford Freeman & Worth Pub Grp LLC DBA Macmillan Learning [cph]
 Maintainer: Jon Harmon <jonthegeek@gmail.com>
 Repository: CRAN
-Date/Publication: 2021-08-06 17:50:06 UTC
+Date/Publication: 2022-03-03 15:50:06 UTC
10 changes: 5 additions & 5 deletions MD5
@@ -1,12 +1,12 @@
-c767b88edf1431c320ec308ea028608a *DESCRIPTION
+59bb7c51f35c4f5f909f31c120778ca5 *DESCRIPTION
 e40f9d15973f27dbd85a5b7ca13062bc *NAMESPACE
-85e3bf16169091c7126a39899d22a92d *NEWS.md
-abdfe1a5b353519ce58aff60629ce4a5 *R/clean.R
+43c8031e43e1d30c6aea95555d0b037d *NEWS.md
+a5522e64e6dba23da3330330350b9f0f *R/clean.R
 e8d211b2576bb8953aacca4f2e7ec7b6 *R/space.R
 006f1b514ff3af4d7eb082ac04fcdf9d *R/sysdata.rda
 90253c7c4d2b4c4f737c6911c9b7529a *R/tokenize.R
-dc7be9461a60c95c57c324edbd9128f0 *README.md
-676f3051082f94d86c8d113210375891 *man/dot-coerce_to_utf8.Rd
+4a53ef5c81afd882764ca44014b3d7aa *README.md
+47b6424eda6652547190bc6b29e44327 *man/dot-coerce_to_utf8.Rd
 defea65ac9f9add858e4ec26d36bf1ba *man/dot-make_unicode_block_regex.Rd
 a15d6972ad34dad2855d96f4333a91f6 *man/dot-space_regex_selector.Rd
 d92843f3e86ec6583db3d96b020d5b2e *man/prepare_and_tokenize.Rd
4 changes: 4 additions & 0 deletions NEWS.md
@@ -1,3 +1,7 @@
+# piecemaker 1.0.1
+
+* Removed purrr dependency.
+
 # piecemaker 1.0.0

 * Added a `NEWS.md` file to track changes to the package.
10 changes: 6 additions & 4 deletions R/clean.R
@@ -43,17 +43,19 @@ validate_utf8 <- function(text) {
 Encoding(text[in_encoding_status]) <- "UTF-8"

 # Now try to coerce the leftovers to UTF-8.
-text[!in_encoding_status] <- purrr::map_chr(
-text[!in_encoding_status],
-.coerce_to_utf8
+text[!in_encoding_status] <- vapply(
+X = text[!in_encoding_status],
+FUN = .coerce_to_utf8,
+FUN.VALUE = character(1),
+USE.NAMES = FALSE
 )

 return(text)
 }

 #' Coerce to UTF8
 #'
-#' @param this_text Character scalar; a piece of text to attemp to coerce.
+#' @param this_text Character scalar; a piece of text to attempt to coerce.
 #'
 #' @return The text as UTF8.
 #' @keywords internal
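
The swap from `purrr::map_chr()` to base `vapply()` in the hunk above can be sketched in isolation. This is a minimal illustration, not the package's code: `coerce_stub()` is a hypothetical stand-in for the internal `.coerce_to_utf8()`, whose body does not appear in this diff, and the sample vectors are made up. It shows why `FUN.VALUE = character(1)` and `USE.NAMES = FALSE` are passed: the first enforces one string per element, and the second stops `vapply()` from using the input strings as names on the result.

```r
# Hypothetical stand-in for the internal .coerce_to_utf8(); the real
# implementation is not part of this diff.
coerce_stub <- function(this_text) {
  toupper(this_text)
}

x <- c("alpha", "beta")

# Default USE.NAMES = TRUE: an unnamed character input becomes the names.
named <- vapply(x, coerce_stub, FUN.VALUE = character(1))

# USE.NAMES = FALSE returns a plain, unnamed character vector.
unnamed <- vapply(x, coerce_stub, FUN.VALUE = character(1), USE.NAMES = FALSE)

# Same shape as the call in validate_utf8(): replace only the flagged elements.
text <- c("alpha", "keep", "beta")
needs_fix <- c(TRUE, FALSE, TRUE)
text[needs_fix] <- vapply(
  X = text[needs_fix],
  FUN = coerce_stub,
  FUN.VALUE = character(1),
  USE.NAMES = FALSE
)

# Zero-length input yields a zero-length character vector, so the call is
# safe when nothing needs fixing.
empty <- vapply(character(0), coerce_stub, FUN.VALUE = character(1), USE.NAMES = FALSE)
```

Because `vapply()` raises an error if any call returns something other than a length-one character vector, it keeps the type guarantee that `map_chr()` provided while dropping the extra package from `Imports`.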
9 changes: 4 additions & 5 deletions README.md
@@ -11,19 +11,18 @@ experimental](https://img.shields.io/badge/lifecycle-experimental-orange.svg)](h

 Tokenizers break text into pieces that are more usable by machine
 learning models. While writing
-[wordpiece](https://github.com/jonathanbratt/wordpiece) and
+[wordpiece](https://github.com/macmillancontentscience/wordpiece) and
 [morphemepiece](https://github.com/macmillancontentscience/morphemepiece),
-we found that many steps were shared between those package. This package
-provides those shared steps.
+we found that many steps were shared between those packages. This
+package provides those shared steps.

 ## Installation

 You can install the released version of piecemaker from
 [CRAN](https://CRAN.R-project.org) with:

 ``` r
-# Not yet.
-#install.packages("piecemaker")
+install.packages("piecemaker")
 ```

 And the development version from [GitHub](https://github.com/) with:
2 changes: 1 addition & 1 deletion man/dot-coerce_to_utf8.Rd

Some generated files are not rendered by default.
