Project X #20

Open
jwijffels opened this issue Oct 28, 2018 · 6 comments

Comments

jwijffels commented Oct 28, 2018

Project X will be about showing the wonders of what x is, what it should be, its definitions and its usage. Project X will wander in the documentation space of R functions.
I'm an R user, and all function arguments in R are documented. Many of them are called x. Examples below:

plot(x, y, ...)
Arguments

x
the coordinates of points in the plot. Alternatively, a single plotting structure, function or any R object with a plot method can be provided.

print(x, ...)
Arguments

x
an object used to select a method.

So x can be a lot.
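
A rough way to see how often x shows up, as a sketch only: the snippet below checks the formal arguments of functions in base R rather than reading the help pages, so it says nothing about the documentation text itself. The helper name `has_x_argument` is made up for this illustration.

```r
# Hypothetical helper: does the base R function called `name` have a formal
# argument named x? This inspects formals, not the Rd documentation.
has_x_argument <- function(name, envir = baseenv()) {
  obj <- get(name, envir = envir)
  if (!is.function(obj)) {
    return(FALSE)
  }
  # args() also works for primitives, where formals() alone returns NULL
  "x" %in% names(formals(args(obj)))
}

has_x_argument("print")   # print(x, ...)
has_x_argument("paste")   # paste(..., sep, collapse) has no x

# Names of all base R functions that take an argument called x
with_x <- Filter(has_x_argument, ls(baseenv()))
length(with_x)
```

The count depends on the R version, so no number is given here.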

For this submission to NaNoGenMo, I'll collect the textual documentation of every argument called x across all R functions and generate new possible values of x. How will these be generated?

  • Either with my R package ruimtehol (https://github.com/bnosac/ruimtehol), by building content-based recommendations for x
  • Or from the examples of the R functions. Each R function also has examples. The argument names used in the function calls of these examples will be extracted, and the documentation of those arguments will be used to construct new corpora of documentation sequences.
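
The second idea, pulling argument names out of example code, could be sketched as follows. This is only an illustration under assumptions: the example string is invented, the helper `call_argument_names` is hypothetical, and it only sees arguments that are passed by name.

```r
# Hypothetical helper: recursively collect the names of all named arguments
# used in any function call inside a parsed piece of R code.
call_argument_names <- function(expr) {
  if (is.expression(expr)) {
    found <- lapply(as.list(expr), call_argument_names)
    return(as.character(unlist(found, use.names = FALSE)))
  }
  if (!is.call(expr)) {
    return(character(0))
  }
  parts <- as.list(expr)
  nms <- names(parts)
  if (is.null(nms)) {
    nms <- rep("", length(parts))
  }
  own <- nms[-1]                # drop the function position
  own <- own[nzchar(own)]       # keep only arguments passed by name
  nested <- unlist(lapply(parts, call_argument_names), use.names = FALSE)
  c(own, as.character(nested))
}

# Invented example code, standing in for the examples section of a help page
example_code <- "plot(x = 1:10, y = rnorm(n = 10), main = 'demo')"
used <- call_argument_names(parse(text = example_code, keep.source = FALSE))
table(used)
```

Positional arguments such as the `1` and `10` in `1:10` are ignored here; matching them to argument names would need the function definitions as well.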

jwijffels commented Dec 2, 2018

Project X has finished. It has wandered in the embedding space of the documentation of R objects called x.
The R code that generated the text is below. The text itself is in the next comment.

library(tools)
library(data.table)
library(ruimtehol)
library(udpipe)
library(digest)
library(Rfiglet)
## GET DATA
x <- installed.packages() 
x <- rownames(x)
x <- lapply(x, FUN=function(pkg){
  x <- Rd_db(pkg)
  x <- lapply(x, FUN=function(x){
    x <- tools:::.Rd_get_metadata(x, "arguments") 
    x <- grep('\"x\"', x, value=TRUE)
    if(length(x) == 0){
      return(NULL)
    }
    x <- lapply(x, FUN=function(x) unlist(eval(parse(text = x))))
    x <- x[sapply(x, FUN=function(x) grepl("x", head(x, 1)))]
    x <- sapply(x, FUN=function(x) paste(x, collapse = " "))
    x <- gsub("[^[:alnum:]]", " ", x)
    x <- gsub(" +", " ", x)
    x
  })
  if(length(x) == 0){
    return(NULL)
  }
  x <- x[sapply(x, FUN=function(x) length(x) > 0)]
  x <- Map(rd = names(x), x, f=function(rd, x){
    data.frame(documentationfile = rd, text = x, stringsAsFactors = FALSE)
  })
  x <- rbindlist(x)
  x <- setDF(x)
  x$rpackage <- rep(pkg, nrow(x))
  x
})
x <- rbindlist(x, fill = TRUE)
x$text <- trimws(x$text)
x <- setnames(x, old = c("rpackage", "documentationfile", "text"), new = c("doc_id", "sentence_id", "text"))
x_train <- x[, list(token = tolower(unlist(strsplit(text, " ")))), by = list(doc_id, sentence_id)]
x_train <- setDF(x_train)
save(x, x_train, file = "x.RData")
load("x.RData")

## BUILD TEXT
model <- embed_sentencespace(x = x_train, early_stopping = 0.9, dim = 50, 
                             loss = "hinge", lr = 0.10, ngrams = 3, minCount = 2,
                             similarity = "cosine", adagrad = TRUE, epoch = 25,
                             maxTrainTime = 10, negSearchLimit = 3, validationPatience = 25)

nwords <- function(x) length(unlist(strsplit(x, split = " ")))
X <- sprintf("Project x %s", "A Workbook object")
Z <- X
N <- nwords(X)
while(N < 50000){
  Y <- predict(model, Z, basedoc = txt_sample(unique(x$text), n = 500), k = 50)
  Z <- sample(Y$prediction$label_starspace, size = 1, prob = Y$prediction$similarity)
  Z <- sprintf("Project %s", Z)
  X <- sprintf("%s\n%s", X, Z)
  N <- N + nwords(Z)
}
cat(X, sep = "\n", file = "X.txt")
digest(X, algo = "xxhash64")
as.numeric(starspace_embedding(model, X))

## ASCIIFY OUTPUT
set.seed(123)
X <- readLines("X.txt")
f <- file("X_figlet.txt", open = "at")
for(i in seq_along(X)){
  print(sprintf("%s/%s", i, length(X)))
  x <- X[i]
  ## write the plain line first, then the same line rendered in a random figlet font
  writeLines(sprintf("\n%s", x), con = f, sep = "\n")
  x <- Rfiglet::figlet(x, font = sample(Rfiglet:::.__Rfiglet_fonts, 1))
  writeLines(paste(x, collapse = "\n"), con = f, sep = "\n")
}
close(f)
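
The generation loop in the BUILD TEXT step is a random walk: each new line is sampled with probability proportional to its similarity to the current line. The sketch below shows that idea on a toy scale. It is not the model used above: plain bag-of-words cosine similarity stands in for the StarSpace embeddings, and the four descriptions and the helper `random_walk` are assumptions made for illustration.

```r
set.seed(1)  # assumption: fixed seed so the toy walk is reproducible

# Toy descriptions standing in for the real documentation of x
docs <- c("a numeric vector",
          "a numeric matrix or a data frame",
          "a character vector or a factor",
          "a list of numeric vectors")

# Bag-of-words term counts: one row per description, one column per word
tokens <- strsplit(tolower(docs), " ", fixed = TRUE)
vocab  <- sort(unique(unlist(tokens)))
counts <- t(vapply(tokens,
                   function(tok) as.numeric(table(factor(tok, levels = vocab))),
                   numeric(length(vocab))))

# Cosine similarity between every pair of descriptions
norms      <- sqrt(rowSums(counts^2))
similarity <- (counts %*% t(counts)) / outer(norms, norms)

# Random walk: the next description is drawn with probability
# proportional to its similarity with the current one
random_walk <- function(start, steps) {
  path <- integer(steps)
  current <- start
  for (i in seq_len(steps)) {
    current <- sample(seq_along(docs), size = 1, prob = similarity[current, ])
    path[i] <- current
  }
  path
}

path <- random_walk(start = 1, steps = 10)
cat(sprintf("Project x %s", docs[path]), sep = "\n")
```

Because a line is always most similar to itself, the walk can repeat the same description several times in a row, which also happens in the real output.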

@jwijffels

And the result:

Project x A Workbook object
Project x a vector a factor a matrix or a data frame
Project x An integer representing the age for which the T x value shall be returned
Project x object having a proj4string method or if y is missing list with objects that have a proj4string method
Project x Input
Project x The fitted forest from the result of madlib randomForest
Project x Output of mantel correlog
Project x An object of the class solist essentially a list of two dimensional spatial datasets
Project x x location of the bar can be left missing
Project x The vector like object to be modified
Project x Unweighted variable
Project xrange Optional range of values of x that should be mapped to the new interval
Project x A matrix with same number of rows as in y columns coding the levels of sampling hierarchy The number of groups within the hierarchy must decrease from left to right If x is missing function performs an overall decomposition into alpha beta and gamma diversities
Project x Vector of prediction location eastings x coordinates
Project x default is FALSE Set to TRUE to return the expanded design matrix as element x without intercept indicators of the returned fit object
Project x an integer value 1 for the left side of the plot 1 for the right side
Project x ANY Vector list or other type supported by split
Project x An object of class stft as obtained by the function stft
Project x a vector of any mode including a list or a factor or for rep only a POSIXct or POSIXlt or Date object or an S4 object containing such an object
Project x a contingency table in array form with optional category labels specified in the dimnames x attribute The table is best created by the table command
Project x an object of appropriate class for the default method an integer vector with k different integer cluster codes or a list with such an x clustering component Note that silhouette statistics are only defined if 2 le k le n 1 2 k n 1
Project x a numeric vector or matrix or a data frame with all numeric data
Project x The first sample for qqplot
Project x A single predictor or a matrix of predictors Categorical predictors are required to be coded as integers as factor does internally For predict x is a data matrix with the same integer codes that were originally used for categorical variables
Project x The data frame in the required format
Project x character User provided argument to match
Project x Table with population size versus time as computed by extract popsize
Project newdata linear predictors x times conf int see survest cph One of newdata linear predictors x must be given linear predictors includes the intercept If times is omitted predictions are made at 200 equally spaced points between 0 and the maximum failure censoring time used to fit the model x can also be a result from survest psm
Project x An H2O frame that wraps a single string column
Project x character vector the values to be matched or the values to be ordered or grouped
Project x object to test for class membership
Project x An object of class geeglm such as from geeglm
Project x a rules or itemsets object
Project x y numeric vectors of coordinates where the text labels should be written If the length of x and y differs the shorter one is recycled
Project x The syllable sum object
Project x A data frame in which each successive column represents subcategories of the previous column
Project rand x NULL or the random number generator for the x slot a function such that rand x n generates a numeric vector of length n Typical examples are rand x rnorm or rand x runif the default is nice for didactical purposes
Project x Object to create test or print
Project x A vector of four elements or a two by two matrix or in the case of YuleBonett or YuleCor this can also be a data matrix
Project x variable to be aggregated
Project minmax if cuts is specified but min x min cuts or max x max cuts augments cuts to include min and max x
Project x A list
Project x plot object
Project x input data can be one of the following r x 2 table vector of numbers from a contigency table will be transformed into r x 2 table in row wise order or single factor or character vector that will be combined with y into a table
Project x For s3saveRDS a single R object to be saved via saveRDS and uploaded to S3 x is analogous to the object argument in saveRDS
Project x A piece of HTML code for tables usually generated by kable and kableExtra
Project x Output from the bestglm function
Project x1 Vector of x coordinates of second endpoint of each segment
Project x An object of class fv containing the variables to be plotted or variables from which the plotting coordinates can be computed
Project x A single categorical column
Project x An object of class merMod such as those from lmer glmer or nlmer
Project x Data points on x axis measured in current data coordinate
Project x either a tk2widget object or a character string with its class name
Project x an object of class betadisper the result of a call to betadisper
Project x A list containing as many numeric vectors as there are sets The first vector contains the counts or percentages of the elements that are only in one set the next vector contains the counts or percentages of elements that are in two sets and so on A matrix of set membership indicators or a two column matrix of object identifiers and attribute identifiers can be passed see Details
Project x an interval difftime or numeric object
Project x The fitted tree from the result of madlib rpart
Project x the object to be forced
Project x x variable
Project x table
Project x a ff vector containing values to be differenced
Project x a vector a factor a matrix or a data frame
Project x a matrix containing the data to be ranked or the kernel matrix of data to be ranked or a list of character vectors
Project x A character vector or a factor
Project x X coordinates
Project x the stack of independent network variables Note that NA s are permitted as is dichotomous data
Project x A vector of numbers
Project x numeric matrix of data or an object that can be coerced to such a matrix such as a numeric vector or a data frame with all numeric columns
Project xtype a vector of one letter character codes specifying how each predictor is to be modeled in order of columns of x The codes are s for smooth function using restricted cubic splines l for no transformation linear or c for categorical to cause expansion into dummy variables Default is s if nk 0 and l if nk 0
Project x A single categorical column
Project useExpScaled Numeric geq 0 0 The smallest value of x x for which the ratio is calculated using the exponentially scaled Bessel function values
Project x A list containing the names or indices of the variables to encode A target encoding column will be created for each element in the list Items in the list can be multiple columns For example if x list c A c B C then the resulting frame will have a target encoding column for A and a target encoding column for B C in this case we group by two columns
Project xlab ylab each a character string giving the labels for the x and y axis Default to the call names of x or y or to if these were unspecified
Project x An object to produce english ordinal output
Project x Any orderable vector i e those with relevant methods for such as numeric character Date etc in case of between and a numeric vector in case of inrange
Project x An expression to unquote
Project x the object to be forced
Project x An object to produce english ordinal output
Project x an R object with the data to be clustered
Project x a plm object representing a panel object
Project x a base64 character string
Project x A list of one or more earth objects or a single earth object This is the only required argument This argument is called x for consistency with the generic plot
Project x a formal argument of the enclosing function
Project x Depending on the function x may be a matrix as returned by the cor function or a data frame with items e g from a test or questionnaire
Project x a one dimensional table
Project x An object of the class listof Essentially a list of objects
Project xbreaks Numeric vector giving the x coordinates of the boundaries of the rectangular quadrats Incompatible with nx
Project x An object of the class solist essentially a list of two dimensional spatial datasets
Project x an R object representing a hierarchical clustering For the default method an object of class hclust or with a method for as hclust such as agnes in package href https CRAN R project org package 1 pkg 1 cluster https CRAN R project org package cluster cluster
Project xrange Optional range of values of x that should be mapped to the new interval
Project x A n x m matrix or a list of such matrices
Project x The cumulative combo syllable sum object
Project n x a single number representing the sample size for x
Project x for print plot text is the result of anova
Project x a rules or itemsets object
Project x a data matrix If more than 2 columns are provided then the data is plotted using the first two principal components
Project x a dimension to be interpreted as a number of centimetres
Project maxdt required Date time object in standard format that will form the upper boundary of the hour or half hour time categories maxdt must greater than or equal to the minimum value in x and must be rounded off to the nearst hour for hour categories e g HH 00 00 or rounded off to the nearest half hour for half hour categories e g HH 30 00
Project x the first row to form the Toeplitz matrix
Project x input data can be one of the following r x 2 table vector of numbers from a contigency table will be transformed into r x 2 table in row wise order or single factor or character vector that will be combined with y into a table
Project x set to TRUE to store the design matrix with the fit For print is an Rq object
Project x A set of continuous variables may be missing or if p and d are missing the variables to be analyzed
Project x set to TRUE to store the design matrix with the fit For print is an Rq object
Project x the object to be forced
Project x A DocumentTermMatrix or TermDocumentMatrix
Project x numeric vector NA s and Inf s are allowed but will be removed
Project x The object to be printed
Project x A character vector of stems to be completed
Project x The data matrix were columns correspond to the variables and rows to observations
Project x an object where samples are in rows and features are in columns This could be a simple matrix data frame or other type e g sparse matrix See Details below
Project x univariate data set
Project x numeric vector with the data to be summed squared
Project rand x NULL or the random number generator for the x slot a function such that rand x n generates a numeric vector of length n Typical examples are rand x rnorm or rand x runif the default is nice for didactical purposes
Project x a histogram object or a list with components density mid etc see hist for information about the components of x
Project xbreaks Optional Numeric vector giving the x coordinates of the boundaries of the quadrats Incompatible with nx
Project x An object of class fv containing the variables to be plotted or variables from which the plotting coordinates can be computed
Project x Data points on x axis
Project x Object
Project xlab label for the horizontal axis defaults to the name of the variable x
Project x x variable
Project x The matrix data frame with the data For the print and knit print it takes a string of the class htmlTable as x argument
Project xtype a vector of one letter character codes specifying how each predictor is to be modeled in order of columns of x The codes are s for smooth function using restricted cubic splines l for no transformation linear or c for categorical to cause expansion into dummy variables Default is s if nk 0 and l if nk 0
Project x A matrix with same number of rows as in y columns coding the levels of sampling hierarchy The number of groups within the hierarchy must decrease from left to right If x is missing function performs an overall decomposition into alpha beta and gamma diversities
Project x a vector a factor a matrix or a data frame
Project x Numeric vector of x coordinates of any points
Project x RasterBrick or RasterStack
Project x object to test for class membership
Project x An object to test for geometry inheritance
Project x the coordinates to look for Numeric if so their meaning is defined by the input argument or one of all all the points of the ROC curve local maximas the local maximas of the ROC curve or best the point with the best sum of sensitivity and specificity
Project x An object that inherits from the partial class
Project x An integer or vector of integers if not integer the fractional part will be ignored
Project x A function raw expression or formula to interpolate
Project x hare object typically the result of hare
Project x x variable
Project x a ff vector containing values to be differenced
Project x list or data frame List or data frame to check for compliance with at least one of rules See details of qtest for rule explanation
Project x an object where samples are in rows and features are in columns This could be a simple matrix data frame or other type e g sparse matrix See Details below
Project x a matrix or a data frame The function will pass all argument to chordDiagramFromMatrix or chordDiagramFromDataFrame depending on the type of x also format of other arguments depends of the type of x If it is in the form of a matrix it should be an adjacency matrix If it is in the form of a data frame it should be an adjacency list
Project x a list of dimRedResult objects The names of the list will appear in the legend with the AUC lnK value
Project x an R object representing a hierarchical clustering For the default method an object of class hclust or with a method for as hclust such as agnes in package href https CRAN R project org package 1 pkg 1 cluster https CRAN R project org package cluster cluster
Project x The signature of this method When it is of type character it should be a file name When it is of type data frame it is the data frame that already exists in the current R session When it is of type db Rquery it represents a series of operations on a existing db data frame object See db Rquery for more For as db Rview x must be a db Rquery object
Project x a numeric vector for which each value will be a sector
Project x For the default method x is an object where samples are in rows and features are in columns This could be a simple matrix data frame or other type e g sparse matrix but must have column names see Details below Preprocessing using the preProcess argument only supports matrices or data frames When using the recipe method x should be an unprepared recipe object that describes the model terms i e outcome predictors etc as well as any pre processing that should be done to the data This is an alternative approach to specifying the model Note that when using the recipe method any arguments passed to preProcess will be ignored See the links and example below for more details using recipes
Project x the coordinates of points in the plot Alternatively a single plotting structure function or any object with a plot method can be provided
Project x a matrix or a data frame The function will pass all argument to chordDiagramFromMatrix or chordDiagramFromDataFrame depending on the type of x also format of other arguments depends of the type of x If it is in the form of a matrix it should be an adjacency matrix If it is in the form of a data frame it should be an adjacency list
Project x the first variable to be plotted
Project x A set of external variables to correlate with the phase angles
Project x the result of validate rpart
Project x Data to be plotted A numeric vector containing angles or a histogram object containing a histogram of angular values or a density object containing a smooth density estimate for angular data or an fv object giving a function of an angular argument
Project x The integer variable
Project x character filename see Details Raster object missing array SpatialGrid SpatialPixels Extent or list of Raster objects Supported file types are the native raster package format and those that can be read via rgdal see readGDAL and NetCDF files see details
Project x Plotted horizontal coordinates
Project x set to TRUE to store the design matrix with the fit For print is an Rq object
Project x A matrix with same number of rows as in y columns coding the levels of sampling hierarchy The number of groups within the hierarchy must decrease from left to right If x is missing function performs an overall decomposition into alpha beta and gamma diversities
Project xbreaks Numeric vector giving the x coordinates of the boundaries of the quadrats Incompatible with nx
Project x the stack of independent network variables Note that NA s are permitted as is dichotomous data
Project x A vector of numbers
Project x any Single value to check For a parameter set this must be a list If the list is unnamed not recommended it must be in the same order as the param set If it is named its names must match the parameter names in the param set
Project x Data to be rescaled
Project x Matrix n x p of complete covariates Only numeric variables are permitted for usage of this function
Project prefix prefix string to use A vector can be used to specify a prefix for each dimension of x Names are build as prefix sep index
Project x y numeric vectors of same length supposedly from a model y f x For D1tr x can have length one and then gets the meaning of h Delta x
Project medoids x logical indicating if the medoids should be returned identically to some rows of the input data x If FALSE keep data must be false as well and the medoid indices i e row numbers of the medoids will still be returned i med component and the algorithm saves space by needing one copy less of x
Project x a FlexTable to be printed
Project x a matrix or a list of sequences each made of a single vector of mode character where each element is a character state e g A C Objects of class of DNAbin are accepted
Project xlim a list with elements named as the variable names appearing on the x axis with each element being a 2 vector specifying lower and upper limits Any variable not appearing in the list will have its limits computed and possibly trim med
Project x an unquoted file name aside from s This base file name must be a legal S name
Project x a data matrix If more than 2 columns are provided then the data is plotted using the first two principal components
Project x the object to be forced
Project suffix String value will be appended to variable column names of x if x is a data frame If x is not a data frame this argument will be ignored The default value to suffix column names in a data frame depends on the function call recoded variables rec will be suffixed with r dichotomized variables dicho will be suffixed with d grouped variables split var will be suffixed with g
Project x tmTaggedCorpus
Project x ROCMeasures Created by calculateROCMeasures
Project x A factor variable
Project x An object of class merMod such as those from lmer glmer or nlmer
Project x String
Project x The root Node of the tree or sub tree to be convert to a data frame
Project x an object of class dip i e typically the result of dip full result FF where FF is TRUE or a string such as all
Project x For the plot method a mcd object typically result of covMcd For covPlot the numeric data matrix such as the X component as returned from covMcd
Project n x a single number representing the sample size for x
Project x Design matrix with length y rows and p columns containing complete covariates
Project x input data can be one of the following r x 2 table vector of numbers from a contigency table will be transformed into r x 2 table in row wise order or single factor or character vector that will be combined with y into a table
Project x A function raw expression or formula to interpolate
Project x The function
Project x An R object Typically a character string or an object which can be converted to a character string via as character
Project max span The angle of the maximal sector in radians The default is to scale x so that it sums to 2 pi
Project x The object to be printed
Project x ANY Vector list or other type supported by split
Project x The fitted tree from the result of madlib rpart
Project x An XMLDocument
Project xlab a character string or a variable of mode character giving the label for the x axis default is Time
Project x The wcmdscale result object when the function was called with options eig TRUE or x ret TRUE See Details
Project x Matrix of inputs or object of class bclust for plot
Project x a matrix or a data frame with at least two columns the first one gives the number of species in clades with a trait supposed to increase diversification rate and the second one the number of species in the corresponding sister clade without the trait Each row represents a pair of sister clades
Project x The gantt plot object
Project x TODO
Project x A term document matrix
Project x A matrix dataframe or equal length list of vectors
Project xmin Numeric scalar or NULL The lower bound for fitting the power law If NULL the smallest value in x will be used for the R mle implementation and its value will be automatically determined for the plfit implementation This argument makes it possible to fit only the tail of the distribution
Project x A factor variable
Project auxwhere for summaryD and dotchartp specifies whether auxdata and auxgdata are to be placed on the far right of the chart or should appear as pop up tooltips when hovering the mouse over the ordinary x data points on the chart Ignored for dotchart3
Project x The table score object
Project x The table score object
Project is cmplx optional logical to be used when x is character to indicate if it stems from complex vector or not By default NA x is checked to look like complex
Project x plot object
Project x A Corpus object such as a VCorpus or PCorpus
Project x formula or data
Project x The wcmdscale result object when the function was called with options eig TRUE or x ret TRUE See Details
Project x plot object
Project x ROCMeasures Created by calculateROCMeasures
Project x an rqss object as above
Project x a data frame with one row per term where the sequence of the terms correspond to the natural order of a text The data frame x should also contain the columns provided in term and group
Project x The data frame in the required format
Project x An ore number object a list of ore number objects or a formula e g y grp where y is an ore number object and grp is an ore factor object
Project x a matrix or a list of sequences each made of a single vector of mode character where each element is a character state e g A C Objects of class of DNAbin are accepted
Project x A numeric data frame or matrix with the x values If y is NULL these will become the y values and the x positions will be the integers from 1 to dim x 1
Project x An object of the class anylist Essentially a list of objects
Project x a rules or itemsets object
Project x an R object representing a hierarchical clustering For the default method an object of class hclust or with a method for as hclust such as agnes in package href https CRAN R project org package 1 pkg 1 cluster https CRAN R project org package cluster cluster
Project x Input
Project x Numeric vector or variable
Project x numeric variable or R object shingle in plot shingle and x An object list of intervals of class shingleLevel in print shingleLevel
Project x a character string giving either the fully qualified name of a Weka learner or filter class in JNI notation or the name of an available R interface or an object obtained from applying these interfaces to build an associator classifier clusterer or filter
Project extremes The colors for the extreme values of x RGB only
Project x X coordinates
Project x For nmfEstimateRank a target object to be estimated in one of the format accepted by interface nmf For plot NMF rank an object of class NMF rank as returned by function nmfEstimateRank
Project x Data points on x axis measured in current data coordinate
Project x The function
Project x Table with population size versus time as computed by extract popsize
Project x The value of the property This can be an atomic vector a constant a name or quoted call a variable a single sided formula a constant or variable depending on its contents or a delayed reactive which can be either variable or constant
Project x a matrix or an NMF object from which is extracted the mixture coefficient matrix It is extracted from the best fit if x is the results from multiple NMF runs
Project xtrafo a function of transformations to be applied to the factor x supplied in formula see Details Defaults to trafo
Project x a numeric vector matrix for lm fit qr bare or data frame For xless may be any object that is sensible to print For sepUnitsTrans is a character or factor variable For getLatestSource is a character string or vector of character strings containing base file names to retrieve from CVS Set x all to retrieve all source files For clowess x may also be a list with x and y components For inverseFunction x and y contain evaluations of the function whose inverse is needed x is typically an equally spaced grid of 1000 points For strgraphwrap is a character vector
Project x A numeric
Project x A function which takes a TermDocumentMatrix with term frequencies as input weights the elements and returns the weighted matrix
Project x a vector of any mode including a list or a factor or for rep only a POSIXct or POSIXlt or Date object or an S4 object containing such an object
Project x Line segment pattern object of class psp to be smoothed
Project x y Vectors of Cartesian coordinates Alternatively x can be a point pattern and y can be missing
Project x A single categorical column
Project x the object to be deserialized and the character vector to be deserialized
Project x An expression to unquote
Project x x coordinates to plot
Project x The cumulative syllable freqobject
Project x Data to be plotted A numeric vector containing angles or a histogram object containing a histogram of angular values or a density object containing a smooth density estimate for angular data or an fv object giving a function of an angular argument
Project x character vector the values to be matched or the values to be ordered or grouped
Project x A list containing the names or indices of the variables to encode A target encoding map will be created for each element in the list Items in the list can be multiple columns For example if x list c A c B C then there will be one mapping frame for A and one mapping frame for B C in this case we group by two columns
Project x the XML node or the top level document content in which the children are to be accessed The XMLDocumentContent is the container for the top level node that also contains information such as the URI filename and XML version This accessor method is merely a convenience to get access to children of the top level node
Project x a base64 character string
Project x a PROJ 4 character string a shortcut or a CRS object The following shortcuts are available longlat Not really a projection but a plot of the longitude latitude coordinates WGS84 datum wintri Winkel Tripel 1921 Popular projection that is useful in world maps It is the standard of world maps made by the National Geographic Society Type compromise robin Robinson 1963 Another popular projection for world maps Type compromise eck4 Eckert IV 1906 Projection useful for world maps Area sizes are preserved which makes it particularly useful for truthful choropleths Type equal area hd Hobo Dyer 2002 Another projection useful for world maps in which area sizes are preserved Type equal area gall Gall Peters 1855 Another projection useful for world maps in which area sizes are preserved Type equal area merc Web Mercator Projection in which shapes are locally preserved a variant of the original Mercator 1569 used by Google Maps Bing Maps and OpenStreetMap Areas close to the poles are inflated Type conformal utmXX s Universal Transverse Mercator Set of 60 projections where each projection is a traverse mercator optimized for a 6 degree longitude range These ranges are called UTM zones Zone 01 covers 180 to 174 degrees West and zone 60 174 to 180 east Replace XX in the character string with the zone number For southern hemisphere add s So for instance the Netherlands is utm31 and New Zealand is utm59s mill Miller 1942 Projetion based on Mercator in which poles are displayed Type compromise eqc0 Equirectangular 120 Projection in which distances along meridians are conserved The equator is the standard parallel Also known as Plate Carr ee Type equidistant eqc30 Equirectangular 120 Projection in which distances along meridians are conserved The latitude of 30 is the standard parallel Type equidistant eqc45 Equirectangular 120 Projection in which distances along meridians are conserved The latitude of 45 is the standard parallel Also known as Gall isographic Type equidistant 
rd Rijksdriehoekstelsel Triangulation coordinate system used in the Netherlands EPSG code A valid code from the EPSG database
Project x A factor
Project x y These provide the coordinates of the set of points being tesselated Argument x may be a data frame or a list in particular one of class ppp See the spatstat package For a full description see the discussion of these arguments in the help for deldir
Project prefix character string defining the prefix for function names created when type individual By default the function specifying the transformation for variable x will be named x
Project x the predictor values at which the design matrix will be computed The predictor values can be in a number of formats It can take the form of a vector of length equal to the number of predictors in the original data set or it can be shortened to the length of only those predictors that occur in the model in the same order as they appear in the original data set Similarly x can take the form of a matrix with the number of columns equal to the number of predictors in the original data set or shortened to the number of predictors in the model
Project x The x variable to plot numeric
Project x Continous variable
Project x a trellis object i e the result of a high level plot function in the Lattice framework
Project x A matrix with same number of rows as in y columns coding the levels of sampling hierarchy The number of groups within the hierarchy must decrease from left to right If x is missing two levels are assumed each row is a group in the first level and all rows are in the same group in the second level
Project x An expression to unquote
Project x a numeric vector may be a character or category or factor vector for wtd table
Project x X variable
Project x RasterLayer or RasterStack RasterBrick
Project x The syllable sum object
Project xlim a list with elements named as the variable names appearing on the x axis with each element being a 2 vector specifying lower and upper limits Any variable not appearing in the list will have its limits computed and possibly trim med
Project x The cumulative combo syllable sum object
Project x A Spark DataFrame
Project x variable for horizontal axis
Project x An XMLDocument
Project x A vector of four elements or a two by two matrix or in the case of YuleBonett or YuleCor this can also be a data matrix
Project x The conditioning covariate
Project x The object containing results of ordEval algorithm obtained by calling ordEval If this object is not given it has to be constructed from files file and rndFile
Project xlim a list with elements named as the variable names appearing on the x axis with each element being a 2 vector specifying lower and upper limits Any variable not appearing in the list will have its limits computed and possibly trim med
Project x An object to test for inheritance
Project x RasterLayer or RasterStack RasterBrick
Project x list or data frame List or data frame to check for compliance with at least one of rules See details of qtest for rule explanation
Project x An R object Typically a character string or an object which can be converted to a character string via as character
Project x0 Vector of x coordinates of first endpoint of each segment
Project x object to be sampled from a set of associations or transactions
Project x A data frame with at least 2 columns containing the values of the function argument and the corresponding values of one or more versions of the function
Project x a data matrix If more than 2 columns are provided then the data is plotted using the first two principal components
Project x a numeric vector may be a character or category or factor vector for wtd table
Project xmid a numeric parameter representing the x value at the inflection point of the curve The value of SSlogis will be Asym 2 at xmid
Project x a list with several dendrogram hclust phylo or dendlist objects and other junk that should be omitted
Project x any Parameter value or a list of values for a discrete vector
Project x ajust logit
Project max length The data x is aggregated if necessary by taking batch means so that the length of the series is less than max length If this is set to NULL no aggregation occurs
Project formula family data weights subset na action start offset control model method x y contrasts see glm for print x is the result of Glm
Project x A DocumentTermMatrix or TermDocumentMatrix
Project x A OHLC object from the quantmod package
Project x Vector of x coordinates
Project x An XMLDocument
Project suffix String value will be appended to variable column names of x if x is a data frame If x is not a data frame this argument will be ignored The default value to suffix column names in a data frame depends on the function call recoded variables rec will be suffixed with r dichotomized variables dicho will be suffixed with d grouped variables split var will be suffixed with g
Project x numeric like vector typically of length prod dim or shorter in which case it is recycled
Project x y coordinate vectors of points This can be specified as two vectors x and y a 2 column matrix x a list x with two components etc see xy coords
Project x The syllable sum object
Project x any object to be coerced
Project x The number of successes or failures for which the CI is to be calculated
Project x An object Currently there are methods for numeric logical vectors and date date time and time interval objects Complex vectors are allowed for trim 0 only
Project x A table object or either a vector or a list of several categorical vectors containing grouping variables for the first x margin of the plotted matrix
Project x Any orderable vector i e those with relevant methods for such as numeric character Date etc in case of between and a numeric vector in case of inrange
Project x The object containing results of ordEval algorithm obtained by calling ordEval If this object is not given it has to be constructed from files file and rndFile
Project x the result of a call to the survfit function
Project x A DocumentTermMatrix or TermDocumentMatrix or a vector of term frequencies as obtained by termFreq
Project x Line segment pattern object of class psp to be smoothed
Project x a vector of any mode including a list or a factor or for rep only a POSIXct or POSIXlt or Date object or an S4 object containing such an object
Project max span The angle of the maximal sector in radians The default is to scale x so that it sums to 2 pi
Project x numeric matrix of data or an object that can be coerced to such a matrix such as a numeric vector or a data frame with all numeric columns
Project x A set of external variables to correlate with the phase angles
Project x A vector like matrix like or data frame like object to be subsetted
Project s x a single number representing the sample standard deviation for x
Project x A matrix with same number of rows as in y columns coding the levels of sampling hierarchy The number of groups within the hierarchy must decrease from left to right If x is missing two levels are assumed each row is a group in the first level and all rows are in the same group in the second level
Project x Matrix n x p of complete covariates Only numeric variables are permitted for usage of this function
Project x String
Project x generic object with unknown value s
Project x The fitted forest from the result of madlib randomForest
Project x A list of the tiles in a tessellation as produced the function tile list
Project x yd ys numeric vectors all of the same length representing x i y i and fitted smooth values hat y i y i x will be sorted increasingly if necessary and yd and ys accordingly Alternatively ys can be an x y list as resulting from xy coords containing fitted values on a finer grid than the observations x In that case the observational values x must be part of the larger set seqXtend may be applied to construct such a set of abscissa values
Project x The x coordinates to plot
Project x ANY Object
Project x The cumulative syllable freqobject
Project xinc increment in x over which to examine the density of y in perimeter
Project x an object of class betadisper the result of a call to betadisper
Project x the x points of the curve
Project x optional list of components that change the settings any valid value of theme These are used to modify the current settings obtained by trellis par get before they are displayed
Project x string to be split Works only with one string Non string arguments and multi dimensional arguments are returned unchaged
Project x numeric variable or R object shingle in plot shingle and x An object list of intervals of class shingleLevel in print shingleLevel
Project x Matrix n x p of complete covariates Only numeric variables are permitted for usage of this function
Project x For the default method x should not be specified Otherwise x should be a grob or a gPath If x is character it is assumed to be a gPath
Project x vector of x coordinates of observed points or a 2 column matrix giving x y coordinates or a list with components x y giving coordinates such as a point pattern object of class ppp
Project xtype a vector of one letter character codes specifying how each predictor is to be modeled in order of columns of x The codes are s for smooth function using restricted cubic splines l for no transformation linear or c for categorical to cause expansion into dummy variables Default is s if nk 0 and l if nk 0
Project x an R object with the data to be associated
Project x a list of task labels start end times and task priorities as returned by get gantt info If this is not present get gantt info will be called
Project x A table
Project x Numeric
Project x Fitted object from locfit
Project x The x position of the label
Project x An object to get or set the score value of
Project x the object to be deserialized and the character vector to be deserialized
Project x x coordinates or a data frame with columns x y z
Project medoids x logical indicating if the medoids should be returned identically to some rows of the input data x If FALSE keep data must be false as well and the medoid indices i e row numbers of the medoids will still be returned i med component and the algorithm saves space by needing one copy less of x
Project x Vector of country names may include colons
Project max span The angle of the maximal sector in radians The default is to scale x so that it sums to 2 pi
Project x a matrix containing continuous variable values and codes for categorical variables The matrix must have column names dimnames If row names are present they are used in forming the names attribute of imputed values if imputed TRUE x may also be a formula in which case the model matrix is created automatically using data in the calling frame Advantages of using a formula are that categorical variables can be determined automatically by a variable being a factor variable and variables with two unique levels are modeled asis Variables with 3 unique values are considered to be categorical if a formula is specified For a formula you may also specify that a variable is to remain untransformed by enclosing its name with the identify function e g I x3 The user may add other variable names to the asis and categorical vectors For invertTabulated x is a vector or a list with three components the x vector the corresponding vector of transformed values and the corresponding vector of frequencies of the pair of original and transformed variables For print plot ggplot impute and predict x is an object created by transcan
Project x x variable
Project x A character vector to be hashed
Project non slopes in x set to FALSE if the design matrix x does not have columns for intercepts and these columns are needed
Project x x coordinate s of the new plot in user coordinates of the existing plot or a character string
Project x number of successes or a vector of length 2 giving the numbers of successes and failures respectively
Project x For nmfEstimateRank a target object to be estimated in one of the format accepted by interface nmf For plot NMF rank an object of class NMF rank as returned by function nmfEstimateRank
Project x a contingency table in array form with optional category labels specified in the dimnames x attribute The table is best created by the table command
Project x A numeric
Project x An object that inherits from the partial class
Project x y numeric data vectors If y is not specified it is set equal to x and x is set to 1 length y
Project x Matix of predictors This should not include an intercept
Project x a plm object representing a panel object
Project x The BinaryTree
Project x set to TRUE to store the design matrix with the fit For print is an Rq object
Project x A language or pairlist node Note that these functions are barebones and do not perform any type checking
Project maxHead maxTail abbreviate Non negative integer s also Inf specifying the maxium number of elements of the beginning and then end of the vector to be outputted If n length x is greater than maxHead maxTail 1 then x is truncated to consist of x 1 maxHead abbreviate and x n maxTail 1 n
Project x The independent variable or plotting object in plot
Project x a URL or vector of URLs
Project x a plotly graphics object or a named list of such objects The resulting png file will go in the file path given by the knitr fig path value and have a base name equal to the current knitr chunk name If x is a list a minus sign followed by the chunk name are inserted before png
Project x A vector of four elements or a two by two matrix or in the case of YuleBonett or YuleCor this can also be a data matrix
Project x RasterLayer or RasterBrick object associated with a binary values file on disk
Project x metaMDS result or a dissimilarity structure for initMDS
Project extremes The colors for the extreme values of x Takes precedence over the color ranges
Project x an integer between 1 and length search the length of the search path or 1
Project x a matrix or a data frame with at least two columns the first one gives the number of species in clades with a trait supposed to increase diversification rate and the second one the number of species in the corresponding sister clade without the trait Each row represents a pair of sister clades
Project suffix String value will be appended to variable column names of x if x is a data frame If x is not a data frame this argument will be ignored The default value to suffix column names in a data frame depends on the function call recoded variables rec will be suffixed with r dichotomized variables dicho will be suffixed with d grouped variables split var will be suffixed with g
Project x The cumulative combo syllable sum object
Project x An object of class lme such as those from lme or nlme
Project x Dataset like object to count Built in methods for data frames grouped data frames and ggvis visualisations
Project x Data points on x axis
Project x the dfm to be printed
Project x polyclass object typically the result of polyclass
Project x variable for horizontal axis
Project x A character vector of stems to be completed
Project xbreaks Optional Numeric vector giving the x coordinates of the boundaries of the quadrats Incompatible with nx
Project x a URL or vector of URLs
Project x Table with population size versus time as computed by extract popsize
Project x the dfm to be printed
Project x numeric vector with the data to be summed
Project x A character vector or a factor
Project x an R object representing a hierarchical clustering For the default method an object of class hclust or with a method for as hclust such as agnes in package href https CRAN R project org package 1 pkg 1 cluster https CRAN R project org package cluster cluster
Project x Fitted model object from the rstanarm package See stanreg objects
Project x A xts object from the quantmod package
Project x Data to be rescaled
Project maxdt required Date time object in standard format that will form the upper boundary of the hour or half hour time categories The maxdt option must greater than or equal to the minimum value in x and must be rounded off to the nearst hour for hour categories e g HH 00 00 or rounded off to the nearest half hour for half hour categories e g HH 30 00
Project x A list containing as many numeric vectors as there are sets The first vector contains the counts or percentages of the elements that are only in one set the next vector contains the counts or percentages of elements that are in two sets and so on A matrix of set membership indicators or a two column matrix of object identifiers and attribute identifiers can be passed see Details
Project x some vector
Project x Dataset like object to model and predict Built in methods for data frames grouped data frames and ggvis visualisations
Project x a vector a factor a matrix or a data frame
Project useExpScaled Numeric geq 0 0 The smallest value of x x for which the ratio is calculated using the exponentially scaled Bessel function values
Project x An expression to unquote
Project x0 Vector of x coordinates of first endpoint of each segment
Project x The x location of the lower left corner of the grob
Project x Raster SpatialPoints SpatialLines or SpatialPolygons
Project x an unquoted file name aside from s This base file name must be a legal S name
Project x The BinaryTree
Project x a matrix or a data frame The function will pass all argument to chordDiagramFromMatrix or chordDiagramFromDataFrame depending on the type of x also format of other arguments depends of the type of x If it is in the form of a matrix it should be an adjacency matrix If it is in the form of a data frame it should be an adjacency list
Project x the coordinates to look for Numeric if so their meaning is defined by the input argument or one of all all the points of the ROC curve local maximas the local maximas of the ROC curve or best the point with the best sum of sensitivity and specificity
Project x classIntervals object for printing conversion to shingle or plotting
Project x Age of the annuitant
Project x1 Vector of x coordinates of second endpoint of each segment
Project x The x variables defining each line Each y is plotted against all the x variables
Project max date optional maximum calendar date for plotting x axis of an epidemic curve should be of the form of 2004 08 10 if no date is specified then several days are added to the maximum date in x as specified by the after option
Project x For s3saveRDS a single R object to be saved via saveRDS and uploaded to S3 x is analogous to the object argument in saveRDS
Project x an R object representing a hierarchical clustering For the default method an object of class hclust or with a method for as hclust such as agnes in package href https CRAN R project org package 1 pkg 1 cluster https CRAN R project org package cluster cluster
Project x An object of the class listof Essentially a list of objects
Project x ANY Object
Project x A grob or gList or gTree or gPath
Project x A DocumentTermMatrix TermDocumentMatrix
Project x RasterLayer of flow direction as can be created with terrain
Project x the first row to form the Toeplitz matrix
Project x An object to get or set the score value of
Project x output document as a single string
Project x A function which takes a TermDocumentMatrix with term frequencies as input weights the elements and returns the weighted matrix
Project x Either a two by n data with categorical values from 1 to p or a p x p table If a data array a table will be found
Project x tmTopicModel
Project x The name of the option to be set
Project x y numeric data vectors If y is not specified it is set equal to x and x is set to 1 length y
Project x y numeric vectors of same length supposedly from a model y f x For D1tr x can have length one and then gets the meaning of h Delta x
Project x tmTaggedCorpus
Project x an object of class betadisper the result of a call to betadisper
Project fixed a logical If TRUE default the x elements are used as string literals Otherwise they are taken as regular expressions and partial TRUE is implied corresponding to the approximate string distance used by agrep with fixed FALSE
Project x univariate data set
Project x object to test for class membership
Project x Character vector of Excel column names e g A AF
Project x character vector each element of which is to be split Other inputs including a factor will give an error
Project extremes The colors for the extreme values of x RGB only
Project x the new values of x


@hugovk
Member

hugovk commented Dec 3, 2018

That's about 50k characters, do you think you can generate 50k words?

@jwijffels
Author

jwijffels commented Dec 4, 2018

Sure, I updated the R code above so that it generates 50k words. Since it doesn't make sense to copy-paste them into this comment, I hashed the final text with the 64-bit xxhash algorithm; you can also find the embedding of the 50k words below.

digest(X, algo = "xxhash64")
 "9ffcbc431844d59c"
as.numeric(starspace_embedding(model, X))
 -0.12279627 -0.05173848 -0.07786379  0.27775517  0.05830002 -0.18552773  0.13060857 0.01432916  0.05664242  0.19304146 -0.22541937 -0.05172935 -0.10577659  0.19047663  0.09256749  0.13173793  0.06360801  0.09919640  0.23225586 -0.01422341  0.27507266 0.04643934 -0.10495421  0.06213732  0.18558547  0.05021271 -0.22401638  0.06384591 -0.18533641 -0.05114066  0.02659866  0.06835321 -0.01333170 -0.26090792 -0.25534153  0.17079763 -0.02245170  0.16339058 -0.06327450  0.20101789 -0.16265121 -0.01489814 -0.01287134  0.24622424 -0.05026128 -0.03672163 -0.07712743  0.19711804  -0.07180170  0.08221155
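
For anyone who wants to check the words-versus-characters question on their own text, here is a minimal sketch. It assumes the text is held in a single character string; the `X` below is a short hypothetical stand-in (the real 50k-word text is not reproduced here), and `count_words` is a helper name made up for this illustration. The hashing step only runs if the digest package is installed.

```r
# Hypothetical stand-in for the generated text (NOT the real 50k-word X).
X <- paste(rep("Project x A factor", 3), collapse = "\n")

# Illustrative helper: count whitespace-separated, non-empty tokens.
count_words <- function(txt) {
  tokens <- unlist(strsplit(txt, "[[:space:]]+"))
  sum(nzchar(tokens))
}

n_words <- count_words(X)  # number of words
n_chars <- nchar(X)        # number of characters, including whitespace

cat(sprintf("words: %d, characters: %d\n", n_words, n_chars))

# Optional: hash the text the same way as in the comment above.
# With digest's defaults the serialized R object is hashed, not the raw
# bytes of the string, so the result depends on how X is stored in R.
if (requireNamespace("digest", quietly = TRUE)) {
  print(digest::digest(X, algo = "xxhash64"))
}
```

On the stand-in this reports 12 words and 56 characters; the hash value will of course differ from the one posted above, since the input is different.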

@hugovk
Copy link
Member

hugovk commented Dec 4, 2018

Maybe put them in a gist?

https://gist.github.com

@jwijffels
Author

I thought the xxhash-hashed text was completely in the spirit of Project X. But if you want it as a gist, that's equally fine. It's asciified with figlet here: https://gist.github.com/jwijffels/fd14531036363464e1c6f5ed5545da5d
