ikodaSparse

ikodaSparse maintains sparse data along with its meaningful text values.

Data in libsvm format (and Spark's analogous LabeledPoint class) are purely numeric: they carry no meaningful text values for columns or rows. In contrast, ikodaSparse maintains text names for features/columns and text category names for labels/targets.

For example, natural-language word-frequency data can be processed in libsvm format without losing the text information needed to report and visualize the results of the analysis.
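To make the contrast concrete, here is a minimal sketch in plain Scala (no Spark, and not the ikodaSparse API) of how word counts become a purely numeric libsvm-style line, and how side tables recover the text. All names (`columnNames`, `labelNames`, `toLibsvmLine`, `describe`) and the sample words are hypothetical, chosen only for illustration.

```scala
// Hypothetical lookup tables: numeric id -> text name.
val columnNames: Map[Int, String] = Map(1 -> "spark", 2 -> "sparse", 3 -> "vector")
val labelNames: Map[Int, String] = Map(0 -> "databases", 1 -> "machine-learning")

// Invert the column map so words can be turned into libsvm feature indices.
val columnIds: Map[String, Int] = columnNames.map { case (id, name) => name -> id }

/** Render one document as a libsvm line: "<label> <index>:<value> ...".
  * Indices are written in ascending order; zero counts and unknown words are omitted. */
def toLibsvmLine(label: Int, wordCounts: Map[String, Int]): String = {
  val features = wordCounts.toSeq
    .collect { case (word, n) if n != 0 && columnIds.contains(word) => (columnIds(word), n) }
    .sortBy(_._1)
    .map { case (idx, n) => s"$idx:$n" }
  (label.toString +: features).mkString(" ")
}

/** Recover a readable form of a libsvm line using the lookup tables. */
def describe(line: String): String = {
  val parts = line.split(" ")
  val label = labelNames(parts.head.toInt)
  val feats = parts.tail.map { p =>
    val kv = p.split(":")
    s"${columnNames(kv(0).toInt)}=${kv(1)}"
  }
  s"$label: ${feats.mkString(", ")}"
}

val line = toLibsvmLine(1, Map("vector" -> 2, "spark" -> 5))
// line is "1 1:5 3:2"; describe(line) is "machine-learning: spark=5, vector=2"
```

Without the two lookup tables, the line `1 1:5 3:2` cannot be turned back into words or category names, which is the information ikodaSparse is described as preserving.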

ikodaSparse is a Scala tool designed to run as part of a data pipeline on Spark.

The core of the tool is an RDD[org.apache.spark.ml.feature.LabeledPoint] together with a mapping of text names to each column and to each label/target.
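The shape of that structure can be sketched in plain Scala. This is only a model under stated assumptions: `SparseRow` stands in for Spark's LabeledPoint with a sparse feature vector, a `Seq` stands in for the RDD, and `NamedSparseData` with its methods is a hypothetical name, not the RDDLabeledPoint API.

```scala
// A plain-Scala stand-in for one labeled sparse row; in Spark this role is
// played by LabeledPoint with a sparse feature vector.
final case class SparseRow(label: Double, features: Map[Int, Double])

// Hypothetical container: the rows plus the two name mappings.
final case class NamedSparseData(
    rows: Seq[SparseRow],
    columnNames: Map[Int, String],  // column index -> feature name
    labelNames: Map[Double, String] // numeric label -> category name
) {
  /** Text name of a row's label, if one is registered. */
  def labelOf(row: SparseRow): Option[String] = labelNames.get(row.label)

  /** Non-zero features of a row, keyed by name instead of by index. */
  def namedFeatures(row: SparseRow): Map[String, Double] =
    row.features.collect {
      case (idx, v) if columnNames.contains(idx) => columnNames(idx) -> v
    }
}

val data = NamedSparseData(
  rows = Seq(SparseRow(1.0, Map(0 -> 3.0, 2 -> 1.0))),
  columnNames = Map(0 -> "spark", 1 -> "sparse", 2 -> "vector"),
  labelNames = Map(0.0 -> "databases", 1.0 -> "machine-learning")
)
// data.namedFeatures(data.rows.head) maps "spark" to 3.0 and "vector" to 1.0
```

Keeping the names in maps beside the numeric rows means the rows themselves stay in the form that ML code expects, while reports can still look up text by index.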

ikodaSparse also converts the data to a DataFrame or to an RDD[org.apache.spark.mllib.regression.LabeledPoint] if required.

The main function of ikodaSparse is to manipulate large sparse data.

ikodaSparse can:

Maintain a map of numeric feature identifiers with text names
Maintain a map of numeric labels/targets with text labels
Maintain a UUID for each row
Remove columns/features
Reorder columns/features
Add columns
Remove rows by label/target
Perform mathematical operations, both row-wise and column-wise
Provide data directly to Scala ML functions
Merge labels/targets
Merge data schemas (i.e., convert one data set to match the column and target numbers of another)
Merge sparse data from two sources
Dichotomize labels/targets (i.e., each row is either target A or OTHER)
Identify and remove duplicate rows
Return rows containing a particular column
Load and save data on a local file system
Load and save data on Hadoop
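
A few of the operations listed above can be illustrated with a small plain-Scala sketch. It does not use Spark or the ikodaSparse API; `LabeledRow`, `Dataset`, and every function name below are hypothetical, and the choice of a fresh numeric id for OTHER is an assumption made for the example.

```scala
// Plain-Scala stand-ins (hypothetical, not the ikodaSparse API).
final case class LabeledRow(label: Int, features: Map[Int, Double])
final case class Dataset(
    rows: Seq[LabeledRow],
    columns: Map[Int, String], // column index -> feature name
    targets: Map[Int, String]  // numeric label -> category name
)

/** Remove a column by name from both the rows and the column map.
  * Remaining columns keep their original indices in this sketch. */
def removeColumn(ds: Dataset, name: String): Dataset = {
  val drop = ds.columns.collect { case (idx, n) if n == name => idx }.toSet
  ds.copy(
    rows = ds.rows.map(r => r.copy(features = r.features -- drop)),
    columns = ds.columns -- drop
  )
}

/** Drop rows that are exact duplicates (same label and same features). */
def removeDuplicates(ds: Dataset): Dataset = ds.copy(rows = ds.rows.distinct)

/** Keep one target and relabel every other row as OTHER. */
def dichotomize(ds: Dataset, keep: String): Dataset = {
  val keepId = ds.targets
    .collectFirst { case (id, n) if n == keep => id }
    .getOrElse(throw new IllegalArgumentException(s"unknown target: $keep"))
  val otherId = ds.targets.keys.max + 1 // assumption: any unused id would do
  ds.copy(
    rows = ds.rows.map(r => if (r.label == keepId) r else r.copy(label = otherId)),
    targets = Map(keepId -> keep, otherId -> "OTHER")
  )
}

/** Return the rows that have a value stored for the named column. */
def rowsWithColumn(ds: Dataset, name: String): Seq[LabeledRow] = {
  val ids = ds.columns.collect { case (idx, n) if n == name => idx }.toSet
  ds.rows.filter(r => r.features.keySet.exists(ids))
}

val sample = Dataset(
  rows = Seq(
    LabeledRow(0, Map(0 -> 2.0, 1 -> 1.0)),
    LabeledRow(1, Map(1 -> 4.0)),
    LabeledRow(2, Map(0 -> 1.0)),
    LabeledRow(1, Map(1 -> 4.0)) // exact duplicate of the second row
  ),
  columns = Map(0 -> "spark", 1 -> "hadoop"),
  targets = Map(0 -> "sports", 1 -> "politics", 2 -> "science")
)

val cleaned = dichotomize(removeDuplicates(removeColumn(sample, "hadoop")), "sports")
```

Each function returns a new `Dataset` and updates the name maps together with the rows, so indices and text names cannot drift apart. On an RDD the same transformations would run per partition, but the bookkeeping for the maps is the same idea.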


See Simple Guide to ikodaSparse for details

View ikodaSparse API (See class and object RDDLabeledPoint)


amerywu/ikodaSparse
