homodatum


Overview

homodatum helps you manage data frames in a more human-friendly way. The package mainly adds metadata to data frames by creating new classes for variables (hdTypes) and data frames (hdFringe) and giving them more descriptive properties.

Installation

Install the development version of homodatum from GitHub with:

# install.packages("remotes")
remotes::install_github("datasketch/homodatum")

Example

This basic example shows how the package works.

Let's load the homodatum package:

library(homodatum)

New type of values

One of the main features of this package is a set of new variable types that carry more metadata and information than base R types. The hdTypes available in homodatum can be listed with available_hdTypes():

Available hdTypes for variables
id label
___ Null
Uid Uid
Cat Categorical
Bin Binary
Seq Sequential
Num Numeric
Pct Percentage
Dst Distribution
Dat Date
Yea Year
Mon Month
Day Day
Wdy Day of week
Ywe Week in year
Dtm Date time
Hms Time HMS
Min Minutes
Sec Seconds
Hie Hierarchy
Grp Group
Txt Text
Mny Money
Gnm Geo name
Gcd Geo code
Glt Geo latitude
Gln Geo longitude
Img Image
Aud Audio
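
A minimal sketch of listing the types from an R session. It assumes homodatum is installed. The exact shape of the object returned by available_hdTypes() is not shown in this README, so the snippet only stores and inspects the result instead of relying on specific columns.

```r
library(homodatum)

# List the hdTypes known to the package (the ids and labels in the table above)
types <- available_hdTypes()

# Inspect the result without assuming its structure
str(types)
```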

New type of data frame

To offer more detailed information about a data frame, homodatum provides the function fringe(), which takes a data frame and converts it into a more informative object. It adds properties such as a dictionary, value type information, a data frame name and description, and several summary statistics computed from the variables, depending on their type.

Creating a fringe object:

# Create a dataframe
df <- data.frame(name = c("Roberta", "Ruby", "Roberta", "Maria"),
                 age  = c(98, 43, 98, 12))

# Create a fringe object
fr <- fringe(df)

This is how it looks with all the properties added:

str(fr)
#> List of 9
#>  $ data       : tibble[,2] (S3: tbl_df/tbl/data.frame/hd_tbl)
#>   ..$ name: Cat [1:4] Roberta, Ruby, Roberta, Maria
#>    .. ..@ categories  : chr [1:3] "Roberta" "Ruby" "Maria"
#>    .. ..@ n_categories: int 3
#>    .. ..@ stats       :List of 4
#>    .. .. ..$ n_unique: int 3
#>    .. .. ..$ n_na    : int 0
#>    .. .. ..$ pct_na  : num 0
#>    .. .. ..$ summary : tibble [4 × 4] (S3: tbl_df/tbl/data.frame)
#>    .. .. .. ..$ category: chr [1:4] "Maria" "Roberta" "Ruby" NA
#>    .. .. .. ..$ n       : int [1:4] 1 2 1 0
#>    .. .. .. ..$ dist    : num [1:4] 0.25 0.5 0.25 0
#>    .. .. .. ..$ names   : logi [1:4] NA NA NA NA
#>   ..$ age : Num [1:4] 98, 43, 98, 12
#>    .. ..@ stats:List of 5
#>    .. .. ..$ n_unique: int 3
#>    .. .. ..$ n_na    : int 0
#>    .. .. ..$ pct_na  : num 0
#>    .. .. ..$ min     : num 12
#>    .. .. ..$ max     : num 98
#>  $ dic        : tibble [2 × 3] (S3: tbl_df/tbl/data.frame)
#>   ..$ id    : chr [1:2] "name" "age"
#>   ..$ label : chr [1:2] "name" "age"
#>   ..$ hdType: hdType [1:2] Cat, Num
#>  $ frtype     : frType [1:1] Cat-Num
#>    ..@ hdTypes: hdType [1:2] Cat, Num
#>    ..@ group  : chr "Cat-Num"
#>  $ group      : chr "Cat-Num"
#>  $ name       : chr "df"
#>  $ description: chr ""
#>  $ slug       : chr "df"
#>  $ meta       : list()
#>  $ stats      :List of 3
#>   ..$ nrow     : int 4
#>   ..$ ncol     : int 2
#>   ..$ col_stats:List of 2
#>   .. ..$ name:List of 4
#>   .. .. ..$ n_unique: int 3
#>   .. .. ..$ n_na    : int 0
#>   .. .. ..$ pct_na  : num 0
#>   .. .. ..$ summary : tibble [4 × 4] (S3: tbl_df/tbl/data.frame)
#>   .. .. .. ..$ category: chr [1:4] "Maria" "Roberta" "Ruby" NA
#>   .. .. .. ..$ n       : int [1:4] 1 2 1 0
#>   .. .. .. ..$ dist    : num [1:4] 0.25 0.5 0.25 0
#>   .. .. .. ..$ names   : logi [1:4] NA NA NA NA
#>   .. ..$ age :List of 5
#>   .. .. ..$ n_unique: int 3
#>   .. .. ..$ n_na    : int 0
#>   .. .. ..$ pct_na  : num 0
#>   .. .. ..$ min     : num 12
#>   .. .. ..$ max     : num 98
#>  - attr(*, "class")= chr "fringe"
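
To make the per-column statistics in this output easier to read, here is a sketch that recomputes a few of them by hand with base R only. It does not use homodatum and is not how the package computes them internally; it only illustrates what n, dist, n_unique, min and max refer to for this example data.

```r
# Same example data as above
df <- data.frame(name = c("Roberta", "Ruby", "Roberta", "Maria"),
                 age  = c(98, 43, 98, 12))

# Categorical column: count and share of each category
counts <- table(df$name)        # categories sorted alphabetically
shares <- prop.table(counts)    # counts divided by the total number of rows

# Missing values in the categorical column
n_na <- sum(is.na(df$name))

# Numeric column: number of distinct values and range
n_unique_age <- length(unique(df$age))
age_range    <- range(df$age)   # c(min, max)

print(counts)
print(shares)
print(n_na)
print(n_unique_age)
print(age_range)
```

The counts (1, 2, 1) and shares (0.25, 0.5, 0.25) for Maria, Roberta and Ruby match the n and dist columns of the summary above, and the range matches min = 12 and max = 98.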

You can inspect specific attributes of the fringe object, such as:

  • Data:
fr$data
#> # A tibble: 4 × 2
#>   name      age
#>   <Cat>   <Num>
#> 1 Roberta    98
#> 2 Ruby       43
#> 3 Roberta    98
#> 4 Maria      12
  • Dictionary:
fr$dic
#> # A tibble: 2 × 3
#>   id    label hdType  
#>   <chr> <chr> <hdType>
#> 1 name  name  Cat     
#> 2 age   age   Num
  • Summary stats for the fringe object and its variables:
fr$stats
#> $nrow
#> [1] 4
#> 
#> $ncol
#> [1] 2
#> 
#> $col_stats
#> $col_stats$name
#> $col_stats$name$n_unique
#> [1] 3
#> 
#> $col_stats$name$n_na
#> [1] 0
#> 
#> $col_stats$name$pct_na
#> [1] 0
#> 
#> $col_stats$name$summary
#> # A tibble: 4 × 4
#>   category     n  dist names
#>   <chr>    <int> <dbl> <lgl>
#> 1 Maria        1  0.25 NA   
#> 2 Roberta      2  0.5  NA   
#> 3 Ruby         1  0.25 NA   
#> 4 <NA>         0  0    NA   
#> 
#> 
#> $col_stats$age
#> $col_stats$age$n_unique
#> [1] 3
#> 
#> $col_stats$age$n_na
#> [1] 0
#> 
#> $col_stats$age$pct_na
#> [1] 0
#> 
#> $col_stats$age$min
#> [1] 12
#> 
#> $col_stats$age$max
#> [1] 98
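
Because the fringe object is shown above as a list, single values can be pulled out with ordinary `$` indexing. The sketch below assumes the structure printed by str(fr) in this README; the element names are read off that output, and other package versions may differ.

```r
library(homodatum)

# Rebuild the example fringe
df <- data.frame(name = c("Roberta", "Ruby", "Roberta", "Maria"),
                 age  = c(98, 43, 98, 12))
fr <- fringe(df)

# Number of rows and columns in the data
fr$stats$nrow
fr$stats$ncol

# Minimum and maximum of the numeric column
fr$stats$col_stats$age$min
fr$stats$col_stats$age$max

# Per-category counts for the categorical column
fr$stats$col_stats$name$summary

# Column ids and their detected hdTypes from the dictionary
fr$dic$id
fr$dic$hdType
```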

Learn about the many ways to format date values in vignette("set-name").
