homodatum helps you manage data frames in a more human way. The package adds metadata to data frames by creating new classes for variables (hdTypes) and data frames (hdFringe), giving them more descriptive properties.
Install the development version of homodatum from GitHub with:
# install.packages("remotes")
remotes::install_github("datasketch/homodatum")
This is a basic example which shows you how this package works.
Let's load the homodatum package:
library(homodatum)
One of the main features of this package is a set of new variable types that carry more metadata and information.
The available variable types can be listed with available_hdTypes():
id | label |
---|---|
\_\_\_ | Null |
Uid | Uid |
Cat | Categorical |
Bin | Binary |
Seq | Sequential |
Num | Numeric |
Pct | Percentage |
Dst | Distribution |
Dat | Date |
Yea | Year |
Mon | Month |
Day | Day |
Wdy | Day of week |
Ywe | Week in year |
Dtm | Date time |
Hms | Time HMS |
Min | Minutes |
Sec | Seconds |
Hie | Hierarchy |
Grp | Group |
Txt | Text |
Mny | Money |
Gnm | Geo name |
Gcd | Geo code |
Glt | Geo latitude |
Gln | Geo longitude |
Img | Image |
Aud | Audio |
To give more detailed information about a data frame, homodatum offers the function fringe(), which takes a data frame and converts it into a more informative object. It adds properties such as a dictionary, value type information, a data frame name and description, and several summary calculations for the variables, depending on their type.
# Create a dataframe
df <- data.frame(name = c("Roberta", "Ruby", "Roberta", "Maria"),
                 age = c(98, 43, 98, 12))
# Create a fringe object
fr <- fringe(df)
This is how it looks with all the properties added:
str(fr)
#> List of 9
#> $ data : tibble[,2] (S3: tbl_df/tbl/data.frame/hd_tbl)
#> ..$ name: Cat [1:4] Roberta, Ruby, Roberta, Maria
#> .. ..@ categories : chr [1:3] "Roberta" "Ruby" "Maria"
#> .. ..@ n_categories: int 3
#> .. ..@ stats :List of 4
#> .. .. ..$ n_unique: int 3
#> .. .. ..$ n_na : int 0
#> .. .. ..$ pct_na : num 0
#> .. .. ..$ summary : tibble [4 × 4] (S3: tbl_df/tbl/data.frame)
#> .. .. .. ..$ category: chr [1:4] "Maria" "Roberta" "Ruby" NA
#> .. .. .. ..$ n : int [1:4] 1 2 1 0
#> .. .. .. ..$ dist : num [1:4] 0.25 0.5 0.25 0
#> .. .. .. ..$ names : logi [1:4] NA NA NA NA
#> ..$ age : Num [1:4] 98, 43, 98, 12
#> .. ..@ stats:List of 5
#> .. .. ..$ n_unique: int 3
#> .. .. ..$ n_na : int 0
#> .. .. ..$ pct_na : num 0
#> .. .. ..$ min : num 12
#> .. .. ..$ max : num 98
#> $ dic : tibble [2 × 3] (S3: tbl_df/tbl/data.frame)
#> ..$ id : chr [1:2] "name" "age"
#> ..$ label : chr [1:2] "name" "age"
#> ..$ hdType: hdType [1:2] Cat, Num
#> $ frtype : frType [1:1] Cat-Num
#> ..@ hdTypes: hdType [1:2] Cat, Num
#> ..@ group : chr "Cat-Num"
#> $ group : chr "Cat-Num"
#> $ name : chr "df"
#> $ description: chr ""
#> $ slug : chr "df"
#> $ meta : list()
#> $ stats :List of 3
#> ..$ nrow : int 4
#> ..$ ncol : int 2
#> ..$ col_stats:List of 2
#> .. ..$ name:List of 4
#> .. .. ..$ n_unique: int 3
#> .. .. ..$ n_na : int 0
#> .. .. ..$ pct_na : num 0
#> .. .. ..$ summary : tibble [4 × 4] (S3: tbl_df/tbl/data.frame)
#> .. .. .. ..$ category: chr [1:4] "Maria" "Roberta" "Ruby" NA
#> .. .. .. ..$ n : int [1:4] 1 2 1 0
#> .. .. .. ..$ dist : num [1:4] 0.25 0.5 0.25 0
#> .. .. .. ..$ names : logi [1:4] NA NA NA NA
#> .. ..$ age :List of 5
#> .. .. ..$ n_unique: int 3
#> .. .. ..$ n_na : int 0
#> .. .. ..$ pct_na : num 0
#> .. .. ..$ min : num 12
#> .. .. ..$ max : num 98
#> - attr(*, "class")= chr "fringe"
- Data:
fr$data
#> # A tibble: 4 × 2
#> name age
#> <Cat> <Num>
#> 1 Roberta 98
#> 2 Ruby 43
#> 3 Roberta 98
#> 4 Maria 12
- Dictionary:
fr$dic
#> # A tibble: 2 × 3
#> id label hdType
#> <chr> <chr> <hdType>
#> 1 name name Cat
#> 2 age age Num
- Summary stats for the fringe object and its variables:
fr$stats
#> $nrow
#> [1] 4
#>
#> $ncol
#> [1] 2
#>
#> $col_stats
#> $col_stats$name
#> $col_stats$name$n_unique
#> [1] 3
#>
#> $col_stats$name$n_na
#> [1] 0
#>
#> $col_stats$name$pct_na
#> [1] 0
#>
#> $col_stats$name$summary
#> # A tibble: 4 × 4
#> category n dist names
#> <chr> <int> <dbl> <lgl>
#> 1 Maria 1 0.25 NA
#> 2 Roberta 2 0.5 NA
#> 3 Ruby 1 0.25 NA
#> 4 <NA> 0 0 NA
#>
#>
#> $col_stats$age
#> $col_stats$age$n_unique
#> [1] 3
#>
#> $col_stats$age$n_na
#> [1] 0
#>
#> $col_stats$age$pct_na
#> [1] 0
#>
#> $col_stats$age$min
#> [1] 12
#>
#> $col_stats$age$max
#> [1] 98
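To make the per-column statistics above easier to read, here is a minimal base R sketch that recomputes them by hand for the same toy data frame. It is not how homodatum computes them internally; the names cat_summary and num_stats are hypothetical, and treating pct_na as the fraction of missing values is an assumption (with no missing values it is 0 either way). The extra NA row that the fringe summary includes is left out.

```r
# The same toy data frame used in the example above.
df <- data.frame(name = c("Roberta", "Ruby", "Roberta", "Maria"),
                 age = c(98, 43, 98, 12))

# Categorical column: count each category and its share of all rows.
# table() orders the categories alphabetically.
counts <- table(df$name)
cat_summary <- data.frame(
  category = names(counts),
  n = as.integer(counts),
  dist = as.numeric(counts) / nrow(df),
  stringsAsFactors = FALSE
)

# Numeric column: the statistics listed under col_stats$age.
num_stats <- list(
  n_unique = length(unique(df$age)),
  n_na = sum(is.na(df$age)),
  pct_na = mean(is.na(df$age)),  # assumption: fraction of missing values
  min = min(df$age),
  max = max(df$age)
)

print(cat_summary)
str(num_stats)
```

These values should line up with the fr$stats$col_stats$name$summary table and the fr$stats$col_stats$age entries printed above.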
Learn more about the many ways to format date values in vignette("set-name").