Letter-value tables from the command line.
The inspiration for this project came from the paper Letter-value plots: Boxplots for large data.
It seemed like a good idea to have these letter values not just for plotting, but also in tabular form as summary statistics. This is how letter values were originally introduced in Exploratory Data Analysis. For some reason they did not seem to catch on, maybe because the data sets of the time were smaller. Modern petabyte-sized data vindicates letter values.
This package aims to be a user-friendly command-line utility for generating these values.
- Works with tidy data
- Tukey inspired
- Prints plain CSV only, which displays nicely in CloudWatch or other logging tools
- Reads data piped in on stdin and writes to stdout, making it compatible with a wide range of Unix tools
# download data
wget https://raw.githubusercontent.com/tidyverse/ggplot2/master/data-raw/diamonds.csv
# install lvtab
pip3 install lvtab
# ungrouped lv table
cat diamonds.csv | lvtab -y price
results:
tail_area_odds,lower_quantile,upper_quantile,lower_value,upper_value
4,0.25001,0.75001,950.0,5324.74999
8,0.12501,0.87501,694.0,8687.12499
16,0.06251,0.9375,572.0,12150.0625
32,0.03127,0.96875,497.0,14928.18749
64,0.01564,0.98438,449.0,16709.0
128,0.00782,0.9922,420.99218,17710.01561
256,0.00392,0.9961,394.0,18234.01561
512,0.00197,0.99805,376.0,18489.00779
1024,0.00099,0.99903,364.49901,18668.51071
2048,0.0005,0.99952,356.999,18741.0
- tail_area_odds - the odds are 1 in tail_area_odds that a value is lower than the lower_value, and likewise 1 in tail_area_odds that it is higher than the upper_value. For example, if tail_area_odds is 4, then each tail area is ~0.25, or 1/4. This is saying that we cut the data into 4 parts, where a quarter of it is less than the lower value and a quarter is higher than the upper value, i.e. 1/4 on either end.
- *_quantile - the quantile at which the cut is made
- *_value - the value of the data at the given quantile
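The relationship between the columns can be sketched in a few lines of Python. This is only an illustration, not lvtab's implementation: the helper names `quantile` and `letter_value_rows` are hypothetical, and the sketch assumes plain linear interpolation between order statistics, which may differ slightly from the quantile rule lvtab uses (the quantiles in the output above are not exactly 1/4, 1/8, and so on).

```python
import math


def quantile(sorted_values, q):
    """Linearly interpolated quantile of an already sorted list (0 <= q <= 1)."""
    position = q * (len(sorted_values) - 1)
    low = math.floor(position)
    high = math.ceil(position)
    fraction = position - low
    return sorted_values[low] * (1 - fraction) + sorted_values[high] * fraction


def letter_value_rows(values, max_odds=16):
    """Return rows of (tail_area_odds, lower_quantile, upper_quantile,
    lower_value, upper_value), doubling the odds at every step."""
    ordered = sorted(values)
    rows = []
    odds = 4
    while odds <= max_odds:
        lower_q = 1 / odds   # tail area on the low end
        upper_q = 1 - lower_q  # mirror image on the high end
        rows.append(
            (odds, lower_q, upper_q,
             quantile(ordered, lower_q), quantile(ordered, upper_q))
        )
        odds *= 2
    return rows


# Toy data: the integers 1..101
for row in letter_value_rows(list(range(1, 102))):
    print(row)
```

Each step halves the tail area, so every row looks twice as far into the tails as the one before it. That is why the table stays short even for very large inputs.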
See all options with lvtab --help
I have been working on tv to pretty-print CSV files on the command line. I use lvtab with tv.
cat diamonds.csv | lvtab -y price | tv
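Because the output is plain CSV, it can also be consumed by a script instead of a viewer. The sketch below parses the first two rows of the diamonds output shown earlier using only the Python standard library. The helper name `read_lv_table` is hypothetical, and in practice the text would come from the piped output rather than from a hard-coded string.

```python
import csv
import io

# First two rows of the lvtab output for the diamonds price column
sample = """tail_area_odds,lower_quantile,upper_quantile,lower_value,upper_value
4,0.25001,0.75001,950.0,5324.74999
8,0.12501,0.87501,694.0,8687.12499
"""


def read_lv_table(text):
    """Parse lvtab-style CSV text into a list of dicts with float values."""
    reader = csv.DictReader(io.StringIO(text))
    return [{name: float(value) for name, value in row.items()} for row in reader]


rows = read_lv_table(sample)
for row in rows:
    # Width of the interval between the lower and upper cut
    spread = row["upper_value"] - row["lower_value"]
    print(int(row["tail_area_odds"]), round(spread, 5))
```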