Commit bd0c870 (1 parent: aec1ca4)
Showing 19 changed files with 341 additions and 36 deletions.

@@ -1,22 +1,27 @@
 Package: binsmooth
 Type: Package
 Title: Generate PDFs and CDFs from Binned Data
-Version: 0.1.0
+Version: 0.2.0
 Author: David J. Hunter and McKalie Drown
 Maintainer: Dave Hunter <dhunter@westmont.edu>
 Description: Provides several methods for generating density functions
-based on binned data. Data are assumed to be nonnegative, but the bin widths
-need not be uniform, and the top bin may be unbounded. All PDF smoothing methods
-maintain the areas specified by the binned data. (Equivalently, all CDF
-smoothing methods interpolate the points specified by the binned data.) An
-estimate for the mean of the distribution may be supplied as an optional
-argument, which greatly improves the reliability of statistics computed from
-the smoothed density functions. Methods include step function, recursive
-subdivision, and optimized spline.
+based on binned data. Methods include step function, recursive
+subdivision, and optimized spline. Data are assumed to be nonnegative,
+but the bin widths need not be equal, and the top bin need not have an
+upper bound. All PDF smoothing methods maintain the areas specified by
+the binned data. (Equivalently, all CDF smoothing methods interpolate
+the points specified by the binned data.) An estimate for the mean of
+the distribution may be supplied as an optional argument, which greatly
+improves the reliability of statistics computed from the smoothed density
+functions. Includes methods for estimating the Gini coefficient, the
+Theil index, percentiles, and random deviates from a smoothed
+distribution. Among the three methods, the optimized spline (splinebins)
+is recommended for most purposes. The percentile and random-draw
+functions only support splinebins.
 License: MIT + file LICENSE
 Imports: stats, pracma, ineq, triangle
 LazyData: TRUE
 NeedsCompilation: no
-Packaged: 2016-08-12 14:09:50 UTC; dhunter
+Packaged: 2019-05-31 16:25:17 UTC; dhunter
 Repository: CRAN
-Date/Publication: 2016-08-12 16:46:49
+Date/Publication: 2019-05-31 22:11:49 UTC
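
The Description field says every PDF smoothing method preserves the bin areas, or equivalently that every CDF passes through the binned points. A minimal base-R sketch of that property for the simplest case, a step-function PDF, assuming contiguous bins that all have finite edges (the package's stepbins() also handles an unbounded top bin, which this sketch does not). `step_pdf_sketch` is a hypothetical name, not a binsmooth function:

```r
# Minimal sketch, NOT binsmooth::stepbins(): an area-preserving step PDF.
# Assumption: bins are contiguous (lower[i + 1] == upper[i]) and every bin,
# including the top one, has a finite upper edge.
step_pdf_sketch <- function(lower, upper, counts) {
  edges <- c(lower[1], upper)
  props <- counts / sum(counts)        # share of the data in each bin
  heights <- props / (upper - lower)   # height * width == share, so areas match
  # Step PDF, zero outside the binned range
  pdf <- stats::stepfun(edges, c(0, heights, 0))
  # A step PDF integrates to a piecewise-linear CDF that passes through
  # the cumulative shares at the bin edges
  cdf <- stats::approxfun(edges, c(0, cumsum(props)), rule = 2)
  list(PDF = pdf, CDF = cdf, upper = max(edges))
}

fit <- step_pdf_sketch(lower = c(0, 10, 20), upper = c(10, 20, 50),
                       counts = c(5, 3, 2))
fit$CDF(c(10, 20, 50))                   # cumulative shares: 0.5, 0.8, 1.0
stats::integrate(fit$PDF, 0, 10)$value   # area of the first bin, about 0.5
```

The returned list mirrors the PDF, CDF, upper-bound layout that the new gini(), theil() and stats_from_distribution() functions index into.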

@@ -1,15 +1,26 @@
-29cbf0aaefa92ed1f2b8339fa9724e0e *DESCRIPTION
+0555bf8a3c2237d7c21ef858daa8c14d *DESCRIPTION
 11fcc18229d1926590c9d08f219c5132 *LICENSE
 1e32880d420021b43570b02ebd8ee747 *NAMESPACE
+697bb12c10759a0ff85806e074114758 *NEWS
+0b08e66cda3f9eab71fa02616c2ae78a *R/gini.R
 718b18f72622950c7b2295e16059284c *R/rsubbins.R
-0b0f11381b6ae0594fec5136eeecb050 *R/simcounty.R
-98fd59c90b830afaf4e0d9c7b17aa00d *R/splinebins.R
+d83f9955b6554f52406efffe1ff50d98 *R/sb_percentiles.R
+d75b32130119bef303a08018516b08de *R/sb_sample.R
+bc0b94f27b3c257f31e3cd997eb14089 *R/simcounty.R
+f72652d10bcb9011168986090a14c06b *R/splinebins.R
+93e8b451cc3e732d3e003fac2adfe38b *R/stats_from_distribution.R
 623bfeaff9a308954638e90d80276b18 *R/stepbins.R
+6644434bb4789dad83ffb90467f932fd *R/theil.R
 5237fa31e1d511ca7ead04d6f771f08c *data/county_bins.rda
 47e7d5dc78ca0d9cbd9c8d105b2b2090 *data/county_true.rda
 d668f119a683e0b3b0ab2a3da9517394 *man/county_bins.Rd
 84a042f94329da7f1240919fd887bb33 *man/county_true.Rd
-143c3aea13378d084527eb3398a3bce2 *man/rsubbins.Rd
-99495f9813d6057a0ac352e8ab92187d *man/simcounty.Rd
-bd92a9fcc194aaaf34d52dba32bd68b5 *man/splinebins.Rd
-c66af3ca0a5304d0fdea1a54229ca5da *man/stepbins.Rd
+df3c9823236c02b29a842405c923a8e1 *man/gini.Rd
+fe2bd370b9ea7ef94dea637644727966 *man/rsubbins.Rd
+5cca6bc445c879f14bfda203f286f326 *man/sb_percentiles.Rd
+a9f32c0206986b937d7dae2e95012267 *man/sb_sample.Rd
+da111c9bd47b60725cb440b6995188fe *man/simcounty.Rd
+3f8447b7e5c73e431bdf5721ea32eba8 *man/splinebins.Rd
+00d834fa433ec1b0ee440c0a685f50f6 *man/stats_from_distribution.Rd
+b9f8d416f41acaa1d2857b48f6dc510d *man/stepbins.Rd
+de732e89ee60ab0d1f12c24766220b8c *man/theil.Rd
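
Each line of this checksum file pairs an MD5 hash with a path in the form `<md5> *<path>`. A minimal sketch of how such a file could be checked in base R; `verify_md5_file` is a hypothetical helper (R's tools package also ships checkMD5sums() for this purpose):

```r
# Hypothetical helper: compare each "<md5> *<path>" entry with the file on disk
verify_md5_file <- function(md5_path, pkg_dir = dirname(md5_path)) {
  entries <- readLines(md5_path)
  expected <- sub(" \\*.*$", "", entries)        # checksum before " *"
  paths <- sub("^[0-9a-f]+ \\*", "", entries)     # relative path after " *"
  actual <- unname(tools::md5sum(file.path(pkg_dir, paths)))
  stats::setNames(actual == expected, paths)      # TRUE where the file matches
}

# Tiny self-contained demo in a temporary directory
pkg <- tempfile("pkg")
dir.create(pkg)
writeLines("Package: demo", file.path(pkg, "DESCRIPTION"))
sum_desc <- unname(tools::md5sum(file.path(pkg, "DESCRIPTION")))
writeLines(paste0(sum_desc, " *DESCRIPTION"), file.path(pkg, "MD5"))
verify_md5_file(file.path(pkg, "MD5"))   # DESCRIPTION: TRUE
```

A file listed in the checksum file but missing on disk yields NA rather than FALSE, because tools::md5sum() returns NA for it.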

@@ -0,0 +1,16 @@
Changes in Version 0.2.0
========================

New features:

* Added functions to compute the Gini and Theil coefficients from the smoothed distributions, along with other descriptive statistics.
* Added Theil index to simulated county_true data.
* Added inverse CDF to the list that splinebins returns.
* Added functions for computing percentiles and random samples from a splinebins fit.
* Added NEWS file.

Updates:

* Updated references to the paper in Sociological Science: https://www.sociologicalscience.com/articles-v4-26-641/
* Updated documentation.
* Fixed typo in bincounts for Cook County in documentation.
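
The NEWS entry says splinebins now returns an inverse CDF, but this diff does not show how it is computed. One common way to obtain an approximate inverse, offered here as an assumption rather than a description of splinebins, is to tabulate the CDF on a fine grid and interpolate with the axes swapped. `approx_inv_cdf` is a hypothetical name:

```r
# Hypothetical helper, not the splinebins code: approximate inverse CDF
# obtained by tabulating the CDF on a grid and interpolating with axes swapped
approx_inv_cdf <- function(CDF, upper, n_grid = 2001) {
  x <- seq(0, upper, length.out = n_grid)   # grid over the assumed support [0, upper]
  u <- CDF(x)                               # CDF values, assumed nondecreasing
  # Interpolate x as a function of u; ties = min takes the left-most x where
  # the CDF is flat, and rule = 2 clamps u outside the tabulated range
  stats::approxfun(u, x, ties = min, rule = 2)
}

# F(x) = (x / 100)^2 on [0, 100]; its exact inverse is 100 * sqrt(u)
inv <- approx_inv_cdf(function(x) (x / 100)^2, upper = 100)
inv(c(0.25, 0.5, 0.81))   # close to 50, 70.71 and 90
```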

@@ -0,0 +1,6 @@
gini <- function(binFit) {
  CDF <- binFit[[2]]
  E <- binFit[[3]]
  cdf_mean <- E - pracma::integral(CDF, 0, E)
  return(1-pracma::integral(function(x){(1-CDF(x))^2}, 0, E)/cdf_mean)
}
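
The same Gini formula can be checked without binsmooth by substituting base R's stats::integrate for pracma::integral (a swap, not the package code). The sketch assumes the list layout that gini() indexes: PDF first, CDF second, upper bound of the support third. `gini_sketch` is a hypothetical name. For the density f(x) = 2x on [0, 1], the CDF is x^2, the mean is 2/3, and G = 1 - (8/15)/(2/3) = 1/5:

```r
# Base-R sketch of the Gini formula used by gini(); stats::integrate replaces
# pracma::integral. Assumed list layout: PDF, CDF, upper bound of the support.
gini_sketch <- function(binFit) {
  CDF <- binFit[[2]]
  E <- binFit[[3]]
  mu <- E - stats::integrate(CDF, 0, E)$value   # mean = integral of 1 - F
  1 - stats::integrate(function(x) (1 - CDF(x))^2, 0, E)$value / mu
}

# Density f(x) = 2x on [0, 1], so F(x) = x^2; the exact Gini is 1/5
tri_fit <- list(function(x) 2 * x, function(x) x^2, 1)
gini_sketch(tri_fit)   # 0.2, up to numerical error
```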

@@ -0,0 +1,6 @@
sb_percentiles <- function(splinebinFit, p = seq(0,100,25)) {
  iCDF <- splinebinFit$splineInvCDF
  percentiles <- iCDF(p/100)
  names(percentiles) <- paste0(p, "%")
  return(percentiles)
}
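
sb_percentiles() reads only the splineInvCDF element of its argument, so the shape of its output can be seen with a hypothetical stand-in list whose inverse CDF is exact, here for F(x) = (x/100)^2 on [0, 100]. The function body is copied from the diff so the sketch runs without binsmooth installed; `mock_fit` is not a real splinebins fit:

```r
# Same body as sb_percentiles() in this commit, repeated so the sketch is standalone
sb_percentiles <- function(splinebinFit, p = seq(0, 100, 25)) {
  iCDF <- splinebinFit$splineInvCDF
  percentiles <- iCDF(p / 100)
  names(percentiles) <- paste0(p, "%")
  percentiles
}

# Hypothetical stand-in for a splinebins fit: only splineInvCDF is supplied,
# set to the exact inverse of F(x) = (x / 100)^2 on [0, 100]
mock_fit <- list(splineInvCDF = function(u) 100 * sqrt(u))
sb_percentiles(mock_fit)
# named "0%", "25%", "50%", "75%", "100%"; values 0, 50, about 70.71, about 86.60, 100
```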

@@ -0,0 +1,4 @@
sb_sample <- function(splinebinFit, n = 1) {
  iCDF <- splinebinFit$splineInvCDF
  return(iCDF(stats::runif(n)))
}
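
sb_sample() is inverse-transform sampling: uniform draws on (0, 1) pushed through an inverse CDF follow the distribution with that CDF. A quick base-R check of the idea, using an exact, assumed inverse for F(x) = (x/100)^2 on [0, 100] in place of a splinebins fit (its true mean is 200/3):

```r
set.seed(42)
inv_cdf <- function(u) 100 * sqrt(u)   # assumed exact inverse of F(x) = (x / 100)^2
draws <- inv_cdf(stats::runif(1e5))    # same construction as sb_sample()
mean(draws)                            # close to the true mean, 200 / 3
mean(draws <= 50)                      # close to F(50) = 0.25
```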

@@ -0,0 +1,12 @@
stats_from_distribution <- function(binFit) {
  PDF <- binFit[[1]]
  CDF <- binFit[[2]]
  E <- binFit[[3]]
  cdf_mean <- E - pracma::integral(CDF, 0, E)
  v <- pracma::integral(function(x){2*x-2*x*CDF(x)}, 0 ,E) - cdf_mean^2
  g <- 1-pracma::integral(function(x){(1-CDF(x))^2}, 0, E)/cdf_mean
  t <- pracma::integral(function(x){PDF(x)*x/cdf_mean*log(x/cdf_mean)}, 0, E)
  statistics <- c(cdf_mean, v, sqrt(v), g, t)
  names(statistics) <- c("mean", "variance", "SD", "Gini", "Theil")
  return(statistics)
}
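
stats_from_distribution() relies on three identities for a nonnegative variable supported on [0, E]: the mean is the integral of 1 - F(x), E[X^2] is the integral of 2x(1 - F(x)), and the Theil index is the integral of f(x)(x/mu)log(x/mu). A base-R sketch of the same computations, with stats::integrate in place of pracma::integral (`stats_sketch` is a hypothetical name), checked against the uniform distribution on [0, 1], where the mean is 1/2, the variance 1/12, the Gini coefficient 1/3 and the Theil index log(2) - 1/2:

```r
# Base-R sketch of the five statistics; stats::integrate replaces pracma::integral.
# Assumed list layout, as in the function above: PDF, CDF, upper bound of support.
stats_sketch <- function(binFit) {
  PDF <- binFit[[1]]
  CDF <- binFit[[2]]
  E <- binFit[[3]]
  integ <- function(f) stats::integrate(f, 0, E)$value
  mu <- E - integ(CDF)                                        # mean = integral of 1 - F
  v <- integ(function(x) 2 * x * (1 - CDF(x))) - mu^2         # E[X^2] - mean^2
  g <- 1 - integ(function(x) (1 - CDF(x))^2) / mu             # Gini
  theil_idx <- integ(function(x) PDF(x) * x / mu * log(x / mu))  # Theil
  c(mean = mu, variance = v, SD = sqrt(v), Gini = g, Theil = theil_idx)
}

# Uniform distribution on [0, 1]; the PDF is vectorised so integrate() accepts it
unif_fit <- list(function(x) rep(1, length(x)), function(x) x, 1)
round(stats_sketch(unif_fit), 4)
```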

@@ -0,0 +1,7 @@
theil <- function(binFit) {
  PDF <- binFit[[1]]
  CDF <- binFit[[2]]
  E <- binFit[[3]]
  cdf_mean <- E - pracma::integral(CDF, 0, E)
  return(pracma::integral(function(x){PDF(x)*x/cdf_mean*log(x/cdf_mean)}, 0, E))
}
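
Because theil() depends on x only through x/mu, the index is unchanged when every value is multiplied by the same constant (for example, when incomes are restated in cents). A base-R check on uniform distributions over [0, E] for several E; `theil_sketch` and `unif_on` are hypothetical helpers, and stats::integrate stands in for pracma::integral:

```r
# Hypothetical helpers; stats::integrate stands in for pracma::integral
theil_sketch <- function(binFit) {
  PDF <- binFit[[1]]
  CDF <- binFit[[2]]
  E <- binFit[[3]]
  mu <- E - stats::integrate(CDF, 0, E)$value
  stats::integrate(function(x) PDF(x) * x / mu * log(x / mu), 0, E)$value
}

# Uniform distribution on [0, E], in the same (PDF, CDF, upper bound) layout
unif_on <- function(E) {
  list(function(x) rep(1 / E, length(x)), function(x) x / E, E)
}

# Same index at every scale: each value is close to log(2) - 1/2, about 0.193
sapply(c(1, 1e3, 1e5), function(E) theil_sketch(unif_on(E)))
```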

@@ -0,0 +1,43 @@
\name{gini}
\alias{gini}
\title{
Estimate the Gini coefficient
}
\description{
Estimates the Gini coefficient from a smoothed distribution.
}
\usage{
gini(binFit)
}
\arguments{
\item{binFit}{
A list as returned by \code{\link{splinebins}}, \code{\link{stepbins}}, or \code{\link{rsubbins}}. (Alternatively, a list containing a PDF of non-negative support, its CDF, and an upper bound for the support of the PDF.)
}
}
\details{
For distributions of non-negative support, the Gini coefficient can be computed from a cumulative distribution function \eqn{F(x)} by the integral
\deqn{G = 1 - \frac{1}{\mu}\int_0^\infty (1-F(x))^2 \, dx}
where \eqn{\mu} is the mean of the distribution.
}
\value{
Returns the Gini coefficient \eqn{G}.
}
\references{
Paul T. von Hippel, David J. Hunter, McKalie Drown. \emph{Better Estimates from Binned Income Data: Interpolated CDFs and Mean-Matching}, Sociological Science, November 15, 2017. \url{https://www.sociologicalscience.com/articles-v4-26-641/}
}
\author{
David J. Hunter and McKalie Drown
}

\examples{
# 2005 ACS data from Cook County, Illinois
binedges <- c(10000,15000,20000,25000,30000,35000,40000,45000,
50000,60000,75000,100000,125000,150000,200000,NA)
bincounts <- c(157532,97369,102673,100888,90835,94191,87688,90481,
79816,153581,195430,240948,155139,94527,92166,103217)
stepfit <- stepbins(binedges, bincounts, 76091)
splinefit <- splinebins(binedges, bincounts, 76091)
gini(stepfit)
gini(splinefit) # More accurate
}
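
As a worked instance of the formula in the details section, take the uniform distribution on [0, E]. Since F(x) = 1 for x ≥ E, the integral to infinity reduces to an integral over [0, E]:

```latex
\begin{aligned}
F(x) &= \frac{x}{E}, \qquad 0 \le x \le E, \\
\mu &= \int_0^E \left(1 - \frac{x}{E}\right) dx = \frac{E}{2}, \\
\int_0^E (1 - F(x))^2 \, dx &= \int_0^E \left(1 - \frac{x}{E}\right)^2 dx = \frac{E}{3}, \\
G &= 1 - \frac{1}{\mu} \cdot \frac{E}{3} = 1 - \frac{E/3}{E/2} = \frac{1}{3}.
\end{aligned}
```

The result does not depend on E, consistent with the Gini coefficient being unchanged when all values are rescaled.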

@@ -0,0 +1,43 @@
\name{sb_percentiles}
\alias{sb_percentiles}
\title{
Estimate percentiles from splinebins
}
\description{
Estimates percentiles of a smoothed distribution obtained using \code{\link{splinebins}}.
}
\usage{
sb_percentiles(splinebinFit, p = seq(0,100,25))
}
\arguments{
\item{splinebinFit}{
A list as returned by \code{\link{splinebins}}.
}
\item{p}{
A vector of percentages in the range \eqn{0 \le p \le 100}.
}
}
\details{
The approximate inverse of the CDF calculated by \code{\link{splinebins}} is used to approximate percentiles of the smoothed distribution.
}
\value{
A vector of percentiles.
}
\references{
Paul T. von Hippel, David J. Hunter, McKalie Drown. \emph{Better Estimates from Binned Income Data: Interpolated CDFs and Mean-Matching}, Sociological Science, November 15, 2017. \url{https://www.sociologicalscience.com/articles-v4-26-641/}
}
\author{
David J. Hunter and McKalie Drown
}

\examples{
# 2005 ACS data from Cook County, Illinois
binedges <- c(10000,15000,20000,25000,30000,35000,40000,45000,
50000,60000,75000,100000,125000,150000,200000,NA)
bincounts <- c(157532,97369,102673,100888,90835,94191,87688,90481,
79816,153581,195430,240948,155139,94527,92166,103217)
splinefit <- splinebins(binedges, bincounts, 76091)
sb_percentiles(splinefit)
sb_percentiles(splinefit, c(27, 32, 93))
}
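
One way to cross-check values returned by sb_percentiles() is to solve F(x) = p/100 directly with stats::uniroot. Root finding is a different technique from the inverse-CDF lookup described in the details section; `percentile_by_root` is a hypothetical helper, and it assumes the support is [0, upper] with a continuous, increasing CDF (here an exact, assumed CDF rather than a splinebins fit):

```r
# Alternative technique (root finding), not the inverse-CDF lookup used by
# sb_percentiles(): solve CDF(x) = p / 100 on the assumed support [0, upper]
percentile_by_root <- function(CDF, upper, p) {
  sapply(p / 100, function(u) {
    if (u <= 0) return(0)        # assumes the support starts at 0
    if (u >= 1) return(upper)
    stats::uniroot(function(x) CDF(x) - u, c(0, upper), tol = 1e-10)$root
  })
}

cdf_sq <- function(x) (x / 100)^2   # assumed CDF on [0, 100]
percentile_by_root(cdf_sq, 100, c(27, 32, 93))
# exact values for comparison: 100 * sqrt(c(0.27, 0.32, 0.93))
```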

@@ -0,0 +1,43 @@
\name{sb_sample}
\alias{sb_sample}
\title{
Random sample from splinebins distribution
}
\description{
Draw a random sample of points from a smoothed distribution obtained using \code{\link{splinebins}}.
}
\usage{
sb_sample(splinebinFit, n = 1)
}
\arguments{
\item{splinebinFit}{
A list as returned by \code{\link{splinebins}}.
}
\item{n}{
A positive integer giving the sample size.
}
}
\details{
The approximate inverse of the CDF calculated by \code{\link{splinebins}} is used to generate random values of the smoothed distribution.
}
\value{
A vector of random deviates.
}
\references{
Paul T. von Hippel, David J. Hunter, McKalie Drown. \emph{Better Estimates from Binned Income Data: Interpolated CDFs and Mean-Matching}, Sociological Science, November 15, 2017. \url{https://www.sociologicalscience.com/articles-v4-26-641/}
}
\author{
David J. Hunter and McKalie Drown
}

\examples{
# 2005 ACS data from Cook County, Illinois
binedges <- c(10000,15000,20000,25000,30000,35000,40000,45000,
50000,60000,75000,100000,125000,150000,200000,NA)
bincounts <- c(157532,97369,102673,100888,90835,94191,87688,90481,
79816,153581,195430,240948,155139,94527,92166,103217)
splinefit <- splinebins(binedges, bincounts, 76091)
sb_sample(splinefit, 5)
hist(sb_sample(splinefit, 3000))
}
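
The example above inspects a sample visually with hist(). A more formal check that draws follow the intended distribution is the one-sample Kolmogorov-Smirnov test, stats::ks.test, which accepts a CDF function as its second argument. The sketch uses an exact, assumed CDF and inverse for F(x) = (x/100)^2 on [0, 100] rather than a splinebins fit:

```r
set.seed(1)
cdf_sq <- function(x) pmin(pmax(x / 100, 0), 1)^2   # assumed CDF: (x / 100)^2 on [0, 100]
inv_sq <- function(u) 100 * sqrt(u)                 # its exact inverse
draws <- inv_sq(stats::runif(3000))                 # inverse-transform sample
ks <- stats::ks.test(draws, cdf_sq)
ks$statistic   # largest gap between the empirical CDF and cdf_sq
ks$p.value     # a very small value would signal a mismatch
```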