SmartEDA

Authors: Dayanand Ubrangala, Kiran R, Ravi Prasad Kondapalli and Sayan Putatunda


Background

In any sound statistical analysis, the first step should be exploratory. Exploratory data analysis begins with univariate analysis, which examines the variables one at a time. Bivariate analysis comes next, followed by multivariate analysis. The SmartEDA package produces a complete exploratory data analysis from a few function calls, instead of requiring lengthy R code.


Functionalities of SmartEDA

The SmartEDA R package provides four main functionalities:

  • Descriptive statistics
  • Data visualisation
  • Custom table
  • HTML EDA report


Journal of Open Source Software Article

An article describing the SmartEDA package and its approach to exploratory data analysis has been published on arXiv and is currently under review at The Journal of Open Source Software. Please cite the paper if you use SmartEDA in your work!


Installation

The package can be installed directly from CRAN.

install.packages("SmartEDA")

To contribute, download the latest development version of SmartEDA from GitHub via devtools:

install.packages("devtools")
devtools::install_github("daya6489/SmartEDA",ref = "develop")

Example

Data

In this vignette, we will be using a simulated data set containing sales of child car seats at 400 different stores.

Data source: the ISLR package.

Install the "ISLR" package to get the example data set.

	install.packages("ISLR")
	library("ISLR")
	install.packages("SmartEDA")
	library("SmartEDA")
	## Load the sample data set from the ISLR package
	Carseats <- ISLR::Carseats

Overview of the data

Understand the dimensions of the data set, the variable names, the overall missing-value summary and the data type of each variable.

## Overview of the data
	ExpData(data=Carseats,type=1)
## Structure of the data
	ExpData(data=Carseats,type=2)
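
For readers who want to see what such an overview involves, here is a minimal base-R sketch. It is not SmartEDA's implementation: the helper name `overview` is hypothetical, and the built-in `airquality` data set is used only so that the example runs without extra packages.

```r
# Base-R sketch of a data overview: type, missing values and distinct
# values per column. `overview` is a hypothetical helper, not part of SmartEDA.
overview <- function(df) {
  data.frame(
    variable    = names(df),
    type        = vapply(df, function(x) class(x)[1], character(1), USE.NAMES = FALSE),
    n_missing   = vapply(df, function(x) sum(is.na(x)), integer(1), USE.NAMES = FALSE),
    pct_missing = round(100 * vapply(df, function(x) mean(is.na(x)), numeric(1),
                                     USE.NAMES = FALSE), 2),
    n_distinct  = vapply(df, function(x) length(unique(x[!is.na(x)])), integer(1),
                         USE.NAMES = FALSE),
    stringsAsFactors = FALSE
  )
}

dim(airquality)              # number of rows and columns
ov <- overview(airquality)   # one row per variable
print(ov)
```

The result has one row per column of the input, which makes it easy to spot variables with many missing values before running the heavier summaries.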

Summary of numerical variables

To summarise the numeric variables, you can use the following R code from this package:

## Summary statistics by – overall
	ExpNumStat(Carseats,by="A",gp=NULL,Qnt=seq(0,1,0.1),MesofShape=2,Outlier=TRUE,round=2)
## Summary statistics by – overall with correlation	
	ExpNumStat(Carseats,by="A",gp="Price",Qnt=seq(0,1,0.1),MesofShape=1,Outlier=TRUE,round=2)
## Summary statistics by – category
	ExpNumStat(Carseats,by="GA",gp="Urban",Qnt=seq(0,1,0.1),MesofShape=2,Outlier=TRUE,round=2)
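
As a rough illustration of the quantities a numeric summary like this reports, the base-R sketch below computes deciles and counts outliers for a single variable. The helper `num_summary` is hypothetical, and the 1.5 × IQR outlier rule is an assumption made for the example; SmartEDA's own definitions may differ.

```r
# Base-R sketch: deciles plus an outlier count for one numeric vector.
# `num_summary` is a hypothetical helper; the 1.5 * IQR rule is assumed here.
num_summary <- function(x, probs = seq(0, 1, 0.1)) {
  x <- x[!is.na(x)]                                  # drop missing values
  q <- quantile(x, c(0.25, 0.75), names = FALSE)     # first and third quartiles
  iqr <- q[2] - q[1]
  lower <- q[1] - 1.5 * iqr                          # lower outlier fence
  upper <- q[2] + 1.5 * iqr                          # upper outlier fence
  list(
    n          = length(x),
    mean       = mean(x),
    sd         = sd(x),
    deciles    = quantile(x, probs),
    n_outliers = sum(x < lower | x > upper)
  )
}

s <- num_summary(mtcars$mpg)
str(s)
```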

Graphical representation of all numeric features

## Generate Boxplot by category
ExpNumViz(mtcars,target="gear",type=2,nlim=25,fname = file.path(tempdir(),"Mtcars2"),Page = c(2,2))
## Generate Density plot
ExpNumViz(mtcars,target=NULL,type=3,nlim=25,fname = file.path(tempdir(),"Mtcars3"),Page = c(2,2))
## Generate Scatter plot
ExpNumViz(mtcars,target="carb",type=3,nlim=25,fname = file.path(tempdir(),"Mtcars4"),Page = c(2,2))
ExpNumViz(mtcars,target="am",scatter=TRUE)

Summary of Categorical variables

## Frequency or custom tables for categorical variables
	ExpCTable(Carseats,Target=NULL,margin=1,clim=10,nlim=5,round=2,bin=NULL,per=TRUE)
	ExpCTable(Carseats,Target="Price",margin=1,clim=10,nlim=NULL,round=2,bin=4,per=FALSE)
## Summary statistics of categorical variables
	ExpCatStat(Carseats,Target="Urban",result = "Stat",clim=10,nlim=5,Pclass="Yes")
## Information value and odds value
	ExpCatStat(Carseats,Target="Urban",result = "IV",clim=10,nlim=5,Pclass="Yes")
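
To make the term "information value" concrete, the sketch below computes it for one categorical predictor against a binary target using the common textbook definition, the sum over levels of (share of events − share of non-events) × log(share of events / share of non-events). The helper `info_value` and the toy vectors are hypothetical; SmartEDA's binning and handling of empty cells may differ.

```r
# Textbook information value (IV) for one categorical predictor.
# `info_value` is a hypothetical helper, not SmartEDA's implementation.
info_value <- function(x, y, positive) {
  tab <- table(x, y == positive)                      # rows: levels of x; columns: FALSE / TRUE
  p_event    <- tab[, "TRUE"]  / sum(tab[, "TRUE"])   # share of events in each level
  p_nonevent <- tab[, "FALSE"] / sum(tab[, "FALSE"])  # share of non-events in each level
  woe <- log(p_event / p_nonevent)                    # weight of evidence per level
  sum((p_event - p_nonevent) * woe)
}

# Toy data, made up purely for illustration
x <- c("a", "a", "a", "b", "b", "b", "b", "c", "c", "c")
y <- c("Yes", "Yes", "No", "Yes", "No", "No", "No", "Yes", "Yes", "No")
iv <- info_value(x, y, positive = "Yes")
print(iv)
```

A level with no events or no non-events makes the logarithm infinite, so a practical implementation needs some smoothing or merging of sparse levels. The sign convention for weight of evidence varies between references, but the information value itself does not depend on it.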

Graphical representation of all categorical variables

## Column chart (no target variable)
	ExpCatViz(Carseats,target=NULL,fname=NULL,clim=10,col=NULL,margin=2,Page = c(2,1),sample=2)
## Stacked bar graph
	ExpCatViz(Carseats,target="Urban",fname=NULL,clim=10,col=NULL,margin=2,Page = c(2,1),sample=2)
## Variable importance graph using information values
	ExpCatStat(Carseats,Target="Urban",result="Stat",Pclass="Yes",plot=TRUE,top=20,Round=2)

Variable importance based on Information value

  ExpCatStat(Carseats,Target="Urban",result = "Stat",clim=10,nlim=5,bins=10,Pclass="Yes",plot=TRUE,top=10,
  Round=2)

Create HTML EDA report

Create an exploratory data analysis report in HTML format.

	ExpReport(Carseats,Target="Urban",label=NULL,theme="Default",op_file="test.html",op_dir=getwd(),sc=2,
	sn=2,Rc="Yes")

Quantile-quantile plot for numeric variables

	ExpOutQQ(Carseats,nlim=10,fname=NULL,Page=c(2,2),sample=4)

Parallel Co-ordinate plots

## Default ExpParcoord function
	ExpParcoord(Carseats,Group=NULL,Stsize=NULL,Nvar=c("Price","Income","Advertising","Population","Age",
	"Education"))
## With stratified rows and selected columns only
	ExpParcoord(Carseats,Group="ShelveLoc",Stsize=c(10,15,20),Nvar=c("Price","Income"),Cvar=c("Urban","US"))
## Without stratification
	ExpParcoord(Carseats,Group="ShelveLoc",Nvar=c("Price","Income"),Cvar=c("Urban","US"),scale=NULL)

Exploratory analysis - Custom tables, summary statistics

Produces a descriptive summary of all input variables for each level, or combination of levels, of the grouping variables. Rows (cases) can also be filtered while running the analysis.

	ExpCustomStat(Carseats,Cvar=c("US","Urban","ShelveLoc"),gpby=FALSE)
	ExpCustomStat(Carseats,Cvar=c("US","Urban"),gpby=TRUE,filt=NULL)
	ExpCustomStat(Carseats,Cvar=c("US","Urban","ShelveLoc"),gpby=TRUE,filt=NULL)
	ExpCustomStat(Carseats,Cvar=c("US","Urban"),gpby=TRUE,filt="Population>150")
	ExpCustomStat(Carseats,Cvar=c("US","ShelveLoc"),gpby=TRUE,filt="Urban=='Yes' & Population>150")
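
The idea of filtering rows and then summarising by group can also be sketched in base R. The example below is not what ExpCustomStat does internally; it uses the built-in `mtcars` data and an arbitrary filter (`hp > 100`) chosen only for illustration.

```r
# Base-R sketch: filter rows, then summarise by each combination of groups.
# The filter and the grouping variables are arbitrary choices for this example.
dat <- subset(mtcars, hp > 100)                            # keep rows matching the filter
res <- aggregate(mpg ~ am + cyl, data = dat, FUN = mean)   # mean mpg per am/cyl combination
print(res)
```

Only group combinations that remain after filtering appear in the result, which is worth keeping in mind when comparing filtered and unfiltered tables.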

Issues

  • Need some help?
  • Found a bug?
  • Request a new feature? Just open an issue.

Contributions

  • Want to add a feature?
  • Correct a bug? You're more than welcome to contribute

Please read the contribution guidelines before submitting a pull request (PR). Go ahead and write your code and open a PR; even if it is not perfect, we will help you turn it into a great one.


Articles

See article wiki page.

References

Chon Ho, Y. (2010). Exploratory data analysis in the context of data mining and resampling. International Journal of Psychological Research, 3(1), 9–22. doi:https://doi.org/10.21500/20112084.819

Coates, M. (2016). exploreR: Tools for Quickly Exploring Data. Retrieved from https://CRAN.R-project.org/package=exploreR

Comtois, D. (2018). summarytools: Tools to Quickly and Neatly Summarize Data. Retrieved from https://CRAN.R-project.org/package=summarytools

Cui, B. (2018). DataExplorer: Data Explorer. Retrieved from https://CRAN.R-project.org/package=DataExplorer

DiCerbo et al. (2015). Serious Games Analytics. Advances in Game-Based Learning. In C. Loh, Y. Sheng, & D. Ifenthaler (Eds.). Cham: Springer. doi:10.1007/978-3-319-05834-4

Harrell et al. (2018). Hmisc: Harrell Miscellaneous. Retrieved from https://CRAN.R-project.org/package=Hmisc

Hoaglin, D., Mosteller, F., & Tukey, J. (1983). Understanding robust and exploratory data analysis. Wiley Series in probability and mathematical statistics, New-York.

Jaggi, S. (2013). Descriptive statistics and exploratory data analysis. Indian Agricultural Statistics Research Institute. Retrieved from http://www.iasri.res.in/ebook/EB_SMAR/ebook_pdf%20files/Manual%20II/1-Descriptive%20Statistics.pdf

James, G., Witten, D., Hastie, T., & Tibshirani, R. (2017). ISLR: Data for an Introduction to Statistical Learning with Applications in R. doi:https://doi.org/10.1007/978-1-4614-7138-7_1

Konopka et al. (2018). Exploratory data analysis of a clinical study group: Development of a procedure for exploring multidimensional data. PLoS ONE, 13(8).

Liu, Q. (2014, October). The Application of Exploratory Data Analysis in Auditing (PhD thesis). Newark Rutgers, The State University of New Jersey, Newark, New Jersey.

Ma, X., Hummer, D., Golden, J. J., Fox, P. A., Hazen, R. M., Morrison, S. M., Downs, R.T., et al. (2017). Using Visual Exploratory Data Analysis to Facilitate Collaboration and Hypothesis Generation in Cross-Disciplinary Research. International Journal of Geo-Information, 6(368), 1–11. doi:https://doi.org/10.3390/ijgi6110368

Nair, A. (2018). RtutoR: Shiny Apps for Plotting and Exploratory Analysis. Retrieved from https://CRAN.R-project.org/package=RtutoR

Ryu, C. (2018). dlookr: Tools for Data Diagnosis, Exploration, Transformation. Retrieved from https://CRAN.R-project.org/package=dlookr

Tukey, J. W. (1977). Exploratory Data Analysis. Addison-Wesley.

Ubrangala, D., Rama, K., Kondapalli, R. P., & Putatunda, S. (2018). SmartEDA: Summarize and Explore the Data. Retrieved from https://CRAN.R-project.org/package=SmartEDA