The goal of ucimlrepo is to download and import data sets directly into R from the UCI Machine Learning Repository.
Important
This package is an unofficial port of the Python ucimlrepo package.
Note
Want datasets that come with a help documentation entry?
Check out the {ucidata} R package! It provides a small selection of data sets from
the UC Irvine Machine Learning Repository alongside help entries.
You can install the development version of ucimlrepo from GitHub with:
# install.packages("remotes")
remotes::install_github("coatless-rpkg/ucimlrepo")
To use ucimlrepo, load the package using:
library(ucimlrepo)
With the package loaded, we can download a dataset using the fetch_ucirepo() function,
or view a list of available datasets using the list_available_datasets() function.
For example, to download the iris dataset, we can use:
# Fetch a dataset by name
iris_by_name <- fetch_ucirepo(name = "iris")
names(iris_by_name)
#> [1] "data" "metadata" "variables"
The returned object has several levels. For example, we can extract
the original data frame containing the iris dataset using:
iris_uci <- iris_by_name$data$original
head(iris_uci)
#>   sepal length sepal width petal length petal width       class
#> 1          5.1         3.5          1.4         0.2 Iris-setosa
#> 2          4.9         3.0          1.4         0.2 Iris-setosa
#> 3          4.7         3.2          1.3         0.2 Iris-setosa
#> 4          4.6         3.1          1.5         0.2 Iris-setosa
#> 5          5.0         3.6          1.4         0.2 Iris-setosa
#> 6          5.4         3.9          1.7         0.4 Iris-setosa
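The printed header suggests that the column names contain spaces, so they have to be quoted with backticks when accessed with `$`. The following is a minimal sketch of that access pattern and of an optional rename. It uses a small hand-built stand-in (`iris_demo`, a hypothetical name) copied from the first two rows shown above so that it runs without a network connection. With a real download you would apply the same steps to `iris_uci`.

```r
# Stand-in for iris_by_name$data$original, built from the first two rows
# printed above so the example runs offline. `iris_demo` is a hypothetical name.
iris_demo <- data.frame(
  `sepal length` = c(5.1, 4.9),
  `sepal width`  = c(3.5, 3.0),
  `petal length` = c(1.4, 1.4),
  `petal width`  = c(0.2, 0.2),
  class          = c("Iris-setosa", "Iris-setosa"),
  check.names    = FALSE  # keep the spaces, as in the printed output
)

# Names with spaces must be quoted with backticks
mean_sepal_length <- mean(iris_demo$`sepal length`)

# Optionally replace spaces with underscores for easier access
names(iris_demo) <- gsub(" ", "_", names(iris_demo), fixed = TRUE)
names(iris_demo)
```

Renaming is a convenience only. Keeping the original names preserves a direct match with the variable names used by the repository.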
Alternatively, we could retrieve two data frames, one for the features and one for the targets:
iris_features <- iris_by_name$data$features
iris_targets <- iris_by_name$data$targets
We can then view the first few rows of each data frame:
head(iris_features)
#>   sepal length sepal width petal length petal width
#> 1          5.1         3.5          1.4         0.2
#> 2          4.9         3.0          1.4         0.2
#> 3          4.7         3.2          1.3         0.2
#> 4          4.6         3.1          1.5         0.2
#> 5          5.0         3.6          1.4         0.2
#> 6          5.4         3.9          1.7         0.4
head(iris_targets)
#>         class
#> 1 Iris-setosa
#> 2 Iris-setosa
#> 3 Iris-setosa
#> 4 Iris-setosa
#> 5 Iris-setosa
#> 6 Iris-setosa
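If a single modelling table is needed again after working with the two pieces, the features and targets can be bound back together by column. The sketch below assumes that both data frames keep their rows in the same order, which the source does not state explicitly. The `demo` object is a hypothetical stand-in that mirrors the `data$features` and `data$targets` structure so the block runs offline.

```r
# Hypothetical stand-in mirroring the documented `data` structure
demo <- list(
  data = list(
    features = data.frame(
      `sepal length` = c(5.1, 4.9),
      `sepal width`  = c(3.5, 3.0),
      check.names    = FALSE  # keep the spaces in the column names
    ),
    targets = data.frame(class = c("Iris-setosa", "Iris-setosa"))
  )
)

# Bind columns; rows are assumed to be in the same order in both frames
iris_combined <- cbind(demo$data$features, demo$data$targets)
dim(iris_combined)
```

With a real download, the equivalent call would be `cbind(iris_features, iris_targets)`.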
You can also fetch a dataset directly by its ID, which can be found with
list_available_datasets() or by looking up the dataset on the UCI ML Repo website:
# Fetch a dataset by id
iris_by_id <- fetch_ucirepo(id = 53)
We can also view a list of data sets available for download using the list_available_datasets() function:
# List available datasets
list_available_datasets()
Note
Not all 600+ datasets on UCI ML Repo are available for download using the package. The current list of available datasets can be viewed here.
If you would like to see a specific dataset added, please submit a comment on an issue ticket in the upstream repository.
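Because some datasets cannot be downloaded, a script that fetches several of them may want to continue when one request fails. The following is a minimal sketch under the assumption that a failed fetch signals an R error, which the source does not confirm. `try_fetch()` and `stub_fetch()` are hypothetical helpers, not part of ucimlrepo, and the stub stands in for `fetch_ucirepo()` so the block runs offline.

```r
# Hypothetical helper: call a fetch function and return NULL on error
# instead of stopping the script.
try_fetch <- function(fetcher, ...) {
  tryCatch(
    fetcher(...),
    error = function(e) {
      message("Fetch failed: ", conditionMessage(e))
      NULL
    }
  )
}

# Stub standing in for fetch_ucirepo() so the example runs offline.
# It only "knows" the iris ID used earlier in this document.
stub_fetch <- function(id) {
  if (id != 53) stop("dataset not available")
  list(data = list(), metadata = list(), variables = list())
}

ok          <- try_fetch(stub_fetch, id = 53)
unavailable <- try_fetch(stub_fetch, id = -1)
is.null(unavailable)
```

With the package loaded, the same wrapper could be called as `try_fetch(fetch_ucirepo, id = 53)`.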