# Import an R dataset in Python
> Import an R dataset in Python.

- toc: true 
- badges: true
- comments: true
- categories: [jupyter]
- image: images/chart-preview.png

I recently discovered the [Rdatasets project](https://vincentarelbundock.github.io/Rdatasets/), which gives access to the datasets in R's core datasets package and in many other common R packages. I first came across it through its integration with [The Datasets Package](https://www.statsmodels.org/devel/datasets/index.html#datasets-index--page-root) from statsmodels (see [here](https://www.statsmodels.org/devel/datasets/index.html#using-datasets-from-r)). This is particularly relevant to people like me who work on survival analysis, where R is still predominant.

The list of available datasets can be found [here](https://vincentarelbundock.github.io/Rdatasets/articles/data.html), and instructions for adding new R datasets can be found [here](https://vincentarelbundock.github.io/Rdatasets/#adding-data). I requested the addition of datasets from the [`asaur`](https://github.com/vincentarelbundock/Rdatasets/issues/13) and [`mstate`](https://github.com/vincentarelbundock/Rdatasets/issues/14) packages, and the maintainer, [Professor Vincent Arel-Bundock](http://arelbundock.com/), responded very quickly: the datasets have already been added.

From the [list of available datasets](https://vincentarelbundock.github.io/Rdatasets/articles/data.html), we need to find the dataset name (the item name) and the R package it comes from.
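The lookup can also be scripted. The sketch below is only an illustration: `search_index` is a hypothetical helper, the `INDEX_URL` location and the `Package`, `Item`, and `Title` column names are assumptions about the Rdatasets index file, and `SAMPLE_INDEX` is a tiny hand-written stand-in so that the example runs offline.

```python
import csv
import io

# Assumed location of the Rdatasets index file (not given in this post).
INDEX_URL = "https://vincentarelbundock.github.io/Rdatasets/datasets.csv"


def search_index(rows, keyword):
    """Return (package, item) pairs whose item name or title contains keyword."""
    keyword = keyword.lower()
    return [
        (row["Package"], row["Item"])
        for row in rows
        if keyword in row["Item"].lower() or keyword in row["Title"].lower()
    ]


# Hand-written sample in the assumed index layout, not the real index.
SAMPLE_INDEX = """Package,Item,Title
asaur,pharmacoSmoking,pharmacoSmoking
datasets,iris,Edgar Anderson's Iris Data
"""

sample_rows = list(csv.DictReader(io.StringIO(SAMPLE_INDEX)))
matches = search_index(sample_rows, "smoking")
print(matches)  # [('asaur', 'pharmacoSmoking')]
```

If the assumed URL and column names hold, the real index could be searched the same way by passing `pandas.read_csv(INDEX_URL).to_dict("records")` as `rows`.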

For example, let's say we want to import the [pharmacoSmoking](https://vincentarelbundock.github.io/Rdatasets/doc/asaur/pharmacoSmoking.html) dataset from the [`asaur`](https://cran.r-project.org/web/packages/asaur/index.html) package. 

We will use [The Datasets Package](https://www.statsmodels.org/devel/datasets/index.html#datasets-index--page-root) from [statsmodels](https://www.statsmodels.org/stable/).

In [1]:
import statsmodels.api as sm

Then we call the `get_rdataset` function with the item name and the name of the package it comes from.

In [2]:
pharmacoSmoking = sm.datasets.get_rdataset("pharmacoSmoking", "asaur")
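To see why both arguments are needed, it may help to picture how an item is addressed on the Rdatasets site. The sketch below is not how `get_rdataset` is implemented. `rdataset_urls` is a hypothetical helper, the `doc` pattern is copied from the pharmacoSmoking documentation link above, and the `csv` pattern is an assumption that mirrors it.

```python
# Base URL taken from the documentation link earlier in this post.
BASE_URL = "https://vincentarelbundock.github.io/Rdatasets"


def rdataset_urls(item, package):
    """Build the documentation and CSV URLs for one Rdatasets item."""
    return {
        # Same pattern as the pharmacoSmoking documentation link above.
        "doc": f"{BASE_URL}/doc/{package}/{item}.html",
        # Assumption: CSV files mirror the doc layout under csv/.
        "csv": f"{BASE_URL}/csv/{package}/{item}.csv",
    }


urls = rdataset_urls("pharmacoSmoking", "asaur")
print(urls["doc"])
print(urls["csv"])
```

If the CSV pattern holds, `pandas.read_csv(urls["csv"])` would be another way to load the table without going through statsmodels.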

We can take a look at the documentation with the `__doc__` attribute.

In [3]:
print(pharmacoSmoking.__doc__)

pharmacoSmoking R Documentation

pharmacoSmoking
---------------

Description
~~~~~~~~~~~

Randomized trial of triple therapy vs. patch for smoking cessation.

Usage
~~~~~

::

   data("pharmacoSmoking")

Format
~~~~~~

A data frame with 125 observations on the following 14 variables.

``id``
   patient ID number

``ttr``
   Time in days until relapse

``relapse``
   Indicator of relapse (return to smoking)

``grp``
   Randomly assigned treatment group with levels ``combination`` or
   ``patchOnly``

``age``
   Age in years at time of randomization

``gender``
   ``Female`` or ``Male``

``race``
   ``black``, ``hispanic``, ``white``, or ``other``

``employment``
   ``ft`` (full-time), ``pt`` (part-time), or ``other``

``yearsSmoking``
   Number of years the patient had been a smoker

``levelSmoking``
   ``heavy`` or ``light``

``ageGroup2``
   Age group with levels ``21-49`` or ``50+``

``ageGroup4``
   Age group with levels ``21-34``, ``35-49``, ``50-64``, or ``65+``

``priorAttempts``

We can access the data, a pandas DataFrame, with the `data` attribute.

In [4]:
pharmacoSmoking.data.head()

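| | id | ttr | relapse | grp | age | gender | race | employment | yearsSmoking | levelSmoking | ageGroup2 | ageGroup4 | priorAttempts | longestNoSmoke |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 21 | 182 | 0 | patchOnly | 36 | Male | white | ft | 26 | heavy | 21-49 | 35-49 | 0 | 0 |
| 1 | 113 | 14 | 1 | patchOnly | 41 | Male | white | other | 27 | heavy | 21-49 | 35-49 | 3 | 90 |
| 2 | 39 | 5 | 1 | combination | 25 | Female | white | other | 12 | heavy | 21-49 | 21-34 | 3 | 21 |
| 3 | 80 | 16 | 1 | combination | 54 | Male | white | ft | 39 | heavy | 50+ | 50-64 | 0 | 0 |
| 4 | 87 | 0 | 1 | combination | 45 | Male | white | other | 30 | heavy | 21-49 | 35-49 | 0 | 0 |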
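Since `data` is a regular pandas DataFrame, the usual pandas tools apply. As a minimal sketch, the example below rebuilds three columns from the five rows shown above (so that it runs without a download) and summarizes them by treatment group. The name `head` and the choice of summary are mine, and with only five rows the numbers say nothing about the trial itself.

```python
import pandas as pd

# Three columns copied from the five rows printed above, not the full dataset.
head = pd.DataFrame(
    {
        "ttr": [182, 14, 5, 16, 0],
        "relapse": [0, 1, 1, 1, 1],
        "grp": ["patchOnly", "patchOnly", "combination", "combination", "combination"],
    }
)

# Per treatment group: number of patients, number of relapses,
# and median time to relapse in days.
summary = head.groupby("grp").agg(
    n=("relapse", "size"),
    relapses=("relapse", "sum"),
    median_ttr=("ttr", "median"),
)
print(summary)
```

On the real object, the same call would start from `pharmacoSmoking.data` instead of `head`.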
