A naive, roundabout way to save an R data frame to a Tableau Data Extract using Python
About this repo
Tableau is a huge cog in my data analysis toolkit, so to say I was overcome with joy when they released an API would be an understatement. However, I was pretty annoyed that they didn't think about the R community.
R is a statistical programming language. In my workflow, I find it very easy to do
almost everything I need entirely within
R. In theory, my ideal workflow would be:
- Use R (or python, but more on this later) to collect and clean the data
- Save my data to Tableau so I can explore it interactively (and easily)
- Model my data
- Save the modeled data to Tableau so I can create dashboard reports
- Deploy my models to production (or whatever that means)
To be fair, Tableau has released both C/C++/Java and python versions of their API. For those way better
at programming than myself, it appears that it should be possible to build an
R package to interface with the API.
In my head, this is what some
R code might look like if an
R package were available:
```r
library(tableauR)
library(RODBC)

# create connection to my datastore
ch = odbcConnect("DSN", "USER", "PWD")

# get the data
df = sqlQuery(ch, "SELECT * FROM TABLE")

# basic regression
mod = glm(x ~ y, data=df, family=binomial())
df = transform(df, pred = predict(mod, newdata=df, type="response"))

# save the scored data to a Tableau Data Extract
df2TDE(df, file="r-df.tde")
```
This simple interface isn't available at the moment, but that leads me to the purpose of this repo.
I saw this PR story and figured I could hack something together pretty quickly along the same lines by:
- modeling my data in R
- saving the dataframe as an .Rdata file
- using rpy2 from within python to read the R dataframe
- converting it to a pandas dataframe
- using the pandas dataframe to pseudo-intelligently build the Tableau Data Extract
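The "pseudo-intelligently build" step mostly boils down to mapping each pandas dtype to a Tableau column type. Here is a minimal sketch of that mapping; the function name `tde_type_for` and the returned type names are my own illustrative placeholders, not the real Tableau SDK constants:

```python
# Sketch of the dtype-to-Tableau-type mapping step. The type names
# returned below are illustrative placeholders, not actual SDK enums.
def tde_type_for(dtype_name):
    """Pick a Tableau column type for a pandas dtype name."""
    if dtype_name.startswith("int"):
        return "INTEGER"
    if dtype_name.startswith("float"):
        return "DOUBLE"
    if dtype_name.startswith("bool"):
        return "BOOLEAN"
    if dtype_name.startswith("datetime"):
        return "DATETIME"
    return "UNICODE_STRING"  # fall back to text for object/other dtypes

# In the notebook this would drive the extract's schema, roughly:
# for name, dtype in df.dtypes.items():
#     table_def.addColumn(name, tde_type_for(str(dtype)))
```

The nice part is that the schema falls out of the dataframe for free, so the same code works for any frame you throw at it.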
Because I am trying to get my python skills on par with R, I took this as an opportunity to show a trivial example
of how we could use R to model/score our data (using an
R script), and leverage the Tableau Python API
to create a Data Extract.
The pandas library, which aims to be a (superior) python equivalent to the
R data.frame, is pretty awesome
and appears to have a growing development community.
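To give a small taste of the data.frame parallels, here is a toy sketch (the column names are made up):

```python
import pandas as pd

# a toy frame, analogous to an R data.frame
df = pd.DataFrame({"x": [1, 2, 3, 4], "y": [10.0, 20.0, 30.0, 40.0]})

# R's transform(df, z = x + y) becomes assign()
df = df.assign(z=df["x"] + df["y"])

# R's subset(df, x > 2) becomes boolean indexing
subset = df[df["x"] > 2]
```

Anyone comfortable with data.frame idioms should find this immediately familiar.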
The ipython notebook was never meant to demonstrate efficient python code; it simply aims
to be a proof-of-concept for the Python and Tableau communities alike. In theory, we don't really need to use
R; there are quite a few examples of how we could clean and
model our data entirely in python.
To be honest, the only
thing holding me back right now from diving head first into
python is the difficulty of connecting to my
Oracle database at work.
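For what it's worth, once a driver is installed, the query step in python looks much like the RODBC call above: the DB-API connect/execute/fetch pattern is the same across drivers. In this sketch the stdlib sqlite3 module stands in for an Oracle driver such as cx_Oracle, and the table/column names are made up:

```python
import sqlite3

# sqlite3 stands in for an Oracle driver (e.g. cx_Oracle); the DB-API
# connect/execute/fetch pattern is identical, only connect() args differ.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (id INTEGER, pred REAL)")
conn.executemany("INSERT INTO scores VALUES (?, ?)", [(1, 0.2), (2, 0.8)])

# the python analogue of RODBC's sqlQuery(ch, "SELECT * FROM TABLE")
rows = conn.execute("SELECT id, pred FROM scores ORDER BY id").fetchall()
conn.close()
```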
ONE LAST THING! Did I mention that the API appears to only work on Windows machines?!?!
This is because the API seemingly requires
.dll files that are found in Windows applications!
I am not sure that Tableau realizes that the majority of the developer community doesn't have access to, or hates, Windows development environments.