In [1]:
%%html
Experiments using Altair, a Python library on top of Vega-Lite, which is a high-level abstraction over Vega, which in turn builds on D3.
This is interesting because Vega-Lite is a JavaScript library, so the charts are rendered in the browser from a JSON specification. Sadly, this document is a bit of a post-mortem,
because Vega-Lite turned out not to be expressive enough for the plots we wanted.

In [2]:


import altair as alt
import pandas as pd
# Keep only the lung cohort; the callable receives the whole DataFrame and returns a boolean mask
db = pd.read_json('../frontend/db.json')[lambda df: df.tumor == 'lung']
db.head()

    tumor cell     location cell_full      expression coef   lower  upper p
26  lung  CD4      TUMOR    CD4_TUMOR      135.9      0.9517 0.5945 1.523 0.8366
27  lung  CD4_Treg TUMOR    CD4_Treg_TUMOR 4.696      0.9658 0.5889 1.584 0.8904
28  lung  CD8      TUMOR    CD8_TUMOR      78.42      0.7335 0.4578 1.175 0.1975
29  lung  CD8_Treg TUMOR    CD8_Treg_TUMOR 0.646      0.6706 0.3952 1.138 0.1385
30  lung  B_cells  TUMOR    B_cells_TUMOR  41.35      0.8494 0.5247 1.375 0.5066
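The filter in `In [2]` passes a callable to `[]`. pandas calls it once with the whole DataFrame (not once per row) and uses the returned boolean Series as a mask. A minimal sketch on a toy frame: the `lung` rows are copied from the output above, while the `breast` row is made up for illustration.

```python
import pandas as pd

# Toy frame: the lung rows come from the output above, the breast row is invented.
df = pd.DataFrame({
    "tumor": ["lung", "lung", "breast"],
    "cell": ["CD4", "CD8", "CD4"],
    "expression": [135.9, 78.42, 1.0],
})

# The callable receives the DataFrame itself and must return a boolean mask.
lung = df[lambda d: d.tumor == "lung"]

# Same result as ordinary boolean indexing with an explicit mask.
assert lung.equals(df[df.tumor == "lung"])
print(lung)
```

The callable form is mainly useful at the end of a chain, as in `pd.read_json(...)[...]`, where there is no variable name yet to build the mask from.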


In [3]:


%%html
It is possible to generate expression bar plots by putting location on the x axis and faceting on the cell
type (`alt.Column` below):

In [4]:


alt.Chart(db).mark_bar().encode(
    x=alt.X('location', axis=None),
    y=alt.Y('expression', axis=alt.Axis(grid=False)),
    column=alt.Column('cell', header=alt.Header(labelOrient='bottom'), spacing=10),
    color='location'
).configure_view(strokeWidth=0)
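For reference, the chart above corresponds roughly to the Vega-Lite specification below, written by hand as a Python dict. This is an approximation: the real output of `chart.to_dict()` also contains `$schema`, the data itself, and may differ in detail. The types (`nominal`, `quantitative`) are what Altair would infer from the pandas dtypes.

```python
import json

# Hand-written approximation of the Vega-Lite spec for the faceted bar chart.
spec = {
    "mark": "bar",
    "encoding": {
        "x": {"field": "location", "type": "nominal", "axis": None},
        "y": {"field": "expression", "type": "quantitative", "axis": {"grid": False}},
        # The facet is just another encoding channel; spacing and header live here.
        "column": {
            "field": "cell",
            "type": "nominal",
            "header": {"labelOrient": "bottom"},
            "spacing": 10,
        },
        "color": {"field": "location", "type": "nominal"},
    },
    # configure_view(strokeWidth=0) ends up under config.view
    "config": {"view": {"strokeWidth": 0}},
}

print(json.dumps(spec, indent=2))  # axis=None becomes JSON null
```

Seen this way, all of the facet layout is controlled by the single `column` entry, which is why the limitations below are hard to work around from Altair.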

In [5]:


%%html
Without `configure_view(strokeWidth=0)` the border of each view is drawn, which makes it more apparent that these are several plots side by side:

In [6]:


alt.Chart(db).mark_bar().encode(
    x=alt.X('location', axis=None),
    y=alt.Y('expression', axis=alt.Axis(grid=False)),
    column=alt.Column('cell', header=alt.Header(labelOrient='bottom'), spacing=10),
    color='location'
)

In [7]:


%%html
I ran into many problems. This is problem 1: it is not possible to draw y-axis grid lines that continue across the gaps between the faceted plots.
The corresponding issue is <a href="https://github.com/vega/vega-lite/issues/4703">vega-lite#4703</a>.
This is what happens when the grid is turned on:

In [8]:


alt.Chart(db).mark_bar().encode(
    x=alt.X('location', axis=None),
    y=alt.Y('expression', axis=alt.Axis(grid=True)),
    column=alt.Column('cell', header=alt.Header(labelOrient='bottom'), spacing=10),
    color='location'
).configure_view(strokeWidth=0)

In [9]:


%%html
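However, we want to facet on both cell type and tumor cohort. This can be done by setting
`column` to a single field that combines the two (in the same way that `cell_full` combines cell and location).
However, every facet then gets the same spacing, so the gap between cohorts is no larger than the gap between
the cells within a cohort (or the other way around, whichever is put innermost). This is problem 2.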
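A minimal pandas sketch of such a combined facet field, on a toy frame with made-up values. The column name `tumor_cell` is hypothetical. Prefixing with the cohort keeps each cohort's cells next to each other, assuming the facets use Vega-Lite's default ascending sort.

```python
import pandas as pd

# Toy rows with invented values; the real data has more cohorts and cell types.
df = pd.DataFrame({
    "tumor": ["lung", "lung", "breast", "breast"],
    "cell": ["CD4", "CD8", "CD4", "CD8"],
    "expression": [1.0, 2.0, 3.0, 4.0],
})

# One facet field encoding both the cohort and the cell type (hypothetical name).
df["tumor_cell"] = df["tumor"] + "_" + df["cell"]

# With an ascending sort, the cohort prefix groups each cohort's facets together.
print(sorted(df["tumor_cell"].unique()))
```

Passing this as `column=alt.Column('tumor_cell', spacing=10)` still gives one `spacing` value for every gap, which is exactly problem 2.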

In [10]:


%%html
On to problem 3: we want cell type as the colour and stripes to indicate location. Colour is possible in this way:

In [11]:


alt.Chart(db).mark_bar().encode(
    x=alt.X('location', axis=None),
    y=alt.Y('expression', axis=alt.Axis(grid=False)),
    column=alt.Column('cell', header=alt.Header(labelOrient='bottom'), spacing=10),
    color='cell'
).configure_view(strokeWidth=0)

In [12]:


%%html
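For stripes we have to include a small SVG pattern definition and reference it from the `fill` channel. Then we are stuck, because
`fill` overrides the `color` channel, so we cannot have both cell-type colour and stripes. (Note that this is not rendered on GitHub, because it relies on inline SVG.)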

In [13]:


from IPython.display import HTML
HTML('''<svg height=0>
<defs>
  <pattern id="stripe" patternUnits="userSpaceOnUse" width="6" height="6">
    <path d="M-1,1 l2,-2
       M0,6 l6,-6
       M5,7 l2,-2" stroke="black" stroke-width="2"></path>
  </pattern>
</defs>
</svg>''')
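The pattern above is a 6×6 tile containing one diagonal line plus two short corner segments, so that the stripes join up seamlessly when the tile repeats. A minimal sketch of a hypothetical helper, `stripe_pattern`, that produces the same snippet with the tile size, line width and colour as parameters; it is not part of Altair or IPython.

```python
def stripe_pattern(pattern_id="stripe", size=6, stroke_width=2, color="black"):
    """Return an <svg> snippet defining a diagonal-stripe fill pattern.

    Hypothetical helper: with the defaults it reproduces the hand-written pattern.
    """
    s = size
    # Main diagonal of the tile plus two short corner pieces, so adjacent tiles line up.
    path = f"M-1,1 l2,-2 M0,{s} l{s},-{s} M{s - 1},{s + 1} l2,-2"
    return (
        f'<svg height=0><defs>'
        f'<pattern id="{pattern_id}" patternUnits="userSpaceOnUse" width="{s}" height="{s}">'
        f'<path d="{path}" stroke="{color}" stroke-width="{stroke_width}"></path>'
        f'</pattern></defs></svg>'
    )


print(stripe_pattern())
```

In the notebook, `HTML(stripe_pattern())` would then stand in for the literal string in `In [13]`; the chart refers to the pattern only through its `id`.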

In [14]:


alt.Chart(db).mark_bar().encode(
    x=alt.X('location', axis=None),
    y=alt.Y('expression', axis=alt.Axis(grid=False)),
    column=alt.Column('cell', header=alt.Header(labelOrient='bottom'), spacing=10),
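    color='cell',  # ignored: the fill channel below takes precedence over color
    fill={
        'field': 'cell_full',
        'type': 'nominal',
        # url(#stripe) refers to the SVG pattern defined above; the range alternates with ''
        'scale': {'range': ['url(#stripe)', '']}
    },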
).configure_view(
    strokeWidth=0
).display(renderer='svg')
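Why does a two-element `range` on `cell_full` give stripes per location at all? A small pure-Python sketch of the mapping, assuming that the nominal domain is sorted ascending (Vega-Lite's default) and that, like d3's `scaleOrdinal`, range values are reused cyclically when the range is shorter than the domain. The `ordinal_scale` helper and the `OTHER` location are hypothetical.

```python
def ordinal_scale(domain_values, range_values):
    """Mimic an ordinal scale mapping a nominal domain onto a shorter range.

    Assumes a sorted domain of unique values and cyclic reuse of the range.
    """
    domain = sorted(set(domain_values))
    return {value: range_values[i % len(range_values)] for i, value in enumerate(domain)}


# Hypothetical cell_full values: each cell type measured in the same two locations.
mapping = ordinal_scale(
    ["CD4_TUMOR", "CD4_OTHER", "CD8_TUMOR", "CD8_OTHER"],
    ["url(#stripe)", ""],
)
print(mapping)
```

Under these assumptions the alternation only lines up with location when every cell type has exactly the same two locations; encoding `fill` on `location` directly would not depend on that.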