# Example
Start by importing the libraries.

In [2]:
import pandas as pd
from pySummarizedExperiment import pySummarizedExperiment

## Using predefined Pandas DataFrames
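Using three DataFrames (the assay values, the row annotations, and the column annotations), we can create a `pySummarizedExperiment`. In this example, the assay row names match the index of `rowData` and the assay column names match the index of `colData`.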

In [3]:
# Assay values: one row per feature, one column per sample
data = [[1, 2, 3, 4], [5, 6, 7, 8], [6, 7, 8, 9]]
columns = ["a", "b", "c", "d"]
rownames = ["FT1", "FT2", "FT3"]
data = pd.DataFrame(data, columns=columns, index=rownames)
# Feature annotations, indexed by the assay row names
rowData = pd.DataFrame([1, 2, 3], index=rownames, columns=["mz"])
# Sample annotations, indexed by the assay column names
colData = pd.DataFrame(
    [["QC", 1], ["QC", 2], ["BLANK", 3], ["BLANK", 4]],
    index=columns,
    columns=["Type", "Injection"],
)
# Several assays can share the same row and column annotations
assays = {"first_assay": data, "second_assay": data * 2}
exp = pySummarizedExperiment(assays=assays, columnData=colData, rowData=rowData)
exp

class: pySummarizedExperiment
dim: 3 4
metadata(0): {}
rownames(3): FT1, FT2, FT3
rowData names(1): mz
colnames(4): a, b, c, d
colData names(2): Type, Injection
assays(2): first_assay, second_assay
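
The three DataFrames only fit together if their labels line up. The sketch below checks this in plain pandas before constructing the object. `check_alignment` is a hypothetical helper written for illustration; it is not part of pySummarizedExperiment, and the library may or may not perform a similar check itself.

```python
import pandas as pd

rownames = ["FT1", "FT2", "FT3"]
columns = ["a", "b", "c", "d"]
data = pd.DataFrame(
    [[1, 2, 3, 4], [5, 6, 7, 8], [6, 7, 8, 9]], index=rownames, columns=columns
)
rowData = pd.DataFrame([1, 2, 3], index=rownames, columns=["mz"])
colData = pd.DataFrame(
    [["QC", 1], ["QC", 2], ["BLANK", 3], ["BLANK", 4]],
    index=columns,
    columns=["Type", "Injection"],
)
assays = {"first_assay": data, "second_assay": data * 2}


def check_alignment(assays, row_data, col_data):
    """Hypothetical helper: return the names of assays whose labels do not match."""
    mismatched = []
    for name, assay in assays.items():
        rows_ok = assay.index.equals(row_data.index)    # features line up
        cols_ok = assay.columns.equals(col_data.index)  # samples line up
        if not (rows_ok and cols_ok):
            mismatched.append(name)
    return mismatched


mismatched = check_alignment(assays, rowData, colData)
print(mismatched)
```

An empty list means every assay shares the row and column labels of the annotation tables.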

## From a long-format DataFrame
We can also create a `pySummarizedExperiment` from a single long-format DataFrame. A `rowIndex` and a `colIndex` are needed to define how `rowData` and `colData` are built. The cardinality between each remaining column and these two indices is then checked to decide where that column is assigned. First, we create a long DataFrame.

In [4]:
from random import randint, seed

seed(42)
df = pd.DataFrame({
    "samples": ["a", "b", "c", "d"] * 5,
    "sample_day": [1, 4, 4, 7] * 5,
    "features": [1, 2, 3, 4, 5] * 4,
    "feature_polarity": ["pos", "neg", "neg", "pos", "pos"] * 4,
    "assay": [randint(0, 1000) for _ in range(20)],
    "assay_other": list(range(20)),
})
display(df)

,samples,sample_day,features,feature_polarity,assay,assay_other
0,a,1,1,pos,654,0
1,b,4,2,neg,114,1
2,c,4,3,neg,25,2
3,d,7,4,pos,759,3
4,a,1,5,pos,281,4
5,b,4,1,pos,250,5
6,c,4,2,neg,228,6
7,d,7,3,neg,142,7
8,a,1,4,pos,754,8
9,b,4,5,pos,104,9


Next, we create a `pySummarizedExperiment` by passing the DataFrame as the `longDf` parameter. Here the `features` column is used as `rowIndex` and the `samples` column as `colIndex`.

In [5]:
exp = pySummarizedExperiment(longDf=df, rowIndex="features", colIndex="samples")
display(exp)

class: pySummarizedExperiment
dim: 5 4
metadata(0): []
rownames(5): 1, 2, 3, 4, 5
rowData names(1): feature_polarity
colnames(4): a, b, c, d
colData names(1): sample_day
assays(2): assay, assay_other
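
One way to picture the cardinality check is sketched below in plain pandas. This is an assumption about the idea, not the library's actual implementation: `classify_columns` is a hypothetical helper, and the rule it uses (a column with a single value per feature is row data, a single value per sample is column data, anything else is an assay) is inferred from the summary printed above.

```python
import pandas as pd
from random import randint, seed

seed(42)
df = pd.DataFrame({
    "samples": ["a", "b", "c", "d"] * 5,
    "sample_day": [1, 4, 4, 7] * 5,
    "features": [1, 2, 3, 4, 5] * 4,
    "feature_polarity": ["pos", "neg", "neg", "pos", "pos"] * 4,
    "assay": [randint(0, 1000) for _ in range(20)],
    "assay_other": list(range(20)),
})


def classify_columns(long_df, row_index, col_index):
    """Hypothetical helper, not part of pySummarizedExperiment."""
    groups = {"rowData": [], "colData": [], "assays": []}
    for column in long_df.columns.drop([row_index, col_index]):
        # Largest number of distinct values found within any one feature or sample
        per_feature = long_df.groupby(row_index)[column].nunique().max()
        per_sample = long_df.groupby(col_index)[column].nunique().max()
        if per_feature == 1:
            groups["rowData"].append(column)  # constant within each feature
        elif per_sample == 1:
            groups["colData"].append(column)  # constant within each sample
        else:
            groups["assays"].append(column)   # varies per feature/sample pair
    return groups


groups = classify_columns(df, row_index="features", col_index="samples")
print(groups)
```

For this DataFrame the sketch places `feature_polarity` in the row data, `sample_day` in the column data, and both `assay` columns in the assays, which agrees with the summary above. A column that is constant everywhere would land in the row data under this rule; how the library treats that case is not covered here.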

## Selecting data
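`pySummarizedExperiment` provides the basic functionality of the `colData`, `rowData`, and `assay` functions from R's SummarizedExperiment as methods of the object. In this example, calling `assay()` without a name shows the values of the first assay, `assay`.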

In [8]:
# colData
display(exp.colData())

# rowData
display(exp.rowData())

# assays
display(exp.assay())

samples,sample_day
a,1
b,4
c,4
d,7


features,feature_polarity
1,pos
2,neg
3,neg
4,pos
5,pos


features,a,b,c,d
1,654,250,692,604
2,432,114,228,758
3,913,32,25,142
4,754,558,30,759
5,281,104,89,95
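
The assay table is the long DataFrame reshaped to one row per feature and one column per sample. A minimal pandas sketch of that reshaping is shown below, using only the deterministic `assay_other` column. It illustrates the shape of the result and is not a claim about how the library builds its assays internally; `long_df` and `wide` are names chosen for this example.

```python
import pandas as pd

# Same sample/feature layout as the long DataFrame above
long_df = pd.DataFrame({
    "samples": ["a", "b", "c", "d"] * 5,
    "features": [1, 2, 3, 4, 5] * 4,
    "assay_other": list(range(20)),
})

# One row per feature, one column per sample, values taken from "assay_other"
wide = long_df.pivot(index="features", columns="samples", values="assay_other")
print(wide)
```

`DataFrame.pivot` raises an error if a feature/sample pair occurs more than once, so it only works when each pair is unique, as it is here.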


We can subset a `pySummarizedExperiment` with syntax similar to pandas indexing, using `exp[rows, columns]`:

In [10]:
# Keep all rows and only columns 'a' and 'b', then show the assay "assay_other"
display(exp[:, ["a", "b"]].assay("assay_other"))

features,a,b
1,0,5
2,16,1
3,12,17
4,8,13
5,4,9
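
Subsetting by column only makes sense if the sample annotations are reduced along with the assay. The sketch below shows that idea on two plain pandas tables. `subset_columns` is a hypothetical helper and only illustrates the concept; it makes no claim about how `exp[:, ["a", "b"]]` is implemented.

```python
import pandas as pd

# The "assay_other" matrix and sample annotations from the example above
assay = pd.DataFrame(
    [[0, 5, 10, 15], [16, 1, 6, 11], [12, 17, 2, 7], [8, 13, 18, 3], [4, 9, 14, 19]],
    index=pd.Index([1, 2, 3, 4, 5], name="features"),
    columns=pd.Index(["a", "b", "c", "d"], name="samples"),
)
col_data = pd.DataFrame(
    {"sample_day": [1, 4, 4, 7]},
    index=pd.Index(["a", "b", "c", "d"], name="samples"),
)


def subset_columns(assay, col_data, keep):
    """Hypothetical helper: keep the same samples in the assay and its annotations."""
    return assay.loc[:, keep], col_data.loc[keep]


sub_assay, sub_col_data = subset_columns(assay, col_data, ["a", "b"])
print(sub_assay)
print(sub_col_data)
```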
