# Debugging pandas attribute access

The entries in the samples.tsv file currently lack a unique ID, because each sample is expected to have two files:
one for the forward strand and one for the reverse strand. As a workaround, I have been using the `sample_name`
and `strand` attributes together to uniquely identify samples. Here I experiment with better options.


In [1]:
import pandas as pd
# read example samples.tsv
df = pd.read_table('/home/ethollem/projects/metaploter/runs/RNAss_hg19_genes/samples.tsv').set_index('sample_name', drop=False)

In [2]:
df

sample_name (index),sample_name,filepath,strand,operation,null_val
ssR749H,ssR749H,data/RNAss/R749H_structure_score_pos_non_uniqu...,fwd,mean,0
ssR749H,ssR749H,data/RNAss/R749H_structure_score_neg_non_uniqu...,rev,mean,0
ssWT,ssWT,data/RNAss/WT_structure_score_pos_non_unique.b...,fwd,mean,0
ssWT,ssWT,data/RNAss/WT_structure_score_neg_non_unique.b...,rev,mean,0


Current method: filter on `sample_name`, which returns both strands for that sample

In [3]:
df.loc[df['sample_name'] == 'ssWT']

sample_name (index),sample_name,filepath,strand,operation,null_val
ssWT,ssWT,data/RNAss/WT_structure_score_pos_non_unique.b...,fwd,mean,0
ssWT,ssWT,data/RNAss/WT_structure_score_neg_non_unique.b...,rev,mean,0
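
To make the limitation of the current method concrete, here is a minimal sketch on a toy stand-in for samples.tsv. The `filepath` values are placeholders (assumptions, not the real paths), and the variable names are only illustrative. It shows that both attributes are needed to isolate one row, and that the value still has to be unpacked from a Series.

```python
import pandas as pd

# Toy stand-in for samples.tsv; the filepath values are placeholders, not the real paths.
df = pd.DataFrame(
    {
        "sample_name": ["ssR749H", "ssR749H", "ssWT", "ssWT"],
        "filepath": ["path_a", "path_b", "path_c", "path_d"],
        "strand": ["fwd", "rev", "fwd", "rev"],
        "operation": ["mean"] * 4,
        "null_val": [0] * 4,
    }
).set_index("sample_name", drop=False)

# Filtering on sample_name alone still returns two rows (one per strand).
by_name = df.loc[df["sample_name"] == "ssWT"]

# Both attributes are needed to isolate a single row.
mask = (df["sample_name"] == "ssWT") & (df["strand"] == "fwd")
one_row = df.loc[mask]

# Even with one row, selecting a column yields a Series, so the value must be unpacked.
strand_series = one_row["strand"]
strand_value = strand_series.iloc[0]
```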


One option is to create a new ID column by concatenating `sample_name` and `strand`

In [4]:
df["id"] = df["sample_name"] + df["strand"]

In [5]:
df

sample_name (index),sample_name,filepath,strand,operation,null_val,id
ssR749H,ssR749H,data/RNAss/R749H_structure_score_pos_non_uniqu...,fwd,mean,0,ssR749Hfwd
ssR749H,ssR749H,data/RNAss/R749H_structure_score_neg_non_uniqu...,rev,mean,0,ssR749Hrev
ssWT,ssWT,data/RNAss/WT_structure_score_pos_non_unique.b...,fwd,mean,0,ssWTfwd
ssWT,ssWT,data/RNAss/WT_structure_score_neg_non_unique.b...,rev,mean,0,ssWTrev
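
A possible variation on the ID column, sketched on toy data: joining the two attributes with a separator makes the ID easier to read and to split back apart, and a uniqueness check catches duplicate rows early. The `"_"` separator and the placeholder `filepath` values are assumptions for illustration, not something the notebook above uses.

```python
import pandas as pd

# Toy stand-in for samples.tsv; the filepath values are placeholders.
df = pd.DataFrame(
    {
        "sample_name": ["ssR749H", "ssR749H", "ssWT", "ssWT"],
        "filepath": ["path_a", "path_b", "path_c", "path_d"],
        "strand": ["fwd", "rev", "fwd", "rev"],
    }
)

# Join with a separator (an assumed convention) so the parts stay recoverable.
df["id"] = df["sample_name"] + "_" + df["strand"]

# Confirm the new column really is a unique key before relying on it.
ids_are_unique = df["id"].is_unique

# verify_integrity=True makes set_index raise if the new index has duplicates.
df_id = df.set_index("id", verify_integrity=True)
```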


In [6]:
# set index to id column
df_id = df.set_index('id')
df_id

id (index),sample_name,filepath,strand,operation,null_val
ssR749Hfwd,ssR749H,data/RNAss/R749H_structure_score_pos_non_uniqu...,fwd,mean,0
ssR749Hrev,ssR749H,data/RNAss/R749H_structure_score_neg_non_uniqu...,rev,mean,0
ssWTfwd,ssWT,data/RNAss/WT_structure_score_pos_non_unique.b...,fwd,mean,0
ssWTrev,ssWT,data/RNAss/WT_structure_score_neg_non_unique.b...,rev,mean,0


In [12]:
df_id.loc['ssWTfwd', 'strand']

'fwd'

With a unique index, a `.loc[row, column]` lookup returns the plain string directly, rather than a pandas Series that has to be unpacked first.
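
The sketch below contrasts the two lookups on toy data: with the non-unique `sample_name` index the result is a Series, while with the unique `id` index it is a plain `str`. It also shows one possible alternative not tried above, a two-level index on `(sample_name, strand)`, which avoids building a string ID at all. The `filepath` values and variable names are placeholders.

```python
import pandas as pd

# Toy stand-in for samples.tsv; the filepath values are placeholders.
samples = pd.DataFrame(
    {
        "sample_name": ["ssR749H", "ssR749H", "ssWT", "ssWT"],
        "filepath": ["path_a", "path_b", "path_c", "path_d"],
        "strand": ["fwd", "rev", "fwd", "rev"],
    }
)

# Non-unique index (sample_name): the lookup matches two rows and returns a Series.
by_name = samples.set_index("sample_name", drop=False)
strands = by_name.loc["ssWT", "strand"]

# Unique index (concatenated id): the same kind of lookup returns a plain str.
by_id = samples.assign(id=samples["sample_name"] + samples["strand"]).set_index("id")
strand = by_id.loc["ssWTfwd", "strand"]

# Alternative: a two-level index, so no string concatenation is needed.
by_pair = samples.set_index(["sample_name", "strand"], verify_integrity=True)
path = by_pair.loc[("ssWT", "fwd"), "filepath"]
```

The two-level index keeps `sample_name` and `strand` as separate values, so selecting every strand of one sample remains a simple `by_pair.loc["ssWT"]`.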