# Example script for predicting with the TBD package

### Import packages

In [1]:
import pandas as pd
import tbd.predict
import tbd.utils  # check_data is called from tbd.utils below

### Inputs

The input sequences can be supplied as a list or a DataFrame. Each list item or DataFrame row represents one protein sequence. Every sequence should contain only valid one-letter amino acid codes and be longer than 40 characters.

In [2]:
input_data = ["LLGDFFRKSKEKIGKEFKRIVQRIKDFLRNLVPRTES",
             "MDAQTRRRERRAEKQAQWKAANPLLVGVSAKPVNRPILSLNRKPKSRVESALNPIDLTVLAEYHKQIESNLQRIERKNQTWYSKPGER",
             "MDAQTRRRERRAEKQAQWKAAN"]

or

In [3]:
import pandas as pd
input_data = pd.DataFrame(
    {'sequence': 
     ["LLGDFFRKSKEKIGKEFKRIVQRIKDFLRNLVPRTES",
      "MDAQTRRRERRAEKQAQWKAANPLLVGVSAKPVNRPILSLNRKPKSRVESALNPIDLTVLAEYHKQIESNLQRIERKNQTWYSKPGER",
      "MDAQTRRRERRAEKQAQWKAAN"]}
)
input_data

| | sequence |
|---|---|
| 0 | LLGDFFRKSKEKIGKEFKRIVQRIKDFLRNLVPRTES |
| 1 | MDAQTRRRERRAEKQAQWKAANPLLVGVSAKPVNRPILSLNRKPKS... |
| 2 | MDAQTRRRERRAEKQAQWKAAN |


### Use `check_data` to check whether the input protein sequences are valid. If all sequences are valid, it raises no errors; here, the two sequences shorter than 40 characters are flagged.

In [4]:
tbd.utils.check_data(input_data)

The sequence LLGDFFRKSKEKIGKEFKRIVQRIKDFLRNLVPRTES is less than the specified length limit 40
The sequence MDAQTRRRERRAEKQAQWKAAN is less than the specified length limit 40
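
To make the two input requirements concrete, here is a minimal sketch of this kind of check. It is not the package's implementation: `find_invalid_sequences`, `VALID_CODES` (taken here to be the 20 standard amino acids), and the choice to flag anything shorter than `min_length=40` are assumptions made for illustration, and TBD may accept a different alphabet.

```python
# Assumption: only the 20 standard one-letter amino acid codes are accepted.
VALID_CODES = set("ACDEFGHIKLMNPQRSTVWY")


def find_invalid_sequences(sequences, min_length=40):
    """Return (sequence, reason) pairs for sequences that fail either check."""
    problems = []
    for seq in sequences:
        unexpected = sorted(set(seq) - VALID_CODES)
        if unexpected:
            problems.append((seq, "unexpected characters: " + "".join(unexpected)))
        elif len(seq) < min_length:
            problems.append((seq, f"shorter than {min_length} characters"))
    return problems


input_data = [
    "LLGDFFRKSKEKIGKEFKRIVQRIKDFLRNLVPRTES",
    "MDAQTRRRERRAEKQAQWKAANPLLVGVSAKPVNRPILSLNRKPKSRVESALNPIDLTVLAEYHKQIESNLQRIERKNQTWYSKPGER",
    "MDAQTRRRERRAEKQAQWKAAN",
]

problems = find_invalid_sequences(input_data)
for seq, reason in problems:
    print(f"{seq}: {reason}")
```

Running a check like this before prediction makes it clear which inputs will not contribute any rows to the results.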


### Use `predict_protein` to predict whether the input sequences are intrinsically disordered or ordered. A pretrained model is used unless a newly trained model is specified.

In [5]:
df_pred = tbd.predict.predict_protein(input_data)

### The results show the probability of each sub-sequence being classified as disordered or ordered. The original sequence that each sub-sequence came from is shown in `parent_sequence`.

In [6]:
df_pred

| | parent_sequence | sequence | prob_disordered | prob_ordered |
|---|---|---|---|---|
| 0 | MDAQTRRRERRAEKQAQWKAANPLLVGVSAKPVNRPILSLNRKPKS... | MDAQTRRRERRAEKQAQWKAANPLLVGVSAKPVNRPILSL | 0.606321 | 0.393679 |
| 1 | MDAQTRRRERRAEKQAQWKAANPLLVGVSAKPVNRPILSLNRKPKS... | RAEKQAQWKAANPLLVGVSAKPVNRPILSLNRKPKSRVES | 0.693179 | 0.306821 |
| 2 | MDAQTRRRERRAEKQAQWKAANPLLVGVSAKPVNRPILSLNRKPKS... | ANPLLVGVSAKPVNRPILSLNRKPKSRVESALNPIDLTVL | 0.301349 | 0.698651 |
| 3 | MDAQTRRRERRAEKQAQWKAANPLLVGVSAKPVNRPILSLNRKPKS... | KPVNRPILSLNRKPKSRVESALNPIDLTVLAEYHKQIESN | 0.571889 | 0.428111 |
| 4 | MDAQTRRRERRAEKQAQWKAANPLLVGVSAKPVNRPILSLNRKPKS... | NRKPKSRVESALNPIDLTVLAEYHKQIESNLQRIERKNQT | 0.378113 | 0.621887 |
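
The five rows above are consistent with the 88-residue parent sequence being cut into 40-residue windows that start every 10 residues (offsets 0, 10, 20, 30, and 40), with any trailing partial window dropped. The notebook does not state how the package splits sequences, so the following is only a sketch that reproduces the displayed sub-sequences. `sliding_windows`, `window`, and `step` are illustrative names, not part of TBD's API.

```python
def sliding_windows(sequence, window=40, step=10):
    """Return every full-length window, moving `step` residues at a time."""
    last_start = len(sequence) - window  # negative when the sequence is too short
    return [sequence[start:start + window]
            for start in range(0, last_start + 1, step)]


parent = ("MDAQTRRRERRAEKQAQWKAANPLLVGVSAKPVNRPILSLNRKPKSRVES"
          "ALNPIDLTVLAEYHKQIESNLQRIERKNQTWYSKPGER")

windows = sliding_windows(parent)
for offset, sub_sequence in zip(range(0, len(parent), 10), windows):
    print(offset, sub_sequence)

# Sequences shorter than one window produce no sub-sequences under this scheme.
short_windows = sliding_windows("MDAQTRRRERRAEKQAQWKAAN")
```

Under this reading, the two short inputs yield no windows, which would explain why only the long sequence appears in `df_pred`.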


### Save the predictions to a CSV file

In [7]:
df_pred.to_csv('saved_predictions.csv', header=True, index=False)
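
Once the predictions are saved, they can be read back and summarised per parent sequence. The sketch below uses only the standard library `csv` module and averages `prob_disordered` over all sub-sequences of each parent. Averaging is just one possible summary and is not something the package prescribes; `mean_disorder_by_parent` is a hypothetical helper. So that the example runs on its own, it first writes a stand-in file with the same four columns, rebuilt from the values displayed above.

```python
import csv
import os
import tempfile
from collections import defaultdict


def mean_disorder_by_parent(csv_path):
    """Average prob_disordered over all sub-sequences of each parent sequence."""
    values = defaultdict(list)
    with open(csv_path, newline="") as handle:
        for row in csv.DictReader(handle):
            # csv yields strings, so the probability is converted explicitly.
            values[row["parent_sequence"]].append(float(row["prob_disordered"]))
    return {parent: sum(probs) / len(probs) for parent, probs in values.items()}


parent = ("MDAQTRRRERRAEKQAQWKAANPLLVGVSAKPVNRPILSLNRKPKSRVES"
          "ALNPIDLTVLAEYHKQIESNLQRIERKNQTWYSKPGER")
# (offset of the sub-sequence in the parent, prob_disordered, prob_ordered)
rows = [
    (0, 0.606321, 0.393679),
    (10, 0.693179, 0.306821),
    (20, 0.301349, 0.698651),
    (30, 0.571889, 0.428111),
    (40, 0.378113, 0.621887),
]

with tempfile.TemporaryDirectory() as tmp_dir:
    # Stand-in for saved_predictions.csv, written to a temporary directory.
    csv_path = os.path.join(tmp_dir, "saved_predictions.csv")
    with open(csv_path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["parent_sequence", "sequence", "prob_disordered", "prob_ordered"])
        for start, p_disordered, p_ordered in rows:
            writer.writerow([parent, parent[start:start + 40], p_disordered, p_ordered])
    summary = mean_disorder_by_parent(csv_path)

for parent_sequence, mean_prob in summary.items():
    print(f"{parent_sequence[:20]}... mean prob_disordered = {mean_prob:.3f}")
```

In a pandas workflow the same summary could be computed directly on `df_pred` with a group-by on `parent_sequence`; the standard-library version is shown only to keep the sketch dependency-free.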