#############################
#- Rg qualification report -#
#############################

This notebook works on the parsed Crysol output. It reads the files in the Rg folder, specifically "-Rg-Dmax-mean.txt" and "rg.lits_ens", and also takes the all_fasta_sequences file as input.
The theoretical Rg value describing natively folded behavior was calculated using the Wilkins formula (Wilkins et al. 1999).
The theoretical Rg value describing intrinsically disordered behavior was calculated using the Wilkins formula (Wilkins et al. 1999), written as the power law Rg = R0 * N^v with R0 = 2.54 and v = 0.522.
The length of each ensemble was taken as the sum of all its chain lengths, calculated from their sequences.
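
For reference, the two limits can be written compactly as below. This is a sketch read off the constants in the code cells that follow, not a statement taken from the cited paper: the prefactor sqrt(0.6) in RgNF is written here as sqrt(3/5), the standard ratio between the radius of gyration and the radius of a uniform sphere, and that reading is an assumption. Note that the RgIDP cell codes the exponent as 0.52 rather than the quoted 0.522.

```latex
R_g^{\mathrm{IDP}}(N) = R_0 \, N^{\nu}, \qquad R_0 = 2.54, \quad \nu = 0.522

R_g^{\mathrm{NF}}(N) = \sqrt{\tfrac{3}{5}} \cdot 4.75 \, N^{0.29}
```

Here N is the total number of residues in the ensemble, summed over all chains.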

In [187]:
###########################################################
##- Theoretical Rg for intrinsically disordered protein -##
def RgIDP(N):
    """
    Take a sequence length N (a number) and return the theoretical Rg
    for an intrinsically disordered protein (IDP): Rg = R0 * N**v.
    """
    try:
        ## R0 = 2.54. NOTE: the exponent is coded as 0.52, while the text ##
        ## quotes v = 0.522; the printed results correspond to 0.52.      ##
        RG = (N**0.52)*2.54
        return RG
    except TypeError:
        print('Something went wrong: check your data, only numbers are accepted!')

In [188]:
##########################################
##- Theoretical Rg for natively folded -##
import math

def RgNF(N):
    """
    Take a sequence length N (a number) and return the theoretical Rg
    for a natively folded protein: Rg = sqrt(0.6) * 4.75 * N**0.29.
    """
    try:
        A = math.sqrt(0.6)  # constant prefactor
        B = N**0.29         # length scaling for the folded state
        RG = A*4.75*B
        return RG
    except TypeError:
        print('Something went wrong: check your data, only numbers are accepted!')

In [192]:
########################################################
##- Theoretical RgNF and RgIDP for a list of lengths -##
def limits(list_length):
    """
    Take a list of sequence lengths (list of numbers) and return two lists:
    the theoretical Rg for IDP first, then the theoretical Rg for native
    folding, in the same order as the input.
    """
    try:
        # compute both limits for every length in the list
        RgNFs = []
        RgIDPs = []
        for i in list_length:
            RgIDPs.append(RgIDP(i))
            RgNFs.append(RgNF(i))
        return RgIDPs,RgNFs
    except TypeError:
        print('Something went wrong: check your list, only lists of numbers are accepted as input!')
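
The functions above print a message and return None when the input is not numeric, so a bad value can pass silently into later cells. A minimal sketch of a stricter variant is shown below. The name `limits_strict` and the module-level constants are hypothetical and not part of the original notebook; the constants copy the values coded above (including the exponent 0.52).

```python
import math

# Constants copied from the cells above (hypothetical names).
R0, NU = 2.54, 0.52                    # IDP power law as coded in RgIDP
NF_PREFACTOR = math.sqrt(0.6) * 4.75   # native-fold prefactor as coded in RgNF
NF_EXPONENT = 0.29


def limits_strict(lengths):
    """Return (rg_idp, rg_nf) lists; raise on non-numeric or negative input."""
    rg_idp, rg_nf = [], []
    for n in lengths:
        # bool is a subclass of int, so reject it explicitly
        if isinstance(n, bool) or not isinstance(n, (int, float)):
            raise TypeError(f"length must be a number, got {n!r}")
        if n < 0:
            raise ValueError(f"length must be non-negative, got {n}")
        rg_idp.append(R0 * n ** NU)
        rg_nf.append(NF_PREFACTOR * n ** NF_EXPONENT)
    return rg_idp, rg_nf


idp, nf = limits_strict([366, 825])
print(idp, nf)
```

Raising instead of printing makes the failing value visible at the point where it occurs; whether that is preferable to the permissive behavior above depends on how the notebook is run.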

In [9]:
#########################
#- importing libraries -#
import pandas as pd

###################
#- setting paths -#
path_main = '/home/anajulia/Be_project/Data'
path_data = path_main + '/PED-DB3'

In [10]:
########################
#- reading data paths -#
df = pd.read_csv(path_main + '/general_paths_to_data', sep='\t')
PEDs = list(df)  # column names of the table

In [186]:
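###############################################
##--- getting the ensemble sum of lengths ---##
list_sums = []
for i in PEDs:
    # row 1 of each column holds the FASTA path, stored as a stringified list
    fasta_file = df[i].tolist()[1].replace("['","").replace("']","")
    lengths_sums = 0
    if fasta_file != '[]':  # '[]' means no FASTA file is listed for this entry
        with open(fasta_file,'r') as miarch:
            data = miarch.read().split('\n>')
            for j in data:
                # join every line after the header so wrapped sequences are counted in full
                lengths_sums = lengths_sums + len(''.join(j.split('\n')[1:]))
    list_sums.append(lengths_sums)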
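
The length-summing step can be isolated and checked without touching the file system. The sketch below is an assumption-laden illustration, not the notebook's code: `sum_chain_lengths` is a hypothetical helper that takes FASTA text as a string, skips header lines starting with ">", and counts every remaining character, so sequences wrapped over several lines are handled.

```python
def sum_chain_lengths(fasta_text):
    """Sum residue counts over all records in a FASTA-formatted string."""
    total = 0
    for line in fasta_text.splitlines():
        line = line.strip()
        # header lines start with '>'; blank lines carry no residues
        if line and not line.startswith(">"):
            total += len(line)
    return total


# Two made-up chains: the first is wrapped over two lines (5 + 2), the second has 3 residues.
example = ">chain_A\nMKVLA\nGG\n>chain_B\nTTP\n"
print(sum_chain_lengths(example))  # 10
```

To use it on a file, the text could be read with `open(path).read()` and passed in; an empty string yields 0, which matches how entries without a FASTA file are treated above.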

In [195]:
limits(list_sums)

([54.68193584534022,
  40.75929380327368,
  0.0,
  32.92841613551616,
  47.03964215687948,
  33.78664571030295,
  83.44297312296266,
  55.5304533071169,
  23.30709025549593,
  83.44297312296266,
  0.0],
 [20.37880668516606,
  17.29846638445664,
  0.0,
  15.357982552566176,
  18.73772660496374,
  15.57994626502974,
  25.795317628791203,
  20.554561981186154,
  12.665844619674566,
  25.795317628791203,
  0.0])
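
The 0.0 entries in the output come from entries whose summed length is 0 (no FASTA file listed), not from a real prediction. A minimal sketch of one way to mark them as missing before any comparison is given below; `mask_missing` is a hypothetical helper and the lengths are made-up illustration values, not the notebook's data.

```python
def mask_missing(lengths, rg_values):
    """Replace Rg values whose source length is 0 with None."""
    return [rg if n > 0 else None for n, rg in zip(lengths, rg_values)]


lengths = [100, 0, 250]                       # illustrative; 0 marks a missing FASTA
rg_idp = [2.54 * n ** 0.52 for n in lengths]  # same expression as coded in RgIDP
masked = mask_missing(lengths, rg_idp)
print(masked)
```

Using None (or NaN in a pandas column) keeps the missing entries from being mistaken for a radius of zero in later statistics or plots.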