# Collagenase (Clostridium histolyticum) Digest of Collagen (Bovine)
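This project simulates the digestion of bovine collagen chains by collagenase from *Clostridium histolyticum*. For each chain, it reports the number of peptide fragments and the longest fragment.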
#### Sources
1. National Library of Medicine - National Center for Biotechnology Information (https://www.ncbi.nlm.nih.gov/)<br>
    1. Collagen, type I, alpha 2 [Bos taurus] (https://www.ncbi.nlm.nih.gov/protein/AAI49096.1)
    2. Collagen, type I, alpha 1 [Bos taurus] (https://www.ncbi.nlm.nih.gov/protein/AAI05185.1)
    3. Collagen, type III, alpha 1 [Bos taurus] (https://www.ncbi.nlm.nih.gov/protein/AAI23470.1)
2. Sigma-Aldrich (https://www.sigmaaldrich.com/US/en)
    1. Collagenase from Clostridium histolyticum (https://www.sigmaaldrich.com/US/en/product/sigma/c0130)
    2. See bottom of page for product information.
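
The sequences in this notebook are hardcoded from the NCBI protein records listed above. The sketch below shows one way to download those records as FASTA and convert them to the lowercase one-letter strings used here. It assumes network access and NCBI's E-utilities `efetch` endpoint. The helper names `fetch_protein_fasta` and `fasta_to_sequence` are hypothetical and are not part of the original project.

```python
import urllib.parse
import urllib.request

# NCBI E-utilities efetch endpoint (assumption: network access is available)
EFETCH_URL = 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi'

def fetch_protein_fasta(accession):
    """Download one protein record as FASTA text (hypothetical helper)."""
    query = urllib.parse.urlencode(
        {'db': 'protein', 'id': accession, 'rettype': 'fasta', 'retmode': 'text'})
    with urllib.request.urlopen(f'{EFETCH_URL}?{query}', timeout=30) as response:
        return response.read().decode('utf-8')

def fasta_to_sequence(fasta_text):
    """Drop '>' header lines and join the remaining lines into one lowercase string."""
    lines = fasta_text.strip().splitlines()
    return ''.join(line.strip() for line in lines if not line.startswith('>')).lower()

# Usage (requires network access):
# collagen_type_1_alpha_2 = fasta_to_sequence(fetch_protein_fasta('AAI49096.1'))
```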

In [1]:
# Split a sequence at every occurrence of a cut-site motif by inserting a space
# after the motif's first residue (e.g. 'psgp' -> 'p sgp').
def cut(seq, cs):  # seq: sequence; cs: cut-site motif
    return seq.replace(cs, cs[0] + ' ' + cs[1:])

# Apply every cut site to a sequence, then report the number of fragments and the
# length of the longest fragment.
def find_max_length(name, seq, cts):  # name: label for the printout; seq: sequence; cts: list of cut-site motifs
    digested = seq
    for cs in cts:  # cut the already-cut sequence with each motif in turn
        digested = cut(digested, cs)
    fragments = digested.split(' ')  # fragments are space-delimited
    longest = max(fragments, key=len)  # first fragment of maximal length
    print('{} post collagenase treatment:\n'
          '\tNumber of fragments: {}\n'
          '\tMaximum peptide length: {} amino acids'.format(name, len(fragments), len(longest)))

    return longest  # sequence of the longest fragment
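
`cut` places the break after the first residue of each motif, so `'psgp'` becomes `'p sgp'`. The sketch below is a hypothetical generalization that makes the break position explicit. The name `cut_at` and its `offset` parameter are not part of the original code. With `offset=1`, it reproduces the notebook's behavior. Other values move the break within the motif. For example, `offset=2` splits a `pXgp` motif between X and Gly.

```python
def cut_at(seq, cs, offset=1):
    """Insert a space `offset` residues into every non-overlapping occurrence of motif cs.

    offset=1 matches the notebook's cut(): 'psgp' -> 'p sgp'.
    """
    if not 0 < offset < len(cs):
        raise ValueError('offset must fall strictly inside the motif')
    return seq.replace(cs, cs[:offset] + ' ' + cs[offset:])

toy = 'gapsgpagk'                          # made-up sequence containing one 'psgp'
print(cut_at(toy, 'psgp').split(' '))      # ['gap', 'sgpagk']
print(cut_at(toy, 'psgp', 2).split(' '))   # ['gaps', 'gpagk']
```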

In [2]:
# amino acid sequences

# See Sources 1, A
collagen_type_1_alpha_2 = 'mlsfvdtrtllllavtsclatcqslqeatarkgpsgdrgprgergppgppgrdgddgipgppgppgppgppglggnfa\
aqfdakgggpgpmglmgprgppgasgapgpfqgppgepgepgqtgpagargppgppgkagedghpgkpgrpgergvvgpqgargfpgtpglpgfkgirghngldg\
lkgqpgapgvkgepgapgengtpgqtgarglpgergrvgapgpagargsdgsvgpvgpagpigsagppgfpgapgpkgelgpvgnpgpagpagprgevglpglsg\
pvgppgnpganglpgakgaaglpgvagapglpgprgipgpvgaagatgarglvgepgpagskgesgnkgepgavgqpgppgpsgeegkrgstgeigpagppgppg\
lrgnpgsrglpgadgragvmgpagsrgatgpagvrgpngdsgrpgepglmgprgfpgspgnigpagkegpvglpgidgrpgpigpagargepgnigfpgpkgpsg\
dpgkagekghaglagargapgpdgnngaqgppglqgvqggkgeqgpagppgfqglpgpagtageagkpgergipgefglpgpagargergppgesgaagptgpig\
srgpsgppgpdgnkgepgvvgapgtagpsgpsglpgergaagipggkgekgetglrgdigspgrdgargapgaigapgpagangdrgeagpagpagpagprgspg\
ergevgpagpngfagpagaagqpgakgergtkgpkgengpvgptgpvgaagpsgpngppgpagsrgdggppgatgfpgaagrtgppgpsgisgppgppgpagkeg\
lrgprgdqgpvgrsgetgasgppgfvgekgpsgepgtagppgtpgpqgllgapgflglpgsrgerglpgvagsvgepgplgiagppgargppgnvgnpgvngapg\
eagrdgnpgndgppgrdgqpghkgergypgnagpvgaagapgpqgpvgpvgkhgnrgepgpagavgpagavgprgpsgpqgirgdkgepgdkgprglpglkghng\
lqglpglaghhgdqgapgavgpagprgpagpsgpagkdgrigqpgavgpagirgsqgsqgpagppgppgppgppgpsgggyefgfdgdfyradqprsptslrpkd\
yevdatlkslnnqietlltpegsrknpartcrdlrlshpewssgyywidpnqgctmdaikvycdfstgetciraqpedipvknwyrnskakkhvwvgetinggtq\
feynvegvttkematqlafmrllanhasqnityhcknsiaymdeetgnlkkavilqgsndvelvaegnsrftytvlvdgcskktnewqktiieyktnkpsrlpil\
diapldiggadqeirlnigpvcfk'

# See Sources 1, B
collagen_type_1_alpha_1 = 'mfsfvdlrlllllaatallthgqeegqeegqeedippvtcvqnglryhdrdvwkpvpcqivcdngnvlcddvicdelk\
dcpnakvptdeccpvcpegqesptdqettgvegpkgdtgprgprgpagppgrdgipgqpglpgppgppgppgppglggnfapqlsygydekstgisvpgpmgpsg\
prglpgppgapgpqgfqgppgepgepgasgpmgprgppgppgkngddgeagkpgrpgergppgpqgarglpgtaglpgmkghrgfsgldgakgdagpagpkgepg\
spgengapgqmgprglpgergrpgapgpagargndgatgaagppgptgpagppgfpgavgakgeggpqgprgsegpqgvrgepgppgpagaagpagnpgadgqpg\
akgangapgiagapgfpgargpsgpqgpsgppgpkgnsgepgapgskgdtgakgepgptgiqgppgpageegkrgargepgpaglpgppgerggpgsrgfpgadg\
vagpkgpagergapgpagpkgspgeagrpgeaglpgakgltgspgspgpdgktgppgpagqdgrpgppgppgargqagvmgfpgpkgaagepgkagergvpgppg\
avgpagkdgeagaqgppgpagpagergeqgpagspgfqglpgpagppgeagkpgeqgvpgdlgapgpsgargergfpgergvqgppgpagprgangapgndgakg\
dagapgapgsqgapglqgmpgergaaglpgpkgdrgdagpkgadgapgkdgvrgltgpigppgpagapgdkgeagpsgpagptgargapgdrgepgppgpagfag\
ppgadgqpgakgepgdagakgdagppgpagpagppgpignvgapgpkgargsagppgatgfpgaagrvgppgpsgnagppgppgpagkegskgprgetgpagrpg\
evgppgppgpagekgapgadgpagapgtpgpqgiagqrgvvglpgqrgergfpglpgpsgepgkqgpsgasgergppgpmgppglagppgesgregapgaegspg\
rdgspgakgdrgetgpagppgapgapgapgpvgpagksgdrgetgpagpagpigpvgargpagpqgprgdkgetgeqgdrgikghrgfsglqgppgppgspgeqg\
psgasgpagprgppgsagspgkdglnglpgpigppgprgrtgdagpagppgppgppgppgppsggydlsflpqppqekahdggryyraddanvvrdrdlevdttl\
kslsqqienirspegsrknpartcrdlkmchsdwksgeywidpnqgcnldaikvfcnmetgetcvyptqpsvaqknwyisknpkekrhvwygesmtggfqfeygg\
qgsdpadvaiqltflrlmsteasqnityhcknsvaymdqqtgnlkkalllqgsneieiraegnsrftysvtydgctshtgawgktvieykttktsrlpiidvapl\
dvgapdqefgfdvgpacfl'

# See Sources 1, C
collagen_type_3_alpha_1 = 'mmsfvqkgtwllfallhptvilaqqeavdggcshlgqsyadrdvwkpepcqicvcdsgsvlcddiicddqeldcpnpe\
ipfgeccavcpqpptaptrppngqgpqgpkgdpgppgipgrngdpgppgspgspgspgppgicescptggqnyspqyeaydvksgvagggiagypgpagppgppg\
ppgtsghpgapgapgyqgppgepgqagpagppgppgaigpsgpagkdgesgrpgrpgergfpgppgmkgpagmpgfpgmkghrgfdgrngekgetgapglkgeng\
vpgengapgpmgprgapgergrpglpgaagargndgargsdgqpgppgppgtagfpgspgakgevgpagspgssgapgqrgepgpqghagapgppgppgsngspg\
gkgemgpagipgapgligargppgppgtngvpgqrgaagepgkngakgdpgprgergeagspgiagpkgedgkdgspgepganglpgaagergvpgfrgpagang\
lpgekgppgdrggpgpagprgvagepgrdglpggpglrgipgspggpgsdgkpgppgsqgetgrpgppgspgprgqpgvmgfpgpkgndgapgkngerggpggpg\
pqgpagkngetgpqgppgptgpsgdkgdtgppgpqglqglpgtsgppgengkpgepgpkgeagapgipggkgdsgapgergppgaggppgprggagppgpeggkg\
aagppgppgsagtpglqgmpgerggpggpgpkgdkgepgssgvdgapgkdgprgptgpigppgpagqpgdkgesgapgvpgiagprggpgergeqgppgpagfpg\
apgqngepgakgergapgekgeggppgaagpaggsgpagppgpqgvkgergspggpgaagfpggrgppgppgsngnpgppgssgapgkdgppgppgsngapgspg\
isgpkgdsgppgergapgpqgppgapgplgiagltgarglagppgmpgargspgpqgikgengkpgpsgqngergppgpqglpglagtagepgrdgnpgsdglpg\
rdgapgakgdrgengspgapgapghpgppgpvgpagksgdrgetgpagpsgapgpagsrgppgpqgprgdkgetgergamgikghrgfpgnpgapgspgpaghqg\
avgspgpagprgpvgpsgppgkdgasghpgpigppgprgnrgergsegspghpgqpgppgppgapgpccgaggvaaiagvgaekaggfapyygdepidfkintde\
imtslksvngqieslispdgsrknparncrdlkfchpelqsgeywvdpnqgckldaikvycnmetgetcisaspltipqknwwtdsgaekkhvwfgesmeggfqf\
sygnpelpedvldvqlaflrllssrasqnityhcknsiaymdhasgnvkkalklmgsnegefkaegnskftytvledgctkhtgewgktvfqyqtrkavrlpivd\
iapydiggpdqefgadigpvcfl'

# See Sources 2, B
# Cut-site motifs built from residues assumed to be neutral: serine (s), threonine (t),
# tyrosine (y), asparagine (n), cysteine (c), glutamine (q), and histidine (h).
# Each residue X appears in a Pro-X-Gly-Pro motif ('pXgp') and a Pro-Gly-X-Pro motif ('pgXp').
cutsites = ['psgp', 'ptgp', 'pygp', 'pngp', 'pcgp', 'pqgp', 'phgp',
            'pgsp', 'pgtp', 'pgyp', 'pgnp', 'pgcp', 'pgqp', 'pghp']
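
The 14 motifs follow two templates, `pXgp` and `pgXp`, each filled with every assumed neutral residue. The sketch below builds the same list, in the same order, from the one-letter codes. This makes it easier to add or drop a residue without editing both halves by hand. The names `NEUTRAL_RESIDUES` and `build_cutsites` are hypothetical.

```python
# One-letter codes of the residues the notebook assumes to be neutral (same order as cutsites)
NEUTRAL_RESIDUES = 'styncqh'

def build_cutsites(residues):
    """Return Pro-X-Gly-Pro motifs followed by Pro-Gly-X-Pro motifs for each residue X."""
    pxgp = ['p' + x + 'gp' for x in residues]
    pgxp = ['pg' + x + 'p' for x in residues]
    return pxgp + pgxp

generated = build_cutsites(NEUTRAL_RESIDUES)
print(len(generated))  # 14
print(generated[:3])   # ['psgp', 'ptgp', 'pygp']
```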

In [3]:
# Collagen type 1 alpha 1 analysis
find_max_length('Collagen I (alpha 1)', collagen_type_1_alpha_1, cutsites)

Collagen I (alpha 1) post collagenase treatment:
	Number of fragments: 14
	Maximum peptide length: 341 amino acids


'gspgeqgpsgasgpagprgppgsagspgkdglnglpgpigppgprgrtgdagpagppgppgppgppgppsggydlsflpqppqekahdggryyraddanvvrdrdlevdttlkslsqqienirspegsrknpartcrdlkmchsdwksgeywidpnqgcnldaikvfcnmetgetcvyptqpsvaqknwyisknpkekrhvwygesmtggfqfeyggqgsdpadvaiqltflrlmsteasqnityhcknsvaymdqqtgnlkkalllqgsneieiraegnsrftysvtydgctshtgawgktvieykttktsrlpiidvapldvgapdqefgfdvgpacfl'

In [4]:
# Collagen type 1 alpha 2 analysis
find_max_length('Collagen I (alpha 2)', collagen_type_1_alpha_2, cutsites)

Collagen I (alpha 2) post collagenase treatment:
	Number of fragments: 14
	Maximum peptide length: 308 amino acids


'sgpagkdgrigqpgavgpagirgsqgsqgpagppgppgppgppgpsgggyefgfdgdfyradqprsptslrpkdyevdatlkslnnqietlltpegsrknpartcrdlrlshpewssgyywidpnqgctmdaikvycdfstgetciraqpedipvknwyrnskakkhvwvgetinggtqfeynvegvttkematqlafmrllanhasqnityhcknsiaymdeetgnlkkavilqgsndvelvaegnsrftytvlvdgcskktnewqktiieyktnkpsrlpildiapldiggadqeirlnigpvcfk'

In [5]:
# Collagen type 3 alpha 1 analysis
find_max_length('Collagen III (alpha 1)', collagen_type_3_alpha_1, cutsites)

Collagen III (alpha 1) post collagenase treatment:
	Number of fragments: 21
	Maximum peptide length: 285 amino acids


'gqpgppgppgapgpccgaggvaaiagvgaekaggfapyygdepidfkintdeimtslksvngqieslispdgsrknparncrdlkfchpelqsgeywvdpnqgckldaikvycnmetgetcisaspltipqknwwtdsgaekkhvwfgesmeggfqfsygnpelpedvldvqlaflrllssrasqnityhcknsiaymdhasgnvkkalklmgsnegefkaegnskftytvledgctkhtgewgktvfqyqtrkavrlpivdiapydiggpdqefgadigpvcfl'
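
`str.replace` only finds non-overlapping occurrences of a motif. The Collagen III sequence contains the run `pgspgspgsp`, which holds three overlapping `pgsp` motifs. `cut` splits only the first and third of them, so the Collagen III fragment count above may be an undercount under the notebook's own motif rules.

The sketch below locates every occurrence with a zero-width regex lookahead and cuts after the first residue, as `cut` does. The function name `digest_overlapping` is hypothetical. For sequences without overlapping repeats, it should give the same fragments as applying `cut` motif by motif.

```python
import re

def digest_overlapping(seq, motifs, offset=1):
    """Split seq `offset` residues into every motif occurrence, overlapping ones included."""
    # (?=...) matches with zero width, so scanning resumes one character later
    # and overlapping occurrences such as 'pgspgsp' are all reported.
    pattern = re.compile('(?=(?:' + '|'.join(map(re.escape, motifs)) + '))')
    cuts = sorted({m.start() + offset for m in pattern.finditer(seq)})
    bounds = [0] + cuts + [len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:])]

run = 'pgspgspgsp'
print(run.replace('pgsp', 'p gsp').split(' '))  # ['p', 'gspgsp', 'gsp']
print(digest_overlapping(run, ['pgsp']))         # ['p', 'gsp', 'gsp', 'gsp']
```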

#### Product Information
![Sigma-Aldrich collagenase product information, part 1](attachment:image-2.png)
![Sigma-Aldrich collagenase product information, part 2](attachment:image-3.png)