# Collagenase (Clostridium histolyticum) Digest of Collagen Capsules

## Sequence Data
### National Library of Medicine - National Center for Biotechnology Information 
https://www.ncbi.nlm.nih.gov/
1. Collagen, type I, alpha 2 [Bos taurus] (https://www.ncbi.nlm.nih.gov/protein/AAI49096.1)
2. Collagen, type I, alpha 1 [Bos taurus] (https://www.ncbi.nlm.nih.gov/protein/AAI05185.1)
3. Collagen, type III, alpha 1 [Bos taurus] (https://www.ncbi.nlm.nih.gov/protein/AAI23470.1)

In [1]:
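# Split a sequence at every occurrence of one cut-site motif by inserting a
# space after the motif's first residue (e.g. 'psgp' -> 'p sgp').
# Note: str.replace only matches non-overlapping occurrences, so a motif that
# overlaps itself (e.g. 'psgpsgp') is cut only once per call.
def cut(seq, cs):
    return seq.replace(cs, cs[0] + ' ' + cs[1:])

# Apply every cut site in turn, print the fragment count and the length of the
# longest fragment, and return the longest fragment.
def find_max_length(name, seq, cts):
    answer = seq
    for cs in cts:
        answer = cut(answer, cs)
    answer_list = answer.split(' ')
    # max() returns the first longest fragment, matching a stable descending sort
    res = max(answer_list, key=len)
    print('{} post collagenase treatment:\n\tNumber of fragments: {}\n\tMaximum peptide length: {} amino acids'.format(name, len(answer_list), len(res)))
    return res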
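
The `cut` function above relies on `str.replace`, which skips occurrences of a motif that overlap a previous match of the same motif. As a hedged alternative (not the method used in this notebook), the sketch below locates every motif occurrence with a zero-width regex lookahead and then splits once at the collected positions. The names `cut_positions` and `split_at` are hypothetical helpers introduced here for illustration; fragment counts from this approach may differ from the notebook's reported outputs if a sequence contains self-overlapping motifs.

```python
import re

def cut_positions(seq, cut_sites):
    """Return sorted indices at which seq would be cleaved.

    Each motif is cleaved after its first residue, so a motif starting
    at index i yields a cut position of i + 1.
    """
    positions = set()
    for cs in cut_sites:
        # The lookahead consumes no characters, so overlapping matches are found.
        for match in re.finditer('(?={})'.format(re.escape(cs)), seq):
            positions.add(match.start() + 1)
    return sorted(positions)

def split_at(seq, positions):
    """Split seq into fragments at the given sorted cut positions."""
    bounds = [0] + list(positions) + [len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:])]

# Toy sequence with two overlapping 'psgp' motifs
toy = 'apsgpsgpb'
fragments = split_at(toy, cut_positions(toy, ['psgp']))
print(fragments)
```

Collecting positions first and splitting once keeps the cut logic independent of the order in which the motifs are listed.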

In [2]:
# amino acid sequence data
collagen_type_1_alpha_2 = 'mlsfvdtrtllllavtsclatcqslqeatarkgpsgdrgprgergppgppgrdgddgipgppgppgppgppglggnfaaqfdakgggpgpmglmgprgppgasgapgpfqgppgepgepgqtgpagargppgppgkagedghpgkpgrpgergvvgpqgargfpgtpglpgfkgirghngldglkgqpgapgvkgepgapgengtpgqtgarglpgergrvgapgpagargsdgsvgpvgpagpigsagppgfpgapgpkgelgpvgnpgpagpagprgevglpglsgpvgppgnpganglpgakgaaglpgvagapglpgprgipgpvgaagatgarglvgepgpagskgesgnkgepgavgqpgppgpsgeegkrgstgeigpagppgppglrgnpgsrglpgadgragvmgpagsrgatgpagvrgpngdsgrpgepglmgprgfpgspgnigpagkegpvglpgidgrpgpigpagargepgnigfpgpkgpsgdpgkagekghaglagargapgpdgnngaqgppglqgvqggkgeqgpagppgfqglpgpagtageagkpgergipgefglpgpagargergppgesgaagptgpigsrgpsgppgpdgnkgepgvvgapgtagpsgpsglpgergaagipggkgekgetglrgdigspgrdgargapgaigapgpagangdrgeagpagpagpagprgspgergevgpagpngfagpagaagqpgakgergtkgpkgengpvgptgpvgaagpsgpngppgpagsrgdggppgatgfpgaagrtgppgpsgisgppgppgpagkeglrgprgdqgpvgrsgetgasgppgfvgekgpsgepgtagppgtpgpqgllgapgflglpgsrgerglpgvagsvgepgplgiagppgargppgnvgnpgvngapgeagrdgnpgndgppgrdgqpghkgergypgnagpvgaagapgpqgpvgpvgkhgnrgepgpagavgpagavgprgpsgpqgirgdkgepgdkgprglpglkghnglqglpglaghhgdqgapgavgpagprgpagpsgpagkdgrigqpgavgpagirgsqgsqgpagppgppgppgppgpsgggyefgfdgdfyradqprsptslrpkdyevdatlkslnnqietlltpegsrknpartcrdlrlshpewssgyywidpnqgctmdaikvycdfstgetciraqpedipvknwyrnskakkhvwvgetinggtqfeynvegvttkematqlafmrllanhasqnityhcknsiaymdeetgnlkkavilqgsndvelvaegnsrftytvlvdgcskktnewqktiieyktnkpsrlpildiapldiggadqeirlnigpvcfk'
collagen_type_1_alpha_1 = 'mfsfvdlrlllllaatallthgqeegqeegqeedippvtcvqnglryhdrdvwkpvpcqivcdngnvlcddvicdelkdcpnakvptdeccpvcpegqesptdqettgvegpkgdtgprgprgpagppgrdgipgqpglpgppgppgppgppglggnfapqlsygydekstgisvpgpmgpsgprglpgppgapgpqgfqgppgepgepgasgpmgprgppgppgkngddgeagkpgrpgergppgpqgarglpgtaglpgmkghrgfsgldgakgdagpagpkgepgspgengapgqmgprglpgergrpgapgpagargndgatgaagppgptgpagppgfpgavgakgeggpqgprgsegpqgvrgepgppgpagaagpagnpgadgqpgakgangapgiagapgfpgargpsgpqgpsgppgpkgnsgepgapgskgdtgakgepgptgiqgppgpageegkrgargepgpaglpgppgerggpgsrgfpgadgvagpkgpagergapgpagpkgspgeagrpgeaglpgakgltgspgspgpdgktgppgpagqdgrpgppgppgargqagvmgfpgpkgaagepgkagergvpgppgavgpagkdgeagaqgppgpagpagergeqgpagspgfqglpgpagppgeagkpgeqgvpgdlgapgpsgargergfpgergvqgppgpagprgangapgndgakgdagapgapgsqgapglqgmpgergaaglpgpkgdrgdagpkgadgapgkdgvrgltgpigppgpagapgdkgeagpsgpagptgargapgdrgepgppgpagfagppgadgqpgakgepgdagakgdagppgpagpagppgpignvgapgpkgargsagppgatgfpgaagrvgppgpsgnagppgppgpagkegskgprgetgpagrpgevgppgppgpagekgapgadgpagapgtpgpqgiagqrgvvglpgqrgergfpglpgpsgepgkqgpsgasgergppgpmgppglagppgesgregapgaegspgrdgspgakgdrgetgpagppgapgapgapgpvgpagksgdrgetgpagpagpigpvgargpagpqgprgdkgetgeqgdrgikghrgfsglqgppgppgspgeqgpsgasgpagprgppgsagspgkdglnglpgpigppgprgrtgdagpagppgppgppgppgppsggydlsflpqppqekahdggryyraddanvvrdrdlevdttlkslsqqienirspegsrknpartcrdlkmchsdwksgeywidpnqgcnldaikvfcnmetgetcvyptqpsvaqknwyisknpkekrhvwygesmtggfqfeyggqgsdpadvaiqltflrlmsteasqnityhcknsvaymdqqtgnlkkalllqgsneieiraegnsrftysvtydgctshtgawgktvieykttktsrlpiidvapldvgapdqefgfdvgpacfl'
collagen_type_3_alpha_1 = 'mmsfvqkgtwllfallhptvilaqqeavdggcshlgqsyadrdvwkpepcqicvcdsgsvlcddiicddqeldcpnpeipfgeccavcpqpptaptrppngqgpqgpkgdpgppgipgrngdpgppgspgspgspgppgicescptggqnyspqyeaydvksgvagggiagypgpagppgppgppgtsghpgapgapgyqgppgepgqagpagppgppgaigpsgpagkdgesgrpgrpgergfpgppgmkgpagmpgfpgmkghrgfdgrngekgetgapglkgengvpgengapgpmgprgapgergrpglpgaagargndgargsdgqpgppgppgtagfpgspgakgevgpagspgssgapgqrgepgpqghagapgppgppgsngspggkgemgpagipgapgligargppgppgtngvpgqrgaagepgkngakgdpgprgergeagspgiagpkgedgkdgspgepganglpgaagergvpgfrgpaganglpgekgppgdrggpgpagprgvagepgrdglpggpglrgipgspggpgsdgkpgppgsqgetgrpgppgspgprgqpgvmgfpgpkgndgapgkngerggpggpgpqgpagkngetgpqgppgptgpsgdkgdtgppgpqglqglpgtsgppgengkpgepgpkgeagapgipggkgdsgapgergppgaggppgprggagppgpeggkgaagppgppgsagtpglqgmpgerggpggpgpkgdkgepgssgvdgapgkdgprgptgpigppgpagqpgdkgesgapgvpgiagprggpgergeqgppgpagfpgapgqngepgakgergapgekgeggppgaagpaggsgpagppgpqgvkgergspggpgaagfpggrgppgppgsngnpgppgssgapgkdgppgppgsngapgspgisgpkgdsgppgergapgpqgppgapgplgiagltgarglagppgmpgargspgpqgikgengkpgpsgqngergppgpqglpglagtagepgrdgnpgsdglpgrdgapgakgdrgengspgapgapghpgppgpvgpagksgdrgetgpagpsgapgpagsrgppgpqgprgdkgetgergamgikghrgfpgnpgapgspgpaghqgavgspgpagprgpvgpsgppgkdgasghpgpigppgprgnrgergsegspghpgqpgppgppgapgpccgaggvaaiagvgaekaggfapyygdepidfkintdeimtslksvngqieslispdgsrknparncrdlkfchpelqsgeywvdpnqgckldaikvycnmetgetcisaspltipqknwwtdsgaekkhvwfgesmeggfqfsygnpelpedvldvqlaflrllssrasqnityhcknsiaymdhasgnvkkalklmgsnegefkaegnskftytvledgctkhtgewgktvfqyqtrkavrlpivdiapydiggpdqefgadigpvcfl'
# cut-site motifs; cut() cleaves each one after its first residue (p|Xgp or p|gXp)
cutsites = ['psgp', 'ptgp', 'pygp', 'pngp', 'pcgp', 'pqgp', 'pgsp', 'pgtp', 'pgyp', 'pgnp', 'pgcp', 'pgqp']
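
The twelve motifs in `cutsites` follow two patterns, `p-X-g-p` and `p-g-X-p`, where `X` is one of `s`, `t`, `y`, `n`, `c`, or `q`. As a small sketch, the same list can be generated rather than typed out, which makes the pattern explicit and guards against typos. The names `variable_residues` and `generated_cutsites` are hypothetical and not part of the original notebook.

```python
# Residues that occupy the variable position X in the motifs
variable_residues = 'stynCq'.lower()

# Build the p-X-g-p motifs first, then the p-g-X-p motifs,
# in the same order as the hand-written list
generated_cutsites = (
    ['p' + x + 'gp' for x in variable_residues]
    + ['pg' + x + 'p' for x in variable_residues]
)

print(generated_cutsites)
```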

In [3]:
# Collagen type 1 alpha 1 analysis
find_max_length('Collagen I (alpha 1)', collagen_type_1_alpha_1, cutsites)

Collagen I (alpha 1) post collagenase treatment:
	Number of fragments: 14
	Maximum peptide length: 341 amino acids


'gspgeqgpsgasgpagprgppgsagspgkdglnglpgpigppgprgrtgdagpagppgppgppgppgppsggydlsflpqppqekahdggryyraddanvvrdrdlevdttlkslsqqienirspegsrknpartcrdlkmchsdwksgeywidpnqgcnldaikvfcnmetgetcvyptqpsvaqknwyisknpkekrhvwygesmtggfqfeyggqgsdpadvaiqltflrlmsteasqnityhcknsvaymdqqtgnlkkalllqgsneieiraegnsrftysvtydgctshtgawgktvieykttktsrlpiidvapldvgapdqefgfdvgpacfl'

In [4]:
# Collagen type 1 alpha 2 analysis
find_max_length('Collagen I (alpha 2)',collagen_type_1_alpha_2, cutsites)

Collagen I (alpha 2) post collagenase treatment:
	Number of fragments: 14
	Maximum peptide length: 308 amino acids


'sgpagkdgrigqpgavgpagirgsqgsqgpagppgppgppgppgpsgggyefgfdgdfyradqprsptslrpkdyevdatlkslnnqietlltpegsrknpartcrdlrlshpewssgyywidpnqgctmdaikvycdfstgetciraqpedipvknwyrnskakkhvwvgetinggtqfeynvegvttkematqlafmrllanhasqnityhcknsiaymdeetgnlkkavilqgsndvelvaegnsrftytvlvdgcskktnewqktiieyktnkpsrlpildiapldiggadqeirlnigpvcfk'

In [5]:
# Collagen type 3 alpha 1 analysis
find_max_length('Collagen III (alpha 1)',collagen_type_3_alpha_1, cutsites)

Collagen III (alpha 1) post collagenase treatment:
	Number of fragments: 19
	Maximum peptide length: 285 amino acids


'gqpgppgppgapgpccgaggvaaiagvgaekaggfapyygdepidfkintdeimtslksvngqieslispdgsrknparncrdlkfchpelqsgeywvdpnqgckldaikvycnmetgetcisaspltipqknwwtdsgaekkhvwfgesmeggfqfsygnpelpedvldvqlaflrllssrasqnityhcknsiaymdhasgnvkkalklmgsnegefkaegnskftytvledgctkhtgewgktvfqyqtrkavrlpivdiapydiggpdqefgadigpvcfl'
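
The analyses above report only the longest fragment for each chain. A minimal sketch of how the full distribution of fragment lengths could be listed, using the same `str.replace` cutting rule as `cut`, is shown below. The function name `fragment_lengths` is hypothetical, and the example runs on a short toy sequence rather than the collagen data, so no new results are implied.

```python
def fragment_lengths(seq, cut_sites):
    """Return fragment lengths, longest first, after cutting at every motif."""
    for cs in cut_sites:
        # Same rule as cut(): insert a break after the motif's first residue
        seq = seq.replace(cs, cs[0] + ' ' + cs[1:])
    return sorted((len(fragment) for fragment in seq.split(' ')), reverse=True)

# Toy sequence containing one 'psgp' site and one 'pgtp' site
toy_lengths = fragment_lengths('aapsgpbbbpgtpc', ['psgp', 'pgtp'])
print(toy_lengths)
```

Because cutting only inserts breaks, the fragment lengths always sum to the length of the input sequence, which gives a quick consistency check.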