# Examples of how to use LICHEN

First set up the environment as described in the README. Then store the model weights locally; here they are stored in "/LICHEN/model/model_weights.pt".

In [1]:
# Load package
import pandas as pd
import os
from lichen import LICHEN

Now we load the LICHEN model. We can request a GPU or a CPU, and set the number of CPUs.

In [2]:
# Load the model (removesuffix requires Python 3.9+)
model_path = os.path.join(os.getcwd().removesuffix('notebooks'), 'model', 'model_weights.pt')  # change to locally stored model path
lichen_model = LICHEN(model_path, cpu=True, ncpu=4)

Using device CPU with 4 CPUs
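
When deriving the weights path from the working directory, keep in mind that `str.strip` removes a set of characters from both ends, not a directory name. The sketch below is one possible way to resolve the path with `pathlib`; the helper name `weights_path` and the assumption that the notebook lives in a `notebooks` folder next to `model` are illustrative and not part of LICHEN.

```python
from pathlib import Path


def weights_path(cwd, subdir="notebooks"):
    """Return <repo>/model/model_weights.pt for a cwd that is the repo root or its notebooks folder.

    Hypothetical helper: the directory layout is an assumption based on the path used above.
    """
    cwd = Path(cwd)
    repo_root = cwd.parent if cwd.name == subdir else cwd
    return repo_root / "model" / "model_weights.pt"


print(weights_path("/LICHEN/notebooks"))
print(weights_path("/LICHEN"))

# str.strip treats its argument as a character set, so unrelated directory names can be eaten:
print("/work/notes".strip("notebooks"))  # /work/
```

The resulting path can be passed to `LICHEN(...)` as a string via `str(weights_path(os.getcwd()))`.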


## 1. Generating light sequences for one heavy sequence
Light sequences can be generated directly for a single heavy sequence using the light_generation function.  

This function takes the following **required** parameters as input:  
**input**: the heavy sequence  

And the following **optional** parameters:    
**germline_seed**: Light chain type, V-gene family, or V-gene to use, provided as a list; multiple entries are allowed (e.g. ['IGKV1', 'IGKV2'] or ['IGKV1', 'K']). When multiple entries are provided, one of them is chosen at random as the seed.  
**custom_seed**: Custom seed to use. Provided as string.  
**cdrs**: Known CDRL1, CDRL2, and CDRL3 sequences to use as additional information. Provided as a list of length three, with None for unknown CDRs (e.g. [None, None, 'QRYNRAPYT'] if only the CDRL3 is known).  
**numbering_scheme**: CDR definition (numbering scheme) used when CDRs are provided. Either 'IMGT' or 'Kabat'.  
**n**: Number of light sequences requested per heavy sequence.  
**filtering**: Filtering methods to apply. Provided in a list (e.g. ['ANARCII'])   
**verbose**: Enable verbose output.
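
Since **cdrs** must always have exactly three entries, a small helper can make calls less error-prone. The following is a minimal sketch, not part of LICHEN: `make_cdrs` is a hypothetical name, and the check against the 20 standard amino acids is an assumption about what counts as a valid CDR string.

```python
VALID_RESIDUES = set("ACDEFGHIKLMNPQRSTVWY")  # 20 standard amino acids (assumed alphabet)


def make_cdrs(cdrl1=None, cdrl2=None, cdrl3=None):
    """Build the three-element `cdrs` list, using None for CDRs that are not known."""
    cdrs = [cdrl1, cdrl2, cdrl3]
    for name, cdr in zip(("CDRL1", "CDRL2", "CDRL3"), cdrs):
        if cdr is None:
            continue
        if not isinstance(cdr, str) or not cdr or not set(cdr) <= VALID_RESIDUES:
            raise ValueError(f"{name} must be a non-empty amino-acid string, got {cdr!r}")
    return cdrs


print(make_cdrs(cdrl3="QRYNRAPYT"))  # [None, None, 'QRYNRAPYT']
```

The returned list can be passed directly as the `cdrs` argument.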

In [3]:
# Generate 2 light sequences for the heavy sequence
lichen_model.light_generation('EVQLLESGGEVKKPGASVKVSCRASGYTFRNYGLTWVRQAPGQGLEWMGWISAYNGNTNYAQKFQGRVTLTTDTSTSTAYMELRSLRSDDTAVYFCARDVPGHGAAFMDVWGTGTTVTVSS', 
                              n=2)

['DIQMTQSPSTLSASVGDRVTITCRASERVDGYLAWYQHRPGRAPRLLISRVSRLHDGVPPRFTGRRSETDFSLTIDDLESEDSATYYCQHSRDSSSSSGRSVTFGPGTRLDIK',
 'DIQMTQSPSTLSASIGDRVTITCRASETVDDWVAWYQHRPGRVPKVLIHDTSRLHRGVPSRFRGRRSESQTEYTLTIRRLEPDDSATYFCQHHRDSSRYSSRFGGGTRVEIK']

In [None]:
# Generate 2 light sequences from either the IGKV1 or IGKV3 family for the heavy sequence
lichen_model.light_generation('EVQLLESGGEVKKPGASVKVSCRASGYTFRNYGLTWVRQAPGQGLEWMGWISAYNGNTNYAQKFQGRVTLTTDTSTSTAYMELRSLRSDDTAVYFCARDVPGHGAAFMDVWGTGTTVTVSS', 
                              n=2, 
                              germline_seed=['IGKV1', 'IGKV3'])

['DIQLTQSPSSLSASVGDRVTITCRASENIRDSLHWYQHKSGTAPRVLISSTFRLRDGVPSRFRGRRSETDFSLTITDLQSDDSGTYYCQQTHRSPQTFGQGTRVEMK',
 'DIQMTQSPSSVSASVGDRVTITCRASEGVRRSLVWYQHRPGRAPRLLVSDAFRLRGGVPSRFTGRRSETNFTLTISNLQPEDSATYYCQQSRRSPHSFGQGTKLEIK']

In [7]:
# Generate a lambda light sequence for the heavy sequence
lichen_model.light_generation('EVQLLESGGEVKKPGASVKVSCRASGYTFRNYGLTWVRQAPGQGLEWMGWISAYNGNTNYAQKFQGRVTLTTDTSTSTAYMELRSLRSDDTAVYFCARDVPGHGAAFMDVWGTGTTVTVSS',
                              germline_seed=['L'])


['QPALTQPRSVSGSPGQSVTISCTGTRNDIGDFDSVSWYQHRPGTVPRVMIFEVRRRPSGVPDRFSGSKSGNTASLTISGLQTDDEGEYFCCSFTDRDRRMFGGGTELTVL']

In [9]:
# Generate a light sequence starting with 'VIWMTQ' for the heavy sequence
lichen_model.light_generation('EVQLLESGGEVKKPGASVKVSCRASGYTFRNYGLTWVRQAPGQGLEWMGWISAYNGNTNYAQKFQGRVTLTTDTSTSTAYMELRSLRSDDTAVYFCARDVPGHGAAFMDVWGTGTTVTVSS',
                              custom_seed='VIWMTQ')

['VIWMTQSPSLVSASTGDRVTISCRMSQDIGNSLAWYQHRPGRPPKLLISSTSNLHTGVPSRFSGSGSGTDFTLTISRLESEDSATYFCQQYSDSPRTFGPGTRVEVK']

In [None]:
# Generate a light sequence with 'QSIGSS' as CDRL1 and 'YAS' as CDRL2 using the IMGT definition for the heavy sequence
lichen_model.light_generation('EVQLLESGGEVKKPGASVKVSCRASGYTFRNYGLTWVRQAPGQGLEWMGWISAYNGNTNYAQKFQGRVTLTTDTSTSTAYMELRSLRSDDTAVYFCARDVPGHGAAFMDVWGTGTTVTVSS',
                              cdrs=['QSIGSS', 'YAS', None])

['EIVMTQSPVTLSVSPGERATLSCRASQSIGSSLAWYQHRPGQPPRLLMYYASSRATDTPDRFTGTGSGTDFTLTISSLESEDSAVYFCQHRRHSREATFGPGTRVDIK']

In [4]:
# Generate a light sequence with 'QRYNRAPYT' as CDRL3 using the Kabat CDR definition for the heavy sequence
lichen_model.light_generation('EVQLLESGGEVKKPGASVKVSCRASGYTFRNYGLTWVRQAPGQGLEWMGWISAYNGNTNYAQKFQGRVTLTTDTSTSTAYMELRSLRSDDTAVYFCARDVPGHGAAFMDVWGTGTTVTVSS',
                              cdrs=[None, None, 'QRYNRAPYT'],
                              numbering_scheme='Kabat')

['DIQMTQSPSSVSASVGDRVTITCRASQGVRRSLAWYQQRPGRAPKVLISAASRLHTGVPSRFSGSGSGTDFTLTITGLQPEDSATYFCQRYNRAPYTFGQGTKVEVK']
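
A quick sanity check can confirm that a generated sequence carries the requested seed or CDRs. The sketch below uses plain substring matching on outputs shown above; `honours_constraints` is a hypothetical helper, and substring matching is only an approximation, since it does not verify that a motif sits in the correct numbered CDR region.

```python
def honours_constraints(light, custom_seed=None, cdrs=(None, None, None)):
    """Return True if `light` starts with the seed and contains every CDR that was given."""
    if custom_seed is not None and not light.startswith(custom_seed):
        return False
    return all(cdr is None or cdr in light for cdr in cdrs)


# Outputs copied from the custom_seed and Kabat CDRL3 examples above
seeded = 'VIWMTQSPSLVSASTGDRVTISCRMSQDIGNSLAWYQHRPGRPPKLLISSTSNLHTGVPSRFSGSGSGTDFTLTISRLESEDSATYFCQQYSDSPRTFGPGTRVEVK'
with_cdrl3 = 'DIQMTQSPSSVSASVGDRVTITCRASQGVRRSLAWYQQRPGRAPKVLISAASRLHTGVPSRFSGSGSGTDFTLTITGLQPEDSATYFCQRYNRAPYTFGQGTKVEVK'

print(honours_constraints(seeded, custom_seed='VIWMTQ'))             # True
print(honours_constraints(with_cdrl3, cdrs=[None, None, 'QRYNRAPYT']))  # True
```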

In [None]:
# Generate 2 light sequences for the heavy sequence that are non-redundant and can all be numbered by ANARCII
lichen_model.light_generation('EVQLLESGGEVKKPGASVKVSCRASGYTFRNYGLTWVRQAPGQGLEWMGWISAYNGNTNYAQKFQGRVTLTTDTSTSTAYMELRSLRSDDTAVYFCARDVPGHGAAFMDVWGTGTTVTVSS',
                              filtering=['redundancy', 'ANARCII'],
                              n=2)

['DIVMTQSPDSLAVSLGERATINCKSSQNVFHRSSKRSHVGWYQHKPGQPPRLLISWASTRDSGVPDRFSGSGSGTDFTLTISNLQPEDVAVYFCHQHHNIPHTFGGGTRVEIK',
 'DIQMTQSPSSLSASVGDRVTITCRASQDINTHLAWYQQKPGRPPRSLISTTSRLHDGVPSKFTGSRSGTDFTLTISGLQPEDSATYFCQQYDRHPRTFGQGTKLEIK']
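
To illustrate the idea behind a redundancy filter, the sketch below keeps sampling until `n` distinct sequences have been collected. It assumes that "redundant" means an exact duplicate; how LICHEN defines and implements its 'redundancy' filter is not described here, and `collect_unique` and the stand-in generator are hypothetical.

```python
from itertools import cycle


def collect_unique(generate_one, n, max_attempts=100):
    """Call `generate_one` until `n` distinct sequences are found or attempts run out."""
    unique = []
    seen = set()
    for _ in range(max_attempts):
        seq = generate_one()
        if seq not in seen:
            seen.add(seq)
            unique.append(seq)
        if len(unique) == n:
            break
    return unique


# Stand-in for a model call: a generator that repeats itself (toy fragments, not real output)
fake_outputs = cycle(["DIQMTQ", "DIQMTQ", "EIVLTQ", "DIQMTQ", "QSVLTQ"])
print(collect_unique(lambda: next(fake_outputs), n=3))  # ['DIQMTQ', 'EIVLTQ', 'QSVLTQ']
```

Capping the number of attempts avoids an endless loop when fewer than `n` distinct sequences can be produced.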

## 2. Generating light sequences for multiple heavy sequences

Light sequences can be generated for multiple heavy sequences using the **light_generation_bulk** function and providing the input data in a pandas dataframe. This dataframe should contain a column **"heavy"** containing the heavy sequences. Optional additional information can be passed in the columns **"germline_seed"**, **"custom_seed"**, **"cdrs"**, and **"filtering"** in the same format as indicated above.

The function takes the remaining parameters as input, i.e.:  
**input**: the pandas dataframe.  
**numbering_scheme**: CDR definition (numbering scheme) used when CDRs are provided. Either 'IMGT' or 'Kabat'.  
**n**: Number of light sequences requested per heavy sequence.  
**verbose**: Enable verbose output.

In [33]:
# Load the input dataset
# Note that the columns "germline_seed" and "filtering" (and "cdrs") need to be parsed as lists
from ast import literal_eval
df_input = pd.read_csv('./example_data/easy_example.csv', 
                       converters={"germline_seed": literal_eval, "filtering": literal_eval}, 
                       index_col=0)
df_input

| | heavy | germline_seed | filtering |
|---|---|---|---|
| 0 | EVQLLESGGEVKKPGASVKVSCRASGYTFRNYGLTWVRQAPGQGLE... | [IGLV1-36] | [ANARCII] |
| 1 | QVQLVQSGVEVKKPGASVKVSCKASGYTFTNYYMYWVRQAPGQGLE... | [IGKV1-39] | [ANARCII] |
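
The `literal_eval` converters are needed because a CSV file stores a Python list as its string representation. The sketch below shows this with an in-memory round trip; the heavy sequence is taken from the examples above, while the seeds and filters in the second row are illustrative choices rather than the contents of `easy_example.csv`.

```python
import io
from ast import literal_eval

import pandas as pd

heavy = "EVQLLESGGEVKKPGASVKVSCRASGYTFRNYGLTWVRQAPGQGLEWMGWISAYNGNTNYAQKFQGRVTLTTDTSTSTAYMELRSLRSDDTAVYFCARDVPGHGAAFMDVWGTGTTVTVSS"

# Build the input directly in memory: list-valued columns need no conversion here
df_memory = pd.DataFrame({
    "heavy": [heavy, heavy],
    "germline_seed": [["IGLV1-36"], ["IGKV1", "K"]],
    "filtering": [["ANARCII"], ["redundancy", "ANARCII"]],
})

# Writing to CSV turns each list into text such as "['IGLV1-36']"
csv_text = df_memory.to_csv()

raw = pd.read_csv(io.StringIO(csv_text), index_col=0)
parsed = pd.read_csv(
    io.StringIO(csv_text),
    converters={"germline_seed": literal_eval, "filtering": literal_eval},
    index_col=0,
)

print(type(raw.loc[0, "germline_seed"]).__name__)     # str
print(type(parsed.loc[0, "germline_seed"]).__name__)  # list
```

A dataframe built in memory, like `df_memory`, already holds real lists and can be used without the converters.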


In [26]:
# Generate 3 light sequences for each heavy sequence in the dataframe. Each heavy sequence has its own light germline seed and ANARCII filtering is performed on both.
result = lichen_model.light_generation_bulk(df_input, n=3)

In [27]:
result

| | heavy | generated_light |
|---|---|---|
| 0 | EVQLLESGGEVKKPGASVKVSCRASGYTFRNYGLTWVRQAPGQGLE... | QSVLTQPPSVSGAPGQRVTISCTGSRSNIGTAFDVHWYQHFPGRAP... |
| 1 | EVQLLESGGEVKKPGASVKVSCRASGYTFRNYGLTWVRQAPGQGLE... | QSVLTQPPSVSGAPGQRVTISCTGSRSNIGTRFDVHWYQHFPGRAP... |
| 2 | EVQLLESGGEVKKPGASVKVSCRASGYTFRNYGLTWVRQAPGQGLE... | QSVLTQPPSVSGAPGQRVTISCSGTNSNIGSRHEVNWYQHFPGRAP... |
| 0 | QVQLVQSGVEVKKPGASVKVSCKASGYTFTNYYMYWVRQAPGQGLE... | DIQMTQSPSFVSASVGDRVTITCRASQDIRRSLAWYQQRPGRAPRL... |
| 1 | QVQLVQSGVEVKKPGASVKVSCKASGYTFTNYYMYWVRQAPGQGLE... | DIQMTQSPSSLSASVGDRVTITCRASQDIRKDLGWYQQKPGKAPKR... |
| 2 | QVQLVQSGVEVKKPGASVKVSCKASGYTFTNYYMYWVRQAPGQGLE... | DIQMTQSPSFLSASVGDRVTITCRASQNIRRSLSWYQHKSGRAPRV... |


## 3. Generating a common light sequence for a bispecific antibody
Both **light_generation** and **light_generation_bulk** can take two heavy sequences at once to generate a common light sequence, by providing the heavy sequences in a list. All optional parameters remain available.

In [28]:
lichen_model.light_generation(['QVQLVESGGGLVKPGGSLRLSCAASGFTFSNYYMSWVRQAPGKGLEWISYISGRGSTIFYADSVKGRITISRDNAKNSLFLQMNSLRAEDTAVYFCVKDRGGYSPYWGQGTLVTVSS', 
                               'EVQLVESGGGLVQPGRSLRLSCAASGFTFDDYSMHWVRQAPGKGLEWVSGISWNSGSKGYADSVKGRFTISRDNAKNSLYLQMNSLRAEDTALYYCAKYGSGYGKFYHYGLDVWGQGTTVTVSS'])

['DIQMTQSPSSLSASVGDRVTITCRASQSISSYLNWYQQKPGKAPKLLIYAASSLQSGVPSRFSGSGSGTDFTLTISSLQPEDFATYYCQQSYSTPRFGGGTKVEIK']

## 4. Extracting log likelihood scores for an antibody
The log likelihood score of a given heavy-light pairing can be extracted from the model using the **light_log_likelihood** function. This function takes a pandas dataframe as input with the heavy sequence in the **"heavy"** column and the light sequence in the **"light"** column.

In [34]:
df_input = pd.read_csv('./example_data/easy_example_likelihood.csv', index_col=0)
df_input

| | heavy | light |
|---|---|---|
| 0 | EVQLVESGGGLVQPGRSLRLSCAASGFTFDDYAMHWVRQAPGKGLE... | DIQMTQSPSSLSASVGDRVTITCRASQGIRNYLAWYQQKPGKAPKL... |
| 1 | EVQLVESGGGLVQPGRSLRLSCAASGFTFDDYAMHWVRQAPGKGLE... | DIQMTQSPSSLSASVGDRVTITCRASQSISSYLNWYQQKPGKAPKL... |


In [35]:
result = lichen_model.light_log_likelihood(df_input)
result

| | heavy | light | log_likelihood |
|---|---|---|---|
| 0 | EVQLVESGGGLVQPGRSLRLSCAASGFTFDDYAMHWVRQAPGKGLE... | DIQMTQSPSSLSASVGDRVTITCRASQGIRNYLAWYQQKPGKAPKL... | -27.12 |
| 1 | EVQLVESGGGLVQPGRSLRLSCAASGFTFDDYAMHWVRQAPGKGLE... | DIQMTQSPSSLSASVGDRVTITCRASQSISSYLNWYQQKPGKAPKL... | -10.96 |


## 5. Extracting perplexity scores for an antibody
The perplexity score of a given heavy-light pairing can be extracted from the model using the **light_perplexity** function. This function takes a pandas dataframe as input with the heavy sequence in the **"heavy"** column and the light sequence in the **"light"** column.
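
For orientation, perplexity is commonly defined as the exponential of the negative mean log probability per token, so lower values indicate a better fit. The sketch below implements that general definition on a list of per-residue natural-log probabilities; it is not LICHEN's code, and which tokens LICHEN includes or which log base it uses is not specified here.

```python
import math


def perplexity(token_log_probs):
    """Exponential of the negative mean natural-log probability per token."""
    return math.exp(-sum(token_log_probs) / len(token_log_probs))


# A model that assigns every one of the 20 amino acids equal probability at each
# position has perplexity 20, whatever the sequence length (107 is arbitrary here)
uniform = [math.log(1 / 20)] * 107
print(round(perplexity(uniform), 6))  # 20.0
```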

In [None]:
df_input = pd.read_csv('./example_data/easy_example_likelihood.csv', index_col=0)
df_input

In [36]:
result = lichen_model.light_perplexity(df_input)
result
