<a href="https://colab.research.google.com/github/RulerScarlett/DataCon-2025-TNF/blob/main/TNF_alpha_generation.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# STEP 0 Check GPU  


## This notebook must be run in GPU mode. Run the nvidia-smi command below to check whether a GPU is attached. If no GPU is listed, click Edit > Notebook settings in the top-left menu bar and change the hardware accelerator to GPU.






In [None]:
!nvidia-smi

Tue Jul 15 13:27:43 2025       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.15              Driver Version: 550.54.15      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|   0  Tesla T4                       Off |   00000000:00:04.0 Off |                    0 |
| N/A   43C    P8              9W /   70W |       0MiB /  15360MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
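
The same check can be scripted so that later cells fail early on a CPU-only runtime. The following is a minimal sketch and not part of the original notebook: the helper name gpu_names is hypothetical, and it assumes nvidia-smi is on the PATH whenever a GPU is attached.

```python
import shutil
import subprocess


def gpu_names():
    """Return the GPU names reported by nvidia-smi, or [] if none are visible."""
    # nvidia-smi is usually absent on CPU-only runtimes.
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name", "--format=csv,noheader"],
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        return []
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


names = gpu_names()
if names:
    print("GPU detected:", ", ".join(names))
else:
    print("No GPU detected; switch the hardware accelerator to GPU before continuing.")
```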
                                                

In [None]:
#@title STEP 1 Clone repository
!git clone https://github.com/LIYUESEN/druggpt.git
!pwd
import os
os.chdir("/content/druggpt")
!pwd

Cloning into 'druggpt'...
remote: Enumerating objects: 257, done.
remote: Counting objects: 100% (80/80), done.
remote: Compressing objects: 100% (62/62), done.
remote: Total 257 (delta 40), reused 38 (delta 18), pack-reused 177 (from 1)
Receiving objects: 100% (257/257), 109.70 KiB | 4.99 MiB/s, done.
Resolving deltas: 100% (126/126), done.
/content
/content/druggpt


In [None]:
#@title STEP 2 Build environment
!wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
!chmod +x Miniconda3-latest-Linux-x86_64.sh
!bash ./Miniconda3-latest-Linux-x86_64.sh -b -f -p /usr/local
!conda config --set always_yes yes --set changeps1 no
!conda update -q conda

!conda create -y -n druggpt python=3.8
import sys
sys.path.append('/usr/local/lib/python3.8/site-packages/')

!source activate druggpt && pip install --root-user-action=ignore datasets transformers scipy scikit-learn
!source activate druggpt && pip install --root-user-action=ignore torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu117
!source activate druggpt && conda install conda-forge/label/cf202003::openbabel
!source activate druggpt && pip install psutil
!source activate druggpt && conda list

--2025-07-15 13:27:49--  https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
Resolving repo.anaconda.com (repo.anaconda.com)... 104.16.191.158, 104.16.32.241, 2606:4700::6810:20f1, ...
Connecting to repo.anaconda.com (repo.anaconda.com)|104.16.191.158|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 159476510 (152M) [application/octet-stream]
Saving to: ‘Miniconda3-latest-Linux-x86_64.sh’


2025-07-15 13:27:50 (308 MB/s) - ‘Miniconda3-latest-Linux-x86_64.sh’ saved [159476510/159476510]

PREFIX=/usr/local
Unpacking payload ...

Installing base environment...

Preparing transaction: ...working... done
Executing transaction: ...working... done
installation finished.
    You currently have a PYTHONPATH environment variable set. This may cause
    unexpected behavior when running the Python interpreter in Miniconda3.
    For best results, please verify that your PYTHONPATH only points to
    directories of packages that are compatible with the Pyth
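
The install commands in this cell each repeat "source activate druggpt &&" because every `!` line is executed in its own shell, so an activation does not carry over to the next line. A small helper can make that prefix explicit. This is only a sketch: in_env is a hypothetical name, not part of druggpt or conda, and the check command assumes torch was installed into the environment as above.

```python
def in_env(command, env_name="druggpt"):
    """Prefix a shell command so it runs inside the named conda environment."""
    return f"source activate {env_name} && {command}"


# Build a one-line sanity check for the freshly created environment.
check = in_env(
    'python -c "import torch; print(torch.__version__, torch.cuda.is_available())"'
)
print(check)
# In a notebook cell, run it with: !{check}
```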

In [None]:
#@title STEP 3 Run druggpt

#@markdown # **Instructions**
#@markdown - If you select "protein amino acid sequence", input a protein amino acid sequence in the input_protein_amino_acid_sequence field.
#@markdown - If you select "fasta file", first upload the fasta file to the druggpt directory by dragging and dropping it into the file browser on the left side. Then, input the name of the fasta file as text in the input_fasta_file field.
#@markdown - If "no input" is selected, the input content will be ignored.

#@markdown #step 3.1 Select your input type
Input_type = "protein amino acid sequence" #@param ["protein amino acid sequence", "fasta file", "no input"]
#@markdown #step 3.2 Provide the required input based on your selected input type:

#@markdown - For "protein amino acid sequence", input a protein amino acid sequence.
input_protein_amino_acid_sequence = "MSTESMIRDVELAEEALPKKTGGPQGSRRCLFLSLFSFLIVAGATTLFCLLHFGVIGPQREEFPRDLSLISPLAQAVRSSSRTPSDKPVAHVVANPQAEGQLQWLNRRANALLANGVELRDNQLVVPSEGLYLIYSQVLFKGQGCPSTHVLLTHTISRIAVSYQTKVNLLSAIKSPCQRETPEGAEAKPWYEPIYLGGVFQLEKGDRLSAEINRPDYLDFAESGQVYFGIIAL" #@param {type:"raw"}
#@markdown - Upload the fasta file to the "druggpt" directory by selecting the file and uploading it. Ensure that the file is placed within the same folder as the default "BCL2L11.fasta" file. Then, input the name of the fasta file in the text field.
input_fasta_file = "BCL2L11.fasta" #@param {type:"raw"}
if Input_type == "protein amino acid sequence":
  input_content = input_protein_amino_acid_sequence
elif Input_type == "fasta file":
  input_content = input_fasta_file
else:
  input_content = ""  # "no input": nothing is passed to drug_generator.py




#@markdown #step 3.3 Choose whether to use ligand prompt
ligand_prompt = False #@param {type:"boolean"}
#@markdown - If ligand prompt is not checked, the ligand prompt content will be ignored.
if ligand_prompt:
  ligand_prompt_content = "COc1ccc(cc1)C(=O)" #@param {type:"raw"}
  # Single quotes keep the shell from interpreting the parentheses in the SMILES string.
  ligand_content = "'" + ligand_prompt_content + "'"


#@markdown #step 3.4 Set Additional Parameters

#@markdown - Minimum number of molecules to generate.
number = 500 #@param {type:"integer"}

#@markdown - Hardware device to use. Default is 'cuda'.
device = "cuda" #@param ["cuda", "cpu"]

#@markdown - Output directory for generated molecules. Default is './ligand_output/'.
output = './ligand_output/' #@param {type:"raw"}

#@markdown - Number of molecules generated per batch. Reduce this value if you run low on RAM. Default is 16.
batch_size = 32 #@param {type:"integer"}

#@markdown # Don't forget to run the cell.
temp = "source activate druggpt && TOKENIZERS_PARALLELISM=false python drug_generator.py "
if Input_type == "no input":
  temp = temp + "-e "
elif Input_type == "protein amino acid sequence":
  temp = temp + "-p " + input_content + " "
else:
  temp = temp + "-f " + input_content + " "

if ligand_prompt and Input_type != "no input":
  temp = temp + "-l " + ligand_content + " "

temp = temp + "-n " + str(number) + " -d " + device + " -o " + output + " -b " + str(batch_size)
print(temp)

!{temp}



source activate druggpt && TOKENIZERS_PARALLELISM=false && python drug_generator.py -p MSTESMIRDVELAEEALPKKTGGPQGSRRCLFLSLFSFLIVAGATTLFCLLHFGVIGPQREEFPRDLSLISPLAQAVRSSSRTPSDKPVAHVVANPQAEGQLQWLNRRANALLANGVELRDNQLVVPSEGLYLIYSQVLFKGQGCPSTHVLLTHTISRIAVSYQTKVNLLSAIKSPCQRETPEGAEAKPWYEPIYLGGVFQLEKGDRLSAEINRPDYLDFAESGQVYFGIIAL -n 500 -d cuda -o ./ligand_output/ -b 32

  _____                    _____ _____ _______ 
 |  __ \                  / ____|  __ \__   __|
 | |  | |_ __ _   _  __ _| |  __| |__) | | |   
 | |  | | '__| | | |/ _` | | |_ |  ___/  | |   
 | |__| | |  | |_| | (_| | |__| | |      | |   
 |_____/|_|   \__,_|\__, |\_____|_|      |_|   
                     __/ |                     
                    |___/                      
 A generative drug design model based on GPT2
    
<|startoftext|><P>MSTESMIRDVELAEEALPKKTGGPQGSRRCLFLSLFSFLIVAGATTLFCLLHFGVIGPQREEFPRDLSLISPLAQAVRSSSRTPSDKPVAHVVANPQAEGQLQWLNRRANALLANGVELRDNQLVVPSEGLYLIYSQVLFKGQGCPSTHVLLTHTISRIAVSYQTKVNLLSAIKSPCQRETPEG

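The command string in this step is assembled by plain concatenation, which works for the inputs shown but can break if a value contains shell metacharacters. The sketch below shows one hedged alternative using shlex.quote from the Python standard library. The function name build_command and its defaults are hypothetical; the flags (-e, -p, -f, -l, -n, -d, -o, -b) are copied from the cell above and not checked against drug_generator.py itself. Writing the assignment directly in front of the command (VAR=value command) is the shell form that passes a variable to that one process.

```python
import shlex


def build_command(input_type, input_content="", ligand_prompt=None,
                  number=500, device="cuda", output="./ligand_output/",
                  batch_size=32):
    """Assemble the drug_generator.py call with each argument shell-quoted."""
    args = ["python", "drug_generator.py"]
    if input_type == "no input":
        args.append("-e")
    elif input_type == "protein amino acid sequence":
        args += ["-p", input_content]
    else:  # "fasta file"
        args += ["-f", input_content]
    # As in the cell above, a ligand prompt is only passed with a protein input.
    if ligand_prompt and input_type != "no input":
        args += ["-l", ligand_prompt]
    args += ["-n", str(number), "-d", device, "-o", output, "-b", str(batch_size)]
    # shlex.quote protects characters such as ( and ) that occur in SMILES strings.
    quoted = " ".join(shlex.quote(a) for a in args)
    return "source activate druggpt && TOKENIZERS_PARALLELISM=false " + quoted


# Shortened, illustrative sequence; pass the full sequence in practice.
print(build_command("protein amino acid sequence", "MSTESMIRDV",
                    ligand_prompt="COc1ccc(cc1)C(=O)"))
```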
In [None]:
#@title STEP 4 Save results
#@markdown # Enter the name of the results folder you want to download:
from google.colab import files
results_dir_name = 'ligand_output' #@param {type:"raw"}
results_zip = results_dir_name + ".zip"
print(results_zip)
!zip -r $results_zip $results_dir_name
files.download(results_zip)

results_dir_name = results_dir_name + '_min'
results_zip = results_dir_name + ".zip"
print(results_zip)
!zip -r $results_zip $results_dir_name
files.download(results_zip)

ligand_output.zip
  adding: ligand_output/ (stored 0%)
  adding: ligand_output/b8b9f36e55dbc307eb4c8b0639e5e8c6ba7b0983.sdf (deflated 79%)
  adding: ligand_output/91d3e2c8c77b56deb1c4d6864fc51fbffa313976.sdf (deflated 79%)
  adding: ligand_output/a2b3793e8007d3ee61613b9d33f47b90b1b6c2f6.sdf (deflated 80%)
  adding: ligand_output/6e50c07157a394dd63dfe8a12443a613c20173bd.sdf (deflated 79%)
  adding: ligand_output/0cc01a6faea3e51fe6c77fac37603d51712bdcf2.sdf (deflated 79%)
  adding: ligand_output/0c99e2f68d042d86b4706a072b6ec3d6356773a7.sdf (deflated 80%)
  adding: ligand_output/02eee1245eacd0bbf5cc119aeba3d84450426074.sdf (deflated 80%)
  adding: ligand_output/b3fdbed03b97d8c0a308e910a7ce41df555cdb75.sdf (deflated 80%)
  adding: ligand_output/5c494ca3e4c7e865478fe6301bd40b1fd2426c71.sdf (deflated 80%)
  adding: ligand_output/c0e3fa63ae286496caa01938e348bbe67ef8ddc5.sdf (deflated 80%)
  adding: ligand_output/6b800191211d1299450245391d21c1e4b2831f46.sdf (deflated 79%)
  adding: ligand_outp

ligand_output_min.zip
  adding: ligand_output_min/ (stored 0%)
  adding: ligand_output_min/b8b9f36e55dbc307eb4c8b0639e5e8c6ba7b0983.sdf (deflated 79%)
  adding: ligand_output_min/91d3e2c8c77b56deb1c4d6864fc51fbffa313976.sdf (deflated 79%)
  adding: ligand_output_min/a2b3793e8007d3ee61613b9d33f47b90b1b6c2f6.sdf (deflated 80%)
  adding: ligand_output_min/6e50c07157a394dd63dfe8a12443a613c20173bd.sdf (deflated 80%)
  adding: ligand_output_min/0cc01a6faea3e51fe6c77fac37603d51712bdcf2.sdf (deflated 79%)
  adding: ligand_output_min/0c99e2f68d042d86b4706a072b6ec3d6356773a7.sdf (deflated 79%)
  adding: ligand_output_min/02eee1245eacd0bbf5cc119aeba3d84450426074.sdf (deflated 80%)
  adding: ligand_output_min/b3fdbed03b97d8c0a308e910a7ce41df555cdb75.sdf (deflated 80%)
  adding: ligand_output_min/5c494ca3e4c7e865478fe6301bd40b1fd2426c71.sdf (deflated 80%)
  adding: ligand_output_min/c0e3fa63ae286496caa01938e348bbe67ef8ddc5.sdf (deflated 80%)
  adding: ligand_output_min/6b800191211d1299450245391d21c

# Step 5: Visualize and post-process on your local computer  

## After completing the model predictions and downloading the results, you can visualize and post-process the results on your local computer. Here are some suggested steps:

## 1. Unzip the downloaded results folder: Locate the zip file you downloaded in Step 4 and unzip it to access the result files.
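
If you prefer to unzip from Python rather than a file manager, the standard-library zipfile module is enough. This is a minimal sketch: unzip_results is a hypothetical helper name, and the archive name ligand_output.zip assumes you kept the default folder name from Step 4.

```python
import zipfile
from pathlib import Path


def unzip_results(zip_path, dest_dir="."):
    """Extract a results archive and return the extracted .sdf paths, sorted."""
    dest = Path(dest_dir)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest)
        names = archive.namelist()
    return sorted(dest / name for name in names if name.endswith(".sdf"))


# Only runs when the downloaded archive sits in the current directory.
if Path("ligand_output.zip").exists():
    sdf_files = unzip_results("ligand_output.zip")
    print(len(sdf_files), "SDF files extracted")
```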

## 2. Install molecular visualization software: Before visualizing, make sure you have installed molecular visualization software, such as PyMOL, UCSF Chimera, or Discovery Studio Visualizer. These tools can help you visualize the protein structures and ligands more intuitively.

## 3. Open the protein and ligand structures: Use the molecular visualization software to open the protein structure file (e.g., in PDB format) and the ligand structure files from the downloaded results (e.g., in SDF format).
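
Before opening hundreds of ligand files in a viewer, it can help to list their sizes. The sketch below reads the atom and bond counts from the fixed-width counts line (line 4) of an SDF record. It assumes the files use the V2000 molfile layout, which has not been verified for this notebook's output; sdf_counts is a hypothetical helper, and the water record is hand-written purely for illustration.

```python
from pathlib import Path


def sdf_counts(text):
    """Return (atom_count, bond_count) for the first record of a V2000 SDF string."""
    lines = text.splitlines()
    if len(lines) < 4 or "V2000" not in lines[3]:
        raise ValueError("not a V2000 molfile block")
    counts_line = lines[3]
    # Fixed-width fields: columns 1-3 hold the atom count, columns 4-6 the bond count.
    return int(counts_line[0:3]), int(counts_line[3:6])


# A hand-written water molecule, used here only to illustrate the layout.
example = "\n".join([
    "water",
    "  hypothetical example",
    "",
    "  3  2  0  0  0  0  0  0  0  0999 V2000",
    "    0.0000    0.0000    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0",
    "    0.9572    0.0000    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0",
    "   -0.2400    0.9266    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0",
    "  1  2  1  0  0  0  0",
    "  1  3  1  0  0  0  0",
    "M  END",
    "$$$$",
])
print(sdf_counts(example))  # (3, 2)

# Summarize the unzipped results, if the folder is present.
results_dir = Path("ligand_output")
if results_dir.is_dir():
    for path in sorted(results_dir.glob("*.sdf")):
        atoms, bonds = sdf_counts(path.read_text())
        print(path.name, atoms, bonds)
```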

## 4. Analyze the predicted results: When visualizing the protein-ligand complexes, pay attention to the following aspects:

- The binding position and pose of the ligand with the protein  
- Hydrogen bonds, hydrophobic interactions, and other interactions between the ligand and protein  
- Any conformational changes that might affect binding  
- The consistency of the results with known experimental data (e.g., crystal structures, binding assays, etc.)

## 5. Further post-processing: As needed, you can use molecular modeling tools (e.g., AutoDock for docking, or GROMACS or Amber for energy minimization and molecular dynamics simulations) to analyze the predicted results further.

## 6. Share the results: If you are satisfied with the analysis results, you can share your findings with colleagues, collaborators, or the community. This can be done by writing reports, creating presentations, or publishing online resources.