docu:
 - sequenceModels
 - setUpAlphaFold

## General considerations
#### Getting the documentation of a method or function
We can always display the documentation of a method or function by writing a "?" at the end of its name (this works in Jupyter/IPython). It retrieves the function's docstring, which typically explains what the function does and which options it accepts. In the example below, we take the "models" variable, where we instantiated the proteinModels class with our models folder (see section 2.2), and look up its *removeTerminiByConfidenceScore* method. To see its documentation, we add a "?" after the method's name and execute the cell.

In [22]:
models.removeTerminiByConfidenceScore?
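The trailing "?" is IPython/Jupyter syntax. In a plain Python script or console, the standard library's `inspect.getdoc()` (or the built-in `help()`) retrieves the same docstring, and `inspect.signature()` lists the parameters with their defaults. The sketch below uses a hypothetical stand-in function, `remove_termini_example`, because the real signature of *removeTerminiByConfidenceScore* is not shown here; with the library loaded you would pass `models.removeTerminiByConfidenceScore` instead.

```python
import inspect


def remove_termini_example(confidence_threshold=70.0):
    """Hypothetical stand-in: trim terminal residues below a confidence threshold.

    This function only exists to illustrate how docstrings are retrieved.
    """
    return confidence_threshold


# inspect.getdoc returns the docstring with its indentation cleaned up
doc = inspect.getdoc(remove_termini_example)
print(doc.splitlines()[0])

# inspect.signature shows the parameters and their default values
print(inspect.signature(remove_termini_example))
```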

# 01 - Preparing your proteins

## Introduction

The prepare_proteins library was written to handle the high-throughput setup of protein systems. It can process many PDB files at once, applying general optimizations that prepare the systems for specific calculations and setting up many kinds of simulations.

In this document we show an example of the general workflow that can be followed to accomplish these objectives. The intention is that every step can be followed and understood with the help of the accompanying explanations.

## 1. What modules and libraries do you need?
First of all, we need to import the main library, **"prepare_proteins"**. In general it won't be the only one, since you will usually need other complementary modules and libraries. One example is the **"bsc_calculations"** library, which is responsible for setting up the calculation files and folders for the different BSC clusters. In summary, the main libraries we will need are:

- prepare_proteins
- bsc_calculations

Note that the **prepare_proteins** library is built around Python classes: the workflow begins by creating instances of these classes so that we can use their methods.

In [2]:
import prepare_proteins
import bsc_calculations
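Since the whole workflow depends on both libraries, it can be handy to check that they are installed before running anything else. The sketch below uses only the standard library (`importlib.util.find_spec`) and does not actually import the libraries; the helper name `missing_modules` is our own and is not part of either library.

```python
import importlib.util

# Libraries this workflow relies on (names taken from the imports above)
REQUIRED = ["prepare_proteins", "bsc_calculations"]


def missing_modules(names):
    """Return the names that cannot be found on the current Python path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules(REQUIRED)
if missing:
    print("Missing libraries:", ", ".join(missing))
else:
    print("All required libraries are available.")
```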

## 2. What kind of initial system do we have?
The first steps depend on the type of initial input we have: a PDB file or a FASTA file. If we start the workflow from sequences in a FASTA file, we first need to prepare these sequences to predict their PDB structures with AlphaFold; in that case, follow **2.1. Preparing sequences**. Otherwise, start at **2.2. Preparing models**.

- Starting from a FASTA file &rarr; (we need to obtain the structure with AlphaFold) &rarr; use sequenceModels class
- Starting from (or continuing with) PDB files &rarr; use proteinModels class
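The decision above can be written as a small helper. This is a hypothetical sketch, not part of prepare_proteins: `starting_class` only looks at the file extension (the set of accepted FASTA extensions is our assumption) or at whether the path is a folder, and returns the name of the class to start from.

```python
from pathlib import Path

# Extensions treated as FASTA input (an assumption for this sketch)
FASTA_SUFFIXES = {".fasta", ".fa", ".faa"}


def starting_class(path):
    """Return the name of the prepare_proteins class to start from.

    A FASTA file needs structure prediction first (sequenceModels); a PDB
    file or a folder of PDB files can go straight to proteinModels.
    """
    p = Path(path)
    suffix = p.suffix.lower()
    if suffix in FASTA_SUFFIXES:
        return "sequenceModels"
    if suffix == ".pdb" or p.is_dir():
        return "proteinModels"
    raise ValueError(f"Unrecognised input: {path}")
```

For example, `starting_class('sequences.fasta')` returns `'sequenceModels'`, while `starting_class('structures')` returns `'proteinModels'` as long as that folder exists.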

### 2.1. Preparing sequences - starting from a FASTA file
In this case, we start by instantiating the class for this part of the workflow, *sequenceModels*. We store the instance in a variable, here called "sequences", and only need to specify the FASTA file that contains our starting sequences, in this case "sequences.fasta".

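Next, we call the class method *setUpAlphaFold*, which only needs the name of the folder we want to create/use to run AlphaFold. Finally, *bsc_calculations.minotauro.jobArrays* prepares the job-array files to run these calculations on the MinoTauro cluster (here in the bsc_ls partition).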

In [25]:
sequences = prepare_proteins.sequenceModels('sequences.fasta')
jobs = sequences.setUpAlphaFold('alphafold')  # keep the AlphaFold jobs generated for each sequence
bsc_calculations.minotauro.jobArrays(jobs, job_name='AF_sequences', partition='bsc_ls')  # pass the jobs, not the string 'jobs'
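Before running *setUpAlphaFold*, it can help to check what "sequences.fasta" actually contains. The sketch below is a minimal stand-alone FASTA reader, not the parser prepare_proteins uses internally; it assumes standard FASTA with one ">" header line per record, and the example sequences are made up for illustration.

```python
def read_fasta(text):
    """Parse FASTA-formatted text into an {identifier: sequence} dict.

    The identifier is the first whitespace-separated word after '>'; sequence
    lines are concatenated, so wrapped sequences are handled.
    """
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            name = line[1:].split()[0]
            records[name] = []
        elif name is not None:
            records[name].append(line)  # lines before the first header are ignored
    return {name: "".join(parts) for name, parts in records.items()}


example = """>protein_A some description
MKTAYIAKQR
QISFVKSHFS
>protein_B
GSHMLEDPVD
"""
print(read_fasta(example))
# {'protein_A': 'MKTAYIAKQRQISFVKSHFS', 'protein_B': 'GSHMLEDPVD'}
```

With a real file, `read_fasta(open('sequences.fasta').read())` gives a quick overview of how many sequences will be sent to AlphaFold and under which names.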

### 2.2. Preparing models - taking PDB files
In this case, we also start by instantiating the class for this part of the workflow, *proteinModels*. We store it in a variable, here named "models", and only need to specify the folder where our PDB files reside. This can be a folder named "structures" where we previously stored our initial PDBs, or a folder filled with the structures predicted by AlphaFold. In the latter case, the cell below copies the top-ranked model (ranked_0.pdb) of each AlphaFold prediction into "structures", replacing underscores with hyphens in the file names.

In [None]:
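import os
import shutil
os.makedirs('structures', exist_ok=True)  # create the destination folder if needed
for model in os.listdir('alphafold/output_models/'):
    ranked_0 = os.path.join('alphafold/output_models', model, 'ranked_0.pdb')  # top-ranked AlphaFold model
    if os.path.exists(ranked_0):
        shutil.copyfile(ranked_0, os.path.join('structures', model.replace('_', '-') + '.pdb'))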
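The copy cell above silently skips any prediction that has no ranked_0.pdb, for example because the AlphaFold job did not finish. The sketch below wraps the same logic (including the "_" to "-" renaming) in a function that also reports those folders; the name `copy_top_ranked` and its return values are our own choice, not part of prepare_proteins.

```python
import os
import shutil


def copy_top_ranked(af_output_dir, structures_dir):
    """Copy each prediction's ranked_0.pdb into structures_dir.

    Returns (copied, missing): the destination paths that were written and
    the prediction folders that had no ranked_0.pdb.
    """
    os.makedirs(structures_dir, exist_ok=True)
    copied, missing = [], []
    for model in sorted(os.listdir(af_output_dir)):
        model_dir = os.path.join(af_output_dir, model)
        if not os.path.isdir(model_dir):
            continue  # ignore stray files next to the prediction folders
        source = os.path.join(model_dir, "ranked_0.pdb")
        if os.path.exists(source):
            target = os.path.join(structures_dir, model.replace("_", "-") + ".pdb")
            shutil.copyfile(source, target)
            copied.append(target)
        else:
            missing.append(model)
    return copied, missing
```

For example, `copied, missing = copy_top_ranked('alphafold/output_models', 'structures')` followed by `print(missing)` lists the predictions that still need attention before loading the folder with proteinModels.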

In [20]:
models = prepare_proteins.proteinModels('structures')

## 3. System preparation
