# Get rich sequence information

## Acquire sequence information based on accession id(s)

**Single accession ID**

A single sequence can be retrieved with the `get_id` method, which takes an accession ID as input and returns a `ProteinRecord` object.  
The `ProteinRecord` object contains the sequence as a string, along with additional information such as the `Organism` and any `Region` or `Site` annotations of the sequence.


In [1]:
%reload_ext autoreload
%autoreload 2

from pyeed.core import ProteinRecord


matHM = ProteinRecord.get_id("MBP1912539.1")


AttributeError: 'Region' object has no attribute '_repo'

**Multiple accession IDs**

To load multiple sequences at once, the `get_ids` function can be used. The function takes a list of accession IDs as input and returns a list of `ProteinRecord` objects.

In [None]:
import json

# Load the saved ids from json
with open("ids.json", "r") as f:
    ids = json.load(f)

# Get the protein info for each id
proteins = ProteinRecord.get_ids(ids)
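
The cell above expects an `ids.json` file to exist already. The following is a minimal sketch of how such a file might be written and read back, assuming it holds a flat JSON array of accession ID strings. That layout is an assumption based on `get_ids` taking a list, and the `PLACEHOLDER_*` entries are hypothetical, not real accessions. The file is written to a temporary directory so that the example does not overwrite an existing `ids.json`.

```python
import json
import tempfile
from pathlib import Path

# Accession IDs to store. "MBP1912539.1" comes from the example above;
# the PLACEHOLDER entries are hypothetical and not real accessions.
accessions = ["MBP1912539.1", "PLACEHOLDER_ID_2", "MBP1912539.1", "PLACEHOLDER_ID_3"]

# Drop duplicates while keeping the original order.
unique_accessions = list(dict.fromkeys(accessions))

# Write the list as a flat JSON array (assumed layout of ids.json).
ids_path = Path(tempfile.mkdtemp()) / "ids.json"
ids_path.write_text(json.dumps(unique_accessions, indent=2))

# Read it back the same way the cell above does.
with open(ids_path, "r") as f:
    ids = json.load(f)

# Basic sanity check before handing the list to a retrieval call.
assert all(isinstance(i, str) and i for i in ids)
print(ids)
```

Removing duplicates before the lookup avoids requesting the same record twice.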

## Search for similar sequences with BLAST

The `ncbi_blast` method runs a BLAST search on the NCBI server. It is called on a `ProteinRecord` object and returns a list of `ProteinRecord` objects representing the hits of the search.
The search can be customized with `n_hits` (number of hits), `e_value` (E-value threshold), `db` (database to query), `matrix` (substitution matrix), and `identity` (minimum identity required to accept a hit).

<div class="admonition warning">
    <p class="admonition-title">NCBI BLAST service might be slow</p>
    <p>Due to the way NCBI handles requests to its BLAST API, the service is quite slow. During peak working hours, a single search might take more than 15 minutes.</p>
</div>

In [None]:
blast_results = matHM.ncbi_blast(
    n_hits=100,
    e_value=0.05,
    db="swissprot",
    matrix="BLOSUM62",
    identity=0.5,
)
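
To make the meaning of the `e_value`, `identity`, and `n_hits` parameters concrete, here is a rough sketch of how such cutoffs could be applied to a list of hits. It is not pyeed's or NCBI's implementation: the `Hit` class, the `filter_hits` function, and the hit names are hypothetical stand-ins. It also assumes that identity is expressed as a fraction between 0 and 1 and that both cutoffs are inclusive.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """Hypothetical stand-in for one BLAST hit (not a pyeed class)."""

    accession: str
    e_value: float
    identity: float  # assumed: fraction of identical aligned positions, 0.0 to 1.0


def filter_hits(hits, e_value=0.05, identity=0.5, n_hits=100):
    """Keep hits at or below the E-value cutoff and at or above the
    identity cutoff, sorted by ascending E-value, truncated to n_hits."""
    kept = [h for h in hits if h.e_value <= e_value and h.identity >= identity]
    kept.sort(key=lambda h: h.e_value)
    return kept[:n_hits]


# Made-up hits for illustration only.
hits = [
    Hit("hit_a", 1e-50, 0.82),
    Hit("hit_b", 1e-3, 0.41),  # significant, but below the identity cutoff
    Hit("hit_c", 0.2, 0.75),   # similar, but above the E-value cutoff
    Hit("hit_d", 1e-12, 0.55),
]

selected = filter_hits(hits, e_value=0.05, identity=0.5, n_hits=100)
print([h.accession for h in selected])
```

With these made-up values, only `hit_a` and `hit_d` pass both cutoffs. Lowering `e_value` makes the search stricter, while lowering `identity` admits more distant homologs.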

## Inspect objects

Each `pyeed` object has a rich `print` method, displaying all the information available for the object. This can be useful to inspect the object and its attributes.

In [2]:
print(matHM)

ProteinRecord
├── id = MBP1912539.1
├── name = S-adenosylmethionine synthetase
├── organism
│   └── Organism
│       ├── id = ec01bd4b-490f-4908-aa3c-f8435295e9ef
│       ├── taxonomy_id = 49900
│       ├── name = Thermococcus stetteri
│       ├── domain = Archaea
│       ├── phylum = Euryarchaeota
│       ├── tax_class = Thermococci
│       ├── order = Thermococcales
│       ├── family = Thermococcaceae
│       └── genus = Thermococcus
├── sequence = MLMAEKIRNIVVEEMVRTPVEMQQVELVERKGIGHPDSIADGIAEAVSRALSREYMKRYGIILHHNTDQVEVVGGRAYPQFGGGEVIKPIYILLSGRAVEMVDREFFPVHEVAIKAAKDYLKKAVRHLDIENHVVIDSRIGQGSVDLVGVFNKAKKNPIPLANDTSFGVGYAPLSETERIVLETEKYLNSDEFKKKWPAVGEDIKVMGLRKGDEIDLTIAAAIVDSEVDNPDDYMAVKEAIYEAAKEIVESHTQRPTNIYVNTADDPKEGIYYITVTGTSAEAGDDGSVGRGNRVNGLITPNRHMSMEAAAGKNPVSHVGKIYNILSMLIANDIAEQIEGVEEVYVRILSQIGKPIDEPLVASVQIIPKKGYSIDVLQKPAYEIADEWLANITKIQKMILEDKINVF
├──