# BLAST Search Using FASTA Input

In this tutorial, we show how to identify subject sequences that have multiple matches in a BLAST search result.

## Installation
First, install the API library into your virtual environment:

In [None]:
%pip install --quiet ncbi-cloudblast-api

For this demo, you also need to install `pandas`:

In [None]:
%pip install --quiet pandas==0.24.2

## Before you start
To use this library, you must provide the address of a CloudBlast API service endpoint:

In [None]:
API_ADDRESS = ""  # set the API service address, e.g. "35.245.159.177:5000"

## Perform a BLAST Search

In [None]:
from ncbi_cloudblast_api.api_client import APIClient

if not API_ADDRESS:
    raise ValueError("Please set value for API_ADDRESS in the previous step.")

# Create a client bound to the configured service endpoint
client = APIClient(API_ADDRESS)

In [None]:
# Accession of the query sequence
query = "u93236"

print(f"Running BLAST search for {query} ...")

res = client.search(accession=query)

print("Done")

In [None]:
from pandas import DataFrame

# A list of fields to get from the search result
fields = ["qaccver", "saccver", "pident", "length", "evalue", "bitscore", "staxid"]

# A slice of search result for the above fields
df = res.as_dataframe()[fields]

# Show result (first 20 rows)
df.head(20)

In the next two cells, we check how many rows the result contains and how many unique subject sequences there are.

In [None]:
# Total number of rows in the result
df['saccver'].count()

In [None]:
# Number of unique subject sequences
df['saccver'].nunique()
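To make the difference between the two counts concrete, here is a minimal sketch on a made-up table. The accessions `S1` to `S3` and the bit scores are hypothetical stand-ins, not real BLAST output. It compares the row count with the number of unique subjects, computed once with `nunique` and once by counting first occurrences via `duplicated`.

```python
import pandas as pd

# Hypothetical hits for illustration only (not real BLAST output)
toy = pd.DataFrame({
    "saccver": ["S1", "S2", "S1", "S3", "S2", "S1"],
    "bitscore": [200.0, 180.0, 95.0, 60.0, 50.0, 40.0],
})

# Every row is one match (one aligned region)
total_rows = toy["saccver"].count()

# Number of distinct subject accessions
unique_subjects = toy["saccver"].nunique()

# Same number, obtained by counting rows that are not repeats of an earlier row
first_occurrences = int((~toy.duplicated("saccver", keep="first")).sum())

print(total_rows, unique_subjects, first_occurrences)
```

Whenever the row count exceeds the unique count, at least one subject appears in more than one row.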

The result has more rows than unique subject sequences, which means that some subject sequences have more than one match. The next cell selects all rows that belong to subjects with multiple matches and shows the first 20 of them.

In [None]:
# keep=False marks every row of a repeated subject, not only the later repeats
df[df.duplicated('saccver', keep=False)].head(20)

We can see, for example, that our query aligns with `NM_130802.2` in two regions.
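Beyond listing the rows, it can be useful to summarize how many matches each such subject has. The following is a sketch under stated assumptions rather than part of the original workflow: the table is made up (accessions `S1` to `S3` and their scores are hypothetical), and it only uses the `saccver` and `bitscore` column names from the field list above.

```python
import pandas as pd

# Hypothetical hits for illustration only (not real BLAST output)
toy_hits = pd.DataFrame({
    "saccver": ["S1", "S2", "S1", "S3", "S2", "S1"],
    "bitscore": [200.0, 180.0, 95.0, 60.0, 50.0, 40.0],
})

# Keep only rows whose subject occurs more than once
multi = toy_hits[toy_hits.duplicated("saccver", keep=False)]

# Per subject: number of matches and the best bit score among them
match_counts = multi.groupby("saccver").size()
best_scores = multi.groupby("saccver")["bitscore"].max()

summary = pd.DataFrame({"matches": match_counts, "best_bitscore": best_scores})
summary = summary.sort_values("matches", ascending=False)

print(summary)
```

Applied to the real result, the same steps would run on `df` instead of `toy_hits`.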