# Handling Microbial Strains with TeselaGen API

This notebook provides a step-by-step guide to querying microbial strains from the TeselaGen API. Specifically, we will demonstrate how to authenticate with the API, retrieve data about microbial strains that have associated plasmids with specific size criteria, and organize the results into a pandas DataFrame for easy analysis.

## Prerequisites
Make sure you have the following packages installed:
- `python-dotenv` (for loading environment variables)
- `requests` (for calling the API)
- `pandas` (for handling data)

You also need a `credentials.env` file, in the same directory as this notebook, containing your credentials (`USERNAME`, `PASSWORD`, `HOST_URL`).

Install the packages using:
```bash
pip install python-dotenv requests pandas
```

## Generating a One-Time Password (OTP)
To authenticate, you need to generate a one-time password (OTP). You can do this from your TeselaGen instance by navigating to: Settings > API Password > Generate API OTP. Use this OTP as your `PASSWORD` for logging in via the API.
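
Because the OTP is pasted into the credentials file by hand, an empty or missing value is easy to overlook. The following is a minimal sketch of a check that could be run after the environment file has been loaded. The helper name `missing_credentials` and the example values are hypothetical and not part of the TeselaGen API.

```python
import os

# The three variables this notebook expects to find in the environment file
REQUIRED_VARS = ("USERNAME", "PASSWORD", "HOST_URL")


def missing_credentials(env=None):
    """Return the names of required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


# Hypothetical, partially filled environment used only for illustration
example_env = {
    "USERNAME": "user@example.com",
    "PASSWORD": "",
    "HOST_URL": "https://example.teselagen.test",
}
print(missing_credentials(example_env))  # ['PASSWORD']
```

Calling `missing_credentials()` with no argument inspects the real process environment, so it reports which of the three values still need to be filled in before authenticating.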

```python
# Step 1: Import required libraries
from dotenv import load_dotenv
import requests
import os
import json
import pandas as pd

# Load the variables from the .env file
load_dotenv('./credentials.env', override=True)

# Read credentials from the environment file
USERNAME = os.getenv('USERNAME')
PASSWORD = os.getenv('PASSWORD')
HOST_URL = os.getenv('HOST_URL')

# Create a persistent session object and set default headers
session = requests.Session()
session.headers.update({'Content-Type': 'application/json', 'Accept': 'application/json'})

# Define credentials to be sent for authentication
credentials_json = {
    'username': USERNAME,
    'password': PASSWORD,
    'expiresIn': '1w',  # Session token will expire in 1 week
}

# Send authentication request
response = session.put(url=f'{HOST_URL}/public/auth', json=credentials_json)
response.raise_for_status()

# Update session headers to include the token
session.headers.update({
    'x-tg-cli-token': response.json()['token']
})

print("Successfully authenticated!")
```

## Step 2: Define Query to Filter Microbial Strains
We will define a query to retrieve microbial strains based on the size of their plasmids. The query fetches strains with plasmids that are at least 8000 base pairs long or at most 3000 base pairs long. The two size conditions are combined in an `or` group, so a strain matches if either one holds.


```python
# Step 2: Define the query to retrieve microbial strains based on plasmid size
query = {
    "__objectType": "query",
    "type": "root",
    "entity": "strain",
    "filters": [
        {
            "type": "group",
            "operator": "or",
            "filters": [
                {
                    "type": "expression",
                    "operator": "greaterThanOrEqual",
                    "field": "strainPlasmids.polynucleotideMaterial.polynucleotideMaterialSequence.size",
                    "args": ["8000"]
                },
                {
                    "type": "expression",
                    "operator": "lessThanOrEqual",
                    "field": "strainPlasmids.polynucleotideMaterial.polynucleotideMaterialSequence.size",
                    "args": ["3000"]
                }
            ]
        }
    ]
}

# Convert the query to a JSON string for the request
query_string = json.dumps(query)
query_params = {
    'filter': query_string
}
```
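
To try other size thresholds without editing the nested dictionary by hand, the same structure can be generated by a small helper. This is only a sketch: `build_size_query` and the `example_` names are hypothetical, and the helper simply reproduces the keys and operators used in the query above rather than any documented builder from TeselaGen.

```python
import json

# Field path copied from the query above
SIZE_FIELD = "strainPlasmids.polynucleotideMaterial.polynucleotideMaterialSequence.size"


def build_size_query(large_min, small_max):
    """Build a strain query matching plasmid size >= large_min OR <= small_max."""

    def expression(operator, value):
        # The query above passes thresholds as strings, so do the same here
        return {
            "type": "expression",
            "operator": operator,
            "field": SIZE_FIELD,
            "args": [str(value)],
        }

    return {
        "__objectType": "query",
        "type": "root",
        "entity": "strain",
        "filters": [
            {
                "type": "group",
                "operator": "or",
                "filters": [
                    expression("greaterThanOrEqual", large_min),
                    expression("lessThanOrEqual", small_max),
                ],
            }
        ],
    }


example_query = build_size_query(8000, 3000)
example_params = {"filter": json.dumps(example_query)}
```

With the thresholds 8000 and 3000, `example_query` has the same content as the hand-written `query` dictionary.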

## Step 3: Retrieve Microbial Strains Data
We will now send a GET request to the microbial-strain endpoint of the TeselaGen API using the query we defined above. The response will include detailed information about the microbial strains that meet the specified plasmid size criteria.
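
Since the filter travels as a single JSON string in the `filter` query parameter, it can help to see what ends up in the URL. The sketch below uses only the standard library to approximate the percent-encoding that `requests` applies to `params`. The host name and the shortened query are placeholders chosen for illustration.

```python
import json
from urllib.parse import parse_qs, urlencode, urlsplit

# Shortened stand-in for the full query dictionary
example_query = {"__objectType": "query", "type": "root", "entity": "strain", "filters": []}

# Serialize the query to JSON, then percent-encode it as the "filter" parameter
encoded = urlencode({"filter": json.dumps(example_query)})
example_url = f"https://example.teselagen.test/microbial-strain?{encoded}"

# Decoding the query string and parsing the JSON recovers the original dictionary
decoded = json.loads(parse_qs(urlsplit(example_url).query)["filter"][0])
print(decoded == example_query)  # True
```

Printing `example_url` is a quick way to confirm that the whole filter was serialized as one parameter value when debugging an unexpected response.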


```python
# Step 3: Send a GET request to retrieve microbial strains data
response = session.get(url=f'{HOST_URL}/microbial-strain', params=query_params)
response.raise_for_status()
results = response.json()

print("Data successfully retrieved!")
```

## Step 4: Organize and Display Data
We will process the data using pandas to create a structured DataFrame. The DataFrame will contain information such as:
- Strain ID, Name, and Biosafety Level
- Plasmid ID, Name, Size, and Sequence

This will allow easy viewing, analysis, and further processing of the microbial strain data.
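
The reshaping produces one row per strain and plasmid pair. As a rough illustration of that flattening, here is a pure-Python sketch that also skips strains with no `plasmids` entry, a case in which `pd.json_normalize` with a `record_path` raises a `KeyError`. The payload shape (a list of strain objects, each with a `plasmids` list) is an assumption inferred from the column names used in this notebook, and both `flatten_strains` and the sample records are hypothetical.

```python
STRAIN_KEYS = ("id", "name", "biosafetyLevel")
PLASMID_KEYS = ("id", "name", "size", "sequence")


def flatten_strains(strains):
    """Return one flat dict per (strain, plasmid) pair."""
    rows = []
    for strain in strains:
        # Tolerate strains whose "plasmids" key is missing or None
        for plasmid in strain.get("plasmids") or []:
            row = {key: strain.get(key) for key in STRAIN_KEYS}
            row.update({f"plasmid_{key}": plasmid.get(key) for key in PLASMID_KEYS})
            rows.append(row)
    return rows


# Made-up records in the assumed response shape
example_results = [
    {
        "id": "s1",
        "name": "Strain A",
        "biosafetyLevel": "BSL1",
        "plasmids": [
            {"id": "p1", "name": "pLarge", "size": 9000, "sequence": "ATGC"},
            {"id": "p2", "name": "pSmall", "size": 2500, "sequence": "GGCC"},
        ],
    },
    {"id": "s2", "name": "Strain B", "biosafetyLevel": "BSL1"},  # no "plasmids" key
]

rows = flatten_strains(example_results)
print(len(rows))  # 2
```

Passing `rows` to `pd.DataFrame` would give a table with the same column names as the ones selected in this step.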


```python
# Step 4: Organize the retrieved data into a pandas DataFrame
df = pd.json_normalize(
    results,
    record_path=['plasmids'],
    meta=['id', 'name', 'biosafetyLevel'],
    record_prefix='plasmid_'
)

# Select relevant columns
df = df[['id', 'name', 'biosafetyLevel', 'plasmid_id', 'plasmid_name', 'plasmid_size', 'plasmid_sequence']]

# Display the first few rows of the DataFrame
df.head()
```
