In this script, we use Gemini to suggest factors that may explain the similarity in AMR gene profiles within each bacterial community.

In [21]:
import requests
import pandas as pd
import re
import time


In [None]:
API_KEY = "Enter Your API Key Here"
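Hardcoding the key in a notebook makes it easy to leak when the file is shared. A minimal sketch of one alternative, assuming the key is stored in an environment variable; the variable name `GEMINI_API_KEY` and the helper `load_api_key` are illustrative choices, not something this script or the Gemini API requires:

```python
import os


def load_api_key(env_var="GEMINI_API_KEY"):
    """Return the API key stored in the environment variable `env_var`.

    The variable name is a hypothetical choice made for this sketch.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable before running.")
    return key
```

With this in place, the assignment above could become `API_KEY = load_api_key()`.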

In [27]:

# Gemini REST endpoint
ENDPOINT = "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.0-flash:generateContent"

# Gemini Request headers
headers = {
    "Content-Type": "application/json",
    "X-goog-api-key": API_KEY
}

# Load Community File 
df = pd.read_csv("../Results/community_amr_profiles.tsv", sep="|")
df.columns = df.columns.str.strip()  # Clean up column names

# AMR Parser
def parse_amr(amr_str):
    gene_counts = {}
    
    # Match each comma-separated "gene:count" entry (gene names cannot contain commas)
    entries = re.findall(r'([^,]+?:\d+)', amr_str)
    
    for entry in entries:
        try:
            gene, count = entry.rsplit(":", 1)
            gene_counts[gene.strip()] = int(count.strip())
        except ValueError:
            print(f"⚠️ Skipping malformed entry: {entry}")
    
    return gene_counts

# Iterate through each community and send prompts to Gemini
with open("../Results/communities_gemini_analysis_results.txt", "w", encoding="utf-8") as f:
    for idx, row in df.iterrows():
        community_id = row["community_id"]
        members = row["members"]
        amr_dict = parse_amr(row["amr_gene_counts"])

        prompt_text = (
            f"Community {community_id} includes the following bacterial species:\n{members}\n\n"
            f"The AMR gene counts in this community are:\n{amr_dict}\n\n"
            f"Based on online literature, what possible factors relevant to the community members "
            f"explain the similarity in AMR gene profiles observed among these organisms?\n"
            f"Please provide the explanations with respect to the community members and the AMR profile,\n"
            f"Provide references."
        )

        data = {
            "contents": [
                {
                    "parts": [{"text": prompt_text}]
                }
            ]
        }

        print(f"\n🧠 Community {community_id} - Analyzing...\n")

        try:
            # Set a timeout so a stalled request cannot hang the loop indefinitely
            response = requests.post(ENDPOINT, headers=headers, json=data, timeout=60)
            response.raise_for_status()
            output = response.json()
            text_response = output['candidates'][0]['content']['parts'][0]['text']
            print(text_response)

            f.write(f"\n=== Community {community_id} ===\n")
            f.write(f"\n Members: {members} \n")
            f.write(f"\n AMR Genes: {amr_dict} \n")

            f.write(text_response + "\n")
            f.write("="*80 + "\n")

        except Exception as e:
            error_msg = f"❌ Error analyzing community {community_id}: {e}\n"
            print(error_msg)
            f.write(f"\n=== Community {community_id} ===\n")
            f.write(error_msg)
            f.write("="*80 + "\n")

        time.sleep(2)

print("\n✅ All results saved to '../Results/communities_gemini_analysis_results.txt'")
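The loop above reads the reply with `output['candidates'][0]['content']['parts'][0]['text']`, which raises a `KeyError` or `IndexError` when a reply carries no candidates or no text parts (for example, when a prompt is blocked), and otherwise keeps only the first part. A minimal sketch of a more defensive extraction follows. `extract_text` and `describe_empty_reply` are hypothetical helpers, and the use of `promptFeedback.blockReason` assumes the response shape described in the Gemini REST documentation:

```python
def extract_text(output):
    """Join the text parts of the first candidate, or return None if there are none."""
    candidates = output.get("candidates") or []
    if not candidates:
        return None
    content = candidates[0].get("content") or {}
    parts = content.get("parts") or []
    texts = [part["text"] for part in parts if "text" in part]
    return "".join(texts) if texts else None


def describe_empty_reply(output):
    """Build a short message for a reply that contained no text."""
    feedback = output.get("promptFeedback") or {}
    reason = feedback.get("blockReason", "unknown")
    return f"No text returned (block reason: {reason})"
```

Inside the loop, `text_response = extract_text(output)` could replace the direct indexing, with `describe_empty_reply(output)` written to the results file whenever it returns `None`.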




🧠 Community 1 - Analyzing...

Okay, let's break down the potential factors driving the observed AMR gene profile similarity in Community 1, considering the bacterial species present and the AMR genes detected.

**Key Observations:**

*   **Dominant Genera:** *Pseudomonas* and *Bradyrhizobium* are heavily represented, along with *Paraburkholderia*, *Rhizobium*, and other members of the *Rhizobiales*. *Lysobacter*, *Cupriavidus*, *Janthinobacterium*, and *Pseudoduganella* are also represented.
*   **AMR Gene Profile:**  The AMR profile is dominated by `adeF`, followed by `rsmA`, `BJP-1`, and `FosA8`. The AMR profile also contains other AMR genes coding for aminoglycoside and quaternary ammonium compound resistance.

**Possible Explanations for AMR Gene Similarity:**

Several factors could explain the shared AMR gene profile:

1.  **Horizontal Gene Transfer (HGT):**  This is the most likely driver. Bacteria can share genetic material, including AMR genes, through various mechanisms.

   