<a href="https://colab.research.google.com/github/akshayonly/Bioinformatics-codes/blob/master/Fetch_Abstracts_From_Titles.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
pip install biopython

Collecting biopython
  Downloading biopython-1.85-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (13 kB)
Downloading biopython-1.85-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (3.3 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 3.3/3.3 MB 43.1 MB/s eta 0:00:00
Installing collected packages: biopython
Successfully installed biopython-1.85


In [None]:
#!/usr/bin/env python3
# -*- coding: utf-8 -*-

"""
Fetch abstracts and authors for a list of publication titles.
Primary source: NCBI PubMed (Biopython Entrez).
Fallback: Crossref REST API (handles many journals + preprints).
Output: publications_metadata.csv

Usage:
  - Fill TITLES with the 21 titles from the μEvo Lab publications page.
  - Set Entrez.email (and optionally Entrez.api_key).
"""

import time
import re
import html
import csv
from difflib import SequenceMatcher
from typing import List, Dict, Optional, Any

import requests
import pandas as pd
from Bio import Entrez
from Bio import Medline

# ========= User settings =========
Entrez.email = "your_email@your_domain.com"   # REQUIRED by NCBI policies
Entrez.api_key = None                         # Optional: "YOUR_NCBI_API_KEY"

# The 21 titles to resolve (exact titles work best).

TITLES = [
    "Facultative cheating and hybrid vigor resolves cooperator-cheater conflict in a yeast public-goods system",
    "Effects of resource packaging on the adaptative and pleiotropic consequences of evolution",
    "Resource presentation dictates genetic and phenotypic adaptation in yeast",
    "On the metabolic basis and predictability of global epistasis",
    "Variations and predictability of epistasis on an intragenic fitness landscape",
    "Empirical evidence of resource-dependent evolution of payoff matrices in Saccharomyces cerevisiae populations",
    "Synonymous and single nucleotide changes facilitate the adaptation of a horizontally transferred gene",
    "Toward the integration of speciation research",
    "Convergent genetic adaptation of Escherichia coli in minimal media leads to pleiotropic divergence",
    "Ecological disruptive selection acting on quantitative loci can drive sympatric speciation",
    "Increased privatization of a public resource leads to spread of cooperation in a microbial population",
    "Rapid evolution of pre-zygotic barriers in allopatric populations",
    "Exploring the Influence of pH on the Dynamics of Acetone–Butanol–Ethanol Fermentation",
    "Public-good driven release of heterogeneous resources leads to genotypic diversification of an isogenic yeast population in melibiose",
    "A method to determine the mating efficiency of haploids in Saccharomyces cerevisiae",
    "Selection in a growing bacterial/yeast colony biases results of mutation accumulation experiments",
    "Limited pairwise synergistic and antagonistic interactions impart stability to microbial communities",
    "GAL regulon in the yeast S. cerevisiae is highly evolvable via acquisition in the coding regions of the regulatory elements of the network",
    "Evolution of multicellularity and unicellularity in yeast S. cerevisiae to study reversibility of evolutionary trajectories",
    "Cell growth model with stochastic gene expression helps understand the growth advantage of metabolic exchange and auxotrophy",
    "Experimental evolution of anticipatory gene regulation in Escherichia coli"
]


# Respect NCBI rate limits: ~3 req/sec w/o key, up to ~10 req/sec with key.
SLEEP_SECONDS = 0.12 if Entrez.api_key else 0.34

# =================================

def normalize_title(t: str) -> str:
    t = html.unescape(t or "").strip()
    t = re.sub(r"\s+", " ", t)
    return t

def title_similarity(a: str, b: str) -> float:
    a_norm, b_norm = normalize_title(a).lower(), normalize_title(b).lower()
    return SequenceMatcher(None, a_norm, b_norm).ratio()

def best_pubmed_match_for_title(title: str, max_hits: int = 10, min_sim: float = 0.82) -> Optional[str]:
    """Return the PubMed ID (PMID) with best title similarity, else None."""
    query = f'"{title}"[Title]'
    try:
        with Entrez.esearch(db="pubmed", term=query, retmax=max_hits) as handle:
            result = Entrez.read(handle)
    except Exception:
        time.sleep(SLEEP_SECONDS)
        return None

    idlist = result.get("IdList", [])
    best = (None, 0.0)
    for pmid in idlist:
        time.sleep(SLEEP_SECONDS)
        try:
            with Entrez.efetch(db="pubmed", id=pmid, rettype="medline", retmode="text") as h:
                records = list(Medline.parse(h))
        except Exception:
            continue
        for rec in records:
            # Medline.parse stores the title under the "TI" key
            pm_title = rec.get("TI")
            if pm_title:
                sim = title_similarity(title, pm_title)
                if sim > best[1]:
                    best = (pmid, sim)
    if best[0] and best[1] >= min_sim:
        return best[0]
    return None

def fetch_pubmed_details(pmid: str) -> Dict[str, Any]:
    """Return dict with fields from PubMed MEDLINE."""
    time.sleep(SLEEP_SECONDS)
    with Entrez.efetch(db="pubmed", id=pmid, rettype="medline", retmode="text") as handle:
        records = list(Medline.parse(handle))
    if not records:
        return {}
    rec = records[0]
    # Authors: list of "Last FM"
    authors = rec.get("AU", []) or rec.get("FAU", [])
    # Journal + Year
    journal = rec.get("JT") or rec.get("TA") or ""
    year = ""
    if rec.get("DP"):
        m = re.search(r"(\d{4})", rec["DP"])
        if m:
            year = m.group(1)
    # DOI
    doi = ""
    for idv in rec.get("AID", []):
        if idv.endswith("[doi]"):
            doi = idv.replace(" [doi]", "").strip()
            break
    abstract = rec.get("AB", "")
    title = rec.get("TI", "")
    return {
        "Title": normalize_title(title),
        "PubMedID": pmid,
        "DOI": doi,
        "Journal": journal,
        "Year": year,
        "Authors": "; ".join(authors),
        "Abstract": abstract.strip(),
        "Source": "PubMed",
    }

def strip_tags(text: str) -> str:
    if not text:
        return ""
    # Remove simple JATS/HTML tags
    text = re.sub(r"<[^>]+>", "", text)
    return html.unescape(text).strip()

def search_crossref_by_title(title: str, rows: int = 5) -> Optional[Dict[str, Any]]:
    """Return the best Crossref match for title."""
    url = "https://api.crossref.org/works"
    params = {"query.title": title, "rows": rows, "sort": "relevance", "select": "title,DOI,container-title,author,issued,abstract"}
    try:
        r = requests.get(url, params=params, timeout=20)
        r.raise_for_status()
        items = r.json().get("message", {}).get("items", [])
    except Exception:
        return None
    if not items:
        return None
    # Score by title similarity and presence of abstract or authors
    scored = []
    for it in items:
        it_title = ""
        tlist = it.get("title") or []
        if tlist:
            it_title = tlist[0]
        sim = title_similarity(title, it_title)
        bonus = 0.02 if it.get("abstract") else 0.0
        scored.append((sim + bonus, it))
    scored.sort(key=lambda x: x[0], reverse=True)
    best = scored[0][1]
    return best

def parse_crossref_item(item: Dict[str, Any]) -> Dict[str, Any]:
    t = (item.get("title") or [""])[0]
    doi = item.get("DOI") or ""
    journal = (item.get("container-title") or [""])[0]
    year = ""
    issued = item.get("issued", {}).get("date-parts", [])
    if issued and issued[0] and len(issued[0]) > 0:
        year = str(issued[0][0])
    authors_list = []
    for a in item.get("author", []) or []:
        family = a.get("family") or ""
        given = a.get("given") or ""
        name = (family + " " + given).strip() if family or given else ""
        if not name and a.get("name"):
            name = a["name"]
        if name:
            authors_list.append(name)
    abstract = strip_tags(item.get("abstract") or "")
    return {
        "Title": normalize_title(t),
        "PubMedID": "",
        "DOI": doi,
        "Journal": journal,
        "Year": year,
        "Authors": "; ".join(authors_list),
        "Abstract": abstract,
        "Source": "Crossref",
    }

def get_metadata_for_title(title: str) -> Dict[str, Any]:
    title = normalize_title(title)
    # 1) Try PubMed
    pmid = best_pubmed_match_for_title(title)
    if pmid:
        try:
            return fetch_pubmed_details(pmid)
        except Exception:
            pass
    # 2) Fall back to Crossref
    item = search_crossref_by_title(title)
    if item:
        return parse_crossref_item(item)
    # 3) Nothing found
    return {
        "Title": title,
        "PubMedID": "",
        "DOI": "",
        "Journal": "",
        "Year": "",
        "Authors": "",
        "Abstract": "",
        "Source": "Not found",
    }

def main(titles: List[str]) -> pd.DataFrame:
    rows = []
    for i, t in enumerate(titles, 1):
        print(f"[{i}/{len(titles)}] Resolving: {t}")
        try:
            meta = get_metadata_for_title(t)
        except Exception as e:
            meta = {
                "Title": normalize_title(t),
                "PubMedID": "",
                "DOI": "",
                "Journal": "",
                "Year": "",
                "Authors": "",
                "Abstract": f"ERROR: {e}",
                "Source": "Error",
            }
        rows.append(meta)
    data = pd.DataFrame(rows, columns=["Title","PubMedID","DOI","Journal","Year","Authors","Abstract","Source"])
    data.to_csv("publications_metadata.csv", index=False, quoting=csv.QUOTE_MINIMAL)
    print("Saved: publications_metadata.csv")
    return data

if __name__ == "__main__":
    if not TITLES:
        print("Please paste your 21 titles into the TITLES list at the top of this script.")
    else:
        main(TITLES)


[1/21] Resolving: Facultative cheating and hybrid vigor resolves cooperator-cheater conflict in a yeast public-goods system
[2/21] Resolving: Effects of resource packaging on the adaptative and pleiotropic consequences of evolution
[3/21] Resolving: Resource presentation dictates genetic and phenotypic adaptation in yeast
[4/21] Resolving: On the metabolic basis and predictability of global epistasis
[5/21] Resolving: Variations and predictability of epistasis on an intragenic fitness landscape
[6/21] Resolving: Empirical evidence of resource-dependent evolution of payoff matrices in Saccharomyces cerevisiae populations
[7/21] Resolving: Synonymous and single nucleotide changes facilitate the adaptation of a horizontally transferred gene
[8/21] Resolving: Toward the integration of speciation research
[9/21] Resolving: Convergent genetic adaptation of Escherichia coli in minimal media leads to pleiotropic divergence
[10/21] Resolving: Ecological disruptive selection acting on quantitati
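
The Crossref fallback in the script above returns the top-scored item without a minimum similarity, so a loosely related record could be accepted for a title that Crossref does not index. The following is a minimal sketch of a stricter selection step, not part of the original script: `match_key`, `pick_candidate`, and the `min_sim=0.9` threshold are hypothetical names and values chosen for illustration. It also ignores punctuation when comparing, so that "public-goods" and "Public Goods" score as identical.

```python
import re
from difflib import SequenceMatcher
from typing import List, Optional, Tuple


def match_key(title: str) -> str:
    """Lowercase the title and replace punctuation with spaces before comparing."""
    key = re.sub(r"[^\w\s]", " ", title.lower())
    return re.sub(r"\s+", " ", key).strip()


def pick_candidate(
    query: str, candidates: List[str], min_sim: float = 0.9
) -> Optional[Tuple[str, float]]:
    """Return (best_title, similarity), or None if no candidate reaches min_sim.

    min_sim is an assumed threshold; tune it against known good matches.
    """
    best: Optional[Tuple[str, float]] = None
    for cand in candidates:
        sim = SequenceMatcher(None, match_key(query), match_key(cand)).ratio()
        if best is None or sim > best[1]:
            best = (cand, sim)
    if best is not None and best[1] >= min_sim:
        return best
    return None


if __name__ == "__main__":
    hit = pick_candidate(
        "a yeast public-goods system",
        ["A Yeast Public Goods System", "Unrelated paper on protein folding"],
    )
    print(hit)
```

A rejected candidate could then be recorded with `Source` set to "Not found" rather than stored as a possibly wrong match.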

In [None]:
!head -10 /content/publications_metadata.csv

Title,PubMedID,DOI,Journal,Year,Authors,Abstract,Source
Facultative Cheating and Hybrid Vigor Resolves Cooperator-Cheater Conflict in a Yeast Public Goods System,,10.1101/2025.04.28.651155,,2025,Raj Namratha; Saini Supreet,"AbstractThe persistence of cooperation in the face of cheating is a central paradox in evolutionary biology. Microbial public goods systems employ diverse solutions to this dilemma, yet most studies assume fixed strategies wherein genotypes function strictly as cooperators or cheaters. Here, using the GAL/MEL regulon ofSaccharomyces cerevisiae, we uncover a dynamic resolution to this conflict through facultative strategy switching. When haploid cheater-cooperator strains were co-evolved in melibiose, we observed the repeated emergence of same-mating-type diploid hybrids. These hybrids arise early in evolution and ultimately spread in the population. The hybrids exploit the public good produced by cooperator strains when present, acting as facultative cheaters. Howev
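
The abstract above starts with "AbstractThe" and contains "ofSaccharomyces". This is likely a side effect of `strip_tags` deleting markup without leaving a space: Crossref abstracts are usually JATS XML, where a heading such as `<jats:title>Abstract</jats:title>` sits directly against the first paragraph. Below is a hedged sketch of an alternative cleaner. The name `clean_jats_abstract` is hypothetical, and the markup in the demo is an assumed example, not the actual Crossref record.

```python
import html
import re


def clean_jats_abstract(raw: str) -> str:
    """Strip JATS/HTML tags without fusing the words on either side of a tag."""
    if not raw:
        return ""
    # Drop a leading "Abstract" heading element entirely.
    text = re.sub(
        r"<(?:jats:)?title>\s*Abstract\s*</(?:jats:)?title>",
        " ",
        raw,
        flags=re.IGNORECASE,
    )
    # Replace every remaining tag with a space instead of nothing.
    text = re.sub(r"<[^>]+>", " ", text)
    text = html.unescape(text)
    # Collapse the extra whitespace introduced above.
    text = re.sub(r"\s+", " ", text).strip()
    # Remove a space that ended up before closing punctuation.
    return re.sub(r"\s+([,.;:)])", r"\1", text)


if __name__ == "__main__":
    sample = (
        "<jats:title>Abstract</jats:title><jats:p>Here, using the regulon of"
        "<jats:italic>Saccharomyces cerevisiae</jats:italic>, we test &amp; report."
        "</jats:p>"
    )
    print(clean_jats_abstract(sample))
```

The trade-off is that inline markup inside a word (for example a subscript in a chemical formula) would gain spaces, so this suits plain-text summaries better than exact reproduction.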

In [None]:
import pandas as pd

In [None]:
data = pd.read_csv('/content/publications_metadata.csv')

In [None]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 21 entries, 0 to 20
Data columns (total 8 columns):
 #   Column    Non-Null Count  Dtype  
---  ------    --------------  -----  
 0   Title     21 non-null     object 
 1   PubMedID  0 non-null      float64
 2   DOI       21 non-null     object 
 3   Journal   13 non-null     object 
 4   Year      21 non-null     int64  
 5   Authors   21 non-null     object 
 6   Abstract  19 non-null     object 
 7   Source    21 non-null     object 
dtypes: float64(1), int64(1), object(6)
memory usage: 1.4+ KB
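
The summary above reports 0 non-null `PubMedID` values, so none of the 21 records came from PubMed. Because `best_pubmed_match_for_title` returns `None` on any exception without reporting it, the output cannot distinguish "no PubMed match" from "every request failed". The sketch below shows one way to surface such failures; `call_with_retry` and its parameters are hypothetical and not part of the original script.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def call_with_retry(
    func: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 0.5,
    label: str = "request",
) -> Optional[T]:
    """Call func(); on failure, print the error and retry with exponential backoff.

    Returns None only after every attempt has failed.
    """
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except Exception as exc:
            print(f"  {label} failed (attempt {attempt}/{attempts}): {exc!r}")
            if attempt < attempts:
                # Wait base_delay, then 2x, 4x, ... before the next attempt.
                time.sleep(base_delay * 2 ** (attempt - 1))
    return None


if __name__ == "__main__":
    state = {"calls": 0}

    def flaky() -> str:
        # Simulated request that fails twice before succeeding.
        state["calls"] += 1
        if state["calls"] < 3:
            raise ConnectionError("simulated network error")
        return "ok"

    print(call_with_retry(flaky, base_delay=0.0, label="demo"))
```

In the script, the body of each `try` block around `Entrez.esearch` or `Entrez.efetch` could be moved into a small function and passed to this helper, so that failures appear in the cell output.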


In [None]:
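# Create combined text: Title, PubMedID (blank when missing), then Abstract, one per line
# PubMedID is read as a numeric column, so cast it to str before concatenating
data["combined"] = (
    data["Title"].fillna("") + "\n"
    + data["PubMedID"].fillna("").astype(str) + "\n"
    + data["Abstract"].fillna("")
)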

# Join all entries with double newlines (blank line between publications)
all_text = "\n\n".join(data["combined"])

# Save to single text file
with open("titles_and_abstracts.txt", "w", encoding="utf-8") as f:
    f.write(all_text)

print("Saved: titles_and_abstracts.txt")

Saved: titles_and_abstracts.txt


In [None]:
data.iloc[0]['combined']

'Facultative Cheating and Hybrid Vigor Resolves Cooperator-Cheater Conflict in a Yeast Public Goods System\n\nAbstractThe persistence of cooperation in the face of cheating is a central paradox in evolutionary biology. Microbial public goods systems employ diverse solutions to this dilemma, yet most studies assume fixed strategies wherein genotypes function strictly as cooperators or cheaters. Here, using the GAL/MEL regulon ofSaccharomyces cerevisiae, we uncover a dynamic resolution to this conflict through facultative strategy switching. When haploid cheater-cooperator strains were co-evolved in melibiose, we observed the repeated emergence of same-mating-type diploid hybrids. These hybrids arise early in evolution and ultimately spread in the population. The hybrids exploit the public good produced by cooperator strains when present, acting as facultative cheaters. However, following cooperator extinction, hybrids switch to a cooperative phenotype. This dynamic role transition enabl

In [None]:
!head /content/titles_and_abstracts.txt

Facultative Cheating and Hybrid Vigor Resolves Cooperator-Cheater Conflict in a Yeast Public Goods System

AbstractThe persistence of cooperation in the face of cheating is a central paradox in evolutionary biology. Microbial public goods systems employ diverse solutions to this dilemma, yet most studies assume fixed strategies wherein genotypes function strictly as cooperators or cheaters. Here, using the GAL/MEL regulon ofSaccharomyces cerevisiae, we uncover a dynamic resolution to this conflict through facultative strategy switching. When haploid cheater-cooperator strains were co-evolved in melibiose, we observed the repeated emergence of same-mating-type diploid hybrids. These hybrids arise early in evolution and ultimately spread in the population. The hybrids exploit the public good produced by cooperator strains when present, acting as facultative cheaters. However, following cooperator extinction, hybrids switch to a cooperative phenotype. This dynamic role transition enables 

In [None]:
data.head(1)

Unnamed: 0,Title,PubMedID,DOI,Journal,Year,Authors,Abstract,Source,combined
0,Facultative Cheating and Hybrid Vigor Resolves...,,10.1101/2025.04.28.651155,,2025,Raj Namratha; Saini Supreet,AbstractThe persistence of cooperation in the ...,Crossref,Facultative Cheating and Hybrid Vigor Resolves...


In [None]:
# Create text file with specified format from dataframe
def create_text_file(data, output_filename='papers_output.txt'):
    """
    Creates a text file from dataframe with format:
    #year1
    title1
    abstract1

    #year2
    title2
    abstract2
    """
    with open(output_filename, 'w', encoding='utf-8') as f:
        # Count rows by position: after sorting, the index labels no longer
        # follow row order, so they cannot identify the last row.
        for position, (_, row) in enumerate(data.iterrows()):
            # Get Year, handling NaN values
            year = row['Year'] if pd.notna(row['Year']) else 'N/A'

            # Get Title, handling NaN values
            title = row['Title'] if pd.notna(row['Title']) else 'N/A'

            # Get Abstract, handling NaN values
            abstract = row['Abstract'] if pd.notna(row['Abstract']) else 'N/A'

            # Write in the specified format
            f.write(f"#{year}\n")
            f.write(f"{title}\n")
            f.write(f"{abstract}\n")

            # Add blank line between entries (except for the last one)
            if position < len(data) - 1:
                f.write("\n")

    print(f"Text file '{output_filename}' created successfully!")
    print(f"Total entries: {len(data)}")



In [None]:
min_data = data.sort_values('Year', ascending=False)[['Title', 'Year', 'Abstract']]

In [None]:
create_text_file(min_data, output_filename='papers_output.txt')

Text file 'papers_output.txt' created successfully!
Total entries: 21


In [None]:
!head -10 /content/papers_output.txt

#2025
Facultative Cheating and Hybrid Vigor Resolves Cooperator-Cheater Conflict in a Yeast Public Goods System
AbstractThe persistence of cooperation in the face of cheating is a central paradox in evolutionary biology. Microbial public goods systems employ diverse solutions to this dilemma, yet most studies assume fixed strategies wherein genotypes function strictly as cooperators or cheaters. Here, using the GAL/MEL regulon ofSaccharomyces cerevisiae, we uncover a dynamic resolution to this conflict through facultative strategy switching. When haploid cheater-cooperator strains were co-evolved in melibiose, we observed the repeated emergence of same-mating-type diploid hybrids. These hybrids arise early in evolution and ultimately spread in the population. The hybrids exploit the public good produced by cooperator strains when present, acting as facultative cheaters. However, following cooperator extinction, hybrids switch to a cooperative phenotype. This dynamic role transition ena
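
To check the exported file, the entries can be read back into records. This is a small sketch, assuming the layout shown above (a `#year` line, a title line, an abstract line, and a blank line between entries) and assuming abstracts contain no blank lines. `parse_papers_text` is a hypothetical helper, not part of the notebook.

```python
from typing import Dict, List


def parse_papers_text(text: str) -> List[Dict[str, str]]:
    """Parse '#year / title / abstract' blocks separated by blank lines."""
    entries: List[Dict[str, str]] = []
    for block in text.strip().split("\n\n"):
        lines = block.split("\n")
        if len(lines) < 3 or not lines[0].startswith("#"):
            continue  # skip blocks that do not follow the expected layout
        entries.append(
            {
                "Year": lines[0][1:],               # drop the leading '#'
                "Title": lines[1],
                "Abstract": "\n".join(lines[2:]),   # keep any extra lines
            }
        )
    return entries


if __name__ == "__main__":
    sample = "#2025\nTitle A\nAbstract A\n\n#2019\nTitle B\nN/A\n"
    for entry in parse_papers_text(sample):
        print(entry)
```

Comparing `len(parse_papers_text(...))` with the number of rows written is a quick way to confirm that no entry was merged or dropped.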