# <span style="color:salmon">ADS</span> in Python

(*Life is too short for manual literature searches*)

**Pycoffee** January, 2026  
Carlos Cifuentes

# 1. Introduction

## What is it?

- The NASA Astrophysics Data System (ADS) is the primary literature database for Astrophysics. 
- People go crazy on the rare occasions when it goes down.
- `ads` is a Python module to interact with the NASA ADS API.

Using the ADS API with Python, we can automate common literature tasks, such as:
- Searching and filtering in a more customisable way
- Retrieving bibliographic entries, formatted for publications or CV
- Building automated weekly digests

## Installation & Setup

Documentation: https://ads.readthedocs.io/.  

1. Obtain your API key. Go to https://ui.adsabs.harvard.edu/user/settings/token
2. Copy your personal token
3. Store it in the file `~/.ads/dev_key` or in the environment variable `ADS_DEV_KEY` (Linux/macOS: `export ADS_DEV_KEY="YOUR_API_KEY"`).

<span style="color:salmon">Caution:</span> Do not store the key directly in your script!  

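To store it as an environment variable (recommended), add it to your shell configuration file from a terminal (`~/.zshrc` for zsh, `~/.bashrc` for bash):  

```bash
# Open the shell configuration file
nano ~/.zshrc
# Add this line to the file, then save and exit
export ADS_DEV_KEY="your_api_key_here"
# Reload the configuration and check that the variable is set
source ~/.zshrc
echo $ADS_DEV_KEY
```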
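If you prefer to set `ads.config.token` explicitly (as the setup cell below does) but want to support both storage options from step 3, a small helper can check the environment variable first and fall back to the key file. This is a sketch: `load_ads_token` is a hypothetical name, not part of the `ads` package, which can also look in these places on its own.

```python
import os
from pathlib import Path

def load_ads_token(env_var="ADS_DEV_KEY", key_file="~/.ads/dev_key"):
    """Return the ADS token from an environment variable or, failing that, from a key file."""
    # 1. The environment variable takes precedence
    token = os.getenv(env_var)
    if token and token.strip():
        return token.strip()
    # 2. Fall back to the key file, if it exists
    path = Path(key_file).expanduser()
    if path.is_file():
        return path.read_text().strip() or None
    # 3. Nothing found
    return None
```

Usage: `ads.config.token = load_ads_token()`.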

In [None]:
!pip install ads termcolor matplotlib pandas requests

In [None]:
import ads
import os
from termcolor import colored
import matplotlib.pyplot as plt
from datetime import datetime

# Set the token from the environment variable
ads.config.token = os.getenv('ADS_DEV_KEY')
# Check that a key was found, without printing the secret itself
print("ADS_DEV_KEY is set:", ads.config.token is not None)

Test ADS server status:

In [None]:
try:
    test = ads.SearchQuery(q="*", rows=1)
    for paper in test:
        print(colored("ADS is online!", "green"))
        break
except Exception as e:
    print(colored(f"ADS is down: {e}", "red"))

# 2. Basic usage

The main class in the `ads` library is `SearchQuery`, which builds and executes queries against the ADS API. 
Its arguments are:

- `q` (query) is the search string. Unfielded terms search the default indexed text (title, abstract, authors, bibcode...); restrict a term to one field with a prefix such as `title:`, `abstract:`, `author:` or `body:` (full text).
- `fl` (Field List): **fields** to return (e.g., `id`, `title`, `author`, `year`, `bibcode`, `bibstem`, `citation_count`).
- `fq` (Filter Query): **filters** (e.g., `pub:A&A`, `property:refereed`, `year:2025`).
- `sort` order (e.g., `date desc`, `citation_count asc`).
- `rows` sets the number of results to get.
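- `max_pages` caps the number of result pages fetched, as a safety limit (default is 3).
  
Iterating over the query yields article objects whose attributes are the requested fields. The results can be ordered by any sortable field with `sort`.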

The list of available **fields** and **filters** is [here](https://adsabs.github.io/help/search/search-syntax).
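Under the hood, these arguments map onto parameters of the ADS HTTP search endpoint. The sketch below, which does not use the `ads` package, shows one plausible encoding: `fl` as a comma-separated string and one `fq` parameter per filter. `build_search_url` is a hypothetical name, and the exact request the library sends may differ.

```python
from urllib.parse import urlencode

# Search endpoint of the ADS API
SEARCH_URL = "https://api.adsabs.harvard.edu/v1/search/query"

def build_search_url(q, fl=None, fq=None, sort=None, rows=10, start=0):
    """Encode SearchQuery-style arguments as a GET URL for the ADS search endpoint (sketch)."""
    params = {"q": q, "rows": rows, "start": start}
    if fl:
        params["fl"] = ",".join(fl)  # field list as one comma-separated string
    if fq:
        params["fq"] = list(fq)      # one fq=... parameter per filter
    if sort:
        params["sort"] = sort
    # doseq=True expands the fq list into repeated parameters
    return SEARCH_URL + "?" + urlencode(params, doseq=True)

print(build_search_url("inflation", fl=["title", "year"], fq=["year:2025"], sort="citation_count desc", rows=5))
# https://api.adsabs.harvard.edu/v1/search/query?q=inflation&rows=5&start=0&fl=title%2Cyear&fq=year%3A2025&sort=citation_count+desc
```

Such a URL would be requested with the same `Authorization: Bearer <token>` header used for the BibTeX export in section 3.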



### <span style="color:salmon">**Example**</span> Requesting fields

Search for the _most cited_ article published in 2025.
Request a broad selection of the available **fields** (`fl`). 

In [None]:
query = ads.SearchQuery(
    q="*",
    fl=["title",
        "author",
        "year",
        "bibcode",
        "pub",
        "citation_count",
        "doi",
        "abstract",
        "keyword",
        "aff",
        "bibstem",
        "property",
        "id",
        "score",
        "read_count",
        "first_author",
        "arxiv_class",
        "database"
    ],
    fq=["year:2025"],
    sort="citation_count desc",
    rows=1
)

for paper in query:
    print(colored("\nCONTENT", "cyan"))
    print(f"{colored(paper.title[0], 'cyan')}")
    print(f"Abstract: {paper.abstract[:200] + '...' if paper.abstract else 'N/A'}")  
    print(f"Keywords: {paper.keyword}")

    print(colored("\nAUTHOR(S)", "green"))
    print(f"Author: {paper.first_author}")
    print(f"All authors: {colored('; '.join(paper.author), 'green')}")
    print(f"Affiliations: {paper.aff[0] if paper.aff else 'N/A'}")

    print(colored("\nJOURNAL", "cyan"))
    print(f"Bibcode: {colored(paper.bibcode, 'cyan')}")
    print(f"Journal: {paper.pub}")
    print(f"Year: {paper.year}")
    print(f"DOI: {paper.doi[0] if paper.doi else 'N/A'}")

    print(f"\nArXiv class: {paper.arxiv_class}")
    print(f"Bibstem: {paper.bibstem}")
    print(f"Database: {paper.database}")
    print(f"Properties: {paper.property}")

    print(colored("\nMETRICS", "yellow"))
    print(f"Citations: {colored(str(paper.citation_count), 'yellow')}")
    print(f"Read count: {paper.read_count}")
    print(f"Score: {paper.score}")

### <span style="color:salmon">**Example**</span> Using filters

Search for the _top 3 most cited_ articles in Astronomy & Astrophysics.
Combine several **filters** (`fq`). 

`bibstem` is the field for the journal abbreviation (e.g. `A&A`, `AJ`, `ApJ`, `MNRAS`, `PASP`, `Natur`, `Sci`).

In [None]:
query = ads.SearchQuery(
    q="*",
    fl=["title",
        "author",
        "year",
        "citation_count"
    ],
    fq=["collection:Astronomy", 
        "property:refereed", 
        "pubdate:[1900-01-01 TO 2025-12-31]",
        "citation_count:[10 TO *]",
        "doctype:article",
        "bibstem:A&A"
        ],
    sort="citation_count desc",
    rows=3
)

for paper in query:
    print(f"\n{colored(paper.title[0], 'cyan')}")
    print(f"  Authors: {colored('; '.join(paper.author), 'green')}")
    print(f"  Year: {paper.year}")
    print(f"  Citations: {colored(str(paper.citation_count), 'yellow')}")


### <span style="color:salmon">**Example**</span> Search for word(s)

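Search for the word _inflation_ with an unfielded query (title, abstract and the other default fields). 
Sort by citation count (most cited first).

**Tip:** Only the fields listed in `fl` are downloaded with the query. Accessing any other attribute triggers lazy loading: the `ads` library makes an extra API call for each article, which is slower and uses up your API rate limit.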

In [None]:
query = ads.SearchQuery(
    q="inflation",
    fl=["title", "citation_count", "year", "author"],
    sort="citation_count desc"  # most cited first
)

for i, paper in enumerate(query, 1):
    authors = paper.author[0] if paper.author else "Unknown"
    print(f"{i}. {colored(paper.title[0], 'cyan')} ({paper.year}), {colored(authors, 'green')} — Citations: {colored(str(paper.citation_count), 'yellow')}")

### <span style="color:salmon">**Example**</span> Search excluding words

Search for refereed papers from the last 2 years that mention <u>_gaia_</u> and _planet_ but not <u>_galaxies_</u>.
Sort by date (most recent first).

In [None]:
query = ads.SearchQuery(
    q="gaia planet -galaxies -galaxy",
    fl=["title", "author", "year", "bibcode", "pub", "citation_count"],
    fq=["collection:Astronomy", "property:refereed", "pubdate:[NOW-2YEARS TO NOW]"],
    sort="date desc",
    rows=10
)

for i, paper in enumerate(query, 1):
    authors = "; ".join(paper.author[:3]) + ("..." if len(paper.author) > 3 else "")
    print(f"\n{i}. {colored(paper.title[0], 'cyan')}")
    print(f"   Authors: {colored(authors, 'green')}")
    print(f"   Year: {paper.year} | Bibcode: {colored(paper.bibcode, 'blue')}")
    print(f"   Citations: {colored(str(paper.citation_count), 'yellow')}")
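The `q` string above combines required terms and `-`excluded terms by hand. When the lists grow, a small hypothetical helper can assemble it: terms are joined with a space (as above) or an explicit operator such as `AND`, and multi-word terms are wrapped in double quotes so that they are searched as phrases.

```python
def build_query(include, exclude=(), operator=" "):
    """Assemble an ADS query string from terms to include and terms to exclude (hypothetical helper)."""
    def term(text):
        # Quote multi-word terms so they are searched as phrases
        return f'"{text}"' if " " in text else text

    positive = operator.join(term(t) for t in include)
    negative = " ".join(f"-{term(t)}" for t in exclude)
    return f"{positive} {negative}".strip()

print(build_query(["gaia", "planet"], ["galaxies", "galaxy"]))
# gaia planet -galaxies -galaxy
```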


### <span style="color:salmon">**Example**</span> Search by object

Search papers focused on a specific object using _keyword_ content.

- With quotes: "Luhman 16" - Finds papers where "Luhman 16" appears EXACTLY as a keyword
- Without quotes: Luhman16 - Finds papers where "Luhman16" appears as a single word/keyword
- Use `*` or `?` as wildcards: `*` matches any sequence of characters (including none); `?` matches exactly one character.
- Inside a name, a hyphen (-) works as a space ( ).

GJ48* → GJ48, GJ486, GJ488...   
GJ4?6 → GJ486, GJ406...
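Python's standard `fnmatch` module uses the same two wildcards, so it is a convenient way to check locally which names a pattern would cover. This only illustrates the wildcard semantics on a hand-written list of names; it does not reproduce how ADS tokenizes object names.

```python
from fnmatch import fnmatchcase

def wildcard_matches(pattern, candidates):
    """Return the candidates that match a pattern using * and ? wildcards (case-sensitive)."""
    return [name for name in candidates if fnmatchcase(name, pattern)]

names = ["GJ48", "GJ486", "GJ488", "GJ406", "GJ49"]
print(wildcard_matches("GJ48*", names))  # ['GJ48', 'GJ486', 'GJ488']
print(wildcard_matches("GJ4?6", names))  # ['GJ486', 'GJ406']
```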


In [None]:
query = ads.SearchQuery(
    q='keyword:"Luhman-16"',
    # q='abstract:Luhman 16',
    fl=["title", "author", "year", "bibcode", "citation_count", "keyword"],
    fq=["property:refereed"],
    sort="citation_count desc",
    rows=20
)

for i, paper in enumerate(query, 1):
    authors = "; ".join(paper.author[:3]) + ("..." if len(paper.author) > 3 else "")
    keywords = " - ".join(paper.keyword) if paper.keyword else "N/A"
    print(f"\n{i}. {colored(paper.title[0], 'cyan')}")
    print(f"   Authors: {colored(authors, 'green')}")
    print(f"   Year: {paper.year} | Bibcode: {colored(paper.bibcode, 'cyan')}")
    print(f"   Citations: {colored(str(paper.citation_count), 'yellow')}")
    print(f"   Keywords: {keywords}")
    

# 3. Automate your bibliography

### <span style="color:salmon">**Example**</span> Fetching the BibTeX using author-year combination

Obtain the bibliographic entry in BibTeX format given the first-author name and the year:
(author, year) → bibcode → **BibTeX**  
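The bibcode in the middle of this chain is a fixed-width, 19-character identifier: year (4), journal abbreviation or bibstem (5), volume (4), qualifier (1), page (4) and first-author initial (1), padded with dots. The hypothetical `parse_bibcode` below splits it under that assumption; e-print bibcodes (e.g. arXiv) and page numbers above 9999 use the fields differently, so treat the output as a hint.

```python
def parse_bibcode(bibcode):
    """Split a standard 19-character ADS bibcode into its fields (illustrative sketch)."""
    if len(bibcode) != 19:
        raise ValueError(f"Expected 19 characters, got {len(bibcode)}: {bibcode!r}")
    return {
        "year": int(bibcode[0:4]),
        "bibstem": bibcode[4:9].rstrip("."),  # journal: left-aligned, padded on the right
        "volume": bibcode[9:13].lstrip("."),  # right-aligned, padded on the left
        "qualifier": bibcode[13].strip("."),  # e.g. 'L' for letters; often just '.'
        "page": bibcode[14:18].lstrip("."),   # right-aligned, padded on the left
        "initial": bibcode[18],               # first letter of the first author's surname
    }

# Mayor & Queloz (1995), Nature 378, 355
print(parse_bibcode("1995Natur.378..355M"))
# {'year': 1995, 'bibstem': 'Natur', 'volume': '378', 'qualifier': '', 'page': '355', 'initial': 'M'}
```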

In [None]:
import requests

first_author = "Mayor, M"
year = 1995

query = ads.SearchQuery(
    q=f'first_author:"{first_author}" AND year:{year} AND property:refereed',
    fl=["bibcode", "citation_count"],
    sort="citation_count desc",
    rows=5
)

found = False
for paper in query:
    found = True
    bibcode = paper.bibcode
    print(f"Found: {colored(bibcode, 'cyan')}")
    print(f"Citation count: {colored(str(paper.citation_count), 'yellow')}")
    
    # Fetch BibTeX for this bibcode
    export_url = 'https://api.adsabs.harvard.edu/v1/export/bibtex'
    headers = {
        'Authorization': f'Bearer {ads.config.token}',
        'Content-Type': 'application/json'
    }
    payload = {'bibcode': [bibcode]}
    export_response = requests.post(export_url, headers=headers, json=payload)
    
    if export_response.status_code == 200:
        bibtex = export_response.json()['export']
        print(colored("\nBibTeX:", "yellow"))
        print(bibtex)
    else:
        print(f"Error: {export_response.text}")

if not found:
    print(colored(f"No refereed entries found for '{first_author}' in {year}.", "red"))


### <span style="color:salmon">**Example**</span> Fetching several BibTeX entries from a list of author-year pairs. Create a bibliography file.

We can use the same utility as before, using a list of (author, year) to produce a **.bib** file ready to use in a manuscript.  

<span style="color:salmon">Caution:</span> This method usually finds the right paper by selecting the most cited one for each author-year pair. However, always double-check the results to make sure they're correct.


In [None]:
import pandas as pd
import io

# Create a CSV with author-year pairs
csv_data = """author;year
Mayor, M.;1995
Delfosse, X.;1999
Mamajek, E.;2010
Torres, G.;2012
Tokovinin, A.;2018"""

df = pd.read_csv(io.StringIO(csv_data), delimiter=';')

all_bibtex = []

for idx, row in df.iterrows():
    first_author = row['author']
    year = int(row['year'])
    
    query = ads.SearchQuery(
        q=f'first_author:"{first_author}" AND year:{year} AND property:refereed',
        fl=["bibcode", "citation_count"],
        sort="citation_count desc",
        rows=1
    )
    
    found = False
    for paper in query:
        found = True
        bibcode = paper.bibcode
        print(f"{colored(f'{first_author} ({year})', 'green')}: {colored(bibcode, 'cyan')} - Citations: {colored(str(paper.citation_count), 'yellow')}")
        
        # Fetch BibTeX
        export_url = 'https://api.adsabs.harvard.edu/v1/export/bibtex'
        headers = {
            'Authorization': f'Bearer {ads.config.token}',
            'Content-Type': 'application/json'
        }
        payload = {'bibcode': [bibcode]}
        export_response = requests.post(export_url, headers=headers, json=payload)
        
        if export_response.status_code == 200:
            bibtex = export_response.json()['export']
            all_bibtex.append(bibtex)
        else:
            print(f"  {colored('Error fetching BibTeX', 'red')}")
    
    if not found:
        print(f"{colored(f'No entry for {first_author} ({year})', 'red')}")

# Save all to a .bib file
with open("biblio.bib", "w") as f:
    f.write("\n".join(all_bibtex))

print(f"\nBibTeX file {colored('biblio.bib', 'yellow')} created with {colored(str(len(all_bibtex)), 'green')} entries")
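If two author-year pairs resolve to the same paper, the cell above writes the same entry twice, and BibTeX reports a repeated entry. A hypothetical helper that splits the export strings into entries (assuming each entry starts with `@TYPE{key,` at the beginning of a line) and keeps only the first entry per citation key:

```python
import re

# Start of a BibTeX entry: '@ARTICLE{key,' at the beginning of a line
ENTRY_START = re.compile(r"^@\w+\{([^,\s]+),", re.MULTILINE)

def unique_bibtex_entries(bibtex_chunks):
    """Split BibTeX export strings into entries and keep the first entry for each citation key."""
    seen = {}
    for chunk in bibtex_chunks:
        starts = list(ENTRY_START.finditer(chunk))
        for i, match in enumerate(starts):
            # An entry runs until the next entry start (or the end of the chunk)
            end = starts[i + 1].start() if i + 1 < len(starts) else len(chunk)
            key = match.group(1)
            if key not in seen:
                seen[key] = chunk[match.start():end].strip()
    return list(seen.values())
```

Usage: `f.write("\n\n".join(unique_bibtex_entries(all_bibtex)))` when saving `biblio.bib`.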


# 4. Utilities for your CV

### <span style="color:salmon">**Example**</span> List your refereed papers

- Search for all refereed papers by a given author (in astrophysics)
- Sort them by date (most recent first)
- Escape special LaTeX characters automatically
- Generate a complete document structure ready for copy-paste into your CV

In [None]:
import subprocess

author_name = "Cifuentes, C."
author_papers = ads.SearchQuery(
    author=author_name,
    fl=["author", "title", "year", "bibcode"],
    fq=["collection:Astronomy", "property:refereed", "pubdate:[1970-01-01 TO *]"],
    sort="date desc",
    rows=1000
)

latex_items = []
latex_items.append("\\documentclass[11pt]{article}")
latex_items.append("\\usepackage{xcolor}")
latex_items.append("\\usepackage{etaremune}")
latex_items.append("\\newcommand{\\paper}[1]{{\\textcolor{blue}{\\textit{#1}}}}")
latex_items.append("\\begin{document}")
latex_items.append("")
latex_items.append("\\title{\\huge{Curriculum vit\\ae}\\\\ \\large{\\textcolor{gray}{\\em the-deadline-is-tomorrow} version}}")
latex_items.append("\\author{C. Cifuentes}")
latex_items.append("\\date{\\today}")
latex_items.append("\\maketitle")
latex_items.append("")
latex_items.append("\\section{Refereed articles}")
latex_items.append("\\begin{enumerate}")

def escape_latex(text):
    """Escape special LaTeX characters"""
    replacements = {
        '&': '\\&',
        '%': '\\%',
        '$': '\\$',
        '#': '\\#',
        '_': '\\_',
        '{': '\\{',
        '}': '\\}',
        '~': '\\textasciitilde{}',
        '^': '\\textasciicircum{}',
        '″': '"',
        '°': '$^{\\circ}$',
        '≥': '$\\geq$',
        '≤': '$\\leq$',
        '≃': '$\\simeq$',
        'α': '$\\alpha$',
        'β': '$\\beta$',
        'γ': '$\\gamma$',
        'δ': '$\\delta$',
        'ε': '$\\epsilon$',
        'ζ': '$\\zeta$',
        'λ': '$\\lambda$',
        'μ': '$\\mu$',
        'π': '$\\pi$',
        'Ω': '$\\Omega$',
        'θ': '$\\theta$',
        'Δ': '$\\Delta$',
        'Φ': '$\\Phi$',
        'Σ': '$\\Sigma$',
        'Ψ': '$\\Psi$',
        'φ': '$\\phi$',
        'ω': '$\\omega$',
        'Θ': '$\\Theta$',
        'Λ': '$\\Lambda$',
        'ϕ': '$\\phi$',
        'σ': '$\\sigma$',
        'ρ': '$\\rho$',
        'κ': '$\\kappa$',
        'τ': '$\\tau$',
        'η': '$\\eta$',
        '⋆': '$\\star$',
        'ä': '\\"a',
        'ë': '\\"e',
        'ö': '\\"o',
        'ü': '\\"u',
        'Ä': '\\"A',
        'Ë': '\\"E',
        'Ö': '\\"O',
        'Ü': '\\"U',
        '<SUB>eff</SUB>': '$_{\\rm eff}$',
        '<SUB>⊕</SUB>': '$_{\\oplus}$',
        '<SUB>☉</SUB>': '$_{\\odot}$',
        '<SUP>': '$^{',
        '</SUP>': '}$',
        '<SUB>': '$_{',
        '</SUB>': '}$'
    }
    for old, new in replacements.items():
        text = text.replace(old, new)
    return text

for p in author_papers:
    first_author = escape_latex(p.author[0])
    title = escape_latex(p.title[0])
    year = p.year
    bibcode = f"\\texttt{{{escape_latex(p.bibcode)}}}"
    item = f"  \\item {first_author} ({year}) \\paper{{{title}}} {bibcode}"
    latex_items.append(item)

latex_items.append("\\end{enumerate}")
latex_items.append("")
latex_items.append("\\end{document}")

latex_text = "\n".join(latex_items)

# Copy to clipboard (pbcopy is available on macOS only)
try:
    process = subprocess.Popen(['pbcopy'], stdin=subprocess.PIPE)
    process.communicate(latex_text.encode('utf-8'))
    print(colored("\nLaTeX document copied to clipboard", "green"))
except OSError:
    print(colored("\nCould not copy to clipboard (pbcopy not found)", "red"))

# Show preview
print(colored("\n" + "="*80, "yellow"))
print(colored("LATEX DOCUMENT PREVIEW:", "yellow"))
print(colored("="*80, "yellow"))
print(latex_text[:500] + "..." if len(latex_text) > 500 else latex_text)
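`escape_latex` above applies its replacements one after another, so its correctness depends on the order of the dictionary (for instance, `{` and `}` must be escaped before the `<SUB>`/`<SUP>` tags are turned into `$_{...}$`), and a literal backslash is never escaped. A sketch of an alternative for the ten LaTeX special characters only: a single `re.sub` pass, so no replacement is ever processed twice. Greek letters and HTML tags would still need the mapping above.

```python
import re

# LaTeX special characters and their escaped forms
LATEX_SPECIALS = {
    "\\": r"\textbackslash{}",
    "&": r"\&",
    "%": r"\%",
    "$": r"\$",
    "#": r"\#",
    "_": r"\_",
    "{": r"\{",
    "}": r"\}",
    "~": r"\textasciitilde{}",
    "^": r"\textasciicircum{}",
}
_SPECIALS_RE = re.compile("|".join(re.escape(ch) for ch in LATEX_SPECIALS))

def escape_latex_specials(text):
    """Escape LaTeX special characters in one pass (each character is replaced exactly once)."""
    # A function replacement is inserted literally, so backslashes are not reinterpreted
    return _SPECIALS_RE.sub(lambda m: LATEX_SPECIALS[m.group(0)], text)

print(escape_latex_specials("A&A 50% of $x_1$"))  # A\&A 50\% of \$x\_1\$
```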

### <span style="color:salmon">**Example**</span> Compute citation indices (h-index, i-10, i-100)
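For reference, with the citation counts of an author's $N$ papers sorted in decreasing order, the indices computed in the cell below are defined as follows. The loop keeps the last rank $k$ with $c_k \ge k$, which is the maximum: for sorted counts, once $c_k < k$ the condition also fails for every later rank.

$$
c_1 \ge c_2 \ge \dots \ge c_N, \qquad h = \max\{\, k : c_k \ge k \,\} \quad (h = 0 \text{ if no such } k),
$$

$$
i_{10} = \#\{\, j : c_j \ge 10 \,\}, \qquad i_{100} = \#\{\, j : c_j \ge 100 \,\}.
$$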


In [None]:
author = "Henning, Thomas"
query = ads.SearchQuery(
    author=author,
    fl=["citation_count"],
    fq=["collection:Astronomy", "property:refereed"],
    rows=1000
)

# Keep uncited papers (count 0) so the total number of papers is correct
citations = sorted([paper.citation_count or 0 for paper in query], reverse=True)

# Calculate h-index
h_index = 0
for i, c in enumerate(citations):
    if c >= i + 1:
        h_index = i + 1

# Calculate i-10 and i-100
i_10 = sum(1 for c in citations if c >= 10)
i_100 = sum(1 for c in citations if c >= 100)

print(colored(f"Author: {author}", "cyan"))
print(colored("-" * len(f"Author: {author}"), "cyan"))
print(colored(f"h-index: {h_index}", "yellow"))
print(colored(f"i-10 index: {i_10}", "green"))
print(colored(f"i-100 index: {i_100}", "red"))
print(f"Total refereed papers: {len(citations)}")


# 5. Customised uses

### <span style="color:salmon">**Example**</span> Create a customised weekly digest in your mail inbox

Schedule a weekly email digest (e.g. every Monday) featuring recent articles on a given topic, object, or author, delivered via SMTP.

In [None]:
import smtplib
from email.mime.text import MIMEText
from email.mime.multipart import MIMEMultipart
from datetime import datetime

# Configuration
TOPIC = "triple stars"
YOUR_EMAIL = "yourmail@gmail.com"
YOUR_PASSWORD = "***"  # Use app-specific password for Gmail
RECIPIENT_EMAIL = "yourmail@gmail.com"

# Query ADS for recent papers
query = ads.SearchQuery(
    q=TOPIC,
    fl=["title", "author", "year", "bibcode", "pubdate", "citation_count", "abstract"],
    fq=["collection:Astronomy", "property:refereed", "pubdate:[NOW-30DAYS TO NOW]"],
    sort="pubdate desc",
    rows=10
)

papers_html = []
count = 0

for paper in query:
    count += 1
    authors = "; ".join(paper.author[:3]) + ("..." if len(paper.author) > 3 else "")
    abstract_preview = paper.abstract[:200] + "..." if paper.abstract else "No abstract available"
    
    paper_html = f"""
    <div style="margin-bottom: 20px; padding: 15px; border-left: 3px solid #4CAF50;">
        <h3 style="color: #2E86AB; margin-top: 0;">{count}. {paper.title[0]}</h3>
        <p><strong>Authors:</strong> {authors}</p>
        <p><strong>Year:</strong> {paper.year} | <strong>Bibcode:</strong> <code>{paper.bibcode}</code></p>
        <p><strong>Citations:</strong> {paper.citation_count if paper.citation_count else 0}</p>
        <p><em>{abstract_preview}</em></p>
        <a href="https://ui.adsabs.harvard.edu/abs/{paper.bibcode}" style="color: #4CAF50;">View on ADS</a>
    </div>
    """
    papers_html.append(paper_html)

# Create email
msg = MIMEMultipart('alternative')
msg['Subject'] = f"Weekly ADS Digest: {TOPIC.capitalize()} ({datetime.now().strftime('%Y-%m-%d')})"
msg['From'] = YOUR_EMAIL
msg['To'] = RECIPIENT_EMAIL

html = f"""
<html>
<body style="font-family: Arial, sans-serif; max-width: 800px; margin: 0 auto; padding: 20px;">
    <h1 style="color: #2E86AB;">Weekly Literature Digest</h1>
    <h2 style="color: #555;">Topic: {TOPIC.capitalize()}</h2>
    <p style="color: #777;">Generated on {datetime.now().strftime('%B %d, %Y')}</p>
    <hr style="border: 1px solid #ddd;">
    
    {"".join(papers_html) if papers_html else "<p>No new papers found this week.</p>"}
    
    <hr style="border: 1px solid #ddd; margin-top: 30px;">
    <p style="color: #999; font-size: 0.9em;">This digest was automatically generated using the NASA ADS API.</p>
</body>
</html>
"""

msg.attach(MIMEText(html, 'html'))

# Send email
try:
    with smtplib.SMTP_SSL('smtp.gmail.com', 465) as server:
        server.login(YOUR_EMAIL, YOUR_PASSWORD)
        server.send_message(msg)
    print(colored("✓ Email sent successfully!", "green"))
except Exception as e:
    print(colored(f"✗ Error sending email: {e}", "red"))
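The message above is a `multipart/alternative` container, which is meant to hold several renderings of the same content, but it only carries the HTML part. A sketch (the function name is hypothetical) that also attaches a plain-text version for mail clients that do not render HTML; parts go from least to most preferred, so the HTML part is attached last.

```python
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

def build_digest_message(subject, sender, recipient, text_body, html_body):
    """Build a multipart/alternative email carrying both plain-text and HTML versions."""
    msg = MIMEMultipart("alternative")
    msg["Subject"] = subject
    msg["From"] = sender
    msg["To"] = recipient
    # Least preferred first: plain text, then HTML
    msg.attach(MIMEText(text_body, "plain", "utf-8"))
    msg.attach(MIMEText(html_body, "html", "utf-8"))
    return msg
```

In the digest, `text_body` could simply list one title and ADS link per line.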



### How to schedule this digest weekly

**Step 1:** Save the code as a standalone Python script.

Create a file named `ads_digest.py` in your home directory (or any location):

```bash
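# Create the script file
nano ~/ads_digest.py
# Paste the code from the cell above, together with the imports and the
# ADS token setup from the Installation & Setup cells (the script runs on its own)
# Replace the email credentials with environment variables (Step 2)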
```

**Step 2:** Use environment variables for security (recommended).

Update `~/.zshrc`:
```bash
nano ~/.zshrc

# Add these lines (don't hardcode your real password!)
export EMAIL_ADDRESS="your_email@gmail.com"
export EMAIL_APP_PASSWORD="xxxx xxxx xxxx xxxx"
export ADS_DEV_KEY="your_ads_key"

# Save and reload
source ~/.zshrc
```

Update the Python script to use environment variables:
```python
import os

YOUR_EMAIL = os.getenv('EMAIL_ADDRESS')
YOUR_PASSWORD = os.getenv('EMAIL_APP_PASSWORD')
```
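When the script runs under `cron` (Step 3) it does not inherit your interactive shell setup, so a missing variable is a likely failure. A small hypothetical helper lets the script stop early with a readable message instead of failing later at the SMTP login:

```python
import os
import sys

def require_env(*names):
    """Return the values of the given environment variables, or exit with a clear message."""
    missing = [name for name in names if not os.getenv(name)]
    if missing:
        sys.exit(f"Missing environment variable(s): {', '.join(missing)}")
    return [os.environ[name] for name in names]

# In ads_digest.py:
# YOUR_EMAIL, YOUR_PASSWORD = require_env("EMAIL_ADDRESS", "EMAIL_APP_PASSWORD")
```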

**Step 3:** Schedule with `cron` (macOS/Linux).

Open your crontab editor:
```bash
crontab -e
```

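Add these lines to run every **Monday at 9:00 AM**. Since `cron` does not read `~/.zshrc`, the variables from Step 2 must be defined in the crontab itself:
```bash
# cron does not load ~/.zshrc: define the variables here
ADS_DEV_KEY="your_ads_key"
EMAIL_ADDRESS="your_email@gmail.com"
EMAIL_APP_PASSWORD="xxxx xxxx xxxx xxxx"
# Use the python3 that has `ads` installed (see `which python3`) and log the output
0 9 * * 1 /usr/bin/python3 ~/ads_digest.py >> ~/ads_digest.log 2>&1
```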

**Cron schedule breakdown:**
- `0` = minute (0)
- `9` = hour (9 AM)
- `*` = day of month (any)
- `*` = month (any)
- `1` = day of week (1 = Monday)

Other examples:
- `0 9 * * *` = Every day at 9 AM
- `0 9 * * 1-5` = Monday to Friday at 9 AM
- `0 9,18 * * 1` = Mondays at 9 AM and 6 PM

**Step 4:** Verify it's scheduled.

```bash
# List your cron jobs
crontab -l

# Check cron logs (macOS)
log stream --predicate 'process == "cron"' --level debug
```

**Step 5:** Test before scheduling.

```bash
# Run manually to test
python3 ~/ads_digest.py
```

### <span style="color:salmon">**Example**</span> Obtain a list of papers that cite your work

Search for all refereed first-author publications of an author and retrieve their recent citations. For each paper with new citations, display the list of citing articles.

In [None]:
author = "Cifuentes, C."
lookback_days = 45
max_first_author_papers = 2000

# Get bibcodes for first-author papers (refereed only)
first_author_query = ads.SearchQuery(
    q=f'first_author:"{author}"',
    fl=["bibcode", "title", "year", "pubdate"],
    fq=["collection:Astronomy", "property:refereed"],
    rows=max_first_author_papers
)
first_author_papers = list(first_author_query)
first_author_bibcodes = [paper.bibcode for paper in first_author_papers]

print(colored(f"Monitoring {len(first_author_bibcodes)} first-author publications by {author}", "cyan"))
print(colored(f"\nRecent citations to your first-author publications (last {lookback_days} days):", "yellow"))

if not first_author_bibcodes:
    print(colored("  No first-author papers found", "grey"))
else:
    pubdate_filter = f"pubdate:[NOW-{lookback_days}DAYS TO NOW]"
    total_with_citations = 0

    for bibcode in first_author_bibcodes:
        # Fielded, quoted bibcode inside the citations() operator
        citing_query = ads.SearchQuery(
            q=f'citations(bibcode:"{bibcode}")',
            fl=["bibcode", "title", "author", "year", "pubdate"],
            fq=["collection:Astronomy", pubdate_filter],
            rows=2000
        )
        citing_papers = list(citing_query)
        if not citing_papers:
            continue

        total_with_citations += 1
        print()
        print(colored(f"{len(citing_papers)} citations found for ", "white") + colored(f"{bibcode}", "green") + colored(f" in the last {lookback_days} days:", "white"))
        for i, citing_paper in enumerate(citing_papers, 1):
            title = citing_paper.title[0] if citing_paper.title else "Untitled"
            author_str = citing_paper.author[0] if citing_paper.author else "Unknown"
            print(colored(f"  {i}. {author_str} ({citing_paper.year}): {title}", "grey"))

    if total_with_citations == 0:
        print(colored("  No new citations detected in the last period", "grey"))
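The cell above sends one `citations()` query per first-author paper, which can mean many API calls. A possible alternative, assuming (as the ADS search syntax documentation describes) that `citations()` accepts a full query, is a single request such as `citations(first_author:"Cifuentes, C." property:refereed)` with `fl=["bibcode", "title", "reference"]` (paginated beyond 2000 results), followed by grouping the results locally with each citing paper's `reference` list. The grouping step, sketched with plain dictionaries (the function name is hypothetical):

```python
from collections import defaultdict

def group_citing_papers(citing_papers, my_bibcodes):
    """Map each of `my_bibcodes` to the citing papers whose reference list contains it."""
    my_bibcodes = set(my_bibcodes)
    grouped = defaultdict(list)
    for paper in citing_papers:
        # 'reference' may be missing or None for some records
        for ref in paper.get("reference") or []:
            if ref in my_bibcodes:
                grouped[ref].append(paper["bibcode"])
    return dict(grouped)
```

With `ads` results, each article can be converted first, e.g. `{"bibcode": p.bibcode, "reference": p.reference}`.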


# 6. Research trends

### <span style="color:salmon">**Example**</span> Track keyword mentions over years

Note: the ADS API returns at most 2000 records per request; to retrieve more, we paginate with the `start` parameter.

In [None]:
keywords = {
    "hipparcos": "red",
    "gaia": "grey"
}

# keywords = {
#     "Oumuamua": "red",
#     "interstellar object": "grey"
# }

fig, ax = plt.subplots(figsize=(12, 5))

for keyword, color in keywords.items():
    papers_by_month = {}
    
    # ADS API has a 2000 row limit per request, so we need pagination
    start = 0
    batch_size = 2000
    total_fetched = 0
    
    print(colored(f"\nFetching papers for '{keyword}'...", "cyan"))
    
    while True:
        query = ads.SearchQuery(
            q=f'title:"{keyword}" OR abstract:"{keyword}" OR keyword:"{keyword}"',
            fl=["pubdate"],
            fq=["collection:Astronomy"],
            rows=batch_size,
            start=start,
            max_pages=1  # one page per request; the loop paginates with `start`
        )
        
        batch_count = 0
        for paper in query:
            batch_count += 1
            if paper.pubdate:
                month = paper.pubdate[:7]
                papers_by_month[month] = papers_by_month.get(month, 0) + 1
        
        total_fetched += batch_count
        print(colored(f"  Batch: {total_fetched} papers fetched...", "grey"))
        
        # If we got fewer than batch_size, we've reached the end
        if batch_count < batch_size:
            break
        
        start += batch_size
    
    print(colored(f"  Total for '{keyword}': {total_fetched} papers", "green"))

    months = sorted(papers_by_month.keys())
    counts = [papers_by_month[m] for m in months]

    # Convert to datetime and filter invalid dates
    month_dates = []
    valid_counts = []
    for i, m in enumerate(months):
        try:
            if '-00' not in m and m.count('-') == 1:
                date_obj = datetime.strptime(m + '-01', '%Y-%m-%d').date()
                month_dates.append(date_obj)
                valid_counts.append(counts[i])
        except ValueError:
            continue

    plt.bar(month_dates, valid_counts, width=20, color=color, alpha=0.7, label=keyword)

plt.xlabel('Date')
plt.ylabel('Number of Papers')
plt.ylim(bottom=0)
plt.autoscale(axis="x", tight=True)  # fit the x-range to the bars of all keywords
plt.legend()
plt.tight_layout()
plt.show()
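The `while True` loop above can be factored into a reusable generator. In this sketch `paginate` and `fetch_page` are hypothetical names: `fetch_page(start, rows)` is any function returning one page of records, for example a wrapper returning `list(ads.SearchQuery(..., rows=rows, start=start))`. A page shorter than `batch_size` marks the end, as in the cell; the optional `max_batches` guards against runaway loops.

```python
def paginate(fetch_page, batch_size=2000, max_batches=None):
    """Yield records page by page until a short (or empty) page signals the end."""
    start = 0
    batches = 0
    while max_batches is None or batches < max_batches:
        page = fetch_page(start, batch_size)
        yield from page
        batches += 1
        if len(page) < batch_size:  # fewer than requested: no more results
            break
        start += batch_size

# Offline check with a fake data source of 4500 records
data = list(range(4500))
records = list(paginate(lambda start, rows: data[start:start + rows]))
print(len(records))  # 4500
```

When the total is an exact multiple of `batch_size`, one extra (empty) request is made before the loop stops.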


### <span style="color:salmon">**Example**</span> Track publications based on topics

Obtain the refereed publications on **eclipsing binaries** in the last 90 days. Avoid white dwarfs and neutron stars. Sort by date (most recent first).


In [None]:
query = ads.SearchQuery(
    q="eclipsing AND binary -neutron -white",
    fl=["title", "author", "year", "bibcode", "pubdate", "citation_count"],
    fq=["collection:Astronomy", "property:refereed", "pubdate:[NOW-90DAYS TO NOW]"],
    sort="pubdate desc",
    rows=100
)

count = 0
for paper in query:
    count += 1
    authors = "; ".join(paper.author[:3]) + ("..." if len(paper.author) > 3 else "")
    print(f"\n{count}. {colored(paper.title[0], 'cyan')}")
    print(f"   Authors: {authors}")
    print(f"   Year: {paper.year} | Bibcode: {colored(paper.bibcode, 'green')}")
    print(f"   Citations: {colored(str(paper.citation_count), 'yellow')}")

if count == 0:
    print(colored("No new papers found in this period.", "yellow"))
else:
    print(f"\n{colored(f'Total: {count} new papers', 'green')}")


### <span style="color:salmon">**Example**</span> Visualize citations as a function of time


In [None]:
author = "Perlmutter, S."

query = ads.SearchQuery(
    q=f'first_author:"{author}"',
    fl=["title", "author", "year", "citation_count", "read_count", "bibcode", "pubdate"],
    fq=["collection:Astronomy", "property:refereed", "year:[1960 TO 2025]"],
    sort="citation_count desc",
    rows=1
)

# Step 1: Get the most cited paper
for paper in query:
    first_author = paper.author[0] if paper.author else "Unknown"
    title = paper.title[0]
    bibcode = paper.bibcode
    year = paper.year
    citations = paper.citation_count if paper.citation_count else 0
    reads = paper.read_count if paper.read_count else 0
    pubdate = paper.pubdate if paper.pubdate else "N/A"
    
    print(f"Most cited article of {author}")
    print(colored(f"Author: {first_author}", "green"))
    print(colored(f"Title: {title}", "cyan"))
    print(f"Bibcode: {bibcode}")
    print(f"Year: {year}")
    print(f"Publication Date: {pubdate}")
    print(colored(f"\nCitations: {citations}", "yellow"))
    print(colored(f"Reads: {reads}", "green"))

# Step 2: Get articles that cite the most cited paper (paginate all results)
citing_papers = []
start = 0
batch_size = 2000

while True:
    citing_query = ads.SearchQuery(
        q=f'citations(bibcode:"{bibcode}")',
        fl=["title", "author", "year", "pubdate", "citation_count"],
        fq=["collection:Astronomy", "property:refereed"],
        sort="pubdate asc",
        rows=batch_size,
        start=start,
        max_pages=1  # one page per request; the loop paginates with `start`
    )
    
    batch_count = 0
    for paper in citing_query:
        batch_count += 1
        citing_papers.append({
            'first_author': paper.author[0] if paper.author else "Unknown",
            'year': paper.year,
            'title': paper.title[0],
            'pubdate': paper.pubdate if hasattr(paper, 'pubdate') and paper.pubdate else "N/A",
            'citations': paper.citation_count if paper.citation_count else 0
        })
    
    if batch_count < batch_size:
        break
    
    start += batch_size

print(f"\nTotal articles citing this paper: {len(citing_papers)}")


def extract_date(pubdate):
    """Parse pubdate with flexible format handling"""
    if not pubdate or pubdate == 'N/A':
        return None
    pubdate = pubdate.strip()
    # Handle ADS API's YYYY-MM-00 format (day is always 00)
    if pubdate.endswith('-00'):
        pubdate = pubdate[:-3] + '-01'
    date_formats = ['%Y-%m-%d', '%Y-%m', '%Y']
    for date_fmt in date_formats:
        try:
            return datetime.strptime(pubdate, date_fmt).date()
        except (ValueError, TypeError):
            continue
    return None

# Simple cumulative citations plot
if len(citing_papers) > 0:
    papers_with_dates = []
    for paper in citing_papers:
        date_obj = extract_date(paper['pubdate'])
        if date_obj:
            papers_with_dates.append((date_obj, paper))
    
    papers_with_dates.sort(key=lambda x: x[0])
    
    if papers_with_dates:
        dates = [p[0] for p in papers_with_dates]
        cumulative = []
        total = 0
        for date, paper in papers_with_dates:
            total += 1
            cumulative.append(total)
        
        plt.figure(figsize=(8, 5))
        plt.bar(dates, cumulative, width=20, color='black')
        plt.xlabel('Date')
        plt.ylabel('Cumulative Citations')
        plt.ylim(bottom=0)
        plt.xlim(left=min(dates), right=max(dates))
        plt.tight_layout()
        plt.show()
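Besides the cumulative curve, the yearly rate of citations is often easier to read. A sketch with `collections.Counter` (the function name is hypothetical) that counts citing papers per year from the `pubdate` strings collected in `citing_papers`, skipping `'N/A'` and other malformed values:

```python
from collections import Counter

def citations_per_year(pubdates):
    """Count citing papers per year from ADS pubdate strings such as '1999-06-00'."""
    years = Counter()
    for pubdate in pubdates:
        # Keep only values whose first four characters form a year
        if pubdate and pubdate[:4].isdigit():
            years[int(pubdate[:4])] += 1
    return dict(sorted(years.items()))

print(citations_per_year(["1999-06-00", "1999-12-00", "2001-01-00", "N/A", None]))
# {1999: 2, 2001: 1}
```

Usage: `per_year = citations_per_year(p['pubdate'] for p in citing_papers)`, then `plt.bar(list(per_year), list(per_year.values()))`.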

### <span style="color:salmon">**Example**</span> Find frequent collaborators


In [None]:
author = "Mayor, M."
query = ads.SearchQuery(
    author=author,
    fl=["author", "aff"],
    fq=["collection:Astronomy", "property:refereed"],
    rows=1000
)

co_author_count = {}
co_author_papers = {}  # Store affiliation info for each co-author

# Extract last name from input author for robust comparison
author_lastname = author.split(',')[0].strip() if ',' in author else author.split()[0].strip()

for paper in query:
    if not paper.author:
        continue
    for idx, co_author in enumerate(paper.author):
        # Extract last name from co-author
        coauthor_lastname = co_author.split(',')[0].strip() if ',' in co_author else co_author.split()[0].strip()

        # Skip if it's the same author (check exact match, partial match, and last name match)
        if (co_author.strip() == author.strip() or 
            author in co_author or 
            co_author in author or
            author_lastname.lower() == coauthor_lastname.lower()):
            continue

        # Count each co-authored paper once
        co_author_count[co_author] = co_author_count.get(co_author, 0) + 1

        # Store the affiliation this co-author lists on the first paper found
        # (the `aff` list is parallel to `author`) and whether they led that paper
        if co_author not in co_author_papers:
            co_author_papers[co_author] = {
                'aff': paper.aff[idx] if paper.aff and idx < len(paper.aff) else "N/A",
                'is_first': idx == 0
            }

# Sort by frequency
top_collaborators = sorted(co_author_count.items(), key=lambda x: x[1], reverse=True)[:10]

print(colored(f"Top collaborators of {author}:", "yellow"))
for i, (coauth, count) in enumerate(top_collaborators, 1):
    aff_info = co_author_papers.get(coauth, {})
    aff = aff_info.get('aff', 'N/A')[:50] if aff_info else "N/A"
    is_first = aff_info.get('is_first', False) if aff_info else False
    print(f"  {i}. {colored(coauth, 'cyan')} ({count} papers)")
    aff_note = " (first author)" if is_first else ""
    print(colored(f"     Affiliation: {aff}{aff_note}", "grey"))
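ADS returns author names as printed on each paper, so one collaborator can appear under several keys in `co_author_count` (e.g. an initial on one paper and a full first name on another). A crude, hypothetical normalization keys each name by surname plus first initial before counting; note that it can also merge different people who share both.

```python
from collections import Counter

def name_key(name):
    """Reduce 'Last, First Middle' to 'last, f' (surname plus first initial, lowercase)."""
    last, _, first = name.partition(",")
    initial = first.strip()[:1]
    # rstrip drops the trailing ', ' when there is no first name
    return f"{last.strip().lower()}, {initial.lower()}".rstrip(", ")

def count_coauthors(names):
    """Count name occurrences after collapsing variants that share surname and first initial."""
    return Counter(name_key(name) for name in names)

print(count_coauthors(["Doe, J.", "Doe, Jane", "Roe, R."]).most_common())
# [('doe, j', 2), ('roe, r', 1)]
```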
