<a href="https://colab.research.google.com/github/bthomson2/github.io/blob/main/gemini_ads_doi_workflow.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Overview

Here’s a potential workflow:

1. Set up Google Colab: Use a Google Colab notebook as the environment for the Python script [1, 2].
2. Install Gemini API Libraries: Install the Python libraries needed to interact with the Gemini API, typically with pip install inside the Colab notebook [3].
3. Authenticate Gemini API: Obtain a Gemini API key and store it securely in Colab, for example in the Secrets tab [3, 4].
4. Create a Prompt for ADS: Construct a prompt that instructs Gemini to find papers on the ADS website (https://ui.adsabs.harvard.edu/) for specific keywords. Unless the model is given a search tool, it answers from its training data rather than querying the site, so verify any DOIs it returns.
5. Call the Gemini API: Use the Python SDK to send the prompt to the chosen Gemini model (e.g., Gemini Pro, Gemini 2.0 Flash) [2]. Compare models to see which gives the best results for this task.
6. Extract DOIs: Parse the response from Gemini and extract the DOIs of the identified papers.
7. Export to File: Save the extracted DOIs to a file (e.g., a CSV or TXT file).
8. Open DOIs in Browser: Read the DOIs from the file and open a specified number of them in a web browser using Python's webbrowser module.

# Get keywords from the user

In [None]:
# Get keywords as input from the user
keywords = input("Enter the keywords to search on ADS (comma-separated): ")
print(f"You entered the following keywords: {keywords}")


# 4. Extracting the DOIs from the response
# 5. Exporting the DOIs to a CSV file
# 6. Opening the first 5 DOIs in a web browser
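The raw input above is a single comma-separated string. A minimal sketch of turning it into a clean list is shown below; `parse_keywords` is a hypothetical helper name, not part of the original notebook, and the case-insensitive de-duplication is an assumption about what is wanted.

```python
def parse_keywords(raw):
    """Split a comma-separated string into a list of clean, unique keywords."""
    seen = set()
    keywords = []
    for part in raw.split(","):
        kw = " ".join(part.split())  # trim and collapse internal whitespace
        if kw and kw.lower() not in seen:  # skip empty entries and repeats
            seen.add(kw.lower())
            keywords.append(kw)
    return keywords


print(parse_keywords(" exoplanets, dark  matter,, Exoplanets "))
# → ['exoplanets', 'dark matter']
```

The resulting list can be joined back into a string for the Gemini prompt or passed on to an ADS query.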

# Setting up the Gemini API

In [None]:
# Install the Gemini API library
!pip install -q google-generativeai

# Import the necessary library
import google.generativeai as genai

# Configure the Gemini API key
# You will need to obtain your API key from Google AI Studio (ai.google.dev)
# and store it securely in Colab (e.g., using the Secrets tab)
from google.colab import userdata
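try:
    GOOGLE_API_KEY = userdata.get('GOOGLE_API_KEY')
    genai.configure(api_key=GOOGLE_API_KEY)
except (userdata.SecretNotFoundError, userdata.NotebookAccessError):
    # userdata.get raises these (not KeyError) when the secret is missing
    # or the notebook has not been granted access to it
    raise SystemExit("Error: Please store your Gemini API key in Colab's Secrets tab as 'GOOGLE_API_KEY'")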

# Select the Gemini model
model = genai.GenerativeModel('gemini-2.0-flash') # Or 'gemini-pro'

# Construct the prompt and call Gemini API using the keywords

In [None]:
# Construct the prompt for Gemini to search ADS
prompt = f"""Search the NASA ADS database at https://ui.adsabs.harvard.edu/ for papers related to the following keywords: '{keywords}'.
Return a list of the DOIs of the papers you find. If no DOIs are found, please indicate that.
Only provide the list of DOIs or a statement indicating no DOIs were found."""

# Call the Gemini API
try:
    response = model.generate_content(prompt)
    # response.text raises ValueError if the response was blocked or has no text part
    doi_results = response.text
    print("Gemini's response:")
    print(doi_results)
except Exception as e:
    print(f"An error occurred while calling the Gemini API: {e}")
    doi_results = ""
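The prompt above leaves the output format loose, which makes the later DOI extraction less predictable. One possible tightening is sketched below; `build_prompt` and its `max_results` parameter are hypothetical names introduced for illustration, and the exact wording is only an assumption about what tends to produce parseable output.

```python
def build_prompt(keywords, max_results=10):
    """Build a prompt asking for DOIs only, one per line (hypothetical helper)."""
    topic = ", ".join(keywords)
    return (
        f"List up to {max_results} papers indexed in NASA ADS "
        f"(https://ui.adsabs.harvard.edu/) related to: {topic}.\n"
        "Return only DOIs, one per line, with no numbering or commentary.\n"
        "If you cannot identify any DOIs, reply with exactly: NONE"
    )


print(build_prompt(["exoplanets", "dark matter"], max_results=5))
```

The returned string can be passed to `model.generate_content` in place of the inline f-string.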

In [None]:
from google.colab import userdata
# Assign the token to a variable so the secret is not echoed in the cell output
ADS_API_KEY = userdata.get('ADS_API_KEY')

# Extract DOIs from response

In [None]:
import csv
import re
import webbrowser

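def extract_dois(text):
    """Extracts DOIs from a string using regular expressions."""
    doi_pattern = r'10\.\d{4,9}/[-._;()/:A-Z0-9]+'
    matches = re.findall(doi_pattern, text, re.IGNORECASE)
    # The character class can swallow sentence punctuation at the end of a match
    dois = [match.rstrip('.;:') for match in matches]
    return dois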

# Get keywords as input from the user
keywords = input("Enter the keywords to search on ADS (comma-separated): ")
print(f"You entered the following keywords: {keywords}")

# Install the Gemini API library
# Note: In Google Colab, you need to use ! before shell commands
!pip install -q google-generativeai

# Import the necessary library
import google.generativeai as genai

# Configure the Gemini API key
# You will need to obtain your API key from Google AI Studio (ai.google.dev)
# and store it securely in Colab (e.g., using the Secrets tab)
from google.colab import userdata
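try:
    GOOGLE_API_KEY = userdata.get('GOOGLE_API_KEY')
    genai.configure(api_key=GOOGLE_API_KEY)
except (userdata.SecretNotFoundError, userdata.NotebookAccessError):
    # userdata.get raises these (not KeyError) when the secret is missing
    # or the notebook has not been granted access to it
    raise SystemExit("Error: Please store your Gemini API key in Colab's Secrets tab as 'GOOGLE_API_KEY'")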

# Select the Gemini model
model = genai.GenerativeModel('gemini-2.0-flash') # Or 'gemini-pro'

# Construct the prompt for Gemini to search ADS
prompt = f"""Search the NASA ADS database at https://ui.adsabs.harvard.edu/ for papers related to the following keywords: '{keywords}'.
Return a list of the DOIs of the papers you find. If no DOIs are found, please indicate that.
Only provide the list of DOIs or a statement indicating no DOIs were found."""

# Call the Gemini API
try:
    response = model.generate_content(prompt)
    # response.text raises ValueError if the response was blocked or has no text part
    doi_results = response.text
    print("Gemini's response:")
    print(doi_results)
    dois = extract_dois(doi_results)  # Extract DOIs here

except Exception as e:
    print(f"An error occurred while calling the Gemini API: {e}")
    dois = [] # Initialize dois even if there's an error

if dois:
    print("Extracted DOIs:")
    for doi in dois:
        print(doi)

    # Save the DOIs to a CSV file
    csv_file_path = "ads_dois.csv"
    try:
        with open(csv_file_path, mode='w', newline='') as csvfile:
            doi_writer = csv.writer(csvfile)
            doi_writer.writerow(['DOI'])  # Write header
            for doi in dois:
                doi_writer.writerow([doi])
        print(f"\nDOIs saved to {csv_file_path}")

        # Open the first 5 DOIs in a web browser
        num_to_open = min(5, len(dois))
        print(f"\nOpening the first {num_to_open} DOIs in your web browser:")
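        for doi in dois[:num_to_open]:
            url = f"https://doi.org/{doi}"  # Construct the DOI URL
            # open_new_tab returns False when no browser is available
            # (e.g., on the remote Colab VM), so print the link instead
            if not webbrowser.open_new_tab(url):
                print(url)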

    except Exception as e:
        print(f"An error occurred while saving to CSV or opening URLs: {e}")

else:
    print("No DOIs found in Gemini's response.")
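The overview describes reading the DOIs back from the saved file before opening them, whereas the cell above reuses the in-memory list. A minimal sketch of the file-based variant follows; `read_doi_urls` and its `limit` parameter are hypothetical names, and the code assumes the CSV has the single `DOI` header column written above.

```python
import csv


def read_doi_urls(csv_path, limit=5):
    """Read DOIs from a CSV with a 'DOI' header and return up to `limit` resolver URLs."""
    with open(csv_path, newline="") as csvfile:
        reader = csv.DictReader(csvfile)
        # Skip rows where the DOI cell is missing or empty
        dois = [row["DOI"].strip() for row in reader if row.get("DOI")]
    return [f"https://doi.org/{doi}" for doi in dois[:limit]]
```

Each returned URL can then be passed to `webbrowser.open_new_tab`, or printed as a link when no local browser is available.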

To query ADS directly with the `ads` Python package instead of asking Gemini to search ADS, use the code below.

In [None]:
!pip install ads

In [None]:
import ads
import csv
import webbrowser
from google.colab import userdata

# Configure the ADS API token
# The ads package reads the token from ads.config.token, the ADS_DEV_KEY
# environment variable, or the file ~/.ads/dev_key.
# See: https://ads.readthedocs.io/en/v1/config.html
ads.config.token = userdata.get("ADS_API_KEY")



# Get keywords as input from the user
keywords_str = input("Enter the keywords to search on ADS (comma-separated): ")
keywords = [kw.strip() for kw in keywords_str.split(',')]

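try:
    # Search ADS for the keywords. q must be a single query string; ADS
    # combines space-separated terms with AND by default. Requesting the doi
    # field up front avoids an extra API request per paper.
    papers = ads.SearchQuery(q=" ".join(keywords), fl=["doi"])
    dois = [paper.doi[0] for paper in papers if paper.doi]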

    if dois:
        print("Found the following DOIs:")
        for doi in dois:
            print(doi)

        # Save the DOIs to a CSV file
        csv_file_path = "ads_dois.csv"
        with open(csv_file_path, mode='w', newline='') as csvfile:
            doi_writer = csv.writer(csvfile)
            doi_writer.writerow(['DOI'])  # Write header
            for doi in dois:
                doi_writer.writerow([doi])
        print(f"\nDOIs saved to {csv_file_path}")

        # Open the first 5 DOIs in a web browser
        num_to_open = min(5, len(dois))
        print(f"\nOpening the first {num_to_open} DOIs in your web browser:")
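        for doi in dois[:num_to_open]:
            url = f"https://doi.org/{doi}"
            # open_new_tab returns False when no browser is available
            # (e.g., on the remote Colab VM), so print the link instead
            if not webbrowser.open_new_tab(url):
                print(url)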

    else:
        print("No papers found for the given keywords.")

except ads.exceptions.APIResponseError as e:
    print(f"An error occurred while querying the ADS API: {e}")
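Multi-word keywords such as "dark matter" are matched as separate terms unless they are quoted in the ADS query string. The sketch below shows one way to build such a string from the keyword list; `build_ads_query` is a hypothetical helper, and joining the terms with an explicit `AND` is an assumption about the intended search behavior.

```python
def build_ads_query(keywords):
    """Join keywords into one ADS query string, quoting multi-word phrases."""
    terms = []
    for kw in keywords:
        kw = kw.strip().replace('"', "")  # drop stray quotes from user input
        if not kw:
            continue
        # Quote phrases so they are matched as a unit rather than word by word
        terms.append(f'"{kw}"' if " " in kw else kw)
    return " AND ".join(terms)


print(build_ads_query(["dark matter", "halo"]))
# → "dark matter" AND halo
```

The resulting string can be passed as the `q` argument of `ads.SearchQuery`.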