# Collection of articles - Scholarly
### Author : Maela Guillaume-Le Gall
#### Date : 13/02/2025

**The purpose of this code is to systematically extract academic articles on the environmental impacts of AI in Europe. It uses the `scholarly` package for systematic searches on Google Scholar.**

The code installs the necessary dependencies and sets up a headless Chrome browser with Selenium to scrape full abstracts from article pages. It searches Google Scholar for the query defined in the code ("Environmental Impacts of AI"), retrieves the first five results, and attempts to fetch and display the full abstract from each article's URL, handling cookie banners and variations in page structure. After the results are printed, the Selenium WebDriver is closed.

**Known problem:** not all abstracts are displayed.
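
One possible way to reduce the number of missing abstracts is to look at the page's `<meta>` tags before falling back to visible paragraphs. The sketch below is not part of the original notebook: the helper name `abstract_from_meta` and the list of metadata names are assumptions, and publishers differ in which of these tags (if any) they fill in. It uses only the standard library and takes an HTML string as input.

```python
from html.parser import HTMLParser

# Assumed metadata names; which ones a publisher actually uses varies.
ABSTRACT_META_NAMES = ("citation_abstract", "dc.description", "description", "og:description")


class _MetaCollector(HTMLParser):
    """Collects the content of <meta> tags, keyed by lower-cased name/property."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        key = attr.get("name") or attr.get("property")
        content = (attr.get("content") or "").strip()
        # Keep the first non-empty value seen for each key
        if key and content:
            self.meta.setdefault(key.lower(), content)


def abstract_from_meta(html):
    """Return the first non-empty abstract-like <meta> content, or None."""
    collector = _MetaCollector()
    collector.feed(html)
    for name in ABSTRACT_META_NAMES:
        text = collector.meta.get(name)
        if text:
            return text
    return None
```

With Selenium, the HTML of the currently loaded page is available as `driver.page_source`, so a function such as `get_full_abstract` could try `abstract_from_meta(driver.page_source)` when its XPath lookups find nothing. Pages that block headless browsers will still yield no abstract.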

In [1]:
!pip install scholarly selenium chromedriver-autoinstaller

# Import necessary libraries
import chromedriver_autoinstaller
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import time
# Configure Chrome to run headless (requires a local Chrome installation)
chrome_options = Options()
chrome_options.add_argument('--headless')
chrome_options.add_argument('--no-sandbox')
chrome_options.add_argument('--disable-dev-shm-usage')
chrome_options.add_argument('--disable-gpu')
# Path to the Chrome binary; adjust it if Chrome is installed elsewhere
chrome_options.binary_location = r'C:\Program Files\Google\Chrome\Application\chrome.exe'

# Download a ChromeDriver that matches the installed Chrome version and add it to PATH
chromedriver_autoinstaller.install()

# Start a single browser session with the options above
# (creating the driver twice would leave an extra Chrome process running)
driver = webdriver.Chrome(options=chrome_options)
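
The setup above hardcodes a Windows path for the Chrome binary, which only works on machines with that exact install location. As a hedged alternative, the sketch below looks for Chrome in a few common places and on the `PATH`. The function name `find_chrome_binary` and the candidate lists are hypothetical and should be adapted to the machines in use.

```python
import os
import shutil

# Hypothetical candidate locations; extend these for your own machines.
DEFAULT_CANDIDATES = (
    r"C:\Program Files\Google\Chrome\Application\chrome.exe",
    r"C:\Program Files (x86)\Google\Chrome\Application\chrome.exe",
    "/Applications/Google Chrome.app/Contents/MacOS/Google Chrome",
)
# Command names that Chrome or Chromium is commonly installed under on Linux.
COMMAND_NAMES = ("google-chrome", "google-chrome-stable", "chromium", "chromium-browser")


def find_chrome_binary(candidates=DEFAULT_CANDIDATES, commands=COMMAND_NAMES):
    """Return the first Chrome path that exists, or None if none is found."""
    # Check explicit file locations first
    for path in candidates:
        if os.path.isfile(path):
            return path
    # Then search the PATH for known command names
    for name in commands:
        found = shutil.which(name)
        if found:
            return found
    return None
```

If the function returns a path, it can be assigned to `chrome_options.binary_location`; if it returns `None`, the assignment can be skipped so that Selenium uses its own default lookup.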



In [2]:
# Function to handle cookies and fetch full abstract from the article's URL (if available)
def get_full_abstract(url):
    try:
        driver.get(url)  # Open the article URL
        time.sleep(3)  # Wait for the page to load

        # Wait for cookies banner or pop-up to appear, and attempt to close it
        try:
            cookie_button = WebDriverWait(driver, 10).until(
                EC.element_to_be_clickable((By.XPATH, "//button[contains(text(),'Accept') or contains(text(),'Close')]"))
            )
            cookie_button.click()
            time.sleep(2)  # Allow time for the cookie banner to be dismissed
        except Exception:
            pass  # No clickable cookie banner within the timeout; continue

        # Wait for abstract to load, use WebDriverWait to ensure content is visible
        abstract = ""
        try:
            abstract_section = WebDriverWait(driver, 10).until(
                EC.visibility_of_element_located((By.XPATH, "//h2[contains(text(),'Abstract')]/following-sibling::p"))
            )
            abstract = abstract_section.text
        except Exception:
            try:
                # Fallback if abstract is in a different section
                abstract_section = WebDriverWait(driver, 10).until(
                    EC.visibility_of_element_located((By.XPATH, "//div[contains(@class,'abstract') or contains(@class,'section')]/p"))
                )
                abstract = abstract_section.text
            except Exception:
                # Fallback to paragraphs if abstract still not found
                paragraphs = driver.find_elements(By.TAG_NAME, 'p')
                for para in paragraphs:
                    text = para.text.strip()
                    if text and len(text.split()) > 5:
                        abstract += text + '\n'

        return abstract if abstract else "No abstract found"

    except Exception as e:
        return f"Error fetching abstract: {str(e)}"

# Search query on Google Scholar
from itertools import islice
from scholarly import scholarly
query = "Environmental Impacts of AI"
search_query = scholarly.search_pubs(query)

# Retrieve the first 5 results (fewer if the search returns fewer);
# islice avoids a StopIteration error when the results run out
articles = list(islice(search_query, 5))  # Adjust the number of results as needed

# Display the results with full abstracts
try:
    for i, article in enumerate(articles, start=1):
        bib = article.get('bib', {})
        url = article.get('pub_url')  # Not every result has a publisher URL
        print(f"Article {i}:")
        print(f"Title: {bib.get('title', 'n/a')}")
        print(f"Author(s): {bib.get('author', 'n/a')}")
        print(f"Year: {bib.get('pub_year', 'n/a')}")
        print(f"Citations: {article.get('num_citations', 'n/a')}")

        # Try to fetch the full abstract from the article's URL
        full_abstract = get_full_abstract(url) if url else "No URL available"
        print(f"Full Abstract: {full_abstract}")

        print(f"URL: {url}")
        print("-" * 40)
finally:
    # Close the Selenium WebDriver after scraping, even if an error occurred
    driver.quit()

Article 1:
Title: Unraveling the hidden environmental impacts of AI solutions for environment life cycle assessment of AI solutions
Author(s): ['AL Ligozat', 'J Lefevre', 'A Bugeau', 'J Combaz']
Year: 2022
Citations: 103
Full Abstract: No abstract found
URL: https://www.mdpi.com/2071-1050/14/9/5172
----------------------------------------
Article 2:
Title: Sustainable AI: AI for sustainability and the sustainability of AI
Author(s): ['A Van Wynsberghe']
Year: 2021
Citations: 697
Full Abstract: While there is a growing effort towards AI for Sustainability (e.g. towards the sustainable development goals) it is time to move beyond that and to address the sustainability of developing and using AI systems. In this paper I propose a definition of Sustainable AI; Sustainable AI is a movement to foster change in the entire lifecycle of AI products (i.e. idea generation, training, re-tuning, implementation, governance) towards greater ecological integrity and social justice. As such, Sustainabl