
# Collection of articles - Scholarly
### Author: Maela Guillaume-Le Gall
### Date: 13/02/2025

This notebook systematically collects academic articles on the environmental impacts of AI in Europe. It uses the 'scholarly' package to run the searches on Google Scholar.

In [None]:
# Installing scholarly package
!pip install scholarly


Collecting scholarly
  Downloading scholarly-1.7.11-py3-none-any.whl.metadata (7.4 kB)
Collecting arrow (from scholarly)
  Downloading arrow-1.3.0-py3-none-any.whl.metadata (7.5 kB)
Collecting bibtexparser (from scholarly)
  Downloading bibtexparser-1.4.3.tar.gz (55 kB)
     55.6/55.6 kB 2.0 MB/s eta 0:00:00
  Preparing metadata (setup.py) ... done
Collecting fake-useragent (from scholarly)
  Downloading fake_useragent-2.1.0-py3-none-any.whl.metadata (17 kB)
Collecting free-proxy (from scholarly)
  Downloading free_proxy-1.1.3.tar.gz (5.6 kB)
  Preparing metadata (setup.py) ... done
Collecting python-dotenv (from scholarly)
  Downloading python_dotenv-1.0.1-py3-none-any.whl.metadata (23 kB)
Collecting selenium (from scholarly)
  Downloading selenium-4.29.0-py3-none-any.whl.metadata (7.1 kB)
Collecting sphinx-rtd-theme (from scholarly)
  Downloading sphinx_rtd_theme-3.0.2-py2.p

## Making a list of academic articles
The following code searches Google Scholar for the query "Environmental Impacts Artificial Intelligence Europe", retrieves the first 30 results, and sorts them by keyword match in the title, then by publication year (most recent first), then by citation count (highest first). For each article it also tries to fetch the full abstract by scraping the article's URL. The title, authors, year, citation count, abstract, and URL of each article are then printed.
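
As a minimal offline sketch of the ordering described above, the snippet below sorts a few made-up records with a tuple key. The records, the `to_int` and `sort_key` helpers, and the phrase "green ai" are hypothetical and only illustrate the idea; the field names (`bib`, `title`, `pub_year`, `num_citations`) are the ones this notebook reads from the search results.

```python
# Hypothetical records shaped like the dictionaries read in this notebook;
# all values are made up for illustration.
sample = [
    {"bib": {"title": "Old but cited", "pub_year": "2018"}, "num_citations": 232},
    {"bib": {"title": "Recent, few citations", "pub_year": "2024"}, "num_citations": 49},
    {"bib": {"title": "Recent, many citations", "pub_year": "2024"}, "num_citations": 325},
    {"bib": {"title": "Green AI in Europe: a survey", "pub_year": "2020"}, "num_citations": 10},
    {"bib": {"title": "No year recorded"}, "num_citations": 5},
]


def to_int(value, default=0):
    """Convert a year or citation count to int, falling back to a default."""
    try:
        return int(value)
    except (TypeError, ValueError):
        return default


def sort_key(article, phrase):
    """Key: phrase match first, then newest year, then most citations."""
    bib = article.get("bib", {})
    has_phrase = phrase.lower() in bib.get("title", "").lower()
    return (
        not has_phrase,                        # False sorts before True, so matches come first
        -to_int(bib.get("pub_year")),          # negate so that recent years come first
        -to_int(article.get("num_citations")), # negate so that high counts come first
    )


ranked = sorted(sample, key=lambda a: sort_key(a, "green ai"))
titles = [a["bib"]["title"] for a in ranked]
for title in titles:
    print(title)
```

Because Python sorts `False` before `True`, the match flag has to be negated for matching titles to appear first in an ascending sort. Records with a missing year fall to the end of their group.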

In [None]:
from scholarly import scholarly
import requests
from bs4 import BeautifulSoup

# Search query
query = "Environmental Impacts Artificial Intelligence Europe"

# Function to fetch full abstract from the article's URL (if available)
def get_full_abstract(url):
    try:
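        # Send a GET request to the article URL (time out instead of hanging)
        page = requests.get(url, timeout=30)

        # Skip PDFs and other non-HTML responses, which the HTML parser rejects
        if 'html' not in page.headers.get('Content-Type', '').lower():
            return 'Full abstract not available'

        soup = BeautifulSoup(page.content, 'html.parser')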

        # Try to find the abstract within the HTML (this can vary between websites)
        abstract_section = soup.find('div', {'class': 'abstract'})
        if abstract_section:
            return abstract_section.get_text(strip=True)
        else:
            return 'Full abstract not available'
    except Exception as e:
        return f"Error fetching abstract: {str(e)}"

# Function to search articles and sort them
def search_and_sort_articles(query, num_results=30):
    # Search on Google Scholar
    search_query = scholarly.search_pubs(query)

    # Retrieve and store articles (stops early if fewer results are returned)
    articles = []
    for _, article in zip(range(num_results), search_query):
        articles.append(article)

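    # Convert a field to int, treating missing or non-numeric values as 0
    def to_int(value):
        try:
            return int(value)
        except (TypeError, ValueError):
            return 0

    # Sort articles: first by keyword match, then by publication year, and finally by citations
    articles_sorted = sorted(
        articles,
        key=lambda x: (
            query.lower() not in x['bib'].get('title', '').lower(),  # Titles containing the query first (False sorts before True)
            -to_int(x['bib'].get('pub_year')),                       # Sort by year (recent first)
            -to_int(x.get('num_citations'))                          # Sort by citations (most first)
        )
    )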

    return articles_sorted

# Fetch the top 30 articles sorted by the defined criteria
articles = search_and_sort_articles(query)

# Display the results with full abstracts
for i, article in enumerate(articles, start=1):
    print(f"Article {i}:")
    print(f"Title: {article['bib']['title']}")
    print(f"Author(s): {article['bib']['author']}")
    print(f"Year: {article['bib']['pub_year']}")
    print(f"Citations: {article['num_citations']}")

    # Try to fetch the full abstract from the article's URL (some results have no 'pub_url')
    url = article.get('pub_url')
    full_abstract = get_full_abstract(url) if url else 'Full abstract not available'
    print(f"Full Abstract: {full_abstract}")

    print(f"URL: {url or 'not available'}")
    print("-" * 40)


Article 1:
Title: Smarter eco-cities and their leading-edge artificial intelligence of things solutions for environmental sustainability: A comprehensive systematic review
Author(s): ['SE Bibri', 'J Krogstie', 'A Kaboli', 'A Alahi']
Year: 2024
Citations: 325
Full Abstract: Full abstract not available
URL: https://www.sciencedirect.com/science/article/pii/S2666498423000959
----------------------------------------
Article 2:
Title: A review of green artificial intelligence: Towards a more sustainable future
Author(s): ['V Bolón-Canedo', 'L Morán-Fernández', 'B Cancela']
Year: 2024
Citations: 52
Full Abstract: Full abstract not available
URL: https://www.sciencedirect.com/science/article/pii/S0925231224008671
----------------------------------------
Article 3:
Title: Ecological footprints, carbon emissions, and energy transitions: the impact of artificial intelligence (AI)
Author(s): ['Q Wang', 'Y Li', 'R Li']
Year: 2024
Citations: 49
Full Abstract: Full abstract not available
URL: https:



Full Abstract: Full abstract not available
URL: https://files.eric.ed.gov/fulltext/ED639262.pdf#page=78
----------------------------------------
Article 16:
Title: Digitalization and AI in European agriculture: a strategy for achieving climate and biodiversity targets?
Author(s): ['B Garske', 'A Bau', 'F Ekardt']
Year: 2021
Citations: 125
Full Abstract: Full abstract not available
URL: https://www.mdpi.com/2071-1050/13/9/4652
----------------------------------------
Article 17:
Title: The role of artificial intelligence in the European green deal
Author(s): ['P Gailhofer', 'A Herold', 'JP Schemmel', 'CS Scherf']
Year: 2021
Citations: 84
Full Abstract: Full abstract not available
URL: https://www.academia.edu/download/75652615/IPOL_STU2021662906_EN.pdf
----------------------------------------
Article 18:
Title: Interpretation of the views of east European Catholics on the impact of artificial intelligence on the social environment
Author(s): ['MV Vinichenko', 'EV Frolova']
Year: 2021
Ci



Full Abstract: Full abstract not available
URL: http://www.ejst.tuiasi.ro/Files/86/2_Vinichenko%20et%20al.pdf
----------------------------------------
Article 19:
Title: Artificial intelligence–challenges and chances for Europe
Author(s): ['J Straus']
Year: 2021
Citations: 15
Full Abstract: As one of the building blocks of the fourth industrial revolution, artificial intelligence has attracted much public attention and sparked protracted discussions about its impact on future technological, economic and social developments. This contribution conveys insights into artificial intelligence’s basic methods and tools, its main achievements, its economic environment and the surrounding ethical and social issues. Based on the announced and taken measures of the EU organs in the area of artificial intelligence, the contribution analyses the position of Europe in the global context.
URL: https://www.cambridge.org/core/journals/european-review/article/artificial-intelligence-challenges-and-chanc



Full Abstract: Full abstract not available
URL: https://ai-watch.ec.europa.eu/system/files/2022-01/dpad_report.pdf
----------------------------------------
Article 21:
Title: Societal and ethical impacts of artificial intelligence: Critical notes on European policy frameworks
Author(s): ['L Vesnic-Alujevic', 'S Nascimento', 'A Polvora']
Year: 2020
Citations: 175
Full Abstract: Full abstract not available
URL: https://www.sciencedirect.com/science/article/pii/S0308596120300537
----------------------------------------
Article 22:
Title: Black boxes, not green: Mythologizing artificial intelligence and omitting the environment
Author(s): ['B Brevini']
Year: 2020
Citations: 119
Full Abstract: Full abstract not available
URL: https://journals.sagepub.com/doi/abs/10.1177/2053951720935141
----------------------------------------
Article 23:
Title: The assessment list for trustworthy artificial intelligence (ALTAI)
Author(s): ['P Ala-Pietilä', 'Y Bonnet', 'U Bergmann', 'M Bielikova']
Year: 202



Full Abstract: Error fetching abstract: The markup you provided was rejected by the parser. Trying a different parser or a different encoding may help.

Original exception(s) from parser:
 AssertionError: expected name token at '<![@8Ƈ�(�:Ъw~�XV\x17te\x1b'
URL: https://www.i-proclaim.my/journals/index.php/apjee/article/download/542/500
----------------------------------------
Article 28:
Title: Building trust in artificial intelligence
Author(s): ['F Rossi']
Year: 2018
Citations: 232
Full Abstract: Full abstract not available
URL: https://www.jstor.org/stable/26588348
----------------------------------------
Article 29:
Title: For a meaningful artificial intelligence: Towards a French and European strategy
Author(s): ['C Villani', 'Y Bonnet', 'B Rondepierre']
Year: 2018
Citations: 224
Full Abstract: Full abstract not available
URL: https://books.google.com/books?hl=en&lr=&id=9cVUDwAAQBAJ&oi=fnd&pg=PA3&dq=Environmental+Impacts+Artificial+Intelligence+Europe&ots=WBibA_ZOGJ&sig=DjTNNYtd-K

KeyError: 'pub_url'
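
The output above shows two failures: a parser error on a download link, and a `KeyError` for a result that has no `pub_url`. The sketch below shows one way to guard against both. The helper names `safe_field` and `looks_like_html` are hypothetical. The idea that the rejected page was a PDF or another binary file is an assumption based on its `/download/` URL, not something the output confirms.

```python
def safe_field(article, *keys, default="N/A"):
    """Walk nested dict keys, returning a default instead of raising KeyError."""
    current = article
    for key in keys:
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current


def looks_like_html(content_type, body):
    """Heuristic: True if a response appears to be HTML rather than a PDF or other binary."""
    if body[:5] == b"%PDF-":  # PDF files start with this signature
        return False
    return "html" in (content_type or "").lower()


# Hypothetical record without a 'pub_url' key (values made up for illustration)
record = {"bib": {"title": "Example title", "author": ["A Author"]}, "num_citations": 3}

print(safe_field(record, "bib", "title"))
print(safe_field(record, "pub_url"))
print(looks_like_html("application/pdf", b"%PDF-1.7 ..."))
```

With `requests`, the two arguments of `looks_like_html` would typically come from `response.headers.get("Content-Type")` and `response.content`, checked before the body is handed to the HTML parser.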

## Extract abstract to qualitatively choose articles

This code installs the dependencies needed to run Selenium in Google Colab and sets up a headless Chrome browser to scrape full abstracts from article pages. It searches Google Scholar for the query "Environmental Impacts of AI, Europe", retrieves the first five results, and tries to fetch and display the full abstract from each article's URL, dismissing cookie banners and falling back to alternative page structures where needed. Once the results are printed, the Selenium WebDriver is closed.
Known problem: abstract extraction only works for articles 1 and 4.
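
One possible complement to the page-structure heuristics, sketched here under assumptions: some publisher pages carry a summary in `<meta>` tags, which can be read from the page source without locating the visible abstract section. The tag names in `CANDIDATE_NAMES`, the `MetaCollector` class, and the `abstract_from_meta` helper are hypothetical choices for illustration. Whether a given site provides any of these tags has to be checked page by page, and this is not presented as a fix for the problem noted above.

```python
from html.parser import HTMLParser

# Meta tag names that some pages may use for a summary (an assumption to verify per site)
CANDIDATE_NAMES = ("citation_abstract", "dc.description", "og:description", "description")


class MetaCollector(HTMLParser):
    """Collect the content of <meta> tags keyed by their name or property."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attrs = dict(attrs)
        key = (attrs.get("name") or attrs.get("property") or "").lower()
        content = (attrs.get("content") or "").strip()
        if key and content and key not in self.meta:
            self.meta[key] = content  # keep the first occurrence of each tag


def abstract_from_meta(html):
    """Return the first non-empty candidate meta description, or None."""
    collector = MetaCollector()
    collector.feed(html)
    for name in CANDIDATE_NAMES:
        if name in collector.meta:
            return collector.meta[name]
    return None


# Made-up page source for illustration
page = """<html><head>
<meta name="description" content="Short site blurb.">
<meta name="citation_abstract" content="Sample abstract text.">
</head><body><p>Body text</p></body></html>"""

print(abstract_from_meta(page))
```

With Selenium, the string passed to `abstract_from_meta` could be `driver.page_source`. The candidate names are tried in order, so a dedicated abstract tag takes priority over a generic site description.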

In [None]:
# Install necessary dependencies for running Selenium with Chromium in Colab
!apt-get update -qq
!apt-get install -y wget curl unzip
!apt-get install -y chromium-browser
!apt-get install -y chromium-chromedriver
!pip install selenium
!pip install chromedriver-autoinstaller

# Import necessary libraries
import chromedriver_autoinstaller
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import time

# Set up Chrome options for headless mode in Google Colab
chrome_options = Options()
chrome_options.add_argument('--headless')  # Run Chrome in headless mode
chrome_options.add_argument('--no-sandbox')  # Disable sandboxing (required in Colab)
chrome_options.add_argument('--disable-dev-shm-usage')  # Avoid shared memory errors
chrome_options.add_argument('--disable-gpu')  # Disable GPU usage (for headless mode)
chrome_options.add_argument('--remote-debugging-port=9222')  # Enable remote debugging
chrome_options.binary_location = '/usr/bin/chromium-browser'  # Path to the Chromium binary in Colab

# Automatically install and match the correct version of ChromeDriver
chromedriver_autoinstaller.install()

# Set up ChromeDriver with the specified options
driver = webdriver.Chrome(options=chrome_options)

# Function to handle cookies and fetch full abstract from the article's URL (if available)
def get_full_abstract(url):
    try:
        driver.get(url)  # Open the article URL
        time.sleep(3)  # Wait for the page to load

        # Wait for cookies banner or pop-up to appear, and attempt to close it
        try:
            cookie_button = WebDriverWait(driver, 10).until(
                EC.element_to_be_clickable((By.XPATH, "//button[contains(text(),'Accept') or contains(text(),'Close')]"))
            )
            cookie_button.click()
            time.sleep(2)  # Allow time for the cookie banner to be dismissed
        except Exception:
            pass  # If no cookie banner becomes clickable within 10 seconds, continue

        # Wait for abstract to load, use WebDriverWait to ensure content is visible
        abstract = ""
        try:
            abstract_section = WebDriverWait(driver, 10).until(
                EC.visibility_of_element_located((By.XPATH, "//h2[contains(text(),'Abstract')]/following-sibling::p"))
            )
            abstract = abstract_section.text
        except Exception:
            try:
                # Fallback if abstract is in a different section
                abstract_section = WebDriverWait(driver, 10).until(
                    EC.visibility_of_element_located((By.XPATH, "//div[contains(@class,'abstract') or contains(@class,'section')]/p"))
                )
                abstract = abstract_section.text
                except Exception:
                # Fallback to paragraphs if abstract still not found
                paragraphs = driver.find_elements(By.TAG_NAME, 'p')
                for para in paragraphs:
                    text = para.text.strip()
                    if text and len(text.split()) > 5:
                        abstract += text + '\n'

        return abstract if abstract else "No abstract found"

    except Exception as e:
        return f"Error fetching abstract: {str(e)}"

# Search query on Google Scholar
from scholarly import scholarly
query = "Environmental Impacts of AI, Europe"
search_query = scholarly.search_pubs(query)

# Retrieve the first 5 results (stops early if fewer results are returned)
articles = []
for _, article in zip(range(5), search_query):  # Adjust the number of results as needed
    articles.append(article)

# Display the results with full abstracts
for i, article in enumerate(articles, start=1):
    print(f"Article {i}:")
    print(f"Title: {article['bib']['title']}")
    print(f"Author(s): {article['bib']['author']}")
    print(f"Year: {article['bib']['pub_year']}")
    print(f"Citations: {article['num_citations']}")

    # Try to fetch the full abstract from the article's URL (some results have no 'pub_url')
    url = article.get('pub_url')
    full_abstract = get_full_abstract(url) if url else "No abstract found"
    print(f"Full Abstract: {full_abstract}")

    print(f"URL: {url or 'not available'}")
    print("-" * 40)

# Close the Selenium WebDriver after scraping
driver.quit()


Article 1:
Title: The environmental challenges of AI in EU law: lessons learned from the Artificial Intelligence Act (AIA) with its drawbacks
Author(s): ['U Pagallo', 'J Ciani Sciolla', 'M Durante']
Year: 2022
Citations: 42
Full Abstract: Important note for authors: phishing scams.
Transforming Government: People, Process and Policy
Article publication date: 22 June 2022 Permissions
Issue publication date: 12 July 2022
The paper aims to examine the environmental challenges of artificial intelligence (AI) in EU law that regard both illicit uses of the technology, i.e. overuse or misuse of AI and its possible underuses. The aim of the paper is to show how such regulatory efforts of legislators should be understood as a critical component of the Green Deal of the EU institutions, that is, to save our planet from impoverishment, plunder and destruction.
To illustrate the different ways in which AI can represent a game-changer for our environmental challenges, attention is drawn to a multid