# The Impact of Artificial Intelligence on Higher Education

### Introduction
Artificial intelligence (AI) is transforming higher education, influencing teaching, learning, and administrative practices. As AI becomes more prevalent in educational settings, a growing body of research examines its implications, benefits, and challenges. This notebook explores this research landscape through a two-part analysis.

### Part 1: Bibliometric Analysis
In the first phase, we perform a bibliometric analysis of SCOPUS data to summarize the state of research in this field. Specifically, our objectives include:
- **Quantifying Publication Trends ->** Determining how research on AI in higher education has grown over recent years.
- **Identifying Key Contributors ->** Recognizing influential authors, institutions, and countries that are leading research efforts.
- **Research Themes and Collaboration Networks ->** Exploring recurring themes, as well as collaborations and partnerships in the literature.
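The first two objectives reduce to counting. The sketch below counts publications per year and articles per affiliation country. It assumes records shaped like the ones the data-collection code later in this notebook builds: `publication_date` is a `YYYY-MM-DD` string, and `affiliations` is a list of dicts with a `country` key, where `"No country"` is the placeholder for missing values. The records themselves are made up for illustration and are not SCOPUS results.

```python
from collections import Counter

# Toy records with the same fields the data-collection code builds (made up, not SCOPUS results)
records = [
    {"publication_date": "2022-03-01", "affiliations": [{"country": "Spain"}, {"country": "Portugal"}]},
    {"publication_date": "2023-07-15", "affiliations": [{"country": "Spain"}, {"country": "Spain"}]},
    {"publication_date": "2023-11-30", "affiliations": [{"country": "China"}, {"country": "Spain"}]},
    {"publication_date": None, "affiliations": [{"country": "No country"}]},
]

def publications_per_year(records):
    """Count records per publication year, skipping records without a date."""
    years = Counter()
    for rec in records:
        date = rec.get("publication_date")
        if date:
            years[int(date[:4])] += 1  # dates look like 'YYYY-MM-DD'
    return dict(sorted(years.items()))

def articles_per_country(records):
    """Count each country once per article and skip the 'No country' placeholder."""
    countries = Counter()
    for rec in records:
        unique = {
            a.get("country")
            for a in rec.get("affiliations", [])
            if a.get("country") and a.get("country") != "No country"
        }
        countries.update(unique)
    return countries

print(publications_per_year(records))                # {2022: 1, 2023: 2}
print(articles_per_country(records).most_common(1))  # [('Spain', 3)]
```

Each country is counted at most once per article. Without that rule, a paper with several authors from the same country would inflate that country's total.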

### Part 2: Text Mining Analysis
Following the bibliometric analysis, we will apply text mining techniques to the articles themselves. This phase will enable us to delve deeper into the content, uncovering nuanced insights into how AI is being discussed and understood in higher education contexts. In particular, we aim to:
- **Extract Key Topics ->** Use natural language processing (NLP) methods to identify key themes and subtopics.
- **Analyze Sentiment and Context ->** Examine how AI’s impact is portrayed in higher education, focusing on sentiments around its benefits and challenges.
- **Identify Emerging Trends ->** Detect emerging applications or innovative uses of AI in education.
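TF-IDF term ranking over the abstracts is one crude first look at key topics. It is not a topic model. The sketch below uses only the standard library and a smoothed IDF, `ln((1 + n) / (1 + df)) + 1`. The stop-word list is hypothetical and deliberately tiny, and the abstracts are made up. A real analysis would more likely rely on a dedicated NLP library.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list
STOP_WORDS = {"the", "of", "and", "in", "on", "a", "an", "to", "is", "for", "with", "about"}

def tokenize(text):
    """Lowercase the text, keep alphabetic tokens, and drop stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]

def top_tfidf_terms(docs, k=3):
    """Rank terms by TF-IDF summed over all documents and return the top k."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    scores = Counter()
    for tokens in tokenized:
        if not tokens:
            continue
        for term, freq in Counter(tokens).items():
            tf = freq / len(tokens)
            idf = math.log((1 + n_docs) / (1 + doc_freq[term])) + 1  # smoothed IDF
            scores[term] += tf * idf
    return [term for term, _ in scores.most_common(k)]

# Made-up abstracts for illustration only
docs = [
    "Chatbots support students in online learning.",
    "Generative AI chatbots raise concerns about academic integrity.",
    "Learning analytics predict student dropout in universities.",
]
print(top_tfidf_terms(docs, k=2))  # ['learning', 'chatbots']
```

The sketch does no stemming, so "student" and "students" are counted as separate terms.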

### Methodology
We will use the SCOPUS API to acquire articles matching the following search query: `"impact" AND "high* education" AND "artificial intelligence" AND PUBYEAR < 2025`. Our approach will follow these steps:
1. **Data Collection ->** Retrieve bibliometric data from SCOPUS, including article titles, abstracts, authors, affiliations, and publication years.
2. **Data Processing ->** Organize and clean the data for analysis, ensuring it is suitable for quantitative and qualitative assessments.
3. **Bibliometric Analysis:**
    * **Publication Trends ->** Analyze the number of publications over time to identify growth patterns.
    * **Key Contributors ->** Identify leading authors, institutions, and countries in AI research within higher education.
    * **Research Themes ->** Use text mining techniques to uncover major themes and topics in the literature.
    * **Collaborative Networks ->** Examine co-authorship and institutional collaborations.
4. **Exploring Potential Machine Learning Applications ->** Briefly discuss potential applications of machine learning techniques to further analyze or extend insights from the bibliometric data.
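A co-authorship network is built from pairs of authors. In the sketch below, every unordered pair of co-authors on a paper becomes an edge, weighted by the number of papers the pair shares. It assumes each record carries the `author_names` list built during data collection, which may contain `None` when SCOPUS omits a name. The author names are placeholders.

```python
from collections import Counter
from itertools import combinations

def coauthorship_edges(author_lists):
    """Count how many papers each unordered pair of authors has co-written."""
    edges = Counter()
    for authors in author_lists:
        # Drop missing names, deduplicate, and sort so (A, B) and (B, A) map to the same edge
        unique_authors = sorted({a for a in authors if a})
        edges.update(combinations(unique_authors, 2))
    return edges

# Placeholder author lists (illustrative only)
papers = [
    ["Author A", "Author B", "Author C"],
    ["Author B", "Author A", None],
    ["Author D"],
]
edges = coauthorship_edges(papers)
print(edges.most_common(1))  # [(('Author A', 'Author B'), 2)]
```

The resulting edge counts can be passed to a graph library for centrality or community analysis.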

### Expected outcomes
This analysis will contribute to understanding the broader impact of AI on higher education, offering valuable perspectives for academics, practitioners, and policymakers aiming to leverage AI technologies to enhance educational experiences and outcomes.


========================================================================================================================================

#### Importing required libraries and modules

```python
import os
import sys
import requests
import json
import pickle
import math
import pandas as pd

# Add the local ./scripts directory, which contains auxiliary helper modules, to the import path
aux_modules_path = os.path.abspath('./scripts')
if aux_modules_path not in sys.path:
    sys.path.append(aux_modules_path)
```


#### API Key Configuration

```python
from dotenv import load_dotenv

# Load environment variables from the .env file
load_dotenv()

# Retrieve the API key
API_KEY = os.getenv('API_KEY')

# Verify that the API key was loaded without printing the secret itself
print(f"API key loaded: {API_KEY is not None}")
```

API key loaded: True
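You may want to confirm which key was loaded without showing it in full in the notebook output. One option is to mask it before printing. `mask_secret` below is a hypothetical helper, not part of the notebook's scripts, and the example string is not a real key.

```python
def mask_secret(secret, visible=4):
    """Return the secret with all but the last `visible` characters replaced by '*'."""
    if not secret:
        return "<not set>"
    if len(secret) <= visible:
        return "*" * len(secret)
    return "*" * (len(secret) - visible) + secret[-visible:]

print(mask_secret("abcdef123456"))  # ********3456
print(mask_secret(None))            # <not set>
```

In the notebook, this would be used as `print(f"API key: {mask_secret(API_KEY)}")`.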


#### Data Collection
Retrieving the data takes some time because of the number of articles involved. The query matches more than 50,000 articles, but only 5,000 are retrieved, because the SCOPUS API caps the number of results that can be retrieved per query at 5,000.
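The cap translates into a fixed number of pages. With 25 results per page, reaching 5,000 results takes ⌈5000 / 25⌉ = 200 successful requests. The helper below makes the calculation explicit. It is hypothetical and not part of the notebook's scripts, and its defaults mirror the page size and cap described above.

```python
import math

def pages_needed(total_results, page_size=25, result_cap=5000):
    """Number of pages needed to fetch every retrievable result for a query."""
    retrievable = min(total_results, result_cap)
    return math.ceil(retrievable / page_size)

print(pages_needed(50568))  # 200
print(pages_needed(130))    # 6
```

In the collection loop below, the request sent after the 200th page (with `start` = 5000) is the one the API rejects. That rejection triggers the cap message.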

```python
# Endpoint of the SCOPUS Search API
SCOPUS_SEARCH_URL = "https://api.elsevier.com/content/search/scopus"

# Variable to store the data returned by the SCOPUS API
API_DATA = None

# Query to send to the SCOPUS API
user_query = '"impact" AND "high* education" AND "artificial intelligence" AND PUBYEAR < 2025'

def fetch_all_articles():
    articles = []
    total_results = None  # total number of articles matching the query
    start = 0
    count = 25  # results per page (max 25)

    while True:
        params = {
            'query': user_query,
            'apiKey': API_KEY,
            'start': start,
            'count': count
        }

        response = requests.get(SCOPUS_SEARCH_URL, params=params, timeout=30)

        # Stop at the first unsuccessful request and report the most likely cause
        if response.status_code != 200:
            if response.status_code == 429:
                print("API quota or rate limit exceeded:", response.status_code)
            elif len(articles) >= 5000:
                print("Reached the SCOPUS cap of 5000 results per query.")
            else:
                print("Failed to retrieve data:", response.status_code)
            break

        search_results = response.json().get("search-results", {})
        entries = search_results.get("entry", [])

        # Record the total number of articles matching the query
        if total_results is None and 'opensearch:totalResults' in search_results:
            total_results = int(search_results['opensearch:totalResults'])

        # If there are no more entries, stop the loop
        if not entries:
            break

        # Process each entry and add it to the articles list
        for entry in entries:
            article_data = {
                "title": entry.get("dc:title", "No title"),
                "author_names": [author.get("authname") for author in entry.get("author", [])],
                "publication_name": entry.get("prism:publicationName"),
                "publication_date": entry.get("prism:coverDate"),
                "doi": entry.get("prism:doi"),
                "cited_by_count": entry.get("citedby-count", "0"),
                "abstract": entry.get("dc:description", "No abstract"),
                "keywords": [kw.get("keyword") for kw in entry.get("keywords", [])],
                "affiliations": [
                    {
                        "name": affil.get("affilname", "No affiliation name"),
                        "city": affil.get("affiliation-city", "No city"),
                        "country": affil.get("affiliation-country", "No country")
                    }
                    for affil in entry.get("affiliation", [])
                ]
            }
            articles.append(article_data)

        # Move the start index to the next page
        start += count

    print("Number of articles retrieved: ", len(articles))
    print(f"Total number of articles: {total_results}")
    return pd.DataFrame(articles)

# Fetch all articles and load them into a DataFrame
API_DATA = fetch_all_articles()
```

Reached the SCOPUS cap of 5000 results per query.
Number of articles retrieved:  5000
Total number of articles: 50568
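The collection loop stops at the first HTTP 429 response. When the cause is short-term throttling, retrying with exponential backoff is a common alternative. Backoff is unlikely to help once the quota itself is exhausted. The sketch below is generic and not part of the original notebook. It is built around a hypothetical `send_request` callable, such as a lambda wrapping the notebook's `requests.get` call, and it takes an injectable `sleep` function so it can be tested without waiting.

```python
import time
from types import SimpleNamespace

def get_with_backoff(send_request, max_retries=5, base_delay=1.0, sleep=time.sleep):
    """Call send_request() and retry on HTTP 429, doubling the wait after each attempt.

    send_request must return an object with a `status_code` attribute
    (a requests.Response qualifies). The last response is returned either way.
    """
    for attempt in range(max_retries + 1):
        response = send_request()
        if response.status_code != 429 or attempt == max_retries:
            return response
        sleep(base_delay * 2 ** attempt)  # waits 1 s, 2 s, 4 s, ... with the defaults

# Demonstration with fake responses: two throttled replies, then a success
fake_codes = iter([429, 429, 200])
waits = []
result = get_with_backoff(lambda: SimpleNamespace(status_code=next(fake_codes)), sleep=waits.append)
print(result.status_code, waits)  # 200 [1.0, 2.0]

# In the notebook this could wrap the real call, e.g.:
# response = get_with_backoff(lambda: requests.get(SCOPUS_SEARCH_URL, params=params, timeout=30))
```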


```python
len(API_DATA)
```

5000
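Step 2 of the methodology (data processing) could begin with a few clean-up operations on `API_DATA`. The sketch assumes the column names created in `fetch_all_articles`. It parses the year from `publication_date`, converts `cited_by_count` (which the API returns as a string) to integers, and drops duplicate DOIs while keeping rows without a DOI. `clean_articles` is a hypothetical helper, and the small DataFrame is illustrative, not real data.

```python
import pandas as pd

def clean_articles(df):
    """Return a cleaned copy of the article table (hypothetical helper)."""
    out = df.copy()
    # 'YYYY-MM-DD' cover dates -> integer year; unparseable or missing dates become <NA>
    out["year"] = pd.to_datetime(out["publication_date"], errors="coerce").dt.year.astype("Int64")
    # citedby-count arrives as a string; non-numeric values become 0
    out["cited_by_count"] = pd.to_numeric(out["cited_by_count"], errors="coerce").fillna(0).astype(int)
    # Keep the first row per DOI, but never collapse rows that have no DOI
    has_doi = out["doi"].notna()
    out = pd.concat([out[has_doi].drop_duplicates(subset="doi"), out[~has_doi]]).sort_index()
    return out

# Illustrative sample, not SCOPUS data
sample = pd.DataFrame({
    "title": ["A", "A (duplicate)", "B", "C"],
    "publication_date": ["2023-05-01", "2023-05-01", None, "2021-01-10"],
    "doi": ["10.1000/x1", "10.1000/x1", None, None],
    "cited_by_count": ["12", "12", "0", "3"],
})
cleaned = clean_articles(sample)
print(cleaned[["title", "year", "cited_by_count"]])
```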

## Part 1: Bibliometric analysis