## Importing Libraries for Data Processing and API Requests
This section imports various libraries required for data processing and API requests. These include csv for CSV file handling, json for JSON data manipulation, requests for making HTTP requests, time for handling time-related operations, pandas for data analysis, and tqdm for creating progress bars.

In [54]:
# Import necessary libraries
import csv
import json
import requests
import time
import pandas as pd
from tqdm import tqdm



## Defining Constants for ORES API Integration
This part defines constants related to the ORES (Objective Revision Evaluation Service) API. It includes the API endpoint for ORES, the model name for English article quality, assumed API latency, and the time to wait between consecutive requests to adhere to API rate limits.

In [55]:
# Constants for the ORES API
API_ORES_LIFTWING_ENDPOINT = "https://api.wikimedia.org/service/lw/inference/v1/models/{model_name}:predict"
API_ORES_EN_QUALITY_MODEL = "enwiki-articlequality"
API_LATENCY_ASSUMED = 0.002
API_THROTTLE_WAIT = (60.0 / 5000.0) - API_LATENCY_ASSUMED



## Creating a Function for ORES API Requests and Handling Errors

Code provided by Dr. David W. McDonald for use in DATA 512, a course in the UW MS Data Science degree program. 
The code is provided under the Creative Commons CC-BY license. 
Revision 1.0 - August 15, 2023.

*****this is modified code

This section presents a Python function named request_ores_score_per_article. The function is used to make requests to the ORES API for obtaining article quality scores. It includes error handling mechanisms and retries in case of failed requests or HTTP errors.

In [56]:
# Other necessary constants
USERNAME = "<Qwertyishank09>"
ACCESS_TOKEN = "eyJ0eXAiOiJKV1QiLCJhbGciOiJSUzI1NiJ9.eyJhdWQiOiJmOWI2MzZiZTk3MDk4YmVmNTZkOTY0Yjk4ZDhmNDRhOSIsImp0aSI6IjVkMWQzMTIxN2NhNTZlMDhiNzkyNzUyYWQzYjI3YjJmYzUwZmNhMDI2Njg3ODIyN2NjOTc1Y2FhNmU1NTNlOWFhMDk4MTEyOWYyNDhkMDY5IiwiaWF0IjoxNjk3NDI2MzM4LjI5MTI1MywibmJmIjoxNjk3NDI2MzM4LjI5MTI1NiwiZXhwIjozMzI1NDMzNTEzOC4yOTAwMSwic3ViIjoiNzQwMTQwNzIiLCJpc3MiOiJodHRwczovL21ldGEud2lraW1lZGlhLm9yZyIsInJhdGVsaW1pdCI6eyJyZXF1ZXN0c19wZXJfdW5pdCI6NTAwMCwidW5pdCI6IkhPVVIifSwic2NvcGVzIjpbImJhc2ljIl19.QF2MjTO90RxvSmQm3ydHsXMIzCsviJ-XNvi4NSW03cOuawJHzs4mRgnjNkrmfZKTMEPK8DHsmqLK80n8cPpOAu8WPWxQSiFvB9Zce4ePo-Gox3Wy45Om4K_n8ubcG4uQSbRGRJyGxk1yViYSLF00x67uEXBRLheZso0J4fY-sn0P0YjVgrVd5tQs7AFzo7jouAq4_iiSv4Dy8qpgvxcsQh3b-4P0kqv3WJHymHIzrQWModfjeePRGRoJxDeUgxAnS6pOwPkHihlxCLfpNSJoMGi8sGTMGb2IptP6SvReTJOiMqezguD9A1PUSFKtIrw77ZQDf0isPpNJWaNTlC4RkCqyMr1TxoGOx4zHkui8VmLwxKYSeNqBmhsX5YyaUTgHW71gcdS7u1svAn6rnErnEQZrsie8dBuZYFtiUnfLUejfMJ3s1gQ0RWQia7WM_BPTJr9nFjPfVthSO7LHK7vOPeKnCd_TDvZq-YR6YBHqcYonjfll-v9A0181ItdgOdaPL9r8Isdb42ER9rG_BCbkuho_yeMJhc9YzAq4zYDKN4uVCKlbXIeTdbuJUfCNCfitRGGW5O270cu917OtRrVASAcyGimlF_J9dnRPmn0Nyh5SHuHTgsZClj0dtqeqk4ugBlLwr6FumoXULKBU3rRwjqlt6R9pc0MwgROyFKVrv3E"

# Function to make the ORES API request
def request_ores_score_per_article(article_title, article_revid, email_address, access_token):
    request_data = {
        "lang": "en",
        "rev_id": article_revid,
        "features": True
    }

    header_params = {
        'email_address': email_address,
        'access_token': access_token
    }

    headers = {
        'User-Agent': f"{email_address}, University of Washington, MSDS DATA 512 - AUTUMN 2023",
        'Content-Type': 'application/json',
        'Authorization': f"Bearer {access_token}"
    }

    request_url = API_ORES_LIFTWING_ENDPOINT.format(model_name=API_ORES_EN_QUALITY_MODEL)
    try_count = 0
    while try_count < 10:
        try:
            if API_THROTTLE_WAIT > 0.0:
                time.sleep(API_THROTTLE_WAIT)
            response = requests.post(request_url, headers=headers, data=json.dumps(request_data))
            response.raise_for_status()  # Check for non-200 status
            json_response = response.json()
            return json_response
        except requests.exceptions.HTTPError as http_err:
            print(f"HTTP error occurred: {http_err}")
            print(f"Status code: {response.status_code}")
            print(f"Retrying after {2 ** try_count} seconds.")
            time.sleep(2 ** try_count)
            try_count += 1
        except requests.exceptions.RequestException as e:
            print(f"Request failed for {article_title}. Retrying after {2 ** try_count} seconds.")
            time.sleep(2 ** try_count)
            try_count += 1
    return None



## Converting Wikipedia Page Information from CSV to Dictionary
This block reads the CSV file containing Wikipedia page information and converts it into a list of dictionaries.

In [57]:
# Your CSV file name
csv_file = '../data/wiki_page_info.csv'

# Reading the CSV file and converting it to a list of dictionaries
articles = pd.read_csv(csv_file).to_dict(orient='records')


In [58]:
articles

[{'Title': 'Abbeville, Alabama', 'Last_Revision_ID': 1171163550},
 {'Title': 'Adamsville, Alabama', 'Last_Revision_ID': 1177621427},
 {'Title': 'Addison, Alabama', 'Last_Revision_ID': 1168359898},
 {'Title': 'Akron, Alabama', 'Last_Revision_ID': 1165909508},
 {'Title': 'Alabaster, Alabama', 'Last_Revision_ID': 1179139816},
 {'Title': 'Albertville, Alabama', 'Last_Revision_ID': 1179198677},
 {'Title': 'Alexander City, Alabama', 'Last_Revision_ID': 1179140073},
 {'Title': 'Aliceville, Alabama', 'Last_Revision_ID': 1167792390},
 {'Title': 'Allgood, Alabama', 'Last_Revision_ID': 1165909718},
 {'Title': 'Altoona, Alabama', 'Last_Revision_ID': 1165909823},
 {'Title': 'Andalusia, Alabama', 'Last_Revision_ID': 1179141586},
 {'Title': 'Anderson, Lauderdale County, Alabama',
  'Last_Revision_ID': 662691565},
 {'Title': 'Anniston, Alabama', 'Last_Revision_ID': 1176049382},
 {'Title': 'Arab, Alabama', 'Last_Revision_ID': 1171375371},
 {'Title': 'Ardmore, Alabama', 'Last_Revision_ID': 1176903479},


## Checking for Existing Entries in the ORES Predictions CSV File
check for already processed entries

In [62]:
# Check for existing entries in the CSV
existing_pairs = set()
with open('../data/ores_predictions.csv', mode='r') as file:
    reader = csv.reader(file)
    next(reader)  # Skip the header row
    for row in reader:
        existing_pairs.add((row[0], row[1]))


In [63]:
existing_pairs

{('Upton, Wyoming', '1166331959'),
 ('Reed City, Michigan', '1160686746'),
 ('Mulberry, Arkansas', '1165894704'),
 ('Myrtle Beach, South Carolina', '1180335677'),
 ('Conneaut, Ohio', '1168137365'),
 ('West Tisbury, Massachusetts', '1174583314'),
 ('Crab Orchard, Kentucky', '1170225918'),
 ('Redwood City, California', '1180012128'),
 ('Fillmore, New York', '1169058706'),
 ('Davis, California', '1177529748'),
 ('Gilsum, New Hampshire', '1174704297'),
 ('Winona Lake, Indiana', '1166334105'),
 ('Boy River, Minnesota', '1165536774'),
 ('Tega Cay, South Carolina', '1172815958'),
 ('Clarion, Iowa', '1165539512'),
 ('Pine Township, Mercer County, Pennsylvania', '1160604569'),
 ('Elk Township, Warren County, Pennsylvania', '1173803663'),
 ('Port Orchard, Washington', '1179567904'),
 ('Lincoln Park, New Jersey', '1178951121'),
 ('Centre Island, New York', '1167338362'),
 ('Dorchester, Illinois', '1166852557'),
 ('East Brandywine Township, Chester County, Pennsylvania', '1166990549'),
 ('Vernon, 

## Obtaining ORES Scores for Wikipedia Articles and Storing in CSV
This section initiates a loop over each article in the list of dictionaries obtained from the CSV. It retrieves the title and revision ID, checks if the entry exists in the existing pairs, and then requests the ORES score for the article using the previously defined function. It handles different HTTP error scenarios and writes the obtained data to the 'ores_predictions.csv' file.

In [61]:
# Loop over each article to get the ORES score
for article in tqdm(articles):
    title = article["Title"]
    rev_id = article["Last_Revision_ID"]
    
    if (title, rev_id) in existing_pairs:
        print(f"The entry for Title: {title} and Revision ID: {rev_id} already exists in the CSV. Skipping.")
        continue
    
    score = request_ores_score_per_article(title, rev_id, USERNAME, ACCESS_TOKEN)
    


    if score is not None:
        try:
            prediction = score["enwiki"]["scores"][str(rev_id)]["articlequality"]["score"]["prediction"]
        except KeyError as e:
            print(f"KeyError: {e}. One of the required keys is missing in the response for Title: {title}, Rivision Id: {rev_id}")
            prediction = "N/A"  # Set default value for prediction

        # Storing the data in a CSV
        with open('../data/ores_predictions.csv', mode='a', newline='') as file:
            writer = csv.writer(file)
            writer.writerow([title, rev_id, prediction])
    else:
        print(f"Failed to get score for {title}.")

 13%|████▌                               | 2803/22157 [39:41<8:02:01,  1.49s/it]

HTTP error occurred: 504 Server Error: Gateway Timeout for url: https://api.wikimedia.org/service/lw/inference/v1/models/enwiki-articlequality:predict
Status code: 504
Retrying after 1 seconds.


 17%|██████▏                             | 3797/22157 [58:34<5:09:28,  1.01s/it]

HTTP error occurred: 502 Server Error: Bad Gateway for url: https://api.wikimedia.org/service/lw/inference/v1/models/enwiki-articlequality:predict
Status code: 502
Retrying after 1 seconds.


 25%|████████▌                         | 5540/22157 [1:29:29<5:14:16,  1.13s/it]

HTTP error occurred: 502 Server Error: Bad Gateway for url: https://api.wikimedia.org/service/lw/inference/v1/models/enwiki-articlequality:predict
Status code: 502
Retrying after 1 seconds.


 26%|████████▋                         | 5682/22157 [1:31:47<4:31:37,  1.01it/s]

HTTP error occurred: 502 Server Error: Bad Gateway for url: https://api.wikimedia.org/service/lw/inference/v1/models/enwiki-articlequality:predict
Status code: 502
Retrying after 1 seconds.


 27%|█████████                         | 5934/22157 [1:35:35<2:34:10,  1.75it/s]

HTTP error occurred: 504 Server Error: Gateway Timeout for url: https://api.wikimedia.org/service/lw/inference/v1/models/enwiki-articlequality:predict
Status code: 504
Retrying after 1 seconds.


 54%|█████████████████▋               | 11868/22157 [2:19:46<1:16:52,  2.23it/s]

HTTP error occurred: 429 Client Error: Too Many Requests for url: https://api.wikimedia.org/service/lw/inference/v1/models/enwiki-articlequality:predict
Status code: 429
Retrying after 1 seconds.
HTTP error occurred: 429 Client Error: Too Many Requests for url: https://api.wikimedia.org/service/lw/inference/v1/models/enwiki-articlequality:predict
Status code: 429
Retrying after 2 seconds.
HTTP error occurred: 429 Client Error: Too Many Requests for url: https://api.wikimedia.org/service/lw/inference/v1/models/enwiki-articlequality:predict
Status code: 429
Retrying after 4 seconds.
HTTP error occurred: 429 Client Error: Too Many Requests for url: https://api.wikimedia.org/service/lw/inference/v1/models/enwiki-articlequality:predict
Status code: 429
Retrying after 8 seconds.
HTTP error occurred: 429 Client Error: Too Many Requests for url: https://api.wikimedia.org/service/lw/inference/v1/models/enwiki-articlequality:predict
Status code: 429
Retrying after 16 seconds.
HTTP error occurred

 54%|████████████████              | 11869/22157 [2:36:51<879:43:34, 307.84s/it]

Failed to get score for Jennings, Missouri.
HTTP error occurred: 429 Client Error: Too Many Requests for url: https://api.wikimedia.org/service/lw/inference/v1/models/enwiki-articlequality:predict
Status code: 429
Retrying after 1 seconds.
HTTP error occurred: 429 Client Error: Too Many Requests for url: https://api.wikimedia.org/service/lw/inference/v1/models/enwiki-articlequality:predict
Status code: 429
Retrying after 2 seconds.
HTTP error occurred: 429 Client Error: Too Many Requests for url: https://api.wikimedia.org/service/lw/inference/v1/models/enwiki-articlequality:predict
Status code: 429
Retrying after 4 seconds.
HTTP error occurred: 429 Client Error: Too Many Requests for url: https://api.wikimedia.org/service/lw/inference/v1/models/enwiki-articlequality:predict
Status code: 429
Retrying after 8 seconds.
HTTP error occurred: 429 Client Error: Too Many Requests for url: https://api.wikimedia.org/service/lw/inference/v1/models/enwiki-articlequality:predict
Status code: 429
Re

 54%|█████████████████▋               | 11909/22157 [2:45:45<1:19:30,  2.15it/s]

HTTP error occurred: 504 Server Error: Gateway Timeout for url: https://api.wikimedia.org/service/lw/inference/v1/models/enwiki-articlequality:predict
Status code: 504
Retrying after 1 seconds.


 76%|██████████████████████████▋        | 16868/22157 [3:29:48<33:15,  2.65it/s]

HTTP error occurred: 429 Client Error: Too Many Requests for url: https://api.wikimedia.org/service/lw/inference/v1/models/enwiki-articlequality:predict
Status code: 429
Retrying after 1 seconds.
HTTP error occurred: 429 Client Error: Too Many Requests for url: https://api.wikimedia.org/service/lw/inference/v1/models/enwiki-articlequality:predict
Status code: 429
Retrying after 2 seconds.
HTTP error occurred: 429 Client Error: Too Many Requests for url: https://api.wikimedia.org/service/lw/inference/v1/models/enwiki-articlequality:predict
Status code: 429
Retrying after 4 seconds.
HTTP error occurred: 429 Client Error: Too Many Requests for url: https://api.wikimedia.org/service/lw/inference/v1/models/enwiki-articlequality:predict
Status code: 429
Retrying after 8 seconds.
HTTP error occurred: 429 Client Error: Too Many Requests for url: https://api.wikimedia.org/service/lw/inference/v1/models/enwiki-articlequality:predict
Status code: 429
Retrying after 16 seconds.
HTTP error occurred

 76%|██████████████████████▊       | 16869/22157 [3:46:53<452:11:03, 307.84s/it]

Failed to get score for Jefferson Township, Greene County, Pennsylvania.


 99%|██████████████████████████████████▌| 21869/22157 [4:25:26<02:16,  2.11it/s]

HTTP error occurred: 429 Client Error: Too Many Requests for url: https://api.wikimedia.org/service/lw/inference/v1/models/enwiki-articlequality:predict
Status code: 429
Retrying after 1 seconds.
HTTP error occurred: 429 Client Error: Too Many Requests for url: https://api.wikimedia.org/service/lw/inference/v1/models/enwiki-articlequality:predict
Status code: 429
Retrying after 2 seconds.
HTTP error occurred: 429 Client Error: Too Many Requests for url: https://api.wikimedia.org/service/lw/inference/v1/models/enwiki-articlequality:predict
Status code: 429
Retrying after 4 seconds.
HTTP error occurred: 429 Client Error: Too Many Requests for url: https://api.wikimedia.org/service/lw/inference/v1/models/enwiki-articlequality:predict
Status code: 429
Retrying after 8 seconds.
HTTP error occurred: 429 Client Error: Too Many Requests for url: https://api.wikimedia.org/service/lw/inference/v1/models/enwiki-articlequality:predict
Status code: 429
Retrying after 16 seconds.
HTTP error occurred

 99%|██████████████████████████████▌| 21870/22157 [4:42:31<24:32:40, 307.88s/it]

Failed to get score for Alma, Wisconsin.
HTTP error occurred: 429 Client Error: Too Many Requests for url: https://api.wikimedia.org/service/lw/inference/v1/models/enwiki-articlequality:predict
Status code: 429
Retrying after 1 seconds.
HTTP error occurred: 429 Client Error: Too Many Requests for url: https://api.wikimedia.org/service/lw/inference/v1/models/enwiki-articlequality:predict
Status code: 429
Retrying after 2 seconds.
HTTP error occurred: 429 Client Error: Too Many Requests for url: https://api.wikimedia.org/service/lw/inference/v1/models/enwiki-articlequality:predict
Status code: 429
Retrying after 4 seconds.
HTTP error occurred: 429 Client Error: Too Many Requests for url: https://api.wikimedia.org/service/lw/inference/v1/models/enwiki-articlequality:predict
Status code: 429
Retrying after 8 seconds.
HTTP error occurred: 429 Client Error: Too Many Requests for url: https://api.wikimedia.org/service/lw/inference/v1/models/enwiki-articlequality:predict
Status code: 429
Retry

100%|███████████████████████████████████| 22157/22157 [4:45:32<00:00,  1.29it/s]
