# Part A

#### Go to the page https://en.wikipedia.org/wiki/List_of_country_music_performers and extract all of the links using your regular expressions from above.

In [2]:
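import re

# performers.txt is assumed to contain the saved wikitext of the list page.
with open('performers.txt', 'r', encoding='utf-8') as file:
    text = file.read()

# Match [[Target]] and [[Target|Label]]; group 1 is the link target,
# group 2 is the optional display label.
pattern = r'\[\[([^\]|]+)(?:\|([^\]]+))?\]\]'
performers = re.findall(pattern, text)

for performer in performers:
    link = performer[0]
    print(f"Link: {link}")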

Link: The Abrams Brothers
Link: Ace in the Hole Band
Link: Roy Acuff
Link: Kay Adams (singer)
Link: Ryan Adams
Link: Doug Adkins
Link: Trace Adkins
Link: David "Stringbean" Akeman
Link: Rhett Akins
Link: Alabama (band)
Link: Lauren Alaina
Link: Jason Aldean
Link: Alee (singer)
Link: Daniele Alexander
Link: Jessi Alexander
Link: Gary Allan
Link: Susie Allanson
Link: Deborah Allen
Link: Duane Allen
Link: Harley Allen
Link: Jimmie Allen
Link: Rex Allen
Link: Terry Allen (country singer)
Link: Allman Brothers Band
Link: Gregg Allman
Link: Tommy Alverson
Link: Dave Alvin
Link: Amazing Rhythm Aces
Link: American Young
Link: Don Amero
Link: Colin Amey
Link: Al Anderson (NRBQ)
Link: Bill Anderson (singer)
Link: Brent Anderson (singer)
Link: Coffey Anderson
Link: John Anderson (singer)
Link: Keith Anderson
Link: Liz Anderson
Link: Lynn Anderson
Link: Sharon Anderson (singer)
Link: Elisabeth Andreassen
Link: Ingrid Andress
Link: Courtney Marie Andrews
Link: Jessica Andrews
Link: Sheila Andrews
...
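
The pattern above matches every `[[...]]` link in the wikitext, so it also picks up targets such as `Category:` or `File:` pages and may return the same performer more than once. The following is a minimal sketch of one way to filter and de-duplicate the matches. The helper name `extract_article_titles`, the prefix list `NON_ARTICLE_PREFIXES`, and the `sample` string are assumptions made for illustration and are not part of the original notebook.

```python
import re

# Same link pattern as in the cell above.
LINK_PATTERN = r'\[\[([^\]|]+)(?:\|([^\]]+))?\]\]'

# Hypothetical, non-exhaustive list of namespace prefixes to skip.
NON_ARTICLE_PREFIXES = ("File:", "Image:", "Category:", "Wikipedia:",
                        "Template:", "Help:", "Portal:")

def extract_article_titles(wikitext):
    """Return unique link targets in first-seen order, skipping non-article links."""
    seen = set()
    titles = []
    for target, _label in re.findall(LINK_PATTERN, wikitext):
        # Drop any '#Section' anchor and surrounding whitespace.
        title = target.split('#', 1)[0].strip()
        if not title or title.startswith(NON_ARTICLE_PREFIXES):
            continue
        if title not in seen:
            seen.add(title)
            titles.append(title)
    return titles

# Made-up sample wikitext, used only to exercise the helper.
sample = (
    "* [[Roy Acuff]]\n"
    "* [[Alabama (band)|Alabama]]\n"
    "[[Category:Lists of musicians]]\n"
    "* [[Roy Acuff#Career|Acuff]]"
)
print(extract_article_titles(sample))  # ['Roy Acuff', 'Alabama (band)']
```

Keeping first-seen order (rather than using a bare `set`) makes the resulting list reproducible between runs, which can help when comparing download logs.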

#### Use your knowledge of APIs and the list of all the wiki-pages to download all the text on the pages of the country performers.
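
The cell below sends one request per title. As a possible alternative, the MediaWiki API accepts several pipe-separated titles in a single `titles` parameter (up to 50 for ordinary clients). The sketch below only shows how the title list could be grouped and how the parameters might be built; it makes no network calls. The helper names `batch_titles` and `build_batch_params` and the placeholder titles are assumptions for illustration. A batched response contains several entries under `pages` and may need the API's `continue` mechanism when the content is large, neither of which is handled here.

```python
def batch_titles(titles, batch_size=50):
    """Yield consecutive slices of at most batch_size titles."""
    for start in range(0, len(titles), batch_size):
        yield titles[start:start + batch_size]

def build_batch_params(batch):
    """Build query parameters requesting the latest wikitext of several pages."""
    return {
        "action": "query",
        "prop": "revisions",
        "rvprop": "content",
        "rvslots": "main",
        "format": "json",
        "titles": "|".join(batch),  # multiple titles are pipe-separated
    }

# Placeholder titles, used only to show how the grouping behaves.
example_titles = [f"Page_{i}" for i in range(120)]
batches = list(batch_titles(example_titles))
print([len(b) for b in batches])  # [50, 50, 20]
print(build_batch_params(batches[2])["titles"][:23])  # Page_100|Page_101|Page_
```

With batching, the per-page handling would loop over `data['query']['pages'].values()` instead of taking only the first entry.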

In [3]:
import os
import requests
from time import sleep

# Create the output folder if it does not already exist.
output_dir = "performer_files"
os.makedirs(output_dir, exist_ok=True)

# 'performers' is the list of (target, label) tuples produced by the regex cell above.
# Normalise the titles first so that duplicates collapse in the set.
page_titles = list({performer[0].strip().replace(' ', '_') for performer in performers})

WIKIPEDIA_API_URL = "https://en.wikipedia.org/w/api.php"
# Placeholder User-Agent: replace with your own app name and contact address.
headers = {
    'User-Agent': 'YourAppName/1.0 (your_email@example.com)'
}

# Process each page individually
for title in page_titles:
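    params = {
        "action": "query",
        # "info" adds a 'redirect' flag to redirect pages so they can be skipped below.
        "prop": "revisions|info",
        "rvprop": "content",
        "rvslots": "main",
        "format": "json",
        "titles": title
        # "redirects": 1 is deliberately omitted so redirects are not followed
    }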

    try:
        # A timeout keeps a stalled connection from hanging the whole loop.
        response = requests.get(WIKIPEDIA_API_URL, params=params, headers=headers, timeout=10)
        response.raise_for_status()
        data = response.json()

        pages = data['query']['pages']
        page = next(iter(pages.values()))
        
        # Skip missing pages
        if 'missing' in page:
            print(f"Page '{title}' is missing. Skipping.")
            continue
        # Skip redirects
        if 'redirect' in page:
            print(f"Page '{title}' is a redirect. Skipping.")
            continue

        if 'revisions' in page:
            wikitext = page['revisions'][0]['slots']['main']['*']

            # Save the wikitext in the performer_files folder.
            # A '/' in a title would be read as a path separator, so replace it.
            filename = os.path.join(output_dir, f"{title.replace('/', '_')}.txt")
            with open(filename, 'w', encoding='utf-8') as f:
                f.write(wikitext)
            print(f"Downloaded wikitext for {title}")
        else:
            print(f"No content found for page '{title}'. Skipping.")

        # Respectful crawling
        sleep(0.5)  # Sleep for half a second between requests
    except requests.exceptions.RequestException as e:
        print(f"An error occurred while fetching '{title}': {e}")
    except Exception as e:
        print(f"An unexpected error occurred with '{title}': {e}")


Downloaded wikitext for McClymonts
Downloaded wikitext for Love_and_Theft_(band)
Downloaded wikitext for Keith_Palmer_(singer)
Downloaded wikitext for Charles_Kelley
Downloaded wikitext for Aaron_Lewis_(musician)
Downloaded wikitext for Ryan_Tyler
Downloaded wikitext for Larry_the_Cable_Guy
Downloaded wikitext for Whiskey_Falls
Downloaded wikitext for Chayce_Beckham
Downloaded wikitext for Pirates_of_the_Mississippi
Downloaded wikitext for Buddy_Brown_(musician)
Downloaded wikitext for Josh_Turner
Downloaded wikitext for Cooder_Graw
Downloaded wikitext for The_Devil_Makes_Three_(band)
Downloaded wikitext for Jerry_Douglas
Downloaded wikitext for Billy_Gilman
Downloaded wikitext for Stonewall_Jackson_(musician)
Downloaded wikitext for Valerie_June
Downloaded wikitext for A_Thousand_Horses
Downloaded wikitext for American_Young
Downloaded wikitext for Debby_Boone
Downloaded wikitext for Michael_White_(singer)
Downloaded wikitext for Amazing_Rhythm_Aces
Downloaded wikitext for Lobo_(music