Required libraries.

In [1]:
from bs4 import BeautifulSoup 
import requests

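Scraping functions: `wiki_medal_table` fetches a Wikipedia medal table page as a BeautifulSoup object, and `first_gold_medals` finds the medal table in it and records each country's first gold.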

In [2]:
def wiki_medal_table(year, event='Summer'):
    '''Returns a BeautifulSoup object of the Wikipedia medal table page
    for the given Olympic event ('Summer' or 'Winter') and year.
    '''

    print(f'Getting Wikipedia page for {year} {event} Olympics...')
    url = f'https://en.wikipedia.org/wiki/{year}_{event}_Olympics_medal_table'
    response = requests.get(url, timeout=30)  # seconds; avoids hanging forever
    response.raise_for_status()  # fail loudly on a missing page or server error
    return BeautifulSoup(response.text, 'html5lib')

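def first_gold_medals(year, record):
    '''Writes countries that scored their first Olympic gold during
    the specified year to the dictionary record (country code -> year).
    Countries already in record are skipped, so calling this in
    chronological order keeps each country's earliest gold year.
    '''

    tables = wiki_medal_table(year).find_all('table')
    # Assumes the first table whose first class is 'wikitable' is the medal table.
    medals = None
    for table in tables:
        classes = table.get('class') or []  # tables without a class attribute
        if classes and classes[0] == 'wikitable':
            medals = table
            break
    if medals is None:
        raise ValueError(f'No medal table found for {year}')

    print(f'Analyzing medal table for {year}...')
    for row in medals.find_all('tr'):
        spans = row.find_all('span')
        if spans:
            country = spans[0].get_text(strip=True).strip('()')
            if country in record:
                continue
            cols = row.find_all('td')
            # Gold is the 4th-from-last <td> (Gold, Silver, Bronze, Total),
            # which also holds for tied rows that have no Rank cell.
            if len(cols) >= 4 and cols[-4].get_text(strip=True) != '0':
                record[country] = year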
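To see why the fourth-from-last `<td>` is the gold column, here is a minimal offline sketch that runs the same row logic as `first_gold_medals` on a tiny hand-written table. The HTML is a hypothetical stand-in, assuming the live markup puts the rank in a `<td>`, the nation in a `<th>` with its country code in a `<span>`, and lets a tied rank span several rows so later tied rows have no Rank cell; `golds_from_table` and the codes `AAA` to `DDD` are illustrative. It uses Python's built-in `html.parser`, so `html5lib` is not needed.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified medal table: rank 2 is shared via rowspan,
# so the CCC row has no Rank <td> of its own.
SAMPLE_HTML = '''
<table class="wikitable">
  <tr><th>Rank</th><th>Nation</th><th>Gold</th><th>Silver</th><th>Bronze</th><th>Total</th></tr>
  <tr><td>1</td><th><span>(AAA)</span></th><td>3</td><td>1</td><td>0</td><td>4</td></tr>
  <tr><td rowspan="2">2</td><th><span>(BBB)</span></th><td>1</td><td>0</td><td>2</td><td>3</td></tr>
  <tr><th><span>(CCC)</span></th><td>1</td><td>0</td><td>2</td><td>3</td></tr>
  <tr><td>4</td><th><span>(DDD)</span></th><td>0</td><td>2</td><td>1</td><td>3</td></tr>
</table>
'''

def golds_from_table(table, year, record):
    """Same row logic as first_gold_medals, applied to an already-parsed table."""
    for row in table.find_all('tr'):
        spans = row.find_all('span')
        if not spans:
            continue  # header row: no country code
        country = spans[0].get_text(strip=True).strip('()')
        if country in record:
            continue  # keep the earlier year
        cols = row.find_all('td')
        # Gold is 4th from the end whether or not the row has a Rank cell.
        if len(cols) >= 4 and cols[-4].get_text(strip=True) != '0':
            record[country] = year

table = BeautifulSoup(SAMPLE_HTML, 'html.parser').find('table')
record = {}
golds_from_table(table, 1928, record)
print(record)  # {'AAA': 1928, 'BBB': 1928, 'CCC': 1928}
```

With a fixed index such as `cols[1]`, the tied CCC row would read its Silver count (0) instead of Gold and be dropped; counting from the end avoids that.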

Generating Summer Olympics records. The cancelled Games of 1916, 1940 and 1944 have no medal tables; since they can be counted on one hand, it is simpler to hardcode them than to write additional code that handles missing Wikipedia pages. The results count medal reallocations.

In [4]:
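firsts = {}

# range() excludes 2020, so the last Games scraped is 2016;
# 1916, 1940 and 1944 were cancelled and have no medal table.
for year in range(1896, 2020, 4):
    if year not in (1916, 1940, 1944):
        first_gold_medals(year, firsts)
print(firsts)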

Getting Wikipedia page for 1896 Summer Olympics...
Analyzing medal table for 1896...
Getting Wikipedia page for 1900 Summer Olympics...
Analyzing medal table for 1900...
Getting Wikipedia page for 1904 Summer Olympics...
Analyzing medal table for 1904...
Getting Wikipedia page for 1908 Summer Olympics...
Analyzing medal table for 1908...
Getting Wikipedia page for 1912 Summer Olympics...
Analyzing medal table for 1912...
Getting Wikipedia page for 1920 Summer Olympics...
Analyzing medal table for 1920...
Getting Wikipedia page for 1924 Summer Olympics...
Analyzing medal table for 1924...
Getting Wikipedia page for 1928 Summer Olympics...
Analyzing medal table for 1928...
Getting Wikipedia page for 1932 Summer Olympics...
Analyzing medal table for 1932...
Getting Wikipedia page for 1936 Summer Olympics...
Analyzing medal table for 1936...
Getting Wikipedia page for 1948 Summer Olympics...
Analyzing medal table for 1948...
Getting Wikipedia page for 1952 Summer Olympics...
Analyzing meda
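For comparison, the approach the notebook decided against, handling missing pages in code rather than hardcoding the cancelled years, could look roughly like the sketch below. It assumes Wikipedia serves a nonexistent article with HTTP 404 and that a cancelled Games has no `..._medal_table` article (if such a title redirected to an existing page, this check would not catch it). `fetch_medal_table_html` and `available_years` are hypothetical helpers; the `fetch` parameter exists so the loop can be tried without network access.

```python
import requests

def fetch_medal_table_html(year, event='Summer', timeout=30):
    """Return the page HTML, or None if the article does not exist.

    Assumes a missing Wikipedia article is served with HTTP 404.
    """
    url = f'https://en.wikipedia.org/wiki/{year}_{event}_Olympics_medal_table'
    response = requests.get(url, timeout=timeout)
    if response.status_code == 404:
        return None  # e.g. a cancelled Games with no medal table page
    response.raise_for_status()  # any other HTTP error is still fatal
    return response.text

def available_years(start=1896, stop=2020, step=4, fetch=fetch_medal_table_html):
    """Yield (year, html) for every year whose medal table page exists."""
    for year in range(start, stop, step):
        html = fetch(year)
        if html is not None:
            yield year, html
```

Each yielded `html` would still need parsing, e.g. with `BeautifulSoup(html, 'html5lib')`. The trade-off is one wasted request per cancelled Games in exchange for not maintaining the list of gaps; with only three known gaps, the hardcoded tuple is the simpler choice.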

Writing to CSV.

In [8]:
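import csv

# newline='' stops the csv module from writing blank lines on Windows
with open('first_golds.csv', 'w', newline='', encoding='utf-8') as first_golds:
    writer = csv.writer(first_golds)
    writer.writerow(['Country', 'Year'])
    for country, year in firsts.items():
        writer.writerow([country, year])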
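One way to check the written file is to read it back with `csv.DictReader`. The sketch below does the round trip in memory with `io.StringIO` instead of a real file; `write_first_golds` and `read_first_golds` are hypothetical helpers, and the sample dictionary is made up except for `TJK: 2016`, which comes from the last cell of this notebook. Note that every value comes back as a string, so years need `int()`.

```python
import csv
import io

def write_first_golds(record, fileobj):
    """Write a country -> year mapping as CSV with a header row."""
    writer = csv.writer(fileobj)
    writer.writerow(['Country', 'Year'])
    writer.writerows(record.items())

def read_first_golds(fileobj):
    """Read the CSV back into a dict, converting years to int."""
    return {row['Country']: int(row['Year']) for row in csv.DictReader(fileobj)}

# Hypothetical sample; only TJK -> 2016 comes from the notebook's output.
sample = {'AAA': 1896, 'BBB': 1952, 'TJK': 2016}

buffer = io.StringIO(newline='')  # in-memory stand-in for the CSV file
write_first_golds(sample, buffer)
buffer.seek(0)
print(read_first_golds(buffer) == sample)  # True
```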

Testing and debugging: 

[X] Erroneous results in 1912 with NED. `time.sleep` did not help. ~~Try identifying user agent.~~ Not a timeout or lag issue: the loop was catching the last row (medal totals) due to wrong indentation.

[X] Erroneous results in 1936 with TUR: its first gold was not recorded. The wrong table was found because of the new "Part of a series" sidebar, so a way to identify the medal table was needed: from 1936 onward it is the second table (`tables[1]`).

[X] Missing the rows ranked 21 and 24 for 1928 (ties in the medal table). The wrong table was used because the loop kept the last wikitable; fixed by adding a `break` (assumes the first wikitable is the main medal table).
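The fixes above identify the medal table by position (`tables[1]`, or the first wikitable). A sketch of an alternative is to identify it by its header cells, so sidebars and other wikitables are skipped regardless of order. It assumes the medal table's first row contains `th` cells named `Nation` and `Gold`, as in the 1928 table printed by the debugging cell; header wording may vary between pages (some may use `NOC` instead of `Nation`), so the names are a parameter. `find_medal_table` and the sample page are hypothetical.

```python
from bs4 import BeautifulSoup

def find_medal_table(soup, required=('Nation', 'Gold')):
    """Return the first <table> whose first-row header cells include every
    name in `required`, or None if no table matches."""
    for table in soup.find_all('table'):
        first_row = table.find('tr')
        if first_row is None:
            continue
        headers = {th.get_text(strip=True) for th in first_row.find_all('th')}
        if all(name in headers for name in required):
            return table
    return None

# Hypothetical page: a "Part of a series" style sidebar precedes the medal table.
SAMPLE_PAGE = '''
<table class="sidebar"><tr><th>Part of a series on</th></tr></table>
<table class="wikitable"><tr><th>Rank</th><th>Nation</th><th>Gold</th>
<th>Silver</th><th>Bronze</th><th>Total</th></tr></table>
'''

soup = BeautifulSoup(SAMPLE_PAGE, 'html.parser')
medal_table = find_medal_table(soup)
print(medal_table['class'])  # ['wikitable']
```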

In [16]:
# Debug: print every wikitable on the 1928 page
tables = wiki_medal_table(1928).find_all('table')
for table in tables:
    classes = table.get('class') or []
    if classes and classes[0] == 'wikitable':
        print(table)

<table class="wikitable sortable plainrowheaders jquery-tablesorter" style="text-align:center"><caption></caption><tbody><tr><th scope="col">Rank</th><th scope="col">Nation</th><th class="headerSort" scope="col" style="width:4em;background-color:gold">Gold</th><th class="headerSort" scope="col" style="width:4em;background-color:silver">Silver</th><th class="headerSort" scope="col" style="width:4em;background-color:#c96">Bronze</th><th scope="col" style="width:4em">Total</th></tr><tr><td>1</td><th scope="row" style="background-color:#f8f9fa;text-align:left"><img alt="" class="thumbborder" data-file-height="650" data-file-width="1235" decoding="async" height="12" src="//upload.wikimedia.org/wikipedia/commons/thumb/f/f5/Flag_of_the_United_States_%281912-1959%29.svg/22px-Flag_of_the_United_States_%281912-1959%29.svg.png" srcset="//upload.wikimedia.org/wikipedia/commons/thumb/f/f5/Flag_of_the_United_States_%281912-1959%29.svg/33px-Flag_of_the_United_States_%281912-1959%29.svg.png 1.5x, //up

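Printing whole tables produces very long output, as above. A more compact debugging aid is sketched below: it lists each table's index, classes and first-row header cells, which is usually enough to tell the medal table from sidebars. `describe_tables` is a hypothetical helper, and the demo runs on a hand-written snippet; on a live page it would be called as `describe_tables(wiki_medal_table(1928))`.

```python
from bs4 import BeautifulSoup

def describe_tables(soup):
    """Return (index, classes, header texts) for every <table> in soup."""
    summary = []
    for index, table in enumerate(soup.find_all('table')):
        first_row = table.find('tr')
        headers = [th.get_text(strip=True) for th in first_row.find_all('th')] if first_row else []
        summary.append((index, table.get('class') or [], headers))
    return summary

# Hypothetical snippet standing in for a downloaded page.
SAMPLE_PAGE = '''
<table class="sidebar"><tr><th>Part of a series on</th></tr></table>
<table class="wikitable sortable"><tr><th>Rank</th><th>Nation</th><th>Gold</th></tr></table>
'''

for index, classes, headers in describe_tables(BeautifulSoup(SAMPLE_PAGE, 'html.parser')):
    print(index, classes, headers)
# 0 ['sidebar'] ['Part of a series on']
# 1 ['wikitable', 'sortable'] ['Rank', 'Nation', 'Gold']
```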
In [6]:
#print(firsts)
firsts['TJK']

2016
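`firsts` maps country codes to years, so looking up one country is a plain dictionary access, as above. Going the other way, listing every country whose first gold came in a given year, means inverting the mapping. A minimal sketch, using a hypothetical sample dictionary in which only `TJK: 2016` comes from the output above:

```python
from collections import defaultdict

def countries_by_year(record):
    """Invert a country -> first-gold-year mapping into year -> sorted countries."""
    by_year = defaultdict(list)
    for country, year in record.items():
        by_year[year].append(country)
    # Sort years chronologically and country codes alphabetically.
    return {year: sorted(countries) for year, countries in sorted(by_year.items())}

# Hypothetical sample; only TJK -> 2016 comes from the notebook.
sample = {'AAA': 1896, 'BBB': 2016, 'TJK': 2016, 'CCC': 1952}
print(countries_by_year(sample))
# {1896: ['AAA'], 1952: ['CCC'], 2016: ['BBB', 'TJK']}
```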