### Test 1: First run with Wikipedia to try out Selenium

In [5]:
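from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By

# Selenium 4 takes the chromedriver path through a Service object
driver = webdriver.Chrome(service=Service('../chromedriver_mac64/chromedriver'))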

In [6]:
# Open the Swedish Wikipedia page of a Swedish politician (Göran Persson)
driver.get('https://sv.wikipedia.org/wiki/G%C3%B6ran_Persson')

My goal is the following:
1. Find out who the next prime minister of Sweden was after the one whose Wikipedia page I'm on.
2. Move to the Wikipedia page of that successor.
3. Repeat from step 1 until I reach the current prime minister.

Example of how the HTML looks around the data I want:

```
<tr>
    <th> Efterträdare </th>
    <td><a href="/wiki/Fredrik_Reinfeldt" title="Fredrik Reinfeldt">Fredrik Reinfeldt</a></td>
</tr>

```
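
The lookup can be tried offline on the snippet above before involving a browser. The sketch below uses the standard library's `xml.etree.ElementTree` as a stand-in for Selenium, which only works here because the snippet happens to be well-formed XML. The helper name `find_successor` is made up for this illustration.

```python
import xml.etree.ElementTree as ET

# The example row from above, copied verbatim
SNIPPET = """
<tr>
    <th> Efterträdare </th>
    <td><a href="/wiki/Fredrik_Reinfeldt" title="Fredrik Reinfeldt">Fredrik Reinfeldt</a></td>
</tr>
"""

def find_successor(row_xml, label="Efterträdare"):
    """Return (name, href) if the row's header contains label, else None."""
    row = ET.fromstring(row_xml)
    header = row.find("th")
    if header is None or label not in (header.text or ""):
        return None
    link = row.find("td/a")  # first link inside the data cell
    if link is None:
        return None
    return link.text, link.get("href")

print(find_successor(SNIPPET))
# ('Fredrik Reinfeldt', '/wiki/Fredrik_Reinfeldt')
```

Note that the href is relative here, whereas Selenium's `get_attribute("href")` returns the resolved absolute URL, which is why the link can be passed straight to `driver.get` later on.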

In [7]:
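def NextPresident(driver):
    """Return (name, url) of the successor listed on the current page.

    Despite the name, this looks up the 'Efterträdare' (successor) row.
    find_element raises NoSuchElementException if no such row exists.
    """
    # Locate the header cell, then its table row and the data cell in that row
    th_element = driver.find_element(By.XPATH, "//th[contains(text(), 'Efterträdare')]")
    tr_element = th_element.find_element(By.XPATH, "./ancestor::tr")
    td_element = tr_element.find_element(By.XPATH, "./td")

    # Find the link element within the <td> element
    link_element = td_element.find_element(By.XPATH, "./a")

    # Extract the link URL and text
    link_url = link_element.get_attribute("href")
    link_text = link_element.text

    return link_text, link_url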

In [8]:
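from selenium.common.exceptions import NoSuchElementException

def NextPresidents(driver):
    # Follow successor links until a page has no 'Efterträdare' link
    while True:
        try:
            pres, link = NextPresident(driver)
        except NoSuchElementException:
            # Catch only the "not found" case instead of using a bare except
            return
        print(pres)
        driver.get(link)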

In [9]:
driver.get('https://sv.wikipedia.org/wiki/G%C3%B6ran_Persson')
NextPresidents(driver)

Fredrik Reinfeldt
Stefan Löfven
Magdalena Andersson
Ulf Kristersson
Annika Strandhäll
Romina Pourmokhtari
Erik Berg


The loop should have stopped after Ulf Kristersson, the current prime minister. Instead it kept following 'Efterträdare' (Swedish for 'successor') links, presumably because his page also lists successors for other posts he has held. This is easy to fix by making the lookup more specific, but given time constraints, and since this is only a learning example, let's call it good enough and move on to the next Selenium test.
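
One way to make the lookup more specific is to accept an 'Efterträdare' row only when it belongs to the office of interest. The sketch below does this on raw HTML (for example `driver.page_source`) with the standard library's `html.parser` rather than Selenium's XPath, so it can be tried without a browser. It assumes that each office appears as a header-only row followed by its detail rows. That layout, the class `InfoboxSuccessors`, the helper `successor_for`, and the sample HTML are all hypothetical and have not been checked against the real page.

```python
from html.parser import HTMLParser

class InfoboxSuccessors(HTMLParser):
    """Collect one successor per office from infobox-like table rows."""

    def __init__(self, label="Efterträdare"):
        super().__init__()
        self.label = label
        self.office = None      # text of the most recent header-only row
        self.successors = {}    # office -> (name, href)
        self._cell = None       # "th" or "td" while inside a cell
        self._row = None        # accumulator for the row being parsed

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = {"th": "", "td": "", "href": None}
        elif tag in ("th", "td") and self._row is not None:
            self._cell = tag
        elif (tag == "a" and self._cell == "td"
              and self._row is not None and self._row["href"] is None):
            self._row["href"] = dict(attrs).get("href")  # first link in the cell

    def handle_data(self, data):
        if self._cell and self._row is not None:
            self._row[self._cell] += data

    def handle_endtag(self, tag):
        if tag in ("th", "td"):
            self._cell = None
        elif tag == "tr" and self._row is not None:
            th, td = self._row["th"].strip(), self._row["td"].strip()
            if th == self.label and td:
                # Successor row: file it under the current office
                self.successors.setdefault(self.office, (td, self._row["href"]))
            elif th and not td:
                # Header-only row: assumed to name a new office
                self.office = th
            self._row = None
            self._cell = None

def successor_for(html, office):
    """Return (name, href) of the successor for office, or None."""
    parser = InfoboxSuccessors()
    parser.feed(html)
    return parser.successors.get(office)

# Hypothetical sample mimicking a page with no successor as prime minister
SAMPLE = """
<table>
<tr><th colspan="2">Sveriges statsminister</th></tr>
<tr><th>Företrädare</th><td><a href="/wiki/Magdalena_Andersson">Magdalena Andersson</a></td></tr>
<tr><th colspan="2">Socialförsäkringsminister</th></tr>
<tr><th>Efterträdare</th><td><a href="/wiki/Annika_Strandh%C3%A4ll">Annika Strandhäll</a></td></tr>
</table>
"""

print(successor_for(SAMPLE, "Sveriges statsminister"))
# None
```

With a check like this, the loop would stop as soon as the prime minister section has no successor, instead of following the successor of an unrelated post.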

## Test 2: Collect data from a site with asynchronous data loading
This test is done on the site I actually want to collect data from. It is a weather site, and I will simply be collecting upcoming forecasts. The site loads its data asynchronously, so the forecast is not in the initial HTML, which warrants the use of Selenium.

You need to click certain elements on the site to be able to retrieve the weather information.
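
Because the data arrives asynchronously, an element may not exist yet at the moment the code asks for it. Selenium provides `WebDriverWait` for this situation. The sketch below shows the same polling idea in plain Python so that it can be tried without a browser; `wait_until` is a made-up helper for illustration, not part of Selenium.

```python
import time

def wait_until(condition, timeout=10.0, poll=0.5):
    """Call condition() repeatedly until it returns a truthy value.

    Returns that value, or raises TimeoutError once timeout seconds have passed.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} s")
        time.sleep(poll)
```

In this notebook it could wrap a lookup such as `wait_until(lambda: driver.find_elements(By.XPATH, '//*[@role="tabpanel"]'))`, since `find_elements` returns an empty (falsy) list until a match appears.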

In [57]:
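from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
import pandas as pd

# Selenium 4 takes the chromedriver path through a Service object
driver = webdriver.Chrome(service=Service('../chromedriver_mac64/chromedriver'))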

In [58]:
# Open the SMHI forecast page for Stockholm
driver.get('https://www.smhi.se/vader/prognoser/ortsprognoser/q/Stockholm/2673730')

In [59]:
# Find the pane where the individual days are located.
root = driver.find_element(By.ID, 'root')
# The leading './/' keeps the search inside root; a bare '//' would search the whole document
pane = root.find_elements(By.XPATH, './/*[@role="tabpanel"]')[0]

In [64]:
# You need to click the buttons to be able to retrieve the weather data
buttons = pane.find_elements(By.XPATH, './button')

# Let's get the weather data for the rest of today and for tomorrow by first clicking these days.
buttons[0].click()
buttons[1].click()

In [65]:
# After the clicks, the weather tables are present in the pane
tables = pane.find_elements(By.XPATH, './div/div/table/tbody')

In [66]:
# Convert the data to a DataFrame so it is easier to read and to use in later operations.
def tableToPandas(table):
    rows = table.find_elements(By.XPATH, './tr')
    rownumbers, degrees, rains, humidities = [], [], [], []

    for row in rows:
        # Row label from the header cell, used as the DataFrame index
        rownumber = row.find_elements(By.XPATH, './th/span')[0].text
        # Look up the data cells once per row
        cells = row.find_elements(By.XPATH, './td')
        degree = cells[0].text.replace('°', '')
        rain = cells[1].text
        humidity = cells[4].text

        rownumbers.append(rownumber)
        degrees.append(degree)
        rains.append(rain)
        humidities.append(humidity)

    # All values are kept as the strings scraped from the page
    df = pd.DataFrame({'degree': degrees, 'rain': rains, 'humidity': humidities})
    df.index = rownumbers
    return df
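
The function above stores every value as the string that was scraped. If numeric values are needed later, a small helper like the hypothetical `to_number` below could be applied to a column. It assumes that cells may carry a unit suffix such as `°` or use a decimal comma; neither assumption has been verified against the site.

```python
import re

# Optional sign, digits, optional decimal part with '.' or ','
_NUMBER = re.compile(r"[-+]?\d+(?:[.,]\d+)?")

def to_number(text):
    """Return the first number found in text as a float, or None if there is none."""
    # Normalise the Unicode minus sign to an ASCII hyphen before matching
    match = _NUMBER.search(text.replace("\u2212", "-"))
    if match is None:
        return None
    return float(match.group().replace(",", "."))
```

For example, `today['degree'].map(to_number)` would give a numeric column, with `None` for cells that contain no number.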

In [67]:
today = tableToPandas(tables[0])
tomorrow = tableToPandas(tables[1])

In [68]:
today.to_csv('today.csv')

In [69]:
tomorrow.to_csv('tomorrow.csv')