# Task

Scrape information about recent political cases in Sunndal municipality, starting from 2023. Earlier years are not covered here.

# Parameters

**We first try to open the political cases from the main webpage.**

We can find them by clicking 'Innsyn politiske saker' on the 'Politiske saker' page of the Sunndal kommune website. The link opens the cases for the current month.

In [None]:
SUNNDAL_MUNICIPALITY_POLITICAL_MATTERS = 'https://www.sunndal.kommune.no/toppmeny/politikk-og-demokrati/politiske-saker/'
LATEST_POLITICAL_AFFAIRS = 'Innsyn politiske saker'

We are interested in Kommunestyret. **If the current month contains 'Kommunestyret'**, we click on it, which opens a specific case. The top of that page also shows the breadcrumb path to the current case in the following form:

`Hjem > Kommunestyret > Kommunestyret (05.02.2025)`

We click 'Kommunestyret' in this path to open the list of all Kommunestyret cases.

In [None]:
KOMMUNESTYRET_INDICATOR = 'Kommunestyret'
REPETATIVE_PATTERN = '/meetings/sunndal'
CURRENT_YEAR = 2025

**If the current month doesn't contain 'Kommunestyret'**, we fall back to this direct URL to all Kommunestyret cases of the current year (2025). The URL needs to be updated each new year.

In [None]:
KOMMUNESTYRET_2025_URL = 'https://opengov.360online.com/Meetings/sunndal/Boards/Details/209078'
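The hard-coded year is easy to forget when the notebook is re-run later. As a minimal sketch, the year could be derived from the system clock instead. The names `current_year`, `FIRST_YEAR` and `years_to_scrape` are illustrative and not part of the notebook, and this does not remove the need to check that the fallback URL itself is still valid.

```python
from datetime import date

# Hypothetical replacement for the hard-coded CURRENT_YEAR constant.
current_year = date.today().year

# The task covers 2023 onwards, so 2023 is used as the first year.
FIRST_YEAR = 2023

# All years to scrape, up to and including the current one.
years_to_scrape = list(range(FIRST_YEAR, current_year + 1))
print(years_to_scrape)
```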

These PDF files contain the most important information about the political cases:
- 'Komplett innkalling: Kommunestyret' contains the descriptions of the political cases
- 'Protokoll Kommunestyret' contains information about the decisions


In [None]:
POLITICAL_CASES_DESCRIPTION_FILE_INDICATOR = 'Komplett innkalling: Kommunestyret'
INFORMATION_ABOUT_DECISIONS_FILE_INDICATOR = 'Protokoll Kommunestyret'
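To illustrate how these two indicators are meant to be used, here is a small sketch of a case-insensitive substring match on a document title. The helper `classify_document` is hypothetical, and the sample title 'Saksliste' is only a made-up example of a title that matches neither indicator.

```python
POLITICAL_CASES_DESCRIPTION_FILE_INDICATOR = 'Komplett innkalling: Kommunestyret'
INFORMATION_ABOUT_DECISIONS_FILE_INDICATOR = 'Protokoll Kommunestyret'

def classify_document(title):
    """Return 'description', 'decisions' or None for a document title (hypothetical helper)."""
    lowered = title.lower()
    if POLITICAL_CASES_DESCRIPTION_FILE_INDICATOR.lower() in lowered:
        return 'description'
    if INFORMATION_ABOUT_DECISIONS_FILE_INDICATOR.lower() in lowered:
        return 'decisions'
    return None

print(classify_document('Komplett innkalling: Kommunestyret (05.02.2025)'))
print(classify_document('Protokoll Kommunestyret 05.02.2025'))
print(classify_document('Saksliste'))  # made-up title, matches neither indicator
```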

# Import Libraries

In [None]:
%%capture
pip install PyPDF2

In [None]:
import io
import requests
from bs4 import BeautifulSoup
from google.colab import drive
from PyPDF2 import PdfReader
from PyPDF2 import PdfMerger

In [None]:
drive.mount('drive')

Mounted at drive


# Home Page

The pages contain Norwegian text, so we must decode each response with the correct encoding to preserve the Norwegian letters (æ, ø, å).

In [None]:
page = requests.get(SUNNDAL_MUNICIPALITY_POLITICAL_MATTERS)
page.encoding = page.apparent_encoding
home_webpage = BeautifulSoup(page.text, 'html')
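Why the encoding matters can be shown without any network access. The sketch below decodes the same UTF-8 bytes twice: once with the wrong codec (Latin-1, which `requests` may fall back to for text responses whose headers declare no charset) and once with the right one. The sample string is just an illustration.

```python
# A sample word containing a Norwegian letter.
text = 'Sunndalsøra'
raw_bytes = text.encode('utf-8')

# Decoding UTF-8 bytes as Latin-1 garbles 'ø' into two characters.
wrong = raw_bytes.decode('latin-1')

# Decoding with the correct codec restores the original text.
right = raw_bytes.decode('utf-8')

print(wrong)
print(right)
```

Setting `page.encoding = page.apparent_encoding` asks `requests` to use the encoding detected from the response body rather than the header default, which avoids this kind of garbling when detection succeeds.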

In [None]:
political_affairs_webpage = ''
political_affairs_url = ''
for a in home_webpage.find_all('a', href = True):

    if a.get_text() == LATEST_POLITICAL_AFFAIRS:
        political_affairs_url = a['href']

        page = requests.get(political_affairs_url)
        page.encoding = page.apparent_encoding
        political_affairs_webpage = BeautifulSoup(page.text, 'html')
        break

In [None]:
kommunestyret_url = ''

for a in political_affairs_webpage.find_all('a', href = True):
    if KOMMUNESTYRET_INDICATOR in a.get_text():

        url_prefix = political_affairs_url.lower().removesuffix(REPETATIVE_PATTERN)
        kommunestyret_case_url = url_prefix + a['href']

        page = requests.get(kommunestyret_case_url)
        page.encoding = page.apparent_encoding
        kommunestyret_case_webpage = BeautifulSoup(page.text, 'html')

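        # Use a separate name so the outer loop variable `a` is not shadowed
        for breadcrumb_link in kommunestyret_case_webpage.find_all('a', href = True):

            if KOMMUNESTYRET_INDICATOR == breadcrumb_link.get_text():
                kommunestyret_url = url_prefix + breadcrumb_link['href']
                break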
        break
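The cell above builds absolute URLs by stripping `REPETATIVE_PATTERN` from the base URL and appending the link's `href`. A possible alternative, sketched here, is `urllib.parse.urljoin` from the standard library. The sketch assumes that the `href` values are root-relative paths beginning with `/Meetings/sunndal/...` and that the base URL ends in `/Meetings/sunndal`; both are inferred from the URLs printed elsewhere in this notebook, not verified against the live site.

```python
from urllib.parse import urljoin

# Assumed base URL and href, inferred from the notebook's printed output.
base_url = 'https://opengov.360online.com/Meetings/sunndal'
href = '/Meetings/sunndal/Boards/Details/209078'

# A root-relative href replaces the whole path of the base URL,
# so no manual prefix stripping or lower-casing is needed.
full_url = urljoin(base_url, href)
print(full_url)
```

Unlike the `.lower().removesuffix(...)` approach, `urljoin` leaves the letter case of the base URL untouched and also handles `href` values that are already absolute.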

In [None]:
if not kommunestyret_url:
    kommunestyret_url = KOMMUNESTYRET_2025_URL

kommunestyret_url_per_year = {}
kommunestyret_url_per_year[CURRENT_YEAR] = kommunestyret_url

for year in range(2023, CURRENT_YEAR):
    kommunestyret_url_per_year[year] = kommunestyret_url + '?Year=' + str(year) + '&Month=-1&focus=true'

In [None]:
kommunestyret_url_per_year

{2025: 'https://opengov.360online.com/Meetings/sunndal/Boards/Details/209078',
 2023: 'https://opengov.360online.com/Meetings/sunndal/Boards/Details/209078?Year=2023&Month=-1&focus=true',
 2024: 'https://opengov.360online.com/Meetings/sunndal/Boards/Details/209078?Year=2024&Month=-1&focus=true'}
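The query string for earlier years is assembled by string concatenation. A hedged alternative is `urllib.parse.urlencode`, which builds the same query from a dictionary. The helper name `build_year_url` is hypothetical; the parameter names `Year`, `Month` and `focus` are taken from the URLs shown above.

```python
from urllib.parse import urlencode

def build_year_url(board_url, year):
    """Append the year filter to a board URL (hypothetical helper)."""
    query = urlencode({'Year': year, 'Month': -1, 'focus': 'true'})
    return board_url + '?' + query

board_url = 'https://opengov.360online.com/Meetings/sunndal/Boards/Details/209078'
print(build_year_url(board_url, 2023))
```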

In [None]:
kommunestyret_urls = []

for year, url in kommunestyret_url_per_year.items():
    page = requests.get(url)
    page.encoding = page.apparent_encoding
    kommunestyret_webpage = BeautifulSoup(page.text, 'html')

    for a in kommunestyret_webpage.find_all('a', href = True):
        if KOMMUNESTYRET_INDICATOR in a.get_text() and str(year) in a.get_text():

            url_prefix = political_affairs_url.lower().removesuffix(REPETATIVE_PATTERN)
            kommunestyret_case_url = url_prefix + a['href']
            kommunestyret_urls.append(kommunestyret_case_url)

In [None]:
pdf_files_urls = {}
for url in kommunestyret_urls:

    page = requests.get(url)
    page.encoding = page.apparent_encoding
    webpage = BeautifulSoup(page.text, 'html')

    komplett_innkalling_pdf_url = ''
    protokoll_kommunestyret_pdf_url = ''

    for item in webpage.find_all('li'):

        if POLITICAL_CASES_DESCRIPTION_FILE_INDICATOR.lower() in item.get_text().lower():

            for a in item.find_all('a', href = True):

                url_prefix = political_affairs_url.lower().removesuffix(REPETATIVE_PATTERN)
                komplett_innkalling_pdf_url = url_prefix + a['href']

        if INFORMATION_ABOUT_DECISIONS_FILE_INDICATOR.lower() in item.get_text().lower():

            for a in item.find_all('a', href = True):

                url_prefix = political_affairs_url.lower().removesuffix(REPETATIVE_PATTERN)
                protokoll_kommunestyret_pdf_url = url_prefix + a['href']


    pdf_files_urls[url] = [komplett_innkalling_pdf_url, protokoll_kommunestyret_pdf_url]

In [None]:
pdf_files_urls

{'https://opengov.360online.com/Meetings/sunndal/Meetings/Details/346733': ['https://opengov.360online.com/Meetings/sunndal/File/Details/464243.PDF?fileName=Komplett%20innkalling%3A%20Kommunestyret%20(05.02.2025)&fileSize=66805438',
  'https://opengov.360online.com/Meetings/sunndal/File/Details/465056.PDF?fileName=Protokoll%20Kommunestyret%2005.02.2025&fileSize=1251824'],
 'https://opengov.360online.com/Meetings/sunndal/Meetings/Details/314014': ['https://opengov.360online.com/Meetings/sunndal/File/Details/401503.PDF?fileName=Komplett%20innkalling%3A%20Kommunestyret%20(04.10.2023)&fileSize=101840175',
  'https://opengov.360online.com/Meetings/sunndal/File/Details/402095.PDF?fileName=Protokoll%20Kommunestyret%2004.10.2023&fileSize=234976'],
 'https://opengov.360online.com/Meetings/sunndal/Meetings/Details/314310': ['https://opengov.360online.com/Meetings/sunndal/File/Details/405462.PDF?fileName=Komplett%20innkalling%3A%20Kommunestyret%20(01.11.2023)&fileSize=347049498',
  'https://openg
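Each PDF URL carries a human-readable title in its `fileName` query parameter. As a small sketch, that title can be recovered with the standard library, for example to give downloaded files more descriptive names. The helper `file_name_from_url` is hypothetical.

```python
from urllib.parse import urlparse, parse_qs

def file_name_from_url(file_url):
    """Return the decoded 'fileName' query parameter, or None if absent (hypothetical helper)."""
    query = parse_qs(urlparse(file_url).query)
    names = query.get('fileName')
    return names[0] if names else None

sample_url = ('https://opengov.360online.com/Meetings/sunndal/File/Details/464243.PDF'
              '?fileName=Komplett%20innkalling%3A%20Kommunestyret%20(05.02.2025)&fileSize=66805438')
print(file_name_from_url(sample_url))
```

Note that the decoded title contains a colon, which is not allowed in file names on some systems, so it would need replacing before being used as a path.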

In [None]:
headers = {
    'User-Agent': 'Mozilla/5.0 (X11; Windows; Windows x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.5060.114 Safari/537.36'
    }

for url, files_urls in pdf_files_urls.items():
    file_name = url.split('/')[-1]
    print(file_name)

    merger = PdfMerger()

    for file_url in files_urls:

        print(file_url)
        response = requests.get(url = file_url, headers = headers, timeout = 120)
        on_fly_obj = io.BytesIO(response.content)

        try:
            pdf_file = PdfReader(on_fly_obj)
            merger.append(pdf_file)
        except Exception:
            # Catch parsing errors only; a bare except would also swallow KeyboardInterrupt
            print('An exception occurred when reading file from ' + file_url)

    try:
        merger.write('result_' + file_name + '.pdf')
    except Exception:
        # Catch write errors only; a bare except would also swallow KeyboardInterrupt
        print('An exception occurred when writing file ' + file_name)

    merger.close()

346733
https://opengov.360online.com/Meetings/sunndal/File/Details/464243.PDF?fileName=Komplett%20innkalling%3A%20Kommunestyret%20(05.02.2025)&fileSize=66805438
https://opengov.360online.com/Meetings/sunndal/File/Details/465056.PDF?fileName=Protokoll%20Kommunestyret%2005.02.2025&fileSize=1251824
314014
https://opengov.360online.com/Meetings/sunndal/File/Details/401503.PDF?fileName=Komplett%20innkalling%3A%20Kommunestyret%20(04.10.2023)&fileSize=101840175
https://opengov.360online.com/Meetings/sunndal/File/Details/402095.PDF?fileName=Protokoll%20Kommunestyret%2004.10.2023&fileSize=234976
314310
https://opengov.360online.com/Meetings/sunndal/File/Details/405462.PDF?fileName=Komplett%20innkalling%3A%20Kommunestyret%20(01.11.2023)&fileSize=347049498
An exception occurred when reading file from https://opengov.360online.com/Meetings/sunndal/File/Details/405462.PDF?fileName=Komplett%20innkalling%3A%20Kommunestyret%20(01.11.2023)&fileSize=347049498
https://opengov.360online.com/Meetings/sunnd



An exception occurred when writing file 321636
321637
https://opengov.360online.com/Meetings/sunndal/File/Details/438265.PDF?fileName=Komplett%20innkalling%3A%20Kommunestyret%20(12.06.2024)&fileSize=101048396
https://opengov.360online.com/Meetings/sunndal/File/Details/438448.PDF?fileName=Protokoll%20Kommunestyret%2012.06.2024&fileSize=695200
321638
https://opengov.360online.com/Meetings/sunndal/File/Details/449030.PDF?fileName=Komplett%20innkalling%3A%20Kommunestyret%20(11.09.2024)&fileSize=56350875
https://opengov.360online.com/Meetings/sunndal/File/Details/449032.PDF?fileName=Protokoll%20Kommunestyret%2011.09.2024&fileSize=187904
321639
https://opengov.360online.com/Meetings/sunndal/File/Details/453798.PDF?fileName=Komplett%20innkalling%3A%20Kommunestyret%20(30.10.2024)&fileSize=28004382
https://opengov.360online.com/Meetings/sunndal/File/Details/454479.PDF?fileName=Protokoll%20Kommunestyret%2030.10.2024&fileSize=1219344
321640
https://opengov.360online.com/Meetings/sunndal/File/Deta
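The log above shows that some files could not be read, but not why. One cause worth ruling out is a response that is not a PDF at all, such as an HTML error page. A minimal sketch of such a check follows; it relies only on the fact that PDF files begin with the bytes `%PDF-`. The helper `looks_like_pdf` is hypothetical, and this check alone would not explain failures caused by damaged or very large files.

```python
def looks_like_pdf(content):
    """Return True if the bytes start with the PDF header marker (hypothetical helper)."""
    return content.startswith(b'%PDF-')

# Sample byte strings standing in for downloaded response bodies.
print(looks_like_pdf(b'%PDF-1.7\n...'))
print(looks_like_pdf(b'<!DOCTYPE html><html>Error</html>'))
```

In the download loop, such a check could be applied to `response.content` before the bytes are handed to `PdfReader`, so that non-PDF responses are reported separately from parsing failures.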