# **University of Otago Key Dates to Calendar** 

Throughout this notebook, I document the process I followed to turn the events on the [University of Otago's key dates page](https://www.otago.ac.nz/news/events/keydates/) into Google Calendar events. The techniques I used include web scraping, OAuth authentication, and calls to the Google Calendar API.

## **Problem**

The problem was that I didn't want to go through Otago's key dates website and manually add every event I needed to keep track of, so I wrote a Python script to do it for me.


## **Set Up**

First of all, we need to install the packages this script uses: `requests` and `beautifulsoup4` for the web scraping, plus the Google authentication and API client packages.

`pip install requests beautifulsoup4 google-auth google-auth-oauthlib google-auth-httplib2 google-api-python-client`

Furthermore, we need to import the packages into our project.

In [1]:
# Importing modules for webscraping
import requests
from bs4 import BeautifulSoup
from datetime import datetime

# Importing modules for Google Calendar API
from googleapiclient.discovery import build
from googleapiclient.errors import HttpError

## **Web Scraping the Key Dates Page**

After installing all of the required packages, I got straight on to learning the Python web scraping library `BeautifulSoup`.

Before we can process the webpage, we need to use `requests` to fetch it. We then create an instance of `BeautifulSoup`, passing in the text of the response.

In [2]:
# Scraping the webpage and getting the data
url = "https://www.otago.ac.nz/news/events/keydates/"
response = requests.get(url, timeout=10)
response.raise_for_status()  # Stop early if the page could not be fetched
soup = BeautifulSoup(response.text, "html.parser")
data = {}

#### **Processing the web page** 

From here we can start to process the page and extract the elements we want. In our case, the month names are stored under the selector `#content > div > h2`, and the tables of events (definition lists) are stored under the selector `#content > div > dl`.
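
To make the two selectors concrete, here is a minimal sketch run against a made-up HTML fragment. The markup and the event titles below are assumptions that only imitate the structure described above (an `h2` per month followed by a `dl` of dates and events); the real page may differ.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment imitating the structure described above; not the real page
SAMPLE_HTML = """
<div id="content">
  <div>
    <h2>January</h2>
    <dl>
      <dt>Monday, 9 January</dt><dd>Summer School begins</dd>
    </dl>
    <h2>February</h2>
    <dl>
      <dt>Monday, 27 February</dt><dd>First semester begins</dd>
    </dl>
  </div>
</div>
"""

sample_soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# ">" is the child combinator: only h2/dl elements directly inside the div match
sample_months = [h2.text for h2 in sample_soup.select("#content > div > h2")]
sample_tables = sample_soup.select("#content > div > dl")

# Each <dl> pairs a <dt> (date) with a <dd> (event)
first_dates = [dt.text for dt in sample_tables[0].select("dt")]
first_events = [dd.text for dd in sample_tables[0].select("dd")]

print(sample_months)
print(first_dates, first_events)
```

Because both selections come back in document order, the month at index `i` lines up with the events list at index `i`, which is what the loop in the next cell relies on.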

In [3]:
# Stores all the months
months = soup.select("#content > div > h2")

# Stores all the Dates and Events from each month
months_tables = soup.select("#content > div > dl")

# Processing the events into a dictionary
for i in range(12):
    month_key = months[i].text          # month text, e.g. "January"
    month_events = months_tables[i]

    # Storing the month's events element under the month key in the dictionary
    data[month_key] = month_events

# Iterate through each month and create a dictionary of key = date and value = event
for month in data:
    events = data[month]
    event_dict = {}

    # Iterate over each pair of dt (date) and dd (event) elements
    for dt, dd in zip(events.select("dt"), events.select("dd")):
        date = dt.text.strip()
        event = dd.text.strip()

        # Convert the date to a datetime object from this format: Monday, 9 January
        # (the page gives no year, so 2023 is appended before parsing)
        date = datetime.strptime(f"{date} 2023", "%A, %d %B %Y")

        # Append the date and event to the dictionary
        event_dict[date] = event

    # Replace the list of events with the dictionary
    data[month] = event_dict
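
Since the dates on the page carry no year, the year has to be supplied by the script. The sketch below isolates that step in a hypothetical helper, `parse_key_date` (the name and the weekday check are my own additions, not part of the original script), and uses the weekday printed on the page as a sanity check that the chosen year is plausible.

```python
from datetime import date, datetime


def parse_key_date(text: str, year: int) -> date:
    """Parse text such as 'Monday, 9 January' into a date in the given year."""
    # Appending the year before parsing avoids the default year of 1900
    parsed = datetime.strptime(f"{text.strip()} {year}", "%A, %d %B %Y")

    # Guard against a wrong year: the weekday on the page should match the date
    weekday = text.split(",")[0].strip()
    if parsed.strftime("%A") != weekday:
        raise ValueError(f"{text!r} does not fall on a {weekday} in {year}")

    return parsed.date()


print(parse_key_date("Monday, 9 January", 2023))  # 2023-01-09
```

Calling it with the wrong year, for example `parse_key_date("Monday, 9 January", 2024)`, raises a `ValueError`, which would flag the hard-coded year once the page is updated.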

#### **Filtering the events** 

If you have a look through Otago's key dates, you'll notice that a lot of the events there do not apply to everyone. So I now needed a way to filter them, leaving only the events relevant to me.

Unfortunately, my Python skills were not comprehensive enough and I struggled to find a solution to this problem, so thanks to ChatGPT for the help.

In [4]:
# Defining a filter for events
filter_words = ['fees', 'begin', 'end', 'semester']

# Thanks to ChatGPT for this filter code
# Creating a dict of events that contain at least one of the filter words
filtered_events = {
    month: {
        date: event for date, event in events.items() if any(word in event for word in filter_words)
    } for month, events in data.items()
}
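
To see what this kind of filter keeps, here is a small sketch on made-up data in the same shape as `data`. The sample events and the helper name `keep_matching` are assumptions for illustration. Unlike the cell above, this version lowercases each title first, because the `in` test on strings is case-sensitive and a title starting with "Fees" would otherwise not match `'fees'`.

```python
from datetime import datetime

# Made-up sample in the same shape as `data`: {month: {date: event title}}
sample_data = {
    "February": {
        datetime(2023, 2, 27): "First semester begins",
        datetime(2023, 2, 28): "Graduation ceremony",
    },
    "March": {
        datetime(2023, 3, 10): "Fees due for first semester",
    },
}

keywords = ["fees", "begin", "end", "semester"]


def keep_matching(events_by_month, words):
    """Keep events whose title contains at least one word, ignoring case."""
    return {
        month: {
            date: title
            for date, title in events.items()
            if any(word in title.lower() for word in words)
        }
        for month, events in events_by_month.items()
    }


sample_filtered = keep_matching(sample_data, keywords)
print(sample_filtered["February"])
```

Because this is a substring test, a short word such as `end` also matches inside longer words like "calendar", which is one way unrelated events can slip through.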

## **Processing Events using Google Calendar API**

Now that I had a dictionary of events ready to enter my calendar, all I had to do was connect to the Google Calendar API and process them.

#### **Authentication**

Authentication was the first step. Without it, the script wouldn't know whose calendar to add the events to. Thankfully, Google Calendar provides a comprehensive [quickstart guide](https://developers.google.com/calendar/api/quickstart/python) that I modified to authenticate the user and retrieve their credentials.

In [5]:
# Modified from quickstart guide: https://developers.google.com/calendar/api/quickstart/python#prerequisites

import os.path

from google.auth.transport.requests import Request
from google.oauth2.credentials import Credentials
from google_auth_oauthlib.flow import InstalledAppFlow


# @return Credentials object
def authenticate():

    # If modifying these scopes, delete the file token.json.
    SCOPES = ['https://www.googleapis.com/auth/calendar']

    creds = None

    # The file token.json stores the user's access and refresh tokens, and is
    # created automatically when the authorization flow completes for the first
    # time.

    # If there are existing credentials available, load them.
    if os.path.exists('token.json'):
        creds = Credentials.from_authorized_user_file('token.json', SCOPES)

    # If there are no (valid) credentials available, let the user log in.
    if not creds or not creds.valid:
        # Checking if creds are expired and refreshing them if they are
        if creds and creds.expired and creds.refresh_token:
            creds.refresh(Request())
        else:
            # Running a local server for authentication and authorization
            flow = InstalledAppFlow.from_client_secrets_file('credentials.json', SCOPES)
            creds = flow.run_local_server(port=0)

        # Save the credentials to file for the next run
        # (also runs after a refresh, so the refreshed token is kept)
        with open('token.json', 'w') as token:
            token.write(creds.to_json())

    return creds
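
The comment about deleting `token.json` when the scopes change can be turned into a check. The sketch below is an assumption-laden illustration: it assumes the file written by `creds.to_json()` is a JSON object with a `scopes` list, and the helper name `token_covers_scopes` is hypothetical.

```python
import json
from pathlib import Path


def token_covers_scopes(token_path, scopes):
    """Return True if the saved token file lists every requested scope."""
    path = Path(token_path)
    if not path.exists():
        return False
    try:
        # Assumption: the token file is a JSON object with a "scopes" list
        stored = json.loads(path.read_text()).get("scopes") or []
    except (json.JSONDecodeError, AttributeError):
        return False
    return set(scopes) <= set(stored)


# Possible usage before calling authenticate() (left commented out on purpose):
# if not token_covers_scopes('token.json', ['https://www.googleapis.com/auth/calendar']):
#     Path('token.json').unlink(missing_ok=True)
```

With a check like this, widening the scopes later would trigger a fresh login instead of reusing a token that lacks the new permission.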

#### **Processing Events and Executing them on the API**

Now that I have defined a method for authenticating the user, I can store the credentials and move on to adding the events to the calendar.

From here I can iterate over the months (line 12), and then over the events inside each month (line 16), building an event body that is inserted into the calendar by calling `execute()` on the API's insert request.

In [None]:
# Authenticating the user via local webserver and building the service
creds = authenticate()

# Try to create the service followed by the resulting events
try:
    # Calendar to insert event into: 'UoO Key Dates'
    calendar_id = 'fe37442ef0332cbc52ec1e0e61f1b966e5b7e3c5d4c1ab0ce860789253b2bc38@group.calendar.google.com'

    service = build('calendar', 'v3', credentials=creds)

    # Iterating over the data dict storing the months and events
    for month, events in filtered_events.items():

        print("\n", month)

        for date, event_title in events.items():
            event = {
                'summary': f'{event_title}',
                'location': 'University of Otago, 362 Leith Street, Dunedin North, Dunedin 9016, New Zealand',
                # 'description': '',
                'start': {
                    'date': f'{date.date()}',
                },
                'end': {
                    'date': f'{date.date()}',
                },
                'transparency': 'transparent',
                'visibility': 'public'
            }

            # Executing the event creation
            event = service.events().insert(calendarId=calendar_id, body=event).execute()
            print(f"Event created: {event.get('htmlLink')}")

            # TESTING: Printing the filtered events
            # print(event)

except HttpError as error:
    print(f'[ERROR] : {error}')
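
The event body can also be built by a small helper, which makes the all-day date handling easier to check without calling the API. This is a sketch under one assumption: that the Calendar API treats the `end` date of an all-day event as exclusive, so a one-day event ends on the following day. The helper name `build_all_day_event` and the sample title are hypothetical.

```python
from datetime import date, timedelta


def build_all_day_event(title: str, day: date) -> dict:
    """Build an all-day event body; the end date is the following day."""
    return {
        "summary": title,
        "start": {"date": day.isoformat()},
        # Assumption: 'end' is exclusive, so a one-day event ends the next day
        "end": {"date": (day + timedelta(days=1)).isoformat()},
        "transparency": "transparent",
    }


body = build_all_day_event("First semester begins", date(2023, 2, 27))
print(body["start"], body["end"])
```

The returned dictionary could then be passed as `body=` to `service.events().insert(...)` in place of the inline dictionary above.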


## **Conclusion**

In conclusion, this mini project took far longer to create than it would have taken to manually enter all the relevant events for each of the 3 years I am studying at Otago. Nonetheless, it was a good little project for practising Python and learning a bit more about web scraping, OAuth authentication and calling APIs.

#### **Further Improvement**

Here are a couple of improvements I would like to make to this project in the future:

1. Improve the filter, as some irrelevant events are still being added to the calendar
   - This could be done by removing events that contain certain words from the already filtered events
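
One way the exclusion idea could look is a second pass over the already filtered events. This is only a sketch: the helper name `drop_excluded`, the sample titles and the exclusion word are all made up for illustration.

```python
def drop_excluded(events_by_month, exclude_words):
    """Remove events whose title contains any excluded word, ignoring case."""
    return {
        month: {
            date: title
            for date, title in events.items()
            if not any(word in title.lower() for word in exclude_words)
        }
        for month, events in events_by_month.items()
    }


# Made-up titles and exclusion words, purely for illustration
already_filtered = {
    "July": {
        "2023-07-10": "Second semester begins",
        "2023-07-14": "Postgraduate fees due",
    }
}
exclude_words = ["postgraduate"]

print(drop_excluded(already_filtered, exclude_words))
```

In the notebook, the same function could be applied to `filtered_events` before the events are sent to the calendar.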

###### **James Robiony-Rogers - February 2023**