# Assignment 2

## Due Date & Time: 29-Apr-2025 (11:59pm HKT)

**GOAL**

Obtain information on undergraduate (UG) course exams for the ISOM department from the provided link, store it in a dictionary named `exam_by_date`, and display the exams by date.

This question is further divided into three parts:
1) Creating a class for exam
2) Web scraping
3) Data processing

**ASSUMPTIONS:**
- You can assume that for any given course, all lecture sections share the same exam date, time, duration, and venue(s).


### Task 1: Class representation

 You are required to represent each course's exam as an instance of the class `Exam`.<br>Complete the following dunder methods for the class:
 - `__init__()`: constructor
 - `__repr__()`: official string representation
 - `__str__()`: invoked when an object is printed using print().

<br>

As an example, given the following information:
<div class='course'>
	<div class='courseanchor' style='position:relative; float:left; visibility:hidden;'><a></a></div>
	<div class='subject'><b>ISOM 0000 - Some Demo Course</b></div>
	<table class='sections'>
		<tbody>
			<tr>
				<th>&nbsp;</th>
				<th>Section</th>
				<th>Date</th>
				<th>Time</th>
				<th>Venue</th>
				<th>Remarks</th>
			</tr>
			<tr class='newsect secteven'>
				<td align='center'></td>
				<td align='center'>L1</td>
				<td class='date'>12-May-2025</td>
				<td class='time'>08:30AM - 10:30AM</td>
				<td class='venue'>Venue A<br>Venue B</td>
				<td class='remarks' align='center' colspan=''> </td>
			</tr>
			<tr class='newsect sectodd'>
				<td align='center'></td>
				<td align='center'>L2</td>
				<td class='date'>12-May-2025</td>
				<td class='time'>08:30AM - 10:30AM</td>
				<td class='venue'>Venue A<br>Venue B</td>
				<td class='remarks' align='center' colspan=''> </td>
			</tr>
		</tbody>
	</table>
</div>
<em>*Please refer to the actual website for the HTML code and exam information. The above representation may not be 100% accurate.</em>

The properties of this Exam object are:

| Property | Type | Example Value |
|----------|------|---------------|
|`course_code`|string|`'ISOM 0000'`|
|`course_name`|string|`'Some Demo Course'`|
|`date`|string|`'12-May-2025'`|
|`start`|string|`'08:30AM'`|
|`end`|string|`'10:30AM'`|
|`venues`|**list** of strings|`['Venue A', 'Venue B']`|

The constructor call should be:
```Python
some_exam = Exam('ISOM 0000', 'Some Demo Course', '12-May-2025', '08:30AM', '10:30AM', ['Venue A', 'Venue B'])
```

The official string representation (`__repr__()`) should return the course code of the exam, i.e. **ISOM 0000**

The object when printed (`__str__()`) should return exam information as `[course_code]: [date] [start]-[end] at [venues]`, e.g.:
```Python
'ISOM 0000: 12-May-2025 08:30AM-10:30AM at Venue A / Venue B'
```
Note: if there are multiple venues, separate them with a forward slash surrounded by spaces (` / `), as in the example. A single venue is shown as is.





In [11]:
class Exam:

    # Complete the constructor for the Exam class below.
    def __init__(self, course_code, course_name, date, start, end, venues):
        # *Please use the same property names as shown in the example table above.*
        self.course_code = course_code
        self.course_name = course_name
        self.date = date
        self.start = start
        self.end = end
        self.venues = venues

    def __repr__(self):
        # repr: defines "official" representation of the object when you type its name, e.g., 'ISOM 0000'
        # Please complete the return value for this method.
        return f"{self.course_code}"


    def __str__(self):
        # str: defines what you see when you print it out, e.g., 'ISOM 0000: 12-May-2025 08:30AM-10:30AM at Venue A / Venue B'
        # Please complete the return value for this method
        return f"{self.course_code}: {self.date} {self.start}-{self.end} at {' / '.join(self.venues)}"


Test your code by instantiating the object and showing both representations.

In [12]:
some_exam = Exam('ISOM 0000', 'Some Demo Course', '12-May-2025', '08:30AM', '10:30AM', ['Venue A', 'Venue B'])
some_exam

ISOM 0000

In [13]:
print(some_exam)

ISOM 0000: 12-May-2025 08:30AM-10:30AM at Venue A / Venue B
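
The difference between the two methods matters later in the assignment: when a list or dictionary is displayed, Python formats each element with `__repr__()`, not `__str__()`, which is why a list of exams shows only course codes. A minimal sketch, using a hypothetical `Demo` class rather than the `Exam` class itself:

```python
class Demo:
    """Hypothetical stand-in class, used only to contrast repr and str."""

    def __init__(self, code):
        self.code = code

    def __repr__(self):
        return self.code

    def __str__(self):
        return f"{self.code}: full details"


item = Demo('ISOM 0000')
as_repr = repr(item)          # what a notebook shows when a cell ends with the bare name
as_str = str(item)            # what print(item) shows
in_list = str([item, item])   # containers format their elements with repr()

print(as_repr)
print(as_str)
print(in_list)
```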


### Task 2: Web Scraping

Your script should access the required page at [https://w5.ab.ust.hk/wex/cgi-bin/2430/subject/ISOM](https://w5.ab.ust.hk/wex/cgi-bin/2430/subject/ISOM).

You are required to scrape the page, create `Exam` objects, and store these objects in a list `exam_list`.

For each course on the examination schedule website:
- Your script should determine whether it:
    - is an undergraduate level course (numerical course code < 5000)
    - has an exam scheduled on the website (has a valid date and time on the website)
- If the course satisfies **both** requirements above, create an `Exam` object that stores all relevant data. Else, skip that course.
    - append the created object to the list `exam_list`.
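
The two checks can be sketched as small helper functions. The names `is_undergraduate` and `has_valid_schedule` are hypothetical, and the sketch assumes that course codes look like `'ISOM 3380'` and that dates and times look like `'12-May-2025'` and `'08:30AM - 10:30AM'`, as in the Task 1 example. The actual page may mark unscheduled exams differently, so treat this as one possible approach.

```python
from datetime import datetime


def is_undergraduate(course_code):
    """Return True if the numeric part of a code like 'ISOM 3380' is below 5000."""
    number = course_code.split()[1]   # 'ISOM 3380' -> '3380'
    return int(number[:4]) < 5000     # assumption: the first four characters are digits


def has_valid_schedule(date_text, time_text):
    """Return True if both strings parse in the assumed date and time formats."""
    try:
        datetime.strptime(date_text.strip(), '%d-%b-%Y')
        start, end = time_text.split(' - ')
        datetime.strptime(start.strip(), '%I:%M%p')
        datetime.strptime(end.strip(), '%I:%M%p')
    except ValueError:
        # Raised by strptime on a bad format, or by unpacking when ' - ' is missing.
        return False
    return True


print(is_undergraduate('ISOM 3380'))
print(has_valid_schedule('12-May-2025', '08:30AM - 10:30AM'))
```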

In [14]:
# Starter code
from bs4 import BeautifulSoup
import requests

#### Important Note
> You may not be able to access the website using `requests.get()` due to an authorization issue.<br>If that is the case, please download <a>ISOM_2430_exam.html</a>, place it in the same directory (folder) as this notebook, and run the following code cell.

In [15]:
# Do not run this code cell if you are using requests.get().
# open() and requests.get() obtain the HTML in different ways:
# open() is the built-in function for reading and writing local files,
# so you need to download the webpage manually and save it to your file system beforehand.
# In contrast, requests.get() retrieves the HTML over the Internet.
with open('ISOM_2430_exam.html', 'r') as html_code:  # the file is closed automatically when the block ends
    soup = BeautifulSoup(html_code, 'html.parser') # <- use soup for your scraping operations, or you can change the variable name if you like.

Hints:
1. Consider using the `.split()` method to separate the course code from the course name. Specify `maxsplit=1` to handle names that contain the separator themselves, e.g., 'ISOM 3380 - Advanced Network Management (CISCO - ICND)'.
2. To extract all the venues (when there is more than one), pass a separator (e.g., `|`) to the `get_text()` method so that the pieces of text inside the HTML element are joined with it. Then split the extracted text into a list using the same symbol.
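
Both hints can be tried on small inputs before touching the real page. The sketch below uses the course title quoted in hint 1 and a venue cell copied from the Task 1 example; it is an illustration of the two calls, not the required solution.

```python
from bs4 import BeautifulSoup

# Hint 1: split only at the first ' - ' so a dash inside the course name survives.
title = 'ISOM 3380 - Advanced Network Management (CISCO - ICND)'
course_code, course_name = title.split(' - ', maxsplit=1)

# Hint 2: <br> separates the venues, so join the text pieces with '|' and split again.
cell = BeautifulSoup("<td class='venue'>Venue A<br>Venue B</td>", 'html.parser')
venues = cell.find('td', {'class': 'venue'}).get_text(separator='|').split('|')

print(course_code)
print(course_name)
print(venues)
```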

In [16]:
exam_list = []
# Write your code below
course_code_list = []
course_name_list = []
date_list = []
start_list = []
end_list = []
venues_list = []

for course_info in soup.find_all('div', {'class': 'course'}):
    course_title = course_info.find('div', {'class': 'subject'}).text.split(' - ', maxsplit=1)
    course_code_list.append(course_title[0])
    course_name_list.append(course_title[1])
    
    if course_info.find('td', {'class': 'remarks'}).text == ' ': # If there are no remarks, an exam is scheduled.
        
        time = course_info.find('td', {'class': 'time'}).text.split(' - ')
        
        date_list.append(course_info.find('td', {'class': 'date'}).text)
        start_list.append(time[0])
        end_list.append(time[1])
        venues_list.append(course_info.find('td', {'class': 'venue'}).get_text(separator='|').split('|'))
    else:
        date_list.append(None)
        start_list.append(None)
        end_list.append(None)
        venues_list.append(None)

for i in range(len(course_code_list)):
    if date_list[i] and int(course_code_list[i][5]) < 5: # keep only courses coded below 5000 that have an exam scheduled
        exam_list.append(Exam(course_code_list[i], course_name_list[i], date_list[i], start_list[i], end_list[i], venues_list[i]))
# This could also be written as a list comprehension, but the task asks to append to the pre-declared empty list.

exam_list

[ISOM 1380,
 ISOM 2010,
 ISOM 2500,
 ISOM 2600,
 ISOM 2700,
 ISOM 3180,
 ISOM 3210,
 ISOM 3230,
 ISOM 3260,
 ISOM 3310,
 ISOM 3320,
 ISOM 3360,
 ISOM 3370,
 ISOM 3380,
 ISOM 3400,
 ISOM 3530,
 ISOM 3710,
 ISOM 3770,
 ISOM 3780,
 ISOM 3900,
 ISOM 4040,
 ISOM 4300,
 ISOM 4540,
 ISOM 4750,
 ISOM 4780,
 ISOM 4830,
 ISOM 4840]

### Task 3: Data Processing

Using the `Exam` objects previously created and stored in `exam_list`, create a dictionary `exam_by_date`, where each key is an exam date, and each value is a list of exams held on that day. The keys should be sorted from the earliest to the latest. Then, print out the list of exams in the format provided below.

Example: Given the following three objects:

| Property | Value |
|----------|---------------|
|`course_code`|`'ISOM 0000'`|
|`course_name`|`'Some Demo Course'`|
|`date`|`'12-May-2025'`|
|`start`|`'08:30AM'`|
|`end`|`'10:30AM'`|
|`venues`|`['Venue A', 'Venue B']`|


| Property | Value |
|----------|---------------|
|`course_code`|`'ISOM 1111'`|
|`course_name`|`'Other Demo Course'`|
|`date`|`'15-May-2025'`|
|`start`|`'08:30AM'`|
|`end`|`'10:30AM'`|
|`venues`|`['Venue C']`|


| Property | Value |
|----------|---------------|
|`course_code`|`'ISOM 2222'`|
|`course_name`|`'Third Demo Course'`|
|`date`|`'12-May-2025'`|
|`start`|`'12:30PM'`|
|`end`|`'02:30PM'`|
|`venues`|`['Venue A']`|

The dictionary `exam_by_date` should be:

<p style="font-family: Consolas, 'Courier New', monospace">{<br>&nbsp;&nbsp;&nbsp;&nbsp;'12-May-2025': [ISOM 0000, ISOM 2222],<br>&nbsp;&nbsp;&nbsp;&nbsp;'15-May-2025': [ISOM 1111]<br>}</p>
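
One way to build such a grouping is `dict.setdefault()`, which creates the list for a date the first time that date is seen. The sketch below uses plain `(course_code, date)` tuples as hypothetical stand-ins for `Exam` objects. It only groups the entries and does not sort the keys.

```python
# Hypothetical stand-ins for Exam objects: (course_code, date) pairs.
demo_exams = [
    ('ISOM 0000', '12-May-2025'),
    ('ISOM 1111', '15-May-2025'),
    ('ISOM 2222', '12-May-2025'),
]

grouped = {}
for course_code, date in demo_exams:
    # setdefault returns the existing list for this date, or inserts a new empty one.
    grouped.setdefault(date, []).append(course_code)

print(grouped)
```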

In [17]:
exam_by_date = {}
# Write your code below
exam_unordered = {}
for exam in exam_list:
    if exam.date not in exam_unordered:
        exam_unordered[exam.date] = [exam]
    else:
        exam_unordered[exam.date].append(exam)

exam_by_date = {exam_date: exam for exam_date, exam in sorted(exam_unordered.items(), key=lambda x: x[0])} # All exams fall in May 2025, so sorting the date strings also sorts them chronologically

exam_by_date

{'17-May-2025': [ISOM 3360, ISOM 3370, ISOM 3780],
 '19-May-2025': [ISOM 3260, ISOM 3900],
 '20-May-2025': [ISOM 1380, ISOM 3530, ISOM 3770],
 '21-May-2025': [ISOM 3710],
 '22-May-2025': [ISOM 2500, ISOM 3320, ISOM 4780],
 '23-May-2025': [ISOM 3310, ISOM 3380, ISOM 4840],
 '24-May-2025': [ISOM 2010, ISOM 3400, ISOM 4750],
 '26-May-2025': [ISOM 2700, ISOM 3230, ISOM 4300, ISOM 4540, ISOM 4830],
 '27-May-2025': [ISOM 3180, ISOM 4040],
 '28-May-2025': [ISOM 2600],
 '29-May-2025': [ISOM 3210]}
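
Sorting the keys as plain strings works here because every exam falls in the same month and year. If a schedule spanned several months, the keys would need to be compared as dates. A hedged sketch with made-up dates, assuming the `DD-Mon-YYYY` format and English month abbreviations (the `%b` directive is locale-dependent):

```python
from datetime import datetime

# Made-up keys spanning two months, for illustration only.
dates = ['02-Jun-2025', '29-May-2025', '17-May-2025']

by_string = sorted(dates)
by_date = sorted(dates, key=lambda d: datetime.strptime(d, '%d-%b-%Y'))

print(by_string)  # lexicographic order puts 02-Jun first
print(by_date)    # chronological order puts 02-Jun last
```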