# Assignment: Extracting a Static Web Page

Extract information about “วันพระ” (Buddhist holy days) for three years from the following pages:
- https://www.myhora.com/ปฏิทิน/วันพระ-พ.ศ.2565.aspx
- https://www.myhora.com/ปฏิทิน/วันพระ-พ.ศ.2566.aspx
- https://www.myhora.com/ปฏิทิน/วันพระ-พ.ศ.2567.aspx


Note that you can use the dateparser package to parse Thai dates. First, install the package. The cell below does this for Google Colab users; otherwise, installing from the command line (pip or conda) is recommended.

In [64]:
import sys
IN_COLAB = 'google.colab' in sys.modules
if IN_COLAB:
    %pip install dateparser selenium chromedriver_autoinstall

In [65]:
import dateparser

To convert a Thai date string, use the parse function. Note that parse treats the year as a Gregorian (CE) year, not a Buddhist Era (BE) year, so we have to subtract 543 from the parsed year. In addition, weekday() returns the day of the week as 0=Monday, ..., 6=Sunday.

In [66]:
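dt = dateparser.parse('วันศุกร์ที่ 17 มกราคม 2563')

# The BE year 2563 is read as the Gregorian year 2563, so this prints
# weekday == 0 (Monday), which is not the real day of the week.
print(dt)
print(dt.weekday())

# After converting BE to CE (subtract 543), this prints weekday == 4 (Friday).
dt = dt.replace(year=dt.year-543)
print(dt)
print(dt.weekday())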

2563-01-17 00:00:00
0
2020-01-17 00:00:00
4
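
The `replace(year=dt.year-543)` step works only if the date is also valid in the unconverted year. A date such as 29 กุมภาพันธ์ 2567 is valid only after conversion, because 2024 is a leap year while 2567 is not. As a minimal sketch of the same BE-to-CE conversion that subtracts 543 before the date object is built, the following uses only the standard library instead of dateparser. The function name `parse_thai_be_date` and the expected "day month year" layout are assumptions made for illustration.

```python
import re
from datetime import date

# Thai month names mapped to month numbers.
THAI_MONTHS = {
    "มกราคม": 1, "กุมภาพันธ์": 2, "มีนาคม": 3, "เมษายน": 4,
    "พฤษภาคม": 5, "มิถุนายน": 6, "กรกฎาคม": 7, "สิงหาคม": 8,
    "กันยายน": 9, "ตุลาคม": 10, "พฤศจิกายน": 11, "ธันวาคม": 12,
}
BE_OFFSET = 543  # Buddhist Era year minus 543 gives the Gregorian year


def parse_thai_be_date(text):
    """Return a Gregorian date for text such as 'วันศุกร์ที่ 17 มกราคม 2563'.

    Hypothetical helper: assumes the text contains '<day> <Thai month> <BE year>'.
    """
    match = re.search(r"(\d{1,2})\s+(\S+)\s+(\d{4})", text)
    if match is None:
        raise ValueError(f"no Thai date found in {text!r}")
    day, month_name, be_year = match.groups()
    if month_name not in THAI_MONTHS:
        raise ValueError(f"unknown month name {month_name!r}")
    # Convert the year first, then build the date, so leap days stay valid.
    return date(int(be_year) - BE_OFFSET, THAI_MONTHS[month_name], int(day))


d = parse_thai_be_date('วันศุกร์ที่ 17 มกราคม 2563')
print(d, d.weekday())  # 2020-01-17 4
```

The weekday name at the start of the string is ignored here; only the day, month, and year are used.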


In [67]:
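# The text says Saturday (วันเสาร์), but 21 September 2021 is a Tuesday.
# weekday() follows the parsed date, not the weekday name in the text.
dt = dateparser.parse('วันเสาร์ที่ 21 กันยายน 2564')
dt = dt.replace(year=dt.year-543)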
print(dt)
print(dt.weekday())

2021-09-21 00:00:00
1
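
The cell above passes a string that says วันเสาร์ (Saturday), yet `weekday()` returns 1 (Tuesday), so a mismatch between the stated and the computed weekday goes unnoticed. The sketch below shows one way to flag such mismatches when scraping. The helper names `stated_weekday` and `weekday_matches` are hypothetical, and the check assumes the weekday is spelled in full in the text.

```python
from datetime import date

# Thai weekday names mapped to datetime.weekday() indices (0=Monday).
THAI_WEEKDAYS = {
    "วันจันทร์": 0, "วันอังคาร": 1, "วันพุธ": 2, "วันพฤหัสบดี": 3,
    "วันศุกร์": 4, "วันเสาร์": 5, "วันอาทิตย์": 6,
}


def stated_weekday(text):
    """Return the weekday index named in the text, or None if none is found."""
    for name, index in THAI_WEEKDAYS.items():
        if name in text:
            return index
    return None


def weekday_matches(text, parsed):
    """True when the text names no weekday or names the parsed date's weekday."""
    stated = stated_weekday(text)
    return stated is None or stated == parsed.weekday()


# The second example from the notebook: the text says Saturday, the date is a Tuesday.
print(weekday_matches('วันเสาร์ที่ 21 กันยายน 2564', date(2021, 9, 21)))
```

A check like this is cheap to run on every scraped entry and can catch a wrong year offset as well as typos in the source text.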


Count how the “วันพระ” are distributed across the days of the week for all three years, then answer the following questions:

## How many วันพระ are there in total over the 3 years?

In [68]:
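from selenium import webdriver
from selenium.webdriver.common.by import By

# One page per Buddhist Era (BE) year.
urls = ["https://www.myhora.com/ปฏิทิน/วันพระ-พ.ศ.2565.aspx",
"https://www.myhora.com/ปฏิทิน/วันพระ-พ.ศ.2566.aspx",
"https://www.myhora.com/ปฏิทิน/วันพระ-พ.ศ.2567.aspx"]

driver = webdriver.Chrome()

year = 2565  # BE year of the first URL; incremented once per page
total = 0    # named `total` so the built-in sum() is not shadowed
for url in urls:
    driver.get(url)

    # Collect every element with the CSS class "bud-day-col".
    data = driver.find_elements(By.CLASS_NAME, "bud-day-col")

    # Keep only the entries whose text mentions this page's BE year.
    bud_days = [bud_day_col for bud_day_col in data if str(year) in bud_day_col.text]
    year += 1
    total += len(bud_days)
print("Total buddha days: ", total,  "days")
driver.quit()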

Total buddha days:  152 days
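
Since the assignment targets a static page, the same class-based extraction could in principle run on the raw HTML without a browser. The sketch below uses the standard library's `html.parser` to collect the text of every element with the class `bud-day-col`. The sample markup is made up for illustration, and whether the site serves these entries in its initial HTML is an assumption that would need checking.

```python
from html.parser import HTMLParser

# Tags that never have a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}


class ClassTextExtractor(HTMLParser):
    """Collect the text of every element that carries a given CSS class."""

    def __init__(self, class_name):
        super().__init__()
        self.class_name = class_name
        self.depth = 0    # > 0 while inside a matching element
        self.parts = []   # text fragments of the current match
        self.texts = []   # finished texts, one per matching element

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.depth > 0:
            self.depth += 1
        elif self.class_name in (dict(attrs).get("class") or "").split():
            self.depth = 1
            self.parts = []

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or self.depth == 0:
            return
        self.depth -= 1
        if self.depth == 0:
            # Join fragments and collapse runs of whitespace.
            self.texts.append(" ".join(" ".join(self.parts).split()))

    def handle_data(self, data):
        if self.depth > 0:
            self.parts.append(data)


def extract_texts(html, class_name):
    """Return the text of each element with the given class, in document order."""
    parser = ClassTextExtractor(class_name)
    parser.feed(html)
    return parser.texts


# Made-up markup that only imitates the class name used in the notebook.
SAMPLE_HTML = """
<div class="bud-day"><div class="bud-day-col">วันจันทร์ที่ 3 มกราคม 2565</div>
<div class="other">x</div></div>
<div class="bud-day-col"><b>วันจันทร์ที่ 10</b> มกราคม 2565</div>
"""

print(extract_texts(SAMPLE_HTML, "bud-day-col"))
```

If the entries turn out to be injected by JavaScript, this approach returns nothing and the Selenium cells remain the right tool.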


## On how many days in total (over the 3 years) does วันพระ fall on a Monday?

In [69]:
from selenium import webdriver
from selenium.webdriver.common.by import By

urls = ["https://www.myhora.com/ปฏิทิน/วันพระ-พ.ศ.2565.aspx",
"https://www.myhora.com/ปฏิทิน/วันพระ-พ.ศ.2566.aspx",
"https://www.myhora.com/ปฏิทิน/วันพระ-พ.ศ.2567.aspx"]

driver = webdriver.Chrome()

year = 2565      # BE year of the first URL; incremented once per page
monday_count = 0
for url in urls:
    driver.get(url)

    data = driver.find_elements(By.CLASS_NAME, "bud-day-col")

    # Keep only the entries whose text mentions this page's BE year.
    bud_days = [bud_day_col for bud_day_col in data if str(year) in bud_day_col.text]
    for bud_day in bud_days:
        dt = dateparser.parse(bud_day.text)
        dt = dt.replace(year=dt.year-543)  # BE -> CE
        day = dt.weekday()

        if day == 0:  # 0 == Monday
            monday_count += 1
    year += 1

print("Total buddha days on Monday: ", monday_count)
driver.quit()

Total buddha days on Monday:  21


## Which day of the week has the minimum number of วันพระ?

In [70]:
from selenium import webdriver
from selenium.webdriver.common.by import By

urls = ["https://www.myhora.com/ปฏิทิน/วันพระ-พ.ศ.2565.aspx",
"https://www.myhora.com/ปฏิทิน/วันพระ-พ.ศ.2566.aspx",
"https://www.myhora.com/ปฏิทิน/วันพระ-พ.ศ.2567.aspx"]

# Number of วันพระ seen per day of the week.
day_counts = {
    'Sunday': 0,
    'Monday': 0,
    'Tuesday': 0,
    'Wednesday': 0,
    'Thursday': 0,
    'Friday': 0,
    'Saturday': 0
}

# datetime.weekday() index -> day name (0=Monday, ..., 6=Sunday).
map_day = {
    0: "Monday",
    1: "Tuesday",
    2: "Wednesday",
    3: "Thursday",
    4: "Friday",
    5: "Saturday",
    6: "Sunday"
}

driver = webdriver.Chrome()

year = 2565  # BE year of the first URL; incremented once per page

for url in urls:
    driver.get(url)

    data = driver.find_elements(By.CLASS_NAME, "bud-day-col")

    # Keep only the entries whose text mentions this page's BE year.
    bud_days = [bud_day_col for bud_day_col in data if str(year) in bud_day_col.text]

    for bud_day in bud_days:
        dt = dateparser.parse(bud_day.text)
        dt = dt.replace(year=dt.year-543)  # BE -> CE

        day_of_week = map_day[dt.weekday()]
        day_counts[day_of_week] += 1
    year += 1

# If several days tie, min() returns the first one in day_counts order.
min_day = min(day_counts, key=day_counts.get)
print(f"The day of the week that has the minimum number of buddha days is {min_day} with {day_counts[min_day]} days")
driver.quit()


The day of the week that has the minimum number of buddha days is Tuesday with 20 days


## Which day of the week has the maximum number of วันพระ?

In [71]:
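from selenium.webdriver.common.by import By

# `urls`, `webdriver` and `dateparser` come from the earlier cells.
# Number of วันพระ seen per day of the week.
day_counts = {
    'Sunday': 0,
    'Monday': 0,
    'Tuesday': 0,
    'Wednesday': 0,
    'Thursday': 0,
    'Friday': 0,
    'Saturday': 0
}

# datetime.weekday() index -> day name (0=Monday, ..., 6=Sunday).
map_day = {
    0: "Monday",
    1: "Tuesday",
    2: "Wednesday",
    3: "Thursday",
    4: "Friday",
    5: "Saturday",
    6: "Sunday"
}

driver = webdriver.Chrome()

year = 2565  # BE year of the first URL; incremented once per page

for url in urls:
    driver.get(url)

    data = driver.find_elements(By.CLASS_NAME, "bud-day-col")

    # Keep only the entries whose text mentions this page's BE year.
    bud_days = [bud_day_col for bud_day_col in data if str(year) in bud_day_col.text]

    for bud_day in bud_days:
        dt = dateparser.parse(bud_day.text)
        dt = dt.replace(year=dt.year-543)  # BE -> CE

        day_of_week = map_day[dt.weekday()]
        day_counts[day_of_week] += 1
    year += 1

# If several days tie, max() returns the first one in day_counts order.
max_day = max(day_counts, key=day_counts.get)
print(f"The day of the week that has the maximum number of buddha days is {max_day} with {day_counts[max_day]} days")
driver.quit()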


The day of the week that has the maximum number of buddha days is Sunday with 24 days
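
The four cells above load the same three pages again for every question. One possible alternative is to scrape once, keep the converted dates in a list, and answer all four questions from that list. The sketch below shows only the counting step with `collections.Counter`. The function name `summarize` and the three sample dates are illustrative assumptions, not scraped data.

```python
from collections import Counter
from datetime import date

# Day names indexed by datetime.weekday() (0=Monday, ..., 6=Sunday).
DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]


def summarize(dates):
    """Answer the four questions from a list of Gregorian dates."""
    seen = Counter(DAY_NAMES[d.weekday()] for d in dates)
    # Include days with zero occurrences so the minimum is well defined.
    counts = {name: seen.get(name, 0) for name in DAY_NAMES}
    low, high = min(counts.values()), max(counts.values())
    return {
        "total": len(dates),
        "monday": counts["Monday"],
        # Lists, so that ties are reported instead of silently dropped.
        "min_days": [n for n in DAY_NAMES if counts[n] == low],
        "max_days": [n for n in DAY_NAMES if counts[n] == high],
        "counts": counts,
    }


# Illustrative input only: two Mondays and one Tuesday in January 2022.
sample_dates = [date(2022, 1, 3), date(2022, 1, 10), date(2022, 1, 18)]
result = summarize(sample_dates)
print(result["total"], result["monday"], result["max_days"])
```

Returning lists for the minimum and maximum makes ties visible, whereas `min()` and `max()` with a key return only the first matching day.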
