# Assignment: Extracting Data from a Static Web Page

Extract information about “วันพระ” (Buddhist holy days) for three years from the following pages:
- https://www.myhora.com/ปฏิทิน/วันพระ-พ.ศ.2565.aspx
- https://www.myhora.com/ปฏิทิน/วันพระ-พ.ศ.2566.aspx
- https://www.myhora.com/ปฏิทิน/วันพระ-พ.ศ.2567.aspx


Note that you can use the dateparser package to parse Thai dates. First, we have to install the package. The cell below does this for Google Colab users; otherwise, installing from the command line (with pip or conda) is recommended.

In [48]:
import sys
IN_COLAB = 'google.colab' in sys.modules
if IN_COLAB:
    %pip install dateparser

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/


In [49]:
import dateparser

To convert a Thai date string, we use the parse function. Note that parse treats the year as a Common Era (CE) year, not a Buddhist Era (BE) year. Thus, we have to subtract 543 from the year. In addition, weekday() returns the day of the week as an integer, with 0=Monday, ..., 6=Sunday.

In [50]:
dt = dateparser.parse('วันศุกร์ที่ 17 มกราคม 2563')

# Before correction: the year is read as 2563 CE, so this prints weekday == 0 (Monday), which is wrong
print(dt)
print(dt.weekday())

# After subtracting 543 (BE -> CE): this prints weekday == 4 (Friday), which matches the input
dt = dt.replace(year=dt.year-543)
print(dt)
print(dt.weekday())

2563-01-17 00:00:00
0
2020-01-17 00:00:00
4


In [51]:
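# The weekday name in the input string is not checked against the date:
# the string says Saturday, but 21 September 2021 is a Tuesday, so weekday() is 1.
dt = dateparser.parse('วันเสาร์ที่ 21 กันยายน 2564')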
dt = dt.replace(year=dt.year-543)
print(dt)
print(dt.weekday())

2021-09-21 00:00:00
1
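
One edge case of subtracting 543 after parsing is 29 February: BE 2567 corresponds to CE 2024, which is a leap year, but 2567 is not a leap year when read as a CE year, so a string such as `29 กุมภาพันธ์ 2567` may fail to parse. The sketch below converts the year before building the date and uses only the standard library. `parse_thai_be_date` and `THAI_MONTHS` are hypothetical names introduced here for illustration; they are not part of dateparser, and the helper assumes the `D <month name> YYYY` layout seen in the examples above.

```python
import re
from datetime import date

# Thai month names mapped to month numbers.
THAI_MONTHS = {
    'มกราคม': 1, 'กุมภาพันธ์': 2, 'มีนาคม': 3, 'เมษายน': 4,
    'พฤษภาคม': 5, 'มิถุนายน': 6, 'กรกฎาคม': 7, 'สิงหาคม': 8,
    'กันยายน': 9, 'ตุลาคม': 10, 'พฤศจิกายน': 11, 'ธันวาคม': 12,
}

def parse_thai_be_date(text):
    """Hypothetical helper: parse '... D <Thai month> YYYY' where YYYY is a BE year."""
    match = re.search(r'(\d{1,2})\s+(\S+)\s+(\d{4})', text)
    if match is None:
        raise ValueError(f'unrecognised date: {text!r}')
    day, month_name, be_year = match.groups()
    # Convert BE to CE *before* building the date, so leap days stay valid.
    return date(int(be_year) - 543, THAI_MONTHS[month_name], int(day))

d = parse_thai_be_date('วันศุกร์ที่ 17 มกราคม 2563')
print(d, d.weekday())  # 2020-01-17 4
```

The weekday name at the start of the string is ignored; the weekday is always derived from the day, month, and converted year.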


Count how the “วันพระ” of all three years are distributed across the days of the week, and answer the following questions:

## How many วันพระ are there in total (over the 3 years)?

In [56]:
# Sample markup for one entry on the page:
# <div class="bud-day"><div class="bud-day-col">วันอาทิตย์ที่ 2 มกราคม 2565</div><div class="bud-day-col">แรม ๑๔ ค่ำ เดือนอ้าย(๑) ปีฉลู</div><div class="bud-day-col"></div></div>
import requests
from bs4 import BeautifulSoup

years = ['2565', '2566', '2567']
num_bud_day = 0
bud_days_html = []  # one list of <div class="bud-day"> tags per year
for year in years:
    url = f"https://www.myhora.com/ปฏิทิน/วันพระ-พ.ศ.{year}.aspx"
    response = requests.get(url)
    response.raise_for_status()  # stop early if the page could not be fetched
    response.encoding = 'utf-8'  # decode response.text as UTF-8
    soup = BeautifulSoup(response.text, 'html.parser')

    # Each วันพระ entry is one <div class="bud-day">
    days = soup.find_all('div', {'class': 'bud-day'})
    bud_days_html.append(days)
    num_bud_day += len(days)
print("Number of วันพระ in total (of 3 years):", num_bud_day)

Number of วันพระ in total (of 3 years): 152
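
To make the structure of one entry concrete, here is a dependency-free sketch that pulls the column texts out of the sample markup quoted in the comment above, using the standard library's `html.parser` instead of BeautifulSoup. `BudDayParser` is a hypothetical helper written for this illustration, and it assumes the `bud-day-col` divs contain plain text with no nested tags, as in the sample.

```python
from html.parser import HTMLParser

class BudDayParser(HTMLParser):
    """Hypothetical helper: collect the text of every <div class="bud-day-col">."""

    def __init__(self):
        super().__init__()
        self.columns = []     # text of each bud-day-col, in document order
        self._in_col = False  # True while inside a bud-day-col div

    def handle_starttag(self, tag, attrs):
        if tag == 'div' and dict(attrs).get('class') == 'bud-day-col':
            self._in_col = True
            self.columns.append('')

    def handle_endtag(self, tag):
        if tag == 'div':
            self._in_col = False

    def handle_data(self, data):
        if self._in_col:
            self.columns[-1] += data

# The sample entry from the page, copied from the comment in the cell above.
sample = (
    '<div class="bud-day">'
    '<div class="bud-day-col">วันอาทิตย์ที่ 2 มกราคม 2565</div>'
    '<div class="bud-day-col">แรม ๑๔ ค่ำ เดือนอ้าย(๑) ปีฉลู</div>'
    '<div class="bud-day-col"></div>'
    '</div>'
)

parser = BudDayParser()
parser.feed(sample)
parser.close()
print(parser.columns[0])  # the first column holds the date text
```

Each entry has three columns; only the first one (the solar-calendar date with a BE year) is needed for the weekday questions that follow.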


## On how many days in total (over the 3 years) does วันพระ fall on a Monday?

In [53]:
ans = 0
for year in bud_days_html:
    for day in year:
        # The first bud-day-col holds the date text, e.g. 'วันอาทิตย์ที่ 2 มกราคม 2565'
        date_text = day.find('div', class_='bud-day-col').text
        dt = dateparser.parse(date_text)
        dt = dt.replace(year=dt.year - 543)  # BE -> CE
        if dt.weekday() == 0:  # 0 = Monday
            ans += 1

print("Number of days in total (of 3 years) that วันพระ is Monday:", ans)

Number of days in total (of 3 years) that วันพระ is Monday: 21


## Which day of the week has the minimum number of วันพระ?

In [54]:
from collections import defaultdict

days_dict = {0: 'Monday', 1: 'Tuesday', 2: 'Wednesday', 3: 'Thursday', 4: 'Friday', 5: 'Saturday', 6: 'Sunday'}
weekdays = defaultdict(int)  # weekday index (0=Monday) -> number of วันพระ
for year in bud_days_html:
    for day in year:
        # The first bud-day-col holds the date text
        date_text = day.find('div', class_='bud-day-col').text
        dt = dateparser.parse(date_text)
        dt = dt.replace(year=dt.year - 543)  # BE -> CE
        weekdays[dt.weekday()] += 1

# Only weekdays that occurred at least once are keys of the dictionary
min_key = min(weekdays, key=lambda k: weekdays[k])
print("Day of the week that has the minimum number of วันพระ is:", days_dict[min_key])

Day of the week that has the minimum number of วันพระ is: Tuesday


## Which day of the week has the maximum number of วันพระ?

In [55]:
max_key = max(weekdays, key=lambda k: weekdays[k])
print("Day of the week that has the maximum number of วันพระ is:", days_dict[max_key])


Day of the week that has the maximum number of วันพระ is: Sunday
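
The counting step can also be sketched with `collections.Counter`, which makes it easy to print the full distribution rather than only the extremes. The dates in `sample_dates` are arbitrary illustrative values, not the scraped data, and `DAY_NAMES` is a name introduced here. Iterating over `range(7)` instead of the dictionary keys is a deliberate choice: a weekday that never occurs is then reported with a count of 0 instead of being skipped.

```python
from collections import Counter
from datetime import date

DAY_NAMES = ['Monday', 'Tuesday', 'Wednesday', 'Thursday',
             'Friday', 'Saturday', 'Sunday']

# Arbitrary illustrative dates (assumption: NOT the data scraped above).
sample_dates = [
    date(2022, 1, 2),   # Sunday
    date(2022, 1, 9),   # Sunday
    date(2022, 1, 17),  # Monday
    date(2020, 1, 17),  # Friday
]

# Count occurrences of each weekday index (0=Monday, ..., 6=Sunday).
counts = Counter(d.weekday() for d in sample_dates)

# Print all seven weekdays; Counter returns 0 for indices that never occurred.
for index, name in enumerate(DAY_NAMES):
    print(f'{name}: {counts[index]}')

# On ties, max() and min() return the first weekday index with that count.
most = max(range(7), key=lambda i: counts[i])
least = min(range(7), key=lambda i: counts[i])
print('Most frequent:', DAY_NAMES[most])
print('Least frequent:', DAY_NAMES[least])
```

With real data, ties between weekdays are possible, so it may be worth printing the full distribution before reporting a single minimum or maximum.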
