### Scraping NLB to list the books I have borrowed

In [5]:
%load_ext autoreload
%autoreload 2

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload


In [6]:
import re
import os
import time

import warnings
import pygsheets
import numpy as np
import pandas as pd
from selenium import webdriver
from bs4 import BeautifulSoup as bs

# Some notebook configs
warnings.filterwarnings('ignore')
pd.set_option('display.max_colwidth', 1000)

#### Load self-written helper functions

In [8]:
# Import only the helpers this notebook uses
from nlb_fun import activate_chrome_selenium_latest, log_in_nlb

In [9]:
browser = activate_chrome_selenium_latest(is_headless=False)



Current google-chrome version is 93.0.4577
Get LATEST driver version for 93.0.4577
There is no [mac64] chromedriver for browser 93.0.4577 in cache
Get LATEST driver version for 93.0.4577
Trying to download new driver from https://chromedriver.storage.googleapis.com/93.0.4577.63/chromedriver_mac64.zip
Driver has been saved in cache [/Users/cliff/.wdm/drivers/chromedriver/mac64/93.0.4577.63]


### Log in first! 

In [10]:
auth_csv_file = os.environ['nlb_login']

info = pd.read_csv(auth_csv_file)
account_name = info['values'][0]
password = info['values'][1]

browser = log_in_nlb(browser, account_name, password)
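
The cell above expects the credentials file to have a `values` column, with the account name in the first row and the password in the second. As a minimal sketch of the same lookup with only the standard library, the hypothetical helper `load_credentials` below reads that layout and fails loudly when a row is missing. The helper name and the error message are assumptions for illustration, not part of `nlb_fun`.

```python
import csv


def load_credentials(csv_path):
    """Return (account_name, password) from the first two rows of the 'values' column.

    Hypothetical helper: assumes a header row that contains a 'values' column,
    as implied by info['values'] in the notebook.
    """
    with open(csv_path, newline="", encoding="utf-8") as handle:
        values = [row["values"] for row in csv.DictReader(handle)]
    if len(values) < 2:
        raise ValueError("expected at least two rows: account name, then password")
    return values[0], values[1]
```

Unlike `pd.read_csv`, the `csv` module keeps every field as a string, so a purely numeric account name would not be converted to a number.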

### Load the loans page and loop through its table!

In [11]:
loans_link = "https://www.nlb.gov.sg/mylibrary/Loans"
browser.get(loans_link)

time.sleep(5)

soup = bs(browser.page_source, "html5lib")

In [12]:
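table_col = list()
table_cells = list()

# The loans table is matched by its full CSS class string
for table in soup.find_all("table", class_="table table-bordered table-striped table-list bg-white"):
    # Header cells
    for header in table.find_all('th'):
        table_col.append(header.text)

    # Data cells, flattened into one list in reading order
    for cell in table.find_all('td'):
        table_cells.append(cell.text)

# Keep only the first five header names
table_col = table_col[:5]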
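
Because every `<td>` ends up in one flat list, the later reshape into five columns only works when the number of cells is an exact multiple of five. A minimal sketch of a guard for that assumption is shown here. `rows_from_flat_cells` is a hypothetical helper, not something defined in `nlb_fun`, and the width of five comes from the five column names the notebook assigns later.

```python
def rows_from_flat_cells(cells, width=5):
    """Split a flat list of cell texts into rows of `width` cells each.

    Hypothetical helper: raises ValueError when the cells cannot be split
    evenly, which would suggest the page layout has changed.
    """
    if len(cells) % width != 0:
        raise ValueError(
            "got {} cells, which is not a multiple of {}".format(len(cells), width)
        )
    return [cells[i:i + width] for i in range(0, len(cells), width)]


# Made-up cell texts standing in for two scraped loans
sample_cells = ["1", "Title: A", "Barcode: X", "Due on 05 Oct 2021", "0",
                "2", "Title: B", "Barcode: Y", "Due on 09 Oct 2021", "1"]
sample_rows = rows_from_flat_cells(sample_cells)
```

Failing early here gives a clearer message than the shape error NumPy would raise during the reshape.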

In [13]:
# quit() closes every window and also stops the chromedriver process
browser.quit()

### Preparing raw data to push into G Drive

In [14]:
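# Each loan spans five <td> cells, so reshape the flat list into rows of five
books = pd.DataFrame(np.array(table_cells).reshape(-1, 5))

books.columns = ['no', 'title', 'code', 'due', 'renewed']
# Copy the selection so the assignments below do not write to a view
books = books[['title', 'code', 'due']].copy()

# Drop newlines and collapse repeated spaces in every column
for col in ['title', 'code', 'due']:
    books[col] = [re.sub(' +', ' ', value.replace("\n", "")).strip() for value in books[col]]

# Strip the labels that the page prepends to each value
books['title'] = [i.replace("Title: ", "").strip() for i in books['title']]
books['code'] = [i.replace("Barcode: ", "").strip() for i in books['code']]
books['due'] = [i.replace("Due on ", "").strip() for i in books['due']]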
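
The cleaning above does two things to each cell: it normalises whitespace, then removes a fixed label such as `Title: `. As a rough sketch, both steps can be folded into one function. `clean_cell` is a hypothetical name, and it differs slightly from the notebook's version: it turns any run of whitespace (newlines included) into a single space rather than deleting newlines outright, and it removes the label only when it appears at the start of the text. The sample strings are made up to resemble the scraped cells.

```python
import re


def clean_cell(text, prefix=""):
    """Collapse whitespace in `text` and drop a leading `prefix` label if present.

    Hypothetical helper for illustration; not part of nlb_fun.
    """
    collapsed = re.sub(r"\s+", " ", text).strip()
    if prefix and collapsed.startswith(prefix):
        collapsed = collapsed[len(prefix):].strip()
    return collapsed


# Made-up raw cell texts in the shape the loans table appears to produce
raw_title = "\n   Title:   Robust Python : write clean and maintainable code \n"
raw_due = "  Due on 13 Oct 2021 "

clean_title = clean_cell(raw_title, "Title: ")
clean_due = clean_cell(raw_due, "Due on ")
```

With pandas, the same function could be applied per column, for example `books['title'].map(lambda t: clean_cell(t, "Title: "))`.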

In [15]:
books

| | title | code | due |
|---|---|---|---|
| 0 | High-performance consulting skills : the internal consultant's guide to value-added performance | B17368128C | 05 Oct 2021 |
| 1 | 打造百岁健步脚 / Da zao bai sui jian bu jiao | B36764855K | 09 Oct 2021 |
| 2 | Join, or die : digital advertising in the age of automation | B36318214F | 10 Oct 2021 |
| 3 | Robust Python : write clean and maintainable code | B36770177E | 13 Oct 2021 |
| 4 | Expert : Understanding the path to mastery | B36758001H | 15 Oct 2021 |
| 5 | Advanced JavaScript : speed up web development with the powerful features and benefits of JavaScript | B34665519F | 17 Oct 2021 |
| 6 | Design things that make sense : tech. innovator's guide | B36779492C | 21 Oct 2021 |
| 7 | Conversational Japanese : the right word at the right time | B34335799J | 23 Oct 2021 |


### Authenticate with Google and push data into G Drive

In [16]:
google_auth = os.environ['gsheet_cred']
gc = pygsheets.authorize(service_file=google_auth)

sh = gc.open('NLB Project')
wks = sh.worksheet_by_title("Current_borrowed")
# Clear the previous loans; clear() takes the start and end cells separately
wks.clear(start='A2', end='D17')

wks.update_value('D2', "=ARRAYFORMULA(C2:C{}-E1)".format(books.shape[0] + 1))
wks.update_value('C19', "Average:")
wks.update_value('D19', "=AVERAGE(D2:D17)")

wks.set_dataframe(books,(1,1))
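
The `ARRAYFORMULA` written to `D2` subtracts cell `E1` from each due date in column C, and `D19` averages the results. The sketch below reproduces that arithmetic in Python, under the assumption that `E1` holds a reference date such as today's date (the notebook does not show what `E1` contains). `days_until_due` is a hypothetical helper, and the reference date in the usage lines is made up.

```python
from datetime import date, datetime


def days_until_due(due_text, today):
    """Return the number of days from `today` until a due date like '05 Oct 2021'.

    Hypothetical helper: the '%d %b %Y' format matches the scraped strings,
    and %b assumes English month abbreviations in the active locale.
    """
    due = datetime.strptime(due_text, "%d %b %Y").date()
    return (due - today).days


# Due dates taken from the table above; the reference date is an assumption
due_dates = ["05 Oct 2021", "09 Oct 2021", "10 Oct 2021", "13 Oct 2021",
             "15 Oct 2021", "17 Oct 2021", "21 Oct 2021", "23 Oct 2021"]
reference_day = date(2021, 9, 20)

remaining = [days_until_due(d, reference_day) for d in due_dates]
average_remaining = sum(remaining) / len(remaining)
```

Computing the values in Python is handy for checking the sheet, while keeping the formulas in the sheet lets the numbers update as the reference date changes.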

### [Link](https://docs.google.com/spreadsheets/d/1s5oYU59jyU_QO3IIhCClyWGoC_MpW9L_h4l4djDUKO0/edit#gid=1021888748) to my Google Sheet