<a href="https://colab.research.google.com/github/atlas-github/fi_analytics/blob/master/Chapter_4_Enrich_your_datasets_by_web_scraping_using_Beautiful_Soup.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# APIs: Application Programming Interfaces

Bank Negara Malaysia (BNM) provides [Open APIs](https://api.bnm.gov.my/portal) that are accessible to the public. The API lets users retrieve BNM datasets for use in their own applications and systems. The example below shows how to use the API to obtain base rates.

In [1]:
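!pip install requests

import requests

# BNM's Open API uses this Accept header to select the API version
headers = {"Accept": "application/vnd.BNM.API.v1+json"}

response = requests.get("https://api.bnm.gov.my/public/base-rate/", headers=headers)
response.raise_for_status()  # stop here if the request failed (4xx/5xx)

# Parse the JSON body into a Python dict
base_rate = response.json()
base_rate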



{'data': [{'bank_code': 'BKKBMYKL',
   'bank_name': 'Bangkok Bank Berhad',
   'base_lending_rate': 6.12,
   'base_rate': 3.47,
   'indicative_eff_lending_rate': 4.67},
  {'bank_code': 'CIBBMYKL',
   'bank_name': 'CIMB Bank Berhad',
   'base_lending_rate': 5.85,
   'base_rate': 3,
   'indicative_eff_lending_rate': 3.75},
  {'bank_code': 'CITIMYKL',
   'bank_name': 'Citibank Berhad',
   'base_lending_rate': 5.8,
   'base_rate': 2.65,
   'indicative_eff_lending_rate': 3.45},
  {'bank_code': 'HLBBMYKL',
   'bank_name': 'Hong Leong Bank Malaysia Berhad',
   'base_lending_rate': 5.89,
   'base_rate': 2.88,
   'indicative_eff_lending_rate': 3.75},
  {'bank_code': 'HBMBMYKL',
   'bank_name': 'HSBC Bank Malaysia Berhad',
   'base_lending_rate': 5.74,
   'base_rate': 2.64,
   'indicative_eff_lending_rate': 3.75},
  {'bank_code': 'ICBKMYKL',
   'bank_name': 'Industrial and Commercial Bank of China (Malaysia) Berhad',
   'base_lending_rate': 5.7,
   'base_rate': 2.77,
   'indicative_eff_lending_ra
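The response is a dict whose `data` key holds a list of per-bank records, which maps naturally onto a pandas table. The sketch below assumes every record has the same keys as those shown above; to keep it runnable offline, `sample` copies two records from that output. With live data you would pass `base_rate["data"]` instead.

```python
import pandas as pd

# Two records copied from the base-rate output above, in the same shape
# as the dict returned by response.json()
sample = {
    "data": [
        {"bank_code": "BKKBMYKL", "bank_name": "Bangkok Bank Berhad",
         "base_lending_rate": 6.12, "base_rate": 3.47,
         "indicative_eff_lending_rate": 4.67},
        {"bank_code": "CIBBMYKL", "bank_name": "CIMB Bank Berhad",
         "base_lending_rate": 5.85, "base_rate": 3,
         "indicative_eff_lending_rate": 3.75},
    ]
}

# Each dict in the list becomes one row; its keys become the columns
rates = pd.DataFrame(sample["data"])

# Order banks from lowest to highest base rate
rates = rates.sort_values("base_rate").reset_index(drop=True)
print(rates[["bank_name", "base_rate"]])
```

Once the records are in a DataFrame, the usual pandas tools (filtering, sorting, exporting with `to_csv`) apply directly.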

# Scrape

For the purposes of this example, I’m using Google Colab, but feel free to use any Python IDE you are comfortable with. To get a site’s HTML into your Python script, use the third-party `requests` library.

To install the library, run the code below. The leading `!` tells the notebook to run the line as a shell command.

In [2]:
!pip install requests



To retrieve the HTML code, you need just a few lines of code.

In [17]:
import requests

url = 'https://www.malaysiastock.biz/Listed-Companies.aspx?type=A&value=A'

response = requests.get(url)

This code sends an HTTP GET request to the given URL, retrieves the HTML the server returns, and stores it in a `Response` object called `response`.

To check whether the retrieval succeeded, evaluate `response` in a cell; you should see the following result.

In [18]:
response

<Response [200]>

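A `<Response [200]>` indicates the retrieval succeeded. Status codes of 400 and above indicate an error: `<Response [404]>`, for example, means the page was not found, and codes in the 500s indicate a server-side error.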
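In practice, `requests` already offers two checks: `response.ok` is `True` for status codes below 400, and `response.raise_for_status()` raises an exception for 4xx and 5xx codes. If you also want a readable label, the standard library's `http.HTTPStatus` knows the official phrase for each code. The helper below, `describe_status`, is a hypothetical illustration, not part of `requests`; note that `HTTPStatus(code)` raises `ValueError` for codes it does not recognise.

```python
from http import HTTPStatus


def describe_status(code: int) -> str:
    """Return a short human-readable description of an HTTP status code."""
    phrase = HTTPStatus(code).phrase  # e.g. 404 -> "Not Found"
    if 200 <= code < 300:
        outcome = "success"
    elif 300 <= code < 400:
        outcome = "redirect"
    elif 400 <= code < 500:
        outcome = "client error"
    elif 500 <= code < 600:
        outcome = "server error"
    else:
        outcome = "other"
    return f"{code} {phrase} ({outcome})"


# With a live response you would call describe_status(response.status_code)
print(describe_status(200))  # 200 OK (success)
print(describe_status(404))  # 404 Not Found (client error)
```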

# Parse

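Now let’s look at the HTML we retrieved. Parsing `response.text` with the `BeautifulSoup` library (install it with `pip install beautifulsoup4` if your environment does not already have it) gives us an object we can print and search.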

In [19]:
from bs4 import BeautifulSoup
soup = BeautifulSoup(response.text, "html.parser")
soup

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:fb="http://www.facebook.com/2008/fbml"><head id="Head1"><script data-cfasync="false" data-ezscrex="false" data-pagespeed-no-defer="" id="bsaihudashidsadhu" type="text/javascript">window.ezogtk="";if(typeof processGoogleToken!=="function"){processGoogleToken=function(a){window.ezogtk=a.newToken;processGoogleToken=undefined;var el=document.getElementById('bsaihudashidsadhu');if(el!==null){el.parentNode.removeChild(document.getElementById('bsaihudashidsadhu'))}
var eel=document.getElementById('ezintegrator');if(eel!==null){eel.parentNode.removeChild(document.getElementById('ezintegrator'))}}}</script>
<script id="ezintegrator" src="https://adservice.google.com/adsid/integrator.js?domain=www.malaysiastock.biz"></script>
<script type="text/javascript">
	var __banger_pmp_deals=function(){var d={17:{"DealId":17,"Floor":160},

You will notice stock-related information in the HTML, such as the entries for two companies, `AASIA (7054)` and `AAX (5238)`. Each entry includes figures such as Market Cap and Last Price; when this notebook was run, AASIA’s Market Cap was `69.30m` and its Last Price `0.11`. These figures change with the market, so your own output will differ.

To extract this data, we use BeautifulSoup’s search methods. `findAll` (the older name for `find_all` in Beautiful Soup 4) returns a list of every matching tag, here every `<table>` whose `class` attribute is `marketWatch`.

In [20]:
table = soup.findAll('table', {'class': 'marketWatch'})
table

[<table cellspacing="0" class="marketWatch" id="MainContent_tStock">
 <tbody><tr>
 <th>Company</th>
 <th>Shariah</th>
 <th style="text-align:left;">Sector</th>
 <th>Market Cap</th>
 <th>Last Price</th>
 <th>PE</th>
 <th>DY</th>
 <th>ROE</th>
 </tr>
 <tr>
 <td width="250px"><h3><a href="Corporate-Infomation.aspx?securityCode=7054">AASIA (7054)</a><span class="marketBtn Blue marketType_ListedCompanies">MAIN</span></h3><br/><h3>ASTRAL ASIA BERHAD</h3></td>
 <td><img src="https://www.malaysiastock.biz/App_Themes/images/Yes.png" width="14"/></td>
 <td style="text-align:left;" width="120px"><h3>Plantation </h3></td>
 <td width="100px">69.30m</td>
 <td width="100px">0.11</td>
 <td width="100px">-</td>
 <td width="100px">0.00</td>
 <td width="100px">-5.18</td>
 </tr>
 <tr>
 <td width="250px"><h3><a href="Corporate-Infomation.aspx?securityCode=5238">AAX (5238)</a><span class="marketBtn Blue marketType_ListedCompanies">MAIN</span></h3><br/><h3>AIRASIA X BERHAD</h3></td>
 <td><img src="https://ww
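To experiment with `find_all` without requesting the live site, you can parse a small HTML string. The snippet below is a hypothetical, trimmed-down stand-in modelled on the table above. It also shows `select`, which finds the same tags with a CSS selector.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the page, modelled on the marketWatch table above
html = """
<table class="marketWatch">
  <tr><th>Company</th><th>Last Price</th></tr>
  <tr><td>AASIA (7054)</td><td>0.11</td></tr>
</table>
<table class="other">
  <tr><td>ignored</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

# find_all returns every matching tag; here, tables whose class
# attribute includes "marketWatch"
tables = soup.find_all("table", {"class": "marketWatch"})
print(len(tables))  # 1

# select() does the same job with a CSS selector
same_tables = soup.select("table.marketWatch")
print(tables[0].find("td").get_text())  # AASIA (7054)
```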

You’ll see how to turn this data into a structured table next. 

# Pre-process

There is usually more than one way to turn the HTML data into a table. One of my go-to methods is to convert the HTML into a Python `list`, which makes the dataset easy to slice and subset. Calling `list()` on a tag returns its direct children. The conversion also picks up the newline strings (`"\n"`) that sit between tags, so I remove them with a list comprehension.

In [25]:
example = list(table[0])
# Drop the newline strings that sit between tags
example2 = [x for x in example if x != "\n"]
example

['\n', <tbody><tr>
 <th>Company</th>
 <th>Shariah</th>
 <th style="text-align:left;">Sector</th>
 <th>Market Cap</th>
 <th>Last Price</th>
 <th>PE</th>
 <th>DY</th>
 <th>ROE</th>
 </tr>
 <tr>
 <td width="250px"><h3><a href="Corporate-Infomation.aspx?securityCode=7054">AASIA (7054)</a><span class="marketBtn Blue marketType_ListedCompanies">MAIN</span></h3><br/><h3>ASTRAL ASIA BERHAD</h3></td>
 <td><img src="https://www.malaysiastock.biz/App_Themes/images/Yes.png" width="14"/></td>
 <td style="text-align:left;" width="120px"><h3>Plantation </h3></td>
 <td width="100px">69.30m</td>
 <td width="100px">0.11</td>
 <td width="100px">-</td>
 <td width="100px">0.00</td>
 <td width="100px">-5.18</td>
 </tr>
 <tr>
 <td width="250px"><h3><a href="Corporate-Infomation.aspx?securityCode=5238">AAX (5238)</a><span class="marketBtn Blue marketType_ListedCompanies">MAIN</span></h3><br/><h3>AIRASIA X BERHAD</h3></td>
 <td><img src="https://www.malaysiastock.biz/App_Themes/images/Yes.png" width="14"/></td
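To see where the `'\n'` entries come from, here is a minimal sketch on a hand-written table (hypothetical HTML, not the live page). Whitespace between tags becomes text nodes, which `list()` returns alongside the tags. It also shows that when a table’s rows are wrapped in `<tbody>`, the table has only one tag child, so indexing that filtered list past position 0 raises an `IndexError`. Filtering with `isinstance(x, Tag)` is an alternative that drops any non-tag child, not just `"\n"`.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# Hypothetical table whose single row is wrapped in <tbody>
html = "<table>\n<tbody>\n<tr><td>A</td></tr>\n</tbody>\n</table>"
table = BeautifulSoup(html, "html.parser").find("table")

children = list(table)  # direct children: '\n', <tbody>, '\n'
print(children)

# Approach from the text: drop the bare newline strings
non_blank = [x for x in children if x != "\n"]

# Alternative: keep only Tag objects, whatever the text between them
tags_only = [x for x in children if isinstance(x, Tag)]

print(len(non_blank))  # 1
print(len(table.find_all("tr")))  # 1
```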

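Calling `get_text()` on a row with `separator="\n"` joins all of the row’s text with newlines, so splitting the result on `"\n"` breaks the row into individual fields, much like splitting a comma-separated value in Excel. I then filter out the blank pieces to get `entry_filtered`.

On this page, however, the rows are wrapped in a `<tbody>` element, so `example2` holds that single `<tbody>` and subsetting its second element with `example2[1]` raises an `IndexError`. Selecting the `<tr>` rows directly with `find_all('tr')` works whether or not a `<tbody>` is present. `rows[0]` is the header row, so the first company is `rows[1]`.

In [30]:
rows = table[0].find_all('tr')  # every row in the table, header included
entry = rows[1].get_text(separator="\n").split("\n")
entry_filtered = [x.strip() for x in entry if x.strip()]  # drop blank and whitespace-only pieces
entry_filtered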
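Here is the same row-splitting step on a hypothetical row modelled on the AASIA entry printed earlier, so it runs offline. It compares the split-and-filter approach with BeautifulSoup’s `stripped_strings`, which yields each text node with surrounding whitespace removed and skips whitespace-only nodes. Both produce the same list here; `stripped_strings` would behave differently only if a single text node contained an internal newline.

```python
from bs4 import BeautifulSoup

# Hypothetical row modelled on the AASIA entry shown earlier
row_html = """<tr>
 <td><h3><a href="#">AASIA (7054)</a><span>MAIN</span></h3><br/><h3>ASTRAL ASIA BERHAD</h3></td>
 <td><img src="Yes.png"/></td>
 <td><h3>Plantation </h3></td>
 <td>69.30m</td>
 <td>0.11</td>
</tr>"""
row = BeautifulSoup(row_html, "html.parser").find("tr")

# Approach from the text: join all text with "\n", split, drop blanks
pieces = row.get_text(separator="\n").split("\n")
fields = [p.strip() for p in pieces if p.strip()]

# Shortcut: stripped_strings skips whitespace-only text nodes
fields_alt = list(row.stripped_strings)

print(fields)
# ['AASIA (7054)', 'MAIN', 'ASTRAL ASIA BERHAD', 'Plantation', '69.30m', '0.11']
```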

In [27]:
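import requests
import pandas as pd
from bs4 import BeautifulSoup

url = 'https://www.malaysiastock.biz/Listed-Companies.aspx?type=A&value=A'

response = requests.get(url)
response.raise_for_status()  # stop here if the request failed (4xx/5xx)
soup = BeautifulSoup(response.text, "html.parser")
table = soup.findAll('table', {'class': 'marketWatch'})

# Non-blank children of the table (inspected in the next cell)
example = list(table[0])
example2 = [x for x in example if x != "\n"]

# Collect the <tr> rows directly so a wrapping <tbody> doesn't matter
rows = table[0].find_all('tr')

prices = []

# Skip rows[0], the header row of <th> cells
for row in rows[1:]:
    entry = row.get_text(separator="\n").split("\n")
    entry_filtered = [x.strip() for x in entry if x.strip()]  # drop blank pieces
    prices.append(entry_filtered)

# rename() returns a new DataFrame, so assign the result
complete_table = pd.DataFrame(prices)
complete_table = complete_table.rename(columns={0: "Company Code", 1: "Market", 2: "Company Name", 3: "Sector", 4: "Market Cap", 5: "Last Price", 6: "PE", 7: "DY", 8: "ROE"})
complete_table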
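Every column in `complete_table` still holds strings such as `"69.30m"` or `"-"`, so numeric analysis needs a conversion step. The sketch below is one possible approach, under stated assumptions: `parse_market_cap` is a hypothetical helper, the `m` suffix is assumed to mean millions, and the `b` (billions) suffix is a guess, not something confirmed from the site. The second row is made-up data for illustration. `pd.to_numeric(..., errors="coerce")` turns unparseable values like `"-"` into `NaN` instead of raising an error.

```python
import pandas as pd

# Hypothetical rows in the same shape as complete_table; the second row is invented
complete_table = pd.DataFrame({
    "Company Code": ["AASIA (7054)", "EXAMPLE (0000)"],
    "Market Cap": ["69.30m", "1.25b"],
    "Last Price": ["0.11", "2.40"],
    "PE": ["-", "12.5"],
})

# Assumed suffix multipliers: "m" = million, "b" = billion (assumption)
SUFFIXES = {"m": 1e6, "b": 1e9}


def parse_market_cap(text: str) -> float:
    """Convert strings like '69.30m' to a float; return NaN if unparseable."""
    text = text.strip().lower()
    multiplier = 1.0
    if text and text[-1] in SUFFIXES:
        multiplier = SUFFIXES[text[-1]]
        text = text[:-1]
    try:
        return float(text) * multiplier
    except ValueError:
        return float("nan")


complete_table["Market Cap"] = complete_table["Market Cap"].map(parse_market_cap)

# "-" (no value) becomes NaN instead of raising an error
for col in ["Last Price", "PE"]:
    complete_table[col] = pd.to_numeric(complete_table[col], errors="coerce")

print(complete_table)
```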

In [29]:
example2

[<tbody><tr>
 <th>Company</th>
 <th>Shariah</th>
 <th style="text-align:left;">Sector</th>
 <th>Market Cap</th>
 <th>Last Price</th>
 <th>PE</th>
 <th>DY</th>
 <th>ROE</th>
 </tr>
 <tr>
 <td width="250px"><h3><a href="Corporate-Infomation.aspx?securityCode=7054">AASIA (7054)</a><span class="marketBtn Blue marketType_ListedCompanies">MAIN</span></h3><br/><h3>ASTRAL ASIA BERHAD</h3></td>
 <td><img src="https://www.malaysiastock.biz/App_Themes/images/Yes.png" width="14"/></td>
 <td style="text-align:left;" width="120px"><h3>Plantation </h3></td>
 <td width="100px">69.30m</td>
 <td width="100px">0.11</td>
 <td width="100px">-</td>
 <td width="100px">0.00</td>
 <td width="100px">-5.18</td>
 </tr>
 <tr>
 <td width="250px"><h3><a href="Corporate-Infomation.aspx?securityCode=5238">AAX (5238)</a><span class="marketBtn Blue marketType_ListedCompanies">MAIN</span></h3><br/><h3>AIRASIA X BERHAD</h3></td>
 <td><img src="https://www.malaysiastock.biz/App_Themes/images/Yes.png" width="14"/></td>
 <td