# Extracting Company Symbols from BURSA Marketplace

To begin, financial stock websites typically use a `custom and universal symbol to identify a company`. The symbol is a combination of letters and digits that uniquely identifies a company, so users can easily reference it when browsing stock marketplace websites such as Yahoo Finance, NASDAQ, KLSE Screener and more.

With that, a method to `extract all of the stock symbols and their basic information` is essential before querying their stock prices, dividends and other ratios. Bursa Malaysia, the main stock exchange in Malaysia, offers a list of company symbols through a PDF file. The file can be accessed with the link below:

https://www.bursamalaysia.com/sites/5d809dcf39fba22790cad230/assets/641c0ff15b711a55808bf94e/List_of_Companies_2023-03-23.pdf

A Python library called `Tabula` (installed as the `tabula-py` package, a wrapper around tabula-java) provides exactly the functions needed: it reads a PDF file and extracts the contents of each table into a well-organized DataFrame.

In [2]:
import tabula
import pandas as pd

pd.set_option('display.max_columns', None)
pd.set_option('display.max_colwidth', None)
pd.set_option('display.max_rows', None)

In [3]:
pdf_to_read = "List_of_Companies_2023-03-23.pdf"


# Use tabula.read_pdf to extract the table
tables = tabula.read_pdf(pdf_to_read, pages='2-27', lattice=True)

# Loop through the list of DataFrames and remove the first row from each
tables = [table.iloc[1:] for table in tables]

# The table is split across the extracted pages, so concatenate the pieces into one DataFrame
table = pd.concat(tables, ignore_index=True)

table

,LISTING TEAM IN CHARGE,Unnamed: 0,Unnamed: 1,Unnamed: 2
0,1,7-ELEVEN MALAYSIA HOLDINGS BERHAD,5250,3
1,2,ABF MALAYSIA BOND INDEX FUND,0800EA,2
2,3,ABLE GLOBAL BERHAD,7167,3
3,4,ABLEGROUP BERHAD,7086,2
4,5,ABM FUJIYA BERHAD,5198,2
5,6,ACE INNOVATE ASIA BERHAD,03028,4
6,7,ACME HOLDINGS BERHAD,7131,1
7,8,ACO GROUP BERHAD,0218,4
8,9,ADVANCE INFORMATION MARKETING BERHAD,0122,4
9,10,ADVANCE SYNERGY BERHAD,1481,3
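
The slicing and concatenation steps in the cell above can be tried without the PDF. The sketch below is only an illustration: `page_1` and `page_2` are hypothetical stand-ins for the DataFrames returned by `tabula.read_pdf`, and the company names and codes in them are made up.

```python
import pandas as pd

# Two toy "pages" standing in for the DataFrames that tabula.read_pdf returns.
# The first row of each page is a placeholder for the row the notebook discards.
page_1 = pd.DataFrame({"name": ["(discarded row)", "COMPANY A", "COMPANY B"],
                       "code": [None, "0001", "0002"]})
page_2 = pd.DataFrame({"name": ["(discarded row)", "COMPANY C"],
                       "code": [None, "0003"]})

# Same two steps as the notebook cell: drop row 0 of every page, then stack the pages.
pages = [page.iloc[1:] for page in (page_1, page_2)]
combined = pd.concat(pages, ignore_index=True)

print(combined)
```

With `ignore_index=True` the combined DataFrame gets a fresh 0-based index instead of repeating each page's own row labels.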


In [4]:
# Drop the columns that are not needed, keeping only the company name and code
del table['LISTING TEAM IN CHARGE']
del table['Unnamed: 2']

table

,Unnamed: 0,Unnamed: 1
0,7-ELEVEN MALAYSIA HOLDINGS BERHAD,5250
1,ABF MALAYSIA BOND INDEX FUND,0800EA
2,ABLE GLOBAL BERHAD,7167
3,ABLEGROUP BERHAD,7086
4,ABM FUJIYA BERHAD,5198
5,ACE INNOVATE ASIA BERHAD,03028
6,ACME HOLDINGS BERHAD,7131
7,ACO GROUP BERHAD,0218
8,ADVANCE INFORMATION MARKETING BERHAD,0122
9,ADVANCE SYNERGY BERHAD,1481


In [5]:
# Rename the column headers
table.rename(columns = {'Unnamed: 0': 'stock_name', 'Unnamed: 1': 'stock_code'}, inplace = True)

table

,stock_name,stock_code
0,7-ELEVEN MALAYSIA HOLDINGS BERHAD,5250
1,ABF MALAYSIA BOND INDEX FUND,0800EA
2,ABLE GLOBAL BERHAD,7167
3,ABLEGROUP BERHAD,7086
4,ABM FUJIYA BERHAD,5198
5,ACE INNOVATE ASIA BERHAD,03028
6,ACME HOLDINGS BERHAD,7131
7,ACO GROUP BERHAD,0218
8,ADVANCE INFORMATION MARKETING BERHAD,0122
9,ADVANCE SYNERGY BERHAD,1481
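
As an alternative to deleting columns and renaming in place, the two steps can be combined into one expression that leaves the original DataFrame untouched. This is a minimal sketch, assuming the extracted columns carry the same names as shown above; `raw` and `clean` are hypothetical names, and `raw` is a two-row toy copy of the extracted table.

```python
import pandas as pd

# Toy copy of the extracted table, using the column names seen in the output above.
raw = pd.DataFrame({
    "LISTING TEAM IN CHARGE": [1, 2],
    "Unnamed: 0": ["7-ELEVEN MALAYSIA HOLDINGS BERHAD", "ABLE GLOBAL BERHAD"],
    "Unnamed: 1": ["5250", "7167"],
    "Unnamed: 2": [3, 3],
})

# Select the two wanted columns, then rename them; rename returns a new DataFrame.
clean = (
    raw[["Unnamed: 0", "Unnamed: 1"]]
    .rename(columns={"Unnamed: 0": "stock_name", "Unnamed: 1": "stock_code"})
)

print(clean)
```

Because nothing is modified in place, the cell can be re-run safely without raising a `KeyError` for a column that was already deleted.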


In [8]:
# Use the duplicated() method to identify duplicates in the specified column
duplicates = table[table.duplicated(subset="stock_code", keep=False)]

# Print the duplicate values
print(duplicates)

                            stock_name stock_code
450      KLCC PROPERTY HOLDINGS BERHAD     5235SS
451  KLCC REAL ESTATE INVESTMENT TRUST     5235SS


In [12]:
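# Drop rows that repeat a stock_code; by default the first occurrence is kept
table = table.drop_duplicates(subset='stock_code')

# Re-run the duplicate check to confirm that none remain
duplicates = table[table.duplicated(subset="stock_code", keep=False)]
print(duplicates)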

Empty DataFrame
Columns: [stock_name, stock_code]
Index: []
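
It helps to be explicit about which row survives the de-duplication. The sketch below uses a toy DataFrame (the third row is a made-up placeholder) to contrast `duplicated(keep=False)`, which flags every row in a duplicated group, with `drop_duplicates`, which keeps the first row per code by default.

```python
import pandas as pd

# Toy data: the two rows sharing a code mirror the 5235SS case found above;
# the third row is a hypothetical placeholder.
toy = pd.DataFrame({
    "stock_name": ["KLCC PROPERTY HOLDINGS BERHAD",
                   "KLCC REAL ESTATE INVESTMENT TRUST",
                   "EXAMPLE COMPANY BERHAD"],
    "stock_code": ["5235SS", "5235SS", "9999"],
})

# keep=False flags every row in a duplicated group, which is useful for inspection.
flagged = toy[toy.duplicated(subset="stock_code", keep=False)]

# drop_duplicates defaults to keep="first": only the first row per code survives.
deduped = toy.drop_duplicates(subset="stock_code")

print(len(flagged))
print(deduped["stock_name"].tolist())
```

In the notebook this means the KLCC REAL ESTATE INVESTMENT TRUST row is the one removed. Passing `keep="last"` would retain it instead.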


In [13]:
table

,stock_name,stock_code
0,7-ELEVEN MALAYSIA HOLDINGS BERHAD,5250
1,ABF MALAYSIA BOND INDEX FUND,0800EA
2,ABLE GLOBAL BERHAD,7167
3,ABLEGROUP BERHAD,7086
4,ABM FUJIYA BERHAD,5198
5,ACE INNOVATE ASIA BERHAD,03028
6,ACME HOLDINGS BERHAD,7131
7,ACO GROUP BERHAD,0218
8,ADVANCE INFORMATION MARKETING BERHAD,0122
9,ADVANCE SYNERGY BERHAD,1481


In [14]:
# Specify the file path where you want to save the CSV file
csv_file_path = 'exports/stocks.csv'

# Use the to_csv() method to export the DataFrame to a CSV file
table.to_csv(csv_file_path, index=False)  # Set index=False to exclude the DataFrame index from the CSV

print(f"DataFrame saved to {csv_file_path}")

DataFrame saved to exports/stocks.csv
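
One thing to watch when loading the exported file later: several stock codes start with a zero (for example `0218` and `03028`), and the default type inference in `pd.read_csv` would turn those into integers. The sketch below is an illustration only, using a small in-memory CSV in place of the exported file.

```python
import io
import pandas as pd

# A small in-memory CSV in the same shape as the exported file.
csv_text = (
    "stock_name,stock_code\n"
    "ACO GROUP BERHAD,0218\n"
    "ACE INNOVATE ASIA BERHAD,03028\n"
)

# Default type inference parses the codes as integers and strips leading zeros.
inferred = pd.read_csv(io.StringIO(csv_text))

# Forcing the column to str keeps the codes exactly as written.
as_text = pd.read_csv(io.StringIO(csv_text), dtype={"stock_code": str})

print(inferred["stock_code"].tolist())
print(as_text["stock_code"].tolist())
```

Reading the real file with the same `dtype={"stock_code": str}` argument should keep the codes usable as identifiers on other stock websites.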
