# Extracting Company Symbols from BURSA Marketplace

To begin, financial stock websites typically use a `short, standardized symbol to identify a company`. A symbol is a combination of letters and digits that uniquely identifies a company, so users can reference it easily when browsing stock marketplace websites such as Yahoo Finance, NASDAQ, KLSE Screener and more.

With that, a method to `extract all of the stock symbols and their basic information` is essential before querying stock prices, dividends and other ratios. BURSA Marketplace is the main stock exchange in Malaysia, and it offers a list of company symbols as a PDF file. The file can be accessed at the link below:

https://www.bursamalaysia.com/sites/5d809dcf39fba22790cad230/assets/641c0ff15b711a55808bf94e/List_of_Companies_2023-03-23.pdf

A Python library called `Tabula` (installed as `tabula-py`, which needs a Java runtime) provides exactly the functions needed: it reads a PDF file and extracts the contents of each table into a well-organized DataFrame.

In [5]:
import tabula
import pandas as pd
import yfinance as yf

pd.set_option('display.max_columns', None)
pd.set_option('display.max_colwidth', None)
pd.set_option('display.max_rows', None)

In [3]:
pdf_to_read = "List_of_Companies_2023-03-23.pdf"


# Use tabula.read_pdf to extract the tables; lattice=True uses the ruled cell borders to separate rows and columns
tables = tabula.read_pdf(pdf_to_read, pages='2-27', lattice=True)

# Loop through the list of DataFrames and drop the first row from each
tables = [table.iloc[1:] for table in tables]

# read_pdf returns a list of DataFrames (one per extracted table); concatenate them into a single DataFrame
table = pd.concat(tables, ignore_index=True)

table

Unnamed: 0.1,LISTING TEAM IN CHARGE,Unnamed: 0,Unnamed: 1,Unnamed: 2
0,1,7-ELEVEN MALAYSIA HOLDINGS BERHAD,5250,3
1,2,ABF MALAYSIA BOND INDEX FUND,0800EA,2
2,3,ABLE GLOBAL BERHAD,7167,3
3,4,ABLEGROUP BERHAD,7086,2
4,5,ABM FUJIYA BERHAD,5198,2
5,6,ACE INNOVATE ASIA BERHAD,03028,4
6,7,ACME HOLDINGS BERHAD,7131,1
7,8,ACO GROUP BERHAD,0218,4
8,9,ADVANCE INFORMATION MARKETING BERHAD,0122,4
9,10,ADVANCE SYNERGY BERHAD,1481,3
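
The trim-and-concatenate step can be tried without the PDF. The following is a minimal sketch that uses two toy DataFrames as stand-ins for the per-page tables returned by `tabula.read_pdf`; the column names and values are made up for illustration and are not taken from the BURSA file.

```python
import pandas as pd

# Toy stand-ins for two per-page tables (illustrative values only).
page_1 = pd.DataFrame({"name": ["HEADER", "ALPHA BERHAD"], "code": ["CODE", "1111"]})
page_2 = pd.DataFrame({"name": ["HEADER", "BETA BERHAD"], "code": ["CODE", "2222"]})
pages = [page_1, page_2]

# Drop the first row of every page, then stack the pages into one table.
# ignore_index=True renumbers the rows 0..n-1 instead of keeping per-page indices.
trimmed = [page.iloc[1:] for page in pages]
combined = pd.concat(trimmed, ignore_index=True)

print(combined)
```

Without `ignore_index=True`, every page would keep its own row labels, so the combined table would contain repeated index values.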


In [4]:
del table['LISTING TEAM IN CHARGE']
del table['Unnamed: 2']

table

Unnamed: 0.1,Unnamed: 0,Unnamed: 1
0,7-ELEVEN MALAYSIA HOLDINGS BERHAD,5250
1,ABF MALAYSIA BOND INDEX FUND,0800EA
2,ABLE GLOBAL BERHAD,7167
3,ABLEGROUP BERHAD,7086
4,ABM FUJIYA BERHAD,5198
5,ACE INNOVATE ASIA BERHAD,03028
6,ACME HOLDINGS BERHAD,7131
7,ACO GROUP BERHAD,0218
8,ADVANCE INFORMATION MARKETING BERHAD,0122
9,ADVANCE SYNERGY BERHAD,1481


In [5]:
# Rename the column headers
table.rename(columns = {'Unnamed: 0': 'stock_name', 'Unnamed: 1': 'stock_code'}, inplace = True)

table

Unnamed: 0,stock_name,stock_code
0,7-ELEVEN MALAYSIA HOLDINGS BERHAD,5250
1,ABF MALAYSIA BOND INDEX FUND,0800EA
2,ABLE GLOBAL BERHAD,7167
3,ABLEGROUP BERHAD,7086
4,ABM FUJIYA BERHAD,5198
5,ACE INNOVATE ASIA BERHAD,03028
6,ACME HOLDINGS BERHAD,7131
7,ACO GROUP BERHAD,0218
8,ADVANCE INFORMATION MARKETING BERHAD,0122
9,ADVANCE SYNERGY BERHAD,1481


In [8]:
# Use the duplicated() method to identify duplicates in the specified column
duplicates = table[table.duplicated(subset="stock_code", keep=False)]

# Print the duplicate values
print(duplicates)

                            stock_name stock_code
450      KLCC PROPERTY HOLDINGS BERHAD     5235SS
451  KLCC REAL ESTATE INVESTMENT TRUST     5235SS


In [12]:
table = table.drop_duplicates(subset='stock_code')
duplicates = table[table.duplicated(subset="stock_code", keep=False)]
print(duplicates)

Empty DataFrame
Columns: [stock_name, stock_code]
Index: []
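
To make the behaviour of the two calls above explicit, here is a small sketch on a toy DataFrame (the names and codes are invented). It shows that `duplicated(keep=False)` flags every row in a duplicate group, while `drop_duplicates` with its default `keep="first"` retains only the first row of each group.

```python
import pandas as pd

# Invented rows: two entries share the same code.
toy = pd.DataFrame({
    "stock_name": ["FIRST TRUST", "SECOND TRUST", "OTHER BERHAD"],
    "stock_code": ["9999SS", "9999SS", "1234"],
})

# keep=False marks all members of a duplicate group, not just the repeats.
all_dupes = toy[toy.duplicated(subset="stock_code", keep=False)]

# The default keep="first" keeps the first row of each group and drops the rest.
deduped = toy.drop_duplicates(subset="stock_code")

print(all_dupes)
print(deduped)
```

In the real table this means the first of the two rows sharing a code is the one that survives, so it is worth checking that this is the entry you want to keep.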


In [13]:
table

Unnamed: 0,stock_name,stock_code
0,7-ELEVEN MALAYSIA HOLDINGS BERHAD,5250
1,ABF MALAYSIA BOND INDEX FUND,0800EA
2,ABLE GLOBAL BERHAD,7167
3,ABLEGROUP BERHAD,7086
4,ABM FUJIYA BERHAD,5198
5,ACE INNOVATE ASIA BERHAD,03028
6,ACME HOLDINGS BERHAD,7131
7,ACO GROUP BERHAD,0218
8,ADVANCE INFORMATION MARKETING BERHAD,0122
9,ADVANCE SYNERGY BERHAD,1481


In [14]:
# Specify the file path where you want to save the CSV file
csv_file_path = 'exports/stocks.csv'

# Use the to_csv() method to export the DataFrame to a CSV file
table.to_csv(csv_file_path, index=False)  # Set index=False to exclude the DataFrame index from the CSV

print(f"DataFrame saved to {csv_file_path}")

DataFrame saved to exports/stocks.csv


# Part II: Adding basic stock information

In [85]:
testdf = pd.read_csv('exports/stocks.csv')

In [86]:

# Add ".KL" to each value in the 'stock_code' column
stock_code = testdf['stock_code'].apply(lambda x: x + ".KL")
stock_code = list(stock_code)[:5]
stock_code

['5250.KL', '0800EA.KL', '7167.KL', '7086.KL', '5198.KL']
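
Several stock codes start with a zero (for example `0218`). The sketch below, which reads a tiny in-memory CSV rather than the real export, illustrates why it can be safer to pass `dtype` when loading the codes: if a column happens to contain only digits, pandas parses it as integers and the leading zero is lost. The CSV text here is an assumption built from two rows of the table above.

```python
import io
import pandas as pd

csv_text = "stock_name,stock_code\nACO GROUP BERHAD,0218\nABLE GLOBAL BERHAD,7167\n"

# Without a dtype hint, an all-digit column is inferred as integers: "0218" becomes 218.
inferred = pd.read_csv(io.StringIO(csv_text))

# Forcing str keeps each code exactly as written in the file.
as_text = pd.read_csv(io.StringIO(csv_text), dtype={"stock_code": str})

# Build the Yahoo Finance style tickers by appending ".KL".
tickers = (as_text["stock_code"] + ".KL").tolist()
print(tickers)
```

The full export also contains alphanumeric codes such as `0800EA`, which keep the column as text in practice, but the explicit `dtype` guards against surprises when working with a subset.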

In [100]:
import pandas as pd
import yfinance as yf

# Read the exported list of companies, keeping stock codes as text
df = pd.read_csv('exports/stocks.csv', dtype={'stock_code': str})

# Add ".KL" to each stock code to form the Yahoo Finance symbol
df['stock_code'] = df['stock_code'] + ".KL"

# Fields to copy from ticker.info when they are present
info_fields = ['address1', 'address2', 'city', 'zip', 'phone', 'website',
               'industry', 'sector', 'fullTimeEmployees']

for symbol in df['stock_code']:
    try:
        ticker_info = yf.Ticker(symbol).info

        # Copy each available field into the row for this symbol
        for field in info_fields:
            if field in ticker_info:
                df.loc[df['stock_code'] == symbol, field] = ticker_info[field]

    except Exception as e:
        print(f"Error fetching data for symbol {symbol}: {e}")

# Now 'df' contains the company information columns for every symbol that returned data


Error fetching data for symbol 8931.KL: 404 Client Error: Not Found for url: https://query2.finance.yahoo.com/v6/finance/quoteSummary/8931.KL?modules=financialData&modules=quoteType&modules=defaultKeyStatistics&modules=assetProfile&modules=summaryDetail&ssl=true
Error fetching data for symbol 6645.KL: 404 Client Error: Not Found for url: https://query2.finance.yahoo.com/v6/finance/quoteSummary/6645.KL?modules=financialData&modules=quoteType&modules=defaultKeyStatistics&modules=assetProfile&modules=summaryDetail&ssl=true
Error fetching data for symbol 5270.KL: 404 Client Error: Not Found for url: https://query2.finance.yahoo.com/v6/finance/quoteSummary/5270.KL?modules=financialData&modules=quoteType&modules=defaultKeyStatistics&modules=assetProfile&modules=summaryDetail&ssl=true
Error fetching data for symbol 5268.KL: 404 Client Error: Not Found for url: https://query2.finance.yahoo.com/v6/finance/quoteSummary/5268.KL?modules=financialData&modules=quoteType&modules=defaultKeyStatistics&
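
An alternative way to organise the enrichment step is to collect one record per symbol and merge once at the end, instead of assigning into `df` with a boolean mask for every field. The sketch below is only an illustration of that idea: `pick_fields` is a hypothetical helper, and `fake_infos` is invented data standing in for the dictionaries that `ticker.info` returns, so it runs without any network access.

```python
import pandas as pd

FIELDS = ["address1", "city", "website", "sector"]

def pick_fields(info, fields=FIELDS):
    """Return only the requested keys that are present in an info-style dict."""
    return {key: info[key] for key in fields if key in info}

# Invented stand-ins for ticker.info results; an empty dict mimics a symbol with no data.
fake_infos = {
    "1111.KL": {"address1": "1 Example Road", "city": "Kuala Lumpur", "sector": "Technology"},
    "2222.KL": {},
}

# One record per symbol, then a single left merge back onto the stock list.
records = [{"stock_code": symbol, **pick_fields(info)} for symbol, info in fake_infos.items()]
profile = pd.DataFrame(records)

stocks = pd.DataFrame({"stock_code": ["1111.KL", "2222.KL"]})
enriched = stocks.merge(profile, on="stock_code", how="left")
print(enriched)
```

Symbols with no data simply end up with missing values in the merged columns, which matches the empty rows seen for funds in the output table.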

In [101]:
df

Unnamed: 0,stock_name,stock_code,address1,address2,city,zip,phone,website,industry,sector,fullTimeEmployees
0,7-ELEVEN MALAYSIA HOLDINGS BERHAD,5250.KL,Level 3A,"Podium Block Plaza Berjaya No. 12, Jalan Imbi",Kuala Lumpur,55100.0,60 3 2142 1136,https://www.7eleven.com.my,Grocery Stores,Consumer Defensive,10511.0
1,ABF MALAYSIA BOND INDEX FUND,0800EA.KL,,,,,,,,,
2,ABLE GLOBAL BERHAD,7167.KL,PTD 124298,Jalan Kempas Lama Kampung Seelong Jaya,Sekudai,81300.0,60 7 599 8990,https://www.ableglobalbhd.com.my,Packaged Foods,Consumer Defensive,199.0
3,ABLEGROUP BERHAD,7086.KL,Block D4-U2-10,"Level 2 Solaris Dutamas No. 1, Jalan Dutamas 1",Kuala Lumpur,50480.0,60 3 6207 8186,https://www.ablegroup.com.my,Building Products & Equipment,Industrials,
4,ABM FUJIYA BERHAD,5198.KL,"Lot 2224, Section 66",Lorong Pangkalan Off Jalan Pangkalan Pending Industrial Estate,Kuching,93450.0,60 8 233 3344,https://www.abmfujiya.com.my,Auto Parts,Consumer Cyclical,
5,ACE INNOVATE ASIA BERHAD,03028.KL,No. 19-1,Jalan USJ 10/1D Taipan Business Centre,Subang Jaya,47620.0,60 3 8081 7198,https://goinno2u.com,Capital Markets,Financial Services,
6,ACME HOLDINGS BERHAD,7131.KL,488A-16-01 Office Tower,Kompleks Midlands Park Jalan Burma,Georgetown,10350.0,60 4 210 9911,https://acmeholdings.com.my,Packaging & Containers,Consumer Cyclical,
7,ACO GROUP BERHAD,0218.KL,"PLO 264, No. 14",Jalan Firma 3 Kawasan Perindustrian Tebrau 4,Johor Bahru,81100.0,60 7 361 9399,https://www.acogroup.com.my,Electronics & Computer Distribution,Technology,150.0
8,ADVANCE INFORMATION MARKETING BERHAD,0122.KL,"No. 18, Jalan Balam",,Kuala Lumpur,51100.0,60 3 4043 2699,https://www.aim-net.com.my,Specialty Business Services,Industrials,
9,ADVANCE SYNERGY BERHAD,1481.KL,Synergy 9,Ground Floor 9 Jalan Kajibumi U1/70 Temasya Glenmarie,Shah Alam,40150.0,60 3 5192 8822,https://www.asb.com.my,Travel Services,Consumer Cyclical,607.0


In [102]:
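# Strip the ".KL" suffix to restore the original stock codes
df["stock_code"] = df["stock_code"].str[:-3]
# Remove the spaces from the phone numbers
df['phone'] = df['phone'].str.replace(" ", "")
df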

Unnamed: 0,stock_name,stock_code,address1,address2,city,zip,phone,website,industry,sector,fullTimeEmployees
0,7-ELEVEN MALAYSIA HOLDINGS BERHAD,5250,Level 3A,"Podium Block Plaza Berjaya No. 12, Jalan Imbi",Kuala Lumpur,55100.0,60321421136,https://www.7eleven.com.my,Grocery Stores,Consumer Defensive,10511.0
1,ABF MALAYSIA BOND INDEX FUND,0800EA,,,,,,,,,
2,ABLE GLOBAL BERHAD,7167,PTD 124298,Jalan Kempas Lama Kampung Seelong Jaya,Sekudai,81300.0,6075998990,https://www.ableglobalbhd.com.my,Packaged Foods,Consumer Defensive,199.0
3,ABLEGROUP BERHAD,7086,Block D4-U2-10,"Level 2 Solaris Dutamas No. 1, Jalan Dutamas 1",Kuala Lumpur,50480.0,60362078186,https://www.ablegroup.com.my,Building Products & Equipment,Industrials,
4,ABM FUJIYA BERHAD,5198,"Lot 2224, Section 66",Lorong Pangkalan Off Jalan Pangkalan Pending Industrial Estate,Kuching,93450.0,6082333344,https://www.abmfujiya.com.my,Auto Parts,Consumer Cyclical,
5,ACE INNOVATE ASIA BERHAD,03028,No. 19-1,Jalan USJ 10/1D Taipan Business Centre,Subang Jaya,47620.0,60380817198,https://goinno2u.com,Capital Markets,Financial Services,
6,ACME HOLDINGS BERHAD,7131,488A-16-01 Office Tower,Kompleks Midlands Park Jalan Burma,Georgetown,10350.0,6042109911,https://acmeholdings.com.my,Packaging & Containers,Consumer Cyclical,
7,ACO GROUP BERHAD,0218,"PLO 264, No. 14",Jalan Firma 3 Kawasan Perindustrian Tebrau 4,Johor Bahru,81100.0,6073619399,https://www.acogroup.com.my,Electronics & Computer Distribution,Technology,150.0
8,ADVANCE INFORMATION MARKETING BERHAD,0122,"No. 18, Jalan Balam",,Kuala Lumpur,51100.0,60340432699,https://www.aim-net.com.my,Specialty Business Services,Industrials,
9,ADVANCE SYNERGY BERHAD,1481,Synergy 9,Ground Floor 9 Jalan Kajibumi U1/70 Temasya Glenmarie,Shah Alam,40150.0,60351928822,https://www.asb.com.my,Travel Services,Consumer Cyclical,607.0


In [104]:
data_types = df.dtypes
print(data_types)


stock_name            object
stock_code            object
address1              object
address2              object
city                  object
zip                   object
phone                 object
website               object
industry              object
sector                object
fullTimeEmployees    float64
dtype: object
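
The `fullTimeEmployees` column is `float64` because missing values are stored as NaN, which forces a floating-point column. If whole numbers are preferred, one option is pandas' nullable integer type. This is a sketch on a toy Series with values copied from the table above, not a step from the original workflow.

```python
import pandas as pd

# Two employee counts from the table above plus one missing value.
employees = pd.Series([10511.0, None, 199.0], name="fullTimeEmployees")
print(employees.dtype)  # float64, because NaN cannot live in a plain int column

# "Int64" (capital I) is the nullable integer dtype; missing entries become <NA>.
as_nullable_int = employees.astype("Int64")
print(as_nullable_int)
```

The conversion only succeeds when every non-missing value is a whole number, which should hold for head counts.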


In [105]:
# Specify the file path where you want to save the CSV file
csv_file_path = 'exports/stocks_with_information.csv'

# Use the to_csv() method to export the DataFrame to a CSV file
df.to_csv(csv_file_path, index=False)  # Set index=False to exclude the DataFrame index from the CSV

print(f"DataFrame saved to {csv_file_path}")

DataFrame saved to exports/stocks_with_information.csv
