### Code to read data directly from the Market Index website
##### Put the data into a DataFrame
###### Do some data cleaning
######  1. Remove "$" from the Price column and convert it to float
######  2. Remove "+" and "%" from the 1 Day column and convert it to float
######  3. Remove "+" and "%" from the 1 Week column and convert it to float
######  4. Remove "+" and "%" from the 1 Month column and convert it to float
######  5. Remove "+" and "%" from the 1 Year column and convert it to float
###### Export the clean data to CSV

In [1]:
# Dependencies
import pandas as pd
from splinter import Browser
from bs4 import BeautifulSoup

Find all the data for the ASX tickers listed on this website:

https://www.marketindex.com.au/asx-listed-companies

#### Code to retrieve all the ASX data directly from the Market Index website and put it into a DataFrame

In [2]:
# Set up Splinter
browser = Browser('chrome')

# Define the URL of the website you want to scrape
url = "https://www.marketindex.com.au/asx-listed-companies"

# Use Splinter to open the URL
browser.visit(url)

# Get the HTML content from the page
html_content = browser.html

# Close the browser now that the HTML has been captured
browser.quit()

# Create a BeautifulSoup object to parse the HTML
soup = BeautifulSoup(html_content, 'html.parser')

# Find the main table element
table = soup.find('table', {'class': 'mi-table mt-6'})

# Check if the table is found
if table:
    # Extract table rows
    rows = table.find_all('tr')
    
    # Initialize empty lists to store data
    rank = []
    code = []
    company = []
    price = []
    day_1 = []
    week_1 = []
    month_1 = []
    year_1 = []
    sector = []
    market_cap = []

    # Loop through rows and extract data
    for row in rows[1:]:  # Start from the second row to skip the header
        columns = row.find_all('td')
        rank.append(columns[0].get_text())
        code.append(columns[2].get_text())
        company.append(columns[3].get_text())
        price.append(columns[4].get_text())
        day_1.append(columns[5].get_text())
        week_1.append(columns[6].get_text())
        month_1.append(columns[7].get_text())
        year_1.append(columns[8].get_text())
        sector.append(columns[9].get_text())
        market_cap.append(columns[10].get_text())

    # Create a DataFrame
    data = {
        'Rank': rank,
        'Code': code,
        'Company': company,
        'Price': price,
        '1 Day': day_1,
        '1 Week': week_1,
        '1 Month': month_1,
        '1 Year': year_1,
        'Sector': sector,
        'Mkt Cap': market_cap
    }

    df = pd.DataFrame(data)

    # Display the DataFrame
    print(df.head(10))
else:
    print("The table with class 'mi-table mt-6' was not found on the page.")


  Rank Code                            Company     Price   1 Day  1 Week  \
0    1  BHP                   BHPBHP Group Ltd    $45.58  +0.44%  +1.45%   
1    2  CBA  CBACommonwealth Bank of Australia    $99.84  +1.43%  +2.93%   
2    3  CSL                         CSLCSL Ltd  $243.445  +2.25%  +3.89%   
3    4  NAB     NABNational Australia Bank Ltd    $28.98  +1.44%  +2.77%   
4    5  ANZ          ANZANZ Group Holdings Ltd   $25.605  +0.93%  +3.29%   
5    6  WBC     WBCWestpac Banking Corporation    $21.47  +1.42%  +4.38%   
6    7  FMG      FMGFortescue Metals Group Ltd    $23.25   0.00%  +4.68%   
7    8  WDS       WDSWoodside Energy Group Ltd    $34.00  +0.44%  -2.58%   
8    9  MQG             MQGMacquarie Group Ltd   $161.95  +0.97%  +0.34%   
9   10  WES                  WESWesfarmers Ltd    $52.24  +1.58%  +3.71%   

   1 Month   1 Year                  Sector   Mkt Cap  
0   +2.20%  +20.01%               Materials  $231.03B  
1   -0.01%   -4.46%              Financials  $167.2
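
The loop above indexes `columns[0]` through `columns[10]` directly, so a row with fewer cells would raise an `IndexError`. The following is a minimal sketch of a more defensive extraction, not the notebook's own method. The `SAMPLE_HTML` string, the `extract_rows` helper, and the `min_cells` parameter are hypothetical names introduced for illustration, and the sample markup is a trimmed-down stand-in that may not match the live page.

```python
from bs4 import BeautifulSoup

# Hypothetical, trimmed-down stand-in for the page's table; the real markup may differ.
SAMPLE_HTML = """
<table class="mi-table mt-6">
  <tr><th>Rank</th><th></th><th>Code</th><th>Company</th><th>Price</th></tr>
  <tr><td>1</td><td></td><td>BHP</td><td>BHP Group Ltd</td><td> $45.58 </td></tr>
  <tr><td colspan="5">spacer row</td></tr>
  <tr><td>2</td><td></td><td>CBA</td><td>Commonwealth Bank of Australia</td><td>$99.84</td></tr>
</table>
"""

# Assumption: 5 cells per data row in this sample; the loop above needs at least 11.
EXPECTED_CELLS = 5


def extract_rows(html, min_cells):
    """Return the stripped cell texts of every row that has at least min_cells <td> cells."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", {"class": "mi-table mt-6"})
    if table is None:
        return []
    records = []
    for row in table.find_all("tr"):
        cells = row.find_all("td")
        if len(cells) < min_cells:
            continue  # skips the header row (th cells only) and short spacer rows
        records.append([cell.get_text(strip=True) for cell in cells])
    return records


rows = extract_rows(SAMPLE_HTML, EXPECTED_CELLS)
for record in rows:
    print(record)
```

Passing `strip=True` to `get_text` also removes the padding whitespace around each value, which simplifies the later numeric conversion.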

In [3]:
df.head(10)

|   | Rank | Code | Company | Price | 1 Day | 1 Week | 1 Month | 1 Year | Sector | Mkt Cap |
|---|------|------|---------|-------|-------|--------|---------|--------|--------|---------|
| 0 | 1 | BHP | BHPBHP Group Ltd | $45.58 | +0.44% | +1.45% | +2.20% | +20.01% | Materials | $231.03B |
| 1 | 2 | CBA | CBACommonwealth Bank of Australia | $99.84 | +1.43% | +2.93% | -0.01% | -4.46% | Financials | $167.28B |
| 2 | 3 | CSL | CSLCSL Ltd | $243.445 | +2.25% | +3.89% | -1.18% | -12.54% | Health Care | $117.6B |
| 3 | 4 | NAB | NABNational Australia Bank Ltd | $28.98 | +1.44% | +2.77% | +0.07% | -9.86% | Financials | $90.68B |
| 4 | 5 | ANZ | ANZANZ Group Holdings Ltd | $25.605 | +0.93% | +3.29% | +0.45% | -0.29% | Financials | $76.95B |
| 5 | 6 | WBC | WBCWestpac Banking Corporation | $21.47 | +1.42% | +4.38% | +1.61% | -10.21% | Financials | $75.34B |
| 6 | 7 | FMG | FMGFortescue Metals Group Ltd | $23.25 | 0.00% | +4.68% | +10.40% | +52.16% | Materials | $71.59B |
| 7 | 8 | WDS | WDSWoodside Energy Group Ltd | $34.00 | +0.44% | -2.58% | -6.21% | -7.46% | Energy | $64.56B |
| 8 | 9 | MQG | MQGMacquarie Group Ltd | $161.95 | +0.97% | +0.34% | -3.52% | -2.70% | Financials | $62.59B |
| 9 | 10 | WES | WESWesfarmers Ltd | $52.24 | +1.58% | +3.71% | -0.59% | +16.01% | Consumer Discretionary | $59.24B |
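
In the output above, each Company value starts with the ticker fused onto the name (for example `BHPBHP Group Ltd`), presumably because the cell holds both pieces of text and `get_text()` concatenates them. One possible way to separate them is sketched below, using a few rows copied from that output. The `strip_code_prefix` helper is a hypothetical name, and the approach assumes every scraped Company value begins with its own Code.

```python
import pandas as pd

# A few rows copied from the scraped output above.
sample = pd.DataFrame({
    "Code": ["BHP", "CBA", "CSL"],
    "Company": [
        "BHPBHP Group Ltd",
        "CBACommonwealth Bank of Australia",
        "CSLCSL Ltd",
    ],
})


def strip_code_prefix(code, company):
    """Drop the ticker from the start of the company text when it is fused on."""
    if company.startswith(code):
        return company[len(code):].strip()
    return company.strip()


sample["Company"] = [
    strip_code_prefix(code, company)
    for code, company in zip(sample["Code"], sample["Company"])
]
print(sample)
```

This would wrongly shorten a name that legitimately starts with its ticker and was not fused, so it is worth spot-checking the result before relying on it.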


In [4]:
print(df.dtypes)
print(df.describe())

Rank       object
Code       object
Company    object
Price      object
1 Day      object
1 Week     object
1 Month    object
1 Year     object
Sector     object
Mkt Cap    object
dtype: object
        Rank  Code           Company   Price  1 Day 1 Week 1 Month 1 Year  \
count   2408  2408              2408    2408   2408   2408    2408   2408   
unique  2408  2408              2408    1039    703   1021    1230   1785   
top        1   BHP  BHPBHP Group Ltd  $0.005  0.00%  0.00%   0.00%  0.00%   
freq       1     1                 1      29    922    458     269     70   

           Sector Mkt Cap  
count        2408    2408  
unique         12    2144  
top     Materials  $1.18B  
freq          818       5  
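
The dtypes listing above shows every column stored as `object`, including Rank, so sorting by Rank would order the values as text. A small sketch of converting such a column with `pd.to_numeric` follows; the three-row `sample` frame is illustrative and stands in for the scraped `df`.

```python
import pandas as pd

# Rank values arrive as strings, as in the scraped DataFrame.
sample = pd.DataFrame({"Rank": ["10", "2", "1"]})

# As text, "10" sorts before "2".
text_order = sample["Rank"].sort_values().tolist()

# errors="coerce" turns anything unparseable into NaN instead of raising.
sample["Rank"] = pd.to_numeric(sample["Rank"], errors="coerce")
numeric_order = sample["Rank"].sort_values().tolist()

print(text_order)
print(numeric_order)
```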


In [5]:
# Check how many unique items are in the "Sector" column
unique_sectors_count = df['Sector'].nunique()
print(f"Number of unique sectors: {unique_sectors_count}")

# List the unique sectors
unique_sectors = df['Sector'].unique()
print("Unique sectors:")
for sector in unique_sectors:
    print(sector)

Number of unique sectors: 12
Unique sectors:
Materials
Financials
Health Care
Energy
Consumer Discretionary
Communication Services
Consumer Staples
Real Estate
Industrials
Information Technology
Utilities
N/A
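
The twelfth "sector" in the list above is the literal string `N/A`, which pandas treats as an ordinary value rather than as missing data. If it should count as missing, one option is to mask it out, as in this sketch on an illustrative `sample` frame (not the scraped `df`):

```python
import pandas as pd

sample = pd.DataFrame({"Sector": ["Materials", "N/A", "Financials", "N/A"]})

# Replace the placeholder string with a real missing value (NaN).
sample["Sector"] = sample["Sector"].mask(sample["Sector"] == "N/A")

missing_count = int(sample["Sector"].isna().sum())
named_sector_count = sample["Sector"].nunique()  # nunique() ignores missing values

print(f"Rows without a sector: {missing_count}")
print(f"Number of named sectors: {named_sector_count}")
```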


In [6]:
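# Remove the "$" from the "Mkt Cap" column (the "B" suffix is kept)
# regex=False treats "$" as a literal character, not as the end-of-string anchor
df['Mkt Cap'] = df['Mkt Cap'].str.replace('$', '', regex=False)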

# Rename the "Mkt Cap" column to "Mkt Cap ($)"
df.rename(columns={'Mkt Cap': 'Mkt Cap ($)'}, inplace=True)

In [7]:
df.head()

|   | Rank | Code | Company | Price | 1 Day | 1 Week | 1 Month | 1 Year | Sector | Mkt Cap ($) |
|---|------|------|---------|-------|-------|--------|---------|--------|--------|-------------|
| 0 | 1 | BHP | BHPBHP Group Ltd | $45.58 | +0.44% | +1.45% | +2.20% | +20.01% | Materials | 231.03B |
| 1 | 2 | CBA | CBACommonwealth Bank of Australia | $99.84 | +1.43% | +2.93% | -0.01% | -4.46% | Financials | 167.28B |
| 2 | 3 | CSL | CSLCSL Ltd | $243.445 | +2.25% | +3.89% | -1.18% | -12.54% | Health Care | 117.6B |
| 3 | 4 | NAB | NABNational Australia Bank Ltd | $28.98 | +1.44% | +2.77% | +0.07% | -9.86% | Financials | 90.68B |
| 4 | 5 | ANZ | ANZANZ Group Holdings Ltd | $25.605 | +0.93% | +3.29% | +0.45% | -0.29% | Financials | 76.95B |
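
The Price and percentage columns in the table above are still strings. The cleaning steps listed at the top of the notebook (remove `$`, `+`, and `%`, then convert to float) could be sketched as follows. The `to_float` helper is a hypothetical name, the sample rows are copied from the output above, and stripping `,` from Price is an added assumption in case larger prices carry thousands separators.

```python
import pandas as pd

# Rows for BHP, CSL and FMG, copied from the scraped output above.
sample = pd.DataFrame({
    "Price": ["$45.58", "$243.445", "$23.25"],
    "1 Day": ["+0.44%", "+2.25%", "0.00%"],
    "1 Week": ["+1.45%", "+3.89%", "+4.68%"],
    "1 Month": ["+2.20%", "-1.18%", "+10.40%"],
    "1 Year": ["+20.01%", "-12.54%", "+52.16%"],
})


def to_float(series, symbols):
    """Remove each literal symbol from a string column, then convert it to float."""
    cleaned = series.str.strip()
    for symbol in symbols:
        # regex=False so "$" and "+" are treated as plain characters
        cleaned = cleaned.str.replace(symbol, "", regex=False)
    # Values that still cannot be parsed become NaN instead of raising
    return pd.to_numeric(cleaned, errors="coerce")


sample["Price"] = to_float(sample["Price"], ["$", ","])
for column in ["1 Day", "1 Week", "1 Month", "1 Year"]:
    sample[column] = to_float(sample[column], ["+", "%"])

print(sample.dtypes)
print(sample)
```

The minus sign is left in place, so negative changes such as `-1.18%` stay negative after conversion.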


In [8]:
#### Export to CSV file

In [9]:
# Export the DataFrame to a CSV file
df.to_csv('Resources/ASX_Basic_Info.csv', index=False)
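
`to_csv` does not create missing folders, so the export above fails if `Resources/` does not exist yet. A hedged sketch of creating the folder first and reading the file back as a check is shown below. It writes to a temporary directory so that it can run anywhere; the `sample` frame and the `out_path` and `round_trip` names are illustrative.

```python
import tempfile
from pathlib import Path

import pandas as pd

sample = pd.DataFrame({
    "Code": ["BHP", "CBA"],
    "Sector": ["Materials", "Financials"],
})

with tempfile.TemporaryDirectory() as tmp:
    out_path = Path(tmp) / "Resources" / "ASX_Basic_Info.csv"

    # Create the target folder (and any parents) if it is missing
    out_path.parent.mkdir(parents=True, exist_ok=True)

    sample.to_csv(out_path, index=False)

    # Read the file back to confirm the export worked
    round_trip = pd.read_csv(out_path)

print(round_trip)
```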