### Financial Data Importation and Processing
- This notebook is the precursor to the 'Stock_Clustering_Algo' notebook. It presents the method I used to web-scrape key financial information for different companies from Yahoo Finance.

Note: This web-scraping script may take some time to gather and process the data. If you only want to run the 'Stock_Clustering_Algo' notebook in this repository, you can instead download the 'yf_nzx50.csv' file from the same repository. The uploaded csv file contains financial information extracted at an earlier date, whereas this script extracts the information available on the date it is run, so results will differ depending on when the data is captured.

In [1]:
# Import necessary libraries
import pandas as pd
import yfinance as yf

In [2]:
# Reading the HTML table of NZX 50 stock information from Wikipedia (the second table on the page)
data = pd.read_html('https://en.wikipedia.org/wiki/NZX_50_Index')[1]

# Print first few rows of web-scraped html table
data.head()

Unnamed: 0,Ticker symbol,Company,Sector
0,AIA.NZ,Auckland International Airport Limited,Airport Services
1,AIR.NZ,Air New Zealand Limited,Airlines
2,ANZ.NZ,Australia and New Zealand Banking Group Limited,Diversified Banks
3,ARG.NZ,Argosy Property Limited,Diversified REITs
4,ARV.NZ,Arvida Group Limited,Health Care Facilities


In [3]:
# Converting the tickers in the table to a list, in preparation for downloading data from Yahoo Finance
tickers = data['Ticker symbol'].to_list()

# Removing the FSF and MET tickers because Yahoo Finance does not provide financials for these stocks
tickers.remove('FSF.NZ')
tickers.remove('MET.NZ')
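
Note that `list.remove` raises a `ValueError` when the symbol is not in the list, which could happen if the Wikipedia table changes. A minimal sketch of a more tolerant filter follows; the helper name `drop_tickers` and the sample list are illustrative assumptions, not part of the original notebook:

```python
def drop_tickers(tickers, excluded):
    """Return the tickers without the excluded symbols, preserving order."""
    excluded = set(excluded)
    return [t for t in tickers if t not in excluded]

# Illustrative list only; in the notebook this would be the scraped `tickers`
sample = ['AIA.NZ', 'FSF.NZ', 'AIR.NZ']
filtered = drop_tickers(sample, ['FSF.NZ', 'MET.NZ'])
print(filtered)
```

Unlike `remove`, this does not fail when a symbol such as 'MET.NZ' is already absent.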

In [4]:
# Testing our data extraction method for our first ticker AIA (Auckland International Airport)
AIA = yf.Ticker('AIA.NZ')

# Printing all of the scraped data for this ticker
AIA.info

{'zip': '2022',
 'sector': 'Industrials',
 'fullTimeEmployees': 527,
 'longBusinessSummary': 'Auckland International Airport Limited provides airport facilities, supporting infrastructure, and aeronautical services in Auckland, New Zealand. The company operates through three segments: Aeronautical, Retail, and Property. The Aeronautical segment offers services that facilitate the movement of aircraft, passengers, and cargo, as well as utility services, which support the airport; and leases space for facilities, such as terminals. The Retail segment provides services to the retailers within the terminals; and car parking facilities for passengers, visitors, and airport staff. The Property segment leases cargo buildings, hangars, and stand-alone investment properties. The company was founded in 1988 and is based in Manukau, New Zealand.',
 'city': 'Manukau',
 'phone': '64 9 275 0789',
 'country': 'New Zealand',
 'companyOfficers': [],
 'website': 'http://www.aucklandairport.co.nz',
 'max
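
The `info` dictionary holds many fields, and not every field is present for every company. As a rough sketch, a small helper could keep only the fields of interest and fill missing ones with `None`; the name `pick_fields` and the choice of fields are assumptions made for illustration:

```python
def pick_fields(info, fields):
    """Keep only the requested keys; keys missing from `info` become None."""
    return {key: info.get(key) for key in fields}

# A few keys copied from the output above; a real `info` dict has many more
sample_info = {'zip': '2022', 'sector': 'Industrials', 'fullTimeEmployees': 527}
subset = pick_fields(sample_info, ['sector', 'fullTimeEmployees', 'pegRatio'])
print(subset)
```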

In [5]:
# Creating a final dataframe that will hold all of the information of each ticker in our list
finaldf = pd.Series(AIA.info, name = 'AIA').to_frame().T
finaldf

Unnamed: 0,zip,sector,fullTimeEmployees,longBusinessSummary,city,phone,country,companyOfficers,website,maxAge,...,dateShortInterest,pegRatio,lastCapGain,shortPercentOfFloat,sharesShortPriorMonth,impliedSharesOutstanding,category,fiveYearAverageReturn,regularMarketPrice,logo_url
AIA,2022,Industrials,527,Auckland International Airport Limited provide...,Manukau,64 9 275 0789,New Zealand,[],http://www.aucklandairport.co.nz,1,...,,,,,,,,,7.05,https://logo.clearbit.com/aucklandairport.co.nz


In [6]:
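# Loop for importing financial information and metrics of all tickers
for ticker in tickers:
    try:
        finaldf.loc[ticker] = pd.Series(yf.Ticker(ticker).info)
    except Exception as exc:
        # Report the failing symbol and the reason, then continue with the rest
        print("Error for the following symbol: {} ({})".format(ticker, exc))

# The test row 'AIA' from In [5] duplicates 'AIA.NZ', so drop it once the loop has re-fetched it
if 'AIA.NZ' in finaldf.index and 'AIA' in finaldf.index:
    finaldf = finaldf.drop(index='AIA')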
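
One possible variation on this loop is to gather the results in a dictionary and record which symbols failed, so they can be inspected or retried later. The sketch below is not the method used in this notebook; `collect_info` and `fake_fetch` are hypothetical names, and the fetcher is a stand-in so the example runs without network access:

```python
def collect_info(tickers, fetch):
    """Call fetch(symbol) for each ticker and return (records, failed)."""
    records, failed = {}, []
    for symbol in tickers:
        try:
            records[symbol] = fetch(symbol)
        except Exception as exc:
            # Keep the symbol and the reason so the failure can be reviewed later
            failed.append((symbol, repr(exc)))
    return records, failed

# Stand-in fetcher for illustration; in the notebook it would be
# something like `lambda s: yf.Ticker(s).info`
def fake_fetch(symbol):
    if symbol == 'BAD.NZ':
        raise KeyError(symbol)
    return {'symbol': symbol}

records, failed = collect_info(['AIA.NZ', 'BAD.NZ'], fake_fetch)
print(len(records), len(failed))
```

The `records` dictionary could then be converted in one step with `pd.DataFrame.from_dict(records, orient='index')`, which avoids growing the dataframe one row at a time.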

In [7]:
# Uncomment the following code to save csv file
# finaldf.to_csv('yf_nzx50.csv')
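
Because the results depend on the date the data is captured, it may help to record that date in the file name. A minimal sketch, assuming a hypothetical helper `snapshot_filename` (the 'Stock_Clustering_Algo' notebook expects the plain 'yf_nzx50.csv' name, so a dated copy would be kept in addition to it):

```python
from datetime import date

def snapshot_filename(stem, when=None):
    """Build a csv file name that includes the capture date in ISO format."""
    when = when or date.today()
    return f"{stem}_{when.isoformat()}.csv"

# Fixed date used here only so the example is reproducible
print(snapshot_filename('yf_nzx50', date(2022, 1, 31)))  # yf_nzx50_2022-01-31.csv
```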