<a href="https://colab.research.google.com/github/JerryChenz/Screener_Proc_v1/blob/master/clean_tickers.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
"""
Process an Excel file containing HK stock symbols to extract unique tickers.

This function reads an Excel file, filters the data based on specific criteria,
and extracts unique tickers into a JSON file. The JSON file can then be used
by another program as a list of tickers.

Steps:
1. Filter the data by "Category", selecting only rows where the value is 'Equity'.
2. Filter the data by "Trading Currency", selecting only rows where the value is 'HKD'.
3. Extract unique tickers from the filtered data.
4. Save the unique tickers to a JSON file.

Args:
    input_file (str): Path to the input Excel file.
    output_file (str): Path to the output JSON file.

Returns:
    None

Raises:
    FileNotFoundError: If the input file does not exist.
    ValueError: If the input file is not a valid Excel file or if the required columns are missing.

Note:
    This function assumes that the input Excel file contains columns named 'Category' and 'Trading Currency'.
    The function also assumes that the ticker symbols are stored in a column named 'Ticker'.
"""

import pandas as pd
import json

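# Not used below: the last cell passes its own URL. A github.com ".../blob/..." URL
# serves the HTML page, so '?raw=true' is appended to fetch the file itself.
input_file_url = 'https://github.com/JerryChenz/Screener_Proc_v1/blob/master/data/ticker_library/source/HKEX_ListOfSecurities.xlsx?raw=true'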
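A github.com `.../blob/...` URL returns the HTML page that displays the file rather than the file itself, so `pd.read_excel` cannot parse it; the last cell works around this by appending `?raw=true`, which GitHub redirects to the raw file. An alternative is to build the raw.githubusercontent.com URL directly. A minimal sketch, where `to_raw_github_url` is a hypothetical helper that assumes the usual `owner/repo/blob/<branch>/<path>` layout and a branch name without slashes:

```python
from urllib.parse import urlparse


def to_raw_github_url(url: str) -> str:
    """Hypothetical helper: map a github.com 'blob' URL to raw.githubusercontent.com."""
    parts = urlparse(url)
    if parts.netloc != "github.com":
        return url  # local path or non-GitHub URL: leave untouched
    # Assumed layout: owner/repo/blob/<branch>/<path...>; the query (e.g. ?raw=true) is dropped
    segments = parts.path.strip("/").split("/")
    if len(segments) < 5 or segments[2] != "blob":
        return url
    # Assumes the branch name contains no '/', otherwise the split is ambiguous
    owner, repo, _, branch, *path = segments
    return f"https://raw.githubusercontent.com/{owner}/{repo}/{branch}/{'/'.join(path)}"
```

It could be used as `pd.read_excel(to_raw_github_url(input_file_url), engine='openpyxl', header=2)`; local paths pass through unchanged.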

In [None]:
def process_hk_stock_symbols(input_file_url, output_file):
    """
    Process an Excel file containing HK stock symbols to extract unique tickers.

    Reads the HKEX list of securities from a URL or local path, keeps only
    HKD-traded equities, and writes their unique tickers to a JSON file that
    another program can load as a list of tickers.

    Args:
        input_file_url (str): URL or local path to the input Excel file. A GitHub
            URL must point at the raw file (e.g. by appending '?raw=true').
        output_file (str): Path to the output JSON file.

    Returns:
        None

    Raises:
        ValueError: If a required column is missing from the sheet.
        Exception: Any error raised while reading the Excel file or writing the
            JSON file is printed and then re-raised.
    """
    required_columns = ['Category', 'Trading Currency', 'Stock Code']

    try:
        # Read the sheet; header=2 uses its third row as the column names
        # and discards the rows above it
        df = pd.read_excel(input_file_url, engine='openpyxl', header=2)

        # Fail early with a clear message if the layout has changed
        missing = [col for col in required_columns if col not in df.columns]
        if missing:
            raise ValueError(f"Missing required columns: {missing}")

        # Filter by "Category" - keep only 'Equity'
        equity_df = df[df['Category'] == 'Equity']

        # Filter by "Trading Currency" - keep only 'HKD'
        hkd_df = equity_df[equity_df['Trading Currency'] == 'HKD']

        # Format each unique stock code as a zero-padded ticker, e.g. 1 -> '0001.HK';
        # dropna() skips blank codes, which int() cannot convert
        unique_tickers = [f"{str(int(ticker)).zfill(4)}.HK"
                          for ticker in hkd_df['Stock Code'].dropna().unique()]

        # Save the unique tickers to a JSON file
        with open(output_file, 'w') as json_file:
            json.dump(unique_tickers, json_file, indent=4)

        print(f"Successfully saved {len(unique_tickers)} unique tickers to {output_file}")

    except Exception as e:
        print(f"An error occurred: {e}")
        raise

In [None]:
hk_tickers_source = 'https://github.com/JerryChenz/Screener_Proc_v1/blob/master/data/ticker_library/HKEX_ListOfSecurities.xlsx?raw=true'
output_json = 'hk_unique_tickers.json'
process_hk_stock_symbols(hk_tickers_source, output_json)

Successfully saved 2646 unique tickers to hk_unique_tickers.json
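On the consumer side, the JSON file is a plain list of strings. A minimal sketch of how another program might load and sanity-check it; `load_tickers` and the `TICKER_RE` pattern (four or more digits followed by `.HK`) are hypothetical, chosen to match the format produced above:

```python
import json
import re

# Assumed entry shape: four or more digits followed by '.HK'
TICKER_RE = re.compile(r"\d{4,}\.HK")


def load_tickers(path: str) -> list[str]:
    """Hypothetical consumer: load the JSON ticker list and check every entry."""
    with open(path, encoding="utf-8") as fh:
        tickers = json.load(fh)
    if not isinstance(tickers, list):
        raise ValueError(f"{path} does not contain a JSON list")
    # Collect entries that are not strings or do not match the expected pattern
    bad = [t for t in tickers if not (isinstance(t, str) and TICKER_RE.fullmatch(t))]
    if bad:
        raise ValueError(f"Unexpected ticker format: {bad[:5]}")
    return tickers
```

For the file written above, `tickers = load_tickers('hk_unique_tickers.json')` would return the saved list, raising `ValueError` instead of passing malformed entries downstream.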
