Name: Wing Him (Kinen) Kao
Net ID: wk229

Research Question: To what extent is it possible to accurately forecast windspeeds at different locations in Hong Kong during tropical cyclone impacts through analysis of previous data?

Raw Data Sources:
1. Hong Kong Observatory (HKO) Tropical Cyclone Reports
    - Maximum 1-hour sustained windspeeds
        - At all 32 stations during 76 of the tropical cyclones (TCs) between 2007-2020
    - Timing of highest windspeeds reached
        - At all 32 stations during 76 of the TCs between 2007-2020
    - Wind directions at timing of highest windspeeds reached
        - At all 32 stations during 76 of the TCs between 2007-2020
    - Link: https://www.hko.gov.hk/en/informtc/mangkhut18/tabwind.htm
        - And 75 other links with the same format
2. Hong Kong Observatory Tropical Cyclone Publications
    - Maximum 1-hour sustained windspeeds
        - At all 32 stations during 26 of the TCs between 2001-2006
    - Timing of highest windspeeds reached
        - At all 32 stations during 26 of the TCs between 2001-2006
    - Wind directions at timing of highest windspeeds reached
        - At all 32 stations during 26 of the TCs between 2001-2006
    - Maximum 10-minute sustained windspeeds
        - At 3 of the stations: King's Park, Hong Kong International Airport, Waglan Island
        - During all 102 TCs between 2001-2020
    - Timing, intensity, distance and bearing from Hong Kong (HK) at closest point of approach (CPA)
        - Of all 102 TCs between 2001-2020
    - Longitude, latitude and intensity at 6-hour time intervals
        - Of all 102 TCs between 2001-2020
    - Link: https://www.hko.gov.hk/en/publica/tc/files/TC2006.pdf
        - And 19 other links with the same format
3. Hong Kong Observatory Tropical Cyclone Warning Signals Database
    - Highest tropical cyclone signal issued by HKO
        - During all 102 TCs between 2001-2020
    - Link: https://www.hko.gov.hk/en/wxinfo/climat/warndb/warndb1.shtml
        - Set start year month as 200101 and end year month as 202010
4. Wikipedia Page of Strong Wind Signal No. 3
    - Maximum 10-minute sustained windspeeds
        - At 8 of the stations: 
            - Cheung Chau, HK Airport, Sai Kung, Kai Tak, Tsing Yi Shell Oil Depot, Lau Fa Shan, Sha Tin, Ta Kwu Ling
        - During 54 of the TCs that caused HKO to issue Strong Wind Signal No.3 or above between 2007-2020
    - Link: https://zh.wikipedia.org/wiki/三號強風信號
5. Joint Typhoon Warning Center (JTWC) Northwest Pacific Ocean Best Track Data
    - Radius of maximum winds (R_max) at 6-hour time intervals
        - Of all 102 TCs between 2001-2020
        - A variable used as a measure of the size of TCs
    - Link: https://www.metoc.navy.mil/jtwc/jtwc.html?western-pacific
        - Containing 19 folders for use
        - 102 separate txt files for use for each TC

Raw Data Sets:
1. 102 csv files of max 1-hour sustained windspeeds, timing and wind directions at 32 stations during all 102 TCs
    - 1 separate csv file for each of the 102 tropical cyclones
    - Either copied table from HKO Tropical Cyclone Reports into csv file
        - Table available in webpage format, instead of excel format
        - Manual work needed to fix formatting and edit out irregularities in the csv file
    - Or manually entered values found from HKO Tropical Cyclone Publications into csv file
        - Unable to copy table from a PDF file
        - Manual work needed to enter values from the PDF file into the csv file
    - File Name: TC Code_TC Name.csv
        - e.g. 1822_Mangkhut, 1713_Hato, 1208_Vicente, etc.
2. 1 csv file of max 10-min sustained windspeeds at 10 stations during 54 TCs
    - First copied the table from the Wikipedia page of Strong Wind Signal No. 3 containing 8 reference windspeed stations
        - Cheung Chau, HK Airport, Sai Kung, Kai Tak, Tsing Yi Shell Oil Depot, Lau Fa Shan, Sha Tin, Ta Kwu Ling
        - Only data of 54 of the TCs that caused Strong Wind Signal No.3 or above from 2007 onward available 
    - Then manually entered values of 2 other windspeed stations found from HKO Tropical Cyclone Publications
        - King's Park, Waglan Island
        - Hong Kong International Airport values already entered from the Wikipedia page
        - Unable to copy table from a PDF file
        - Manual work needed to enter values from the PDF file into the csv file
    - File Name: Ten_Min_Windspeed_Data.csv
3. 1 csv file of highest TC signal issued, timing, intensity, radius of max winds, distance and bearing at CPA
    - Highest TC signal issued by HKO manually entered from HKO Tropical Cyclone Warning Signals Database
        - Formatting within that webpage unsuitable for copying entire table
        - Manual work needed to enter values from the table into the csv file
    - Timing, intensity, distance and bearing from Hong Kong (HK) at closest point of approach (CPA)
        - Unable to copy table from a PDF file
        - Manual work needed to enter values from the PDF file into the csv file
    - Timing, intensity, distance and bearing from Hong Kong (HK) at point of highest windspeeds in Hong Kong
        - Replace data at CPA with data at point of highest winds if its timing significantly differs from CPA timing
            - Significantly differs defined as more than 6 hours apart normally
            - Or more than 3 hours apart if CPA distance is less than 100km from Hong Kong
            - Eliminate scenario of CPA data not representative of conditions during TC's highest impact to Hong Kong
            - Timing of highest windspeeds obtained by averaging all timings of highest windspeeds at each station
            - Use another Python program to calculate distance and bearing from HK using longitude and latitude
        - Unable to copy table from a PDF file
        - Manual work needed to enter values from the PDF file into the csv file
    - Radius of maximum winds at closest point of approach or point of highest windspeeds in Hong Kong
        - Manual work needed to enter values from the 102 separate txt files into the csv file
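
The distance and bearing calculation and the CPA replacement rule described above can be sketched as follows. This is a minimal sketch, not the contents of the actual calculator program: the haversine and initial-bearing formulas are a common choice but are assumed here, the Hong Kong reference coordinates are an assumption (approximately the HKO headquarters), and the function names are hypothetical.

```python
import math

# Assumed reference point for Hong Kong (approximately the HKO headquarters);
# the original calculator may use a different point.
HK_LAT, HK_LON = 22.302, 114.174
EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def distance_bearing_from_hk(lat, lon):
    """Return (great-circle distance in km, initial bearing in degrees) from HK to (lat, lon)."""
    phi1, phi2 = math.radians(HK_LAT), math.radians(lat)
    dphi = phi2 - phi1
    dlon = math.radians(lon - HK_LON)
    # Haversine formula for the great-circle distance
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlon / 2) ** 2
    distance = 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))
    # Initial bearing, measured clockwise from north
    y = math.sin(dlon) * math.cos(phi2)
    x = math.cos(phi1) * math.sin(phi2) - math.sin(phi1) * math.cos(phi2) * math.cos(dlon)
    bearing = (math.degrees(math.atan2(y, x)) + 360) % 360
    return distance, bearing


def timing_differs_significantly(hours_apart, cpa_distance_km):
    """More than 6 hours apart normally, or more than 3 hours if CPA is within 100 km."""
    threshold_hours = 3 if cpa_distance_km < 100 else 6
    return abs(hours_apart) > threshold_hours
```

For example, `distance_bearing_from_hk(23.302, 114.174)` describes a point one degree of latitude due north of the assumed reference point, so the bearing is 0 and the distance is about 111 km.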

Data Cleaning Process Part 1:
The first part of the data cleaning process is organizing the 1-hour sustained windspeeds data from 102 csv files into one single file containing all the 1-hour sustained windspeeds, and one single file containing all the wind directions. This part of the data cleaning process is done in the Jupyter file named One_Hour_Windspeed_Data.ipynb.

First, I open the csv file using pandas.read_csv and transfer the data in the dataframe (windspeed station names, 1-hour sustained windspeeds, wind directions and timings of highest windspeeds) into lists. Then, by matching these station names against a predefined list of available windspeed stations, I append each windspeed value to a 2D list made up of 32 sub-lists, one per windspeed station, with each sub-list holding that station's windspeed for each tropical cyclone. For windspeed stations that have missing data, I append numpy.nan to the sub-list. After that, I find the average of the timings of highest windspeeds across all stations, in order to check whether this timing is close enough to the timing of the closest point of approach (CPA). If not, I replace the intensity, distance and bearing from Hong Kong at CPA with the values of those variables at the timing of highest windspeeds, calculating the distance and bearing from HK using another Python program called Distance_Bearing_Calculator.ipynb.
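
The station-matching step for a single tropical cyclone might look roughly like the sketch below. The column names, the three-station list, the year and every value in the sample table are made up for illustration and are not real observations; the actual csv files and the 32-station list may be laid out differently.

```python
import io

import numpy
import pandas

# Hypothetical subset of the predefined station list (the real list has 32 names)
STATIONS = ["Cheung Chau", "Kai Tak", "Waglan Island"]

# Stand-in for one per-cyclone csv file; column names and values are illustrative
csv_text = io.StringIO(
    "Station,Windspeed,Direction,Time\n"
    "Cheung Chau,85,E,01 Aug 12:00\n"
    "Waglan Island,161,ENE,01 Aug 10:00\n"
)
frame = pandas.read_csv(csv_text)

# One sub-list per station; each sub-list gains one value per tropical cyclone
windspeeds = [[] for _ in STATIONS]
speed_by_station = dict(zip(frame["Station"], frame["Windspeed"]))
for index, station in enumerate(STATIONS):
    # Stations missing from this cyclone's file get numpy.nan
    windspeeds[index].append(speed_by_station.get(station, numpy.nan))

# Average the timings of highest windspeeds over the stations that reported;
# the year is assumed here because the sample times do not include one
times = pandas.to_datetime("2010 " + frame["Time"], format="%Y %d %b %H:%M")
mean_time = times.mean()
```

Looking values up by station name, rather than by row position, keeps each value in the right sub-list even when a file omits a station or lists stations in a different order.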

Then, I repeat this process for all 102 csv files, one for each of the 102 tropical cyclones, by copying and pasting the same three cells of code and replacing the names of the input file and lists with those of the next tropical cyclone.

After finishing this process for all 102 csv files, I output all the 1-hour sustained windspeeds data into a new csv file named One_Hour_Windspeed_Data_Output.csv, with one row for each of the 102 tropical cyclones and one column for each of the 32 windspeed stations. I also count the number of times each wind direction occurs for each tropical cyclone, then output the wind direction counts into another new csv file named Wind_Direction_Data_Output.csv, again with one row for each of the 102 tropical cyclones and one column for each of the 16 wind directions.
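
The wind direction count can be sketched as below, assuming the directions are stored as 16-point compass abbreviations. The cyclone names and direction lists are hypothetical placeholders, and `collections.Counter` is one possible way to do the tally rather than necessarily the one used in the notebook.

```python
from collections import Counter

import pandas

# The 16 compass points, clockwise from north (assumed abbreviations)
DIRECTIONS = ["N", "NNE", "NE", "ENE", "E", "ESE", "SE", "SSE",
              "S", "SSW", "SW", "WSW", "W", "WNW", "NW", "NNW"]

# Hypothetical input: station wind directions at the time of highest windspeed,
# keyed by tropical cyclone
directions_by_tc = {
    "TC_A": ["E", "E", "ENE", "SE"],
    "TC_B": ["N", "NNE", "N"],
}

rows = {}
for tc_name, directions in directions_by_tc.items():
    counts = Counter(directions)
    # Directions that never occur get a count of 0
    rows[tc_name] = [counts.get(direction, 0) for direction in DIRECTIONS]

# One row per tropical cyclone, one column per wind direction
direction_counts = pandas.DataFrame.from_dict(rows, orient="index", columns=DIRECTIONS)

# With no path argument, to_csv returns the csv text instead of writing a file
csv_output = direction_counts.to_csv()
```

Passing a file name such as Wind_Direction_Data_Output.csv to `to_csv` would write the same table to disk.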

Data Cleaning Process Part 2:
The second part of the data cleaning process is converting the 1-hour sustained windspeeds data into 10-minute sustained windspeeds data. Most of the available data are expressed as 1-hour sustained windspeeds, meaning the average windspeed over a 1-hour period. However, the Hong Kong Observatory uses 10-minute sustained windspeeds, meaning the average windspeed over a 10-minute period, when issuing tropical cyclone signals. Therefore, I have to convert the 1-hour sustained windspeeds data into 10-minute sustained windspeeds data and output the result into a new csv file. This part of the data cleaning process is done in the Jupyter file named Ten_Min_Windspeed_Data.ipynb.



In [None]:
import numpy
import random
from matplotlib import pyplot
from matplotlib import colors
import matplotlib.dates as mdates
import matplotlib.ticker as mticker
import datetime
import cartopy.crs
from cartopy.mpl.gridliner import LONGITUDE_FORMATTER, LATITUDE_FORMATTER
from PIL import Image, ImageDraw
import pandas