<a href="https://www.kaggle.com/code/chriszhengao/data-processing-for-tropical-cyclone-size-dataset?scriptVersionId=146561176" target="_blank"><img align="left" alt="Kaggle" title="Open in Kaggle" src="https://kaggle.com/static/images/open-in-kaggle.svg"></a>

## *Downloading the Latest Version of the Data from the Tropical Cyclone Data Centre, China Meteorological Administration Website*

In [1]:
#imports
import requests
import os
import pandas as pd

Download the CSV file directly from the official website.

In [2]:
file_url = 'https://tcdata.typhoon.org.cn/data/CMASATdata/1980_2016_retrieved_TCsize_2Pub_v2.csv'

response = requests.get(file_url)

save_folder = "/kaggle/working/"

if response.status_code == 200:
   
    # Join the folder with the bare file name; an absolute second argument would discard save_folder.
    file_path = os.path.join(save_folder, "1980_2016_retrieved_TCsize_2Pub_v2.csv")

    
    with open(file_path, "wb") as file:
        file.write(response.content)

    print(f"Downloaded in path: {file_path}")

    
    df = pd.read_csv(file_path)

    
    df.head()
else:
    print("Cannot Download")

Downloaded in path: /kaggle/working/1980_2016_retrieved_TCsize_2Pub_v2.csv
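
The download step can also be wrapped in a small helper that sets a timeout and lets `raise_for_status()` report HTTP errors. This is only a sketch: `download_file` and its `get` parameter are hypothetical names introduced here, and `get` exists solely so the function can be tried offline with a stub in place of `requests.get`.

```python
import os


def download_file(url, folder, get=None, timeout=30):
    """Download `url` into `folder` and return the saved path.

    `get` is a hypothetical injection point: any callable shaped like
    `requests.get`. When omitted, `requests.get` is used.
    """
    if get is None:
        import requests  # imported lazily so a stub can be used offline
        get = requests.get

    response = get(url, timeout=timeout)
    response.raise_for_status()  # raises on 4xx/5xx responses

    # Use the last URL segment as the file name.
    file_path = os.path.join(folder, os.path.basename(url))
    with open(file_path, "wb") as file:
        file.write(response.content)
    return file_path
```

In the notebook this would be called as `download_file(file_url, save_folder)`, followed by `pd.read_csv` on the returned path.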


Rename the CSV columns according to the field descriptions given on the website:

| Field   | Description                                      |
|---------|--------------------------------------------------|
| YYYY    | Year                                             |
| NN      | Tropical cyclone number, including tropical depressions |
| MMDDHH  | 2-digit month, 2-digit day, 2-digit hour (UTC)   |
| LAT     | Latitude of the tropical cyclone center, IBTrACS v03r02 |
| LONG    | Longitude of the tropical cyclone center, IBTrACS v03r02 |
| PRS     | Minimum central pressure of the tropical cyclone, IBTrACS v03r02 |
| WND     | Maximum sustained wind speed near the tropical cyclone center, obtained from IBTrACS v03r02 |
| SiR34   | Scale of the tropical cyclone (km, based on the 34-knot wind radius) |
| SATSer  | Satellite used for inversion, including GOES-1 to 13, Meteosat-2 to 9, GMS-1 to 5, MTSAT-1R, MTS-2, and FY2-C/E |
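
The slicing used in the cells that follow suggests that the first CSV column packs YYYY, NN and MMDDHH into a single 12-digit value. That layout is inferred from the code, not stated in the table. Under that assumption, the three parts could be separated as in the sketch below; `split_record_id` is a hypothetical helper, and the sample value is illustrative.

```python
from datetime import datetime


def split_record_id(record_id):
    """Split a 12-digit YYYYNNMMDDHH value into year, cyclone number and timestamp."""
    text = str(record_id)
    if len(text) != 12 or not text.isdigit():
        raise ValueError(f"expected 12 digits, got {text!r}")
    year = int(text[:4])
    number = int(text[4:6])
    # Drop NN and parse the remaining YYYYMMDDHH as a (UTC) timestamp.
    timestamp = datetime.strptime(text[:4] + text[6:], "%Y%m%d%H")
    return year, number, timestamp


# Illustrative value in the assumed layout: year 1999, cyclone 06, June 2, 18:00.
print(split_record_id(199906060218))
```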


In [3]:
df.columns = ['Time', 'Latitude', 'Longitude', 'Pressure', 'Wind Speed', 'SiR34', 'SATSer']

Since the Time column includes a 2-digit serial number, we extract it and prefix it with the last two digits of the year to build the Cyclone Number column.

In [4]:
def process_time(time_value):
    time_str = str(time_value)
    if len(time_str) >= 6:
        third_fourth_digits = time_str[2:4]
        fifth_sixth_digits = time_str[4:6]
        new_serial_number = third_fourth_digits + fifth_sixth_digits
        return new_serial_number
    else:
        return None

df['Cyclone Number'] = df['Time'].apply(process_time)

Remove the serial number from the Time column.

In [5]:
def delete_serial_num(time_value):
    time_str = str(time_value)
    if len(time_str) >= 6:
        # Keep the year (first 4 digits) and everything after the 2-digit serial number.
        new_time_str = time_str[:4] + time_str[6:]
        return new_time_str
    else:
        return None

df['Time'] = df['Time'].apply(delete_serial_num)

Convert the Time column to datetime format.

In [6]:
# Named format_time so that it does not shadow the earlier process_time function.
def format_time(time_value):
    time_str = str(time_value)
    if len(time_str) >= 10:
        formatted_time_str = f"{time_str[:4]}-{time_str[4:6]}-{time_str[6:8]} {time_str[8:10]}:00:00"
        return pd.to_datetime(formatted_time_str, format='%Y-%m-%d %H:%M:%S')
    else:
        return None

df['Time'] = df['Time'].apply(format_time)
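
The extraction, deletion and formatting steps above run a Python function on every row. As a rough alternative, the same result can be sketched with pandas string slicing and a single `pd.to_datetime` call. The two input values are illustrative (not rows taken from the dataset) and assume the 12-digit YYYYNNMMDDHH layout implied by the slicing above.

```python
import pandas as pd

# Illustrative values in the assumed YYYYNNMMDDHH layout.
df = pd.DataFrame({"Time": [199906060218, 200503091500]})

raw = df["Time"].astype(str)

# Characters 2-5 hold the 2-digit year followed by the 2-digit serial number.
df["Cyclone Number"] = raw.str[2:6]

# Drop the serial number and parse the remaining YYYYMMDDHH in one call.
df["Time"] = pd.to_datetime(raw.str[:4] + raw.str[6:], format="%Y%m%d%H")

print(df)
```

Unlike the row-wise functions, this sketch does not return `None` for values that are too short; malformed input would raise an error in `pd.to_datetime` instead.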

Reorder the columns so that Cyclone Number comes first.

In [7]:
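columns = df.columns.tolist()

# 'Cyclone Number' was appended as the last column: insert it at the front,
# then drop the trailing duplicate entry.
columns.insert(0, 'Cyclone Number')
columns = columns[:-1]

df = df[columns]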

df.sample(5)

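|       | Cyclone Number | Time                | Latitude | Longitude | Pressure | Wind Speed | SiR34 | SATSer |
|-------|----------------|---------------------|----------|-----------|----------|------------|-------|--------|
| 8397  | 9906           | 1999-06-02 18:00:00 | 14.18    | 129.5     | 993      | 23.5       | 140.0 | GMS-5  |
| 14930 | 1621           | 2016-09-27 12:00:00 | 24.05    | 120.45    | 960      | 39.3       | 217.3 | MET-7  |
| 4545  | 9034           | 1990-11-08 18:00:00 | 8.65     | 140.8     | 980      | 26.0       | 175.6 | GMS-4  |
| 14166 | 1327           | 2013-10-25 18:00:00 | 30.9     | 137.3     | 982      | 24.5       | 156.3 | MTS-1  |
| 6953  | 9515           | 1995-09-06 00:00:00 | 17.53    | 114.15    | 995      | 18.0       | 233.9 | GMS-5  |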
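
The reordering cell works because `Cyclone Number` happens to be the last column. A version that does not depend on the column's position could look like the sketch below; `move_to_front` is a hypothetical helper and operates on a plain list of names.

```python
def move_to_front(columns, name):
    """Return the column names with `name` first and the rest in their original order."""
    columns = list(columns)
    if name not in columns:
        raise KeyError(name)
    return [name] + [c for c in columns if c != name]


columns = ['Time', 'Latitude', 'Longitude', 'Pressure', 'Wind Speed',
           'SiR34', 'SATSer', 'Cyclone Number']
reordered = move_to_front(columns, 'Cyclone Number')
print(reordered)
```

In the notebook this would be used as `df = df[move_to_front(df.columns, 'Cyclone Number')]`.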


In [8]:
print(df['Cyclone Number'].dtype)

object


In [9]:
df.to_csv('tropical_cyclone_size.csv', index=False)
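
One thing to watch when reading the saved file back: `Cyclone Number` is stored as text, and for the years 2000 to 2009 it starts with a zero. By default `pd.read_csv` infers an integer column and the leading zero is lost. The sketch below shows the effect on two illustrative rows, using an in-memory buffer in place of the real file, and how passing `dtype` keeps the values as text.

```python
import io

import pandas as pd

# Illustrative rows; "0503" stands for cyclone 03 of 2005.
df = pd.DataFrame({"Cyclone Number": ["9906", "0503"], "SiR34": [140.0, 217.3]})

buffer = io.StringIO()
df.to_csv(buffer, index=False)

# Default parsing infers integers, so "0503" becomes 503.
buffer.seek(0)
as_default = pd.read_csv(buffer)

# Forcing a string dtype preserves the leading zero.
buffer.seek(0)
as_text = pd.read_csv(buffer, dtype={"Cyclone Number": str})

print(as_default["Cyclone Number"].tolist())
print(as_text["Cyclone Number"].tolist())
```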