An end-to-end data science project using historical weather data from Singapore

chuachinhon/weather_singapore_cch

By Chua Chin Hon

Github Repo Title: weather_singapore_cch

SUMMARY

For data science students in Singapore, it is hard to find detailed, yet publicly available local datasets for lessons or personal projects. I came across a multi-decade collection of weather data on the Singapore Met Service's website by chance, and decided to assemble it for future use, or in case the data is taken offline.

I'm also using the dataset for a series of self-assigned data science projects, starting with visualisation. I will include time series and machine learning forecasts in future updates to this project.

TABLE OF CONTENTS

There are 5 sections so far. The CSV files containing the daily and monthly weather data are in the raw folder. Those who want to assemble their own datasets should head there first.

I. DATA COLLECTION-PREPROCESSING

What you'll find in the raw folder:

  • 444 CSV files containing daily weather data for Singapore from 1983 to December 2019

  • A "monthly_data" sub-folder containing monthly average data for rainfall, maximum and mean temperatures.
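For readers who want to assemble their own dataset from the raw folder, the daily files can be combined with pandas. The snippet below is only a minimal sketch: the function name `assemble_daily_csvs`, the folder path passed to it, and the `encoding` default are assumptions for illustration, not the code used in notebook 1.0_data_cleaning_cch.

```python
from pathlib import Path

import pandas as pd


def assemble_daily_csvs(folder, pattern="*.csv", encoding="utf-8"):
    """Concatenate every CSV matching `pattern` in `folder` into one DataFrame.

    Hypothetical helper: the folder layout and file encoding are assumptions.
    """
    # Non-recursive glob, so files in sub-folders are not picked up.
    paths = sorted(Path(folder).glob(pattern))
    if not paths:
        raise FileNotFoundError(f"no files matching {pattern!r} in {folder}")
    frames = [pd.read_csv(path, encoding=encoding) for path in paths]
    # ignore_index gives the combined frame one continuous row index.
    return pd.concat(frames, ignore_index=True)
```

Because the glob is not recursive, a call such as `assemble_daily_csvs("raw")` would leave the "monthly_data" sub-folder out. If the files turn out not to be UTF-8, pass a different `encoding`.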

What you'll find in the data folder:

  • 4 CSV files processed in the notebook 1.0_data_cleaning_cch

  • 2 CSV files related to outlier detection, as processed in the notebook 3.0_outlier_detection_cch.ipynb

  • 1 CSV file related to the Q3 2019 scorcher in Singapore

  • 1 CSV file for the machine learning and deep learning notebooks, as processed in notebook 5.0, and 1 validation dataset.

II. EDA & DATA VISUALISATION

The lack of seasonal variation lulls many into thinking that Singapore's weather is predictable and unchanging. Nothing could be further from the truth, with climate change making the city state's weather even more unpredictable.

In notebook 2.0_visualisation_cch, I'll attempt to illustrate the changing weather patterns in Singapore using classic as well as new visualisation libraries/techniques like Plotly Express.

Medium post: Visualising Singapore’s Changing Weather Patterns: 1983–2019

III. OUTLIER DETECTION

Data visualisation provides an easy way to spot outliers. But when you have 36 years of weather data, relying solely on charts is neither sufficient nor efficient for picking out the outliers accurately.

In the third section of this project, I'll use Scikit-learn's Isolation Forest model as well as the PyOD library (Python Outlier Detection) to try to pinpoint anomalies in the dataset. This is also important pre-work for the time series forecasting planned for this project, where removal of the outliers would be key to more accurate predictions.
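As a rough illustration of the Isolation Forest approach, here is a minimal sketch on synthetic data. The column names `mean_temp` and `daily_rain`, the helper `flag_outliers`, and the `contamination` value are all assumptions made for this example; they are not taken from notebook 3.0_outlier_detection_cch.ipynb.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import IsolationForest


def flag_outliers(df, columns, contamination=0.01, random_state=42):
    """Return a copy of `df` with a boolean `is_outlier` column.

    Hypothetical helper; `contamination` is the assumed share of anomalies.
    """
    model = IsolationForest(contamination=contamination, random_state=random_state)
    labels = model.fit_predict(df[columns])  # -1 marks an anomaly, 1 a normal row
    flagged = df.copy()
    flagged["is_outlier"] = labels == -1
    return flagged


# Synthetic stand-in for daily weather readings (not real data).
rng = np.random.default_rng(0)
demo = pd.DataFrame(
    {
        "mean_temp": rng.normal(27.5, 0.8, 500),
        "daily_rain": rng.gamma(1.0, 5.0, 500),
    }
)
demo.loc[0, ["mean_temp", "daily_rain"]] = [35.0, 300.0]  # planted extreme day

flagged = flag_outliers(demo, ["mean_temp", "daily_rain"])
```

The `contamination` setting fixes roughly how many rows get flagged, so it is worth trying a few values and checking the flagged days against the charts rather than trusting one setting.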

Medium post: Detecting Abnormal Weather Patterns With Data Science Tools

IV. Scorcher: Q3 2019 temperature records

This fourth notebook is a short follow-up of sorts to Part II, looking at how temperatures during the three months between July and September 2019 were among the warmest Singapore had experienced over the last 36 years, as global temperature records tumbled around the world.
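One way to check a claim like this is to average the July to September temperatures for each year and rank the years. The sketch below assumes a daily DataFrame with hypothetical `date` and `mean_temp` columns; the actual column names in the project's CSV files may differ.

```python
import pandas as pd


def rank_q3_mean_temperature(daily, date_col="date", temp_col="mean_temp"):
    """Rank years by mean July-September temperature, warmest first.

    Hypothetical helper; `date_col` and `temp_col` are assumed column names.
    """
    dates = pd.to_datetime(daily[date_col])
    in_q3 = dates.dt.quarter == 3  # quarter 3 covers July, August, September
    q3 = daily.loc[in_q3]
    years = dates.loc[in_q3].dt.year.rename("year")
    return q3.groupby(years)[temp_col].mean().sort_values(ascending=False)
```

The result is a Series indexed by year, so the position of 2019 in that index shows where its third quarter sits among the other years.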

Medium post: SCORCHER: As Global Records Tumbled, S’pore Baked Under One Of The Warmest Q3 Ever

V. Weather Predictions: ‘Classic’ Machine Learning Models Vs Keras

You are ready to dip your toes into deep learning but not sure where to start. One way is to build on what you've been doing in Scikit-learn, and apply useful features like pipelines and grid search via the Keras wrappers.

This fifth series of notebooks starts with a simple example of pipeline construction and grid search for a binary classification problem, using Logistic Regression and the XGBoost classifier.
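The pipeline-plus-grid-search pattern can be sketched in a few lines of Scikit-learn. This example is an assumption-laden illustration, not the notebook's code: it uses synthetic data from `make_classification`, only Logistic Regression (XGBoost is left out to avoid an extra dependency), and a parameter grid picked arbitrarily.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic binary classification data standing in for the weather features.
X, y = make_classification(
    n_samples=400, n_features=6, n_informative=4, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Scaling lives inside the pipeline so it is refitted on each CV training fold.
pipe = Pipeline(
    [
        ("scale", StandardScaler()),
        ("clf", LogisticRegression(max_iter=1000)),
    ]
)

# Step name + double underscore + parameter name addresses a pipeline step.
param_grid = {"clf__C": [0.01, 0.1, 1.0, 10.0]}

search = GridSearchCV(pipe, param_grid, cv=5, scoring="accuracy")
search.fit(X_train, y_train)

test_accuracy = search.score(X_test, y_test)
```

After fitting, `search.best_params_` holds the chosen value of `C`, and `search.best_estimator_` is the whole pipeline refitted on the training data, ready to score the held-out set.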

In notebook 5.2, I tackled the same problem using the Keras Classifier, which introduces the concept of defining and building a Keras sequential model.

In notebook 5.3, I experimented with the relatively new Keras Tuner as an alternative to the Scikit-learn/grid search approach.

Data preparation for this section of the project is in notebook 5.1. The validation dataset is in the data folder.

Medium Post: https://bit.ly/2QJdrpD

CONTACT

Twitter: @chinhon
