WiDS-2023-NEUDolphins: Top 4.4% Global, Top 5 Vietnam

WiDS 2023 Competition was organized by the WiDS Worldwide team at Stanford University, Harvard University IACS, Arthur, and the WiDS Datathon Committee

Extreme weather events are sweeping the globe and range from heat waves, wildfires and drought to hurricanes, extreme rainfall and flooding. These weather events have multiple impacts on agriculture, energy, transportation, as well as low resource communities and disaster planning in countries across the globe.

Accurate long-term forecasts of temperature and precipitation are crucial to help people prepare and adapt to these extreme weather events. Currently, purely physics-based models dominate short-term weather forecasting. But these models have a limited forecast horizon. The availability of meteorological data offers an opportunity for data scientists to improve sub-seasonal forecasts by blending physics-based forecasts with machine learning. Sub-seasonal forecasts for weather and climate conditions (lead-times ranging from 15 to more than 45 days) would help communities and industries adapt to the challenges brought on by climate change.

This repository is included models, code and dataset we use in this competition. We reached the position 31/697 teams, at the top 4.4% best results over the globe.

Dataset

We are using a pre-prepared dataset consisting of weather and climate information for a number of US locations, for a number of start dates for the two-week observation, as well as the forecasted temperature and precipitation from a number of weather forecast models. Each row in the data corresponds to a single location and a single start date for the two-week period. Our task is to predict the arithmetic mean of the maximum and minimum temperature over the next 14 days, for each location and start date.

There are provided with two datasets:

train_data.csv: the training dataset, where contest-tmp2m-14d__tmp2m, the arithmetic mean of the max and min observed temperature over the next 14 days for each location and start date, is provided
test_data.csv: the test dataset, where we withhold the true value of contest-tmp2m-14d__tmp2m for each row.

The detailed information of variables can be found at: https://github.com/maixbach/WiDS-2023-NEUDolphins/tree/main/datasets

You can download the dataset at: https://www.kaggle.com/competitions/widsdatathon2023/data

Data Inspection & EDA & Feature Engineering & Modeling

The detailed processes can be found in our notebook.

Data inspection: https://github.com/maixbach/WiDS-2023-NEUDolphins/tree/main/data%20inspection
Detailed EDA & Feature Engineering: https://github.com/maixbach/WiDS-2023-NEUDolphins/blob/main/notebooks/WiD2023_Datathon_TabNet.ipynb
Modeling: Can be found in each notebook for each model we utilized.

Experimental result

Model	RMSE
Random Forest	2.304
XGBoost	1.32
TabNet	0.984
CatBoost	0.98
LightGBM	0.907
Ensemble model	0.739

Achievement

With 85 submissions only, our team of 4 reached position 31/697 teams from all over the world, became the top 5 team in Vietnam leaderboard.

Name		Name	Last commit message	Last commit date
Latest commit History 16 Commits
data inspection		data inspection
datasets		datasets
models		models
notebooks		notebooks
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

WiDS-2023-NEUDolphins: Top 4.4% Global, Top 5 Vietnam

Dataset

Data Inspection & EDA & Feature Engineering & Modeling

Experimental result

Achievement

About

Releases

Packages

Languages

dchatca/WiDS-2023-NEUDolphins

Folders and files

Latest commit

History

Repository files navigation

WiDS-2023-NEUDolphins: Top 4.4% Global, Top 5 Vietnam

Dataset

Data Inspection & EDA & Feature Engineering & Modeling

Experimental result

Achievement

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages