This model uses a deep neural network (DNN) to predict flooding in Samarinda City. It takes 9 input features and outputs probabilities across three labels: Safe, Alert, and Danger, helping residents of Samarinda City understand the current level of flood risk.
- What are the main weather factors influencing “danger” conditions in Samarinda, such as rainfall, humidity, air pressure, or wind speed?
- What type of weather conditions (condition_type) most often occur before, during, and after a flood, and how does the combination of humidity and air pressure affect visibility in each of these conditions?
- How can data-driven insights improve community awareness and preparedness for flood risks in Samarinda?
| ID | City | Temperature (°C) | Humidity (%) | Pressure (hPa) | Wind Speed (m/s) | Wind Direction (°) | Rain | Snow | Cloudiness (%) | Visibility (m) | Description | Condition Type | Timestamp |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 3 | Samarinda | 24.38 | 93 | 1010 | 1.54 | 200 | 0 | 0 | 72 | 10000 | broken clouds | Clouds | 2024-10-25 13:09:08 |
| 4 | Samarinda | 24.41 | 93 | 1011 | 1.32 | 201 | 0 | 0 | 55 | 10000 | broken clouds | Clouds | 2024-10-25 14:09:08 |
| 5 | Samarinda | 24.34 | 92 | 1010 | 1.47 | 207 | 0 | 0 | 60 | 10000 | broken clouds | Clouds | 2024-10-25 15:09:08 |
For the full dataset, click here.
- Separating Data
  In this step, we separate out the rows whose `rain` column has a value greater than 0.
- Labelling
  Next, we label the data based on several considerations.
- Oversampling
  Finally, because the class distribution is skewed, we apply oversampling using the SMOTE technique.
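The three steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the project's actual pipeline: the thresholds `ALERT_MM` and `DANGER_MM` are hypothetical placeholders, because the labelling criteria are not given here, and `smote_like` is a simplified stand-in for SMOTE (it interpolates toward the single nearest same-class neighbour) rather than a library implementation.

```python
import random

# Hypothetical rainfall thresholds; the real labelling criteria are not stated.
ALERT_MM = 5.0
DANGER_MM = 20.0


def split_by_rain(rows):
    """Split rows into (rainy, dry), where rainy means rain > 0."""
    rainy = [r for r in rows if r["rain"] > 0]
    dry = [r for r in rows if r["rain"] <= 0]
    return rainy, dry


def label_row(row):
    """Assign a label from rainfall alone (an assumption for illustration)."""
    if row["rain"] >= DANGER_MM:
        return "bahaya"
    if row["rain"] >= ALERT_MM:
        return "waspada"
    return "aman"


def smote_like(samples, n_new, rng):
    """Create n_new synthetic points for one minority class.

    Each point lies on the segment between a randomly chosen sample and
    its nearest neighbour in the same class.
    """
    if len(samples) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(samples))
        base = samples[i]
        others = samples[:i] + samples[i + 1:]
        neighbour = min(
            others,
            key=lambda s: sum((a - b) ** 2 for a, b in zip(base, s)),
        )
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


# Illustrative rows only; these values are not taken from the dataset.
rows = [
    {"rain": 0.0, "humidity": 93.0},
    {"rain": 0.0, "humidity": 92.0},
    {"rain": 6.5, "humidity": 97.0},
    {"rain": 8.0, "humidity": 98.0},
    {"rain": 25.0, "humidity": 99.0},
    {"rain": 31.0, "humidity": 100.0},
]
rainy, dry = split_by_rain(rows)
labels = [label_row(r) for r in rows]
danger = [[r["rain"], r["humidity"]] for r in rows if label_row(r) == "bahaya"]
extra = smote_like(danger, n_new=3, rng=random.Random(0))
```

Because the synthetic points are interpolated rather than duplicated, the minority classes gain variety instead of exact copies. Oversampling is normally applied to the training split only, so that the evaluation data stays untouched.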
From the correlation matrix above, several pairs of columns have correlation values close to 1 (strongly related):
- Wind Speed & Temperature
- Rain & Condition Type
- Rain & Description
- Cloudiness & Description
- Description & Condition Type
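For reference, each cell of a correlation matrix like this one is typically a Pearson coefficient between two columns. The sketch below shows that calculation in plain Python. It assumes the text columns (Description, Condition Type) are first mapped to integer codes; the encoding actually used is not stated here, and the sample values are illustrative rather than taken from the dataset.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return float("nan")  # undefined for a constant column
    return cov / math.sqrt(var_x * var_y)


def encode(values):
    """Map each distinct string to an integer, in order of first appearance."""
    codes = {}
    return [codes.setdefault(v, len(codes)) for v in values]


# Illustrative values only.
rain = [0.0, 1.2, 0.0, 4.5, 0.8]
condition = ["Clouds", "Rain", "Clouds", "Rain", "Rain"]
description = ["broken clouds", "light rain", "broken clouds",
               "moderate rain", "light rain"]

columns = {
    "rain": rain,
    "condition_type": encode(condition),
    "description": encode(description),
}
matrix = {
    (a, b): pearson(columns[a], columns[b])
    for a in columns
    for b in columns
}
```

With integer-coded categories, the coefficient depends on the arbitrary order of the codes, so high values between text columns are best read as a hint of redundancy rather than a precise measurement.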
The 'aman' (Safe) category dominates at 87.6%, followed by 'waspada' (Alert) at 7.9% and 'bahaya' (Danger), the smallest category, at 4.5%.
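Percentages like these can be computed from the label column in a few lines. In the sketch below, the label counts are illustrative values chosen to mirror the reported split; the real dataset size is not stated here.

```python
from collections import Counter


def class_percentages(labels):
    """Return the share of each label as a percentage of all rows."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: 100.0 * n / total for label, n in counts.items()}


# Illustrative counts only; not the real dataset.
labels = ["aman"] * 876 + ["waspada"] * 79 + ["bahaya"] * 45
shares = class_percentages(labels)

# Ratio between the largest and smallest class, a quick measure of imbalance.
imbalance_ratio = max(shares.values()) / min(shares.values())
```

A ratio this large between the biggest and smallest class is the kind of skew that oversampling is meant to address.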
Our machine learning model is a DNN with 9 input features, 1 normalization layer, 4 hidden layers, and a softmax output layer.
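To make the data flow concrete, here is a minimal forward pass through that architecture in plain Python. It is a sketch under assumptions: the hidden layer widths in `HIDDEN_SIZES`, the ReLU activations, the random untrained weights, and the normalization statistics are all placeholders that are not specified in this document. The example input uses the nine numeric columns of the first sample row, which is also an assumption about which features the model takes.

```python
import math
import random

N_FEATURES = 9
HIDDEN_SIZES = [64, 32, 16, 8]  # assumed widths; not given in the source
LABELS = ["aman", "waspada", "bahaya"]


def normalize(x, means, stds):
    """Standardize each feature: (value - mean) / standard deviation."""
    return [(v - m) / s for v, m, s in zip(x, means, stds)]


def dense(x, weights, biases):
    """Fully connected layer: one weighted sum plus bias per output unit."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, biases)]


def relu(x):
    return [max(0.0, v) for v in x]


def softmax(z):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(z)  # subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def build(rng):
    """Create random (untrained) weights for 4 hidden layers plus the output."""
    sizes = [N_FEATURES] + HIDDEN_SIZES + [len(LABELS)]
    layers = []
    for n_in, n_out in zip(sizes, sizes[1:]):
        weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)]
                   for _ in range(n_out)]
        layers.append((weights, [0.0] * n_out))
    return layers


def predict(x, layers, means, stds):
    """Normalization, then hidden layers with ReLU, then softmax output."""
    h = normalize(x, means, stds)
    for weights, biases in layers[:-1]:
        h = relu(dense(h, weights, biases))
    weights, biases = layers[-1]
    return dict(zip(LABELS, softmax(dense(h, weights, biases))))


layers = build(random.Random(0))
# Placeholder normalization statistics, not fitted to the real data.
means = [25.0, 90.0, 1010.0, 1.5, 200.0, 0.0, 0.0, 60.0, 9000.0]
stds = [2.0, 5.0, 2.0, 1.0, 50.0, 1.0, 1.0, 20.0, 1000.0]
sample = [24.38, 93.0, 1010.0, 1.54, 200.0, 0.0, 0.0, 72.0, 10000.0]
probs = predict(sample, layers, means, stds)
```

Since the weights are untrained, the resulting probabilities are meaningless as a forecast; the point is only that the output is one probability per label, summing to 1.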
A significant improvement is observed around epoch 44, where training accuracy reaches 95% and validation accuracy reaches 94%.
After training, we made predictions using the exported model, which had been trained on the most recent training data. In total, 31 data points were mispredicted.
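A misprediction count like this can be derived by comparing true and predicted labels, and a per-pair breakdown shows which classes get confused. The sketch below uses a handful of made-up labels purely for illustration; it does not reproduce the project's results.

```python
from collections import Counter


def evaluate(y_true, y_pred):
    """Return (confusion counts, number of mispredictions, accuracy)."""
    confusion = Counter(zip(y_true, y_pred))  # keys are (true, predicted)
    wrong = sum(n for (t, p), n in confusion.items() if t != p)
    accuracy = 1 - wrong / len(y_true)
    return confusion, wrong, accuracy


# Made-up labels for illustration only.
y_true = ["aman", "aman", "waspada", "bahaya", "bahaya", "aman"]
y_pred = ["aman", "waspada", "waspada", "bahaya", "aman", "aman"]
confusion, wrong, accuracy = evaluate(y_true, y_pred)

# Count of 'bahaya' cases predicted as 'aman', the costliest kind of miss.
missed_danger = confusion[("bahaya", "aman")]
```

For a flood warning model, the breakdown matters more than the total: a 'bahaya' case predicted as 'aman' is more harmful than a false alarm.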





