Make sure to download the dataset here before running my code; it is too large to be uploaded to this repository.
This model predicts jammed road segments with an F1 score of 65.21% and ranked in the top 32% of all contestants.
- Train (70k+ rows)
- Irregularities (350k+ rows) -> 190k+ rows after deduplication
- Alerts (7M+ rows) -> 80k+ rows after deduplication
- Test
Irregularities contains information such as the speed gap, the time gap between jammed and normal conditions, jam level, etc.
- Join train and Irregularities on the s2 location
- Check the correlation between the features and the labels
- Keep the important features and drop the unimportant ones
Alerts contains information such as weather, accidents, street type, etc.
- Join train and Alerts on the s2 location
- Check the correlation between the features and the labels
- Keep the important features and drop the unimportant ones
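The join-and-correlate steps above (for both Irregularities and Alerts) can be sketched as follows. The toy frames and column names (`s2_location`, `label`, `alerts_count`) are assumptions standing in for the real tables, and the 0.1 correlation cutoff is illustrative:

```python
import pandas as pd

# Toy stand-ins for the train and Alerts tables; column names are assumed.
train = pd.DataFrame({
    "s2_location": ["a", "a", "b", "c"],
    "label": [1, 0, 1, 0],
})
alerts = pd.DataFrame({
    "s2_location": ["a", "b", "c"],
    "alerts_count": [5, 2, 0],
})

# Join train and Alerts on the s2 location key.
merged = train.merge(alerts, on="s2_location", how="left")

# Correlate each numeric feature with the label, then keep only the
# features whose absolute correlation clears an (illustrative) cutoff.
corr = merged.corr(numeric_only=True)["label"].drop("label")
keep = corr[corr.abs() >= 0.1].index.tolist()
print(keep)
```

The same pattern applies to the train/Irregularities merge; only the table and feature names change.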
- Join Alerts and Irregularities on 'supergabungan', a composite key of the s2 location, time, and hour
- Now that we know the important features in each table, we can build new features and drop the unimportant ones. The new features, based on my observations, are:
- Road Type (Whether the road is main street or not)
- Condition Type (Accident, Bad Weather, etc)
- Reliability
- Rating Rate
- Jam Trend
- Jam Level
- Alerts Count
- Let's call the new dataset joined from Alerts and Irregularities 'combination'
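Building 'combination' via the 'supergabungan' key can be sketched like this. The toy data and column names (`s2_location`, `date`, `hour`, `alerts_count`, `jam_level`) are assumptions; only the composite-key idea comes from the write-up:

```python
import pandas as pd

# Toy stand-ins for the deduplicated Alerts and Irregularities tables.
alerts = pd.DataFrame({
    "s2_location": ["a", "b"],
    "date": ["2020-01-01", "2020-01-01"],
    "hour": [8, 9],
    "alerts_count": [3, 1],
})
irregularities = pd.DataFrame({
    "s2_location": ["a", "b"],
    "date": ["2020-01-01", "2020-01-01"],
    "hour": [8, 9],
    "jam_level": [4, 2],
})

# Build the composite 'supergabungan' key on both tables.
for df in (alerts, irregularities):
    df["supergabungan"] = (
        df["s2_location"] + "_" + df["date"] + "_" + df["hour"].astype(str)
    )

# Join the two tables on the composite key to form 'combination'.
combination = alerts.merge(
    irregularities[["supergabungan", "jam_level"]], on="supergabungan"
)
print(combination[["supergabungan", "alerts_count", "jam_level"]])
```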
We also construct new features, isweekend and isbusyhour, from the date and time.
- Join train and combination on day_hour, taking the day and hour from the s2idtoken
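A minimal sketch of deriving isweekend and isbusyhour from a datetime column; the specific busy-hour ranges below are my assumption, not the write-up's definition:

```python
import pandas as pd

# Two sample timestamps: a Saturday morning and a Monday afternoon.
df = pd.DataFrame({"datetime": pd.to_datetime([
    "2020-01-04 08:00",  # Saturday, within the assumed morning rush
    "2020-01-06 13:00",  # Monday, outside both assumed rush windows
])})

# Saturday/Sunday have dayofweek 5 and 6.
df["isweekend"] = (df["datetime"].dt.dayofweek >= 5).astype(int)

# Assumed busy hours: 06-09 morning rush and 16-19 evening rush.
busy_hours = list(range(6, 10)) + list(range(16, 20))
df["isbusyhour"] = df["datetime"].dt.hour.isin(busy_hours).astype(int)
print(df[["isweekend", "isbusyhour"]])
```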
There are several ways to clean the data. Here are the methods I tried:
- fillna with bfill, ffill, pad, and backfill
- Impute with the mode
- Fill the NaN values by searching for the most similar location (very labor-intensive)

After all that, the best training score came from fillna with ffill.
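The winning forward-fill cleaning can be sketched as below; the column names are illustrative, and the trailing back-fill (to catch NaNs at the top of a column, which ffill cannot reach) is my addition:

```python
import pandas as pd
import numpy as np

# Toy frame with gaps, standing in for the merged feature table.
df = pd.DataFrame({
    "jam_level": [4.0, np.nan, np.nan, 2.0],
    "alerts_count": [np.nan, 3.0, np.nan, 1.0],
})

# Forward-fill propagates the last valid value down each column;
# a back-fill pass then handles any leading NaNs ffill left behind.
cleaned = df.ffill().bfill()
print(cleaned.isna().sum().sum())
```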
We compare five models:
- Random Forest Classifier
- XGBoost Classifier
- Logistic Regression
- Decision Tree Classifier
- Naive Bayes
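A sketch of the model comparison on one train/test split, using synthetic data in place of the engineered features. The split and random seeds are assumptions; `XGBClassifier` from the `xgboost` package would slot into the dict the same way:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB

# Synthetic binary-classification data standing in for the real features.
X, y = make_classification(n_samples=500, n_features=10, random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=42)

models = {
    "Random Forest": RandomForestClassifier(random_state=42),
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Decision Tree": DecisionTreeClassifier(random_state=42),
    "Naive Bayes": GaussianNB(),
}

# Fit each model and score it with F1, the competition metric.
scores = {name: f1_score(y_te, m.fit(X_tr, y_tr).predict(X_te))
          for name, m in models.items()}
for name, s in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {s:.3f}")
```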
F1 scores before tuning:
F1 scores after tuning:
- Random Forest (tuned): 0.757 (increased by 0.0002 XD)
- Decision Tree: 0.756 (increased by 0.001)
- Naive Bayes: 0.797 (increased by 0.01)
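One way to sketch the tuning step is a grid search scored on F1; the parameter grid and the Random Forest choice here are illustrative assumptions, not the exact grids used for each model:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.ensemble import RandomForestClassifier

# Synthetic data standing in for the engineered features.
X, y = make_classification(n_samples=300, n_features=8, random_state=0)

# Cross-validated grid search, selecting by F1 (the competition metric).
grid = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid={"n_estimators": [50, 100], "max_depth": [5, None]},
    scoring="f1",
    cv=3,
)
grid.fit(X, y)
print(grid.best_params_, round(grid.best_score_, 3))
```

As the tiny gains above suggest, tree ensembles with sensible defaults often leave little room for tuning to improve F1.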
- Join test and combination on day_hour
- Generate predictions with the fitted model
- Make sure the format matches sample_submission
- Submit on Kaggle and try your luck!
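The submission steps above can be sketched as follows; the column names (`id`, `jammed`) are assumptions based on a typical Kaggle sample_submission layout:

```python
import pandas as pd

# Hypothetical test ids and model outputs; in practice `preds` comes from
# model.predict on the test set joined with 'combination'.
test_ids = ["t1", "t2", "t3"]
preds = [1, 0, 1]

# Shape the output to mirror sample_submission, then write it without
# the index so the file has exactly the expected columns.
submission = pd.DataFrame({"id": test_ids, "jammed": preds})
submission.to_csv("submission.csv", index=False)
print(submission.shape)
```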
If you want to try this problem yourself, you can get the dataset here: Kaggle