- Name: Satoru Shibata / 柴田 怜
- Job: Sr. Data Scientist
- Titles:
- 3x Kaggle Expert
- Retired from Kaggle in 2021 to focus on business as a data scientist.
- 4x Certified Professional
| Department | Top Level | Highest Rank | Awarded Medals |
|---|---|---|---|
| Code | 0.2% | 317 / 161,898 | 3 Silver, 13 Bronze |
| Discussion | 0.3% | 588 / 188,433 | 100 Bronze |
| Datasets | 1% | 354 / 34,643 | 3 Bronze |
| Competitions | 20-30% | | |
- Optimized LightGBM with Optuna adding SAKT Model
- Lead sentences
- Submitted code that ensembles two models.
- Ran prediction over 100 million rows of training data on a 16 GB kernel by removing unnecessary objects.
- Issue
- Algorithms for TOEIC Learning Applications
- Significance
- Predict the percentage of correct answers from the user's behavioral history.
- The user's percentage of correct answers increases with the number of problems solved.
- Purpose
- Optimize Binary Classification for AUC.
- Methodology:
- Ensemble learning of LightGBM and SAKT (a blend sketch follows this entry).
- Hyperparameter optimization with Optuna.
- Results
- Score: AUC = 0.781.
- Code: 31 Points.
- Considerations
- Focused too heavily on models; feature engineering remains a challenge.
- Systematizing multiple models also remains a future challenge.
- Conclusion
- Code Silver Medal
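A minimal sketch of the kind of blend used in this entry: a weighted average of the LightGBM and SAKT validation probabilities, scored by AUC. The arrays `lgbm_pred`, `sakt_pred`, and `y_valid` and the 0.7 starting weight are illustrative placeholders, not values from the actual submission.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def blend_auc(lgbm_pred, sakt_pred, y_valid, weight=0.7):
    """Weighted average of two models' probabilities, scored by AUC."""
    blended = weight * lgbm_pred + (1.0 - weight) * sakt_pred
    return roc_auc_score(y_valid, blended)

# pick the blend weight that maximizes validation AUC (illustrative search):
# best_w = max(np.linspace(0.0, 1.0, 21),
#              key=lambda w: blend_auc(lgbm_pred, sakt_pred, y_valid, w))
```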
- LightGBM Classifier and Logistic Regression Report
- Lead sentences
- Optimized classification of anonymized raw stock-market data on a 16 GB kernel.
- Contributed code that systematizes ensemble learning and logistic regression.
- Issue
- Utility Function Optimization of Supply and Demand Forecasting in Securities Markets.
- Significance
- Calculate from indicators of the presence and magnitude of stock returns.
- Optimize the decision of whether or not to trade.
- Purpose
- AI development for profit maximization.
- Methodology
- Optimized classification with LightGBM.
- Logit transformation of the target variable based on its probability distribution (a sketch follows this entry).
- Results
- Score: 3741.118 (Outside of Medal Zone).
- Code: 33 Points.
- Considerations
- The utility function was not fully deciphered.
- This left open issues for a literature survey.
- The report was appreciated by other Kagglers.
- Conclusion
- Code Silver Medal
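A minimal sketch of what a logit transform feeding a logistic regression could look like: LightGBM's predicted probabilities are logit-transformed and recalibrated with a logistic regression. `X_train`, `y_train`, and `X_valid` are hypothetical splits; the notebook's actual features and parameters are not reproduced here.

```python
import numpy as np
from scipy.special import logit
from lightgbm import LGBMClassifier
from sklearn.linear_model import LogisticRegression

def logit_blend(X_train, y_train, X_valid):
    # stage 1: LightGBM binary classifier
    gbm = LGBMClassifier(n_estimators=500, learning_rate=0.05)
    gbm.fit(X_train, y_train)

    # stage 2: logit-transform the predicted probabilities and
    # recalibrate them with a logistic regression
    p_tr = np.clip(gbm.predict_proba(X_train)[:, 1], 1e-6, 1 - 1e-6)
    lr = LogisticRegression().fit(logit(p_tr).reshape(-1, 1), y_train)

    p_va = np.clip(gbm.predict_proba(X_valid)[:, 1], 1e-6, 1 - 1e-6)
    return lr.predict_proba(logit(p_va).reshape(-1, 1))[:, 1]
```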
- Optimize LightGBM HyperParameter with Optuna and GPU
- Lead sentences
- LightGBM hyperparameter optimization on GPU, for which there was little precedent.
- The procedure was documented alongside the code and was well received.
- Issues
- Preliminary work for the “LightGBM Classifier and Logistic Regression Report”.
- Significance
- Hyperparameter optimization.
- There were few precedents for doing this with LightGBM on GPU.
- Purpose
- Submit code that optimizes LightGBM hyperparameters on GPU.
- Methodology
- A survey of prior case studies using Optuna for LightGBM (a minimal sketch follows this entry).
- Documented the submission procedure.
- Results
- Runtime: 953.9 s on GPU.
- Code: 31 Points.
- Consideration
- The hyperparameter optimization can be reused in future work.
- Conclusion
- Code Silver Medal.
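A minimal sketch, under assumed data splits (`X_train`, `y_train`, `X_valid`, `y_valid`) and a GPU-enabled LightGBM build, of how Optuna can drive LightGBM hyperparameter optimization on GPU; the search space and trial count are illustrative rather than the notebook's settings.

```python
import lightgbm as lgb
import optuna
from sklearn.metrics import roc_auc_score

def objective(trial):
    params = {
        "objective": "binary",
        "metric": "auc",
        "device": "gpu",  # requires LightGBM compiled with GPU support
        "learning_rate": trial.suggest_float("learning_rate", 0.01, 0.3, log=True),
        "num_leaves": trial.suggest_int("num_leaves", 31, 255),
        "feature_fraction": trial.suggest_float("feature_fraction", 0.5, 1.0),
    }
    dtrain = lgb.Dataset(X_train, label=y_train)
    dvalid = lgb.Dataset(X_valid, label=y_valid, reference=dtrain)
    booster = lgb.train(params, dtrain, num_boost_round=1000,
                        valid_sets=[dvalid],
                        callbacks=[lgb.early_stopping(50)])
    preds = booster.predict(X_valid, num_iteration=booster.best_iteration)
    return roc_auc_score(y_valid, preds)

study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=50)
print(study.best_params)
```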
- Optimized Logit LightGBM Classifier and CNN Models
- Lead sentences
- Submitted a simulation that systematizes multiple models.
- Based on this failure, I was able to concentrate on LightGBM optimization and inference.
- Issue
- Exploring Optimization Models
- Significance
- Iterative simulation of optimization models.
- Purpose
- Optimize the utility function by systematizing multiple models.
- Methodology
- Apply the logit transform to LightGBM outputs.
- Explore combining LightGBM with a CNN.
- Results
- Score: 3344.738 (Outside of Medal Zone).
- Code: 15 Points.
- Considerations
- This code runs LightGBM and a CNN at the same time, which made it prone to memory overflow.
- Going forward, I will focus on optimizing one model at a time.
- Conclusion
- Code Bronze Medal
- Optimized LightGBM with Optuna
- Lead sentences
- Developed a baseline model for a Code Competition that processes 100 million rows of training data.
- Prediction had to run within the 16 GB minimum kernel specification.
- Issue
- Predictions over 100 million rows of training data must run on a 16 GB kernel.
- Significance
- This is the cornerstone of the final submission model.
- Preprocessing and feature engineering were adjusted for further optimization.
- Purpose
- Baseline model development.
- Methodology
- Binary classification with an optimized LightGBM model (a memory-reduction sketch follows this entry).
- Results
- Score: AUC = 0.774.
- Code: 12 Points.
- Considerations
- Set a policy of incremental development on top of the baseline model.
- The additional development improved AUC by only 0.007 (0.774 to 0.781), which left some issues.
- Conclusion
- Code Bronze Medal
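A minimal sketch of the memory discipline described above: downcast numeric columns and free intermediate objects so that 100 million rows stay within a 16 GB kernel. The `reduce_memory` helper and the deleted names are illustrative, not the notebook's exact code.

```python
import gc
import pandas as pd

def reduce_memory(df: pd.DataFrame) -> pd.DataFrame:
    """Downcast numeric columns to the smallest dtype that holds them."""
    for col in df.columns:
        if pd.api.types.is_integer_dtype(df[col]):
            df[col] = pd.to_numeric(df[col], downcast="integer")
        elif pd.api.types.is_float_dtype(df[col]):
            df[col] = pd.to_numeric(df[col], downcast="float")
    return df

# free large intermediates as soon as they are no longer needed, e.g.:
# train = reduce_memory(train)
# del raw_train
# gc.collect()
```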
- LightGBM on GPU with Feature Engineering, Optuna, and Visualization
- Lead sentence
- Won a Code Bronze Medal on my first attempt at submitting code.
- Issue
- This was my first real effort at Kaggle.
- Significance
- Visualized the data as needed and studied features.
- Used Optuna for the first time and applied it in later work.
- Purpose
- Work on Feature Engineering.
- Methodology
- Read and referred to code posted by a Kaggle Grandmaster.
- Results
- Code: 11 Points.
- Consideration
- Gained experience implementing LightGBM with Optuna on GPU.
- Conclusion
- Code Bronze Medal.
- LightGBM with the Inference and Empirical Analysis
- Lead sentences
- In my first scored submission code, AUC = 0.76.
- The challenges encountered became the cornerstone of later development experience.
- Issue
- Get scored by extending the code submitted for my first challenge.
- Significance
- The earlier single process was limited to model-object generation.
- Purpose
- Further improve the performance of the prediction model.
- Methodology
- Added inference to improve the score.
- Ran an empirical analysis comparing raw data with predicted results.
- Detected significant differences between their Gaussian distributions.
- Results
- Score: AUC = 0.76.
- Code: 12 Points.
- Consideration
- This submission left an insufficient understanding of inference as an open issue.
- Conclusion
- Code Bronze Medal.
- Submission and the Inference of LightGBM
- Lead sentences
- Prototype of my first scored submission code.
- Despite few prior examples of such empirical analysis, it won a Code Bronze Medal.
- Issue
- Prototype version of the submission code for my first scored entry.
- Significance
- Implementing the scoring submission code.
- Purpose
- Gain development experience.
- Methodology
- Model objects were coded for scoring.
- An empirical analysis detected a significant difference between the Gaussian distributions.
- Result
- Code: 7 Points.
- Considerations
- The actual scored submission code ended up in a separate file.
- This was an opportunity to appreciate the challenge of coding.
- I focused on it afterwards.
- Conclusion
- Code Bronze Medal.
- Market Prediction XGBoost with GPU Modified
- Lead sentences
- Performance comparison between optimized XGBoost and LightGBM.
- LightGBM came out on top.
- Issue
- I had sometimes seen good results with XGBoost.
- Significance
- Simulate models other than LightGBM and search for the optimal model.
- Purpose
- Score improvement by XGBoost.
- Methodology
- GPU implementation of XGBoost optimization (a sketch follows this entry).
- Results
- Score: 3308.824 (Outside of Medal Zone).
- Code: 8 Points.
- Considerations
- XGBoost is easy to implement because there are many precedents.
- LightGBM was superior in the performance comparison, which led me to focus on it.
- Conclusion
- Code Bronze Medal.
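A minimal sketch, with hypothetical splits (`X_train`, `y_train`, `X_valid`, `y_valid`) and illustrative parameters, of running XGBoost on GPU for the comparison described above.

```python
import xgboost as xgb

dtrain = xgb.DMatrix(X_train, label=y_train)
dvalid = xgb.DMatrix(X_valid, label=y_valid)

params = {
    "objective": "binary:logistic",
    "eval_metric": "auc",
    "tree_method": "gpu_hist",  # GPU histogram algorithm
    "max_depth": 8,
    "eta": 0.05,
}
booster = xgb.train(params, dtrain, num_boost_round=1000,
                    evals=[(dvalid, "valid")], early_stopping_rounds=50)
```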
- Cassava Leaf Disease Best Keras CNN Tuning
- Lead sentences
- I also participated in an image-analysis competition, challenging myself with raw data of a different nature.
- I was left with some issues on the theoretical side, which gave me an opportunity to work from theoretical books.
- Issue
- I would like to try my hand at image analysis and find out what I am good at.
- Significance
- I want to gain experience in Keras implementation.
- Deepen my understanding of CNNs.
- Purpose
- Learn to understand and implement acoustic and image analysis.
- Methodology
- Built on and supplemented an advanced public submission code (a CNN sketch follows this entry).
- Results
- Score: Accuracy = 0.885.
- Code: 18 Points.
- Considerations
- Theoretical aspects of acoustic analysis and image analysis remained a challenge.
- An opportunity to recognize the need to start with a survey of theoretical papers.
- Conclusion
- Code Bronze Medal
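A minimal sketch of a small Keras CNN for the 5-class leaf-disease task; the layer sizes and input resolution are illustrative, and the actual submission built on an existing public notebook rather than this exact architecture.

```python
from tensorflow import keras
from tensorflow.keras import layers

def build_cnn(input_shape=(224, 224, 3), num_classes=5):
    """Small illustrative CNN for 5-class image classification."""
    model = keras.Sequential([
        layers.Input(shape=input_shape),
        layers.Conv2D(32, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Conv2D(64, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Conv2D(128, 3, activation="relu"),
        layers.GlobalAveragePooling2D(),
        layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```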
- RFCX Residual Network with TPU Customized
- Lead sentences
- I also participated in an acoustic-analysis competition, trying my hand at raw data of a different nature.
- I was left with some issues on the theoretical side, which gave me an opportunity to work from theoretical books.
- Issue
- I would like to try my hand at acoustic analysis and find out what I am good at.
- Significance
- I want to gain experience in Keras implementation.
- Deepen my understanding of CNNs.
- Purpose
- Learn to understand and implement acoustic and image analysis.
- Methodology
- Built on and supplemented an advanced public submission code (a TPU sketch follows this entry).
- Results
- Score: 0.772.
- Code: 12 Points.
- Considerations
- Theoretical aspects of acoustic analysis and image analysis remained a challenge.
- An opportunity to recognize the need to start with a survey of theoretical papers.
- Conclusion
- Code Bronze Medal.
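A minimal sketch of the standard Kaggle TPU setup with a stock Keras ResNet50; the 24-class output follows the RFCX species count, while the loss, activation, and input size are illustrative choices rather than the customized network from the submission.

```python
import tensorflow as tf

# connect to the TPU and create a distribution strategy (standard Kaggle boilerplate)
resolver = tf.distribute.cluster_resolver.TPUClusterResolver()
tf.config.experimental_connect_to_cluster(resolver)
tf.tpu.experimental.initialize_tpu_system(resolver)
strategy = tf.distribute.TPUStrategy(resolver)

with strategy.scope():
    # multi-label species detection: sigmoid outputs + binary cross-entropy
    model = tf.keras.applications.ResNet50(
        weights=None, input_shape=(224, 224, 3),
        classes=24, classifier_activation="sigmoid")
    model.compile(optimizer="adam",
                  loss="binary_crossentropy",
                  metrics=[tf.keras.metrics.AUC(multi_label=True)])
```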
- Research with Customized Sharp Weighted
- Lead sentences
- Worked on clarifying custom metrics and systematizing hyperparameter optimization in LightGBM.
- Generating an optimization object at each milestone is still important.
- Issue
- A private custom metric was used as the evaluation function.
- Significance
- Improve prediction accuracy by elucidating the private custom metric.
- Reproducibility is determined by the evaluation function.
- Purpose
- Clarify the custom metric.
- Methodology
- LightGBM hyperparameter optimization.
- Systematization based on examples of decoding custom metrics (a feval sketch follows this entry).
- Results
- Generated a parameter-optimization object at each milestone.
- Code: 6 Points.
- Consideration
- Reaffirmed the importance of generating an optimization object at each milestone.
- Conclusion
- Code Bronze Medal.
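A minimal sketch of how a custom evaluation function can be passed to LightGBM via `feval`; the weighted-RMSE metric here is purely illustrative and is not the competition's private custom metric.

```python
import numpy as np
import lightgbm as lgb

def weighted_rmse(preds, dataset):
    """Illustrative custom metric: sample-weighted RMSE."""
    y_true = dataset.get_label()
    weights = dataset.get_weight()
    if weights is None:
        weights = np.ones_like(y_true)
    value = np.sqrt(np.average((preds - y_true) ** 2, weights=weights))
    return "weighted_rmse", value, False  # (name, value, is_higher_better)

# dtrain / dvalid are hypothetical lgb.Dataset objects:
# booster = lgb.train({"objective": "regression", "metric": "None"},
#                     dtrain, valid_sets=[dvalid], feval=weighted_rmse)
```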
- Optimize CatBoost HyperParameter with Optuna and GPU
- Lead sentences
- Compared the performance of optimized ensemble-learning methods.
- LightGBM won on prediction accuracy.
- Issue
- I was new to CatBoost and wanted to compare performance with LightGBM.
- Significance
- Performance comparison of Ensemble Learning: LightGBM, XGBoost, CatBoost, etc.
- Purpose
- Algorithm selection for Prediction Models.
- Methodology
- Hyperparameter optimization with Optuna.
- CatBoost implementation on GPU (a sketch follows this entry).
- Results
- Score: AUC = 0.500.
- Code: 17 Points.
- Consideration
- At the baseline model stage, I gave the edge to LightGBM.
- Conclusion
- Code Bronze Medal.
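A minimal sketch of tuning a GPU-trained CatBoost classifier with Optuna; the splits (`X_train`, `y_train`, `X_valid`, `y_valid`) and the search space are hypothetical.

```python
import optuna
from catboost import CatBoostClassifier
from sklearn.metrics import roc_auc_score

def objective(trial):
    model = CatBoostClassifier(
        task_type="GPU",  # train on GPU
        iterations=trial.suggest_int("iterations", 300, 1500),
        depth=trial.suggest_int("depth", 4, 10),
        learning_rate=trial.suggest_float("learning_rate", 0.01, 0.3, log=True),
        verbose=0,
    )
    model.fit(X_train, y_train, eval_set=(X_valid, y_valid))
    return roc_auc_score(y_valid, model.predict_proba(X_valid)[:, 1])

study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=30)
```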
- LightGBM on Lyft Tabular Data added Inference and Tuning
- Lead sentences
- Regression prediction with LightGBM using grid search and multiple evaluation functions.
- A fruitful exercise that uncovered all sorts of challenges.
- Issue
- A regression problem on tabular data related to automated driving.
- Significance
- I want to work on Regression Prediction with LightGBM.
- Gain further development experience.
- Implement multiple evaluation functions to improve accuracy.
- Purpose
- Improve the accuracy of regression prediction.
- Methodology
- Set LightGBM's evaluation functions to MSE and RMSE.
- Searched parameters by grid search (a sketch follows this entry).
- Results
- Score: 356.084.
- Code: 10 Points.
- Considerations
- Grid search turned out to be an inefficient way to optimize hyperparameters.
- I reaffirmed the need to use feature engineering and inference.
- Conclusion
- Code Bronze Medal
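A minimal sketch of the grid-search setup described above, using scikit-learn's GridSearchCV around an LGBMRegressor with RMSE scoring; the grid values and CV folds are illustrative, and `X_train` / `y_train` are hypothetical.

```python
from lightgbm import LGBMRegressor
from sklearn.model_selection import GridSearchCV

param_grid = {
    "num_leaves": [31, 63, 127],
    "learning_rate": [0.01, 0.05, 0.1],
    "n_estimators": [300, 600],
}
search = GridSearchCV(
    LGBMRegressor(objective="regression"),
    param_grid,
    scoring="neg_root_mean_squared_error",  # RMSE, negated by sklearn convention
    cv=3,
)
# search.fit(X_train, y_train)
# print(search.best_params_, -search.best_score_)
```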
- COVID-19 with H2OAutoML Baseline Model
- Lead sentences
- Experimented with AutoML, but found my own development to be more powerful.
- This led me to develop my own LightGBM optimization.
- Issue
- The explosion of COVID-19 infections posed new global challenges.
- Significance
- Improve coding techniques for anonymized tabular data.
- Accumulate experience using AutoML.
- Purpose
- Optimized regression prediction with AutoML.
- Methodology
- Set RMSLE as the evaluation function for regression prediction with H2O (an H2OAutoML sketch follows this entry).
- Extract the optimized regression-prediction models: Deep Learning, XGBoost, GLM, GBM, etc.
- Results
- Score: RMSLE = 0.086.
- Code: 6 Points.
- Considerations
- My own development was more powerful than H2OAutoML.
- An opportunity to work on optimized regression prediction with LightGBM.
- Conclusion
- Code Bronze Medal.
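A minimal sketch of the H2OAutoML workflow: initialize H2O, hand it a frame, sort the leaderboard by RMSLE, and read off the best models. `train_df` and the `"target"` column name are hypothetical.

```python
import h2o
from h2o.automl import H2OAutoML

h2o.init()

# train_df is a hypothetical pandas DataFrame containing a "target" column
train_hf = h2o.H2OFrame(train_df)

aml = H2OAutoML(max_models=20, sort_metric="RMSLE", seed=42)
aml.train(y="target", training_frame=train_hf)

# the leaderboard lists the optimized models (Deep Learning, XGBoost, GLM, GBM, ...)
print(aml.leaderboard.head())
```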
- Optimized Predictive Model with H2OAutoML
- Lead sentences
- Even in binary classification, AutoML proved inferior to my own development.
- The difference is thought to come from preprocessing and feature engineering.
- Issue
- Regression prediction by H2OAutoML was inferior to my own development.
- Significance
- It was unclear whether binary classification would show results similar to regression prediction.
- Purpose
- Experiment on H2OAutoML in Binary Classification.
- Methodology
- Set RMSLE as the evaluation function for Binary Classification with H2O.
- Extract the optimized binary-classification models: Deep Learning, XGBoost, GLM, GBM, etc.
- Results
- Score: AUC = 0.850.
- Code: 5 Points.
- Considerations
- The performance was higher than in the regression-prediction case.
- Preprocessing and feature engineering themselves are not automated.
- They have to be developed independently.
- Conclusion
- Code Bronze Medal.