- Name: Satoru Shibata / 柴田 怜
- Job: Sr. Data Scientist
- Titles:
- 3x Kaggle Expert
- Retired from Kaggle in 2021 to focus on business as a data scientist.
- 4x Certified Professional
| Department | Top Level | Highest Rank | Awarded Medals |
|---|---|---|---|
| Code | 0.2% | 317 / 161,898 | 3 Silver, 13 Bronze |
| Discussion | 0.3% | 588 / 188,433 | 100 Bronze |
| Datasets | 1% | 354 / 34,643 | 3 Bronze |
| Competitions | 20-30% | | |
- Optimized LightGBM with Optuna adding SAKT Model
- Lead sentences
- Submitted code that ensembles two models.
- Ran prediction over 100 million rows of training data on a 16 GB kernel by removing unnecessary objects.
- Issue
- Algorithms for TOEIC Learning Applications
- Significance
- Predict the percentage of correct answers from the user's behavioral history.
- The user's percentage of correct answers increases with the number of problems solved.
- Purpose
- Optimize Binary Classification for AUC.
- Methodology:
- Ensemble learning of LightGBM and SAKT (a blend sketch follows this entry).
- Hyperparameter optimization with Optuna.
- Results
- Score: AUC = 0.781.
- Code: 31 Points.
- Considerations
- Focused too heavily on models; feature engineering remains a challenge.
- Systematizing multiple models also remains a future challenge.
- Conclusion
- Code Silver Medal
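A minimal sketch of the kind of blend used in this entry: a weighted average of the LightGBM and SAKT validation probabilities, scored by AUC. The arrays `lgbm_pred`, `sakt_pred`, and `y_valid` and the 0.7 starting weight are illustrative placeholders, not values from the actual submission.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def blend_auc(lgbm_pred, sakt_pred, y_valid, weight=0.7):
    """Weighted average of two models' probabilities, scored by AUC."""
    blended = weight * lgbm_pred + (1.0 - weight) * sakt_pred
    return roc_auc_score(y_valid, blended)

# pick the blend weight that maximizes validation AUC (illustrative search):
# best_w = max(np.linspace(0.0, 1.0, 21),
#              key=lambda w: blend_auc(lgbm_pred, sakt_pred, y_valid, w))
```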
- LightGBM Classifier and Logistic Regression Report
- Lead sentences
- Optimized classification of anonymized raw stock-market data on a 16 GB kernel.
- Contributed code that systematizes ensemble learning and logistic regression.
- Issue
- Utility Function Optimization of Supply and Demand Forecasting in Securities Markets.
- Significance
- Calculate from indicators of the presence and magnitude of stock returns.
- Optimize the decision of whether or not to trade.
- Purpose
- AI development for profit maximization.
- Methodology
- Optimized classification with LightGBM.
- Logit transformation of the target variable based on its probability distribution (a sketch follows this entry).
- Results
- Score: 3741.118 (Outside of Medal Zone).
- Code: 33 Points.
- Considerations
- The utility function was not fully deciphered.
- This left open issues for a literature survey.
- The report was appreciated by other Kagglers.
- Conclusion
- Code Silver Medal
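A minimal sketch of what a logit transform feeding a logistic regression could look like: LightGBM's predicted probabilities are logit-transformed and recalibrated with a logistic regression. `X_train`, `y_train`, and `X_valid` are hypothetical splits; the notebook's actual features and parameters are not reproduced here.

```python
import numpy as np
from scipy.special import logit
from lightgbm import LGBMClassifier
from sklearn.linear_model import LogisticRegression

def logit_blend(X_train, y_train, X_valid):
    # stage 1: LightGBM binary classifier
    gbm = LGBMClassifier(n_estimators=500, learning_rate=0.05)
    gbm.fit(X_train, y_train)

    # stage 2: logit-transform the predicted probabilities and
    # recalibrate them with a logistic regression
    p_tr = np.clip(gbm.predict_proba(X_train)[:, 1], 1e-6, 1 - 1e-6)
    lr = LogisticRegression().fit(logit(p_tr).reshape(-1, 1), y_train)

    p_va = np.clip(gbm.predict_proba(X_valid)[:, 1], 1e-6, 1 - 1e-6)
    return lr.predict_proba(logit(p_va).reshape(-1, 1))[:, 1]
```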
- Optimize LightGBM HyperParameter with Optuna and GPU
- Lead sentences
- LightGBM hyperparameter optimization on GPU, for which there was little precedent.
- The procedure was documented alongside the code and was well received.
- Issues
- Preliminary work for the “LightGBM Classifier and Logistic Regression Report”.
- Significance
- Hyperparameter optimization.
- There were few precedents for doing this with LightGBM on GPU.
- Purpose
- Submit code that optimizes LightGBM hyperparameters on GPU.
- Methodology
- A survey of prior case studies using Optuna for LightGBM (a minimal sketch follows this entry).
- Documented the submission procedure.
- Results
- Runtime: 953.9 s on GPU.
- Code: 31 Points.
- Consideration
- The hyperparameter optimization can be reused in future work.
- Conclusion
- Code Silver Medal.
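A minimal sketch, under assumed data splits (`X_train`, `y_train`, `X_valid`, `y_valid`) and a GPU-enabled LightGBM build, of how Optuna can drive LightGBM hyperparameter optimization on GPU; the search space and trial count are illustrative rather than the notebook's settings.

```python
import lightgbm as lgb
import optuna
from sklearn.metrics import roc_auc_score

def objective(trial):
    params = {
        "objective": "binary",
        "metric": "auc",
        "device": "gpu",  # requires LightGBM compiled with GPU support
        "learning_rate": trial.suggest_float("learning_rate", 0.01, 0.3, log=True),
        "num_leaves": trial.suggest_int("num_leaves", 31, 255),
        "feature_fraction": trial.suggest_float("feature_fraction", 0.5, 1.0),
    }
    dtrain = lgb.Dataset(X_train, label=y_train)
    dvalid = lgb.Dataset(X_valid, label=y_valid, reference=dtrain)
    booster = lgb.train(params, dtrain, num_boost_round=1000,
                        valid_sets=[dvalid],
                        callbacks=[lgb.early_stopping(50)])
    preds = booster.predict(X_valid, num_iteration=booster.best_iteration)
    return roc_auc_score(y_valid, preds)

study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=50)
print(study.best_params)
```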
- Optimized Logit LightGBM Classifier and CNN Models
- Lead sentences
- Submitted a simulation that systematizes multiple models.
- Based on this failure, I was able to concentrate on LightGBM optimization and inference.
- Issue
- Exploring Optimization Models
- Significance
- Iterative simulation of optimization models.
- Purpose
- Optimize the utility function by systematizing multiple models.
- Methodology
- Apply the logit transform to LightGBM outputs.
- Explore combining LightGBM with a CNN.
- Results
- Score: 3344.738 (Outside of Medal Zone).
- Code: 15 Points.
- Considerations
- This code runs LightGBM and a CNN at the same time, which made it prone to memory overflow.
- Going forward, I will focus on optimizing one model at a time.
- Conclusion
- Code Bronze Medal
- Optimized LightGBM with Optuna
- Lead sentences
- Developed a baseline model for a Code Competition that processes 100 million rows of training data.
- Prediction had to run within the 16 GB minimum kernel specification.
- Issue
- Predictions over 100 million rows of training data must run on a 16 GB kernel.
- Significance
- This is the cornerstone of the final submission model.
- Preprocessing and feature engineering were adjusted for further optimization.
- Purpose
- Baseline model development.
- Methodology
- Binary classification with an optimized LightGBM model (a memory-reduction sketch follows this entry).
- Results
- Score: AUC = 0.774.
- Code: 12 Points.
- Considerations
- Set a policy of incremental development on top of the baseline model.
- The additional development improved AUC by only 0.007 (0.774 to 0.781), which left some issues.
- Conclusion
- Code Bronze Medal
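A minimal sketch of the memory discipline described above: downcast numeric columns and free intermediate objects so that 100 million rows stay within a 16 GB kernel. The `reduce_memory` helper and the deleted names are illustrative, not the notebook's exact code.

```python
import gc
import pandas as pd

def reduce_memory(df: pd.DataFrame) -> pd.DataFrame:
    """Downcast numeric columns to the smallest dtype that holds them."""
    for col in df.columns:
        if pd.api.types.is_integer_dtype(df[col]):
            df[col] = pd.to_numeric(df[col], downcast="integer")
        elif pd.api.types.is_float_dtype(df[col]):
            df[col] = pd.to_numeric(df[col], downcast="float")
    return df

# free large intermediates as soon as they are no longer needed, e.g.:
# train = reduce_memory(train)
# del raw_train
# gc.collect()
```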
- LightGBM on GPU with Feature Engineering, Optuna, and Visualization
- Lead sentence
- Won a Code Bronze Medal on my first attempt at submitting code.
- Issue
- This was my first real effort at Kaggle.
- Significance
- Visualized the data as needed and studied features.
- Used Optuna for the first time and applied it in later work.
- Purpose
- Work on Feature Engineering.
- Methodology
- Read and referred to code posted by a Kaggle Grandmaster.
- Results
- Code: 11 Points.
- Consideration
- Gained experience implementing LightGBM with Optuna on GPU.
- Conclusion
- Code Bronze Medal.
- LightGBM with the Inference and Empirical Analysis
- Lead sentences
- In my first scored submission code, AUC = 0.76.
- The challenges encountered became the cornerstone of later development experience.
- Issue
- Get scored by extending the code submitted for my first challenge.
- Significance
- The earlier single process was limited to model-object generation.
- Purpose
- Further improve the performance of the prediction model.
- Methodology
- Added inference to improve the score.
- Ran an empirical analysis comparing raw data with predicted results.
- Detected significant differences between their Gaussian distributions.
- Results
- Score: AUC = 0.76.
- Code: 12 Points.
- Consideration
- This submission left an insufficient understanding of inference as an open issue.
- Conclusion
- Code Bronze Medal.
- Submission and the Inference of LightGBM
- Lead sentences
- Prototype of my first scored submission code.
- Despite few prior examples of such empirical analysis, it won a Code Bronze Medal.
- Issue
- Prototype version of the submission code for my first scored entry.
- Significance
- Implementing the scoring submission code.
- Purpose
- Gain development experience.
- Methodology
- Model objects were coded for scoring.
- An empirical analysis detected a significant difference between the Gaussian distributions.
- Result
- Code: 7 Points.
- Considerations
- The actual scored submission code ended up in a separate file.
- This was an opportunity to appreciate the challenge of coding.
- I focused on it afterwards.
- Conclusion
- Code Bronze Medal.
- Market Prediction XGBoost with GPU Modified
- Lead sentences
- Performance comparison between optimized XGBoost and LightGBM.
- LightGBM came out on top.
- Issue
- I had sometimes seen good results with XGBoost.
- Significance
- Simulate models other than LightGBM and search for the optimal model.
- Purpose
- Score improvement by XGBoost.
- Methodology
- GPU implementation of XGBoost optimization (a sketch follows this entry).
- Results
- Score: 3308.824 (Outside of Medal Zone).
- Code: 8 Points.
- Considerations
- XGBoost is easy to implement because there are many precedents.
- LightGBM was superior in the performance comparison, which led me to focus on it.
- Conclusion
- Code Bronze Medal.
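A minimal sketch, with hypothetical splits (`X_train`, `y_train`, `X_valid`, `y_valid`) and illustrative parameters, of running XGBoost on GPU for the comparison described above.

```python
import xgboost as xgb

dtrain = xgb.DMatrix(X_train, label=y_train)
dvalid = xgb.DMatrix(X_valid, label=y_valid)

params = {
    "objective": "binary:logistic",
    "eval_metric": "auc",
    "tree_method": "gpu_hist",  # GPU histogram algorithm
    "max_depth": 8,
    "eta": 0.05,
}
booster = xgb.train(params, dtrain, num_boost_round=1000,
                    evals=[(dvalid, "valid")], early_stopping_rounds=50)
```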
- Cassava Leaf Disease Best Keras CNN Tuning
- Lead sentences
- I also participated in an image-analysis competition, challenging myself with raw data of a different nature.
- I was left with some issues on the theoretical side, which gave me an opportunity to work from theoretical books.
- Issue
- I would like to try my hand at image analysis and find out what I am good at.
- Significance
- I want to gain experience in Keras implementation.
- Deepen my understanding of CNNs.
- Purpose
- Learn to understand and implement acoustic and image analysis.
- Methodology
- Built on and supplemented an advanced public submission code (a CNN sketch follows this entry).
- Results
- Score: Accuracy = 0.885.
- Code: 18 Points.
- Considerations
- Theoretical aspects of acoustic analysis and image analysis remained a challenge.
- An opportunity to recognize the need to start with a survey of theoretical papers.
- Conclusion
- Code Bronze Medal
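A minimal sketch of a small Keras CNN for the 5-class leaf-disease task; the layer sizes and input resolution are illustrative, and the actual submission built on an existing public notebook rather than this exact architecture.

```python
from tensorflow import keras
from tensorflow.keras import layers

def build_cnn(input_shape=(224, 224, 3), num_classes=5):
    """Small illustrative CNN for 5-class image classification."""
    model = keras.Sequential([
        layers.Input(shape=input_shape),
        layers.Conv2D(32, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Conv2D(64, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Conv2D(128, 3, activation="relu"),
        layers.GlobalAveragePooling2D(),
        layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```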
- RFCX Residual Network with TPU Customized
- Lead sentences
- I also participated in an acoustic-analysis competition, trying my hand at raw data of a different nature.
- I was left with some issues on the theoretical side, which gave me an opportunity to work from theoretical books.
- Issue
- I would like to try my hand at acoustic analysis and find out what I am good at.
- Significance
- I want to gain experience in Keras implementation.
- Deepen my understanding of CNNs.
- Purpose
- Learn to understand and implement acoustic and image analysis.
- Methodology
- Built on and supplemented an advanced public submission code (a TPU sketch follows this entry).
- Results
- Score: 0.772.
- Code: 12 Points.
- Considerations
- Theoretical aspects of acoustic analysis and image analysis remained a challenge.
- An opportunity to recognize the need to start with a survey of theoretical papers.
- Conclusion
- Code Bronze Medal.
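A minimal sketch of the standard Kaggle TPU setup with a stock Keras ResNet50; the 24-class output follows the RFCX species count, while the loss, activation, and input size are illustrative choices rather than the customized network from the submission.

```python
import tensorflow as tf

# connect to the TPU and create a distribution strategy (standard Kaggle boilerplate)
resolver = tf.distribute.cluster_resolver.TPUClusterResolver()
tf.config.experimental_connect_to_cluster(resolver)
tf.tpu.experimental.initialize_tpu_system(resolver)
strategy = tf.distribute.TPUStrategy(resolver)

with strategy.scope():
    # multi-label species detection: sigmoid outputs + binary cross-entropy
    model = tf.keras.applications.ResNet50(
        weights=None, input_shape=(224, 224, 3),
        classes=24, classifier_activation="sigmoid")
    model.compile(optimizer="adam",
                  loss="binary_crossentropy",
                  metrics=[tf.keras.metrics.AUC(multi_label=True)])
```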
- Research with Customized Sharp Weighted
- Lead sentences
- Worked on clarifying custom metrics and systematizing hyperparameter optimization in LightGBM.
- Generating an optimization object at each milestone is still important.
- Issue
- A private custom metric was used as the evaluation function.
- Significance
- Improve prediction accuracy by elucidating the private custom metric.
- Reproducibility is determined by the evaluation function.
- Purpose
- Clarify the custom metric.
- Methodology
- LightGBM hyperparameter optimization.
- Systematization based on examples of decoding custom metrics (a feval sketch follows this entry).
- Results
- Generated a parameter-optimization object at each milestone.
- Code: 6 Points.
- Consideration
- Reaffirmed the importance of generating an optimization object at each milestone.
- Conclusion
- Code Bronze Medal.
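A minimal sketch of how a custom evaluation function can be passed to LightGBM via `feval`; the weighted-RMSE metric here is purely illustrative and is not the competition's private custom metric.

```python
import numpy as np
import lightgbm as lgb

def weighted_rmse(preds, dataset):
    """Illustrative custom metric: sample-weighted RMSE."""
    y_true = dataset.get_label()
    weights = dataset.get_weight()
    if weights is None:
        weights = np.ones_like(y_true)
    value = np.sqrt(np.average((preds - y_true) ** 2, weights=weights))
    return "weighted_rmse", value, False  # (name, value, is_higher_better)

# dtrain / dvalid are hypothetical lgb.Dataset objects:
# booster = lgb.train({"objective": "regression", "metric": "None"},
#                     dtrain, valid_sets=[dvalid], feval=weighted_rmse)
```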
- Optimize CatBoost HyperParameter with Optuna and GPU
- Lead sentences
- Compared the performance of optimized ensemble-learning methods.
- LightGBM won on prediction accuracy.
- Issue
- I was new to CatBoost and wanted to compare performance with LightGBM.
- Significance
- Performance comparison of Ensemble Learning: LightGBM, XGBoost, CatBoost, etc.
- Purpose
- Algorithm selection for Prediction Models.
- Methodology
- Hyperparameter optimization with Optuna.
- CatBoost implementation on GPU (a sketch follows this entry).
- Results
- Score: AUC = 0.500.
- Code: 17 Points.
- Consideration
- At the baseline model stage, I gave the edge to LightGBM.
- Conclusion
- Code Bronze Medal.
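A minimal sketch of tuning a GPU-trained CatBoost classifier with Optuna; the splits (`X_train`, `y_train`, `X_valid`, `y_valid`) and the search space are hypothetical.

```python
import optuna
from catboost import CatBoostClassifier
from sklearn.metrics import roc_auc_score

def objective(trial):
    model = CatBoostClassifier(
        task_type="GPU",  # train on GPU
        iterations=trial.suggest_int("iterations", 300, 1500),
        depth=trial.suggest_int("depth", 4, 10),
        learning_rate=trial.suggest_float("learning_rate", 0.01, 0.3, log=True),
        verbose=0,
    )
    model.fit(X_train, y_train, eval_set=(X_valid, y_valid))
    return roc_auc_score(y_valid, model.predict_proba(X_valid)[:, 1])

study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=30)
```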
- LightGBM on Lyft Tabular Data added Inference and Tuning
- Lead sentences
- Regression prediction with LightGBM using grid search and multiple evaluation functions.
- A fruitful exercise that uncovered all sorts of challenges.
- Issue
- A regression problem on tabular data related to automated driving.
- Significance
- I want to work on Regression Prediction with LightGBM.
- Gain further development experience.
- Implement multiple evaluation functions to improve accuracy.
- Purpose
- Improve the accuracy of regression prediction.
- Methodology
- Set LightGBM's evaluation functions to MSE and RMSE.
- Searched parameters by grid search (a sketch follows this entry).
- Results
- Score: 356.084.
- Code: 10 Points.
- Considerations
- Grid search turned out to be an inefficient way to optimize hyperparameters.
- I reaffirmed the need to use feature engineering and inference.
- Conclusion
- Code Bronze Medal
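A minimal sketch of the grid-search setup described above, using scikit-learn's GridSearchCV around an LGBMRegressor with RMSE scoring; the grid values and CV folds are illustrative, and `X_train` / `y_train` are hypothetical.

```python
from lightgbm import LGBMRegressor
from sklearn.model_selection import GridSearchCV

param_grid = {
    "num_leaves": [31, 63, 127],
    "learning_rate": [0.01, 0.05, 0.1],
    "n_estimators": [300, 600],
}
search = GridSearchCV(
    LGBMRegressor(objective="regression"),
    param_grid,
    scoring="neg_root_mean_squared_error",  # RMSE, negated by sklearn convention
    cv=3,
)
# search.fit(X_train, y_train)
# print(search.best_params_, -search.best_score_)
```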
- COVID-19 with H2OAutoML Baseline Model
- Lead sentences
- Experimented with AutoML, but found my own development to be more powerful.
- This led me to develop my own LightGBM optimization.
- Issue
- The explosion of COVID-19 infections posed new global challenges.
- Significance
- Improve coding techniques for anonymized tabular data.
- Accumulate experience using AutoML.
- Purpose
- Optimized regression prediction with AutoML.
- Methodology
- Set RMSLE as the evaluation function for regression prediction with H2O (an H2OAutoML sketch follows this entry).
- Extract the optimized regression-prediction models: Deep Learning, XGBoost, GLM, GBM, etc.
- Results
- Score: RMSLE = 0.086.
- Code: 6 Points.
- Considerations
- My own development was more powerful than H2OAutoML.
- An opportunity to work on optimized regression prediction with LightGBM.
- Conclusion
- Code Bronze Medal.
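A minimal sketch of the H2OAutoML workflow: initialize H2O, hand it a frame, sort the leaderboard by RMSLE, and read off the best models. `train_df` and the `"target"` column name are hypothetical.

```python
import h2o
from h2o.automl import H2OAutoML

h2o.init()

# train_df is a hypothetical pandas DataFrame containing a "target" column
train_hf = h2o.H2OFrame(train_df)

aml = H2OAutoML(max_models=20, sort_metric="RMSLE", seed=42)
aml.train(y="target", training_frame=train_hf)

# the leaderboard lists the optimized models (Deep Learning, XGBoost, GLM, GBM, ...)
print(aml.leaderboard.head())
```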
- Optimized Predictive Model with H2OAutoML
- Lead sentences
- Even in binary classification, AutoML proved inferior to my own development.
- The difference is thought to come from preprocessing and feature engineering.
- Issue
- Regression prediction by H2OAutoML was inferior to my own development.
- Significance
- It was unclear whether binary classification would show results similar to regression prediction.
- Purpose
- Experiment on H2OAutoML in Binary Classification.
- Methodology
- Set RMSLE as the evaluation function for Binary Classification with H2O.
- Extract the optimized binary-classification models: Deep Learning, XGBoost, GLM, GBM, etc.
- Results
- Score: AUC = 0.850.
- Code: 5 Points.
- Considerations
- The performance was higher than in the regression-prediction case.
- Preprocessing and feature engineering themselves are not automated.
- They have to be developed independently.
- Conclusion
- Code Bronze Medal.