Abstract
This talk covers ensemble learning, a paradigm of machine learning in which multiple models are combined to make predictions. I will go through the main kinds of ensemble methods and their applications to real-world problems, and share tips and resources for building and deploying good ensembles.
About
Introduction: the advent of machine learning and data analytics, the potential and promise they hold, and the breadth of problems they can tackle, from cancer detection to movie recommendations and planet detection.
Ensemble Learning: motivation, i.e. situations where individual machine learning algorithms fail, are inefficient, or consume too much time or memory. The concept and origins of ensemble learning, and why it can outperform other prevalent methods. Introduction to hyperparameters, and why combining models can be preferable to tuning a single model with many hyperparameters.
Types of ensemble learning methods: the literature uses many classes and categories, but they describe the same basic types. We'll stick to the ones implemented in scikit-learn and to commonly used terminology.
-Bagging: short for bootstrap aggregating. Introduction and idea: training several similar base estimators on different random subsets of the data and aggregating their results for the final prediction. Methods of drawing random samples: Random Subspaces, Pasting, Random Patches. Examples (available in scikit-learn): Random Forest and Extra Trees classifiers/regressors: working principles, randomness of splits, applications, reducing variance at the cost of a slight increase in bias, variations and use in unsupervised learning. Detailed examples.
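The bagging idea above can be sketched in a few lines of scikit-learn. This is a minimal illustration on synthetic data (the dataset and parameter choices are mine, not from the talk): a bag of decision trees, each fitted on a bootstrap sample of the rows, next to a Random Forest for comparison.

```python
# Minimal bagging sketch: decision trees trained on bootstrap samples,
# with their votes aggregated, compared against a Random Forest.
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Each tree sees a bootstrap sample of the rows and half the features;
# subsampling features as well gives the Random Patches variant.
bag = BaggingClassifier(
    DecisionTreeClassifier(),
    n_estimators=50,
    max_features=0.5,
    random_state=0,
).fit(X_train, y_train)

forest = RandomForestClassifier(n_estimators=50, random_state=0).fit(X_train, y_train)

print("bagging accuracy:", bag.score(X_test, y_test))
print("random forest accuracy:", forest.score(X_test, y_test))
```

Setting `bootstrap=False` in `BaggingClassifier` would give Pasting (sampling without replacement) instead.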
-Boosting: introduction and motivation: converting a set of weak learners into a strong learner by iteratively learning from previous mistakes. The concept of weak learners: estimators that perform only slightly better than random guessing, e.g. decision stumps. AdaBoost, the best-known boosting algorithm: introduction, origins and authors, detailed explanation and analysis of the algorithm, variants and applications. Gradient Boosting, a generalization of boosting to arbitrary differentiable loss functions (AdaBoost being a special case) that handles multi-class classification, regression etc., available as Gradient Tree Boosting in scikit-learn: introduction, explanation, applications. A brief introduction to other boosting variants: LogitBoost, BrownBoost, CatBoost. XGBoost: the secret recipe behind several winning Kaggle entries, and what sets it apart from other implementations. Tips for using boosting algorithms, resources and ideas.
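As a hedged sketch of the weak-learner-to-strong-learner idea (synthetic data and parameters are illustrative, not from the talk): a lone decision stump versus AdaBoost and Gradient Boosting built from many such trees.

```python
# Boosting sketch: a single weak learner vs. boosted ensembles of trees.
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# A decision stump (depth-1 tree) is the classic weak learner for AdaBoost.
stump = DecisionTreeClassifier(max_depth=1).fit(X_train, y_train)

# AdaBoost re-weights misclassified samples each round; gradient boosting
# instead fits each new tree to the gradient of the loss.
ada = AdaBoostClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)
gbt = GradientBoostingClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)

print("stump:", stump.score(X_test, y_test))
print("AdaBoost:", ada.score(X_test, y_test))
print("gradient boosting:", gbt.score(X_test, y_test))
```

XGBoost exposes a scikit-learn-compatible `XGBClassifier` with a very similar interface, so swapping it in requires little code change.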
-Stacking/blending: introduction and motivation: using several base estimators to make predictions, then feeding those predictions into a second estimator, called the meta-estimator, for the final prediction. Stacking classifiers and regressors together. The difference between stacking and blending, and the advantages and disadvantages of blending over stacking. Cross-validation in the case of stacking and blending. A top-performing method in Kaggle competitions and other data science problems, including the Netflix Prize. Tips and resources for ensembles, and why a diverse set of base estimators beats stacking or blending near-identical ones.
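Scikit-learn's StackingClassifier (available in recent versions) captures this directly: out-of-fold predictions from diverse base estimators become the training inputs of a meta-estimator. A minimal sketch on synthetic data, with illustrative choices of estimators:

```python
# Stacking sketch: diverse base estimators feeding a meta-estimator.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=500, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# cv=5: each base estimator's predictions for the meta-estimator are made
# out-of-fold, so the meta-estimator never sees leaked training labels.
stack = StackingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
        ("svc", SVC(probability=True, random_state=0)),
    ],
    final_estimator=LogisticRegression(),
    cv=5,
).fit(X_train, y_train)

print("stacking accuracy:", stack.score(X_test, y_test))
```

Blending differs only in using a single held-out split instead of cross-validated out-of-fold predictions, which is simpler but wastes some training data.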
-Voting ensembles: introduction and motivation: giving more weight to better-performing base estimators and less to poorly performing ones, or deciding the final prediction by majority rule. Two kinds: hard voting (majority vote) and soft voting (averaging predicted probabilities). Usage and applications, e.g. the Otto Group Product Classification Challenge. Conclusion: open problems calling for continued research, and why ensemble learning will remain central to machine learning and data science.
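Both voting schemes above are one call in scikit-learn. A hedged sketch on synthetic data (member models and weights are illustrative assumptions, not from the talk):

```python
# Voting sketch: hard (majority-rule) vs. soft (probability-averaging) voting.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = make_classification(n_samples=500, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

members = [
    ("lr", LogisticRegression(max_iter=1000)),
    ("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
    ("nb", GaussianNB()),
]

# Hard voting counts class votes; soft voting averages predicted
# probabilities, here weighting the forest twice as heavily.
hard = VotingClassifier(members, voting="hard").fit(X_train, y_train)
soft = VotingClassifier(members, voting="soft", weights=[1, 2, 1]).fit(X_train, y_train)

print("hard voting:", hard.score(X_test, y_test))
print("soft voting:", soft.score(X_test, y_test))
```

Soft voting requires every member to implement `predict_proba`, which is why all three estimators here support it.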
Pre-requisites
Basic knowledge of data science and machine learning using Python.
Expected duration
20-25 minutes.
Level
Intermediate
Resources
Speaker Bio
I am a third-year engineering undergraduate at Delhi Technological University (DTU). I have done several projects in machine learning and authored a research paper on human activity recognition from accelerometer data. I am currently working on speaker-independent speech emotion recognition, among other projects. I am passionate about math, artificial intelligence and literature, and believe there is still much potential to be harnessed in machine learning.
- Can be done after the talk/workshop -
Include link to slides here
Include link to video here