CORE Skills Data Science Springboard - Day 8 - Machine Learning I - Fundamentals and Supervised Techniques
Today's session introduces the main concepts of Machine Learning (ML) and presents some of the most widely used supervised techniques: K-Nearest Neighbours (K-NN), Support Vector Machines (SVM) and Random Forest. We will introduce the ML landscape, the types of ML algorithms, and crucial concepts such as overfitting and underfitting in ML projects. We will also discuss the pros and cons of these supervised techniques and give an overview of performance measures for supervised classification and regression.
You should aim to understand:
- The importance of machine learning and how to categorise the algorithms.
- The general concepts of supervised classification and how to evaluate the performance of a model.
- How to detect overfitting and underfitting, and some ways to overcome them.
- How SVM, Random Forest and K-NN work, along with the advantages and disadvantages of each.
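As a rough illustration of the overfitting point above, one common check is to compare a model's accuracy on the training data with its accuracy on held-out data. The sketch below is a minimal example and not part of the course notebooks: the synthetic dataset, the noise level and the values of k are all assumptions chosen for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic, deliberately noisy data (an illustrative assumption, not a course dataset).
X, y = make_classification(
    n_samples=500, n_features=10, n_informative=5, flip_y=0.2, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

train_acc, test_acc = {}, {}
for k in (1, 5, 25):
    # Small k gives a very flexible model; larger k gives a smoother one.
    model = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train)
    train_acc[k] = model.score(X_train, y_train)  # accuracy on data the model has seen
    test_acc[k] = model.score(X_test, y_test)     # accuracy on held-out data
    print(f"k={k:>2}  train={train_acc[k]:.2f}  test={test_acc[k]:.2f}")
```

With k=1 each training point is its own nearest neighbour, so the training accuracy is perfect while the test accuracy is lower. A large gap of this kind is a typical sign of overfitting, whereas low accuracy on both sets would point to underfitting.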
We will work through Python notebook exercises on different datasets to get hands-on practice with the methods discussed.
We will use Scikit-learn a lot! You can read about the K-NN, SVM and Random Forest techniques in its documentation.
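To give a flavour of what this looks like in practice, here is a minimal sketch that fits the three techniques with Scikit-learn and compares their test accuracy. The choice of the Iris dataset, the 70/30 split and the default hyperparameters are assumptions made for illustration, not the setup used in the session's notebooks.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

# Small built-in dataset, used here purely as an example.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

# All three estimators share the same fit/predict interface.
models = {
    "K-NN": KNeighborsClassifier(n_neighbors=5),
    "SVM": SVC(kernel="rbf"),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = accuracy_score(y_test, model.predict(X_test))
    print(f"{name}: test accuracy = {scores[name]:.2f}")
```

Because every Scikit-learn estimator exposes the same `fit` and `predict` methods, swapping one technique for another usually means changing a single line.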
We will also use the book "Hands-On Machine Learning with Scikit-Learn and TensorFlow" as the main reference. A collection of Python notebooks accompanies the book, where you can find all the Python implementations: https://github.com/ageron/handson-ml. You can also purchase a copy of the book from O'Reilly here: http://shop.oreilly.com/product/0636920052289.do
Stack Overflow is a useful website for finding solutions when you get stuck with Python and/or scikit-learn: https://stackoverflow.com/questions/tagged/scikit-learn.
Here are some good, free ML textbooks and tutorials that you may find interesting:
- Understanding Machine Learning: From Theory to Algorithms: http://www.cs.huji.ac.il/~shais/UnderstandingMachineLearning/understanding-machine-learning-theory-algorithms.pdf
- Mining of Massive Datasets: http://mmds.org/#ver21
- Scikit-learn tutorial: statistical-learning for scientific data processing: http://gael-varoquaux.info/scikit-learn-tutorial/
Finally, for some inspiration about the intriguing and promising field of Machine Learning, we recommend reading this story from HBR on Nokia's company-wide approach to ML: https://hbr.org/2018/10/the-chairman-of-nokia-on-ensuring-every-employee-has-a-basic-understanding-of-machine-learning-including-him
The lecture video mentioned in the article is also very interesting. You can find it here: https://www.youtube.com/watch?v=KNMy7NCQDgk&t=33s