This repository contains the course material, including Jupyter notebooks, for my lectures in Data Mining (JBI030), for the Data Science program (TUE/UvT).
In particular, my part of the course will cover:
- Data Preprocessing
- Model Selection and Evaluation
- Logistic Regression
- Linear/Kernelized SVM
- Decision Trees
- K-Nearest-Neighbors
- Neural Networks
- Ensemble Learning
During the lectures, I will present the theory behind the listed models and techniques. The Jupyter notebooks contain the Python code needed to run these methods. In particular, we will use the scikit-learn package (and Keras for the Neural Networks part). Note that running the notebooks with scikit-learn also requires other packages, the main ones being:
- numpy
- pandas
- matplotlib
See the file JBI030_course_software.pdf for further information.
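As a quick sanity check before opening the notebooks, the sketch below reports which of the packages mentioned above are installed and in which version. It is only an illustration, not part of the course material: the helper name `installed_versions` is hypothetical, and the list assumes the packages are installed under their usual distribution names (`scikit-learn`, `keras`, and so on). The PDF above remains the reference for the actual setup.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names of the packages mentioned in this README (assumed names).
PACKAGES = ["numpy", "pandas", "matplotlib", "scikit-learn", "keras"]


def installed_versions(names):
    """Map each distribution name to its installed version, or None if missing."""
    report = {}
    for name in names:
        try:
            report[name] = version(name)
        except PackageNotFoundError:
            # The package is not installed in the current environment.
            report[name] = None
    return report


# Print one line per package so missing ones are easy to spot.
for name, found in installed_versions(PACKAGES).items():
    print(f"{name}: {found or 'NOT INSTALLED'}")
```

The check reads package metadata only, so it does not import the packages themselves and runs quickly even when Keras is installed.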
The main reference for the course is the scikit-learn documentation, which contains an excellent theoretical introduction to the various methodologies, as well as a detailed technical explanation of its functions.
Other suggested readings for more detailed insights are:
- Data Mining: Practical Machine Learning Tools and Techniques
- An Introduction to Statistical Learning: with Applications in R
- Introduction to Machine Learning with Python
- The Elements of Statistical Learning
To clone the repository to your local machine, run the following from a terminal:
git clone https://github.com/davidevdt/datamining_jbi030
New Jupyter notebooks for the corresponding lectures will be added progressively at the end of each class; to fetch them into your local folder, navigate to the repository directory in your terminal and type:
git pull