This is a DataKolektiv repository for our INTRODUCTION TO ML IN PYTHON Data Science School 2022.
contact: hello@datakolektiv.com
The School will be hosted at TBA every TBA.
Asynchronous consultations and work will be carried on via Slack and GitHub TBA.
- Saturday, December 3, 09:00 - 18:00 CET, Startit centar, Belgrade
- 09:00 - 12:30. Introduction to NumPy and Pandas packages: data types, vectorization, working with the pandas.DataFrame class.
- 14:30 - 18:00. Data organization and arrangement (i.e. data wrangling) in Pandas for analytics and machine learning. An introduction to probability theory and mathematical statistics in NumPy and SciPy.
- Asynchronous (Slack, GitHub), Monday, December 5 - Friday, December 9
- Data visualization in Matplotlib and Seaborn packages
- Exploratory Data Analysis (EDA) in Pandas
- The method of least squares and optimization of a simple linear model.
- Saturday, December 10, 09:00 - 18:00 CET, Startit centar, Belgrade
- 09:00 - 12:30. Linear and multiple linear regression
- 14:30 - 18:00. Introduction to generalized linear models: binomial logistic and multinomial logistic regression for classification problems
- Asynchronous (Slack, GitHub), Monday, December 12 - Friday, December 16
- Case Study 1: Churn Prediction
- How to control model overfitting 1: regularization of linear and generalized linear models.
- Saturday, December 17, 09:00 - 18:00 CET, Startit centar, Belgrade
- 09:00 - 12:30. Cross-validation and regularization in classification problems; model selection (ROC analysis)
- 14:30 - 18:00. Decision Tree (CART)
- Asynchronous (Slack, GitHub), Monday, December 19 - Friday, December 23
- Case Study 2: Price Prediction in the Real Estate Market
- How to control model overfitting 2: cross-validation and regularization in regression problems.
- Saturday, December 24, 09:00 - 18:00 CET, Startit centar, Belgrade
- 09:00 - 12:30. Random Forest model for regression and classification problems
- 14:30 - 18:00. Gradient Boosting: An XGBoost Model for Regression and Classification Problems
- Asynchronous (Slack, GitHub), Monday, December 26 - Friday, December 30
- Case Study 3: Web Content Popularity Prediction
- Case Study 4: Complete model setup and parameter fine-tuning with the XGBoost algorithm for regression and classification problems.
This is an intensive ML in Python course, so...
- prior knowledge of statistics is helpful, ideally at the level of at least an introductory college course; we will refresh the basics of probability theory and statistics with you and provide review materials covering those areas to the extent needed to follow the course;
- it would help to remind yourself what functions are, what the maximum and the minimum of a function are, how to draw their graphs, etc.; clear online materials will be available for that, and we will review what is necessary;
- you should have at least an elementary working knowledge of the Python programming language: flow control, data types, classes, understanding what a list comprehension is, etc.; again, we'll have review materials for that, but this course shouldn't be your first time seeing the Python programming language.
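
As a rough self-check of the Python level described above, the short sketch below combines a class, flow control, basic data types, and list comprehensions to locate the minimum of a simple function. It is only an illustration and is not taken from the course materials: the class name Quadratic and its argmin method are hypothetical, chosen for this example. If you can read it comfortably, your Python is probably sufficient to follow the course.

```python
# Hypothetical self-check example; none of these names come from the course materials.

class Quadratic:
    """The function f(x) = a*x**2 + b*x + c, with a > 0 so that a minimum exists."""

    def __init__(self, a, b, c):
        if a <= 0:  # flow control: reject parabolas that have no minimum
            raise ValueError("a must be positive")
        self.a, self.b, self.c = a, b, c

    def __call__(self, x):
        # Evaluate the function at x.
        return self.a * x**2 + self.b * x + self.c

    def argmin(self):
        # Exact minimizer: the x where the derivative 2*a*x + b equals zero.
        return -self.b / (2 * self.a)


f = Quadratic(1, -4, 7)

# List comprehensions: build a grid from -5.0 to 5.0 in steps of 0.5,
# then evaluate f at every grid point.
grid = [i / 2 for i in range(-10, 11)]
values = [f(x) for x in grid]

# Grid search for the minimum, to compare with the exact answer from argmin().
best_x = min(grid, key=f)
print(best_x, f(best_x), f.argmin())
```

The grid search and the exact formula agree here because the true minimizer happens to lie on the grid; in general a grid only approximates it.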