DATA SCIENCE SCHOOL: Introduction to Machine Learning in Python

datakolektiv/DSS_IntroMLPython

DSS_IntroMLPython

DATA SCIENCE SCHOOL: Introduction to Machine Learning in Python

This is a DataKolektiv repository for our INTRODUCTION TO ML IN PYTHON Data Science School 2022.
Contact: hello@datakolektiv.com

The School will be hosted at TBA every TBA.
Asynchronous consultations and work will be carried out via Slack and GitHub TBA.

PROGRAM

Week 1.

  • Saturday, December 3, 09:00 - 18:00 CET, Startit centar, Belgrade
    • 09:00 - 12:30. Introduction to Numpy and Pandas packages: data types, vectorization, working with pandas.DataFrame class.
    • 14:30 - 18:00. Data organization and arrangement (i.e. data wrangling) in Pandas for analytics and machine learning. An introduction to probability theory and mathematical statistics in Numpy and Scipy.
  • Asynchronous (Slack, GitHub), Monday, December 5 - Friday, December 9
    • Data visualization in Matplotlib and Seaborn packages
    • Exploratory Data Analysis (EDA) in Pandas
    • The method of least squares and optimization of a simple linear model.
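
As a rough illustration of the least-squares topic listed above, here is a minimal sketch of the closed-form fit for a simple linear model. It is written in plain Python so it runs without extra packages; the function name `fit_simple_linear` and the toy data are assumptions made for this example, not course material.

```python
def fit_simple_linear(x, y):
    """Closed-form least-squares fit of y ~ intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-deviations and sum of squared deviations of x
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Toy data (hypothetical), roughly following y = 1 + 2x
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [3.1, 4.9, 7.2, 8.8, 11.0]

intercept, slope = fit_simple_linear(x, y)
print(f"intercept={intercept:.2f}, slope={slope:.2f}")
```

The slope is the ratio of the covariance of x and y to the variance of x, and the intercept forces the fitted line through the point of means.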

Week 2.

  • Saturday, December 10, 09:00 - 18:00 CET, Startit centar, Belgrade
    • 09:00 - 12:30. Linear and multiple linear regression
    • 14:30 - 18:00. Introduction to generalized linear models: binomial logistic and multinomial logistic regression for classification problems
  • Asynchronous (Slack, GitHub), Monday, December 12 - Friday, December 16
    • Case Study 1: Churn Prediction
    • How to control model overfit 1: regularization of linear and generalized linear models.
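
To give a feel for the regularization topic listed above, the following is a minimal sketch of an L2 (ridge) penalty applied to a one-feature linear model, assuming the intercept is left unpenalized. The function name `fit_ridge_simple`, the penalty values, and the toy data are assumptions for illustration only.

```python
def fit_ridge_simple(x, y, lam):
    """Ridge fit of y ~ intercept + slope * x with penalty lam * slope**2.

    The intercept is not penalized, so the solution only differs from
    ordinary least squares by the extra lam in the denominator.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / (sxx + lam)
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Toy data (hypothetical)
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [3.1, 4.9, 7.2, 8.8, 11.0]

_, slope_ols = fit_ridge_simple(x, y, lam=0.0)     # no penalty: plain least squares
_, slope_ridge = fit_ridge_simple(x, y, lam=10.0)  # penalty shrinks the slope toward 0
print(slope_ols, slope_ridge)
```

A larger `lam` pulls the slope closer to zero, trading a little bias for lower variance, which is the basic mechanism for controlling overfit.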

Week 3.

  • Saturday, December 17, 09:00 - 18:00 CET, Startit centar, Belgrade
    • 09:00 - 12:30. Cross-validation and regularization in classification problems; model selection (ROC analysis)
    • 14:30 - 18:00. Decision trees (CART)
  • Asynchronous (Slack, GitHub), Monday, December 19 - Friday, December 23
    • Case Study 2: Price Prediction in the Real Estate Market
    • How to control model overfit 2: cross-validation and regularization in regression problems.
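
The cross-validation idea mentioned above can be sketched as a k-fold index split. This is a simplified, unshuffled version in plain Python; the helper name `k_fold_indices` is hypothetical, and a real workflow would typically shuffle the data first or use a library splitter.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) pairs for k contiguous, unshuffled folds."""
    # Spread the remainder over the first folds so sizes differ by at most 1
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size


folds = list(k_fold_indices(n_samples=10, k=3))
for train_idx, test_idx in folds:
    print("train:", train_idx, "test:", test_idx)
```

Each observation lands in exactly one test fold, so averaging the per-fold test error estimates how the model behaves on data it was not fitted to.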

Week 4.

  • Saturday, December 24, 09:00 - 18:00 CET, Startit centar, Belgrade
    • 09:00 - 12:30. Random Forest model for regression and classification problems
    • 14:30 - 18:00. Gradient Boosting: An XGBoost Model for Regression and Classification Problems
  • Asynchronous (Slack, GitHub), Monday, December 26 - Friday, December 30
    • Case Study 3: Web Content Popularity Prediction
    • Case Study 4: Complete model setup and parameter fine-tuning with the XGBoost algorithm for regression and classification problems.

Prerequisites

It's an intensive ML in Python course, so...

  • prior knowledge of statistics helps, ideally at the level of at least an introductory college course; we will refresh the basics of probability theory and statistics with you and provide review materials covering those areas to the extent needed to follow the course;

  • it would help to remind yourself what functions are, what the maximum and minimum of a function are, how to draw their graphs, etc.; you will have clear online materials for that, and we will review what is necessary;

  • you should have at least an elementary working knowledge of the Python programming language: flow control, data types, classes, understanding what a list comprehension is, etc.; again, we'll have review materials for that, but this course shouldn't be your first time seeing the Python programming language.
