Course Instructor: Gül İnan
Course Summary:
MAT388E is an undergraduate-level course that provides an introduction to commonly used statistical methods for inference and prediction problems in data analysis. The course is designed so that:
- The methods covered include supervised learning algorithms, with a focus on regression and classification problems, and unsupervised learning algorithms, with a focus on clustering problems, and
- These methods are applied to data analysis problems and implemented in Python.
By the end of the semester, students are expected:
- To understand the fundamental principles behind several statistical methods,
- To be able to apply statistical methods to real-life problems and data sets, and
- To be prepared for more advanced coursework or an industry internship in machine learning and related fields.
Course GitHub Organization: https://github.com/MAT388E-Fall23.
Course Prerequisites:
Since the course also touches on the mathematical and statistical theory behind the methods and uses Python for implementation, this course requires the following background:
- Knowledge of linear algebra, probability, statistics, and optimization,
- Familiarity with Python's Numpy, Pandas, Matplotlib, Seaborn, Statsmodels, and Scikit-Learn libraries,
- Familiarity with at least one computational notebook environment, such as Jupyter Notebook, Google Colab, Visual Studio Code, or RStudio Quarto, and
- Familiarity with Git commands and the GitHub interface.
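As a rough self-check of the expected background, the following minimal sketch (not course material; the synthetic data and variable names are illustrative) combines the NumPy, Pandas, and Scikit-Learn libraries listed above to fit a simple linear regression:

```python
# Illustrative self-check: generate synthetic data with NumPy,
# hold it in a Pandas DataFrame, and fit a Scikit-Learn estimator.
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
df = pd.DataFrame({"x": rng.uniform(0, 10, 50)})
# True relationship y = 2x + 1, plus small Gaussian noise.
df["y"] = 2.0 * df["x"] + 1.0 + rng.normal(0, 0.1, 50)

model = LinearRegression().fit(df[["x"]], df["y"])
print(model.coef_[0], model.intercept_)  # estimates near 2 and 1
```

If running and reading code like this feels comfortable, the programming prerequisites should not be a problem.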
Tentative Course Plan
We will closely follow the weekly schedule given below. However, weekly class schedules are subject to change depending on the progress we make as a class.
Week 1. Introduction to statistical learning. Supervised and unsupervised learning. Introduction to simple linear regression. Basic optimization concepts used in simple linear regression analysis. Model evaluation metrics for regression problems.
Week 2. Multiple linear regression. Basic optimization concepts used in multiple linear regression analysis.
Week 3. Polynomial regression. Bias-variance trade-off. Over-fitting and under-fitting.
Week 4. Feature selection approaches. Feature Engineering (scaling, encoding).
Week 5. Regularization methods for regression problems. Ridge and lasso regression.
Week 6. Cross-validation. Grid search and hyper-parameter tuning. Pipelines.
Week 7. Introduction to classification. Logistic regression. Gradient descent algorithm. Evaluation metrics for binary classification algorithms. Decision boundary concept.
Week 8. Multi-class classification. Evaluation metrics for multi-class classification algorithms.
Week 9. Naive Bayes. K-nearest neighbors. Additional classification topics, such as under-sampling and over-sampling.
Week 10. Decision trees.
Week 11. Bootstrapping, bagging, and ensemble methods (random forests and boosting).
Week 12. Unsupervised learning. Principal component analysis.
Week 13. Clustering methods. K-means algorithm.
Week 14. Hierarchical clustering, DBSCAN.
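Most weeks follow the same Scikit-Learn workflow: prepare the data, fit an estimator, and inspect the result. The sketch below (illustrative only; the synthetic two-blob data is an assumption, not a course data set) previews that workflow for the Week 13 k-means topic:

```python
# Illustrative preview of the Scikit-Learn clustering workflow:
# standardize features, fit KMeans, and inspect cluster labels.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
# Two well-separated synthetic blobs of 25 points each in 2-D.
X = np.vstack([rng.normal(0, 0.3, (25, 2)),
               rng.normal(3, 0.3, (25, 2))])

X_scaled = StandardScaler().fit_transform(X)
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X_scaled)
print(km.labels_)  # each point assigned to cluster 0 or 1
```

The same fit/predict pattern carries over to the regression and classification estimators covered in the earlier weeks.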