- The WeChat group will be created by TA. (No 1-to-1 chat please.)
- Email is the preferred method of communication. The class mailing list will be created as PHBS.MLF@allmail.net.
- Course slides: Intro | Regression | SVM/KNN/Tree | SVD/PCA/LDA | Hyperparameter | Sentiment / Word Embedding | Neural Network | Graphical Model | Finance Research
- Project: Current | 2022 | 2021 | 2019 | 2018 | 2017 | 2016
- Past years' exams: 2023 | 2022 | 2021 | 2019 (online take-home) | 2018 | 2017 | Exams from Tom Mitchell's ML course (Carnegie Mellon University)
No | Date | Contents |
---|---|---|
01 | 2.20 Tue | Course overview (Syllabus), Required software (Python, Github, PyCharm), Python crash course (Basic, Numpy (Notebook Shortcut Keys), Pandas. Also see Datacamp, CheatSheet) |
02 | 2.23 Fri | PML Ch. 1: Intro (Slides), Notations, Regression, Weight update (Slides) |
03 | 2.27 Tue | PML Ch. 2: Perceptron, Adaline, Gradient descent, Stochastic Gradient Descent |
04 | 3.01 Fri | PML Ch. 3: Logistic Regression (LR) (Slides) and Support Vector Machine (SVM) (Slides) |
05 | 3.05 Tue | PML Ch. 3: KNN (Slides, Code), Decision Tree (Slides). |
06 | 3.08 Fri | PML Ch. 4: Data Preprocessing, PML Ch. 5: SVD/PCA (Slides) |
07 | 3.12 Tue | PML Ch. 5: LDA (Slides), PML Ch. 6: Bias-Variance, Cross-validation (Slides) |
08 | 3.15 Fri | PML Ch. 6: Hyperparameter tuning, Evaluation Metric, Class imbalance (Slides) |
09 | 3.19 Tue | PML Ch. 7: Ensemble Learning (Slides), Kernel Method (Slides, PML Ch 3, 5) |
14 | 3.22 Fri | HSBC Guest Lecture [1/4]: Overview and data introduction. |
10 | 3.26 Tue | PML Ch. 8: Sentiment Analysis (Slides) |
11 | 3.29 Fri | Topics in Finance ML: Recession prediction (Slides), ML in Finance Research (Slides), Collaborative Filtering (Slides) |
12 | 4.02 Tue | Neural Network, Deep Learning, CNN (Slides, PML Ch. 12-15) |
13 | 4.07 Sun | Midterm Exam (In Class) |
15 | 4.09 Tue | HSBC Guest Lecture [2/4] |
16 | 4.12 Fri | HSBC Guest Lecture [3/4] |
17 | 4.16 Tue | HSBC Guest Lecture [4/4] |
18 | 4.19 Fri | Course Project Presentation (may be scheduled later) |
-
- Register on Github.com and let TA know your ID. Make sure to use your full real name in your profile. Accept TA's invitation to the PHBS organization.
- Create a designated repository `GITHUB_ID/PHBS_MLF_2023` for your HW and project. Tick `Initialize this repository with a README` and select `python` under `.gitignore`.
- Fork PML repository to your repository.
- Install Github Desktop. Then clone the PML repository to your local storage.
- Install Anaconda Python distribution (3.X version, not 2.X version). Anaconda distribution is core Python + useful scientific computation libraries (e.g., numpy, scipy, pandas) + package management system (pip or conda)
- Install PyCharm Community version. (Or Professional version after applying for free student license)
- Send to TA the screenshots of (1) Github Desktop (showing the PML repository) (2) Jupyter Notebook (Anaconda) (3) PyCharm (See my examples: Github Desktop, Anaconda Spyder).
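
After installing Anaconda, a quick way to confirm that the scientific libraries mentioned above are importable is a short script like the following. This is only a sketch: the helper name `check_packages` is hypothetical and not part of the course material.

```python
import importlib


def check_packages(names):
    """Return {package name: version string}, or None if the import fails."""
    versions = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            versions[name] = None
        else:
            # Most scientific packages expose __version__; fall back if not.
            versions[name] = getattr(module, "__version__", "unknown")
    return versions


report = check_packages(["numpy", "scipy", "pandas"])
for name, version in report.items():
    print(f"{name}: {version if version else 'NOT INSTALLED'}")
```

Run it from the Anaconda Python interpreter (or inside Jupyter Notebook / PyCharm) to check that the environment you configured is the one being used.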
-
- The goal of this HW is to become familiar with the basic classifiers in PML Ch. 3.
- For this HW, we will use Give Me Some Credit on Kaggle. You may download it from the Kaggle link or CMS.
- Load `cs-training.csv` into a Pandas dataframe.
- Fill in the missing values (`nan`) with the column means. (Use `DataFrame.fillna()` or see Ch. 4 of PML)
- Select the 2 most important features using LogisticRegression with L1 penalty. (Adjust C until you see 2 features)
- Using the 2 selected features, apply LR / SVM / decision tree. Try your own hyperparameters (C, gamma, tree depth, etc) to maximize the prediction accuracy. (Just try several values. You don't need to show your answer is the maximum.)
- Visualize your classifiers using the `plot_decision_regions` function from PML Ch. 3.
- Put your result in `YOUR_GITHUB_ID/Give-Me-Some-Credit/code/Classifiers.ipynb`
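
The workflow in the steps above can be sketched on synthetic data. Everything specific here is an assumption made for illustration: the column names `x0`..`x4`, the random data standing in for `cs-training.csv`, the grid of `C` values, and the hyperparameters. It is not a reference solution, and plotting with `plot_decision_regions` is left out.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for cs-training.csv; column names x0..x4 are hypothetical.
rng = np.random.default_rng(0)
n = 500
X = pd.DataFrame(rng.normal(size=(n, 5)), columns=[f"x{i}" for i in range(5)])
# By construction, the label depends only on x0 and x1 (plus noise).
y = (2.0 * X["x0"] - 1.5 * X["x1"] + 0.3 * rng.normal(size=n) > 0).astype(int)
# Blank out 40 entries of x2 to mimic missing data.
X.loc[rng.choice(n, size=40, replace=False), "x2"] = np.nan

# Step 1: fill missing values with the column means.
X = X.fillna(X.mean())

# Step 2: standardize, then shrink C until at most two coefficients survive.
X_std = StandardScaler().fit_transform(X)
selected = []
for C in np.logspace(0, -4, 50):
    lr = LogisticRegression(penalty="l1", C=C, solver="liblinear")
    lr.fit(X_std, y)
    nonzero = np.flatnonzero(lr.coef_[0])
    if len(nonzero) <= 2:
        selected = [X.columns[i] for i in nonzero]
        break
print("selected features:", selected)

# Step 3: compare classifiers on the selected features (already on a unit scale here).
X_train, X_test, y_train, y_test = train_test_split(
    X[selected], y, test_size=0.3, stratify=y, random_state=0
)
models = {
    "LR": LogisticRegression(C=1.0),
    "SVM": SVC(kernel="rbf", C=1.0, gamma="scale"),
    "Tree": DecisionTreeClassifier(max_depth=4, random_state=0),
}
scores = {name: m.fit(X_train, y_train).score(X_test, y_test) for name, m in models.items()}
print(scores)
```

A smaller `C` means a stronger L1 penalty, so scanning `C` downward drives more coefficients to exactly zero; on real data the stopping value of `C` will differ.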
-
- The goal of this HW is to become familiar with PCA (feature extraction), grid search, pipeline, and k-fold CV.
- For this HW, we continue to use Give Me Some Credit on Kaggle.
- Extract a few (>2) features using PCA.
- Using the selected features from above, we are going to apply LR / SVM / decision tree (or any other algorithm).
- Implement the methods using pipeline. (PML p185)
- Use grid search for finding optimal hyperparameters. (PML p199). In the search, apply 5-fold cross-validation.
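
As a rough illustration of how the pieces above fit together, the sketch below chains scaling, PCA, and logistic regression in a scikit-learn pipeline and tunes it with a 5-fold grid search. The synthetic data from `make_classification` and the parameter grid are assumptions made for the example, not the assignment's dataset or expected settings.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data standing in for the Kaggle dataset (an assumption).
X, y = make_classification(
    n_samples=600, n_features=10, n_informative=4, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# make_pipeline names each step after its lowercased class name.
pipe = make_pipeline(StandardScaler(), PCA(), LogisticRegression(max_iter=1000))

# Step-name prefixes ("pca__", "logisticregression__") route each
# hyperparameter to the right stage of the pipeline.
param_grid = {
    "pca__n_components": [3, 4, 5],
    "logisticregression__C": [0.01, 0.1, 1.0, 10.0],
}

# cv=5 runs 5-fold cross-validation for every grid point.
gs = GridSearchCV(pipe, param_grid, scoring="accuracy", cv=5)
gs.fit(X_train, y_train)

print("best params:", gs.best_params_)
print("CV accuracy:", gs.best_score_)
test_acc = gs.score(X_test, y_test)  # uses the refitted best estimator
print("test accuracy:", test_acc)
```

Putting the scaler and PCA inside the pipeline means they are refitted on each training fold, so no information from the validation fold leaks into preprocessing. Swapping `LogisticRegression` for `SVC` or `DecisionTreeClassifier` only requires changing the last step and the matching grid keys.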
- Lectures: Tuesday & Friday 1:30 – 3:20 PM
- Venue: PHBS Building, Room 313
Instructor: Jaehyuk Choi
- Office: PHBS Building, Room 755
- Phone: 86-755-2603-0568
- Email: jaehyuk@phbs.pku.edu.cn
- Office Hour: Monday 7-9 PM
- TA Email: sunan@stu.pku.edu.cn
- TA Office Hour (Room 213/214): TBA
With the growth of computational power and big data, machine learning (ML) has recently become one of the most prominent research fields in industry and academia. This course provides a broad introduction to ML from both theoretical and practical perspectives. Through this course, students will learn the intuition and implementation behind the popular ML methods and gain hands-on experience in using ML software packages such as scikit-learn and TensorFlow. This course will also explore the possibility of applying ML to finance and business. Each student is required to complete a final course project. This year, the compliance analytics team at HSBC bank (Guangzhou) will give 4 guest lectures to demonstrate how ML is developed and shared in the banking industry.
This course assumes prior knowledge of probability/statistics and experience in Python. It is recommended for those who have taken introductory ML/AI courses in an undergraduate program.
- PML (primary textbook): Python Machine Learning 3rd Ed. by Sebastian Raschka.
- Github (PHBS fork)
- ISLR: An Introduction to Statistical Learning (with Applications in R) by James, Witten, Hastie, and Tibshirani
- Python Implementation: PHBS/ISLR-python (PHBS fork)
- Bishop: Pattern Recognition and Machine Learning by Bishop (Microsoft)
- ESL: The Elements of Statistical Learning by Hastie, Tibshirani, and Friedman
- CML: Coursera Machine Learning by Andrew Ng
- DL: Deep Learning by Goodfellow, Bengio, and Courville
- AFML: Advances in Financial Machine Learning by López de Prado
- Attendance 20%, Mid-term exam 30%, Assignments 20%, Course Project 30%
- Attendance: Randomly checked. The score is calculated as `20 - 2 x (# of absences)`. Leave requests should be made 24 hours in advance with supporting documents, except for emergencies. Job interviews and internships are not valid reasons for leave.
- Mid-term exam: 11.1 Mon. In-class, open-book, without computer/phone/calculator.
- Course project: Data Proposal and Presentation. Group of up to ?? people.
- Grades are given in letters (e.g., A+, A-, ..., D+, D, F). Fewer than 30% of students receive A- or above, and more than 10% receive B- or below.
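
The weighting above can be written out as a small calculation. This is a sketch under stated assumptions: the function names are hypothetical, the mid-term, assignment, and project scores are assumed to be percentages in [0, 100], and the attendance formula is applied as written (the syllabus does not say whether the score is floored at zero).

```python
def attendance_score(n_absences):
    """Attendance points out of 20, as 20 - 2 x (# of absences)."""
    return 20 - 2 * n_absences


def total_score(n_absences, midterm_pct, assignments_pct, project_pct):
    """Weighted total out of 100: attendance 20, mid-term 30, assignments 20, project 30."""
    return (
        attendance_score(n_absences)
        + midterm_pct * 30 / 100
        + assignments_pct * 20 / 100
        + project_pct * 30 / 100
    )


# Example: 3 absences, 80% on the mid-term, 90% on assignments, 85% on the project.
print(attendance_score(3))
print(total_score(3, 80, 90, 85))
```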