Author: James Chen
Repository: https://github.com/Niche-Squad/ADSA_modelvalidation
ADSA NANP Modeling Workshop - Model Validation
Make sure you can successfully run the verification code block in the notebook verify.ipynb before the workshop on June 25, 2023.
This repository contains materials for the ADSA NANP Modeling Workshop on model validation. This workshop is designed to offer a hands-on introduction to the prevalent risks involved in evaluating predictive models. We will run through several simulations to demonstrate the importance of model validation, aiming to answer the following questions:
- Why is it necessary to split our data in model validation?
- How do we appropriately tune hyperparameters in a model?
- How do different splitting methods influence our conclusions during model validation?
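As a preview of the first question, the sketch below shows why a model evaluated on its own training data can look better than it is. It is not taken from the workshop notebooks: the simulated data, the 1-nearest-neighbor "model", and all function names are assumptions chosen only for illustration, and it uses nothing beyond the Python standard library.

```python
import random


def make_data(n, rng):
    """Simulate y = 2x + Gaussian noise for x drawn uniformly from [0, 1)."""
    xs = [rng.random() for _ in range(n)]
    ys = [2.0 * x + rng.gauss(0.0, 0.5) for x in xs]
    return xs, ys


def predict_1nn(x, train_xs, train_ys):
    """Predict with the response of the closest training point (pure memorization)."""
    nearest = min(range(len(train_xs)), key=lambda i: abs(train_xs[i] - x))
    return train_ys[nearest]


def mse(xs, ys, train_xs, train_ys):
    """Mean squared error of the 1-NN predictions on (xs, ys)."""
    errors = [(predict_1nn(x, train_xs, train_ys) - y) ** 2 for x, y in zip(xs, ys)]
    return sum(errors) / len(errors)


rng = random.Random(0)  # fixed seed so the simulation is repeatable
xs, ys = make_data(200, rng)

# Hold out the last 25% of the simulated records as a test set.
train_xs, train_ys = xs[:150], ys[:150]
test_xs, test_ys = xs[150:], ys[150:]

train_mse = mse(train_xs, train_ys, train_xs, train_ys)
test_mse = mse(test_xs, test_ys, train_xs, train_ys)

print("MSE on training data:", train_mse)
print("MSE on held-out data:", test_mse)
```

Because each training point is its own nearest neighbor, the training error is zero even though the data are noisy. Only the held-out error reflects how the model would behave on new records.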
Please make sure to set up your environment before the workshop. You can choose to run the workshop materials on your local machine or on Anaconda Cloud.
To ensure the best quality of this workshop and to avoid any internet connectivity issues, we recommend that you run the workshop materials on your local machine. To do so, you will need to install Python and Jupyter Notebook, an interactive coding environment, on your machine.
Anaconda is the easiest way to get started with Python and Jupyter Notebook. Go to https://www.anaconda.com/download to download the Anaconda distribution and install it on your machine. You DO NOT need to change any settings during the installation.
Go to https://github.com/Niche-Squad/ADSA_modelvalidation and click the green button Code -> Download ZIP to download the workshop repository. Unzip the downloaded file and save it to a location on your machine.
Open the Anaconda Navigator and launch Jupyter Notebook.
In Jupyter Notebook, navigate to the workshop repository folder that you just unzipped.
Open the file verify.ipynb. Follow the instructions in the notebook to run the verification code block. If you can run the code block without any errors, you are all set for the workshop!
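If the verification block fails, a quick check of which packages your Python installation can find may help narrow down the problem. The sketch below is not the content of verify.ipynb, and the package list is an assumption: replace it with the packages that the notebook actually imports.

```python
import importlib.util
import sys

# Hypothetical package list (an assumption, not taken from verify.ipynb).
ASSUMED_PACKAGES = ["numpy", "pandas", "sklearn", "matplotlib"]


def check_packages(names):
    """Return {name: True/False} for whether each top-level package can be found."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


print("Python version:", sys.version.split()[0])
for name, found in check_packages(ASSUMED_PACKAGES).items():
    print(f"{name}: {'found' if found else 'MISSING'}")
```

Any package reported as MISSING can usually be installed from the Environments tab of Anaconda Navigator.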
If you encounter any issues with the local environment setup, you can choose to run the workshop materials online. The advantage of this option is that you do not need to install any software on your machine. However, you will need to have a stable internet connection during the workshop.
Anaconda offers a free cloud service that allows you to run Jupyter Notebook in the cloud. It requires you to register an account for the service. Go to https://www.anaconda.com/code-in-the-cloud and follow the instructions to register an account.
Once you log in to your Anaconda Cloud account, you will see a dashboard. Go to the Other section and open Terminal.
In Terminal, run the following command to clone the workshop repository to your cloud account:
git clone https://github.com/Niche-Squad/ADSA_modelvalidation
On the left panel of the dashboard, click ADSA_modelvalidation and open the file verify.ipynb. Follow the instructions in the notebook to run the verification code block. If you can run the code block without any errors, you are all set for the workshop!
Now you are ready to start the workshop! We will cover three important topics in model validation. Please use the Jupyter notebooks in this repository to follow along with these topics:
Why is it necessary to split our data in model validation?
How do we appropriately tune hyperparameters in a model?
How do different splitting methods influence our conclusions during model validation?
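To make the third topic concrete, here is a minimal sketch that contrasts a random K-fold split with a group-wise split. It is not taken from the workshop notebooks: the function names and the example of repeated records per animal are hypothetical, and the code uses only the Python standard library.

```python
import random


def random_kfold(n_samples, k, seed=0):
    """Shuffle the sample indices and deal them into k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def group_kfold(groups, k):
    """Assign whole groups to folds so that no group is split across folds."""
    unique_groups = sorted(set(groups))
    fold_of_group = {g: i % k for i, g in enumerate(unique_groups)}
    folds = [[] for _ in range(k)]
    for index, g in enumerate(groups):
        folds[fold_of_group[g]].append(index)
    return folds


def groups_shared_with_training(folds, groups, test_fold):
    """Return the groups that appear in both the test fold and the training folds."""
    test_groups = {groups[i] for i in folds[test_fold]}
    train_groups = {
        groups[i]
        for f, fold in enumerate(folds)
        if f != test_fold
        for i in fold
    }
    return test_groups & train_groups


# Hypothetical data: 6 animals with 4 repeated records each.
groups = [animal for animal in range(6) for _ in range(4)]

random_folds = random_kfold(len(groups), k=3)
grouped_folds = group_kfold(groups, k=3)

print("Random split, animals in both train and test:",
      groups_shared_with_training(random_folds, groups, test_fold=0))
print("Group-wise split, animals in both train and test:",
      groups_shared_with_training(grouped_folds, groups, test_fold=0))
```

With a random split, records from the same animal can land in both the training and test folds, which may make a model look more accurate than it would be on unseen animals. The group-wise split keeps each animal's records together, so the test fold contains only animals the model has never seen.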