GitHub - SDS322E/Project2: Project 2

Overview

The goal of this project is to build a prediction model and comparing the performance of different prediction modeling methods. You will do this in a dataset of your choosing (there is a list of suggested datasets below). In this project, you will need to

Build a base prediction model using linear or logistic regression (depending on the nature of the outcome);
Tune your base model to find the optimal version;
Compare your prediction model to a nonparametric machine learning method;
Tune the machine learning model’s tuning parameters to find the optimal configuration
Determine the best performing model of the ones you have tried

Working independently or in a group

For the project, you can work independently or in a group of at most 6 students (including yourself). You can choose the members of your group (i.e. you do not have to use your lab group or the group you used for Project 1). If working in a group, you will only produce one unique report for the group.

Preliminary Report

The project will be done in two parts. The first part will be the Preliminary Report where you will need to answer some basic questions about your dataset and your project. Use the file Project2_Preliminary.Rmd in the Project 2 repository for your preliminary report and follow the prompts in that file.

The preliminary report will be graded on completion only. Furthermore, if you decided to make a change to your project after completing the preliminary report, that is okay.

The preliminary report will be submitted as a PDF in Gradescope.

Final Report

The final report will contain your main prediction model analysis. Use the file Project2_Report.Rmd in the Project 2 repository for your final report and follow the prompts in that file.

The final report will be submitted as a PDF in Gradescope. Make sure to mark the pages that correspond to each part of the project.

Suggested Datasets

The following datasets are suggestions. The data files are located in the Project 2 repository already and their file names are listed below.

Fine Particulate Matter Air Pollution in the United States – This is an expanded version of the air pollution dataset that you used in Lab 11 (this version has more variables). You can learn more about the variables in this dataset at the Open Case Studies web site.
- Repository file: pm25_data.csv.gz
Austin Crash Report Data – This dataset contains traffic crash records for crashes which have occurred in Austin, TX in the last ten years
- Repository file: Austin_Crash_Report_Data_-_Crash_Level_Records_20250407.csv.gz
Austin Animal Center Intakes – Animal Center Intakes from Oct, 1st 2013 to present. Intakes represent the status of animals as they arrive at the Animal Center.
- Repository file: Austin_Animal_Center_Intakes_20250407.csv.gz
Barton Springs Salamanders DO and Flow – Data collected to assess water quality conditions in the natural creeks, aquifers and lakes in the Austin area.
- Repository file: Barton_Springs_Salamanders_DO_and_Flow_20250407.csv.gz

All of the suggested datasets are in CSV format and can be read using the read_csv() function in the tidyverse.

You are welcome to find and use your own dataset if you do not want to use one of the suggested datasets. If you choose to use another dataset, you will need to download it separately and copy it into the project folder.

Download the RStudio Project

Open RStudio and click on “New Project…” in the drop down menu in the upper right.
In the New Project Wizard, select Version Control.
In the next menu titled “Create Project from Version Control”, select Git.
Under “Repository URL”, enter the web site URL for this lab: https://github.com/SDS322E/Project2
Under “Project directory name”, enter Lab10. (RStudio may automatically put this into the box for you.)
You may choose to select a directory to store the project or you can use the default.
Click “Create Project”.

Once you are in the RStudio project for Project2 you will find

Project2_Instructions.pdf – the instructions for the project
Project2_Preliminary.Rmd – the template for the preliminary report
Project2_Report.Rmd – use this file for your final report
The four suggested dataset files

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Overview

Working independently or in a group

Preliminary Report

Final Report

Suggested Datasets

Download the RStudio Project

About

Uh oh!

Releases

Packages

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 9 Commits
Austin_Animal_Center_Intakes_20250407.csv.gz		Austin_Animal_Center_Intakes_20250407.csv.gz
Austin_Crash_Report_Data_-_Crash_Level_Records_20250407.csv.gz		Austin_Crash_Report_Data_-_Crash_Level_Records_20250407.csv.gz
Barton_Springs_Salamanders_DO_and_Flow_20250407.csv.gz		Barton_Springs_Salamanders_DO_and_Flow_20250407.csv.gz
Project2_Instructions.pdf		Project2_Instructions.pdf
Project2_Preliminary.Rmd		Project2_Preliminary.Rmd
Project2_Preliminary.html		Project2_Preliminary.html
Project2_Report.Rmd		Project2_Report.Rmd
Project2_Report.html		Project2_Report.html
README.md		README.md
pm25_data.csv.gz		pm25_data.csv.gz

Folders and files

Latest commit

History

Repository files navigation

Overview

Working independently or in a group

Preliminary Report

Final Report

Suggested Datasets

Download the RStudio Project

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Packages