# Predicting Ski Resort Lift Ticket Prices
### By Group 28

## Introduction
Anyone who has skied or snowboarded in the past few years will almost certainly have noticed the rapid increase in lift ticket prices. What used to cost $70 to $100 a day can now reach $220 or more for a single day's lift ticket at large, modern ski resorts (often owned by Vail Resorts) across North America and Europe.

In this investigation we plan to find out what drives these prices. We will compare characteristics of many European ski resorts, such as maximum altitude, minimum altitude, total runs, run difficulty, number of lifts and snowmaking capacity, to determine which factors affect the lift ticket price the most. We will then use this data to predict lift ticket prices for new resorts, giving an estimate of a fair market price.

This will all be completed using Kaggle user thomasnibb’s “European Ski Resorts” dataset, found at https://www.kaggle.com/datasets/thomasnibb/european-ski-resorts.

## Preliminary exploratory data analysis
Here we demonstrate how to read the data into R from the web and how to clean and tidy it. We then summarize it in both a table and a visualization.

```r
# Run First
library(tidyverse)
```

```r
# Reading the Data
url <- "https://raw.githubusercontent.com/alextdart/dsci100-group28-2022wt2/main/European_Ski_Resorts.csv"
raw_data <- read_csv(url, show_col_types = FALSE)
```

```r
# Cleaning the data: keep only the columns from HighestPoint to SnowCannons,
# dropping Row, Resort and Country (the data is otherwise already clean)
head(raw_data)

ski_data <- raw_data |>
    select(HighestPoint:SnowCannons)

head(ski_data)
```

Output of `head(raw_data)`:

| Row | Resort | Country | HighestPoint | LowestPoint | DayPassPriceAdult | BeginnerSlope | IntermediateSlope | DifficultSlope | TotalSlope | Snowparks | NightSki | SurfaceLifts | ChairLifts | GondolaLifts | TotalLifts | LiftCapacity | SnowCannons |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| `<dbl>` | `<chr>` | `<chr>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<chr>` | `<chr>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` |
| 1 | Alpendorf (Ski amedé) | Austria | 1980 | 740 | 52 | 30 | 81 | 4 | 115 | Yes | No | 22 | 16 | 11 | 49 | 75398 | 600 |
| 2 | Soldeu-Pas de la Casa/Grau Roig/El Tarter/Canillo/Encamp (Grandvalira) | Andorra | 2640 | 1710 | 47 | 100 | 77 | 33 | 210 | Yes | Yes | 37 | 28 | 7 | 72 | 99017 | 1032 |
| 3 | Oberau (Wildschönau) | Austria | 1130 | 900 | 30 | 1 | 0 | 1 | 2 | No | No | 2 | 0 | 0 | 2 | 1932 | 0 |
| 4 | Dachstein West | Austria | 1620 | 780 | 42 | 15 | 33 | 3 | 51 | Yes | Yes | 25 | 8 | 3 | 36 | 32938 | 163 |
| 5 | Rosa Khutor | Southern Russia | 2320 | 940 | 22 | 30 | 26 | 21 | 77 | Yes | No | 6 | 11 | 10 | 27 | 49228 | 450 |
| 6 | Białka Tatrzańska-Kotelnica-Kaniówka-Bania | Poland | 910 | 680 | 23 | 12 | 3 | 0 | 16 | Yes | Yes | 7 | 1 | 0 | 8 | 28020 | 0 |


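Output of `head(ski_data)`:

| HighestPoint | LowestPoint | DayPassPriceAdult | BeginnerSlope | IntermediateSlope | DifficultSlope | TotalSlope | Snowparks | NightSki | SurfaceLifts | ChairLifts | GondolaLifts | TotalLifts | LiftCapacity | SnowCannons |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<chr>` | `<chr>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` |
| 1980 | 740 | 52 | 30 | 81 | 4 | 115 | Yes | No | 22 | 16 | 11 | 49 | 75398 | 600 |
| 2640 | 1710 | 47 | 100 | 77 | 33 | 210 | Yes | Yes | 37 | 28 | 7 | 72 | 99017 | 1032 |
| 1130 | 900 | 30 | 1 | 0 | 1 | 2 | No | No | 2 | 0 | 0 | 2 | 1932 | 0 |
| 1620 | 780 | 42 | 15 | 33 | 3 | 51 | Yes | Yes | 25 | 8 | 3 | 36 | 32938 | 163 |
| 2320 | 940 | 22 | 30 | 26 | 21 | 77 | Yes | No | 6 | 11 | 10 | 27 | 49228 | 450 |
| 910 | 680 | 23 | 12 | 3 | 0 | 16 | Yes | Yes | 7 | 1 | 0 | 8 | 28020 | 0 |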
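The cleaning cell states that the data is otherwise already clean. A quick way to back up that claim is to count the missing values in each column. The sketch below uses a hypothetical stand-in, `ski_sample`, built from a few columns of the six rows printed above. On the real data you would pass `ski_data` instead.

```r
library(tidyverse)

# Hypothetical stand-in for ski_data: a few columns from the six rows printed above
ski_sample <- tibble(
  HighestPoint      = c(1980, 2640, 1130, 1620, 2320, 910),
  LowestPoint       = c(740, 1710, 900, 780, 940, 680),
  DayPassPriceAdult = c(52, 47, 30, 42, 22, 23),
  TotalSlope        = c(115, 210, 2, 51, 77, 16),
  Snowparks         = c("Yes", "Yes", "No", "Yes", "Yes", "Yes"),
  NightSki          = c("No", "Yes", "No", "Yes", "No", "Yes")
)

# Count missing values in every column; a row of zeros supports "already clean"
na_counts <- ski_sample |>
  summarise(across(everything(), ~ sum(is.na(.x))))

na_counts
```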


```r
# Summary Table
# TODO
```
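The summary table cell is still a TODO. One possible starting point is the sketch below, which summarizes the response variable, DayPassPriceAdult. It is not the group's final table. It uses a hypothetical `ski_sample` tibble holding the six prices printed above in place of `ski_data`, and the column names in `price_summary` are illustrative choices.

```r
library(tidyverse)

# Hypothetical stand-in for ski_data: the six day pass prices printed above
ski_sample <- tibble(DayPassPriceAdult = c(52, 47, 30, 42, 22, 23))

# One-row summary of the response variable
price_summary <- ski_sample |>
  summarise(
    n_resorts    = n(),
    mean_price   = mean(DayPassPriceAdult, na.rm = TRUE),
    median_price = median(DayPassPriceAdult, na.rm = TRUE),
    min_price    = min(DayPassPriceAdult, na.rm = TRUE),
    max_price    = max(DayPassPriceAdult, na.rm = TRUE)
  )

price_summary
```

On the full data, adding `group_by(Snowparks)` before `summarise()` would split the same summary by whether a resort has a snowpark.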

```r
# Summary Visualization
# TODO
```
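For the summary visualization, one option is a faceted scatter plot that shows price against several candidate predictors at once. This also fits the plan in the Methods to graph price against the main factors. The sketch assumes ggplot2 (loaded with the tidyverse) and again uses a hypothetical six-row `ski_sample` stand-in. Plotting HighestPoint, TotalSlope and TotalLifts is an illustrative choice, not one taken from the proposal.

```r
library(tidyverse)

# Hypothetical stand-in for ski_data: price plus three candidate predictors
ski_sample <- tibble(
  DayPassPriceAdult = c(52, 47, 30, 42, 22, 23),
  HighestPoint      = c(1980, 2640, 1130, 1620, 2320, 910),
  TotalSlope        = c(115, 210, 2, 51, 77, 16),
  TotalLifts        = c(49, 72, 2, 36, 27, 8)
)

# Long format: one row per (resort, predictor) pair, so each predictor gets a facet
long_data <- ski_sample |>
  pivot_longer(-DayPassPriceAdult, names_to = "predictor", values_to = "value")

# Price against each predictor; free x scales because the predictors have different ranges
price_plot <- ggplot(long_data, aes(x = value, y = DayPassPriceAdult)) +
  geom_point() +
  facet_wrap(~ predictor, scales = "free_x") +
  labs(x = "Predictor value", y = "DayPassPriceAdult")

price_plot
```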

## Methods
We will use the numeric and Yes/No columns (every column except Row, Resort and Country) to predict DayPassPriceAdult, the price of an adult day pass at each resort. Country is excluded because it takes too many different values to improve prediction accuracy. Row is only an index. Each resort name is unique, so it is unlikely to carry information that would generalize to a new resort. The two Yes/No columns, Snowparks and NightSki, are stored as text (`<chr>`), so they will need to be recoded before modelling.
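Because Snowparks and NightSki hold the strings Yes and No, a linear model cannot treat them as booleans directly. The minimal sketch below recodes them to 0/1 integers. It assumes that Yes and No are the only valid values, and it uses a hypothetical six-row `ski_sample` stand-in in place of `ski_data`.

```r
library(tidyverse)

# Hypothetical stand-in for ski_data: price and the two Yes/No columns from the rows above
ski_sample <- tibble(
  DayPassPriceAdult = c(52, 47, 30, 42, 22, 23),
  Snowparks         = c("Yes", "Yes", "No", "Yes", "Yes", "Yes"),
  NightSki          = c("No", "Yes", "No", "Yes", "No", "Yes")
)

yes_no_cols <- c("Snowparks", "NightSki")

# Guard: stop early if a Yes/No column contains anything else (a typo, NA, etc.)
stopifnot(all(unlist(ski_sample[yes_no_cols]) %in% c("Yes", "No")))

# Recode Yes -> 1L and No -> 0L so the columns can be used as numeric predictors
ski_model_data <- ski_sample |>
  mutate(across(all_of(yes_no_cols), ~ as.integer(.x == "Yes")))

ski_model_data
```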

We plan to use the linear regression model covered in Week 8 to predict the lift ticket price at a new ski resort, with DayPassPriceAdult as the response variable. We can create several recipes, each using a different predictor, and plot each one to see whether that predictor is correlated with price. For example, one recipe could predict ticket price using TotalSlope as its only predictor. We can then look for patterns that recur across all of the resulting plots.
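The one-predictor-per-recipe idea can be sketched with tidymodels, using `recipe()`, `linear_reg()`, `workflow()` and yardstick's `rmse()`. This sketch rests on several assumptions. The helper `fit_one()` and the `ski_sample` stand-in are hypothetical, only three predictors are tried, and RMSE is computed on the same six rows the model was fit to. Training-set RMSE on six rows says nothing reliable about how well a model predicts prices for new resorts.

```r
library(tidyverse)
library(tidymodels)

# Hypothetical stand-in for ski_data: price plus three candidate predictors
ski_sample <- tibble(
  DayPassPriceAdult = c(52, 47, 30, 42, 22, 23),
  TotalSlope        = c(115, 210, 2, 51, 77, 16),
  TotalLifts        = c(49, 72, 2, 36, 27, 8),
  HighestPoint      = c(1980, 2640, 1130, 1620, 2320, 910)
)

# Ordinary least squares linear regression specification
lm_spec <- linear_reg() |>
  set_engine("lm") |>
  set_mode("regression")

# Hypothetical helper: fit a one-predictor model and return its RMSE on `data`
fit_one <- function(predictor, data) {
  ski_recipe <- recipe(as.formula(paste("DayPassPriceAdult ~", predictor)), data = data)
  ski_fit <- workflow() |>
    add_recipe(ski_recipe) |>
    add_model(lm_spec) |>
    fit(data = data)
  preds <- predict(ski_fit, data) |>
    bind_cols(data)
  rmse(preds, truth = DayPassPriceAdult, estimate = .pred) |>
    mutate(predictor = predictor)
}

# One row per predictor, lowest (training) RMSE first
predictors <- c("TotalSlope", "TotalLifts", "HighestPoint")
rmse_by_predictor <- map_dfr(predictors, fit_one, data = ski_sample) |>
  arrange(.estimate)

rmse_by_predictor
```

With the full `ski_data`, you would first split the data with `initial_split()` and compute the error on the testing set. That way the predictors are compared on resorts the models have not seen.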

We will also construct an average resort whose value in each column is that column's mean, to serve as a reference point. We can compare the average price with these average factor values, alongside the single-predictor models, to help ascertain how much each factor matters. Finally, we will graph the data set against its main factors to help interpret the results.
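The average-resort reference point can be computed with `summarise()` and `across()`. The sketch assumes a hypothetical six-row `ski_sample` stand-in with four numeric columns. In it, `average_resort` holds each column's mean, and `resort_vs_average` shows how far each resort sits above or below that mean. Lining up a resort's price difference against its differences in each factor is one way to make the comparison described above.

```r
library(tidyverse)

# Hypothetical stand-in for ski_data: four numeric columns from the six rows printed above
ski_sample <- tibble(
  HighestPoint      = c(1980, 2640, 1130, 1620, 2320, 910),
  DayPassPriceAdult = c(52, 47, 30, 42, 22, 23),
  TotalSlope        = c(115, 210, 2, 51, 77, 16),
  TotalLifts        = c(49, 72, 2, 36, 27, 8)
)

# The average resort: one row holding the mean of every numeric column
average_resort <- ski_sample |>
  summarise(across(where(is.numeric), ~ mean(.x, na.rm = TRUE)))

# Each resort's deviation from the column means, added as <column>_vs_avg columns
resort_vs_average <- ski_sample |>
  mutate(across(where(is.numeric), ~ .x - mean(.x, na.rm = TRUE), .names = "{.col}_vs_avg"))

average_resort
resort_vs_average
```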

## Expected outcomes and significance

What do you expect to find?
What impact could such findings have?
What future questions could this lead to?

TODO