# Group 10: The Potential of Professional Tennis Players: A Predictive Model for Player's Best Rank.

### Background information:
The ATP (Association of Tennis Professionals) Tour is a worldwide tour for men’s professional tennis, organized annually by the ATP since 1990. The tour is divided into several tiers of tournaments, with higher tiers offering more ranking points. The PIF ATP Rankings are the merit-based system the ATP uses to rank players, determine entry qualification, and organize tournament seeding. Points are awarded based on the tier of the tournament and the stage a player reaches. The rankings are updated weekly, and points are dropped 52 weeks after being awarded, with some exceptions.


### Our research question:
Given a player's statistics, what is their predicted best rank? We plan to build a regression model that predicts each player's best ranking from several predictors.

### The dataset:
We will be using a dataset containing player statistics for the top 500 ATP players from 2017 to 2019. The dataset contains 37 variables, both quantitative and qualitative. The CSV file also carries an unnamed row-index column, which `read_csv` reads as `...1`.




## Preliminary Exploratory Data Analysis

In [1]:
# Retrieving necessary packages
library(dplyr)
library(tidyverse)
library(rvest)
library(tidymodels)


Attaching package: ‘dplyr’


The following objects are masked from ‘package:stats’:

    filter, lag


The following objects are masked from ‘package:base’:

    intersect, setdiff, setequal, union


── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ forcats   1.0.0     ✔ readr     2.1.4
✔ ggplot2   3.4.4     ✔ stringr   1.5.0
✔ lubridate 1.9.3     ✔ tibble    3.2.1
✔ purrr     1.0.2     ✔ tidyr     1.3.0
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors


In [33]:
# Reading the data file
url <- "https://drive.google.com/uc?export=download&id=1_MECmUXZuuILYeEOfonSGqodW6qVdhsS"
player_data <- read_csv(url)

# Cleaning the data: split combined text fields at the first space, convert the
# leading value to a number, encode categorical columns as factors, and keep
# only the columns we need
processed_player_data <- player_data |>
                        separate(Age, c("Age", "Birth date"), sep = " ") |>
                        mutate(Age = as.numeric(Age)) |>
                        separate(`Current Rank`, c("Current Rank", "Current Rank (Other)"), sep = " ") |>
                        mutate(`Current Rank` = as.numeric(`Current Rank`)) |>
                        separate(`Best Rank`, c("Best Rank", "Best Rank (Date)"), sep = " ") |>
                        mutate(`Best Rank` = as.numeric(`Best Rank`)) |>
                        # The unit halves ("cm REMOVE", "kg REMOVE") are dropped by select() below
                        separate(`Height`, c("Height (cm)", "cm REMOVE"), sep = " ") |>
                        mutate(`Height (cm)` = as.numeric(`Height (cm)`)) |>
                        separate(`Weight`, c("Weight (kg)", "kg REMOVE"), sep = " ") |>
                        mutate(`Weight (kg)` = as.numeric(`Weight (kg)`)) |>
                        mutate(Country = as.factor(Country)) |>
                        mutate(Backhand = as.factor(Backhand)) |>
                        mutate(Plays = as.factor(Plays)) |>
                        select(Age, Country, Plays, `Current Rank`, `Best Rank`, Backhand, `Height (cm)`, `Turned Pro`, Seasons, Titles, `Weight (kg)`)
                        
                    
processed_player_data

New names:
• `` -> `...1`
Rows: 500 Columns: 38
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (25): Age, Country, Plays, Wikipedia, Current Rank, Best Rank, Name, Bac...
dbl (13): ...1, Turned Pro, Seasons, Titles, Best Season, Retired, Masters, ...

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.


| Age | Country | Plays | Current Rank | Best Rank | Backhand | Height (cm) | Turned Pro | Seasons | Titles | Weight (kg) |
|---|---|---|---|---|---|---|---|---|---|---|
| `<dbl>` | `<fct>` | `<fct>` | `<dbl>` | `<dbl>` | `<fct>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` |
| 26 | Brazil | Right-handed | 378 | 363 | | | | | | |
| 18 | United Kingdom | Left-handed | 326 | 316 | Two-handed | | | | | |
| 32 | Slovakia | Right-handed | 178 | 44 | Two-handed | 185 | 2005 | 14 | | |
| 21 | Korea, Republic of | Right-handed | 236 | 130 | Two-handed | | | 2 | | |
| 27 | Australia | Right-handed | 183 | 17 | Two-handed | 193 | 2008 | 11 | 4 | |
| 22 | Poland | Right-handed | 31 | 31 | Two-handed | | 2015 | 5 | 1 | |
| 28 | United States | Right-handed | 307 | 213 | Two-handed | | 2010 | 1 | | |
| 21 | Taiwan, Province of China | Right-handed | 232 | 229 | Two-handed | | | 1 | | |
| 25 | Uzbekistan | Right-handed | 417 | 253 | Two-handed | | | 5 | | |
| 20 | Finland | Right-handed | 104 | 104 | Two-handed | | | 3 | | |
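
The cleaning step above relies on `separate()` splitting each combined text field at the first space and on `as.numeric()` converting the leading piece. The sketch below shows that pattern on two hypothetical rows. The raw string formats (`"26 (25-04-1993)"`, `"185 cm"`) and the names `toy_raw` and `toy_clean` are assumptions made for illustration, not values taken from the actual file.

```r
library(tidyverse)

# Hypothetical raw rows; the exact string formats are assumed, not read from the dataset.
toy_raw <- tibble(
  Age = c("26 (25-04-1993)", "18 (10-08-2001)"),
  Height = c("185 cm", NA)
)

toy_clean <- toy_raw |>
  # Split "26 (25-04-1993)" into "26" and "(25-04-1993)" at the space
  separate(Age, c("Age", "Birth date"), sep = " ") |>
  mutate(Age = as.numeric(Age)) |>
  # Split "185 cm" into "185" and "cm"; a missing value stays missing in both pieces
  separate(Height, c("Height (cm)", "unit"), sep = " ") |>
  mutate(`Height (cm)` = as.numeric(`Height (cm)`)) |>
  # Keep only the numeric columns, dropping the leftover text pieces
  select(Age, `Height (cm)`)

toy_clean
```

Missing heights pass through as `NA` rather than raising an error, which is consistent with the empty cells visible in the processed table.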


In [None]:
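# Summary for data: mean of every numeric column (ignoring missing values)
# and the number of players; n() comes last so across() does not pick it up
data_summary <- processed_player_data |>
                summarize(across(where(is.numeric), \(x) mean(x, na.rm = TRUE)),
                          n_players = n())

data_summary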
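
Several columns in the processed table shown earlier contain empty cells, so a per-column count of missing values may help decide which predictors are usable. The sketch below is one possible way to compute it. The tibble `toy_players` is a hypothetical stand-in that mirrors three columns of the first three displayed rows; in the notebook the same `summarize()` call could be applied to `processed_player_data`.

```r
library(dplyr)

# Stand-in data mirroring Age, Height (cm), and Titles for the first three displayed rows.
toy_players <- tibble::tibble(
  Age = c(26, 18, 32),
  `Height (cm)` = c(NA, NA, 185),
  Titles = c(NA_real_, NA_real_, NA_real_)
)

# Count the missing values in every column
missing_counts <- toy_players |>
  summarize(across(everything(), \(x) sum(is.na(x))))

missing_counts
```

A column that is missing for most players, as `Titles` is in this toy example, is a weak candidate for a predictor unless the missing values can be interpreted or imputed.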

In [None]:
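# Visualization for data
options(repr.plot.width=20, repr.plot.height=20)

# Current Rank vs Best Rank is one candidate pairing; swap in other predictors as needed
data_plot <- processed_player_data |>
            ggplot(aes(x = `Current Rank`, y = `Best Rank`)) +
            geom_point() +
            xlab("Current Rank") +
            ylab("Best Rank") +
            ggtitle("Current Rank vs Best Rank") +
            theme(text = element_text(size=20))

data_plot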

## Methods

### Relevant variables:
Predicted Variable: Best Rank
  
Predictors:
- Age
- Current Rank
- Height
- Year Turned Pro
- Current ELO Rank
- Best ELO Rank
- Peak ELO Rating
- Weight
- Plays (Left-handed or Right-handed)
- Backhand (Two-handed or One-handed)
- Seasons Played
- Country
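
To make the role of these variables concrete, here is a minimal sketch of how a few of them could feed a regression workflow in `tidymodels`. It assumes a linear regression, which the proposal does not specify, and it runs on synthetic data: `toy_players`, its column names, and the generated values are all hypothetical and say nothing about the real dataset.

```r
library(tidymodels)

set.seed(2023)

# Synthetic stand-in data; every value is made up so that the sketch runs.
toy_players <- tibble(
  current_rank = sample(1:500, 200),
  age = sample(18:38, 200, replace = TRUE),
  seasons = sample(1:20, 200, replace = TRUE)
) |>
  mutate(best_rank = pmax(1, round(current_rank * 0.6 + rnorm(200, sd = 20))))

# Hold out 25% of the players for evaluation
player_split <- initial_split(toy_players, prop = 0.75, strata = best_rank)
player_train <- training(player_split)
player_test <- testing(player_split)

# Model specification: ordinary least squares (an assumed choice)
lm_spec <- linear_reg() |>
  set_engine("lm") |>
  set_mode("regression")

# Recipe: best rank as the response, three of the listed predictors
lm_recipe <- recipe(best_rank ~ current_rank + age + seasons, data = player_train)

# Fit on the training set only
lm_fit <- workflow() |>
  add_recipe(lm_recipe) |>
  add_model(lm_spec) |>
  fit(data = player_train)

# Evaluate on the held-out players (RMSE, R-squared, MAE)
lm_metrics <- lm_fit |>
  predict(player_test) |>
  bind_cols(player_test) |>
  metrics(truth = best_rank, estimate = .pred)

lm_metrics
```

Splitting before fitting keeps the test players out of model training, so the reported error reflects predictions on players the model has not seen.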


### Outlined procedure:
1. one
2. two
3. three
4. four
5. five

### Visualization:
We plan to draw a scatter plot of the data with the fitted regression line overlaid to visualize our regression model.
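
A plot of this kind could be sketched with `ggplot2` as follows. The example uses synthetic data and lets `geom_smooth(method = "lm")` draw a simple least-squares line; the data frame `toy`, its columns, and the choice of Current Rank as the x-axis are assumptions for illustration, and the final plot may instead overlay predictions from the fitted model.

```r
library(ggplot2)

set.seed(1)

# Synthetic stand-in data; values are made up so that the sketch runs.
toy <- data.frame(current_rank = 1:100)
toy$best_rank <- pmax(1, round(0.6 * toy$current_rank + rnorm(100, sd = 8)))

regression_plot <- ggplot(toy, aes(x = current_rank, y = best_rank)) +
  # Scatter layer: one point per player
  geom_point(alpha = 0.6) +
  # Line layer: least-squares fit of best_rank on current_rank
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  labs(x = "Current Rank", y = "Best Rank",
       title = "Best Rank vs Current Rank with fitted line") +
  theme(text = element_text(size = 20))

regression_plot
```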


## Expected Outcomes and Significance

### Expected findings:

### Relevancy of findings:

### Future studies: