## Group Proposal

**Predicting whether an NBA player will make it to the All-Star team based on their performance statistics in the 2023 season using the 2021-2022 seasons.**


For our group proposal, we want to predict which NBA players will make it to the All-Star selections based on their performance. The NBA (National Basketball Association) is a professional basketball league in North America that features 30 teams. Each team consists of players who compete against each other in regular season games, with the goal of making it to the playoffs and eventually winning the NBA championship.

The NBA All-Star Game is an annual exhibition game that features the best players from each conference. The players are selected based on a combination of fan, player, and media voting. Being selected to the All-Star team is a significant accomplishment for NBA players and is often seen as a sign of their individual success and impact on the league.

The NBA tracks various performance statistics for each player, including points per game, rebounds per game, assists per game, field goal percentage, and many others. These statistics are used to evaluate a player's performance and value to their team.

In recent years, there has been an increasing interest in using machine learning and data analysis techniques to predict various outcomes in sports, including player performance, team success, and player awards such as All-Star selections. The NBA 2021-2022 per-game statistics dataset provides a rich source of data that can be used to build predictive models for various outcomes, including predicting whether a player will make it to the All-Star team in the 2023 season. 


In [1]:
### Loading of base packages manually 

library(datasets)   # loading of library datasets
library(tidyverse)  # loading of tidyverse library
library(tidymodels)
library(tibble)
library(httr)
options(repr.matrix.max.rows = 6)
# source('tests.R')

── [1mAttaching packages[22m ─────────────────────────────────────── tidyverse 1.3.1 ──

[32m✔[39m [34mggplot2[39m 3.3.6     [32m✔[39m [34mpurrr  [39m 0.3.4
[32m✔[39m [34mtibble [39m 3.1.7     [32m✔[39m [34mdplyr  [39m 1.0.9
[32m✔[39m [34mtidyr  [39m 1.2.0     [32m✔[39m [34mstringr[39m 1.4.0
[32m✔[39m [34mreadr  [39m 2.1.2     [32m✔[39m [34mforcats[39m 0.5.1

── [1mConflicts[22m ────────────────────────────────────────── tidyverse_conflicts() ──
[31m✖[39m [34mdplyr[39m::[32mfilter()[39m masks [34mstats[39m::filter()
[31m✖[39m [34mdplyr[39m::[32mlag()[39m    masks [34mstats[39m::lag()

── [1mAttaching packages[22m ────────────────────────────────────── tidymodels 1.0.0 ──

[32m✔[39m [34mbroom       [39m 1.0.0     [32m✔[39m [34mrsample     [39m 1.0.0
[32m✔[39m [34mdials       [39m 1.0.0     [32m✔[39m [34mtune        [39m 1.0.0
[32m✔[39m [34minfer       [39m 1.0.2     [32m✔[39m [34mworkflows   [39m 1.0.0
[32m✔

In [2]:
# Read the data from website
# URL of the team statistics page for the 2021-2022 NBA season and 2022-2023 NBA season.
# data pulled from: url1 <- "https://www.basketball-reference.com/leagues/NBA_2023_per_game.html"
#                   url2 <- "https://www.basketball-reference.com/leagues/NBA_2022_per_game.html"
basket_2023 <- read_csv("https://raw.githubusercontent.com/DrakenRaptor/Section006-28-Proposal/main/NBA%202022-2023(1).csv")
basket_2023

basket_2022 <- read_csv("https://raw.githubusercontent.com/DrakenRaptor/Section006-28-Proposal/main/2021-2022%20NBA.csv")
basket_2022

[1mRows: [22m[34m644[39m [1mColumns: [22m[34m31[39m
[36m──[39m [1mColumn specification[22m [36m────────────────────────────────────────────────────────[39m
[1mDelimiter:[22m ","
[31mchr[39m  (4): Player, Pos, Tm, Player-additional\
[32mdbl[39m (27): Rk, Age, G, GS, MP, FG, FGA, FG%, 3P, 3PA, 3P%, 2P, 2PA, 2P%, eFG%...

[36mℹ[39m Use `spec()` to retrieve the full column specification for this data.
[36mℹ[39m Specify the column types or set `show_col_types = FALSE` to quiet this message.


Rk,Player,Pos,Age,Tm,G,GS,MP,FG,FGA,⋯,ORB,DRB,TRB,AST,STL,BLK,TOV,PF,PTS,Player-additional\
<dbl>,<chr>,<chr>,<dbl>,<chr>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,⋯,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<chr>
1,Precious Achiuwa,C,23,TOR,42,11,947,156,325,⋯,81,184,265,42,26,26,49,85,408,achiupr01\
2,Steven Adams,C,29,MEM,42,42,1133,157,263,⋯,214,271,485,97,36,46,79,98,361,adamsst01\
3,Bam Adebayo,C,25,MIA,61,61,2137,509,943,⋯,154,434,588,200,75,50,153,172,1295,adebaba01\
⋮,⋮,⋮,⋮,⋮,⋮,⋮,⋮,⋮,⋮,⋱,⋮,⋮,⋮,⋮,⋮,⋮,⋮,⋮,⋮,⋮
514,Trae Young,PG,24,ATL,58,58,2037,486,1137,⋯,44,129,173,585,65,9,237,86,1545,youngtr01\
515,Cody Zeller,C,30,MIA,7,0,96,14,22,⋯,11,8,19,5,1,3,5,17,42,zelleco01\
516,Ivica Zubac,C,25,LAC,61,61,1780,245,396,⋯,203,413,616,65,24,81,104,179,620,zubaciv01}


[1mRows: [22m[34m813[39m [1mColumns: [22m[34m31[39m
[36m──[39m [1mColumn specification[22m [36m────────────────────────────────────────────────────────[39m
[1mDelimiter:[22m ","
[31mchr[39m  (5): Sn, Player, Pos, Tm, Player-additional\
[32mdbl[39m (26): Age, G, GS, MP, FG, FGA, FG%, 3P, 3PA, 3P%, 2P, 2PA, 2P%, eFG%, FT...

[36mℹ[39m Use `spec()` to retrieve the full column specification for this data.
[36mℹ[39m Specify the column types or set `show_col_types = FALSE` to quiet this message.


Sn,Player,Pos,Age,Tm,G,GS,MP,FG,FGA,⋯,ORB,DRB,TRB,AST,STL,BLK,TOV,PF,PTS,Player-additional\
<chr>,<chr>,<chr>,<dbl>,<chr>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,⋯,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<chr>
1,Precious Achiuwa,C,22,TOR,73,28,1725,265,603,⋯,146,327,473,82,37,41,84,151,664,achiupr01\
2,Steven Adams,C,28,MEM,76,75,1999,210,384,⋯,349,411,760,256,65,60,115,153,528,adamsst01\
3,Bam Adebayo,C,24,MIA,56,56,1825,406,729,⋯,137,427,564,190,80,44,148,171,1068,adebaba01\
⋮,⋮,⋮,⋮,⋮,⋮,⋮,⋮,⋮,⋮,⋱,⋮,⋮,⋮,⋮,⋮,⋮,⋮,⋮,⋮,⋮
604,Cody Zeller,C,29,POR,27,0,355,51,90,⋯,50,75,125,22,8,6,19,56,140,zelleco01\
605,Ivica Zubac,C,24,LAC,76,76,1852,310,495,⋯,217,427,644,120,36,77,114,203,785,zubaciv01\
},,,,,,,,,,⋯,,,,,,,,,,


In [3]:
# Select the GS(Game Started), eFG%(Effective Field Goal Precentage), and PTS (Points Per Game) to be the three variables we will focused on in this assignment
selected_2023 <- basket_2023 |>
                 select(Player, GS, "eFG%", PTS) 
selected_2023

selected_2022 <- basket_2022 |>
                 select(Player, GS, "eFG%", PTS) 
selected_2022

Player,GS,eFG%,PTS
<chr>,<dbl>,<dbl>,<dbl>
Precious Achiuwa,11,0.512,408
Steven Adams,42,0.597,361
Bam Adebayo,61,0.540,1295
⋮,⋮,⋮,⋮
Trae Young,58,0.482,1545
Cody Zeller,0,0.636,42
Ivica Zubac,61,0.619,620


Player,GS,eFG%,PTS
<chr>,<dbl>,<dbl>,<dbl>
Precious Achiuwa,28,0.486,664
Steven Adams,75,0.547,528
Bam Adebayo,56,0.557,1068
⋮,⋮,⋮,⋮
Cody Zeller,0,0.567,140
Ivica Zubac,76,0.626,785
,,,


**Given that the datsets consists of various rows, there are various data that are not require for our analysis. Hence, we used select functions in R, to manipulate to give us 4 columns that we required: the Player name, GS, eFG%, Points as shown above.**

*The code as shown in the next cell was manually created to filter the names.* 

In [4]:
# Create a vector for basketball player that is all-star in 2023
all_stars_2023 <- c("Kyrie Irving", "Donovan Mitchell", "Giannis Antetokounmpo", "Kevin Durant", "Jayson Tatum", "Jaylen Brown", 
                  "DeMar DeRozan", "Tyrese Haliburton", "Jrue Holiday", "Julius Randle", "Bam Adebayo", "Joel Embiid", "Pascal Siakam", 
                  "Stephen Curry", "Luka Dončić", "Nikola Jokić", "Lebron James", "Zion Williamson", "Shai Gilgeous-Alexander", "Damian Lillard", 
                  "Ja Morant", "Paul George", "Jaren Jackson Jr.", "Lauri Markkanen", "Domantas Sabonis","Anthony Edwards", "De'Aaron Fox")
all_stars_2023

# Create a vector for basketball player that is all-star in 2022
all_stars_2022 <- c("Stephen Curry", "LeBron James", "Giannis Antetokounmpo", "DeMar DeRozan", "Nikola Jokić", "Luka Dončić", 
                    "Darius Garland", "Jarrett Allen", "Fred VanVleet", "Jimmy Butler", "Chris Paul", "Joel Embiid", "Jayson Tatum", 
                    "Trae Young", "Ja Morant", "Andrew Wiggins", "Devin Booker", "Dejounte Murray", "LaMelo Ball", "Khris Middleton", 
                    "Karl-Anthony Towns", "Rudy Gobert", "Zach LaVine")         
all_stars_2022

# Mutate a new column to determine the player is all-star or not.
basket_2023_mutate <- selected_2023 |>
    mutate(all_star = "No")

basket_2022_mutate <- selected_2022 |>
    mutate(all_star = "No")

print(basket_2023_mutate, n = 10)
print(basket_2022_mutate, n = 10)

[90m# A tibble: 644 × 5[39m
   Player                      GS `eFG%`   PTS all_star
   [3m[90m<chr>[39m[23m                    [3m[90m<dbl>[39m[23m  [3m[90m<dbl>[39m[23m [3m[90m<dbl>[39m[23m [3m[90m<chr>[39m[23m   
[90m 1[39m Precious Achiuwa            11  0.512   408 No      
[90m 2[39m Steven Adams                42  0.597   361 No      
[90m 3[39m Bam Adebayo                 61  0.54   [4m1[24m295 No      
[90m 4[39m Ochai Agbaji                 6  0.558   232 No      
[90m 5[39m Santi Aldama                18  0.567   562 No      
[90m 6[39m Nickeil Alexander-Walker     3  0.582   287 No      
[90m 7[39m Nickeil Alexander-Walker     3  0.591   228 No      
[90m 8[39m Nickeil Alexander-Walker     0  0.55     59 No      
[90m 9[39m Grayson Allen               59  0.581   648 No      
[90m10[39m Jarrett Allen               60  0.651   869 No      
[90m# … with 634 more rows[39m
[90m# A tibble: 813 × 5[39m
   Player                      

In [5]:
# Change the value of the all-stars column for all-star players in 2023 to true
basket_2023_mutate$all_star[basket_2023_mutate$Player == "Kyrie Irving"] = "Yes"
basket_2023_mutate$all_star[basket_2023_mutate$Player == "Donovan Mitchell"] = "Yes"
basket_2023_mutate$all_star[basket_2023_mutate$Player == "Giannis Antetokounmpo"] = "Yes"
basket_2023_mutate$all_star[basket_2023_mutate$Player == "Kevin Duran"] = "Yes"
basket_2023_mutate$all_star[basket_2023_mutate$Player == "Jayson Tatum"] = "Yes"
basket_2023_mutate$all_star[basket_2023_mutate$Player == "Jaylen Brown"] = "Yes"
basket_2023_mutate$all_star[basket_2023_mutate$Player == "DeMar DeRozan"] = "Yes"
basket_2023_mutate$all_star[basket_2023_mutate$Player == "Tyrese Haliburton"] = "Yes"
basket_2023_mutate$all_star[basket_2023_mutate$Player == "Jrue Holiday"] = "Yes"
basket_2023_mutate$all_star[basket_2023_mutate$Player == "Julius Randle"] = "Yes"
basket_2023_mutate$all_star[basket_2023_mutate$Player == "Bam Adebayo"] = "Yes"
basket_2023_mutate$all_star[basket_2023_mutate$Player == "Joel Embiid"] = "Yes"
basket_2023_mutate$all_star[basket_2023_mutate$Player == "Pascal Siakam"] = "Yes"
basket_2023_mutate$all_star[basket_2023_mutate$Player == "Stephen Curry"] = "Yes"
basket_2023_mutate$all_star[basket_2023_mutate$Player == "Luka Dončić"] = "Yes"
basket_2023_mutate$all_star[basket_2023_mutate$Player == "Nikola Jokić"] = "Yes"
basket_2023_mutate$all_star[basket_2023_mutate$Player == "Lebron James"] = "Yes"
basket_2023_mutate$all_star[basket_2023_mutate$Player == "Zion Williamson"] = "Yes"
basket_2023_mutate$all_star[basket_2023_mutate$Player == "Shai Gilgeous-Alexander"] = "Yes"
basket_2023_mutate$all_star[basket_2023_mutate$Player == "De'Aaron Fox"] = "Yes"
basket_2023_mutate$all_star[basket_2023_mutate$Player == "Anthony Edwards"] = "Yes"
basket_2023_mutate$all_star[basket_2023_mutate$Player == "Domantas Sabonis"] = "Yes"
basket_2023_mutate$all_star[basket_2023_mutate$Player == "Lauri Markkanen"] = "Yes"
basket_2023_mutate$all_star[basket_2023_mutate$Player == "Jaren Jackson Jr."] = "Yes"
basket_2023_mutate$all_star[basket_2023_mutate$Player == "Paul George"] = "Yes"
basket_2023_mutate$all_star[basket_2023_mutate$Player == "Ja Morant"] = "Yes"
basket_2023_mutate$all_star[basket_2023_mutate$Player == "Damian Lillard"] = "Yes"
print(basket_2023_mutate, n = 10)

[90m# A tibble: 644 × 5[39m
   Player                      GS `eFG%`   PTS all_star
   [3m[90m<chr>[39m[23m                    [3m[90m<dbl>[39m[23m  [3m[90m<dbl>[39m[23m [3m[90m<dbl>[39m[23m [3m[90m<chr>[39m[23m   
[90m 1[39m Precious Achiuwa            11  0.512   408 No      
[90m 2[39m Steven Adams                42  0.597   361 No      
[90m 3[39m Bam Adebayo                 61  0.54   [4m1[24m295 Yes     
[90m 4[39m Ochai Agbaji                 6  0.558   232 No      
[90m 5[39m Santi Aldama                18  0.567   562 No      
[90m 6[39m Nickeil Alexander-Walker     3  0.582   287 No      
[90m 7[39m Nickeil Alexander-Walker     3  0.591   228 No      
[90m 8[39m Nickeil Alexander-Walker     0  0.55     59 No      
[90m 9[39m Grayson Allen               59  0.581   648 No      
[90m10[39m Jarrett Allen               60  0.651   869 No      
[90m# … with 634 more rows[39m


In [6]:
# Change the value of the all-stars column for all-star players in 2022 to true
basket_2022_mutate$all_star[basket_2022_mutate$Player == "Stephen Curry"] = "Yes"
basket_2022_mutate$all_star[basket_2022_mutate$Player == "LeBron James"] = "Yes"
basket_2022_mutate$all_star[basket_2022_mutate$Player == "Giannis Antetokounmpo"] = "Yes"
basket_2022_mutate$all_star[basket_2022_mutate$Player == "DeMar DeRozan"] = "Yes"
basket_2022_mutate$all_star[basket_2022_mutate$Player == "Nikola Jokić"] = "Yes"
basket_2022_mutate$all_star[basket_2022_mutate$Player == "Luka Dončić"] = "Yes"
basket_2022_mutate$all_star[basket_2022_mutate$Player == "Darius Garland"] = "Yes"
basket_2022_mutate$all_star[basket_2022_mutate$Player == "Jarrett Allen"] = "Yes"
basket_2022_mutate$all_star[basket_2022_mutate$Player == "Fred VanVleet"] = "Yes"
basket_2022_mutate$all_star[basket_2022_mutate$Player == "Jimmy Butler"] = "Yes"
basket_2022_mutate$all_star[basket_2022_mutate$Player == "Chris Paul"] = "Yes"
basket_2022_mutate$all_star[basket_2022_mutate$Player == "Joel Embiid"] = "Yes"
basket_2022_mutate$all_star[basket_2022_mutate$Player == "Jayson Tatum"] = "Yes"
basket_2022_mutate$all_star[basket_2022_mutate$Player == "Trae Young"] = "Yes"
basket_2022_mutate$all_star[basket_2022_mutate$Player == "Ja Morant"] = "Yes"
basket_2022_mutate$all_star[basket_2022_mutate$Player == "Andrew Wiggins"] = "Yes"
basket_2022_mutate$all_star[basket_2022_mutate$Player == "Devin Booker"] = "Yes"
basket_2022_mutate$all_star[basket_2022_mutate$Player == "Dejounte Murray"] = "Yes"
basket_2022_mutate$all_star[basket_2022_mutate$Player == "LaMelo Ball"] = "Yes"
basket_2022_mutate$all_star[basket_2022_mutate$Player == "Khris Middleton"] = "Yes"
basket_2022_mutate$all_star[basket_2022_mutate$Player == "Karl-Anthony Towns"] = "Yes"
basket_2022_mutate$all_star[basket_2022_mutate$Player == "Rudy Gobert"] = "Yes"
basket_2022_mutate$all_star[basket_2022_mutate$Player == "Zach LaVine"] = "Yes"

print(basket_2022_mutate, n = 10)

[90m# A tibble: 813 × 5[39m
   Player                      GS `eFG%`   PTS all_star
   [3m[90m<chr>[39m[23m                    [3m[90m<dbl>[39m[23m  [3m[90m<dbl>[39m[23m [3m[90m<dbl>[39m[23m [3m[90m<chr>[39m[23m   
[90m 1[39m Precious Achiuwa            28  0.486   664 No      
[90m 2[39m Steven Adams                75  0.547   528 No      
[90m 3[39m Bam Adebayo                 56  0.557  [4m1[24m068 No      
[90m 4[39m Santi Aldama                 0  0.424   132 No      
[90m 5[39m LaMarcus Aldridge           12  0.566   607 No      
[90m 6[39m Nickeil Alexander-Walker    21  0.449   692 No      
[90m 7[39m Nickeil Alexander-Walker    19  0.45    639 No      
[90m 8[39m Nickeil Alexander-Walker     2  0.438    53 No      
[90m 9[39m Grayson Allen               61  0.588   733 No      
[90m10[39m Jarrett Allen               56  0.678   904 Yes     
[90m# … with 803 more rows[39m


In [7]:
# Identify duplicated rows based on column x
duplicated_player_2023 <- duplicated(basket_2023_mutate$Player) | duplicated(basket_2023_mutate$Player, fromLast = TRUE)
duplicated_player_2022 <- duplicated(basket_2022_mutate$Player) | duplicated(basket_2022_mutate$Player, fromLast = TRUE)
# Exclude duplicated rows
basket_2023_np <- subset(basket_2023_mutate, !duplicated_player_2023)
basket_2022_np <- subset(basket_2022_mutate, !duplicated_player_2022)
# table(df_new$Player)
table(basket_2023_np$all_star)
table(basket_2022_np$all_star)


 No Yes 
428  24 


 No Yes 
486  23 

In [8]:
# Calculate the propotion between all-star player and normal player
prop.table(table(basket_2023_np$all_star))
prop.table(table(basket_2022_np$all_star))
table(basket_2023_np$all_star)
table(basket_2022_np$all_star)


        No        Yes 
0.94690265 0.05309735 


        No        Yes 
0.95481336 0.04518664 


 No Yes 
428  24 


 No Yes 
486  23 

In [12]:
asfactor_all_star <- basket_2023_mutate |>
                     mutate(all_star = as_factor(all_star))
asfactor_all_star

Player,GS,eFG%,PTS,all_star
<chr>,<dbl>,<dbl>,<dbl>,<fct>
Precious Achiuwa,11,0.512,408,No
Steven Adams,42,0.597,361,No
Bam Adebayo,61,0.540,1295,Yes
⋮,⋮,⋮,⋮,⋮
Trae Young,58,0.482,1545,No
Cody Zeller,0,0.636,42,No
Ivica Zubac,61,0.619,620,No


In [None]:
# new_observation <- nearest_neighbor(weight_func = "rectangular", neighbors = 10) |>
#         set_engine("kknn") |>
#        set_mode("classification")

# new_observation

# new_observation_recipe <- recipe(Player ~ PTS + GS, data = basket_2023_mutate)   # Determine if data is basketball_player_2023_mutate or all_stars_2022
#     # step_scale(all_predictors()) |>
#     # step_center(all_predictors())

# new_observation_recipe

# new_observation_fit <- workflow() |>
#     add_recipe(new_observation_recipe) |>
#     add_model(new_observation) |>
#     fit(data = basket_2023_mutate)

# new_observation_fit




#dont need this stuff now 

In [14]:
split <- initial_split(asfactor_all_star, prop = 0.75, strata = all_star) 
new_obs_training <- training(split)
new_obs_training
new_obs_testing <- testing(split)
new_obs_testing

Player,GS,eFG%,PTS,all_star
<chr>,<dbl>,<dbl>,<dbl>,<fct>
Precious Achiuwa,11,0.512,408,No
Steven Adams,42,0.597,361,No
Bam Adebayo,61,0.540,1295,Yes
⋮,⋮,⋮,⋮,⋮
McKinley Wright IV,1,0.510,57,No
Trae Young,58,0.482,1545,No
Cody Zeller,0,0.636,42,No


Player,GS,eFG%,PTS,all_star
<chr>,<dbl>,<dbl>,<dbl>,<fct>
Jarrett Allen,60,0.651,869,No
Giannis Antetokounmpo,52,0.559,1621,Yes
Thanasis Antetokounmpo,0,0.273,16,No
⋮,⋮,⋮,⋮,⋮
Delon Wright,8,0.561,256,No
Thaddeus Young,9,0.566,240,No
Ivica Zubac,61,0.619,620,No


In [None]:
options(repr.plot.width = 15, repr.plot.height = 15) 

position_plot <- all_stars_2022 |>
    ggplot(aes(x = PTS, y = `eFG%`, color = Player)) +
    geom_point() +
    labs(x = "Points", y = "eFG%", color = "Position", title = "Points vs Position") +
    theme(text = element_text(size = 20))

position_plot

In [None]:
options(repr.plot.width = 15, repr.plot.height = 15) 

rk_plot <- page |>
    ggplot(aes(x = PTS, y = GS)) +
    geom_point(alpha = 0.8) +
    labs(x = "Points", y = "GS", color = "Position", title = "Age vs Pos") +
    theme(text = element_text(size = 20))

rk_plot

Rk, PTS, GS, FG%
<font color = "blue"> Our research will be predicting whether an NBA player will make it to the All-Star Team based on their performance statisitics in the 2023 season. Based Based on 2021-2022 years dataset, mutate a new column all star(binary, yes or no), select three variables, make the predictive model, split to train-test dataset, verify accuracy, retest on 2023 data
</font>

In [None]:
# new_observation <- nearest_neighbor(weight_func = "rectangular", neighbors = 10) |>
#         set_engine("kknn") |>
#        set_mode("classification")

# new_observation

# new_observation_recipe <- recipe(Pos ~ Rk + GS, data = basketball_player_2023) |>
#     step_scale(all_predictors()) |>
#     step_center(all_predictors())

# new_observation_recipe

# new_observation_fit <- workflow() |>
#     add_recipe(new_observation_recipe) |>
#     add_model(new_observation) |>
#     fit(data = basketball_player_2023)

# new_observation_fit

Our research question is to ...

The scatterplot as shown above shows the relationship between sepal length and petal length which is distinguished through its species Iris.