<a href="https://colab.research.google.com/github/Dselph28/Evaluating-First-Round-Pitchers-vs.-Position-Players-with-Prospects/blob/main/R_Project_for_Baseball_Projecting_Japanese_Players_to_MLB.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Introduction to the Project: NPB to MLB - Donovan Selph
The goal of this project is to analyze and project the performance of current Nippon Professional Baseball (NPB) players to determine their potential contributions to Major League Baseball (MLB). Many talented players have moved from Japan to MLB, including Shohei Ohtani, Ichiro Suzuki, and Yu Darvish, so the ability to predict how NPB players' statistical outputs translate to MLB environments is a valuable tool for scouts and analysts.

This coding project will involve:

1. Data Collection: Gathering player performance metrics from NPB, such as batting averages (BA), earned run averages (ERA), and other key statistics.
2. Data Normalization: Adjusting for league difficulty and ballpark factors to standardize NPB stats against MLB standards.
3. Model Development: Utilizing historical player transition data to build predictive models for projecting future MLB performance.
4. Visualization: Creating graphs and charts to clearly illustrate how NPB players might perform in the MLB based on their current metrics.
5. Insights: Generating actionable insights for baseball operations teams to evaluate the potential value of signing or drafting Japanese players.


In [None]:
library(tidyverse)

── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.1     ✔ tibble    3.2.1
✔ lubridate 1.9.4     ✔ tidyr     1.3.1
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors


In [None]:
if (!requireNamespace('pacman', quietly = TRUE)){
  install.packages('pacman')
}
pacman::p_load_current_gh("billpetti/baseballr")

Installing package into ‘/usr/local/lib/R/site-library’
(as ‘lib’ is unspecified)



curl         (6.0.1 -> 6.1.0   ) [CRAN]
RcppParallel (NA    -> 5.1.9   ) [CRAN]
parallelly   (NA    -> 1.41.0  ) [CRAN]
listenv      (NA    -> 0.9.1   ) [CRAN]
globals      (NA    -> 0.16.3  ) [CRAN]
plogr        (NA    -> 0.2.0   ) [CRAN]
plyr         (NA    -> 1.8.9   ) [CRAN]
BH           (NA    -> 1.87.0-1) [CRAN]
stringfish   (NA    -> 0.16.0  ) [CRAN]
RApiSeria... (NA    -> 0.1.4   ) [CRAN]
future       (NA    -> 1.34.0  ) [CRAN]
snakecase    (NA    -> 0.11.1  ) [CRAN]
zoo          (NA    -> 1.8-12  ) [CRAN]
RSQLite      (NA    -> 2.3.9   ) [CRAN]
reshape2     (NA    -> 1.4.4   ) [CRAN]
qs           (NA    -> 0.27.2  ) [CRAN]
progressr    (NA    -> 0.15.1  ) [CRAN]
ggrepel      (NA    -> 0.9.6   ) [CRAN]
furrr        (NA    -> 0.3.1   ) [CRAN]
janitor      (NA    -> 2.2.1   ) [CRAN]
── R CMD build ─────────────────────────────────────────────────────────────────
* checking for file ‘/tmp/RtmpPeq6m7/remotesf7218dc13/BillPetti-baseballr-ec1af2f/DESCRIP

In [None]:
install.packages("rvest")
install.packages("dplyr")
install.packages("readr")

Installing package into ‘/usr/local/lib/R/site-library’
(as ‘lib’ is unspecified)

Installing package into ‘/usr/local/lib/R/site-library’
(as ‘lib’ is unspecified)

Installing package into ‘/usr/local/lib/R/site-library’
(as ‘lib’ is unspecified)



In [None]:
if (!require("caret")) install.packages("caret", repos = "http://cran.us.r-project.org")
if (!require("ggplot2")) install.packages("ggplot2", repos = "http://cran.us.r-project.org")
if (!require("httr")) install.packages("httr", repos = "http://cran.us.r-project.org")

Loading required package: caret

“there is no package called ‘caret’”
Installing package into ‘/usr/local/lib/R/site-library’
(as ‘lib’ is unspecified)

also installing the dependencies ‘shape’, ‘future.apply’, ‘numDeriv’, ‘SQUAREM’, ‘diagram’, ‘lava’, ‘prodlim’, ‘proxy’, ‘iterators’, ‘clock’, ‘gower’, ‘hardhat’, ‘ipred’, ‘timeDate’, ‘e1071’, ‘foreach’, ‘ModelMetrics’, ‘pROC’, ‘recipes’


Loading required package: httr



In [None]:
library(tidyverse)
library(rvest)
library(httr)

# Example URLs (placeholders; replace with the actual data sources)
npb_data_url <- "https://www.baseball-reference.com/register/leader.cgi?type=bat&id=5e1f8b77"
mlb_data_url <- "https://www.mlb.com/stats/batting-average"

# Scrape the first HTML table found on a page
get_player_data <- function(url) {
  webpage <- read_html(url)

  # Adjust the selector to match your data source
  player_stats <- webpage %>%
    html_nodes('table') %>%
    html_table()

  # Stop with a clear message if the page has no static tables
  # (for example, when the table is rendered by JavaScript)
  if (length(player_stats) == 0) {
    stop("No HTML tables found at: ", url)
  }

  player_stats_df <- player_stats[[1]] # Assuming the first table is our desired one
  return(player_stats_df)
}

npb_stats <- get_player_data(npb_data_url)
mlb_stats <- get_player_data(mlb_data_url)


Attaching package: ‘rvest’


The following object is masked from ‘package:readr’:

    guess_encoding




We will take a few of the top hitters from the NPB Central League last season and estimate their "Win Shares" totals. Win Shares give a helpful estimate of what these hitters would be worth in MLB, using a method developed by Jim Albright in a BaseballGuru.com article.

In [None]:
# Provided dataset
npb_stats <- tribble(
  ~Rk, ~Name, ~Age, ~Tm, ~Lev, ~Aff, ~G, ~PA, ~AB, ~R, ~H, ~`2B`, ~`3B`, ~HR, ~RBI, ~SB, ~CS, ~BB, ~SO, ~BA, ~OBP, ~SLG, ~OPS, ~TB, ~GDP, ~HBP, ~SH, ~SF, ~IBB,
  1, "Tyler Austin", 32, "YKO", "Fgn", NA, 106, 445, 396, 66, 125, 34, 2, 25, 69, 0, 1, 45, 88, 0.316, 0.382, 0.601, 0.983, 238, 12, 0, 0, 4, 2,
  2, "Domingo Santana", 31, "YKU", "Fgn", NA, 122, 484, 419, 57, 132, 29, 0, 17, 70, 2, 1, 57, 101, 0.315, 0.399, 0.506, 0.905, 212, 8, 4, 0, 4, 1,
  3, "Hiroki Fukunaga", 27, "CNI", "Fgn", NA, 111, 402, 363, 40, 111, 22, 2, 6, 32, 9, 3, 27, 82, 0.306, 0.362, 0.427, 0.789, 155, 4, 6, 4, 2, 1,
  4, "Shingo Usami", 31, "CNI", "Fgn", NA, 61, 164, 152, 7, 46, 8, 0, 3, 17, 0, 0, 9, 38, 0.303, 0.337, 0.415, 0.752, 63, 5, 0, 1, 2, 0,
  5, "Elier Hernandez", 29, "YOM", "Fgn", NA, 56, 240, 221, 34, 65, 11, 0, 8, 30, 0, 1, 15, 58, 0.294, 0.346, 0.453, 0.798, 100, 8, 3, 0, 1, 1
)

In [None]:
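# Function to calculate Win Shares
calculate_win_shares <- function(df) {
  df <- df %>%
    mutate(
      Hits = H,
      Singles = H - `2B` - `3B` - HR, # hits that were not extra-base hits
      Walks = BB,
      Outs = AB - H,                  # at-bats that did not end in a hit
      # Weighted sum of batting events, halved and rounded to one decimal place
      Win_Shares = round(0.5 * (Singles / 6 + `2B` / 4 + `3B` / 3 + HR / 2 + Walks / 9 - Outs / 36), 1)
    )
  return(df)
}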

# Calculate Win Shares
npb_with_win_shares <- calculate_win_shares(npb_stats)

# View the data with calculated Win Shares
npb_with_win_shares %>%
  select(Name, G, PA, AB, R, H, `2B`, `3B`, HR, RBI, SB, CS, BB, SO, BA, OBP, SLG, OPS, TB, Win_Shares) %>%
  head()

Name,G,PA,AB,R,H,2B,3B,HR,RBI,SB,CS,BB,SO,BA,OBP,SLG,OPS,TB,Win_Shares
<chr>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>
Tyler Austin,106,445,396,66,125,34,2,25,69,0,1,45,88,0.316,0.382,0.601,0.983,238,14.9
Domingo Santana,122,484,419,57,132,29,0,17,70,2,1,57,101,0.315,0.399,0.506,0.905,212,14.2
Hiroki Fukunaga,111,402,363,40,111,22,2,6,32,9,3,27,82,0.306,0.362,0.427,0.789,155,9.3
Shingo Usami,61,164,152,7,46,8,0,3,17,0,0,9,38,0.303,0.337,0.415,0.752,63,3.7
Elier Hernandez,56,240,221,34,65,11,0,8,30,0,1,15,58,0.294,0.346,0.453,0.798,100,5.9
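
To make the arithmetic inside `calculate_win_shares()` concrete, the sketch below recomputes the estimate for a single stat line in base R. The helper name `win_shares_estimate` is hypothetical and introduced only for illustration; the weights are the ones used in the cell above.

```r
# Hypothetical helper (not part of the notebook): the same weights as
# calculate_win_shares(), applied to one hitter's season totals.
win_shares_estimate <- function(ab, h, doubles, triples, hr, bb) {
  singles <- h - doubles - triples - hr  # hits that were not extra-base hits
  outs    <- ab - h                      # at-bats that did not end in a hit
  raw <- singles / 6 + doubles / 4 + triples / 3 + hr / 2 + bb / 9 - outs / 36
  round(0.5 * raw, 1)
}

# Tyler Austin's line from npb_stats:
# AB = 396, H = 125, 2B = 34, 3B = 2, HR = 25, BB = 45
austin_ws <- win_shares_estimate(ab = 396, h = 125, doubles = 34,
                                 triples = 2, hr = 25, bb = 45)
print(austin_ws)  # 14.9, matching the table above
```

Writing the formula as a plain function of the six inputs makes it easy to spot-check any row by hand.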


Now I will do the same for the top MLB hitters so that the two groups can be compared.

In [None]:
# Sample data for the top 5 MLB hitters
mlb_stats <- tribble(
  ~Rk, ~Name, ~Pos, ~Team, ~G, ~PA, ~H, ~`2B`, ~`3B`, ~HR, ~RBI, ~SB, ~BB, ~SO, ~CS, ~HBP, ~BA, ~OBP, ~SLG, ~OPS,
  1, "B Witt Jr.", "SS", "KC", 161, 636, 125, 211, 45, 11, 32, 109, 57, 106, 31, 12, 0.332, 0.389, 0.588, 0.977,
  2, "V Guerrero Jr.", "1B", "TOR", 159, 616, 98, 199, 44, 1, 30, 103, 72, 96, 2, 2, 0.323, 0.396, 0.544, 0.940,
  3, "A Judge", "CF", "NYY", 158, 559, 122, 180, 36, 1, 58, 144, 133, 171, 10, 0, 0.322, 0.458, 0.701, 1.159,
  4, "L Arraez", "1B", "SD", 150, 637, 83, 200, 32, 3, 4, 46, 24, 29, 9, 3, 0.314, 0.346, 0.392, 0.738,
  5, "S Ohtani", "DH", "LAD", 159, 636, 134, 197, 38, 7, 54, 130, 81, 162, 59, 4, 0.310, 0.390, 0.646, 1.036
)
mlb_stats

Rk,Name,Pos,Team,G,PA,H,2B,3B,HR,RBI,SB,BB,SO,CS,HBP,BA,OBP,SLG,OPS
<dbl>,<chr>,<chr>,<chr>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>
1,B Witt Jr.,SS,KC,161,636,125,211,45,11,32,109,57,106,31,12,0.332,0.389,0.588,0.977
2,V Guerrero Jr.,1B,TOR,159,616,98,199,44,1,30,103,72,96,2,2,0.323,0.396,0.544,0.94
3,A Judge,CF,NYY,158,559,122,180,36,1,58,144,133,171,10,0,0.322,0.458,0.701,1.159
4,L Arraez,1B,SD,150,637,83,200,32,3,4,46,24,29,9,3,0.314,0.346,0.392,0.738
5,S Ohtani,DH,LAD,159,636,134,197,38,7,54,130,81,162,59,4,0.31,0.39,0.646,1.036


In [None]:
# Function to calculate Win Shares and round to one decimal place
calculate_win_shares <- function(df) {
  df <- df %>%
    mutate(
      Hits = H,
      Singles = H - `2B` - `3B` - HR,
      Walks = BB,
      Outs = PA - H - BB - HBP,
      Win_Shares = round(0.5 * (Singles / 6 + `2B` / 4 + `3B` / 3 + HR / 2 + Walks / 9 - Outs / 36), 1)
    )
  return(df)
}

# Calculate Win Shares for MLB hitters
mlb_with_win_shares <- calculate_win_shares(mlb_stats)

In [None]:
# View the data with calculated and rounded Win Shares
mlb_with_win_shares %>%
  select(Name, Pos, Team, G, PA, H, `2B`, `3B`, HR, RBI, SB, BB, SO, CS, HBP, BA, OBP, SLG, OPS, Win_Shares) %>%
  head()

Name,Pos,Team,G,PA,H,2B,3B,HR,RBI,SB,BB,SO,CS,HBP,BA,OBP,SLG,OPS,Win_Shares
<chr>,<chr>,<chr>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>
B Witt Jr.,SS,KC,161,636,125,211,45,11,32,109,57,106,31,12,0.332,0.389,0.588,0.977,21.8
V Guerrero Jr.,1B,TOR,159,616,98,199,44,1,30,103,72,96,2,2,0.323,0.396,0.544,0.94,18.1
A Judge,CF,NYY,158,559,122,180,36,1,58,144,133,171,10,0,0.322,0.458,0.701,1.159,24.0
L Arraez,1B,SD,150,637,83,200,32,3,4,46,24,29,9,3,0.314,0.346,0.392,0.738,12.4
S Ohtani,DH,LAD,159,636,134,197,38,7,54,130,81,162,59,4,0.31,0.39,0.646,1.036,22.4
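
Because the formula derives singles as `H - 2B - 3B - HR`, a quick sanity check is that no hitter ends up with negative singles. The sketch below is an assumption-labeled check in base R, not part of the original analysis, and `implied_singles` is a hypothetical helper name. It is applied to the first row of `mlb_stats` exactly as labeled above.

```r
# Hypothetical check (an assumption, not part of the original analysis):
# singles are derived as H - 2B - 3B - HR, so they should never be negative.
implied_singles <- function(h, doubles, triples, hr) {
  h - doubles - triples - hr
}

# First row of mlb_stats as labeled above: H = 125, 2B = 211, 3B = 45, HR = 11
witt_singles <- implied_singles(h = 125, doubles = 211, triples = 45, hr = 11)
print(witt_singles)       # -142
print(witt_singles >= 0)  # FALSE, so the row fails the check
```

A negative result suggests the values may not line up with the column labels. In this table the numbers after `Team` look as though they could follow the order G, AB, R, H, 2B, 3B, HR, RBI, BB, SO, SB, CS, which would be worth confirming against the source before interpreting the MLB Win Shares.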


The MLB and NPB Win Shares are combined below into a single table so that the two leagues can be compared.


In [None]:
# Standardize the columns for merging and remove unnecessary columns
npb_with_win_shares <- npb_with_win_shares %>%
  select(Name, G, PA, H, `2B`, `3B`, HR, RBI, SB, CS, BB, SO, BA, OBP, SLG, OPS, Win_Shares) %>%
  mutate(League = "NPB")

mlb_with_win_shares <- mlb_with_win_shares %>%
  select(Name, G, PA, H, `2B`, `3B`, HR, RBI, SB, CS, BB, SO, BA, OBP, SLG, OPS, Win_Shares) %>%
  mutate(League = "MLB")

combined_win_shares <- bind_rows(npb_with_win_shares, mlb_with_win_shares)

# View the combined data with Win Shares
combined_win_shares

Name,G,PA,H,2B,3B,HR,RBI,SB,CS,BB,SO,BA,OBP,SLG,OPS,Win_Shares,League
<chr>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<chr>
Tyler Austin,106,445,125,34,2,25,69,0,1,45,88,0.316,0.382,0.601,0.983,14.9,NPB
Domingo Santana,122,484,132,29,0,17,70,2,1,57,101,0.315,0.399,0.506,0.905,14.2,NPB
Hiroki Fukunaga,111,402,111,22,2,6,32,9,3,27,82,0.306,0.362,0.427,0.789,9.3,NPB
Shingo Usami,61,164,46,8,0,3,17,0,0,9,38,0.303,0.337,0.415,0.752,3.7,NPB
Elier Hernandez,56,240,65,11,0,8,30,0,1,15,58,0.294,0.346,0.453,0.798,5.9,NPB
B Witt Jr.,161,636,125,211,45,11,32,109,31,57,106,0.332,0.389,0.588,0.977,21.8,MLB
V Guerrero Jr.,159,616,98,199,44,1,30,103,2,72,96,0.323,0.396,0.544,0.94,18.1,MLB
A Judge,158,559,122,180,36,1,58,144,10,133,171,0.322,0.458,0.701,1.159,24.0,MLB
L Arraez,150,637,83,200,32,3,4,46,9,24,29,0.314,0.346,0.392,0.738,12.4,MLB
S Ohtani,159,636,134,197,38,7,54,130,59,81,162,0.31,0.39,0.646,1.036,22.4,MLB


According to Jim Albright of BaseballGuru.com, conversion factors can be applied to BA, OBP, SLG, and other statistics. For example, one of his articles takes the entire career of Sadaharu Oh, one of the greatest Japanese-born players, and projects it to MLB at roughly 160 games per season, which is close to a full MLB schedule. The conversion rates are applied below to project each NPB hitter's rate statistics, and later cells use those projections to estimate how many Win Shares each player would have earned in MLB this past season.

In [None]:
# Project NPB to MLB conversion factors
conversion_factors <- list(
  BA = 0.904918,
  OBP = 0.903082,
  SLG = 0.743477
)

# Function to project NPB stats to MLB equivalents
project_npb_to_mlb <- function(df, conversion_factors) {
  df <- df %>%
    mutate(
      Projected_BA = round(BA * conversion_factors$BA, 3),
      Projected_OBP = round(OBP * conversion_factors$OBP, 3),
      Projected_SLG = round(SLG * conversion_factors$SLG, 3),
      Projected_OPS = round(Projected_OBP + Projected_SLG, 3)
    )
  return(df)
}

# Project NPB hitters' performance to MLB
npb_projected_to_mlb <- project_npb_to_mlb(npb_with_win_shares, conversion_factors)
npb_projected_to_mlb

Name,G,PA,H,2B,3B,HR,RBI,SB,CS,⋯,BA,OBP,SLG,OPS,Win_Shares,League,Projected_BA,Projected_OBP,Projected_SLG,Projected_OPS
<chr>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,⋯,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<chr>,<dbl>,<dbl>,<dbl>,<dbl>
Tyler Austin,106,445,125,34,2,25,69,0,1,⋯,0.316,0.382,0.601,0.983,14.9,NPB,0.286,0.345,0.447,0.792
Domingo Santana,122,484,132,29,0,17,70,2,1,⋯,0.315,0.399,0.506,0.905,14.2,NPB,0.285,0.36,0.376,0.736
Hiroki Fukunaga,111,402,111,22,2,6,32,9,3,⋯,0.306,0.362,0.427,0.789,9.3,NPB,0.277,0.327,0.317,0.644
Shingo Usami,61,164,46,8,0,3,17,0,0,⋯,0.303,0.337,0.415,0.752,3.7,NPB,0.274,0.304,0.309,0.613
Elier Hernandez,56,240,65,11,0,8,30,0,1,⋯,0.294,0.346,0.453,0.798,5.9,NPB,0.266,0.312,0.337,0.649


In the projections from Jim Albright's "Fun with the Oh Projections" article, Oh averaged about 550 plate appearances for every 162 games, the length of an MLB season. The code below uses that figure as 550 at-bats, since hits are batting average times at-bats, to calculate how many hits each NPB player would have had in MLB this past year.

In [None]:
# Function to project NPB stats to MLB equivalents w/ABs and Hits
project_npb_to_mlb <- function(df, conversion_factors) {
  df <- df %>%
    mutate(
      Projected_BA = round(BA * conversion_factors$BA, 3),
      Projected_OBP = round(OBP * conversion_factors$OBP, 3),
      Projected_SLG = round(SLG * conversion_factors$SLG, 3),
      Projected_OPS = round(Projected_OBP + Projected_SLG, 3),
      Projected_AB = 550,
      Projected_H = round(Projected_BA * Projected_AB)
    )
  return(df)
}
npb_projected_to_mlb <- project_npb_to_mlb(npb_with_win_shares, conversion_factors)
npb_projected_to_mlb

Name,G,PA,H,2B,3B,HR,RBI,SB,CS,⋯,SLG,OPS,Win_Shares,League,Projected_BA,Projected_OBP,Projected_SLG,Projected_OPS,Projected_AB,Projected_H
<chr>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,⋯,<dbl>,<dbl>,<dbl>,<chr>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>
Tyler Austin,106,445,125,34,2,25,69,0,1,⋯,0.601,0.983,14.9,NPB,0.286,0.345,0.447,0.792,550,157
Domingo Santana,122,484,132,29,0,17,70,2,1,⋯,0.506,0.905,14.2,NPB,0.285,0.36,0.376,0.736,550,157
Hiroki Fukunaga,111,402,111,22,2,6,32,9,3,⋯,0.427,0.789,9.3,NPB,0.277,0.327,0.317,0.644,550,152
Shingo Usami,61,164,46,8,0,3,17,0,0,⋯,0.415,0.752,3.7,NPB,0.274,0.304,0.309,0.613,550,151
Elier Hernandez,56,240,65,11,0,8,30,0,1,⋯,0.453,0.798,5.9,NPB,0.266,0.312,0.337,0.649,550,146


With these stats in mind, we will calculate the other values that contribute to Win Shares to see how valuable these top NPB hitters could be in MLB, setting aside the actual MLB performance that some of them already have. We will calculate 2Bs, 3Bs, HRs, BBs, and then Outs to get each hitter's Win Shares.

Oh's projected doubles are about the same as his NPB totals (when accounting for about 30-40 more games), so the 2Bs will stay the same. For triples, Albright takes into account the bigger fields and the extra opportunities to take another base: for every season in which Oh has more than 1 triple, the projected 3Bs are about 2.71 times the NPB total.

In [None]:
# Project NPB to MLB conversion factors
conversion_factors <- list(
  BA = 0.904918,
  OBP = 0.903082,
  SLG = 0.743477,
  Triples = 2.71333
)


# Function to project NPB stats to MLB equivalents w/2Bs and 3Bs
project_npb_to_mlb <- function(df, conversion_factors) {
  df <- df %>%
    mutate(
      Projected_BA = round(BA * conversion_factors$BA, 3),
      Projected_OBP = round(OBP * conversion_factors$OBP, 3),
      Projected_SLG = round(SLG * conversion_factors$SLG, 3),
      Projected_OPS = round(Projected_OBP + Projected_SLG, 3),
      Projected_AB = 550,
      Projected_H = round(Projected_BA * Projected_AB),
      Projected_2B = `2B`,
      Projected_3B = ifelse(`3B` == 0, 0, round(`3B` * conversion_factors$Triples))
    )
  return(df)
}
npb_projected_to_mlb <- project_npb_to_mlb(npb_with_win_shares, conversion_factors)
npb_projected_to_mlb

Name,G,PA,H,2B,3B,HR,RBI,SB,CS,⋯,Win_Shares,League,Projected_BA,Projected_OBP,Projected_SLG,Projected_OPS,Projected_AB,Projected_H,Projected_2B,Projected_3B
<chr>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,⋯,<dbl>,<chr>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>
Tyler Austin,106,445,125,34,2,25,69,0,1,⋯,14.9,NPB,0.286,0.345,0.447,0.792,550,157,34,5
Domingo Santana,122,484,132,29,0,17,70,2,1,⋯,14.2,NPB,0.285,0.36,0.376,0.736,550,157,29,0
Hiroki Fukunaga,111,402,111,22,2,6,32,9,3,⋯,9.3,NPB,0.277,0.327,0.317,0.644,550,152,22,5
Shingo Usami,61,164,46,8,0,3,17,0,0,⋯,3.7,NPB,0.274,0.304,0.309,0.613,550,151,8,0
Elier Hernandez,56,240,65,11,0,8,30,0,1,⋯,5.9,NPB,0.266,0.312,0.337,0.649,550,146,11,0
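
The projected hits above are on a 550 at-bat basis, while the doubles and triples are carried over as raw season totals. One possible refinement, sketched here as an assumption rather than as Albright's method or what the cells above do, is to prorate a counting stat to the same 550 at-bats before applying any conversion factor. The helper `prorate_to_ab` is hypothetical.

```r
# Hypothetical helper: scale a season count to a target number of at-bats.
prorate_to_ab <- function(count, ab, target_ab = 550) {
  count * target_ab / ab
}

# Shingo Usami: 8 doubles in 152 AB (from npb_stats)
usami_2b_550 <- round(prorate_to_ab(count = 8, ab = 152))
print(usami_2b_550)  # 29
```

Putting every input on the same at-bat basis keeps part-season hitters from being credited with a full season of hits but only a partial season of extra-base hits.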


Walks are relatively similar from league to league, even with more outs projected, but Albright's projection system puts MLB walks at about 1.148 times the NPB number, so we will use that. Singles are found by subtracting the other hit types (2Bs, 3Bs, and HRs) from total hits. Home runs are hard to project because they depend on where the player ends up, but according to Jim Albright in his Cooperstown argument for Oh, "The overall totals are 23,817 matched at bats (for both NPB and MLB around the 70s and 80s), 575.0 major league home runs, and 1071.9 Central League homers. Thus, we will multiply Central League homers by 575.0/1071.9 or 0.536 to account for this difference." So home runs will be multiplied by 0.536.

In [None]:
# Project NPB to MLB conversion factors
conversion_factors <- list(
  BA = 0.904918,
  OBP = 0.903082,
  SLG = 0.743477,
  Triples = 2.71333,
  HR = 0.536431,
  BB = 1.148
)


# Function to project NPB stats to MLB equivalents w/HRs and BBs
project_npb_to_mlb <- function(df, conversion_factors) {
  df <- df %>%
    mutate(
      Projected_BA = round(BA * conversion_factors$BA, 3),
      Projected_OBP = round(OBP * conversion_factors$OBP, 3),
      Projected_SLG = round(SLG * conversion_factors$SLG, 3),
      Projected_OPS = round(Projected_OBP + Projected_SLG, 3),
      Projected_AB = 550,
      Projected_H = round(Projected_BA * Projected_AB),
      Projected_2B = `2B`,
      Projected_3B = ifelse(`3B` == 0, 0, round(`3B` * conversion_factors$Triples)),
      Projected_HR = round(HR * conversion_factors$HR),
      Projected_BB = round(BB * conversion_factors$BB)
    )
  return(df)
}
npb_projected_to_mlb <- project_npb_to_mlb(npb_with_win_shares, conversion_factors)
npb_projected_to_mlb

Name,G,PA,H,2B,3B,HR,RBI,SB,CS,⋯,Projected_BA,Projected_OBP,Projected_SLG,Projected_OPS,Projected_AB,Projected_H,Projected_2B,Projected_3B,Projected_HR,Projected_BB
<chr>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,⋯,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>
Tyler Austin,106,445,125,34,2,25,69,0,1,⋯,0.286,0.345,0.447,0.792,550,157,34,5,13,52
Domingo Santana,122,484,132,29,0,17,70,2,1,⋯,0.285,0.36,0.376,0.736,550,157,29,0,9,65
Hiroki Fukunaga,111,402,111,22,2,6,32,9,3,⋯,0.277,0.327,0.317,0.644,550,152,22,5,3,31
Shingo Usami,61,164,46,8,0,3,17,0,0,⋯,0.274,0.304,0.309,0.613,550,151,8,0,2,10
Elier Hernandez,56,240,65,11,0,8,30,0,1,⋯,0.266,0.312,0.337,0.649,550,146,11,0,4,17
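
As a short worked check, the home run factor in `conversion_factors` can be reproduced from the matched totals quoted from Albright above. The variable names below are illustrative only.

```r
# Matched totals quoted from Albright in the text above
mlb_hr <- 575.0
central_league_hr <- 1071.9

# Ratio of major league to Central League home runs over matched at-bats
hr_factor <- mlb_hr / central_league_hr
print(round(hr_factor, 6))  # 0.536431

# Applied to Tyler Austin's 25 NPB home runs
austin_projected_hr <- round(25 * hr_factor)
print(austin_projected_hr)  # 13
```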


Now we can reuse the Win Shares calculation from above to project what these players would produce in MLB with these projected statistics.

In [30]:
# Function to calculate Win Shares from NPB to MLB
calculate_win_shares_to_mlb <- function(df2) {
  df2 <- df2 %>%
    mutate(
      Hits = Projected_H,
      Singles = Projected_H - Projected_2B - Projected_3B - Projected_HR,
      Walks = Projected_BB,
      Outs = Projected_AB - Projected_H,
      Projected_Win_Shares = round(0.5 * (Singles / 6 + Projected_2B / 4 + Projected_3B / 3 + Projected_HR / 2 + Walks / 9 - Outs / 36), 1)
    )
  return(df2)
}

# Calculate Win Shares
npb_with_win_shares_to_mlb <- calculate_win_shares_to_mlb(npb_projected_to_mlb)

# View the data with calculated Win Shares
npb_with_win_shares_to_mlb %>%
  select(Name, Projected_AB, Projected_H, Projected_2B, Projected_3B, Projected_HR, Projected_BB, Projected_BA, Projected_OBP, Projected_SLG, Projected_OPS, Projected_Win_Shares) %>%
  head()

Name,Projected_AB,Projected_H,Projected_2B,Projected_3B,Projected_HR,Projected_BB,Projected_BA,Projected_OBP,Projected_SLG,Projected_OPS,Projected_Win_Shares
<chr>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>
Tyler Austin,550,157,34,5,13,52,0.286,0.345,0.447,0.792,14.5
Domingo Santana,550,157,29,0,9,65,0.285,0.36,0.376,0.736,13.9
Hiroki Fukunaga,550,152,22,5,3,31,0.277,0.327,0.317,0.644,10.7
Shingo Usami,550,151,8,0,2,10,0.274,0.304,0.309,0.613,8.3
Elier Hernandez,550,146,11,0,4,17,0.266,0.312,0.337,0.649,8.6


These projected Win Shares reflect each player's value over a full MLB season. Some of these players appeared in far fewer NPB games, so their NPB Win Shares cover only part of a season, much like WAR accumulated over part of a season versus a whole one. Taking these projections into account, it seems that although these players were some of the best hitters in NPB, the best of them, like Tyler Austin, Domingo Santana, and Hiroki Fukunaga, would compare more closely to the production and WAR total of Luis Arraez than to the best hitters in MLB, like Aaron Judge or Bobby Witt Jr.
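
Because some of these hitters played only part of an NPB season, one way to put their NPB Win Shares on a common footing is to express them per a fixed number of plate appearances. The sketch below is an illustrative assumption, not a step from Albright's method, and `ws_per_pa` is a hypothetical helper name. The 550 figure is borrowed from the projection basis used earlier.

```r
# Hypothetical helper: express Win Shares per a fixed number of plate appearances.
ws_per_pa <- function(win_shares, pa, per = 550) {
  round(win_shares * per / pa, 1)
}

# Shingo Usami: 3.7 NPB Win Shares in 164 PA (from the tables above)
usami_rate <- ws_per_pa(win_shares = 3.7, pa = 164)
print(usami_rate)  # 12.4
```

A rate like this describes NPB production only; it would still need the league conversion factors before being compared with MLB hitters.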