# Using `diamondfp` with `pybaseball`

[`pybaseball`](https://github.com/jldbc/pybaseball) is a baseball data library that scrapes Baseball Reference, Baseball Savant, and FanGraphs. It can retrieve Statcast data, pitching stats, batting stats, division standings and team records, awards data, and more. This notebook shows how to combine `diamondfp` fingerprints with the library.

In [1]:
from pybaseball import batting_stats
from diamondfp.fingerprints import binaryfp
from diamondfp.utils.features import generate_quantiles
from diamondfp.scoring import tanimoto

## Calling `pybaseball`

Here we look only at batting stats, but the same approach works for pitching and fielding stats. We also focus on just the current 2025 season, though it could equally be used to calculate similarity scores between a player's seasons across multiple years (e.g. compare Aaron Judge's last five seasons to see whether any of them are outliers).

In [2]:
df = batting_stats(2025)

## Generating Feature Quantiles

Because `pybaseball` scrapes Baseball Savant, we get access to advanced stats such as wOBA, Barrel%, exit velocity, and launch angle, which older historical records either lack or do not expose as readily. Note that certain stats are only available from the 2015 season onward.

In [3]:
stat_features = {
    "HR": [0.9, 0.99],
    "K%": [0.1, 0.25],
    "BB%": [0.75, 0.99],
    "AVG": [0.5, 0.75, 0.9, 0.95],
    "OBP": [0.5, 0.75, 0.9, 0.95],
    "SLG": [0.5, 0.75, 0.9, 0.95],
    "OPS": [0.5, 0.75, 0.9, 0.95],
    "wOBA": [0.5, 0.75, 0.9, 0.95],
    "Barrel%": [0.5, 0.75, 0.9, 0.95],
    "EV": [0.5, 0.75, 0.9, 0.95], # exit velocity
    "LA": [0.5, 0.75, 0.9, 0.95], # launch angle
}

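# Compute the quantile values for each stat listed above
feat_quants = generate_quantiles(df, stat_features)
# Build one binary fingerprint per player (one row per player)
df["Fingerprint"] = df.apply(lambda x: binaryfp(x, feat_quants), axis=1)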
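
To make the idea of a quantile-based fingerprint concrete, here is a minimal sketch, using only the standard library, of how quantile cut-offs could be turned into bits. It is an illustration under assumptions, not `diamondfp`'s actual implementation: the helpers `quantile`, `toy_thresholds`, and `toy_binaryfp` are hypothetical, the rule "a bit is 1 when the stat is at or above the cut-off" is assumed, and the player numbers are made up.

```python
from typing import Dict, List, Sequence


def quantile(values: Sequence[float], q: float) -> float:
    """Quantile with linear interpolation between the two nearest ranks."""
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def toy_thresholds(
    rows: List[Dict[str, float]], features: Dict[str, List[float]]
) -> Dict[str, List[float]]:
    """Map each stat to the cut-off values at its requested quantiles."""
    return {
        stat: [quantile([row[stat] for row in rows], q) for q in qs]
        for stat, qs in features.items()
    }


def toy_binaryfp(
    row: Dict[str, float], thresholds: Dict[str, List[float]]
) -> List[int]:
    """One bit per (stat, quantile) pair: 1 if the player meets the cut-off (assumed rule)."""
    return [
        int(row[stat] >= cut)
        for stat, cuts in thresholds.items()
        for cut in cuts
    ]


# Made-up numbers, not real player data.
players = [
    {"HR": 5, "AVG": 0.240},
    {"HR": 12, "AVG": 0.255},
    {"HR": 20, "AVG": 0.270},
    {"HR": 31, "AVG": 0.285},
    {"HR": 45, "AVG": 0.310},
]
features = {"HR": [0.5, 0.9], "AVG": [0.5]}

cuts = toy_thresholds(players, features)
fingerprints = [toy_binaryfp(player, cuts) for player in players]
```

In this toy setup each player ends up with three bits (two for `HR`, one for `AVG`), so adding more stats or more quantiles per stat lengthens the fingerprint.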

## Comparing Two Players

For this example, let's compare the seasons that Cal Raleigh and Aaron Judge are having.

In [4]:
cal_raleigh = df.Fingerprint[df["Name"] == "Cal Raleigh"].iloc[0]
aaron_judge = df.Fingerprint[df["Name"] == "Aaron Judge"].iloc[0]
sim_score = tanimoto(cal_raleigh, aaron_judge)
print(f"Tanimoto Similarity: {sim_score:0.2f}")  

Tanimoto Similarity: 0.68
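
For readers unfamiliar with the metric, the Tanimoto (Jaccard) similarity of two binary fingerprints is the number of bits set in both divided by the number of bits set in at least one. The sketch below is a plain-Python illustration of that formula; `toy_tanimoto` is a hypothetical helper, not the `diamondfp.scoring.tanimoto` function used above, and the two fingerprints are invented.

```python
from typing import Sequence


def toy_tanimoto(a: Sequence[int], b: Sequence[int]) -> float:
    """Tanimoto (Jaccard) similarity of two equal-length 0/1 vectors."""
    if len(a) != len(b):
        raise ValueError("fingerprints must have the same length")
    both = sum(1 for x, y in zip(a, b) if x and y)    # bits set in both
    either = sum(1 for x, y in zip(a, b) if x or y)   # bits set in at least one
    # Two all-zero fingerprints share no set bits; returning 0.0 is a choice made here.
    return both / either if either else 0.0


# Invented fingerprints for illustration only.
fp_a = [1, 1, 0, 1, 0, 0]
fp_b = [1, 0, 0, 1, 1, 0]
score = toy_tanimoto(fp_a, fp_b)
print(f"Tanimoto Similarity: {score:0.2f}")  # prints "Tanimoto Similarity: 0.50"
```

A score of 1.0 means the two fingerprints set exactly the same bits, while 0.0 means they share none.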


## Wrap Up

There we go! `pybaseball` is easy to combine with `diamondfp`, and it lets us build larger, more sophisticated fingerprints with little extra effort.