In [None]:
%%capture
!pip install anthropic

In [None]:
import anthropic, re
from google.colab import userdata

# Read the API key from Colab's Secrets panel rather than hardcoding it in the notebook.
# Add a secret named ANTHROPIC_API_KEY there and grant this notebook access to it.
ANTHROPIC_API_KEY = userdata.get("ANTHROPIC_API_KEY")
# Using Claude Opus, Anthropic's most powerful model and a state-of-the-art LLM as of 03/31/24.
MODEL_NAME = "claude-3-opus-20240229"
# Other options: "claude-3-haiku-20240307" and "claude-3-sonnet-20240229".
# Haiku is the cheapest; Opus is the most powerful and the most expensive.

# Approximate time and cost of running this for one player:
#   Haiku: about 35 seconds and $0.005
#   Opus:  about 205 seconds and $0.30

CLIENT = anthropic.Anthropic(api_key=ANTHROPIC_API_KEY)
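
The per-player figures in the comments above can be scaled to a full run. The sketch below is a rough estimate only: it assumes all 137 reviews in `reviews.csv` are scored one after another, and that the per-player time and cost stay constant. The names `per_player`, `n_players`, and `estimates` are illustrative and not part of the notebook.

```python
# Per-player (seconds, USD) figures, taken from the comments in the setup cell.
per_player = {"haiku": (35, 0.005), "opus": (205, 0.30)}

# Assumption: one sequential run over every review in reviews.csv.
n_players = 137

estimates = {}
for model, (seconds, dollars) in per_player.items():
    total_minutes = n_players * seconds / 60
    total_cost = n_players * dollars
    estimates[model] = (total_minutes, total_cost)
    print(f"{model}: roughly {total_minutes:.0f} minutes and ${total_cost:.2f}")
```

Under these assumptions a full Opus run takes several hours, so it may be worth testing on a small subset with Haiku first.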

## Files we're working with

### `examples.csv`

- **Row 1** (topmost row): The name of each category.
- **Row 2**: Name of the player whose review is an example of a 0.0 for this category.
- **Row 3**: Name of the player whose review is an example of a 0.5 for this category.
- **Row 4**: Name of the player whose review is an example of a 1.0 for this category.

### `reviews.csv`

- **Column A** (leftmost column): Name of the player.
- **Column B**: Review of the player.

## Making the files workable

1. **Save `reviews.csv` as a dictionary called `reviews_dict`, where:**
   - The keys are from Column A.
   - The values are from Column B.

2. **Create a dictionary called `examples_dict`, where:**
   - The keys are the names of each category.
   - The values are a list of length 3, with each index containing:
     - **Index 0**: Review of the player whose name was given as an example of a 0.0 in this category.
     - **Index 1**: Review of the player whose name was given as an example of a 0.5 in this category.
     - **Index 2**: Review of the player whose name was given as an example of a 1.0 in this category.

To do Step 2, we search the name in the keys of `reviews_dict` and get the corresponding value (review).
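
The Step 2 lookup can be sketched on toy data. This is a minimal illustration, not the notebook's loading code: the player names, reviews, and the `missing` list are all hypothetical, and the sketch records names that have no matching review instead of substituting placeholder text.

```python
# Toy stand-ins for reviews.csv and examples.csv (all names and text are made up).
toy_reviews = {
    "Player A": "Struggles to stay in front of quicker guards.",
    "Player B": "An average run-and-jump athlete.",
    "Player C": "Explosive leaper with elite speed.",
}
# One category with its 0.0, 0.5, and 1.0 example players; "Player D" has no review.
toy_examples = {"Athleticism": ["Player A", "Player B", "Player D"]}

toy_examples_dict = {}
missing = []  # (category, name) pairs whose review could not be found
for category, names in toy_examples.items():
    reviews = []
    for name in names:
        review = toy_reviews.get(name)
        if review is None:
            missing.append((category, name))
        reviews.append(review)
    toy_examples_dict[category] = reviews

print(toy_examples_dict["Athleticism"][0])
print(missing)
```

Checking `missing` before running the LLM helps catch misspelled names in `examples.csv`, which would otherwise end up in the prompt as a placeholder string.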

In [None]:
import pandas as pd
from pandas import DataFrame

# Load the full reviews.csv into a dictionary (player name -> review)
reviews_df = pd.read_csv('reviews.csv', header=None)
reviews_dict = pd.Series(reviews_df.iloc[:, 1].values, index=reviews_df.iloc[:, 0]).to_dict()
print(len(reviews_dict))

# Load the subset of reviews to score in this run into a dictionary
reviews1718cont_df = pd.read_csv('reviews1920updated.csv', header=None)
reviews1718cont_dict = pd.Series(reviews1718cont_df.iloc[:, 1].values, index=reviews1718cont_df.iloc[:, 0]).to_dict()
print(len(reviews1718cont_dict))

# Load examples.csv (one column per category)
categories_examples_df = pd.read_csv('examples.csv')
categories = categories_examples_df.columns.tolist()

# Creating the desired dictionary
examples_dict = {}
for category in categories:
    # Extracting the player names for 0.0, 0.5, and 1.0 examples
    player_names = categories_examples_df[category].tolist()
    # Getting the corresponding reviews from the reviews dictionary
    reviews_list = [reviews_dict.get(name, "Review not found") for name in player_names]
    examples_dict[category] = reviews_list

137
3


In [None]:
# Check that reviews looks right
reviews_dict["Lonzo Ball"]

"Ball is ranked in the top three by virtually every team in the league, and a handful of lottery teams have him No. 1 on their boards. He's one of the best passers to come into the draft in a decade. He's also a good athlete with deep range on his jumper who plays with an edge that teams love. His unorthodox jumper and so-so defensive effort are issues, but Ball's ability to make others around him better makes him an elite prospect.\n"

In [None]:
# Check that examples looks right
examples_dict["Athleticism"]

["Scouts and executives half-jokingly deemed Merrill the Luka Doncic of the Mountain West for his step-back 3s, cerebral game and ability to get to all of his spots by way of deception, forceful change of direction and strength. Although clearly not in the same stratosphere as the 6-8 Doncic, Merrill did show that same type of clutch gene as a primary shot creator against both SDSU and New Mexico, never getting rattled or sped up, playing at his own pace and drilling a handful of off-the-dribble 3s from well beyond NBA distance. While not the most creative live-dribble passer, he's more than capable of running the show for stretches, seeing over the top of the defense and making the right read, especially with teams having to fight over ball screens because of his shooting.\nWith short arms, an undefined frame and less than stellar run-and-jump athleticism, Merrill will still have some skeptics in NBA circles. He has had some trouble containing more explosive perimeter players over his

In [None]:
# Check that reviews looks right
reviews1718cont_dict["Jason Preston"]

"One of the best passers in the college game, Jason Preston led Ohio to the second round of the NCAA tournament with an upset win over Virginia and then proceeded to help himself even further with a strong showing at the NBA draft combine. He has excellent size and length for a guard, which helps compensate somewhat for his skinny frame and lack of speed and explosiveness, things that his detractors point to as issues.\n\nHe'll have to work through translating his outstanding feel for the game from college to the NBA ranks, his inconsistent jumper, and at times his porous defense. Preston has some of the best basketball instincts of any player in this class and is capable of making every read a point guard needs operating out of pick-and-roll. With the dearth of true point guards this size, Preston is likely a more consistent jumper away from carving out a niche in the NBA as he brings several winning intangibles to the table that teams covet with his unselfishness and overall basketba

## Feeding Data into the LLM

Now, we use the data from `reviews_dict` and `examples_dict` to generate prompts for the language model (LLM) as follows:

1. **Iterate through every key (player name) in the dictionary of reviews to score (`reviews_dict`, or a smaller subset of it). For each key:**
   - `{TEXT}` = The key's value (the player's review).

2. **Then, for each player, iterate through every key (category) in `examples_dict` and its corresponding values (examples):**
   - `{CATEGORY}` = Key from `examples_dict`.
   - `{EXAMPLE_00}` = Index 0 of the category's list of examples (the 0.0 example).
   - `{EXAMPLE_05}` = Index 1 of the category's list of examples (the 0.5 example).
   - `{EXAMPLE_10}` = Index 2 of the category's list of examples (the 1.0 example).

The LLM is called once for each (player, category) pair, so every player receives a rating in every category.
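
The nested iteration can be sketched without calling the LLM. The template, reviews, and examples below are shortened, hypothetical stand-ins; only the placeholder names match the ones described above.

```python
# Shortened stand-in for the real prompt; it uses the same placeholder names.
toy_template = (
    "Review: {TEXT}\n"
    "Category: {CATEGORY}\n"
    "0 example: {EXAMPLE_00}\n"
    "0.5 example: {EXAMPLE_05}\n"
    "1 example: {EXAMPLE_10}"
)
toy_reviews = {"Player A": "Great passer.", "Player B": "Streaky shooter."}
toy_examples = {
    "Passing": ["poor passer", "adequate passer", "elite passer"],
    "Shooting": ["poor shooter", "adequate shooter", "elite shooter"],
}

prompts = []
for player, review in toy_reviews.items():
    for category, examples in toy_examples.items():
        # One filled-in prompt per (player, category) pair.
        prompts.append(
            toy_template.format(
                TEXT=review,
                CATEGORY=category,
                EXAMPLE_00=examples[0],
                EXAMPLE_05=examples[1],
                EXAMPLE_10=examples[2],
            )
        )

print(len(prompts))  # number of players times number of categories
print(prompts[0])
```

Because each pair produces its own request, the number of API calls grows with the product of the two dictionary sizes.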

### Writing to `results_dict`

We store the outcomes in a dictionary named `results_dict`, structured as:
- **Key**: Player name.
- **Value**: List of single-entry dictionaries, one per category, where each contains:
  - **Key**: Category name.
  - **Value**: List of length 3, with each index containing:
    - **Index 0**: Raw output from the LLM.
    - **Index 1**: Text within the `<reasoning>` tags of the LLM output, or `None` if the tags weren't found.
    - **Index 2**: Text within the `<rating>` tags of the LLM output, or `None` if the tags weren't found.
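
As a concrete picture of this structure, the sketch below builds one toy entry and reads the ratings back out. The stored rating is a string (or `None`), so the hypothetical helper `parse_rating`, which is not part of the notebook, shows one possible way to turn it into a number while treating `NA`, missing, or out-of-range values as `None`.

```python
from typing import Optional


def parse_rating(raw: Optional[str]) -> Optional[float]:
    """Return the rating as a float in [0, 1], or None if it is NA, missing, or malformed."""
    if raw is None:
        return None
    try:
        value = float(raw.strip())
    except ValueError:
        return None  # e.g. "NA"
    return value if 0.0 <= value <= 1.0 else None


# Toy entry in the shape described above (all content is made up).
toy_results = {
    "Player A": [
        {"Athleticism": ["<reasoning>Quick first step.</reasoning><rating>0.7</rating>",
                         "Quick first step.", "0.7"]},
        {"Passing": ["The review does not discuss passing. <rating>NA</rating>", None, "NA"]},
    ]
}

numeric = {}
for category_dict in toy_results["Player A"]:
    for category, (raw, reasoning, rating) in category_dict.items():
        numeric[category] = parse_rating(rating)

print(numeric)
```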

In [None]:
prompt = '''
Here is the text of a qualitative review of a basketball player:

<review>
{TEXT}
</review>

Your task is to analyze the review text above and output a quantitative rating for the player in the
following category: {CATEGORY}

The rating should be a decimal between 0 and 1, based on this scale:
0 = The review describes the player as terrible in this category
0.5 = The review describes the player as neutral in this category
1 = The review describes the player as phenomenal in this category
Output NA if the category is not captured in the review.

To help calibrate your rating scale, here are some example reviews and their ratings in the
{CATEGORY} category:

<example_00>
{EXAMPLE_00}
</example_00>
This review corresponds to a 0 rating in the {CATEGORY} category.

<example_05>
{EXAMPLE_05}
</example_05>
This review corresponds to a 0.5 rating in the {CATEGORY} category.

<example_10>
{EXAMPLE_10}
</example_10>
This review corresponds to a 1 rating in the {CATEGORY} category.

Carefully read the review text provided, and compare it to the example reviews to determine the most
appropriate rating in the {CATEGORY} category for this player.

First, write your reasoning for the rating you will give inside <reasoning> tags. Explain how the
review compares to the 0, 0.5 and 1 example reviews in terms of what it says about the player's
abilities in the {CATEGORY} category.

Then, output your final quantitative rating inside <rating> tags. This should be a decimal between 0
and 1, or NA if the category is not captured in the review. Make sure your rating is well-calibrated
to the 0, 0.5 and 1 example reviews provided.
'''

In [None]:
def extract_between_tags(tag: str, string: str, strip: bool = False) -> list[str]:
    ext_list = re.findall(f"<{tag}>(.+?)</{tag}>", string, re.DOTALL)
    if strip:
        ext_list = [e.strip() for e in ext_list]
    return ext_list
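
A quick check of the helper on a made-up response shows what it returns when tags are present and when they are absent. The function is repeated here only so the snippet runs on its own; `sample_output` is a hypothetical string, not real model output.

```python
import re


def extract_between_tags(tag: str, string: str, strip: bool = False) -> list[str]:
    # Same logic as the cell above: non-greedy match across newlines.
    ext_list = re.findall(f"<{tag}>(.+?)</{tag}>", string, re.DOTALL)
    if strip:
        ext_list = [e.strip() for e in ext_list]
    return ext_list


sample_output = (
    "<reasoning>\nThe review praises his passing.\n</reasoning>\n"
    "<rating> 0.8 </rating>"
)

reasoning = extract_between_tags("reasoning", sample_output, strip=True)
rating = extract_between_tags("rating", sample_output, strip=True)
no_tags = extract_between_tags("rating", "No tags in this text.")

print(reasoning)
print(rating)
print(no_tags)
```

The helper always returns a list, which is why the main loop takes the first element and falls back to `None` when the list is empty.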

In [None]:
# Initialize results storage
results_dict = {}

# Change to whatever smaller dictionary you created.
# enumerate() supplies the counter used to track progress on the run.
for counter, (player, review) in enumerate(reviews1718cont_dict.items()):
    # Blank CSV rows load as NaN (a float). A NaN name cannot be concatenated
    # or meaningfully scored, so skip any row without a name or a review.
    if not isinstance(player, str) or not isinstance(review, str):
        print(f"{counter} skipped: missing player name or review")
        continue
    print(f"{counter} {player}")
    print()
    results_dict[player] = []
    for category, examples in examples_dict.items():
        # Replace variables in prompt
        prompt_with_variables = prompt.format(
            TEXT=review,
            CATEGORY=category,
            EXAMPLE_00=examples[0],
            EXAMPLE_05=examples[1],
            EXAMPLE_10=examples[2]
        )
        # Execute the LLM call
        llm_output = CLIENT.messages.create(
            model=MODEL_NAME,
            max_tokens=4096,
            messages=[
                {
                    "role": "user",
                    "content": prompt_with_variables
                },
            ],
        ).content[0].text
        # Extract reasoning and rating
        reasoning = extract_between_tags("reasoning", llm_output, strip=True)
        rating = extract_between_tags("rating", llm_output, strip=True)
        # Store results
        results_dict[player].append({
            category: [
                llm_output,  # Raw LLM output
                reasoning[0] if reasoning else None,  # Extracted reasoning
                rating[0] if rating else None,  # Extracted rating
            ]
        })

0 Filip Petrusev

1 Jason Preston



TypeError: can only concatenate str (not "float") to str

In [None]:
# Check the outcome for one player (pick a player from the dictionary you just tested)
player_to_check = "Jason Preston"
player_results = results_dict[player_to_check]

for category_dict in player_results:
    for category, details in category_dict.items():
        # details holds [raw output, reasoning, rating]; print the last two
        reasoning, rating = details[1], details[2]
        print(f"Category: {category}\nRating: {rating}\nReasoning: {reasoning}\n")

### Saving to CSV

Finally, we save this information to a CSV file (`results2017.csv` in the cell below), where:
- **Column A** (leftmost column): Contains the names of the players.
- **The remaining columns**: Contain the unpacked values from `results_dict`, with the column titles in the form:
  - `[Category Name] - Raw`
  - `[Category Name] - Reasoning`
  - `[Category Name] - Rating`
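
The column layout can be previewed on a toy result before running the real flattening cell. The sketch below uses plain Python only, and the player, categories, and values are hypothetical.

```python
# Toy results in the same shape as results_dict (all content is made up).
toy_results = {
    "Player A": [
        {"Athleticism": ["raw athleticism output", "Quick first step.", "0.7"]},
        {"Passing": ["raw passing output", None, None]},
    ]
}

rows = []
for player, category_results in toy_results.items():
    row = {"Player": player}
    for category_result in category_results:
        for category, (raw, reasoning, rating) in category_result.items():
            # Each category expands into three columns.
            row[f"{category} - Raw"] = raw
            row[f"{category} - Reasoning"] = reasoning
            row[f"{category} - Rating"] = rating
    rows.append(row)

print(list(rows[0].keys()))
```

Each player becomes one row, with three columns per category following the `Player` column.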

In [None]:
# Flatten the results_dict for DataFrame conversion: one row per player,
# three columns (Raw, Reasoning, Rating) per category
rows = []
for player, category_results in results_dict.items():
    row = {'Player': player}
    for category_result in category_results:
        for category, values in category_result.items():
            row[f"{category} - Raw"] = values[0]
            row[f"{category} - Reasoning"] = values[1]
            row[f"{category} - Rating"] = values[2]
    rows.append(row)

df = DataFrame(rows)

# Save the data as a CSV file
df.to_csv("results2017.csv", index=False)