# **INFO:**

Project info is at the bottom of this doc. I wanted to test the API first to confirm that the data the project needs can actually be gathered before doing a write-up.
***

Testing the Marvel Rivals API

key: 

rate limit: 500 requests per minute

In [1]:
# Advanced example with session management and retries
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry
import pandas as pd
import json
import time
import numpy as np
import matplotlib.pyplot as plt

In [None]:
# Define common parameters

# api/player-match/{id} needs the UID of the player not the name
player_id = 946297425 # For WoozyMckay

# API key from Lunar Client
API_key = ''

In [3]:
# Define the API client
class MRAPIClient:
    def __init__(self, base_url='https://mrapi.org'):
        self.base_url = base_url
        self.session = requests.Session()

        retry_strategy = Retry(
            total=3,
            backoff_factor=1,
            status_forcelist=[500, 502, 503, 504]
        )

        adapter = HTTPAdapter(max_retries=retry_strategy)
        self.session.mount("http://", adapter)
        self.session.mount("https://", adapter)

    def get_data(self, endpoint):
        try:
            response = self.session.get(f"{self.base_url}{endpoint}", headers={'X-API-Key': API_key})
            response.raise_for_status()
            return response.json()
        except requests.exceptions.RequestException as e:
            print(f"Error fetching data: {e}")
            raise



In [7]:

client = MRAPIClient()
data = client.get_data('/api/player-match/946297425')
print(data)

[{'match_timestamp': 1739048473, 'match_duration': {'minutes': 7, 'seconds': 49, 'raw': 469}, 'season': '2', 'match_uid': '6714034_1739047847_94_11001_10', 'match_map': {'id': 1170, 'name': 'Royal Palace - Yggsgard', 'gamemode': 'domination'}, 'score': {'ally': 2, 'enemy': 2}, 'winner_side': 0, 'mvp_uid': 851765237, 'svp_uid': 946297425, 'gamemode': {'id': 1, 'name': 'quick-play'}, 'stats': {'kills': 5, 'deaths': 6, 'assists': 5, 'is_win': False, 'has_escaped': False, 'hero': {'id': 1031}}}, {'match_timestamp': 1738984433, 'match_duration': {'minutes': 7, 'seconds': 39, 'raw': 459}, 'season': '2', 'match_uid': '6713872_1738983799_755_11001_10', 'match_map': {'id': 1272, 'name': "Birnin T'Challa - Intergalactic Empire of Wakanda", 'gamemode': 'domination'}, 'score': {'ally': 2, 'enemy': 2}, 'winner_side': 0, 'mvp_uid': 536679700, 'svp_uid': 951112935, 'gamemode': {'id': 2, 'name': 'competitive'}, 'stats': {'kills': 3, 'deaths': 6, 'assists': 3, 'is_win': False, 'has_escaped': False, 'he

In [8]:
df = pd.DataFrame(data)
display(df)

Unnamed: 0,match_timestamp,match_duration,season,match_uid,match_map,score,winner_side,mvp_uid,svp_uid,gamemode,stats
0,1739048473,"{'minutes': 7, 'seconds': 49, 'raw': 469}",2,6714034_1739047847_94_11001_10,"{'id': 1170, 'name': 'Royal Palace - Yggsgard'...","{'ally': 2, 'enemy': 2}",0,851765237,946297425,"{'id': 1, 'name': 'quick-play'}","{'kills': 5, 'deaths': 6, 'assists': 5, 'is_wi..."
1,1738984433,"{'minutes': 7, 'seconds': 39, 'raw': 459}",2,6713872_1738983799_755_11001_10,"{'id': 1272, 'name': 'Birnin T'Challa - Interg...","{'ally': 2, 'enemy': 2}",0,536679700,951112935,"{'id': 2, 'name': 'competitive'}","{'kills': 3, 'deaths': 6, 'assists': 3, 'is_wi..."
2,1738716504,"{'minutes': 10, 'seconds': 46, 'raw': 646}",2,4890706_1738715635_70_11001_10,"{'id': 1272, 'name': 'Birnin T'Challa - Interg...","{'ally': 1, 'enemy': 1}",1,597511711,1602193538,"{'id': 2, 'name': 'competitive'}","{'kills': 20, 'deaths': 5, 'assists': 3, 'is_w..."
3,1738708425,"{'minutes': 11, 'seconds': 29, 'raw': 689}",2,6713712_1738707645_728_11001_10,"{'id': 1101, 'name': 'Hall of Djalia - Interga...",{},0,1729261467,510830805,"{'id': 1, 'name': 'quick-play'}","{'kills': 20, 'deaths': 6, 'assists': 0, 'is_w..."
4,1738472543,"{'minutes': 8, 'seconds': 9, 'raw': 489}",2,6713494_1738471964_628_11001_10,"{'id': 1101, 'name': 'Hall of Djalia - Interga...",{},1,1811191434,747941503,"{'id': 1, 'name': 'quick-play'}","{'kills': 7, 'deaths': 6, 'assists': 1, 'is_wi..."
5,1738471812,"{'minutes': 6, 'seconds': 58, 'raw': 418}",2,6713530_1738471301_187_11001_10,"{'id': 1101, 'name': 'Hall of Djalia - Interga...",{},0,1935871575,1768923224,"{'id': 1, 'name': 'quick-play'}","{'kills': 17, 'deaths': 1, 'assists': 1, 'is_w..."
6,1738127290,"{'minutes': 17, 'seconds': 2, 'raw': 1022}",2,6713849_1738126045_816_11001_10,"{'id': 1291, 'name': 'Midtown - Empire of Eter...","{'ally': 3, 'enemy': 3}",0,936055193,1225899974,"{'id': 2, 'name': 'competitive'}","{'kills': 20, 'deaths': 5, 'assists': 37, 'is_..."
7,1738125133,"{'minutes': 4, 'seconds': 22, 'raw': 262}",2,6714917_1738124779_527_11001_10,"{'id': 1034, 'name': 'Shin-Shibuya - Tokyo 209...",{},0,432152608,403803439,"{'id': 1, 'name': 'quick-play'}","{'kills': 13, 'deaths': 1, 'assists': 0, 'is_w..."
8,1738085641,"{'minutes': 8, 'seconds': 5, 'raw': 485}",2,6713662_1738084929_110_11001_10,"{'id': 1267, 'name': 'Hall of Djalia - Interga...","{'ally': 0, 'enemy': 0}",1,972498423,962752765,"{'id': 2, 'name': 'competitive'}","{'kills': 6, 'deaths': 6, 'assists': 0, 'is_wi..."
9,1737758372,"{'minutes': 8, 'seconds': 49, 'raw': 529}",2,6713643_1737757676_743_11001_10,"{'id': 1288, 'name': 'Hell's Heaven - Hydra Ch...","{'ally': 0, 'enemy': 0}",1,1178622012,164334222,"{'id': 2, 'name': 'competitive'}","{'kills': 22, 'deaths': 8, 'assists': 4, 'is_w..."


Let's take a quick look at the `stats` column.

In [24]:
stats_data = df['stats']
stats_df = pd.json_normalize(stats_data)
stats_df.index = df['match_uid']
display(stats_df)

match_uid,kills,deaths,assists,is_win,has_escaped,hero.id
6714034_1739047847_94_11001_10,5,6,5,False,False,1031
6713872_1738983799_755_11001_10,3,6,3,False,False,1015
4890706_1738715635_70_11001_10,20,5,3,True,False,1041
6713712_1738707645_728_11001_10,20,6,0,False,False,1048
6713494_1738471964_628_11001_10,7,6,1,False,False,1033
6713530_1738471301_187_11001_10,17,1,1,True,False,1033
6713849_1738126045_816_11001_10,20,5,37,True,False,1031
6714917_1738124779_527_11001_10,13,1,0,True,False,1029
6713662_1738084929_110_11001_10,6,6,0,False,False,1027
6713643_1737757676_743_11001_10,22,8,4,False,False,1041


***
### Initial Thoughts

Generally speaking, it seems that we only get about 20 matches per player. There's likely some way to persist older results, but that doesn't matter all that much here. The stats above are for the individual player, whereas we're more curious about the stats for the match as a whole.

We likely want to take the `match_uid` values from this output, then plug them into further API requests for the matches themselves.

In [10]:
matches = df['match_uid'].to_list() 
print(matches)

['6714034_1739047847_94_11001_10', '6713872_1738983799_755_11001_10', '4890706_1738715635_70_11001_10', '6713712_1738707645_728_11001_10', '6713494_1738471964_628_11001_10', '6713530_1738471301_187_11001_10', '6713849_1738126045_816_11001_10', '6714917_1738124779_527_11001_10', '6713662_1738084929_110_11001_10', '6713643_1737757676_743_11001_10', '6714041_1737681747_560_11001_10', '6713658_1737652077_166_11001_10', '6714560_1737496044_85_11001_10', '6713689_1737320655_332_11001_10', '6714235_1737256844_925_11001_10', '6713695_1737217222_178_11001_10', '4890825_1737129860_130_11001_10', '6713804_1737079817_728_11001_10', '6714104_1737078417_532_11001_10', '6714077_1737075380_398_11001_10']
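
Plugging each `match_uid` into `/api/match/{match_uid}` could look roughly like the sketch below. It assumes the 500 requests per minute limit noted at the top applies to this endpoint. `fetch_matches` and its parameters are hypothetical names, not part of the API. The fixed spacing between calls is the simplest possible throttle, not a tuned one.

```python
import time

def fetch_matches(match_uids, fetch, max_per_minute=500, sleep=time.sleep):
    """Fetch every match in match_uids, spacing calls to stay under the rate limit.

    `fetch` is any callable that takes an endpoint string and returns parsed JSON,
    e.g. the client's get_data method. (Hypothetical helper, assumed interface.)
    """
    min_interval = 60.0 / max_per_minute  # seconds between consecutive requests
    results = {}
    for i, uid in enumerate(match_uids):
        if i > 0:
            sleep(min_interval)  # fixed spacing; simple but conservative
        try:
            results[uid] = fetch(f"/api/match/{uid}")
        except Exception as exc:
            # One failed match should not abort the whole collection run.
            print(f"Skipping {uid}: {exc}")
    return results
```

With the objects defined in this notebook, usage would be something like `all_matches = fetch_matches(matches, client.get_data)`, which returns a dict keyed by `match_uid`.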


In [None]:
match_data = client.get_data('/api/match/6714034_1739047847_94_11001_10')
print(match_data)

{'match_uid': '6714034_1739047847_94_11001_10', 'replay_id': '10477527920', 'mvp': {'player_uid': 851765237, 'hero_id': 1027}, 'svp': {'player_uid': 946297425, 'hero_id': 1041}, 'gamemode': {'id': 1, 'name': 'quick-play'}, 'players': [{'player_uid': 1760979252, 'name': 'Boogerscrotum', 'hero_id': 1050, 'is_win': False, 'kills': 2, 'deaths': 4, 'assists': 3, 'hero_damage': 1434, 'hero_healed': 6792.121636390686, 'damage_taken': 5171.407166004181, 'heroes': [{'hero_id': 1050, 'playtime': {'minutes': 7, 'seconds': 39, 'raw': 459}, 'kills': 2, 'deaths': 4, 'assists': 3, 'hit_rate': 0.5477707006369427}]}, {'player_uid': 2060844458, 'name': 'SpiderManBulge', 'hero_id': 1023, 'is_win': False, 'kills': 3, 'deaths': 8, 'assists': 3, 'hero_damage': 6320.204075336456, 'hero_healed': 2842.7498240470886, 'damage_taken': 11076.655098438263, 'heroes': [{'hero_id': 1040, 'playtime': {'seconds': 0, 'raw': 0}, 'kills': 0, 'deaths': 0, 'assists': 0, 'hit_rate': 0}, {'hero_id': 1039, 'playtime': {'minutes

In [15]:
match_df = pd.json_normalize(match_data)
display(match_df)

Unnamed: 0,match_uid,replay_id,players,mvp.player_uid,mvp.hero_id,svp.player_uid,svp.hero_id,gamemode.id,gamemode.name
0,6714034_1739047847_94_11001_10,10477527920,"[{'player_uid': 1760979252, 'name': 'Boogerscr...",851765237,1027,946297425,1041,1,quick-play


In [16]:
# we're more interested in the player data
player_data = match_data['players']
player_df = pd.json_normalize(player_data)
display(player_df)

Unnamed: 0,player_uid,name,hero_id,is_win,kills,deaths,assists,hero_damage,hero_healed,damage_taken,heroes
0,1760979252,Boogerscrotum,1050,False,2,4,3,1434.0,6792.121636,5171.407166,"[{'hero_id': 1050, 'playtime': {'minutes': 7, ..."
1,2060844458,SpiderManBulge,1023,False,3,8,3,6320.204075,2842.749824,11076.655098,"[{'hero_id': 1040, 'playtime': {'seconds': 0, ..."
2,946297425,WoozyMckay,1041,False,5,6,5,7897.712778,6203.631268,3900.479958,"[{'hero_id': 1031, 'playtime': {'minutes': 2, ..."
3,294125044,Portal112,1029,False,10,8,0,5170.03537,0.0,4896.128411,"[{'hero_id': 1029, 'playtime': {'minutes': 7, ..."
4,1816976623,DraconianSpy,1018,False,4,6,0,6029.966664,0.0,19273.281858,"[{'hero_id': 1011, 'playtime': {'minutes': 5, ..."
5,805995708,Pilot_Service,1042,False,8,8,0,9626.563036,945.880249,8636.996516,"[{'hero_id': 1048, 'playtime': {'minutes': 4, ..."
6,851765237,Super_49,1027,True,24,3,0,11634.407031,0.0,14727.332698,"[{'hero_id': 1027, 'playtime': {'minutes': 7, ..."
7,553022362,melody_69,1038,True,18,3,2,6870.429959,0.0,4459.110641,"[{'hero_id': 1038, 'playtime': {'minutes': 7, ..."
8,1972373663,AK47-404,1037,True,16,2,0,9119.925743,0.0,14869.67193,"[{'hero_id': 1037, 'playtime': {'minutes': 7, ..."
9,228809324,Bonehead.99,1023,True,10,3,17,3216.97335,12075.260708,3853.433267,"[{'hero_id': 1023, 'playtime': {'minutes': 7, ..."


In [18]:
# lets look at the hero data as well
hero_data = player_df['heroes']
hero_df = pd.json_normalize(hero_data)
display(hero_df)

Unnamed: 0,0,1,2,3
0,"{'hero_id': 1050, 'kills': 2, 'deaths': 4, 'as...",,,
1,"{'hero_id': 1040, 'kills': 0, 'deaths': 0, 'as...","{'hero_id': 1039, 'kills': 0, 'deaths': 3, 'as...","{'hero_id': 1041, 'kills': 3, 'deaths': 2, 'as...","{'hero_id': 1023, 'kills': 0, 'deaths': 3, 'as..."
2,"{'hero_id': 1031, 'kills': 0, 'deaths': 2, 'as...","{'hero_id': 1024, 'kills': 3, 'deaths': 1, 'as...","{'hero_id': 1021, 'kills': 1, 'deaths': 1, 'as...","{'hero_id': 1041, 'kills': 1, 'deaths': 2, 'as..."
3,"{'hero_id': 1029, 'kills': 10, 'deaths': 8, 'a...",,,
4,"{'hero_id': 1011, 'kills': 4, 'deaths': 3, 'as...","{'hero_id': 1018, 'kills': 0, 'deaths': 3, 'as...",,
5,"{'hero_id': 1048, 'kills': 5, 'deaths': 3, 'as...","{'hero_id': 1042, 'kills': 3, 'deaths': 5, 'as...",,
6,"{'hero_id': 1027, 'kills': 24, 'deaths': 3, 'a...",,,
7,"{'hero_id': 1038, 'kills': 18, 'deaths': 3, 'a...",,,
8,"{'hero_id': 1037, 'kills': 16, 'deaths': 2, 'a...",,,
9,"{'hero_id': 1023, 'kills': 10, 'deaths': 3, 'a...",,,


It would seem this generates a variable number of columns, depending on how many hero swaps a player makes in a match. While this information is necessary for accuracy (`hit_rate`), I think the number of heroes played could be a better variable in its own right (e.g. the more swaps you make, the *less likely* your team is to win).

The number of rows aligns with the number of players, so per match this could be added back onto the match_df. Consider:
- Creating a column containing a string or array of all heroes played per match
- Creating a column containing the avg accuracy (hit_rate) of heroes played per match for each individual entry
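
A minimal sketch of both ideas follows, working on the raw `heroes` lists from `match_data['players']` rather than the normalized frame. `summarise_heroes` is a hypothetical helper. Two choices here are assumptions, not something the API prescribes: weighting `hit_rate` by playtime, and treating zero-playtime entries as heroes that were never actually played. The second sample row uses made-up values.

```python
import pandas as pd

def summarise_heroes(heroes):
    """Collapse one player's per-hero list into a fixed set of features."""
    # Assumption: entries with zero playtime were selected but never played.
    played = [h for h in heroes if h.get("playtime", {}).get("raw", 0) > 0]
    total_time = sum(h["playtime"]["raw"] for h in played)
    weighted = sum(h["hit_rate"] * h["playtime"]["raw"] for h in played)
    return {
        "heroes_played": [h["hero_id"] for h in played],
        "n_heroes": len(played),  # proxy for number of swaps + 1
        "avg_hit_rate": weighted / total_time if total_time else float("nan"),
    }

# Row 0 mirrors the single-hero player shown above; row 1 is illustrative only.
sample = pd.DataFrame({
    "player_uid": [1760979252, 1],
    "heroes": [
        [{"hero_id": 1050, "playtime": {"raw": 459}, "hit_rate": 0.5477707006369427}],
        [
            {"hero_id": 1040, "playtime": {"raw": 0}, "hit_rate": 0},
            {"hero_id": 1011, "playtime": {"raw": 300}, "hit_rate": 0.40},
            {"hero_id": 1018, "playtime": {"raw": 100}, "hit_rate": 0.20},
        ],
    ],
})

features = pd.DataFrame([summarise_heroes(h) for h in sample["heroes"]], index=sample.index)
flat = pd.concat([sample.drop(columns="heroes"), features], axis=1)
print(flat)
```

This keeps one row per player regardless of how many swaps were made, so the result can be joined to the rest of the per-player columns directly.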

In [21]:
display(hero_df[0][0])

{'hero_id': 1050,
 'kills': 2,
 'deaths': 4,
 'assists': 3,
 'hit_rate': 0.5477707006369427,
 'playtime.minutes': 7,
 'playtime.seconds': 39,
 'playtime.raw': 459}

***
So we have to go this deep into the match data to get accuracy, and to get a list of heroes we would also need to reshape the values before working with the data.

We likely also want to pull player profile data to check rank and, potentially, most-played heroes. Here's a list of items that might be necessary for analysis:
- player rank
- hero id:name -> we want a list of hero IDs and their actual in-game names
- hero role

***
## What are we investigating? 

This is the real question. Ever since the original hero shooter (Overwatch), there's been a general consensus in the community at high ranks that "**stats don't matter**". Since Overwatch doesn't have a public API from which to amass this data, people have just accepted this as *fact*. The idea of this investigation is to prove and/or challenge that claim through a series of questions. Some of the proposed questions are:

- Kills/Deaths/Assists 
    - do these values impact win likelihood?
    - which one has the strongest correlation with win % ?
    - does spreading last hits across the team win games, or is a solo carry more impactful?
- Dmg/Mit/Heal
    - does healing win games or is it damage?
    - how much does blocking damage impact the outcome of a game? 
- META
    - do meta characters have high winrates in general or is it comp dependent?
    - is triple support really that strong?
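
As a first pass at the K/D/A question, one option is the correlation of each stat with the win flag. The sketch below uses only the ten rows from the stats table earlier in this notebook, so it shows the mechanics and nothing more. Ten matches from one player are far too few to draw conclusions.

```python
import pandas as pd

# The ten rows shown in the stats table earlier in this notebook.
stats = pd.DataFrame({
    "kills":   [5, 3, 20, 20, 7, 17, 20, 13, 6, 22],
    "deaths":  [6, 6, 5, 6, 6, 1, 5, 1, 6, 8],
    "assists": [5, 3, 3, 0, 1, 1, 37, 0, 0, 4],
    "is_win":  [False, False, True, False, False, True, True, True, False, False],
})

# Pearson correlation against a 0/1 outcome is the point-biserial correlation.
win = stats["is_win"].astype(int)
correlations = stats[["kills", "deaths", "assists"]].corrwith(win)

# Rank by strength of association, ignoring sign.
print(correlations.sort_values(key=abs, ascending=False))
```

Correlation only captures linear, one-variable-at-a-time relationships, which is why the modelling options discussed later may be a better fit once enough matches are collected.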
***
# How much data do we need?

### Estimating the Required Data Size
A common rule of thumb for machine learning is to have at least 10 times as many samples as features for simple models like logistic regression. For more complex models (e.g., random forests, neural networks), this can be 50–100x or more.

#### **Step 1: Estimate Data Points per Match**
Since each match involves 12 players, you need to decide if you:

- Treat each player individually → 12 data points per match
- Aggregate team stats → 2 data points per match
- Keep match-level stats → 1 data point per match
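
To make the player-level versus team-level choice concrete, here is a sketch that aggregates the ten player rows displayed earlier into one row per team. It assumes teammates can be identified by sharing the same `is_win` value, since the player records shown have no explicit team field.

```python
import pandas as pd

# K/D/A for the ten player rows displayed earlier in this notebook.
players = pd.DataFrame({
    "match_uid": ["6714034_1739047847_94_11001_10"] * 10,
    "is_win":  [False] * 6 + [True] * 4,
    "kills":   [2, 3, 5, 10, 4, 8, 24, 18, 16, 10],
    "deaths":  [4, 8, 6, 8, 6, 8, 3, 3, 2, 3],
    "assists": [3, 3, 5, 0, 0, 0, 0, 2, 0, 17],
})

# Player level: the frame as-is, one row per player.
# Team level: sum each side's stats, giving two rows per match.
team_rows = players.groupby(["match_uid", "is_win"], as_index=False)[
    ["kills", "deaths", "assists"]
].sum()

print(len(players), "player rows ->", len(team_rows), "team rows")
print(team_rows)
```

Sums are only one possible aggregate. Means or per-minute rates may be more comparable across matches of different lengths.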

#### **Step 2: Estimate Required Matches**
For a simple logistic regression, let’s assume you have 20 features (e.g., damage, KDA, hero, healing, etc.):

- **Logistic Regression** → Needs 200–500 matches (2,400–6,000 player data points)
- **Random Forest/XGBoost** → Needs 1,000–5,000 matches (12,000–60,000 player data points)
- **Neural Network** → Needs 5,000+ matches (60,000+ player data points)

Ideally, I want ~5,000+ matches, as a neural network is most likely to give the best answer to this multifaceted problem. However, given the nature of this project and the time commitment, we'll settle for 1,000 matches for the time being.

***
## Assumptions

It is impossible to account for general player variance (e.g. poor performance due to life events) so we have to consider some factors outside of our control. Here are some reasonable assumptions to be made regarding the game. 

### **1. Players in the top 5%-10% understand the game (*but can't always impact it*)**

The idea here is that you're good enough to be in the top of the upper quartile of the entire playerbase, but you aren't the top 1% who can solo-carry or meaningfully change the game with your presence alone. At this range it's expected:
- All players generally know what all characters do (abilities/cooldowns/comp-types/etc.)
- All players are able to somewhat effectively coordinate with a team (regardless of verbal communication)
- Most players can land skill-shots (i.e. non-targeted abilities) and have a semi-consistent accuracy (for non-melee characters)
- All players know what the job of their role is, as well as what K/D/A & Dmg/Mit/Heal are

This range mostly consists of Diamond → Grand Master players, but excludes the Celestial rank which is made up of the top 'Carry' players whose presence drastically changes the outcome of the match. 

So for this exercise, most match samples will be taken from **Diamond** to **Grand Master**, with the exception of a smaller sample pool of lower-ranked and/or quick-play games to test against as a baseline (null) group.

**Benefits of Filtering to Upper Ranks**
- ✅ More Consistent Gameplay → Reduces randomness from players who don’t fully grasp the game.
- ✅ Better Hero Understanding → Ensures win rates reflect actual hero strengths rather than misplays.
- ✅ More Useful Insights → The model will be more applicable to serious players looking to improve.

**Potential Downsides**
- ❌ Less Data Overall → You may need more matches to reach the same level of statistical confidence.
- ❌ Not Representative of All Players → If you ever want to predict win likelihood for lower-ranked players, the model might not generalize well.

### **2. Team Coordination vs. Solo-play**

There isn't a great way to check for this other than individually tagging each player in a match and counting how often they appear alongside other players. While possible, this could severely limit the amount of data I can collect, since the likelihood of *N-stack vs N-stack* is high.

**I'll be assuming that if one team has an N-stack, the enemy team also has an N-stack of equal size.**
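
For reference, the tagging approach could be sketched as below: count how often each pair of players lands on the same team across the collected matches, and treat repeated pairs as likely stacks. The function names and the `min_shared` threshold are hypothetical, and teammates are assumed to be the players sharing an `is_win` value within a match.

```python
from collections import Counter
from itertools import combinations

def teammate_counts(matches):
    """Count how often each pair of players appears on the same team."""
    pair_counts = Counter()
    for match in matches:
        teams = {}
        for p in match["players"]:
            # Assumption: players with the same is_win value are teammates.
            teams.setdefault(p["is_win"], []).append(p["player_uid"])
        for uids in teams.values():
            pair_counts.update(combinations(sorted(uids), 2))
    return pair_counts

def likely_stacks(pair_counts, min_shared=2):
    """Keep pairs seen together at least min_shared times (hypothetical cutoff)."""
    return {pair: n for pair, n in pair_counts.items() if n >= min_shared}

# Illustrative data with made-up player UIDs.
demo_matches = [
    {"players": [
        {"player_uid": 1, "is_win": True}, {"player_uid": 2, "is_win": True},
        {"player_uid": 3, "is_win": False}, {"player_uid": 4, "is_win": False},
    ]},
    {"players": [
        {"player_uid": 1, "is_win": False}, {"player_uid": 2, "is_win": False},
        {"player_uid": 5, "is_win": True}, {"player_uid": 3, "is_win": True},
    ]},
]
print(likely_stacks(teammate_counts(demo_matches)))
```

This only detects stacks whose members show up together more than once in the sample, so it would undercount on a dataset built from unrelated matches.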

### **3. Match Duration & Early FFs**

Any matches that ended in a surrender will be removed from the sampling pool, as they are unlikely to provide meaningful data either way. This also applies to matches that had leavers.

### **4. Hero Representation & Meta Considerations**

Some heroes are played far more often than others, which can bias the model.
- Should you exclude heroes with too little data to avoid inaccurate win rates?
- Should the model account for patch updates that change hero balance?

To limit this bias, a minimum match count per hero will be required before including it in the dataset. If the number of matches a hero appears in strays too far from the mean, I'll look at the data further to see whether the hero needs to be excluded from this analysis or whether more matches need to be added.
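
One way this check might be implemented is sketched below, assuming a per-player table with `match_uid` and `hero_id` columns. `hero_coverage`, the `min_matches` default, and the z-score reading of "too far from the mean" are all assumptions to be tuned once real data is in hand.

```python
import pandas as pd

def hero_coverage(player_rows, min_matches=30, z_cutoff=2.0):
    """Count distinct matches per hero and flag sparse or outlying heroes."""
    counts = (
        player_rows.groupby("hero_id")["match_uid"]
        .nunique()  # a hero picked on both teams still counts as one match
        .rename("matches")
        .to_frame()
    )
    counts["keep"] = counts["matches"] >= min_matches
    mean = counts["matches"].mean()
    spread = counts["matches"].std(ddof=0)
    if spread > 0:
        # Flag heroes whose match count is far from the mean for manual review.
        counts["review"] = ((counts["matches"] - mean) / spread).abs() > z_cutoff
    else:
        counts["review"] = False
    return counts

# Illustrative rows with made-up match ids.
rows = pd.DataFrame({
    "match_uid": ["m1", "m2", "m3", "m4", "m5", "m1", "m2", "m3", "m4", "m1"],
    "hero_id":   [1023, 1023, 1023, 1023, 1023, 1027, 1027, 1027, 1027, 1050],
})
print(hero_coverage(rows, min_matches=3, z_cutoff=1.0))
```

In the toy data, hero 1050 appears in a single match, so it falls below the minimum and is also flagged for review.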

***