some active player marked w/ UNK team and position #25

Closed
oliverjen opened this issue May 30, 2014 · 7 comments

Comments

@oliverjen

Hi,

I restored the nfldb database and ran nfldb-update today. I got a bunch of errors for players who were drafted this year (e.g. Bridgewater), but that's no big deal.

Some players who were active in 2013 have their team and position marked as UNK in the player table.
For example, Michael Bennett from SEA or Tony Gonzalez from ATL.
Re-running nfldb-update doesn't seem to change their team or position.

@BurntSushi
Owner

Yuck.

As of now, nfldb gets team and position data for players based on the current team roster. Take a look at the SEA and ATL rosters: http://www.nfl.com/teams/seattleseahawks/roster?team=SEA and http://www.nfl.com/teams/seattleseahawks/roster?team=ATL

See anything missing?

So that's why. The question now is what to do about it.

I'll look into pulling team/position data from their profile page and the team's roster page.
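One way to combine the two sources is a simple fallback: use the roster entry when it exists, otherwise the profile entry, and only then fall back to UNK. The sketch below is an illustration under assumptions, not nfldb's actual code; `resolve_affiliation` and the dictionary layout are hypothetical names made up for this example.

```python
UNKNOWN = "UNK"


def resolve_affiliation(roster_entry, profile_entry):
    """Pick (team, position), preferring the roster page over the profile page.

    Each argument is either None (player not found on that page) or a dict
    with "team" and "position" keys. This layout is an assumption for the
    sketch, not a structure taken from nfldb.
    """
    for entry in (roster_entry, profile_entry):
        if entry and entry.get("team") and entry.get("position"):
            return entry["team"], entry["position"]
    # Neither source had usable data, so keep the placeholder values.
    return UNKNOWN, UNKNOWN


# A player missing from the roster page but present on a profile page
# (illustrative values only).
print(resolve_affiliation(None, {"team": "SEA", "position": "DE"}))
```

With this ordering, a player who drops off a roster page in the offseason would only become UNK if the profile page is also missing the data.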

@oliverjen
Author

Ah, it would work perfectly during the season. But during the offseason, when a roster churns, it could be out of whack. Thanks for looking at it, Andrew!

@gojonesy

I think that an issue that I am facing is related to this as well...

I am trying to aggregate a team's forced fumbles for 2013. In this example, Arizona:

import nfldb

db = nfldb.connect()
q = nfldb.Query(db)

q.game(season_year=2013, season_type='Regular', team='ARI')
# Get all Forced Fumbles
q.play(pos_team__ne='ARI', down__ne=0)
plays = q.as_plays()
run_total = 0
aggregated = nfldb.aggregate(plays)
aggregated = sorted(aggregated, key=lambda p: p.defense_ffum, reverse=True)
for pp in aggregated:
    if pp.team == "ARI":
        print pp.player, pp.defense_ffum
        run_total += pp.defense_ffum

print run_total

output:

John Abraham (ARI, OLB) 4
Jerraud Powers (ARI, CB) 1
Tyrann Mathieu (ARI, FS) 1
Darnell Dockett (ARI, DE) 1
Calais Campbell (ARI, DE) 1
Sam Acho (ARI, OLB) 1
Matt Shaughnessy (ARI, OLB) 1
Marcus Benard (ARI, LB) 1
Jasper Brinkley (MIN, ILB) 0
Rashad Johnson (ARI, FS) 0
Lorenzo Alexander (ARI, OLB) 0
Karlos Dansby (CLE, ILB) 0
Yeremiah Bell (UNK, UNK) 0
Dan Williams (ARI, NT) 0
Patrick Peterson (ARI, CB) 0
Antoine Cason (CAR, CB) 0
Frostee Rucker (ARI, DE) 0
Tony Jefferson (ARI, FS) 0
Justin Bethel (ARI, CB) 0
Ronald Talley (ARI, DE) 0
Javier Arenas (ATL, CB) 0
Alameda Ta'amu (ARI, NT) 0
Kenny Demens (ARI, ILB) 0
Dontay Moch (CIN, LB) 0
Daryl Washington (ARI, ILB) 0
Jaron Brown (ARI, WR) 0
Bryan McCann (ARI, DB) 0
11

You will notice that many of these players played for ARI in 2013, but they are listed on another team now. No problem with that. The issue here is that Karlos Dansby had a Forced Fumble last year that is not listed.

It seems way more likely that there is something wrong with my code, but I ran across this today and recalled this issue.

Do you think this is related?

@BurntSushi
Owner

@gojonesy No, it's probably not related. Player affiliations don't really have anything to do with the connection between players and statistics. A single missing forced fumble among otherwise correct data probably just indicates that the source data was wrong. (Although to verify, you should try to locate the specific play where it occurred and see if it's logged in nfldb or possibly nflgame.)

I did try out your code and I think it's right. But I simplified it for you:

import nfldb

db = nfldb.connect()
q = nfldb.Query(db)

q.game(season_year=2013, season_type='Regular', team='ARI')
q.play(team='ARI', pos_team__ne='ARI', down__ne=0)

run_total = 0
for pp in q.sort('defense_ffum').as_aggregate():
    print pp.player, pp.defense_ffum
    run_total += pp.defense_ffum
print run_total

The key is taking advantage of the team field on the play_player table. You were halfway there with pos_team__ne='ARI' (which is an attribute of a play).

Also, aggregating and sorting inside the database is much faster. :-)
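To see why filtering on the per-play team matters, here is a small sketch that does not use nfldb at all. The rows and the `forced_fumbles_for` helper are hypothetical; they only mimic the idea that each statistic row records the team the player was on for that play, independent of the player's current roster.

```python
from collections import defaultdict

# Hypothetical per-play rows:
# (player name, team the player was on for that play, forced fumbles).
rows = [
    ("Player A", "ARI", 1),
    ("Player A", "ARI", 1),
    ("Player B", "ARI", 0),
    ("Player C", "SF", 1),
]


def forced_fumbles_for(rows, team):
    """Sum forced fumbles per player, keeping only rows recorded for `team`."""
    totals = defaultdict(int)
    for name, play_team, ffum in rows:
        # Filter on the team stored with the play, not the current roster.
        if play_team == team:
            totals[name] += ffum
    # Highest totals first, like sorting on defense_ffum in the query.
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)


for name, total in forced_fumbles_for(rows, "ARI"):
    print("%s %d" % (name, total))
```

Because the filter uses the team stored with each row, a player who has since moved to another team (or is marked UNK) is still counted for the team he played for at the time.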

@gojonesy

Thanks for that, Andrew. That seems way more efficient.

Dansby still doesn't show a Forced Fumble. NFL.com shows that he had one on 10/13 against San Francisco.

I may have just stumbled across a random data inconsistency...

Thanks for your help

@BurntSushi
Owner

> I may have just stumbled across a random data inconsistency...

This isn't improbable. There are a lot of data inaccuracies, unfortunately. I did a test measuring inaccuracies last season: https://github.com/BurntSushi/nflgame/tree/master/test-data/results-yahoo-2012-max

You can see, for example, the results of defensive end stats: https://github.com/BurntSushi/nflgame/blob/master/test-data/results-yahoo-2012-max/de.tsv --- There are several players with slightly off forced fumble statistics, tackles, etc.

(Note that the players that are waaaay off are likely a result of an invalid player match in my test rather than actual inaccuracies.)

FYI, there's really nothing we can do. These are straight from the JSON data.
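A comparison like the one described above can be sketched as a per-player diff between two sources. This is only an illustration: `stat_differences` and the nested-dict layout are hypothetical, and treating a missing statistic as 0 is an assumption of the sketch rather than something taken from the linked test.

```python
def stat_differences(ours, theirs):
    """Return {player: {stat: (our_value, their_value)}} for stats that disagree.

    Both arguments map a player name to a dict of stat name -> number.
    Only players present in both sources are compared. A stat missing from
    one side is treated as 0 (an assumption made for this sketch).
    """
    diffs = {}
    for player in sorted(set(ours) & set(theirs)):
        mismatched = {}
        for stat in sorted(set(ours[player]) | set(theirs[player])):
            a = ours[player].get(stat, 0)
            b = theirs[player].get(stat, 0)
            if a != b:
                mismatched[stat] = (a, b)
        if mismatched:
            diffs[player] = mismatched
    return diffs


# Made-up numbers, purely to show the shape of the result.
ours = {"Player A": {"defense_ffum": 0, "defense_tkl": 40}}
theirs = {"Player A": {"defense_ffum": 1, "defense_tkl": 40}}
print(stat_differences(ours, theirs))
```

A player whose numbers differ on nearly every statistic is more likely a bad name match between the two sources than a real data error, which is the caveat noted above.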

@BurntSushi
Owner

Closing this for now. I think the kinks may have worked themselves out.

Note, though, that there are still a bunch of players who don't have gsis identifiers yet. This will produce some vomit if you run nfldb-update and it tries updating the player metadata. This is OK and it should hopefully start to disappear as we get closer to the season.

3 participants