Data #1

Open
aisobran opened this issue Nov 5, 2015 · 22 comments
Comments

@aisobran
Owner

aisobran commented Nov 5, 2015

The data has been added to the data folder in the repo.

@aadithya93
Collaborator

How is the parsing going? Do you need any help?

@aisobran
Owner Author

I have all the pass plays parsed but not the run plays. If a play contains "pass", it's getting parsed.

@aisobran
Owner Author

Pushed the parsed run and pass plays. The data is all tabular. The receiver column is dirty but everything else is pretty clean. I checked each column by counting its distinct values.
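A per-column value count like the check mentioned above could look roughly like this in pandas. This is only a sketch: the original check was done in the Zeppelin notebook, and the toy rows below are made up for illustration (only the column names `distance` and `receiver` come from this thread).

```python
import pandas as pd

# Toy stand-in for the parsed plays; the values are invented for illustration.
plays = pd.DataFrame({
    "distance": ["short", "deep", "short", None],
    "receiver": ["D.Graham", "d.graham ", "B.Lloyd", None],
})

# Count how often each value occurs per column, keeping NaN as its own bucket,
# so that stray spellings or unexpected categories stand out.
counts = {col: plays[col].value_counts(dropna=False) for col in plays.columns}

for col, value_counts in counts.items():
    print(col)
    print(value_counts)
```

A column with a small, fixed vocabulary (such as `distance`) should show only a handful of values, while a dirty column shows near-duplicates.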

The data doesn't contain any special team plays or penalties.

To aggregate data just use some form of group by.
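As a rough pandas sketch of that kind of aggregation: the `offense` column and the sample values are assumptions made up for this example, and only `yardsGained` is a column name that appears in this thread.

```python
import pandas as pd

# Hypothetical sample; "offense" is an assumed column name, not confirmed by the data set.
plays = pd.DataFrame({
    "offense": ["DEN", "DEN", "SF", "SF"],
    "yardsGained": [5, -2, 1, 12],
})

# Group by team and compute the number of plays and the mean gain per group.
summary = (
    plays.groupby("offense")["yardsGained"]
    .agg(n_plays="count", mean_yards="mean")
    .reset_index()
)
print(summary)
```

Grouping by more than one key works the same way by passing a list of column names to `groupby`.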

Also pushed the Apache Zeppelin notebook I used to do the parsing.

Since you have used Python so far, I'm cool with using pandas and scikit-learn for ML from here on out.

@aisobran
Owner Author

Added the year and week to each row. Also filtered out any commas in receiver or playString to avoid issues.
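One way to do the comma cleanup in pandas is sketched below. It assumes the commas are stripped from the text rather than the rows being dropped, which this thread does not confirm, and the sample strings are made up.

```python
import pandas as pd

# Invented sample rows; only the column names come from this thread.
plays = pd.DataFrame({
    "receiver": ["D.Graham", "Smith, S."],
    "playString": [
        "K.Orton pass incomplete short right to D.Graham",
        "Gore starts left, then turns right",
    ],
})

# Strip literal commas so a naive comma-split of the CSV cannot misalign columns.
for col in ["receiver", "playString"]:
    plays[col] = plays[col].str.replace(",", "", regex=False)

print(plays)
```

If the file is only ever read back with a CSV-aware reader, quoting the fields on write would also avoid the problem without altering the text.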

@aisobran
Owner Author

Pushed the new set. Let me know if you see any other discrepancies. Around 1000 plays were added back into the data set.

@aadithya93
Collaborator

I also noticed that there is no attribute that defines the output, i.e. pass or run. I assume that the plays with NaN in the complete attribute are all run plays. Am I correct?

@aisobran
Owner Author

Right, complete and passer are both filled in only for pass plays, so they are redundant as pass indicators. If either of them is not filled in (NaN), then it was a run play.

@aadithya93
Collaborator

So is it not required to add the output attribute to the training set? I can add an attribute playType to describe whether it's a pass or run play and push the data if that's okay.
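The rule discussed above (a play is a run if either complete or passer is NaN) could be turned into a playType column roughly as follows. The sample rows and the 0/1 encoding of `complete` are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Invented sample; the encoding of "complete" as 0.0/1.0 is an assumption.
plays = pd.DataFrame({
    "passer": ["K.Orton", None, "K.Orton"],
    "complete": [0.0, np.nan, 1.0],
})

# A play counts as a pass only when both columns are filled in;
# if either one is NaN it is labeled a run.
is_pass = plays["passer"].notna() & plays["complete"].notna()
plays["playType"] = np.where(is_pass, "pass", "run")

print(plays)
```

The result can then be written to a separate file with `DataFrame.to_csv`, leaving the original data untouched.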

@aisobran
Owner Author

Yeah that works for me.

@aisobran
Owner Author

It might be best to rename the file in case there are issues with the original and we have to replace it again.

@aadithya93
Collaborator

Yes. I will create a new data file with the output attribute.

@aisobran
Owner Author

There is actually an issue with this data. I'll push up a new one shortly.

@aadithya93
Collaborator

Okay sure.

@aisobran
Owner Author

Pushed the new data; 4 entries were invalid.

@aadithya93
Collaborator

I added a new file with the output attribute and pushed it.

@aadithya93
Collaborator

Hey, I think there are a few issues with the parsed data. The distance attribute (short or deep) should be non-null for pass plays, right? The number of non-null values differs between complete and distance. Is that correct?
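A quick way to surface that kind of mismatch in pandas is to compare the non-null masks of the two columns. This is a sketch on made-up rows; only the column names `complete` and `distance` come from this thread.

```python
import numpy as np
import pandas as pd

# Invented sample: row 1 is a pass without a distance,
# row 2 has a distance but no pass information.
plays = pd.DataFrame({
    "complete": [1.0, 0.0, np.nan, np.nan],
    "distance": ["short", None, "short", None],
})

# Non-null counts per column; unequal totals signal a mismatch.
print(plays[["complete", "distance"]].notna().sum())

# Rows where exactly one of the two columns is filled in.
mismatch = plays[plays["complete"].notna() != plays["distance"].notna()]
print(mismatch)
```

Note that equal totals do not rule out problems, since the two kinds of mismatch can cancel out, so inspecting the mismatched rows is the more reliable check.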

@aisobran
Owner Author

Sometimes there are pass plays that are not labeled short or deep, but if you have a specific example I can check it out in the parser.

@aadithya93
Collaborator

I saw a few plays that had NaN for complete and passer, which would mean they are run plays, but that had a distance value of short. When I looked at those particular plays, one was actually an incomplete pass play and one was a run play with a gain of 1 yard that had the words "short gain" in the commentary.
The examples are:
1st example: (DEN, DEN 32, Q1, 3 and 10) (14:18) (Shotgun) K.Orton FUMBLES (Aborted) at DEN 23, RECOVERED by PHI-D.Patterson at DEN 23. D.Patterson for 23 yards, TOUCHDOWN. Denver challenged the backward pass ruling, and the play was REVERSED. (Shotgun) K.Orton pass incomplete short right to D.Graham (J.Parker).
2nd example: (SF, DEN 38, Q3, 1 and 10) (6:10) F.Gore left tackle to DEN 37 for 1 yard (M.Thomas). Gore starts left then turns right for the short gain.
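These two plays suggest that matching the bare word "short" anywhere in the play string is too loose. A stricter match, sketched below, only accepts "short" or "deep" when it directly follows a pass description. The pattern and the `pass_distance` helper are hypothetical, not the parser actually used for this data, and the pattern assumes pass descriptions read "pass short ..." or "pass incomplete short ...".

```python
import re

# Hypothetical pattern: "short"/"deep" only counts right after "pass"
# (optionally followed by "incomplete").
PASS_DISTANCE = re.compile(r"\bpass (?:incomplete )?(short|deep)\b")


def pass_distance(play_string):
    """Return 'short' or 'deep' for a pass description, else None."""
    match = PASS_DISTANCE.search(play_string)
    return match.group(1) if match else None


# Excerpts of the two play strings quoted above.
reversed_play = "(Shotgun) K.Orton pass incomplete short right to D.Graham (J.Parker)."
run_play = (
    "F.Gore left tackle to DEN 37 for 1 yard (M.Thomas). "
    "Gore starts left then turns right for the short gain"
)

print(pass_distance(reversed_play))
print(pass_distance(run_play))
```

With this pattern the reversed play yields a distance while the run play does not, even though its commentary contains the word "short".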

@aisobran
Owner Author

I'm looking into it.

@aisobran
Owner Author

Those two should be fixed now, along with other issues like them. Let me know if you see anything else.

@aadithya93
Collaborator

There is one data point with injury information, where the play string is stored in the yardsGained attribute. I think that row should be removed. You can find it by looking at the unique values of the yardsGained attribute.
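Besides scanning the unique values by eye, non-numeric entries in yardsGained can be flagged automatically. The following is a sketch on made-up rows; the text in the bad row is a placeholder, not the actual play string.

```python
import pandas as pd

# Invented sample; the third value is a placeholder for the stray text.
plays = pd.DataFrame({
    "yardsGained": ["5", "-2", "INJURY UPDATE (placeholder text)", "12"],
})

# Inspect the distinct values, as suggested above.
print(plays["yardsGained"].unique())

# Coerce to numbers; anything that is not a number becomes NaN.
numeric = pd.to_numeric(plays["yardsGained"], errors="coerce")

# Rows that had a value but could not be parsed as a number.
bad_rows = plays[numeric.isna() & plays["yardsGained"].notna()]
print(bad_rows)

# Keep only the rows with a valid numeric gain.
clean = plays.loc[numeric.notna()].copy()
clean["yardsGained"] = numeric[numeric.notna()].astype(int)
```

This assumes yardsGained has no legitimately missing values; if it does, the integer cast still works because rows with NaN are dropped first.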

@aisobran
Owner Author

Pushed new data with the INJURY UPDATE filtered out.
