New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add distance and speed to frames as well as extra data when available. #109
Merged
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
The MetricaTrackingSerilizer was the first serializer for Metrica Sports tracking data included in kloppy. This serializers is for the CSV format that Metrica Sports used to provide it's data in. It's also the format of 2 of the 3 games with tracking that Metrica Sports made public. Metrica Sports for quite some time now has a new format for their tracking data. They now provide their tracking data in the EPTS FIFA standard format. That data can be loaded with the EPTS selializer in kloppy that will be now renamed to specify that that is the serializer for Metrica Sports data. Thus, to avoid confussion, I propose in this PR that we renamed the original serializer and file to make it clear that those methods are for the data in CSV format. It's still pending any backwards compatibility or error we might need to throw if users want to use the MetricaTrackingSeiliazer as it was before.
While a lot of the implementation of the EPTSSerializer would apply to the general strcuture of the EPTS format, there are certain elements like periods information or type of data that are specific to Metrica Sports data. There are no other providers at the moment which data is in EPTS format. Thus, rather than create a base EPTSSerilizer and then do MetricaEPTS serializer based on that one, for now we will rename the EPTSSerializer to make clear that is a Metrica one. We will refactor the code in the future if there is a need for another EPTS serializer.
Before there was a test for EPTS, another for loading EPTS data from Metrica Sports, and another one for the metadata of Metrica Sports. There was also a different metadata file for Metrica Sports events files. Since now there is only one EPTS serializer and it is for Metrica Sports files, I have consolidates the tests on testing the serializations of that alone. I have also consolidated the test files in just one metadata file, one tracking data file and one events data file.
Before for each player on each frame we were only serialing the x and y coordinates of the tracking data. Most providers on top of that provide the speed of each player on each frame, and some also provide the distance covered. In this first commit I changed the attribute in Frames from players_coordiantes to players_data. players_data has the PlayerData information for each player on that frame. PlayerData is a new model that has coordinates, distance and speed attributes. This changes are in this commit only implemented for the Metrica Sports EPTS serializer. I also modified the helper to_pandas so that this extra data is included on the resulting dataframe as well.
Depending on the data type, on Metrica Sports data there could be fields that do not belong to PlayerData types (position, distance, speed). This extra data types need to be deserialized, added to the Frame model on other_data and then also included in the dataframe when using to_pandas on the dataset. The other_data is a dict where any kind of data can be included and it's added to the Frame model. While for use now only on the Metrica EPTS data it leaves a base for serialiazers of different providers to add data other than the player and ball data. Next step is to modify and changes tests and serializers for all other providers.
Looks good to me! Only little addition for backwards compatible: please add a ‘player_coordinates’ property which returns ‘Dict[Player, Point]’ |
@koenvo sorry about the delay on the addition you requested! I just came to do it and saw it's already implemented :). Thanks! And great that this is merged to master already! |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Hi @koenvo!
Finally, here is the pull request that includes two main changes:
EPTSSerializer
toMetricaEPTSSerializer
(and MetricaTrackingSerializer to MetricaCsvTrackingSeializer`)It's one big PR with everything together, but let me know if you want me to divide this on different PRs!