Insight DE 18A
This project builds a data pipeline for analyzing matches from the popular online video game DOTA2.
About DOTA2 (From Wikipedia):
Dota 2 is played in matches between two teams of five players, with each team occupying and defending their own separate base on the map. Each of the ten players independently controls a powerful character, known as a "hero", who all have unique abilities and differing styles of play.
About the data:
OpenDOTA provides a data dump of over 1 billion matches (see blog). For each match, it contains the start time, the cluster, and per-player data such as kills, deaths, and gold per minute.
The official Steam API could also be used for streaming, but because of its query limit (~100,000 calls per day), this project uses the OpenDOTA data dump instead. To simulate a real-time feed, the raw data, which is in CSV format, is converted to JSON before being sent to Kafka for ingestion (see producer_script for more details).
Note that the matches in the raw data are out of order (e.g., a match from 2012 comes after a match from 2016 and is followed by a match from 2014). For simplicity, the start_time of each match is simulated while feeding the data to Kafka: matches arrive in approximately ascending order, with some disorder included. The default disorder is one hour, i.e., a match can be sent to Kafka at any time within +/- 1 hr of its start time. See producer_script for more details. A more realistic simulation would send each match with a latency of 10 - 90 mins after its start time (which is the event time).
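The CSV-to-JSON step can be sketched as follows. This is a minimal illustration and not the actual producer_script: the column name match_id and the integer casting are assumptions, and the call that hands each message to a Kafka client is left out.

```python
import csv
import io
import json


def csv_to_json_messages(csv_text):
    """Yield one JSON string per CSV row, keyed by the header names."""
    reader = csv.DictReader(io.StringIO(csv_text))
    for row in reader:
        # Cast purely numeric fields to int so consumers need not re-parse them.
        record = {
            key: int(value) if value.lstrip("-").isdigit() else value
            for key, value in row.items()
        }
        yield json.dumps(record, sort_keys=True)


# Hypothetical sample rows; the real dump has many more columns.
sample = "match_id,start_time,cluster\n1,1356998400,111\n2,1357002000,121\n"
messages = list(csv_to_json_messages(sample))
```

Each element of messages is a standalone JSON document, so it could be sent as the value of a single Kafka record.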
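One way to produce such an approximately ordered stream is to give every match a send time equal to its start time plus a random offset, then emit matches in send-time order. The sketch below assumes each match is a dict with a start_time in epoch seconds; the function name and its parameters are hypothetical and not taken from producer_script.

```python
import random


def simulate_arrival_order(matches, low=-3600, high=3600, seed=None):
    """Return matches ordered by a simulated send time.

    The send time is start_time plus a uniform random offset in [low, high]
    seconds. The default offsets model a disorder of +/- 1 hr.
    """
    rng = random.Random(seed)
    stamped = [(m["start_time"] + rng.uniform(low, high), m) for m in matches]
    stamped.sort(key=lambda pair: pair[0])
    return [m for _, m in stamped]


# Matches one minute apart get shuffled locally but stay roughly ascending.
close = [{"start_time": t} for t in range(0, 600, 60)]
shuffled = simulate_arrival_order(close, seed=1)

# The latency variant: every match is sent 10 to 90 minutes after it starts.
delayed = simulate_arrival_order(close, low=600, high=5400, seed=1)
```

Matches whose start times differ by more than high - low seconds can never overtake each other, which bounds how far out of order the stream can be.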
The app built in this project can be beneficial for both players and the game company. It has two main tabs, heroes and players, which show both the real-time and historical data in a dashboard.
This project can help players better understand the game "meta", e.g., which hero combinations are the most powerful and how to counter-pick heroes.
(1) The dashboard displays the real-time win rate for each hero, based on the last 100 matches played.
(2) For each hero, the web app gives you the win rate and popularity (how many times this hero is picked every day) over time. Also, it shows the best teammates and best opponents of this hero.
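The real-time win rate in (1) amounts to a sliding window over recent results. The sketch below is plain Python rather than a Flink job, and it assumes the window holds the last 100 matches of each hero separately; the class and method names are hypothetical.

```python
from collections import defaultdict, deque


class RollingWinRate:
    """Track the win rate of each hero over its most recent matches."""

    def __init__(self, window=100):
        # A bounded deque drops the oldest result once the window is full.
        self.results = defaultdict(lambda: deque(maxlen=window))

    def record(self, hero_id, won):
        self.results[hero_id].append(1 if won else 0)

    def win_rate(self, hero_id):
        history = self.results.get(hero_id)
        if not history:
            return None  # no matches seen for this hero yet
        return sum(history) / len(history)


tracker = RollingWinRate(window=3)
for outcome in (True, False, True, True):
    tracker.record(hero_id=1, won=outcome)
rate = tracker.win_rate(1)  # only the last three results count
```

Using a bounded deque keeps memory per hero constant, no matter how many matches have been processed.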
The project can also help the game company (Valve Corporation) to monitor users' activity/behavior, and optimize the in-game matchmaking system.
Users of this app can query by a specific date or by an account id.
(1) query with a date: this will give you the daily active users around that day. It also displays the distribution of players across regions/countries around the world.
(2) query with an account id: this will show the activity of the player with that account id. For each day in the past, it gives: number of matches played and won, and number of minutes played on that day.
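The per-account query in (2) is a group-by-day aggregation. The sketch below runs it in memory over a list of dicts, whereas the project computes it in the pipeline; the field names account_id, duration (in seconds), and won are assumptions, and days are taken in UTC.

```python
from collections import defaultdict
from datetime import datetime, timezone


def daily_activity(records, account_id):
    """Summarize one player's matches per UTC day."""
    days = defaultdict(lambda: {"played": 0, "won": 0, "minutes": 0.0})
    for r in records:
        if r["account_id"] != account_id:
            continue
        day = datetime.fromtimestamp(r["start_time"], tz=timezone.utc).date().isoformat()
        days[day]["played"] += 1
        days[day]["won"] += int(r["won"])
        days[day]["minutes"] += r["duration"] / 60
    return dict(days)


# Hypothetical records: two matches for account 42 on the same day.
records = [
    {"account_id": 42, "start_time": 1356998400, "duration": 1800, "won": True},
    {"account_id": 42, "start_time": 1357002000, "duration": 2400, "won": False},
    {"account_id": 7, "start_time": 1357002000, "duration": 2400, "won": True},
]
activity = daily_activity(records, account_id=42)
```

Counting the distinct account ids per day over the same records would give the daily active users used by the date query.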
                                    [Consumer]-----[Redis]---------⌝
                                       |                           |
[S3]-----[Producer]-----[Kafka]-----[Flink]                    [Django]
                                       |                           |
                                    [Spark]-----[Cassandra]--------⌟