Jeremy Kroeker and Brayden Arthur
Creating a tool to help developers analyze community discussions on Reddit.

Many popular online games have active communities in online forums such as Reddit. People use forums to find other gamers to play with, share stories about the game, suggest improvements, and give feedback about bugs and other issues. Reddit is organized into "subreddits", and each subreddit is the hub of discussion for a certain topic. Generally, each game has its own dedicated subreddit.

The goal of Project 1 was to investigate correlations between user bug reports in forums and the patches actually released by developers. Three games were analyzed: League of Legends (released 2009), Team Fortress 2 (released 2007), and World of Warcraft (released 2004). There appeared to be some correlation. A spike in user activity was often followed by a patch, and conversely, a patch release was often followed by a spike in activity as users discussed the changes. The following observations were made regarding future work:
- Although the data from Project 1 was interesting, it did not provide much utility to developers.
- The Reddit API was incredibly easy to use compared to manually scraping text from other forums. As such, data could easily be obtained for ANY subreddit, not just the three analyzed in Project 1.
- Patch note data was difficult to obtain. A separate script would need to be written for each patch note repository, which does not scale well.
These led to the following design decisions for Project 2:
- Create a tool to assist developers. Game developers such as Riot Games (creators of League of Legends) are already active on the forums, regularly reading and responding to user comments, but we want to streamline this process.
- Focus just on the data that can be obtained from Reddit since it is easiest to gather and does not require the developer to write any custom scripts. Ideally, all search customization can be done from a UI and the tool can be released as a ready-to-go package.
- The tool should run on any subreddit. This means it is not limited to gaming-related subreddits, so developers on any kind of project could use it.
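Because every subreddit is exposed through the same Reddit endpoints, a single helper can target any subreddit, gaming-related or not. The following is a minimal sketch, not code from this project: it builds the URL of Reddit's public JSON listing of a subreddit's newest posts. The `/r/<subreddit>/new.json` path and the `limit` and `after` query parameters belong to Reddit's listing API; the class and method names are hypothetical.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper: builds Reddit JSON listing URLs for any subreddit. */
final class RedditListingUrl {
    private static final String BASE = "https://www.reddit.com/r/";

    private RedditListingUrl() {}

    /**
     * Builds the URL for the newest posts in a subreddit.
     * @param subreddit name without the "r/" prefix, e.g. "leagueoflegends"
     * @param limit     posts per page; clamped to 1..100 because Reddit caps listings at 100
     * @param after     fullname of the last post on the previous page, or null for the first page
     */
    static String newPosts(String subreddit, int limit, String after) {
        if (subreddit == null || subreddit.isBlank()) {
            throw new IllegalArgumentException("subreddit must not be empty");
        }
        int clamped = Math.max(1, Math.min(limit, 100));
        StringBuilder url = new StringBuilder(BASE)
                .append(URLEncoder.encode(subreddit, StandardCharsets.UTF_8))
                .append("/new.json?limit=").append(clamped);
        if (after != null) {
            url.append("&after=").append(URLEncoder.encode(after, StandardCharsets.UTF_8));
        }
        return url.toString();
    }

    public static void main(String[] args) {
        System.out.println(newPosts("leagueoflegends", 100, null));
        System.out.println(newPosts("tf2", 25, "t3_abc123"));
    }
}
```

Reddit returns at most 100 items per listing request, which is why `limit` is clamped; a scraper would page backwards in time by passing the `after` value from each response into the next call.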
In addition to creating the tool, we also decided to experiment a little more with the existing data from Project 1, to see what other visualizations we could create and what other information could be extracted from the dataset.
- Set up a local SQLEXPRESS database as described in https://github.com/PolloDiablo/SENG-371-Project-1/blob/master/docs/database-setup.txt
- Download the project and open it in Eclipse.
- Open "src/myGUI.java" and run it.
The UI has four panels. Here is a brief overview of each:
- Monitor Subreddit - This feature lets you monitor a subreddit in real time. It looks for posts that contain the given keywords and are above the given threshold. If it finds any posts from the last 24 hours that meet these criteria, it sends out an email. NOTE: this feature does not require the database to be set up, since it only looks at live Reddit data.
- Graph Data - Assuming you have set up AND populated the database (see panel 4, Data Scraper), you can use this feature to create graphs. There are two primary graph types:
  - Track the variations in popularity of a single keyword over time
  - Compare the occurrences of multiple keywords over the entire time span

  As a time-saving feature, after you successfully create a graph, your current query parameters are stored to a data file. The next time you launch the application, each of the query fields is automatically populated for you. This means, for example, that you do not need to enter the URL of your database each time you use the tool. Some of these values are also pre-filled in panel 4.
- Display Graphs - This page provides a link to the folder where the graphs have been generated.
- Data Scraper - Assuming you have created the database as described in /docs/data.txt, this feature populates the database with data from Reddit. Because it has to send a large number of queries to the Reddit API, acquiring this data may take a long time.
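The filtering step of the Monitor Subreddit panel can be sketched as follows. This is an illustration under stated assumptions rather than the project's implementation: it assumes the threshold applies to a post's score, that keywords are matched case-insensitively against the title and body, and it uses a hypothetical `Post` record. Sending the email is left out.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.List;
import java.util.Locale;

/** Hypothetical sketch of the Monitor Subreddit filter; not the project's actual code. */
final class MonitorFilter {
    /** Minimal view of a Reddit post (hypothetical type). */
    record Post(String title, String body, int score, Instant created) {}

    private MonitorFilter() {}

    /** Returns posts from the last 24 hours whose score exceeds the threshold and that mention any keyword. */
    static List<Post> matches(List<Post> posts, List<String> keywords, int threshold, Instant now) {
        Instant cutoff = now.minus(Duration.ofHours(24));
        return posts.stream()
                .filter(p -> p.created().isAfter(cutoff))
                .filter(p -> p.score() > threshold)
                .filter(p -> containsAny(p.title() + " " + p.body(), keywords))
                .toList();
    }

    // Case-insensitive substring match; slang, typos and abbreviations are not caught.
    private static boolean containsAny(String text, List<String> keywords) {
        String lower = text.toLowerCase(Locale.ROOT);
        return keywords.stream().anyMatch(k -> lower.contains(k.toLowerCase(Locale.ROOT)));
    }

    public static void main(String[] args) {
        Instant now = Instant.parse("2014-06-01T12:00:00Z");
        List<Post> posts = List.of(
                new Post("Ahri bug after patch", "", 150, now.minus(Duration.ofHours(2))),
                new Post("Ahri fan art", "", 5, now.minus(Duration.ofHours(3))),
                new Post("Old Ahri thread", "", 900, now.minus(Duration.ofDays(3))));
        // Only the first post is recent, above the threshold, and mentions a keyword.
        matches(posts, List.of("ahri", "bug"), 100, now).forEach(p -> System.out.println(p.title()));
    }
}
```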
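The time-saving feature of the Graph Data panel, which stores the last successful query and pre-fills the fields on the next launch, could be implemented with the standard `java.util.Properties` class. This is a hypothetical sketch: the file name `graph-query.properties` and the keys `keyword` and `startEpoch` are assumptions, and the project's actual data file format may differ.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.Writer;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.Properties;

/** Hypothetical sketch: remembers the last successful graph query between launches. */
final class QueryPrefs {
    private QueryPrefs() {}

    /** Saves the query fields after a graph is created successfully. */
    static void save(Path file, Map<String, String> fields) throws IOException {
        Properties props = new Properties();
        props.putAll(fields);
        try (Writer out = Files.newBufferedWriter(file)) {
            props.store(out, "Last successful graph query");
        }
    }

    /** Loads saved fields at startup; returns an empty Properties object if nothing was saved yet. */
    static Properties load(Path file) throws IOException {
        Properties props = new Properties();
        if (Files.exists(file)) {
            try (Reader in = Files.newBufferedReader(file)) {
                props.load(in);
            }
        }
        return props;
    }

    public static void main(String[] args) throws IOException {
        Path file = Path.of("graph-query.properties"); // hypothetical file name
        save(file, Map.of("keyword", "Ahri", "startEpoch", "1388534400"));
        System.out.println(load(file).getProperty("keyword"));
    }
}
```

A plain key/value file keeps the feature independent of the database, so the fields can be pre-filled even before a connection is made.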
Here is the resulting graph for the single keyword "Ahri" during 2014:
Here is the resulting graph comparing all League of Legends characters created prior to 2014, during 2014:
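The two graph types described above need two different aggregations of the same posts. A minimal sketch, assuming a hypothetical `Post` record and measuring popularity as the number of posts whose title mentions the keyword (the project may instead weight posts by score): `perMonth` produces the series for a single-keyword graph, and `totals` produces the values for a multi-keyword comparison.

```java
import java.time.Instant;
import java.time.YearMonth;
import java.time.ZoneOffset;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch of the data behind the two Graph Data chart types. */
final class KeywordCounts {
    record Post(String title, Instant created) {}

    private KeywordCounts() {}

    private static boolean mentions(Post p, String keyword) {
        return p.title().toLowerCase(Locale.ROOT).contains(keyword.toLowerCase(Locale.ROOT));
    }

    /** Single keyword over time: number of matching posts per calendar month (UTC), in month order. */
    static Map<YearMonth, Integer> perMonth(List<Post> posts, String keyword) {
        Map<YearMonth, Integer> counts = new TreeMap<>();
        for (Post p : posts) {
            if (mentions(p, keyword)) {
                YearMonth month = YearMonth.from(p.created().atZone(ZoneOffset.UTC));
                counts.merge(month, 1, Integer::sum);
            }
        }
        return counts;
    }

    /** Multiple keywords over the whole span: number of posts mentioning each keyword. */
    static Map<String, Integer> totals(List<Post> posts, List<String> keywords) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String k : keywords) {
            counts.put(k, (int) posts.stream().filter(p -> mentions(p, k)).count());
        }
        return counts;
    }

    public static void main(String[] args) {
        List<Post> posts = List.of(
                new Post("Ahri nerf incoming?", Instant.parse("2014-01-15T10:00:00Z")),
                new Post("Ahri and Annie duo", Instant.parse("2014-02-03T10:00:00Z")),
                new Post("Annie tips", Instant.parse("2014-02-20T10:00:00Z")));
        System.out.println(perMonth(posts, "Ahri"));                 // {2014-01=1, 2014-02=1}
        System.out.println(totals(posts, List.of("Ahri", "Annie"))); // {Ahri=2, Annie=2}
    }
}
```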
As mentioned earlier, we also wanted to perform additional analysis on the existing data from Project 1. In the chart below, we normalize the Reddit popularity and patch note values and compute the difference for each keyword. Essentially, this tells you which League of Legends characters are getting proportionally more or less attention in the patch notes relative to their forum activity.
Since this chart requires Patch Note data, we did not make it accessible through the UI. However, the code to create this type of chart resides in "src/GraphCreator_MultiKeyword.java".
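The normalized-difference computation can be sketched as below. The source does not say how the values are normalized, so this sketch assumes each series is divided by its own total (turning counts into shares) and that the difference is patch-note share minus Reddit share. The class and method names are hypothetical and are not taken from `GraphCreator_MultiKeyword.java`.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch of the normalized-difference chart; normalization by total is an assumption. */
final class NormalizedDifference {
    private NormalizedDifference() {}

    /** Scales values so they sum to 1 (each keyword's share of the total); all zeros stay zero. */
    static Map<String, Double> normalize(Map<String, Double> values) {
        double total = values.values().stream().mapToDouble(Double::doubleValue).sum();
        Map<String, Double> shares = new LinkedHashMap<>();
        values.forEach((k, v) -> shares.put(k, total == 0 ? 0.0 : v / total));
        return shares;
    }

    /**
     * Patch-note share minus Reddit share for each keyword in the Reddit map.
     * Positive: more attention in patch notes than the forum activity would suggest.
     */
    static Map<String, Double> difference(Map<String, Double> reddit, Map<String, Double> patches) {
        Map<String, Double> r = normalize(reddit);
        Map<String, Double> p = normalize(patches);
        Map<String, Double> diff = new LinkedHashMap<>();
        r.forEach((k, share) -> diff.put(k, p.getOrDefault(k, 0.0) - share));
        return diff;
    }

    public static void main(String[] args) {
        Map<String, Double> reddit = new LinkedHashMap<>();
        reddit.put("Ahri", 300.0);
        reddit.put("Annie", 100.0);
        Map<String, Double> patches = new LinkedHashMap<>();
        patches.put("Ahri", 1.0);
        patches.put("Annie", 3.0);
        System.out.println(difference(reddit, patches)); // {Ahri=-0.5, Annie=0.5}
    }
}
```

With this sign convention, a negative value means a character draws proportionally more forum discussion than patch-note attention, and a positive value means the reverse.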
Overall, the tool seemed to be a success. The project was not over-scoped, so we had time to implement all intended functionality. A video demo of the tool is available [here](https://www.youtube.com/watch?v=OGSlmd3pc8U).

This project focused largely on creating the tool rather than on analyzing data. The validity of any graph created by this tool will depend heavily on what data the user is studying and how they set up each graph. In all cases, however, users will be looking at Reddit data, so they must be aware of the limitations of keyword matching on natural language. A keyword search will not pick up slang, typos, abbreviations, or sarcasm. Additionally, any online forum is subject to trolling and forum "games" that could skew the data being collected.

There is still some manual setup required to get the tool running (specifically, creating the database). Ideally this process would be automated, perhaps by storing the data in a file rather than in a proper database. This would reduce performance but would be much simpler and would work on more platforms (SQLEXPRESS would not be required).

The UI could also be improved if the tool were developed further. The current version is essentially a prototype with barebones functionality. UI elements such as date pickers would make the graph creation forms much easier to fill out; requiring epoch time is not user-friendly.
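A first step toward replacing raw epoch fields is to accept calendar dates and convert them with `java.time`. This sketch assumes the graph forms expect epoch seconds in UTC (the unit Reddit uses for its `created_utc` timestamps); the helper class is hypothetical. A date picker could then pass its selected date straight into the conversion.

```java
import java.time.LocalDate;
import java.time.ZoneOffset;
import java.time.format.DateTimeParseException;

/** Hypothetical helper: lets users type calendar dates instead of epoch timestamps. */
final class EpochDates {
    private EpochDates() {}

    /** Converts an ISO date such as "2014-01-01" to epoch seconds at midnight UTC. */
    static long toEpochSeconds(String isoDate) {
        try {
            return LocalDate.parse(isoDate).atStartOfDay(ZoneOffset.UTC).toEpochSecond();
        } catch (DateTimeParseException e) {
            throw new IllegalArgumentException("Expected a date like 2014-01-01, got: " + isoDate, e);
        }
    }

    /** Converts epoch seconds back to the UTC calendar date, e.g. to pre-fill a form field. */
    static String toIsoDate(long epochSeconds) {
        return LocalDate.ofEpochDay(Math.floorDiv(epochSeconds, 86_400L)).toString();
    }

    public static void main(String[] args) {
        System.out.println(toEpochSeconds("2014-01-01")); // 1388534400
        System.out.println(toIsoDate(1388534400L));       // 2014-01-01
    }
}
```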
There is some overlap in functionality between the live subreddit monitor and the data scraper. Both of them query the Reddit API, but the live monitor does not store any data. These two features could be merged so that the database is updated while the subreddit is being monitored. However, this merge could become complex, because Reddit posts change in score over time while the live monitor only looks at posts from the last 24 hours. The merged feature would therefore also need to periodically re-check older posts to update their scores.
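One way to merge the two features is to upsert every post the monitor sees into a store keyed by post id, and to periodically re-fetch posts that have aged out of the 24-hour window. The sketch below keeps the store in memory and uses hypothetical names throughout; the seven-day and twelve-hour windows in the example are illustrative choices, not values from the project.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of a merged monitor/scraper store; not the project's actual design. */
final class PostStore {
    record StoredPost(String id, Instant created, int score, Instant lastChecked) {}

    private final Map<String, StoredPost> posts = new HashMap<>();

    /** Inserts a new post or overwrites the score of one already stored (an "upsert"). */
    void upsert(String id, Instant created, int score, Instant now) {
        posts.put(id, new StoredPost(id, created, score, now));
    }

    int score(String id) {
        return posts.get(id).score();
    }

    /**
     * Ids of posts outside the monitor's 24-hour window but younger than maxAge
     * whose score has not been re-fetched for at least recheckEvery, sorted by id.
     */
    List<String> needsRescore(Instant now, Duration maxAge, Duration recheckEvery) {
        Instant liveCutoff = now.minus(Duration.ofHours(24));
        Instant oldest = now.minus(maxAge);
        List<String> ids = new ArrayList<>();
        for (StoredPost p : posts.values()) {
            boolean olderThanLiveWindow = p.created().isBefore(liveCutoff);
            boolean youngEnough = p.created().isAfter(oldest);
            boolean stale = !p.lastChecked().isAfter(now.minus(recheckEvery));
            if (olderThanLiveWindow && youngEnough && stale) {
                ids.add(p.id());
            }
        }
        ids.sort(null); // natural order, so results are deterministic
        return ids;
    }

    public static void main(String[] args) {
        Instant now = Instant.parse("2014-06-10T00:00:00Z");
        PostStore store = new PostStore();
        store.upsert("t3_a", now.minus(Duration.ofDays(3)), 40, now.minus(Duration.ofDays(2)));
        store.upsert("t3_b", now.minus(Duration.ofHours(2)), 10, now); // still covered by the live monitor
        store.upsert("t3_a", now.minus(Duration.ofDays(3)), 55, now.minus(Duration.ofDays(1)));
        System.out.println(store.score("t3_a")); // 55
        System.out.println(store.needsRescore(now, Duration.ofDays(7), Duration.ofHours(12))); // [t3_a]
    }
}
```

In the real tool the map would be a database table, and `needsRescore` would drive a periodic job that re-queries the Reddit API for the returned ids.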
Task | Importance (1-10) | Effort (1-10) | Conclusion |
---|---|---|---|
Subreddit Monitoring Tool | 10 | 10 | required |
Automated Graph Creation | 10 | 6 | required |
Research Alternate Visualizations | 7 | 2 | required |
Implement Alternate Visualizations | 7 | 6 | required |
Research Alternate Sources of Patch Data | 6 | 7 | optional |
Apply Methodology to Non-game Projects | 8 | 5 | required |
Tool/UI to Parse Reddit Data | 7 | 10 | required |
Final Report/Video | 10 | 10 | required |