# Assignment 1 -- Insights from Overwatch League

## 76 total marks

```python
# You will need the following packages
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
```

## Introduction

**Business Context.** You work for Overwatch League, the premier esports league for the game Overwatch. The league has collected data about past professional tournaments and would like you to answer some questions about them. Build visualizations and answer the questions below.

**About Overwatch League.** In the Overwatch League, teams of professional players compete against each other to be the best in the world at playing the team-based first-person shooter game, Overwatch.

Each match consists of two teams of six players each (five per team in the 2022 season, which was played on Overwatch 2), with each player selecting one of the game's many heroes to play as. The teams are then tasked with completing objectives such as escorting a payload across the map or capturing and holding control points.

The team that completes its objectives fastest, or prevents the other team from completing theirs, wins the map.

A match is played as a series of maps, and the team that wins the most maps wins the match; in a best-of-three format, for example, the first team to win two maps is declared the winner. Each map is in turn played over several rounds.

Each Overwatch League season is divided into several stages, with teams from around the world competing at a variety of venues. The top teams advance to the playoffs, where they compete to be crowned Overwatch League champion. In this dataset, each map is one of two types: a control map or an escort map.

**Business Problem.** Your task is to prepare the given data and build visualizations that answer the client's specific questions, listed below.

**Analytical Context.** You are given a CSV file (stored in the already created ```data``` folder) containing details about each match, such as the winning team, map name, date of the match, and competition stage. You will perform the following tasks on the data:

1. Read, transform, and prepare data for visualization
2. Perform analytics and construct visualizations of the data to identify patterns in the dataset
        
The client has a specific set of questions they would like answered. Accompany your answers with visualizations.

# Question 1 -- loading in and getting an overview (5 marks)

**1a) (1 mark)**  Use the pandas function ```read_csv()``` to load the file ```match_map_stats.csv``` as a DataFrame. Name this DataFrame ```df```. 

**1b) (1 mark)** Print the first 5 rows of the DataFrame

**1c) (1 mark)** Print the column names in `df`
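A minimal sketch of 1a to 1c. Because the real `match_map_stats.csv` is not reproduced here, the example reads a tiny made-up CSV (invented values, only a few of the columns) from an in-memory string via `io.StringIO`; in the notebook you would pass the file path instead, assumed here to be `data/match_map_stats.csv`.

```python
import io

import pandas as pd

# Made-up stand-in for data/match_map_stats.csv (subset of columns, invented values)
csv_text = """round_start_time,round_end_time,match_id,game_number,map_winner,map_loser,map_name,map_round
2022-05-05 20:10:00,2022-05-05 20:20:00,1,1,Team A,Team B,Map X,1
2022-05-05 20:21:00,2022-05-05 20:30:00,1,1,Team A,Team B,Map X,2
2022-05-05 20:40:00,2022-05-05 20:55:00,1,2,Team B,Team A,Map Y,1
"""

# 1a) In the notebook: df = pd.read_csv("data/match_map_stats.csv")
df = pd.read_csv(io.StringIO(csv_text))

# 1b) First five rows (head() returns 5 rows by default)
print(df.head())

# 1c) Column names
print(df.columns.tolist())
```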

The DataFrame has the following columns:

- **round_start_time** Starting time of the round
- **round_end_time** Ending time of the round
- **stage** Stage of the competition
- **match_id** Id for the match
- **game_number** Game number in the given match
- **match_winner** Winner of the match
- **map_winner** Winner of the game/map
- **map_loser** Loser of the game/map
- **map_name** Name of map
- **map_round** Round number (each map has several rounds)
- **winning_team_final_map_score** Final score of map winner
- **losing_team_final_map_score** Final score of map loser
- **control_round_name** Name of the round if it was a control type map
- **attacker** Team that was the attacker
- **defender** Team that was the defender
- **team_one_name** Name of first team in match
- **team_two_name** Name of second team in match
- **attacker_payload_distance** Distance the attacker moved the payload on payload maps
- **defender_payload_distance** Payload distance recorded for the defending team on payload maps
- **attacker_time_banked** Attacker time left over
- **defender_time_banked** Defender time left over
- **attacker_control_perecent** Percent objective was charged by the attacker team
- **defender_control_perecent** Percent objective was charged by the defender team
- **attacker_round_end_score** Attacker score at the end of this round
- **defender_round_end_score** Defender score at the end of this round

**1d) (2 marks)** Answer the following question:

Overwatch League is aiming to expand its fanbase through the following advertising campaigns:

- [ ] Advertising the most popular teams
- [ ] Promoting a rematch between last year's best teams
- [ ] Creating an underdog story by highlighting last year's worst-performing teams
- [ ] Targeting the demographics most likely to buy tickets

Which of these initiatives could directly benefit from an analysis of the data provided? Check all that apply.

**Note:** You can create check marks by double-clicking this cell and adding an ```[x]``` in the corresponding lines. 

# Question 2 -- Cleaning the data (5 marks) 

In this dataset there are two types of maps: escort and control. On control maps, the teams compete to control an objective until its meter reaches 100%. On escort maps, the attacker tries to escort a payload to the end of the map while the defender tries to stop it. Control maps correspond to rows where `attacker_control_perecent` or `defender_control_perecent` is nonzero; that is, both columns are `NaN` or 0 when the map is **not** a control map.
- 2a) **(1 mark)** Create a column called `map_type` with two possible values, "control" and "escort".
- 2b) **(2 marks)** For rows that correspond to control maps, set the `attacker_time_banked`, `defender_time_banked`, `attacker_payload_distance`, and `defender_payload_distance` values to `NaN`.
- 2c) **(2 marks)** Ensure that the `round_start_time` and `round_end_time` columns are in pandas' datetime format (`datetime64`), and create a column called `year` containing the year the match took place.
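One possible way to do 2a to 2c, sketched on a few invented rows that contain only the relevant columns. It assumes the column names listed in Question 1; `fillna(0)` makes `NaN` and 0 behave the same when testing for a control map, and taking `year` from `round_start_time` is a choice rather than a requirement.

```python
import numpy as np
import pandas as pd

# Toy rows (invented values); the real df has many more columns
df = pd.DataFrame({
    "round_start_time": ["2021-06-01 19:00:00", "2022-05-05 20:10:00", "2022-05-05 20:40:00"],
    "round_end_time": ["2021-06-01 19:08:00", "2022-05-05 20:20:00", "2022-05-05 20:55:00"],
    "attacker_control_perecent": [np.nan, 100.0, 0.0],
    "defender_control_perecent": [np.nan, 45.0, 0.0],
    "attacker_payload_distance": [75.3, 0.0, 102.1],
    "defender_payload_distance": [60.0, 0.0, 88.4],
    "attacker_time_banked": [0.0, 0.0, 35.2],
    "defender_time_banked": [12.5, 0.0, 0.0],
})

# 2a) Control map if either team's control percent is nonzero (NaN counts as 0)
control_pct = df[["attacker_control_perecent", "defender_control_perecent"]].fillna(0)
is_control = (control_pct != 0).any(axis=1)
df["map_type"] = np.where(is_control, "control", "escort")

# 2b) Payload distance and banked time do not apply to control maps
payload_cols = ["attacker_payload_distance", "defender_payload_distance",
                "attacker_time_banked", "defender_time_banked"]
df.loc[df["map_type"] == "control", payload_cols] = np.nan

# 2c) Parse both timestamp columns, then extract the year
for col in ["round_start_time", "round_end_time"]:
    df[col] = pd.to_datetime(df[col])
df["year"] = df["round_start_time"].dt.year

print(df[["map_type", "attacker_time_banked", "year"]])
```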

# Question 3 --  Win rates on control maps (14 marks) 

3a) **(2 marks)** Create and print a sorted list of the unique teams. Store this list in a variable called `teams`. 

3b) **(1 mark)** Create and print a dictionary called `country` with team names as keys and each team's country as the value.
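A sketch of 3a and 3b on invented matchups (the pairings are made up; no results are implied). It collects team names from `team_one_name` and `team_two_name`, one reasonable choice since every team appears in at least one of them. The dataset has no country column, so `country` is assumed to be typed in by hand; only four teams are shown.

```python
import pandas as pd

# Toy matchups (invented pairings); the real df has many rows per map
df = pd.DataFrame({
    "team_one_name": ["Toronto Defiant", "Shanghai Dragons", "Dallas Fuel"],
    "team_two_name": ["Seoul Dynasty", "Toronto Defiant", "Shanghai Dragons"],
})

# 3a) Stack both name columns, keep unique names, sort alphabetically
teams = sorted(pd.concat([df["team_one_name"], df["team_two_name"]]).unique())
print(teams)

# 3b) No country column exists, so the mapping is entered by hand
country = {
    "Dallas Fuel": "USA",
    "Seoul Dynasty": "South Korea",
    "Shanghai Dragons": "China",
    "Toronto Defiant": "Canada",
}
print(country)
```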

A team's win rate is the number of games (maps) it won divided by the total number of games it played.
- 3c) **(1 mark)** Subset the DataFrame so that it contains only the control maps.
- 3d) **(5 marks)** Make a horizontal bar chart displaying each team's win rate on control maps. Order the bars from lowest to highest win rate, add a vertical line at 50%, and color the teams by country. **Hint:** the `sns.barplot` function is useful here; set `dodge=False` and make use of the `orient`, `hue`, and `hue_order` arguments.
- 3e) **(2 marks)** Which teams have a win rate higher than 50%? Do some countries produce better teams than others?
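A sketch of the win-rate calculation behind 3c and 3d, on invented rows. One detail worth checking: the data has one row per *round*, and each map spans several rounds, so counting rows directly would overweight maps with more rounds. The sketch keeps one row per map by de-duplicating on `match_id` and `game_number` (an assumption about which columns identify a map), and it assumes every map has a winner and a loser, so check how draws are recorded before relying on it.

```python
import pandas as pd

# Toy round-level rows (invented); the first two rows belong to the same map
df = pd.DataFrame({
    "match_id":    [1, 1, 1, 2, 3],
    "game_number": [1, 1, 2, 1, 1],
    "map_type":    ["control", "control", "control", "control", "escort"],
    "map_winner":  ["Team A", "Team A", "Team B", "Team A", "Team C"],
    "map_loser":   ["Team B", "Team B", "Team A", "Team C", "Team A"],
})

# 3c) Keep only control maps
control = df[df["map_type"] == "control"]

# One row per map before counting wins and appearances
maps = control.drop_duplicates(subset=["match_id", "game_number"])

wins = maps["map_winner"].value_counts()
losses = maps["map_loser"].value_counts()
played = wins.add(losses, fill_value=0)

# Win rate (%) = maps won / maps played; teams with no wins get 0
win_rate = (wins.reindex(played.index, fill_value=0) / played * 100).sort_values()
print(win_rate)
```

The sorted `win_rate` can then go to `sns.barplot` with `orient="h"`, `dodge=False`, and `hue` set to each team's country (for example `win_rate.index.map(country)`), and `plt.axvline(50)` adds the 50% reference line.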


3f) **(3 marks)** Make the same plot for each year. Are the yearly plots similar to the overall plot?
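For 3f, one way to get a win rate per team per year is to reshape the map-level rows into one row per (map, team) with a 0/1 `won` flag; the group mean of that flag is the win rate. The sketch assumes maps have already been de-duplicated and that a `year` column exists (Question 2); all values are invented.

```python
import pandas as pd

# Toy map-level rows (one row per control map; values invented)
maps = pd.DataFrame({
    "year":       [2021, 2021, 2022, 2022, 2022],
    "map_winner": ["Team A", "Team B", "Team B", "Team B", "Team A"],
    "map_loser":  ["Team B", "Team A", "Team A", "Team C", "Team C"],
})

# Long format: one row per (map, team), won = 1 for the winner, 0 for the loser
long = pd.concat([
    maps[["year", "map_winner"]].rename(columns={"map_winner": "team"}).assign(won=1),
    maps[["year", "map_loser"]].rename(columns={"map_loser": "team"}).assign(won=0),
])

# Mean of the 0/1 flag per (year, team) is the win rate; pivot years into columns
yearly = long.groupby(["year", "team"])["won"].mean().mul(100).unstack("year")
print(yearly)
```

Each column of `yearly` (one per year) can then be sorted and plotted like the overall chart; a team that did not play in a given year shows up as `NaN` there.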

# Question 4 -- Win rates over time (17 marks)

- 4a) **(3 marks)** Make a line plot of the Shanghai Dragons' win rate on control maps over time.
- 4b) **(1 mark)** How would you describe the Shanghai Dragons' win rate on control maps over time?
- 4c) **(3 marks)** For each of the five teams with the highest control-map win rates in 2022, plot their win rates on control maps over time.
- 4d) **(2 marks)** Make two observations about these plots. 

- 4e) **(5 marks)** Instead of control maps, make a bar chart of the teams' win rates on escort maps in 2022, i.e., consider only escort maps that occurred in 2022 when computing the win rate. Then make the same chart without subsetting to 2022, i.e., compute each team's escort-map win rate using maps from all years. For example, the 2022 escort-map win rate chart should look like the following: <img src="WRE.png" alt="drawing" width="500"/> The vertical line at 50% is useful but not necessary.
- 4f) **(3 marks)** Compare the 2022 win rates on control maps to those on the escort maps. What do you observe?
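A sketch covering 4c and 4e with a small helper, `win_rates` (a hypothetical name), applied to different subsets of a map-level DataFrame (invented values; assumes one row per map and the `year` and `map_type` columns from Question 2). The per-year table is built with a dictionary comprehension rather than `groupby(...).apply`, whose output shape can change depending on whether the groups return matching indexes.

```python
import pandas as pd


def win_rates(rows: pd.DataFrame) -> pd.Series:
    """Win rate (%) per team, from rows holding one map each."""
    wins = rows["map_winner"].value_counts()
    played = wins.add(rows["map_loser"].value_counts(), fill_value=0)
    return wins.reindex(played.index, fill_value=0) / played * 100


# Toy map-level rows (one row per map; values invented)
maps = pd.DataFrame({
    "year":       [2021, 2022, 2022, 2022, 2022, 2022],
    "map_type":   ["control", "control", "control", "control", "escort", "escort"],
    "map_winner": ["Team A", "Team B", "Team B", "Team C", "Team A", "Team A"],
    "map_loser":  ["Team B", "Team A", "Team C", "Team A", "Team B", "Team C"],
})
control = maps[maps["map_type"] == "control"]
escort = maps[maps["map_type"] == "escort"]

# 4c) Top teams by 2022 control-map win rate (top 2 here; the assignment asks for 5)
top = win_rates(control[control["year"] == 2022]).nlargest(2).index.tolist()

# Control-map win rate per year for those teams (rows: year, columns: team)
over_time = pd.DataFrame(
    {year: win_rates(group) for year, group in control.groupby("year")}
).T[top]
print(over_time)

# 4e) Escort-map win rates for 2022 only and for all years
escort_2022 = win_rates(escort[escort["year"] == 2022]).sort_values()
escort_all = win_rates(escort).sort_values()
print(escort_2022, escort_all, sep="\n")
```

`over_time.plot(marker="o")` then draws one line per team across the years (4a is the same idea with a single team), and `escort_2022.plot.barh()` gives a basic version of the 4e bar chart.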


# Question 5 --  What map should Toronto work on improving? (18 marks)

Suppose you have been contracted as an analyst for the Toronto Defiant team. 

- 5a) **(5 marks)** There are different control maps, listed in the `control_round_name` column. Make bar charts of Toronto's win rate on each individual control map, one for 2022 and one for all time. For example, your plot might look like:
<img src="WRTD.png" alt="drawing" width="500"/>
- 5b) **(5 marks)** Which plot should be used to make recommendations for Toronto in 2023? Based on this judgement and the plot you generated, which map should Toronto practice the most? Are there any problems with this analysis?
- 5c) **(2 marks)**  Assign each team a rank according to their 2022 control map win rate, with 1 being the highest win rate on control maps in 2022 and 20 being the lowest win rate on control maps in 2022. Print each team name with its rank beside it. 
- 5d) **(3 marks)** Determine Toronto's three best and three worst control maps (by win rate, using `control_round_name`) in 2022. For each of these maps, print the mean rank of the teams Toronto played on it.
- 5e) **(3 marks)** Compare the mean ranks on Toronto's best maps to their worst maps. What does this analysis reveal about Toronto's map-specific win rates?
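A sketch of 5c and 5d on invented numbers. `rank(ascending=False, method="min")` gives the highest win rate rank 1 and gives tied teams the same rank. The Toronto rows assume two hypothetical, already-derived columns, `opponent` and `toronto_won` (1 if Toronto won that control round); how to identify a control round's winner from the raw columns is left open here.

```python
import pandas as pd

# Hypothetical 2022 control-map win rates (%); values invented
win_rate_2022 = pd.Series({"Team A": 70.0, "Team B": 55.0,
                           "Toronto Defiant": 40.0, "Team C": 30.0})

# 5c) Rank 1 = highest win rate; method="min" gives tied teams the same rank
rank = win_rate_2022.rank(ascending=False, method="min").astype(int)
for team, r in rank.sort_values().items():
    print(team, r)

# Toy 2022 rows for Toronto, one per control round played.
# "opponent" and "toronto_won" are hypothetical columns derived beforehand.
tor = pd.DataFrame({
    "control_round_name": ["Map 1", "Map 1", "Map 2", "Map 2", "Map 3", "Map 3"],
    "opponent":    ["Team C", "Team C", "Team A", "Team B", "Team B", "Team A"],
    "toronto_won": [1, 1, 0, 0, 1, 0],
})

# 5d) Toronto's win rate per control map, lowest first (k=1 here, 3 in the assignment)
k = 1
by_map = tor.groupby("control_round_name")["toronto_won"].mean().sort_values()
worst, best = by_map.index[:k], by_map.index[-k:]

# Mean rank of the opponents Toronto faced on each map
tor["opp_rank"] = tor["opponent"].map(rank)
mean_opp_rank = tor.groupby("control_round_name")["opp_rank"].mean()
print("best:", mean_opp_rank[best].to_dict())
print("worst:", mean_opp_rank[worst].to_dict())
```

In this toy example Toronto's best map was played only against the lowest-ranked team and its worst against the top two, which is the kind of pattern 5e asks you to look for.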

# Question 6 -- Draws on escort maps (9 marks)

The league is considering changing the rules if too many maps (more than 5%) end in a draw, since the league organizers consider draws to be bad for the game.


- 6a) **(3 marks)** What proportion of escort maps end in a draw?
- 6b) **(6 marks)** Make a bar plot of the number of draws on each escort map. Do certain escort maps have more draws than others? What proportion of draws happen on the two maps with the most draws, and what would you tell the league organizers about this proportion?
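A sketch of 6a and 6b on invented escort-map rows (one row per map). It assumes a draw is a map whose `winning_team_final_map_score` equals its `losing_team_final_map_score`; check how the real data records draws before relying on this.

```python
import pandas as pd

# Toy escort-map rows, one per map (values invented)
escort = pd.DataFrame({
    "map_name": ["Map X", "Map X", "Map Y", "Map Y", "Map Z", "Map Z", "Map Z", "Map X"],
    "winning_team_final_map_score": [3, 2, 3, 1, 2, 3, 4, 2],
    "losing_team_final_map_score":  [2, 2, 1, 1, 1, 3, 2, 2],
})

# Assumption: a draw is a map whose two final scores are equal
escort["draw"] = (escort["winning_team_final_map_score"]
                  == escort["losing_team_final_map_score"])

# 6a) Share of escort maps ending in a draw (mean of a boolean = proportion)
draw_share = escort["draw"].mean()
print(f"{draw_share:.1%} of escort maps are draws")

# 6b) Draws per map, most first, and the share falling on the top two maps
draws_by_map = escort.groupby("map_name")["draw"].sum().sort_values(ascending=False)
top_two_share = draws_by_map.iloc[:2].sum() / draws_by_map.sum()
print(draws_by_map)
print(f"{top_two_share:.1%} of draws happen on the two maps with the most draws")
```

`draw_share` can be compared directly with the league's 5% threshold, and `draws_by_map.plot.bar()` gives the bar plot.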

# Question 7 -- Payload distance and banked time (8 marks)

On escort maps, the attacker must push a "payload" a certain distance within an allotted time. If the attacker pushes the payload to the end of the map before the allotted time is up, the attacker banks the remaining time. If the attacker does not push the payload to the end of the map, the distance they pushed it is recorded instead.

- 7a) **(4 marks)** For rounds the attacker won, print the summary statistics of the `attacker_time_banked` column along with a histogram of that column. For rounds the attacker lost, print the summary statistics of the `attacker_payload_distance` column along with a histogram of that column.
- 7b) **(4 marks)** Describe the characteristics of the histograms and interpret the summary statistics in the context of the data.
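A sketch of 7a on invented escort rounds. Following the description of banked time in this question, it assumes the attacker won a round exactly when `attacker_time_banked` is positive (they reached the end with time to spare); a round completed with zero seconds left would be misclassified, so treat this split as an approximation.

```python
import pandas as pd

# Toy escort-map rounds (values invented)
rounds = pd.DataFrame({
    "attacker_time_banked":      [0.0, 42.5, 0.0, 118.2, 7.9, 0.0],
    "attacker_payload_distance": [64.3, 0.0, 91.8, 0.0, 0.0, 12.6],
})

# Assumption: positive banked time means the attacker reached the end, i.e. won
won = rounds["attacker_time_banked"] > 0

# 7a) Summary statistics: banked time for won rounds, distance for lost rounds
time_stats = rounds.loc[won, "attacker_time_banked"].describe()
dist_stats = rounds.loc[~won, "attacker_payload_distance"].describe()
print(time_stats, dist_stats, sep="\n\n")
```

The histograms can be drawn with `rounds.loc[won, "attacker_time_banked"].plot.hist(bins=20)` and the equivalent call on the payload distance of lost rounds.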