# Visualisation Solutions

In this activity we will work on visualising data. Questions to be answered are in ***bold*** and should be answered in the same box unless specified otherwise. Some parts of the code need to be filled in; these are marked with `__` or left empty. The code will not run until every one of them has been replaced with the correct value, so make sure that you change all of them.

In [None]:
import numpy as np
import os
import matplotlib.pyplot as plt
import matplotlib.patches as mpatches
import math
import seaborn as sns
import pandas as pd
from collections import Counter
#Make sure the helper_functions.py file is in the same folder as this notebook
from helper_functions import get_vgs_proportion


In [None]:
#Get the path to the files we will be using.
path_weather = os.path.join(os.getcwd(), 'datasets', 'weather_simple.csv')
path_videogames = os.path.join(os.getcwd(), 'datasets', 'vgsales.csv')
#Load the data into the weather and videogames variables. Each one is a DataFrame object.
weather = pd.read_csv(path_weather, delimiter = ';')
videogames = pd.read_csv(path_videogames, delimiter = ',')

In [None]:
#Look at the data we will need
weather

In [None]:
#Look at the other dataset
videogames

## Activity 1: Using the right chart

Here you are given selections of the two datasets that we have loaded. Your job is to find the right plot to use for each of them and represent them. The following cell contains code that you can copy and paste for representing the data. Remember to substitute the appropriate variable names.

Code for drawing a pie chart. Copy, paste, and substitute with your own data
```
data.plot.pie(subplots=True)
```
Code for drawing a line chart. Copy, paste, and substitute with your own data
```
data.plot.line()
plt.legend(['_'])
```
Code for drawing a bar chart. Copy, paste, and substitute with your own data
```
data.plot.bar()
plt.legend(['_'])
```
Code for stacked 100% area chart
```
data.plot.area()
```
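
Note that `plot.area()` stacks whatever values it is given, so the result only reads as a 100% chart when every row already sums to 1 (or to 100). As a minimal sketch, assuming made-up numbers (the `sales` frame and its column names are hypothetical and not part of the datasets), counts can be turned into row-wise proportions first:

```python
import pandas as pd

# Hypothetical yearly counts for two made-up categories
sales = pd.DataFrame({'A': [10, 30, 20], 'B': [30, 10, 20]},
                     index=[2000, 2001, 2002])

# Divide each row by its own total so that every row sums to 1
proportions = sales.div(sales.sum(axis=1), axis=0)

# A stacked area chart of proportions fills the full height at every year
ax = proportions.plot.area()
ax.set_ylabel('Proportion')
```

If the data you are given is already expressed as proportions, the normalisation step is not needed.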

In [None]:
#Let's get all the data that we will represent now

#Number of days of each kind (cloudy, sunny, rainy)
kind_days = Counter(weather['Weather']) #Counts how many days of each kind there are
num_kind_days = pd.DataFrame.from_dict(kind_days, orient='index')

#Maximum temperatures
max_temp = weather['Max_Temperature']

#Minimum temperatures
min_temp = weather['Min_Temperature']

#Amount of rain
rain = weather['Rain(mm)']

#Proportion of sold videogames for each Nintendo platform per year
#There is some processing of the data to get the necessary dataset. You can look at it
#in the helper_functions.py file
prop_videogames = get_vgs_proportion() 
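
The `Counter` step above can be tried out in isolation. The following is a small sketch on a made-up list of labels (the `days` list and the `Days` column name are illustrative assumptions), together with a pandas-only alternative that produces the same counts:

```python
from collections import Counter
import pandas as pd

# Made-up weather labels, one per day
days = ['Sunny', 'Rainy', 'Sunny', 'Cloudy', 'Sunny']

# Counter maps each distinct label to the number of times it appears
counts = Counter(days)

# orient='index' turns the labels into the row index of the DataFrame
table = pd.DataFrame.from_dict(counts, orient='index', columns=['Days'])

# Alternative without Counter: value_counts returns the counts as a Series
alt = pd.Series(days).value_counts()
```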


In [None]:
#Your chart 1

***Explain your choice:***


In [None]:
#Your chart 2

***Explain your choice:***


In [None]:
#Your chart 3

***Explain your choice:***


In [None]:
#Your chart 4

***Explain your choice:***


In [None]:
#Your chart 5

***Explain your choice:***


## Activity 2: Correcting plots

Correct the following plots. Possible mistakes include wrongly labelled axes, poor colour choices ([here](https://matplotlib.org/3.1.0/tutorials/colors/colormaps.html) is a list of the colormaps you can use), or the wrong type of plot for the data.

In [None]:
plt.plot(min_temp, color = 'blue', label = 'max_temperature (ºC)')
plt.plot(max_temp, color = 'blue', label = 'max_temperature (ºC)')
plt.legend()

***Explain the error(s) you found:***

In [None]:
#We need to change the weather values to numbers for them to have different colours
weather_scatter =weather.replace(['Sunny', 'Cloudy', 'Rainy'],[0,1,2])
weather_scatter.plot.scatter(x='Date', y = 'Max_Temperature', c = 'Weather', colormap = 'RdYlGn')

#This is for representing the legend. Each "patch" is one of the colour labels
sunny_patch = mpatches.Patch(color='red', label='Sunny')
cloudy_patch= mpatches.Patch(color='yellow', label='Cloudy')
rainy_patch = mpatches.Patch(color='green', label='Rainy')
plt.legend(handles=[sunny_patch, cloudy_patch,rainy_patch])
plt.show()

***Explain the error(s) you found:***


In [None]:
vg_selection = videogames.loc[videogames['NA_Sales'] <=15]
vg_selection.plot.scatter(x='NA_Sales', y='EU_Sales',c='Global_Sales',s=vg_selection['Global_Sales'], alpha=0.5, colormap='viridis')
plt.ylabel('NA_Sales')
plt.xlabel('EU_Sales')
plt.show()

***Explain the error(s) you found:***


## Activity 3: Stories with data

 

The year is 2008. Imagine you work at a videogame company. Your company wants to develop a game for one of the Sony platforms, and it needs you to decide which of them is the best option. To do that, you get historical data on how many videogames have been sold per year on each of the Sony platforms since 1980.

In our data, the platforms we will be looking at are called "PS" (PlayStation), "PS2" (PlayStation 2), "PS3" (PlayStation 3), "PS4" (PlayStation 4), "PSP" (PlayStation Portable) and "PSV" (PlayStation Vita). We will now select these individually.

In [None]:
#First we take the data up to and including 2007
videogames_2007 = videogames.loc[videogames['Year'] <= __]

#Now we select the platforms. 
ps = videogames_2007.loc[videogames_2007['Platform'] == '__']
ps2 = videogames_2007.loc[videogames_2007['Platform'] == '__']
ps3 = videogames_2007.loc[videogames_2007['Platform'] == '__']
ps4 = videogames_2007.loc[videogames_2007['Platform'] == '__']
psp = videogames_2007.loc[videogames_2007['Platform'] == '__']
psv = videogames_2007.loc[videogames_2007['Platform'] == '__']
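
The selections above all follow the same pattern: a comparison on a column produces a boolean Series, and `.loc` keeps only the rows where it is `True`. Here is a minimal sketch on made-up rows (the `games` frame, the platform names `X1` and `X2`, and the cut-off year are all hypothetical and are not the values to fill in):

```python
import pandas as pd

# Made-up rows; platform names and years are illustrative only
games = pd.DataFrame({
    'Platform': ['X1', 'X2', 'X1', 'X2'],
    'Year': [1999, 2000, 2003, 2001],
    'Global_Sales': [1.5, 0.7, 2.0, 0.3],
})

# Keep only the rows whose year is at or before the cut-off
up_to_2001 = games.loc[games['Year'] <= 2001]

# From that selection, keep only the rows of one platform
x2_only = up_to_2001.loc[up_to_2001['Platform'] == 'X2']
```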

The data we have extracted tells us how many copies of each specific videogame have been sold each year in specific regions or around the world. We need to know how many videogames have been sold in total each year for each platform to decide which is selling the most and choose that. To do this, we will add up all the videogames sold globally (the "Global_Sales" column) in a year for each platform.

In [None]:
years = sorted(videogames.Year.unique())

#Function for adding the videogames of each year in a platform
def total_platform(a):
    #Create the DataFrame for our data
    total_years = pd.DataFrame(data = [],index = years, columns = ['Global_Sales'])
    for year in years:
        #Calculate the total in that year
        total_year = a.loc[a['Year'] == year, 'Global_Sales'].sum()
        #Save the total into our DataFrame
        total_years.at[year, 'Global_Sales'] = total_year
    return total_years

#Read the function defined above and try to find out what the variable 'a' is. 
#Using the defined function, fill in the following values.
total_ps = __
total_ps2 = __
total_ps3 = __
total_ps4 = __
total_psp = __
total_psv = __
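
The loop in `total_platform` is one way of building per-year totals. For comparison, here is a sketch of the same idea using `groupby` on made-up data (the `games` frame and `all_years` list are hypothetical). It is an alternative under the assumption that years without sales should appear with a total of 0, and it is not the notebook's own method:

```python
import pandas as pd

# Made-up sales rows: two games in 2000, none in 2001, one in 2002
games = pd.DataFrame({
    'Year': [2000, 2000, 2002],
    'Global_Sales': [1.0, 2.5, 0.5],
})
all_years = [2000, 2001, 2002]

totals = (
    games.groupby('Year')['Global_Sales']
    .sum()                              # one total per year that has sales
    .reindex(all_years, fill_value=0)   # add missing years with a total of 0
    .to_frame()                         # one 'Global_Sales' column, years as index
)
```

The result has one row per year and a single `Global_Sales` column, which is the same shape that `total_platform` returns.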

***Based on the defined function, can you explain what data we have now for each platform, referring to their rows and columns?***



Let us now look at the millions of games sold during 2007 for each of the platforms.


***Before looking at the numbers, intuitively, which platform would you pick based on sales from the previous year? The one with the most sales? The one with the least? Something else?***


In [None]:
#Fill in the year we are looking at
print("PS:", total_ps.loc[__, 'Global_Sales'])
print("PS2:", total_ps2.loc[__, 'Global_Sales'])
print("PS3:", total_ps3.loc[__, 'Global_Sales'])
print("PS4:", total_ps4.loc[__, 'Global_Sales'])
print("PSP:", total_psp.loc[__, 'Global_Sales'])
print("PSV:", total_psv.loc[__, 'Global_Sales'])


We now have the millions of copies of videogames sold on each of the Sony platforms in 2007. 


***Which platform sold the most copies? Is this alone enough to decide which platform we want to sell our game on?***


In this case, we cannot look at data from a single year only; we need to see how sales are evolving. Previously, we saw that PS2 had slightly more sales than PS3 the year before, but that does not mean that PS2 will sell the most next year. We can determine this by looking at the trend on a graph, that is, whether sales are going up or down.
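
Besides plotting, a trend can also be read from the year-to-year change. The sketch below uses made-up totals for two hypothetical platforms (`old` and `new` are not taken from the dataset) to show that the platform with more sales in the latest year can still be the one in decline:

```python
import pandas as pd

# Made-up yearly totals (millions of copies) for two hypothetical platforms
old = pd.Series([50.0, 45.0, 30.0], index=[2005, 2006, 2007])
new = pd.Series([0.0, 10.0, 28.0], index=[2005, 2006, 2007])

# diff() gives the change from the previous year; the first entry is NaN
old_change = old.diff()
new_change = new.diff()

# In 2007 'old' still sells more, but its change is negative while 'new' grows
print(old[2007], old_change[2007])
print(new[2007], new_change[2007])
```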

In [None]:
#Fill in with the necessary data set for each of the lines
plt.plot(__.iloc[:28])
plt.plot(__.iloc[:28])

***Seeing the trend now, which platform do you think it would be better to release the game on? Why?***


We have only looked at the Sony platforms, but there are other categories in the dataset that could help us decide which platform to release our game on.


***Looking through the original dataset, can you name one of these and give a reason why it would be a good characteristic to look at?***
