# Visualisation 

In this activity we will be working on the visualisation of weather data. 

### How does this notebook work?

Run the cells that contain code. To do this, select a cell and press Shift + Enter, or press the "Play" button on the left side of the cell (if you do not see it, hover over the cell with the mouse and it should appear). Remember that all cells need to be run in order, even those in which you did not need to write any code. If you think you might have skipped one, look at the number on the left side of the cell: it shows the order in which you have run the cells. Keep in mind that this number changes every time a cell is run: if you run a cell and it gets numbered 4, running the same cell again immediately afterwards renumbers it to 5.

Looking at the code you will notice some parts are incomplete and have '\_\_' written instead. This means that you need to complete that part of the code. In some other parts, you will need to write your own code. This will be specified.

You will also find hyperlinks to documentation on different functions that we will use. It is recommended to look at them to familiarise yourself with what you are doing and how they work.

There are also questions to be completed in text cells. Double-click a text cell, or select it and press Enter, to start editing it, and press Shift + Enter when you are finished to go back to reading mode. Most questions can be answered in one or two sentences.

In [None]:
import numpy 
import os
import matplotlib.pyplot 
import matplotlib.patches 
import math
import seaborn 
import pandas 
from collections import Counter
#Make sure the helper_functions.py file is in the same folder as this notebook
from helper_functions import get_vgs_proportion


In [None]:
#Get the path to the files we will be using.
path_weather = os.path.join(os.getcwd(), 'datasets', 'weather_simple.csv')
#Load the data into the weather variable. This results in a DataFrame object.
weather = pandas.read_csv(path_weather, delimiter = ';')
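
The `delimiter = ';'` argument matters because pandas splits each line on commas by default. The following is a minimal sketch of the difference, assuming a made-up two-row table in place of the real file (the text and values below are invented for illustration only):

```python
import io
import pandas

# Made-up sample text standing in for a semicolon-separated file
sample_text = "date;max_temperature\n1;12\n2;15\n"

# With the default comma separator, each whole line ends up in a single column
wrong = pandas.read_csv(io.StringIO(sample_text))

# Telling pandas to split on ';' gives one column per field
right = pandas.read_csv(io.StringIO(sample_text), delimiter=';')

print(wrong.shape)
print(right.shape)
```

If a loaded table seems to have only one column, a wrong delimiter is a likely cause.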

In [None]:
#Looking at part of the data we will need 
weather[:5]

We can now use the describe() function to get an idea of the composition of our dataset.

In [None]:
weather.describe(include='all')

From this, we now know that there are 31 days in our dataset (the count of the 'date' column is 31). We also know that it covers the rain, minimum temperature, maximum temperature and the wind of each of the days.
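
To see how individual values can be read from the output of describe(), here is a small sketch on an invented three-day table. The table `toy` and its contents are assumptions made for illustration and are not part of the weather data:

```python
import pandas

# Invented three-day table, only for illustration
toy = pandas.DataFrame({
    'date': [1, 2, 3],
    'weather': ['Sunny', 'Rainy', 'Sunny'],
})

summary = toy.describe(include='all')

# 'count' is the number of non-missing values in each column
print(summary.loc['count', 'date'])

# For text columns, 'top' is the most frequent value
print(summary.loc['top', 'weather'])
```

Rows that do not apply to a column (for example the mean of a text column) are shown as NaN.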

## Activity 1: Using the right chart

Here you are given selections of the dataset that we have loaded. Your job is to find the right plot to use for each of them and to label it with the right legend. Each cell will explain the data you need to plot. Remember to substitute the appropriate names in the code. We will use matplotlib's [pyplot](https://matplotlib.org/3.1.1/api/pyplot_summary.html) module, which allows us to choose from a range of plots.

***Example:***

We will now plot the minimum temperature. To do this, we first extract the min_temp column from our data. We will then check the type of data we are looking at and consider which plot would be best to represent it.


In [None]:
#Extract the minimum temperature data from the weather data
min_temp = weather['min_temperature']
print("The data type is: ", min_temp.dtype)

We now have the minimum temperature data and know that it contains numerical values. We need to decide what type of plot to use, choosing between a bar plot and a line plot. Temperature is a continuous quantity that changes gradually from one day to the next, so our best choice is the line plot.

To have the most control over the plot, we will be using [matplotlib's subplots](https://matplotlib.org/api/_as_gen/matplotlib.pyplot.subplot.html). This will allow us to easily access the labels of different parts of the graph.


In [None]:
#Get the figure for the plot and the axis objects to have easy access to all plot data.
fig, ax = matplotlib.pyplot.subplots()

#Create the plot that we will visualise. A line plot is the default for the function plot()
plot = ax.plot(min_temp, label = "Minimum temperature (ºC)")

#Set the labels of the axis
ax.set_ylabel("ºC")
ax.set_xlabel("Day")

#Tell the program that you want the legend to be shown
matplotlib.pyplot.legend()

#Show the plot we have made
matplotlib.pyplot.show()

Now we will represent the maximum temperature. As before, we will first extract the necessary data and check the type.

In [None]:
max_temp = weather['max_temperature']
print("The data type is: ", max_temp.dtype)

***What kind of plot do you think would be more suitable? Line or bar plot?***


Fill in the legend and the axis labels with the correct text.

In [None]:
#Get the figure for the plot and the axis objects to have easy access to all plot data.
fig, ax = matplotlib.pyplot.subplots()

#Create the plot that we will visualise. A line plot is the default for the function plot()
plot = ax.plot(max_temp, label = "__")

#Set the labels of the axis
ax.set_ylabel("__")
ax.set_xlabel("__")

#Tell the program that you want the legend to be shown
matplotlib.pyplot.legend()

#Show the plot we have made
matplotlib.pyplot.show()

We will now represent the wind speed. As always, we first extract the data we need.

In [None]:
wind = weather['wind(mph)']
print("The data type is: ", wind.dtype)

***What kind of plot do you think would be more suitable? Bar or line plot?***


In [None]:
#Get the figure for the plot and the axis objects to have easy access to all plot data.
fig, ax = matplotlib.pyplot.subplots()

#Create the plot that we will visualise. A line plot is the default for the function plot()
plot = ax.plot(wind, label = "__")

#Set the labels of the axis
ax.set_ylabel("__")
ax.set_xlabel("__")

#Tell the program that you want the legend to be shown
matplotlib.pyplot.legend()

#Show the plot we have made
matplotlib.pyplot.show()

Now we will represent the number of days of each kind (cloudy, sunny or rainy) over the month. First we will count them using a [Counter](https://docs.python.org/3/library/collections.html#collections.Counter). We will then put this information into a DataFrame and check the type of the data.

In [None]:
#Count the number of days of each kind (cloudy, sunny, rainy)
kind_days = Counter(weather['weather']) #Counts how many days of each kind there are

#Put the information in a table
num_kind_days = pandas.DataFrame.from_dict(kind_days, orient='index')
print("The type is: ", numpy.dtype(num_kind_days[0]))
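
To make the counting step easier to follow, here is a minimal sketch of the same two calls on an invented list of five days. The names `toy_days`, `toy_counts` and `toy_table` are hypothetical and only used for this illustration:

```python
from collections import Counter
import pandas

# Invented list of day kinds, only for illustration
toy_days = ['Sunny', 'Rainy', 'Sunny', 'Cloudy', 'Sunny']

# Counter behaves like a dictionary that maps each value to how often it appears
toy_counts = Counter(toy_days)

# orient='index' makes each key a row label; the counts go in a column named 0
toy_table = pandas.DataFrame.from_dict(toy_counts, orient='index')
print(toy_table)
```

This is why the cell above accesses the counts with `num_kind_days[0]`: the single column of counts is named 0.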

***What kind of plot do you think would be more suitable? Bar or line plot?***

These values are counts for separate categories rather than a continuous series, so a bar plot is the best choice.

In [None]:
#Get the figure for the plot and the axis objects to have easy access to all plot data.
fig, ax = matplotlib.pyplot.subplots()

#Create the plot that we will visualise. We use bar() to get a bar plot, with one bar per kind of day
plot = ax.bar(x=num_kind_days.index, height=num_kind_days[0], label="__")

#Set the labels of the axis
ax.set_ylabel("__")
ax.set_xlabel("__")

#Tell the program that you want the legend to be shown
matplotlib.pyplot.legend()

#Show the plot we have made
matplotlib.pyplot.show()

Finally, we will represent the rain. Again, we first extract the necessary data.

In [None]:
rain = weather['rain(mm)']

***What kind of plot do you think would be more suitable? Bar or line plot? Keep in mind that we are measuring the amount of rain that has fallen per unit of area.***

This measures the amount of rain that has fallen per unit of area on each day. Each value is a separate daily total, so the best option is to use a bar plot.

In [None]:
#Get the figure for the plot and the axis objects to have easy access to all plot data.
fig, ax = matplotlib.pyplot.subplots()

#Create the plot that we will visualise. We use bar() to get a bar plot, with one bar per day
plot = ax.bar(x=range(len(rain)), height=rain, label="__")

#Set the labels of the axis
ax.set_ylabel("__")
ax.set_xlabel("__")

#Tell the program that you want the legend to be shown
matplotlib.pyplot.legend()

#Show the plot we have made
matplotlib.pyplot.show()

## Activity 2: Correcting plots

Correct the following plots. Possible mistakes include wrongly labelled axes, a poor choice of colours ([here](https://matplotlib.org/3.1.0/tutorials/colors/colormaps.html) you have a link to different colourmaps you can use; take a look at colourmaps like 'viridis'), a plot type that does not suit the data, and other mistakes.
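
As a hint on how colourmaps work, the sketch below looks one up by name and asks it for two colours. It assumes only that matplotlib is installed; 'viridis' is used as an example and the variable names are hypothetical:

```python
import matplotlib.pyplot

# Look up a colourmap by name; 'viridis' is just one option
cmap = matplotlib.pyplot.get_cmap('viridis')

# A colourmap maps numbers between 0 and 1 to (red, green, blue, alpha) colours
low_colour = cmap(0.0)
high_colour = cmap(1.0)

print(low_colour)
print(high_colour)
```

The same colourmap name can be passed to plotting functions that accept a `colormap` or `cmap` argument.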

In [None]:
#Get the figure for the plot and the axis objects to have easy access to all plot data.
fig, ax = matplotlib.pyplot.subplots()

#Create the plot that we will visualise.
plot1, = ax.plot(min_temp, label = "Maximum temperature", color = 'blue')
plot2 = ax.plot(max_temp, label = "Maximum temperature", color = 'blue')

#Set the labels of the axis and the title
ax.set_ylabel("ºC")
ax.set_xlabel("Day")
matplotlib.pyplot.title(label="Maximum and minimum temperature over a month")

#Start all the y axis on 0
ax.set_ylim(ymin=0)

#Tell the program that you want the legend to be shown
matplotlib.pyplot.legend()

#Show the plot we have made
matplotlib.pyplot.show()

***Explain the error(s) you found:***


In [None]:
#Just changing the size of the plot so it is easier to read
matplotlib.pyplot.rcParams["figure.figsize"] = (10,5)

#We need to change the weather values to numbers for them to have different colours
weather_scatter =weather.replace(['Sunny', 'Cloudy', 'Rainy'],[0,1,2])
weather_scatter.plot.scatter(x='date', y = 'max_temperature', c = 'weather', colormap = 'RdYlGn')

#This is for representing the legend
sunny_patch =  matplotlib.patches.Patch(color='red', label='Sunny')
cloudy_patch=  matplotlib.patches.Patch(color='yellow', label='Cloudy')
rainy_patch =  matplotlib.patches.Patch(color='green', label='Rainy')
matplotlib.pyplot.legend(handles=[sunny_patch, cloudy_patch,rainy_patch])

matplotlib.pyplot.title('Kind of day and maximum temperature of each day')
matplotlib.pyplot.xlabel('Day')
matplotlib.pyplot.ylabel('Maximum temperature')
matplotlib.pyplot.show()

***Explain the error(s) you found:***

In [None]:
#Get the figure for the plot and the axis objects to have easy access to all plot data.
fig, ax = matplotlib.pyplot.subplots()

#Create the plot that we will visualise. A line plot is the default for the function plot()
plot = matplotlib.pyplot.plot(rain, label="Rain")

#Set the labels of the axis
ax.set_ylabel("Amount of rain (mm)")
ax.set_xlabel("Day")

#Tell the program that you want the legend to be shown
matplotlib.pyplot.legend()

#Show the plot we have made
matplotlib.pyplot.show()

***Explain the error(s) you found:***

