# Exercise notebook

In [None]:
import warnings
warnings.simplefilter('ignore', FutureWarning)

import pandas as pd
from datetime import datetime

In [None]:
london = pd.read_csv('London_2014.csv', skipinitialspace=True)
london.head()

`Note that the right hand side of the table has been cropped to fit on the page.
You’ll find out how to remove rogue spaces.`
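
The `skipinitialspace=True` argument in the cell above is one way to deal with spaces that follow the commas in a CSV file. The sketch below illustrates its effect on column names, assuming a made-up two-row extract rather than the real `London_2014.csv`:

```python
from io import StringIO
import pandas as pd

# Made-up extract with a space after each comma (not the real file)
csv_text = "GMT, Max TemperatureC, Events\n2014-01-01, 11, Rain\n2014-01-02, 11, Rain\n"

raw = pd.read_csv(StringIO(csv_text))
clean = pd.read_csv(StringIO(csv_text), skipinitialspace=True)

print(list(raw.columns))    # ['GMT', ' Max TemperatureC', ' Events']
print(list(clean.columns))  # ['GMT', 'Max TemperatureC', 'Events']
```

Note that `skipinitialspace` only skips spaces that come straight after a delimiter. Spaces before a comma are left in place.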

## Every picture tells a story

It can be difficult and confusing to look at a table of rows of numbers and make any
meaningful interpretation, especially if there are many rows and columns.
Handily, pandas has a method called **plot()** which will visualise data for us by producing
a chart.

The following line of code tells Jupyter to display inside this notebook any graph that is created.

In [None]:
%matplotlib inline

The `plot()` method can make a graph of the values in a column. Gridlines are turned on by the `grid` argument.

To plot `'Max Wind SpeedKm/h'`, it’s as simple as this code:

In [None]:
london['Max Wind SpeedKm/h'].plot(grid=True)

The `grid=True` argument makes the gridlines (the dotted lines in the image above)
appear, which make values easier to read on the chart. The chart comes out a bit small,
so the graph can be made bigger by giving the method a `figsize=(x,y)` argument, where `x` and `y` are numbers that set the width and height of the figure in inches.

In [None]:
london['Max Wind SpeedKm/h'].plot(grid=True, figsize=(10,5))

That’s better! The argument given to the `plot()` method, `figsize=(10,5)`, simply tells
`plot()` that the figure should be 10 inches wide and 5 inches high. In
the above graph the x-axis (the numbers at the bottom) shows the dataframe’s index, so 0
is 1 January and 50 is 20 February.
The y-axis (the numbers on the side) shows the range of wind speed in kilometres per
hour. It is clear that the windiest day in 2014 was somewhere in mid-February and the
wind reached about 66 kilometres per hour.
By default, the `plot()` method will try to generate a line, although as you’ll see in
later modules, it can produce other chart types too.
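
Reading dates off an integer x-axis means counting days from 1 January. Here is a minimal sketch of that conversion, assuming one row per day with no gaps. The `index_to_date` function is a hypothetical helper written for this illustration, not part of pandas:

```python
from datetime import date, timedelta

def index_to_date(row_number, year=2014):
    """Return the calendar date for a 0-based daily row number (hypothetical helper)."""
    return date(year, 1, 1) + timedelta(days=row_number)

first_day = index_to_date(0)
last_day = index_to_date(364)
print(first_day, last_day)  # 2014-01-01 2014-12-31
```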

Multiple lines can be plotted by selecting multiple columns.

In [None]:
london[['Max Wind SpeedKm/h', 'Mean Wind SpeedKm/h']].plot(grid=True, figsize=(10,5))
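
The same call can be tried on a small made-up dataframe to see what it produces. The sketch below assumes invented wind speeds, not values from `London_2014.csv`, and uses a non-interactive matplotlib backend so that it also runs outside Jupyter:

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend so the sketch runs outside Jupyter
import pandas as pd

# Made-up wind speeds for five days (not taken from London_2014.csv)
wind = pd.DataFrame({
    'Max Wind SpeedKm/h': [26, 34, 40, 29, 31],
    'Mean Wind SpeedKm/h': [14, 19, 23, 16, 18],
})

# Plotting a dataframe draws one line per column on the same axes
axes = wind.plot(grid=True, figsize=(10, 5))

width, height = axes.figure.get_size_inches()
line_count = len(axes.get_lines())
print(width, height, line_count)
```

The values printed are the figure's width and height in inches, followed by the number of lines drawn.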

### Task

In the cell below, write code to plot the minimum, mean, and maximum temperature during 2014 in London.

## Changing a dataframe's index
We have seen that by default every dataframe has an integer index for its rows which
starts from 0.
The dataframe we’ve been using, `london`, has an index that goes from 0 to 364. The
row indexed by 0 holds data for the first day of the year and the row indexed by 364 holds
data for the last day of the year. However, the column `'GMT'` holds dates, which
would make a more intuitive index once they are converted to `datetime64` values.
Changing the index to `datetime64` values is done by converting the `'GMT'` column with
`pd.to_datetime()` and then assigning the contents of that column to the dataframe's `index` attribute, like this:

In [None]:
london['GMT'] = pd.to_datetime(london['GMT'])

In [None]:
london.index = london['GMT']
london.head(2)

`Notice that the 'GMT' column still remains and that the index has been labelled to show
that it has been derived from the 'GMT' column.`
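
The two steps can be checked on a small example. This is a sketch using a made-up three-row table, with the dates starting out as plain strings:

```python
import pandas as pd

# Made-up three-row table; the dates start out as plain strings
df = pd.DataFrame({
    'GMT': ['2014-01-01', '2014-01-02', '2014-01-03'],
    'Max TemperatureC': [11, 11, 11],
})

df['GMT'] = pd.to_datetime(df['GMT'])  # strings -> datetime64 values
df.index = df['GMT']                   # use those values as the row index

index_name = df.index.name
column_kept = 'GMT' in df.columns
index_is_datetime = pd.api.types.is_datetime64_any_dtype(df.index)
print(index_name, column_kept, index_is_datetime)  # GMT True True
```

An alternative is `df.set_index('GMT')`, which by default moves the column into the index instead of keeping a copy of it among the columns.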

The `iloc` attribute can still be used to get and display rows by number, but you can now also use the `datetime64` index to get a row by date, using the dataframe's `loc` attribute, like this:

In [None]:
london.loc[datetime(2014, 1, 1)]

A query such as *'Return all the rows where the date is between December 8th and December 12th'* can now be done succinctly like this:

In [None]:
london.loc[datetime(2014,12,8) : datetime(2014,12,12)]

#The code above gets the rows between, and including,
#the indices datetime(2014,12,8) and datetime(2014,12,12)
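
To see that both end dates are included, the slice can be tried on a made-up dataframe with one row per day of 2014. In this sketch the `day_of_year` column is invented for the illustration:

```python
from datetime import datetime
import pandas as pd

# Made-up data: one row per day of 2014, indexed by date
days = pd.date_range('2014-01-01', '2014-12-31', freq='D')
df = pd.DataFrame({'day_of_year': range(1, 366)}, index=days)

week = df.loc[datetime(2014, 12, 8):datetime(2014, 12, 12)]
print(len(week))  # 5
print(week.index[0].date(), week.index[-1].date())  # 2014-12-08 2014-12-12

# With a datetime index, date strings select the same rows
same_week = df.loc['2014-12-08':'2014-12-12']
```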

Because the table is in date order, we can be confident that only the rows with dates
between 8 December 2014 and 12 December 2014 (inclusive) will be returned. However, if
the table had not been in date order, we would have needed to sort it first, like this:

In [None]:
london = london.sort_index()
london
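
Whether an index is already in order can be checked before sorting. The sketch below assumes three made-up rows that are deliberately listed out of date order:

```python
import pandas as pd

# Made-up rows deliberately listed out of date order
dates = pd.to_datetime(['2014-12-12', '2014-12-08', '2014-12-10'])
df = pd.DataFrame({'Max TemperatureC': [9, 11, 7]}, index=dates)

sorted_before = df.index.is_monotonic_increasing
df = df.sort_index()
sorted_after = df.index.is_monotonic_increasing

print(sorted_before, sorted_after)      # False True
print(df['Max TemperatureC'].tolist())  # [11, 7, 9]
```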

Now that we have a `datetime64` index, let's plot `'Max Wind SpeedKm/h'` again:

In [None]:
london['Max Wind SpeedKm/h'].plot(grid=True, figsize=(10,5))

Now it is much clearer that the worst winds were in mid-February.

### Task
Use the code cell below to plot the values of `'Mean Humidity'` during spring (full months of March, April and May).

Your project this week is to find out what would have been the best two weeks of weather
for a 2014 vacation in a capital of a **BRICS** country.

I’ve written up my analysis of the best two weeks of weather in London, UK, which you can
open in the project notebook.
The structure is very simple: besides the introduction and the conclusions, there is one
section for each step of the analysis – obtaining, cleaning and visualising the data.
Once you’ve worked through my analysis you should open a dataset for just one of the
BRICS capitals: Brasilia, Moscow, Delhi, Beijing or Cape Town. The dataset has been downloaded and can be found in the folder. The choice of capital is up
to you. You should then work out the best two weeks, according to the weather, to choose
for a two-week holiday in your chosen capital city.

Once again, do not open the file with Excel, but you could take a look using a text
editor.
In my project, I was looking for a two-week period that had relatively high temperatures
and little rain. If you choose a capital in
a particularly hot and dry country you will probably be looking for relatively cool weather
and low humidity.

Note that the London file has the dates in a column named `'GMT'` whereas in the BRICS
files they are in a column named `'Date'`. You will need to change the Python code
accordingly. You should also change the name of the variable, `london`, according to the
capital you choose.
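
As a sketch of those two changes, the code below reads a made-up stand-in for a BRICS file. The rows and the variable name `delhi` are assumptions for illustration, and the real file would be read by passing its filename to `read_csv()`:

```python
from io import StringIO
import pandas as pd

# Stand-in for a BRICS file: made-up rows, with the date column named 'Date'
csv_text = "Date, Max TemperatureC\n2014-01-01, 30\n2014-01-02, 31\n"

# The variable is named after the chosen capital instead of 'london'
delhi = pd.read_csv(StringIO(csv_text), skipinitialspace=True)
delhi['Date'] = pd.to_datetime(delhi['Date'])  # 'Date' replaces 'GMT'
delhi.index = delhi['Date']

print(delhi.index.name)  # Date
```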

## GOOD LUCK!