<a href="https://colab.research.google.com/github/Doongka/GHDColabExamples/blob/master/Visualising_Data.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Data Visualization Example
The goal of this notebook is to provide a quick overview of some useful Python tools used for cleaning, analysing and visualizing data.

It was created using Google Colab, a free cloud-based notebook environment that allows you to write and execute Python code without needing to set up your own local Python environment.

This notebook lets you run the code in segments as you go. When you reach a code block, you can execute (run) it by pressing CTRL-ENTER.

Try it on the code block below:

In [9]:
# assign "Hello World!" to a new variable "greeting"
greeting = "Hello World!"

# run the "print" function with "greeting" as an argument. i.e. Print to screen whatever is stored in "greeting"
print(greeting)

Hello World!


If successful you will see the words "Hello World!" output below the code.

Something important to note is that the # symbol starts a comment: Python ignores everything from the # to the end of that line. This is useful for describing your code, as I've done above, or for stopping lines of code from running while debugging. A quick way to "comment" and "uncomment" your code is to press CTRL-/.
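A small illustration of both uses of comments (the variable name `rainfall_mm` is just made up for this example): a comment can sit on its own line or after code on the same line, and a "commented out" line of code simply never runs.

```python
# This whole line is a comment, so Python skips it.
rainfall_mm = 0.6  # text after a '#' on a code line is also ignored

# Commenting out a line "switches it off", e.g. while debugging:
# rainfall_mm = rainfall_mm * 2

print(rainfall_mm)  # prints 0.6, because the doubling line never ran
```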



## Prepare the workspace
Google Colab comes with many useful Python packages preinstalled, so there is usually no need to install them yourself. Packages are essentially collections of useful functions. In case you do need something that is not installed, you can run a "pip" installation as shown below.

You can see that I've "commented" the bit of code that installs the package "geopandas".  Uncomment this line and run the code block.

In [0]:
# The ! prefix tells Colab to run this line as a shell command instead of as Python code.
# !pip install geopandas
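If you are not sure whether a package is already available, you can check from Python before reaching for pip. This is a minimal sketch; `is_installed` is a hypothetical helper name, and note that the import name can differ from the pip name (for example, `pip install scikit-learn` provides `sklearn`).

```python
import importlib.util


def is_installed(package_name: str) -> bool:
    """Return True if `package_name` can be imported in this environment (hypothetical helper)."""
    # find_spec returns None when no top-level module of that name can be found
    return importlib.util.find_spec(package_name) is not None


# Check before running "!pip install geopandas"
print(is_installed("geopandas"))
```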

## Uploading your data to Colab
The first step in any data analysis is to prepare the data. As we are running this in Google Colab, we need to upload any datasets. There are several ways to do this, but for this example I've used the most intuitive one.

After running the cell below, you will see a "Choose Files" button that will allow you to select the files you wish to upload.


In [0]:
from google.colab import files

uploaded = files.upload()

Running the next cell will give you an overview of the files you have just uploaded.

In [13]:
for fn in uploaded.keys():
  print('User uploaded file "{name}" with length {length} bytes'.format(
      name=fn, length=len(uploaded[fn])))

User uploaded file "weatherAUS.csv" with length 14167326 bytes
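`files.upload()` returns a dictionary that maps each file name to its raw bytes (Colab also saves the files to the working directory, which is why `pd.read_csv('weatherAUS.csv')` works later). As a sketch of an alternative, you can parse the bytes directly with pandas. Because `files.upload()` only works inside Colab, the `uploaded` dictionary below is a stand-in containing two rows and four columns taken from the dataset preview in this notebook.

```python
import io

import pandas as pd

# Stand-in for the dict returned by files.upload(): file name -> raw bytes.
# In Colab, use the real `uploaded` dict instead.
uploaded = {
    "weatherAUS.csv": b"Date,Location,MinTemp,MaxTemp\n"
                      b"2008-12-01,Albury,13.4,22.9\n"
                      b"2008-12-02,Albury,7.4,25.1\n"
}

for fn, content in uploaded.items():
    # Wrap the bytes in a file-like object so pandas can parse them without a file on disk
    df = pd.read_csv(io.BytesIO(content))
    print(fn, df.shape)
```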


## Wrangling your data

Two commonly used libraries for the extract, transform and load (ETL) operations needed to get the data ready are NumPy and pandas.

NumPy is a library for creating and performing operations on multidimensional arrays, including matrices.
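A short sketch of what working with NumPy arrays looks like. The numbers are the MinTemp, MaxTemp and Rainfall values of the first two Albury rows from the weatherAUS.csv preview in this notebook.

```python
import numpy as np

# A 2 x 3 array: two days of (MinTemp, MaxTemp, Rainfall)
readings = np.array([[13.4, 22.9, 0.6],
                     [7.4, 25.1, 0.0]])

print(readings.shape)          # (2, 3)
print(readings.mean(axis=0))   # mean of each column

# Element-wise arithmetic on whole columns: daily temperature range
temp_range = readings[:, 1] - readings[:, 0]
print(temp_range)

# Matrix multiplication with the transpose gives a 2 x 2 result
gram = readings @ readings.T
print(gram.shape)              # (2, 2)
```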

pandas is a library that provides many useful functions for data manipulation. We will be using its "DataFrame" functionality, which allows us to import data from a CSV file and perform several data wrangling and cleansing operations.
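A minimal sketch of a few common DataFrame operations: type conversion, filtering and dropping missing values. The small table is built by hand from the first three rows of weatherAUS.csv (a subset of its columns), so it runs without the uploaded file.

```python
import numpy as np
import pandas as pd

# Hand-built DataFrame using values from the first rows of weatherAUS.csv
df = pd.DataFrame({
    "Date": ["2008-12-01", "2008-12-02", "2008-12-03"],
    "Location": ["Albury"] * 3,
    "MaxTemp": [22.9, 25.1, 25.7],
    "Cloud9am": [8.0, np.nan, np.nan],
})

df["Date"] = pd.to_datetime(df["Date"])     # convert text to datetime64
hot_days = df[df["MaxTemp"] > 25]           # boolean filtering keeps 2 rows
complete = df.dropna(subset=["Cloud9am"])   # drop rows where Cloud9am is missing
print(hot_days)
print(complete)
```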



In [0]:
# Load the libraries into the current notebook under their conventional short names
import numpy as np
import pandas as pd

## Loading your data

In [20]:
data = pd.read_csv('weatherAUS.csv')
# Display the first five rows
data.head()

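|  | Date | Location | MinTemp | MaxTemp | Rainfall | Evaporation | Sunshine | WindGustDir | WindGustSpeed | WindDir9am | WindDir3pm | WindSpeed9am | WindSpeed3pm | Humidity9am | Humidity3pm | Pressure9am | Pressure3pm | Cloud9am | Cloud3pm | Temp9am | Temp3pm | RainToday | RISK_MM | RainTomorrow |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2008-12-01 | Albury | 13.4 | 22.9 | 0.6 | NaN | NaN | W | 44.0 | W | WNW | 20.0 | 24.0 | 71.0 | 22.0 | 1007.7 | 1007.1 | 8.0 | NaN | 16.9 | 21.8 | No | 0.0 | No |
| 1 | 2008-12-02 | Albury | 7.4 | 25.1 | 0.0 | NaN | NaN | WNW | 44.0 | NNW | WSW | 4.0 | 22.0 | 44.0 | 25.0 | 1010.6 | 1007.8 | NaN | NaN | 17.2 | 24.3 | No | 0.0 | No |
| 2 | 2008-12-03 | Albury | 12.9 | 25.7 | 0.0 | NaN | NaN | WSW | 46.0 | W | WSW | 19.0 | 26.0 | 38.0 | 30.0 | 1007.6 | 1008.7 | NaN | 2.0 | 21.0 | 23.2 | No | 0.0 | No |
| 3 | 2008-12-04 | Albury | 9.2 | 28.0 | 0.0 | NaN | NaN | NE | 24.0 | SE | E | 11.0 | 9.0 | 45.0 | 16.0 | 1017.6 | 1012.8 | NaN | NaN | 18.1 | 26.5 | No | 1.0 | No |
| 4 | 2008-12-05 | Albury | 17.5 | 32.3 | 1.0 | NaN | NaN | W | 41.0 | ENE | NW | 7.0 | 20.0 | 82.0 | 33.0 | 1010.8 | 1006.0 | 7.0 | 8.0 | 17.8 | 29.7 | No | 0.2 | No |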
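The preview already shows gaps (Evaporation, Sunshine and the cloud columns), so a typical next step is to count and handle missing values. This is a sketch of one approach: the CSV text below is the five preview rows with a subset of the columns, standing in for the full file, and filling gaps with the column median is just one illustrative choice.

```python
import io

import pandas as pd

# The five preview rows (subset of columns) standing in for weatherAUS.csv
csv_text = """Date,Location,MinTemp,MaxTemp,Evaporation,Cloud9am,Cloud3pm,RainTomorrow
2008-12-01,Albury,13.4,22.9,,8.0,,No
2008-12-02,Albury,7.4,25.1,,,,No
2008-12-03,Albury,12.9,25.7,,,2.0,No
2008-12-04,Albury,9.2,28.0,,,,No
2008-12-05,Albury,17.5,32.3,,7.0,8.0,No
"""

# parse_dates turns the Date column into real datetimes while reading
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["Date"])

missing = df.isna().sum()   # number of missing values per column
print(missing)

# Drop columns that are completely empty, then fill the remaining gaps
# in numeric columns with that column's median
cleaned = df.dropna(axis=1, how="all")
cleaned = cleaned.fillna(cleaned.median(numeric_only=True))
print(cleaned)
```

With the real file you would call the same methods on `data`; whether to drop, fill or keep missing values depends on the analysis.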


## Visualizing your Data


Altair is a great data visualization library that is preinstalled in Colab. It is a declarative statistical visualization library for Python, based on Vega-Lite, with a wide range of chart and data visualization types available.
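Altair works most naturally with "long-form" data, where each row is one observation and each column is one variable. A sketch of reshaping the weather data that way with pandas `melt`, using MinTemp and MaxTemp values from the first two preview rows:

```python
import pandas as pd

# Wide form: one row per day, one column per measurement
wide = pd.DataFrame({
    "Date": pd.to_datetime(["2008-12-01", "2008-12-02"]),
    "MinTemp": [13.4, 7.4],
    "MaxTemp": [22.9, 25.1],
})

# Long form: one row per (Date, measurement) pair
long = wide.melt(id_vars="Date", value_vars=["MinTemp", "MaxTemp"],
                 var_name="Measure", value_name="Temperature")
print(long)
```

The long table can then be encoded directly, for example `alt.Chart(long).mark_line().encode(x="Date:T", y="Temperature:Q", color="Measure:N")`, which draws one line per measurement without any extra plotting code.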


In [29]:
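import altair as alt

# Import the vega_datasets loader under a different name so that it does not
# overwrite the weatherAUS DataFrame stored in `data`
from vega_datasets import data as vega_data

source = vega_data.seattle_weather()

# Filter the Australian data down to Brisbane (the chart below still uses the Seattle data)
subset = data[data.Location == "Brisbane"]

# Map each weather type to a fixed colour
scale = alt.Scale(domain=['sun', 'fog', 'drizzle', 'rain', 'snow'],
                  range=['#e7ba52', '#a7a7a7', '#aec7e8', '#1f77b4', '#9467bd'])
color = alt.Color('weather:N', scale=scale)

# We create two selections:
# - a brush that is active on the top panel
# - a multi-click that is active on the bottom panel
# (Altair 5 deprecates selection_multi/add_selection in favour of selection_point/add_params)
brush = alt.selection_interval(encodings=['x'])
click = alt.selection_multi(encodings=['color'])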

# Top panel is scatter plot of temperature vs time
points = alt.Chart().mark_point().encode(
    alt.X('monthdate(date):T', title='Date'),
    alt.Y('temp_max:Q',
        title='Maximum Daily Temperature (C)',
        scale=alt.Scale(domain=[-5, 40])
    ),
    color=alt.condition(brush, color, alt.value('lightgray')),
    size=alt.Size('precipitation:Q', scale=alt.Scale(range=[5, 200]))
).properties(
    width=550,
    height=300
).add_selection(
    brush
).transform_filter(
    click
)


# Bottom panel is a bar chart of weather type
bars = alt.Chart().mark_bar().encode(
    x='count()',
    y='weather:N',
    color=alt.condition(click, color, alt.value('lightgray')),
).transform_filter(
    brush
).properties(
    width=550,
).add_selection(
    click
)

alt.vconcat(
    points,
    bars,
    data=source,
    title="Seattle Weather: 2012-2015"
)
