# STAT1100 Data Communication and Modelling: Week 5

First, let's look at the basics of making plots in Python.

## Plotting in Python

In this section we will go over some more of the basics of plotting with matplotlib. We will look at making a line plot,
scatter plot, and box-and-whisker plot, but there are also many more options
[available](https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.html).

The standard process with matplotlib, no matter which plot you are making, is to first add all of your information to the
plot, then either show or save it. Showing the plot or running `plt.clf()` clears it of any previous information, so if you run
two shows in a row, the second will be blank. This means you have to set all information, such as the graph, axis labels, title,
etc., on the plot before you show it.
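
As a small sketch of this order of operations (the data and label text are made up for illustration), the snippet below builds a complete plot, then clears it with `plt.clf()` and counts the axes on the current figure before and after:

```python
import matplotlib.pyplot as plt

# Build the whole plot first: data, axis labels and title
plt.plot([0, 1, 2], [0, 1, 4])
plt.xlabel('x')
plt.ylabel('y')
plt.title('Everything is set before showing or saving')
axes_before = len(plt.gcf().axes)  # one set of axes holds the line

# Clearing the figure removes everything that was added
plt.clf()
axes_after = len(plt.gcf().axes)

print(axes_before, axes_after)  # 1 0
```

After the clear there is nothing left to draw, which is why a second show with no new plotting calls comes out blank.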

First we will make a line plot using the `plot` function. We will use numpy to make our `X` and `Y` values. In the
following we use the `numpy.arange(start, end, step)` function, which makes a range of numbers from `start` up to, but
not including, `end` at a `step` interval. We pass `np.pi + 0.01` as the end so that the values run from $-\pi$ to
approximately $\pi$ in steps of 0.01. We will also use the `numpy.sin` function to make the `Y` values.
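
As a quick check of how `np.arange` treats its end point, here is a minimal sketch with small made-up numbers. It also shows `np.linspace`, an alternative numpy function that takes a number of points rather than a step and includes both end points:

```python
import numpy as np

# arange(start, end, step) stops before reaching `end`:
# the values are 0, 0.25, 0.5 and 0.75, and 1 is excluded
steps = np.arange(0, 1, 0.25)
print(steps)

# linspace(start, end, count) includes both end points instead
points = np.linspace(-np.pi, np.pi, 5)
print(points[0] == -np.pi, points[-1] == np.pi)
```

Either function works for building `X` values; `arange` is convenient when you care about the step size, `linspace` when you care about hitting the end point exactly.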

In [None]:
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

X = np.arange(-np.pi, np.pi + 0.01, 0.01)
Y = np.sin(X)

Now plotting is a pretty simple process: we just need to call the `plot` function with the `X` and `Y` values, then
show or save it.

In [None]:
plt.plot(X, Y)
plt.show()  # This displays the plot (inline in a notebook, or in a window when run as a script)

plt.plot(X, Y)
plt.savefig('sin.png')  # This saves the plot to a file called 'sin.png'

If we want to add labels and a title to our plot, we can do so using the `title`, `xlabel`, and `ylabel` functions.
Each takes the text to display as a string argument. It is worth noting that these accept LaTeX-style math formatting
between dollar signs, so you can use math symbols.
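
When a label contains LaTeX commands, it is safest to write it as a raw string (prefix `r`). A quick sketch of why, using plain Python strings and no plotting:

```python
# Without the r prefix, Python reads \f as a single form-feed character
plain = '\frac'
# With the r prefix, the backslash is kept exactly as written
raw = r'\frac'

print(len(plain), len(raw))  # 4 5
print(raw)                   # \frac
```

The plain string has silently lost its backslash, so matplotlib would never see the `\frac` command.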

In [None]:
# We will plot a hyperbola this time
X = np.arange(0.0001, 0.01, 0.0001)
Y = 1 / X

plt.plot(X, Y)
# The title uses LaTeX formatting; the r prefix makes this a raw string,
# so Python does not treat backslashes such as \f as escape sequences
plt.title(r'Hyperbola $y = \frac{1}{x}$')
plt.xlabel('x')
plt.ylabel('y')
plt.show()

The scatter plot and box-and-whisker plot are made in mostly the same way; the only difference is that instead of
using `plot` we use `scatter` and `boxplot` respectively.
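
For reference, a box-and-whisker plot summarises the data by its quartiles. The sketch below computes the same summary by hand with `np.percentile` on a small made-up list, assuming matplotlib's default whisker setting of 1.5 times the interquartile range:

```python
import numpy as np

values = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])

# The box runs from the first quartile to the third, with a line at the median
q1, median, q3 = np.percentile(values, [25, 50, 75])
iqr = q3 - q1

# By default the whiskers reach the furthest data points within 1.5 * IQR of the box
lower_limit = q1 - 1.5 * iqr
upper_limit = q3 + 1.5 * iqr

print(q1, median, q3)            # 3.0 5.0 7.0
print(lower_limit, upper_limit)  # -3.0 13.0
```

Any values outside these limits would be drawn as individual points beyond the whiskers.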

In [None]:
# This time we will plot random numbers
rng = np.random.default_rng()
# The scatter plot will need X, Y co-ordinates
X, Y = rng.normal(0, 1, size=(2, 1000))
plt.scatter(X, Y)
plt.show()

# The box-and-whisker plot will just need a list of values
values = rng.normal(0, 1, size=1000)
plt.boxplot(values)
plt.show()

### Exercises

1. Make a line plot of $e^x$.
2. Make a single plot with lines for both $\sin(x)$ and $\cos(x)$.
3. Make a scatter plot of uniform random numbers.
4. Make a box-and-whisker plot of Poisson random numbers.

## Geospatial plotting

We will now do some geoplotting using the geopandas library. This library is built on top of the pandas library,
and adds a geometry column which stores shape data showing the locations where the row information applies. We will also
use the libpysal library to load the files we will analyse. The files are shapefiles, a file
format commonly used within GIS software. A shapefile is a binary file that contains a table of vectors representing the
locations of the points, lines, and shapes composing the map.

First we will install the required libraries:

In [None]:
%pip install -U geopandas libpysal

Now we will create two kinds of map: first one that shows the locations of Australian capital cities, then ones that
highlight crime rates in two American cities.
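
Before building the maps, here is a small sketch of how a geometry column is created from plain coordinates. The two locations are made up purely for illustration. Note that `points_from_xy` expects the x values (longitude) first and the y values (latitude) second:

```python
import pandas as pd
import geopandas

# Two made-up locations, purely for illustration
df = pd.DataFrame(
    {'Name': ['Point A', 'Point B'], 'Latitude': [-10.0, -20.0], 'Longitude': [120.0, 130.0]}
)

# points_from_xy takes x (longitude) first, then y (latitude)
gdf = geopandas.GeoDataFrame(df, geometry=geopandas.points_from_xy(df.Longitude, df.Latitude))

# Each row now carries a point whose x is the longitude and y is the latitude
print(gdf.geometry.x.tolist())
print(gdf.geometry.y.tolist())
```

Swapping the two arguments is an easy mistake to make, and it places the points in the wrong part of the map.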

### Australian Capital Cities

Starting with the capitals, the following code plots the location of Sydney on the map. Your task is to add code to plot
the other seven capital cities:

In [None]:
import pandas as pd
import geopandas
import matplotlib.pyplot as plt

# First we create the data stating where and what we are interested in
df = pd.DataFrame(
    {'City': ['Sydney'], 'Country': ['Australia'], 'Latitude': [-33.8688], 'Longitude': [151.2093]}
)

# Next we convert the data into the geopandas version of the dataframe
gdf = geopandas.GeoDataFrame(df, geometry=geopandas.points_from_xy(df.Longitude, df.Latitude))

# Then we get map data of the world from geopandas
url = "https://naciscdn.org/naturalearth/110m/cultural/ne_110m_admin_0_countries.zip"
world = geopandas.read_file(url)

# We first plot just the map shape of Australia
ax = world[world.SOVEREIGNT == 'Australia'].plot(color='white', edgecolor='black')

# We next place the points from our data on top of the plotted map of Australia
gdf.plot(ax=ax, color='red')

# Finally, we show the plot/map
plt.show()

### Crime Rates in Two American Cities

For the crime rates, we provide an example using the `columbus` (Columbus, Ohio) data from libpysal. Your task is to
plot the car thefts using the San Francisco crime data (get the path to `SFCrime_blocks.shp`).

In [None]:
import geopandas as gpd
import matplotlib.pyplot as plt
import libpysal as ps

# State where the file is
pth = ps.examples.get_path("columbus.shp")
# Read the file
columbus = gpd.read_file(pth)
# Plot the crime rate
columbus.plot(column='CRIME', cmap='OrRd', edgecolor='black', legend=True)
plt.show()

In the above example, we use the `column` argument to colour each row's geometry according to
the value of the `CRIME` column. The `cmap` argument states the [colour map](https://matplotlib.org/stable/gallery/color/colormap_reference.html)
to use for the colouring; `OrRd` is a sequential map running from pale orange to dark red. The `edgecolor` argument sets the colour
of the outline of the geometry. Finally, the `legend` argument is used to show the legend.
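
To illustrate what a colour map does, the sketch below looks up `OrRd` colours for a few made-up values using matplotlib directly. This only illustrates the idea and is not geopandas's own code; the values and the linear scaling with `Normalize` are assumptions chosen for the example:

```python
import matplotlib.pyplot as plt
from matplotlib.colors import Normalize

cmap = plt.get_cmap('OrRd')

# Made-up values standing in for a data column
values = [0.0, 25.0, 50.0]

# Scale the values linearly so the smallest maps to 0 and the largest to 1
norm = Normalize(vmin=min(values), vmax=max(values))

# Each colour is a (red, green, blue, alpha) tuple with entries between 0 and 1
colours = [cmap(norm(v)) for v in values]
for v, c in zip(values, colours):
    print(v, c)
```

Low values come out pale and high values come out dark red, which is what makes the high-crime areas stand out on the map.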

Task note: Before you can get the path to `SFCrime_blocks.shp`, you must load the example with:

In [None]:
ps.examples.load_example('SanFran Crime')