# Explore White Shark Data
This notebook will guide your exploration of a white shark dataset. 

## Load modules and set filename

In [None]:
# Magic command: render matplotlib figures inline in the notebook
%matplotlib inline

import white_shark as ws
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt

import geopandas as gpd
from shapely.geometry import Point, Polygon

# Set the filename
filename = 'subset-calc-pos.csv'

## Convert CSV to DataFrame

In [None]:
# Call make_df from the main white_shark (ws) file
shark = ws.make_df(filename)
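
The `white_shark` module is not shown in this notebook, so what `make_df` does internally is not known here. As a rough stand-in, assuming it mostly wraps `pd.read_csv` and parses the `DATETIME` column, loading might look like the sketch below. The name `make_df_sketch` and the two sample rows are invented for illustration only.

```python
import io
import pandas as pd

def make_df_sketch(source):
    """Hypothetical stand-in for ws.make_df: read a CSV and parse DATETIME."""
    return pd.read_csv(source, parse_dates=["DATETIME"])

# Made-up sample rows, using column names that appear later in this notebook
sample_csv = io.StringIO(
    "TRANSMITTER,DATETIME,X,Y\n"
    "2020-20,2020-06-01 00:00:00,10.0,5.0\n"
    "2020-20,2020-06-01 00:01:00,12.0,6.5\n"
)
demo = make_df_sketch(sample_csv)
print(demo.dtypes)
```

If the real `make_df` does more (unit conversion, filtering, renaming columns), this sketch will not reproduce it.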

# Meet the data

Note the difference between the following commands and their output (uncomment the last two `print` calls to compare them).

In [None]:
print('The dataset contains', shark.shape[0], 'rows and', shark.shape[1], 'columns.')
print('The column names are:', list(shark.columns.values))
    
# Use iloc[] to select row 0 (the header is not counted as a row); this returns a Series
# print(shark.iloc[0])

# Use slicing to get the first row as a one-row DataFrame (column names are printed too)
# print(shark[:1])
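
To make the difference between the two selection styles concrete, here is a small sketch on a made-up two-row frame (the values are invented; the real data comes from `ws.make_df`). It only shows the return types, not the shark data itself.

```python
import pandas as pd

# Made-up two-row frame for illustration
demo = pd.DataFrame({"TRANSMITTER": ["2020-19", "2020-20"], "X": [1.0, 2.0]})

first_as_series = demo.iloc[0]   # single row -> Series indexed by column name
first_as_frame = demo[:1]        # slice -> DataFrame that still has one row

print(type(first_as_series).__name__)
print(type(first_as_frame).__name__)
```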

## Show first few rows of dataframe

In [None]:
shark.head()

# Initial data exploration

Here we'll use a few basic techniques to explore the data we just imported.

In [None]:
# Count number of times each shark was observed  
tag_counts = shark["TRANSMITTER"].value_counts()
print(tag_counts)

In [None]:
# Observation frequency per shark: the same counts as above,
# divided by the total number of observations (normalize=True)
tag_frequency = shark["TRANSMITTER"].value_counts(normalize=True)
print(tag_frequency)

type(tag_frequency)
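
As a quick check on what `normalize=True` does, the sketch below compares it with dividing the raw counts by their total. The tag values are made up for illustration.

```python
import pandas as pd

# Made-up detections: one row per observation
tags = pd.Series(["2020-19", "2020-20", "2020-20", "2020-21"], name="TRANSMITTER")

counts = tags.value_counts()               # raw number of observations per tag
freq = tags.value_counts(normalize=True)   # fraction of all observations per tag
manual = counts / counts.sum()             # same fractions computed by hand

print(freq)
```

The fractions always sum to 1, which makes them easier to compare across datasets of different sizes.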

## Use the describe() method to get summary statistics

The describe() method will operate on numerical columns of our shark dataframe. 

The output includes the count, mean, standard deviation, min, quartiles, and max of each numerical column.

This will be more useful once we do our own calculations (e.g., speed) using the data. 

In [None]:
shark.describe()
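
As an example of a derived quantity worth summarizing, the sketch below computes speed between consecutive fixes and then calls `describe()` on it. It assumes `X` and `Y` are in metres and that `DATETIME` is already a datetime column; the track itself is made up, and the `SPEED` column name is introduced here for illustration.

```python
import numpy as np
import pandas as pd

# Made-up track: positions in metres, one fix per minute
track = pd.DataFrame({
    "TRANSMITTER": ["2020-20"] * 3 + ["2020-21"] * 2,
    "DATETIME": pd.to_datetime([
        "2020-06-01 00:00", "2020-06-01 00:01", "2020-06-01 00:02",
        "2020-06-01 00:00", "2020-06-01 00:01",
    ]),
    "X": [0.0, 30.0, 30.0, 0.0, 60.0],
    "Y": [0.0, 40.0, 100.0, 0.0, 80.0],
})
track = track.sort_values(["TRANSMITTER", "DATETIME"])

# Differences between consecutive fixes, computed separately for each animal
grouped = track.groupby("TRANSMITTER")
dist = np.hypot(grouped["X"].diff(), grouped["Y"].diff())
dt = grouped["DATETIME"].diff().dt.total_seconds()
track["SPEED"] = dist / dt   # m/s; the first fix of each animal is NaN

print(track["SPEED"].describe())
```

Grouping by `TRANSMITTER` before differencing keeps the last fix of one shark from being paired with the first fix of the next.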

## Plot a bar chart of the frequency data

In [None]:
# Create a dataframe from the frequency data
df1 = tag_frequency.to_frame()

ax1 = df1.plot.bar()
ax1.set_xlabel("Shark ID", labelpad=20, weight='bold', size=12)
ax1.set_ylabel("Observation Frequency", labelpad=20, weight='bold', size=12)


# Plot some time series data
Here you'll manipulate the data frame to extract all the data 
for an individual shark and plot it using pyplot (plt). 

In [None]:
# Use logical indexing to extract all data for a specific animal (coded by TRANSMITTER field)
shark20 = shark[shark.TRANSMITTER == '2020-20']
# shark20

# Plot the shark's x- and y-positions through time
plt.plot(shark20.DATETIME, shark20.X)
plt.plot(shark20.DATETIME, shark20.Y)
plt.legend(['X-position', 'Y-position'])
plt.xlabel('time')
plt.ylabel('position (m)')
plt.show()

# Plot all of the x-y position data points
plt.plot(shark20.X, shark20.Y, '.-')

# Overlay the points that used fewer than 3 receiver triangles
plt.plot(shark20[shark20.n < 3].X, shark20[shark20.n < 3].Y, '*')

plt.xlabel('X-position (m)')
plt.ylabel('Y-position (m)')
plt.show()
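
The logical indexing used above can also be written with named boolean masks, which avoids repeating the condition and makes it easy to combine filters. This is a minimal sketch on made-up rows; the mask names are chosen here for illustration.

```python
import pandas as pd

# Made-up rows for illustration
demo = pd.DataFrame({
    "TRANSMITTER": ["2020-19", "2020-20", "2020-20", "2020-20"],
    "X": [0.0, 1.0, 2.0, 3.0],
    "n": [4, 2, 5, 1],
})

is_shark20 = demo["TRANSMITTER"] == "2020-20"   # boolean Series, one value per row
few_triangles = demo["n"] < 3

# Combine masks with & (element-wise "and")
subset = demo[is_shark20 & few_triangles]
print(subset)
```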


# Convert DataFrame to GeoDataFrame
## From Longitude/Latitude

To overlay our shark data on a map, we first have to convert the regular DataFrame to a GeoDataFrame.

In [None]:
# Note: We don't need every column, so we'll work with a subset of the original shark dataframe.
# Taking an explicit copy keeps later changes from touching the original.
sub_shark = shark[["TRANSMITTER", "DATETIME", "LAT", "LON", "n", "HPE"]].copy()

# Convert to GeoDataFrame, set geometry from LON/LAT columns
gshark = gpd.GeoDataFrame(sub_shark,
    geometry=gpd.points_from_xy(sub_shark.LON, sub_shark.LAT))

gshark.head()

# Contextily for Mapping

We'll use a package called contextily to generate a basemap for plotting our shark data. 

The package can be installed within your conda environment from the conda-forge channel with the command:
**conda install -c conda-forge contextily**

In [None]:
import contextily as ctx

In [None]:
# Look at map Providers that can be accessed with ctx
ctx.providers.keys()

In [None]:
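# Some providers have additional keys for specific map types
# Note: the Stamen tiles have moved to Stadia Maps, so recent contextily/xyzservices
# releases may no longer have a Stamen entry. If this raises an AttributeError, substitute
# another provider (e.g., ctx.providers.OpenStreetMap.Mapnik) here and in the cells below.
ctx.providers.Stamen.keys()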

### Specify bounding box of coordinates 

If we know the bounds of the region we'd like to map, for example the field site where shark data was collected, we can download tiles for creating our map of the area.

In [None]:
# Bounding box for Santa Barbara field site, rough estimate
west, south, east, north = (-119.6, 34.35, -119.5, 34.45)

# Download tiles using bounds2img (ll=True means the bounds are given in lon/lat)
sb_img, sb_ext = ctx.bounds2img(west, south, east, north,
                                ll=True,
                                source=ctx.providers.Stamen.Terrain)

### Render the map

In [None]:
f, ax_sb = plt.subplots(1, figsize=(9, 9))
ax_sb.imshow(sb_img, extent=sb_ext)

In [None]:
# Manually set CRS for shark data (EPSG:4326 is WGS 84 longitude/latitude)
gshark = gshark.set_crs("EPSG:4326")

# Extract each shark's data for plotting individually
shark19 = gshark[gshark.TRANSMITTER == '2020-19']
shark20 = gshark[gshark.TRANSMITTER == '2020-20']
shark21 = gshark[gshark.TRANSMITTER == '2020-21']

type(shark21)

In [None]:
# Set more accurate bounds for plotting
west2, south2, east2, north2 = (-119.58, 34.39, -119.535, 34.425)

# Set x and y limits based on the updated bounds
xlim = [west2, east2]
ylim = [south2, north2]

# Use the plot() method to plot the points of one shark
ax_shark = shark20.plot(figsize=(10, 10), alpha=0.5, edgecolor='k')

# On the same axis, plot the other sharks' points
shark19.plot(ax=ax_shark, alpha=0.5, edgecolor='k')
shark21.plot(ax=ax_shark, alpha=0.5, edgecolor='k')

# Set axes limits
ax_shark.set_xlim(xlim)
ax_shark.set_ylim(ylim)

# Add a basemap 
ctx.add_basemap(ax_shark, 
                crs=gshark.crs.to_string(),
                source=ctx.providers.Stamen.Terrain)

# Add a legend (labels are listed in the order the layers were plotted) and axis labels
ax_shark.legend(["shark20", "shark19", "shark21"])
ax_shark.set_ylabel("Latitude")
ax_shark.set_xlabel("Longitude")
ax_shark.set_title("White Shark Positions during 24h")

#### Note
We can get a more accurate bounding box from the min/max of the
**LON/LAT** columns in the *shark.describe()* output, 
but we also want to pad these values so that enough of the 
coastline is visible to give the positions some context.
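
A minimal sketch of that idea is shown below. The helper name `padded_bounds`, the default pad of 0.01 degrees, and the sample coordinates are all assumptions made for illustration; the right pad depends on how much surrounding coastline you want to see.

```python
import pandas as pd

def padded_bounds(df, pad=0.01, lon_col="LON", lat_col="LAT"):
    """Return (west, south, east, north), expanded by `pad` degrees on each side."""
    west, east = df[lon_col].min() - pad, df[lon_col].max() + pad
    south, north = df[lat_col].min() - pad, df[lat_col].max() + pad
    return west, south, east, north

# Made-up positions roughly inside the field-site box used above
demo = pd.DataFrame({"LON": [-119.56, -119.54], "LAT": [34.40, 34.41]})
west, south, east, north = padded_bounds(demo, pad=0.01)
print(west, south, east, north)
```

With the real data, `padded_bounds(shark)` would return values in the same (west, south, east, north) order used for the bounding boxes in this notebook.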

There also appears to be a limit to how small the region can be. 
This is probably due to the way the map sources create tiles. 
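
One way to probe this guess is to count how many tiles a bounding box covers at a given zoom level. The sketch below assumes the provider uses the standard XYZ ("slippy map") tiling scheme in Web Mercator, which has not been verified for the providers used here. The function names are made up for illustration, and the sketch does not reproduce contextily's own zoom selection.

```python
import math

def tile_index(lon, lat, zoom):
    """Column/row of the XYZ tile containing a lon/lat point."""
    n = 2 ** zoom
    x = int((lon + 180.0) / 360.0 * n)
    lat_rad = math.radians(lat)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    return x, y

def tiles_covering(west, south, east, north, zoom):
    """Number of tiles needed to cover a bounding box at a zoom level."""
    x0, y0 = tile_index(west, north, zoom)   # top-left tile
    x1, y1 = tile_index(east, south, zoom)   # bottom-right tile
    return (x1 - x0 + 1) * (y1 - y0 + 1)

box = (-119.58, 34.39, -119.535, 34.425)   # the tighter bounds used above
for zoom in (10, 12, 14):
    print(zoom, tiles_covering(*box, zoom))
```

If a small box maps to a single tile at the zoom level that gets chosen, that could be related to the error, but this remains a guess.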

The example above uses the smallest bounding box I could set without generating an error. 
Different map Providers may use smaller or larger tiles. 
It should be possible to download a map of a slightly larger area than we need, save it as an image file, and then show only the (smaller) region of interest when we plot.

## Extra 
The lines of code below are just extra things I started playing with. 

In [None]:
# Get a map of some location using Contextily's Place class
loc = ctx.Place("Claremont, CA", zoom_adjust=0)  # zoom_adjust modifies the auto-zoom

# Print some map metadata
for attr in ["w", "s", "e", "n", "place", "zoom", "n_tiles"]:
    print("{}: {}".format(attr, getattr(loc, attr)))

# Create a subplot figure object with axes="axs"
fig, axs = plt.subplots(1, 3, figsize=(15, 5))

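# Plot the map "loc" in axis 0 (the other two axes are left empty for now)
# Note: ctx.plot_map appears to be deprecated in recent contextily releases;
# loc.plot(ax=axs[0]) is the suggested replacement if this call warns or fails
ctx.plot_map(loc, ax=axs[0])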

In [None]:
# Set the source Provider
nightlights = ctx.providers.NASAGIBS.ViirsEarthAtNight2012

# Use the Place class and the Provider we set above to get a map of California
CA_lights = ctx.Place("California", source=nightlights)

In [None]:
CA_lights.plot()