# Women in Data Science - Oslo
### 30. April 2024 @ Bankboksen DNB

Welcome to our analytical notebook for the Champagne Coding session. In this notebook, we will explore three different data sources: a sample from Bloomberg's Global Facilities Geolocation dataset, the EMODnet human activities dataset, and Copernicus's Global Ocean Biogeochemistry Hindcast. Our focus is on exploring whether it is possible to connect the presence of human activity (today we focus on aquaculture) to the biogeochemical attributes of the ocean. We will visualize the data and hope to inspire discussions on how these insights could help DNB better advise its customers on becoming greener.

In [None]:
import odp.geospatial as odp
import geopandas as gpd
import pandas as pd
import cmocean
import hvplot.xarray
import warnings
warnings.filterwarnings("ignore", category=DeprecationWarning) 
pd.set_option("display.max_columns", None)

In [None]:
db = odp.Database()
db_plt = odp.PlotTools()
gd = odp.GridData()

### Bloomberg Global Geofacilities Dataset

In [None]:
# Pass the path directly so pandas opens and closes the file itself
asset_locations_df = pd.read_excel('GlobalFacilityGeolocation_Bloomberg.xlsx', sheet_name='GlobalFacilityGeolocation_Bloom')

In [None]:
asset_locations_df[:3]

In [None]:
print(asset_locations_df.columns)

In [None]:
sample_companies= ["Dhofar Fisheries & Food Industries Co", "Dong Won Fisheries Co Ltd", "Norway Royal Salmon ASA", "Salmar ASA", "Shandong Zhonglu Oceanic Fisheries Co Ltd"]
aquaculture_df = asset_locations_df[asset_locations_df.LONG_COMP_NAME.isin(sample_companies)]

In [None]:
# Create a GeoDataFrame by extracting lat/long from the Bloomberg dataframe
from shapely.geometry import Point
# shapely points are (x, y), i.e. (longitude, latitude)
geometry = [Point(xy) for xy in zip(aquaculture_df['LONGITUDE_OF_LOCATION'], aquaculture_df['LATITUDE_OF_LOCATION'])]
crs = "EPSG:4326"  # WGS 84 (the {'init': ...} dict form is deprecated)
geo_df = gpd.GeoDataFrame(aquaculture_df,  # specify our data
                          crs=crs,  # specify our coordinate reference system
                          geometry=geometry)  # specify the geometry list we created
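
Before building point geometries it can help to check that the coordinate columns hold plausible values, since a swapped or missing latitude/longitude otherwise only shows up as a misplaced marker on the map. The following is a minimal sketch in plain Python; the helper name `split_valid_coordinates` and the sample pairs are hypothetical and not part of the Bloomberg dataset.

```python
def split_valid_coordinates(pairs):
    """Split (longitude, latitude) pairs into valid and invalid lists.

    A pair is valid when longitude is in [-180, 180] and latitude is in
    [-90, 90]. NaN values fail both comparisons and end up as invalid.
    """
    valid, invalid = [], []
    for lon, lat in pairs:
        is_ok = (
            lon is not None
            and lat is not None
            and -180 <= lon <= 180
            and -90 <= lat <= 90
        )
        (valid if is_ok else invalid).append((lon, lat))
    return valid, invalid


# Hypothetical sample pairs in (longitude, latitude) order
sample_pairs = [(10.75, 59.91), (float("nan"), 60.0), (5.0, 101.0)]
valid_pairs, invalid_pairs = split_valid_coordinates(sample_pairs)
print(len(valid_pairs), "valid,", len(invalid_pairs), "invalid")
```

A check like this does not catch every swap (a latitude below 90 is also a legal longitude), so the interactive map remains the final sanity check.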

In [None]:
#Let's explore with the embedded visualisation from GeoPandas
geo_df.explore(width=800, height=600)

In [None]:
# Let's look at Aquaculture activities in Norway
aquaculture_norway_df = asset_locations_df[(asset_locations_df.LONG_COMP_NAME.isin(sample_companies)) & (asset_locations_df.COUNTRY_ISO == 'NO')]

In [None]:
geometry = [Point(xy) for xy in zip(aquaculture_norway_df['LONGITUDE_OF_LOCATION'], aquaculture_norway_df['LATITUDE_OF_LOCATION'])]
crs = "EPSG:4326"  # WGS 84 (the {'init': ...} dict form is deprecated)
geo_df = gpd.GeoDataFrame(aquaculture_norway_df,  # specify our data
                          crs=crs,  # specify our coordinate reference system
                          geometry=geometry)  # specify the geometry list we created

geo_df.explore(tooltip="LONG_COMP_NAME",  # show the company name in the tooltip (on hover)
               popup=True,  # show all values in popup (on click)
               style_kwds=dict(color="red"),  # draw the markers in red
               width=800, height=600
)

### EMODnet Dataset

In [None]:
df_db = db.datasets
emodnet_list = [name for name in df_db.index if 'Emodnet' in name]
emodnet_list

Example of how to use multiple parameters when querying:
```
df = db.query('Ocean Biodiversity Information System',
              date_from='2000-01-01',
              date_to='2020-02-01',
              poly='POLYGON ((51.0 3.0, 51.3 3.61, 51.3 3.0, 51.0 3.0))',
              limit=5)
```


In [None]:
# Let's use the marine Finfish
df = db.query(emodnet_list[0])

In [None]:
df.explore(
    tooltip="OWNER_NAME",  # show the owner name in the tooltip (on hover)
    popup=True,  # show all values in popup (on click)
    style_kwds=dict(color="blue"),  # draw the features in blue
    width=800, height=600
)

In [None]:
# We can add a filter for just Norway
filter_norway = db.filter_data("COUNTRY", "=", "Norway")
df = db.query('Emodnet HA aquaculture - marine Finfish', filters=[filter_norway])

In [None]:
df[:3]

In [None]:
# We can also query only a specific region of the country by adding a polygon to our query
poly = "POLYGON ((5.0 59.0, 10 59, 10 64, 5 64, 5 59))"

filter1 = db.filter_data("COUNTRY", "=", "Norway")
gdf=db.query('Emodnet HA aquaculture - marine Finfish',
            filters=[filter1],
            poly=poly)

gdf.explore(width=800, height=600)
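
Writing WKT polygons by hand is error-prone, because the ring has to be closed by repeating the first corner. As a minimal sketch, a small helper can build the string from a bounding box. The function name `bbox_to_wkt` is hypothetical (it is not part of the `odp` package), and it assumes the `longitude latitude` ordering used in the polygon above.

```python
def bbox_to_wkt(min_lon, min_lat, max_lon, max_lat):
    """Return a closed WKT polygon ("lon lat" order) for a bounding box."""
    if min_lon >= max_lon or min_lat >= max_lat:
        raise ValueError("min values must be smaller than max values")
    corners = [
        (min_lon, min_lat),
        (max_lon, min_lat),
        (max_lon, max_lat),
        (min_lon, max_lat),
        (min_lon, min_lat),  # repeat the first corner to close the ring
    ]
    ring = ", ".join(f"{lon:g} {lat:g}" for lon, lat in corners)
    return f"POLYGON (({ring}))"


# Same box as the hand-written polygon above: 5-10 degrees E, 59-64 degrees N
norway_box = bbox_to_wkt(5, 59, 10, 64)
print(norway_box)
```

The resulting string could then be passed as the `poly` argument in the same way as the hand-written one.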

## Biogeochemistry dataset from Ocean Hub Platform

In [None]:
gd.datasets.loc["global-ocean-biogeochemistry-hindcast-monthly mean"].database_description

In [None]:
hindcast_ds= gd.open_dataset('global-ocean-biogeochemistry-hindcast-monthly mean')

In [None]:
hindcast_ds

In [None]:
# Let us start with oxygen, which is an important indicator of ocean health.
hindcast_ds.o2

In [None]:
# Pick one specific month (keep in mind that the monthly means are dated the 15th or 16th of each month; see the time dimension).
# Also, let us pick a box by defining longitude and latitude ranges, focusing on the Nordic countries.
hindcast_ds_slice = hindcast_ds.sel(
            longitude=slice(0,25),
            latitude=slice(50,75),
            time='2020-01-16')

# Let us also pick one depth level (alternatively, it is possible to calculate the mean across all depths; see the commented code)
#hindcast_ds_slice = hindcast_ds_slice.mean('depth')
hindcast_ds_slice = hindcast_ds_slice.isel(depth=0)

In [None]:
# You can calculate the minimum value on a slice..
hindcast_ds_slice.o2.min().compute()

In [None]:
# ...and the maximum value on the same slice.
hindcast_ds_slice.o2.max().compute()

#### Let's use xarray's plotting function, and [cmocean](https://matplotlib.org/cmocean/)'s color palette

In [None]:
hindcast_ds_slice.o2.plot(figsize=(5,5), cmap=cmocean.cm.oxy)

In [None]:
# Let us look at the whole history from 1999 to 2020..
hindcast_ds_slice_1999_2020 = hindcast_ds.sel(
                                        longitude=slice(0,25),
                                        latitude=slice(50,75),
                                        time=slice('1999-01-16', '2020-12-16'))

In [None]:
hindcast_ds_slice_1999_2020

In [None]:
# Let us pick two specific longitude/latitude points and observe how the values differ with depth!
line1 = hindcast_ds_slice_1999_2020.o2.sel(longitude=5, latitude=60, method="nearest").hvplot.line(label='Onshore Aquaculture')
line2 = hindcast_ds_slice_1999_2020.o2.sel(longitude=0, latitude=60, method="nearest").hvplot.line(label='Offshore Point')
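
The `method="nearest"` argument snaps each requested coordinate to the closest grid point, and without a `tolerance` it does so no matter how far away that grid point is. The plain-Python sketch below illustrates the behaviour on a hypothetical longitude axis; the 0.25-degree spacing and the helper name `nearest_grid_value` are assumptions made for illustration, not values read from the dataset.

```python
def nearest_grid_value(grid, target):
    """Return the grid value closest to target (what a nearest lookup selects)."""
    return min(grid, key=lambda value: abs(value - target))


# Assumed axis: 0 to 25 degrees in 0.25-degree steps (illustrative only)
longitudes = [i * 0.25 for i in range(101)]

inside = nearest_grid_value(longitudes, 8.521)   # a point inside the box
outside = nearest_grid_value(longitudes, 101)    # a point far outside the box
print(inside, outside)
```

A request outside the sliced box therefore silently returns the value at the box edge, so it is worth checking the coordinates of the selected point before interpreting a plot.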

In [None]:
combined_plot = line1 * line2
combined_plot.opts(legend_position='top_left', legend_offset=(0,0), legend_cols=1, height=400)
combined_plot

### For more info on how to plot with hvplot, check this link: https://hvplot.holoviz.org/user_guide/Plotting.html

In [None]:
# Let's analyse a whole year, 1999...
hindcast_ds_slice_1999 = hindcast_ds.sel(
                                        longitude=slice(0,25),
                                        latitude=slice(50,75),
                                        time=slice('1999-01-16', '1999-12-16'))

In [None]:
# Let's visualise the monthly levels of oxygen
monthly_means_1999 = hindcast_ds_slice_1999.isel(depth=0).groupby("time.month").mean()
fg = monthly_means_1999.o2.plot(
    col="month",
    col_wrap=4,
    cmap=cmocean.cm.oxy,
)

In [None]:
monthly_means_1999.o2.sel(longitude=5, latitude=60, method="nearest").hvplot.line()

In [None]:
# Now, let's compare how it looked 21 years later...
hindcast_ds_slice_2020 = hindcast_ds.sel(
                                        longitude=slice(0,25),
                                        latitude=slice(50,75),
                                        time=slice('2020-01-16', '2020-12-16'))

In [None]:
monthly_means_2020 = hindcast_ds_slice_2020.isel(depth=0).groupby("time.month").mean()
fg = monthly_means_2020.o2.plot(
    col="month",
    col_wrap=4,
    cmap=cmocean.cm.oxy,
)

In [None]:
monthly_means_2020.o2.sel(longitude=5, latitude=60, method="nearest").hvplot.line()

### Some notes: It could be interesting to:
- Look at other variables such as ph (acidity), nppv (net primary production), no3 (nitrate).
- Overlay two years and see what the monthly fluctuations look like.
- Compare the measurements from one year to another for two different coordinates (one near an aquaculture facility and another at the same latitude but remote from any human activity), using different timelines from years to months.
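
One possible way to frame the last comparison is to compute the change between two years at each location and then subtract the change at the remote point from the change near the facility, so that region-wide trends cancel out. The sketch below uses plain Python lists; the numbers are placeholders invented for illustration and are NOT taken from the hindcast dataset, and the helper name `monthly_change` is hypothetical.

```python
from statistics import mean


def monthly_change(earlier, later):
    """Element-wise change between two equally long monthly series."""
    if len(earlier) != len(later):
        raise ValueError("series must have the same length")
    return [b - a for a, b in zip(earlier, later)]


# Placeholder values only (four months, arbitrary units), not real data
site_1999 = [300.0, 305.0, 310.0, 300.0]
site_2020 = [296.0, 300.0, 306.0, 295.0]
ref_1999 = [302.0, 306.0, 311.0, 301.0]
ref_2020 = [300.0, 304.0, 309.0, 299.0]

site_change = monthly_change(site_1999, site_2020)  # change near the facility
ref_change = monthly_change(ref_1999, ref_2020)     # change at the remote point

# Change at the site beyond what the remote point also experienced
excess_change = [s - r for s, r in zip(site_change, ref_change)]
print(mean(excess_change))
```

With real data, the four lists would be replaced by monthly values extracted at the two coordinates. A non-zero excess is only a starting point for discussion, since the two points can differ for many reasons besides aquaculture.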

In [None]:
# We could for example compare two coordinate points (one with aquaculture activity and one without)
# This function calculates a point that is d km away from a specific coordinate along a specific bearing.

from math import asin, atan2, cos, degrees, radians, sin

def get_point_at_distance(lat1, lon1, d, bearing, R=6371):
    """
    lat1: initial latitude, in degrees
    lon1: initial longitude, in degrees
    d: target distance from the initial point, in the same unit as R (km by default)
    bearing: (true) heading in degrees
    R: optional radius of sphere, defaults to mean radius of earth in km

    Returns the new (lat, lon) coordinate d km from the initial point, in degrees
    """
    lat1 = radians(lat1)
    lon1 = radians(lon1)
    a = radians(bearing)
    lat2 = asin(sin(lat1) * cos(d/R) + cos(lat1) * sin(d/R) * cos(a))
    lon2 = lon1 + atan2(
        sin(a) * sin(d/R) * cos(lat1),
        cos(d/R) - sin(lat1) * sin(lat2)
    )
    return (degrees(lat2), degrees(lon2),)
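
A quick way to gain confidence in a destination-point formula like this is to measure the distance back with the haversine formula and confirm it matches the requested distance. The sketch below is self-contained: `destination_point` restates the same spherical formula under a different, hypothetical name, and `haversine_km` is a standard great-circle distance added here for checking only.

```python
from math import asin, atan2, cos, degrees, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, same default as above


def destination_point(lat, lon, distance_km, bearing_deg, radius_km=EARTH_RADIUS_KM):
    """Point reached from (lat, lon) after distance_km along an initial bearing."""
    phi1, lam1, theta = radians(lat), radians(lon), radians(bearing_deg)
    delta = distance_km / radius_km  # angular distance in radians
    phi2 = asin(sin(phi1) * cos(delta) + cos(phi1) * sin(delta) * cos(theta))
    lam2 = lam1 + atan2(sin(theta) * sin(delta) * cos(phi1),
                        cos(delta) - sin(phi1) * sin(phi2))
    return degrees(phi2), degrees(lam2)


def haversine_km(lat1, lon1, lat2, lon2, radius_km=EARTH_RADIUS_KM):
    """Great-circle distance between two points given in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlam = radians(lon2 - lon1)
    h = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlam / 2) ** 2
    return 2 * radius_km * asin(sqrt(h))


start = (63.779, 8.521)
end = destination_point(*start, 500, 270)  # 500 km with an initial heading due west
round_trip_km = haversine_km(*start, *end)
print(end, round_trip_km)
```

Note that a great-circle path that starts due west drifts towards the equator, so the destination ends up at a slightly lower latitude than the starting point rather than on exactly the same parallel.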

In [None]:
# Let us visualise the two coordinates on a map
# Candidate comparison point: 500 km from the facility with an initial bearing due west
print(get_point_at_distance(63.779, 8.521, 500, 270, R=6371))

data = [['Salma aquaculture', 63.779, 8.521], ['No activity', 62.38, -1.57]]

# Create the pandas DataFrame
comparison_df = pd.DataFrame(data, columns=['LOCATION', 'LATITUDE', 'LONGITUDE'])

geometry = [Point(xy) for xy in zip(comparison_df['LONGITUDE'], comparison_df['LATITUDE'])]
crs = "EPSG:4326"  # WGS 84 (the {'init': ...} dict form is deprecated)
geo_df = gpd.GeoDataFrame(comparison_df,  # specify our data
                          crs=crs,  # specify our coordinate reference system
                          geometry=geometry)  # specify the geometry list we created

geo_df.explore(tooltip="LOCATION",  # show the location label in the tooltip (on hover)
               popup=True,  # show all values in popup (on click)
               style_kwds=dict(color="red"),  # draw the markers in red
               width=800, height=600
)

In [None]:
# Longitude and latitude of "Salma" and "No activity"
lats = [63.779, 62.38]
lons = [8.521, -1.57]

### Let us plot the time series for these two different locations when it comes to o2

In [None]:
# Salma location
hindcast_ds_slice_1999_2020.o2.sel(longitude=lons[0], latitude=lats[0], method="nearest").hvplot.line()

In [None]:
# No activity location
hindcast_ds_slice_1999_2020.o2.sel(longitude=lons[1], latitude=lats[1], method="nearest").hvplot.line()

### Let us plot the time series for these two different locations when it comes to ph

In [None]:
# Salma
hindcast_ds_slice_1999_2020.ph.sel(longitude=lons[0], latitude=lats[0], method="nearest").hvplot.line()

In [None]:
# No activity location
hindcast_ds_slice_1999_2020.ph.sel(longitude=lons[1], latitude=lats[1], method="nearest").hvplot.line()

### Let us plot the time series for these two different locations when it comes to no3

In [None]:
# Salma
hindcast_ds_slice_1999_2020.no3.sel(longitude=lons[0], latitude=lats[0], method="nearest").hvplot.line()

In [None]:
# No Activity
hindcast_ds_slice_1999_2020.no3.sel(longitude=lons[1], latitude=lats[1], method="nearest").hvplot.line()

In [None]:
### Let us plot the time series for these two different locations when it comes to net primary production (nppv)

In [None]:
# Salma
hindcast_ds_slice_1999_2020.nppv.sel(longitude=lons[0], latitude=lats[0], method="nearest").hvplot.line()

In [None]:
# No Activity
hindcast_ds_slice_1999_2020.nppv.sel(longitude=lons[1], latitude=lats[1], method="nearest").hvplot.line()

## Alternative datasets...

In [None]:
# This one contains daily means, so it takes longer to load.
print("global-ocean-biogeochemistry-hindcast-daily mean: ", 
      gd.datasets.loc["global-ocean-biogeochemistry-hindcast-daily mean"].database_description,
     "\n")

# This one is a model-based forecast rather than observations, but it contains more attributes.
print('global-analysis-forecast-bio-001-028-monthly', 
      gd.datasets.loc['global-analysis-forecast-bio-001-028-monthly'].database_description,
     "\n")

In [None]:
ds= gd.open_dataset('global-analysis-forecast-bio-001-028-monthly')

In [None]:
ds_slice2022 = ds.sel(longitude=slice(0,25),
                      latitude=slice(50,75),
                      time=slice('2022-01-01', '2022-12-31'))

In [None]:
ds_time = ds_slice2022.isel(depth=0)

In [None]:
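# latitude=101 is not a valid latitude and lies outside the sliced box, so
# method="nearest" would silently return the box edge; use a point inside the box instead
ds_time.ph.sel(longitude=5, latitude=60, method="nearest").hvplot.line()

In [None]:
ds_time.o2.sel(longitude=5, latitude=60, method="nearest").hvplot.line()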

In [None]:
ds_time.ph.hvplot(groupby='time',
                    widget_type='scrubber', 
                    widget_location='bottom',
                    width=600, cmap='greens')

In [None]:
ds_time.o2.hvplot(groupby='time',
                    widget_type='scrubber', 
                    widget_location='bottom',
                    cmap='reds',
                    width=600)