# On the fly analysis

## Data ingest from Google Cloud Storage
Data ingestion is described in the cloud_function documentation. The command below loads the zipped `eez_minus_mpa` layer from the bucket into PostgreSQL:

```bash
ogr2ogr -progress \
  -makevalid -overwrite \
  -nln eez_minus_mpa -nlt PROMOTE_TO_MULTI \
  -lco GEOMETRY_NAME=the_geom \
  -lco PRECISION=FALSE \
  -lco SPATIAL_INDEX=GIST \
  -lco FID=id \
  -t_srs EPSG:4326 -a_srs EPSG:4326 \
  -f PostgreSQL PG:"host=$POSTGRES_HOST port=$POSTGRES_PORT \
   user=$POSTGRES_USER password=$POSTGRES_PASSWORD \
   dbname=$POSTGRES_DB active_schema=$POSTGRES_SCHEMA" \
   -doo "PRELUDE_STATEMENTS=CREATE SCHEMA IF NOT EXISTS $POSTGRES_SCHEMA AUTHORIZATION CURRENT_USER;" "/vsizip/vsigs/$URL";
```


## Data analysis
Input call:

```bash
curl 'https://30x30.skytruth.org/functions/analysis/' \
  -H 'content-type: application/json' \
  --data-raw '{"id":"d7c9978f92fff5a373f2dec55e17bbab","type":"Feature","properties":{},"geometry":{"coordinates":[[[-22.791446197507895,9.57642078480319],[-13.563358667514933,11.633521660754937],[0.7068797809280056,5.048301327696478],[-7.5698585191689745,-4.074667696937624],[-22.791446197507895,9.57642078480319]]],"type":"Polygon"}}' \
  --compressed
```
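The same request can be built from Python. The sketch below uses only the standard library and assumes, as the curl call suggests, that the endpoint accepts a GeoJSON Feature as the JSON body of a POST. The helper names `build_analysis_request` and `run_analysis` are illustrative and not part of the project.

```python
import json
import urllib.request

ANALYSIS_URL = "https://30x30.skytruth.org/functions/analysis/"


def build_analysis_request(feature: dict, url: str = ANALYSIS_URL) -> urllib.request.Request:
    """Build a POST request carrying a GeoJSON Feature as the JSON body."""
    body = json.dumps(feature).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_analysis(feature: dict) -> dict:
    """Send the request and decode the JSON response (requires network access)."""
    with urllib.request.urlopen(build_analysis_request(feature)) as response:
        return json.loads(response.read().decode("utf-8"))


# Same polygon as in the curl call; the ring closes on its first vertex.
feature = {
    "id": "d7c9978f92fff5a373f2dec55e17bbab",
    "type": "Feature",
    "properties": {},
    "geometry": {
        "type": "Polygon",
        "coordinates": [[
            [-22.791446197507895, 9.57642078480319],
            [-13.563358667514933, 11.633521660754937],
            [0.7068797809280056, 5.048301327696478],
            [-7.5698585191689745, -4.074667696937624],
            [-22.791446197507895, 9.57642078480319],
        ]],
    },
}

request = build_analysis_request(feature)
```

Calling `run_analysis(feature)` would send the request; it is kept separate from the builder so that the payload can be inspected without touching the network.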

Response:
    
```json
{"locations_area": {"code": <location_iso>, "protected_area": <area>, "area": <location_marine_area>}, "total_area": <total_area>}
```


## Data preprocessing

We use the intermediate MPA and EEZ data to create a dataset that can be used for spatial analysis.
The steps are:
1. Load both datasets
2. Create a difference dataset from the two (subtract the MPAs from the EEZ)
3. Disaggregate the EEZ dataset based on the ISO3 codes
4. Upload the data to Google Cloud Storage
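As a rough illustration of step 2, the sketch below subtracts protected areas from zones using Shapely on toy rectangles. It is not the project's `create_difference` helper, whose internals are not shown here; the function name `subtract_protected`, the codes `AAA`/`BBB`, and the geometries are all made up for the example.

```python
from shapely.geometry import box
from shapely.ops import unary_union


def subtract_protected(zones: dict, protected: list) -> dict:
    """Return {code: zone geometry minus the union of all protected geometries}."""
    # Merging first means overlapping protected areas are only removed once.
    protected_union = unary_union(protected)
    return {code: geom.difference(protected_union) for code, geom in zones.items()}


# Toy data (hypothetical): two 10x10 zones keyed by a placeholder code.
zones = {"AAA": box(0, 0, 10, 10), "BBB": box(20, 0, 30, 10)}
protected = [
    box(2, 2, 4, 4),       # inside AAA
    box(3, 3, 5, 5),       # overlaps the previous one by a 1x1 square
    box(19, -1, 31, 11),   # completely covers BBB
]

remaining = subtract_protected(zones, protected)

# A fully protected zone leaves an empty geometry, which is dropped here.
non_empty = {code: geom for code, geom in remaining.items() if not geom.is_empty}
```

Here `AAA` keeps 93 of its 100 square units (the two protected squares cover 4 + 4 - 1 = 7), while `BBB` is fully covered and is filtered out as an empty geometry.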


In [16]:
%load_ext autoreload
%autoreload 2

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload


In [19]:
import logging
import sys
from pathlib import Path
import pandas as pd
import geopandas as gpd

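scripts_dir = Path("../..").joinpath("src")
# sys.path holds strings, so compare the resolved string form rather than the Path object
if scripts_dir.resolve().as_posix() not in sys.path:
    sys.path.insert(0, scripts_dir.resolve().as_posix())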

from helpers.settings import get_settings
from helpers.file_handler import FileConventionHandler
from helpers.utils import download_and_unzip_if_needed, rm_tree, make_archive, writeReadGCP

from pipelines.processors import clean_geometries, create_difference

from pipelines.output_schemas import MPAsTableOTFSchema

logging.basicConfig(level=logging.DEBUG)
logging.getLogger("requests").setLevel(logging.WARNING)
logging.getLogger("urllib3").setLevel(logging.WARNING)
logging.getLogger("fiona").setLevel(logging.WARNING)

In [3]:
mysettings = get_settings()
prev_step = "preprocess"
current_step = "onthefly"

In [4]:
pipe = "mpa"

pipe_dir_eez = FileConventionHandler("eez")
pipe_dir_mpas = FileConventionHandler(pipe)

output_dir = pipe_dir_mpas.get_processed_step_path(current_step)
output_file = output_dir.joinpath("eez_minus_mpa.shp")
zipped_output_file = pipe_dir_mpas.get_step_fmt_file_path(current_step, "zip", True)
remote_path = pipe_dir_mpas.get_remote_path(current_step)

# Download the EEZ file && unzip it
download_and_unzip_if_needed(pipe_dir_eez, prev_step, mysettings)
# Download the mpas file && unzip it
download_and_unzip_if_needed(pipe_dir_mpas, prev_step, mysettings)

# Load the data
eez = gpd.read_file(pipe_dir_eez.get_step_fmt_file_path(prev_step, "shp")).pipe(clean_geometries)
mpas = gpd.read_file(pipe_dir_mpas.get_step_fmt_file_path(prev_step, "shp")).pipe(clean_geometries)

/home/mambauser/data/eez/processed/eez_preprocess.zip
/home/mambauser/data/eez/processed/preprocess
/home/mambauser/data/mpa/processed/mpa_preprocess.zip
/home/mambauser/data/mpa/processed/preprocess


In [5]:
a_minus_b = await create_difference(eez, mpas)

  0%|          | 0/282 [00:00<?, ?it/s]

100%|█████████▉| 281/282 [06:32<00:53, 53.34s/it]

<class 'shapely.geometry.base.GeometrySequence'>


100%|██████████| 282/282 [07:02<00:00,  1.50s/it]


In [6]:
a_minus_b

| | MRGID | GEONAME | POL_TYPE | AREA_KM2 | ISO_SOV1 | ISO_SOV2 | ISO_SOV3 | geometry |
|---|---|---|---|---|---|---|---|---|
| 0 | 8444.0 | American Samoa Exclusive Economic Zone | 200NM | 405830.0 | USA | | | MULTIPOLYGON (((-166.64194 -17.555, -166.651 -... |
| 1 | 8379.0 | Ascension Exclusive Economic Zone | 200NM | 446005.0 | GBR | | | MULTIPOLYGON (((-10.93296 -7.90389, -10.93294 ... |
| 2 | 8446.0 | Cook Islands Exclusive Economic Zone | 200NM | 1969553.0 | NZL | | | MULTIPOLYGON (((-158.75396 -6.13852, -159.2757... |
| 3 | 8389.0 | Overlapping claim Falkland / Malvinas Islands:... | Overlapping claim | 550566.0 | GBR | ARG | | MULTIPOLYGON (((-59.14325 -55.75011, -58 -55.7... |
| 4 | 8440.0 | French Polynesian Exclusive Economic Zone | 200NM | 4766689.0 | FRA | | | MULTIPOLYGON (((-135.92905 -7.89648, -135.9282... |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 277 | 62589.0 | Chagos Archipelago Exclusive Economic Zone | 200 NM | 650804.0 | MUS | | | POLYGON ((75.83452 -5.23039, 75.8326 -5.31997,... |
| 278 | 8383.0 | Overlapping claim South Georgia and South Sand... | Overlapping claim | 1237783.0 | GBR | ARG | | MULTIPOLYGON (((-35.63012 -50.8306, -35.61631 ... |
| 279 | 8402.0 | Bermudian Exclusive Economic Zone | 200NM | 464389.0 | GBR | | | MULTIPOLYGON (((-60.70499 32.39067, -60.705 32... |
| 280 | 8456.0 | United States Exclusive Economic Zone | 200NM | 2451023.0 | USA | | | MULTIPOLYGON (((-67.28403 45.19125, -67.284 45... |


In [20]:
# Drop rows whose geometry is empty after the difference, apply the output schema, and write the shapefile
non_empty = a_minus_b[~a_minus_b.geometry.is_empty]
MPAsTableOTFSchema(non_empty).to_file(filename=output_file.as_posix())

# zip data
make_archive(output_file.parent, zipped_output_file)

# clean unzipped files
step_path = pipe_dir_mpas.get_processed_step_path(current_step)
if step_path.exists():
    rm_tree(step_path)

# LOAD
## load zipped file to GCS
writeReadGCP(
    credentials=mysettings.GCS_KEYFILE_JSON,
    bucket_name=mysettings.GCS_BUCKET,
    blob_name=remote_path,
    file=zipped_output_file,
    operation="w",
)

INFO:pyogrio._io:Created 277 records
DEBUG:google.auth.transport.requests:Making request: POST https://oauth2.googleapis.com/token
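
The zip-then-clean part of the cell above can be approximated with the standard library alone. This is a minimal sketch, not the project's `make_archive` or `rm_tree` helpers (their internals are not shown here); the function name `zip_and_clean` and the file names in the demo are hypothetical.

```python
import shutil
import tempfile
from pathlib import Path


def zip_and_clean(step_dir: Path, archive_path: Path) -> Path:
    """Zip the contents of step_dir into archive_path, then delete step_dir."""
    # shutil.make_archive appends ".zip" itself, so pass the path without the suffix.
    created = shutil.make_archive(
        archive_path.with_suffix("").as_posix(), "zip", root_dir=step_dir
    )
    # Remove the unzipped working directory once the archive exists.
    shutil.rmtree(step_dir)
    return Path(created)


# Demo in a throwaway directory; the names below are placeholders.
workdir = Path(tempfile.mkdtemp())
step_dir = workdir / "onthefly"
step_dir.mkdir()
(step_dir / "eez_minus_mpa.shp").write_bytes(b"placeholder")

archive = zip_and_clean(step_dir, workdir / "mpa_onthefly.zip")
```

The archive is written next to the step directory rather than inside it, so deleting the directory afterwards leaves the zip file in place for upload.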
