# pyobistools: Tools for data enhancement and quality control - for Python!

## Installation
---
Installation instructions for `pyobistools` are available at https://github.com/cioos-siooc/pyobistools/.

In [None]:
import sys
import numpy as np
import pandas as pd
import plotly.express as px
import requests
from ckanapi import RemoteCKAN

# pyobistools modules; the wildcard imports provide search_worms() and check_onland()
from pyobistools.taxa import *
from pyobistools.validation.check_fields import check_fields
from pyobistools.validation.check_eventids import *
from pyobistools.validation.check_onland import *
from pyobistools.validation.check_scientificname_and_ids import *

NaN = np.nan  # shorthand for missing values
pd.set_option('display.max_colwidth', None)  # show full cell contents in DataFrame output

The examples below also use `pyobis`, the Python client for the OBIS API, to download OBIS data. Installation instructions: https://github.com/iobis/pyobis/blob/main/README.md

In [None]:
! pip install pyobis

In [None]:
from pyobis import dataset
from pyobis import occurrences

## Taxon matching
---
`search_worms()` looks up a list of scientific names in the World Register of Marine Species (WoRMS) and returns the results as a standardized pandas DataFrame.

In [None]:
# Deliberately includes a misspelling, duplicate names, and a nonsense name
names = ["Abra alva", "Buccinum fusiforme", "Buccinum fusiforme", "Buccinum fusiforme", "hlqsdkf"]
search_worms(names)
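To make the shape of such a result concrete, here is a minimal sketch of turning a name-matching response into one row per input name. It assumes the WoRMS REST endpoint `AphiaRecordsByMatchNames`, which returns one list of candidate records per submitted name (assumed to be empty or null when nothing matches). `worms_matches_to_frame` is a hypothetical helper, not part of pyobistools, and the fields kept are an assumed subset of a WoRMS record.

```python
import pandas as pd

# Assumed subset of the fields in a WoRMS record; adjust to the real response.
FIELDS = ["AphiaID", "scientificname", "status", "match_type", "valid_name"]


def worms_matches_to_frame(names, response):
    """Flatten per-name WoRMS candidates into one row per input name.

    `response` is assumed to be aligned with `names`: each element is a list of
    candidate records (dicts), or empty/None when nothing matched. Only the
    first candidate is kept; unmatched names get empty (None/NaN) fields.
    """
    rows = []
    for name, candidates in zip(names, response):
        first = candidates[0] if candidates else {}
        row = {"input_name": name}
        row.update({field: first.get(field) for field in FIELDS})
        rows.append(row)
    return pd.DataFrame(rows, columns=["input_name"] + FIELDS)


names = ["Abra alva", "hlqsdkf"]
# Placeholder values that only illustrate the assumed response shape,
# not real WoRMS output.
fake_response = [
    [{"AphiaID": 12345, "scientificname": "Abra alba", "status": "accepted",
      "match_type": "phonetic", "valid_name": "Abra alba"}],
    [],
]
matches = worms_matches_to_frame(names, fake_response)
print(matches)
```

Keeping one row per input name (rather than one per candidate) makes it easy to join the matches back onto the original records, including the names that failed to match.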

## Check required fields
---
`check_fields(data, level, analysis_type, accepted_name_usage_id_check)` checks whether all OBIS requirements are met for a given core or extension.
- **data** = the input data as a pandas DataFrame
- **level** = `error` or `warning`: `error` reports required fields that are missing, `warning` reports recommended fields that are missing
- **analysis_type** = `event_core`, `occurrence_core`, `occurrence_extension`, or `extended_measurement_or_fact_extension`
- **accepted_name_usage_id_check** = `True` or `False`; if `True`, unaccepted scientific name IDs are filtered out

In [None]:
# Build the test data in one step; None marks missing values
data = pd.DataFrame({
    "occurrenceID": [1, 2, 3],
    "scientificName": ["Abra alba", None, None],
    "locality": ["North Sea", "English Channel", "Flemish Banks"],
    "minimumDepthInMeters": [10, None, 5],
})

# Positional arguments: data, level, analysis_type, accepted_name_usage_id_check (a boolean, not the string 'False')
check_fields(data, 'error', 'occurrence_core', False)
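The idea behind a required-field check can be sketched in a few lines of pandas: report required columns that are absent, then report rows where a required column is empty. This is only an illustration; `find_missing` is a hypothetical helper, and `REQUIRED` is a made-up subset of fields, not the list pyobistools actually uses for any core or extension.

```python
import pandas as pd

# Hypothetical subset of required fields, for illustration only.
REQUIRED = ["occurrenceID", "eventDate", "scientificName"]


def find_missing(data, required, level="error"):
    """Report absent required columns and rows with empty required values."""
    problems = []
    for field in required:
        if field not in data.columns:
            problems.append({"field": field, "level": level, "row": None,
                             "message": f"Required field {field} is missing"})
            continue
        values = data[field]
        # Treat None/NaN and blank strings as empty values.
        empty = values.isna() | values.astype(str).str.strip().eq("")
        for row in data.index[empty.to_numpy()]:
            problems.append({"field": field, "level": level, "row": row,
                             "message": f"Empty value for required field {field}"})
    return pd.DataFrame(problems, columns=["field", "level", "row", "message"])


example = pd.DataFrame({
    "occurrenceID": [1, 2, 3],
    "scientificName": ["Abra alba", None, None],
    "locality": ["North Sea", "English Channel", "Flemish Banks"],
})
report = find_missing(example, REQUIRED)
print(report)
```

Here `eventDate` is reported once as a missing column, and `scientificName` is reported for rows 1 and 2.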

## Plot points on a map
---
Not yet available in `pyobistools`.

## Check on-land
---
`check_onland(data, land, report, buffer, offline)` checks whether points fall on land.

- **data** = the input data as a pandas DataFrame
- **land** = a custom land polygon to check against; if not provided, Natural Earth land polygons are used
- **report** = if True, a report of errors is returned instead of the records
- **buffer** = how far inland a point may lie and still be considered valid
- **offline** = if True, a local simplified shoreline is used; otherwise an OBIS web service is used. Defaults to False
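At its core, an on-land check is a point-in-polygon test. The sketch below illustrates that idea with a ray-casting test against a toy square "island"; it is not how pyobistools is implemented (which uses Natural Earth polygons or the OBIS web service), it ignores `buffer`, and it treats longitude/latitude as planar coordinates. `point_in_polygon` and `flag_on_land` are hypothetical helpers; the column names are the Darwin Core terms `decimalLongitude` and `decimalLatitude`.

```python
import pandas as pd


def point_in_polygon(lon, lat, polygon):
    """Ray-casting test: True if (lon, lat) lies inside `polygon`.

    `polygon` is a list of (lon, lat) vertices; the ring is closed implicitly.
    Points exactly on an edge may be classified either way.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > lat) != (y2 > lat):
            # Longitude at which the edge crosses that line.
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def flag_on_land(data, land, report=False):
    """Return records inside any `land` polygon, or a small error report."""
    on_land = data.apply(
        lambda r: any(point_in_polygon(r["decimalLongitude"], r["decimalLatitude"], poly)
                      for poly in land),
        axis=1,
    )
    if report:
        return pd.DataFrame({"row": data.index[on_land.to_numpy()],
                             "message": "Occurrence is on land"})
    return data[on_land]


island = [(0, 0), (10, 0), (10, 10), (0, 10)]  # toy land polygon
points = pd.DataFrame({"decimalLongitude": [5.0, 15.0, -1.0],
                       "decimalLatitude": [5.0, 5.0, 3.0]})
print(flag_on_land(points, [island]))               # only the first point is on land
print(flag_on_land(points, [island], report=True))
```

A production check would also need geodesic handling near the antimeridian and a real coastline, which is why the notebook relies on `check_onland` instead.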

In [None]:
# Search OBIS for datasets with Prionace glauca (blue shark) records
query = dataset.search(scientificname='Prionace glauca')
data = query.execute()  # the results are also available as query.data
data

In [None]:
# Grab Prionace glauca (blue shark) occurrences from the OBIS web service
data = occurrences.search(scientificname='Prionace glauca').execute()
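`pyobis` wraps the OBIS API, whose base URL is https://api.obis.org/v3. As a rough sketch of what a single occurrence request looks like, the helpers below build the URL for one page of results and flatten a JSON response into a DataFrame. They assume the `occurrence` endpoint accepts `scientificname` and `size` query parameters and returns JSON with `total` and `results` keys; `occurrence_url` and `results_to_frame` are hypothetical helpers, and pyobis additionally handles paging through large result sets.

```python
from urllib.parse import urlencode

import pandas as pd

OBIS_API = "https://api.obis.org/v3"


def occurrence_url(scientificname, size=100):
    """Build the URL for one page of OBIS occurrence records (assumed parameters)."""
    query = urlencode({"scientificname": scientificname, "size": size})
    return f"{OBIS_API}/occurrence?{query}"


def results_to_frame(payload):
    """Flatten the assumed {'total': ..., 'results': [...]} JSON into a DataFrame."""
    return pd.DataFrame(payload.get("results", []))


url = occurrence_url("Prionace glauca", size=10)
print(url)  # spaces are encoded as '+' by urlencode
```

With network access, `results_to_frame(requests.get(url, timeout=30).json())` would return one page of records as a DataFrame.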

In [None]:
data.head(2)  # preview the first two occurrence records

In [None]:
# Flag records whose coordinates may fall on land, using the local simplified shoreline
on_land = check_onland(data, offline=True)  # in the original run, 235 records were flagged as potentially on land

In [None]:
on_land

In [None]:
# Percent of on-land entries.
on_land['on_land'].value_counts(normalize=True).mul(100).astype(str)+'%'
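If `check_onland` returns only the flagged records rather than every record, a percentage computed within that frame cannot describe the whole dataset; comparing row counts against the original data can. A minimal sketch with a hypothetical helper `percent_flagged`, assuming the flagged frame is a subset of the full data:

```python
import pandas as pd


def percent_flagged(flagged, data):
    """Percentage of the records in `data` that appear in `flagged`."""
    if len(data) == 0:
        return 0.0  # avoid dividing by zero on an empty dataset
    return 100.0 * len(flagged) / len(data)


all_records = pd.DataFrame({"id": range(12)})
flagged_records = all_records.iloc[:3]
print(f"{percent_flagged(flagged_records, all_records):.2f}%")  # prints 25.00%
```

In this notebook the call would be `percent_flagged(on_land, data)`.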

In [None]:
# Request an error report instead of the records (offline defaults to False, so this uses the OBIS web service)
check_onland(data, report=True)