# Florida Keys Taxonomic Analysis

### Importing modules

In [18]:
import geopandas as gpd
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib import rcParams
import seaborn as sns
import cartopy as crt
import geoplot as gplt
%matplotlib inline

In [19]:
import plotly.express as px
import plotly.graph_objects as go

We won't need `pyobis` for this notebook, since the attached CSV data source was already fetched with it.

In [20]:
df = pd.read_csv("../input/obis-florida-keys-occurrence-records-19972012/florida-keys-1997-2012.csv")

KeyboardInterrupt: 

## Notes before we proceed
> To be specific, these are notes for me to read before I proceed with the analysis. They help me stay focused on the work I'll do.

This is a **seascapes analysis** and not a **species analysis** in particular.

1. seascapes are an existing product, so we need not generate them <- ERDDAP server
2. depth change due to surface phenomenon?
3. did the spatial sampling change? did the sampling change over time?
4. characterize each seascape?
5. Hurlbert index, rarefaction curve. investigate richness, adjusting for sampling bias
6. seascapes -> surface parameters, may have meaning to depth
7. for things to be easier, focus on surface at the time
8. might add a biologist into the discussion
9. corals in area where certain seascapes are dominant <- related to surface phenomenon
10. ideal taxon, phytoplankton since attached to seascapes <- more or less productive based on seascape
11. one that moves and doesn't stay at the same position
12. investigate MoF records as well
13. may also look at GBIF
14. research questions
+ what species live in these different seascapes?
    + taxonomic distribution; whether a particular seascape suggests the occurrence of a given species
+ what are the biological traits of these species?
+ do they differ in the amount of biodiversity they support?

15. seascapes are made by satellite people, not biodiversity people, so this is a common research question.
16. how to adjust sampling bias?
+ ES50 metric; https://esajournals.onlinelibrary.wiley.com/doi/epdf/10.2307/1934145?saml_referrer might be helpful

17. potential data source for seascape data: [ERDDAP](http://erddap.com/#search=seascape)
+ 8-day is best <- temporal resolution
+ 1 km resolution <- spatial resolution
+ resolutions should be comparable with OBIS data
    + check if that uncertainty is within a seascape analysis
18. what we can do in the research
+ might exclude records if the uncertainty is huge; need not skip blanks
+ perform a correlation test
+ what does individualDensity mean?
    + sometimes, NERC (https://vocab.nerc.ac.uk/search_nvs/);
    + a deep rabbit hole to go down
+ individualCount v/s organismQuantity? 
    + biodata mobilisation (https://ioos.github.io/bio_mobilization_workshop/04-create-schema/index.html)
+ there could be different organismQuantity?
19. Reading Sources
+ Read here about OBIS: https://github.com/iobis/manual
+ MoF Viewer: https://mof.obis.org/, not well standardised
+ we can ask questions on Slack <- https://github.com/ioos/bio_data_guide
    + very active, and experts hang out there
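
The ES50 metric from item 16 is, as far as I understand it, Hurlbert's expected number of species in a random sample of 50 records. Below is a minimal sketch, assuming per-species record counts are already available. The function name `es_n` and the toy counts are hypothetical, and returning NaN when there are fewer than `n` records is a convention chosen here, not something taken from OBIS.

```python
from math import comb, nan

def es_n(counts, n=50):
    """Expected number of species in a random sample of n records (Hurlbert).

    counts: iterable of per-species record counts (hypothetical input).
    """
    counts = [c for c in counts if c > 0]
    total = sum(counts)
    if total < n:
        return nan  # assumption: treated as undefined with fewer than n records
    denom = comb(total, n)
    # Each species contributes the probability that it shows up at least once
    # in a sample of n records drawn without replacement.
    return sum(1 - comb(total - c, n) / denom for c in counts)

toy_counts = [120, 40, 25, 10, 3, 1, 1]  # illustrative numbers only
print(es_n(toy_counts, n=50))
```

Because the sample size is fixed, the value can be compared between seascapes that were sampled with very different effort, which is the point of using it to adjust for sampling bias.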

### Cleaning the data
Our data looks messy, so let's clean the dataset before we prepare the sunburst.

We'll remove/replace some NaNs and remove unwanted rows & columns.

In [None]:
df.drop("Unnamed: 0", axis=1, inplace=True)

In [None]:
df.head()

> **The hierarchy we are following** 
> 
> kingdom -> phylum -> class -> order -> family -> genus -> species

In [None]:
# when kingdom is NaN
df[df["kingdom"].isna()].T

Let us drop this.

In [None]:
df = df.drop(index=774841)
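
Dropping by a hard-coded index label only works for this exact file. A minimal sketch of the same step written as a condition, shown on a toy frame (the `toy` data is made up for illustration):

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    "kingdom": ["Animalia", np.nan, "Chromista"],
    "phylum": ["Annelida", np.nan, "Ochrophyta"],
})

# Keep only rows that have a kingdom, whatever their index label is.
cleaned = toy.dropna(subset=["kingdom"])
print(cleaned)
```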

Let us now look at the rows with an empty phylum.

In [None]:
df[df["phylum"].isna()]

In [None]:
# pandas consumes a lot of memory unnecessarily
import gc
gc.collect()

We call the garbage collector because pandas can hold on to memory it no longer needs; running this command freed a whopping 4 GB.
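
Besides calling the garbage collector, another way to lower memory use is to store repetitive text columns, such as taxonomic ranks, as categoricals. This is a sketch on made-up data; whether it helps on the real dataset is an assumption that would need checking.

```python
import pandas as pd

# Toy column with a few values repeated many times, like a taxonomic rank.
toy = pd.DataFrame({"kingdom": ["Animalia", "Plantae", "Chromista"] * 10_000})

before = toy.memory_usage(deep=True).sum()
toy["kingdom"] = toy["kingdom"].astype("category")
after = toy.memory_usage(deep=True).sum()
print(before, after)
```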

Let us replace all `NaN`s with `str(None)`. That should prevent the errors that would otherwise come up.

In [None]:
df[['']]
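
One possible way to do the replacement, restricted to the taxonomic columns used later for the sunburst, is sketched below on a toy frame. Limiting the fill to these columns is an assumption made here; the cell above is left as it was in the notebook.

```python
import numpy as np
import pandas as pd

ranks = ["kingdom", "phylum", "class", "order", "family", "genus", "species"]

# One made-up row with missing ranks, kept as object dtype.
toy = pd.DataFrame(
    [["Animalia", "Annelida", "Polychaeta", np.nan, "Aberrantidae", np.nan, np.nan]],
    columns=ranks,
    dtype=object,
)

# str(None) is the literal string "None"
toy[ranks] = toy[ranks].fillna(str(None))
print(toy.iloc[0].tolist())
```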

In [30]:
# again calling to avoid a crash on Kaggle
gc.collect()

23

## Taxonomic Distribution
Using plotly sunburst.

In [31]:
fig = px.sunburst(df,
    path=["kingdom", "phylum", "class", "order", "family", "genus","species"],
    width=750, height=750,
    title="Taxonomic Pie Chart",
)
fig.show()

ValueError: ('None entries cannot have not-None children', kingdom        Animalia
phylum         Annelida
class        Polychaeta
order               nan
family     Aberrantidae
genus               nan
species             nan
Name: 50001, dtype: object)

In [32]:
df.loc[50001:50001,:]

Unnamed: 0,infraphylum,country,date_year,habitat,references,scientificNameID,scientificName,dropped,gigaclassid,aphiaID,...,startDayOfYear,otherCatalogNumbers,footprintSRS,associatedSequences,locationRemarks,behavior,verbatimDepth,taxonConceptID,subtribeid,subtribe
50001,,,1997,,,urn:lsid:marinespecies.org:taxname:233984,Aberrantidae,False,,233984,...,,,,,,,,,,
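
The error arises because this row has a family (`Aberrantidae`) but no order: plotly refuses a missing parent with a non-missing child. Below is a sketch, on made-up rows, of how such gaps could be listed with pandas before plotting. The helper name `has_rank_gap` is hypothetical.

```python
import numpy as np
import pandas as pd

ranks = ["kingdom", "phylum", "class", "order", "family", "genus", "species"]

def has_rank_gap(frame, ranks):
    """True for rows where a rank is missing but some lower rank is filled."""
    missing = frame[ranks].isna()
    # From the first missing rank onwards, every rank counts as "below a gap".
    below_gap = missing.astype(int).cummax(axis=1).astype(bool)
    return (below_gap & ~missing).any(axis=1)

toy = pd.DataFrame(
    [
        # order is missing but family is present: a gap
        ["Animalia", "Annelida", "Polychaeta", np.nan, "Aberrantidae", np.nan, np.nan],
        # everything below class is missing: no gap
        ["Animalia", "Annelida", "Polychaeta", np.nan, np.nan, np.nan, np.nan],
    ],
    columns=ranks,
)
print(has_rank_gap(toy, ranks).tolist())
```

Rows flagged this way could then be inspected, dropped, or given a placeholder for the missing rank before calling `px.sunburst`.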
