# Overview
This notebook builds a model that predicts the concentration of microplastics from a latitude, longitude and timestamp. To do this, three models are created:
 - model 1: predicts the zonal (eastward) velocity component (u) at the given latitude and longitude
 - model 2: predicts the meridional (northward) velocity component (v) at the given latitude and longitude
 - model 3: predicts the concentration of microplastics from the given latitude, longitude, u, v and timestamp
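
The three models are chained: model 3 consumes the outputs of models 1 and 2. The wiring can be sketched as follows, assuming a toy 1-nearest-neighbour regressor as a stand-in for whichever estimator is eventually chosen. The class `NearestNeighbour`, the function `predict_concentration`, the numeric timestamp encoding and all values are hypothetical and only illustrate the data flow.

```python
from math import dist


class NearestNeighbour:
    """Toy stand-in regressor: returns the target of the closest training row."""

    def fit(self, X, y):
        self.X, self.y = list(X), list(y)
        return self

    def predict_one(self, x):
        # Index of the training row with the smallest Euclidean distance to x
        best = min(range(len(self.X)), key=lambda i: dist(self.X[i], x))
        return self.y[best]


def predict_concentration(model_u, model_v, model_mp, lat, lon, timestamp):
    """Chain the three models: (lat, lon) -> u and v -> concentration."""
    u = model_u.predict_one((lat, lon))
    v = model_v.predict_one((lat, lon))
    return model_mp.predict_one((lat, lon, u, v, timestamp))


# Made-up training rows (not real measurements)
coords = [(-37.0, 0.0), (37.0, 359.5)]
u_vals = [0.05, -0.01]
v_vals = [-0.12, -0.14]
times = [0.0, 0.0]  # hypothetical numeric encoding of the timestamp
mp_vals = [7.7e5, 2.4e7]

model_u = NearestNeighbour().fit(coords, u_vals)
model_v = NearestNeighbour().fit(coords, v_vals)
model_mp = NearestNeighbour().fit(
    [(lat, lon, u, v, t) for (lat, lon), u, v, t in zip(coords, u_vals, v_vals, times)],
    mp_vals,
)

estimate = predict_concentration(model_u, model_v, model_mp, -36.0, 1.0, 0.0)
```

Keeping the chaining in one function means the estimators behind models 1 to 3 can be swapped without changing how a prediction is requested.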

## Data Explanation
The data used in this notebook was obtained from [EARTHDATA](https://gpm.gsfc.nasa.gov/data/).
All of the data was provided as netCDF files.

### Ocean Currents
This dataset describes the ocean currents at the surface of the Earth, between 01/01/1993 and 01/01/2021.
- Link to the dataset: [Ocean Surface Current Analyses Real-time](https://search.earthdata.nasa.gov/search/granules?p=C2098858642-POCLOUD&pg[0][v]=f&pg[0][gsk]=-start_date&tl=1648086157.373!3!!&fs10=Ocean%20Currents&fsm0=Ocean%20Circulation&fst0=Oceans)
- Documentation: [Ocean Currents](https://podaac-tools.jpl.nasa.gov/drive/files/allData/oscar/L4/oscar_v2.0/docs/oscarv2guide.pdf)
- Download: [Ocean Currents](https://search.earthdata.nasa.gov/downloads/5030564349)

#### Ranges
The data covers -89.75° to 89.75° latitude and 0° to 359.75° longitude, for dates between 01/01/1993 and 01/01/2021.

#### Usage
We use the data from 2010 to 2021.


### Microplastics
This dataset describes the concentration of microplastics in the ocean.

- Link to the dataset: [CYGNSS L3 Ocean Microplastic Concentration V1.0](https://search.earthdata.nasa.gov/search/granules/collection-details?p=C2142677420-POCLOUD&pg[0][v]=f&pg[0][gsk]=-start_date&tl=1648079405.876!3!!&fsm0=Water%20Quality&fst0=Oceans)
- Documentation: [Microplastics Doc](https://podaac-tools.jpl.nasa.gov/drive/files/allData/cygnss/L3/docs/148-0402-1_L3_Microplsatic_ATBD_Released.pdf)
- Data Description: [Microplastics Data Description](https://podaac-tools.jpl.nasa.gov/drive/files/allData/cygnss/L3/docs/148-0401-2_L3_Microplastic_netCDF_Data_Dictionary.xlsx)
- Download: [Microplastics Data](https://search.earthdata.nasa.gov/downloads/9368584333)

#### Columns
- **MP_concentration(Microplastic Concentration):**
  Near-surface ocean microplastic number density geometrically averaged over the 1 x 1 degree cell centered on lat and lon and averaged over one month centered on Timestamp, as derived from anomalies in CYGNSS L2 MSS samples.

- **stdev_MP_samples(Geometric standard deviation of microplastic concentration samples within spatiotemporal bin):**
  The geometric standard deviation of the individual samples of microplastic concentration geometrically averaged together to produce the monthly 1x1 deg L3 gridded product.

- **num_MP_samples(Number of microplastic concentration samples within spatiotemporal bin):**
  The number of individual samples of microplastic concentration geometrically averaged together to produce the monthly 1x1 deg L3 gridded product.
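
The column descriptions above rely on geometric rather than arithmetic averaging. A minimal sketch of both statistics on made-up values, assuming the usual log-domain definitions and a population (divide by n) standard deviation; the exact convention used by the product is defined in its documentation, not here. The function names are hypothetical.

```python
import math


def geometric_mean(samples):
    """exp of the arithmetic mean of the logs; samples must be positive."""
    logs = [math.log(x) for x in samples]
    return math.exp(sum(logs) / len(logs))


def geometric_std(samples):
    """exp of the population standard deviation of the logs (an assumption)."""
    logs = [math.log(x) for x in samples]
    mean_log = sum(logs) / len(logs)
    variance = sum((v - mean_log) ** 2 for v in logs) / len(logs)
    return math.exp(math.sqrt(variance))


samples = [1e4, 1e5, 1e6]  # made-up concentrations spanning two orders of magnitude
gm = geometric_mean(samples)          # close to 1e5, the middle order of magnitude
gsd = geometric_std(samples)          # dimensionless multiplicative spread
am = sum(samples) / len(samples)      # arithmetic mean, pulled up by the largest value
```

For values spread over several orders of magnitude, as in the `MP_concentration` column, the geometric mean is much less dominated by the largest samples than the arithmetic mean.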

#### Ranges
The data covers -37° to 37° latitude and 0° to 359.75° longitude, for dates between 02/04/2017 and 25/09/2018.
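
Since longitude is indexed from 0° to 359.75°, a query point given in the -180° to 180° convention has to be shifted before lookup, and points outside the stated latitude and date ranges have no microplastics data. A small sketch of those checks; the helper names are hypothetical and the bounds are simply the ranges listed above.

```python
from datetime import date

MP_LAT_MIN, MP_LAT_MAX = -37.0, 37.0                       # latitude range stated above
MP_DATE_MIN, MP_DATE_MAX = date(2017, 4, 2), date(2018, 9, 25)  # date range stated above


def to_0_360(lon):
    """Map a longitude in degrees onto [0, 360), e.g. -10 -> 350."""
    return lon % 360.0


def in_microplastics_coverage(lat, day):
    """True when latitude and date both fall inside the stated coverage."""
    return MP_LAT_MIN <= lat <= MP_LAT_MAX and MP_DATE_MIN <= day <= MP_DATE_MAX
```

For example, `to_0_360(-10.0)` gives `350.0`, and `in_microplastics_coverage(40.0, date(2017, 6, 1))` is `False` because 40° lies outside the latitude band.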

#### Usage
We use all of the data.


In [1]:
import pandas as pd
import xarray as xr

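# Load one OSCAR surface-currents file and flatten it into a DataFrame
sea_surferce = xr.open_dataset('/Users/vitorhugo/Desktop/facu.nosync/PER_4/EAM/TPs/main/data/Sea_surface/oscar_currents_final_20170402.nc').to_dataframe()

# Load one CYGNSS microplastics file; decode_cf=False keeps the raw stored values
# (including the numeric time) instead of applying CF decoding
sea_microplastic = xr.open_dataset('/Users/vitorhugo/Desktop/facu.nosync/PER_4/EAM/TPs/main/data/Microplastics/cyg.ddmi.s20170402-000000-e20170430-000000.l3.grid-microplastic.a10.d10.nc',
                                  decode_cf=False ).to_dataframe()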

In [2]:
# Turn the index levels produced by to_dataframe() into ordinary columns
sea_microplastic.reset_index(inplace=True)
sea_surferce.reset_index(inplace=True)

In [3]:
# The raw time values are numeric; interpret them as seconds since the Unix epoch
sea_microplastic.time = pd.to_datetime(sea_microplastic.time, unit='s')

In [27]:
# Preview the rows without missing values (the result is displayed, not assigned)
sea_surferce.dropna()

Unnamed: 0,latitude,longitude,time,lat,lon,u,v,ug,vg
67023,46,783,2020-02-27,-78.25,195.75,-0.043907,0.003295,-0.040121,0.005221
67024,46,784,2020-02-27,-78.25,196.00,-0.044057,0.022361,-0.040522,0.024803
68426,47,746,2020-02-27,-78.00,186.50,-0.032439,-0.036792,-0.023662,-0.033061
68427,47,747,2020-02-27,-78.00,186.75,-0.034885,-0.026592,-0.026257,-0.022218
68428,47,748,2020-02-27,-78.00,187.00,-0.036149,-0.014383,-0.027586,-0.010049
...,...,...,...,...,...,...,...,...,...
983562,683,42,2020-02-27,81.00,10.50,-0.058170,0.010718,-0.021292,0.023257
983563,683,43,2020-02-27,81.00,10.75,-0.057173,0.014449,-0.020160,0.025626
983564,683,44,2020-02-27,81.00,11.00,-0.053890,0.018745,-0.019273,0.029720
983565,683,45,2020-02-27,81.00,11.25,-0.053145,0.016655,-0.017927,0.030910


In [5]:
# Reduce each timestamp to a 'YYYY-MM-DD' string, discarding the time of day
sea_surferce.time = sea_surferce.time.apply(lambda x: x.strftime('%Y-%m-%d'))

In [6]:
# Parse the time column back into datetime64 values so it can be merged on
sea_surferce.time = pd.to_datetime(sea_surferce.time)

In [10]:
# Inner-join both frames on date and grid position, then drop the integer grid-index columns
pd.merge(sea_surferce, sea_microplastic, on=['time', 'lat', 'lon']).drop(['latitude', 'longitude'], axis=1)

Unnamed: 0,time,lat,lon,u,v,ug,vg,MP_concentration,stdev_MP_samples,num_MP_samples
0,2017-04-02,-37.0,0.00,0.047021,-0.118582,0.048551,-0.156574,7.674302e+05,105.438815,533
1,2017-04-02,-37.0,0.25,0.025014,-0.101328,0.025305,-0.139278,8.058970e+05,141.690521,501
2,2017-04-02,-37.0,0.50,0.007516,-0.041009,0.007431,-0.079920,7.340737e+05,154.208858,488
3,2017-04-02,-37.0,0.75,-0.002129,0.059129,-0.001356,0.019361,5.379874e+05,389.446705,464
4,2017-04-02,-37.0,1.00,-0.005274,0.149390,-0.004920,0.111028,3.908790e+05,592.544665,429
...,...,...,...,...,...,...,...,...,...,...
427675,2017-04-02,37.0,358.75,-0.025552,-0.060860,-0.022493,-0.059385,1.162185e+08,258.126342,125
427676,2017-04-02,37.0,359.00,-0.005789,-0.103711,-0.004418,-0.103496,1.467700e+08,357.508697,155
427677,2017-04-02,37.0,359.25,-0.003856,-0.137441,-0.003264,-0.138371,8.054956e+07,410.582070,153
427678,2017-04-02,37.0,359.50,-0.011238,-0.135992,-0.011799,-0.135844,2.389635e+07,413.973273,141
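
The merge above is an inner join on exact `time`, `lat` and `lon` equality, so rows pair up only when the floating-point coordinates are identical in both frames. A hedged sketch, on made-up frames, of rounding the keys first and using `indicator=True` to check how many rows actually matched. Rounding to 2 decimals is an assumption based on the 0.25° grid spacing, and the frame names `currents` and `plastics` are hypothetical.

```python
import pandas as pd

# Made-up stand-ins for the two datasets (not real measurements)
currents = pd.DataFrame({
    "time": pd.to_datetime(["2017-04-02", "2017-04-02"]),
    "lat": [-37.0, -37.0],
    "lon": [0.0, 0.2500001],  # tiny float noise on the second key
    "u": [0.047, 0.025],
})
plastics = pd.DataFrame({
    "time": pd.to_datetime(["2017-04-02", "2017-04-02"]),
    "lat": [-37.0, -37.0],
    "lon": [0.0, 0.25],
    "MP_concentration": [7.67e5, 8.06e5],
})

keys = ["time", "lat", "lon"]
for frame in (currents, plastics):
    # Snap coordinates to the grid so float noise cannot break the join
    frame[["lat", "lon"]] = frame[["lat", "lon"]].round(2)

# An outer join with indicator=True labels each row as both / left_only / right_only
merged = pd.merge(currents, plastics, on=keys, how="outer", indicator=True)
matched = merged[merged["_merge"] == "both"].drop(columns="_merge")
unmatched = merged[merged["_merge"] != "both"]
```

Inspecting `unmatched` before switching back to an inner join shows whether rows are being silently dropped.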


In [None]:
# lat, long, date -> [model 1] = u
# lat, long, date -> [model 2] = v
# lat, long, date, u, v -> [model 3] = microplastic

# Dataset 1 - Ocean Surface Currents:
# https://search.earthdata.nasa.gov/search/granules/collection-details?p=C2098858642-POCLOUD&pg[0][v]=f&pg[0][gsk]=-start_date&tl=1647902537.793!3!!&fs10=Ocean%20Currents&fsm0=Ocean%20Circulation&fst0=Oceans&m=-21.297179625649704!-124.31249999999999!0!1!0!0%2C2

# Train: 2016 - 2020
# Validation: 2021


# date -> season -> season, lat, long -> [model]
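
The notes above mention deriving a season from the date and splitting by year (train on 2016 to 2020, validate on 2021). A minimal sketch of both steps, assuming meteorological seasons (Dec to Feb is winter in the northern hemisphere) with an optional flip for the southern hemisphere. The notes do not say which season definition is intended, and the function names are hypothetical.

```python
from datetime import date


def season_of(day, southern=False):
    """Meteorological season from the month; flipped for the southern hemisphere."""
    names = ["winter", "spring", "summer", "autumn"]
    # Dec-Feb -> 0, Mar-May -> 1, Jun-Aug -> 2, Sep-Nov -> 3
    index = (day.month % 12) // 3
    if southern:
        index = (index + 2) % 4
    return names[index]


def split_by_year(days, train_years=range(2016, 2021), validation_years=(2021,)):
    """Partition dates into train and validation lists by calendar year."""
    train = [d for d in days if d.year in train_years]
    validation = [d for d in days if d.year in validation_years]
    return train, validation


days = [date(2016, 1, 15), date(2020, 12, 31), date(2021, 7, 4)]
train, validation = split_by_year(days)
```

Splitting on calendar year keeps every validation sample later in time than the training samples, which avoids leaking future observations into training.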