<center>
<table>
  <tr>
    <td><img src="https://portal.nccs.nasa.gov/datashare/astg/training/python/logos/nasa-logo.svg" width="100"/> </td>
     <td><img src="https://portal.nccs.nasa.gov/datashare/astg/training/python/logos/ASTG_logo.png?raw=true" width="80"/> </td>
     <td> <img src="https://www.nccs.nasa.gov/sites/default/files/NCCS_Logo_0.png" width="130"/> </td>
    </tr>
</table>
</center>

        
<center>
<h1><font color= "blue" size="+3">ASTG Python Courses</font></h1>
</center>

---

<center><h1><font color="red" size="+3">Introduction to MovingPandas</font></h1></center>

## Reference Documents

* [MovingPandas](https://github.com/movingpandas/movingpandas)
* [MovingPandas - Examples](https://movingpandas.org/examples)
* [BYU/NIC iceberg database](https://movingpandas.github.io/movingpandas-website/2-analysis-examples/iceberg.html)
* [Tutorial 3 - Trajectory data mining in Python](https://sustainability-gis.readthedocs.io/en/latest/lessons/L3/mobility-analytics.html)

_______

# <font color="red"> Objectives</font>


# <font color="red"> What is MovingPandas? </font>
![fig_logo](https://movingpandas.github.io/movingpandas/assets/img/logo-wide.svg)

- A Python library for handling the movement of geospatial objects.
- Provides trajectory data structures and functions for movement data exploration and analysis.
- It is based on Pandas, GeoPandas, and HoloViz.

# <font color="red"> Features of MovingPandas</font>

- Create trajectories from diverse sources, including CSV files, GIS file formats, (Geo)DataFrames, and OGC Moving Features JSONs (MF-JSON)
- Find locations for given time stamps and time spans
- Compute movement speed, direction, and sampling intervals
- Detect and extract stops
- Split trajectories into individual trips
- Clean, simplify, generalize, and aggregate trajectories
- Create static and interactive visualizations

![fig_sample](https://user-images.githubusercontent.com/590385/137953765-33f9ce1b-037c-4c86-82b2-0620de5ca28f.gif)

---

## Required Packages


- __Matplotlib__: for basic plots.
- __Pandas__: Manipulation and exploratory data analysis of tabular data.
- __Shapely__: For manipulation and analysis of planar geometric objects
- __GeoPandas__: Combines the capabilities of Pandas and Shapely for geospatial operations
- __MovingPandas__: Handling the movement of geospatial objects.

----

In [None]:
import warnings
warnings.filterwarnings("ignore")

In [None]:
import datetime as dt
from pathlib import Path
import requests

In [None]:
%matplotlib inline
import matplotlib
import matplotlib.pyplot as plt
import matplotlib.ticker as mticker

In [None]:
import numpy as np
import h5py
import pandas as pd
import geopandas as gpd

In [None]:
from shapely import geometry as shpgeom
from shapely import wkt as shpwkt

In [None]:
import movingpandas as mpd

In [None]:
import holoviews as hv

In [None]:
import hvplot.pandas 

In [None]:
plot_defaults = {'linewidth':5, 'capstyle':'round', 'figsize':(9,3), 'legend':True}
hv.opts.defaults(hv.opts.Overlay(active_tools=['wheel_zoom'], 
                              frame_width=500, frame_height=400))
hvplot_defaults = {'tiles':None, 'cmap':'Viridis', 'colorbar':True}

In [None]:
mpd.show_versions()

In [None]:
def download_remote_file(url: str):
    """
    Download remote binary file.
    
    Parameters
    ----------
    url : str
       HTTP address of the file we want to download
    """
    filename = Path(url).name # get the file name
    resp = requests.get(url, stream=True)
    if resp.status_code == 200:
        print(f"Downloading remote file: {filename}")
        with open(filename, 'wb') as fid:
            fid.write(resp.content)
        print("Done downloading...")
    else:
        print("URL does not exist")

# <font color="red"> Understanding Trajectory</font>

A trajectory:
- A time-ordered series of geometries.
   - The geometries and associated attributes are stored in a GeoPandas GeoDataFrame.
- __Can be seen as a sequence of points that specify the position of a moving object in space and time__.
- Can have a parent trajectory and can itself be the parent of successive trajectories.
- Can represent its data either as point-based or as line-based. 
- A segment is a part of the trajectory that contains a list of episodes. 
    - Each episode has a starting and ending timestamp, a segmentation criterion (annotation type), and an episode annotation. 
    - For instance, an annotation type can be the “weather conditions”, and an episode annotation can be “a storm”, “heavy rain”, “extremely high waves”, etc.
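
The definition above can be sketched in plain Python, independent of MovingPandas. The names `Fix`, `make_trajectory`, and `duration` are hypothetical and exist only for this illustration; a real `Trajectory` stores its data in a GeoDataFrame instead.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Fix:
    """One observation: the position (x, y) of the object at time t."""
    t: datetime
    x: float
    y: float


def make_trajectory(fixes):
    """Return the fixes sorted by timestamp (a trajectory is time-ordered)."""
    return sorted(fixes, key=lambda f: f.t)


def duration(trajectory):
    """Time elapsed between the first and the last fix."""
    return trajectory[-1].t - trajectory[0].t


# Observations given out of order on purpose
fixes = [
    Fix(datetime(2023, 10, 1, 9, 6), 50.0, 0.0),
    Fix(datetime(2023, 10, 1, 9, 0), 0.0, 0.0),
    Fix(datetime(2023, 10, 1, 9, 15), 75.0, 75.0),
    Fix(datetime(2023, 10, 1, 9, 10), 50.0, 50.0),
]
traj = make_trajectory(fixes)
print(traj[0], traj[-1])
print(duration(traj) / timedelta(minutes=1), "minutes")
```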

### Methods for Trajectory

| Method | Return type or effect |
| ---- | ---- |
| `set_crs(crs)` | |
| `has_parent()` | boolean |
| `is_latlon()` | boolean |
| `to_linestring()` | shapely.geometry.LineString |
| `to_linestringm_wkt()` | string |
| `get_start_location()` | shapely.geometry.Point |
| `get_end_location()` | shapely.geometry.Point |
| `get_bbox()` | (minx, miny, maxx, maxy) tuple |
| `get_start_time()` | datetime |
| `get_end_time()` | datetime |
| `get_duration()` | timedelta |
| `get_length()` | float |
| `get_direction()` | float |
| `get_row_at(timestamp, method='nearest')` | pandas.Series |
| `get_position_at(timestamp, method='nearest')` | shapely.geometry.Point |
| `interpolate_position_at(timestamp)` | shapely.geometry.Point |
| `get_linestring_between(timestamp1, timestamp2)` | shapely.geometry.LineString |
| `get_segment_between(timestamp1, timestamp2)` | Trajectory |
| `add_direction()` | Add a new direction column in the trajectory's GeoDataFrame |
| `add_distance()` | Add a new distance column in the trajectory's GeoDataFrame |
| `add_speed()` | Add a new speed column in the trajectory's GeoDataFrame|
| `add_acceleration()` | Add a new acceleration column in the trajectory's GeoDataFrame |
| `make_line(df)` | shapely.geometry.LineString |
| `clip(shapely.geometry.polygon)` | Trajectory |
| `intersection(fiona.feature)` | Trajectory |

# <font color="red"> First Example: Create a simple trajectory</font>

- Trajectory objects consist of a trajectory ID and a GeoPandas GeoDataFrame with a DatetimeIndex. 
- The DataFrame represents the trajectory data as a Pandas time series with associated point locations.

#### Create a time series Pandas DataFrame of Shapely Point objects.
 - We represent a moving object at locations (`positions` list) and at the dates/times (`dates` list).

In [None]:
latitudes = [0, 0, 50, 75]
longitudes = [0, 50, 50, 75]
positions = [shpgeom.Point(x,y) for x, y in zip(longitudes, latitudes)]

# Timestamps at 09:00, 09:06, 09:10 and 09:15 on 2023-10-01
dates = [dt.datetime(2023,10,1,9,minute,0) for minute in [0, 6, 10, 15]]

In [None]:
df = pd.DataFrame({'geometry': positions, 't': dates})
df

Set the `t` column as index:

In [None]:
df = df.set_index('t')
df

#### Convert the Pandas DataFrame into a GeoPandas GeoDataFrame

In [None]:
gdf = gpd.GeoDataFrame(df)#, crs=31256)
gdf

What is the CRS?

In [None]:
gdf.crs is None

#### Convert the GeoPandas GeoDataFrame into a MovingPandas trajectory


```python
Trajectory(df, traj_id, 
           obj_id=None, t=None, x=None, y=None, 
           crs='epsg:4326', parent=None)
```

- __df__ (GeoDataFrame or DataFrame) – GeoDataFrame with point geometry column and timestamp index
- __traj_id__ (any) – Trajectory ID
- __obj_id__ (any) – Moving object ID
- __t__ (string) – Name of the DataFrame column containing the timestamp
- __x__ (string) – Name of the DataFrame column containing the x coordinate
- __y__ (string) – Name of the DataFrame column containing the y coordinate
- __crs__ (string) – CRS of the x/y coordinates
- __parent__ (Trajectory) – Parent trajectory

In [None]:
simple_traj = mpd.Trajectory(gdf, traj_id=1)

In [None]:
print(simple_traj)

#### Get the trajectory GeoDataFrame

In [None]:
simple_traj.df

Plot the trajectory GeoDataFrame:

In [None]:
simple_traj.df.plot();

#### Convert the trajectory into GeoPandas GeoDataFrame of Shapely Line objects

In [None]:
lines = simple_traj.to_line_gdf()
lines

In [None]:
lines.plot();

In [None]:
lines_wkt = simple_traj.to_traj_gdf(wkt=True)
lines_wkt

In [None]:
lines_wkt.plot()

#### Compute the sampling interval (median time difference between records):

In [None]:
simple_traj.get_sampling_interval()
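
As a cross-check, the median time difference for the four timestamps used above can be computed by hand with the standard library. This is only a sketch of the definition in the heading, not the library's implementation.

```python
from datetime import datetime, timedelta
from statistics import median

# Same timestamps as in the simple trajectory
timestamps = [datetime(2023, 10, 1, 9, m) for m in (0, 6, 10, 15)]

# Time differences between consecutive records: 6, 4 and 5 minutes
gaps = [later - earlier for earlier, later in zip(timestamps, timestamps[1:])]

sampling_interval = median(gaps)
print(sampling_interval / timedelta(minutes=1), "minutes")
```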

#### Quick plot using the trajectory object

In [None]:
simple_traj.plot();

## <font color="blue">Extracting information from the Trajectory</font>

In [None]:
simple_traj.get_crs() is None

Start/End Locations

In [None]:
str(simple_traj.get_start_location())

In [None]:
str(simple_traj.get_end_location())

In [None]:
simple_traj.get_length()

Start/End Times

In [None]:
simple_traj.get_start_time()

In [None]:
simple_traj.get_end_time()

In [None]:
simple_traj.get_duration()

## <font color="blue">Processing the Trajectory</font>

- We compute the distance, speed, and acceleration of movement along the trajectory (between consecutive points). 
- The parameters are added as new columns.

#### Distance
- Add `distance` column and values to the trajectory’s DataFrame.
    - Compute the distance to each point from the previous.
- Use the `add_distance` function.

```python
add_distance(overwrite=False, name='distance', units=None)
```
- The default distance units are meters (or CRS units, if the CRS units are not known or specified), and the default time units are seconds.
- A few units for distance: `"dm"`, `"cm"`, `"m"` (default), `"ft"`, `"yd"`, `"km"`, `"mi"`.

In [None]:
simple_traj.add_distance()

In [None]:
simple_traj.df

You can rename the column and specify the units:

In [None]:
simple_traj.add_distance(overwrite=True, name="distance (cm)", units="cm")
simple_traj.df
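
The idea of "distance to each point from the previous" can be sketched without MovingPandas. The block below shows two variants for the same four points: a planar distance in coordinate units (what one would expect when no geographic CRS is set) and a great-circle distance in meters. The haversine formula and the spherical Earth radius are assumptions used as a stand-in; the library's own geodesic computation may give slightly different numbers.

```python
from math import asin, cos, hypot, radians, sin, sqrt

EARTH_RADIUS_M = 6_371_000  # mean Earth radius, assuming a spherical Earth


def euclidean(p, q):
    """Planar distance between two (x, y) points, in coordinate units."""
    return hypot(q[0] - p[0], q[1] - p[1])


def haversine_m(p, q):
    """Great-circle distance in meters between two (lon, lat) points in degrees."""
    lon1, lat1, lon2, lat2 = map(radians, (p[0], p[1], q[0], q[1]))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))


points = [(0, 0), (50, 0), (50, 50), (75, 75)]  # (longitude, latitude)

planar = [euclidean(p, q) for p, q in zip(points, points[1:])]
great_circle_km = [haversine_m(p, q) / 1000 for p, q in zip(points, points[1:])]

print("planar (degrees):", planar)
print("great circle (km):", great_circle_km)
```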

#### Speed
- Add `speed` column and values to the trajectory’s DataFrame.
    - Compute the speed to each point from the previous
- Use the `add_speed` function:

```python
add_speed(overwrite=False, name='speed', 
          units=UNITS(distance=None, time=None, 
                      time2=None, crs=None))
```
- Units for time: `"s"`, `"min"`, `"h"`, `"d"`, `"a"` (year).

In [None]:
simple_traj.add_speed(overwrite=True, 
                      name="speed (km/s)", units=("km", "s"))

simple_traj.df
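
Speed is the distance covered on a segment divided by the time it took. A minimal sketch in plain Python, assuming planar coordinates and a hypothetical helper `speeds` whose `time_unit_s` argument plays the role of the time unit:

```python
from datetime import datetime
from math import hypot

# (timestamp, x, y) for the four points of the simple trajectory
fixes = [
    (datetime(2023, 10, 1, 9, 0), 0, 0),
    (datetime(2023, 10, 1, 9, 6), 50, 0),
    (datetime(2023, 10, 1, 9, 10), 50, 50),
    (datetime(2023, 10, 1, 9, 15), 75, 75),
]


def speeds(fixes, time_unit_s=1.0):
    """Speed on each segment, in coordinate units per chosen time unit."""
    out = []
    for (t0, x0, y0), (t1, x1, y1) in zip(fixes, fixes[1:]):
        dist = hypot(x1 - x0, y1 - y0)
        elapsed = (t1 - t0).total_seconds() / time_unit_s
        out.append(dist / elapsed)
    return out


per_second = speeds(fixes)                  # units per second
per_hour = speeds(fixes, time_unit_s=3600)  # units per hour
print(per_second)
print(per_hour)
```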

#### Acceleration
- Add acceleration column and values to the trajectory’s DataFrame.
    - Compute the acceleration to each point from the previous.
- Use the `add_acceleration` function

```python
add_acceleration(overwrite=False, name='acceleration', 
                 units=UNITS(distance=None, time=None, 
                             time2=None, crs=None))
```

In [None]:
simple_traj.add_acceleration(overwrite=True, 
                             name="acceleration (mph/s)", 
                             units=("mi", "h", "s"))

simple_traj.df
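
Acceleration is the change in speed divided by the elapsed time, so a unit such as mph/s mixes two time units: one for the speed and one for the rate of change. The sketch below uses made-up speed samples and a hypothetical helper `accelerations` to show the arithmetic.

```python
METERS_PER_MILE = 1609.344
MS_TO_MPH = 3600 / METERS_PER_MILE  # converts m/s to miles per hour


def accelerations(times_s, speeds):
    """Change in speed between consecutive samples divided by elapsed seconds."""
    pairs = list(zip(times_s, speeds))
    return [(v1 - v0) / (t1 - t0) for (t0, v0), (t1, v1) in zip(pairs, pairs[1:])]


times_s = [0, 5, 10]          # seconds (illustrative values)
speeds_ms = [0.0, 10.0, 25.0]  # meters per second (illustrative values)

acc_ms2 = accelerations(times_s, speeds_ms)
acc_mph_per_s = accelerations(times_s, [v * MS_TO_MPH for v in speeds_ms])
print(acc_ms2)        # m/s per second
print(acc_mph_per_s)  # mph per second
```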

### Visualization

We can turn the trajectory into a linestring:

In [None]:
simple_traj.to_linestring()

We can directly call the `plot()` function that draws each line segment individually.

In [None]:
simple_traj.plot()

We can visualize the speed values where each line segment is colored:

In [None]:
simple_traj.plot(column="speed (km/s)", linewidth=5, 
                 capstyle='round', legend=True)

Create an interactive plot with `hvplot`:

In [None]:
simple_traj.hvplot()

We can select the background map by setting the `tiles` parameter to one of the following options:

   `'CartoDark'`, `'CartoEco'`, `'CartoLight'`, `'CartoMidnight'`,
   `'EsriImagery'`, `'EsriNatGeo'`, `'EsriReference'`, `'EsriTerrain'`,
   `'EsriUSATopo'`, `'OSM'`, `'StamenLabels'`, `'StamenTerrain'`,
   `'StamenTerrainRetina'`, `'StamenToner'`, `'StamenTonerBackground'`,
   `'StamenWatercolor'`, `'Wikipedia'` (default)



In [None]:
simple_traj.hvplot(tiles="EsriTerrain") #"StamenTerrain")

### Time interpolating the position of the object

- Use the function `get_position_at()`:

We need to make sure that the date falls between existing dates.

In [None]:
date = dt.datetime(2023,10,1,9,4,45)

In [None]:
pos = simple_traj.get_position_at(date, method="nearest")  
print(pos)

In [None]:
pos = simple_traj.get_position_at(date, method="interpolated")  
print(pos)

In [None]:
pos = simple_traj.get_position_at(date, method="ffill") # from the previous row
print(pos)

pos = simple_traj.get_position_at(date, method="bfill") # from the following row
print(pos)
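
The four methods can be illustrated with a small plain-Python function. This is a sketch of the general idea only (linear interpolation in time between the two neighboring records); `position_at` is a hypothetical helper and not the MovingPandas implementation.

```python
from bisect import bisect_left
from datetime import datetime

# (timestamp, x, y) for the four points of the simple trajectory
fixes = [
    (datetime(2023, 10, 1, 9, 0), 0, 0),
    (datetime(2023, 10, 1, 9, 6), 50, 0),
    (datetime(2023, 10, 1, 9, 10), 50, 50),
    (datetime(2023, 10, 1, 9, 15), 75, 75),
]


def position_at(fixes, when, method="interpolated"):
    """Position at time `when`, using one of four lookup methods."""
    times = [t for t, _, _ in fixes]
    if not times[0] <= when <= times[-1]:
        raise ValueError(f"{when} is outside the trajectory time range")
    i = bisect_left(times, when)  # index of the first record at or after `when`
    if times[i] == when:
        return fixes[i][1:]
    (t0, x0, y0), (t1, x1, y1) = fixes[i - 1], fixes[i]
    if method == "ffill":      # previous record
        return (x0, y0)
    if method == "bfill":      # following record
        return (x1, y1)
    if method == "nearest":    # record closest in time
        return (x0, y0) if when - t0 <= t1 - when else (x1, y1)
    if method == "interpolated":
        frac = (when - t0) / (t1 - t0)  # share of the segment already travelled
        return (x0 + frac * (x1 - x0), y0 + frac * (y1 - y0))
    raise ValueError(f"unknown method: {method}")


when = datetime(2023, 10, 1, 9, 4, 45)
for method in ("nearest", "interpolated", "ffill", "bfill"):
    print(method, position_at(fixes, when, method))
```

At 09:04:45 the object is 285 s into a 360 s segment, so the interpolated x coordinate is 50 × 285 / 360.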

#### If the timestamp falls outside the time range between trajectory start and end time, we get an error:

In [None]:
date = dt.datetime(2023,10,1,11,4,45)
try: 
    pos = simple_traj.get_position_at(date)
except ValueError as e:
    print(f"ValueError: {e}")

# <font color="red"> Applications</font>


## <font color="blue">Tracking the International Space Station (ISS) </font>

- The ISS orbits the Earth about every 90 minutes at a speed of about five miles per second.
- The webpage [http://api.open-notify.org/iss-now.json](http://api.open-notify.org/iss-now.json) provides the time and the position of the ISS as a JSON object:
```python
{"timestamp": 1689905399, "iss_position": {"latitude": "-16.3200", "longitude": "176.0359"}, "message": "success"}
```
- We access the webpage every 5 seconds (for several hours) to collect data on the trajectory of ISS.
- We create a CSV file containing the time and position of ISS over several hours.
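
A single response can be turned into a time-stamped position as sketched below. The block parses the sample JSON shown above instead of calling the web service; `parse_iss_record` is a hypothetical helper, and the WKT string is just one possible way to store the point in a CSV file.

```python
import json
from datetime import datetime, timezone

# Sample response copied from the text above (no network access needed)
payload = (
    '{"timestamp": 1689905399, '
    '"iss_position": {"latitude": "-16.3200", "longitude": "176.0359"}, '
    '"message": "success"}'
)


def parse_iss_record(text):
    """Return (UTC datetime, latitude, longitude) from one JSON response."""
    record = json.loads(text)
    if record.get("message") != "success":
        raise ValueError("unexpected response")
    when = datetime.fromtimestamp(record["timestamp"], tz=timezone.utc)
    position = record["iss_position"]
    # The service returns the coordinates as strings
    return when, float(position["latitude"]), float(position["longitude"])


when, lat, lon = parse_iss_record(payload)
wkt_point = f"POINT ({lon} {lat})"  # WKT uses (x y) = (longitude latitude)
print(when.isoformat(), wkt_point)
```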

In [None]:
url = "https://portal.nccs.nasa.gov/datashare/astg/training/python/geopandas/ISS_timeseries_path.csv"
df_iss = pd.read_csv(url)
df_iss                     

Convert the `geometry` column into a Shapely geometry:

In [None]:
df_iss['geometry'] = df_iss['geometry'].apply(shpwkt.loads)

Convert the timestamp into a datetime object:

In [None]:
df_iss['timestamp'] = pd.to_datetime(df_iss['timestamp'], unit='s')
df_iss

Rename the columns and set the column `t` to be index:

In [None]:
df_iss.columns = ['latitude', 'longitude', 't', 'geometry']
df_iss = df_iss.set_index('t')
df_iss

Create the GeoPandas GeoDataFrame:

In [None]:
gdf_iss = gpd.GeoDataFrame(df_iss, crs="EPSG:4326")

Create the trajectory:

In [None]:
iss_traj = mpd.Trajectory(gdf_iss, 1)

In [None]:
print(iss_traj)

Plot the trajectory:

In [None]:
iss_traj.plot(legend=True, figsize=(9,5))

- We only use one trajectory for the entire dataset.
- There are mapping issues when the antimeridian (±180° longitude) is crossed.

#### Create a collection of MovingPandas trajectories

Use the `TrajectoryCollection` class:


```python
TrajectoryCollection(data, traj_id_col=None, obj_id_col=None, 
                     t=None, x=None, y=None, 
                     crs='epsg:4326', min_length=0, 
                     min_duration=None)
```

- __data__ (list[Trajectory] or GeoDataFrame or DataFrame) – List of Trajectory objects or a GeoDataFrame with trajectory IDs, point geometry column and timestamp index
- **traj_id_col** (string) – Name of the GeoDataFrame column containing trajectory IDs
- **obj_id_col** (string) – Name of the GeoDataFrame column containing moving object IDs
- __t__ (string) – Name of the DataFrame column containing the timestamp
- __x__ (string) – Name of the DataFrame column containing the x coordinate
- __y__ (string) – Name of the DataFrame column containing the y coordinate
- __crs__ (string) – CRS of the x/y coordinates
- **min_length** (numeric) – Desired minimum length of trajectories. (Shorter trajectories are discarded.)
- **min_duration** (timedelta) – Desired minimum duration of trajectories. (Shorter trajectories are discarded.)

Add a trajectory id:

In [None]:
traj_id = np.zeros_like(df_iss["longitude"].values, int)

count = 0
prev_long = df_iss['longitude'].values[0]

for i, long in enumerate(df_iss['longitude'].values[1:], start=1):
    if (-179.9999 <= long <= -179.00) and (179.000 <= prev_long <= 179.99999):
        count += 1
    prev_long = long
    traj_id[i] = count

df_iss['traj_id'] = traj_id
df_iss
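
An alternative to the fixed longitude windows used above is to start a new segment whenever two consecutive longitudes differ by more than 180°. This is a different test from the one in the cell above (it reacts to any large jump, in either direction), shown here as a plain-Python sketch with a hypothetical helper `assign_segment_ids` and made-up longitudes.

```python
def assign_segment_ids(longitudes, jump_threshold=180.0):
    """Give each sample a segment id that increases at every large longitude jump."""
    ids = [0] * len(longitudes)
    current = 0
    for i in range(1, len(longitudes)):
        # A jump of more than 180 degrees means the track wrapped around
        if abs(longitudes[i] - longitudes[i - 1]) > jump_threshold:
            current += 1
        ids[i] = current
    return ids


# Illustrative eastward track that wraps around twice
lons = [170.0, 175.0, 179.5, -179.5, -170.0, -90.0, 0.0, 90.0, 179.0, -179.0]
print(assign_segment_ids(lons))
```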

In [None]:
df_iss.traj_id.unique()

In [None]:
iss_trajc = mpd.TrajectoryCollection(df_iss, 
                                 x = "longitude", y="latitude",
                                 traj_id_col="traj_id", t="t")

In [None]:
iss_trajc

In [None]:
iss_trajc.plot();

In [None]:
ntrajs = len(iss_trajc.trajectories)
colors = ["red", "blue", "green", "purple"]

fig, ax = plt.subplots(figsize=(20,15))

for i in range(ntrajs):
    iss_trajc.trajectories[i].plot(ax=ax, 
                                   color=colors[i % len(colors)], lw=3,
                                   label=f"Trajectory: {i}")
plt.legend();

#### Select one trajectory

In [None]:
my_traj = iss_trajc.trajectories[1]

In [None]:
my_traj.df

In [None]:
my_traj.plot(color="blue")

#### Use `hvplot` to have an interactive map

In [None]:
my_traj.hvplot(color="blue")

## <font color="blue">Measurement of `NO2` by the Ozone Monitoring Instrument (OMI)</font>

- [The Ozone Monitoring Instrument (OMI)](https://www.earthdata.nasa.gov/learn/find-data/near-real-time/omi) aboard NASA's Aura satellite (launched in 2004) measures ozone from Earth's surface to top-of-atmosphere. 
  - OMI is a nadir-viewing wide-field-imaging spectrometer, giving daily global coverage.
  - OMI measures the key air quality components such as nitrogen dioxide (NO$_2$), sulfur dioxide (SO$_2$), bromine oxide (BrO), OClO, and aerosol characteristics.
  - OMI provides mapping of pollution products from an urban to super-regional scale.
- Near real-time (NRT) OMI data are available through LANCE generally within three hours after a satellite observation.

Here we focus on the [Nitrogen Dioxide (NO2) Total and Tropospheric Column](https://disc.gsfc.nasa.gov/datasets/OMNO2_003/summary) 1-orbit L2 Swath.

### Create the Pandas DataFrame

In [None]:
import ast

def convert_dict_dtype(sample_dict):
    '''
    Converts attribute dictionary from Numpy data types 
    to general Python data types

    Parameters
    ----------
    sample_dict : dict
         A dictionary of attributes
         
    Returns
    -------
    sample_dict : dict
         A dictionary of attributes
    '''
    for key, item in sample_dict.items():
        if isinstance(item, np.ndarray):   # Converts np arrays to a list and, if it has one element, to a scalar
            item = list(item)
        
            if len(item) == 1:
                item = item[0]
        elif isinstance(item, np.bytes_):   # Converts np bytes to a Python string
            item = str(item.astype('str'))
        
            if item[:1] in ('(', '{'):   # Converts to tuple or dict if applicable
                try:
                    # literal_eval only parses Python literals, unlike eval()
                    item = ast.literal_eval(item)
                except (ValueError, SyntaxError):
                    pass   # Not a valid literal: keep the string
            
        sample_dict[key] = item   # Updates any changes to the key value
        
    return sample_dict

In [None]:
def get_ds_attrs(ds):
    """
       Given a dataset identifier, return the dataset attributes.
       
       Input Parameters:
          - ds: dataset identifier
       Returned value:
          - ds_attrs: a dictionary
    """
    ds_attrs = dict(ds.attrs)
    ds_attrs = convert_dict_dtype(ds_attrs)
    
    return ds_attrs

In [None]:
def get_ds_attribute_value(ds_attrs, attr_name):
    '''
    Obtain the value of a specified attribute in a dataset.
    
    Parameters
    ----------
    ds_attrs : dict
         A dictionary of dataset attributes
    attr_name : str
         Attribute name    
    
    Returns
    -------
    value: float, int, str, list
         Value of the attribute. If attribute not available, None.
    '''
    # dict.get returns None when the key is missing
    return ds_attrs.get(attr_name)

In [None]:
def restore_data(ds):
    '''
    Restore the dataset data using the dataset attributes.
      
    Parameters
    ----------
    ds : h5py dataset identifier
    
    Returns:
    data : numpy array
    '''
    ds_attrs = get_ds_attrs(ds)
    
    _FillValue = get_ds_attribute_value(ds_attrs, '_FillValue')
    scale_factor = get_ds_attribute_value(ds_attrs, 'scale_factor')
    add_offset = get_ds_attribute_value(ds_attrs, 'add_offset')
    
    data = ds[()]#.astype('float')
    
    data = np.where(data != _FillValue, data, np.nan)
    if add_offset:
        data -= add_offset
    if scale_factor:
        data *= scale_factor

    return data
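
The same unpacking steps can be followed on a tiny example in plain Python. The sketch mirrors the order used in `restore_data` (mask the fill value, subtract the offset, then multiply by the scale factor); the sample numbers and the helper name `unpack` are made up. Other conventions exist, for example netCDF CF packing computes `raw * scale_factor + add_offset`, so the order should be checked against the data product's documentation.

```python
from math import isnan, nan


def unpack(raw, fill_value=None, add_offset=None, scale_factor=None):
    """Replace fill values by NaN, then apply (value - offset) * scale."""
    out = []
    for value in raw:
        if fill_value is not None and value == fill_value:
            out.append(nan)  # missing measurement
            continue
        value = float(value)
        if add_offset:
            value -= add_offset
        if scale_factor:
            value *= scale_factor
        out.append(value)
    return out


raw = [120, -32767, 250]  # illustrative stored integers
values = unpack(raw, fill_value=-32767, add_offset=100, scale_factor=0.5)
print(values)
```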

In [None]:
def get_arrays(fname):
    with h5py.File(fname, 'r') as fid:
        geo_grp = fid['HDFEOS']['SWATHS']['ColumnAmountNO2']['Geolocation Fields']
        data_grp = fid['HDFEOS']['SWATHS']['ColumnAmountNO2']['Data Fields']
        NO2 = restore_data(data_grp['ColumnAmountNO2Trop'])[:,0]
        time = geo_grp['Time'][()]
        lats = geo_grp['SpacecraftLatitude'][()]
        lons = geo_grp['SpacecraftLongitude'][()]
    return NO2, time, lats, lons

In [None]:
list_files =[
    "OMI-Aura_L2-OMNO2_2023m0709t0114-o100959_v003-2023m0710t052026.he5",
    "OMI-Aura_L2-OMNO2_2023m0709t0253-o100960_v003-2023m0710t052055.he5",
    "OMI-Aura_L2-OMNO2_2023m0709t0432-o100961_v003-2023m0710t060000.he5",
    "OMI-Aura_L2-OMNO2_2023m0709t0610-o100962_v003-2023m0710t124018.he5",
    "OMI-Aura_L2-OMNO2_2023m0709t0749-o100963_v003-2023m0710t141856.he5",
    "OMI-Aura_L2-OMNO2_2023m0709t0928-o100964_v003-2023m0710t141539.he5",
    "OMI-Aura_L2-OMNO2_2023m0709t1107-o100965_v003-2023m0710t143421.he5",
    "OMI-Aura_L2-OMNO2_2023m0709t1246-o100966_v003-2023m0710t171304.he5",
    "OMI-Aura_L2-OMNO2_2023m0709t1425-o100967_v003-2023m0710t171303.he5",
    "OMI-Aura_L2-OMNO2_2023m0709t1603-o100968_v003-2023m0710t171256.he5",
    "OMI-Aura_L2-OMNO2_2023m0709t1742-o100969_v003-2023m0710t171227.he5",
    "OMI-Aura_L2-OMNO2_2023m0709t1921-o100970_v003-2023m0710t224725.he5",
    "OMI-Aura_L2-OMNO2_2023m0709t2100-o100971_v003-2023m0710t224852.he5",
    "OMI-Aura_L2-OMNO2_2023m0709t2239-o100972_v003-2023m0710t224703.he5"
]

In [None]:
#data_dir = "/Users/jkouatch/myTasks/PythonTraining/ASTG606/Materials/sat_data/OMI_Data/"
data_dir = "/tljh-data/sat_data/OMI_Data"

In [None]:
num_files = len(list_files)
first_iter = True
for i in range(1):   # Only the first file is read; use range(num_files) to read them all
    fname = Path(data_dir) / list_files[i]
    print(f"Reading: {fname}")
    X, Y, Z, W = get_arrays(fname)
    if first_iter:
        first_iter = False
        NO2, time, lats, lons = X, Y, Z, W
    else:
        # Append each array to its own accumulator
        NO2 = np.concatenate((NO2, X), axis=0)
        time = np.concatenate((time, Y), axis=0)
        lats = np.concatenate((lats, Z), axis=0)
        lons = np.concatenate((lons, W), axis=0)

In [None]:
NO2.shape

Convert the time (GPS unit) to a datetime object:

In [None]:
Times = np.zeros_like(time, object)
gps_epoch = dt.datetime(1980, 1, 6)
for j, t in enumerate(time):
    Times[j] = (gps_epoch + dt.timedelta(seconds=time[j] - (35 - 19))).strftime("%Y-%m-%d %H:%M:%S.%f")
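
The conversion can be written as a small function, assuming the stored values really are seconds since the GPS epoch (1980-01-06). In this sketch the GPS-UTC leap-second offset is passed explicitly because it changes whenever a leap second is introduced; the value 18 used in the example is the offset in effect since 2017. `gps_to_utc` and the sample value are hypothetical.

```python
from datetime import datetime, timedelta

GPS_EPOCH = datetime(1980, 1, 6)  # start of GPS time


def gps_to_utc(gps_seconds, leap_seconds):
    """Convert seconds since the GPS epoch to a naive UTC datetime."""
    return GPS_EPOCH + timedelta(seconds=gps_seconds - leap_seconds)


# Illustrative value: one day (plus the offset) after the epoch
stamp = gps_to_utc(86_400 + 18, leap_seconds=18)
print(stamp.strftime("%Y-%m-%d %H:%M:%S.%f"))
```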

In [None]:
df_omi = pd.DataFrame(
    dict(latitude=lats, longitude=lons, 
         NO2TropSurf=NO2, t=Times))
df_omi

In [None]:
df_omi.info()

In [None]:
#df_omi = df_omi.set_index('t')
#df_omi

In [None]:
df_omi['longitude'] = df_omi['longitude']%360
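
The modulo operation maps longitudes from the [-180, 180) convention to [0, 360). A short sketch of both directions, using hypothetical helper names:

```python
def to_0_360(lon):
    """Map a longitude in degrees to the range [0, 360)."""
    return lon % 360


def to_pm180(lon):
    """Map a longitude in degrees to the range [-180, 180)."""
    return (lon + 180) % 360 - 180


for lon in (-170, -10, 10, 170):
    wrapped = to_0_360(lon)
    print(lon, "->", wrapped, "->", to_pm180(wrapped))
```

Python's `%` returns a result with the sign of the divisor, which is why negative longitudes wrap to positive values here.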

### Visualization

Timeseries plot:

In [None]:
df_omi.plot(x='t', y='NO2TropSurf')
plt.xticks(rotation=90);

Histogram:

In [None]:
df_omi['NO2TropSurf'].plot(kind='hist', figsize=(12,8));

Trajectory:

In [None]:
traj_omi = mpd.Trajectory(df_omi,
                          traj_id=1,
                          x = "longitude", y="latitude",
                          t="t")

In [None]:
traj_omi.plot();

In [None]:
fig, ax = plt.subplots(figsize=(12,10))

traj_omi.plot(legend=True, 
           column="NO2TropSurf", 
           capstyle='round', 
              cmap="jet", ax=ax);

In [None]:
traj_omi.hvplot(tiles="ESRI")

In [None]:
hv_kwargs = dict(hover_cols=["latitude", "longitude"], frame_height=300, frame_width=300)

traj_omi.hvplot(**hv_kwargs)

## <font color="blue">IASI MetOp</font>

In [None]:
#data_dir = "/Users/jkouatch/myTasks/PythonTraining/ASTG606/Materials/sat_data/IASI_Data"
data_dir = "/tljh-data/sat_data/IASI_Data"

In [None]:
nc4fname = Path(data_dir) / 'x0044.iasi_metop-a.2020-12-14T21:00:00Z.nc4'

In [None]:
def get_arrays_nc4(fname):
    with h5py.File(fname, 'r') as fid:
        data_grp = fid['ObsValue']
        brightnessTemp = restore_data(data_grp['brightnessTemperature'])[:,0]
        time = fid['MetaData']['dateTime'][()]
        lats = fid['MetaData']['latitude'][()]
        lons = fid['MetaData']['longitude'][()]
    return brightnessTemp, time, lats, lons

In [None]:
brightnessTemp, time, lats, lons = get_arrays_nc4(nc4fname)

#### Convert the time records into datetime objects

In [None]:
Times = np.zeros_like(time, object)
init_dt = dt.datetime(1970, 1, 1)
for i, t in enumerate(time):
    sec = int(t)
    Times[i] = (init_dt + dt.timedelta(seconds=sec)).strftime("%Y-%m-%d %H:%M:%S.%f")

#### Create the Pandas DataFrame

In [None]:
df_iasi = pd.DataFrame(
    dict(latitude=lats, longitude=lons, 
         brightnessTemp=brightnessTemp, t=Times))
df_iasi

#### Sort the time records

In [None]:
df_iasi = df_iasi.sort_values(by='t')
df_iasi

#### Examine longitudes and latitudes to identify trajectories

In [None]:
# plot and rotate the tick labels with rot= in the first plot call
ax = df_iasi.plot(x='t', y='longitude', 
                  color='tab:blue', 
                  figsize=(15,8), rot=90)
ax2 = ax.twinx()
df_iasi.plot(x='t', y='latitude', 
             color='tab:red', ax = ax2)
ax2.legend(loc='lower right');

- There appear to be five (5) obvious trajectories, based on the number of crossings of the prime meridian (where the longitude wraps from 360° back to 0°).
- There are many longitude oscillations after the first crossing of the prime meridian:
   - They will create problems in determining the trajectories.

#### Set the values of the trajectory ids
- Every time we cross the prime meridian, we have a new trajectory

In [None]:
traj_id = np.zeros_like(time, int)

count = 0
prev_long = df_iasi['longitude'].values[0]

for i, long in enumerate(df_iasi['longitude'].values[1:], start=1):
    if (0.0 <= long <= 0.99999) and (340.000 <= prev_long <= 359.99999):
        count += 1
    prev_long = long
    traj_id[i] = count

df_iasi['traj_id'] = traj_id

In [None]:
df_iasi

In [None]:
df_iasi.plot(x='t', y='brightnessTemp')
plt.xticks(rotation=90);

In [None]:
df_iasi['brightnessTemp'].plot(kind='hist', figsize=(12,8));

#### Create MovingPandas Trajectories

In [None]:
iasi_trajc = mpd.TrajectoryCollection(df_iasi, 
                                 x = "longitude", y="latitude",
                                 traj_id_col="traj_id", t="t")

In [None]:
iasi_trajc

In [None]:
len(iasi_trajc.trajectories)

#### Visualize the trajectories

In [None]:
iasi_trajc.plot()

In [None]:
ntrajs = len(iasi_trajc.trajectories)
fig, ax = plt.subplots(nrows=ntrajs, ncols=1, figsize=(15,30))

for i in range(ntrajs):
    iasi_trajc.trajectories[i].plot(ax=ax[i])
    ax[i].set_title(f"Trajectory {i}")
    
plt.tight_layout()

We want to consider trajectories 6, 9 and 10.

In [None]:
fig, ax = plt.subplots(nrows=3, ncols=1, figsize=(10,25))

j = 0
for i in [6,9,10]:
    iasi_trajc.trajectories[i].plot(legend=True,
                                    legend_kwds={
                                          "shrink":.55
                                          },
                                    column="brightnessTemp",
                                    capstyle='round',
                                    ax=ax[j])
    ax[j].set_title(f"Trajectory {i}")
    j += 1

    
#plt.tight_layout()

#### Let us zoom in on trajectory 6

- We want to remove the two horizontal lines.
- We need to check the positions at the beginning and at the end.

In [None]:
df6 = iasi_trajc.trajectories[6].df
df6

We observe that the first and last points are out of place and we can remove them.

In [None]:
df6.drop(df6.head(1).index, inplace=True)
df6

In [None]:
df6.drop(df6.tail(1).index, inplace=True)
df6

Plotting the trajectory again gives:

In [None]:
mpd.Trajectory(df6, 1).plot(legend=True,
                            legend_kwds={"shrink":.55},
                            column="brightnessTemp",
                            capstyle='round')

## <font color="blue"> Analyze Mars Rover & Heli  Data</font>

- [The car-sized Perseverance and its little helicopter buddy Ingenuity landed together on Mars](https://www.space.com/perseverance-rover-100-mars-days) (by Mike Wall, published June 02, 2021)
   - One `sol` lasts about 24 hours and 40 minutes, slightly longer than an Earth day.

### Load the data

In [None]:
def to_timestamp(row):
    start_time = dt.datetime(2021,2,18,0,0,0)  #  sol 0 
    try: 
        sol = row['sol']  # rover
    except KeyError:
        sol = row['Sol']  # heli 
    td = dt.timedelta(hours=24*sol, minutes=40*sol)
    return start_time + td
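
The function above approximates a sol as 24 hours and 40 minutes. The sketch below compares that approximation with the commonly quoted mean sol length of 88,775.244 seconds, taking the same start date as sol 0 (an assumption carried over from the cell above). `sol_to_datetime` is a hypothetical helper.

```python
from datetime import datetime, timedelta

SOL_MEAN = timedelta(seconds=88_775.244)     # mean Martian solar day
SOL_APPROX = timedelta(hours=24, minutes=40)  # approximation used above
SOL_ZERO = datetime(2021, 2, 18)              # start of sol 0 (assumed)


def sol_to_datetime(sol, sol_length=SOL_MEAN):
    """Earth date and time at the start of a given sol."""
    return SOL_ZERO + sol * sol_length


exact = sol_to_datetime(100)
approx = sol_to_datetime(100, SOL_APPROX)
drift = approx - exact  # error accumulated by the approximation
print(exact, approx, drift)
```

After 100 sols the approximation is off by roughly 41 minutes, which is small compared with the spacing between waypoints.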

In [None]:
m20_waypoints_json = "https://mars.nasa.gov/mmgis-maps/M20/Layers/json/M20_waypoints.json"
m20_filename = Path(m20_waypoints_json).name

In [None]:
heli_waypoints_json = "https://mars.nasa.gov/mmgis-maps/M20/Layers/json/m20_heli_waypoints.json"
heli_filename = Path(heli_waypoints_json).name

In [None]:
def get_df_from_file(filename):
    gdf = gpd.read_file(filename)
    gdf['time'] = gdf.apply(to_timestamp, axis=1)
    gdf.set_index('time', inplace=True)
    return gdf

In [None]:
download_remote_file(m20_waypoints_json)
download_remote_file(heli_waypoints_json)

In [None]:
m20_df = get_df_from_file(m20_filename)
heli_df = get_df_from_file(heli_filename)

In [None]:
m20_df.head()

In [None]:
m20_df.info()

In [None]:
m20_df.describe().transpose()

In [None]:
heli_df.info()

In [None]:
heli_df.describe().transpose()

### Plot

In [None]:
m20_df.hvplot(title="M20 & heli waypoints", hover_cols=['sol'], **hvplot_defaults) * heli_df.hvplot(hover_cols=['Sol'], color="red")


### Determine Trajectory

In [None]:
m20_traj = mpd.Trajectory(m20_df, 'm20')
heli_traj = mpd.Trajectory(heli_df, 'heli')

In [None]:
traj_plot = m20_traj.hvplot(title="M20 & heli trajectories", line_width=3, **hvplot_defaults) * heli_traj.hvplot(line_width=3, color='red', **hvplot_defaults)
traj_plot 

In [None]:
m20_traj.hvplot(title="Rover speed (only suitable for relative comparison)", 
                c='speed', line_width=7, **hvplot_defaults) 

### Detect Stops

In [None]:
m20_detector = mpd.TrajectoryStopDetector(m20_traj)
stop_points = m20_detector.get_stop_points(min_duration=dt.timedelta(seconds=60), 
                                           max_diameter=100)
stop_points['duration_days'] = stop_points['duration_s']/(60*60*24)
stop_points.head()

In [None]:
heli_detector = mpd.TrajectoryStopDetector(heli_traj)
heli_stop_points = heli_detector.get_stop_points(min_duration=dt.timedelta(seconds=60), max_diameter=100)
heli_stop_points['duration_days'] = heli_stop_points['duration_s']/(60*60*24)
heli_stop_points.head()
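
The two parameters used above can be read as: a stop is a period of at least `min_duration` during which the object stays within an area of diameter `max_diameter`. The block below is a deliberately simplified sketch of that idea in plain Python with planar coordinates; `detect_stops` is a hypothetical helper and does not reproduce the algorithm of `TrajectoryStopDetector`.

```python
from datetime import datetime, timedelta
from math import hypot


def _diameter(window):
    """Largest pairwise distance between the points of a window."""
    return max(
        (hypot(ax - bx, ay - by) for _, ax, ay in window for _, bx, by in window),
        default=0.0,
    )


def detect_stops(points, max_diameter, min_duration):
    """Return (start, end) times of stops in a time-ordered list of (t, x, y)."""
    stops = []
    i, n = 0, len(points)
    while i < n:
        j = i
        # Extend the window while all its points stay within max_diameter
        while j + 1 < n and _diameter(points[i:j + 2]) <= max_diameter:
            j += 1
        start, end = points[i][0], points[j][0]
        if j > i and end - start >= min_duration:
            stops.append((start, end))
            i = j + 1  # continue after the stop
        else:
            i += 1
    return stops


t0 = datetime(2021, 3, 1, 12, 0)
points = [
    (t0, 0, 0),
    (t0 + timedelta(minutes=1), 1, 0),
    (t0 + timedelta(minutes=2), 0, 1),      # still close to the first point
    (t0 + timedelta(minutes=3), 100, 100),  # the object moves away
    (t0 + timedelta(minutes=4), 200, 200),
]
stops = detect_stops(points, max_diameter=5, min_duration=timedelta(minutes=2))
print(stops)
```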

In [None]:
stop_point_plot = stop_points.hvplot(title='M20 & heli stops ', 
                                     geo=True, size=np.log(hv.dim('duration_days'))*10, 
                                     hover_cols=['duration_days'], color='blue', alpha=0.5) 
heli_stop_plot = heli_stop_points.hvplot(geo=True, size=np.log(hv.dim('duration_days'))*10, 
                                     hover_cols=['duration_days'], color='red', alpha=0.5) 
stop_point_plot * heli_stop_plot * traj_plot

### Mars background map

https://mars.nasa.gov/mars2020/mission/where-is-the-rover/

In [None]:
from bokeh.models import TMSTileSource

tile_url = 'http://s3-eu-west-1.amazonaws.com/whereonmars.cartodb.net/celestia_mars-shaded-16k_global/{Z}/{X}/{Y}.png'

def mars_tiles(plot, element):
    plot.state.add_tile(TMSTileSource(url=tile_url), level='underlay')

traj_map = m20_traj.hvplot(title="M20 & heli trajectories", tiles=None) * heli_traj.hvplot(color='red', **hvplot_defaults)
traj_map.opts(hooks=[mars_tiles])

## <font color="blue"> Analyze Ship Data</font>

#### Obtain the remote data file

In [None]:
url = 'https://github.com/movingpandas/movingpandas/raw/main/tutorials/data/demodata_ais.gpkg'
ais_filename = Path(url).name

In [None]:
download_remote_file(url)

#### Read the `.gpkg` file as a GeoPandas GeoDataFrame

- A file with a `.gpkg` extension is a GeoPackage: a SQLite database container that holds data and metadata tables with standard definitions, format limitations, integrity assertions, and content constraints.
- The GeoPackage (GPKG) standard describes a set of conventions for   
    - Storing tile matrix sets of imagery
    - Vector features
    - Raster maps at various scales
    - Metadata and schema
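
Because a GeoPackage is a SQLite database, its tables can be listed with Python's built-in `sqlite3` module. The sketch below is a minimal illustration that builds an in-memory stand-in rather than opening the downloaded file. The `ships` table is hypothetical, and the `gpkg_contents` table is reduced to two columns (the real table has more).

```python
import sqlite3

# In-memory stand-in for a GeoPackage; a real file would be opened by
# passing its path to sqlite3.connect() instead of ":memory:".
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE gpkg_contents (table_name TEXT, data_type TEXT)")
conn.execute("CREATE TABLE ships (fid INTEGER PRIMARY KEY, geom BLOB)")  # hypothetical layer
conn.execute("INSERT INTO gpkg_contents VALUES ('ships', 'features')")

# List every table stored in the container.
tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name")]
print(tables)

# gpkg_contents acts as the catalogue of layers in the file.
layers = conn.execute("SELECT table_name, data_type FROM gpkg_contents").fetchall()
print(layers)
conn.close()
```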

In [None]:
df_ais = gpd.read_file(ais_filename)
df_ais

In [None]:
df_ais.info()

- `MMSI`: Maritime Mobile Service Identity, used as the unique ID of a ship.
- `NavStatus`: Navigational status
- `SOG`: Speed over ground in 1/10 knot steps (0-102.2 knots)
- `COG`: Course over ground in 1/10 degree steps (0-3599)
- `ShipType`: Ship type
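
The step sizes above describe how speed and course are encoded in raw AIS position reports. The sketch below converts such raw integers to knots and degrees. It assumes the usual sentinel values for "not available" (1023 for speed, 3600 for course). Whether the `SOG` and `COG` columns of the demo file are already decoded is not stated here, so treat this as an illustration of the encoding and not as a step to apply to `df_ais`. The function names are hypothetical.

```python
SOG_NOT_AVAILABLE = 1023   # assumed raw value meaning "speed not available"
COG_NOT_AVAILABLE = 3600   # assumed raw value meaning "course not available"

def decode_sog(raw):
    """Raw SOG in 1/10 knot steps -> knots, or None if not available."""
    return None if raw == SOG_NOT_AVAILABLE else raw / 10

def decode_cog(raw):
    """Raw COG in 1/10 degree steps -> degrees, or None if not available."""
    return None if raw >= COG_NOT_AVAILABLE else raw / 10

def knots_to_mps(knots):
    """Convert knots to meters per second (1 knot = 1852 m per hour)."""
    return knots * 1852 / 3600

print(decode_sog(1022), decode_cog(3599), decode_sog(1023))
print(round(knots_to_mps(10), 3))
```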

#### Focus on the ship type:

In [None]:
df_ais['ShipType'].unique()

In [None]:
df_ais['ShipType'].value_counts()

In [None]:
# Note: each row is one AIS record, so this sum counts records, not distinct ships
total_ships = df_ais['ShipType'].value_counts().sum()
percent = 100*df_ais['ShipType'].value_counts()/total_ships

dist_ship = pd.concat([df_ais['ShipType'].value_counts(), percent],
                    axis=1, keys=['Total', 'Percent'])
dist_ship

In [None]:
dist_ship['Total'].plot(kind='bar', figsize=(15,3));

#### Focus on `SOG`

In [None]:
df_ais['SOG'].hist(bins=100, figsize=(15,3))

- Many records have a speed over ground of zero, meaning the vessel was not moving.
- Remove the records whose speed over ground is zero.

In [None]:
print(f"Original size: {len(df_ais)} rows")
df_ais = df_ais[df_ais.SOG > 0]
print(f"Reduced to {len(df_ais)} rows after removing 0 speed records")

In [None]:
df_ais['SOG'].hist(bins=100, figsize=(15,3))

Basic plot:

In [None]:
df_ais.plot(figsize=(12,12), markersize=0.7, alpha=0.7);

#### Create trajectories

- We convert the `Timestamp` strings into datetime objects.
- We require every trajectory to be at least 100 meters long.

In [None]:
df_ais['t'] = pd.to_datetime(df_ais['Timestamp'], 
                             format='%d/%m/%Y %H:%M:%S')
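
The explicit `format` matters because the day comes before the month in these timestamps. The sketch below uses the standard library's `datetime.strptime`, which accepts the same format codes, to show how one string would be read under each ordering. The sample timestamp is made up for illustration.

```python
from datetime import datetime

fmt = '%d/%m/%Y %H:%M:%S'          # same format string as above
sample = '05/07/2017 13:45:00'     # made-up timestamp for illustration

# Day first: 5 July 2017.
day_first = datetime.strptime(sample, fmt)
print(day_first.isoformat())

# The same string read month-first would silently become 7 May 2017.
month_first = datetime.strptime(sample, '%m/%d/%Y %H:%M:%S')
print(month_first.isoformat())
```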

In [None]:
df_ais

In [None]:
df_ais.info()

In [None]:
df_ais['MMSI'].value_counts()

Specify the minimum trajectory length in meters; shorter trajectories are discarded:

In [None]:
minimum_length = 100

In [None]:
traj_collection = mpd.TrajectoryCollection(df_ais, 'MMSI', 
                                           t='t', 
                                           min_length=minimum_length)
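
To give a feel for what the `min_length` filter does, the sketch below computes the length of two made-up tracks in meters and drops the one shorter than `minimum_length`. It uses the haversine formula on a spherical Earth as a stand-in; MovingPandas has its own length computation, so the numbers need not match exactly. The helper names and the coordinates are hypothetical.

```python
import math

def haversine_m(lon1, lat1, lon2, lat2, radius_m=6371000.0):
    """Great-circle distance in meters between two lon/lat points (spherical Earth)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_m * math.asin(math.sqrt(a))

def track_length_m(points):
    """Sum of the segment lengths of a list of (lon, lat) points."""
    return sum(haversine_m(*p, *q) for p, q in zip(points, points[1:]))

# Made-up tracks given as (lon, lat) pairs.
tracks = {
    'short': [(11.9000, 57.6900), (11.9005, 57.6900)],
    'long': [(11.9000, 57.6900), (11.9100, 57.6950), (11.9200, 57.7000)],
}

minimum_length = 100  # meters
kept = {name: pts for name, pts in tracks.items()
        if track_length_m(pts) >= minimum_length}
print(sorted(kept))
```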

In [None]:
traj_collection

We can plot all the trajectories:

In [None]:
traj_collection.plot()

#### Down-sample the trajectories to ensure a certain time delta between records

The goal is to reduce the number of records by enforcing a minimum time interval (here, one minute) between consecutive records.

In [None]:
traj_collection = mpd.MinTimeDeltaGeneralizer(traj_collection).generalize(tolerance=dt.timedelta(minutes=1))
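
The idea behind this kind of down-sampling can be sketched in a few lines: keep the first record, then keep a record only if at least `tolerance` has passed since the last kept one. This is a simplified illustration and not the library's code; MovingPandas may treat the end of a trajectory differently (for example, by always keeping the final record). The function name and the timestamps are made up.

```python
from datetime import datetime, timedelta

def min_time_delta_filter(timestamps, tolerance):
    """Keep the first timestamp, then each one at least `tolerance` after the last kept."""
    kept = []
    for t in timestamps:
        if not kept or t - kept[-1] >= tolerance:
            kept.append(t)
    return kept

start = datetime(2017, 7, 5, 12, 0, 0)
# Made-up record times, in seconds after `start`.
timestamps = [start + timedelta(seconds=s) for s in (0, 10, 20, 60, 90, 130, 185)]

kept = min_time_delta_filter(timestamps, timedelta(minutes=1))
offsets = [int((t - start).total_seconds()) for t in kept]
print(offsets)  # [0, 60, 130]
```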

Plot the trajectories colored by ship type:

In [None]:
shiptype_to_color = {'Passenger': 'blue', 'HSC': 'green', 'Tanker': 'red', 
                     'Cargo': 'orange', 'Sailing': 'grey', 'Other': 'grey', 
                     'Tug': 'grey', 'SAR': 'grey', 'Undefined': 'grey', 
                     'Pleasure': 'grey', 'Dredging': 'grey', 'Law enforcement': 'grey',
                    'Pilot': 'grey', 'Fishing': 'grey', 'Diving':'grey', 'Spare 2': 'grey'}
traj_collection.plot(column='ShipType', 
                     column_to_color=shiptype_to_color, 
                     linewidth=1, capstyle='round')
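
Most ship types in the mapping above share the same grey color. One possible alternative, sketched below, builds the same dictionary from a short list of highlighted types plus a default color. The list of ship types is copied from the mapping above; `highlight` and `default_color` are illustrative names.

```python
# Ship types copied from the mapping above.
ship_types = ['Passenger', 'HSC', 'Tanker', 'Cargo', 'Sailing', 'Other', 'Tug',
              'SAR', 'Undefined', 'Pleasure', 'Dredging', 'Law enforcement',
              'Pilot', 'Fishing', 'Diving', 'Spare 2']

# Only the types we want to stand out get their own color.
highlight = {'Passenger': 'blue', 'HSC': 'green', 'Tanker': 'red', 'Cargo': 'orange'}
default_color = 'grey'

shiptype_to_color = {ship_type: highlight.get(ship_type, default_color)
                     for ship_type in ship_types}
print(shiptype_to_color['Tanker'], shiptype_to_color['Tug'])
```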

Use `hvplot` to plot only the passenger ships interactively:

In [None]:
passenger = traj_collection.filter('ShipType', 'Passenger')
passenger.hvplot(title='Passenger ferries', line_width=2, 
                 frame_width=700, frame_height=500)

#### Focus on one trajectory

In [None]:
len(traj_collection.trajectories)

In [None]:
traj_index = 5  # position in the list of trajectories (avoids shadowing the built-in `id`)
my_traj = traj_collection.trajectories[traj_index]
my_traj.df

In [None]:
my_traj.hvplot(title=f'Trajectory {my_traj.id}', 
               frame_width=700, frame_height=500, 
               line_width=5.0, c='NavStatus', cmap='Dark2') 

### Finding ships passing under a bridge

In [None]:
area_of_interest = shpgeom.Polygon([(11.89935, 57.69270), (11.90161, 57.68902), 
                                    (11.90334, 57.68967), (11.90104, 57.69354), 
                                    (11.89935, 57.69270)])

In [None]:
intersecting = traj_collection.get_intersecting(area_of_interest)

In [None]:
print(f"Found {len(intersecting)} trajectories intersecting the bridge area")
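
For intuition about the spatial test, the sketch below implements a ray-casting point-in-polygon check in pure Python and applies it to the bridge polygon defined above. It only tests the sampled points of a made-up track, which is a simplification: a true line-polygon intersection, as performed through Shapely geometries, also catches segments that cross the area without a sampled point inside it. The function name and the track coordinates are hypothetical.

```python
def point_in_polygon(x, y, ring):
    """Ray casting: count how many polygon edges a ray from (x, y) toward +x crosses."""
    inside = False
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        # Does this edge straddle the horizontal line through y?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

# Same coordinates as area_of_interest above (closed ring, lon/lat).
bridge_ring = [(11.89935, 57.69270), (11.90161, 57.68902),
               (11.90334, 57.68967), (11.90104, 57.69354),
               (11.89935, 57.69270)]

# Made-up track: before the bridge, under it, and past it.
track = [(11.8950, 57.6880), (11.9013, 57.6912), (11.9080, 57.6950)]
hits = [point_in_polygon(x, y, bridge_ring) for x, y in track]
print(hits, any(hits))
```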

In [None]:
bridge_traj = intersecting.trajectories[0]
bridge_traj.hvplot(title=f'Trajectory {bridge_traj.id}', 
                   line_width=5.0, c='NavStatus', cmap='Dark2')

In [None]:
bridge_traj.df.head()