# About

* **Author**: Adil Rashitov
* **Creation Date**: 22.02.2020
* **Goal**: This notebook tests the automated, function-based version of the GPSClustering pipeline. The automated version is needed to perform end-to-end optimization.
* **Deliverable**: Fast-movement detection pipeline

In [1]:
# Imports
import os
import numpy as np
import pandas as pd
import logging
import plotly.express as px
import geopandas as gpd
from multiprocessing import Pool
from sklearn.model_selection import train_test_split
from GPSOdyssey import Polaris, Kepler, Void, Vega
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = 'all'

logger = logging.getLogger()
logger.setLevel(logging.INFO)

%load_ext autoreload
%autoreload 2

# Data

In [2]:
GPS_DATE = '2020-10-01'
S_MAP_MATCHING_REPORT = '/Data/Intermediate/MapMatchingReports/MapMatching_october2020.csv'

map_matching_report = pd.read_csv(
    S_MAP_MATCHING_REPORT, parse_dates=['waste_collection_date']
).sort_values(by=['waste_collection_date', 'truck_id'])

map_matching_report = map_matching_report[map_matching_report['waste_collection_date'] == GPS_DATE]
fnames = map_matching_report['csv_file']
fnames

459    XE-5559L_2020-10-1.csv
292    XE-5565T_2020-10-1.csv
386    XE-5577J_2020-10-1.csv
458    XE-5598Z_2020-10-1.csv
337    XE-5609B_2020-10-1.csv
492    XE-5610X_2020-10-1.csv
322    XE-5612R_2020-10-1.csv
300    XE-5620S_2020-10-1.csv
152    XE-5628X_2020-10-1.csv
13     XE-5629T_2020-10-1.csv
234    XE-5630M_2020-10-1.csv
193    XE-5632H_2020-10-1.csv
206    XE-5636Y_2020-10-1.csv
475    XE-5638S_2020-10-1.csv
77     XE-5639P_2020-10-1.csv
316    XE-5640J_2020-10-1.csv
137    XE-5665M_2020-10-1.csv
46     XE-5680T_2020-10-1.csv
426    XE-5705G_2020-10-1.csv
500    XE-5748T_2020-10-1.csv
Name: csv_file, dtype: object

## October GPS records

In [3]:
%%time
S_CLUSTERED_GPS = '/Data/Source/OctoberGPS/'


vega = Vega(engine='pandas')
gps = vega.read_from_dir(
    directory=S_CLUSTERED_GPS,
    file_extensions='.csv',
    args={
        'parse_dates': ['time'],
        'dtype': {'lon': 'float', 'lat': 'float',
                  'lon_match': 'float', 'lat_match': 'float'},
    },
    concatenate=False, amt_in_parallel=6, filenames=fnames)
len(gps)

CPU times: user 113 ms, sys: 47.4 ms, total: 160 ms
Wall time: 166 ms


20
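For readers without access to `GPSOdyssey`, the loading step above can be approximated with plain pandas and a thread pool. This is only a sketch: `read_csvs_in_parallel` is a hypothetical helper introduced here for illustration, and nothing in it is taken from Vega's actual implementation.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import partial
from pathlib import Path

import pandas as pd


def read_csvs_in_parallel(directory, filenames, max_workers=6, **read_csv_kwargs):
    """Hypothetical helper: read the listed CSV files, one DataFrame per file."""
    paths = [Path(directory) / name for name in filenames]
    # Bind the shared pandas options (parse_dates, dtype, ...) once.
    reader = partial(pd.read_csv, **read_csv_kwargs)
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # executor.map returns results in the same order as `paths`.
        return list(executor.map(reader, paths))
```

Under these assumptions, a call such as `read_csvs_in_parallel(S_CLUSTERED_GPS, fnames, parse_dates=['time'])` would return a list with one DataFrame per truck file, similar in shape to the `concatenate=False` result used in this notebook.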

# GPS clustering pipeline

In [13]:
from sklearn.pipeline import Pipeline
from Pipeline import FastMovementPreprocessor, FastMovementClassifier, GPSDBSCAN


fastMovementPreprocessorArgs = {
    'select_columns': ['truck_id', 'lon', 'lat', 'lon_match', 'lat_match', 'time'],
    'datetime_column': 'time',
    'vehicle_id_col': 'truck_id',
    'lon_col': 'lon_match',
    'lat_col': 'lat_match'
}
fastMovementClassifierArgs = {
    'model_path': '/models/fast_movement_detector.sav',
    'X_cols': ['velocity', 'velocity_lag_1', 'velocity_lag_2',
               'velocity_lag_3', 'velocity_lag_4', 'velocity_lag_5'],
    'y_col': 'is_fast_moving'
}
args_GPSDBSCAN = {
    'eps': 130,
    'min_n_samples': 20,
    'cols2select': ['x', 'y', 'unixtime'],
    'fast_movement_col': 'is_fast_moving',
    'cluster_colname': 'cluster_id',
    'post_sort_cols': ['truck_id', 'unixtime'],
}


clustering_pipeline = Pipeline([
     ('feature_extractor', FastMovementPreprocessor(**fastMovementPreprocessorArgs)),
     ('FastMovementClassifier', FastMovementClassifier(**fastMovementClassifierArgs)),
     ('clustering_model', GPSDBSCAN(**args_GPSDBSCAN))
])
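The source of `FastMovementPreprocessor` is not shown in this notebook, so the following is only a sketch of how a pipeline step producing the `velocity_lag_1` ... `velocity_lag_5` columns might look. `VelocityLagFeatures` is a hypothetical class, and it assumes a `velocity` column already exists and that rows are sorted by time within each truck.

```python
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.pipeline import Pipeline


class VelocityLagFeatures(BaseEstimator, TransformerMixin):
    """Hypothetical transformer: adds <velocity_col>_lag_1..n columns per vehicle."""

    def __init__(self, vehicle_id_col="truck_id", velocity_col="velocity", n_lags=5):
        self.vehicle_id_col = vehicle_id_col
        self.velocity_col = velocity_col
        self.n_lags = n_lags

    def fit(self, X, y=None):
        # Nothing is learned; the flag only marks the step as fitted.
        self.is_fitted_ = True
        return self

    def transform(self, X):
        # Work on a copy so the caller's DataFrame is left untouched.
        out = X.copy()
        grouped = out.groupby(self.vehicle_id_col)[self.velocity_col]
        for lag in range(1, self.n_lags + 1):
            # shift() is applied per vehicle, so lags never cross trucks.
            out[f"{self.velocity_col}_lag_{lag}"] = grouped.shift(lag)
        return out


demo = pd.DataFrame({
    "truck_id": ["A", "A", "A", "B", "B"],
    "velocity": [1.0, 2.0, 3.0, 10.0, 20.0],
})
lag_pipeline = Pipeline([("lags", VelocityLagFeatures(n_lags=2))])
demo_out = lag_pipeline.fit(demo).transform(demo)
```

The first rows of each truck get `NaN` lags because there is no earlier observation to shift from; how the real preprocessor handles those rows is not stated in this notebook.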

In [14]:
%%time
with Pool(4) as pool:
    # One task per truck; pool.map passes each item of `gps` as the single argument.
    gps_outputs = pool.map(clustering_pipeline.transform, gps)

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  self.obj[key] = value

CPU times: user 112 ms, sys: 80.9 ms, total: 193 ms
Wall time: 3.48 s
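The `SettingWithCopyWarning` messages in the output above usually mean that a column is being assigned on a DataFrame obtained by slicing another one. Which pipeline step triggers it is not visible in this notebook, so the following is only a sketch of the usual remedy, taking an explicit copy before assigning. `add_unixtime` is a hypothetical function, not part of the pipeline.

```python
import pandas as pd


def add_unixtime(gps, datetime_column="time"):
    """Hypothetical step: select columns, then add a derived column safely."""
    # The explicit copy makes the selection an independent DataFrame,
    # so the assignment below cannot target a view of `gps`.
    selected = gps[["truck_id", datetime_column]].copy()
    # Seconds since the Unix epoch, assuming timezone-naive timestamps.
    epoch = pd.Timestamp("1970-01-01")
    selected["unixtime"] = (selected[datetime_column] - epoch) // pd.Timedelta(seconds=1)
    return selected


sample = pd.DataFrame({
    "truck_id": ["A"],
    "time": pd.to_datetime(["2020-10-01 00:00:00"]),
    "lon": [103.7],
})
sample_out = add_unixtime(sample)
```

The input frame is left unchanged, which also keeps each worker in the `Pool` free of side effects on shared inputs.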


In [15]:
gps_outputs

[     lat_match       lat      unixtime        date             x         lon  \
 0     1.309453  1.309354  1.601559e+09  2020-10-01  19116.381928  103.753584   
 1     1.309453  1.309244  1.601559e+09  2020-10-01  19116.381928  103.753519   
 2     1.309297  1.309141  1.601559e+09  2020-10-01  19096.905501  103.753460   
 3     1.309297  1.309110  1.601559e+09  2020-10-01  19096.905501  103.753426   
 4     1.309297  1.309079  1.601559e+09  2020-10-01  19096.905501  103.753407   
 ..         ...       ...           ...         ...           ...         ...   
 210   1.332410  1.332312  1.601560e+09  2020-10-01   7503.695676  103.649251   
 211   1.332103  1.332053  1.601560e+09  2020-10-01   7471.085115  103.648908   
 212   1.331759  1.331730  1.601560e+09  2020-10-01   7434.801661  103.648559   
 213   1.331411  1.331383  1.601560e+09  2020-10-01   7397.516553  103.648223   
 214   1.331110  1.331086  1.601560e+09  2020-10-01   7366.018917  103.647937   
 
      truck_id      time  

# Kepler visualization

In [16]:
kepler = Kepler({"GPS": pd.concat(gps_outputs).reset_index(drop=True)}, height=800)
kepler.render_kepler_map()
kepler.get_rendered_map()

User Guide: https://docs.kepler.gl/docs/keplergl-jupyter

