Arctox : New version 2025

@author: Christine Plumejeaud-Perreau, UMR 7301 Migrinter,
- Master M2 SPE, UE '270-3-71 - Geospatial and web development' 
- Created on 12 November 2025
- Updated on 12/11/2025

This work was proposed as TEA in 2020

- Import GPS values from the CSV file ‘Kap Hoegh GLS 20102011_sun3.csv’ (it contains outliers, caused by false latitudes): this was the code in 04_arctox.ipynb
  - Instead, in 2025, import the other part of the dataset coming from the XLSX file
  
- Build the bird path: GROUP BY bird_id, and sort each bird's points in chronological order
- Compute the total length of the path
- Connect to the database from a Python program
- Plot a Bokeh map and/or a Folium map (you can use GeoPandas)
- Remove/clean abnormal values: outlier detection

- Replace the bad latitude values with sensible values using Python / SQL: outlier detection
- Redo the computation of bird points and paths using Python / SQL
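
One possible way to approach the outlier-detection tasks in this list is a robust rule based on the median absolute deviation (MAD). The sketch below is only an illustration, not the method expected for the assignment: the function name `flag_outliers`, the threshold of 3.5 and the last two sample latitudes are assumptions chosen for the example.

```python
from statistics import median

def flag_outliers(values, threshold=3.5):
    """Return a list of booleans, True where a value is a suspected outlier.

    Uses the modified z-score based on the median absolute deviation (MAD).
    """
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # At least half of the values equal the median: the score is undefined.
        return [False] * len(values)
    # 0.6745 rescales the MAD so the score is comparable to a standard z-score.
    return [abs(0.6745 * (v - med) / mad) > threshold for v in values]

# Hypothetical latitudes (degrees) containing one obviously false value (12.0).
latitudes = [77.16, 77.21, 75.31, 75.45, 77.83, 12.0, 76.9]
flags = flag_outliers(latitudes)
print(flags)
```

A median-based rule is used here because a single extreme latitude barely moves the median, whereas it would strongly distort a mean and standard deviation.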

## 1. Read Data

In [None]:
import pandas as pd

tousLesPointsGPS = r"C:\Travail\Enseignement\Cours_M2_python\Projet_Arctox\complet.xls"
allGPS = pd.read_excel(tousLesPointsGPS, sheet_name='complete')

allGPS.head()
allGPS.shape

(20380, 17)

In [None]:
#1. rename some columns
allGPS = allGPS.rename(columns={"ID" : "id", "date": "dategps", "time": "timegps", "Long2" : "long", "Lat2" : "lat"})

#2. remove useless columns
allGPS = allGPS.drop(['ID_ID', 'Lat1', 'Long', 'blabla', 'transition1', 'transition2'], axis=1)
# Columns kept unchanged: Sex, period, distance, direction, velocity, confidence


print(allGPS.columns)

Index(['id', 'Sex', 'period', 'dategps', 'timegps', 'lat', 'long', 'distance',
       'direction', 'velocity', 'confidence'],
      dtype='object')


### You need a time axis: a timestamp column

- https://realpython.com/python-datetime/ 
- https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.to_datetime.html
- https://www.delftstack.com/fr/howto/python-pandas/how-to-convert-dataframe-column-to-datetime-in-pandas/

In [None]:
#3. Add a timestamp column
from datetime import datetime

# Cast date and time to strings so they can be concatenated
allGPS.dategps = allGPS.dategps.astype(str)
allGPS.timegps = allGPS.timegps.astype(str)

allGPS['timestampgps'] = allGPS.dategps + ' ' + allGPS.timegps
format_string = "%Y-%m-%d %H:%M:%S"

# Parse each "date time" string into a datetime object
allGPS.timestampgps = allGPS.timestampgps.apply(lambda x: datetime.strptime(x, format_string))
allGPS.head()

Unnamed: 0,id,Sex,period,dategps,timegps,lat,long,distance,direction,velocity,confidence,timestampgps
0,3606,F,midnight,2009-09-01,23:25:00,77.16,8.7,0.0,0.0,0.0,9,2009-09-01 23:25:00
1,3606,F,noon,2009-09-02,10:59:00,77.21,15.04,0.0,0.0,0.0,9,2009-09-02 10:59:00
2,3606,F,midnight,2009-09-02,23:32:00,75.31,6.87,162.95,-45.6,12.98,9,2009-09-02 23:32:00
3,3606,F,noon,2009-09-03,11:55:00,75.45,1.08,88.09,84.53,7.11,9,2009-09-03 11:55:00
4,3606,F,midnight,2009-09-03,23:18:00,77.83,10.17,190.46,-41.43,16.73,9,2009-09-03 23:18:00
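
As an alternative to applying `datetime.strptime` row by row, `pandas.to_datetime` can parse the whole column in one call. Here is a minimal sketch on a two-row stand-in DataFrame; the name `sample` is hypothetical, and the column names follow the renamed columns used in this notebook.

```python
import pandas as pd

# Small stand-in for allGPS, using the first two rows shown in the output.
sample = pd.DataFrame({
    "dategps": ["2009-09-01", "2009-09-02"],
    "timegps": ["23:25:00", "10:59:00"],
})

# Concatenate date and time, then parse the whole column at once.
sample["timestampgps"] = pd.to_datetime(
    sample["dategps"] + " " + sample["timegps"],
    format="%Y-%m-%d %H:%M:%S",
)
print(sample.dtypes)
```

The resulting column has a `datetime64` dtype, which sorts chronologically.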


## 2. Create a GeoDataFrame

In [None]:
#4. Create a GeoDataFrame
import geopandas

allGPS_geo = geopandas.GeoDataFrame(
    allGPS, geometry=geopandas.points_from_xy(allGPS.long, allGPS.lat), crs="EPSG:4326"
)
allGPS_geo.head()

Unnamed: 0,id,Sex,period,dategps,timegps,lat,long,distance,direction,velocity,confidence,timestampgps,geometry
0,3606,F,midnight,2009-09-01,23:25:00,77.16,8.7,0.0,0.0,0.0,9,2009-09-01 23:25:00,POINT (8.70000 77.16000)
1,3606,F,noon,2009-09-02,10:59:00,77.21,15.04,0.0,0.0,0.0,9,2009-09-02 10:59:00,POINT (15.04000 77.21000)
2,3606,F,midnight,2009-09-02,23:32:00,75.31,6.87,162.95,-45.6,12.98,9,2009-09-02 23:32:00,POINT (6.87000 75.31000)
3,3606,F,noon,2009-09-03,11:55:00,75.45,1.08,88.09,84.53,7.11,9,2009-09-03 11:55:00,POINT (1.08000 75.45000)
4,3606,F,midnight,2009-09-03,23:18:00,77.83,10.17,190.46,-41.43,16.73,9,2009-09-03 23:18:00,POINT (10.17000 77.83000)


In [None]:
allGPS_geo.crs

<Geographic 2D CRS: EPSG:4326>
Name: WGS 84
Axis Info [ellipsoidal]:
- Lat[north]: Geodetic latitude (degree)
- Lon[east]: Geodetic longitude (degree)
Area of Use:
- name: World.
- bounds: (-180.0, -90.0, 180.0, 90.0)
Datum: World Geodetic System 1984 ensemble
- Ellipsoid: WGS 84
- Prime Meridian: Greenwich

## 3. Save the GeoDataFrame to the database

In [None]:
import pandas as pd
from pandas.io import sql
from sqlalchemy import create_engine, text as sql_text

#5.1. Create the 'arctic' schema if it does not exist
connection = create_engine('postgresql://postgres:postgres@localhost:5432/savoie')
ORM_conn = connection.connect()
sql.execute(sql_text('create schema if not exists arctic '), ORM_conn)
ORM_conn.commit()
ORM_conn.close()



In [None]:
#5.2. Insert the data into a table 'gps_complet' of the 'arctic' schema
connection = create_engine('postgresql://postgres:postgres@localhost:5432/savoie')
ORM_conn = connection.connect()
allGPS_geo.to_postgis('gps_complet', con=ORM_conn , schema='arctic', if_exists='replace', index=False)
ORM_conn.commit()
ORM_conn.close()
#kap_hoegh_gls?

## 4. Use the database for spatial computations

For example, compute the trajectories of the birds.

### Calculate bird paths

In [None]:
#6.Calculate bird paths

sql_query = """create table bird_paths as (
    -- the ORDER BY inside the aggregate guarantees chronological point order
    select id, st_makeline(geometry order by timestampgps) as linepath
    from gps_complet
    group by id
    )"""
 
connection = create_engine('postgresql://postgres:postgres@localhost:5432/savoie', 
                           connect_args={'options': '-csearch_path={}'.format('arctic,public')})
ORM_conn = connection.connect()
sql.execute(sql_text(sql_query), ORM_conn)
ORM_conn.commit()
ORM_conn.close()
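
For comparison, the grouping logic of the SQL query (one chronologically ordered list of points per bird) can be sketched in plain Python. This only illustrates the GROUP BY plus chronological sort; the function name `build_paths` and the sample records are hypothetical.

```python
from collections import defaultdict
from datetime import datetime

def build_paths(records):
    """Group (bird_id, timestamp, lon, lat) records into one path per bird.

    Returns a dict mapping bird_id to a list of (lon, lat) tuples sorted
    in chronological order.
    """
    grouped = defaultdict(list)
    for bird_id, timestamp, lon, lat in records:
        grouped[bird_id].append((timestamp, lon, lat))
    # Sort each bird's points by timestamp, then drop the timestamp.
    return {
        bird_id: [(lon, lat) for _, lon, lat in sorted(points)]
        for bird_id, points in grouped.items()
    }

# Hypothetical records, deliberately out of chronological order.
records = [
    (3606, datetime(2009, 9, 2, 10, 59), 15.04, 77.21),
    (3648, datetime(2009, 9, 1, 12, 0), -20.0, 70.0),
    (3606, datetime(2009, 9, 1, 23, 25), 8.70, 77.16),
]
paths = build_paths(records)
print(paths[3606])
```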

### Compute the migration length in km

In [None]:
#7. Compute migration lengths
 
sql_query = """alter table bird_paths add column migration_length float;
update bird_paths set migration_length = round(st_length(linepath, true)/ 1000);"""

connection = create_engine('postgresql://postgres:postgres@localhost:5432/savoie', 
                           connect_args={'options': '-csearch_path={}'.format('arctic,public')})
ORM_conn = connection.connect()
sql.execute(sql_text(sql_query), ORM_conn)
ORM_conn.commit()
ORM_conn.close()
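
To sanity-check the lengths computed by PostGIS, a path length can be approximated in Python with the haversine formula. Note the assumptions: this sketch treats the Earth as a sphere with a mean radius of 6371.0088 km, whereas `st_length(linepath, true)` measures on the WGS 84 spheroid, so the two results differ slightly. The function names are hypothetical.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius (spherical approximation)

def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance in km between two (lon, lat) points in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlambda = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))

def path_length_km(points):
    """Sum of the segment lengths along a list of (lon, lat) points."""
    return sum(haversine_km(*p, *q) for p, q in zip(points, points[1:]))

# First two positions of bird 3606, taken from the data preview.
path = [(8.70, 77.16), (15.04, 77.21)]
print(round(path_length_km(path), 1))
```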

## 5. Make a map to visualize bird paths

### First load data from DB

In [None]:
#8. Visualize bird paths

#8.1. Load bird paths into a GeoDataFrame
import geopandas as gpd
query = """ SELECT id, migration_length, st_transform(linepath, 3857) as linepath from bird_paths """
connection = create_engine('postgresql://postgres:postgres@localhost:5432/savoie', 
                           connect_args={'options': '-csearch_path={}'.format('arctic,public')})
ORM_conn = connection.connect()
data = gpd.GeoDataFrame.from_postgis(sql_text(query), ORM_conn, geom_col='linepath')

ORM_conn.close() #Close the connection
print(data.shape) #44, 3

(44, 3)


### Use Bokeh for mapping

In [17]:
#8.2 Do the mapping with Bokeh
from bokeh.models import GeoJSONDataSource
from bokeh.palettes import viridis
from bokeh.plotting import figure, show

# One distinct color per bird path
palette = viridis(data.shape[0])
data['Color'] = list(palette)
    
# slight modification to have the GeoJSONDataSource working
geo_source = GeoJSONDataSource(geojson=data.to_json())


# Bokeh converts the GeoJSON coordinates into columns called x and y or xs and ys (depending on whether the features are Points, Lines, MultiLines, Polygons, or MultiPolygons). 
# Properties with clashing names will be overridden when the GeoJSON is converted and should be avoided.
# https://docs.bokeh.org/en/latest/docs/user_guide/interaction/js_callbacks.html
TOOLTIPS = [('migration length', '@migration_length'), ('bird id', '@id')]

p = figure(x_range=(-9587947, 1113194), y_range=(3503549, 13195489),
           x_axis_type="mercator", y_axis_type="mercator", 
           background_fill_color="lightgrey",  tooltips=TOOLTIPS)

p.add_tile("CartoDB Positron", retina=True)

p.multi_line(xs='xs', ys='ys', line_color='Color', source=geo_source, line_width=1)

show(p)
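
The `x_range` and `y_range` values passed to `figure` are Web Mercator (EPSG:3857) coordinates in metres. The following sketch uses the standard spherical Mercator formulas to show how such bounds can be derived from longitude and latitude; the function name `lonlat_to_web_mercator` is hypothetical.

```python
from math import log, pi, radians, tan

R = 6378137.0  # radius of the sphere used by EPSG:3857, in metres

def lonlat_to_web_mercator(lon, lat):
    """Convert longitude/latitude in degrees to EPSG:3857 x/y in metres."""
    x = R * radians(lon)
    y = R * log(tan(pi / 4 + radians(lat) / 2))
    return x, y

# The upper x bound and the lower y bound used in the figure correspond
# approximately to 10 degrees East and 30 degrees North.
x_max, y_min = lonlat_to_web_mercator(10, 30)
print(round(x_max), round(y_min))
```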

## 6. Smooth bad geographic coordinates (lat and long) in Python

- https://jakevdp.github.io/PythonDataScienceHandbook/05.13-kernel-density-estimation.html  
- https://stackoverflow.com/questions/20618804/how-to-smooth-a-curve-in-the-right-way 


In [18]:
#9. Smooth bad latitude points

from tsmoothie.smoother import ConvolutionSmoother #pip install tsmoothie
import pandas.io.sql as sql
from sqlalchemy import create_engine, text

connection = create_engine('postgresql://postgres:postgres@localhost:5432/savoie', 
                           connect_args={'options': '-csearch_path={}'.format('arctic,public')})
ORM_conn = connection.connect()

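# Note: clean_lat and clean_long are not created in this notebook; they are
# assumed to have been added to gps_complet by a separate outlier-cleaning step
query= """select id, timestampgps, clean_lat, clean_long 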
    from arctic.gps_complet 
    where clean_lat is not null and clean_long is not null
    order by id, timestampgps """
df = sql.read_sql_query(text(query), ORM_conn)

#x = df.loc[:, ['timestampgps']].values #timestampgps
#y = df.loc[:,['clean_lat']].values
x = df[df.id==3648].timestampgps.values
y = df[df.id==3648].clean_lat.values

#https://pypi.org/project/tsmoothie/
#https://fr.wikipedia.org/wiki/Fen%C3%AAtrage 
#Moving weighted average over a window of 20 points (window_len=20), using a Hamming window
smoother = ConvolutionSmoother(window_len=20, window_type='hamming')
smoother.smooth(y)

# generate intervals
low, up = smoother.get_intervals('sigma_interval', n_sigma=2)
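
To make the idea behind `ConvolutionSmoother` more concrete, here is a minimal pure-Python sketch of a moving average weighted by a Hamming window. It is not tsmoothie's implementation: the edge handling (shrinking the window at the borders) and the function names are assumptions made for this illustration.

```python
from math import cos, pi

def hamming_window(n):
    """Return the n Hamming weights w[k] = 0.54 - 0.46*cos(2*pi*k/(n-1))."""
    if n == 1:
        return [1.0]
    return [0.54 - 0.46 * cos(2 * pi * k / (n - 1)) for k in range(n)]

def smooth(values, window_len=5):
    """Weighted moving average of `values` using a Hamming window.

    Near the borders, only the part of the window that overlaps the data
    is used and the weights are renormalised.
    """
    weights = hamming_window(window_len)
    half = window_len // 2
    result = []
    for i in range(len(values)):
        total = weight_sum = 0.0
        for k, w in enumerate(weights):
            j = i + k - half
            if 0 <= j < len(values):
                total += w * values[j]
                weight_sum += w
        result.append(total / weight_sum)
    return result

# Hypothetical latitude series with one spike.
noisy = [70.0, 70.0, 70.0, 80.0, 70.0, 70.0, 70.0]
smoothed = smooth(noisy, window_len=5)
print([round(v, 2) for v in smoothed])
```

The central weights are the largest, so each smoothed value stays close to its own observation while isolated spikes are damped by their neighbours.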
 

In [19]:
#10. Visualize smoothed latitude points

# plot the smoothed timeseries with intervals
from bokeh.plotting import show, figure, output_file, output_notebook

#output_notebook() 
output_file("smoothed_data.html")

p = figure(width=1600, height=800, x_axis_type='datetime')

# add a line renderer for smoothed line
p.line(x, smoother.smooth_data[0], line_width =3, color='blue')
# raw points; scatter() is used because circle(size=...) is deprecated in recent Bokeh versions
p.scatter(x, smoother.data[0], size=3, fill_color="white")
# add an area between low and up smoothed data
p.varea(x=x,y1=low[0], y2=up[0], alpha=0.3)

show(p)

In [None]:
#11. Save the result of smoothing

import numpy as np
#df['smooth_lat'] = smoother.smooth_data[0]

df['smooth_lat'] = np.nan
#df[df.id==3648].smooth_lat = smoother.smooth_data[0]
df.loc[df.id==3648, 'smooth_lat'] = smoother.smooth_data[0]


In [None]:
import pandas as pd

df['smooth_long'] = np.nan
df['smooth_lat'] = np.nan

birds = pd.unique(df.id)
for bird_id in birds:
    print(bird_id)
    bird_rows = df.id == bird_id  # boolean mask selecting this bird's points
    smoother = ConvolutionSmoother(window_len=20, window_type='hamming')
    # Smooth latitudes
    smoother.smooth(df.loc[bird_rows, 'clean_lat'].values)
    df.loc[bird_rows, 'smooth_lat'] = smoother.smooth_data[0]
    # Smooth longitudes as well
    smoother.smooth(df.loc[bird_rows, 'clean_long'].values)
    df.loc[bird_rows, 'smooth_long'] = smoother.smooth_data[0]


In [15]:
# Save the result
connection = create_engine('postgresql://postgres:postgres@localhost:5432/savoie')
ORM_conn = connection.connect()
df.to_sql('gps_complet_smoothed', con=ORM_conn , schema='arctic', if_exists='replace', index=False)
ORM_conn.commit()
ORM_conn.close()

## 7. Visualize bird paths


Now rebuild the bird paths from the smoothed coordinates:

```SQL
create table bird_paths_smoothed as (
    -- the ORDER BY inside the aggregate guarantees chronological point order
    select id, st_makeline(gpspoint order by timestampgps) as linepath
    from (
        select id, st_setsrid(st_makepoint(smooth_long, smooth_lat), 4326) as gpspoint,
            timestampgps
        from gps_complet_smoothed
        where smooth_lat is not null and smooth_long is not null) as q
    group by id
    );

alter table bird_paths_smoothed add column migration_length float;
update bird_paths_smoothed set migration_length = round(st_length(linepath, true)/ 1000);
```

In [16]:
import geopandas as gpd
from pandas.io import sql
from sqlalchemy import create_engine, text as sql_text

query = """ SELECT id, migration_length, st_transform(linepath, 3857) as linepath from bird_paths_smoothed """
connection = create_engine('postgresql://postgres:postgres@localhost:5432/savoie', 
                           connect_args={'options': '-csearch_path={}'.format('arctic,public')})
ORM_conn = connection.connect()
data = gpd.GeoDataFrame.from_postgis(sql_text(query), ORM_conn, geom_col='linepath')

ORM_conn.close() #Close the connection
print(data.shape) #44, 3

(44, 3)
