# ```sampley``` exemplar: the point approach
Before going through this exemplar, please consult the Introduction to sampley exemplars (```intro.ipynb```).
<br>This exemplar applies the point approach to a single file (```trackpoints.csv```) of continuous datapoints (i.e., datapoints recorded at frequent, regular intervals) that can be joined to construct survey tracks.
<br>It differs from the standard point exemplar in that continuous datapoints allow certain procedures to be applied in Stage 3 that may be more efficient and precise.

## Setup

### Import the package

In [1]:
from sampley import *

### Set the input folder
To run this exemplar, download the mock data files, put them in a folder, and set the path to the folder below.

In [2]:
input_folder = './input/'

### Set the output folder
To run this exemplar, make a folder to save the outputs in and set the path to the folder below.

In [3]:
output_folder = './output/'

## Stage 1
In Stage 1, we import a single file (```trackpoints.csv```) to make a ```DataPoints``` object, from which we then make a ```Sections``` object.
<br>Although we use a CSV file in this exemplar, there are other options for file types (including XLSX, GPKG, and SHP files). Please see the Stage 1 exemplar (```stage-1.ipynb```) in the horizontal exemplars folder or the User Manual for more details. Note that, regardless of the input file type, once any ```DataPoints``` and/or ```Sections``` objects have been made, the subsequent processing will be the same.

In [4]:
u_trackpoints = DataPoints.from_file(
    filepath=input_folder+'trackpoints.csv',
    x_col='lon',
    y_col='lat',
    crs_input='EPSG:4326',
    crs_working='EPSG:32619',
    datetime_col='datetime',
    tz_input='UTC-05:00',
    section_id_col='section_id'
)

Success: file successfully input.
Success: x and y (lon/lat) coordinates successfully parsed.
Success: reprojected to CRS 'EPSG:32619'
Success: the column 'datetime' successfully reformatted to datetimes.
Success: the timezone of column 'datetime' successfully set to 'UTC-05:00'.
Success: datapoint IDs successfully generated.
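
To make the `tz_input='UTC-05:00'` setting more concrete, here is a minimal sketch using only the Python standard library. It assumes that the setting labels the recorded times with a fixed offset rather than shifting them; `parse_utc_offset` is a hypothetical helper and not part of `sampley`, and the timestamp is made up.

```python
from datetime import datetime, timedelta, timezone


def parse_utc_offset(label: str) -> timezone:
    """Hypothetical helper (not sampley API): turn 'UTC-05:00' into a fixed-offset timezone."""
    if not label.startswith('UTC') or label[3] not in '+-':
        raise ValueError(f'unexpected offset label: {label!r}')
    sign = 1 if label[3] == '+' else -1
    hours, minutes = label[4:].split(':')
    return timezone(sign * timedelta(hours=int(hours), minutes=int(minutes)))


# A made-up timestamp as it might appear in the file, with no timezone attached
naive = datetime(2019, 1, 25, 9, 30)

# Attach the offset without changing the clock time (assumed behaviour)
aware = naive.replace(tzinfo=parse_utc_offset('UTC-05:00'))

# The same instant expressed in UTC
as_utc = aware.astimezone(timezone.utc)

print(aware.isoformat())   # 2019-01-25T09:30:00-05:00
print(as_utc.isoformat())  # 2019-01-25T14:30:00+00:00
```

The clock time stays at 09:30 after the offset is attached; only the conversion to UTC moves it to 14:30.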


In [5]:
u_sections = Sections.from_datapoints(datapoints=u_trackpoints)

## Stage 2
In Stage 2, we use the ```DataPoints``` object containing sightings data to make a ```Presences``` object, which we then thin with a spatial threshold of 10000 m and a temporal threshold of 5 days.
<br>Then, we use that ```Presences``` object and the ```Sections``` object to make an ```AbsenceLines``` object with the same thresholds.
<br>Finally, we use the ```AbsenceLines``` object to make an ```Absences``` object which we also thin with the same thresholds.

In [6]:
u_presences = Presences.delimit(
    datapoints=u_trackpoints,
    presence_col='individuals')
u_presences.thin(
    sp_threshold=10000,
    tm_threshold=5,
    tm_unit='day')
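
To illustrate what spatiotemporal thinning does, here is a minimal sketch in plain Python. It is not the algorithm that `sampley` uses internally. It assumes that two points overlap only if they are within both the spatial and the temporal threshold of each other, and it keeps points greedily in input order. All coordinates and dates are made up.

```python
from datetime import date
from math import hypot

# Hypothetical records: (id, x in metres, y in metres, date); all values are made up
points = [
    ('p1', 0.0, 0.0, date(2019, 1, 25)),
    ('p2', 3000.0, 4000.0, date(2019, 1, 27)),  # 5 km and 2 days from p1
    ('p3', 3000.0, 4000.0, date(2019, 2, 10)),  # 5 km but 16 days from p1
    ('p4', 20000.0, 0.0, date(2019, 1, 25)),    # same day as p1 but 20 km away
]


def overlaps(a, b, sp_threshold, tm_threshold_days):
    """Assumed rule: two points overlap only if close in BOTH space and time."""
    close_in_space = hypot(a[1] - b[1], a[2] - b[2]) <= sp_threshold
    close_in_time = abs((a[3] - b[3]).days) <= tm_threshold_days
    return close_in_space and close_in_time


def thin(candidates, sp_threshold, tm_threshold_days):
    """Greedy thinning (an assumption): keep a point if it overlaps no point already kept."""
    kept = []
    for candidate in candidates:
        if not any(overlaps(candidate, k, sp_threshold, tm_threshold_days) for k in kept):
            kept.append(candidate)
    return kept


kept_ids = [p[0] for p in thin(points, sp_threshold=10000, tm_threshold_days=5)]
print(kept_ids)  # ['p1', 'p3', 'p4']
```

Here `p2` is dropped because it lies within 10000 m and 5 days of `p1`, while `p3` and `p4` are kept because each exceeds one of the two thresholds.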

In [7]:
u_absencelines = AbsenceLines.delimit(
    sections=u_sections,
    presences=u_presences,
    sp_threshold=10000,
    tm_threshold=5,
    tm_unit='day',
)

Note: absence lines to be generated with a temporal threshold of 5 day(s).


In [8]:
u_absences = Absences.delimit(
    absencelines=u_absencelines,
    var='along',
    target=20,
    dfls=None)
u_absences.thin(
    sp_threshold=10000,
    tm_threshold=5,
    tm_unit='day',
    target=9)

## Stage 3
In Stage 3, we make a ```Samples``` object from the ```DataPoints``` object, the ```Presences``` object, the ```Absences``` object, and the ```Sections``` object.
<br>_Note that, as the absence lines are made from sections that are made from datapoints, we can match those datapoints to the absences by their distance from the beginning of the absence lines in order to give values to the absences. To do so, we must input the sections (```sections=u_sections```)._

In [9]:
u_samples = Samples.point(
    datapoints=u_trackpoints,
    presences=u_presences,
    absences=u_absences,
    cols=['individuals', 'bss'],
    sections=u_sections)
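
The idea of matching by distance along a line can be sketched as follows. This is an illustration only, not `sampley`'s implementation: it assumes that each datapoint has a known distance from the start of its section and that an absence takes its values from the nearest datapoint along the line. The IDs and distances are made up, and `nearest_datapoint` is a hypothetical helper.

```python
from bisect import bisect_left

# Hypothetical distances (m) of each datapoint from the start of its section,
# sorted in ascending order; IDs and values are made up
datapoint_ids = ['d0001', 'd0002', 'd0003', 'd0004']
datapoint_dists = [0.0, 150.0, 310.0, 460.0]


def nearest_datapoint(absence_dist):
    """Return the ID of the datapoint nearest to a given distance along the line."""
    i = bisect_left(datapoint_dists, absence_dist)
    if i == 0:
        return datapoint_ids[0]
    if i == len(datapoint_dists):
        return datapoint_ids[-1]
    before, after = datapoint_dists[i - 1], datapoint_dists[i]
    # Ties go to the earlier datapoint (an arbitrary choice for this sketch)
    if after - absence_dist < absence_dist - before:
        return datapoint_ids[i]
    return datapoint_ids[i - 1]


print(nearest_datapoint(200.0))  # d0002
print(nearest_datapoint(400.0))  # d0004
```

Because both the datapoints and the absences are positioned along the same line, a one-dimensional lookup like this avoids a two-dimensional nearest-neighbour search, which is one reason such a procedure may be more efficient.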

## Output
Finally, we save the ```Samples``` object to the output folder.

In [10]:
u_samples.save(
    folder=output_folder,
    filetype='csv'
)

In the output folder, there should be two new CSVs: the first should have the same name as the ```Samples``` object (run the box below to see the name) while the second should also have this name but with ```-parameters``` added at the end.

In [11]:
u_samples.name

'samples-presences-trackpoints-+-absences-a-10000m-5day'
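
As a quick check, the two expected file paths can be built from the object name and tested for existence. This is a minimal sketch that assumes the files take the object name plus a `.csv` extension, as described above; the name is copied from the output of the previous box.

```python
from pathlib import Path

output_folder = Path('./output/')
samples_name = 'samples-presences-trackpoints-+-absences-a-10000m-5day'

# Assumed naming: '<name>.csv' for the samples and '<name>-parameters.csv' for the parameters
samples_path = output_folder / f'{samples_name}.csv'
parameters_path = output_folder / f'{samples_name}-parameters.csv'

for path in (samples_path, parameters_path):
    status = 'found' if path.exists() else 'missing'
    print(f'{path.name}: {status}')
```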

The first CSV should contain the samples, like those shown in the box below. 
<br>In this dataframe, each row represents a given presence or absence, i.e., a sample. 
<br>The column ```point``` delimits the location of the presence/absence.
<br>At the end are the data columns. In this particular example, they are ```p-a``` (presence-absence), ```individuals```, and ```bss```.

In [12]:
u_samples.samples

| | point_id | point | date | p-a | individuals | bss | datapoint_id |
|---|---|---|---|---|---|---|---|
| 0 | p01 | POINT (579166.78 4742872.701) | 2019-01-25 | 1 | 1.0 | 2 | d0004 |
| 1 | p03 | POINT (548599.876 4742700.214) | 2019-01-25 | 1 | 5.0 | 2 | d0082 |
| 2 | p04 | POINT (520909.741 4714855.058) | 2019-02-02 | 1 | 1.0 | 1 | d0480 |
| 3 | p05 | POINT (532548.249 4714899.835) | 2019-02-02 | 1 | 2.0 | 1 | d0510 |
| 4 | p06 | POINT (512817.407 4705582.465) | 2019-02-02 | 1 | 1.0 | 1 | d0910 |
| 5 | p08 | POINT (654449.136 4716189.584) | 2019-02-05 | 1 | 5.0 | 1 | d1306 |
| 6 | p10 | POINT (643532.681 4716066.52) | 2019-02-05 | 1 | 1.0 | 2 | d1336 |
| 7 | p11 | POINT (629124.489 4706545.106) | 2019-02-05 | 1 | 3.0 | 2 | d1527 |
| 8 | p12 | POINT (620560.818 4697116.949) | 2019-02-05 | 1 | 2.0 | 3 | d1715 |
| 9 | a01 | POINT (566513.11 4742813.278) | 2019-01-25 | 0 | | 2 | d0036 |
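
When the samples CSV is read back in, the values in the ```point``` column are text such as `POINT (579166.78 4742872.701)`. As a minimal sketch, assuming simple two-dimensional points written in this form, the coordinates can be extracted with a regular expression; `parse_point` is a hypothetical helper, and a dedicated parser such as `shapely.wkt.loads` would be more robust.

```python
import re

# Matches simple 2D points of the form 'POINT (x y)'
POINT_PATTERN = re.compile(r'POINT \(([-\d.]+) ([-\d.]+)\)')


def parse_point(wkt):
    """Hypothetical helper: return (x, y) as floats from a 'POINT (x y)' string."""
    match = POINT_PATTERN.fullmatch(wkt.strip())
    if match is None:
        raise ValueError(f'not a simple 2D point: {wkt!r}')
    return float(match.group(1)), float(match.group(2))


x, y = parse_point('POINT (579166.78 4742872.701)')
print(x, y)  # 579166.78 4742872.701
```

The coordinates are in the working CRS set in Stage 1 (```EPSG:32619```), so they are in metres rather than degrees.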


The second CSV should contain the parameters, like those shown in the box below (but arranged in a table). This information may prove useful if, later, we need to know how the samples were generated.

In [13]:
u_samples.parameters

{'approach': 'point',
 'resampled': 'datapoints',
 'presences_name': 'presences-trackpoints',
 'presences_crs': 'EPSG:32619',
 'presences_sp_threshold': 10000,
 'presences_tm_threshold': 5,
 'presences_tm_unit': 'day',
 'absences_name': 'absences-a-10000m-5day',
 'absences_var': 'along',
 'absences_target': 20,
 'absences_crs': 'EPSG:32619',
 'absences_sp_threshold': 10000,
 'absences_tm_threshold': 5,
 'absences_tm_unit': 'day'}