# ```sampley``` exemplar: the point approach
Before going through this exemplar, please consult the Introduction to sampley exemplars (```intro.ipynb```).
<br>This exemplar illustrates an application of the point approach to data contained within two files: one containing survey tracks (```sections.gpkg```) and one containing sightings data (```sightings.csv```).

## Setup

### Import the package

In [1]:
from sampley import *

### Set the input folder
To run this exemplar, download the mock data files, put them in a folder, and set the path to the folder below.

In [2]:
input_folder = './input/'

### Set the output folder
To run this exemplar, make a folder to save the outputs in and set the path to the folder below.

In [3]:
output_folder = './output/'

## Stage 1
In Stage 1, we import two files (```sightings.csv``` and ```sections.gpkg```) and from them make a ```DataPoints``` and a ```Sections``` object, respectively.
<br>Although we use a CSV file and a GPKG file in this exemplar, there are other options for file types (including XLSX and SHP files). Please see the Stage 1 exemplar (```stage-1.ipynb```) in the horizontal exemplars folder or the User Manual for more details. Note that, regardless of the input file type, once any ```DataPoints``` and/or ```Sections``` objects have been made, the subsequent processing will be the same.

In [4]:
u_sightings = DataPoints.from_file(
    filepath=input_folder+'sightings.csv',
    x_col='lon',
    y_col='lat',
    crs_input='EPSG:4326',
    crs_working='EPSG:32619',
    datetime_col='datetime',
    tz_input='UTC-05:00'
)

Success: file successfully input.
Success: x and y (lon/lat) coordinates successfully parsed.
Success: reprojected to CRS 'EPSG:32619'
Success: the column 'datetime' successfully reformatted to datetimes.
Success: the timezone of column 'datetime' successfully set to 'UTC-05:00'.
Success: datapoint IDs successfully generated.
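
The value ```'UTC-05:00'``` passed as ```tz_input``` denotes a fixed offset of five hours behind UTC. The sketch below uses only the Python standard library, not ```sampley```, to illustrate what attaching such an offset to a timestamp means. The time of day is hypothetical, and how ```sampley``` handles timezones internally is not shown here.

```python
from datetime import datetime, timedelta, timezone

# Fixed offset five hours behind UTC, written 'UTC-05:00' in the cell above.
UTC_MINUS_5 = timezone(timedelta(hours=-5), name='UTC-05:00')

# A hypothetical naive timestamp, as it might appear in the 'datetime' column.
naive = datetime(2019, 1, 25, 9, 30)

# Attaching the offset labels the clock time; it does not shift it.
local = naive.replace(tzinfo=UTC_MINUS_5)

# Converting to UTC moves the clock time forward by five hours.
as_utc = local.astimezone(timezone.utc)

print(local.isoformat())   # 2019-01-25T09:30:00-05:00
print(as_utc.isoformat())  # 2019-01-25T14:30:00+00:00
```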


In [5]:
u_sections = Sections.from_file(
    filepath=input_folder+'sections.gpkg',
    crs_working='EPSG:32619',
    datetime_col='datetime_beg',
    tz_input='UTC-05:00'
)

Success: file successfully input.
Success: reprojected to CRS 'EPSG:32619'
Success: the column 'datetime_beg' successfully reformatted to datetimes.
Success: the timezone of column 'datetime_beg' successfully set to 'UTC-05:00'.
Note: column 'datetime_beg' renamed to 'datetime'.
Success: section IDs successfully generated.


## Stage 2
In Stage 2, we use the ```DataPoints``` object containing sightings data to make a ```Presences``` object, which we thin with a spatial threshold of 10000 m and a temporal threshold of 5 days.
<br>Then, we use that ```Presences``` object and the ```Sections``` object to make an ```AbsenceLines``` object with the same thresholds.
<br>Finally, we use the ```AbsenceLines``` object to make an ```Absences``` object, which we thin with the same thresholds and with a target equal to the number of presences kept after thinning.

In [6]:
u_presences = Presences.delimit(datapoints=u_sightings)
u_presences.thin(
    sp_threshold=10000,
    tm_threshold=5,
    tm_unit='day')
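
To make the two thresholds concrete, here is a minimal standalone sketch of spatiotemporal thinning. It is not the ```sampley``` implementation: it assumes a greedy rule in which a point is dropped only if it lies within both the spatial and the temporal threshold of a point already kept, and it assumes the comparisons are inclusive. The function ```thin_points``` and the points ```h1``` to ```h4``` are hypothetical.

```python
from datetime import date
from math import hypot

def thin_points(points, sp_threshold, tm_threshold_days):
    """Greedy thinning: drop a point if it is within BOTH thresholds of a kept point."""
    kept = []
    for pid, x, y, day in points:
        too_close = any(
            hypot(x - kx, y - ky) <= sp_threshold
            and abs((day - kday).days) <= tm_threshold_days
            for _, kx, ky, kday in kept
        )
        if not too_close:
            kept.append((pid, x, y, day))
    return kept

# Hypothetical points: (id, x in metres, y in metres, date).
points = [
    ('h1', 0.0, 0.0, date(2019, 1, 25)),
    ('h2', 3000.0, 4000.0, date(2019, 1, 26)),  # 5 km and 1 day from h1: dropped
    ('h3', 3000.0, 4000.0, date(2019, 2, 5)),   # 5 km but 11 days from h1: kept
    ('h4', 30000.0, 0.0, date(2019, 1, 25)),    # same day but 30 km from h1: kept
]

kept = thin_points(points, sp_threshold=10000, tm_threshold_days=5)
print([p[0] for p in kept])  # ['h1', 'h3', 'h4']
```

Because the rule is greedy, the result can depend on the order in which the points are visited.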

In [7]:
u_absencelines = AbsenceLines.delimit(
    sections=u_sections,
    presences=u_presences,
    sp_threshold=10000,
    tm_threshold=5,
    tm_unit='day',
)

Note: absence lines to be generated with a temporal threshold of 5 day(s).


In [8]:
u_absences = Absences.delimit(
    absencelines=u_absencelines,
    var='along',
    target=20)
u_absences.thin(
    sp_threshold=10000,
    tm_threshold=5,
    tm_unit='day',
    target=len(u_presences.kept))
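
As a rough illustration of generating candidate absences, the sketch below places points at random distances along a polyline. It rests on two assumptions that the exemplar does not confirm: that ```var='along'``` means points are placed along the absence lines, and that ```target``` is the number of points to generate. The helpers ```point_along```, ```line_length```, and ```random_points_along``` and the example line are hypothetical and are not part of ```sampley```.

```python
import random
from math import hypot

def line_length(line):
    """Total length of a polyline given as a list of (x, y) vertices."""
    return sum(hypot(x1 - x0, y1 - y0) for (x0, y0), (x1, y1) in zip(line, line[1:]))

def point_along(line, distance):
    """Return the (x, y) point at `distance` from the start of the polyline."""
    for (x0, y0), (x1, y1) in zip(line, line[1:]):
        seg = hypot(x1 - x0, y1 - y0)
        if seg > 0 and distance <= seg:
            t = distance / seg
            return (x0 + t * (x1 - x0), y0 + t * (y1 - y0))
        distance -= seg
    return line[-1]  # distance beyond the end: clamp to the last vertex

def random_points_along(line, n, rng):
    """Draw n points at uniformly random distances along the polyline."""
    total = line_length(line)
    return [point_along(line, rng.uniform(0, total)) for _ in range(n)]

# Hypothetical L-shaped line, 20 km long, in a projected CRS (metres).
line = [(0.0, 0.0), (10000.0, 0.0), (10000.0, 10000.0)]

candidates = random_points_along(line, n=20, rng=random.Random(0))
print(len(candidates))               # 20
print(point_along(line, 15000.0))    # (10000.0, 5000.0)
```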

## Stage 3
In Stage 3, we make a ```Samples``` object from the ```DataPoints``` object, the ```Presences``` object, and the ```Absences``` object.

In [9]:
u_samples = Samples.point(
    datapoints=u_sightings,
    presences=u_presences,
    absences=u_absences,
    cols=['individuals'])

## Output
Finally, we save the ```Samples``` object to the output folder.

In [10]:
u_samples.save(
    folder=output_folder,
    filetype='csv'
)

In the output folder, there should be two new CSVs: the first should have the same name as the ```Samples``` object (run the box below to see the name) while the second should also have this name but with ```-parameters``` added at the end.

In [11]:
u_samples.name

'samples-presences-sightings-+-absences-a-10000m-5day'
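
One way to check for the two files is to build their expected paths from the name shown above. This is a small sketch using only the standard library. The helper ```expected_outputs``` is hypothetical, and the ```.csv``` extension is assumed from ```filetype='csv'```.

```python
from pathlib import Path

def expected_outputs(folder, name):
    """Build the paths of the samples CSV and the parameters CSV for a given name."""
    folder = Path(folder)
    return folder / f'{name}.csv', folder / f'{name}-parameters.csv'

samples_csv, parameters_csv = expected_outputs(
    './output/', 'samples-presences-sightings-+-absences-a-10000m-5day')

# Report whether each expected file is present in the output folder.
for path in (samples_csv, parameters_csv):
    print(path.name, 'found' if path.exists() else 'missing')
```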

The first CSV should contain the samples, like those shown in the box below.
<br>In this dataframe, each row represents a given presence or absence, i.e., a sample.
<br>The column ```point``` gives the location of the presence/absence.
<br>At the end are the data columns. In this particular example, they are ```p-a``` (presence-absence) and ```individuals```.

In [12]:
u_samples.samples

Unnamed: 0,point_id,point,date,datapoint_id,p-a,individuals
0,p01,POINT (579166.78 4742872.701),2019-01-25,d01,1,1.0
1,p03,POINT (548599.876 4742700.214),2019-01-25,d03,1,5.0
2,p04,POINT (520909.741 4714855.058),2019-02-02,d04,1,1.0
3,p05,POINT (532548.249 4714899.835),2019-02-02,d05,1,2.0
4,p07,POINT (504710.41 4705553.392),2019-02-02,d07,1,3.0
5,p08,POINT (654449.136 4716189.584),2019-02-05,d08,1,5.0
6,p10,POINT (643532.681 4716066.52),2019-02-05,d10,1,1.0
7,p11,POINT (629124.489 4706545.106),2019-02-05,d11,1,3.0
8,p13,POINT (611976.857 4696974.111),2019-02-05,d13,1,4.0
9,a01,POINT (528139.94 4742605.103),2019-01-25,,0,
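
Once saved, the samples can be read back with any CSV reader. The sketch below uses the standard library on three rows copied from the output above, with the unnamed index column left out, and separates presences from absences via the ```p-a``` column. The exact layout of the saved file may differ.

```python
import csv
import io

# Three rows copied from the output above (an excerpt, not the full file).
excerpt = """point_id,point,date,datapoint_id,p-a,individuals
p01,POINT (579166.78 4742872.701),2019-01-25,d01,1,1.0
p03,POINT (548599.876 4742700.214),2019-01-25,d03,1,5.0
a01,POINT (528139.94 4742605.103),2019-01-25,,0,
"""

rows = list(csv.DictReader(io.StringIO(excerpt)))
presences = [r for r in rows if r['p-a'] == '1']
absences = [r for r in rows if r['p-a'] == '0']

# Absences have no sighting, so their 'individuals' field is empty.
total_individuals = sum(float(r['individuals']) for r in presences)

print(len(presences), len(absences), total_individuals)  # 2 1 6.0
```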


The second CSV should contain the parameters, like those shown in the box below (but arranged in a table). This information may prove useful if we later need to know how the samples were generated.

In [13]:
u_samples.parameters

{'approach': 'point',
 'resampled': 'datapoints',
 'presences_name': 'presences-sightings',
 'presences_crs': 'EPSG:32619',
 'presences_sp_threshold': 10000,
 'presences_tm_threshold': 5,
 'presences_tm_unit': 'day',
 'absences_name': 'absences-a-10000m-5day',
 'absences_var': 'along',
 'absences_target': 20,
 'absences_crs': 'EPSG:32619',
 'absences_sp_threshold': 10000,
 'absences_tm_threshold': 5,
 'absences_tm_unit': 'day'}
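
For illustration, a dictionary like the one above can be arranged as a two-column table as follows. This is a sketch with the standard library. The header names ```parameter``` and ```value``` are hypothetical, and the layout that ```sampley``` actually writes may differ.

```python
import csv
import io

# A subset of the parameters shown above.
parameters = {
    'approach': 'point',
    'resampled': 'datapoints',
    'presences_sp_threshold': 10000,
    'presences_tm_threshold': 5,
    'presences_tm_unit': 'day',
}

# Write one row per parameter into an in-memory CSV.
buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator='\n')
writer.writerow(['parameter', 'value'])  # hypothetical header names
writer.writerows(parameters.items())

table = buffer.getvalue()
print(table)
```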