**Personal Information** <br>
Student: *Ruun Streur* <br>
StudentID: *12751901* <br>
Email: *ruun.streur@student.uva.nl* <br>
Project: *MSc Data Science Thesis: Assessing the optimal temporal resolution for the Camtrap DP standard - Exploratory Data Analysis* <br>
Supervisor: *Dr. rer. nat. W.D. Kissling* <br>
Submission date: *22/03/2024* <br>
*Universiteit van Amsterdam*


**Data Context**

Due to the nature of the project, this EDA differs slightly from a standard data science EDA (as exemplified by https://www.kaggle.com/code/ekami66/detailed-exploratory-data-analysis-with-python ). It describes the basic structure of the Camtrap DP standard format for camera trap data and analyses our own dataset from the Artis Zoo in Amsterdam. <br>
Camera Trap Data Package (Camtrap DP for short) is a community-developed data exchange format for camera trap data. A Camtrap DP is a Frictionless Data Package that consists of: <br>
*datapackage.json*: Metadata about the data package and camera trap project.<br>
*deployments.csv*: Table with camera trap placements (deployments).<br>
*media.csv*: Table with media files recorded during deployments.<br>
*observations.csv*: Table with observations derived from the media files.<br>

This EDA uses only the deployments and observations tables.
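To make the package structure concrete, the sketch below lists the tables declared in a Data Package descriptor. The helper `list_resources` and the trimmed `example_descriptor` are hypothetical and only illustrate the `resources` array that a Frictionless Data Package descriptor carries; a real *datapackage.json* contains many more fields.

```python
import json


def list_resources(descriptor):
    """Return {resource name: path} for a Data Package descriptor dict."""
    return {res["name"]: res.get("path") for res in descriptor.get("resources", [])}


# Hypothetical, heavily trimmed descriptor used only for illustration.
example_descriptor = json.loads("""
{
  "resources": [
    {"name": "deployments", "path": "deployments.csv"},
    {"name": "media", "path": "media.csv"},
    {"name": "observations", "path": "observations.csv"}
  ]
}
""")

resources = list_resources(example_descriptor)
print(resources)
```

With the actual package, the descriptor could instead be read with `json.load(open('datapackage.json'))`, assuming the file sits next to the CSV files.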

In [1]:
# Library imports and plotstyle

import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

plt.style.use('bmh')

In [2]:
# Data import into variables

observations = pd.read_csv('observations.csv')
deployments = pd.read_csv('deployments.csv')

print('Length of observations.csv: ', len(observations))
print('Length of deployments.csv: ', len(deployments))


Length of observations.csv:  36543
Length of deployments.csv:  21


First, we examine the contents of the `observations` DataFrame.

In [3]:
# Non-null information and structure of observations df
observations.info()  # info() prints its summary itself and returns None
observations.head()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 36543 entries, 0 to 36542
Data columns (total 22 columns):
 #   Column                     Non-Null Count  Dtype  
---  ------                     --------------  -----  
 0   observationID              36543 non-null  object 
 1   deploymentID               36543 non-null  object 
 2   mediaID                    4892 non-null   object 
 3   eventID                    31651 non-null  object 
 4   eventStart                 31651 non-null  object 
 5   eventEnd                   31651 non-null  object 
 6   observationLevel           36543 non-null  object 
 7   observationType            36543 non-null  object 
 8   scientificName             22983 non-null  object 
 9   count                      36543 non-null  int64  
 10  lifeStage                  11520 non-null  object 
 11  sex                        11888 non-null  object 
 12  behavior                   0 non-null      float64
 13  individualID               0 non-null      flo

Unnamed: 0,observationID,deploymentID,mediaID,eventID,eventStart,eventEnd,observationLevel,observationType,scientificName,count,...,behavior,individualID,bboxX,bboxY,bboxWidth,bboxHeight,classificationMethod,classifiedBy,classificationProbability,observationComments
0,obs_20220801001635_artis_18_wildlifecamera1_20...,artis_18_wildlifecamera1,,artis_18_wildlifecamera1_2022-08-01_00-16-35_(...,2022-07-31T22:16:35+00Z,2022-07-31T22:16:43+00Z,event,blank,,0,...,,,,,,,human,Jitske Schreijer,,
1,obs_20220801001635_artis_18_wildlifecamera1_20...,artis_18_wildlifecamera1,,artis_18_wildlifecamera1_2022-08-01_00-16-35_(...,2022-07-31T22:16:35+00Z,2022-07-31T22:16:43+00Z,event,blank,,0,...,,,,,,,human,Jitske Schreijer,,
2,obs_20220801003924_artis_18_wildlifecamera1_20...,artis_18_wildlifecamera1,,artis_18_wildlifecamera1_2022-08-01_00-39-24_(...,2022-07-31T22:39:24+00Z,2022-07-31T22:39:32+00Z,event,blank,,0,...,,,,,,,machine,Western Europe species model Version 2 (deprec...,,
3,obs_20220801120024_artis_18_wildlifecamera1_20...,artis_18_wildlifecamera1,artis_18_wildlifecamera1_2022-08-01_12-00-24_(12),,,,media,blank,,0,...,,,,,,,human,Jitske Schreijer,,
4,obs_20220802120027_artis_18_wildlifecamera1_20...,artis_18_wildlifecamera1,artis_18_wildlifecamera1_2022-08-02_12-00-27_(13),,,,media,blank,,0,...,,,,,,,machine,Western Europe species model Version 2 (deprec...,,


As the output shows, many columns consist largely of NaN entries. Most notably, 'classificationProbability', 'individualID' and 'behavior' contain only NaN entries, and the bbox columns are mostly empty. The remaining columns, although not all complete, contain information that may be of interest.
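The share of missing values per column can also be quantified directly. The following is a minimal sketch: the helper name `missing_summary` and the toy table are made up for illustration, and in this notebook the function would be called as `missing_summary(observations)`.

```python
import pandas as pd


def missing_summary(df):
    """Return per-column NaN counts and fractions, most incomplete columns first."""
    summary = pd.DataFrame({
        "n_missing": df.isna().sum(),
        "frac_missing": df.isna().mean(),
    })
    return summary.sort_values("frac_missing", ascending=False)


# Toy stand-in for the observations table (values are invented for illustration).
toy = pd.DataFrame({
    "observationID": ["a", "b", "c", "d"],
    "scientificName": ["Vulpes vulpes", None, None, "Rattus norvegicus"],
    "behavior": [None, None, None, None],
})

toy_summary = missing_summary(toy)
print(toy_summary)
```

Sorting by the missing fraction puts fully empty columns at the top, which makes it easy to decide which columns to drop before further analysis.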

Column explanations:

**Observations**
|Column|Definition|
|---|---|
|observationID |Unique identifier of the observation.|
|deploymentID|Identifier of the deployment the observation belongs to.|
|mediaID|Identifier of the media file that was classified.|
|eventID|Identifier of the event the observation belongs to.|
|eventStart and eventEnd|Date and time at which the event started/ended.|
|observationType|Type of the observation.|
|scientificName|Scientific name of the observed individual(s).|
|count|Number of observed individuals.|
|lifeStage|Age class or life stage of the observed individual.|
|sex|Sex of the observed individual.|
|classificationMethod|Method (most recently) used to classify the observation.|
|classifiedBy|Name or identifier of the person or AI that classified the observation.|
|observationComments|Comments or notes about the observation.|


In [4]:
# Non-null information and structure of deployments df
deployments.info()  # info() prints its summary itself and returns None
deployments.head()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 21 entries, 0 to 20
Data columns (total 15 columns):
 #   Column                 Non-Null Count  Dtype  
---  ------                 --------------  -----  
 0   deploymentID           21 non-null     object 
 1   locationID             0 non-null      float64
 2   latitude               21 non-null     float64
 3   longitude              21 non-null     float64
 4   deploymentStart        21 non-null     object 
 5   deploymentEnd          21 non-null     object 
 6   cameraID               21 non-null     object 
 7   cameraModel            21 non-null     object 
 8   coordinateUncertainty  15 non-null     float64
 9   cameraHeight           20 non-null     float64
 10  cameraHeading          20 non-null     float64
 11  baitUse                21 non-null     bool   
 12  habitatType            0 non-null      float64
 13  deploymentTags         0 non-null      float64
 14  deploymentComments     0 non-null      float64
dtypes: bool(

Unnamed: 0,deploymentID,locationID,latitude,longitude,deploymentStart,deploymentEnd,cameraID,cameraModel,coordinateUncertainty,cameraHeight,cameraHeading,baitUse,habitatType,deploymentTags,deploymentComments
0,artis_27_wildlifecamera1,,52.364714,4.918863,2021-10-04T12:00:00+0200Z,2023-12-07T12:32:00+0100Z,SY2103000161,SnyperCommander4G-SY4.0CG-Rtimelapsecamera,7.521,0.2,180.0,False,,,
1,artis_22_wildlifecamera1,,52.366402,4.91604,2021-08-19T16:38:00+0200Z,2022-12-31T00:00:00+0100Z,SY2103000164,SnyperCommander4G-SY4.0CG-Rwildlifecamera,0.3405,0.2,90.0,False,,,
2,artis_18_wildlifecamera1,,52.365167,4.918036,2021-08-20T15:53:00+0200Z,2024-01-15T22:08:18+0100Z,SY2103000168,SnyperCommander4G-SY4.0CG-Rwildlifecamera,0.7012,0.2,45.0,False,,,
3,artis_24_wildlifecamera1,,52.365634,4.916194,2021-08-20T15:28:00+0200Z,2023-12-07T12:35:00+0100Z,SY2103000169,SnyperCommander4G-SY4.0CG-Rwildlifecamera,0.4613,0.2,315.0,False,,,
4,artis_20_wildlifecamera1,,52.367182,4.914594,2021-08-21T05:49:00+0200Z,2022-12-06T12:05:00+0100Z,SY2103000170,SnyperCommander4G-SY4.0CG-Rwildlifecamera,0.1966,0.2,90.0,False,,,


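As the output shows, four columns ('locationID', 'habitatType', 'deploymentTags' and 'deploymentComments') contain only NaN entries. In addition, 'coordinateUncertainty' is missing 6 of its 21 values, and 'cameraHeight' and 'cameraHeading' are each missing one value.
The dataset contains 21 camera deployments.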

In [None]:
CAMERA PLACEMENT
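
One possible way to approach the camera placement overview is a simple scatter plot of the deployment coordinates. This is only a sketch: the function name `plot_camera_positions` is hypothetical, and the three sample rows are copied from the `deployments.head()` output above so that the block runs on its own. In the notebook it would be called on the full `deployments` table.

```python
import pandas as pd
import matplotlib.pyplot as plt


def plot_camera_positions(df):
    """Scatter the deployment coordinates and label each point with its deploymentID."""
    fig, ax = plt.subplots(figsize=(6, 6))
    ax.scatter(df["longitude"], df["latitude"])
    for _, row in df.iterrows():
        ax.annotate(row["deploymentID"], (row["longitude"], row["latitude"]), fontsize=7)
    ax.set_xlabel("Longitude (decimal degrees)")
    ax.set_ylabel("Latitude (decimal degrees)")
    ax.set_title("Camera trap positions")
    return ax


# Three rows taken from the deployments.head() output shown earlier.
sample = pd.DataFrame({
    "deploymentID": [
        "artis_27_wildlifecamera1",
        "artis_22_wildlifecamera1",
        "artis_18_wildlifecamera1",
    ],
    "latitude": [52.364714, 52.366402, 52.365167],
    "longitude": [4.918863, 4.91604, 4.918036],
})

ax = plot_camera_positions(sample)
```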

In [None]:
DEPLOYMENT DURATIONS
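
Deployment durations could be derived from 'deploymentStart' and 'deploymentEnd', for example as sketched below. The function name `deployment_durations` is hypothetical. The sketch assumes that every timestamp follows the pattern seen in the `deployments.head()` output, where a trailing "Z" follows the UTC offset (e.g. "+0200Z"); that trailing character is stripped before parsing. The two sample rows are copied from that output.

```python
import pandas as pd


def deployment_durations(df):
    """Return the length of each deployment in days, indexed by deploymentID."""
    fmt = "%Y-%m-%dT%H:%M:%S%z"
    # Assumption: timestamps look like "2021-10-04T12:00:00+0200Z"; the trailing
    # "Z" after the offset is removed so that %z can parse the offset.
    start = pd.to_datetime(df["deploymentStart"].str.rstrip("Z"), format=fmt, utc=True)
    end = pd.to_datetime(df["deploymentEnd"].str.rstrip("Z"), format=fmt, utc=True)
    days = (end - start).dt.total_seconds() / 86400
    days.index = df["deploymentID"].to_numpy()
    return days.rename("duration_days")


# Two rows taken from the deployments.head() output shown earlier.
sample = pd.DataFrame({
    "deploymentID": ["artis_27_wildlifecamera1", "artis_22_wildlifecamera1"],
    "deploymentStart": ["2021-10-04T12:00:00+0200Z", "2021-08-19T16:38:00+0200Z"],
    "deploymentEnd": ["2023-12-07T12:32:00+0100Z", "2022-12-31T00:00:00+0100Z"],
})

durations = deployment_durations(sample)
print(durations)
```

Converting both timestamps to UTC before subtracting keeps the durations correct across the change between summer and winter time.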