Getting started

Using mosey_db

This section assumes you have a mosey.db sqlite file. See below if you want to create and populate a new database.

library(DBI)
library(dplyr)
library(RSQLite)

.wd <- '~/projects/mycoolproject/analysis'
.dbPF <- file.path(.wd,'data/mosey.db')

db <- DBI::dbConnect(RSQLite::SQLite(), .dbPF)

invisible(assert_that(length(dbListTables(db))>0))

stdtb <- tbl(db,'study')
indtb <- tbl(db,'individual')
evttb <- tbl(db,'event')

Each example below shows you how to run a query on the database using dplyr, and the equivilent syntax using sql.

Get info on all studies in the database

dplyr

stdtb %>% select(study_id,study_name) %>% as_tibble

sql

'select study_id, study_name from study' %>%
  dbGetQuery(db,.)

How many individuals are in the database?

Note the differences between these two approaches. The first pulls all data into R and then counts the rows. The second counts the rows in the database and just returns a single number.

dplyr

indtb %>% as_tibble %>% nrow

sql

'select count(*) as num from individual' %>%
  dbGetQuery(db,.)

How many individuals per study?

dplyr

indtb %>% group_by(study_id) %>% summarize(num=n())

sql

'select study_id, count(*) as num 
  from individual 
  group by study_id' %>%
  dbGetQuery(db,.)

How many locations per individual, for a particular study?

dplyr

evttb %>% 
  inner_join(indtb,by='individual_id') %>% 
  filter(study_id==12345) %>% 
  group_by(individual_id) %>%
  summarize(num=n())

sql

'select i.individual_id, count(*) as num
  from event e 
  inner join individual i
  on e.individual_id = i.individual_id
  where study_id = 12345
  group by i.individual_id' %>%
  dbGetQuery(db,.)

Joining to a really large event table can take a long time. An alternative way to run this query is to use an sql subquery. I dont think this approach is possible using dplyr.

'select individual_id, count(*) as num
  from event e 
  where individual_id in (
    select individual_id from individual where study_id = 12345
  )
  group by individual_id' %>%
  dbGetQuery(db,.)

Find start and end dates, per project

dplyr

evttb %>% 
  inner_join(indtb,by='individual_id') %>% 
  group_by(study_id) %>%
  summarize(min=min(timestamp), max=(timestamp))

sql

'select i.individual_id, min(timestamp) as min, max(timestamp) as max
  from event e 
  inner join individual i
  on e.individual_id = i.individual_id
  group by i.study_id' %>%
  dbGetQuery(db,.)

Get all data between start and end dates, for all projects

dplyr

evttb %>% 
  filter(between(timestamp,'2019-09-01','2019-10-01'))

sql

'select * from event 
  where timestamp between "2019-09-01" and "2019-10-01"' %>%
  dbGetQuery(db,.)

Create and populate a new movebankdb

Create a new database

See wf_create_db.sh for example script.

Set up the environment. Set MOSEYDB_SRC to the location of the mosey_db source code

wd=~/projects/myproj/analysis/mosey_db
export MOSEYDB_SRC=~/projects/mosey_db/src

cd $wd

mkdir -p data

Create the database

cat $MOSEYDB_SRC/db/create_db.sql | sqlite3 data/mosey.db

You can already view the database, even without any data. One great option is to use db browser

Populate the database

The mosey_db scripts will download data from movebank, then clean and load the data into the database.

Auth file

In order to access movebank, make a file called auth.yml and put it into the working directory. Add your movebank credentials to this file. See examples/auth.yml for an example.

Study file

You specify which studies to download by creating a file in ctfs/study.csv. See examples/study.csv for an example. Use the field run to specify whether a study should be processed. Changing the value to a 1 will process the study and a 0 will ignore the study.

Load script

Run the load studies script to download, clean, and import data.

$MOSEYDB_SRC/db/load_studies.sh

The raw data is downloaded into the ./<study_id>/raw directory. After it is cleaned the data is saved into the ./<study_id>/clean directory. mosey_db then loads the clean data into the database.

You can specify a different directory to hold the raw and clean csvs (e.g. an external disk)

csvdir=/ExternalDrive/projects/myproj/analysis/mosey_db/csvs

$MOSEYDB_SRC/db/load_studies.sh --csvdir $csvdir

You can also specify a different directory to hold the database

db=/ExternalDrive/projects/myproj/analysis/mosey_db/data/mosey.db

$MOSEYDB_SRC/db/load_studies.sh --db $db

You can specify which of the subprocesses to run. To specify which subprocesses to run, use the process parameter with the string 'dciv' (d=download, c=clean, i=import, v=validate).

For example, you might have already downloaded the data so you just need to clean, import and validate. In this case do the following.

$MOSEYDB_SRC/db/load_studies.sh --process civ

TOOD: examples using seperate download, clean, import, validate scripts.

Name		Name	Last commit message	Last commit date
Latest commit History 27 Commits
db		db
examples		examples
figs		figs
funs		funs
poc		poc
.gitignore		.gitignore
Readme.md		Readme.md
fields.csv		fields.csv
project_notes.md		project_notes.md
sensor_type.csv		sensor_type.csv
startup.r		startup.r

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Getting started

Using mosey_db

Get info on all studies in the database

dplyr

sql

How many individuals are in the database?

dplyr

sql

How many individuals per study?

dplyr

sql

How many locations per individual, for a particular study?

dplyr

sql

Find start and end dates, per project

dplyr

sql

Get all data between start and end dates, for all projects

dplyr

sql

Create and populate a new movebankdb

Create a new database

Populate the database

Auth file

Study file

Load script

About

Releases

Packages

Languages

benscarlson/mosey_db

Folders and files

Latest commit

History

Repository files navigation

Getting started

Using mosey_db

Get info on all studies in the database

dplyr

sql

How many individuals are in the database?

dplyr

sql

How many individuals per study?

dplyr

sql

How many locations per individual, for a particular study?

dplyr

sql

Find start and end dates, per project

dplyr

sql

Get all data between start and end dates, for all projects

dplyr

sql

Create and populate a new movebankdb

Create a new database

Populate the database

Auth file

Study file

Load script

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages