# PNAS paper submission analysis

Analysing the wind characteristics of streaked shearwaters as they approach foraging points allows us to examine what conditions the birds travel in throughout foraging trips. 

## Previous study

In the most similar previous study ([Nevitt et al., 2008](https://www.pnas.org/content/105/12/4576)) investigating similar phenomena, the tracks of albatross approaching foraging points were examined and classified by their characteristics (in 4 groups: Direct, Turn, Zigzag, and Circle):

<p align="center">
   <img src=https://www.pnas.org/cms/10.1073/pnas.0709047105/asset/eb810a4f-38f4-4a23-beee-eb585774c043/assets/graphic/zpq0080896790001.jpeg alt="Albatross tracks approaching foraging" width="400">
</p>

And the histograms of relative wind bearings were examined for those categories:

 <p align="center">
    <img src=https://www.pnas.org/cms/10.1073/pnas.0709047105/asset/7d4e5b13-0f76-4fc1-b230-f77569c6a300/assets/graphic/zpq0080896790002.jpeg alt="Relative wind histograms" width="400">
 </p>
<!-- 
<p align="center>
   <img src= alt="Relative wind histograms">
</p> -->

This study used recordings of wandering albatross with GPS and stomach temperature transmitters. Foraging points were estimates as periods with rapid drops in stomach temperature, and surface landings taken from times when birds moved at under 2.8 m/s. Wind directions and speeds from all touchdown points were estimated using QuikSCAT daily level 3 gridded ocean wind vectors. This data comes from the SeaWinds scatterometer and records wind vector fields twice daily. Wind values cannot be produced for locations within 30km of land/ice. Backscatter recorded by the satellite is then processed through a Ku-band geophysical model function which derives surface wind speeds and direction. Rain contamination can be an issue due to difficulty in separating the backscatter effect of wind vs rain. The level value refers to the processing level of the data, starting at 0 (raw) through to 3 or 4.

In [1]:
using DataFrames, CSV, RCall, Plots, Geodesy, Dates, Distances, Statistics, Glob

│ 1: Setting LC_COLLATE failed, using "C" 
│ 2: Setting LC_TIME failed, using "C" 
│ 3: Setting LC_MESSAGES failed, using "C" 
│ 4: Setting LC_MONETARY failed, using "C" 
└ @ RCall /Users/aran/.julia/packages/RCall/6kphM/src/io.jl:172


In [2]:
# FUNCTION FOR READING IN FORAGING AND WIND DATASETS
function readDat(dataLocation, pattern, IDpattern, colnames, header, years, DateFormats)
    files = glob(pattern, dataLocation)
    yrIDs = unique(getindex.(match.(r"(\d+)Shearwater.*",files),1) .* "_" .* getindex.(match.(IDpattern,files),1))
    ret = [DataFrame() for _ in 1:length(yrIDs)]
    for tg in 1:length(ret)
        tgFiles = files[occursin.(yrIDs[tg][1:4],files) .& occursin.("/"*yrIDs[tg][6:end],files)]
        for file in tgFiles
            append!(ret[tg], hcat(CSV.read(file, DataFrame, header = header), repeat([yrIDs[tg]], nrow(CSV.read(file, DataFrame, header = header)))), cols = :union)
        end
        rename!(ret[tg], colnames)
        # assign datetime
        ret[tg].DT = DateTime.(ret[tg].DT, DateFormats[occursin.(yrIDs[tg][1:4],years)])
    end
    return ret
end
# file locations for foraging and wind estimates
if Sys.iswindows()
    dataloc = "E:/My Drive/PhD/Data/"
else
    dataloc = "/Volumes/GoogleDrive-112399531131798335686/My Drive/PhD/Data/"
end
# bring in FORAGING AND GPS DATA
fDat = readDat(dataloc,"*/*/*/*/PredictedForage/*ForageGPS.txt",r".*PredictedForage/(.*)-20.*",[:DT,:lat,:lon,:forage,:yrID],1,["2018","2019"],[dateformat"d/m/y H:M:S.s",dateformat"d-u-y H:M:S.s"])

# bring in WIND ESTIMATES
wDat = readDat(dataloc, "*/*/MinDat/*.csv", r".*MinDat/(.*).csv", [:DT,:lat,:lon,:head,:X,:Y,:yrID], 0, ["2018","2019"],[dateformat"y-m-d H:M:S",dateformat"y-m-d H:M:S"]);

In [176]:
# add distance (m) and speed (kph) values
function dist(lat1,lon1,lat2,lon2)
    Ll1 = LLA(lat1,lon1)
    Ll2 = LLA(lat2,lon2)
    utmz = UTMZfromLLA(wgs84)
    dx = diff(DataFrame(map(utmz,[Ll1,Ll2])).x)[1]
    dy = diff(DataFrame(map(utmz,[Ll1,Ll2])).y)[1]
    return sqrt(dx^2 + dy^2)
end
function speed(dt,lat,lon)
    tdiff = Dates.value.(Second.(diff(dt)))
    spTrav = (dist.(lat[1:(end-1)],lon[1:(end-1)],lat[2:end],lon[2:end])./tdiff).*3.6
    return spTrav
end

# find the nearest time (index)
function findNearest(dt, time)
    argmin(abs.(dt .- time))
end
# calculate linearity
function linearity(dt,lat,lon,distance,twindow)
    out = repeat([NaN], length(dt))
    for b = 1:findNearest(dt,(dt[end] - Minute(twindow)))
        nextPoint = b + findNearest(dt, dt[b] + (Minute(twindow))) - 1
        if abs(dt[nextPoint] - (dt[b] + Minute(twindow))) < Minute(5)
            out[b] = dist(lat[b],lon[b],lat[nextPoint],lon[nextPoint])/sum(distance[b:nextPoint])
        else
            out[b] = NaN
        end
    end
    return out
end

for x in fDat
    x.distTrav = [dist.(x.lat[1:(end-1)],x.lon[1:(end-1)],x.lat[2:end],x.lon[2:end]);NaN]
    x.spTrav = [speed(x.DT, x.lat, x.lon);NaN]
    x.linearity = linearity(x.DT,x.lat,x.lon,x.distTrav,51)
end

In [173]:
# assign data to the wind data
df = vcat(fDat...)
for wf in wDat
    for b = 1:nrow(wf.DT)
        if any(df.DT > wf.DT[b] & df.forage == 1 & df.yrID == wf.yrID
            nxtFor = 
        else
            break
        end


9812.85409146625

Unnamed: 0_level_0,DT,lat,lon,forage,yrID,distTrav,spTrav,linearity
Unnamed: 0_level_1,DateTime,Float64,Float64,Int64,String,Float64,Float64,Float64
1,2019-08-23T04:18:02,39.3895,142.02,0,2019_1_S1,95.1508,31.1403,0.386628
2,2019-08-23T04:18:13,39.3894,142.021,0,2019_1_S1,1204.81,13.025,0.385313
3,2019-08-23T04:23:46,39.3855,142.035,0,2019_1_S1,13.5378,16.2454,0.383849
4,2019-08-23T04:23:49,39.3854,142.035,0,2019_1_S1,45.8906,27.5344,0.386047
5,2019-08-23T04:23:55,39.385,142.034,0,2019_1_S1,57.4482,51.7034,0.38939
6,2019-08-23T04:23:59,39.3848,142.035,0,2019_1_S1,64.8943,46.7239,0.387473
7,2019-08-23T04:24:04,39.3849,142.036,0,2019_1_S1,34.1442,24.5838,0.387325
8,2019-08-23T04:24:09,39.3852,142.036,0,2019_1_S1,100.876,60.5253,0.387841
9,2019-08-23T04:24:15,39.3845,142.037,0,2019_1_S1,26.5088,23.858,0.390667
10,2019-08-23T04:24:19,39.3845,142.037,0,2019_1_S1,41.3422,24.8053,0.392704


In [99]:
# out=repeat([NaN],length(fDat[1].DT))
b = 30
nextPoint = b + findNearest(fDat[1].DT, (fDat[1].DT[b] + Minute(51))) - 1
abs(fDat[1].DT[nextPoint] - (fDat[1].DT[b] + Minute(51))) < Minute(5)
# out[b] = 
dist([fDat[1].lat[b],fDat[1].lat[nextPoint]],[fDat[1].lon[1],fDat[1].lon[nextPoint]])/sum(fDat[1].distTrav[b:nextPoint])
# sum(fDat[1].distTrav[b:nextPoint])
# dist([fDat[1].lat[b],fDat[1].lat[nextPoint]],[fDat[1].lon[1],fDat[1].lon[nextPoint]])/sum(fDat[1].distTrav[b:nextPoint])


UndefVarError: UndefVarError: array not defined

In [182]:
wDat[1].yrID

151-element Vector{String}:
 "2019_1_S1"
 "2019_1_S1"
 "2019_1_S1"
 "2019_1_S1"
 "2019_1_S1"
 "2019_1_S1"
 "2019_1_S1"
 "2019_1_S1"
 "2019_1_S1"
 "2019_1_S1"
 ⋮
 "2019_1_S1"
 "2019_1_S1"
 "2019_1_S1"
 "2019_1_S1"
 "2019_1_S1"
 "2019_1_S1"
 "2019_1_S1"
 "2019_1_S1"
 "2019_1_S1"