-
Notifications
You must be signed in to change notification settings - Fork 6
Expand file tree
/
Copy pathg_flexmanip2post.Rmd
More file actions
3683 lines (2770 loc) · 246 KB
/
g_flexmanip2post.Rmd
File metadata and controls
3683 lines (2770 loc) · 246 KB
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
848
849
850
851
852
853
854
855
856
857
858
859
860
861
862
863
864
865
866
867
868
869
870
871
872
873
874
875
876
877
878
879
880
881
882
883
884
885
886
887
888
889
890
891
892
893
894
895
896
897
898
899
900
901
902
903
904
905
906
907
908
909
910
911
912
913
914
915
916
917
918
919
920
921
922
923
924
925
926
927
928
929
930
931
932
933
934
935
936
937
938
939
940
941
942
943
944
945
946
947
948
949
950
951
952
953
954
955
956
957
958
959
960
961
962
963
964
965
966
967
968
969
970
971
972
973
974
975
976
977
978
979
980
981
982
983
984
985
986
987
988
989
990
991
992
993
994
995
996
997
998
999
1000
---
title: "Bayesian reinforcement learning models reveal how great-tailed grackles improve their behavioral flexibility in serial reversal learning experiments."
author:
- '[Lukas D](http://dieterlukas.mystrikingly.com)^1^*'
- '[McCune KB](https://www.kelseymccune.com)^2^'
- '[Blaisdell AP](http://pigeonrat.psych.ucla.edu)^3^'
- 'Johnson-Ulrich Z^2^'
- '[MacPherson M](http://maggiepmacpherson.com)^2^'
- '[Seitz B](https://benjaminseitz.wixsite.com/mysite)^3^'
- 'Sevchik A^4^'
- '[Logan CJ](http://CorinaLogan.com)^1,2^'
always_allow_html: yes
output:
pdf_document:
keep_tex: yes
latex_engine: xelatex
github_document:
toc: true
html_document:
toc: true
toc_depth: 4
toc_float:
collapsed: false
code_folding: hide
md_document:
toc: true
bibliography: MyLibrary.bib
csl: Peer-Community-Journal-PCI.csl
#https://raw.githubusercontent.com/citation-style-language/styles/master/apa.csl
urlcolor: blue
classoption: fleqn #allows equations in asmath to be aligned left on the PDF page
header-includes:
- \usepackage[left]{lineno}
- \linenumbers
- \usepackage{fancyhdr}
- \usepackage{pdflscape} #allows you to call landscape for certain pages using the commands on the next 2 lines
- \newcommand{\blandscape}{\begin{landscape}}
- \newcommand{\elandscape}{\end{landscape}}
- \usepackage{mathtools} #for equations, but maybe not needed because asmath is there
- \usepackage{amsmath} #for equations
- \usepackage{amssymb} #for Greek symbols in equations
---
**Affiliations:** 1) Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany, 2) University of California Santa Barbara, USA, 3) University of California Los Angeles, USA, 4) Arizona State University, Tempe, AZ USA. \*Corresponding author: [dieter_lukas\@eva.mpg.de](mailto:dieter_lukas@eva.mpg.de){.email}
```{r preparermd, include=FALSE}
library(knitr)
library(formatR)
knitr::opts_chunk$set(tidy.opts=list(width.cutoff=70),tidy=TRUE)
#Make code chunks wrap text so it doesn't go off the page when knitting to PDF
knitr::opts_chunk$set(echo=F, include=T, results='asis', warning=F, message=F)
#sets global options to display code along with the results https://exeter-data-analytics.github.io/LitProg/r-markdown.html
#set echo=F for knitting to PDF (hide code), and echo=T for knitting to HTML (show code)
### make a bibtex file that has only the references cited in this rmd
#load the Rmd file
Rmd <- readChar("https://raw.githubusercontent.com/corinalogan/grackles/master/Files/Preregistrations/g_flexmanip2post.Rmd",nchars=1e7)
#find all in text citations that start with @, but are preceded by a space
pattern <- "\\ @(.*?)\\ "
m <- regmatches(Rmd,gregexpr(pattern,Rmd))[[1]]
m
res<- gsub("\\ ","",m) #delete spaces
res<- gsub("\\]","",res) #delete ]
res<- gsub("\\;","",res) #delete ;
res<- gsub("\\,","",res) #delete ,
res<- gsub("\\.","",res) # delete .
#find all in text citations that start with @, but are preceded by a "["
pattern2 <- "\\[@(.*?)\\ "
m2 <- regmatches(Rmd,gregexpr(pattern2,Rmd))[[1]]
m2
res2<- gsub("\\[","",m2)
res2<- gsub("\\]","",res2)
res2<- gsub("\\]","",res2)
res2<- gsub("\\;","",res2)
res2<- gsub("\\,","",res2)
res2<- gsub("\\.","",res2)
res2<- gsub("\\ ","",res2)
#combine both patterns
allbibtexkeys<-c(res,res2)
#write to a new file and then clean it up manually
write(allbibtexkeys,file="g_flexmanip2post_bibtexkeys.txt")
#load the cleaned txt file
allbibtexkeys<-read.csv("g_flexmanip2post_bibtexkeys.txt")
#use bib2df to convert the bibliography file into a dataframe
library(bib2df)
#load the bib from GitHub
df <- bib2df("https://raw.githubusercontent.com/corinalogan/grackles/master/Files/Preregistrations/MyLibrary.bib")
#remove the @ to match the entry in the bib file
allbibtexkeys2<- gsub("\\@","",allbibtexkeys[,1])
#filter the full bib to only keep the entries that are cited here
df_filtered<-df[df$BIBTEXKEY %in% allbibtexkeys2,]
#use bib2df to convert the data frame into the bibliography file that only contains the citations in this Rmd
df2bib(df_filtered, file = "_flexmanip2post_refs.bib", append = FALSE)
```
| [{width="90%"}](https://doi.org/10.24072/pci.ecology.100468) | **This [manuscript](https://ecoevorxiv.org/repository/view/3689/) ([rmd with code](https://github.com/corinalogan/grackles/blob/master/Files/Preregistrations/g_flexmanip2post.Rmd)) was peer reviewed and received a Recommendation by:** Aurélie Coulon (2024) Changes in behavioral flexibility to cope with environment instability: theoretical and empirical insights from serial reversal learning experiments. [*Peer Community in Ecology*, 100468](https://doi.org/10.24072/pci.ecology.100468). Reviewers: Maxime Dahirel and 1 anonymous reviewer. **This is the third of three manuscripts from the [preregistration](https://github.com/corinalogan/grackles/blob/master/Files/Preregistrations/g_flexmanipPassedPreStudyPeerReview26Mar2019.pdf) that was pre-study peer reviewed and received an In Principle Recommendation by:** Aurélie Coulon (2019) Can context changes improve behavioral flexibility? Towards a better understanding of species adaptability to environmental changes. [*Peer Community in Ecology*, 100019](https://doi.org/10.24072/pci.ecology.100019). Reviewers: Maxime Dahirel and Andrea Griffin|
|------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------|
# Abstract
Environments can change suddenly and unpredictably and animals might benefit from being able to flexibly adapt their behavior through learning new associations. Serial (repeated) reversal learning experiments have long been used to investigate differences in behavioral flexibility among individuals and species. In these experiments, individuals initially learn that a reward is associated with a specific cue before the reward is reversed back and forth between cues, forcing individuals to reverse their learned associations. Cues are reliably associated with a reward, but the association between the reward and the cue frequently changes. Here, we apply and expand newly developed Bayesian reinforcement learning models to gain additional insights into how individuals might dynamically modulate their behavioral flexibility if they experience serial reversals. We derive mathematical predictions that, during serial reversal learning experiments, individuals will gain the most rewards if they 1) increase their *rate of updating associations* between cues and the reward to quickly change to a new option after a reversal, and 2) decrease their *sensitivity* to their learned association to explore the alternative option after a reversal. We reanalyzed reversal learning data from 19 wild-caught great-tailed grackles (*Quiscalus mexicanus*), eight of whom participated in serial reversal learning experiment, and found that these predictions were supported. Their estimated association-updating rate was more than twice as high at the end of the serial reversal learning experiment than at the beginning, and their estimated sensitivities to their learned associations declined by about a third. The changes in behavioral flexibility that grackles showed in their experience of the serial reversals also influenced their behavior in a subsequent experiment, where individuals with more extreme rates or sensitivities solved more options on a multi-option puzzle box. Our findings offer new insights into how individuals react to uncertainty and changes in their environment, in particular, showing how they can modulate their behavioral flexibility in response to their past experiences.
# Introduction
Most animals live in environments that undergo changes that can affect key components of their lives, such as where to find food or which areas are safe. Accordingly, individuals that cannot react to these changes should have reduced survival and/or reproductive success [@boyce2006demography; @starrfelt2012bet]. One of the ways animals react to changes is through behavioral flexibility, the ability to change behavior when circumstances change [@shettleworth2010cognition]. The level of behavioral flexibility present in a given species is often assumed to have been shaped by selection, with past levels of change in the environment determining how well species might be able to cope with more rapidly changing [@sih2013understanding] or novel environments [@sol2002behavioural]. However, in another conception, behavioral flexibility is itself plastic [@wright2010behavioral]. Behavioral flexibility arises because individuals update their information about the environment through personal experience and make that information available to other cognitive processes [@mikhalevich_is_2017]. Such modulation of behavioral flexibility is presumably relevant if the rate and extent of environmental change is variable and unpredictable [@donaldson2013unreliable; @tello2019spatial]. We are still limited in our understanding of when and how individuals might react to their experiences of environmental change.
Evidence that animals can change their behavioral flexibility based on their recent experience comes from serial reversal learning experiments. Serial reversal learning experiments have long been used to understand how individuals keep track of biologically important associations in changing environments [@dufort1954one; @mackintosh1968factors; @bitterman1975comparative]. In these experiments, individuals are presented with multiple options associated with cues, such as different colors or locations, that differ in their reward. Individuals can repeatedly choose among the options to learn the associations between rewards and cues. After they show a clear preference for the most rewarded option, the rewards are reversed across cues, and individuals are observed to see how quickly they learn the changed associations. When they have reversed their preference, the reward is changed back to the other option, until the individual reverses their preference again, and these reversals continue in a process called serial reversals. Their performance during the reversal task is taken as a measure of their behavioral flexibility, with the more flexible individuals being those that need fewer trials to consistently choose the rewarded option after a reversal [@bond2007serial]. While the primary focus of these serial reversal learning experiments has been to measure differences in behavioral flexibility across individuals and species [@lea2020behavioral], several of these experiments show that behavioral flexibility is not a fixed trait, but that individuals can improve their performance if they experience repeated reversals [@bond2007serial; @liu2016learning; @cauchoix2017cognition]. Here, we investigate how individuals might change their behavioral flexibility during serial reversal learning experiments to better understand what cognitive processes could lead to the observed differences and adjustments in behavioral flexibility [@izquierdo2017neural; @danwitz2022parameter].
We recently found that great-tailed grackles (*Quiscalus mexicanus*; hereafter grackles) can be trained to improve how quickly they learn to change associations in a serial reversal learning experiment [@logan2023flexmanippcj]. After training birds to search for food in a yellow tube, the reversal learning experiment consisted of presenting birds with a light gray and a dark gray tube, only one of which contained a reward. After individuals chose one of the tubes, thus experiencing whether this color was rewarded or not, the experiment was reset, with the reward being in the same colored tube as before. Once an individual chose the rewarded color more than expected by chance (passing criterion of choosing correctly in at least 17 out of the last 20 trials, which represents a significant association according to the chi-square test), the reward was switched to the other color. Again, individuals made choices until they chose the now rewarded tube above the passing criterion. For one set of individuals, the trained group, we repeated the reversal of rewards from one color to the other until the birds reached the serial reversal passing criterion of forming a preference in 50 trials or less in two consecutive reversals. The median number of trials birds in this trained group needed to reach the passing criterion during their first reversal was 75, which improved to 40 trials in their final reversal. Importantly, we found that, in comparison to a control group who only experienced a single reversal, trained grackles who experienced serial reversals also showed increased behavioral flexibility and innovativeness in other contexts. In particular, trained grackles performed better on multi-option puzzle boxes than control grackles, being faster to switch to a new access option on a box if the previous option was closed, and they solved more of the available access options [@logan2023flexmanippcj]. This indicates that individuals did not just learn an abstract rule about the serial reversal learning experiment, but rather changed their overall behavioral flexibility in response to their experience. To understand these changes in behavioral flexibility, we need approaches that can reflect how individuals might update their cognitive processes based on their experience.
Previous analyses of serial reversal learning experiments were limited in understanding the potential changes in behavioral flexibility because they focused on summaries of the choices that individuals make [e.g. @bond2007serial]. These approaches are more descriptive, making it difficult to link flexibility differences to specific processes and to predict how variation in behavior might transfer to other tasks. While there have been attempts to identify potential rules that individuals might learn during serial reversal learning [@spence1936nature; @warren1965primate; @warren1965comparative; @le2023mixtures], these rules were often about abstract switches to extreme behaviors (e.g. win-stay / lose-shift) and therefore could not account for the full variation of behavior. A number of theoretical models have recently been developed that appear to reflect the potential cognitive processes individuals seem to rely on when making choices in reversal learning experiments [for a recent review see, for example, @fromer2023belief]. These theoretical models deconstruct the behavior of individuals in a reversal learning task into two primary parameters [@camerer1999experience; @chow2015serial; @izquierdo2017neural; @bartolo2020prefrontal]. Importantly, in the Bayesian reinforcement learning models there are now also statistical approaches to infer these underlying parameters from the behavior of individuals [@camerer1999experience; @lloyd2013context]. The first process reflects the *rate of updating associations* (which we refer to hereafter as $\phi$, the Greek letter phi), or how quickly individuals learn about the associations between the cues and potential rewards (or dangers). In the reinforcement learning models, this rate is reflected by the Rescorla-Wagner rule [@rescorlawagner1972reinforcement]. The rate weights the most recent information proportionally to the previously accumulated information for that cue (as a proportion, the rate can range between 0 and 1; for details on the calculations see the section on the reinforcement learning model in the Methods). Individuals are expected to show different rates in different environments, particularly in response to the reliability of the cues (Figure 1). Lower updating rates are expected when associations are not perfect such that a single absence of a reward might be an error rather than indicating a new association. Higher updating rates are expected when associations are reliable such that individuals should update their associations quickly when they encounter new information [@dunlap2009components; @Breen_2023]. The second process, the *sensitivity to their learned associations* (which we hereafter refer to as $\lambda$, the Greek letter lambda) reflects how individuals, when presented with a set of cues, might decide between these alternative options based on their learned associations of the cues. In the reinforcement learning model, the sensitivity to learned associations modifies the relative difference in learned rewards to generate the probabilities of choosing either option [@daw2006cortical; @agrawal2012analysis; @danwitz2022parameter]. A value of zero means individuals do not pay attention to their learned associations, but choose randomly, whereas increasingly larger values mean that individuals show biases in choice as soon as there are small differences in their learned associations. Individuals with larger sensitivities will quickly prefer the option that previously gave them the highest reward (or the lowest danger), while individuals with lower sensitivities will continue to explore alternative options. Sensitivities are expected to reflect the rate of change in the environment (Figure 1), with larger sensitivities occurring when environments are static such that individuals start to exploit any differences they recognise as soon as possible. Lower sensitivities are expected when changes are frequent, such that individuals continue to explore alternative options when conditions change [@daw2006cortical; @Breen_2023].

**Figure 1.** Individuals are expected to update their associations and make decisions differently depending on the environment they experience. In serial reversal learning experiments, associations are reliable, such that if an option is associated with a reward, it is rewarded during every trial (white background). However, the associations between options and the rewards change across trials (solid line). In these reliable, but changing conditions, individuals are expected to gain the most rewards if they update their associations quickly (large $\phi$) to switch away from an option if it is no longer being rewarded, but to have small sensitivities to their learned associations to continue to explore all options to check if associations have changed again (small $\lambda$). In contrast, in unchanging and unreliable conditions, the probability that an option is rewarded stays constant across trials (dotted lines), but is closer to 50% (gray background). In these conditions, individuals are expected to gain the most rewards if they build their associations by averaging information across many trials (small $\phi$), and have high sensitivities to learned associations to exploit the option with the highest association (large $\lambda$). Grackle picture credit (CC BY 4.0): Dieter Lukas.
Here, we applied and modified the Bayesian reinforcement learning models to data from our grackle research on behavioral flexibility to assess if and how the cognitive processes might have changed as individuals experienced the serial reversal learning experiment. We previously found that the model can predict the performance of grackles in a reversal learning task with a single reversal of a color preference [@blaisdell2021causal]. Grackles experiencing the serial reversal learning experiment are expected to infer that associations can frequently change but that, before and after a change, cues reliably indicate whether a reward is present or not. Based on the theoretical models, we predict that individuals increase their association-updating rate because cues are highly reliable, such that they can change their associations as soon as there is a change in the reward [@dunlap2009components; @Breen_2023]. In addition, we predict that individuals reduce their sensitivity to the learned associations, because the option that is rewarded reverses frequently, requiring individuals to explore alternative options [@neftci2019reinforcement; @leimar2024flexible]. Given that reversals in the associations are not very frequent, we also expect some variation in individuals in whether they switch to the newly rewarded option because they find the reward quickly through continued exploration (somewhat lower $\lambda$ and higher $\phi$) or because they quickly move away from the option that is no longer rewarded (somewhat higher $\lambda$ and lower $\phi$). To assess these predictions, we addressed the following six research questions. With the first research question, we determined the feasibility and validity of our approach using simulations. As far as we were aware, Bayesian reinforcement learning models had not been used to investigate temporal changes in behavior. We therefore used simulations as a proof-of-concept assessment to show their sensitivity and ability to answer our questions. With the second research question, we derive mathematically specific predictions about the role of $\phi$ and $\lambda$ in the serial reversal learning experiment. With the other four questions, we analyzed the grackle data to determine how the association-updating rate and the sensitivity to learned associations reflect the variation and changes in behavioral flexibility in grackles.
**1) Are the Bayesian reinforcement learning models sufficiently sensitive to detect changes that occur across the limited number of serial reversals that individuals participated in?**\
We used agent-based simulations to answer this question, where simulated individuals made choices based on assigned $\phi$ and $\lambda$ values. We determined how to apply the Bayesian reinforcement learning models to recover the assigned values from the choices in each trial. Previous applications of the Bayesian reinforcement learning models always combined the full sample of observations, so it is not clear whether these models are sufficiently sensitive to detect the changes over time that we are interested in. Two problems arise when trying to infer the underlying processes from a limited number of trials. The stochasticity in which option an individual chooses based on a given set of associations introduces differences in the set of choices across trials even among individuals with the same $\phi$ and $\lambda$ values. On the flip-side, because of the probabilistic decisions, a given series of specific choices during a short number of trials can occur even if individuals have different $\phi$ and $\lambda$ values. We varied the number of trials we analyzed to determine how many trials per individual are necessary to recover the assigned $\phi$ and $\lambda$ values in light of this noise.
**2) Is a high rate of association-updating (**$\phi$) and a low sensitivity to learned associations ($\lambda$) best to reduce errors in the serial reversal learning experiment?\
We used analytical approaches to systematically vary $\phi$ and $\lambda$ to determine how the interaction of the two processes shapes the behavior of individuals throughout the serial reversal learning experiment. Previous studies made general predictions about the role of $\phi$ and $\lambda$ in different environments [@dunlap2009components; @Breen_2023]. We assessed here whether, under the specific conditions in the serial reversal experiments, where information is reliable and changes occur frequently, the best approach for individuals is to show high $\phi$ and low $\lambda$.
**3) Which of the two parameters** $\phi$ or $\lambda$ explains more of the variation in the serial reversal learning experiment performance of the tested grackles?\
Across both the trained (experienced serial reversals) and control (experienced a single reversal) grackles, we assessed whether variation in the number of trials an individual needed to reach the criterion in a given reversal is better explained by their inferred association-updating rate or by their inferred sensitivity to learned associations.
**4) Do the grackles who improved their performance through the serial reversal learning experiment show the predicted changes in** $\phi$ and $\lambda$?\
If individuals learn the contingencies of the serial reversal experiment, they should reduce their sensitivity to learned associations $\lambda$ to explore the alternative option when rewards change, and increase their association-updating rate $\phi$ to quickly exploit the new reliably rewarded option.
**5) Are some individuals better than others at adapting to the serial reversals?**\
In previous work, we found that there are individual differences that persist throughout the experiment, with individuals who required fewer trials to solve the initial reversal also requiring fewer trials in the final reversal after their training [@mccune2023flexmanippeerj]. We could expect that these individual differences are guided by consistency in how individuals solve the reversal learning paradigm, meaning they are reflected in individual consistency in $\phi$ and $\lambda$ that persist through the serial reversals. In addition, it is not clear whether some grackles change their behavior more than others. For example, it could be that individuals who have a higher association-updating rate $\phi$ at the beginning of the experiment might also be better able to quickly change their behavior to match the particular conditions of the serial reversal learning experiment. Therefore, we also analyzed whether the $\phi$ and $\lambda$ values of individuals at the beginning predict how much they changed throughout the serial reversal learning experiment.
**6) Can the** $\phi$ or $\lambda$ from the performance of the grackles during their final reversal predict variation in the performance on the multi-option puzzle boxes?\
Grackles would be expected to solve more options on the multi-option puzzle boxes if they quickly update their previously learned associations when a previous option becomes unavailable (high $\phi$). Given that, in the puzzle box experiment, individuals only receive a reward at any given option a few times, instead of repeatedly as in the reversal learning task, we predict that those individuals who are less sensitive to previously learned associations and instead continue to explore alternative options (low $\lambda$) can also gain more rewards.
# Materials and Methods
## Data
For question 1, we re-analyzed data we previously simulated for power analyses to estimate sample sizes for population comparisons [@logan2023xpoppcj]. In brief, we simulated choices in an initial association learning and single reversal experiment for a total of 640 individuals. The $\phi$ and $\lambda$ values for each individual were drawn from a distribution representing one of 32 populations, with different mean $\phi$ (8 different means) and mean $\lambda$ (4 different values) values for each population (32 populations is the combination of each $\phi$ and $\lambda$). We simulated 20 individuals in each of the 32 populations. The range for the $\phi$ and $\lambda$ values assigned to the artificial individuals in the simulations were based on the previous analysis of single reversal data from grackles in a different population (Santa Barbara, California, USA) [@blaisdell2021causal] to reflect the likely expected behavior. Based on their assigned $\phi$ and $\lambda$ values, each individual was simulated to pass first through the initial association learning phase and, after they reached criterion (chose the correct option 17 out of the last 20 times), the rewarded option switched and simulated individuals went through the reversal learning phase until they again reached criterion. Each choice that each individual made was simulated consecutively. Choices during trials were based on the associations that individuals formed between each option and the reward based on their experience. The first choice a simulated individual made in the initial association learning was random because we assumed individuals had no information about the rewards and therefore set the initial attractions to both options to be equally low. Based on their choices, individuals updated their internal associations with the two options based on their individual learning rate. We excluded simulated individuals from further analyses if they did not reach criterion either during the initial association or the reversal within 300 trials, the maximum that was also set for the experiments with the grackles. For each simulated individual, we recorded their assigned $\phi$ and $\lambda$ values, as well as the series of choices they made during the initial association and the first reversal. For a given $\phi$ and $\lambda$, the stochasticity in which option a simulated individual chooses based on their attractions, plus the experience of either receiving a reward or not during previous choices, can lead to differences in the actual choices individuals make. The aim was to see what sample is needed to correctly infer the assigned $\phi$ and $\lambda$ given the noise in the choice data. We also used the simulated data for question 3, to compare the influence of $\phi$ and $\lambda$ on the behavior of the simulated individuals with that of the grackles.
To address question 2, we used an analytical approach and did not analyze any data.
For the empirical questions 3-6, we re-analyzed data on the performance of grackles in serial reversal learning and multi-option puzzle box experiments [@logan2023flexmanippcj]. The data collection was based on our preregistration that received in principle acceptance at PCI Ecology [@coulon2023experiment]. All of the analyses reported here were not part of the original preregistration. The data we use here were published as part of the earlier article and are available at the Knowledge Network for Biocomplexity's data repository [@logan2023flexmanipdata].
In brief, grackles were caught in the wild in Tempe, Arizona, USA for individual identification (colored leg bands in unique combinations), and brought temporarily into aviaries for testing, before being released back to the wild. The first experiment individuals participated in in the aviaries was the reversal learning experiment, as described in the introduction. A total of 19 grackles participated in the serial reversal learning experiment, where they learned to associate a reward with one color before experiencing one reversal to learn that the other color was rewarded (initial rewarded option was counterbalanced and randomly assigned as either a dark gray or a light gray tube). The rewarded option was switched when grackles passed the criterion of choosing the rewarded option in 17 of the most recent 20 trials. This criterion was set based on earlier serial reversal learning studies, and is based on the chi-square test, which indicates that 17 out of 20 represents a significant association. With this criterion, individuals can be assumed to have learned the association between the cue and the reward rather than having randomly chosen one option more than the other [@logan2022manyindividuals]. A subset of 8 individuals were randomly assigned to the trained group and went through a series of reversals until they reached the criterion of having formed an association (17 out of 20 choices correct) in 50 trials or less in two consecutive reversals. The individuals in the trained group needed between 6-8 reversals to consistently reach this threshold, with the number of reversals not being linked to their performance at the beginning or at the end of the experiment. A subset of 11 grackles were part of the control group, who experienced only a single reversal, before participating in trials with two identically colored tubes (yellow) where both contained a reward. The number of yellow tube trials was set to the average number of trials it took a bird in the trained group to pass their serial reversals.
For question 6, we additionally used data from an experiment the grackles participated in after they had completed the reversal learning experiment. Both the control and trained individuals were provided access to two multi-option puzzle boxes, one made of wood and one made of plastic. The two boxes were designed with slight differences to explore how general their performance was. The wooden box was made from a natural log, thus was more representative of something the grackles might encounter in the wild. In addition, while both boxes had four possible ways (options) to access food, the four options on the wooden box were distinct compartments, each containing rewards, while the four options on the plastic box all led to the same reward. Grackles were tested sequentially on both boxes, in a counterbalanced order, where individuals could initially explore all options. After proficiency at an option was achieved (gaining food from this locus three times in a row), this option became non-functional by closing access to the option, and then the latency of the grackle to switch to attempting a different option was measured. If they again successfully solved another option, this second option was also made non-functional, and so on. The outcome measures for each individual on each box were the average latency it took to switch to a new option and the total number of options they successfully solved.
## The Bayesian reinforcement learning model
For both the simulated and the observed grackle data, we used the Bayesian reinforcement learning model to estimate for each individual their $\phi$ and $\lambda$ values based on the choices they made during the reversal learning experiments. The estimated $\phi$ and $\lambda$ values were then used as outcome and/or predictor variables in the statistical models built to assess questions 3-6. We used the version of the Bayesian model that was developed in @blaisdell2021causal and modified in @logan2023xpoppcj (see their Analysis Plan \> "Flexibility analysis" for model specifications and validation). This model uses data from every trial of reversal learning (rather than only using the total number of trials to pass criterion) and represents behavioral flexibility using two parameters: the association-updating rate ($\phi$) and the sensitivity to learned associations ($\lambda$). The model transforms the series of choices each grackle made based on two equations to estimate the most likely $\phi$ and $\lambda$ that generated the observed behavior.
Equation 1 (learning and $\phi$): $A_{b,o,t+1}$=(1−$\phi_b$)$A_{b,o,t}$+$\phi_b$ $\pi_{b,o,t}$.
Equation 1 estimates how the associations $A$, that individual $b$ forms between the two different options ($o$, option 1 or 2) and their expected rewards, change from one trial to the next (trial $t$+1) as a function of their previously formed associations $A_{b,o,t}$ (how preferable option $o$ is to grackle $b$ at trial $t$) and recently experienced payoff $\pi$ (in our case, $\pi$ = 1 when they chose the correct option and received a reward in a given trial, and 0 when they chose the unrewarded option). The parameter $\phi_b$ modifies how much individual $b$ updates its associations based on its most recent experience. The higher the value of $\phi_b$, the faster the individual updates its associations, paying more attention to recent experiences, whereas when $\phi_b$ is lower, a grackle’s associations reflect averages across many trials. Association scores thus reflect the accumulated learning history up to trial $t$. The association with the option that is not explored in a given trial remains unchanged. At the beginning of the experiment (trial $t$ equals 0), we assumed that individuals had the same low association between both options and rewards ($A_{b,1,0}$ = $A_{b,2,0}$ = 0.1).
Equation 2 (choice and $\lambda$): $P_{b,o,t}$=$\displaystyle \frac{exp(\lambda_b A_{b,o,t})}{\sum_{o = 1}^{2} exp(\lambda_b A_{b,o,t})}$.
Equation 2 is a normalized exponential (softmax) function to convert the learned associations of the two options with rewards into the probability, $P$, that an individual, $b$, chooses one of the two options, $o$, in the current trial, $t$. The parameter $λ_{b}$ represents the sensitivity of a given grackle, $b$, to how different its associations to the two options are. As $λ_{b}$ gets larger, choices become more deterministic and individuals consistently choose the option with the higher association even if associations are very similar. As $λ_{b}$ gets smaller, choices become more exploratory, with individuals choosing randomly between the two options independently of their learned associations if $λ_{b}$ is 0.
We implemented the Bayesian reinforcement learning model in the statistical language Stan [@stan2019stan], calling the model and analyzing its output in *R* (version `r getRversion()`) [@rcoreteam]. The model takes the full series of choices individuals make (which of the two options did they choose, which option was rewarded, did they make the correct choice) across all their trials to find the $\phi$ and $\lambda$ values that best fit these choices given the two equations. Which option individuals chose was estimated with a categorical distribution with the probability, $P$, as estimated from equation 2 for each of the two options (categories), before updating the associations using equation 1. The model was fit across all choices, with individual $\phi$ and $\lambda$ values estimated as varying effects. In the model, $\phi$ is estimated on the logit-scale to reflect that it is a proportion (can only take values between 0 and 1), and $\lambda$ is estimated on the log-scale to reflect that values have to be positive (there is no upper bound). The limitation that, with an estimation on the log-scale $\lambda$ can never be equal to 0, is not an issue because we only included individuals in the analyses who did not pick options at random. We set the priors for the logit of $\phi$ and the log of $\lambda$ to come from a normal distribution with a mean of zero and a standard deviation of one. We set the initial associations to both options for all individuals at the beginning of the experiment to 0.1 to indicate that they do not have an initial preference for either option but are likely to be somewhat curious about exploring the tubes because they underwent habituation and training with a differently colored tube (see below). For estimations at the end of each reversal, we set the association with the option that was rewarded before the reversal to 0.7 and to the option that was previously not rewarded to 0.1. Note that when applying equation 1 in the context of the reversal learning experiment, as is most commonly used, where there are only rewards (positive association) or no rewards (zero association) but no punishment (negative association), associations can never reach zero because they change proportionally.
For each estimation (simulated and observed grackle data), we ran four chains with 2000 samples each (half of which were warm up). We used functions in the package “posterior” [@posterior] to draw 4000 samples from the posterior (the default). We report the estimates for $\phi$ and $\lambda$ for each individual (simulated or observed grackle) as the mean from these samples from the posterior. For the subsequent analyses where the estimated $\phi$ and $\lambda$ values were response or predictor variables, we ran the analyses both with the single mean per individual as well as looping over the full 4000 samples from the posterior to reflect the uncertainty in the estimates. The analyses with the samples from the posterior provided the same estimates as the analyses with the single mean values, though with larger compatibility intervals because of the increased uncertainty. In the results, we report the estimates from the analyses with the mean values. The estimates with the samples from the posterior can be found in the code in the rmd file at the repository <https://github.com/corinalogan/grackles/blob/master/Files/Preregistrations/g_flexmanip2post.Rmd>. In analyses where $\phi$ and $\lambda$ are predictor variables, we standardized the values that went into each analysis (either the means, or the respective samples from the posterior) by subtracting the average from each value and dividing by the standard deviation. We did this to define the priors for the relationships on a more standard scale and to be able to more directly compare the respective influence of $\phi$ and $\lambda$ on the outcome variable.
## 1) Using simulations to determine whether the Bayesian serial reinforcement learning models have sufficient power to detect changes through the serial reversal learning experiment
We ran the Bayesian reinforcement learning model on the simulated data to understand the minimum number of choices per individual that would be necessary to recover the association-updating rate $\phi$ and the sensitivity to the learned associations $\lambda$ assigned to each individual.
To determine whether the Bayesian reinforcement learning model can accurately recover the simulated $\phi$ and $\lambda$ values from limited data, we applied the model first to only the choices from the initial association learning phase, next to only the choices from the first reversal learning phase, and finally from both phases combined. To estimate whether the Bayesian reinforcement learning model can recover the simulated $\phi$ and $\lambda$ values without bias from either the single or the combined phases, we correlated the estimated values with the values individuals were initially assigned:
$\phi_{b,1}$ or $\lambda_{b,1}$ \~ Normal($\mu_{b}$, $\sigma$),\
$\mu_{b}$ = $\alpha$ + $\beta$ $\times$ $\phi_{b,0}$ or $\lambda_{b,0}$,\
$\alpha$ \~ Normal(0,0.1),\
$\beta$ \~ Normal(1,1),\
$\sigma$ \~ Exponential(1),
where $\phi_{b,1}$ or $\lambda_{b,1}$, the values estimated for each bird, indexed by $b$, from the simulated behavior are assumed to come from a normal distribution with a mean that can vary for each bird, $\mu_{b}$, and overall variance, $\sigma$. The mean for each bird is constructed from an overall intercept, $\alpha$, and the change in expectation, the slope, $\beta$, depending on the values assigned to each bird at the beginning of the simulation ($\phi_{b,0}$ or $\lambda_{b,0}$). The combination of $\alpha$ close to 0 and of $\beta$ close to 1 would indicate that the estimated values matched the assigned values.
This, and all following statistical models, were implemented using functions of the package ‘rethinking’ [@rethinking2020] in *R* to call Stan and estimate the relationships. Following the social convention set in [@rethinking2020], we report the mean estimates and the 89% compatibility intervals from the posterior estimates from these models. For each model, we ran four chains with 10,000 iterations each (half of which were warm up). We checked that the number of effective samples was sufficiently high and evenly distributed across all estimated variables such that autocorrelation did not influence the estimates. We also confirmed that in all cases the Gelman-Rubin convergence diagnostic, $\hat{R}$, was 1.01 or smaller, indicating that the chains had converged on the final estimates [@gelman1995avoiding]. In all cases, we also plotted the model inferences onto the distribution of the raw data to confirm that the estimated predictions matched the observed patterns.
```{r question1_possibilityinferringphiandlambdasimulations, eval=F}
####
# Load previously simulated data from xpop
####
# There are two separate sets of simulations, with initial attractions at 0.1 and eight different phi and four different lambda combinations
simulatedreversaldata_attractionscores_1<-read.csv(url("https://raw.githubusercontent.com/corinalogan/grackles/master/Files/Preregistrations/gxpopbehaviorhabitat_SimulatedReversalData_Grackles_PhiLambda_Attraction02_Aug2021.csv"), header=T, sep=",", stringsAsFactors=F)
simulatedreversaldata_attractionscores_2<-read.csv(url("https://raw.githubusercontent.com/corinalogan/grackles/master/Files/Preregistrations/gxpopbehaviorhabitat_SimulatedReversalData_Grackles_PhiLambda_Attraction04_Aug2021.csv"), header=T, sep=",", stringsAsFactors=F)
# In both sets of simulations, populations with different phi and lambda values were counted from 1-16; for the second set we change this to 17-32
simulatedreversaldata_attractionscores_2$Site<-simulatedreversaldata_attractionscores_2$Site+16
# In both simulations, individuals were counted from 1-320; for the second set we change the ids to start at 321
simulatedreversaldata_attractionscores_2$Bird_ID<-simulatedreversaldata_attractionscores_2$Bird_ID+320
# We combine the two data sets for the further analyses
library(dplyr)
simulatedreversaldata_attractionscores<-bind_rows(simulatedreversaldata_attractionscores_1,simulatedreversaldata_attractionscores_2)
####
# In the simulations, trials were counted continuously for each bird. We now want to change this so that it restarts counting trials from 1 upward once a bird switches to reversal.
for (birds in 1:length(unique(simulatedreversaldata_attractionscores$Bird_ID))){
currentbird<-unique(simulatedreversaldata_attractionscores$Bird_ID)[birds]
maximuminitial<-max(simulatedreversaldata_attractionscores[simulatedreversaldata_attractionscores$Bird_ID==currentbird & simulatedreversaldata_attractionscores$Reversal == "initial",]$Trial)
simulatedreversaldata_attractionscores[simulatedreversaldata_attractionscores$Bird_ID==currentbird & simulatedreversaldata_attractionscores$Reversal == "reversal",]$Trial<-simulatedreversaldata_attractionscores[simulatedreversaldata_attractionscores$Bird_ID==currentbird & simulatedreversaldata_attractionscores$Reversal == "reversal",]$Trial - maximuminitial
}
# We need to adjust the coding during the reversal learning so that "correct" now matches whether it is correct or not.
simulatedreversaldata_attractionscores[simulatedreversaldata_attractionscores$Choice==0,]$Choice<-2
# To use the model to estimate the phi and lambda parameters, we first need to change the column names to match these to the specifications in the model: change Bird_ID to id; change Reversal to Choice, change CorrectChoice to Correct, change Site to Expid
colnames(simulatedreversaldata_attractionscores)<-c("counter","id","Session","Trial","Reversal","Choice","Correct","Phi_mean","Lambda_mean","Site","Phi_sd","Lambda_sd","ThisBirdsPhi","ThisBirdsLambda","Attraction1","Attraction2")
# There are several simulated individuals who never reached the criterion during the initial learning phase. We need to remove these from the dataset
birdswithreversal<-as.data.frame(simulatedreversaldata_attractionscores %>% group_by(id) %>% summarise(experiments=length(unique(Reversal))))
birdswithreversal<-birdswithreversal[birdswithreversal$experiments==2,]
simulatedreversaldata_attractionscores<-simulatedreversaldata_attractionscores[simulatedreversaldata_attractionscores$id %in% birdswithreversal$id,]
# Next, we need to change the ids of the birds to be continuous again so the STAN model will include them all
simulatedreversaldata_attractionscores$id<-as.integer(as.factor(simulatedreversaldata_attractionscores$id))
#### QUESTION 1: Power of Bayesian reinforcement learning model to detect short-term changes in the association-updating rate $\phi$ and the sensitivity to learned associations $\lambda$
# We first focus only on the performance in the reversal trials
simulatedreversaldata_attractionscores_reversalphase<-simulatedreversaldata_attractionscores[simulatedreversaldata_attractionscores$Reversal=="reversal",]
# Let's start with 30 individuals for comparison
firstreversal_simulated<-simulatedreversaldata_attractionscores_reversalphase[simulatedreversaldata_attractionscores_reversalphase$id %in% c(20,40,60,80,100,120,140,160,180,200,220,240,260,300,320,340,360,380,400,420,440,460,480,500,520,540,560,580,600,620),]
firstreversal_simulated$id<-as.numeric(as.factor(firstreversal_simulated$id))
# We can now extract the relevant data from the first reversal for the STAN model to estimate phi and lambda at the beginning
datfirstsimulated <- as.list(firstreversal_simulated)
datfirstsimulated$N <- nrow(firstreversal_simulated)
datfirstsimulated$N_id <- length(unique(firstreversal_simulated$id))
# Next, we also look at the estimation of the phi and lambda values based on their performance in the initial association learning phase
# We focus only on the performance in the initial association trials
simulatedreversaldata_attractionscores_learningphase<-simulatedreversaldata_attractionscores[simulatedreversaldata_attractionscores$Reversal=="initial",]
# Let's start with 30 individuals for comparison
initiallearning_simulated<-simulatedreversaldata_attractionscores_learningphase[simulatedreversaldata_attractionscores_learningphase$id %in% c(20,40,60,80,100,120,140,160,180,200,220,240,260,300,320,340,360,380,400,420,440,460,480,500,520,540,560,580,600,620),]
initiallearning_simulated$id<-as.numeric(as.factor(initiallearning_simulated$id))
# We can now extract the relevant data from the first reversal for the STAN model to estimate phi and lambda at the beginning
datinitialsimulated <- as.list(initiallearning_simulated)
datinitialsimulated$N <- nrow(initiallearning_simulated)
datinitialsimulated$N_id <- length(unique(initiallearning_simulated$id))
# The STAN model is set up to have the inital attraction for each option set to 0.1, and that individuals only learn the reward of the option they chose in a given trial.
reinforcement_model_nonzeroattraction_alternativepriors <- "
data{
int N;
int N_id;
array[N] int id;
array[N] int Trial;
array[N] int Choice;
array[N] int Correct;
}
parameters{
real logit_phi;
real log_L;
// Varying effects clustered on individual
matrix[N_id,2] v_ID;
}
model{
matrix[N_id,2] A; // attraction matrix
logit_phi ~ normal(0,1);
log_L ~ normal(0,1);
// varying effects
to_vector(v_ID) ~ normal(0,1);
// initialize attraction scores
for ( i in 1:N_id ) {
A[i,1] = 0.1; A[i,2] = 0.1;
}
// loop over Choices
for ( i in 1:N ) {
vector[2] pay;
vector[2] p;
real L;
real phi;
// first, what is log-prob of observed choice
L = exp(log_L + v_ID[id[i],1]);
p = softmax(L*A[id[i],1:2]' );
Choice[i] ~ categorical( p );
// second, update attractions conditional on observed choice
phi = inv_logit(logit_phi + v_ID[id[i],2]);
pay[1:2] = rep_vector(0,2);
pay[ Choice[i] ] = Correct[i];
A[ id[i] , Choice[i] ] = ( (1-phi)*(A[ id[i] , Choice[i] ]) + phi*pay[Choice[i]]);
}//i
}
"
# We run this model for the data from only the first reversal
# Two options to call stan from R - option 1 calling stan directly
m_firstsimulated <- stan( model_code = reinforcement_model_nonzeroattraction_alternativepriors, data=datfirstsimulated ,iter = 5000, cores = 4, chains=4, control = list(adapt_delta=0.9, max_treedepth = 12))
sfirstsimulated <- extract.samples(m_firstsimulated)
firstreversal_simulatedlambda <- sapply(1 : datfirstsimulated$N_id, function(x) exp( mean(sfirstsimulated$log_L) + mean(sfirstsimulated$v_ID[ ,x, 1])))
firstreversal_simulatedphi <- sapply(1 : datfirstsimulated$N_id, function(x) inv_logit( mean(sfirstsimulated$logit_phi) + mean(sfirstsimulated$v_ID[ ,x, 2])))
# Two options to call stan from R - option 2 calling stan using cmdstanr
library(cmdstanr)
currentlocation<-getwd()
cmdstanlocation <- cmdstan_path()
setwd(cmdstanlocation)
# access the output file created by the model running the reinforcement model
write(reinforcement_model_nonzeroattraction_alternativepriors,file="myowntrial.stan")
file <- file.path(cmdstan_path(), "myowntrial.stan")
mod <- cmdstan_model(file)
options(mc.cores=4)
datfirstsimulated$Reversal<-as.numeric(as.factor(datfirstsimulated$Reversal))
# RUN the model
fit <- mod$sample(
data = datfirstsimulated,
seed = 123,
chains = 4,
parallel_chains = 4,
refresh = 500
)
# Extract relevant variables
outcome_firstsimulated<-data.frame(fit$summary())
rownames(outcome_firstsimulated)<-outcome_firstsimulated$variable
library(posterior)
library(rethinking)
drawsarray_firstsimulated<-fit$draws()
drawsdataframe_firstsimulated<-as_draws_df(drawsarray_firstsimulated)
drawsdataframe_firstsimulated<-data.frame(drawsdataframe_firstsimulated)
firstsimulated_lambda <- sapply(1 : datfirstsimulated$N_id, function(x) exp( mean(drawsdataframe_firstsimulated$log_L) + mean(drawsdataframe_firstsimulated[,x+3])))
firstsimulated_phi <- sapply(1 : datfirstsimulated$N_id, function(x) inv_logit( mean(drawsdataframe_firstsimulated$logit_phi) + mean(drawsdataframe_firstsimulated[,x+33])))
firstsimulated_lambda_individuals <- sapply(1 : datfirstsimulated$N_id, function(x) exp( (drawsdataframe_firstsimulated$log_L) + (drawsdataframe_firstsimulated[,x+3])))
firstsimulated_lambda_range<-sapply(1 : datfirstsimulated$N_id, function(x) HPDI(firstsimulated_lambda_individuals[,x],0.89))
firstsimulated_phi_individuals<-sapply(1 : datfirstsimulated$N_id, function(x) inv_logit( (drawsdataframe_firstsimulated$logit_phi) + (drawsdataframe_firstsimulated[,x+33])))
firstsimulated_phi_range<-sapply(1 : datfirstsimulated$N_id, function(x) HPDI(firstsimulated_phi_individuals[,x],0.89))
# Remove the stan command line file we created for this particular model from your computer
fn<-"myowntrial"
file.remove(fn)
# Reset your working directory to what it was before we ran the model
setwd(currentlocation)
# We now can get back the phi and lambda values 30 individuals were assigned at the beginning of the simulation
simulatedphis<-unique(simulatedreversaldata_attractionscores_reversalphase[simulatedreversaldata_attractionscores_reversalphase$id %in% c(20,40,60,80,100,120,140,160,180,200,220,240,260,300,320,340,360,380,400,420,440,460,480,500,520,540,560,580,600,620),]$ThisBirdsPhi)
simulatedlambdas<-unique(simulatedreversaldata_attractionscores_reversalphase[simulatedreversaldata_attractionscores_reversalphase$id %in% c(20,40,60,80,100,120,140,160,180,200,220,240,260,300,320,340,360,380,400,420,440,460,480,500,520,540,560,580,600,620),]$ThisBirdsLambda)
# Some of the phi values estimated from the performance during the initial learning are estimated as higher than what the individuals had during the simulation.
plot(firstsimulated_phi~simulatedphis,xlim=c(0,0.08),ylim=c(0,0.08))
abline(a=0,b=1)
# In contrast, some of the lambda values estimated from the performance during the initial learning are estimated as lower than what the individuals had during the simulation
plot(firstsimulated_lambda~simulatedlambdas,xlim=c(0,10),ylim=c(0,10))
abline(a=0,b=1)
# The issue likely arises because the STAN model assumes that the phi and lambda values are correlated - whereas in the simulations they were allowed to vary independently from each other
plot(firstsimulated_phi~firstsimulated_lambda)
plot(simulatedphis~simulatedlambdas)
# In the simulation, we set some high lambda values and low phi values - because of the assumed correlation, the STAN model estimates higher phi values than simulated in cases when lambda was high, and lower lambda values than simulated when phi was low
plot(firstsimulated_phi[simulatedlambdas<5]~simulatedphis[simulatedlambdas<5],xlim=c(0,0.08),ylim=c(0,0.08))
points(firstsimulated_phi[simulatedlambdas>5]~simulatedphis[simulatedlambdas>5],xlim=c(0,0.08),ylim=c(0,0.08),col="red")
abline(a=0,b=1)
# In these estimations based on the performance during single setups (either just the initial learning or the first reversal learning) the model always estimates that lambda and phi are correlated. This likely reflects equifinality - individuals can achieve the same performance with a range of phis and lambdas, and the model will slide to the middle along the line for each individual:
plot(x="lambda",y="phi",xlim=c(0,10),ylim=c(0,0.1))
# Individuals who needed a long time to learn the association will be in the bottom left corner
abline(a=0.04,b=-0.01,lty=2)
abline(a=0.06,b=-0.01,lty=2)
abline(a=0.08,b=-0.01,lty=2)
# Individuals who needed a short time to learn the association will be in the top right corner
abline(a=0.10,b=-0.01,lty=2)
abline(a=0.12,b=-0.01,lty=2)
abline(a=0.14,b=-0.01,lty=2)
points(x=1,y=0.03,cex=2)
points(x=2,y=0.04,cex=2)
points(x=3,y=0.05,cex=2)
points(x=4,y=0.06,cex=2)
points(x=5,y=0.07,cex=2)
points(x=6,y=0.08,cex=2)
abline(a=0.02,b=0.01,col="red",lwd=1.5)
points(initiallearning_simulatedphi~initiallearning_simulatedlambda,pch=2)
# Maybe the model can better separate the lambda and phi values when combining data from multiple runs - in the case of the simulations that means combining the data from the initial learning with the data of the first reversal
# Let's start with the same 30 individuals for comparison
initialandreversal_simulated<-simulatedreversaldata_attractionscores[simulatedreversaldata_attractionscores$id %in% c(20,40,60,80,100,120,140,160,180,200,220,240,260,300,320,340,360,380,400,420,440,460,480,500,520,540,560,580,600,620),]
initialandreversal_simulated$id<-as.numeric(as.factor(initialandreversal_simulated$id))
# We can now extract the relevant data from the first reversal for the STAN model to estimate phi and lambda at the beginning
datinitialandreversalsimulated <- as.list(initialandreversal_simulated)
datinitialandreversalsimulated$N <- nrow(initialandreversal_simulated)
datinitialandreversalsimulated$N_id <- length(unique(initialandreversal_simulated$id))
# Option 1: calling stan directly from R
m_initialandreversal <- stan( model_code = reinforcement_model_nonzeroattraction, data=datinitialandreversalsimulated ,iter = 5000, cores = 4, chains=4, control = list(adapt_delta=0.9, max_treedepth = 12))
sinitialandreversal <- extract.samples(m_initialandreversal)
initialandreversal_lambda <- sapply(1 : datinitialandreversalsimulated$N_id, function(x) exp( mean(sinitialandreversal$log_L) + mean(sinitialandreversal$v_ID[ ,x, 1])))
initialandreversal_phi <- sapply(1 : datinitialandreversalsimulated$N_id, function(x) inv_logit( mean(sinitialandreversal$logit_phi) + mean(sinitialandreversal$v_ID[ ,x, 2])))
plot(initialandreversal_phi~simulatedphis)
abline(a=0,b=1)
plot(initialandreversal_lambda~simulatedlambdas)
abline(a=0,b=1)
plot(initialandreversal_phi~initialandreversal_lambda)
# Option 1: calling stan through cmdstanr
# setup with cmdstanr
currentlocation<-getwd()
cmdstanlocation <- cmdstan_path()
setwd(cmdstanlocation)
datinitialandreversalsimulated$Reversal<-as.numeric(as.factor(datinitialandreversalsimulated$Reversal))
# access the output file created by the model running the reinforcement model / reinforcement_model_nonzeroattraction_alternativepriors
write(reinforcement_model_nonzeroattraction_alternativepriors,file="myowntrial.stan")
file <- file.path(cmdstan_path(), "myowntrial.stan")
mod <- cmdstan_model(file)
options(mc.cores=4)
# RUN the model
fit <- mod$sample(
data = datinitialandreversalsimulated,
seed = 123,
chains = 4,
parallel_chains = 4,
refresh = 500
)
#Extract relevant variables
drawsarray<-fit$draws()
drawsdataframe<-as_draws_df(drawsarray)
drawsdataframe<-data.frame(drawsdataframe)
initialandreversal_lambda <- sapply(1 : datinitialandreversalsimulated$N_id, function(x) exp( mean(drawsdataframe$log_L) + mean(drawsdataframe[,x+3])))
initialandreversal_phi <- sapply(1 : datinitialandreversalsimulated$N_id, function(x) inv_logit( mean(drawsdataframe$logit_phi) + mean(drawsdataframe[,x+33])))
initialandreversal_lambda_individuals <- sapply(1 : datfirstsimulated$N_id, function(x) exp( (drawsdataframe$log_L) + (drawsdataframe[,x+3])))
initialandreversal_lambda_range<-sapply(1 : datfirstsimulated$N_id, function(x) HPDI(initialandreversal_lambda_individuals[,x],0.89))
initialandreversal_phi_individuals<-sapply(1 : datfirstsimulated$N_id, function(x) inv_logit( (drawsdataframe$logit_phi) + (drawsdataframe[,x+33])))
initialandreversal_phi_range<-sapply(1 : datfirstsimulated$N_id, function(x) HPDI(initialandreversal_phi_individuals[,x],0.89))
# Remove the stan command line file we created for this particular model from your computer
fn<-"myowntrial"
file.remove(fn)
# Reset your working directory to what it was before we ran the model
setwd(currentlocation)
simulatedphi<-initialandreversal_simulated %>% group_by(id) %>% summarise(mean(Phi_mean))
simulatedphi<-as.data.frame(simulatedphi)
simulatedphis<-simulatedphi[,2]
library(rethinking)
dat_simulatedvsestimated_onlyone<- list(
simulatedphis = simulatedphis,
estimatedphis = firstsimulated_phi
)
model_simulatedvsestimated_onlyone <- ulam(
alist(
estimatedphis ~ normal( mu , sigma ),
mu <- a + b*simulatedphis,
a ~ normal(0,0.1),
b ~ normal(1,1),
sigma ~ dexp(1)
) , data=dat_simulatedvsestimated_onlyone , chains=4 , cores=4,iter=10000 ,cmdstan = T)
precis(model_simulatedvsestimated_onlyone,depth=2)
# mean sd 5.5% 94.5% n_eff Rhat4
#a 0.01 0.00 0.00 0.01 6711 1
#b 0.15 0.05 0.06 0.23 6526 1
#sigma 0.00 0.00 0.00 0.00 7926 1
dat_simulatedvsestimated_acrossreversal<- list(
simulatedphis = simulatedphis,
estimatedphis = initialandreversal_phi
)
model_simulatedvsestimated_acrossreversal <- ulam(
alist(
estimatedphis ~ normal( mu , sigma ),
mu <- a + b*simulatedphis,
a ~ normal(0,0.1),
b ~ normal(1,1),
sigma ~ dexp(1)
) , data=dat_simulatedvsestimated_acrossreversal , chains=4 , cores=4,iter=10000, cmdstan = T)
precis(model_simulatedvsestimated_acrossreversal,depth=2)
# mean sd 5.5% 94.5% n_eff Rhat4
#a 0.00 0.00 -0.01 0.01 6680 1
#b 0.96 0.16 0.70 1.21 6677 1
#sigma 0.01 0.00 0.01 0.01 8210 1
dat_simulatedvsestimated_onlyone_l<- list(
simulatedlambdas = simulatedlambdas,
estimatedlambdas = firstsimulated_lambda
)
model_simulatedvsestimated_onlyone_l <- ulam(
alist(
estimatedlambdas ~ normal( mu , sigma ),
mu <- a + b*simulatedlambdas,
a ~ normal(0,0.1),
b ~ normal(1,1),
sigma ~ dexp(1)
) , data=dat_simulatedvsestimated_onlyone_l , chains=4 , cores=4,iter=10000 ,cmdstan = T)
precis(model_simulatedvsestimated_onlyone_l,depth=2)
# mean sd 5.5% 94.5% n_eff Rhat4
#a 0.13 0.10 -0.03 0.29 1 19299.05
#b 6.04 0.11 5.86 6.22 1 19167.92
#sigma 18.04 0.50 17.27 18.86 1 19884.38
dat_simulatedvsestimated_acrossreversal_l<- list(
simulatedlambdas = simulatedlambdas,
estimatedlambdas = initialandreversal_lambda
)
model_simulatedvsestimated_acrossreversal_l <- ulam(
alist(
estimatedlambdas ~ normal( mu , sigma ),
mu <- a + b*simulatedlambdas,
a ~ normal(0,0.1),
b ~ normal(1,1),
sigma ~ dexp(1)
) , data=dat_simulatedvsestimated_acrossreversal_l , chains=4 , cores=4,iter=10000, cmdstan = T)
precis(model_simulatedvsestimated_acrossreversal_l,depth=2)
# mean sd 5.5% 94.5% n_eff Rhat4
#a 0.01 0.10 -0.15 0.16 15584 1
#b 0.98 0.04 0.92 1.05 14582 1
#sigma 1.20 0.16 0.97 1.47 16119 1
dat_simulatedvsestimated_onlyone_both<- list(
estimatedlambdas = firstsimulated_lambda,
estimatedphis = firstsimulated_phi
)
model_simulatedvsestimated_onlyone_both <- ulam(
alist(
estimatedlambdas ~ normal( mu , sigma ),
mu <- a + b*estimatedphis,
a ~ normal(0,10),
b ~ normal(0,250),
sigma ~ dexp(1)
) , data=dat_simulatedvsestimated_onlyone_both , chains=4 , cores=4,iter=10000 ,cmdstan = T)
precis(model_simulatedvsestimated_onlyone_both,depth=2)
# mean sd 5.5% 94.5% n_eff Rhat4
#a -0.81 0.39 -1.42 -0.19 6193 1
#b 504.19 42.73 435.31 570.69 6066 1
#sigma 0.61 0.09 0.49 0.76 7213 1
####
#### Plot for Figure 2
plot(firstsimulated_phi~simulatedphis,xlim=c(0,0.06),ylim=c(0,0.06),bty="n",cex=3,pch=18,col="#00DADF",ann=F)
points(initialandreversal_phi~simulatedphis,col="#007477",pch=16,cex=2)
abline(a=0,b=1,lty=2)
legend(x="topleft", legend=c(pch16="Initial plus Reversal",pch18="Single Reversal"), pch=c(16,18), col=c("#007477","#00DADF"), box.lty=0, cex=1.2,pt.cex=1.4)
mtext("Assigned association-updating rate Φ",side=1,at=0.03,line=3,cex=1.5)
mtext("Estimated association-updating rate Φ",side=2,at=0.03,line=2.5,cex=1.5)
mtext("1:1 line",side=1,at=0.055,line=-20)
for(i in 1:30) {
lines(y=c(initialandreversal_phi_range[1,i],initialandreversal_phi_range[2,i]),x=c(simulatedphis[i],simulatedphis[i]),col=col.alpha(acol="#007477",alpha=0.4))
lines(y=c(firstsimulated_phi_range[1,i],firstsimulated_phi_range[2,i]),x=c(simulatedphis[i],simulatedphis[i]),col=col.alpha(acol="#00DADF",alpha=0.4))
}
par(mfrow=c(1,2))
# jitter simulated phi values to make it easier to see the 89% compatibility ranges
jsimulatedphis<-jitter(simulatedphis)
# Alternative Plot for Figure 2 based on full simulated data
plot(firstsimulated_phi~jsimulatedphis,xlim=c(0,0.05),ylim=c(0,0.12),bty="n",cex=3,pch=18,col="#31688EFF",ann=F)
points(initialandreversal_phi~jsimulatedphis,col="#35B779FF",pch=16,cex=2)
abline(a=0,b=1,lty=2)
legend(x="topleft", legend=c(pch16="Initial plus Reversal",pch18="Single Reversal"), pch=c(16,18), col=c("#35B779FF","#31688EFF"), box.lty=0, cex=1.2,pt.cex=1.4)
mtext("Assigned association-updating rate Φ",side=1,at=0.02,line=3,cex=1.5)
mtext("Estimated association-updating rate Φ",side=2,at=0.06,line=2.5,cex=1.5)
mtext("1:1 line",side=1,at=0.05,line=-10)
mtext("a)", side=1,line=-15,font=2,at=-0.004,cex=1.7)
for(i in 1:626) {
lines(y=c(initialandreversal_phi_range[1,i],initialandreversal_phi_range[2,i]),x=c(jsimulatedphis[i],jsimulatedphis[i]),col=col.alpha(acol="#35B779FF",alpha=0.4))
lines(y=c(firstsimulated_phi_range[1,i],firstsimulated_phi_range[2,i]),x=c(jsimulatedphis[i],jsimulatedphis[i]),col=col.alpha(acol="#31688EFF",alpha=0.4))
}
plot(firstsimulated_lambda~simulatedlambdas,xlim=c(0,20),ylim=c(0,120),bty="n",cex=3,pch=18,col="#31688EFF",ann=F)
points(initialandreversal_lambda~simulatedlambdas,col="#35B779FF",pch=16,cex=2)
abline(a=0,b=1,lty=2)
legend(x="topleft", legend=c(pch16="Initial plus Reversal",pch18="Single Reversal"), pch=c(16,18), col=c("#35B779FF","#31688EFF"), box.lty=0, cex=1.2,pt.cex=1.4)
mtext("Assigned sensitivity λ",side=1,at=10,line=3,cex=1.5)
mtext("Estimated sensitivity λ",side=2,at=60,line=2.5,cex=1.5)
mtext("1:1 line",side=1,at=20,line=-6)
mtext("b)", side=1,line=-15,font=2,at=-1.5,cex=1.7)
for(i in 1:626) {
lines(y=c(initialandreversal_lambda_range[1,i],initialandreversal_lambda_range[2,i]),x=c(simulatedlambdas[i],simulatedlambdas[i]),col=col.alpha(acol="#35B779FF",alpha=0.4),lwd=2)
}
```
## 2) Using mathematical derivations to determine whether variation in $\phi$ or $\lambda$ has a stronger influence on the number of trials individuals might need to reach criterion in serial reversal learning experiments
We mathematically derived predictions about the choice behavior of individuals using equations 1-3. We determined the values for $\phi$ and $\lambda$ that individuals would need to reach the passing criterion in 50 trials or fewer in the serial reversal learning experiment. To derive the learning curves for individuals with different $\phi$ and $\lambda$, we incorporated the dynamic aspect of change over time by inserting the probabilities of choosing either the rewarded or the non-rewarded option from trial $t$ as the likelihood for the changes in associations at trial $t$+1.
Equation 3a (dynamic association for the rewarded option):
$A_{r,t+1}$ = ((1-$\phi$) $\times$ $A_{r,t}$ + $\phi$ $\times$ $\pi$) $\times$ $P_{t}$ + (1-$P_{t}$) $\times$ $A_{r,t}$.
Equation 3b (dynamic association for the non-rewarded option):
$A_{n,t+1}$ = (1-$P_{t}$) $\times$ (1-$\phi$) $\times$ $A_{n,t}$ + $P_{t}$ + (1-$P_{t}$) $\times$ $A_{n,t}$.
In equations 3a and 3b, the association with both the rewarded, $A_{r}$, and the non-rewarded, $A_{n}$, options change from trial $t$ to trial $t$+1 depending on the association updating rate $\phi$ and the probability, $P$, that the association was chosen during trial $t$. The probability, $P$, is calculated using equation 2. The reward $\pi$ is set to 1. We used these equations to explore which combinations of $\phi$ and $\lambda$ would lead to an individual choosing the rewarded option above the passing criterion in 50 trials or less after a reversal in the rewarded option. We assumed serial reversals, and therefore set the initial associations after the reversal to 0.1 for the now rewarded option (previously unrewarded, so low association) and to 0.7 for the now unrewarded option (previously rewarded, so high association). We obtained these associations from the end of the reversal learning simulation in question 1. For a given combination of $\phi$ and $\lambda$, we first used equation 2 to calculate the probability that an individual would choose the rewarded option during this first trial after the reversal (where the remaining probability reflects the individual choosing the non-rewarded option). We then used equations 3a and 3b to update the associations. We repeated the calculations of the probabilities and the updates of the associations 50 times to determine whether individuals with a given combination of $\phi$ and $\lambda$ would reach the passing criterion within either 50 (the serial reversal passing criterion) or 40 trials (the average observed among the trained grackles). For $\phi$ ranging between 0.02 and 0.10, we manually explored which $\lambda$ would be needed such that an individual would choose the rewarded option with more than 50% probability at trial 31 (or 21) and with more than 85% probability at trial 50 (or 40), to match the passing criterion of 17 correct out of the last 20 trials (17/20=0.85).
```{r question2_phiversuslambdasimulateddata, eval=F}
####
# Load previously simulated data from xpop
####
# There are two separate sets of simulations, with initial attractions at 0.1 and eight different phi and four different lambda combinations
simulatedreversaldata_attractionscores_1<-read.csv(url("https://raw.githubusercontent.com/corinalogan/grackles/master/Files/Preregistrations/gxpopbehaviorhabitat_SimulatedReversalData_Grackles_PhiLambda_Attraction02_Aug2021.csv"), header=T, sep=",", stringsAsFactors=F)
simulatedreversaldata_attractionscores_2<-read.csv(url("https://raw.githubusercontent.com/corinalogan/grackles/master/Files/Preregistrations/gxpopbehaviorhabitat_SimulatedReversalData_Grackles_PhiLambda_Attraction04_Aug2021.csv"), header=T, sep=",", stringsAsFactors=F)
# In both sets of simulations, populations with different phi and lambda values were counted from 1-16; for the second set we change this to 17-32
simulatedreversaldata_attractionscores_2$Site<-simulatedreversaldata_attractionscores_2$Site+16
# In both simulations, individuals were counted from 1-320; for the second set we change the ids to start at 321
simulatedreversaldata_attractionscores_2$Bird_ID<-simulatedreversaldata_attractionscores_2$Bird_ID+320
# We combine the two data sets for the further analyses
library(dplyr)
simulatedreversaldata_attractionscores<-bind_rows(simulatedreversaldata_attractionscores_1,simulatedreversaldata_attractionscores_2)
# In the simulation, trials were counted continuously. We change it so that the count of trials starts at 1 again when birds switch to the reversal
for (birds in 1:length(unique(simulatedreversaldata_attractionscores$Bird_ID))){
currentbird<-unique(simulatedreversaldata_attractionscores$Bird_ID)[birds]
maximuminitial<-max(simulatedreversaldata_attractionscores[simulatedreversaldata_attractionscores$Bird_ID==currentbird & simulatedreversaldata_attractionscores$Reversal == "initial",]$Trial)
simulatedreversaldata_attractionscores[simulatedreversaldata_attractionscores$Bird_ID==currentbird & simulatedreversaldata_attractionscores$Reversal == "reversal",]$Trial<-simulatedreversaldata_attractionscores[simulatedreversaldata_attractionscores$Bird_ID==currentbird & simulatedreversaldata_attractionscores$Reversal == "reversal",]$Trial - maximuminitial
}
####
# Are phi or lambda more important to reach criterion in 50 or fewer trials?
colnames(simulatedreversaldata_attractionscores)<-c("counter","id","Session","Trial","Reversal","Choice","Correct","Phi_mean","Lambda_mean","Site","Phi_sd","Lambda_sd","ThisBirdsPhi","ThisBirdsLambda","Attraction1","Attraction2")
summarysimulateddata<-matrix(nrow=length(unique(simulatedreversaldata_attractionscores$id)),ncol=5)
summarysimulateddata<-as.data.frame(summarysimulateddata)
colnames(summarysimulateddata)<-c("id","ThisBirdsPhi","ThisBirdsLambda","TrialsInitial","TrialsReversal")
summarysimulateddata$id<-unique(simulatedreversaldata_attractionscores$id)
for (i in 1:nrow(summarysimulateddata)){
summarysimulateddata[i,]$TrialsInitial<-max(filter(simulatedreversaldata_attractionscores,id==unique(simulatedreversaldata_attractionscores$id)[i],Reversal=="initial")$Trial)
}
for (i in 1:nrow(summarysimulateddata)){
summarysimulateddata[i,]$TrialsReversal<-max(filter(simulatedreversaldata_attractionscores,id==unique(simulatedreversaldata_attractionscores$id)[i],Reversal=="reversal")$Trial)
}
for (i in 1:nrow(summarysimulateddata)){
summarysimulateddata[i,]$ThisBirdsPhi<-max(filter(simulatedreversaldata_attractionscores,id==unique(simulatedreversaldata_attractionscores$id)[i])$ThisBirdsPhi)
}
for (i in 1:nrow(summarysimulateddata)){
summarysimulateddata[i,]$ThisBirdsLambda<-max(filter(simulatedreversaldata_attractionscores,id==unique(simulatedreversaldata_attractionscores$id)[i])$ThisBirdsLambda)
}
# Remove individuals who did not proceed to initial association
summarysimulateddata<-summarysimulateddata[summarysimulateddata$TrialsReversal != -Inf,]
plot(summarysimulateddata$TrialsReversal~summarysimulateddata$ThisBirdsPhi)
plot(summarysimulateddata$TrialsReversal~summarysimulateddata$ThisBirdsLambda)
library(rethinking)
library(cmdstanr)
dat_trialsphiandlambda<- list(
Trials = (summarysimulateddata$TrialsReversal),
bird = c(as.numeric(as.factor(summarysimulateddata$id))),
phi = standardize(c(summarysimulateddata$ThisBirdsPhi)),
lambda = standardize(c(summarysimulateddata$ThisBirdsLambda))
)
trials.phiandlambda <- ulam(
alist(
Trials ~ dpois(poislambda ),
log(poislambda) <- a + b*phi+c*lambda,
a ~ normal(5,1),
b ~ normal(0,1),
c ~ normal(0,1)
) , data=dat_trialsphiandlambda , chains=4 , cores=4,iter=10000,cmdstan=T )
precis(trials.phiandlambda,depth=2)
# mean sd 5.5% 94.5% n_eff Rhat4
# a 4.49 0 4.48 4.49 17433 1
# b -0.23 0 -0.24 -0.23 18136 1
# c -0.17 0 -0.18 -0.16 18856 1
#Plot to assess fit of the model:
mu_1<- link( trials.phiandlambda, data=data.frame(phi=seq(from=-2,to=2,by=0.25),lambda=rep(0,17) ) )
mu_1_mean <- apply( mu_1 , 2 , mean )
mu_1_ci <- apply( mu_1 , 2 , PI , prob=0.97 )
plot(dat_trialsphiandlambda$Trials~dat_trialsphiandlambda$phi) # plot the predicted relationship between the chance to marry a cousin on the y-axis and the birth year of the individuals on the x-axis
shade(mu_1_ci,seq(from=-2,to=2,by=0.25), col=col.alpha("blue",0.4))
# Repeat analyses with unstandardized predictors to compare estimated slopes with those of the grackles
summarysimulateddata_ll<-summarysimulateddata[summarysimulateddata$ThisBirdsLambda>2,]
summarysimulateddata_ll<-summarysimulateddata_ll[summarysimulateddata_ll$ThisBirdsPhi>0.01,]
summarysimulateddata_ll<-summarysimulateddata_ll[summarysimulateddata_ll$TrialsReversal <300,]
dat_trialsphiandlambda<- list(
Trials = (summarysimulateddata_ll$TrialsReversal),
bird = c(as.numeric(as.factor(summarysimulateddata_ll$id))),
phi = (c(summarysimulateddata_ll$ThisBirdsPhi)),
lambda = (c(summarysimulateddata_ll$ThisBirdsLambda))
)
trials.phiandlambda <- ulam(
alist(
Trials ~ dpois(poislambda ),
log(poislambda) <- a + b*phi+c*lambda,
a ~ normal(5,1),
b ~ normal(0,25),
c ~ normal(0,1)
) , data=dat_trialsphiandlambda , chains=4 , cores=4,iter=10000,cmdstan=T )
precis(trials.phiandlambda,depth=2)
# mean sd 5.5% 94.5% n_eff Rhat4
# a 5.31 0.01 5.29 5.33 5471 1
# b -21.21 0.39 -21.84 -20.58 5904 1
# c -0.05 0.00 -0.06 -0.05 7955 1
mu_1<- link( trials.phiandlambda, data=data.frame(phi=seq(from=0.01,to=0.11,length=17),lambda=rep(4,17) ) )
mu_1_mean <- apply( mu_1 , 2 , mean )
mu_1_ci_simulated <- apply( mu_1 , 2 , PI , prob=0.97 )
summarysimulateddata_forplotting<-matrix(ncol=3,nrow=2*nrow(summarysimulateddata))
summarysimulateddata_forplotting<-as.data.frame(summarysimulateddata_forplotting)
colnames(summarysimulateddata_forplotting)<-c("TrialsReversal","Predictor","Value")
summarysimulateddata_forplotting$TrialsReversal<-c(summarysimulateddata$TrialsReversal,summarysimulateddata$TrialsReversal)
summarysimulateddata_forplotting$Predictor<-c(rep("phi",nrow(summarysimulateddata)),rep("lambda",nrow(summarysimulateddata)))
summarysimulateddata_forplotting$Value<-c(standardize(summarysimulateddata$ThisBirdsPhi),standardize(summarysimulateddata$ThisBirdsLambda))
summarysimulateddata_forplotting[summarysimulateddata_forplotting$TrialsReversal>181,]$TrialsReversal<-8
summarysimulateddata_forplotting[summarysimulateddata_forplotting$TrialsReversal>151,]$TrialsReversal<-7
summarysimulateddata_forplotting[summarysimulateddata_forplotting$TrialsReversal>131,]$TrialsReversal<-6
summarysimulateddata_forplotting[summarysimulateddata_forplotting$TrialsReversal>111,]$TrialsReversal<-5
summarysimulateddata_forplotting[summarysimulateddata_forplotting$TrialsReversal>91,]$TrialsReversal<-4
summarysimulateddata_forplotting[summarysimulateddata_forplotting$TrialsReversal>71,]$TrialsReversal<-3
summarysimulateddata_forplotting[summarysimulateddata_forplotting$TrialsReversal>51,]$TrialsReversal<-2
summarysimulateddata_forplotting[summarysimulateddata_forplotting$TrialsReversal>31,]$TrialsReversal<-1
summarysimulateddata_forplotting$TrialsReversal<-as.factor(summarysimulateddata_forplotting$TrialsReversal)
library(ggplot2)
library(ggtext)
####
#### Plot for earlier figure, no longer included
summarysimulateddata_forplotting$color<-NA
summarysimulateddata_forplotting[summarysimulateddata_forplotting$Predictor=="phi",]$color<-"#007477"
summarysimulateddata_forplotting[summarysimulateddata_forplotting$Predictor=="lambda",]$color<-"indianred1"
summarysimulateddata_forplotting_alt<-summarysimulateddata_forplotting[order(summarysimulateddata_forplotting$Predictor),]
summarysimulateddata_forplotting_alt$TrialsReversal<-as.integer(summarysimulateddata_forplotting_alt$TrialsReversal)
ggplot(summarysimulateddata_forplotting_alt, aes(x=TrialsReversal, y=Value, fill=Predictor)) +
geom_jitter(colour=summarysimulateddata_forplotting_alt$color,width=0.2,size=2)+xlab("Trials simulated individuals needed to reach criterion")+ ylab("<span style='font-size:20pt;font-weight:bold'>Assigned <b style='color:#007477'>Φ</b> and <b style='color:#FF6A6A'>λ</b> (standardized) </span>")+
theme_classic()+scale_x_continuous(name="Number of trials to reach criterion",breaks=1:8, labels=c("31-50","51-70","71-90","91-110","111-130","131-150","151-180","181-220"))+theme(axis.text.x = element_text(size = 14, colour ="black",hjust = 0.5,angle = 0)) +
theme(axis.title.x = element_text(size = 18, colour ="black", face = "bold",
hjust = 0.5,
vjust = -0.5,
angle = 0))+
theme(axis.text.y = element_text(size = 14, colour ="black",
hjust = 0.5,
angle = 0)) +
theme(axis.title.y =element_markdown() )+theme(legend.position = "none") +geom_smooth(col="grey")
# With high phi values, lambda can be smaller because birds will after few trials have such large differences in their learned association that they do not require to be very sensitive to those differences.
# Example: assume that in first 20 trials birds choose randomly either the rewarded or the non-rewarded option, so they experience each 10 times. We check how much their associations start from initially similar values. Next, based on these differences in the two associations, we estimate the lambda with which individuals will choose the rewarded option in 85% of the next trials (17 out of 20 = 85%). Across the range of phis 0.02 through 0.10, as phi increases, lambda decreases exponentially because birds change both their association to the rewarded and the non-rewarded option more quickly. For example, with a phi of 0.05, after 20 trials the attraction to the rewarded option will be 0.46, while the attraction to the non-rewarded option will be 0.06. With a lambda of 4.4, individuals will choose the rewarded option with a probability of
phi<-0.04
lambda<-6.5
initial_attraction_rewarded<-0.05
initial_attraction_nonrewarded<-0.7
trial20_attraction_rewarded<-NA
trial20_attraction_nonrewarded<-NA
for(i in 1:20){
ifelse(i==1,trial20_attraction_rewarded<-(1-phi)*initial_attraction_rewarded+phi*1, trial20_attraction_rewarded<-(1-phi)*trial20_attraction_rewarded+phi*1)
ifelse(i==1,trial20_attraction_nonrewarded<-(1-phi)*initial_attraction_nonrewarded+phi*0, trial20_attraction_nonrewarded<-(1-phi)*trial20_attraction_nonrewarded+phi*0)
}
probability_choose_rewarded<-exp(trial20_attraction_rewarded*lambda)/( exp(trial20_attraction_rewarded*lambda)+exp(trial20_attraction_nonrewarded*lambda))
probability_choose_rewarded
# We use the equations below to assess which combination of phi and lambda mean an indiviudal will reach the passing criterion 40 trials after a reversal. We assume that they will reach the passing criterion if the probability to choose the rewarded option is >0.5 at trial 21 and >0.85 at trial 40
phi<-0.04 # does not work with phi of 0.03
lambda<-6.3 # 4 vs 3
attraction_rewarded<-NA
attraction_nonrewarded<-NA
probability_rewarded<-NA
attraction_rewarded[1]<-0.1 # adjust depending on phi
attraction_nonrewarded[1]<-0.7 # adjust depending on phi
probability_rewarded[1]<-exp(attraction_rewarded*lambda)/(exp(attraction_rewarded*lambda)+exp(attraction_nonrewarded*lambda))
for(i in 1:50){
attraction_rewarded[i+1]<-((1-phi)*attraction_rewarded[i]+phi*1)*probability_rewarded[i]+(1-probability_rewarded[i])*attraction_rewarded[i]
attraction_nonrewarded[i+1]<-(1-probability_rewarded[i])*(1-phi)*attraction_nonrewarded[i]+attraction_nonrewarded[i]*probability_rewarded[i]
probability_rewarded[i+1]<-exp(attraction_rewarded[i]*lambda)/(exp(attraction_rewarded[i]*lambda)+exp(attraction_nonrewarded[i]*lambda))
}
probability_rewarded
####
### Plot for Figure 3
library(ggplot2)
# 40 trials
phis_85percent<-c(0.04,0.05,0.06,0.07,0.08,0.09,0.10)
lambda_85percent<-c(7.5,4.4,3.4,2.9,2.6,2.4,2.3)
lambdamax_85percent<-c(7.8,8.5,9.1,9.5,9.7,9.8,9.9)
# 50 trials
phis_85percent_50<-c(0.04,0.05,0.06,0.07,0.08,0.09,0.10)
lambda_85percent_50<-c(6.3,3.5,2.7,2.4,2.2,2.1,2.1)
lambdamax_85percent_50<-c(7.9,9.3,9.6,9.8,9.9,9.9,10)
d<-matrix(data=c(phis_85percent,lambda_85percent,lambdamax_85percent),nrow=7,ncol=3)
d<-as.data.frame(d)
colnames(d)<-c("Association-updating rate Φ","Sensitivity to learned associations λ","Maximum Sensitivity to learned associations λ")
spline_int <- as.data.frame(spline(d$`Association-updating rate Φ`, d$`Sensitivity to learned associations λ`))
spline_int_max <- as.data.frame(spline(d$`Association-updating rate Φ`, d$`Maximum Sensitivity to learned associations λ`))
spline_int$y2<-spline_int_max$y
d_50<-matrix(data=c(phis_85percent_50,lambda_85percent_50,lambdamax_85percent_50),nrow=7,ncol=3)
d_50<-as.data.frame(d_50)
colnames(d_50)<-c("Association-updating rate Φ","Sensitivity to learned associations λ","Maximum Sensitivity to learned associations λ")
spline_int_50 <- as.data.frame(spline(d_50$`Association-updating rate Φ`, d_50$`Sensitivity to learned associations λ`))
spline_int_max_50 <- as.data.frame(spline(d_50$`Association-updating rate Φ`, d_50$`Maximum Sensitivity to learned associations λ`))
spline_int_50$y2<-spline_int_max_50$y
eachbirdslearningparameters<-read.csv(url("https://raw.githubusercontent.com/corinalogan/grackles/master/Files/Preregistrations/g_flexmanip_ArizonaBirds_EstimatedPhiLambdaReversalLearning.csv"), header=T, sep=",", stringsAsFactors=F)
trainedgrackles<-eachbirdslearningparameters[is.na(eachbirdslearningparameters$manipulatedlambda)==F,]
ggplot(d) +
geom_ribbon(data = spline_int,aes(x = x,
ymin = y,
ymax = y2),
fill = "gray30",
alpha = 0.4)+
geom_ribbon(data = spline_int_50,aes(x = x,
ymin = y,
ymax = y2),
fill = "gray50",
alpha = 0.4)+theme_classic() +theme(axis.text.x = element_text(size = 14, colour ="black",
hjust = 0.5,
angle = 0)) +
theme(axis.title.x = element_text(size = 18, colour ="black", face = "bold",
hjust = 0.5,
vjust = -0.5,
angle = 0))+
theme(axis.text.y = element_text(size = 14, colour ="black",
hjust = 0.5,
angle = 0)) +
theme(axis.title.y = element_text(size = 18, colour ="black", face = "bold",
hjust = 0.5,
angle = 90),axis.line.x = element_line(colour = 'black', linewidth = 1, linetype='solid'),
axis.line.y = element_line(colour = 'black', linewidth = 1, linetype='solid'),panel.background = element_rect(fill = "white",
colour = "white",
size = 0.5, linetype = "solid")) + theme(legend.title = element_text(size = 13))+
ylim(0.4,10)+xlim(0.004,0.10)+
annotate("text", x=0.031, y=1.5, label= "Unlikely to reach criterion in 50 trials",size=5)+
annotate("text", x=0.072, y=10, label= "Likely to reach criterion in 50 trials",size=5)+
annotate("text", x=0.072, y=7, label= "Likely to reach criterion in 40 trials",size=5)+
geom_point(data=trainedgrackles,aes(x=beginningphi, y=beginninglambda),shape=21, fill="#FDE725FF",colour="black",size=6)+
geom_point(data=trainedgrackles,aes(x=manipulatedphi, y=manipulatedlambda), colour="#440154FF",size=6)+
annotate("text", x=0.02, y=3.6, label= "Grackles first",size=5,colour="black",fontface="bold")+
annotate("text", x=0.085, y=3.6, label= "Grackles trained",size=5,colour="#440154FF",fontface="bold")+
labs(x="Association-updating rate Φ",y="Sensitivity to learned associations λ")
```
## 3) Estimating $\phi$ and $\lambda$ from the observed reversal learning performances of grackles to determine which has more influence on variation in how many trials individuals needed to reach the passing criterion
We fit the Bayesian reinforcement learning model to the data of both the control and the trained grackles. Based on the simulation results indicating that the minimum sample per individual required for accurate estimation are two learning phases across one reversal (see below), we fit the model first to only the choices from the initial association learning phase and the first reversal learning phase for both control and trained individuals. For the control grackles, these estimated $\phi$ and $\lambda$ values also reflected their behavioral flexibility at the end of the reversal learning experiment. For the trained grackles, we additionally calculated $\phi$ and $\lambda$ separately for their final two reversals at the end of the serial reversal to infer the potential changes in the parameters.
We determined how the $\phi$ and $\lambda$ values influenced the number of trials individuals needed during a reversal by building a regression model to determine which of the two parameters had a more direct influence on the number of trials individuals needed to reach the passing criterion. We fit this model to the data from the simulated individuals, as well as to the data from the grackles. We assumed that the number of trials followed a Poisson distribution because the number of trials to reach criterion is a count that is bounded at smaller numbers (individuals need at least 20 trials to reach the criterion) with a log-linear link because we expect there are diminishing influences of further increases in $\phi$ or $\lambda$. The model is as follows:
$v_{b}$ \~ Poisson($\mu$),\
log $\mu$ = $\alpha$ + $\beta_{1}$ $\times$ $\phi_{b}$ + $\beta_{2}$ $\times$ $\lambda_{b}$,\
$\alpha$ \~ Normal(4.5,1),\
$\beta_{1}$ \~ Normal(0,1),\
$\beta_{2}$ \~ Normal(0,1),
where the number of trials each individual needed during their reversal, $v_{b}$, was linked with separate slopes, $\beta_{1}$ and $\beta_{2}$, to both the $\phi$ and $\lambda$ of each individual. The mean of the prior distribution for the intercept, $\alpha$, was based on the average number of trials (90) grackles in Santa Barbara were observed to need to reach the criterion during their one reversal (mean of 4.5 is equal to logarithm of 90, standard deviation set to 1 to constrain the estimate to the range observed across individuals). The priors for the relationships $\beta_{1}$ and $\beta_{2}$ with $\phi$ and $\lambda$ were centered on zero, indicating that, *a priori*, we did not bias these toward a relationship.
```{r question3_estimatephiandlambdagrackles, eval=F}
### Code below copied from Blaisdell et al. 2021