Journal of Personality and Social Psychology
2008, Vol. 94, No. 1, 16–31
Copyright 2008 by the American Psychological Association
0022-3514/08/$12.00 DOI: 10.1037/0022-3514.94.1.16
Why Do Implicit and Explicit Attitude Tests Diverge? The Role of
Structural Fit
B. Keith Payne, University of North Carolina at Chapel Hill
Melissa A. Burkley, Oklahoma State University
Mark B. Stokes, University of North Carolina at Chapel Hill
Implicit and explicit attitude tests are often weakly correlated, leading some theorists to conclude that implicit and explicit cognition are independent. Popular implicit and explicit tests, however, differ in many ways beyond implicit and explicit cognition. The authors examined in 4 studies whether correlations between implicit and explicit tests were influenced by the similarity in task demands (i.e., structural fit) and, hence, the processes engaged by each test. Using an affect misattribution procedure, they systematically varied the structural fit of implicit and explicit tests of racial attitudes. As test formats became more similar, the implicit–explicit correlation increased until it became higher than in most previous research. When tests differ in structure, they may underestimate the relationship between implicit and explicit cognition. The authors propose a solution that uses procedures to maximize structural fit.
Keywords: implicit, attitude, automatic, measurement, prejudice
Author Note. B. Keith Payne and Mark B. Stokes, Department of Psychology, University of North Carolina at Chapel Hill; Melissa A. Burkley, Department of Psychology, Oklahoma State University. This research was supported by the National Science Foundation Grant 0615478 to B. Keith Payne. Correspondence concerning this article should be addressed to B. Keith Payne, Department of Psychology, University of North Carolina at Chapel Hill, Campus Box 3270, Chapel Hill, NC 27599. E-mail: Payne@unc.edu

Implicit tests have been compared with such revolutionary inventions as the telescope and the microscope. The hope is that implicit tests, too, can make clear what is invisible to the naked eye. In many studies, implicit tests have created images of attitudes and beliefs that look very different from those reported on questionnaires. This kind of divergence is especially common for tests of racial attitudes and stereotypes. But what exactly does it mean when a person reports one attitude yet scores differently on an implicit test of race bias? The answer to that question is controversial, but it is important. It will shape not only theories of attitudes and stereotypes, but also the way that men and women taking the tests understand their own minds (see Arkes & Tetlock, 2004, and commentaries; Blanton & Jaccard, 2006, and commentaries).

In a typical study of this sort, a sample of research volunteers is compared on two tests of racial attitudes. One test is explicit, asking them to report their attitudes on a questionnaire. The other test is implicit. Rather than asking for self-report, it uses performance on another task to reveal attitudes. Readers familiar with implicit social-cognition research over the past decade will have no trouble predicting that in this kind of study, the two measures will likely diverge, capturing two very different snapshots of racial attitudes (Fazio & Olson, 2003; Hofmann, Gawronski, Gschwendner, Le, & Schmitt, 2005).

But why do implicit and explicit measures diverge? One view is that the two kinds of measures reflect separate attitude representations (Devine, 1989; Wilson, Lindsey, & Schooler, 2000). By this account, people hold multiple attitudes toward a topic at the same time. When attitudes change, a new attitude is layered on top of older attitudes. When people introspect they report the most contemporary attitudes, but the ruins of older layers can be unearthed by probing deeper, using implicit tests.

A different view is that a lack of correlation between measures does not turn up separate attitudes at all. Instead, the two kinds of measures allow people to edit their responses to different degrees. From this point of view, measuring implicit responses is less like an archeological dig and more like fishing in a river. Implicit tests tap attitudes upstream, but explicit tests catch what flows downstream, muddied in the editing for public report (Fazio, Jackson, Dunton, & Williams, 1995; Nier, 2005).

Both perspectives assume that the chief reason for the implicit–explicit divide can be found in the distinction between "implicitness" and "explicitness." Either implicit measures tap something unconscious and explicit measures tap something conscious, or implicit measures tap automatic responses and explicit measures tap intentionally edited responses. It may seem obvious that the principal difference between implicit and explicit tests is that one is implicit and the other is explicit. To see what other possibilities exist, it helps to shift perspectives and ask how implicit and explicit tests differ beyond implicit and explicit cognition. We propose that, independent of differences in underlying cognitive processes, when implicit and explicit tests have radically different structures, they will correlate with each other only weakly. But when they have similar structures, they will show much greater agreement.
If this analysis is correct, then it has two important implications for the study of implicit social cognition. First, it would call into question whether the many null or weak implicit–explicit correlations that have been reported should be interpreted as evidence that underlying implicit and explicit cognitions are independent. Second, to draw conclusions about implicit and explicit thought processes from divergence between implicit and explicit tests, one must first equate the tests on extraneous differences while systematically varying the differences of interest. Our aim is not, therefore, to test whether the archeological metaphor or the river metaphor is correct. Our aim is, instead, to show that poor structural fit creates a stumbling block for investigating such theories with implicit tests and to propose a way around this obstacle.
Structural Fit Between Implicit and Explicit Tests
By the structure of a test, we mean the parts that make it up and
how they work together to measure attitudes. Most explicit attitude
tests share several structural elements. The items are usually verbal
statements. For example, an item on the Likert-style Attitudes
Toward Blacks Scale (ATB) reads, “Racial integration (of schools,
businesses, residences, etc.) has benefited both Blacks and Whites”
(Brigham, 1993, p. 1942). In a semantic differential, participants
might be asked to rate racial groups on traits such as “pleasant,”
“aggressive,” and “friendly.” And on a feeling thermometer, participants might be presented with several racial groups and asked
to rate their feelings toward each group from very cold and
unfavorable to very warm and favorable. In each of these cases,
participants read a verbal phrase or sentence. They must either
retrieve a previously stored attitude from memory or construct a
new evaluation in the moment. Finally, they must decide how to
best express their response to each statement on a numerical scale.
If explicit attitude measures involve considering statements,
evaluating one’s response, and formulating it on a scale, implicit
measures avoid all of these. Although the structures of implicit
tests differ from one procedure to another, certain commonalities
are clear. Complex propositions are replaced with simple words or
pictures. In the implicit association test, for example, words or
pictures denote the target items to be evaluated (Greenwald,
McGhee, & Schwartz, 1998). Participants are asked to classify that
item using four categories mapped onto only two overlapping
response keys (e.g., White or good, Black or bad, White or bad,
Black or good). In evaluative priming (Fazio et al., 1995), a prime
word or picture is flashed briefly before a target word or picture.
The target item is then evaluated as “good” or “bad.” Similar items
are presented for many other kinds of tasks (e.g., De Houwer,
2003; Wittenbrink, Judd, & Park, 1997).
When presented with a word or a picture, implicit test takers are
asked to simply evaluate it, categorize it, or decide whether it is a
word. This does not typically require the formulation of any
opinion, as there is usually a correct answer (e.g., death is bad). In
response-latency tasks, which make up the bulk of implicit measurement, the content of the response is irrelevant (incorrect answers are typically excluded). The measure of interest is the time
it takes to register a response.
The list of differences described here between implicit and explicit tests is not intended to be exhaustive; these examples merely highlight some key ways that implicit and explicit tasks differ. They include the stimuli presented (e.g., propositions vs. simple words or pictures), the level of abstractness of the judgments to be made (e.g., broad social opinions vs. concrete classifications), and the metric in which responses are measured (e.g., numerical scales vs. response latencies). It is an important fact that none of these differences is inherently related to consciousness or unconsciousness, automaticity or voluntary control. Instead, they are incidental properties that are confounded with the implicit–explicit distinction as it has been instantiated in popular methods.
Are these methodological differences important for understanding implicit–explicit correlations? The issue calls to mind earlier debates about the predictive value of attitudes in general. Faced with many failures to detect relationships between attitudes and behaviors (Wicker, 1969), attitude theorists uncovered a number of moderating factors that determine when attitudes and behaviors are likely to be related and when they are not. One key factor was conceptual correspondence (Ajzen & Fishbein, 1977).
When attitudes and behaviors are measured at the same level of
abstractness and with the same degree of specificity, they are said
to be conceptually correspondent. Under these conditions, attitudes
and behavior tend to be related. Attitudes toward good health, for
example, are not strongly related to how often a person jogs, but
attitudes toward jogging are more likely to be related. A recent
review supports the notion that implicit and explicit attitude measures are more likely to be related when they are conceptually
correspondent (Hofmann et al., 2005). Of course, conceptual correspondence is only one aspect of the structural differences we
describe between implicit and explicit tests. Differences such as
reaction times versus Likert scales are important parts of a test’s
structure but are unrelated to conceptual correspondence. For a
more inclusive and accurate description of our purposes, we refer
to the degree of methodological similarity between different tests
as structural fit.
The attitude–behavior relationship is not the only field of study in which issues of test structure have proved important. Early studies of implicit memory compared implicit and explicit memory tasks that differed in many ways. For example, recall and recognition tasks measured explicit memory. In contrast, implicit memory was measured with a range of tasks, including word-fragment completion, word identification, and lexical decision (Jacoby & Dallas, 1981; Tulving, Schacter, & Stark, 1982; Warrington & Weiskrantz, 1974). Researchers using this approach found many variables that selectively influenced one kind of test but not the other. These dissociations, however, only raised more questions about their underlying reasons. Given all the structural differences between, for example, a recall test and a lexical-decision test, it was not clear whether a dissociation reflected implicit versus explicit forms of memory or other differences in the operations that each task requires (Roediger, 1990). Ambiguities in how to interpret implicit tests led Schacter, Bowers, and Booker (1989) to propose a principle they called the retrieval intentionality criterion.

The retrieval intentionality criterion says that to isolate implicit and explicit forms of memory in a way that is empirically verifiable, implicit and explicit memory should be measured in a way that holds everything about the memory tests constant except the intention to remember. The intention to remember is then manipulated. For example, rather than comparing cued recall to lexical decision, a study should present the same word-fragment cues for both implicit and explicit tests. In the implicit test, instructions should require participants to complete them in a way that does not refer back to a studied event (e.g., "complete the fragment with the first word completion that comes to mind"). In the explicit test, participants should complete the same items under instructions to remember the previous event (e.g., "complete the fragment with a word that you studied").
When some variable affects one kind of test but not the other,
the dissociation provides evidence for a selective effect on intentional versus unintentional uses of memory. This approach links
the operational definition of implicit memory to its conceptual
definition, because implicit memory is defined as an effect of past
experience that does not require the intent to remember. Jacoby
(1991) later provided a more conservative definition in which
participants in the implicit test condition are told to complete the
fragment with a word that they did not study. In this case, participants would produce a studied word only if it came to mind but
was not consciously remembered. This exclusion instruction defines implicit memory as an effect of past experience that influences performance despite a conscious intention to the contrary.
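To make the matched-instructions design concrete, here is a minimal Python sketch; the word fragments and instruction wording are invented for illustration and are not taken from the studies cited above:

    # The same retrieval cues appear in every condition; only the
    # instruction, and hence the intention to remember, is manipulated.
    cues = ["ele_h_nt", "mo_nta_n", "wi_d_w"]  # hypothetical word fragments

    instructions = {
        "implicit": "Complete each fragment with the first word that comes to mind.",
        "explicit": "Complete each fragment with a word that you studied.",
        # Jacoby's (1991) more conservative exclusion version:
        "exclusion": "Complete each fragment with a word that you did NOT study.",
    }

    for condition, text in instructions.items():
        print(f"{condition}: {text}")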
The retrieval intentionality criterion is based on a fundamental principle of experimental design: Isolating a particular variable requires that all other variables be held constant. Doing otherwise allows a confound in the design. Although the retrieval intentionality criterion soon became a gold standard for implicit memory research, studies of implicit attitudes have not followed the same route. Research on both the attitude–behavior link and implicit memory has shown how important it can be to hold extraneous factors constant. The studies reported here explore how the relationships between implicit and explicit attitude tests change when the test structures are matched.
Overview of the Present Research
To test whether structural fit influenced implicit–explicit correlations, we manipulated how well these features were equated. We reasoned that if extraneous structural differences led to underestimated correlations, then the more closely tests were equated on these features, the stronger the correlations would be. It is difficult to use response-latency methods with this approach, because the structure of the test is essential to the function of the test. It would be impossible to overcome problems such as comparing response latencies to Likert scales without changing the nature of the tasks. Several authors have observed, in various ways, that methodological differences might reduce the relationships between reaction-time tests and self-report tests (e.g., Hofmann et al., 2005; Kawakami & Dovidio, 2001; Wittenbrink et al., 1997). But the question is far from settled, because there has generally been no alternative available. As a result, there has been no way to gauge the effect of these differences and no way to know what the relationships would be in their absence. In this article, we gauge how much of an effect structural fit may have, and we propose an approach for greatly reducing extraneous differences.
An important step in understanding implicit–explicit correlations is the finding that a latent-variable approach can greatly increase the implicit–explicit correlation by removing measurement error (Cunningham, Preacher, & Banaji, 2001). The question addressed in the present research, however, is different. Latent-variable analyses can estimate what relationships would be in the absence of random error, but the method differences on which we focus are systematic and cannot be removed with a latent-variable approach. Instead, multiple measures are needed to compare the systematic influence of different methods (Bagozzi & Yi, 1991; Campbell & Fiske, 1959; Podsakoff, MacKenzie, Lee, & Podsakoff, 2003).
To solve this problem, we took advantage of the recently developed
affect misattribution procedure (AMP; Payne, Cheng, Govorun, &
Stewart, 2005), because it does not rely on response latencies. Instead,
it produces an implicit measure of attitudes in which the response
metric is an evaluation. The AMP is an approach to implicit measurement that depends on evaluation of ambiguous items. When an
ambiguous object, such as a Chinese pictograph, is preceded with a
pleasant or unpleasant picture, the picture alters impressions of the
pictograph (Murphy & Zajonc, 1993). People tend to misattribute
their affective reaction from the prime picture to the target pictograph.
As a result, participants asked to rate the pleasantness of the pictograph tend to rate it as more pleasant following a smiling face as
compared with a frowning face. The measure of interest is not
reaction time but the pictograph’s rated pleasantness.
Participants showed strong misattribution effects even when directly warned to avoid any influence from the prime photos (Payne,
Cheng, et al., 2005). A key aspect of the AMP is that participants are
warned specifically that the prime photos may bias their evaluations
of pictographs, and they are instructed that their task is to avoid any
influence from the photos. Providing such a warning sets intentional
response strategies in opposition to the automatic influence of the
primes (an exclusion instruction; Jacoby, 1991). If participants respond as intended, they will evaluate the pictographs without influence from the primes. They will judge the pictographs on the basis of
the primes only to the extent that the prime activates some evaluation
and they are unable to control that influence on their judgments. Because the task is arranged in this way, any misattributions that persist despite the intended task requirements provide evidence of automatic responses to the primes.
In a series of validation studies, misattributions provided valid
estimates of attitudes (Payne, Cheng, et al., 2005). The AMP is
notable in that it shows high reliability, and in those conditions in
which high implicit–explicit correlations were theoretically expected,
high correlations have been found. Moreover, in conditions in which
implicit and explicit tests were expected to diverge, the AMP showed
clear dissociations. These properties make the procedure well-suited
for studying relationships between implicit and explicit evaluations.
Using an implicit test with a metric that is an evaluation provides a
basis for equating many structural differences.
In Studies 1 and 2, we showed that the implicit–explicit correlation increased as structural fit increased. In Study 3, we used a multitrait–multimethod approach to rule out the possibility that high correlations produced by structural fit were artificially inflated by common method variance. Finally, in Study 4, we found that implicit and explicit tests with high structural fit still showed theoretically predicted dissociations, ruling out the idea that high structural fit renders the tests redundant.
Study 1
Our goal in the first study was to compare performance on implicit and explicit tests that varied in structural fit. We expected that the measures with the greatest structural fit would show the highest implicit–explicit correlations. We surveyed participants using two commonly used explicit measures, the Modern Racism Scale (MRS; McConahay, 1983) and the ATB (Brigham, 1993). We also administered the AMP to measure implicit racial evaluations. The primes for the task consisted of photos of White and Black persons' faces. Participants were shown face primes, followed by Chinese pictographs that they were asked to rate for pleasantness. Unlike the method used in previous studies, participants made their ratings on a graded 4-point scale from very unpleasant to very pleasant. This change provided a basis for structural fit with the explicit measures, as will be described in more detail below. As in previous research using the AMP, participants were instructed to evaluate the Chinese pictograph and to avoid being influenced by the primes.
Equating Implicit and Explicit Test Structures

The design described so far is similar to many previous studies assessing implicit–explicit correlations. Although the response metric for the implicit and explicit tasks is the same, the implicit and explicit tasks differ in the kinds of stimuli presented and the processes in which participants must engage. That is, in the questionnaires, participants were asked to endorse or reject complex verbal propositions about racial groups. In the AMP, participants were asked to evaluate the pleasantness of Chinese pictographs after being primed with the faces of White and Black individuals.

To equate these features, a second version of the AMP was included. In this version, participants were shown the same prime–target sequences as in the original AMP. However, rather than being told to avoid influence from the prime and evaluate the pictograph, participants were told to avoid the pictograph and evaluate the prime. Figure 1 illustrates the procedure.

[Figure 1. Schematic illustration of affect misattribution procedure (AMP) with indirect and direct ratings.]

We use indirect evaluation to refer to the original version of the AMP in which participants evaluate pictographs, because this task provides an indirect measure of reactions toward the primes. In contrast, we use direct evaluation to refer to the alternative version, in which participants directly express their evaluations of the primes. The advantage of comparing indirect and direct evaluations in this way is that the tasks are equated on many extraneous factors. In the indirect task, participants intend not to display any evaluation of the prime objects. In the direct task, they intend to express evaluations of those same objects. The tasks are equated on the stimuli presented and the type of judgment to be made. They differ only in intent.

By arranging tasks in this way, one can compare the self-report questionnaires with indirect AMP evaluations, which differ both in "implicitness" and in structure. The questionnaires can also be compared with direct AMP evaluations, which differ in structure but are both explicit measures. Finally, we can compare indirect and direct AMP evaluations, which differ in implicitness but are structurally matched. The structural-fit hypothesis predicts that measures with the most similar structures will show the greatest correlations.

Method

Participants

Participants were seventy-five undergraduates (62 women and 13 men) who participated for partial course credit. They ranged in age from 17 to 21 years (M = 18.46, SD = 0.72). Ethnic groups included 72% White, 17% African American, 2.5% Asian, 7% Hispanic, and 1.5% Native American.

Procedure

Participants were seated at a computer and asked to complete several measures. Of interest were indirect AMP evaluations, direct AMP evaluations, and racial-attitude questionnaires. Indirect and direct evaluations were completed in a counterbalanced order, followed by the questionnaires. Participants next provided demographic information and were debriefed.

AMP

Indirect evaluations. For the indirect rating trials, participants were presented with one of three kinds of primes: a Black face, a
White face, or a gray square that served as a neutral prime. The
face primes were 12 Black men and 12 White men. The pictures
showed only the model’s face, with a neutral facial expression.
Based on pilot testing, the Black and White photos were matched
on attractiveness and were selected to be highly prototypical of
their respective racial category.
The prime appeared in the center of the screen for 100 ms,
followed by a blank screen for 100 ms and then a Chinese pictograph for 100 ms (see Figure 1). Following the pictograph, a
patterned mask of black and white “noise” appeared. At the bottom
of the screen was a 4-point rating scale that included -2 (very unpleasant), -1 (slightly unpleasant), +1 (slightly pleasant), and +2 (very pleasant). After participants provided their evaluation of the pictograph, the next trial began. A total of 72 randomly
ordered trials were presented, with 24 neutral, 24 Black, and 24
White primes paired with 72 unique Chinese pictographs. For each
participant, the computer paired a pictograph with a prime in a new
random order.
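To illustrate the trial structure just described, the following Python sketch builds one participant's randomly ordered trial list. The stimulus file names are placeholders of our own, not the authors' materials; the counts, timings, and fresh random prime-pictograph pairing follow the description above.

    import random

    # Timing constants from the procedure: 100-ms prime, 100-ms blank,
    # 100-ms pictograph, followed by a pattern mask and the rating scale.
    PRIME_MS = BLANK_MS = PICTOGRAPH_MS = 100
    RATING_SCALE = (-2, -1, 1, 2)  # very unpleasant ... very pleasant

    def build_indirect_trials(rng):
        """Return 72 randomly ordered indirect-rating trials: 24 Black-prime,
        24 White-prime, and 24 neutral trials, each paired with one of 72
        unique pictographs in a new random order for each participant."""
        primes = ([f"black_face_{i}.jpg" for i in range(12)] * 2  # 12 photos x 2
                  + [f"white_face_{i}.jpg" for i in range(12)] * 2
                  + ["gray_square.bmp"] * 24)
        pictographs = [f"pictograph_{i}.bmp" for i in range(72)]
        rng.shuffle(primes)
        rng.shuffle(pictographs)  # fresh prime-pictograph pairing per person
        return list(zip(primes, pictographs))

    trials = build_indirect_trials(random.Random())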
Participants were told that the task was about making judgments
while avoiding distraction. They were instructed to rate the pleasantness of the Chinese pictographs using the rating scale. Participants were warned to not let their rating of the pictographs be
influenced by the preceding photo. This warning was included to
ensure that AMP responses represented the effect of the prime,
despite participants’ attempts at correction, thereby serving as an
indication of the automatic influence of prime-invoked attitudes
(Payne, Cheng, et al., 2005). The instructions read as follows:
For this round of judgments you should rate the Chinese characters.
Please note that sometimes the photos flashed prior to the characters
can influence people’s ratings of the Chinese characters. Please try
your best not to be influenced by the photographs. Instead, please give
us an honest judgment of how pleasant or unpleasant is your
reaction to each Chinese character. Of course, there are no right or
wrong answers. Just report your “gut reaction.” [Emphasis was in the
original.]
Direct evaluations. The direct rating procedure was identical
to the indirect rating procedure with three exceptions. The first and
most important difference was that participants were instructed to
rate their evaluations of the prime photographs and to avoid being
influenced by the Chinese pictographs. Because the pictographs
were ambiguous and randomly paired with the primes, they could
not actually exert any systematic influence on ratings of the
primes. The instructions read as follows:
For this round of judgments you should rate the photos of people.
Please note that sometimes the Chinese characters flashed after the
photos can influence people’s ratings of the photos. Please try your
best not to be influenced by the characters. Instead, please give us an
honest judgment of how pleasant or unpleasant is your reaction to
each person’s photo. Of course, there are no right or wrong answers.
Just report your “gut reaction.”
The second difference was that no neutral primes were included,
because direct evaluations of a gray square would be uninformative. The third difference was that only 24 trials were included, one
trial for each unique prime photo. In the direct rating blocks, each
prime photo was rated only once for the same reason that each item
is only presented once on questionnaires. Because participants
were directly expressing their attitudes toward the attitude objects,
there was little need for repetitive judgments of the same items.
Self-Report Attitude Measures
Two self-report measures of racial attitudes were used: the MRS
(McConahay, 1983) and the ATB (Brigham, 1993). The MRS is a
7-item assessment of anti-Black attitudes and includes items such
as “Over the past few years, the government and news media have
shown more respect to blacks than they deserve." Responses were
made on a 9-point scale ranging from 1 (strongly disagree) to 9
(strongly agree). The ATB is a 20-item assessment that includes
items such as “Black and White people are inherently equal” and
“It is likely that Blacks will bring violence to neighborhoods when
they move in.” Responses were made on a 9-point scale ranging
from 1 (strongly disagree) to 9 (strongly agree).
Results
The key questions concerned whether the correlations between
implicit and explicit measures depended on structural fit between
measures. But before examining those correlations, we report
mean performance on the indirect and direct evaluations. The order
of tests did not produce any main effects or interactions and so will
not be discussed in the following analyses.
Indirect Evaluations
Pleasantness ratings were averaged for Black primes, White primes, and neutral primes. Before analysis, we recoded the responses from a -2 to +2 scale to a 1 to 4 scale to simplify analysis and presentation. We analyzed ratings using a repeated-measures analysis of variance (ANOVA). This analysis showed a significant effect of prime on pleasantness ratings, F(2, 148) = 9.30, p < .01. Ratings were highest for neutral primes (M = 2.81, SD = 0.40), followed by White primes (M = 2.70, SD = 0.31), and they were lowest for Black primes (M = 2.58, SD = 0.40). The contrast between Black and White primes was significant, F(1, 74) = 5.21, p < .05, as was each of the contrasts between neutral primes and both Black and White primes, Fs > 4.57, ps < .05. These analyses show more positive evaluations of the White primes than the Black primes on the indirect test.
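For readers who want to rerun this kind of analysis, here is a sketch of the repeated-measures ANOVA using statsmodels; the data frame below is synthetic (random numbers with plausible means), so it only illustrates the call, not the reported result.

    import numpy as np
    import pandas as pd
    from statsmodels.stats.anova import AnovaRM

    rng = np.random.default_rng(1)
    # Hypothetical long-format data: one mean pleasantness rating
    # (1-4 recoded scale) per participant per prime type.
    df = pd.DataFrame({
        "subject": np.repeat(np.arange(75), 3),
        "prime": np.tile(["neutral", "white", "black"], 75),
        "rating": rng.normal(loc=2.7, scale=0.4, size=75 * 3),
    })

    # With 75 participants and 3 prime levels, the prime effect is an
    # F(2, 148) test of the kind reported in the text.
    print(AnovaRM(df, depvar="rating", subject="subject",
                  within=["prime"]).fit())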
Direct Evaluations
We analyzed pleasantness ratings as above to compare direct evaluations of the Black and White faces. White faces were evaluated significantly more positively (M = 2.67, SD = 0.35) than Black faces (M = 2.34, SD = 0.40), F(1, 74) = 25.44, p < .01. Both direct and indirect measures showed similar preferences for White faces over Black faces at the mean level. But the main questions concerned how these evaluations related to each other and to the questionnaire measures of racial attitudes.
Individual Differences
Scoring. We computed a single score for each person's indirect evaluations by taking the difference between ratings on White-prime trials and ratings on Black-prime trials. Direct evaluations were scored in the same way. Higher scores reflected greater preference for White faces relative to Black faces. We scored the ATB and MRS scales by taking the mean of responses after reverse coding, where appropriate. The mean score for the ATB was 7.00 (SD = 1.22). The mean score for the MRS was 2.63 (SD = 1.25). Higher scores on the ATB reflect more positive attitudes toward Black people, whereas higher scores on the MRS reflect more negative attitudes toward Black people. For the purpose of the following analyses, the ATB scale was reverse-scored so that all measures were scored in the same direction, with higher scores reflecting more negative attitudes toward Black people.

We calculated reliability for indirect ratings by taking a difference score between pleasantness ratings on each White-prime trial and the rating on a randomly paired Black-prime trial. This produced 24 difference scores that were treated as items in a reliability analysis (for a fuller description, see Payne, Cheng, et al., 2005). Reliability for direct ratings was computed in the same way. Reliability was acceptable for all measures (indirect α = .69; direct α = .71; ATB α = .80; MRS α = .90).
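As a concrete companion to this scoring description, here is a minimal Python/NumPy sketch of the bias score and the random-pairing reliability estimate. The rating arrays are synthetic stand-ins; only the scoring logic follows the text.

    import numpy as np

    def bias_score(white, black):
        """AMP individual-difference score: mean rating on White-prime
        trials minus mean rating on Black-prime trials."""
        return np.mean(white, axis=-1) - np.mean(black, axis=-1)

    def cronbach_alpha(items):
        """Cronbach's alpha for an (n_participants, n_items) array."""
        items = np.asarray(items, dtype=float)
        k = items.shape[1]
        return (k / (k - 1)) * (1 - items.var(axis=0, ddof=1).sum()
                                / items.sum(axis=1).var(ddof=1))

    def paired_difference_items(white, black, rng):
        """Treat each White-prime rating minus a randomly paired
        Black-prime rating as an 'item': 24 items per participant."""
        return white - black[:, rng.permutation(black.shape[1])]

    # Hypothetical ratings on the recoded 1-4 scale: 75 participants x 24 trials.
    rng = np.random.default_rng(0)
    white = rng.integers(1, 5, size=(75, 24)).astype(float)
    black = rng.integers(1, 5, size=(75, 24)).astype(float)
    print(bias_score(white, black)[:5])
    print(cronbach_alpha(paired_difference_items(white, black, rng)))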
Correlations. We hypothesized that over and above the implicit–explicit distinction, the attitude tests with the greatest structural fit would show the strongest correlations. The two tests with the highest degree of fit were the two explicit questionnaires (ATB and MRS). These questionnaires were matched both in structural fit (i.e., they asked similar kinds of questions) and in the fact that both were explicit measures. Table 1 displays the correlations among all tests. As expected, the questionnaires were strongly correlated with each other. In contrast, the ATB and MRS were only weakly, though significantly, correlated with indirect evaluations. These findings are consistent with much previous research that has shown small implicit–explicit correlations for racial attitudes. The traditional way to interpret this finding is that implicit and explicit tests reflected different attitudes or different processes. That is, the reason for implicit–explicit divergence is said to lie in the difference between implicitness and explicitness. However, the questionnaires differed from the indirect ratings not only in implicitness, but also in structure (e.g., pictures vs. verbal statements, exemplars vs. groups, simple pleasantness judgments vs. opinions on broad policies, etc.). Because they shared neither implicit–explicitness nor structural features, the low correlations may result from either kind of difference.

The traditional interpretation would predict that direct evaluations should be highly correlated with the ATB and MRS because they are all explicit measures. However, as shown in Table 1, direct ratings were no more strongly correlated with these questionnaires than the indirect ratings were. We suggest these low correlations reflect the lack of structural fit between the direct ratings and the questionnaires. If this hypothesis is correct, then the direct and indirect tests should be strongly correlated, because despite differing in implicitness, they are equated in structure. In fact, the correlation between direct and indirect tests (r = .64) was nearly as large as the correlation between the ATB and MRS scales (r = .68), which were equated on both dimensions. These two correlations were not significantly different from each other. They were both, however, significantly greater than the other four correlations (all ps < .05). This pattern of correlations supports the proposal that structural fit is an important factor influencing the size of the correlation between implicit and explicit tests.

Table 1
Correlations Among Self-Report Scales of Racial Attitudes and Direct and Indirect AMP Ratings, Study 1

Measure        ATB               MRS               Direct AMP
ATB            --
MRS            .68*** (.54-.79)  --
Direct AMP     .25* (.02-.46)    .26* (.04-.46)    --
Indirect AMP   .25* (.02-.46)    .24* (.01-.45)    .64*** (.49-.77)

Note. In parentheses are 95% confidence intervals for each correlation. AMP = affect misattribution procedure; ATB = Attitudes Toward Blacks Scale; MRS = Modern Racism Scale.
* p < .05. *** p < .001.
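The article does not say how the confidence intervals in Table 1 were computed; the sketch below assumes the standard Fisher r-to-z method, which approximately reproduces the tabled values (e.g., r = .64 with N = 75 gives roughly .48 to .76, close to the tabled .49-.77).

    import numpy as np
    from scipy import stats

    def r_confidence_interval(r, n, level=0.95):
        """Confidence interval for a Pearson correlation via Fisher r-to-z."""
        z = np.arctanh(r)                 # Fisher transform of r
        se = 1.0 / np.sqrt(n - 3)         # standard error of z
        crit = stats.norm.ppf(1 - (1 - level) / 2)
        return np.tanh(z - crit * se), np.tanh(z + crit * se)

    print(r_confidence_interval(0.64, 75))  # approx. (0.48, 0.76)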
Discussion
Some aspects of these results replicated commonly observed
findings. The two explicit questionnaires were highly correlated
with each other. They were also quite weakly correlated with an
indirect test of race attitudes. Both of these facts can be explained
by two accounts. The traditional account is that the two questionnaires were strongly related because both tests explicitly asked
participants to report their attitudes. In contrast, the questionnaires
were not strongly related to the indirect test because the questionnaires are explicit measures, whereas the indirect test is an implicit
measure.
The structural-fit account can also explain these findings. By
that account, the questionnaires were strongly related to each other
because they were well-matched on measurement features. For
example, they both asked participants to endorse or reject propositions about Black people as a group in American society. In
contrast, the questionnaires were weakly related to the indirect test
because they differed in measurement features. Unlike the questionnaires, the indirect test asked participants to make simple
feeling-based judgments of pleasantness. And the primes were
faces of individuals, not social-group labels. So both a traditional
account based on implicitness and a structural-fit account can
explain the strong correlation between questionnaires and the weak
correlation between the questionnaires and the indirect rating.
We must account for two other cells in our design. First is the correlation between the questionnaires and the direct test. If the implicit–explicit distinction were the only factor at work, then we would expect this correlation to be high because both are explicit tests. However, these weak correlations are more consistent with a structural-fit explanation. Although these tests are all explicit measures of race attitudes, they differ in their measurement features in much the same ways that the questionnaires differ from the indirect test. That is, rating feelings toward pictures of individuals is quite different from expressing attitudes toward social policies regarding racial groups. Finally, the strong correlation between indirect and direct tests presents a puzzle for any account based only on the implicit–explicit distinction. This correlation is, however, consistent with the structural-fit account. These results suggest that when implicit and explicit tests are equated on extraneous measurement features, they may be much more highly correlated than previously thought.
Some Notes on Scaling
Despite using the same response scale, indirect and direct AMP
ratings tend to differ in extremity. One reason is that on indirect ratings, participants rate Chinese pictographs, which are selected
to be fairly neutral. As a result, indirect ratings will tend toward
neutral values at the mean level. Participants are also instructed to
avoid any influence of the primes. Although participants cannot do
so completely (Payne, Cheng, et al., 2005), partial success will
tend to reduce the magnitude of differences between Black- and
White-prime trials. These features make the task a conservative
test of racial bias and also increase confidence that any race bias
that persists is beyond voluntary control. At the same time, these
considerations mean that a 1.5 indirect rating (of a pictograph)
does not necessarily mean the same as a 1.5 direct rating (of a
human face). For this reason, it would be inappropriate to directly
compare the mean levels across the two tasks. The value of making
the task structures more matched is not found in simply comparing
raw numbers across tasks. Instead, the value is in encouraging
similar cognitive processes on the two tasks, except for the systematic differences of interest. Individual-difference correlations
are more informative for testing our hypotheses than are the
absolute mean levels. (The two tasks can, of course, be standardized, which removes any meaningful mean difference and does not
change the correlations). The correlations observed here confirm
that the individual differences were systematic and interpretable.
Broadening the Conclusion

In Study 1, we took a first step toward exploring the importance of structural correspondence. But there are some idiosyncratic aspects of the measures that need to be broadened. For example, all of the explicit measures were verbal, whereas the implicit measure was based on pictures. This difference was chosen as a way to manipulate structural correspondence, but it is important to show that the strong implicit–explicit correlation observed is not limited to pictorial methods. Our structural-fit analysis leads us to predict that when verbal group labels such as "Black" and "White" are used as primes, both indirect and direct ratings should be correlated with other measures that ask participants to evaluate similar verbal categories. In Study 2, we aimed to replicate these findings and extend them by manipulating structural fit across a wider range of items to be evaluated, including group labels.

Study 2

Study 2 was designed to incorporate one of the most popular ways to measure attitudes: the "feeling thermometer." In this simple method, participants are asked to rate their feelings toward, say, African Americans, on a scale ranging from very cold and unfavorable to very warm and favorable. If the same verbal labels and rating scale are used for direct ratings using the AMP, we have essentially re-created a feeling thermometer. And if the same labels and scales are used for indirect AMP ratings, we have an implicit feeling thermometer. The two measures can be matched on all the relevant features as already described, and they differ only in the intent to express an evaluation of the primes. In this study, we manipulated structural fit by comparing evaluations on the basis of (a) faces, (b) group labels, (c) the ATB and MRS scales, and (d) a traditional feeling thermometer. Our hypothesis was that more closely matched test structures would reveal larger implicit–explicit correlations.

Method

Participants

Participants were forty-eight undergraduates (28 women and 20 men) who participated for partial course credit. They ranged in age from 17 to 21 years (M = 18.71, SD = 0.87). Ethnic groups included 77% Caucasian, 15% African American, 6% Asian, and 2% Hispanic.

Design and Procedure

Participants completed the AMP in four blocks under the same instructions as described in Study 1. The four blocks (indirect/pictures, indirect/group labels, direct/pictures, direct/group labels) were counterbalanced for order (no order effects emerged). On indirect blocks, participants were warned that the primes might influence their judgments and that they should try their best to avoid any such influence. On the direct blocks, they were warned that the pictographs might influence their judgments and that they should try their best to avoid that influence. After the AMP procedure, participants completed self-report measures of racial attitudes, provided demographic information, and were debriefed.

AMP

Structure: Face versus word primes. For some blocks of trials, participants were shown photographs of Black and White young men prior to the pictographs (neutral primes were not used in this study, because they were not informative for individual differences). The photographs were the same as those used in Study 1. For the other blocks, participants were primed with words rather than photographs. Specifically, they were shown three White group labels (European Americans, White Americans, and Whites) and three Black group labels (African Americans, Black Americans, and Blacks), one at a time, followed by a pictograph.

Direct versus indirect evaluations. For the indirect trials, participants evaluated the Chinese pictographs, ignoring their feelings toward the primes. There were 48 indirect trials for the face primes and 48 indirect trials for the word primes. For the direct trials, participants rated the primes instead. Thus, in the face-prime condition, participants rated their feelings toward the photographs, ignoring their feelings toward the pictographs. Participants rated each photo once, for a total of 24 direct/picture trials. In the word-prime condition, participants rated their feelings toward the verbal group labels, ignoring the pictographs. They rated each group label once, for a total of six direct/verbal trials.

To ensure that participants could follow instructions without becoming confused about what they were supposed to rate, we presented a prompt on every trial that reminded them what to evaluate. On indirect blocks, the instruction "Rate feelings toward Chinese character" appeared just above the 4-point rating scale on each trial. On direct blocks, the phrase "Rate feelings toward photo of person" or "Rate feelings toward social group" appeared.

Self-Report Racial-Attitude Measures

Participants completed the MRS (McConahay, 1983) and the ATB (Brigham, 1993). A traditional feeling thermometer scale was also used, in which participants rated their feelings toward four racial groups, including White Americans, Asian Americans, Black Americans, and Hispanic Americans. Ratings were made on a 9-point scale ranging from 1 (very cold and unfavorable) to 9 (very warm and favorable). Feelings toward Black people and White people were of primary interest.
Operational Definition of Structural Fit
Structural fit in this study was manipulated by the similarity of the attitude objects to be evaluated. For the purpose of analyzing how implicit–explicit correlations change depending on structural fit, rank orders were assigned (a priori) to the similarity between each implicit–explicit pair. The rank orders were different for the two implicit tests because of their different structures. For the indirect/picture task, the most similar explicit task was the direct/picture task (rank = 1), followed by the direct/group-label task (rank = 2) and the thermometer (rank = 3); both ATB and MRS were assigned a tied ranking (rank = 4). The latter two were assigned a tie because there is little a priori reason for distinguishing between them in similarity to the AMP tasks. Task similarity ranged, then, from pictures of group members to verbal labels for the groups to more abstract policy-focused scales. For the indirect/group-labels task, the most similar explicit task was the direct/group-labels task (rank = 1), followed by the thermometer (rank = 2), both ATB and MRS scales (tied at rank = 3), and the direct/picture task (rank = 4). Task similarity in this case ranged from highly similar group labels (in the direct/group-labels task and traditional feeling thermometer) to policy-focused verbal questions concerning entire groups to pictures of individual group members. The rank ordering of similarity provided a way to test whether the implicit–explicit correlation depended on structural fit. The question was not whether any particular pair of correlations differed from each other, but instead whether implicit–explicit correlations showed a general trend to increase as structural fit increased.
Results
Because our main hypotheses concerned individual-difference correlations, we summarize the mean ratings briefly and then focus in more depth on individual differences. As in Study 1, ratings were recoded to a 1-4 scale.

Indirect Evaluations

This sample did not show a significant mean difference in indirect responses to Black versus White primes, either for pictures or for verbal labels. There was no significant difference between ratings of Chinese pictographs when primed with Black faces (M = 2.58, SD = 0.41) versus White faces (M = 2.55, SD = 0.34), F(1, 47) = 0.32, p = .58. Nor was there a significant difference when primed with Black verbal labels (M = 2.75, SD = 0.53) versus White verbal labels (M = 2.66, SD = 0.44), F(1, 47) = 1.06, p = .31.
Direct Evaluations
A similar pattern emerged for direct ratings, with ratings of Black faces (M = 2.35, SD = 0.46) slightly but nonsignificantly lower than ratings of White faces (M = 2.51, SD = 0.44), F(1, 47) = 3.79, p = .06. Direct ratings of the group labels were similar for Blacks (M = 3.01, SD = 0.68) and Whites (M = 3.16, SD = 0.63), F(1, 47) = 1.05, p = .31. Finally, the traditional feeling thermometer showed no difference between feelings toward Blacks (M = 6.50, SD = 1.73) versus Whites (M = 6.70, SD = 1.86), F(1, 47) = 0.64, p = .43. At the mean level, there was little or no race bias on indirect ratings, direct ratings, or the traditional feeling thermometer. The difference in mean levels of bias compared with Study 1 is likely a consequence of sampling error combined with the smaller sample size in Study 2. Completing multiple race-related tasks may also have reduced bias by making race a salient topic or via practice effects. Nonetheless, mean levels are not of interest for our theory-driven hypotheses concerning individual differences, to which we turn next.
Individual Differences
Implicit–explicit correlations. Individual scores were computed for each measure as described in Study 1. All measures showed good reliability as estimated with Cronbach's alpha (ATB = .92, MRS = .90, indirect/verbal = .84, indirect/picture = .71, direct/verbal = .80, direct/picture = .74). Even though this sample evaluated Whites and Blacks about equally on average, the reliability estimates suggest that individual differences were systematic and reliable.

We first consider implicit–explicit correlations concerning the indirect/picture task, shown in Table 2. Rather than reporting all possible pairwise comparisons between correlation coefficients, we report the 95% confidence interval for each correlation in Table 2 so that differences between any pair of correlations can be inspected. The correlations ranged from .21 to .48. As shown by the confidence intervals, the highest correlation was significantly different from the lowest correlation, but the intermediate correlations were not significantly different from each other. A similar pattern emerged for implicit–explicit correlations concerning the indirect/group-labels task, shown in Table 3. The most similar tasks showed a correlation of .65, in contrast to .39 for the least similar pair. The highest coefficient was significantly different from the lowest two coefficients, as shown by the confidence intervals.
Table 2
Implicit–Explicit Correlations Between the Indirect/Picture Test and Each Explicit Test, in Rank Order of Structural Fit, Study 2

Fit (rank)  Test                 Indirect/picture
1           Direct/picture       .48*** (.23-.68)
2           Direct/group labels  .39** (.12-.61)
3           Thermometer          .42** (.16-.64)
4           ATB                  .35* (.07-.58)
4           MRS                  .21 (-.08-.47)

Note. In parentheses are 95% confidence intervals for each correlation. Thermometer = feeling thermometer; ATB = Attitudes Toward Blacks Scale; MRS = Modern Racism Scale.
* p < .05. ** p < .01. *** p < .001.

Table 3
Implicit–Explicit Correlations Between the Indirect/Labels Test and Each Explicit Test, in Rank Order of Structural Fit, Study 2

Fit (rank)  Test                 Indirect/group labels
1           Direct/group labels  .65*** (.45-.79)
2           Thermometer          .48*** (.23-.68)
3           ATB                  .46*** (.21-.67)
3           MRS                  .38** (.11-.60)
4           Direct/pictures      .39** (.12-.61)

Note. In parentheses are 95% confidence intervals for each correlation. Thermometer = feeling thermometer; ATB = Attitudes Toward Blacks Scale; MRS = Modern Racism Scale.
** p < .01. *** p < .001.

The main question in this study was not about the difference between any specific pair of correlations but, rather, the larger trend: Do implicit–explicit correlations increase as structural fit increases? To answer this question, we reverse scored the rank orders so that higher values represent greater structural fit. We then plotted the size of the implicit–explicit correlation (including both indirect tests) against the degree of structural fit. The results are shown in Figure 2. Each point on this scatter plot represents one of the implicit–explicit correlations in Tables 2 and 3. We tested the correlation between structural fit and the size of implicit–explicit correlations using Spearman's rank-order coefficient, which showed a very strong relationship, rs(10) = .90, p < .001. The degree of relationship between implicit and explicit tests was tightly linked to their structural fit.
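For concreteness, the trend test can be recomputed with SciPy's spearmanr from the values in Tables 2 and 3. Note that spearmanr assigns average ranks to ties, so the coefficient it returns may differ somewhat from the published rs = .90 depending on how ties were handled in the original analysis.

    from scipy.stats import spearmanr

    # Structural-fit ranks (1 = most similar) and the corresponding
    # implicit-explicit correlations transcribed from Tables 2 and 3.
    fit_rank = [1, 2, 3, 4, 4,   # indirect/picture pairings (Table 2)
                1, 2, 3, 3, 4]   # indirect/group-labels pairings (Table 3)
    iec = [.48, .39, .42, .35, .21,
           .65, .48, .46, .38, .39]

    # Reverse score so that higher values mean greater structural fit.
    fit = [max(fit_rank) + 1 - r for r in fit_rank]

    rho, p = spearmanr(fit, iec)
    print(f"rs = {rho:.2f}, p = {p:.4f}")  # positive: correlations rise with fit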
Other correlations. The correlation between the two indirect AMP tests was r = .50, p < .001. This correlation is higher than is often seen when comparing different implicit measures, but it is similar to that reported when latent-variable analysis was used to correct for random measurement error (Cunningham et al., 2001). Table 4 shows the correlations among explicit tests. Although the effects of structural fit theoretically apply to explicit–explicit correlations also, it is difficult to know how to clearly rank their structural similarities. These tests were selected so that they clearly varied in similarity to the implicit tests, but they were not selected to be clearly ranked in similarity to each other. In general, measures that were highly similar, such as the ATB and MRS, were highly correlated. The correlation between the thermometer and the direct/group-labels task was also high. The direct/picture task tended to have lower correlations with the verbal measures, which were less similar. Because it is difficult to rank many of the other pairs, however, we remain cautious about drawing conclusions about interrelations among the explicit tests.
Discussion
In Study 2, we tested the hypothesis that implicit and explicit measures of racial attitudes can be highly correlated when they are equated on structural features. The study, in which we used several measurement techniques, supported the hypothesis. The relation between implicit and explicit measures steadily increased as structural similarity increased. Phrased another way, comparing implicit and explicit measures that differed in irrelevant features undermined their correlation. Had we looked only at an implicit measure and an explicit measure that differed on many structural features (as is typically done), we would have wrongly concluded that the underlying attitudes were only weakly related.

The manipulation of the items to be evaluated is, of course, only one of many possible ways that tests may differ. Commonly used implicit and explicit tests often differ in several ways at once. The correlation between implicit and explicit tests is important because it has been a key piece of evidence for theories about the nature of implicit evaluation. The current findings shed new light on that evidence by suggesting that comparisons on the basis of these tests may severely underestimate the relationship between underlying implicit and explicit evaluations.

Our argument that poor structural fit underestimates implicit–explicit correlations can be seen as an instance of unshared method variance. On the one hand, unshared method variance can cause true correlations to be underestimated (Bagozzi & Yi, 1991; Campbell & Fiske, 1959; Podsakoff et al., 2003). Our proposal to equate tests on structural features can help reduce that problem. On the other hand, creating tests with structural fit also increases shared method variance. Shared method variance, in turn, can potentially inflate correlations between measures. Are well-equated test structures a cause for concern?
There is reason to think that the risks of underestimating the implicit–explicit correlation outweigh the risks of overestimating it. One reason is that comparing implicit and explicit tests can be thought of just as any other within-subjects experimental design. The ability to draw conclusions about a manipulated variable rests on the assumption that other variables do not also differ between conditions. Holding such extraneous factors constant across experimental conditions is rarely considered a threat to validity in other experimental research. Instead, it represents good experimental control.
A second reason is that the methods used here control for some
of the common sources of method variance that may inflate correlations. One common source of method variance is a general
response bias. For example, some participants in our studies may
simply like Chinese pictographs more than others. Or some people
may simply use the higher range of a response scale, whereas
[Figure 2. Scatter plot of implicit–explicit correlation size (y-axis: Implicit-Explicit Correlation, .10 to .70) against degree of structural fit; the annotation reads r = .90.]