# Preview
Is there a relationship between your IQ and the wealth of your parents? Between your
computer skills and your GPA? Between your anxiety level and your perceived social
attractiveness? Answers to these questions require us to describe the relationship
between pairs of variables. The original data must consist of actual pairs of
observations, such as IQ scores and parents’ wealth for each member of the freshman
class. Two variables are related if pairs of scores show an orderliness that can be
depicted graphically with a scatterplot and numerically with a correlation coefficient.

An investigator suspects that a relationship exists between the number of greeting cards sent and the number of greeting cards received by individuals. Before conducting a full-fledged survey, and before any statistical analysis based on variability, the investigator obtains estimates for the most recent holiday season from five friends, as shown in Table 6.1. (The data in Table 6.1 represent a very simple observational study with two dependent variables, since the numbers of cards sent and received are not under the investigator’s control.)<br>
![image.png](attachment:86826303-a245-402d-ac27-b05b9ddcc7fa.png)

## An Intuitive Approach
If the suspected relationship does exist between cards sent and cards received, then an inspection of the data might reveal, as one possibility, a tendency for “big senders” to be “big receivers” and for “small senders” to be “small receivers.” More generally, there is a tendency for pairs of scores to occupy similar relative positions in their respective distributions.<br>
### Positive Relationship
Trends among pairs of scores can be detected most easily by constructing a list of paired scores in which the scores along one variable are arranged from largest to smallest. In panel A of Table 6.2, the five pairs of scores are arranged from the largest (13)
to the smallest (1) number of cards sent. This table reveals a pronounced tendency for
pairs of scores to occupy similar relative positions in their respective distributions.
For example, John sent relatively few cards (1) and received relatively few cards (6),
whereas Doris sent relatively many cards (13) and received relatively many cards (14).
We can conclude, therefore, that the two variables are related. Furthermore, this relationship implies that “You get what you give.”<br>
#### Insofar as relatively low values are paired with relatively low values, and relatively high values are paired with relatively high values, the relationship is positive.
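
The arrangement used for panel A of Table 6.2 can be sketched in Python. This is a minimal illustration, not part of the original text: the pairs for Mike (7, 12), John (1, 6), and Doris (13, 14) come from the text, while Andrea's (5, 10) and Steve's (9, 18) pairs are assumed stand-ins for Table 6.1, which appears here only as an image. The helper `sort_by_cards_sent` is hypothetical.

```python
# Greeting-card pairs as (name, cards sent, cards received).
# Mike, John, and Doris are stated in the text; Andrea's and Steve's pairs
# are ASSUMED stand-ins for Table 6.1, which appears only as an image.
cards = [
    ("Andrea", 5, 10),
    ("Mike", 7, 12),
    ("Doris", 13, 14),
    ("Steve", 9, 18),
    ("John", 1, 6),
]


def sort_by_cards_sent(pairs):
    """Hypothetical helper: order pairs from the largest to the smallest number sent."""
    return sorted(pairs, key=lambda row: row[1], reverse=True)


print(f"{'Name':<8}{'Sent':>5}{'Received':>10}")
for name, sent, received in sort_by_cards_sent(cards):
    print(f"{name:<8}{sent:>5}{received:>10}")
```

Sorting on one variable only is enough: the trend shows up as a matching (or opposite) ordering in the unsorted column.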

### Negative Relationship
Notice the pattern among the pairs in panel B. Now there is a pronounced tendency for pairs of scores to occupy dissimilar and opposite relative positions in their
respective distributions. For example, although John sent relatively few cards (1), he
received relatively many (18). From this pattern, we can conclude that the two variables are related. Furthermore, this relationship implies that “You get the opposite of what you give.”<br>
#### Insofar as relatively low values are paired with relatively high values, and relatively high values are paired with relatively low values, the relationship is negative.

### Little or No Relationship
No regularity is apparent among the pairs of scores in panel C. For instance, although
both Andrea and John sent relatively few cards (5 and 1, respectively), Andrea received
relatively few cards (6) and John received relatively many cards (14). Given this lack
of regularity, we can conclude that little, if any, relationship exists between the two
variables and that “What you get has no bearing on what you give.”<br>

![image.png](attachment:e019a6c3-8c19-4ecd-909a-54646b5a8014.png)

### Two variables are positively related if pairs of scores tend to occupy similar relative positions (high with high and low with low) in their respective distributions, and they are negatively related if pairs of scores tend to occupy dissimilar relative positions (high with low and vice versa) in their respective distributions.
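
One rough way to check "similar relative positions" without a graph, offered here as a heuristic rather than the book's method, is to compare every pair of people. A pair is concordant when the person who sent more cards also received more, and discordant when that person received fewer. Mostly concordant pairs point to a positive relationship, mostly discordant pairs to a negative one, and a near balance to little or no relationship. (This counting idea underlies Kendall's tau, a rank-based coefficient not covered here.) The received scores below are assumed Table 6.1 values, and the reversed list is a hypothetical negative pattern, not the actual panel B.

```python
from itertools import combinations


def count_concordant_discordant(xs, ys):
    """Compare every pair of people.

    Concordant: the person with the larger X also has the larger Y.
    Discordant: the person with the larger X has the smaller Y.
    Pairs tied on X or on Y are counted as neither.
    """
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        product = (x1 - x2) * (y1 - y2)
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1
    return concordant, discordant


def describe_direction(xs, ys):
    """Rough label based on which kind of pair dominates."""
    concordant, discordant = count_concordant_discordant(xs, ys)
    if concordant > discordant:
        return "positive"
    if discordant > concordant:
        return "negative"
    return "little or no"


sent = [13, 9, 7, 5, 1]             # arranged as in panel A
received = [14, 18, 12, 10, 6]      # ASSUMED Table 6.1 values for the same people
reversed_received = received[::-1]  # hypothetical reversed pattern

print(count_concordant_discordant(sent, received), describe_direction(sent, received))
print(count_concordant_discordant(sent, reversed_received), describe_direction(sent, reversed_received))
```

With these values, 9 of the 10 pairs are concordant, which matches the "big senders are big receivers" reading; reversing the received column gives 1 concordant and 9 discordant pairs.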

## Scatterplots
### A scatterplot is a graph containing a cluster of dots that represents all pairs of scores.
## Construction
To construct a scatterplot, as in Figure 6.1, scale each of the two variables along the
horizontal (X) and vertical (Y) axes, and use each pair of scores to locate a dot within
the scatterplot. For example, the pair of numbers for Mike, 7 and 12, defines points
along the X and Y axes, respectively. Using these points to anchor lines perpendicular
(at right angles) to each axis, locate Mike’s dot where the two lines intersect. Repeat
this process, with imaginary lines, for each of the four remaining pairs of scores to create the scatterplot of Figure 6.1.<br>
![image.png](attachment:f9ee7590-2d1d-4d42-9fdb-eb65d5e97e37.png)
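
The same construction can be mimicked with a text-only scatterplot in Python. The sketch below is illustrative and assumes integer scores (one column per unit of X, one row per unit of Y). The function name `ascii_scatterplot` is hypothetical, and Andrea's and Steve's pairs are assumed Table 6.1 values; only Mike's (7, 12), John's (1, 6), and Doris's (13, 14) pairs are stated in the text.

```python
def ascii_scatterplot(xs, ys, marker="*"):
    """Hypothetical helper: draw one marker per (X, Y) pair on a character grid.

    Assumes integer scores. Each column is one unit of X (increasing to the
    right) and each row is one unit of Y (increasing upward).
    """
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    grid = [[" "] * (x_max - x_min + 1) for _ in range(y_max - y_min + 1)]
    for x, y in zip(xs, ys):
        # Place the dot where the X position and the Y position intersect;
        # row 0 holds the largest Y, column 0 holds the smallest X.
        grid[y_max - y][x - x_min] = marker
    return ["".join(row) for row in grid]


# Andrea, Mike, Doris, Steve, John; Andrea's and Steve's pairs are ASSUMED values.
sent = [5, 7, 13, 9, 1]         # X axis
received = [10, 12, 14, 18, 6]  # Y axis

rows = ascii_scatterplot(sent, received)
for y_value, line in zip(range(max(received), min(received) - 1, -1), rows):
    print(f"{y_value:>3} |{line}")
print("    +" + "-" * (max(sent) - min(sent) + 1))
```

Each printed row is labeled with its Y value; reading the dots from left to right shows the lower-left to upper-right tilt discussed next.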

## Positive, Negative, or Little or No Relationship?
The first step is to note the tilt or slope, if any, of a dot cluster. A dot cluster that
has a slope from the lower left to the upper right, as in panel A of Figure 6.2, reflects
a positive relationship. Small values of one variable are paired with small values of the
other variable, and large values are paired with large values. In panel A, short people
tend to be light, and tall people tend to be heavy.<br>
On the other hand, a dot cluster that has a slope from the upper left to the lower
right, as in panel B of Figure 6.2, reflects a negative relationship. Small values of
one variable tend to be paired with large values of the other variable, and vice versa.
In panel B, people who have smoked heavily for few years or not at all tend to have
longer lives, and people who have smoked heavily for many years tend to have shorter
lives.<br>
Finally, a dot cluster that lacks any apparent slope, as in panel C of Figure 6.2,
reflects little or no relationship. Small values of one variable are just as likely to be
paired with small, medium, or large values of the other variable. In panel C, notice that
the dots are strewn about in an irregular shotgun fashion, suggesting that there is little
or no relationship between the height of young adults and their life expectancies.<br>
![image.png](attachment:03d900cd-a86d-45df-bdcf-c8a60e4bcc83.png)

## Strong or Weak Relationship?
Having established that a relationship is either positive or negative, note how closely the dot cluster approximates a straight line. 
### The more closely the dot cluster approximates a straight line, the stronger (the more regular) the relationship will be.
Figure 6.3
shows a series of scatterplots, each representing a different positive relationship between
IQ scores for pairs of people whose backgrounds reflect different degrees of genetic
overlap, ranging from minimum overlap between foster parents and foster children
to maximum overlap between identical twins. (Ignore the parenthetical expressions involving r, to be discussed later.) Notice that the dot cluster more closely approximates
a straight line for people with greater degrees of genetic overlap—for parents and children in panel B of Figure 6.3 and even more so for identical twins in panel C.
## Perfect Relationship
A dot cluster that equals (rather than merely approximates) a straight line reflects a
perfect relationship between two variables. In practice, perfect relationships are most
unlikely.

## Curvilinear Relationship
The previous discussion assumes that a dot cluster approximates a straight line and,
therefore, reflects a linear relationship. But this is not always the case. Sometimes a
dot cluster approximates a bent or curved line, as in Figure 6.4, and therefore reflects
a curvilinear relationship. Descriptions of these relationships are more complex than
those of linear relationships. For instance, we see in Figure 6.4 that physical strength,
as measured by the force of a person’s handgrip, is less for children, more for adults,
and then less again for older people. Otherwise, the scatterplot can be interpreted as
before—that is, the more closely the dot cluster approximates a curved line, the stronger the curvilinear relationship will be.<br>
![image.png](attachment:41bc4bfa-61f6-486e-9290-c47985fb7995.png) <br>
![image.png](attachment:7048d13e-11ef-47e1-9877-e962d05222cd.png)

# A Correlation Coefficient For Quantitative Data: r
### A correlation coefficient is a number between –1 and 1 that describes the relationship between pairs of variables.
In the next few sections we concentrate on the type of correlation coefficient,
designated as r, <b><i><u>that describes the linear relationship between pairs of variables for
quantitative data.</u></i></b> Many other types of correlation coefficients have been introduced to
handle specific types of data, including ranked and qualitative data.
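
Because r describes only linear relationships, a relationship can be perfect yet produce an r near 0 if it is curvilinear. The Python sketch below uses made-up values: one set lies exactly on a straight line, and the other lies exactly on an inverted-U curve (loosely echoing the handgrip pattern of Figure 6.4, but not its data). `pearson_r` is a hypothetical helper that computes r from deviation scores.

```python
import math


def pearson_r(xs, ys):
    """Hypothetical helper: r = SP / sqrt(SSx * SSy), using deviation scores."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sp = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return sp / math.sqrt(ss_x * ss_y)


xs = [-3, -2, -1, 0, 1, 2, 3]
straight = [2 * x + 1 for x in xs]     # every point on a straight line
inverted_u = [9 - x ** 2 for x in xs]  # every point on a curve that rises, then falls

print(pearson_r(xs, straight))    # 1.0
print(pearson_r(xs, inverted_u))  # 0.0
```

The second result does not mean the variables are unrelated; it means the relationship is not linear, which is why the scatterplot should be inspected before r is interpreted.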
## Key Properties of r
Named in honor of the British scientist Karl Pearson, the Pearson correlation coefficient, r, can equal any value between –1.00 and +1.00. Furthermore, the following two properties apply:
1. The sign of r indicates the type of linear relationship, whether positive or negative.
2. The numerical value of r, without regard to sign, indicates the strength of the linear relationship.
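
The two properties can be illustrated in Python by reversing the order of one variable. In this sketch the received scores are assumed Table 6.1 values and `pearson_r` is a hypothetical helper. Subtracting each received score from a constant reverses its relative position without changing its spread, so the sign of r flips while its absolute value stays the same.

```python
import math


def pearson_r(xs, ys):
    """Hypothetical helper: r = SP / sqrt(SSx * SSy), using deviation scores."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sp = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return sp / math.sqrt(ss_x * ss_y)


# Panel A order; the received scores are ASSUMED Table 6.1 values.
sent = [13, 9, 7, 5, 1]
received = [14, 18, 12, 10, 6]
# Hypothetical mirror image: 24 - Y turns high scores into low ones and vice versa.
mirrored = [24 - y for y in received]

r_original = pearson_r(sent, received)
r_mirrored = pearson_r(sent, mirrored)
print(r_original, r_mirrored)  # 0.8 -0.8
```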

## Interpretation of r
Located along a scale from –1.00 to +1.00, the value of r supplies information about the direction of a linear relationship, whether positive or negative, and, generally, about its relative strength: relatively weak (and a poor describer of the data) when r is in the vicinity of 0, or relatively strong (and a good describer of the data) when r deviates from 0 toward either +1.00 or –1.00.<br>

If, as is usually the case, we wish to generalize beyond the limited sample of actual paired scores, r can’t be interpreted at face value. Viewed as the product of chance sampling variability, the value of r must be evaluated with tools from inferential statistics to establish whether the relationship is real or merely transitory. This evaluation depends not only on the value of r but also on the actual number of pairs of scores used to calculate r. On the assumption that reasonably large numbers of pairs of scores are involved (preferably hundreds, and certainly many more than the five pairs in our purposely simple greeting card example), an r of .50 or more, in either the positive or the negative direction, would represent a very strong relationship in most areas of behavioral and educational research. But there are exceptions. An r of .80 or more would be expected when correlation coefficients measure “test reliability,” as determined, for example, from pairs of IQ scores for people who take the same IQ test twice or take two forms of the same test (to establish that any person’s two scores tend to be similar and, therefore, that the test scores are reproducible, or “reliable”).<br>
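
To see why the number of pairs matters, one common inferential tool (not developed in this section) converts r into a t statistic, $t = r\sqrt{n-2}/\sqrt{1-r^2}$, with $n - 2$ degrees of freedom. The Python sketch below, with a hypothetical helper `t_for_r`, evaluates the same r of .80 at several sample sizes.

```python
import math


def t_for_r(r, n):
    """Hypothetical helper: t = r * sqrt(n - 2) / sqrt(1 - r**2), df = n - 2.

    Used to test whether a sample r could plausibly come from a population
    with no linear relationship. Assumes |r| < 1 and n > 2.
    """
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)


for n in (5, 30, 100):
    print(n, round(t_for_r(0.80, n), 2))
```

The same r gives t of roughly 2.31 with 5 pairs but about 13.2 with 100 pairs. For comparison, the two-tailed .05 critical value of t with 3 degrees of freedom is 3.182, so an r of .80 based on only five pairs would not be judged statistically significant by this test.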

## r Is Independent of Units of Measurement

### The value of r can’t be interpreted as a proportion or percentage of some perfect relationship.
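
The heading's claim can be checked directly in Python: converting scores to different units multiplies each variable by a positive constant, which leaves r unchanged. The heights and weights below are hypothetical, and `pearson_r` is a hypothetical helper. Multiplying one variable by a negative constant would flip the sign of r without changing its size.

```python
import math


def pearson_r(xs, ys):
    """Hypothetical helper: r = SP / sqrt(SSx * SSy), using deviation scores."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sp = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return sp / math.sqrt(ss_x * ss_y)


# Hypothetical heights and weights for five people.
height_in = [62, 65, 68, 70, 74]
weight_lb = [120, 140, 150, 165, 190]

# The same measurements in metric units (1 in = 2.54 cm, 1 lb = 0.45359237 kg).
height_cm = [h * 2.54 for h in height_in]
weight_kg = [w * 0.45359237 for w in weight_lb]

r_imperial = pearson_r(height_in, weight_lb)
r_metric = pearson_r(height_cm, weight_kg)
# The two values agree apart from floating-point rounding.
print(round(r_imperial, 4), round(r_metric, 4))
```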


## Correlation Not Necessarily Cause-Effect
### A correlation coefficient, regardless of size, never provides information about whether an observed relationship reflects a simple cause-effect relationship or some more complex state of affairs.
<br>Given a correlation between the prevalence of poverty and crime in U.S. cities, you can speculate that poverty causes crime—that is, poverty produces crime with the same degree of inevitability as the flip of a light switch illuminates a room. According to this view, any widespread reduction in poverty should cause a corresponding decrease in crime. As suggested in Chapter 1, you can also speculate that a common cause such as inadequate education, overpopulation, racial discrimination, etc., or some combination of these factors produces both poverty and crime. According to this view, a widespread reduction in poverty should have no effect on crime. Which speculation is correct? Unfortunately, this issue cannot be resolved merely on the basis of an observed correlation.<br>
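
A small Python simulation can show how a common cause alone produces a correlation. The sketch is hypothetical: `common_cause`, `poverty_like`, and `crime_like` are made-up variables, not real poverty or crime measures, and `pearson_r` is a hypothetical helper. Neither outcome is computed from the other, yet because both contain the same common cause plus independent noise, their correlation in a large sample comes out close to the theoretical value of .50.

```python
import math
import random


def pearson_r(xs, ys):
    """Hypothetical helper: r = SP / sqrt(SSx * SSy), using deviation scores."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sp = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return sp / math.sqrt(ss_x * ss_y)


rng = random.Random(2024)  # fixed seed so the run is reproducible
n = 10_000

# Hypothetical common cause shared by both outcomes.
common_cause = [rng.gauss(0, 1) for _ in range(n)]
# Each outcome = common cause + its own independent noise; neither uses the other.
poverty_like = [c + rng.gauss(0, 1) for c in common_cause]
crime_like = [c + rng.gauss(0, 1) for c in common_cause]

print(round(pearson_r(poverty_like, crime_like), 2))  # near 0.5
```

In this simulation, changing `poverty_like` after the fact would leave `crime_like` untouched, even though the two are clearly correlated.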

# Computational Formula For r
Calculate a value for r by using the following computation formula:
### CORRELATION COEFFICIENT (COMPUTATION FORMULA) - 6.1
$$ r = \frac{SP_{xy}}{\sqrt{SS_xSS_y}}$$
where the two sum of squares terms in the denominator are defined as
$$ SS_x = \Sigma{(X - \bar{X})^2} = \Sigma{X^2} - \frac{(\Sigma{X})^2}{n} $$
$$ SS_y = \Sigma{(Y - \bar{Y})^2} = \Sigma{Y^2} - \frac{(\Sigma{Y})^2}{n} $$
and the sum of the products term in the numerator, $SP_{xy}$, is defined in Formula 6.2.<br>
### SUM OF PRODUCTS (DEFINITION AND COMPUTATION FORMULAS) - 6.2
$$ SP_{xy} = \Sigma{(X - \bar{X})(Y - \bar{Y})} = \Sigma{XY} - \frac{(\Sigma{X})(\Sigma{Y})}{n}$$<br>
In the case of $SP_{xy}$, instead of summing the squared deviation scores for either X or Y,
as with $SS_x$ and $SS_y$, we find the sum of the products for each pair of deviation scores.
Notice in Formula 6.1 that, since the terms in the denominator must be positive, only the
sum of the products, $SP_{xy}$, determines whether the value of r is positive or negative. Furthermore, for given values of $SS_x$ and $SS_y$, the size of $SP_{xy}$ mirrors the strength of the relationship; stronger relationships
are associated with larger positive or negative sums of products. Table 6.3 illustrates
the calculation of r for the original greeting card data by using the computation formula.

![image.png](attachment:2e24a1a2-81b1-4b2c-912a-935c57a1544d.png)
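
The calculation can be reproduced in Python as a check on Formulas 6.1 and 6.2. This sketch assumes that Table 6.1 lists Andrea (5, 10), Mike (7, 12), Doris (13, 14), Steve (9, 18), and John (1, 6); only Mike's, John's, and Doris's pairs are stated in the text, and the helper names are hypothetical. It computes the sums of squares and the sum of products with both the definition and the computation formulas, which should agree.

```python
import math

# ASSUMED Table 6.1 values (X = cards sent, Y = cards received):
# Andrea, Mike, Doris, Steve, John.
sent = [5, 7, 13, 9, 1]
received = [10, 12, 14, 18, 6]


def computation_formulas(xs, ys):
    """Hypothetical helper: SSx, SSy, and SPxy from raw-score sums."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    ss_x = sum(x * x for x in xs) - sum_x ** 2 / n
    ss_y = sum(y * y for y in ys) - sum_y ** 2 / n
    sp_xy = sum(x * y for x, y in zip(xs, ys)) - sum_x * sum_y / n
    return ss_x, ss_y, sp_xy


def definition_formulas(xs, ys):
    """Hypothetical helper: SSx, SSy, and SPxy from deviation scores."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    sp_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return ss_x, ss_y, sp_xy


ss_x, ss_y, sp_xy = computation_formulas(sent, received)
r = sp_xy / math.sqrt(ss_x * ss_y)  # Formula 6.1
print(f"SSx = {ss_x}, SSy = {ss_y}, SPxy = {sp_xy}, r = {r}")
# SSx = 80.0, SSy = 80.0, SPxy = 64.0, r = 0.8
print(definition_formulas(sent, received))  # (80.0, 80.0, 64.0)
```

If the assumed values match Table 6.1, this should reproduce the r shown in Table 6.3; the computation formulas avoid working with means and deviation scores, which is why they are convenient for hand calculation.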