# Crosstabs and Mean Comparisons

In previous labs, we covered describing our data, simple data cleaning, and variable creation in Stata. Now we move on to examining the relationship between two variables. We will focus on two simple methods: cross-tabulations (crosstabs) and mean comparisons. Both methods are appropriate when the independent variable is categorical or ordinal. Crosstabs are used when the dependent variable is also categorical or ordinal, while mean comparisons can be used when the dependent variable is continuous/interval or a dummy (dichotomous) variable. 

For this lab, we will use the July 2020 AP-NORC Poll, available from the Roper Center. See [the previous lab](https://nbviewer.jupyter.org/github/bowendc/labs/blob/master/lab_cleaningandcoding.ipynb) for instructions on downloading and accessing the data. 

First hypothesis: respondents exposed to the coronavirus are more likely to support closing bars and restaurants than are those who have not been exposed.

Second hypothesis: respondents who are more worried about coronavirus infection are more likely to say the country is headed in the wrong direction. 

Third hypothesis: respondents experiencing economic hardship are more likely to say the country is headed in the wrong direction. 


In [24]:
* Change the file path below to the appropriate working directory for your machine
quietly cd c:\Users\bowen\OneDrive\Courses\Political_Analysis\Labs\F2020\
quietly use 31117583.dta, clear 

In [25]:
* Recode the variables we'll use in the analysis, making sure to code
*   missing data as periods (.)
*   We can also specify value labels directly in the recode command if
*   we are creating a new variable using the "gen" option

codebook CUR1 
recode CUR1 (1=1 "Right direction")(2=0 "Wrong direction")(99=.), gen(rightdir)
codebook politics B2AB
codebook VIRUS2A
recode VIRUS2A (1=5 "Extremely Worried")(2=4)(3=3 "Somewhat worried")(4=2)(5=1 "Not at all worried")(99=.), gen(worried)
codebook VIRUS7A 
recode VIRUS7A (1=5 "Strongly favor")(2=4)(3=3 "Neither favor nor oppose")(4=2)(5=1 "Strongly Oppose")(99=.), gen(closebars)
recode VIRUS14 (1=1 "Yes")(2=0 "No")(99=.), gen(gotcorona)



--------------------------------------------------------------------------------
CUR1                                   CUR1: Generally speaking, would you say
                                       things in this country are heading in th
--------------------------------------------------------------------------------

                  type:  numeric (byte)
                 label:  CUR1

                 range:  [1,99]                       units:  1
         unique values:  3                        missing .:  0/1,057

            tabulation:  Freq.   Numeric  Label
                           197         1  (1) Right direction
                           851         2  (2) Wrong direction
                             9        99  (99) DON'T KNOW/SKIPPED ON
                                          WEB/REFUSED (VOL)

(860 differences between CUR1 and rightdir)


--------------------------------------------------------------------------------
politics                               PO
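To see exactly what the recode rules in the cell above do, it can help to try them on a tiny dataset first. The sketch below is a hypothetical toy example, not part of the AP-NORC data: it types three made-up responses into a variable named CUR1 using the survey's codes (1, 2, and 99), applies the same recode, and lists the original and new values side by side.

```stata
* Hypothetical toy data: three made-up responses using CUR1's codes
clear
input CUR1
1
2
99
end

* Same rules as the lab: 1 stays 1, 2 becomes 0, 99 becomes system missing (.)
recode CUR1 (1=1 "Right direction")(2=0 "Wrong direction")(99=.), gen(rightdir)

* Show the value label that recode defined from the rules
label list rightdir

* Compare original and recoded values as numbers (labels suppressed)
list CUR1 rightdir, nolabel
```

The listing should show 1 staying 1, 2 becoming 0, and 99 becoming a period. Because the rules include labels and the gen() option is used, recode attaches a value label that, by default, is named after the new variable (here, rightdir).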

## Crosstabs

We can create a crosstab using the ***tab2*** command, or simply by listing two variables after ***tab***. A crosstab is a two-way frequency table: it shows how the observations are jointly distributed across both variables. We use it to evaluate the relationship between X and Y by checking whether particular values of Y become more (or less) likely as we move across categories of X. Be sure to specify the _col_ option to calculate column percentages. With the DV in the rows and the IV in the columns, interpret the table by reading across a row: compare the column percentages for a given value of Y across the categories of X. 
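As a quick check on what the _col_ option computes, the sketch below builds a hypothetical 2x2 table from made-up counts (the variables y, x, and n are invented for illustration), runs the crosstab with n as a frequency weight, and then recomputes one column percentage by hand.

```stata
* Hypothetical toy counts: y is the DV (rows), x is the IV (columns), n is a count
clear
input y x n
0 0 30
1 0 10
0 1 20
1 1 40
end

* DV in rows, IV in columns; col turns each column into percentages
tab y x [fweight=n], col

* By hand: percent of the x==1 column that falls in row y==1 (40 of 60)
summarize n if x == 1
local coltotal = r(sum)
summarize n if x == 1 & y == 1
display 100 * r(sum) / `coltotal'
```

Reading across the y = 1 row, 25% of the x = 0 column falls in that row versus about 66.67% of the x = 1 column, so in this toy table y = 1 is more likely when x = 1.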

In [26]:
* SYNTAX: tab dv iv, col
tab closebars gotcorona, col


+-------------------+
| Key               |
|-------------------|
|     frequency     |
| column percentage |
+-------------------+

                      |   RECODE of VIRUS14
    RECODE of VIRUS7A | (VIRUS14: Have you or
 (VIRUS7A: [Requiring | has a close friend or
 bars and restaurants |  relative been diag
 to close] In respons |        No        Yes |     Total
----------------------+----------------------+----------
      Strongly Oppose |       100         27 |       127 
                      |     13.09       9.64 |     12.16 
----------------------+----------------------+----------
                    2 |       142         50 |       192 
                      |     18.59      17.86 |     18.39 
----------------------+----------------------+----------
Neither favor nor opp |       130         39 |       169 
                      |     17.02      13.93 |     16.19 
----------------------+----------------------+----------
                    4 |       194         76 |       

## Mean Comparisons

Mean comparison tests follow a similar logic. What happens to the mean of the dependent variable when we change categories of the independent variable? Does the average value of the DV change in the hypothesized way? We can also conduct a mean comparison test with the ***tab*** command, this time adding the ***sum()*** option. The IV should be categorical or ordinal, and the DV should be continuous or a dummy variable. For a dummy coded 0/1, the mean is simply the proportion of respondents coded 1, so the mean of rightdir in each category is the share saying the country is headed in the right direction.
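The claim that the mean of a 0/1 dummy is a proportion can be verified directly. This sketch rebuilds a stand-in for rightdir from the frequencies in the tab rightdir output (197 coded 1, 851 coded 0); it reproduces only those counts, not any other part of the survey.

```stata
* Stand-in for rightdir built from the tab rightdir frequencies:
* 197 respondents coded 1 (Right direction), 851 coded 0 (Wrong direction)
clear
set obs 1048
gen rightdir = (_n <= 197)

* The mean of a 0/1 dummy equals the share of observations coded 1
summarize rightdir
display r(mean)
display 197/1048
```

Both display lines should print the same value, about .188, which matches the Total row of the B2AB mean comparison.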

In [29]:
tab rightdir

* SYNTAX: tab iv, sum(dv)
tab worried, sum(rightdir)



 RECODE of CUR1 |
         (CUR1: |
      Generally |
speaking, would |
 you say things |
in this country |
              a |      Freq.     Percent        Cum.
----------------+-----------------------------------
Wrong direction |        851       81.20       81.20
Right direction |        197       18.80      100.00
----------------+-----------------------------------
          Total |      1,048      100.00


  RECODE of |
    VIRUS2A |
  (VIRUS2A: |
       [The |
coronavirus |
      ] How |  Summary of RECODE of CUR1 (CUR1:
worried are |  Generally speaking, would you say
  you about |      things in this country a
   you or s |        Mean   Std. Dev.       Freq.
------------+------------------------------------
  Not at al |       .2375     .428236          80
          2 |   .30434783   .46180692         138
  Somewhat  |   .22769231    .4199896         325
          4 |   .13445378   .34185816         238
  Extremely |   .10984848   .31329473         264
------------+--------

In [28]:
 * What about the economy?
tab B2AB
tab B2AB, sum(rightdir)


  B2AB: And |
  how would |
        you |
   describe |
        the |
  financial |  Summary of RECODE of CUR1 (CUR1:
  situation |  Generally speaking, would you say
in your own |      things in this country a
household t |        Mean   Std. Dev.       Freq.
------------+------------------------------------
  (1) Very  |   .32738095   .47066043         168
  (2) Somew |   .22033898   .41506189         354
  (3) Lean  |   .16216216   .36959978         185
  (4) Neith |           1           0           1
  (5) Lean  |   .09774436   .29809145         133
  (6) Somew |   .07586207   .26569507         145
  (7) Very  |   .14516129   .35513905          62
------------+------------------------------------
      Total |    .1879771   .39088042       1,048


## Accounting for Confounding Variable Z

There are several ways to "control" for a confounding variable. In a crosstab or mean comparison, we can hold Z constant by examining the relationship between X and Y separately within each category of Z. Let's do this for both the crosstab (controlling for gender) and the mean comparison (controlling for political party). 

In [40]:
* Perhaps the simplest way to control for Z is to run the 
*   crosstab command multiple times, each time selecting 
*   different categories of Z:

* Let's look at the values of Z
codebook gender


-----------------------------------------------------------------------------------
gender                                                               GENDER: Gender
-----------------------------------------------------------------------------------

                  type:  numeric (byte)
                 label:  GENDER

                 range:  [1,2]                        units:  1
         unique values:  2                        missing .:  0/1,057

            tabulation:  Freq.   Numeric  Label
                           423         1  (1) Male
                           634         2  (2) Female


In [34]:
* Now let's re-run our crosstab, once for men (gender==1) and once for
*    women (gender==2), using "if" to select each category
tab closebars gotcorona if gender==1, col
tab closebars gotcorona if gender==2, col



+-------------------+
| Key               |
|-------------------|
|     frequency     |
| column percentage |
+-------------------+

                      |   RECODE of VIRUS14
    RECODE of VIRUS7A | (VIRUS14: Have you or
 (VIRUS7A: [Requiring | has a close friend or
 bars and restaurants |  relative been diag
 to close] In respons |        No        Yes |     Total
----------------------+----------------------+----------
      Strongly Oppose |        50         11 |        61 
                      |     15.43      11.46 |     14.52 
----------------------+----------------------+----------
                    2 |        61         17 |        78 
                      |     18.83      17.71 |     18.57 
----------------------+----------------------+----------
Neither favor nor opp |        46         21 |        67 
                      |     14.20      21.88 |     15.95 
----------------------+----------------------+----------
                    4 |        87         19 |      
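Running the crosstab once per category with if works, but Stata's by prefix (tabulate accepts it) produces one table per category of Z in a single command; with the lab data this would be `bysort gender: tab closebars gotcorona, col`. The self-contained sketch below shows the same idea on hypothetical toy counts, with made-up variables y (DV), x (IV), z (control), and n (a frequency weight).

```stata
* Hypothetical toy counts: DV y, IV x, control z (two categories), count n
clear
input z x y n
1 0 0 40
1 0 1 10
1 1 0 20
1 1 1 30
2 0 0 25
2 0 1 25
2 1 0 10
2 1 1 40
end

* One y-by-x crosstab, with column percentages, for each category of z
bysort z: tab y x [fweight=n], col
```

In this toy data the share with y = 1 rises from 20% to 60% across categories of x when z = 1, and from 50% to 80% when z = 2, so the X-Y relationship holds within both categories of Z.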

We have a bit more flexibility with the mean comparison test, because Stata's ***tab*** command can show the mean of Y across the joint distribution of X and Z. For this example, let's also restrict the sample to respondents who identify as either Democrats or Republicans.
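Each cell of a two-way tab with sum() is simply the mean of Y among respondents in that combination of X and Z. The sketch below checks this on hypothetical toy data (made-up variables x, z, y, and frequency weight n): it builds the means table and then recomputes one cell with summarize and an if condition.

```stata
* Hypothetical toy data: 0/1 DV y, IV x (1 or 2), control z (1 or 2), count n
clear
input x z y n
1 1 0 8
1 1 1 2
1 2 0 5
1 2 1 5
2 1 0 9
2 1 1 1
2 2 0 3
2 2 1 7
end

* Mean of y in every x-by-z cell; nofreq and nostandard keep only the means
tab x z [fweight=n], sum(y) nofreq nostandard

* A single cell is just the subgroup mean, e.g., x==2 and z==2
summarize y [fweight=n] if x == 2 & z == 2
display r(mean)
```

The summarize line should report a mean of .7 (7 of the 10 toy respondents with x = 2 and z = 2 are coded 1), the same value shown in that cell of the table.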

In [41]:
* let's see how our Z variable is coded
codebook politics


-----------------------------------------------------------------------------------
politics                               POLITICS: Do you consider yourself a
                                       Democrat, a Republican, an independent or n
-----------------------------------------------------------------------------------

                  type:  numeric (byte)
                 label:  POLITICS

                 range:  [1,99]                       units:  1
         unique values:  5                        missing .:  0/1,057

            tabulation:  Freq.   Numeric  Label
                           347         1  (1) Democrat
                           324         2  (2) Republican
                           258         3  (3) Independent
                           119         4  (4) None of these
                             9        99  (99) DON'T KNOW/SKIPPED ON
                                          WEB/REFUSED (VOL)


In [38]:
* Mean comparison, controlling for party
*   the "nofreq" and "nostandard" options suppress the 
*   frequencies and standard deviation output
tab worried politics if politics ==1 | politics ==2, ///
    sum(rightdir) nofreq nostandard


                                   Means
of RECODE of CUR1 (CUR1: Generally speaking, would you say things in this country a

 RECODE of |
   VIRUS2A |
 (VIRUS2A: |
      [The |
coronaviru |  POLITICS: Do you
    s] How | consider yourself a
   worried |     Democrat, a
   are you |   Republican, an
 about you |  independent or n
      or s | (1) Democ  (2) Repub |     Total
-----------+----------------------+----------
 Not at al |         0  .26190476 |       .22
         2 | .18181818  .36486486 | .34117647
 Somewhat  | .13483146  .35833333 | .26315789
         4 | .03846154  .31111111 | .12080537
 Extremely | .05263158  .24390244 | .09770115
-----------+----------------------+----------
     Total | .07246377  .32608696 | .19490255


We can run the controlled comparison again, this time looking at the relationship between economic well-being, party, and assessments of the direction of the country:

In [39]:
tab B2AB politics if politics ==1 | politics ==2, ///
    sum(rightdir) nofreq nostandard


                                   Means
of RECODE of CUR1 (CUR1: Generally speaking, would you say things in this country a

 B2AB: And |
 how would |
       you |
  describe |
       the |
 financial |  POLITICS: Do you
 situation | consider yourself a
   in your |     Democrat, a
       own |   Republican, an
 household |  independent or n
         t | (1) Democ  (2) Repub |     Total
-----------+----------------------+----------
 (1) Very  | .10638298  .46835443 | .33333333
 (2) Somew |  .1010101  .33576642 | .23728814
 (3) Lean  | .05084746  .24137931 | .14529915
 (5) Lean  | .04347826  .15789474 | .07692308
 (6) Somew | .03225806         .2 | .08045977
 (7) Very  |    .09375         .2 | .10810811
-----------+----------------------+----------
     Total | .07246377  .32817337 | .19610778


What do these results tell you about which issues are more salient for Democratic and Republican respondents, respectively?