# What information can we get by conducting a correlation test?


- By showing that two variables are associated, we may:
    - Focus on studying the relevant features.
- By finding the direction (+ or -) of the association, we may:
    - Know how the feature affects transportation safety in general.
    - Inform risk perception.
- By measuring the strength of the association, we may:
    - Know to what extent the features affect transportation safety (quantify the impact).
    - Find areas that are more dangerous just by examining the data.
    - Pay more attention to these features when doing transportation planning.
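
The three kinds of information above (whether an association exists, its direction, and its strength) correspond to the outputs of a typical correlation test. The sketch below is illustrative only: the data are synthetic, the names `aadt` and `crash_rate` are hypothetical, Spearman's rank correlation is just one possible choice of test, and the 0.05 significance level is an assumed convention.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Synthetic data for 200 hypothetical road segments (not real observations).
aadt = rng.uniform(1_000, 50_000, size=200)
crash_rate = 0.5 + 0.00004 * aadt + rng.normal(0, 0.4, size=200)

# Spearman's rank correlation returns the coefficient and its p-value.
rho, p_value = stats.spearmanr(aadt, crash_rate)

associated = p_value < 0.05           # 1) is there an association?
direction = "+" if rho > 0 else "-"   # 2) in which direction?
strength = abs(rho)                   # 3) how strong (0 = none, 1 = perfect)?

print(f"associated={associated}, direction={direction}, strength={strength:.2f}")
```

A small p-value only indicates that an association is unlikely to be due to chance; it does not establish that the feature causes the change in safety.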

# Understand data types

- **Categorical data (qualitative)**
    - **Nominal**: Named categories
        - Example: 
            - Human factors: Gender, Ethnic background, Driver condition
            - Environmental factors: Day of week, Light condition
            - Roadway factors: Roadway functional class, Traffic control type
            - Vehicle factors: Vehicle condition, Vehicle type
            - Crash results: Crash type
    - **Ordinal**: Categories with an implied order. If the categories are represented by numbers, the differences between those numbers carry no meaning.
        - Example: 
            - Human factors: Income level
            - Road evaluation: Grade level, Road condition level
            - Crash results: Injury level
- **Numerical data (quantitative)**
    - **Discrete**: Takes only particular values. Usually obtained by counting.
        - Example: 
            - Roadway factors: Posted speed, Number of intersections 
            - Crash results: Number of crashes, Number of times
    - **Continuous**: Can take any numeric value within a range. Usually obtained by measuring.
        - Can be further classified as interval data and ratio data:
            - **Interval**: No true zero. For example: Temperature (in °C or °F)
            - **Ratio**: A true zero exists. For example: Age, Height, Weight
        - Example:
            - Human factors: Age
            - Traffic factors: AADT, Bus trips
            - Roadway factors: Street length, Curve ratio, Sidewalk condition score, Sidewalk width
            - Environmental factors: Population density
            - Crash results: Crash rate
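
As a minimal sketch of how these four data types might be encoded before choosing a test, the example below uses pandas. The column names and values are hypothetical and chosen only for illustration; the key point is that an ordinal variable can be stored as an ordered categorical so that its order is preserved without implying meaningful distances.

```python
import pandas as pd

# Hypothetical crash records; names and values are illustrative only.
df = pd.DataFrame({
    "crash_type": ["rear-end", "angle", "sideswipe", "angle"],  # nominal
    "injury_level": ["minor", "severe", "none", "minor"],       # ordinal
    "num_intersections": [3, 0, 5, 2],                          # discrete
    "aadt": [12500.0, 8300.5, 40210.0, 15000.0],                # continuous
})

# Nominal: unordered categories.
df["crash_type"] = pd.Categorical(df["crash_type"])

# Ordinal: categories with an explicit order.
df["injury_level"] = pd.Categorical(
    df["injury_level"], categories=["none", "minor", "severe"], ordered=True
)

# Ordered categoricals support comparisons and expose integer rank codes.
minor_or_worse = df["injury_level"] >= "minor"
injury_rank = df["injury_level"].cat.codes

print(df.dtypes)
print(injury_rank.tolist())
```

The rank codes are suitable inputs for rank-based statistics, but their differences should not be interpreted as distances.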
            

# What methodology should we use?


|  | Nominal X (Categorical)| Ordinal X (Categorical)| Discrete X (Numerical)| Continuous X (Numerical)|
| --- | --- | --- |--- | --- |
| **Nominal Y <br> (Categorical)** | **For 2x2 table**: <br> Chi-square + Phi coefficient <br> Odds ratio <br> <font color='blue'>Example: <br> Test if installing road signs (Yes/No) would help prevent crashes (Crash/No Crash) </font> <br> <br> **For mxn table**: <br> Chi-square + Cramer's V <br> Goodman-Kruskal tau <br> Goodman-Kruskal lambda <br> <font color='blue'>Example: <br> Test if Road functional class and Crash type are independent </font> <br>| **For 2x2 table**: <br> Phi coefficient <br> <br> **For mxn table**: <br> Chi-square + Cramer's V <br> <font color='blue'>Example: <br> Test if Income level and Crash type are independent </font> <br> |Chi-square + Cramer's V <br> <font color='blue'>Example: <br> Test if a lower Posted speed makes any difference in preventing crashes (Crash/No Crash) </font> <br>| **For Binary Y**: <br> Logistic regression <br> Point-biserial correlation <br> <font color='blue'>Example: <br> Test if there is an association between Population density and Crash (Crash/No Crash) </font> <br> <br> **For Multi-class Y**: <br> One-way ANOVA <br> Kruskal-Wallis H test <br> <font color='blue'>Example: <br> Test if there is an association between Population density and Crash type </font> <br>|
| **Ordinal Y <br> (Categorical)** | **For Binary X**: <br> Rank-biserial correlation coefficient <br> <font color='blue'>Example: <br> Test if installing road signs (Yes/No) would help reduce Crash severity level </font> <br> <br> **For Multi-class X**: <br>Chi-square + Cramer's V <br> <font color='blue'>Example: <br> Test if Road functional class is associated with Crash severity level <br> (i.e., whether the Crash severity level on a specific Road functional class is significantly higher than on the others) </font> <br>| Goodman-Kruskal's gamma coefficient <br> Kendall's tau <br>  Spearman rank-order correlation <br> Chi-square + Cramer's V <br> Somers' D <br> <font color='blue'>Example: <br> Test if Income level is associated with Crash severity level </font> <br> |Goodman-Kruskal's gamma coefficient <br> Kendall's tau <br> Chi-square + Cramer's V <br> <font color='blue'>Example: <br> Test if Number of intersections is associated with Crash severity level </font> <br> | Kendall's coefficient of rank correlation <br> Spearman rank correlation coefficient <br> <font color='blue'>Example: <br> Test if AADT is associated with Crash severity level </font> <br>|
| **Discrete Y <br> (Numerical)** | **For Binary X**: <br> Point-biserial correlation <br> Chi-square + Cramer's V <br> <font color='blue'>Example: <br> Test if installing road signs would help reduce Crash counts </font> <br> <br> **For Multi-class X**: <br> One-way ANOVA <br> Kruskal-Wallis H test <br> Chi-square + Cramer's V <br> <font color='blue'>Example: <br> Test if Road functional class is associated with Crash counts <br> (i.e., whether one Road functional class has significantly higher Crash counts than the others) </font> <br>| Goodman-Kruskal's gamma coefficient <br> Kendall's tau <br> Chi-square + Cramer's V  <br> <font color='blue'>Example: <br> Test if a better Road condition (Road condition level) would help reduce Crash counts </font> <br> |Chi-square + Cramer's V <br> One-way ANOVA <br> Pearson correlation <br> Spearman correlation  <br> <font color='blue'>Example: <br> Test if Number of intersections is associated with Crash counts </font> <br> |  Pearson correlation <br> Spearman correlation <br> <font color='blue'>Example: <br> Test if Population density and Crash counts are monotonically associated </font> <br> |
| **Continuous Y <br> (Numerical)** | **For Binary X**: <br> Point-biserial correlation <br> Student's t-test <br> <font color='blue'>Example: <br> Test if there is a significant difference between the AADT of Crash and Non-Crash sites </font> <br> <br> **For Multi-class X**: <br> One-way ANOVA <br> Kruskal-Wallis H test | **For Binary X**: <br> Point-biserial correlation <br> Student's t-test <br> **For Multi-class X**: <br> One-way ANOVA <br> Kruskal-Wallis H test  <br> <font color='blue'>Example: <br> Test if there is an association between Road condition level and the Crash rate </font> <br> | Pearson correlation <br> Spearman correlation <br> <font color='blue'>Example: <br> Test if there is an association between Number of legs of the intersection and the Intersection crash rate </font> <br>|  Pearson correlation <br> Spearman correlation <br> Linear regression <br> <font color='blue'>Example: <br> Test if there is an association between AADT and Crash rate </font> <br>|

Please refer to the **CorrelationTestMethodology** chapter for an introduction to these methodologies.
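
To make a few cells of the table concrete, the following sketch applies three of the listed methods with SciPy: Chi-square + Cramer's V (nominal X, nominal Y), Kendall's tau (ordinal X, ordinal Y), and point-biserial correlation (binary X, continuous Y). All data are made up for illustration, and `cramers_v` is a hypothetical helper written for this example, not a SciPy function.

```python
import numpy as np
import pandas as pd
from scipy import stats


def cramers_v(x, y):
    """Chi-square test of independence plus Cramer's V (hypothetical helper)."""
    table = pd.crosstab(pd.Series(x, name="x"), pd.Series(y, name="y"))
    # correction=False disables Yates' continuity correction for 2x2 tables.
    chi2, p_value, _, _ = stats.chi2_contingency(table, correction=False)
    n = table.to_numpy().sum()
    k = min(table.shape) - 1
    return np.sqrt(chi2 / (n * k)), p_value


# Nominal X vs nominal Y: road sign (yes/no) vs crash (crash/no crash).
sign = ["yes"] * 50 + ["no"] * 50
crash = ["crash"] * 10 + ["no crash"] * 40 + ["crash"] * 30 + ["no crash"] * 20
v, p_chi2 = cramers_v(sign, crash)

# Ordinal X vs ordinal Y: income level vs crash severity level (rank codes).
income_level = [1, 1, 2, 2, 3, 3, 1, 2, 3, 3]
severity_level = [3, 2, 2, 2, 1, 1, 3, 3, 1, 2]
tau, p_tau = stats.kendalltau(income_level, severity_level)

# Binary X vs continuous Y: road sign installed (1/0) vs crash rate.
has_sign = [1, 1, 1, 1, 0, 0, 0, 0]
crash_rate = [1.2, 0.8, 1.0, 0.9, 2.1, 1.8, 2.4, 1.9]
r_pb, p_pb = stats.pointbiserialr(has_sign, crash_rate)

print(f"Cramer's V = {v:.3f} (p = {p_chi2:.4f})")
print(f"Kendall's tau = {tau:.3f} (p = {p_tau:.4f})")
print(f"Point-biserial r = {r_pb:.3f} (p = {p_pb:.4f})")
```

For a 2x2 table without continuity correction, Cramer's V equals the absolute value of the phi coefficient. Note that Cramer's V measures strength only, while Kendall's tau and the point-biserial coefficient also carry a sign that indicates direction.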