# Do Taller Presidents Get More Votes?

You are given two `DataFrame`s, one of US presidencies and one of electoral college votes. Calculate the correlation between presidents' heights and their share of the electoral college vote, rounded to two decimal places. Sample [presidencies](presidents.csv) and [electoral college votes](electoral_college_votes.csv) data are included in the repository.
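The calculation can be sketched on a tiny made-up dataset before touching the real files. The column names below match those the implementation expects; the names, heights, and vote counts are invented for illustration and are not taken from either CSV file.

```python
import pandas as pd

# Hypothetical toy data (assumption: not from presidents.csv or
# electoral_college_votes.csv)
presidents = pd.DataFrame({
    "NAME": ["A", "B", "C"],
    "HEIGHT": [170, 180, 190],
})
votes = pd.DataFrame({
    "WINNER": ["A", "B", "C"],
    "TOTAL_VOTES": [100, 100, 100],
    "WINNER_VOTES": [55, 60, 65],
})

# Share of the electoral college vote won by each winner
votes["SHARE"] = votes["WINNER_VOTES"] / votes["TOTAL_VOTES"]

# Match each election to the winner's height
merged = presidents.merge(votes, left_on="NAME", right_on="WINNER", how="inner")

# Series.corr defaults to the Pearson correlation coefficient
toy_corr = round(merged["HEIGHT"].corr(merged["SHARE"]), 2)
print(toy_corr)
```

Because the toy shares rise in step with the heights, the rounded correlation comes out as a perfect positive one. Real data will be noisier.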

## Implementation Section

In [1]:
import pandas as pd

def df_tall_votes(presidents, votes):
    # Basic type checks
    if presidents is None or votes is None:
        return None
    if not isinstance(presidents, pd.DataFrame) or not isinstance(votes, pd.DataFrame):
        return None

    # Required columns
    req_pres = {"NAME", "HEIGHT"}
    req_votes = {"WINNER", "TOTAL_VOTES", "WINNER_VOTES"}
    if not req_pres.issubset(presidents.columns):
        return None
    if not req_votes.issubset(votes.columns):
        return None

    if presidents.shape[0] == 0 or votes.shape[0] == 0:
        return None

    # Work on copies with only needed columns
    pres = presidents[["NAME", "HEIGHT"]].copy()
    vt = votes[["WINNER", "TOTAL_VOTES", "WINNER_VOTES"]].copy()

    # Coerce numeric columns
    pres["HEIGHT"] = pd.to_numeric(pres["HEIGHT"], errors="coerce")
    vt["TOTAL_VOTES"] = pd.to_numeric(vt["TOTAL_VOTES"], errors="coerce")
    vt["WINNER_VOTES"] = pd.to_numeric(vt["WINNER_VOTES"], errors="coerce")

    # Drop rows with missing essential data
    pres = pres.dropna(subset=["NAME", "HEIGHT"])
    vt = vt.dropna(subset=["WINNER", "TOTAL_VOTES", "WINNER_VOTES"])

    if pres.shape[0] == 0 or vt.shape[0] == 0:
        return None

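    # Remove rows with TOTAL_VOTES == 0 to avoid division by zero.
    # Copy the filtered frame so the SHARE assignment below does not
    # trigger pandas' SettingWithCopyWarning.
    vt = vt[vt["TOTAL_VOTES"] != 0].copy()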
    if vt.shape[0] == 0:
        return None

    # Compute share of votes
    vt["SHARE"] = vt["WINNER_VOTES"] / vt["TOTAL_VOTES"]

    # Merge on president name / winner
    merged = pd.merge(
        pres,
        vt[["WINNER", "SHARE"]],
        left_on="NAME",
        right_on="WINNER",
        how="inner"
    )

    # Need at least 2 observations to compute correlation
    if merged.shape[0] < 2:
        return None

    corr = merged["HEIGHT"].corr(merged["SHARE"])

    # Correlation can still be NaN (e.g. no variation)
    if pd.isna(corr):
        return None

    return float(round(corr, 2))


## Public Test Cases

You are free to add more test cases to explore how your code behaves with different inputs. Assert checks in the notebook are disabled when your code is graded. Keep in mind that the whole notebook needs to run; that is, the notebook cannot contain any faulty code.
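One edge case that may be worth exploring is input with no variation in height. The sketch below uses invented values to show that `Series.corr` yields `NaN` in that situation, which is the case the implementation guards against with `pd.isna` before returning `None`. The variable names are hypothetical.

```python
import warnings

import pandas as pd

# Hypothetical values: every height is identical, so the standard
# deviation of the heights is zero and the correlation is undefined.
constant_heights = pd.Series([180.0, 180.0, 180.0])
shares = pd.Series([0.55, 0.60, 0.70])

with warnings.catch_warnings():
    # NumPy may emit a RuntimeWarning about the division by zero
    warnings.simplefilter("ignore")
    no_variation = constant_heights.corr(shares)

print(pd.isna(no_variation))
```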

In [2]:
import pandas as pd
assert df_tall_votes(pd.read_csv('presidents.csv'), pd.read_csv('electoral_college_votes.csv')) == 0.59