
François Briatte edited this page May 28, 2023 · 13 revisions

Strong views

I will die on those hills

  1. Report as few precision digits as necessary.

    Most of the time, that's one. There are reasonable counter-views.

    Nathaniel Beck never published his paper in support of his leanout Stata package, but everything in there was correct and warranted. The leanout package strips regression results to their bare bones: 1-digit estimates, N, and RMSE. Damn right.

  2. RMSE is useful. R-squared is not.

    And RMSE should be renamed ‘penalized average error’ for clarity.

  3. Anything that looks like stepwise regression is a bad idea.

    It's not anyone's fault regularization came much later.
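The first two views can be sketched together in a few lines of Python. The helpers `rmse`, `mae` and `lean_table` are hypothetical illustrations: they are not the leanout Stata package or its output format, and "one digit" is assumed here to mean one decimal place. The numbers are placeholders chosen only to make the arithmetic easy to check.

```python
import math

# Hypothetical helpers illustrating the first two views above; this is not
# the leanout Stata package, just the same idea in plain Python.


def rmse(observed: list[float], predicted: list[float]) -> float:
    """Root mean squared error, in the units of the outcome.

    Squaring before averaging penalizes large errors, hence
    'penalized average error'.
    """
    errors = [o - p for o, p in zip(observed, predicted)]
    return math.sqrt(sum(e * e for e in errors) / len(errors))


def mae(observed: list[float], predicted: list[float]) -> float:
    """Mean absolute error: the unpenalized average error, for comparison."""
    errors = [o - p for o, p in zip(observed, predicted)]
    return sum(abs(e) for e in errors) / len(errors)


def lean_table(coefs: dict[str, float], n: int, rmse_value: float, digits: int = 1) -> str:
    """Bare-bones results: rounded estimates, N, and RMSE (one decimal by default)."""
    rows = [f"{name:<12}{value:>8.{digits}f}" for name, value in coefs.items()]
    rows.append(f"{'N':<12}{n:>8d}")
    rows.append(f"{'RMSE':<12}{rmse_value:>8.{digits}f}")
    return "\n".join(rows)


# Placeholder numbers, not real data.
observed = [10.0, 10.0, 10.0, 10.0]
even_misses = [9.0, 11.0, 9.0, 11.0]     # four errors of size 1
one_big_miss = [10.0, 10.0, 10.0, 14.0]  # three perfect fits, one error of size 4

print(mae(observed, even_misses), rmse(observed, even_misses))    # 1.0 1.0
print(mae(observed, one_big_miss), rmse(observed, one_big_miss))  # 1.0 2.0
print(lean_table({"(Intercept)": 12.3456, "income": 0.0713}, n=1500, rmse_value=4.2871))
```

Both sets of predictions miss by 1 on average, but the single large miss doubles RMSE: that penalty is what the proposed name tries to convey. With the default `digits=1`, 0.0713 prints as 0.1; pass another `digits` value only when a variable's scale makes it necessary.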

Principles

  • No synthetic data, no toy data (e.g. Iris).
    • 100% real-world data.
    • Try to study just one dimension (counts, timelines).
    • Try to study too many dimensions (text, surveys).
  • Write code for humans, write data for computers (Vince Buffalo).

References

Not listed in the syllabus:

Axioms

  1. Most people understand the difference between percentages and percentage points, but cannot explain it.
  2. Most people do not know how to compute a weighted mean, even when they understand the concept.
  3. Because of the many biases that affect human probabilistic reasoning, everyone gets some probabilities wrong.
    • Rare independent events can occur twice in a row.
    • Absolute risk ≠ Relative risk, and…
    • Relative risk ≠ Odds ratio.
  4. Most people naturally understand growth rates.
  5. Most people naturally understand exponential growth (sometimes by confusing it with power laws).
    • Understanding of linear v. nonlinear relationships: squared, exponential, asymptotic (square root).
  6. Almost everyone understands fractions, i.e. ratios, i.e. normalized measures.
    • Compare: GDP, GDP/capita, GDP/household/week/year
  7. 95% of your readers will stick with simple descriptives; only the last 5% might look at the model.
  8. Do not use percentages on small samples.
  9. Do not use high levels of precision: zero or one decimal place will fit most situations.
    • Units: natural, indices, fractions (percentages), quantiles.
  10. Sometimes, the answer is not in the data (John Tukey, cited by Edward Tufte).

Remember these when building/plotting stuff.
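Several of the axioms reduce to one-line arithmetic. The Python sketch below contrasts percentage points with percent change, computes a weighted mean, and derives absolute risk difference, relative risk and odds ratio from the same 2×2 counts. Function names are hypothetical and all numbers are placeholders, not real data.

```python
# Hypothetical helpers for a few of the axioms above; all numbers are placeholders.


def percentage_point_change(old_pct: float, new_pct: float) -> float:
    """Absolute difference between two percentages, in percentage points."""
    return new_pct - old_pct


def percent_change(old: float, new: float) -> float:
    """Relative change, in percent of the starting value."""
    return 100 * (new - old) / old


def weighted_mean(values: list[float], weights: list[float]) -> float:
    """Each value counts in proportion to its weight (e.g. group size)."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def risk_measures(cases_exposed: int, n_exposed: int,
                  cases_unexposed: int, n_unexposed: int) -> dict[str, float]:
    """Absolute risk difference, relative risk and odds ratio from a 2x2 table."""
    risk_exposed = cases_exposed / n_exposed
    risk_unexposed = cases_unexposed / n_unexposed
    # Odds = cases / non-cases, not cases / total.
    odds_exposed = cases_exposed / (n_exposed - cases_exposed)
    odds_unexposed = cases_unexposed / (n_unexposed - cases_unexposed)
    return {
        "risk_difference": risk_exposed - risk_unexposed,
        "relative_risk": risk_exposed / risk_unexposed,
        "odds_ratio": odds_exposed / odds_unexposed,
    }


# A rate going from 8% to 10% rises by 2 percentage points, but by 25 percent.
print(percentage_point_change(8, 10), percent_change(8, 10))  # 2 25.0

# Group means of 10 (1 person) and 20 (3 people): the weighted mean is 17.5, not 15.
print(weighted_mean([10, 20], [1, 3]))  # 17.5

# 20 cases out of 100 exposed vs. 10 cases out of 100 unexposed.
print(risk_measures(20, 100, 10, 100))
```

On these counts the relative risk is 2 while the odds ratio is about 2.25; the gap between the two widens as the outcome becomes more common.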

Recommended:

Claims

  • Epistemic: selected w/r/t rules of interpretation.
    • Fairness, transparency
    • Organised skepticism, detachment
    • Openness (universalism)
  • Technical: selected w/r/t method of production.
    • Effectiveness (efficiency)
    • Experience (local knowledge)
    • Expertise (sophistication)
  • Aesthetic: selected w/r/t elegance.
    • "Authenticity"
    • Introspective, "resonates with inner stuff"
    • Fame
  • Normative: selected w/r/t desirable goals.
    • Sollen: value-laden, moral
    • Justice, rightness
    • Engagement

Inspiration: Patrick Thaddeus Jackson