Intro words go here about introducing the concept of buzzfeed quizzes and what they're trying to do.

Link some fun ones: here

Link some fun ones: here

Link some fun ones: here

In all of these, though, the goal is the same: they're trying to quantify your personality in order to tell you which thing you're closest to.

How do we even begin to determine which personality yours is the closest to?

Well, there are two assumptions baked into that question, and we'll need to address both of them before we can arrive at a satisfying answer. The assumptions are (1) that personality is something we can quantify and (2) that, once quantified, we have a way to measure how “close” or “far” one personality is from another.

Let’s address these, one at a time.

# 1) Personality is quantifiable

How do we turn something abstract, like personality, into something concrete, like numbers?

As it turns out, scientists studying the topic have found a pretty reliable way to do it. We can piece apart a personality into just 5 different traits, assign a score for each trait, then view the unique combination of those scores as someone’s personality. Here, the traits are openness to experience, conscientiousness, extraversion, agreeableness, and neuroticism, but the details aren’t really important for the purposes of this blog post (though do check out [the Wikipedia article](https://en.wikipedia.org/wiki/Big_Five_personality_traits) if you’d like to learn more!).

What matters here is that we have a way to represent personality numerically. We ask someone a series of questions, use their answers to arrive at a score on each of the 5 dimensions, then view the unique combinations of those scores as a representation of their personality. For example, we might get an outcome like:


| Name | Openness | Consc. | Extrav. | Agree. | Neurot. |
| ---- | ---- | ---- | ---- | ---- | ---- |
| Reader | 70 | 90 | 50 | 60 | 20 |


This person would be responsible, curious, and usually pretty easy to talk to even if they're not always seeking out conversations. And now that we have their personality represented in numbers, we can compare it with others and see which they’re most like!

| Name | Openness | Consc. | Extrav. | Agree. | Neurot. |
| ---- | ---- | ---- | ---- | ---- | ---- |
| Reader | 70 | 90 | 50 | 60 | 20 |
| Hmmm | 70 | 90 | 50 | 60 | 60 |
| Hommm | 80 | 80 | 60 | 50 | 20 |
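To make "personality as numbers" concrete, here is one way these profiles might be stored in code. This is only a sketch: the names `TRAITS` and `profiles` are my own, and the scores are simply the illustrative numbers from the table above, not real data.

```python
# Illustrative Big Five scores copied from the table above (not real data).
TRAITS = ("openness", "conscientiousness", "extraversion", "agreeableness", "neuroticism")

# Each personality is a tuple of five scores, in the same order as TRAITS.
profiles = {
    "Reader": (70, 90, 50, 60, 20),
    "Hmmm": (70, 90, 50, 60, 60),
    "Hommm": (80, 80, 60, 50, 20),
}

# Pair each score with its trait name so a profile is easy to read back.
reader = dict(zip(TRAITS, profiles["Reader"]))
print(reader["conscientiousness"])
```

Keeping every profile in the same trait order is what lets us compare them position by position later on.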

There’s just one question remaining: how do we know which other personality is closest?

# 2) Measuring “Distance” Between Personalities 

So we have a quantified personality. And if we’d quantified it using just one number, telling how close or far one personality is from another would be really simple! We could just subtract the first personality number from the other, and the closer the result is to “0”, the closer the two personalities would be. Unfortunately, we can’t do that because we had to use more than just one number in our personality assessment. Remember, we had 5 dimensions total, so we have to compare 5 separate numbers all at once.

What if, instead, we took the differences between every trait one by one and then combined those differences together to get an aggregate difference? Well, that kind of works, but there’s a problem: two different people might end up with the same total difference despite having different scores on the individual traits. And, actually, as the astute reader may have noticed, we’ve already seen this happen!

| Name | Openness | Consc. | Extrav. | Agree. | Neurot. |
| ---- | ---- | ---- | ---- | ---- | ---- |
| Reader | 70 | 90 | 50 | 60 | 20 |
| Hmm | 70 | 90 | 50 | 60 | 60 |
| Homm | 80 | 80 | 60 | 50 | 20 |

Our main character is 40 points away from option 1 on just one dimension (neuroticism), but 10 points away from option 2 on four dimensions (all *except* neuroticism). Both total difference scores come out to 40, even though the two options have substantially different personalities. But which is closer? Is it closer to be 40 points away on one thing or 10 points away on four things?
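The tie is easy to check for yourself. The snippet below is a small sketch of the "add up the differences" approach; the function name `total_difference` is just something I made up for illustration.

```python
# Scores in the order: openness, conscientiousness, extraversion,
# agreeableness, neuroticism (taken from the table above).
reader = (70, 90, 50, 60, 20)
option_1 = (70, 90, 50, 60, 60)
option_2 = (80, 80, 60, 50, 20)

def total_difference(a, b):
    """Sum of the absolute trait-by-trait differences between two profiles."""
    return sum(abs(x - y) for x, y in zip(a, b))

print(total_difference(reader, option_1))  # 40 (all of it from neuroticism)
print(total_difference(reader, option_2))  # 40 (10 points on each of four traits)
```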

Lucky for us, an insightful mathematician came up with a way to calculate just this sort of thing, [and they did it 4,000 years ago](https://www.britannica.com/science/Pythagorean-theorem)! However, it wasn’t until Pythagoras came along about 1,500 years later that the technique became widely known as the Pythagorean theorem.

$$
A^2 + B^2 = C^2
$$

Now, that formula has a precise mathematical meaning, but I’m also going to visualize it because the visual will help us understand the math that comes next. Basically, imagine this: there are two dots on a line. The first one is at point 1 and the second one is at point 5. We can measure between them and say, “These two dots are 4 points away from each other.”


```python
# Load packages
import numpy as np
import pandas as pd
import plotnine as pn

# Create points for graphing (only the x column is used in this 1-D plot)
dat = pd.DataFrame({'x': [1, 5], 'y': [1, 4]})

# Plot both points on a number line, with a red bracket marking the gap between them
(
    pn.ggplot(dat, pn.aes(x = 'x', y = 0.02)) +
    pn.geom_point(size = 5) +
    # Horizontal bar of the bracket, followed by its two end caps
    pn.annotate(geom = 'segment', x = 1, xend = 5, y = .05, yend = .05, color = 'red', size = 1.5) +
    pn.annotate(geom = 'segment', x = 1, xend = 1, y = .0513, yend = .03, color = 'red', size = 1.5) +
    pn.annotate(geom = 'segment', x = 5, xend = 5, y = .0513, yend = .03, color = 'red', size = 1.5) +
    # Label centered above the bracket
    pn.annotate(geom = 'text', x = 3, y = .065, color = 'red', label = '4 point difference', size = 15) +
    pn.scale_x_continuous(breaks = range(0, 7), labels = range(0, 7), limits = [0, 6]) +
    pn.scale_y_continuous(limits = [0, .1]) +
    # Hide everything about the y-axis so the figure reads as a number line
    pn.theme_classic() +
    pn.theme(axis_title = pn.element_blank(),
             axis_text_y = pn.element_blank(),
             axis_line_y = pn.element_blank(),
             axis_ticks_y = pn.element_blank(),
             panel_grid_major = pn.element_blank(),
             panel_grid_minor = pn.element_blank(),
             plot_margin = 1,
             figure_size = (6, 1.7))
)
```

But that’s just one dimension. What happens when we add another? Now, let’s say we have one dot at point (1, 1) and the next dot at point (5, 4). Using just our knowledge of numbers, how do we figure out the distance between these two points?

*Image here as well*

The Pythagorean theorem, of course! It says that the distance between these two points is equal to the square root of the sum of the squares of each of the two perpendicular sides. That's a lot of words, but basically it’s just saying we can figure out how far it is diagonally by looking at how far it is horizontally and vertically.

*Also an image here*
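Plugging in the two dots from above, the calculation might look like the sketch below. The variable names are my own; the only real ingredient is the theorem itself.

```python
import math

point_a = (1, 1)
point_b = (5, 4)

# How far apart the dots are along each axis.
horizontal = point_b[0] - point_a[0]  # 4
vertical = point_b[1] - point_a[1]    # 3

# Pythagorean theorem: the diagonal is the square root of the summed squares.
distance = math.sqrt(horizontal**2 + vertical**2)
print(distance)  # 5.0
```

So the two dots are 4 apart horizontally, 3 apart vertically, and 5 apart diagonally.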

Alright, awesome! So that’s how we do it when there are two dimensions. Now how do we do it when there are five? Remember, every personality has five scores, so we have five differences to combine. Well, thankfully, we can actually extend the Pythagorean formula to as many dimensions as we want! And even if that gets hard to visualize, the math still works. Here, A through E are the differences on each of the five traits and F is the overall distance.

$$
A^2 + B^2 + C^2 + D^2 + E^2 = F^2
$$
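Applied to the example profiles from earlier, the extended formula might look like this in Python. Treat it as a sketch rather than a finished tool: `euclidean_distance` is a name I picked, and the scores are the illustrative ones from the tables above.

```python
import math

# Scores in the order: openness, conscientiousness, extraversion,
# agreeableness, neuroticism.
reader = (70, 90, 50, 60, 20)
option_1 = (70, 90, 50, 60, 60)
option_2 = (80, 80, 60, 50, 20)

def euclidean_distance(a, b):
    """Square root of the summed squared differences, one term per trait."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

print(euclidean_distance(reader, option_1))  # sqrt(40^2) = 40.0
print(euclidean_distance(reader, option_2))  # sqrt(4 * 10^2) = 20.0
```

Notice that this breaks the earlier tie: under this formula, being 10 points away on four traits counts as closer than being 40 points away on one.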

There’s just one tiny little issue with this approach: in order for it to work, we need to make an assumption that is very much not true in personality research. Math like this assumes that our dimensions are orthogonal. It assumes that our axes all intersect at 90-degree angles. In other words, it assumes that our dimensions are entirely uncorrelated.

Unfortunately, personality dimensions tend to be correlated.

Why does this matter? Well, imagine that instead of having a graph with two perfectly perpendicular axes, it looked something like this instead:

*And another one here - tilted axes*

Here, as you go up on the X-axis, you’re ALSO starting to go up on the Y-axis! This is what happens when our dimensions correlate, and it breaks the Pythagorean theorem. In order for the theorem to work, we need coordinates measured along perpendicular axes; otherwise, we can't make a right triangle! But if our dimensions correlate, our axes aren’t perpendicular, and that means our math won’t work.

So. What can we do?

Luckily, once again, a statistician has saved the day. It was back in 1936 that a man by the name of P. C. Mahalanobis kept running into this very problem and decided to try and find a way to fix it. What he figured out was that if you know the correlation between two measures, you can correct the axes using that information! To put that in terms of the visual, if we know that the X-axis is being tilted up by 1 degree, we can just adjust all of our numbers down by 1 degree to put them back in their proper place.

*Image here*

It takes a lot of fancy matrix algebra, but the important part here is this: all we have to do is figure out how much our axes are tilting (or how much they’re correlating), then we can correct the numbers and solve the distance equation once again!

Decades of research have, at this point, established a pretty reliable correlation table (OK, the matrix algebra actually uses a covariance table, but it's basically the same idea) between the dimensions of our Big 5 personality traits, so we can use that information to correct our axes!
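For anyone curious about what that matrix algebra looks like, the usual way of writing the Mahalanobis distance is below. The symbols are my own shorthand: $\mathbf{x}$ and $\mathbf{y}$ are two personalities written as lists of five scores, and $S$ is the covariance table for the five traits.

$$
D(\mathbf{x}, \mathbf{y}) = \sqrt{(\mathbf{x} - \mathbf{y})^\top S^{-1} (\mathbf{x} - \mathbf{y})}
$$

If the traits were uncorrelated and all had a variance of 1, $S$ would be the identity matrix and this would collapse right back into the extended Pythagorean formula.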

And finally we’ve assembled all the pieces we need:

- A way to put personality into numbers
- A way to “un-tilt” axes with their correlations
- A way to calculate distances with our un-tilted axes

And with all of that, we can figure out how “close” or “far” your personality is from any other that you can dream up! Then it’s just a matter of finding which distance is the lowest, and that’s the one that’s most like you.
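Putting the three pieces together might look something like the sketch below. One big caveat: the correlation values and the standard deviation of 15 are numbers I invented so the example runs. They are not taken from the research literature, and a real version would swap in published values. The function name `mahalanobis` and the candidate labels are also just illustrative.

```python
import numpy as np

# Piece 1: personalities as numbers (openness, conscientiousness,
# extraversion, agreeableness, neuroticism).
reader = np.array([70, 90, 50, 60, 20], dtype=float)
candidates = {
    "option 1": np.array([70, 90, 50, 60, 60], dtype=float),
    "option 2": np.array([80, 80, 60, 50, 20], dtype=float),
}

# Piece 2: HYPOTHETICAL correlations between traits, invented for illustration.
corr = np.array([
    [ 1.0,  0.1,  0.3,  0.1, -0.1],
    [ 0.1,  1.0,  0.1,  0.2, -0.3],
    [ 0.3,  0.1,  1.0,  0.2, -0.3],
    [ 0.1,  0.2,  0.2,  1.0, -0.2],
    [-0.1, -0.3, -0.3, -0.2,  1.0],
])
std = 15.0  # assumed standard deviation, the same for every trait
cov = corr * std**2  # covariance = correlation scaled by the variances

# Piece 3: distance measured on the "un-tilted" axes.
def mahalanobis(a, b, cov):
    """Mahalanobis distance between two score vectors for a given covariance."""
    diff = a - b
    # Solving cov @ z = diff avoids computing the matrix inverse explicitly.
    return float(np.sqrt(diff @ np.linalg.solve(cov, diff)))

distances = {name: mahalanobis(reader, scores, cov) for name, scores in candidates.items()}

# The smallest distance is the closest match.
closest = min(distances, key=distances.get)
print(distances, closest)
```

Which option wins depends entirely on the covariance table you feed in, which is exactly why getting that table from real research matters.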