The average ward in Milwaukee has 750 votes, how would Biden have 100-200 in 30% of wards? #36

Open
dlb8685 opened this issue Nov 10, 2020 · 14 comments

@dlb8685

dlb8685 commented Nov 10, 2020

This repo's use of Benford's Law is so misleading that it discredits other claims of fraud more generally.

  • If you pull the data from Milwaukee city, the average ward has 755 votes. Biden wins an average of 595 votes per ward. Obviously if this is true, his first-digit distribution is going to be skewed towards the 4, 5, 6 range.
  • Only 20.5% of wards had over 1,000 votes, and 2.1% of wards had between 100 and 200 votes. These are the only wards where Biden would even have a chance of getting a vote total that starts with 1.
  • It's laughably easy to produce these kinds of anomalies with political data. 65.6% of main-party candidates in the 2018 House elections had a vote total starting with 1. Massive fraud? No, it's because the average congressional district had 264 thousand votes, and in most races one or both of the candidates had 100,000-something votes.

2020 Milwaukee Data
2018 House Election Data

If the size of the district you're looking at is the same across many different races, the results will skew towards something completely different from Benford's Law, absent any fraud whatsoever. Thus, these results from Milwaukee or any other place provide no evidence of election fraud.
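
The argument above can be checked with a small simulation. This is a minimal sketch, not the repo's code: it draws hypothetical, fraud-free ward totals centered on the 595-vote average quoted above. The spread of 150 votes and the normal shape are assumptions made for illustration, not values taken from the Milwaukee data.

```python
import math
import random
from collections import Counter

def first_digit(n: int) -> int:
    """Return the leading decimal digit of a positive integer."""
    return int(str(n)[0])

def benford_expected(d: int) -> float:
    """Benford's Law probability for leading digit d (1-9)."""
    return math.log10(1 + 1 / d)

def first_digit_shares(counts):
    """Share of each leading digit 1-9 among the positive counts."""
    digits = Counter(first_digit(c) for c in counts if c > 0)
    total = sum(digits.values())
    return {d: digits.get(d, 0) / total for d in range(1, 10)}

# Hypothetical wards: mean 595 comes from the post, the spread of 150 is assumed.
rng = random.Random(0)
simulated = [max(1, round(rng.gauss(595, 150))) for _ in range(5000)]
shares = first_digit_shares(simulated)

for d in range(1, 10):
    print(f"{d}: simulated {shares[d]:.3f}   Benford {benford_expected(d):.3f}")
```

Under these assumptions most totals start with 4, 5 or 6 and almost none start with 1, even though every number was generated honestly.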

@ghost

ghost commented Nov 10, 2020

Yes. If you look around in the threads, everyone agrees with you. We have moved beyond first digits and tried to look at second digits, but this is also questionable for several reasons.

There are also a few threads that search for other statistical anomalies in the election data.

The README.md file should be updated.

@paras20xx

Some more graphs are generated at https://github.com/paras20xx/benford-law-2020-us-election#milwaukee-wisconsin-sample-size-478 with different bases.

For other bases (3 to 10) there are odd patterns for Milwaukee (Wisconsin) and Chicago (Illinois).

For Washington, the pattern when checked through multiple bases seems fine.

For various other locations, graphs are there, but the sample size is not big enough.

@markr-github

> For Washington, the pattern when checked through multiple bases seems fine.

What do you mean by "seems fine"?

@paras20xx

paras20xx commented Nov 10, 2020

@markr-github

If you go to https://github.com/paras20xx/benford-law-2020-us-election/tree/main/dump/washington/vote-count and see all the graphs, the pattern for Washington seems fine (as in no out-of-ordinary patterns).

For other locations, such as Chicago, Milwaukee and San Francisco, the data shows discrepancies across different bases (expand the graphs at https://github.com/paras20xx/benford-law-2020-us-election).

@ghost

ghost commented Nov 11, 2020

@paras20xx
Please see the first post in this thread for an explanation of the "anomalies": #9

In addition:
#36
#35
#30
#17
#16
#11

@markr-github

@paras20xx
I think you're referring to how Biden's numbers don't follow Benford's Law when you look at similar-sized precincts in areas that he won. You'd need to show this is "out-of-ordinary", when it's what I'd expect from competitive races where most precincts have around 500 votes. See the response by @testes-t.

@markr-github

Here are further UK results, showing that in the 2015 general election the Conservative votes don't follow Benford's Law, and in the 2016 Brexit referendum neither side did. This is just what close election results with similar-sized voting areas look like and is not evidence of fraud by Conservatives or pro-Brexit people.

UK_2015

UK_brexit_2016

@ghost

ghost commented Nov 11, 2020

It's the same story with second-digit analysis, except perhaps in a less pronounced way. Benford's Law should not be expected to be valid whenever the numbers follow a distribution with an expected value above 0 (or, for non-zero values, above 1). For instance, if the expected value is 15 +/- 1, then the second digit will always be 4, 5 or 6. And if the variance is larger, there will still be an overrepresentation of fives.
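
The second-digit point can be made concrete without any sampling. The sketch below assumes a normally distributed value rounded to an integer and restricted to two-digit numbers; the helper names and the standard deviation of 2 are illustrative choices, not something from the thread.

```python
import math

def second_digit(n: int) -> int:
    """Return the second decimal digit of an integer with at least two digits."""
    return int(str(n)[1])

def normal_cdf(x: float, mean: float, sd: float) -> float:
    """Cumulative distribution function of a normal distribution."""
    return 0.5 * (1 + math.erf((x - mean) / (sd * math.sqrt(2))))

def second_digit_probs(mean: float, sd: float, lo: int = 10, hi: int = 99):
    """Second-digit probabilities for a rounded normal value kept within lo..hi."""
    probs = {d: 0.0 for d in range(10)}
    for v in range(lo, hi + 1):
        # Probability that the value rounds to exactly v.
        probs[v % 10] += normal_cdf(v + 0.5, mean, sd) - normal_cdf(v - 0.5, mean, sd)
    total = sum(probs.values())
    return {d: p / total for d, p in probs.items()}

# Values confined to 15 +/- 1 can only have 4, 5 or 6 as their second digit.
narrow_digits = {second_digit(v) for v in (14, 15, 16)}
print(narrow_digits)

# With a wider (assumed) spread, 5 is still the most likely second digit.
probs = second_digit_probs(mean=15, sd=2)
for d in range(10):
    print(f"second digit {d}: {probs[d]:.3f}")
```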

Next, one can ask what sometimes causes the expected value to be above zero, and at other times at zero. That's a funny philosophical question. For the case of elections with two candidates, if N is the number of votes cast, then candidate 1 will get x votes and candidate 2 will get (N-x) votes. There is thus a contingency between the two, which means that the distribution of votes cast for each candidate cannot just take on any form independently of the other candidate.

@charlesmartin14

charlesmartin14 commented Nov 11, 2020

@dlb8685 These are good insights, thanks. I don't think a lot of people here have thought much about election data until about a week ago. These things take time to dig into, and people have full-time jobs and families, so it really takes a group effort.

Please see #31. A deeper dive shows some odd results that could use some insight.

Specifically, why is the Biden Election Day vote distribution in Allegheny County, PA so nearly Gaussian, and why, in particular, in the districts that Trump won?

This case seems of particular interest since there are claims of voter fraud in Allegheny County, PA.

https://www.post-gazette.com/news/politics-state/2020/11/09/presidential-election-2020-Trump-Biden-lawsuit-voter-fraud-Pennsylvania-poll-watchers/stories/202011090110

@markr-github Your insights are also very helpful. Let me ask: is near-Gaussian behavior typical of other elections you have looked at where Benford's Law is violated (and where there is enough data for the tail to be seen)?

> the distribution of votes cast for each candidate cannot just take on any form independently of the other candidate.

@testes-t That's a very good point

@flying-sheep

flying-sheep commented Nov 11, 2020

Gaussian is expected if most vote counts are between 10^x and 10^(x+1), as that makes the first-digit histogram degrade to a histogram of the data itself.

E.g. if we have most vote counts between 100 and 1000, “first digit = 1” basically means “100 ≤ vote count < 200” and so on.

The same applies to other bases; the key is that the variance isn’t high enough.
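
A short sketch of why the first-digit histogram collapses into an ordinary histogram when the counts sit inside one power of the base. The vote counts are made up and `first_digit_in_base` is a helper written for this illustration.

```python
def first_digit_in_base(n: int, base: int = 10) -> int:
    """Leading digit of a positive integer when written in the given base."""
    if n <= 0 or base < 2:
        raise ValueError("need n > 0 and base >= 2")
    while n >= base:
        n //= base
    return n

# Made-up vote counts, all between 100 and 999.
counts = [137, 268, 455, 512, 598, 640, 733, 921]

# Base 10: the first digit is just the index of the 100-wide bin.
base10_matches = all(first_digit_in_base(c) == c // 100 for c in counts)

# Base 5: for counts between 5**3 = 125 and 5**4 - 1 = 624, the first digit
# is the index of the 125-wide bin.
base5_matches = all(first_digit_in_base(c, 5) == c // 125 for c in range(125, 625))

print(base10_matches, base5_matches)
```

In both cases the "digit test" is only a coarse histogram of the raw counts, so a bell-shaped set of counts gives a bell-shaped digit plot.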

@charlesmartin14

charlesmartin14 commented Nov 11, 2020

@flying-sheep Yes, but I'm getting at something else.

To me, it's not the vote counts that matter, it's the size of the districts where the vote counts can occur. When the districts are small, like in voting, we can't always apply Benford's Law. But I suspect we can still say something.

In the case I show, the Trump data is heavy-tailed, both in 2020 and in 2016, indicating that there are voting districts large enough to support a large number of votes for either candidate. Moreover, the 2016 Hillary counts show that there are districts with enough Democrats to have large vote counts (yes, in 2016, the Democratic mean and variance are higher). But what does this mean? For some reason, in 2020, Biden shows a historically unusually small number of high-turnout results in these larger districts.

Why are the Biden Election Day vote counts what they are?

A naive ballot-stuffer might select a random number of ballots to stuff for each district, to make it look natural. But I am saying that natural distributions usually look like a random Pareto distribution, not a random Gaussian distribution. That's why Benford works in other cases. And that's the red flag: not enough large voter turnout for Biden anywhere in the data.
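
The link between a wide, heavy-tailed spread and Benford-like first digits can be illustrated with synthetic data. This sketch uses a log-normal distribution as a stand-in for "heavy-tailed" (a Pareto sample would be another choice); the median of 400 and the list of spreads are assumptions picked for illustration only.

```python
import math
import random
from collections import Counter

def benford_gap(values):
    """Largest absolute gap between observed first-digit shares and Benford's Law."""
    digits = Counter(int(str(v)[0]) for v in values)
    total = sum(digits.values())
    return max(abs(digits.get(d, 0) / total - math.log10(1 + 1 / d))
               for d in range(1, 10))

def lognormal_counts(sigma, n=20000, median=400, seed=0):
    """Hypothetical integer counts from a log-normal with log-space spread sigma."""
    rng = random.Random(seed)
    draws = (round(rng.lognormvariate(math.log(median), sigma)) for _ in range(n))
    return [v for v in draws if v >= 1]   # drop draws that round to zero

gaps = {sigma: benford_gap(lognormal_counts(sigma)) for sigma in (0.2, 0.5, 1.0, 2.0)}
for sigma, gap in gaps.items():
    print(f"sigma={sigma}: max gap from Benford = {gap:.3f}")
```

With a narrow spread the counts stay within one order of magnitude and the gap is large; with a wide spread the first digits move close to Benford's Law. This shows only that spread matters; it does not say which spread real ward data should have.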

Now maybe the option for mail-in balloting changed things so much it looks odd like this. But it did not change the data in, say, Chicago. Biden's Chicago data is non-Benford, but it is heavy-tailed / Pareto.

@charlesmartin14

charlesmartin14 commented Nov 12, 2020

@flying-sheep see #31. The other issue

Why does the Biden Election Day data show a high mean in districts where Trump won?

That is, it seems that the Biden data in the Trump-winning districts should look Benford-like.
Instead, the data looks Gaussian.

@markr-github

@charlesmartin14 I don't get it. Don't you need to show that the numbers should follow Benford's Law before there's any reason to be suspicious about wards in Pittsburgh or Milwaukee returning a reasonable number of Democratic votes?

This is basically saying: there are not many urban districts where Trump blew out Biden with 60-80% of the vote. That sounds like reality.

Here are the Brexit results split by "Leave" and "Remain" voting areas. Both options were pretty competitive in most areas.

UK_brexit_2016_by_winner

@charlesmartin14

charlesmartin14 commented Nov 12, 2020

@markr-github I would say "could follow Benford ..." And when it could, it should.

So we need to test our test to make sure we can apply it.

What I'm saying is that natural data should be heavy-tailed. Benford's Law is one test for that. It's not the only test. And not being Benford is not enough to stop testing.

These are the tests I would take:

test 1: could it be Benford?
In some cases, the districts may be so small that it is impossible for the data to be Benford.
This should be added to the code as a check: is Benford's Law even a reasonable test here?

And this needs to be tested on the size of the voting districts and/or the number of registered voters, not on the observed counts. The whole point is that we suspect the vote counts are fraudulent, so we can't run the test on suspect data.
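
One way such a pre-check might look, as a rough sketch: measure how many orders of magnitude the district sizes (or registered-voter counts) span before running any digit test. The function names, the 5th/95th percentile choice and the two-decade threshold are all assumptions made here, not an established rule and not part of the repo.

```python
import math
import statistics

def log10_spread(sizes, lower_pct=5, upper_pct=95):
    """Decades (powers of ten) spanned between two percentiles of the positive sizes."""
    positive = sorted(s for s in sizes if s > 0)
    # quantiles(n=100) returns 99 cut points; index p - 1 is the p-th percentile.
    cuts = statistics.quantiles(positive, n=100, method="inclusive")
    return math.log10(cuts[upper_pct - 1] / cuts[lower_pct - 1])

def benford_is_plausible(sizes, min_decades=2.0):
    """Heuristic: only consider a first-digit test if sizes span enough decades."""
    return log10_spread(sizes) >= min_decades

# Made-up inputs: similar-sized wards versus sizes spread over five decades.
ward_like = list(range(400, 1101, 7))
wide = [10 ** (k / 20) for k in range(0, 101)]

print(round(log10_spread(ward_like), 2), benford_is_plausible(ward_like))
print(round(log10_spread(wide), 2), benford_is_plausible(wide))
```

The check uses only the district sizes, in line with the point above that the suspect vote counts themselves should not decide whether the test applies.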

(Many of the criticisms of this analysis/repo, which is quite well known now, are valid because the test is being applied in cases where it should not be.)

test 2: could it be heavy-tailed?
So I would then test whether the data could be heavy-tailed, or whether the bins (voting districts) are so small that the data will always look, say, Poisson. That's what I have done here, by comparing Biden's data to both Trump's and Hillary's 2016 data. I could probably do a better job here, but I'd like to flesh out the logic first.

applying the tests: when it could, it should.
Finally, if the data can be Benford, or at least heavy-tailed, I then argue that when it could, it should.
Here, that means the data should be heavy-tailed and/or not perfectly (or nearly) Gaussian.

(Perfect anything is odd in real-world data. In studies of the Russian elections, the data is so perfect that researchers have suggested the Russians gamed it to fool them.)

testing the vote distributions

  • check the tail: Is it heavy (power-law or log-normal), or is it exponential? Again, there has to be enough data in the tail for the tests to make sense. If the data is not heavy-tailed, that may be a red flag.
  • check the bulk: Fit it to a Poisson or Gaussian distribution, and check the quality of the fit. If the fit is unusually good, that may also be a red flag.
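
A very rough starting point for these two checks is to ask which shape describes a set of counts better. The sketch below compares maximum-likelihood Gaussian and log-normal fits by log-likelihood. It is a simplified stand-in, not a full tail analysis or goodness-of-fit test, and it does not judge whether a fit is "unusually good". The sample data is synthetic and all names are illustrative.

```python
import math
import random
import statistics

def normal_loglik(xs):
    """Maximised Gaussian log-likelihood of xs (uses the MLE variance)."""
    n = len(xs)
    var = statistics.pvariance(xs)
    return -0.5 * n * (math.log(2 * math.pi * var) + 1)

def lognormal_loglik(xs):
    """Maximised log-normal log-likelihood of strictly positive xs."""
    logs = [math.log(x) for x in xs]
    # Gaussian fit in log space, plus the change-of-variables term -sum(log x).
    return normal_loglik(logs) - sum(logs)

def preferred_shape(xs):
    """Name of the better-fitting shape by log-likelihood (both have two parameters)."""
    return "log-normal" if lognormal_loglik(xs) > normal_loglik(xs) else "gaussian"

# Synthetic examples: one heavy-tailed sample, one bell-shaped sample.
rng = random.Random(0)
heavy = [rng.lognormvariate(6, 1) for _ in range(5000)]
bell = [max(1.0, rng.gauss(595, 100)) for _ in range(5000)]

print(preferred_shape(heavy), preferred_shape(bell))
```

As the post says, the outcome is at most a "may": a bell-shaped result is also what similar-sized districts produce without any fraud.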

Notice I say may. This needs to be done in a lot of cases, exactly like what you are doing.

final smell test: am I crazy?
Then we need to look at the details and ask: does this result pass a smell test? That is, are our suspicions reasonable, or do they stink?

BTW, here are some motivating references:
