# Big Data, Black Boxes, and Bias
### Dr. David Elliott

1.1. [Introduction](#intro)

1.2. [Big Data and Bias](#bdb)

1.3. [Algorithmic Ethics](#ethics)

1.4. [Black Boxes](#blackbox)

1.5. [Fixing Bias](#solutions)

1.6. [Ethical Algorithms](#ethics_again)

__Notes__
- _"The White House report “Preparing for the Future of Artificial Intelligence” highlights the need for training in both ethics and security: Ethical training for AI practitioners and students is a necessary part of the solution."_<sup>14</sup>
- _"Ideally, every student learning AI, computer science, or data science would be exposed to curriculum and discussion on related ethics and security topics...ethical training should be augmented with technical tools and methods for putting good intentions into practice..."_<sup>14</sup>

__TODO__
- Have a look through what to use from ethics week in intro to data science

# Introduction <a id='intro'></a>

> The Navy revealed the embryo of an electronic computer today that it expects will be able to walk, talk, see, write, reproduce itself and be conscious of its existence.

July 8, 1958, The New York Times

The embryo in question is a perceptron, a simple logical circuit designed to mimic a biological neuron.

It takes a set of numerical values as inputs, and then spits out either a 0 or a 1.

![Image from Python Machine Learning](./Images/perceptron.jpg)
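
As a minimal sketch of this idea (not Rosenblatt's original implementation), a perceptron can be written as a weighted sum of its inputs followed by a threshold. The function name `perceptron` and the hand-picked weights below are illustrative assumptions, chosen so that the unit behaves like a logical AND.

```python
def perceptron(inputs, weights, bias):
    """Return 1 if the weighted sum of the inputs plus the bias is positive, else 0."""
    weighted_sum = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1 if weighted_sum > 0 else 0


# Hand-picked (hypothetical) weights that make the unit act as a logical AND.
and_weights = [1.0, 1.0]
and_bias = -1.5

for a in (0, 1):
    for b in (0, 1):
        print(a, b, "->", perceptron([a, b], and_weights, and_bias))
```

Only the input `1 1` pushes the weighted sum above zero, so only that case outputs a 1. In a trained perceptron the weights and bias are learned from data rather than set by hand.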

__Notes__
> - The inventor of the perceptron, Frank Rosenblatt, was a psychologist by training, with broad interests in astronomy and neurobiology. 
>    - He used a two-million-dollar IBM 704 computer to simulate his first perceptron. 
>    - He also had a knack for selling big ideas and described his work in grandiose terms. 
>    - His machine, he told The New York Times, would think the way that humans do and learn from experience. Someday, he predicted, his perceptrons would be able to recognize faces and translate speech in real time. Perceptrons would be able to assemble other perceptrons, and perhaps even attain consciousness. Someday they could become our eyes and ears in the universe, sent out beyond the bounds of Earth to explore the planets and stars on our behalf.<sup>2</sup>

Connect enough of these perceptrons together in the right ways, and you can build:

- a chess-playing computer, 
- a self-driving car, 
- an algorithm that translates speech. 

Though the computer hardware is vastly more powerful, the basic approach remains similar to how it was a half century ago.

The hype hasn’t diminished<sup>2</sup>.

> ...will make possible a new generation of artificial intelligence [AI] systems that will perform some functions that humans do with ease: see, speak, listen, navigate, manipulate and control.

December 28, 2013, The New York Times

__Notes__
- Hype Machine:
    - Newspapers gush about the latest breakthrough. 
    - AI jobs are paying superstar salaries. 
    - Tech firms are wooing professors with AI expertise away from campus. 
    - Venture capital firms are throwing money at anyone who can say "deep learning" with a straight face.

Advances in AI are great and are spurring a lot of economic activity. However, there are currently unreasonable expectations, which drive<sup>2</sup>:
- irresponsible research in both industry and academia, 
- threats to personal privacy,
- misdirected policy. 
    
> "Policy makers [are] earnestly having meetings to discuss the rights of robots when they should be talking about discrimination in algorithmic decision making." 

Zachary Lipton, AI researcher at Carnegie Mellon University

__Notes__
- _"Researchers and technologists spend far too much time focusing on the sexy what-might-be, and far too little time on the important what-is."_<sup>2</sup>

_"[AI poses a] fundamental risk to the existence of human civilization."_ Elon Musk, 2017

Compared to the human brain, machine learning isn’t especially efficient. 

A machine learning program requires millions or billions of data points to create its statistical models. 

It's only now that those petabytes of data are readily available, along with powerful computers to process them<sup>13</sup>.

__Notes__
- _"There is a vast gulf between AI alarmism in the popular press, and the reality of where AI research actually stands."_<sup>2</sup>
- A child can learn that a hob (stove) is hot by touching it once. She connects the hot metal with her throbbing hand and picks up the word for it: burn.

__Extra: Facebook Inventing Skynet__

_"AI Is Inventing Languages Humans Can’t Understand. Should We Stop It?"_ Fast Company article

> BOB THE BOT: "I can can I I everything else."
>
> ALICE THE BOT: "Balls have zero to me to me to me to me to me to me to me to me to."
>
> BOB: "You I everything else."
>
> ALICE: "Balls have a ball to me to me to me to me to me to me to me to me."

The original Facebook blog post simply described a chatbot evolving the repetition of nonsensical sentences, which was dramatically distorted into a story about saving the human race. 

_"There was no panic,"_ one researcher said, _"and the project hasn’t been shut down."_ 

__Notes__
- _"The story described a Facebook research project gone awry. While trying to build a chatbot that could carry on a convincing conversation, researchers tried having computer algorithms train one another to speak. But the speech that the algorithms developed was nothing like human language. Fast Company reported that the researchers quickly shut down the project. Skynet was officially on its way to self-awareness, but disaster had been averted—or so the story, and many others like it, suggested."_<sup>2</sup>

For many jobs, machine learning proves to be more flexible and nuanced than the traditional programs governed by rules<sup>13</sup>.

Rosenblatt deserves credit because many of his ambitious predictions have come true:

- Facial recognition technology, 
- virtual assistants, 
- machine translation systems, 
- stock-trading bots 

...are all built using perceptron-like algorithms<sup>2</sup>.

Most of the recent breakthroughs in machine learning are due to the masses of data available and the processing power to deal with it, rather than a fundamentally different approach.

__Notes__
- Perceptrons used in today's deep learning models no longer reflect human biology; they are just inspired by it.

# Big Data and Bias <a id='bdb'></a>

We live in an increasingly quantified world, where everything is counted, measured, and analyzed<sup>2</sup>:

- Smartphones count our steps, measure our calls, and trace our movements. 
- "Smart appliances" monitor use and learn about daily routines. 
- Implanted medical devices continuously collect data and predict emergencies. 
- Sensors and cameras across our cities monitor traffic, air quality, and pedestrian identities.

We've also moved from companies paying customers to complete surveys to companies simply recording what we do<sup>2</sup>.

__Notes__

- _"Data collection is a big business. Data is valuable: “the new oil,” as the Economist proclaimed. We’ve known that for some time. But the public provides the data under the assumption that we, the public, benefit from it. We also assume that data is collected and stored responsibly, and those who supply the data won’t be harmed."_<sup>14</sup>

- What do they know<sup>2</sup>?
> - Facebook knows whom we know; Google knows what we want to know. 
> - Uber knows where we want to go; Amazon knows what we want to buy. 
> - Match knows whom we want to marry; Tinder knows whom we want to be swiped by.

- Mathematicians and statisticians use this data to study our desires, movements, and spending power. 

- They predict our trustworthiness and calculate our potential as students, workers, lovers, and criminals. 

This is the _"Big Data economy"_, and it promises spectacular gains. 

Algorithms not only save time and money but are _"fair"_ and _"objective"_.

Numbers and data suggest precision and imply a scientific approach, appearing to have an existence separate from the humans reporting them.

![image info](./Images/Calvin_Hobbes_Data_Quality.gif)

__Notes__

- Models don't involve prejudiced humans, just machines processing cold numbers, right?<sup>13</sup>

- Numbers feel objective, but are easily manipulated. 

- It’s like the old joke<sup>2</sup>:
    > A mathematician, an engineer, and an accountant are applying for a job. They are led to the interview room and given a math quiz. The first problem is a warm-up: What is 2 + 2? The mathematician rolls her eyes, writes the numeral 4, and moves on. The engineer pauses for a moment, then writes “Approximately 4.” The accountant looks around nervously, then gets out of his chair and walks over to the fellow administering the test. “Before I put anything in writing,” he says in a low whisper, “what do you want it to be?”

Algorithms can go wrong and be damaging just due to simple human incompetence or malfeasance.

But the fault could also lie in the: 
- Training Data
- Result Interpretation
- Algorithm Design Principles

In these (worryingly common) cases, it does not matter how expertly and carefully the algorithms are implemented.

__Notes__
- _"Society’s most influential algorithms—from Google search and Facebook’s News Feed to credit scoring and health risk assessment algorithms—are generally developed by highly trained scientists and engineers who are carefully applying well-understood design principles."_<sup>15</sup>

_"No algorithm, no matter how logically sound, can overcome flawed training data."_<sup>2</sup>

Good training data is difficult and expensive to obtain. 

Training data often comes from the real world, but the real world is full of human errors and biases. 

__Notes__
- _"For various reasons, the glamorous side of machine learning research involves developing new algorithms or tweaking old ones. But what is more sorely needed is research on how to select appropriate, representative data. Advances in that domain would pay rich dividends."_<sup>2</sup>

## Sampling Error

As exact counts and exhaustive measurements are nearly always impossible, we take small samples of a larger group and use that information to make broader inferences.

__Example__<sup>2</sup>

_"If one measured only a half dozen men and took their average height, it would be easy to get a misleading estimate simply by chance. Perhaps you sampled a few unusually tall guys. Fortunately, with large samples things tend to average out, and sampling error will have a minimal effect on the outcome."_
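
The effect of sample size can be sketched with a small simulation. The population below is made up (heights in cm drawn from a bell curve), and the function name `spread_of_sample_means` is an illustrative assumption; the point is only to compare how much the sample average wobbles for six people versus six hundred.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical population: heights in cm, roughly bell-shaped.
population = [rng.gauss(175, 7) for _ in range(100_000)]


def spread_of_sample_means(sample_size, repeats=500):
    """Standard deviation of the sample mean across repeated random samples."""
    means = [
        statistics.fmean(rng.sample(population, sample_size))
        for _ in range(repeats)
    ]
    return statistics.stdev(means)


small = spread_of_sample_means(6)
large = spread_of_sample_means(600)
print(f"spread of the mean with n=6:   {small:.2f} cm")
print(f"spread of the mean with n=600: {large:.2f} cm")
```

Because sampling error shrinks roughly with the square root of the sample size, the larger samples should give a much tighter spread of estimates.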

## Measurement Error

This is more of a systematic error caused by our measurement method.

__Example__<sup>2</sup>

_"Researchers might ask subjects to report their own heights, but men commonly exaggerate their heights—and shorter men exaggerate more than taller men."_
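
Unlike sampling error, a systematic error does not average out as the sample grows. The sketch below assumes a made-up reporting rule (`reported_height`, where everyone adds 1 cm and shorter men add more); the exact numbers are hypothetical and only the pattern matters.

```python
import random
import statistics

rng = random.Random(1)  # fixed seed so the sketch is repeatable


def reported_height(true_height):
    """Hypothetical self-report in cm: everyone adds 1 cm, men under 180 cm add more."""
    exaggeration = 1.0 + 0.1 * max(0.0, 180.0 - true_height)  # made-up rule
    return true_height + exaggeration


bias_by_sample_size = {}
for n in (10, 1_000, 100_000):
    true_heights = [rng.gauss(175, 7) for _ in range(n)]
    reported = [reported_height(h) for h in true_heights]
    bias_by_sample_size[n] = statistics.fmean(reported) - statistics.fmean(true_heights)
    print(f"n={n:>7}: self-reports overstate the mean by {bias_by_sample_size[n]:.2f} cm")
```

However many people are asked, the estimate stays too high, because the error is built into how the measurement is taken.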

> The more any quantitative social indicator is used for social decision-making, the more subject it will be to corruption pressures and the more apt it will be to distort and corrupt the social processes it is intended to monitor. 
>
> Donald Campbell

The algorithms that take these measures as input can, in turn, modify behaviour.

__Example__<sup>13</sup>

Standardized test scores can be valuable indicators of general school achievement under normal teaching conditions. 

But when test scores become the _goal_ of teaching, they both lose their value as indicators and distort the educational process.

__Notes__
- ... in other words, when a measure becomes a target, it ceases to be a good measure.
- [INSERT NOTES ON STANDARDISED TESTING]

## Selection Bias

Selection bias arises when sampled individuals differ systematically from the population eligible for your study.

__Example__<sup>2</sup>

_"Suppose you decide to estimate people’s heights by going to the local basketball court and measuring the players. Basketball players are probably taller than average, so your sample will not be representative of the population as a whole, and as a result your estimate of average height will be too high."_
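
The basketball court example can be sketched as a simulation. Everything here is hypothetical: the population of heights is made up, and the "court" is modelled crudely as only containing people taller than 185 cm.

```python
import random
import statistics

rng = random.Random(2)  # fixed seed so the sketch is repeatable

# Hypothetical population: heights in cm.
population = [rng.gauss(175, 7) for _ in range(50_000)]
population_mean = statistics.fmean(population)

# Random sample: every person is equally likely to be measured.
random_sample = rng.sample(population, 500)
random_mean = statistics.fmean(random_sample)

# "Basketball court" sample: a made-up rule where only people over 185 cm are on court.
court = [h for h in population if h > 185]
court_sample = rng.sample(court, 500)
court_mean = statistics.fmean(court_sample)

print(f"population mean:    {population_mean:.1f} cm")
print(f"random sample mean: {random_mean:.1f} cm")
print(f"court sample mean:  {court_mean:.1f} cm")
```

Both samples contain 500 people, so the difference is not about sample size: the court sample is biased because of where it was collected.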

What you see depends on where you look.

__Example__<sup>2</sup>

_"People turn to Google when looking for help, and turn to Facebook to boast."_

![image info](./Images/husband_fb.png)

![image info](./Images/husband_gl.png) 

## Outliers

Outliers can significantly skew data. 

In some data they are naturally part of what you are measuring, but need to be interpreted appropriately and accounted for.

__Example__<sup>3</sup>

When analyzing income in the United States, there are a few extremely wealthy individuals whose income can influence the average income. For this reason, a median value is often a more accurate representation of the larger population.
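
A quick sketch of why the median is more robust here, using ten made-up incomes (nine typical earners and one extreme outlier); the figures are hypothetical and not real income data.

```python
import statistics

# Hypothetical annual incomes in dollars: nine typical earners and one extreme outlier.
incomes = [
    32_000, 38_000, 41_000, 45_000, 48_000,
    52_000, 55_000, 61_000, 70_000, 25_000_000,
]

mean_income = statistics.mean(incomes)
median_income = statistics.median(incomes)

print(f"mean:   {mean_income:,.0f}")
print(f"median: {median_income:,.0f}")
```

The single outlier drags the mean to roughly 2.5 million dollars, a figure that describes nobody in the list, while the median stays at 50,000.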

When mistakes appear in data, even the best-designed algorithms will make the wrong decision.

What happens if you are the outlier, be this a mistake or not?

Statisticians count on large numbers to balance out exceptions and anomalies in data, but that means they punish individuals who happen to be the exception<sup>13</sup>:
    
- Computer-generated terrorism no-fly lists are rife with errors. 
- The Federal Trade Commission reported in 2013 that ten million people had an error on one of their credit reports serious enough to result in higher borrowing costs.

__Todo__
- link this quote to the idea that statisticians say the law of large numbers should average this out... but what happens if you are the exception and the algorithm rules against you? What if you are the one with the mistake!

__Notes__
- _"The insights we can get from this unprecedented access to data can be a great thing: we can get new understanding about how our society works, and improve public health, municipal services, and consumer products. But as individuals, we aren’t just the recipients of the fruits of this data analysis: we are the data, and it is being used to make decisions about us—sometimes very consequential decisions."_<sup>15</sup>

## Cognitive bias
    
There are tonnes of human biases that have been defined and classified by psychologists; each affecting individual decision making.

These include feelings towards a person based on their perceived group membership. 

These biases could seep into machine learning algorithms via either<sup>4</sup>:
- designers unknowingly introducing them to the model
- a training data set which includes those biases

# Algorithmic Ethics <a id='ethics'></a>

_"Machines are not free of human biases; they perpetuate them, depending on the data they’re fed."_<sup>2</sup>

Despite appearing impartial, models reflect goals and ideology<sup>13</sup>.

Our values and desires influence our choices, from the data we choose to collect to the questions we ask<sup>13</sup>.

Whether or not a model works is a matter of opinion. A key component of every model, whether formal or informal, is its definition of success<sup>13</sup>.

__Notes__
- _"When we train machines to make decisions based on data that arise in a biased society, the machines learn and perpetuate those same biases."_<sup>2</sup>

__Todo__
- talk about the work from the researchers fired from Google

As good training data is hard to come by, we often lack data for the behaviors we are most interested in classifying or predicting. Therefore proxies are used instead. 

However, proxies are easier to manipulate than the complicated reality they represent<sup>13</sup>.

__Example__

TEACHING

In many places around the world we now have machine learning models trained on proxies that have direct and real impact on people’s lives.

All this raises questions of privacy and fairness, safety, transparency, accountability, and even morality.<sup>15</sup>

## Machine injustice

We may want to develop a model that can predict whether someone will pay back a loan or handle a job. 

As this is a prediction about something that may happen in the future, we don't know the outcome yet, so we may be tempted to include factors such as a person’s _postcode_ or _language patterns_. 

Even if we do not use "race" as a variable in our models, our society is largely segregated by geography, so postcode is a highly effective proxy for race<sup>13</sup>.
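
How a proxy leaks a protected attribute can be sketched with a toy simulation. All of it is hypothetical: two groups `A` and `B`, two postcodes `N` and `S`, a made-up level of segregation, and a made-up biased lending history. The "model" is deliberately crude (approve a postcode if most past loans there were approved) and never sees the group.

```python
import random

rng = random.Random(3)  # fixed seed so the sketch is repeatable


def make_applicant():
    """One hypothetical past loan applicant: (group, postcode, approved)."""
    group = rng.choice("AB")
    # Made-up segregation: 90% of group A live in postcode "N", 90% of group B in "S".
    home, other = ("N", "S") if group == "A" else ("S", "N")
    postcode = home if rng.random() < 0.9 else other
    # Made-up biased history: past lenders approved group A twice as often as group B.
    approved = rng.random() < (0.8 if group == "A" else 0.4)
    return group, postcode, approved


history = [make_applicant() for _ in range(20_000)]

# A "model" that never sees the group: approve a postcode if most past loans there were approved.
approve_postcode = {}
for postcode in "NS":
    outcomes = [approved for _, pc, approved in history if pc == postcode]
    approve_postcode[postcode] = sum(outcomes) / len(outcomes) > 0.5

# Approval rate the model gives each group, even though group was never an input.
approval_by_group = {}
for group in "AB":
    decisions = [approve_postcode[pc] for g, pc, _ in history if g == group]
    approval_by_group[group] = sum(decisions) / len(decisions)
    print(f"group {group}: approved {approval_by_group[group]:.0%} of the time")
```

Dropping the protected attribute does not remove the bias in this toy setting: the postcode carries it, and the model turns a historical gap in approvals into a much larger one.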

__Notes__
- These features can be discriminatory, and some of them are illegal<sup>13</sup>.

### Examples

__Criminal Sentencing:__ Algorithms identify black defendants as "future" criminals at nearly twice the rate of white defendants, which leads to differences in pretrial release, sentencing, and parole deals<sup>2, ProPublica</sup>.

__Deployment of Police Officers:__ 

__Interest Rates:__ Algorithmic lenders charge higher interest rates to both black and Latino applicants<sup>2, REF</sup>.

__Hiring Software__: Automated hiring software has preferentially selected men over women<sup>2,AMAZON</sup>.

__College (University) Admissions:__<sup>2</sup>

#### Extra

#### Criminal Sentencing

_"racism is the most slovenly of predictive models. It is powered by haphazard data gathering and spurious correlations, reinforced by institutional inequities, and polluted by confirmation bias."_<sup>13</sup>

_"The question, however, is whether we’ve eliminated human bias or simply camouflaged it with technology. The new recidivism models are complicated and mathematical. But embedded within these models are a host of assumptions, some of them prejudicial."_<sup>13</sup>

_"This is the basis of our legal system. We are judged by what we do, not by who we are. And although we don’t know the exact weights that are attached to these parts of the test, any weight above zero is unreasonable."_<sup>13</sup>

_"sentencing models that profile a person by his or her circumstances help to create the environment that justifies their assumptions."_<sup>13</sup>

_"The penal system is teeming with data, especially since convicts enjoy even fewer privacy rights than the rest of us. What’s more, the system is so miserable, overcrowded, inefficient, expensive, and inhumane that it’s crying out for improvements. Who wouldn’t want a cheap solution like this?"_<sup>13</sup>

#### Deployment of Police Officers

If the algorithms were trained on white-collar crimes, they would focus on very different areas of the community.

_[TODO: More on this]_

_"police make choices about where they direct their attention. Today they focus almost exclusively on the poor... And now data scientists are stitching this status quo of the social order into models...we criminalize poverty, believing all the while that our tools are not only scientific but fair."_<sup>13</sup>

#### Interest Rates

_"attempting to reduce human behavior, performance, and potential to algorithms is no easy job."_<sup>13</sup>

#### Hiring Software
_"If you remove names from résumés as a way of eliminating gender discrimination, you may be disappointed, as Amazon was, when the machine continues to preferentially choose men over women. Why? Amazon trained the algorithm on its existing résumés, and there are features on a résumé besides a name that can reveal one’s gender—such as a degree from a women’s college, membership in a women’s professional organization, or a hobby with skewed gender representation."_<sup>2</sup>

#### College (University) Admissions

So these models do still affect the _"rich and middle class"_, although this is generally less common. More often, the privileged are processed by people, and the rest by machines<sup>13</sup>.

### Weapons of Math Destruction (WMDs)<sup>13</sup>

WMDs, as defined by Cathy O'Neil, have three elements: Opacity, Scale, and Damage.

__Opacity__

_"WMDs are, by design, inscrutable black boxes. That makes it extra hard to definitively answer the second question: Does the model work against the subject’s interest? In short, is it unfair? Does it damage or destroy lives?"_<sup>13</sup>

Assumptions of these models are hidden by math, complicated code, or _"proprietary"_ licences, so go untested and unquestioned.

It's hard to question the output, and because it uses math, human victims are held to a high standard of evidence.

__Scale__

_"A formula...might be perfectly innocuous in theory. But if it grows to become a national or global standard, it creates its own distorted and dystopian economy."_<sup>13</sup>

__Damage__

_"They define their own reality and use it to justify their results. This type of model is self-perpetuating, highly destructive—and very common."_<sup>13</sup>

Results from these models are often taken as fact.

They often feed into a vicious cycle, creating a feedback loop that makes the model appear reliable and sustains its use.

__Notes__

- Many models encode human prejudice, misunderstanding, and bias into the software systems. 
- These mathematical models are opaque, their workings invisible to all but mathematicians and computer scientists*. 
- Their verdicts, even when wrong or harmful, are beyond dispute or appeal. 
- They tend to punish the poor and the oppressed in our society, while making the rich richer<sup>13</sup>.
- Algorithms are made to favor efficiency, not fairness, which is hard to quantify.
- _"the real world, with all of its messiness, sits apart. The inclination is to replace people with data trails, turning them into more effective shoppers, voters, or workers to optimize some objective. This is easy to do, and to justify, when success comes back as an anonymous score and when the people affected remain every bit as abstract as the numbers dancing across the screen."_<sup>13</sup>
- _"The move toward the individual, as we’ll see, is embryonic. But already insurers are using data to divide us into smaller tribes, to offer us different products and services at varying prices. Some might call this customized service. The trouble is, it’s not individual. The models place us into groups we cannot see, whose behavior appears to resemble ours. Regardless of the quality of the analysis, its opacity can lead to gouging."_<sup>13</sup>
- _"oceans of behavioral data, in coming years, will feed straight into artificial intelligence systems. And these will remain, to human eyes, black boxes. Throughout this process, we will rarely learn about the tribes we “belong” to or why we belong there. In the era of machine intelligence, most of the variables will remain a mystery. Many of those tribes will mutate hour by hour, even minute by minute, as the systems shuttle people from one group to another. After all, the same person acts very differently at 8 a.m. and 8 p.m. These automatic programs will increasingly determine how we are treated by the other machines, the ones that choose the ads we see, set prices for us, line us up for a dermatologist appointment, or map our routes. They will be highly efficient, seemingly arbitrary, and utterly unaccountable. No one will understand their logic or be able to explain it."_<sup>13</sup>
- It's easy to lose sight of the impact on people who become errors; they are _"collateral damage"_.

*even then they are often deliberately made to be hard/impossible to understand!

# Black Boxes <a id='blackbox'></a>

> "To disarm WMDs, we...need to measure their impact and conduct algorithmic audits. The first step, before digging into the software code, is to carry out research. We’d begin by treating the WMD as a black box that takes in data and spits out conclusions."
>
> Cathy O'Neil

Most often, problems arise either because there are biases in the data, or because there are obvious problems with the output or its interpretation. 

Only occasionally do the technical details of the black box matter for spotting issues.

![image info](./Images/black_box.png)

__Notes__
- Any black box has to take in data and spit results out.
- Of course, the technical details of the black box matter if you want to develop models; here we are talking about spotting issues!
- Here we are focusing on the black box being an ML model, but black boxes can take a variety of forms<sup>2</sup>:
    - _"According to Latour, scientific claims are typically built upon the output of metaphorical “black boxes,” which are difficult if not impossible for the reader to penetrate. These black boxes often involve the use of specialized and often expensive equipment and techniques that are time-consuming and unavailable, or are so broadly accepted that to question them represents a sort of scientific heresy."_

## Training Data

> "On two occasions I have been asked, 'Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out?' … I am not able rightly to apprehend the kind of confusion of ideas that could provoke such a question."
>
> Charles Babbage

As data is so central to these systems, to spot problems we can start by looking at the training data and the labels. 

Begin with bad data and labels, and you’ll get a bad program that makes bad predictions in return. 

GIGO: garbage in, garbage out.

__Check:__ Is the data unbiased, reasonable, and relevant to the problem at hand?<sup>2</sup>

__Notes__
- _"If the data that go into the analysis are flawed, the specific technical details of the analysis don’t matter. One can obtain stupid results from bad data without any statistical trickery. And this is often how bullshit arguments are created, deliberately or otherwise. To catch this sort of bullshit, you don’t have to unpack the black box. All you have to do is think carefully about the data that went into the black box and the results that came out."_<sup>2</sup>

## Outputs/Interpretation

_"extraordinary claims require extraordinary evidence."_<sup>2</sup>

- Do the results pass basic plausibility checks? 
- Do they support whatever conclusions are drawn?

__Logical Checks__<sup>2</sup>
- Reductio Ad Absurdum
- Find Counter Examples
- Deploy a Null Model 

### Reductio Ad Absurdum

Reductio ad absurdum disproves a claim by following its logic to a conclusion that is clearly absurd.

__Example: Momentous sprint at the 2156 Olympics?__<sup>5</sup>

![image info](./Images/sprint.png)

The regression lines are extrapolated (broken blue and red lines for men and women, respectively) and 95% confidence intervals (dotted black lines) based on the available points are superimposed. The projections intersect just before the 2156 Olympics, when the winning women's 100-metre sprint time of 8.079 s will be faster than the men's at 8.098 s.

> Sir—A. J. Tatem and colleagues calculate that women may outsprint men by the middle of the twenty-second century (Nature 431, 525; 2004; doi:10.1038/431525a). They omit to mention, however, that (according to their analysis) a far more interesting race should occur in about 2636, when times of _less than zero seconds_ will be recorded. In the intervening 600 years, the authors may wish to address the obvious challenges raised for both time-keeping and the teaching of basic statistics.

Ken Rice, Biostatistics Professor

__Notes__
- _"It may be true that women will someday outsprint men, but this analysis does not provide a compelling argument. The authors’ conclusions were based on an overly simplistic statistical model. As shown above, the researchers fit a straight line through the times for women, and a separate straight line through the times for men. If you use this model to estimate future times, it predicts that women will outsprint men in the year 2156. In that year, the model predicts that women will finish the hundred-meter race in about 8.08 seconds and men will be shortly behind with times of about 8.10 seconds. Of course, both women and men will continue to break records. However, there is something clearly wrong with the model."_<sup>2</sup>
- _"A model may pass all the formal statistical model-fitting tests. But if it does not account for real biology—in this case, the physical limits to how fast any organism can run—we should be careful about what we conclude."_<sup>2</sup>
- A favourite example of mine of "Reductio Ad Absurdum" from the neuroscience literature is the dead salmon study: [Craig Bennett et al. (2009), Neural correlates of interspecies perspective taking in the post-mortem Atlantic Salmon: An argument for multiple comparisons correction](http://prefrontal.org/files/posters/Bennett-Salmon-2009.pdf)

### Find Counter Examples

_"If someone claims that A implies B, find a case in which A is true but B is not."_<sup>2</sup>

__Example__
- Try to find a more ML-relevant example than in the book

__Extra__<sup>2</sup>

Fermat’s last theorem states that there are no three positive integers $a$, $b$, and $c$ such that $a^n + b^n = c^n$ for any integer $n$ greater than 2. It remained a conjecture for centuries, as attempts to prove it failed until Andrew Wiles succeeded in the 1990s. 

In the eighteenth century, Leonhard Euler generalized it into the sum of powers conjecture: for positive integers $a, b, c, \ldots, z$ and any integer $n$ greater than 1, if you want the numbers $a^n, b^n, c^n, \ldots$ to add up to some other number $z^n$, you need at least $n$ terms in the sum. Again time passed with no way of proving or disproving this, until 1966, when two mathematicians used an early computer to run through a huge list of possibilities and found the counterexample below, in which only four fifth powers sum to a fifth power:

    >>> 27**5 + 84**5 + 110**5 + 133**5 == 144**5
    True

### Deploy a Null Model

_"The point of a null model is not to accurately model the world, but rather to show that a pattern X, which has been interpreted as evidence of a process Y, could actually have arisen without Y occurring at all"._<sup>2</sup>

__Example__<sup>6</sup>

The following is a plot intended to demonstrate how, as we age, our physical and cognitive abilities decline. 

![image info](./Images/senescence.png)

The figure shows the average speed of world record holders in the men’s 100-meter, 1,500-meter, and 10,000-meter race, with the speeds normalized so that the world record corresponds to a pace of 1.0.<sup>2</sup>

__Notes__
- In other words _"A null model helps us understand what we would observe in a very simple system where not much is going on."_
- Remember, we deployed a null model when looking at ML on imbalanced datasets.

_"We might see the same decreasing trend in speed simply as a consequence of sample size, even if runners did not get slower with age."_<sup>2</sup>

More people run competitively in their twenties and thirties than in their seventies and eighties. The more runners you sample from, the faster you expect the __fastest__ time to be.

![image info](./Images/senescence_null.png)

_"This does not mean that senescence is a myth. What it does mean is that the data Carl plotted do not provide compelling evidence of senescence, because the null model shows the same result without senescence."_<sup>2</sup>
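
A null model of this kind can be sketched in a few lines. This is not the simulation used in the book; it is a minimal version under stated assumptions: speeds are in arbitrary units, every age group draws from the same distribution (so age has no effect by construction), and the numbers of runners per age group in `runners_by_age` are made up.

```python
import random

rng = random.Random(4)  # fixed seed so the sketch is repeatable

# Hypothetical numbers of competitive runners per age group (fewer at older ages).
runners_by_age = {25: 50_000, 45: 5_000, 65: 500, 85: 50}


def fastest_speed(num_runners):
    """Fastest of num_runners speeds drawn from ONE age-independent distribution."""
    return max(rng.gauss(1.0, 0.1) for _ in range(num_runners))


def average_record(num_runners, trials=30):
    """Average 'record' speed over several simulated worlds, to smooth out noise."""
    return sum(fastest_speed(num_runners) for _ in range(trials)) / trials


records = {age: average_record(n) for age, n in runners_by_age.items()}
for age, record in records.items():
    print(f"age {age}: record speed {record:.3f} (arbitrary units)")
```

The record speed falls with age in this pretend world even though nobody gets slower, purely because the older groups contain fewer runners to draw an extreme value from.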

__Notes<sup>2</sup>__
- _"If we were looking at average speed, the sample size wouldn’t matter much. Whether we sampled a hundred, a thousand, or a million runners of a given age, we would expect the average time to be about the same. But if we are looking at the extreme outliers, the sample size matters."_
- _"In this case we can use a computer simulation to create a pretend world where age doesn’t affect running speed. Then we see if we still observe the same downward trend in physical performance simply because there are fewer runners of older ages. The graph in the previous page illustrates what we find."_
-  This does not mean that senescence is a myth; it just means that this is not compelling evidence, because the null model shows the same result without senescence.

- Other valid objections include:
    - _"These are world record times set by the world’s best athletes. The performance curve above may not be representative of what happens to the rest of us."_
    - _"...the curve shown does not represent the performance trajectory of any single individual."_
    - _"there may be "cohort effects" operating. Runners who set records in the sixty-five-and-over group trained using different techniques, diets, etc., than runners currently in their twenties. Improved training technology could have an effect on record times as well."_

## Case Study<sup>2,7</sup>

Let's put the ideas into practice on an ML paper.

__Automated Inference on Criminality Using Face Images__<sup>8</sup>

> _Unlike a human examiner/judge, a computer vision algorithm or classifier has absolutely no subjective baggages [sic], having no emotions, no biases whatsoever due to past experience, race, religion, political doctrine, gender, age, etc., no mental fatigue, no preconditioning of a bad sleep or meal. The automated inference on criminality eliminates the variable of meta-accuracy (the competence of the human judge/examiner) all together._<sup>8</sup>

The problem with this study can be identified in the training data and reasoned about using a null model.

__Notes__
- The idea that criminals are betrayed by their physiognomy is not new (e.g. Cesare Lombroso<sup>9</sup>), and it has long since been debunked as racist and unscientific bullshit.
- _"Phrenology was a model that relied on pseudoscientific nonsense to make authoritative pronouncements, and for decades it went untested. Big Data can fall into the same trap. Models...can lock people out, even when the “science” inside them is little more than a bundle of untested assumptions."_<sup>13</sup>

![image info](./Images/criminals.jpg)
![image info](./Images/non-criminals.jpg)

__Training Data:__ The criminal faces used to train the algorithm were seldom smiling, whereas the noncriminal faces were usually smiling.

__Null Model:__ Could we get the same result by training a model that only identifies smiling? Most likely.

__Notes__
- I like this example because it highlights that the same BS psychology studies that were debunked years ago are coming back in parts of the ML literature. Back then they used "Science" to justify their claims; now modern researchers use terms such as "Artificial Intelligence", "Big Data", or "Machine Learning" to justify the same garbage in a new package.
- I suggest you watch the section of the lecture where this example is more thoroughly explored and entertainingly elaborated on: [Calling Bullshit 5.5: Criminal Machine Learning](https://www.youtube.com/watch?v=rga2-d1oi30&list=PLPnZfvKID1Sje5jWxt-4CSZD7bUI4gSPS&index=27)
- Note that this study illustrates the issue I highlighted earlier: access to the models and training data is limited, so we have to focus on the inputs and outputs.
- _"Smiling is not a good indicator of whether someone is a criminal or not, but the machine didn’t know it was supposed to be finding signs of criminality. It was just trying to discriminate between two different sets of faces in the training set. The presence or absence of a smile turned out to be a useful signal, because of the way the training images were chosen."_<sup>2</sup>
- An additional example is [ml_sexual_orientation](https://www.callingbullshit.org/case_studies/case_study_ml_sexual_orientation.html)

## When Models Fail

Sometimes we do need to look at the models to figure out what is wrong with them.

A number of ML algorithms create their own rules to make decisions—and these rules often make little sense to humans<sup>2</sup>.

Sometimes these rules can be fooled surprisingly easily:

__Fooling Deep Nets__

https://arxiv.org/pdf/1412.1897.pdf

Sometimes these rules focus on unintended aspects of the training data.

__Example__<sup>10</sup>

Ribeiro et al. (2016) developed an automated method for distinguishing photographs of huskies from wolves. 

By looking at the errors (e.g. where a husky is classified as a wolf), they demonstrated the importance of looking at what information the algorithm was using.

![image info](./Images/husky_wolf.jpg)

Wolf images tended to be shot in the snow.

![image info](./Images/explain.jpg)

__Notes__
- This model would not generalise very well!
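A toy version of this failure can be sketched in a few lines. Everything here is synthetic and hypothetical: the two binary features (`snow` for a snowy background, `animal` for a genuine but noisy cue), the probabilities, and the one-feature "classifier" are assumptions for illustration, not the method used by Ribeiro et al.

```python
import random

def make_photos(n, snow_matches_label, rng):
    """Synthetic 'photos' with two binary features and a label (1 = wolf, 0 = husky).

    snow_matches_label: probability that the snowy-background flag equals the label.
    The 'animal' feature stands in for a genuine but noisy cue (right 75% of the time).
    """
    photos = []
    for _ in range(n):
        label = rng.randint(0, 1)
        snow = label if rng.random() < snow_matches_label else 1 - label
        animal = label if rng.random() < 0.75 else 1 - label
        photos.append(({"snow": snow, "animal": animal}, label))
    return photos

def accuracy(photos, feature):
    """Accuracy of the rule 'predict wolf whenever this feature is 1'."""
    return sum(x[feature] == y for x, y in photos) / len(photos)

rng = random.Random(0)
train = make_photos(2000, snow_matches_label=0.95, rng=rng)   # wolves mostly shot in snow
deploy = make_photos(2000, snow_matches_label=0.50, rng=rng)  # snow is now uninformative

# "Training": keep whichever single feature best separates the training set.
chosen = max(["snow", "animal"], key=lambda f: accuracy(train, f))

print("feature chosen:", chosen)
print("training accuracy:  ", round(accuracy(train, chosen), 3))
print("deployment accuracy:", round(accuracy(deploy, chosen), 3))
```

The rule that looks best on the training photos is the background, and it drops to roughly coin-flip accuracy once snow stops lining up with the label.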

__Extra Example__

_"John Zech and colleagues at California Pacific Medical Center wanted to investigate how well neural networks could detect pathologies such as pneumonia and cardiomegaly—enlarged heart—using X-ray images. The team found that their algorithms performed relatively well in hospitals where they were trained, but poorly elsewhere....It turned out that the machine was cueing on parts of the images that had nothing to do with the heart or lungs. For example, X-ray images produced by a portable imaging device had the word PORTABLE printed in the upper right corner—and the algorithm learned that this is a good indicator that a patient has pneumonia. Why? Because portable X-ray machines are used for the most severely ill patients, who cannot easily go to the radiology department of the hospital. Using this cue improved prediction in the original hospital. But it was of little practical value. It had little to do with identifying pneumonia, didn’t cue in on anything doctors didn’t already know, and wouldn’t work in a different hospital that used a different type of portable X-ray machine."_<sup>2</sup>

### Overfitting

Complicated models do a great job of fitting the training data, but simpler models often perform better on the test data.

The hard part is figuring out just how simple of a model to use.<sup>2</sup>
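A minimal sketch of the trade-off, assuming synthetic one-dimensional data with deliberately noisy labels and a hand-written k-nearest-neighbours classifier (chosen here only because it needs no libraries). With `k=1` the model is as flexible as it can be and memorises the training set; larger `k` gives a smoother, simpler rule.

```python
import random

def make_data(n, rng, noise=0.25):
    """x is uniform on [0, 1]; the true class is x > 0.5, but each label is flipped with prob. noise."""
    data = []
    for _ in range(n):
        x = rng.random()
        y = int(x > 0.5)
        if rng.random() < noise:
            y = 1 - y
        data.append((x, y))
    return data

def knn_predict(train, x, k):
    """Majority vote among the k training points closest to x (k is odd, so no ties)."""
    nearest = sorted(train, key=lambda point: abs(point[0] - x))[:k]
    votes = sum(y for _, y in nearest)
    return int(votes * 2 > k)

def accuracy(train, data, k):
    """Fraction of points in data whose label matches the k-NN prediction."""
    return sum(knn_predict(train, x, k) == y for x, y in data) / len(data)

rng = random.Random(1)
train, test = make_data(300, rng), make_data(300, rng)

results = {k: (accuracy(train, train, k), accuracy(train, test, k)) for k in (1, 5, 25, 75)}
for k, (train_acc, test_acc) in results.items():
    print(f"k={k:>2}: train accuracy={train_acc:.2f}, test accuracy={test_acc:.2f}")
```

Comparing the two columns for different values of `k` is one simple way to judge how much complexity the data can support.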

#### Case Study

__Detecting Influenza Epidemics Using Search Engine Query Data__<sup>11</sup>

Google Flu Trends (GFT) was a method for predicting flu outbreaks based on Google search queries.

![image info](./Images/F2.large.jpg)

GFT overestimated the prevalence of flu in the 2012–2013 season and overshot the actual level in 2011–2012 by more than 50%. From 21 August 2011 to 1 September 2013, GFT reported overly high flu prevalence 100 out of 108 weeks. 

(Top) Estimates of doctor visits for ILI. 

(Bottom) Error (as a percentage): [(non-CDC estimate) − (CDC estimate)] / (CDC estimate).

__Notes__
- It worked well for a few years but the results started to miss the mark by a factor of two and it was eventually axed.

- _"There was no theory about what search terms constituted relevant predictors of flu, and that left the algorithm highly susceptible to chance correlations in timing."_<sup>2</sup>

- _"When the frequency of search queries changes, the rules that the algorithm learned previously may no longer be effective."_<sup>2</sup>

#### The Curse of Dimensionality<sup>2</sup>

Many complicated algorithms use hundreds of variables when making predictions.

If you add enough variables into your black box, you will eventually find combinations that perform well, but they may do so by chance.

As you increase the number of variables you use to make your predictions, you need exponentially more data to distinguish true predictive capacity from luck. 
- Training data costs time and money.
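The "by chance" part can be illustrated with pure noise. In this sketch (an assumed setup, simplified to single variables rather than combinations) both the variables and the labels are coin flips, so nothing is truly predictive. We pick the variable that best matches the labels on a small training set and then check it on fresh data.

```python
import random

def best_by_chance(n_features, n_train=40, n_test=1000, seed=0):
    """Return (training accuracy, fresh-data accuracy) of the best-looking noise variable."""
    rng = random.Random(seed)
    n_rows = n_train + n_test
    # Coin-flip features and coin-flip labels: no real relationship exists.
    X = [[rng.getrandbits(1) for _ in range(n_features)] for _ in range(n_rows)]
    y = [rng.getrandbits(1) for _ in range(n_rows)]
    X_train, y_train = X[:n_train], y[:n_train]
    X_test, y_test = X[n_train:], y[n_train:]

    def accuracy(rows, labels, j):
        """Accuracy of predicting the label directly from variable j."""
        return sum(row[j] == label for row, label in zip(rows, labels)) / len(labels)

    best = max(range(n_features), key=lambda j: accuracy(X_train, y_train, j))
    return accuracy(X_train, y_train, best), accuracy(X_test, y_test, best)

results = {p: best_by_chance(p) for p in (1, 10, 100, 1000)}
for p, (train_acc, test_acc) in results.items():
    print(f"{p:>4} variables: best training accuracy = {train_acc:.2f}, "
          f"same variable on fresh data = {test_acc:.2f}")
```

With many candidate variables and few training examples, the winner looks impressive in training and falls back to about 50% on new data.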

__Notes__
- For an expanded discussion of this, watch [Calling Bullshit 5.3: Big Data Hubris](https://www.youtube.com/watch?v=X0XqnAqvyIk&list=PLPnZfvKID1Sje5jWxt-4CSZD7bUI4gSPS&index=25).
- _"Google Flu Trends relied on forty-five key search queries that best predicted flu outbreaks."_<sup>2</sup>
    - TODO: Give examples of these correlations
- _"A machine learning system designed to detect cancer might look at a thousand different genes."_<sup>2</sup>
- _"If you have ten thousand genes you want to incorporate into your model, good luck in finding the millions of example patients that you will need to have any chance of making reliable predictions."_<sup>2</sup>
- [Here](https://www.tylervigen.com/spurious-correlations) are some fun _"spurious correlations"_

# Fixing Bias <a id='solutions'></a>

There are a number of ways we can try to mitigate the issues of bias in ML models and pipelines. These include:
- Diversity
- Human-in-the-loop
- Continual Updating
- Transparency
- Accountability
- Leave Information out
- Encode Ethical Principles 

## Diversity

_"A more diverse AI community will be better equipped to __anticipate, spot, and review__ issues of unfair bias and better able to __engage__ communities likely affected by bias."_<sup>16</sup>

A diverse AI community aids in the identification of bias. 

The AI field does not currently reflect society’s diversity, including in gender, race, geography, class, and physical disability.

## Human-in-the-loop

> Today companies like Google, which have grown up in an era of massively abundant data, don't have to settle for wrong models. Indeed, they don't have to settle for models at all...Forget taxonomy, ontology, and psychology. Who knows why people do what they do? The point is they do it, and we can track and measure it with unprecedented fidelity. With enough data, the numbers speak for themselves. Correlation supersedes causation, and science can advance even without coherent models, unified theories, or really any mechanistic explanation at all.
>
> Chris Anderson, Wired 2008<sup>17</sup>

"Big data" and "machine learning" should, in most instances, complement rather than supplant work by humans.

For open-ended tasks involving judgment and discretion, there is still no substitute for human intervention.

__TODO__
- Also put in a bit from "Deep Medicine"

__Notes__

- _"Identifying fake news, detecting sarcasm, creating humor—for now, these are areas in which machines fall short of their human creators. However, reading addresses is relatively simple for a computer. The digit classification problem—figuring out whether a printed digit is a one, a two, a three, etc.—is a classic application of machine learning."_<sup>2</sup>

Humans can be seen as throwbacks in the data economy - inefficient and costly.

Any statistical program has errors, but why not just get humans to work on fine-tuning the algorithms? 

Automatic systems urgently require the context, common sense, and fairness that only humans can provide, especially when faced with error-ridden data<sup>13</sup>.

_"Mathematical models should be our tools, not our masters."_<sup>13</sup>

We need human values integrated into these systems, even at the cost of efficiency.

Big Data processes codify the past; they do not invent the future.

__Notes__

- Machines lack moral imagination; that’s something only humans can provide<sup>13</sup>.
- Human decision making, while often flawed, can evolve. 

## Continual Updating

Human decision making, while often flawed, can evolve. Automated systems stay stuck in time until engineers dive in to change them<sup>13</sup>.

Systems need humans to show them where they have failed.

Trustworthy models maintain a constant back-and-forth with what they are trying to understand or predict. As conditions change, so must the model<sup>13</sup>.

Operational strategies for businesses can include<sup>16</sup>: 
- improving data collection through more cognizant sampling,
- using internal “red teams” or third parties to audit data and models.

## Transparency

> Algorithmic transparency is the principle that people affected by decision-making algorithms should have a right to know why the algorithms are making the choices that they do.<sup>2</sup>

Transparency about processes and metrics helps us understand the steps taken to promote fairness and associated trade-offs<sup>16</sup>.

Auditors face resistance from the web giants:
- Google, for example, has prohibited researchers from creating fake profiles to map the biases of the search engine. If the company does carry out bias audits, they are mostly internal<sup>13</sup>.

Researchers, however, are moving forward with auditing, for example through the "Web Transparency and Accountability Project".

__TODO__
- Link to recent firings

__Notes__
- _"...the Web Transparency and Accountability Project...create software robots that masquerade online as people of all stripes—rich, poor, male, female, or suffering from mental health issues. By studying the treatment these robots receive, the academics can detect biases in automated systems from search engines to job placement sites."_<sup>13</sup>

## Accountability

> Algorithmic accountability is the principle that firms or agencies using algorithms to make decisions are still responsible for those decisions, especially decisions that involve humans. We cannot let people excuse unjust or harmful actions by saying “It wasn’t our decision; it was the algorithm that did that.”<sup>2</sup>
    
Without accountability, platform companies, app developers, and government agencies that don’t care about privacy or fairness have an incentive to ignore transparency.

__The European Union’s General Data Protection Regulation__
- Any data collected must be approved by the user, as an opt-in. 
- It prohibits the reuse of data for other purposes.

This is an attempt to enforce still-vague social values such as “accountability” and “interpretability” on algorithmic behavior.

The tech industry itself is starting to develop self-regulatory initiatives of various types, such as the Partnership on AI to Benefit People and Society<sup>15</sup>.

__Notes__
- The “not reusable” clause is very strong: it makes it illegal to sell user data. 
- The data brokers in Europe are much more restricted, assuming they follow the law<sup>13</sup>
- Some may say that individual companies do not reap the rewards from more fairness and justice<sup>13</sup>. However, there is pressure not just from regulators but also from consumers over anti-social algorithmic behaviour<sup>15</sup>.
    - Apple, for example, benefits from presenting itself as a "white hat" when it compares its practices with those of other companies such as Facebook.

## Leave Information out

Are we willing to sacrifice a bit of efficiency and accuracy in the interest of fairness? Should we handicap the models?

- _"In some cases, yes. If we’re going to be equal before the law, or be treated equally as voters, we cannot stand for systems that drop us into different castes and treat us differently."_<sup>13</sup>

__Example__<sup>13</sup>

_FICO scores_
- Devised in 1989 for credit scoring.
- The formula only looked at a borrower’s finances (mostly debt load and bill-paying record).

Good Qualities
- FICO and the credit agencies can tweak those models to make them more accurate.
- They are regulated.
- The scores are relatively transparent.

_e-scores_
- Access data on web browsing, purchasing patterns, location of the visitor’s computer, and real estate data, for insights about the potential customer’s wealth.

Bad Qualities
- Arbitrary, unaccountable, unregulated, often unfair.

When you include an attribute such as “zip code,” you express the opinion that the history of human behavior in that patch of real estate should determine, at least in part, what kind of loan a person who lives there should get.

__Notes__
- A mathematician named Earl Isaac and an engineer, Bill Fair, devised the FICO model for credit scoring.
- _"The score was color blind. And it turned out to be great for the banking industry, because it predicted risk far more accurately while opening the door to millions of new customers"_
- _"Much of the predatory advertising we’ve been discussing, including the ads for payday loans and for-profit colleges, is generated through such e-scores. They’re stand-ins for credit scores. But since companies are legally prohibited from using credit scores for marketing purposes, they make do with this sloppy substitute."_<sup>13</sup>
- _"Fair and Isaac’s great advance was to ditch the proxies in favor of the relevant financial data, like past behavior with respect to paying bills. They focused their analysis on the individual in question—and not on other people with similar attributes. E-scores, by contrast, march us back in time. They analyze the individual through a veritable blizzard of proxies. In a few milliseconds, they carry out thousands of “people like you” calculations. And if enough of these “similar” people turn out to be deadbeats or, worse, criminals, that individual will be treated accordingly."_<sup>13</sup>
- Credit card companies such as Capital One carry out similar rapid-fire calculations as soon as someone shows up on their website.
- There’s a very high chance that the e-scoring system will give the borrower from the rough area a low score.
- _"In other words, the modelers for e-scores have to make do with trying to answer the question “How have people like you behaved in the past?” when ideally they would ask, “How have you behaved in the past?”"_<sup>13</sup>
- _"The practice of using credit scores in hirings and promotions creates a dangerous poverty cycle. After all, if you can’t get a job because of your credit record, that record will likely get worse, making it even harder to land work."_<sup>13</sup>
- _"framing debt as a moral issue is a mistake. Plenty of hardworking and trustworthy people lose jobs every day as companies fail, cut costs, or move jobs offshore. These numbers climb during recessions."_<sup>13</sup>
- _"a sterling credit rating is not just a proxy for responsibility and smart decisions. It is also a proxy for wealth. And wealth is highly correlated with race."_<sup>13</sup>

## Encode Ethical Principles 

_"Programmers don’t know how to code for [fairness], and few of their bosses ask them to."_<sup>13</sup>

We can try to _"...encode ethical principles directly into the design of the algorithms."_<sup>15</sup>

Typically there is a focus on algorithmic trade-offs on performance metrics such as computational speed, memory requirements, and accuracy, but there is emerging research into how "privacy" and "fairness" can be included as metrics when considering algorithms.

This line of research is sometimes summarised as the FATE (fairness, accuracy, transparency, and ethics) of algorithm design<sup>15</sup>.

Definitions of fairness, privacy, transparency, interpretability, and morality remain in the human domain, and turning them into quantitative definitions requires a multidisciplinary team working together.

These new goals can be used as constraints on learning, but they have associated costs:

_"If the most accurate model for predicting loan repayment is racially biased, then, by definition, eradicating that bias results in a less accurate model."_<sup>15</sup>
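The cost of a fairness constraint can be sketched on synthetic data. The assumptions are all mine: made-up applicants in two groups `A` and `B`, a score that doubles as the repayment probability, and *demographic parity* (approval rates within 0.02 of each other) as the fairness definition, which is only one of several competing definitions. The constrained search looks at a subset of the options the unconstrained search sees, so it can never come out more accurate.

```python
import random
from itertools import product

def make_applicants(n, rng):
    """Synthetic applicants: (group, score, repaid). Group B's scores are shifted down."""
    people = []
    for _ in range(n):
        group = rng.choice("AB")
        score = min(1.0, max(0.0, rng.gauss(0.6 if group == "A" else 0.45, 0.15)))
        repaid = rng.random() < score   # higher score -> more likely to repay
        people.append((group, score, repaid))
    return people

def evaluate(people, thresholds):
    """Return (accuracy, approval-rate gap) for per-group score thresholds."""
    correct = 0
    approved = {"A": 0, "B": 0}
    total = {"A": 0, "B": 0}
    for group, score, repaid in people:
        approve = score >= thresholds[group]
        correct += approve == repaid
        approved[group] += approve
        total[group] += 1
    rates = {g: approved[g] / total[g] for g in total}
    return correct / len(people), abs(rates["A"] - rates["B"])

rng = random.Random(0)
people = make_applicants(2000, rng)
grid = [i / 20 for i in range(21)]   # candidate thresholds 0.00, 0.05, ..., 1.00

candidates = []
for a, b in product(grid, grid):
    thresholds = {"A": a, "B": b}
    acc, gap = evaluate(people, thresholds)
    candidates.append((acc, gap, thresholds))

best_any = max(candidates, key=lambda c: c[0])                             # accuracy only
best_fair = max((c for c in candidates if c[1] <= 0.02), key=lambda c: c[0])  # parity enforced

print("most accurate:      accuracy=%.3f, approval gap=%.3f, thresholds=%s" % best_any)
print("with parity (0.02): accuracy=%.3f, approval gap=%.3f, thresholds=%s" % best_fair)
```

How much accuracy is given up, and whether that is acceptable, depends on the data and on which definition of fairness the team agrees on.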

# Ethical Algorithms <a id='ethics_again'></a>

_"In many fields, ethics is an essential part of professional education. This isn’t true in computer science, data science, artificial intelligence, or any related field."_<sup>14</sup>

_"Ethics really isn’t about agreeing to a set of principles. It’s about changing the way you act."_<sup>14</sup>

_"Signing a data oath, or agreeing to a code of conduct, does little if you don’t live and breathe ethics."_<sup>14</sup>

_"It’s also important to realize that ethics isn’t about a fixed list of do’s and don’ts. It’s primarily about having a discussion about how what you’re doing will affect other people, and whether those effects are acceptable."_<sup>14</sup>

_"The ACM’s code of ethics, which dates back to 1993, and is currently being updated, is clear, concise, and surprisingly forward-thinking; 25 years later, it’s a great start for anyone thinking about ethics. The American Statistical Association has a good set of ethical guidelines for working with data. So, we’re not working in a vacuum."_<sup>14</sup>

_"For data scientists, whether you’re doing classical data analysis or leading-edge AI, that’s a big challenge. We need to understand how to build the software systems that implement fairness."_<sup>14</sup>

_"It’s easy to say that applications shouldn’t collect data about race, gender, disabilities, or other protected classes. But if you don’t gather that data, you will have trouble testing whether your applications are fair to minorities. Machine learning has proven to be very good at figuring its own proxies for race and other classes. Your application wouldn’t be the first system that was unfair despite the best intentions of its developers. Do you keep the data you need to test for fairness in a separate database, with separate access controls?"_<sup>14</sup>

_"We particularly need to think about the unintended consequences of our use of data."_<sup>14</sup>

_"Moving fast and breaking things is unacceptable if we don’t think about the things we are likely to break."_<sup>14</sup>

_"The discussion has helped software developers and data scientists to understand that their work isn’t value-neutral, that their work has real impact, both good and bad, on real people."_<sup>14</sup>

_"The UK Government’s Data Ethics Framework and Data Ethics Workbook is one approach. They isolate seven principles, and link to detailed discussions of each."_<sup>14</sup>

_"Here’s a checklist for people who are working on data projects:_
- _❏ Have we listed how this technology can be attacked or abused?_
- _❏ Have we tested our training data to ensure it is fair and representative?_
- _❏ Have we studied and understood possible sources of bias in our data?_
- _❏ Does our team reflect diversity of opinions, backgrounds, and kinds of thought?_
- _❏ What kind of user consent do we need to collect to use the data?_
- _❏ Do we have a mechanism for gathering consent from users?_
- _❏ Have we explained clearly what users are consenting to?_
- _❏ Do we have a mechanism for redress if people are harmed by the results?_
- _❏ Can we shut down this software in production if it is behaving badly?_
- _❏ Have we tested for fairness with respect to different user groups?_
- _❏ Have we tested for disparate error rates among different user groups?_
- _❏ Do we test and monitor for model drift to ensure our software remains fair over time?_
- _❏ Do we have a plan to protect and secure user data?"_<sup>14</sup>

_"expertise is essential, such as designing better laws or policies, proposing how to improve social agencies to reduce unfairness in the first place, or opining on whether and how to stem labor displacement resulting from technology."_<sup>15</sup>

## Data Collection/Handling

__Todo__
- This should be short but provide links; expand on this in next year's lectures

_"point: “treat others’ data as you would have others treat your own data.” However, implementing a golden rule in the actual research and development process is challenging..."_<sup>14</sup>

_"Most Twitter users know that their public tweets are, in fact, public; but many don’t understand that their tweets can be collected and used for research, or even that they are for sale."_<sup>14</sup>

_"Even philanthropic approaches can have unintended and harmful consequences. When, in 2006, AOL released anonymized search data to researchers, it proved possible to “de-anonymize” the data and identify specific users. In 2018, Strava opened up their data to allow users to discover new places to run or bike. Strava didn’t realize that members of the US military were using GPS-enabled wearables, and their activity exposed the locations of bases and patrol routes in Iraq and Afghanistan."_<sup>14</sup>

_"Collecting data that may seem innocuous and combining it with other data sets has real-world implications. Combining data sets frequently gives results that are much more powerful and dangerous than anything you might get from either data set on its own."_<sup>14</sup>

_"It’s easy to argue that Strava shouldn’t have produced this product, or that AOL shouldn’t have released their search data, but that ignores the data’s potential for good. In both cases, well-intentioned data scientists were looking to help others. The problem is that they didn’t think through the consequences and the potential risks."_<sup>14</sup>

_"Many data sets that could provide tremendous benefits remain locked up on servers. Medical data that is fragmented across multiple institutions limits the pace of research...opening up that data to researchers requires careful planning."_<sup>14</sup>

_"if we could consolidate medical data from patients around the world, we could make some significant progress on treating diseases like cancer."_<sup>14</sup>

_"It has been difficult for medical research to reap the fruits of large-scale data science because the relevant data is often highly sensitive individual patient records, which cannot be freely shared."_<sup>15</sup>

_"although birthdate, sex, and zip code could not be used individually to identify particular individuals, in combination they could."_<sup>15</sup>

_"“anonymized data isn’t”—either it isn’t really anonymous or so much of it has been removed that it is no longer data."_<sup>15</sup>

## k-anonymity

_"called k-anonymity, is to redact information from individual records so that no set of characteristics matches just a single data record...The goal of k-anonymity is to make it hard to link insensitive attributes to sensitive attributes. Informally, a released set of records is k-anonymous if any combination of insensitive attributes appearing in the database matches at least k individuals in the released data. There are two main ways to redact information in a table to make it k-anonymous: we can suppress information entirely (that is, not include it at all in the released data), or we can coarsen it (not release the information as precisely as we know it, but instead bucket it)."_<sup>15</sup>
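The informal definition can be turned into a small check. This is a sketch on made-up records: the field names, the values, and the particular coarsening rule (age in decades, first three zip-code digits) are illustrative assumptions. The function reports the largest k for which the table is k-anonymous, i.e. the size of the smallest group of records sharing the same insensitive attributes.

```python
from collections import Counter

def k_anonymity(records, insensitive):
    """Size of the smallest group of records sharing the same insensitive attribute values."""
    groups = Counter(tuple(r[field] for field in insensitive) for r in records)
    return min(groups.values())

def coarsen(record):
    """Bucket age into decades and keep only the first three zip-code digits."""
    decade = record["age"] // 10 * 10
    return {**record, "age": f"{decade}-{decade + 9}", "zip": record["zip"][:3] + "**"}

# Made-up records: 'diagnosis' is the sensitive attribute.
records = [
    {"age": 34, "zip": "98105", "sex": "F", "diagnosis": "flu"},
    {"age": 37, "zip": "98103", "sex": "F", "diagnosis": "asthma"},
    {"age": 52, "zip": "98115", "sex": "M", "diagnosis": "diabetes"},
    {"age": 58, "zip": "98118", "sex": "M", "diagnosis": "flu"},
]
insensitive = ["age", "zip", "sex"]

k_raw = k_anonymity(records, insensitive)
k_coarse = k_anonymity([coarsen(r) for r in records], insensitive)
print(k_raw)     # → 1: every record is unique, so anyone can be singled out
print(k_coarse)  # → 2: after coarsening, each combination matches two people
```

Suppression would work the same way: dropping a field from `insensitive` (and from the released table) can only make the groups larger.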

## Differential privacy

_"Differential privacy is a mathematical formalization of the foregoing idea—that we should be comparing what someone might learn from an analysis if any particular person’s data was included in the dataset with what someone might learn if it was not."_<sup>15</sup>

_"The use of randomness in differential privacy is for yet another purpose–namely, to deliberately add noise to computations, in a way that promises that any one person’s data cannot be reverse-engineered from the results. Differential privacy requires that adding or removing the data record of a single individual not change the probability of any outcome by “much”"_<sup>15</sup>
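One common way to make "not change the probability of any outcome by much" precise is the $\varepsilon$-differential privacy condition. The notation is introduced for this note and is not taken from the quoted text: $M$ is the randomised analysis, $D$ and $D'$ are two datasets that differ in one person's record, $S$ is any set of possible outcomes, and $\varepsilon$ is the privacy parameter.

$$
\Pr[M(D) \in S] \;\le\; e^{\varepsilon} \, \Pr[M(D') \in S] \quad \text{for all such } D, D' \text{ and all } S.
$$

A smaller $\varepsilon$ forces the two probabilities closer together, which means stronger privacy but typically noisier results.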

_"since we know the process by which errors have been introduced, we can work backward to deduce approximately the fraction of the population for whom the truthful answer is yes."_<sup>15</sup>

_"...the error introduced in our survey by the randomness that was added for privacy shrinks to zero as we include more and more people in our survey. This is just an instance of what is known as the “law of large numbers” in statistics. Although this protocol is simple, the result is remarkable: we are able to learn what we wanted without incidentally collecting any strongly incriminating information about any single individual in the population... The randomized polling protocol we have just described is old—it is known as randomized response, and it dates to 1965..."_<sup>15</sup>
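Randomized response is easy to simulate. The sketch below assumes the common two-coin version of the protocol (the quoted text does not spell out the details): each person flips a coin, answers truthfully on heads, and on tails answers "yes" or "no" according to a second coin flip. The true fraction of 0.30 and the survey sizes are made-up values.

```python
import random

def randomized_response(truth, rng):
    """First coin: heads -> answer truthfully; tails -> answer with a second coin flip."""
    if rng.random() < 0.5:
        return truth
    return rng.random() < 0.5

def estimate_true_fraction(responses):
    """Invert P(yes) = 0.5 * p + 0.25 to recover p from the noisy answers."""
    observed_yes = sum(responses) / len(responses)
    return 2 * (observed_yes - 0.25)

rng = random.Random(0)
true_fraction = 0.30   # hypothetical share of people whose truthful answer is "yes"
estimates = {}
for n in (100, 10_000, 200_000):
    truths = [rng.random() < true_fraction for _ in range(n)]
    responses = [randomized_response(t, rng) for t in truths]
    estimates[n] = estimate_true_fraction(responses)
    print(f"n={n:>7}: estimated fraction = {estimates[n]:.3f} (true value {true_fraction})")
```

Any individual "yes" is deniable, because it may have come from the second coin, yet the population-level estimate tightens as the survey grows.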

_"centralized versus local differential privacy is whether the privacy is added on the “server” side (centralized) or the “client” side (local)."_<sup>15</sup>

## Some Hope

_"The challenge for data scientists is to understand the ecosystems they are wading into and to present not just the problems but also their possible solutions."_<sup>13</sup>

_"Sometimes the job of a data scientist is to know when you don’t know enough. As I survey the data economy, I see loads of emerging mathematical models that might be used for good and an equal number that have the potential to be great—if they’re not abused. Consider the work of Mira Bernstein, a slavery sleuth. A Harvard PhD in math, she created a model to scan vast industrial supply chains, like the ones that put together cell phones, sneakers, or SUVs, to find signs of forced labor. She built her slavery model for a nonprofit company called Made in a Free World. Its goal is to use the model to help companies root out the slave-built components in their products...Like many responsible models, the slavery detector does not overreach. It merely points to suspicious places and leaves the last part of the hunt to human beings."_<sup>13</sup>

_"Another model for the common good has emerged in the field of social work. It’s a predictive model that pinpoints households where children are most likely to suffer abuse. The model, developed by Eckerd, a child and family services nonprofit in the southeastern United States, launched in 2013 in Florida’s Hillsborough County, an area encompassing Tampa...It funnels resources to families at risk. "_<sup>13</sup>

_"Technologically, the same artificial intelligence techniques used to detect fake news can be used to get around detectors, leading to an arms race of production and detection that the detectors are unlikely to win."_<sup>2</sup>

# What next?

- If you want more deep learning, learn the Keras API for TensorFlow 2.0, or PyTorch.
- Once you're happy with `scikit-learn`, you may want to look at some related Python projects: https://scikit-learn.org/stable/related_projects.html#related-projects

# Recommended Lectures

__("Guest" Lectures in the age of COVID)__
- [Calling Bullshit 5.1: Big Data](https://www.youtube.com/watch?v=FLKzmswqF7E&list=PLPnZfvKID1Sje5jWxt-4CSZD7bUI4gSPS&index=23)
- [Calling Bullshit 5.2: Garbage In, Garbage Out](https://www.youtube.com/watch?v=pcmUdXIJQ74&list=PLPnZfvKID1Sje5jWxt-4CSZD7bUI4gSPS&index=24)
- [Calling Bullshit 5.3: Big Data Hubris](https://www.youtube.com/watch?v=X0XqnAqvyIk&list=PLPnZfvKID1Sje5jWxt-4CSZD7bUI4gSPS&index=25)
- [Calling Bullshit 5.4: Overfitting](https://www.youtube.com/watch?v=pDyB_ufVyIw&list=PLPnZfvKID1Sje5jWxt-4CSZD7bUI4gSPS&index=26)
- [Calling Bullshit 5.5: Criminal Machine Learning](https://www.youtube.com/watch?v=rga2-d1oi30&list=PLPnZfvKID1Sje5jWxt-4CSZD7bUI4gSPS&index=27)
- [Calling Bullshit 5.6: Algorithmic Ethics](https://www.youtube.com/watch?v=4u6HGaXx90A&list=PLPnZfvKID1Sje5jWxt-4CSZD7bUI4gSPS&index=28)


# Recommended Readings

__Reading__<sup>1</sup>

- danah boyd and Kate Crawford (2011) Six Provocations for Big Data. A Decade in Internet Time: Symposium on the Dynamics of the Internet and Society.
- David Lazer et al. (2014) The Parable of Google Flu: Traps in Big Data Analysis. Science 343:1203-1205
- Aylin Caliskan et al. (2017) Semantics derived automatically from language corpora contain human-like biases. Science 356:183-186
- Jevin West (2014) How to improve the use of metrics: learn from game theory. Nature 465:871-872

__Supplementary reading__<sup>1</sup>

- West, Jevin D.; Bergstrom, Carl T.. Calling Bullshit (p. 41). Penguin Books Ltd.
- Cathy O'Neil (2016) Weapons of Math Destruction Crown Press.
- Peter Lawrence (2014) The mismeasurement of science. Current Biology 17:R583-585

# References
1. https://www.callingbullshit.org/syllabus.html#Big
2. West, Jevin D.; Bergstrom, Carl T.. Calling Bullshit, Penguin Books Ltd.
3. https://mailchimp.com/resources/data-bias-causes-effects/
4. https://research.aimultiple.com/ai-bias/
5. https://www.nature.com/articles/431525a
6. Bergstrom, Carl T., and Lee Alan Dugatkin. Evolution. 2nd edition. New York: W. W. Norton and Co., 2012, 2016.
7. https://www.callingbullshit.org/case_studies/case_study_criminal_machine_learning.html
8. Wu, X., & Zhang, X. (2016). Automated inference on criminality using face images. arXiv preprint arXiv:1611.04135, 4038-4052.
9. Lombroso, Cesare. L’Uomo Delinquente. 1876.
10. Ribeiro, M. T., S. Singh, and C. Guestrin. “‘Why Should I Trust You?’ Explaining the Predictions of any Classifier.” Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, August 2016.
11. Ginsberg, J., et al. “Detecting Influenza Epidemics Using Search Engine Query Data.” Nature 457 (2009): 1012–14.
12. Lazer, D., Kennedy, R., King, G., & Vespignani, A. (2014). The parable of Google Flu: traps in big data analysis. Science, 343(6176), 1203-1205.
13. O'Neil, Cathy. Weapons of Math Destruction (p. 2). Penguin Books Ltd.
14. Loukides, Mike; Mason, Hilary; Patil, DJ. Ethics and Data Science, O'Reilly Media.
15. Kearns, Michael; Roth, Aaron. The Ethical Algorithm. Oxford University Press.
16. https://www.mckinsey.com/featured-insights/artificial-intelligence/tackling-bias-in-artificial-intelligence-and-in-humans#
17. https://www.wired.com/2008/06/pb-theory/

