::: {.callout-note collapse="true"}
## Learning Outcomes
* Learn about the ethical dilemmas that data scientists face.
* Know how to critique models using contextual knowledge about data.
:::

## The Problem

<a href="https://apps.chicagotribune.com/news/watchdog/cook-county-property-tax-divide/assessments.html">
<img src = "images/vis_1.png"></img></a>

<a href="https://apps.chicagotribune.com/news/watchdog/cook-county-property-tax-divide/assessments.html">
<img src = "images/vis_2.png"></img></a>


<a href="https://www.clccrul.org/bpnc-v-berrios-facts?rq=berrios">
<img src = "images/vis_3.jpg"></img>
</a>

### Spotlight: Appeals

“Appeals are a good thing,” Thomas Jaconetty, deputy assessor for valuation and appeals, said in an interview. “The goal here is fairness. We made the numbers. We can change them.”

Fairness as equal access: in principle, “anyone can appeal,” but in practice that’s not really the case.

This is part of a deeper institutional pattern of potential corruption.


<a href = "https://apps.chicagotribune.com/news/watchdog/cook-county-property-tax-divide/appeals.html"> <img src = "images/vis_4.png"></img>
</a>

<a href = "https://apps.chicagotribune.com/news/watchdog/cook-county-property-tax-divide/appeals.html"> <img src = "images/vis_5.png"></img>
</a>

### Human Impacts


<a href = "https://apps.chicagotribune.com/news/watchdog/cook-county-property-tax-divide/assessments.html"> <img src = "images/vis_6.png"></img>
</a>

### Spotlight: Intersection of Real Estate and Race

- Housing has been a key motor of racial inequality in modern U.S. history
- Segregation and credit-market racism
- Redlining: making it difficult or impossible to get a federally-backed mortgage to buy a house in specific neighborhoods coded as “risky” (red). 
- What made them “risky” according to the makers of these maps? Their racial composition.



<a href = "https://dsl.richmond.edu/panorama/redlining/#loc=11/41.84/-87.674"><img src = "images/vis_7.png"></img></a>




- Segregation was not only a result of federal policy; it was also developed by real estate professionals.
- The real estate industry “professionalized” in the 1920s and 1930s by aspiring to become a science guided by strict methods and principles.

- These methods centered on creating objective rating systems (information technologies) for the appraisal of property values, which encoded **race** as a factor of valuation (see figure below).
    - This, in turn, influenced federal policy and practice

<img src = "images/vis_8.png"></img>
Source: Colin Koopman, How We Became Our Data (2019) p. 137



## The Response

### Example: Cook County Open Data Initiative

The mandate of the Cook County Assessor’s Office (CCAO) under the new Assessor, Fritz Kaegi:

- Distributional equity in property taxation: properties of the same value are treated alike during assessments.
- Creates a new Office of Data Science.

<img src = "images/vis_9.png"></img>


### Incorporating Knowledge into the Data Science Life Cycle

#### Question/Problem Formulation

- What do we want to know?
- What problems are we trying to solve?
- What are the hypotheses we want to test?
- What are our metrics for success?

There are many different goals that we as data scientists have in mind when working with data. The goal that we have been focusing on primarily throughout this course until now has been accuracy. In the context of the Housing dataset:

1. Accurately, uniformly, and impartially assess the value of a home.

    - Following international standards (coefficient of dispersion)
    - Predicting value of all homes with as little total error as possible.
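
The coefficient of dispersion (COD) named above is commonly computed from assessment ratios (assessed value divided by sale price): it is the average absolute deviation of the ratios from the median ratio, expressed as a percentage of that median. Lower values indicate more uniform assessments. The sketch below is a minimal illustration; the function name and the numbers are made up and do not come from the CCAO data.

```python
from statistics import mean, median

def coefficient_of_dispersion(assessed, sale_prices):
    """COD: mean absolute deviation of the assessment ratios from the
    median ratio, expressed as a percentage of the median ratio."""
    ratios = [a / s for a, s in zip(assessed, sale_prices)]
    med = median(ratios)
    return 100 * mean(abs(r - med) for r in ratios) / med

# Made-up assessed values and sale prices, for illustration only.
assessed = [90_000, 100_000, 110_000, 100_000]
sale_prices = [100_000, 100_000, 100_000, 100_000]

cod = coefficient_of_dispersion(assessed, sale_prices)
print(f"COD: {cod:.1f}")
```

Here the ratios are 0.9, 1.0, 1.1, and 1.0, so the typical ratio misses the median by about 5 percent of the median.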

However, we will pivot to a slightly different goal. 


2. Create a system that assesses house values that is fair to all people, across perceived racial and income differences.

    - Disrupts the circuit of corruption (Board of Review appeals process)
    - Eliminates regressivity
    - Engenders trust in the system among all stakeholders 
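
Regressivity here means that lower-priced homes are assessed at a higher share of their value than higher-priced homes. One common summary statistic is the price-related differential (PRD): the mean assessment ratio divided by the sale-weighted mean ratio. Values noticeably above 1 suggest regressivity. The sketch below uses made-up numbers and a hypothetical function name; it is an illustration, not the CCAO's reporting code.

```python
from statistics import mean

def price_related_differential(assessed, sale_prices):
    """PRD: mean assessment ratio divided by the sale-weighted mean ratio."""
    ratios = [a / s for a, s in zip(assessed, sale_prices)]
    weighted_mean_ratio = sum(assessed) / sum(sale_prices)
    return mean(ratios) / weighted_mean_ratio

# Made-up example: the cheaper home is over-assessed (ratio 1.2) and the
# more expensive home is under-assessed (ratio 0.8).
assessed = [120_000, 400_000]
sale_prices = [100_000, 500_000]

prd = price_related_differential(assessed, sale_prices)
print(f"PRD: {prd:.3f}")
```

Because the expensive home dominates the sale-weighted ratio, the PRD for this toy example comes out above 1, which is the pattern a regressive system produces.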


:::{.callout-definitions collapse="true"}

**Definitions**: Fairness and Transparency

Fairness: The ability of our pipeline to accurately assess property values, accounting for disparities in geography, information, etc. 

Transparency: The ability of the data science department to share and explain pipeline results and decisions to both internal and external stakeholders.

:::



#### Data Acquisition and Cleaning

- What data do we have and what data do we need?
- How will we sample more data?
- Is our data representative of the population we want to study?

Example: Sales data

<img src = "images/vis_10.png"></img>

Rather than taking the data at face value, we can investigate the data further. How was this data collected? When? By whom? For what purposes? How and why were particular categories created?

We can incorporate our domain expertise during this stage by asking questions specific to the values of the features themselves. For the above dataset, a data scientist who is knowledgeable about racial inequities in housing appraisals can ask:  

- Are these attributes differentially reported? If so, how can this arise? 
- How are “improvements” (i.e. renovations) tracked and updated?
- Which data is missing, and for which neighborhoods or populations is it missing? 
- What other data sources might be valuable for addressing the above questions?

#### Exploratory Data Analysis

- How is our data organized and what does it contain?
- Do we already have relevant data?
- What are the biases, anomalies, or other issues with the data?
- How do we transform the data to enable effective analysis?

In order for our claims to hold weight, we must show evidence. Some example questions and directions for investigation are:  

1. Which attributes are most predictive of sales price?
    - Measure correlations 
2. Is the data uniformly distributed? 
    - Plot proportions of data coming from each neighborhood.
3. Do all neighborhoods have up-to-date data? Do all neighborhoods have the same granularity?
    - Plot histograms of date values for each neighborhood and compare them.
4. Do some neighborhoods have missing or outdated data? 
    - Count proportions of missing values in each neighborhood.
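
The first two directions above can be sketched in a few lines. The records, field names (`neighborhood`, `sqft`, `sale_price`), and values below are hypothetical, and the example uses plain Python so that it runs without extra libraries; in practice a data frame library would be the usual tool.

```python
from collections import Counter
from math import sqrt

# Hypothetical records; field names and values are made up for illustration.
homes = [
    {"neighborhood": "A", "sqft": 1000, "sale_price": 200_000},
    {"neighborhood": "A", "sqft": 1500, "sale_price": 290_000},
    {"neighborhood": "A", "sqft": 2000, "sale_price": 410_000},
    {"neighborhood": "B", "sqft": 1200, "sale_price": 150_000},
]

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length numeric lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Direction 1: how strongly does one attribute track sale price?
r = pearson_r([h["sqft"] for h in homes], [h["sale_price"] for h in homes])

# Direction 2: what share of the records comes from each neighborhood?
counts = Counter(h["neighborhood"] for h in homes)
shares = {name: count / len(homes) for name, count in counts.items()}

print(f"correlation(sqft, sale_price) = {r:.2f}")
print(shares)
```

In this toy data, three of the four records come from neighborhood A, so any pattern learned from the data mostly reflects that neighborhood.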

The CCAO noticed that data from low-income neighborhoods was disproportionately spotty. This told them that they needed to develop new data collection practices, including finding new sources of data.
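
A check like the one described here can be sketched by computing the share of missing values per neighborhood. Everything in the example is hypothetical: the records, the `last_sale_year` field, and the helper name are made up for illustration and are not drawn from the CCAO's data.

```python
from collections import defaultdict

# Hypothetical records; None marks a missing value.
records = [
    {"neighborhood": "A", "last_sale_year": 2019},
    {"neighborhood": "A", "last_sale_year": 2021},
    {"neighborhood": "B", "last_sale_year": None},
    {"neighborhood": "B", "last_sale_year": 2005},
]

def missing_share_by_group(rows, group_key, field):
    """Fraction of rows in each group whose `field` is missing (None)."""
    totals = defaultdict(int)
    missing = defaultdict(int)
    for row in rows:
        group = row[group_key]
        totals[group] += 1
        if row.get(field) is None:
            missing[group] += 1
    return {group: missing[group] / totals[group] for group in totals}

missing_share = missing_share_by_group(records, "neighborhood", "last_sale_year")
print(missing_share)
```

A large gap between groups, as between A and B in this toy data, is a signal to ask how the data was collected before trusting any model fit to it.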

#### Prediction and Inference

- What does the data say about the world?
- Does it answer our questions or accurately solve the problem?
- How robust are our conclusions and can we trust the predictions? 

Rather than using a single model to predict sale prices (“fair market value”) of unsold properties, the CCAO fits machine learning models that discover patterns using known sale prices and characteristics of **similar and nearby properties**. It uses a different model for each township.

Compared to traditional mass appraisal, the CCAO’s approach is more granular and more sensitive to neighborhood variations. 
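
The idea of fitting one model per township can be illustrated with a deliberately simple stand-in. The sketch below fits a separate one-variable least-squares line for each township. This is an assumption made for illustration only: the township names, the `sqft` feature, and the prices are made up, and the CCAO's actual models are not shown here.

```python
from collections import defaultdict

# Hypothetical sales; townships, features, and prices are made up.
sales = [
    {"township": "North", "sqft": 1000, "sale_price": 300_000},
    {"township": "North", "sqft": 2000, "sale_price": 500_000},
    {"township": "South", "sqft": 1000, "sale_price": 150_000},
    {"township": "South", "sqft": 2000, "sale_price": 250_000},
]

def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return my - slope * mx, slope

# Group the sales by township, then fit one model per group.
by_township = defaultdict(list)
for sale in sales:
    by_township[sale["township"]].append(sale)

models = {
    name: fit_line([s["sqft"] for s in rows], [s["sale_price"] for s in rows])
    for name, rows in by_township.items()
}

def predict(township, sqft):
    """Predict a sale price using the model fit for that township."""
    intercept, slope = models[township]
    return intercept + slope * sqft

north_estimate = predict("North", 1500)
south_estimate = predict("South", 1500)
print(north_estimate, south_estimate)
```

The same 1,500 square foot home gets a different estimate in each township because each model only sees sales from its own area, which is the sense in which a per-township approach is more sensitive to local variation.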


#### Reports, Decisions, and Conclusions

- How successful is the system for each goal?
    - Accuracy/uniformity of the model.
    - Fairness and transparency that eliminate regressivity and engender trust.
- How do you know? 




## Key Takeaways

1. Accuracy is a necessary, but not sufficient, condition of a fair system.

2. Fairness and transparency are context-dependent and sociotechnical concepts.

3. Learn to work with contexts, and consider how your data analysis will reshape them.

4. Keep in mind the power, and limits, of data analysis.








## Lessons for Data Science Practice

1. Question/Problem formulation

    - Who is responsible for framing the problem?
    - Who are the stakeholders? How are they involved in the problem framing?
    - What do you bring to the table? How does your positionality affect your understanding of the problem?
    - What are the narratives that you're tapping into? 

2. Data Acquisition and Cleaning

    - Where does the data come from?
    - Who collected it? For what purpose?
    - What kinds of collecting and recording systems and techniques were used? 
    - How has this data been used in the past?
    - What restrictions are there on access to the data, and what enables you to have access?

3. Exploratory Data Analysis & Visualization

    - What kind of personal or group identities have become salient in this data? 
    - Which variables became salient, and what kinds of relationship obtain between them? 
    - Do any of the relationships made visible lend themselves to arguments that might be potentially harmful to a particular community?

4. Prediction and Inference

    - What does the prediction or inference do in the world?
    - Are the results useful for the intended purposes?
    - Are there benchmarks to compare the results?
    - How are your predictions and inferences dependent upon the larger system in which your model works?

5. Reports, Decisions, and Solutions

    - How do we know if we have accomplished our goals?
    - How does your work fit in the broader literature? 
    - Where does your work agree or disagree with the status quo?
    - Do your conclusions make sense?
