# Data 80A/180A Data Science for Everyone

# Homework 4: Data Ethics 

###  51 Points + 5 Extra Credit Points

## Due Friday, September 24 by 11:59PM

**Reading**: 
* [Chapter 6: Tables](https://www.inferentialthinking.com/chapters/06/Tables.html)
* [Chapter 7: Visualization](https://www.inferentialthinking.com/chapters/07/Visualization.html)


**Helpful Resource**:

* [Python Reference](http://data8.org/fa21/python-reference.html): Cheat sheet of helpful array & table methods used in Data 80A/180A!

In Lab 4, we explored the user base of social media giant Facebook and its revenue streams, most of which come from advertising. Have you thought about how social media platforms like Facebook target their users? Or what roles social media platforms and tech companies play when handling your personal information? What have they done, and are you OK with it?

In this homework, we will explore some data ethics issues, including:

* Social media platforms, online shopping influence, data sharing, and data monopoly

* Revenues of top US Tech-related companies and data privacy

* Data breach (data security)

* Data privacy and targeted ads


In [None]:
import pandas as pd
from datascience import *
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
import otter
grader = otter.Notebook()

## 1. Social Media Platforms, Online Shopping Influence, Data Sharing, and Data Monopoly

Let's take a quick look at a [2017 survey](https://data.world/ahalps/social-influence-on-shopping) of 2,676 millennials with data on social platforms and their influence on online shopping. It sheds some light on which social platforms have the most influence on online shopping.

In [None]:
# read in the survey results
shopping = Table.read_table('WhatsgoodlyData-6.csv')
shopping

**Question 1.1. (2 pts)** Let's focus on rows where `segment_type` is *Mobile* only.  Assign `shopping` to the resulting table.

*Hint:* Table 1 displays the expected output.

In [None]:
# shopping = shopping.take(0,1,2,3,4)
shopping = shopping.where('segment_type', are.containing('Mobile'))  # ...
shopping

Table 1

<img src="Q1.1_table.PNG"> 

**Question 1.2. (4 pts)** Note that the `percentage` column is expressed as a fraction, not a percentage.  Convert that column to percentages; also relabel the `answer` column to `platform`.

*Hint:* Table 2 displays the expected output.

In [None]:
shopping = shopping.relabeled('answer', 'platform') # ...
percentages = shopping.column('percentage')*100  # ...
shopping = shopping.drop('percentage').with_column('percentage', percentages) # ...
shopping

Table 2

<img src="Q1.2_table.PNG"> 

**Question 1.3. (2 pts)**  Generate a bar graph showing the percentage of users, in sorted order, influenced by the various platforms.

*Hint:* Graph 1 shows the expected output.

In [None]:
shopping.sort('percentage', descending=True).barh('platform', 'percentage')  # ...

Graph 1

<img src="Q1.3_graph.PNG"> 

In Lab 4, we saw that WhatsApp was the single most downloaded app overall in 2020. How would Facebook monetize WhatsApp? Some authors have suggested that Facebook will offer users and businesses around the world a [payment](https://observer.com/2020/01/whatsapp-ads-facebook-monetization-payment-platform/) system.

Recently (in 2021), WhatsApp updated its privacy policy, requiring users either to agree to share their data with Facebook (the parent company) in order to keep using the service or to have their accounts [deleted](https://www.forbes.com/sites/carlypage/2021/01/08/whatsapp-tells-users-share-your-data-with-facebook-or-well-deactivate-your-account/?sh=1dfae9b92d46). Later, WhatsApp claimed it was not giving all users' [data](https://www.theverge.com/2021/1/12/22226792/whatsapp-privacy-policy-response-signal-telegram-controversy-clarification) to Facebook.

**Question 1.4 (12 pts)** There is a thin line between data privacy and data ethics.

(a) Should Facebook let you decide which parts of your data you would like to share?  
(b) Is it OK for Facebook to force you to share your data or have your account deleted (since it is a free service)?  
(c) Facebook (and other companies) knows a lot about you.  How do you feel about that?

Please share your thoughts in the following cell.

Your answer ...


## 2. Revenues of Top US Tech-related Companies and Data Privacy

 
The article [How Big Tech Makes Their Billions](https://www.visualcapitalist.com/how-big-tech-makes-their-billions-2020/) breaks down the revenue streams of the top US tech companies -- Apple, Google, Amazon, Microsoft, and Facebook -- in 2019. It is an interesting read.  Here we look at the dataset of revenues for these tech companies.

In [None]:
# run this cell
tech_revenue_table = Table.read_table('BigTechRevenue.csv')
tech_revenue_table

**Question 2.1 (2 pts)**  Generate a bar graph (overlaid) showing the revenues of the tech companies.

*Hint:* Graph 2 shows the expected output.

In [None]:
tech_revenue_table.barh('company') # ...

Graph 2

<img src="Q2.1_graph.PNG"> 

**Question 2.2 (2 pts)**  Next, generate non-overlaid bar graphs showing the revenues of the tech companies.

*Hint:* Graph 3 shows the expected output.

In [None]:
# you only need a single line of code -- set parameter overlay = False
tech_revenue_table.barh('company', overlay = False)      # ...

Graph 3

<img src="Q2.2_graph.PNG"> 

The following is a mobility trend report graph that Apple released to the public. The data is published daily and reflects requests for directions from Apple Maps users.

![MobilityTrendReportApple](MobilityTrendReportApple.png)
[Apple Mobility Trends Reports](https://covid19.apple.com/mobility)

Is it OK for Apple to share your data in aggregate form?  The internet is full of articles criticizing tech companies, such as [this one](https://www.theatlantic.com/technology/archive/2019/01/apples-hypocritical-defense-data-privacy/581680/), which accuses Apple of enabling surveillance that supposedly offends its own values on data privacy. Remember, Apple, Google, Facebook, and many more companies have access to users' data. How they use it and what they use it for are questions that we should ask ourselves.

For example, according to [The Wall Street Journal](https://www.wsj.com/articles/amazon-scooped-up-data-from-its-own-sellers-to-launch-competing-products-11587650015), Amazon has been using data collected from buyers and sellers to launch its own products, which contributed to its revenue growth year after year. However, this practice has also led governments to bring [antitrust](https://www.washingtonpost.com/world/amazon-antitrust-charges-europe-eu-vestager/2020/11/10/88df3fca-233c-11eb-9c4a-0dc6242c4814_story.html) charges against Amazon, alleging "Amazon uses the vast pool of information it gathers from its marketplace platform to identify popular products being sold by outside vendors on its website, then offers similar products itself, sometimes at lower prices."

Interested in knowing more about what [Alphabet (Google)](https://www.avast.com/c-how-google-uses-your-data#:~:text=The%20simple%20answer%20is%20yes,%2C%20online%20purchases%2C%20and%20more) and [Microsoft](https://www.fastcompany.com/90290137/how-microsoft-has-avoided-tough-scrutiny-over-privacy-issues) do with your data? Feel free to check out the links.

**Question 2.3 (5 pts)**  How do you feel about sharing data with companies so they can make a profit from your data?

Your answer ...

## 3. Data Breach

Have you heard or read about the [Cambridge Analytica scandal](https://www.businessinsider.com/cambridge-analytica-a-guide-to-the-trump-linked-data-firm-that-harvested-50-million-facebook-profiles-2018-3)? The consulting firm Cambridge Analytica harvested data from over 87 million Facebook users through an external app to build psychological profiles of users. The data was later used in political campaigns to help target ads. Facebook was fined $5 billion by the Federal Trade Commission for this data breach. Beyond the Facebook-Cambridge Analytica breach, there have been hundreds of other [data breaches](https://data.world/bentonporter/online-data-breaches) that people may not even be aware of. In the following dataset, we will analyze the types of data that were leaked in these breaches.

In [None]:
# Just run this cell
data_table = Table.read_table('breaches_090316.csv')
data_table

Note that the types of data that were leaked are listed in columns `1`, `2`, and `3`, as well as in later columns, though those have not been cleaned yet.  We shall focus on columns `1`, `2`, and `3` for now.

**Question 3.1 (2 points)**  How many data breaches are contained in this data set?

In [None]:
num_of_breaches = data_table.num_rows
num_of_breaches

In [None]:
grader.check("q31")

**Question 3.2 (2 points)**  How many times does the word **Email** occur in column `1`?

In [None]:
num_of_Email = data_table.where('1', are.containing('Email')).num_rows # ...
num_of_Email

In [None]:
grader.check("q32")

The cell below contains code that reads through columns `1`, `2`, and `3` of the dataset and counts the number of times (the frequency) various types of leaked data are mentioned.  It then generates a bar graph showing the most prominent types of leaked data and their frequency.
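
The counting step described above can also be sketched with Python's standard `collections.Counter`. This is only an illustration under stated assumptions, not the method the homework cell uses: the function name `count_words` and the `sample` entries are hypothetical and are not taken from `breaches_090316.csv`.

```python
import re
from collections import Counter

def count_words(entries):
    """Count lowercase words across text entries, skipping non-string values."""
    counts = Counter()
    for entry in entries:
        # pandas reads missing cells as float NaN, so only strings are counted
        if not isinstance(entry, str):
            continue
        # keep letters and spaces only, then lower-case the text
        cleaned = re.sub(r'[^a-zA-Z ]+', '', entry).lower()
        counts.update(cleaned.split())
    return counts

# Hypothetical sample entries (assumption: made up for illustration)
sample = ["Email addresses", "Passwords", float("nan"),
          "Email addresses", "IP addresses"]
word_counts = count_words(sample)
print(word_counts["email"], word_counts["passwords"])
```

Because each entry is split on its own, the last word of one entry can never run into the first word of the next.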

In [None]:
# Just run this entire section

import re
df = pd.read_csv('breaches_090316.csv')

# pull columns '1', '2', and '3' out of the pandas DataFrame (each one is a pandas Series)
data1_array = df['1']
data2_array = df['2']
data3_array = df['3']

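# function to join the text entries of a column into one long string
def make_string(data_array):
    string = ""
    for word in data_array:
        # missing entries are read by pandas as float NaN, so skip them
        if not isinstance(word, float):
            # add a space so the last word of one entry does not fuse with the first word of the next
            string += word + " "
    return string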

# make one long string out of each column
data1_string = make_string(data1_array)
data2_string = make_string(data2_array)
data3_string = make_string(data3_array)
# get the patterns we want to filter from the string
pattern = re.compile(r'[^a-zA-Z ]+')
# make the entire string lower case and filter the patterns
data1_string = re.sub(pattern, '', data1_string).lower()
data2_string = re.sub(pattern, '', data2_string).lower()
data3_string = re.sub(pattern, '', data3_string).lower()
# split words in data1_string
list_data1 = data1_string.split()
list_data2 = data2_string.split()
list_data3 = data3_string.split()

list_combined = list_data1 + list_data2 + list_data3
# have a frequency count array
freq = []
# have a unique_list for unique words found
unique_list = []

# go through list_combined and, the first time each word appears,
# record it and count how often it occurs in the whole list
for word in list_combined:
    if word not in unique_list:
        unique_list.append(word)
        num = list_combined.count(word)
        freq.append(num)

# create a python dictionary using freq and unique_list
dictionary = dict(zip(unique_list, freq))
# words for the types of leaked data we want to plot
wanted = ['email', 'names', 'usernames', 'passwords', 'birth', 'ip']
# create lists for the matching keys and their counts
values = []
keys = []
# store the counts for the wanted words, in dictionary order
for key in dictionary:
    if key in wanted:
        values.append(dictionary.get(key))
        keys.append(key)
 
breach_table = Table().with_columns('data breach', keys, 'count', values)
breach_table.barh('data breach', 'count')

## 4. Exploring Data Ethics -- Data Privacy and Targeted Ads
 

**4.1 Facebook Fourth Quarter and Full Year 2020 Financial Highlights**

The table below shows Facebook's finances for 2020.

<img src="fbR.png"/>

In [None]:
# Run this cell.  Pie chart, where the slices will be ordered and plotted counter-clockwise:
labels = 'advertising', 'other'
sizes = [27187,885]
explode = (0, 0.1, ) 

fig1, ax1 = plt.subplots()
ax1.pie(sizes, explode=explode, labels=labels, autopct='%1.1f%%',
        shadow=True, startangle=90)
ax1.axis('equal')  # Equal aspect ratio ensures that pie is drawn as a circle.

plt.show()
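
The percentages that `autopct` prints on the pie chart can be checked by hand. The following is a minimal sketch that reuses the two figures from the `sizes` list above; the `revenue` dictionary and the variable names are illustrative assumptions, not part of the homework.

```python
# The two revenue figures used in the pie chart cell
revenue = {'advertising': 27187, 'other': 885}

# Each slice's share is its amount divided by the total, times 100
total = sum(revenue.values())
shares = {name: 100 * amount / total for name, amount in revenue.items()}

for name, share in shares.items():
    print(f"{name}: {share:.1f}%")
# prints:
# advertising: 96.8%
# other: 3.2%
```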

The pie chart shows that 96.8% of Facebook's revenue comes from advertising. Now let's take a look at how Facebook runs its advertising business. On its Facebook for Business webpage, Facebook offers an *audience targeting advertisement service* to its clients. Here is the original statement:

"Facebook will automatically show your ads to people who are most likely to find your ads relevant. You can further target your ad delivery with three audience selection tools...Define an audience based on criteria like age, interests, geography and more."


**Question 4.1 (6 pts)** Do you think it is ethical to target advertising by defining an audience based on criteria like age, interests, and geography? Why?

Your answer ...

**4.2 Department of Housing and Urban Development Sues Facebook Over Housing Discrimination**

On August 13, 2018, the Assistant Secretary for Fair Housing and Equal Opportunity (“Assistant Secretary”) filed a timely complaint with the Department of Housing and Urban Development (“HUD” or the “Department”) alleging that Facebook, Inc. violated subsections 804(a), 804(b), 804(c) and 804(f) of the Fair Housing Act, 42 U.S.C. §§ 3601-19 (“Act”), by discriminating because of race, color, religion, sex, familial status, national origin and disability. Here is part of the summary:

It is unlawful to make, print, or publish, or cause to be made, printed, or published, any
notice, statement, or advertisement with respect to the sale or rental of a dwelling that indicates
any preference, limitation, or discrimination based on race, color, religion, sex, familial status,
national origin or disability, or that indicates an intention to make such a distinction. 42 U.S.C.
§ 3604(c); 24 C.F.R. § 100.75(a), (b), (c)(1). Such unlawful activity includes “[s]electing media
or locations for advertising the sale or rental of dwellings which deny a particular segment of the
housing market information about housing opportunities because of race, color, religion, sex,
handicap, familial status, or national origin.” [source](https://www.hud.gov/sites/dfiles/Main/documents/HUD_v_Facebook.pdf) 

Let's take a look at what you can set regarding *ad targeting*. [source](https://www.documentcloud.org/documents/3191165-Facebook-Propublica-Ad.html)

<img src="ads1.png"/>

**Question 4.2 (6 pts)** On Facebook's ad targeting page shown above, which protected groups are being excluded?  Is Facebook's platform creating an environment for exclusion in our society?  What are your thoughts?

Your answer ...

Below is another ad that was posted on Facebook.

Left: A Facebook ad for *Hiring CDL Drivers*.

Right: Facebook’s chart indicating the percentages of men and women the ad was shown to.  [Source](https://www.propublica.org/article/facebook-ads-can-still-discriminate-against-women-and-older-workers-despite-a-civil-rights-settlement)

<img src="ads2.png"/>

**Question 4.3 (6 pts)** By the time the ad stopped running ten days later, it had been shown to more than 20,000 people. Among the people aged 25-34 who were shown the ad, 87% were men. Did women miss out on this job opportunity?  What do you think about the ability to set targets for ads on Facebook's platform?

Your answer ...

**4.3 Privacy Violations Using Microtargeted Ads: A Case Study** by Aleksandra Korolova at Stanford University

Please look through the following paper: [Privacy Violations Using Microtargeted Ads](https://theory.stanford.edu/~korolova/Privacy_violations_using_microtargeted_ads.pdf).

The paper summary states: 
"We propose a new class of attacks that breach user privacy by exploiting advertising systems offering microtargeting capabilities. We study the advertising system of the largest online social network, Facebook, and the risks that the design of the system poses to the privacy of its users. We propose, describe and provide experimental evidence of several novel approaches to exploiting the advertising system in order to obtain private user information.

The work illustrates how a real-world system designed with an intention to protect privacy but without rigorous privacy guarantees can leak private information, and motivates the need for further research on the design of microtargeted advertising systems with provable privacy guarantees. Furthermore, it shows that user privacy may be breached not only as a result of data publishing using improper anonymization techniques, but also as a result of internal data-mining of that data."

**Question 4.4 (Extra Credit -- 5 pts)** Facebook states that "We are committed to honoring your privacy choices and protecting your information." [source](https://about.fb.com/actions/protecting-privacy-and-security/?utm_source=ads&utm_medium=GS&utm_content=489243000410&utm_campaign=2021Q1) However, the Stanford paper has shown that private information can leak through the Facebook platform. Moreover, Facebook allows its ad clients to exclude certain groups of users based on the private information that it has collected from users. Is Facebook's action consistent with or contradictory to its statement?

Your answer ...

## Questions to ponder ...

* What platforms have the most influence on your shopping experience?

* Is your data (all of it that you can think of) stored in a safe place?

* Do you know whether your data has ever been compromised?

* Are companies using your data responsibly and ethically?


#### Resources used in this homework:

* [Social platform influence in online shopping](https://data.world/ahalps/social-influence-on-shopping)

* [Online data breach](https://data.world/bentonporter/online-data-breaches)
    
* Want to know about [11 of the Worst Data Breaches in Media](https://auth0.com/blog/11-of-the-worst-data-breaches-in-media/)? 
   
* [Facebook Ad Library Report](https://www.facebook.com/ads/library/report/)

* [Aggregate Advertising Expenditure in the US Economy: What's Up? Is It Real?](https://hbswk.hbs.edu/item/aggregate-advertising-expenditure-in-the-us-economy)

* [Facebook Advertising Targeting](https://www.facebook.com/business/ads/ad-targeting)

* [Facebook Ads Can Still Discriminate Against Women and Older Workers, Despite a Civil Rights Settlement](https://www.propublica.org/article/facebook-ads-can-still-discriminate-against-women-and-older-workers-despite-a-civil-rights-settlement)

* [Privacy Violations Using Microtargeted Ads: A Case Study](https://theory.stanford.edu/~korolova/Privacy_violations_using_microtargeted_ads.pdf)





**You've completed Homework 4!**

Please save your notebook, download a PDF version of the notebook, and submit it to Canvas.