![Callysto.ca Banner](https://github.com/callysto/curriculum-notebooks/blob/master/callysto-notebook-banner-top.jpg?raw=true)

<a href="https://hub.callysto.ca/jupyter/hub/user-redirect/git-pull?repo=https%3A%2F%2Fgithub.com%2Fcallysto%2Fpresentations&branch=master&subPath=health-data-privacy.ipynb&depth=1" target="_parent"><img src="https://raw.githubusercontent.com/callysto/curriculum-notebooks/master/open-in-callysto-button.svg?sanitize=true" width="123" height="24" alt="Open in Callysto"/> </a>

# Data Privacy

As our world becomes increasingly digitized and interconnected, our use of and reliance on data has also increased. The types of devices being connected to the internet get [stranger and stranger every day](https://www.metrikus.io/blog/10-weirdest-iot-enabled-devices-of-all-time), and the value of that interconnectivity is sometimes difficult to see. This is a fairly recent trend, made possible by advances in computing, internet, and data storage technology, all of which are innovating at breakneck speed. Data collection has become such an integrated part of our lives that it's easy to forget just how much information is collected about us, and who has access to it. This is especially true for health data, which is typically closely guarded and can cause serious harm if it falls into the wrong hands.

In this notebook, and the accompanying activity, we'll look at ways that data impacts your life, from its collection to its applications. We'll have a special focus on the role of health data, with some information about what you can do to make sure that your data isn't used in a way that negatively impacts your life. Hopefully you'll leave with an appreciation of how seemingly irrelevant data can be used to paint a picture of who you are, and how that information can be used for both bad and good.

# Positive Uses of Data
### Historical Health Data
Although recent advances have greatly increased both the variety and the quantity of data available, data has long been valued in public health. The COVID-19 pandemic introduced the general public to metrics and terms that were once solely the domain of researchers in the field of [epidemiology](https://en.wikipedia.org/wiki/Epidemiology). The response to COVID-19 has been largely data-driven, but there are historical examples of health data being used to control a disease outbreak long before data was as commonplace as it is today.

A classic example of the role of health data is in [determining the source of a cholera outbreak in London in 1854](https://www.rcseng.ac.uk/library-and-publications/library/blog/mapping-disease-john-snow-and-cholera/). Cholera is an [incredibly nasty bacterial disease](https://www.mayoclinic.org/diseases-conditions/cholera/symptoms-causes/syc-20355287) that affects the digestive tract, and though it's rare today in developed nations, it still results in the [deaths of tens of thousands of people](https://www.who.int/news-room/fact-sheets/detail/cholera) in developing countries across the world *each year*. 

At the time, London was absorbing large numbers of new residents in a short period, and the sewage system couldn't handle all the waste, especially in one area of the city. As a result, sewage contaminated the drinking water supply, which is now known to be the primary route of cholera infection in humans. This led to the third cholera outbreak the city had seen in 20 years, and at its peak over [600 people were dying each week from the disease](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7150208/).

A physician named [John Snow](https://en.wikipedia.org/wiki/John_Snow) rejected the leading theory of the time, that cholera spread through the air, and insisted that it spread through water. To illustrate his point, he recorded the locations of the homes of those who had died from cholera, and added that data to a map of the area that also showed the locations of the water pumps supplying drinking water to the nearby homes:

#### John Snow's original map overlaid with cholera deaths (bubble size is number of deaths; blue taps indicate location of water pumps)

![John Snow's original map overlaid with cholera deaths (bubble size is number of deaths)](https://blog.rtwilson.com/wp-content/uploads/2012/01/SnowMap_Points.png)
<p>
<b>https://blog.rtwilson.com/wp-content/uploads/2012/01/SnowMap_Points.png</b>
</p>

By comparing the distribution of the deaths with the locations of the pumps, it became clear that one pump, on Broad Street, was contaminated and was driving the epidemic. Snow convinced the local authorities to remove the pump's handle, forcing residents to get their water from other pumps, and the outbreak was contained. This event is considered a milestone in both the birth of epidemiology and the use of data in public health.
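The reasoning behind Snow's map can be sketched in a few lines of code: attribute each death to the pump closest to the household, then total the deaths per pump. This is a minimal sketch under stated assumptions. The pump names and all coordinates below are hypothetical values on an arbitrary grid, not Snow's actual data, and straight-line distance is a simplification (walking distance along streets would be a better measure of which pump a household actually used).

```python
import math
from collections import Counter

# Hypothetical pump locations on an arbitrary x/y grid.
# Illustrative values only, NOT the coordinates from Snow's map.
pumps = {
    "Pump A": (0.0, 0.0),
    "Pump B": (5.0, 1.0),
    "Pump C": (2.0, 6.0),
}

# Hypothetical households: ((x, y) location, number of deaths in that home).
deaths = [
    ((0.5, 0.2), 3),
    ((1.0, -0.5), 2),
    ((-0.4, 0.8), 4),
    ((4.6, 1.5), 1),
    ((2.2, 5.5), 1),
]

def nearest_pump(location, pumps):
    """Return the name of the pump with the smallest straight-line distance to `location`."""
    return min(pumps, key=lambda name: math.dist(location, pumps[name]))

def deaths_per_pump(deaths, pumps):
    """Total the deaths attributed to each pump, assigning every household to its nearest pump."""
    totals = Counter()
    for location, count in deaths:
        totals[nearest_pump(location, pumps)] += count
    return totals

totals = deaths_per_pump(deaths, pumps)
for name, count in totals.most_common():
    print(f"{name}: {count} death(s)")
```

With this made-up data, almost all deaths cluster around one pump, which is the same pattern that made the contaminated pump stand out on Snow's map.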

John Snow's work laid the foundation for the use of health data, and it's a great example of how personal data (in this case, home address and infection status) can be used to help limit the spread of a deadly disease in a community. Unfortunately, as we'll see, data hasn't always been used for the benefit of the population it's collected from. Though there is no shortage of examples of data being knowingly used in ways you might not be comfortable with, we'll first look at a few examples of data that was released in good faith but had unintended consequences.

# Unintentional Misuse of Data
### Smart Devices and GPS

[Strava](https://en.wikipedia.org/wiki/Strava) is an internet service that allows users to track their physical activity and compare themselves with friends and others, both locally and across the world. Users can upload their activities (usually via the Strava app) to the service, and compare their times on certain segments or their progress towards certain goals. To enable that, Strava collects GPS data from the smartwatches (Apple, Garmin, Fitbit, etc.) and other devices users record their activities with, and uses that data to calculate times, distances, and speeds. By letting users compare their times, collect achievements, and accomplish goals, the social media aspect of the service encourages people to exercise, which is a fairly noble goal.

Strava also helps users find new routes near them that are popular with other Strava users. To do so, it generates a heatmap showing where activities are most often recorded; the brightest locations below are the most popular:

![Strava's global heatmap of recorded activities](https://1n4rcn88bk4ziht713dla5ub-wpengine.netdna-ssl.com/wp-content/uploads/2017/10/Global-Heatmap.png)
<p>
<b>https://blog.strava.com/zi/press/strava-community-creates-ultimate-map-of-athlete-playgrounds/</b>
</p>

In November 2017, Strava released its heatmap as an [interactive tool](http://labs.strava.com/heatmap) that let anyone explore the map and the popular locations for physical activity across the world (though non-users are restricted to a wider zoom level). A few months later, one user noticed many "hot spots" in otherwise completely remote regions, far away from any population centers. What this user had discovered was the [existence of (previously) secret military bases](https://techcrunch.com/2018/01/28/strava-exposes-military-bases/) in regions such as Afghanistan, Syria, and Somalia. Looking deeper, it was also possible to track the movements of troops, as some of them had uploaded recordings of their training exercises.

![Strava heatmap activity at Bagram Airfield, Afghanistan](https://ichef.bbci.co.uk/news/976/cpsprodpb/112EA/production/_99787307_bagram_airbase.jpg)
<p>
<b>https://ichef.bbci.co.uk/news/976/cpsprodpb/112EA/production/_99787307_bagram_airbase.jpg</b>
</p>
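Why would a remote base glow as brightly as a busy city? A heatmap like this can be thought of as a grid where each cell counts the GPS points recorded inside it. The sketch below is a simplified illustration, not Strava's actual algorithm. The cell size, user IDs, and coordinates are all hypothetical. It shows how a couple of people repeating the same short loop many times can produce the brightest cell on the map, even though far more people are active in the city.

```python
import math
from collections import Counter, defaultdict

CELL_SIZE = 0.01  # grid cell width in degrees (an arbitrary choice for this sketch)

def cell_index(lat, lon, cell_size=CELL_SIZE):
    """Map a GPS coordinate to the integer (row, column) of its grid cell."""
    return (math.floor(lat / cell_size), math.floor(lon / cell_size))

def build_heatmap(activities, cell_size=CELL_SIZE):
    """Count GPS points and distinct users per grid cell.

    `activities` is a list of (user_id, [(lat, lon), ...]) pairs.
    """
    points = Counter()
    users = defaultdict(set)
    for user_id, track in activities:
        for lat, lon in track:
            cell = cell_index(lat, lon, cell_size)
            points[cell] += 1
            users[cell].add(user_id)
    return points, users

# Hypothetical data: 25 city users, each logging one point in a different cell...
city = [(f"city_{i}_{j}", [(51.505 + 0.01 * i, -0.115 + 0.01 * j)])
        for i in range(5) for j in range(5)]
# ...and two users repeatedly running the same short loop at an isolated site.
loop = [(10.0001, 20.0001), (10.0021, 20.0041), (10.0051, 20.0011)]
remote = [("remote_1", loop * 30), ("remote_2", loop * 30)]

points, users = build_heatmap(city + remote)
hottest, count = points.most_common(1)[0]
print(f"Brightest cell {hottest}: {count} points from {len(users[hottest])} users")
# prints "Brightest cell (1000, 2000): 180 points from 2 users"
```

Counting points rather than people is what makes the isolated site stand out: two dedicated runners outshine 25 casual ones.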

As you can imagine, the militaries whose bases were exposed were none too pleased. They had never expected their operations to be revealed this way, and so had put no protections in place to prevent it. Likewise, the service members who uploaded the data were just trying to track their workouts; they had no idea that Strava would make the heatmap public, or that anyone would study it so closely. It's also difficult to blame Strava, which didn't know the bases existed, nor exactly how the heatmap would be used. All it took to uncover valuable strategic information was a university student from Australia probing the available data a little more deeply as part of his academic research.

Strava has since added privacy tools that let users obscure the starts and ends of their activities (to hide the location of their home or work), and has made several potentially privacy-invasive features "opt-in" rather than "opt-out".
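One way to obscure the start and end of an activity is a "privacy zone": drop every recorded point within some radius of a sensitive location before the activity is shared. The sketch below assumes a simple fixed-radius rule and uses the haversine formula for distance on a sphere. It is only an illustration of the idea, not how Strava actually implements its feature, and the home location, track, and 500 m radius are hypothetical.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres

def haversine_m(a, b):
    """Great-circle distance in metres between two (lat, lon) points given in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))

def hide_private_zone(track, private_location, radius_m=500):
    """Return only the points of `track` farther than `radius_m` metres from `private_location`."""
    return [p for p in track if haversine_m(p, private_location) > radius_m]

home = (53.5461, -113.4938)  # hypothetical home location
# A hypothetical run heading due north from home in steps of roughly 111 m.
track = [(53.5461 + 0.001 * i, -113.4938) for i in range(10)]
public = hide_private_zone(track, home, radius_m=500)
print(f"{len(track) - len(public)} of {len(track)} points hidden")
# prints "5 of 10 points hidden"
```

A fixed radius is not perfect: the shared track still begins at the edge of the circle, which hints that the hidden location lies somewhere inside it.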

In summary, although none of the parties involved in collecting, analyzing, and releasing the data intended to cause any harm, the damage was done and the heatmap became a serious security concern. As data becomes cheaper and more common to collect, and as the public's appetite for such data keeps growing, the world will continue to struggle to keep up with the negative side effects of carelessly releasing personal information.

Unfortunately, as we learn more about how seemingly innocuous data can reveal previously secret information, a growing number of organizations are using that knowledge deliberately to steer people's behaviour toward their own goals.

# Negative Uses of Data
### Genetic Testing and Health Insurance Discrimination

[![Callysto.ca License](https://github.com/callysto/curriculum-notebooks/blob/master/callysto-notebook-banner-bottom.jpg?raw=true)](https://github.com/callysto/curriculum-notebooks/blob/master/LICENSE.md)