# Cybersecurity Breaches Analysis 

## Background

As a computer network and cybersecurity major, much of my coursework focused on the steps needed to safeguard a network against both external and internal threat actors. It should therefore come as no surprise that I chose to analyze a cybersecurity breach data set for my data analytics internship. This data set aligns directly with my educational path, as it allows me to analyze and interpret data on individuals and organizations that were affected by data breaches.


Data breaches remain a rising trend. As a result, maintaining the integrity and security of the information of individuals and organizations has never been more important. To help prevent the recurrence of data breaches that result from data theft, it is essential to analyze the trends and patterns surrounding them.

In [4]:
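import os
from pathlib import Path

import pandas as pd
from dotenv import load_dotenv

# Load environment variables (including the BigQuery credentials path) from the project's .env file.
dotenv_path = Path('../.env')
load_dotenv(dotenv_path=dotenv_path)

# Confirm that the credentials path was picked up from the environment.
GOOGLE_APPLICATION_CREDENTIALS = os.getenv("GOOGLE_APPLICATION_CREDENTIALS")
print(GOOGLE_APPLICATION_CREDENTIALS)

# Show every row and column, and the full cell contents, when displaying DataFrames.
pd.set_option('display.max_rows', None)
pd.set_option('display.max_columns', None)
pd.set_option('display.width', None)
pd.set_option('display.max_colwidth', None)

# Load the %%bigquery cell magic; reload it in case it is already loaded.
%load_ext google.cloud.bigquery
%reload_ext google.cloud.bigquery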

/Users/mickayliajohnson/Downloads/fall-2021-internship-187dc2a95056.json
The google.cloud.bigquery extension is already loaded. To reload it, use:
  %reload_ext google.cloud.bigquery


## Data Description

The [Cybersecurity Breaches](https://www.kaggle.com/alukosayoenoch/cyber-security-breaches-data/metadata) data set contains data on individuals and organizations that were affected by data breaches. For each breach, it records the number of individuals whose data was compromised, the date of the breach, the type of breach, the location of the breached information, the date the record was posted or updated, the resolution date, and a summary of the breach and how the situation was handled.

In [119]:
%%bigquery
SELECT *  
FROM `fall-2021-internship.cybersecurity_data.breaches` 
LIMIT 1000




| | Row_Number | Number | Name_of_Covered_Entity | State | Business_Associate_Involved | Individuals_Affected | Date_of_Breach | Type_of_Breach | Location_of_Breached_Information | Date_Posted_or_Updated | Summary | breach_start | breach_end | year |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 0 | Brooke Army Medical Center | TX | | 1000 | 10/16/09 | Theft | Paper | 2014-06-30 | A binder containing the protected health information (PHI) of up to 1,272 individuals was stolen from a staff member's vehicle. The PHI included names, telephone numbers, detailed treatment notes, and possibly social security numbers. In response to the breach, the covered entity (CE) sanctioned the workforce member and developed a new policy requiring on-call staff members to submit any information created during their shifts to the main office instead of adding it to the binder. Following OCR's investigation, the CE notified the local media about the breach. | 2009-10-16 | | 2009 |
| 1 | 2 | 1 | Mid America Kidney Stone Association, LLC | MO | | 1000 | 9/22/09 | Theft | Network Server | 2014-05-30 | Five desktop computers containing unencrypted electronic protected health information (e-PHI) were stolen from the covered entity (CE). Originally, the CE reported that over 500 persons were involved, but subsequent investigation showed that about 260 persons were involved. The ePHI included demographic and financial information. The CE provided breach notification to affected individuals and HHS. Following the breach, the CE improved physical security by installing motion detectors and alarm systems security monitoring. It improved technical safeguards by installing enhanced antivirus and encryption software. As a result of OCR's investigation the CE updated its computer password policy. | 2009-09-22 | | 2009 |
| 2 | 3 | 2 | Alaska Department of Health and Social Services | AK | | 501 | 10/12/09 | Theft | Other Portable Electronic Device, Other | 2014-01-23 | | 2009-10-12 | | 2009 |
| 3 | 4 | 3 | Health Services for Children with Special Needs, Inc. | DC | | 3800 | 10/9/09 | Loss | Laptop | 2014-01-23 | A laptop was lost by an employee while in transit on public transportation. The computer contained the protected health information of 3800 individuals. The protected health information involved in the breach included names, Medicaid ID numbers, dates of birth, and primary physicians. In response to this incident, the covered entity took steps to enforce the requirements of the Privacy & Security Rules. The covered entity has installed encryption software on all employee computers, strengthened access controls including passwords, reviewed and updated security policies and procedures, and updated it risk assessment. In addition, all employees received additional security training. | 2009-10-09 | | 2009 |
| 4 | 5 | 4 | L. Douglas Carlson, M.D. | CA | | 5257 | 9/27/09 | Theft | Desktop Computer | 2014-01-23 | A shared Computer that was used for backup was stolen on 9/27/09 from the reception desk area of the covered entity. The Computer contained certain electronic protected health information (ePHI) of 5,257 individuals who were patients of the CE. The ePHI involved in the breach included names, dates of birth, and clinical information, but there were no social security numbers, financial information, addresses, phone numbers, or other ePHI in any of the reports on the disks or the hard drive on the stolen Computer. Following the breach, the covered entity notified all 5,257 affected individuals and the appropriate media; added technical safeguards of encryption for all ePHI stored on the USB flash drive or the CD used on the replacement computer; added physical safeguards by keeping new portable devices locked when not in use in a secure combination safe in doctor's private office or in a secure filing cabinet; and added administrative safeguards by requiring annual refresher retraining of CE staff for Privacy and Security Rules as well as requiring immediate retraining of cleaning staff in both Rules. | 2009-09-27 | | 2009 |
| 5 | 6 | 5 | David I. Cohen, MD | CA | | 857 | 9/27/09 | Theft | Desktop Computer | 2014-01-23 | A shared Computer that was used for backup was stolen from the reception desk area, behind a locked desk area, probably while a cleaning crew had left the main door to the building open and the door to the suite was unlocked and perhaps ajar. The Computer contained certain electronic protected health information (ePHI) of 857 patients. The ePHI involved in the breach included names, dates of birth, and clinical information. Following the breach, the covered entity notified all affected individuals and the media, added technical safeguards of encryption for all ePHI stored on the USB flash drive or the CD used on the replacement computer, added physical safeguards by keeping new portable devices locked when not in use in a secure combination safe in doctor's private office or in a secure filing cabinet, and added administrative safeguards by requiring annual refresher retraining staff for Privacy and Security Rules as well as requiring immediate retraining of cleaning staff in both Rules, which has already taken place. | 2009-09-27 | | 2009 |
| 6 | 7 | 6 | Michele Del Vicario, MD | CA | | 6145 | 9/27/09 | Theft | Desktop Computer | 2014-01-23 | A shared Computer that was used for backup was stolen on 9/27/09 from the reception desk area of the covered entity. The Computer contained certain electronic protected health information (ePHI) of 6,145 individuals who were patients of the CE, The ePHI involved in the breach included names, dates of birth, and clinical information, but there were no social security numbers, financial information, addresses, phone numbers, or other ePHI in any of the reports on the disks or the hard drive on the stolen Computer. Following the breach, the CE: notified all 6,145 affected individuals and the appropriate media; added technical safeguards of encryption for all ePHI stored on the USB flash drive or the CD used on the replacement computer; all passwords are strong; all computers are password protected; added physical safeguards by keeping new portable devices locked when not in use in a secure combination safe in doctor's private office or in a secure filing cabinet; and added administrative safeguards by requiring annual refresher retraining of CE staff for Privacy and Security Rules as well as requiring immediate retraining of cleaning staff in both Rules, which has already taken place. | 2009-09-27 | | 2009 |
| 7 | 8 | 7 | Joseph F. Lopez, MD | CA | | 952 | 9/27/09 | Theft | Desktop Computer | 2014-01-23 | A shared Computer that was used for backup was stolen on 9/27/09. The Computer contained certain electronic protected health information (ePHI) of 952 patients. Following the breach, the covered entity notified all 952 affected individuals and the appropriate media; added technical safeguards of encryption for all ePHI stored on the USB flash drive or the CD used on the replacement computer; added physical safeguards by keeping new portable devices locked when not in use in a secure combination safe in doctor's private office or in a secure filing cabinet; and added administrative safeguards by requiring annual refresher retraining of staff for Privacy and Security Rules. | 2009-09-27 | | 2009 |
| 8 | 9 | 8 | Mark D. Lurie, MD | CA | | 5166 | 9/27/09 | Theft | Desktop Computer | 2014-01-23 | A shared Computer that was used for backup was stolen on 9/27/09 from the reception desk area of the covered entity. The Computer contained certain electronic protected health information (ePHI) of 5,166 individuals who were patients of the CE, The ePHI involved in the breach included names, dates of birth, and clinical information, but there were no social security numbers, financial information, addresses, phone numbers, or other ePHI in any of the reports on the disks or the hard drive on the stolen Computer. Following the breach, the CE: notified all 5,166 affected indiv's and the appropriate media; added technical safeguards of encryption for all ePHI stored on the USB flash drive or the CD used on the replacement computer; all passwords are strong; all computers are password protected; added physical safeguards by keeping new portable devices locked when not in use in a secure combination safe in doctor's private office or in a secure filing cabinet; and added administrative safeguards by requiring annual refresher retraining of CE staff for Privacy and Security Rules as well as requiring immediate retraining of cleaning staff in both Rules, which has already taken place. | 2009-09-27 | | 2009 |
| 9 | 10 | 9 | City of Hope National Medical Center | CA | | 5900 | 9/27/09 | Theft | Laptop | 2014-01-23 | A laptop computer was stolen from a workforce member's car. The laptop computer contained the protected health information of approximately 5,900 individuals. Following the breach, the covered entity encrypted all protected health information stored on lap tops. Additionally, OCR's investigation resulted in the covered entity improving their physical safeguards and retraining employees. | 2009-09-27 | | 2009 |


The output above displays the columns in the Cybersecurity Breaches data set, along with a sample of its rows.

In [105]:
%%bigquery
SELECT 
    Type_of_Breach,
    COUNT(Type_of_Breach) AS Frequency 
FROM `fall-2021-internship.cybersecurity_data.breaches`
GROUP BY Type_of_Breach 
ORDER BY Frequency DESC



| | Type_of_Breach | Frequency |
|---|---|---|
| 0 | Theft | 516 |
| 1 | Unauthorized Access/Disclosure | 148 |
| 2 | Other | 91 |
| 3 | Loss | 85 |
| 4 | Hacking/IT Incident | 75 |
| 5 | Improper Disposal | 38 |
| 6 | Theft, Unauthorized Access/Disclosure | 26 |
| 7 | Theft, Loss | 15 |
| 8 | Unknown | 10 |
| 9 | Unauthorized Access/Disclosure, Hacking/IT Incident | 9 |


The output above displays the different types of data breaches and the frequency of each. Note that the values are not all distinct breach types: the Type_of_Breach column contains both single types (for example, Theft) and combined types (for example, "Theft, Loss").
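Because single and combined breach types share one column, the counts per individual type are spread across several rows. The following is a minimal sketch, not part of the original analysis, of how the combined labels could be split so that each individual type receives the count of every row it appears in. The dictionary `displayed_counts` and the function `split_combined_types` are hypothetical names, and the data is copied only from the ten rows displayed above, so the results are illustrative rather than complete.

```python
from collections import Counter

# Counts copied from the ten rows displayed above (assumption: the full
# query result may contain additional breach-type combinations).
displayed_counts = {
    "Theft": 516,
    "Unauthorized Access/Disclosure": 148,
    "Other": 91,
    "Loss": 85,
    "Hacking/IT Incident": 75,
    "Improper Disposal": 38,
    "Theft, Unauthorized Access/Disclosure": 26,
    "Theft, Loss": 15,
    "Unknown": 10,
    "Unauthorized Access/Disclosure, Hacking/IT Incident": 9,
}


def split_combined_types(counts):
    """Credit each comma-separated breach type with the full count of its row."""
    single = Counter()
    for label, frequency in counts.items():
        for breach_type in label.split(","):
            single[breach_type.strip()] += frequency
    return single


single_type_counts = split_combined_types(displayed_counts)
for breach_type, frequency in single_type_counts.most_common():
    print(f"{breach_type}: {frequency}")
```

A record with a combined label is counted once for each type it names, so the split counts add up to more than the number of records.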

In [106]:
%%bigquery
SELECT 
    Name_of_Covered_Entity, 
    Individuals_Affected
FROM `fall-2021-internship.cybersecurity_data.breaches`
ORDER BY Individuals_Affected DESC
LIMIT 10



| | Name_of_Covered_Entity | Individuals_Affected |
|---|---|---|
| 0 | TRICARE Management Activity (TMA) | 4900000 |
| 1 | Advocate Health and Hospitals Corporation, d/b/a Advocate Medical Group | 4029530 |
| 2 | Health Net, Inc. | 1900000 |
| 3 | New York City Health & Hospitals Corporation's North Bronx Healthcare Network | 1700000 |
| 4 | AvMed, Inc. | 1220000 |
| 5 | The Nemours Foundation | 1055489 |
| 6 | BlueCross BlueShield of Tennessee, Inc. | 1023209 |
| 7 | Sutter Medical Foundation | 943434 |
| 8 | Horizon Healthcare Services, Inc., doing business as Horizon Blue Cross Blue Shield of New Jersey, and its affiliates | 839711 |
| 9 | South Shore Hospital | 800000 |


The output above lists the ten entities with the highest number of individuals affected. TRICARE Management Activity (TMA) recorded the highest number, with 4.9 million individuals affected.

In [116]:
%%bigquery
SELECT 
    Name_of_Covered_Entity, 
    Individuals_Affected,
    Summary
FROM `fall-2021-internship.cybersecurity_data.breaches`
ORDER BY Individuals_Affected ASC
LIMIT 20



| | Name_of_Covered_Entity | Individuals_Affected | Summary |
|---|---|---|---|
| 0 | LANA MEDICAL CARE | 500 | |
| 1 | Central Brooklyn Medical Group, PC | 500 | OCR opened an investigation of the covered entity (CE), Preferred Health Partners f/k/a Central Brooklyn Medical Group, after it reported appointment schedules, pathology reports and portions of medical records containing the protected health information (PHI) of 500 individuals were stolen from an office. The PHI included names, ages, telephone numbers, social security numbers, medical insurance information, pathology reports, and other clinical information. Upon discovery of the breach, the CE filed a police report and worked with law enforcement authorities to recover as much of the PHI as possible that was stolen. As a result of OCR's investigation, the CE removed PHI such as social security or medical insurance numbers from tracking logs. In addition, the CE improved safeguards by storing log binders in a locked area and shredding documents regularly. Further, the CE replaced the manual process of printing certain records with an electronic verification system. The CE also archived, stored off site, and locked up all paper records and retrained all staff on its HIPAA policies and procedures. |
| 2 | Northern Trust | 500 | |
| 3 | CHC MEMPHIS CMHC, LLC | 500 | |
| 4 | West Georgia Ambulance | 500 | |
| 5 | Lankenau Medical Center | 500 | |
| 6 | Titus Regional Medical Center | 500 | |
| 7 | Ultra Stores, Inc. | 500 | |
| 8 | Knox Community Hospital | 500 | |
| 9 | University of Mississippi Medical Center | 500 | |


The output above lists the entities with the lowest number of individuals affected. No single entity stands out: every entity shown, starting with LANA MEDICAL CARE, recorded 500 individuals affected, the lowest value returned by the query.
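Since every row displayed above shares the same value, the first row returned is not meaningfully "the lowest" entity. As a minimal sketch, assuming only the ten displayed rows, the snippet below finds the minimum and collects every entity tied at it. The names `displayed_rows` and `entities_at_minimum` are hypothetical and chosen for illustration.

```python
# (entity, individuals affected) pairs copied from the rows displayed above.
displayed_rows = [
    ("LANA MEDICAL CARE", 500),
    ("Central Brooklyn Medical Group, PC", 500),
    ("Northern Trust", 500),
    ("CHC MEMPHIS CMHC, LLC", 500),
    ("West Georgia Ambulance", 500),
    ("Lankenau Medical Center", 500),
    ("Titus Regional Medical Center", 500),
    ("Ultra Stores, Inc.", 500),
    ("Knox Community Hospital", 500),
    ("University of Mississippi Medical Center", 500),
]


def entities_at_minimum(rows):
    """Return the lowest count and every entity that shares it."""
    lowest = min(count for _, count in rows)
    tied = [name for name, count in rows if count == lowest]
    return lowest, tied


lowest, tied = entities_at_minimum(displayed_rows)
print(f"{len(tied)} entities are tied at {lowest} individuals affected")
```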

In [118]:
#%%bigquery
#SELECT 
#    Name_of_Covered_Entity, 
#    Summary
#FROM `fall-2021-internship.cybersecurity_data.breaches`
#WHERE Name_of_Covered_Entity = "TRICARE Management Activity (TMA)"


The query above is intended to display the breach summary for the entity with the highest number of individuals affected. Unfortunately, no summary of how the situation was handled was available for that entity.

In [88]:
%%bigquery
SELECT 
    Type_of_Breach,
    SUM(Individuals_Affected) AS Amount_of_Individuals_Affected
FROM `fall-2021-internship.cybersecurity_data.breaches`
GROUP BY Type_of_Breach
ORDER BY Amount_of_Individuals_Affected DESC



| | Type_of_Breach | Amount_of_Individuals_Affected |
|---|---|---|
| 0 | Theft | 16515554 |
| 1 | Loss | 7254286 |
| 2 | Unknown | 1918312 |
| 3 | Hacking/IT Incident | 1878870 |
| 4 | Unauthorized Access/Disclosure | 1424227 |
| 5 | Other | 772500 |
| 6 | Improper Disposal | 671594 |
| 7 | Unauthorized Access/Disclosure, Hacking/IT Incident | 551355 |
| 8 | Unknown, Other | 317082 |
| 9 | Unauthorized Access/Disclosure, Other | 162781 |


The output above displays the number of individuals affected by each single and combined type of data breach. Based on the results obtained, Theft affected the most individuals, with a total of 16,515,554.
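Theft is also the most frequent breach type, so its large total partly reflects the number of records. As a rough, hedged sketch that is not part of the original analysis, the totals above can be divided by the frequencies from the earlier Type_of_Breach table to estimate the average number of individuals affected per record. The names `totals`, `frequencies`, and `average_per_record` are hypothetical, only single breach types that appear in both displayed tables are included, and the result assumes every counted record has a non-null Individuals_Affected value.

```python
# Totals copied from the table above (single breach types only).
totals = {
    "Theft": 16515554,
    "Loss": 7254286,
    "Unknown": 1918312,
    "Hacking/IT Incident": 1878870,
    "Unauthorized Access/Disclosure": 1424227,
    "Other": 772500,
    "Improper Disposal": 671594,
}

# Frequencies copied from the earlier Type_of_Breach frequency table.
frequencies = {
    "Theft": 516,
    "Loss": 85,
    "Unknown": 10,
    "Hacking/IT Incident": 75,
    "Unauthorized Access/Disclosure": 148,
    "Other": 91,
    "Improper Disposal": 38,
}


def average_per_record(totals, frequencies):
    """Divide each type's total by its record count, skipping missing or zero counts."""
    return {
        breach_type: totals[breach_type] / frequencies[breach_type]
        for breach_type in totals
        if frequencies.get(breach_type)
    }


averages = average_per_record(totals, frequencies)
for breach_type, average in sorted(averages.items(), key=lambda item: item[1], reverse=True):
    print(f"{breach_type}: {average:,.1f}")
```

Under these assumptions, a type with few records but a large total ranks high on the per-record average, which is why this view should be read alongside the totals rather than in place of them.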

In [91]:
%%bigquery
SELECT 
    Location_of_Breached_Information,
    COUNT(Location_of_Breached_Information) AS Frequency
FROM `fall-2021-internship.cybersecurity_data.breaches`
GROUP BY Location_of_Breached_Information
ORDER BY Frequency DESC



| | Location_of_Breached_Information | Frequency |
|---|---|---|
| 0 | Paper | 227 |
| 1 | Laptop | 217 |
| 2 | Other | 116 |
| 3 | Desktop Computer | 113 |
| 4 | Network Server | 107 |
| 5 | Other Portable Electronic Device | 60 |
| 6 | E-mail | 54 |
| 7 | Other Portable Electronic Device, Other | 53 |
| 8 | Electronic Medical Record | 21 |
| 9 | Laptop, Desktop Computer | 8 |


The output above displays the single and combined media through which data was breached and the frequency of each. Paper recorded the highest frequency, at 227.

In [111]:
%%bigquery
SELECT 
    Year,
    COUNT(Year) AS Frequency 
FROM `fall-2021-internship.cybersecurity_data.breaches`
GROUP BY Year
ORDER BY Frequency DESC



| | Year | Frequency |
|---|---|---|
| 0 | 2013 | 254 |
| 1 | 2011 | 229 |
| 2 | 2012 | 227 |
| 3 | 2010 | 211 |
| 4 | 2009 | 56 |
| 5 | 2014 | 56 |
| 6 | 2008 | 13 |
| 7 | 2004 | 2 |
| 8 | 2005 | 2 |
| 9 | 1997 | 1 |


The output above displays the number of recorded data breaches per year, ordered by frequency. The year 2013 recorded the highest number, with a total of 254.
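The result above is ordered by frequency, which makes the peak year easy to spot but hides how the counts change over time. As a minimal sketch using only the ten rows displayed above, the counts can be re-sorted chronologically. The names `displayed_year_counts` and `chronological` are hypothetical.

```python
# Year -> number of recorded breaches, copied from the rows displayed above.
displayed_year_counts = {
    2013: 254,
    2011: 229,
    2012: 227,
    2010: 211,
    2009: 56,
    2014: 56,
    2008: 13,
    2004: 2,
    2005: 2,
    1997: 1,
}

# Sorting the (year, frequency) pairs orders them by year, oldest first.
chronological = sorted(displayed_year_counts.items())
for year, frequency in chronological:
    print(year, frequency)
```

The same ordering could be obtained directly in the query by replacing `ORDER BY Frequency DESC` with `ORDER BY Year`.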

In [121]:
%%bigquery
SELECT 
    Name_of_Covered_Entity,
    Type_of_Breach,
    breach_start
FROM `fall-2021-internship.cybersecurity_data.breaches`
ORDER BY breach_start ASC
LIMIT 10



| | Name_of_Covered_Entity | Type_of_Breach | breach_start |
|---|---|---|---|
| 0 | UNCG Speech and Hearing Center | Hacking/IT Incident | 1997-01-01 |
| 1 | UMass Memorial Medical Center | Unauthorized Access/Disclosure | 2002-05-06 |
| 2 | Riverside Mercy Hospital and Ohio/Mercy Diagnostics | Improper Disposal | 2003-03-29 |
| 3 | Duke University Health System | Unauthorized Access/Disclosure | 2004-04-21 |
| 4 | SW General Inc | Theft | 2004-05-01 |
| 5 | Harris County | Unauthorized Access/Disclosure | 2005-08-15 |
| 6 | Methodist Dallas Medical Center | Unauthorized Access/Disclosure | 2005-09-01 |
| 7 | South Shore Physicians, PC | Theft | 2006-01-01 |
| 8 | South Shore Medical Center | Hacking/IT Incident | 2007-01-01 |
| 9 | OhioHealth Corporation dba Grant Medical Center | Theft | 2008-01-01 |


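The output above lists the ten earliest recorded data breaches by start date. The first recorded breach, a Hacking/IT Incident at the UNCG Speech and Hearing Center, began on 1997-01-01. The query does not select breach_end, so the duration until resolution is not shown.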
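The data set has both a breach_start and a breach_end column, and breach_end is empty in the sample rows shown earlier. As a minimal sketch of how a duration could be derived when both dates are present, the hypothetical helper `breach_duration_days` below assumes both values are ISO-formatted strings (`YYYY-MM-DD`), as breach_start appears in the output. The end date in the example is made up purely for illustration.

```python
from datetime import date


def breach_duration_days(breach_start, breach_end):
    """Return the days between two ISO dates, or None if either date is missing."""
    if not breach_start or not breach_end:
        return None
    start = date.fromisoformat(breach_start)
    end = date.fromisoformat(breach_end)
    return (end - start).days


# Hypothetical end date; the sample rows in this notebook have no breach_end value.
example_duration = breach_duration_days("2009-10-16", "2009-10-30")

# A missing breach_end yields None instead of raising an error.
missing_duration = breach_duration_days("2009-10-16", "")

print(example_duration, missing_duration)
```

Returning `None` for missing dates keeps incomplete records out of any average duration instead of treating them as zero-day breaches.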

## Conclusion and Recommendation

Based on the analysis, Theft was the most frequent type of data breach (516 records) and affected the most individuals (16,515,554). All the flaws:
dates