# A Breached Fortress: Analyzing the Government Agency Cybersecurity Breaches

![image.png](https://www.reliasite.com/wp-content/uploads/2019/08/bigstock-Hacker-Using-Laptop-With-Binar-257453926-e1565109796243.jpg)

# The Background
---

In a not-so-distant future, a government-owned agency that plays a crucial role in national security and public welfare finds itself entangled in a series of alarming cybersecurity breaches. This agency, responsible for handling sensitive information and critical infrastructure, was once regarded as a fortress of impenetrable security measures. However, recent events have exposed its vulnerabilities and sent shockwaves through the nation.

To recover, the agency has tasked you with analyzing data on its breaches. Entity names have not been disclosed for security purposes. Your mission, should you choose to accept it, is to analyze the data, uncover trends and insights regarding the breaches, and then add your recommendations. The future of the agency and the nation's security depend on you.

# The Data Description

| Column Name | Description |
| --- | --- |
| **ID** | Identification number of the breached entity |
| **Name of Covered Entity** | Cover name for the breached entity |
| **State** | State of origin for entity |
| **Business Associate Involved** | Whether or not the breach involved a business associate |
| **Individuals Affected** | Number of people affected by the breach |
| **Breach Start** | Start date of breach |
| **Breach End** | End date of breach |
| **Posted/Updated** | Date on which the breach was posted publicly or last updated |
| **Type of Breach** | The mode of breach |
| **Location of Breached Information** | The technical device(s) that were compromised |
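
Before profiling, it can help to confirm that the loaded columns match the description above. The following is a minimal sketch, not part of the original analysis: `EXPECTED_COLUMNS` and `check_columns` are hypothetical names introduced here, and the column list passed in at the end is a made-up example rather than the real file's header.

```python
# Hypothetical helper: compare loaded column names against the data description.
EXPECTED_COLUMNS = [
    "ID",
    "Name of Covered Entity",
    "State",
    "Business Associate Involved",
    "Individuals Affected",
    "Breach Start",
    "Breach End",
    "Posted/Updated",
    "Type of Breach",
    "Location of Breached Information",
]


def check_columns(actual, expected=EXPECTED_COLUMNS):
    """Return (missing, unexpected) column names, ignoring stray whitespace."""
    actual = [str(name).strip() for name in actual]
    missing = [name for name in expected if name not in actual]
    unexpected = [name for name in actual if name not in expected]
    return missing, unexpected


# Made-up header for illustration; with the real data, pass breaches.columns.
missing, unexpected = check_columns(["ID", "State", "Breach Start ", "Notes"])
print("Missing:", missing)
print("Unexpected:", unexpected)
```

With the real data, both lists should come back empty if the workbook matches the description.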

# Data Profiling and Cleaning

In [10]:
# Import libraries and load the data

import pandas as pd
from tabulate import tabulate
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import datetime as dt

# Load the raw workbook, then work on a copy so the original frame stays intact
df = pd.read_excel('CyberSecurityBreaches.xlsx')
breaches = df.copy()

In [17]:
# Inspect the data

print('First 5 rows of the data')
print(tabulate(breaches.head(), tablefmt="pipe", headers="keys"), '\n')
print('Last 5 rows of the data')
print(tabulate(breaches.tail(), tablefmt="pipe", headers="keys"), '\n')
print('Random 5 rows of the data')
print(tabulate(breaches.sample(5), tablefmt="pipe", headers="keys"))



First 5 rows of the data
|    |    ID | Name of Covered Entity   | State   | Business Associate Involved   |   Individuals Affected | Breach Start        | Breach End          | Posted/Updated      | Type of Breach   | Location of Breached Information        |
|---:|------:|:-------------------------|:--------|:------------------------------|-----------------------:|:--------------------|:--------------------|:--------------------|:-----------------|:----------------------------------------|
|  0 | 90840 | Entity 1                 | TX      | No                            |                   1711 | 2015-09-13 00:00:00 | 2015-10-15 00:00:00 | 2016-06-29 00:00:00 | Theft            | Paper                                   |
|  1 | 90711 | Entity 2                 | MO      | No                            |                    692 | 2014-07-13 00:00:00 | 2014-07-13 00:00:00 | 2016-05-29 00:00:00 | Theft            | Network Server                          |
|  2 | 90799 | Entity 3        

This looks OK to me. I don't see anything alarming right off the bat. Let's move forward.

In [45]:
# Inspect the data types

print(tabulate(breaches.dtypes.reset_index().rename(columns={0:'dtype', 'index':'column'}), tablefmt="pipe", headers="keys"),'\n')

# Inspect the data shape

print(f'There are {breaches.shape[0]} rows and {breaches.shape[1]} columns in the data')

|    | column                           | dtype          |
|---:|:---------------------------------|:---------------|
|  0 | ID                               | int64          |
|  1 | Name of Covered Entity           | object         |
|  2 | State                            | object         |
|  3 | Business Associate Involved      | object         |
|  4 | Individuals Affected             | int64          |
|  5 | Breach Start                     | datetime64[ns] |
|  6 | Breach End                       | datetime64[ns] |
|  7 | Posted/Updated                   | datetime64[ns] |
|  8 | Type of Breach                   | object         |
|  9 | Location of Breached Information | object         | 

There are 2110 rows and 10 columns in the data
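
Data types and shape are only part of profiling; missing values, repeated IDs, and impossible date ranges are also worth a look. Here is a minimal sketch of such checks, assuming the column names from the data description. The `profile_quality` helper is hypothetical, and the toy rows are invented for illustration only; they are not taken from the agency's data.

```python
import pandas as pd


def profile_quality(frame: pd.DataFrame) -> dict:
    """Return simple data-quality counts for a breaches-style frame."""
    return {
        # Total number of empty cells across all columns
        "missing_cells": int(frame.isna().sum().sum()),
        # Rows whose ID repeats an earlier row's ID
        "duplicate_ids": int(frame["ID"].duplicated().sum()),
        # Rows where the breach ends before it starts (missing dates compare as False)
        "end_before_start": int((frame["Breach End"] < frame["Breach Start"]).sum()),
    }


# Toy rows invented for illustration only.
toy = pd.DataFrame({
    "ID": [1, 2, 2, 4],
    "Breach Start": pd.to_datetime(["2015-01-01", "2015-02-01", "2015-02-01", "2015-03-10"]),
    "Breach End": pd.to_datetime(["2015-01-05", None, "2015-02-03", "2015-03-01"]),
})

report = profile_quality(toy)
print(report)
```

Calling `profile_quality(breaches)` on the real frame would give the same three counts for the 2110 rows loaded above.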


In [7]:
len(df.columns)

10