<div style="width:100%;text-align: center;"> <img align=middle src="https://media.istockphoto.com/photos/dark-web-hooded-hacker-picture-id1143736474?b=1&k=20&m=1143736474&s=170667a&w=0&h=SRcCM-3398E6OOwgLXGbuo-Q7zkLSOT9S4rh3zMZW7Y=" alt="Hooded hacker" style="height:366px;margin-top:3rem;"> </div>

# <h1 style='background:#20C20E; border:0; color:white'><center>EDA: GeoData for hacking attempts (Aug 2021-Jul 2022)</center></h1> 

# **<span style="color:	#0e6b0e;">📰About the Dataset</span>**

Geodata with dates and times of hacking attempts on an Australian website. The data was collected from fail2ban logs using a set of pre-defined rules for what counts as a hacking attempt, such as multiple failed login retries or repeated attempts to hit pages or applications that did not exist. Each record included an IP address, which was then transformed into geospatial coordinates; as such, spoofed IP addresses or VPN use may skew the results.

Timestamps are in AEST (GMT+10) and cover 21 August 2021 to 13 July 2022.
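
Because the timestamps are local AEST times, comparing them with logs kept in UTC needs a conversion first. Here is a minimal sketch of that step. It assumes the `datetime` column holds naive strings such as `2021-08-21 03:26:43` (the exact format is an assumption), and it uses two made-up rows instead of the real file:

```python
from datetime import timedelta, timezone

import pandas as pd

# Hypothetical sample rows with the same three columns as the dataset
sample = pd.DataFrame({
    "lat": [51.5085, 18.5196],
    "lng": [-0.1257, 73.8554],
    "datetime": ["2021-08-21 03:26:43", "2022-07-13 09:00:00"],
})

# AEST is a fixed +10:00 offset (no daylight saving)
AEST = timezone(timedelta(hours=10))

# Attach the AEST offset to the naive timestamps, then convert to UTC
local = pd.to_datetime(sample["datetime"]).dt.tz_localize(AEST)
sample["datetime_utc"] = local.dt.tz_convert("UTC")

print(sample[["datetime", "datetime_utc"]])
```

If the real column already carries an offset, `tz_localize` would raise an error and only the `tz_convert` step would be needed.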

# **<span style="color:	#0e6b0e;">📁About the files</span>**

There is only one CSV file consisting of 3 columns, namely:

> 1. lat - Latitude

> 2. lng - Longitude

> 3. datetime - Timestamp (date and time) of the attempt
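
Before exploring, it can help to confirm that a file really has these three columns and that the coordinates are plausible. The sketch below uses a small inline CSV with made-up rows (one deliberately invalid) instead of the real file, so the values are purely illustrative:

```python
import io

import pandas as pd

# Hypothetical CSV text mimicking the dataset's layout
csv_text = """lat,lng,datetime
51.5085,-0.1257,2021-08-21 03:26:43
18.5196,73.8554,2022-07-13 09:00:00
95.0,10.0,2022-07-13 09:05:00
"""

sample = pd.read_csv(io.StringIO(csv_text), parse_dates=["datetime"])

# Valid latitudes lie in [-90, 90] and valid longitudes in [-180, 180]
valid = sample["lat"].between(-90, 90) & sample["lng"].between(-180, 180)

print(sample.dtypes)
print(f"{(~valid).sum()} row(s) with out-of-range coordinates")
```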

# **<span style="color:	#0e6b0e;">🎨Notebook Color Palette</span>**

In [None]:
import seaborn as sns

#Custom Colors
class clr:
    S = '\033[1m' + '\033[96m'
    E = '\033[0m'
    
my_colors = ["#20C20E", "#0e6b0e", "#649568", "#9ccc9c", "#2b5329"]

print(clr.S + "Notebook Color Scheme: " + clr.E)
sns.palplot(sns.color_palette(my_colors))


In [None]:
#Imports
!pip install imagesize
import imagesize
from IPython.display import display_html
import matplotlib.pyplot as plt
import matplotlib as mpl
import matplotlib.patches as patches
import random
import sys
! pip install -q folium
import folium
from folium.plugins import HeatMap
from folium.plugins import HeatMapWithTime

In [None]:
from branca.element import Figure

In [None]:
#Environment check

import os
import warnings

warnings.filterwarnings("ignore")

In [None]:
! pip install basemap
from mpl_toolkits.basemap import Basemap

In [None]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python
# For example, here's several helpful packages to load

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

# Input data files are available in the read-only "../input/" directory
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# You can write up to 20GB to the current directory (/kaggle/working/) that gets preserved as output when you create a version using "Save & Run All" 
# You can also write temporary files to /kaggle/temp/, but they won't be saved outside of the current session

In [None]:
df = pd.read_csv('/kaggle/input/geodata-for-hacking-attempts/hacking_attempts_geodata.csv')
df.head()

In [None]:
#Separating Date and Time from "datetime" column

parsed = pd.to_datetime(df['datetime'])  # parse once and reuse for both columns
df['Dates'] = parsed.dt.date
df['Time'] = parsed.dt.time

In [None]:
df.head()

In [None]:
#Now we can drop "datetime" column

df.drop('datetime', axis=1, inplace=True) 

In [None]:
df.head()

In [None]:
df.columns

# **<span style="color:#9ccc9c;">🥽Let's see what we can explore in this dataset</span>**

> 1. On which date did the most breaches occur?

> 2. Which coordinates were used in the most breaches?

> 3. What was the most common time of breach?

> 4. Is there any relation between the parameters?

> 5. Plot all coordinates on a map


# **<span style="color:#2b5329;">🤝 Correlation Between all parameters</span>**

In [None]:
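plt.figure(figsize = (15, 8), facecolor = "#F7F4F4")
# Restrict to the numeric columns: Dates and Time hold Python objects,
# which makes df.corr() fail on recent pandas versions
sns.heatmap(df[['lat', 'lng']].corr(), annot = True, cmap = "Greens");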

**There seems to be no particular linear relation between the numeric parameters (`lat` and `lng`).**

# **<span style="color:#2b5329;">🚨 Maximum breach occurrence by Date</span>**

In [None]:
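plt.figure(figsize=(15,8))
# Plots each row's date against its row index, i.e. how the records are ordered in time
plt.plot(df['Dates'], color = my_colors[1])
plt.xlabel('Row index')
plt.ylabel('Date')
plt.show()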

In [None]:
max_breach_date = df['Dates'].value_counts()
max_breach = max_breach_date.max()
max_breach


In [None]:
print(max_breach_date)

We found that the most breaches (approximately 960) took place on 2022-07-13, i.e. 13 July 2022. It is also noteworthy that most of the other top dates fall in July and August, on almost alternating days.
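
The busiest date can also be read off programmatically instead of by scanning the printed counts. The sketch below uses a handful of made-up timestamps, so the dates and counts are illustrative, not the dataset's:

```python
import pandas as pd

# Hypothetical timestamps standing in for the parsed "datetime" column
dates = pd.Series(pd.to_datetime([
    "2022-07-11 01:00:00",
    "2022-07-13 02:00:00",
    "2022-07-13 03:00:00",
    "2022-07-12 04:00:00",
    "2022-07-13 05:00:00",
])).dt.date

# Count attempts per calendar day and put the days in chronological order
daily_counts = dates.value_counts().sort_index()

# idxmax returns the index label (the date) with the highest count
busiest_day = daily_counts.idxmax()
print(busiest_day, daily_counts.max())
```

On the notebook's `df`, the same two calls applied to `df['Dates']` give the date with the most attempts, and `daily_counts.plot()` would draw attempts per day as a line chart.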

# **<span style="color:#2b5329;">🗺 Maximum breach occurrence by Location</span>**

In [None]:
#Which coordinates were used the most
max_breach_lat = df['lat'].value_counts()
max_breach_lng = df['lng'].value_counts()
print('Latitude -  ',max_breach_lat , 'Longitude', max_breach_lng)

In [None]:
plt.figure(figsize=(15,8))
plt.plot(max_breach_lat.head(100), '.', alpha=0.6, markersize=10, color=my_colors[0])
plt.xlabel('Latitude')

The most frequent breach latitude was **39.9075**. This can also be seen in the graph above, where the markers cluster in the 30-40 interval.

In [None]:
plt.figure(figsize=(15,8))
plt.plot(max_breach_lng.head(100), '.', alpha=0.6, markersize=10, color=my_colors[0])
plt.xlabel('Longitude')

From the above analysis, the most frequent breach longitude was **-74.0000**. The chart above also shows some points clustered near -70.

Most longitude values, however, are in the positive half, so the most crowded points lie on the right-hand side of the origin.
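
One caveat: the most frequent latitude and the most frequent longitude are counted independently, so they do not necessarily belong to the same point. Counting `(lat, lng)` pairs together avoids that. The made-up coordinates below are chosen so that the two approaches disagree:

```python
import pandas as pd

# Hypothetical coordinates, not taken from the dataset
coords = pd.DataFrame({
    "lat": [10.0, 10.0, 20.0, 20.0, 20.0, 30.0],
    "lng": [50.0, 50.0, 60.0, 70.0, 80.0, 50.0],
})

# Most frequent value of each column on its own
top_lat = coords["lat"].value_counts().idxmax()
top_lng = coords["lng"].value_counts().idxmax()

# Most frequent (lat, lng) combination, counted jointly
pair_counts = coords.groupby(["lat", "lng"]).size().sort_values(ascending=False)
top_pair = pair_counts.idxmax()

print("Separate modes:", (top_lat, top_lng))
print("Most frequent pair:", top_pair)
print("Separate modes form a real point:", (top_lat, top_lng) in pair_counts.index)
```

Running `df.groupby(['lat', 'lng']).size()` on the notebook's data would show whether 39.9075 and -74.0000 actually occur together.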

# **<span style="color:#2b5329;">⌚ Most common time of breach</span>**

In [None]:
optimum_time = df['Time'].value_counts()
optimum_time

From the above counts, **03:26:43** appears to be the most common time, when most of the hacks took place.
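
Exact-second timestamps rarely repeat, so grouping by hour of day may give a steadier picture of when attempts peak. The sketch below uses made-up timestamps, so the resulting hour is illustrative only:

```python
import pandas as pd

# Hypothetical timestamps standing in for the "datetime" column
stamps = pd.to_datetime(pd.Series([
    "2022-07-13 03:26:43",
    "2022-07-13 03:41:10",
    "2022-07-12 03:05:00",
    "2022-07-12 15:00:01",
    "2022-07-11 22:59:59",
]))

# Count attempts per hour of day (0-23), ordered by hour
hourly = stamps.dt.hour.value_counts().sort_index()
peak_hour = hourly.idxmax()

print(hourly)
print("Peak hour:", peak_hour)
```

On the notebook's `df`, where `Time` holds `datetime.time` objects, `df['Time'].map(lambda t: t.hour)` would give the same hour values.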

# **<span style="color:#2b5329;">🌎 Plotting Longitude and Latitudes</span>**

In [None]:
# Plotting the Latitude and Longitude values to see what we get

plt.figure(figsize=(20,10))

# Plot the latitude and Longitude values
plt.plot(df['lng'], df['lat'], '.', alpha=0.6, markersize=5, color=my_colors[0])
plt.xlabel('Longitude')
plt.ylabel('Latitude')


In [None]:
#Plotting single coordinate

def generateBaseMap(default_location=[51.5085, -0.1257], default_zoom_start=5):
    """
    default_location: The [lat, lng] the map is centred on when it is rendered
    default_zoom_start: The zoom level that the map will default to when rendering the map
    control_scale (always True here): Shows the map scale for a given zoom level
    """
    base_map = folium.Map(location=default_location, control_scale=True, zoom_start=default_zoom_start)
    return base_map

In [None]:
base_map = generateBaseMap()
#fig3=Figure(width=550,height=350)
#fig3.add_child(base_map)
# folium.Icon only accepts its built-in color names (e.g. 'green'), not hex codes such as my_colors[0]
folium.Marker(location=[51.5085, -0.1257],popup='Breach Location 1',tooltip='Breach Location 1').add_to(base_map)
folium.Marker(location=[33.7215, 73.0433],popup='Breach Location 2',tooltip='<strong>Breach Location 2</strong>',icon=folium.Icon(color='green',prefix='glyphicon',icon='off')).add_to(base_map)
folium.Marker(location=[18.5196, 73.8554],popup='Breach Location 3',tooltip='<strong>Breach Location 3</strong>',icon=folium.Icon(color='darkgreen',prefix='fa',icon='anchor')).add_to(base_map)
folium.Marker(location=[48.1031, 29.1260],popup='Breach Location 4',tooltip='<strong>Breach Location 4</strong>',icon=folium.Icon(color='lightgreen',prefix='fa',icon='anchor')).add_to(base_map)
base_map


# **<span style="color:#2b5329;">👆Zoom out to view more Breach Locations👆</span>**

 **<span style="color:#2b5329;">🗺 Let's plot the location from which the most breaches took place </span>**
 
 **<span style="color:#20C20E;"> Latitude -> 39.9075 and Longitude -> -74.0000  </span>**


In [None]:
folium.Marker(location=[ 39.9075,  -74.0000],popup='Maximum Breach Location',tooltip='Maximum Breach Location').add_to(base_map)
base_map

# **<span style="color:#2b5329;">👆Zoom out and pan left to view the most used Location for Breach👆</span>**

**<span style="color:#000000;">This marks the end of EDA: GeoData for hacking attempts </span>** 

**<span style="color:#000000;"> Stay Tuned for more </span>** 

**<span style="color:#000000;">Please share your feedback and suggestions and help me improve 😇 </span>** 