## 6) Identify three IP scan/attack pairs. A scan/attack pair is where one IP is used to scan the honeypots and a second IP or more are used to attack the honeypots.   
a. State how you determine the three scan/attack pairs?  
b. Support your theory.  
Note: The attacking IP may or may not be within the same IP subnet.

#### Server description

- Snort is not a honeypot but an IDS/IPS and is used to detect attacks.
- Dionaea is a low interaction honeypot that exposes services such as MSSQL, SIP, HTTP, FTP, TFTP, SFTP, SSH, and SMB.
- Cowrie is a medium interaction SSH honeypot designed to log brute force attacks and shell interactions performed by the attackers.
- Elastichoney is a simple Elasticsearch honeypot designed to catch attackers exploiting remote code execution (RCE) vulnerabilities in Elasticsearch.
- Glastopf is a Python based Web application honeypot that has the ability to emulate thousands of web vulnerabilities.
- Wordpot is a WordPress emulator honeypot.
- ShockPot is a web application honeypot that exposes the vulnerability CVE-2014-6271 (Shellshock).
***
- 192.168.10.2 - dionaea
- 192.168.10.5 - cowrie
- 192.168.10.6 - elastic
- 192.168.10.6 - glastopf
- 192.168.10.3 - wordpot
- 192.168.10.4 - shockpot


- Possibly from the same origin, scanning for ports that might be open on a particular honeypot

Questions 6 and 7 seem to have stumped a few students.  

There are a couple of ways to approach Question 6. The first is to use the entire data set, but that is a lot of data and can be overwhelming.  
The second approach is to use the data collected for one of the previous questions. Question 4 asked you to find the attacks that were successful, and starting from that data greatly reduces how much you need to look through. From here, there are two options. The first is to take the IP address and the attack date, then look through the AllTraffic file for any earlier activity from that IP. If there is none, either the attacker was very lucky and succeeded on the first attempt, or this IP is a good candidate for one half of a scan/attack pair: how could the attacker know about the vulnerability without some kind of scan?  
You can then look for other IPs that probed the same honeypot for the same vulnerability. There is a good chance, though no guarantee, that such a failed attempt shows up as a Snort entry. In fact, you might find several IPs that could be the other half; if so, list all the potential candidates. You can narrow the list down if you spot the same two IPs together against other honeypots. I would limit the search to a 7-day period.  
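
The first option could be sketched roughly as follows. This is only a minimal illustration on a toy frame, assuming the column names used later in this notebook (`timestamp`, `source_ip`, `destination_ip`). The helper names `has_prior_activity` and `candidate_scanners` and the sample rows are hypothetical, and "same honeypot" is approximated here by the destination IP alone.

```python
import pandas as pd

# Toy stand-in for the AllTraffic data (hypothetical rows, documentation IP ranges)
toy = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2017-04-01 10:00", "2017-04-03 09:00", "2017-04-05 12:00", "2017-03-01 08:00",
    ]),
    "source_ip": ["203.0.113.1", "203.0.113.2", "198.51.100.9", "203.0.113.3"],
    "destination_ip": ["192.168.10.5"] * 4,
})

def has_prior_activity(df, ip, attack_time):
    """True if `ip` appears anywhere in the data before `attack_time`."""
    earlier = df[(df["source_ip"] == ip) & (df["timestamp"] < attack_time)]
    return not earlier.empty

def candidate_scanners(df, attacker_ip, honeypot_ip, attack_time, days=7):
    """Other IPs that touched the same honeypot in the `days` before the attack."""
    window_start = attack_time - pd.Timedelta(days=days)
    mask = (
        (df["destination_ip"] == honeypot_ip)
        & (df["source_ip"] != attacker_ip)
        & (df["timestamp"] >= window_start)
        & (df["timestamp"] < attack_time)
    )
    return sorted(df.loc[mask, "source_ip"].unique())

attack_time = pd.Timestamp("2017-04-05 12:00")
# The attacker has no earlier rows, so it is a candidate for the "attack" half
first_hit = not has_prior_activity(toy, "198.51.100.9", attack_time)
# IPs seen on the same honeypot within the previous 7 days are candidate scanners
scanners = candidate_scanners(toy, "198.51.100.9", "192.168.10.5", attack_time)
```

Every address returned by `candidate_scanners` is only a candidate; matching the probed port or vulnerability would still have to be checked by hand.
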
The other way is to look at the top 50 attacking IPs by attack volume. Many of these IPs will be in the same subnet (x.x.x.0-255). Once you have identified a number of IPs within the same subnet, create a time graph for all the IPs in that subnet range. In these graphs you should notice activity that looks like a heartbeat. You might also find different IPs displaying a similar heartbeat: the same shape, but occurring at a different time. These could represent scan and attack pairs. Further investigation would be needed to prove this, but that is not required.
Once you find three scan/attack pairs, you should be able to answer Question 7.  
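
One way to look for the "heartbeat" without reading every graph by eye is to build a per-day event count for each IP in a subnet and then ask at which time shift two IPs' series line up best. The sketch below is an assumption-laden illustration, not the method used in this notebook: `daily_counts` and `best_lag` are hypothetical helpers, the data is synthetic, and plain Pearson correlation is just one possible similarity measure.

```python
import pandas as pd

def daily_counts(df, prefix):
    """Events per day (rows) for every source IP starting with `prefix` (columns)."""
    sub = df[df["source_ip"].str.startswith(prefix)]
    days = sub["timestamp"].dt.floor("D")
    counts = pd.crosstab(days, sub["source_ip"])
    # Add the days without any traffic so all series share one continuous index
    full_range = pd.date_range(days.min(), days.max(), freq="D")
    return counts.reindex(full_range, fill_value=0)

def best_lag(a, b, max_lag=14):
    """Delay in days (0..max_lag) at which series `b` best matches series `a`."""
    scores = {lag: a.corr(b.shift(-lag)) for lag in range(max_lag + 1)}
    return int(pd.Series(scores).idxmax())

# Synthetic example: the second IP repeats the first IP's pattern 3 days later
start = pd.Timestamp("2017-05-01")
pattern = {0: 5, 2: 9, 5: 2}  # day offset -> number of events
rows = []
for day, n in pattern.items():
    rows += [(start + pd.Timedelta(days=day), "198.51.100.10")] * n
    rows += [(start + pd.Timedelta(days=day + 3), "198.51.100.20")] * n
toy = pd.DataFrame(rows, columns=["timestamp", "source_ip"])

counts = daily_counts(toy, "198.51.100.")
lag = best_lag(counts["198.51.100.10"], counts["198.51.100.20"], max_lag=5)
```

Calling `counts.plot()` on the result gives the time graph described above, one line per IP. A high correlation at a non-zero lag only suggests a pair and still needs manual confirmation.
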


In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

from IPython.display import display, HTML
display(HTML("<style>.container { width:90% !important; }</style>"))

#df_all = pd.read_csv("sorted-AllTraffic.csv",sep='\t', lineterminator='\n')
#df_dio = pd.read_csv("sorted-dionaea.csv",sep='\t', lineterminator='\n')
#df_ssh = pd.read_csv("sorted-cowrie.csv",sep='\t', lineterminator='\n')

In [2]:
df_raw = pd.read_csv("sorted-AllTraffic.csv",sep='\t', lineterminator='\n')
# with lineterminator='\n' the last column is read as "asn\r"; drop the trailing "\r" from its values
df_raw["asn\r"]=df_raw["asn\r"].str.slice(stop=-1)
df_raw.timestamp = pd.to_datetime(df_raw.timestamp)
# relabel each Snort alert by the honeypot it targeted, based on the destination IP
df_raw.loc[(df_raw['channel'] == "snort.alerts") & (df_raw['destination_ip'] == "192.168.10.2"),'channel'] = "snort.dionaea.connections"
df_raw.loc[(df_raw['channel'] == "snort.alerts") & (df_raw['destination_ip'] == "192.168.10.3"),'channel'] = "snort.wordpot.events"
df_raw.loc[(df_raw['channel'] == "snort.alerts") & (df_raw['destination_ip'] == "192.168.10.4"),'channel'] = "snort.shockpot.events"
df_raw.loc[(df_raw['channel'] == "snort.alerts") & (df_raw['destination_ip'] == "192.168.10.5"),'channel'] = "snort.cowrie.sessions"
df_raw.loc[(df_raw['channel'] == "snort.alerts") & (df_raw['destination_ip'] == "192.168.10.6"),'channel'] = "snort.glastopf.events"



In [3]:
df=df_raw[['timestamp', 'channel', 'source_ip',
       'destination_ip', 'destination_port', 'protocol',
       'city', 'country', 'asn\r']]

df.sample()

Unnamed: 0,timestamp,channel,source_ip,destination_ip,destination_port,protocol,city,country,asn
61555,2016-09-27 04:31:24.698,dionaea.connections,183.81.95.111,192.168.10.2,23,pcap,Hanoi,Vietnam,The Corporation for Financing & Promoting Tech...


In [4]:
df_ip = df_raw.groupby('source_ip').size().to_frame(name = 'occurence')
df_ip2 = df_ip[df_ip.occurence > 10].sort_index().reset_index()

# split each IP into its four octets, one column per octet
df_ip3 = pd.DataFrame(df_ip2.source_ip.str.split('.',expand=True).values.tolist(),columns = ['ip1','ip2','ip3','ip4'])
# join the octet columns back onto the per-IP counts
df_ip2 = df_ip2.join(df_ip3)
# flag each 3-octet prefix that covers more than 10 of the remaining IPs
df_ip4 = ((df_ip2.groupby(['ip1','ip2','ip3']).size().to_frame(name = 'match') > 10).reset_index())

In [5]:
# keep only the prefixes flagged as covering many IPs
df_ip5 = df_ip4[df_ip4.match == True]
# rebuild the 3-octet prefix string (e.g. '121.18.238.') for each remaining row
df_ip6 = df_ip5.apply(lambda x: x['ip1'] + '.' + x['ip2']+ '.' + x['ip3'] + '.',axis=1)
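
The prefix filtering in the two cells above can be cross-checked on a small input. The sketch below is a compact variant of the same idea, not a replacement for the cells above: `busy_prefixes` is a hypothetical helper, its thresholds are parameters so a toy example can use small values, and the sample addresses come from documentation ranges.

```python
import pandas as pd

def busy_prefixes(source_ips, min_events=10, min_hosts=10):
    """Return 3-octet prefixes ('a.b.c.') with more than `min_hosts` active IPs.

    An IP counts as active when it has more than `min_events` rows.
    """
    per_ip = source_ips.value_counts()
    active = per_ip[per_ip > min_events].index.to_series()
    # Drop the last octet: '203.0.113.7' -> '203.0.113.'
    prefixes = active.str.rsplit(".", n=1).str[0] + "."
    hosts_per_prefix = prefixes.value_counts()
    return sorted(hosts_per_prefix[hosts_per_prefix > min_hosts].index)

# Three hosts in one prefix with 3 rows each, plus one lone host with 5 rows
toy = pd.Series(
    [f"203.0.113.{host}" for host in range(1, 4) for _ in range(3)]
    + ["198.51.100.7"] * 5
)
result = busy_prefixes(toy, min_events=2, min_hosts=2)
```

On the real data the call would be along the lines of `busy_prefixes(df_raw['source_ip'])`, using the default thresholds of 10.
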

In [15]:
pd.set_option('display.max_colwidth',120)
pd.set_option('display.max_rows', 500)

list_ip = list(df_ip6)

def probe_ip(v):
    # startswith avoids treating the dots in the prefix as regex wildcards
    df_temp = df[df['source_ip'].str.startswith(v)]
    df_ipfq = df_temp.groupby([pd.Grouper(key='timestamp', freq='7D'),'source_ip'])['channel'].agg(['count']).reset_index()
    df_ipfq2 = df_temp.groupby([pd.Grouper(key='timestamp', freq='7D'),'source_ip'])['channel'].apply(np.unique).reset_index()
    df_temp = pd.concat([df_ipfq,df_ipfq2.channel],join='outer',axis=1)
    display(df_temp)

In [23]:
print("Prefixes with multiple attacking IPs: ", list(df_ip6))

Prefixes with multiple attacking IPs:  ['121.18.238.', '121.205.66.', '141.212.122.', '158.85.81.', '164.52.0.', '168.1.128.', '182.18.8.', '184.105.139.', '184.105.247.', '196.52.43.', '216.218.206.', '221.194.44.', '31.207.47.', '45.55.21.', '58.242.83.', '59.45.175.', '74.82.47.', '77.72.82.', '89.163.242.', '91.195.103.', '91.211.0.', '93.174.93.']



**Answers**

The method I chose is based on the subnets of the source IPs. The rows are first grouped by source IP to get a count per address, and any IP with 10 or fewer events is dropped. Each remaining address is then split on the period delimiter into 4 new columns. A second groupby on the first 3 columns counts how many distinct addresses share each 3-octet prefix, and a prefix is dropped if that count is 10 or fewer, meaning too few addresses in that subnet were active. The 3 columns are then rejoined to produce the final list of 3-octet prefixes, each of which covers many active addresses. Next I iterate through the list, matching each prefix against the source IP to get the full list of activity from every address in that subnet. Finally, I manually look through each list for patterns, singling out an IP that sweeps across multiple honeypots and another IP or set of IPs that follows up with a series of attacks.

Since the criteria are rather lenient, I focus only on the most obvious scan/attack pairs, those that reside in the same subnet.
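
The manual inspection step could be supported by a per-IP summary that contrasts breadth (how many honeypots an address touched) with volume (how many events it generated). The following is a minimal sketch under stated assumptions: `summarize_prefix` is a hypothetical helper, the honeypot name is derived from the relabeled `channel` values by dropping a leading `snort.` and keeping the first token, and the toy rows are invented.

```python
import pandas as pd

def summarize_prefix(df, prefix):
    """Per-IP event count, distinct honeypots, and first/last seen for one prefix."""
    sub = df[df["source_ip"].str.startswith(prefix)].copy()
    # 'snort.cowrie.sessions' -> 'cowrie', 'dionaea.connections' -> 'dionaea'
    # (an unrelabeled 'snort.alerts' row would end up as 'alerts')
    sub["honeypot"] = (
        sub["channel"].str.replace(r"^snort\.", "", regex=True).str.split(".").str[0]
    )
    return sub.groupby("source_ip").agg(
        events=("channel", "count"),
        honeypots=("honeypot", "nunique"),
        first_seen=("timestamp", "min"),
        last_seen=("timestamp", "max"),
    )

# Toy data: one IP sweeps four honeypots once each, another hammers cowrie later
toy = pd.DataFrame({
    "timestamp": pd.to_datetime(["2017-04-04"] * 4 + ["2017-05-17"] * 6),
    "source_ip": ["203.0.113.5"] * 4 + ["203.0.113.9"] * 6,
    "channel": [
        "cowrie.sessions", "snort.glastopf.events",
        "snort.shockpot.events", "snort.wordpot.events",
    ] + ["cowrie.sessions", "snort.cowrie.sessions"] * 3,
})
summary = summarize_prefix(toy, "203.0.113.")
```

In such a summary, a wide-but-shallow row (many honeypots, few events) looks like a scanner, while a narrow-but-heavy row looks like an attacker. Where to draw that line remains a judgment call.
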

### Set 1

#### Probe by 121.18.238.114
2017-04-04	121.18.238.114	11	[cowrie.sessions, snort.cowrie.sessions, snort.shockpot.events]  
2017-04-05	121.18.238.114	5	[cowrie.sessions, snort.shockpot.events, snort.wordpot.events]  
2017-04-06	121.18.238.114	16	[cowrie.sessions, snort.cowrie.sessions, snort.glastopf.events, snort.shockpot.events, snort.wordpot.events]  

#### Probe by 121.18.238.122
2017-04-09	121.18.238.122	21	[cowrie.sessions, snort.cowrie.sessions]  
2017-04-16	121.18.238.122	3	[cowrie.sessions]  
2017-04-17	121.18.238.122	5	[cowrie.sessions, snort.shockpot.events]  
2017-04-18	121.18.238.122	43	[cowrie.sessions, snort.shockpot.events, snort.wordpot.events]  
2017-04-19	121.18.238.122	96	[cowrie.sessions, snort.cowrie.sessions, snort.shockpot.events]  
2017-04-20	121.18.238.122	8	[cowrie.sessions, snort.glastopf.events]  
2017-04-21	121.18.238.122	46	[cowrie.sessions, snort.cowrie.sessions, snort.glastopf.events, snort.wordpot.events]  
2017-04-22	121.18.238.122	10	[cowrie.sessions, snort.wordpot.events]  
2017-04-23	121.18.238.122	28	[cowrie.sessions, snort.cowrie.sessions, snort.glastopf.events]  
2017-04-24	121.18.238.122	10	[cowrie.sessions, snort.shockpot.events, snort.wordpot.events]  
2017-04-25	121.18.238.122	87	[cowrie.sessions, snort.cowrie.sessions, snort.glastopf.events, snort.wordpot.events]  


#### Probe/Attack by 121.18.238.(106,119,123,125)

From May through August, .106, .119, .123, and .125 were used to heavily attack cowrie, shockpot, wordpot, and glastopf on a daily basis.

2017-05-17	121.18.238.106	3	[cowrie.sessions, snort.wordpot.events]  
2017-05-17	121.18.238.119	54	[cowrie.sessions, snort.cowrie.sessions, snort.shockpot.events, snort.wordpot.events]  
2017-05-17	121.18.238.123	10	[cowrie.sessions, snort.cowrie.sessions, snort.glastopf.events, snort.shockpot.events]  
2017-05-17	121.18.238.125	12	[cowrie.sessions, snort.cowrie.sessions, snort.wordpot.events]  
2017-05-18	121.18.238.106	13	[cowrie.sessions, snort.cowrie.sessions, snort.glastopf.events, snort.shockpot.events, snort.wordpot.events]  
2017-05-18	121.18.238.119	7	[cowrie.sessions, snort.glastopf.events, snort.shockpot.events, snort.wordpot.events]  
2017-05-18	121.18.238.123	12	[cowrie.sessions, snort.cowrie.sessions, snort.glastopf.events]  
2017-05-18	121.18.238.125	8	[cowrie.sessions, snort.cowrie.sessions, snort.glastopf.events, snort.shockpot.events, snort.wordpot.events]  

Example of the attacks continuing into August:

2017-08-07	121.18.238.106	153	[cowrie.sessions, snort.cowrie.sessions]  
2017-08-07	121.18.238.119	199	[cowrie.sessions, snort.cowrie.sessions]  
2017-08-07	121.18.238.123	222	[cowrie.sessions, snort.cowrie.sessions]  
2017-08-07	121.18.238.125	162	[cowrie.sessions, snort.cowrie.sessions]  

### Set 2
#### Probe by 164.52.0.130

On July 17 and 24 and on Aug 14 and 21, .130 and .135 were used to scout cowrie, glastopf, dionaea, shockpot, and wordpot.

2017-07-17	164.52.0.130	4	[snort.cowrie.sessions, snort.glastopf.events]  
2017-07-19	164.52.0.130	137	[dionaea.connections, snort.dionaea.connections, snort.shockpot.events, snort.wordpot.events]  

#### July 24 attack by 164.52.0.x

On July 24, .132, .134, .136, .138, .139, and .140 were used to attack glastopf, shockpot, wordpot, and mainly dionaea.

2017-07-31 18:50:37.479	164.52.0.136	190	[dionaea.connections]  
2017-07-31 18:50:37.479	164.52.0.137	198	[dionaea.connections]  
2017-07-31 18:50:37.479	164.52.0.138	102	[dionaea.connections]  
2017-07-31 18:50:37.479	164.52.0.139	111	[dionaea.connections]  
2017-07-31 18:50:37.479	164.52.0.140	206	[dionaea.connections]  

### Set 3

#### Probe by 221.194.44.224 in March and April
From March through April 2017, .224 conducted a series of scouting runs against all honeypots and stopped on Apr 25.

2017-03-31 221.194.44.224	57	[cowrie.sessions, snort.cowrie.sessions, snort.glastopf.events, snort.shockpot.events, snort.wordpot.events]  
2017-04-07 221.194.44.224	53	[cowrie.sessions, snort.cowrie.sessions, snort.glastopf.events, snort.shockpot.events, snort.wordpot.events]  
2017-04-21 221.194.44.224	178	[cowrie.sessions, snort.cowrie.sessions, snort.glastopf.events, snort.shockpot.events]



#### Attack by .212 from May till August
Attacks from .212 came in full force after the scouting finished. The attacks on cowrie lasted for 3 months.

2017-06-16 221.194.44.212	189	[cowrie.sessions, snort.cowrie.sessions]  
2017-06-23 221.194.44.212	609	[cowrie.sessions, snort.cowrie.sessions]  
2017-06-30 221.194.44.212	114	[cowrie.sessions, snort.cowrie.sessions]  
2017-07-07 221.194.44.212	49	[cowrie.sessions, snort.cowrie.sessions]  
2017-07-14 221.194.44.212	1289	[cowrie.sessions, snort.cowrie.sessions]  
2017-07-21 221.194.44.212	1345	[cowrie.sessions, snort.cowrie.sessions]  


In [21]:
probe_ip(list_ip[11])

Unnamed: 0,timestamp,source_ip,count,channel
0,2016-10-21 11:28:57.242,221.194.44.160,7,"[dionaea.connections, snort.cowrie.sessions, snort.dionaea.connections]"
1,2016-11-11 11:28:57.242,221.194.44.219,50,"[cowrie.sessions, snort.cowrie.sessions, snort.dionaea.connections, snort.glastopf.events, snort.shockpot.events, sn..."
2,2016-11-11 11:28:57.242,221.194.44.224,25,"[cowrie.sessions, snort.cowrie.sessions, snort.glastopf.events, snort.wordpot.events]"
3,2016-11-18 11:28:57.242,221.194.44.219,120,"[cowrie.sessions, snort.cowrie.sessions, snort.dionaea.connections, snort.glastopf.events, snort.shockpot.events, sn..."
4,2016-11-18 11:28:57.242,221.194.44.224,51,"[cowrie.sessions, snort.cowrie.sessions, snort.shockpot.events, snort.wordpot.events]"
5,2016-11-25 11:28:57.242,221.194.44.160,1,[snort.shockpot.events]
6,2016-11-25 11:28:57.242,221.194.44.219,213,"[cowrie.sessions, snort.cowrie.sessions, snort.dionaea.connections, snort.glastopf.events, snort.shockpot.events, sn..."
7,2016-11-25 11:28:57.242,221.194.44.224,70,"[cowrie.sessions, snort.cowrie.sessions, snort.glastopf.events, snort.shockpot.events, snort.wordpot.events]"
8,2016-11-25 11:28:57.242,221.194.44.229,3,[dionaea.connections]
9,2016-12-02 11:28:57.242,221.194.44.219,250,"[cowrie.sessions, snort.cowrie.sessions, snort.dionaea.connections, snort.glastopf.events, snort.shockpot.events, sn..."


df[df.source_ip.str.startswith('121.18.238.')]

***