
Commit 990f05c

Web Server log analyzer
1 parent 0ee50ad commit 990f05c

File tree: 3 files changed (+74 additions, −1 deletion)

SCRIPTS.md

Lines changed: 2 additions & 1 deletion
```diff
@@ -118,4 +118,5 @@
 | 109\. | Fetch Contributions | This script is a Python tool that fetches pull requests made by a user in a GitHub organization. | [Take Me](./Fetch_Contributions/) | [Sabhi Sharma](https://github.com/sabhisharma-ise)
 | 109\. | Domain Name Availability Checker | This script is a Python tool that allows you to check the availability of domain names using the GoDaddy API. | [Take Me](./Domain_Name_Availability/) | [Sabhi Sharma](https://github.com/sabhisharma-ise)
 | 110\. | Automatic Spelling Checker and Corrector | This script is used to detect spelling errors in a text and correct them if the user wishes to do so. | [Take Me](./Automatic_Spelling_Checker_Corrector/) | [Sabhi Sharma](https://github.com/sabhisharma-ise)
-| 111\. | File Searcher | The File Search script is a Python tool that allows you to search for files with a specific extension in a directory. It recursively searches through all subdirectories of the specified directory and returns a list of files that match the provided file extension. | [Take Me](https://github.com/avinashkranjan/Amazing-Python-Scripts/tree/master/File\Search) | [Srujana Vanka](https://github.com/srujana-16)
+| 111\. | File Searcher | The File Search script is a Python tool that allows you to search for files with a specific extension in a directory. It recursively searches through all subdirectories of the specified directory and returns a list of files that match the provided file extension. | [Take Me](https://github.com/avinashkranjan/Amazing-Python-Scripts/tree/master/File\Search) | [Srujana Vanka](https://github.com/srujana-16)
+| 112\. | Web Server Log Analysis Script | A Python script to parse and analyze web server logs to extract useful information such as visitor statistics, popular pages, and potential security threats. | [Take Me](https://github.com/avinashkranjan/Amazing-Python-Scripts/tree/master/WebServer) | [Shraddha Singh](https://github.com/shraddha761)
```

WebServer/Readme.md

Lines changed: 21 additions & 0 deletions
# Web Server Log Analysis Script

A Python script that parses web server logs and extracts useful information such as visitor statistics, popular pages, and potential security threats.

## Introduction

This script analyzes web server logs in the Apache Combined Log Format and reports useful statistics. It extracts the total number of requests, unique visitors, popular pages, the status-code distribution, and potential security threats from the logs.

## Features

- Parses Apache Combined Log Format logs
- Counts total requests and unique visitors
- Identifies popular pages based on visit frequency
- Reports the status-code distribution
- Flags potential security threats based on HTTP status codes

## Prerequisites

- Python 3.7 or higher
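As a quick sanity check on the format described above, the script's Combined Log Format pattern can be exercised on a single sample line. The log line below is fabricated for illustration; note that the pattern actually matches the common-log prefix of a combined-format line, and since `re.match` is not end-anchored, any trailing referer and user-agent fields are simply ignored.

```python
import re

# Same pattern as in WebServer/webServerlog.py: IP, identity, user,
# timestamp, method, URL, protocol, status code, and response size.
log_pattern = r'^(\S+) (\S+) (\S+) \[([\w:/]+\s[+\-]\d{4})\] "(\S+)\s?(\S+)?\s?(\S+)?" (\d{3}) (\d+|-)'

sample = '203.0.113.7 - frank [10/Oct/2023:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326'
match = re.match(log_pattern, sample)
ip, _, user, timestamp, method, url, protocol, status, size = match.groups()
print(ip, method, url, status)  # → 203.0.113.7 GET /index.html 200
```

The nine capture groups are positional, so any code consuming `match.groups()` must unpack them in exactly this order.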

WebServer/webServerlog.py

Lines changed: 51 additions & 0 deletions
```python
import re
from collections import Counter

# Regular expression for parsing the Apache Combined Log Format:
# IP, identity, user, timestamp, method, URL, protocol, status code, size.
log_pattern = r'^(\S+) (\S+) (\S+) \[([\w:/]+\s[+\-]\d{4})\] "(\S+)\s?(\S+)?\s?(\S+)?" (\d{3}) (\d+|-)'

def parse_log(log_file_path):
    with open(log_file_path, 'r') as log_file:
        for line in log_file:
            match = re.match(log_pattern, line)
            if match:
                yield match.groups()

def analyze_logs(log_file_path):
    # Initialize counters and sets to store information
    total_requests = 0
    unique_visitors = set()
    page_visits = Counter()
    status_codes = Counter()
    potential_threats = set()

    # Groups: ip, identity, user, timestamp, method, url, protocol, status, size
    for ip, _, _, _, _, url, _, status_code, _ in parse_log(log_file_path):
        total_requests += 1
        unique_visitors.add(ip)
        if url:  # the URL group is optional and may be absent on malformed request lines
            page_visits[url] += 1
        status_codes[status_code] += 1

        # Flag potential security threats (any 4xx client error, e.g. repeated 404s)
        if status_code.startswith('4'):
            potential_threats.add((ip, url))

    return total_requests, len(unique_visitors), page_visits, status_codes, potential_threats

if __name__ == "__main__":
    log_file_path = "path/to/your/log/file.log"

    total_requests, unique_visitors, page_visits, status_codes, potential_threats = analyze_logs(log_file_path)

    print(f"Total Requests: {total_requests}")
    print(f"Unique Visitors: {unique_visitors}")

    print("\nPopular Pages:")
    for page, count in page_visits.most_common(10):
        print(f"{page}: {count} visits")

    print("\nStatus Codes:")
    for code, count in status_codes.items():
        print(f"Status Code {code}: {count} occurrences")

    print("\nPotential Security Threats:")
    for ip, url in potential_threats:
        print(f"IP: {ip}, URL: {url}")
```
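A minimal end-to-end run might look like the following sketch: it writes a few fabricated log lines to a temporary file and feeds them through the same parse-and-count logic as the script (the IP addresses, paths, and file name are invented for illustration).

```python
import re
import tempfile
from collections import Counter

# Same pattern as in WebServer/webServerlog.py.
log_pattern = r'^(\S+) (\S+) (\S+) \[([\w:/]+\s[+\-]\d{4})\] "(\S+)\s?(\S+)?\s?(\S+)?" (\d{3}) (\d+|-)'

sample_lines = [
    '198.51.100.1 - - [10/Oct/2023:13:55:36 +0000] "GET / HTTP/1.1" 200 512',
    '198.51.100.1 - - [10/Oct/2023:13:55:40 +0000] "GET /about HTTP/1.1" 200 1024',
    '203.0.113.9 - - [10/Oct/2023:13:56:02 +0000] "GET /admin HTTP/1.1" 404 0',
]

# Write the sample log to a temporary file, as analyze_logs reads from disk.
with tempfile.NamedTemporaryFile('w', suffix='.log', delete=False) as f:
    f.write('\n'.join(sample_lines) + '\n')
    log_path = f.name

total = 0
visitors = set()
pages = Counter()
threats = set()
with open(log_path) as log_file:
    for line in log_file:
        m = re.match(log_pattern, line)
        if not m:
            continue  # skip lines that do not match the format
        ip, _, _, _, _, url, _, status, _ = m.groups()
        total += 1
        visitors.add(ip)
        pages[url] += 1
        if status.startswith('4'):
            threats.add((ip, url))

print(total, len(visitors))  # → 3 2
print(threats)               # → {('203.0.113.9', '/admin')}
```

Two requests come from the same IP, so three total requests yield two unique visitors, and the single 404 is flagged as a potential threat.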
