# Project Explained

- The goal of this project is to provide a user-friendly web server log analysis system that automates log processing and produces useful insights. 
- To design the system, we reviewed the available methods and tools for web server log analysis. We use machine learning techniques to automatically detect patterns in log data and deliver insights tailored to the application type.
- We also offer a user-friendly interface that lets users quickly spot performance bottlenecks, security concerns, and user behaviour trends.

# Aim

- The objective of this project is to examine web server logs with data analysis and machine learning techniques and extract useful insights from them. Python is used to analyse and transform the data and to build machine learning models that find trends and anomalies in web server log data.
- The work covers pre-processing the log data, feature engineering, and applying machine learning models to the pre-processed data to distinguish between regular and abnormal traffic.
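
As a rough illustration of the pre-processing step, the sketch below parses one raw log line into named fields. It assumes the logs follow the Common Log Format, which the project may or may not use. The names `LOG_PATTERN` and `parse_log_line` and the sample line are illustrative and not taken from the project's code.

```python
import re

# Assumed Common Log Format:
# host ident user [timestamp] "METHOD endpoint protocol" status size
LOG_PATTERN = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<date>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<endpoint>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)$'
)

def parse_log_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = LOG_PATTERN.match(line.strip())
    if match is None:
        return None
    fields = match.groupdict()
    fields['status'] = int(fields['status'])
    # A '-' size means no response body was sent; treat it as 0 bytes
    fields['size'] = 0 if fields['size'] == '-' else int(fields['size'])
    return fields

# Made-up sample line for illustration only
sample = '192.168.1.10 - - [10/Mar/2023:13:55:36 +0000] "GET /config/getuser HTTP/1.1" 200 512'
record = parse_log_line(sample)
```

Lines that do not match the pattern return `None`, so malformed entries can be counted or dropped before feature engineering.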

# Features Explained

**1. Host** : The IP address or hostname of the client that made the request. **Datatype**: integer or string.<br>
**2. Date** : The date when the request was made. **Datatype**: date or string.<br>
**3. Method** : The HTTP method used in the request (e.g., GET, POST). **Datatype**: string.<br>
**4. Endpoint** : The endpoint of the request (e.g., /config/getuser). **Datatype**: string.<br>
**5. Protocol** :  The HTTP protocol used in the request (e.g., HTTP/1.1). **Datatype**: string.<br>
**6. Status Code** : The HTTP status code returned by the server (e.g., 200 OK). **Datatype**: integer or string.<br>
**7. Content Size** : The size of the response body in bytes. **Datatype**: integer.<br>
**8. No of Requests** : The number of requests made to this endpoint. **Datatype**: integer.<br>
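
Unlike the other features, `No of Requests` is a derived count rather than a field read directly from a log line. One way it could be computed is sketched below, assuming it counts requests per endpoint (the project may group differently, for example by host and endpoint). The function `add_request_counts` and the sample `records` are hypothetical.

```python
from collections import Counter

def add_request_counts(records):
    """Return copies of the records, each with a 'No of Requests' value:
    the number of records that share its endpoint."""
    counts = Counter(r['Endpoint'] for r in records)
    return [{**r, 'No of Requests': counts[r['Endpoint']]} for r in records]

# Made-up records for illustration only
records = [
    {'Host': '10.0.0.1', 'Endpoint': '/config/getuser'},
    {'Host': '10.0.0.2', 'Endpoint': '/config/getuser'},
    {'Host': '10.0.0.1', 'Endpoint': '/login'},
]
enriched = add_request_counts(records)
```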

# Model Deployment

In [None]:
import streamlit as st
from pymongo import MongoClient
import pandas as pd
import numpy as np
import joblib

In [None]:
model1 = joblib.load('model1.pkl')
model2 = joblib.load('model4.pkl')
df = pd.read_csv('third_result.csv')

In [None]:
# Column values as plain Python lists
ip = df['Host'].tolist()
method = df['Method'].tolist()
endpoint = df['Endpoint'].tolist()
protocol = df['Protocol'].tolist()
status_code = df['Status Code'].tolist()
content_size = df['Content Size'].tolist()
no_of_requests = df['No_of_Request'].tolist()

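if st.button('Select a Random IP Configuration to Test'):
    # Draw one random row and keep the model features, in input order, as a plain list
    feature_columns = ['Host', 'Method', 'Endpoint', 'Protocol',
                       'Status Code', 'Content Size', 'No_of_Request']
    list1 = df[feature_columns].sample().iloc[0].tolist()
    st.write('IP Address Selected : ', list1[0])
    st.write('Method Selected : ', list1[1])
    st.write('Endpoint Selected : ', list1[2])
    st.write('Protocol Selected : ', list1[3])
    st.write('Status Code Selected : ', list1[4])
    st.write('Content Size Selected : ', list1[5])
    st.write('No of Requests Selected : ', list1[6])
else:
    # Nothing selected yet: halt this run so the prediction code is not reached without list1
    st.stop()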

In [None]:
def predict(model, list1):
    # Build a single-row DataFrame with the feature columns the model expects
    data = pd.DataFrame({
        'Host': [list1[0]],
        'Method': [list1[1]],
        'Endpoint': [list1[2]],
        'Protocol': [list1[3]],
        'Status Code': [list1[4]],
        'Content Size': [list1[5]],
        'No of Requests': [list1[6]]
    })

    prediction = model.predict(data)

    return prediction

In [None]:

prediction1 = predict(model1, list1)
prediction2 = predict(model2, list1)  # model2 is the object loaded from 'model4.pkl'

st.markdown('## Prediction')
st.write('The predicted value using model 1 is: ', prediction1)
st.write('The predicted value using model 2 is: ', prediction2)

# Model Statistics