<a href="https://colab.research.google.com/github/dilaraogz/webtrafficlog/blob/main/webtrafficlogipynb.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
# Install the required libraries
!pip install pandas scikit-learn faiss-cpu transformers

import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
import faiss
from transformers import pipeline




In [None]:
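# Upload the CSV file containing the web traffic logs
import io
from google.colab import files

uploaded = files.upload()  # Choose your CSV file in the upload dialog

# Read the uploaded bytes directly instead of a hard-coded file name:
# Colab renames repeated uploads (e.g. 'web_traffic_logs (1).csv'),
# so a fixed name can silently read an older copy
uploaded_name, uploaded_bytes = next(iter(uploaded.items()))
log_data = pd.read_csv(io.BytesIO(uploaded_bytes))

# Select the required fields and drop rows with status code 500
selected_data = log_data[['IP Address', 'Timestamp', 'Method', 'URL', 'Status Code', 'User Agent']]
# .copy() makes cleaned_data an independent DataFrame, so the column
# assignment below does not trigger SettingWithCopyWarning
cleaned_data = selected_data[selected_data['Status Code'] != 500].copy()

# Combine the URL and the user agent into a single text field
cleaned_data['text_data'] = cleaned_data['URL'] + ' ' + cleaned_data['User Agent']

# Vectorize with TF-IDF
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(cleaned_data['text_data'])

# Check the matrix shape: (number of rows, vocabulary size)
X.shape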


Saving web_traffic_logs.csv to web_traffic_logs (1).csv


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  cleaned_data['text_data'] = cleaned_data['URL'] + ' ' + cleaned_data['User Agent']


(766, 40)
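
The shape `(766, 40)` means 766 log rows and a vocabulary of only 40 terms. The sketch below suggests why the vocabulary is so small, assuming `TfidfVectorizer` runs with its default settings (lowercasing and the token pattern `(?u)\b\w\w+\b`). The helper `default_tokens` and the sample string are illustrative and not part of the notebook.

```python
import re

# Default TfidfVectorizer token pattern: runs of two or more word characters.
TOKEN_PATTERN = re.compile(r"(?u)\b\w\w+\b")

def default_tokens(text):
    """Approximate the default analyzer: lowercase, then extract tokens by regex."""
    return TOKEN_PATTERN.findall(text.lower())

sample = "/index.html Mozilla/5.0 (Windows NT 10.0; Win64; x64)"
print(default_tokens(sample))
```

Slashes, dots and parentheses act as separators, and single-character pieces such as `5` and `0` are dropped, so `/index.html` contributes only the terms `index` and `html`. Since `text_data` is built from the URL and user agent alone, a method name such as `GET` in a query has no vocabulary entry (unless it happens to occur in a URL or user agent) and is ignored by `vectorizer.transform`.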

In [None]:
# Build the FAISS index
d = X.shape[1]  # Vector dimensionality
index = faiss.IndexFlatL2(d)
# FAISS works on dense float32 arrays; the TF-IDF matrix is sparse float64
index.add(X.toarray().astype('float32'))

# Set up the language model
generator = pipeline('text-generation', model='gpt2')

# Build a sample query and find the closest records with FAISS
query = "GET /index.html Mozilla/5.0"
query_vec = vectorizer.transform([query]).toarray().astype('float32')
D, I = index.search(query_vec, 5)  # 5 nearest neighbours

# Join the retrieved log records into one string
retrieved_logs = ' '.join(cleaned_data.iloc[I[0]]['text_data'].tolist())

# Generate a response from the retrieved logs; max_new_tokens caps the generated length
response = generator(retrieved_logs, max_new_tokens=50, num_return_sequences=1)

# Display the response
print(response[0]['generated_text'])


Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


/index.html Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.0.0 Safari/537.36 /index.html Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.0.0 Safari/537.36 /index.html Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.0.0 Safari/537.36 /index.html Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.0.0 Safari/537.36 /index.html Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.0.0 Safari/537.36 /index.html Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.0.0 Safari/537.
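
`IndexFlatL2` does an exhaustive search: `D` holds squared Euclidean distances and `I` holds the row positions of the nearest vectors. The pure-Python sketch below mirrors that behaviour on toy data. `brute_force_search` and the vectors are hypothetical and only illustrate the idea; they are not how FAISS is implemented.

```python
def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def brute_force_search(vectors, query, k):
    """Return (distances, indices) of the k vectors closest to the query."""
    scored = sorted((squared_l2(v, query), i) for i, v in enumerate(vectors))
    top = scored[:k]
    return [dist for dist, _ in top], [idx for _, idx in top]

# Toy data: rows 0 and 3 are identical, like repeated log lines.
vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, 0.0]]
distances, indices = brute_force_search(vectors, [1.0, 0.0], k=3)
print(distances, indices)
```

Identical rows all sit at distance 0 from a matching query, which is likely why the five retrieved records in the output above are the same log line repeated.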


In [None]:
def ask_question(query):
    # Vectorize the query (float32, as FAISS expects)
    query_vec = vectorizer.transform([query]).toarray().astype('float32')

    # Find the nearest neighbours with FAISS
    D, I = index.search(query_vec, 5)

    # Join the retrieved log records into one string
    retrieved_logs = ' '.join(cleaned_data.iloc[I[0]]['text_data'].tolist())

    # Generate a response with the language model, capped by max_new_tokens
    response = generator(retrieved_logs, max_new_tokens=50, num_return_sequences=1)

    return response[0]['generated_text']

# Test the system
print(ask_question("GET /products Mozilla/5.0"))


Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


/products Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.0.0 Safari/537.36 /products Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.0.0 Safari/537.36 /products Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.0.0 Safari/537.36 /products Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.0.0 Safari/537.36 /products Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.0.0 Safari/537.36 /products Mozilla/5.0 (Windows NT 10.0; Win64; x64]

It's an ugly simple, but useful tool. When looking at a set of images, you may notice that the "small" image is
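
Because the prompt passed to `generator` contains only the retrieved log lines, GPT-2 mostly continues the repetition and then drifts into unrelated text. One possible adjustment, sketched here under the assumption that retrieved records are plain strings, is to drop duplicate lines and append the user's question. `build_prompt` and its template are hypothetical. GPT-2 is not instruction-tuned, so this may improve relevance only modestly.

```python
def build_prompt(question, retrieved_lines, max_lines=3):
    """Build a prompt from de-duplicated log lines plus the question."""
    # dict.fromkeys removes duplicates while preserving the original order.
    unique_lines = list(dict.fromkeys(retrieved_lines))[:max_lines]
    context = "\n".join(f"- {line}" for line in unique_lines)
    return f"Log entries:\n{context}\n\nQuestion: {question}\nAnswer:"

example_prompt = build_prompt(
    "Which browser requested /products?",
    ["/products Mozilla/5.0", "/products Mozilla/5.0", "/login Mozilla/5.0"],
)
print(example_prompt)
```

Inside `ask_question`, the result of `build_prompt(query, cleaned_data.iloc[I[0]]['text_data'].tolist())` could replace `retrieved_logs` as the generator input.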


In [None]:
import time

# Performance test
start_time = time.time()
response = ask_question("POST /login Mozilla/5.0")
end_time = time.time()

print("Response:", response)
print("Response time:", end_time - start_time)


Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Response: /login Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.0.0 Safari/537.36 /login Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.0.0 Safari/537.36 /login Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.0.0 Safari/537.36 /login Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.0.0 Safari/537.36 /login Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.0.0 Safari/537.36 /login Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.0.0 Safari/537.36 /
Response time: 4.109825134277344
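
The single figure above (about 4.1 seconds) covers query vectorization, the FAISS search and GPT-2 generation together. To see where the time goes, each stage can be timed on its own. This is a minimal sketch using `time.perf_counter`. The `timed` helper and the stand-in function `slow_square` are hypothetical.

```python
import time

def timed(fn, *args, **kwargs):
    """Call fn and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start

def slow_square(x):
    """Stand-in for an expensive stage such as text generation."""
    time.sleep(0.01)
    return x * x

result, elapsed = timed(slow_square, 7)
print(result, f"{elapsed:.4f} s")
```

In the notebook, wrapping `index.search(query_vec, 5)` and the `generator(...)` call separately in `timed` would show which stage dominates. With an index of this size, generation is likely the larger share, but that is an expectation to verify rather than a measured result.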
