<a href="https://colab.research.google.com/github/AmirKatorza/Log_File_Analyze/blob/main/log_file_analyzing.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Analyzing a Log File**

In [11]:
import pandas as pd

col_names = ["Timestamp", "Log_Level", "integer1", "integer2", "message"]
df = pd.read_csv("exam.log", sep="\t", header=None, names=col_names)
df

Unnamed: 0,Timestamp,Log_Level,integer1,integer2,message
0,19-3-2020 06:39:34.157,INFO,11,188,Log 187 - user 5256010 at realtime 2023-07-02 ...
1,19-3-2020 06:39:34.196,INFO,14,193,no mailbox necessary for traffic-CONTROL
2,19-3-2020 06:39:34.240,INFO,15,192,mailbox allocated for rsvp-api
3,19-3-2020 06:39:34.280,INFO,10,191,"interface 127.0.0.1, entity for rsvp allocated..."
4,19-3-2020 06:39:34.324,INFO,13,190,parsing transaction entries
...,...,...,...,...,...
4443,19-3-2020 06:42:46.961,INFO,17,192,creating mailslot for dump
4444,19-3-2020 06:42:46.961,INFO,10,193,transaction 17333 begun
4445,19-3-2020 06:42:47.062,INFO,12,184,creating mailslot for RSVP
4446,19-3-2020 06:42:47.062,INFO,13,194,"transaction done, id=17333"
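Before relying on the parsed frame, it can help to confirm that every line split into the expected five fields. The snippet below is a minimal sketch of such a check. It assumes the file layout is exactly what the `read_csv` call above implies (tab-separated, no header), and it uses a hypothetical two-line in-memory sample (`sample_log`) rather than the real `exam.log`.

```python
import io
import pandas as pd

# Hypothetical two-line sample in the same tab-separated layout assumed for exam.log.
sample_log = (
    "19-3-2020 06:39:34.196\tINFO\t14\t193\tno mailbox necessary for traffic-CONTROL\n"
    "19-3-2020 06:39:34.240\tINFO\t15\t192\tmailbox allocated for rsvp-api\n"
)

col_names = ["Timestamp", "Log_Level", "integer1", "integer2", "message"]
sample_df = pd.read_csv(io.StringIO(sample_log), sep="\t", header=None, names=col_names)

# Sanity checks: expected shape, no missing fields, and the inferred column types.
print(sample_df.shape)
print(sample_df.isna().sum())
print(sample_df.dtypes)
```

A non-zero count from `isna()` would suggest that some lines have fewer tab-separated fields than expected.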


## **Return the first line of the file**

In [12]:
df.head(1)

Unnamed: 0,Timestamp,Log_Level,integer1,integer2,message
0,19-3-2020 06:39:34.157,INFO,11,188,Log 187 - user 5256010 at realtime 2023-07-02 ...


In [13]:
df.iloc[0]

Timestamp                               19-3-2020 06:39:34.157
Log_Level                                                 INFO
integer1                                                    11
integer2                                                   188
message      Log 187 - user 5256010 at realtime 2023-07-02 ...
Name: 0, dtype: object

## **How many lines with log level ERROR are there in the file?**

Each log line carries a log level. For errors, the level is always ERROR.

In [15]:
# Summing the boolean mask counts the rows whose level is exactly "ERROR".
error_count = (df["Log_Level"] == "ERROR").sum()
print(f"Number of lines with log level ERROR: {error_count}")

Number of lines with log level ERROR: 25
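Counting every level at once gives some context for the ERROR count. The following is a rough sketch using `value_counts` on a hypothetical series of levels. The `WARN` value is an assumption for illustration only; the source mentions only INFO and ERROR.

```python
import pandas as pd

# Hypothetical log levels; in the notebook these would come from df["Log_Level"].
levels = pd.Series(["INFO", "ERROR", "INFO", "WARN", "ERROR", "INFO"], name="Log_Level")

# One pass over the column yields the count for every level.
level_counts = levels.value_counts()

# .get returns the fallback (0) if no ERROR lines exist at all.
error_count = int(level_counts.get("ERROR", 0))

print(level_counts)
print(error_count)
```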


## **How many transactions are in the logfile?**

In [17]:
# Raw string so that \d is passed to the regex engine instead of being treated as a string escape.
num_transactions = df['message'].str.contains(r"transaction \d+ begun", case=False).sum()
print(f"Number of transactions in the log file: {num_transactions}")

Number of transactions in the log file: 106
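Counting "begun" lines assumes each transaction ID appears only once. A possible cross-check, sketched below on a few hypothetical messages written in the style of the log, is to extract the IDs and count the distinct ones.

```python
import pandas as pd

# Hypothetical messages in the same style as the log's message column.
messages = pd.Series([
    "transaction 17018 begun",
    "parsing transaction entries",
    "transaction done, id=17018",
    "transaction 17021 begun",
])

# expand=False returns a Series; rows that do not match the pattern become NaN.
begun_ids = messages.str.extract(r"transaction (\d+) begun", expand=False).dropna()

print(begun_ids.tolist())
print(begun_ids.nunique())
```

If the number of distinct IDs is lower than the number of matching lines, some transaction was logged as begun more than once.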


## **What is the ID of the fastest transaction, the one that took the least time?**

In [16]:
df['Timestamp'] = pd.to_datetime(df["Timestamp"], format='%d-%m-%Y %H:%M:%S.%f')
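With an explicit `format`, `pd.to_datetime` raises an error on the first value that does not match. As a hedged alternative, `errors="coerce"` turns such values into `NaT` so they can be inspected afterwards. The sketch below uses one timestamp from the log and one deliberately malformed, hypothetical value.

```python
import pandas as pd

# One real-looking timestamp and one hypothetical malformed value.
raw = pd.Series(["19-3-2020 06:39:34.157", "not a timestamp"])

# errors="coerce" replaces unparseable values with NaT instead of raising.
parsed = pd.to_datetime(raw, format="%d-%m-%Y %H:%M:%S.%f", errors="coerce")

# Boolean mask marking the rows that failed to parse.
bad_rows = parsed.isna()

print(parsed)
print(int(bad_rows.sum()))
```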

In [18]:
# Keep only the "transaction <id> begun" lines (raw string so \d reaches the regex engine).
transactions_start_df = df.loc[df['message'].str.contains(r"transaction \d+ begun", case=False)].copy()
# The transaction ID is the second whitespace-separated token of the message.
transactions_start_df = transactions_start_df.assign(transaction_id=transactions_start_df['message'].str.split().str[1])
transactions_start_df = transactions_start_df[['Timestamp', 'transaction_id']].rename(columns={"Timestamp": "start_time"})
transactions_start_df

Unnamed: 0,start_time,transaction_id
47,2020-03-19 06:39:36.088,17018
62,2020-03-19 06:39:36.769,17021
77,2020-03-19 06:39:37.405,17024
192,2020-03-19 06:39:42.444,17027
202,2020-03-19 06:39:42.928,17030
...,...,...
4377,2020-03-19 06:42:43.888,17321
4397,2020-03-19 06:42:44.829,17324
4410,2020-03-19 06:42:45.435,17327
4423,2020-03-19 06:42:46.065,17330


In [19]:
# Keep only the "transaction done, id=<id>" lines.
transactions_end_df = df.loc[df['message'].str.contains("transaction done", case=False)].copy()
# The transaction ID is everything after "id=".
transactions_end_df = transactions_end_df.assign(transaction_id=transactions_end_df['message'].str.split("id=").str[1])
transactions_end_df = transactions_end_df[['Timestamp', 'transaction_id']].rename(columns={"Timestamp": "end_time"})
transactions_end_df


Unnamed: 0,end_time,transaction_id
49,2020-03-19 06:39:36.268,17018
64,2020-03-19 06:39:36.890,17021
79,2020-03-19 06:39:37.561,17024
194,2020-03-19 06:39:42.621,17027
204,2020-03-19 06:39:43.030,17030
...,...,...
4379,2020-03-19 06:42:44.081,17321
4399,2020-03-19 06:42:45.016,17324
4412,2020-03-19 06:42:45.632,17327
4425,2020-03-19 06:42:46.197,17330


In [22]:
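# Join start and end rows on the transaction ID (inner join: only IDs present in both frames survive).
transactions_df = pd.merge(transactions_start_df, transactions_end_df, on='transaction_id')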
cols = ['transaction_id', 'start_time', 'end_time']
transactions_df = transactions_df[cols]
transactions_df

Unnamed: 0,transaction_id,start_time,end_time
0,17018,2020-03-19 06:39:36.088,2020-03-19 06:39:36.268
1,17021,2020-03-19 06:39:36.769,2020-03-19 06:39:36.890
2,17024,2020-03-19 06:39:37.405,2020-03-19 06:39:37.561
3,17027,2020-03-19 06:39:42.444,2020-03-19 06:39:42.621
4,17030,2020-03-19 06:39:42.928,2020-03-19 06:39:43.030
...,...,...,...
101,17321,2020-03-19 06:42:43.888,2020-03-19 06:42:44.081
102,17324,2020-03-19 06:42:44.829,2020-03-19 06:42:45.016
103,17327,2020-03-19 06:42:45.435,2020-03-19 06:42:45.632
104,17330,2020-03-19 06:42:46.065,2020-03-19 06:42:46.197
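An inner merge silently drops any transaction that has a start line but no end line, or the reverse. One way to surface such cases, sketched here on small hypothetical frames (`starts`, `ends`), is an outer merge with `indicator=True`. The missing end row for ID 17024 is invented for illustration.

```python
import pandas as pd

# Hypothetical start/end frames; the end row for 17024 is deliberately left out.
starts = pd.DataFrame({
    "transaction_id": ["17018", "17021", "17024"],
    "start_time": pd.to_datetime([
        "2020-03-19 06:39:36.088",
        "2020-03-19 06:39:36.769",
        "2020-03-19 06:39:37.405",
    ]),
})
ends = pd.DataFrame({
    "transaction_id": ["17018", "17021"],
    "end_time": pd.to_datetime([
        "2020-03-19 06:39:36.268",
        "2020-03-19 06:39:36.890",
    ]),
})

# indicator=True adds a "_merge" column (left_only / right_only / both);
# validate="one_to_one" raises if an ID is duplicated on either side.
checked = pd.merge(starts, ends, on="transaction_id", how="outer",
                   indicator=True, validate="one_to_one")

# Rows that did not find a partner in the other frame.
unmatched = checked[checked["_merge"] != "both"]
print(unmatched[["transaction_id", "_merge"]])
```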


In [27]:
# Duration in whole milliseconds; integer division of timedeltas avoids float truncation errors.
transactions_df = transactions_df.assign(time_diff=(transactions_df['end_time'] - transactions_df['start_time']) // pd.Timedelta(milliseconds=1))
transactions_df

Unnamed: 0,transaction_id,start_time,end_time,time_diff
0,17018,2020-03-19 06:39:36.088,2020-03-19 06:39:36.268,180
1,17021,2020-03-19 06:39:36.769,2020-03-19 06:39:36.890,121
2,17024,2020-03-19 06:39:37.405,2020-03-19 06:39:37.561,156
3,17027,2020-03-19 06:39:42.444,2020-03-19 06:39:42.621,177
4,17030,2020-03-19 06:39:42.928,2020-03-19 06:39:43.030,102
...,...,...,...,...
101,17321,2020-03-19 06:42:43.888,2020-03-19 06:42:44.081,193
102,17324,2020-03-19 06:42:44.829,2020-03-19 06:42:45.016,187
103,17327,2020-03-19 06:42:45.435,2020-03-19 06:42:45.632,197
104,17330,2020-03-19 06:42:46.065,2020-03-19 06:42:46.197,132


In [34]:
# idxmin returns the row label of the first minimum duration.
fastest_transaction = transactions_df.loc[transactions_df['time_diff'].idxmin(), 'transaction_id']
print(f"The fastest transaction is: {fastest_transaction}")

The fastest transaction is: 17211
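Picking the first row with the minimum duration hides ties. As a small sketch, `nsmallest` can list the few fastest transactions, and `keep="all"` retains every row tied for the minimum. The frame below is a hypothetical five-row sample built from the first rows of the table above.

```python
import pandas as pd

# Hypothetical sample using the first five durations shown in the notebook output.
durations = pd.DataFrame({
    "transaction_id": ["17018", "17021", "17024", "17027", "17030"],
    "time_diff": [180, 121, 156, 177, 102],
})

# The three shortest transactions, ordered from fastest to slowest.
fastest_three = durations.nsmallest(3, "time_diff")

# keep="all" returns every row that shares the minimum value, not just the first.
all_fastest = durations.nsmallest(1, "time_diff", keep="all")

print(fastest_three)
print(all_fastest)
```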


## **What is the average transaction time in milliseconds?**


In [29]:
avg_transaction_time = transactions_df['time_diff'].mean()
print(f"The average transaction time in ms is: {avg_transaction_time}")

The average transaction time in ms is: 156.0
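The mean alone can be skewed by a few slow transactions, so it may be worth looking at the median and spread as well. The sketch below runs on a hypothetical five-value sample taken from the first rows of the duration table, not on the full data set.

```python
import pandas as pd

# Hypothetical sample: the first five time_diff values from the table above, in ms.
time_diff = pd.Series([180, 121, 156, 177, 102], name="time_diff")

mean_ms = time_diff.mean()
median_ms = time_diff.median()

# describe() reports count, mean, std, min, quartiles, and max in one call.
summary = time_diff.describe()

print(mean_ms, median_ms)
print(summary)
```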
