# Exercise 1

Write a program that reads a log file of user activity in the system.
For each user, display how much time they spent in the system, calculated as the sum of
the numbers read from the file for that user.

Time in the system:
- user-1: 92 s
- user-2: 51 s
- user-3: 20 s

How can we do it?
- open the file in read mode
- use a `for` loop to iterate over all the lines in the file
- `.strip()` removes whitespace (including the trailing newline) from the beginning and the end of the line
- split the line by `';'` with `.split(';')`
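
The parsing steps above can be tried on a single line before touching the file. This is a minimal sketch, assuming a line in the `user;seconds` format; `sample_line` is an illustrative value, not data read from the log:

```python
# Illustrative sample line; real lines come from the log file.
sample_line = "user-5;37\n"

# strip() removes leading and trailing whitespace, including the newline.
cleaned = sample_line.strip()

# split(';') cuts the string at every semicolon and returns a list of strings.
user_name, time_spent = cleaned.split(";")

# The time is still a string at this point, so convert it before summing.
seconds = int(time_spent)

print(user_name, seconds)
```

The conversion with `int()` matters: adding the raw strings would concatenate them instead of summing the seconds.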

To store each user's total time in the system we can use a dictionary (key = user name, value = number of seconds).

In [5]:
total_time = {}

with open('logs_simple.txt', 'r') as file_handle:
    for line in file_handle:
        # 'user-5;37\n' -> .strip() -> 'user-5;37' -> .split(';') -> ['user-5', '37']
        user_name, time_spent = line.strip().split(';')
        if user_name not in total_time:
            total_time[user_name] = 0
            
        total_time[user_name] += int(time_spent)     
        
for user_name, time_spent in total_time.items():
    print(f"{user_name} - {time_spent}")

user-5 - 26065
user-4 - 46198
user-10 - 37595
user-3 - 9301
user-9 - 40589
user-1 - 44501
user-7 - 35360
user-6 - 27717
user-8 - 5731
user-2 - 27958


In [7]:
total_time

{'user-5': 26065,
 'user-4': 46198,
 'user-10': 37595,
 'user-3': 9301,
 'user-9': 40589,
 'user-1': 44501,
 'user-7': 35360,
 'user-6': 27717,
 'user-8': 5731,
 'user-2': 27958}

I don't like this output because the users are not sorted.

How can we sort those users?

`total_time.sort()` will not work: dictionaries do not have a `.sort()` method, only lists do.
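
A quick check makes the difference visible. This sketch uses a small made-up dictionary (`example_times` is not the exercise data) and shows that calling `.sort()` on a dictionary raises `AttributeError`, while the same call works on a list of its keys:

```python
# Made-up data for illustration only.
example_times = {"user-2": 51, "user-1": 92, "user-3": 20}

try:
    example_times.sort()  # dictionaries have no sort() method
except AttributeError as error:
    error_message = str(error)
    print(f"AttributeError: {error_message}")

# list(dictionary) produces a list of the keys, and lists can be sorted in place.
keys = list(example_times)
keys.sort()
print(keys)
```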

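Is there a workaround? We can use the `sorted(ITERABLE)` function. It accepts any iterable as an argument and returns a new list with the elements of that iterable in sorted order. Iterating over a dictionary yields its keys, so `sorted(total_time)` returns a sorted list of user names.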

In [9]:
sorted(total_time)

['user-1',
 'user-10',
 'user-2',
 'user-3',
 'user-4',
 'user-5',
 'user-6',
 'user-7',
 'user-8',
 'user-9']

In [10]:
for user_name in sorted(total_time):
    print(f"{user_name} - {total_time[user_name]}")

user-1 - 44501
user-10 - 37595
user-2 - 27958
user-3 - 9301
user-4 - 46198
user-5 - 26065
user-6 - 27717
user-7 - 35360
user-8 - 5731
user-9 - 40589


We still have an issue with `user-10`, which ends up right after `user-1` instead of at the end. This is the standard behaviour when sorting strings in Python, because strings are compared character by character, but there is a way to sort those user names better.
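
To see the string comparison in action, and one possible standard-library way around it, here is a minimal sketch. It assumes every name has the form `user-<number>`; `user_number` is a helper introduced here for illustration only.

```python
names = ["user-1", "user-10", "user-2", "user-9"]

# The first differing characters are '1' and '2', and '1' < '2',
# so 'user-10' sorts before 'user-2'.
print("user-10" < "user-2")  # True

def user_number(name):
    """Return the numeric suffix of a name like 'user-10' as an int.

    Assumes the name ends with '-<digits>'; other formats are not handled.
    """
    return int(name.rsplit("-", 1)[1])

# Sorting by the extracted number gives the order a human would expect.
numerically_sorted = sorted(names, key=user_number)
print(numerically_sorted)
```

A hand-written key like this only covers one naming pattern, which is why a general-purpose library is more convenient.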

For that we can use the external library [natsort](https://pypi.org/project/natsort/).

To install an external library we first have to find its name (for example with a web search) and then look it up on [PyPI](https://pypi.org/), the Python Package Index, to make sure we get the right package.

Once we know the name of the library, we can install it into the current virtual environment with `pip`, the package installer for Python.

To run a console command (including `pip`) from a notebook code cell, we have to prefix the command with `!`.

In [11]:
!pip install natsort



In [13]:
from natsort import natsorted

natsorted(total_time)

['user-1',
 'user-2',
 'user-3',
 'user-4',
 'user-5',
 'user-6',
 'user-7',
 'user-8',
 'user-9',
 'user-10']

In [15]:
for user_name in natsorted(total_time):
    print(f"{user_name} - {total_time[user_name]}")

user-1 - 44501
user-2 - 27958
user-3 - 9301
user-4 - 46198
user-5 - 26065
user-6 - 27717
user-7 - 35360
user-8 - 5731
user-9 - 40589
user-10 - 37595


In [17]:
total_time

{'user-5': 26065,
 'user-4': 46198,
 'user-10': 37595,
 'user-3': 9301,
 'user-9': 40589,
 'user-1': 44501,
 'user-7': 35360,
 'user-6': 27717,
 'user-8': 5731,
 'user-2': 27958}

In [18]:
import pandas as pd

In [20]:
df = pd.DataFrame({
    'total_time': total_time.values()
}, index=total_time.keys())
df

         total_time
user-5        26065
user-4        46198
user-10       37595
user-3         9301
user-9        40589
user-1        44501
user-7        35360
user-6        27717
user-8         5731
user-2        27958


In [23]:
df.loc['user-1', 'total_time']

44501

In [24]:
df['total_time'].mean(), df['total_time'].min(), df['total_time'].max(), df['total_time'].sum()

(30101.5, 5731, 46198, 301015)
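
As a cross-check, the same summary values can be recomputed with the standard library alone. This is a minimal sketch, with the dictionary values copied from the output shown earlier:

```python
from statistics import mean

# Totals per user, copied from the notebook output.
total_time = {
    "user-5": 26065, "user-4": 46198, "user-10": 37595, "user-3": 9301,
    "user-9": 40589, "user-1": 44501, "user-7": 35360, "user-6": 27717,
    "user-8": 5731, "user-2": 27958,
}

values = list(total_time.values())

# Same order as the pandas call: mean, min, max, sum.
summary = (mean(values), min(values), max(values), sum(values))
print(summary)  # (30101.5, 5731, 46198, 301015)
```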