# A simple report of Web Page visitors

The task of this notebook is to create a simple report from a made-up log file of visits to a website.
The input file ('log.txt') contains a list of page visits, each with a timestamp for when the page was accessed, the URL of the page that was visited, and a user id that uniquely identifies the visitor.

The purpose is to process this data and create a report that, for each page, displays the number of page views and the number of unique visitors within a given time range.

## Step 1

The first step is to read the file and import its content. Each timestamp is parsed into a datetime object for later use in the search, and each line of the file is then stored as a dictionary in a list.

In [None]:
import datetime

date_time_format = '%Y-%m-%d %H:%M:%S%Z'
dict_keys = ['timestamp', 'url', 'userid', 'datetime']
dict_list = []

with open('log.txt', 'r') as log:
    next(log)  # skip the first line of the file
    for line in log:
        # Keep columns 1-3 of the '|'-separated line and trim surrounding whitespace
        fields = [field.strip() for field in line.split('|')[1:4]]
        # Parse the timestamp so it can be compared against the search range later
        fields.append(datetime.datetime.strptime(fields[0], date_time_format))
        dict_list.append(dict(zip(dict_keys, fields)))

print(dict_list)
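
The per-line work in Step 1 can also be sketched as a small helper that handles one line at a time, which makes it easy to try out on a single string. The `parse_log_line` name and the sample line are hypothetical; the layout of 'log.txt' (pipe-separated columns with a leading `|`) is an assumption inferred from the code above, not taken from the actual file.

```python
import datetime

DATE_TIME_FORMAT = '%Y-%m-%d %H:%M:%S%Z'
KEYS = ['timestamp', 'url', 'userid', 'datetime']


def parse_log_line(line):
    """Turn one pipe-separated log line into a dictionary (hypothetical helper)."""
    # Columns 1-3 hold timestamp, url and user id; column 0 is the text
    # before the leading '|'.
    fields = [field.strip() for field in line.split('|')[1:4]]
    fields.append(datetime.datetime.strptime(fields[0], DATE_TIME_FORMAT))
    return dict(zip(KEYS, fields))


# Made-up sample line, assuming the layout described above.
sample_line = '| 2013-09-01 09:00:00UTC | /contact.html | 12345 |\n'
sample_entry = parse_log_line(sample_line)
print(sample_entry['url'], sample_entry['userid'], sample_entry['datetime'])
```

Keeping the parsing in one function also gives a single place to handle malformed lines, should the real log contain any.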

## Step 2
The second step is to define the variables that the webmaster uses to specify the time range to search.
(Change the values to vary the search.)

In [None]:
time_zone = 'UTC'
search_start_date = '2013-09-01'
search_start_time = '09:00:00'  
search_end_date = '2013-09-01'  # optional; falls back to search_start_date when empty
search_end_time = '10:59:59' 

The variables are then used to create DateTime objects to be used for comparison in the search.

In [None]:
if search_end_date == '' or search_end_date is None:
    search_end_date = search_start_date
    
search_start = datetime.datetime.strptime("{} {}{}".format(search_start_date, search_start_time, time_zone), date_time_format)
search_end = datetime.datetime.strptime("{} {}{}".format(search_end_date, search_end_time, time_zone), date_time_format)

print('The time range chosen is: {} - {}'.format(search_start, search_end.time() if search_start.date() == search_end.date() else search_end))
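
As a minimal sketch, the same range construction can be wrapped in a function. The `build_search_range` name and the check that the range does not end before it starts are assumptions added for illustration; they are not part of the notebook. The sketch also shows that `%Z` only matches the zone name: the resulting datetime objects are naive (no `tzinfo`), which is why they can be compared directly with the naive objects created in Step 1.

```python
import datetime

DATE_TIME_FORMAT = '%Y-%m-%d %H:%M:%S%Z'


def build_search_range(start_date, start_time, end_time, end_date=None, time_zone='UTC'):
    """Return (start, end) datetimes; the end date defaults to the start date."""
    if not end_date:  # covers both '' and None
        end_date = start_date
    start = datetime.datetime.strptime(
        '{} {}{}'.format(start_date, start_time, time_zone), DATE_TIME_FORMAT)
    end = datetime.datetime.strptime(
        '{} {}{}'.format(end_date, end_time, time_zone), DATE_TIME_FORMAT)
    # Extra sanity check (an assumption, not in the notebook).
    if end < start:
        raise ValueError('search range ends before it starts')
    return start, end


range_start, range_end = build_search_range('2013-09-01', '09:00:00', '10:59:59')
print(range_start, range_end)
# %Z matches the zone name but does not attach a tzinfo to the result.
print(range_start.tzinfo)
```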



## Step 3
Then the actual search is performed, and all objects that match the given time range are stored in a new list.

In [None]:
search_results = []
print('Whether each object in the original list matches the search:\n')
for dct in dict_list:
    # True when the visit falls inside the inclusive time range
    matches = search_start <= dct['datetime'] <= search_end
    if matches:
        # Store a copy without 'datetime' so dict_list stays intact and the cell can be re-run
        search_results.append({key: value for key, value in dct.items() if key != 'datetime'})
    print(matches)

print('\nThe new list with the results of the search: \n\n{}\n'.format(search_results))
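
The range check can also be written as a function built around a list comprehension. This is only a sketch: `filter_by_range` is a hypothetical name and the entries are made-up data in the shape produced in Step 1. Because the function returns copies, the input list is left unchanged.

```python
import datetime


def filter_by_range(entries, start, end):
    """Return copies of the entries whose 'datetime' lies in [start, end]."""
    return [
        {key: value for key, value in entry.items() if key != 'datetime'}
        for entry in entries
        if start <= entry['datetime'] <= end
    ]


# Made-up entries for illustration only.
sample_entries = [
    {'timestamp': '2013-09-01 08:59:59UTC', 'url': '/index.html', 'userid': '1',
     'datetime': datetime.datetime(2013, 9, 1, 8, 59, 59)},
    {'timestamp': '2013-09-01 09:00:00UTC', 'url': '/index.html', 'userid': '2',
     'datetime': datetime.datetime(2013, 9, 1, 9, 0, 0)},
    {'timestamp': '2013-09-01 11:00:00UTC', 'url': '/contact.html', 'userid': '2',
     'datetime': datetime.datetime(2013, 9, 1, 11, 0, 0)},
]
sample_matches = filter_by_range(
    sample_entries,
    datetime.datetime(2013, 9, 1, 9, 0, 0),
    datetime.datetime(2013, 9, 1, 10, 59, 59),
)
print(sample_matches)
```

Both bounds are inclusive, so a visit at exactly 09:00:00 is part of the result.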
    

All the unique pages that were visited during the given time range are then extracted and stored in a list.

In [None]:
pages_visited = []

for result in search_results:
    if result['url'] not in pages_visited:
        pages_visited.append(result['url'])

print('The unique pages that were visited during the given time range:\n\n {}'.format(pages_visited))
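
A shorter way to collect the unique URLs, sketched here with made-up data, relies on the fact that dictionary keys are unique and keep their insertion order (Python 3.7 and later). The variable names are hypothetical and chosen so they do not clash with the notebook's own.

```python
# Made-up search results in the shape produced in Step 3.
sample_results = [
    {'url': '/index.html', 'userid': '1'},
    {'url': '/contact.html', 'userid': '2'},
    {'url': '/index.html', 'userid': '2'},
]

# dict.fromkeys drops duplicate URLs while preserving first-seen order.
unique_pages = list(dict.fromkeys(result['url'] for result in sample_results))
print(unique_pages)
```

A plain `set` would also remove duplicates, but it would not keep the pages in the order in which they were first visited.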

## Step 4

Then the number of visits and the number of unique visitors are calculated for each page and stored as dictionaries in a new list that will be used to present the results.

In [None]:
final_report = []

for page in pages_visited:
    page_views = 0
    visitors = []  # user ids already counted for this page
    for result in search_results:
        if result['url'] == page:
            page_views += 1
            if result['userid'] not in visitors:
                visitors.append(result['userid'])
    number_of_visitors = len(visitors)
    final_report.append({'page': page, 'page views': page_views, 'visitors': number_of_visitors})

# Print once, after every page has been processed
print(final_report)
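
For larger logs, the same counts can be gathered in a single pass over the results instead of one pass per page. The following is a sketch of that alternative, not the notebook's method; it uses `collections.Counter` for page views and a `defaultdict` of sets for unique visitors, with made-up data and hypothetical variable names.

```python
from collections import Counter, defaultdict

# Made-up search results in the shape produced in Step 3.
sample_results = [
    {'url': '/index.html', 'userid': '1'},
    {'url': '/contact.html', 'userid': '2'},
    {'url': '/index.html', 'userid': '2'},
    {'url': '/index.html', 'userid': '1'},
]

views_per_page = Counter()
visitors_per_page = defaultdict(set)
for result in sample_results:
    views_per_page[result['url']] += 1
    # A set ignores repeated user ids, so its size is the unique visitor count.
    visitors_per_page[result['url']].add(result['userid'])

sample_report = [
    {'page': page, 'page views': views, 'visitors': len(visitors_per_page[page])}
    for page, views in views_per_page.items()
]
print(sample_report)
```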

Finally, the report is presented in a data frame. The data frame could then be used to export the report to a CSV file.

In [None]:
import pandas as pd

df = pd.DataFrame(data=final_report)
df.rename(columns={'page': 'URL', 'page views': 'Page Views', 'visitors': 'Visitors'}, inplace=True)

df
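
The CSV export mentioned above could look like the following sketch, which uses the pandas `DataFrame.to_csv` method. The report rows are made up, and the data is written to an in-memory buffer so the example does not create a file; the file name 'report.csv' in the comment is hypothetical.

```python
import io

import pandas as pd

# Made-up report rows in the shape produced in Step 4.
sample_report = [
    {'page': '/index.html', 'page views': 3, 'visitors': 2},
    {'page': '/contact.html', 'page views': 1, 'visitors': 1},
]
sample_df = pd.DataFrame(sample_report).rename(
    columns={'page': 'URL', 'page views': 'Page Views', 'visitors': 'Visitors'})

# index=False leaves out the row numbers; pass a file name such as
# 'report.csv' (hypothetical) instead of the buffer to write to disk.
buffer = io.StringIO()
sample_df.to_csv(buffer, index=False)
print(buffer.getvalue())
```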