log-analysis

A set of scripts to analyze the most popular posts on my blog.

I assume that post URLs are formatted like /{YEAR}/{MONTH_NUM}/{MONTH_DAY_NUM}/{post_title}, and that the log files are standard Apache logs, especially those produced by NearlyFreeSpeech.net.
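To illustrate the URL assumption, here is a minimal sketch of recovering a post's publication date from its path. The regular expression and the publication_date name are hypothetical. Tolerating a trailing slash or a .html suffix is an assumption about Jekyll-style permalinks, not something the scripts are confirmed to do.

```python
import re
from datetime import date
from typing import Optional

# Hypothetical pattern for /{YEAR}/{MONTH_NUM}/{MONTH_DAY_NUM}/{post_title}.
# ASSUMPTION: an optional ".html" suffix and/or trailing slash is allowed.
POST_PATH_RE = re.compile(r"^/(\d{4})/(\d{1,2})/(\d{1,2})/([^/?#]+?)(?:\.html)?/?$")


def publication_date(path: str) -> Optional[date]:
    """Return the publication date encoded in a post path, or None if it doesn't match."""
    m = POST_PATH_RE.match(path)
    if m is None:
        return None  # not a post URL (e.g. /about/ or a static asset)
    year, month, day = (int(g) for g in m.group(1, 2, 3))
    try:
        return date(year, month, day)
    except ValueError:
        # The pattern matched but the numbers are not a real date, e.g. /2019/02/30/...
        return None
```

Paths that don't follow the naming scheme simply yield None, so non-post requests can be filtered out without raising.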

Overview

I use Python to process multiple log files in parallel, then reduce the per-file results together and write them out as a CSV.
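Roughly, that map-reduce shape can be sketched as follows. This is a minimal illustration, not the code in log_count_views.py: it assumes Apache common/combined log lines and counts raw requests per path with no de-duplication. The names count_lines, count_file, merge_counts and count_all are hypothetical.

```python
import operator
from collections import Counter
from functools import reduce
from multiprocessing import Pool
from typing import Iterable, Sequence


def count_lines(lines: Iterable[str]) -> Counter:
    """Map step: count raw requests per path in an iterable of log lines."""
    counts: Counter = Counter()
    for line in lines:
        # In common/combined format the request is the first quoted field,
        # e.g. "GET /2020/10/01/hello HTTP/1.1".
        parts = line.split('"')
        if len(parts) < 2:
            continue  # malformed or empty line
        request = parts[1].split()
        if len(request) >= 2:
            counts[request[1]] += 1
    return counts


def count_file(log_file: str) -> Counter:
    """Run the map step over a single log file."""
    with open(log_file, encoding="utf-8", errors="replace") as f:
        return count_lines(f)


def merge_counts(partials: Iterable[Counter]) -> Counter:
    """Reduce step: add the per-file Counters together."""
    return reduce(operator.add, partials, Counter())


def count_all(log_files: Sequence[str], processes: int = 4) -> Counter:
    """Map each file in a worker process, then reduce the results."""
    with Pool(processes) as pool:
        partials = pool.map(count_file, log_files)
    return merge_counts(partials)
```

Call count_all from inside an `if __name__ == "__main__":` guard, since multiprocessing may re-import the main module in its worker processes. Processes rather than threads are used because parsing is CPU-bound.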

Here's just one of the many types of plots it generates.

[Figure: box-and-whisker plot of total daily hits, grouped by quarter]

How To Use

Edit SSHLocation in retrievelogs.sh to point to wherever you get your logs from. I use an SSH alias for mine.

You also need a JSON file named posttags.json, formatted as shown below. It is an object whose keys are your posts, each mapped to an array of tag strings. It was originally included to build some kind of visualization of popular tags, but for now it just lists the live posts on my blog.

{
    "POST LOCATION": [
        "some tag",
        "some tag 2",
        "another tag"
    ],
    "POST LOCATION 2": [
        "some tag"
    ]
}
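If you want to reuse posttags.json from your own scripts, a minimal sketch of reading it and collecting the live post locations might look like this. The function names are hypothetical, and the repository's own scripts may read the file differently.

```python
import json
from typing import Set


def live_posts_from_json(text: str) -> Set[str]:
    """Parse posttags.json content and return the set of post locations (its keys)."""
    data = json.loads(text)
    if not isinstance(data, dict):
        raise ValueError("posttags.json must contain a JSON object")
    for post, tags in data.items():
        # Each value should be an array of tag strings.
        if not isinstance(tags, list) or not all(isinstance(t, str) for t in tags):
            raise ValueError(f"tags for {post!r} must be a list of strings")
    return set(data)


def load_live_posts(path: str = "posttags.json") -> Set[str]:
    """Read posttags.json from disk and return its post locations."""
    with open(path, encoding="utf-8") as f:
        return live_posts_from_json(f.read())
```

A set of live paths makes it cheap to drop hits on removed posts or non-post URLs, for example with {p: n for p, n in counts.items() if p in live}.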

Then run the following from the repository's root directory:

./retrievelogs.sh && ./run.sh && $YOUR_PDF_VIEWER Rplots.pdf

How Views Are Counted

Currently, views are counted once per IP address, post, and date. The results are written to articleViews.csv, with the following fields in order:

path: The path to the post. The post's publication date can be derived from it, though this may not work if you don't use Jekyll with the same naming structure.

date: The date on which the post was hit.

hits: The number of hits for that post on that date.
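The counting rule above can be sketched as follows. This assumes Apache common/combined log lines and dates each hit by the day in the log's own timestamp, with no time-zone conversion. parse_line, count_views and to_csv are hypothetical names, and the header row in the output is an assumption about articleViews.csv.

```python
import csv
import io
from collections import Counter
from datetime import datetime
from typing import Iterable, Optional, Tuple


def parse_line(line: str) -> Optional[Tuple[str, str, str]]:
    """Return (ip, path, ISO date) from a common/combined-format log line, or None."""
    parts = line.split('"')
    if len(parts) < 2:
        return None
    head = parts[0].split()      # ip, ident, user, "[10/Oct/2020:13:55:36", "-0700]"
    request = parts[1].split()   # method, path, protocol
    if len(head) < 4 or len(request) < 2:
        return None
    # Keep only the day part of the timestamp; the log's local time is used as-is.
    stamp = head[3].lstrip("[").split(":", 1)[0]
    try:
        day = datetime.strptime(stamp, "%d/%b/%Y").date().isoformat()
    except ValueError:
        return None
    return head[0], request[1], day


def count_views(lines: Iterable[str]) -> Counter:
    """Count each (ip, path, date) once, then tally hits per (path, date)."""
    seen = set()
    for line in lines:
        parsed = parse_line(line)
        if parsed is not None:
            seen.add(parsed)
    return Counter((path, day) for _ip, path, day in seen)


def to_csv(hits: Counter) -> str:
    """Render hits as CSV with path,date,hits columns (header row assumed)."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["path", "date", "hits"])
    for (path, day), n in sorted(hits.items()):
        writer.writerow([path, day, n])
    return buf.getvalue()
```

Because the set holds (ip, path, date) triples, repeated requests from one IP on the same day collapse into a single hit, while the same IP on a later day counts again.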

HSchmale16/log-analysis