Extracting insights by transforming Apache access logs and visualizing them through plots
View the main notebook here to render the dynamic map
The data consists of Apache access logs obtained from freely available, open sources.
The access logs contain a total of 6 million+ rows.
The data has not been uploaded to GitHub due to size constraints; the source URLs are listed in the "GetData.ipynb" notebook.
- ProcessLogs.ipynb - Main notebook containing all the transformations and plots
- ProcessRDDLogs.ipynb - An attempt to transform the same data using RDDs; it took much longer, so it was abandoned
- iplocation/ - Contains the Python code for the web crawler built on Scrapy
- GetData.ipynb - Helper notebook to download the data and process it; also includes some R&D
- locations.json - Contains location information for each IP, obtained by crawling the web with the Scrapy-based crawler in iplocation/
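
For readers unfamiliar with the log format, here is a minimal sketch of how a single access log line could be split into fields before any further transformation. It assumes the logs follow the Apache Common Log Format; the pattern, the `parse_line` helper, and the sample line are illustrative assumptions and are not taken from the notebooks.

```python
import re
from datetime import datetime

# Assumed layout (Common Log Format):
# host ident authuser [timestamp] "request" status bytes
CLF_PATTERN = re.compile(
    r'^(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)


def parse_line(line):
    """Hypothetical helper: return a dict of fields, or None if the line does not match."""
    match = CLF_PATTERN.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    # Convert the timestamp, status code, and response size to typed values.
    fields["time"] = datetime.strptime(fields["time"], "%d/%b/%Y:%H:%M:%S %z")
    fields["status"] = int(fields["status"])
    fields["size"] = 0 if fields["size"] == "-" else int(fields["size"])
    # Split the request line into method and path when both are present.
    parts = fields["request"].split()
    fields["method"] = parts[0] if len(parts) > 0 else None
    fields["path"] = parts[1] if len(parts) > 1 else None
    return fields


# Illustrative sample line, not taken from the dataset.
sample = '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326'
record = parse_line(sample)
```

The `host` field is the value that would be looked up against the location data in locations.json, although the exact structure of that file is not described here.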