Indiana Click Data
Wang Cheng-Jun edited this page Dec 19, 2016
http://cnets.indiana.edu/groups/nan/webtraffic/click-dataset/
The data was generated by applying a Berkeley Packet Filter to a mirror of the traffic passing through the border router of Indiana University. The filter matched all traffic destined for TCP port 80. A long-running collection process used the pcap library to capture these packets, then applied a small set of regular expressions to their payloads to determine whether they contained HTTP GET requests. If a packet did contain a request, the collection system logged a record with the following fields:

- a timestamp
- the requested URL
- the referring URL
- a boolean classification of the user agent (browser or bot)
- a boolean flag for whether the request was generated inside or outside IU
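
To make the payload-matching step concrete, here is a minimal Python sketch of how a captured TCP payload could be checked for an HTTP GET request and turned into a record with the five fields listed above. It is not the collection system's actual code: the regular expressions, the `ClickRecord` class, the `parse_get_request` function, and the keyword-based bot heuristic are all assumptions made for illustration, since the dataset description does not publish the real patterns or classification rule. Packet capture itself is left out; the sketch assumes the payload bytes, the capture timestamp, and the inside/outside flag are already available.

```python
import re
from dataclasses import dataclass
from typing import Optional

# A pcap filter expression such as "tcp dst port 80" would select the traffic
# described above; the exact expression used for the dataset is not documented.

# Hypothetical patterns: the real regular expressions are not published.
REQUEST_LINE = re.compile(rb"^GET (\S+) HTTP/1\.[01]\r\n")
HOST = re.compile(rb"^Host:[ \t]*([^\r\n]+)", re.IGNORECASE | re.MULTILINE)
REFERER = re.compile(rb"^Referer:[ \t]*([^\r\n]+)", re.IGNORECASE | re.MULTILINE)
USER_AGENT = re.compile(rb"^User-Agent:[ \t]*([^\r\n]+)", re.IGNORECASE | re.MULTILINE)
# Assumed heuristic for the browser/bot flag; the actual rule is unknown.
BOT_HINT = re.compile(rb"bot|crawler|spider", re.IGNORECASE)


@dataclass
class ClickRecord:
    timestamp: float   # capture time, seconds since the epoch
    url: str           # requested URL (host + path)
    referrer: str      # referring URL, empty if the header is absent
    is_bot: bool       # True if the user agent looks like a bot
    is_internal: bool  # True if the request originated inside IU


def _header(pattern: re.Pattern, payload: bytes) -> str:
    """Return the first header value matched by pattern, or an empty string."""
    match = pattern.search(payload)
    return match.group(1).decode("latin-1").strip() if match else ""


def parse_get_request(payload: bytes, timestamp: float,
                      is_internal: bool) -> Optional[ClickRecord]:
    """Return a ClickRecord if the payload starts with an HTTP GET request."""
    request = REQUEST_LINE.match(payload)
    if request is None:
        return None  # not a GET request, so nothing is logged
    path = request.group(1).decode("latin-1")
    user_agent = USER_AGENT.search(payload)
    return ClickRecord(
        timestamp=timestamp,
        url=_header(HOST, payload) + path,
        referrer=_header(REFERER, payload),
        is_bot=bool(user_agent and BOT_HINT.search(user_agent.group(1))),
        is_internal=is_internal,
    )
```

Calling `parse_get_request` on a payload that does not begin with a GET request line returns `None`, which mirrors the description: only packets containing a GET request produce a record.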