Support for GeoIPOrg.dat #41

Closed
OkkeKlein opened this issue Nov 10, 2013 · 14 comments

@OkkeKlein

This seems to be the same issue as https://logstash.jira.com/browse/LOGSTASH-1394, but for organisations in GeoIPOrg.dat.

Maybe the same fix applies?

{:timestamp=>"2013-10-28T16:06:13.964000+0000", :message=>"Exception in filterworker", "exception"=>#<NoMethodError: undefined method `to_hash' for "Teraspace GmbH":String>, "backtrace"=>["file:/opt/logstash/logstash-1.2.2-flatjar.jar!/logstash/filters/geoip.rb:104:in `filter'", "(eval):220:in `initialize'", "org/jruby/RubyProc.java:271:in `call'", "file:/opt/logstash/logstash-1.2.2-flatjar.jar!/logstash/pipeline.rb:250:in `filter'", "file:/opt/logstash/logstash-1.2.2-flatjar.jar!/logstash/pipeline.rb:191:in `filterworker'", "file:/opt/logstash/logstash-1.2.2-flatjar.jar!/logstash/pipeline.rb:134:in `start_filters'"], :level=>:error}

@cjheath
Owner

cjheath commented Nov 11, 2013

I don't use logstash or Jira. If you do, please test the change and make a pull request. Thanks!

@OkkeKlein
Author

Unfortunately I am not a programmer and know nothing about Ruby. But if someone makes the changes, I will gladly test them with Logstash.

@cjheath
Owner

cjheath commented Nov 11, 2013

I'm sorry, but you'll need to find someone who knows how to replicate this problem before anyone can attempt a fix.

@cjheath cjheath closed this as completed Nov 11, 2013
@batbast

batbast commented Dec 11, 2013

Hi,

I can reproduce this bug:
Exception in filterworker {"exception"=>#<NoMethodError: undefined method `to_hash' for "MYORG":String>, "backtrace"=>["file:/opt/elasticsearch/logstash-1.2.2/logstash-1.2.2-flatjar.jar!/logstash/filters/geoip.rb:104:in `filter'", "(eval):64:in `initialize'", "org/jruby/RubyProc.java:271:in `call'", "file:/opt/elasticsearch/logstash-1.2.2/logstash-1.2.2-flatjar.jar!/logstash/pipeline.rb:250:in `filter'", "file:/opt/elasticsearch/logstash-1.2.2/logstash-1.2.2-flatjar.jar!/logstash/pipeline.rb:191:in `filterworker'", "file:/opt/elasticsearch/logstash-1.2.2/logstash-1.2.2-flatjar.jar!/logstash/pipeline.rb:134:in `start_filters'"], :level=>:error}

I made a correction to the geoip.rb file (version 1.3.2, packaged with logstash 1.2.2), similar to the fix for the previous ASN bug (#38):

diff geoip.rb ../new/geoip.rb
155a156,163
> class ISP < Struct.new(:isp)
>
>   def to_hash
>     Hash[each_pair.to_a]
>   end
>
> end
>
350c358
< record
---
> ISP.new(record)
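
To illustrate why this change helps: the backtrace shows the Logstash filter calling to_hash on whatever the lookup returns, and a plain String has no such method. Below is a minimal Ruby sketch, assuming a stand-in ISP struct like the one in the diff. The lookup_before and lookup_after helpers are hypothetical and are not part of geoip.rb.

```ruby
# Stand-in for the struct added in the diff above.
class ISP < Struct.new(:isp)
  def to_hash
    Hash[each_pair.to_a]
  end
end

# Hypothetical helpers mimicking what the lookup returns
# before and after the fix (not the real geoip.rb methods).
def lookup_before(record)
  record             # a bare String such as "MYORG"
end

def lookup_after(record)
  ISP.new(record)    # wrapped, so callers can ask for a Hash
end

before = lookup_before("MYORG")
after  = lookup_after("MYORG")

puts before.respond_to?(:to_hash)   # false: String has no to_hash
puts after.to_hash[:isp]            # MYORG
```

The organization name ends up under the :isp key, which is consistent with the ISP field appearing in Logstash later in this thread.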

You can test it with an ORG file built by a script derived from https://github.com/mteodoro/mmutils.
I can send you this new script, or a sample ORG dat file.

Thanks for your code.

@cjheath
Owner

cjheath commented Dec 11, 2013

Thanks batbast.

I think I have applied your change correctly, but I don't have a license to access the ISP data. If you can send me a sample ISP data file, I'll test and release it....

... or you could just send me a pull request :)

@cjheath cjheath reopened this Dec 11, 2013
@batbast

batbast commented Dec 12, 2013

Hi Clifford,

Thank you for your response.

I am sending you 3 files:
- a dat sample
- a csv sample
- a Python script used to create the dat file from the csv file, based on
https://github.com/mteodoro/mmutils/raw/master/csv2dat.py

Usage (requires the python-ipaddr package)

$ ./csvORG2dat.py -w organizations.dat mmorg_net organizations.csv
wrote 42-node trie with 5 networks (3 distinct labels) in 0 seconds

Test

$ geoiplookup -f organizations.dat 10.1.1.1
GeoIP Organization Edition: ORG1
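
The expected answer can be cross-checked without the dat file. The sketch below is a plain longest-prefix scan over the five rows of the CSV sample included in this comment, using Ruby's standard ipaddr library. It is not the binary trie lookup that geoiplookup or the geoip gem perform, and org_for is a hypothetical helper name.

```ruby
require 'ipaddr'

# Rows copied from the CSV sample: CIDR network, organization label.
ROWS = [
  ["10.1.1.0/24",  "ORG1"],
  ["10.10.0.0/16", "ORG1"],
  ["10.20.0.0/16", "ORG2"],
  ["10.1.3.0/24",  "ORG3"],
  ["13.0.0.0/16",  "ORG3"]
]

# Return the label of the most specific network containing ip, or nil.
def org_for(ip, rows = ROWS)
  addr = IPAddr.new(ip)
  matches = rows.select { |cidr, _org| IPAddr.new(cidr).include?(addr) }
  best = matches.max_by { |cidr, _org| cidr.split("/").last.to_i }
  best && best.last
end

puts org_for("10.1.1.1")              # ORG1, matching the geoiplookup output
puts org_for("192.168.0.1").inspect   # nil: no network in the sample matches
```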

CSV sample (organizations.csv):

10.1.1.0/24,ORG1
10.10.0.0/16,ORG1
10.20.0.0/16,ORG2
10.1.3.0/24,ORG3
13.0.0.0/16,ORG3
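
These five rows also account for the "42-node trie" figure in the script output. In the script, each network creates one internal node per bit prefix of length 0 to prefixlen-1, and the final bit only selects a leaf slot. A rough Ruby sketch of that count, assuming this reading of the script's __setitem__ method (trie_node_count is a hypothetical name):

```ruby
require 'ipaddr'
require 'set'

NETWORKS = %w[10.1.1.0/24 10.10.0.0/16 10.20.0.0/16 10.1.3.0/24 13.0.0.0/16]

# Count internal trie nodes: one per distinct bit prefix of length
# 0 .. prefixlen-1 (the empty prefix is the root node).
def trie_node_count(networks)
  prefixes = Set.new
  networks.each do |cidr|
    addr, len = cidr.split("/")
    bits = IPAddr.new(addr).to_i.to_s(2).rjust(32, "0")
    (0...len.to_i).each { |k| prefixes << bits[0, k] }
  end
  prefixes.size
end

puts trie_node_count(NETWORKS)   # 42
```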

Python script (csvORG2dat.py):

#!/usr/bin/env python

# Source : https://github.com/mteodoro/mmutils

import sys
import logging
import logging.handlers
import optparse

import csv
import fileinput
import itertools
import struct
import time

from functools import partial

import ipaddr

def init_logger(opts):
    level = logging.INFO
    handler = logging.StreamHandler()
    #handler = logging.handlers.SysLogHandler(address='/dev/log')
    if opts.debug:
        level = logging.DEBUG
        handler = logging.StreamHandler()
    root = logging.getLogger()
    root.setLevel(level)
    root.addHandler(handler)

def parse_args(argv):
    if argv is None:
        argv = sys.argv[1:]
    p = optparse.OptionParser()

    cmdlist = []
    for cmd, (f, usage) in sorted(cmds.iteritems()):
        cmdlist.append('%-8s\t%%prog %s' % (cmd, usage))
    cmdlist = '\n  '.join(cmdlist)

    p.usage = '%%prog [options] <cmd> <arg>+\n\nExamples:\n  %s' % cmdlist

    p.add_option('-d', '--debug', action='store_true',
            default=False, help="debug mode")
    p.add_option('-g', '--geoip', action='store_true',
            default=False, help='test with C GeoIP module')
    p.add_option('-w', '--write-dat', help='write filename.dat')
    opts, args = p.parse_args(argv)

    #sanity check
    if not args or args[0] not in cmds:
        p.error('missing command. choose from: %s' % ' '.join(sorted(cmds)))

    return opts, args

def gen_csv(f):
    """peek at rows from a csv and start yielding when we get past the comments
    to a row that starts with an int (split at : to check IPv6)"""
    def startswith_int(row):
        try:
            int(row[0].split(':', 1)[0])
            return True
        except ValueError:
            return False

    cr = csv.reader(f)
    #return itertools.dropwhile(lambda x: not startswith_int(x), cr)
    return cr

class RadixTreeNode(object):
    __slots__ = ['segment', 'lhs', 'rhs']
    def __init__(self, segment):
        self.segment = segment
        self.lhs = None
        self.rhs = None

class RadixTree(object):
    def __init__(self, debug=False):
        self.debug = False

        self.netcount = 0
        self.segments = [RadixTreeNode(0)]
        self.data_offsets = {}
        self.data_segments = []
        self.cur_offset = 1

    def __setitem__(self, net, data):
        self.netcount += 1
        inet = int(net)
        node = self.segments[0]
        for depth in range(self.seek_depth, self.seek_depth - (net.prefixlen-1), -1):
            if inet & (1 << depth):
                if not node.rhs:
                    node.rhs = RadixTreeNode(len(self.segments))
                    self.segments.append(node.rhs)
                node = node.rhs
            else:
                if not node.lhs:
                    node.lhs = RadixTreeNode(len(self.segments))
                    self.segments.append(node.lhs)
                node = node.lhs

        if not data in self.data_offsets:
            self.data_offsets[data] = self.cur_offset
            enc_data = self.encode(*data)
            self.data_segments.append(enc_data)
            self.cur_offset += (len(enc_data))

        if self.debug:
            #store net after data for easier debugging
            data = data, net

        if inet & (1 << self.seek_depth - (net.prefixlen-1)):
            node.rhs = data
        else:
            node.lhs = data

    def gen_nets(self, opts, args):
        raise NotImplementedError

    def load(self, opts, args):
        for nets, data in self.gen_nets(opts, args):
            for net in nets:
                self[net] = data

    def dump_node(self, node):
        if not node:
            #empty leaf
            return '--'
        elif isinstance(node, RadixTreeNode):
            #internal node
            return node.segment
        else:
            #data leaf
            data = node[0] if self.debug else node
            return '%d %s' % (len(self.segments) + self.data_offsets[data], node)

    def dump(self):
        for node in self.segments:
            print node.segment, [self.dump_node(node.lhs), self.dump_node(node.rhs)]

    def encode(self, *args):
        raise NotImplementedError

    def encode_rec(self, rec, reclen):
        """encode rec as 4-byte little-endian int, then truncate it to reclen"""
        assert(reclen <= 4)
        return struct.pack('<I', rec)[:reclen]

    def serialize_node(self, node):
        if not node:
            #empty leaf
            rec = len(self.segments)
        elif isinstance(node, RadixTreeNode):
            #internal node
            rec = node.segment
        else:
            #data leaf
            data = node[0] if self.debug else node
            rec = len(self.segments) + self.data_offsets[data]
        return self.encode_rec(rec, self.reclen)

    def serialize(self, f):
        if len(self.segments) >= 2 ** (8 * self.segreclen):
            logging.warning('too many segments for final segment record size!')

        for node in self.segments:
            f.write(self.serialize_node(node.lhs))
            f.write(self.serialize_node(node.rhs))

        f.write(chr(42)) #So long, and thanks for all the fish!
        f.write(''.join(self.data_segments))

        f.write('bat.bast') #.dat file comment - can be anything
        f.write(chr(0xFF) * 3)
        f.write(chr(self.edition))
        f.write(self.encode_rec(len(self.segments), self.segreclen))

class ORGIPRadixTree(RadixTree):
    usage = '-w mmorg.dat mmorg_ip GeoIPORG.csv'
    cmd = 'mmorg_ip'
    seek_depth = 31
    edition = 5
    reclen = 4
    segreclen = 4

    def gen_nets(self, opts, args):
        for lo, hi, org in gen_csv(fileinput.input(args)):
            lo, hi = ipaddr.IPAddress(lo), ipaddr.IPAddress(hi)
            nets = ipaddr.summarize_address_range(lo, hi)
            #print 'lo %s - li %s - nets %s - org %s' % (lo, hi, nets, org)
            yield nets, (org,)

    def encode(self, data):
        return data + '\0'

class ORGNetworkRadixTree(RadixTree):
    usage = '-w mmorg.dat mmorg_net GeoIPORG.csv'
    cmd = 'mmorg_net'
    seek_depth = 31
    edition = 5
    reclen = 4
    segreclen = 4

    def gen_nets(self, opts, args):
        for net, org in gen_csv(fileinput.input(args)):
            net = [ipaddr.IPNetwork(net)]
            yield net, (org,)

    def encode(self, data):
        return data + '\0'

def build_dat(RTree, opts, args):
    tstart = time.time()
    r = RTree(debug=opts.debug)

    r.load(opts, args)

    if opts.debug:
        r.dump()

    with open(opts.write_dat, 'wb') as f:
        r.serialize(f)

    tstop = time.time()
    print 'wrote %d-node trie with %d networks (%d distinct labels) in %d seconds' % (
            len(r.segments), r.netcount, len(r.data_offsets), tstop - tstart)

rtrees = [ORGIPRadixTree, ORGNetworkRadixTree]
cmds = dict((rtree.cmd, (partial(build_dat, rtree), rtree.usage)) for rtree in rtrees)

def main(argv=None):
    global opts
    opts, args = parse_args(argv)
    init_logger(opts)
    logging.debug(opts)
    logging.debug(args)

    cmd = args.pop(0)
    cmd, usage = cmds[cmd]
    return cmd(opts, args)

if __name__ == '__main__':
    rval = main()
    logging.shutdown()
    sys.exit(rval)
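
The trailer that serialize writes can be read back to sanity-check a generated file. The Ruby sketch below assumes only the layout used in this script: a comment, three 0xFF bytes, one edition byte, then the segment count as a little-endian integer of segreclen bytes. build_trailer and parse_trailer are hypothetical names, and the marker search is naive (it would be fooled by a segment count that itself contains three 0xFF bytes).

```ruby
# Encode an integer as 4-byte little-endian and keep the first reclen bytes,
# mirroring encode_rec in the script.
def encode_rec(rec, reclen)
  raise ArgumentError, "reclen must be <= 4" if reclen > 4
  [rec].pack("V")[0, reclen]
end

# Build the bytes that serialize appends after the data segments.
def build_trailer(comment, edition, segments, segreclen)
  comment.b + ("\xFF".b * 3) + edition.chr.b + encode_rec(segments, segreclen)
end

# Locate the last 0xFF 0xFF 0xFF marker and decode what follows it.
def parse_trailer(bytes, segreclen)
  pos = bytes.rindex("\xFF\xFF\xFF".b)
  return nil unless pos
  raw = bytes[pos + 4, segreclen]
  padded = raw + ("\0".b * (4 - raw.bytesize))   # pad back to 4 bytes
  { :edition => bytes.getbyte(pos + 3), :segments => padded.unpack("V").first }
end

trailer = build_trailer("bat.bast", 5, 42, 4)
info = parse_trailer(trailer, 4)
puts info[:edition]    # 5
puts info[:segments]   # 42
```

The values 5 and 42 correspond to the edition set in the ORG classes and the node count reported for the sample file.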

@cjheath
Owner

cjheath commented Dec 12, 2013

I don't see any data files. I had already cloned the mmutils repository and got the conversion program running - I just didn't have any data to convert.

Also, I don't think that github's issue mail system is useful for sending files around. Put them in a gist if you must send files this way. Or attach them in normal email to clifford.heath@gmail.com

@cjheath
Owner

cjheath commented Dec 17, 2013

Github had mangled your code, above. I figured it out, built the DAT file and tested it. The new version of GeoIP has been released, please test it.

@batbast

batbast commented Dec 20, 2013

Hi Clifford,

There is a small mistake at line 157: replace lsp with isp.

With this correction I ran 2 tests:

  • with a known IP --> OK
  • with an unknown IP --> OK

@cjheath
Owner

cjheath commented Dec 20, 2013

Ouch, thanks. Update pushed.

@OkkeKlein
Author

Thanks to everybody for contributing.

I will test and see if this works for Logstash.

The first weird thing is that it can't find the gem with version 1.3.5, but when I change the version to 1.3.4 it finds version 1.3.5 and fetches it.

@OkkeKlein
Author

Ok. Got it working in Logstash. ISP is now showing, but I am still missing Organization. Not sure if this is an extra field or if this value is stored in ISP.

@batbast

batbast commented Dec 22, 2013

The Organization file is similar to the ISP file, so I think the result
returned by the current geoip.rb code is correct, but that is only my opinion.

@OkkeKlein
Author

Assuming this is true, the issue is closed. I appreciate the help.
