The following code takes a list of IP addresses and uses a database to find the country each one originates from. The list used here ('ip.txt') comes from a list of known malicious addresses (http://www.malwaredomainlist.com/forums/index.php?topic=3270.0). The database mapping IP addresses to countries was downloaded from the MaxMind GeoLite legacy page (https://dev.maxmind.com/geoip/legacy/geolite/):

In [13]:
import os
import csv
import pygeoip
# Move to the folder holding 'GeoIP.dat' and 'ip.txt' before opening either file
os.chdir('/home/sharding/Downloads')
gi = pygeoip.GeoIP('GeoIP.dat')
# Read the text file; each row is a list whose first field is the IP address
with open('ip.txt', 'r') as f:
    reader = csv.reader(f)
    IPList = list(reader)
# Keep only the first field of each row
IPListAdjusted = [row[0] for row in IPList]
# Look up the country name for each IP address
IPCountryName = [gi.country_name_by_addr(ip) for ip in IPListAdjusted]
print(IPCountryName[0:100])

['India', 'India', 'Australia', 'Indonesia', 'Hong Kong', 'Hong Kong', 'Australia', 'Thailand', 'Malaysia', 'India', 'India', 'United States', 'United States', 'United States', 'United States', 'United States', 'United States', 'United States', 'United States', 'United States', 'United States', 'United States', 'United States', 'United States', 'Romania', 'Romania', 'Italy', 'Italy', 'Russian Federation', 'Russian Federation', 'Russian Federation', 'Russian Federation', 'United Kingdom', 'United Kingdom', 'Israel', 'Portugal', 'Netherlands', 'Ukraine', 'Korea, Republic of', 'China', 'Japan', 'Hong Kong', 'Korea, Republic of', 'Japan', 'Vietnam', 'China', 'Bangladesh', 'Korea, Republic of', 'Japan', 'China', 'Korea, Republic of', 'Australia', 'Korea, Republic of', 'China', 'China', 'China', 'Korea, Republic of', 'China', 'Vietnam', 'China', 'India', 'Japan', 'China', 'China', 'Korea, Republic of', 'Korea, Republic of', 'China', 'China', 'China', 'Vietnam', 'Vietnam', 'Korea, Republic of

First we import the necessary packages. 'pygeoip' reads the database we downloaded and looks up entries in it. 'os' lets us change directories and locate files without leaving Python. 'csv' reads the text file into a list of rows, which is a convenient format to work with.
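As a minimal sketch of the reading step, the snippet below pulls the first field out of each row and skips anything that is not a valid IP address. The helper name `read_ip_column` and the sample addresses are hypothetical and are not part of the original notebook; the validation uses the standard-library `ipaddress` module, which the notebook itself does not use.

```python
import csv
import io
import ipaddress

def read_ip_column(handle):
    """Return the first field of every row that parses as an IP address."""
    addresses = []
    for row in csv.reader(handle):
        if not row:
            continue  # csv.reader yields [] for blank lines
        candidate = row[0].strip()
        try:
            ipaddress.ip_address(candidate)
        except ValueError:
            continue  # skip headers, comments, or malformed entries
        addresses.append(candidate)
    return addresses

# Stand-in for 'ip.txt', using reserved documentation addresses
sample = io.StringIO("192.0.2.1\n\nnot-an-ip\n198.51.100.7,extra\n")
ips = read_ip_column(sample)
print(ips)  # → ['192.0.2.1', '198.51.100.7']
```

Skipping empty rows matters because indexing `row[0]` on a blank line would raise an `IndexError`.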



The first 100 country names are shown. Some of the IPs, particularly near the end of the list, return blank country names (''), most likely because the database we are using to look up country names is incomplete. These unknown values are easy to remove:

In [4]:
IPCountryNameAdjusted=[x for x in IPCountryName if x!='']
print(IPCountryNameAdjusted[0:100])
len(IPCountryNameAdjusted)

['India', 'India', 'Australia', 'Indonesia', 'Hong Kong', 'Hong Kong', 'Australia', 'Thailand', 'Malaysia', 'India', 'India', 'United States', 'United States', 'United States', 'United States', 'United States', 'United States', 'United States', 'United States', 'United States', 'United States', 'United States', 'United States', 'United States', 'Romania', 'Romania', 'Italy', 'Italy', 'Russian Federation', 'Russian Federation', 'Russian Federation', 'Russian Federation', 'United Kingdom', 'United Kingdom', 'Israel', 'Portugal', 'Netherlands', 'Ukraine', 'Korea, Republic of', 'China', 'Japan', 'Hong Kong', 'Korea, Republic of', 'Japan', 'Vietnam', 'China', 'Bangladesh', 'Korea, Republic of', 'Japan', 'China', 'Korea, Republic of', 'Australia', 'Korea, Republic of', 'China', 'China', 'China', 'Korea, Republic of', 'China', 'Vietnam', 'China', 'India', 'Japan', 'China', 'China', 'Korea, Republic of', 'Korea, Republic of', 'China', 'China', 'China', 'Vietnam', 'Vietnam', 'Korea, Republic of

967

We are left with 967 entries, about 96% of our original data.
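The retained share can be computed directly rather than estimated by eye. This is only a sketch on a small made-up list; the function name `retained_fraction` and the sample values are hypothetical. In the notebook, the same ratio would be `len(IPCountryNameAdjusted) / len(IPCountryName)`.

```python
def retained_fraction(names):
    """Fraction of entries whose country name is not blank."""
    if not names:
        return 0.0
    kept = [name for name in names if name != '']
    return len(kept) / len(names)

# Made-up lookup results: one of the four lookups came back blank
sample_names = ['India', '', 'China', 'Japan']
fraction = retained_fraction(sample_names)
print(f"{fraction:.0%}")  # → 75%
```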

In [7]:
import pandas as pd
IPCountrySeries=pd.Series(IPCountryNameAdjusted)
counts=IPCountrySeries.value_counts()
print(counts[0:20])

United States         317
Russian Federation     71
Germany                60
France                 53
Netherlands            46
China                  41
Ukraine                36
United Kingdom         36
Italy                  33
Korea, Republic of     26
Canada                 19
Brazil                 16
Poland                 15
Czech Republic         14
Japan                  12
Turkey                 11
Austria                11
Singapore              10
Romania                 8
Sweden                  8
dtype: int64


This code gives the number of times that each country occurs in the list of dangerous IP addresses, showing the 20 most common countries.
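For readers without pandas, roughly the same tally can be sketched with the standard library's `collections.Counter`. The sample list below is made up for illustration, and the order of tied countries may differ from what `value_counts()` returns.

```python
from collections import Counter

# Made-up country names standing in for IPCountryNameAdjusted
sample_names = ['China', 'India', 'China', 'Japan', 'India', 'China']

tally = Counter(sample_names)

# The two most frequent countries, highest count first
top = tally.most_common(2)
print(top)  # → [('China', 3), ('India', 2)]

# Each country's share of the whole list
shares = {country: n / len(sample_names) for country, n in tally.items()}
print(shares['China'])  # → 0.5
```

Expressing the counts as shares makes it easier to compare lists of different lengths.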

In [9]:
os.getcwd()

'/home/sharding/Downloads'