This repository has been archived by the owner on Oct 15, 2022. It is now read-only.

Commit

Initial check-in of ISO 3166 codes.
Fully functional. The format of output.txt may need to be tweaked.
cjfarrar committed Dec 6, 2011
1 parent c59a53d commit f6a6c5a
Showing 6 changed files with 66 additions and 0 deletions.
Empty file added iso_3166_codes/README.txt
Empty file.
1 change: 1 addition & 0 deletions iso_3166_codes/data.url
@@ -0,0 +1 @@
http://www.iso.org/iso/list-en1-semic-3.txt
2 changes: 2 additions & 0 deletions iso_3166_codes/fetch.sh
@@ -0,0 +1,2 @@
wget -O - -i data.url > raw.data
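
For readers without wget, the download step can be sketched in Python 3. This is an illustrative equivalent, not part of the commit. `parse_url_list` and `fetch_all` are hypothetical names, and the sketch assumes, as `wget -i` does here, that data.url holds one URL per line. Like `-O -` combined with the shell redirect, it writes every downloaded body, concatenated, into raw.data.

```python
import urllib.request


def parse_url_list(lines):
    """Hypothetical helper: keep the non-blank lines of a URL list
    (one URL per line, as in data.url), stripped of whitespace."""
    return [line.strip() for line in lines if line.strip()]


def fetch_all(url_file="data.url", out_file="raw.data"):
    """Hypothetical equivalent of `wget -O - -i data.url > raw.data`:
    download each listed URL and write the bodies, concatenated, to out_file."""
    with open(url_file, "r") as f:
        urls = parse_url_list(f)
    with open(out_file, "wb") as out:
        for url in urls:
            # urlopen raises HTTPError/URLError if a request fails,
            # so this sketch stops at the first failed download.
            with urllib.request.urlopen(url) as resp:
                out.write(resp.read())
```

Keeping the URL-list parsing in its own function means it can be checked without touching the network; `fetch_all()` is only called explicitly.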

15 changes: 15 additions & 0 deletions iso_3166_codes/meta.txt
@@ -0,0 +1,15 @@
# This is the name of the source as people would refer to it, e.g. Wikipedia or PerlDoc
Name: ISO 3166 code lists

# This is the base domain where the source pages are located.
Domain: www.iso.org

# This is what gets put in quotes next to the source
# It can be blank if it is a source with completely general info spanning many types of topics like Facebook.
Type: ISO 3166

# Whether the source is from MediaWiki (1) or not (0).
MediaWiki: 0

# Keywords used to trigger (or prefer) the source over others.
Keywords: iso, 3166
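
The meta.txt layout above (comment lines starting with `#`, blank separators, and `Key: value` pairs) is simple enough to read with a few lines of Python 3. The sketch below is hypothetical: the format is inferred from this one file, and `parse_meta` and `keywords` are illustrative names, not part of any existing tooling. Treating Keywords as comma-separated is likewise an assumption based on `iso, 3166`.

```python
def parse_meta(text):
    """Hypothetical parser for meta.txt-style text.

    Assumed format (inferred from one example): '#' starts a comment line,
    blank lines are ignored, and every other line is 'Key: value'.
    A value may be empty, as the Type comment allows.
    """
    meta = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first ':' only, so values may contain colons.
        key, sep, value = line.partition(":")
        if not sep:
            raise ValueError("expected 'Key: value', got %r" % raw)
        meta[key.strip()] = value.strip()
    return meta


def keywords(meta):
    """Split the (assumed comma-separated) Keywords field into a list."""
    return [k.strip() for k in meta.get("Keywords", "").split(",") if k.strip()]
```

Values are kept as strings; a caller that wants the MediaWiki flag as a number would convert it with `int(meta["MediaWiki"])`.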
45 changes: 45 additions & 0 deletions iso_3166_codes/parse.py
@@ -0,0 +1,45 @@
#!/usr/bin/python
# -*- coding: utf-8 -*-

# Released under the GPL v2 license
# https://www.gnu.org/licenses/old-licenses/gpl-2.0.html

import lxml.etree, lxml.html
import re

url = "http://www.iso.org/iso/list-en1-semic-3.txt"
title = "ISO 3166 Country Codes"
article_type = "A"

outp = "output.txt"
inp = "raw.data"

# Open input file
input_file = open(inp, "r")

# Read and throw out first line
input_file.readline()

output_file = open(outp, "w")

# Loop through the remainder of the file, format each line
# and print it to the output file.
for line in input_file.readlines():
    line = line.strip()
    pair = line.split(';')
    if len(pair) < 2:  # skip blank or malformed lines
        continue
    output_file.write("\t".join([pair[1],
                                 "",
                                 url,
                                 pair[0],
                                 "",
                                 "",
                                 "",
                                 ""])
                      )
    output_file.write("\n")

input_file.close()
output_file.close()

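The core of parse.py can also be expressed as a pure function, which makes the output format (the part the commit message says may need tweaking) easy to test in isolation. This Python 3 sketch is hypothetical: `convert` and `main` are illustrative names, it assumes each data line in raw.data is `NAME;CODE` (so the second field is written first, as parse.py does with `pair[ 1 ]`), and the UTF-8 encoding is a guess, since the original Python 2 script reads raw bytes. It keeps the original behavior of skipping the first line and any line without a `;`, and emits the same eight tab-separated fields.

```python
from itertools import islice

# Same source URL that parse.py hard-codes; written into every output row.
SOURCE_URL = "http://www.iso.org/iso/list-en1-semic-3.txt"


def convert(lines, url=SOURCE_URL):
    """Hypothetical helper: turn 'NAME;CODE' lines into the eight-field,
    tab-separated rows parse.py writes (code, "", url, name, "", "", "", "").
    The first line is skipped, as in parse.py."""
    rows = []
    for line in islice(lines, 1, None):  # drop the first line
        fields = line.strip().split(";")
        if len(fields) < 2:  # blank or malformed line: skip it
            continue
        name, code = fields[0], fields[1]  # any extra fields are ignored
        rows.append("\t".join([code, "", url, name, "", "", "", ""]))
    return rows


def main(inp="raw.data", outp="output.txt", encoding="utf-8"):
    """Hypothetical driver: read inp and write one row per line to outp.
    The UTF-8 default is an assumption about the downloaded file."""
    with open(inp, "r", encoding=encoding) as src:
        rows = convert(src)
    with open(outp, "w", encoding=encoding) as dst:
        for row in rows:
            dst.write(row + "\n")
```

Calling `main()` from the iso_3166_codes directory reads raw.data and writes output.txt, as parse.sh does. The `with` blocks close both files even if an error is raised partway through.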
3 changes: 3 additions & 0 deletions iso_3166_codes/parse.sh
@@ -0,0 +1,3 @@
#!/bin/bash
python parse.py
