This repository has been archived by the owner on Oct 15, 2022. It is now read-only.
Commit
Initial check-in of ISO 3166 codes.
Fully functional. The format of output.txt may need to be tweaked.
Showing 6 changed files with 66 additions and 0 deletions.
Empty file.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
http://www.iso.org/iso/list-en1-semic-3.txt |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,2 @@ | ||
wget -O - -i data.url > raw.data | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,15 @@ | ||
# This is the name of the source as people would refer to it, e.g. Wikipedia or PerlDoc | ||
Name: ISO 3166 code lists | ||
|
||
# This is the base domain where the source pages are located. | ||
Domain: www.iso.org | ||
|
||
# This is what gets put in quotes next to the source | ||
# It can be blank if it is a source with completely general info spanning many types of topics like Facebook. | ||
Type: ISO 3166 | ||
|
||
# Whether the source is from MediaWiki (1) or not (0). | ||
MediaWiki: 0 | ||
|
||
# Keywords used to trigger (or prefer) the source over others. | ||
Keywords: iso, 3166 |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,45 @@ | ||
#!/usr/bin/python | ||
# -*- coding: utf-8 -*- | ||
|
||
# Released under the GPL v2 license | ||
# https://www.gnu.org/licenses/old-licenses/gpl-2.0.html | ||
|
||
import lxml.etree, lxml.html | ||
import re | ||
|
||
url = "http://www.iso.org/iso/list-en1-semic-3.txt" | ||
title = "ISO 3166 Country Codes" | ||
article_type = "A" | ||
|
||
outp = "output.txt" | ||
inp = "raw.data" | ||
|
||
#Open input file | ||
input_file = open( inp, "r" ) | ||
|
||
#Read and throw out first line | ||
input_file.readline() | ||
|
||
output_file = open( outp, "w") | ||
|
||
#Loop through the remainder of the file, format each line | ||
#and print it to the output file. | ||
for line in input_file.readlines() : | ||
line = line.strip(); | ||
pair = line.split( ';' ); | ||
if len( pair ) < 2 : | ||
continue; | ||
output_file.write( "\t".join ( [ pair[ 1 ], | ||
"", | ||
url, | ||
pair[ 0 ], | ||
"", | ||
"", | ||
"", | ||
"" ] ) | ||
); | ||
output_file.write( "\n" ); | ||
|
||
input_file.close(); | ||
output_file.close(); | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,3 @@ | ||
#!/bin/bash | ||
python parse.py | ||
|
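
The conversion done by parse.py can be sketched as a small, testable function. This is only an illustration, not part of the commit: the names `format_row` and `convert` are hypothetical, and the sample input lines are made up to match the `NAME;CODE` layout the script assumes (first line discarded, fields separated by `;`). The eight-field, tab-separated row layout mirrors the `output_file.write` call in the diff above, which may help when tweaking the format of output.txt.

```python
# Illustrative sketch only; format_row and convert are hypothetical names.
SOURCE_URL = "http://www.iso.org/iso/list-en1-semic-3.txt"


def format_row(name, code, url=SOURCE_URL):
    """Build one tab-separated row with the same eight fields parse.py writes."""
    return "\t".join([code, "", url, name, "", "", "", ""])


def convert(lines, url=SOURCE_URL):
    """Yield output rows for 'NAME;CODE' lines.

    The first line is discarded and lines with fewer than two
    ';'-separated fields are skipped, as in parse.py.
    """
    rows = iter(lines)
    next(rows, None)  # throw out the first line
    for line in rows:
        pair = line.strip().split(";")
        if len(pair) < 2:
            continue
        yield format_row(pair[0], pair[1], url)


if __name__ == "__main__":
    # Made-up sample input in the assumed layout, not the real raw.data.
    sample = ["HEADER LINE", "", "FRANCE;FR", "GERMANY;DE"]
    for row in convert(sample):
        print(row)
```

Keeping the row layout in one function means a change to the output.txt format touches a single line, and the generator can be fed either an open file or a list of strings.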