add script for reading corpus.leeds.ac.uk data

commit f7b5ae7c1addd1f8823da762dc2cfa21ee9c1929 1 parent 11d6542
@rspeer rspeer authored
Showing with 15,014 additions and 0 deletions.
  1. +15,000 −0 scripts/leeds-internet-ja.num
  2. +14 −0 scripts/reformat-leeds.py
15,000 scripts/leeds-internet-ja.num
15,000 additions, 0 deletions not shown
14 scripts/reformat-leeds.py
@@ -0,0 +1,14 @@
+import codecs
+infile = codecs.open('leeds-internet-ja.num', encoding='utf-8')
+outfile = codecs.open('../metanl/data/leeds-internet-ja.txt', 'w', encoding='utf-8')
+
+for line in infile:
+    line = line.strip()
+    if line:
+        rank, freq, token = line.split(' ')
+        freq = float(freq)
+        freq_int = int(freq*100)
+        if ',' not in token:
+            print >> outfile, u"%s,%d" % (token, freq_int)
+            print u"%s,%d" % (token, freq_int)
+outfile.close()
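
The script in this commit uses Python 2 print syntax. As a rough illustration only, the same per-line conversion could be sketched in Python 3 as below. The function name `reformat_leeds_lines` is hypothetical and not part of the commit, and the input format (rank, frequency, and token separated by single spaces) is an assumption inferred from the `split(' ')` call above.

```python
def reformat_leeds_lines(lines):
    """Convert 'rank freq token' lines into 'token,freq_int' strings.

    Hypothetical helper, not part of the commit; it mirrors the loop
    in reformat-leeds.py under the assumed three-field input format.
    """
    results = []
    for line in lines:
        line = line.strip()
        if not line:
            # Skip blank lines, as the original script does.
            continue
        rank, freq, token = line.split(' ')
        # Scale the frequency by 100 and truncate to an integer,
        # matching int(freq*100) in the original.
        freq_int = int(float(freq) * 100)
        # Tokens containing a comma would break the CSV-style output,
        # so they are dropped, as in the original.
        if ',' not in token:
            results.append("%s,%d" % (token, freq_int))
    return results
```

To write a file as the original does, the returned strings could be joined with newlines and written to a file opened with `encoding='utf-8'`.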