Commit
1 parent: 909abcb
commit fec7193
Showing 5 changed files with 112 additions and 3 deletions.
@@ -1,8 +1,5 @@
all:

dblp:
	wget http://dblp.uni-trier.de/xml/dblp.xml

clean:
	rm -rf *.xml *.json
@@ -0,0 +1,11 @@
all:

extract:
	xsltproc --stringparam conf models filter.xslt _everything.xml > models.xml

get:
	wget http://dblp.uni-trier.de/xml/dblp.xml -O _everything.xml

clean:
	rm -rf *.xml
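The `get` target downloads the full dump every time it runs. Below is a minimal shell sketch of a guarded download, assuming `wget` is installed. The function name `fetch_dump` and the `.part` staging file are illustrative choices, not part of this commit. It downloads to a temporary name and renames only on success, so an interrupted transfer does not leave a truncated `_everything.xml` that later looks finished.

```sh
#!/bin/sh
# Hypothetical helper (not part of this commit): fetch the DBLP dump
# only if the target file is not already present.
DBLP_URL='http://dblp.uni-trier.de/xml/dblp.xml'

fetch_dump() {
  target="${1:-_everything.xml}"   # default matches the Makefile's output name
  if [ -e "$target" ]; then
    echo "$target already present; skipping download"
    return 0
  fi
  # Download under a temporary name and rename only if wget succeeds,
  # so a partial transfer never masquerades as a complete dump.
  if wget "$DBLP_URL" -O "$target.part"; then
    mv "$target.part" "$target"
  else
    rm -f "$target.part"
    return 1
  fi
}
```

For example, `fetch_dump _everything.xml` returns immediately when the file already exists, which matches the README's advice not to repeat the roughly 1 GB download.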
@@ -0,0 +1,14 @@
This directory contains the XML dump taken from DBLP:
http://dblp.uni-trier.de/xml/
The data itself is distributed under the ODC-BY open data license:
http://opendatacommons.org/licenses/by/1.0

The _everything.xml file has the same content as dblp.xml, but is preprocessed to be self-contained: all SGML entities otherwise defined in dblp.dtd are replaced with their Unicode counterparts.

DO NOT run 'make get' on your machine unless you are willing to sacrifice about 1 GB of network traffic and 4 hours or so of preprocessing. Otherwise, go ahead.

DO NOT run 'make clean' unless you have irreparably damaged the XML files. Otherwise, go ahead.

Yours,
Vadim Zaytsev aka @grammarware,
http://grammarware.net
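The preprocessing script in this commit calls `sed -i` once per entity. That means the whole dump is read and rewritten 72 times, which plausibly accounts for much of the quoted 4 hours. Below is a minimal sketch of the alternative, assuming a POSIX `sed`: all substitutions are passed as `-e` expressions to a single invocation, so the file is streamed only once. The function name `decode_entities` and the `.tmp` staging file are hypothetical, and only a few entities are listed for illustration.

```sh
#!/bin/sh
# Sketch: decode several DBLP entities in ONE pass over the file,
# instead of one `sed -i` per entity. Only a handful of entities are
# listed here; a full version would carry the script's complete list.
decode_entities() {
  src="$1"
  tmp="$src.tmp"   # hypothetical staging name; avoids relying on GNU `sed -i`
  sed -e 's/&auml;/ä/g' \
      -e 's/&ouml;/ö/g' \
      -e 's/&uuml;/ü/g' \
      -e 's/&eacute;/é/g' \
      -e 's/&szlig;/ß/g' \
      "$src" > "$tmp" && mv "$tmp" "$src"
}
```

Writing to a temporary file and renaming it keeps the sketch portable to BSD `sed`, whose `-i` option takes a mandatory suffix argument. XML's own escapes such as `&amp;` are not in the list and are left untouched.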
File renamed without changes.
@@ -0,0 +1,87 @@
#!/bin/sh

echo 'Fixing umlauts...'
sed -i 's/&auml;/ä/g' "$1"
sed -i 's/&euml;/ë/g' "$1"
sed -i 's/&iuml;/ï/g' "$1"
sed -i 's/&ouml;/ö/g' "$1"
sed -i 's/&uuml;/ü/g' "$1"
sed -i 's/&yuml;/ÿ/g' "$1"
echo 'Fixing umlauts on capitals...'
sed -i 's/&Auml;/Ä/g' "$1"
sed -i 's/&Euml;/Ë/g' "$1"
sed -i 's/&Iuml;/Ï/g' "$1"
sed -i 's/&Ouml;/Ö/g' "$1"
sed -i 's/&Uuml;/Ü/g' "$1"
sed -i 's/&Yuml;/Ÿ/g' "$1"
echo 'Fixing acutes...'
sed -i 's/&aacute;/á/g' "$1"
sed -i 's/&eacute;/é/g' "$1"
sed -i 's/&iacute;/í/g' "$1"
sed -i 's/&oacute;/ó/g' "$1"
sed -i 's/&uacute;/ú/g' "$1"
sed -i 's/&yacute;/ý/g' "$1"
echo 'Fixing acutes on capitals...'
sed -i 's/&Aacute;/Á/g' "$1"
sed -i 's/&Eacute;/É/g' "$1"
sed -i 's/&Iacute;/Í/g' "$1"
sed -i 's/&Oacute;/Ó/g' "$1"
sed -i 's/&Uacute;/Ú/g' "$1"
sed -i 's/&Yacute;/Ý/g' "$1"
echo 'Fixing graves...'
sed -i 's/&agrave;/à/g' "$1"
sed -i 's/&egrave;/è/g' "$1"
sed -i 's/&igrave;/ì/g' "$1"
sed -i 's/&ograve;/ò/g' "$1"
sed -i 's/&ugrave;/ù/g' "$1"
sed -i 's/&ygrave;/ỳ/g' "$1"
echo 'Fixing graves on capitals...'
sed -i 's/&Agrave;/À/g' "$1"
sed -i 's/&Egrave;/È/g' "$1"
sed -i 's/&Igrave;/Ì/g' "$1"
sed -i 's/&Ograve;/Ò/g' "$1"
sed -i 's/&Ugrave;/Ù/g' "$1"
sed -i 's/&Ygrave;/Ỳ/g' "$1"
echo 'Fixing tildes...'
sed -i 's/&atilde;/ã/g' "$1"
sed -i 's/&otilde;/õ/g' "$1"
sed -i 's/&ntilde;/ñ/g' "$1"
echo 'Fixing tildes in capitals...'
sed -i 's/&Atilde;/Ã/g' "$1"
sed -i 's/&Otilde;/Õ/g' "$1"
sed -i 's/&Ntilde;/Ñ/g' "$1"
echo 'Fixing rings and circumflexes...'
sed -i 's/&aring;/å/g' "$1"
sed -i 's/&acirc;/â/g' "$1"
sed -i 's/&ccirc;/ĉ/g' "$1"
sed -i 's/&ecirc;/ê/g' "$1"
sed -i 's/&icirc;/î/g' "$1"
sed -i 's/&ocirc;/ô/g' "$1"
sed -i 's/&ucirc;/û/g' "$1"
echo 'Fixing rings and circumflexes in capitals...'
sed -i 's/&Aring;/Å/g' "$1"
sed -i 's/&Acirc;/Â/g' "$1"
sed -i 's/&Ccirc;/Ĉ/g' "$1"
sed -i 's/&Ecirc;/Ê/g' "$1"
sed -i 's/&Icirc;/Î/g' "$1"
sed -i 's/&Ocirc;/Ô/g' "$1"
sed -i 's/&Ucirc;/Û/g' "$1"
echo 'Fixing other diacritics...'
sed -i 's/&ccedil;/ç/g' "$1"
sed -i 's/&Ccedil;/Ç/g' "$1"
sed -i 's/&oslash;/ø/g' "$1"
sed -i 's/&Oslash;/Ø/g' "$1"
echo 'Fixing fancy letters...'
sed -i 's/&micro;/µ/g' "$1"
sed -i 's/&szlig;/ß/g' "$1"
sed -i 's/&aelig;/æ/g' "$1"
sed -i 's/&AElig;/Æ/g' "$1"
sed -i 's/&oelig;/œ/g' "$1"
sed -i 's/&OElig;/Œ/g' "$1"
sed -i 's/&eth;/ð/g' "$1"
sed -i 's/&ETH;/Ð/g' "$1"
sed -i 's/&thorn;/þ/g' "$1"
sed -i 's/&THORN;/Þ/g' "$1"
echo 'Fixing miscellaneous signs...'
sed -i 's/&times;/×/g' "$1"
sed -i 's/&reg;/®/g' "$1"
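The script handles a fixed list of entities, so it may be worth checking afterwards which named entities are still left in the file. Below is a minimal sketch, assuming a `grep` with `-o` support (GNU and BSD grep both have it, though POSIX does not require it). The name `remaining_entities` is hypothetical. It skips the five entities predefined by XML itself (`&amp;`, `&lt;`, `&gt;`, `&quot;`, `&apos;`), which must stay escaped, and it ignores numeric references such as `&#233;`.

```sh
#!/bin/sh
# Hypothetical check: count named entities still present in a file,
# skipping the five that XML predefines (they must remain escaped).
# Output: one line per entity, most frequent first.
remaining_entities() {
  grep -o '&[A-Za-z][A-Za-z0-9]*;' "$1" \
    | grep -v -E '^&(amp|lt|gt|quot|apos);$' \
    | sort | uniq -c | sort -rn
}
```

Each printed line names an entity that the substitution list does not cover yet. Empty output means that everything outside the XML-predefined set has been decoded.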