Slight changes on the README and #! /usr/bin/env python

commit d8becf5a355d4d768117e1d3bb1cd0e0251853a3 1 parent 1430ea2
@alexksikes authored
4 README.md
@@ -55,6 +55,10 @@ For the sake of our IMDb example we run:
example/conf/ : the configuration files for extract.py.
example/tables/ : where to store the plain text tables.
+
+Why are regular expressions used instead of well-known packages such as Beautiful Soup?
+
+Because these modules do not scale well to millions of documents. Although less expressive, regular expressions are much faster in practice.
4. Populate
-----------
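The trade-off described in the README addition can be illustrated with a minimal sketch, assuming a hypothetical IMDb-style HTML fragment. The patterns, the `extract_fields` helper and the field names below are illustrative only and are not taken from extract.py or its configuration files. The idea is that precompiled regular expressions scan the raw text directly, without building a parse tree for every page.

```python
import re

# Hypothetical HTML fragment standing in for one downloaded page.
HTML = """
<h1 class="header">The Matrix <span>(1999)</span></h1>
<div class="rating">8.7</div>
"""

# Compile once, reuse for every page: no DOM tree is built per document.
TITLE_RE = re.compile(
    r'<h1 class="header">\s*(?P<title>[^<]+?)\s*<span>\((?P<year>\d{4})\)</span>'
)
RATING_RE = re.compile(r'<div class="rating">(?P<rating>[\d.]+)</div>')


def extract_fields(html):
    """Return the named groups of every pattern that matches (hypothetical helper)."""
    fields = {}
    for pattern in (TITLE_RE, RATING_RE):
        match = pattern.search(html)
        if match:
            fields.update(match.groupdict())
    return fields


print(extract_fields(HTML))
```

The cost of this approach is the one the README admits: the patterns are brittle against markup changes, whereas a real HTML parser tolerates them.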
1  extract.py 100644 → 100755
@@ -1,3 +1,4 @@
+#! /usr/bin/env python
# Author: Alex Ksikes (alex.ksikes@gmail.com)
import codecs
1  populate.py 100644 → 100755
@@ -1,3 +1,4 @@
+#! /usr/bin/env python
# Author: Alex Ksikes (alex.ksikes@gmail.com)
import extract
0  repository.py 100644 → 100755
File mode changed
1  retrieve.py 100644 → 100755
@@ -1,3 +1,4 @@
+#! /usr/bin/env python
# Author: Alex Ksikes (alex.ksikes@gmail.com)
# Using code from pycurl retriever-multi.py example
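In Git, mode 100644 is a regular non-executable file and 100755 a regular executable one. Together with the `#!` line, that change lets a script such as `extract.py` be run directly as `./extract.py` on Unix-like systems. The sketch below reproduces the permission change roughly the way `chmod a+x` does. The helper names `make_executable` and `has_shebang` are hypothetical and not part of this repository.

```python
import os
import stat
import tempfile


def make_executable(path):
    """Add execute permission for user, group and others (e.g. 0o644 -> 0o755)."""
    mode = os.stat(path).st_mode
    os.chmod(path, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)


def has_shebang(path):
    """Return True if the file starts with '#!', so the kernel can pick an interpreter."""
    with open(path, "rb") as f:
        return f.read(2) == b"#!"


def demo():
    """Create a throwaway script, make it executable and report its permissions."""
    with tempfile.TemporaryDirectory() as d:
        script = os.path.join(d, "extract.py")
        with open(script, "w") as f:
            f.write("#! /usr/bin/env python\nprint('hello')\n")
        os.chmod(script, 0o644)  # same permissions as mode 100644
        make_executable(script)  # now matches mode 100755
        return stat.S_IMODE(os.stat(script).st_mode), has_shebang(script)


print(demo())
```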