Drop support for Python 2.7, fixes #40
- update README
- drop support for Python 2.x module urlparse (replaced by
  urllib.parse)
sebastian-nagel committed Mar 16, 2023
1 parent 54918e8 commit f72f905
Showing 3 changed files with 3 additions and 13 deletions.
2 changes: 1 addition & 1 deletion README.md
@@ -39,7 +39,7 @@ pip install -r requirements.txt

## Compatibility and Requirements

-Tested with Spark 2.1.0 – 2.4.6 in combination with Python 2.7 or 3.5, 3.6, 3.7, and with Spark 3.0.0 - 3.2.1 in combination with Python 3.7, 3.8 and 3.9.
+Tested with with Spark 3.2.3 and 3.3.2 in combination with Python 3.8, 3.9 and 3.10. See the branch [python-2.7](/commoncrawl/cc-pyspark/tree/python-2.7) if you want to run the job on Python 2.7 and older Spark versions.


## Get Sample Data
7 changes: 1 addition & 6 deletions server_ip_address.py
@@ -1,11 +1,6 @@
import ujson as json

-try:
-    # Python2
-    from urlparse import urlparse
-except ImportError:
-    # Python3
-    from urllib.parse import urlparse
+from urllib.parse import urlparse

from pyspark.sql.types import StructType, StructField, StringType, LongType

7 changes: 1 addition & 6 deletions wat_extract_links.py
@@ -4,12 +4,7 @@

import ujson as json

-try:
-    # Python2
-    from urlparse import urljoin, urlparse
-except ImportError:
-    # Python3
-    from urllib.parse import urljoin, urlparse
+from urllib.parse import urljoin, urlparse

from pyspark.sql.types import StructType, StructField, StringType
