Convert code base for Python 3.x #3

dipanjanS · 2017-05-21T16:48:46Z

Python 3 is the future, even though a lot of legacy code and systems still run on Python 2 (including our applications, which is why I wrote this book in Python 2 in the first place). We need to gradually start migrating and building our code, apps and systems on Python 3.

I am looking for experts in Python 3.x as well as NLP and text analytics who could help migrate each chapter's codebase to Python 3.x, since I am occupied with other projects for a major part of this year. I do have some parts of it ready for Python 3.x and can offer help and support whenever needed.

Successful codebase migrations will earn you a mention as a contributor in the acknowledgements and contributor list of this repository and project. You will also get a mention in future versions of the book whenever one is in the pipeline.

anubhav3itb · 2017-05-29T18:38:06Z

I think I can help you with this.

Salil999 · 2017-06-04T17:25:24Z

If anyone is interested, I have updated almost all of chapters 1 to 4. Chapters 5 to 7 are displaying some 'lfs' error. I will try to resolve that later, but feel free to fork and make pull requests.

Here's the repo: text-analytics-with-python

dipanjanS · 2017-06-04T18:00:03Z

Thanks, but like I said, we need to follow a structured workflow and approach instead of working in an ad-hoc manner so that we have hassle-free merges. Please wait before doing further conversions because I need to restructure the current repo and put out a plan. I will do so in a couple of days.

Salil999 · 2017-06-04T19:28:25Z

Okay, will hold off on it. I would like to note that the module "pattern" does not seem to have support for Python 3 yet. This will hopefully change in the future, but Chapters 5-7 (and I think some of Chapter 4) are at a roadblock for now.
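One way to keep chapter code importable under Python 3 while "pattern" is unavailable is to guard the import and fall back when it fails. The sketch below is only an illustration under stated assumptions: `optional_import` is a hypothetical helper, not part of this repository, and `pattern.en` is used only as an example module name.

```python
import importlib


def optional_import(module_name):
    """Return the imported module, or None if it cannot be imported.

    Hypothetical helper for illustration. A Python 2-only package that is
    installed under Python 3 usually fails with SyntaxError rather than
    ImportError, so both are caught here.
    """
    try:
        return importlib.import_module(module_name)
    except (ImportError, SyntaxError):
        return None


if __name__ == "__main__":
    # "pattern.en" is an example name; the result depends on the environment.
    pattern_en = optional_import("pattern.en")
    if pattern_en is None:
        print("pattern is not usable here; skipping pattern-based examples")
    else:
        print("pattern imported successfully")
```

Code that depends on the module can then check for `None` and skip or substitute the affected steps until a working port exists.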

dipanjanS · 2017-06-04T19:42:08Z

Sure, and yes, I'm aware of the issue with pattern. There is an unofficial port, but sadly it has been incomplete for the last couple of years. I've thought of some strategies to tackle this. Let me restructure the current repository first, then we can get started on this in more detail. I'll post an update once that is done, and then we can port and merge chapter by chapter.

dipanjanS · 2017-06-06T10:43:40Z

Here is the first phase of the plan. Each step will be checked off once it is done, to keep track. I am currently on vacation, so I will update you all as soon as the restructuring is done.

Re-structure current repository @dipanjanS
Contributors to pull in latest changes
Port code for chapters 1-3 and send pull requests for each chapter separately
Merge subsequent pull requests to main repository after review @dipanjanS
Look into the pattern repository and necessary modules needed @dipanjanS
Discuss strategies for porting remaining chapters and post the plan for the same

ambientlight · 2017-07-03T12:46:33Z

@dipanjanS
This idea might sound a bit weird... but do you think it makes sense to add type hints to the Python 3.x code examples?

It might make the code in the book a bit easier to read, and it enables code completion and correct jump-to-definition in PyCharm.

dipanjanS · 2017-07-03T16:13:56Z

@ambientlight Sorry I'm a bit tied up with work and a couple of other things so I'm not getting time to look into this. Maybe I will sometime soon. With regard to your query, are you talking about the type hints as in specifying the data type per variable in the code? If so maybe we can look into it once the entire code is ported.

ambientlight · 2017-07-03T16:22:15Z

@dipanjanS got it! thanks a lot!
Method parameters and return types would be good enough, I think. Normally a variable's type is evident enough from the right-hand side of the expression.

I ported a few things up to Chapter 4. Something like this:

import re
from typing import Dict, List, Optional, Tuple

import nltk
from nltk.corpus import wordnet as wn
from nltk.stem import WordNetLemmatizer


class Normalizer:

    stopwords: List[str] = nltk.corpus.stopwords.words('english')
    wnl = WordNetLemmatizer()

    # Split text into word tokens and strip surrounding whitespace
    @staticmethod
    def tokenize_text(text: str) -> List[str]:
        tokens: List[str] = nltk.word_tokenize(text)
        tokens = [token.strip() for token in tokens]
        return tokens

    # Replace contractions (e.g. "isn't") with their expanded forms
    @staticmethod
    def expand_contractions(text: str, contraction_mapping: Dict[str, str]) -> str:
        contractions_pattern = re.compile('({})'.format('|'.join(contraction_mapping.keys())),
                                          flags=re.IGNORECASE | re.DOTALL)

        def expand_match(contraction):
            match = contraction.group(0)
            first_char = match[0]
            # Try the exact match first, then fall back to its lowercase form
            expanded_contraction = \
                contraction_mapping.get(match) \
                if contraction_mapping.get(match) \
                else contraction_mapping.get(match.lower())

            # Preserve the case of the first character of the original match
            expanded_contraction = first_char + expanded_contraction[1:]
            return expanded_contraction

        expanded_text = contractions_pattern.sub(expand_match, text)
        # Drop any apostrophes left over after expansion
        expanded_text = re.sub("'", "", expanded_text)
        return expanded_text

    # Annotate text tokens with POS tags
    @staticmethod
    def pos_tag_text(text: str) -> List[Tuple[str, Optional[str]]]:
        # convert Penn treebank tag to wordnet tag (None if there is no equivalent)
        def penn_to_wn_tags(pos_tag: str) -> Optional[str]:
            if pos_tag.startswith('J'):
                return wn.ADJ
            elif pos_tag.startswith('V'):
                return wn.VERB
            elif pos_tag.startswith('N'):
                return wn.NOUN
            elif pos_tag.startswith('R'):
                return wn.ADV
            else:
                return None

        tagged_text = nltk.pos_tag(Normalizer.tokenize_text(text))
        tagged_lower_text = [(word.lower(), penn_to_wn_tags(pos_tag)) for word, pos_tag in tagged_text]
        return tagged_lower_text

I can contribute the typing later on if it would be appropriate.

bkbonde · 2017-07-16T12:31:05Z

I am not sure whether everything has been ported to Python 3 yet. If not, I can contribute: I will check out the repo and add some tests for Python 3.
Bhushan

dipanjanS · 2017-07-16T12:36:11Z

@ambientlight @pribond

Sure, thanks for the interest. The code is currently in Python 2. Unfortunately I am a bit preoccupied with several things at work and one of my books. I'm hoping to resume this around the end of August, or even earlier.

I still need to refactor the repository so that we have the code separate for Python 2 and 3. I will notify all in this thread once we are ready to start porting.

brycecf · 2017-10-29T18:13:28Z

@dipanjanS What's the status of this issue? I'd be happy to help out.

igatanasov · 2017-11-13T13:52:22Z

@dipanjanS is there any plan to convert this to Jupyter notebook?

dipanjanS · 2017-11-13T13:57:28Z

Sorry folks, a bit tied up with multiple engagements at the moment. Following is what I promise as soon as I can get to it.

Code in both Python 2 and 3
Jupyter notebooks besides normal code files

Collaborating with some folks from work for better output and ease of communication. In case I need additional help I will update here.

prakritidev · 2017-11-14T07:56:20Z

@dipanjanS I can help you with this if this issue is still open. I think creating Jupyter notebooks will make the code more interactive. Let me know if you need help with this.

Thanks

akhilap · 2017-11-15T12:05:08Z

Hi,

Can you please help me with the latest code for Python 3.5 on a 64-bit operating system? I am using Visual Studio 2017 to run the code.

prakritidev · 2017-11-15T12:17:10Z

I would say, use Jupyter Notebook rather than Visual Studio. Converting Python 2 code to Python 3 is simple.

dipanjanS · 2017-11-15T14:54:52Z

Kindly go through the book for details of what has been used. For now the code runs on Python 2.7.x, and you can use the Anaconda distribution. The same is mentioned in the book. Work is in progress to convert the code to Python 3 as well as to Jupyter notebooks. Once that is done, it will be updated here.

akhilap · 2017-11-21T11:50:18Z

Can you please share any step-by-step guidelines or documents on how to convert the code from Python 2.x to 3.x using Jupyter Notebook?

dipanjanS · 2017-11-21T18:39:51Z

Jupyter Notebook is not used for code conversion. It is a mechanism to run code, document your findings and share them with others easily if needed. You need to use your own logic together with tools like 2to3 or compatibility libraries like six to convert the code.
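To give a sense of what such a conversion typically touches, here is a minimal sketch of a few common Python 2 idioms rewritten for Python 3. The function names and sample text are made up for illustration and are not taken from the book's code.

```python
from collections import Counter


def token_frequencies(text):
    """Count lowercase, whitespace-separated tokens in a text string."""
    # Python 3: str is already Unicode, so no u'' prefix or .decode() is needed.
    tokens = text.lower().split()
    return Counter(tokens)


def top_tokens(freqs, n):
    """Return the n most frequent (token, count) pairs, ties broken alphabetically."""
    # Python 2 code often used freqs.iteritems(); Python 3 only has items().
    ranked = sorted(freqs.items(), key=lambda pair: (-pair[1], pair[0]))
    return ranked[:n]


def average_token_length(tokens):
    """Mean token length as a float."""
    # '/' is true division in Python 3; in Python 2 it floored for two ints.
    return sum(len(token) for token in tokens) / len(tokens)


if __name__ == "__main__":
    freqs = token_frequencies("the cat saw the dog")
    # The Python 2 statement form was: print top_tokens(freqs, 2)
    print(top_tokens(freqs, 2))
```

A tool such as 2to3 can apply the mechanical parts of these rewrites (print calls, `iteritems`), but changes in string handling and division semantics usually still need a manual review.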

peterotool · 2018-01-02T17:48:10Z

Any plans to port the code to Python 3 in 2018?

dipanjanS · 2018-01-02T18:55:29Z

@peterotool Thanks for bringing this up! Yes, work is already underway on this. We are planning to bring out a new revised edition of this book with all code in Python 3, along with new examples, use cases and so on. Stay tuned! The book is going to come back better and with more content!

peterotool · 2018-01-02T21:33:34Z

@dipanjanS, do you have a Twitter account where I can follow any news regarding this?


peterotool · 2018-04-06T18:59:26Z

@dipanjanS is it possible to create a chatbot using some deep learning architecture?

dipanjanS · 2018-06-04T16:52:45Z

@samuelxmli Can you please stop spamming the same question everywhere? You have already created two issues/comments. Closing this issue since I have replied on the other thread, and we will soon be doing a revised version of this book in Python 3.x.

dipanjanS added the enhancement and help wanted labels on May 21, 2017

dipanjanS closed this as completed Jun 4, 2018
