
Convert code base for Python 3.x #3

Closed
dipanjanS opened this issue May 21, 2017 · 25 comments

Comments

@dipanjanS
Owner

Python 3 is the future. Even though a lot of legacy code and systems still run on Python 2 (including our applications, which is why I wrote this book in Python 2 in the first place), we need to slowly start migrating and building our code, apps and systems on Python 3.

I'm looking for experts in Python 3.x as well as NLP and text analytics who could help migrate each chapter's code base to Python 3.x, since I am occupied with other projects for a major part of this year. I already have some parts ready for Python 3.x and can offer help and support whenever needed.

Contributors whose migrations are merged will be mentioned in the acknowledgements and contributor list of this repository and project, and will also be credited in future versions of the book whenever one is in the pipeline.

@anubhav3itb

I think I can help you with this.

@Salil999
Salil999 commented Jun 4, 2017

If anyone is interested, I have updated almost all of chapters 1 to 4. Chapters 5 to 7 are displaying some 'lfs' error. I will try to resolve that later, but feel free to fork and make pull requests.

Here's the repo: text-analytics-with-python

@dipanjanS
Owner Author

Thanks, but as I said, we need to follow a structured workflow rather than working in an ad hoc manner so that merges are hassle-free. Please hold off on further conversions, because I need to restructure the current repo and put out a plan. I will do so in a couple of days.

@Salil999
Salil999 commented Jun 4, 2017

Okay, will hold off on it. I would like to note that the module "pattern" does not seem to have support for Python 3 yet. This will hopefully change in the future, but Chapters 5-7 (and I think some of Chapter 4) are at a roadblock for now.

@dipanjanS
Owner Author

Sure, and yes, I'm aware of the issue with pattern. There is an unofficial port, but sadly it has remained incomplete for the last couple of years. I've thought of some strategies to tackle this. Let me restructure the current repository first, and then we can get started on this in more detail. I'll post an update once that is done so we can port and merge chapter by chapter.

@dipanjanS
Owner Author

Here is the first phase of the plan; each step will be checked off once it is done so we can keep track. I am currently on vacation, so I will update you all as soon as the restructuring is done.

  • Re-structure current repository @dipanjanS

  • Contributors to pull in latest changes

  • Port code for chapters 1-3 and send pull requests for each chapter separately

  • Merge subsequent pull requests to main repository after review @dipanjanS

  • Look into the pattern repository and necessary modules needed @dipanjanS

  • Discuss strategies for porting the remaining chapters and post a plan for them

@ambientlight
Contributor

@dipanjanS
This idea might sound a bit weird... but do you think it makes sense adding type hints into Python 3.x code examples?

it just might be a bit easier to read through the code in the book.
and code completion / correct jump to definition within PyCharm...
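A minimal sketch of what such annotations could look like, using a hypothetical `count_tokens` helper that is not from the book: only the parameters and the return type are annotated. These annotations are stored on the function object and can be read back at runtime with `typing.get_type_hints`, while editors such as PyCharm and static type checkers read the same annotations from the source.

```python
from collections import Counter
from typing import Dict, List, get_type_hints


def count_tokens(tokens: List[str]) -> Dict[str, int]:
    """Hypothetical helper: case-insensitive token frequencies."""
    # No annotation needed here: the right-hand side makes this a List[str]
    lowered = [token.lower() for token in tokens]
    return dict(Counter(lowered))


# Parameter and return annotations are kept on the function object and can be
# read back at runtime; static tools read the same annotations from the source
hints = get_type_hints(count_tokens)
frequencies = count_tokens(["NLP", "nlp", "text"])
```

Annotating only signatures keeps the code close to the untyped original while still giving tools the information they need at function boundaries.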

@dipanjanS
Owner Author

@ambientlight Sorry, I'm a bit tied up with work and a couple of other things, so I haven't had time to look into this; hopefully I will soon. Regarding your question: do you mean type hints as in specifying the data type of each variable in the code? If so, maybe we can look into it once the entire code base is ported.

@ambientlight
Contributor

@dipanjanS Got it, thanks a lot!
I think annotating method parameters and return types would be good enough; a variable's type is normally evident from the right-hand side of the assignment.

I have ported a few things up to Chapter 4, something like this:

import re
from typing import Dict, List, Optional, Tuple

import nltk
from nltk.corpus import wordnet as wn
from nltk.stem import WordNetLemmatizer

# Assumes the required NLTK data (tokenizer models, stopwords, WordNet and the
# POS tagger model) has already been fetched with nltk.download()


class Normalizer:

    stopwords: List[str] = nltk.corpus.stopwords.words('english')
    wnl = WordNetLemmatizer()

    # Split text into tokens and strip surrounding whitespace from each one
    @staticmethod
    def tokenize_text(text: str) -> List[str]:
        tokens: List[str] = nltk.word_tokenize(text)
        tokens = [token.strip() for token in tokens]
        return tokens

    # Replace contractions (e.g. "don't") with their expanded forms from the mapping
    @staticmethod
    def expand_contractions(text: str, contraction_mapping: Dict[str, str]) -> str:
        contractions_pattern = re.compile('({})'.format('|'.join(contraction_mapping.keys())),
                                          flags=re.IGNORECASE | re.DOTALL)

        def expand_match(contraction):
            match = contraction.group(0)
            first_char = match[0]
            # Look up the exact match first, then its lower-cased form; fall back to
            # the matched text so an unmapped casing does not raise a TypeError
            expanded_contraction = (contraction_mapping.get(match)
                                    or contraction_mapping.get(match.lower())
                                    or match)

            # Keep the capitalisation of the original first character
            expanded_contraction = first_char + expanded_contraction[1:]
            return expanded_contraction

        expanded_text = contractions_pattern.sub(expand_match, text)
        # Remove any apostrophes that remain after expansion
        expanded_text = re.sub("'", "", expanded_text)
        return expanded_text

    # Annotate text tokens with POS tags
    @staticmethod
    def pos_tag_text(text: str) -> List[Tuple[str, Optional[str]]]:
        # convert Penn treebank tag to wordnet tag (None if there is no equivalent)
        def penn_to_wn_tags(pos_tag):
            if pos_tag.startswith('J'):
                return wn.ADJ
            elif pos_tag.startswith('V'):
                return wn.VERB
            elif pos_tag.startswith('N'):
                return wn.NOUN
            elif pos_tag.startswith('R'):
                return wn.ADV
            else:
                return None

        tagged_text = nltk.pos_tag(Normalizer.tokenize_text(text))
        tagged_lower_text = [(word.lower(), penn_to_wn_tags(pos_tag)) for word, pos_tag in tagged_text]
        return tagged_lower_text

I can contribute the type hints later on if that would be appropriate.
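The contraction-expansion step in `Normalizer` above only needs the standard library, so the regex-substitution technique can be tried on its own without any NLTK data. The sketch below uses a deliberately tiny, hypothetical mapping (a real contraction map would be much larger) and adds `re.escape` as a small precaution so that keys are always matched literally.

```python
import re
from typing import Dict

# Hypothetical, deliberately tiny mapping for illustration only
CONTRACTIONS: Dict[str, str] = {
    "don't": "do not",
    "it's": "it is",
    "i'm": "I am",
}


def expand_contractions(text: str, contraction_mapping: Dict[str, str]) -> str:
    # One alternation over all known contractions, matched case-insensitively
    pattern = re.compile('({})'.format('|'.join(map(re.escape, contraction_mapping))),
                         flags=re.IGNORECASE | re.DOTALL)

    def expand_match(contraction: re.Match) -> str:
        match = contraction.group(0)
        # Try the exact text first, then its lower-cased form, else leave it alone
        expanded = (contraction_mapping.get(match)
                    or contraction_mapping.get(match.lower())
                    or match)
        # Keep the original capitalisation of the first character
        return match[0] + expanded[1:]

    expanded_text = pattern.sub(expand_match, text)
    # As in the Normalizer code, strip any apostrophes left over afterwards
    return re.sub("'", "", expanded_text)


result = expand_contractions("Don't worry, it's fine.", CONTRACTIONS)
```

Note that the final apostrophe removal also affects possessives, so "the dog's bone" becomes "the dogs bone"; whether that is desirable depends on the downstream task.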

@bkbonde
bkbonde commented Jul 16, 2017

I am not sure whether everything has been ported to Python 3 yet; if not, I can contribute. I will check out the repo and add some tests for Python 3.
Bhushan

@dipanjanS
Owner Author

@ambientlight @pribond

Sure, thanks for the interest. The code is currently in Python 2. Unfortunately I am preoccupied with several things at work and with one of my books. I'm hoping to resume this around the end of August, or even earlier.

I still need to refactor the repository so that the Python 2 and Python 3 code are kept separate. I will notify everyone in this thread once we are ready to start porting.

@brycecf
brycecf commented Oct 29, 2017

@dipanjanS What's the status of this issue? I'd be happy to help out.

@igatanasov

@dipanjanS Is there any plan to convert this to Jupyter notebooks?

@dipanjanS
Owner Author

Sorry folks, I'm a bit tied up with multiple engagements at the moment. Here is what I promise as soon as I can get to it:

  • Code in both Python 2 and 3
  • Jupyter notebooks besides normal code files

I'm collaborating with some folks from work for better output and ease of communication. If I need additional help, I will post an update here.

@prakritidev
prakritidev commented Nov 14, 2017

@dipanjanS I can help you with this if this issue is still open. I think Jupyter notebooks would be more interactive. Let me know if you need help with this.

Thanks

@akhilap
akhilap commented Nov 15, 2017

Hi,

Can you please help me with the latest code for Python 3.5 on a 64-bit operating system? I am using Visual Studio 2017 to run the code.

@prakritidev

I would suggest using Jupyter Notebook rather than Visual Studio. Converting Python 2 code to Python 3 is simple.
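Much of the mechanical conversion does come down to a handful of well-known language changes. The Python 3 sketch below shows a few of them, with the Python 2 form in comments; it is illustrative rather than exhaustive, and it does not address third-party libraries without Python 3 support, such as the `pattern` module mentioned earlier in this thread.

```python
# print is a function in Python 3 (Python 2: print "tokens:", n)
n = 3
print("tokens:", n)

# "/" is true division in Python 3; "//" is explicit floor division
# (Python 2: 7 / 2 == 3 when both operands are ints)
ratio = 7 / 2          # 3.5
floor_ratio = 7 // 2   # 3

# str is Unicode text and bytes must be decoded explicitly
# (Python 2: str held bytes and unicode held text)
raw = "café".encode("utf-8")
text = raw.decode("utf-8")

# dict.has_key() and dict.iteritems() are gone; use "in" and .items()
# (Python 2: freq.has_key("nlp"), freq.iteritems())
freq = {"nlp": 2, "text": 1}
has_nlp = "nlp" in freq
pairs = sorted(freq.items())

# map(), filter() and zip() return lazy iterators; wrap them in list() when a
# list is needed (Python 2 returned lists directly)
lengths = list(map(len, ["a", "bb"]))
```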

@dipanjanS
Owner Author

Kindly go through the book for details of what has been used. For now, the code runs on Python 2.7.x and you can use the Anaconda distribution, as mentioned in the book. Work is in progress to convert the code to Python 3 as well as to Jupyter notebooks. Once that is done, this thread will be updated.

@akhilap
akhilap commented Nov 21, 2017

Can you please share any step-by-step guidelines or documents on how to convert the code from Python 2.x to 3.x using Jupyter Notebook?

@dipanjanS
Owner Author

Jupyter Notebook is not used for code conversion; it is a way to run code, document your findings and share them easily with others if needed. To convert the code, you need to apply your own logic along with tools such as 2to3 or libraries such as six.
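`2to3` rewrites Python 2 source into Python 3 syntax, while `six` helps a single code base run on both versions through helpers such as `six.PY2`, `six.string_types` and `six.text_type`. As a rough sketch of the second approach without the extra dependency, the hypothetical shim below mimics a few of those helpers; its `ensure_text` behaves roughly like `six.ensure_text` for text and bytes input.

```python
import sys

# Hypothetical shim modelled on six.PY2, six.string_types and six.text_type
PY2 = sys.version_info[0] == 2

if PY2:
    # These builtins exist only on Python 2
    string_types = (basestring,)  # noqa: F821
    text_type = unicode  # noqa: F821
else:
    string_types = (str,)
    text_type = str


def ensure_text(value, encoding="utf-8"):
    """Return value as Unicode text, decoding bytes with the given encoding."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    if isinstance(value, string_types):
        return text_type(value)
    raise TypeError("expected text or bytes, got {!r}".format(type(value).__name__))


# Usage: both calls yield the same Unicode string
from_text = ensure_text(u"caf\u00e9")
from_bytes = ensure_text(b"caf\xc3\xa9")
```

If both interpreters do not need to be supported, running `2to3` once and keeping only Python 3 code avoids carrying shims like this at all.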

@peterotool
peterotool commented Jan 2, 2018

Any plans to port the code to Python 3 in 2018?

@dipanjanS
Owner Author

@peterotool Thanks for bringing this up! Yep, work is already underway: we are planning to bring out a revised edition of this book with all code in Python 3, plus new examples, use cases and so on. Stay tuned! The book is going to come back better and with more content!

@peterotool
peterotool commented Jan 2, 2018 via email

@peterotool

@dipanjanS Is it possible to create a chatbot using some deep learning architecture?

@dipanjanS
Owner Author

@samuelxmli Can you please stop spamming the same question everywhere? You have already created two issues/comments. Closing this issue since I have replied on the other thread, and soon we will be doing a revised version of this book in Python 3.x.


10 participants