Convert code base for Python 3.x #3
Comments
I think I can help you with this. |
If anyone is interested, I have updated almost all of chapters 1 to 4. Chapters 5 to 7 are displaying some 'lfs' error. I will try to resolve that later, but feel free to fork and make pull requests. Here's the repo: text-analytics-with-python |
Thanks, but like I said, we need to follow a structured workflow rather than work in an ad-hoc manner if we want hassle-free merges. Please hold off on further conversions, because I need to restructure the current repo and put out a plan. I will do so in a couple of days. |
Okay, will hold off on it. I would like to note that the module "pattern" does not seem to have support for Python 3 yet. This will hopefully change in the future, but Chapters 5-7 (and I think some of Chapter 4) are at a roadblock for now. |
Sure, and yes, I'm aware of the issue with pattern. There is an unofficial port, but sadly it has been incomplete for the last couple of years. I've thought of some strategies to tackle this. Let me restructure the current repository first, and then we can get into this in more detail. I'll post an update once that is done, and then we can port and merge chapter by chapter. |
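One way to keep ported chapter code importable while `pattern` is unavailable on Python 3 is to check for the module before using it. The sketch below is only an illustration, not the maintainer's strategy: the helper names `has_module` and `pick_backend` are hypothetical, and falling back to `nltk` is an assumption made for the example.

```python
import importlib.util


def has_module(name):
    """Return True if a top-level module can be located without importing it."""
    try:
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        # find_spec can raise for broken or half-initialised modules.
        return False


def pick_backend(preferred="pattern", fallback="nltk"):
    """Return the name of the first available module, or None if neither exists.

    The default names are assumptions for illustration only.
    """
    if has_module(preferred):
        return preferred
    if has_module(fallback):
        return fallback
    return None
```

Code that depends on `pattern` could then be skipped, or routed to an alternative, based on the value returned by `pick_backend()`.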
Here is the first phase of the plan. Each step will be checked off once it is done, to keep track. I am currently on vacation, so I will update you all as soon as the restructuring is done.
|
@dipanjanS it just might be a bit easier to read through the code in the book. |
@ambientlight Sorry I'm a bit tied up with work and a couple of other things so I'm not getting time to look into this. Maybe I will sometime soon. With regard to your query, are you talking about the type hints as in specifying the data type per variable in the code? If so maybe we can look into it once the entire code is ported. |
@dipanjanS got it! Thanks a lot! I ported a few things up to Chapter 4. Something like this:

    import re
    from typing import Dict, List, Optional, Tuple

    import nltk
    from nltk.corpus import wordnet as wn
    from nltk.stem import WordNetLemmatizer


    class Normalizer:
        stopwords: List[str] = nltk.corpus.stopwords.words('english')
        wnl = WordNetLemmatizer()

        @staticmethod
        def tokenize_text(text: str) -> List[str]:
            tokens: List[str] = nltk.word_tokenize(text)
            tokens = [token.strip() for token in tokens]
            return tokens

        @staticmethod
        def expand_contractions(text: str, contraction_mapping: Dict[str, str]) -> str:
            contractions_pattern = re.compile(
                '({})'.format('|'.join(contraction_mapping.keys())),
                flags=re.IGNORECASE | re.DOTALL)

            def expand_match(contraction):
                match = contraction.group(0)
                first_char = match[0]
                # Try the exact match first, then its lowercase form
                expanded_contraction = \
                    contraction_mapping.get(match) \
                    if contraction_mapping.get(match) \
                    else contraction_mapping.get(match.lower())
                # Preserve the capitalisation of the original first character
                expanded_contraction = first_char + expanded_contraction[1:]
                return expanded_contraction

            expanded_text = contractions_pattern.sub(expand_match, text)
            expanded_text = re.sub("'", "", expanded_text)
            return expanded_text

        # Annotate text tokens with POS tags
        @staticmethod
        def pos_tag_text(text: str) -> List[Tuple[str, Optional[str]]]:
            # convert Penn treebank tag to wordnet tag (None if there is no mapping)
            def penn_to_wn_tags(pos_tag):
                if pos_tag.startswith('J'):
                    return wn.ADJ
                elif pos_tag.startswith('V'):
                    return wn.VERB
                elif pos_tag.startswith('N'):
                    return wn.NOUN
                elif pos_tag.startswith('R'):
                    return wn.ADV
                else:
                    return None

            tagged_text = nltk.pos_tag(Normalizer.tokenize_text(text))
            tagged_lower_text = [(word.lower(), penn_to_wn_tags(pos_tag)) for word, pos_tag in tagged_text]
            return tagged_lower_text

I can contribute the typing later on if it would be appropriate. |
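To illustrate what the contraction-expansion step in the snippet above does, here is a minimal standalone sketch that needs only the standard library. It is not the book's code: `SAMPLE_MAPPING` is a made-up two-entry mapping with lowercase keys, and the `re.escape` call is an addition to guard against keys that contain regex metacharacters.

```python
import re

# Hypothetical, deliberately tiny mapping; a real one would be much larger.
SAMPLE_MAPPING = {"isn't": "is not", "can't": "cannot"}


def expand_contractions(text, contraction_mapping):
    """Replace each known contraction with its expansion, keeping the first letter's case."""
    pattern = re.compile(
        '({})'.format('|'.join(re.escape(key) for key in contraction_mapping)),
        flags=re.IGNORECASE)

    def expand_match(match):
        found = match.group(0)
        # Look up the exact text first, then its lowercase form.
        expanded = contraction_mapping.get(found) or contraction_mapping.get(found.lower())
        # Reuse the original first character so "Isn't" becomes "Is not".
        return found[0] + expanded[1:]

    expanded_text = pattern.sub(expand_match, text)
    # Drop any apostrophes that were not part of a known contraction.
    return expanded_text.replace("'", "")


print(expand_contractions("Isn't it great? I can't wait.", SAMPLE_MAPPING))
# prints: Is not it great? I cannot wait.
```

Because the lookup falls back to the lowercase form, the mapping only needs lowercase keys; capitalisation is restored from the matched text.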
I am not sure whether everything has been ported to Python 3 yet. If not, I can contribute: I will check out the repo and add some tests for Python 3. |
Sure, thanks for the interest. Code is currently in Python 2. Unfortunately I am a bit pre-occupied with several things at work and one of my books. I'm planning to resume this around end of August hopefully or even earlier. I still need to refactor the repository so that we have the code separate for Python 2 and 3. I will notify all in this thread once we are ready to start porting. |
@dipanjanS What's the status of this issue? I'd be happy to help out. |
@dipanjanS is there any plan to convert this to Jupyter notebook? |
Sorry folks, a bit tied up with multiple engagements at the moment. Following is what I promise as soon as I can get to it.
Collaborating with some folks from work for better output and ease of communication. In case I need additional help I will update here. |
@dipanjanS I can help you with this if this issue is still open. I think creating Jupyter notebooks would make it more interactive. Let me know if you need help with this. Thanks |
Hi, can you please help me with the latest code for Python 3.5 on a 64-bit operating system? I am using Visual Studio 2017 to run the code. |
I would say use Jupyter Notebook rather than Visual Studio. Converting Python 2 to Python 3 is simple. |
Kindly go through the book for details of what has been used. For now the code runs on Python 2.7.x, and you can use the Anaconda distribution. The same is mentioned in the book. Work is in progress to convert the code to Python 3 as well as to Jupyter notebooks. Once that is done, it will be updated here. |
Can you please share any step-by-step guidelines or documents on how to convert the code from Python 2.x to 3.x using Jupyter Notebook? |
Jupyter notebook is not used for code conversion, it is a mechanism to run code, document your findings and share it across with others easily if needed. You need to use your own logic and utility libraries like |
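As a rough illustration of the kind of manual changes such a conversion involves, the sketch below shows a small Python 3 function with the equivalent Python 2 idioms noted in comments. The function `summarize` and its inputs are hypothetical and are not taken from the book's code.

```python
def summarize(freq, raw):
    """Format relative word frequencies; written for Python 3.

    freq: dict mapping word -> count
    raw:  UTF-8 encoded bytes
    """
    # Python 2: text = unicode(raw, 'utf-8')
    text = raw.decode('utf-8')

    total = sum(freq.values())
    lines = []
    # Python 2: for word, count in freq.iteritems():
    for word, count in sorted(freq.items()):
        # Python 2: count / total is integer division for ints;
        # in Python 3 the / operator always performs true division.
        lines.append('{}: {:.2f}'.format(word, count / total))

    # Python 2: print 'decoded:', text
    print('decoded:', text)
    return lines


print(summarize({'a': 1, 'b': 3}, b'caf\xc3\xa9'))
# prints: decoded: café
# prints: ['a: 0.25', 'b: 0.75']
```

The same four kinds of change (print as a function, `items()` instead of `iteritems()`, true division, and explicit bytes/text handling) tend to account for much of a typical port, though each module still needs to be checked by hand.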
Any plans to port the code to Python 3 in 2018? |
@peterotool Thanks for bringing this up! Yep, work is already underway on this, we are planning to bring out a new revised edition of this book with all code in Python 3 and also adding new examples, use-cases and so on. Stay tuned! The book is going to come back better and with more content! |
@dipanjanS,
Do you have a Twitter account where I can follow news about this?
|
@dipanjanS it is possible to create a chatbot using some deep learning architecture? |
@samuelxmli Can you please stop spamming the same question everywhere? You have already created two issues/comments. Closing this issue since I have replied on the other thread, and soon we will be doing a revised version of this book in Python 3.x. |
Python 3 is the future, even though a lot of legacy code and systems run on Python 2 (including our applications, which is why I wrote this book in Python 2 in the first place). We need to slowly start migrating and building our code, apps and systems on Python 3.
Looking for experts in Python 3.x as well as NLP and text analytics who could help out in migrating each chapter's codebase to Python 3.x, since I am occupied for a major part of this year on other projects. I do have some parts of it ready for Python 3.x and can offer help and support whenever needed.
If your codebase migration is successful, you will be mentioned as a contributor in the acknowledgements and contributor list of this repository and project. You will also get a mention in future versions of the book, whenever that is in the pipeline.