NOTICE: The exercise notebook is a bit heavier than usual to run this time around, so there is a small chance that the portal bug that shows incorrect grades on re-submission will surface again. Please keep an eye out for it, and if you think it is happening again, drop into the #dev-ops channel and let us know!
This big learning unit continues the text classification specialization. Now that you know how to process text and extract meaningful features, we will show you how to prepare these features so you can actually use them for your task, for example text classification.
We will focus on how to go from a huge set of features to a more tractable set that is more useful for modelling our problem.
If you are able to solve this BLU, you will be equipped with one more set of tools to succeed in the hackathon :)
As in the previous BLUs, go through the Learning Notebooks (they are in the Learning Notebooks folder), then do the Exercise notebook, and submit it on the portal.
Don't forget to install the requirements:
pip install -r requirements
You should also download an extra file before going through Part III:
python -m spacy download en_core_web_md
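If you want to confirm that the setup worked before opening the notebooks, a quick check along these lines may help. It is only a minimal sketch: the helper name `is_installed` is made up for illustration and is not part of this repo. It relies on the fact that spaCy models downloaded with the command above are installed as regular Python packages.

```python
import importlib.util


def is_installed(package_name: str) -> bool:
    """Return True if a top-level package with this name can be found."""
    return importlib.util.find_spec(package_name) is not None


# Check the spaCy library itself and the model package downloaded above.
for name in ("spacy", "en_core_web_md"):
    status = "found" if is_installed(name) else "MISSING"
    print(f"{name}: {status}")
```

If either line reports MISSING, re-run the corresponding install command in the same environment that you use to launch the notebooks.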
You can and should ask for help, be it about Learning Notebooks, Exercises, or anything else. Please check out the How to Ask for Help, and remember not to share code when asking for help about the exercises!
This repo is completely open source and is continuously improving over time. When you spot a mistake, please check whether it has already been reported in the issues. If it hasn't, please open an issue explaining in detail where it is (e.g. in which notebook and on which line) and how to reproduce the error. If it is an easy fix, feel free to make a pull request.