Word2vec Algorithm: Made as simple as possible, but no simpler
A Pythonic introduction to the word2vec algorithm. Word2vec, translating words (strings) to vectors (lists of floats), is a relatively new algorithm which has proven to be very useful for making sense of text data. You should walk out at the end with a conceptual understanding of the algorithm and be empowered to try it out on your favorite collection of text data.
“You shall know a word by the company it keeps” is a common refrain in Natural Language Processing (NLP). word2vec does that by training a neural network to learn which words tend to co-occur together and embeds the words in a meaningful vector space. From these "word embeddings", it is possible to compare words with distance measures, add/subtract words to explore relationships between concepts, and clustering to find semantically related words. Actually, word2vec is a general purpose algorithm that allows any sequential data to be encoded into meaningful vectors - including emojis!
Dr. Brian Spiering is a faculty member at GalvanizeU which offers a Master of Science in Data Science. His passions are Natural Language Processing (NLP), deep learning, and building data products. He is active in the San Francisco Data Science community through volunteering and mentoring.
Drop him a line firstname.lastname@example.org
Disclaimer: These are interactive notebooks that are meant to be run. There might be elements not rendered correctly on static GitHub pages.