Word2vec Algorithm: Made as simple as possible, but no simpler
This talk gives a Pythonic introduction to the word2vec algorithm. Word2vec translates words (strings) into vectors (lists of floats). It is a relatively new algorithm that has proved very useful for making sense of text data. You will gain a conceptual understanding of the algorithm and be empowered to try it out on your favorite collection of text data.
“You shall know a word by the company it keeps” is a common refrain in natural language processing (NLP). Word2vec is a simple neural network that learns which words tend to co-occur and embeds the words in a vector space. From these word embeddings, it is possible to use distance measures to compare words, find neighbors by clustering, and add or subtract word vectors to explore relationships between concepts. In fact, word2vec is a general-purpose algorithm that can encode any sequential data into meaningful vectors, including emojis!
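To make the idea of comparing and combining word vectors concrete, here is a minimal sketch in plain Python. The vectors are hand-picked toy values chosen only for illustration, not embeddings trained by word2vec, and the helper names (`cosine_similarity`, `nearest`) are hypothetical rather than part of any library.

```python
import math

# Toy, hand-picked 3-dimensional "embeddings" (an assumption for illustration;
# real word2vec vectors are learned from text and usually have 100+ dimensions).
embeddings = {
    "king":  [0.9, 0.8, 0.1],
    "queen": [0.9, 0.1, 0.8],
    "man":   [0.1, 0.9, 0.1],
    "woman": [0.1, 0.1, 0.9],
    "apple": [0.5, 0.5, 0.5],
}


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nearest(vector, exclude=()):
    """Return the vocabulary word whose vector is most similar to `vector`."""
    candidates = (word for word in embeddings if word not in exclude)
    return max(candidates, key=lambda w: cosine_similarity(vector, embeddings[w]))


# Word arithmetic: king - man + woman, computed element by element.
target = [
    k - m + w
    for k, m, w in zip(embeddings["king"], embeddings["man"], embeddings["woman"])
]

# Exclude the query words themselves, as is common for analogy lookups.
analogy = nearest(target, exclude=("king", "man", "woman"))
print(analogy)
```

With trained embeddings, the same two operations (a similarity measure and element-wise addition or subtraction) are what make neighbor lookups and analogy-style queries possible.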
Dr. Brian Spiering is a faculty member at GalvanizeU, which offers a Master of Science in Data Science. His passions are natural language processing (NLP), deep learning, and building data products. He is active in the San Francisco data science community through volunteering and mentoring.
Drop him a line at firstname.lastname@example.org
Presented at SF Python meetup
Disclaimer: These are interactive notebooks that are meant to be run. Some elements might not render correctly on static GitHub pages.