Natural Language Processing From Scratch
data
slides
.gitignore
1. Text Representation.ipynb
2. Topic Modeling.ipynb
3. Sentiment Analysis.ipynb
4. Applications.ipynb
LICENSE
README.md
d4sci.mplstyle

Code and slides to accompany the online webinar series by Data For Science: https://data4sci.com/nlp

The rise of online social platforms has resulted in an explosion of written text in the form of blogs, posts, tweets, wiki pages, and more. This new wealth of data provides a unique opportunity to explore natural language in its many forms, both as a way of automatically extracting information from written text and as a way of artificially producing text that looks natural.

In this class we introduce viewers to natural language processing from scratch. Each concept is introduced and explained through coding examples that use nothing more than plain Python and NumPy. In this way, attendees learn the underlying concepts and techniques in depth instead of just learning how to use a specific NLP library.

Schedule

The timeframes are only estimates and may vary according to how the class is progressing.

1. Text Representation (50m)

  • Represent words and numbers
  • Use One-Hot Encoding
  • Implement Bag of Words
  • Apply stopwords
  • Understand TF-IDF
  • Understand stemming
  • Break 10m
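
As a rough illustration of how stopword removal, bag of words, and TF-IDF fit together, here is a minimal sketch in plain Python. The toy documents, the stopword list, and the function names are assumptions made for this example and are not taken from the notebooks; the weighting used is one common TF-IDF variant (relative term frequency times the log of inverse document frequency).

```python
import math
from collections import Counter

# Toy corpus and stopword list (both assumed for illustration only)
docs = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "cats and dogs",
]
stopwords = {"the", "on", "and"}

def tokenize(text):
    # Lowercase, split on whitespace, and drop stopwords
    return [w for w in text.lower().split() if w not in stopwords]

def bag_of_words(text):
    # Map each remaining word to the number of times it occurs
    return Counter(tokenize(text))

def tf_idf(docs):
    bags = [bag_of_words(d) for d in docs]
    n_docs = len(bags)
    # Document frequency: in how many documents does each word appear?
    doc_freq = Counter(word for bag in bags for word in bag)
    weights = []
    for bag in bags:
        total = sum(bag.values())
        weights.append({
            word: (count / total) * math.log(n_docs / doc_freq[word])
            for word, count in bag.items()
        })
    return weights

weights = tf_idf(docs)
```

In this sketch a word that appears in fewer documents (such as "cat") receives a higher weight than one shared across documents (such as "sat").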

2. Topic Modeling (60m)

  • Find topics in documents
  • Perform Explicit Semantic Analysis
  • Understand document clustering
  • Implement Latent Semantic Analysis
  • Implement Non-negative Matrix Factorization
  • Break 10m
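
To give a sense of what Latent Semantic Analysis involves, the sketch below applies a truncated singular value decomposition to a small term-document matrix using NumPy. The matrix, the term labels, and the helper names are made up for this example; the notebooks may structure things differently.

```python
import numpy as np

# Toy term-document counts (assumed): rows are terms, columns are documents
terms = ["ship", "boat", "ocean", "vote", "election"]
X = np.array([
    [1, 0, 1, 0, 0],
    [0, 1, 0, 0, 0],
    [1, 1, 0, 0, 0],
    [0, 0, 0, 1, 1],
    [0, 0, 0, 1, 0],
], dtype=float)

def lsa(X, k):
    # Keep only the k largest singular values and their vectors
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U[:, :k], s[:k], Vt[:k, :]

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

U_k, s_k, Vt_k = lsa(X, k=2)

# One k-dimensional vector per document in the latent "topic" space
doc_vectors = (np.diag(s_k) @ Vt_k).T
```

Documents that share vocabulary (the first three columns here) end up pointing in similar directions in the reduced space, while documents about the other theme are close to orthogonal to them.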

3. Sentiment Analysis (40m)

  • Quantify words and feelings
  • Use negations and modifiers
  • Understand corpus-based approaches
  • Break 10m
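
A lexicon-based scorer is one simple way to see how word valences, negations, and modifiers interact. The sketch below is only illustrative: the lexicon values, the modifier weights, and the rule that a negation or modifier applies to the next sentiment word are all assumptions made for this example.

```python
# Hypothetical toy lexicons (values chosen for illustration only)
valence = {"good": 1.0, "great": 2.0, "bad": -1.0, "terrible": -2.0}
modifiers = {"very": 1.5, "slightly": 0.5}
negations = {"not", "never", "no"}

def sentiment(text):
    score = 0.0
    scale = 1.0  # pending effect of negations and modifiers
    for word in text.lower().split():
        if word in negations:
            scale *= -1.0              # flip the sign of the next sentiment word
        elif word in modifiers:
            scale *= modifiers[word]   # amplify or dampen it
        elif word in valence:
            score += scale * valence[word]
            scale = 1.0                # the pending effect is used up
    return score
```

For instance, `sentiment("not very bad")` flips and amplifies the valence of "bad". A simplification to keep in mind: a negation here carries over any number of neutral words until the next sentiment word, which a more careful implementation would limit.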

4. Applications (70m)

  • Understand Word2vec word embeddings
  • Define GloVe
  • Apply language detection
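
Language detection can be approximated in a few lines by counting how many words of a text appear in each language's stopword list. This is one common baseline and not necessarily the approach taken in the webinar; the tiny stopword samples and the function name below are assumptions for illustration.

```python
# Tiny hand-picked stopword samples (assumed; real lists are much longer)
stopwords = {
    "english": {"the", "and", "is", "of", "to", "in"},
    "spanish": {"el", "la", "y", "es", "de", "en"},
    "french": {"le", "la", "et", "est", "de", "dans"},
}

def detect_language(text):
    words = text.lower().split()
    # Score each language by the number of words found in its stopword list
    scores = {
        lang: sum(word in known for word in words)
        for lang, known in stopwords.items()
    }
    # Pick the language with the highest overlap
    return max(scores, key=scores.get)
```

With lists this small, short texts or texts without any listed stopword are resolved arbitrarily (the first language wins ties), so the sketch only makes sense for sentences containing a few common function words.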

Slides: http://data4sci.com/landing/nlp/
