NLP_ML_Visualization_Tutorial

This is a step-by-step tutorial for text analysts who want an easy start to basic and common techniques in NLP, Text Analysis, Machine Learning, Topic Modelling and Corpus Linguistics. The tutorial is part of the "Visualise My Corpus" UCREL and DSG Seminar and Tutorial, as well as the "Data Visualisation Workshop for Critical Computational Discourse" at the Data Science Institute at Lancaster University, UK.

Arabic Version of this Tutorial:

You can find the Arabic-customised version of this tutorial here: https://github.com/drelhaj/NLP_ML_Visualization_Tutorial/tree/master/Arabic_Tutorial

Author and Presenter:

Dr Mahmoud El-Haj https://www.lancaster.ac.uk/staff/elhaj

Presentation Slides:

If you have attended the 'Visualise My Corpus' talk before, here are the introductory slides: https://www.lancaster.ac.uk/staff/elhaj/docs/visualise_my%20_corpus.pdf

Presentation YouTube Video:

A step-by-step presentation of the tutorials: https://youtu.be/g6tUQxIVesA

Tutorials

The repository is made up of six tutorials, as follows:

  • 1- Visualisation using spaCy: a basic introduction to using spaCy to visualise part-of-speech tagging and named entity recognition.
  • 2- Topic Modelling: using LDA and LDAvis to display an interactive topic model.
  • 3- Word Clouds: an introduction to creating word clouds, starting from basic word frequency and moving towards clouds that focus on particular part-of-speech tags.
  • 4- Machine Learning: a basic introduction to SVM and Naive Bayes. This is a simple classifier, and the results are shown in a confusion matrix.
  • 5- Word Usage: showing word usage in terms of frequency over a period of time.
  • 6- Word Embeddings: a gentle start to word embeddings using gensim, visualising the vectors using t-SNE and PCA.
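
As a rough illustration of the idea behind tutorials 3 and 5, the sketch below counts word frequencies using only the Python standard library. It is not taken from the notebooks: the function names (`tokenize`, `word_frequencies`, `usage_over_time`) and the sample documents are hypothetical, and the notebooks themselves may use other libraries and a more careful tokeniser.

```python
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and return its runs of ASCII letters as tokens."""
    return re.findall(r"[a-z]+", text.lower())


def word_frequencies(texts):
    """Count how often each token occurs across an iterable of texts."""
    counts = Counter()
    for text in texts:
        counts.update(tokenize(text))
    return counts


def usage_over_time(dated_texts, word):
    """Return {year: occurrences of `word`} for (year, text) pairs."""
    target = word.lower()
    per_year = defaultdict(int)
    for year, text in dated_texts:
        per_year[year] += tokenize(text).count(target)
    return dict(sorted(per_year.items()))


# Hypothetical sample data, invented for this example only.
corpus = [
    (2019, "The corpus is small. The corpus is clean."),
    (2020, "A larger corpus gives better topics."),
    (2021, "Word clouds show frequent words."),
]

freqs = word_frequencies(text for _, text in corpus)
print(freqs.most_common(3))              # the most frequent tokens overall
print(usage_over_time(corpus, "corpus"))  # per-year counts for one word
```

Counts like these are the raw input a word cloud scales its font sizes by, and the per-year dictionary is the kind of series a word usage plot draws over time.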

Installation

You need Jupyter (https://jupyter.org/) to run the notebooks. See 0_Visualisation_Setup.ipynb for the required Python packages: https://github.com/drelhaj/NLP_ML_Visualization_Tutorial/blob/master/0_Visualisation_Setup.ipynb
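
Before opening the notebooks, it can help to check which packages are already importable. The sketch below is one possible way to do that with the standard library. The package list is an assumption based on the libraries named in this README, and the helper `missing_packages` is hypothetical; the authoritative list is in 0_Visualisation_Setup.ipynb.

```python
import importlib.util

# Assumed import names, guessed from the libraries this README mentions.
# The setup notebook is the authoritative source for what is required.
ASSUMED_PACKAGES = ["spacy", "gensim", "sklearn", "wordcloud", "pyLDAvis", "matplotlib"]


def missing_packages(names):
    """Return the names that cannot be found as importable top-level modules."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_packages(ASSUMED_PACKAGES)
if missing:
    print("Not installed:", ", ".join(missing))
else:
    print("All listed packages are importable.")
```

This only reports whether a module can be found; it does not check versions or download any language models the notebooks may need.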
