PLEASE NOTE: I am no longer keeping these GitHub projects up to date. All changes and updates will be made on GitLab in the future:

Re-using these materials

The content of the notebook is released under the Creative Commons Attribution-ShareAlike license. The code may be used independently of the rest of the notebook, and is licensed under the BSD 3-clause license.


Notebooks and sample data from my Python for Linguists talk at UTA (Feb 16th, 2018).

The talk was given directly from the .ipynb file, which has since been lightly touched up. The .tex file was generated automatically from the .ipynb file using nbconvert and compiled into the .pdf file, which is also in this repository.

Running this notebook

To run this notebook, you will need to install the following Python libraries:

  • Jupyter (to run/view the notebook itself)
  • Gensim
  • spaCy
    • You'll need to download two of spaCy's language models: en_core_web_sm and en_core_web_lg. Installation instructions are here.
  • Numpy
  • Scikit-learn (installed as scikit-learn, imported as sklearn)
  • Matplotlib
  • Natural Language Toolkit (NLTK)
  • tqdm
  • Pandas
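Before opening the notebook, it can help to confirm that everything in the list above is importable. The following is a minimal sketch, not part of the original notebook: the helper name `missing_modules` is hypothetical, and the import names are assumptions about how each library is exposed (for example, `jupyter_core` stands in for Jupyter, and the two spaCy models are assumed to be installed as importable packages under their own names).

```python
from importlib.util import find_spec

# Import names (not necessarily the install names) for the libraries above.
# Assumption: "jupyter_core" is present whenever Jupyter is installed, and the
# spaCy models are installed as packages named after the models.
REQUIRED_MODULES = [
    "jupyter_core",
    "gensim",
    "spacy",
    "numpy",
    "sklearn",
    "matplotlib",
    "nltk",
    "tqdm",
    "pandas",
    "en_core_web_sm",
    "en_core_web_lg",
]


def missing_modules(names):
    """Return the names from `names` that cannot be found for import."""
    return [name for name in names if find_spec(name) is None]


missing = missing_modules(REQUIRED_MODULES)
if missing:
    print("Not installed:", ", ".join(missing))
else:
    print("All required libraries were found.")
```

This only checks that each module can be located; it does not import anything or verify versions.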

Open Jupyter, navigate to this .ipynb file, and open it. Each major section is designed to run independently of the others, except that the "Setup Code" section should always be run first.

For the first part of the talk, you'll also need to download glen carrig.txt from the GitHub repository and have it in the same folder as this notebook.

For the second part of the talk, you'll need to download the Blog Authorship Corpus and unzip its files into a folder named "blogs" (case-sensitive) in the same folder as this notebook.
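The data layout described above can be checked with a short script before running either part of the talk. This is a sketch under the assumption that it is run from the folder containing the notebook; the function name `check_talk_data` is hypothetical and does not come from the notebook itself.

```python
from pathlib import Path


def check_talk_data(notebook_dir="."):
    """Report whether the data for each part of the talk is in place."""
    base = Path(notebook_dir)
    # Compare against the actual directory entries so that the "blogs" check
    # stays case-sensitive even on case-insensitive filesystems.
    subfolders = {p.name for p in base.iterdir() if p.is_dir()}
    return {
        "glen carrig.txt": (base / "glen carrig.txt").is_file(),
        "blogs": "blogs" in subfolders,
    }


for name, found in check_talk_data().items():
    print(f"{name}: {'found' if found else 'MISSING'}")
```

The check only confirms that the file and folder exist; it does not inspect the contents of the corpus.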





