Notebooks and sample data from my Python for Linguists talk at UTA (Feb 14th, 2018).
Python for Linguists.ipynb
Python for Linguists.pdf
Python for Linguists.tex
glen carrig.txt

PLEASE NOTE: I am no longer keeping these GitHub projects up to date. All changes and updates will be made on GitLab in the future:

Re-using these materials

The content of the notebook is released under the Creative Commons Attribution-ShareAlike license. The code may be used independently of the rest of the notebook, and is licensed under the BSD 3-clause license.


Notebooks and sample data from my Python for Linguists talk at UTA (Feb 16th, 2018).

The talk was given directly from the .ipynb file, which has since been lightly touched up. The .tex file was generated automatically from the .ipynb file using nbconvert, then compiled into the .pdf file that is also in this repository.
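For anyone who wants to regenerate the .tex file, here is a minimal sketch using nbconvert's Python API. It is not necessarily how the file in this repository was produced (the command-line tool does the same job), and the function names `tex_path_for` and `notebook_to_latex` are made up for illustration.

```python
from pathlib import Path


def tex_path_for(notebook_path):
    """Return the .tex path that sits next to the given notebook."""
    return Path(notebook_path).with_suffix(".tex")


def notebook_to_latex(notebook_path):
    """Convert a notebook to LaTeX and write the result next to it."""
    # Imported here so tex_path_for() works even without nbconvert installed.
    from nbconvert import LatexExporter

    # from_filename returns the LaTeX source plus a resources dictionary.
    body, _resources = LatexExporter().from_filename(str(notebook_path))
    out_path = tex_path_for(notebook_path)
    out_path.write_text(body, encoding="utf-8")
    return out_path
```

Calling `notebook_to_latex("Python for Linguists.ipynb")` should write `Python for Linguists.tex` to the same folder. This sketch only writes the LaTeX source; it does not save any figures the notebook produces, and compiling to PDF still requires a LaTeX installation.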

Running this notebook

To run this notebook, you will need to install the following Python libraries:

  • Jupyter (to run/view the notebook itself)
  • Gensim
  • spaCy
    • You'll need to download two of spaCy's language models: en_core_web_sm and en_core_web_lg. Installation instructions are here.
  • NumPy
  • Scikit-learn (installed as scikit-learn, imported as sklearn)
  • Matplotlib
  • Natural Language Toolkit (NLTK)
  • tqdm
  • pandas
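Before opening the notebook, it can help to confirm that everything in the list above is importable. The following is a rough sketch using only the standard library. The module names are the usual import names for these packages (an assumption on my part, not something taken from the notebook), and `missing_modules` is a hypothetical helper.

```python
from importlib.util import find_spec

# Import names (not install names) for the libraries listed above.
# Assumption: jupyter_core is present whenever Jupyter is installed.
REQUIRED_MODULES = [
    "jupyter_core",
    "gensim",
    "spacy",
    "numpy",
    "sklearn",
    "matplotlib",
    "nltk",
    "tqdm",
    "pandas",
]

# spaCy models are installed as ordinary Python packages.
SPACY_MODELS = ["en_core_web_sm", "en_core_web_lg"]


def missing_modules(names):
    """Return the names that cannot be located for import."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES + SPACY_MODELS)
    if missing:
        print("Not installed:", ", ".join(missing))
    else:
        print("All required libraries and spaCy models were found.")
```

This only checks that each package can be found; it does not import them or check version numbers.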

Open Jupyter, navigate to this .ipynb file, and open it. Always run the "Setup Code" section first; after that, every major section is designed to run independently of the others.

For the first part of the talk, you'll also need to download glen carrig.txt from the GitHub repository and place it in the same folder as this notebook.

For the second part of the talk, you'll need to download the Blog Authorship Corpus and unzip its files into a folder named "blogs" (case-sensitive) in the same folder as this notebook.
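The file layout described above can be checked with a short script before running the notebook. This is only a sketch: `check_sample_data` is a hypothetical helper, and it assumes that the unzipped corpus files sit somewhere inside the "blogs" folder.

```python
from pathlib import Path


def check_sample_data(notebook_dir="."):
    """Return a list of problems with the sample data layout (empty if fine)."""
    base = Path(notebook_dir)
    if not base.is_dir():
        return [f"{base} is not a folder"]

    problems = []

    # Needed for the first part of the talk.
    if not (base / "glen carrig.txt").is_file():
        problems.append('Missing file "glen carrig.txt"')

    # Needed for the second part. Comparing against the listed names keeps
    # the check case-sensitive even on case-insensitive file systems.
    folder_names = {p.name for p in base.iterdir() if p.is_dir()}
    if "blogs" not in folder_names:
        problems.append('Missing folder "blogs" (name is case-sensitive)')
    elif not any((base / "blogs").iterdir()):
        problems.append('Folder "blogs" is empty')

    return problems


if __name__ == "__main__":
    for problem in check_sample_data():
        print(problem)
```

Run it from the folder that contains the notebook. If it prints nothing, both data sources are where the notebook expects them.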