PLEASE NOTE: I am no longer keeping these GitHub projects up to date. All changes and updates will be made on GitLab in the future: https://gitlab.com/andersonh/PythonForLinguistsTalk

Re-using these materials

The content of the notebook is released under the Creative Commons Attribution-ShareAlike license. The code may be used independently of the rest of the notebook, and is licensed under the BSD 3-clause license.

PythonForLinguistsTalk

Notebooks and sample data from my Python for Linguists talk at UTA (Feb 16th, 2018).

The .ipynb file is what the talk was given directly from, with a bit of touching up. The .tex file was automatically generated from the .ipynb file using nbconvert, and compiled into the .pdf file also in this repository.

Running this notebook

To run this notebook, you will need to install the following Python libraries:

  • Jupyter (to run/view the notebook itself)
  • Gensim
  • spaCy
    • You'll need to download two of spaCy's language models: en_core_web_sm and en_core_web_lg. Installation instructions are here.
  • Numpy
  • Scikit-learn (installed as scikit-learn, imported as sklearn)
  • Matplotlib
  • Natural Language Toolkit (NLTK)
  • tqdm
  • Pandas
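
Before opening the notebook, it can help to confirm that everything in the list above is importable. The following is a minimal sketch, not part of the original talk materials: the helper name `missing_modules` is hypothetical, and the import names are assumptions based on how each library is usually imported (for example, Scikit-learn imports as `sklearn`, and a Jupyter installation is detected here through its `jupyter_core` package). spaCy models are installed as ordinary Python packages, so they can be checked the same way.

```python
from importlib.util import find_spec

# Import names assumed for the libraries in the list above.
# "jupyter_core" is used as a stand-in for "Jupyter is installed".
REQUIRED_MODULES = [
    "jupyter_core", "gensim", "spacy", "numpy", "sklearn",
    "matplotlib", "nltk", "tqdm", "pandas",
]

# The two spaCy language models the notebook asks for.
SPACY_MODELS = ["en_core_web_sm", "en_core_web_lg"]


def missing_modules(names):
    """Return the entries of `names` that cannot be found as importable modules."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES + SPACY_MODELS)
    if missing:
        print("Not installed:", ", ".join(missing))
    else:
        print("All required libraries and spaCy models were found.")
```

Anything reported as missing can then be installed with your usual package manager; the spaCy models have their own download step, described in spaCy's installation instructions.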

Open Jupyter and navigate to this .ipynb file, then open it. Every major section is designed to be able to run independently, minus the "Setup Code" section, which should always be run first.

For the first part of the talk, you'll also need to download glen carrig.txt from the GitHub repository and place it in the same folder as this notebook.

For the second part of the talk, you'll need to download the Blog Authorship Corpus and unzip its files into a folder named "blogs" (case-sensitive) in the same folder as this notebook.
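
The two data requirements above can be checked with a short script. This is only a sketch under stated assumptions: the function name `missing_data` is hypothetical, and it assumes the script is run from (or pointed at) the folder that contains the notebook. It does not inspect the contents of the corpus files, only that the "blogs" folder exists and is not empty.

```python
from pathlib import Path


def missing_data(notebook_dir="."):
    """Return a list of messages describing which sample data is missing."""
    base = Path(notebook_dir)
    problems = []

    # Part one of the talk reads this text file.
    if not (base / "glen carrig.txt").is_file():
        problems.append("glen carrig.txt was not found next to the notebook")

    # Part two expects the unzipped corpus in a folder named exactly "blogs".
    # Note: on case-insensitive file systems, "Blogs" would also pass this check.
    blogs = base / "blogs"
    if not blogs.is_dir():
        problems.append('the "blogs" folder was not found next to the notebook')
    elif not any(blogs.iterdir()):
        problems.append('the "blogs" folder is empty')

    return problems


if __name__ == "__main__":
    for message in missing_data() or ["All sample data was found."]:
        print(message)
```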

About

Notebooks and sample data from my Python for Linguists talk at UTA (Feb 14th, 2018).
