add init from prev lecture

accenture-ops-ai-kyle committed Mar 16, 2017
1 parent a4c4c32 commit fccb570c6eb53aa1b11c338bbef8df2dec9a444e
Showing with 2,175 additions and 0 deletions.
  1. +164 −0 1 CLTK Setup.ipynb
  2. +212 −0 2 Import corpora.ipynb
  3. +892 −0 3 Basic NLP.ipynb
  4. +206 −0 4 Lemmatization.ipynb
  5. +214 −0 5 Text reuse.ipynb
  6. +51 −0 6 N-grams.ipynb
  7. +278 −0 7 Syllabification, prosody, phonetics.ipynb
  8. +148 −0 8 Part-of-speech tagging.ipynb
  9. +10 −0 README.md
@@ -0,0 +1,164 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "deletable": true,
+ "editable": true
+ },
+ "source": [
+ "# Join the Slack channel\n",
+ "\n",
+ "Send your email address to `kyle@kyle-p-johnson.com` to be added to this channel. Other instructions will be sent via this route."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "deletable": true,
+ "editable": true
+ },
+ "source": [
+ "# Install Python\n",
+ "\n",
+ "## Mac\n",
+ "\n",
+ "Download the installer from <https://www.python.org/downloads/> (the current version is 3.5.2).\n",
+ "\n",
+ "\n",
+ "## Linux\n",
+ "\n",
+ "Open Terminal and check the installed version with `python --version` or `python3 --version`. If it is 3.4 or 3.5, you're fine. If your Python version is out of date, run these commands:\n",
+ "\n",
+ "``` bash\n",
+ "$ curl -O https://raw.githubusercontent.com/kylepjohnson/python3_bootstrap/master/install.sh\n",
+ "\n",
+ "$ chmod +x install.sh\n",
+ "\n",
+ "$ ./install.sh\n",
+ "```\n",
+ "\n",
+ "This builds Python from source on Linux and takes about 5 minutes."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "deletable": true,
+ "editable": true
+ },
+ "source": [
+ "# Install Git\n",
+ "\n",
+ "CLTK uses Git for corpus management. On Mac, install it from <https://git-scm.com/downloads>. On Linux, check whether it is already present (`git --version`); if not, install it with your package manager (e.g., `apt-get install git`)."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "deletable": true,
+ "editable": true
+ },
+ "source": [
+ "# Make virtual environment\n",
+ "\n",
+ "This makes a special environment (a \"sandbox\") just for the CLTK. If something goes wrong, you can just delete it and start again.\n",
+ "\n",
+ "``` bash\n",
+ "$ cd ~/\n",
+ "$ mkdir cltk\n",
+ "$ cd cltk\n",
+ "$ python3 -m venv venv\n",
+ "$ source venv/bin/activate\n",
+ "```\n",
+ "\n",
+ "Now you can confirm that you're using this environment's Python rather than your system Python:\n",
+ "\n",
+ "``` bash\n",
+ "$ which python\n",
+ "```\n",
+ "\n",
+ "Note that every time you open a new Terminal window, you'll need to \"activate\" this environment with `source ~/cltk/venv/bin/activate`."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "deletable": true,
+ "editable": true
+ },
+ "source": [
+ "# Install CLTK\n",
+ "\n",
+ "``` bash\n",
+ "$ pip install cltk\n",
+ "```\n",
+ "\n",
+ "This will take a few minutes because it also installs several \"dependencies\": other Python libraries that the CLTK uses.\n",
+ "\n",
+ "Also install Jupyter, which is a really handy way of writing code.\n",
+ "\n",
+ "``` bash\n",
+ "$ pip install jupyter\n",
+ "```"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "deletable": true,
+ "editable": true
+ },
+ "source": [
+ "# Test Jupyter\n",
+ "\n",
+ "Launch a notebook (such as this one) from the Terminal with `jupyter notebook`. Then open your preferred browser to <http://localhost:8888>."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "deletable": true,
+ "editable": true
+ },
+ "source": [
+ "# Download these tutorials\n",
+ "\n",
+ "You can find these notebooks, now or later, at <https://github.com/kylepjohnson/notebooks/tree/master/public_talks/2016_12_08_harvard_classics>."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "deletable": true,
+ "editable": true
+ },
+ "source": [
+ "# Join GitHub\n",
+ "\n",
+ "GitHub is a nice way to share code. Sign up later, then come visit us at <https://github.com/cltk/cltk/>."
+ ]
+ }
+ ],
+ "metadata": {
+ "kernelspec": {
+ "display_name": "Python 3",
+ "language": "python",
+ "name": "python3"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.6.0"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 1
+}
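The setup steps in this notebook (Python 3.4 or newer, Git on the path, an activated virtual environment) can be checked from Python itself. The following is only a rough sketch using the standard library; `check_setup` is a hypothetical helper name, not part of the CLTK, and the virtual-environment test assumes the environment was made with `pyvenv` or `python3 -m venv`.

```python
import shutil
import sys


def check_setup(min_version=(3, 4)):
    """Return simple True/False checks for the setup steps (hypothetical helper)."""
    return {
        # Python 3.4 or newer, as the notebook requires
        "python_ok": sys.version_info[:2] >= min_version,
        # Inside a venv, sys.prefix differs from sys.base_prefix
        "in_venv": sys.prefix != getattr(sys, "base_prefix", sys.prefix),
        # Git must be on the PATH for CLTK corpus management
        "git_found": shutil.which("git") is not None,
    }


for name, passed in sorted(check_setup().items()):
    print(name, "OK" if passed else "NOT OK")
```

If `in_venv` reports `NOT OK`, the likely cause is that the environment has not been activated in the current Terminal window.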
@@ -0,0 +1,212 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "deletable": true,
+ "editable": true
+ },
+ "source": [
+ "The CLTK has a distributed infrastructure that lets you download official CLTK corpora as well as corpora shared by others. For full docs, see <http://docs.cltk.org/en/latest/importing_corpora.html>.\n",
+ "\n",
+ "To get started, from the Terminal, open a new Jupyter notebook from within your `~/cltk` directory (see notebook 1 for instructions): `jupyter notebook`. Then go to <http://localhost:8888>."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "deletable": true,
+ "editable": true
+ },
+ "source": [
+ "# See what corpora are available\n",
+ "\n",
+ "First we need to \"import\" the right part of the CLTK library. Think of this as pulling just the book you need off the shelf and having it ready to read."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 1,
+ "metadata": {
+ "collapsed": true,
+ "deletable": true,
+ "editable": true
+ },
+ "outputs": [],
+ "source": [
+ "# Import the CorpusImporter class from the CLTK library\n",
+ "from cltk.corpus.utils.importer import CorpusImporter"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 2,
+ "metadata": {
+ "collapsed": false,
+ "deletable": true,
+ "editable": true
+ },
+ "outputs": [],
+ "source": [
+ "# See https://github.com/cltk for all official corpora\n",
+ "\n",
+ "my_latin_downloader = CorpusImporter('latin')\n",
+ "\n",
+ "# 'my_latin_downloader' is the variable through which we now use the CorpusImporter"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 3,
+ "metadata": {
+ "collapsed": false,
+ "deletable": true,
+ "editable": true
+ },
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "['latin_text_perseus',\n",
+ " 'latin_treebank_perseus',\n",
+ " 'latin_treebank_perseus',\n",
+ " 'latin_text_latin_library',\n",
+ " 'phi5',\n",
+ " 'phi7',\n",
+ " 'latin_proper_names_cltk',\n",
+ " 'latin_models_cltk',\n",
+ " 'latin_pos_lemmata_cltk',\n",
+ " 'latin_treebank_index_thomisticus',\n",
+ " 'latin_lexica_perseus',\n",
+ " 'latin_training_set_sentence_cltk',\n",
+ " 'latin_word2vec_cltk',\n",
+ " 'latin_text_antique_digiliblt',\n",
+ " 'latin_text_corpus_grammaticorum_latinorum']"
+ ]
+ },
+ "execution_count": 3,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "my_latin_downloader.list_corpora"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "deletable": true,
+ "editable": true
+ },
+ "source": [
+ "# Import several corpora"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 4,
+ "metadata": {
+ "collapsed": true,
+ "deletable": true,
+ "editable": true
+ },
+ "outputs": [],
+ "source": [
+ "my_latin_downloader.import_corpus('latin_text_latin_library')\n",
+ "my_latin_downloader.import_corpus('latin_models_cltk')"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "deletable": true,
+ "editable": true
+ },
+ "source": [
+ "You can verify in the Terminal that the files were downloaded: `ls -l ~/cltk_data/latin/text/latin_text_latin_library/`"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 5,
+ "metadata": {
+ "collapsed": false,
+ "deletable": true,
+ "editable": true
+ },
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "['greek_software_tlgu',\n",
+ " 'greek_text_perseus',\n",
+ " 'phi7',\n",
+ " 'tlg',\n",
+ " 'greek_proper_names_cltk',\n",
+ " 'greek_models_cltk',\n",
+ " 'greek_treebank_perseus',\n",
+ " 'greek_lexica_perseus',\n",
+ " 'greek_training_set_sentence_cltk',\n",
+ " 'greek_word2vec_cltk',\n",
+ " 'greek_text_lacus_curtius']"
+ ]
+ },
+ "execution_count": 5,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "# Let's get a Greek corpus, too\n",
+ "\n",
+ "my_greek_downloader = CorpusImporter('greek')\n",
+ "my_greek_downloader.list_corpora"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 6,
+ "metadata": {
+ "collapsed": false,
+ "deletable": true,
+ "editable": true
+ },
+ "outputs": [],
+ "source": [
+ "my_greek_downloader.import_corpus('greek_text_lacus_curtius')"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "deletable": true,
+ "editable": true
+ },
+ "source": [
+ "Likewise, verify with `ls -l ~/cltk_data/greek/text/greek_text_lacus_curtius/plain/`"
+ ]
+ }
+ ],
+ "metadata": {
+ "kernelspec": {
+ "display_name": "Python 3",
+ "language": "python",
+ "name": "python3"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.6.0"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 1
+}
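The `ls -l` checks in this notebook can also be done from Python. This is a minimal sketch that relies only on the two `~/cltk_data` paths mentioned in the notebook; `count_corpus_files` is a hypothetical helper, not a CLTK function, and it assumes any `.git` directory inside a corpus should be ignored.

```python
import os
from pathlib import Path


def count_corpus_files(corpus_dir):
    """Count regular files under corpus_dir, skipping .git; 0 if the directory is missing."""
    root = Path(os.path.expanduser(str(corpus_dir)))
    if not root.is_dir():
        return 0
    return sum(
        1
        for path in root.rglob("*")
        if path.is_file() and ".git" not in path.relative_to(root).parts
    )


# Paths taken from the notebook's own verification commands
for corpus in (
    "~/cltk_data/latin/text/latin_text_latin_library/",
    "~/cltk_data/greek/text/greek_text_lacus_curtius/plain/",
):
    print(corpus, count_corpus_files(corpus))
```

A count of 0 suggests the corresponding `import_corpus` call has not run yet or did not finish.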