Commit
Kyle Johnson committed Mar 16, 2017
1 parent a4c4c32, commit fccb570
Showing 9 changed files with 2,175 additions and 0 deletions.
Original file line number | Original file line | Diff line number | Diff line change |
---|---|---|---|
@@ -0,0 +1,164 @@ | |||
{ | |||
"cells": [ | |||
{ | |||
"cell_type": "markdown", | |||
"metadata": { | |||
"deletable": true, | |||
"editable": true | |||
}, | |||
"source": [ | |||
"# Join the Slack channel\n", | |||
"\n", | |||
"Send your email address to `kyle@kyle-p-johnson.com` and Kyle will add you to this channel. Other instructions will be sent via this route." | |||
] | |||
}, | |||
{ | |||
"cell_type": "markdown", | |||
"metadata": { | |||
"deletable": true, | |||
"editable": true | |||
}, | |||
"source": [ | |||
"# Install Python\n", | |||
"\n", | |||
"## Mac\n", | |||
"\n", | |||
"Download the installer from <https://www.python.org/downloads/> (currently 3.5.2).\n", | |||
"\n", | |||
"\n", | |||
"## Linux\n", | |||
"\n", | |||
"Open Terminal and check the current version with `python --version` or `python3 --version`. If it reports 3.4 or 3.5, you're fine. If your Python version is older, run these:\n", | |||
"\n", | |||
"``` bash\n", | |||
"$ curl -O https://raw.githubusercontent.com/kylepjohnson/python3_bootstrap/master/install.sh\n", | |||
"\n", | |||
"$ chmod +x install.sh\n", | |||
"\n", | |||
"$ ./install.sh\n", | |||
"```\n", | |||
"\n", | |||
"This builds Python from source and takes about 5 minutes." | |||
] | |||
}, | |||
{ | |||
"cell_type": "markdown", | |||
"metadata": { | |||
"deletable": true, | |||
"editable": true | |||
}, | |||
"source": [ | |||
"# Install Git\n", | |||
"\n", | |||
"CLTK uses Git for corpus management. For Mac, install it from here: <https://git-scm.com/downloads>. For Linux, check if present (`git --version`); if not then use your package manager to get it (e.g., `apt-get install git`)." | |||
] | |||
}, | |||
{ | |||
"cell_type": "markdown", | |||
"metadata": { | |||
"deletable": true, | |||
"editable": true | |||
}, | |||
"source": [ | |||
"# Make virtual environment\n", | |||
"\n", | |||
"This makes a special environment (a \"sandbox\") just for the CLTK. If something goes wrong, you can just delete it and start again.\n", | |||
"\n", | |||
"``` bash\n", | |||
"$ cd ~/\n", | |||
"$ mkdir cltk\n", | |||
"$ cd cltk\n", | |||
"$ python3 -m venv venv\n", | |||
"$ source venv/bin/activate\n", | |||
"```\n", | |||
"\n", | |||
"Now you can see that you're not using your system Python but this particular one:\n", | |||
"\n", | |||
"``` bash\n", | |||
"$ which python\n", | |||
"```\n", | |||
"\n", | |||
"Note that every time you open a new Terminal window, you'll need to \"activate\" this environment with `source ~/cltk/venv/bin/activate`." | |||
] | |||
}, | |||
{ | |||
"cell_type": "markdown", | |||
"metadata": { | |||
"deletable": true, | |||
"editable": true | |||
}, | |||
"source": [ | |||
"# Install CLTK\n", | |||
"\n", | |||
"``` bash\n", | |||
"$ pip install cltk\n", | |||
"```\n", | |||
"\n", | |||
"This will take a few minutes, as it also installs several \"dependencies\": other Python libraries that the CLTK uses.\n", | |||
"\n", | |||
"Also install Jupyter, which is a really handy way of writing code.\n", | |||
"\n", | |||
"``` bash\n", | |||
"$ pip install jupyter\n", | |||
"```" | |||
] | |||
}, | |||
{ | |||
"cell_type": "markdown", | |||
"metadata": { | |||
"deletable": true, | |||
"editable": true | |||
}, | |||
"source": [ | |||
"# Test Jupyter\n", | |||
"\n", | |||
"Launch a notebook (such as this one) from the Terminal with `jupyter notebook`. Then open your preferred browser to <http://localhost:8888>." | |||
] | |||
}, | |||
{ | |||
"cell_type": "markdown", | |||
"metadata": { | |||
"deletable": true, | |||
"editable": true | |||
}, | |||
"source": [ | |||
"# Download these tutorials\n", | |||
"\n", | |||
"Now or sometime later, you can find these instructions at <https://github.com/kylepjohnson/notebooks/tree/master/public_talks/2016_12_08_harvard_classics>." | |||
] | |||
}, | |||
{ | |||
"cell_type": "markdown", | |||
"metadata": { | |||
"deletable": true, | |||
"editable": true | |||
}, | |||
"source": [ | |||
"# Join GitHub\n", | |||
"\n", | |||
"GitHub is a nice way to share code. Sign up later, then come visit us at <https://github.com/cltk/cltk/>." | |||
] | |||
} | |||
], | |||
"metadata": { | |||
"kernelspec": { | |||
"display_name": "Python 3", | |||
"language": "python", | |||
"name": "python3" | |||
}, | |||
"language_info": { | |||
"codemirror_mode": { | |||
"name": "ipython", | |||
"version": 3 | |||
}, | |||
"file_extension": ".py", | |||
"mimetype": "text/x-python", | |||
"name": "python", | |||
"nbconvert_exporter": "python", | |||
"pygments_lexer": "ipython3", | |||
"version": "3.6.0" | |||
} | |||
}, | |||
"nbformat": 4, | |||
"nbformat_minor": 1 | |||
} |
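
The setup steps in the notebook above (checking the Python version, activating the virtual environment, confirming Git is present) could also be checked from Python itself. The following is a minimal sketch using only the standard library; the helper names `version_ok`, `in_virtualenv`, and `git_available` are hypothetical and are not part of the CLTK. It assumes the environment was created with the standard `venv` module.

```python
import shutil
import sys


def version_ok(version=None, minimum=(3, 4)):
    """Return True if `version` is at least `minimum` (major, minor)."""
    if version is None:
        version = sys.version_info
    return tuple(version[:2]) >= tuple(minimum)


def in_virtualenv():
    """Return True if this interpreter is running inside a venv."""
    # Inside a venv, sys.prefix points at the venv directory, while
    # sys.base_prefix still points at the Python it was created from.
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


def git_available():
    """Return True if a `git` executable is found on the PATH."""
    return shutil.which("git") is not None


if __name__ == "__main__":
    print("Python 3.4 or newer:", version_ok())
    print("Inside a virtual environment:", in_virtualenv())
    print("Git found:", git_available())
```

If the second line prints `False`, the environment has probably not been activated in the current Terminal window.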
Original file line number | Original file line | Diff line number | Diff line change |
---|---|---|---|
@@ -0,0 +1,212 @@ | |||
{ | |||
"cells": [ | |||
{ | |||
"cell_type": "markdown", | |||
"metadata": { | |||
"deletable": true, | |||
"editable": true | |||
}, | |||
"source": [ | |||
"The CLTK has a distributed infrastructure that lets you download official CLTK texts or other corpora shared by others. For full docs, see <http://docs.cltk.org/en/latest/importing_corpora.html>.\n", | |||
"\n", | |||
"To get started, from the Terminal, open a new Jupyter notebook from within your `~/cltk` directory (see notebook 1 for instructions): `jupyter notebook`. Then go to <http://localhost:8888>." | |||
] | |||
}, | |||
{ | |||
"cell_type": "markdown", | |||
"metadata": { | |||
"deletable": true, | |||
"editable": true | |||
}, | |||
"source": [ | |||
"# See what corpora are available\n", | |||
"\n", | |||
"First we need to \"import\" the right part of the CLTK library. Think of this as pulling just the book you need off the shelf and having it ready to read." | |||
] | |||
}, | |||
{ | |||
"cell_type": "code", | |||
"execution_count": 1, | |||
"metadata": { | |||
"collapsed": true, | |||
"deletable": true, | |||
"editable": true | |||
}, | |||
"outputs": [], | |||
"source": [ | |||
"# Import the CorpusImporter class from the CLTK library\n", | |||
"from cltk.corpus.utils.importer import CorpusImporter" | |||
] | |||
}, | |||
{ | |||
"cell_type": "code", | |||
"execution_count": 2, | |||
"metadata": { | |||
"collapsed": false, | |||
"deletable": true, | |||
"editable": true | |||
}, | |||
"outputs": [], | |||
"source": [ | |||
"# See https://github.com/cltk for all official corpora\n", | |||
"\n", | |||
"my_latin_downloader = CorpusImporter('latin')\n", | |||
"\n", | |||
"# 'my_latin_downloader' is the variable by which we now call the CorpusImporter" | |||
] | |||
}, | |||
{ | |||
"cell_type": "code", | |||
"execution_count": 3, | |||
"metadata": { | |||
"collapsed": false, | |||
"deletable": true, | |||
"editable": true | |||
}, | |||
"outputs": [ | |||
{ | |||
"data": { | |||
"text/plain": [ | |||
"['latin_text_perseus',\n", | |||
" 'latin_treebank_perseus',\n", | |||
" 'latin_treebank_perseus',\n", | |||
" 'latin_text_latin_library',\n", | |||
" 'phi5',\n", | |||
" 'phi7',\n", | |||
" 'latin_proper_names_cltk',\n", | |||
" 'latin_models_cltk',\n", | |||
" 'latin_pos_lemmata_cltk',\n", | |||
" 'latin_treebank_index_thomisticus',\n", | |||
" 'latin_lexica_perseus',\n", | |||
" 'latin_training_set_sentence_cltk',\n", | |||
" 'latin_word2vec_cltk',\n", | |||
" 'latin_text_antique_digiliblt',\n", | |||
" 'latin_text_corpus_grammaticorum_latinorum']" | |||
] | |||
}, | |||
"execution_count": 3, | |||
"metadata": {}, | |||
"output_type": "execute_result" | |||
} | |||
], | |||
"source": [ | |||
"my_latin_downloader.list_corpora" | |||
] | |||
}, | |||
{ | |||
"cell_type": "markdown", | |||
"metadata": { | |||
"deletable": true, | |||
"editable": true | |||
}, | |||
"source": [ | |||
"# Import several corpora" | |||
] | |||
}, | |||
{ | |||
"cell_type": "code", | |||
"execution_count": 4, | |||
"metadata": { | |||
"collapsed": true, | |||
"deletable": true, | |||
"editable": true | |||
}, | |||
"outputs": [], | |||
"source": [ | |||
"my_latin_downloader.import_corpus('latin_text_latin_library')\n", | |||
"my_latin_downloader.import_corpus('latin_models_cltk')" | |||
] | |||
}, | |||
{ | |||
"cell_type": "markdown", | |||
"metadata": { | |||
"deletable": true, | |||
"editable": true | |||
}, | |||
"source": [ | |||
"You can verify the files were downloaded in the Terminal with `$ ls -l ~/cltk_data/latin/text/latin_text_latin_library/`" | |||
] | |||
}, | |||
{ | |||
"cell_type": "code", | |||
"execution_count": 5, | |||
"metadata": { | |||
"collapsed": false, | |||
"deletable": true, | |||
"editable": true | |||
}, | |||
"outputs": [ | |||
{ | |||
"data": { | |||
"text/plain": [ | |||
"['greek_software_tlgu',\n", | |||
" 'greek_text_perseus',\n", | |||
" 'phi7',\n", | |||
" 'tlg',\n", | |||
" 'greek_proper_names_cltk',\n", | |||
" 'greek_models_cltk',\n", | |||
" 'greek_treebank_perseus',\n", | |||
" 'greek_lexica_perseus',\n", | |||
" 'greek_training_set_sentence_cltk',\n", | |||
" 'greek_word2vec_cltk',\n", | |||
" 'greek_text_lacus_curtius']" | |||
] | |||
}, | |||
"execution_count": 5, | |||
"metadata": {}, | |||
"output_type": "execute_result" | |||
} | |||
], | |||
"source": [ | |||
"# Let's get a Greek corpus, too\n", | |||
"\n", | |||
"my_greek_downloader = CorpusImporter('greek')\n", | |||
"my_greek_downloader.list_corpora" | |||
] | |||
}, | |||
{ | |||
"cell_type": "code", | |||
"execution_count": 6, | |||
"metadata": { | |||
"collapsed": false, | |||
"deletable": true, | |||
"editable": true | |||
}, | |||
"outputs": [], | |||
"source": [ | |||
"my_greek_downloader.import_corpus('greek_text_lacus_curtius')" | |||
] | |||
}, | |||
{ | |||
"cell_type": "markdown", | |||
"metadata": { | |||
"deletable": true, | |||
"editable": true | |||
}, | |||
"source": [ | |||
"Likewise, verify with `ls -l ~/cltk_data/greek/text/greek_text_lacus_curtius/plain/`" | |||
] | |||
} | |||
], | |||
"metadata": { | |||
"kernelspec": { | |||
"display_name": "Python 3", | |||
"language": "python", | |||
"name": "python3" | |||
}, | |||
"language_info": { | |||
"codemirror_mode": { | |||
"name": "ipython", | |||
"version": 3 | |||
}, | |||
"file_extension": ".py", | |||
"mimetype": "text/x-python", | |||
"name": "python", | |||
"nbconvert_exporter": "python", | |||
"pygments_lexer": "ipython3", | |||
"version": "3.6.0" | |||
} | |||
}, | |||
"nbformat": 4, | |||
"nbformat_minor": 1 | |||
} |
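
The `ls -l` checks in the notebook above could also be done from inside a notebook cell. This is a rough sketch using only the standard library; `corpus_files` is a hypothetical helper, not a CLTK function, and it assumes the corpora live under `~/cltk_data` as the paths in the notebook suggest.

```python
from pathlib import Path


def corpus_files(*parts, root=None):
    """List entry names under <root>/<parts...>; empty list if missing."""
    # Default to ~/cltk_data, the download location shown in the notebook.
    base = Path(root) if root is not None else Path.home() / "cltk_data"
    corpus_dir = base.joinpath(*parts)
    if not corpus_dir.is_dir():
        return []
    return sorted(p.name for p in corpus_dir.iterdir())


if __name__ == "__main__":
    latin = corpus_files("latin", "text", "latin_text_latin_library")
    greek = corpus_files("greek", "text", "greek_text_lacus_curtius", "plain")
    print(len(latin), "entries in the Latin Library corpus directory")
    print(len(greek), "entries in the Lacus Curtius plain-text directory")
```

An empty result means the directory does not exist or has no entries, which usually indicates that the corresponding `import_corpus` call has not completed.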