add init from prev lecture
Kyle Johnson committed Mar 16, 2017
1 parent a4c4c32 commit fccb570
Showing 9 changed files with 2,175 additions and 0 deletions.
164 changes: 164 additions & 0 deletions 1 CLTK Setup.ipynb
@@ -0,0 +1,164 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"deletable": true,
"editable": true
},
"source": [
"# Join the Slack channel\n",
"\n",
"Send your email to `kyle@kyle-p-johnson.com` and he'll add you to this channel. Other instructions will be sent via this route."
]
},
{
"cell_type": "markdown",
"metadata": {
"deletable": true,
"editable": true
},
"source": [
"# Install Python\n",
"\n",
"## Mac\n",
"\n",
"<https://www.python.org/downloads/> (currently 3.5.2)\n",
"\n",
"\n",
"## Linux\n",
"\n",
"Open Terminal and check the current version with `python --version` or `python3 --version`. If it reports 3.4 or 3.5, you're fine. If your Python version is older, run these:\n",
"\n",
"``` bash\n",
"$ curl -O https://raw.githubusercontent.com/kylepjohnson/python3_bootstrap/master/install.sh\n",
"\n",
"$ chmod +x install.sh\n",
"\n",
"$ ./install.sh\n",
"```\n",
"\n",
"This builds Python from source on Linux and takes about 5 minutes."
]
},
{
"cell_type": "markdown",
"metadata": {
"deletable": true,
"editable": true
},
"source": [
"# Install Git\n",
"\n",
"The CLTK uses Git for corpus management. For Mac, install it from here: <https://git-scm.com/downloads>. For Linux, check whether it is already installed (`git --version`); if not, use your package manager to get it (e.g., `apt-get install git`)."
]
},
{
"cell_type": "markdown",
"metadata": {
"deletable": true,
"editable": true
},
"source": [
"# Make virtual environment\n",
"\n",
"This makes a special environment (a \"sandbox\") just for the CLTK. If something goes wrong, you can just delete it and start again.\n",
"\n",
"``` bash\n",
"$ cd ~/\n",
"$ mkdir cltk\n",
"$ cd cltk\n",
"$ python3 -m venv venv\n",
"$ source venv/bin/activate\n",
"```\n",
"\n",
"Now you can see that you're not using your system Python but this particular one:\n",
"\n",
"``` bash\n",
"$ which python\n",
"```\n",
"\n",
"Note that every time you open a new Terminal window, you'll need to \"activate\" this environment with `source ~/cltk/venv/bin/activate`."
]
},
{
"cell_type": "markdown",
"metadata": {
"deletable": true,
"editable": true
},
"source": [
"# Install CLTK\n",
"\n",
"``` bash\n",
"$ pip install cltk\n",
"```\n",
"\n",
"This will take a few minutes, as it also installs several \"dependencies\": other Python libraries that the CLTK uses.\n",
"\n",
"Also install Jupyter, which is a really handy way of writing code.\n",
"\n",
"``` bash\n",
"$ pip install jupyter\n",
"```"
]
},
{
"cell_type": "markdown",
"metadata": {
"deletable": true,
"editable": true
},
"source": [
"# Test Jupyter\n",
"\n",
"Launch a notebook (such as this one) from the Terminal with `jupyter notebook`. Then open your preferred browser to <http://localhost:8888>."
]
},
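{
"cell_type": "markdown",
"metadata": {
"deletable": true,
"editable": true
},
"source": [
"Once a notebook is open, a quick check like the one below can confirm that it is running the virtual environment's Python and that the CLTK can be imported. This is only a sketch using the standard library (`sys`, `importlib`); it assumes the environment was created at `~/cltk/venv` as described above."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false,
"deletable": true,
"editable": true
},
"outputs": [],
"source": [
"import importlib.util\n",
"import sys\n",
"\n",
"# Path of the interpreter running this notebook; with the environment\n",
"# activated it should point inside ~/cltk/venv (an assumption from the setup above)\n",
"print(sys.executable)\n",
"\n",
"# True when running inside a virtual environment\n",
"print(sys.prefix != sys.base_prefix)\n",
"\n",
"# True when the cltk package can be found by this interpreter\n",
"print(importlib.util.find_spec('cltk') is not None)"
]
},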
{
"cell_type": "markdown",
"metadata": {
"deletable": true,
"editable": true
},
"source": [
"# Download these tutorials\n",
"\n",
"You can find these instructions, now or later, at <https://github.com/kylepjohnson/notebooks/tree/master/public_talks/2016_12_08_harvard_classics>."
]
},
{
"cell_type": "markdown",
"metadata": {
"deletable": true,
"editable": true
},
"source": [
"# Join GitHub\n",
"\n",
"A nice way to share code. Do this later, then come visit us at <https://github.com/cltk/cltk/>."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.0"
}
},
"nbformat": 4,
"nbformat_minor": 1
}
212 changes: 212 additions & 0 deletions 2 Import corpora.ipynb
@@ -0,0 +1,212 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"deletable": true,
"editable": true
},
"source": [
"The CLTK has a distributed infrastructure that lets you download official CLTK corpora as well as corpora shared by others. For full docs, see <http://docs.cltk.org/en/latest/importing_corpora.html>.\n",
"\n",
"To get started, from the Terminal, open a new Jupyter notebook from within your `~/cltk` directory (see notebook 1 for instructions): `jupyter notebook`. Then go to <http://localhost:8888>."
]
},
{
"cell_type": "markdown",
"metadata": {
"deletable": true,
"editable": true
},
"source": [
"# See what corpora are available\n",
"\n",
"First we need to \"import\" the right part of the CLTK library. Think of this as pulling just the book you need off the shelf and having it ready to read."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"collapsed": true,
"deletable": true,
"editable": true
},
"outputs": [],
"source": [
"# Import the CorpusImporter class from the CLTK library\n",
"from cltk.corpus.utils.importer import CorpusImporter"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"collapsed": false,
"deletable": true,
"editable": true
},
"outputs": [],
"source": [
"# See https://github.com/cltk for all official corpora\n",
"\n",
"my_latin_downloader = CorpusImporter('latin')\n",
"\n",
"# 'my_latin_downloader' is the variable through which we now use the Latin CorpusImporter"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"collapsed": false,
"deletable": true,
"editable": true
},
"outputs": [
{
"data": {
"text/plain": [
"['latin_text_perseus',\n",
" 'latin_treebank_perseus',\n",
" 'latin_treebank_perseus',\n",
" 'latin_text_latin_library',\n",
" 'phi5',\n",
" 'phi7',\n",
" 'latin_proper_names_cltk',\n",
" 'latin_models_cltk',\n",
" 'latin_pos_lemmata_cltk',\n",
" 'latin_treebank_index_thomisticus',\n",
" 'latin_lexica_perseus',\n",
" 'latin_training_set_sentence_cltk',\n",
" 'latin_word2vec_cltk',\n",
" 'latin_text_antique_digiliblt',\n",
" 'latin_text_corpus_grammaticorum_latinorum']"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"my_latin_downloader.list_corpora"
]
},
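{
"cell_type": "markdown",
"metadata": {
"deletable": true,
"editable": true
},
"source": [
"The result is an ordinary Python list, so it can be filtered. As a small illustration, the cell below keeps only the text corpora. The `'_text_'` substring test is just an assumption based on the naming pattern visible in the output above, not an official CLTK convention."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false,
"deletable": true,
"editable": true
},
"outputs": [],
"source": [
"# Keep only the corpus names that contain '_text_'\n",
"text_corpora = [name for name in my_latin_downloader.list_corpora if '_text_' in name]\n",
"text_corpora"
]
},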
{
"cell_type": "markdown",
"metadata": {
"deletable": true,
"editable": true
},
"source": [
"# Import several corpora"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"collapsed": true,
"deletable": true,
"editable": true
},
"outputs": [],
"source": [
"my_latin_downloader.import_corpus('latin_text_latin_library')\n",
"my_latin_downloader.import_corpus('latin_models_cltk')"
]
},
{
"cell_type": "markdown",
"metadata": {
"deletable": true,
"editable": true
},
"source": [
"You can verify the files were downloaded in the Terminal with `$ ls -l ~/cltk_data/latin/text/latin_text_latin_library/`"
]
},
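{
"cell_type": "markdown",
"metadata": {
"deletable": true,
"editable": true
},
"source": [
"The same check can be done from Python instead of the Terminal. A minimal sketch, assuming the corpus was saved to the default location named above:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false,
"deletable": true,
"editable": true
},
"outputs": [],
"source": [
"import os\n",
"\n",
"# Expand '~' to the current user's home directory\n",
"corpus_dir = os.path.expanduser('~/cltk_data/latin/text/latin_text_latin_library/')\n",
"\n",
"if os.path.isdir(corpus_dir):\n",
"    # Show the first few entries, sorted by name\n",
"    print(sorted(os.listdir(corpus_dir))[:10])\n",
"else:\n",
"    print('Not found:', corpus_dir)"
]
},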
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"collapsed": false,
"deletable": true,
"editable": true
},
"outputs": [
{
"data": {
"text/plain": [
"['greek_software_tlgu',\n",
" 'greek_text_perseus',\n",
" 'phi7',\n",
" 'tlg',\n",
" 'greek_proper_names_cltk',\n",
" 'greek_models_cltk',\n",
" 'greek_treebank_perseus',\n",
" 'greek_lexica_perseus',\n",
" 'greek_training_set_sentence_cltk',\n",
" 'greek_word2vec_cltk',\n",
" 'greek_text_lacus_curtius']"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Let's get a Greek corpus, too\n",
"\n",
"my_greek_downloader = CorpusImporter('greek')\n",
"my_greek_downloader.list_corpora"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"collapsed": false,
"deletable": true,
"editable": true
},
"outputs": [],
"source": [
"my_greek_downloader.import_corpus('greek_text_lacus_curtius')"
]
},
{
"cell_type": "markdown",
"metadata": {
"deletable": true,
"editable": true
},
"source": [
"Likewise, verify with `ls -l ~/cltk_data/greek/text/greek_text_lacus_curtius/plain/`"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.0"
}
},
"nbformat": 4,
"nbformat_minor": 1
}
