
proofread and update

aurelberra committed Feb 4, 2018
1 parent 4d0d597 commit 58748187f8ee45214b4f4dd7ab274cee1c89a61d
@@ -8,7 +8,7 @@
"\n",
"## Mac\n",
"\n",
"<https://www.python.org/downloads/> (currently is 3.5.2)\n",
"See <https://www.python.org/downloads/> (current version is 3.6.4).\n",
"\n",
"\n",
"## Linux\n",
@@ -17,13 +17,11 @@
"\n",
"``` bash\n",
"$ curl -O https://raw.githubusercontent.com/kylepjohnson/python3_bootstrap/master/install.sh\n",
"\n",
"$ chmod +x install.sh\n",
"\n",
"$ ./install.sh\n",
"```\n",
"\n",
"This Linux build from source will take ~5 mins."
"This Linux build from source will take around 5 minutes."
]
},
{
@@ -32,16 +30,16 @@
"source": [
"# Install Git\n",
"\n",
"CLTK uses Git for corpus management. For Mac, install it from here: <https://git-scm.com/downloads>. For Linux, check if present (`git --version`); if not then use your package manager to get it (e.g., `apt-get install git`)."
"The CLTK uses Git for corpus management. For Mac, install it from here: <https://git-scm.com/downloads>. For Linux, check if present (`git --version`); if not then use your package manager to get it (e.g., `apt-get install git`)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Make virtual environment\n",
"# Create a virtual environment\n",
"\n",
"This makes a special environment (a \"sandbox\") just for the cltk. If something goes wrong, you can just delete it and start again.\n",
"This makes a special environment (a \"sandbox\") just for the CLTK. If something goes wrong, you can just delete it and start again.\n",
"\n",
"``` bash\n",
"$ cd ~/\n",
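This hunk stops before the notebook's actual shell commands for creating the sandbox. As a hedged alternative, the standard library's `venv` module can build an equivalent environment from Python. The helper `make_sandbox` and the directory name `cltk_venv` are hypothetical. This is not necessarily how the tutorial itself creates the environment.

```python
import sys
import tempfile
import venv
from pathlib import Path


def make_sandbox(env_dir: Path) -> Path:
    """Create a virtual environment at env_dir and return its interpreter path.

    with_pip=False keeps the sketch fast and avoids depending on ensurepip;
    pass with_pip=True to get pip inside the sandbox.
    """
    # clear=True wipes an existing environment first, i.e. "delete it and start again".
    venv.create(env_dir, with_pip=False, clear=True)
    bin_dir = "Scripts" if sys.platform == "win32" else "bin"
    exe = "python.exe" if sys.platform == "win32" else "python"
    return env_dir / bin_dir / exe


if __name__ == "__main__":
    # Hypothetical location in a temp dir; deleting this directory removes the sandbox.
    target = Path(tempfile.mkdtemp()) / "cltk_venv"
    python_path = make_sandbox(target)
    print(python_path, (target / "pyvenv.cfg").exists())
```

Activating the environment is still a shell step, for example `source <env_dir>/bin/activate` on Linux or macOS.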
@@ -83,9 +81,9 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"# Test Jupter\n",
"# Test Jupyter\n",
"\n",
"Launch a notebook (such as this one) from the Terminal with `jupyter notebook`. Then open your preferred browser to <http://localhost:8888>."
"From your `cltk` directory, launch a notebook (such as this one) from the Terminal with `jupyter notebook`. Then open your preferred browser to <http://localhost:8888>."
]
},
{
@@ -94,7 +92,7 @@
"source": [
"# Download these tutorials\n",
"\n",
"Now or sometime later, you may find these instructions at <https://github.com/kylepjohnson/notebooks/tree/master/public_talks/2016_12_08_harvard_classics>."
"You may find these instructions at <https://github.com/cltk/tutorials>."
]
},
{
@@ -103,7 +101,7 @@
"source": [
"# Join GitHub\n",
"\n",
"A nice way to share code. Do this later, then come visit us at <https://github.com/cltk/cltk/>."
"GitHub is a nice way to share code. Come visit us at <https://github.com/cltk/cltk/>!"
]
}
],
@@ -123,7 +121,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.1"
"version": "3.6.4"
}
},
"nbformat": 4,
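The setup notebook points to Python 3.6.4 and asks you to confirm Git with `git --version`. A small pre-flight check along those lines might look like the sketch below. The helper `check_setup` is hypothetical. The 3.6 minimum is inferred from the version this commit references and is not a documented CLTK requirement.

```python
import shutil
import subprocess
import sys


def check_setup(min_python=(3, 6)):
    """Report whether this interpreter and Git look ready for the tutorials.

    Returns a dict; 'git_version' is None when git is not on the PATH.
    """
    python_ok = sys.version_info[:2] >= tuple(min_python)
    git_path = shutil.which("git")
    git_version = None
    if git_path is not None:
        # Equivalent of running `git --version` in the Terminal.
        result = subprocess.run(
            [git_path, "--version"],
            stdout=subprocess.PIPE,
            universal_newlines=True,
            check=False,
        )
        git_version = result.stdout.strip() or None
    return {
        "python": "{}.{}.{}".format(*sys.version_info[:3]),
        "python_ok": python_ok,
        "git_version": git_version,
    }


if __name__ == "__main__":
    print(check_setup())
```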
@@ -6,7 +6,7 @@
"source": [
"The CLTK has a distributed infrastructure that lets you download official CLTK texts or other corpora shared by others. For full docs, see <http://docs.cltk.org/en/latest/importing_corpora.html>.\n",
"\n",
"To get started, from the Terminal, open a new Jupyter notebook from within your `~/cltk` directory (see notebook 1 for instructions): `jupyter notebook`. Then go to <http://localhost:8888>."
"To get started, from the Terminal, open a new Jupyter notebook from within your `~/cltk` directory (see notebook 1 \"CLTK Setup\" for instructions): `jupyter notebook`. Then go to <http://localhost:8888>."
]
},
{
@@ -20,34 +20,31 @@
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"collapsed": true
},
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"# this is the import of the right part of the CLTK library\n",
"# This is the import of the right part of the CLTK library\n",
"\n",
"from cltk.corpus.utils.importer import CorpusImporter"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"collapsed": true
},
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"# See https://github.com/cltk for all official corpora\n",
"\n",
"my_latin_downloader = CorpusImporter('latin')\n",
"\n",
"# 'my_latin_downloader' is the variable by which we now call the CorpusImporter"
"# Now 'my_latin_downloader' is the variable by which we call the CorpusImporter"
]
},
{
"cell_type": "code",
"execution_count": 3,
"execution_count": 4,
"metadata": {},
"outputs": [
{
@@ -70,7 +67,7 @@
" 'latin_text_poeti_ditalia']"
]
},
"execution_count": 3,
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
@@ -88,10 +85,8 @@
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"collapsed": true
},
"execution_count": 5,
"metadata": {},
"outputs": [],
"source": [
"my_latin_downloader.import_corpus('latin_text_latin_library')\n",
@@ -335,9 +330,9 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"# Convert TEI XML corpus\n",
"# Convert TEI XML texts\n",
"\n",
"Here we'll convert the 1K Years' Greek corpus from TEI XML to plaintext"
"Here we'll convert the First 1K Years' Greek corpus from TEI XML to plain text."
]
},
{
@@ -358,7 +353,7 @@
"outputs": [],
"source": [
"#! If you get the following error: 'Install `bs4` and `lxml` to parse these TEI files.'\n",
"# then run: `pip install bs4 lxml`\n",
"# then run: `pip install bs4 lxml`.\n",
"\n",
"onekgreek_tei_xml_to_text()"
]
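The notebook's `onekgreek_tei_xml_to_text()` relies on `bs4` and `lxml`, as its error message notes. To show roughly what a TEI-to-plain-text conversion involves, here is a minimal sketch that swaps in the standard library's `xml.etree.ElementTree`. It is not the CLTK implementation. The helper `tei_to_text` and the sample document are made up. It assumes files use the standard TEI P5 namespace and keeps only the text under `<body>`.

```python
import xml.etree.ElementTree as ET

# Standard TEI P5 namespace; elements in a TEI file live under it.
TEI_NS = "{http://www.tei-c.org/ns/1.0}"

# Tiny made-up TEI document used only for illustration.
SAMPLE = (
    '<TEI xmlns="http://www.tei-c.org/ns/1.0">'
    "<teiHeader><fileDesc><titleStmt><title>Iliad</title></titleStmt>"
    "</fileDesc></teiHeader>"
    '<text><body><div><l n="1">μῆνιν ἄειδε θεὰ</l>\n'
    '<l n="2">Πηληϊάδεω Ἀχιλῆος</l></div></body></text></TEI>'
)


def tei_to_text(xml_source: str) -> str:
    """Return the whitespace-normalized text of a TEI document's <body>.

    The <teiHeader> (title, publication data, etc.) is skipped.
    """
    root = ET.fromstring(xml_source)
    body = root.find(".//" + TEI_NS + "body")
    if body is None:
        return ""
    # itertext() yields every text node and tail under <body> in document order;
    # concatenating them keeps words split by inline markup intact, and
    # split()/join() collapses the remaining newlines and runs of spaces.
    return " ".join("".join(body.itertext()).split())


if __name__ == "__main__":
    # Prints both verse lines on a single line, without the header's title.
    print(tei_to_text(SAMPLE))
```

Joining text nodes without a separator favors words broken by inline tags. Verse lines survive only because the sample, like most TEI files, puts whitespace between `<l>` elements.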
@@ -377,15 +372,16 @@
}
],
"source": [
"# count the converted plaintext files:\n",
"# Count the converted plaintext files\n",
"\n",
"!ls -l ~/cltk_data/greek/text/greek_text_first1kgreek_plaintext/ | wc -l"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Import local corpus"
"# Import local corpora"
]
},
{
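A note on the counting cell: `ls -l` prints a `total` line before the entries, so `ls -l … | wc -l` reports one more than the number of files (and also counts any subdirectories). A minimal Python sketch that counts only regular files follows. The helper name `count_plaintext_files` is hypothetical, and the path is the one used in the notebook cell.

```python
from pathlib import Path


def count_plaintext_files(directory) -> int:
    """Count regular files directly inside `directory` (subdirectories are skipped)."""
    path = Path(directory).expanduser()
    return sum(1 for entry in path.iterdir() if entry.is_file())


if __name__ == "__main__":
    # Same directory as the notebook cell; it exists only after the TEI conversion.
    target = "~/cltk_data/greek/text/greek_text_first1kgreek_plaintext/"
    if Path(target).expanduser().is_dir():
        print(count_plaintext_files(target))
```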
@@ -458,7 +454,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.1"
"version": "3.6.4"
}
},
"nbformat": 4,