Formula of the Calinski-Harabasz score and minor corrections

ML4DS · Nov 18, 2019 · b267c0c
1 parent 3c4b75d

Showing 7 changed files with 65,253 additions and 98 deletions.
diff --git a/TM1.IntrodNLP/NLP_py2_wikitools/notebooks/TM1_NLP.ipynb b/TM1.IntrodNLP/NLP_py2_wikitools/notebooks/TM1_NLP.ipynb
@@ -977,7 +977,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "** Exercise**: There are usually many tokens that appear with very low frequency in the corpus. Count the number of tokens appearing only once, and what is the proportion of them in the token list."
+    "**Exercise**: There are usually many tokens that appear with very low frequency in the corpus. Count the number of tokens appearing only once, and what is the proportion of them in the token list."
    ]
   },
   {
@@ -1006,7 +1006,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "** Exercise**: Represent graphically those 20 tokens that appear in the highest number of articles. Note that you can use the code above (headed by `# SORTED TOKEN FREQUENCIES`) with a very minor modification."
+    "**Exercise**: Represent graphically those 20 tokens that appear in the highest number of articles. Note that you can use the code above (headed by `# SORTED TOKEN FREQUENCIES`) with a very minor modification."
    ]
   },
   {
@@ -1056,7 +1056,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "** Exercise**: Count the number of tokens appearing only in a single article.\n"
+    "**Exercise**: Count the number of tokens appearing only in a single article.\n"
    ]
   },
   {
@@ -1074,7 +1074,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "** Exercise** (*All in one*): Note that, for pedagogical reasons, we have used a different `for` loop for each text processing step creating a new `corpus_xxx` variable after each step. For very large corpus, this could cause memory problems. \n",
+    "**Exercise** (*All in one*): Note that, for pedagogical reasons, we have used a different `for` loop for each text processing step creating a new `corpus_xxx` variable after each step. For very large corpus, this could cause memory problems. \n",
     "\n",
     "As a summary exercise, repeat the whole text processing, starting from corpus_text up to computing the bow, with the following modifications:\n",
     "\n",
@@ -1099,7 +1099,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "** Exercise** (*Visualizing categories*): Repeat the previous exercise with a second wikipedia category. For instance, you can take \"communication\". \n",
+    "**Exercise** (*Visualizing categories*): Repeat the previous exercise with a second wikipedia category. For instance, you can take \"communication\". \n",
     "\n",
     "1. Save the result in variable `corpus_bow2`.\n",
     "2. Determine the most frequent terms in `corpus_bow1` (`term1`) and `corpus_bow2` (`term2`).\n",
@@ -1123,7 +1123,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "** Exercise ** (bigrams): `nltk` provides an utility to compute n-grams from a list of tokens, in `nltk.util.ngrams`. Join all tokens in `corpus_clean` in a single list and compute the bigrams. Plot the 20 most frequent bigrams in the corpus."
+    "**Exercise** (bigrams): `nltk` provides an utility to compute n-grams from a list of tokens, in `nltk.util.ngrams`. Join all tokens in `corpus_clean` in a single list and compute the bigrams. Plot the 20 most frequent bigrams in the corpus."
    ]
   },
   {
@@ -1148,21 +1148,21 @@
   "anaconda-cloud": {},
   "celltoolbar": "Slideshow",
   "kernelspec": {
-   "display_name": "Python [conda root]",
+   "display_name": "Python 3",
    "language": "python",
-   "name": "conda-root-py"
+   "name": "python3"
   },
   "language_info": {
    "codemirror_mode": {
     "name": "ipython",
-    "version": 2
+    "version": 3
    },
    "file_extension": ".py",
    "mimetype": "text/x-python",
    "name": "python",
    "nbconvert_exporter": "python",
-   "pygments_lexer": "ipython2",
-   "version": "2.7.14"
+   "pygments_lexer": "ipython3",
+   "version": "3.7.3"
   }
  },
  "nbformat": 4,

diff --git a/U1.KMeans/.ipynb_checkpoints/KMeans-checkpoint.ipynb b/U1.KMeans/.ipynb_checkpoints/KMeans-checkpoint.ipynb