pass pandas to ds_tut, add ds_tut to pip, add colab link

ephes · Oct 26, 2023 · 1ed7993
1 parent 3cbbf71

Showing 2 changed files with 7 additions and 12 deletions.
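
Note on the `build_dataframe(pd=pd)` change: the call site now hands the pandas module to the corpus object instead of relying on the library to import it. The `ds_tut` implementation is not part of this commit, so the following is only a minimal sketch of that pattern under stated assumptions. `TinyCorpus`, its constructor, and the column names are hypothetical; only the `pd` keyword argument comes from the diff.

```python
from types import SimpleNamespace


class TinyCorpus:
    """Hypothetical stand-in for the corpus object; not the real ds_tut class."""

    def __init__(self, docs):
        # docs: list of (doc_id, text, label) tuples
        self.docs = docs

    def build_dataframe(self, pd):
        # The caller supplies the pandas module, so this code never imports it.
        rows = [
            {"id": doc_id, "text": text, "label": label}
            for doc_id, text, label in self.docs
        ]
        return pd.DataFrame(rows)


# Any object exposing a `DataFrame` callable works here, which keeps the sketch
# runnable without pandas installed. In the notebook the real module is passed.
fake_pd = SimpleNamespace(DataFrame=lambda rows: list(rows))

corpus = TinyCorpus(
    [(1, "wheat prices rise", "grain"), (2, "oil output cut", "crude")]
)
frame = corpus.build_dataframe(pd=fake_pd)
```

In the notebook the equivalent call would be `import pandas as pd` followed by `corpus.build_dataframe(pd=pd)`. One plausible motivation is keeping pandas out of the library's hard dependencies, though the commit itself does not say so.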
diff --git a/index.ipynb b/index.ipynb
@@ -17,7 +17,7 @@
    "source": [
     "# Text Classification\n",
     "\n",
-    "Maybe let's start by doing some [text classification](text_classification.ipynb)."
+    "Maybe let's start by doing some [text classification](text_classification.ipynb) ([open in google colab](https://colab.research.google.com/github/ephes/data_science_tutorial/blob/main/text_classification.ipynb))."
    ]
   },
   {

diff --git a/text_classification.ipynb b/text_classification.ipynb
@@ -23,7 +23,7 @@
     }
    ],
    "source": [
-    "%pip install -Uqq pandas seaborn"
+    "%pip install -Uqq pandas seaborn ds_tut"
    ]
   },
   {
@@ -50,12 +50,14 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "# Reuters-21578\n",
+    "# Text Classification (Reuters-21578)\n",
     "\n",
     "## Additional Information About the Dataset\n",
     "- [Paper comparing different text categorization methods (Thorsten Joachims 1998)](https://www.cs.cornell.edu/people/tj/publications/joachims_98a.pdf)\n",
     "- [Dataset Readme](http://www.daviddlewis.com/resources/testcollections/reuters21578/readme.txt)\n",
-    "- [Link to Dataset Card on Hugging Face](https://huggingface.co/datasets/reuters21578)"
+    "- [Link to Dataset Card on Hugging Face](https://huggingface.co/datasets/reuters21578)\n",
+    "\n",
+    "Maybe just use the dataset from Hugging Face instead of downloading and parsing manually? TODO"
    ]
   },
   {
@@ -176,7 +178,7 @@
     "\n",
     "documents = pickle.load(open(reuters_documents_path, \"rb\"))\n",
     "reuters = pickle.load(open(reuters_corpus_path, \"rb\"))\n",
-    "df, top_ten_ids, train_labels, test_labels = reuters.build_dataframe()\n",
+    "df, top_ten_ids, train_labels, test_labels = reuters.build_dataframe(pd=pd)\n",
     "train, test = reuters.split_modapte()"
    ]
   },
@@ -394,13 +396,6 @@
     "    \"text-classification/images/TextClassificationFlowchart.png\"\n",
     "))"
    ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": []
   }
  ],
  "metadata": {