Commit

pass pandas to ds_tut, add ds_tut to pip, add colab link
ephes committed Oct 26, 2023
1 parent 3cbbf71 commit 1ed7993
Showing 2 changed files with 7 additions and 12 deletions.
2 changes: 1 addition & 1 deletion index.ipynb
@@ -17,7 +17,7 @@
"source": [
"# Text Classification\n",
"\n",
-"Maybe let's start by doing some [text classification](text_classification.ipynb)."
+"Maybe let's start by doing some [text classification](text_classification.ipynb) ([open in google colab[(https://colab.research.google.com/github/ephes/data_science_tutorial/blob/main/text_classification.ipynb))."
]
},
{
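The Colab link added to `index.ipynb` follows Colab's scheme for opening a notebook directly from GitHub: `https://colab.research.google.com/github/<owner>/<repo>/blob/<branch>/<path>`. The committed Markdown has a mismatched bracket (`[open in google colab[(`), so it will not render as a link. Here is a minimal sketch that generates such links in Python. The helper names `colab_url` and `colab_markdown_link` are hypothetical and not part of this repository. The default branch `main` comes from the URL in the diff.

```python
from urllib.parse import quote

COLAB_GITHUB_BASE = "https://colab.research.google.com/github"


def colab_url(owner: str, repo: str, path: str, branch: str = "main") -> str:
    """Hypothetical helper: URL that opens a GitHub-hosted notebook in Colab."""
    # quote() percent-encodes unsafe characters but keeps "/" so nested paths survive
    return f"{COLAB_GITHUB_BASE}/{owner}/{repo}/blob/{branch}/{quote(path)}"


def colab_markdown_link(owner: str, repo: str, path: str, branch: str = "main") -> str:
    """Hypothetical helper: Markdown link with balanced [label](url) brackets."""
    return f"[open in Google Colab]({colab_url(owner, repo, path, branch)})"


if __name__ == "__main__":
    # Reproduces the URL used in index.ipynb
    print(colab_url("ephes", "data_science_tutorial", "text_classification.ipynb"))
```

Running the script prints the same URL that appears in the diff. Building the link in one place keeps the `[label](url)` brackets balanced.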
17 changes: 6 additions & 11 deletions text_classification.ipynb
@@ -23,7 +23,7 @@
}
],
"source": [
-"%pip install -Uqq pandas seaborn"
+"%pip install -Uqq pandas seaborn ds_tut"
]
},
{
@@ -50,12 +50,14 @@
"cell_type": "markdown",
"metadata": {},
"source": [
-"# Reuters-21578\n",
+"# Text Classification (Reuters-21578)\n",
"\n",
"## Additional Information About the Dataset\n",
"- [Paper comparing different text categorization methods (Thorsten Joachims 1998)](https://www.cs.cornell.edu/people/tj/publications/joachims_98a.pdf)\n",
"- [Dataset Readme](http://www.daviddlewis.com/resources/testcollections/reuters21578/readme.txt)\n",
-"- [Link to Dataset Card on Hugging Face](https://huggingface.co/datasets/reuters21578)"
+"- [Link to Dataset Card on Hugging Face](https://huggingface.co/datasets/reuters21578)\n",
+"\n",
+"Maybe just use the dataset from Hugging Face instead of downloading and parsing manually? TODO"
]
},
{
@@ -176,7 +178,7 @@
"\n",
"documents = pickle.load(open(reuters_documents_path, \"rb\"))\n",
"reuters = pickle.load(open(reuters_corpus_path, \"rb\"))\n",
-"df, top_ten_ids, train_labels, test_labels = reuters.build_dataframe()\n",
+"df, top_ten_ids, train_labels, test_labels = reuters.build_dataframe(pd=pd)\n",
"train, test = reuters.split_modapte()"
]
},
@@ -394,13 +396,6 @@
" \"text-classification/images/TextClassificationFlowchart.png\"\n",
"))"
]
-},
-{
-"cell_type": "code",
-"execution_count": null,
-"metadata": {},
-"outputs": [],
-"source": []
}
],
"metadata": {
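According to the commit message, this change passes pandas to `ds_tut`. The notebook now calls `reuters.build_dataframe(pd=pd)` with its own imported pandas module. The diff does not show how `ds_tut` uses that argument. What follows is therefore only a sketch of the general pattern: an optional module parameter with a lazy-import fallback. It assumes nothing about `ds_tut`'s real implementation, and the `build_dataframe` below is a hypothetical stand-in with a simplified signature.

```python
import importlib
from typing import Any, Optional


def build_dataframe(records: list, pd: Optional[Any] = None):
    """Hypothetical stand-in: build a DataFrame with a caller-supplied pandas module.

    If no module is passed, pandas is imported lazily at call time, so merely
    importing the code that defines this function does not require pandas.
    """
    if pd is None:
        pd = importlib.import_module("pandas")
    # Any object exposing a pandas-like DataFrame.from_records works here
    return pd.DataFrame.from_records(records)
```

With pandas installed, `build_dataframe([{"a": 1}], pd=pd)` returns a one-row DataFrame with column `a`. Accepting the module as a parameter also lets a test pass a stub object in place of pandas.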
