The following is a file loader script that tweaks the existing sklearn.datasets load_files function to recursively load a nested directory structure.
This script requires scikit-learn. If it is not already installed, run: pip install scikit-learn (or sudo pip install scikit-learn).
Load text files with categories as subfolder names. Individual samples are assumed to be files stored in a hierarchical folder structure.
The folder names are used as supervised signal label names; the
individual file names are not important.
This function does not try to extract features into a numpy array or
scipy sparse matrix. In addition, if load_content is False it
does not try to load the files into memory.
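As a sketch of the idea (the helper name load_files_recursive and its exact signature are my assumptions, not the script's actual API), a recursive variant can walk the tree with os.walk and label every file by its top-level subfolder:

```python
import os
from pathlib import Path

def load_files_recursive(container_path, load_content=True, encoding=None):
    """Recursively collect files under container_path.

    Each top-level subfolder name becomes the category label for every
    file nested anywhere beneath it. This is a hypothetical sketch of
    the approach, not the real sklearn.datasets.load_files API.
    """
    container = Path(container_path)
    target_names = sorted(d.name for d in container.iterdir() if d.is_dir())
    label_index = {name: i for i, name in enumerate(target_names)}
    filenames, target = [], []
    for category in target_names:
        # os.walk descends into arbitrarily deep subfolders,
        # unlike the flat listing used by the stock load_files
        for root, _, files in os.walk(container / category):
            for fname in sorted(files):
                filenames.append(os.path.join(root, fname))
                target.append(label_index[category])
    data = None
    if load_content:
        data = []
        for path in filenames:
            with open(path, "rb") as f:
                raw = f.read()
            # bytes if encoding is None, decoded str otherwise
            data.append(raw.decode(encoding) if encoding else raw)
    return {"filenames": filenames, "target": target,
            "target_names": target_names, "data": data}
```

A call such as load_files_recursive('corpus/', encoding='utf-8') would then mirror load_files but descend into arbitrarily deep subfolders.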
To use text files in a scikit-learn classification or clustering
algorithm, you will need to use the sklearn.feature_extraction.text
module to build a feature extraction transformer that suits your
problem.
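For instance (the sample strings below are invented stand-ins for the loaded file contents), the list of decoded texts can be turned into a term-count matrix with CountVectorizer from sklearn.feature_extraction.text:

```python
from sklearn.feature_extraction.text import CountVectorizer

# stand-in for the data list the loader would return with load_content=True
docs = ["the cat sat", "the dog sat", "the cat ran"]

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(docs)  # scipy sparse matrix: one row per document
```

X can then be passed directly to a scikit-learn classifier or clustering estimator.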
If you set load_content=True, you should also specify the encoding of
the text using the 'encoding' parameter. For many modern text files,
'utf-8' will be the correct encoding. If you leave encoding set to None,
the content will consist of bytes instead of Unicode strings, and you will
not be able to use most functions in sklearn.feature_extraction.text.
Similar feature extractors should be built for other kinds of unstructured
data input such as images, audio, or video.
This is a file loader API tweak for the sklearn.datasets module in Python.