ScienceOctopus/octoflow

Octopus

TODOs

  • Text analysis: Look for patterns that distinguish problem from non-problem statements, e.g. frequently occurring bigrams, trigrams and phrases. Interesting library: scattertext

  • Rule-based matching as a final post-processing step (after model prediction) to clean up false positives and false negatives. Either regex or spaCy's PhraseMatcher are good options

  • Hierarchical Clustering: exploratory notebooks that survey the current SotA in unsupervised clustering, try promising libraries or algorithms on Octopus' data, and assess whether the approach is feasible

  • DevOps: hooks, AWS configs, scripts, GitHub Actions and general CI/CD for testing, validating and building workflows

  • Software 2.0 Infra: Set up an active-learning loop for efficient human labeling using prodi.gy, labelstud.io or similar

  • Bespoke App for language model interpretation à la Markus' Netlens

  • Clustering and Analysis: use clusteval or hnet, or define a custom cluster-quality metric. Try different approaches (e.g. HDBSCAN for clustering, UMAP or t-SNE for dimensionality reduction)

  • Bespoke App for open-source contributors to label data and create regex-like pattern matchers through an easy-to-learn syntax, removing the need for (or supporting) software development and modeling

  • Advanced: Automatic pattern discovery: given examples of text, find the common underlying patterns shared by subsets of them. This probably involves evolutionary algorithms and good computational-linguistics knowledge, and would warrant a stand-alone library. Example: PatternOmatic (doesn't really work)
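For the text-analysis TODO, here is a minimal standard-library sketch of one way to surface n-grams that are over-represented in problem statements relative to non-problem statements. It is not how scattertext scores terms; it simply compares Laplace-smoothed relative frequencies. The function names and the naive `\w+` tokenizer are illustrative assumptions.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split on word characters (a deliberately naive tokenizer)."""
    return re.findall(r"\w+", text.lower())


def ngrams(tokens, n):
    """Return the contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_counts(texts, n):
    """Count n-grams across a collection of texts."""
    counts = Counter()
    for text in texts:
        counts.update(ngrams(tokenize(text), n))
    return counts


def distinctive_ngrams(problem, non_problem, n=2, top_k=5):
    """Rank n-grams seen in problem statements by their frequency ratio
    (problem vs. non-problem), using add-one (Laplace) smoothing over the
    shared vocabulary so unseen n-grams never divide by zero."""
    p_counts = ngram_counts(problem, n)
    q_counts = ngram_counts(non_problem, n)
    vocab_size = len(set(p_counts) | set(q_counts))
    p_total = sum(p_counts.values()) + vocab_size
    q_total = sum(q_counts.values()) + vocab_size
    scores = {
        gram: ((p_counts[gram] + 1) / p_total) / ((q_counts[gram] + 1) / q_total)
        for gram in p_counts
    }
    return sorted(scores, key=scores.get, reverse=True)[:top_k]
```

For example, with `problem = ["the model fails to converge", "training fails to converge quickly"]` and `non_problem = ["the model converges well"]`, the top two bigrams are `("fails", "to")` and `("to", "converge")`. A real analysis would want a proper tokenizer (e.g. spaCy) and far more data before reading anything into the ranking.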
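For the rule-based matching TODO, a minimal sketch of a post-prediction override layer using Python's `re` module (the regex option; spaCy's PhraseMatcher would play the same role for token-level phrase lists). The patterns below are hypothetical placeholders; real rules would come from error analysis of the model's false positives and false negatives.

```python
import re

# Hypothetical rules for illustration only.
# Patterns that should force a "problem statement" label (rescue false negatives).
FORCE_PROBLEM = [
    re.compile(r"\bit is (still )?unclear (how|why|whether)\b", re.IGNORECASE),
    re.compile(r"\bremains? (an )?open (question|problem)\b", re.IGNORECASE),
]
# Patterns that should force a "non-problem" label (remove false positives),
# e.g. figure or table captions.
FORCE_NON_PROBLEM = [
    re.compile(r"^\s*(figure|table)\s+\d+", re.IGNORECASE),
]


def apply_rules(text: str, predicted: bool) -> bool:
    """Override the model's problem/non-problem prediction when a rule fires.

    Negative rules run first so captions are never labeled as problems;
    otherwise a positive rule can rescue a false negative. If no rule
    matches, the model's prediction is returned unchanged."""
    if any(p.search(text) for p in FORCE_NON_PROBLEM):
        return False
    if any(p.search(text) for p in FORCE_PROBLEM):
        return True
    return predicted
```

Keeping the rules as a separate, ordered layer means they can be audited and versioned independently of the model, at the cost of having to resolve conflicts explicitly (here: negative rules win).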
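For the "custom cluster-quality metric" item, one standard starting point is the silhouette coefficient, sketched here in pure Python so it can score the output of any clusterer (HDBSCAN, hierarchical, etc.) on embedded points. This is an assumption about what a useful metric would be, not something clusteval or hnet prescribe; scikit-learn ships an equivalent as `sklearn.metrics.silhouette_score`.

```python
import math


def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points (Euclidean distance).

    For point i: a = mean distance to the other members of its own cluster,
    b = smallest mean distance to the members of any other cluster,
    s = (b - a) / max(a, b). Singleton clusters contribute s = 0.
    Scores range from -1 (poor) to 1 (dense, well-separated clusters)."""
    clusters = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    total = 0.0
    for i, label in enumerate(labels):
        own = [j for j in clusters[label] if j != i]
        if not own:
            continue  # singleton cluster: s = 0 by convention
        a = sum(math.dist(points[i], points[j]) for j in own) / len(own)
        b = min(
            sum(math.dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        denom = max(a, b)
        total += (b - a) / denom if denom > 0 else 0.0
    return total / len(labels)
```

Because it needs every pairwise distance, this naive version is O(n²) and only suitable for small samples; it also favours convex clusters, so it may under-rate the irregular shapes HDBSCAN can find.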

Datasets: /datasets

About

Classifiers, Clustering and crunching text data
