Merged
40 commits
391e8ae
updates for gs/rs
dmorgankx Jul 28, 2020
ba7c41b
run with the removal of errors
cmccarthy1 Jul 28, 2020
a17d4e2
Merge pull request #4 from cmccarthy1/v1_0_0_review
dmorgankx Jul 29, 2020
e6bd3c3
change to images path required for .md display, update to AutoML note…
cmccarthy1 Jul 29, 2020
40695cd
Merge pull request #5 from cmccarthy1/v1_0_0_review
dmorgankx Jul 29, 2020
14fd86a
addition of feature impact/confmat for automl
dmorgankx Jul 30, 2020
239751d
updated Automl to reflect NLP addition. Fixed dockerfile
Dianeod Aug 25, 2020
f8831d1
added in images for AutoML NLP update
Dianeod Aug 25, 2020
5a26a88
removed image directory in docker
Dianeod Aug 25, 2020
38874b3
Merge pull request #7 from Dianeod/v1_0_0
dmorgankx Aug 26, 2020
5d5ada9
new clustering updates
dmorgankx Sep 10, 2020
97936fb
hc fixes
dmorgankx Sep 10, 2020
9fa49fa
ap fixes
dmorgankx Sep 11, 2020
2020acb
added time series notebooks
Dianeod Sep 11, 2020
4407c3c
updated docker to use pip to install ml requirements
Dianeod Sep 11, 2020
44811cc
added result show for ap
dmorgankx Sep 11, 2020
edb002c
rename notebook
Dianeod Sep 11, 2020
dcba842
updated README
Dianeod Sep 11, 2020
295ab73
updated README
Dianeod Sep 11, 2020
5e18d75
Merge branch 'v1_0_0' of https://github.com/Dianeod/mlnotebooks into …
dmorgankx Sep 14, 2020
70e5324
Delete 13 Time Series Forecasting.ipynb
Dianeod Sep 14, 2020
af03993
Merge branch 'v1_0_0' of https://github.com/Dianeod/mlnotebooks into …
dmorgankx Sep 14, 2020
1ee5497
time series review
dmorgankx Sep 14, 2020
c042cf3
Merge pull request #2 from dmorgankx/v1_0_0
Dianeod Sep 15, 2020
e8b8d3c
added extra notes for TS notebook
Dianeod Sep 15, 2020
f94d438
clustering updates
dmorgankx Sep 21, 2020
55052b6
Merge branch 'v1_0_0' of https://github.com/dmorgankx/mlnotebooks int…
Dianeod Sep 21, 2020
6e4f3c5
nlp updates
dmorgankx Sep 22, 2020
ceba8ae
Merge branch 'v1_0_0' of https://github.com/dmorgankx/mlnotebooks int…
Dianeod Sep 22, 2020
c0882a7
update to time series notebook, change to utilities to use util names…
cmccarthy1 Sep 22, 2020
c37fafd
Merge branch 'timeseries_review' of https://github.com/cmccarthy1/mln…
dmorgankx Sep 22, 2020
be71b77
clustering and automl review
Dianeod Sep 22, 2020
6d8857e
updated graphics
Dianeod Sep 22, 2020
83041bc
pulled updated version
Dianeod Sep 22, 2020
1a7ae1e
pulled updated version
Dianeod Sep 22, 2020
8ee2b1b
Merge pull request #8 from Dianeod/v1_0_0
dmorgankx Sep 22, 2020
98f7a44
general plotting functions
dmorgankx Sep 22, 2020
c489b6d
review of time series and utils update
Dianeod Sep 23, 2020
7109995
Merge branch 'v1_0_0' of https://github.com/dmorgankx/mlnotebooks int…
Dianeod Sep 23, 2020
124d7f4
cluster update
Dianeod Sep 24, 2020
6 changes: 4 additions & 2 deletions README.md
@@ -16,7 +16,7 @@ The Kx NLP library can be used to answer a variety of questions about unstructur

## ML-Toolkit

-The toolkit contains libraries and scripts that provide kdb+/q users with general-use functions and procedures to perform machine-learning tasks on a wide variety of datasets. This includes utility functions, the FRESH (FeatuRe Extraction and Scalable Hypothesis testing) algorithm, cross validation and grid search procedures, and clustering algorithms.
+The toolkit contains libraries and scripts that provide kdb+/q users with general-use functions and procedures to perform machine-learning tasks on a wide variety of datasets. This includes utility functions, the FRESH (FeatuRe Extraction and Scalable Hypothesis testing) algorithm, cross validation and grid search procedures, clustering algorithms, time series forecasting models and feature engineering functions.

## AutoML

@@ -47,6 +47,8 @@ The contents of the notebooks are as follows:

11. **Clustering**: Examples of how to use the k-means, DBSCAN, affinity propagation, hierarchical and CURE algorithms available within the ML-Toolkit are provided. The notebook demonstrates how to effectively visualize results produced and make use of scoring functions contained within the toolkit. A real-world application is also included.

+12. **Time Series Forecasting**: The notebook looks at a variety of time series forecasting models contained within the ML-Toolkit such as AR, ARIMA and SARIMA models along with time series specific feature engineering tools for passing time series data to supervised machine learning models.
+
## Requirements

- kdb+>=? v3.5 64-bit
@@ -88,4 +90,4 @@ For subsequent runs, you will not be prompted to redo the license setup when cal
docker start -ai mymlnotebooks


-**N.B.** [build instructions for the image are available](docker/README.md)
+**N.B.** [build instructions for the image are available](docker/README.md)
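
The Time Series Forecasting entry in the README diff above names AR, ARIMA and SARIMA models. As a rough illustration of what the simplest of these does, here is a minimal Python sketch of an AR(1) model fitted by ordinary least squares. It is not the ML-Toolkit's implementation or API: `fit_ar1`, `forecast_ar1` and the toy series are hypothetical names and data chosen only for this example.

```python
from statistics import mean


def fit_ar1(series):
    """Fit y[t] = c + phi * y[t-1] by ordinary least squares (hypothetical helper)."""
    if len(series) < 3:
        raise ValueError("need at least three observations")
    x = series[:-1]  # lagged values y[t-1]
    y = series[1:]   # current values y[t]
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("series is constant; phi is not identifiable")
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    phi = sxy / sxx           # slope of y[t] regressed on y[t-1]
    c = y_bar - phi * x_bar   # intercept
    return c, phi


def forecast_ar1(series, c, phi, steps):
    """Roll the fitted recurrence forward `steps` times from the last observation."""
    preds, last = [], series[-1]
    for _ in range(steps):
        last = c + phi * last
        preds.append(last)
    return preds


# Noise-free toy series generated by y[t] = 2 + 0.5 * y[t-1], starting at 0.
toy = [0.0]
for _ in range(9):
    toy.append(2 + 0.5 * toy[-1])

c, phi = fit_ar1(toy)
print(round(c, 6), round(phi, 6))
print(forecast_ar1(toy, c, phi, 3))
```

Because the toy series has no noise, the fit should recover the generating coefficients up to floating-point error. ARIMA and SARIMA extend the same idea with differencing, moving-average terms and seasonal lags, which this sketch does not attempt.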
25,001 changes: 25,001 additions & 0 deletions data/IMBD.csv

Large diffs are not rendered by default.

17,415 changes: 17,415 additions & 0 deletions data/london_merged.csv

Large diffs are not rendered by default.

5 changes: 2 additions & 3 deletions docker/Dockerfile
@@ -15,7 +15,6 @@ FROM jupyterq AS mlnotebooks

COPY requirements.txt README.md /opt/kx/mlnotebooks/
COPY data/ /opt/kx/mlnotebooks/data/
COPY images/ /opt/kx/mlnotebooks/images/
COPY notebooks/ /opt/kx/mlnotebooks/notebooks/
COPY utils/ /opt/kx/mlnotebooks/utils/
#hack, better way, tensorflow-gpu should be used if possible
@@ -65,10 +64,10 @@ USER kx
RUN . /opt/conda/etc/profile.d/conda.sh \
&& conda activate kx \
&& conda install --file /opt/kx/nlp/requirements.txt \
&& conda update wrapt \
&& pip install -r /opt/kx/mlnotebooks/requirements.txt \
&& conda install -c anaconda graphviz \
&& conda install -c conda-forge --file /opt/kx/ml/requirements.txt \
&& pip install pip==9.0.1 \
&& pip install -r /opt/kx/ml/requirements.txt \
&& conda install -c conda-forge --file /opt/kx/automl/requirements.txt \
&& conda clean -y --all \
&& python -m spacy download en \
227 changes: 114 additions & 113 deletions notebooks/01 Decision Trees.ipynb

Large diffs are not rendered by default.

87 changes: 44 additions & 43 deletions notebooks/02 Random Forests.ipynb

Large diffs are not rendered by default.

103 changes: 52 additions & 51 deletions notebooks/03 Neural Networks.ipynb

Large diffs are not rendered by default.

183 changes: 92 additions & 91 deletions notebooks/04 Dimensionality Reduction.ipynb

Large diffs are not rendered by default.

53 changes: 27 additions & 26 deletions notebooks/05 Feature Engineering.ipynb

Large diffs are not rendered by default.

403 changes: 164 additions & 239 deletions notebooks/06 Feature Extraction and Selection.ipynb

Large diffs are not rendered by default.

319 changes: 221 additions & 98 deletions notebooks/07 Cross Validation.ipynb

Large diffs are not rendered by default.

18 changes: 9 additions & 9 deletions notebooks/08 Natural Language Processing.ipynb
@@ -361,17 +361,17 @@
],
"source": [
"/ plot occurence of top terms per chapter\n",
-"plt[`:figure][`figsize pykw 20 10];\n",
+".util.plt[`:figure][`figsize pykw 20 10];\n",
"{a:exec chapter from tab where term=x;\n",
" b:exec occurences from tab where term=x;\n",
-" plt[`:plot][a;b];\n",
+" .util.plt[`:plot][a;b];\n",
" }each key 10#keywords; \n",
"\n",
-"plt[`:title]\"The occurences per chapter of the top 10 keywords\";\n",
-"plt[`:ylabel]\"Occurences\";\n",
-"plt[`:xlabel]\"Chapter\";\n",
-"plt[`:legend][key 10#keywords;`loc pykw\"upper left\"];\n",
-"plt[`:show][];"
+".util.plt[`:title]\"The occurences per chapter of the top 10 keywords\";\n",
+".util.plt[`:ylabel]\"Occurences\";\n",
+".util.plt[`:xlabel]\"Chapter\";\n",
+".util.plt[`:legend][key 10#keywords;`loc pykw\"upper left\"];\n",
+".util.plt[`:show][];"
]
},
{
@@ -1146,7 +1146,7 @@
"source": [
"#This table can then be used to plot a graph. The below example was rendered in Analyst for Kx, where node size represents email volume.\n",
"\n",
-"<img src=\"../images/network.png\" />"
+"<img src=\"images/network.png\" />"
]
},
{
@@ -1466,7 +1466,7 @@
"file_extension": ".q",
"mimetype": "text/x-q",
"name": "q",
-"version": "3.6.0"
+"version": "4.0"
}
},
"nbformat": 4,
41 changes: 21 additions & 20 deletions notebooks/09 K Nearest Neighbours.ipynb

Large diffs are not rendered by default.
