Commit

move data science tools wvu https://github.com/WinVector/wvu
JohnMount committed Aug 28, 2022
1 parent 141a1e8 commit f55ebc5
Showing 37 changed files with 1,434 additions and 7,560 deletions.
436 changes: 0 additions & 436 deletions README.ipynb

This file was deleted.

255 changes: 3 additions & 252 deletions README.md
@@ -1,258 +1,9 @@
[wvpy](https://github.com/WinVector/wvpy) is a simple
set of utilities for teaching data science and machine learning methods.
They are not replacements for the obvious methods in sklearn.

Some notes on the Jupyter sheet runner can be found [here](https://win-vector.com/2022/08/20/an-effective-personal-jupyter-data-science-workflow/)
wvpy tools for converting Jupyter notebooks to and from Python files.

Text and video tutorials here: [https://win-vector.com/2022/08/20/an-effective-personal-jupyter-data-science-workflow/](https://win-vector.com/2022/08/20/an-effective-personal-jupyter-data-science-workflow/).

```python
import numpy.random
import pandas
import wvpy.util
Many of the data science functions have been moved to wvu [https://github.com/WinVector/wvu](https://github.com/WinVector/wvu).

wvpy.__version__
```




'0.2.7'



Illustration of cross-method plan.


```python
wvpy.util.mk_cross_plan(10,2)
```




[{'train': [1, 2, 3, 4, 9], 'test': [0, 5, 6, 7, 8]},
{'train': [0, 5, 6, 7, 8], 'test': [1, 2, 3, 4, 9]}]
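
The plan above splits the ten row indices into two complementary folds, so each row is used for testing exactly once. A minimal sketch of how a plan with that shape could be built, assuming a simple random fold assignment; `make_cross_plan_sketch` is a hypothetical name, and this is not claimed to be how `wvpy.util.mk_cross_plan` is implemented:

```python
import numpy as np


def make_cross_plan_sketch(n_rows, k_folds, seed=None):
    """Hypothetical helper: build a list of train/test index splits."""
    rng = np.random.default_rng(seed)
    # Assign fold labels 0..k_folds-1 as evenly as possible, then shuffle
    # so that fold membership is random.
    fold_ids = np.arange(n_rows) % k_folds
    rng.shuffle(fold_ids)
    plan = []
    for fold in range(k_folds):
        # Rows labeled with this fold are held out; all others are training rows.
        test = [int(i) for i in np.flatnonzero(fold_ids == fold)]
        train = [int(i) for i in np.flatnonzero(fold_ids != fold)]
        plan.append({"train": train, "test": test})
    return plan


plan = make_cross_plan_sketch(10, 2, seed=0)
```

Each element of `plan` can then be used to fit a model on the `train` rows and score it on the `test` rows.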



Plotting example


```python
help(wvpy.util.plot_roc)
```

Help on function plot_roc in module wvpy.util:

plot_roc(prediction, istrue, title='Receiver operating characteristic plot', *, truth_target=True, ideal_line_color=None, extra_points=None, show=True)
Plot a ROC curve of numeric prediction against boolean istrue.
:param prediction: column of numeric predictions
:param istrue: column of items to predict
:param title: plot title
:param truth_target: value to consider target or true.
:param ideal_line_color: if not None, color of ideal line
:param extra_points: data frame of additional point to annotate graph, columns fpr, tpr, label
:param show: logical, if True call matplotlib.pyplot.show()
:return: calculated area under the curve, plot produced by call.
Example:
import pandas
import wvpy.util
d = pandas.DataFrame({
'x': [1, 2, 3, 4, 5],
'y': [False, False, True, True, False]
})
wvpy.util.plot_roc(
prediction=d['x'],
istrue=d['y'],
ideal_line_color='lightgrey'
)
wvpy.util.plot_roc(
prediction=d['x'],
istrue=d['y'],
extra_points=pandas.DataFrame({
'tpr': [0, 1],
'fpr': [0, 1],
'label': ['AAA', 'BBB']
})
)




```python
d = pandas.concat([
pandas.DataFrame({
'x': numpy.random.normal(size=1000),
'y': numpy.random.choice([True, False],
p=(0.02, 0.98),
size=1000,
replace=True)}),
pandas.DataFrame({
'x': numpy.random.normal(size=200) + 5,
'y': numpy.random.choice([True, False],
size=200,
replace=True)}),
])
```


```python
wvpy.util.plot_roc(
prediction=d.x,
istrue=d.y,
ideal_line_color="DarkGrey",
title='Example ROC plot')
```


<Figure size 432x288 with 0 Axes>




![png](output_7_1.png)






0.903298366883511
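
The number printed above is the area under the curve returned by `plot_roc`. As a rough cross-check, the same kind of quantity can be computed from the pairwise-ranking definition of AUC: the fraction of (positive, negative) pairs in which the positive example gets the higher score, with ties counted as half. The sketch below assumes that definition; `auc_sketch` is a hypothetical name, not a wvpy function:

```python
import numpy as np


def auc_sketch(prediction, istrue):
    """Hypothetical helper: AUC as the fraction of correctly ordered pairs."""
    prediction = np.asarray(prediction, dtype=float)
    istrue = np.asarray(istrue, dtype=bool)
    pos = prediction[istrue]
    neg = prediction[~istrue]
    # Compare every positive score with every negative score.
    diff = pos[:, None] - neg[None, :]
    wins = np.sum(diff > 0) + 0.5 * np.sum(diff == 0)
    return float(wins / (len(pos) * len(neg)))


# Same small data set as in the plot_roc help text.
auc = auc_sketch([1, 2, 3, 4, 5], [False, False, True, True, False])
```

The pairwise comparison uses memory proportional to the number of positives times the number of negatives, so this form is only suitable for small illustrations.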




```python
help(wvpy.util.threshold_plot)
```

Help on function threshold_plot in module wvpy.util:

threshold_plot(d: pandas.core.frame.DataFrame, pred_var, truth_var, truth_target=True, threshold_range=(-inf, inf), plotvars=('precision', 'recall'), title='Measures as a function of threshold', *, show=True)
Produce multiple facet plot relating the performance of using a threshold greater than or equal to
different values at predicting a truth target.
:param d: pandas.DataFrame to plot
:param pred_var: name of column of numeric predictions
:param truth_var: name of column with reference truth
:param truth_target: value considered true
:param threshold_range: x-axis range to plot
:param plotvars: list of metrics to plot, must come from ['threshold', 'count', 'fraction', 'precision',
'true_positive_rate', 'false_positive_rate', 'true_negative_rate', 'false_negative_rate',
'recall', 'sensitivity', 'specificity']
:param title: title for plot
:param show: logical, if True call matplotlib.pyplot.show()
:return: None, plot produced as a side effect
Example:
import pandas
import wvpy.util
d = pandas.DataFrame({
'x': [1, 2, 3, 4, 5],
'y': [False, False, True, True, False]
})
wvpy.util.threshold_plot(
d,
pred_var='x',
truth_var='y',
plotvars=("sensitivity", "specificity"),
)
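
The help text describes scoring the rule "prediction greater than or equal to a threshold" against a truth target. A minimal sketch of that calculation for a single threshold, assuming the usual definitions of precision and recall; `threshold_stats_sketch` is a hypothetical name and is not part of wvpy:

```python
import pandas


def threshold_stats_sketch(d, pred_var, truth_var, threshold, truth_target=True):
    """Hypothetical helper: precision and recall of `d[pred_var] >= threshold`."""
    predicted_positive = d[pred_var] >= threshold
    actual_positive = d[truth_var] == truth_target
    n_true_positive = int((predicted_positive & actual_positive).sum())
    n_predicted_positive = int(predicted_positive.sum())
    n_actual_positive = int(actual_positive.sum())
    # Guard against empty denominators rather than raising.
    precision = (
        n_true_positive / n_predicted_positive
        if n_predicted_positive > 0 else float("nan")
    )
    recall = (
        n_true_positive / n_actual_positive
        if n_actual_positive > 0 else float("nan")
    )
    return {"threshold": threshold, "precision": precision, "recall": recall}


d_small = pandas.DataFrame({
    "x": [1, 2, 3, 4, 5],
    "y": [False, False, True, True, False],
})
stats = threshold_stats_sketch(d_small, pred_var="x", truth_var="y", threshold=3)
```

Repeating this over a grid of thresholds gives the kind of curve that a threshold plot displays.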




```python
wvpy.util.threshold_plot(
d,
pred_var='x',
truth_var='y',
plotvars=("sensitivity", "specificity"),
title = "example plot"
)
```



![png](output_9_0.png)




```python

wvpy.util.threshold_plot(
d,
pred_var='x',
truth_var='y',
plotvars=("precision", "recall"),
title = "example plot"
)
```



![png](output_10_0.png)




```python
help(wvpy.util.gain_curve_plot)
```

Help on function gain_curve_plot in module wvpy.util:

gain_curve_plot(prediction, outcome, title='Gain curve plot', *, show=True)
plot cumulative outcome as a function of prediction order (descending)
:param prediction: vector of numeric predictions
:param outcome: vector of actual values
:param title: plot title
:param show: logical, if True call matplotlib.pyplot.show()
:return: None
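
The help text describes the curve as cumulative outcome as a function of prediction order (descending). A sketch of the points such a curve could be built from, assuming both axes are normalized to fractions and that the outcomes sum to a non-zero total; `gain_curve_points_sketch` is a hypothetical name, not the wvpy implementation:

```python
import numpy as np


def gain_curve_points_sketch(prediction, outcome):
    """Hypothetical helper: x/y points of a normalized cumulative gain curve."""
    prediction = np.asarray(prediction, dtype=float)
    outcome = np.asarray(outcome, dtype=float)
    # Sort rows from highest to lowest prediction.
    order = np.argsort(-prediction, kind="stable")
    cumulative = np.cumsum(outcome[order])
    fraction_items = np.arange(1, len(order) + 1) / len(order)
    # Assumes the total outcome is non-zero.
    fraction_outcome = cumulative / cumulative[-1]
    return fraction_items, fraction_outcome


frac_items, frac_outcome = gain_curve_points_sketch(
    [1, 2, 3, 4, 5],
    [False, False, True, True, False],
)
```

A curve that rises quickly means the highest-scored rows capture most of the outcome.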




```python
wvpy.util.gain_curve_plot(
prediction=d['x'],
outcome=d['y'],
title = "gain curve plot"
)
```



![png](output_12_0.png)




```python
wvpy.util.lift_curve_plot(
prediction=d['x'],
outcome=d['y'],
title = "lift curve plot"
)
```



![png](output_13_0.png)




```python

```
26 changes: 6 additions & 20 deletions coverage.txt
@@ -1,34 +1,20 @@
============================= test session starts ==============================
platform darwin -- Python 3.9.7, pytest-6.2.5, py-1.11.0, pluggy-1.0.0
platform darwin -- Python 3.9.12, pytest-7.1.1, pluggy-1.0.0
rootdir: /Users/johnmount/Documents/work/wvpy/pkg
plugins: anyio-3.5.0, cov-3.0.0
collected 20 items
collected 4 items

tests/test_cross_plan1.py . [ 5%]
tests/test_cross_predict.py .. [ 15%]
tests/test_deviance_calc.py . [ 20%]
tests/test_eval_fn_pre_row.py . [ 25%]
tests/test_match_auc.py . [ 30%]
tests/test_nb_fns.py .... [ 50%]
tests/test_onehot.py .. [ 60%]
tests/test_perm_score_vars.py . [ 65%]
tests/test_plots.py . [ 70%]
tests/test_se.py . [ 75%]
tests/test_search_grid.py .. [ 85%]
tests/test_stats1.py . [ 90%]
tests/test_threshold_stats.py . [ 95%]
tests/test_typs_in_frame.py . [100%]
tests/test_nb_fns.py .... [100%]

---------- coverage: platform darwin, python 3.9.7-final-0 -----------
---------- coverage: platform darwin, python 3.9.12-final-0 ----------
Name Stmts Miss Cover
---------------------------------------------
wvpy/__init__.py 3 0 100%
wvpy/jtools.py 206 76 63%
wvpy/pysheet.py 99 99 0%
wvpy/render_workbook.py 54 54 0%
wvpy/util.py 321 7 98%
---------------------------------------------
TOTAL 683 236 65%
TOTAL 362 229 37%


============================= 20 passed in 12.71s ==============================
============================== 4 passed in 8.92s ===============================
Binary file removed output_10_0.png
Binary file not shown.
Binary file removed output_12_0.png
Binary file not shown.
Binary file removed output_13_0.png
Binary file not shown.
Binary file removed output_7_1.png
Binary file not shown.
Binary file removed output_9_0.png
Binary file not shown.
4 changes: 3 additions & 1 deletion pkg/README.txt
@@ -1,7 +1,9 @@
Win Vector LLC tools for doing and teaching data science in Python 3
Win Vector LLC tools for converting Python Jupyter notebooks to and from Python source files
https://github.com/WinVector/wvpy

Some notes can be found here: https://github.com/WinVector/wvpy
and here: https://win-vector.com/2022/08/20/an-effective-personal-jupyter-data-science-workflow/

Many of the data science functions have been moved to wvu https://github.com/WinVector/wvu

