
Feature/accuracy #73

Closed
wants to merge 5 commits into from

Conversation

Rambatino
Owner

This adds an accuracy method to Tree.

fixes #72
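The accuracy method itself isn't visible in this excerpt. As a hypothetical sketch (the name `accuracy` and its inputs are assumptions, not the PR's actual code), such a method would compare the tree's per-row terminal-node predictions against the observed dependent variable:

```python
import numpy as np

def accuracy(predictions, observed):
    """Fraction of rows whose predicted dependent-variable value
    matches the observed value (hypothetical helper, not the PR's code)."""
    predictions = np.asarray(predictions)
    observed = np.asarray(observed)
    # element-wise equality, then the mean of the boolean mask
    return float((predictions == observed).mean())
```

For example, `accuracy([1, 0, 1, 1], [1, 1, 1, 1])` returns `0.75`.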

@codecov

codecov bot commented Jul 4, 2017

Codecov Report

Merging #73 into master will increase coverage by 0.25%.
The diff coverage is 97.77%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master      #73      +/-   ##
==========================================
+ Coverage    92.6%   92.85%   +0.25%     
==========================================
  Files           7        7              
  Lines         500      532      +32     
==========================================
+ Hits          463      494      +31     
- Misses         37       38       +1
Impacted Files    Coverage          Δ
CHAID/node.py     89.47% <100%>     (-1.84%) ⬇️
CHAID/tree.py     91.72% <100%>     (+1.63%) ⬆️
CHAID/column.py   89.3%  <94.44%>   (+0.65%) ⬆️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 9acedff...0e53de8. Read the comment docs.

@@ -111,13 +113,14 @@ def from_pandas_df(df, i_variables, d_variable, alpha_merge=0.05, max_depth=2,
the type of dependent variable. Supported variable types are 'categorical' or
'continuous'
"""
ind_df = df[list(i_variables.keys())]
df_ordered_keys = [x for x in df.columns if x in i_variables.keys()]
Owner Author
This is kinda gross. Will need to discuss this with you @xulaus; it's the logic that seems best in this situation, but it isn't foolproof. Actually, I don't think this can ever go into prod.

ind_values = ind_df.values
dep_values = df[d_variable].values
weights = df[weight] if weight is not None else None
return Tree(ind_values, dep_values, alpha_merge, max_depth, min_parent_node_size,
min_child_node_size, list(ind_df.columns.values), split_threshold, weights,
list(i_variables.values()), dep_variable_type)
[i_variables[key] for key in df_ordered_keys], dep_variable_type)
Owner Author
same here ☝️
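The list comprehension in the diff reorders `i_variables` to match the DataFrame's column order, since plain `dict` key order was not guaranteed before Python 3.7. A standalone sketch of the same idea, with assumed sample data:

```python
import pandas as pd

df = pd.DataFrame({'a': [1], 'b': [2], 'c': [3]})
# insertion order of the dict may differ from the DataFrame's column order
i_variables = {'c': 'nominal', 'a': 'ordinal'}

# keep only the independent variables, in the order they appear in df
df_ordered_keys = [col for col in df.columns if col in i_variables]
variable_types = [i_variables[key] for key in df_ordered_keys]
# df_ordered_keys == ['a', 'c'], variable_types == ['ordinal', 'nominal']
```

Note that columns of `df` not listed in `i_variables` (here `'b'`) are silently dropped, which is part of why this ordering trick is fragile.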

else:
tree_predictions = tree_predictions.append(pd.DataFrame([[node.node_id, node.predict]] * len(index), index=index))
tree_predictions.columns = ['node_id', 'prediction']
# need to retroactively fill missing values
Owner Author
The dataset may not contain all the other independent variables at that node, and thus those rows will be missed off the node.
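The snippet above builds per-node prediction rows and then back-fills rows no node covered. `DataFrame.append` was removed in pandas 2.0; an equivalent sketch using `pd.concat` and `reindex` (the sample node data here is hypothetical):

```python
import pandas as pd

# (node_id, prediction, row index covered by that node) -- made-up example data
frames = []
for node_id, prediction, index in [(1, 'yes', [0, 2]), (2, 'no', [1])]:
    frames.append(pd.DataFrame([[node_id, prediction]] * len(index), index=index))

tree_predictions = pd.concat(frames)
tree_predictions.columns = ['node_id', 'prediction']
# retroactively fill in rows missing from every node's index (row 3 here)
tree_predictions = tree_predictions.reindex(range(4))
```

Rows absent from every node come back as `NaN`, making the gaps explicit rather than silently dropped.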

@Rambatino Rambatino mentioned this pull request Mar 2, 2018
@Rambatino Rambatino closed this Sep 20, 2018
@Rambatino Rambatino deleted the feature/accuracy branch September 20, 2018 09:53
Development

Successfully merging this pull request may close these issues.

Why isn't there a predict function ?