
Feature/accuracy #73

Closed
wants to merge 5 commits into from

Conversation

Rambatino
Owner

This adds an accuracy method to Tree.

fixes #72
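The accuracy method itself isn't visible in this excerpt. As a hypothetical sketch (the name `accuracy` and its inputs are assumptions, not the PR's actual code), such a method would compare the tree's per-row terminal-node predictions against the observed dependent variable:

```python
import numpy as np

def accuracy(predictions, observed):
    """Fraction of rows whose predicted dependent-variable value
    matches the observed value (hypothetical helper, not the PR's code)."""
    predictions = np.asarray(predictions)
    observed = np.asarray(observed)
    # element-wise equality, then the mean of the boolean mask
    return float((predictions == observed).mean())
```

For example, `accuracy([1, 0, 1, 1], [1, 1, 1, 1])` returns `0.75`.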

@codecov

codecov bot commented Jul 4, 2017

Codecov Report

Merging #73 into master will increase coverage by 0.25%.
The diff coverage is 97.77%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master      #73      +/-   ##
==========================================
+ Coverage    92.6%   92.85%   +0.25%     
==========================================
  Files           7        7              
  Lines         500      532      +32     
==========================================
+ Hits          463      494      +31     
- Misses         37       38       +1
Impacted Files    Coverage          Δ
CHAID/node.py     89.47% <100%>     (-1.84%) ⬇️
CHAID/tree.py     91.72% <100%>     (+1.63%) ⬆️
CHAID/column.py   89.3%  <94.44%>   (+0.65%) ⬆️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 9acedff...0e53de8. Read the comment docs.

@@ -111,13 +113,14 @@ def from_pandas_df(df, i_variables, d_variable, alpha_merge=0.05, max_depth=2,
the type of dependent variable. Supported variable types are 'categorical' or
'continuous'
"""
ind_df = df[list(i_variables.keys())]
df_ordered_keys = [x for x in df.columns if x in i_variables.keys()]
Owner Author
This is kinda gross. Will need to discuss this with you @xulaus; it's the logic that seems best in this situation, but it isn't foolproof. Actually, I don't think this can ever go into prod.

ind_values = ind_df.values
dep_values = df[d_variable].values
weights = df[weight] if weight is not None else None
return Tree(ind_values, dep_values, alpha_merge, max_depth, min_parent_node_size,
min_child_node_size, list(ind_df.columns.values), split_threshold, weights,
list(i_variables.values()), dep_variable_type)
[i_variables[key] for key in df_ordered_keys], dep_variable_type)
Owner Author
same here ☝️
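The list comprehension in the diff reorders `i_variables` to match the DataFrame's column order, since plain `dict` key order was not guaranteed before Python 3.7. A standalone sketch of the same idea, with assumed sample data:

```python
import pandas as pd

df = pd.DataFrame({'a': [1], 'b': [2], 'c': [3]})
# insertion order of the dict may differ from the DataFrame's column order
i_variables = {'c': 'nominal', 'a': 'ordinal'}

# keep only the independent variables, in the order they appear in df
df_ordered_keys = [col for col in df.columns if col in i_variables]
variable_types = [i_variables[key] for key in df_ordered_keys]
# df_ordered_keys == ['a', 'c'], variable_types == ['ordinal', 'nominal']
```

Note that columns of `df` not listed in `i_variables` (here `'b'`) are silently dropped, which is part of why this ordering trick is fragile.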

else:
tree_predictions = tree_predictions.append(pd.DataFrame([[node.node_id, node.predict]] * len(index), index=index))
tree_predictions.columns = ['node_id', 'prediction']
# need to retroactively fill missing values
Owner Author
The dataset may not contain all the other independent variables at that node, and thus those rows will be missed off the node.
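The snippet above builds per-node prediction rows and then back-fills rows no node covered. `DataFrame.append` was removed in pandas 2.0; an equivalent sketch using `pd.concat` and `reindex` (the sample node data here is hypothetical):

```python
import pandas as pd

# (node_id, prediction, row index covered by that node) -- made-up example data
frames = []
for node_id, prediction, index in [(1, 'yes', [0, 2]), (2, 'no', [1])]:
    frames.append(pd.DataFrame([[node_id, prediction]] * len(index), index=index))

tree_predictions = pd.concat(frames)
tree_predictions.columns = ['node_id', 'prediction']
# retroactively fill in rows missing from every node's index (row 3 here)
tree_predictions = tree_predictions.reindex(range(4))
```

Rows absent from every node come back as `NaN`, making the gaps explicit rather than silently dropped.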

@Rambatino Rambatino mentioned this pull request Mar 2, 2018
@Rambatino Rambatino closed this Sep 20, 2018
@Rambatino Rambatino deleted the feature/accuracy branch September 20, 2018 09:53
Development

Successfully merging this pull request may close these issues.

Why isn't there a predict function ?