Reporting ptype outputs via schemas #62

tahaceritli · 2020-09-10T17:30:58Z

tahaceritli · 2020-09-10T20:50:30Z

I have made some progress on this. Please see #64.

rolyp · 2020-09-14T10:41:57Z

@tahaceritli If you add new issues to the Project (drop-down menu on the right) they’ll appear in the kanban board!

tahaceritli · 2020-09-14T14:18:48Z

Thanks for pointing these out.

1. Now that we are switching to fit_schema, transform_schema and fit_transform_schema, I don't think we will use get_final_df anymore.
2. A schema is just a dictionary that exposes some of the properties of Column. I'm open to alternative solutions.
3. Yes. I had copied the one in test_ptype to the Ptype class so that it would be easier to reach in the notebook. The other version of as_normal in Ptype just takes different parameters: it takes the schema and obtains the normal values according to the schema. I will now remove the other one in Ptype.
4. Now that we are presenting information in schemas, I think we won't need to put the data type inside the header. That's why it's not used in the notebooks for now, though we may need it in the future.
5. If we follow the sklearn notation (e.g., https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.StandardScaler.html), it should take only df as input. But then the interaction with the user would have to go through setters and getters, and we would need to store the schema internally. If we want to implement the interactions as Gerrit suggested earlier, we would also need to treat the schema as an input. I think this would also be easier for the user, but perhaps that's just me. I'm also happy to follow the standard sklearn notation.
6. Btw, I will soon add fit_transform_schema, which takes only a df. It will infer the corresponding schema using fit_schema and then create a new data frame with the changes.
7. Thanks. That's true. In fact, the data frame was updated because of "pd.to_numeric(df[col_name], errors="coerce").astype(new_dtype)". It should be okay now, but there may be another way of doing this.

Let me know what you think. I will try to sort out item 6 and push my changes in the meantime.
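
For reference, a minimal sketch of one way to keep the cast from touching the caller's data frame: copy first, then apply the pd.to_numeric expression quoted above to the copy. The helper name cast_column and the example column are assumptions made for illustration; this is not the code in ptype.

```python
import pandas as pd


def cast_column(df, col_name, new_dtype):
    """Return a new data frame with one column cast; the input is not modified.

    Hypothetical helper for illustration only, not part of ptype.
    """
    out = df.copy()  # work on a copy so the caller's data frame stays as it was
    # Values that cannot be parsed as numbers become NaN because of errors="coerce".
    out[col_name] = pd.to_numeric(out[col_name], errors="coerce").astype(new_dtype)
    return out


df = pd.DataFrame({"age": ["31", "42", "n/a"]})
new_df = cast_column(df, "age", "float64")
```

Here df still holds the original strings, while new_df holds floats with a missing value where "n/a" was.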

GjjvdBurg · 2020-09-14T14:47:57Z

Minor comment regarding no. 5: This reminds me of an earlier discussion (original message in #11) that this fits in the scikit-learn "transformer" idea: you fit the schema with, say, ptype.fit(df), then cast the types using new_df = ptype.transform(df). This can be combined in new_df = ptype.fit_transform(df), and you could then cast a second dataset using the same schema with ptype.transform(df2).
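
The fit / transform / fit_transform flow described above could look roughly like the sketch below. The class name SchemaCaster, the schema_ attribute, and the numeric-only inference rule are all assumptions made to keep the example self-contained; ptype's own inference would replace the placeholder in fit, and this is not the actual ptype API.

```python
import pandas as pd


class SchemaCaster:
    """Hypothetical stand-in for a ptype-like transformer (illustration only)."""

    def fit(self, df):
        # Placeholder inference: a column is "float64" if every value parses
        # as a number, otherwise "object". Real type inference would go here.
        self.schema_ = {}
        for col in df.columns:
            parsed = pd.to_numeric(df[col], errors="coerce")
            self.schema_[col] = "float64" if parsed.notna().all() else "object"
        return self

    def transform(self, df):
        # Cast a copy according to the stored schema; the input stays untouched.
        out = df.copy()
        for col, dtype in self.schema_.items():
            if dtype == "float64":
                out[col] = pd.to_numeric(out[col], errors="coerce").astype(dtype)
        return out

    def fit_transform(self, df):
        # Same as calling fit(df) and then transform(df).
        return self.fit(df).transform(df)


df = pd.DataFrame({"x": ["1", "2"], "name": ["a", "b"]})
df2 = pd.DataFrame({"x": ["3", "oops"], "name": ["c", "d"]})

caster = SchemaCaster()
new_df = caster.fit_transform(df)   # infer the schema from df and cast df
new_df2 = caster.transform(df2)     # reuse the same schema on a second dataset
```

Storing the schema on the object is what lets transform be applied to df2 without inferring again; passing the schema in explicitly, as discussed earlier in the thread, would be the alternative design.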

tahaceritli added type:feature task:core-api labels Sep 10, 2020

tahaceritli changed the title ~~Producing schemas~~ Reporting ptype outputs Sep 10, 2020

tahaceritli changed the title ~~Reporting ptype outputs~~ Reporting ptype outputs via schemas Sep 14, 2020

GjjvdBurg mentioned this issue Sep 14, 2020

Downcast column to inferred Pandas data type #37

Closed

2 tasks

tahaceritli mentioned this issue Sep 14, 2020

removed get_final_df and made the functions pure #65

Merged

rolyp closed this as completed Sep 22, 2020

tahaceritli mentioned this issue Sep 23, 2020

Specify treatment of categorical value #78

Open

6 tasks
