Q: Re Python Script widget - pandas dataframe to out_data #2932

Closed
dsanalytics opened this issue Mar 2, 2018 · 11 comments

Comments

@dsanalytics

Is there a code example of converting a pandas DataFrame to out_data? What I'm trying to accomplish is to get data from MySQL using the Python Script widget and then pass it to e.g. Data Sampler, Impute, etc.

Thank you.

@kernc
Contributor

kernc commented Mar 2, 2018

With a recent version, this should mostly work:

from Orange.data.pandas_compat import table_from_frame

table = table_from_frame(df)
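
For the database-to-out_data flow asked about in the question, a minimal sketch might look like the following. It is an illustration under stated assumptions, not a tested recipe for MySQL: the standard library's sqlite3 stands in for the database so the snippet runs anywhere, the table `measurements` and its rows are made up, and `load_frame` and `frame_to_out_data` are hypothetical helper names. With MySQL you would pass your own connection object to `pd.read_sql_query` instead.

```python
import sqlite3

import pandas as pd


def load_frame(conn):
    """Run a query and return the result as a pandas DataFrame."""
    return pd.read_sql_query(
        "SELECT name, height, weight FROM measurements", conn
    )


def frame_to_out_data(df):
    """Convert a DataFrame to an Orange Table (requires Orange)."""
    # Imported lazily so the pandas part can be tried without Orange.
    from Orange.data.pandas_compat import table_from_frame
    return table_from_frame(df)


# Stand-in database: an in-memory SQLite table with made-up rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE measurements (name TEXT, height REAL, weight REAL)")
conn.executemany(
    "INSERT INTO measurements VALUES (?, ?, ?)",
    [("a", 1.70, 65.0), ("b", 1.82, 80.5), ("c", 1.65, 58.2)],
)
df = load_frame(conn)
conn.close()

# Inside the Python Script widget, the last step would be:
# out_data = frame_to_out_data(df)
```

The widget passes whatever is assigned to `out_data` on to the connected widgets, so the converted table has to end up under that name.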

@dsanalytics
Author

@kernc Where does one set continuous/discrete and feature/meta/class?
Also, what about an index column that may be e.g. a datetime or GUID? How is that converted in out_data?
If your code indeed suffices, would the last line be out_data = table?

@kernc
Contributor

kernc commented Mar 2, 2018

You prepare the columns beforehand on the frame. pd.Categorical or string columns (df.Column.astype(str)) are interpreted as discrete. String columns that aren't interpreted as discrete (if force_nominal=False) are put into metas automatically. Datetime columns are interpreted as TimeVariable. Any index besides a simple range index is converted to a column.
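
The column preparation described above could be sketched roughly like this. The data, column names, and the `to_table` helper are made up for illustration, and how each column ends up typed depends on the installed Orange version, so treat the comments as expectations drawn from the description above rather than guarantees.

```python
import pandas as pd

# Made-up data: a GUID-like index, a numeric column, a label column,
# and a column of date strings.
df = pd.DataFrame(
    {
        "age": [34, 51, 28],
        "group": ["a", "b", "a"],
        "visited": ["2018-03-01", "2018-03-02", "2018-03-05"],
    },
    index=pd.Index(["id-001", "id-002", "id-003"], name="guid"),
)

# Mark the label column as categorical so it can be read as discrete.
df["group"] = df["group"].astype("category")

# Parse the date strings so the column has a datetime dtype.
df["visited"] = pd.to_datetime(df["visited"])

# Optional: turn the index into an ordinary column explicitly.
# Per the description above, a non-range index is converted anyway.
df = df.reset_index()


def to_table(frame):
    """Convert the prepared frame to an Orange Table (requires Orange)."""
    # Imported lazily so the pandas part can be tried without Orange.
    from Orange.data.pandas_compat import table_from_frame
    return table_from_frame(frame)


# In the Python Script widget:
# out_data = to_table(df)
```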

@dsanalytics
Author

@kernc And that's exactly why I asked for a full working example from the community, instead of incomplete two-liners with comments like, you need to do X first, followed by Y, and perhaps Z.

P.S. Why don't you change your problematic avatar - imagine one browsing this post at work and a manager passing by. How's your avatar beneficial to Orange?

@kernc
Contributor

kernc commented Mar 2, 2018

Beg your pardon? table = table_from_frame(df) is a full, working example. "It just works!" The function is used in one widget in Prototypes and in Timeseries. Unfortunately, no docs other than the docstring exist for the moment. Always welcome to contrib a more helpful example. 😄

@duohappy

duohappy commented Apr 2, 2018

@kernc, I cannot find the table_from_frame function in the help documentation, https://docs.orange.biolab.si/3/data-mining-library/reference/data.html. I found it here instead.

@ajdapretnar
Contributor

It should be imported as:
from Orange.data.pandas_compat import table_from_frame

@ajdapretnar
Contributor

For me, this works. @dsanalytics If you think something else should be added, please provide a detailed description.

@pchristian4481

table_from_frame does work, but the output, when connected to a Data Table widget, does not have the column names. How can I include them? I am using in_data to create something different with different column names (feature names). How can I include these new feature names? Thanks for the help.

@pmirla

pmirla commented Apr 17, 2020

Same question as pchristian4481: how do I retain column names when I link this downstream?
My code:
colnames = [i.name for i in in_data.domain]
df = Y_df.set_axis(colnames, axis=1, inplace=False)
table = table_from_frame(Y_df)
out_data = table
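
One possible reason the names are lost in the snippet above: the renamed frame is assigned to `df`, but the unrenamed `Y_df` is what gets passed to `table_from_frame`. A small pandas-only sketch of the difference, using made-up column names and values in place of the real data (recent pandas releases also no longer accept the `inplace` argument for `set_axis`, so it is left out here):

```python
import pandas as pd

# Made-up stand-ins for the names used in the snippet above.
colnames = ["sepal length", "sepal width", "petal length"]
Y_df = pd.DataFrame([[5.1, 3.5, 1.4], [4.9, 3.0, 1.4]])

# set_axis returns a new frame; Y_df itself keeps its labels 0, 1, 2.
df = Y_df.set_axis(colnames, axis=1)

print(list(Y_df.columns))
print(list(df.columns))

# So the renamed frame, not Y_df, is the one to convert:
# out_data = table_from_frame(df)
```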

@pmirla

pmirla commented Apr 17, 2020

This works. Might help others

import pandas as pd
from Orange.data import Domain, Table

# in_data is provided by the Python Script widget.
colnames = [attr.name for attr in in_data.domain.attributes]

# Work on the features as a DataFrame, keeping the original feature names.
df = pd.DataFrame(in_data.X, columns=colnames)

# Keep continuous attributes and discrete ones with at most 5 values.
kept = [attr for attr in in_data.domain.attributes
        if attr.is_continuous or len(attr.values) <= 5]
domain = Domain(kept, in_data.domain.class_vars)

# Select the matching columns so the data agrees with the new domain.
X = df[[attr.name for attr in kept]].to_numpy()
out_data = Table.from_numpy(domain, X, in_data.Y)
