Pandas DataFrame as input #3

Closed
adamov-artem opened this issue Feb 11, 2019 · 1 comment

Comments

@adamov-artem

Hi. Is it possible to run geosketch on a DataFrame object? I need to keep the row and column names of my count matrix.

@brianhie
Owner

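@adamov-artem You can call to_numpy() on the data frame to get the count matrix as a NumPy array, run the PCA/geosketch routine on it, and then use the indices returned by gs() to index into your data frame with iloc[]. Since the result is a row subset of the original data frame, the row and column names are kept.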

For example,

X = data_frame.to_numpy()

# PCA.
from fbpca import pca
U, s, Vt = pca(X, k=100) # E.g., 100 PCs.
X_dimred = U[:, :100] * s[:100]

# Sketch.
from geosketch import gs
N = 20000 # Number of samples to obtain from the data set (at most the number of rows, since replace=False).
sketch_index = gs(X_dimred, N, replace=False)

data_frame_sketch = data_frame.iloc[sketch_index, :]
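To see why the labels survive the round trip, here is a minimal sketch of the same index-mapping pattern on a toy data frame. Everything in it is illustrative rather than part of geosketch: `subset_by_position`, `random_rows`, and the `counts` table are hypothetical names, and `random_rows` is a plain uniform sampler standing in for the PCA + gs() step so the example runs with only NumPy and pandas installed.

```python
import numpy as np
import pandas as pd


def subset_by_position(data_frame, select_rows, n):
    """Return the n rows picked by select_rows, keeping index and column labels.

    select_rows is any callable taking (matrix, n) and returning positional
    row indices, for instance a wrapper around the PCA + gs() steps.
    """
    X = data_frame.to_numpy()             # labels are dropped here
    positions = list(select_rows(X, n))   # positions into the rows of X
    return data_frame.iloc[positions, :]  # labels come back via the data frame


def random_rows(X, n, seed=0):
    # Stand-in only (NOT the geosketch algorithm): uniform sampling without
    # replacement, so the example does not depend on geosketch or fbpca.
    rng = np.random.default_rng(seed)
    return sorted(rng.choice(X.shape[0], size=n, replace=False).tolist())


# Hypothetical toy count matrix: 6 cells (rows) by 3 genes (columns).
counts = pd.DataFrame(
    np.arange(18).reshape(6, 3),
    index=[f"cell_{i}" for i in range(6)],
    columns=["gene_a", "gene_b", "gene_c"],
)

sketch = subset_by_position(counts, random_rows, n=4)
print(sketch.index.tolist())    # row names of the sampled cells
print(sketch.columns.tolist())  # column names, unchanged
```

The key point is that iloc[] indexes by position, which matches the row order of the array returned by to_numpy(), so the positional indices can be applied to the data frame directly regardless of what the row labels are.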
