Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement multiple header_rows #166

Open
deanm0000 opened this issue Feb 1, 2024 · 3 comments
Open

Implement multiple header_rows #166

deanm0000 opened this issue Feb 1, 2024 · 3 comments

Comments

@deanm0000
Copy link
Contributor

I find that people will often have Excel sheets where they use multiple header rows to make up their column names.

Here's a snippet of how I deal with that coming from a df generated with python_calamine

header_merge_char="_" # for example
header_rows = 3 # again, for example
df.columns=[
    header_merge_char.join([y for y in x if y != ""])
    for x in zip(*[df.rows()[x] for x in range(header_rows)])
]

this is just a snippet and doesn't handle duplicate column names but that's a separate issue.

A more advanced version of this might infer the header_rows by skipping down (let's say) 10 rows and look for types starting there. Then choose a column which isn't a string and go back to the true row=0 and see how many rows down it needs to go before it no longer sees strings. Then that's the inferred header_rows

@deanm0000
Copy link
Contributor Author

In case it helps, I made this function using python-calamine which does header_row detecting

@PrettyWood
Copy link
Member

Hello @deanm0000
@lukapeschke and I are considering adding this to v0.12.0
Do you have a MRE with an excel file + expected output?
Thanks

@deanm0000
Copy link
Contributor Author

example.xlsx

Here's an attached excel file.

I'd expect the output to be (assuming polars)

pl.DataFrame({
    'index':[1,2,3,4,5,6],
    'Red_apple':[3,10,3,5,0,7],
    'Red_banana':[2,4,0,5,4,6],
    'Red_carrot':[7,9,10,5,1,5,],
    'Blue_apple':[6,4,0,3,2,5],
    'Blue_banana':[8,1,7,0,10,4],
    'Blue_carrot':[0,10,5,8,1,10]
})
shape: (6, 7)
┌───────┬───────────┬────────────┬────────────┬────────────┬─────────────┬─────────────┐
│ index ┆ Red_apple ┆ Red_banana ┆ Red_carrot ┆ Blue_apple ┆ Blue_banana ┆ Blue_carrot │
│ ---   ┆ ---       ┆ ---        ┆ ---        ┆ ---        ┆ ---         ┆ ---         │
│ i64   ┆ i64       ┆ i64        ┆ i64        ┆ i64        ┆ i64         ┆ i64         │
╞═══════╪═══════════╪════════════╪════════════╪════════════╪═════════════╪═════════════╡
│ 1     ┆ 3         ┆ 2          ┆ 7          ┆ 6          ┆ 8           ┆ 0           │
│ 2     ┆ 10        ┆ 4          ┆ 9          ┆ 4          ┆ 1           ┆ 10          │
│ 3     ┆ 3         ┆ 0          ┆ 10         ┆ 0          ┆ 7           ┆ 5           │
│ 4     ┆ 5         ┆ 5          ┆ 5          ┆ 3          ┆ 0           ┆ 8           │
│ 5     ┆ 0         ┆ 4          ┆ 1          ┆ 2          ┆ 10          ┆ 1           │
│ 6     ┆ 7         ┆ 6          ┆ 5          ┆ 5          ┆ 4           ┆ 10          │

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants