Unwanted row number column appears in csv / excel while scheduling in email. #22981

rohitpawar2811 · 2023-02-03T13:55:28Z

Hello

I am scheduling a chart(in table form) in email as csv form , I am getting an extra column which has a count from 0 to n-row .
Which I should not want.(And I think no one wants in CSV because we get numbering in csv opening platforms)

Like :

Findings :

When I directly download csv from chart I didn't getting that extra one column
And one scenario : In the case of Email it avoids the extra column add in csv only if the chart is not in table form.
Exactly I would say when the chart is in the form of bar/any other visualization except table then it will not add an extra column in csv while sending in email.

Using : superset-2.0.0-dev

My-Question :

It is intentionally did ?
Or any thing I am forgetting to set by which i can avoid this extra-column(with numbering)
Or this thing is resolved in the latest releases?

@michael-s-molina @codyml

rohitpawar2811 · 2023-02-06T11:04:51Z

def get_data(self, df: pd.DataFrame) -> Union[str, List[Dict[str, Any]]]:
        if self._query_context.result_format == ChartDataResultFormat.CSV:
            include_index = not isinstance(df.index, pd.RangeIndex)
            columns = list(df.columns)
            verbose_map = self._qc_datasource.data.get("verbose_map", {})
            if verbose_map:
                df.columns = [verbose_map.get(column, column) for column in columns]
            result = csv.df_to_escaped_csv(
                df, index=include_index, **config["CSV_EXPORT"]
            )
            return result or ""

        return df.to_dict(orient="records")

I have tracked the issue but I am here stuck that both in the case of
csv -direct-download
csv -email
both execute this function and genrate csv which has no index-colmn because Include_index is always false here
but some-how in case of csv-email from any else method index column is created again

But got nothing , any suggestion realted to this bug

rohitpawar2811 · 2023-02-07T09:12:57Z

@eschutho
This bug is created because of PostProcessing logic while downloading csv througth email-scheduling.

Can we stop PostProcessing from setting or from any other thing without modifing inside the code.?
@rusackas
I have to ask that is it really bug,Can I make pull request for removing index column in email-csv also
By manipulating postProcessing logic.

#Post-process the data so it matches the data presented in the chart. # This is needed for sending reports based on text charts that do the # post-processing of data, eg, the pivot table. if result_type == ChartDataResultType.POST_PROCESSED: result = apply_post_process(result, form_data,datasource)

rohitpawar2811 · 2023-02-07T10:40:16Z

Proposed Changes : It will facilitate a user user to chose that it wants index in csv or not while emailing through
CSV_INDEX={"index":False} //In superset_config.py
Changes did In superset/charts/post_processing.py
1.

from superset import (
    app
)

config = app.config

2.
(From this to ) processed_df.to_csv(buf)-> processed_df.to_csv(buf, **config["CSV_INDEX"])

def apply_post_process(
    result: Dict[Any, Any],
    form_data: Optional[Dict[str, Any]] = None,
    datasource: Optional["BaseDatasource"] = None,
) -> Dict[Any, Any]:
    form_data = form_data or {}

    viz_type = form_data.get("viz_type")
    if viz_type not in post_processors:
        return result

    post_processor = post_processors[viz_type]

    for query in result["queries"]:
        if query["result_format"] == ChartDataResultFormat.JSON:
            df = pd.DataFrame.from_dict(query["data"])
        elif query["result_format"] == ChartDataResultFormat.CSV:
            df = pd.read_csv(StringIO(query["data"]))
        else:
            raise Exception(f"Result format {query['result_format']} not supported")

        processed_df = post_processor(df, form_data, datasource)

        query["colnames"] = list(processed_df.columns)
        query["indexnames"] = list(processed_df.index)
        query["coltypes"] = extract_dataframe_dtypes(processed_df, datasource)
        query["rowcount"] = len(processed_df.index)

        # Flatten hierarchical columns/index since they are represented as
        # `Tuple[str]`. Otherwise encoding to JSON later will fail because
        # maps cannot have tuples as their keys in JSON.
        processed_df.columns = [
            " ".join(str(name) for name in column).strip()
            if isinstance(column, tuple)
            else column
            for column in processed_df.columns
        ]
        processed_df.index = [
            " ".join(str(name) for name in index).strip()
            if isinstance(index, tuple)
            else index
            for index in processed_df.index
        ]

        if query["result_format"] == ChartDataResultFormat.JSON:
            query["data"] = processed_df.to_dict()
        elif query["result_format"] == ChartDataResultFormat.CSV:
            buf = StringIO()
           **#Changed Line**
            processed_df.to_csv(buf, **config["CSV_INDEX"])
            buf.seek(0)
            query["data"] = buf.getvalue()

    return result

eschutho · 2023-02-09T01:00:30Z

Thank you @Rohit-pawar902 for the issue and also the fix! We can implement what you suggested here, but you're also welcome to put up your own PR if you like.

…l which is caused by post_processing.py and now it is Fixed (\apache#22981)

rohitpawar2811 · 2023-02-22T12:28:25Z

PR : #23155

Antonio-RiveroMartnez · 2023-02-22T16:49:44Z

Hey @Rohit-pawar902 ! Thanks for addressing the issue 👏 . I do have a couple of thoughts regarding this issue and how to address it though:

There's a bug reported already for the extra column in the CSV: Not possible to email a raw csv via "Alerts and Reports" #21568 , so, not sure if we want to treat this as an opt in/out config? I mean, if it's treated as a bug, you shouldn't be able to opt out from the fix.
If we decided to go with the opt in/out, should we instead use a Feature Flag instead? Since there's no extra configs in it just a True/False. Or do we see more configs being added here?

rohitpawar2811 · 2023-02-24T09:40:39Z

@Antonio-RiveroMartnez
Yep I am convinced with your 1 St statement that it's bug and we don't have to give opt out thing.

Actually previously I am not sure that it's really a bug or it is intentionally written like that , keeping this in mind I gave the opt in/out , so that someone wants the raw CSV or with index they can do that

Now I would make changes keeping you issue in mind and treating as bug.

rohitpawar2811 · 2023-02-24T09:46:57Z

@Antonio-RiveroMartnez
Can you help me that is I have to create new PR ? for changes(in which option in/out not there directly we would make index= false ) ,Or I can change the current PR

Antonio-RiveroMartnez · 2023-02-24T13:33:29Z

Hey @Rohit-pawar902 ! Yes, you can make your changes in the same PR, there's no need for a new one, make sure tho, you update whatever feels necessary in the PR Template to fit your new scope, i.e, title, description, etc.

rusackas · 2024-02-21T23:25:38Z

@rohitpawar2811 / @Antonio-RiveroMartnez This one seems to have gotten away from it. Is there any work left to be done on the linked PR that would close this issue?

rusackas · 2024-07-09T17:54:05Z

It looks like the PR by @SkinnyPigeon is close to what we want here, but might need a volunteer to move the choice for including/excluding row indices from a config variable to a report modal config option.

rohitpawar2811 added the #bug Bug report label Feb 3, 2023

rusackas assigned eschutho Feb 3, 2023

rusackas added the alert-reports Namespace | Anything related to the Alert & Reports feature label Feb 3, 2023

rohitpawar2811 added a commit to rohitpawar2811/supersetMaster that referenced this issue Feb 22, 2023

fix : We are geeting an extra column in csv while sending csv on emai…

d3753d0

…l which is caused by post_processing.py and now it is Fixed (\apache#22981)

rohitpawar2811 mentioned this issue Feb 22, 2023

fix: Extra column in csv(or non-raw csv) get while sending on email #23155

Closed

9 tasks

rusackas mentioned this issue Feb 14, 2024

Not possible to email a raw csv via "Alerts and Reports" #21568

Closed

1 task

SkinnyPigeon mentioned this issue Jun 23, 2024

feat(reports): adding flag for removing the index on the report CSV #29335

Open

9 tasks

sfirke changed the title ~~Redundant column problem comes in csv while scheduling in email.~~ Unwanted row number column appears in csv / excel while scheduling in email. Jul 26, 2024

sfirke mentioned this issue Jul 26, 2024

Unuseful column in export Excel #29717

Closed

3 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Unwanted row number column appears in csv / excel while scheduling in email. #22981

Unwanted row number column appears in csv / excel while scheduling in email. #22981

rohitpawar2811 commented Feb 3, 2023 •

edited

Loading

rohitpawar2811 commented Feb 6, 2023 •

edited

Loading

rohitpawar2811 commented Feb 7, 2023 •

edited

Loading

rohitpawar2811 commented Feb 7, 2023 •

edited

Loading

eschutho commented Feb 9, 2023

rohitpawar2811 commented Feb 22, 2023

Antonio-RiveroMartnez commented Feb 22, 2023 •

edited

Loading

rohitpawar2811 commented Feb 24, 2023 •

edited

Loading

rohitpawar2811 commented Feb 24, 2023

Antonio-RiveroMartnez commented Feb 24, 2023 •

edited

Loading

rusackas commented Feb 21, 2024

rusackas commented Jul 9, 2024

Unwanted row number column appears in csv / excel while scheduling in email. #22981

Unwanted row number column appears in csv / excel while scheduling in email. #22981

Comments

rohitpawar2811 commented Feb 3, 2023 • edited Loading

rohitpawar2811 commented Feb 6, 2023 • edited Loading

rohitpawar2811 commented Feb 7, 2023 • edited Loading

rohitpawar2811 commented Feb 7, 2023 • edited Loading

eschutho commented Feb 9, 2023

rohitpawar2811 commented Feb 22, 2023

Antonio-RiveroMartnez commented Feb 22, 2023 • edited Loading

rohitpawar2811 commented Feb 24, 2023 • edited Loading

rohitpawar2811 commented Feb 24, 2023

Antonio-RiveroMartnez commented Feb 24, 2023 • edited Loading

rusackas commented Feb 21, 2024

rusackas commented Jul 9, 2024

rohitpawar2811 commented Feb 3, 2023 •

edited

Loading

rohitpawar2811 commented Feb 6, 2023 •

edited

Loading

rohitpawar2811 commented Feb 7, 2023 •

edited

Loading

rohitpawar2811 commented Feb 7, 2023 •

edited

Loading

Antonio-RiveroMartnez commented Feb 22, 2023 •

edited

Loading

rohitpawar2811 commented Feb 24, 2023 •

edited

Loading

Antonio-RiveroMartnez commented Feb 24, 2023 •

edited

Loading