Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

scraping workbook, NOT worksheet with selectables #62

Open
federicoscabbia opened this issue May 9, 2022 · 6 comments
Open

scraping workbook, NOT worksheet with selectables #62

federicoscabbia opened this issue May 9, 2022 · 6 comments

Comments

@federicoscabbia
Copy link

Hi everyone,
I think i have found a tableau that has selectables not in the specific worksheet but in the workbook instead and i do not know how to scrape all the years. I tried this code to select the different years but when i look for the filters it gives back an empty list. Probably the the filtes are set at the workbook level and not at the worksheet level, however 'TableauWorkbook' object has no attribute 'getFilters'.

Could you please help me with that?

This is the table i'm referring to: https://public.tableau.com/views/SocialistPartyScandinavianFederation/Story1

And this is the code I used to scrape the data (as you can see i can only scrape the data from 1916 as it is set as the default year in the table).

from tableauscraper import TableauScraper as TS

url= "https://public.tableau.com/views/SocialistPartyScandinavianFederation/Story1"

ts = TS()
ts.loads(url)
wb = ts.getWorkbook()

for t in workbook.worksheets:
    print(f"worksheet name : {t.name}") #show worksheet name
    print(t.data) #show dataframe for this worksheet
    
ws = ts.getWorksheet("tab1")

# show selectable values
print(ws.getSelectableItems())  # does not show years
print(ws.getFilters()) #empty
print(wb.getParameters())  #empty
@djay
Copy link

djay commented Oct 11, 2022

@federicoscabbia I have the same problem. I wrote this code to get around it.
It handles both the case where there is no worksheet (dashboardFilter) and when the filter doesn't list the values but still accepts them.

def force_setFilter(wb, ws_name, columnName, values):
    "setFilter but ignore the listed filter options. also gets around wrong ordinal value which makes index value incorrect"
    # TODO: remove if they fix https://github.com/bertrandmartel/tableau-scraping/issues/50

    scraper = wb._scraper
    tableauscraper.api.delayExecution(scraper)
    ws = next((ws for ws in wb.worksheets if ws.name == ws_name), None)

    filter = next((
        {
            "globalFieldName": t["globalFieldName"],
        }
        for t in (ws.getFilters() if ws is not None else [])
        if t["column"] == columnName
    ), None
    )

    try:
        if filter is None:
            resp =  tableauscraper.api.dashboardFilter(scraper, columnName, [values] if not isinstance(values, list) else values)
        else:
            payload = (
                ("dashboard", scraper.dashboard),
                ("globalFieldName", (None, filter["globalFieldName"])),
                ("qualifiedFieldCaption", (None, columnName)),
                ("membershipTarget", (None, "filter")),
                ("exclude", (None, "false")),
                ("filterValues", (None, json.dumps(values))),
                ("filterUpdateType", (None, "filter-replace"))
            )
            r = scraper.session.post(
                f'{scraper.host}{scraper.tableauData["vizql_root"]}/sessions/{scraper.tableauData["sessionid"]}/commands/tabdoc/dashboard-categorical-filter',
                files=payload,
                verify=scraper.verify
            )
            scraper.lastActionTime = time.time()

            if r.status_code >= 400:
                raise requests.exceptions.RequestException(r.content)
            resp = r.json()
            errors = [
                res['commandReturn']['commandValidationPresModel']['errorMessage']
                for res in resp['vqlCmdResponse']['cmdResultList']
                if not res['commandReturn'].get('commandValidationPresModel', {}).get('valid', True)
            ]
            if errors:
                wb._scraper.logger.error(str(", ".join(errors)))
                raise tableauscraper.api.APIResponseException(", ".join(errors))

        wb.updateFullData(resp)
        return tableauscraper.dashboard.getWorksheetsCmdResponse(scraper, resp)
    except ValueError as e:
        scraper.logger.error(str(e))
        return tableauscraper.TableauWorkbook(
            scraper=scraper, originalData={}, originalInfo={}, data=[]
        )
    except tableauscraper.api.APIResponseException as e:
        wb._scraper.logger.error(str(e))
        return tableauscraper.TableauWorkbook(
            scraper=scraper, originalData={}, originalInfo={}, data=[]
        )

@david-hammond
Copy link

david-hammond commented Oct 12, 2022

@djay This is perfect, can you help me with this? I tried your function but I think I am using it wrong

from tableauscraper import TableauScraper as TS
import pandas as pd

#copied and pasted you function in

def force_setFilter(wb, ws_name, columnName, values): "setFilter but ignore the listed filter options. also gets around wrong ordinal value which makes index value incorrect" # TODO: remove if they fix https://github.com/bertrandmartel/tableau-scraping/issues/50 scraper = wb._scraper tableauscraper.api.delayExecution(scraper) ws = next((ws for ws in wb.worksheets if ws.name == ws_name), None) filter = next(( { "globalFieldName": t["globalFieldName"], } for t in (ws.getFilters() if ws is not None else []) if t["column"] == columnName ), None ) try: if filter is None: resp = tableauscraper.api.dashboardFilter(scraper, columnName, [values] if not isinstance(values, list) else values) else: payload = ( ("dashboard", scraper.dashboard), ("globalFieldName", (None, filter["globalFieldName"])), ("qualifiedFieldCaption", (None, columnName)), ("membershipTarget", (None, "filter")), ("exclude", (None, "false")), ("filterValues", (None, json.dumps(values))), ("filterUpdateType", (None, "filter-replace")) ) r = scraper.session.post( f'{scraper.host}{scraper.tableauData["vizql_root"]}/sessions/{scraper.tableauData["sessionid"]}/commands/tabdoc/dashboard-categorical-filter', files=payload, verify=scraper.verify ) scraper.lastActionTime = time.time() if r.status_code >= 400: raise requests.exceptions.RequestException(r.content) resp = r.json() errors = [ res['commandReturn']['commandValidationPresModel']['errorMessage'] for res in resp['vqlCmdResponse']['cmdResultList'] if not res['commandReturn'].get('commandValidationPresModel', {}).get('valid', True) ] if errors: wb._scraper.logger.error(str(", ".join(errors))) raise tableauscraper.api.APIResponseException(", ".join(errors)) wb.updateFullData(resp) return tableauscraper.dashboard.getWorksheetsCmdResponse(scraper, resp) except ValueError as e: scraper.logger.error(str(e)) return tableauscraper.TableauWorkbook( scraper=scraper, originalData={}, originalInfo={}, data=[] ) except tableauscraper.api.APIResponseException as e: wb._scraper.logger.error(str(e)) return tableauscraper.TableauWorkbook( scraper=scraper, originalData={}, originalInfo={}, data=[] )

#then tried to use it in this way for this site

url = "https://public.tableau.com/views/DashboardRegional_15811027307400/DashboardRegional?:embed=y&:showVizHome=no&:host_url=https%3A%2F%2Fpublic.tableau.com%2F&:embed_code_version=3&:tabs=no&:toolbar=no&:animate_transition=yes&:display_static_image=no&:display_spinner=no&:display_overlay=yes&:display_count=yes&publish=yes&:loadOrderID=0"

ts = TS()
ts.loads(url)
wb=ts.getWorkbook()
sheetName = "Mapa"

#I tried this using your function
wb = force_setFilter(wb, "Mapa", "Sexo", "Hombre")
ws = wb.getWorksheet(sheetName)

@djay
Copy link

djay commented Oct 12, 2022

I haven't tried it yet but I think you can actually pick any reandom worksheet and do serFilter with dashboardFilter=True and check=False

@djay
Copy link

djay commented Oct 12, 2022 via email

@david-hammond
Copy link

@djay Brilliant, got it working, thanks so much!

@miriamventurini
Copy link

Did anyone try this solution on the tableau from the original question by @federicoscabbia ?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants