
Power BI Search Engine - Core

🇫🇷 - Available in French

Power BI Search Engine - Core is a tool that allows you to crawl and scrape data from Microsoft Power BI reports and dashboards. Its goal is to provide the data needed to build a search engine app that replaces the default Power BI search, which matches reports by name only, not by their content.

It has multiple features:

  • Workspace Crawling: list all reports and dashboards of a workspace
  • Content Crawling: list all pages of a report
  • Content Scraping: extract data from a page and determine tags
  • Content Screenshot: take a screenshot of a page
  • Data Exporter: export all the collected data as a JSON file (can be used with Microsoft Power Automate and Microsoft Power Apps to build a search engine app)

Table of Contents

  • Installation
  • Usage
  • Credits

Installation

Requirements

  • Node.js
  • Power BI account
  • Windows 10
  • A single Microsoft account used for both Power BI and Windows 10

From Source

  1. Clone or download this repository
  2. Run npm i or npm ci (or equivalent) to install dependencies
  3. Run npm run test-login to test your login credentials
  4. Run npm run start to start the program

Usage

Configuration

Here is an example of the config.js file:

const config = {
    language: "english",
    /*
    pbiLogin is the login of the user that will be used to connect to Power BI
    It must be a valid login for the tenant
    The default value is the current user on the local machine
    ⚠️ - You need to set the domain name
    📝 - You can test if the login is valid by running `npm run test-login` in a terminal 
    */
    pbiLogin: `${require("os").userInfo().username}@domain.com`,
    /*
    uploadLocation is a path to a folder where the exports will be saved
    It must have two subfolders: "Exports" and "Anomalies"
    */
    uploadLocation: "C:\\Existing\\Path\\For\\Exports",
    processes: [
        ["workspace-id-for-process-1", "workspace-id-for-process-1", "workspace-id-for-process-1"],
        ["workspace-id-for-process-2"],
        /*
        [
            "workspace-id-for-process-3",
        ]
        */
    ],
    /*
    lookFor is an array of CSS selectors that will be used to find the elements that will be read
    The text content of the resulting elements will then be converted to keywords
    */
    lookFor: [
        ".textbox p span.textRun",
        ".slicer-header-text",
        ".preTextWithEllipsis",
        ".columnHeaders div div .pivotTableCellWrap.cell-interactive.tablixAlignCenter",
        "[role=columnheader].pivotTableCellWrap",
        ".xAxisLabel",
        ".yAxisLabel",
        ".headerText .headerTitleWrapper .displayText",
    ],
};

module.exports = config;

📝 - No need to copy and paste the example above! Instead, duplicate the config.example.js file and rename it to config.js.

🔐 - The config.js file is ignored by Git, so you can safely modify it without worrying about sharing anything sensitive.

⚠️ - It is recommended to use the npm run test-login command to check if your login credentials are valid before starting the program.
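For context, here is roughly how the lookFor selectors become keywords: on each page, the text content of every element matching one of the selectors is collected. The snippet below is only an illustrative sketch, assuming a Puppeteer-style page object; the project's actual crawler may work differently.

// Illustrative sketch only - not the project's actual code.
// Assumes a Puppeteer-style `page` object for a rendered report page.
async function collectKeywords(page, lookFor) {
    const keywords = new Set();
    for (const selector of lookFor) {
        // Grab the text content of every element matching the selector
        const texts = await page.$$eval(selector, (els) =>
            els.map((el) => el.textContent.trim())
        );
        texts.filter(Boolean).forEach((text) => keywords.add(text));
    }
    return [...keywords];
}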

Run

  1. Make sure you have a valid config.js file
  2. Run npm run start to start the program
  3. The program will log continuously to keep you informed of its progress
    ⚠️ - Some reports and dashboards may take a long time to process; that does not mean the program is stuck.
  4. When it is done, the program will log the total time it took to process all the reports and dashboards
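Before the first run, it can also help to check that uploadLocation exists and contains the "Exports" and "Anomalies" subfolders. A small standalone script along these lines will catch most path mistakes (hypothetical check-config.js, not part of the project):

// Hypothetical check-config.js - verify the export folder layout before a run.
const fs = require("fs");
const path = require("path");
const config = require("./config");

let ok = true;
for (const sub of ["Exports", "Anomalies"]) {
    const dir = path.join(config.uploadLocation, sub);
    if (!fs.existsSync(dir)) {
        console.error(`Missing folder: ${dir}`);
        ok = false;
    }
}
console.log(ok ? "uploadLocation layout looks fine." : "Fix the folders above before running.");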

Tips

I've been using this tool for a while now, and I've found that the following tips can help you get the best results:

  1. You might want to check regularly for old reports and dashboards that are no longer used, and delete them, to avoid wasting time processing them.
  2. Do the same on the side where you use the collected data. Over time you may accumulate data that is no longer relevant, for example deleted reports and dashboards that are still in the database. Removing them will help keep your search engine app as fast as possible.
  3. After the first few runs, try to optimize your processes so that each one has a similar number of pages to explore. I did this with an Excel sheet that tracked the number of pages in each report and dashboard, sorted them by page count, then distributed them into processes of roughly the same total page count (see the sketch after this list).
  4. Don't ignore errors and anomalies. They can help you improve data availability and quality; ignoring them can leave you with a lot of missing data, which makes your search engine app less useful.
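If you would rather script that balancing than maintain it in a spreadsheet, a simple greedy split works: sort the items by page count, then always add the next one to the process with the fewest pages so far. The sketch below uses made-up workspace IDs and page counts.

// Greedy balancing sketch - distribute workspaces into processes of
// roughly equal total page count. All data below is hypothetical.
const workspaces = [
    { id: "workspace-a", pages: 42 },
    { id: "workspace-b", pages: 17 },
    { id: "workspace-c", pages: 30 },
    { id: "workspace-d", pages: 8 },
];

function balance(items, processCount) {
    const processes = Array.from({ length: processCount }, () => ({ ids: [], pages: 0 }));
    // Largest items first, always placed into the lightest process so far
    for (const item of [...items].sort((a, b) => b.pages - a.pages)) {
        const lightest = processes.reduce((min, p) => (p.pages < min.pages ? p : min));
        lightest.ids.push(item.id);
        lightest.pages += item.pages;
    }
    return processes.map((p) => p.ids); // paste these groups into config.processes
}

console.log(balance(workspaces, 2));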

Credits

Project made by:

Powered by:

Built for: