This project standardizes and adds value to usage data about site content and users. Final output is two JSON files optimized for use in data visualization software.
- Prerequisites
- Overview
- Purpose 1: Identify site pages with zero views
- Purpose 2: Add metadata to site pages
- Purpose 3: Gather data by individual page view
- Steps
- Notes
One site generated by DocFx per guide on a local machine (or build pipeline), e.g.:

    source/
        product-guide-1/
            articles/
                page-1.md
                page-2.md
            templates/
            docfx.json
            index.md
            Etc.
        product-guide-2/
            articles/
                page-1.md
                page-2.md
            templates/
            docfx.json
            index.md
            Etc.
        product-guide-2a/
            articles/
                page-1.md
                page-2.md
            templates/
            docfx.json
            index.md
            Etc.
        index.md

Running DocFx on each directory produces:

    source/
        product-guide-1/
            _site/
                articles/
                    page-1.html
                    page-2.html
                index.html
        product-guide-2/
            _site/
                articles/
                    page-1.html
                    page-2.html
                index.html
        product-guide-2a/
            _site/
                articles/
                    page-1.html
                    page-2.html
                index.html

Contents of each local directory are published to a corresponding directory in an Azure Blob (nested directories are supported), e.g.:
Azure Storage Account > Container > Blob:

    blob/
        product-guide-1/
            articles/
                page-1.html
                page-2.html
            index.html
        product-guide-2/
            articles/
                page-1.html
                page-2.html
            index.html
            product-guide-2a/
                articles/
                    page-1.html
                    page-2.html
                index.html
        index.html

Files in the Azure Blob are published to the live site following the Azure Blob's directory structure, e.g.:
www.<your site>.com/index.html
www.<your site>.com/product-guide-1/index.html
www.<your site>.com/product-guide-1/articles/page-1.html
www.<your site>.com/product-guide-1/articles/page-2.html
www.<your site>.com/product-guide-2/index.html
www.<your site>.com/product-guide-2/articles/page-1.html
www.<your site>.com/product-guide-2/articles/page-2.html
www.<your site>.com/product-guide-2/product-guide-2a/articles/page-1.html
www.<your site>.com/product-guide-2/product-guide-2a/articles/page-2.html
www.<your site>.com/product-guide-2/product-guide-2a/index.html
- JavaScript code for Azure Application Insights tracking at the bottom of each site page
  - To do this, add the code snippet to a `scripts.tmpl.partial` file in a custom DocFx template
- DocFx
- jq
- Azure Storage Account and Blob
- The Azure Blob's:
- Storage Account name
- Account key
- Container name
- Site root URL
Azure Application Insights provides data only for site pages with >=1 view, e.g.:
url,name,Ocurrences
"https://<your site>.com/product-guide-1/index.html","Welcome",1600
"http://localhost:3000/product-guide-1/index.html","Welcome",1550
"https://<your site>.com/product-guide-1/page-1.html","Settings",1425
"https://<your site>.com/product-guide-1/page-2.html","Configuration | Articles",1420
"https://<your site>.com/product-guide-2/page-1.html","Glossary",1415
"https://<your site>.com/product-guide-2/page-1.html","Glossary - docs.<your site>.com",1413
"http://localhost:7000/product-guide-2/page-1.html","Glossary",1412
"https://<your site>.com/product-guide-2/page-2.html?q=admin%20console","Administration",1400
"https://<your site>.com/product-guide-1/page-1.html#options","Settings",1390
"https://<your site>.com/product-guide-2/product-guide-2a/page-1.html","Intro",1375
"https://<your site>.com/product-guide-2/page-3.html","Security, permissions, and identification",1350It can be helpful to know which pages have zero views.
The solution is to compare (1) a list of all pages that exist with (2) a list of pages tracked by Azure Application Insights. Any page in list 1 that is missing from list 2 is assumed to have zero views.
When a DocFx site is generated locally (e.g., <local machine>/product-guide-1/_site/**), an index.json file is produced in _site. index.json lists all pages in the site.
index.json is deployed to the Azure Blob with the rest of the site (e.g., Azure Storage Account > Container > Blob > product-guide-1/index.json).
A Kusto query (kusto_queries/content.kusto) returns a list of pages with >=1 view. It also returns each page's view count, name, and URL. See step 2 below.
usage.sh uses the Azure CLI to download the index.json file from each directory (subsite) in the Azure Blob.
First, it combines each directory's (and any subdirectory's) index.json files into one JSON file (list 1). Next, the CSV exported from Azure Application Insights (list 2) is transformed to a JSON file. Finally, the script compares the two JSON files on URL, which is included in both lists. Pages in list 1 not included in list 2 are assumed to have zero views.
Before the comparison, the script also sums the value of page views for duplicate pages and addresses other irregularities.

List 2 (pages tracked by Azure Application Insights):

| page | views |
|---|---|
| page-1 | 1300 |
| page-2 | 1200 |
| page-2 | 300 |
| page-4 | 200 |

List 1 (all pages that exist):

| page |
|---|
| page-1 |
| page-2 |
| page-3 |
| page-4 |

Result:

| page | views |
|---|---|
| page-1 | 1300 |
| page-2 | 1500 |
| page-3 | 0 |
| page-4 | 200 |

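
The comparison shown in the tables can be sketched in a few lines. This is a minimal Python illustration, not the jq logic inside usage.sh; the function name `merge_views` and the in-memory lists are assumptions made for the example.

```python
from collections import defaultdict


def merge_views(all_pages, tracked_rows):
    """Return {page: views} for every page in all_pages, using 0 when untracked."""
    totals = defaultdict(int)
    for page, views in tracked_rows:
        totals[page] += views  # duplicate rows for the same page are summed
    return {page: totals.get(page, 0) for page in all_pages}


# List 1: every page that exists (from the combined index.json files).
all_pages = ["page-1", "page-2", "page-3", "page-4"]
# List 2: (page, views) rows exported from Azure Application Insights.
tracked_rows = [("page-1", 1300), ("page-2", 1200), ("page-2", 300), ("page-4", 200)]

report = merge_views(all_pages, tracked_rows)
print(report)
# {'page-1': 1300, 'page-2': 1500, 'page-3': 0, 'page-4': 200}
```

Summing duplicates before the lookup means the zero-view check only has to ask whether a page appears in the totals at all.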
Azure Application Insights data includes data for duplicate pages. usage.sh combines duplicate pages and sums their views.
Pages reached via search results and anchors are combined and view counts summed, e.g. (for a unique guide):
- Search: `page-1.html` (800 views) and `page-1?=settings` (100 views) are merged and become `page-1.html` (900 views)
- Anchor: `page-2.html` (700 views) and `page-2.html#anchor` (50 views) are merged and become `page-2.html` (750 views)
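
One way to merge search and anchor variants is to drop the query string and fragment before summing. The sketch below uses Python's standard `urllib.parse` for illustration only; the helper names and the `docs.example.com` URLs are hypothetical, and usage.sh may apply different rules.

```python
from collections import Counter
from urllib.parse import urlsplit, urlunsplit


def canonical_url(url):
    """Drop the query string and fragment so variants map to one page URL."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


def sum_views(rows):
    """Sum view counts over rows that share a canonical URL."""
    totals = Counter()
    for url, views in rows:
        totals[canonical_url(url)] += views
    return dict(totals)


rows = [
    ("https://docs.example.com/guide/page-1.html", 800),
    ("https://docs.example.com/guide/page-1.html?q=settings", 100),
    ("https://docs.example.com/guide/page-2.html", 700),
    ("https://docs.example.com/guide/page-2.html#anchor", 50),
]
totals = sum_views(rows)
print(totals["https://docs.example.com/guide/page-1.html"])  # 900
print(totals["https://docs.example.com/guide/page-2.html"])  # 750
```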
Pages with different names that are actually the same page are combined and view counts summed, e.g.:

| Page name in export | Views |
|---|---|
| About Product 1 | 100 |
| About Product 1 - https://docs.<your site>.com/ | 30 |
| About Product 1 - docs.<your site>.com/ | 200 |
| About Product 1 \| Articles | 50 |
| About Product 1 \|Articles | 10 |

becomes

| Page name in report | Views |
|---|---|
| About Product 1 | 390 |

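
A sketch of how such names could be collapsed, in Python. The rule used here (cut the name at the first ` - ` or `|` separator) is an assumption inferred from the examples above, not the rule usage.sh is known to apply, and it would also shorten a legitimate title that contains one of those separators.

```python
import re
from collections import Counter

# Hypothetical rule: a page name ends at the first " - " or "|" separator.
SEPARATOR = re.compile(r"\s+-\s+|\s*\|")


def base_name(name):
    """Return the part of a page name before the first separator."""
    return SEPARATOR.split(name, maxsplit=1)[0].strip()


rows = [
    ("About Product 1", 100),
    ("About Product 1 - https://docs.example.com/", 30),
    ("About Product 1 - docs.example.com/", 200),
    ("About Product 1 | Articles", 50),
    ("About Product 1 |Articles", 10),
]

views_by_name = Counter()
for name, views in rows:
    views_by_name[base_name(name)] += views

print(dict(views_by_name))  # {'About Product 1': 390}
```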
usage.sh consolidates views of a subsite's landing page. Some have a blank URL, others have a title of index.html, etc.
usage.sh removes pages not on the docs.<your site>.com domain, e.g.:

| Domain in export | Views |
|---|---|
| docs.<your site>.com | 1000 |
| http://127.0.0.1 | 200 |
| 0.0.0.0 | 100 |
| localhost | 500 |
| C:/Users | 100 |

becomes

| Domain in report | Views |
|---|---|
| docs.<your site>.com | 1000 |

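
The domain filter can be illustrated by comparing each URL's host with the site host. This Python sketch is an assumption about the approach, with a hypothetical `on_site` helper and `docs.example.com` standing in for the real domain.

```python
from urllib.parse import urlsplit


def on_site(url, site_host):
    """True when the URL's host is exactly site_host (case-insensitive)."""
    host = urlsplit(url).hostname  # None when the string has no //host part
    return host is not None and host == site_host.lower()


rows = [
    ("https://docs.example.com/product-guide-1/index.html", 1000),
    ("http://127.0.0.1/product-guide-1/index.html", 200),
    ("0.0.0.0/product-guide-1/index.html", 100),
    ("http://localhost:3000/product-guide-1/index.html", 500),
    ("C:/Users/me/_site/index.html", 100),
]

kept = [(url, views) for url, views in rows if on_site(url, "docs.example.com")]
print(kept)
# [('https://docs.example.com/product-guide-1/index.html', 1000)]
```

Matching on the parsed host rather than on a substring avoids keeping a local URL that merely contains the site name somewhere in its path.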
- Standardizes the case of page names, which is different for list 1 and list 2
- Removes commas in page names
Reporting is more useful when pages are categorized by guide name, tag, or type of documentation (how-to, about, API reference, API conceptual, etc.). It also helps to identify which pages are landing pages.
If the path in the hrefFull key in content.json (object 1) includes the string in the path key in contentKey.json (object 2), all keys in object 2 are added to object 1.

contentKey.json:

[
{
"path": "product-guide-1/",
"guide": "product-guide-1",
"tag": "how-to"
},
{
"path": "product-guide-2/articles/home.html",
"guide": "product-guide-2",
"tag": "home-page"
},
{
"path": "/api/",
"guide": "api",
"restApiSubGuide": identity,
"tag": "restApireference"
}
]

content.json:

[
{
"guide": "product-guide-1",
"hrefFull": "https://docs.<your site>.com/product-guide-1/articles/page-1.html",
"hrefSimple": "articles/product-guide-1/page-1.html",
"hrefSubsite": "product-guide-1",
"name": "page 1",
"tag": "how-to",
"views": 780
},
{
"guide": "product-guide-1",
"hrefFull": "https://docs.<your site>.com/product-guide-1/articles/page-2.html",
"hrefSimple": "articles/product-guide-1/page-2.html",
"hrefSubsite": "product-guide-1",
"name": "page 2",
"tag": "how-to",
"views": 625
},
{
"guide": "product-guide-2",
"hrefFull": "https://docs.<your site>.com/product-guide-2/articles/home.html",
"hrefSimple": "product-guide-2/articles/home.html",
"hrefSubsite": "product-guide-2",
"name": "welcome",
"tag": "home-page",
"views": 550
},
{
"guide": "api",
"hrefFull": "https://docs.<your site>.com/api/identity/ping.html",
"hrefSimple": "api/identity/ping.html",
"hrefSubsite": "api",
"name": "Ping",
"restApiSubGuide": identity,
"tag": "restApireference",
"views": 410
},
{
"guide": "api",
"hrefFull": "https://docs.<your site>.com/api/identity/feature-toggles.html",
"hrefSimple": "api/identity/feature-toggles.html",
"hrefSubsite": "api",
"name": "feature toggles",
"restApiSubGuide": identity,
"tag": "restApireference",
"views": 300
}
]

\* guide vs. subsite: guide is useful if the site directory setup is .../subsite/articles/product-guide-2/page-1.html.
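
The matching rule for contentKey.json (a key entry applies when its `path` string occurs in a page's `hrefFull`) can be sketched as follows. This is a Python illustration rather than the jq used by usage.sh. The function name is hypothetical, and leaving out the `path` key itself is an assumption made to match the sample content.json above.

```python
def add_metadata(pages, content_key):
    """Copy the keys of every matching contentKey entry onto each page."""
    enriched = []
    for page in pages:
        merged = dict(page)  # leave the input page untouched
        for entry in content_key:
            if entry["path"] in page["hrefFull"]:
                # Assumption: "path" is only the matching rule, so it is not copied.
                merged.update({k: v for k, v in entry.items() if k != "path"})
        enriched.append(merged)
    return enriched


content_key = [
    {"path": "product-guide-1/", "guide": "product-guide-1", "tag": "how-to"},
    {"path": "/api/", "guide": "api", "restApiSubGuide": "identity",
     "tag": "restApireference"},
]
pages = [
    {"hrefFull": "https://docs.example.com/product-guide-1/articles/page-1.html",
     "views": 780},
    {"hrefFull": "https://docs.example.com/api/identity/ping.html", "views": 410},
]

enriched = add_metadata(pages, content_key)
print(enriched[0]["tag"])              # how-to
print(enriched[1]["restApiSubGuide"])  # identity
```

If more than one entry matches a page, later entries overwrite earlier ones in this sketch, so the order of contentKey.json would matter.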
In Azure Application Insights, a Kusto query (kusto_queries/users.kusto) returns the following data on each page view, e.g.:
"timestamp [UTC]",name,url,"user_Id",duration,"client_City","client_StateOrProvince","client_CountryOrRegion","client_Browser","client_OS","session_Id",itemType,"operation_Id",performanceBucket,"count_sum"
"10/30/2022, 7:54:05.267 PM","Page 1","https://docs.<your site>.com/product-guide-1/articles/page-1.html",abcd,511,Auburn,Washington,"United States","Chrome 106.0","Mac OS X 10.15","abc1234",pageView,abc1234,"500ms-1sec",1
"10/29/2022, 10:50:00.100 AM","Page 1","https://docs.<your site>.com/product-guide-1/articles/page-1.html",abcd,520,Cleveland,Ohio,"United States","Chrome 105.0","Windows 10","abc1234",pageView,abc1234,"500ms-1sec",1
"10/29/2022, 10:48:00.001 AM","Ping","https://docs.<your site>.com/api/identity/ping.html",abcd,520,Tokyo,,Japan,"Firefox 106.0","Android","abc1234",pageView,abc1234,"3sec-7sec",1usage.sh resolves the following issues in the export:
Similar to the method using contentKey.json to add metadata to pages, a browserKey.json file standardizes browser names based on specified strings, e.g.:
[
{
"browserKey": "Edg",
"browserGeneral": "Edge"
},
{
"browserKey": "Mobile Safari",
"browserGeneral": "Safari Mobile"
}
]

If the browser reported by Azure Application Insights is Edg, usage.sh identifies from browserKey.json that Edg includes the string Edg (as named in browserKey) and adds a browserGeneral category.
The Azure Application Insights export may include operating systems like:
- Windows 8.1
- Windows 10
- Mac OS X 10.13
- Mac OS X 10.15
- iOS 15.6
- iOS 15.7
- Linux
- Android
- Etc.
It can be helpful to understand the operating systems used on a brand level instead of a detailed version.
An osKey.json file reduces the longer version name to a brand name based on a string in the version name, e.g.:
[
{
"osKey": "Windows",
"osGeneral": "Windows"
},
{
"osKey": "Mac OS",
"osGeneral": "Mac OS"
}
]
If the operating system reported by Azure Application Insights is Windows 10, usage.sh identifies from osKey.json that Windows 10 includes the string Windows (as named in osKey) and adds an osGeneral category.
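
The same substring lookup serves both browserKey.json and osKey.json. The Python sketch below is an illustration under assumptions: the `generalize` helper is hypothetical, the first matching entry wins, and unmatched values return `None`. How usage.sh treats unmatched values is not stated here.

```python
def generalize(value, key_entries, key_field, general_field):
    """Return the general label of the first entry whose key string occurs in value."""
    for entry in key_entries:
        if entry[key_field] in value:
            return entry[general_field]
    return None  # no rule matched; the caller decides how to handle this


os_key = [
    {"osKey": "Windows", "osGeneral": "Windows"},
    {"osKey": "Mac OS", "osGeneral": "Mac OS"},
]
browser_key = [
    {"browserKey": "Edg", "browserGeneral": "Edge"},
    {"browserKey": "Mobile Safari", "browserGeneral": "Safari Mobile"},
]

print(generalize("Windows 10", os_key, "osKey", "osGeneral"))                # Windows
print(generalize("Mac OS X 10.15", os_key, "osKey", "osGeneral"))            # Mac OS
print(generalize("Edg 106.0", browser_key, "browserKey", "browserGeneral"))  # Edge
```

Because matching is by substring, the order of entries matters whenever one key string is contained in another.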
If a guide name changes by way of the directory path, this step adds a value to join on later in analytics software. For example, if Product ABC is renamed to Product DEF, the directory name may change accordingly from /product-abc to /product-def. That means the URL for a page in the first directory changes from https://docs.<your site>.com/product-abc/about.html to https://docs.<your site>.com/product-def/about.html. Join on joinPath in content.json or users.json to see views or other data combined from both pages.
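
A sketch of that lookup in Python, assuming key entries that pair an `existingPath` with a `joinPath` as in guideKey.json. The function name and the `docs.example.com` URL are hypothetical. Records whose URL matches no `existingPath` are returned unchanged in this sketch, since how usage.sh treats them is not described here.

```python
def add_join_path(record, guide_key):
    """Return a copy of record with joinPath set from the first matching existingPath."""
    enriched = dict(record)
    for entry in guide_key:
        if entry["existingPath"] in record["hrefFull"]:
            enriched["joinPath"] = entry["joinPath"]
            break
    return enriched


guide_key = [{"existingPath": "product-abc", "joinPath": "product-def"}]
old_page = {"hrefFull": "https://docs.example.com/product-abc/about.html", "views": 40}

print(add_join_path(old_page, guide_key)["joinPath"])  # product-def
```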
guideKey.json:
[
{
"existingPath": "old-guide-name-in-path-1",
"joinPath": "new-guide-name-in-path-1"
},
{
"existingPath": "old-guide-name-in-path-2",
"joinPath": "new-guide-name-in-path-2"
}
]

- Go to Azure Application Insights > `Logs`.
- Copy and paste the contents of `kusto_queries/content.kusto` into the query field.
- Click `Run`.
- Export all rows as a CSV file to a local `azure_exports` directory.
- Rename the exported file according to the table below.
- Repeat steps for `kusto_queries/users.kusto`.

| Kusto query | Rename the export to |
|---|---|
| kusto_queries/content.kusto | content.csv |
| kusto_queries/users.kusto | users.csv |
- In `usage.sh`, define the empty variables, e.g.:
siteUrl="" #Example: docs.<my site>.com
storageAccountName="" #Azure Storage account name
accountKey="" #Azure Storage account key
containerName="" #Azure Storage container name
- Execute the script locally:
./usage.sh

The script outputs two JSON files:

- content.json
- users.json

The process could be streamlined by creating a continuous export of Azure Application Insights data, storing it in a blob, and executing this Bash script inside an Azure DevOps pipeline YAML file.