# CNCF Landscape: {Category, Subcategory, Project} --> Stars, Commits, Contributors, Activity

On JupyterLab shell integration and IPython's `SList`

More Info: _http://safaribooksonline.com/blog/2014/02/12/using-shell-commands-effectively-ipython_

---

> SList instances can be used like a regular list, but they provide several methods that are useful when working with shell output. The main properties available in an SList instance are:
> 
> * `.s` returns the elements joined together by spaces. 
>   * _This is useful for building command lines that take many arguments in a single invocation._
> * `.n` returns the elements joined together by a newline. 
>   * _Use this when you need the original output unmodified._
> * `.p` returns the elements as path objects, if they are filenames.
>   * _Use this when doing more advanced path manipulation._
> 
> In addition, SList instances support `grep()` and `fields()` methods. 

---
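To make those properties concrete in plain Python, here is a hypothetical `MiniSList` stand-in. It is *not* IPython's implementation and is deliberately simplified: it covers only `.s`, `.n`, `grep()` and `fields()`, while the real `SList` (in `IPython.utils.text`) also offers `.p` and extra options. In a live kernel you get the real thing by capturing output, e.g. `lines = !pip list`.

```python
import re


class MiniSList(list):
    """Hypothetical stand-in mimicking a few IPython SList behaviors (not IPython's code)."""

    @property
    def s(self):
        # Elements joined by spaces, e.g. to pass many file names to one command
        return " ".join(self)

    @property
    def n(self):
        # Elements joined by newlines, i.e. the captured output as one string
        return "\n".join(self)

    def grep(self, pattern, prune=False):
        # Keep lines matching the regex, or drop them when prune=True
        return MiniSList(line for line in self if bool(re.search(pattern, line)) != prune)

    def fields(self, *indices):
        # Split each line on whitespace and rejoin only the requested columns
        out = MiniSList()
        for parts in (line.split() for line in self):
            picked = [parts[i] for i in indices if -len(parts) <= i < len(parts)]
            if picked:
                out.append(" ".join(picked))
        return out


# A few lines copied from the `pip list` output in this notebook
lines = MiniSList([
    "altair                             5.4.1",
    "altair-saver                       0.5.0",
    "arrow                              1.3.0",
])
print(lines.grep("^altair").fields(0))  # ['altair', 'altair-saver']
print(lines.fields(1).s)                # 5.4.1 0.5.0 1.3.0
```

Chaining `grep()` then `fields()` is the typical pattern: filter the rows first, then pick the columns you care about.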

In [1]:
!pip list

Package                            Version
---------------------------------- --------------
adbc-driver-manager                1.2.0
adbc-driver-postgresql             1.2.0
aiofiles                           22.1.0
aiohappyeyeballs                   2.4.3
aiohttp                            3.11.0rc1
aiosignal                          1.3.1
aiosqlite                          0.20.0
altair                             5.4.1
altair-data-server                 0.4.1
altair-saver                       0.5.0
altair-viewer                      0.4.0
annotated-types                    0.7.0
anyio                              4.6.2.post1
appnope                            0.1.4
argon2-cffi                        23.1.0
argon2-cffi-bindings               21.2.0
arrow                              1.3.0
astroid                            3.3.5
asttokens                          2.4.1
asyncpg                            0.30.0
attrs                              24.2.0
autopep8                      
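The `!pip list` output above is a two-column text table. If you need it as data, a hypothetical `parse_pip_list` helper can turn it into a `{package: version}` dict by skipping the header and the dashed separator row (similar in spirit to `SList.fields(0, 1)`, plus header handling). This is a sketch for quick inspection; for anything sturdier, `pip list --format=json` avoids column parsing altogether.

```python
def parse_pip_list(text):
    """Parse `pip list` column output into a {package: version} dict (hypothetical helper)."""
    packages = {}
    for line in text.splitlines():
        parts = line.split()
        # Skip blank/truncated rows, the "Package Version" header and the dashed separator
        if len(parts) < 2 or parts[0] == "Package" or set(parts[0]) == {"-"}:
            continue
        # Editable installs add a third "location" column; only name and version are kept
        packages[parts[0]] = parts[1]
    return packages


sample = """\
Package                            Version
---------------------------------- --------------
aiohttp                            3.11.0rc1
altair                             5.4.1
autopep8
"""
print(parse_pip_list(sample))  # {'aiohttp': '3.11.0rc1', 'altair': '5.4.1'}
```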

In [2]:
%load_ext jupyter_ai_magics

### Base imports and variables

In [3]:
import sys

In [4]:
# all generated output files land here
OUT_DIR='generated'

# TODO: factor out the landscape name ('cncf') so this works for any landscape (https://landscapes.dev)

CNCF_LANDSCAPE_FNAME_BASE='cncf-landscape'
CNCF_LANDSCAPE_FNAME_ROOT=f'{OUT_DIR}/{CNCF_LANDSCAPE_FNAME_BASE}'

CNCF_PROJECTS_FNAME_BASE='cncf-projects'
CNCF_PROJECTS_FNAME_ROOT=f'{OUT_DIR}/{CNCF_PROJECTS_FNAME_BASE}'

print(f'Jupyter Kernel (venv): {sys.executable}')
print(f'Output Location:       {OUT_DIR}  (.json, .jsonl, .csv, .md, .svg, .png, ...)')
print(f'Output Landscape root: {CNCF_LANDSCAPE_FNAME_ROOT}')
print(f'Output Projects  root: {CNCF_PROJECTS_FNAME_ROOT}')

Jupyter Kernel (venv): /Users/matt/gh/cncf/landscape-graph/.venv-ipynb/bin/python3
Output Location:       generated  (.json, .jsonl, .csv, .md, .svg, .png, ...)
Output Landscape root: generated/cncf-landscape
Output Projects  root: generated/cncf-projects
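One way the TODO above could go: derive every filename root from a single landscape name. The helper `landscape_paths` and its dict keys are hypothetical, a sketch assuming other landscapes follow the same `<name>-landscape` / `<name>-projects` naming used here.

```python
def landscape_paths(landscape, out_dir="generated"):
    """Build output filename roots for any landscape name (hypothetical helper for the TODO)."""
    return {
        # Same f-string layout as CNCF_LANDSCAPE_FNAME_ROOT / CNCF_PROJECTS_FNAME_ROOT
        "landscape_root": f"{out_dir}/{landscape}-landscape",
        "projects_root": f"{out_dir}/{landscape}-projects",
    }


paths = landscape_paths("cncf")
print(paths["landscape_root"])  # generated/cncf-landscape
print(paths["projects_root"])   # generated/cncf-projects
```

Keeping the roots in one dict means a later cell can switch landscapes by changing a single argument.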


### Create human-friendly JSON (.json) and data-friendly JSON Lines (.jsonl) from the current landscape

In [5]:
!mkdir -p {OUT_DIR}

# note: as of sometime in 2024, items.json is actually an HTML page with the JSON embedded in <script> tag(s)
!wget -O {CNCF_LANDSCAPE_FNAME_ROOT}.json.compact.html https://landscape.cncf.io/data/items.json
!ls -lh {CNCF_LANDSCAPE_FNAME_ROOT}.json.compact.html

--2024-11-11 18:24:34--  https://landscape.cncf.io/data/items.json
Resolving landscape.cncf.io (landscape.cncf.io)... 13.249.190.103, 13.249.190.96, 13.249.190.11, ...
Connecting to landscape.cncf.io (landscape.cncf.io)|13.249.190.103|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 585563 (572K) [text/html]
Saving to: ‘generated/cncf-landscape.json.compact.html’


2024-11-11 18:24:35 (19.3 MB/s) - ‘generated/cncf-landscape.json.compact.html’ saved [585563/585563]

-rw-r--r--@ 1 matt  staff   572K Nov 11 15:06 generated/cncf-landscape.json.compact.html
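The wget log shows the server labelling `items.json` as `[text/html]`. If you want the notebook to notice the next format change, a small sniffing helper such as the hypothetical `detect_payload_format` below could run before parsing. It is a heuristic (try `json.loads` first, then look for common HTML markers), not a real content-type check.

```python
import json


def detect_payload_format(text):
    """Return 'json' if text parses as JSON, 'html' if it looks like markup, else 'unknown'."""
    try:
        json.loads(text)
        return "json"
    except json.JSONDecodeError:
        pass
    head = text.lstrip().lower()
    # Common markers of an HTML page; the landscape page embeds its data in <script> tags
    if head.startswith("<!doctype html") or head.startswith("<html") or "<script" in head:
        return "html"
    return "unknown"


print(detect_payload_format('[{"name": "x"}]'))                          # json
print(detect_payload_format("<!DOCTYPE html><script>window.a = {};</script>"))  # html
```

Reading the downloaded `.json.compact.html` file and passing its contents through this helper would be expected to return `'html'` for the current endpoint.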


In [6]:
%pip install beautifulsoup4 lxml

Note: you may need to restart the kernel to use updated packages.


In [11]:
from bs4 import BeautifulSoup
import json
import os
import re

# Reuse the output directory defined above instead of hard-coding "generated"
os.makedirs(OUT_DIR, exist_ok=True)

# Load the downloaded page (items.json now serves HTML with the JSON embedded in <script> tags)
with open(f'{CNCF_LANDSCAPE_FNAME_ROOT}.json.compact.html', 'r', encoding='utf-8') as file:
    content = file.read()

# Parse the HTML with BeautifulSoup using the lxml parser
soup = BeautifulSoup(content, 'lxml')

# Match the start of each `window.<variable> = {` assignment; the lookahead leaves '{' unconsumed
assignment_re = re.compile(r'window\.(\w+)\s*=\s*(?=\{)')
decoder = json.JSONDecoder()

for script_tag in soup.find_all('script'):
    if not script_tag.string:
        continue
    text = str(script_tag.string)
    pos = 0
    while (match := assignment_re.search(text, pos)):
        variable_name = match.group(1)
        try:
            # raw_decode parses exactly one JSON value starting at match.end() and returns where
            # it ended, so a '};' inside a JSON string value cannot cut the object short
            data, pos = decoder.raw_decode(text, match.end())
        except json.JSONDecodeError as e:
            print(f"JSON decoding failed for '{variable_name}':", e)
            pos = match.end()
            continue

        # Write each variable's JSON to its own pretty-printed file in the output directory
        filename = os.path.join(OUT_DIR, f'{variable_name}.json')
        with open(filename, 'w', encoding='utf-8') as json_file:
            json.dump(data, json_file, indent=4)
        print(f"Extracted JSON for '{variable_name}' saved to '{filename}'")


Extracted JSON for 'baseDS' saved to 'generated/baseDS.json'
Extracted JSON for 'statsDS' saved to 'generated/statsDS.json'
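Why not just capture `window.<name> = {...};` with a regex? A non-greedy pattern such as `({.*?});` stops at the first `};` it meets, even when that `};` sits inside a JSON string, so a project description that happens to contain code can truncate the capture. The made-up script body below shows the failure, and the stdlib alternative `json.JSONDecoder.raw_decode`, which parses exactly one JSON value starting at a given index and reports where it ended.

```python
import json
import re

# Made-up script body: the description string happens to contain "};"
script = 'window.baseDS = {"items": [{"description": "use {a};"}]}; window.other = 1;'

# Non-greedy capture stops at the first "};", which here is inside a JSON string
lazy_match = re.search(r'window\.(\w+)\s*=\s*({.*?});', script, re.DOTALL)
print(lazy_match.group(2))  # {"items": [{"description": "use {a}
try:
    json.loads(lazy_match.group(2))
except json.JSONDecodeError:
    print("truncated capture is not valid JSON")

# raw_decode starts at the '{' and stops exactly where the JSON value ends
start = re.search(r'window\.baseDS\s*=\s*', script).end()
data, end = json.JSONDecoder().raw_decode(script, start)
print(data["items"][0]["description"])  # use {a};
print(script[end:])                      # ; window.other = 1;
```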


In [None]:
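# create human-friendly (pretty-printed) file
# note: this reads {CNCF_LANDSCAPE_FNAME_ROOT}.json.compact, but the download cell above now saves the page as .json.compact.html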
!jq . {CNCF_LANDSCAPE_FNAME_ROOT}.json.compact > {CNCF_LANDSCAPE_FNAME_ROOT}.json
!ls -lh {CNCF_LANDSCAPE_FNAME_ROOT}.json*
!echo "\n*Yes* indeed, that's 2+ MB of whitespace!\n"
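The gap between the compact file and the `jq .` output is pure formatting whitespace. To measure that overhead for a given structure, serialize the same object both ways; `json.dumps(..., indent=2)` is roughly comparable to jq's default 2-space pretty-printing. The records below are made up, so the numbers say nothing about the real landscape file.

```python
import json

# Hypothetical sample records standing in for landscape items
items = [{"name": f"project-{i}", "relation": "sandbox", "extra": {"stars": i}} for i in range(1000)]

compact = json.dumps(items, separators=(",", ":"))  # no optional whitespace at all
pretty = json.dumps(items, indent=2)                 # newline + 2-space indent per level

overhead = len(pretty) - len(compact)
print(f"compact={len(compact)} pretty={len(pretty)} whitespace={overhead} ({overhead / len(pretty):.0%} of pretty)")
```

Both strings decode to the same data; only the layout differs.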

In [None]:
# JSON array --> JSON Lines (one compact item per line)
!jq -c '.[]' {CNCF_LANDSCAPE_FNAME_ROOT}.json.compact > {CNCF_LANDSCAPE_FNAME_ROOT}.jsonl
!ls -lh {CNCF_LANDSCAPE_FNAME_ROOT}.jsonl
!wc -l  {CNCF_LANDSCAPE_FNAME_ROOT}.jsonl
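For reference, the same array-to-JSONL step in plain Python, as a sketch: `json_array_to_jsonl` is a hypothetical helper that writes each element of a top-level array as one compact line, so `wc -l` on the result equals the number of items. Unlike `jq '.[]'`, which also iterates over an object's values, this version insists on an array; `ensure_ascii=False` keeps non-ASCII characters as-is, as jq does.

```python
import io
import json


def json_array_to_jsonl(src, dst):
    """Write each element of a top-level JSON array to dst as one compact line."""
    data = json.load(src)
    if not isinstance(data, list):
        raise ValueError("expected a top-level JSON array")
    for item in data:
        # Compact separators match `jq -c` output (no spaces after ',' or ':')
        dst.write(json.dumps(item, separators=(",", ":"), ensure_ascii=False) + "\n")
    return len(data)


src = io.StringIO('[{"name": "a", "relation": "graduated"}, {"name": "b"}]')
dst = io.StringIO()
count = json_array_to_jsonl(src, dst)
print(count)               # 2
print(dst.getvalue(), end="")
```

With real files, pass open file handles instead of `io.StringIO` objects.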

### Filter Landscape: 2,200+ cards (cncf-landscape.jsonl) --> ~180 CNCF projects (cncf-projects.jsonl)

In [None]:
!ls -lahF {CNCF_LANDSCAPE_FNAME_ROOT}.jsonl
!wc -l    {CNCF_LANDSCAPE_FNAME_ROOT}.jsonl
!echo ""

!set -x && jq -c 'select(.relation == "graduated" or .relation == "incubating" or .relation == "sandbox")' {CNCF_LANDSCAPE_FNAME_ROOT}.jsonl > {CNCF_PROJECTS_FNAME_ROOT}.jsonl 

!echo ""
!ls -lahF {CNCF_PROJECTS_FNAME_ROOT}.jsonl
!wc -l {CNCF_PROJECTS_FNAME_ROOT}.jsonl
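The jq `select(...)` filter above can be mirrored in Python, for example if you want the kept/total counts without calling out to the shell. `filter_projects` is a hypothetical helper; like jq, a line with no `relation` key (jq sees `null`) is dropped.

```python
import io
import json

# The same relation values the jq filter keeps
PROJECT_RELATIONS = {"graduated", "incubating", "sandbox"}


def filter_projects(src, dst, relations=PROJECT_RELATIONS):
    """Copy JSONL lines whose "relation" is in `relations`; return (kept, total)."""
    kept = total = 0
    for line in src:
        if not line.strip():
            continue  # ignore blank lines
        total += 1
        item = json.loads(line)
        if item.get("relation") in relations:
            dst.write(json.dumps(item, separators=(",", ":"), ensure_ascii=False) + "\n")
            kept += 1
    return kept, total


src = io.StringIO('{"name":"a","relation":"graduated"}\n{"name":"b"}\n{"name":"c","relation":"sandbox"}\n')
dst = io.StringIO()
print(filter_projects(src, dst))  # (2, 3)
```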