### GitHub Repositories Dataset
- **Description:** GitHub repository metadata including stars, forks, programming languages, and repository descriptions.
- **Data Source:** [github_repo_for_cosmos.parquet](https://www.dropbox.com/scl/fi/kgdvp6dmp8ppnnmjabjzl/github_repo_for_cosmos.parquet?rlkey=dma2zk9uuzsctsjfevjumbrdg&dl=1)
  - **Potential columns for visualization:**
    - **X & Y Coordinates:** `x`, `y`
    - **Point Size:** `stars` (star count), `forks`
    - **Color:** `primaryLanguage`
    - **Label:** `nameWithOwner`
  - **Related code file:** [github_repos.py](https://github.com/thorwhalen/imbed_data_prep/blob/main/imbed_data_prep/github_repos.py)

## Get data

### Data parameters

In [7]:
ext = '.parquet'
src = 'https://www.dropbox.com/scl/fi/4oidj0wigc1ukjamk34tm/github_repositories.parquet?rlkey=z1zqgef8o1pf2bcxwsryxwnkg&dl=1'
target_filename = 'github_repositories.parquet'

### Install and import

In [8]:
import os
if not os.getenv('IN_COSMO_DEV_ENV'):
    %pip install -q cosmograph tabled cosmodata

import tabled
import cosmodata

from functools import partial
from cosmograph import cosmo

### Load data

In [9]:
if ext:
    getter = partial(tabled.get_table, ext=ext)
else:
    getter = tabled.get_table
# acquire_data takes care of caching locally too, so next time access will be faster
# (If you want a fresh copy, you can delete the local cache file manually.)
data = cosmodata.acquire_data(src, target_filename, getter=getter)
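The comments above describe a download-once, reuse-afterwards pattern. As a rough illustration of that idea only (this is not `cosmodata`'s actual implementation; `fetch_with_cache`, `fake_download`, and the `example.com` URL are hypothetical names made up for this sketch), the logic might look like this:

```python
import os
import tempfile
import urllib.request


def fetch_with_cache(url, local_path, download=None):
    """Return local_path, downloading url to it only if the file is missing."""
    if download is None:
        # Default downloader from the standard library: urlretrieve(url, filename)
        download = urllib.request.urlretrieve
    if not os.path.exists(local_path):
        download(url, local_path)
    return local_path


# Demo with a fake downloader, so the sketch runs without network access
calls = []


def fake_download(url, path):
    calls.append(url)
    with open(path, "wb") as f:
        f.write(b"stand-in bytes")


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "github_repositories.parquet")
    url = "https://example.com/data.parquet"  # placeholder URL
    fetch_with_cache(url, path, download=fake_download)
    fetch_with_cache(url, path, download=fake_download)  # served from the cache
    first_call_only = len(calls) == 1

    os.remove(path)  # deleting the cached file forces a fresh download
    fetch_with_cache(url, path, download=fake_download)
    n_calls_after_delete = len(calls)
```

The second call never reaches the downloader because the file already exists, which is why deleting the cached file by hand is enough to get a fresh copy.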

## Peep at the data

In [10]:
mode = 'short'  # one of: 'short', 'sample', 'stats' (default: 'short')
exclude_cols = []
cosmodata.print_dataframe_info(data, exclude_cols, mode=mode)

DataFrame shape: (3065063, 27)
First row
------------------------------------------------------------
owner                                            xdedzl
name                               RuntimeTerrainEditor
stars                                                82
forks                                                29
watchers                                              6
isFork                                            False
isArchived                                        False
languages                   C#: 162842, ShaderLab: 1965
languageCount                                         2
topics                                                 
topicCount                                            0
diskUsageKb                                      140632
pullRequests                                          0
issues                                                0
description                                    运行时地形编辑器
primaryLanguage                                      C#
cr

## Visualize data

### Watchers and stars

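This visualization places each repository at its `x`, `y` coordinates, with point size given by `stars` and point color by `primaryLanguage`. Note that `watchers` is not used in the call below; to bring watchers into the picture, pass `watchers` to `point_size_by` instead of `stars`.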

**Warning: With about 3M points, the full dataset takes a long time to load and may make your computer go boom, so choose a sample size accordingly.**
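One way to choose a sample, sketched here under the assumption that `data` is a pandas DataFrame: `data.head(n)` keeps the first `n` rows, which may not be representative if the rows are ordered in some way. The hypothetical helper `pick_subset` below (not part of `cosmodata` or `cosmograph`) shows two alternatives, a seeded uniform random sample and the top `n` rows by a numeric column such as `stars`, using a small toy table as a stand-in for the real data.

```python
import pandas as pd


def pick_subset(df, n, by=None, seed=0):
    """Return at most n rows of df.

    If `by` is given, keep the n rows with the largest values in that column;
    otherwise draw a reproducible uniform random sample.
    """
    n = min(n, len(df))  # never ask for more rows than exist
    if by is not None:
        return df.nlargest(n, by)
    return df.sample(n=n, random_state=seed)


# Toy stand-in for the real dataset (values are made up for illustration)
toy = pd.DataFrame(
    {
        "nameWithOwner": ["a/one", "b/two", "c/three", "d/four", "e/five"],
        "stars": [82, 5, 1200, 40, 310],
    }
)

random_subset = pick_subset(toy, 3)           # 3 rows, chosen at random
top_subset = pick_subset(toy, 2, by="stars")  # the 2 most-starred rows
```

With the real data, something like `pick_subset(data, 100_000)` could be passed to `cosmo(...)` in place of `data.head(...)`.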

In [37]:
cosmo(
    data.head(1_000_000),  # first 1M rows; use data.head(100_000) for a lighter load
    point_x_by="x",
    point_y_by="y",
    point_size_by="stars",
    point_color_by="primaryLanguage",
    point_label_by="name",
    point_id_by="nameWithOwner",
    point_color_strategy="palette",
    show_labels=True,
    disable_point_size_legend=False,
    disable_point_color_legend=False,
)

Cosmograph(background_color=None, components_display_state_mode=None, disable_point_color_legend=False, disabl…

In [38]:
g = _  # in IPython/Jupyter, `_` holds the output of the last executed cell (here, the Cosmograph widget)
type(g)

cosmograph.widget.Cosmograph

In [40]:
g.capture_screenshot()