# DAO Community Git Hosting Platform Survey Report Data Generator using Python-Polars in Google Environment
---
## Bar Diagram Generator using Polars


![](https://img.shields.io/badge/Version%201.0.0-333333?style=for-the-badge)![](https://img.shields.io/badge/Made%20with-808080?style=for-the-badge)[![](https://img.shields.io/badge/Google%20Colaboratory-4d4d4d?style=for-the-badge&logo=googlecolab)](https://docs.jupyter.org/en/latest/)![](https://img.shields.io/badge/And-808080?style=for-the-badge)[![](https://img.shields.io/badge/Python%203.10.12-306998?style=for-the-badge&logo=Python&logoColor=FFD43B)](https://docs.python.org/3.10/)[![](https://img.shields.io/badge/Polars%200.17.3-FFD43B?style=for-the-badge&logo=Polars&logoColor=306998)](https://docs.pola.rs/)

![](https://img.shields.io/badge/Repo-808080?style=for-the-badge)[![](https://img.shields.io/badge/GitHub-6E5494?style=for-the-badge&logo=GitHub)](https://github.com/joshua-lagasca/DAO-Community-Git-Hosting-Platform-Survey---Google-Environment)

In [1]:
from __future__ import annotations

# Mount Drive

In [2]:
from pathlib import Path

from google.colab import drive

mount_point: Path = Path("/gdrive")

drive.mount(mountpoint=str(mount_point.resolve()), force_remount=True)

Mounted at /gdrive


In [3]:
base_path: Path = (
    mount_point
    / "MyDrive"
    / "Survey"
    / "DAO Community Git Hosting Platform Survey - Google Environment"
)
base_path.mkdir(parents=False, exist_ok=True)

output_data_path: Path = base_path / "Data"
output_data_path.mkdir(parents=False, exist_ok=True)

In [4]:
## NOTE: import-ipynb cannot work with notebooks in Google Drive, thus the workaround below.
type_objects_module = base_path / "Generator" / "Type Objects Polars.ipynb"

if type_objects_module.exists():
    type_objects_module: str = f"{type_objects_module}"
    %run -n "$type_objects_module"
    """Creates the following names:
        Eyears_of_experience,
        Egit_hosting_platform,
        Ecareer_level,
        Edao_pillar,
        Epast_next,
        Epast_next_all,
        TDcolumns,
        df_columns_dtypes_dict
    """
else:
    print(f"Module '{type_objects_module}' does not exist.")

In [5]:
%%capture --no-stderr
!pip show icecream 1>/dev/null; \
[ $? != 0 ] && { pip install icecream; };
from icecream import ic

In [7]:
import polars as pl

df: pl.dataframe.frame.DataFrame = pl.read_parquet(
    source=output_data_path / "base_data.parquet", use_pyarrow=True
)
ic(df.head())
ic(df.shape)

ic| df.head(): shape: (5, 8)
               ┌────────────┬────────────┬────────────┬────────────┬────────────┬────────────┬──────────┬──────────┐
               │ used_git_h ┆ current_gi ┆ years_of_e ┆ past_next_ ┆ past_next_ ┆ career_lev ┆ dao_pill ┆ alias    │
               │ osting_pla ┆ t_hosting_ ┆ xperience  ┆ github     ┆ gitlab     ┆ el         ┆ ar       ┆ ---      │
               │ tform      ┆ platform   ┆ ---        ┆ ---        ┆ ---        ┆ ---        ┆ ---      ┆ str      │
               │ ---        ┆ ---        ┆ str        ┆ str        ┆ str        ┆ str        ┆ str      ┆          │
               │ str        ┆ str        ┆            ┆            ┆            ┆            ┆          ┆          │
               ╞════════════╪════════════╪════════════╪════════════╪════════════╪════════════╪══════════╪══════════╡
               │ GitHub,    ┆ GitHub,    ┆ 4 to 6     ┆ Worked     ┆ Worked     ┆ 2          ┆ Data     ┆ Jeremy   │
               │ GitLab,    ┆ GitLa

(105, 8)

---
---

## Bar
**Line, bar and pie charts 27.3.1** <br>
Chart Type: **Bar Chart (Grouped)** <br>
See [Flourish Line, bar and pie charts](https://app.flourish.studio/@flourish/line-bar-pie) for requirements. <br>
<hr>

This code generates three (3) diagram JSON files, one per counted column:
* `bar-dao_pillar-main.json`
* `bar-years_of_experience-main.json`
* `bar-career_level-main.json`
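
For orientation, here is a minimal sketch of the row-oriented shape each of these files should roughly have, assuming the `write_json(..., row_oriented=True)` calls further down emit a JSON array with one object per row. The sample values are copied from the `dao_pillar` output shown later in this notebook. The sketch uses only the standard library rather than Polars.

```python
import json

# Sample rows copied from the notebook's dao_pillar output (all respondents).
rows = [
    {"dao_pillar": "Data Engineering", "counts": 55.24,
     "group": "All respondents", "row_filter": "All respondents"},
    {"dao_pillar": "Data Admin", "counts": 22.86,
     "group": "All respondents", "row_filter": "All respondents"},
    {"dao_pillar": "Data Science & Analytics", "counts": 18.1,
     "group": "All respondents", "row_filter": "All respondents"},
    {"dao_pillar": "AFC", "counts": 3.81,
     "group": "All respondents", "row_filter": "All respondents"},
]

# Row-oriented JSON: a list of objects, one object per bar.
text = json.dumps(rows, indent=2)
print(text)
```

Each object carries the four columns that the `Data` tab below maps to A through D.
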
<br>

### `Data` Tab
#### Order
| A | B | C | D |
| --- | --- | --- | --- |
| dao_pillar | counts | group | row_filter |
| years_of_experience | counts | group | row_filter |
| career_level | counts | group | row_filter |

<br>

#### Configuration
- **Labels/time: A**
- **Values: B**
- **Charts grid: C**
- **Row filter: D**
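
The column order above can be illustrated with a small pure-Python sketch. It is not the notebook's Polars implementation: `share_rows` is a hypothetical helper that mirrors the same arithmetic (count of each value divided by the total number of rows, times 100, rounded to two decimals) on the toy data used in the debug cell below.

```python
from collections import Counter


def share_rows(values, label_column,
               group="All respondents", row_filter="All respondents"):
    """Hypothetical helper: build rows in the A-D column order used by Flourish."""
    total = len(values)  # nulls count toward the total
    # Missing answers are labelled "None" so they get their own bar.
    counts = Counter("None" if v is None else v for v in values)
    return [
        {
            label_column: label,                     # A: labels
            "counts": round(n / total * 100, 2),     # B: percentage share
            "group": group,                          # C: charts grid
            "row_filter": row_filter,                # D: row filter
        }
        for label, n in counts.most_common()         # largest share first
    ]


rows = share_rows(["foo", None, "bar", "bar", "foo", "bar"], "metasyntactic")
for row in rows:
    print(row)
```

Despite its name, the `counts` column holds percentages, which is why the values for one group and row filter sum to roughly 100.
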

In [16]:
from typing import Any, Mapping, Optional, TypedDict


def bar_maker(
    df: pl.dataframe.frame.DataFrame, value_counts_column_name: str, *args, **kwargs
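) -> Optional[pl.DataFrame]:
    """Creates the parquet file behind one bar diagram.

    Computes the percentage share of each value in `value_counts_column_name`
    and tags every row with the given group and row-filter labels.

    Args:
        Required:
            df: target Polars dataframe
            value_counts_column_name: column whose values are counted
        Optional (passed as keyword arguments):
            group_column_name: label written to the `group` column
                (default: "All respondents")
            row_filter_column_name: label written to the `row_filter` column
                (default: "All respondents")
            debug: if True, performs a dry run and skips saving to file
            verbose: if True, prints the result and the target file name

    Returns:
        The resulting Polars DataFrame if debug is True, otherwise None.

    Raises:
        None
    """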

    ## Prepare Args/Kwargs
    class TDdefaultKwargs(TypedDict):
        group_column_name: str
        row_filter_column_name: str
        debug: bool
        verbose: bool

    defaultKwargs: TDdefaultKwargs = TDdefaultKwargs(
        group_column_name="All respondents",
        row_filter_column_name="All respondents",
        debug=False,
        verbose=False,
    )
    allKwargs: Mapping[Any, Any] = {**defaultKwargs, **kwargs}

    group_column_name: str = allKwargs["group_column_name"]
    row_filter_column_name: str = allKwargs["row_filter_column_name"]
    debug: bool = allKwargs["debug"]
    verbose: bool = allKwargs["verbose"]

    ## Prepare Base DataFrame
    df_out: pl.dataframe.frame.DataFrame = (
        df.get_column(value_counts_column_name)
        .value_counts()
        .with_columns(
            ## Convert raw counts to each value's percentage share of all rows
            (pl.col("counts") / df.height * 100).round(2),
            pl.col(value_counts_column_name).fill_null("None"),
        )
        .sort(by="counts", descending=True)
        .with_columns(
            group=pl.lit(group_column_name),
            row_filter=pl.lit(row_filter_column_name),
        )
    )

    ## Prepare Base Path
    path_bar: Path = output_data_path / "Bar Diagrams"
    path_bar.mkdir(parents=False, exist_ok=True)

    file_name: Path = Path(
        str(
            "bar-"
            + value_counts_column_name
            + "-"
            + group_column_name.replace(" ", "_")
            + "-"
            + row_filter_column_name.replace(" ", "_")
        )
    ).with_suffix(".parquet")

    if verbose:
        ic(df_out)
        print()
        ic(f"file name: {value_counts_column_name}/parquet/{file_name}")
        print()

    if debug:
        return df_out
    else:
        ## Prepare Base Path
        path_bar_parquet: Path = path_bar / value_counts_column_name / "parquet"
        path_bar_parquet.mkdir(parents=True, exist_ok=True)

        ## Save dataframe to parquet; the JSON for Flourish is assembled
        ## later by combining these parquet files
        ## Flourish accepts Excel, CSV, TSV, JSON, GeoJSON
        df_out.write_parquet(file=path_bar_parquet / file_name, use_pyarrow=True)


## --- Debug ---
df_test: pl.DataFrame = pl.DataFrame(
    {"metasyntactic": ["foo", None, "bar", "bar", "foo", "bar"]}
)

df_validate: pl.DataFrame = pl.DataFrame(
    {
        "metasyntactic": ["bar", "foo", "None"],
        "counts": [50.0, 33.33, 16.67],
        "group": ["All respondents", "All respondents", "All respondents"],
        "row_filter": ["All respondents", "All respondents", "All respondents"],
    }
)

from polars.testing import assert_frame_equal

assert_frame_equal(
    bar_maker(
        df=df_test, value_counts_column_name="metasyntactic", debug=True, verbose=True
    ),
    df_validate,
)

ic| df_out: shape: (3, 4)
            ┌───────────────┬────────┬─────────────────┬─────────────────┐
            │ metasyntactic ┆ counts ┆ group           ┆ row_filter      │
            │ ---           ┆ ---    ┆ ---             ┆ ---             │
            │ str           ┆ f64    ┆ str             ┆ str             │
            ╞═══════════════╪════════╪═════════════════╪═════════════════╡
            │ bar           ┆ 50.0   ┆ All respondents ┆ All respondents │
            │ foo           ┆ 33.33  ┆ All respondents ┆ All respondents │
            │ None          ┆ 16.67  ┆ All respondents ┆ All respondents │
            └───────────────┴────────┴─────────────────┴─────────────────┘
ic| f"file name: {value_counts_column_name}/parquet/{file_name}": ('file name: '
                                                                   'metasyntactic/parquet/bar-metasyntactic-All_respondents-All_respondents.parquet')






### Dry Run

In [17]:
_: Any = bar_maker(
    df=df, value_counts_column_name="dao_pillar", debug=True, verbose=True
)
_: Any = bar_maker(
    df=df, value_counts_column_name="years_of_experience", debug=True, verbose=True
)
_: Any = bar_maker(
    df=df, value_counts_column_name="career_level", debug=True, verbose=True
)

ic| df_out: shape: (4, 4)
            ┌──────────────────────────┬────────┬─────────────────┬─────────────────┐
            │ dao_pillar               ┆ counts ┆ group           ┆ row_filter      │
            │ ---                      ┆ ---    ┆ ---             ┆ ---             │
            │ str                      ┆ f64    ┆ str             ┆ str             │
            ╞══════════════════════════╪════════╪═════════════════╪═════════════════╡
            │ Data Engineering         ┆ 55.24  ┆ All respondents ┆ All respondents │
            │ Data Admin               ┆ 22.86  ┆ All respondents ┆ All respondents │
            │ Data Science & Analytics ┆ 18.1   ┆ All respondents ┆ All respondents │
            │ AFC                      ┆ 3.81   ┆ All respondents ┆ All respondents │
            └──────────────────────────┴────────┴─────────────────┴─────────────────┘
ic| f"file name: {value_counts_column_name}/parquet/{file_name}": ('file name: '
                                 





---             │
            │ str                 ┆ f64    ┆ str             ┆ str             │
            ╞═════════════════════╪════════╪═════════════════╪═════════════════╡
            │ <2 years            ┆ 52.38  ┆ All respondents ┆ All respondents │
            │ 2 to 4 years        ┆ 26.67  ┆ All respondents ┆ All respondents │
            │ None                ┆ 14.29  ┆ All respondents ┆ All respondents │
            │ 4 to 6 years        ┆ 5.71   ┆ All respondents ┆ All respondents │
            │ 8 to 10 years       ┆ 0.95   ┆ All respondents ┆ All respondents │
            └─────────────────────┴────────┴─────────────────┴─────────────────┘
ic| f"file name: {value_counts_column_name}/parquet/{file_name}": ('file name: '
                                                                   'years_of_experience/parquet/bar-years_of_experience-All_respondents-All_respondents.parquet')
ic| df_out: shape: (5, 4)
            ┌──────────────┬────────┬─────────────────┬──────────





┆ str             ┆ str             │
            ╞══════════════╪════════╪═════════════════╪═════════════════╡
            │ 2            ┆ 44.76  ┆ All respondents ┆ All respondents │
            │ 3            ┆ 24.76  ┆ All respondents ┆ All respondents │
            │ 1            ┆ 15.24  ┆ All respondents ┆ All respondents │
            │ 5            ┆ 7.62   ┆ All respondents ┆ All respondents │
            │ 4            ┆ 7.62   ┆ All respondents ┆ All respondents │
            └──────────────┴────────┴─────────────────┴─────────────────┘
ic| f"file name: {value_counts_column_name}/parquet/{file_name}": ('file name: '
                                                                   'career_level/parquet/bar-career_level-All_respondents-All_respondents.parquet')






### All Respondents

In [18]:
bar_maker(df=df, value_counts_column_name="dao_pillar")
bar_maker(df=df, value_counts_column_name="years_of_experience")
bar_maker(df=df, value_counts_column_name="career_level")

### By Category

`📝Note`: Give the writes time to finish. A cell may finish running before *Google Drive* has saved all of its files.
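
One way to guard the later combine cells is to check that the expected files are visible on the mounted path before reading them. The sketch below is an assumption, not part of the original notebook: `wait_for_files` is a hypothetical helper that polls until every path exists and is non-empty. It only checks the local mount and does not confirm that Drive has finished syncing to the cloud.

```python
import time
from pathlib import Path


def wait_for_files(paths, timeout=60.0, poll_interval=1.0):
    """Hypothetical helper: poll until every path exists and is non-empty.

    Returns the list of paths still missing when the timeout expires;
    an empty list means every file was found.
    """
    deadline = time.monotonic() + timeout
    pending = [Path(p) for p in paths]
    while pending:
        # Keep only the paths that are still absent or empty.
        pending = [
            p for p in pending if not (p.exists() and p.stat().st_size > 0)
        ]
        if not pending:
            break
        if time.monotonic() >= deadline:
            return pending
        time.sleep(poll_interval)
    return []
```

A call such as `wait_for_files(path_bar_parquet.glob("*.parquet"))` would fit between the writing and combining cells. In Colab, `drive.flush_and_unmount()` is another option when pending writes must reach Drive before continuing.
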

In [19]:
for _ in Edao_pillar:
    df_tmp = df.filter(pl.col("dao_pillar") == _.value)
    ic(df_tmp)

    ## Skip filtered empty dataframes
    if df_tmp.is_empty():
        continue

    ## Value Count Column: years_of_experience
    ## Group Column: dao_pillar
    ## Row Filter Column: career_level
    for __ in Ecareer_level:
        df_tmp_bar: pl.DataFrame = df_tmp.filter(pl.col("career_level") == __.value)
        if df_tmp_bar.is_empty():
            continue
        bar_maker(
            df=df_tmp_bar,
            value_counts_column_name="years_of_experience",
            group_column_name=_.value,
            row_filter_column_name=__.value,
        )

    ## Value Count Column: career_level
    ## Group Column: dao_pillar
    ## Row Filter Column: years_of_experience
    for __ in Eyears_of_experience:
        df_tmp_bar: pl.DataFrame = df_tmp.filter(
            pl.col("years_of_experience") == __.value
        )
        if df_tmp_bar.is_empty():
            continue
        bar_maker(
            df=df_tmp_bar,
            value_counts_column_name="career_level",
            group_column_name=_.value,
            row_filter_column_name=__.value,
        )

ic| df_tmp: shape: (24, 8)
            ┌────────────┬────────────┬────────────┬────────────┬────────────┬────────────┬──────────┬───────────┐
            │ used_git_h ┆ current_gi ┆ years_of_e ┆ past_next_ ┆ past_next_ ┆ career_lev ┆ dao_pill ┆ alias     │
            │ osting_pla ┆ t_hosting_ ┆ xperience  ┆ github     ┆ gitlab     ┆ el         ┆ ar       ┆ ---       │
            │ tform      ┆ platform   ┆ ---        ┆ ---        ┆ ---        ┆ ---        ┆ ---      ┆ str       │
            │ ---        ┆ ---        ┆ str        ┆ str        ┆ str        ┆ str        ┆ str      ┆           │
            │ str        ┆ str        ┆            ┆            ┆            ┆            ┆          ┆           │
            ╞════════════╪════════════╪════════════╪════════════╪════════════╪════════════╪══════════╪═══════════╡
            │ GitHub,    ┆ GitHub,    ┆ 4 to 6     ┆ Worked     ┆ Worked     ┆ 2          ┆ Data     ┆ Jeremy    │
            │ GitLab,    ┆ GitLab,    ┆ years      ┆ 

In [20]:
for _ in Eyears_of_experience:
    ## NOTE: .head() keeps only the first five rows of each filtered group
    df_tmp: pl.DataFrame = df.filter(pl.col("years_of_experience") == _.value).head()
    ic(df_tmp)

    ## Skip filtered empty dataframes
    if df_tmp.is_empty():
        continue

    ## Value Count Column: dao_pillar
    ## Group Column: years_of_experience
    ## Row Filter Column: career_level
    for __ in Ecareer_level:
        df_tmp_bar: pl.DataFrame = df_tmp.filter(pl.col("career_level") == __.value)
        if df_tmp_bar.is_empty():
            continue
        bar_maker(
            df=df_tmp_bar,
            value_counts_column_name="dao_pillar",
            group_column_name=_.value,
            row_filter_column_name=__.value,
        )

ic| df_tmp: shape: (5, 8)
            ┌────────────┬────────────┬────────────┬────────────┬────────────┬────────────┬──────────┬──────────┐
            │ used_git_h ┆ current_gi ┆ years_of_e ┆ past_next_ ┆ past_next_ ┆ career_lev ┆ dao_pill ┆ alias    │
            │ osting_pla ┆ t_hosting_ ┆ xperience  ┆ github     ┆ gitlab     ┆ el         ┆ ar       ┆ ---      │
            │ tform      ┆ platform   ┆ ---        ┆ ---        ┆ ---        ┆ ---        ┆ ---      ┆ str      │
            │ ---        ┆ ---        ┆ str        ┆ str        ┆ str        ┆ str        ┆ str      ┆          │
            │ str        ┆ str        ┆            ┆            ┆            ┆            ┆          ┆          │
            ╞════════════╪════════════╪════════════╪════════════╪════════════╪════════════╪══════════╪══════════╡
            │ GitLab     ┆ null       ┆ <2 years   ┆ Want to    ┆ Worked     ┆ 2          ┆ Data Eng ┆ Benjamin │
            │            ┆            ┆            ┆ work with

In [21]:
# Combine value counts parquets
# TODO: DRY code

path_bar_parquet: Path = output_data_path / "Bar Diagrams" / "dao_pillar" / "parquet"

df_tmp: pl.DataFrame = pl.DataFrame(
    schema={
        "dao_pillar": pl.Utf8,
        "counts": pl.Float64,
        "group": pl.Utf8,
        "row_filter": pl.Utf8,
    }
)

## Path objects need no `with` block; read each parquet file directly
for parquet_file in path_bar_parquet.glob("**/bar-dao_pillar-*.parquet"):
    df_pq = pl.read_parquet(source=parquet_file, use_pyarrow=True)
    ic(df_pq)
    df_tmp.extend(df_pq)

ic(df_tmp)  # .head()
df_tmp.write_json(
    file=output_data_path / "Bar Diagrams" / "dao_pillar" / "bar-dao_pillar-main.json",
    pretty=True,
    row_oriented=True,
)

ic| df_pq: shape: (4, 4)
           ┌──────────────────────────┬────────┬─────────────────┬─────────────────┐
           │ dao_pillar               ┆ counts ┆ group           ┆ row_filter      │
           │ ---                      ┆ ---    ┆ ---             ┆ ---             │
           │ str                      ┆ f64    ┆ str             ┆ str             │
           ╞══════════════════════════╪════════╪═════════════════╪═════════════════╡
           │ Data Engineering         ┆ 55.24  ┆ All respondents ┆ All respondents │
           │ Data Admin               ┆ 22.86  ┆ All respondents ┆ All respondents │
           │ Data Science & Analytics ┆ 18.1   ┆ All respondents ┆ All respondents │
           │ AFC                      ┆ 3.81   ┆ All respondents ┆ All respondents │
           └──────────────────────────┴────────┴─────────────────┴─────────────────┘
ic| df_pq: shape: (1, 4)
           ┌──────────────────┬────────┬──────────┬────────────┐
           │ dao_pillar       ┆ cou

In [22]:
# Combine value counts parquets

path_bar_parquet: Path = (
    output_data_path / "Bar Diagrams" / "years_of_experience" / "parquet"
)

df_tmp: pl.DataFrame = pl.DataFrame(
    schema={
        "years_of_experience": pl.Utf8,
        "counts": pl.Float64,
        "group": pl.Utf8,
        "row_filter": pl.Utf8,
    }
)

ic(df_tmp)

## Path objects need no `with` block; read each parquet file directly
for parquet_file in path_bar_parquet.glob("**/bar-years_of_experience-*.parquet"):
    df_pq = pl.read_parquet(source=parquet_file, use_pyarrow=True)
    ic(df_pq)
    df_tmp.extend(df_pq)

ic(df_tmp)  # .head()
df_tmp.write_json(
    file=output_data_path
    / "Bar Diagrams"
    / "years_of_experience"
    / "bar-years_of_experience-main.json",
    pretty=True,
    row_oriented=True,
)

ic| df_tmp: shape: (0, 4)
            ┌─────────────────────┬────────┬───────┬────────────┐
            │ years_of_experience ┆ counts ┆ group ┆ row_filter │
            │ ---                 ┆ ---    ┆ ---   ┆ ---        │
            │ str                 ┆ f64    ┆ str   ┆ str        │
            ╞═════════════════════╪════════╪═══════╪════════════╡
            └─────────────────────┴────────┴───────┴────────────┘
ic| df_pq: shape: (5, 4)
           ┌─────────────────────┬────────┬─────────────────┬─────────────────┐
           │ years_of_experience ┆ counts ┆ group           ┆ row_filter      │
           │ ---                 ┆ ---    ┆ ---             ┆ ---             │
           │ str                 ┆ f64    ┆ str             ┆ str             │
           ╞═════════════════════╪════════╪═════════════════╪═════════════════╡
           │ <2 years            ┆ 52.38  ┆ All respondents ┆ All respondents │
           │ 2 to 4 years        ┆ 26.67  ┆ All respondents ┆ All respond

In [23]:
# Combine value counts parquets

path_bar_parquet: Path = output_data_path / "Bar Diagrams" / "career_level" / "parquet"

df_tmp: pl.DataFrame = pl.DataFrame(
    schema={
        "career_level": pl.Utf8,
        "counts": pl.Float64,
        "group": pl.Utf8,
        "row_filter": pl.Utf8,
    }
)

ic(df_tmp)

## Path objects need no `with` block; read each parquet file directly
for parquet_file in path_bar_parquet.glob("**/bar-career_level-*.parquet"):
    df_pq = pl.read_parquet(source=parquet_file, use_pyarrow=True)
    ic(df_pq)
    df_tmp.extend(df_pq)

ic(df_tmp)  # .head()
df_tmp.write_json(
    file=output_data_path
    / "Bar Diagrams"
    / "career_level"
    / "bar-career_level-main.json",
    pretty=True,
    row_oriented=True,
)

ic| df_tmp: shape: (0, 4)
            ┌──────────────┬────────┬───────┬────────────┐
            │ career_level ┆ counts ┆ group ┆ row_filter │
            │ ---          ┆ ---    ┆ ---   ┆ ---        │
            │ str          ┆ f64    ┆ str   ┆ str        │
            ╞══════════════╪════════╪═══════╪════════════╡
            └──────────────┴────────┴───────┴────────────┘
ic| df_pq: shape: (5, 4)
           ┌──────────────┬────────┬─────────────────┬─────────────────┐
           │ career_level ┆ counts ┆ group           ┆ row_filter      │
           │ ---          ┆ ---    ┆ ---             ┆ ---             │
           │ str          ┆ f64    ┆ str             ┆ str             │
           ╞══════════════╪════════╪═════════════════╪═════════════════╡
           │ 2            ┆ 44.76  ┆ All respondents ┆ All respondents │
           │ 3            ┆ 24.76  ┆ All respondents ┆ All respondents │
           │ 1            ┆ 15.24  ┆ All respondents ┆ All respondents │
           
