# Pre-survey question items

0. Timestamp
1. 【Q1】What is your age ?
2. 【Q2】What gender do you identify as ?
3. 【Q3】Which geographical region are you currently working or attending school/university in ?
4. 【Q4】Which geographical region do you most strongly associate with ?
5. 【Q5】What is your job title ?
6. 【Q6】Which group do you belong to ?
7. 【Q7】What is your research type ?
8. 【Q8】How long have you been in this field ?
9. 【Q9】Are you satisfied with your career to date ?
10. 【Q10】How many hours, on average, do you spend on housework, childcare, and caregiving per day ?
11. 【Q11】Did you already sign up for the diversity session in ICRC2023?
12. 【Q12】What do you think about the initiatives on DE&I of your group? [Gender balance]
13. 【Q12】What do you think about the initiatives on DE&I of your group? [Diversity]
14. 【Q12】What do you think about the initiatives on DE&I of your group? [Equity]
15. 【Q12】What do you think about the initiatives on DE&I of your group? [Inclusion]
16. 【Q13】What is the percentage of female researcher in your group?
17. 【Q14】What do you think about the percentage above ?
18. 【Q15】Please let us know If your group has any good practice examples related to DE&I ?
19. 【Q16】Please let us know if there is anything your group needs to work on or if your group has any problems related to DE&I.
20. 【Q17】What are your thoughts on diversity, equity & inclusion initiatives ? [Gender balance]
21. 【Q17】What are your thoughts on diversity, equity & inclusion initiatives ? [Diversity]
22. 【Q17】What are your thoughts on diversity, equity & inclusion initiatives ? [Equity]
23. 【Q17】What are your thoughts on diversity, equity & inclusion initiatives ? [Inclusion]
24. 【Q18】Could you tell us more about your thoughts (agree / disagree) ?
25. 【Q19】When did you first become interested in science ?
26. 【Q20】Do you have any concerns / problems related to DE&I initiatives in science ?
27. 【Q21】What reasons do you think are hindering DE&I initiatives in science ?
28. 【Q22】Comments

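Python libraries
- ``pathlib.Path`` : path handling
- ``pandas`` : data aggregation
- ``altair (v5)`` : plotting
- ``titanite`` : configuration loading and categorical conversion (imported as ``ti`` below)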

In [None]:
from pathlib import Path
import pandas as pd
import altair as alt
import titanite as ti

print(f"Pandas {pd.__version__}")
print(f"Altair {alt.__version__}")
print(f"Titanite {ti.__version__}")

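Load the pre-survey results

- The response data is downloaded manually from Google Sheets in CSV format
- The file path is ``../data/test_data/`` (may change later)
- The response time (``timestamp`` column) is converted to datetime objects
- A ``response`` column is added for counting responses (mainly with ``sum``)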
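
The loading steps above can be sketched on a small in-memory CSV. This is only an illustration: the sample rows and the `q01` values are made up, and it assumes that the `response` column is a constant 1 per row, so that `sum` counts responses.

```python
import io
import pandas as pd

# Hypothetical stand-in for the CSV downloaded from Google Sheets
csv_text = """timestamp,q01
2023-07-01 09:00:00,30s
2023-07-01 09:05:00,40s
2023-07-02 10:30:00,30s
"""

# parse_dates converts the timestamp column to a datetime dtype on load
sample = pd.read_csv(io.StringIO(csv_text), parse_dates=["timestamp"])

# Assumption: one row is one response, so a constant 1 can be summed to count them
sample["response"] = 1

counts = sample.groupby("q01")["response"].sum()
print(sample.dtypes)
print(counts)
```

With this layout, any `groupby(...)["response"].sum()` returns the number of responses per category.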

In [None]:
f_cfg = "../sandbox/config.toml"
cfg = ti.Config(fname=f_cfg)
cfg.load()
category = cfg.categories()

In [None]:
f_csv = "../data/test_data/prepared_data.csv"
data = pd.read_csv(f_csv, parse_dates=["timestamp"])
data = ti.categorical_data(data, category)
data.info()

Add binned columns

In [None]:
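# Bins are left-closed (right=False): [-1, 0) holds the value -1, [0, 1) holds 0 <= x < 1, ...
data["q10_binned"] = pd.cut(
    data["q10"],
    bins=[-1, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 25],
    labels=["Prefer not to answer", "0", "1", "2", "3", "4", "5", "6", "7", "8", "9", "10+"],
    right=False,
)
# The last edge is 101 (not 100) so that an answer of exactly 100 is kept;
# with right=False a value equal to the last edge would otherwise become NaN
data["q13_binned"] = pd.cut(
    data["q13"],
    bins=[-1, 0, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 101],
    labels=["Prefer not to answer", "0", "10", "15", "20", "25", "30", "35", "40", "45", "50", "55", "60", "65", "70", "75", "80", "85", "90", "95"],
    right=False,
)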
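
To check how `right=False` assigns boundary values, here is a small sketch on made-up data. It assumes, as the bin edges in the cell above suggest, that `-1` encodes "Prefer not to answer". The sample values and the shortened edge list are illustrative only.

```python
import pandas as pd

hours = pd.Series([-1, 0, 0.5, 1, 3, 12])

# With right=False each bin is [left, right): the left edge is included
binned = pd.cut(
    hours,
    bins=[-1, 0, 1, 2, 3, 4, 25],
    labels=["Prefer not to answer", "0", "1", "2", "3", "4+"],
    right=False,
)
print(binned.tolist())
# ['Prefer not to answer', '0', '0', '1', '3', '4+']
```

Because the right edge is excluded, a value equal to the last edge falls outside every bin and becomes `NaN`.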

In [None]:
# Inspect the DataFrame (keep commented out when not needed)
# data.head()
# data.columns

In [None]:
#alt.Chart(data).mark_bar().encode(x="q1", y="count()")

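Responses

Create a histogram for every question

- Create a histogram with the answers to each question on the horizontal axis
- The horizontal axis should be categorical and sortable in an arbitrary order (TODO)
  - For now it is sorted alphabetically by default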
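
One possible way to address the TODO is an ordered categorical in pandas. The sketch below uses made-up job-title labels and a hypothetical `order` list. Neither is taken from the survey configuration.

```python
import pandas as pd

df = pd.DataFrame({
    "q05": ["Professor", "Student", "Postdoc", "Student"],
    "response": [1, 1, 1, 1],
})

# Hypothetical display order (not alphabetical)
order = ["Student", "Postdoc", "Professor"]
df["q05"] = pd.Categorical(df["q05"], categories=order, ordered=True)

# Grouping now follows the category order instead of the alphabet
grouped = df.groupby("q05", observed=False)["response"].sum().reset_index()
print(grouped)
```

In Altair, the same list can be passed to the `sort` argument of the axis encoding, for example `alt.X("q05:N", sort=order)`.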

In [None]:
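def make_histogram(data: pd.DataFrame, x: str, title: str) -> dict:
    """Count responses per category of ``x``, split by gender (``q02``).

    Returns the aggregated table and a chart that places the stacked counts
    next to the normalized version.
    """
    # Group by gender and x; dict.fromkeys drops the duplicate when x == "q02"
    # while keeping a deterministic order
    g = list(dict.fromkeys(["q02", x]))
    grouped = data.groupby(g)["response"].sum().reset_index().sort_values(by=x)

    # Stacked bar chart of absolute counts, colored by gender
    h = alt.Chart(grouped).mark_bar().encode(
        alt.X(x),
        alt.Y("response"),
        alt.Color("q02:N"),
        alt.Order("response", sort="descending"),
    ).properties(
        title=title,
        width=400,
        height=400,
    )

    # Same chart with every bar normalized to 1
    hs = h.encode(
        alt.Y("response").stack("normalize"),
    )

    # Place the two charts side by side
    chart = h | hs

    return {"data": grouped, "chart": chart}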

In [None]:
category["gender"]

In [None]:
t = cfg.questions.get("q10", "Could not get title")
q = make_histogram(data, "q10_binned", t)
display(q["data"])  # only the last expression of a cell is shown automatically
q["chart"]

In [None]:
# List the columns used to build all pairwise combinations
# Note: timestamp and response are excluded
# Note: free-text answers are excluded (derived values such as polarity are kept)

headers = [
    "q01",
    "q02",
    "q03", "q03_regional", "q03_subregional",
    "q04", "q04_regional", "q04_subregional",
    "q05",
    "q06",
    "q07",
    "q08",
    "q09",
    "q10", "q10_binned",
    "q11",
    "q12_diversity", "q12_equity", "q12_genderbalance", "q12_inclusion",
    "q13", "q13_binned",
    "q14", 
    "q15_polarity", "q15_subjectivity",
    "q16_polarity", "q16_subjectivity",
    "q17_diversity", "q17_equity", "q17_genderbalance", "q17_inclusion",
    "q18_polarity", "q18_subjectivity",
    "q19",
    "q20_polarity", "q20_subjectivity",
    "q21_polarity", "q21_subjectivity",
    "q22_polarity", "q22_subjectivity",
]

In [None]:
#for (h1, h2) in matches:
#    key = h1.split("_")[0]
#    print(h1, h2, key)

In [None]:
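out_dir = Path("../data/quick_summary")
out_dir.mkdir(parents=True, exist_ok=True)  # make sure the output directory exists

for header in headers:
    # "q10_binned" -> "q10": the question key is the part before the first underscore
    key = header.split("_")[0]
    title = cfg.questions.get(key, "Could Not Find Key")
    hist = make_histogram(data, header, title)
    csv_path = out_dir / f"tmp_{header}.csv"
    hist["data"].to_csv(csv_path, index=False)
    # PNG export needs an extra rendering package (e.g. vl-convert-python)
    png_path = out_dir / f"tmp_{header}.png"
    hist["chart"].save(str(png_path))
    print(f"Saved as {csv_path} and {png_path}")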

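Cross-tabulate two questions

- Cross-tabulate every pair of columns
- The table is transposed (``.T``) so that the header column is easier to read in the CSV output
  - There may be a way to make this code more readable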
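
As one candidate for a more readable alternative, swapping the two arguments of `pd.crosstab` gives the same table as transposing afterwards. The sketch below uses made-up answers, and whether this form fits the rest of the workflow is an assumption.

```python
import pandas as pd

df = pd.DataFrame({
    "q01": ["30s", "30s", "40s", "40s"],
    "q02": ["Female", "Male", "Male", "Male"],
})

# Cross-tabulate, then transpose so that q02 becomes the row index
transposed = pd.crosstab(df["q01"], df["q02"], margins=True).T

# Swapping the arguments gives the same table without the explicit .T
swapped = pd.crosstab(df["q02"], df["q01"], margins=True)

print(transposed)
print(swapped)
```

With `margins=True`, both tables include an `All` row and an `All` column holding the totals.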

In [None]:
import itertools
matches = list(itertools.combinations(headers, 2))
len(matches)
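
`itertools.combinations` yields each unordered pair once, so n columns give n(n-1)/2 pairs. The sketch below, on a shortened hypothetical column list, also shows an optional filter that skips pairs derived from the same question (such as `q10` and `q10_binned`). The helper `question_key` is an illustrative name and not part of this notebook.

```python
import itertools

cols = ["q01", "q10", "q10_binned", "q13", "q13_binned"]

# All unordered pairs of columns: 5 * 4 / 2 = 10
pairs = list(itertools.combinations(cols, 2))


def question_key(name: str) -> str:
    """Return the question part of a column name, e.g. 'q10_binned' -> 'q10'."""
    return name.split("_")[0]


# Hypothetical filter: drop pairs whose columns belong to the same question
cross_pairs = [(a, b) for a, b in pairs if question_key(a) != question_key(b)]
print(len(pairs), len(cross_pairs))
# 10 8
```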

In [None]:
ctabs = {}
for h1, h2 in matches:
    ctab = pd.crosstab(data[h1], data[h2], margins=True)
    name = f"{h1}_{h2}"
    fname = f"../data/quick_summary/tmp_crosstab_{name}.csv"
    # Transpose so that h2 becomes the row index in the CSV
    ctab.T.to_csv(fname)
    print(f"Saved as {fname}")
    ctabs[name] = ctab