# Questions in the pre-event survey

0. Timestamp
1. 【Q1】What is your age ?
2. 【Q2】What gender do you identify as ?
3. 【Q3】Which geographical region are you currently working or attending school/university in ?
4. 【Q4】Which geographical region do you most strongly associate with ?
5. 【Q5】What is your job title ?
6. 【Q6】Which group do you belong to ?
7. 【Q7】What is your research type ?
8. 【Q8】How long have you been in this field ?
9. 【Q9】Are you satisfied with your career to date ?
10. 【Q10】How many hours, on average, do you spend on housework, childcare, and caregiving per day ?
11. 【Q11】Did you already sign up for the diversity session in ICRC2023?
12. 【Q12】What do you think about the initiatives on DE&I of your group? [Gender balance]
13. 【Q12】What do you think about the initiatives on DE&I of your group? [Diversity]
14. 【Q12】What do you think about the initiatives on DE&I of your group? [Equity]
15. 【Q12】What do you think about the initiatives on DE&I of your group? [Inclusion]
16. 【Q13】What is the percentage of female researcher in your group?
17. 【Q14】What do you think about the percentage above ?
18. 【Q15】Please let us know If your group has any good practice examples related to DE&I ?
19. 【Q16】Please let us know if there is anything your group needs to work on or if your group has any problems related to DE&I.
20. 【Q17】What are your thoughts on diversity, equity & inclusion initiatives ? [Gender balance]
21. 【Q17】What are your thoughts on diversity, equity & inclusion initiatives ? [Diversity]
22. 【Q17】What are your thoughts on diversity, equity & inclusion initiatives ? [Equity]
23. 【Q17】What are your thoughts on diversity, equity & inclusion initiatives ? [Inclusion]
24. 【Q18】Could you tell us more about your thoughts (agree / disagree) ?
25. 【Q19】When did you first become interested in science ?
26. 【Q20】Do you have any concerns / problems related to DE&I initiatives in science ?
27. 【Q21】What reasons do you think are hindering DE&I initiatives in science ?
28. 【Q22】Comments

Python libraries
- ``pathlib.Path`` : path handling
- ``pandas`` : data aggregation
- ``altair (v5)`` : plotting

In [1]:
from pathlib import Path
import pandas as pd
import altair as alt
import titanite as ti

print(f"Pandas {pd.__version__}")
print(f"Altair {alt.__version__}")
print(f"Titanite {ti.__version__}")

Pandas 2.0.3
Altair 5.0.1
Titanite 0.1.2


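Load the pre-event survey results

- The response data is downloaded manually from Google Sheets in CSV format
- The file path is ``../data/test_data/`` (may change later)
- The response time (``timestamp`` column) is converted to a datetime object
- A ``response`` column is added for counting responses (mainly with ``sum``)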
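
A minimal sketch of the timestamp parsing and the ``response`` counting column, using an in-memory CSV instead of the real file. The sample rows and the choice of ``response = 1`` per row are assumptions for illustration; in this notebook the column may instead be added inside ``ti.categorical_data``.

```python
import io
import pandas as pd

# Hypothetical sample standing in for the downloaded CSV
csv_text = """timestamp,q1,q2
2023-07-01 09:15:00,30s,Female
2023-07-01 09:40:00,40s,Male
2023-07-02 21:05:00,30s,Male
"""

# parse_dates converts the timestamp column to a datetime dtype
sample = pd.read_csv(io.StringIO(csv_text), parse_dates=["timestamp"])

# One row is one response, so a constant 1 can be summed to count responses
sample["response"] = 1

counts = sample.groupby("q2")["response"].sum()
print(sample.dtypes["timestamp"])
print(counts.to_dict())
```

With a constant column like this, any ``groupby(...)["response"].sum()`` gives the number of responses per group.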

In [2]:
f_cfg = "../sandbox/config.toml"
cfg = ti.Config(fname=f_cfg)
cfg.load()
category = cfg.categories()

In [3]:
f_csv = "../data/test_data/all.csv"
data = pd.read_csv(f_csv, parse_dates=["timestamp"])
data = ti.categorical_data(data, category)
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 279 entries, 0 to 278
Data columns (total 52 columns):
 #   Column             Non-Null Count  Dtype         
---  ------             --------------  -----         
 0   timestamp          279 non-null    datetime64[ns]
 1   q1                 279 non-null    category      
 2   q2                 279 non-null    category      
 3   q3                 279 non-null    category      
 4   q4                 279 non-null    category      
 5   q5                 279 non-null    category      
 6   q6                 279 non-null    category      
 7   q7                 255 non-null    category      
 8   q8                 279 non-null    category      
 9   q9                 279 non-null    category      
 10  q10                279 non-null    int64         
 11  q11                279 non-null    category      
 12  q12_genderbalance  279 non-null    category      
 13  q12_diversity      279 non-null    category      
 14  q12_equity

Add binned columns

In [4]:
data["q10_binned"] = pd.cut(
    data["q10"],
    [-1, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 25],
    labels = ["Prefer not to answer", "0", "1", "2", "3", "4", "5", "6", "7", "8", "9", "10+"],
    right=False)
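data["q13_binned"] = pd.cut(
    data["q13"],
    # right=False excludes the right edge of each bin, so the last edge is 101:
    # with 100 as the last edge, a value of exactly 100 would become NaN
    [-1, 0, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 101],
    labels = ["Prefer not to answer", "0", "10", "15", "20", "25", "30", "35", "40", "45", "50", "55", "60", "65", "70", "75", "80", "85", "90", "95"],
    right=False)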
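
To make the binning behaviour concrete, here is a small sketch with made-up values for ``q10``. It assumes that ``-1`` encodes "Prefer not to answer", which is how the first bin above appears to be used. With ``right=False`` each bin is closed on the left and open on the right, so a value equal to the last edge falls outside every bin.

```python
import pandas as pd

# Hypothetical hours per day; -1 is assumed to mean "Prefer not to answer"
hours = pd.Series([-1, 0, 3, 10, 24])

edges = [-1, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 25]
labels = ["Prefer not to answer"] + [str(i) for i in range(10)] + ["10+"]

# Bins are [-1, 0), [0, 1), ..., [9, 10), [10, 25)
binned = pd.cut(hours, edges, labels=labels, right=False)
print(binned.tolist())

# A value equal to the last edge is not covered by any bin and becomes NaN
outside = pd.cut(pd.Series([25]), edges, labels=labels, right=False)
print(outside.isna().all())
```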

In [5]:
# Inspect the data frame (keep commented out when not needed)
# data.head()
# data.columns

In [6]:
#alt.Chart(data).mark_bar().encode(x="q1", y="count()")

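Look at the response times

- Tried a [bubble plot](https://altair-viz.github.io/gallery/histogram_scatterplot.html) first, but it did not work well, so switched to a heatmap
- Use the ``timestamp`` column to examine the distribution of response times
- The recorded timestamps are probably in UTC+09:00, but this has not been verified
- Set the data frame index to ``timestamp`` so that responses can be resampled and summed per time interval
- Add columns for the date and the hour, which are needed for a date-by-hour heatmap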
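
A minimal sketch of the resampling step on made-up timestamps, including the date and hour columns that are commented out in the cell below. The sample data and the column names ``date`` and ``hour`` are assumptions for illustration. The rule is passed as an offset object, equivalent to the ``"H"`` alias, to avoid alias spelling differences between pandas versions.

```python
import pandas as pd

# Hypothetical response times; each row counts as one response
times = pd.to_datetime([
    "2023-07-01 09:15", "2023-07-01 09:40",
    "2023-07-01 11:05", "2023-07-02 21:30",
])
sample = pd.DataFrame({"response": 1}, index=pd.DatetimeIndex(times, name="timestamp"))

# Sum responses per hour; hours without any response get a count of 0
hourly = sample.resample(pd.offsets.Hour())["response"].sum().reset_index()

# Separate columns for the two heatmap axes
hourly["date"] = hourly["timestamp"].dt.normalize()
hourly["hour"] = hourly["timestamp"].dt.hour

busiest = hourly.loc[hourly["response"].idxmax()]
print(busiest["date"].date(), busiest["hour"], busiest["response"])
```

Altair can also derive both axes directly from ``timestamp`` with time units such as ``hours()`` and ``yearmonthdate()``, in which case the extra columns are not needed.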

In [7]:
data.index = data["timestamp"]
tm = data.resample("H")["response"].sum().reset_index()
# tm["yyyymmdd"] = pd.to_datetime(tm["timestamp"].dt.date)
# tm["HH"] = tm["timestamp"].dt.hour
# tm.info()

In [8]:
#tm

Responses

In [9]:
base = alt.Chart(tm).encode(
    alt.X("hours(timestamp):T").title("Time of day"),
    alt.Y("yearmonthdate(timestamp):T").title("Response date").axis(format="%Y-%m-%d").sort("descending"),
).properties(
    width=800,
    height=400,
)


hm = base.mark_rect().encode(
    alt.Color("response").scale(scheme="blues").title("Number of responses"),
    #size="response"
)

tt = base.mark_text(baseline="line-top").encode(
    alt.Text("response"),
)


(hm + tt).save("tmp_registered.png")
(hm + tt)

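Create histograms for all questions

- Create a histogram whose x-axis is the column for a given question number
- The x-axis should be categorical data that can be arranged in an arbitrary order (TODO)
  - For now it is sorted alphabetically by default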
In [10]:
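def make_histogram(data: pd.DataFrame, x: str, title: str):
    """Return the counts of column x split by gender (q2) and a pair of bar charts."""
    # Group by gender (q2) and x; the set removes the duplicate when x is "q2"
    g = list(set(["q2", x]))
    grouped = data.groupby(g)["response"].sum().reset_index().sort_values(by=x)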
    h = alt.Chart(grouped).mark_bar().encode(
        alt.X(x),
        alt.Y("response"),
        alt.Color("q2:N"),
        alt.Order("response", sort="descending"),
    ).properties(
        title=title,
        width=400,
        height=400,
    )

    hs = h.encode(
        alt.Y("response").stack("normalize"),
    )

    chart = h | hs

    return {"data": grouped, "chart": chart}

In [11]:
category["gender"]

CategoricalDtype(categories=['Male', 'Female', 'Non-binary', 'Prefer to self-identify',
                  'Prefer not to answer'],
, ordered=True)

In [12]:
t = cfg.questions.get("q10", "Could not get title")
q = make_histogram(data, "q10_binned", t)
q["data"]
q["chart"]

In [13]:
# sorted(data.columns)[:-2]

In [14]:
# Loop over all columns,
# excluding timestamp and response (the last two in sorted order)
cols = sorted(data.columns)[:-2]
for col in cols:
    key = col.split("_")[0]
    title = cfg.questions.get(key, "Could Not Find Key")
    hist = make_histogram(data, col, title)
    fname = f"../data/quick_summary/tmp_{col}.csv"
    hist["data"].to_csv(fname, index=False)
    fname = f"../data/quick_summary/tmp_{col}.png"
    hist["chart"].save(fname)
    print(f"Saved as {fname}")
    

Saved as ../data/quick_summary/tmp_q1.png
Saved as ../data/quick_summary/tmp_q10.png
Saved as ../data/quick_summary/tmp_q10_binned.png
Saved as ../data/quick_summary/tmp_q11.png
Saved as ../data/quick_summary/tmp_q12_diversity.png
Saved as ../data/quick_summary/tmp_q12_equity.png
Saved as ../data/quick_summary/tmp_q12_genderbalance.png
Saved as ../data/quick_summary/tmp_q12_inclusion.png
Saved as ../data/quick_summary/tmp_q13.png
Saved as ../data/quick_summary/tmp_q13_binned.png
Saved as ../data/quick_summary/tmp_q14.png
Saved as ../data/quick_summary/tmp_q15.png
Saved as ../data/quick_summary/tmp_q15_ja.png
Saved as ../data/quick_summary/tmp_q15_polarity.png
Saved as ../data/quick_summary/tmp_q15_subjectivity.png
Saved as ../data/quick_summary/tmp_q16.png
Saved as ../data/quick_summary/tmp_q16_ja.png
Saved as ../data/quick_summary/tmp_q16_polarity.png
Saved as ../data/quick_summary/tmp_q16_subjectivity.png
Saved as ../data/quick_summary/tmp_q17_diversity.png
Saved as ../data/quick_sum

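Cross-tabulate pairs of questions

- Cross-tabulate every pair of columns
- The table is transposed (``.T``) before saving so that the header column of the CSV is easier to read
  - There may be a way to make this code more readable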
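
A small sketch of what ``pd.crosstab`` and the transpose do, using made-up answers to two questions. The sample values are assumptions for illustration.

```python
import pandas as pd

# Hypothetical answers to two questions
answers = pd.DataFrame({
    "q1": ["30s", "30s", "40s", "40s", "40s"],
    "q2": ["Female", "Male", "Male", "Male", "Female"],
})

# Rows are q1 values, columns are q2 values, cells are response counts
ctab = pd.crosstab(answers["q1"], answers["q2"])

# Transposing swaps the roles: q2 values become the row labels
print(ctab.T)
```

Since the table for ``(c2, c1)`` is the transpose of the table for ``(c1, c2)``, looping over ``itertools.combinations(cols, 2)`` would be one way to halve the number of files, at the cost of dropping the self-pairs.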

In [15]:
cols = sorted(data.columns)[:-2]
ctabs = {}
for c1 in cols:
    for c2 in cols:
        ctab = pd.crosstab(data[c1], data[c2])
        name = f"{c1}_{c2}"
        fname = f"../data/quick_summary/tmp_crosstab_{name}.csv"
        ctab.T.to_csv(fname)
        print(f"Saved as {fname}")
        ctabs[name] = ctab

Saved as ../data/quick_summary/tmp_crosstab_q1_q1.csv
Saved as ../data/quick_summary/tmp_crosstab_q1_q10.csv
Saved as ../data/quick_summary/tmp_crosstab_q1_q10_binned.csv
Saved as ../data/quick_summary/tmp_crosstab_q1_q11.csv
Saved as ../data/quick_summary/tmp_crosstab_q1_q12_diversity.csv
Saved as ../data/quick_summary/tmp_crosstab_q1_q12_equity.csv
Saved as ../data/quick_summary/tmp_crosstab_q1_q12_genderbalance.csv
Saved as ../data/quick_summary/tmp_crosstab_q1_q12_inclusion.csv
Saved as ../data/quick_summary/tmp_crosstab_q1_q13.csv
Saved as ../data/quick_summary/tmp_crosstab_q1_q13_binned.csv
Saved as ../data/quick_summary/tmp_crosstab_q1_q14.csv
Saved as ../data/quick_summary/tmp_crosstab_q1_q15.csv
Saved as ../data/quick_summary/tmp_crosstab_q1_q15_ja.csv
Saved as ../data/quick_summary/tmp_crosstab_q1_q15_polarity.csv
Saved as ../data/quick_summary/tmp_crosstab_q1_q15_subjectivity.csv
Saved as ../data/quick_summary/tmp_crosstab_q1_q16.csv
Saved as ../data/quick_summary/tmp_crosst