# 3. Analyse unanswered questions by department

The third stage of our analysis is to look at unanswered questions.

The big departments with relatively high 'not held' rates are the DHSC, DWP, MoJ and Home Office: these departments answer a high proportion of requests for quantitative information by saying the data is "not held".

```python
import pandas as pd

%matplotlib inline
pd.set_option('display.max_colwidth', None)
```

```python
df = pd.read_csv('./data/output/questions_with_flags.csv', low_memory=False)
len(df)
```

The dataset contains 203,938 rows, one per written question.

## Which department gets the most quantitative questions?

- As an absolute number: the DHSC, Home Office, MoD, DWP and MoJ, with the DfE almost level with the MoJ.
- As a proportion of all their written questions (WQs): the Attorney General, MoJ, MoD, DWP and Home Office.

This suggests particularly high demand for quantitative and statistical information from these departments.
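The two rankings answer different questions, and pandas offers a third measure that is easy to confuse with them. A minimal sketch on an invented toy DataFrame (departments `A`, `B` and `C` are hypothetical, not taken from the data): `value_counts()` on the quantitative subset gives absolute counts, `value_counts(normalize=True)` gives each department's share of *all* quantitative questions, and the mean of the boolean `is_quantitative` flag within each department gives the proportion of that department's own WQs.

```python
import pandas as pd

# Invented toy data: one row per written question
toy = pd.DataFrame({
    "department": ["A"] * 6 + ["B"] * 2 + ["C"],
    "is_quantitative": [True, True, True, False, False, False, True, True, False],
})

quant = toy[toy.is_quantitative]

# Absolute number of quantitative questions per department (C has none, so it is absent)
absolute = quant.department.value_counts()

# Each department's share of all quantitative questions (sums to 1)
share_of_all_quant = quant.department.value_counts(normalize=True)

# Percentage of each department's own questions that are quantitative
proportion_within = toy.groupby("department").is_quantitative.mean() * 100

print(absolute, share_of_all_quant, proportion_within, sep="\n\n")
```

Here `A` leads on absolute count while `B` leads on proportion, which is the same kind of divergence seen between the two bullet points. On the real data, only the last expression corresponds to "proportion of all their WQs".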

```python
df_quantitative = df[df.is_quantitative]
```
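`df[df.is_quantitative]` only works as a filter if `is_quantitative` is a genuine boolean column. A minimal sketch, assuming (not confirmed for this dataset) that a flag value could be blank in the CSV: a missing value stops pandas from storing the column as plain `bool`, and `.eq(True)` is one way to coerce it back, treating missing flags as `False`.

```python
import io

import pandas as pd

# Hypothetical CSV extract in which one flag value is missing
csv_text = "department,is_quantitative\nA,True\nB,\nC,False\n"
toy = pd.read_csv(io.StringIO(csv_text))

# The missing value means the column is no longer a clean boolean mask
print(toy.is_quantitative.dtype)

# Treat missing flags as False; .eq(True) returns a plain bool Series
toy["is_quantitative"] = toy.is_quantitative.eq(True)

quantitative_only = toy[toy.is_quantitative]
print(quantitative_only)
```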

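```python
# Note: normalize=True gives each department's share of all quantitative questions,
# not the proportion of its own WQs that are quantitative.
# pd.concat(
#     [df_quantitative.department.value_counts().head(10),
#      df_quantitative.department.value_counts(normalize=True).head(10)],
#     axis=1)
```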

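Among departments that received more than 100 WQs, the Attorney General and the MoJ have the highest proportions of quantitative questions (about 37% and 36% respectively), followed by the MoD, DWP and Home Office (26 to 30%). The Scotland Office, Cabinet Office and DfE come next.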

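```python
# Count quantitative questions and all questions per department
temp = df.groupby(df.department).agg(
    total_quant=('is_quantitative', 'sum'),
    total_questions=('is_quantitative', 'size'),
)
temp['proportion_quant'] = temp.total_quant / temp.total_questions * 100
# Ignore departments with 100 or fewer questions, then rank by percentage
temp[temp.total_questions > 100].sort_values("proportion_quant", ascending=False)
```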

| department | total_quant | total_questions | proportion_quant (%) |
|---|---|---|---|
| Attorney General | 251 | 679 | 36.966127 |
| Ministry of Justice | 2545 | 7080 | 35.946328 |
| Ministry of Defence | 3081 | 10172 | 30.289029 |
| Department for Work and Pensions | 3015 | 10708 | 28.156518 |
| Home Office | 4269 | 16181 | 26.382795 |
| Scotland Office | 110 | 462 | 23.809524 |
| Cabinet Office | 1125 | 4789 | 23.491334 |
| Department for Education | 2554 | 13135 | 19.444233 |
| House of Commons Commission | 48 | 256 | 18.75 |
| Treasury | 2144 | 12138 | 17.663536 |


## Proportion unanswered by department

Examine the proportion of answers that came back saying "we do not hold this data", by department.

This needs to be approached with caution because, as noted previously, the 'not held' flag produces both false positives and false negatives.

But it gives us a rough idea!
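Why a string-based flag misfires is easiest to see with a toy matcher. The rule that actually produced `contains_not_held_string` is not shown in this section, so the pattern `NOT_HELD_PATTERN` and helper `contains_not_held` below are hypothetical, a sketch of the general technique (phrase matching on answer text) rather than the method used here.

```python
import re

# Hypothetical phrase list, for illustration only
NOT_HELD_PATTERN = re.compile(
    r"\b(?:is|are) not (?:held|collected)\b|\bdoes not hold\b|\bnot held centrally\b",
    re.IGNORECASE,
)


def contains_not_held(answer: str) -> bool:
    """Return True if the answer matches any of the hypothetical 'not held' phrases."""
    return bool(NOT_HELD_PATTERN.search(answer))


examples = [
    # Genuine refusal: correctly flagged
    "This information is not held centrally.",
    # Figures are supplied despite the phrase: a false positive
    "The Department does not hold a target for this, but there were 1,250 applications in 2020.",
    # Refusal worded differently: a false negative
    "The department does not collect this data.",
]
for text in examples:
    print(contains_not_held(text), text)
```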

```python
# Aggregate the per-question flags into per-department totals
df_by_department = df.groupby("department").agg({
    'contains_not_held_string': 'sum',
    'is_quantitative': 'sum',
    'contains_not_held_string_and_isquant': 'sum',
    'url': 'count',
})
df_by_department.rename(columns={"url": "total_questions"}, inplace=True)

# Percentage of all WQs that are quantitative
df_by_department["percent_wqs_quantitative"] = df_by_department.is_quantitative / df_by_department.total_questions * 100
# Percentage of all WQs whose answer contains a 'not held' string
df_by_department["percent_wqs_notheld"] = df_by_department.contains_not_held_string / df_by_department.total_questions * 100
# Percentage of quantitative WQs whose answer contains a 'not held' string
df_by_department["percent_of_quant_wqs_notheld"] = df_by_department.contains_not_held_string_and_isquant / df_by_department.is_quantitative * 100
df_by_department.sort_values(by="percent_of_quant_wqs_notheld", ascending=False, inplace=True)
```
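The percentage columns rest on different denominators, which is easy to mix up. A toy example with invented totals for two hypothetical departments, `A` and `B`, assuming `percent_wqs_notheld` is meant as a share of all WQs and `percent_of_quant_wqs_notheld` as a share of quantitative WQs only:

```python
import pandas as pd

# Invented per-department totals, shaped like df_by_department
toy = pd.DataFrame(
    {
        "contains_not_held_string": [3, 1],
        "is_quantitative": [4, 5],
        "contains_not_held_string_and_isquant": [2, 1],
        "total_questions": [10, 20],
    },
    index=pd.Index(["A", "B"], name="department"),
)

# Denominator: every question
toy["percent_wqs_quantitative"] = toy.is_quantitative / toy.total_questions * 100
toy["percent_wqs_notheld"] = toy.contains_not_held_string / toy.total_questions * 100
# Denominator: quantitative questions only
toy["percent_of_quant_wqs_notheld"] = (
    toy.contains_not_held_string_and_isquant / toy.is_quantitative * 100
)
print(toy.round(1))
```

For `A`, 30% of all WQs are flagged 'not held', but 50% of its quantitative WQs are, so the two measures can rank departments differently.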

Among the big departments, the DHSC, Home Office, DWP and MoJ have relatively high 'not held' rates for quantitative questions (roughly 31% to 42%).

The big departments with relatively low 'not held' rates include the DfT and the Treasury.
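Ranking by rate can put a small department at the top on very few questions: in the table that follows, the Church Commissioners lead on only 30 quantitative WQs. One way to make this visible, sketched here as an addition rather than part of the original analysis, is a Wilson score interval for each proportion. The 'not held' counts used below are back-calculated from that table (46.666667% of 30 is 14; 42.138459% of 6,659 is 2,806), and the helper `wilson_interval` is hypothetical.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Approximate 95% Wilson score interval for a binomial proportion, in percent."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return 100 * (centre - half_width), 100 * (centre + half_width)


# Counts back-calculated from the table (rounded to whole questions)
for name, not_held, quant in [
    ("Church Commissioners", 14, 30),
    ("Department of Health and Social Care", 2806, 6659),
]:
    low, high = wilson_interval(not_held, quant)
    print(f"{name}: {100 * not_held / quant:.1f}% (95% CI {low:.1f} to {high:.1f}%)")
```

The Church Commissioners' interval is far wider than the DHSC's, which is one reason to concentrate on the big departments.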

```python
cols = ["total_questions", "is_quantitative", "percent_of_quant_wqs_notheld"]
(df_by_department[df_by_department.total_questions > 100][cols]
    .sort_values("percent_of_quant_wqs_notheld", ascending=False))
```

| department | total_questions | is_quantitative | percent_of_quant_wqs_notheld (%) |
|---|---|---|---|
| Church Commissioners | 264 | 30 | 46.666667 |
| Department of Health and Social Care | 39376 | 6659 | 42.138459 |
| Home Office | 16181 | 4269 | 34.340595 |
| Attorney General | 679 | 251 | 33.864542 |
| Department for Work and Pensions | 10708 | 3015 | 33.864013 |
| Ministry of Justice | 7080 | 2545 | 30.844794 |
| Leader of the House | 144 | 23 | 30.434783 |
| Department for Education | 13135 | 2554 | 26.781519 |
| Ministry of Housing, Communities and Local Government | 5012 | 784 | 23.086735 |
| Treasury | 12138 | 2144 | 20.848881 |


```python
# department is the index after the groupby, so keep the index when writing the CSV
df_by_department.to_csv("./data/output/by_department.csv")
```
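After the `groupby`, `department` lives in the index of `df_by_department`, so the index has to be written out for the CSV to keep the department names; `index=False` would discard them. A small round-trip sketch on a toy stand-in (the in-memory buffer replaces the real output path):

```python
import io

import pandas as pd

# Toy stand-in for df_by_department: department is the index, not a column
toy = pd.DataFrame(
    {"total_questions": [679, 7080]},
    index=pd.Index(["Attorney General", "Ministry of Justice"], name="department"),
)

# With index=False the department names are silently dropped
without_index = toy.to_csv(index=False)

# Default to_csv writes the index as the first column, headed "department"
buffer = io.StringIO()
toy.to_csv(buffer)
buffer.seek(0)

# Reading back with index_col restores the department index
restored = pd.read_csv(buffer, index_col="department")
print(restored)
```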