# Unfounded sexual assault claims
*June 15, 2022*

Statistics Canada has data on unfounded Criminal Code violations? This is news to me. Let's check it out by looking at level 1 sexual assault, the lower-severity category.

First, import pandas for data analysis and some modules for dealing with zip files.

In [22]:
import pandas as pd
from zipfile import ZipFile
import requests
from io import BytesIO

Read in our zipped data and pull the actual data out of it.

In [23]:
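# Download the zipped table into memory.
r = requests.get("https://www150.statcan.gc.ca/n1/en/tbl/csv/35100177-eng.zip?st=oN_UFW50")
r.raise_for_status()  # stop here if the download failed
files = ZipFile(BytesIO(r.content))
# Open the first file in the archive without extracting it to disk.
file = files.open(files.namelist()[0])
raw = pd.read_csv(file, encoding="utf-8")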

raw.head(5)



Unnamed: 0,REF_DATE,GEO,DGUID,Violations,Statistics,UOM,UOM_ID,SCALAR_FACTOR,SCALAR_ID,VECTOR,COORDINATE,VALUE,STATUS,SYMBOL,TERMINATED,DECIMALS
0,1998,Canada,2016A000011124,"Total, all violations [0]",Actual incidents,Number,223,units,0,v44348247,1.1.1,2688540.0,,,,0
1,1998,Canada,2016A000011124,"Total, all violations [0]","Rate per 100,000 population",Rate,257,units,0,v44396346,1.1.2,8915.12,,,,2
2,1998,Canada,2016A000011124,"Total, all violations [0]",Percentage change in rate,Percent,239,units,0,v44391402,1.1.3,,..,,,2
3,1998,Canada,2016A000011124,"Total, all violations [0]",Total cleared,Number,223,units,0,v44327422,1.1.4,1073453.0,,,,0
4,1998,Canada,2016A000011124,"Total, all violations [0]",Cleared by charge,Number,223,units,0,v44327628,1.1.5,705133.0,,,,0
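The download step takes whichever file happens to be first in the archive. Here's a minimal sketch of a slightly more defensive version that picks the data CSV by name. It assumes the archive holds one data CSV plus, possibly, a companion file with "MetaData" in its name; the helper `read_table_from_zip` and the stand-in archive are made up for illustration, so the sketch runs without a network call.

```python
from io import BytesIO
from zipfile import ZipFile

import pandas as pd


def read_table_from_zip(content: bytes) -> pd.DataFrame:
    """Read the single data CSV out of a zipped download held in memory."""
    with ZipFile(BytesIO(content)) as archive:
        # Assumption: a metadata file, if present, has "MetaData" in its name,
        # and exactly one other CSV holds the data.
        names = [n for n in archive.namelist()
                 if n.lower().endswith(".csv") and "metadata" not in n.lower()]
        if len(names) != 1:
            raise ValueError(f"expected one data CSV, found {names}")
        with archive.open(names[0]) as handle:
            # low_memory=False parses each column in one pass, which avoids
            # pandas' DtypeWarning on columns with mixed types.
            return pd.read_csv(handle, encoding="utf-8", low_memory=False)


# Build a tiny stand-in archive (hypothetical file names and contents).
buffer = BytesIO()
with ZipFile(buffer, "w") as archive:
    archive.writestr("demo_MetaData.csv", "Cube Title\nDemo\n")
    archive.writestr("demo.csv", "REF_DATE,GEO,VALUE\n2020,Canada,5.75\n")

demo = read_table_from_zip(buffer.getvalue())
print(demo)
```

With the real download, the same helper would be called as `read_table_from_zip(r.content)`.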


We'll start by looking at unfounded rates for level 1 sexual assault in every region across Canada, and showing which regions had the highest rates in 2020.

In [24]:
data = (raw
        .loc[(raw["Statistics"].isin(["Percent unfounded"])) &
             (raw["Violations"] == "Sexual assault, level 1 [1330]"), :]
        .pivot(index="GEO", columns="REF_DATE", values="VALUE")
        .dropna(how="all", axis=1)
        )

data.sort_values(2020, ascending=False).head(5)

GEO,2017,2018,2019,2020,2021
"Moncton, New Brunswick [13305]",30.71,21.01,23.75,21.68,14.81
Northwest Territories [61],14.55,18.59,16.72,19.41,14.34
Prince Edward Island [11],25.81,12.5,25.0,17.36,19.78
"Thunder Bay, Ontario [35595]",17.27,12.26,14.19,15.56,3.89
"Abbotsford-Mission, British Columbia [59932]",19.44,16.78,13.07,14.59,10.49

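If the filter-and-pivot step is hard to picture, here's a toy version of it. The regions and values below are invented purely for illustration; only the column names match the real table.

```python
import pandas as pd

# Made-up long-format rows in the same shape as the raw table
# (illustrative values, not real data).
toy = pd.DataFrame({
    "REF_DATE": [2019, 2020, 2019, 2020, 2019, 2020],
    "GEO": ["Region A", "Region A", "Region B", "Region B",
            "Region A", "Region A"],
    "Statistics": ["Percent unfounded"] * 4 + ["Actual incidents"] * 2,
    "VALUE": [10.0, 12.5, 20.0, 15.0, 300.0, 280.0],
})

# Keep one statistic, then spread years across columns: one row per region.
wide = (toy
        .loc[toy["Statistics"] == "Percent unfounded"]
        .pivot(index="GEO", columns="REF_DATE", values="VALUE"))

ranked = wide.sort_values(2020, ascending=False)
print(ranked)
```

Filtering down to a single statistic first matters: `pivot` raises an error if any region and year pair appears more than once.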

Now let's take a look at which areas improved the least between 2017 and 2020 by adding a new column with the change in percentage points, then flip the sort to see which improved the most.

In [25]:
data["diff"] = data[2020] - data[2017]
data = data.sort_values("diff", ascending=False)

data.head(10)

GEO,2017,2018,2019,2020,2021,diff
Northwest Territories [61],14.55,18.59,16.72,19.41,14.34,4.86
"Saskatoon, Saskatchewan [47725]",3.5,2.65,4.15,4.67,6.29,1.17
"Brantford, Ontario [35543]",13.48,9.32,10.48,14.57,10.43,1.09
"Calgary, Alberta [48825]",5.13,5.41,5.54,5.83,4.63,0.7
"Sherbrooke, Quebec [24433]",6.34,9.7,4.46,5.86,3.48,-0.48
"Saguenay, Quebec [24408]",15.25,13.87,17.26,14.57,10.29,-0.68
"Peterborough, Ontario [35529]",10.87,7.38,5.04,9.9,5.56,-0.97
Newfoundland and Labrador [10],12.63,19.2,14.44,10.97,10.71,-1.66
"Thunder Bay, Ontario [35595]",17.27,12.26,14.19,15.56,3.89,-1.71
"Halifax, Nova Scotia [12205]",4.46,2.42,6.26,2.57,3.58,-1.89


In [26]:
data.sort_values("diff", ascending=True).head(10)

GEO,2017,2018,2019,2020,2021,diff
"Kelowna, British Columbia [59915]",40.68,16.51,13.3,11.97,14.91,-28.71
"London, Ontario [35555]",16.95,2.9,0.48,1.53,0.73,-15.42
"Lethbridge, Alberta [48810]",19.48,11.81,10.34,4.47,7.64,-15.01
"Trois-Rivières, Quebec [24442]",20.0,12.5,12.0,5.26,8.77,-14.74
Nunavut [62],23.59,20.85,14.4,12.89,16.92,-10.7
"Kitchener-Cambridge-Waterloo, Ontario [35541]",14.76,7.49,5.56,4.73,4.65,-10.03
"Belleville, Ontario [35522]",23.19,16.28,7.24,13.33,12.75,-9.86
"Moncton, New Brunswick [13305]",30.71,21.01,23.75,21.68,14.81,-9.03
Prince Edward Island [11],25.81,12.5,25.0,17.36,19.78,-8.45
"Québec, Quebec [24421]",19.67,20.04,15.56,11.25,6.23,-8.42

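The `diff` column compares 2017 with 2020, but the table also has 2021 values. As a rough check on how much the end year matters, here's a sketch that recomputes the change through 2021 for three of the regions, using rates copied from the tables above. The column names `diff_2020` and `diff_2021` are mine.

```python
import pandas as pd

# Percent-unfounded values copied from the tables above.
rates = pd.DataFrame(
    {2017: [14.55, 3.50, 40.68],
     2020: [19.41, 4.67, 11.97],
     2021: [14.34, 6.29, 14.91]},
    index=["Northwest Territories [61]",
           "Saskatoon, Saskatchewan [47725]",
           "Kelowna, British Columbia [59915]"],
)

# Subtracting one percentage from another gives percentage points.
rates["diff_2020"] = (rates[2020] - rates[2017]).round(2)
rates["diff_2021"] = (rates[2021] - rates[2017]).round(2)

by_2021 = rates.sort_values("diff_2021", ascending=False)
print(by_2021)
```

For these three rows the ordering shifts depending on the end year, so it's worth deciding which comparison you actually care about before ranking regions.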

Let's also take a moment to put this into context: what's the overall unfounded rate for all violations across Canada? Are these numbers very high, or about what we might expect for unfounded rates?

In [27]:
data = (raw
        .loc[(raw["Statistics"] == "Percent unfounded") &
              (raw["Violations"] == "Total, all violations [0]") &
              (raw["GEO"] == "Canada") &
              (raw["REF_DATE"] >= 2017), ["REF_DATE", "GEO", "VALUE"]]
        )

data.head(5)

Unnamed: 0,REF_DATE,GEO,VALUE
3058344,2017,Canada,6.83
3312120,2018,Canada,6.21
3565896,2019,Canada,5.83
3819672,2020,Canada,5.75
4073448,2021,Canada,5.47


The answer: yes, the unfounded rates for level 1 sexual assault are, in many places, far above the overall unfounded rate for all violations across Canada, which ranged from 5.47% to 6.83% between 2017 and 2021. Moncton, for example, was at 21.68% in 2020.
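To put rough numbers on that gap, here's a small sketch that divides a few regional level 1 rates for 2020 by the national all-violations rate for the same year. All values are copied from the tables above; the selection of regions is arbitrary.

```python
# 2020 percent-unfounded values copied from the tables above.
national_2020 = 5.75  # Total, all violations, Canada
level1_2020 = {
    "Moncton, New Brunswick": 21.68,
    "Northwest Territories": 19.41,
    "Prince Edward Island": 17.36,
    "Calgary, Alberta": 5.83,
    "London, Ontario": 1.53,
}

# Ratio of each regional level 1 rate to the national all-violations rate.
ratios = {region: round(rate / national_2020, 2)
          for region, rate in level1_2020.items()}

for region, ratio in ratios.items():
    print(f"{region}: {level1_2020[region]:.2f}% "
          f"({ratio:.2f}x the national all-violations rate)")
```

The spread is wide: some regions sit at several times the national figure, while others are close to it or below it.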

### Northwest Territories

Let's take a closer look at the Northwest Territories, where the share of lower-severity sexual assault cases classified as unfounded rose between 2017 and 2020. Maybe they just don't have a lot of cases, and this is a sample size issue.

In [28]:
nwt = (raw
        .loc[(raw["Statistics"].isin(["Actual incidents"])) &
             (raw["Violations"] == "Sexual assault, level 1 [1330]") &
             (raw["GEO"] == "Northwest Territories [61]"), ["REF_DATE", "VALUE"]]
        .set_index("REF_DATE")
        )

nwt.tail(5)

REF_DATE,VALUE
2017,182.0
2018,162.0
2019,259.0
2020,245.0
2021,227.0


With roughly 160 to 260 incidents a year, it looks like they have a reasonable number of cases. Certainly enough that it makes their unfounded rate interesting.
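One more way to gauge the sample size is to back out how many unfounded reports the percentages imply. This is a sketch under an assumption: that "Percent unfounded" is unfounded incidents divided by the sum of actual and unfounded incidents. That's my reading of the statistic, and it's worth checking against the table's footnotes. The inputs are copied from the tables above.

```python
# Values copied from the tables above for the Northwest Territories.
actual_incidents = {2017: 182, 2018: 162, 2019: 259, 2020: 245, 2021: 227}
percent_unfounded = {2017: 14.55, 2018: 18.59, 2019: 16.72,
                     2020: 19.41, 2021: 14.34}

# Assumption: percent = unfounded / (actual + unfounded) * 100,
# which rearranges to unfounded = actual * p / (1 - p), with p = percent / 100.
implied_unfounded = {}
for year, actual in actual_incidents.items():
    p = percent_unfounded[year] / 100
    implied_unfounded[year] = round(actual * p / (1 - p))

print(implied_unfounded)
```

Under that assumption the implied counts land at roughly 30 to 60 unfounded reports a year, so a swing of a few percentage points corresponds to a handful of cases rather than one or two.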