## Jittering data for a visualization on intimate partner violence

This notebook walks through how I jittered the data for [this visualization](https://www.datawrapper.de/_/w9650/) for a CBC story on intimate partner violence. The dataset used here has been stripped down to only the columns that are needed, so that the names of victims are not revealed.

First, let's import a few libraries we need, and load in the data.

In [49]:
import pandas as pd         ## Standard data analysis library.
from random import random   ## For generating random numbers.
import numpy as np          ## For some of the number crunching in the jitter function below.

raw = pd.read_csv('../raw/RAW-2021-CBC-IPVDATA.csv', encoding="latin-1")

display(raw.head(3))

Unnamed: 0,VICTIM_AGE,VICTIM_SEX,VICTIM_ETH
0,42.0,Female,Caucasian
1,46.0,Female,Caucasian
2,25.0,Male,Indigenous


Let's start by coding male/female as numbers in our dataset. Female = 1, male = 2. We can't jitter strings, after all!

In [52]:
viz_data = raw.copy()       ## Work on a copy so the raw data stays untouched.

viz_data["VICTIM_SEX"] = (raw["VICTIM_SEX"]
                          .fillna("Unknown")
                          .replace({"Female": 1, "Male": 2, "Unknown": np.nan})
                          )

display(viz_data.head(3))

Unnamed: 0,VICTIM_AGE,VICTIM_SEX,VICTIM_ETH
0,42.0,1,Caucasian
1,46.0,1,Caucasian
2,25.0,2,Indigenous


Next, we'll do the same for victim ethnicity, with a code grouping as follows:
* **White**: 1
* **Indigenous**: 2
* **Black and people of colour**: 3
* **Unknown**: 4

In [44]:
viz_data["VICTIM_ETH"] = (viz_data["VICTIM_ETH"]
                        .fillna("Unknown")
                        .replace({
                            'Southeast Asian': 3,
                            'Arab': 3,
                            'Unknown': 4,
                            'Black': 3,
                            'East Asian': 3,
                            'South Asian': 3,
                            'Filipino': 3,
                            'Persian': 3,
                            'Latin American': 3,
                            "Caucasian": 1,
                            "Indigenous": 2,
                        })
                        )

display(viz_data.head(3))

Unnamed: 0,VICTIM_AGE,VICTIM_SEX,VICTIM_ETH
0,42.0,1,1
1,46.0,1,1
2,25.0,2,2


Here, we'll define a function that jitters two columns of data for use in a Datawrapper scatter plot. It was adapted from R code shared [here](https://github.com/datawrapper/snippets/tree/master/2021-08-05-summer-winter-olympics-temperature) and was inspired by a [post on Datawrapper's blog](https://blog.datawrapper.de/summer-winter-olympics-temperature/).

In [38]:
def rand_in_circle(frame, x_radius, y_radius):
  ## Return a random point inside an ellipse centred on the row's (x, y) values.
  x_skew = frame.iloc[0]
  y_skew = frame.iloc[1]
  t = 2 * np.pi * np.random.uniform()   ## Random angle.
  u = np.random.uniform() + np.random.uniform()
  r = 2 - u if u > 1 else u             ## Fold the sum back into [0, 1] so points spread evenly over the area.
  return [(r * x_radius * np.cos(t)) + x_skew, (r * y_radius * np.sin(t)) + y_skew]


def jitter(dframe, x_value_col, y_value_col, scale_factor=1):
  ## Note: this overwrites the two columns in dframe and returns the same frame.
  df = dframe[[x_value_col, y_value_col]]
  pair_counts = df.value_counts()       ## Rows per unique (x, y) pair; rows with missing values are skipped.
  total = df[x_value_col].count()
  x_range = df[x_value_col].max() - df[x_value_col].min()
  y_range = df[y_value_col].max() - df[y_value_col].min()
  jittered = []

  for (x_value, y_value), count in pair_counts.items():
    subframe = df[(df[x_value_col] == x_value) & (df[y_value_col] == y_value)]

    ## The jitter radius grows with the pair's share of all rows.
    x_radius = (count / total) * x_range * scale_factor
    y_radius = (count / total) * y_range * scale_factor

    subframe = subframe.apply(lambda row: rand_in_circle(row, x_radius, y_radius), axis=1, result_type="expand")
    jittered.append(subframe)

  dframe[[x_value_col, y_value_col]] = pd.concat(jittered)
  return dframe
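
A quick aside on why `rand_in_circle()` adds two uniform random numbers and folds the sum: picking the radius from a single uniform draw would bunch points near the centre. The sketch below is not part of the original notebook; it reuses the same folding trick in vectorized form (the helper name `sample_radius` and the seed are my own choices) and checks that roughly a quarter of the points land inside half the radius, which is what an even spread over a disk should give.

```python
import numpy as np

def sample_radius(rng, n):
    """Fold the sum of two uniforms back into [0, 1], as rand_in_circle() does."""
    u = rng.uniform(size=n) + rng.uniform(size=n)
    return np.where(u > 1, 2 - u, u)

rng = np.random.default_rng(0)  # seed chosen arbitrarily for this sketch
r = sample_radius(rng, 100_000)
t = 2 * np.pi * rng.uniform(size=r.size)
x, y = r * np.cos(t), r * np.sin(t)

# Area grows with the square of the radius, so an even spread over a unit
# disk should put about 0.5**2 = 25% of the points within radius 0.5.
share_inner = np.mean(r <= 0.5)
print(f"share of points within half the radius: {share_inner:.3f}")
```

In `jitter()`, the same unit-disk point is then stretched by `x_radius` and `y_radius`, so the cloud around each value is really an ellipse.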

Now, let's apply it. The last argument to `jitter()`, `scale_factor`, controls how far the points spread, so expect to fiddle with it until the jitter looks right. The higher the value, the more widely the points scatter.
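
To get a feel for the numbers before fiddling, here is a small sketch of the radius arithmetic that `jitter()` does for each unique pair of values: the pair's share of the rows, times the range of the axis, times the scale factor. The helper `jitter_radius` and the counts below are made up for illustration and are not taken from the CBC dataset.

```python
def jitter_radius(group_size, total_rows, axis_range, scale_factor):
    """Radius used for one (x, y) pair: share of rows * axis range * scale factor."""
    return (group_size / total_rows) * axis_range * scale_factor

# Hypothetical numbers: 60 of 100 rows share one pair of values,
# and the axis runs from 1 to 4 (a range of 3).
for scale_factor in (0.2, 0.4, 0.8):
    radius = jitter_radius(60, 100, 3, scale_factor)
    print(f"scale_factor={scale_factor}: radius={radius:.2f}")
```

Because the radius is proportional to the group's share of the data, big groups fan out into large clouds while small groups stay tight. Doubling the scale factor doubles every radius.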

In [67]:
jitterplot = jitter(viz_data, "VICTIM_SEX", "VICTIM_ETH", 0.4)

display(jitterplot.head(3))

Unnamed: 0,VICTIM_AGE,VICTIM_SEX,VICTIM_ETH
0,42.0,0.993082,0.54782
1,46.0,1.045785,0.759255
2,25.0,2.098636,1.746835


That's it! Your dataset is jittered. The `jitter()` function can be used for any two numeric columns that need to be jittered. Note that the function jitters in two dimensions, and in Datawrapper men are graphed at 2 on the horizontal axis, and women at 1 (plus or minus the jitter).
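
To illustrate the "any two columns" point, here is a self-contained sketch on a toy table. It is not the function from this notebook: `jitter_sketch` is a hypothetical vectorized variant that uses the same radius rule and the same folded-uniform trick, assumes there are no missing values, and returns a new frame instead of overwriting the input. The column names `group` and `score` are invented.

```python
import numpy as np
import pandas as pd

def jitter_sketch(df, x_col, y_col, scale_factor=1.0, seed=None):
    """Hypothetical vectorized take on jitter(); assumes no missing values."""
    rng = np.random.default_rng(seed)
    n = len(df)
    # Share of all rows that sit on the same (x, y) pair as each row.
    share = df.groupby([x_col, y_col])[x_col].transform("size") / n
    x_range = df[x_col].max() - df[x_col].min()
    y_range = df[y_col].max() - df[y_col].min()
    # Folded sum of two uniforms for the radius, plus a random angle.
    u = rng.uniform(size=n) + rng.uniform(size=n)
    r = np.where(u > 1, 2 - u, u)
    t = 2 * np.pi * rng.uniform(size=n)
    out = df.copy()  # leave the caller's frame untouched
    out[x_col] = df[x_col] + share * x_range * scale_factor * r * np.cos(t)
    out[y_col] = df[y_col] + share * y_range * scale_factor * r * np.sin(t)
    return out

# Made-up data: three rows at (1, 10), two at (2, 20), one at (3, 30).
toy = pd.DataFrame({"group": [1, 1, 1, 2, 2, 3],
                    "score": [10, 10, 10, 20, 20, 30]})
jittered = jitter_sketch(toy, "group", "score", scale_factor=0.4, seed=1)
print(jittered)
```

The largest group here holds half the rows, so no point should move more than 0.5 × 2 × 0.4 = 0.4 along `group` or 0.5 × 20 × 0.4 = 4 along `score`.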

See the final result [here](https://www.datawrapper.de/_/w9650/).

\-30\-