In [7]:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

df = pd.read_csv("risk-work-inventory.csv")

Create the x and y series to build the plot from. We'll plot story points on the x axis against risk score on the y axis.

In [8]:
x = df['Story Points'].fillna(0)
x

0      2
1      5
2      8
3      5
4      5
5      2
6      5
7      5
8      3
9      8
10     3
11     2
12     3
13    13
14     2
15     2
16     5
17     3
18     3
19     2
20     8
21     5
22     5
23     2
24     2
25     5
26     5
27    13
28     8
29     3
30     3
31     5
32     5
33     5
34     5
35     8
36     2
37     5
38    10
39     3
Name: Story Points, dtype: int64

In [9]:
y = df['Risk'].fillna(0)
y

0     1
1     3
2     2
3     2
4     2
5     1
6     1
7     1
8     1
9     3
10    3
11    1
12    3
13    3
14    1
15    1
16    2
17    3
18    2
19    1
20    3
21    3
22    1
23    1
24    1
25    1
26    2
27    3
28    1
29    1
30    1
31    2
32    2
33    2
34    2
35    2
36    3
37    1
38    1
39    2
Name: Risk, dtype: int64
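Both outputs report dtype int64, which suggests that neither column had missing values in this file, so `fillna(0)` probably changed nothing here. As a minimal sketch of what the call would do otherwise (the small Series below is made up for illustration and is not taken from the CSV), a missing story point value would become 0 and the column would hold floats:

```python
import pandas as pd

# Hypothetical column with one missing value (not taken from the CSV)
points = pd.Series([2, None, 5], name="Story Points")

filled = points.fillna(0)

print(points.dtype)     # float64, because NaN forces a float column
print(filled.tolist())  # [2.0, 0.0, 5.0]
```

A filled-in 0 would likely show up as its own bucket on the x axis, so it may be worth checking `df['Story Points'].isna().sum()` before plotting.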

To create the annotations, we'll need to assign a name to each dot. Since we're using a swarm plot, the dots fall into buckets: one bucket per unique story point value, with the dots inside each bucket stacked by risk score. It doesn't matter which name goes to which dot, as long as the name's story points and risk match the dot's position.

For example, to list the feature names with Risk == 3:

In [10]:
# Select the feature names of every row whose risk score is 3
df[df['Risk'] == 3.0][['Feature Name']]

Unnamed: 0,Feature Name
1,blind brothers
9,compose lamp
10,copy opinion
12,determine boot
13,discover dinosaurs
17,feel discussion
20,judge slope
21,launch account
27,rate interest
36,split silver
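Because dots with the same story points and risk are interchangeable, it can help to think of the names as grouped by that pair. The sketch below shows the idea in plain Python, using six rows whose values appear in the outputs above. The `rows` list and the `by_cell` name are illustrative and are not part of the notebook.

```python
from collections import defaultdict

# (story points, risk, feature name) for a few rows shown in the outputs above
rows = [
    (5, 3, "blind brothers"),
    (3, 1, "collect chance"),
    (13, 3, "discover dinosaurs"),
    (13, 3, "rate interest"),
    (3, 1, "report chicken"),
    (3, 1, "represent powder"),
]

# Group the names by (story points, risk); names within a group are interchangeable
by_cell = defaultdict(list)
for points, risk, name in rows:
    by_cell[(points, risk)].append(name)

for cell, names in sorted(by_cell.items()):
    print(cell, names)
```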


The "buckets" I'm referring to are what matplotlib calls [collections](https://matplotlib.org/3.2.1/api/collections_api.html): seaborn draws each bucket as one `PathCollection`.

In [11]:
%matplotlib ipympl


# Pass x and y by keyword; newer seaborn releases do not accept them positionally
sw = sns.swarmplot(x=x, y=y)
sw.collections

[<matplotlib.collections.PathCollection at 0x1a269e71d0>,
 <matplotlib.collections.PathCollection at 0x1a25a50ad0>,
 <matplotlib.collections.PathCollection at 0x1a269e7b50>,
 <matplotlib.collections.PathCollection at 0x1a269e7f90>,
 <matplotlib.collections.PathCollection at 0x1a269e7ed0>,
 <matplotlib.collections.PathCollection at 0x1a26714490>]

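We can get the coordinates of each point in a bucket by calling `get_offsets()` on its collection. Before doing that, we need the sorted unique story point values, because the position of a value in that sorted list is the index of its bucket.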

In [12]:
pts = np.sort(x.unique())

In [13]:
pts

array([ 2,  3,  5,  8, 10, 13])

In [14]:
np.argwhere(pts==13)

array([[5]])

Given a story point value n, this function returns the position of the bucket it lands in:

In [15]:
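def ptstobucket(n, x=x, sw=sw):
    """Return the bucket position for story point value n, or None if n is not in x."""
    # Sorted unique story point values; position i corresponds to sw.collections[i]
    pts = np.sort(x.unique())
    # Evenly spaced bucket positions 0, 1, ..., len(sw.collections) - 1
    t_space = np.linspace(0, len(sw.collections)-1, len(pts))
    idxs = np.argwhere(pts==n)
    if len(idxs) > 0:
        # idxs[0] is a one-element array, so the result is too (printed as e.g. [5.])
        return t_space[idxs[0]]
    return None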

In [16]:
for n in pts:
    print(f"{n}: {ptstobucket(n)}")

2: [0.]
3: [1.]
5: [2.]
8: [3.]
10: [4.]
13: [5.]
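The printed values are one-element arrays because `idxs[0]` is itself an array. If a plain integer index is more convenient, one alternative is a dictionary built from the sorted unique values. This is a sketch rather than the notebook's method: `bucket_of` is a made-up name, and `pts` is re-created here from the values shown above.

```python
import numpy as np

# Sorted unique story point values, as printed earlier
pts = np.array([2, 3, 5, 8, 10, 13])

# Map each story point value to the index of its bucket
bucket_of = {int(p): i for i, p in enumerate(pts)}

print(bucket_of[13])     # 5
print(bucket_of.get(4))  # None: 4 is not a story point value in the data
```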


In [17]:
for c in sw.collections:
    print(c.get_offsets().astype(np.float16))

[[-0.      1.    ]
 [-0.0882  1.    ]
 [ 0.0882  1.    ]
 [ 0.1764  1.    ]
 [-0.1764  1.    ]
 [ 0.2646  1.    ]
 [-0.2646  1.    ]
 [ 0.3528  1.    ]
 [-0.      3.    ]]
[[1.     1.    ]
 [0.9116 1.    ]
 [1.088  1.    ]
 [1.     2.    ]
 [0.9116 2.    ]
 [1.     3.    ]
 [0.9116 3.    ]
 [1.088  3.    ]]
[[2.    1.   ]
 [1.912 1.   ]
 [2.088 1.   ]
 [1.823 1.   ]
 [2.176 1.   ]
 [2.    2.   ]
 [1.912 2.   ]
 [2.088 2.   ]
 [1.823 2.   ]
 [2.176 2.   ]
 [1.735 2.   ]
 [2.264 2.   ]
 [1.647 2.   ]
 [2.    3.   ]
 [1.912 3.   ]]
[[3.    1.   ]
 [3.    2.   ]
 [2.912 2.   ]
 [3.    3.   ]
 [2.912 3.   ]]
[[4. 1.]]
[[5.   3.  ]
 [4.91 3.  ]]


Next, create a translation function that converts an x offset back into a story point value.

It finds the bucket closest to the given offset by computing the distance from the offset to each bucket position and taking the smallest.

In [18]:
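def offsettopts(n):
    """Map an x offset n to the story point value of the nearest bucket."""
    # Bucket positions 0, 1, ..., len(sw.collections) - 1
    buckets = np.linspace(0,len(sw.collections)-1,len(sw.collections))
    distances = np.abs(buckets - n)
    # Index of the closest bucket; offsets beyond the ends clamp to the first or last bucket
    bucket = distances.argmin()
    return pts[bucket]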

In [19]:
offsettopts(0)

2

In [20]:
offsettopts(6)

13
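Note that `offsettopts(6)` returns 13 even though there is no bucket 6: the nearest-bucket rule clamps out-of-range offsets to the last bucket. The same rule can be applied to a whole array of offsets at once with NumPy broadcasting. The function below is a sketch under the assumption that buckets sit at 0, 1, ..., len(pts) - 1; `offsets_to_pts` is a hypothetical name.

```python
import numpy as np

pts = np.array([2, 3, 5, 8, 10, 13])

def offsets_to_pts(offsets, pts=pts):
    """Return the story point value of the nearest bucket for each x offset."""
    offsets = np.asarray(offsets, dtype=float)
    buckets = np.arange(len(pts))
    # distances[k, b] is the distance from offset k to bucket position b
    distances = np.abs(offsets[:, None] - buckets[None, :])
    return pts[distances.argmin(axis=1)]

print(offsets_to_pts([0.0, 0.3528, 1.0882, 4.9118, 6.0]).tolist())
# [2, 2, 3, 13, 13]
```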

In [21]:
for c in sw.collections:
    for (i,j) in c.get_offsets():
        print(f"{i}: {offsettopts(i)}")

-2.220446049250313e-16: 2
-0.08820564516129048: 2
0.08820564516129026: 2
0.17641129032258052: 2
-0.17641129032258096: 2
0.26461693548387055: 2
-0.2646169354838712: 2
0.3528225806451608: 2
-2.220446049250313e-16: 2
1.0: 3
0.9117943548387095: 3
1.08820564516129: 3
1.0: 3
0.9117943548387095: 3
1.0: 3
0.9117943548387095: 3
1.08820564516129: 3
1.9999999999999996: 5
1.911794354838709: 5
2.08820564516129: 5
1.8235887096774182: 5
2.1764112903225805: 5
1.9999999999999996: 5
1.911794354838709: 5
2.08820564516129: 5
1.8235887096774182: 5
2.1764112903225805: 5
1.7353830645161277: 5
2.264616935483871: 5
1.6471774193548372: 5
1.9999999999999996: 5
1.911794354838709: 5
3.0: 8
3.0: 8
2.911794354838709: 8
3.0: 8
2.911794354838709: 8
3.999999999999999: 10
4.999999999999999: 13
4.911794354838709: 13


So, for each offset, we can assign the first name that fits the criteria and then remove it from the list. We'll take the first one by using the head() method and remove it by using drop().

Given an offset (i, j), a row matches when:
* `df['Story Points'] == offsettopts(i)`
* `df['Risk'] == j`
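The assign-and-remove step can also be sketched without pandas by keeping a queue of names per (story points, risk) pair and popping one name per dot. This is an illustrative alternative to head() and drop(), not the notebook's code. The `queues` dict, the sample offsets, and the `nearest_pts` helper are assumptions made for the example.

```python
from collections import deque

pts = [2, 3, 5, 8, 10, 13]

def nearest_pts(i):
    """Story point value of the bucket closest to x offset i."""
    bucket = min(range(len(pts)), key=lambda b: abs(b - i))
    return pts[bucket]

# Names waiting to be assigned, keyed by (story points, risk)
queues = {
    (3, 1): deque(["collect chance", "report chicken", "represent powder"]),
}

# Three dots in bucket 1 (story points 3) at risk 1
offsets = [(1.0, 1.0), (0.9118, 1.0), (1.0882, 1.0)]

lookup = {}
for i, j in offsets:
    # Take the first unassigned name for this dot's cell and remove it from the queue
    lookup[(i, j)] = queues[(nearest_pts(i), int(j))].popleft()

print(lookup[(0.9118, 1.0)])  # report chicken
```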

In [23]:
j = 1
i = 1.102906586021505
names = df[(df['Story Points'] == offsettopts(i)) & (df['Risk'] == j)][['Feature Name']]
names

Unnamed: 0,Feature Name
8,collect chance
29,report chicken
30,represent powder


In [24]:
names.head(1).values.all()

'collect chance'

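We'll store the names in a lookup table called `nlookup`, keyed by each point's coordinates. The coordinates are cast to float16 so that the same cast at lookup time produces identical keys.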

In [27]:
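nlookup = {}
df = df.fillna(0)
for c in sw.collections:
    # Cast to float16 so the keys match the cast applied later at lookup time
    for (i, j) in c.get_offsets().astype(np.float16):
        # First remaining row with this dot's story points and risk
        name = df[(df['Story Points'] == offsettopts(i)) & (df['Risk'] == j)][['Feature Name']].head(1)
        print(f"Adding {name.values.all()} for {i},{j}")
        nlookup[(i,j)] = name.values.all()
        # Remove the row so the name is not assigned to a second dot
        df = df.drop(name.index)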


Adding afford wire for -0.0,1.0
Adding box north for -0.08819580078125,1.0
Adding describe stream for 0.08819580078125,1.0
Adding embarrass clover for 0.1763916015625,1.0
Adding escape apples for -0.1763916015625,1.0
Adding influence brush for 0.2646484375,1.0
Adding officiate week for -0.2646484375,1.0
Adding pick shirt for 0.352783203125,1.0
Adding split silver for -0.0,3.0
Adding collect chance for 1.0,1.0
Adding report chicken for 0.91162109375,1.0
Adding represent powder for 1.087890625,1.0
Adding flow fog for 1.0,2.0
Adding translate furniture for 0.91162109375,2.0
Adding copy opinion for 1.0,3.0
Adding determine boot for 0.91162109375,3.0
Adding feel discussion for 1.087890625,3.0
Adding build table for 2.0,1.0
Adding burn window for 1.912109375,1.0
Adding manage existence for 2.087890625,1.0
Adding prevent apparatus for 1.8232421875,1.0
Adding study snails for 2.17578125,1.0
Adding bolt mask for 2.0,2.0
Adding bow brush for 1.912109375,2.0
Adding estimate history for 2.08789062

At this point, the DataFrame should be empty. 

In [29]:
df

Unnamed: 0,Risk,Story Points,Feature Name


The nlookup dict should have an entry for every record that was in df.

In [30]:
len(nlookup)

40
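The keys are float16 values because dictionary lookups need exact equality, and casting both the stored and the hovered coordinates the same way provides it. The cast also discards precision, so two very close dots could in principle collapse to one key; the count of 40 above shows that this did not happen here. As a small illustration (the `key` helper is hypothetical), this is the rounding that the cast performs on one of the offsets printed earlier:

```python
import numpy as np

def key(x, y):
    """Build a lookup key by casting both coordinates to float16."""
    return (np.float16(x), np.float16(y))

full = 0.9117943548387095       # x offset as printed at full precision
print(float(np.float16(full)))  # 0.91162109375, the value seen in the output above

# A key built from the full-precision offset matches one built from the rounded value
print(key(full, 1.0) == key(0.91162109375, 1.0))  # True
```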

Now we need to define our annotation function to look up the name for the dot under the mouse pointer. This code was adapted from the great example given by [ImportanceOfBeingErnest](https://stackoverflow.com/users/4124317/importanceofbeingernest), who is apparently a member of the matplotlib dev team, in [this Stack Overflow post](https://stackoverflow.com/questions/7908636/possible-to-make-labels-appear-when-hovering-over-a-point-in-matplotlib).

In [31]:
%matplotlib ipympl

sw = sns.swarmplot(x=x, y=y)
annot = sw.annotate("", xy=(0,0), xytext=(20,20),textcoords="offset points",
                    bbox=dict(boxstyle="round", fc="w"),
                    arrowprops=dict(arrowstyle="->"))
annot.set_visible(False)

def update_annot(pc, ind):
    # Position of the first dot under the cursor
    pos = pc.get_offsets()[ind["ind"][0]]
    annot.xy = pos
    # Cast to float16 to match the keys stored in nlookup
    i, j = pos.astype(np.float16)
    text = nlookup[(i,j)]
    annot.set_text(text)

def hover(event):
    vis = annot.get_visible()
    for pc in sw.collections:
        (status, ind) = pc.contains(event)
        if status:
            update_annot(pc, ind)
            annot.set_visible(True)
            sw.figure.canvas.draw_idle()
            return

    # No dot is under the cursor, so hide any visible annotation
    if vis:
        annot.set_visible(False)
        sw.figure.canvas.draw_idle()

sw.figure.canvas.mpl_connect("motion_notify_event", hover)

7
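The trailing 7 is presumably the connection id returned by mpl_connect, since that call is the last expression in the cell. Keeping that id makes it possible to switch the hover behaviour off again with mpl_disconnect. A minimal sketch, using a bare figure and a placeholder handler instead of the swarm plot (the `on_move` function and the Agg backend are choices made only for this example):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

fig, ax = plt.subplots()

def on_move(event):
    """Placeholder handler; the notebook's hover() would go here."""
    pass

# mpl_connect returns an integer id for the new connection
cid = fig.canvas.mpl_connect("motion_notify_event", on_move)
print(type(cid).__name__)  # int

# Passing the id back removes the handler
fig.canvas.mpl_disconnect(cid)
plt.close(fig)
```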