
---
# **THE DEGRADATION OF VISUALIZATION**
---
#### *Making data very **very** hard to read*


This project uses a data set that categorizes politicians’ tweets by their target audience and content.
The full data set can be found [here](https://www.kaggle.com/yamqwe/classification-of-pol-sociale), and the transformed and modified version can be found [here](https://github.com/Brian-Masse/CSC630/blob/main/Group-task3/PART%20II/data/Political%20Tweet%20data.xlsx).

In [746]:
import pandas as pd
import altair as alt
import math

xls = pd.ExcelFile(
    "../PART II/data/Political Tweet data.xlsx"
)

data = pd.read_excel( xls, "misleading data" )
chart = alt.Chart(data)


---
### **THE DATA**
---

The data set we looked at collected thousands of tweets from various politicians and assigned each of them categories. The first was their target audience: whether the tweet was intended for a **partisan** or **neutral** demographic.

It also assigned the content of each tweet to 1 of 7 categories: **policy, attacking, information-based, mobilizing, support-based, constituency-based, or sharing general media**.

All transformations made to this data set will be explained later, in context.


---
### **THE INFRASTRUCTURE FOR CONFUSION**
---

First, I imported / updated some of my old color classes and functions. Mainly, these let me create / define colors in the RGB space while also being able to output them as hex code strings, one of the color formats Altair accepts.

This color class also lets me generate a palette made of a gradient between two colors, with a specified number of steps, which was especially helpful for assigning the color ramps of the graphs in this project.

In [747]:
class color:

    hex_values = [ "0", "1", "2", "3", "4", "5", "6", "7", "8", "9", "A", "B", "C", "D", "E", "F"  ]

    def __init__(self, R, G, B):
        self.R = R
        self.G = G
        self.B = B

    def return_color_between(self, color2, perc):
        r_change = color2.R - self.R
        g_change = color2.G - self.G
        b_change = color2.B - self.B

        r = self.R + (r_change * perc)
        g = self.G + (g_change * perc)
        b = self.B + (b_change * perc)

        return color(r, g, b)

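    def return_color_in(self, code):
        # components are stored on a 0-255 scale, so "RGB" needs no rescaling
        if code == "RGB":
            return "{} ({}, {}, {})".format(code, self.R, self.G, self.B)
        if code == "HSB":
            # colorsys expects 0-1 floats and returns hue, saturation, value (brightness) in 0-1
            import colorsys
            h, s, v = colorsys.rgb_to_hsv(self.R / 255, self.G / 255, self.B / 255)
            return "{} ({}, {}, {})".format(code, h * 360, s * 100, v * 100)
        if code == "HEX":
            hex1 = self.return_hex(self.R)
            hex2 = self.return_hex(self.G)
            hex3 = self.return_hex(self.B)
            return "#{}{}{}{}{}{}".format( hex1[0], hex1[1], hex2[0], hex2[1], hex3[0], hex3[1] )

    def return_hex(self, component ):
        # split a 0-255 component into its two base-16 digits (fractions are truncated)
        rounded = math.floor(component / 16)
        remainder = (component / 16) - rounded
        
        hex1 = self.hex_values[rounded]
        hex2 = self.hex_values[math.floor(remainder * 16)]
        return (hex1, hex2)

    def return_color_grad(self, second_color, steps):
        # `steps` hex codes evenly spaced from self to second_color, both ends included
        colors = []
        for step in range(0, steps):
            # guard against dividing by zero when only one step is requested
            interval = step / (steps - 1) if steps > 1 else 0
            step_color = self.return_color_between(second_color, interval)
            colors.append(step_color.return_color_in("HEX"))
        return colors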
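For comparison, the same interpolation and hex conversion can be written more compactly with Python's format specifiers. This is a minimal standalone sketch, not a drop-in replacement for the class above: the helper names `lerp_rgb`, `to_hex`, and `hex_gradient` are hypothetical, colors are plain `(R, G, B)` tuples on a 0-255 scale, and `to_hex` truncates fractional channels so it should agree with `return_hex`.

```python
def lerp_rgb(start, end, t):
    """Linearly interpolate between two (R, G, B) tuples on a 0-255 scale; t runs from 0 to 1."""
    return tuple(a + (b - a) * t for a, b in zip(start, end))


def to_hex(rgb):
    """Format an (R, G, B) tuple as '#RRGGBB', truncating fractional channels like return_hex."""
    return "#" + "".join(f"{int(c):02X}" for c in rgb)


def hex_gradient(start, end, steps):
    """Return `steps` hex codes evenly spaced from start to end, both endpoints included."""
    if steps == 1:
        return [to_hex(start)]
    return [to_hex(lerp_rgb(start, end, i / (steps - 1))) for i in range(steps)]


# Same endpoints as color1 -> color3 in the next cell
print(hex_gradient((164, 109, 168), (77, 33, 148), 7))
```

The `:02X` specifier pads each channel to two uppercase hex digits, which replaces the manual `hex_values` lookup.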



This `pallet` class was created specifically for this project so I could quickly style all the graphs the same way. It mainly groups hex codes and carries them to the various places where I need to assign colors; it does not transform the data in any way, it simply passes along strings corresponding to colors.

In [748]:
class pallet:
    def __init__(self, grad, primary_color, secondary_color, background, secondary_background):
        self.grad = grad
        self.primary_color = primary_color
        self.secondary_color = secondary_color
        self.background = background 
        self.secondary_background = secondary_background

color2 = color( 125, 85, 110 )
color1 = color( 164, 109, 168 )
color3 = color( 77, 33, 148 )

smooth_grad = color1.return_color_grad( color3, 7 )
grad = [ "#C2C6F2", "#F2D8CE", "#B6DBF2", "#F2E09D", "#AAF2E7", "#C8F29D", "#EDC1F3" ]

prim = color2.return_color_in("HEX")
sec =  color(148, 52, 92).return_color_in("HEX")
back = color(242, 225, 206).return_color_in("HEX")
sec_back = color(204, 185, 163).return_color_in("HEX")

prim_pallet = pallet( grad, prim, sec, back, sec_back  )

color1A = color( 182, 207, 183 )
color2A = color( 53, 79, 97 ) 
color3A = color( 25, 45, 59 )

gradA = color1A.return_color_grad(color2A, 7) 

primA = color3A.return_color_in("HEX")
secA = color2A.return_color_in("HEX")
backA = color(172, 185, 194).return_color_in("HEX")
sec_backA = color( 144, 157, 166 ).return_color_in("HEX")

palletA = pallet( gradA, primA, secA, backA, sec_backA )
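Because `pallet` only groups strings, Python's standard `dataclasses` module could generate the same constructor automatically. A minimal sketch, assuming the same five fields; `Palette` is a hypothetical name chosen so it does not clash with the existing class, and the hex codes are the hex equivalents of `prim_pallet`'s colors above.

```python
from dataclasses import dataclass, replace


@dataclass
class Palette:
    """Groups the hex-code strings that the styling functions read; no data is transformed."""
    grad: list
    primary_color: str
    secondary_color: str
    background: str
    secondary_background: str


base = Palette(
    grad=["#C2C6F2", "#F2D8CE", "#B6DBF2", "#F2E09D", "#AAF2E7", "#C8F29D", "#EDC1F3"],
    primary_color="#7D556E",
    secondary_color="#94345C",
    background="#F2E1CE",
    secondary_background="#CCB9A3",
)

# dataclasses.replace builds a copy with one field swapped, handy for a one-off graph tweak
dark_background = replace(base, background="#2B2B2B")
```

`replace` leaves `base` untouched, so a single graph can get a variant palette without affecting the others.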

These functions are used by each graph to apply certain styling elements, so I don’t have to repeat any formatting code, a problem I had faced in some of the earlier Altair graphs I created.

Each function takes in a chart and a `pallet` object. The chart is the one to be styled, while the palette specifies which colors to use on it. This lets me easily change the palette for an individual graph, or change it for all graphs at once.


In [750]:

def apply_bar_styling(chart, pallet):
    return chart.configure_bar(
        cornerRadius=5
    ).configure_range (
        category=pallet.grad,
        heatmap=pallet.grad
    ).configure_mark(
        cornerRadius=5,
        stroke=pallet.primary_color,
        strokeWidth=2
    )

def apply_text(chart, pallet):
    return chart.configure(
        font="DINCondensed-Bold",
        background=pallet.background
    ).configure_view(
        strokeWidth=0
    ).configure_axis(
        labelFontSize=13,
        titleFontSize=25,
        labelColor=pallet.secondary_color,
        titleColor=pallet.primary_color,
        # domainColor=pallet.secondary_background,
        domainColor="#FFFFFF00",
        tickColor=pallet.secondary_background,
        gridColor=pallet.secondary_background
    ).configure_title(
        fontSize=30,
        color=pallet.primary_color 
    )

def apply_legend(chart, pallet):
    return chart.configure_legend(
        cornerRadius=10,
        fillColor=pallet.secondary_background,
        labelColor=pallet.primary_color,
        labelFontSize=13,
        padding=10,
        titleColor=pallet.primary_color,
        titleFontSize=15
    )

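def apply_all(chart, pallet):
    # apply_text must run first: chart.configure() replaces the chart's whole config,
    # so calling it after the legend/bar helpers would discard their settings
    tmp_chart = apply_text(chart, pallet)
    tmp_chart = apply_legend(tmp_chart, pallet)
    tmp_chart = apply_bar_styling(tmp_chart, pallet)
    return tmp_chart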

### **LET THE CONFUSION BEGIN!**

For this project, I will be exploring two series of visualizations, made up of the various iterations I put them through to make them worse! When first considering what would make for the worst visualization, I found two key categories: one is the orientation / presentation of the content, and the other is the pure visuals of the graph. The latter is fairly easy to tamper with; low contrast and poor colors already lead to a pretty horrendous visualization, so I spent most of my time focused on the former. This led me to a conclusion: to make the most misleading visualization, one that presents wholly incorrect patterns, I must first identify the patterns that actually exist, and then attempt to convince viewers of my visualization of the exact opposite.

And so starts that journey:

---
### **ITERATION 1**
---

##### **VIS 1**

I thought it would be interesting to look at the relationship between the number of tweets in a certain category and their target audience. After trying out some graph marks, I settled on a stacked column chart, as it best conveys that each target audience is being broken down into content categories. This allows the categories to be compared with each other, and also shows the difference in the total number of tweets in each target audience. Although this is a lot of information, with clear labels I think this is an effective graph that properly demonstrates an array of interesting relationships.

**But that is not what we are here for.**

In [751]:
vis1 = chart.mark_bar().encode (
    alt.X( "bias-trun", title="Target Audience" ),
    alt.Y( "value", title="Number of Tweets" ),
    alt.Color( "category")
).properties(
    title="The categorical breakdown of different target audiences"
)

apply_all(vis1, prim_pallet)


##### **VIS 2**

For this second visualization I searched for a mark that inherently made the content hard to read. While experimenting with the visual variables, I found that color, in the form of a heat map, was terrible for conveying numerical data and made cells difficult to compare unless they were adjacent, making this graph:
1. not quantitative
2. not really associative or selective
3. very limited in length (only a few value steps can be told apart)

I also find the heatmap, especially for this data set, very unintuitive: the pattern the colors form means nothing, and both axes are qualitative, which is typically not the case for heatmaps.

**Off to a great start.**

In [752]:
unequal_heatMap = chart.mark_rect().encode (
    alt.X( "bias-trun", title="Target Audience" ),
    alt.Y( "category", title="Category of Tweet Content" ),
    alt.Color( "value", title="number of Tweets")
).properties(
    title="The categorical breakdown of different target audiences"
)

apply_all(unequal_heatMap, palletA)


---
### **REVISING THE INITIAL GRAPH**
---


##### **VIS 3**
When we talked with **Dakota and Claire** in class on Monday, they mentioned that although our first graph (iteration 1, visualization 1) was readable, it was not immediately obvious what they were looking at, so they suggested we experiment with our marks and styles to make the visualization more intuitive. Though they did not offer specific suggestions on how to accomplish this, I was compelled by the pie charts they showed us and wondered if that was a better way to display our data.

Turns out: **They were super right!** 

I've found this visualization to be the clearest and most intuitive, and after polling my family, a *highly* educated audience, I got more positive feedback for it!

In [753]:
vis1 = chart.mark_arc().encode(
    # alt.X( "bias-trun", title="target Audience" ),
    alt.Theta( "perc-partisan:Q", title="number of tweets in a category over total tweets"),
    alt.Color( "category" )
).properties(
    title="The categories of partisan tweets"
)

vis2 = chart.mark_arc().encode(
    alt.Theta( "perc-neutral:Q", title="number of tweets in a category over total tweets"),
    alt.Color( "category" )
).properties(
    title="The categories of neutral tweets"
)

apply_all(vis1 | vis2, prim_pallet)


---
### **ITERATION 2**
---

The next iteration of these visualizations displays a similar set of data; however, instead of displaying the number of tweets in each category, it shows that number as a percentage of the total number of tweets in that target audience. Despite being a mouthful, this creates a **great lack of clarity** in two important ways:

1. Because the graph now conveys a percentage, viewers must first understand that it is indeed a percentage, instead of naturally assuming the values are the raw number of tweets in a category. Furthermore, they must also understand what both the part and the whole are before they can understand what they are reading! **oh-ho! how terrible and tragic!**

2. The labels must now try to explain this complex relationship, so they have become longer and so *so* much worse. **:)**
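The transformed columns come precomputed in the spreadsheet, so the transformation itself is not shown in this notebook, but a per-audience percentage can be sketched in pandas. This assumes a long-format table with one row per target audience and category, reusing the `bias-trun`, `category`, and `value` column names from the chart encodings; the counts are made-up placeholders, not values from the data set.

```python
import pandas as pd

# Placeholder counts (NOT the real data): one row per (audience, category) pair
df = pd.DataFrame({
    "bias-trun": ["partisan", "partisan", "neutral", "neutral"],
    "category": ["policy", "attack", "policy", "attack"],
    "value": [30, 10, 120, 80],
})

# Divide each count by the total tweet count of its own target audience
audience_total = df.groupby("bias-trun")["value"].transform("sum")
df["perc"] = df["value"] / audience_total * 100

# Each audience now sums to 100, which is why the normalized bars all reach the same height
print(df.groupby("bias-trun")["perc"].sum())
```

`transform("sum")` returns a Series aligned with the original rows, so the division happens row by row without a separate merge.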

---
##### **VIS 1**

Transforming the data is most impactful for this visualization, because the original version conveyed the total number of tweets in a target audience through the total height of each bar. By making all the internal categories percentages of the whole, however, each bar now adds up to 100% instead of the total number of tweets. So it now *appears* as if the neutral and partisan audiences contain equal numbers of tweets, which is **certainly not the case**.


In [754]:
vis1 = chart.mark_bar().encode(
    alt.X( "bias-trun", title="target Audience" ),
    alt.Y( "perc:Q", title="number of tweets in a category over total tweets"),
    alt.Color( "category" )
).properties(
    title="The categorical breakdown of different target audiences, normalized"
)

apply_all(vis1, prim_pallet)

##### **VIS 2**

Changing the data around does not do too much damage to this visualization; however, it does make the numbers on the heatmap legend feel even more arbitrary *( 58? what does that even mean )*, and the legend title, to avoid being truncated, must be compressed so much that it makes no sense.

**and this is just iteration 2!**

In [755]:
equal_heatmap = chart.mark_rect().encode(
    alt.X( "bias-trun", title="Target Audience" ),
    alt.Y( "category", title="Category of Tweet Content" ),
    alt.Color( "perc:Q", title="Tweet # in category / total tweets ")
).properties(
    title="The categorical breakdown of different target audiences, normalized"
)

apply_all(equal_heatmap, palletA)


---
### **ITERATION 3**
---

At this point I realized I had good graphs and could determine: **almost NO ONE posted angry tweets that day**, which I felt was quite disappointing. **!HOWEVER!** through the magic of manipulation *we can change that,* so I set out to make the attacking tweets seem like a majority! To go about this I played with many marks, orientations, and axis names, but in the end, all of them seemed **just too readable!** So instead I settled on a good old-fashioned data transformation:

#### **ENTER THE MAGICAL 1 / X**

I decided there was no better way to make the data and trends of a graph clearer than to just invert all the numbers!

And so, that is exactly what I did! This iteration, an "inversion" of the first, not only presents the data relationships completely backward, but it also **certainly does not make it clear what transformation I used to get the data to that point :))**
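The "magical 1 / X" itself is a one-line column operation. A sketch under the same assumptions as before (long-format rows and placeholder counts, with `invert` taken from the chart encoding as the column name), showing why the largest category ends up with the smallest value:

```python
import pandas as pd

# Placeholder counts (NOT the real data)
df = pd.DataFrame({
    "category": ["policy", "information", "attack"],
    "value": [200, 50, 4],
})

# 1 / x reverses the order: the biggest count becomes the smallest number
df["invert"] = 1 / df["value"]

print(df.sort_values("value", ascending=False))
```

Note that a category with zero tweets would become `inf` under this transform.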

---
##### **VIS 1**
The confusion is most evident in this visualization, as it is common to assume that "big bar = more thing". However, that is exactly the trap I want the reader to fall into, for it is actually the microscopic slivers that represent the most tweets! **how lovely!**

In [756]:
vis1 = chart.mark_bar().encode (
    alt.X( "bias-trun", title="target Audience" ),
    alt.Y( "invert", title="Number of tweets in a category, inverted" ),
    alt.Color( "category" )
).properties(
    title="The inverted categorical breakdown of different target audiences"
)

apply_all(vis1, prim_pallet)

##### **VIS 2**

This visualization is largely unaffected because it was already ***so*** unintuitive from the start. Still, I suppose it is confusing: once the reader associates dark color = higher number, they must then grasp that higher number != more tweets!

In [757]:
inverted_heatmap = chart.mark_rect().encode(
    alt.X( "bias-trun", title="Target Audience" ),
    alt.Y( "category", title="Category of Tweet Content" ),
    alt.Color( "invert:Q", title="# of tweets in a category, inverted")
).properties(
    title="The inverted categorical breakdown of different target audiences"
)

apply_all(inverted_heatmap, palletA)


---
### **ITERATION 4**
---

Finally, it was time to Frankenstein the last two iterations together: each value is now the inverted number of tweets in a certain category and target audience, as a percentage of the sum of all the inverted tweet counts in those categories. **PERFECT**
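A sketch of that combination, assuming (as in iteration 2) that the sum of inverted counts is taken within each target audience; the column names mirror the chart encodings and the counts are placeholders, not the real data.

```python
import pandas as pd

# Placeholder counts (NOT the real data)
df = pd.DataFrame({
    "bias-trun": ["partisan", "partisan", "neutral", "neutral"],
    "category": ["policy", "attack", "policy", "attack"],
    "value": [30, 10, 120, 80],
})

# Step 1 (iteration 3): invert every count
df["invert"] = 1 / df["value"]

# Step 2 (iteration 2, applied to the inverted values): share of the audience's inverted total
inverted_total = df.groupby("bias-trun")["invert"].transform("sum")
df["invert-perc"] = df["invert"] / inverted_total * 100

print(df)
```

With two categories the shares simply swap (75/25 becomes 25/75); with more categories the order still reverses, but the sizes no longer mirror the originals.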

**secretive:** check ✅

**misleading and overly long titles:** check ✅

**making the wrong assumptions easy and intuitive to make:** check ✅

**difficult to get the actual data from the graph:** CHECK! ✅✅✅✅


This graph perfectly combines all the terrible practices of rendering data into one, horrible, impossibly challenging data visualization.

In [758]:
vis1 = chart.mark_bar().encode (
    alt.X( "bias-trun", title="target Audience"  ),
    alt.Y( "invert-perc", title="inverted number of tweets in a category over total tweets"),
    alt.Color( "category" )
).properties(
    title="The inverted categorical breakdown of different target audiences, normalized"
)

apply_all(vis1, prim_pallet)

In [759]:
inverted_equal_heatmap = chart.mark_rect().encode(
    alt.X( "bias-trun", title="Target Audience" ),
    alt.Y( "category", title="Category of Tweet Content" ),
    alt.Color( "invert-perc", title="inverted tweet # in category / total")
).properties(
    title="The inverted categorical breakdown of different target audiences, normalized"
)

apply_all(inverted_equal_heatmap, palletA)


---
### **REVISED VISUALIZATION**
---

##### **VIS 3**

As mentioned previously, the discussion we had with Dakota and Claire led me to realize that the original stacked bar chart was not the most intuitive way to understand / make connections with the data, so I ended up creating a pie chart to make it more readable. Since so much of the misleading thus far has been about making the wrong relationships and patterns more intuitive to understand, I thought I would revisit the pie chart with this very misleading, inverted set of data to hopefully maximize the confusion.

***Upon looking at it, I think it is innocently deceiving, which is truly wonderful!***

In [760]:
vis1 = chart.mark_arc().encode(
    # alt.X( "bias-trun", title="target Audience" ),
    alt.Theta( "invert-perc-partisan:Q", title="number of tweets in a category over total tweets"),
    alt.Color( "category" )
).properties(
    title="The inverted categorical breakdown of partisan tweets"
)

vis2 = chart.mark_arc().encode(
    alt.Theta( "invert-perc-neutral:Q", title="number of tweets in a category over total tweets"),
    alt.Color( "category" )
).properties(
    title="The inverted categorical breakdown of neutral tweets"
)

apply_all(vis1 | vis2, prim_pallet)

And last but *certainly* not least: a fresh coat of the worst paint possible is the cherry on top!

So I went ahead and mixed the lowest-contrast colors I could find and made all the color ranges tiny, to make sure that no reader can even attempt to understand this nightmare :)

***ENJOY :)***

In [761]:
gradColor = color( 235, 196, 152 )
gradColor2 = color( 176, 142, 104 )
grad = gradColor.return_color_grad(gradColor2, 7)

test = color(217, 197, 173).return_color_in("HEX")
sec =  color(217, 197, 173).return_color_in("HEX")
back = color(242, 225, 206).return_color_in("HEX")
sec_back = color(204, 185, 163).return_color_in("HEX")

prim_pallet = pallet( grad, test, sec, back, sec_back  )

gradColor = color( 36, 44, 69 )
gradColor2 = color( 42, 56, 99 ) 

grad = gradColor.return_color_grad(gradColor2, 7) 

prim = color( 184, 196, 204 ).return_color_in("HEX")
sec = color( 184, 196, 204 ).return_color_in("HEX")
back = color(172, 185, 194).return_color_in("HEX")
sec_back = color( 144, 157, 166 ).return_color_in("HEX")

In [762]:
apply_all(vis1, prim_pallet) 

In [763]:
prim_pallet = pallet( grad, prim, sec, back, sec_back )
apply_all(inverted_equal_heatmap, prim_pallet)
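One way to put a number on how low-contrast these palettes are is the WCAG 2 contrast ratio, which compares the relative luminance of two colors on a scale from 1:1 (identical) to 21:1 (black on white); WCAG asks for at least 4.5:1 for normal text. A minimal standard-library sketch, applied to the primary (text) and background colors defined in the final palettes above; `channel_to_linear`, `relative_luminance`, and `contrast_ratio` are hypothetical helper names.

```python
def channel_to_linear(c):
    """Convert one 0-255 sRGB channel to linear light, per the WCAG 2 definition."""
    c = c / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(rgb):
    """Weighted sum of the linearized R, G, B channels (0 = black, 1 = white)."""
    r, g, b = (channel_to_linear(c) for c in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(rgb1, rgb2):
    """WCAG 2 contrast ratio: (lighter + 0.05) / (darker + 0.05), ranging from 1 to 21."""
    l1, l2 = sorted((relative_luminance(rgb1), relative_luminance(rgb2)), reverse=True)
    return (l1 + 0.05) / (l2 + 0.05)


# Primary (text) color vs. background color of each final palette
first_ratio = contrast_ratio((217, 197, 173), (242, 225, 206))
heatmap_ratio = contrast_ratio((184, 196, 204), (172, 185, 194))
print(f"first palette: {first_ratio:.2f}:1, heatmap palette: {heatmap_ratio:.2f}:1")
```

By this measure both pairs land below 1.5:1, far under the 4.5:1 guideline.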