# **THE DEGRADATION OF COMMUNICATION**
#### *Making data very very hard to read*


This project uses data that categorizes politicians' tweets by their target audience and content.
The full data set can be found [here](https://www.kaggle.com/yamqwe/classification-of-pol-sociale), and the transformed and modified version can be found [here](https://github.com/Brian-Masse/CSC630/blob/main/Group-task3/PART%20II/data/Political%20Tweet%20data.xlsx).

In [446]:
import pandas as pd
import altair as alt
import math

xls = pd.ExcelFile(
    "../PART II/data/Political Tweet data.xlsx"
)

data = pd.read_excel( xls, "misleading data" )
chart = alt.Chart(data)

### **THE DATA**

The data set we looked at collected thousands of tweets from various politicians and assigned them categories. The first was their target audience: whether the tweet was intended for a **political** (partisan) or **neutral** demographic.

It also sorted the content of each tweet into one of seven categories: **policy, attacking, information-based, mobilizing, support-based, constituency-based, or sharing general media**.

All transformations made to this data set will be explained later, in context.

### **THE INFRASTRUCTURE FOR CONFUSION**

First I imported and updated some of my old color classes and functions. Mainly, this lets me create RGB-based colors and then use them as hex codes, the main color format that Altair accepts.

This color class also lets me generate a palette made up of a gradient between two colors with a specified number of steps, which was especially helpful for assigning the color ramp of the graphs in this project.
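
As a rough illustration of the gradient idea, here is a minimal standalone sketch. The function name `hex_gradient` is made up for this example and is not part of the color class below; it assumes each color is a tuple of 0-255 RGB values and that the steps are spaced evenly.

```python
def hex_gradient(start, end, steps):
    """Return `steps` hex codes evenly spaced from `start` to `end`.

    `start` and `end` are assumed to be (R, G, B) tuples on a 0-255 scale.
    """
    if steps < 2:
        raise ValueError("a gradient needs at least two steps")
    codes = []
    for step in range(steps):
        t = step / (steps - 1)  # 0.0 at the first color, 1.0 at the last
        # Interpolate each channel, then round to the nearest whole value
        channels = [round(a + (b - a) * t) for a, b in zip(start, end)]
        codes.append("#{:02X}{:02X}{:02X}".format(*channels))
    return codes


# The two end colors used for the color ramp in this project
ramp = hex_gradient((164, 109, 168), (120, 38, 72), 7)
print(ramp)
```

The first and last entries are simply the two end colors written as hex codes; everything in between is a straight-line blend of each channel.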

In [447]:
class color:

    hex_values = [ "0", "1", "2", "3", "4", "5", "6", "7", "8", "9", "A", "B", "C", "D", "E", "F"  ]

    def __init__(self, R, G, B):
        self.R = R
        self.G = G
        self.B = B

    def return_color_between(self, color2, perc):
        r_change = color2.R - self.R
        g_change = color2.G - self.G
        b_change = color2.B - self.B

        r = self.R + (r_change * perc)
        g = self.G + (g_change * perc)
        b = self.B + (b_change * perc)

        return color(r, g, b)

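    def return_color_in(self, code):
        # Components are stored on a 0-255 scale, so RGB is reported as-is
        if code == "RGB":
            return "{} ({}, {}, {})".format(code, self.R, self.G, self.B)
        # NOTE: this branch only rescales the stored values; it does not convert RGB to HSB
        if code == "HSB":
            return "{} ({}, {}, {})".format(code, self.R * 360, self.G * 100, self.B * 100)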
        if code == "HEX":
            hex1 = self.return_hex(self.R)
            hex2 = self.return_hex(self.G)
            hex3 = self.return_hex(self.B)
            return "#{}{}{}{}{}{}".format( hex1[0], hex1[1], hex2[0], hex2[1], hex3[0], hex3[1] )

    def return_hex(self, component):
        # Round and clamp to 0-255 first: interpolated components are floats,
        # and a value such as 15.6 would otherwise index past the last hex digit
        value = max(0, min(255, round(component)))
        return (self.hex_values[value // 16], self.hex_values[value % 16])

    def return_color_grad(self, second_color, steps):
        # Returns `steps` hex codes evenly spaced from this color to second_color (needs steps >= 2)
        colors = []
        for step in range(0, steps):
            interval = step / (steps - 1)
            blended = self.return_color_between(second_color, interval)
            colors.append(blended.return_color_in("HEX"))
        return colors



This `pallet` class was created specifically for this project, to let me quickly style all the graphs similarly. It mainly serves to group the many hex codes and carry them to the various places where I need to assign colors; it does not transform the data, it simply passes it along.

In [448]:
class pallet:
    def __init__(self, grad, primary_color, secondary_color, background, secondary_background):
        self.grad = grad
        self.primary_color = primary_color
        self.secondary_color = secondary_color
        self.background = background 
        self.secondary_background = secondary_background

color2 = color( 120, 38, 72 )
color1 = color( 164, 109, 168 )

grad = color1.return_color_grad( color2, 7 )

prim = color2.return_color_in("HEX")
sec =  color(148, 52, 92).return_color_in("HEX")
back = color(242, 225, 206).return_color_in("HEX")
sec_back = color(204, 185, 163).return_color_in("HEX")

pallet = pallet( grad, prim, sec, back, sec_back  )

These functions are used by each graph to apply certain styling elements, meaning I never need to repeat a style element, which was a problem I had faced in some of the early Altair graphs that I made.

Each of these functions takes in a `pallet` object, allowing me to easily change the palette for an individual graph, or for all the graphs at once.

In [449]:

def apply_bar_styling(chart, pallet):
    return chart.configure_bar(
        cornerRadius=5
    ).configure_range (
        category=pallet.grad
    )

def apply_text(chart, pallet):
    return chart.configure(
        font="DINCondensed-Bold",
        background=pallet.background
    ).configure_axis(
        labelFontSize=13,
        titleFontSize=25,
        labelColor=pallet.secondary_color,
        titleColor=pallet.primary_color,
        domainColor=pallet.secondary_background,
        tickColor=pallet.secondary_background,
        gridColor=pallet.secondary_background
    ).configure_title(
        fontSize=30,
        color=pallet.primary_color 
    )

def apply_legend(chart, pallet):
    return chart.configure_legend(
        cornerRadius=10,
        fillColor=pallet.secondary_background,
        labelColor=pallet.primary_color,
        labelFontSize=13,
        padding=10,
        titleColor=pallet.primary_color,
        titleFontSize=15
    )

def apply_all(chart, pallet):
    tmp_chart = apply_text(chart, pallet)
    tmp_chart = apply_legend(tmp_chart, pallet)
    tmp_chart = apply_bar_styling(tmp_chart, pallet)
    return tmp_chart

### **LET THE CONFUSION BEGIN!**

When I started to look for ways to present this data in a misleading way, I thought it would be fun to first identify patterns that actually exist, and then attempt to convince viewers of my visualization of the exact opposite.

This brought me to my first graph, which simply and accurately displays the relationship between target demographic and the category of the tweet. The graph type that I used also enables comparison of these categories with each other, as well as of the total number of tweets in each target audience. Although this is a lot of information, with clear labels I think this is an effective graph that properly demonstrates an array of interesting relationships.

**But that is not what we are here for.**

In [450]:
unequal = chart.mark_bar().encode (
    alt.X( "bias-trun", title="Target Audience" ),
    alt.Y( "value", title="Number of Tweets" ),
    alt.Color( "category")
).properties(
    title="The categorical breakdown of different target audiences"
)


apply_all(unequal, pallet)

The next graph displays a similar set of data. However, instead of displaying the number of tweets for each category, it looks at that number as a percentage of the total number of tweets in that target audience. Despite being a mouthful, this creates a **great lack of clarity** in three important ways:

1. It now *appears* as if the neutral and partisan audiences have the same number of tweets, which is **certainly not the case**


2. Because the data the graph is conveying is now a percentage, viewers must first realize that it is indeed a percentage, instead of naturally assuming the values are the number of tweets in a category. They must also understand what both the part and the whole are before they can truly understand what they are reading! **oh-ho! how terrible and tragic!**


3. The labels now must try to explain this complex relationship, so they have become longer and so *so* much worse. **:)**
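
To make the first point concrete, here is a small sketch of how a per-audience percentage like `perc` could be computed. The counts are made up for illustration and are not the values in the spreadsheet, and the `groupby` approach is an assumption about how the column was derived; only the column names come from the charts in this project.

```python
import pandas as pd

# Made-up counts for illustration only; these are NOT the real tweet counts
toy = pd.DataFrame({
    "bias-trun": ["neutral", "neutral", "partisan", "partisan"],
    "category": ["policy", "attacking", "policy", "attacking"],
    "value": [300, 100, 60, 40],
})

# Share of each category within its own audience: value / audience total
audience_total = toy.groupby("bias-trun")["value"].transform("sum")
toy["perc"] = toy["value"] / audience_total

# Raw totals differ between audiences, but the shares always add up to 1
totals = toy.groupby("bias-trun")[["value", "perc"]].sum()
print(totals)
```

Every stacked bar reaches the same height once the values are shares, so the difference in raw totals between the audiences disappears from the picture.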


In [451]:
equal = chart.mark_bar().encode(
    alt.X( "bias-trun", title="target Audience" ),
    alt.Y( "perc:Q", title="number of tweets in a category over total tweets"),
    alt.Color( "category" )
).properties(
    title="The categorical breakdown of different target audiences, normalized"
)

apply_all(equal, pallet)

At this point I realized I wanted to make it seem that attacking tweets were the most popular, when in actuality they are barely a category at all. To go about this, I considered many ways of presenting the data, but all of them seemed **just too readable!** So instead I settled for a good old-fashioned data transformation:

#### **ENTER IN THE MAGICAL 1 / X**

I decided what better way to make the data and trends of a graph more clear than to just invert all the numbers!

And so, that is exactly what I did! The following graph, an "inversion" of the first graph, does not just present the data relationships completely backward; it also **certainly does not make it clear what transformation I used to get the data to that point :))**
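
Here is a tiny sketch of why 1 / x flips the picture. The counts are invented for illustration, and the assumption is that the `invert` column is simply one divided by `value`.

```python
import pandas as pd

# Invented counts for illustration only
toy = pd.DataFrame({
    "category": ["policy", "information", "attacking"],
    "value": [400, 100, 10],
})

# Assumed transformation: the reciprocal of each count
toy["invert"] = 1 / toy["value"]

# The category with the most tweets versus the tallest bar after inverting
largest_raw = toy.loc[toy["value"].idxmax(), "category"]
largest_inverted = toy.loc[toy["invert"].idxmax(), "category"]
print(largest_raw, largest_inverted)
```

Because 1 / x gets smaller as x gets bigger, the smallest category ends up with the tallest bar and the ordering of the categories is reversed.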



In [452]:
unequal_misleading = chart.mark_bar().encode (
    alt.X( "bias-trun", title="target Audience" ),
    alt.Y( "invert", title="Number of tweets in a category, inverted" ),
    alt.Color( "category" )
).properties(
    title="The inverted categorical breakdown of different target audiences"
)

apply_all(unequal_misleading, pallet)

And to top it all off, I thought, why not just Frankenstein the last two graphs together? So I took the inverted number of tweets of a certain category in a certain target audience, as a percentage of the sum of all the inverted numbers of tweets across these categories. **PERFECT**
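
For the curious, the combined transformation might look something like this sketch. The numbers are invented, and it assumes `invert-perc` is each inverted count divided by the sum of inverted counts within the same target audience.

```python
import pandas as pd

# Invented counts for illustration only
toy = pd.DataFrame({
    "bias-trun": ["neutral"] * 3 + ["partisan"] * 3,
    "category": ["policy", "information", "attacking"] * 2,
    "value": [400, 100, 10, 50, 25, 5],
})

# Step 1 (assumed): invert every count
toy["invert"] = 1 / toy["value"]

# Step 2 (assumed): express each inverted count as a share of its audience's inverted total
inverted_total = toy.groupby("bias-trun")["invert"].transform("sum")
toy["invert-perc"] = toy["invert"] / inverted_total

# Which category now looks biggest in each audience?
top_rows = toy.loc[toy.groupby("bias-trun")["invert-perc"].idxmax()]
print(top_rows[["bias-trun", "category", "invert-perc"]])
```

In this toy data the rarest category takes the largest share in both audiences, and both bars stack to exactly 1.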

- Secretive: check
- Misleading and overly long titles: check
- Making the wrong assumptions easy and intuitive: check
- Difficult to get the actual data from the graph: CHECK!

This graph perfectly combines all the terrible practices of rendering data into one horrible, impossibly challenging data visualization. **ENJOY!**

In [453]:
equal_misleading = chart.mark_bar().encode (
    alt.X( "bias-trun", title="target Audience"  ),
    alt.Y( "invert-perc", title="inverted number of tweets in a category over total tweets"),
    alt.Color( "category" )
).properties(
    title="The inverted categorical breakdown of different target audiences, normalized"
)

apply_all(equal_misleading, pallet)