
# **Making a Misleading Graphic Targeting Immigrants**

Author: Jeremy Chu (An Asian Immigrant)

In [1]:
!pip install geopandas


In [2]:
# Importing library and packages

import pandas as pd
import numpy as np
import geopandas as gpd
import seaborn as sns
import matplotlib.pyplot as plt
import plotly.graph_objects as go
from plotly.subplots import make_subplots
import plotly.express as px

In [3]:
# Reading data
df = pd.read_csv("https://raw.githubusercontent.com/JeremyJChu/misleading_graphic/main/inputs/data/14100083.csv")
df2 = pd.read_csv('https://raw.githubusercontent.com/JeremyJChu/misleading_graphic/main/inputs/data/06_neighbourhood-profiles_cleaned%20copy.csv')

# Preamble



No racist intent is behind the making of this graphic. This was an experiment to test how legitimate data could be transformed into racist propaganda. The intended audience was an older, more racist group of Canadians who have already subscribed to anti-immigration rhetoric following COVID-19. Please do not redistribute the graph without permission from the author.

**Data**

Two datasets were used for this project.

1. Labour force characteristics by immigrant status, annual from Statistics Canada [Link](https://www150.statcan.gc.ca/t1/tbl1/en/tv.action?pid=1410008301)

2. Neighbourhood Profiles from Open Data Toronto [Link](https://open.toronto.ca/dataset/neighbourhood-profiles/)

# Section 1: First Graph

In [4]:
# Filtering data to only show Toronto 
to_df = df[df['GEO'] == 'Toronto, Ontario']
to_df = to_df[to_df['Age group'] == '15 years and over']

In [5]:
# My exploratory data steps. Checking to see what labour force characteristics there are.
print(to_df['Labour force characteristics'].unique())

['Population' 'Labour force' 'Employment' 'Full-time employment'
 'Part-time employment' 'Unemployment' 'Not in labour force'
 'Unemployment rate' 'Participation rate' 'Employment rate']


In [6]:
# As I was uncertain as to whether employment/unemployment rate or raw numbers would be more misleading, I made dataset subsets of each.
unemr_df = to_df[(to_df['UOM'] == 'Percentage') & (to_df['Labour force characteristics'] == 'Unemployment rate')]
emr_df = to_df[(to_df['UOM'] == 'Percentage') & (to_df['Labour force characteristics'] == 'Employment rate')]
unem_df = to_df[to_df['Labour force characteristics'] == 'Unemployment']
em_df = to_df[to_df['Labour force characteristics'] == 'Employment']

In [7]:
# Here are my initial exploration drafts
# Unemployment Rate Graph
fig = px.line(unemr_df, x = "REF_DATE", y = "VALUE", color="Immigrant status")
fig.show()

**Graph 1 Key Takeaway:** Higher unemployment numbers for Born in Canada vs Landed Immigrants

In [8]:
# Employment Rate Graph
fig2 = px.line(emr_df, x = "REF_DATE", y = "VALUE", color="Immigrant status")
fig2.show()

**Graph 2 Key Takeaway:** Vastly higher employment rate for Canadians than for landed immigrants.

In [9]:
# Unemployment Numbers Graph
fig3 = px.line(unem_df, x = "REF_DATE", y = "VALUE", color="Immigrant status")
fig3.show()

**Graph 3 Key Takeaway:** Similar unemployment numbers between Canadians and immigrants

In [10]:
# Employment Numbers Graph
fig4 = px.line(em_df, x = "REF_DATE", y = "VALUE", color="Immigrant status")
fig4.show()

**Graph 4 Key Takeaway:** Immigrants show higher employment numbers than Canadians

**Observation Notes**

Taking inspiration from Faith Goldy, I wanted to paint immigrants in a negative light somehow. Ideally, I could go about it by having immigrants overtake Canadians in some way, or show that they were having a better time during COVID-19. Honestly, the latter has been debunked by research showing minorities being disproportionately affected by COVID and has been used in a different "minorities spread COVID" narrative instead. Since the topic is immigration/immigrants, I could not go that route but instead opted for the tried and true "immigrants are taking our jobs" rhetoric. 

Having a clearer racist topic in mind, I turned to selecting the proper subset of data to work with. Ultimately I decided to go with employment numbers because I felt presenting data in ratio form was too fair. I wanted to show a disproportion in numbers rather than a single-digit difference in rates. And since immigrants had higher employment numbers than Canadians, I could make a scandal out of it. In reality of course, the employment rate of Canadians is much higher than that of immigrants, but no one needs to know that. So employment numbers it was.
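
The gap between raw counts and rates described above comes down to simple arithmetic. The sketch below uses made-up figures for two hypothetical groups (they are not values from the StatsCan table) to show how a group can have more people employed while still having the lower employment rate, simply because its population is larger.

```python
# Hypothetical figures for illustration only; NOT values from the StatsCan table.
# Counts are in thousands of people aged 15 and over.
groups = {
    "Group A": {"population": 2_000, "employed": 1_100},
    "Group B": {"population": 1_500, "employed": 1_000},
}


def employment_rate(employed, population):
    """Employment rate as a percentage of the population."""
    return 100 * employed / population


rates = {}
for name, g in groups.items():
    rates[name] = employment_rate(g["employed"], g["population"])
    print(f"{name}: employed = {g['employed']}k, rate = {rates[name]:.1f}%")
```

Group A wins on the raw count and loses on the rate, which is why choosing which of the two to plot changes the story.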

In [11]:
# More cleaning, getting rid of the 'Total Population' row.
em_df = em_df[em_df['Immigrant status'] != 'Total population']

**Making the Employment Graph**

The code below details my creation of the employment numbers graph. I narrowed down the data points to only show employment numbers for Landed immigrants and Born in Canada. The code also contains my attempt at making the values for landed immigrants negative to see how it would appear as an up-and-down bar chart. That didn't go over well and was subsequently removed from even my draft.

My first iteration was a simple line graph comparing the two. It did its job in showing that there was a gap in employment numbers between landed immigrants and those born in Canada, but it was a bit too simple for my liking. Of course, my goal was to incite an older, racist population, so the graph couldn't be too complex either. So I decided to go for a bar chart instead. There would be more things on the screen, and everyone knows how to read a bar chart.

In [12]:
# Data Massaging
fig5_df = em_df[(em_df['Immigrant status'] == 'Born in Canada') | (em_df['Immigrant status'] == 'Landed immigrants')]
fig5_df_im = em_df[(em_df['Immigrant status'] == 'Landed immigrants')]
fig5_df_can = em_df[(em_df['Immigrant status'] == 'Born in Canada')]
fig5_df_imnegative = fig5_df_im.copy()
fig5_df_imnegative["VALUE"] = -abs(fig5_df_imnegative["VALUE"])

# Creating a simple line chart
fig5 = go.Figure()
fig5.add_trace(go.Scatter(x=fig5_df_im["REF_DATE"], y=fig5_df_im["VALUE"], mode='lines',
        name='Landed immigrants',
        line=dict(color="red"),
        connectgaps=True,
    ))
fig5.add_trace(go.Scatter(x=fig5_df_can["REF_DATE"], y=fig5_df_can["VALUE"], mode='lines',
        name='Born in Canada',
        line=dict(color="blue"),
        connectgaps=True,
    ))

fig5.show()

**Draft 1:** Not bad, can see that immigrants gradually overtook Canadians. But it's lacking the impact I wanted. 

**Final Draft Part 1** 

I will explain why this is part 1 later. For better impact, as mentioned, I turned my data into a bar chart instead. I then manually altered the colour scale so that the nice blue for Canadians dimmed as time passed while immigrants became a deeper and deeper red. To further hammer the point home, I overlaid my first line graph onto the bars so that everyone could see the trajectory of employment numbers at a glance. I got rid of all the grid lines because I didn't want my audience examining the numbers closely.

For the y axis, I first debated whether I wanted to keep the number ticks at all. I experimented with removing them and while the graph becomes even more misleading, I felt it became too fake. Rather than an unintentionally misleading graph, it was too "in-your-face" an attempt at misleading the public and lacked subtlety. I wanted a more official-looking graph so I kept the numbers in.

That didn't stop me from tweaking the y axis, however. I settled on the range 1300 to 1800 because I could shrink the early years to much shorter bars, and then have the rise in immigrants look like a much larger deal than it actually was. Coupled with the increasing red and diminishing blue, I wanted my audience to fixate on the end. Take a look below at what the result was.
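
How much a truncated axis exaggerates a difference can be checked with quick arithmetic. This is a minimal sketch: `visual_ratio` is a hypothetical helper, and the two employment values are made up for illustration rather than read from the dataset. Only the 1300 lower bound comes from the chart described above.

```python
def visual_ratio(a, b, axis_min=0.0):
    """Ratio of the drawn bar heights for values a and b when the axis starts at axis_min."""
    return (a - axis_min) / (b - axis_min)


# Hypothetical employment counts in thousands; not taken from the dataset.
early, late = 1400.0, 1700.0

true_ratio = visual_ratio(late, early)                     # axis starting at zero
drawn_ratio = visual_ratio(late, early, axis_min=1300.0)   # axis starting at 1300

print(f"ratio with a zero baseline: {true_ratio:.2f}")
print(f"ratio with a 1300 baseline: {drawn_ratio:.2f}")
```

With these example values the later bar is about 1.2 times the earlier one in the data, but it is drawn four times as tall once the axis starts at 1300.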

In [13]:
# Creating my own colour scale
immigrant_invasion = ['#f8cece','#f8cece','#f8cece','#f8cece','#f8cece','#ee7f81','#ee7f81','#ee7f81','#ee7f81','#e22529','#e22529','#e22529','#e22529','#e22529','#e22529']
canadian_downfall = ['#231ac5','#231ac5','#231ac5','#231ac5','#231ac5','#231ac5','#231ac5','#7872ec','#7872ec','#7872ec','#7872ec','#7872ec','#7872ec','#bebbf6','#bebbf6']

# Creating the graph
fig6 = go.Figure()

# First the bar charts, separately in their own trace so I could adjust their colours
fig6.add_trace(go.Bar(x = fig5_df_can["REF_DATE"],
                      y = fig5_df_can["VALUE"],
                      marker_color = canadian_downfall,
                      name='Canadians',  # this trace plots the Born in Canada data
                      showlegend=False
                      #width = 0.3
                      ))
fig6.add_trace(go.Bar(x = fig5_df_im["REF_DATE"],
                      y = fig5_df_im["VALUE"],
                      marker_color = immigrant_invasion,
                      name='Immigrants',  # this trace plots the Landed immigrants data
                      showlegend=False
                      #width = 0.3
                      ))

# Then adding in the line graph as a trend line
fig6.add_trace(go.Scatter(x=fig5_df_im["REF_DATE"], y=fig5_df_im["VALUE"], mode='lines',
        name='Immigrants',
        line=dict(color="red"),
        connectgaps=True,
        showlegend=False
    ))
fig6.add_trace(go.Scatter(x=fig5_df_can["REF_DATE"], y=fig5_df_can["VALUE"], mode='lines',
        name='Canadians',
        line=dict(color="blue"),
        connectgaps=True,
        showlegend=False
    ))

# Squeezing the y axes so that differences are more pronounced
fig6.update_yaxes(range=[1300, 1800],
                  showgrid = False,
                  #visible=False, 
                  showticklabels=True,
                  title_text = "Employment")

fig6.update_xaxes(dtick="Y1",
                  tickformat="Y",)

fig6.update_layout(#paper_bgcolor='#6b6266',
                   plot_bgcolor='white'#,
                   #title = "Significantly More Immigrants are Employed than Canadians during COVID<br><br>" +
                   #"From 2015 onwards, new immigrants have overtaken hardworking Canadians in employment numbers.<br>When COVID-19 hit, legitimate Canadians were struggling to find jobs. Immigrants enjoyed a 16% higher chance of finding a job than hardworking Canadians"
                   ) 
fig6.update_traces(aaxis_showgrid=False, selector=dict(type='carpet'))

# Add Image, StatsCan logo for legitimacy
fig6.add_layout_image(
    dict(
        source="https://raw.githubusercontent.com/JeremyJChu/misleading_graphic/main/inputs/images/STATS-Canada.jpeg",
        xref="paper", yref="paper",
        x=1, y=1.05,
        sizex=0.2, sizey=0.2,
        xanchor="right", yanchor="bottom"
    )
)

fig6.show()

**Post-Processing** 

You will notice the lack of titles or descriptions in the above graph. I went through a couple of iterations off screen to see whether the text capabilities of Plotly were good enough for my inflammatory purposes, and I was unsatisfied. Perhaps in large part due to my relative inexperience with Plotly, I simply felt that post-processing the graph in Photoshop would be much easier and I could make it much more fun.

It was then that I decided I wanted to make a sort of pamphlet ad. While I debated adding a lot more graphs, I decided I wanted one more. The next section will detail my data exploration and experiments for my second graph.

One last note. I decided to add that little Statistics Canada logo on the top right to show that the data was from StatsCan. Just a little authority to back up the graph. After all, none of the data is fake, it's just that the interpretation is "unique".

# Section 2: Second Graph

**Data** 

Data was taken from the Neighbourhood Profiles dataset from Open Data Toronto. [Link Here](https://open.toronto.ca/dataset/neighbourhood-profiles/)

The data transformation and cleaning was done prior to this project, in which I filtered the data to show income levels of economic units in each neighbourhood. The [code](https://colab.research.google.com/drive/1qyjOoCe31BdHIYs_GeAcDGiPZdu-qnbl?usp=sharing) has been linked if interested.

Basically I used the Organisation for Economic Co-operation and Development's [online calculator](http://www.oecd.org/social/under-pressure-the-squeezed-middle-class-689afed1-en.htm), under the assumption that the average household is 3.16 people large (based on the dataset), and split households into low and middle-plus income.
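
The exact thresholds used for the split are not stated here, so the following is only a sketch of how such a classification could work, not the method actually used. It assumes the square-root equivalence scale and a lower bound of 75% of median income for the middle class, which is my understanding of how the OECD report defines it. The median value and the function names are hypothetical placeholders.

```python
import math

AVERAGE_HOUSEHOLD_SIZE = 3.16    # from the text above
HYPOTHETICAL_MEDIAN = 40_000     # placeholder median equivalised income, NOT from the source
LOWER_SHARE = 0.75               # assumed lower bound of "middle income" as a share of the median


def equivalised_income(household_income, household_size):
    """Square-root equivalence scale: divide income by sqrt(household size)."""
    return household_income / math.sqrt(household_size)


def income_class(household_income, household_size=AVERAGE_HOUSEHOLD_SIZE,
                 median=HYPOTHETICAL_MEDIAN, lower_share=LOWER_SHARE):
    """Label a household 'low' or 'midplus' relative to the assumed threshold."""
    eq = equivalised_income(household_income, household_size)
    return "low" if eq < lower_share * median else "midplus"


for income in (40_000, 100_000):
    print(income, income_class(income))
```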

In [18]:
fig6 = px.scatter(df2, x="percent_nol", y="percent China", trendline="ols")
fig6.update_xaxes(range=(0.4,0.8))
fig6.show()

# The higher percentage a neighbourhood has of Chinese immigrants, the less likely the neighbourhood is to speak English. 

**Graph 1 Key Takeaway:** My first inclination was to look at the percentage of households who speak an official language versus the percentage of households that come from China. Nothing really came out of it and I couldn't fit it into my story so I abandoned it.

In [15]:
fig7 = px.scatter(df2, x="percent_low", y="Americas", trendline="ols")
fig7.show()

**Graph 2 Key Takeaway:** I then wanted to flub "Americas" a bit so that it felt more like "white immigrants" (even though it's mostly South America) and see whether I could make something out of it. Even though neighbourhoods with more families from the Americas have a higher percentage of low income households, it doesn't really fit well with the "American immigrant" narrative and I abandoned it as well.

In [16]:
fig8 = px.scatter(df2, x="low_income", y="Europe", trendline="ols")
fig8.show()

**Graph 3 Key Takeaway:** Like graph 2, this time I looked at immigrants from Europe. No substantial results and honestly, it just wasn't inflammatory enough.

**Graph 4. The Graph I landed on**

So finally I decided to plot households with mid-plus income against Asian immigrants in each neighbourhood. I get to flub the distinction a bit by covering all of Asia, and then show that Asian immigrants are well off. This would fit well into COVID and feed into the Trump "China flu" discourse. To make it even more obvious, I cut off the outliers on the side of the graph. So my graph went from this:

In [20]:
# Original Graph

fig9_original = px.scatter(df2, x="midplus_income", y="Asia", trendline="ols")
fig9_original.show()

To this:

In [17]:
fig9 = px.scatter(df2, x="midplus_income", y="Asia", trendline="ols", color="Asia",
                  color_continuous_scale=[(0, '#f8cece'), (0.05, '#f25563'), (1, '#660810')],
                  size = "Asia")
    
fig9.update_yaxes(range=[0, 11800],
                  showgrid = False,
                  #visible=False, 
                  showticklabels=False,
                  title_text = "Asian Immigrants")


fig9.update_xaxes(range=[2500, 25000],
                  title_text = "Number of Families Middle Class and Higher")

fig9.update_layout(#paper_bgcolor='#6b6266',
                   plot_bgcolor='white') 
fig9.update_traces(aaxis_showgrid=False, selector=dict(type='carpet'))
fig9.update(layout_coloraxis_showscale=False)
fig9.show()

By cutting off the edges and squeezing the y axis, I could make the upward trendline more pronounced. Then I used a skewed colour scale so that neighbourhoods with more Asian immigrants were a darker shade of red. This time, I removed the y axis ticks entirely as the trendline was all I cared about showing. I kept the numbers on the x axis, "Number of Families Middle Class and Higher", because that's what I wanted my audience to focus on: that Asian immigrants were well off.
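
One detail worth keeping in mind: narrowing an axis range in Plotly only hides points from view, and the `trendline="ols"` line is still fitted to every row passed to `px.scatter`. To see how much a single outlier can pull a fit, the sketch below compares a least-squares line over all points with one over only the points that would remain visible. The points and the `ols_line` helper are hypothetical and are not taken from the neighbourhood data.

```python
def ols_line(points):
    """Ordinary least-squares fit y = slope * x + intercept for a list of (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical points; the last one is an outlier far to the right.
points = [(1, 2), (2, 3), (3, 5), (4, 4), (10, 1)]
visible = [(x, y) for x, y in points if x <= 5]   # what a clipped x axis would show

full_slope, _ = ols_line(points)
visible_slope, _ = ols_line(visible)

print(f"slope fitted on all points:     {full_slope:.2f}")
print(f"slope fitted on visible points: {visible_slope:.2f}")
```

A reader who only sees the clipped view cannot tell which of these two fits the drawn line corresponds to, which is part of why hiding the edges is misleading.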

# Section 3: Post-Processing

While I can't show my Photoshop process, here's what the end result looks like. 

[Link to graphic here](https://raw.githubusercontent.com/JeremyJChu/misleading_graphic/main/output/image/misleading_graphic.jpg)


Rather than legends, I colour coded the text to correspond to the graph to save space. Then of course, I bolded and increased the font size of inflammatory points so my audience's eyes would be drawn to my big red immigrant phrases. Sprinkled throughout the pamphlet are the StatsCan and Open Data Toronto logos, giving the whole thing an air of legitimacy.

Then, considering I was targeting Asians, I turned the background yellow. I also redid the y axis text so that it fit the font of the rest of the pamphlet.

Lastly, I did the cheesy white family pictures at the bottom because for some reason a lot of these propagandistic visuals use them and they somehow have an effect?

# Section 4: Discussion

Making this lie was harder than I thought. I found that I had to first decide on the story I wanted to tell, and then squish the data so that it fit the story. There were a lot more trial and error graphs than I could have shown in the code above, all of which were just experimenting to see which of the data points could show the most difference. 

After that, it was a matter of making sure you capture your audience's attention and muddle through the more questionable parts of your analysis. Give as little data as you can without seeming "fake" and appeal to emotions. I found that simplicity, in this case and taking into account my audience, was the best choice. A simple line that moves up and another that moves down, along with big red numbers, tell the story for me. The goal was not to tell something new, but to capitalize on misinformation already present in my audience's minds and further cement their misguided beliefs.

With anti-Asian sentiment higher than it has ever been, and COVID being heralded as the China flu by Trump, all it took was the right data and I could tie the story into a Canadian context. The Faith Goldy video helped too, as it showed that people were concerned about the growing immigrant population in Canada. Make it so that hardworking Canadians are hurt by this influx, and that immigrants are thriving more in COVID than them, paint COVID as an "immigrant pandemic", and I have a story that fits.

This exercise was in part fun and in part disgusting. Being an Asian immigrant myself, sifting through racist media to find inspiration for the story was quite painful. Nonetheless it was fascinating to see how I could spin a story from real data interpreted wrongly, and how terrifying the effects could be.