In [1]:
import pandas as pd

subreddits = pd.read_csv('reddit-politics-data/data/scores.csv')

In [2]:
subreddits.columns

Index(['community', 'age', 'age B', 'age neutral', 'affluence', 'gender',
       'gender B', 'gender neutral', 'partisan B', 'partisan B neutral',
       'partisan', 'partisan neutral', 'edginess', 'sociality', 'time'],
      dtype='object')

# Find The Most Political Subreddits

This analysis supports seeding a sample of Reddit users drawn from people who post on political subreddits. The approach is to identify political subreddits and then collect the users who comment on their top submissions. The sample would be skewed if the partisan leaning of each subreddit were not also considered. The code below identifies the most political subreddits using the community scores from Anderson and Waller's 2021 article.

## Which column measures Strength of Political Character?

It should be the partisan column (or columns) whose values are all greater than zero.

In [3]:
partisan_mask = list(subreddits.columns.str.contains('partisan'))
subreddits.loc[:, partisan_mask].describe()

| | partisan B | partisan B neutral | partisan | partisan neutral |
|---|---|---|---|---|
| count | 10006.0 | 10006.0 | 10006.0 | 10006.0 |
| mean | 0.02829 | 0.383592 | 0.005711 | 0.39244 |
| std | 0.091367 | 0.077514 | 0.075373 | 0.070149 |
| min | -0.410791 | 0.138738 | -0.345949 | 0.169756 |
| 25% | -0.028636 | 0.335984 | -0.041398 | 0.346939 |
| 50% | 0.027317 | 0.378829 | 0.009487 | 0.386434 |
| 75% | 0.086356 | 0.4218 | 0.055559 | 0.429068 |
| max | 0.497002 | 0.813056 | 0.444171 | 0.760715 |


- "partisan neutral" and "partisan B neutral" are Anderson and Waller's measures of "political-ness": they are the only partisan columns whose values are all greater than zero (minimums of 0.170 and 0.139, respectively).
- The next section compares how the two measures rank subreddits.
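
The selection rule in the bullets above can also be checked programmatically. The following is a minimal sketch, not part of the original analysis: `strictly_positive_columns` is a hypothetical helper, and the toy frame holds only the min and max of each column as reported by `describe()` above, standing in for the full `subreddits` table.

```python
import pandas as pd


def strictly_positive_columns(df: pd.DataFrame, keyword: str = "partisan") -> list:
    """Return columns whose name contains `keyword` and whose values are all > 0."""
    candidates = df.loc[:, df.columns.str.contains(keyword)]
    return [col for col in candidates.columns if (candidates[col] > 0).all()]


# Toy stand-in for `subreddits`: the min and max of each partisan column,
# copied from the describe() output above.
toy = pd.DataFrame({
    "community": ["min_row", "max_row"],
    "partisan B": [-0.410791, 0.497002],
    "partisan B neutral": [0.138738, 0.813056],
    "partisan": [-0.345949, 0.444171],
    "partisan neutral": [0.169756, 0.760715],
})

politicalness_columns = strictly_positive_columns(toy)
print(politicalness_columns)
```

On the real data, the same call would be `strictly_positive_columns(subreddits)`.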

## Which subreddits are the most political?

Sort the political-ness columns in descending order and print the top 20 results for both measures.

In [4]:
# get top 20 most political subreddits from the two alternative measures
# (select 'community' by name rather than by position after reset_index())
most_political_subreddits_a = subreddits.sort_values("partisan neutral", ascending=False)["community"].iloc[0:20].reset_index(drop=True)
most_political_subreddits_b = subreddits.sort_values("partisan B neutral", ascending=False)["community"].iloc[0:20].reset_index(drop=True)
# place the two ranked lists side by side
most_political_subreddits = pd.concat([most_political_subreddits_a, most_political_subreddits_b], axis=1)
most_political_subreddits.columns = ['Measure A', 'Measure B']
print("List of the Most Political Subreddits")
print(most_political_subreddits)

List of the Most Political Subreddits
                Measure A             Measure B
0                 Liberal          WayOfTheBern
1        askaconservative  Political_Revolution
2              NeverTrump      HillaryForPrison
3               democrats       Fuckthealtright
4           conservatives          OurPresident
5        liberalgunowners                 esist
6   EnoughLibertarianSpam     MarchAgainstTrump
7            Conservative        PoliticalHumor
8             gunpolitics             Trumpgret
9   ShitRConservativeSays              DNCleaks
10            progressive          the_meltdown
11             Republican         Impeach_Trump
12                 progun  BannedFromThe_Donald
13            republicans       media_criticism
14        NeutralPolitics            The_Donald
15                prolife        uncensorednews
16    PoliticalDiscussion   SandersForPresident
17            GunsAreCool       HillaryMeltdown
18            AskALiberal             jillstein
19

In [5]:
# get the most partisan subreddits:
# lowest scores are the most Democratic-leaning, highest the most Republican-leaning
most_dem_subreddits_a = subreddits.sort_values("partisan", ascending=True)["community"].iloc[0:10].reset_index(drop=True)
most_rep_subreddits_a = subreddits.sort_values("partisan", ascending=False)["community"].iloc[0:10].reset_index(drop=True)
most_dem_subreddits_b = subreddits.sort_values("partisan B", ascending=True)["community"].iloc[0:10].reset_index(drop=True)
most_rep_subreddits_b = subreddits.sort_values("partisan B", ascending=False)["community"].iloc[0:10].reset_index(drop=True)
# place the four ranked lists side by side
partisan_subreddits = pd.concat(
    [most_dem_subreddits_a,
     most_rep_subreddits_a,
     most_dem_subreddits_b,
     most_rep_subreddits_b],
    axis=1)
partisan_subreddits.columns = ['Dem Measure A', 'Rep Measure A', 'Dem Measure B', 'Rep Measure B']
print("Lists of the Most Partisan-segregated Subreddits")
print(partisan_subreddits)

Lists of the Most Partisan-segregated Subreddits
           Dem Measure A    Rep Measure A     Dem Measure B     Rep Measure B
0              democrats     Conservative    hillaryclinton    uncensorednews
1  EnoughLibertarianSpam       The_Donald  themountaingoats        The_Donald
2         hillaryclinton    TrueChristian  GrassrootsSelect           sjwhate
3            progressive  NoFapChristians   FriendsofthePod   pussypassdenied
4        BlueMidterm2018         Mr_Trump       blackladies      The_Congress
5         EnoughHillHate       metacanada    ShitRedditSays  Physical_Removal
6    Enough_Sanders_Spam    conservatives    goldredditsays     RedditCensors
7       badwomensanatomy        new_right     FemmeThoughts   HillaryMeltdown
8                 racism       The_Farage   hamiltonmusical   hottiesfortrump
9            GunsAreCool       Christians        GamerGhazi       holdmyfries


The two measures of political-ness produce very different lists of subreddits, which Anderson and Waller explain as an artifact of the subreddits used to seed the two community embeddings. The "B" score reflects a more Donald Trump-focused dimension of partisanship (and political-ness). For generalizability, it is probably best not to use the "B" dimension, to avoid undue focus on a single candidate; note that much of the data is from before Trump was a political candidate. Mamakos and Finkel also conclude that many new users joined Reddit as a result of the Trump campaign leading up to the 2016 presidential election.

The partisan measures ("partisan" and "partisan B") measure how similar each subreddit's membership is to that of the seed subreddits, which were used to define a partisanship dimension by comparing communities that were similar except in their partisan leaning. The neutral measures, by contrast, measure how political the discussions on each subreddit are. As such, pairing the most political conservative subreddit with the most political liberal subreddit, and then working down the list in this manner, should ensure relatively good representation of users from across the political spectrum.
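
The pairing procedure described above could be sketched as follows. This is an illustration under stated assumptions rather than the analysis actually run: `pair_political_subreddits` is a hypothetical helper, the sign of the "partisan" score is assumed to separate the two sides (negative for liberal, positive for conservative, matching the sort order used earlier), and the toy community names and scores are made up.

```python
import pandas as pd


def pair_political_subreddits(scores, n_pairs=10,
                              lean_col="partisan",
                              political_col="partisan neutral"):
    """Pair the k-th most political liberal community with the k-th most
    political conservative one, working down both ranked lists."""
    ranked = scores.sort_values(political_col, ascending=False)
    # Assumption: a negative partisan score means liberal, positive means conservative.
    liberal = ranked.loc[ranked[lean_col] < 0, "community"].head(n_pairs)
    conservative = ranked.loc[ranked[lean_col] > 0, "community"].head(n_pairs)
    pairs = pd.concat(
        [liberal.reset_index(drop=True), conservative.reset_index(drop=True)],
        axis=1, keys=["liberal", "conservative"])
    # Drop trailing rows where one side ran out of communities.
    return pairs.dropna()


# Made-up scores for illustration only.
toy_scores = pd.DataFrame({
    "community": ["L1", "L2", "R1", "R2", "N1"],
    "partisan": [-0.3, -0.1, 0.2, 0.4, 0.0],
    "partisan neutral": [0.6, 0.7, 0.5, 0.8, 0.9],
})

pairs = pair_political_subreddits(toy_scores, n_pairs=2)
print(pairs)
```

Splitting at zero is the simplest choice; a stricter threshold on the partisan score would keep weakly leaning communities out of both lists.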

### Next Step:

Test whether how civil a community appears to be makes a difference in whether it has users who leave uncivil comments and posts:

Gather data from:
- Toxic Subreddits:
    - liberal: r/Fuckthealtright
    - conservative: r/

- Civil Subreddits:
    - liberal: r/democrats
    - conservative: r/Republican

Classify toxicity of user comments and make comparisons.

This is, in itself, a useful exercise and might produce good findings.
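
Once comments have been labeled, the comparison step of this plan might look like the sketch below. It assumes a hypothetical table with one row per comment and a boolean `is_toxic` label produced by whichever classifier is eventually chosen; the column names and the toy rows are invented for illustration and are not real data.

```python
import pandas as pd

# Hypothetical input: one row per comment, already labeled by a toxicity classifier.
comments = pd.DataFrame({
    "subreddit_type": ["toxic", "toxic", "toxic", "toxic",
                       "civil", "civil", "civil", "civil"],
    "lean": ["liberal", "liberal", "conservative", "conservative",
             "liberal", "liberal", "conservative", "conservative"],
    "is_toxic": [True, False, True, True, False, False, True, False],
})

# Share of toxic comments for each (subreddit type, political lean) cell.
toxic_share = (
    comments.groupby(["subreddit_type", "lean"])["is_toxic"]
    .mean()
    .unstack("lean")
)
print(toxic_share)
```

This compares comments rather than users; a user-level version would first aggregate each user's comments before grouping.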