# Analyzing Twitter API data of "withheld" accounts

## Step 1: Load the data we've collected on withheld accounts

In [1]:
import pandas as pd

In [2]:
accounts = pd.read_csv("../data/withheld-accounts.csv")
len(accounts)

1714

In [3]:
accounts.head().T

,0,1,2,3,4
user_id_numfix,'14613185,'17232910,'19063025,'19813934,'22356335
user_id,14613185,17232910,19063025,19813934,22356335
screen_name,iEvolutiOne,sechapman,sondakika_haber,erdemgurkan71,Salzig2010
bio,Gab- iEvolutiOne #Anon3R #G8Links #AntiGentili...,#SouthernPresbyterians My only allegiances are...,Bizi takip edin haberiniz olsun! #SonDakika #H...,Vicdanın yansıması... Eğriye eğri... Doğruya d...,ich bin #burkaphob. im urlaub hier burkinifrei...
withheld_category,withheld,not_withheld,withheld,withheld,withheld
withheld_in_countries,DE,,TR,TR,DE
withheld_ever,DE,DE,TR,TR,DE
followers_count,8738,1659,42452,46106,222
following_count,8894,3223,21,150,347
signup_date,2008-05-01,2008-11-07,2009-01-16,2009-01-31,2009-03-01


In [4]:
first_observations = pd.read_csv("../data/first-observations.csv")
first_observations.head()

,user_id,country,fetched_at
0,14613185,DE,2017-10-02
1,17232910,DE,2017-10-02
2,19063025,TR,2017-10-02
3,19813934,TR,2017-10-02
4,22356335,DE,2017-10-02


## Step 2: Calculate basic, overall metrics

### Per-country withholdings

Counting each combination of countries as its own category, so that no account is counted twice:

In [5]:
accounts["withheld_ever"].value_counts()\
    .sort_values(ascending=False)\
    .to_frame("num_accounts")

withheld_ever,num_accounts
TR,721
DE,639
FR,141
DE + FR,118
RU,78
IN,11
BR,2
GB,2
FR + GB,1
DE + FR + GB,1


Abbreviations:

- TR: Turkey
- DE: Germany
- FR: France
- RU: Russia
- IN: India
- GB: Great Britain / United Kingdom
- BR: Brazil

Counting per country, which counts an account once for every country in which it has been withheld:

In [6]:
first_observations["country"].value_counts()\
    .sort_values(ascending=False)\
    .to_frame("num_accounts")

country,num_accounts
DE,758
TR,721
FR,261
RU,78
IN,11
GB,4
BR,2
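
The two tables above can be reconciled by splitting each `withheld_ever` label on `" + "` and crediting its count to every country it names. The sketch below uses only the counts printed above (not the CSV itself), and the variable names are illustrative rather than taken from the notebook:

```python
from collections import Counter

# Counts copied from the "withheld_ever" table above.
withheld_ever_counts = {
    "TR": 721, "DE": 639, "FR": 141, "DE + FR": 118, "RU": 78,
    "IN": 11, "BR": 2, "GB": 2, "FR + GB": 1, "DE + FR + GB": 1,
}

# Credit each combined label to every country it contains.
per_country = Counter()
for label, num_accounts in withheld_ever_counts.items():
    for country in label.split(" + "):
        per_country[country] += num_accounts

for country, num_accounts in per_country.most_common():
    print(country, num_accounts)
```

For example, Germany's per-country total is 639 + 118 + 1 = 758, which matches the `DE` row in the per-country table.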


Latest account status, as of early January:

In [7]:
accounts["withheld_category"].value_counts()

withheld           1428
inactive            193
media_violation      84
not_withheld          9
Name: withheld_category, dtype: int64

## Step 3: Identify the most-followed accounts

Overall:

In [8]:
accounts_ranked = accounts\
    .assign(rank = lambda x: x["followers_count"].rank(ascending = False).astype(int))\
    .sort_values("rank")\
    .set_index("rank")[[
        "screen_name",
        "bio",
        "withheld_category",
        "withheld_in_countries",
        "withheld_ever",
        "followers_count"
    ]].fillna("")

accounts_ranked.head(10)

rank,screen_name,bio,withheld_category,withheld_in_countries,withheld_ever,followers_count
1,Gurmeetramrahim,Spiritual Saint/Philanthropist/Versatile Singe...,withheld,IN,IN,3564377
2,PeriscopeCo,Explore the world through someone else's eyes....,withheld,TR,TR,1406417
3,ekremdumanli,"Gazeteci/ Journalist/ [For my English account,...",withheld,TR,TR,540889
4,Enes_Kanter,{Hizmetin Hizmetkârı} #Live4Others 🗽 #EK00 htt...,withheld,TR,TR,532832
5,TheRedHack,"Madem sonsuza dek yaşayamayacağız, o vakit ist...",withheld,TR,TR,530722
6,sosyalpencere,#HiçDurmadanYürüyeceksiniz Sosyal Pencere #sos...,withheld,TR,TR,456502
7,derasachasauda,Confluence of All Religions - A Socio-Spiritua...,withheld,IN,IN,363565
8,mceutv,MC EU TV | Facebook: https://t.co/qVDj5qZNgg +...,withheld,TR,TR,316495
9,HARAMZADELER333,Yolsuzluk İhbar: haramzadeihbar@gmail.com / Bl...,withheld,TR,TR,290642
10,Herkul_Nagme,Bu sayfa diktatörlük tarafından Türkiye’de eng...,withheld,TR,TR,281023


Among multi-country withholdings:

In [9]:
accounts_ranked[
    accounts_ranked["withheld_ever"].str.contains(r" \+ ")
].head(10)

rank,screen_name,bio,withheld_category,withheld_in_countries,withheld_ever,followers_count
22,AmyMek,"God, Family & Country; Sports Fitness & Vegan;...",withheld,DE + FR,DE + FR,207125
92,Third_Position,Third Position is a News/Journo network focusi...,media_violation,everywhere,DE + FR,35045
104,Ann__Kelly,European rights activist.⛄️You are determined ...,withheld,DE + FR,DE + FR,28625
105,offensive_image,Humor so dark it should be shot,inactive,DE + FR,DE + FR,27732
137,offensivemem3s_,DM me memes - offensive - everything is just f...,withheld,DE + FR,DE + FR,17591
149,VeryNiceNAZI,Nationalist Anti-Zionis✞ Independent 卐 NAZI 卐 ...,inactive,DE + FR,DE + FR,15710
163,zyklonbeast,Keeper of the fire ⍯ ɛquality is a false gɵɗ ⍯...,inactive,DE + FR,DE + FR,13339
164,ANP14,The American Nazi Party is America's premier 2...,inactive,DE + FR,DE + FR,13338
183,Nature_and_Race,National Socialism is the rebirth of mankind's...,withheld,DE + FR,DE + FR,10757
191,Millennial_Matt,) ) )インナーヘブン( ( (,inactive,DE + FR,DE + FR,9978


## Step 4: Examine language used in profile bios

### Bios mentioning "withheld"

In [10]:
accounts_mentioning_withheld = accounts_ranked[
    accounts_ranked["bio"].str.contains(r"withheld", case=False, na=False)
][[ "screen_name", "bio", "withheld_category"]]

print("{} accounts".format(len(accounts_mentioning_withheld)))

105 accounts


The vast majority of these bios use a similar template:

In [11]:
accounts_mentioning_withheld["bio"].value_counts().head()

This account has been withheld in: Turkey.                                                                                                                69
This account has been withheld in: Germany.                                                                                                               19
This account has been withheld in: France.                                                                                                                 6
This account is »withheld« in Germany = censored. "No one is more hated than he who speaks the truth." (Plato)                                             1
Nach dem blocken von @NSArschloecher wird jetzt richtig gepöbelt! #FCKISLM @aiman_Mazyek ausweisen - Pro Israel 🇮🇱 WITHHELD IN GERMANY FREE IN EUROPE!     1
Name: bio, dtype: int64

In [12]:
accounts_mentioning_withheld["withheld_category"].value_counts()

withheld           99
inactive            5
media_violation     1
Name: withheld_category, dtype: int64

Examples:

In [13]:
accounts_mentioning_withheld.head()

rank,screen_name,bio,withheld_category
60,SosyalAnadolu,Your account has been withheld in Turkey in re...,withheld
85,ugur_aygun_,This account has been withheld in: Turkey.,withheld
124,Baturhanberk,This account has been withheld in: Turkey.,withheld
133,DEMZEM,This account has been withheld in: Turkey.,withheld
157,darksideoftheeg,This account has been withheld in: Germany.,withheld


### Bios containing Nazi-related symbols

In [14]:
MAIN_SYMBOL_PATTERNS = [
    (r"[卐卍]", "swastikas"),
    (r"nazi(?!rah)", "nazi"),
    (r"(?<!ŞE)hitler", "Hitler"), 
]

SYMBOL_PATTERNS = MAIN_SYMBOL_PATTERNS + [
    (r"14.?88", "14/88, 1488, etc."),
    (r"14w|14 words", "14w"),
    (r"ϟϟ", "ϟϟ"),
    (r"✠", r"✠")
]

In [15]:
for pat, name in SYMBOL_PATTERNS:
    print("{}: {}".format(
        name,
        (
            accounts["bio"].str.contains(pat, case=False, na=False) |
            accounts["screen_name"].str.contains(pat, case=False, na=False)
        ).sum()    
    ))

swastikas: 18
nazi: 61
Hitler: 55
14/88, 1488, etc.: 53
14w: 17
ϟϟ: 7
✠: 3
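
The lookahead in `nazi(?!rah)` and the lookbehind in `(?<!ŞE)hitler` exclude specific longer words that merely contain the target string. The sketch below exercises the same patterns with Python's `re` module. The test strings are made up for illustration, and the reading that the exclusions target the name "Nazirah" and the Turkish word "şehitler" ("martyrs") is an assumption, not something the notebook states:

```python
import re

# Same patterns as above, compiled case-insensitively to mirror case=False.
nazi = re.compile(r"nazi(?!rah)", re.IGNORECASE)
hitler = re.compile(r"(?<!ŞE)hitler", re.IGNORECASE)
numeric = re.compile(r"14.?88", re.IGNORECASE)

# Hypothetical strings, not taken from the dataset.
samples = ["proud NAZI", "Nazirah", "Hitler", "Şehitler", "14/88", "1488", "14588"]

for text in samples:
    hits = [
        name
        for name, pattern in [("nazi", nazi), ("hitler", hitler), ("14.?88", numeric)]
        if pattern.search(text)
    ]
    print(repr(text), hits)
```

Note that `.?` matches any single optional character, so `14.?88` also matches strings such as `14588`. The counts for that pattern may therefore include some false positives.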


Any of the above:

In [16]:
main_nazi_matches = accounts[
    (accounts["bio"] + "|" + accounts["screen_name"]).str.contains(
        r"|".join(pat for pat, name in MAIN_SYMBOL_PATTERNS), 
        case=False,
        na=False
    )
]
main_nazi_matches["withheld_ever"].value_counts()

DE         68
DE + FR    23
FR         15
TR          1
Name: withheld_ever, dtype: int64

Note: The single Turkey (TR) match appears to be a false positive.
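
One subtlety when concatenating `bio` and `screen_name` before matching: in pandas, adding a string to a missing value yields a missing value, so `na=False` then treats the whole row as a non-match even if the screen name alone would match. Whether this affects the counts above depends on whether any matching account lacks a bio, which the output here does not show. The sketch below uses two made-up rows and a simplified pattern to illustrate the behavior, with `fillna("")` as one possible guard:

```python
import pandas as pd

# Hypothetical rows: the second account has no bio (as NaN would be after read_csv).
demo = pd.DataFrame(
    {
        "screen_name": ["example_user", "nazi_example"],
        "bio": ["just a bio", None],
    },
    dtype="object",
)

# Concatenating with a missing bio produces a missing combined string.
combined = demo["bio"] + "|" + demo["screen_name"]
naive = combined.str.contains("nazi", case=False, na=False)

# Filling missing bios first keeps the screen name searchable.
filled = demo["bio"].fillna("") + "|" + demo["screen_name"]
safe = filled.str.contains("nazi", case=False, na=False)

print(naive.tolist())
print(safe.tolist())
```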

## Step 5: Calculate dates of first-observation by country

The "first observation" data indicates the first date on which BuzzFeed News noticed that a particular account was being withheld in a particular country. To be clear: These dates do __not__ necessarily correspond to the dates on which Twitter began withholding that account in that country. Rather, each date is an upper bound: the account was first withheld *on this date or earlier*.
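
The notebook does not show how `first-observations.csv` was built. As a minimal sketch of the idea, assuming a hypothetical log of repeated API fetches (the `fetch_log` rows and user IDs below are invented), the first observation for each account and country is simply the earliest fetch date on which that pair appears:

```python
# Hypothetical fetch log: (user_id, country, fetched_at). Not real data.
fetch_log = [
    ("1001", "DE", "2017-10-03"),
    ("1001", "DE", "2017-10-02"),
    ("1001", "FR", "2017-10-16"),
    ("1002", "TR", "2017-10-04"),
    ("1002", "TR", "2017-10-10"),
]

# Keep the earliest date seen for each (user_id, country) pair.
first_seen = {}
for user_id, country, fetched_at in fetch_log:
    key = (user_id, country)
    # ISO 8601 dates sort correctly as plain strings.
    if key not in first_seen or fetched_at < first_seen[key]:
        first_seen[key] = fetched_at

for (user_id, country), fetched_at in sorted(first_seen.items()):
    print(user_id, country, fetched_at)
```

Under this construction, an account that was already withheld before the first fetch is still dated to that first fetch, which is why each date can only be read as "on this date or earlier".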

In [17]:
first_obs_counts = first_observations\
    .groupby([ "fetched_at", "country" ])\
    .size()\
    .unstack()\
    .fillna(0)\
    .astype(int)

first_obs_counts

fetched_at,BR,DE,FR,GB,IN,RU,TR
2017-10-02,0,399,138,3,1,5,159
2017-10-03,0,99,31,0,0,25,433
2017-10-04,0,5,2,0,1,1,115
2017-10-05,0,1,0,0,2,0,2
2017-10-10,1,53,1,0,0,0,0
2017-10-16,1,71,42,0,7,45,10
2017-10-17,0,0,1,0,0,0,0
2017-10-18,0,0,3,1,0,0,0
2017-10-19,0,1,0,0,0,0,0
2017-10-20,0,10,1,0,0,1,0


And cumulatively:

In [18]:
first_obs_counts.cumsum()

fetched_at,BR,DE,FR,GB,IN,RU,TR
2017-10-02,0,399,138,3,1,5,159
2017-10-03,0,498,169,3,1,30,592
2017-10-04,0,503,171,3,2,31,707
2017-10-05,0,504,171,3,4,31,709
2017-10-10,1,557,172,3,4,31,709
2017-10-16,2,628,214,3,11,76,719
2017-10-17,2,628,215,3,11,76,719
2017-10-18,2,628,218,4,11,76,719
2017-10-19,2,629,218,4,11,76,719
2017-10-20,2,639,219,4,11,77,719


---
