# Qualitative Analysis of Critical and Predicted Critical Reports


### Key Questions
#### What kind of reports are being labeled critical?
#### What kind of reports are generating False Negatives?
#### What kind of reports are generating False Positives?

In [20]:
import numpy as np
import pandas as pd


In [22]:
df = pd.read_csv("critical_titles.csv")
print(df.info())

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4437 entries, 0 to 4436
Data columns (total 4 columns):
Unnamed: 0    4437 non-null int64
title         4437 non-null object
critical      4437 non-null float64
predicted     4437 non-null int64
dtypes: float64(1), int64(2), object(1)
memory usage: 138.7+ KB
None


### What's being labeled critical?


In [23]:
crit_df = df[df["critical"] == 1]
for title in crit_df["title"].sample(30).values:
    print("- " + title)

- Escape from hell: Residents flee Aleppo as UN reports civilian slaughter
- Executions reported as Aleppo battle nears end
- 2 Brothers Arrested in Germany Are Accused of Planning an Attack
- The Latest: Malta TV: Hijackers threaten to blow up plane
- President launches 30 new Navy vessels
- Obama: President without briefings 'flying blind'
- Hundreds protest Aleppo massacre in London, New York
- Militants Lose Control in Aleppo, Seize 2% Only
- The Latest: Russia says 1,000 evacuated from Syria's Aleppo
- Car bomb kills 13 Turkish commandos, army says
- Evacuation of rebel Aleppo enters second day
- Border Patrol officers discover $3.25M worth of weed in shipment of strawberry jam
- Residents board buses in eastern Aleppo as evacuation begins
- Slovakia to Send Humanitarian Aid for Aleppo Residents on Wednesday
- Protesters gather against atrocity as battle for Aleppo ends
- Donald Trump will be BLAMED if the US is hit by terror attack, former CIA chief claims
- Kerry says US is work

Reports labeled critical tend to describe events that are particularly severe or that have global or near-global impact. This makes sense, as these are the events that draw considerable media attention. However, some reports appear to coincide with subsequent anomalies only by chance: not every report labeled critical plausibly caused the anomalously high reporting that followed.

### What's generating False Negatives?

In [24]:
# False negatives: labeled critical but predicted non-critical
fn_df = df[(df["critical"] == 1) & (df["predicted"] == 0)]
print("{} false negatives in total".format(len(fn_df)))
print("")
for title in fn_df["title"].sample(30).values:
    print("- " + title)

844 false negatives in total

- Woman dies after found shot in head on I-55
- Tsunami alert after 7.9-magnitude PNG quake: USGS/PTWC
- Malta flight hijackers leave the plane along with the crew as airport drama ends peacefully
- Germany: 12 Killed After Truck Plows Through Christmas Market
- US, UK coordinate Saudi attacks in Yemen: Analyst
- Thirteen Turkish soldiers killed, 48 wounded in car bomb attack
- Jihadis 'hiding in plain sight' among migrants, says Armed Forces chief: Sir Stuart Peach also warns terrorists are 'popping up all over the world' as propaganda spreads on ...
- Third Alleged Hacker Arrested in Chase Breach
- The Polish driver whose truck was used in Berlin attack was last heard from 4 hours before the massacre
- The Latest: Tanker skids off highway, explodes in Baltimore
- Police raid multiple properties across Melbourne - with police refusing to comment for 'due to ...
- Carrie Fisher suffers cardiac arrest on plane, taken to LA-area hospital
- Palestinian arrest

Again, some examples have no logical causal link to increased reporting. However, many of these reports are clearly serious yet were still predicted non-critical. This might be due to the short length of the titles, or perhaps to how similar many risk reports are to one another.
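One way to probe the title-length hypothesis is to compare word counts across outcome groups. The following is a minimal sketch, assuming the `title`, `critical`, and `predicted` columns shown in the `df.info()` output; the helper name `title_length_by_outcome` and the `n_words` column are hypothetical and not part of the original notebook.

```python
import pandas as pd


def title_length_by_outcome(df):
    """Summarize title word counts for each (critical, predicted) group.

    Hypothetical helper: assumes `title`, `critical`, and `predicted` columns.
    """
    out = df.copy()
    # `critical` is stored as float64 in the CSV; cast so group keys are 0 and 1.
    out["critical"] = out["critical"].astype(int)
    # Word count per title; split() with no arguments splits on any whitespace.
    out["n_words"] = out["title"].str.split().str.len()
    # One row per outcome group, e.g. (1, 0) is the false-negative group.
    grouped = out.groupby(["critical", "predicted"])["n_words"]
    return grouped.agg(["count", "mean", "median"])
```

Calling `title_length_by_outcome(df)` on the loaded data would show whether the false negatives, group `(1, 0)`, have noticeably shorter titles than the true positives, group `(1, 1)`. A similar mean in both groups would suggest that title length alone does not explain the misses.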

### What's generating False Positives?

In [25]:
# False positives: labeled non-critical but predicted critical
fp_df = df[(df["critical"] == 0) & (df["predicted"] == 1)]
print("{} false positives in total".format(len(fp_df)))
print("")
for title in fp_df["title"].sample(30).values:
    print("- " + title)

1619 false positives in total

- Turkish FM says Aleppo evac still underway
- U.N. Security Council calls for Aleppo evacuation monitoring| Reuters
- Flight diverted to New York's JFK airport due to bomb threat
- Germany: Boy held for Christmas market bomb plot
- Islamic State Releases Video Showing Turkish Soldiers Being Burned Alive
- Russia 'complicit in war crimes,' Syrian organizations tell UN
- Police drones, not helicopters, could one day patrol Tucson skies
- Russia: Loss of Iranian nuclear deal would be 'unforgivable'
- Car driver injured in highway collision with state snowplow
- Super Typhoon Nock-ten to threaten lives, property in Philippines on Christmas
- China's northernmost province to hold Ice and Snow Day
- Japan seeks pressure on NKorea for abductions issue
- Syria hands over evidence of mustard gas attack by rebels on civilians to OPCW (VIDEO)
- Typhoon killes four, disrupts holiday celebrations in Philippines
- Turkish officials identify assailant who shot Russian 

These seem to be much more closely tied to geopolitical events than the false negatives. It would be worth exploring how well a model performs when metadata such as report type is included in the features.
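Before adding features such as report type, it may help to record a baseline that later models can be compared against. The sketch below cross-tabulates labels against predictions and derives precision and recall. It assumes the same `critical` and `predicted` columns; the function name `confusion_summary` is hypothetical.

```python
import pandas as pd


def confusion_summary(df):
    """Return a 2x2 confusion table plus precision and recall.

    Hypothetical helper: assumes binary `critical` and `predicted` columns.
    """
    actual = df["critical"].astype(int)
    predicted = df["predicted"].astype(int)
    table = pd.crosstab(actual, predicted,
                        rownames=["critical"], colnames=["predicted"])
    # Make sure both classes appear even if one is missing from the data.
    table = table.reindex(index=[0, 1], columns=[0, 1], fill_value=0)

    tp = table.loc[1, 1]  # critical and predicted critical
    fp = table.loc[0, 1]  # not critical but predicted critical
    fn = table.loc[1, 0]  # critical but predicted not critical

    # Guard against division by zero when a class is never predicted or labeled.
    precision = tp / float(tp + fp) if (tp + fp) else float("nan")
    recall = tp / float(tp + fn) if (tp + fn) else float("nan")
    return table, precision, recall
```

Running `table, precision, recall = confusion_summary(df)` on the loaded data gives a single reference point. Repeating the call after retraining with metadata features would show whether the false positive count falls without sacrificing recall.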