# Team Alpha Drone

---

The API at `api.dronestre.am` provides data on drone strikes in near real time, so it could be used to check whether a President who promises to reduce drone strikes actually does so.

**Your mission:** 
- Explore the drone strike data and describe it
- Do some accompanying research to augment your analysis
- Report back any good summary statistics

**Also, we would like to know:**
 - Is this a good source of data?
 - Why / why not?


One of the roles of data science in organizations is to bring measurement to vague problems. What can be measured in this dataset with certainty? Build your presentation around what can be measured and reported.

Also, if possible, suggest what can be done with this data in terms of actionable outcomes and to what extent.
     
*Keep politics out of the presentation and group work. Let's keep the work and discussion to the data: what is measurable and insights we can draw.  This data isn't meant to prove or disprove anything. It's intended to be an interesting dataset to look at, not a platform for political discourse.*

In [90]:
# First, fetch the data from the API using Python requests
# Read more about Python requests:
# http://docs.python-requests.org/en/master/user/quickstart/

import requests
import pandas as pd
import numpy as np
import re

response = requests.get("http://api.dronestre.am/data")
json_data = response.json()
drone_df = pd.DataFrame(json_data['strike'])

### Get the Head

In [91]:
#Nas
drone_df.head()

Unnamed: 0,_id,articles,bij_link,bij_summary_short,bureau_id,children,civilians,country,date,deaths,...,injuries,lat,location,lon,names,narrative,number,target,town,tweet_id
0,55c79e711cbee48856a30886,[],http://www.thebureauinvestigates.com/2012/03/2...,In the first known US targeted assassination u...,YEM001,,0,Yemen,2002-11-03T00:00:00.000Z,6,...,,15.47467,Marib Province,45.322755,"[Qa'id Salim Sinan al-Harithi, Abu Ahmad al-Hi...",In the first known US targeted assassination u...,1,,,278544689483890688
1,55c79e711cbee48856a30887,[],http://www.thebureauinvestigates.com/2011/08/1...,First known drone strike in Pakistan kills at ...,B1,2.0,2,Pakistan,2004-06-17T00:00:00.000Z,6-8,...,1.0,32.30512565,South Waziristan,69.57624435,"[Nek Mohammad, Fakhar Zaman, Azmat Khan, Marez...",The first known fatal US drone strike inside P...,2,Nek Mohammed,Wana,278544750867533824
2,55c79e711cbee48856a30888,[],http://www.thebureauinvestigates.com/2011/08/1...,"Two killed, including Haitham al-Yemeni an al ...",B2,,,Pakistan,2005-05-08T00:00:00.000Z,2,...,,32.98677989,North Waziristan,70.26082993,"[Haitham al-Yemeni, Samiullah Khan]",2 people killed in a Predator strike which rep...,3,Haitham al-Yemeni,Toorikhel,278544812255367168
3,55c79e721cbee48856a30889,[],http://www.thebureauinvestigates.com/2011/08/1...,"Failed strike against Abu Hamza Rabia (""al Qae...",B3,3.0,3-8,Pakistan,2005-11-05T00:00:00.000Z,8,...,1.0,32.99988191,North Waziristan,70.34082413,[],A failed strike destroyed Abu Hamza Rabia's ho...,4,Abu Hamza Rabia,Mosaki,278544854483628032
4,55c79e721cbee48856a3088a,[],http://www.thebureauinvestigates.com/2011/08/1...,"Syrian Abu Hamza Rabia, the senior al Qaeda op...",B4,2.0,2,Pakistan,2005-12-01T00:00:00.000Z,5,...,,33.00866349,North Waziristan,70.04196167,"[Abu Hamza Rabia, Suleiman al-Moghrabi, Amer A...","5 people were killed, including 2 children, wh...",5,Abu Hamza Rabia,Haisori,278544895789133825


### Drop unnecessary columns

In [92]:
drone_df.drop(['bij_link', 'articles', 'bij_summary_short', 'bureau_id', 'names', 'target', 'town', 'tweet_id', 'number', 'narrative'], axis=1, inplace=True, errors='ignore')

In [93]:
drone_df.columns

Index([u'_id', u'children', u'civilians', u'country', u'date', u'deaths',
       u'deaths_max', u'deaths_min', u'injuries', u'lat', u'location', u'lon'],
      dtype='object')

In [94]:
drone_df.head()

Unnamed: 0,_id,children,civilians,country,date,deaths,deaths_max,deaths_min,injuries,lat,location,lon
0,55c79e711cbee48856a30886,,0,Yemen,2002-11-03T00:00:00.000Z,6,6,6,,15.47467,Marib Province,45.322755
1,55c79e711cbee48856a30887,2.0,2,Pakistan,2004-06-17T00:00:00.000Z,6-8,8,6,1.0,32.30512565,South Waziristan,69.57624435
2,55c79e711cbee48856a30888,,,Pakistan,2005-05-08T00:00:00.000Z,2,2,2,,32.98677989,North Waziristan,70.26082993
3,55c79e721cbee48856a30889,3.0,3-8,Pakistan,2005-11-05T00:00:00.000Z,8,8,8,1.0,32.99988191,North Waziristan,70.34082413
4,55c79e721cbee48856a3088a,2.0,2,Pakistan,2005-12-01T00:00:00.000Z,5,5,5,,33.00866349,North Waziristan,70.04196167


### Inspect the death columns for issues

In [95]:
drone_df['deaths_max'].unique()

array([u'6', u'8', u'2', u'5', u'22', u'83', u'4', u'34', u'10', u'0',
       u'15', u'13', u'20', u'1', u'12', u'25', u'7', u'23', u'3', u'21',
       u'9', u'11', u'16', u'35', u'31', u'14', u'40', u'17', u'18', u'27',
       u'32', u'42', u'26', u'', u'50', u'?', u'24', u'30', u'38', u'200',
       u'28', u'39'], dtype=object)

In [96]:
drone_df['deaths_min'].unique()

array([u'6', u'2', u'8', u'5', u'13', u'81', u'3', u'20', u'0', u'12',
       u'1', u'4', u'17', u'10', u'21', u'7', u'15', u'11', u'26', u'30',
       u'14', u'9', u'25', u'67', u'16', u'35', u'23', u'32', u'18', u'',
       u'?', u'24', u'29', u'38', u'150', u'28', u'39'], dtype=object)

In [97]:
drone_df['deaths'].unique()

array([u'6', u'6-8', u'2', u'8', u'5', u'13-22', u'81-83', u'3-4',
       u'20-34', u'5-10', u'Unknown', u'12-15', u'8-13', u'12-20', u'1',
       u'6-12', u'13-25', u'8-12', u'0', u'4-5', u'4-10', u'5-7', u'5-12',
       u'17-23', u'10-15', u'4-7', u'4-8', u'3', u'21', u'4-9', u'5-9',
       u'7-11', u'15-20', u'4-25', u'4-12', u'11-16', u'11-13', u'4-6',
       u'2-3', u'6-7', u'2-4', u'3-5', u'7-15', u'26-35', u'30-31',
       u'7-12', u'14-25', u'2-5', u'7-8', u'12-14', u'13', u'4', u'0-8',
       u'6-10', u'9-25', u'8-9', u'25-40', u'67-83', u'0-5', u'13-17',
       u'16-18', u'8-10', u'35-40', u'5-6', u'2-12', u'17-21', u'5-8',
       u'10-12', u'20-27', u'3-6', u'3-10', u'12-16', u'14-20', u'3-7',
       u'15-18', u'20', u'9-15', u'23-23', u'9', u'7', u'5-13', u'10-11',
       u'7-9', u'10', u'13-14', u'5-15', u'13-15', u'16-17', u'7-10',
       u'10-14', u'14', u'6-9', u'11-12', u'16', u'9-10', u'8-14',
       u'11-14', u'0-4', u'11-15', u'32', u'18-22', u'26-42', u'25-26',
   

### 'deaths' has many ambiguous values, so take deaths as the midpoint between deaths_min and deaths_max, then drop both

In [98]:
death = (drone_df['deaths_max'] + drone_df['deaths_min']) /2

TypeError: unsupported operand type(s) for /: 'unicode' and 'int'
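
The `TypeError` above arises because `deaths_max` and `deaths_min` hold strings: `+` concatenates them (`'6' + '6'` gives `'66'`, which matches the `66` and `86` values that show up in `deaths` further down), and a string cannot be divided by 2. A minimal sketch of a numeric midpoint follows. The `toy` frame and the `deaths_mid` column name are made up for illustration; the real `drone_df` is not touched here.

```python
import pandas as pd

# Toy frame standing in for drone_df; the strings mimic values seen in the
# unique() outputs above ("" and "?" mark unknown counts).
toy = pd.DataFrame({
    "deaths_min": ["6", "6", "", "?"],
    "deaths_max": ["6", "8", "", "?"],
})

# errors="coerce" turns anything that is not a number into NaN
dmin = pd.to_numeric(toy["deaths_min"], errors="coerce")
dmax = pd.to_numeric(toy["deaths_max"], errors="coerce")

# Midpoint of the reported range; NaN wherever a bound is unknown
toy["deaths_mid"] = (dmin + dmax) / 2
print(toy)
```

Keeping unknown counts as `NaN` rather than 0 means they are skipped by `mean()` and `sum()` instead of pulling the averages down.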

In [105]:
drone_df['deaths'] = death
drone_df.drop( ['deaths_max','deaths_min' ],axis=1, inplace = True, errors='ignore')

### Check the labels

In [106]:
drone_df.head()

Unnamed: 0,_id,children,civilians,country,date,deaths,injuries,lat,location,lon
0,55c79e711cbee48856a30886,,0,Yemen,2002-11-03T00:00:00.000Z,66,,15.47467,Marib Province,45.322755
1,55c79e711cbee48856a30887,2.0,2,Pakistan,2004-06-17T00:00:00.000Z,86,1.0,32.30512565,South Waziristan,69.57624435
2,55c79e711cbee48856a30888,,0,Pakistan,2005-05-08T00:00:00.000Z,22,,32.98677989,North Waziristan,70.26082993
3,55c79e721cbee48856a30889,3.0,0,Pakistan,2005-11-05T00:00:00.000Z,88,1.0,32.99988191,North Waziristan,70.34082413
4,55c79e721cbee48856a3088a,2.0,2,Pakistan,2005-12-01T00:00:00.000Z,55,,33.00866349,North Waziristan,70.04196167


### Inspect the children and civilians columns

In [107]:
print drone_df['children'].unique()
print drone_df['civilians'].unique()

array([u'', u'2', u'3', u'1', u'5', u'69', u'0-1', u'Possibly', u'Yes',
       u'4', u'8', u'4-Mar', u'0-3', u'At least 2', u'10', u'6',
       u'At least 1', u'Yes, according to one source.', u'0', u'0-2'], dtype=object)

### Make a loop to try and clean a column

In [131]:
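cleancolumns = ['children','civilians','lat','lon']
for column in cleancolumns:
    # Treat blank strings as zero before parsing
    blankmask = drone_df[column] == ""
    drone_df.loc[blankmask,column] = 0
    newnumber = []
    for string in drone_df[column]:
        try:
            newnumber.append(float(string))
        except ValueError:
            # Free-text answers such as "Possibly" or "Yes, according to one source." count as 1
            if string == "Possibly" or "Yes" in string or "yes" in string or "according" in string or "mar" in string:
                newnumber.append(1.0)
            elif "At least" in string:
                # "At least 2" -> 2.0: slice off the prefix only, then convert
                newnumber.append(float(string[len("At least "):]))
            else:
                # Ranges such as "0-1" and anything else unparseable fall back to 0
                newnumber.append(0.0)
    # Write the cleaned values back to the column
    drone_df[column] = newnumber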

       


TypeError: invalid type comparison
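
One plausible cause of the `TypeError: invalid type comparison` above (an assumption, since the cell numbers suggest the loop was re-run) is that the column had already been converted to floats, so comparing it with `""` failed. A per-value helper avoids the comparison entirely. The sketch below is pure Python; `parse_count` is a hypothetical name, and its rules (blank means unknown, a range maps to its lower bound) are assumptions that differ from the loop above, which maps both to 0.

```python
import re

def parse_count(value):
    """Return a float lower bound for a messy count value, or None if unknown."""
    if value is None:
        return None
    text = str(value).strip()
    if text == "":
        return None                      # blank: not reported, rather than zero
    if "mar" in text.lower():
        return None                      # e.g. "4-Mar": looks like a mangled range
    match = re.search(r"\d+(?:\.\d+)?", text)
    if match:
        return float(match.group())      # "3-8" -> 3.0, "At least 2" -> 2.0
    if re.match(r"(?i)(yes|possibly)", text):
        return 1.0                       # assumption: at least one, as in the loop above
    return None

for raw in ["", "2", "0-1", "At least 2", "Yes, according to one source.", "4-Mar"]:
    print(repr(raw), "->", parse_count(raw))
```

The helper could then be applied with `drone_df[column].map(parse_count)`, which works whether the column still holds strings or has already been converted to numbers.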

### Get the Shape

In [114]:
drone_df.shape

(647, 10)

In [115]:
drone_df.head(5)

Unnamed: 0,_id,children,civilians,country,date,deaths,injuries,lat,location,lon
0,55c79e711cbee48856a30886,0.0,0.0,Yemen,2002-11-03T00:00:00.000Z,66,,15.47467,Marib Province,45.322755
1,55c79e711cbee48856a30887,2.0,2.0,Pakistan,2004-06-17T00:00:00.000Z,86,1.0,32.30512565,South Waziristan,69.57624435
2,55c79e711cbee48856a30888,0.0,0.0,Pakistan,2005-05-08T00:00:00.000Z,22,,32.98677989,North Waziristan,70.26082993
3,55c79e721cbee48856a30889,0.0,0.0,Pakistan,2005-11-05T00:00:00.000Z,88,1.0,32.99988191,North Waziristan,70.34082413
4,55c79e721cbee48856a3088a,2.0,2.0,Pakistan,2005-12-01T00:00:00.000Z,55,,33.00866349,North Waziristan,70.04196167


array([u'45.322755', u'69.57624435', u'70.26082993', u'70.34082413',
       u'70.04196167', u'70.05912781', u'71.4969635', u'71.49215698',
       u'69.55581665', u'70.28743744', u'70.07286072', u'70.48690796',
       u'70.24108887', u'69.40544128', u'69.45762634', u'69.84283447',
       u'69.43222046', u'69.32510376', u'69.60062027', u'69.33147601',
       u'69.52079773', u'70.06084442', u'69.72644806', u'69.53933716',
       u'70.0756073', u'69.28459167', u'70.35301208', u'70.434707',
       u'70.1653862', u'69.85279083', u'69.60783005', u'70.26906967',
       u'69.50054169', u'70.06736755', u'69.86875534', u'70.06530762',
       u'70.04608154', u'70.15396057', u'70.08590698', u'69.49728012',
       u'69.88197327', u'70.26224613', u'69.86515045', u'70.29361725',
       u'70.00282288', u'70.28022766', u'70.31747818', u'70.91108322',
       u'69.78326797', u'70.01981735', u'69.34604645', u'70.2725029',
       u'69.88162994', u'69.56783295', u'69.61521149', u'69.99217987',
       u'69.85

### Convert the number columns to a numeric dtype

In [122]:
columnstonumeric = ['children','deaths','civilians','lat','lon']

for i in columnstonumeric:
    # pd.to_numeric returns a new Series, so assign it back to the column
    drone_df[i] = pd.to_numeric(drone_df[i])

ValueError: Unable to parse string "?0" at position 313
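
The `ValueError` above stops at the first string that is not a number. The value `"?0"` looks like the result of string concatenation in the `deaths` column, though that is an inference from the outputs. Before coercing anything, it can help to list exactly which values fail to parse. A minimal sketch on a made-up Series:

```python
import pandas as pd

# Toy Series standing in for one column of drone_df
s = pd.Series(["6", "?0", "", "22"])

# Coerce unparseable entries to NaN instead of raising
parsed = pd.to_numeric(s, errors="coerce")

# The original strings that did not survive the conversion
bad_values = s[parsed.isna()].unique()
print(bad_values)
```

Reviewing `bad_values` for each column shows how much information would be lost by coercion before committing to it.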

### Get the Info and Describe

In [120]:
drone_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 647 entries, 0 to 646
Data columns (total 10 columns):
_id          647 non-null object
children     647 non-null float64
civilians    647 non-null float64
country      647 non-null object
date         647 non-null object
deaths       647 non-null object
injuries     647 non-null object
lat          647 non-null object
location     647 non-null object
lon          647 non-null object
dtypes: float64(2), object(8)
memory usage: 50.6+ KB
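
Note that `info()` reports 647 non-null values for every column because missing entries arrive as empty strings, not as `NaN`. For judging whether this is a good source of data, the share of blank strings per column is a more honest measure of completeness. A small sketch on a made-up frame (the values are illustrative, not taken from the API):

```python
import pandas as pd

# Toy frame standing in for the raw drone_df, before any cleaning
toy = pd.DataFrame({
    "children": ["", "2", "", "3"],
    "country": ["Yemen", "Pakistan", "Pakistan", "Pakistan"],
})

# Fraction of entries in each column that are empty strings
blank_share = (toy == "").mean()
print(blank_share)
```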


### Change the objects into numbers

In [103]:
drone_df.describe()

Unnamed: 0,_id,children,civilians,country,date,deaths,deaths_max,deaths_min,injuries,lat,location,lon
count,647,647.0,647.0,647,647,647,647,647,647.0,647.0,647,647.0
unique,647,20.0,16.0,4,523,149,42,37,51.0,311.0,39,315.0
top,55c79e721cbee48856a30952,,0.0,Pakistan,2011-10-14T00:00:00.000Z,4,4,4,,,North Waziristan,
freq,1,542.0,581.0,430,5,48,97,130,269.0,29.0,303,29.0


### Check the datatype

In [104]:
drone_df.dtypes

_id           object
children      object
civilians     object
country       object
date          object
deaths        object
deaths_max    object
deaths_min    object
injuries      object
lat           object
location      object
lon           object
dtype: object

### Convert any improper strings that were supposed to be numeric into numbers

### Standardize Data

In [None]:
#James

In [7]:
drone_df.describe()

Unnamed: 0,number
count,647.0
mean,324.0
std,186.917094
min,1.0
25%,162.5
50%,324.0
75%,485.5
max,647.0


### Replace missing numbers with NaN

In [None]:
#James
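
One possible way to approach this step, sketched on a made-up frame: map the placeholder strings seen in the outputs above (`""`, `"?"`, `"Unknown"`) to `NaN` and then convert to numbers. The list of placeholders is an assumption and may not be complete for the full dataset.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for drone_df
toy = pd.DataFrame({
    "injuries": ["", "1", "?"],
    "deaths": ["6", "Unknown", "2"],
})

# Replace the known placeholder strings with NaN
cleaned = toy.replace(["", "?", "Unknown"], np.nan)

# With the placeholders gone, the columns convert cleanly
numeric = cleaned.apply(pd.to_numeric)
print(numeric)
```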

### Visualize some plots for the Data

### Turn the Date into a proper date
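
The `date` column holds ISO 8601 strings such as `2002-11-03T00:00:00.000Z`. A minimal sketch of parsing them with `pd.to_datetime` and counting strikes per year, using only the five dates shown in the `head()` output above:

```python
import pandas as pd

# The five dates from the head() output, standing in for drone_df['date']
dates = pd.Series([
    "2002-11-03T00:00:00.000Z",
    "2004-06-17T00:00:00.000Z",
    "2005-05-08T00:00:00.000Z",
    "2005-11-05T00:00:00.000Z",
    "2005-12-01T00:00:00.000Z",
])

# utc=True keeps the trailing "Z" (UTC) information
parsed = pd.to_datetime(dates, utc=True)

# Number of strikes per calendar year
years = parsed.dt.year
per_year = years.value_counts().sort_index()
print(per_year)
```

Strikes per year is one of the few quantities here that does not depend on the disputed casualty figures.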

### Get the covariance and correlation of the data
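
Once the columns are numeric, `DataFrame.cov()` and `DataFrame.corr()` give the pairwise covariance and Pearson correlation. The sketch below uses invented numbers purely to show the calls; it says nothing about the real dataset.

```python
import pandas as pd

# Invented values for illustration only, not real strike data
toy = pd.DataFrame({
    "deaths": [6.0, 7.0, 2.0, 8.0, 5.0],
    "civilians": [0.0, 2.0, 0.0, 3.0, 2.0],
})

cov = toy.cov()    # sample covariance matrix (divides by n - 1)
corr = toy.corr()  # Pearson correlation matrix
print(cov)
print(corr)
```

Both methods skip `NaN` pairwise, so rows with unknown counts do not need to be dropped first.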