# Drone strike dataset

**Challenge dataset**

---

This dataset, located in the `drone_strikes` folder, has a variety of information about drone strikes.

I don't describe the columns; it is up to you as a group to infer what they mean.
    
This dataset is challenging for a variety of reasons. It has not been cleaned up, so there are missing cells. Relationships between variables are more complicated. Many of the variables are stored as strings, and any you are interested in may need to be "recoded" as binary 1 vs. 0 in a new column.
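As a rough illustration of the recoding idea, here is a minimal sketch on made-up data. The column name `approval` and its values are hypothetical and are not taken from the drone strike file; adapt the comparison to whatever strings you actually find.

```python
import pandas as pd

# Toy data: the column name and values are invented for illustration.
toy = pd.DataFrame({"approval": ["Yes", "No", None, "yes ", "Possibly"]})

# Normalize case and whitespace, then flag rows equal to "yes" as 1, all others 0.
cleaned = toy["approval"].str.strip().str.lower()
toy["approval_yes"] = (cleaned == "yes").astype(int)

# A second indicator keeps missing cells distinguishable from a real "no".
toy["approval_missing"] = toy["approval"].isnull().astype(int)

print(toy)
```

Keeping a separate "missing" indicator is one possible design choice; it avoids silently treating an empty cell as a 0.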

**If you choose this dataset, it is as much a data cleaning lab as an EDA lab. Buyer beware.**

---

### Requirements

As a group you should:

1. Load and clean the data with pandas. You will probably want to remove the variables you are not interested in first so that cleaning is easier.
2. Identify the variables and subsets of the data you are interested in as a group.
3. Describe the data and investigate any outliers for those variables.
4. Explore relationships between variables.
5. Visualize at least three variables of your choice with appropriate visualizations. They should be understandable.
6. Visualize subsets of the variables you chose, subsetted conditional on some other variable. For example, number of civilians killed by area.
7. Write a brief report on at least 5 things you found interesting about the data or, if it doesn't interest you at all, things you found out and why they are boring.
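The "subsetted conditional on some other variable" requirement can be approached with a `groupby`. The sketch below uses a tiny invented frame; the area labels and counts are placeholders, not real values from the dataset.

```python
import pandas as pd

# Toy stand-in for the real data; values are invented for illustration.
toy = pd.DataFrame({
    "Area": ["A", "A", "B", "B", "C"],
    "Minimum civilians killed": [0.0, 2.0, None, 5.0, 1.0],
})

# For each area: how many strikes have a recorded value, and their total.
# Missing cells are skipped by both count and sum.
by_area = (
    toy.groupby("Area")["Minimum civilians killed"]
       .agg(["count", "sum"])
       .sort_values("sum", ascending=False)
)

print(by_area)
```

Reporting the count next to the sum shows how many rows each total is based on, which matters when many cells are missing. A call such as `by_area["sum"].plot(kind="bar")` would turn the same table into a bar chart.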

In [1]:
import numpy as np
import scipy.stats as stats
import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd

drone_file = '/Users/alex/Desktop/DSI-SF-2/datasets/drone_strikes/drones.csv'
drones = pd.read_csv(drone_file)

In [2]:
drones.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 381 entries, 0 to 380
Data columns (total 27 columns):
Strike ID                   381 non-null int64
Bureau ID                   381 non-null object
Date                        381 non-null object
Time                        77 non-null object
Location                    381 non-null object
Area                        381 non-null object
Target                      65 non-null object
Target Group                124 non-null object
Westerners involved         2 non-null object
Minimum Total Killed        381 non-null int64
Mean Total Killed           283 non-null float64
Maximum Total Killed        381 non-null int64
Number of deaths            373 non-null object
AQ/TB Killed                41 non-null object
Minimum civilians killed    173 non-null float64
Maximum civilians killed    173 non-null float64
Civilians Killed            267 non-null object
Min injured                 305 non-null float64
Max injured                 305 non

In [3]:
# Candidate columns to drop: Strike ID, Bureau ID, Related ID, Time, Notes,
# Westerners involved, Mean Total Killed, Number of deaths

In [4]:
cols_to_drop = ['Strike ID', 'Bureau ID', 'Related ID', 'Time', 'Notes',
                'Mean Total Killed', 'Number of deaths']
drones.drop(cols_to_drop, axis=1, inplace=True)

In [5]:
drones.isnull().sum()

Date                          0
Location                      0
Area                          0
Target                      316
Target Group                257
Westerners involved         379
Minimum Total Killed          0
Maximum Total Killed          0
AQ/TB Killed                340
Minimum civilians killed    208
Maximum civilians killed    208
Civilians Killed            114
Min injured                  76
Max injured                  76
Injured                      49
Minimum children killed     312
Max children killed         313
Children Killed             281
Pakistani approval          362
Short Summary                 1
dtype: int64
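Raw null counts are easier to compare as a share of rows. The following is a minimal sketch on an invented frame; the 0.5 cutoff is an arbitrary assumption for illustration, not a recommendation for this dataset.

```python
import numpy as np
import pandas as pd

# Toy frame with invented values; only the pattern of missing cells matters.
toy = pd.DataFrame({
    "Date": ["d1", "d2", "d3", "d4"],
    "Target": [np.nan, np.nan, np.nan, "x"],
    "Injured": ["1", np.nan, "2", "3"],
})

# Fraction of missing cells per column (the mean of a boolean mask).
missing_share = toy.isnull().mean()

# Drop columns whose missing share exceeds an arbitrary cutoff.
threshold = 0.5
sparse_cols = missing_share[missing_share > threshold].index.tolist()
trimmed = toy.drop(columns=sparse_cols)

print(missing_share)
print(trimmed.columns.tolist())
```

Whether a mostly empty column should be dropped depends on what the empty cells mean; a blank may well encode "no" or "zero" rather than "unknown", so inspect a column before discarding it.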

In [7]:
pd.set_option("display.max_rows", 400)
pd.set_option("display.max_columns", 30)

In [49]:
def hasNumbers(inputString):
    return any(char.isdigit() for char in inputString)

# Rows with a 'Civilians Killed' entry but no numeric minimum or maximum
civ_mask = (~drones['Civilians Killed'].isnull() &
            (drones['Minimum civilians killed'].isnull() | drones['Maximum civilians killed'].isnull()))

drones[civ_mask]

# print(drones.loc[civ_mask, ['Minimum civilians killed', 'Maximum civilians killed', 'Civilians Killed']])

# pd.to_numeric(drones['Minimum Total Killed'],errors='coerce')
# pd.to_numeric(drones['Maximum Total Killed'],errors='coerce')

drones.loc[civ_mask, 'Civilians Killed'].dtype

dtype('O')
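Because the column is of object dtype, any numbers in it have to be pulled out of strings. Here is a minimal sketch of a vectorized counterpart to `hasNumbers`, followed by one way to extract a number. The sample strings are invented; the real `Civilians Killed` entries may be formatted differently.

```python
import pandas as pd

# Made-up strings for illustration only.
raw = pd.Series(["3", "Possibly", "2-5", None, "At least 4"])

# Vectorized counterpart of hasNumbers: True where the string contains a digit.
has_digit = raw.str.contains(r"\d", na=False)

# Extract the first run of digits and convert it; non-matches become NaN.
first_number = pd.to_numeric(raw.str.extract(r"(\d+)", expand=False),
                             errors="coerce")

print(pd.DataFrame({"raw": raw, "has_digit": has_digit,
                    "first_number": first_number}))
```

Taking the first number turns a range such as "2-5" into its lower bound. That is a judgment call, so state the rule you chose in your report.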

In [52]:
frame = pd.DataFrame(np.random.randn(4, 3), columns=list('bde'), index=['Utah', 'Ohio', 'Texas', 'Oregon'])

frame.apply(np.std, axis=1)

Utah      0.506181
Ohio      0.221054
Texas     1.054653
Oregon    0.707606
dtype: float64
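One thing to keep in mind with row-wise standard deviations: NumPy's `np.std` defaults to the population formula (`ddof=0`), while `DataFrame.std` defaults to the sample formula (`ddof=1`), so the two can give different numbers for the same rows. Here is a minimal sketch on a seeded random frame (the seed and the `rng` name are my additions), calling NumPy on the raw array to keep the comparison unambiguous.

```python
import numpy as np
import pandas as pd

# Seeded generator so the comparison is reproducible.
rng = np.random.default_rng(0)
frame = pd.DataFrame(rng.standard_normal((4, 3)), columns=list("bde"),
                     index=["Utah", "Ohio", "Texas", "Oregon"])

# NumPy on the underlying array: population standard deviation (ddof=0).
population_std = np.std(frame.to_numpy(), axis=1)

# pandas method: sample standard deviation (ddof=1) by default.
sample_std = frame.std(axis=1)

print(pd.DataFrame({"ddof=0": population_std, "ddof=1": sample_std}))
```

With only three values per row the two differ by a factor of sqrt(3/2), which is large enough to matter when comparing spreads.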