# Today: EDA GROUPS!

Choose a team, and then spend some time looking at data.  We want you to explore the data using the techniques we have learned so far, including:

- Problem statement / Hypothesis 
- Grouping / subsetting / segmentation
- Summary statistics
    - Histograms
    - Plotting
- Slicing
- Cleaning data
    - assessing proper types
    - expected values
    - object conversion
   

At the end of our exploratory analysis, each group will give a 10-minute presentation on its findings to the rest of the class.  Make sure all details relate to the goal you were asked about.  If there is no stated goal, frame the question in terms of what can be measured, through the lens of business goals / values.
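As a warm-up, here is a minimal sketch of how the techniques in the checklist above fit together in pandas.  The tiny DataFrame and its column names (`group`, `value`) are invented for illustration; swap in your own team's data.

```python
import pandas as pd

# Toy data invented for illustration; replace with your team's DataFrame.
df = pd.DataFrame({
    "group": ["a", "a", "b", "b", "b"],
    "value": ["10", "12", "7", "n/a", "9"],  # numbers stored as strings
})

# Assess proper types: "value" loads as text, not as a number.
print(df.dtypes)

# Object conversion: coerce to numeric; unparseable entries become NaN.
df["value"] = pd.to_numeric(df["value"], errors="coerce")

# Expected values: count how many entries failed to convert.
n_missing = df["value"].isna().sum()

# Slicing / subsetting: keep only the rows with a usable value.
clean = df[df["value"].notna()]

# Grouping + summary statistics for each segment.
summary = clean.groupby("group")["value"].agg(["count", "mean", "max"])
print(summary)
```

From here, `clean["value"].hist()` is one quick way to look at a distribution once the column is numeric.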


## Group Roles

In the real world, you will be expected to work with other human beings.  Everyone works differently but it's important to **work together**.  Project management is an entirely different skillset that aims to solve the problem of parallel task performance, with milestones in mind.  

Some guidelines to consider:
- Select someone to organize your group's priorities
- Select someone to present (someone who needs practice)

Then:
1. Look at the data "individually"
1. Look at the data "together"
1. Discuss your ideas
1. Come up with a plan to tackle the problem
1. Distribute the tasks evenly
1. **WORK TOGETHER!**

Avoid:
1. Excessive discussion (be mindful of the time you need to think vs do)
1. Open-ended plans (make concrete plans, and contingencies)
1. Working in silos alone

**If you don't come up with anything, you will still present.** Sometimes arguments come up, but this is OK.  On the job, it's rare that you will work alone.

*Everyone comes with unique strengths and weaknesses.  Data science is a team sport.  Support each other with your strengths.  Find ways to complement each other's experience.*



In [1]:
import numpy as np
import pandas as pd
import seaborn as sns

%matplotlib inline

## Team Alpha Drone

Since the API from `api.dronestre.am` provides data on drone strikes in near real time, this **might** be useful to hold President Obama accountable to his promise of reducing drone strikes.  Your mission is to explore the drone strike data, do any accompanying research to support your analysis, and report back any good summary statistics.

Also, we would like to know:
 - Is this a good source of data?
     - Why / why not?
     
*Politics aside -- let's keep it to what is measurable in our dataset.  This isn't meant to prove or disprove anything.  It's a **fun** dataset to look at, more so than a motivator of political discourse.*

In [2]:
# First, fetch the data from the API using Python requests.
# Read more about Python requests:
# http://docs.python-requests.org/en/master/user/quickstart/

import requests

response = requests.get("http://api.dronestre.am/data", timeout=30)
response.raise_for_status()  # fail early on an HTTP error instead of parsing an error page
json_data = response.json()
drone_df = pd.DataFrame(json_data['strike'])  # strike records live under the "strike" key

In [3]:
drone_df.head(1)

Unnamed: 0,_id,articles,bij_link,bij_summary_short,bureau_id,children,civilians,country,date,deaths,...,injuries,lat,location,lon,names,narrative,number,target,town,tweet_id
0,55c79e711cbee48856a30886,[],http://www.thebureauinvestigates.com/2012/03/2...,In the first known US targeted assassination u...,YEM001,,0,Yemen,2002-11-03T00:00:00.000Z,6,...,,15.47467,Marib Province,45.322755,"[Qa'id Salim Sinan al-Harithi, Abu Ahmad al-Hi...",In the first known US targeted assassination u...,1,,,278544689483890688
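
One possible first pass at "is this a good source of data?" is to check types and missing values before computing any statistics.  The sketch below builds a small stand-in frame so it runs without the API: the first row mirrors the record previewed above, while the other two rows are made up purely to show messy values.  The assumption that `deaths` may arrive as text is ours, so check `drone_df.dtypes` on the real data.

```python
import pandas as pd

# First row mirrors the record previewed above; the other two rows are
# HYPOTHETICAL and only illustrate messy values. They are not real strikes.
sample = pd.DataFrame({
    "country": ["Yemen", "Yemen", "Pakistan"],
    "date": [
        "2002-11-03T00:00:00.000Z",
        "2010-05-01T00:00:00.000Z",
        "2010-09-12T00:00:00.000Z",
    ],
    "deaths": ["6", "Unknown", "4"],
})

# Convert ISO 8601 strings into timezone-aware timestamps.
sample["date"] = pd.to_datetime(sample["date"], utc=True)

# Coerce counts to numbers; anything unparseable becomes NaN.
sample["deaths"] = pd.to_numeric(sample["deaths"], errors="coerce")

# Share of missing values per column: a rough data-quality signal.
missing_share = sample.isna().mean()
print(missing_share)

# Number of strikes recorded per calendar year.
strikes_per_year = sample.groupby(sample["date"].dt.year).size()
print(strikes_per_year)
```

The same steps should carry over to `drone_df`, though the real columns may need different handling than this toy frame suggests.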


## Team Popcorn

<img src="https://media.giphy.com/media/uvMEhrg0lOPRu/giphy.gif">

You're a force to be reckoned with when you `read_csv` into your `movie_df` dataframe.  You are team "Popcorn".  As a big movie studio, we need to report on metrics that will help us:

 - Sell movie ideas to potential investors
 - Maximize product placement and / or sponsorships
 
As the lead data scientist, I will give you some direction / starting points:

 - Which movies remained in the top 10 the longest?
 - Which movies were good investments?
 - Any interesting trends throughout the year?
 - Google anything interesting about flagship movies in terms of partnerships and how those deals could be relevant to consider in our own research.
 
 Bonus:
 - Do any holidays impact sales performance or position?  How could we leverage this?
 - What could we look at outside our dataset that may help project good investments?

_[There's a data dictionary available!](http://www.amstat.org/publications/jse/v17n1/datasets.mclaren.html)_

Keep in mind the main points when presenting your findings!  It's interesting to share details and side points, but make sure they support and relate to pitching movies to investors and maximizing our partnership goals!

In [5]:
movie_df = pd.read_csv("../../../../datasets/movie_weekend/movie_weekend.csv")
movie_df.head(5)

Unnamed: 0,NUMBER,MOVIE,WEEK_NUM,WEEKEND_PER_THEATER,WEEKEND_DATE
0,1.0,A Beautiful Mind,1.0,701.0,12/21/01
1,1.0,A Beautiful Mind,2.0,14820.0,12/28/01
2,1.0,A Beautiful Mind,3.0,8940.0,1/4/02
3,1.0,A Beautiful Mind,4.0,6850.0,1/11/02
4,1.0,A Beautiful Mind,5.0,5280.0,1/18/02
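
To get started on the "how long did a movie last" and "trends throughout the year" questions, here is a rough sketch using only the five rows previewed above.  The summary column names (`weeks_tracked`, `peak_per_theater`, `first_weekend`) are our own, and the number of weeks tracked is only a proxy for staying power, so check the data dictionary before reading too much into it.

```python
import pandas as pd

# The five rows shown in the preview above.
movie_sample = pd.DataFrame({
    "MOVIE": ["A Beautiful Mind"] * 5,
    "WEEK_NUM": [1.0, 2.0, 3.0, 4.0, 5.0],
    "WEEKEND_PER_THEATER": [701.0, 14820.0, 8940.0, 6850.0, 5280.0],
    "WEEKEND_DATE": ["12/21/01", "12/28/01", "1/4/02", "1/11/02", "1/18/02"],
})

# Parse month/day/two-digit-year strings into real dates.
movie_sample["WEEKEND_DATE"] = pd.to_datetime(
    movie_sample["WEEKEND_DATE"], format="%m/%d/%y"
)

# One row per movie: how long it was tracked and how well it peaked.
per_movie = movie_sample.groupby("MOVIE").agg(
    weeks_tracked=("WEEK_NUM", "max"),
    peak_per_theater=("WEEKEND_PER_THEATER", "max"),
    first_weekend=("WEEKEND_DATE", "min"),
)
print(per_movie)

# Average per-theater take by calendar month, to hint at seasonality.
by_month = movie_sample.groupby(
    movie_sample["WEEKEND_DATE"].dt.month
)["WEEKEND_PER_THEATER"].mean()
print(by_month)
```

Running the same aggregation on the full `movie_df` and sorting by `weeks_tracked` would be one way to rank movies.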


## Team Titanic

Known for its honesty, the Titanic dataset is a very common dataset for classification, usually predicting fatalities.  For our challenge, let's focus on the latent characteristics instead.

For the record, this is how much we know:

![](http://www.glencoe.com/sec/math/studytools/books/0-07-829631-5/images/IQ02-003W-8228662.gif)

Certainly there is a better story to tell.  We are especially interested in any variables that express a high level of variance, in addition to anything beyond "fewer women died".

**Bonus:  Feature Engineering**
 - Can you pull out titles (e.g., Mr., Mrs., Miss) from the feature "Name" and assign them to a new variable? We think there could be something interesting to look at in aggregate based on titles!

In [6]:
titanic_df = pd.read_csv("../../../../datasets/titanic/titanic.csv")
titanic_df.head(3)

Unnamed: 0,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
0,1,0,3,"Braund, Mr. Owen Harris",male,22.0,1,0,A/5 21171,7.25,,S
1,2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Th...",female,38.0,1,0,PC 17599,71.2833,C85,C
2,3,1,3,"Heikkinen, Miss. Laina",female,26.0,0,0,STON/O2. 3101282,7.925,,S
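
For the feature engineering bonus, one possible approach is a regular expression that grabs whatever sits between the comma and the first period in `Name`.  This is a sketch on the three rows previewed above (the second name is truncated in the preview and is kept as shown).  The `Title` column name is our choice, and the pattern assumes every name follows the "Surname, Title. Given names" layout, which you should verify on the full dataset.

```python
import pandas as pd

# The three rows shown in the preview above (second name is truncated there).
titanic_sample = pd.DataFrame({
    "Survived": [0, 1, 1],
    "Name": [
        "Braund, Mr. Owen Harris",
        "Cumings, Mrs. John Bradley (Florence Briggs Th...",
        "Heikkinen, Miss. Laina",
    ],
})

# Capture the text after the comma, up to (not including) the first period.
titanic_sample["Title"] = (
    titanic_sample["Name"]
    .str.extract(r",\s*([^.]+)\.", expand=False)
    .str.strip()
)

# Aggregate by title: passenger count and survival rate.
by_title = titanic_sample.groupby("Title")["Survived"].agg(["count", "mean"])
print(by_title)
```

On the full data, `titanic_df["Title"].value_counts()` is a quick check for rare or unexpected titles that might need to be merged.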


## Team Silk Road

The time has come to take our game underground.  Our company is being hired to provide insights to a government contracting entity, including metrics that could be used to evaluate emerging crypto-currencies.  Understanding these markets will help ongoing and future investigations of illegal underground marketplaces operating from [darknets](https://en.wikipedia.org/wiki/Darknet_market).

These are the general goals of our system, which we hope you can help us understand better:

- Invent some quality metric(s) to help us measure these currencies
- Project the likely lifetime of a given market
- Measure volatility
- Assess the impact of "minability" on any of the above

Also very important:  If one were to open or operate an illegal marketplace, which currency is preferable?


In [23]:
df = pd.read_csv("../../../../datasets/cryptocurrencies/Currencies and Assets.csv")
df.head(10)

Unnamed: 0,Name,Image,Type,Status,Symbol,Platform,Market Capitalization,Price,Available suppy,Volume (17 Oct. '14),More Information,Profile
0,RosCoin,https://coinmarketcap.com/static/img/coins/16x...,Cryptocurrency,,ROS,,"$39,669",$0.00,75737672,"$14,995",http://216.231.132.86:2000/chain/Roscoin,https://coinmarketcap.com/currencies/roscoin/
1,365Coin,https://coinmarketcap.com/static/img/coins/16x...,Cryptocurrency,Minable,365,,"$2,714",$19.42,140,Unknown,http://365coin.net/,https://coinmarketcap.com/currencies/365coin/
2,DarkFox,https://coinmarketcap.com/static/img/coins/16x...,Cryptocurrency,,DRX,,$304,$0.00,576390,Unknown,http://62.113.238.105/chain/darkfox,https://coinmarketcap.com/currencies/darkfox/
3,66 Coin,https://coinmarketcap.com/static/img/coins/16x...,Cryptocurrency,Minable,66,,Unknown,$30.59,Unknown,Low,http://66coin.org/,https://coinmarketcap.com/currencies/66-coin/
4,Acoin,https://coinmarketcap.com/static/img/coins/16x...,Cryptocurrency,Minable,ACOIN,,"$3,168",$0.02,150530,$71,http://a-coin.info,https://coinmarketcap.com/currencies/acoin/
5,PotatoCoin,https://coinmarketcap.com/static/img/coins/16x...,Cryptocurrency,"Minable, Premined",SPUDS,,$542,$0.00,70855678,$23,http://abe.bitember.com/chain/Potatocoin/,https://coinmarketcap.com/currencies/potatocoin/
6,Aerocoin,https://coinmarketcap.com/static/img/coins/16x...,Cryptocurrency,,AERO,,"$74,681",$0.01,7109983,$30,http://aerocoin.org/,https://coinmarketcap.com/currencies/aerocoin/
7,Aliencoin,https://coinmarketcap.com/static/img/coins/16x...,Cryptocurrency,Minable,ALN,,"$10,919",$0.00,24726090,$78,http://alienco.in/,https://coinmarketcap.com/currencies/aliencoin/
8,AlphaCoin,https://coinmarketcap.com/static/img/coins/16x...,Cryptocurrency,Minable,ALF,,Unknown,$0.00,Unknown,Low,http://alphacoin.wordpress.com/,https://coinmarketcap.com/currencies/alphacoin/
9,AmericanCoin,https://coinmarketcap.com/static/img/coins/16x...,Cryptocurrency,Minable,AMC,,Unknown,$0.00,Unknown,$26,http://amccoin.com,https://coinmarketcap.com/currencies/americanc...
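
Before inventing any metrics, the money columns need cleaning: values such as `$39,669` and `Unknown` are text, not numbers.  Here is a minimal sketch on four of the rows previewed above.  The helper `to_number` and the new column names (`market_cap_usd`, `price_usd`, `is_minable`) are our own inventions, and treating every non-numeric entry as missing is an assumption you may want to revisit.

```python
import pandas as pd

# Four of the rows shown in the preview above, limited to the columns used here.
coins = pd.DataFrame({
    "Name": ["RosCoin", "365Coin", "66 Coin", "Acoin"],
    "Status": [None, "Minable", "Minable", "Minable"],
    "Market Capitalization": ["$39,669", "$2,714", "Unknown", "$3,168"],
    "Price": ["$0.00", "$19.42", "$30.59", "$0.02"],
})


def to_number(series):
    """Strip '$' and ',' then convert; non-numeric text becomes NaN."""
    cleaned = series.str.replace(r"[$,]", "", regex=True)
    return pd.to_numeric(cleaned, errors="coerce")


coins["market_cap_usd"] = to_number(coins["Market Capitalization"])
coins["price_usd"] = to_number(coins["Price"])

# Flag minable coins; a missing Status is treated as "not minable".
coins["is_minable"] = coins["Status"].fillna("").str.contains("Minable")

# Compare segments: median market cap for minable vs. non-minable coins.
by_minable = coins.groupby("is_minable")["market_cap_usd"].median()
print(by_minable)
```

The same helper should apply to the volume column, though entries like `Low` carry information that a plain NaN throws away.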
