### Challenge questions

Easy questions:

 1. How many total pings are in the Ocearch data?
 2. How many unique species of sharks are in the data set?
 3. What is the name of the heaviest shark, and how heavy is it?
 4. What is the name of the longest shark, and how long is it?
 5. When and where was the very first ping?

Intermediate questions:

 1. Which shark had the most pings?
 2. Which shark has been pinging the longest, and how long has that been?
 3. Which shark species has the most individual sharks tagged?
 4. What is the average length and weight of each shark species?
 5. Which shark has the biggest geographic box (largest distance from min lat/lon to max lat/lon, not `dist_total`)?

### Answers

#### Read and clean data

In [1]:
import pandas as pd
df = pd.read_csv('../data/sharks.csv')
df.head()

|   | active | datetime | id | latitude | longitude | name | gender | species | weight | length | tagDate | dist_total |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 2014-07-06 04:57:28 | 3 | -34.60661 | 21.15244 | Oprah | Female | White Shark (Carcharodon carcharias) | 686 lb | 9 ft 10 in. | 7 March 2012 | 2816.662 |
| 1 | 1 | 2014-06-23 02:40:09 | 3 | -34.78752 | 19.42479 | Oprah | Female | White Shark (Carcharodon carcharias) | 686 lb | 9 ft 10 in. | 7 March 2012 | 2816.662 |
| 2 | 1 | 2014-06-15 13:15:44 | 3 | -34.42487 | 21.09754 | Oprah | Female | White Shark (Carcharodon carcharias) | 686 lb | 9 ft 10 in. | 7 March 2012 | 2816.662 |
| 3 | 1 | 2014-06-03 02:23:57 | 3 | -34.704323 | 20.210134 | Oprah | Female | White Shark (Carcharodon carcharias) | 686 lb | 9 ft 10 in. | 7 March 2012 | 2816.662 |
| 4 | 1 | 2014-05-28 19:53:57 | 3 | -34.65556 | 19.37459 | Oprah | Female | White Shark (Carcharodon carcharias) | 686 lb | 9 ft 10 in. | 7 March 2012 | 2816.662 |


##### Clean length/weight
Parse `weight` into pounds and `length` into total inches. This step is the only real difference between `sharks_cleaned.csv` and `sharks.csv`.

In [2]:
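def clean_weight(value):
    # missing values (NaN or empty string) pass through unchanged;
    # `not value` alone would miss NaN, which is truthy
    if pd.isna(value) or value == '':
        return value
    # most values are like "686 lb"; strip unit letters, commas and '+' signs
    value = str(value)
    for character in 'lbs,+':
        value = value.replace(character, '')
    return float(value)

def clean_length(value):
    # missing values (NaN or empty string) pass through unchanged
    if pd.isna(value) or value == '':
        return value
    # most length values are like '9 ft 10 in.'; convert to total inches
    value = str(value)
    total = 0
    if 'ft' in value:
        ft, inches = value.split('ft')
        total += int(ft.strip()) * 12
    else:
        inches = value
    if inches.strip():
        # keep only the number in front of 'in.'
        total += float(inches.strip().split()[0])
    return total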

df['weight'] = df.weight.apply(clean_weight)
df['length'] = df.length.apply(clean_length)
df['datetime'] = pd.to_datetime(df.datetime)
df.head()

|   | active | datetime | id | latitude | longitude | name | gender | species | weight | length | tagDate | dist_total |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 2014-07-06 04:57:28 | 3 | -34.60661 | 21.15244 | Oprah | Female | White Shark (Carcharodon carcharias) | 686.0 | 118.0 | 7 March 2012 | 2816.662 |
| 1 | 1 | 2014-06-23 02:40:09 | 3 | -34.78752 | 19.42479 | Oprah | Female | White Shark (Carcharodon carcharias) | 686.0 | 118.0 | 7 March 2012 | 2816.662 |
| 2 | 1 | 2014-06-15 13:15:44 | 3 | -34.42487 | 21.09754 | Oprah | Female | White Shark (Carcharodon carcharias) | 686.0 | 118.0 | 7 March 2012 | 2816.662 |
| 3 | 1 | 2014-06-03 02:23:57 | 3 | -34.704323 | 20.210134 | Oprah | Female | White Shark (Carcharodon carcharias) | 686.0 | 118.0 | 7 March 2012 | 2816.662 |
| 4 | 1 | 2014-05-28 19:53:57 | 3 | -34.65556 | 19.37459 | Oprah | Female | White Shark (Carcharodon carcharias) | 686.0 | 118.0 | 7 March 2012 | 2816.662 |
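`clean_weight` and `clean_length` run Python code on every row through `apply`. A vectorized alternative, sketched below on a few hypothetical raw strings, uses pandas string methods instead. It assumes the same formats the functions above handle (`"686 lb"`, `"9 ft 10 in."`, `"12 ft"`); unlike the functions, `errors="coerce"` turns anything unparseable into NaN rather than raising.

```python
import pandas as pd

# Hypothetical raw values in the formats seen in sharks.csv
raw = pd.DataFrame({
    "weight": ["686 lb", "1,200 lbs", None],
    "length": ["9 ft 10 in.", "12 ft", None],
})

# Weight: drop everything except digits and decimal points, then parse as float
weight_lb = pd.to_numeric(
    raw["weight"].str.replace(r"[^\d.]", "", regex=True), errors="coerce"
)

# Length: capture an optional feet part and an optional inches part
parts = raw["length"].str.extract(r"(?:(?P<ft>\d+)\s*ft)?\s*(?:(?P<inch>[\d.]+)\s*in)?")
feet = pd.to_numeric(parts["ft"], errors="coerce")
inches = pd.to_numeric(parts["inch"], errors="coerce")

# Total inches; rows with neither part stay NaN instead of becoming 0
length_in = (feet.fillna(0) * 12 + inches.fillna(0)).where(feet.notna() | inches.notna())
print(pd.DataFrame({"weight_lb": weight_lb, "length_in": length_in}))
```

The regex relies on each value starting with its number; a string such as `"about 9 ft"` would come back as NaN.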


#### Easy answers

 1. How many total pings are in the Ocearch data?
 2. How many unique species of sharks are in the data set?
 3. What is the name of the heaviest shark, and how heavy is it?
 4. What is the name of the longest shark, and how long is it?
 5. When and where was the very first ping?

In [3]:
# total ping count
len(df)

65793

In [4]:
# unique species
df.species.nunique()

9

In [5]:
# heaviest shark
df[df.weight == df.weight.max()].iloc[0]

active                                    1
datetime                2016-10-24 03:12:24
id                                      233
latitude                            38.5389
longitude                          -68.8206
name                         Rocky Mazzanti
gender                               Female
species       Whale Shark (Rhincodon Typus)
weight                                25000
length                                  300
tagDate                      24 August 2016
dist_total                          1753.52
Name: 56469, dtype: object
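The notebook has no cell for easy question 4 (the longest shark). A minimal sketch in the spirit of the heaviest-shark cell, using `idxmax` to fetch the row with the largest `length` (total inches after cleaning). The toy frame and its shark names are hypothetical stand-ins for the cleaned `df`.

```python
import pandas as pd

# Hypothetical toy pings standing in for the cleaned df
df = pd.DataFrame({
    "name": ["Shark A", "Shark A", "Shark B", "Shark C"],
    "weight": [686.0, 686.0, 2000.0, 50.0],
    "length": [118.0, 118.0, 200.0, 40.0],
})

# idxmax gives the index label of the first row holding the maximum length
longest = df.loc[df["length"].idxmax(), ["name", "length"]]
print(f"{longest['name']}: {longest['length']} in")
```

On the real data the same two lines, `df.loc[df.length.idxmax(), ['name', 'length']]`, would answer the question.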

In [6]:
# first ping
df.sort_values('datetime').iloc[0]

active                                           1
datetime                       2012-03-09 15:35:31
id                                               3
latitude                                   -34.132
longitude                                   22.123
name                                         Oprah
gender                                      Female
species       White Shark (Carcharodon carcharias)
weight                                         686
length                                         118
tagDate                               7 March 2012
dist_total                                 2816.66
Name: 519, dtype: object

In [7]:
# max distance travelled
df.dist_total[df.dist_total > 0].describe()

count    65785.000000
mean     12569.310280
std      12755.008316
min          8.127000
25%       3048.274000
50%       8177.352000
75%      17811.853000
max      46553.182000
Name: dist_total, dtype: float64
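Because `dist_total` is repeated on every ping of a shark, the summary above is weighted by ping count and does not name the shark that travelled farthest. A minimal sketch, on a hypothetical toy frame with the same `id`, `name` and `dist_total` columns, that keeps one row per shark before summarizing. The units of `dist_total` are not stated in the data, so they are left unspecified here.

```python
import pandas as pd

# Hypothetical toy pings; dist_total repeats on every ping of the same shark
df = pd.DataFrame({
    "id": [1, 1, 1, 2, 2],
    "name": ["Shark A", "Shark A", "Shark A", "Shark B", "Shark B"],
    "dist_total": [2816.662, 2816.662, 2816.662, 9000.0, 9000.0],
})

# Keep the first row for each shark so every shark counts once
per_shark = df.drop_duplicates("id")

# Row label of the largest dist_total, then that shark's name and distance
farthest = per_shark.loc[per_shark["dist_total"].idxmax(), ["name", "dist_total"]]
print(farthest)
print(per_shark["dist_total"].describe())
```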

#### Intermediate answers

 1. Which shark had the most pings?
 2. Which shark has been pinging the longest, and how long has that been?
 3. Which shark species has the most individual sharks tagged?
 4. What is the average length and weight of each shark species?
 5. Which shark has the biggest geographic box (largest distance from min lat/lon to max lat/lon, not `dist_total`)?

##### Most pings

In [8]:
groups = df.groupby('id')
sizes = groups.size()
names = groups.name.first()
species = groups.species.first()
first_ping = groups.datetime.min()
last_ping = groups.datetime.max()
combined = pd.concat([sizes, names, species, first_ping, last_ping], axis=1).reset_index()
combined.columns = ['id', 'ping_count', 'name', 'species', 'first_ping', 'last_ping']
combined.sort_values('ping_count', ascending=False).head()

|   | id | ping_count | name | species | first_ping | last_ping |
|---|---|---|---|---|---|---|
| 35 | 41 | 3240 | Mary Lee | White Shark (Carcharodon carcharias) | 2012-09-18 09:34:28 | 2017-06-17 10:54:32 |
| 36 | 56 | 2946 | Lydia | White Shark (Carcharodon carcharias) | 2013-03-03 08:03:13 | 2017-03-15 02:31:34 |
| 154 | 202 | 2366 | Oscar | Mako Shark (Isurus oxyrinchus) | 2016-07-09 00:14:38 | 2019-01-29 20:32:35 |
| 40 | 60 | 2134 | April | Mako Shark (Isurus oxyrinchus) | 2013-07-28 17:00:04 | 2014-06-17 11:17:03 |
| 26 | 32 | 1851 | Lisha | White Shark (Carcharodon carcharias) | 2012-05-14 15:43:21 | 2014-04-03 12:48:57 |
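The `pd.concat` plus positional `combined.columns = [...]` pattern works, but the names only line up if the list order matches the concat order. A sketch of the same per-shark summary with pandas named aggregation (available since pandas 0.25), on a hypothetical toy frame: each keyword names an output column and pairs a source column with an aggregation.

```python
import pandas as pd

# Hypothetical toy pings standing in for the cleaned df
df = pd.DataFrame({
    "id": [1, 1, 1, 2, 2],
    "name": ["Shark A", "Shark A", "Shark A", "Shark B", "Shark B"],
    "species": ["Species X"] * 5,
    "datetime": pd.to_datetime([
        "2014-01-01", "2014-02-01", "2014-03-01", "2015-01-01", "2016-01-01",
    ]),
})

# One row per shark; each keyword is output_name=(source column, aggregation)
summary = df.groupby("id").agg(
    ping_count=("datetime", "size"),
    name=("name", "first"),
    species=("species", "first"),
    first_ping=("datetime", "min"),
    last_ping=("datetime", "max"),
).reset_index()

# nlargest replaces sort_values(..., ascending=False).head()
print(summary.nlargest(5, "ping_count"))
```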


##### Longest duration pinger

In [9]:
combined['duration'] = combined.last_ping - combined.first_ping
combined.sort_values('duration', ascending=False).head()

|   | id | ping_count | name | species | first_ping | last_ping | duration |
|---|---|---|---|---|---|---|---|
| 45 | 65 | 1816 | Katharine | White Shark (Carcharodon carcharias) | 2013-08-21 04:42:26 | 2019-01-14 23:49:00 | 1972 days 19:06:34 |
| 2 | 5 | 204 | Helen | White Shark (Carcharodon carcharias) | 2012-03-10 15:15:10 | 2017-01-05 05:22:39 | 1761 days 14:07:29 |
| 35 | 41 | 3240 | Mary Lee | White Shark (Carcharodon carcharias) | 2012-09-18 09:34:28 | 2017-06-17 10:54:32 | 1733 days 01:20:04 |
| 36 | 56 | 2946 | Lydia | White Shark (Carcharodon carcharias) | 2013-03-03 08:03:13 | 2017-03-15 02:31:34 | 1472 days 18:28:21 |
| 19 | 25 | 1578 | Cyndi | White Shark (Carcharodon carcharias) | 2012-04-14 15:50:25 | 2015-09-21 15:00:43 | 1254 days 23:10:18 |
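`duration` holds pandas timedeltas, so expressing "how long" in days or years takes one more step. A minimal sketch using Katharine's first and last ping times from the table above; the 365.25-day year is an approximation chosen here, not something the notebook defines.

```python
import pandas as pd

# Katharine's first and last pings, copied from the table above
combined = pd.DataFrame({
    "name": ["Katharine"],
    "first_ping": pd.to_datetime(["2013-08-21 04:42:26"]),
    "last_ping": pd.to_datetime(["2019-01-14 23:49:00"]),
})

combined["duration"] = combined.last_ping - combined.first_ping
# Whole days via the .dt accessor; fractional years by dividing by a Timedelta
combined["duration_days"] = combined["duration"].dt.days
combined["duration_years"] = combined["duration"] / pd.Timedelta(days=365.25)
print(combined[["name", "duration_days", "duration_years"]])
```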


##### Individual count by species

In [10]:
df.groupby('species').id.nunique().sort_values(ascending=False).head()

species
Tiger Shark  (Galeocerdo cuvier)        82
White Shark (Carcharodon carcharias)    74
Blue Shark (Prionace glauca)            27
Mako Shark (Isurus oxyrinchus)          18
Hammerhead Shark (Sphyrna)              18
Name: id, dtype: int64

##### Average length/weight by species

In [11]:
df.groupby('species').agg({'weight' : 'mean', 'length' : 'mean', 'id' : 'nunique'}).sort_values('id')

| species | weight | length | id |
|---|---|---|---|
| Whale Shark (Rhincodon Typus) | 25000.0 | 327.906977 | 3 |
| Bull Shark (Carcharhinus leucas) | 290.4 | 89.781022 | 4 |
| Silky Shark (Carcharhinus falciformis) | 132.881671 | 76.965197 | 4 |
| Blacktip Shark (Carcharhinus limbatus) | 138.37891 | 80.316209 | 9 |
| Hammerhead Shark (Sphyrna) | 126.532226 | 93.813093 | 18 |
| Mako Shark (Isurus oxyrinchus) | 240.451871 | 82.446834 | 18 |
| Blue Shark (Prionace glauca) | 243.634091 | 106.028852 | 27 |
| White Shark (Carcharodon carcharias) | 1555.265789 | 147.144227 | 74 |
| Tiger Shark (Galeocerdo cuvier) | 467.91739 | 119.175229 | 82 |
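The cell above averages over pings, not sharks: `weight` and `length` repeat on every ping, so a shark with thousands of pings (Mary Lee has 3240) pulls its species' mean far harder than one with a few dozen. A minimal sketch of a per-shark average, assuming each shark's weight and length are constant across its pings as in `df.head()`; the toy frame and species label are hypothetical.

```python
import pandas as pd

# Hypothetical toy pings: shark 1 pinged three times, shark 2 once
df = pd.DataFrame({
    "id": [1, 1, 1, 2],
    "species": ["Species X"] * 4,
    "weight": [1000.0, 1000.0, 1000.0, 200.0],
    "length": [150.0, 150.0, 150.0, 90.0],
})

# Per-ping mean, as in the cell above: shark 1 counts three times
per_ping = df.groupby("species")[["weight", "length"]].mean()

# Per-shark mean: collapse to one row per shark first, then average by species
per_shark = (
    df.drop_duplicates("id")
      .groupby("species")
      .agg(weight=("weight", "mean"), length=("length", "mean"), sharks=("id", "nunique"))
)
print(per_ping)
print(per_shark)
```

Here the per-ping weight is (3 × 1000 + 200) / 4 = 800, while the per-shark weight is (1000 + 200) / 2 = 600.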


##### Biggest geographic box

In [12]:
groups = df.groupby('id')
combined = pd.concat([groups.latitude.min(), 
                      groups.longitude.min(), 
                      groups.latitude.max(), 
                      groups.longitude.max(), 
                      groups.name.first(), 
                      groups.species.first()], axis=1).reset_index()
combined.columns = ['id', 'min_lat', 'min_lon', 'max_lat', 'max_lon', 'name', 'species']
combined.head()

|   | id | min_lat | min_lon | max_lat | max_lon | name | species |
|---|---|---|---|---|---|---|---|
| 0 | 3 | -34.88268 | 19.37459 | -34.05394 | 22.64236 | Oprah | White Shark (Carcharodon carcharias) |
| 1 | 4 | -36.703 | 20.535038 | -34.063 | 22.74626 | Albertina | White Shark (Carcharodon carcharias) |
| 2 | 5 | -37.23623 | 18.53635 | -19.50057 | 37.84922 | Helen | White Shark (Carcharodon carcharias) |
| 3 | 6 | -34.986 | 19.06158 | -24.77363 | 34.84301 | Brenda | White Shark (Carcharodon carcharias) |
| 4 | 7 | -35.461 | 17.91681 | -32.743 | 27.97646 | Madiba | White Shark (Carcharodon carcharias) |


In [13]:
combined['lat_diff'] = combined.max_lat - combined.min_lat
combined['lon_diff'] = combined.max_lon - combined.min_lon
combined['area'] = combined['lat_diff'] * combined['lon_diff']
combined.sort_values('area', ascending=False).head()

|   | id | min_lat | min_lon | max_lat | max_lon | name | species | lat_diff | lon_diff | area |
|---|---|---|---|---|---|---|---|---|---|---|
| 29 | 35 | -41.37174 | 18.515 | -6.15888 | 71.0983 | Kathryn | White Shark (Carcharodon carcharias) | 35.21286 | 52.5833 | 1851.608381 |
| 36 | 56 | 23.53902 | -81.3818 | 53.65843 | -27.48272 | Lydia | White Shark (Carcharodon carcharias) | 30.11941 | 53.89908 | 1623.408489 |
| 24 | 30 | -43.21756 | 8.06196 | -19.11709 | 66.72966 | Vindication | White Shark (Carcharodon carcharias) | 24.10047 | 58.6677 | 1413.919144 |
| 19 | 25 | -45.61157 | 18.23305 | -14.95129 | 61.87323 | Cyndi | White Shark (Carcharodon carcharias) | 30.66028 | 43.64018 | 1338.020138 |
| 30 | 36 | -38.82461 | 17.47565 | -10.52038 | 62.65514 | Success | White Shark (Carcharodon carcharias) | 28.30423 | 45.17949 | 1278.770676 |
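The cell above ranks sharks by `lat_diff * lon_diff`, an area in square degrees. That is not quite the distance question 5 asks for, and because a degree of longitude shrinks toward the poles, square degrees are not comparable across latitudes. A minimal sketch of the corner-to-corner great-circle (haversine) distance, assuming a spherical Earth with mean radius 6371 km and boxes that do not cross the ±180° meridian; `haversine_km` and `box_diag_km` are hypothetical names. The three rows are copied from the table above, so this ranks only those sharks; applying it to the full `combined` frame would rank all of them.

```python
import numpy as np
import pandas as pd

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; assumes a spherical Earth

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between points given in degrees (scalars or arrays)."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = np.sin(dlat / 2) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))

# Box corners for three sharks, copied from the table above
combined = pd.DataFrame({
    "name": ["Kathryn", "Lydia", "Vindication"],
    "min_lat": [-41.37174, 23.53902, -43.21756],
    "min_lon": [18.515, -81.3818, 8.06196],
    "max_lat": [-6.15888, 53.65843, -19.11709],
    "max_lon": [71.0983, -27.48272, 66.72966],
})

# Distance from the (min_lat, min_lon) corner to the (max_lat, max_lon) corner
combined["box_diag_km"] = haversine_km(
    combined.min_lat, combined.min_lon, combined.max_lat, combined.max_lon
)
print(combined.sort_values("box_diag_km", ascending=False)[["name", "box_diag_km"]])
```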
