### Challenge questions

Easy questions:

 1. How many total pings are in the Ocearch shark data?
 2. How many unique species of sharks are in the data set?
 3. What are the name, weight, and species of the heaviest shark(s)?
 4. When and where was the very first ping?
 5. Excluding results with 0 distance traveled: what are the minimum, average, and maximum travel distances?
 
Intermediate questions:

 1. Which shark had the most pings?
 2. Which shark has been pinging the longest, and how long has that been?
 3. Which shark species has the most individual sharks tagged?
 4. What is the average length and weight of each shark species?
 5. Which shark has the biggest geographic box (largest area spanned from min lat/lon to max lat/lon, not `dist_total`)?
 
Hard questions:


### Answers

#### Load data

In [29]:
import requests
url = 'http://www.ocearch.org/tracker/ajax/filter-sharks'

resp = requests.get(url)
resp

<Response [200]>

#### Explore data

In [30]:
resp.text[:200]

'[{"id":3,"name":"Oprah","tagIdNumber":"117480","species":"White Shark (Carcharodon carcharias)","gender":"Female","stageOfLife":"Sub-Adult","length":"9 ft 10 in.","weight":"686 lb","tagDate":"7 March '

In [31]:
for k, v in sorted(resp.json()[0].items()):
    print(k, str(v)[:200])

active 1
cnt_inactive_pings 0
description <p>Oprah was named by expedition leader Chris Fischer. Fischer named her after Oprah Winfrey, one of America&#39;s great philanthropists who has done much with education in Africa.</p>

dist_24_hours 0.000
dist_72_hours 0.000
dist_total 2816.662
gender Female
id 3
images [{'id': '218', 'filename': 'Screen Shot 2013-06-20 at 11.15.57 AM.png', 'encrypted_name': '4ffefafc2f699e5837c56cb2043b9798', 'description': None, 'is_primary': True}, {'id': '188', 'filename': 'Scree
isMobile False
is_alive 1
last_updated 1531850549
latestPing 1404703826
length 9 ft 10 in.
name Oprah
pingAge sharkmore30
pingCriteria {'interval': '30 year'}
pings [{'active': '1', 'id': '36902', 'datetime': '6 July 2014 1:57:28 PM', 'tz_datetime': '6 July 2014 1:57:28 PM +0900', 'latitude': '-34.60661', 'longitude': '21.15244'}, {'active': '1', 'id': '36666', '
platform None
profile_url http://dev.ocearch.org/profile/oprah/
species White Shark (Carcharodon carcharias)
species_i

#### Data Wrangling

##### Turn json into dataframe

In [32]:
import pandas as pd
df = pd.DataFrame(resp.json())
columns = ['id', 'name', 'gender', 'species', 'weight', 'length', 'tagDate', 'dist_total']
df[columns].head()

Unnamed: 0,id,name,gender,species,weight,length,tagDate,dist_total
0,3,Oprah,Female,White Shark (Carcharodon carcharias),686 lb,9 ft 10 in.,7 March 2012,2816.662
1,4,Albertina,Female,White Shark (Carcharodon carcharias),1110 lb,11 ft 6 in.,8 March 2012,1830.593
2,5,Helen,Female,White Shark (Carcharodon carcharias),765 lb,10 ft 2 in.,8 March 2012,4436.661
3,6,Brenda,Female,White Shark (Carcharodon carcharias),1310 lb,12 ft 2 in.,8 March 2012,2966.902
4,7,Madiba,Male,White Shark (Carcharodon carcharias),659 lb,9 ft 8 in.,8 March 2012,3537.423


In [33]:
df.shape

(275, 30)

##### Filter out non-shark data

In [34]:
df.species.value_counts()

Tiger Shark  (Galeocerdo cuvier)                   82
White Shark (Carcharodon carcharias)               74
Blue Shark (Prionace glauca)                       27
Mako Shark (Isurus oxyrinchus)                     18
Hammerhead Shark (Sphyrna)                         18
Olive Ridley Turtle (Lepidochelys olivacea)        16
Blacktip Shark (Carcharhinus limbatus)              9
Loggerhead Sea Turtle (Caretta caretta)             9
Bull Shark (Carcharhinus leucas)                    4
Silky Shark (Carcharhinus falciformis)              4
Guadalupe Fur Seals (Arctocephalus townsendi)       4
Whale Shark (Rhincodon Typus)                       3
American alligator (Alligator mississippiensis)     2
 Harbor Seal (Phoca vitulina)                       1
Pilot Whale (Globicephala)                          1
Dolphin (Delphinus capensis)                        1
Green Sea Turtle (Chelonia mydas)                   1
Ship (Motor Vessel)                                 1
Name: species, dtype: int64

In [35]:
df = df[df.species.fillna('').str.contains('shark', case=False)]
df.shape

(239, 30)

##### Extract ping data

In [36]:
ping_frames = []
for row in df.itertuples():
    ping_frame = pd.DataFrame(row.pings)
    ping_frame['id'] = row.id
    ping_frames.append(ping_frame)
    
len(ping_frames)

239

##### Merge ping data back into dataframe

In [37]:
pings = pd.concat(ping_frames)
pings.shape

(65867, 6)
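
The loop above builds one frame per shark and concatenates them. As an alternative, `pd.json_normalize` can flatten the nested ping lists in a single call. The sketch below uses two made-up records shaped like the API response (not OCEARCH data). `meta_prefix='shark_'` is needed because each ping carries its own `id` key, which would otherwise clash with the shark's `id`; unlike the loop, this keeps the ping `id` and stores the shark's in `shark_id`.

```python
import pandas as pd

# Hypothetical records shaped like the API response (values are made up).
records = [
    {"id": 3, "name": "A", "pings": [
        {"id": "100", "latitude": "-34.6", "longitude": "21.1"},
        {"id": "101", "latitude": "-34.7", "longitude": "19.4"},
    ]},
    {"id": 4, "name": "B", "pings": [
        {"id": "200", "latitude": "-36.7", "longitude": "20.5"},
    ]},
]

# One row per ping; the shark-level id is copied onto each row as 'shark_id'.
flat_pings = pd.json_normalize(
    records,
    record_path="pings",
    meta=["id"],
    meta_prefix="shark_",
)
print(flat_pings.shape)
print(sorted(flat_pings.columns))
```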

In [38]:
joined = pings.merge(df[columns], on='id')
joined.shape

(65867, 13)

In [39]:
joined.head()

Unnamed: 0,active,datetime,id,latitude,longitude,tz_datetime,name,gender,species,weight,length,tagDate,dist_total
0,1,6 July 2014 1:57:28 PM,3,-34.60661,21.15244,6 July 2014 1:57:28 PM +0900,Oprah,Female,White Shark (Carcharodon carcharias),686 lb,9 ft 10 in.,7 March 2012,2816.662
1,1,23 June 2014 11:40:09 AM,3,-34.78752,19.42479,23 June 2014 11:40:09 AM +0900,Oprah,Female,White Shark (Carcharodon carcharias),686 lb,9 ft 10 in.,7 March 2012,2816.662
2,1,15 June 2014 10:15:44 PM,3,-34.42487,21.09754,15 June 2014 10:15:44 PM +0900,Oprah,Female,White Shark (Carcharodon carcharias),686 lb,9 ft 10 in.,7 March 2012,2816.662
3,1,3 June 2014 11:23:57 AM,3,-34.70432271674724,20.21013441406251,3 June 2014 11:23:57 AM +0900,Oprah,Female,White Shark (Carcharodon carcharias),686 lb,9 ft 10 in.,7 March 2012,2816.662
4,1,29 May 2014 4:53:57 AM,3,-34.65556,19.37459,29 May 2014 4:53:57 AM +0900,Oprah,Female,White Shark (Carcharodon carcharias),686 lb,9 ft 10 in.,7 March 2012,2816.662
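
The `tz_datetime` strings carry a UTC offset, and the cleaning step parses them with `pd.to_datetime`. As a standard-library illustration of what that parse involves, the sketch below reads the first value shown above with `datetime.strptime`. The format string is inferred from the sample values, not taken from any API documentation, and `%B`/`%p` assume an English locale.

```python
from datetime import datetime, timezone

# First tz_datetime value from the table above.
raw = "6 July 2014 1:57:28 PM +0900"

# Assumed format: day, full month name, year, 12-hour time, AM/PM, UTC offset.
PING_FORMAT = "%d %B %Y %I:%M:%S %p %z"

parsed = datetime.strptime(raw, PING_FORMAT)   # timezone-aware datetime
as_utc = parsed.astimezone(timezone.utc)       # same instant, expressed in UTC

print(parsed.isoformat())
print(as_utc.isoformat())
```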


##### Clean data

In [41]:
df = joined # don't need a reference to the original resp.json() df anymore
df.shape

(65867, 13)

In [42]:
def clean_weight(value):
    if not value:
        return value
    # most values are like "123 lb"
    value = str(value)
    for character in 'lbs,+':
        value = value.replace(character, '')
    return float(value)

def clean_length(value):
    if not value:
        return value
    # most length values are like '3 ft 4 in.'
    value = str(value)
    total = 0
    if 'ft' in value:
        ft, inches = value.split('ft')
        total += int(ft.strip()) * 12
    else:
        inches = value
    if inches.strip():
        total += float(inches.strip().split()[0])
    return total

df['weight'] = df.weight.apply(clean_weight)
df['length'] = df.length.apply(clean_length)
df['datetime'] = pd.to_datetime(df.tz_datetime)

numeric_cols = ['latitude', 'longitude', 'dist_total']
# convert column by column (the default axis); axis=1 would call to_numeric once per row
df[numeric_cols] = df[numeric_cols].apply(pd.to_numeric)
df = df.drop(columns=['tz_datetime'])
df.head()

Unnamed: 0,active,datetime,id,latitude,longitude,name,gender,species,weight,length,tagDate,dist_total
0,1,2014-07-06 13:57:28+09:00,3,-34.60661,21.15244,Oprah,Female,White Shark (Carcharodon carcharias),686.0,118.0,7 March 2012,2816.662
1,1,2014-06-23 11:40:09+09:00,3,-34.78752,19.42479,Oprah,Female,White Shark (Carcharodon carcharias),686.0,118.0,7 March 2012,2816.662
2,1,2014-06-15 22:15:44+09:00,3,-34.42487,21.09754,Oprah,Female,White Shark (Carcharodon carcharias),686.0,118.0,7 March 2012,2816.662
3,1,2014-06-03 11:23:57+09:00,3,-34.704323,20.210134,Oprah,Female,White Shark (Carcharodon carcharias),686.0,118.0,7 March 2012,2816.662
4,1,2014-05-29 04:53:57+09:00,3,-34.65556,19.37459,Oprah,Female,White Shark (Carcharodon carcharias),686.0,118.0,7 March 2012,2816.662
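
The same unit parsing can also be written with regular expressions, which makes the expected input shapes explicit. This is a sketch rather than a drop-in replacement: `parse_weight_lb` and `parse_length_in` are hypothetical names, and the patterns assume values such as `'686 lb'`, `'1,110 lbs'`, and `'9 ft 10 in.'`. Unlike `clean_length`, a bare number with no unit is treated as unparseable here.

```python
import re


def parse_weight_lb(value):
    """Return the weight in pounds as a float, or None if no number is found."""
    if not value:
        return None
    match = re.search(r"\d[\d,]*(?:\.\d+)?", str(value))
    if match is None:
        return None
    # Drop thousands separators before converting.
    return float(match.group().replace(",", ""))


def parse_length_in(value):
    """Return the length in inches as a float, or None if no ft/in part is found."""
    if not value:
        return None
    text = str(value)
    feet = re.search(r"(\d+(?:\.\d+)?)\s*ft", text)
    inches = re.search(r"(\d+(?:\.\d+)?)\s*in", text)
    if feet is None and inches is None:
        return None
    total = 0.0
    if feet:
        total += float(feet.group(1)) * 12
    if inches:
        total += float(inches.group(1))
    return total


print(parse_weight_lb("686 lb"), parse_length_in("9 ft 10 in."))
```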


#### Easy answers

 1. How many total pings are in the Ocearch shark data?
 2. How many unique species of sharks are in the data set?
 3. What are the name, weight, and species of the heaviest shark(s)?
 4. When and where was the very first ping?
 5. Excluding results with 0 distance traveled: what are the minimum, average, and maximum travel distances?


In [43]:
# total ping count
len(df)

65867

In [44]:
# unique species
df.species.nunique()

9

In [45]:
# heaviest shark(s)
df[df.weight == df.weight.max()][['name', 'weight', 'species']].drop_duplicates('name')

Unnamed: 0,name,weight,species
56484,Rocky Mazzanti,25000.0,Whale Shark (Rhincodon Typus)
59316,Canyon,25000.0,Whale Shark (Rhincodon Typus)


In [46]:
# first ping
df.sort_values('datetime').iloc[0]

active                                           1
datetime                 2012-03-10 00:35:31+09:00
id                                               3
latitude                                   -34.132
longitude                                   22.123
name                                         Oprah
gender                                      Female
species       White Shark (Carcharodon carcharias)
weight                                         686
length                                         118
tagDate                               7 March 2012
dist_total                                 2816.66
Name: 519, dtype: object

In [48]:
# min, mean, and max of nonzero dist_total (computed over ping rows)
df.dist_total[df.dist_total > 0].describe()

count    65859.000000
mean     12571.443642
std      12751.389357
min          8.127000
25%       3048.274000
50%       8177.352000
75%      17811.853000
max      46553.182000
Name: dist_total, dtype: float64
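
Because `dist_total` is a per-shark value repeated on every ping row, summary statistics over the joined frame weight each shark by its ping count. A minimal sketch of the per-shark alternative, on a made-up frame (the numbers are illustrative, not OCEARCH results):

```python
import pandas as pd

# Toy data: shark 1 has three pings, sharks 2 and 3 have one each.
toy = pd.DataFrame({
    "id": [1, 1, 1, 2, 3],
    "dist_total": [100.0, 100.0, 100.0, 400.0, 0.0],
})

# Keep one row per shark, then drop sharks with zero distance.
per_shark = toy.drop_duplicates("id")
nonzero = per_shark.dist_total[per_shark.dist_total > 0]

stats = nonzero.agg(["min", "mean", "max"])
print(stats)
```

In this toy frame the per-shark mean is 250, whereas averaging over ping rows would give 175, because shark 1 is counted three times.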

#### Intermediate answers

Intermediate questions:

 1. Which shark had the most pings?
 2. Which shark has been pinging the longest, and how long has that been?
 3. Which shark species has the most individual sharks tagged?
 4. What is the average length and weight of each shark species?
 5. Which shark has the biggest geographic box (largest area spanned from min lat/lon to max lat/lon, not `dist_total`)?

##### Most pings
Which shark had the most pings?

In [49]:
groups = df.groupby('id')
sizes = groups.size()
names = groups.name.first()
species = groups.species.first()
first_ping = groups.datetime.min()
last_ping = groups.datetime.max()
combined = pd.concat([sizes, names, species, first_ping, last_ping], axis=1).reset_index()
combined.columns = ['id', 'ping_count', 'name', 'species', 'first_ping', 'last_ping']
combined.sort_values('ping_count', ascending=False).head()

Unnamed: 0,id,ping_count,name,species,first_ping,last_ping
35,41,3240,Mary Lee,White Shark (Carcharodon carcharias),2012-09-18 18:34:28+09:00,2017-06-17 19:54:32+09:00
36,56,2946,Lydia,White Shark (Carcharodon carcharias),2013-03-03 17:03:13+09:00,2017-03-15 11:31:34+09:00
154,202,2366,Oscar,Mako Shark (Isurus oxyrinchus),2016-07-09 09:14:38+09:00,2019-01-30 05:32:35+09:00
40,60,2134,April,Mako Shark (Isurus oxyrinchus),2013-07-29 02:00:04+09:00,2014-06-17 20:17:03+09:00
26,32,1851,Lisha,White Shark (Carcharodon carcharias),2012-05-15 00:43:21+09:00,2014-04-03 21:48:57+09:00


##### Longest duration pinger
Which shark has been pinging the longest, and how long has that been?

In [50]:
combined['duration'] = combined.last_ping - combined.first_ping
combined.sort_values('duration', ascending=False).head()

Unnamed: 0,id,ping_count,name,species,first_ping,last_ping,duration
45,65,1816,Katharine,White Shark (Carcharodon carcharias),2013-08-21 13:42:26+09:00,2019-01-15 08:49:00+09:00,1972 days 19:06:34
2,5,204,Helen,White Shark (Carcharodon carcharias),2012-03-11 00:15:10+09:00,2017-01-05 14:22:39+09:00,1761 days 14:07:29
35,41,3240,Mary Lee,White Shark (Carcharodon carcharias),2012-09-18 18:34:28+09:00,2017-06-17 19:54:32+09:00,1733 days 01:20:04
36,56,2946,Lydia,White Shark (Carcharodon carcharias),2013-03-03 17:03:13+09:00,2017-03-15 11:31:34+09:00,1472 days 18:28:21
19,25,1578,Cyndi,White Shark (Carcharodon carcharias),2012-04-15 00:50:25+09:00,2015-09-22 00:00:43+09:00,1254 days 23:10:18


##### Individual count by species
Which shark species has the most individual sharks tagged?

In [51]:
df.groupby('species').id.nunique().sort_values(ascending=False).head()

species
Tiger Shark  (Galeocerdo cuvier)        82
White Shark (Carcharodon carcharias)    74
Blue Shark (Prionace glauca)            27
Mako Shark (Isurus oxyrinchus)          18
Hammerhead Shark (Sphyrna)              18
Name: id, dtype: int64

##### Average length/weight by species
What is the average length and weight of each shark species?

In [52]:
df.groupby('species').agg({'weight' : 'mean', 'length' : 'mean', 'id' : 'nunique'}).sort_values('id')

species,weight,length,id
Whale Shark (Rhincodon Typus),25000.0,327.906977,3
Bull Shark (Carcharhinus leucas),290.4,89.781022,4
Silky Shark (Carcharhinus falciformis),132.881671,76.965197,4
Blacktip Shark (Carcharhinus limbatus),138.37891,80.316209,9
Hammerhead Shark (Sphyrna),126.547227,93.81328,18
Mako Shark (Isurus oxyrinchus),240.823046,82.472374,18
Blue Shark (Prionace glauca),243.634091,106.028852,27
White Shark (Carcharodon carcharias),1554.997406,147.128146,74
Tiger Shark (Galeocerdo cuvier),468.066552,119.180099,82
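
The averages above are taken over ping rows, so a shark with many pings counts more than one with few. If one value per individual is wanted instead, the frame can be reduced to one row per `id` before grouping. A sketch on made-up data (the species label and numbers are illustrative only):

```python
import pandas as pd

# Toy data: shark 1 has three pings, shark 2 has one.
toy = pd.DataFrame({
    "id": [1, 1, 1, 2],
    "species": ["White Shark"] * 4,
    "weight": [1000.0, 1000.0, 1000.0, 2000.0],
    "length": [120.0, 120.0, 120.0, 160.0],
})

# Ping-weighted mean: shark 1 contributes three times.
per_ping = toy.groupby("species")[["weight", "length"]].mean()

# Per-individual mean: one row per shark before aggregating.
per_shark = (
    toy.drop_duplicates("id")
       .groupby("species")
       .agg(weight=("weight", "mean"),
            length=("length", "mean"),
            sharks=("id", "nunique"))
)
print(per_ping)
print(per_shark)
```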


##### Biggest geographic box
Which shark has the biggest geographic box (largest area, measured in square degrees, spanned from min lat/lon to max lat/lon, not `dist_total`)?

In [53]:
groups = df.groupby('id')
combined = pd.concat([groups.latitude.min(), 
                      groups.longitude.min(), 
                      groups.latitude.max(), 
                      groups.longitude.max(), 
                      groups.name.first(), 
                      groups.species.first()], axis=1).reset_index()
combined.columns = ['id', 'min_lat', 'min_lon', 'max_lat', 'max_lon', 'name', 'species']
combined.head()

Unnamed: 0,id,min_lat,min_lon,max_lat,max_lon,name,species
0,3,-34.88268,19.37459,-34.05394,22.64236,Oprah,White Shark (Carcharodon carcharias)
1,4,-36.703,20.535038,-34.063,22.74626,Albertina,White Shark (Carcharodon carcharias)
2,5,-37.23623,18.53635,-19.50057,37.84922,Helen,White Shark (Carcharodon carcharias)
3,6,-34.986,19.06158,-24.77363,34.84301,Brenda,White Shark (Carcharodon carcharias)
4,7,-35.461,17.91681,-32.743,27.97646,Madiba,White Shark (Carcharodon carcharias)


In [54]:
combined['lat_diff'] = combined.max_lat - combined.min_lat
combined['lon_diff'] = combined.max_lon - combined.min_lon
combined['area'] = combined['lat_diff'] * combined['lon_diff']
combined.sort_values('area', ascending=False).head()

Unnamed: 0,id,min_lat,min_lon,max_lat,max_lon,name,species,lat_diff,lon_diff,area
29,35,-41.37174,18.515,-6.15888,71.0983,Kathryn,White Shark (Carcharodon carcharias),35.21286,52.5833,1851.608381
36,56,23.53902,-81.3818,53.65843,-27.48272,Lydia,White Shark (Carcharodon carcharias),30.11941,53.89908,1623.408489
24,30,-43.21756,8.06196,-19.11709,66.72966,Vindication,White Shark (Carcharodon carcharias),24.10047,58.6677,1413.919144
19,25,-45.61157,18.23305,-14.95129,61.87323,Cyndi,White Shark (Carcharodon carcharias),30.66028,43.64018,1338.020138
30,36,-38.82461,17.47565,-10.52038,62.65514,Success,White Shark (Carcharodon carcharias),28.30423,45.17949,1278.770676
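
The product of the latitude and longitude differences is in square degrees, which is not a physical area: a degree of longitude covers less ground toward the poles. One alternative reading of the question is the great-circle length of the box diagonal, from the min corner to the max corner. The sketch below uses the haversine formula, assuming a spherical Earth with a mean radius of 6371 km, and takes Oprah's box from the `combined` table above as input. `haversine_km` is a hypothetical helper name.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius (spherical approximation)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlambda = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


# Oprah's bounding box: (min_lat, min_lon) to (max_lat, max_lon).
diagonal = haversine_km(-34.88268, 19.37459, -34.05394, 22.64236)
print(round(diagonal, 1), "km")
```

Applied per shark, this would rank boxes by diagonal length in kilometres rather than by area in square degrees. It does not handle tracks that cross the antimeridian.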


#### Hard answers



##### Plotting one shark's pings

In [58]:
import geopandas
from shapely.geometry import Point

# shapely Points take (x, y), i.e. (longitude, latitude)
df['geometry'] = df.apply(lambda row: Point(row.longitude, row.latitude), axis=1)
df.head()

Unnamed: 0,active,datetime,id,latitude,longitude,name,gender,species,weight,length,tagDate,dist_total,geometry
0,1,2014-07-06 13:57:28+09:00,3,-34.60661,21.15244,Oprah,Female,White Shark (Carcharodon carcharias),686.0,118.0,7 March 2012,2816.662,POINT (21.15244 -34.60661)
1,1,2014-06-23 11:40:09+09:00,3,-34.78752,19.42479,Oprah,Female,White Shark (Carcharodon carcharias),686.0,118.0,7 March 2012,2816.662,POINT (19.42479 -34.78752)
2,1,2014-06-15 22:15:44+09:00,3,-34.42487,21.09754,Oprah,Female,White Shark (Carcharodon carcharias),686.0,118.0,7 March 2012,2816.662,POINT (21.09754 -34.42487)
3,1,2014-06-03 11:23:57+09:00,3,-34.704323,20.210134,Oprah,Female,White Shark (Carcharodon carcharias),686.0,118.0,7 March 2012,2816.662,POINT (20.21013441406251 -34.70432271674724)
4,1,2014-05-29 04:53:57+09:00,3,-34.65556,19.37459,Oprah,Female,White Shark (Carcharodon carcharias),686.0,118.0,7 March 2012,2816.662,POINT (19.37459 -34.65555999999999)


In [59]:
gpd = geopandas.GeoDataFrame(df.query("name == 'Emma'"))
gpd.shape

(345, 13)

In [60]:
gpd.plot()

ModuleNotFoundError: No module named 'matplotlib'

In [None]:
df.head()

In [None]:
emma = df.query('name == "Emma"')
emma.shape

In [None]:
emma.head()

In [None]:
emma