Hands-On Data Analysis with Pandas, Stefanie Molin
# Chapter 2 Exercises
5/16/21  
Data: data/parsed.csv

These exercises involve exploring some historic earthquake data sourced from the US Geological Survey (USGS).

In [1]:
import numpy as np
import pandas as pd

## Initial look at the data

In [14]:
df = pd.read_csv('data/parsed.csv')

In [15]:
df.head(3)

Unnamed: 0,alert,cdi,code,detail,dmin,felt,gap,ids,mag,magType,...,status,time,title,tsunami,type,types,tz,updated,url,parsed_place
0,,,37389218,https://earthquake.usgs.gov/fdsnws/event/1/que...,0.008693,,85.0,",ci37389218,",1.35,ml,...,automatic,1539475168010,"M 1.4 - 9km NE of Aguanga, CA",0,earthquake,",geoserve,nearby-cities,origin,phase-data,",-480.0,1539475395144,https://earthquake.usgs.gov/earthquakes/eventp...,California
1,,,37389202,https://earthquake.usgs.gov/fdsnws/event/1/que...,0.02003,,79.0,",ci37389202,",1.29,ml,...,automatic,1539475129610,"M 1.3 - 9km NE of Aguanga, CA",0,earthquake,",geoserve,nearby-cities,origin,phase-data,",-480.0,1539475253925,https://earthquake.usgs.gov/earthquakes/eventp...,California
2,,4.4,37389194,https://earthquake.usgs.gov/fdsnws/event/1/que...,0.02137,28.0,21.0,",ci37389194,",3.42,ml,...,automatic,1539475062610,"M 3.4 - 8km NE of Aguanga, CA",0,earthquake,",dyfi,focal-mechanism,geoserve,nearby-cities,o...",-480.0,1539536756176,https://earthquake.usgs.gov/earthquakes/eventp...,California


In [16]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 9332 entries, 0 to 9331
Data columns (total 27 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   alert         59 non-null     object 
 1   cdi           329 non-null    float64
 2   code          9332 non-null   object 
 3   detail        9332 non-null   object 
 4   dmin          6139 non-null   float64
 5   felt          329 non-null    float64
 6   gap           6164 non-null   float64
 7   ids           9332 non-null   object 
 8   mag           9331 non-null   float64
 9   magType       9331 non-null   object 
 10  mmi           93 non-null     float64
 11  net           9332 non-null   object 
 12  nst           5364 non-null   float64
 13  place         9332 non-null   object 
 14  rms           9332 non-null   float64
 15  sig           9332 non-null   int64  
 16  sources       9332 non-null   object 
 17  status        9332 non-null   object 
 18  time          9332 non-null 

In [17]:
df.magType.nunique()

10

In [18]:
df.magType.value_counts()

ml       6803
md       1796
mb        601
mww        68
mb_lg      30
mwr        14
mh         12
mw          4
mwb         2
ms_20       1
Name: magType, dtype: int64

### Questions

#### 1. Find the 95th percentile of earthquake magnitude in Japan using the mb magnitude type.

In [109]:
mb_df = df[(df['magType'] == 'mb') & (df['parsed_place']=='Japan')]

In [110]:
mb_df.magType.value_counts() # sanity check: only 'mb' rows remain (50 of the 601 'mb' rows in the full df are in Japan)

mb    50
Name: magType, dtype: int64

In [111]:
mb_df.head(3)

Unnamed: 0,alert,cdi,code,detail,dmin,felt,gap,ids,mag,magType,...,time,title,tsunami,type,types,tz,updated,url,parsed_place,in_ROF
67,,,1000hbqa,https://earthquake.usgs.gov/fdsnws/event/1/que...,1.415,,72.0,",us1000hbqa,",4.6,mb,...,1539448501800,"M 4.6 - 160km NNW of Nago, Japan",0,earthquake,",geoserve,origin,phase-data,",480.0,1539449501040,https://earthquake.usgs.gov/earthquakes/eventp...,Japan,True
713,,,1000hah8,https://earthquake.usgs.gov/fdsnws/event/1/que...,1.141,,82.0,",us1000hah8,",4.7,mb,...,1539238726290,"M 4.7 - 139km WSW of Naze, Japan",0,earthquake,",geoserve,origin,phase-data,",540.0,1539240344040,https://earthquake.usgs.gov/earthquakes/eventp...,Japan,True
1124,,,1000h9la,https://earthquake.usgs.gov/fdsnws/event/1/que...,1.737,,135.0,",us1000h9la,",4.6,mb,...,1539115120470,"M 4.6 - 53km ESE of Kamaishi, Japan",0,earthquake,",geoserve,origin,phase-data,",540.0,1539119067040,https://earthquake.usgs.gov/earthquakes/eventp...,Japan,True


In [112]:
mb_df.mag.quantile(.95)

4.9

In [46]:
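# alternatively (note: the output below reports count 601, the 'mb' total for the full dataset,
# so it appears to come from a run made before the Japan filter was added to mb_df)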
mb_df.mag.describe(percentiles=[.95])

count    601.000000
mean       4.563894
std        0.354134
min        3.600000
50%        4.500000
95%        5.200000
max        6.100000
Name: mag, dtype: float64

#### The 95th percentile of mb-type earthquake magnitudes in Japan is 4.9.
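
The two-condition filter and the percentile lookup can be checked on a small made-up frame. This is only a sketch: the `toy` data below is hypothetical and not taken from `data/parsed.csv`. It assumes the default linear interpolation of `Series.quantile`, and shows that `describe(percentiles=[.95])` reports the same value when it is run on the same filtered rows.

```python
import pandas as pd

# Hypothetical toy data (not from the USGS file), just to exercise the logic
toy = pd.DataFrame({
    'mag': [4.0, 4.2, 4.4, 4.6, 4.8, 5.5, 6.0],
    'magType': ['mb', 'mb', 'mb', 'mb', 'mb', 'ml', 'mb'],
    'parsed_place': ['Japan'] * 6 + ['Chile'],
})

# Both conditions must hold, so combine the boolean masks with &
mask = (toy['magType'] == 'mb') & (toy['parsed_place'] == 'Japan')
japan_mb_mag = toy.loc[mask, 'mag']

# 95th percentile with pandas' default linear interpolation
p95 = japan_mb_mag.quantile(0.95)

# describe() gives the same figure under the '95%' label
p95_from_describe = japan_mb_mag.describe(percentiles=[0.95])['95%']

print(len(japan_mb_mag), round(p95, 2), round(p95_from_describe, 2))
```

Comparing `len(japan_mb_mag)` against the `count` row of `describe()` is a quick way to confirm both calls ran on the same subset.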

---

#### 2. Find the percentage of earthquakes in Indonesia that were coupled with tsunamis.

1. Isolate the rows that report Indonesia

In [52]:
pd.options.display.max_rows=110
df.parsed_place.value_counts().sort_index()

Afghanistan                                    6
Alaska                                      3665
Argentina                                     11
Arizona                                        2
Arkansas                                       3
Ascension Island                               2
Australia                                      1
Azerbaijan                                     2
Balleny Islands                                1
Barbuda                                        1
Bolivia                                        4
British Virgin Islands                        21
Burma                                          2
California                                  2861
Canada                                        55
Carlsberg Ridge                                2
Central East Pacific Rise                      1
Central Mid-Atlantic Ridge                     1
Chile                                         31
China                                          8
Christmas Island    

In the full output, Indonesia appears as its own parsed_place value, with 147 records.

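2. Take the mean of the tsunami column. Because the column is a 0/1 flag, its mean is the fraction of earthquakes flagged with a tsunami.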

In [97]:
round(df[df['parsed_place']=='Indonesia'].tsunami.mean(), 4)

0.2313

#### 23.13% of the recorded earthquakes in Indonesia were coupled with a tsunami.
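
To see why the mean of a 0/1 column is a percentage, here is a minimal sketch on hypothetical data (the `toy` frame is made up for illustration). It compares the mean with the explicit "flagged rows divided by all rows" calculation.

```python
import pandas as pd

# Hypothetical toy data: 4 Indonesian quakes, 1 flagged with a tsunami
toy = pd.DataFrame({
    'parsed_place': ['Indonesia'] * 4 + ['Japan'] * 2,
    'tsunami': [1, 0, 0, 0, 1, 1],
})

indonesia = toy[toy['parsed_place'] == 'Indonesia']

# Mean of a 0/1 flag == (number of 1s) / (number of rows)
share = indonesia['tsunami'].mean()
manual = indonesia['tsunami'].sum() / len(indonesia)

# Format as a percentage for reporting
pct = f"{share:.2%}"
print(share, manual, pct)
```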

---

#### 3. Calculate summary stats for earthquakes in Nevada

In [61]:
df[df['parsed_place']=="Nevada"].describe()

Unnamed: 0,cdi,dmin,felt,gap,mag,mmi,nst,rms,sig,time,tsunami,tz,updated
count,15.0,681.0,15.0,681.0,681.0,1.0,681.0,681.0,681.0,681.0,681.0,681.0,681.0
mean,2.44,0.166199,2.4,153.66812,0.500073,2.84,12.618209,0.151986,10.970631,1538314000000.0,0.0,-480.0,1538402000000.0
std,0.501142,0.166228,4.626013,68.735302,0.69671,,9.866963,0.084662,19.60715,596563700.0,0.0,0.0,601095100.0
min,2.0,0.001,1.0,29.14,-0.5,2.84,3.0,0.0005,0.0,1537247000000.0,0.0,-480.0,1537307000000.0
25%,2.0,0.053,1.0,97.38,-0.1,2.84,6.0,0.1069,0.0,1537854000000.0,0.0,-480.0,1537928000000.0
50%,2.2,0.112,1.0,149.14,0.4,2.84,10.0,0.1463,2.0,1538280000000.0,0.0,-480.0,1538428000000.0
75%,2.9,0.233,1.0,199.72,0.9,2.84,16.0,0.1871,12.0,1538821000000.0,0.0,-480.0,1538878000000.0
max,3.3,1.414,19.0,355.91,2.9,2.84,61.0,0.8634,129.0,1539461000000.0,0.0,-480.0,1539483000000.0


---

#### 4. Add a column indicating whether the quake happened in a country or US state that is on the Ring of Fire. 
Use Alaska, Antarctica (look for Antarctic), Bolivia,
California, Canada, Chile, Costa Rica, Ecuador, Fiji, Guatemala, Indonesia, Japan,
Kermadec Islands, Mexico (be careful not to select New Mexico), New Zealand,
Peru, Philippines, Russia, Taiwan, Tonga, and Washington.

In [83]:
# manipulate the text of the prompt to make a boolean mask
# (named rof_text rather than str, so the built-in str type is not shadowed):
rof_text = 'Alaska, Pacific-Antarctic Ridge, Western Indian-Antarctic Ridge, Bolivia, California, Canada, Chile, Costa Rica, Ecuador, Fiji, Guatemala, Indonesia, Japan, Kermadec Islands, Mexico, New Zealand, Peru, Philippines, Russia, Taiwan, Tonga, Washington'

In [103]:
# split on commas and strip the surrounding whitespace from each name
rof = [s.strip() for s in rof_text.split(',')]
rof

['Alaska',
 'Pacific-Antarctic Ridge',
 'Western Indian-Antarctic Ridge',
 'Bolivia',
 'California',
 'Canada',
 'Chile',
 'Costa Rica',
 'Ecuador',
 'Fiji',
 'Guatemala',
 'Indonesia',
 'Japan',
 'Kermadec Islands',
 'Mexico',
 'New Zealand',
 'Peru',
 'Philippines',
 'Russia',
 'Taiwan',
 'Tonga',
 'Washington']

In [104]:
# Sanity check: every name in rof should appear, and 'New Mexico' should not
df[df['parsed_place'].isin(rof)].parsed_place.value_counts()

Alaska                            3665
California                        2861
Washington                         157
Indonesia                          147
Fiji                                61
Japan                               57
Canada                              55
Mexico                              38
Russia                              33
Chile                               31
Philippines                         21
Tonga                               17
Peru                                14
New Zealand                         13
Bolivia                              4
Pacific-Antarctic Ridge              3
Ecuador                              3
Taiwan                               2
Guatemala                            2
Kermadec Islands                     2
Costa Rica                           1
Western Indian-Antarctic Ridge       1
Name: parsed_place, dtype: int64

In [105]:
# Finally, add a new column:
df['in_ROF'] = df['parsed_place'].isin(rof)

In [106]:
df.head(3)

Unnamed: 0,alert,cdi,code,detail,dmin,felt,gap,ids,mag,magType,...,time,title,tsunami,type,types,tz,updated,url,parsed_place,in_ROF
0,,,37389218,https://earthquake.usgs.gov/fdsnws/event/1/que...,0.008693,,85.0,",ci37389218,",1.35,ml,...,1539475168010,"M 1.4 - 9km NE of Aguanga, CA",0,earthquake,",geoserve,nearby-cities,origin,phase-data,",-480.0,1539475395144,https://earthquake.usgs.gov/earthquakes/eventp...,California,True
1,,,37389202,https://earthquake.usgs.gov/fdsnws/event/1/que...,0.02003,,79.0,",ci37389202,",1.29,ml,...,1539475129610,"M 1.3 - 9km NE of Aguanga, CA",0,earthquake,",geoserve,nearby-cities,origin,phase-data,",-480.0,1539475253925,https://earthquake.usgs.gov/earthquakes/eventp...,California,True
2,,4.4,37389194,https://earthquake.usgs.gov/fdsnws/event/1/que...,0.02137,28.0,21.0,",ci37389194,",3.42,ml,...,1539475062610,"M 3.4 - 8km NE of Aguanga, CA",0,earthquake,",dyfi,focal-mechanism,geoserve,nearby-cities,o...",-480.0,1539536756176,https://earthquake.usgs.gov/earthquakes/eventp...,California,True
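
The prompt's hints ("look for Antarctic", "be careful not to select New Mexico") suggest pattern matching as an alternative to listing exact names for `isin`. The sketch below is one possible approach, not necessarily the book's: it uses `Series.str.contains` with a regular expression on a hypothetical `places` series, and anchors `Mexico` to the start of the string so that `New Mexico` is excluded. The shortened `pattern` is an assumption for illustration; a full version would include every place in the prompt.

```python
import pandas as pd

# Hypothetical sample of parsed_place values
places = pd.Series([
    'Alaska', 'New Mexico', 'Mexico',
    'Pacific-Antarctic Ridge', 'Nevada', 'New Zealand',
])

# 'Antarctic' matches any ridge name containing it;
# '^Mexico' only matches when the value starts with 'Mexico'
pattern = r'Alaska|Antarctic|^Mexico|New Zealand'

# na=False treats missing place names as "not in the Ring of Fire"
in_rof = places.str.contains(pattern, regex=True, na=False)
print(in_rof.tolist())
```

The exact-match `isin` approach is stricter and easier to audit with `value_counts()`, while the regex version does not require knowing the exact ridge names in advance.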


---

#### 5. Calculate the number of earthquakes in the Ring of Fire locations and the number outside of them.

In [107]:
df.in_ROF.value_counts()

True     7188
False    2144
Name: in_ROF, dtype: int64

#### The data records 7,188 earthquakes within the Ring of Fire, and 2,144 outside of it. 
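
As a cross-check, the two counts add up to 9,332, which matches the number of entries reported by `df.info()`. If proportions are wanted instead of counts, `value_counts(normalize=True)` returns them directly. A minimal sketch on a hypothetical boolean series:

```python
import pandas as pd

# Hypothetical flags standing in for the in_ROF column
flags = pd.Series([True, True, True, False], name='in_ROF')

counts = flags.value_counts()                # absolute counts per value
shares = flags.value_counts(normalize=True)  # fractions that sum to 1

# The counts should always add up to the number of rows
total_matches = counts.sum() == len(flags)
print(counts.to_dict(), shares.to_dict(), total_matches)
```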

---

#### 6. Find the tsunami count along the Ring of Fire.

In [108]:
df[df['in_ROF']].tsunami.sum()

45

#### The data shows 45 tsunamis along the Ring of Fire. 
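
The same total can be obtained together with the count outside the Ring of Fire by grouping on the flag. This is a sketch on hypothetical data (the `toy` frame is made up), showing that the filtered sum and the grouped sum agree.

```python
import pandas as pd

# Hypothetical toy data with a Ring of Fire flag and a 0/1 tsunami flag
toy = pd.DataFrame({
    'in_ROF': [True, True, False, True, False],
    'tsunami': [1, 0, 1, 1, 0],
})

# A boolean column can be used directly as a mask
rof_tsunamis = toy.loc[toy['in_ROF'], 'tsunami'].sum()

# groupby gives the tsunami count for both groups in one call
by_region = toy.groupby('in_ROF')['tsunami'].sum()
print(int(rof_tsunamis), by_region.to_dict())
```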