# 4.1 Data Wrangling and Advanced Indexing

**Goal:** Build data wrangling skills to clean and navigate datasets.

**Outline:**
* NumPy array and DataFrame methods and functions
* Table indexing and slicing
* Boolean (logical) indexing
* Sorting
* Data cleaning and inspection

## Additional Required Reading: Functions
Data 8 textbook "Computational and Inferential Thinking: The Foundations of Data Science" by Ani Adhikari and John DeNero, [Chapter 8 Functions and Tables](https://www.inferentialthinking.com/chapters/08/Functions_and_Tables.html). This should overlap with your assigned reading for Data 8.

## NumPy Arrays and DataFrames

NumPy and pandas offer several types of data structures. The two main structures that we use are the NumPy array (`ndarray`) and the pandas `DataFrame`. An `ndarray` is a fast and flexible container for large datasets that allows you to perform operations on whole blocks of data at once. `ndarray`s are best suited for homogeneous (just one type) numerical data. `DataFrame`s are designed for tabular datasets and can handle heterogeneous data (multiple types: int, float, string, etc.).


In [1]:
import numpy as np
import pandas as pd
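
To illustrate the homogeneous versus heterogeneous distinction described above, here is a small sketch with made-up values. A NumPy array stores one data type for every element, so mixing numbers and text coerces everything to a string type, while a DataFrame keeps a separate type per column. The names `numeric`, `mixed`, and `df_demo` are illustrative only.

```python
import numpy as np
import pandas as pd

# A NumPy array holds a single dtype, so mixed input is coerced to one common type
numeric = np.array([1, 2, 3.5])         # ints and floats -> all float64
mixed = np.array([1, 2.5, 'basalt'])    # numbers and text -> all strings

print(numeric.dtype)      # float64
print(mixed.dtype.kind)   # 'U' means a Unicode string dtype

# A DataFrame stores each column with its own dtype
df_demo = pd.DataFrame({'sample_id': [1, 2, 3],
                        'density': [2.7, 3.0, 2.9],
                        'rock': ['granite', 'basalt', 'gneiss']})
print(df_demo.dtypes)
```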

### ndarray

Here is an example of an `ndarray` with random float data:

In [2]:
# Generate a random ndarray called arr_data
arr_data = np.random.randn(5,3)
arr_data

array([[ 0.35988552,  1.623691  ,  0.90489181],
       [ 1.72314484,  2.32527034,  0.77304889],
       [ 0.00665197,  0.39375186,  0.04524883],
       [ 1.13567391, -0.64897299,  0.80954756],
       [-0.77626325,  0.19721016, -1.41357468]])

The attribute `arrayName.shape` is useful for finding the number of rows and columns in an array. Because it is an attribute rather than a function, it is written without parentheses.

In [3]:
# use .shape to determine the shape of arr_data
arr_data.shape

(5, 3)

`arrayName.dtype` will display the data type of the data stored in the array.

In [4]:
# use .dtype to determine the type of arr_data
arr_data.dtype

dtype('float64')
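
The sketch below, on a small hypothetical array `demo`, shows `shape` and `dtype` alongside two related attributes, `ndim` and `size`. All four are read without parentheses. `astype` is a method that returns a converted copy.

```python
import numpy as np

demo = np.zeros((5, 3))   # hypothetical 5-row, 3-column array

n_rows, n_cols = demo.shape   # shape is a tuple, so it can be unpacked
print(n_rows, n_cols)         # 5 3
print(demo.ndim)              # 2 (number of axes)
print(demo.size)              # 15 (total number of elements)
print(demo.dtype)             # float64

# astype returns a converted copy; the original array is unchanged
demo_int = demo.astype(int)
print(demo_int.dtype.kind)    # 'i' for a signed integer type
```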

The functions `np.zeros` and `np.ones` are similar: they create arrays full of zeros and ones, respectively. The input for these functions is the shape of the array you want. This is an effective way to set up an array of placeholders that you can then fill in with a loop, or to make an array of a single value.

In [5]:
# Generate an ndarray of zeros with np.zeros
arr0 = np.zeros((4,4))
arr0

array([[0., 0., 0., 0.],
       [0., 0., 0., 0.],
       [0., 0., 0., 0.],
       [0., 0., 0., 0.]])

In [6]:
# Generate an ndarray of ones with np.ones
arr1 = np.ones((4,4))
arr1

array([[1., 1., 1., 1.],
       [1., 1., 1., 1.],
       [1., 1., 1., 1.],
       [1., 1., 1., 1.]])

In [7]:
# np.ones is handy for making an ndarray of any single value
arr5 = arr1 * 5
arr5

array([[5., 5., 5., 5.],
       [5., 5., 5., 5.],
       [5., 5., 5., 5.],
       [5., 5., 5., 5.]])
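
As a sketch of the placeholder pattern described above, the example below pre-allocates an array with `np.zeros` and fills it in a loop. It also shows `np.full`, a NumPy function that builds a constant-valued array directly. The variable names and the squared-value rule are illustrative.

```python
import numpy as np

# Pre-allocate placeholders, then fill them in one at a time
squares = np.zeros(5)
for i in range(5):
    squares[i] = i ** 2
print(squares)   # values 0, 1, 4, 9, 16 stored as floats

# np.full builds an array of any single value without the multiplication step
fives = np.full((4, 4), 5.0)
print(np.array_equal(fives, np.ones((4, 4)) * 5))   # True
```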

The functions `np.arange` and `np.linspace` make monotonic sequences of numbers. `np.arange` makes an array of integers or floats from a starting value up to, but not including, an ending value, in steps that you set as an input. `np.linspace` makes a set number of evenly spaced floats between a starting value and an ending value, and the ending value is included.

In [8]:
# Generate an array of integers between 0 and 10 in steps of 1, including 0 (start) but not 11 (end)
arr2 = np.arange(0,11,1) 
arr2

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10])

In [9]:
# Generate an array of floats between 0 and 10 in steps of 2, including 0 (start) but not 11 (end)
arr2 = np.arange(0.0,11.0,2.0) 
arr2

array([ 0.,  2.,  4.,  6.,  8., 10.])

In [10]:
# Generate an array of 14 evenly spaced numbers between 0 and 10, including 0 (start) and 10 (end).
arr3=np.linspace(0,10,14) 
arr3

array([ 0.        ,  0.76923077,  1.53846154,  2.30769231,  3.07692308,
        3.84615385,  4.61538462,  5.38461538,  6.15384615,  6.92307692,
        7.69230769,  8.46153846,  9.23076923, 10.        ])
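
The endpoint behavior is the main practical difference between the two functions. Here is a minimal sketch comparing them on the same interval. `retstep=True` asks `np.linspace` to also return the spacing it used.

```python
import numpy as np

# arange: you choose the step; the end value is excluded
a = np.arange(0, 10, 2)
print(a)            # [0 2 4 6 8]

# linspace: you choose the number of points; the end value is included
b, step = np.linspace(0, 10, 6, retstep=True)
print(b)            # [ 0.  2.  4.  6.  8. 10.]
print(step)         # 2.0

# With n points, linspace spacing is (stop - start) / (n - 1)
c, step14 = np.linspace(0, 10, 14, retstep=True)
print(np.isclose(step14, 10 / 13))   # True
```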

### DataFrames 

`Series` and `DataFrame`s are like NumPy arrays, but they have the added feature of index labels assigned to each row and column (the row and column labels shown in the `DataFrame` below). These labels can be used to bin and select data.

In [11]:
# generate a new DataFrame
# note that index values (like the column labels) don't have to be integers and don't have to be in order
frame = pd.DataFrame(np.random.rand(3, 3), index=['Nevada','Montana','Arizona'], columns=['sedimentary','igneous','metamorphic'])
frame

| | sedimentary | igneous | metamorphic |
|---|---|---|---|
| Nevada | 0.49679 | 0.233385 | 0.948418 |
| Montana | 0.698572 | 0.072442 | 0.054833 |
| Arizona | 0.537125 | 0.935727 | 0.753275 |
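
Here is a brief sketch of selecting by label, using a small frame with fixed, made-up values (the numbers are illustrative, not the random values shown above). The `.loc` indexer takes row labels and column labels.

```python
import pandas as pd

# Hypothetical values, arranged with the same row and column labels as above
rocks = pd.DataFrame([[0.5, 0.2, 0.9],
                      [0.7, 0.1, 0.1],
                      [0.5, 0.9, 0.8]],
                     index=['Nevada', 'Montana', 'Arizona'],
                     columns=['sedimentary', 'igneous', 'metamorphic'])

print(rocks.index.tolist())     # row labels
print(rocks.columns.tolist())   # column labels

# Select one value by row label and column label
print(rocks.loc['Montana', 'igneous'])   # 0.1

# Select several rows and columns by label
subset = rocks.loc[['Nevada', 'Arizona'], ['sedimentary', 'metamorphic']]
print(subset)
```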


We've seen `DataFrame` structures before in our tabular data files. The earthquake catalog we were dealing with last week was a .csv (comma-separated values) data file of all the earthquakes. We imported it as a `DataFrame` from the USGS API by setting up a query URL and using `pd.read_csv`. This time we'll look at the earthquakes of magnitude 2.5 and greater from a single week (2020-09-07 to 2020-09-14).

In [12]:
start_day = '2020-09-07'
end_day = '2020-09-14'
standard_url = 'https://earthquake.usgs.gov/fdsnws/event/1/query?format=csv&orderby=magnitude'

query_url = standard_url + '&starttime=' + start_day + '&endtime=' + end_day + '&minmagnitude=2.5'
EQ_data = pd.read_csv(query_url)
EQ_data.head()

Unnamed: 0,time,latitude,longitude,depth,mag,magType,nst,gap,dmin,rms,...,updated,place,type,horizontalError,depthError,magError,magNst,status,locationSource,magSource
0,2020-09-11T07:35:57.195Z,-21.3957,-69.9093,51.0,6.2,mww,,72.0,0.076,0.87,...,2020-10-06T15:10:12.394Z,"82 km NNE of Tocopilla, Chile",earthquake,6.3,1.9,0.071,19.0,reviewed,us,us
1,2020-09-12T02:44:10.969Z,38.7591,142.2473,32.09,6.1,mww,,47.0,2.223,0.95,...,2020-09-14T03:01:02.354Z,"57 km SE of Ōfunato, Japan",earthquake,7.3,3.5,0.057,30.0,reviewed,us,us
2,2020-09-07T06:12:39.710Z,-17.1086,168.4935,10.0,6.0,mww,,41.0,2.065,1.13,...,2020-10-04T13:09:52.650Z,"72 km NNE of Port-Vila, Vanuatu",earthquake,6.9,1.8,0.098,10.0,reviewed,us,us
3,2020-09-08T00:45:20.952Z,-4.9124,129.7627,174.05,5.9,mww,,12.0,3.173,0.78,...,2020-09-09T00:49:01.271Z,"197 km SSE of Amahai, Indonesia",earthquake,6.4,3.6,0.086,13.0,reviewed,us,us
4,2020-09-12T08:34:27.322Z,-17.2576,167.6786,10.0,5.9,mww,,64.0,1.857,0.7,...,2020-09-25T12:11:10.835Z,"85 km NW of Port-Vila, Vanuatu",earthquake,6.0,1.9,0.043,51.0,reviewed,us,us
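
String concatenation works for a short query, but it is easy to misplace an `&` or `=`. One alternative, sketched here, is to keep the parameters in a dictionary and let the standard library's `urllib.parse.urlencode` assemble the query string. The parameter names and values are the ones used above. `base_url` and `params` are names introduced for this example, and no request is sent.

```python
from urllib.parse import urlencode

base_url = 'https://earthquake.usgs.gov/fdsnws/event/1/query'
params = {
    'format': 'csv',
    'orderby': 'magnitude',
    'starttime': '2020-09-07',
    'endtime': '2020-09-14',
    'minmagnitude': 2.5,
}

# urlencode joins key=value pairs with '&' and escapes special characters
query_url = base_url + '?' + urlencode(params)
print(query_url)
```

The resulting string matches the concatenated URL built above and can be passed to `pd.read_csv` in the same way.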


We have already seen how to reference an individual column (called a `Series`) with `DataFrame['Column_Name']`.

In [13]:
EQ_data['depth']

0       51.00
1       32.09
2       10.00
3      174.05
4       10.00
        ...  
429      1.87
430     25.60
431      6.90
432     21.00
433     10.30
Name: depth, Length: 434, dtype: float64

The `.values` attribute returns the values of the `Series` as a NumPy array (`ndarray`), without the labeled index values of the `Series`.

In [14]:
print(type(EQ_data['depth']))

<class 'pandas.core.series.Series'>


In [15]:
EQ_data['depth'].values

array([5.1000e+01, 3.2090e+01, 1.0000e+01, 1.7405e+02, 1.0000e+01,
       1.8000e+01, 1.0000e+01, 1.0000e+01, 5.5657e+02, 3.5000e+01,
       6.1860e+01, 4.1170e+01, 1.0000e+01, 3.1524e+02, 1.3872e+02,
       1.0000e+01, 1.3604e+02, 1.0000e+01, 1.0000e+01, 5.7700e+01,
       1.0000e+01, 9.8250e+01, 1.3161e+02, 1.0000e+01, 1.0000e+01,
       1.0000e+01, 1.3131e+02, 1.4900e+01, 1.0000e+01, 7.3140e+01,
       1.0000e+01, 4.1180e+01, 8.7260e+01, 3.5000e+01, 1.0000e+01,
       1.0000e+01, 1.0000e+01, 1.0000e+01, 1.0000e+01, 6.0190e+01,
       1.3834e+02, 1.3670e+01, 4.1180e+01, 2.4900e+01, 1.0000e+01,
       9.1090e+01, 1.0000e+01, 1.0000e+01, 4.1160e+01, 4.6839e+02,
       1.0000e+01, 1.0000e+01, 1.1292e+02, 1.3864e+02, 1.0000e+01,
       1.0000e+01, 9.1600e+00, 1.0000e+01, 1.0000e+01, 1.0000e+01,
       6.7950e+01, 1.0000e+01, 1.0000e+01, 8.4110e+01, 1.0000e+01,
       1.0000e+01, 1.0000e+01, 1.5297e+02, 1.0000e+01, 1.0000e+01,
       1.0000e+01, 1.0000e+01, 1.0470e+01, 5.3357e+02, 1.0000e

In [16]:
type(EQ_data['depth'].values)

numpy.ndarray
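
The pandas documentation recommends the `.to_numpy()` method over `.values` for getting a NumPy array. A small sketch with a hypothetical `depth` Series shows that both return the same data without the index labels:

```python
import numpy as np
import pandas as pd

depth = pd.Series([51.0, 32.09, 10.0], index=['a', 'b', 'c'], name='depth')

as_values = depth.values       # attribute, no parentheses
as_numpy = depth.to_numpy()    # method, with parentheses

print(type(as_numpy))                        # <class 'numpy.ndarray'>
print(np.array_equal(as_values, as_numpy))   # True

# The array is indexed by position only; the labels 'a', 'b', 'c' are gone
print(as_numpy[0])   # 51.0
```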

## Indexing and Slicing

The arrays and DataFrames we use here have two axes of indices: rows and columns. Remember that Python indexing starts at zero, and the end bounds are generally exclusive.

![indices](./figures/indices.png)
> Source: Python for Data Analysis (2nd Edition) McKinney, W.

<br>

Using square brackets, we can select subsections of tables to work with:

![slicing](./figures/array_slicing.png)
> Source: Python for Data Analysis (2nd Edition) McKinney, W.

In [17]:
# generate a random array
arr_data = np.random.randn(10,5)
arr_data

array([[ 0.76823431,  1.81794074, -1.39667892, -0.25036848, -0.04705715],
       [ 1.04558631,  0.47485928,  0.66149437,  1.2167943 ,  1.16193451],
       [-2.37190187, -2.23530491,  1.77379018,  0.30269194,  1.77168498],
       [-0.17565099,  1.14573643, -0.98358351,  0.32682684, -0.38612276],
       [-0.64219194, -1.99047492, -1.05742762, -0.94308056, -0.04087562],
       [-0.59981829,  0.23629244,  0.38961944,  0.43837453,  0.88100058],
       [-0.08873591, -1.32929757, -0.63297316,  0.17815864,  2.71384836],
       [ 0.02942959, -1.09826491, -0.45508769,  0.48399063,  1.2632796 ],
       [-1.3883526 , -0.32795587, -0.04944992, -0.29276361,  0.02859621],
       [ 0.79424422, -0.79036533, -0.76636221, -0.74616752,  0.2708726 ]])

**slice out the first 3 rows of arr_data**

In [18]:
a = arr_data[:3]
a

array([[ 0.76823431,  1.81794074, -1.39667892, -0.25036848, -0.04705715],
       [ 1.04558631,  0.47485928,  0.66149437,  1.2167943 ,  1.16193451],
       [-2.37190187, -2.23530491,  1.77379018,  0.30269194,  1.77168498]])

**slice out the last 2 columns of arr_data**

In [19]:
b = arr_data[:,-2:]  
b

array([[-0.25036848, -0.04705715],
       [ 1.2167943 ,  1.16193451],
       [ 0.30269194,  1.77168498],
       [ 0.32682684, -0.38612276],
       [-0.94308056, -0.04087562],
       [ 0.43837453,  0.88100058],
       [ 0.17815864,  2.71384836],
       [ 0.48399063,  1.2632796 ],
       [-0.29276361,  0.02859621],
       [-0.74616752,  0.2708726 ]])

In [20]:
# Or this works too, because arr_data has 5 columns (columns 3 and 4 are the last two)
b = arr_data[:,3:] 
b

array([[-0.25036848, -0.04705715],
       [ 1.2167943 ,  1.16193451],
       [ 0.30269194,  1.77168498],
       [ 0.32682684, -0.38612276],
       [-0.94308056, -0.04087562],
       [ 0.43837453,  0.88100058],
       [ 0.17815864,  2.71384836],
       [ 0.48399063,  1.2632796 ],
       [-0.29276361,  0.02859621],
       [-0.74616752,  0.2708726 ]])
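
With random numbers it can be hard to see which elements a slice picked. The sketch below repeats the same slices on an array of consecutive integers (a made-up stand-in for `arr_data`) and adds a combined row-and-column slice.

```python
import numpy as np

# 10 rows x 5 columns holding 0..49, so each value reveals its position
grid = np.arange(50).reshape(10, 5)

print(grid[:3])         # first 3 rows (rows 0, 1, 2)
print(grid[:, -2:])     # last 2 columns (columns 3 and 4)
print(grid[2:5, 1:3])   # rows 2-4 and columns 1-2

# A single row and a single element
print(grid[0])          # [0 1 2 3 4]
print(grid[9, 4])       # 49
```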

Slicing a `DataFrame` is a bit different: rows are selected by integer position with `.iloc`, or by index label with `.loc`.

**slice out the first 10 rows of EQ_data**

In [21]:
EQ_data.iloc[:10]

Unnamed: 0,time,latitude,longitude,depth,mag,magType,nst,gap,dmin,rms,...,updated,place,type,horizontalError,depthError,magError,magNst,status,locationSource,magSource
0,2020-09-11T07:35:57.195Z,-21.3957,-69.9093,51.0,6.2,mww,,72.0,0.076,0.87,...,2020-10-06T15:10:12.394Z,"82 km NNE of Tocopilla, Chile",earthquake,6.3,1.9,0.071,19.0,reviewed,us,us
1,2020-09-12T02:44:10.969Z,38.7591,142.2473,32.09,6.1,mww,,47.0,2.223,0.95,...,2020-09-14T03:01:02.354Z,"57 km SE of Ōfunato, Japan",earthquake,7.3,3.5,0.057,30.0,reviewed,us,us
2,2020-09-07T06:12:39.710Z,-17.1086,168.4935,10.0,6.0,mww,,41.0,2.065,1.13,...,2020-10-04T13:09:52.650Z,"72 km NNE of Port-Vila, Vanuatu",earthquake,6.9,1.8,0.098,10.0,reviewed,us,us
3,2020-09-08T00:45:20.952Z,-4.9124,129.7627,174.05,5.9,mww,,12.0,3.173,0.78,...,2020-09-09T00:49:01.271Z,"197 km SSE of Amahai, Indonesia",earthquake,6.4,3.6,0.086,13.0,reviewed,us,us
4,2020-09-12T08:34:27.322Z,-17.2576,167.6786,10.0,5.9,mww,,64.0,1.857,0.7,...,2020-09-25T12:11:10.835Z,"85 km NW of Port-Vila, Vanuatu",earthquake,6.0,1.9,0.043,51.0,reviewed,us,us
5,2020-09-09T07:18:40.873Z,4.2064,126.6394,18.0,5.8,mww,,32.0,3.033,1.61,...,2020-09-11T18:42:17.040Z,"185 km SE of Sarangani, Philippines",earthquake,4.1,3.1,0.053,34.0,reviewed,us,us
6,2020-09-07T06:29:14.938Z,-17.1622,168.5076,10.0,5.7,mww,,41.0,2.116,0.83,...,2020-10-04T09:05:11.136Z,"66 km NNE of Port-Vila, Vanuatu",earthquake,7.0,1.8,0.05,38.0,reviewed,us,us
7,2020-09-07T17:40:44.171Z,-24.512,-111.9898,10.0,5.6,mww,,34.0,3.5,1.1,...,2020-09-11T16:20:37.144Z,Easter Island region,earthquake,10.2,1.8,0.066,22.0,reviewed,us,us
8,2020-09-12T02:37:29.712Z,-17.8809,-178.0547,556.57,5.6,mww,,36.0,3.553,1.1,...,2020-09-13T02:41:07.273Z,"279 km E of Levuka, Fiji",earthquake,8.9,6.4,0.089,12.0,reviewed,us,us
9,2020-09-09T03:41:18.641Z,4.1712,126.733,35.0,5.4,mww,,47.0,3.1,0.96,...,2020-09-10T03:44:21.006Z,"195 km SE of Sarangani, Philippines",earthquake,7.5,1.9,0.08,15.0,reviewed,us,us


**slice out a chunk of depths starting at index 5 and up to (but excluding) index 10**

In [22]:
EQ_data.iloc[5:10]['depth']

5     18.00
6     10.00
7     10.00
8    556.57
9     35.00
Name: depth, dtype: float64

Notice that this is still a `Series` with corresponding index values. If you just want the values from that chunk and not the index labels use `.values`.

In [23]:
EQ_data.iloc[5:10]['depth'].values

array([ 18.  ,  10.  ,  10.  , 556.57,  35.  ])
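
`.iloc` selects by integer position, while the related `.loc` indexer selects by label. The two can look identical when the row labels are 0, 1, 2, and so on, but they differ after a sort or filter, and `.loc` slices include the end label. Here is a sketch with a small hypothetical table:

```python
import pandas as pd

quakes = pd.DataFrame({'depth': [51.0, 32.09, 10.0, 174.05],
                       'mag': [6.2, 6.1, 6.0, 5.9]})

# Sorting rearranges the rows but each row keeps its original label
by_depth = quakes.sort_values(by='depth')
print(by_depth.index.tolist())        # [2, 1, 0, 3]

print(by_depth.iloc[0]['depth'])      # 10.0  (first row by position)
print(by_depth.loc[0, 'depth'])       # 51.0  (row whose label is 0)

# Position slices exclude the end; label slices include it
print(len(quakes.iloc[0:2]))          # 2
print(len(quakes.loc[0:2]))           # 3
```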

## Boolean Indexing

We can use Boolean (i.e. logical) indexing to select the rows of our DataFrame where a condition is `True`. Conditions are built with comparison operators (`<`, `>`, `<=`, `>=`, `==`, `!=`) and combined with logical operators (`&`, `|`, `~`).

**Use Boolean Indexing to filter out data so that we are only looking at rows with magnitudes larger than or equal to 6.0**

In [24]:
EQ_data[EQ_data['mag']>=6.0]

Unnamed: 0,time,latitude,longitude,depth,mag,magType,nst,gap,dmin,rms,...,updated,place,type,horizontalError,depthError,magError,magNst,status,locationSource,magSource
0,2020-09-11T07:35:57.195Z,-21.3957,-69.9093,51.0,6.2,mww,,72.0,0.076,0.87,...,2020-10-06T15:10:12.394Z,"82 km NNE of Tocopilla, Chile",earthquake,6.3,1.9,0.071,19.0,reviewed,us,us
1,2020-09-12T02:44:10.969Z,38.7591,142.2473,32.09,6.1,mww,,47.0,2.223,0.95,...,2020-09-14T03:01:02.354Z,"57 km SE of Ōfunato, Japan",earthquake,7.3,3.5,0.057,30.0,reviewed,us,us
2,2020-09-07T06:12:39.710Z,-17.1086,168.4935,10.0,6.0,mww,,41.0,2.065,1.13,...,2020-10-04T13:09:52.650Z,"72 km NNE of Port-Vila, Vanuatu",earthquake,6.9,1.8,0.098,10.0,reviewed,us,us
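
Conditions can be combined with `&` (and), `|` (or), and `~` (not). Each comparison needs its own parentheses because these operators bind more tightly than `<`, `>`, and `==`. Here is a sketch on a small hypothetical table:

```python
import pandas as pd

quakes = pd.DataFrame({'depth': [51.0, 32.09, 10.0, 556.57, 35.0],
                       'mag': [6.2, 6.1, 6.0, 5.6, 5.4]})

# The comparison itself produces a boolean Series, one value per row
is_large = quakes['mag'] >= 6.0
print(is_large.tolist())   # [True, True, True, False, False]

# and: large AND shallow (depth under 40)
large_shallow = quakes[(quakes['mag'] >= 6.0) & (quakes['depth'] < 40)]

# or: very deep OR large
deep_or_large = quakes[(quakes['depth'] > 500) | (quakes['mag'] >= 6.0)]

# not: everything that is not large
not_large = quakes[~is_large]

print(len(large_shallow), len(deep_or_large), len(not_large))   # 2 4 2
```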


## Sorting

DataFrames can be sorted by the values in a given column (`.sort_values`).

In [25]:
EQ_data.sort_values(by=['depth']).head()

Unnamed: 0,time,latitude,longitude,depth,mag,magType,nst,gap,dmin,rms,...,updated,place,type,horizontalError,depthError,magError,magNst,status,locationSource,magSource
347,2020-09-10T12:09:31.800Z,44.322333,-110.520833,0.54,2.8,md,10.0,118.0,0.05854,0.27,...,2020-09-10T20:52:41.360Z,"59 km SE of West Yellowstone, Montana",earthquake,0.82,0.4,0.2,4.0,reviewed,uu,uu
383,2020-09-10T18:33:51.740Z,44.333333,-110.5055,1.19,2.62,ml,17.0,125.0,0.07115,0.18,...,2020-09-10T19:12:59.040Z,"60 km SE of West Yellowstone, Montana",earthquake,0.57,0.5,0.264,8.0,reviewed,uu,uu
411,2020-09-10T11:40:22.540Z,44.320333,-110.497,1.2,2.56,ml,19.0,134.0,0.06052,0.21,...,2020-09-10T17:43:55.390Z,"61 km SE of West Yellowstone, Montana",earthquake,0.83,0.73,0.321,8.0,reviewed,uu,uu
349,2020-09-11T05:16:42.570Z,54.3193,-160.8752,1.4,2.8,ml,,209.0,0.907,0.6,...,2020-09-25T07:35:41.040Z,"115 km SSW of Sand Point, Alaska",earthquake,4.7,9.7,0.085,18.0,reviewed,us,us
264,2020-09-08T12:35:17.384Z,38.1497,-118.0383,1.8,3.5,ml,33.0,42.76,0.015,0.1369,...,2020-10-05T10:45:37.040Z,"27 km SSE of Mina, Nevada",earthquake,,0.4,0.36,18.0,reviewed,nn,nn


You can reverse the order of sorting with `ascending=False`.

In [26]:
EQ_data.sort_values(by=['depth'],ascending=False).head()

Unnamed: 0,time,latitude,longitude,depth,mag,magType,nst,gap,dmin,rms,...,updated,place,type,horizontalError,depthError,magError,magNst,status,locationSource,magSource
158,2020-09-07T22:22:20.306Z,-20.8541,-178.6937,605.2,4.3,mb,,115.0,4.359,0.82,...,2020-09-14T04:12:05.040Z,Fiji region,earthquake,14.9,8.6,0.116,21.0,reviewed,us,us
227,2020-09-13T03:50:49.167Z,-21.8264,-179.421,594.51,4.1,mb,,112.0,4.705,0.73,...,2020-09-18T01:50:13.040Z,Fiji region,earthquake,12.9,8.5,0.115,21.0,reviewed,us,us
228,2020-09-13T04:05:15.875Z,6.0099,123.734,556.61,4.1,mb,,146.0,5.276,0.6,...,2020-09-18T01:56:50.040Z,"54 km WSW of Palimbang, Philippines",earthquake,11.3,15.2,0.198,7.0,reviewed,us,us
8,2020-09-12T02:37:29.712Z,-17.8809,-178.0547,556.57,5.6,mww,,36.0,3.553,1.1,...,2020-09-13T02:41:07.273Z,"279 km E of Levuka, Fiji",earthquake,8.9,6.4,0.089,12.0,reviewed,us,us
225,2020-09-11T04:25:44.348Z,5.9897,123.8273,539.45,4.1,mb,,107.0,5.72,0.41,...,2020-09-16T23:11:57.040Z,"46 km WSW of Palimbang, Philippines",earthquake,11.5,11.9,0.126,17.0,reviewed,us,us
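
`.sort_values` returns a new DataFrame and leaves the original unchanged. It also accepts several columns, with a matching list for `ascending`. The sketch below uses a small hypothetical table, sorting by magnitude (largest first) and breaking ties by depth (shallowest first):

```python
import pandas as pd

quakes = pd.DataFrame({'depth': [10.0, 556.57, 35.0, 5.0],
                       'mag': [5.6, 5.6, 5.4, 6.0]})

ranked = quakes.sort_values(by=['mag', 'depth'], ascending=[False, True])
print(ranked.index.tolist())   # [3, 0, 1, 2]

# The original order is untouched
print(quakes.index.tolist())   # [0, 1, 2, 3]

# reset_index(drop=True) renumbers the sorted rows 0, 1, 2, ...
renumbered = ranked.reset_index(drop=True)
print(renumbered.index.tolist())   # [0, 1, 2, 3]
```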


## Data cleaning and inspection

`.drop()` can be used to drop whole columns from a DataFrame.

In [27]:
EQ_data_concise = EQ_data.drop(['magType','nst','gap','dmin','rms','net','id','updated','place','type','horizontalError','depthError','magError','magNst','status','locationSource','magSource',], axis='columns')
EQ_data_concise.head()

| | time | latitude | longitude | depth | mag |
|---|---|---|---|---|---|
| 0 | 2020-09-11T07:35:57.195Z | -21.3957 | -69.9093 | 51.0 | 6.2 |
| 1 | 2020-09-12T02:44:10.969Z | 38.7591 | 142.2473 | 32.09 | 6.1 |
| 2 | 2020-09-07T06:12:39.710Z | -17.1086 | 168.4935 | 10.0 | 6.0 |
| 3 | 2020-09-08T00:45:20.952Z | -4.9124 | 129.7627 | 174.05 | 5.9 |
| 4 | 2020-09-12T08:34:27.322Z | -17.2576 | 167.6786 | 10.0 | 5.9 |
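
`.drop` returns a new DataFrame and does not modify the original. When only a few columns are wanted, selecting them with a list of names can be shorter than dropping everything else. Here is a sketch on a hypothetical three-column table:

```python
import pandas as pd

quakes = pd.DataFrame({'depth': [51.0, 32.09],
                       'mag': [6.2, 6.1],
                       'status': ['reviewed', 'reviewed']})

# Drop by name; columns=... is equivalent to axis='columns'
dropped = quakes.drop(columns=['status'])

# Or keep only the columns you want
kept = quakes[['depth', 'mag']]

print(dropped.columns.tolist())   # ['depth', 'mag']
print(dropped.equals(kept))       # True
print(quakes.columns.tolist())    # original still has all three columns

# errors='ignore' skips names that are not present instead of raising KeyError
safe = quakes.drop(columns=['status', 'not_a_column'], errors='ignore')
print(safe.columns.tolist())      # ['depth', 'mag']
```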


`.unique()` returns the unique values from the specified object.

In [28]:
unique_mags = EQ_data_concise['mag'].unique()
unique_mags.sort()
unique_mags

array([2.5 , 2.51, 2.53, 2.54, 2.55, 2.56, 2.57, 2.58, 2.6 , 2.61, 2.62,
       2.63, 2.64, 2.65, 2.66, 2.68, 2.69, 2.7 , 2.74, 2.75, 2.76, 2.77,
       2.78, 2.79, 2.8 , 2.81, 2.82, 2.84, 2.86, 2.87, 2.9 , 2.91, 2.92,
       2.95, 2.96, 2.97, 2.98, 2.99, 3.  , 3.01, 3.02, 3.03, 3.06, 3.1 ,
       3.17, 3.18, 3.2 , 3.21, 3.23, 3.26, 3.3 , 3.32, 3.34, 3.42, 3.43,
       3.44, 3.46, 3.5 , 3.53, 3.59, 3.6 , 3.7 , 3.71, 3.72, 3.74, 3.75,
       3.83, 3.9 , 4.  , 4.1 , 4.2 , 4.22, 4.3 , 4.33, 4.36, 4.4 , 4.5 ,
       4.6 , 4.7 , 4.8 , 4.9 , 5.  , 5.1 , 5.2 , 5.3 , 5.4 , 5.6 , 5.7 ,
       5.8 , 5.9 , 6.  , 6.1 , 6.2 ])

`.value_counts()` returns the count of each unique value from the specified object. This functionality can be used to find duplicate values.

In [29]:
EQ_data_concise['mag'].value_counts()

4.40    33
4.30    31
4.60    28
4.50    27
4.20    27
        ..
2.77     1
6.20     1
5.80     1
4.33     1
6.00     1
Name: mag, Length: 93, dtype: int64
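
Here is a sketch of both methods on a short hypothetical Series. `.unique()` returns values in order of first appearance (which is why the cell above sorts them afterwards), and `.value_counts()` orders its result from most to least frequent. Filtering the counts is one way to list repeated values.

```python
import pandas as pd

mags = pd.Series([4.4, 2.5, 4.4, 6.2, 2.5, 4.4], name='mag')

print(mags.unique())          # [4.4 2.5 6.2], order of first appearance
print(mags.nunique())         # 3

counts = mags.value_counts()
print(counts.index[0], counts.iloc[0])   # 4.4 3

# Values that occur more than once
repeated = counts[counts > 1]
print(sorted(repeated.index.tolist()))   # [2.5, 4.4]
```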

### Finding missing data (NaNs)

NaN stands for "Not a Number" and is used as a placeholder in data tables where no value exists. `np.isnan` returns a boolean object with `True` wherever a NaN appears, shown here for the `nst` column of the DataFrame.

In [30]:
np.isnan(EQ_data['nst'])

0       True
1       True
2       True
3       True
4       True
       ...  
429    False
430     True
431    False
432     True
433     True
Name: nst, Length: 434, dtype: bool

In [31]:
~np.isnan(EQ_data['nst'])

0      False
1      False
2      False
3      False
4      False
       ...  
429     True
430    False
431     True
432    False
433    False
Name: nst, Length: 434, dtype: bool

You can use this boolean object to filter out rows that contain NaNs in that column.

In [32]:
EQ_data[~np.isnan(EQ_data['nst'])]

Unnamed: 0,time,latitude,longitude,depth,mag,magType,nst,gap,dmin,rms,...,updated,place,type,horizontalError,depthError,magError,magNst,status,locationSource,magSource
152,2020-09-12T23:22:04.540Z,19.304600,-64.389600,40.00,4.36,md,19.0,233.0,1.86560,0.4800,...,2020-09-25T03:01:38.040Z,"115 km NNE of Cruz Bay, U.S. Virgin Islands",earthquake,3.57,29.63,0.060000,8.0,reviewed,pr,pr
153,2020-09-13T07:12:58.980Z,19.396000,-64.284100,35.00,4.33,md,26.0,234.0,2.00520,0.2900,...,2020-09-13T15:15:36.449Z,"129 km NNE of Cruz Bay, U.S. Virgin Islands",earthquake,1.81,21.72,0.130000,16.0,reviewed,pr,pr
185,2020-09-11T07:55:45.450Z,36.440667,-117.994500,3.93,4.22,ml,45.0,66.0,0.06908,0.1800,...,2020-09-25T10:09:14.879Z,"18km SSE of Lone Pine, CA",earthquake,0.18,0.57,0.140000,320.0,reviewed,ci,ci
246,2020-09-08T01:19:54.510Z,19.121300,-64.398500,38.00,3.83,md,17.0,338.0,1.74990,0.4000,...,2020-09-09T00:54:19.040Z,"96 km NNE of Cruz Bay, U.S. Virgin Islands",earthquake,3.23,24.52,0.110000,14.0,reviewed,pr,pr
247,2020-09-08T06:35:40.270Z,19.316600,-64.570800,37.00,3.83,md,20.0,335.0,1.73570,0.4100,...,2020-09-08T08:09:44.040Z,"111 km NNE of Cruz Bay, U.S. Virgin Islands",earthquake,3.32,26.87,0.120000,10.0,reviewed,pr,pr
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
417,2020-09-08T11:03:53.600Z,18.027300,-66.828000,16.00,2.51,md,12.0,117.0,0.07310,0.1500,...,2020-09-08T11:14:28.314Z,"2 km ESE of Yauco, Puerto Rico",earthquake,0.58,0.73,0.150000,5.0,reviewed,pr,pr
418,2020-09-13T17:55:42.130Z,19.182333,-155.475833,32.80,2.51,md,59.0,82.0,,0.1200,...,2020-09-14T23:42:34.030Z,"2 km S of Pāhala, Hawaii",earthquake,0.44,0.60,0.152709,27.0,reviewed,hv,hv
419,2020-09-07T08:24:04.300Z,17.932000,-66.947100,13.00,2.50,md,16.0,221.0,0.08040,0.1300,...,2020-09-07T09:46:21.114Z,"6 km SW of Guánica, Puerto Rico",earthquake,0.79,0.30,0.120000,11.0,reviewed,pr,pr
429,2020-09-11T12:23:38.700Z,44.315167,-110.494500,1.87,2.50,ml,15.0,144.0,0.05648,0.1800,...,2020-09-11T14:39:34.760Z,"61 km SE of West Yellowstone, Montana",earthquake,0.50,2.06,0.377000,8.0,reviewed,uu,uu
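
`np.isnan` only works on numeric data. pandas also provides `.isna()` and `.notna()`, which work on any column type, and `.dropna(subset=...)`, which removes rows with missing values in the named columns. Here is a sketch on a small hypothetical table:

```python
import numpy as np
import pandas as pd

quakes = pd.DataFrame({'mag': [6.2, 4.36, 2.5],
                       'nst': [np.nan, 19.0, 15.0],
                       'place': ['Chile', None, 'Montana']})

# NaN is not equal to anything, including itself, so == cannot find it
print(np.nan == np.nan)                 # False

print(quakes['nst'].isna().tolist())    # [True, False, False]
print(quakes['place'].isna().tolist())  # [False, True, False]  (non-numeric column)

# Count missing values in every column
print(quakes.isna().sum().tolist())     # [0, 1, 1]

# Keep only rows where 'nst' is present
has_nst = quakes.dropna(subset=['nst'])
print(has_nst.index.tolist())           # [1, 2]

# Same result as boolean indexing with notna()
print(has_nst.equals(quakes[quakes['nst'].notna()]))   # True
```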


## Further Reading (Optional)

This user guide has lots of useful examples and documentation:
https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html