# 4.1 Data Wrangling and Advanced Indexing

**Goal:** Build data wrangling skills to clean and navigate datasets.

**Outline:**
* Numpy Array and DataFrame methods and functions
* Table indexing and slicing
* Boolean (logical) indexing
* Sorting
* Data cleaning and inspection

## Additional Required Reading: Functions
Data 8 textbook "Computational and Inferential Thinking: The Foundations of Data Science" By Ani Adhikari and John DeNero [Chapter 8 Functions and Tables](https://www.inferentialthinking.com/chapters/08/Functions_and_Tables.html). This should overlap with your assigned reading for Data 8.

## Numpy Arrays and DataFrames

NumPy and pandas offer several types of data structures; the two main structures that we use are the NumPy `ndarray` and the pandas `DataFrame`. An `ndarray` is a fast and flexible container for large datasets that allows you to perform operations on whole blocks of data at once. `ndarray`s are best suited for homogeneous (just one type) numerical data. `DataFrame`s are designed for tabular datasets and can handle heterogeneous data (multiple types: int, float, string, etc.).


In [1]:
import numpy as np
import pandas as pd

### ndarray

Here is an example of an `ndarray` with random float data:

In [2]:
# Generate a random ndarray called arr_data
arr_data = np.random.randn(5,3)
arr_data

array([[-0.82903944, -0.1376641 ,  0.02674124],
       [ 0.20267632, -0.29280752, -0.75898888],
       [-0.10239423, -0.28657326,  1.78059789],
       [-0.912293  ,  0.35167243,  0.3479555 ],
       [ 0.2724738 ,  0.87051685,  0.18873814]])

The attribute `arrayName.shape` is useful for finding the number of rows and columns in an array. Because it is an attribute rather than a function, it is written without parentheses.

In [3]:
# use .shape to determine the shape of arr_data
arr_data.shape

(5, 3)

`arrayName.dtype` will display the data type of the data stored in the array.

In [4]:
# use .dtype to determine the type of arr_data
arr_data.dtype

dtype('float64')
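
As a small illustration of `.shape` and `.dtype`, the sketch below uses a hand-written array (the names `demo` and `mixed` are made up for this example) so the results are reproducible. It also shows what "homogeneous" means in practice: when an integer and a float are mixed, NumPy stores both as floats.

```python
import numpy as np

# A small hand-written array (hypothetical data) so the results are reproducible
demo = np.array([[1, 2, 3],
                 [4, 5, 6]])

# .shape is a tuple of (rows, columns) that can be unpacked
n_rows, n_cols = demo.shape
print(n_rows, n_cols)   # 2 3

# All elements share one dtype; here an integer type (int64 on most platforms)
print(demo.dtype)

# Mixing an int and a float upcasts every element to float64
mixed = np.array([1, 2.5])
print(mixed.dtype)      # float64
```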

The functions `np.zeros` and `np.ones` are similar: they create arrays full of zeros and ones, respectively. The input for these functions is the shape of the array you want. This is an effective way to set up an array of placeholders that you can then fill in with a loop, or to make an array of a single value.

In [5]:
# Generate an ndarray of zeros with np.zeros
arr0 = np.zeros((4,4))
arr0

array([[0., 0., 0., 0.],
       [0., 0., 0., 0.],
       [0., 0., 0., 0.],
       [0., 0., 0., 0.]])

In [6]:
# Generate an ndarray of ones with np.ones
arr1 = np.ones((4,4))
arr1

array([[1., 1., 1., 1.],
       [1., 1., 1., 1.],
       [1., 1., 1., 1.],
       [1., 1., 1., 1.]])

In [7]:
# np.ones is handy for making an ndarray of any single value
arr5 = arr1 * 5
arr5

array([[5., 5., 5., 5.],
       [5., 5., 5., 5.],
       [5., 5., 5., 5.],
       [5., 5., 5., 5.]])
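
The two uses described above (a single-value array and a placeholder array filled by a loop) can be sketched as follows. `np.full` is an alternative to multiplying `np.ones` by a constant; the variable names here are made up for the example.

```python
import numpy as np

# np.full builds an array of a single value directly
arr5_full = np.full((4, 4), 5.0)

# It matches the result of multiplying an array of ones by 5
arr5_ones = np.ones((4, 4)) * 5
print(np.array_equal(arr5_full, arr5_ones))  # True

# Use np.zeros as a placeholder array, then fill it in with a loop
squares = np.zeros(5)
for i in range(5):
    squares[i] = i ** 2
print(squares)  # [ 0.  1.  4.  9. 16.]
```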

The functions `np.arange` and `np.linspace` create evenly spaced, monotonic sequences of numbers, which makes them very useful. `np.arange` makes an array of integers or floats from a starting value up to (but not including) an ending value, in steps of the size you set. `np.linspace` makes a set number of evenly spaced floats between the starting and ending values you set, and the ending value is included.

In [8]:
# Generate an array of integers between 0 and 10 in steps of 1, including 0 (start) but not 11 (end)
arr2 = np.arange(0,11,1) 
arr2

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10])

In [9]:
# Generate an array of floats between 0 and 10 in steps of 2, including 0 (start) but not 11 (end)
arr2 = np.arange(0.0,11.0,2.0) 
arr2

array([ 0.,  2.,  4.,  6.,  8., 10.])

In [10]:
# Generate an array of 14 evenly spaced numbers between 0 and 10, including 0 (start) and 10 (end).
arr3=np.linspace(0,10,14) 
arr3

array([ 0.        ,  0.76923077,  1.53846154,  2.30769231,  3.07692308,
        3.84615385,  4.61538462,  5.38461538,  6.15384615,  6.92307692,
        7.69230769,  8.46153846,  9.23076923, 10.        ])
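
To make the endpoint difference concrete, here is a short side-by-side sketch with illustrative variable names. With `np.arange` you choose the step and the end is excluded; with `np.linspace` you choose the number of points and the end is included. The optional `retstep=True` argument asks `np.linspace` to also return the spacing it used.

```python
import numpy as np

# np.arange: you choose the step; the end value (10) is excluded
stepped = np.arange(0, 10, 2)
print(stepped)   # [0 2 4 6 8]

# np.linspace: you choose the number of points; the end value (10) is included
spaced = np.linspace(0, 10, 6)
print(spaced)    # [ 0.  2.  4.  6.  8. 10.]

# retstep=True also returns the spacing between points
points, step = np.linspace(0, 10, 6, retstep=True)
print(step)      # 2.0
```

When the step is not an integer, `np.linspace` is usually the safer choice, since rounding can make the length of an `np.arange` result hard to predict.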

### DataFrames 

`Series` and `DataFrame`s are like `ndarray`s, but they have the added feature of index labels assigned to each row and column (the bold labels in the `DataFrame` below). These labels can be used to group and select data.

In [11]:
# generate a new DataFrame
# note that index values (like the column labels) don't have to be integers and don't have to be in order
frame = pd.DataFrame(np.random.rand(3, 3), index=['Nevada','Montana','Arizona'], columns=['sedimentary','igneous','metamorphic'])
frame

Unnamed: 0,sedimentary,igneous,metamorphic
Nevada,0.849493,0.146249,0.328204
Montana,0.74236,0.379321,0.726974
Arizona,0.947756,0.38733,0.796592
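
As a sketch of how the labels can be used for selection, the example below builds a frame with the same labels but fixed, made-up numbers (the name `rocks` is hypothetical) and selects by label with `.loc`.

```python
import pandas as pd

# Same labels as the frame above, but with fixed (made-up) values
rocks = pd.DataFrame(
    [[1, 2, 3], [4, 5, 6], [7, 8, 9]],
    index=['Nevada', 'Montana', 'Arizona'],
    columns=['sedimentary', 'igneous', 'metamorphic'],
)

# Select a single value by row label and column label
print(rocks.loc['Montana', 'igneous'])   # 5

# Select a whole row by its label (returns a Series)
arizona = rocks.loc['Arizona']

# Select a whole column by its label (returns a Series)
igneous = rocks['igneous']
```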


We've seen `DataFrame` structures before in our tabular data files. The earthquake catalog we were dealing with last week was a .csv (comma-separated values) data file of all the earthquakes. We imported it as a DataFrame from the USGS API by setting up a query URL and using `pd.read_csv`. This time we'll look at the earthquakes of magnitude 2.5 and greater from the week of 2020-09-07 to 2020-09-14.

In [12]:
start_day = '2020-09-07'
end_day = '2020-09-14'
standard_url = 'https://earthquake.usgs.gov/fdsnws/event/1/query?format=csv&orderby=magnitude'

query_url = standard_url + '&starttime=' + start_day + '&endtime=' + end_day + '&minmagnitude=2.5'
EQ_data = pd.read_csv(query_url)
EQ_data.head()

Unnamed: 0,time,latitude,longitude,depth,mag,magType,nst,gap,dmin,rms,...,updated,place,type,horizontalError,depthError,magError,magNst,status,locationSource,magSource
0,2020-09-07T06:12:39.688Z,-17.1102,168.5034,10.0,6.2,mww,,41.0,2.072,1.03,...,2020-11-29T09:47:05.444Z,"72 km NNE of Port-Vila, Vanuatu",earthquake,6.9,1.8,0.058,29.0,reviewed,us,us
1,2020-09-11T07:35:57.187Z,-21.3968,-69.9096,51.0,6.2,mww,,72.0,0.077,0.88,...,2020-11-21T20:10:36.040Z,"82 km NNE of Tocopilla, Chile",earthquake,6.3,1.9,0.071,19.0,reviewed,us,us
2,2020-09-12T02:44:11.224Z,38.7482,142.2446,34.0,6.1,mww,,47.0,2.232,0.97,...,2020-11-21T20:10:37.040Z,"58 km SE of Ōfunato, Japan",earthquake,5.7,1.7,0.057,30.0,reviewed,us,us
3,2020-09-08T00:45:20.853Z,-4.8713,129.7548,172.0,5.9,mww,,13.0,3.154,0.78,...,2020-11-17T19:44:53.040Z,"193 km SSE of Amahai, Indonesia",earthquake,5.3,1.8,0.098,10.0,reviewed,us,us
4,2020-09-12T08:34:27.321Z,-17.2562,167.6792,10.0,5.9,mww,,64.0,1.856,0.69,...,2020-11-21T20:10:38.040Z,"85 km NW of Port-Vila, Vanuatu",earthquake,6.8,1.9,0.043,51.0,reviewed,us,us


We have already referenced individual columns (which are called `Series`) with `DataFrame['Column_Name']`.

In [13]:
EQ_data['depth']

0       10.00
1       51.00
2       34.00
3      172.00
4       10.00
        ...  
446     36.40
447      1.87
448     25.60
449      6.90
450     24.04
Name: depth, Length: 451, dtype: float64

The `.values` attribute can be used to return the values of the `Series` as an `ndarray`, that is, without the labeled index values of the `Series`.

In [14]:
print(type(EQ_data['depth']))

<class 'pandas.core.series.Series'>


In [15]:
EQ_data['depth'].values

array([1.0000e+01, 5.1000e+01, 3.4000e+01, 1.7200e+02, 1.0000e+01,
       1.0000e+01, 1.7000e+01, 1.0000e+01, 5.5966e+02, 2.5000e+01,
       1.0000e+01, 3.0920e+01, 3.1061e+02, 1.3893e+02, 5.8030e+01,
       1.3380e+02, 1.0000e+01, 1.0000e+01, 5.7700e+01, 1.0000e+01,
       1.3161e+02, 1.0000e+01, 1.0000e+01, 3.8760e+01, 3.5000e+01,
       1.0000e+01, 1.2154e+02, 1.4900e+01, 1.0000e+01, 1.0125e+02,
       7.3140e+01, 1.0000e+01, 3.5000e+01, 1.0000e+01, 1.0000e+01,
       1.0000e+01, 1.0000e+01, 1.0000e+01, 6.0190e+01, 1.0956e+02,
       1.3670e+01, 4.1180e+01, 2.5640e+01, 1.0000e+01, 9.1090e+01,
       1.0000e+01, 1.0000e+01, 4.1160e+01, 4.6839e+02, 1.0000e+01,
       1.0000e+01, 1.1292e+02, 1.0000e+01, 1.0000e+01, 9.9500e+00,
       1.0000e+01, 1.0000e+01, 6.9820e+01, 1.0000e+01, 1.0000e+01,
       1.0000e+01, 1.5297e+02, 1.0000e+01, 1.0000e+01, 1.0000e+01,
       1.0000e+01, 1.0470e+01, 5.3357e+02, 1.0000e+01, 1.0000e+01,
       1.0000e+01, 1.0000e+01, 6.3290e+01, 1.0000e+01, 1.0000e

In [16]:
type(EQ_data['depth'].values)

numpy.ndarray
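
The pandas documentation recommends the `.to_numpy()` method as the preferred way to do the same conversion. A minimal sketch on a small hand-made `Series` (made-up depths, not the USGS catalog):

```python
import numpy as np
import pandas as pd

# A small hand-made Series standing in for EQ_data['depth']
depth = pd.Series([10.0, 51.0, 34.0], name='depth')

as_values = depth.values      # attribute, no parentheses
as_numpy = depth.to_numpy()   # method, needs parentheses

print(type(as_numpy))                       # <class 'numpy.ndarray'>
print(np.array_equal(as_values, as_numpy))  # True
```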

## Indexing and Slicing

Arrays and DataFrames have two axes of indices: rows and columns. Remember that Python indexing starts at zero, and the end bounds are generally exclusive.

![indices](./figures/indices.png)
> Source: Python for Data Analysis (2nd Edition) McKinney, W.

<br>

Using square brackets, we can select subsections of tables to work with:

![slicing](./figures/array_slicing.png)
> Source: Python for Data Analysis (2nd Edition) McKinney, W.

In [17]:
# generate a random array
arr_data = np.random.randn(10,5)
arr_data

array([[-0.65625165,  2.38599685,  0.04937532, -0.76770087,  0.15411844],
       [ 2.85665275, -1.0261144 ,  1.09814647,  0.11767113, -1.9833632 ],
       [ 0.18385078,  1.29815227, -0.18665609,  2.39560559, -1.11347955],
       [ 0.68654461, -0.78666554, -0.37731806, -0.08812208, -1.17251183],
       [-1.59634893,  0.88657028, -0.17185229, -0.53773366,  0.74341805],
       [-0.07283298, -0.51451318, -0.59995961, -2.60777899,  1.27196394],
       [ 1.36863487,  2.29502926, -0.40297136,  1.5526885 , -2.74818569],
       [-0.35840188,  0.44278548, -1.53940587,  1.36011313,  0.99834029],
       [ 0.39874941, -2.81366946,  0.58187133,  0.49839446,  1.00206399],
       [ 0.01078945, -0.45778293,  0.40168924,  1.35650175,  0.44681168]])

**slice out the first 3 rows of arr_data**

In [18]:
a = arr_data[:3]
a

array([[-0.65625165,  2.38599685,  0.04937532, -0.76770087,  0.15411844],
       [ 2.85665275, -1.0261144 ,  1.09814647,  0.11767113, -1.9833632 ],
       [ 0.18385078,  1.29815227, -0.18665609,  2.39560559, -1.11347955]])

**slice out the last 2 columns of arr_data**

In [19]:
b = arr_data[:,-2:]  
b

array([[-0.76770087,  0.15411844],
       [ 0.11767113, -1.9833632 ],
       [ 2.39560559, -1.11347955],
       [-0.08812208, -1.17251183],
       [-0.53773366,  0.74341805],
       [-2.60777899,  1.27196394],
       [ 1.5526885 , -2.74818569],
       [ 1.36011313,  0.99834029],
       [ 0.49839446,  1.00206399],
       [ 1.35650175,  0.44681168]])

In [20]:
#Or this works too
b = arr_data[:,3:] 
b

array([[-0.76770087,  0.15411844],
       [ 0.11767113, -1.9833632 ],
       [ 2.39560559, -1.11347955],
       [-0.08812208, -1.17251183],
       [-0.53773366,  0.74341805],
       [-2.60777899,  1.27196394],
       [ 1.5526885 , -2.74818569],
       [ 1.36011313,  0.99834029],
       [ 0.49839446,  1.00206399],
       [ 1.35650175,  0.44681168]])
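
Row and column slices can also be combined in one pair of brackets. The sketch below uses a small array of consecutive integers (the name `grid` is made up for this example) so it is easy to check which elements each slice picks out.

```python
import numpy as np

# 4 rows x 5 columns holding the integers 0..19
grid = np.arange(20).reshape(4, 5)

# Rows 1 and 2 (end bound 3 is excluded), columns 2 through the end
block = grid[1:3, 2:]
print(block)
# [[ 7  8  9]
#  [12 13 14]]

# A single element: first row, last column
print(grid[0, -1])    # 4

# Every other row of the first column
print(grid[::2, 0])   # [ 0 10]
```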

Slicing a `DataFrame` is a bit different because rows have index labels as well as integer positions. To slice rows by integer position, use `.iloc`.

**slice out the first 10 rows of EQ_data**

In [21]:
EQ_data.iloc[:10]

Unnamed: 0,time,latitude,longitude,depth,mag,magType,nst,gap,dmin,rms,...,updated,place,type,horizontalError,depthError,magError,magNst,status,locationSource,magSource
0,2020-09-07T06:12:39.688Z,-17.1102,168.5034,10.0,6.2,mww,,41.0,2.072,1.03,...,2020-11-29T09:47:05.444Z,"72 km NNE of Port-Vila, Vanuatu",earthquake,6.9,1.8,0.058,29.0,reviewed,us,us
1,2020-09-11T07:35:57.187Z,-21.3968,-69.9096,51.0,6.2,mww,,72.0,0.077,0.88,...,2020-11-21T20:10:36.040Z,"82 km NNE of Tocopilla, Chile",earthquake,6.3,1.9,0.071,19.0,reviewed,us,us
2,2020-09-12T02:44:11.224Z,38.7482,142.2446,34.0,6.1,mww,,47.0,2.232,0.97,...,2020-11-21T20:10:37.040Z,"58 km SE of Ōfunato, Japan",earthquake,5.7,1.7,0.057,30.0,reviewed,us,us
3,2020-09-08T00:45:20.853Z,-4.8713,129.7548,172.0,5.9,mww,,13.0,3.154,0.78,...,2020-11-17T19:44:53.040Z,"193 km SSE of Amahai, Indonesia",earthquake,5.3,1.8,0.098,10.0,reviewed,us,us
4,2020-09-12T08:34:27.321Z,-17.2562,167.6792,10.0,5.9,mww,,64.0,1.856,0.69,...,2020-11-21T20:10:38.040Z,"85 km NW of Port-Vila, Vanuatu",earthquake,6.8,1.9,0.043,51.0,reviewed,us,us
5,2020-09-07T06:29:14.938Z,-17.1622,168.5076,10.0,5.7,mww,,41.0,2.116,0.83,...,2020-11-17T19:44:50.040Z,"66 km NNE of Port-Vila, Vanuatu",earthquake,7.0,1.8,0.05,38.0,reviewed,us,us
6,2020-09-09T07:18:40.291Z,4.1773,126.6447,17.0,5.7,mww,,32.0,3.062,1.17,...,2020-11-21T20:10:33.040Z,"188 km SE of Sarangani, Philippines",earthquake,6.8,1.7,0.089,12.0,reviewed,us,us
7,2020-09-07T17:40:44.173Z,-24.5115,-111.9893,10.0,5.6,mww,,34.0,3.501,1.1,...,2020-11-17T19:44:52.040Z,Easter Island region,earthquake,10.1,1.8,0.066,22.0,reviewed,us,us
8,2020-09-12T02:37:29.903Z,-17.8804,-178.0054,559.66,5.6,mww,,36.0,3.554,1.06,...,2020-11-21T20:10:37.040Z,"284 km E of Levuka, Fiji",earthquake,9.1,4.9,0.093,11.0,reviewed,us,us
9,2020-09-08T08:28:53.492Z,-15.1737,-172.9796,25.0,5.4,mww,,54.0,1.71,0.93,...,2020-11-17T19:44:55.040Z,"123 km NE of Hihifo, Tonga",earthquake,7.2,1.8,0.093,11.0,reviewed,us,us


**slice out a chunk of depths starting at index 5 and up to (but excluding) index 10**

In [22]:
EQ_data.iloc[5:10]['depth']

5     10.00
6     17.00
7     10.00
8    559.66
9     25.00
Name: depth, dtype: float64

Notice that this is still a `Series` with corresponding index values. If you just want the values from that chunk and not the index labels, use `.values`.

In [23]:
EQ_data.iloc[5:10]['depth'].values

array([ 10.  ,  17.  ,  10.  , 559.66,  25.  ])
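
Positions and labels only coincide while the rows are in their original order. The sketch below, on a small made-up table (`quakes` is hypothetical, not the USGS catalog), contrasts position-based `.iloc` with label-based `.loc` after the rows have been reordered, and shows that `.loc` slices include their end label.

```python
import pandas as pd

# Small made-up table standing in for the earthquake catalog
quakes = pd.DataFrame({
    'mag':   [6.2, 4.1, 5.9, 2.5],
    'depth': [10.0, 594.51, 172.0, 1.87],
})

# After sorting, the index labels are no longer in order: 3, 0, 2, 1
by_depth = quakes.sort_values(by='depth')

# .iloc selects by position: the first row of the sorted table
print(by_depth.iloc[0]['depth'])   # 1.87

# .loc selects by label: the row whose index label is 0
print(by_depth.loc[0, 'depth'])    # 10.0

# .iloc slices exclude the end position; .loc slices include the end label
print(len(quakes.iloc[0:2]))       # 2
print(len(quakes.loc[0:2]))        # 3
```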

## Boolean Indexing

We can use Boolean (i.e. logical) indexing to select the rows of our DataFrame where a condition is `True`. Conditions are built with comparison operators (`<`, `>`, `<=`, `>=`, `==`, `!=`) and combined element-wise with `&` (and), `|` (or), and `~` (not).

**Use Boolean Indexing to filter out data so that we are only looking at rows with magnitudes larger than or equal to 6.0**

In [24]:
EQ_data[EQ_data['mag']>=6.0]

Unnamed: 0,time,latitude,longitude,depth,mag,magType,nst,gap,dmin,rms,...,updated,place,type,horizontalError,depthError,magError,magNst,status,locationSource,magSource
0,2020-09-07T06:12:39.688Z,-17.1102,168.5034,10.0,6.2,mww,,41.0,2.072,1.03,...,2020-11-29T09:47:05.444Z,"72 km NNE of Port-Vila, Vanuatu",earthquake,6.9,1.8,0.058,29.0,reviewed,us,us
1,2020-09-11T07:35:57.187Z,-21.3968,-69.9096,51.0,6.2,mww,,72.0,0.077,0.88,...,2020-11-21T20:10:36.040Z,"82 km NNE of Tocopilla, Chile",earthquake,6.3,1.9,0.071,19.0,reviewed,us,us
2,2020-09-12T02:44:11.224Z,38.7482,142.2446,34.0,6.1,mww,,47.0,2.232,0.97,...,2020-11-21T20:10:37.040Z,"58 km SE of Ōfunato, Japan",earthquake,5.7,1.7,0.057,30.0,reviewed,us,us
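
Conditions can be combined with `&`, `|`, and `~`. A minimal sketch on a small made-up table (hypothetical values and thresholds) follows. Note that each comparison needs its own parentheses when combined on one line, because `&` and `|` bind more tightly than the comparison operators.

```python
import pandas as pd

# Small made-up table standing in for the earthquake catalog
quakes = pd.DataFrame({
    'mag':   [6.2, 4.1, 5.9, 2.5, 5.6],
    'depth': [10.0, 594.51, 172.0, 1.87, 559.66],
})

# Each comparison produces a Boolean Series (one True/False per row)
big = quakes['mag'] >= 5.5
deep = quakes['depth'] > 300

# & keeps rows where both conditions are True
big_and_deep = quakes[big & deep]

# | keeps rows where at least one condition is True;
# the parentheses around each comparison are required here
big_or_deep = quakes[(quakes['mag'] >= 5.5) | (quakes['depth'] > 300)]

# ~ flips True and False
not_big = quakes[~big]

print(big_and_deep)
```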


## Sorting

DataFrames can be sorted by the values in a given column (`.sort_values`).

In [25]:
EQ_data.sort_values(by=['depth']).head()

Unnamed: 0,time,latitude,longitude,depth,mag,magType,nst,gap,dmin,rms,...,updated,place,type,horizontalError,depthError,magError,magNst,status,locationSource,magSource
298,2020-09-11T19:15:33.180Z,43.5288,-105.3322,0.0,3.2,ml,,68.0,1.109,0.75,...,2020-11-21T20:10:49.040Z,"27 km SSE of Wright, Wyoming",mining explosion,2.5,1.8,0.053,46.0,reviewed,us,us
347,2020-09-12T16:00:45.986Z,43.9949,-105.3898,0.0,2.9,ml,,100.0,0.984,0.34,...,2020-11-21T20:10:49.040Z,"26 km SSE of Antelope Valley-Crestview, Wyoming",mining explosion,6.7,1.8,0.097,14.0,reviewed,us,us
365,2020-09-10T12:09:31.800Z,44.322333,-110.520833,0.54,2.8,md,10.0,118.0,0.05854,0.27,...,2020-11-21T20:10:35.040Z,"59 km SE of West Yellowstone, Montana",earthquake,0.82,0.4,0.2,4.0,reviewed,uu,uu
402,2020-09-10T18:33:51.740Z,44.333333,-110.5055,1.19,2.62,ml,17.0,125.0,0.07115,0.18,...,2020-11-21T20:10:35.040Z,"60 km SE of West Yellowstone, Montana",earthquake,0.57,0.5,0.264,8.0,reviewed,uu,uu
429,2020-09-10T11:40:22.540Z,44.320333,-110.497,1.2,2.56,ml,19.0,134.0,0.06052,0.21,...,2020-11-21T20:10:34.040Z,"61 km SE of West Yellowstone, Montana",earthquake,0.83,0.73,0.321,8.0,reviewed,uu,uu


You can reverse the order of sorting with `ascending=False`.

In [26]:
EQ_data.sort_values(by=['depth'],ascending=False).head()

Unnamed: 0,time,latitude,longitude,depth,mag,magType,nst,gap,dmin,rms,...,updated,place,type,horizontalError,depthError,magError,magNst,status,locationSource,magSource
168,2020-09-07T22:22:20.306Z,-20.8541,-178.6937,605.2,4.3,mb,,115.0,4.359,0.82,...,2020-11-17T19:44:53.040Z,Fiji region,earthquake,14.9,8.6,0.116,21.0,reviewed,us,us
241,2020-09-13T03:50:49.167Z,-21.8264,-179.421,594.51,4.1,mb,,112.0,4.705,0.73,...,2020-11-21T20:10:46.040Z,Fiji region,earthquake,12.9,8.5,0.115,21.0,reviewed,us,us
8,2020-09-12T02:37:29.903Z,-17.8804,-178.0054,559.66,5.6,mww,,36.0,3.554,1.06,...,2020-11-21T20:10:37.040Z,"284 km E of Levuka, Fiji",earthquake,9.1,4.9,0.093,11.0,reviewed,us,us
191,2020-09-13T03:28:35.972Z,-24.1523,-179.9576,550.77,4.3,mb,,143.0,17.743,0.28,...,2020-11-21T20:10:46.040Z,south of the Fiji Islands,earthquake,19.1,13.7,0.147,14.0,reviewed,us,us
242,2020-09-13T04:05:14.608Z,6.0551,123.8095,539.77,4.1,mb,,76.0,2.026,0.63,...,2020-11-21T20:10:46.040Z,"45 km WSW of Palimbang, Philippines",earthquake,15.9,14.2,0.198,7.0,reviewed,us,us
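
`sort_values` also accepts several columns, with a separate sort direction for each. The sketch below uses a small made-up table: it sorts by magnitude (largest first) and breaks ties by depth (shallowest first). Because sorting keeps the original index labels, `reset_index(drop=True)` is shown as one way to renumber the rows afterwards.

```python
import pandas as pd

# Small made-up table with a tie in magnitude (two 4.3 events)
quakes = pd.DataFrame({
    'mag':   [4.3, 4.1, 4.3, 5.6],
    'depth': [605.2, 594.51, 550.77, 559.66],
})

# Sort by magnitude (descending), then by depth (ascending) to break ties
ranked = quakes.sort_values(by=['mag', 'depth'], ascending=[False, True])
print(ranked)

# The original index labels travel with the rows; renumber them from 0
ranked_reset = ranked.reset_index(drop=True)
print(ranked_reset)
```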


## Data cleaning and inspection

`.drop()` can be used to drop whole columns from a DataFrame.

In [27]:
EQ_data_concise = EQ_data.drop(['magType','nst','gap','dmin','rms','net','id','updated','place','type','horizontalError','depthError','magError','magNst','status','locationSource','magSource',], axis='columns')
EQ_data_concise.head()

Unnamed: 0,time,latitude,longitude,depth,mag
0,2020-09-07T06:12:39.688Z,-17.1102,168.5034,10.0,6.2
1,2020-09-11T07:35:57.187Z,-21.3968,-69.9096,51.0,6.2
2,2020-09-12T02:44:11.224Z,38.7482,142.2446,34.0,6.1
3,2020-09-08T00:45:20.853Z,-4.8713,129.7548,172.0,5.9
4,2020-09-12T08:34:27.321Z,-17.2562,167.6792,10.0,5.9
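
When only a few columns are needed, listing the columns to keep can be shorter than listing the ones to drop. The sketch below compares the two approaches on a small made-up table (a hypothetical subset of the catalog's columns); both return a new DataFrame and leave the original unchanged.

```python
import pandas as pd

# Small made-up table with a few of the catalog's column names
quakes = pd.DataFrame({
    'time':     ['2020-09-07T06:12:39.688Z', '2020-09-11T07:35:57.187Z'],
    'latitude': [-17.1102, -21.3968],
    'mag':      [6.2, 6.2],
    'magType':  ['mww', 'mww'],
    'status':   ['reviewed', 'reviewed'],
})

# Option 1: name the columns to remove
dropped = quakes.drop(['magType', 'status'], axis='columns')

# Option 2: name the columns to keep (note the double square brackets)
kept = quakes[['time', 'latitude', 'mag']]

print(dropped.equals(kept))     # True
print(quakes.shape)             # (2, 5): the original is unchanged
```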


`.unique()` returns the unique values from the specified object.

In [28]:
unique_mags = EQ_data_concise['mag'].unique()
unique_mags.sort()
unique_mags

array([2.5 , 2.51, 2.53, 2.54, 2.55, 2.56, 2.57, 2.58, 2.6 , 2.61, 2.62,
       2.63, 2.64, 2.65, 2.66, 2.68, 2.69, 2.7 , 2.74, 2.75, 2.76, 2.77,
       2.78, 2.79, 2.8 , 2.81, 2.82, 2.84, 2.86, 2.9 , 2.91, 2.92, 2.95,
       2.96, 2.97, 2.98, 2.99, 3.  , 3.01, 3.02, 3.03, 3.06, 3.1 , 3.17,
       3.18, 3.2 , 3.21, 3.23, 3.26, 3.3 , 3.32, 3.34, 3.42, 3.43, 3.44,
       3.46, 3.5 , 3.53, 3.59, 3.6 , 3.7 , 3.71, 3.72, 3.74, 3.75, 3.8 ,
       3.83, 3.9 , 4.  , 4.1 , 4.2 , 4.22, 4.3 , 4.33, 4.36, 4.4 , 4.5 ,
       4.6 , 4.7 , 4.8 , 4.9 , 5.  , 5.1 , 5.2 , 5.4 , 5.6 , 5.7 , 5.9 ,
       6.1 , 6.2 ])

`.value_counts()` returns the count of each unique value from the specified object. This functionality can be used to find duplicate values.

In [29]:
EQ_data_concise['mag'].value_counts()

4.40    38
4.50    35
4.20    32
4.30    31
4.60    31
        ..
2.69     1
3.53     1
3.72     1
3.21     1
3.44     1
Name: mag, Length: 90, dtype: int64
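
One way to use these counts to find duplicates is sketched below on a short made-up `Series`: filter the counts to values that appear more than once, or use `.duplicated()` to flag each repeated occurrence directly.

```python
import pandas as pd

# Short made-up Series standing in for the 'mag' column
mags = pd.Series([4.4, 4.5, 4.4, 2.69, 4.5, 4.4], name='mag')

# Keep only the values that appear more than once
counts = mags.value_counts()
repeated = counts[counts > 1]
print(repeated)

# .duplicated() marks the second and later occurrences of each value
dup_mask = mags.duplicated()
print(dup_mask.tolist())   # [False, False, True, False, True, True]
```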

### Finding missing data (NaNs)

NaN stands for "not a number" and is used as a placeholder in data tables where no value exists. `np.isnan` returns a Boolean object with `True` wherever a NaN appears in the column it is given.

In [30]:
np.isnan(EQ_data['nst'])

0       True
1       True
2       True
3       True
4       True
       ...  
446     True
447    False
448     True
449    False
450     True
Name: nst, Length: 451, dtype: bool

In [31]:
~np.isnan(EQ_data['nst'])

0      False
1      False
2      False
3      False
4      False
       ...  
446    False
447     True
448    False
449     True
450    False
Name: nst, Length: 451, dtype: bool

You can use this Boolean object to filter out rows that contain NaNs.

In [32]:
EQ_data[~np.isnan(EQ_data['nst'])]

Unnamed: 0,time,latitude,longitude,depth,mag,magType,nst,gap,dmin,rms,...,updated,place,type,horizontalError,depthError,magError,magNst,status,locationSource,magSource
162,2020-09-12T23:22:04.540Z,19.304600,-64.389600,40.00,4.36,md,19.0,233.0,1.86560,0.4800,...,2020-11-21T20:10:38.040Z,"115 km NNE of Cruz Bay, U.S. Virgin Islands",earthquake,3.57,29.63,0.060000,8.0,reviewed,pr,pr
163,2020-09-13T07:12:58.980Z,19.396000,-64.284100,35.00,4.33,md,26.0,234.0,2.00520,0.2900,...,2020-11-21T20:10:39.040Z,"129 km NNE of Cruz Bay, U.S. Virgin Islands",earthquake,1.81,21.72,0.130000,16.0,reviewed,pr,pr
195,2020-09-11T07:55:45.450Z,36.440667,-117.994500,3.93,4.22,ml,45.0,66.0,0.06908,0.1800,...,2020-11-21T20:10:36.040Z,"18km SSE of Lone Pine, CA",earthquake,0.18,0.57,0.140000,320.0,reviewed,ci,ci
260,2020-09-08T01:19:54.510Z,19.121300,-64.398500,38.00,3.83,md,17.0,338.0,1.74990,0.4000,...,2020-11-17T19:44:53.040Z,"96 km NNE of Cruz Bay, U.S. Virgin Islands",earthquake,3.23,24.52,0.110000,14.0,reviewed,pr,pr
261,2020-09-08T06:35:40.270Z,19.316600,-64.570800,37.00,3.83,md,20.0,335.0,1.73570,0.4100,...,2020-11-17T19:44:54.040Z,"111 km NNE of Cruz Bay, U.S. Virgin Islands",earthquake,3.32,26.87,0.120000,10.0,reviewed,pr,pr
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
435,2020-09-08T11:03:53.600Z,18.027300,-66.828000,16.00,2.51,md,12.0,117.0,0.07310,0.1500,...,2020-09-08T11:14:28.314Z,"2 km ESE of Yauco, Puerto Rico",earthquake,0.58,0.73,0.150000,5.0,reviewed,pr,pr
436,2020-09-13T17:55:42.130Z,19.182333,-155.475833,32.80,2.51,md,59.0,82.0,,0.1200,...,2020-11-21T20:10:40.040Z,"2 km S of Pāhala, Hawaii",earthquake,0.44,0.60,0.152709,27.0,reviewed,hv,hv
437,2020-09-07T08:24:04.300Z,17.932000,-66.947100,13.00,2.50,md,16.0,221.0,0.08040,0.1300,...,2020-09-07T09:46:21.114Z,"6 km SW of Guánica, Puerto Rico",earthquake,0.79,0.30,0.120000,11.0,reviewed,pr,pr
447,2020-09-11T12:23:38.700Z,44.315167,-110.494500,1.87,2.50,ml,15.0,144.0,0.05648,0.1800,...,2020-11-21T20:10:36.040Z,"61 km SE of West Yellowstone, Montana",earthquake,0.50,2.06,0.377000,8.0,reviewed,uu,uu
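
pandas has its own missing-data helpers that do the same job. The sketch below uses a small made-up table: `.notna()` gives the same filter as `~np.isnan(...)`, `.isna().sum()` counts the missing values in each column, and `.dropna()` removes incomplete rows. Unlike `np.isnan`, which expects numeric data, the pandas helpers also work on text columns such as `place`.

```python
import numpy as np
import pandas as pd

# Small made-up table with missing values in a numeric and a text column
quakes = pd.DataFrame({
    'mag':   [6.2, 4.36, 2.5],
    'nst':   [np.nan, 19.0, 15.0],
    'place': ['Vanuatu', None, 'Montana'],
})

# .notna() gives the same filter as ~np.isnan(...) on a numeric column
has_nst = quakes[quakes['nst'].notna()]
same = quakes[~np.isnan(quakes['nst'])]
print(has_nst.equals(same))   # True

# Count the missing values in each column
print(quakes.isna().sum())

# Drop rows with a missing value in any column, or only in chosen columns
complete = quakes.dropna()
with_nst = quakes.dropna(subset=['nst'])
```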


## Further Reading (Optional)

This user guide has lots of useful examples and documentation:
https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html