# DATA CLEANING

## Loading the dataset

First, we will import the required libraries.

In [1]:
# import pandas
import pandas as pd

Now, we will load the quakes dataset into the notebook

In [2]:
# load the dataset
df = pd.read_csv('quakes.csv')

Let us see the first few rows in the dataset

In [3]:
df.head()

Unnamed: 0,time,latitude,longitude,depth,mag,magType,nst,gap,dmin,rms,...,updated,place,type,horizontalError,depthError,magError,magNst,status,locationSource,magSource
0,2023-11-15T09:05:07.304Z,61.5808,-149.847,32.8,1.7,ml,,,,0.22,...,2023-11-15T09:06:28.283Z,"5 km SSW of Houston, Alaska",earthquake,,0.2,,,automatic,ak,ak
1,2023-11-15T08:53:06.688Z,61.0794,-147.883,14.8,1.0,ml,,,,0.8,...,2023-11-15T08:54:38.102Z,"55 km NE of Whittier, Alaska",earthquake,,0.3,,,automatic,ak,ak
2,2023-11-15T08:41:52.480Z,19.380667,-155.285339,0.32,1.73,md,15.0,153.0,,0.2,...,2023-11-15T08:56:22.252Z,"8 km SW of Volcano, Hawaii",earthquake,0.33,0.38,0.59,15.0,automatic,hv,hv
3,2023-11-15T07:44:53.035Z,61.6382,-149.7828,32.9,1.9,ml,,,,0.31,...,2023-11-15T07:46:10.981Z,,earthquake,,0.2,,,automatic,ak,ak
4,2023-11-15T07:19:44.540Z,18.972166,-155.45166,34.759998,1.87,md,37.0,236.0,,0.12,...,2023-11-15T07:22:58.830Z,"17 km SE of Naalehu, Hawaii",earthquake,0.71,0.89,0.88,5.0,automatic,hv,hv


Now, we will access the shape of the dataset

In [4]:
# check the shape of the dataset
df.shape

(19244, 22)

**We have 19,244 rows and 22 columns in the dataset**

Let's examine the data types of the columns in the dataset.

In [5]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 19244 entries, 0 to 19243
Data columns (total 22 columns):
 #   Column           Non-Null Count  Dtype  
---  ------           --------------  -----  
 0   time             19244 non-null  object 
 1   latitude         19244 non-null  float64
 2   longitude        19244 non-null  float64
 3   depth            19244 non-null  float64
 4   mag              19244 non-null  float64
 5   magType          19244 non-null  object 
 6   nst              12628 non-null  float64
 7   gap              12626 non-null  float64
 8   dmin             10842 non-null  float64
 9   rms              19243 non-null  float64
 10  net              19244 non-null  object 
 11  id               19244 non-null  object 
 12  updated          19244 non-null  object 
 13  place            18272 non-null  object 
 14  type             19244 non-null  object 
 15  horizontalError  11691 non-null  float64
 16  depthError       19243 non-null  float64
 17  magError    

The non-null counts show that several columns contain null values, and the 'time' and 'updated' columns are stored as generic objects rather than as datetimes.
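One compact way to quantify both observations is to tabulate each column's dtype next to its null count and null percentage. The following is a minimal sketch on a made-up three-row frame (`sample` is hypothetical and only mimics a few columns of the quakes data):

```python
import numpy as np
import pandas as pd

# Hypothetical three-row sample that mimics a few columns of the quakes data
sample = pd.DataFrame({
    "time": ["2023-11-15T09:05:07.304Z", "2023-11-15T08:53:06.688Z", "2023-11-15T08:41:52.480Z"],
    "mag": [1.7, 1.0, 1.73],
    "nst": [np.nan, np.nan, 15.0],
})

# One row per column: its dtype, the number of missing values, and the missing share in percent
summary = pd.DataFrame({
    "dtype": sample.dtypes.astype(str),
    "nulls": sample.isnull().sum(),
    "null_pct": (sample.isnull().mean() * 100).round(1),
})
print(summary)
```

Applied to `df`, the same three expressions would give one summary row per column of the real dataset.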

## Data Cleaning

### Step 1: Converting the 'time' and 'updated' Columns to Datetime

In the initial data cleaning step, we will convert the "time" and "updated" columns to a datetime data type.

In [6]:
# Convert the 'time' and 'updated' columns to datetime
df['time'] = pd.to_datetime(df['time'])
df['updated'] = pd.to_datetime(df['updated'])

In [7]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 19244 entries, 0 to 19243
Data columns (total 22 columns):
 #   Column           Non-Null Count  Dtype              
---  ------           --------------  -----              
 0   time             19244 non-null  datetime64[ns, UTC]
 1   latitude         19244 non-null  float64            
 2   longitude        19244 non-null  float64            
 3   depth            19244 non-null  float64            
 4   mag              19244 non-null  float64            
 5   magType          19244 non-null  object             
 6   nst              12628 non-null  float64            
 7   gap              12626 non-null  float64            
 8   dmin             10842 non-null  float64            
 9   rms              19243 non-null  float64            
 10  net              19244 non-null  object             
 11  id               19244 non-null  object             
 12  updated          19244 non-null  datetime64[ns, UTC]
 13  place           

**We have converted 'time' and 'updated' columns from the object data type to datetime64[ns, UTC]. The data now reflects a more appropriate datetime format for further analysis and processing.**
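The conversion above assumes every timestamp string is well formed. As a hedged sketch of how `pd.to_datetime` behaves when that is not the case, the example below uses a hypothetical series (`raw`) with one malformed entry; `utc=True` and `errors="coerce"` are optional arguments that the notebook itself does not use:

```python
import pandas as pd

# Hypothetical timestamps: two valid ISO 8601 strings and one malformed entry
raw = pd.Series(["2023-11-15T09:05:07.304Z", "2023-11-15T08:53:06.688Z", "not a timestamp"])

# utc=True yields a timezone-aware result; errors="coerce" turns unparseable entries into NaT
parsed = pd.to_datetime(raw, utc=True, errors="coerce")

print(parsed.dtype)
print(parsed.isna().sum())  # number of entries that could not be parsed
```

Counting the `NaT` values after such a conversion is a quick check that no rows were silently lost.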


### Step 2: Setting Time Column as Index

In the second data cleaning step, we will set the time column as the new index for the dataset.

In [8]:
# set the time column as the index
df.index = df.time

In [9]:
df.head()

time (index),time,latitude,longitude,depth,mag,magType,nst,gap,dmin,rms,...,updated,place,type,horizontalError,depthError,magError,magNst,status,locationSource,magSource
2023-11-15 09:05:07.304000+00:00,2023-11-15 09:05:07.304000+00:00,61.5808,-149.847,32.8,1.7,ml,,,,0.22,...,2023-11-15 09:06:28.283000+00:00,"5 km SSW of Houston, Alaska",earthquake,,0.2,,,automatic,ak,ak
2023-11-15 08:53:06.688000+00:00,2023-11-15 08:53:06.688000+00:00,61.0794,-147.883,14.8,1.0,ml,,,,0.8,...,2023-11-15 08:54:38.102000+00:00,"55 km NE of Whittier, Alaska",earthquake,,0.3,,,automatic,ak,ak
2023-11-15 08:41:52.480000+00:00,2023-11-15 08:41:52.480000+00:00,19.380667,-155.285339,0.32,1.73,md,15.0,153.0,,0.2,...,2023-11-15 08:56:22.252000+00:00,"8 km SW of Volcano, Hawaii",earthquake,0.33,0.38,0.59,15.0,automatic,hv,hv
2023-11-15 07:44:53.035000+00:00,2023-11-15 07:44:53.035000+00:00,61.6382,-149.7828,32.9,1.9,ml,,,,0.31,...,2023-11-15 07:46:10.981000+00:00,,earthquake,,0.2,,,automatic,ak,ak
2023-11-15 07:19:44.540000+00:00,2023-11-15 07:19:44.540000+00:00,18.972166,-155.45166,34.759998,1.87,md,37.0,236.0,,0.12,...,2023-11-15 07:22:58.830000+00:00,"17 km SE of Naalehu, Hawaii",earthquake,0.71,0.89,0.88,5.0,automatic,hv,hv


In [10]:
# set the index name to 'time'
df.index.name = 'time'
df.head()

time (index),time,latitude,longitude,depth,mag,magType,nst,gap,dmin,rms,...,updated,place,type,horizontalError,depthError,magError,magNst,status,locationSource,magSource
2023-11-15 09:05:07.304000+00:00,2023-11-15 09:05:07.304000+00:00,61.5808,-149.847,32.8,1.7,ml,,,,0.22,...,2023-11-15 09:06:28.283000+00:00,"5 km SSW of Houston, Alaska",earthquake,,0.2,,,automatic,ak,ak
2023-11-15 08:53:06.688000+00:00,2023-11-15 08:53:06.688000+00:00,61.0794,-147.883,14.8,1.0,ml,,,,0.8,...,2023-11-15 08:54:38.102000+00:00,"55 km NE of Whittier, Alaska",earthquake,,0.3,,,automatic,ak,ak
2023-11-15 08:41:52.480000+00:00,2023-11-15 08:41:52.480000+00:00,19.380667,-155.285339,0.32,1.73,md,15.0,153.0,,0.2,...,2023-11-15 08:56:22.252000+00:00,"8 km SW of Volcano, Hawaii",earthquake,0.33,0.38,0.59,15.0,automatic,hv,hv
2023-11-15 07:44:53.035000+00:00,2023-11-15 07:44:53.035000+00:00,61.6382,-149.7828,32.9,1.9,ml,,,,0.31,...,2023-11-15 07:46:10.981000+00:00,,earthquake,,0.2,,,automatic,ak,ak
2023-11-15 07:19:44.540000+00:00,2023-11-15 07:19:44.540000+00:00,18.972166,-155.45166,34.759998,1.87,md,37.0,236.0,,0.12,...,2023-11-15 07:22:58.830000+00:00,"17 km SE of Naalehu, Hawaii",earthquake,0.71,0.89,0.88,5.0,automatic,hv,hv


Great!!!
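A datetime index is useful mainly because it enables time-based selection and resampling. The sketch below illustrates both on a hypothetical three-row frame (`events`); it uses `set_index` with `drop=False`, which gives the same result as assigning `df.index = df.time`, and sorts the index first so that date-based lookups behave predictably:

```python
import pandas as pd

# Hypothetical events spread over two days
events = pd.DataFrame({
    "time": pd.to_datetime(
        ["2023-11-15T09:05:07Z", "2023-11-15T08:53:06Z", "2023-11-14T22:10:00Z"], utc=True
    ),
    "mag": [1.7, 1.0, 2.4],
})

# drop=False keeps 'time' as a regular column too, matching the notebook's result
events = events.set_index("time", drop=False).sort_index()

# Partial-string indexing: select every event on one calendar day
one_day = events.loc["2023-11-15"]
print(len(one_day))

# Resampling: count events per day
per_day = events["mag"].resample("D").count()
print(per_day)
```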

### Step 3: Handling Null Values

In this stage of data cleaning, we will address and handle all null values present in the dataset.

Let's identify which columns in the dataset contain null values.

In [11]:
# checking for nulls
df.isnull().sum()

time                  0
latitude              0
longitude             0
depth                 0
mag                   0
magType               0
nst                6616
gap                6618
dmin               8402
rms                   1
net                   0
id                    0
updated               0
place               972
type                  0
horizontalError    7553
depthError            1
magError           6688
magNst             6659
status                0
locationSource        0
magSource             0
dtype: int64

Nine columns in the dataset contain null values.

### a. Dealing with nulls in the 'place' column

For null values in the "place" column, we will substitute them with 'Others' to denote alternative locations.

In [12]:
df['place'] = df['place'].fillna('Others')

In [13]:
# checking for nulls
df.isnull().sum()

time                  0
latitude              0
longitude             0
depth                 0
mag                   0
magType               0
nst                6616
gap                6618
dmin               8402
rms                   1
net                   0
id                    0
updated               0
place                 0
type                  0
horizontalError    7553
depthError            1
magError           6688
magNst             6659
status                0
locationSource        0
magSource             0
dtype: int64

Good!!!

### b. Dealing with nulls in the other columns

Now, we will deal with the nulls in the other columns. Since the remaining columns with nulls are all quantitative variables, we will fill their nulls with zero.

In [14]:
# List of columns to fill null values
columns_to_fill = ['nst', 'gap', 'dmin', 'horizontalError', 'depthError', 'magError', 'magNst', 'rms']

# Iterate through each column and fill null values with 0
for column in columns_to_fill:
    # Fill null values in the specified column with 0
    df[column] = df[column].fillna(0)


In [15]:
# checking for nulls
df.isnull().sum()

time               0
latitude           0
longitude          0
depth              0
mag                0
magType            0
nst                0
gap                0
dmin               0
rms                0
net                0
id                 0
updated            0
place              0
type               0
horizontalError    0
depthError         0
magError           0
magNst             0
status             0
locationSource     0
magSource          0
dtype: int64

**Excellent! We have effectively addressed all null values.**
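Zero-filling changes summary statistics, because pandas skips nulls when aggregating but counts zeros. The sketch below, on a hypothetical `gap` series, shows the size of that effect and one possible way to remember which values were originally missing (the `gap_was_missing` flag is an illustrative addition, not part of the notebook):

```python
import numpy as np
import pandas as pd

# Hypothetical 'gap' readings with two missing values
gap = pd.Series([153.0, np.nan, 236.0, np.nan], name="gap")

# Record which entries were missing before filling, so the information is not lost
gap_was_missing = gap.isnull()
gap_filled = gap.fillna(0)

print(gap.mean())             # mean over observed values only (nulls are skipped)
print(gap_filled.mean())      # mean after zero-filling
print(gap_was_missing.sum())  # how many values were filled
```

Keeping such a flag column makes it possible to exclude the filled zeros later when computing averages.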

### Step 4: Converting columns to appropriate data types

Now, we will attempt to convert the various columns into the appropriate data types.

In [16]:
df.info()

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 19244 entries, 2023-11-15 09:05:07.304000+00:00 to 2023-09-01 00:04:57.310000+00:00
Data columns (total 22 columns):
 #   Column           Non-Null Count  Dtype              
---  ------           --------------  -----              
 0   time             19244 non-null  datetime64[ns, UTC]
 1   latitude         19244 non-null  float64            
 2   longitude        19244 non-null  float64            
 3   depth            19244 non-null  float64            
 4   mag              19244 non-null  float64            
 5   magType          19244 non-null  object             
 6   nst              19244 non-null  float64            
 7   gap              19244 non-null  float64            
 8   dmin             19244 non-null  float64            
 9   rms              19244 non-null  float64            
 10  net              19244 non-null  object             
 11  id               19244 non-null  object             
 12  updated      

We will convert the 'magType', 'net', 'id', 'place', 'type', 'status', 'locationSource', and 'magSource' columns from the object data type to the string data type.

In [17]:
# List of columns to convert to string data type
columns_to_convert_to_string = ['magType', 'net', 'id', 'place', 'type', 'status', 'locationSource', 'magSource']

# Convert specified columns to string data type
df[columns_to_convert_to_string] = df[columns_to_convert_to_string].astype('string')


In [18]:
df.info()

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 19244 entries, 2023-11-15 09:05:07.304000+00:00 to 2023-09-01 00:04:57.310000+00:00
Data columns (total 22 columns):
 #   Column           Non-Null Count  Dtype              
---  ------           --------------  -----              
 0   time             19244 non-null  datetime64[ns, UTC]
 1   latitude         19244 non-null  float64            
 2   longitude        19244 non-null  float64            
 3   depth            19244 non-null  float64            
 4   mag              19244 non-null  float64            
 5   magType          19244 non-null  string             
 6   nst              19244 non-null  float64            
 7   gap              19244 non-null  float64            
 8   dmin             19244 non-null  float64            
 9   rms              19244 non-null  float64            
 10  net              19244 non-null  string             
 11  id               19244 non-null  string             
 12  updated      

Next, we will convert the 'nst' and 'magNst' columns to integers.

In [19]:
# List of columns to convert to integer data type
columns_to_convert_to_int = ['nst', 'magNst']

# Convert specified columns to integer data type
df[columns_to_convert_to_int] = df[columns_to_convert_to_int].astype(int)


In [20]:
df.info()

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 19244 entries, 2023-11-15 09:05:07.304000+00:00 to 2023-09-01 00:04:57.310000+00:00
Data columns (total 22 columns):
 #   Column           Non-Null Count  Dtype              
---  ------           --------------  -----              
 0   time             19244 non-null  datetime64[ns, UTC]
 1   latitude         19244 non-null  float64            
 2   longitude        19244 non-null  float64            
 3   depth            19244 non-null  float64            
 4   mag              19244 non-null  float64            
 5   magType          19244 non-null  string             
 6   nst              19244 non-null  int32              
 7   gap              19244 non-null  float64            
 8   dmin             19244 non-null  float64            
 9   rms              19244 non-null  float64            
 10  net              19244 non-null  string             
 11  id               19244 non-null  string             
 12  updated      

**Great. We have successfully converted all columns to their appropriate data types.**
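The `astype(int)` call above succeeds only because the nulls were already replaced with zeros; NumPy integer columns cannot hold missing values. If one instead wanted integer station counts while keeping the gaps, pandas' nullable `Int64` dtype would be one option. This is a sketch of that alternative on a hypothetical series, not what the notebook does:

```python
import numpy as np
import pandas as pd

# Hypothetical station counts as they appear before zero-filling
nst = pd.Series([15.0, np.nan, 37.0], name="nst")

# The capitalised "Int64" is pandas' nullable integer dtype: missing values become <NA>
nst_nullable = nst.astype("Int64")

print(nst_nullable.dtype)
print(nst_nullable.isna().sum())  # the missing entry is preserved
print(nst_nullable.sum())         # aggregations skip <NA> by default
```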

In [24]:
list(df.type.unique())


['earthquake',
 'quarry blast',
 'explosion',
 'other event',
 'ice quake',
 'experimental explosion',
 'landslide',
 'volcanic eruption',
 'mining explosion']

### Step 5: Obtaining the state and country of the seismic events

We will enrich the dataset with state and country information derived from the 'latitude' and 'longitude' columns using the geopy Python library. Given a pair of coordinates, its reverse geocoder returns the address of that location, which includes the state and country names. We will store this geographically derived information in a new column named 'state_country'.

In [47]:
df.head(1)

time (index),time,latitude,longitude,depth,mag,magType,nst,gap,dmin,rms,...,updated,place,type,horizontalError,depthError,magError,magNst,status,locationSource,magSource
2023-11-15 09:05:07.304000+00:00,2023-11-15 09:05:07.304000+00:00,61.5808,-149.847,32.8,1.7,ml,0,0.0,0.0,0.22,...,2023-11-15 09:06:28.283000+00:00,"5 km SSW of Houston, Alaska",earthquake,0.0,0.2,0.0,0,automatic,ak,ak


As an example, we will use the library to obtain the address of the first seismic event in the dataset.

In [48]:
# import the required library
from geopy.geocoders import Nominatim

# Create a Nominatim geolocator
geolocator = Nominatim(user_agent="reverse_geocoding_example")

# coordinates of the first seismic event
latitude = 61.5808
longitude = -149.847

# Perform reverse geocoding
location = geolocator.reverse((latitude, longitude), language='en')

if location is not None:
    address = location.address
    print(f"The location at ({latitude}, {longitude}) is: {address}")
else:
    print(f"No location found for coordinates ({latitude}, {longitude})")


The location at (61.5808, -149.847) is: Beavertail Drive, Matanuska-Susitna, Alaska, United States


**This shows the exact location of the first seismic event**
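The returned address is a single comma-separated string, so the state and country still have to be picked out of it. The helper below is a minimal sketch of one way to do that; `split_state_country` is a hypothetical name, and the rule it applies (country last, state just before it, numeric postcodes skipped) is an assumption based on the addresses shown in this notebook rather than a guarantee about the geocoder's output:

```python
def split_state_country(address):
    """Return (state, country) guessed from a comma-separated address string.

    Assumption: the country is the last component and the state is the last
    non-numeric component before it, so postcodes such as '99694' are skipped.
    """
    parts = [p.strip() for p in address.split(",") if p.strip()]
    if not parts:
        return None, None
    country = parts[-1]
    # Walk backwards over the remaining parts, skipping purely numeric postcodes
    for part in reversed(parts[:-1]):
        if not part.replace("-", "").isdigit():
            return part, country
    return None, country


print(split_state_country("Beavertail Drive, Matanuska-Susitna, Alaska, United States"))
print(split_state_country("Houston, Matanuska-Susitna, Alaska, 99694, United States"))
print(split_state_country("United States"))
```

Placeholder strings such as "No location found" would need to be filtered out before applying a helper like this. geopy's result object also exposes the raw service response through its `raw` attribute, which may carry the address as separate fields and could be a more robust source than string splitting.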

We will now replicate the process to extract the state and country information for all seismic events. To mitigate runtime errors, we will pass in the coordinates of 50 events at a time.

**Disclaimer: this cell sends one reverse-geocoding request per event through geopy, so it can take a long time to run. Expect roughly 90 minutes or more; the run recorded below took about 2 hours 40 minutes.**

In [26]:
# import tqdm to show the progress bar
from tqdm import tqdm  

# Create a Nominatim geolocator
geolocator = Nominatim(user_agent="reverse_geocoding_examples")


# Function to perform reverse geocoding for a batch of rows
def reverse_geocode_batch(batch):
    results = []
    for index, row in batch.iterrows():
        location = geolocator.reverse((row['latitude'], row['longitude']), language='en')
        if location is not None:
            results.append(location.address)
        else:
            results.append("No location found")
    return results

# Work on a copy with a fresh integer index (0, 1, 2, ...) so that
# rows can be grouped into batches by integer division of the index
df_reset = df.reset_index(drop=True)

# Process the DataFrame in batches of 50 rows
batch_size = 50
num_batches = len(df_reset) // batch_size + 1

 
# Enable progress bar for pandas
tqdm.pandas()

# Apply the function to create a new column 'state_country'
df_reset['state_country'] = df_reset.groupby(df_reset.index // batch_size).progress_apply(reverse_geocode_batch).explode().reset_index(drop=True)

# Print the updated DataFrame
df_reset


100%|██████████████████████████████████████████████████████████████████████████████| 385/385 [2:40:27<00:00, 25.01s/it]


Unnamed: 0,time,latitude,longitude,depth,mag,magType,nst,gap,dmin,rms,...,place,type,horizontalError,depthError,magError,magNst,status,locationSource,magSource,state_country
0,2023-11-15 09:05:07.304000+00:00,61.580800,-149.847000,32.800000,1.70,ml,0,0.0,0.0000,0.22,...,"5 km SSW of Houston, Alaska",earthquake,0.00,0.20,0.000000,0,automatic,ak,ak,"Beavertail Drive, Matanuska-Susitna, Alaska, U..."
1,2023-11-15 08:53:06.688000+00:00,61.079400,-147.883000,14.800000,1.00,ml,0,0.0,0.0000,0.80,...,"55 km NE of Whittier, Alaska",earthquake,0.00,0.30,0.000000,0,automatic,ak,ak,"Chugach, Alaska, United States"
2,2023-11-15 08:41:52.480000+00:00,19.380667,-155.285339,0.320000,1.73,md,15,153.0,0.0000,0.20,...,"8 km SW of Volcano, Hawaii",earthquake,0.33,0.38,0.590000,15,automatic,hv,hv,"Volcano, Volcano CDP, Hawaiʻi County, Hawaii, ..."
3,2023-11-15 07:44:53.035000+00:00,61.638200,-149.782800,32.900000,1.90,ml,0,0.0,0.0000,0.31,...,Others,earthquake,0.00,0.20,0.000000,0,automatic,ak,ak,"Houston, Matanuska-Susitna, Alaska, 99694, Uni..."
4,2023-11-15 07:19:44.540000+00:00,18.972166,-155.451660,34.759998,1.87,md,37,236.0,0.0000,0.12,...,"17 km SE of Naalehu, Hawaii",earthquake,0.71,0.89,0.880000,5,automatic,hv,hv,United States
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
19239,2023-09-01 01:10:27.784000+00:00,53.154800,-164.532800,35.000000,3.00,ml,14,222.0,1.2290,0.33,...,"136 km SE of Akutan, Alaska",earthquake,3.27,2.02,0.128000,8,reviewed,us,us,No location found
19240,2023-09-01 00:55:24.900000+00:00,19.210167,-155.348007,28.299999,1.71,md,21,189.0,0.0000,0.10,...,"13 km E of Pāhala, Hawaii",earthquake,0.82,0.63,1.730000,4,automatic,hv,hv,"Naliʻikakani Point, Hawaiʻi County, Hawaii, Un..."
19241,2023-09-01 00:45:43.100000+00:00,17.937167,-66.917667,13.160000,2.28,md,6,216.0,0.0525,0.08,...,Puerto Rico region,earthquake,0.90,0.41,0.130183,6,reviewed,pr,pr,"Montalva, Guánica, Puerto Rico, 00647, United ..."
19242,2023-09-01 00:42:08.428000+00:00,60.279600,-147.859600,5.600000,2.00,ml,0,0.0,0.0000,0.74,...,"25 km NNE of Chenega, Alaska",earthquake,0.00,0.40,0.000000,0,reviewed,ak,ak,"Unorganized Borough, Alaska, United States"
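Most of the runtime comes from issuing one network request per row. A possible way to reduce the number of requests is to cache results for coordinates that have already been looked up, and to enforce a minimum delay between real requests (the public Nominatim service asks for at most one request per second). The sketch below is illustrative only: `make_cached_reverse` and `fake_reverse` are hypothetical names, the fake function stands in for `geolocator.reverse` so the example runs offline, and rounding coordinates to three decimals is an assumption that trades a little positional precision for more cache hits.

```python
import time


def make_cached_reverse(reverse_fn, min_delay_seconds=1.0, decimals=3):
    """Wrap a reverse-geocoding callable with a cache and a minimum delay between calls.

    reverse_fn is any callable taking (lat, lon) and returning an address string.
    All names here are illustrative and not part of geopy.
    """
    cache = {}
    last_call = [0.0]

    def cached_reverse(lat, lon):
        key = (round(lat, decimals), round(lon, decimals))
        if key in cache:
            return cache[key]
        # Sleep just long enough to keep min_delay_seconds between real requests
        wait = min_delay_seconds - (time.monotonic() - last_call[0])
        if wait > 0:
            time.sleep(wait)
        last_call[0] = time.monotonic()
        cache[key] = reverse_fn(*key)
        return cache[key]

    return cached_reverse


# Stand-in for geolocator.reverse so the sketch runs without network access
calls = []


def fake_reverse(lat, lon):
    calls.append((lat, lon))
    return f"address for ({lat}, {lon})"


lookup = make_cached_reverse(fake_reverse, min_delay_seconds=0.0)
lookup(61.5808, -149.847)
lookup(61.58081, -149.84699)  # rounds to the same key, so it is served from the cache
print(len(calls))
```

geopy also ships a `RateLimiter` helper in `geopy.extra.rate_limiter` that handles the delay between requests; the wrapper above only illustrates the idea.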


In [28]:
# set time column as index
df_reset.index = df_reset.time

In [30]:
df_reset.head()

time (index),time,latitude,longitude,depth,mag,magType,nst,gap,dmin,rms,...,place,type,horizontalError,depthError,magError,magNst,status,locationSource,magSource,state_country
2023-11-15 09:05:07.304000+00:00,2023-11-15 09:05:07.304000+00:00,61.5808,-149.847,32.8,1.7,ml,0,0.0,0.0,0.22,...,"5 km SSW of Houston, Alaska",earthquake,0.0,0.2,0.0,0,automatic,ak,ak,"Beavertail Drive, Matanuska-Susitna, Alaska, U..."
2023-11-15 08:53:06.688000+00:00,2023-11-15 08:53:06.688000+00:00,61.0794,-147.883,14.8,1.0,ml,0,0.0,0.0,0.8,...,"55 km NE of Whittier, Alaska",earthquake,0.0,0.3,0.0,0,automatic,ak,ak,"Chugach, Alaska, United States"
2023-11-15 08:41:52.480000+00:00,2023-11-15 08:41:52.480000+00:00,19.380667,-155.285339,0.32,1.73,md,15,153.0,0.0,0.2,...,"8 km SW of Volcano, Hawaii",earthquake,0.33,0.38,0.59,15,automatic,hv,hv,"Volcano, Volcano CDP, Hawaiʻi County, Hawaii, ..."
2023-11-15 07:44:53.035000+00:00,2023-11-15 07:44:53.035000+00:00,61.6382,-149.7828,32.9,1.9,ml,0,0.0,0.0,0.31,...,Others,earthquake,0.0,0.2,0.0,0,automatic,ak,ak,"Houston, Matanuska-Susitna, Alaska, 99694, Uni..."
2023-11-15 07:19:44.540000+00:00,2023-11-15 07:19:44.540000+00:00,18.972166,-155.45166,34.759998,1.87,md,37,236.0,0.0,0.12,...,"17 km SE of Naalehu, Hawaii",earthquake,0.71,0.89,0.88,5,automatic,hv,hv,United States


### Saving the dataset

Finally, we will save the cleaned dataset as a CSV file called 'quakes-cleaned.csv'.

In [31]:
# save as csv
df_reset.to_csv('quakes-cleaned.csv')
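Because the index and the 'time' column hold the same values, writing the index as well stores every timestamp twice. The sketch below shows a possible variant that writes only the columns and then checks that the file reads back as expected; it uses an in-memory buffer and a hypothetical two-row frame (`cleaned`) instead of the real file:

```python
import io

import pandas as pd

# Hypothetical cleaned frame whose index duplicates its 'time' column, as in the notebook
cleaned = pd.DataFrame({
    "time": pd.to_datetime(["2023-11-15T09:05:07Z", "2023-11-15T08:53:06Z"], utc=True),
    "mag": [1.7, 1.0],
})
cleaned.index = cleaned["time"]

# index=False avoids writing the index as a second 'time' column
buffer = io.StringIO()
cleaned.to_csv(buffer, index=False)

# CSV stores timestamps as text, so they have to be parsed again on reload
buffer.seek(0)
reloaded = pd.read_csv(buffer, parse_dates=["time"])
print(reloaded.columns.tolist())
print(reloaded["time"].dtype)
```

With a real file, the buffer would simply be replaced by the path 'quakes-cleaned.csv' in both calls.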