# Parsing Dates

Date parsing is an important part of data cleaning whenever a dataset contains dates.


It's common for dates to be stored in different formats, most often as strings. As a result, pandas treats the date column as the generic ``object`` dtype instead of a datetime dtype.

The process of taking a string, identifying its component parts, and converting it into a date is called ``parsing dates``.

The available ``strftime`` format codes are listed [here](https://strftime.org/).

Dates can come in different formats, for example:

``1/17/07 has the format "%m/%d/%y"``

``17-1-2007 has the format "%d-%m-%Y"``
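To make the format codes concrete, here is a minimal sketch that parses the two example strings above with Python's standard ``datetime.strptime``. The variable names are illustrative only; both strings should resolve to the same calendar date.

```python
from datetime import datetime

# Each example string is paired with the format that describes it.
samples = [
    ("1/17/07", "%m/%d/%y"),    # month/day/two-digit year
    ("17-1-2007", "%d-%m-%Y"),  # day-month-four-digit year
]

# strptime raises ValueError if a string does not match its format.
parsed = [datetime.strptime(text, fmt) for text, fmt in samples]

for (text, fmt), value in zip(samples, parsed):
    print(f"{text!r} with {fmt!r} -> {value.date()}")
```

The same format strings can be passed to the ``format`` argument of ``pd.to_datetime``.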

Sometimes a single date column contains multiple date formats. So how do we tell pandas which date format to use? One option is to let pandas try to work out the format automatically by setting ``infer_datetime_format`` to ``True``. (This argument is deprecated as of pandas 2.0, where format inference is the default behavior.)

``landslides['date_parsed'] = pd.to_datetime(landslides['Date'], infer_datetime_format=True)``

But it's not always good practice to let pandas figure out the date format:

1. It's slower
2. Pandas is not always correct
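The second point can be illustrated with a small sketch. The sample values below are hypothetical and not taken from any dataset in this notebook. The string ``01/02/2007`` is valid under both a month-first and a day-first format, so any guess can be wrong; passing ``format`` explicitly removes the guesswork, and ``errors="coerce"`` turns values that do not match into ``NaT`` so they can be inspected afterwards.

```python
import pandas as pd

# Hypothetical ambiguous value: valid as month/day and as day/month.
ambiguous = "01/02/2007"

month_first = pd.to_datetime(ambiguous, format="%m/%d/%Y")  # 2 January 2007
day_first = pd.to_datetime(ambiguous, format="%d/%m/%Y")    # 1 February 2007
print(month_first.date(), day_first.date())

# Hypothetical column with one entry that is not a date at all.
dates = pd.Series(["1/17/07", "3/2/07", "not a date"])

# An explicit format avoids inference; non-matching rows become NaT.
parsed = pd.to_datetime(dates, format="%m/%d/%y", errors="coerce")
print(parsed)
```

Counting the ``NaT`` values afterwards (for example with ``parsed.isna().sum()``) is a quick way to see how many rows did not fit the expected format.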

In [2]:
import pandas as pd
import numpy as np
import seaborn as sns
import datetime

# set seed for reproducibility
np.random.seed(0)

In [3]:
df = pd.read_csv("datasets/volcanoes.csv")

In [5]:
df.head()

| | Number | Name | Country | Region | Type | Activity Evidence | Last Known Eruption | Latitude | Longitude | Elevation (Meters) | Dominant Rock Type | Tectonic Setting |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 210010 | West Eifel Volcanic Field | Germany | Mediterranean and Western Asia | Maar(s) | Eruption Dated | 8300 BCE | 50.17 | 6.85 | 600 | Foidite | Rift Zone / Continental Crust (>25 km) |
| 1 | 210020 | Chaine des Puys | France | Mediterranean and Western Asia | Lava dome(s) | Eruption Dated | 4040 BCE | 45.775 | 2.97 | 1464 | Basalt / Picro-Basalt | Rift Zone / Continental Crust (>25 km) |
| 2 | 210030 | Olot Volcanic Field | Spain | Mediterranean and Western Asia | Pyroclastic cone(s) | Evidence Credible | Unknown | 42.17 | 2.53 | 893 | Trachybasalt / Tephrite Basanite | Intraplate / Continental Crust (>25 km) |
| 3 | 210040 | Calatrava Volcanic Field | Spain | Mediterranean and Western Asia | Pyroclastic cone(s) | Eruption Dated | 3600 BCE | 38.87 | -4.02 | 1117 | Basalt / Picro-Basalt | Intraplate / Continental Crust (>25 km) |
| 4 | 211001 | Larderello | Italy | Mediterranean and Western Asia | Explosion crater(s) | Eruption Observed | 1282 CE | 43.25 | 10.87 | 500 | No Data | Subduction Zone / Continental Crust (>25 km) |


In [6]:
df.columns

Index(['Number', 'Name', 'Country', 'Region', 'Type', 'Activity Evidence',
       'Last Known Eruption', 'Latitude', 'Longitude', 'Elevation (Meters)',
       'Dominant Rock Type', 'Tectonic Setting'],
      dtype='object')

https://www.kaggle.com/alexisbcook/character-encodings