<a href="https://colab.research.google.com/github/OptimalDecisions/sports-analytics-foundations/blob/main/pandas-basics/Pandas_Intermediate_2_8_Time_Series.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>


  ## Pandas Basics 2.8

# Formatting Date and Time Columns


  <img src = "../img/sa_logo.png" width="100" align="left">

  Ram Narasimhan

  <br><br><br>

  << [Writing to Files](Pandas_Basics_2_7_Writing_to_Files.ipynb) | [Time Series](Pandas_Intermediate_2_8_Time_Series.ipynb) | [Merging Dataframes](Pandas_Intermediate_2_9_Merging_DataFrames.ipynb) >>


An integral part of any sports analysis is working with dates and times.

Pandas is extremely useful (and versatile) when it comes to handling dates. In particular, it provides a function called `to_datetime()`.

This function is a fundamental tool in data-cleaning and preprocessing workflows, especially for datasets that contain date information.

## Introduction to `pd.to_datetime`

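The `pd.to_datetime` function converts "date-like" values (strings such as `'2022-01-15'`, but also numbers, lists, and whole columns) into pandas datetime objects. (Yes, pandas has its own datetime type, just as integers and strings are types: a single value becomes a `Timestamp`, and a converted column gets a `datetime64` dtype.)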
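A quick way to see what `pd.to_datetime` returns is to check the type of its output. This minimal sketch converts one string and one small Series (the variable names are only illustrative). The exact resolution shown in the dtype, such as `datetime64[ns]`, can differ between pandas versions, so the check uses `is_datetime64_any_dtype` rather than a specific dtype string.

```python
import pandas as pd

# A single string becomes a pandas Timestamp
one_game = pd.to_datetime("2022-01-15")
print(type(one_game).__name__)

# A Series of strings becomes a Series with a datetime64 dtype
season = pd.to_datetime(pd.Series(["2022-01-15", "2022-02-20"]))
print(season.dtype)
print(pd.api.types.is_datetime64_any_dtype(season))
```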




## Key Uses of `pd.to_datetime`

### Standardizing Date Formats

It converts date-like strings written in different styles into a single datetime type, so every value in a DataFrame column is stored consistently and is ready for sorting, comparison, and analysis.
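As an illustration, the same game date written three different ways ends up as one identical value after conversion, and can then be written back out in a single, uniform style with `.strftime()`. This sketch assumes pandas 2.0 or later, where `format="mixed"` tells pandas to parse each string individually; the sample strings are made up.

```python
import pandas as pd

# Three spellings of the same date, as they might arrive from different sources
spellings = ["2022-01-15", "01/15/2022", "15 Jan 2022"]

# format="mixed" (pandas >= 2.0) parses each value on its own
parsed = pd.to_datetime(spellings, format="mixed")
print(parsed)

# All three are now the same datetime value...
print(parsed.nunique())

# ...and can be rendered in one consistent text format
print(list(parsed.strftime("%Y-%m-%d")))
```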



### Handling Missing or Invalid Dates

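The `pd.to_datetime` function can also handle missing or invalid date values gracefully. Missing values such as `None` or `NaN` become `NaT` (Not a Time). Invalid strings raise an error by default (`errors='raise'`), but with `errors='coerce'` they are converted to `NaT` instead. This lets you clean datasets with inconsistent date representations and then locate the problem rows with `.isna()`.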
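A minimal sketch of the difference between the default `errors='raise'` and `errors='coerce'`, using a made-up Series with one valid date, one unparseable string, and one missing value:

```python
import pandas as pd

raw = pd.Series(["2022-01-15", "not a date", None])

# Default errors="raise": the bad string stops the conversion
raised = False
try:
    pd.to_datetime(raw)
except ValueError as exc:  # parsing failures are raised as ValueError (or a subclass)
    raised = True
    print("Conversion failed:", exc)

# errors="coerce": bad strings and missing values both become NaT
clean = pd.to_datetime(raw, errors="coerce")
print(clean)

# NaT counts as missing, so the problem rows are easy to find
print(raw[clean.isna()])
```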



### Extracting Date Components

Once the date-like strings are converted to datetime objects, we can easily extract various components such as `year`, `month`, `day`, `hour`, `minute`, and `second`. This facilitates time-based analysis and filtering.
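For example, once a column is converted, a component such as the hour or the year can drive both filtering and grouping. The small `games` table below is made up for illustration, reusing game times from this lesson.

```python
import pandas as pd

games = pd.DataFrame({
    "Team": ["TeamA", "TeamB", "TeamC", "TeamA", "TeamB"],
    "GameDate": pd.to_datetime(["2022-01-15 18:30:00", "2022-02-20 15:45:00",
                                "2022-03-10 20:00:00", "2022-05-15 12:00:00",
                                "2022-06-25 19:15:00"]),
})

# Filtering on an extracted component: evening games (6 pm or later)
evening = games[games["GameDate"].dt.hour >= 18]
print(evening)

# Analysis on an extracted component: number of games per year and team
per_year = games.groupby([games["GameDate"].dt.year, "Team"]).size()
print(per_year)
```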



### Supporting Datetime Operations

Datetime objects support a wide range of time-related operations. After conversion, we can filter rows by date ranges, resample to a coarser frequency (for example, games per month), and calculate time intervals between dates.
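The three operations named above (time-based filtering, resampling, and time intervals) can be sketched on a small, made-up schedule. `"MS"` is the pandas frequency alias for month start; the column names are illustrative.

```python
import pandas as pd

games = pd.DataFrame({
    "Event": ["Game 1", "Game 2", "Game 3", "Game 4", "Game 5"],
    "Date": pd.to_datetime(["2022-01-15", "2022-02-20", "2022-03-10",
                            "2022-05-15", "2022-06-25"]),
})

# Time-based filtering: games on or after 1 March 2022
from_march = games[games["Date"] >= "2022-03-01"]

# Time intervals: days between consecutive games (NaN for the first game)
games["DaysSinceLast"] = games["Date"].diff().dt.days

# Resampling: number of games in each calendar month (empty months count as 0)
per_month = games.set_index("Date").resample("MS")["Event"].count()

print(from_march, games, per_month, sep="\n\n")
```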

### Basic Syntax

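```python
import pandas as pd

# Abridged signature (pandas 2.x); the full list is in the pandas API reference
pd.to_datetime(arg, errors='raise', dayfirst=False, yearfirst=False, utc=False,
               format=None, unit=None, origin='unix', cache=True)
```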



Common Parameters:
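- `arg`: The object to convert: a string, number, list, Series, or a DataFrame with year/month/day columns.
- `errors`: How to handle values that cannot be parsed: `'raise'` (the default) raises an error, and `'coerce'` turns them into `NaT`. A third option, `'ignore'` (return the input unchanged), is deprecated as of pandas 2.2.
- `dayfirst`: If `True`, read ambiguous dates such as `05/06/2022` with the day first (5 June).
- `format`: The expected date format as a strftime-style string, e.g. `'%d/%m/%Y'`. Since pandas 2.0, `'ISO8601'` and `'mixed'` are also accepted.
- `unit`: The unit of a numeric `arg` (e.g. `'s'` for seconds, `'D'` for days).
- `infer_datetime_format`: Deprecated since pandas 2.0; a strict form of format inference is now always on, so passing this argument has no effect.
- `origin`: The reference date for numeric values given with `unit`; the default `'unix'` means 1970-01-01.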
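A short sketch of `format`, `dayfirst`, `unit`, and `origin` in action. The input values are made up; `1642204800` is simply the number of seconds from 1970-01-01 to 2022-01-15.

```python
import pandas as pd

# format: state the layout explicitly (here day/month/year)
explicit = pd.to_datetime(["15/01/2022", "20/02/2022"], format="%d/%m/%Y")

# dayfirst: a hint that ambiguous dates put the day first
day_first = pd.to_datetime("05/06/2022", dayfirst=True)

# unit: interpret a number as seconds since the default origin (1970-01-01)
from_epoch = pd.to_datetime(1642204800, unit="s")

# unit + origin: interpret numbers as days after a custom reference date
season_days = pd.to_datetime([0, 30], unit="D", origin=pd.Timestamp("2022-01-01"))

print(explicit, day_first, from_epoch, season_days, sep="\n")
```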


In [7]:
import pandas as pd

# Example DataFrame with a messy date column
data = {'Event': ['Game 1', 'Game 2', 'Game 3', 'Game 4', 'Game 5'],
        'Date': ['2022-01-15', '2022-02-20', '2022-03-10', '05/15/2022', '06-25-2022']}
df = pd.DataFrame(data)
df


|   | Event | Date |
|---|-------|------|
| 0 | Game 1 | 2022-01-15 |
| 1 | Game 2 | 2022-02-20 |
| 2 | Game 3 | 2022-03-10 |
| 3 | Game 4 | 05/15/2022 |
| 4 | Game 5 | 06-25-2022 |


In [8]:
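# Convert the 'Date' column to datetime.
# pandas >= 2.0 applies one format (inferred from the first value) to every row,
# so format='mixed' is needed to parse each value of this mixed column individually.
df['Date'] = pd.to_datetime(df['Date'], format='mixed', errors='coerce')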
df

|   | Event | Date |
|---|-------|------|
| 0 | Game 1 | 2022-01-15 |
| 1 | Game 2 | 2022-02-20 |
| 2 | Game 3 | 2022-03-10 |
| 3 | Game 4 | 2022-05-15 |
| 4 | Game 5 | 2022-06-25 |
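Why does a mixed column like this need `format='mixed'`? In pandas 2.0 and later, `to_datetime` infers one format from the first non-missing value and applies it to every row. This sketch (assuming pandas 2.0 or later) converts the same strings both ways; with the default, the two rows in a different layout silently become `NaT` under `errors="coerce"`.

```python
import pandas as pd

dates = pd.Series(["2022-01-15", "2022-02-20", "2022-03-10", "05/15/2022", "06-25-2022"])

# Default: "%Y-%m-%d" is inferred from the first value and used for all rows,
# so the last two strings fail to match and become NaT
strict = pd.to_datetime(dates, errors="coerce")

# format="mixed": each value is parsed on its own
mixed = pd.to_datetime(dates, format="mixed", errors="coerce")

print(pd.DataFrame({"raw": dates, "strict": strict, "mixed": mixed}))
```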



In [19]:
data = {'GameDate': ['2022/01/15 6:30:00 PM', '2022-02-20 15:45:00',
                     '2022-March-10 20:00:00', '16 May 2022 12:00:00', '2022-06-25 19:15:00'],
        'Team': ['TeamA', 'TeamB', 'TeamC', 'TeamA', 'TeamB']}

sports_df = pd.DataFrame(data)
sports_df


|   | GameDate | Team |
|---|----------|------|
| 0 | 2022/01/15 6:30:00 PM | TeamA |
| 1 | 2022-02-20 15:45:00 | TeamB |
| 2 | 2022-March-10 20:00:00 | TeamC |
| 3 | 16 May 2022 12:00:00 | TeamA |
| 4 | 2022-06-25 19:15:00 | TeamB |


In [20]:

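# Convert the 'GameDate' column to datetime.
# The strings use several different layouts, so format='mixed' (pandas >= 2.0)
# parses each one individually instead of applying the first value's format to all rows.
sports_df['GameDate'] = pd.to_datetime(sports_df['GameDate'], format='mixed', errors='coerce')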

# Extracting Year
sports_df['Year'] = sports_df['GameDate'].dt.year

# Extracting Time
sports_df['Time'] = sports_df['GameDate'].dt.time

# Extracting Day of Week
sports_df['DayOfWeek'] = sports_df['GameDate'].dt.day_name()

# Display the DataFrame
print(sports_df)


             GameDate   Team  Year      Time DayOfWeek
0 2022-01-15 18:30:00  TeamA  2022  18:30:00  Saturday
1 2022-02-20 15:45:00  TeamB  2022  15:45:00    Sunday
2 2022-03-10 20:00:00  TeamC  2022  20:00:00  Thursday
3 2022-05-16 12:00:00  TeamA  2022  12:00:00    Monday
4 2022-06-25 19:15:00  TeamB  2022  19:15:00  Saturday


### Extraction of time components: `year`, `month`, `day`, `hour`, `minute`, `second`

In [25]:
data = {'GameDate': ['2022-01-15 18:30:00', '2022-02-20 15:45:00', '2022-03-10 20:00:00', '2022-05-15 12:00:00', '2022-06-25 19:15:00'],
        'Team': ['TeamA', 'TeamB', 'TeamC', 'TeamA', 'TeamB']}

sports_df = pd.DataFrame(data)
sports_df


|   | GameDate | Team |
|---|----------|------|
| 0 | 2022-01-15 18:30:00 | TeamA |
| 1 | 2022-02-20 15:45:00 | TeamB |
| 2 | 2022-03-10 20:00:00 | TeamC |
| 3 | 2022-05-15 12:00:00 | TeamA |
| 4 | 2022-06-25 19:15:00 | TeamB |


In [26]:

# Convert the 'GameDate' column to datetime
sports_df['GameDate'] = pd.to_datetime(sports_df['GameDate'], errors='coerce')

# Accessing datetime properties
sports_df['Year'] = sports_df['GameDate'].dt.year
sports_df['Month'] = sports_df['GameDate'].dt.month
sports_df['Day'] = sports_df['GameDate'].dt.day
sports_df['Hour'] = sports_df['GameDate'].dt.hour
sports_df['Minute'] = sports_df['GameDate'].dt.minute
sports_df['Second'] = sports_df['GameDate'].dt.second

sports_df


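|   | GameDate | Team | Year | Month | Day | Hour | Minute | Second |
|---|----------|------|------|-------|-----|------|--------|--------|
| 0 | 2022-01-15 18:30:00 | TeamA | 2022 | 1 | 15 | 18 | 30 | 0 |
| 1 | 2022-02-20 15:45:00 | TeamB | 2022 | 2 | 20 | 15 | 45 | 0 |
| 2 | 2022-03-10 20:00:00 | TeamC | 2022 | 3 | 10 | 20 | 0 | 0 |
| 3 | 2022-05-15 12:00:00 | TeamA | 2022 | 5 | 15 | 12 | 0 | 0 |
| 4 | 2022-06-25 19:15:00 | TeamB | 2022 | 6 | 25 | 19 | 15 | 0 |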


## Converting a Single Column

- Walk through the process of converting a single column containing date strings to `datetime`
- Discuss how to handle errors and invalid values using the `errors` parameter
- Illustrate the impact of specifying the date format when needed
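
As a sketch of why the date format matters, the same ambiguous strings can mean different dates depending on whether pandas guesses the layout or is told it. The `results` table and its values are hypothetical; without a format, pandas reads `03/04/2022` month first (US style).

```python
import pandas as pd

results = pd.DataFrame({
    "Match": ["M1", "M2", "M3"],
    "Played": ["03/04/2022", "10/04/2022", "not recorded"],
})

# No format given: pandas guesses month/day/year for these strings
guessed = pd.to_datetime(results["Played"], errors="coerce")

# Explicit day/month/year format: the same strings become different dates
explicit = pd.to_datetime(results["Played"], format="%d/%m/%Y", errors="coerce")

# In both cases the unparseable entry becomes NaT
print(pd.DataFrame({"Played": results["Played"], "guessed": guessed, "explicit": explicit}))
```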



## Handling Multiple Columns

- Extend the lesson to cover scenarios where you have multiple columns with date information.
- Demonstrate how to convert multiple columns simultaneously using `pd.to_datetime`.
- Discuss any additional considerations, such as dealing with different date formats.
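
One way to handle several date columns with different formats is to give each column its own explicit format. This is a minimal sketch with a hypothetical `schedule` table; once both columns are datetimes they can be combined arithmetically. Separately, `pd.to_datetime` can assemble a datetime from a DataFrame whose columns are named `year`, `month`, and `day`.

```python
import pandas as pd

schedule = pd.DataFrame({
    "Team": ["TeamA", "TeamB"],
    "GameDate": ["2022-01-15", "2022-02-20"],       # year-month-day
    "TicketsOnSale": ["01.12.2021", "05.01.2022"],  # day.month.year
})

# Each column gets its own explicit format, so differing styles are handled safely
column_formats = {"GameDate": "%Y-%m-%d", "TicketsOnSale": "%d.%m.%Y"}
for col, fmt in column_formats.items():
    schedule[col] = pd.to_datetime(schedule[col], format=fmt, errors="coerce")

# Both columns are now datetimes: days between tickets going on sale and the game
schedule["SalesWindowDays"] = (schedule["GameDate"] - schedule["TicketsOnSale"]).dt.days
print(schedule)

# A DataFrame with year/month/day columns can be assembled into one datetime column
parts = pd.DataFrame({"year": [2022, 2022], "month": [1, 2], "day": [15, 20]})
assembled = pd.to_datetime(parts)
print(assembled)
```

When several columns share one format, `schedule[cols].apply(pd.to_datetime, format=...)` is a shorter alternative to the loop; the loop is used here because each column has its own format.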

## Basic Date Exploration and Extraction



### Introduce basic exploration of datetime columns

- Show how to access datetime properties like `year`, `month`, `day`, `hour`, `minute`, and `second`.
- Demonstrate how to extract the day of the week using the `dt.day_name()` method.
- Illustrate other common operations like extracting the date, time, and year.
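
A compact sketch of these exploration steps on two made-up tip-off times: `.dt.date` and `.dt.time` split off the calendar date and clock time, `.dt.day_name()` and `.dt.month_name()` give readable labels, and `.dt.dayofweek` numbers the days with Monday as 0.

```python
import pandas as pd

tipoffs = pd.Series(pd.to_datetime(["2022-01-15 18:30:00", "2022-05-16 12:00:00"]))

summary = pd.DataFrame({
    "Date": tipoffs.dt.date,               # datetime.date objects (no time part)
    "Time": tipoffs.dt.time,               # datetime.time objects
    "Year": tipoffs.dt.year,
    "MonthName": tipoffs.dt.month_name(),
    "DayName": tipoffs.dt.day_name(),
    "DayOfWeek": tipoffs.dt.dayofweek,     # Monday=0 ... Sunday=6
    "IsWeekend": tipoffs.dt.dayofweek >= 5,
})
print(summary)
```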

## Practice Exercises

- Provide hands-on exercises where beginners can apply what they've learned.
- Encourage participants to clean and convert date columns in a sample dataset.
- Include exercises that involve extracting various components and performing basic analysis.


