<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#Dealing-with-dates" data-toc-modified-id="Dealing-with-dates-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>Dealing with dates</a></span><ul class="toc-item"><li><span><a href="#to_datetime()---convert-string-to-datetime" data-toc-modified-id="to_datetime()---convert-string-to-datetime-1.1"><span class="toc-item-num">1.1&nbsp;&nbsp;</span><code>to_datetime()</code> - convert string to datetime</a></span><ul class="toc-item"><li><span><a href="#Format-parameter" data-toc-modified-id="Format-parameter-1.1.1"><span class="toc-item-num">1.1.1&nbsp;&nbsp;</span><code>Format</code> parameter</a></span></li><li><span><a href="#More-parameters" data-toc-modified-id="More-parameters-1.1.2"><span class="toc-item-num">1.1.2&nbsp;&nbsp;</span>More parameters</a></span><ul class="toc-item"><li><span><a href="#dayfirst" data-toc-modified-id="dayfirst-1.1.2.1"><span class="toc-item-num">1.1.2.1&nbsp;&nbsp;</span><code>dayfirst</code></a></span></li><li><span><a href="#infer_datetime_format" data-toc-modified-id="infer_datetime_format-1.1.2.2"><span class="toc-item-num">1.1.2.2&nbsp;&nbsp;</span><code>infer_datetime_format</code></a></span></li><li><span><a href="#errors" data-toc-modified-id="errors-1.1.2.3"><span class="toc-item-num">1.1.2.3&nbsp;&nbsp;</span><code>errors</code></a></span></li></ul></li></ul></li><li><span><a href="#strftime()---changing-the-format-of-a-datetime" data-toc-modified-id="strftime()---changing-the-format-of-a-datetime-1.2"><span class="toc-item-num">1.2&nbsp;&nbsp;</span><code>strftime()</code> - changing the format of a datetime</a></span></li><li><span><a href="#timedelta()---Add/remove-a-period-of-time-to-a-date" data-toc-modified-id="timedelta()---Add/remove-a-period-of-time-to-a-date-1.3"><span class="toc-item-num">1.3&nbsp;&nbsp;</span><code>timedelta()</code> - Add/remove a period of time to a date</a></span><ul class="toc-item"><li><span><a 
href="#Lets-use-Python's-timedelta-function" data-toc-modified-id="Lets-use-Python's-timedelta-function-1.3.1"><span class="toc-item-num">1.3.1&nbsp;&nbsp;</span>Lets use Python's timedelta function</a></span></li><li><span><a href="#Lets-use-Pandas-to_timedelta()" data-toc-modified-id="Lets-use-Pandas-to_timedelta()-1.3.2"><span class="toc-item-num">1.3.2&nbsp;&nbsp;</span>Lets use Pandas <code>to_timedelta()</code></a></span></li></ul></li><li><span><a href="#dt-object---Extract-parts-of-the-date" data-toc-modified-id="dt-object---Extract-parts-of-the-date-1.4"><span class="toc-item-num">1.4&nbsp;&nbsp;</span><code>dt</code> object - Extract parts of the date</a></span></li><li><span><a href="#to_datetime()---create-a-datetime-from-several-columns-month,-day,-year..." data-toc-modified-id="to_datetime()---create-a-datetime-from-several-columns-month,-day,-year...-1.5"><span class="toc-item-num">1.5&nbsp;&nbsp;</span><code>to_datetime()</code> - create a datetime from several columns month, day, year...</a></span></li><li><span><a href="#Parsing-dates-when-reading-a-CSV" data-toc-modified-id="Parsing-dates-when-reading-a-CSV-1.6"><span class="toc-item-num">1.6&nbsp;&nbsp;</span>Parsing dates when reading a CSV</a></span></li><li><span><a href="#Setting-the-date-as-an-index" data-toc-modified-id="Setting-the-date-as-an-index-1.7"><span class="toc-item-num">1.7&nbsp;&nbsp;</span>Setting the date as an index</a></span></li></ul></li><li><span><a href="#💡-Check-for-understanding" data-toc-modified-id="💡-Check-for-understanding-2"><span class="toc-item-num">2&nbsp;&nbsp;</span>💡 Check for understanding</a></span></li><li><span><a href="#Summary" data-toc-modified-id="Summary-3"><span class="toc-item-num">3&nbsp;&nbsp;</span>Summary</a></span></li></ul></div>

# Dealing with dates

Python simplifies the process of working with dates through the built-in `datetime` module and specific `date` type objects. These tools handle the complexities inherent in date-related calculations, such as accounting for the varying number of days in different months, leap years, and week numbering conventions.
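
As a small illustration of the calendar logic mentioned above, the following sketch uses only the standard library. The specific dates are chosen for illustration and are not part of the lesson's data.

```python
from datetime import date, timedelta

# 2024 is a leap year, so the day after 28 February is 29 February.
leap_day = date(2024, 2, 28) + timedelta(days=1)
print(leap_day)  # 2024-02-29

# 2023 is not a leap year, so the same addition rolls over to March.
rollover = date(2023, 2, 28) + timedelta(days=1)
print(rollover)  # 2023-03-01

# ISO week numbering: 1 January 2021 belongs to ISO week 53 of 2020.
iso_year, iso_week, iso_weekday = date(2021, 1, 1).isocalendar()
print(iso_year, iso_week, iso_weekday)  # 2020 53 5
```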

However, for an even more streamlined experience when dealing with dates, particularly within the context of data analysis, the Pandas library is highly recommended. Pandas provides its own set of methods for managing date-type columns and for converting string-type columns to datetime64 type, making it an excellent choice for efficient date manipulation in Python.

In [None]:
import pandas as pd

In [None]:
df = pd.DataFrame({'index':[0,1],'date':['01-09-18 15:23:11','02-09-18 17:21:45']})
df

Unnamed: 0,index,date
0,0,01-09-18 15:23:11
1,1,02-09-18 17:21:45


In [None]:
df_copy = df.copy()

In [None]:
df.dtypes

index     int64
date     object
dtype: object

`date` is of type `object` since it's a *string*. If we want it as type `datetime64`, we can convert it using `to_datetime()`.

## `to_datetime()` - convert string to datetime

Dates often appear as strings. In order to perform operations with dates, we need them to be of a datetime type. Pandas has the `to_datetime()` function, which converts a Series of strings to `datetime64`.

In [None]:
pd.to_datetime(df.date)


0   2018-01-09 15:23:11
1   2018-02-09 17:21:45
Name: date, dtype: datetime64[ns]

We see that by default it places the month first and then the day (US format), unless that is impossible, for example because the first number is 23 and cannot be a month, in which case it reads the day first (European format). To control how the string is read, we use the `format` parameter.

### `Format` parameter

**Representation of elements of the date**

Each element of the date is represented by the combination of the percentage symbol "%" and a letter. In our case:
- `%d` represents the day of the month in 2 digit format
- `%m` represents the month in 2 digit format.
- `%y` represents the year in 2 digit format. Example: 23 (not 2023)
- `%Y` represents the year in 4 digit format. Example: 2023 (not 23)

The elements of the time are represented in the same way, using the following set of letters:

- `%H` represents the hour.
- `%M` represents the minutes.
- `%S` represents the seconds.
- `%f` represents the microseconds.
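
To see the codes above working together, here is a minimal sketch that parses a string with a four-digit year and microseconds. The sample string is made up for illustration.

```python
import pandas as pd

# Hypothetical sample value: day/month/year, then time with microseconds.
raw = pd.Series(["02/08/2023 15:23:11.250000"])

# Each directive in the format matches one element of the string.
parsed = pd.to_datetime(raw, format="%d/%m/%Y %H:%M:%S.%f")
print(parsed.iloc[0])  # 2023-08-02 15:23:11.250000
```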

In [None]:
df["date_2"] = pd.to_datetime(df.date,format='%d-%m-%y %H:%M:%S')

In [None]:
df.dtypes

index              int64
date              object
date_2    datetime64[ns]
dtype: object

We can see it changed to *datetime64[ns]*.

### More parameters

#### `dayfirst`

Instead of `format`, we can use the **dayfirst** parameter to tell pandas that the day comes before the month.

In [None]:
pd.to_datetime(df['date'], dayfirst=True) # change to false to see how it looks!


0   2018-09-01 15:23:11
1   2018-09-02 17:21:45
Name: date, dtype: datetime64[ns]

In [None]:
pd.to_datetime(df['date'], dayfirst=False)


0   2018-01-09 15:23:11
1   2018-02-09 17:21:45
Name: date, dtype: datetime64[ns]

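#### `infer_datetime_format`

This parameter asks pandas to infer the format from the first non-missing value. Since pandas 2.0 it is deprecated: that inference is now the default behaviour, so passing it has no effect.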

In [None]:
pd.to_datetime(df['date'], infer_datetime_format=True) # it doesn't infer them as we wanted it to


0   2018-01-09 15:23:11
1   2018-02-09 17:21:45
Name: date, dtype: datetime64[ns]

In [None]:
# same if we use / instead of -
df = pd.DataFrame({'index':[0,1],'date':['18/09/19 15:23:11','18/09/23 17:21:45']})
df

Unnamed: 0,index,date
0,0,18/09/19 15:23:11
1,1,18/09/23 17:21:45


In [None]:
pd.to_datetime(df['date'], infer_datetime_format=True) # since 18 is not a valid month, the first field is inferred as the day


0   2019-09-18 15:23:11
1   2023-09-18 17:21:45
Name: date, dtype: datetime64[ns]

In [None]:
# or with format (here the first field is read as the two-digit year)
df["date"] = pd.to_datetime(df.date,format='%y/%m/%d %H:%M:%S') # converts it to datetime; the result is displayed with - instead of /
df["date"]

0   2018-09-19 15:23:11
1   2018-09-23 17:21:45
Name: date, dtype: datetime64[ns]

#### `errors`

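The `errors` parameter controls what happens when a value cannot be parsed as a date. By default (`errors='raise'`) pandas raises an exception; `errors='ignore'` returns the input unchanged, and `errors='coerce'` replaces invalid values with `NaT` (Not a Time).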

In [None]:
df_error = pd.DataFrame({'date': ['3/10/2000', 'a/11/2000', '3/12/2000']})
df_error['date'] = pd.to_datetime(df_error['date'])


ValueError: time data "a/11/2000" doesn't match format "%m/%d/%Y", at position 1. You might want to try:
    - passing `format` if your strings have a consistent format;
    - passing `format='ISO8601'` if your strings are all ISO8601 but not necessarily in exactly the same format;
    - passing `format='mixed'`, and the format will be inferred for each element individually. You might want to use `dayfirst` alongside this.

In [None]:
df_error['date'] = pd.to_datetime(df_error['date'], errors='ignore')
df_error

Unnamed: 0,date
0,3/10/2000
1,a/11/2000
2,3/12/2000


In [None]:
df_error['date'] = pd.to_datetime(df_error['date'], errors='coerce')
df_error

Unnamed: 0,date
0,2000-03-10
1,NaT
2,2000-03-12
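
After coercing, it is often useful to check how many values failed to parse and which ones they were. The sketch below reuses the same three strings; the explicit `format` and the variable names are choices made for this illustration.

```python
import pandas as pd

raw = pd.Series(["3/10/2000", "a/11/2000", "3/12/2000"])

# Invalid strings become NaT instead of raising an exception.
parsed = pd.to_datetime(raw, format="%m/%d/%Y", errors="coerce")

# Use the NaT mask to recover the original strings that failed.
bad_rows = raw[parsed.isna()]
print(parsed.isna().sum())  # 1
print(bad_rows.tolist())    # ['a/11/2000']
```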


Pandas always displays datetime values in the `%Y-%m-%d` format. If we want a different format, we need to apply it after converting the column from a string type to a `datetime64` type.

## `strftime()` - changing the format of a datetime

We can change the date format, for example by replacing the dashes with slashes using the strftime method. **It returns a String.**

Note: It is called on the **'dt' object** of a Series (datetime).

In [None]:
# Series.dt is the accessor for the datetime-like properties of the Series values
df["date_2"] = df.date.dt.strftime('%d %B %Y')
df["date_2"]

0    19 September 2018
1    23 September 2018
Name: date_2, dtype: object

In [None]:
df.dtypes

index              int64
date      datetime64[ns]
date_2            object
dtype: object

The column is converted back to a string, which means we will lose the methods specific to dates.
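
The following sketch illustrates that loss, assuming a small Series built for the purpose: the `.dt` accessor is rejected on the formatted strings, and the values have to be parsed again to get it back.

```python
import pandas as pd

dates = pd.Series(pd.to_datetime(["2018-09-19", "2018-09-23"]))
as_text = dates.dt.strftime("%d %B %Y")  # now plain strings

try:
    as_text.dt.year  # .dt only works on datetime-like values
except AttributeError as exc:
    print("Not a datetime column:", exc)

# Parsing the strings again restores the datetime methods.
restored = pd.to_datetime(as_text, format="%d %B %Y")
print(restored.dt.year.tolist())  # [2018, 2018]
```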

For example, we can replace the dashes with slashes:

In [None]:
df.date.dt.strftime("%Y/%m/%d")

0    2018/09/19
1    2018/09/23
Name: date, dtype: object

We can also switch from a long year format to a short one by changing `"%Y"` to `"%y"`.

In [None]:
df.date.dt.strftime("%y/%m/%d")

0    18/09/19
1    18/09/23
Name: date, dtype: object

We can change the month number to its English name using `"%B"` and, of course, change the order of the elements according to our preferences:

In [None]:
df.date.dt.strftime("%Y/%B/%d")

0    2018/September/19
1    2018/September/23
Name: date, dtype: object

In [None]:
df.date.dt.strftime("%d/%m/%Y")

0    19/09/2018
1    23/09/2018
Name: date, dtype: object

We can also work with both the date and time.

It is not mandatory to declare all the elements that make up the `datetime` object when we change the format, but the ones that are not declared will not appear. So, if we don't want microseconds to appear in our variable, we simply leave `%f` out of the format.

In [None]:
df.date.dt.strftime("%d/%m/%Y %H:%M:%S")

0    19/09/2018 15:23:11
1    23/09/2018 17:21:45
Name: date, dtype: object

## `timedelta()` - Add/remove a period of time to a date

The really interesting thing about datetime type objects is that they possess the logic of date operations, so we don't have to worry about the days a month has or about leap years, since Python itself will take care of these considerations.

### Lets use Python's timedelta function

In [None]:
from datetime import timedelta

df["date_3"] = df.date + timedelta(days=3)
df

Unnamed: 0,index,date,date_2,date_3
0,0,2018-09-19 15:23:11,19 September 2018,2018-09-22 15:23:11
1,1,2018-09-23 17:21:45,23 September 2018,2018-09-26 17:21:45


In [None]:
df.date + timedelta(hours=24)

0   2018-09-20 15:23:11
1   2018-09-24 17:21:45
Name: date, dtype: datetime64[ns]

In [None]:
df.date + timedelta(days=31)

0   2018-10-20 15:23:11
1   2018-10-24 17:21:45
Name: date, dtype: datetime64[ns]

When we add 31 days to our current date, Python automatically determines whether the month has 30 or 31 days and will add as many units to the month as necessary.

However, we cannot add whole months this way: `timedelta` has no `months` argument, because the number of days in a month varies.

In [None]:
df.date+timedelta(months=1)

TypeError: 'months' is an invalid keyword argument for __new__()
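
One way around this limitation, which the lesson itself does not cover, is pandas' `pd.DateOffset`, which does accept `months`. The sketch below uses made-up dates to show how the end of the month is handled.

```python
import pandas as pd

dates = pd.Series(pd.to_datetime(["2018-01-31 15:23:11", "2018-09-23 17:21:45"]))

# DateOffset moves by calendar months; 31 January lands on the
# last valid day of February because 31 February does not exist.
shifted = dates + pd.DateOffset(months=1)
print(shifted.iloc[0])  # 2018-02-28 15:23:11
print(shifted.iloc[1])  # 2018-10-23 17:21:45
```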

### Lets use Pandas `to_timedelta()`

The `pd.to_timedelta()` function in Pandas is used to convert a scalar, array, list, or series from a recognized time format or value into a Timedelta type. Timedelta is a type that represents a duration, the difference between two dates or times.

The `pd.to_timedelta()` function can accept several types of arguments:

- A single string or a list of strings: These should represent a duration. For example, '1 days', '1 days 00:00:00', '1 days 2 hours', '1D', etc.
- An integer, float, array of these, or a Series: These should represent the duration in terms of the unit specified.

You can specify the unit of the input with the `unit` parameter. For instance, if you provide an integer with `unit='s'`, the function will interpret the input as seconds. If you provide an integer with `unit='m'`, the function will interpret the input as minutes.

In [None]:
df.date+pd.to_timedelta(1, unit='m')

0   2018-09-19 15:24:11
1   2018-09-23 17:22:45
Name: date, dtype: datetime64[ns]
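
The other input types described above can be sketched in the same way. The values here are illustrative; the last part shows that subtracting two timestamps also produces a Timedelta.

```python
import pandas as pd

# A string describing a duration.
duration = pd.to_timedelta("1 days 2 hours")
print(duration)  # 1 days 02:00:00

# A list of numbers interpreted with the given unit (seconds here).
seconds = pd.to_timedelta([30, 90], unit="s")
print(seconds[1] == pd.Timedelta(minutes=1, seconds=30))  # True

# The difference between two timestamps is also a Timedelta.
elapsed = pd.Timestamp("2018-09-23 17:21:45") - pd.Timestamp("2018-09-19 15:23:11")
print(elapsed)  # 4 days 01:58:34
```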

## `dt` object - Extract parts of the date

To extract each part of the date, pandas exposes a set of attributes and methods through the `dt` (datetime) accessor of a Series.

In [None]:
df.date.dt.year # Extracts the year from the date. For instance, '2023-08-02' will return 2023.

0    2018
1    2018
Name: date, dtype: int32

In [None]:
df.date.dt.day # Extracts the day of the month from the date. For '2023-08-02', this will return 2.

0    19
1    23
Name: date, dtype: int32

In [None]:
df.date.dt.month # Extracts the numerical month from the date. For example, '2023-08-02' will return 8.

0    9
1    9
Name: date, dtype: int32

In [None]:
df.date.dt.hour # Extracts the hour from the time component of the datetime. For a date like '2023-08-02 15:23:11', this will return 15.

0    15
1    17
Name: date, dtype: int32

In [None]:
df.date.dt.minute # Extracts the minutes from the time component of the datetime. '2023-08-02 15:23:11' would return 23.

0    23
1    21
Name: date, dtype: int32

In [None]:
df.date.dt.second

0    11
1    45
Name: date, dtype: int32

We can also obtain information from the `datetime` object that is not directly visible in the displayed value, such as the day of the week.

In [None]:
df.date.dt.weekday # This extracts the weekday from the date. The days are numbered from 0 (Monday) to 6 (Sunday).

0    2
1    6
Name: date, dtype: int32

In [None]:
df.date.dt.isocalendar() # This returns a DataFrame with the year, week number, and weekday as per ISO 8601.

Unnamed: 0,year,week,day
0,2018,38,3
1,2018,38,7
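
When names are more convenient than numbers, the `dt` accessor also offers `day_name()` and `month_name()`. A minimal sketch, rebuilding the two dates used above so that it runs on its own:

```python
import pandas as pd

dates = pd.Series(pd.to_datetime(["2018-09-19 15:23:11", "2018-09-23 17:21:45"]))

print(dates.dt.day_name().tolist())    # ['Wednesday', 'Sunday']
print(dates.dt.month_name().tolist())  # ['September', 'September']
print(dates.dt.quarter.tolist())       # [3, 3]
```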


## `to_datetime()` - create a datetime from several columns month, day, year...

`to_datetime()` can also assemble a datetime from multiple columns. The `year`, `month` and `day` columns are required. The keys (column labels) can be common abbreviations like `['year', 'month', 'day', 'minute', 'second', 'ms', 'us', 'ns']` or plurals of the same.

In [None]:
df = pd.DataFrame({'year': [2015, 2016],
                   'month': [2, 3],
                   'day': [4, 5]})
df['date'] = pd.to_datetime(df)
df

Unnamed: 0,year,month,day,date
0,2015,2,4,2015-02-04
1,2016,3,5,2016-03-05
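
Time columns can be included in the same way. The sketch below adds hypothetical `hour` and `minute` columns to the example above.

```python
import pandas as pd

# The hour and minute values are made up for illustration.
parts = pd.DataFrame({"year": [2015, 2016],
                      "month": [2, 3],
                      "day": [4, 5],
                      "hour": [10, 23],
                      "minute": [30, 5]})

stamps = pd.to_datetime(parts)
print(stamps.iloc[0])  # 2015-02-04 10:30:00
print(stamps.iloc[1])  # 2016-03-05 23:05:00
```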


## Parsing dates when reading a CSV

We can specify which fields are dates so that they are read as such using the parameter `parse_dates=['column_name']`.

The following CSV contains information about football matches, including the date, home and away teams, scores, tournament type, city, country, and whether the match was played at a neutral venue.

In [None]:
url = "https://raw.githubusercontent.com/data-bootcamp-v4/data/main/results.csv"

In [None]:
df = pd.read_csv(url)
df.head()

Unnamed: 0,date,home_team,away_team,home_score,away_score,tournament,city,country,neutral
0,1872-11-30,Scotland,England,0.0,0.0,Friendly,Glasgow,Scotland,False
1,1873-03-08,England,Scotland,4.0,2.0,Friendly,London,England,False
2,1874-03-07,Scotland,England,2.0,1.0,Friendly,Glasgow,Scotland,False
3,1875-03-06,England,Scotland,2.0,2.0,Friendly,London,England,False
4,1876-03-04,Scotland,England,3.0,0.0,Friendly,Glasgow,Scotland,False


In [None]:
df.dtypes

date           object
home_team      object
away_team      object
home_score    float64
away_score    float64
tournament     object
city           object
country        object
neutral          bool
dtype: object

In [None]:
df = pd.read_csv(url, parse_dates = ["date"])
df.dtypes

date          datetime64[ns]
home_team             object
away_team             object
home_score           float64
away_score           float64
tournament            object
city                  object
country               object
neutral                 bool
dtype: object
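
The same behaviour can be tried without a network connection. This sketch reads a two-row CSV from an in-memory string (the rows are copied from the output above) and checks the resulting type.

```python
import io
import pandas as pd

csv_text = (
    "date,home_team,away_team\n"
    "1872-11-30,Scotland,England\n"
    "1873-03-08,England,Scotland\n"
)

# parse_dates converts the listed columns while the file is being read.
matches = pd.read_csv(io.StringIO(csv_text), parse_dates=["date"])
print(pd.api.types.is_datetime64_any_dtype(matches["date"]))  # True
print(matches["date"].dt.year.tolist())  # [1872, 1873]
```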

## Setting the date as an index

If you are going to do a lot of selections by date, it would be faster to set the date column as the index first so you take advantage of the Pandas built-in optimization.

Instead of doing

```python
condition = (df['date'] > start_date) & (df['date'] <= end_date)

df.loc[condition]
```

With the date as index, you could do
```python
df.loc[start_date:end_date]
```
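
A self-contained sketch of this kind of selection, using a small made-up DataFrame. Sorting the index first is an assumption added here, since slicing by date is most reliable on a sorted index.

```python
import pandas as pd

# Hypothetical data for illustration only.
matches = pd.DataFrame({
    "date": pd.to_datetime(["2021-12-30", "2022-01-02", "2022-01-05",
                            "2022-09-27", "2023-01-01"]),
    "home_score": [1, 0, 3, 5, 2],
})
by_date = matches.set_index("date").sort_index()

year_2022 = by_date.loc["2022"]                  # every row in 2022
jan_2022 = by_date.loc["2022-01"]                # every row in January 2022
window = by_date.loc["2022-01-03":"2022-09-27"]  # both ends are included

print(len(year_2022), len(jan_2022), len(window))  # 3 2 2
```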

In [None]:
df.head(1)

Unnamed: 0,date,home_team,away_team,home_score,away_score,tournament,city,country,neutral
0,1872-11-30,Scotland,England,0.0,0.0,Friendly,Glasgow,Scotland,False


In [None]:
# Without the date as index
condition = (df['date'] > '2022-1-1') & (df['date'] <= '2023-1-1')

df.loc[condition]

Unnamed: 0,date,home_team,away_team,home_score,away_score,tournament,city,country,neutral
43319,2022-01-02,Gabon,Burkina Faso,0.0,3.0,Friendly,Dubai,United Arab Emirates,True
43320,2022-01-02,Sudan,Zimbabwe,0.0,0.0,Friendly,Yaoundé,Cameroon,True
43321,2022-01-03,Rwanda,Guinea,3.0,0.0,Friendly,Kigali,Rwanda,False
43322,2022-01-04,Mauritania,Gabon,1.0,1.0,Friendly,Dubai,United Arab Emirates,True
43323,2022-01-05,Algeria,Ghana,3.0,0.0,Friendly,Al Rayyan,Qatar,True
...,...,...,...,...,...,...,...,...,...
44055,2022-09-27,Norway,Serbia,0.0,2.0,UEFA Nations League,Oslo,Norway,False
44056,2022-09-27,Sweden,Slovenia,1.0,1.0,UEFA Nations League,Stockholm,Sweden,False
44057,2022-09-27,Kosovo,Cyprus,5.0,1.0,UEFA Nations League,Pristina,Kosovo,False
44058,2022-09-27,Greece,Northern Ireland,3.0,1.0,UEFA Nations League,Athens,Greece,False


In [None]:
# With the date as index
df.set_index("date", drop=True, inplace=True)

In [None]:
df.loc['2022':'2023']

date,home_team,away_team,home_score,away_score,tournament,city,country,neutral
2022-01-02,Gabon,Burkina Faso,0.0,3.0,Friendly,Dubai,United Arab Emirates,True
2022-01-02,Sudan,Zimbabwe,0.0,0.0,Friendly,Yaoundé,Cameroon,True
2022-01-03,Rwanda,Guinea,3.0,0.0,Friendly,Kigali,Rwanda,False
2022-01-04,Mauritania,Gabon,1.0,1.0,Friendly,Dubai,United Arab Emirates,True
2022-01-05,Algeria,Ghana,3.0,0.0,Friendly,Al Rayyan,Qatar,True
...,...,...,...,...,...,...,...,...
2022-09-27,Norway,Serbia,0.0,2.0,UEFA Nations League,Oslo,Norway,False
2022-09-27,Sweden,Slovenia,1.0,1.0,UEFA Nations League,Stockholm,Sweden,False
2022-09-27,Kosovo,Cyprus,5.0,1.0,UEFA Nations League,Pristina,Kosovo,False
2022-09-27,Greece,Northern Ireland,3.0,1.0,UEFA Nations League,Athens,Greece,False


In [None]:
# Select a specific month
# To select a specific day instead, for example 1 May 2018, use df.loc['2018-5-1']
df.loc['2018-5'].head(3)

date,home_team,away_team,home_score,away_score,tournament,city,country,neutral
2018-05-06,Barawa,Surrey,1.0,3.0,Friendly,London,England,False
2018-05-08,Iraq,Palestine,0.0,0.0,Friendly,Basra,Iraq,False
2018-05-09,Algeria,Saudi Arabia,0.0,2.0,Friendly,Cádiz,Spain,True


There is no time in this dataframe, but if there were, we could just do:
```python
df.between_time('10:30','10:45')
```
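
To make this concrete, here is a minimal sketch with a made-up DataFrame whose index does include times. Note that `between_time` looks only at the time of day, so rows from different dates can match.

```python
import pandas as pd

# Hypothetical readings taken at different times on two days.
readings = pd.DataFrame(
    {"value": [1, 2, 3, 4]},
    index=pd.to_datetime(["2018-09-19 10:00", "2018-09-19 10:30",
                          "2018-09-20 10:40", "2018-09-20 11:00"]),
)

selected = readings.between_time("10:30", "10:45")
print(selected["value"].tolist())  # [2, 3]
```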

In [None]:
# We can also perform aggregations
# Let's get the total home score in 2018

df.loc['2018',"home_score"].sum()

1409.0
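
To get the total for every year at once rather than one year at a time, one option is to group by the year of the index. The sketch below uses made-up scores, not the football dataset.

```python
import pandas as pd

# Hypothetical scores indexed by match date.
matches = pd.DataFrame(
    {"home_score": [1.0, 3.0, 0.0, 2.0]},
    index=pd.to_datetime(["2017-06-01", "2018-03-10",
                          "2018-11-20", "2019-01-05"]),
)
matches.index.name = "date"

# Group rows by the year of their date and sum the scores per year.
totals = matches.groupby(matches.index.year)["home_score"].sum()
print(totals.loc[2018])  # 3.0
```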

# 💡 Check for understanding

**Analyzing Football Shootouts Data**

The dataset you have is a collection of football shootouts between various teams, with the following columns:
- `date`: The date of the match in the format YYYY-MM-DD.
- `home_team`: The name of the home team.
- `away_team`: The name of the away team.
- `winner`: The name of the winning team.

Your tasks are:

1. **Load the Dataset**: Load the CSV file into a Pandas DataFrame.

2. **Convert Date Column**: Convert the `date` column to a Pandas datetime object. Ensure that the format is correct.

3. **Time Analysis**:
    - Find the earliest and latest dates in the dataset.
    - Extract the month and year from the dates, and create two new columns `month` and `year` in the DataFrame.
    - Find out which specific month and year had the most shootouts in the dataset.   

**Tips**
- Utilize Pandas' date-related functions like `dt.month`, `dt.year`, etc., for extracting components of the date.
- Consider using groupby, aggregation, and sorting to perform the analyses.
- *Hint: To find out which specific month and year had the most shootouts in the dataset, you'll need to group the data by both the year and month, then count the number of occurrences for each group to identify the month and year with the maximum number of shootouts. Check the `size` method.*

In [2]:
import pandas as pd
url = "https://raw.githubusercontent.com/data-bootcamp-v4/data/main/shootouts.csv"
df = pd.read_csv(url)

In [3]:
df.head()

Unnamed: 0,date,home_team,away_team,winner
0,1967-08-22,India,Taiwan,Taiwan
1,1971-11-14,South Korea,Vietnam Republic,South Korea
2,1972-05-17,Thailand,South Korea,South Korea
3,1972-05-19,Thailand,Cambodia,Thailand
4,1973-04-21,Senegal,Ghana,Ghana


In [4]:
df.dtypes

date         object
home_team    object
away_team    object
winner       object
dtype: object

In [5]:
#Convert date column from object to datetime64

df = pd.read_csv(url, parse_dates = ["date"])
df.dtypes

date         datetime64[ns]
home_team            object
away_team            object
winner               object
dtype: object

In [6]:
pd.to_datetime(df['date'])

0     1967-08-22
1     1971-11-14
2     1972-05-17
3     1972-05-19
4     1973-04-21
         ...    
500   2022-03-29
501   2022-06-13
502   2022-06-14
503   2022-09-22
504   2022-09-25
Name: date, Length: 505, dtype: datetime64[ns]

In [9]:
df['date'].min()

Timestamp('1967-08-22 00:00:00')

In [10]:
df['date'].max()

Timestamp('2022-09-25 00:00:00')

In [11]:
df["month"] = df.date.dt.strftime('%B')
df["year"] = df.date.dt.strftime('%Y')
df

Unnamed: 0,date,home_team,away_team,winner,month,year
0,1967-08-22,India,Taiwan,Taiwan,August,1967
1,1971-11-14,South Korea,Vietnam Republic,South Korea,November,1971
2,1972-05-17,Thailand,South Korea,South Korea,May,1972
3,1972-05-19,Thailand,Cambodia,Thailand,May,1972
4,1973-04-21,Senegal,Ghana,Ghana,April,1973
...,...,...,...,...,...,...
500,2022-03-29,Senegal,Egypt,Senegal,March,2022
501,2022-06-13,Australia,Peru,Australia,June,2022
502,2022-06-14,Chile,Ghana,Ghana,June,2022
503,2022-09-22,Thailand,Malaysia,Malaysia,September,2022


In [17]:
df_shootouts = df.groupby([df['month'], df['year']]).size()
month, year = df_shootouts.idxmax()
df_shootouts.idxmax(), df_shootouts[month, year]

(('June', '2016'), 12)
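
An alternative sketch for the last task, shown here on made-up dates rather than the shootouts file: `dt.to_period('M')` collapses each date to its year and month, so a single `value_counts()` finds the busiest month without creating separate columns.

```python
import pandas as pd

# Hypothetical dates for illustration only.
shootouts = pd.DataFrame({
    "date": pd.to_datetime(["2016-06-10", "2016-06-21",
                            "2016-07-02", "2019-01-15"]),
})

# Each date becomes a monthly period such as 2016-06.
counts = shootouts["date"].dt.to_period("M").value_counts()
busiest = counts.idxmax()
print(busiest, counts.max())  # 2016-06 2
```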

# Summary

This lesson introduced handling dates in Python through the built-in datetime module and the Pandas library.

1. **to_datetime() - Convert String to Datetime**
    - Converts string to datetime64 type.
    - **Format parameter**: Uses combinations of "%" and a letter to represent elements of the date and time.
    - **More parameters**: `dayfirst`, `infer_datetime_format`, and `errors` to handle various date-related situations.

2. **strftime() - Changing the Format of a Datetime**
    - Changes the date format, returning a string.
    - Allows customization of date and time format.

3. **timedelta() - Add/Remove a Period of Time to a Date**
    - Allows addition or subtraction of specific periods to a date.
    - Uses Python's `timedelta` function or Pandas' `to_timedelta()`.

4. **dt Object - Extract Parts of the Date**
    - Extracts specific parts like year, month, day, hour, minute, second.
    - Provides more complex information such as the day of the week.

5. **to_datetime() - Create a Datetime from Several Columns (month, day, year...)**
    - Assembles a datetime from multiple columns using common abbreviations.

6. **Parsing Dates When Reading a CSV**
    - Specifies date fields when reading a CSV using `parse_dates=['column_name']`.

7. **Setting the Date as an Index**
    - Allows faster selections by date.
    - Enables specific date and time operations, like selecting a specific month or performing aggregations.
