<a href="https://colab.research.google.com/github/OptimalDecisions/sports-analytics-foundations/blob/main/pandas-basics/Pandas_Intermediate_2_9_Time_Based_Index.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>


  ## Pandas Basics 2.9

  # Making Time the Index

  <img src = "../img/sa_logo.png" width="100" align="left">

  Ram Narasimhan

  <br><br><br>

  << [Writing to Files](Pandas_Basics_2_7_Writing_to_Files.ipynb) | [Time Series](Pandas_Intermediate_2_8_Time_Series.ipynb) | [Merging Dataframes](Pandas_Intermediate_2_9_Merging_DataFrames.ipynb) >>



## Using Time Column as the Index



One convenient feature of Pandas is that the index (the "row names") does not have to consist of integers. It can hold any label, including dates and times.

For time-ordered data, it is often advantageous to make the time column the index. This notebook sets the "Year" column as the new index, explains the benefits of doing so, and shows how to leverage the index for various operations.
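As a minimal sketch of what non-integer row labels look like in practice, the snippet below builds a three-row sample (a subset of the World Cup data used in this notebook) with years as the index. The names `sample`, `winner_by_label`, and `winner_by_position` are illustrative only.

```python
import pandas as pd

# Illustrative three-row sample; years are used as row labels instead of 0, 1, 2
sample = pd.DataFrame(
    {"Winner": ["India", "Australia", "England"]},
    index=[2011, 2015, 2019],
)

# .loc looks rows up by label; .iloc looks them up by position
winner_by_label = sample.loc[2015, "Winner"]
winner_by_position = sample.iloc[1]["Winner"]

print(winner_by_label, winner_by_position)
```

Both lookups reach the same row; the label-based one reads closer to the question being asked ("who won in 2015?").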




In [1]:
import pandas as pd

# Data for ICC Cricket World Cup winners and host countries
data = {
    'Year': [1975, 1979, 1983, 1987, 1992, 1996, 1999, 2003, 2007, 2011, 2015, 2019],
    'Winner': ['West Indies', 'West Indies', 'India', 'Australia', 'Pakistan', 'Sri Lanka',
               'Australia', 'Australia', 'Australia', 'India', 'Australia', 'England'],
    'Host': ['England', 'England', 'England', 'India', 'Australia', 'India',
             'England', 'South Africa', 'West Indies', 'India, Sri Lanka, Bangladesh', 'Australia, New Zealand', 'England']
}

# Create DataFrame
df = pd.DataFrame(data)

df


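| | Year | Winner | Host |
|---|---|---|---|
| 0 | 1975 | West Indies | England |
| 1 | 1979 | West Indies | England |
| 2 | 1983 | India | England |
| 3 | 1987 | Australia | India |
| 4 | 1992 | Pakistan | Australia |
| 5 | 1996 | Sri Lanka | India |
| 6 | 1999 | Australia | England |
| 7 | 2003 | Australia | South Africa |
| 8 | 2007 | Australia | West Indies |
| 9 | 2011 | India | India, Sri Lanka, Bangladesh |
| 10 | 2015 | Australia | Australia, New Zealand |
| 11 | 2019 | England | England |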


### Why Set the "Year" Column as the Index?

Setting the "Year" column as the index in the DataFrame can offer several advantages:

**Efficient Data Retrieval**: When the index is set to the "Year" column, Pandas can locate the rows for a specific year directly by label, which makes it quick to access information about the winner in a particular year.

**Enhanced Readability**: The index provides a more meaningful and intuitive way to reference rows in the DataFrame, especially when dealing with time-series data such as World Cup winners.
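To make the two advantages concrete, here is a small sketch that compares a lookup with and without a "Year" index. It rebuilds a three-row subset of the data so that it runs on its own; `wc`, `wc_by_year`, `winner_mask`, and `winner_label` are hypothetical names used only for this illustration.

```python
import pandas as pd

# Small subset of the World Cup data, rebuilt here so the snippet is self-contained
wc = pd.DataFrame({
    "Year": [2003, 2007, 2011],
    "Winner": ["Australia", "Australia", "India"],
})

# Without a Year index: filter with a boolean mask, then pull out the first match
winner_mask = wc.loc[wc["Year"] == 2011, "Winner"].iloc[0]

# With Year as the index: a single label-based lookup
wc_by_year = wc.set_index("Year")
winner_label = wc_by_year.loc[2011, "Winner"]

print(winner_mask, winner_label)
```

On a table this small the speed difference is negligible; the readability gain is the more visible benefit.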



### How to Set the "Year" Column as the Index:



In [2]:
# Assuming 'df' is the DataFrame containing ICC Cricket World Cup winners
df.set_index('Year', inplace=True)

# Convert the Year to be in datetime format
df.index = pd.to_datetime(df.index, format='%Y')



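This operation does two things. `set_index('Year', inplace=True)` makes the "Year" column the index and modifies `df` in place rather than returning a new DataFrame. `pd.to_datetime(df.index, format='%Y')` then converts the integer years into datetime values, so each year is stored as January 1 of that year.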
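If you would rather keep the original DataFrame untouched, the same result can be reached without `inplace=True`. The sketch below assumes a three-row subset of the data; `raw` and `indexed` are illustrative names.

```python
import pandas as pd

raw = pd.DataFrame({
    "Year": [1975, 1979, 1983],
    "Winner": ["West Indies", "West Indies", "India"],
})

# set_index returns a new DataFrame by default, so `raw` is left unchanged
indexed = raw.set_index("Year")

# Convert the integer years into datetime values (January 1 of each year)
indexed.index = pd.to_datetime(indexed.index, format="%Y")

print(type(indexed.index).__name__)  # class of the new index
print(indexed.index[0])              # first label after conversion
print("Year" in raw.columns)         # the original still has its Year column
```

Assigning the result to a new name makes it easy to compare the two versions side by side while you experiment.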

In [3]:
df

| Year | Winner | Host |
|---|---|---|
| 1975-01-01 | West Indies | England |
| 1979-01-01 | West Indies | England |
| 1983-01-01 | India | England |
| 1987-01-01 | Australia | India |
| 1992-01-01 | Pakistan | Australia |
| 1996-01-01 | Sri Lanka | India |
| 1999-01-01 | Australia | England |
| 2003-01-01 | Australia | South Africa |
| 2007-01-01 | Australia | West Indies |
| 2011-01-01 | India | India, Sri Lanka, Bangladesh |
| 2015-01-01 | Australia | Australia, New Zealand |
| 2019-01-01 | England | England |


### Leveraging the Index

Print Winners in a Specific Year

In [4]:
# Print the winner for the year 2011
print(df.loc["2011"])

           Winner                          Host
Year                                           
2011-01-01  India  India, Sri Lanka, Bangladesh
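Notice that the lookup above returns a one-row DataFrame rather than a single value. A year string such as `"2011"` matches every row that falls within that year, whereas a full timestamp is an exact label. The sketch below illustrates the difference on a three-row subset; `wc`, `rows_2011`, and `winner_2011` are hypothetical names.

```python
import pandas as pd

wc = pd.DataFrame(
    {"Winner": ["Australia", "India", "Australia"]},
    index=pd.to_datetime(["2007", "2011", "2015"], format="%Y"),
)

# A year string selects all rows in that year, so the result is a DataFrame
rows_2011 = wc.loc["2011"]

# An exact Timestamp plus a column name gives back a single value
winner_2011 = wc.loc[pd.Timestamp("2011-01-01"), "Winner"]

print(type(rows_2011).__name__, winner_2011)
```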


Slicing the Index for Time-Based Selection



In [None]:
# Select winners for the years 1996 to 2007 (inclusive)
df.loc["1996":"2007", "Winner"]

Year
1996-01-01    Sri Lanka
1999-01-01    Australia
2003-01-01    Australia
2007-01-01    Australia
Name: Winner, dtype: object

In [None]:
df.loc[:"2000"]  # Display rows up to and including the year 2000

| Year | Winner | Host |
|---|---|---|
| 1975-01-01 | West Indies | England |
| 1979-01-01 | West Indies | England |
| 1983-01-01 | India | England |
| 1987-01-01 | Australia | India |
| 1992-01-01 | Pakistan | Australia |
| 1996-01-01 | Sri Lanka | India |
| 1999-01-01 | Australia | England |


### Advantages and Considerations for Time-Based Indexing

- Efficient Searches: With the "Year" column as the index, a specific year is found with a direct label lookup rather than a filter on a column.

- Time-Based Slicing: The index allows for easy time-based slicing, making it convenient to extract data for a range of years.

- Improved Readability: The index provides a clear and meaningful label for each row, enhancing the readability of the DataFrame.

- Inclusive Endpoints: Label-based slices include both ends, so `"1996":"2007"` returns the 2007 row as well, unlike positional slices.
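The points above can be combined in one short sketch: slice a range of years, recover the integer years from the index, and summarise the selection. It assumes a five-row subset of the data; `wc`, `span`, `years`, and `titles` are illustrative names.

```python
import pandas as pd

wc = pd.DataFrame(
    {"Winner": ["Sri Lanka", "Australia", "Australia", "Australia", "India"]},
    index=pd.to_datetime(["1996", "1999", "2003", "2007", "2011"], format="%Y"),
)
wc.index.name = "Year"

# Label-based slice, inclusive on both ends: tournaments from 1999 through 2007
span = wc.loc["1999":"2007"]

# A DatetimeIndex exposes .year, so the plain integer years are still available
years = span.index.year.tolist()

# Count titles per team within the selected span
titles = span["Winner"].value_counts()

print(years)
print(titles)
```

Keeping the years as datetime values costs nothing here, since `.year` recovers the integers whenever they are needed.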



### Summary

Setting a column as the index in a Pandas DataFrame, especially for time-series data, offers improved efficiency, readability, and ease of data retrieval. In the case of the ICC Cricket World Cup winners DataFrame, setting the "Year" column as the index allows for convenient access to information about winners in specific years and facilitates time-based slicing operations.


