In [29]:
# Import the dependencies and libraries:
import pandas as pd
from pathlib import Path

In [30]:
# IMPORT S&P 500 DATA
# A common example of time series data in the FinTech space involves the stock market.
# This is because we can measure the price of each stock at specific intervals throughout the trading day, such as every minute, hour, or day. (The daily interval gives us the closing price.)
# We'll analyze the S&P 500, an index that tracks about 500 of the largest publicly traded companies in the United States.

# Read the S&P 500 CSV data into a DataFrame:
sp500_df = pd.read_csv(
    Path('sp500.csv')
)

# Display the first and last five rows of data:
display(sp500_df.head())
display(sp500_df.tail())

Unnamed: 0,time,close
0,2019-01-02 12:45:00+00:00,246.16
1,2019-01-02 13:00:00+00:00,246.15
2,2019-01-02 13:15:00+00:00,245.5
3,2019-01-02 13:30:00+00:00,245.71
4,2019-01-02 13:45:00+00:00,245.76


Unnamed: 0,time,close
9323,2019-12-30 20:30:00+00:00,321.1
9324,2019-12-30 20:45:00+00:00,321.12
9325,2019-12-30 21:00:00+00:00,321.23
9326,2019-12-30 21:30:00+00:00,321.23
9327,2019-12-30 21:45:00+00:00,321.2


In [31]:
# Now let's examine the data types of the data that we just read by using the `dtypes` attribute:
sp500_df.dtypes

time      object
close    float64
dtype: object

In [32]:
# Notice that the output states that the 'time' column has the `object` data type. Why is that?
# By default, `read_csv` reads the date and time values in the 'time' column as generic strings, so time-based operations on them won't work as intended.
# Because we want to do complex calculations with our date and time information, let's convert these values to a proper date and time type.
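# To make the limitation concrete, here is a minimal sketch that uses a hypothetical two-row sample (`sample_df` is not part of the S&P 500 file).
# It shows that subtracting the raw strings fails, while the converted values support date arithmetic.

```python
import pandas as pd

# Hypothetical two-row sample that mimics the 'time' and 'close' columns
sample_df = pd.DataFrame({
    'time': ['2019-01-02 12:45:00+00:00', '2019-01-02 13:00:00+00:00'],
    'close': [246.16, 246.15],
})

# As plain strings, the values are not recognized as datetimes
is_datetime = pd.api.types.is_datetime64_any_dtype(sample_df['time'])
print(is_datetime)  # False

# Subtracting one string from another is not date arithmetic; it raises a TypeError
try:
    sample_df['time'].iloc[1] - sample_df['time'].iloc[0]
except TypeError as error:
    print('Cannot subtract strings:', error)

# After conversion, the difference is a real time interval
converted = pd.to_datetime(sample_df['time'], utc=True)
elapsed = converted.iloc[1] - converted.iloc[0]
print(elapsed)
```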

# ON THE JOB:
# As a FinTech professional, you'll encounter business scenarios that have a global impact.
# In these scenarios, data will come from different countries or regions and have different formats.
# Because of the importance of time in financial decisions, the ability to deal with the different formats of dates and times is a crucial skill that can save you time and money.

In [33]:
# WORK WITH DATETIME OBJECTS
# Python has many types of data structures for storing time data.
# Python considers dates and times to have the following elements:
    # 1. The date, which includes the year, month, and day.
    # 2. The time, which includes the hour, minute, and second.
    # 3. The time zone.
# Thus, a `date` object stores the year, month, and day.
# A `time` object stores the hour, minute, second, and sometimes, the time zone.
# A `datetime` object stores both a `date` and a `time` object.
# The `datetime` format is the most common one that you'll work with.
# To demonstrate how this information appears in our data, the following code block shows one value from the 'time' column (for now, still a string):

# Reviewing the time value from index position 0:
sp500_df['time'][0]

'2019-01-02 12:45:00+00:00'

In [34]:
# The value of the output has a lot of date and time information, so let's break it down.
# First, the year, month, and day seem clear.
# Then, we can observe the hour, minute, and second.
# Finally, the value has a plus sign followed by '00:00'.
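# As a rough illustration of this breakdown, the following sketch parses the same string with Python's standard `datetime` module.
# The variable name `stamp` is just for illustration; the notebook itself uses Pandas for the conversion.

```python
from datetime import datetime

# Parse the sample value into a standard-library datetime object
stamp = datetime.fromisoformat('2019-01-02 12:45:00+00:00')

print(stamp.date())       # the date part: year, month, and day
print(stamp.time())       # the time part: hour, minute, and second
print(stamp.utcoffset())  # the time zone part, as an offset from UTC
```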

In [35]:
# WHAT'S COORDINATED UNIVERSAL TIME?
# It turns out that a time standard exists that anyone in the world can use to standardize an exact moment in time - regardless of the location.
# This is called COORDINATED UNIVERSAL TIME (UTC).
# It's a global time standard that never adjusts for daylight saving time. Greenwich Mean Time (GMT), by contrast, is a time zone; its clock time matches UTC.
# Whenever a timestamp ends with a plus sign or a minus sign followed by a time, that suffix is the time zone information.
# Specifically, the sign is followed by the number of hours and minutes that we need to add to or subtract from UTC to get the local time.
# Common offsets for stock data include `-05:00` for New York in standard time and `-04:00` for New York in daylight saving time.
# Note that UTC also matches London's local time at `+00:00`, but only in winter; during British Summer Time, London is at `+01:00`.
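# The offsets can be checked with a short sketch. It assumes two arbitrary UTC instants in 2019 (one in winter, one in summer) and the IANA zone names `America/New_York` and `Europe/London`.
# Note that London sits at `+00:00` only in winter and moves to `+01:00` in summer.

```python
import pandas as pd

# Two arbitrary UTC instants: one in winter and one in summer
winter = pd.Timestamp('2019-01-15 12:00', tz='UTC')
summer = pd.Timestamp('2019-07-15 12:00', tz='UTC')

# Print each instant as local time, including its offset from UTC
for zone in ['America/New_York', 'Europe/London']:
    for instant in [winter, summer]:
        local = instant.tz_convert(zone)
        print(zone, local.isoformat())
```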

In [36]:
# USE PANDAS FOR DATETIME OBJECTS
# With Pandas, we can use `datetime` objects to do mathematical and other programming operations on dates and times.
# For example, we can get today's date (and the current time) by running the following function.

# Getting the current date and time:
pd.to_datetime('today')

Timestamp('2024-01-22 18:01:55.441611')

In [37]:
# Calling the `to_datetime` function and passing it a parameter of 'today' returns an object called `Timestamp`, which contains the following:
    # 1. The current date in the format year-month-day.
    # 2. The current time in the format hours:minutes:seconds, with fractional seconds down to the microsecond.

# NOTE:
# Timestamp is the Pandas equivalent of Python's `datetime` object.
# It's used for the entries that make up the Pandas `DatetimeIndex` and other time-oriented data structures.

# We often find this function convenient, for example, when we want to use an API to pull financial market data that ranges from a particular time in the past to today.
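# As a sketch of that use case, the following code builds a lookback window that ends now.
# The 30-day length and the ISO 8601 string format are assumptions; a real API defines its own parameters.

```python
from datetime import datetime

import pandas as pd

# Hypothetical 30-day lookback window that ends at the current moment (in UTC)
end = pd.Timestamp.now(tz='UTC')
start = end - pd.Timedelta(days=30)

# A Timestamp is also a Python datetime, so datetime-based code accepts it
print(isinstance(end, datetime))  # True

# Many APIs accept ISO 8601 strings (an assumption; check the API's documentation)
print(start.isoformat(), end.isoformat())
```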

In [38]:
# USE THE TIME-RELATED FUNCTIONS
# Now that we understand the format of our time series, we can use the `pd.to_datetime` function to convert the 'time' column of our S&P 500 data to a `datetime` Series.
# We can then use various time-related functions such as `pd.date_range`, which allows us to generate dates spanning a specific period.
# Let's convert our series to a `datetime` version:

# Transform the time column to a datetime data type:
sp500_df['time'] = pd.to_datetime(
    sp500_df['time'],
    infer_datetime_format=True,
    utc=True
)

# Verify the data type transformation using the `info` method (it prints its report and returns None, so it doesn't need `display`):
sp500_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 9328 entries, 0 to 9327
Data columns (total 2 columns):
 #   Column  Non-Null Count  Dtype              
---  ------  --------------  -----              
 0   time    9328 non-null   datetime64[ns, UTC]
 1   close   9328 non-null   float64            
dtypes: datetime64[ns, UTC](1), float64(1)
memory usage: 145.9 KB

In [39]:
# The preceding code calls the `pd.to_datetime` function with three arguments:
    # 1. The series that we want to convert.
    # 2. The `infer_datetime_format` parameter: As with the `read_csv` function, this parameter tells Pandas whether to guess the format of the time data. It's deprecated as of Pandas 2.0, where inferring the format is the default behavior.
    # 3. The `utc` parameter: We set `utc=True` because the timestamps in our data are expressed in UTC (the `+00:00` offset).
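# To illustrate what `utc=True` does, here is a minimal sketch with two hypothetical strings that carry different offsets.
# With `utc=True`, Pandas converts both values to UTC, so the result has a single, consistent time zone.

```python
import pandas as pd

# Hypothetical strings: two instants 15 minutes apart, written with different offsets
mixed = pd.Series([
    '2019-01-02 12:45:00+00:00',  # already expressed in UTC
    '2019-01-02 08:00:00-05:00',  # New York standard time
])

# utc=True converts every value to UTC
as_utc = pd.to_datetime(mixed, utc=True)
print(as_utc)
```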
    
# NOTE:
# After running this code, you should observe in the output that the time series has the `datetime64[ns, UTC]` data type.
# The `64` in the `datetime64` name might seem mysterious, but it just relates to the way that the data is stored: in a 64-bit format. The `ns` means that the values have nanosecond precision.

# After we transform the 'time' column in our S&P 500 DataFrame, every value in this series has the `datetime` data type.
# We then have access to all the native features that `datetime` provides.
# For example, we can do arithmetic operations on dates.
# We can also use the convenient Pandas time series functions, such as querying data by using date ranges.
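# The following sketch shows both ideas on a small, hypothetical DataFrame rather than on the S&P 500 data.
# It builds 15-minute timestamps with `pd.date_range`, uses placeholder values for 'close', and then does date arithmetic and a date-range query.

```python
import pandas as pd

# Hypothetical series: eight 15-minute timestamps, starting at the first time in our data
times = pd.date_range('2019-01-02 12:45', periods=8, freq='15min', tz='UTC')
sample_df = pd.DataFrame({'time': times, 'close': range(8)})  # placeholder prices

# Arithmetic: the span between the first and last timestamps
span = sample_df['time'].max() - sample_df['time'].min()
print(span)

# Date-range query: keep the rows from 13:00 through 13:30 UTC (both ends inclusive)
start = pd.Timestamp('2019-01-02 13:00', tz='UTC')
end = pd.Timestamp('2019-01-02 13:30', tz='UTC')
subset = sample_df[sample_df['time'].between(start, end)]
print(subset)
```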

In [40]:
# CONVERT UTC DATA TO A SPECIFIC TIME ZONE
# The major US stock exchanges, the New York Stock Exchange and Nasdaq, operate on New York time (the Eastern Time Zone).
# So let's convert our time series to that time zone.
# Then, the timestamps line up with the trading day, which makes the behavior of the S&P 500 easier to interpret.
# Because the 'time' column in our DataFrame already has a `datetime` data type, we can use the Pandas `dt.tz_convert` method to convert the series to the `US/Eastern` time zone.

# Convert the time column to the US/Eastern timezone:
sp500_df['time'] = sp500_df['time'].dt.tz_convert('US/Eastern')

# Verify the data type transformation using the info function:
sp500_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 9328 entries, 0 to 9327
Data columns (total 2 columns):
 #   Column  Non-Null Count  Dtype                     
---  ------  --------------  -----                     
 0   time    9328 non-null   datetime64[ns, US/Eastern]
 1   close   9328 non-null   float64                   
dtypes: datetime64[ns, US/Eastern](1), float64(1)
memory usage: 145.9 KB


In [43]:
# In the preceding output, note that the format of the 'time' column in our `datetime` data now uses the `US/Eastern` time zone.
# Review the DataFrame with the new timezone information:
display(sp500_df.head())
display(sp500_df.tail())

Unnamed: 0,time,close
0,2019-01-02 07:45:00-05:00,246.16
1,2019-01-02 08:00:00-05:00,246.15
2,2019-01-02 08:15:00-05:00,245.5
3,2019-01-02 08:30:00-05:00,245.71
4,2019-01-02 08:45:00-05:00,245.76


Unnamed: 0,time,close
9323,2019-12-30 15:30:00-05:00,321.1
9324,2019-12-30 15:45:00-05:00,321.12
9325,2019-12-30 16:00:00-05:00,321.23
9326,2019-12-30 16:30:00-05:00,321.23
9327,2019-12-30 16:45:00-05:00,321.2


In [None]:
# We can now interpret this time data more easily, because the 'time' column has a true `datetime` data type and is expressed in the time zone of the market.
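# One point worth checking is that a time zone conversion changes only how each moment is displayed, not the moment itself.
# The sketch below uses two hypothetical UTC timestamps and the zone name `America/New_York`, the IANA name for the zone that `US/Eastern` refers to.

```python
import pandas as pd

# Two hypothetical UTC timestamps
utc_times = pd.Series(pd.to_datetime(
    ['2019-01-02 12:45:00+00:00', '2019-01-02 14:30:00+00:00'], utc=True
))
eastern_times = utc_times.dt.tz_convert('America/New_York')

# The instants are identical; only the wall-clock representation differs
same_instants = bool((utc_times == eastern_times).all())
print(same_instants)

# The local hour is now directly available for analysis
local_hours = eastern_times.dt.hour.tolist()
print(local_hours)
```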