## Time Series Data

Time series data is common in many industries, including finance, healthcare, and manufacturing. Pandas provides powerful tools for working with time-indexed data, enabling you to manipulate, analyze, and visualize time-dependent patterns efficiently.

In this module, we will cover:

- Working with datetime objects,
- Time-based indexing and slicing,
- Resampling and aggregation,
- Handling missing time data,
- Rolling and expanding windows,
- Time shifts and differences.

## What is a Time Series?

A time series is a sequence of data points collected or recorded at successive points in time, often at uniform intervals (such as hourly, daily, monthly, or yearly). Unlike other types of data, time series data has a temporal aspect, meaning that time plays a key role in the analysis. The order of the data points matters, and analyzing how the data changes over time is often a central focus.
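In pandas, a single time series is commonly stored as a `Series` whose index is a `DatetimeIndex`. The minimal sketch below assumes pandas is installed and reuses the temperature readings from the weather example in this section. The series name `temperature_c` is only an illustrative label.

```python
import pandas as pd

# Three consecutive daily readings (values from the weather example).
index = pd.date_range(start="2024-01-01", periods=3, freq="D")
temps = pd.Series([5.2, 4.8, 6.0], index=index, name="temperature_c")

print(temps)
# The index keeps observations in time order and records their uniform spacing.
print(temps.index.is_monotonic_increasing)
print(temps.index.freqstr)
```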

## Key Characteristics of Time Series Data

### Temporal Ordering
Time series data is ordered by time, and the sequence in which data points occur is crucial.

### Frequency
Time series data can be recorded at various intervals, such as:
- Hourly: Sensor readings from a machine every hour.
- Daily: Stock prices at the end of each trading day.
- Monthly: Monthly sales data for a retail store.

### Trend and Seasonality
Time series data often exhibits trends (long-term upward or downward movements) and seasonality (recurring patterns over time).
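To make the frequency idea concrete, here is a small sketch using `pd.date_range`, which generates evenly spaced timestamps for a given frequency alias. The start dates are arbitrary. `"h"` (hourly), `"D"` (daily), and `"MS"` (month start) are standard pandas offset aliases; older pandas versions document the hourly alias as `"H"`.

```python
import pandas as pd

# Evenly spaced timestamps at three common frequencies (arbitrary start dates).
hourly = pd.date_range("2024-01-01 09:00", periods=3, freq="h")   # every hour
daily = pd.date_range("2024-01-01", periods=3, freq="D")          # every day
monthly = pd.date_range("2024-01-01", periods=3, freq="MS")       # first day of each month

print(hourly)
print(daily)
print(monthly)
```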
## Examples of Time Series Data

- Stock Market Prices: The closing price of a stock is recorded at the end of each trading day, forming a time series. You could analyze how the price fluctuates daily, weekly, or monthly.

| Date | Stock Price (USD) |
|---|---|
| 2024-01-01 | 150 |
| 2024-01-02 | 152 |
| 2024-01-03 | 148 |

- Weather Data: Daily temperature readings form a time series. You can analyze temperature trends over days, months, or years.

| Date | Temperature (°C) |
|---|---|
| 2024-01-01 | 5.2 |
| 2024-01-02 | 4.8 |
| 2024-01-03 | 6.0 |

- Sales Data: A retail store’s daily or monthly sales figures are time series data. You could look for trends and seasonal patterns over time (e.g., sales growing during holiday seasons or dipping during off-seasons).

| Month | Sales Amount (USD) |
|---|---|
| 2024-01 | 10,000 |
| 2024-02 | 9,500 |
| 2024-03 | 12,000 |

- Website Traffic: A website’s hourly or daily visitors form a time series, allowing analysis of how traffic varies throughout the day or week.

| Hour | Visitors |
|---|---|
| 09:00 AM | 200 |
| 10:00 AM | 230 |
| 11:00 AM | 250 |
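As a sketch of how one of these tables might be loaded, the stock-price example can be placed in a `DataFrame` indexed by date, and `Series.diff()` then gives the day-over-day change. Treating the prices as plain USD numbers and naming the column `Close` are assumptions made for illustration.

```python
import pandas as pd

# Stock-price example as a DataFrame with a DatetimeIndex (column name is illustrative).
prices = pd.DataFrame(
    {"Close": [150, 152, 148]},
    index=pd.to_datetime(["2024-01-01", "2024-01-02", "2024-01-03"]),
)
prices.index.name = "Date"

# Day-over-day change; the first row has no previous day, so it is NaN.
prices["DailyChange"] = prices["Close"].diff()
print(prices)
```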


## Why is Time Series Data Important?

Time series data allows you to analyze patterns over time and make predictions about future values based on past behavior. It’s used in various domains to answer questions like:

- `Trends`: Is there a long-term upward or downward trend in stock prices or sales?
- `Seasonality`: Are there recurring patterns, such as higher sales during holiday seasons or lower website traffic on weekends?
- `Forecasting`: Can we predict the future temperature, sales, or stock prices based on historical data?
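The trend and seasonality questions above can be explored with a couple of pandas operations. The sketch below uses an invented four-week series of daily website visits (a linear upward trend plus a weekend dip). A 7-day rolling mean smooths out the weekly cycle to expose the trend, and grouping by day of week exposes the weekly pattern. It is an illustration on made-up data, not a forecasting method.

```python
import pandas as pd

# Hypothetical daily visits for four weeks: steady growth plus a weekend dip.
# All numbers are invented for illustration. 2024-01-01 is a Monday.
dates = pd.date_range("2024-01-01", periods=28, freq="D")
trend = pd.Series(range(28), index=dates) * 10 + 1000
weekend_dip = pd.Series([-300 if d.dayofweek >= 5 else 0 for d in dates], index=dates)
visits = trend + weekend_dip

# Trend: every 7-day window contains one Saturday and one Sunday,
# so the rolling mean removes the weekly cycle and leaves the upward drift.
weekly_avg = visits.rolling(window=7).mean()

# Seasonality: average visits per day of week (0 = Monday, 6 = Sunday).
by_weekday = visits.groupby(visits.index.dayofweek).mean()

print(weekly_avg.dropna().head())
print(by_weekday)
```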

```python
import pandas as pd
```

```python
# Sample data with date strings
data = {'Date': ['2024-01-01', '2024-01-02', '2024-01-03'], 'SalesAmount': [200, 150, 300]}
df = pd.DataFrame(data)
```

```python
# Convert the Date column from strings to datetime64
df['Date'] = pd.to_datetime(df['Date'])
print(df['Date'].dtype)
```

```
datetime64[ns]
```
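Real data is rarely as clean as ISO-formatted strings. A hedged sketch of two `pd.to_datetime` options that often matter: `format=` to state the layout explicitly (here day-first, an assumption about the hypothetical input), and `errors="coerce"` to turn unparseable entries into `NaT` instead of raising an error.

```python
import pandas as pd

# Hypothetical day-first strings plus one invalid entry.
raw = pd.Series(["03/01/2024", "04/01/2024", "not a date"])

# An explicit format avoids reading "03/01/2024" as March 1;
# errors="coerce" converts the unparseable value to NaT.
parsed = pd.to_datetime(raw, format="%d/%m/%Y", errors="coerce")
print(parsed)
print(parsed.isna().sum())  # number of values that failed to parse
```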


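## Extracting Date Components

After conversion, the `.dt` accessor exposes the parts of each datetime (year, month, day, and more) as new Series.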


```python
# Extract year, month, and day into separate integer columns
df['Year'] = df['Date'].dt.year
df['Month'] = df['Date'].dt.month
df['Day'] = df['Date'].dt.day
print(df)
```

```
        Date  SalesAmount  Year  Month  Day
0 2024-01-01          200  2024      1    1
1 2024-01-02          150  2024      1    2
2 2024-01-03          300  2024      1    3
```
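The `.dt` accessor offers more than year, month, and day. This self-contained sketch rebuilds the same sample frame and adds the day of week, day name, and quarter; the new column names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "Date": pd.to_datetime(["2024-01-01", "2024-01-02", "2024-01-03"]),
    "SalesAmount": [200, 150, 300],
})

# More components available through the .dt accessor.
df["DayOfWeek"] = df["Date"].dt.dayofweek   # 0 = Monday ... 6 = Sunday
df["DayName"] = df["Date"].dt.day_name()    # weekday name as a string
df["Quarter"] = df["Date"].dt.quarter       # calendar quarter, 1 to 4
print(df)
```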


```python
df.info()
```

```
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 5 columns):
 #   Column       Non-Null Count  Dtype         
---  ------       --------------  -----         
 0   Date         3 non-null      datetime64[ns]
 1   SalesAmount  3 non-null      int64         
 2   Year         3 non-null      int32         
 3   Month        3 non-null      int32         
 4   Day          3 non-null      int32         
dtypes: datetime64[ns](1), int32(3), int64(1)
memory usage: 212.0 bytes
```
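Since `info()` confirms that `Date` is `datetime64[ns]`, the column can be promoted to a `DatetimeIndex`, which enables the time-based indexing and slicing listed at the start of this module. A minimal sketch on the same sample data:

```python
import pandas as pd

df = pd.DataFrame({
    "Date": pd.to_datetime(["2024-01-01", "2024-01-02", "2024-01-03"]),
    "SalesAmount": [200, 150, 300],
})

# With the datetime column as the index, rows can be selected with date strings.
sales = df.set_index("Date")
print(sales.loc["2024-01-02":])     # from Jan 2 onward; string slices include the end point
print(sales.loc["2024-01"].sum())   # partial string: every row in January 2024
```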


```python
# .dt.year returns a pandas Series, not a scalar
print(type(df['Date'].dt.year))
```

```
<class 'pandas.core.series.Series'>
```
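Because `.dt.year` and the other components return ordinary Series, they can be used anywhere a Series fits, for example in a boolean mask or as a `groupby` key. A short sketch on the same sample data:

```python
import pandas as pd

df = pd.DataFrame({
    "Date": pd.to_datetime(["2024-01-01", "2024-01-02", "2024-01-03"]),
    "SalesAmount": [200, 150, 300],
})

# Boolean mask built from a date component: keep rows after the 1st of the month.
later_days = df[df["Date"].dt.day >= 2]
print(later_days)

# Date component as a groupby key: total sales per month.
monthly_total = df.groupby(df["Date"].dt.month)["SalesAmount"].sum()
print(monthly_total)
```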
