# Time Series Basics with Pandas

## What is a time series?

A time series is a sequence of data points recorded in chronological order over a period of time. In pandas, such data is typically stored in a Series or DataFrame whose index holds dates or times. This makes it straightforward to handle and analyze values that change over time, such as stock prices, temperature readings or sales figures. The observations are often, but not always, taken at regular intervals.
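To illustrate "chronological order", here is a minimal sketch. The readings and dates are made up and deliberately inserted out of order. `sort_index()` returns a copy ordered by the DatetimeIndex.

```python
import pandas as pd

# Made-up readings whose timestamps arrive out of order
readings = pd.Series(
    [21.5, 19.0, 20.2],
    index=pd.to_datetime(["2020-01-03", "2020-01-01", "2020-01-02"]),
)

# Sorting by the DatetimeIndex restores chronological order
ordered = readings.sort_index()
print(ordered)
print(ordered.index.is_monotonic_increasing)  # True
```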

## Working with Time series in Pandas

# Import Necessary libraries

In [1]:
import numpy as np
import pandas as pd
from datetime import datetime

## Create a list called "date" containing five elements, which are instances of the "datetime" class from Python's built-in "datetime" module.

In [2]:
date = [datetime(2020, 1, 5),
        datetime(2020, 1, 10),
        datetime(2020, 1, 15),
        datetime(2020, 1, 20),
        datetime(2020, 1, 25)]
date

[datetime.datetime(2020, 1, 5, 0, 0),
 datetime.datetime(2020, 1, 10, 0, 0),
 datetime.datetime(2020, 1, 15, 0, 0),
 datetime.datetime(2020, 1, 20, 0, 0),
 datetime.datetime(2020, 1, 25, 0, 0)]

## Create a time series with pandas: use NumPy's np.random.randn() to generate five random numbers, then pass them to pd.Series() with the list of dates as the index.

In [3]:
ts = pd.Series(np.random.randn(5), index = date)
ts

2020-01-05    1.992497
2020-01-10    1.448087
2020-01-15    0.146658
2020-01-20   -0.081126
2020-01-25    0.395909
dtype: float64
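`np.random.randn()` draws new numbers on every run, so re-executing the cell will not reproduce the values printed above. The sketch below seeds NumPy's `default_rng` Generator (NumPy 1.17 or later) to make the random series reproducible. It assumes an arbitrary seed of 42, and the names `ts_a` and `ts_b` are illustrative only.

```python
from datetime import datetime

import numpy as np
import pandas as pd

date = [datetime(2020, 1, d) for d in (5, 10, 15, 20, 25)]

# The same seed always yields the same sequence of numbers
rng = np.random.default_rng(seed=42)
ts_a = pd.Series(rng.standard_normal(5), index=date)

rng = np.random.default_rng(seed=42)
ts_b = pd.Series(rng.standard_normal(5), index=date)

print(ts_a.equals(ts_b))  # True
```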

## Retrieve the index associated with the Series ts.

In [4]:
ts.index

DatetimeIndex(['2020-01-05', '2020-01-10', '2020-01-15', '2020-01-20',
               '2020-01-25'],
              dtype='datetime64[ns]', freq=None)
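`freq=None` means no frequency was attached when the index was built, even though the dates happen to be evenly spaced. As a small sketch, `pd.infer_freq()` can detect that regular spacing from the values themselves.

```python
from datetime import datetime

import pandas as pd

index = pd.DatetimeIndex([datetime(2020, 1, d) for d in (5, 10, 15, 20, 25)])

print(index.freq)            # None: nothing was attached at construction
print(pd.infer_freq(index))  # '5D': the dates are exactly 5 days apart
```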

# Time Series Data Structures

## Convert a given date string ("01/01/2020") into a pandas datetime object.

In [5]:
pd.to_datetime("01/01/2020")

Timestamp('2020-01-01 00:00:00')
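A string such as "01/01/2020" reads the same either way, but most slash-separated dates are ambiguous. By default pandas reads them month first. The sketch below uses a different, hypothetical date, "02/01/2020", and shows two ways to request day-first parsing: `dayfirst=True`, or an explicit `format`.

```python
import pandas as pd

us_style = pd.to_datetime("02/01/2020")                      # month first (default)
day_first = pd.to_datetime("02/01/2020", dayfirst=True)      # day first
explicit = pd.to_datetime("02/01/2020", format="%d/%m/%Y")   # exact format, no guessing

print(us_style)   # 2020-02-01 00:00:00
print(day_first)  # 2020-01-02 00:00:00
print(explicit)   # 2020-01-02 00:00:00
```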

## Create a pandas DatetimeIndex object called dates by converting a list of date representations into pandas datetime objects.

In [6]:
dates = pd.to_datetime([datetime(2020, 7, 5),
                        "6th of july 2020",
                        "2020-Jul-7",
                        "20200708"])
dates

DatetimeIndex(['2020-07-05', '2020-07-06', '2020-07-07', '2020-07-08'], dtype='datetime64[ns]', freq=None)
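Real-world date columns often contain entries that cannot be parsed. As a sketch with made-up input, `errors="coerce"` turns those entries into `NaT` (not a time) instead of raising an exception.

```python
import pandas as pd

raw = ["2020-07-05", "not a date", "2020-07-07"]   # made-up input with one bad entry
parsed = pd.to_datetime(raw, errors="coerce")

print(parsed)         # the bad entry becomes NaT
print(parsed.isna())  # [False  True False]
```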

## Convert a pandas DatetimeIndex object (dates) into a PeriodIndex object with a daily frequency.

In [8]:
dates.to_period("D")

PeriodIndex(['2020-07-05', '2020-07-06', '2020-07-07', '2020-07-08'], dtype='period[D]')
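The same conversion also works at a coarser frequency. This sketch uses `"M"` (monthly), so all four July dates collapse into the single period 2020-07. Each `Period` also exposes the exact start and end of the span it covers.

```python
import pandas as pd

dates = pd.to_datetime(["2020-07-05", "2020-07-06", "2020-07-07", "2020-07-08"])

monthly = dates.to_period("M")  # every date falls in the period 2020-07
print(monthly)

july = monthly[0]
print(july.start_time)  # 2020-07-01 00:00:00
print(july.end_time)    # last nanosecond of 2020-07-31
```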

## Subtract the first date (dates[0]) from every date in the dates index to get the time elapsed since the first date.

In [9]:
dates - dates[0]

TimedeltaIndex(['0 days', '1 days', '2 days', '3 days'], dtype='timedelta64[ns]', freq=None)
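The sketch below goes a step further. `.days` extracts whole days as integers from the resulting TimedeltaIndex. `Series.diff()` computes the gap to the previous date rather than to the first one.

```python
import pandas as pd

dates = pd.to_datetime(["2020-07-05", "2020-07-06", "2020-07-07", "2020-07-08"])

offsets = dates - dates[0]   # TimedeltaIndex relative to the first date
print(offsets.days)          # whole days as integers: 0, 1, 2, 3

gaps = dates.to_series().diff()  # gap to the previous date; NaT for the first one
print(gaps)
```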

# Creating a Time Series

## Create a date range using the pd.date_range() function from the pandas library. Take two arguments:

The first argument is the starting date of the range, "2020-08-15" in this case.

The second argument is the end date of the range, "2020-09-01" in this case.

In [10]:
pd.date_range("2020-08-15", "2020-09-01")

DatetimeIndex(['2020-08-15', '2020-08-16', '2020-08-17', '2020-08-18',
               '2020-08-19', '2020-08-20', '2020-08-21', '2020-08-22',
               '2020-08-23', '2020-08-24', '2020-08-25', '2020-08-26',
               '2020-08-27', '2020-08-28', '2020-08-29', '2020-08-30',
               '2020-08-31', '2020-09-01'],
              dtype='datetime64[ns]', freq='D')
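Both endpoints are included, which is why the daily range above has 18 dates. The sketch below keeps the same start and end but changes `freq`. `"W"` keeps only Sundays (it is shorthand for `"W-SUN"`), and `"B"` keeps business days, Monday to Friday.

```python
import pandas as pd

daily = pd.date_range("2020-08-15", "2020-09-01")               # both endpoints included
weekly = pd.date_range("2020-08-15", "2020-09-01", freq="W")    # Sundays only
business = pd.date_range("2020-08-15", "2020-09-01", freq="B")  # Monday to Friday

print(len(daily), len(weekly), len(business))  # 18 3 12
print(weekly)
```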

## Use the pd.date_range() function from the pandas library to create a date range. Generate a sequence of dates starting from a specified date and with a specified number of periods. Take two arguments:

The first argument is the starting date of the range, "2020-08-15" in this case.

The second argument, periods, specifies how many dates to generate. Here it is set to 10.

In [11]:
pd.date_range("2020-08-15", periods = 10)

DatetimeIndex(['2020-08-15', '2020-08-16', '2020-08-17', '2020-08-18',
               '2020-08-19', '2020-08-20', '2020-08-21', '2020-08-22',
               '2020-08-23', '2020-08-24'],
              dtype='datetime64[ns]', freq='D')
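`pd.date_range()` expects exactly three of `start`, `end`, `periods` and `freq`. As a sketch, the same ten days can be produced by counting back from an `end` date. When `start`, `end` and `periods` are given without `freq`, the points are spaced evenly between the two endpoints.

```python
import pandas as pd

from_start = pd.date_range("2020-08-15", periods=10)
from_end = pd.date_range(end="2020-08-24", periods=10)  # count backwards from the end date
print(from_start.equals(from_end))  # True

# start, end and periods together: 4 evenly spaced points, 3 days apart here
spaced = pd.date_range("2020-08-15", "2020-08-24", periods=4)
print(spaced)
```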

## Building on the previous example, generate ten timestamps at hourly intervals by adding freq="H". (Since pandas 2.2, the lowercase alias "h" is preferred, and "H" triggers a deprecation warning.)

In [12]:
pd.date_range("2020-08-15", periods=10, freq="H")

DatetimeIndex(['2020-08-15 00:00:00', '2020-08-15 01:00:00',
               '2020-08-15 02:00:00', '2020-08-15 03:00:00',
               '2020-08-15 04:00:00', '2020-08-15 05:00:00',
               '2020-08-15 06:00:00', '2020-08-15 07:00:00',
               '2020-08-15 08:00:00', '2020-08-15 09:00:00'],
              dtype='datetime64[ns]', freq='H')
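`freq` does not have to be a single-letter alias. This sketch builds a 90-minute spacing in two ways: with a `pd.Timedelta`, and with the `"90min"` string alias. Both produce the same index.

```python
import pandas as pd

# A Timedelta works as freq in place of a string alias
by_timedelta = pd.date_range("2020-08-15", periods=4, freq=pd.Timedelta(minutes=90))
by_alias = pd.date_range("2020-08-15", periods=4, freq="90min")

print(by_timedelta)                  # 00:00, 01:30, 03:00, 04:30 on 2020-08-15
print(by_timedelta.equals(by_alias)) # True
```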

## Generate ten consecutive monthly periods, starting from October 2020, with pd.period_range().

In [13]:
pd.period_range("2020-10", periods = 10, freq="M")

PeriodIndex(['2020-10', '2020-11', '2020-12', '2021-01', '2021-02', '2021-03',
             '2021-04', '2021-05', '2021-06', '2021-07'],
            dtype='period[M]')
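A monthly `Period` represents a whole month rather than a single instant. The sketch below shows two consequences. Adding an integer moves the period by whole months, and a PeriodIndex exposes calendar fields such as `.year` and `.month`.

```python
import pandas as pd

oct_2020 = pd.Period("2020-10", freq="M")
print(oct_2020 + 3)  # Period('2021-01', 'M'): integer steps move by whole months

months = pd.period_range("2020-10", periods=10, freq="M")
print(months.year)
print(months.month)  # 10, 11, 12, 1, 2, ...
```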

## Generate a sequence of time deltas at hourly intervals with the pd.timedelta_range() function, which creates a range of time deltas with a fixed frequency.

Hint: pass three arguments.

start = 0, the initial duration (no time has elapsed yet).

periods = 8, the number of time deltas to generate.

freq = "H", for hourly intervals.

In [14]:
pd.timedelta_range(0, periods=8, freq="H")

TimedeltaIndex(['0 days 00:00:00', '0 days 01:00:00', '0 days 02:00:00',
                '0 days 03:00:00', '0 days 04:00:00', '0 days 05:00:00',
                '0 days 06:00:00', '0 days 07:00:00'],
               dtype='timedelta64[ns]', freq='H')
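A TimedeltaIndex holds durations rather than points in time. The sketch below adds it to a start timestamp to get absolute times. It also converts the durations to seconds with `.total_seconds()`. The start date 2020-08-15 is chosen only for illustration.

```python
import pandas as pd

offsets = pd.timedelta_range(0, periods=3, freq=pd.Timedelta(hours=1))

# Timestamp + TimedeltaIndex gives a DatetimeIndex of absolute times
times = pd.Timestamp("2020-08-15") + offsets
print(times)  # 00:00, 01:00, 02:00 on 2020-08-15

print(offsets.total_seconds())  # 0.0, 3600.0, 7200.0
```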

## Retrieve the index label at position 1 of the Series ts and assign it to the variable `stamp`.

In [15]:
stamp = ts.index[1]
stamp

Timestamp('2020-01-10 00:00:00')

## Retrieve the value from a pandas Series ts at the specific index `stamp`.

In [16]:
ts[stamp]

1.4480870493470355
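`ts[stamp]` works, but `.loc` (by label) and `.iloc` (by position) make the intent explicit. The sketch below uses fixed, made-up values instead of random ones so the result is predictable.

```python
import pandas as pd

# Made-up values so the lookup result is predictable
ts = pd.Series(
    [1.0, 2.0, 3.0],
    index=pd.to_datetime(["2020-01-05", "2020-01-10", "2020-01-15"]),
)
stamp = ts.index[1]

by_label = ts.loc[stamp]      # label-based lookup
by_position = ts.iloc[1]      # position-based lookup
print(by_label, by_position)  # 2.0 2.0
```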

## Retrieve the value of the Series `ts` for the label `"25.1.2020"`. Pandas parses the string into a date, and because 25 cannot be a month, it is read as 25 January 2020.

In [17]:
ts["25.1.2020"]

0.39590939738847947

## Access the same value with the compact label `"20200125"`, which pandas also parses as 25 January 2020.

In [18]:
ts["20200125"]

0.39590939738847947
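String labels are parsed into timestamps in other lookups too. In this sketch (made-up values), the `in` operator checks whether a date is present. `Series.asof()` returns the last value at or before a given time, which helps when the exact date is missing.

```python
import pandas as pd

ts = pd.Series(
    [1.0, 2.0, 3.0],
    index=pd.to_datetime(["2020-01-05", "2020-01-10", "2020-01-15"]),
)

print("2020-01-10" in ts.index)  # True: the string is parsed to a Timestamp
print("2020-01-12" in ts.index)  # False: no entry on that day

# asof: the most recent value at or before the given time (index must be sorted)
print(ts.asof(pd.Timestamp("2020-01-12")))  # 2.0, carried from 2020-01-10
```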

## Create a pandas Series long_ts with 1000 random values and a daily datetime index starting from January 1, 2020.

In [20]:
long_ts = pd.Series(np.random.randn(1000), index=pd.date_range("1/1/2020", periods=1000))
long_ts.head()

2020-01-01   -0.697040
2020-01-02    0.288077
2020-01-03   -2.320190
2020-01-04   -0.769711
2020-01-05    0.195912
Freq: D, dtype: float64
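To see how the 1000 daily entries are spread across the calendar, this sketch groups `long_ts` by the year of its index. The result shows that 2020 is a leap year (366 days) and that the series ends partway through 2022.

```python
import numpy as np
import pandas as pd

long_ts = pd.Series(np.random.randn(1000), index=pd.date_range("1/1/2020", periods=1000))

# Number of daily entries per calendar year
per_year = long_ts.groupby(long_ts.index.year).size()
print(per_year)  # 2020: 366, 2021: 365, 2022: 269
```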

## Select all entries of `long_ts` from the year 2020 by indexing with the string "2020" (partial-string indexing), and show the first five.

In [21]:
long_ts["2020"].head()

2020-01-01   -0.697040
2020-01-02    0.288077
2020-01-03   -2.320190
2020-01-04   -0.769711
2020-01-05    0.195912
Freq: D, dtype: float64

## Retrieve the first 15 elements of a pandas Series `long_ts` that correspond to the month of October 2020.

In [22]:
long_ts["2020-10"].head(15)

2020-10-01   -0.184095
2020-10-02    2.517817
2020-10-03    1.206873
2020-10-04    1.943541
2020-10-05    1.584399
2020-10-06    1.658162
2020-10-07    0.132665
2020-10-08    0.885443
2020-10-09    0.634043
2020-10-10   -1.357682
2020-10-11    0.114963
2020-10-12    1.379589
2020-10-13   -1.881778
2020-10-14    0.864461
2020-10-15    0.727835
Freq: D, dtype: float64

## Retrieve a subset of a pandas Series `long_ts` starting from a specific date, in this case, `September 20, 2022`, and extending until the end of the Series.

In [23]:
long_ts[datetime(2022,9,20):]

2022-09-20   -1.784143
2022-09-21   -0.879286
2022-09-22   -1.128714
2022-09-23    0.189311
2022-09-24   -0.095503
2022-09-25   -0.869514
2022-09-26   -0.990770
Freq: D, dtype: float64
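The slice bound does not have to be a `datetime` object. This sketch compares three things. A string bound in `.loc` selects the same tail. That tail matches `tail(7)`. When both bounds are strings, both endpoints are included.

```python
from datetime import datetime

import numpy as np
import pandas as pd

long_ts = pd.Series(np.random.randn(1000), index=pd.date_range("1/1/2020", periods=1000))

tail_by_datetime = long_ts[datetime(2022, 9, 20):]
tail_by_string = long_ts.loc["2022-09-20":]      # same slice with a string bound
print(tail_by_string.equals(tail_by_datetime))   # True
print(tail_by_string.equals(long_ts.tail(7)))    # True: the last 7 days

# String bounds are inclusive at both ends: 20, 21 and 22 September
print(len(long_ts.loc["2022-09-20":"2022-09-22"]))  # 3
```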

# Important Methods for Time Series

In [24]:
ts

2020-01-05    1.992497
2020-01-10    1.448087
2020-01-15    0.146658
2020-01-20   -0.081126
2020-01-25    0.395909
dtype: float64

## Use truncate() to drop the elements of ts that come after the specified date `("1/15/2020")`. It returns a new Series containing only the elements that fall on or before that date; ts itself is not modified.

In [25]:
ts.truncate(after="1/15/2020")

2020-01-05    1.992497
2020-01-10    1.448087
2020-01-15    0.146658
dtype: float64
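`truncate()` also accepts a `before` bound, so it can cut both ends at once. In this sketch with made-up values, the result matches the equivalent `.loc` slice. The original Series keeps all five entries.

```python
import pandas as pd

ts = pd.Series(
    [1.0, 2.0, 3.0, 4.0, 5.0],
    index=pd.to_datetime(
        ["2020-01-05", "2020-01-10", "2020-01-15", "2020-01-20", "2020-01-25"]
    ),
)

middle = ts.truncate(before="2020-01-10", after="2020-01-20")
print(middle)                                            # 2020-01-10 to 2020-01-20
print(middle.equals(ts.loc["2020-01-10":"2020-01-20"]))  # True
print(len(ts))                                           # 5: ts itself is unchanged
```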

## Generate a sequence of dates starting from January 1, 2020, and continuing for 100 periods at a weekly frequency, with each week ending on a Sunday. Because January 1, 2020 falls on a Wednesday, the first date is the following Sunday, January 5.

In [26]:
date = pd.date_range("1/1/2020", periods=100, freq="W-SUN")

In [27]:
date

DatetimeIndex(['2020-01-05', '2020-01-12', '2020-01-19', '2020-01-26',
               '2020-02-02', '2020-02-09', '2020-02-16', '2020-02-23',
               '2020-03-01', '2020-03-08', '2020-03-15', '2020-03-22',
               '2020-03-29', '2020-04-05', '2020-04-12', '2020-04-19',
               '2020-04-26', '2020-05-03', '2020-05-10', '2020-05-17',
               '2020-05-24', '2020-05-31', '2020-06-07', '2020-06-14',
               '2020-06-21', '2020-06-28', '2020-07-05', '2020-07-12',
               '2020-07-19', '2020-07-26', '2020-08-02', '2020-08-09',
               '2020-08-16', '2020-08-23', '2020-08-30', '2020-09-06',
               '2020-09-13', '2020-09-20', '2020-09-27', '2020-10-04',
               '2020-10-11', '2020-10-18', '2020-10-25', '2020-11-01',
               '2020-11-08', '2020-11-15', '2020-11-22', '2020-11-29',
               '2020-12-06', '2020-12-13', '2020-12-20', '2020-12-27',
               '2021-01-03', '2021-01-10', '2021-01-17', '2021-01-24',
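               '2021-01-31', '2021-02-07', '2021-02-14', '2021-02-21',
               '2021-02-28', '2021-03-07', '2021-03-14', '2021-03-21',
               '2021-03-28', '2021-04-04', '2021-04-11', '2021-04-18',
               '2021-04-25', '2021-05-02', '2021-05-09', '2021-05-16',
               '2021-05-23', '2021-05-30', '2021-06-06', '2021-06-13',
               '2021-06-20', '2021-06-27', '2021-07-04', '2021-07-11',
               '2021-07-18', '2021-07-25', '2021-08-01', '2021-08-08',
               '2021-08-15', '2021-08-22', '2021-08-29', '2021-09-05',
               '2021-09-12', '2021-09-19', '2021-09-26', '2021-10-03',
               '2021-10-10', '2021-10-17', '2021-10-24', '2021-10-31',
               '2021-11-07', '2021-11-14', '2021-11-21', '2021-11-28'],
              dtype='datetime64[ns]', freq='W-SUN')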
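The anchor day in `"W-SUN"` can be swapped for any weekday. This sketch confirms that every generated date is a Sunday and that the 100th falls on 2021-11-28. It then anchors on Mondays with `"W-MON"` for comparison.

```python
import pandas as pd

sundays = pd.date_range("1/1/2020", periods=100, freq="W-SUN")
print(sundays[0])               # 2020-01-05: first Sunday on or after Jan 1
print(set(sundays.day_name()))  # {'Sunday'}
print(sundays[-1])              # 2021-11-28

mondays = pd.date_range("1/1/2020", periods=3, freq="W-MON")
print(mondays)                  # 2020-01-06, 2020-01-13, 2020-01-20
```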

## Create a pandas DataFrame long_df with 100 rows and 4 columns filled with random values. Use the weekly date sequence generated above as the index, and label the columns "A", "B", "C" and "D".

In [29]:
long_df = pd.DataFrame(np.random.randn(100,4),
                      index = date,
                      columns= list("ABCD"))
long_df

                   A         B         C         D
2020-01-05  0.276530  0.100734  0.335528 -0.651135
2020-01-12  0.264843 -0.853976 -0.185890 -0.139381
2020-01-19 -0.790441 -1.390141  0.220222  0.938405
2020-01-26 -0.251019 -0.191315  1.695448 -0.953687
2020-02-02  0.427040  0.583313 -1.083852  0.862135
...              ...       ...       ...       ...
2021-10-31 -0.348841  0.367758  0.032431  0.665362
2021-11-07  1.171098 -1.003783 -0.877122  1.052987
2021-11-14 -0.234984 -0.659181  0.386292  0.851238
2021-11-21 -2.223043  0.779961  0.628473 -0.261832
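Partial-string indexing works on DataFrame rows through `.loc` as well. This sketch selects every row from 2021, which gives 48 weekly Sundays. It then takes column "A" for September 2020 only.

```python
import numpy as np
import pandas as pd

date = pd.date_range("1/1/2020", periods=100, freq="W-SUN")
long_df = pd.DataFrame(np.random.randn(100, 4), index=date, columns=list("ABCD"))

rows_2021 = long_df.loc["2021"]  # partial-string row selection
print(rows_2021.shape)           # (48, 4)

sept_2020_a = long_df.loc["2020-09", "A"]  # September 2020 rows, column A only
print(sept_2020_a)
```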


## Define a pandas DatetimeIndex `date` in which the same date, "1/2/2020", appears five times, to show that a DatetimeIndex does not have to contain unique values. Then build a Series ts1 on that index.

In [30]:
date = pd.DatetimeIndex(["1/2/2020", "1/2/2020", "1/2/2020", "1/2/2020", "1/2/2020"])
ts1 = pd.Series(np.arange(5), index=date)
ts1

2020-01-02    0
2020-01-02    1
2020-01-02    2
2020-01-02    3
2020-01-02    4
dtype: int64

## Check whether the index of a pandas Series ts1 contains unique values or not.

In [32]:
ts1.index.is_unique

False
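When duplicate timestamps are not wanted, `Index.duplicated()` flags each repeat of an earlier label. This sketch uses that flag to keep only the first row per timestamp. Keeping the first is one choice among several; `keep="last"` is the alternative.

```python
import numpy as np
import pandas as pd

date = pd.DatetimeIndex(["1/2/2020"] * 5)
ts1 = pd.Series(np.arange(5), index=date)

print(ts1.index.duplicated())  # [False  True  True  True  True]

# Keep only the first row for each timestamp
deduped = ts1[~ts1.index.duplicated(keep="first")]
print(deduped)
print(deduped.index.is_unique)  # True
```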

## Group the elements of ts1 by the values of index level 0, so that all rows sharing the same timestamp form one group.

In [33]:
group = ts1.groupby(level=0)

## Count the non-missing elements in each group of the GroupBy object `group`.

In [34]:
group.count()

2020-01-02    5
dtype: int64
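`count()` skips missing values, whereas `size()` counts every row. With `ts1` the two agree because it has no NaN. The sketch below uses a hypothetical series with one missing value to show the difference.

```python
import numpy as np
import pandas as pd

# Hypothetical series: a duplicated timestamp and one missing value
s = pd.Series(
    [1.0, np.nan, 3.0, 4.0],
    index=pd.to_datetime(["2020-01-02", "2020-01-02", "2020-01-02", "2020-01-03"]),
)

grouped = s.groupby(level=0)
print(grouped.count())  # non-missing values per timestamp: 2, 1
print(grouped.size())   # all rows per timestamp, NaN included: 3, 1
```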

## Calculate the mean value of each group in the GroupBy object `group`.