# Time Series
A time series is a series of data points indexed (or listed or graphed) in time order. The data is therefore organized by relatively deterministic timestamps and may be contrasted with random sample data.

Time series data is an important form of structured data in many different fields, such as finance, economics, ecology, neuroscience, and physics. Anything that is observed or measured at many points in time forms a time series.

Time series analysis comprises methods for analyzing time series data in order to extract meaningful statistics and other characteristics of the data. Time series forecasting is the use of a model to predict future values based on previously observed values.

Regression analysis, by contrast, is often employed to test theories that the current values of one or more independent time series affect the current value of another time series.
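To make the forecasting definition above concrete, here is a minimal sketch of one of the simplest possible models: a moving-average forecast that predicts the next value as the mean of the last `k` observations. The helper name `moving_average_forecast`, the window size and the sales figures are illustrative assumptions, not part of this notebook.

```python
from statistics import mean


def moving_average_forecast(history, k=3):
    """Predict the next value as the mean of the last k observations.

    Hypothetical helper for illustration only; practical forecasting
    models (ARIMA, exponential smoothing, ...) are considerably richer.
    """
    if k <= 0 or len(history) < k:
        raise ValueError("k must be positive and history must hold at least k values")
    return mean(history[-k:])


daily_sales = [10, 12, 14, 16]  # made-up observations, oldest first
print(moving_average_forecast(daily_sales))  # mean of 12, 14 and 16
```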


**Time series forecasting shows up in many applications, for example:**
- Predicting stock prices or tomorrow's weather conditions.
- Forecasting the birth rate at all hospitals in a city each year.
- Forecasting product sales in units sold each day for a store.
- Forecasting the number of passengers through a train station each day.


In [94]:
from datetime import datetime

### Get the current date and time

In [95]:
a = datetime.now()
a

datetime.datetime(2019, 12, 18, 14, 42, 54, 473064)

In [96]:
a.year

2019

In [101]:
a.date()

datetime.date(2019, 12, 18)

In [102]:
print('sum of {} and {} = {}'.format(2,5,2+5))

sum of 2 and 5 = 7


In [103]:
#local date and time

now = datetime.now()
print(now)
print('Date now :{}-{}-{}'.format(now.day, now.month, now.year))
print('Time now :{}:{}:{}'.format(now.hour,now.minute, now.second))

2019-12-18 14:50:58.440518
Date now :18-12-2019
Time now :14:50:58
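The `str.format` approach above prints the raw integers, so a time such as 14:05:03 would appear as `14:5:3`. A sketch of the same output built with `strftime`, which zero-pads each field; the fixed moment is an assumption so the result is reproducible:

```python
from datetime import datetime

# A fixed moment instead of datetime.now(), so the output is predictable
moment = datetime(2019, 12, 18, 14, 5, 3)

# str.format prints the integers as-is, without zero padding
print('Time now :{}:{}:{}'.format(moment.hour, moment.minute, moment.second))  # Time now :14:5:3

# strftime pads day, month, hour, minute and second to two digits
print('Date now :' + moment.strftime('%d-%m-%Y'))  # Date now :18-12-2019
print('Time now :' + moment.strftime('%H:%M:%S'))  # Time now :14:05:03
```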


### `datetime` stores both the date and the time down to the microsecond. `timedelta` represents the temporal difference between two `datetime` objects:

In [104]:
delta = datetime(2011, 1, 7) - datetime(2008, 6, 24, 8, 15)
delta

datetime.timedelta(days=926, seconds=56700)

In [107]:
datetime.now() - datetime(1998,5,28,17,45)

datetime.timedelta(days=7873, seconds=76428, microseconds=597375)

In [105]:
delta.seconds

56700

In [108]:
delta.total_seconds()

80063100.0

In [109]:
# using timedelta
from datetime import date, datetime, timedelta  # explicit names instead of a wildcard import
datetime(2011,12,26) + timedelta(365)

datetime.datetime(2012, 12, 25, 0, 0)
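`delta.seconds` holds only the seconds left over after whole days are removed, while `total_seconds()` folds the days back in. A small check of that relationship, reusing the dates from the cells above:

```python
from datetime import datetime, timedelta

delta = datetime(2011, 1, 7) - datetime(2008, 6, 24, 8, 15)

# timedelta normalizes to (days, seconds, microseconds) with seconds < 86400
print(delta.days, delta.seconds)  # 926 56700

# total_seconds() recombines all three components
assert delta.total_seconds() == delta.days * 86400 + delta.seconds + delta.microseconds / 1e6

# A positional argument to timedelta counts days; 2012 is a leap year,
# so 365 days after 2011-12-26 is 2012-12-25 rather than 2012-12-26
print(datetime(2011, 12, 26) + timedelta(365))  # 2012-12-25 00:00:00
```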

### Converting Between String and Datetime

In [117]:
value = '2019 december 12'  # other layouts, e.g. '19/12/2019' or '2019-12-19', need a different format string
datetime.strptime(value, '%Y %B %d')

datetime.datetime(2019, 12, 12, 0, 0)
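`strptime` needs a format string that matches the input exactly, so each of the other layouts mentioned in the comment needs its own pattern. One way to handle mixed inputs, sketched with a hypothetical `parse_date` helper that tries a list of candidate formats in order. The order matters: an ambiguous string like `'01/02/2019'` resolves to whichever pattern matches first.

```python
from datetime import datetime

# Candidate layouts, tried in order (an illustrative list, extend as needed)
FORMATS = ('%Y %B %d', '%d/%m/%Y', '%Y-%m-%d')


def parse_date(text):
    """Return the first successful strptime parse, else raise ValueError."""
    for fmt in FORMATS:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue  # this layout does not match; try the next one
    raise ValueError(f'no known format matches {text!r}')


for value in ['2019 december 12', '19/12/2019', '2019-12-19']:
    print(parse_date(value).date())
```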

###  Datetime format specification (ISO C89 compatible)
- %Y - 4-digit year
- %y - 2-digit year
- %m - 2-digit month [01, 12]
- %d - 2-digit day [01, 31]
- %H - Hour (24-hour clock) [00, 23]
- %I - Hour (12-hour clock) [01, 12]
- %M - 2-digit minute [00, 59]
- %S - Second [00, 61] (seconds 60, 61 account for leap seconds)
- %w - Weekday as integer [0 (Sunday), 6]
- %U - Week number of the year [00, 53]. Sunday is considered the first day of the week, and days before the first Sunday of the year are “week 0”.
- %W - Week number of the year [00, 53]. Monday is considered the first day of the week, and days before the first Monday of the year are “week 0”.
- %z - UTC time zone offset as +HHMM or -HHMM, empty if time zone naive
- %F - Shortcut for %Y-%m-%d, for example 2012-04-18
- %D - Shortcut for %m/%d/%y, for example 04/18/12
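A few of these codes applied to concrete dates. 1 January 2012 fell on a Sunday, which makes the difference between `%U` (Sunday-first weeks) and `%W` (Monday-first weeks) visible. The sketch spells out `%Y-%m-%d` and `%m/%d/%y` instead of the `%F`/`%D` shortcuts, since those shortcuts come from the platform's C library and are not guaranteed to be portable.

```python
from datetime import date

d = date(2012, 4, 18)  # a Wednesday
print(d.strftime('%Y-%m-%d'))  # 2012-04-18  (what %F abbreviates)
print(d.strftime('%m/%d/%y'))  # 04/18/12    (what %D abbreviates)
print(d.strftime('%w'))        # 3           (Sunday = 0)

new_year = date(2012, 1, 1)  # a Sunday
print(new_year.strftime('%U'))  # 01: this Sunday starts week 1
print(new_year.strftime('%W'))  # 00: still before the first Monday
```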


### Locale-specific date formatting
- %a - Abbreviated weekday name
- %A - Full weekday name
- %b - Abbreviated month name
- %B - Full month name
- %c - Full date and time, for example ‘Tue 01 May 2012 04:20:57 PM’
- %p - Locale equivalent of AM or PM
- %x - Locale-appropriate formatted date; e.g. in the US, May 1, 2012 yields ‘05/01/2012’
- %X - Locale-appropriate time, e.g. ‘04:24:12 PM’
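Python starts every program in the "C" locale for `LC_TIME` (it is only changed if `locale.setlocale` is called), so by default these codes produce English names. A sketch using the date and time from the `%c` example above; under another locale the names would be translated:

```python
from datetime import datetime

stamp = datetime(2012, 5, 1, 16, 20, 57)  # a Tuesday afternoon

# Expected values below assume the default "C" locale
print(stamp.strftime('%a | %A'))      # Tue | Tuesday
print(stamp.strftime('%b | %B'))      # May | May
print(stamp.strftime('%I:%M:%S %p'))  # 04:20:57 PM
```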

In [127]:
datestrs = ['11-December-2019', '26-December-1993']
[datetime.strptime(x,"%d-%B-%Y") for x in datestrs]

[datetime.datetime(2019, 12, 11, 0, 0), datetime.datetime(1993, 12, 26, 0, 0)]

In [128]:
a = date.today()
a

datetime.date(2019, 12, 18)

In [137]:
a.strftime('%d  %m  %Y')

'18  12  2019'

### A timestamp is a sequence of characters or encoded information identifying when a certain event occurred, usually giving date and time of day, sometimes accurate to a small fraction of a second.

In [138]:
now = datetime.now()
print(now)
timestamp = now.timestamp()  # seconds since the Unix epoch; a naive datetime is treated as local time
print("timestamp =", timestamp)

2019-12-18 15:23:54.071766
timestamp = 1576662834.071766
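For a naive `datetime` such as `now` above, `timestamp()` assumes local time, so the number depends on the machine's time zone setting. Attaching an explicit time zone removes that ambiguity. A sketch with the UTC instant that corresponds to the output above; the UTC+05:30 offset is an assumption inferred from the printed local time and timestamp:

```python
from datetime import datetime, timedelta, timezone

# Explicitly UTC, so timestamp() no longer depends on the local clock setting
moment_utc = datetime(2019, 12, 18, 9, 53, 54, tzinfo=timezone.utc)
ts = moment_utc.timestamp()
print(ts)  # 1576662834.0

# Converting back: fromtimestamp with tz= returns an aware datetime
ist = timezone(timedelta(hours=5, minutes=30))  # fixed offset, no DST rules
print(datetime.fromtimestamp(ts, tz=ist))  # 2019-12-18 15:23:54+05:30
```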


### Generating date ranges
The `freq` strings used below are pandas [offset aliases](https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html).

In [141]:
import pandas as pd
pd.date_range('2000-01-01', '2001-01-01', freq='6M')

DatetimeIndex(['2000-01-31', '2000-07-31'], dtype='datetime64[ns]', freq='6M')

In [142]:
pd.date_range('2000-01-01', '2001-01-01', freq='BM')

DatetimeIndex(['2000-01-31', '2000-02-29', '2000-03-31', '2000-04-28',
               '2000-05-31', '2000-06-30', '2000-07-31', '2000-08-31',
               '2000-09-29', '2000-10-31', '2000-11-30', '2000-12-29'],
              dtype='datetime64[ns]', freq='BM')

In [147]:
rng = pd.date_range('2019-01-01', '2020-01-01', freq='WOM-4FRI')
rng

DatetimeIndex(['2019-01-25', '2019-02-22', '2019-03-22', '2019-04-26',
               '2019-05-24', '2019-06-28', '2019-07-26', '2019-08-23',
               '2019-09-27', '2019-10-25', '2019-11-22', '2019-12-27'],
              dtype='datetime64[ns]', freq='WOM-4FRI')
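`'WOM-4FRI'` reads as "week of month: the fourth Friday". As a cross-check of what the alias computes, here is a plain-Python sketch (the `fourth_friday` helper is hypothetical) that reproduces the twelve dates above:

```python
from datetime import date

FRIDAY = 4  # date.weekday() numbers Monday as 0, so Friday is 4


def fourth_friday(year, month):
    """Return the date of the fourth Friday of the given month."""
    first = date(year, month, 1)
    # days from the 1st to the first Friday (0 if the 1st is a Friday)
    to_friday = (FRIDAY - first.weekday()) % 7
    # the fourth Friday is exactly three weeks after the first one
    return date(year, month, 1 + to_friday + 21)


print([fourth_friday(2019, m).isoformat() for m in range(1, 13)])
```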

In [148]:
pd.date_range('2019-01-01', '2020-01-01', freq='D')

DatetimeIndex(['2019-01-01', '2019-01-02', '2019-01-03', '2019-01-04',
               '2019-01-05', '2019-01-06', '2019-01-07', '2019-01-08',
               '2019-01-09', '2019-01-10',
               ...
               '2019-12-23', '2019-12-24', '2019-12-25', '2019-12-26',
               '2019-12-27', '2019-12-28', '2019-12-29', '2019-12-30',
               '2019-12-31', '2020-01-01'],
              dtype='datetime64[ns]', length=366, freq='D')
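Recent pandas releases (2.2 onward) deprecate the `'M'` and `'BM'` aliases used above in favour of `'ME'` and `'BME'`. Passing an offset object instead of a string sidesteps the alias question across versions. A sketch that does so, and also confirms that the daily range above includes both endpoints (365 days of 2019 plus 1 January 2020):

```python
import pandas as pd

# Month ends expressed as an offset object rather than an alias string
month_ends = pd.date_range('2000-01-01', periods=3, freq=pd.offsets.MonthEnd())
print(month_ends.strftime('%Y-%m-%d').tolist())  # ['2000-01-31', '2000-02-29', '2000-03-31']

# Business month ends skip weekends: 30 April 2000 was a Sunday
bme = pd.date_range('2000-04-01', periods=1, freq=pd.offsets.BusinessMonthEnd())
print(bme[0].date())  # 2000-04-28

# A daily range includes both the start and the end date
daily = pd.date_range('2019-01-01', '2020-01-01', freq='D')
print(len(daily))  # 366
```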

### Create a Series with a date range as its index

In [156]:
import pandas as pd
import numpy as np
longer_ts = pd.Series(np.random.randn(1000),
                      index=pd.date_range('1/1/2000', periods=1000))
longer_ts

2000-01-01   -0.878505
2000-01-02   -0.413872
2000-01-03   -0.108379
2000-01-04   -1.471584
2000-01-05   -0.796679
2000-01-06    1.370450
2000-01-07   -0.990518
2000-01-08    0.130633
2000-01-09   -0.214298
2000-01-10   -0.383955
2000-01-11    1.685344
2000-01-12    0.498973
2000-01-13    1.387367
2000-01-14   -0.209656
2000-01-15   -0.105931
2000-01-16    0.646030
2000-01-17   -0.292303
2000-01-18   -0.702123
2000-01-19    2.411838
2000-01-20   -0.046116
2000-01-21   -0.962409
2000-01-22   -0.971404
2000-01-23    1.039077
2000-01-24    0.160348
2000-01-25   -1.358724
2000-01-26   -0.625444
2000-01-27    1.257434
2000-01-28    0.353144
2000-01-29    0.153819
2000-01-30    0.346195
                ...   
2002-08-28   -1.518374
2002-08-29   -2.077856
2002-08-30   -1.788688
2002-08-31    0.367998
2002-09-01    0.558159
2002-09-02    0.351742
2002-09-03    1.490216
2002-09-04    0.348507
2002-09-05   -1.285056
2002-09-06    0.206245
2002-09-07    0.711712
2002-09-08   -0.476542
2002-09-09 

In [157]:
longer_ts['2001']

2001-01-01   -0.883426
2001-01-02   -0.459491
2001-01-03    0.704907
2001-01-04   -0.129879
2001-01-05   -0.901947
2001-01-06    0.361665
2001-01-07    1.345538
2001-01-08    0.780203
2001-01-09    0.530875
2001-01-10   -1.136843
2001-01-11   -0.379170
2001-01-12    0.959589
2001-01-13    1.022273
2001-01-14    2.012980
2001-01-15   -0.958426
2001-01-16    0.524657
2001-01-17    0.334868
2001-01-18    1.015291
2001-01-19    0.601446
2001-01-20   -1.057350
2001-01-21    1.522587
2001-01-22    0.677173
2001-01-23   -0.764899
2001-01-24   -0.698372
2001-01-25   -0.503519
2001-01-26    1.949230
2001-01-27    0.193961
2001-01-28    1.250634
2001-01-29   -1.566485
2001-01-30    0.908609
                ...   
2001-12-02    0.588465
2001-12-03    3.276282
2001-12-04   -0.854006
2001-12-05   -0.854345
2001-12-06   -0.318416
2001-12-07    0.544534
2001-12-08   -1.077761
2001-12-09   -0.352911
2001-12-10    0.104301
2001-12-11    0.332317
2001-12-12   -1.603850
2001-12-13    1.041578
2001-12-14 

In [158]:
longer_ts['2001-05']

2001-05-01    0.752489
2001-05-02   -0.046481
2001-05-03   -0.695871
2001-05-04    0.442994
2001-05-05   -0.588336
2001-05-06   -0.221604
2001-05-07   -2.449289
2001-05-08   -0.925252
2001-05-09    0.208047
2001-05-10   -0.775207
2001-05-11   -0.948620
2001-05-12    0.752197
2001-05-13    1.156492
2001-05-14   -2.282072
2001-05-15   -0.056139
2001-05-16   -1.153556
2001-05-17    0.196040
2001-05-18    0.184149
2001-05-19   -1.275212
2001-05-20    0.259945
2001-05-21   -1.805860
2001-05-22    0.090550
2001-05-23   -0.062806
2001-05-24   -0.009436
2001-05-25   -0.732004
2001-05-26    2.261413
2001-05-27   -0.180727
2001-05-28    0.368162
2001-05-29   -1.048837
2001-05-30   -0.110454
2001-05-31   -0.795041
Freq: D, dtype: float64
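Because the values above are random, here is a deterministic sketch (an assumption for illustration: the values are simply 0 to 999) showing how many rows each partial date string selects. Using `.loc` makes the label-based intent explicit:

```python
import numpy as np
import pandas as pd

ts = pd.Series(np.arange(1000),
               index=pd.date_range('2000-01-01', periods=1000, freq='D'))

print(len(ts.loc['2001']))     # 365 rows: every day of 2001
print(len(ts.loc['2001-05']))  # 31 rows: every day of May 2001
print(ts.loc['2001'].iloc[0])  # 366, because 2000 was a leap year

# Slicing with date strings includes both endpoints
print(ts.loc['2001-05-01':'2001-05-03'].tolist())
```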

In [159]:
dates = pd.date_range('1/1/2019', periods=100, freq='W-WED')
dates

DatetimeIndex(['2019-01-02', '2019-01-09', '2019-01-16', '2019-01-23',
               '2019-01-30', '2019-02-06', '2019-02-13', '2019-02-20',
               '2019-02-27', '2019-03-06', '2019-03-13', '2019-03-20',
               '2019-03-27', '2019-04-03', '2019-04-10', '2019-04-17',
               '2019-04-24', '2019-05-01', '2019-05-08', '2019-05-15',
               '2019-05-22', '2019-05-29', '2019-06-05', '2019-06-12',
               '2019-06-19', '2019-06-26', '2019-07-03', '2019-07-10',
               '2019-07-17', '2019-07-24', '2019-07-31', '2019-08-07',
               '2019-08-14', '2019-08-21', '2019-08-28', '2019-09-04',
               '2019-09-11', '2019-09-18', '2019-09-25', '2019-10-02',
               '2019-10-09', '2019-10-16', '2019-10-23', '2019-10-30',
               '2019-11-06', '2019-11-13', '2019-11-20', '2019-11-27',
               '2019-12-04', '2019-12-11', '2019-12-18', '2019-12-25',
               '2020-01-01', '2020-01-08', '2020-01-15', '2020-01-22',
      

In [160]:
long_df = pd.DataFrame(np.random.randn(100, 4),
                       index=dates,
                       columns=['Colorado', 'Texas', 'New York', 'Ohio'])

In [164]:
long_df.head(3)

            Colorado     Texas  New York      Ohio
2019-01-02  0.447560 -0.281495  0.994874  1.778436
2019-01-09  0.197519  0.173006  0.175385  2.686081
2019-01-16 -0.429121 -0.886166 -0.297629  0.182272


In [165]:
long_df.loc['5-2019']

            Colorado     Texas  New York      Ohio
2019-05-01 -1.660863  0.238315  0.671502  1.115286
2019-05-08 -0.015231  0.333527  1.125162 -0.945032
2019-05-15 -0.211318  1.257821 -0.647770  1.629351
2019-05-22 -0.511379  0.810453  0.224623 -0.537365
2019-05-29 -0.301913  0.362663 -0.515643 -0.398048


In [168]:
pd.date_range(start='2012-04-01', end='2012-06-01', freq='W-THU')

DatetimeIndex(['2012-04-05', '2012-04-12', '2012-04-19', '2012-04-26',
               '2012-05-03', '2012-05-10', '2012-05-17', '2012-05-24',
               '2012-05-31'],
              dtype='datetime64[ns]', freq='W-THU')

In [174]:
#freq can also be specified as an Offset object.
pd.date_range(start='1/1/2020', periods=5, freq=pd.offsets.MonthEnd(3))

DatetimeIndex(['2020-01-31', '2020-04-30', '2020-07-31', '2020-10-31',
               '2021-01-31'],
              dtype='datetime64[ns]', freq='3M')

In [173]:
# the same frequency written as a string alias
pd.date_range(start='1/1/2019', periods=5, freq='3M')

DatetimeIndex(['2019-01-31', '2019-04-30', '2019-07-31', '2019-10-31',
               '2020-01-31'],
              dtype='datetime64[ns]', freq='3M')

###  Hours and Minutes

In [178]:
from pandas.tseries.offsets import Hour, Minute
two_hours = Hour(2)
two_hours

<2 * Hours>

In [179]:
pd.date_range('2000-01-01', '2000-01-03 23:59', freq='4H')

DatetimeIndex(['2000-01-01 00:00:00', '2000-01-01 04:00:00',
               '2000-01-01 08:00:00', '2000-01-01 12:00:00',
               '2000-01-01 16:00:00', '2000-01-01 20:00:00',
               '2000-01-02 00:00:00', '2000-01-02 04:00:00',
               '2000-01-02 08:00:00', '2000-01-02 12:00:00',
               '2000-01-02 16:00:00', '2000-01-02 20:00:00',
               '2000-01-03 00:00:00', '2000-01-03 04:00:00',
               '2000-01-03 08:00:00', '2000-01-03 12:00:00',
               '2000-01-03 16:00:00', '2000-01-03 20:00:00'],
              dtype='datetime64[ns]', freq='4H')

In [180]:
pd.date_range('2000-01-01', periods=10, freq='1H30MIN')

DatetimeIndex(['2000-01-01 00:00:00', '2000-01-01 01:30:00',
               '2000-01-01 03:00:00', '2000-01-01 04:30:00',
               '2000-01-01 06:00:00', '2000-01-01 07:30:00',
               '2000-01-01 09:00:00', '2000-01-01 10:30:00',
               '2000-01-01 12:00:00', '2000-01-01 13:30:00'],
              dtype='datetime64[ns]', freq='90T')
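The `'1H30MIN'` string is parsed into a single 90-minute frequency (shown as `freq='90T'`). The same step can be built by adding offset objects, which avoids depending on alias spellings that newer pandas versions rename (for example `'T'` to `'min'` and `'H'` to `'h'`):

```python
import pandas as pd
from pandas.tseries.offsets import Hour, Minute

# Fixed-length (tick) offsets add up to a single, finer-grained offset
step = Hour(1) + Minute(30)
print(step == Minute(90))  # True

rng = pd.date_range('2000-01-01', periods=4, freq=step)
print(rng.strftime('%H:%M').tolist())  # ['00:00', '01:30', '03:00', '04:30']
```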

### Iterate over the dates and print the number and name of the weekday

In [187]:
s = pd.date_range('2019-12-18',periods = 10)

for day in s:
    print(day.day,day.dayofweek,day.day_name())

18 2 Wednesday
19 3 Thursday
20 4 Friday
21 5 Saturday
22 6 Sunday
23 0 Monday
24 1 Tuesday
25 2 Wednesday
26 3 Thursday
27 4 Friday
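The loop works for printing, but a `DatetimeIndex` also exposes these fields as whole arrays, which is usually shorter and faster. A sketch of the vectorized equivalent, plus one typical use (keeping only weekdays):

```python
import pandas as pd

s = pd.date_range('2019-12-18', periods=10)

# Each attribute returns one value per timestamp, no explicit loop needed
print(s.day.tolist())         # day of the month
print(s.dayofweek.tolist())   # Monday=0 ... Sunday=6
print(s.day_name().tolist())  # English names by default

# Keep only Monday to Friday (dayofweek 0-4)
weekdays = s[s.dayofweek < 5]
print(weekdays.strftime('%a %d').tolist())
```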


### Shifting (Leading and Lagging) Data
`ts.shift()` shifts by the desired number of periods, with an optional time `freq`. “Shifting” refers to moving data backward and forward through time. Both Series and DataFrame have a `shift` method for doing naive shifts forward or backward, leaving the index unmodified; when `freq` is passed, the timestamps are moved instead and no data is lost:

In [192]:
ts = pd.Series(np.random.randn(4),
               index=pd.date_range('2002', periods=4, freq='A'))
ts

2002-12-31    1.391166
2003-12-31   -0.921772
2004-12-31    0.103192
2005-12-31    0.460523
Freq: A-DEC, dtype: float64

In [190]:
ts.shift(2)

2002-12-31         NaN
2003-12-31         NaN
2004-12-31    1.391166
2005-12-31   -0.921772
Freq: A-DEC, dtype: float64

In [191]:
ts.shift(-2)

2002-12-31    0.103192
2003-12-31    0.460523
2004-12-31         NaN
2005-12-31         NaN
Freq: A-DEC, dtype: float64

In [193]:
ts.shift(-2, freq='M')

2002-10-31    1.391166
2003-10-31   -0.921772
2004-10-31    0.103192
2005-10-31    0.460523
Freq: A-DEC, dtype: float64

In [198]:
ts.shift(3, freq='D')

2003-01-03    1.391166
2004-01-03   -0.921772
2005-01-03    0.103192
2006-01-03    0.460523
dtype: float64
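A common use of `shift` is computing period-over-period changes, because it lines each value up with the previous one. A sketch on made-up prices (the numbers are assumptions, not from the notebook):

```python
import pandas as pd

prices = pd.Series([100.0, 110.0, 99.0],
                   index=pd.date_range('2019-01-01', periods=3, freq='D'))

# shift(1) moves the data down one row; the index stays the same
previous = prices.shift(1)
returns = prices / previous - 1
print(returns.round(3).tolist())  # [nan, 0.1, -0.1]

# With freq, the timestamps move instead and no NaN is introduced
print(prices.shift(1, freq='D').index[0].date())  # 2019-01-02
```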

### Shifting dates with offsets 


In [214]:
from pandas.tseries.offsets import Day, MonthBegin, MonthEnd

now = datetime(2019,12,18)
now

datetime.datetime(2019, 12, 18, 0, 0)

In [215]:

now + 2 *Day()

Timestamp('2019-12-20 00:00:00')

In [216]:
now + 2*MonthEnd()

Timestamp('2020-01-31 00:00:00')

In [217]:
now + Day()

Timestamp('2019-12-19 00:00:00')

In [218]:
now + MonthEnd(2)

Timestamp('2020-01-31 00:00:00')

### Anchored offsets can explicitly “roll” dates forward or backward by simply using their rollforward and rollback methods, respectively:


In [223]:
offset = MonthBegin()
offset,now

(<MonthBegin>, datetime.datetime(2019, 12, 18, 0, 0))

In [224]:
offset.rollforward(now)

Timestamp('2020-01-01 00:00:00')

In [225]:
offset.rollback(now)

Timestamp('2019-12-01 00:00:00')
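The difference between rolling and adding shows up for a date that already sits on the offset: `rollforward` leaves it in place, while adding the offset always moves on to the next anchor. A sketch with the last day of a month and `MonthEnd`:

```python
import pandas as pd
from pandas.tseries.offsets import MonthEnd

offset = MonthEnd()

month_end = pd.Timestamp('2019-12-31')
print(offset.rollforward(month_end))  # 2019-12-31 00:00:00 (already on the offset)
print(month_end + offset)             # 2020-01-31 00:00:00 (next month end)

mid_month = pd.Timestamp('2019-12-18')
print(offset.rollforward(mid_month))  # 2019-12-31 00:00:00
print(offset.rollback(mid_month))     # 2019-11-30 00:00:00
```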

### Time Zone Handling
In Python, time zone information is commonly obtained from the third-party pytz library (installable with pip or conda). Accurate time zone data is especially important for historical data, because daylight saving time (DST) transition dates (and even UTC offsets) have been changed numerous times depending on the whims of local governments.
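To see why transition dates matter, compare the UTC offset of New York on either side of the 2012 switch to daylight saving time (11 March 2012). A sketch using pandas' time zone support; `America/New_York` is just an example zone:

```python
from datetime import timedelta

import pandas as pd

before = pd.Timestamp('2012-03-10 12:00', tz='America/New_York')
after = pd.Timestamp('2012-03-12 12:00', tz='America/New_York')

print(before.utcoffset())  # -1 day, 19:00:00, i.e. UTC-5 (EST)
print(after.utcoffset())   # -1 day, 20:00:00, i.e. UTC-4 (EDT)

# Two calendar days apart, but only 47 hours elapsed because of the DST jump
print((after - before) == timedelta(hours=47))  # True
```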


In [226]:
import pytz

In [227]:
pytz.common_timezones

['Africa/Abidjan', 'Africa/Accra', 'Africa/Addis_Ababa', 'Africa/Algiers', 'Africa/Asmara', 'Africa/Bamako', 'Africa/Bangui', 'Africa/Banjul', 'Africa/Bissau', 'Africa/Blantyre', 'Africa/Brazzaville', 'Africa/Bujumbura', 'Africa/Cairo', 'Africa/Casablanca', 'Africa/Ceuta', 'Africa/Conakry', 'Africa/Dakar', 'Africa/Dar_es_Salaam', 'Africa/Djibouti', 'Africa/Douala', 'Africa/El_Aaiun', 'Africa/Freetown', 'Africa/Gaborone', 'Africa/Harare', 'Africa/Johannesburg', 'Africa/Juba', 'Africa/Kampala', 'Africa/Khartoum', 'Africa/Kigali', 'Africa/Kinshasa', 'Africa/Lagos', 'Africa/Libreville', 'Africa/Lome', 'Africa/Luanda', 'Africa/Lubumbashi', 'Africa/Lusaka', 'Africa/Malabo', 'Africa/Maputo', 'Africa/Maseru', 'Africa/Mbabane', 'Africa/Mogadishu', 'Africa/Monrovia', 'Africa/Nairobi', 'Africa/Ndjamena', 'Africa/Niamey', 'Africa/Nouakchott', 'Africa/Ouagadougou', 'Africa/Porto-Novo', 'Africa/Sao_Tome', 'Africa/Tripoli', 'Africa/Tunis', 'Africa/Windhoek', 'America/Adak', 'America/Anchorage', 'Amer

In [228]:
tz = pytz.timezone('Asia/Kolkata')
tz

<DstTzInfo 'Asia/Kolkata' LMT+5:53:00 STD>

In [229]:
stamp_kolkata = pd.Timestamp('2011-03-12 04:00', tz='Asia/Kolkata')

In [230]:
stamp_kolkata

Timestamp('2011-03-12 04:00:00+0530', tz='Asia/Kolkata')
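A localized `Timestamp` can be converted to other zones with `tz_convert`; the instant stays the same and only its wall-clock representation changes. A sketch, rebuilding the Kolkata timestamp above so the block is self-contained:

```python
import pandas as pd

stamp = pd.Timestamp('2011-03-12 04:00', tz='Asia/Kolkata')

in_utc = stamp.tz_convert('UTC')
print(in_utc)  # 2011-03-11 22:30:00+00:00

# Same instant, so the two compare equal despite different wall clocks
print(in_utc == stamp)  # True

# tz_localize(None) drops the zone but keeps the local wall-clock time
print(stamp.tz_localize(None))  # 2011-03-12 04:00:00
```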