### Chapter 2: Reading Time Series Data from Files 


In [2]:
import pandas as pd 
from pathlib import Path

#### Reading data from a CSV file

In [3]:
filepath =\
Path('../TimeSeriesAnalysisWithPythonCookbook/Data/movieboxoffice.csv')

In [4]:
ts = pd.read_csv(filepath, 
                 header=0,
                 parse_dates=['Date'],
                 index_col=0,
                 infer_datetime_format=True,
                 usecols=['Date', 'DOW', 'Daily', 'Forecast', 'Percent Diff'])
ts.head(5)

                 DOW        Daily     Forecast Percent Diff
Date
2021-04-26    Friday  $125,789.89  $235,036.46      -46.48%
2021-04-27  Saturday   $99,374.01  $197,622.55      -49.72%
2021-04-28    Sunday   $82,203.16  $116,991.26      -29.74%
2021-04-29    Monday   $33,530.26   $66,652.65      -49.69%
2021-04-30   Tuesday   $30,105.24   $34,828.19      -13.56%


In [5]:
# Print a summary of the DataFrame
ts.info()

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 128 entries, 2021-04-26 to 2021-08-31
Data columns (total 4 columns):
 #   Column        Non-Null Count  Dtype 
---  ------        --------------  ----- 
 0   DOW           128 non-null    object
 1   Daily         128 non-null    object
 2   Forecast      128 non-null    object
 3   Percent Diff  128 non-null    object
dtypes: object(4)
memory usage: 5.0+ KB


In [6]:
# Remove currency symbols and thousands separators,
# keeping the digits and the decimal point
clean = lambda x: x.str.replace(r'[^\d.]', '', regex=True)
# Apply the cleaner column by column, then convert to float
c_df = ts[['Daily', 'Forecast']].apply(clean)
ts[['Daily', 'Forecast']] = c_df.astype(float)
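
The cleaning step can be checked on a couple of sample strings before applying it to the full DataFrame. The sketch below is only an illustration: the `sample` frame and the `strip_currency` name are hypothetical, and it assumes the decimal point should be preserved. A pattern that strips every non-digit character would also drop the decimal point, turning $125,789.89 into 12578989.0.

```python
import pandas as pd

# Hypothetical sample values in the same style as the Daily and Forecast columns
sample = pd.DataFrame({
    'Daily': ['$125,789.89', '$99,374.01'],
    'Forecast': ['$235,036.46', '$197,622.55'],
})

# Remove everything except digits and the decimal point, column by column
strip_currency = lambda col: col.str.replace(r'[^\d.]', '', regex=True)
cleaned = sample.apply(strip_currency).astype(float)

print(cleaned)
print(cleaned.dtypes)
```

Using a raw string for the pattern avoids the invalid escape sequence warning that newer Python versions raise for `'\d'` in a regular string.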

In [7]:
ts.info()

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 128 entries, 2021-04-26 to 2021-08-31
Data columns (total 4 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   DOW           128 non-null    object 
 1   Daily         128 non-null    float64
 2   Forecast      128 non-null    float64
 3   Percent Diff  128 non-null    object 
dtypes: float64(2), object(2)
memory usage: 5.0+ KB


In [8]:
# Memory usage in bytes for the index and each column
# (object columns count only the references unless deep=True is passed)
ts.memory_usage()

Index           1024
DOW             1024
Daily           1024
Forecast        1024
Percent Diff    1024
dtype: int64

In [9]:
# Total memory usage
ts.memory_usage().sum()

5120
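
For object columns, the figures returned by memory_usage() cover only the references to the Python objects, not the strings themselves. The following is a minimal sketch of the difference that deep=True makes, using a small hypothetical DataFrame rather than the box office data:

```python
import pandas as pd

# Hypothetical frame: one object (text) column and one float column
df = pd.DataFrame({
    'DOW': pd.Series(['Friday', 'Saturday', 'Sunday'], dtype=object),
    'Daily': [125789.89, 99374.01, 82203.16],
})

# Shallow: object columns are counted as references only
shallow = df.memory_usage()
# Deep: the Python string objects are inspected and counted as well
deep = df.memory_usage(deep=True)

print(shallow)
print(deep)
```

The numeric column reports the same size either way, while the text column is larger under deep=True.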

There are situations where _parse_dates_ may not work.
* In such cases, the column is returned unchanged (as text), and no error is raised, so it is worth checking the resulting data types.
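
Because the failure is silent, one way to catch it is to check the column's dtype after loading. The sketch below assumes a hypothetical two-row CSV in which one entry (`TBD`) is not a date:

```python
import io
import pandas as pd

# Hypothetical CSV in which one entry in the Date column is not a date
csv_text = "Date,Daily\n2021-04-26,100\nTBD,200\n"

df = pd.read_csv(io.StringIO(csv_text), parse_dates=['Date'])

# Check the dtype explicitly instead of trusting parse_dates
is_datetime = pd.api.types.is_datetime64_any_dtype(df['Date'])
print(df['Date'].dtype, is_datetime)
```

Depending on the pandas version, a warning about the date format may be printed, but the column is still returned without an exception.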

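This is where the _date_parser_ parameter can be useful. Note that _date_parser_ is deprecated as of pandas 2.0 in favor of _date_format_.

For example, we can pass _date_parser_ a lambda function that calls the pandas _to_datetime()_ function, and specify the string representation of the date format through its _format_ argument. The format "%d-%b-%y" combines the following codes: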

* %d represents the day of the month, such as 01 or 02 
* %b represents the abbreviated month name, such as Apr or May
* %y represents a two-digit year, such as 19 or 20 

Other common string codes include the following: 

* %Y represents the year as a four-digit number, such as 2020 or 2021
* %B represents the month's full name, such as January or February
* %m represents the month as a two-digit number such as 01 or 02
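
These codes follow Python's standard strftime/strptime directives, which _to_datetime()_ also accepts, so they can be tried out with the standard library alone. The date strings below are hypothetical examples, and the month names assume an English locale:

```python
from datetime import datetime

# Hypothetical date strings, each paired with a matching format
examples = [
    ('26-Apr-21', '%d-%b-%y'),      # day, abbreviated month, two-digit year
    ('26 April 2021', '%d %B %Y'),  # day, full month name, four-digit year
    ('2021-04-26', '%Y-%m-%d'),     # four-digit year, two-digit month, day
]

parsed = [datetime.strptime(text, fmt) for text, fmt in examples]

for (text, fmt), value in zip(examples, parsed):
    print(f'{text!r} with {fmt!r} -> {value.date()}')
```

All three strings describe the same calendar day, so each one parses to the same datetime value.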

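The _infer_datetime_format_ parameter of read_csv() can, in some cases, speed up parsing by 5-10x. It is deprecated as of pandas 2.0, where a strict version of this inference is the default and passing the parameter has no effect.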

In [10]:
date_parser = lambda x: pd.to_datetime(x, format="%d-%b-%y")
ts = pd.read_csv(filepath, 
                 parse_dates=[0], 
                 index_col=0,
                 date_parser=date_parser,
                 usecols=[0,1,3,7,6])
ts.head()

                 DOW        Daily     Forecast Percent Diff
Date
2021-04-26    Friday  $125,789.89  $235,036.46      -46.48%
2021-04-27  Saturday   $99,374.01  $197,622.55      -49.72%
2021-04-28    Sunday   $82,203.16  $116,991.26      -29.74%
2021-04-29    Monday   $33,530.26   $66,652.65      -49.69%
2021-04-30   Tuesday   $30,105.24   $34,828.19      -13.56%


In [11]:
ts = pd.read_csv(filepath,
                 header=0, 
                 parse_dates=[0], 
                 index_col=0,
                 infer_datetime_format=True, 
                 usecols=['Date', 'DOW', 'Daily', 'Forecast', 'Percent Diff'])
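
An approach that does not rely on _date_parser_ or _infer_datetime_format_ is to read the date column as text and convert it afterwards with _to_datetime()_. The following is a minimal sketch, not the recipe's own method. It uses a hypothetical in-memory CSV and assumes the dates are written like 26-Apr-21, matching the "%d-%b-%y" format used earlier:

```python
import io
import pandas as pd

# Hypothetical sample; the date layout (e.g. 26-Apr-21) is an assumption
csv_text = '''Date,DOW,Daily
26-Apr-21,Friday,"$125,789.89"
27-Apr-21,Saturday,"$99,374.01"
'''

# Read the Date column as plain text first
df = pd.read_csv(io.StringIO(csv_text))

# Convert with an explicit format, then use the result as the index
df['Date'] = pd.to_datetime(df['Date'], format='%d-%b-%y')
df = df.set_index('Date')

print(df.index)
```

An explicit format makes the conversion fail loudly on a mismatched value instead of leaving the column unparsed.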

#### Reading data from an Excel File