# Getting started with Pandas TimeSeries

This notebook introduces the basic Pandas datetime functionality.
The following five points will be covered:

1. [Parsing DateTime](#task1)
2. [Aggregating columns](#task2)
3. [Extracting DateTime properties](#task3)
4. [Filtering and selecting specific durations](#task4)
5. [Changing the granularity of the Timeseries](#task5)


### Prepare environment and read data 

In [1]:
# Constants 
INPUT_PATH = '/kaggle/input/netflix-shows/netflix_titles.csv'

# Libraries 
import pandas as pd 
import matplotlib.pyplot as plt

# Set default properties for plotting 
plt.rcParams['figure.figsize'] = [11, 4]
plt.rcParams['figure.dpi'] = 100 

In [2]:
# Read the data into a DataFrame
raw_df = pd.read_csv(INPUT_PATH)

_____

## Task 1: Count the number of shows added per day

In the following section, we will parse the raw date strings into
pandas datetime values and count the number of shows added on each day.

### Parse timestamp into datetime column <a id='task1'></a>

Convert the raw date strings to the pandas datetime format.
Once the column has a datetime dtype, we can
apply the datetime functionality illustrated below.
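Before applying this to the dataset, here is a minimal sketch of how `pd.to_datetime` behaves on messy input. The `raw_dates` values and the `%B %d, %Y` format are made up for illustration and are not taken from the dataset; adjust them to whatever your raw column actually contains.

```python
import pandas as pd

# Hypothetical raw values imitating a free-text date column
raw_dates = pd.Series(["September 25, 2021", " August 4, 2017", None, "not a date"])

# Strip stray whitespace, then parse with an explicit format.
# errors="coerce" turns anything unparseable into NaT instead of raising.
parsed = pd.to_datetime(raw_dates.str.strip(), format="%B %d, %Y", errors="coerce")

print(parsed)
print(parsed.isna().sum(), "values could not be parsed")
```

Coercing to `NaT` keeps the column usable, but it is worth checking how many values were lost before continuing.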


In [3]:
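# Work on a copy so the raw data stays untouched
df = raw_df.copy()

# Parse the raw date strings into pandas datetime values
df["date_added"] = pd.to_datetime(df["date_added"])

# Count the shows for each distinct date; date_added becomes the index
show_Data = df.groupby("date_added")[["show_id"]].count()
show_Data = show_Data.rename({"show_id": "number of shows added per day"}, axis=1)
print(show_Data)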

            number of shows added per day
date_added                               
2008-01-01                              1
2008-02-04                              1
2009-05-05                              1
2009-11-18                              1
2010-11-01                              1
...                                   ...
2021-09-21                              5
2021-09-22                              9
2021-09-23                              2
2021-09-24                             10
2021-09-25                              1

[1714 rows x 1 columns]


### Count shows added per date <a id='task2'></a>
The original dataframe lists one show per row.
Grouping by `date_added` and counting the `show_id` values, as in the cell above, gives the total number of shows added per day.
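To see the counting step in isolation, here is a small sketch on a made-up dataframe (`toy` and its values are hypothetical). It also shows `value_counts` as one possible alternative that produces the same tallies.

```python
import pandas as pd

# Hypothetical miniature of the dataset: one row per show
toy = pd.DataFrame({
    "show_id": ["s1", "s2", "s3", "s4"],
    "date_added": pd.to_datetime(["2021-09-24", "2021-09-24", "2021-09-25", None]),
})

# count() tallies the non-null show_id values for each distinct date.
# Rows whose date is missing (NaT) are dropped by groupby by default.
per_day = toy.groupby("date_added")["show_id"].count()

# value_counts() gives the same tallies; sort_index() restores date order
per_day_alt = toy["date_added"].value_counts().sort_index()

print(per_day)
```

Note that the show with a missing date does not appear in either result.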

______

## Task 2: Extract the day name and sum up the shows added
<a id='task3'></a>

In the last step, we used the `date_added` column to count the number of shows.
Because we grouped by that column with `groupby`,
it became the index of the result.

We can now use this index directly to extract properties of the timestamps.
One example is the weekday name, returned by the `day_name()` method.
Check out the [full list of the attributes and methods here](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DatetimeIndex.html).
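As a quick illustration of what a `DatetimeIndex` exposes, the sketch below uses a made-up three-day index (`idx` is hypothetical). Some properties are plain attributes, while `day_name` is a method and needs parentheses.

```python
import pandas as pd

# Hypothetical index covering three consecutive days
idx = pd.DatetimeIndex(["2021-09-24", "2021-09-25", "2021-09-26"])

print(idx.year)        # attribute: calendar year of each timestamp
print(idx.month)       # attribute: month number, 1 to 12
print(idx.dayofweek)   # attribute: Monday=0 ... Sunday=6
print(idx.day_name())  # method: English weekday names
```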

In [4]:
# Derive the weekday name from the DatetimeIndex
show_Data["Day_Name"] = show_Data.index.day_name()

# Sum the daily counts for each weekday
total_per_weekday = show_Data.groupby("Day_Name")[["number of shows added per day"]].sum()
print(total_per_weekday)

           number of shows added per day
Day_Name                                
Friday                              2498
Monday                               851
Saturday                             816
Sunday                               751
Thursday                            1396
Tuesday                             1197
Wednesday                           1288
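The weekdays in the output above are sorted alphabetically, because `groupby` sorts its keys. One way to put them in calendar order is `reindex`; the sketch below uses made-up totals (`totals` is hypothetical) rather than the real results.

```python
import pandas as pd

# Hypothetical totals, indexed alphabetically like a groupby result
totals = pd.Series({"Friday": 3, "Monday": 1, "Sunday": 2}, name="shows")

# Calendar order to impose on the index
weekday_order = ["Monday", "Tuesday", "Wednesday", "Thursday",
                 "Friday", "Saturday", "Sunday"]

# reindex() reorders the rows; fill_value=0 covers weekdays with no entries
ordered = totals.reindex(weekday_order, fill_value=0)
print(ordered)
```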


______

## Task 3: Select data from 2016 onwards 
<a id='task4'></a>

You can also use regular boolean masking to select and filter entries.
The syntax is simpler than one might expect: you don't need to convert
your filtering criterion to `datetime` yourself. A plain string in `YYYY-MM-DD` format
will do the job, because pandas parses it for the comparison.


In [5]:
# Boolean mask: True for every date on or after 1 January 2016
from_2016 = show_Data.index >= "2016-01-01"
show_Data = show_Data[from_2016].copy()
print(show_Data)

            number of shows added per day   Day_Name
date_added                                          
2016-01-01                             33     Friday
2016-01-08                              1     Friday
2016-01-13                              1  Wednesday
2016-01-15                              2     Friday
2016-01-22                              1     Friday
...                                   ...        ...
2021-09-21                              5    Tuesday
2021-09-22                              9  Wednesday
2021-09-23                              2   Thursday
2021-09-24                             10     Friday
2021-09-25                              1   Saturday

[1609 rows x 2 columns]
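Besides a boolean mask, a sorted `DatetimeIndex` also supports label-based slicing with date strings through `.loc`. The sketch below compares both approaches on made-up data (`counts` and its values are hypothetical).

```python
import pandas as pd

# Hypothetical daily counts on a sorted DatetimeIndex
counts = pd.Series(
    [1, 2, 3, 4],
    index=pd.to_datetime(["2015-12-31", "2016-01-01", "2016-06-15", "2017-03-01"]),
)

# Boolean mask, as in the cell above
from_2016_mask = counts[counts.index >= "2016-01-01"]

# Label-based slice: "2016" alone is expanded to the start of that year
from_2016_loc = counts.loc["2016":]

# Both ends of a label slice are inclusive, so this keeps 2016 only
only_2016 = counts.loc["2016-01-01":"2016-12-31"]

print(from_2016_loc)
print(only_2016)
```

The slice form tends to read more naturally when both a start and an end date are needed.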


______

## Task 4: Sum up weekly data 
<a id='task5'></a>

It is possible to change the granularity of your time series directly with the Pandas datetime functionality.

To do that, you need to specify two things:
- Your new granularity, passed as an argument to the `resample` method. [Read more details](https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#dateoffset-objects)
- The aggregation function applied to the entries of each new period, for example `sum`

In [6]:


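# Use the resampling function to group data per week.
# Only the numeric column is selected, so the Day_Name strings are not aggregated.
weekly_data = (
    show_Data[["number of shows added per day"]]
    .resample("1W")  # For each week
    .sum()           # Calculate the sum
)
print(weekly_data)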


            number of shows added per day
date_added                               
2016-01-03                             33
2016-01-10                              1
2016-01-17                              3
2016-01-24                              2
2016-01-31                              4
...                                   ...
2021-08-29                             55
2021-09-05                             81
2021-09-12                             28
2021-09-19                             49
2021-09-26                             28

[300 rows x 1 columns]
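To make the two ingredients concrete, the sketch below resamples a small made-up series (`daily` is hypothetical) to weekly bins and applies more than one aggregation function to the same bins.

```python
import pandas as pd

# Hypothetical daily counts spanning two calendar weeks
daily = pd.Series(
    [5, 9, 2, 10, 1, 4],
    index=pd.to_datetime([
        "2021-09-21", "2021-09-22", "2021-09-23",
        "2021-09-24", "2021-09-25", "2021-09-27",
    ]),
    name="shows",
)

# "W" bins end on Sunday, and each bin is labelled with that closing date
weekly_sum = daily.resample("W").sum()

# Same bins, several aggregation functions at once
weekly_stats = daily.resample("W").agg(["sum", "mean", "max"])

print(weekly_sum)
print(weekly_stats)
```

This also explains why the dates in the weekly output above are all Sundays: each row is labelled with the last day of its week.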
