### AccelerateAI - Python for Data Science
##### Introduction to Python Language  (Python 3) 
In this notebook we will cover the following: 
- Date / Time / Time Series
- Handling time in Pandas
- Case 01: Covid-19 Analysis
- Case 02: Stock Price Analysis

In the next one we will look at:
- Processing JSON/Xls
- Web Scraping
- Relational Database
- Data Pipeline

##### 1. Date & Time
- The Python standard library includes data types for date and time data, as well as calendar-related functionality. 
- The datetime.datetime type, or simply datetime, is the one used most often.

In [None]:
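# The datetime module supplies classes for manipulating dates and times.
# Import the names used below explicitly instead of using a wildcard import.
from datetime import datetime, timedelta, time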

In [None]:
now = datetime.now()
print("Year:", now.year, "Month:", now.month, "Day:", now.day)

In [None]:
#Time difference - timedelta
timediff = datetime.now() - datetime(2020, 3, 22)        # days since lockdown
timediff

In [None]:
timediff.days , timediff.seconds, timediff.microseconds

In [None]:
#Date arithmetic
d = datetime(2021, 12, 31)
t = timedelta(days=10)    
d2 = d + t
d2.isoformat()

In [None]:
d2.weekday()                   # week starts from Monday: 0

In [None]:
today = datetime.now()
print('Week #:',today.isocalendar()[1])

In [None]:
#Formatting 
d2.strftime("%A, %d %B,%Y")                        #%b - Jan, %y - 22
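
Formatting has an inverse: strptime() reads a string back into a datetime using the same format codes. The following is a minimal sketch of that round trip; the variable names and the sample timestamp are chosen only for illustration.

```python
from datetime import datetime

# Sample timestamp, chosen only for illustration
stamp = datetime(2022, 1, 10, 18, 45)

fmt = "%Y-%m-%d %H:%M"
text = stamp.strftime(fmt)               # datetime -> str
parsed = datetime.strptime(text, fmt)    # str -> datetime, same format codes

print(text)
print(parsed == stamp)
```

The round trip is only lossless when the format string keeps every field you care about; here seconds and microseconds are zero, so nothing is dropped.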

In [None]:
# The dateutil module (third-party package python-dateutil) provides powerful extensions to the standard datetime module.
from dateutil.parser import parse

In [None]:
date = parse('Mar 19, 2022 6:45 PM')              # can parse almost any human-intelligible date representation
date

In [None]:
t1 = time(13,20,13,40)
print(t1)
print(type(t1))

In [None]:
import calendar
print(calendar.calendar(2022))

In [None]:
#Date time in Pandas
import pandas as pd 
import numpy as np 

##### 2. Handling time in Pandas

- Pandas contains extensive capabilities and features for working with time series data. 
- Using the NumPy datetime64 and timedelta64 dtypes, pandas has functionality for manipulating time series data.
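
As a small illustration of these dtypes, the sketch below builds a Series of dates and subtracts a fixed Timestamp from it. The dates are arbitrary, and the exact resolution shown in the dtype (for example `[ns]`) can depend on the pandas version.

```python
import pandas as pd

# Arbitrary example dates
dates = pd.Series(pd.to_datetime(["2022-03-01", "2022-03-19"]))

# Subtracting a Timestamp yields a Series of time differences
gaps = dates - pd.Timestamp("2022-01-01")

print(dates.dtype)              # a datetime64 dtype
print(gaps.dtype)               # a timedelta64 dtype
print(gaps.dt.days.tolist())    # whole days elapsed since 2022-01-01
```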

to_datetime() parses time series information from various sources and formats. Many input types are supported, and they lead to different output types:
  - Scalars can be int, float, str, or datetime objects.
      - They are converted to Timestamp when possible, otherwise to datetime.datetime.
      - None/NaN/null scalars are converted to NaT.
  - Array-likes can contain int, float, str, or datetime objects.
      - They are converted to DatetimeIndex when possible, otherwise to an Index with object dtype containing datetime.datetime.
      - None/NaN/null entries are converted to NaT in both cases.
  - Series:
      - They are converted to Series with datetime64 dtype when possible, otherwise to Series with object dtype containing datetime.datetime.
      - None/NaN/null entries are converted to NaT in both cases.
  - DataFrame/dict-like:
      - They are converted to Series with datetime64 dtype (columns such as year, month and day are assembled into dates).
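
The mapping from input type to output type can be checked directly. This is a minimal sketch with made-up dates; the variable names are only for illustration.

```python
import pandas as pd

scalar = pd.to_datetime("2022-03-19")                               # str -> Timestamp
index = pd.to_datetime(["2022-03-19", None])                        # list -> DatetimeIndex, None -> NaT
series = pd.to_datetime(pd.Series(["2022-03-19", "2022-03-20"]))    # Series -> Series

# A DataFrame with year/month/day columns is assembled row by row into dates
parts = pd.DataFrame({"year": [2021, 2022], "month": [12, 3], "day": [31, 19]})
assembled = pd.to_datetime(parts)                                   # DataFrame -> Series

for result in (scalar, index, series, assembled):
    print(type(result).__name__)
```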

In [None]:
pd.to_datetime(10, unit="D")

In [None]:
pd.to_datetime([1, 2, 3], unit="D", origin=pd.Timestamp("2000-01-01"))

In [None]:
pd.to_datetime('24th of April, 2020')

In [None]:
datestrs = ['7/6/2011', '8/6/2011']
pd.to_datetime(datestrs)                        # read month-first by default (July 6, August 6); pass dayfirst=True for day-first dates

In [None]:
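# 1300-01-01 lies outside the range of nanosecond-resolution Timestamps.
# errors='coerce' returns NaT for values that cannot be converted
# (errors='ignore', which returned the input unchanged, is deprecated in recent pandas releases).
pd.to_datetime('13000101', format='%Y%m%d', errors='coerce')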

In [None]:
# date_range() - returns a DatetimeIndex of equally spaced time points
pd.date_range(start='1/1/2018', 
              end='1/08/2018', 
              freq='D')                   # freq: D - calendar day, B - business day, W - week, M - month end, Q - quarter end, H - hour, etc. (pandas 2.2+ spells the last three ME, QE, h)

In [None]:
# Creating custom date range 
start = datetime(2022, 3, 1)
end = datetime(2022, 4, 1)
weekmask = "Mon Wed Fri"
holidays = [datetime(2022, 3, 18), datetime(2022, 3, 24)]
pd.bdate_range(start, end, freq="C", weekmask=weekmask, holidays=holidays)

In [None]:
#Date arithmetic with pandas datetime 
today = pd.Timestamp("2022-03-19")
DayAfterTomorrow = today + pd.Timedelta("2 day")
DayAfterTomorrow.day_name()

In [None]:
nextBDay = today + pd.offsets.BDay()                  #next business day 
nextBDay.date()

In [None]:
#Indexing - One of the main uses for DatetimeIndex is as an index for pandas objects
timeindex = pd.date_range("2022-01-01", periods= 10, freq="5D")
ts = pd.Series(np.random.randn(len(timeindex)), index=timeindex)
ts

In [None]:
ts[1:5]                                 # row 1 to 4

In [None]:
ts[::3]                                 # every third row

In [None]:
ts["Jan-2022"]                          # partial string indexing: selects every row in January 2022

In [None]:
ts["2022-01-16" : "2022-01-31"]         # includes the endpoints 

In [None]:
ts.truncate(before="2022-01-16", after="2022-01-31")

In [None]:
ts.shift(2)                        #creating lags and leads

In [None]:
ts.resample("10D").asfreq()

In [None]:
ts.resample("2D").asfreq()                     #What would happen here?
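
Resampling to a finer frequency than the data (upsampling) creates timestamps that have no observation. With asfreq() a value is kept only where an original timestamp coincides with the new grid, and every other slot becomes NaN; a fill method such as ffill() carries the last known value forward instead. The sketch below uses a small made-up series rather than `ts`, so the names `demo`, `up_raw` and `up_filled` are illustrative only.

```python
import pandas as pd

# Three observations, five days apart (made-up values)
idx = pd.date_range("2022-01-01", periods=3, freq="5D")
demo = pd.Series([1.0, 2.0, 3.0], index=idx)

# Upsample to a 2-day grid
up_raw = demo.resample("2D").asfreq()     # gaps appear as NaN
up_filled = demo.resample("2D").ffill()   # gaps take the last known value

print(up_raw)
print(up_filled)
```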

In [None]:
df = pd.DataFrame(
    np.random.rand(120, 3),
    index=pd.date_range("1/1/2012", freq="h", periods=120),   # hourly; "H" is deprecated in recent pandas releases
    columns=["A", "B", "C"]
)

r = df.resample("D")
r.sum()

In [None]:
r["A"].agg(["sum", "mean", "std"])        # string names avoid the warning newer pandas raises for NumPy callables

In [None]:
r.agg({"A": "sum", "B": "mean"})

#### 3.1  Case Study : Covid-19 Analysis

In [None]:
# Import libraries
import pandas as pd
import numpy as np

In [None]:
# Import data
coviddata = "https://api.covidtracking.com/v1/us/daily.csv"
covid_df = pd.read_csv(coviddata)

In [None]:
covid_df.info()

In [None]:
covid_df['date']= pd.to_datetime(covid_df['date'],format='%Y%m%d')

In [None]:
covid_df.head()

In [None]:
# flip the data so the rows run in the opposite (chronological) order
df = covid_df[::-1]

In [None]:
# select columns to work with
df = df.loc[:,['date','states','positiveIncrease','deathIncrease','hospitalizedIncrease']] 
df.set_index('date',inplace=True)
df.sample(20)

In [None]:
df.plot()

In [None]:
# let's get the monthly averages
monthly_df = df.resample("M").mean()

In [None]:
# import plotting libraries
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns

fig,ax = plt.subplots(figsize=(12,6))

ax.plot(monthly_df.index,monthly_df['positiveIncrease'],color='red',label='Avg Monthly infections')
ax.plot(monthly_df.index, monthly_df["deathIncrease"], color="blue",label='Avg Monthly deaths')
ax.legend()                                   # display the line labels

In [None]:
fig,ax = plt.subplots(figsize=(12,6))

ax.plot(monthly_df.index,monthly_df['positiveIncrease'],color='red',label='Avg Monthly infections')
ax.set_ylabel('Monthly Infections')

ax2=ax.twinx()
ax2.plot(monthly_df.index, monthly_df["deathIncrease"], color="blue",label='Avg Monthly deaths')

ax2.grid(False) # turn off grid for second Y axis
ax2.set_ylabel('Monthly deaths')

leg = ax.legend(loc='center', frameon=False,bbox_to_anchor=(0.5, -0.10))
leg2= ax2.legend(loc='center', frameon=False,bbox_to_anchor=(0.49, -0.15))

#### 3.2  Case Study : Stock Price Analysis

In [None]:
import pandas_datareader as pdr               # for retrieving stock prices

In [None]:
# Have a list of tech stock tickers
tickers = ['msft', 'aapl', 'tsla', 'nvda']
stockdata = pdr.get_data_yahoo(tickers)

In [None]:
df = stockdata.Close
df.head()

In [None]:
sns.set(style="whitegrid", rc={'figure.figsize':(12,8)})          #increase the figure size

sns.lineplot(data=df, linewidth=2, palette=['red', 'blue', 'green', 'aqua'])

In [None]:
data_fill = df.asfreq("D", method='ffill')
data_fill.head()

In [None]:
def pct_change(series):
    # fractional change relative to the previous day's close
    previous = series.shift(1)
    return (series - previous) / previous


pchange_df = data_fill.apply(pct_change)
pchange_df.head()
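
Day-over-day returns are conventionally measured against the previous observation, and pandas ships a built-in `DataFrame.pct_change()` for exactly that. The sketch below compares the built-in with the explicit formula on made-up prices; the column names `aaa` and `bbb` are hypothetical tickers, not real data.

```python
import numpy as np
import pandas as pd

# Made-up closing prices for two hypothetical tickers
prices = pd.DataFrame(
    {"aaa": [100.0, 110.0, 99.0], "bbb": [50.0, 50.0, 55.0]},
    index=pd.date_range("2022-01-03", periods=3, freq="D"),
)

# Explicit formula: change relative to the previous row
manual = (prices - prices.shift(1)) / prices.shift(1)

# Built-in equivalent
builtin = prices.pct_change()

print(builtin)
print(np.allclose(manual, builtin, equal_nan=True))
```

The first row is NaN in both results because there is no earlier price to compare against.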

In [None]:
sns.set(style="whitegrid", rc={'figure.figsize':(12,8)})   # set_style() ignores figure.figsize, so use set() as before
sns.heatmap(
        data=pchange_df.corr(),  # our correlation matrix
        linewidths=0.3,          # the width of lines separating the matrix squares
        square=True
)

In [None]:
# Generate Mask
mask = np.triu(np.ones_like(pchange_df.corr(), dtype=bool))
np.fill_diagonal(mask, False)  # keeps the diagonal
# Create the heatmap with the same syntax but add a "mask" argument
sns.set(style="whitegrid", rc={'figure.figsize':(12,8)})   # set_style() ignores figure.figsize, so use set() as before
sns.heatmap(
        data=pchange_df.corr(),  # our correlation matrix
        linewidths=0.3,          # the width of lines separating the matrix squares
        square=True,
        mask=mask
)

In [None]:
sns.kdeplot(data=pchange_df)