### 2.1 Load and Explore Time Series Data

- Load the time series data ('daily-total-female-births.csv')
- Explore the data: first 10 observations, size
- Query the data by time

In [4]:
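import pandas as pd

# Load the data and parse the 'Date' column as datetime
data = pd.read_csv("daily-total-female-births.csv", parse_dates=["Date"])

# First 10 observations
print(data.head(10))

# Number of rows and columns
print("\nSize of the dataset:", data.shape)

# Use the dates as the index so the data can be queried by time
data.set_index("Date", inplace=True)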

        Date  Births
0 1959-01-01      35
1 1959-01-02      32
2 1959-01-03      30
3 1959-01-04      31
4 1959-01-05      44
5 1959-01-06      29
6 1959-01-07      45
7 1959-01-08      43
8 1959-01-09      38
9 1959-01-10      27

Size of the dataset: (365, 2)
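
The last task in the list above, querying the data by time, is not shown in the cell. The sketch below assumes the Date-indexed frame built above; to keep it runnable without the CSV file, the ten observations printed above are typed in directly. With a DatetimeIndex, `.loc` accepts timestamps, inclusive date-string ranges, and partial date strings (e.g. a whole month), and the index can be used in boolean filters.

```python
import pandas as pd

# The first ten observations printed above, entered inline so the example
# runs without 'daily-total-female-births.csv' (a stand-in for `data`)
data = pd.DataFrame(
    {"Births": [35, 32, 30, 31, 44, 29, 45, 43, 38, 27]},
    index=pd.date_range("1959-01-01", periods=10, freq="D", name="Date"),
)

# A single day: returns that row as a Series
one_day = data.loc[pd.Timestamp("1959-01-05")]

# An inclusive date range: both endpoints are included
first_week = data.loc["1959-01-01":"1959-01-07"]

# Partial string indexing: every observation in January 1959
january = data.loc["1959-01"]

# Boolean filtering on the index: observations after 1959-01-08
after_jan_8 = data[data.index > pd.Timestamp("1959-01-08")]

print(one_day["Births"])
print(first_week)
print(len(january), after_jan_8["Births"].tolist())
```

On the full dataset the same calls work unchanged, e.g. `data.loc["1959-03"]` for all of March.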


### 2.2 Date-Time Features
- Date Time Features: these are components of the time step itself for each observation.
- Lag Features: these are values at prior time steps.
- Window Features: these are a summary of values over a fixed window of prior time steps.

- Use the dataset: 'daily-minimum-temperatures.csv'

- Practice with datetime features and the rolling function using the 'apple.csv' dataset


Exercise from https://www.geeksforgeeks.org/python-pandas-dataframe-rolling/

The pandas dataframe.rolling() function provides rolling window calculations. Rolling window calculations are used primarily in signal processing and time-series analysis. Put simply, we take a window of k consecutive values at a time, apply some mathematical operation to it (such as a mean or a sum), and then slide the window forward by one observation. In the simplest case, all k values in the window are weighted equally.
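
To make the window idea concrete, here is a small sketch on a made-up series of five values (illustrative numbers, not from any dataset in this notebook). With window=3, each result is computed from the current value and the two before it, so the first two positions have no full window and come out as NaN; the min_periods parameter relaxes that requirement.

```python
import pandas as pd

# Illustrative values only
s = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])

# Window of k=3 consecutive values, equally weighted: a simple moving mean
mean_3 = s.rolling(window=3).mean()

# Any aggregation can be applied to each window, e.g. a moving sum
sum_3 = s.rolling(window=3).sum()

# min_periods=1 lets the first positions use however many values are available
mean_3_partial = s.rolling(window=3, min_periods=1).mean()

print(pd.DataFrame({"value": s, "mean_3": mean_3,
                    "sum_3": sum_3, "mean_3_partial": mean_3_partial}))
```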

In [6]:
import pandas as pd

# Load the data and convert the 'Date' column to datetime
temperature_data = pd.read_csv("daily-minimum-temperatures.csv", parse_dates=["Date"])

# Extract year, month, and day features from the 'Date' column
temperature_data["Year"] = temperature_data["Date"].dt.year
temperature_data["Month"] = temperature_data["Date"].dt.month
temperature_data["Day"] = temperature_data["Date"].dt.day

print("Date-Time Features:")
print(temperature_data.head())

# Lag feature (the previous day's temperature)
temperature_data["Temp_Lag_1"] = temperature_data["Temp"].shift(1)

# Window feature (mean temperature over the last 3 days, including the current day)
temperature_data["Temp_Rolling_Mean_3"] = temperature_data["Temp"].rolling(window=3).mean()

print("\nWith Lag and Rolling Features:")
print(temperature_data.head(10))


Date-Time Features:
        Date  Temp  Year  Month  Day
0 1981-01-01  20.7  1981      1    1
1 1981-01-02  17.9  1981      1    2
2 1981-01-03  18.8  1981      1    3
3 1981-01-04  14.6  1981      1    4
4 1981-01-05  15.8  1981      1    5

With Lag and Rolling Features:
        Date  Temp  Year  Month  Day  Temp_Lag_1  Temp_Rolling_Mean_3
0 1981-01-01  20.7  1981      1    1         NaN                  NaN
1 1981-01-02  17.9  1981      1    2        20.7                  NaN
2 1981-01-03  18.8  1981      1    3        17.9            19.133333
3 1981-01-04  14.6  1981      1    4        18.8            17.100000
4 1981-01-05  15.8  1981      1    5        14.6            16.400000
5 1981-01-06  15.8  1981      1    6        15.8            15.400000
6 1981-01-07  15.8  1981      1    7        15.8            15.800000
7 1981-01-08  17.4  1981      1    8        15.8            16.333333
8 1981-01-09  21.8  1981      1    9        17.4            18.333333
9 1981-01-10  20.0  1981      1   10        21.8            19.733333
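
The Temp_Rolling_Mean_3 column above averages the current day together with the two days before it (for 1981-01-03, (20.7 + 17.9 + 18.8) / 3 ≈ 19.133333). If a window feature should summarize only prior time steps, for example when it will be used to predict the current day's temperature, one option is to shift the series before rolling. This sketch uses the first five temperatures printed above, entered inline so it runs without the CSV; the column names Temp_Lag_2 and Temp_Prior_Mean_3 are illustrative.

```python
import pandas as pd

# First five daily minimum temperatures from the output above (stand-in for the CSV)
temps = pd.DataFrame(
    {"Temp": [20.7, 17.9, 18.8, 14.6, 15.8]},
    index=pd.date_range("1981-01-01", periods=5, freq="D", name="Date"),
)

# Lag features: the values 1 and 2 days earlier
temps["Temp_Lag_1"] = temps["Temp"].shift(1)
temps["Temp_Lag_2"] = temps["Temp"].shift(2)

# Window feature over the 3 previous days only: shift first, then roll,
# so the current day's value never enters its own feature
temps["Temp_Prior_Mean_3"] = temps["Temp"].shift(1).rolling(window=3).mean()

print(temps)
```

The cost is one more leading NaN: the first full prior window is available on the fourth day.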

### 2.3 Resampling and Interpolation
- Upsampling: Where you increase the frequency of the samples, such as from minutes to seconds.
- Downsampling: Where you decrease the frequency of the samples, such as from days to months.
- Use the dataset: 'shampoo-sales.csv'
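
Both directions can be seen on a tiny monthly series. This sketch uses the first six monthly sales figures from 'shampoo-sales.csv' (266.0, 145.9, 183.1, 119.3, 180.3, 168.5), entered inline and dated from January 1901 as an assumption for illustration. Upsampling to daily uses forward fill here (each new day repeats the latest monthly value) as a contrast to linear interpolation; downsampling to quarter starts ("QS") sums three months per bin.

```python
import pandas as pd

# First six monthly sales values, entered inline;
# dating them from January 1901 is an assumption for illustration
monthly = pd.Series(
    [266.0, 145.9, 183.1, 119.3, 180.3, 168.5],
    index=pd.date_range("1901-01-01", periods=6, freq="MS", name="Month"),
    name="Sales",
)

# Upsampling (monthly -> daily): each new day repeats the last known monthly value
daily_ffill = monthly.resample("D").ffill()

# Downsampling (monthly -> quarterly): sum the months in each quarter,
# labelling each bin with the quarter's start date
quarterly = monthly.resample("QS").sum()

print(daily_ffill.head())
print(quarterly)
```

Forward fill produces a step function, while linear interpolation draws a straight line between consecutive months; which is more appropriate depends on whether a monthly value describes a whole month or a single point in time.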

In [None]:
import pandas as pd

# Load the dataset. The 'Month' column holds values such as "1-01",
# meaning year 1, month 01, so prefix "190" to read it as "1901-01"
shampoo_data = pd.read_csv("shampoo-sales.csv")
shampoo_data["Month"] = pd.to_datetime("190" + shampoo_data["Month"], format="%Y-%m")
shampoo_data.set_index("Month", inplace=True)

# Display the first few rows of the original (monthly) data
print("Original Data:")
print(shampoo_data.head())

# Upsampling: resample the monthly data to daily frequency;
# the new days between month starts are filled with NaN
daily_data = shampoo_data.resample("D").asfreq()
print("\nDaily Data (After Upsampling):")
print(daily_data.head(10))

# Interpolation: fill the missing values after upsampling using linear interpolation
daily_data_interpolated = daily_data.interpolate(method="linear")
print("\nDaily Data with Interpolation:")
print(daily_data_interpolated.head(10))

# Downsampling: resample the monthly data to yearly frequency and sum the sales
# ("YS" labels each bin with the start of the year)
yearly_data = shampoo_data.resample("YS").sum()
print("\nYearly Total Sales Data (After Downsampling):")
print(yearly_data.head())


Original Data:
            Sales
Month            
1901-01-01  266.0
1901-02-01  145.9
1901-03-01  183.1
1901-04-01  119.3
1901-05-01  180.3

Daily Data (After Upsampling):
            Sales
Month            
1901-01-01  266.0
1901-01-02    NaN
1901-01-03    NaN
1901-01-04    NaN
1901-01-05    NaN
1901-01-06    NaN
1901-01-07    NaN
1901-01-08    NaN
1901-01-09    NaN
1901-01-10    NaN

Daily Data with Interpolation:
                 Sales
Month                 
1901-01-01  266.000000
1901-01-02  262.125806
1901-01-03  258.251613
1901-01-04  254.377419
1901-01-05  250.503226
1901-01-06  246.629032
1901-01-07  242.754839
1901-01-08  238.880645
1901-01-09  235.006452
1901-01-10  231.132258

Yearly Total Sales Data (After Downsampling):
             Sales
Month             
1901-01-01  2357.5
1902-01-01  3153.5
1903-01-01  5742.6




### 2.4 Moving Average Smoothing
- Centered Moving Average
- Trailing Moving Average
- Use the dataset: 'daily-total-female-births.csv'
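
For an odd window width k, the two variants differ only in which observations the window covers. Writing y_t for the number of births on day t, the trailing average uses the current day and the k − 1 days before it, while the centered average uses (k − 1)/2 days on each side. With k = 7, both can be checked against the first non-missing values in the output of the cell below:

$$
\mathrm{MA}^{\mathrm{trail}}_t = \frac{1}{k}\sum_{i=0}^{k-1} y_{t-i},
\qquad
\mathrm{MA}^{\mathrm{cent}}_t = \frac{1}{k}\sum_{i=-(k-1)/2}^{(k-1)/2} y_{t+i}
$$

$$
k = 7:\quad \mathrm{MA}^{\mathrm{cent}}_{\text{1959-01-04}} = \mathrm{MA}^{\mathrm{trail}}_{\text{1959-01-07}} = \frac{35+32+30+31+44+29+45}{7} = \frac{246}{7} \approx 35.142857
$$

So the centered average at a date equals the trailing average (k − 1)/2 = 3 days later. The centered version needs future values (its last three entries are NaN), which suits smoothing an observed series; the trailing version uses only past values, which is what a forecasting setting can use.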


In [12]:
import pandas as pd

# Load the dataset and parse 'Date' column as date
birth_data = pd.read_csv("daily-total-female-births.csv", parse_dates=["Date"])

# Set 'Date' as the index
birth_data.set_index("Date", inplace=True)

# Display original data
print("Original Data:")
print(birth_data.head())

# Centered Moving Average: 7-day window centered on each point
birth_data["Centered_MA"] = birth_data["Births"].rolling(window=7, center=True).mean()
print("\nCentered Moving Average (7-day window):")
print(birth_data[["Births", "Centered_MA"]].head(10))

# Trailing Moving Average: 7-day trailing window
birth_data["Trailing_MA"] = birth_data["Births"].rolling(window=7).mean()
print("\nTrailing Moving Average (7-day window):")
print(birth_data[["Births", "Trailing_MA"]].head(10))


Original Data:
            Births
Date              
1959-01-01      35
1959-01-02      32
1959-01-03      30
1959-01-04      31
1959-01-05      44

Centered Moving Average (7-day window):
            Births  Centered_MA
Date                           
1959-01-01      35          NaN
1959-01-02      32          NaN
1959-01-03      30          NaN
1959-01-04      31    35.142857
1959-01-05      44    36.285714
1959-01-06      29    37.142857
1959-01-07      45    36.714286
1959-01-08      43    37.714286
1959-01-09      38    36.142857
1959-01-10      27    39.857143

Trailing Moving Average (7-day window):
            Births  Trailing_MA
Date                           
1959-01-01      35          NaN
1959-01-02      32          NaN
1959-01-03      30          NaN
1959-01-04      31          NaN
1959-01-05      44          NaN
1959-01-06      29          NaN
1959-01-07      45    35.142857
1959-01-08      43    36.285714
1959-01-09      38    37.142857
1959-01-10      27    36.714286
