# Year-to-Date (YTD) Order Data Processing

This notebook processes monthly order data into a consolidated Year-to-Date (YTD) dataset. The steps are data loading, transformation, and writing out the result, which can then feed further analysis or model training. Throughout the process, we use the Kensu Provider to monitor and track data lineage, keeping the data operations transparent.

## Kensu Provider Initialization

Kensu provides a way to monitor and track data lineage, ensuring that data processes are transparent and traceable. Initializing Kensu is the first step in this process. Once initialized, it will automatically track operations performed using its integrated libraries.

In [None]:
# Initialize the KensuProvider to start tracking data operations
from kensu.utils.kensu_provider import KensuProvider
K = KensuProvider().initKensu()

## Data Loading

This section loads the order data for January and February. It uses `kensu.pandas`, a tracked version of pandas imported under the usual `pd` alias, so that each data operation is traceable. The `date` column is parsed as a datetime on load.

In [None]:
# Load order data for January and February
import kensu.pandas as pd
df_jan = pd.read_csv("../data/jan/orders.csv", parse_dates=['date'])
df_feb = pd.read_csv("../data/feb/orders.csv", parse_dates=['date'])
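
Before combining monthly files, it can help to compare their column names. The sketch below is only an illustration under stated assumptions: it uses plain `pandas` rather than `kensu.pandas`, and the in-memory sample rows and the `total` column are hypothetical stand-ins for the real files. The `email` / `email_customer` difference mirrors the rename applied later in this notebook.

```python
import io

import pandas as pd  # plain pandas for illustration; the notebook itself uses kensu.pandas

# Hypothetical sample rows standing in for the January and February files
jan_csv = io.StringIO("date,email,total\n2021-01-05,a@example.com,10.5\n")
feb_csv = io.StringIO("date,email_customer,total\n2021-02-03,b@example.com,7.0\n")

jan = pd.read_csv(jan_csv, parse_dates=["date"])
feb = pd.read_csv(feb_csv, parse_dates=["date"])

# Columns present in one month but missing from the other
only_in_jan = sorted(set(jan.columns) - set(feb.columns))
only_in_feb = sorted(set(feb.columns) - set(jan.columns))

print("Only in January:", only_in_jan)
print("Only in February:", only_in_feb)
```

Any name that appears in only one of the two lists is a candidate for renaming before concatenation.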

## Data Transformation

Once the data is loaded, it may need transforming into the desired format. Here, the February column `email_customer` is renamed to `email` so that both months share the same column names, and the monthly datasets are then concatenated to produce the Year-to-Date (YTD) data. The consolidated data covers all orders from the start of the year through the latest loaded month and is written to a CSV file.

In [None]:
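# Rename the February email column so it matches the January column name
df_feb = df_feb.rename({'email_customer': 'email'}, axis=1)
# Concatenate both months to get the YTD data
data_YTD = pd.concat([df_jan, df_feb])
# Write the consolidated YTD dataset to CSV, without the index column
data_YTD.to_csv("../data/ytd/ordersYTD.csv", index=False)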
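
To see why the rename matters, the following minimal sketch compares concatenation with and without aligned column names. It assumes plain `pandas` rather than `kensu.pandas`, and the two one-row frames are hypothetical sample data, not the actual order files.

```python
import pandas as pd  # plain pandas for illustration; the notebook itself uses kensu.pandas

# Hypothetical one-row frames standing in for the monthly order data
jan = pd.DataFrame({"date": pd.to_datetime(["2021-01-05"]), "email": ["a@example.com"]})
feb = pd.DataFrame({"date": pd.to_datetime(["2021-02-03"]), "email_customer": ["b@example.com"]})

# Without the rename, concat keeps both column names and fills the gaps with NaN
unaligned = pd.concat([jan, feb], ignore_index=True)

# With the rename, the email values from both months land in a single column
aligned = pd.concat(
    [jan, feb.rename(columns={"email_customer": "email"})],
    ignore_index=True,
)

print(list(unaligned.columns))
print(int(unaligned["email"].isna().sum()), "missing email value(s) without the rename")
print(list(aligned.columns))
```

Without the rename the result carries two partially filled email columns; with it, every row has its email in the same place.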