#### Exploratory Data Analysis (EDA) - Initial Descriptive Statistics

Dataset: 
- _fs_feature.csv_

Author: Luis Sergio Pastrana Lemus  
Date: 2025-09-09

# Exploratory Data Analysis – Food Supplier Dataset

## __1. Libraries__.

In [1]:
from pathlib import Path
import sys

# Define the project root dynamically: take the notebook's current working directory and move one level up
project_root = Path.cwd().parent

# Add the project root to sys.path if it is not already there, so the src package can be imported
if str(project_root) not in sys.path:
    sys.path.append(str(project_root))

# Import the project helpers from src (wildcard import; load_dataset_from_csv and format_notebook are used below)
from src import *


from IPython.display import display, HTML
import os
import pandas as pd
import numpy as np

## __2. Path to Data file__.

In [2]:
# Build the path to the data directory and load the dataset
data_file_path = project_root / "data" / "processed" / "feature"

df_fs = load_dataset_from_csv(data_file_path, "fs_feature.csv", header='infer', parse_dates=['datetime'])
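
The helper `load_dataset_from_csv` lives in the project's `src` package and is not shown in this notebook. As a rough illustration only, a loader with the same call shape could look like the sketch below. The name `load_csv_sketch`, its body, and the sample rows are assumptions made for this example, not the project's actual implementation.

```python
from pathlib import Path
import tempfile

import pandas as pd


def load_csv_sketch(data_dir, file_name, header="infer", parse_dates=None):
    """Hypothetical stand-in for the project's load_dataset_from_csv helper."""
    csv_path = Path(data_dir) / file_name
    if not csv_path.is_file():
        raise FileNotFoundError(f"CSV file not found: {csv_path}")
    # header and parse_dates are forwarded unchanged to pandas.read_csv
    return pd.read_csv(csv_path, header=header, parse_dates=parse_dates)


# Usage on a throwaway file with made-up rows (not the real dataset)
with tempfile.TemporaryDirectory() as tmp_dir:
    sample = "eventname,datetime\nevent_a,2019-08-01 12:29:55+00:00\n"
    (Path(tmp_dir) / "sample.csv").write_text(sample)
    df_sample = load_csv_sketch(tmp_dir, "sample.csv", parse_dates=["datetime"])

print(df_sample.dtypes)
```

Checking that the file exists before reading gives a clearer error message when the relative path from `project_root` is wrong.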


In [3]:
# Format notebook output
format_notebook()

## __3. Exploratory Data Analysis__.

### 3.0 Casting Data types.

In [4]:
# Casting dtypes
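# df_fs 'eventname' to category
# Note: on recent pandas versions, assigning through .loc[:, col] writes into the existing column,
# so the dtype stays object (as df_fs.info() reports below); df_fs['eventname'] = ... would replace the column instead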
df_fs.loc[:, 'eventname'] = df_fs['eventname'].astype('category')
df_fs['eventname'].dtype

# df_fs 'date' and 'time' to Python date and time objects (both columns remain object dtype)
df_fs['date'] = pd.to_datetime(df_fs['date']).dt.date
df_fs['time'] = pd.to_datetime(df_fs['time'], format='%H:%M:%S').dt.time

In [5]:
df_fs.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 243713 entries, 0 to 243712
Data columns (total 6 columns):
 #   Column        Non-Null Count   Dtype              
---  ------        --------------   -----              
 0   eventname     243713 non-null  object             
 1   deviceidhash  243713 non-null  int64              
 2   datetime      243713 non-null  datetime64[ns, UTC]
 3   expid         243713 non-null  int64              
 4   date          243713 non-null  object             
 5   time          243713 non-null  object             
dtypes: datetime64[ns, UTC](1), int64(2), object(3)
memory usage: 11.2+ MB
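
The `info()` output above still lists `eventname` as `object` even though the cell casts it to `category`. A minimal sketch of the likely cause, on a toy frame with made-up rows: on recent pandas versions, assignment through `.loc[:, col]` fills the existing column in place, while direct column assignment replaces it. The frame `df_demo` and its values are assumptions for illustration, and the `.loc` result can differ on older pandas versions.

```python
import pandas as pd

# Toy frame with made-up rows; only the column name mirrors the dataset
df_demo = pd.DataFrame({"eventname": ["event_a", "event_b", "event_a"]})

# Writing through .loc[:, col] targets the existing column,
# so the original dtype may be kept (behavior depends on the pandas version)
df_loc = df_demo.copy()
df_loc.loc[:, "eventname"] = df_loc["eventname"].astype("category")
print(df_loc["eventname"].dtype)

# Assigning the column directly replaces it, so the category dtype is kept
df_direct = df_demo.copy()
df_direct["eventname"] = df_direct["eventname"].astype("category")
print(df_direct["eventname"].dtype)
```

If the category dtype is wanted downstream (for example, to reduce memory use), direct assignment is the safer form.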


### 3.1 Descriptive Statistics.

#### 3.1.1 Descriptive statistics for the original dataset.

In [6]:
# Descriptive statistics for the df_fs dataset
df_fs.describe(include='all')

| | eventname | deviceidhash | datetime | expid | date | time |
|---|---|---|---|---|---|---|
| count | 243713 | 243713.0 | 243713 | 243713.0 | 243713 | 243713 |
| unique | 5 | | | | 14 | 69824 |
| top | mainscreenappear | | | | 2019-08-01 | 12:29:55 |
| freq | 119101 | | | | 36141 | 19 |
| mean | | 4.627963e+18 | 2019-08-04 10:19:17.987665920+00:00 | 247.022161 | | |
| min | | 6888747000000000.0 | 2019-07-25 04:43:36+00:00 | 246.0 | | |
| 25% | | 2.372212e+18 | 2019-08-02 14:36:45+00:00 | 246.0 | | |
| 50% | | 4.623192e+18 | 2019-08-04 11:51:00+00:00 | 247.0 | | |
| 75% | | 6.932517e+18 | 2019-08-06 06:56:24+00:00 | 248.0 | | |
| max | | 9.222603e+18 | 2019-08-07 21:15:17+00:00 | 248.0 | | |
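
In the summary above, `mainscreenappear` accounts for 119101 of 243713 rows, roughly 48.9% of all events. The mean and quartiles reported for `deviceidhash` and `expid` are harder to interpret if, as their names suggest, these columns are identifiers rather than quantities. A minimal sketch, assuming they are identifiers, of summarizing them as labels and computing event shares; the frame `df_demo` and its values are made up for illustration.

```python
import pandas as pd

# Made-up rows; column names mirror the dataset, values do not
df_demo = pd.DataFrame(
    {
        "eventname": ["event_a", "event_a", "event_b", "event_a"],
        "deviceidhash": [101, 102, 101, 103],
        "expid": [246, 247, 246, 248],
    }
)

# Treat identifier columns as labels so describe() reports count/unique/top/freq
id_summary = df_demo[["deviceidhash", "expid"]].astype("category").describe()
print(id_summary)

# Relative frequency of each event name
event_share = df_demo["eventname"].value_counts(normalize=True)
print(event_share)
```

Applied to `df_fs`, the same two calls would give the number of distinct devices and experiment groups and the share of each event type.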
