# ZEVA Data Storytelling Project – Q3 2025  

**Data Source:**  
All data was collected from the ZEVA backend for **Q3 2025 (July, August, September)**.  

---

## Objective  
The primary objective of this project is to deliver **Phase 1: Driving-Related Performance Insights**. This phase focuses on:  

- **Daily Total Mileage per Vehicle**  
  Measuring how far each vehicle travels on a daily basis.  

- **Travel Patterns per User**  
  Analyzing trip lengths, travel times, and trip frequency for each user.  

These insights will support **internal decision-making** and can also be leveraged for **external communication and marketing initiatives**.  
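
The two Phase 1 metrics listed above can be sketched with a pair of pandas aggregations. This is only an illustration: the `trips` table and its column names (`vehicle_id`, `user_id`, `start_time`, `distance_km`) are assumptions made for the example, not the actual ZEVA backend schema.

```python
import pandas as pd

# Hypothetical trip records; column names are assumptions, not the ZEVA schema.
trips = pd.DataFrame({
    "vehicle_id": ["V1", "V1", "V2", "V1"],
    "user_id": [10, 10, 11, 10],
    "start_time": pd.to_datetime([
        "2025-07-01 08:00", "2025-07-01 17:30",
        "2025-07-01 09:15", "2025-07-02 08:05",
    ]),
    "distance_km": [12.5, 7.5, 30.0, 11.0],
})

# Daily total mileage per vehicle: sum trip distances per vehicle and calendar day.
daily_mileage = (
    trips.assign(trip_date=trips["start_time"].dt.date)
    .groupby(["vehicle_id", "trip_date"], as_index=False)["distance_km"]
    .sum()
    .rename(columns={"distance_km": "daily_km"})
)

# Travel patterns per user: trip frequency, average trip length, typical start hour.
user_patterns = trips.groupby("user_id").agg(
    trip_count=("distance_km", "size"),
    mean_trip_km=("distance_km", "mean"),
    median_start_hour=("start_time", lambda s: s.dt.hour.median()),
)

print(daily_mileage)
print(user_patterns)
```

Grouping on the calendar date of `start_time` assigns a trip that crosses midnight entirely to its start day; whether that matches the project's definition of a "daily" total is an open assumption.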

---

## Phase 1 – Methodology  

### 1. Data Fetching  
- Ran the script `pull_vehicle_data.py` against the ZEVA backend.
- Extracted raw trip and vehicle telemetry data for Q3 2025.

### 2. Data Cleaning  
- Linked the **auth** tables to their corresponding **client** tables.
- Mapped each vehicle to its client and verified the mapping.
- Standardized timestamps, vehicle IDs, and user mappings for consistency across datasets.  
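
As a rough sketch of the cleaning steps above, the snippet below normalizes vehicle IDs, converts timestamps to UTC, and checks the vehicle-to-client mapping with a validated merge. The table names, column names, and sample values are hypothetical and only stand in for the real extracts.

```python
import pandas as pd

# Hypothetical raw extracts; names and values are assumptions for illustration.
raw_trips = pd.DataFrame({
    "vehicle_id": [" v1 ", "V2", "v3"],
    "start_time": [
        "2025-07-01T08:00:00-07:00",
        "2025-07-01T11:00:00-04:00",
        "2025-07-01T15:00:00+00:00",
    ],
})
vehicle_clients = pd.DataFrame({
    "vehicle_id": ["V1", "V2"],
    "client_id": [100, 200],
})

clean = raw_trips.copy()

# Standardize vehicle IDs: trim whitespace and use one letter case.
clean["vehicle_id"] = clean["vehicle_id"].str.strip().str.upper()

# Standardize timestamps: parse mixed UTC offsets into a single UTC-aware column.
clean["start_time"] = pd.to_datetime(clean["start_time"], utc=True)

# Map vehicles to clients; `validate` raises if a vehicle maps to several clients,
# and `indicator` flags vehicles that have no client at all.
mapped = clean.merge(
    vehicle_clients,
    on="vehicle_id",
    how="left",
    validate="many_to_one",
    indicator=True,
)
unmapped = mapped.loc[mapped["_merge"] == "left_only", "vehicle_id"].tolist()
print("Vehicles without a client:", unmapped)
```

Keeping unmapped vehicles in the result rather than dropping them with an inner join makes mapping gaps visible before any aggregation.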

### 3. Data Structuring  
- Built curated datasets for analysis, including:  
  - **Vehicle Distribution:** Distribution of vehicle models on the platform, used for assessing data quality.  
  - **Mileage Dataset:** Daily aggregated mileage per vehicle.  
  - **User Travel Dataset:** Trip patterns per user, including distribution of trip lengths and start/end times.  
  - **Driving Behavior Dataset:** Derived metrics capturing driving styles, such as early-morning vs. late-night driving tendencies.  

These datasets form the foundation for visualization, storytelling, and newsletter-ready insights.  
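
One possible way to derive the early-morning vs. late-night tendency for the Driving Behavior Dataset is to bucket each trip by its start hour and compute each user's share of trips per bucket. The hour thresholds, the `time_of_day_bucket` helper, and the column names below are assumptions chosen for the example, not definitions taken from the project.

```python
import pandas as pd


def time_of_day_bucket(hour: int) -> str:
    """Label a start hour; the thresholds are illustrative assumptions."""
    if 4 <= hour < 7:
        return "early_morning"
    if hour >= 22 or hour < 4:
        return "late_night"
    return "daytime"


# Hypothetical trips; start times are assumed to already be in local time.
trips = pd.DataFrame({
    "user_id": [10, 10, 10, 11, 11],
    "start_time": pd.to_datetime([
        "2025-08-04 05:30", "2025-08-05 06:10", "2025-08-05 12:00",
        "2025-08-04 23:15", "2025-08-06 01:40",
    ]),
})

trips["bucket"] = trips["start_time"].dt.hour.map(time_of_day_bucket)

# Share of each user's trips that falls into each time-of-day bucket.
behavior = pd.crosstab(trips["user_id"], trips["bucket"], normalize="index")
print(behavior)
```

Shares per user, rather than raw counts, keep frequent and occasional drivers comparable.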


In [1]:
# --- Core Data Handling ---
import pandas as pd
import numpy as np

# --- Date & Time Handling ---
import datetime as dt
import pytz

# --- Visualization ---
import matplotlib.pyplot as plt
import seaborn as sns
import altair as alt

# --- Display settings ---
pd.set_option("display.max_columns", None)
pd.set_option("display.max_rows", 100)

# --- Altair settings ---
alt.data_transformers.disable_max_rows()  # allow larger datasets

# --- Matplotlib style ---
plt.style.use("seaborn-v0_8")


In [4]:
# All source data was downloaded beforehand with `pull_vehicle_data.py`.
user_company = pd.read_csv("user_company/user_company.csv", sep=";")
user_company

| Unnamed: 0 | userprofile_id | user_firstname | user_lastname | user_country | user_gender | postal_code | company_id | company_db | company_type | company_city | company_country |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2 | Raymond | Reid | Canada | Male | M4V 3E1 | 1 | zerocar | FM | Vancouver | Canada |
| 1 | 14950 | Yao | Li | | | | 13894 | individuals | IN | | |
| 2 | 2692 | Reham | Laythi | United States | Female | 92618 | 1930 | individuals | IN | | |
| 3 | 6667 | Stephen | Bouchet | United States | Male | 11706 | 5776 | individuals | IN | | |
| 4 | 7 | Chen | jia | Canada | Male | 100000 | 1 | zerocar | FM | Vancouver | Canada |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 1124 | 27490 | Matt | Jennings | | | | 26467 | individuals | IN | | |
| 1125 | 27523 | KEYU | SUN | | | | 26500 | individuals | IN | | |
| 1126 | 27556 | Lloyd | Sims | United States | | | 26533 | individuals | IN | | |
| 1127 | 27589 | Maxime | Laberge | Canada | | | 26566 | individuals | IN | | |


In [None]:

# Keep only rows whose status is not 'TERMINATED'
# (assumes `company_full_list` was loaded in an earlier cell)
filtered_df = company_full_list[company_full_list['status'] != 'TERMINATED']

# Define color scheme
color_scheme = alt.Scale(domain=['FM', 'IN'], range=['#143987', '#FF653E'])

# Base chart: number of users per 'data_type'
base = alt.Chart(filtered_df).encode(
    x=alt.X('data_type', title='User Type'),
    y=alt.Y('count()', title='# of Users')
)

# Bars, colored by user type
bars = base.mark_bar().encode(
    color=alt.Color('data_type', scale=color_scheme, title='User Type')
)

# Add text labels
text = base.mark_text(
    align='center',
    baseline='middle',
    dy=-10,  # Adjust this value to position the text labels
    font='Avenir',
    fontSize=10
).encode(
    text='count()'
)

# Combine the bar chart and text labels
chart = (bars + text).properties(
    # title=alt.TitleParams(text='Distribution of Fleet Managers and Individual Users (Status Active)', font='Avenir', fontSize=16, anchor='middle', color='black'),
)

# Display the chart
chart.show()

# Save the chart as a high-resolution PNG (scale_factor=2 doubles the pixel dimensions)
chart.save('Result_Graph/user_type_distribution.png', scale_factor=2)


In [6]:
# Load the user profiles and inspect the schema before counting valid users
user_profile = pd.read_csv("user_company/user_profile.csv", sep = ';')
user_profile.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1306 entries, 0 to 1305
Data columns (total 31 columns):
 #   Column                         Non-Null Count  Dtype  
---  ------                         --------------  -----  
 0   id                             1306 non-null   int64  
 1   password                       256 non-null    object 
 2   is_superuser                   1306 non-null   bool   
 3   first_name                     1112 non-null   object 
 4   last_name                      1111 non-null   object 
 5   is_staff                       1306 non-null   bool   
 6   is_active                      1306 non-null   bool   
 7   date_joined                    1306 non-null   object 
 8   email                          1306 non-null   object 
 9   last_login                     149 non-null    object 
 10  phone_number                   47 non-null     object 
 11  department_id                  0 non-null      float64
 12  email_verified                 1306 non-null   b