# **Waze Project**
**Course 2 - Get Started with Python**

Welcome to the Waze Project!

Your Waze data analytics team is still in the early stages of their user churn project. Previously, you were asked to complete a project proposal by your supervisor, May Santner. You have received notice that your project proposal has been approved and that your team has been given access to Waze's user data. To get clear insights, the user data must be inspected and prepared for the upcoming process of exploratory data analysis (EDA).

A Python notebook has been prepared to guide you through this project. Answer the questions and create an executive summary for the Waze data team.

# **Course 2 End-of-course project: Inspect and analyze data**

In this activity, you will examine the data provided and prepare it for analysis. This activity will help ensure the information is:

1.   Ready to answer questions and yield insights

2.   Ready for visualizations

3.   Ready for future hypothesis testing and statistical methods
<br/>

**The purpose** of this project is to investigate and understand the data provided.

**The goal** is to use a dataframe constructed within Python, perform a cursory inspection of the provided dataset, and inform team members of your findings.
<br/>

*This activity has three parts:*

**Part 1:** Understand the situation
* How can you best prepare to understand and organize the provided information?

**Part 2:** Understand the data

* Create a pandas dataframe for data learning, future exploratory data analysis (EDA), and statistical activities

* Compile summary information about the data to inform next steps

**Part 3:** Understand the variables

* Use insights from your examination of the summary data to guide deeper investigation into variables


<br/>

Follow the instructions and answer the following questions to complete the activity. Then, you will complete an Executive Summary using the questions listed on the PACE Strategy Document.

Be sure to complete this activity before moving on. The next course item will provide you with a completed exemplar to compare to your own work.



# **Identify data types and compile summary information**


<img src="images/Pace.png" width="100" height="100" align=left>

# **PACE stages**

Throughout these project notebooks, you'll see references to the problem-solving framework, PACE. The following notebook components are labeled with the respective PACE stages: Plan, Analyze, Construct, and Execute.

<img src="images/Plan.png" width="100" height="100" align=left>


## **PACE: Plan**

Consider the questions in your PACE Strategy Document and those below to craft your response:

### **Task 1. Understand the situation**

*   How can you best prepare to understand and organize the provided driver data?


*Begin by exploring your dataset and consider reviewing the Data Dictionary.*

Review the Data Dictionary: This will help in understanding the meaning of each column, the data types, and the context of the data provided.

Initial Data Exploration: Load the dataset into a pandas dataframe to inspect the first few rows and get an overview of the structure.

Identify Data Types: Check the data types of each column to confirm they are appropriate for the type of data they contain. This will help in identifying any data type conversions needed.

Check for Missing Values: Identify if there are any missing or null values that need to be addressed.

Summarize Basic Statistics: Use descriptive statistics (mean, median, min, max, standard deviation) to understand data distribution and identify potential anomalies.
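The preparation steps above can be sketched on a small stand-in dataframe. The column names mirror the Waze data, but the values below are invented for illustration and are not taken from `waze_dataset.csv`.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the Waze data (values invented for illustration)
sample = pd.DataFrame({
    "label": ["retained", "churned", np.nan, "retained"],
    "drives": [226, 107, 95, 40],
    "driven_km_drives": [2628.8, 13715.9, 3059.1, 913.6],
    "device": ["Android", "iPhone", "Android", "iPhone"],
})

# Initial exploration: look at the first rows
first_rows = sample.head(3)

# Identify data types: one dtype per column
column_types = sample.dtypes

# Check for missing values: count nulls per column
missing_counts = sample.isna().sum()

# Summarize basic statistics for the numeric columns
numeric_summary = sample.describe()

print(first_rows, column_types, missing_counts, numeric_summary, sep="\n\n")
```

Running the same four calls on the real dataframe gives a quick first picture before any deeper analysis.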

<img src="images/Analyze.png" width="100" height="100" align=left>

## **PACE: Analyze**

Consider the questions in your PACE Strategy Document to reflect on the Analyze stage.

### **Task 2a. Imports and data loading**

Start by importing the packages that you will need to load and explore the dataset. Make sure to use the following import statements:

*   `import pandas as pd`

*   `import numpy as np`


In [1]:
# Import packages for data manipulation
import pandas as pd
import numpy as np


Then, load the dataset into a dataframe. Creating a dataframe will help you conduct data manipulation, exploratory data analysis (EDA), and statistical activities.

**Note:** As shown in this cell, the dataset has been automatically loaded in for you. You do not need to download the .csv file, or provide more code, in order to access the dataset and proceed with this lab. Please continue with this activity by completing the following instructions.

In [None]:
# Load dataset into dataframe
df = pd.read_csv('waze_dataset.csv')

### **Task 2b. Summary information**

View and inspect summary information about the dataframe by **coding the following:**

1.   df.head(10)
2.   df.info()

*Consider the following questions:*

1. When reviewing the `df.head()` output, are there any variables that have missing values?

2. When reviewing the `df.info()` output, what are the data types? How many rows and columns do you have?

3. Does the dataset have any missing values?

In [73]:
# Import packages for data manipulation
import pandas as pd
import numpy as np
# Load dataset into dataframe
df = pd.read_csv('waze_dataset.csv')
df.head(10)

Unnamed: 0,ID,label,sessions,drives,total_sessions,n_days_after_onboarding,total_navigations_fav1,total_navigations_fav2,driven_km_drives,duration_minutes_drives,activity_days,driving_days,device
0,0,retained,283,226,296.748273,2276,208,0,2628.845068,1985.775061,28,19,Android
1,1,retained,133,107,326.896596,1225,19,64,13715.92055,3160.472914,13,11,iPhone
2,2,retained,114,95,135.522926,2651,0,0,3059.148818,1610.735904,14,8,Android
3,3,retained,49,40,67.589221,15,322,7,913.591123,587.196542,7,3,iPhone
4,4,retained,84,68,168.24702,1562,166,5,3950.202008,1219.555924,27,18,Android
5,5,retained,113,103,279.544437,2637,0,0,901.238699,439.101397,15,11,iPhone
6,6,retained,3,2,236.725314,360,185,18,5249.172828,726.577205,28,23,iPhone
7,7,retained,39,35,176.072845,2999,0,0,7892.052468,2466.981741,22,20,iPhone
8,8,retained,57,46,183.532018,424,0,26,2651.709764,1594.342984,25,20,Android
9,9,churned,84,68,244.802115,2997,72,0,6043.460295,2341.838528,7,3,iPhone


In [3]:
### YOUR CODE HERE ###
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 14999 entries, 0 to 14998
Data columns (total 13 columns):
 #   Column                   Non-Null Count  Dtype  
---  ------                   --------------  -----  
 0   ID                       14999 non-null  int64  
 1   label                    14299 non-null  object 
 2   sessions                 14999 non-null  int64  
 3   drives                   14999 non-null  int64  
 4   total_sessions           14999 non-null  float64
 5   n_days_after_onboarding  14999 non-null  int64  
 6   total_navigations_fav1   14999 non-null  int64  
 7   total_navigations_fav2   14999 non-null  int64  
 8   driven_km_drives         14999 non-null  float64
 9   duration_minutes_drives  14999 non-null  float64
 10  activity_days            14999 non-null  int64  
 11  driving_days             14999 non-null  int64  
 12  device                   14999 non-null  object 
dtypes: float64(3), int64(8), object(2)
memory usage: 1.5+ MB


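ASSESSMENT:
The first 10 rows shown by `df.head(10)` contain no missing values. The `df.info()` output, however, shows that the `label` column has only 14,299 non-null entries out of 14,999, so 700 rows are missing a label; every other column is complete.
The `df.info()` output reports the data types float64 (3 columns), int64 (8 columns), and object (2 columns), across 14,999 rows and 13 columns.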
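A short sketch of why `head()` alone cannot answer the missing-values question: it only shows the first rows, so gaps further down go unnoticed. The 12-row frame below is hypothetical and built only to make that point.

```python
import numpy as np
import pandas as pd

# Hypothetical 12-row frame: the only missing label sits past the first 10 rows
labels = ["retained"] * 11 + [np.nan]
toy = pd.DataFrame({"ID": range(12), "label": labels})

# head(10) shows no missing labels...
missing_in_head = int(toy.head(10)["label"].isna().sum())

# ...but counting over the whole column finds one
missing_overall = int(toy["label"].isna().sum())

print(f"Missing in first 10 rows: {missing_in_head}")
print(f"Missing in full column:   {missing_overall}")
```

On the real dataframe, `df.isna().sum()` gives the same per-column count that can otherwise be read off the `Non-Null Count` column of `df.info()`.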

### **Task 2c. Null values and summary statistics**

Compare the summary statistics of the 700 rows that are missing labels with summary statistics of the rows that are not missing any values.

**Question:** Is there a discernible difference between the two populations?
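One possible way to put the two populations side by side is to group on whether `label` is missing. This is a minimal sketch on invented values; the helper name `label_status` is an assumption for illustration, not part of the lab.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature dataset (values invented for illustration)
toy = pd.DataFrame({
    "label": ["retained", np.nan, "churned", np.nan, "retained", "churned"],
    "sessions": [10, 20, 30, 40, 50, 60],
    "drives": [8, 16, 24, 32, 40, 48],
})

# Build a readable grouping key from the missing/non-missing flag
label_status = (
    toy["label"]
    .isna()
    .map({True: "label missing", False: "label present"})
    .rename("label_status")
)

# Summarize the numeric columns for each population in one table
comparison = toy.groupby(label_status)[["sessions", "drives"]].agg(["mean", "median"])

print(comparison)
```

A single table like this makes small differences between the groups easier to spot than two separate `describe()` printouts.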


In [47]:
# Import packages for data manipulation
import pandas as pd
import numpy as np

# Load dataset into dataframe
df = pd.read_csv('waze_dataset.csv')

# Isolate rows with null values
null_rows = df[df['label'].isnull()]

# Isolate rows with non-null values
non_null_rows = df[df['label'].notnull()]

# Display summary stats of rows with null values
#print(f'\nNull:\n {null_rows.describe()}')

# Display summary stats of rows with non-null values
#print(f'\nNon-null:\n {non_null_rows.describe()}')

# Use boolean masking to find outliers in null_rows (values more than 2 standard deviations from the mean)
# Only numeric columns are used, because mean and std are undefined for text columns
null_numeric = null_rows.select_dtypes(include='number')
null_rows_mean = null_numeric.mean()
null_rows_std = null_numeric.std()
null_rows_outliers = null_numeric[np.abs(null_numeric - null_rows_mean) > 2 * null_rows_std]

# Same outlier search for non_null_rows
non_null_numeric = non_null_rows.select_dtypes(include='number')
non_null_rows_mean = non_null_numeric.mean()
non_null_rows_std = non_null_numeric.std()
non_null_rows_outliers = non_null_numeric[np.abs(non_null_numeric - non_null_rows_mean) > 2 * non_null_rows_std]

# Compare the null and non-null groups (look for differences in means and outliers, then assess)
print(f'\nNull rows mean:\n{null_rows_mean}')
print(f'\nNon-null rows mean:\n{non_null_rows_mean}')
print(f'\nNull rows outliers:\n {null_rows_outliers.sum()}')
print(f'\nNon-null rows outliers:\n {non_null_rows_outliers.sum()}')



Null rows mean:
ID                         7405.584286
label                              NaN
sessions                     80.837143
drives                       67.798571
total_sessions              198.483348
n_days_after_onboarding    1709.295714
total_navigations_fav1      118.717143
total_navigations_fav2       30.371429
driven_km_drives           3935.967029
duration_minutes_drives    1795.123358
activity_days                15.382857
driving_days                 12.125714
dtype: float64

Non-null rows mean:
ID                         7503.573117
sessions                     80.623820
drives                       67.255822
total_sessions              189.547409
n_days_after_onboarding    1751.822505
total_navigations_fav1      121.747395
total_navigations_fav2       29.638296
driven_km_drives           4044.401535
duration_minutes_drives    1864.199794
activity_days                15.544653
driving_days                 12.182530
dtype: float64

Null rows outliers:
 ID           

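ASSESSMENT:
Means:
The two groups differ only slightly. The non-null group is marginally higher on `n_days_after_onboarding`, `driven_km_drives`, `duration_minutes_drives`, and `activity_days`, while the null group is marginally higher on `sessions`, `drives`, and `total_sessions`.

Outliers:
The summed outlier values for the non-null group are tens of times larger than those for the null group.
Because these are sums and the non-null group has about 20 times as many rows (14,299 vs. 700), much of that gap reflects group size rather than user behavior. The larger group also covers a wider range of user behavior and contains more highly active users.

Possible explanation:
The null group may include less active users or incomplete records, but the near-identical means above do not by themselves support that.
The non-null group gives a clearer view of loyal, highly active users.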
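When two groups differ greatly in size, the share of outliers is easier to compare than the sum of outlier values. The sketch below illustrates this with invented numbers; `outlier_mask` is a hypothetical helper, and the 2-standard-deviation threshold matches the rule used in the cell above.

```python
import pandas as pd

def outlier_mask(values: pd.Series, n_std: float = 2.0) -> pd.Series:
    """Flag values more than n_std sample standard deviations from the mean."""
    return (values - values.mean()).abs() > n_std * values.std()

# Hypothetical groups with the same distribution but very different sizes
small_group = pd.Series([10, 11, 12, 13, 14, 15, 16, 17, 18, 100])
large_group = pd.concat([small_group] * 20, ignore_index=True)

# Summed outlier values grow with the number of rows...
small_sum = small_group[outlier_mask(small_group)].sum()
large_sum = large_group[outlier_mask(large_group)].sum()

# ...while the share of rows flagged as outliers does not
small_share = outlier_mask(small_group).mean()
large_share = outlier_mask(large_group).mean()

print(f"Sum of outliers:   small={small_sum}, large={large_sum}")
print(f"Share of outliers: small={small_share:.2f}, large={large_share:.2f}")
```

Here the larger group's outlier sum is 20 times bigger even though both groups have the same proportion of outliers, which is why a size-independent measure is the safer basis for comparison.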

### **Task 2d. Null values - device counts**

Next, check the two populations with respect to the `device` variable.

**Question:** How many iPhone users had null values and how many Android users had null values?

In [52]:
# Import packages for data manipulation
import pandas as pd
import numpy as np

# Load dataset into dataframe
df = pd.read_csv('waze_dataset.csv')

# Filter rows where `label` is null and the device is iPhone
iphone_null_rows = df[(df['device'] == 'iPhone') & (df['label'].isnull())]

# Filter rows where `label` is null and the device is Android
android_null_rows = df[(df['device'] == 'Android') & (df['label'].isnull())]

# Count the null-label rows for each device type
print(f'\niPhone null rows: {len(iphone_null_rows)}')
print(f'\nAndroid null rows: {len(android_null_rows)}')


iPhone null rows: 447

Android null rows: 253


RESULT:
iPhone null rows: 447
Android null rows: 253

Now, of the rows with null values, calculate the percentage with each device&mdash;Android and iPhone. You can do this directly with the [`value_counts()`](https://pandas.pydata.org/docs/reference/api/pandas.Series.value_counts.html) function.

In [72]:
# Calculate % of iPhone nulls and Android nulls
# Import packages for data manipulation
import pandas as pd
import numpy as np

# Load dataset into dataframe
df = pd.read_csv('waze_dataset.csv')

# Filter rows where `label` is null and the device is iPhone
iphone_null_rows = df[(df['device'] == 'iPhone') & (df['label'].isnull())]

# Filter rows where `label` is null and the device is Android
android_null_rows = df[(df['device'] == 'Android') & (df['label'].isnull())]

# Filter rows where `label` is null
null_rows = df[df['label'].isnull()]

# Calculate the percentage of each device type in the null group with value_counts()
null_device_percentage = null_rows['device'].value_counts(normalize=True).round(4) * 100
print(f'\nDevice null percentage:\n{null_device_percentage}')

# Alternative (more basic) approach
ip_null_device_percentage = round(((len(iphone_null_rows)) / (len(null_rows))) *100,2)
andr_null_device_percentage = round(((len(android_null_rows)) / (len(null_rows))) *100,2)
print(f'\nDevice null percentage:\niPhone: {ip_null_device_percentage}\nAndroid: {andr_null_device_percentage}')



Device null percentage:
iPhone     63.86
Android    36.14
Name: device, dtype: float64

Device null percentage:
iPhone: 63.86
Android: 36.14


How does this compare to the device ratio in the full dataset?

In [65]:
# Calculate % of iPhone users and Android users in full dataset
total_rows = len(df)
iphone_percentage = round((len(df[df['device'] == 'iPhone']) / total_rows) * 100, 2)
android_percentage = round((len(df[df['device'] == 'Android']) / total_rows) * 100, 2)

print(f'iPhone: {iphone_percentage}%')
print(f'Android: {android_percentage}%')

iPhone: 64.48%
Android: 35.52%


ASSESSMENT:
The iPhone and Android shares in the null group and in the full dataset are nearly identical:
    iPhone: a difference of 0.62 percentage points (64.48% in the full dataset vs. 63.86% in the null group).
    Android: a difference of 0.62 percentage points (35.52% in the full dataset vs. 36.14% in the null group).
There is no meaningful difference in device share between the two groups:
    Both groups (null rows and the full dataset) have nearly the same iPhone and Android shares.
    This suggests that device type is not a main factor behind the missing data (null values in `label`).
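The comparison above can also be laid out as a small table. This sketch only reuses the percentages already printed in the two outputs; the column names are chosen here for illustration.

```python
import pandas as pd

# Percentages reported in the outputs above
shares = pd.DataFrame(
    {"null_label_rows": [63.86, 36.14], "full_dataset": [64.48, 35.52]},
    index=["iPhone", "Android"],
)

# Difference in percentage points between the null group and the full dataset
shares["difference_pp"] = (shares["null_label_rows"] - shares["full_dataset"]).round(2)

print(shares)
```

Expressing the gap in percentage points keeps it distinct from a relative change in the share itself.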

Examine the counts and percentages of users who churned vs. those who were retained. How many of each group are represented in the data?

In [3]:
# Import packages for data manipulation
import pandas as pd
import numpy as np

# Load dataset into dataframe
df = pd.read_csv('waze_dataset.csv')

# Keep rows where `label` is non-null (values 'churned' and 'retained')
non_null_rows = df[df['label'].notnull()]

# Calculate the percentage of each `label` value in the non-null group with value_counts()
non_null_percentage = non_null_rows['label'].value_counts(normalize=True).round(2) * 100
print(f'\nNon-null percentage:\n{non_null_percentage}')


Non-null percentage:
retained    82.0
churned     18.0
Name: label, dtype: float64


This dataset contains 82% retained users and 18% churned users.
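The task asks for counts as well as percentages. A minimal sketch of getting both from `value_counts()` follows; the label series is rebuilt from the group totals implied by the device counts reported later in this notebook (7,580 + 4,183 retained, 1,645 + 891 churned), so it stands in for `df['label']` without reading the CSV.

```python
import pandas as pd

# Stand-in for df['label'] with nulls dropped, rebuilt from reported totals
labels = pd.Series(["retained"] * (7580 + 4183) + ["churned"] * (1645 + 891), name="label")

# Absolute counts per label
label_counts = labels.value_counts()

# Percentages per label, rounded to two decimals
label_percentages = (labels.value_counts(normalize=True) * 100).round(2)

summary = pd.DataFrame({"count": label_counts, "percent": label_percentages})
print(summary)
```

Keeping two decimals here avoids the coarser rounding of `.round(2) * 100`, which leaves only whole percentages.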

Next, compare the medians of each variable for churned and retained users. The reason for calculating the median and not the mean is that you don't want outliers to unduly affect the portrayal of a typical user. Notice, for example, that the maximum value in the `driven_km_drives` column is 21,183 km. That's more than half the circumference of the earth!
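A small illustration of why the median is preferred here: adding one extreme value moves the mean a lot and the median very little. The five "typical" values are invented; 21,183 is the maximum quoted above.

```python
import pandas as pd

# Hypothetical typical monthly distances in km (invented for illustration)
typical_km = pd.Series([3000, 3200, 3400, 3600, 3800])

# Same values plus the extreme maximum mentioned in the text
with_outlier = pd.concat([typical_km, pd.Series([21183])], ignore_index=True)

mean_before, median_before = typical_km.mean(), typical_km.median()
mean_after, median_after = with_outlier.mean(), with_outlier.median()

print(f"Mean:   {mean_before:.1f} -> {mean_after:.1f}")
print(f"Median: {median_before:.1f} -> {median_after:.1f}")
```

The mean nearly doubles while the median shifts by only 100 km, so the median remains a fair description of a typical user.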

In [6]:
# Import packages for data manipulation
import pandas as pd
import numpy as np

# Load dataset into dataframe
df = pd.read_csv('waze_dataset.csv')

# Split the data into two dataframes: retained and churned
retained_df = df[df['label'] == 'retained']
churned_df = df[df['label'] == 'churned']
# numeric_only=True skips the text columns (`label`, `device`)
print(f'\nretained_df_median: {retained_df.median(numeric_only=True)}')
print(f'\nchurned_df_median: {churned_df.median(numeric_only=True)}')


retained_df_median: ID                         7509.000000
sessions                     56.000000
drives                       47.000000
total_sessions              157.586756
n_days_after_onboarding    1843.000000
total_navigations_fav1       68.000000
total_navigations_fav2        9.000000
driven_km_drives           3464.684614
duration_minutes_drives    1458.046141
activity_days                17.000000
driving_days                 14.000000
dtype: float64

churned_df_median: ID                         7477.500000
sessions                     59.000000
drives                       50.000000
total_sessions              164.339042
n_days_after_onboarding    1321.000000
total_navigations_fav1       84.500000
total_navigations_fav2       11.000000
driven_km_drives           3652.655666
duration_minutes_drives    1607.183785
activity_days                 8.000000
driving_days                  6.000000
dtype: float64


This offers an interesting snapshot of the two groups, churned vs. retained:

The median churned user had ~3 more drives in the last month than the median retained user, but retained users used the app on over twice as many days as churned users in the same time period.

The median churned user drove ~200 more kilometers and 2.5 more hours during the last month than the median retained user.

It seems that churned users had more drives in fewer days, and their trips were farther and longer in duration. Perhaps this is suggestive of a user profile. Continue exploring!

Calculate the median kilometers per drive in the last month for both retained and churned users.

Begin by dividing the `driven_km_drives` column by the `drives` column. Then, group the results by churned/retained and calculate the median km/drive of each group.

In [12]:
# Import packages for data manipulation
import pandas as pd
import numpy as np

# Load dataset into dataframe
df = pd.read_csv('waze_dataset.csv')

# Add a column to df called `km_per_drive`
df['km_per_drive'] = df['driven_km_drives'] / df['drives']

# Group by `label`, calculate the median, and isolate for km per drive
df.groupby('label')['km_per_drive'].median(numeric_only=True)

label
churned     74.109416
retained    75.014702
Name: km_per_drive, dtype: float64
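One caveat with the division above: if any row has `drives` equal to 0, pandas returns `inf` for that row rather than raising an error. Whether the Waze data contains such rows is not shown here, so the sketch below uses invented values to show one way of handling the case.

```python
import numpy as np
import pandas as pd

# Hypothetical rows, one of which has zero drives (values invented for illustration)
toy = pd.DataFrame({
    "label": ["retained", "retained", "churned", "churned"],
    "driven_km_drives": [300.0, 150.0, 500.0, 80.0],
    "drives": [4, 0, 10, 2],
})

# Dividing by zero drives yields inf instead of an error
toy["km_per_drive"] = toy["driven_km_drives"] / toy["drives"]
n_infinite = int(np.isinf(toy["km_per_drive"]).sum())

# Treat infinite ratios as missing so they are skipped by the median
toy["km_per_drive"] = toy["km_per_drive"].replace([np.inf, -np.inf], np.nan)
median_km_per_drive = toy.groupby("label")["km_per_drive"].median()

print(f"Rows with infinite km_per_drive: {n_infinite}")
print(median_km_per_drive)
```

The median is fairly tolerant of a few infinite values, but a mean or a maximum computed on the same column would not be.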

The median retained user drove about one more kilometer per drive than the median churned user. How many kilometers per driving day was this?

To calculate this statistic, repeat the steps above using `driving_days` instead of `drives`.

In [13]:
# Import packages for data manipulation
import pandas as pd
import numpy as np

# Load dataset into dataframe
df = pd.read_csv('waze_dataset.csv')

# Add a column to df called `km_per_driving_day`
df['km_per_driving_day'] = df['driven_km_drives'] / df['driving_days']

# Group by `label`, calculate the median, and isolate for km per driving day
df.groupby('label')['km_per_driving_day'].median(numeric_only=True)

label
churned     697.541999
retained    289.549333
Name: km_per_driving_day, dtype: float64

Now, calculate the median number of drives per driving day for each group.

In [14]:
# Import packages for data manipulation
import pandas as pd
import numpy as np

# Load dataset into dataframe
df = pd.read_csv('waze_dataset.csv')

# Add a column to df called `drives_per_driving_day`
df['drives_per_driving_day'] = df['drives'] / df['driving_days']

# Group by `label`, calculate the median, and isolate for drives per driving day
df.groupby('label')['drives_per_driving_day'].median(numeric_only=True)

label
churned     10.0000
retained     4.0625
Name: drives_per_driving_day, dtype: float64

The median user who churned drove 698 kilometers each day they drove last month, which is about 240% of the per-driving-day distance of retained users. The median churned user had a similarly disproportionate number of drives per driving day compared to retained users.
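The ratios quoted above follow directly from the two printed outputs; a quick check of the arithmetic, using only those reported medians:

```python
# Median km per driving day, as printed above
churned_km_per_day = 697.541999
retained_km_per_day = 289.549333

# Median drives per driving day, as printed above
churned_drives_per_day = 10.0
retained_drives_per_day = 4.0625

km_ratio = churned_km_per_day / retained_km_per_day
drives_ratio = churned_drives_per_day / retained_drives_per_day

print(f"km per driving day, churned vs. retained:     {km_ratio:.2f}x")
print(f"drives per driving day, churned vs. retained: {drives_ratio:.2f}x")
```

Both ratios come out between 2.4 and 2.5, which is what "similarly disproportionate" refers to.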

It is clear from these figures that, regardless of whether a user churned or not, the users represented in this data are serious drivers! It would probably be safe to assume that this data does not represent typical drivers at large. Perhaps the data&mdash;and in particular the sample of churned users&mdash;contains a high proportion of long-haul truckers.

In consideration of how much these users drive, it would be worthwhile to recommend to Waze that they gather more data on these super-drivers. It's possible that the reason for their driving so much is also the reason why the Waze app does not meet their specific set of needs, which may differ from the needs of a more typical driver, such as a commuter.

Finally, examine whether there is an imbalance in how many users churned by device type.

Begin by getting the overall counts of each device type for each group, churned and retained.

In [26]:
# Import packages for data manipulation
import pandas as pd
import numpy as np

# Load dataset into dataframe
df = pd.read_csv('waze_dataset.csv')

# Split the data into two dataframes: retained and churned
retained_df = df[df['label'] == 'retained']
churned_df = df[df['label'] == 'churned']

# For each label, calculate the number of Android users and iPhone users
print(f"\nRetained_device_user:\n{retained_df['device'].value_counts()}")
print(f"\nChurned_device_user:\n{churned_df['device'].value_counts()}")




Retained_device_user:
iPhone     7580
Android    4183
Name: device, dtype: int64

Churned_device_user:
iPhone     1645
Android     891
Name: device, dtype: int64


Now, within each group, churned and retained, calculate what percent was Android and what percent was iPhone.

In [46]:
# Import packages for data manipulation
import pandas as pd
import numpy as np

# Load dataset into dataframe
df = pd.read_csv('waze_dataset.csv')

# Split the data into two dataframes: retained and churned
retained_df = df[df['label'] == 'retained']
churned_df = df[df['label'] == 'churned']

# For each label, calculate the percentage of Android users and iPhone users
print(f"\nRetained_device_rate:\n{retained_df['device'].value_counts(normalize=True).round(2)*100}")
print(f"\nChurned_device_rate:\n{churned_df['device'].value_counts(normalize=True).round(2)*100}")




Retained_device_rate:
iPhone     64.0
Android    36.0
Name: device, dtype: float64

Churned_device_rate:
iPhone     65.0
Android    35.0
Name: device, dtype: float64


The ratio of iPhone users and Android users is consistent between the churned group and the retained group, and those ratios are both consistent with the ratio found in the overall dataset.
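The percentages above describe the device mix within each label. The churn rate within each device is a related but different quantity, and it can be derived from the counts already printed. A minimal sketch using those reported counts:

```python
import pandas as pd

# Device counts per label, as printed in the outputs above
counts = pd.DataFrame(
    {"retained": [7580, 4183], "churned": [1645, 891]},
    index=["iPhone", "Android"],
)

# Churn rate per device: churned users divided by all labeled users on that device
churn_rate_pct = (counts["churned"] / counts.sum(axis=1) * 100).round(2)

print(churn_rate_pct)
```

On the real dataframe, `pd.crosstab(df['device'], df['label'], normalize='index')` would be one way to get the same rates in a single call.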

<img src="images/Construct.png" width="100" height="100" align=left>

## **PACE: Construct**

**Note**: The Construct stage does not apply to this workflow. The PACE framework can be adapted to fit the specific requirements of any project.



<img src="images/Execute.png" width="100" height="100" align=left>

## **PACE: Execute**

Consider the questions in your PACE Strategy Document and those below to craft your response:

### **Task 3. Conclusion**

Recall that your supervisor, May Santner, asked you to share your findings with the data team in an executive summary. Consider the following questions as you prepare to write your summary. Think about key points you may want to share with the team, and what information is most relevant to the user churn project.

**Questions:**

1. Did the data contain any missing values? How many, and which variables were affected? Was there a pattern to the missing data?

2. What is a benefit of using the median value of a sample instead of the mean?

3. Did your investigation give rise to further questions that you would like to explore or ask the Waze team about?

4. What percentage of the users in the dataset were Android users and what percentage were iPhone users?

5. What were some distinguishing characteristics of users who churned vs. users who were retained?

6. Was there an appreciable difference in churn rate between iPhone users vs. Android users?





1. Missing values in the dataset
The `label` column is the only column with missing data: 700 values are null.
No other column has missing values.
No clear pattern was found in the missing data, and no other column is affected, so the issue is confined to the `label` column.

2. What is a benefit of using the median instead of the mean?
The median is less affected by extreme values (outliers).
For example, the `driven_km_drives` column has a maximum of 21,183 km, a very large value that can easily skew the mean. The median gives a more accurate picture of the "typical user".

3. Did the analysis raise further questions to ask the Waze team?
Why do some very active users still churn?
Are factors such as geographic region, time spent using the app, or app version related to churn?
Is any strategy in place to retain users with low `activity_days`?
Do users who churned ever come back to the app?

4. Share of Android and iPhone users
iPhone: 64.48%
Android: 35.52%
Most users in the dataset are iPhone users.

5. What distinguishes churned users from retained users?
Retained users:
Have been with the app longer (median `n_days_after_onboarding` of 1,843 days vs. 1,321 days for churned users).
Are active more often (median `activity_days` and `driving_days` are more than twice those of churned users).
Use the app steadily over a long period.

Churned users:
Tend to use the app more intensively in the short term: median sessions (`sessions`), drives (`drives`), and navigations to favorites (`total_navigations_fav1` and `total_navigations_fav2`) are all higher.
However, they are active on fewer days and have a shorter tenure with the app.

6. Was there an appreciable difference in churn rate between iPhone users and Android users?
The iPhone and Android shares are nearly the same in both groups, differing by only 1 percentage point.

iPhone: 64% of retained users vs. 65% of churned users.
Android: 36% of retained users vs. 35% of churned users.

Preliminary conclusion:
A difference of 1 percentage point is small, although no statistical test was run to confirm it.
This suggests that device type is not a main driver of churn.
Retention and churn rates are very similar for iPhone and Android users, so churn does not appear to depend on device type.

**Congratulations!** You've completed this lab. However, you may not notice a green check mark next to this item on Coursera's platform. Please continue your progress regardless of the check mark. Just click on the "save" icon at the top of this notebook to ensure your work has been logged.