# Introduction
Waze is a community-driven navigation app that provides real-time traffic information and route guidance. Developed by an Israeli company, Waze Mobile, it was acquired by Google in 2013. Unlike traditional GPS navigation systems, Waze relies on user-generated data to offer live updates on traffic conditions, road hazards, accidents, and other relevant information.

Key Features of Waze:
* Real-Time Traffic Updates: Waze users (also known as Wazers) can report traffic jams, accidents, speed traps, and road closures, allowing the app to provide real-time updates and suggest alternate routes.
* Community-Driven: The power of Waze lies in its active community of users who contribute information to keep maps accurate and up-to-date.
* Navigation Assistance: Waze offers turn-by-turn voice navigation, lane guidance, and estimated time of arrival based on current traffic conditions.
* Alerts and Notifications: Users receive alerts about upcoming road conditions, police activity, and traffic cameras.
* Integration with Other Services: Waze integrates with various services, including music streaming apps and calendar events, to enhance the driving experience.

Waze has become a popular tool for drivers looking to avoid traffic and reach their destinations more efficiently. Its reliance on community input makes it a dynamic and constantly evolving platform.

# Task 1: Read data from a CSV file into a pandas dataframe
* You are given a dataset file called waze_dataset.csv. Upload it and read it.

## 1a: Import statements
Import pandas.

In [1]:
import pandas as pd

## 1b: Read in the file
Use the pd.read_csv() function to read in the data. The file is called waze_dataset.csv. Assign the resulting dataframe to a variable named df.

Use the head() method on the df dataframe to inspect the first five rows.

In [6]:
df = pd.read_csv(r'C:\Users\walte\Downloads\waze_dataset.csv')
df

Unnamed: 0,ID,label,sessions,drives,total_sessions,n_days_after_onboarding,total_navigations_fav1,total_navigations_fav2,driven_km_drives,duration_minutes_drives,activity_days,driving_days,device
0,0,retained,283,226,296.748273,2276,208,0,2628.845068,1985.775061,28,19,Android
1,1,retained,133,107,326.896596,1225,19,64,13715.920550,3160.472914,13,11,iPhone
2,2,retained,114,95,135.522926,2651,0,0,3059.148818,1610.735904,14,8,Android
3,3,retained,49,40,67.589221,15,322,7,913.591123,587.196542,7,3,iPhone
4,4,retained,84,68,168.247020,1562,166,5,3950.202008,1219.555924,27,18,Android
...,...,...,...,...,...,...,...,...,...,...,...,...,...
14994,14994,retained,60,55,207.875622,140,317,0,2890.496901,2186.155708,25,17,iPhone
14995,14995,retained,42,35,187.670313,2505,15,10,4062.575194,1208.583193,25,20,Android
14996,14996,retained,273,219,422.017241,1873,17,0,3097.825028,1031.278706,18,17,iPhone
14997,14997,churned,149,120,180.524184,3150,45,0,4051.758549,254.187763,6,6,iPhone


In [8]:
df.head(5)

Unnamed: 0,ID,label,sessions,drives,total_sessions,n_days_after_onboarding,total_navigations_fav1,total_navigations_fav2,driven_km_drives,duration_minutes_drives,activity_days,driving_days,device
0,0,retained,283,226,296.748273,2276,208,0,2628.845068,1985.775061,28,19,Android
1,1,retained,133,107,326.896596,1225,19,64,13715.92055,3160.472914,13,11,iPhone
2,2,retained,114,95,135.522926,2651,0,0,3059.148818,1610.735904,14,8,Android
3,3,retained,49,40,67.589221,15,322,7,913.591123,587.196542,7,3,iPhone
4,4,retained,84,68,168.24702,1562,166,5,3950.202008,1219.555924,27,18,Android
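
The absolute Windows path used above only works on one machine. The following is a minimal sketch of the same read-and-inspect step, assuming a few rows are supplied as inline text. The `sample_csv` object is a stand-in for the real file, with a reduced set of columns and three rows copied from the output above.

```python
import io

import pandas as pd

# Stand-in for waze_dataset.csv: a header plus three rows copied from the output above.
sample_csv = io.StringIO(
    "ID,label,sessions,drives,device\n"
    "0,retained,283,226,Android\n"
    "1,retained,133,107,iPhone\n"
    "2,retained,114,95,Android\n"
)

# pd.read_csv accepts a file path or any file-like object.
df_sample = pd.read_csv(sample_csv)

# head(n) returns the first n rows; n defaults to 5.
print(df_sample.head(2))
```

With the real file, passing a relative path such as `'waze_dataset.csv'` (assuming the file sits next to the notebook) keeps the cell portable.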


## 1c: Upload the xlsx file
* Upload the xlsx file to get the descriptions of the columns.

In [15]:

# Load the XLSX file with column descriptions (sheet_name=None returns a dict of all sheets)

column_descriptions = pd.read_excel(r'C:\Users\walte\Downloads\waze_data_dictionary (1).xlsx', sheet_name=None)

# Display the sheet names and first few rows of the first sheet
sheet_names = column_descriptions.keys()
for sheet in sheet_names:
    print(f"Sheet name: {sheet}")
    print(column_descriptions[sheet].head(5))

Sheet name: data_dictionary
  Column name Type                                        Description
0          ID  int                        A sequential numbered index
1       label  obj  Binary target variable (“retained” vs “churned...
2    sessions  int  The number of occurrence of a user opening the...
3      drives  int  An occurrence of driving at least 1 km during ...
4      device  obj    The type of device a user starts a session with
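
Once the sheet is loaded, a lookup from column name to description can be handy. The sketch below assumes a hand-built stand-in for the data_dictionary sheet (`data_dictionary` here is hypothetical and holds only the two rows whose descriptions appear in full above).

```python
import pandas as pd

# Hypothetical stand-in for the data_dictionary sheet; two rows copied from the output above.
data_dictionary = pd.DataFrame(
    {
        "Column name": ["ID", "device"],
        "Type": ["int", "obj"],
        "Description": [
            "A sequential numbered index",
            "The type of device a user starts a session with",
        ],
    }
)

# Map each column name to its description for quick lookup.
descriptions = data_dictionary.set_index("Column name")["Description"].to_dict()

print(descriptions["device"])
```

With the real sheet, `pd.set_option("display.max_colwidth", None)` stops pandas from truncating long descriptions when printing.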


# Task 2: Summary information
Now that you have a dataframe with the Waze user data, get some high-level summary information about it.

## 2a: Metadata
* Use a DataFrame method to examine the number of rows and columns, the column names, the data type contained in each column, the number of non-null values in each column, and the amount of memory the dataframe uses.

In [10]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 14999 entries, 0 to 14998
Data columns (total 13 columns):
 #   Column                   Non-Null Count  Dtype  
---  ------                   --------------  -----  
 0   ID                       14999 non-null  int64  
 1   label                    14299 non-null  object 
 2   sessions                 14999 non-null  int64  
 3   drives                   14999 non-null  int64  
 4   total_sessions           14999 non-null  float64
 5   n_days_after_onboarding  14999 non-null  int64  
 6   total_navigations_fav1   14999 non-null  int64  
 7   total_navigations_fav2   14999 non-null  int64  
 8   driven_km_drives         14999 non-null  float64
 9   duration_minutes_drives  14999 non-null  float64
 10  activity_days            14999 non-null  int64  
 11  driving_days             14999 non-null  int64  
 12  device                   14999 non-null  object 
dtypes: float64(3), int64(8), object(2)
memory usage: 1.5+ MB


In [16]:
df.shape

(14999, 13)

In [17]:
df.columns

Index(['ID', 'label', 'sessions', 'drives', 'total_sessions',
       'n_days_after_onboarding', 'total_navigations_fav1',
       'total_navigations_fav2', 'driven_km_drives', 'duration_minutes_drives',
       'activity_days', 'driving_days', 'device'],
      dtype='object')
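
The info() output shows 14299 non-null values for label against 14999 rows, so 700 labels are missing. Here is a sketch of how to count missing values per column directly, using a small made-up frame (`toy` and its values are illustrative, not taken from the dataset).

```python
import pandas as pd

# Made-up frame with one missing label.
toy = pd.DataFrame(
    {
        "label": ["retained", None, "churned", "retained"],
        "sessions": [283, 133, 114, 49],
    }
)

# isna() marks missing cells; summing the booleans counts them per column.
missing_per_column = toy.isna().sum()
print(missing_per_column)
```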

## 2b: Summary statistics
Examine the summary statistics of the dataframe's numeric columns. The output should be a table that includes row count, mean, standard deviation, min, max, and quartile values.

In [18]:
df.describe()

Unnamed: 0,ID,sessions,drives,total_sessions,n_days_after_onboarding,total_navigations_fav1,total_navigations_fav2,driven_km_drives,duration_minutes_drives,activity_days,driving_days
count,14999.0,14999.0,14999.0,14999.0,14999.0,14999.0,14999.0,14999.0,14999.0,14999.0,14999.0
mean,7499.0,80.633776,67.281152,189.964447,1749.837789,121.605974,29.672512,4039.340921,1860.976012,15.537102,12.179879
std,4329.982679,80.699065,65.913872,136.405128,1008.513876,148.121544,45.394651,2502.149334,1446.702288,9.004655,7.824036
min,0.0,0.0,0.0,0.220211,4.0,0.0,0.0,60.44125,18.282082,0.0,0.0
25%,3749.5,23.0,20.0,90.661156,878.0,9.0,0.0,2212.600607,835.99626,8.0,5.0
50%,7499.0,56.0,48.0,159.568115,1741.0,71.0,9.0,3493.858085,1478.249859,16.0,12.0
75%,11248.5,112.0,93.0,254.192341,2623.5,178.0,43.0,5289.861262,2464.362632,23.0,19.0
max,14998.0,743.0,596.0,1216.154633,3500.0,1236.0,415.0,21183.40189,15851.72716,31.0,30.0
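
In the table above, the mean of sessions (about 80.6) sits well above the median (56), which suggests a right-skewed distribution. The following is a small sketch of reading those two statistics out of describe(), using made-up values (`toy` is illustrative only).

```python
import pandas as pd

# Made-up values with one large outlier.
toy = pd.DataFrame({"sessions": [0, 23, 56, 112, 743]})

summary = toy.describe()

# describe() labels the median row "50%".
mean_sessions = summary.loc["mean", "sessions"]
median_sessions = summary.loc["50%", "sessions"]

# A mean far above the median hints at a long right tail.
print(mean_sessions > median_sessions)
```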


# Task 3: Explore your data
Practice exploring your data.

## 3a: Rows per device
Select the device column and use the value_counts() method on it to check how many rows there are for each device in the dataframe. Then do the same for the label column.

In [20]:
df['device'].value_counts()

device
iPhone     9672
Android    5327
Name: count, dtype: int64

In [21]:
df['label'].value_counts()

label
retained    11763
churned      2536
Name: count, dtype: int64
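
Raw counts are often easier to compare as proportions. This sketch rebuilds a label column from the counts above (11763 retained, 2536 churned, plus the 700 missing labels implied by info()) and uses the normalize and dropna parameters of value_counts(). The `labels` Series is reconstructed for illustration, not read from the file.

```python
import pandas as pd

# Reconstructed from the counts reported above; None stands for a missing label.
labels = pd.Series(["retained"] * 11763 + ["churned"] * 2536 + [None] * 700)

# normalize=True returns shares instead of counts; missing values are excluded by default.
shares = labels.value_counts(normalize=True)
print(shares)

# dropna=False adds a row for the missing labels.
counts_with_missing = labels.value_counts(dropna=False)
print(counts_with_missing)
```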

## 3b: Sort by sessions
1. Create a new dataframe called df_sorted by using the sort_values() method on the df dataframe. The new dataframe should contain the data sorted by sessions, beginning with the rows with the highest sessions values.
2. Use the head() method to inspect the first 10 rows of df_sorted.

In [22]:
# 1. Sort by sessions, highest values first
df_sorted = df.sort_values(by='sessions', ascending=False)

# 2. Inspect the first 10 rows
df_sorted.head(10)

Unnamed: 0,ID,label,sessions,drives,total_sessions,n_days_after_onboarding,total_navigations_fav1,total_navigations_fav2,driven_km_drives,duration_minutes_drives,activity_days,driving_days,device
12446,12446,churned,743,596,839.987478,360,0,143,10943.54797,5731.007821,11,8,iPhone
12646,12646,retained,725,582,737.475335,2098,1077,55,4743.892888,2477.289593,4,4,Android
8234,8234,retained,693,563,871.394467,1968,311,0,3050.036923,1563.565847,13,13,iPhone
8358,8358,churned,690,552,728.187235,3138,499,52,9048.490104,2073.484493,6,3,Android
5895,5895,retained,671,546,839.283344,1304,162,10,3204.09551,1625.804931,20,11,iPhone
5407,5407,retained,671,538,888.529872,1958,137,95,3998.734256,563.99799,19,11,iPhone
13071,13071,retained,657,529,661.914877,2939,334,0,4286.764482,2407.887207,17,10,Android
12059,12059,churned,627,506,628.840276,1326,29,0,2736.047674,1196.293615,11,7,iPhone
5316,5316,retained,625,501,722.190027,467,9,163,3239.144274,2523.460228,21,21,iPhone
9771,9771,retained,608,514,771.743616,3280,218,0,2783.898359,783.351442,29,28,iPhone
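
Two rows above share 671 sessions (IDs 5895 and 5407). sort_values() accepts a list of columns, so a second key can break such ties. The sketch uses three rows copied from the output above; choosing drives as the tie-breaker is an assumption for illustration.

```python
import pandas as pd

# Three rows copied from the sorted output above, deliberately out of order.
rows = pd.DataFrame(
    {
        "ID": [5407, 12446, 5895],
        "sessions": [671, 743, 671],
        "drives": [538, 596, 546],
    }
)

# Sort by sessions first, then by drives, both descending.
rows_sorted = rows.sort_values(by=["sessions", "drives"], ascending=[False, False])
print(rows_sorted)
```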


## 3c: Use iloc to select rows
Use iloc to select the two rows at positions 10 and 11 of the df_sorted dataframe.

In [26]:
df_sorted.iloc[[10,11]]

Unnamed: 0,ID,label,sessions,drives,total_sessions,n_days_after_onboarding,total_navigations_fav1,total_navigations_fav2,driven_km_drives,duration_minutes_drives,activity_days,driving_days,device
1895,1895,retained,607,492,930.282527,2646,153,39,1931.381832,876.024352,28,18,iPhone
9797,9797,retained,590,472,677.359139,3285,29,0,8172.342435,1200.026489,29,23,Android
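
iloc selects by position, not by index label. After sorting, the index labels shown above (1895 and 9797) no longer match the row positions (10 and 11). Here is a sketch of the distinction on a small frame whose index labels are copied from df_sorted; the frame itself is illustrative.

```python
import pandas as pd

# Index labels mimic the first three rows of df_sorted.
small = pd.DataFrame({"sessions": [743, 725, 693]}, index=[12446, 12646, 8234])

# iloc is position-based: position 0 is the first row, whatever its label.
first_by_position = small.iloc[0]

# loc is label-based: 12446 is an index label, not a position.
first_by_label = small.loc[12446]

# iloc slices exclude the stop position, so [:2] returns positions 0 and 1.
first_two = small.iloc[:2]
print(first_two)
```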


Use iloc to select the rows at positions up to, but not including, 11 of the df_sorted dataframe.

In [30]:
df_sorted.iloc[:11]

Unnamed: 0,ID,label,sessions,drives,total_sessions,n_days_after_onboarding,total_navigations_fav1,total_navigations_fav2,driven_km_drives,duration_minutes_drives,activity_days,driving_days,device
12446,12446,churned,743,596,839.987478,360,0,143,10943.54797,5731.007821,11,8,iPhone
12646,12646,retained,725,582,737.475335,2098,1077,55,4743.892888,2477.289593,4,4,Android
8234,8234,retained,693,563,871.394467,1968,311,0,3050.036923,1563.565847,13,13,iPhone
8358,8358,churned,690,552,728.187235,3138,499,52,9048.490104,2073.484493,6,3,Android
5895,5895,retained,671,546,839.283344,1304,162,10,3204.09551,1625.804931,20,11,iPhone
5407,5407,retained,671,538,888.529872,1958,137,95,3998.734256,563.99799,19,11,iPhone
13071,13071,retained,657,529,661.914877,2939,334,0,4286.764482,2407.887207,17,10,Android
12059,12059,churned,627,506,628.840276,1326,29,0,2736.047674,1196.293615,11,7,iPhone
5316,5316,retained,625,501,722.190027,467,9,163,3239.144274,2523.460228,21,21,iPhone
9771,9771,retained,608,514,771.743616,3280,218,0,2783.898359,783.351442,29,28,iPhone
1895,1895,retained,607,492,930.282527,2646,153,39,1931.381832,876.024352,28,18,iPhone


In [46]:
df[['sessions']].min()

sessions    0
dtype: int64
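
Double brackets select a one-column DataFrame, so min() above returns a Series with one entry. Single brackets select a Series, and min() then returns a single value. This sketch shows both on made-up values (`toy` is illustrative only).

```python
import pandas as pd

toy = pd.DataFrame({"sessions": [283, 0, 114]})

# One-column DataFrame: min() gives a Series indexed by column name.
min_as_series = toy[["sessions"]].min()

# Series: min() gives a single value.
min_as_scalar = toy["sessions"].min()

print(min_as_series)
print(min_as_scalar)
```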