In the 02-dataframe.ipynb notebook, you have the following:
df = dd.read_csv("data/yellow_tripdata_2019-.csv")
df
and
df = dd.read_csv("data/yellow_tripdata_2019-.csv",
dtype={'RatecodeID': 'float64',
'VendorID': 'float64',
'passenger_count': 'float64',
'payment_type': 'float64'
})
However, both checkpoint solutions use df_dask, which is never defined in the notebook.
Solution 1
std_tip = df_dask.groupby("passenger_count").tip_amount.std().compute()
Solution 2
mean_total = df_dask.total_amount.mean()
std_total = df_dask.total_amount.mean()
dask.compute(mean_total, std_total)
I recommend naming the dataframe df_dask throughout the Dask dataframe section and updating the existing code to use it.
A few places still use df.xxx when working with the Dask dataframe.
For example:
%%time
mean_tip_amount = df.groupby("passenger_count").tip_amount.mean()
mean_tip_amount
Also, if you switch to df_dask, I would add a second line to Solution 1 so that the output is displayed:
std_tip = df_dask.groupby("passenger_count").tip_amount.std().compute()
std_tip
Thanks for putting this course together and the notebooks! Very much enjoyed it!