#### ID 10302

```You’re given a dataset of Uber rides with the traveling distance (‘distance_to_travel’) and cost (‘monetary_cost’) for each ride. First, find the difference between the distance-per-dollar for each date and the average distance-per-dollar for that year-month. Distance-per-dollar is defined as the distance traveled divided by the cost of the ride. Use the calculated difference on each date to calculate the absolute average difference in the distance-per-dollar metric on a monthly basis (year-month). The output should include the year-month (YYYY-MM) and the absolute average difference in distance-per-dollar (absolute value rounded to the 2nd decimal). You should count both successful and failed request_status values, as the distance and cost values are populated for all ride requests. Also, assume that all dates are unique in the dataset. Order your results by earliest request date first.```
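
To make the metric concrete, here is a minimal plain-Python sketch on made-up numbers. The `rides` list and the helper `monthly_abs_avg_difference` are hypothetical and not part of the dataset. The sketch assumes the same reading as the solutions in this section: take the absolute difference per date first, then average per month. (Averaging the signed differences first would always give zero, because deviations from a mean cancel out.)

```python
from collections import defaultdict
from datetime import date
from statistics import mean

# Hypothetical rides: (request_date, distance_to_travel, monetary_cost)
rides = [
    (date(2020, 1, 5), 10.0, 5.0),   # 2.0 distance per dollar
    (date(2020, 1, 20), 12.0, 3.0),  # 4.0
    (date(2020, 2, 3), 9.0, 3.0),    # 3.0
    (date(2020, 2, 14), 20.0, 4.0),  # 5.0
    (date(2020, 2, 28), 8.0, 8.0),   # 1.0
]


def monthly_abs_avg_difference(rides):
    """Return [(year_month, mean absolute difference from the monthly average)]."""
    by_month = defaultdict(list)
    for request_date, distance, cost in rides:
        by_month[request_date.strftime("%Y-%m")].append(distance / cost)

    result = []
    for month in sorted(by_month):  # 'YYYY-MM' strings sort chronologically
        values = by_month[month]
        month_avg = mean(values)
        abs_diff = mean(abs(v - month_avg) for v in values)
        result.append((month, round(abs_diff, 2)))
    return result


print(monthly_abs_avg_difference(rides))
# → [('2020-01', 1.0), ('2020-02', 1.33)]
```

January has values 2 and 4 around an average of 3, so both dates differ by 1. February has 3, 5, and 1 around an average of 3, so the differences are 0, 2, and 2, which average to 1.33.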

In [None]:
%%sql
WITH distance_per_dollar AS (SELECT request_date,
                                    TO_CHAR(request_date, 'YYYY-MM')   AS request_mnth,
                                    distance_to_travel / monetary_cost AS distance_per_dollar
                             FROM uber_request_logs),
     daily_diff AS (SELECT request_mnth,
                           ABS(distance_per_dollar - AVG(distance_per_dollar)
                                                     OVER (PARTITION BY request_mnth)) AS difference
                    FROM distance_per_dollar)
SELECT request_mnth,
       ROUND(AVG(difference)::DECIMAL, 2) AS difference
FROM daily_diff
GROUP BY request_mnth
ORDER BY request_mnth

In [None]:
df = uber_request_logs.copy()  # work on a copy so the source frame is not modified

# Year-month key (YYYY-MM) for each request
df['request_mnth'] = df['request_date'].dt.strftime('%Y-%m')

df['distance_per_dollar'] = df['distance_to_travel'] / df['monetary_cost']

# Broadcast each month's mean back onto its rows
df['avg_distance_per_dollar'] = df.groupby('request_mnth')['distance_per_dollar'].transform('mean')

df['difference'] = (df['distance_per_dollar'] - df['avg_distance_per_dollar']).abs()

# Mean absolute difference per month, rounded to 2 decimals, earliest month first
df.groupby('request_mnth').agg(abs_difference=('difference', lambda x: x.mean().round(2))).reset_index().sort_values(
    'request_mnth')

#### ID 10313

```Some forecasting methods are extremely simple and surprisingly effective. The naïve forecast is one of them; we simply set all forecasts to be the value of the last observation. Our goal is to develop a naïve forecast for a new metric called "distance per dollar", defined as (distance_to_travel/monetary_cost) in our dataset, and measure its accuracy. To develop this forecast, sum the "distance to travel" and "monetary cost" values at a monthly level before calculating "distance per dollar". This value becomes your actual value for the current month. The next step is to populate the forecasted value for each month. This can be achieved simply by getting the previous month's value in a separate column. Now, we have actual and forecasted values. This is your naïve forecast. Let’s evaluate our model by calculating an error metric called root mean squared error (RMSE). RMSE is defined as sqrt(mean(square(actual - forecast))). Report the RMSE rounded to the 2nd decimal place.```
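
The steps above can be sketched in plain Python on made-up monthly totals. The `monthly_totals` list and the helper `naive_forecast_rmse` are hypothetical and only illustrate the arithmetic. The sketch assumes the first month is left out of the error, since it has no previous month to serve as a forecast.

```python
from math import sqrt

# Hypothetical monthly totals: (year_month, total distance, total cost)
monthly_totals = [
    ("2020-01", 100.0, 50.0),  # actual = 2.0
    ("2020-02", 90.0, 30.0),   # actual = 3.0
    ("2020-03", 120.0, 24.0),  # actual = 5.0
    ("2020-04", 80.0, 20.0),   # actual = 4.0
]


def naive_forecast_rmse(monthly_totals):
    """RMSE of a naive forecast where each month's forecast is the prior month's actual."""
    ordered = sorted(monthly_totals)  # 'YYYY-MM' strings sort chronologically
    actuals = [distance / cost for _, distance, cost in ordered]

    # Pair each month's actual with the previous month's actual (the forecast).
    # The first month has no forecast, so it contributes no error term.
    squared_errors = [
        (actual - forecast) ** 2
        for forecast, actual in zip(actuals, actuals[1:])
    ]
    return round(sqrt(sum(squared_errors) / len(squared_errors)), 2)


print(naive_forecast_rmse(monthly_totals))
# → 1.41
```

Here the errors are 1, 2, and -1, so the squared errors are 1, 4, and 1. Their mean is 2, and the square root of 2 rounds to 1.41.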

In [None]:
%%sql
WITH monthly_dist_per_dollar AS (SELECT TO_CHAR(request_date, 'YYYY-MM')             AS request_mnth,
                                        SUM(distance_to_travel) / SUM(monetary_cost) AS monthly_dist_per_dollar
                                 FROM uber_request_logs
                                 GROUP BY request_mnth),

     prev_monthly_dist_per_dollar AS (SELECT request_mnth,
                                             monthly_dist_per_dollar,
                                             LAG(monthly_dist_per_dollar, 1)
                                             OVER (ORDER BY request_mnth) AS prev_monthly_dist_per_dollar
                                      FROM monthly_dist_per_dollar),

     power AS (SELECT request_mnth,
                      monthly_dist_per_dollar,
                      prev_monthly_dist_per_dollar,
                      POWER(prev_monthly_dist_per_dollar - monthly_dist_per_dollar,
                            2) AS power
               FROM prev_monthly_dist_per_dollar)

SELECT ROUND(SQRT(AVG(power))::DECIMAL, 2) AS rmse
FROM power

In [None]:
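import numpy as np

df = uber_request_logs.copy()  # work on a copy so the source frame is not modified

df['request_mnth'] = df['request_date'].dt.to_period('M')

# Sum distance and cost per month first, then take the ratio
df_grouped = df.groupby('request_mnth', as_index=False)[['distance_to_travel', 'monetary_cost']].sum()
df_grouped['monthly_dist_per_dollar'] = df_grouped['distance_to_travel'] / df_grouped['monetary_cost']

# Naive forecast: the previous month's actual value
df_grouped = df_grouped.sort_values('request_mnth')
df_grouped['prev_monthly_dist_per_dollar'] = df_grouped['monthly_dist_per_dollar'].shift(1)

df_grouped['power'] = (df_grouped['prev_monthly_dist_per_dollar'] - df_grouped['monthly_dist_per_dollar']) ** 2

# The first month has no forecast (NaN), so mean() skips it
np.sqrt(df_grouped['power'].mean()).round(2)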