<a href="https://colab.research.google.com/github/ADionysopoulos/efood_assesment/blob/main/BigQuery_bquxjob_4268ce04_18b9ad7f5ea.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [88]:
# @title Setup
from google.colab import auth
from google.cloud import bigquery
from google.colab import data_table
import numpy as np
import pandas as pd

# Authenticate first so that the BigQuery client is created with the user's credentials
auth.authenticate_user()

project = 'efood2023-404109' # Project ID inserted based on the query results selected to explore
location = 'EU' # Location inserted based on the query results selected to explore
client = bigquery.Client(project=project, location=location)
# Render DataFrames as interactive tables in Colab
data_table.enable_dataframe_formatter()

## Reference SQL syntax from the original job
Fetch the original job with ```client.get_job()``` and print its ```query```
property to return the SQL text of the job (see the BigQuery jobs
[REST reference](https://cloud.google.com/bigquery/docs/reference/rest/v2/jobs/query)).
The query can be copied from the output cell below to edit it now or in the
future. Alternatively, use
[this link](https://console.cloud.google.com/bigquery?j=efood2023-404109:EU:bquxjob_4268ce04_18b9ad7f5ea)
to go back to BigQuery and edit the query in the BigQuery user interface.

In [None]:
# Running this code will display the query used to generate your previous job

job = client.get_job('bquxjob_4268ce04_18b9ad7f5ea') # Job ID inserted based on the query results selected to explore
print(job.query)

# Result set loaded from BigQuery job as a DataFrame
Query results are read from the completed BigQuery job by its job ID, so the
query does not need to be re-run to explore them. The ```to_dataframe```
[method](https://googleapis.dev/python/bigquery/latest/generated/google.cloud.bigquery.job.QueryJob.html#google.cloud.bigquery.job.QueryJob.to_dataframe)
downloads the results to a pandas DataFrame, using the BigQuery Storage API
when it is available.

To edit the query, use the BigQuery SQL editor or the ```Optional:``` sections
below.

In [75]:
# Running this code will read results from your previous job

job = client.get_job('bquxjob_4268ce04_18b9ad7f5ea') # Job ID inserted based on the query results selected to explore
results_init = job.to_dataframe()
results_init.head(10)

| | order_id | user_id | user_class_name | order_timestamp | city | vertical | cuisine | device | paid_cash | order_contains_offer | coupon_discount_amount | amount | delivery_cost |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 11624190919400 | 555845617152 | Loyal | 2023-09-25 10:57:40+00:00 | Άρτα | Restaurant | Breakfast | iOS | False | False | 0.0 | 2.0 | 0.0 |
| 1 | 11609038288316 | 555845617152 | Loyal | 2023-09-23 11:52:50+00:00 | Άρτα | Restaurant | Breakfast | iOS | False | False | 0.0 | 2.0 | 0.0 |
| 2 | 11572921541732 | 555845617152 | Loyal | 2023-09-18 12:13:43+00:00 | Άρτα | Restaurant | Breakfast | iOS | False | False | 0.0 | 2.0 | 0.0 |
| 3 | 11586173047628 | 555845617152 | Loyal | 2023-09-20 10:40:07+00:00 | Άρτα | Restaurant | Breakfast | iOS | False | False | 0.0 | 2.0 | 0.0 |
| 4 | 11557231192192 | 555845617152 | Loyal | 2023-09-16 11:43:53+00:00 | Άρτα | Restaurant | Breakfast | iOS | False | False | 0.0 | 4.0 | 0.0 |
| 5 | 11478121003028 | 555845617152 | Loyal | 2023-09-05 09:15:47+00:00 | Άρτα | Restaurant | Breakfast | iOS | False | False | 0.0 | 6.0 | 0.0 |
| 6 | 11587770326056 | 555845617152 | Loyal | 2023-09-20 14:26:55+00:00 | Άρτα | Restaurant | Breakfast | iOS | False | False | 0.0 | 9.2 | 0.0 |
| 7 | 11593774169068 | 555845617152 | Loyal | 2023-09-21 11:40:49+00:00 | Άρτα | Restaurant | Breakfast | iOS | False | False | 0.0 | 2.0 | 0.0 |
| 8 | 11566835488216 | 555845617152 | Loyal | 2023-09-17 15:31:04+00:00 | Άρτα | Restaurant | Breakfast | iOS | False | False | 0.0 | 7.6 | 0.0 |
| 9 | 11631037259852 | 555845617152 | Loyal | 2023-09-26 10:45:20+00:00 | Άρτα | Restaurant | Breakfast | iOS | False | False | 0.0 | 2.0 | 0.0 |


## Show descriptive statistics using describe()
Use the ```pandas DataFrame.describe()```
[method](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.describe.html)
to generate descriptive statistics. For numeric columns these are the count,
mean, standard deviation, minimum, 25th/50th/75th percentiles and maximum,
which summarize the central tendency, dispersion and shape of each column's
distribution while excluding ```NaN``` values. You may also use other Python
methods to explore the data.
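A minimal sketch of `describe()` on a small frame that reuses three numeric column names from the result set (`amount`, `coupon_discount_amount`, `delivery_cost`). The values are made up for illustration and are not taken from the job results; the boolean `coupon_discount` column mirrors the flag created in the next cell.

```python
import pandas as pd

# Toy rows using column names from the query result; the values are invented
orders = pd.DataFrame({
    "amount": [2.0, 2.0, 4.0, 6.0, 9.2],
    "coupon_discount_amount": [0.0, 0.0, 0.0, 1.5, 0.0],
    "delivery_cost": [0.0, 0.0, 0.0, 0.0, 0.0],
})

# count, mean, std, min, 25%, 50%, 75% and max for every numeric column
stats = orders.describe()
print(stats)

# include="all" also summarises non-numeric columns (adds unique, top and freq rows)
orders["coupon_discount"] = orders["coupon_discount_amount"] > 0
full = orders.describe(include="all")
print(full)
```

By default only numeric columns are summarised; passing `include="all"` is a quick way to check boolean flags such as the coupon indicator in the same table.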

In [86]:
# Keep only the users who have used a coupon at least once
results = results_init  # alias: the columns added below also appear in `results`
# Day of the year of each order, used later as the daily time index
results_init['order_timestamp_day'] = results_init['order_timestamp'].dt.day_of_year
# Flag orders that received a coupon discount
results_init['coupon_discount'] = results_init['coupon_discount_amount'] > 0
users_with_coupon = list(results_init.loc[results_init['coupon_discount'], "user_id"].unique())
# Select all orders (with or without a coupon) placed by those users
results = results.set_index('user_id')
target_coupon_results = results.loc[users_with_coupon, :]
target_coupon_results = target_coupon_results.reset_index()
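The same selection can be expressed with `Series.isin`, which keeps every order from any user with at least one discounted order without moving `user_id` into the index. A minimal sketch on hypothetical toy data (the user IDs and amounts are invented):

```python
import pandas as pd

# Hypothetical orders: users 1 and 3 used a coupon at least once, user 2 never did
orders = pd.DataFrame({
    "user_id": [1, 1, 2, 2, 3],
    "coupon_discount_amount": [0.0, 1.5, 0.0, 0.0, 2.0],
    "amount": [5.0, 7.0, 3.0, 4.0, 8.0],
})

# Users with at least one order that carried a coupon discount
users_with_coupon = orders.loc[orders["coupon_discount_amount"] > 0, "user_id"].unique()

# All orders (discounted or not) placed by those users, in their original row order
target = orders[orders["user_id"].isin(users_with_coupon)].reset_index(drop=True)
print(target)
```

Unlike the `set_index(...).loc[...]` route, this keeps the original row order and column layout, which can make later joins on `user_id` simpler.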

In [95]:
# Daily aggregates per user: number of orders, median and total amount, total coupon discount
coupon_effect = pd.pivot_table(target_coupon_results,
                               index=["user_id", "order_timestamp_day"],
                               values=["order_id", "amount", "coupon_discount_amount"],
                               aggfunc={"order_id": "count", "amount": ["median", "sum"], "coupon_discount_amount": "sum"})
# Flatten the (column, function) pairs into names such as "amount_median"
coupon_effect.columns = ['_'.join(col) for col in coupon_effect.columns]
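A small sketch of the same per-user, per-day aggregation on invented data. It passes the string names `"median"`, `"sum"` and `"count"` rather than NumPy callables (recent pandas releases emit a FutureWarning for callables such as `np.sum` here) and builds the flat column names from the `(column, function)` pairs instead of relying on their position:

```python
import pandas as pd

# Invented orders for two users over two days of the year
orders = pd.DataFrame({
    "user_id": [1, 1, 1, 2],
    "order_timestamp_day": [260, 260, 261, 260],
    "order_id": [101, 102, 103, 201],
    "amount": [2.0, 6.0, 4.0, 9.2],
    "coupon_discount_amount": [0.0, 1.5, 0.0, 2.0],
})

daily = pd.pivot_table(
    orders,
    index=["user_id", "order_timestamp_day"],
    values=["order_id", "amount", "coupon_discount_amount"],
    aggfunc={"order_id": "count", "amount": ["median", "sum"], "coupon_discount_amount": "sum"},
)

# Because one entry in aggfunc is a list, every column is a (value, function) pair
daily.columns = ["_".join(col) for col in daily.columns]
print(daily)
```

Joining the pairs keeps each name tied to its aggregation even if pandas orders the columns differently.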

In [112]:
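# Build every (user_id, day) pair over the observed day range, including days without orders
first_day = results_init["order_timestamp_day"].min()
last_day = results_init["order_timestamp_day"].max()
df_index_user_id = []
df_index_day = []

for user_id in coupon_effect.index.get_level_values("user_id").unique():
    for day in range(first_day, last_day + 1):
        df_index_user_id.append(user_id)
        df_index_day.append(day)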
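The nested loops build the Cartesian product of users and calendar days. `pandas.MultiIndex.from_product` yields the same pairs in the same order (users outer, days inner) without explicit loops. A sketch with hypothetical user IDs and a hypothetical day range:

```python
import pandas as pd

# Hypothetical inputs: two users and an observed day-of-year range of 260..263
user_ids = [101, 202]
first_day, last_day = 260, 263

# Every (user_id, day) pair: users vary slowest, days fastest, as in the nested loops
grid = pd.MultiIndex.from_product(
    [user_ids, range(first_day, last_day + 1)],
    names=["user_id", "order_timestamp_day"],
)

df_index_user_id = grid.get_level_values("user_id").tolist()
df_index_day = grid.get_level_values("order_timestamp_day").tolist()
print(len(grid))  # 8 pairs (2 users x 4 days)
```

The resulting `grid` can also serve directly as the index of the extended table, which avoids keeping two parallel lists in sync.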


In [113]:
# One row per (user_id, day) pair; the metric columns start out empty (NaN)
extended_coupon_effect = pd.DataFrame(columns=coupon_effect.columns)
extended_coupon_effect["user_id"] = df_index_user_id
extended_coupon_effect["order_timestamp_day"] = df_index_day
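The extended frame starts with empty metric columns. One way to populate such a daily panel, assuming days without orders should count as zero, is to reindex the aggregated table onto the full user-by-day grid. This is a sketch on invented numbers; `fill_value=0` is an assumption about how missing days should be treated, and it is not meaningful for a column such as a median.

```python
import pandas as pd

# Hypothetical aggregated table indexed by (user_id, order_timestamp_day), shaped like coupon_effect
daily = pd.DataFrame(
    {"amount_sum": [8.0, 4.0], "order_id_count": [2, 1]},
    index=pd.MultiIndex.from_tuples([(1, 260), (1, 262)], names=["user_id", "order_timestamp_day"]),
)

# Full grid of days 260..262 for every user present in the table
grid = pd.MultiIndex.from_product(
    [daily.index.get_level_values("user_id").unique(), range(260, 263)],
    names=daily.index.names,
)

# Days without orders get 0 (assumption: no order means zero spend and zero orders)
extended = daily.reindex(grid, fill_value=0).reset_index()
print(extended)
```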
