<a href="https://colab.research.google.com/github/ADionysopoulos/efood_assesment/blob/main/BigQuery_bquxjob_4268ce04_18b9ad7f5ea.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [232]:
# @title Setup
from google.colab import auth
from google.cloud import bigquery
from google.colab import data_table
import numpy as np
import pandas as pd
from itertools import product

project = 'efood2023-404109' # Project ID inserted based on the query results selected to explore
location = 'EU' # Location inserted based on the query results selected to explore
client = bigquery.Client(project=project, location=location)
data_table.enable_dataframe_formatter()
auth.authenticate_user()

## Reference SQL syntax from the original job
Use the ```jobs.query```
[method](https://cloud.google.com/bigquery/docs/reference/rest/v2/jobs/query) to
return the SQL syntax from the job. This can be copied from the output cell
below to edit the query now or in the future. Alternatively, you can use
[this link](https://console.cloud.google.com/bigquery?j=efood2023-404109:EU:bquxjob_4268ce04_18b9ad7f5ea)
back to BigQuery to edit the query within the BigQuery user interface.

In [233]:
# Running this code will display the query used to generate your previous job

job = client.get_job('bquxjob_1b334270_18ba011cc46') # Job ID inserted based on the query results selected to explore
print(job.query)


-- SELECT 
--   user_id AS `Users`,
--   count(order_id) As count_order_id,
--   sum(amount) As sum_spending_amount,
--   sum(amount) / count(order_id) As mean_spending_amount,
--   count(NULLIF(coupon_discount_amount, 0)) As count_coupon_discount_amount,
--   sum(coupon_discount_amount) As sum_coupon_discount_amount,
--   CASE WHEN
--    count(NULLIF(coupon_discount_amount, 0)) <> 0 
--       THEN sum(coupon_discount_amount) / count(NULLIF(coupon_discount_amount, 0))
--     ELSE NULL
--   END AS mean_spending_amount
--   FROM `efood2023-404109.main_assessment.orders` 
--   group by user_id
--   order by count_order_id DESC

SELECT * FROM `efood2023-404109.main_assessment.orders`
  WHERE cuisine = "Breakfast"


# Result set loaded from BigQuery job as a DataFrame
Query results are retrieved by the ID of the job that ran in BigQuery, so the
query does not need to be re-run to explore them. The ```to_dataframe```
[method](https://googleapis.dev/python/bigquery/latest/generated/google.cloud.bigquery.job.QueryJob.html#google.cloud.bigquery.job.QueryJob.to_dataframe)
downloads the results to a pandas DataFrame by using the BigQuery Storage API.

To edit the query syntax, use the BigQuery SQL editor or the
```Optional:``` sections below.

## Show descriptive statistics using describe()
Use the ```pandas DataFrame.describe()```
[method](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.describe.html)
to generate descriptive statistics. Descriptive statistics include those that
summarize the central tendency, dispersion and shape of a dataset’s
distribution, excluding ```NaN``` values. You may also use other Python methods
to interact with your data.
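
As a minimal sketch of the ```NaN``` handling described above, the toy frame below (hypothetical values, not taken from the job results) has one missing amount. The names ```toy``` and ```summary``` are illustrative only.

```python
import numpy as np
import pandas as pd

# Toy order amounts; one value is missing on purpose (hypothetical data).
toy = pd.DataFrame({"amount": [4.0, 6.0, np.nan, 10.0]})

summary = toy.describe()
print(summary)

# count and mean are computed over the three non-missing rows only.
count = summary.loc["count", "amount"]
mean = summary.loc["mean", "amount"]
```

Here ```count``` reflects the non-missing rows, which is why the row counts in a ```describe()``` table can be smaller than the length of the frame.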

In [251]:
# Running this code will read results from your previous job

job = client.get_job('bquxjob_1b334270_18ba011cc46') # Job ID inserted based on the query results selected to explore
results_init = job.to_dataframe()

In [252]:
# Keep only the users who have used a coupon in the past
results = results_init
results_init['order_timestamp_day'] = results_init['order_timestamp'].dt.day_of_year
results_init['coupon_discount'] = results_init['coupon_discount_amount'] > 0
users_with_coupon = list(results_init.loc[results_init['coupon_discount'].values, "user_id"].unique())
results = results.set_index('user_id')
target_coupon_results = results.loc[users_with_coupon,:]
target_coupon_results = target_coupon_results.reset_index()

In [255]:
coupon_effect = pd.pivot_table(target_coupon_results,
                               index = ["user_id", "order_timestamp_day"],
                               values = ["order_id", "amount", "coupon_discount_amount"],
                               aggfunc= {"order_id": "count", "amount": [np.median, np.sum], "coupon_discount_amount": np.sum})
coupon_effect.columns = ['amount_median', 'amount_sum', 'coupon_discount_amount_sum', 'order_id_count']
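
The cell above renames the pivot columns by position, which depends on the order pandas returns them in. A small sketch on hypothetical toy orders shows the (value, function) pairs that ```pivot_table``` produces for a dictionary ```aggfunc```, and one possible way to derive the flat names from those pairs rather than hard-coding them. The frame ```toy_orders``` and its values are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical toy orders: user 1 orders twice on day 10, user 2 once on day 11.
toy_orders = pd.DataFrame({
    "user_id": [1, 1, 2],
    "order_timestamp_day": [10, 10, 11],
    "order_id": [100, 101, 102],
    "amount": [5.0, 7.0, 9.0],
    "coupon_discount_amount": [0.0, 2.0, 0.0],
})

toy_pivot = pd.pivot_table(
    toy_orders,
    index=["user_id", "order_timestamp_day"],
    values=["order_id", "amount", "coupon_discount_amount"],
    aggfunc={
        "order_id": ["count"],
        "amount": ["median", "sum"],
        "coupon_discount_amount": ["sum"],
    },
)

# Columns are (value, function) pairs; join each pair into a flat name.
toy_pivot.columns = ["_".join(pair) for pair in toy_pivot.columns]
print(toy_pivot)
```

Deriving the names this way keeps each label attached to the aggregate it describes even if the column order changes.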

In [256]:
user_ids = coupon_effect.reset_index()['user_id'].unique()
days_range = range(results_init['order_timestamp_day'].unique().min(), results_init['order_timestamp_day'].unique().max() + 1)

df_index_user_id = [user_id for user_id in user_ids for _ in days_range]
df_index_day     = [day for _ in user_ids for day in days_range]

In [257]:
extended_coupon_effect = pd.DataFrame(columns = coupon_effect.columns)
extended_coupon_effect["user_id"] = df_index_user_id
extended_coupon_effect["order_timestamp_day"] = df_index_day
extended_coupon_effect = extended_coupon_effect.set_index(["user_id", "order_timestamp_day"])

In [258]:
extended_coupon_effect.loc[coupon_effect.index, ['amount_median', 'amount_sum', 'coupon_discount_amount_sum', 'order_id_count']] = coupon_effect[['amount_median', 'amount_sum', 'coupon_discount_amount_sum', 'order_id_count']]
extended_coupon_effect = extended_coupon_effect.fillna(0)
extended_coupon_effect["coupon event"] = extended_coupon_effect['coupon_discount_amount_sum'] > 0
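
The three cells above build a full user-by-day grid with list comprehensions and then copy the observed aggregates into it. As an alternative sketch (not the approach used in this notebook), the same dense grid can be obtained with ```pd.MultiIndex.from_product``` and ```reindex```. The frame ```sparse``` and its values are hypothetical.

```python
import pandas as pd

# Hypothetical sparse per-(user, day) aggregates, shaped like coupon_effect.
sparse = pd.DataFrame(
    {"amount_sum": [12.0, 9.0], "coupon_discount_amount_sum": [2.0, 0.0]},
    index=pd.MultiIndex.from_tuples(
        [(1, 10), (2, 12)], names=["user_id", "order_timestamp_day"]
    ),
)

# Every user paired with every day in the observed range (10..12 here).
full_index = pd.MultiIndex.from_product(
    [[1, 2], range(10, 13)], names=["user_id", "order_timestamp_day"]
)

# Days without orders are filled with 0, mirroring fillna(0) in the notebook.
dense = sparse.reindex(full_index, fill_value=0)
dense["coupon event"] = dense["coupon_discount_amount_sum"] > 0
print(dense)
```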

In [259]:
A = extended_coupon_effect.reset_index()
events = A.loc[A["coupon event"], ["user_id", "order_timestamp_day","coupon event"]]

coupon_case_bef_df = pd.DataFrame(columns = extended_coupon_effect.columns)
coupon_case_after_df = pd.DataFrame(columns = extended_coupon_effect.columns)

coupon_case = 0
coupon_window = 3

extended_coupon_effect["Aggr_coupon_team_bef"] = np.nan
extended_coupon_effect["Aggr_coupon_team_aft"] = np.nan

for index, row in events.iterrows():

  user_id = row["user_id"]
  coupon_day = row["order_timestamp_day"]

  # Keep only events whose full before/after windows fall inside the observed day range
  if (coupon_day - coupon_window in days_range) and (coupon_day + coupon_window in days_range):

    index_prev= []

    for k in range(coupon_day - coupon_window, coupon_day):
      index_prev.append([user_id, k])

    index_after= []

    for k in range(coupon_day + 1, coupon_day + coupon_window + 1):
      index_after.append([user_id, k])

    if (extended_coupon_effect.loc[index_prev, "Aggr_coupon_team_bef"].fillna(0).sum() == 0) & (extended_coupon_effect.loc[index_after, "Aggr_coupon_team_aft"].fillna(0).sum() == 0):
      extended_coupon_effect.loc[index_prev, "Aggr_coupon_team_bef"] = coupon_case
      extended_coupon_effect.loc[index_after, "Aggr_coupon_team_aft"] = coupon_case

    coupon_case +=1
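
The window selection inside the loop above can be isolated as a small helper. This is a sketch under the same rule the loop applies: an event is used only when both the ```coupon_window``` days before it and the ```coupon_window``` days after it lie inside the observed day range. The function name ```window_days``` and its parameters are hypothetical.

```python
def window_days(coupon_day, window, first_day, last_day):
    """Return (before, after) day lists, or None if a window leaves the data range."""
    if coupon_day - window < first_day or coupon_day + window > last_day:
        return None
    before = list(range(coupon_day - window, coupon_day))
    after = list(range(coupon_day + 1, coupon_day + window + 1))
    return before, after


# A coupon used on day 10 with a 3-day window, in data covering days 1..30.
print(window_days(10, 3, 1, 30))
# A coupon used on day 2 has no complete "before" window, so it is skipped.
print(window_days(2, 3, 1, 30))
```

The coupon day itself is excluded from both windows, matching the ranges used for ```index_prev``` and ```index_after```.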



In [260]:
A1 = pd.pivot_table(extended_coupon_effect, index = "Aggr_coupon_team_bef", values= ['amount_median', "order_id_count"], aggfunc = {"amount_median":np.mean, "order_id_count": "sum"})
A2 = pd.pivot_table(extended_coupon_effect, index = "Aggr_coupon_team_aft", values= ['amount_median', "order_id_count"], aggfunc = {"amount_median":np.mean, "order_id_count": "sum"})

In [261]:
A1.columns
diffs = A2 - A1
diffs.describe()

       amount_median  order_id_count
count         6371.0          6371.0
mean       -0.020212       -0.048815
std         2.318656        1.545227
min       -16.366667            -8.0
25%        -1.066667            -1.0
50%              0.0             0.0
75%         0.966667             1.0
max        21.333333            15.0
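
The subtraction ```A2 - A1``` relies on pandas aligning rows by index label, here the coupon case number, rather than by position. A minimal sketch with hypothetical frames ```before``` and ```after``` illustrates that behaviour: a case present in only one frame yields ```NaN``` and is then left out of the ```describe()``` count.

```python
import numpy as np
import pandas as pd

# Hypothetical per-case order counts; case 2 exists only in the "after" frame.
before = pd.DataFrame(
    {"order_id_count": [2, 1]},
    index=pd.Index([0, 1], name="Aggr_coupon_team_bef"),
)
after = pd.DataFrame(
    {"order_id_count": [3, 1, 4]},
    index=pd.Index([0, 1, 2], name="Aggr_coupon_team_aft"),
)

# Rows are matched on the case label, not on row position.
toy_diffs = after - before
print(toy_diffs)

# Only the two cases present in both frames are counted.
matched_cases = toy_diffs.describe().loc["count", "order_id_count"]
```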


In [243]:
# Running this code will read results from your previous job

job = client.get_job('bquxjob_1b334270_18ba011cc46') # Job ID inserted based on the query results selected to explore
results_init = job.to_dataframe()

for i in results_init['user_class_name'].unique():

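  # Reload the full result set and keep only this user class; .copy() gives an
  # independent frame so the column assignments below do not warn.
  results_init = job.to_dataframe()
  results_init = results_init[results_init['user_class_name'] == i].copy()

  results = results_init
  # Keep only the users who have used a coupon in the past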
  results_init['order_timestamp_day'] = results_init['order_timestamp'].dt.day_of_year
  results_init['coupon_discount'] = results_init['coupon_discount_amount'] > 0
  users_with_coupon = list(results_init.loc[results_init['coupon_discount'].values, "user_id"].unique())
  results = results.set_index('user_id')
  target_coupon_results = results.loc[users_with_coupon,:]
  target_coupon_results = target_coupon_results.reset_index()

  coupon_effect = pd.pivot_table(target_coupon_results,
                                index = ["user_id", "order_timestamp_day"],
                                values = ["order_id", "amount", "coupon_discount_amount"],
                                aggfunc= {"order_id": "count", "amount": [np.median, np.sum],"coupon_discount_amount": np.sum})
  coupon_effect.columns = ['amount_median', 'amount_sum', 'coupon_discount_amount_sum', 'order_id_count']

  user_ids = coupon_effect.reset_index()['user_id'].unique()
  days_range = range(results_init['order_timestamp_day'].unique().min(), results_init['order_timestamp_day'].unique().max() + 1)

  df_index_user_id = [user_id for user_id in user_ids for _ in days_range]
  df_index_day = [day for _ in user_ids for day in days_range]

  extended_coupon_effect =pd.DataFrame(columns = coupon_effect.columns)
  extended_coupon_effect["user_id"] = df_index_user_id
  extended_coupon_effect["order_timestamp_day"] = df_index_day
  extended_coupon_effect = extended_coupon_effect.set_index(["user_id", "order_timestamp_day"])

  extended_coupon_effect.loc[coupon_effect.index, ['amount_median', 'amount_sum', 'coupon_discount_amount_sum', 'order_id_count']] = coupon_effect[['amount_median', 'amount_sum', 'coupon_discount_amount_sum', 'order_id_count']]
  extended_coupon_effect = extended_coupon_effect.fillna(0)
  extended_coupon_effect["coupon event"] = extended_coupon_effect['coupon_discount_amount_sum'] > 0

  A = extended_coupon_effect.reset_index()
  events = A.loc[A["coupon event"], ["user_id", "order_timestamp_day","coupon event"]]

  coupon_case_bef_df = pd.DataFrame(columns = extended_coupon_effect.columns)
  coupon_case_after_df = pd.DataFrame(columns = extended_coupon_effect.columns)

  coupon_case = 0
  coupon_window = 3

  extended_coupon_effect["Aggr_coupon_team_bef"] = np.nan
  extended_coupon_effect["Aggr_coupon_team_aft"] = np.nan

  for index, row in events.iterrows():

    user_id = row["user_id"]
    coupon_day = row["order_timestamp_day"]

    # Keep only events whose full before/after windows fall inside the observed day range
    if (coupon_day - coupon_window in days_range) and (coupon_day + coupon_window in days_range):

      index_prev= []

      for k in range(coupon_day - coupon_window, coupon_day):
        index_prev.append([user_id, k])

      index_after= []

      for k in range(coupon_day + 1, coupon_day + coupon_window + 1):
        index_after.append([user_id, k])

      if (extended_coupon_effect.loc[index_prev, "Aggr_coupon_team_bef"].fillna(0).sum() == 0) & (extended_coupon_effect.loc[index_after, "Aggr_coupon_team_aft"].fillna(0).sum() == 0):
        extended_coupon_effect.loc[index_prev, "Aggr_coupon_team_bef"] = coupon_case
        extended_coupon_effect.loc[index_after, "Aggr_coupon_team_aft"] = coupon_case

      coupon_case +=1

  A1 = pd.pivot_table(extended_coupon_effect, index = "Aggr_coupon_team_bef", values= ['amount_median', "order_id_count"], aggfunc = {"amount_median":np.mean, "order_id_count": "sum"})
  A2 = pd.pivot_table(extended_coupon_effect, index = "Aggr_coupon_team_aft", values= ['amount_median', "order_id_count"], aggfunc = {"amount_median":np.mean, "order_id_count": "sum"})

  A1.columns
  diffs = A2 - A1
  print(i)
  print(diffs.describe())
  print("----------------------------------------------------------------")


Loyal
       amount_median  order_id_count
count    1915.000000     1915.000000
mean       -0.037681       -0.084595
std         1.678881        1.450740
min        -7.566667       -6.000000
25%        -0.983333       -1.000000
50%         0.000000        0.000000
75%         0.833333        1.000000
max         8.766667        6.000000
----------------------------------------------------------------
All Star
       amount_median  order_id_count
count    3308.000000     3308.000000
mean       -0.085144       -0.071644
std         2.816450        1.815501
min       -16.366667       -8.000000
25%        -1.533333       -1.000000
50%         0.000000        0.000000
75%         1.366667        1.000000
max        21.333333       15.000000
----------------------------------------------------------------
Infrequent
       amount_median  order_id_count
count     266.000000      266.000000
mean        0.265602        0.112782
std         1.213527        0.462220
min        -4.500000       -1.

In [244]:
# Running this code will read results from your previous job

job = client.get_job('bquxjob_1b334270_18ba011cc46') # Job ID inserted based on the query results selected to explore
results_init = job.to_dataframe()

for i in results_init['city'].unique():

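  # Reload the full result set and keep only this city; .copy() gives an
  # independent frame so the column assignments below do not warn.
  results_init = job.to_dataframe()
  results_init = results_init[results_init['city'] == i].copy()

  results = results_init
  # Keep only the users who have used a coupon in the past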
  results_init['order_timestamp_day'] = results_init['order_timestamp'].dt.day_of_year
  results_init['coupon_discount'] = results_init['coupon_discount_amount'] > 0
  users_with_coupon = list(results_init.loc[results_init['coupon_discount'].values, "user_id"].unique())
  results = results.set_index('user_id')
  target_coupon_results = results.loc[users_with_coupon,:]
  target_coupon_results = target_coupon_results.reset_index()

  coupon_effect = pd.pivot_table(target_coupon_results,
                                index = ["user_id", "order_timestamp_day"],
                                values = ["order_id", "amount", "coupon_discount_amount"],
                                aggfunc= {"order_id": "count", "amount": [np.median, np.sum],"coupon_discount_amount": np.sum})
  coupon_effect.columns = ['amount_median', 'amount_sum', 'coupon_discount_amount_sum', 'order_id_count']

  user_ids = coupon_effect.reset_index()['user_id'].unique()
  days_range = range(results_init['order_timestamp_day'].unique().min(), results_init['order_timestamp_day'].unique().max() + 1)

  df_index_user_id = [user_id for user_id in user_ids for _ in days_range]
  df_index_day = [day for _ in user_ids for day in days_range]

  extended_coupon_effect =pd.DataFrame(columns = coupon_effect.columns)
  extended_coupon_effect["user_id"] = df_index_user_id
  extended_coupon_effect["order_timestamp_day"] = df_index_day
  extended_coupon_effect = extended_coupon_effect.set_index(["user_id", "order_timestamp_day"])

  extended_coupon_effect.loc[coupon_effect.index, ['amount_median', 'amount_sum', 'coupon_discount_amount_sum', 'order_id_count']] = coupon_effect[['amount_median', 'amount_sum', 'coupon_discount_amount_sum', 'order_id_count']]
  extended_coupon_effect = extended_coupon_effect.fillna(0)
  extended_coupon_effect["coupon event"] = extended_coupon_effect['coupon_discount_amount_sum'] > 0

  A = extended_coupon_effect.reset_index()
  events = A.loc[A["coupon event"], ["user_id", "order_timestamp_day","coupon event"]]

  coupon_case_bef_df = pd.DataFrame(columns = extended_coupon_effect.columns)
  coupon_case_after_df = pd.DataFrame(columns = extended_coupon_effect.columns)

  coupon_case = 0
  coupon_window = 3

  extended_coupon_effect["Aggr_coupon_team_bef"] = np.nan
  extended_coupon_effect["Aggr_coupon_team_aft"] = np.nan

  for index, row in events.iterrows():

    user_id = row["user_id"]
    coupon_day = row["order_timestamp_day"]

    # Keep only events whose full before/after windows fall inside the observed day range
    if (coupon_day - coupon_window in days_range) and (coupon_day + coupon_window in days_range):

      index_prev= []

      for k in range(coupon_day - coupon_window, coupon_day):
        index_prev.append([user_id, k])

      index_after= []

      for k in range(coupon_day + 1, coupon_day + coupon_window + 1):
        index_after.append([user_id, k])

      if (extended_coupon_effect.loc[index_prev, "Aggr_coupon_team_bef"].fillna(0).sum() == 0) & (extended_coupon_effect.loc[index_after, "Aggr_coupon_team_aft"].fillna(0).sum() == 0):
        extended_coupon_effect.loc[index_prev, "Aggr_coupon_team_bef"] = coupon_case
        extended_coupon_effect.loc[index_after, "Aggr_coupon_team_aft"] = coupon_case

      coupon_case +=1

  A1 = pd.pivot_table(extended_coupon_effect, index = "Aggr_coupon_team_bef", values= ['amount_median', "order_id_count"], aggfunc = {"amount_median":np.mean, "order_id_count": "sum"})
  A2 = pd.pivot_table(extended_coupon_effect, index = "Aggr_coupon_team_aft", values= ['amount_median', "order_id_count"], aggfunc = {"amount_median":np.mean, "order_id_count": "sum"})

  A1.columns
  diffs = A2 - A1
  print(i)
  print(diffs.describe())
  print("----------------------------------------------------------------")

Άρτα
       amount_median  order_id_count
count     301.000000      301.000000
mean        0.037680       -0.029900
std         1.685174        1.740432
min        -6.633333       -7.000000
25%        -0.833333       -1.000000
50%         0.000000        0.000000
75%         0.800000        1.000000
max         5.783333        7.000000
----------------------------------------------------------------
Αίγιο
       amount_median  order_id_count
count     183.000000      183.000000
mean       -0.137022        0.065574
std         2.172164        1.503134
min       -16.366667       -5.000000
25%        -0.616667       -1.000000
50%         0.000000        0.000000
75%         0.650000        1.000000
max         6.400000        5.000000
----------------------------------------------------------------
Δράμα
       amount_median  order_id_count
count     693.000000      693.000000
mean       -0.094488       -0.057720
std         2.086982        1.657742
min        -9.900000       -7.000000
25



A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy






Ξάνθη
       amount_median  order_id_count
count     979.000000      979.000000
mean       -0.054826       -0.054137
std         1.982475        1.642399
min        -8.383333       -8.000000
25%        -1.058333       -1.000000
50%         0.000000        0.000000
75%         0.866667        1.000000
max         9.066667        9.000000
----------------------------------------------------------------







Ρόδος
       amount_median  order_id_count
count    1623.000000     1623.000000
mean        0.127326       -0.007394
std         2.903877        1.427213
min       -13.166667       -5.000000
25%        -1.333333       -1.000000
50%         0.000000        0.000000
75%         1.500000        1.000000
max        21.333333        7.000000
----------------------------------------------------------------
Ραφήνα
       amount_median  order_id_count
count      42.000000       42.000000
mean        0.027778        0.000000
std         1.537374        1.577278
min        -3.300000       -3.000000
25%        -0.520833       -1.000000
50%         0.000000        0.000000
75%         0.000000        0.000000
max         5.033333        6.000000
----------------------------------------------------------------
Αγρίνιο
       amount_median  order_id_count
count     815.000000      815.000000
mean       -0.030378       -0.053988
std         1.573450        1.732627
min        -7.733333       -7.00000

In [245]:
# Running this code will read results from your previous job

job = client.get_job('bquxjob_1b334270_18ba011cc46') # Job ID inserted based on the query results selected to explore
results_init = job.to_dataframe()

for i in results_init['device'].unique():

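  # Reload the full result set and keep only this device; .copy() gives an
  # independent frame so the column assignments below do not warn.
  results_init = job.to_dataframe()
  results_init = results_init[results_init['device'] == i].copy()

  results = results_init
  # Keep only the users who have used a coupon in the past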
  results_init['order_timestamp_day'] = results_init['order_timestamp'].dt.day_of_year
  results_init['coupon_discount'] = results_init['coupon_discount_amount'] > 0
  users_with_coupon = list(results_init.loc[results_init['coupon_discount'].values, "user_id"].unique())
  results = results.set_index('user_id')
  target_coupon_results = results.loc[users_with_coupon,:]
  target_coupon_results = target_coupon_results.reset_index()

  coupon_effect = pd.pivot_table(target_coupon_results,
                                index = ["user_id", "order_timestamp_day"],
                                values = ["order_id", "amount", "coupon_discount_amount"],
                                aggfunc= {"order_id": "count", "amount": [np.median, np.sum],"coupon_discount_amount": np.sum})
  coupon_effect.columns = ['amount_median', 'amount_sum', 'coupon_discount_amount_sum', 'order_id_count']

  user_ids = coupon_effect.reset_index()['user_id'].unique()
  days_range = range(results_init['order_timestamp_day'].unique().min(), results_init['order_timestamp_day'].unique().max() + 1)

  df_index_user_id = [user_id for user_id in user_ids for _ in days_range]
  df_index_day = [day for _ in user_ids for day in days_range]

  extended_coupon_effect =pd.DataFrame(columns = coupon_effect.columns)
  extended_coupon_effect["user_id"] = df_index_user_id
  extended_coupon_effect["order_timestamp_day"] = df_index_day
  extended_coupon_effect = extended_coupon_effect.set_index(["user_id", "order_timestamp_day"])

  extended_coupon_effect.loc[coupon_effect.index, ['amount_median', 'amount_sum', 'coupon_discount_amount_sum', 'order_id_count']] = coupon_effect[['amount_median', 'amount_sum', 'coupon_discount_amount_sum', 'order_id_count']]
  extended_coupon_effect = extended_coupon_effect.fillna(0)
  extended_coupon_effect["coupon event"] = extended_coupon_effect['coupon_discount_amount_sum'] > 0

  A = extended_coupon_effect.reset_index()
  events = A.loc[A["coupon event"], ["user_id", "order_timestamp_day","coupon event"]]

  coupon_case_bef_df = pd.DataFrame(columns = extended_coupon_effect.columns)
  coupon_case_after_df = pd.DataFrame(columns = extended_coupon_effect.columns)

  coupon_case = 0
  coupon_window = 3

  extended_coupon_effect["Aggr_coupon_team_bef"] = np.nan
  extended_coupon_effect["Aggr_coupon_team_aft"] = np.nan

  for index, row in events.iterrows():

    user_id = row["user_id"]
    coupon_day = row["order_timestamp_day"]

    # Keep only events whose full before/after windows fall inside the observed day range
    if (coupon_day - coupon_window in days_range) and (coupon_day + coupon_window in days_range):

      index_prev= []

      for k in range(coupon_day - coupon_window, coupon_day):
        index_prev.append([user_id, k])

      index_after= []

      for k in range(coupon_day + 1, coupon_day + coupon_window + 1):
        index_after.append([user_id, k])

      if (extended_coupon_effect.loc[index_prev, "Aggr_coupon_team_bef"].fillna(0).sum() == 0) & (extended_coupon_effect.loc[index_after, "Aggr_coupon_team_aft"].fillna(0).sum() == 0):
        extended_coupon_effect.loc[index_prev, "Aggr_coupon_team_bef"] = coupon_case
        extended_coupon_effect.loc[index_after, "Aggr_coupon_team_aft"] = coupon_case

      coupon_case +=1

  A1 = pd.pivot_table(extended_coupon_effect, index = "Aggr_coupon_team_bef", values= ['amount_median', "order_id_count"], aggfunc = {"amount_median":np.mean, "order_id_count": "sum"})
  A2 = pd.pivot_table(extended_coupon_effect, index = "Aggr_coupon_team_aft", values= ['amount_median', "order_id_count"], aggfunc = {"amount_median":np.mean, "order_id_count": "sum"})

  A1.columns
  diffs = A2 - A1
  print(i)
  print(diffs.describe())
  print("----------------------------------------------------------------")

iOS
       amount_median  order_id_count
count    2294.000000     2294.000000
mean       -0.051043       -0.050567
std         2.562260        1.576669
min       -16.033333       -8.000000
25%        -1.266667       -1.000000
50%         0.000000        0.000000
75%         1.095833        1.000000
max        21.333333        8.000000
----------------------------------------------------------------
Android
       amount_median  order_id_count
count    3278.000000     3278.000000
mean        0.021505       -0.017084
std         2.141955        1.480938
min       -16.366667       -7.000000
25%        -0.833333       -1.000000
50%         0.000000        0.000000
75%         0.883333        1.000000
max        13.033333        9.000000
----------------------------------------------------------------
Desktop
       amount_median  order_id_count
count     390.000000      390.000000
mean       -0.030641       -0.071795
std         2.343513        1.474714
min       -11.733333       -4.000000

In [246]:
# Running this code will read results from your previous job

job = client.get_job('bquxjob_1b334270_18ba011cc46') # Job ID inserted based on the query results selected to explore
results_init = job.to_dataframe()

for i in results_init['paid_cash'].unique():

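  # Reload the full result set and keep only this payment type; .copy() gives an
  # independent frame so the column assignments below do not warn.
  results_init = job.to_dataframe()
  results_init = results_init[results_init['paid_cash'] == i].copy()

  results = results_init
  # Keep only the users who have used a coupon in the past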
  results_init['order_timestamp_day'] = results_init['order_timestamp'].dt.day_of_year
  results_init['coupon_discount'] = results_init['coupon_discount_amount'] > 0
  users_with_coupon = list(results_init.loc[results_init['coupon_discount'].values, "user_id"].unique())
  results = results.set_index('user_id')
  target_coupon_results = results.loc[users_with_coupon,:]
  target_coupon_results = target_coupon_results.reset_index()

  coupon_effect = pd.pivot_table(target_coupon_results,
                                index = ["user_id", "order_timestamp_day"],
                                values = ["order_id", "amount", "coupon_discount_amount"],
                                aggfunc= {"order_id": "count", "amount": [np.median, np.sum],"coupon_discount_amount": np.sum})
  coupon_effect.columns = ['amount_median', 'amount_sum', 'coupon_discount_amount_sum', 'order_id_count']

  user_ids = coupon_effect.reset_index()['user_id'].unique()
  days_range = range(results_init['order_timestamp_day'].unique().min(), results_init['order_timestamp_day'].unique().max() + 1)

  df_index_user_id = [user_id for user_id in user_ids for _ in days_range]
  df_index_day = [day for _ in user_ids for day in days_range]

  extended_coupon_effect =pd.DataFrame(columns = coupon_effect.columns)
  extended_coupon_effect["user_id"] = df_index_user_id
  extended_coupon_effect["order_timestamp_day"] = df_index_day
  extended_coupon_effect = extended_coupon_effect.set_index(["user_id", "order_timestamp_day"])

  extended_coupon_effect.loc[coupon_effect.index, ['amount_median', 'amount_sum', 'coupon_discount_amount_sum', 'order_id_count']] = coupon_effect[['amount_median', 'amount_sum', 'coupon_discount_amount_sum', 'order_id_count']]
  extended_coupon_effect = extended_coupon_effect.fillna(0)
  extended_coupon_effect["coupon event"] = extended_coupon_effect['coupon_discount_amount_sum'] > 0

  A = extended_coupon_effect.reset_index()
  events = A.loc[A["coupon event"], ["user_id", "order_timestamp_day","coupon event"]]

  coupon_case_bef_df = pd.DataFrame(columns = extended_coupon_effect.columns)
  coupon_case_after_df = pd.DataFrame(columns = extended_coupon_effect.columns)

  coupon_case = 0
  coupon_window = 3

  extended_coupon_effect["Aggr_coupon_team_bef"] = np.nan
  extended_coupon_effect["Aggr_coupon_team_aft"] = np.nan

  for index, row in events.iterrows():

    user_id = row["user_id"]
    coupon_day = row["order_timestamp_day"]

    # Keep only events whose full before/after windows fall inside the observed day range
    if (coupon_day - coupon_window in days_range) and (coupon_day + coupon_window in days_range):

      index_prev= []

      for k in range(coupon_day - coupon_window, coupon_day):
        index_prev.append([user_id, k])

      index_after= []

      for k in range(coupon_day + 1, coupon_day + coupon_window + 1):
        index_after.append([user_id, k])

      if (extended_coupon_effect.loc[index_prev, "Aggr_coupon_team_bef"].fillna(0).sum() == 0) & (extended_coupon_effect.loc[index_after, "Aggr_coupon_team_aft"].fillna(0).sum() == 0):
        extended_coupon_effect.loc[index_prev, "Aggr_coupon_team_bef"] = coupon_case
        extended_coupon_effect.loc[index_after, "Aggr_coupon_team_aft"] = coupon_case

      coupon_case +=1

  A1 = pd.pivot_table(extended_coupon_effect, index = "Aggr_coupon_team_bef", values= ['amount_median', "order_id_count"], aggfunc = {"amount_median":np.mean, "order_id_count": "sum"})
  A2 = pd.pivot_table(extended_coupon_effect, index = "Aggr_coupon_team_aft", values= ['amount_median', "order_id_count"], aggfunc = {"amount_median":np.mean, "order_id_count": "sum"})

  A1.columns
  diffs = A2 - A1
  print(i)
  print(diffs.describe())
  print("----------------------------------------------------------------")






False
       amount_median  order_id_count
count    4013.000000     4013.000000
mean       -0.005319       -0.032146
std         2.044956        1.330565
min       -16.033333       -6.000000
25%        -0.666667       -1.000000
50%         0.000000        0.000000
75%         0.666667        1.000000
max        21.333333       12.000000
----------------------------------------------------------------
True
       amount_median  order_id_count
count    2408.000000     2408.000000
mean       -0.017762       -0.033223
std         2.461213        1.667817
min       -16.366667       -7.000000
25%        -1.283333       -1.000000
50%         0.000000        0.000000
75%         1.133333        1.000000
max        19.800000        9.000000
----------------------------------------------------------------
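
The four segment cells above (```user_class_name```, ```city```, ```device```, ```paid_cash```) repeat the same pipeline and differ only in the column used to split the data. One possible way to express that pattern once is sketched below, assuming a per-segment analysis function is passed in. The helper names ```summarize_by_segment``` and ```coupon_share```, and the toy data, are hypothetical stand-ins and not the notebook's analysis.

```python
import pandas as pd


def summarize_by_segment(orders, segment_col, analyse):
    """Apply `analyse` to an independent copy of each segment's rows."""
    out = {}
    for value, segment in orders.groupby(segment_col):
        # .copy() lets `analyse` add columns without touching `orders`.
        out[value] = analyse(segment.copy())
    return out


def coupon_share(segment):
    """Stand-in analysis: share of orders in the segment that used a coupon."""
    segment["coupon_discount"] = segment["coupon_discount_amount"] > 0
    return segment["coupon_discount"].mean()


# Hypothetical toy orders.
toy = pd.DataFrame({
    "device": ["iOS", "iOS", "Android"],
    "coupon_discount_amount": [0.0, 1.5, 0.0],
})

shares = summarize_by_segment(toy, "device", coupon_share)
print(shares)
```

Passing the split column as an argument means the result set only needs to be downloaded once, and each segment is analysed on its own copy.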
