### Resources
Email with subject line "Re: Redemption Flow Questions (adding partnerships and jarad)"

### 01 Objective
* What percentage of your users or buyers are US based?
* What other countries account for the top 90% of your sales? (This can be a rough estimate; we're just trying to get a sense of volume and top markets.)

### 02 Objective
* Below, I've listed all of the countries that have the same pricing. Do these countries account for the bulk of your non-US/Canadian users?

Norway, Switzerland, Denmark, Austria, United States, Canada, Germany, Israel, Sweden, Ireland, Australia, Netherlands, Czech Republic, Finland, Japan, Hungary, New Zealand, France, Poland, United Kingdom

### Libraries

In [1]:
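import os
import sys

# Make the private helper library importable
tilde = os.path.expanduser('~')
sys.path.insert(0, os.path.join(tilde, 'Scripts', 'Fake Folder', 'Python Libraries'))

# jb_libraries is a private helper module. Judging by the cells below, it supplies
# pd, np, the database connection `db`, the helpers col_fix, jb_dates and jb_conf,
# and the DataFrame.format_ display method.
from jb_libraries import *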
%matplotlib inline

### Script settings

In [2]:
# Inclusive purchase-date window (13 months)
date_start = '2018-02-01'
date_end = '2019-02-28'
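Later, these bounds are spliced into the orders query as quoted string literals. A minimal sketch of binding them as query parameters instead, using an in-memory SQLite table as a hypothetical stand-in for the real `orders` table (the rows and the `fetch_orders` helper are illustrative assumptions; MySQL drivers such as PyMySQL use `%s` placeholders rather than SQLite's `?`):

```python
import sqlite3

# Hypothetical stand-in for the real orders table; the rows are toy data.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (orders_id INTEGER, date_purchased TEXT, "
    "customers_country TEXT, payment_method TEXT, orders_status INTEGER)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?, ?)",
    [
        (1, "2018-01-31 23:59:00", "United States", "card", 3),  # before the window
        (2, "2018-02-01 08:00:00", "Canada ", "card", 3),         # trailing space
        (3, "2018-06-15 12:00:00", "France", "replacement order", 3),
        (4, "2018-07-01 09:30:00", "Germany", "card", 9),         # fraud - confirmed
        (5, "2019-02-28 22:10:00", "Australia", "card", 2),       # last day, still inside
    ],
)

# Status codes the notebook excludes: fraud - pending, fraud - confirmed, voided, fraud - void
EXCLUDED_STATUSES = (8, 9, 14, 15)


def fetch_orders(conn, date_start, date_end):
    """Orders in [date_start, date_end]; dates and statuses are bound as parameters."""
    placeholders = ", ".join("?" * len(EXCLUDED_STATUSES))  # "?, ?, ?, ?"
    sql = f"""
        SELECT DATE(date_purchased), orders_id, LOWER(TRIM(customers_country))
        FROM orders
        WHERE DATE(date_purchased) BETWEEN ? AND ?
          AND payment_method != 'replacement order'
          AND orders_status NOT IN ({placeholders})
        ORDER BY orders_id
    """
    return conn.execute(sql, (date_start, date_end, *EXCLUDED_STATUSES)).fetchall()


rows = fetch_orders(conn, "2018-02-01", "2019-02-28")
print(rows)  # [('2018-02-01', 2, 'canada'), ('2019-02-28', 5, 'australia')]
```

Because `BETWEEN` is inclusive on `DATE(date_purchased)`, an order placed late on the last day still counts. Trimming in SQL here is a choice of the sketch; the notebook strips whitespace in pandas instead.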

### Get data

In [3]:
# List the order status codes referenced by the orders query below
pd.read_sql(
'''
SELECT
*
FROM orders_status
ORDER BY orders_status_id
''', db)

| orders_status_id | language_id | orders_status_name |
|---:|---:|---|
| 1 | 1 | Pending |
| 2 | 1 | Processing |
| 3 | 1 | Shipped |
| 4 | 1 | Update |
| 5 | 1 | Printed |
| 6 | 1 | Billed |
| 7 | 1 | Payment Received |
| 8 | 1 | Fraud - Pending |
| 9 | 1 | Fraud - Confirmed |
| 10 | 1 | Return |


In [4]:
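# Orders in the window, excluding replacement orders and fraudulent or voided statuses.
# The dates go in without padding spaces inside the quoted literals.
o_main = pd.read_sql(
'''
SELECT
DATE(date_purchased) AS date_purchased,
orders_id,
LOWER(customers_country) AS customers_country
FROM orders
WHERE DATE(date_purchased) BETWEEN '{}' AND '{}'
AND payment_method != 'replacement order'
AND orders_status NOT IN (8,9,14,15) # fraud - pending, fraud - confirmed, voided, fraud - void
'''.format(date_start, date_end), db)

# col_fix (from jb_libraries) appears to rename columns in place, replacing
# underscores with spaces ('date_purchased' -> 'date purchased')
col_fix(o_main)

# Add the period labels used by the monthly breakdown
o_main['date purchased'] = pd.to_datetime(o_main['date purchased'])
for x in ['year and month', 'year and quarter']:
    o_main[x] = jb_dates(o_main['date purchased'], x)

# Country names can carry stray whitespace; strip it so they group correctly
o_main['customers country'] = o_main['customers country'].str.strip()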
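`jb_dates` is a private helper whose output format isn't shown here. As a hypothetical stand-in, pandas periods can produce comparable month and quarter labels; the "2018-02" and "2018Q1" formats below are an assumption, not necessarily what `jb_dates` returns:

```python
import pandas as pd


def period_labels(dates: pd.Series) -> pd.DataFrame:
    """Hypothetical stand-in for jb_dates: calendar month and quarter labels per date."""
    return pd.DataFrame({
        "year and month": dates.dt.to_period("M").astype(str),
        "year and quarter": dates.dt.to_period("Q").astype(str),
    })


# Toy purchase dates (not real data)
dates = pd.Series(pd.to_datetime(["2018-02-01", "2018-03-31", "2019-02-28"]))
labels = period_labels(dates)
print(labels)
```

String labels of this shape sort chronologically, so columns produced by a later `unstack` come out in month order.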

### By country

In [5]:
# Unique orders per country, largest first, with each country's share and the running share
by_country = (o_main.groupby('customers country')[['orders id']]
              .nunique()
              .rename(columns={'orders id': 'unique order count'})
              .sort_values('unique order count', ascending=False))
by_country['% of total'] = by_country['unique order count'] / by_country['unique order count'].sum()
by_country['% running sum'] = by_country['% of total'].cumsum()

# Title-case the lowercased country names for display
by_country.index = [x.title() for x in by_country.index]

print('data is from %s to %s' % (date_start, date_end))

# Show every country up to and including the first whose running share reaches 95.1%
ix = by_country[by_country['% running sum'] >= 0.951].index[0]
by_country.loc[:ix].format_(['n0', 'p1', 'p1'])

data is from 2018-02-01 to 2019-02-28


| Country | Unique Order Count | % of Total | % Running Sum |
|---|---:|---:|---:|
| United States | 280676 | 88.1% | 88.1% |
| Canada | 11000 | 3.5% | 91.6% |
| Australia | 3095 | 1.0% | 92.5% |
| France | 2291 | 0.7% | 93.3% |
| United Kingdom | 1699 | 0.5% | 93.8% |
| Germany | 1687 | 0.5% | 94.3% |
| Switzerland | 1366 | 0.4% | 94.7% |
| Netherlands | 1125 | 0.4% | 95.1% |
| Norway | 1108 | 0.3% | 95.4% |
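Objective 01 asks which countries make up the top 90% of sales. The cell above answers it by sorting countries by order count, taking cumulative shares, and cutting at the first country whose running share reaches a threshold (0.951 there). A plain-Python sketch of that cutoff, on toy counts that are not the figures above:

```python
from itertools import accumulate


def coverage_cutoff(counts: dict[str, int], threshold: float) -> list[tuple[str, float, float]]:
    """Countries, largest first, up to and including the first whose running share reaches threshold."""
    ranked = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
    total = sum(counts.values())
    shares = [n / total for _, n in ranked]
    rows = []
    for (country, _), share, running in zip(ranked, shares, accumulate(shares)):
        rows.append((country, share, running))
        if running >= threshold:  # this country pushes coverage past the threshold
            break
    return rows


# Toy order counts (hypothetical)
toy = {"US": 80, "Canada": 8, "France": 5, "Japan": 4, "Chile": 3}
for country, share, running in coverage_cutoff(toy, 0.90):
    print(f"{country:8s} {share:6.1%} {running:6.1%}")
```

Including the row that crosses the threshold matches the notebook's `.loc[:ix]`, which is inclusive of `ix`.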


### By country, excluding US and Canada

In [6]:
# Countries from the Objective 02 price-group list, excluding the US and Canada
ls = ['Norway',
      'Switzerland',
      'Denmark',
      'Austria',
      'Germany',
      'Israel',
      'Sweden',
      'Ireland',
      'Australia',
      'Netherlands',
      'Czech Republic',
      'Finland',
      'Japan',
      'Hungary',
      'New Zealand',
      'France',
      'Poland',
      'United Kingdom']

In [7]:
# Recompute shares among non-US/Canadian orders only
ex = ['United States', 'Canada']
by_country2 = by_country[~by_country.index.isin(ex)].copy()

by_country2.drop(columns=['% of total', '% running sum'], inplace=True)
by_country2['% of total'] = by_country2['unique order count'] / by_country2['unique order count'].sum()
by_country2['% running sum'] = by_country2['% of total'].cumsum()

# Flag countries that appear in the price-group list
by_country2['in your list'] = np.where(by_country2.index.isin(ls), 'X', '')

# Sanity check: every listed country should have at least one order in the window
a = np.sum(by_country2['in your list'] == 'X')
b = len(ls)

if a != b:
    raise ValueError('only %i of %i listed countries were found in the order data' % (a, b))

fmt = ['n0', 'p1', 'p1']
by_country2.format_(fmt).head()

| Country | Unique Order Count | % of Total | % Running Sum | In Your List |
|---|---:|---:|---:|:---:|
| Australia | 3095 | 11.5% | 11.5% | X |
| France | 2291 | 8.5% | 20.0% | X |
| United Kingdom | 1699 | 6.3% | 26.3% | X |
| Germany | 1687 | 6.3% | 32.6% | X |
| Switzerland | 1366 | 5.1% | 37.7% | X |
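The cell above flags which countries are in the price-group list but never totals their combined share, which is what Objective 02 asks. A minimal pandas sketch of that total, on toy counts with a hypothetical subset of the list; it uses boolean flags instead of 'X' and reports listed countries with no orders rather than raising:

```python
import pandas as pd

# Toy order counts by country (hypothetical, not real data)
counts = pd.Series(
    {"United States": 900, "Canada": 40, "Australia": 20, "France": 15,
     "Brazil": 10, "Japan": 8, "Chile": 7},
    name="unique order count",
)
price_group = ["Australia", "France", "Japan", "Norway"]  # hypothetical subset of the list
excluded = ["United States", "Canada"]

# Drop the US and Canada, then recompute shares over what remains
rest = counts.drop(index=excluded).sort_values(ascending=False).to_frame()
rest["% of total"] = rest["unique order count"] / rest["unique order count"].sum()
rest["% running sum"] = rest["% of total"].cumsum()
rest["in your list"] = rest.index.isin(price_group)

# Combined share of the listed countries, plus any listed country with no orders at all
listed_share = rest.loc[rest["in your list"], "% of total"].sum()
missing = sorted(set(price_group) - set(rest.index))

print(rest)
print("listed share of non-US/Canada orders: %.1f%%" % (100 * listed_share))
print("listed countries with no orders:", missing)
```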


In [8]:
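# np.where gives zero-based positions, so add 1 to turn the last match into a rank
n = np.max(np.where(by_country2['in your list'] == 'X')[0]) + 1
print('the countries in the list are all within the top %i of %i non-US/Canadian countries' % (n, len(by_country2)))

the countries in the list are all within the top 39 of 133 non-US/Canadian countries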
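`np.where` returns zero-based positions, so the largest position of a flagged row is one less than that row's rank. A small plain-Python sketch of the same "deepest rank" question, counting from 1 with `enumerate(start=1)`; the ranking is toy data:

```python
def deepest_rank(ranked_countries: list[str], targets: set[str]) -> int:
    """1-based rank of the lowest-ranked country in targets; raises if none are present."""
    positions = [i for i, c in enumerate(ranked_countries, start=1) if c in targets]
    if not positions:
        raise ValueError("none of the target countries appear in the ranking")
    return max(positions)


ranking = ["Australia", "France", "Brazil", "Japan", "Chile"]  # toy ranking, largest first
print(deepest_rank(ranking, {"Australia", "Japan"}))  # 4
```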


In [9]:
# Set to True to export the non-US/Canada table to Excel
export_to_excel = False

if export_to_excel:

    p = '/Users/jarad/Fake Folder/Accounts and Biz Dev/Ad Hoc/Redemption Flow Questions/CSVs/'
    # The context manager saves and closes the file (ExcelWriter.save() was removed in pandas 2.0)
    with pd.ExcelWriter(p + 'Codecademy.xlsx', engine='xlsxwriter') as writer:
        workbook = writer.book
        title_fmt = workbook.add_format({'bold': True, 'font_size': 16})

        # Leave the first three rows free for the title lines
        by_country2.to_excel(writer,
                             sheet_name='data',
                             startrow=3,
                             startcol=0,
                             index=True)

        sht = writer.sheets['data']
        sht.write(0, 0, 'Data is from %s to %s' % (date_start, date_end), title_fmt)
        sht.write(1, 0, 'Data does not include US or Canada', title_fmt)

### By month

In [10]:
# Unique orders per country per month (countries x months), largest countries first
by_month = (o_main.groupby(['customers country', 'year and month'])[['orders id']]
            .nunique()
            .rename(columns={'orders id': 'unique order count'})
            .unstack(1)
            .fillna(0))
by_month.columns = by_month.columns.droplevel(0)
by_month['row total'] = by_month.sum(axis=1)
by_month.sort_values('row total', ascending=False, inplace=True)

# Divide each column by its total: each country's share of that month's orders
# (and, in 'row total', its share of all orders in the window)
by_month_proportions = by_month.div(by_month.sum().values, axis=1)

# Interval for each country's mean monthly share (jb_conf from jb_libraries)
d = {}
for country, row in by_month_proportions.iterrows():
    monthly = row[row.index != 'row total']
    d[country] = jb_conf(pd.DataFrame(monthly.values)).loc[:'upper'].values.flatten().tolist()

df = pd.DataFrame(d).T
df.columns = ['lower', 'mean', 'upper']

by_month_proportions = by_month_proportions.join(df)
by_month_proportions = by_month_proportions.loc[:, 'row total':]

print('share of unique orders per country: overall (row total) and monthly mean with interval')
fmt = ['p1'] * 4
by_month_proportions.format_(fmt).head()

share of unique orders per country: overall (row total) and monthly mean with interval


| Customers Country | Row Total | Lower | Mean | Upper |
|---|---:|---:|---:|---:|
| united states | 88.1% | 87.4% | 88.1% | 88.8% |
| canada | 3.5% | 3.2% | 3.4% | 3.7% |
| australia | 1.0% | 0.9% | 1.0% | 1.1% |
| france | 0.7% | 0.7% | 0.7% | 0.8% |
| united kingdom | 0.5% | 0.5% | 0.5% | 0.6% |
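`jb_conf`'s method isn't shown in this notebook. As an illustrative assumption only, here is a normal-approximation (z) interval for the mean of a country's monthly shares, where each month's share is that country's orders divided by the month's total, as in `by_month.div(...)` above. With 13 months, a t-interval would come out slightly wider. The counts are toy data:

```python
from statistics import NormalDist, mean, stdev


def share_interval(monthly_shares, level=0.95):
    """(lower, mean, upper) normal-approximation interval for the mean of the monthly shares."""
    n = len(monthly_shares)
    m = mean(monthly_shares)
    se = stdev(monthly_shares) / n ** 0.5      # standard error of the mean
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for level=0.95
    return m - z * se, m, m + z * se


# Toy monthly order counts (not real data): one country vs. all countries
country_orders = [30, 48, 33, 18]
all_orders = [1000, 1200, 1100, 900]
shares = [c / t for c, t in zip(country_orders, all_orders)]

lower, avg, upper = share_interval(shares)
print(f"lower {lower:.2%}  mean {avg:.2%}  upper {upper:.2%}")
```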


In [11]:
print('done')

done
