### Credit Card Transactions in India

This dataset contains information about credit card transactions in India. The data has the following schema:

- City: The city in which the transaction took place. (String)
- Date: The date of the transaction. (Date)
- Card Type: The type of credit card used for the transaction. (String)
- Exp Type: The type of expense associated with the transaction. (String)
- Gender: The gender of the cardholder. (String)
- Amount: The amount of the transaction. (Number)

In [1]:
import pandas as pd
df = pd.read_csv('./data/cc_transactions.csv')

In [2]:
df.head(2)

Unnamed: 0,index,City,Date,Card Type,Exp Type,Gender,Amount
0,0,"Delhi, India",29-Oct-14,Gold,Bills,F,82475
1,1,"Greater Mumbai, India",22-Aug-14,Platinum,Bills,F,32555


Q1. Use the dataset `./data/cc_transactions.csv` to find the following:
- The number of transactions on each day of the week, across months and years.
- From the above, the average number of transactions per day of the week.
- Using an appropriate probability distribution, the probability of seeing up to 190 transactions.

In [3]:
# Parse the date strings (e.g. 29-Oct-14) into datetimes
df['Date'] = pd.to_datetime(df['Date'])
df['Month'] = df['Date'].dt.month
df['Dow'] = df['Date'].dt.dayofweek  # Monday=0 ... Sunday=6
df['Year'] = df['Date'].dt.year
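
The dates in this file look like `29-Oct-14`, so passing an explicit format to `pd.to_datetime` avoids relying on format inference. A minimal sketch, assuming every value follows the pattern seen in the two sample rows (the format string `%d-%b-%y` is an assumption, not something the dataset documents):

```python
import pandas as pd

# Two date strings copied from the sample rows shown by df.head(2).
sample = pd.Series(["29-Oct-14", "22-Aug-14"])

# Assumed format: day, abbreviated English month name, two-digit year.
parsed = pd.to_datetime(sample, format="%d-%b-%y")

# dayofweek counts from Monday=0 to Sunday=6.
features = pd.DataFrame({
    "Date": parsed,
    "Year": parsed.dt.year,
    "Month": parsed.dt.month,
    "Dow": parsed.dt.dayofweek,
})
print(features)
```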

**The number of transactions on each day of the week, across months and years**

In [4]:
df.groupby(['Year','Month','Dow']).agg({'Date':'count'})

Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 2_level_0,Date
Year,Month,Dow,Unnamed: 3_level_1
2013,10,0,165
2013,10,1,185
2013,10,2,201
2013,10,3,181
2013,10,4,196
...,...,...,...
2015,5,2,137
2015,5,3,139
2015,5,4,151
2015,5,5,174


**From the above, the average number of transactions per day of the week**

In [5]:
df.groupby(['Year','Month','Dow']).agg({'Date':'count'}).mean().round()

Date    186.0
dtype: float64
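
The cell above averages over every (Year, Month, Dow) group at once, which gives a single overall rate. If a separate average for each weekday is wanted instead, one more `groupby` on the `Dow` index level does it. A small sketch on made-up data (the `toy` frame and its dates are hypothetical, chosen only to make the arithmetic easy to follow):

```python
import pandas as pd

# Hypothetical toy data: one row per transaction.
toy = pd.DataFrame({"Date": pd.to_datetime([
    "2014-10-06", "2014-10-06", "2014-10-13",  # Mondays in Oct 2014
    "2014-10-07",                              # a Tuesday in Oct 2014
    "2014-11-03", "2014-11-03",                # a Monday in Nov 2014
    "2014-11-04", "2014-11-04", "2014-11-04",  # a Tuesday in Nov 2014
])})
toy["Year"] = toy["Date"].dt.year
toy["Month"] = toy["Date"].dt.month
toy["Dow"] = toy["Date"].dt.dayofweek

# Transactions per (Year, Month, Dow) group.
counts = toy.groupby(["Year", "Month", "Dow"]).size()

# One overall average across all groups (what the cell above computes).
overall_mean = counts.mean()

# A separate average for each day of the week.
per_dow_mean = counts.groupby(level="Dow").mean()
print(overall_mean)
print(per_dow_mean)
```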

**Now use an appropriate probability distribution to compute the probability of seeing up to 190 transactions**

In [6]:
from scipy.stats import binom, poisson

In [7]:
poisson.cdf(190,186)

0.6333797603939718
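
"Up to 190" means P(X ≤ 190), which is what the Poisson CDF returns. As a quick check, the same number can be built by summing the probability mass from 0 through 190. A sketch, assuming the counts are Poisson-distributed with the rounded mean of 186 as the rate:

```python
from scipy.stats import poisson

rate = 186        # rounded mean count from the previous step
threshold = 190   # "up to 190" means X <= 190

p_cdf = poisson.cdf(threshold, rate)

# Same quantity, written as an explicit sum of P(X = k) for k = 0..190.
p_sum = sum(poisson.pmf(k, rate) for k in range(threshold + 1))
print(p_cdf, p_sum)
```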

Q2. Use the dataset `./data/cc_transactions.csv` to answer the following:
- The number of transactions on each day of the week, across months and years, when the card type was **Gold**.
- From the above, the average number of transactions per day of the week.
- Using an appropriate probability distribution, the probability of seeing up to 70 transactions.

**The number of transactions on each day of the week, across months and years, when the card type was Gold**

In [8]:
df[df['Card Type']=='Gold'].groupby(['Year','Month','Dow']).agg({'Date':'count'})

Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 2_level_0,Date
Year,Month,Dow,Unnamed: 3_level_1
2013,10,0,36
2013,10,1,44
2013,10,2,46
2013,10,3,35
2013,10,4,55
...,...,...,...
2015,5,2,33
2015,5,3,30
2015,5,4,37
2015,5,5,44


**From the above, the average number of transactions per day of the week**

In [9]:
df[df['Card Type']=='Gold'].groupby(['Year','Month','Dow']).agg({'Date':'count'}).mean().round()

Date    45.0
dtype: float64

**Now use an appropriate probability distribution to compute the probability of seeing up to 70 transactions**

In [10]:
poisson.cdf(70,45)

0.9997925314710944

Q3. In a survey of 60 universities across the country, only 10 offered courses on quantum computing. If you randomly survey 15 universities, what is the probability that more than 8 will have a course on quantum computing?

In [14]:
num_trials = 15
prob_success = 10/60
num_successes = 8
1 - binom.cdf(num_successes,num_trials,prob_success)

0.00018822421792097366
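
The binomial model treats each surveyed university as an independent draw with success probability 10/60. If the 15 universities are instead drawn without replacement from the same 60, a hypergeometric model is the natural alternative. The sketch below compares the two under that assumption; the question itself does not say which sampling scheme is intended.

```python
from scipy.stats import binom, hypergeom

population = 60      # universities in the original survey
with_course = 10     # of those, universities offering the course
sample_size = 15     # universities drawn for the new survey

# Binomial model: each draw succeeds with p = 10/60.
# sf(8) = P(X > 8), equivalent to 1 - cdf(8).
p_binom = binom.sf(8, sample_size, with_course / population)

# Hypergeometric model: draws without replacement from the 60 universities.
# scipy's argument order is (k, M=population, n=successes, N=draws).
p_hyper = hypergeom.sf(8, population, with_course, sample_size)
print(p_binom, p_hyper)
```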

Q4. A Bengali sweet restaurant is very famous for its Gulab Jamun. One day the owner claimed that 60% of customers like the sweet from his restaurant. If 20 customers were sampled randomly, what is the probability that 5 would like the sweet? Round your answer to 3 decimal places.

In [16]:
num_trials = 20
prob_success = 0.60
num_successes = 5
# P(X <= 5): reads "5 would like the sweet" as "at most 5"
round(binom.cdf(num_successes,num_trials,prob_success),3)

0.002
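
"Exactly 5" and "at most 5" map to different SciPy calls: `binom.pmf` gives P(X = 5) and `binom.cdf` gives P(X ≤ 5). A short sketch with 20 customers and a 60% success probability, assuming each customer's preference is independent (the variable names here are illustrative):

```python
from scipy.stats import binom

n_customers = 20
p_like = 0.60

p_exactly_5 = binom.pmf(5, n_customers, p_like)   # P(X = 5)
p_at_most_5 = binom.cdf(5, n_customers, p_like)   # P(X <= 5)
print(round(p_exactly_5, 3), round(p_at_most_5, 3))
```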

Q5. Use the dataset `./data/bakery_sales.csv`. Find the probability that, out of 40 transactions, at least 12 will be ones where a baguette is sold.

In [24]:
df2 = pd.read_csv("./data/bakery_sales.csv")
df2.head(2)

Unnamed: 0.1,Unnamed: 0,date,time,ticket_number,article,Quantity,unit_price
0,0,2021-01-02,08:38,150040.0,BAGUETTE,1.0,"0,90 €"
1,1,2021-01-02,08:38,150040.0,PAIN AU CHOCOLAT,3.0,"1,20 €"


In [25]:
df2[df2['article']=='BAGUETTE'].shape[0]

15292

In [26]:
df2.shape[0]

234005

In [27]:
15292/234005

0.06534903100361103

In [28]:
num_trials = 40
prob_success = 15292/234005
num_successes = 11  # at least 12: P(X >= 12) = 1 - P(X <= 11)
1 - binom.cdf(num_successes,num_trials,prob_success)

5.994646452900376e-06
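
The steps above can be condensed: the mean of a boolean mask gives the baguette share directly, and `binom.sf` returns the upper tail without the explicit `1 - cdf` subtraction. A sketch using a hypothetical four-row stand-in for the bakery data, followed by the same tail probability computed from the row counts reported above (it treats each row of the file as one transaction, as the cells above do):

```python
import pandas as pd
from scipy.stats import binom

# Hypothetical stand-in for the bakery data: one row per item sold.
toy = pd.DataFrame(
    {"article": ["BAGUETTE", "CROISSANT", "BAGUETTE", "PAIN AU CHOCOLAT"]}
)

# The mean of a boolean mask is the fraction of rows that match.
toy_share = toy["article"].eq("BAGUETTE").mean()

# Same idea with the row counts reported above.
prob_baguette = 15292 / 234005

# sf(11) = P(X > 11) = P(X >= 12) for 40 trials.
p_at_least_12 = binom.sf(11, 40, prob_baguette)
print(toy_share, p_at_least_12)
```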