# Ecommerce Purchases 
## Data Exploration

In the world of e-commerce, understanding customer behavior and preferences is paramount for business success. This data analysis project aims to uncover valuable insights from an E-commerce Purchase Dataset obtained from Kaggle. The dataset is rich in information, comprising various columns that provide a snapshot of customer interactions and transactions.

### Dataset Overview:

The E-commerce Purchase Dataset contains the following columns:

1. Address: The customer's shipping address.
2. Lot: A unique identifier for each transaction or purchase.
3. AM or PM: Indicates whether the purchase was made in the morning (AM) or afternoon (PM).
4. Browser Info: Information about the web browser used by the customer.
5. Company: The name of the company the customer works for.
6. Credit Card: The masked credit card number used for the transaction.
7. CC Exp Date: The credit card expiration date.
8. CC Security Code: The security code on the credit card.
9. CC Provider: The provider (e.g., Visa, MasterCard) of the credit card used.
10. Email: The customer's email address.
11. Job: The customer's job title or occupation.
12. IP Address: The customer's IP address.
13. Language: The language used in the transaction.
14. Purchase Price: The amount paid for the purchase. This is the main numeric column analyzed below.

### Project Goals:

1. Display Top 10 Rows of The Dataset
2. Check Last 10 Rows of The Dataset
3. Check Datatype of Each Column
4. Check null values in the dataset
5. How many rows and columns are there in our Dataset? 
6. Highest and Lowest Purchase Prices.
7. Average Purchase Price
8. How many people have French 'fr' as their Language?
9. Job Title Contains Engineer
10. Find The Email of the person with the following IP Address: 132.207.160.22
11. How many People have Mastercard as their Credit Card Provider and made a purchase above 50?
12. Find the email of the person with the following Credit Card Number: 4664825258997302
13. How many people purchase during the AM and how many people purchase during PM?
14. How many people have a credit card that expires in 2020?
15. What are the top 5 most popular email providers (e.g. gmail.com, yahoo.com, etc.)?

In [1]:
# importing the necessary libraries
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt

In [2]:
# loading the dataset
data = pd.read_csv('Ecommerce Purchases.csv')

In [3]:
# Statistics about the numerical columns
data.describe()

| | Credit Card | CC Security Code | Purchase Price |
|---|---|---|---|
| count | 10000.0 | 10000.0 | 10000.0 |
| mean | 2341374000000000.0 | 907.2178 | 50.347302 |
| std | 2256103000000000.0 | 1589.693035 | 29.015836 |
| min | 60401860000.0 | 0.0 | 0.0 |
| 25% | 30563220000000.0 | 280.0 | 25.15 |
| 50% | 869994200000000.0 | 548.0 | 50.505 |
| 75% | 4492298000000000.0 | 816.0 | 75.77 |
| max | 6012000000000000.0 | 9993.0 | 99.99 |


#### Check Datatype of Each Column, How many rows and columns are there in our Dataset?

In [4]:
#Information about the Fields (Datatypes of each field and Non null count of the columns)
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10000 entries, 0 to 9999
Data columns (total 14 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   Address           10000 non-null  object 
 1   Lot               10000 non-null  object 
 2   AM or PM          10000 non-null  object 
 3   Browser Info      10000 non-null  object 
 4   Company           10000 non-null  object 
 5   Credit Card       10000 non-null  int64  
 6   CC Exp Date       10000 non-null  object 
 7   CC Security Code  10000 non-null  int64  
 8   CC Provider       10000 non-null  object 
 9   Email             10000 non-null  object 
 10  Job               10000 non-null  object 
 11  IP Address        10000 non-null  object 
 12  Language          10000 non-null  object 
 13  Purchase Price    10000 non-null  float64
dtypes: float64(1), int64(2), object(11)
memory usage: 1.1+ MB
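
The `info()` output above already reports 10000 entries and 14 columns, but `DataFrame.shape` returns the same counts directly as a tuple. A minimal sketch, assuming a small hypothetical `sample` frame built from a few of the dataset's values rather than the full CSV:

```python
import pandas as pd

# Hypothetical three-row sample; the real notebook would use `data` instead
sample = pd.DataFrame({
    "Language": ["el", "fr", "de"],
    "Credit Card": [6011929061123406, 3337758169645356, 675957666125],
    "Purchase Price": [98.14, 70.73, 0.95],
})

n_rows, n_cols = sample.shape   # shape is a (rows, columns) tuple
column_types = sample.dtypes    # one dtype per column

print(f"{n_rows} rows, {n_cols} columns")
print(column_types)
```

Applied to the full dataset, `data.shape` should agree with the `info()` summary.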


#### Display Top 10 Rows of The Dataset

In [5]:
# Printing the first five rows (head() returns 5 rows by default; data.head(10) would show ten)
data.head()

Unnamed: 0,Address,Lot,AM or PM,Browser Info,Company,Credit Card,CC Exp Date,CC Security Code,CC Provider,Email,Job,IP Address,Language,Purchase Price
0,"16629 Pace Camp Apt. 448\nAlexisborough, NE 77...",46 in,PM,Opera/9.56.(X11; Linux x86_64; sl-SI) Presto/2...,Martinez-Herman,6011929061123406,02/20,900,JCB 16 digit,pdunlap@yahoo.com,"Scientist, product/process development",149.146.147.205,el,98.14
1,"9374 Jasmine Spurs Suite 508\nSouth John, TN 8...",28 rn,PM,Opera/8.93.(Windows 98; Win 9x 4.90; en-US) Pr...,"Fletcher, Richards and Whitaker",3337758169645356,11/18,561,Mastercard,anthony41@reed.com,Drilling engineer,15.160.41.51,fr,70.73
2,Unit 0065 Box 5052\nDPO AP 27450,94 vE,PM,Mozilla/5.0 (compatible; MSIE 9.0; Windows NT ...,"Simpson, Williams and Pham",675957666125,08/19,699,JCB 16 digit,amymiller@morales-harrison.com,Customer service manager,132.207.160.22,de,0.95
3,"7780 Julia Fords\nNew Stacy, WA 45798",36 vm,PM,Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_0 ...,"Williams, Marshall and Buchanan",6011578504430710,02/24,384,Discover,brent16@olson-robinson.info,Drilling engineer,30.250.74.19,es,78.04
4,"23012 Munoz Drive Suite 337\nNew Cynthia, TX 5...",20 IE,AM,Opera/9.58.(X11; Linux x86_64; it-IT) Presto/2...,"Brown, Watson and Andrews",6011456623207998,10/25,678,Diners Club / Carte Blanche,christopherwright@gmail.com,Fine artist,24.140.33.94,es,77.82


#### Check Last 10 Rows of The Dataset

In [6]:
# Printing the last five rows (tail() returns 5 rows by default; data.tail(10) would show ten)
data.tail()

Unnamed: 0,Address,Lot,AM or PM,Browser Info,Company,Credit Card,CC Exp Date,CC Security Code,CC Provider,Email,Job,IP Address,Language,Purchase Price
9995,"966 Castaneda Locks\nWest Juliafurt, CO 96415",92 XI,PM,Mozilla/5.0 (Windows NT 5.1) AppleWebKit/5352 ...,Randall-Sloan,342945015358701,03/22,838,JCB 15 digit,iscott@wade-garner.com,Printmaker,29.73.197.114,it,82.21
9996,"832 Curtis Dam Suite 785\nNorth Edwardburgh, T...",41 JY,AM,Mozilla/5.0 (compatible; MSIE 9.0; Windows NT ...,"Hale, Collins and Wilson",210033169205009,07/25,207,JCB 16 digit,mary85@hotmail.com,Energy engineer,121.133.168.51,pt,25.63
9997,Unit 4434 Box 6343\nDPO AE 28026-0283,74 Zh,AM,Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_7...,Anderson Ltd,6011539787356311,05/21,1,VISA 16 digit,tyler16@gmail.com,Veterinary surgeon,156.210.0.254,el,83.98
9998,"0096 English Rest\nRoystad, IA 12457",74 cL,PM,Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_8;...,Cook Inc,180003348082930,11/17,987,American Express,elizabethmoore@reid.net,Local government officer,55.78.26.143,es,38.84
9999,"40674 Barrett Stravenue\nGrimesville, WI 79682",64 Hr,AM,Mozilla/5.0 (X11; Linux i686; rv:1.9.5.20) Gec...,Greene Inc,4139972901927273,02/19,302,JCB 15 digit,rachelford@vaughn.com,"Embryologist, clinical",176.119.198.199,el,67.59


#### Check null values in the dataset

In [7]:
# Checking the Null values in the dataset
data.isnull().sum()

Address             0
Lot                 0
AM or PM            0
Browser Info        0
Company             0
Credit Card         0
CC Exp Date         0
CC Security Code    0
CC Provider         0
Email               0
Job                 0
IP Address          0
Language            0
Purchase Price      0
dtype: int64

Every column reports zero nulls, so the dataset has no missing values.
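
When only a yes/no answer or a single total is needed, the per-column counts can be collapsed further. A minimal sketch on a hypothetical `sample` frame; the `None` entry is added purely to illustrate a gap, since the real dataset contains none:

```python
import pandas as pd

# Hypothetical sample; the None value is injected for illustration only
sample = pd.DataFrame({
    "Email": ["pdunlap@yahoo.com", "anthony41@reed.com", None],
    "Purchase Price": [98.14, 70.73, 0.95],
})

total_missing = int(sample.isnull().sum().sum())  # missing cells across all columns
has_missing = bool(sample.isnull().any().any())   # True if any cell is missing

print(f"Missing cells: {total_missing}, any missing: {has_missing}")
```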

#### Highest and Lowest Purchase Prices.

In [67]:
# Highest purchase price
highest_purchase_price = data['Purchase Price'].max()
print(f'Highest purchase price is: {highest_purchase_price}')

Highest purchase price is: 99.99


In [68]:
# lowest purchase price
lowest_purchase_price = data['Purchase Price'].min()
print(f'Lowest purchase price is: {lowest_purchase_price}')

Lowest purchase price is: 0.0


#### Average Purchase Price

In [69]:
# Average purchase price
avg_purchase_price = data['Purchase Price'].mean()
print(f'Average purchase price is: {round(avg_purchase_price,2)}')

Average purchase price is: 50.35
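
The three statistics above can also be computed in one call with `Series.agg`. A minimal sketch, assuming a hypothetical `prices` series holding the five purchase prices from the preview rows:

```python
import pandas as pd

# Purchase prices of the first five rows shown earlier (illustrative subset)
prices = pd.Series([98.14, 70.73, 0.95, 78.04, 77.82], name="Purchase Price")

# agg with a list of function names returns a Series indexed by those names
summary = prices.agg(["min", "max", "mean"])

print(summary.round(2))
```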


#### How many people have French 'fr' as their Language?

In [70]:
# Rows where the Language column is 'fr'
fr_as_lan = data[data['Language'] == 'fr']

In [12]:
print(f'Total number of French-speaking customers: {len(fr_as_lan)}')

Total number of French-speaking customers: 1097
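
If only the count is needed, summing the boolean mask avoids building the filtered frame. A minimal sketch, assuming a hypothetical `languages` series with the language codes from the preview rows:

```python
import pandas as pd

# Language codes of the first five rows shown earlier (illustrative subset)
languages = pd.Series(["el", "fr", "de", "es", "es"], name="Language")

# True counts as 1 and False as 0, so the sum is the number of matches
french_count = int((languages == "fr").sum())

print(f"Customers with 'fr' as language: {french_count}")
```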


#### Job Title Contains Engineer

In [71]:
data.columns # Checking the column names

Index(['Address', 'Lot', 'AM or PM', 'Browser Info', 'Company', 'Credit Card',
       'CC Exp Date', 'CC Security Code', 'CC Provider', 'Email', 'Job',
       'IP Address', 'Language', 'Purchase Price', 'Email Provides'],
      dtype='object')

In [77]:
# We select the 'Job' column in the dataframe, and check whether the job column contains the substring 'engineer'
# while setting 'case = False' we ensure that the search is case insensitive.
data[data['Job'].str.contains('engineer', case = False)] #by default case is True

Unnamed: 0,Address,Lot,AM or PM,Browser Info,Company,Credit Card,CC Exp Date,CC Security Code,CC Provider,Email,Job,IP Address,Language,Purchase Price,Email Provides
1,"9374 Jasmine Spurs Suite 508\nSouth John, TN 8...",28 rn,PM,Opera/8.93.(Windows 98; Win 9x 4.90; en-US) Pr...,"Fletcher, Richards and Whitaker",3337758169645356,11/18,561,Mastercard,anthony41@reed.com,Drilling engineer,15.160.41.51,fr,70.73,reed.com
3,"7780 Julia Fords\nNew Stacy, WA 45798",36 vm,PM,Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_0 ...,"Williams, Marshall and Buchanan",6011578504430710,02/24,384,Discover,brent16@olson-robinson.info,Drilling engineer,30.250.74.19,es,78.04,olson-robinson.info
50,"41159 Michael Centers\nAdamsfort, RI 37108-6674",46 Ce,PM,Mozilla/5.0 (Windows 98; Win 9x 4.90; sl-SI; r...,"Wright, Williams and Mendez",4008586485908075,05/19,945,JCB 16 digit,susanvalentine@obrien.org,Mechanical engineer,213.203.143.215,de,36.85,obrien.org
55,"27635 Maureen Bypass Apt. 883\nSandraview, SD ...",59 LJ,AM,Mozilla/5.0 (iPod; U; CPU iPhone OS 3_3 like M...,Sims-Lyons,3158113629128344,09/19,857,VISA 16 digit,adkinsarthur@yahoo.com,"Engineer, broadcasting (operations)",227.235.89.210,pt,48.74,yahoo.com
60,"7126 Katherine Squares\nPerkinsview, CO 97299-...",63 qu,AM,Opera/8.68.(X11; Linux x86_64; en-US) Presto/2...,Marshall-Fernandez,349767747049645,05/20,672,JCB 15 digit,sweeneyhannah@jones.biz,"Engineer, agricultural",197.144.142.102,de,20.43,jones.biz
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
9948,"95544 Johnson Isle Suite 939\nMichaelberg, RI ...",91 bW,AM,Opera/8.36.(X11; Linux x86_64; sl-SI) Presto/2...,Fox-Peterson,4762924304307,03/17,567,Mastercard,haleybenjamin@gmail.com,Structural engineer,120.36.140.58,en,71.89,gmail.com
9952,"9991 Vaughn Hills\nRacheltown, PA 55409",36 KC,PM,Mozilla/5.0 (X11; Linux i686; rv:1.9.5.20) Gec...,"Ward, Smith and Castillo",6011679271321726,09/19,964,Voyager,jonesjennifer@olson.com,"Engineer, energy",116.228.12.42,es,39.63,olson.com
9970,"0060 Keith Stream\nWestport, CO 47097",11 nt,PM,Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_8...,"Carpenter, Good and Hart",6011485664704662,07/19,543,Discover,rangelbrian@hotmail.com,Electrical engineer,242.8.85.205,en,17.76,hotmail.com
9977,"02182 Keith Expressway\nEast Shannon, CT 20578...",34 RL,AM,Mozilla/5.0 (compatible; MSIE 9.0; Windows NT ...,"Deleon, Jacobson and Benton",4186094003664688,06/21,397,JCB 16 digit,daltoncarter@yahoo.com,Biomedical engineer,146.238.118.2,fr,94.93,yahoo.com


In [20]:
total_len = len(data[data['Job'].str.contains('engineer', case = False)])
print('Total Number of Engineers in the dataset:',total_len)

Total Number of Engineers in the dataset: 984
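
The same count can be taken from the boolean mask itself. A minimal sketch on a hypothetical `jobs` series with job titles taken from the rows above; `na=False` is an added safeguard that treats missing titles as non-matches (the real column has none):

```python
import pandas as pd

# Job titles taken from rows shown in this notebook (illustrative subset)
jobs = pd.Series([
    "Scientist, product/process development",
    "Drilling engineer",
    "Customer service manager",
    "Engineer, agricultural",
    "Fine artist",
], name="Job")

# case=False matches both 'engineer' and 'Engineer'
is_engineer = jobs.str.contains("engineer", case=False, na=False)
engineer_count = int(is_engineer.sum())

print(f"Job titles containing 'engineer': {engineer_count}")
```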


#### Find The Email of the person with the following IP Address: 132.207.160.22

In [27]:
person_with_ip_Address = data[data['IP Address'] == '132.207.160.22']

In [29]:
person_with_ip_Address

Unnamed: 0,Address,Lot,AM or PM,Browser Info,Company,Credit Card,CC Exp Date,CC Security Code,CC Provider,Email,Job,IP Address,Language,Purchase Price
2,Unit 0065 Box 5052\nDPO AP 27450,94 vE,PM,Mozilla/5.0 (compatible; MSIE 9.0; Windows NT ...,"Simpson, Williams and Pham",675957666125,08/19,699,JCB 16 digit,amymiller@morales-harrison.com,Customer service manager,132.207.160.22,de,0.95
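
The filter above returns the whole row, while the question asks only for the email. Selecting the row mask and the column together with `.loc` returns just that field. A minimal sketch on a hypothetical `sample` frame built from three of the rows shown earlier:

```python
import pandas as pd

# Illustrative subset of the Email and IP Address columns
sample = pd.DataFrame({
    "Email": [
        "anthony41@reed.com",
        "amymiller@morales-harrison.com",
        "brent16@olson-robinson.info",
    ],
    "IP Address": ["15.160.41.51", "132.207.160.22", "30.250.74.19"],
})

# .loc[row_mask, column] filters rows and picks one column in a single step
emails = sample.loc[sample["IP Address"] == "132.207.160.22", "Email"]

print(emails.tolist())
```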


#### How many People have Mastercard as their Credit Card Provider and made a purchase above 50?

In [30]:
data.columns

Index(['Address', 'Lot', 'AM or PM', 'Browser Info', 'Company', 'Credit Card',
       'CC Exp Date', 'CC Security Code', 'CC Provider', 'Email', 'Job',
       'IP Address', 'Language', 'Purchase Price'],
      dtype='object')

In [78]:
# Emails of customers who paid with Mastercard and spent more than 50
data[(data['CC Provider'] == 'Mastercard') & (data['Purchase Price'] > 50)]['Email']

1               anthony41@reed.com
18              hannah63@yahoo.com
31            ashley12@hotmail.com
35          hgonzalez@mcdowell.com
90               nrogers@brown.com
                   ...            
9941         christian55@gmail.com
9948       haleybenjamin@gmail.com
9954          brownamy@perkins.com
9981    laurenbennett@richards.com
9987      denisehamilton@novak.biz
Name: Email, Length: 405, dtype: object

In [40]:
total_master_card_and_above_50 = len(data[(data['CC Provider'] == 'Mastercard') & (data['Purchase Price'] > 50)])

In [42]:
print(f'Total number of customers who used Mastercard and made a purchase above 50: {total_master_card_and_above_50}')

Total number of customers who used Mastercard and made a purchase above 50: 405
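
Each condition must be wrapped in parentheses because `&` binds more tightly than `==` and `>` in Python. A minimal sketch on a hypothetical `sample` frame; the last row (Mastercard, 20.00) is invented to show a case where only one condition holds:

```python
import pandas as pd

# Illustrative rows; the final row is hypothetical
sample = pd.DataFrame({
    "CC Provider": ["JCB 16 digit", "Mastercard", "Discover", "Mastercard", "Mastercard"],
    "Purchase Price": [98.14, 70.73, 78.04, 71.89, 20.00],
})

# Both conditions must hold for a row to be counted
mask = (sample["CC Provider"] == "Mastercard") & (sample["Purchase Price"] > 50)
mastercard_above_50 = int(mask.sum())

print(f"Mastercard purchases above 50: {mastercard_above_50}")
```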


#### Find the email of the person with the following Credit Card Number: 4664825258997302

In [49]:
email_id = data[data['Credit Card'] == 4664825258997302]['Email']
print(f'email of the person with the following Credit Card Number: 4664825258997302 is:  {email_id}')

email of the person with the following Credit Card Number: 4664825258997302 is:  9992    bberry@wright.net
Name: Email, dtype: object
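
The printed result includes the index label and dtype because `email_id` is a Series, not a string. Taking the first element with `.iloc[0]` yields the bare value. A minimal sketch on a hypothetical two-row `sample` frame that reuses the index labels shown in the outputs:

```python
import pandas as pd

# Illustrative subset; index labels match those shown in the notebook outputs
sample = pd.DataFrame(
    {
        "Credit Card": [4139972901927273, 4664825258997302],
        "Email": ["rachelford@vaughn.com", "bberry@wright.net"],
    },
    index=[9999, 9992],
)

matches = sample.loc[sample["Credit Card"] == 4664825258997302, "Email"]

# Guard against an empty result before taking the first match
email = matches.iloc[0] if not matches.empty else None

print(f"Email for card 4664825258997302: {email}")
```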


#### How many people purchase during the AM and how many people purchase during PM?

In [53]:
data['AM or PM'].value_counts()

PM    5068
AM    4932
Name: AM or PM, dtype: int64
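
Passing `normalize=True` to `value_counts` returns shares instead of raw counts. A minimal sketch, assuming a hypothetical `am_pm` series reconstructed from the counts above (5068 PM, 4932 AM):

```python
import pandas as pd

# Series rebuilt from the reported counts, standing in for data['AM or PM']
am_pm = pd.Series(["PM"] * 5068 + ["AM"] * 4932, name="AM or PM")

# normalize=True divides each count by the total number of rows
shares = am_pm.value_counts(normalize=True)

print(shares.round(4))
```

The split is close to even: about 50.7% of purchases fall in PM and 49.3% in AM.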

#### How many people have a credit card that expires in 2020?

In [101]:
import datetime as dt

def card_expires(dataset):
    count = 0
    # Iterate over the frame passed in, not the global `data`
    for date in dataset['CC Exp Date']:
        # Expiry dates are stored as MM/YY, so parse month and two-digit year
        date_obj = dt.datetime.strptime(date, '%m/%y')
        if date_obj.year == 2020:
            count += 1
    print('Total number of cards that expire in 2020: ', count)

card_expires(data)

Total number of cards that expire in 2020:  988
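
The loop can be replaced by vectorized string or datetime operations. A minimal sketch showing two equivalent approaches on a hypothetical `exp_dates` series, assuming the values follow the `MM/YY` pattern seen in the preview rows:

```python
import pandas as pd

# Expiry dates taken from rows shown earlier, plus '05/20' from the engineer listing
exp_dates = pd.Series(
    ["02/20", "11/18", "08/19", "02/24", "10/25", "05/20"], name="CC Exp Date"
)

# Approach 1: the two-digit year is the suffix after the slash
by_suffix = int(exp_dates.str.endswith("/20").sum())

# Approach 2: parse to datetimes and compare the year
years = pd.to_datetime(exp_dates, format="%m/%y").dt.year
by_parsing = int((years == 2020).sum())

print(by_suffix, by_parsing)
```

The suffix check is the simpler of the two; parsing is more useful if the dates are needed for further analysis.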


#### What are the top 5 most popular email providers (e.g. gmail.com, yahoo.com, etc...)

In [61]:
top_emails = list()

for email in data['Email']:
    top_emails.append(email.split('@')[1])
# print(top_emails)

data['Email Provides'] = top_emails

In [63]:
data.columns

Index(['Address', 'Lot', 'AM or PM', 'Browser Info', 'Company', 'Credit Card',
       'CC Exp Date', 'CC Security Code', 'CC Provider', 'Email', 'Job',
       'IP Address', 'Language', 'Purchase Price', 'Email Provides'],
      dtype='object')

In [81]:
data['Email Provides'].value_counts().head()

hotmail.com     1638
yahoo.com       1616
gmail.com       1605
smith.com         42
williams.com      37
Name: Email Provides, dtype: int64
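
The domain extraction can also be done without an explicit loop by using the `.str` accessor. A minimal sketch on a hypothetical `emails` series with addresses taken from the rows shown earlier:

```python
import pandas as pd

# Email addresses taken from rows shown in this notebook (illustrative subset)
emails = pd.Series([
    "pdunlap@yahoo.com",
    "anthony41@reed.com",
    "christopherwright@gmail.com",
    "mary85@hotmail.com",
    "tyler16@gmail.com",
], name="Email")

# Split on '@' and keep the last piece, which is the domain
providers = emails.str.split("@").str[-1]

# value_counts sorts by frequency, so head(5) gives the five most common domains
top_providers = providers.value_counts().head(5)

print(top_providers)
```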

## Conclusion

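This project worked through basic exploratory tasks on the E-commerce Purchase Dataset. The dataset has 10,000 rows and 14 columns with no missing values. Purchase prices range from 0.0 to 99.99, with an average of 50.35. French ('fr') is the language for 1,097 customers, 984 job titles contain "engineer", and 405 Mastercard customers made a purchase above 50. Purchases split into 5,068 PM and 4,932 AM, 988 cards expire in 2020, and the three most common email providers are hotmail.com, yahoo.com, and gmail.com.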