# Exercise 1: Your first DataFrame

The goal of this exercise is to learn to create basic Pandas objects.

1. Create a DataFrame like the one below in two ways:
   - From a NumPy array
   - From a Pandas Series

   |     | color | list    | number |
   | --: | :---- | :------ | -----: |
   |   1 | Blue  | [1, 2]  |    1.1 |
   |   3 | Red   | [3, 4]  |    2.2 |
   |   5 | Pink  | [5, 6]  |    3.3 |
   |   7 | Grey  | [7, 8]  |    4.4 |
   |   9 | Black | [9, 10] |    5.5 |

2. Print the type of every column and the type of the first value of every column.

### Applying custom table styling

In [3]:
from IPython.display import display, HTML

css = """
<style>
    /* Table layout */
    table.dataframe {
        border-collapse: collapse;
        font-family: 'Segoe UI', Arial, sans-serif;
        font-size: 14px;
        margin: 20px 0;
        box-shadow: 0 2px 8px rgba(0,0,0,0.1);
        border-radius: 8px;
        overflow: hidden;
    }

    /* Header */
    table.dataframe thead th {
        background-color: #4A90D9;
        color: white;
        padding: 12px 16px;
        text-align: center;
        font-weight: 600;
        letter-spacing: 0.5px;
        border: none;
    }

    /* Index header (top-left corner) */
    table.dataframe thead th:first-child {
        background-color: #3a7bc8;
    }

    /* Rows */
    table.dataframe tbody tr {
        border-bottom: 1px solid #e0e0e0;
        transition: background-color 0.2s;
    }

    /* Alternating row colors */
    table.dataframe tbody tr:nth-child(even) {
        background-color: #f4f8ff;
    }

    table.dataframe tbody tr:nth-child(odd) {
        background-color: #ffffff;
    }

    /* Hover effect */
    table.dataframe tbody tr:hover {
        background-color: #dceeff;
    }

    /* Cells */
    table.dataframe tbody td {
        padding: 10px 16px;
        border: none;
        color: #333;
    }

    /* Index column */
    table.dataframe tbody th {
        padding: 10px 16px;
        background-color: #f0f4fa;
        font-weight: 600;
        color: #555;
        border: none;
        border-right: 2px solid #d0d8e8;
    }
</style>
"""

display(HTML(css))

### From NumPy Array

In [4]:
import numpy as np
import pandas as pd

colors = np.array(["Blue", "Red", "Pink", "Grey", "Black"])
lists = np.array([[1, 2], [3, 4], [5, 6], [7, 8], [9, 10]])
numbers = np.array([1.1, 2.2, 3.3, 4.4, 5.5])
index = [1, 3, 5, 7, 9]

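# tolist() turns each row into a plain Python list, so the "list" column holds lists rather than NumPy arrays
df_from_nparray = pd.DataFrame({"color": colors, "list": lists.tolist(), "number": numbers}, index=index)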
display(df_from_nparray)

|     | color | list    | number |
| --: | :---- | :------ | -----: |
|   1 | Blue  | [1, 2]  |    1.1 |
|   3 | Red   | [3, 4]  |    2.2 |
|   5 | Pink  | [5, 6]  |    3.3 |
|   7 | Grey  | [7, 8]  |    4.4 |
|   9 | Black | [9, 10] |    5.5 |


### From Pandas Series

In [5]:
colors = pd.Series(["Blue", "Red", "Pink", "Grey", "Black"], index=index)
lists = pd.Series([[1, 2], [3, 4], [5, 6], [7, 8], [9, 10]], index=index)
numbers = pd.Series([1.1, 2.2, 3.3, 4.4, 5.5], index=index)

df_from_pdseries = pd.DataFrame({"color": colors, "list": lists, "number": numbers})
display(df_from_pdseries)

|     | color | list    | number |
| --: | :---- | :------ | -----: |
|   1 | Blue  | [1, 2]  |    1.1 |
|   3 | Red   | [3, 4]  |    2.2 |
|   5 | Pink  | [5, 6]  |    3.3 |
|   7 | Grey  | [7, 8]  |    4.4 |
|   9 | Black | [9, 10] |    5.5 |


### Types of Every Column and Types of the First Value of Every Column

In [6]:
print("Column types:")
print(df_from_pdseries.dtypes)

print("\nTypes of first value in each column:")
for col in df_from_pdseries.columns:
    print(f"{col}: {type(df_from_pdseries[col].iloc[0])}")

Column types:
color         str
list       object
number    float64
dtype: object

Types of first value in each column:
color: <class 'str'>
list: <class 'list'>
number: <class 'numpy.float64'>
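
The two listings answer different questions: `dtypes` reports how pandas stores each column, while `type()` on a cell reports the Python object held inside it. The following is a minimal standalone sketch of that distinction; `demo_df` is an illustrative name that rebuilds the exercise table and is not part of the solution above. The dtype shown for `color` varies between pandas versions (older releases report `object`), so the sketch only inspects the other two columns.

```python
import numpy as np
import pandas as pd

# Standalone copy of the exercise table; `demo_df` is an illustrative name.
demo_df = pd.DataFrame(
    {
        "color": ["Blue", "Red", "Pink", "Grey", "Black"],
        "list": [[1, 2], [3, 4], [5, 6], [7, 8], [9, 10]],
        "number": [1.1, 2.2, 3.3, 4.4, 5.5],
    },
    index=[1, 3, 5, 7, 9],
)

# Python lists have no dedicated pandas dtype, so the column is stored as `object`.
list_dtype = demo_df["list"].dtype
# Each cell of that column is still an ordinary Python list.
first_list_type = type(demo_df["list"].iloc[0])
# Floats live in a NumPy float64 array, so a single cell comes back as numpy.float64.
first_number_type = type(demo_df["number"].iloc[0])

print(list_dtype, first_list_type, first_number_type)
```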


# Exercise 3: E-commerce purchases

The goal of this exercise is to learn to manipulate real data with Pandas. This exercise is less guided, since Exercise 2 should have given you a good introduction.

The data set used is [E-commerce purchases](Ecommerce_purchases.tx).

Questions:

1. How many rows and columns are there?

2. What is the average Purchase Price?

3. What were the highest and lowest purchase prices?

4. How many people have English 'en' as their Language of choice on the website?

5. How many people have the job title of "Lawyer"?

6. How many people made their purchase during AM, and how many during PM?

7. What are the 5 most common Job Titles?

8. Someone made a purchase that came from Lot "90 WT". What was the Purchase Price for this transaction?

9. What is the email of the person with the following Credit Card Number: 4926535242672853?

10. How many people have American Express as their Credit Card Provider and made a purchase above $95?

11. How many people have a credit card that expires in 2025?

12. What are the top 5 most popular email providers/hosts (e.g. gmail.com, yahoo.com)?

In [7]:
purchases_df = pd.read_csv("Ecommerce_purchases.txt", low_memory=False)

### Number of Rows and Columns

In [8]:
print(f"Number of rows: {purchases_df.shape[0]}, Number of columns: {purchases_df.shape[1]}")

Number of rows: 10000, Number of columns: 14


### Average Purchase Price

In [9]:
print(f"Average Purchase Price: {purchases_df['Purchase Price'].mean()}")

Average Purchase Price: 50.347302


### Highest and Lowest Purchase Prices

In [10]:
print(f"Highest Purchase Price: {purchases_df['Purchase Price'].max()}")
print(f"Lowest Purchase Price: {purchases_df['Purchase Price'].min()}")

Highest Purchase Price: 99.99
Lowest Purchase Price: 0.0
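
The mean, maximum and minimum can also be computed in one call with `Series.agg`. This is a minimal sketch on a made-up frame: `toy_df` and its prices are hypothetical and do not come from the data set.

```python
import pandas as pd

# Hypothetical prices, NOT taken from Ecommerce_purchases.txt.
toy_df = pd.DataFrame({"Purchase Price": [10.0, 99.99, 0.0, 50.01]})

# agg() with a list of names returns a Series indexed by statistic name.
price_stats = toy_df["Purchase Price"].agg(["mean", "max", "min"])
print(price_stats)
```

Individual values can then be read by label, for example `price_stats["max"]`.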


### People with English as their Language

In [11]:
print(f"Number of people with English as their language: {purchases_df[purchases_df['Language'] == 'en'].shape[0]}")

Number of people with English as their language: 1098


### People with Job Title "Lawyer"

In [12]:
print(f"Number of people with the job title 'Lawyer': {purchases_df[purchases_df['Job'] == 'Lawyer'].shape[0]}")

Number of people with the job title 'Lawyer': 30


### Purchases Made During AM and PM

In [13]:
print(f"Number of purchases made during AM: {purchases_df[purchases_df['AM or PM'] == 'AM'].shape[0]}")
print(f"Number of purchases made during PM: {purchases_df[purchases_df['AM or PM'] == 'PM'].shape[0]}")

Number of purchases made during AM: 4932
Number of purchases made during PM: 5068
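
Filtering once per value works, but `value_counts()` tallies every distinct value in a single pass. This is a minimal sketch on made-up rows: `toy_df` is hypothetical and only mirrors the `AM or PM` column name.

```python
import pandas as pd

# Hypothetical rows, NOT taken from Ecommerce_purchases.txt.
toy_df = pd.DataFrame({"AM or PM": ["AM", "PM", "PM", "AM", "PM"]})

# value_counts() returns one count per distinct value, largest first.
am_pm_counts = toy_df["AM or PM"].value_counts()
print(am_pm_counts)
```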


### 5 Most Common Job Titles

In [24]:
print("5 most common Job Titles:")
display(purchases_df['Job'].value_counts().head(5).to_frame())

5 most common Job Titles:


| Job                           | count |
| :---------------------------- | ----: |
| Interior and spatial designer |    31 |
| Lawyer                        |    30 |
| Social researcher             |    28 |
| Designer, jewellery           |    27 |
| Purchasing manager            |    27 |


### Purchase Price for Purchase from Lot "90 WT"

In [18]:
print(f"Purchase Price for Lot '90 WT': {purchases_df[purchases_df['Lot'] == '90 WT']['Purchase Price'].iloc[0]}")

Purchase Price for Lot '90 WT': 75.1


### Owner's Email for Credit Card Number 4926535242672853

In [19]:
print(f"Email of person with Credit Card Number 4926535242672853: {purchases_df[purchases_df['Credit Card'] == 4926535242672853]['Email'].iloc[0]}")

Email of person with Credit Card Number 4926535242672853: bondellen@williams-garza.com
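
Lookups like this one and the Lot query end in `.iloc[0]`, which raises an `IndexError` when no row matches. This is a minimal sketch of a guarded lookup using `.loc` with a row mask and a column label. The helper `email_for_card`, the frame `toy_df` and its values are all hypothetical.

```python
import pandas as pd

# Hypothetical rows, NOT taken from Ecommerce_purchases.txt.
toy_df = pd.DataFrame(
    {
        "Credit Card": [1111222233334444, 5555666677778888],
        "Email": ["alice@example.com", "bob@example.org"],
    }
)


def email_for_card(df, card_number):
    """Return the first matching email, or None when no row matches."""
    # .loc takes the row mask and the column label in one indexing step.
    matches = df.loc[df["Credit Card"] == card_number, "Email"]
    return matches.iloc[0] if not matches.empty else None


found_email = email_for_card(toy_df, 5555666677778888)
missing_email = email_for_card(toy_df, 1234)
print(found_email, missing_email)
```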


### People with American Express and Purchases Above $95

In [20]:
print(f"Number of people with American Express and purchase above $95: {purchases_df[(purchases_df['CC Provider'] == 'American Express') & (purchases_df['Purchase Price'] > 95)].shape[0]}")

Number of people with American Express and purchase above $95: 39
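
Combined conditions are easier to read when each mask gets a name, and summing a Boolean mask counts the matches without building a filtered frame. This is a minimal sketch on made-up rows: `toy_df`, `is_amex` and `is_above_95` are hypothetical names.

```python
import pandas as pd

# Hypothetical rows, NOT taken from Ecommerce_purchases.txt.
toy_df = pd.DataFrame(
    {
        "CC Provider": ["American Express", "Mastercard", "American Express", "American Express"],
        "Purchase Price": [96.5, 99.0, 20.0, 95.0],
    }
)

is_amex = toy_df["CC Provider"] == "American Express"
is_above_95 = toy_df["Purchase Price"] > 95  # strict: exactly 95 does not count

# & combines the masks element-wise; True counts as 1 when summed.
amex_above_95 = int((is_amex & is_above_95).sum())
print(amex_above_95)
```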


### People with Credit Card Expiring in 2025

In [21]:
print(f"Number of people with credit card expiring in 2025: {purchases_df[purchases_df['CC Exp Date'].str.contains('/25')].shape[0]}")

Number of people with credit card expiring in 2025: 1033
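
The `'/25'` pattern suggests that `CC Exp Date` is stored as `MM/YY`. Under that assumption, the year can be extracted explicitly and compared, which keeps the intent visible. This is a minimal sketch on made-up rows: `toy_df` and its dates are hypothetical.

```python
import pandas as pd

# Hypothetical rows, NOT taken from Ecommerce_purchases.txt; the MM/YY layout is assumed.
toy_df = pd.DataFrame({"CC Exp Date": ["02/20", "11/25", "07/25", "12/24"]})

# Keep the part after the slash, then compare it with the two-digit year.
exp_year = toy_df["CC Exp Date"].str.split("/").str[1]
expiring_2025 = int((exp_year == "25").sum())
print(expiring_2025)
```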


### 5 Most Popular Email Providers/Hosts

In [27]:
print("Top 5 most popular email providers:")
display(purchases_df['Email'].apply(lambda x: x.split('@')[1]).value_counts().head(5).to_frame())

Top 5 most popular email providers:


| Email        | count |
| :----------- | ----: |
| hotmail.com  |  1638 |
| yahoo.com    |  1616 |
| gmail.com    |  1605 |
| smith.com    |    42 |
| williams.com |    37 |
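
The host can also be extracted with the vectorized `.str` accessor instead of `apply` with a lambda. One practical difference is that `.str` methods pass missing values through, whereas calling `.split` on a NaN inside the lambda would raise an `AttributeError`. This is a minimal sketch on made-up addresses: `toy_df` and its emails are hypothetical.

```python
import pandas as pd

# Hypothetical addresses, NOT taken from Ecommerce_purchases.txt.
toy_df = pd.DataFrame(
    {"Email": ["a@gmail.com", "b@yahoo.com", "c@gmail.com", "d@example.org"]}
)

# Split once from the right so the host is always the last piece.
hosts = toy_df["Email"].str.rsplit("@", n=1).str[-1]
top_hosts = hosts.value_counts().head(2)
print(top_hosts)
```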
