# Order Data Analysis.

## Project Goals and Requirements.

This project has a few goals that must be met by the end:
* Clean up the data, specifically any rows with missing values.
* Create a total column that multiplies the price by the quantity ordered.
* Create a number of charts (the full set is still to be determined), including:
    * The top 5 most frequently purchased items.
* Export the results to an Excel workbook / worksheet.
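
The total column and the top-5 ranking listed above could be sketched roughly as follows. This is a minimal illustration on a small made-up frame, not the project's final code: the `total` column name and the sample rows are assumptions, and "frequently purchased" is interpreted here as the number of order lines per item rather than the summed quantity.

```python
import pandas as pd

# Hypothetical sample rows; the real data is loaded from the CSV file in Step 2.
sample = pd.DataFrame({
    "item": ["Maple Syrup", "Beans - French", "Maple Syrup",
             "Maple Syrup", "Beans - French", "Tea"],
    "item_qty": [5, 5, 1, 2, 3, 1],
    "price_per_unit": [496.85, 369.39, 496.85, 496.85, 369.39, 10.0],
})

# Total per order line = unit price x quantity ordered ("total" is an assumed name).
sample["total"] = (sample["price_per_unit"] * sample["item_qty"]).round(2)

# Most frequently purchased items, counted by number of order lines.
top_items = sample["item"].value_counts().head(5)

print(sample[["item", "item_qty", "price_per_unit", "total"]])
print(top_items)
```

For the export goal, pandas provides `DataFrame.to_excel`, which needs an Excel engine such as openpyxl to be installed.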

## Project Solution.

### Step 1. Import the Required Modules and Libraries.

In [78]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

### Step 2. Create a Pandas DataFrame From the Data CSV File.

In [79]:
order_data = pd.read_csv("data/order-data.csv")

# Show first five rows:
order_data.head()

|   | order_id | order_date | customer_name | item_vendor_name | item | item_qty | price_per_unit | price_currency |
|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 2022/05/31 | Jane Bloggs | Wikizz | Lid - 3oz Med Rec | 1.0 | 290.921 | GBP |
| 1 | 2 | 2022/06/14 | Jane Bloggs | Yodoo | Beans - French | 5.0 | 369.391 | GBP |
| 2 | 3 | 2021/08/25 | Jane Bloggs | Zoonoodle | Maple Syrup | 5.0 | 496.848 | GBP |
| 3 | 4 | 2021/09/01 | Jane Bloggs | Skibox | Strawberries - California | 5.0 | 129.945 | GBP |
| 4 | 5 | 2021/08/23 | Jane Bloggs | Teklist | Lamb - Racks, Frenched | 1.0 | 361.578 | GBP |


In [80]:
# --- Show the length of the DataFrame:
print(f"Total Rows (Before NaN Removal): {len(order_data)}")

Total Rows (Before NaN Removal): 1000


### Step 3. Check for NaN (null) Values and Clean-Up.

Rules for cleanup:
* order_id missing: Delete row.
* order_date missing: Delete row.
* customer_name missing: Set name to "Jane Bloggs".
* item_vendor_name missing: Set name to "Unknown".
* item missing: Delete row.
* item_qty missing: Delete row.
* price_per_unit missing: Delete row.
* price_currency missing: Set to "GBP".
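
The rules above could also be expressed in one place, as in the sketch below. The function name `apply_cleanup_rules`, the two constants, and the two-row sample are hypothetical and used only for illustration; the notebook itself applies the rules step by step in the following cells.

```python
import pandas as pd

# Columns that are filled with a default instead of being dropped.
FILL_DEFAULTS = {
    "customer_name": "Jane Bloggs",
    "item_vendor_name": "Unknown",
    "price_currency": "GBP",
}

# Columns where a missing value means the whole row is deleted.
REQUIRED_COLUMNS = ["order_id", "order_date", "item", "item_qty", "price_per_unit"]


def apply_cleanup_rules(df: pd.DataFrame) -> pd.DataFrame:
    """Return a cleaned copy of df (hypothetical helper, not part of the notebook)."""
    filled = df.fillna(value=FILL_DEFAULTS)        # per-column default values
    return filled.dropna(subset=REQUIRED_COLUMNS)  # drop rows missing required data


# Hypothetical sample: row 0 only needs defaults, row 1 has no order_date.
sample = pd.DataFrame({
    "order_id": [1, 2],
    "order_date": ["2022/05/31", None],
    "customer_name": [None, "Jane Bloggs"],
    "item_vendor_name": ["Wikizz", None],
    "item": ["Lid - 3oz Med Rec", "Beans - French"],
    "item_qty": [1.0, 5.0],
    "price_per_unit": [290.921, 369.391],
    "price_currency": [None, "GBP"],
})

cleaned = apply_cleanup_rules(sample)
print(cleaned)
```

Keeping the defaults and the required columns in two small constants makes the rules easy to compare against the list above.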

#### Step 3.1. Check for NaN In All Rows / Columns.

In [81]:
order_data.isna().sum()

order_id              0
order_date           17
customer_name       108
item_vendor_name      0
item                108
item_qty             73
price_per_unit       56
price_currency      108
dtype: int64
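
The raw counts can be easier to judge next to the share of rows they represent. The sketch below shows one way to build such a summary; the sample frame and the `missing` / `missing_pct` column names are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical four-row sample with a few gaps.
sample = pd.DataFrame({
    "order_date": ["2022/05/31", None, "2021/08/25", "2021/09/01"],
    "item_qty": [1.0, 5.0, np.nan, np.nan],
})

missing_count = sample.isna().sum()                  # NaN count per column
missing_pct = (sample.isna().mean() * 100).round(1)  # share of rows, in percent

summary = pd.DataFrame({"missing": missing_count, "missing_pct": missing_pct})
print(summary)
```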

#### Step 3.2. Replace NaN Values (Where Required).

In [82]:
# --- Replace NaN in customer_name with "Jane Bloggs"
# (assign the result back rather than filling in place on a column selection):
order_data["customer_name"] = order_data["customer_name"].fillna("Jane Bloggs")

# Check for any NaN in customer_name:
print(f'customer_name NaN: {order_data["customer_name"].isna().sum()}')

customer_name NaN: 0


In [83]:
# --- Replace NaN in item_vendor_name with "Unknown"
# (assign the result back rather than filling in place on a column selection):
order_data["item_vendor_name"] = order_data["item_vendor_name"].fillna("Unknown")

# Check for any NaN in item_vendor_name:
print(f'item_vendor_name NaN: {order_data["item_vendor_name"].isna().sum()}')

item_vendor_name NaN: 0


In [84]:
# --- Replace NaN in price_currency with "GBP"
# (assign the result back rather than filling in place on a column selection):
order_data["price_currency"] = order_data["price_currency"].fillna("GBP")

# Check for any NaN in price_currency:
print(f'price_currency NaN: {order_data["price_currency"].isna().sum()}')

price_currency NaN: 0


#### Step 3.3. Remove Lines With NaN Values (Where Required).

With the replacement rules above applied, any row that still contains a NaN value can be removed.

In [85]:
order_data.dropna(inplace = True)

Check for any remaining NaN values. There should be none.

In [86]:
order_data.isna().sum()

order_id            0
order_date          0
customer_name       0
item_vendor_name    0
item                0
item_qty            0
price_per_unit      0
price_currency      0
dtype: int64

In [87]:
print(f"Total Rows (After NaN Removal): {len(order_data)}")

Total Rows (After NaN Removal): 768
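
The row count falls from 1000 to 768, so 232 rows were removed. One way to confirm a figure like this is to count the rows that contain at least one NaN before dropping them. The sketch below does that on a made-up frame; the sample data and variable names are assumptions.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: rows 1, 2 and 3 each contain at least one NaN.
sample = pd.DataFrame({
    "item": ["Maple Syrup", None, "Beans - French", None],
    "item_qty": [5.0, 1.0, np.nan, np.nan],
})

# Rows with a NaN in any column (a row with two NaN values is counted once).
rows_with_nan = int(sample.isna().any(axis=1).sum())

cleaned = sample.dropna()
rows_removed = len(sample) - len(cleaned)

print(f"Rows containing NaN: {rows_with_nan}")
print(f"Rows removed:        {rows_removed}")
```

Because a single row can hold several NaN values, the number of rows removed can be lower than the sum of the per-column NaN counts.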


#### Step 3.4. Convert item_qty Column to Integer From Float.

In [88]:
print(f'item_qty Data Type (Before Conversion): {order_data["item_qty"].dtype}')
print(f'item_qty (Before Conversion)\n{order_data["item_qty"].head()}')

order_data["item_qty"] = order_data["item_qty"].convert_dtypes(convert_integer=True)

print(f'\nitem_qty Data Type (After Conversion): {order_data["item_qty"].dtype}')
print(f'item_qty (After Conversion)\n{order_data["item_qty"].head()}')

item_qty Data Type (Before Conversion): float64
item_qty (Before Conversion)
0    1.0
1    5.0
2    5.0
3    5.0
4    1.0
Name: item_qty, dtype: float64

item_qty Data Type (After Conversion): Int64
item_qty (After Conversion)
0    1
1    5
2    5
3    5
4    1
Name: item_qty, dtype: Int64
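
An alternative to `convert_dtypes` is an explicit cast with `astype("Int64")`, guarded by a check that every quantity is a whole number. The sketch below uses a small stand-in Series rather than the real column, and the guard is an added assumption, not something the notebook does.

```python
import pandas as pd

# Stand-in for order_data["item_qty"] after the NaN rows have been removed.
qty = pd.Series([1.0, 5.0, 5.0, 5.0, 1.0], name="item_qty")

# Guard: refuse to convert if any quantity has a fractional part.
if not (qty % 1 == 0).all():
    raise ValueError("item_qty contains fractional quantities")

# Explicit cast to pandas' nullable integer type.
qty_int = qty.astype("Int64")

print(qty_int.dtype)
print(qty_int.head())
```

The explicit cast states the target type directly, whereas `convert_dtypes` infers it from the values.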


#### Step 3.5. Round price_per_unit to Two Decimal Places.