# Assignment (real project name to be here)

## Business understanding

Define the business problem that needs to be solved.
- What is the goal of the analysis?
- What are the requirements and constraints?
- What is the expected outcome?




This notebook presents a solution to the assignment, which focuses on optimizing a drone delivery company's operations. The project is divided into two main parts:

- Clustering: identifying optimal locations for drone delivery hubs using k-means and hierarchical clustering.
- Association rule mining: discovering relationships between product groups to enhance sales through targeted recommendations.

We will follow the Cross-Industry Standard Process for Data Mining (CRISP-DM) model to structure our analysis.

A drone delivery company aims to improve its operational efficiency and increase revenue. The key business objectives are:

- Minimize delivery times and costs: by strategically placing a set of drone hubs (depots), the company can reduce the average travel distance and time per delivery, which lowers energy costs and speeds up service. The goal is to find the optimal coordinates for these hubs.
- Increase sales revenue: by understanding customer purchasing patterns, the company can create targeted marketing campaigns, product bundles, and personalized recommendations. The goal is to identify which product groups are frequently bought together and use them for cross-selling.

In [14]:
# hello world

## Data understanding

Collect and explore the data.
- What data is available? What are the characteristics of the data (variable types, value distributions etc.)?
- Are there any quality issues with the data (missing values, outliers, nonsensical values)?

In [15]:
# Import necessary libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import time

from sklearn.cluster import KMeans, AgglomerativeClustering
# from mlxtend.frequent_patterns import apriori, association_rules

# Set plot style for better visuals
sns.set_style('whitegrid')
plt.rcParams['figure.figsize'] = (10, 6)

Init 1

In [16]:
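# Load the customer location data
# NOTE: this uses the default comma separator; the output below shows a single
# 'clientid;x;y' column, so the file is semicolon-delimited (pass sep=';' to split it)
locations_df = pd.read_csv('./droneData/drone_cust_locations.csv')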

# Display basic information and the first few rows
print("Customer Locations Data Info:")
locations_df.info()
print("\nFirst 5 rows:")
print(locations_df.head())
print("\nDescriptive Statistics:")
print(locations_df.describe())

Customer Locations Data Info:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5956 entries, 0 to 5955
Data columns (total 1 columns):
 #   Column        Non-Null Count  Dtype 
---  ------        --------------  ----- 
 0   clientid;x;y  5956 non-null   object
dtypes: object(1)
memory usage: 46.7+ KB

First 5 rows:
                clientid;x;y
0  1;622.7715723;164.8576227
1  2;416.3572979;630.1936343
2  3;292.7350197;567.3332306
3  4;737.2112881;166.2256759
4   5;540.4753747;682.912298

Descriptive Statistics:
                     clientid;x;y
count                        5956
unique                       5956
top     1;622.7715723;164.8576227
freq                            1
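The output above shows a single `object` column named `clientid;x;y`, which suggests the file is semicolon-delimited and was read with the default comma separator. A minimal sketch of the fix follows, assuming `sep=';'` is the only change needed. It reads the first rows copied from the output above from an in-memory buffer so it runs on its own; in the notebook the same argument would be passed when reading the CSV file.

```python
import io

import pandas as pd

# First rows copied from the printed output above; stands in for the real CSV file.
sample = io.StringIO(
    "clientid;x;y\n"
    "1;622.7715723;164.8576227\n"
    "2;416.3572979;630.1936343\n"
    "3;292.7350197;567.3332306\n"
)

# sep=';' makes pandas split each line on semicolons instead of commas
locations_df = pd.read_csv(sample, sep=';')

print(locations_df.dtypes)
print(locations_df.head())
```

With the columns split, `describe()` reports numeric statistics for `x` and `y` instead of string counts.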


Init 2

In [17]:
# Load the product group data
products_df = pd.read_csv('./droneData/drone_prod_groups.csv')

# Display basic information and the first few rows
print("\nProduct Group Data Info:")
products_df.info()
print("\nFirst 5 rows:")
print(products_df.head())


Product Group Data Info:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 100000 entries, 0 to 99999
Data columns (total 21 columns):
 #   Column   Non-Null Count   Dtype
---  ------   --------------   -----
 0   ID       100000 non-null  int64
 1   Prod1    100000 non-null  int64
 2    Prod2   100000 non-null  int64
 3    Prod3   100000 non-null  int64
 4    Prod4   100000 non-null  int64
 5    Prod5   100000 non-null  int64
 6    Prod6   100000 non-null  int64
 7    Prod7   100000 non-null  int64
 8    Prod8   100000 non-null  int64
 9    Prod9   100000 non-null  int64
 10   Prod10  100000 non-null  int64
 11   Prod11  100000 non-null  int64
 12   Prod12  100000 non-null  int64
 13   Prod13  100000 non-null  int64
 14   Prod14  100000 non-null  int64
 15   Prod15  100000 non-null  int64
 16   Prod16  100000 non-null  int64
 17   Prod17  100000 non-null  int64
 18   Prod18  100000 non-null  int64
 19   Prod19  100000 non-null  int64
 20   Prod20  100000 non-null  int64
dtypes: int64
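In the listing above, the column names from `Prod2` onward appear with a leading space, which would make lookups such as `products_df['Prod2']` fail. The sketch below shows one way to normalize the names and run a basic sanity check. The sample rows are made up for illustration, and the assumption that each product column is a 0/1 indicator is not confirmed by the output above.

```python
import io

import pandas as pd

# Hypothetical sample that mimics the header style seen above (space after each comma).
sample = io.StringIO(
    "ID,Prod1, Prod2, Prod3\n"
    "1,0,1,0\n"
    "2,1,1,0\n"
    "3,0,0,1\n"
)
products_df = pd.read_csv(sample)

# Remove leading/trailing whitespace from every column name
products_df.columns = products_df.columns.str.strip()

# Assumption: product columns are binary indicators (1 = group present in the order)
prod_cols = [c for c in products_df.columns if c.startswith('Prod')]
is_binary = bool(products_df[prod_cols].isin([0, 1]).all().all())

print(list(products_df.columns))
print("All product columns are 0/1:", is_binary)
```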

## Data preparation

Data preprocessing:
- cleaning the data
- transforming the data
- selecting the relevant features

In [18]:
# hello world

## Modeling

Choose a machine learning method and train the model (including model validation).
- Which method was used?
- Which parameters were used?
- What was the performance of the model?
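As a starting point for the clustering part, the sketch below places hubs at cluster centroids using both k-means and agglomerative (Ward) clustering, then compares run time and mean customer-to-hub distance. It is only an illustration: the coordinates are synthetic, and `n_hubs = 3` and the helper `mean_distance_to_hub` are hypothetical choices, not values from the assignment.

```python
import time

import numpy as np
from sklearn.cluster import AgglomerativeClustering, KMeans

# Synthetic stand-in for customer coordinates: three well-separated groups
rng = np.random.default_rng(42)
centers = np.array([[200.0, 200.0], [600.0, 300.0], [400.0, 700.0]])
coords = np.vstack([c + rng.normal(scale=30.0, size=(100, 2)) for c in centers])

n_hubs = 3  # hypothetical number of hubs


def mean_distance_to_hub(points, labels, hubs):
    """Average Euclidean distance from each point to its assigned hub."""
    return float(np.linalg.norm(points - hubs[labels], axis=1).mean())


# k-means: the fitted cluster centers serve directly as hub coordinates
start = time.perf_counter()
kmeans = KMeans(n_clusters=n_hubs, n_init=10, random_state=42)
kmeans_labels = kmeans.fit_predict(coords)
kmeans_time = time.perf_counter() - start
kmeans_hubs = kmeans.cluster_centers_

# Agglomerative clustering has no centers attribute, so hubs are the cluster means
start = time.perf_counter()
agglo_labels = AgglomerativeClustering(n_clusters=n_hubs, linkage='ward').fit_predict(coords)
agglo_time = time.perf_counter() - start
agglo_hubs = np.vstack([coords[agglo_labels == k].mean(axis=0) for k in range(n_hubs)])

kmeans_dist = mean_distance_to_hub(coords, kmeans_labels, kmeans_hubs)
agglo_dist = mean_distance_to_hub(coords, agglo_labels, agglo_hubs)

print(f"k-means:       {kmeans_time:.3f} s, mean distance {kmeans_dist:.1f}")
print(f"agglomerative: {agglo_time:.3f} s, mean distance {agglo_dist:.1f}")
```

On the real data, `coords` would be the `x` and `y` columns of the customer locations table once it has been parsed into separate columns.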

In [19]:
# hello world
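For the association rule part, the notebook's imports reference mlxtend's `apriori` and `association_rules` (currently commented out). The sketch below does not use them. It computes support, confidence, and lift for product pairs only, in plain pandas, to show what those metrics mean. The toy baskets and the `min_support` threshold are hypothetical.

```python
from itertools import combinations

import pandas as pd

# Hypothetical toy baskets: True means the product group appears in the order
baskets = pd.DataFrame(
    {
        "Prod1": [1, 1, 0, 1, 0, 1],
        "Prod2": [1, 1, 0, 1, 0, 0],
        "Prod3": [0, 0, 1, 0, 1, 1],
    }
).astype(bool)

min_support = 0.3  # hypothetical threshold

rows = []
for a, b in combinations(baskets.columns, 2):
    # support(A, B): share of orders that contain both groups
    support_ab = (baskets[a] & baskets[b]).mean()
    if support_ab < min_support:
        continue
    for antecedent, consequent in ((a, b), (b, a)):
        # confidence(A -> B) = support(A, B) / support(A)
        confidence = support_ab / baskets[antecedent].mean()
        # lift(A -> B) = confidence(A -> B) / support(B); above 1 means positive association
        lift = confidence / baskets[consequent].mean()
        rows.append(
            {
                "antecedent": antecedent,
                "consequent": consequent,
                "support": support_ab,
                "confidence": confidence,
                "lift": lift,
            }
        )

rules = pd.DataFrame(rows).sort_values("lift", ascending=False)
print(rules)
```

A full Apriori run would also consider itemsets with more than two groups; this pairwise version is only meant to make the metrics concrete.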

## Evaluation

Evaluate the model.
- How well does the model perform?
- Does it meet the business requirements?
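One way to answer the performance question for the clustering part is to compare candidate hub counts with the silhouette score. The sketch below does this on synthetic coordinates. The range of `k` values and the data are assumptions for illustration, and a business constraint such as cost per hub may override the statistically best `k`.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Synthetic stand-in for customer coordinates: three well-separated groups
rng = np.random.default_rng(42)
centers = np.array([[200.0, 200.0], [600.0, 300.0], [400.0, 700.0]])
coords = np.vstack([c + rng.normal(scale=30.0, size=(100, 2)) for c in centers])

# Silhouette ranges from -1 to 1; higher means tighter, better separated clusters
scores = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=42).fit_predict(coords)
    scores[k] = silhouette_score(coords, labels)

best_k = max(scores, key=scores.get)

for k, score in scores.items():
    print(f"k={k}: silhouette={score:.3f}")
print("Best k by silhouette:", best_k)
```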

In [20]:
# hello world

## Deployment

Conclusion: create a recommendation for how to use the model in practice, or what to do next.
- How will the model be used in practice?
- How will the results be communicated?

In [21]:
# hello world

### Reflection

#### AI usage
- for research

#### Team contribution
- who did what

#### Sources
- links & descriptions