# Customer Identification

## 1. Project Overview
- Basic project information
- Initial setup with common data science libraries


## 2. Business Understanding
**Goal:** Define the problem and business objectives.
- What is the problem we’re solving?
- What are the business goals and success criteria?
- Who are the stakeholders?
- Business objectives
- Situation assessment
- Data mining goals
- Project plan

You have been hired by **Ebuy Emporium**, a new e-commerce startup. The company has been up and running for a month, has had unexpected success, and is now taking an active interest in its customer base.

**Questions:**
1. Who are their customers?
1. What do they buy?
1. What drives their purchasing behavior?
1. How many customers do they have?

However, before they do any serious analysis, they need to be able to count their customers, which turns out to be more difficult than anticipated.

## 3. Data Understanding
**Goal:** Explore and assess the data.
- What data is available?
- Where does the data come from?
- Data collection, description, and quality verification
- Initial data exploration (size, structure, missing values, distributions)
- Placeholder code for basic EDA
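
As a placeholder for the basic EDA step, the following is a minimal sketch using pandas. The `purchases` frame here is a tiny hypothetical sample built inline (its values are invented for illustration, and only a few of the columns are included); in practice it would be loaded from wherever the transaction data actually lives.

```python
import pandas as pd

# Hypothetical sample standing in for the real purchases table.
purchases = pd.DataFrame(
    {
        "event_time": ["2024-01-05 10:15:00", "2024-01-05 10:15:00", "2024-01-06 18:40:00"],
        "product_id": [101, 102, 101],
        "price": [19.99, 5.50, 19.99],
        "session_id": ["s1", "s1", "s2"],
        "customer_id": [1.0, 1.0, None],
        "guest_first_name": [None, None, "Ana"],
    }
)
purchases["event_time"] = pd.to_datetime(purchases["event_time"])

# Size and structure
n_rows, n_cols = purchases.shape
print(f"{n_rows} rows x {n_cols} columns")
print(purchases.dtypes)

# Missing values per column
missing_per_column = purchases.isna().sum()
print(missing_per_column)

# Distribution of a numeric column and number of distinct sessions
price_summary = purchases["price"].describe()
n_sessions = purchases["session_id"].nunique()
print(price_summary)
print(f"{n_sessions} distinct sessions")
```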

**One problem is that there are multiple sources of customer data:**

1. The e-commerce platform’s customer database
    - where customer details are recorded when they sign up for an account online.
    - This is where most of the customer details should be found.

1. The in-house CRM (customer relationship management) system
    - where customer details are recorded when they make a purchase over the phone or are otherwise onboarded as customers (by any route other than purchasing online with a registered account).

1. The raw transaction data (sales/purchases)
    - which we'll refer to as “purchases” or “sales.”
    - This data also contains purchases made “as a guest,” meaning no customer record is explicitly created at the time of purchase.


**Another problem is that the existing data sources may not be mutually exclusive: the same customer might appear in more than one of them.**
- There is almost certainly some duplication, either because:
    - the same customer had their details entered into multiple systems
    - or because they have made purchases both as a guest and with a registered account.
- Duplicate accounts may not contain exactly the same information; there may be typos or misspellings. 
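
Because duplicates may differ by typos or misspellings, exact string comparison will miss some of them. One possible approach, shown here only as a sketch, is approximate string matching with the standard library's `difflib.SequenceMatcher`. The record layout (`first_name`, `surname`, `postcode`), the helper names, and the `0.85` threshold are all assumptions made for illustration, not part of the project's data or method.

```python
from difflib import SequenceMatcher


def name_similarity(a: str, b: str) -> float:
    """Return a similarity ratio in [0, 1], ignoring case and outer whitespace."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def is_probable_duplicate(rec_a: dict, rec_b: dict, threshold: float = 0.85) -> bool:
    """Hypothetical rule: same postcode and a sufficiently similar full name."""
    postcode_a = rec_a["postcode"].replace(" ", "").upper()
    postcode_b = rec_b["postcode"].replace(" ", "").upper()
    full_a = f"{rec_a['first_name']} {rec_a['surname']}"
    full_b = f"{rec_b['first_name']} {rec_b['surname']}"
    return postcode_a == postcode_b and name_similarity(full_a, full_b) >= threshold


# Invented example records: the first two differ by a one-letter misspelling.
account = {"first_name": "Jonathan", "surname": "Smith", "postcode": "AB1 2CD"}
guest = {"first_name": "Jonathon", "surname": "Smith", "postcode": "ab12cd"}
other = {"first_name": "Maria", "surname": "Lopez", "postcode": "AB1 2CD"}

print(is_probable_duplicate(account, guest))
print(is_probable_duplicate(account, other))
```

A threshold like this trades missed duplicates against false merges, so it would need to be tuned against manually reviewed examples.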


| **Column** | **Definition** |
| ------ | ---------- |
| **event_time** | The exact date and time the purchase occurred. |
| **product_id** | The unique identifier of the purchased product. |
| **category_id** | The unique identifier of the purchased product's specific category. |
| **category_code** | A broad category for the purchased product. Category codes sit above category IDs in the hierarchy: one category_code can contain multiple category IDs, but each category_id should be linked to only one category_code. |
| **brand** | The purchased item's brand (if applicable). |
| **price** | The price the item was bought for (in USD). |
| **session_id** | A unique identifier for a purchase session. If multiple items are purchased in a transaction, each item will have a row in the table, and the rows will share a session_id. |
| **customer_id** | The unique identifier of the customer if they purchased using a registered account. For guest purchases, this value will be missing. |
| **guest_first_name** | The first name that was supplied if a purchase was made as a guest. For purchases made using registered accounts, this value will be missing. |
| **guest_surname** | The surname that was supplied if a purchase was made as a guest. For purchases made using registered accounts, this value will be missing. |
| **guest_postcode** | The postcode that was supplied if a purchase was made as a guest. For purchases made using registered accounts, this value will be missing. |
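
Several definitions in the table imply rules that can be checked directly: each `category_id` should map to one `category_code`, `customer_id` and the guest fields should not both be populated on the same row, and rows in one transaction share a `session_id`. The sketch below runs those checks with pandas on a small invented sample; the data values are hypothetical.

```python
import pandas as pd

# Hypothetical sample with only the columns needed for the checks.
purchases = pd.DataFrame(
    {
        "category_id": [10, 10, 11, 12],
        "category_code": ["electronics", "electronics", "electronics", "apparel"],
        "session_id": ["s1", "s1", "s2", "s3"],
        "customer_id": [7.0, 7.0, None, None],
        "guest_first_name": [None, None, "Ana", "Ben"],
        "guest_surname": [None, None, "Diaz", "Cole"],
        "guest_postcode": [None, None, "2000", "3000"],
    }
)

# Rule 1: each category_id should be linked to exactly one category_code.
codes_per_category = purchases.groupby("category_id")["category_code"].nunique()
ambiguous_categories = codes_per_category[codes_per_category > 1]

# Rule 2: a row should carry either a customer_id or guest details, not both
# and not neither.
guest_cols = ["guest_first_name", "guest_surname", "guest_postcode"]
has_account = purchases["customer_id"].notna()
has_guest_details = purchases[guest_cols].notna().any(axis=1)
conflicting_rows = purchases[has_account == has_guest_details]

# Rule 3: items bought together share a session_id.
items_per_session = purchases.groupby("session_id").size()

print(f"category_ids with more than one code: {len(ambiguous_categories)}")
print(f"rows violating the account/guest rule: {len(conflicting_rows)}")
print(items_per_session)
```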

## 4. Data Preparation
**Goal:** Clean, transform, and structure the data for modeling.
- Handle missing values and duplicates
- Engineer new features if necessary
- Normalize, encode, or scale data as needed
- Data cleaning
- Feature engineering
- Data transformation
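
For this project, one plausible cleaning step is normalizing names and postcodes so that trivially different spellings compare equal. The helpers below are a sketch under that assumption; the function names and the specific rules (lowercasing, stripping accents, removing postcode spaces) are illustrative choices, and they handle `None` but not other missing-value markers.

```python
import unicodedata


def normalize_name(value):
    """Lowercase, strip accents, and collapse whitespace; None becomes ''."""
    if value is None:
        return ""
    decomposed = unicodedata.normalize("NFKD", str(value))
    without_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(without_accents.lower().split())


def normalize_postcode(value):
    """Remove all whitespace and uppercase; None becomes ''."""
    if value is None:
        return ""
    return "".join(str(value).split()).upper()


def guest_key(first_name, surname, postcode):
    """Build a comparable key from the three guest fields."""
    return (
        normalize_name(first_name),
        normalize_name(surname),
        normalize_postcode(postcode),
    )


# Two spellings of the same invented guest produce the same key.
key_a = guest_key("  José ", "DIAZ", "sw1a 1aa")
key_b = guest_key("jose", "Diaz ", "SW1A1AA")
print(key_a == key_b)
```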

## 5. Modeling
**Goal:** Select and apply appropriate models.
- What models will be tested?
- Define the target variable and evaluation metrics.
- Train, tune, and compare models.
- Model selection
- Model building
- Model assessment
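
Since the immediate business question is how many customers there are, one candidate approach (an assumption here, not a chosen model) is to treat record matching as a graph problem: records are nodes, matched pairs are edges, and each connected group counts as one customer. The sketch below uses a small union-find structure; the record identifiers and matched pairs are invented.

```python
def count_distinct_customers(record_ids, matched_pairs):
    """Count connected groups of records, given pairs judged to be the same person."""
    parent = {rid: rid for rid in record_ids}

    def find(x):
        # Follow parent links to the group's root, shortening the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in matched_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    return len({find(rid) for rid in record_ids})


# Hypothetical records from the three sources, prefixed by origin.
records = ["web:1", "web:2", "crm:9", "guest:a", "guest:b"]
matches = [("web:1", "crm:9"), ("crm:9", "guest:a")]

n_customers = count_distinct_customers(records, matches)
print(n_customers)
```

Grouping transitively means one false match can merge two real customers, so the quality of the matched pairs matters more than the grouping step itself.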

## 6. Evaluation
**Goal:** Assess model performance and business impact.
- Review model accuracy, precision, recall, etc.
- Compare results to business objectives.
- Identify any risks or limitations.
- Results evaluation
- Process review
- Next steps determination
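
If the project ends up matching records across sources, precision and recall can be computed over matched pairs, provided a manually labeled set of true duplicate pairs exists. The sketch below assumes such a labeled set; the function name and the sample pairs are hypothetical.

```python
def pair_metrics(predicted_pairs, true_pairs):
    """Return (precision, recall) over unordered record pairs."""
    predicted = {frozenset(pair) for pair in predicted_pairs}
    actual = {frozenset(pair) for pair in true_pairs}
    true_positives = len(predicted & actual)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(actual) if actual else 0.0
    return precision, recall


# Invented example: one of two predicted pairs is correct,
# and one of two true pairs was found.
predicted = [("web:1", "crm:9"), ("web:2", "guest:b")]
labeled = [("crm:9", "web:1"), ("guest:a", "guest:c")]

precision, recall = pair_metrics(predicted, labeled)
print(precision, recall)
```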

## 7. Deployment
**Goal:** Implement the model in production.
- Deployment plan: How will the model be deployed (batch, API, etc.)?
- Monitoring and maintenance: What ongoing checks and updates are needed?
- Final project report: Document outcomes, lessons learned, and future improvements.