# Project plan

- Author: Iina Pirinen
- Version: 0.9
- Date: 20.10.2023 

## Assignment

**Data Analysis for Car Dealership Optimization**

## Background and starting points

This project is part of the AI/DA (Artificial Intelligence/Data Analytics) course TTC8070-3005 at JAMK University of Applied Sciences. The assignment is to analyze the "US Used cars" dataset.

The project group consists of JAMK students who apply data mining and analytics techniques to gain insights into the used car market.

The project follows the CRISP-DM (Cross Industry Standard Process for Data Mining) model, which provides a structured framework for data analysis.

The project has an "imaginary customer". Although this customer exists only conceptually, it drives the project's goals and objectives.


## Project Goals

1. Business Understanding: The primary goal in this phase is to understand the business objectives and questions. Key questions to address include:

    - What business goals are we aiming to achieve?
    - What specific business area questions need answers?
    - What results are we expecting from the project?
    - How can we measure the success of the project?
    - Who is the customer, and who else will benefit from the results?
    - What technologies and personnel skills are required?
    - How will work be divided among team members?

2. Data Understanding: In this phase, we will focus on gaining a comprehensive understanding of the dataset provided. Key considerations include:

    - Data collection methods and sources.
    - Detailed description of the dataset's contents.
    - Initial data exploration and visualization.
    - Evaluation of data quality, including detecting missing values and outliers.
    - Data selection, which may involve identifying relevant features.
    - Exploration of potential additional data sources and how they can be integrated.

3. Data Preparation: Data preparation is crucial to ensure that the dataset is of sufficient quality for modeling. This phase includes:

    - Data cleaning and editing.
    - Data integration and formatting.
    - Handling missing values and outliers.
    - Feature selection and engineering.
    - Data scaling (standardization or normalization).
    - Visualizing data after preprocessing.

4. Modeling: In the modeling phase, we will build predictive models based on the preprocessed data. Key steps include:

    - Selection of modeling methods (classification or regression).
    - Building and training machine learning models.
    - Model evaluation and testing.
    - Visualization of model results.
    - Analysis of results for each model.
    - Experimentation with different hyperparameter values.

5. Evaluation: After modeling, the project's results will be evaluated to determine their effectiveness and areas for improvement. Key points include:

    - Assessment of the machine/deep learning models' performance.
    - Comparison of different modeling methods.
    - Identification of areas for model development and optimization.
    - Consideration of further improvements and task-specific insights.

6. Deployment: In the deployment phase, we will plan how to put the model into production for real-world use. Considerations include:

    - Developing a production plan.
    - Creating a prototype for the production model.
    - Planning for monitoring and maintenance (even if not implemented).
    - Production server selection and model upload.
    - Handling new data on the production server.
    - Designing a user interface for the application.
    - Defining the user base and usage scenarios.
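
As a rough illustration of the data understanding and data preparation steps listed above, the sketch below counts missing values, imputes the median, removes outliers with the interquartile-range (IQR) rule and standardizes a column. It is only a sketch under stated assumptions: the column names `price_k` and `mileage_k` and all values are hypothetical, the records are plain dictionaries, and only the Python standard library is used. In the project itself, pandas (`DataFrame.isna().sum()`) and scikit-learn (`SimpleImputer`, `StandardScaler`) would be natural equivalents.

```python
import math
import statistics


def is_missing(value):
    """Treat None and float NaN as missing."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def missing_counts(records, columns):
    """Data understanding: count missing values per column."""
    return {col: sum(1 for r in records if is_missing(r.get(col))) for col in columns}


def impute_median(values):
    """Data preparation: replace missing values with the median of the observed ones."""
    observed = [v for v in values if not is_missing(v)]
    med = statistics.median(observed)
    return [med if is_missing(v) else v for v in values]


def remove_iqr_outliers(values, k=1.5):
    """Keep only values inside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if low <= v <= high]


def standardize(values):
    """Z-score scaling: subtract the mean and divide by the population standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]


# Hypothetical records (price and mileage in thousands); real columns come from the dataset.
prices_k = [10.0, 11.0, 12.0, None, 14.0, 15.0, 16.0, 100.0]
mileages_k = [90.0, 80.0, 70.0, 60.0, float("nan"), 40.0, 30.0, 20.0]
cars = [{"price_k": p, "mileage_k": m} for p, m in zip(prices_k, mileages_k)]

print(missing_counts(cars, ["price_k", "mileage_k"]))
prices = impute_median([car["price_k"] for car in cars])
cleaned = remove_iqr_outliers(prices)
scaled = standardize(cleaned)
print(cleaned)
```

In a real workflow, the median, quartiles, mean and standard deviation should be computed on the training data only and then reused for the validation and test data, so that no information leaks from the evaluation sets.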

The project will progress through these CRISP-DM phases, each building on the insights and outputs of the previous phase. Milestones will be set to track progress and ensure a structured approach to the project.
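
To illustrate how the modeling phase builds on prepared data and how the evaluation phase builds on the trained models, the sketch below tunes the hyperparameter k of a k-nearest-neighbours regressor on a validation set, then compares the tuned model with a baseline that always predicts the mean training price, using MAE, RMSE and R² on a held-out test set. k-nearest neighbours is only an example method chosen for brevity, not one fixed by this plan, and the mileage/price data are synthetic and hypothetical.

```python
import math
import statistics


def knn_predict(train_x, train_y, x, k):
    """k-nearest-neighbours regression on one numeric feature: average the k closest targets."""
    nearest = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x))[:k]
    return statistics.fmean([train_y[i] for i in nearest])


def regression_metrics(y_true, y_pred):
    """MAE, RMSE and coefficient of determination (R^2) for one set of predictions."""
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = statistics.fmean([abs(e) for e in errors])
    rmse = math.sqrt(statistics.fmean([e * e for e in errors]))
    mean_true = statistics.fmean(y_true)
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - sum(e * e for e in errors) / ss_tot
    return {"mae": mae, "rmse": rmse, "r2": r2}


def true_price(mileage_k):
    """Hypothetical synthetic rule: price (USD) drops by 100 per 1,000 miles."""
    return 29000 - 100 * mileage_k


train_x = [10, 20, 30, 40, 50, 60, 70, 80]
val_x = [15, 35, 55, 75]
test_x = [25, 45, 65]
train_y = [true_price(x) for x in train_x]
val_y = [true_price(x) for x in val_x]
test_y = [true_price(x) for x in test_x]

# Modeling: try several values of k and keep the one with the lowest validation MAE.
val_mae = {}
for k in (1, 2, 4):
    preds = [knn_predict(train_x, train_y, x, k) for x in val_x]
    val_mae[k] = regression_metrics(val_y, preds)["mae"]
best_k = min(val_mae, key=val_mae.get)

# Evaluation: compare the tuned model with the mean-price baseline on the test set.
baseline = statistics.fmean(train_y)
report = {
    "baseline_mean": regression_metrics(test_y, [baseline] * len(test_y)),
    f"knn_k{best_k}": regression_metrics(
        test_y, [knn_predict(train_x, train_y, x, best_k) for x in test_x]
    ),
}
for name, metrics in report.items():
    print(name, {m: round(v, 3) for m, v in metrics.items()})
```

Because the synthetic data contain no noise, the tuned model scores perfectly here; on the real dataset the useful question is how far a model beats the baseline. Keeping the test set out of hyperparameter selection keeps that final comparison honest.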

The project will deliver a final report that includes a self-assessment of the project's implementation, a maintenance plan, and practical steps for deploying the model in a real-world context.
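
For the deployment steps (model upload and handling new data on the production server), a minimal starting point could be to export the trained model's parameters to a file, load them on the server and validate each incoming record before predicting. The sketch assumes a hypothetical linear price model with made-up coefficients and features (`mileage`, `age_years`) and uses JSON for the artifact; a real deployment might use another serialization format and a web framework for the user interface.

```python
import json
import os
import tempfile

# Hypothetical trained model exported as plain parameters: feature order, coefficients, intercept.
MODEL = {"features": ["mileage", "age_years"], "coef": [-0.125, -1000.0], "intercept": 30000.0}


def save_model(model, path):
    """'Upload' step: write the model parameters to a JSON artifact."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(model, f)


def load_model(path):
    """Server start-up: read the artifact back."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def predict_price(model, record):
    """Handle one new record: check the required features, then apply the linear model."""
    missing = [name for name in model["features"] if record.get(name) is None]
    if missing:
        raise ValueError(f"missing features: {missing}")
    values = [float(record[name]) for name in model["features"]]
    return model["intercept"] + sum(c * v for c, v in zip(model["coef"], values))


path = os.path.join(tempfile.mkdtemp(), "price_model.json")
save_model(MODEL, path)
served_model = load_model(path)
print(predict_price(served_model, {"mileage": 40000, "age_years": 4}))  # 21000.0
```

Rejecting incomplete records up front keeps bad input from silently producing a price, which also gives the monitoring plan something concrete to log.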


## Terms and definitions

| Term | Definition |
| :-- | :-- |
| Cash Flow | the movement of money in and out of a business, which can be impacted by inventory holding times and pricing strategies |
| CRISP-DM | Cross Industry Standard Process for Data Mining, a structured approach for data analysis projects, with well-defined stages |
| Customer Satisfaction | providing a positive experience for buyers |
| Data Mining | the process of extracting patterns, insights, or knowledge from large datasets |
| Data Preprocessing | tasks such as cleaning, transformation, and feature engineering that prepare the dataset for analysis |
| Data Scaling | standardizing or normalizing the numerical values in the dataset so that they are on a consistent scale |
| Data Visualization | the creation of graphical representations of data to help understand patterns, relationships, and trends |
| Dataset | "US Used cars" dataset |
| Feature Selection | choosing the most relevant features or variables from a dataset to improve model performance |
| HTML | Hypertext Markup Language, the standard language used for creating web pages and documents |
| Hyperparameters | parameters set before training a machine learning model that influence the model's behavior |
| Inventory Management | the process of overseeing and controlling the supply, storage, and accessibility of inventory items |
| Market Segment | a group of consumers with similar characteristics or preferences |
| Model Evaluation | the process of assessing the performance of a machine learning model using various metrics |
| Outliers | data points or observations that significantly deviate from the majority of the data in a dataset |
| Prediction Model | a machine learning model designed to make predictions or forecasts based on input data |
| Team | the project team working on the project |

The terms (columns) related to the dataset are described in detail in the [Data description report](../2-data-understanding/data-description-report.ipynb).
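
The two forms of data scaling mentioned in the glossary are commonly defined as follows, where $x$ is a value of a numeric feature, $\mu$ and $\sigma$ are that feature's mean and standard deviation, and $x_{\min}$ and $x_{\max}$ are its minimum and maximum, all ideally computed on the training data:

$$
z = \frac{x - \mu}{\sigma} \qquad \text{(standardization)}
$$

$$
x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}} \qquad \text{(min-max normalization)}
$$

For example, with hypothetical values $\mu = 20\,000$ and $\sigma = 4\,000$, a car priced at $18\,000$ gets $z = (18\,000 - 20\,000)/4\,000 = -0.5$, i.e. half a standard deviation below the mean price.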

## Project organization

| Name | Role | Company/Community |
|:--|:-:|:-:|
| Aleksandr | Team member | Team 1 |
| Anthony Bäckström | Team member | Team 1 |
| Iiro | Team member | Team 1 |
| Iina | Team member | Team 1 |
| Teacher A | Support | JAMK |
| Teacher B | Support | JAMK |
| Employee A | Customer | Company A |

## Project timeline

Preliminary schedule for implementing the project phases.

![Project timeline (Gantt chart)](./timeline_gantt.png)

**Gate 1**

- 9.11.2023
- Steps 1-3 (Business Understanding, Data Understanding, Data Preparation) must be completed.
- An interim seminar will be held to present the results of the project so far.

**Gate 2**

- 12.12.2023
- Steps 4-6 (Modeling, Evaluation, Deployment) must be completed.
- A final seminar will be held to present the project's final results.