# Final report

## Introduction

This is the final phase of the project. In this report, we present the steps taken during the project and the results achieved. The report also outlines next steps for the application developed during the project.

![](./img/logo.png)

## Project Steps

### Business understanding and Data understanding

In the first two steps, we defined the purpose of the project and why it was needed. We mapped the customer's needs to understand what the customer actually wants, so that the application could be developed in the right direction. We also examined the material so that we could handle it correctly in the subsequent steps.

A thorough examination of the available data was conducted, resulting in a comprehensive Data Description Report. This step was essential for assessing the cleanliness and adequacy of the data and for ensuring its suitability for subsequent analysis.

The first two phases went well, and everyone on the team had a clear understanding of the project's purpose.

### Data preparation

Each team member was assigned a topic to research: "Impact of Sales Area on Vehicle's Age, Type, and Price", "Cars That Sell the Worst", "Effect on Price for Vehicles Used as Taxis", and "Identifying Dominant Features for Sales". During this phase, every team member gained valuable experience in processing and visualizing large amounts of data, and also learned what information the dataset contains and how it could be used in the next steps.

### Modeling

In the modeling phase, we tested different models to find out which machine learning model fits this data best. During this phase, we still had to return to data preparation, because working on the models deepened our understanding of the data and helped us identify the features that had the most impact on the model. The modeling phase produced very good results. Optuna hyperparameter optimization was used to fine-tune the model so that it captures the underlying patterns in the data as well as possible. The final model predicts the price of a car with more than 95% accuracy. We are satisfied with the current accuracy of the model; however, a model is never truly finished and could always be improved.

![](./img/results-model.png)

### Evaluation

The models were rigorously evaluated to determine which one best aligned with the business objectives. The Model Evaluation process provided insights into the effectiveness of the developed models, allowing for informed decisions on model selection.
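
The report does not state which metric the accuracy figure is based on. For a regression task such as price prediction, common choices are the coefficient of determination (R²) and the mean absolute percentage error (MAPE), and "accuracy" is sometimes reported as 100% minus MAPE. The sketch below computes these metrics in plain Python on made-up prices; it is illustrative only and is not the team's evaluation code.

```python
from typing import Sequence


def mean_absolute_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mape(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    # Mean absolute percentage error in percent; assumes no true price is zero
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2_score(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    # 1 - (residual sum of squares / total sum of squares)
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Made-up prices for illustration only
actual = [10000.0, 15000.0, 20000.0, 25000.0]
predicted = [10500.0, 14250.0, 21000.0, 24000.0]

print(f"MAE:  {mean_absolute_error(actual, predicted):.1f}")
print(f"MAPE: {mape(actual, predicted):.2f} %")
print(f"R2:   {r2_score(actual, predicted):.4f}")
```

On these numbers, MAE is 812.5, MAPE is 4.75 % (95.25 % "accuracy" under the 100% minus MAPE convention), and R² is 0.9775, which shows why it matters to name the metric behind a percentage.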

### Deployment

The project culminated in the deployment phase, in which we reached the point of running the application locally. The application has a front end and a small back end built with the Flask framework. Flask connects the trained model to a user interface where the user enters the car's information and receives a predicted price for the car. The inputs requested in the UI were selected based on the features that our machine learning model considers most influential on price.
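
A minimal sketch of how a Flask back end can connect a model to an HTML form is shown below. The route `/predict`, the three feature names, and `DummyModel` are hypothetical placeholders; the real application would load the trained LightGBM model from disk instead (for example with `joblib.load`) and would read the features its form actually contains.

```python
from flask import Flask, jsonify, request

# Hypothetical subset of input features, in the order the model expects them
FEATURES = ["year", "mileage_km", "engine_power_kw"]


class DummyModel:
    """Placeholder for the trained model; it only needs a scikit-learn style predict()."""

    def predict(self, rows):
        # Toy formula, NOT the real model: newer, lower-mileage, stronger cars cost more
        return [20000 + 500 * (r[0] - 2015) - 0.05 * r[1] + 50 * r[2] for r in rows]


app = Flask(__name__)
model = DummyModel()  # in practice e.g. joblib.load("model.joblib"); the path is hypothetical


@app.route("/predict", methods=["POST"])
def predict():
    # Read the submitted form fields and convert them to numbers in model order
    try:
        row = [float(request.form[name]) for name in FEATURES]
    except (KeyError, ValueError):
        return jsonify(error="All features must be provided as numbers"), 400
    price = float(model.predict([row])[0])
    # Log each prediction so that it can be monitored later
    app.logger.info("prediction=%.2f inputs=%s", price, row)
    return jsonify(predicted_price=round(price, 2))
```

Assuming the file is saved as `app.py`, it can be started locally with `flask --app app run`, and the HTML form then posts its fields to `/predict`.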

Over the course of the project, we reached a minimum viable product (MVP). From here, it is straightforward to move toward a more automated process in which, for example, the model is automatically retrained when new data arrives.

![](./img/minimum-viable-product-mvp.png)
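
The automated retraining mentioned above could start from a simple rule that decides when retraining is due. A minimal sketch, assuming hypothetical thresholds (retrain when at least 1,000 new rows have arrived or the model is more than 30 days old); the training itself would reuse the existing training pipeline and is not shown.

```python
from datetime import datetime, timedelta
from typing import Optional

MIN_NEW_ROWS = 1000                  # hypothetical threshold
MAX_MODEL_AGE = timedelta(days=30)   # hypothetical threshold


def should_retrain(new_rows: int, last_trained: datetime, now: Optional[datetime] = None) -> bool:
    """Return True when enough new data has arrived or the current model is too old."""
    if now is None:
        now = datetime.now()
    return new_rows >= MIN_NEW_ROWS or now - last_trained > MAX_MODEL_AGE


last = datetime(2024, 1, 1)
print(should_retrain(1500, last, now=datetime(2024, 1, 5)))  # True: enough new rows
print(should_retrain(200, last, now=datetime(2024, 1, 5)))   # False: neither condition met
print(should_retrain(200, last, now=datetime(2024, 3, 1)))   # True: model is 60 days old
```

A scheduler (for example a daily cron job) could call this check and trigger retraining only when it returns True.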

## Technologies Used

The technologies we used during the project are listed below.

- **Optuna**: Optuna is a hyperparameter optimization framework. It automates the process of tuning hyperparameters, which was crucial in tuning our LightGBM model.
- **LightGBM**: LightGBM is a gradient boosting framework for building gradient boosting models, which are powerful algorithms for classification and regression tasks. LightGBM is known for its efficiency and speed, making it suitable for large datasets.
- **Scikit-learn**: Scikit-learn is a machine learning library for Python. It provides a wide range of tools for tasks such as classification, regression, and clustering.
- **Python**: Python is a versatile programming language that is widely used in machine learning for its simplicity and rich ecosystem of libraries. It served as the primary programming language in our project for developing machine learning models, analyzing data, and various other tasks.
- **Flask**: Flask is a web framework for Python. It can be used to deploy machine learning models as web services or APIs. It helps create a web interface for interacting with the models, which can make them accessible to users over the internet.
- **HTML**: HTML is a markup language for creating web pages. We used it to structure the content of the user interface, where users enter car information and see the predicted price.
- **CSS**: CSS (Cascading Style Sheets) is a stylesheet language for describing the presentation of a document written in HTML. We used CSS to style the HTML elements and improve the visual presentation of our web pages.

![](./img/tech.png)

## Deployment Process

### Model Deployment

The model currently runs locally on a computer. Instructions for running the application can be found in the [Deployment phase repository](https://gitlab.labranet.jamk.fi/AB7766/aida-team1/-/tree/main/6-deployment/flask-app).

Deploying the application to a server would provide a more scalable, accessible, reliable, and collaborative environment with better resource utilization and security, making it the preferable choice for a production-ready application. Deploying the application to a server would therefore be the next step in this project. We would choose a service such as CSC, because it is free for students and some team members have previously used CSC's services to host applications.

### Model Upload to Production Server

1. Selecting a Production Server

2. Preparation of the Model

    The model is already prepared: it has been trained and thoroughly tested in the development environment. All needed components (libraries, model, frameworks) should be containerized to ease deployment.

3. Environment Setup on Production Server

    The required runtime environment should be set up on the production server, including the necessary libraries, frameworks, and dependencies. The production server should match the specifications of the development environment.

4. Model Upload to Server

    Packaged model files should be transferred to the production server. This can be done using various methods, such as SCP (Secure Copy Protocol), FTP (File Transfer Protocol), or through version control systems.

5. Model Verification

    The integrity of the transferred model files on the production server should be verified to ensure that no data was corrupted during the upload.

6. Server Configuration Updates

    Server configurations should be updated to integrate the model into the production environment. This may involve modifying server settings, environment variables, or system configurations.

7. Testing the Deployed Model

    Initial tests should be conducted to ensure that the model runs correctly in the production environment. This may involve running sample predictions and evaluating performance metrics.

8. Monitoring and Logging

    Monitoring and logging mechanisms should be implemented to track the performance of the deployed model. This includes logging predictions, monitoring resource usage, and setting up alerts for potential issues.

9. Documentation and Versioning

    The deployed model should be documented, including version information, dependencies, and any specific configurations. Proper version control should be maintained to facilitate future updates and rollbacks.

10. Post-Deployment Testing

    Thorough testing after deployment should be conducted to verify that the model performs as expected in a production setting. This includes testing with real-world data and user scenarios.
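
Step 5 (model verification) can be done by comparing a checksum computed before the transfer with one computed on the server. A minimal sketch using SHA-256 from Python's standard library; the function names are illustrative, and any model file name used with them (such as `model.joblib`) is hypothetical.

```python
import hashlib
from pathlib import Path


def sha256sum(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the SHA-256 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_upload(path: Path, expected_digest: str) -> bool:
    """Return True if the transferred file matches the digest computed before upload."""
    return sha256sum(path) == expected_digest.strip().lower()
```

On the development machine, the digest of the packaged model would be computed and stored alongside it; on the server, `verify_upload` would be called with that value, and the deployment aborted if it returns False. Reading in chunks keeps memory use constant even for large model files.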

## Maintenance plan

A maintenance plan is crucial for ensuring the continued functionality, performance, and security of the application. Below is a high-level maintenance plan for the Car Price Predictor deployed in a production environment.

1. Regular Monitoring

    - Implement continuous monitoring of the application's performance, including resource usage, response times, and error rates.
    - Set up alerts for critical issues and irregularities in model predictions or system behavior.

2. Data Quality Checks

    - Regularly assess and validate the quality of incoming data to ensure that it meets the required standards for model training and inference.
    - Implement checks for missing values, outliers, and data inconsistencies.

3. Scheduled Retraining

    - Maintain a schedule for periodic model retraining to keep it up-to-date with the latest data trends and patterns.
    - Automate the retraining process and ensure that it aligns with the data collection frequency.

4. Backup and Recovery

    - Implement regular backups of critical data, configurations, and model versions.
    - Test and document the recovery process to ensure a quick and reliable restoration in case of data loss or system failures.

5. Documentation Updates

    - Keep all documentation, including user manuals, deployment guides, and system architecture documentation, up-to-date with the latest changes.
    - Document any modifications to the application's configurations or dependencies.

6. Dependency Management

    - Regularly review and update external dependencies, libraries, and frameworks to ensure that the application benefits from the latest features, bug fixes, and security patches.

7. Performance Optimization

    - Periodically assess the application's performance and identify opportunities for optimization, both in terms of code efficiency and resource utilization.
    - Implement optimizations to maintain or improve response times.

8. Scalability Planning

    - Continuously evaluate the application's scalability and plan for future growth.
    - Implement scalability measures proactively to accommodate increasing user loads or data volumes.

9. Feedback Loops

    - Establish feedback loops with end-users and stakeholders to gather insights into their experiences with the application.
    - Use feedback to identify areas for improvement and prioritize feature enhancements.

10. Environment Testing

    - Regularly test the application in a staging environment that mirrors the production setup to catch potential issues before they affect end-users.

11. Continuous Improvement

    - Foster a culture of continuous improvement by regularly assessing the effectiveness of maintenance processes and identifying opportunities for enhancement.
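
Item 2 of the plan (data quality checks) can be illustrated with a small validation function. A minimal sketch, assuming incoming records arrive as dictionaries; the field names `price` and `year` and the plausible price bounds are hypothetical, and the range check is only a simple stand-in for proper outlier detection.

```python
import math
from typing import Iterable, List, Mapping, Sequence, Tuple


def check_records(
    records: Iterable[Mapping[str, object]],
    required: Sequence[str],
    price_bounds: Tuple[float, float] = (500.0, 200000.0),
) -> List[str]:
    """Return a list of human-readable problems found in the incoming records."""
    problems = []
    low, high = price_bounds
    for i, rec in enumerate(records):
        # Missing values: absent keys, None, or NaN
        for field in required:
            value = rec.get(field)
            if value is None or (isinstance(value, float) and math.isnan(value)):
                problems.append(f"row {i}: missing {field}")
        # Simple range check for implausible (outlier) prices
        price = rec.get("price")
        if isinstance(price, (int, float)) and not math.isnan(price) and not low <= price <= high:
            problems.append(f"row {i}: price {price} outside [{low}, {high}]")
    return problems


rows = [
    {"price": 12000.0, "year": 2016},
    {"price": None, "year": 2019},
    {"price": 950000.0, "year": float("nan")},
]
for problem in check_records(rows, required=["price", "year"]):
    print(problem)
# row 1: missing price
# row 2: missing year
# row 2: price 950000.0 outside [500.0, 200000.0]
```

Returning a list of problems rather than raising on the first one makes it easy to log a full report for each incoming batch.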

## User Interface

### Current UI

In the current version, the user enters 10 selected car features into the user interface and then clicks the "Submit" button, after which the predicted car price appears on the screen.

![](./img/front.png)

## User Base

The main users of this application are car dealership employees responsible for determining prices for new cars. The car price predictor could also be used by car dealership customers who want to know how much money they could get for their own car.

## Conclusion

Throughout the AIDA project, Team 1 has diligently followed the CRISP-DM model to tackle the challenge of understanding and leveraging collected data for meaningful insights. The project's primary objective was to develop machine learning models aligned with the customer's needs, with a focus on optimizing the understanding of sales dynamics in the context of vehicle age, type, and price.

AIDA Project Team 1 successfully worked through the phases of the CRISP-DM model, transforming raw data into valuable insights. The implemented machine learning models are expected to contribute meaningfully to the understanding of sales dynamics, providing the customer with actionable information for informed decision-making. The comprehensive approach taken by the team at each step reflects its commitment to delivering a robust and impactful solution to the client's needs.

## Demo

This is a demo of our application. It runs locally and shows how to enter data into the form and get a price prediction based on the user's inputs.

```python
from IPython.display import HTML

# Embed the locally stored demo video in the notebook output
HTML("""
    <video controls>
        <source src="./img/demo.mp4" type="video/mp4">
    </video>
""")
```

## Future Considerations

In the future, users could be given the option to enter more features of the car, depending on what they want to provide. The application could also be integrated into a car dealership's website, either for internal use only or for public use as well.
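
If users could choose which features to enter, the back end would need to handle fields left blank. LightGBM treats NaN as a missing value by default, so one option is to pass omitted optional features as NaN. A minimal sketch; the feature names and the split into required and optional features are hypothetical.

```python
import math
from typing import List, Mapping

# Hypothetical feature order expected by the model
FEATURE_ORDER = ["year", "mileage_km", "engine_power_kw", "doors"]
REQUIRED = {"year", "mileage_km"}


def build_feature_row(form: Mapping[str, str]) -> List[float]:
    """Convert form input to a model row; blank optional fields become NaN."""
    row = []
    for name in FEATURE_ORDER:
        raw = form.get(name, "").strip()
        if raw:
            row.append(float(raw))
        elif name in REQUIRED:
            raise ValueError(f"missing required feature: {name}")
        else:
            row.append(math.nan)  # LightGBM handles NaN as missing by default
    return row


print(build_feature_row({"year": "2017", "mileage_km": "85000", "doors": ""}))
# [2017.0, 85000.0, nan, nan]
```

Keeping a small set of required features guarantees that every prediction has a minimum amount of information, while the optional ones can be filled in when the user knows them.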