
# Predicting the Ratings 
---

# Introduction

Using tools of your choice, complete the following tasks. Keep it simple. The goal of the exercise is not to do the deepest EDA, build the best model possible, or spend lots of time building a robust solution. The goal is to demonstrate your ability to complete the end-to-end process in a reasonable timeframe and explain your thought process clearly.

## Dataset

This dataset contains data on over 1,000 products, including their ratings and reviews.

### Features

The dataset includes the following features:

- `product_id`: Product ID
- `product_name`: Name of the Product
- `category`: Category of the Product
- `discounted_price`: Discounted Price of the Product
- `actual_price`: Actual Price of the Product
- `discount_percentage`: Percentage of Discount for the Product
- `rating`: Rating of the Product
- `rating_count`: Number of people who rated the product on Amazon

Together, these features cover pricing, ratings, and customer feedback, and form the basis for the analysis.


## Hints

###### Time Management and Simplicity
- Start with simple methods and ensure all necessary components are functioning correctly.

- Use markdown in your Notebook to record assumptions, decisions, and any experimental techniques.

- Explain the creation and transformation of data columns, focusing on how these changes can enhance insights and model performance.
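
As one illustration of documenting a column transformation, the sketch below derives two columns from the documented features. It assumes the price and count columns are already numeric; the sample values, the helper `add_engineered_columns`, and the new column names `discount_amount` and `log_rating_count` are hypothetical and not part of the case material.

```python
import numpy as np
import pandas as pd

# Tiny made-up sample using the documented column names (values are illustrative only).
sample = pd.DataFrame({
    "discounted_price": [399.0, 199.0, 1299.0],
    "actual_price": [1099.0, 349.0, 1299.0],
    "rating_count": [24269, 0, 513],
})

def add_engineered_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with two example engineered columns (hypothetical names)."""
    out = df.copy()
    # Absolute saving in currency units; complements the relative discount_percentage.
    out["discount_amount"] = out["actual_price"] - out["discounted_price"]
    # Count columns are often heavy-tailed; log1p compresses large values and maps 0 to 0.
    out["log_rating_count"] = np.log1p(out["rating_count"])
    return out

engineered = add_engineered_columns(sample)
print(engineered)
```

In the notebook, a short markdown note next to each such column (what it is, why it might help) is usually enough.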

###### Exploratory Data Analysis (EDA)
- Evaluate data completeness and check for missing values.
- Generate visualisations to understand the distributions of variables like ‘category’, ‘actual_price’, ‘discounted_price’, ‘discount_percentage’, ‘rating’, and ‘rating_count’. Include engineered columns if applicable.
- Investigate correlations between variables and describe the methods used for this analysis.
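
The EDA checks above could start from something like the following minimal sketch. The sample rows are made up for illustration, and Spearman correlation is one possible choice (it is rank-based, so it is less sensitive to skewed prices and counts than Pearson), not a required method.

```python
import numpy as np
import pandas as pd

# Made-up rows with the documented column names; the real data will differ.
sample = pd.DataFrame({
    "category": ["Electronics", "Electronics", "Home", "Home", None],
    "actual_price": [1099.0, 349.0, 1299.0, np.nan, 799.0],
    "discounted_price": [399.0, 199.0, 1299.0, 450.0, 599.0],
    "rating": [4.2, 4.0, 3.9, 4.1, 4.3],
    "rating_count": [24269, 43994, 513, 120, 9378],
})

# Completeness: number of missing values per column.
missing_per_column = sample.isna().sum()

# Distribution summaries: describe() for numeric columns, value_counts() for categories.
numeric_summary = sample.describe()
category_counts = sample["category"].value_counts(dropna=False)

# Pairwise rank correlation between the numeric columns (missing pairs are skipped).
correlations = sample.select_dtypes("number").corr(method="spearman")

print(missing_per_column)
print(category_counts)
print(correlations.round(2))
```

With matplotlib installed, `sample["rating"].hist()` would give a quick view of a single distribution.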

###### Model Development and Evaluation
- Build a model to predict ‘rating’, explaining the choice of modeling approach.
- Assess the model's accuracy and justify the chosen accuracy metric.
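
A simple way to ground the accuracy discussion is to compare the model against a trivial baseline. The sketch below uses synthetic stand-in data (not the real dataset), an ordinary least squares fit via NumPy, and mean absolute error (MAE) as the metric; all three are assumptions chosen for illustration, not the expected solution.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in data: two numeric features and a rating-like target.
n = 200
X = rng.normal(size=(n, 2))
y = 4.0 + 0.3 * X[:, 0] + rng.normal(scale=0.2, size=n)

# Hold out the last 25% of rows for evaluation (rows are already in random order).
split = int(n * 0.75)
X_train, X_test = X[:split], X[split:]
y_train, y_test = y[:split], y[split:]

def mean_absolute_error(y_true, y_pred):
    """Average absolute difference, in the same units as the rating."""
    return float(np.mean(np.abs(y_true - y_pred)))

# Baseline: always predict the mean rating seen in training.
baseline_pred = np.full_like(y_test, y_train.mean())

# Simple model: ordinary least squares with an intercept column.
A_train = np.column_stack([np.ones(len(X_train)), X_train])
A_test = np.column_stack([np.ones(len(X_test)), X_test])
coef, *_ = np.linalg.lstsq(A_train, y_train, rcond=None)
model_pred = A_test @ coef

baseline_mae = mean_absolute_error(y_test, baseline_pred)
model_mae = mean_absolute_error(y_test, model_pred)
print(f"baseline MAE: {baseline_mae:.3f}, model MAE: {model_mae:.3f}")
```

If a model cannot clearly beat the mean-rating baseline on held-out data, that is itself a finding worth reporting.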

###### Insights, Explanation, and Model Improvement
- Analyse the importance of features in your model and the criteria for evaluating their significance.
- Describe the relationship between key variables and the prediction, including the methods used for this examination.
- Discuss the predictability of ‘rating’ based on available data and provide reasons.
- Suggest methods to enhance model performance.
- Offer a simplified, non-technical summary of your findings and learnings from the analysis.
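
Permutation importance is one model-agnostic way to examine feature importance: shuffle one feature at a time and measure how much the error grows. The sketch below implements it by hand on synthetic data with a least squares model; the data, the helper names, and the choice of MAE are assumptions for illustration. In practice it is better computed on held-out rows than on the training data used here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: column 0 drives the target, column 1 is pure noise.
n = 300
X = rng.normal(size=(n, 2))
y = 4.0 + 0.3 * X[:, 0] + rng.normal(scale=0.1, size=n)

# Fit a least squares model with an intercept.
A = np.column_stack([np.ones(n), X])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)

def predict(features):
    return np.column_stack([np.ones(len(features)), features]) @ coef

def mae(y_true, y_pred):
    return float(np.mean(np.abs(y_true - y_pred)))

def permutation_importance(features, target, n_repeats=10):
    """Mean increase in MAE when each column is shuffled, one column at a time."""
    base = mae(target, predict(features))
    importances = []
    for col in range(features.shape[1]):
        increases = []
        for _ in range(n_repeats):
            shuffled = features.copy()
            shuffled[:, col] = rng.permutation(shuffled[:, col])
            increases.append(mae(target, predict(shuffled)) - base)
        importances.append(float(np.mean(increases)))
    return importances

importances = permutation_importance(X, y)
print(importances)
```

A feature whose shuffling barely changes the error contributes little to the predictions, which also informs the discussion of how predictable ‘rating’ is from the available columns.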

###### Modeling Methodology
- Provide a concise explanation of your choices in the modeling process, highlighting alignment with analysis objectives.

###### Model Evaluation and Metrics
- Discuss your approach to evaluating model accuracy and the rationale behind the selection of specific evaluation metrics.
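
When discussing the evaluation approach, k-fold cross-validation gives a sense of how stable a metric is across different splits. The following is a minimal hand-rolled sketch on synthetic data, reporting both MAE and RMSE per fold; the data, the least squares model, and the helper names are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic stand-in data (not the real dataset).
n = 120
X = rng.normal(size=(n, 2))
y = 4.0 + 0.3 * X[:, 0] + rng.normal(scale=0.2, size=n)

def fit_predict(X_train, y_train, X_test):
    """Fit least squares with an intercept on the training rows, predict the test rows."""
    A_train = np.column_stack([np.ones(len(X_train)), X_train])
    A_test = np.column_stack([np.ones(len(X_test)), X_test])
    coef, *_ = np.linalg.lstsq(A_train, y_train, rcond=None)
    return A_test @ coef

def k_fold_scores(X, y, k=5):
    """Return one {'mae', 'rmse'} dict per fold, each fold held out exactly once."""
    indices = rng.permutation(len(y))
    folds = np.array_split(indices, k)
    scores = []
    for i, test_idx in enumerate(folds):
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        err = y[test_idx] - fit_predict(X[train_idx], y[train_idx], X[test_idx])
        scores.append({
            "mae": float(np.mean(np.abs(err))),
            "rmse": float(np.sqrt(np.mean(err ** 2))),
        })
    return scores

scores = k_fold_scores(X, y)
for fold, s in enumerate(scores):
    print(fold, round(s["mae"], 3), round(s["rmse"], 3))
```

RMSE penalises large errors more heavily than MAE, so a wide gap between the two suggests a few badly mispredicted products.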


## Interview Focus

During the interview, we are most interested in the following:

1. Your understanding of how your model is performing with respect to the business problem.
2. Your ability to explain the assumptions and modeling decisions you have made.
3. Your plan for how you would put this model into production.
---

### Getting started
The data for this interview case is provided in a parquet file that can be loaded directly into a pandas DataFrame as shown below. To read a parquet file, make sure the pyarrow package is installed.

```python
# Load data to DataFrame
import pandas as pd
df = pd.read_csv('amazon.csv')
```

```python
df.columns
```

```
Index(['product_id', 'product_name', 'category', 'discounted_price',
       'actual_price', 'discount_percentage', 'rating', 'rating_count'],
      dtype='object')
```
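
Since the data may be supplied as either CSV or parquet, a small loader that picks the reader from the file extension can be convenient. This is only a sketch: `load_products` is a hypothetical helper, not part of the case material, and the parquet branch assumes a parquet engine such as pyarrow is installed.

```python
from pathlib import Path

import pandas as pd

def load_products(path):
    """Load the product table from a CSV or parquet file, chosen by file extension."""
    suffix = Path(path).suffix.lower()
    if suffix == ".csv":
        return pd.read_csv(path)
    if suffix == ".parquet":
        # pd.read_parquet needs a parquet engine such as pyarrow to be installed.
        return pd.read_parquet(path)
    raise ValueError(f"Unsupported file type: {suffix!r}")
```

For example, `df = load_products('amazon.csv')` would behave the same as the `pd.read_csv` call in the first cell.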