## Exploratory Data Analysis

- **Inspect the unique values in each column**
    ```python
    for col in df.columns:
        print(col)
        print(df[col].unique()[:5])  # first five distinct values
        print(df[col].nunique())     # number of distinct values (NaN excluded)
        print()
    ```

- **Visualize the dataset (plots)**
    ```python
    # Restrict to prices below 100,000 so the bulk of the distribution is visible
    sns.histplot(df.msrp[df.msrp < 100_000], bins=50)
    ```
  - The histogram shows a long-tailed distribution: most prices are low, and a few are very high. This can hurt machine learning models because:
      - Models tend to fit the frequent values well and the rare extreme values poorly, and a few extreme values can dominate the error, which leads to poor generalization.
  - To mitigate this, we apply a logarithmic transformation. It compresses the range of values and reduces the skewness, making the distribution closer to normal, which can improve the performance and stability of many machine learning algorithms. `np.log1p` computes log(1 + x), so it is also defined when a value is 0.
    ```python
    price_logs = np.log1p(df.msrp)
    sns.histplot(price_logs, bins=50)
    ```

- **Check for null values**
    ```python
    df.isnull().sum()
    ```
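As a possible extension of the null check, the sketch below reports both the count and the share of missing values per column, keeping only columns that have any. The frame `df_demo` and its values are made up for illustration and stand in for the real `df`.

```python
import numpy as np
import pandas as pd

# Hypothetical data standing in for the real dataset
df_demo = pd.DataFrame({
    "make": ["bmw", "audi", None, "fiat"],
    "engine_hp": [335.0, np.nan, np.nan, 160.0],
    "msrp": [46_135, 40_650, 36_350, 29_450],
})

null_counts = df_demo.isnull().sum()   # missing values per column
null_share = df_demo.isnull().mean()   # fraction of rows missing per column

# Keep only columns with missing values, worst first
summary = (
    pd.DataFrame({"n_missing": null_counts, "share_missing": null_share})
    .query("n_missing > 0")
    .sort_values("n_missing", ascending=False)
)
print(summary)
```

Seeing the share next to the count can make it easier to judge whether a column is worth imputing or should be dropped.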

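The effect of the log transform on the price distribution can be checked numerically as well as visually. The following is a minimal sketch on made-up prices (the `prices` series is an assumption, not the real `df.msrp`). It compares skewness before and after `np.log1p` and shows that `np.expm1` maps values back to the original scale.

```python
import numpy as np
import pandas as pd

# Hypothetical right-skewed prices standing in for df.msrp
prices = pd.Series(
    [2_000, 9_000, 15_000, 22_000, 30_000, 45_000, 80_000, 2_000_000],
    dtype="float64",
)

log_prices = np.log1p(prices)    # log(1 + x)
restored = np.expm1(log_prices)  # inverse transform: exp(y) - 1

skew_before = prices.skew()
skew_after = log_prices.skew()
print(f"skew before: {skew_before:.2f}, after: {skew_after:.2f}")

# log1p is defined at zero, unlike a plain log
zero_log = np.log1p(0.0)
```

If a model is trained on log prices, its predictions are on the log scale too, so `np.expm1` is one way to convert them back to prices.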