## Overview

### Create a time-series model using the Craigslist Vehicles dataset, available on [Kaggle](https://www.kaggle.com/datasets/mbaabuharun/craigslist-vehicles), following the approach outlined below

Here are the key steps:

* Start by addressing missing values in the dataset. You can handle this by filling in missing values with the median for numerical columns and the mode for categorical columns.
* Ensure that the data types of the columns are appropriate. Specifically, make sure to convert the 'posting_date' column to a datetime data type.
* Utilize the 'posting_date' column to create a datetime index for the dataset. This will facilitate the analysis of temporal patterns.
* With clean data, explore it using various visualizations and statistical analysis techniques. This step is crucial for understanding temporal patterns, identifying seasonal trends, and analyzing demand-supply dynamics by region and vehicle type.
* Build the time-series chart.
* Finally, create a GitHub repository and push your work there. Document your process at each step and demonstrate your understanding by implementing the steps on the dataset.

In [1]:
#import relevant libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

In [2]:
#load dataset
data = pd.read_csv("craigslist_vehicles.csv")
data.head(3)

Unnamed: 0.1,Unnamed: 0,id,url,region,region_url,price,year,manufacturer,model,condition,...,type,paint_color,image_url,description,county,state,lat,long,posting_date,removal_date
0,362773,7307679724,https://abilene.craigslist.org/ctd/d/abilene-2...,abilene,https://abilene.craigslist.org,4500,2002.0,bmw,x5,,...,,,https://images.craigslist.org/00m0m_iba78h8ty9...,"$4,500 Cash 2002 BMW X5 8 cylinder 4.4L moto...",,tx,32.401556,-99.884713,2021-04-16 00:00:00+00:00,2021-05-02 00:00:00+00:00
1,362712,7311833696,https://abilene.craigslist.org/ctd/d/abilene-2...,abilene,https://abilene.craigslist.org,4500,2002.0,bmw,x5,,...,,,https://images.craigslist.org/00m0m_iba78h8ty9...,"$4,500 Cash 2002 BMW X5 8 cylinder 4.4L moto...",,tx,32.401556,-99.884713,2021-04-24 00:00:00+00:00,2021-04-28 00:00:00+00:00
2,362722,7311441996,https://abilene.craigslist.org/ctd/d/abilene-2...,abilene,https://abilene.craigslist.org,4900,2006.0,toyota,camry,excellent,...,sedan,silver,https://images.craigslist.org/00808_5FkOw2aGjA...,2006 TOYOTA CAMRY LE Sedan Ready To Upgrade ...,,tx,32.453848,-99.7879,2021-04-23 00:00:00+00:00,2021-05-25 00:00:00+00:00


In [3]:
#check for missing values in the dataset
data.isnull().sum()

Unnamed: 0           0
id                   0
url                  0
region               0
region_url           0
price                0
year              1205
manufacturer     17646
model             5277
condition       174104
cylinders       177678
fuel              3013
odometer          4400
title_status      8242
transmission      2556
VIN             161042
drive           130567
size            306361
type             92858
paint_color     130203
image_url           68
description         70
county          426880
state                0
lat               6549
long              6549
posting_date        68
removal_date        68
dtype: int64

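Several columns have too many missing values for dropping rows to be practical, so every missing value is replaced with zero. This is a simplification of the median/mode imputation described in the overview: the zero acts as a placeholder in all columns, including the categorical ones and 'posting_date'.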

In [4]:
#replace null values with 0
data.fillna(0, inplace=True)

In [5]:
#confirm
data.isnull().sum()

Unnamed: 0      0
id              0
url             0
region          0
region_url      0
price           0
year            0
manufacturer    0
model           0
condition       0
cylinders       0
fuel            0
odometer        0
title_status    0
transmission    0
VIN             0
drive           0
size            0
type            0
paint_color     0
image_url       0
description     0
county          0
state           0
lat             0
long            0
posting_date    0
removal_date    0
dtype: int64
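
The overview asks for median imputation on numerical columns and mode imputation on categorical ones, whereas the cell above fills everything with zero. As a rough sketch of the median/mode alternative, the helper below (`impute_median_mode` is a hypothetical name, and the toy frame uses made-up values rather than the real dataset) shows one way it could look:

```python
import numpy as np
import pandas as pd


def impute_median_mode(df):
    """Return a copy with numeric NaNs set to the column median
    and all other NaNs set to the column mode."""
    out = df.copy()
    for col in out.columns:
        if out[col].isna().all():
            # Nothing to estimate from in a fully empty column; leave it as-is.
            continue
        if pd.api.types.is_numeric_dtype(out[col]):
            out[col] = out[col].fillna(out[col].median())
        else:
            out[col] = out[col].fillna(out[col].mode().iloc[0])
    return out


# Toy frame standing in for the real dataset (values are made up).
toy = pd.DataFrame({
    "odometer": [10000.0, np.nan, 30000.0, 50000.0],
    "fuel": ["gas", "gas", None, "diesel"],
    "county": [np.nan, np.nan, np.nan, np.nan],
})

filled = impute_median_mode(toy)
print(filled)
```

Columns with no observed values are left untouched here, so they would still need to be dropped or handled separately.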

In [6]:
#check data types
data.dtypes

Unnamed: 0        int64
id                int64
url              object
region           object
region_url       object
price             int64
year            float64
manufacturer     object
model            object
condition        object
cylinders        object
fuel             object
odometer        float64
title_status     object
transmission     object
VIN              object
drive            object
size             object
type             object
paint_color      object
image_url        object
description      object
county          float64
state            object
lat             float64
long            float64
posting_date     object
removal_date     object
dtype: object

In [None]:
#convert the 'posting_date' column to a datetime data type
def custom_date_parser(date_str):
    #values that cannot be parsed with the expected format become NaT
    try:
        return pd.to_datetime(date_str, format='%Y-%m-%d %H:%M:%S%z')
    except Exception:
        return pd.NaT

data['posting_date'] = data['posting_date'].apply(custom_date_parser)

In [None]:
#confirmation
data["posting_date"]
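
Parsing row by row with `apply` can be slow on a large frame. A vectorized alternative, sketched here on a small made-up series rather than the real column, is to let `pd.to_datetime` coerce unparseable entries to `NaT` in a single call:

```python
import pandas as pd

# Made-up sample: two valid timestamps, one malformed string, one missing value.
raw = pd.Series([
    "2021-04-16 00:00:00+00:00",
    "2021-04-24 00:00:00+00:00",
    "not a date",
    None,
])

# errors="coerce" turns anything unparseable into NaT instead of raising;
# utc=True yields a single timezone-aware datetime column.
parsed = pd.to_datetime(raw, errors="coerce", utc=True)

print(parsed.dtype)
print(parsed.isna().sum())
```

Whether this behaves identically on the real column depends on what the placeholder values look like after the earlier fill step, so the result should be checked before relying on it.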

In [None]:
#Use the 'posting_date' column as a datetime index to facilitate the analysis of temporal patterns.
#Rows whose date could not be parsed (NaT) are dropped, since they cannot be placed on a time axis.
data['posting_date'] = pd.to_datetime(data['posting_date'], format='%Y-%m-%d %H:%M:%S%z', errors='coerce')
data = data.dropna(subset=['posting_date']).set_index('posting_date').sort_index()
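
With a datetime index in place, the time-series chart from the overview can be built by resampling. The sketch below uses a tiny made-up frame (the real `data` frame is assumed to have the same `posting_date` index and a `price` column) and aggregates listings per day:

```python
import pandas as pd

# Made-up listings indexed by posting date, mirroring the structure of `data`.
toy = pd.DataFrame(
    {"price": [4500, 4900, 5200, 7000]},
    index=pd.to_datetime(
        ["2021-04-16", "2021-04-16", "2021-04-18", "2021-04-19"], utc=True
    ),
)
toy.index.name = "posting_date"

# Number of listings posted per calendar day (days without listings count as 0).
daily_counts = toy["price"].resample("D").count()

# Median asking price per day (NaN on days without listings).
daily_median_price = toy["price"].resample("D").median()

print(daily_counts)
print(daily_median_price)
```

On the real frame the same calls would apply, and `daily_counts.plot()` would then draw the series with matplotlib. Switching `"D"` to `"W"` gives weekly totals, which may be easier to read over a short posting window.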

### EDA 

In [None]:
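#Explore the data with visualizations and summary statistics.
#One bar per distinct model would be too many to read, so only the 10 most-listed models are plotted.
#The 0 placeholder introduced by fillna(0) is excluded.
known = data[data['model'] != 0]
top_models = known['model'].value_counts().head(10).index
subset = known[known['model'].isin(top_models)].reset_index()

fig, ax1 = plt.subplots(figsize=(10, 8))

#sns.barplot draws the mean price for each model
sns.barplot(x='price', y='model', data=subset, order=top_models, ax=ax1)

# Labeling the plot
ax1.set_title('Average Price by Model (10 Most-Listed Models)', fontsize=16)
ax1.set_xlabel('Average price')
ax1.set_ylabel('Model')

plt.show()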
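
The overview also mentions demand-supply dynamics by region and vehicle type. One possible angle, offered only as a sketch, is to treat the number of listings as a rough supply signal and the time between `posting_date` and `removal_date` as a rough demand signal (shorter listings suggesting faster sales). That interpretation is an assumption, not something the dataset states, and the frame below is made up:

```python
import pandas as pd

# Made-up listings; on the real frame, posting_date would first need to be
# brought back as a column with data.reset_index().
toy = pd.DataFrame({
    "region": ["abilene", "abilene", "abilene", "austin"],
    "type": ["sedan", "sedan", "pickup", "pickup"],
    "posting_date": pd.to_datetime(
        ["2021-04-16", "2021-04-24", "2021-04-23", "2021-04-20"], utc=True
    ),
    "removal_date": pd.to_datetime(
        ["2021-05-02", "2021-04-28", "2021-05-25", "2021-04-30"], utc=True
    ),
})

# Days each listing stayed online.
toy["days_listed"] = (toy["removal_date"] - toy["posting_date"]).dt.days

# Listing count (supply proxy) and median days listed (demand proxy) per group.
summary = (
    toy.groupby(["region", "type"])["days_listed"]
    .agg(listings="count", median_days_listed="median")
    .reset_index()
)

print(summary)
```

Rows where either date is missing or a placeholder would need to be filtered out before computing durations on the real data.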
