**Univariate Exploratory Data Analysis (EDA)** is the simplest form of data analysis: it examines one variable at a time. Its main purpose is to explore the data through summary statistics, charts, and graphs in order to (1) find patterns in the data and (2) make better decisions about data pre-processing tasks.

In [None]:
#import libraries
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np

In [None]:
# loading data set as Pandas dataframe
df = pd.read_csv("./datasets/automobile.csv")
df.head()

In [None]:
df.dtypes

# Data Cleaning

In [None]:
# Count how many values are numeric strings (True) and how many are not (False)
print(df['price'].str.isnumeric().value_counts())

# List the values that are not numeric
print(df['price'].loc[~df['price'].str.isnumeric()])

# Replace the missing-value marker '?' with the mean price, then convert the column to integer
price = df['price'].loc[df['price'] != '?']
price_mean = price.astype(int).mean()
df['price'] = df['price'].replace('?', price_mean).astype(int)
df['price'].head()
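
The cell above matches the literal string `'?'`. As an alternative, here is a minimal sketch that uses `pd.to_numeric` with `errors="coerce"`, which turns any non-numeric entry into `NaN` in one step. The series `raw_price` is a hypothetical toy stand-in for `df['price']`, not the real data, and this version rounds the imputed mean instead of truncating it.

```python
import pandas as pd

# Hypothetical toy column standing in for df['price']; '?' marks a missing value
raw_price = pd.Series(["13495", "16500", "?", "13950"], name="price")

# Coerce every non-numeric entry (here '?') to NaN
price_num = pd.to_numeric(raw_price, errors="coerce")

# Fill the gaps with the mean of the observed values, then convert to integer
price_clean = price_num.fillna(price_num.mean()).round().astype(int)
print(price_clean.tolist())
```

This approach does not depend on knowing the exact missing-value marker, which may help if a column mixes several placeholders.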


In [None]:
print( "Mean: {:.2f}".format(df['price'].mean()))
print( "Median: {:.2f}".format(df['price'].median()))
print( "Std: {:.2f}".format(df['price'].std()))
print( "Var: {:.2f}".format(df['price'].var()))
print( "Quantiles: \n",df['price'].quantile([0.25,0.5,0.75]))
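
Most of the statistics printed above can also be obtained in one call with `Series.describe()`. The sketch below uses a small hypothetical `prices` series (not the automobile data) and also checks that the variance equals the squared standard deviation.

```python
import math
import pandas as pd

# Hypothetical toy prices, used only to illustrate the calls
prices = pd.Series([5000, 7000, 8000, 9000, 41000], name="price")

# count, mean, std, min, quartiles (25%, 50%, 75%), and max in a single Series
summary = prices.describe()
print(summary)

# The variance is the square of the standard deviation
print(math.isclose(prices.var(), prices.std() ** 2))
```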

In [None]:
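# Cleaning the horsepower field
# Count how many values are numeric strings (True) and how many are not (False)
print(df['horsepower'].str.isnumeric().value_counts())

# Replace the missing-value marker '?' with the mean horsepower, then convert to integer
horsepower = df['horsepower'].loc[df['horsepower'] != '?']
hp_mean = horsepower.astype(int).mean()
df['horsepower'] = df['horsepower'].replace('?', hp_mean).astype(int)
df['horsepower'].head()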

In [None]:
# Cleaning the normalized-losses field
# Count the rows where normalized-losses is missing ('?')
print((df['normalized-losses'] == '?').sum())

# Replace '?' with the mean of the observed values, then convert to integer
nl = df['normalized-losses'].loc[df['normalized-losses'] != '?']
nmean = nl.astype(int).mean()
df['normalized-losses'] = df['normalized-losses'].replace('?', nmean).astype(int)
df['normalized-losses'].head()

Next, we compute the measures of central tendency for the values in the height column. Because we look at only a single column of the data set, this is a univariate analysis.

In [None]:
# Calculate the mean, median, and mode of the height column
mean = df["height"].mean()
median = df["height"].median()
mode = df["height"].mode()  # returns a Series, because several values can tie for the mode
print(mean, median, mode)
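
Unlike `mean()` and `median()`, `Series.mode()` returns a Series, because more than one value can be the most frequent. A minimal sketch with a hypothetical `heights` series (made-up values, not the real column) shows what happens when two values tie.

```python
import pandas as pd

# Hypothetical heights in which 54.3 and 55.1 each occur twice
heights = pd.Series([52.0, 54.3, 54.3, 55.1, 55.1, 56.7])

# mode() returns every tied value, sorted in ascending order
modes = heights.mode()
print(modes.tolist())

# Take the first entry when a single representative value is needed
first_mode = modes.iloc[0]
print(first_mode)
```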

# Data Visualization
Now let's visualize this analysis in a graph.

In [None]:
#distribution plot
df.height.hist()

From the above graph, we can observe that the height of most cars ranges from 53 to 57.

In [None]:
df.price.hist()

From the above graph, we can say that the price ranges from 5,000 to 45,000, but the price of most cars falls between 5,000 and 10,000.

The box plot is also an effective visual representation of statistical measures such as the median and quartiles in univariate analysis.

In [None]:
# Calculate skewness
skewness = df.price.skew()
if skewness > 0:
    print("Positive Skewness: {:.2f}".format(skewness))
elif skewness < 0:  
    print("Negative Skewness: {:.2f}".format(skewness))
else:
    print("No Skewness")
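
The sign of the skewness can be cross-checked against the mean and the median: a long right tail typically pulls the mean above the median. The sketch below illustrates this with a hypothetical `toy_prices` series (made-up values, not the automobile data).

```python
import pandas as pd

# Hypothetical right-skewed prices: most values are low, one is very high
toy_prices = pd.Series([5000, 6000, 7000, 8000, 9000, 45000])

toy_skew = toy_prices.skew()
print("Skewness: {:.2f}".format(toy_skew))

# For this sample the single large value pulls the mean above the median
print("Mean:", toy_prices.mean(), "Median:", toy_prices.median())
```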
    

In [None]:
# Simple boxplot example

# Generating random numbers from a normal distribution with 
# mean 100 and standard deviation 20
np.random.seed(10)
data = np.random.normal(100, 20, 200)
plt.figure(figsize =(10, 7))
# Creating plot
plt.boxplot(data)

In [None]:
#boxplot for price of cars
sns.boxplot(data=df, x="price")

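The left border of the box is Q1 and the right border is Q3, so the box spans the interquartile range (IQR = Q3 − Q1); the line inside the box is the median. Each whisker extends from the edge of the box to the most extreme data point that lies within 1.5 × IQR of that edge. Any values beyond these limits are marked as outliers (drawn as points).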
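
The whisker rule can be reproduced by hand. The following is a minimal sketch on a hypothetical `values` series (made-up numbers); it assumes the default 1.5 × IQR rule and pandas' default linear interpolation for quantiles.

```python
import pandas as pd

# Hypothetical sample with one clearly extreme value
values = pd.Series([10, 12, 13, 14, 15, 16, 18, 40])

# Box edges and interquartile range
q1, q3 = values.quantile([0.25, 0.75])
iqr = q3 - q1

# Limits beyond which points count as outliers
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Whiskers end at the most extreme data points that are still inside the fences
inside = values[(values >= lower_fence) & (values <= upper_fence)]
whisker_low, whisker_high = inside.min(), inside.max()

outliers = values[(values < lower_fence) | (values > upper_fence)]
print("Whiskers:", whisker_low, whisker_high)
print("Outliers:", outliers.tolist())
```

Note that the whiskers stop at actual data points, so they are usually shorter than the full 1.5 × IQR distance.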

In [None]:
# Simple lineplot example
months = [1, 2, 3, 4, 5, 6, 7]
sales = [74, 75.8, 80, 85, 90.5, 60, 70.5]

sales_df = pd.DataFrame({"Month": months, "Sales": sales})
sns.lineplot(x = "Month", y = "Sales", data=sales_df, marker='o')

In [None]:
# Line plot example with monthly dates on the x-axis
# Sample monthly sales data
months = pd.date_range(start='2023-01-01', end='2023-12-01', freq='MS')
sales = [1000, 1200, 1100, 1300, 1400, 1500, 1600, 1800, 1700, 1200, 1800, 2000]

# Create a DataFrame and draw the line plot
df_sales = pd.DataFrame({'Month': months, 'Sales': sales})
sns.lineplot(x="Month", y="Sales", data=df_sales, marker='o')

# Rotate the x-axis labels and tidy up the layout
#plt.grid(True)
plt.xticks(rotation=45)
plt.tight_layout()

In this example, the line plot visualizes the monthly sales trend for the retail store in 2023. It makes fluctuations and seasonal patterns over the course of the year easy to spot, which helps stakeholders understand how the business performs over time. Here, sales trend upward overall, with small dips in March and September and a sharp drop in October.
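
What the plot suggests visually can be confirmed numerically. As a minimal sketch, the code below reuses the same sample sales figures and computes the month-over-month change with `Series.diff()`; the variable names are chosen for illustration only.

```python
import pandas as pd

# Same sample monthly sales data as in the plot
months = pd.date_range(start="2023-01-01", end="2023-12-01", freq="MS")
sales = [1000, 1200, 1100, 1300, 1400, 1500, 1600, 1800, 1700, 1200, 1800, 2000]
sales_series = pd.Series(sales, index=months, name="Sales")

# Change relative to the previous month (the first month has no predecessor, so it is NaN)
change = sales_series.diff()

# Months in which sales fell
drops = change[change < 0]
print(drops)

# Month with the largest decline
worst_month = change.idxmin()
print(worst_month.strftime("%B"), change.min())
```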