A bar chart (aka bar graph, column chart) plots numeric values for levels of a categorical feature as bars. Levels are plotted on one chart axis, and values are plotted on the other axis. Each categorical value claims one bar, and the length of each bar corresponds to the bar’s value. Bars are plotted on a common baseline to allow for easy comparison of values.

## When you should use a bar chart
A bar chart is used when you want to show a distribution of data points or compare metric values across different subgroups of your data. From a bar chart, we can see which groups are highest or most common, and how the remaining groups compare against them. Because this is such a common task, bar charts are one of the most widely used chart types.

One of the variables is categorical and the other is numerical.

## Material on how to choose the right data visualization

https://cdn2.hubspot.net/hubfs/392937/How-To-Choose-The-Right-Data-Visualization.pdf

### Here we are using a used cars dataset

In [37]:
import pandas as pd
import numpy as np

In [5]:
# Raw string so the backslashes in the Windows path are not treated as escape sequences
df = pd.read_csv(r"D:\cn c++\CAR DETAILS FROM CAR DEKHO.csv")

In [106]:
df.head()
df.shape

(4340, 8)

### Finding unique values in each column

In [16]:
for ele in df.columns:
    print(df[ele].value_counts())
    print()
    print("No of unique elements present in this column", df[ele].nunique())
    print()
    print("No of null elements present in the column", df[ele].isnull().sum())
    print()

Maruti Swift Dzire VDI                     69
Maruti Alto 800 LXI                        59
Maruti Alto LXi                            47
Maruti Alto LX                             35
Hyundai EON Era Plus                       35
                                           ..
Hyundai Verna Transform CRDi VGT SX ABS     1
Maruti S-Presso VXI Plus                    1
Toyota Etios Liva 1.2 VX                    1
Toyota Yaris G                              1
Hyundai i20 Magna 1.4 CRDi                  1
Name: name, Length: 1491, dtype: int64

No of unique elements present in this column 1491

No of null elements present in the column 0

2017    466
2015    421
2012    415
2013    386
2014    367
2018    366
2016    357
2011    271
2010    234
2019    195
2009    193
2008    145
2007    134
2006    110
2005     85
2020     48
2004     42
2003     23
2002     21
2001     20
1998     12
2000     12
1999     10
1997      3
1996      2
1995      1
1992      1
Name: year, dtype: int64

No of un

As we can see, there are no null values present in the column.
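The per-column loop above can be condensed into a single summary table. The following is a minimal sketch on a small made-up frame (the `toy` data is hypothetical, not the Car Dekho file), using `DataFrame.nunique()` and `DataFrame.isnull().sum()`:

```python
import pandas as pd

# Hypothetical toy data standing in for the used cars dataset
toy = pd.DataFrame({
    "name": ["Maruti Alto LX", "Maruti Alto LX", "Toyota Yaris G"],
    "year": [2012, 2015, None],
})

# One row per column: number of distinct values and number of missing values
summary = pd.DataFrame({
    "n_unique": toy.nunique(),        # NaN is not counted as a distinct value
    "n_null": toy.isnull().sum(),
})
print(summary)
```

This gives the same two numbers per column as the loop, without printing every `value_counts()` listing.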

In [85]:
car_models = pd.DataFrame(df["name"].value_counts() )
car_models

| | name |
|---|---|
| Maruti Swift Dzire VDI | 69 |
| Maruti Alto 800 LXI | 59 |
| Maruti Alto LXi | 47 |
| Maruti Alto LX | 35 |
| Hyundai EON Era Plus | 35 |
| ... | ... |
| Hyundai Verna Transform CRDi VGT SX ABS | 1 |
| Maruti S-Presso VXI Plus | 1 |
| Toyota Etios Liva 1.2 VX | 1 |
| Toyota Yaris G | 1 |


In [86]:
# Copy the model names from the index into a column, then reset to a 0..n-1 index
car_models["car_name"] = car_models.index
car_models.reset_index(drop=True, inplace=True)
car_models

| | name | car_name |
|---|---|---|
| 0 | 69 | Maruti Swift Dzire VDI |
| 1 | 59 | Maruti Alto 800 LXI |
| 2 | 47 | Maruti Alto LXi |
| 3 | 35 | Maruti Alto LX |
| 4 | 35 | Hyundai EON Era Plus |
| ... | ... | ... |
| 1486 | 1 | Hyundai Verna Transform CRDi VGT SX ABS |
| 1487 | 1 | Maruti S-Presso VXI Plus |
| 1488 | 1 | Toyota Etios Liva 1.2 VX |
| 1489 | 1 | Toyota Yaris G |


In [88]:
# Renaming a column
car_models.rename(columns = {'name':'total_count'}, inplace = True)
car_models.head()

| | total_count | car_name |
|---|---|---|
| 0 | 69 | Maruti Swift Dzire VDI |
| 1 | 59 | Maruti Alto 800 LXI |
| 2 | 47 | Maruti Alto LXi |
| 3 | 35 | Maruti Alto LX |
| 4 | 35 | Hyundai EON Era Plus |
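The three steps above (count, move the index into a column, rename) could also be written as one chain. This is a sketch on hypothetical toy data. `rename_axis` and `reset_index(name=...)` are standard pandas methods, and naming both columns explicitly avoids relying on the default column name produced by `value_counts()`, which differs between pandas versions.

```python
import pandas as pd

# Hypothetical toy column standing in for df["name"]
names = pd.Series(
    ["Maruti Alto LX", "Toyota Yaris G", "Maruti Alto LX", "Maruti Alto LX"],
    name="name",
)

car_models = (
    names.value_counts()              # counts per model, most frequent first
    .rename_axis("car_name")          # name the index so it becomes a column
    .reset_index(name="total_count")  # 0..n-1 index, counts in total_count
)
print(car_models)
```

The column order here is `car_name`, `total_count`, the reverse of the table built step by step.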


### Conclusion

As we can see, there are 695 models with only 1 car listed and 319 models with only 2 cars listed, so we group them into a single category called exotic.
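Figures like these can be read off by counting how many models share each listing count. A sketch, assuming a `car_models` table with a `total_count` column like the one built earlier (the values below are made up for illustration):

```python
import pandas as pd

# Hypothetical stand-in for the car_models table
car_models = pd.DataFrame({
    "car_name": ["A", "B", "C", "D", "E"],
    "total_count": [5, 2, 2, 1, 1],
})

# How many models appear exactly once, exactly twice, and so on
models_per_count = car_models["total_count"].value_counts().sort_index()
print(models_per_count)

# Total number of listings belonging to rare models (at most 2 listings each)
rare_listings = car_models.loc[car_models["total_count"] <= 2, "total_count"].sum()
print(rare_listings)
```

With the figures quoted above, the rare group would cover 695 × 1 + 319 × 2 = 1333 listings.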

In [100]:
li = car_models.car_name[(car_models.total_count <= 2)].values

In [101]:
li

array(['Ford Fiesta 1.6 ZXi Leather', 'Honda Amaze E i-VTEC',
       'Ford Figo Trend', ..., 'Toyota Etios Liva 1.2 VX',
       'Toyota Yaris G', 'Hyundai i20 Magna 1.4 CRDi'], dtype=object)

In [107]:
# Replace every model name found in li with 'exotic'; keep all other names unchanged
df.name = ['exotic' if n in li else n for n in df.name]
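The grouping attempted here can also be expressed without an explicit loop. A minimal sketch on hypothetical toy data, using `Series.isin` to build a boolean mask and `Series.where` to keep only the names that are not in the rare list:

```python
import pandas as pd

# Hypothetical toy frame standing in for df
toy = pd.DataFrame({"name": ["A", "B", "A", "C", "A", "B"]})

# Models with at most 2 listings, mirroring the `li` array above
counts = toy["name"].value_counts()
rare = counts[counts <= 2].index

# Keep the name where it is NOT rare, otherwise substitute "exotic"
toy["name"] = toy["name"].where(~toy["name"].isin(rare), "exotic")
print(toy["name"].tolist())
```

`isin` checks membership for the whole column at once, so nothing has to be indexed row by row and the row count does not need to be hard-coded.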

In [132]:
df.name = df['name'].replace('others','exotic')
df.tail()

| | name | year | selling_price | km_driven | fuel | seller_type | transmission | owner |
|---|---|---|---|---|---|---|---|---|
| 4335 | Hyundai i20 Magna 1.4 CRDi (Diesel) | 2014 | 409999 | 80000 | Diesel | Individual | Manual | Second Owner |
| 4336 | exotic | 2014 | 409999 | 80000 | Diesel | Individual | Manual | Second Owner |
| 4337 | Maruti 800 AC BSIII | 2009 | 110000 | 83000 | Petrol | Individual | Manual | Second Owner |
| 4338 | Hyundai Creta 1.6 CRDi SX Option | 2016 | 865000 | 90000 | Diesel | Individual | Manual | First Owner |
| 4339 | Renault KWID RXT | 2016 | 225000 | 40000 | Petrol | Individual | Manual | First Owner |


In [114]:
from matplotlib import pyplot as plt
import seaborn as sns

In [138]:
df.name[df.name == 'exotic'].count()

1333

In [139]:
df.shape

(4340, 8)

In [162]:
# Top 20 names by count, as a one-column DataFrame with the counts named 'no_of_cars'
new_df = df['name'].value_counts().head(20).to_frame('no_of_cars')
new_df

## I want to show a bar chart of the top 10 categories
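A minimal sketch of how the ten most frequent categories could be drawn as a bar chart with matplotlib. The `names` series is hypothetical toy data standing in for `df["name"]`, and the axis labels and title are illustrative choices, not taken from the notebook.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical toy column standing in for df["name"]
names = pd.Series(list("AAAABBBCCD") + ["exotic"] * 5, name="name")

# Ten most frequent categories (fewer if the data has fewer categories)
top = names.value_counts().head(10)

fig, ax = plt.subplots(figsize=(8, 4))
ax.bar(top.index.astype(str), top.to_numpy())  # one bar per category
ax.set_xlabel("Car model")
ax.set_ylabel("Number of cars")
ax.set_title("Top categories by number of listings")
ax.tick_params(axis="x", rotation=45)  # long model names are easier to read rotated
fig.tight_layout()
```

In a notebook the figure is displayed when the cell finishes. Since the grouped `exotic` category holds 1333 listings in the output above, it would likely dwarf the other bars, so it may be worth filtering it out before plotting.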