# Finding the Percentage of Cars in Each Safety Rating

## 1.0 Introduction

I took the data from these sources:

- https://www.kaggle.com/elikplim/car-evaluation-data-set

- https://data.world/uci/car-evaluation

I address two questions in my analysis:

- Among cars with a high, medium or low buying price, does a high percentage rate high or medium for safety?
- Considering cars with a high, medium or low buying price, what percentage of cars in each seating category (2, 4 or more persons) rate high, medium and low for safety?

## 2.0 Synopsis

I:

1. Loaded the CSV file.
2. Found the percentage of cars with a high, low and med safety rating for each buying price category (v.high, high, med, low).
3. Found the percentage of cars with a high, low and med safety rating for each seating category (2, 4 and more persons).
4. Produced a figure with multiple subplots using matplotlib.
5. Produced one subplot for each buying price category. Each subplot contains a bar chart of the percentage of cars that rate as high, low and med.
6. Kept only the rows with a high, medium or low buying price.
7. Produced a second figure with multiple subplots using matplotlib.
8. Took the filtered rows and produced one subplot for each seating category. Each subplot contains a bar chart of the percentage of cars that rate as high, low and med.
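
Steps 2, 3 and 6 above can be sketched with a row-normalised cross-tabulation. This is a minimal sketch on made-up rows, not the notebook's own implementation: the toy data frame and the names `toy`, `subset` and `pct` are assumptions for illustration, and `pd.crosstab` is swapped in for the group-by approach used in the Processing section.

```python
import pandas as pd

# Toy rows standing in for the real CSV (hypothetical data, not the actual file).
toy = pd.DataFrame({
    "buying.price": ["vhigh", "vhigh", "high", "high", "med", "med", "low", "low"],
    "safety":       ["low",   "high",  "low",  "med",  "high", "high", "med", "low"],
})

# Step 6: keep only the rows with a high, med or low buying price.
subset = toy[toy["buying.price"].isin(["high", "med", "low"])]

# Steps 2-3: share of each safety rating within each buying-price group.
# normalize="index" divides every row of the table by its row total.
pct = pd.crosstab(subset["buying.price"], subset["safety"], normalize="index") * 100
print(pct.round(1))
```

The result has one row per buying price and one column per safety rating, which is a convenient shape for drawing one bar chart per row.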


## 3.0 Processing

In [15]:
#aim is to build tables of safety-rating percentages
#step 1 - import required modules
import pandas as pd
import unittest
import numpy as np



#step 2 - create a function to find the percentage of cars that rate as high, med, low for each 'buying.price' of a car

def percent(df):
    # an empty data frame has nothing to group, so return None
    if df.empty:
        return None

    # count the rows for each ('buying.price', 'safety') pair
    count = df.groupby('buying.price')['safety'].value_counts()

    # divide each count by the subtotal of its 'buying.price' group and scale to 100;
    # transform('sum') keeps the original index, so this works across pandas versions
    count_pcts = 100 * count / count.groupby(level=0).transform('sum')
    return count_pcts.reset_index(name='percent').set_index("buying.price")

#step 3 - create a function to find the percentage of cars that rate as high, med, low for each 'number.persons' of a car
def percent_persons(df):
    # an empty data frame has nothing to group, so return None
    if df.empty:
        return None

    # count the rows for each ('number.persons', 'safety') pair
    count = df.groupby('number.persons')['safety'].value_counts()

    # divide each count by the subtotal of its 'number.persons' group and scale to 100;
    # transform('sum') keeps the original index, so this works across pandas versions
    count_pcts = 100 * count / count.groupby(level=0).transform('sum')
    return count_pcts.reset_index(name='percent').set_index("number.persons")
#step 4 - read the csv file, which has no header row, and name the columns
#'buying.price', 'maintaince.cost', 'number.doors', 'number.persons', 'lug.boot', 'safety', 'decision'

file_car = pd.read_csv("car_evaluation.csv", header=None, names=['buying.price', 'maintaince.cost', 'number.doors', 'number.persons', 'lug.boot', 'safety', 'decision'])


#step 5 - find the percentage of cars that rate as high, med, low for each 'number.persons' of a car

percent_persons(file_car)

| number.persons | safety | percent |
|---|---|---|
| 2 | high | 33.333333 |
| 2 | low | 33.333333 |
| 2 | med | 33.333333 |
| 4 | high | 33.333333 |
| 4 | low | 33.333333 |
| 4 | med | 33.333333 |
| more | high | 33.333333 |
| more | low | 33.333333 |
| more | med | 33.333333 |
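
Every percentage in this output is 33.333333. A likely explanation, assuming `car_evaluation.csv` is the unmodified UCI file, is that the data set contains each combination of attribute values exactly once (4 * 4 * 4 * 3 * 3 * 3 = 1728 rows), so the safety ratings are evenly spread within every seating category. The sketch below rebuilds that grid with the standard library only. The level names follow the UCI attribute description, and `levels`, `rows` and `percentages` are illustrative names rather than part of the notebook.

```python
from collections import Counter
from itertools import product

# Attribute levels as listed in the UCI Car Evaluation description
# (assumed to match the CSV used in this notebook).
levels = {
    "buying.price": ["vhigh", "high", "med", "low"],
    "maintaince.cost": ["vhigh", "high", "med", "low"],
    "number.doors": ["2", "3", "4", "5more"],
    "number.persons": ["2", "4", "more"],
    "lug.boot": ["small", "med", "big"],
    "safety": ["low", "med", "high"],
}

# One row per combination of levels: a full factorial grid.
rows = [dict(zip(levels, combo)) for combo in product(*levels.values())]

# Count each (number.persons, safety) pair and each number.persons subtotal.
pair_counts = Counter((r["number.persons"], r["safety"]) for r in rows)
group_totals = Counter(r["number.persons"] for r in rows)

# Percentage of each safety rating within its seating category.
percentages = {
    pair: 100 * n / group_totals[pair[0]] for pair, n in pair_counts.items()
}
for pair in sorted(percentages):
    print(pair, round(percentages[pair], 6))
```

Under that assumption, a uniform table says more about how the data set was constructed than about real cars, which is worth keeping in mind when reading the bar charts.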


In [16]:


class TestNotebook(unittest.TestCase):
    
    def test_percent_null(self):
        '''
        test an empty data frame
        '''
        #step 1 - test that None is returned for an empty data frame
        
        self.assertIsNone(percent_persons(pd.DataFrame()))
    
    def test_percent_one_row(self):
        '''
        test one row of data frame for 'number.persons' equals 2.
        '''
        #step 2 - create predicted data frame
        predicted_df = pd.DataFrame({'safety':['low'], 'percent':[100.0], 'number.persons':['2']}).set_index('number.persons')

        #step 3 - switch column order of predicted data frame
        cols = ['safety', 'percent']
        predicted_df = predicted_df[cols]

        #step 4 - test actual equals predicted data frame
        test_df = percent_persons(file_car.head(n=1))
        self.assertTrue(test_df.equals(predicted_df))
    
    def test_percent_two(self):
        '''
        test 10 rows of data frame for 'number.persons' equals 2
        '''
        
        #step 5 - create predicted data frame 
        predict = pd.DataFrame({'safety':['low', 'high', 'med'], 'percent':np.array([40.0, 30.0, 30.0]),
                                'number.persons':['2', '2', '2']}).set_index('number.persons')

        #step 6 - switch column order of predicted data frame
        cols = ['safety', 'percent']
        predict = predict[cols]

        #step 7 - test actual equals predicted data frame
        test_df = percent_persons(file_car[file_car['number.persons']=='2'].head(n=10))
        
        self.assertTrue(test_df.equals(predict))
        
    def test_percent_two_four(self):
        '''
        test 20 rows.
        test the first 10 rows each for 'number.persons' of '2' & '4'
        '''
        #step 8 - create predict data frame
        predict = pd.DataFrame({'safety':['low', 'high', 'med','low', 'high', 'med'], 'percent':np.array([40.0, 30.0, 30.0, 40.0, 30.0, 30.0]), 
                                'number.persons':['2', '2', '2', '4', '4', '4']}).set_index('number.persons')
        #step 9 - switch column order of predicted data frame
        cols = ['safety', 'percent']

        predict = predict[cols]
        #step 10 - test actual equals predict data frame
        # DataFrame.append was removed in pandas 2.0, so combine the slices with pd.concat
        two_seat = file_car[file_car['number.persons']=='2'].head(n=10)
        four_seat = file_car[file_car['number.persons']=='4'].head(n=10)
        test_df = percent_persons(pd.concat([two_seat, four_seat]))
        
        self.assertTrue(test_df.equals(predict))

unittest.main(argv=[''], verbosity=2, exit=False)
    
    

test_percent_null (__main__.TestNotebook) ... ok
test_percent_one_row (__main__.TestNotebook) ... ok
test_percent_two (__main__.TestNotebook) ... ok
test_percent_two_four (__main__.TestNotebook) ... ok

----------------------------------------------------------------------
Ran 4 tests in 0.128s

OK


<unittest.main.TestProgram at 0x7b5ed50>
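
The tests above compare data frames with `DataFrame.equals`, which only reports `True` or `False` and requires exact floating-point equality. As a possible alternative (a sketch, not the approach this notebook uses), `pandas.testing.assert_frame_equal` compares floats within a tolerance by default and raises an `AssertionError` that describes the mismatch. The frames `expected` and `actual` below are hypothetical stand-ins shaped like the output of `percent_persons`.

```python
import pandas as pd
from pandas.testing import assert_frame_equal

# Hypothetical expected frame, shaped like the output of percent_persons.
expected = pd.DataFrame(
    {"safety": ["low", "high", "med"], "percent": [40.0, 30.0, 30.0]},
    index=pd.Index(["2", "2", "2"], name="number.persons"),
)

# Hypothetical actual frame with tiny floating-point noise in one value.
actual = expected.copy()
actual["percent"] = [40.0, 30.0, 30.0 + 1e-12]

# DataFrame.equals demands exact equality, so the noise makes it fail.
print(actual.equals(expected))

# assert_frame_equal tolerates the noise and passes silently;
# on a real mismatch it raises an AssertionError describing the difference.
assert_frame_equal(actual, expected)
```

Inside a `unittest.TestCase`, the call can replace `self.assertTrue(test_df.equals(predict))` directly, because an `AssertionError` is reported as a test failure.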