# In New Zealand, what kind of cars do thieves prefer?

## Overview
In this paper, we apply statistical analysis and pyecharts visualization to New Zealand car theft data
and combine the results with relevant background information to draw useful conclusions.

In [None]:
### Import data

In [2]:
pip install pyecharts

Collecting pyecharts
  Downloading pyecharts-1.9.1-py3-none-any.whl (135 kB)
Collecting simplejson
  Downloading simplejson-3.17.6-cp39-cp39-win_amd64.whl (75 kB)
Collecting prettytable
  Downloading prettytable-3.3.0-py3-none-any.whl (26 kB)
Installing collected packages: simplejson, prettytable, pyecharts
Successfully installed prettytable-3.3.0 pyecharts-1.9.1 simplejson-3.17.6
Note: you may need to restart the kernel to use updated packages.


In [None]:
import pandas as pd
import numpy as np
import seaborn as sns
from pyecharts import options as opts
from pyecharts.charts import Bar
from pyecharts.charts import Pie, Line
# Import data
data = pd.read_csv("stolenvehicles.csv")

In [42]:
data.head(5)

Unnamed: 0,Color,VehicleModel,VehicleDesc,ModelYear,VehicleType,DateStolen,Location,DayOfWeek
0,Silver,Trailer,BST2021D,2021,Trailer,5/11/2021,Waitemata,Tuesday
1,Silver,Trailer,OUTBACK BOATS FT470,2021,Boat Trailer,13/12/2021,Eastern,Monday
2,Silver,Trailer,ASD JETSKI,2021,Boat Trailer,13/02/2022,Auckland City,Sunday
3,Silver,Trailer,MSC 7X4,2021,Trailer,13/11/2021,Central,Saturday
4,Silver,Trailer,D-MAX 8X5,2018,Trailer,10/01/2022,Waitemata,Saturday


In [None]:
### Content analysis

In [None]:
#### Silver, white and black are the three most stolen vehicle colors
Among all stolen vehicles, silver, white and black take the top three places and together 
account for more than 61% of thefts. 
One reason is that these three colors dominate the vehicle fleet (most vehicles you see on 
the road are one of them). Another possible reason is the thieves' own caution: to avoid 
being noticed, a thief tends to steal a common-looking car, whereas a car in a rarer, 
brighter color such as pink, yellow or orange is easier for the police to find once it 
has been stolen.

In [4]:
# Stolen car colors
data['Color'].value_counts()

Silver    1272
White      934
Black      588
Blue       512
Red        389
Grey       378
Green      224
Gold        77
Brown       49
Yellow      39
Orange      35
Purple      26
Cream        9
Pink         4
Name: Color, dtype: int64

In [5]:
# Share of the three most common colors among all stolen vehicles
color_counts = data['Color'].value_counts()
color_counts.iloc[:3].sum() / color_counts.sum()

0.6159611992945326

#### Monday has the most thefts; car thieves prefer weekdays
According to the data, cars in New Zealand are most likely to be stolen on Mondays, 
while Saturdays are the safest.
New Zealanders should therefore pay extra attention to protecting their cars on Mondays. 
In addition, weekdays account for 73.91% of all thefts in the data and weekends 
for 26.09%.
This may be because New Zealanders are more likely to drive to and from work on 
weekdays, so more cars are on the road and thieves have more opportunities to 
get their hands on them.

In [9]:
data['DayOfWeek'] = pd.to_datetime(data['DateStolen']).dt.day_name()
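
The `DateStolen` strings in the preview above (for example `13/12/2021`) look day-first. A value such as `10/01/2022` is ambiguous, and which field is read as the month changes the resulting weekday. The following is a minimal sketch using only the standard library, assuming dd/mm/yyyy is the intended format; in pandas, `pd.to_datetime` accepts `dayfirst=True` or an explicit `format` string for the same purpose.

```python
from datetime import datetime

# Day names indexed by datetime.weekday() (Monday == 0)
DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]

raw = "10/01/2022"  # value taken from the preview rows above

# Assumption: the file stores dates as dd/mm/yyyy
day_first = datetime.strptime(raw, "%d/%m/%Y")    # 10 January 2022
# The same string read month-first
month_first = datetime.strptime(raw, "%m/%d/%Y")  # 1 October 2022

print(DAY_NAMES[day_first.weekday()])    # Monday
print(DAY_NAMES[month_first.weekday()])  # Saturday
```

Checking a few ambiguous rows this way is a quick test of whether the parsed weekdays and months match the intended dates.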

In [40]:
data['DayOfWeek'].value_counts().tolist()

[753, 693, 668, 634, 627, 617, 561]

In [29]:
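# Select weekend days by name rather than by their position in value_counts(),
# which depends on how the counts happen to be sorted
day_counts = data['DayOfWeek'].value_counts()
total = day_counts.sum()
weekends_num = day_counts.reindex(['Saturday', 'Sunday'], fill_value=0).sum()
weekdays_num = total - weekends_num

print("Weekdays: {}, Weekends: {}".format(weekdays_num/total, weekends_num/total))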

Weekdays: 0.7390731385899407, Weekends: 0.2609268614100593
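
Because a week has five weekdays and only two weekend days, the two shares are easier to compare after dividing by the number of days in each group. The sketch below uses the sorted counts printed above and assumes that the fifth and seventh entries are the weekend days, which is consistent with the weekday and weekend shares just printed.

```python
# Daily counts sorted by frequency, copied from the value_counts() output above
counts = [753, 693, 668, 634, 627, 617, 561]

# Assumption: positions 4 and 6 (0-based) are the two weekend days
weekend_positions = {4, 6}

weekend_total = sum(c for i, c in enumerate(counts) if i in weekend_positions)
weekday_total = sum(counts) - weekend_total

weekday_per_day = weekday_total / 5  # average thefts per weekday
weekend_per_day = weekend_total / 2  # average thefts per weekend day

print(f"{weekday_per_day:.1f}")  # 673.0
print(f"{weekend_per_day:.1f}")  # 594.0

# If thefts were spread evenly over the week, weekdays would hold 5/7 of them
print(f"{5 / 7:.4f}")                        # 0.7143
print(f"{weekday_total / sum(counts):.4f}")  # 0.7391
```

On a per-day basis the gap between weekdays and weekends is smaller than the 73.91% versus 26.09% split suggests, although weekdays still come out ahead.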


In [65]:
x = data['DayOfWeek'].value_counts().keys()
y = data['DayOfWeek'].value_counts().tolist()

c = (
    Pie()
    .add(
        "",
        [list(z) for z in zip(x, y)],
        center=["35%", "50%"],
    )
    .set_global_opts(
        title_opts=opts.TitleOpts(title="Pie chart of the numbers in a week", pos_left='left'),
        legend_opts=opts.LegendOpts(pos_bottom='1%'),
    )
    .set_series_opts(
        tooltip_opts=opts.TooltipOpts(
            trigger="item", formatter="{a} <br/>{b}: {c} ({d}%)"
        )
    )
)
c.render_notebook()

### Stolen-car numbers rise month by month, pointing to a security risk in New Zealand society

According to the data, the monthly number of stolen cars trended upward from October last 
year to March this year. March was the highest month, a 38% increase; compared with October 
last year, the growth rate was as high as 126%, and on average about 25 cars were stolen 
per day. Increasingly rampant car theft suggests that there are security risks 
in New Zealand society. The New Zealand government should develop policies that 
strengthen the penalties for car theft to protect people's property. At the same time, 
New Zealanders should stay alert and pay more attention to protecting their cars in 
everyday travel.

In [43]:
data['DateStolen'] = pd.to_datetime(data['DateStolen'])

data['month'] = data['DateStolen'].dt.month
data['month'].value_counts()

3     827
1     668
2     638
12    549
11    481
10    480
5     193
4     192
7     142
8     136
9     126
6     121
Name: month, dtype: int64
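
Growth rates between months can be derived from such counts with a simple percentage-change formula. The sketch below is only an illustration: the counts are copied from the output above for October through March, `pct_change` is a hypothetical helper name, and the resulting percentages depend entirely on those counts and on how the dates were parsed.

```python
# Monthly counts copied from the value_counts() output above (October to March)
monthly = {10: 480, 11: 481, 12: 549, 1: 668, 2: 638, 3: 827}

# Chronological order of the months covered by the analysis
order = [10, 11, 12, 1, 2, 3]


def pct_change(old, new):
    """Percentage change from old to new (hypothetical helper)."""
    return (new - old) / old * 100


# Month-over-month change for each consecutive pair of months
for prev, cur in zip(order, order[1:]):
    change = pct_change(monthly[prev], monthly[cur])
    print(f"month {prev} -> month {cur}: {change:+.1f}%")

# Change over the whole period, from the first month to the last
overall = pct_change(monthly[order[0]], monthly[order[-1]])
print(f"month {order[0]} -> month {order[-1]}: {overall:+.1f}%")
```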

In [44]:
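df = data['month'].value_counts()
# value_counts() sorts by frequency, so dropping the last entry and reversing
# orders the bars from fewest to most thefts, not chronologically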
c = (
    Bar()
    .add_xaxis(list(df.index)[:-1][::-1])
    .add_yaxis("Month", list(df)[:-1][::-1])
)
c.render_notebook()

In [45]:
# The average number of cars stolen per day
mean = len(data) / len(data['DateStolen'].unique())
mean

25.016483516483518

## Stationwagons are thieves' favorite, accounting for 20.87% of stolen vehicles

Among all stolen vehicles, five types (stationwagon, saloon, hatchback, trailer and 
utility) account for 77.05% of the total. The stationwagon is the thieves' favorite: 
it makes up 20.87% of all stolen vehicles, or 27.09% of the thefts within these five 
types. It is therefore suggested that automotive companies invest more in research and 
development of anti-theft functions when producing stationwagon models, and offer 
higher-level anti-theft systems as value-added services.

In [46]:
# Stolen car Type
data['VehicleType'].value_counts()

Stationwagon               945
Saloon                     851
Hatchback                  644
Trailer                    582
Utility                    466
Roadbike                   297
Moped                      187
Light Van                  154
Boat Trailer               105
Trailer - Heavy             90
Caravan                     44
Other Truck                 42
Sports Car                  40
Flat Deck Truck             17
Mobile Home - Light         15
Convertible                 12
Cab and Chassis Only         8
Heavy Van                    7
Light Bus                    6
All Terrain Vehicle          5
Tractor                      4
Trail Bike                   2
Mobile Machine               2
Special Purpose Vehicle      1
Articulated Truck            1
Name: VehicleType, dtype: int64
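
The shares quoted in this section can be checked directly from the counts. The following is a minimal sketch in plain Python, with the counts copied from the output above; it computes the share of the five most common types in the total, and the Stationwagon share both of the total and of that top-five group.

```python
# Counts copied from the VehicleType value_counts() output above
type_counts = {
    "Stationwagon": 945, "Saloon": 851, "Hatchback": 644, "Trailer": 582,
    "Utility": 466, "Roadbike": 297, "Moped": 187, "Light Van": 154,
    "Boat Trailer": 105, "Trailer - Heavy": 90, "Caravan": 44,
    "Other Truck": 42, "Sports Car": 40, "Flat Deck Truck": 17,
    "Mobile Home - Light": 15, "Convertible": 12, "Cab and Chassis Only": 8,
    "Heavy Van": 7, "Light Bus": 6, "All Terrain Vehicle": 5, "Tractor": 4,
    "Trail Bike": 2, "Mobile Machine": 2, "Special Purpose Vehicle": 1,
    "Articulated Truck": 1,
}

total = sum(type_counts.values())
top_five_total = sum(sorted(type_counts.values(), reverse=True)[:5])

top_five_share = top_five_total / total
stationwagon_of_total = type_counts["Stationwagon"] / total
stationwagon_of_top_five = type_counts["Stationwagon"] / top_five_total

print(f"{top_five_share:.2%}")            # 77.05%
print(f"{stationwagon_of_total:.2%}")     # 20.87%
print(f"{stationwagon_of_top_five:.2%}")  # 27.09%
```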

In [68]:
vehicle_type = data['VehicleType'].value_counts()
vehi_1 = vehicle_type[vehicle_type >= 90]
vehi_2 = vehicle_type[vehicle_type < 90]   # sum = 206
# Series.append was removed in pandas 2.0; pd.concat adds the aggregated "Other" slice instead
vehi_1 = pd.concat([vehi_1, pd.Series({'Other': vehi_2.sum()})])
c = (
    Pie()
    .add(
        "",
        [list(z) for z in zip(vehi_1.index, list(vehi_1))],
        center=["35%", "50%"],
    )
    .set_global_opts(
        title_opts=opts.TitleOpts(title="Vehicle Type"),
        legend_opts=opts.LegendOpts(pos_bottom="0.1%"),
    )
    .set_series_opts(
        label_opts=opts.LabelOpts(formatter="{b}: {c}"),
        tooltip_opts=opts.TooltipOpts(
            trigger="item", formatter="{a} <br/>{b}: {c} ({d}%)"
        ),
    )
)
c.render_notebook()

### Do thieves prefer old cars?

The chart below shows that the model years of the stolen vehicles are mostly concentrated 
between 1995 and 2020, and that vehicles produced in 1995-2010 were stolen in 
significantly larger numbers than those produced after 2010. So do thieves prefer to 
steal relatively older cars?
According to the information I found, New Zealand has a very high rate of car ownership 
per capita, but most people drive older cars, and many of those cars are more than 
10 years old. It is therefore not necessarily that thieves love old cars; rather, older 
models simply make up more of the vehicles in New Zealand.

In [69]:
# Stolen car Years
data['ModelYear'].value_counts()

2005    346
2006    333
2007    251
2004    238
2008    190
       ... 
1962      1
1943      1
1957      1
1940      1
1965      1
Name: ModelYear, Length: 64, dtype: int64

In [70]:
data['ModelYear'].unique()

array([2021, 2018, 2005, 2001, 2020, 2004, 2007, 2014, 2002, 2000, 2015,
       2017, 1998, 2003, 1995, 1999, 1997, 1983, 2008, 2006, 1969, 2011,
       1984, 1990, 1996, 1985, 2022, 1977, 1980, 2019, 1967, 1971, 1989,
       1994, 1962, 1972, 2016, 2012,    0, 1963, 1981, 1979, 1976, 1987,
       2009, 1940, 1960, 2013, 1993, 2010, 1986, 1970, 1974, 1975, 1992,
       1982, 1968, 1973, 1988, 1991, 1943, 1957, 1978, 1965], dtype=int64)

In [71]:
# Drop the outlier (ModelYear == 0)
frame = data.loc[data['ModelYear'] != 0, 'ModelYear']

In [72]:
# For better analysis, group the model years into 5-year intervals.
# np.arange excludes its stop value, so extend it by one bin width to keep the newest model years.
histogram = np.histogram(frame.values, bins=np.arange(frame.min(), frame.max() + 5, 5))
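
The binning idea can be illustrated in plain Python. This is a minimal sketch, assuming five-year half-open intervals `[start, start + 5)` and a small made-up list of years (`sample_years` is hypothetical, not taken from the dataset). It also shows why the bin edges have to extend past the newest year: the stop value of `range`, like that of `np.arange`, is exclusive, so edges built only up to the maximum can leave the last years outside every bin.

```python
# Hypothetical model years, for illustration only
sample_years = [1998, 2004, 2005, 2005, 2011, 2021, 2022]
width = 5
start = min(sample_years)

# Edges that stop at the maximum year: the last edge falls short of 2022
edges_short = list(range(start, max(sample_years), width))
print(edges_short)  # [1998, 2003, 2008, 2013, 2018]

# Extending the stop by one bin width (plus 1) guarantees an edge above the maximum
edges = list(range(start, max(sample_years) + width + 1, width))
print(edges)  # [1998, 2003, 2008, 2013, 2018, 2023]

# Count the years in each half-open interval [lo, hi)
counts = [0] * (len(edges) - 1)
for year in sample_years:
    counts[(year - start) // width] += 1

labels = [f"{lo}-{hi}" for lo, hi in zip(edges, edges[1:])]
for label, count in zip(labels, counts):
    print(label, count)
```

Note that `np.histogram` treats its final bin as closed on both sides, so its handling of the very last edge differs slightly from this half-open sketch.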

In [73]:
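# Build an interval label for each bin and pair it with its count
index = [f'{histogram[1][i]}-{histogram[1][i+1]}' for i in range(len(histogram[1]) - 1)]
# Build from a dict so the counts stay integers instead of being cast to strings
frame = pd.DataFrame({'hist': index, 'num': histogram[0]})
frame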

Unnamed: 0,hist,num
0,1940-1945,2
1,1945-1950,0
2,1950-1955,0
3,1955-1960,1
4,1960-1965,7
5,1965-1970,7
6,1970-1975,16
7,1975-1980,26
8,1980-1985,31
9,1985-1990,64


In [74]:
c = (
    Bar()
    .add_xaxis(list(frame['hist']))
    .add_yaxis("", list(frame['num']))
    .set_global_opts(
        title_opts=opts.TitleOpts(title="Stolen vehicle production year bar chart"),
        datazoom_opts=[opts.DataZoomOpts(), opts.DataZoomOpts(type_="inside")],
    )
)
c.render_notebook()