In [2]:
import pandas as pd
from cleverminer import cleverminer
from datetime import datetime

Automatically reordering numeric categories ...done
Cleverminer version  1.0.10


In [3]:
df = pd.read_csv('../../Traffic_Violations_2023.csv', encoding='cp1250', sep=',')

df['Weekday'] = (df['Date Of Stop'].apply(lambda x: str(datetime.strptime(x, '%m/%d/%Y').weekday()+1)) + ' ' +
                  df['Date Of Stop'].apply(lambda x: datetime.strptime(x, '%m/%d/%Y').strftime("%a")))

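def extract_characters(text):
    # Return the hour (first two characters) of an 'HH:MM:SS' string.
    if len(text) == 8:
        return int(text[:2])
    else:
        # NaN keeps the column numeric so pd.cut can bin it; a string here would make pd.cut fail.
        return float('nan')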

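# Hours 0-5 are night, 6-11 morning, 12-17 afternoon, 18-23 evening (pd.cut bins are right-inclusive).
df['Time Of Day'] = pd.cut(
    df['Time Of Stop'].apply(extract_characters),
    bins=[-float('inf'), 5, 11, 17, float('inf')],
    labels=['d) Night', 'a) Morning', 'b) Afternoon', 'c) Evening'],
)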

# Keep only the stops that involved an accident, then restrict to the analysed columns.
df = df[df['Accident'] == 'Yes']
df = df[['SubAgency', 'Belts',
       'Personal Injury', 'Property Damage', 'Fatal', 'Commercial License',
       'Alcohol', 'Work Zone', 'VehicleType', 'Violation Type',
       'Race', 'Gender', 'Driver State', 'Arrest Type', 'Weekday', 'Time Of Day']]
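
The same two derived columns can also be built without per-row lambdas. The following is only a sketch on a made-up three-row frame (the `sample` data and the `errors='coerce'` handling are assumptions, not part of the original notebook); it parses each date once and lets unparseable times fall through as missing values.

```python
import pandas as pd

# Tiny made-up sample in the same format as the CSV columns used above.
sample = pd.DataFrame({
    'Date Of Stop': ['01/02/2023', '01/07/2023', '01/08/2023'],
    'Time Of Stop': ['05:59:00', '12:00:00', '23:15:00'],
})

# Parse each date once; dayofweek is 0 for Monday, so add 1 for the label.
dates = pd.to_datetime(sample['Date Of Stop'], format='%m/%d/%Y')
sample['Weekday'] = (dates.dt.dayofweek + 1).astype(str) + ' ' + dates.dt.strftime('%a')

# Unparseable times become NaT, so their hour is NaN and pd.cut leaves them uncategorised.
hours = pd.to_datetime(sample['Time Of Stop'], format='%H:%M:%S', errors='coerce').dt.hour
sample['Time Of Day'] = pd.cut(
    hours,
    bins=[-float('inf'), 5, 11, 17, float('inf')],
    labels=['d) Night', 'a) Morning', 'b) Afternoon', 'c) Evening'],
)
print(sample[['Weekday', 'Time Of Day']])
```

The weekday abbreviations come from `strftime('%a')`, so they depend on the active locale, just as in the cell above.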

In [4]:
df['Time Of Day'].value_counts()

Time Of Day
c) Evening      826
d) Night        742
b) Afternoon    661
a) Morning      507
Name: count, dtype: int64

In [5]:
df['Weekday'].value_counts()

Weekday
6 Sat    554
7 Sun    469
4 Thu    392
5 Fri    367
2 Tue    340
3 Wed    325
1 Mon    289
Name: count, dtype: int64

In [8]:
# More than 50% of night-time accidents in which someone is injured and seat belts are not fastened
# happen on the night from Saturday to Sunday, i.e. rule 1; rules 2 and 5 are probably similar.
clm = cleverminer(df=df,target='Weekday',proc='CFMiner',
               quantifiers= {'RelMax':0.5, 'Base':50},
               cond ={
                    'attributes':[
                        {'name': 'Time Of Day', 'type': 'seq', 'minlen': 1, 'maxlen': 2},
                        {'name': 'Belts', 'type': 'subset', 'minlen': 1, 'maxlen': 1},
                        {'name': 'Personal Injury', 'type': 'subset', 'minlen': 1, 'maxlen': 1},
                        {'name': 'Property Damage', 'type': 'subset', 'minlen': 1, 'maxlen': 1},
                        {'name': 'SubAgency', 'type': 'subset', 'minlen': 1, 'maxlen': 1}
                    ], 'minlen':1, 'maxlen':3, 'type':'con'}
               )

 
clm.print_summary()
clm.print_rulelist()
clm.print_rule(1)

Cleverminer version 1.0.10.
Starting data preparation ...
Automatically reordering numeric categories ...
Encoding columns into bit-form...
Encoding columns into bit-form...done
Data preparation finished.
Will go for  CFMiner
Starting to mine rules.


Done. Total verifications : 381, rules 6, times: prep 0.04sec, processing 0.03sec

CleverMiner task processing summary:

Task type : CFMiner
Number of verifications : 381
Number of rules : 6
Total time needed : 00h 00m 00s
Time of data preparation : 00h 00m 00s
Time of rule mining : 00h 00m 00s


List of rules:
RULEID BASE  S_UP  S_DOWN Condition
     1    81     2     1 Time Of Day(d) Night) & Belts(No) & Personal Injury(Yes)
     2    60     2     1 Time Of Day(d) Night) & Property Damage(Yes) & SubAgency(3rd District, Silver Spring)
     3    53     1     1 Time Of Day(d) Night a) Morning) & Belts(Yes) & SubAgency(Headquarters and Special Operations)
     4    63     1     1 Time Of Day(d) Night a) Morning) & Property Damage(No) & SubAgency(6th District, Gaithersburg / Montgomery Village)
     5    60     4     1 Time Of Day(
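
A rule found with `RelMax` and `Base` can be cross-checked with plain pandas. The sketch below assumes that `Base` is the number of rows matching the condition and that `RelMax` is the share of the most frequent target category among those rows; this is my reading of the quantifiers, not something the output above confirms. The helper `target_histogram` and the `toy` frame are hypothetical.

```python
import pandas as pd

def target_histogram(frame, condition, target):
    """Counts of the target categories among rows matching every (column, value) pair."""
    mask = pd.Series(True, index=frame.index)
    for column, value in condition.items():
        mask &= frame[column] == value
    return frame.loc[mask, target].value_counts()

# Made-up rows, not taken from the traffic data.
toy = pd.DataFrame({
    'Belts':   ['No', 'No', 'No', 'No', 'Yes', 'Yes'],
    'Weekday': ['7 Sun', '7 Sun', '7 Sun', '6 Sat', '1 Mon', '2 Tue'],
})

hist = target_histogram(toy, {'Belts': 'No'}, 'Weekday')
base = int(hist.sum())               # number of rows matching the condition
rel_max = hist.max() / hist.sum()    # share of the most frequent weekday
print(base, rel_max)
```

On the notebook's data, the same helper could be called as `target_histogram(df, {'Time Of Day': 'd) Night', 'Belts': 'No', 'Personal Injury': 'Yes'}, 'Weekday')` to inspect the histogram behind rule 1.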

In [11]:
# In the Rockville police district, the number of accidents is roughly the same regardless of the day of the week (rule 16).
# The number of afternoon accidents is roughly the same regardless of the day of the week (rule 6).
# The number of accidents without property damage is roughly the same regardless of the day of the week (rule 15).
clm = cleverminer(df=df,target='Weekday',proc='CFMiner',
               quantifiers= {'RelMax_leq':0.18, 'RelMin':0.1, 'Base':50},
               cond ={
                    'attributes':[
                        {'name': 'Time Of Day', 'type': 'seq', 'minlen': 1, 'maxlen': 2},
                        {'name': 'Belts', 'type': 'subset', 'minlen': 1, 'maxlen': 1},
                        {'name': 'Personal Injury', 'type': 'subset', 'minlen': 1, 'maxlen': 1},
                        {'name': 'Property Damage', 'type': 'subset', 'minlen': 1, 'maxlen': 1},
                        {'name': 'SubAgency', 'type': 'subset', 'minlen': 1, 'maxlen': 1}
                    ], 'minlen':1, 'maxlen':3, 'type':'con'}
               )

 
clm.print_summary()
clm.print_rulelist()
clm.print_rule(16)

Cleverminer version 1.0.10.
Starting data preparation ...
Automatically reordering numeric categories ...
Encoding columns into bit-form...
Encoding columns into bit-form...done
Data preparation finished.
Will go for  CFMiner
Starting to mine rules.
Done. Total verifications : 381, rules 16, times: prep 0.05sec, processing 0.03sec

CleverMiner task processing summary:

Task type : CFMiner
Number of verifications : 381
Number of rules : 16
Total time needed : 00h 00m 00s
Time of data preparation : 00h 00m 00s
Time of rule mining : 00h 00m 00s


List of rules:
RULEID BASE  S_UP  S_DOWN Condition
     1   786     2     1 Time Of Day(a) Morning b) Afternoon) & Belts(No) & Property Damage(No)
     2    78     1     2 Time Of Day(a) Morning b) Afternoon) & Belts(No) & SubAgency(1st District, Rockville)
     3   587     2     2 Time Of
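
The pair `RelMax_leq` and `RelMin` is used here to look for nearly flat histograms. With seven weekdays a perfectly uniform share is 1/7, about 0.143, so the band from 0.10 to 0.18 brackets it. A minimal sketch of that test follows, assuming `RelMin` is a lower bound on the smallest share and `RelMax_leq` an upper bound on the largest share; the function name and both histograms are made up for illustration.

```python
import pandas as pd

def is_roughly_uniform(counts, rel_min, rel_max_leq):
    """True when every category's share of the total lies within [rel_min, rel_max_leq]."""
    shares = counts / counts.sum()
    return bool(shares.min() >= rel_min and shares.max() <= rel_max_leq)

weekdays = ['1 Mon', '2 Tue', '3 Wed', '4 Thu', '5 Fri', '6 Sat', '7 Sun']

# Made-up histograms: one nearly flat, one dominated by the weekend.
flat = pd.Series([14, 15, 13, 16, 14, 15, 13], index=weekdays)
peaked = pd.Series([5, 5, 5, 5, 5, 30, 45], index=weekdays)

print(is_roughly_uniform(flat, 0.10, 0.18))
print(is_roughly_uniform(peaked, 0.10, 0.18))
```

For a four-category target such as `Time Of Day`, the uniform share is 0.25, which is why a band like 0.22 to 0.28 plays the same role there.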

In [12]:
# At least 75% of Friday and Saturday accidents in the Wheaton police district that involve personal injury
# happen in the evening, i.e. between 18:00 and 23:59 (rule 1).
clm = cleverminer(df=df,target='Time Of Day',proc='CFMiner',
               quantifiers= {'RelMax':0.75, 'Base':50},
               cond ={
                    'attributes':[
                        {'name': 'Weekday', 'type': 'seq', 'minlen': 1, 'maxlen': 2},
                        {'name': 'Belts', 'type': 'subset', 'minlen': 1, 'maxlen': 1},
                        {'name': 'Personal Injury', 'type': 'subset', 'minlen': 1, 'maxlen': 1},
                        {'name': 'Property Damage', 'type': 'subset', 'minlen': 1, 'maxlen': 1},
                        {'name': 'SubAgency', 'type': 'subset', 'minlen': 1, 'maxlen': 1},
                    ], 'minlen':1, 'maxlen':3, 'type':'con'}
               )

 
clm.print_summary()
clm.print_rulelist()
clm.print_rule(1)

Cleverminer version 1.0.10.
Starting data preparation ...
Automatically reordering numeric categories ...
Encoding columns into bit-form...
Encoding columns into bit-form...done
Data preparation finished.
Will go for  CFMiner
Starting to mine rules.
Done. Total verifications : 492, rules 3, times: prep 0.05sec, processing 0.10sec

CleverMiner task processing summary:

Task type : CFMiner
Number of verifications : 492
Number of rules : 3
Total time needed : 00h 00m 00s
Time of data preparation : 00h 00m 00s
Time of rule mining : 00h 00m 00s


List of rules:
RULEID BASE  S_UP  S_DOWN Condition
     1    56     2     1 Weekday(5 Fri 6 Sat) & Personal Injury(Yes) & SubAgency(4th District, Wheaton)
     2    53     1     1 Weekday(7 Sun) & Belts(No) & Sub
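
Conditions such as `Weekday(5 Fri 6 Sat)` come from the `'seq'` attribute type with `minlen` and `maxlen`. The sketch below shows one plausible way such candidates could be enumerated, assuming `'seq'` means runs of neighbouring categories in their sorted order and that runs do not wrap around from the last category to the first. The function `contiguous_runs` is hypothetical and is not CleverMiner's implementation.

```python
def contiguous_runs(categories, minlen, maxlen):
    """All runs of neighbouring categories with length between minlen and maxlen."""
    runs = []
    for length in range(minlen, maxlen + 1):
        for start in range(len(categories) - length + 1):
            runs.append(tuple(categories[start:start + length]))
    return runs

weekdays = ['1 Mon', '2 Tue', '3 Wed', '4 Thu', '5 Fri', '6 Sat', '7 Sun']
runs = contiguous_runs(weekdays, 1, 2)

print(len(runs))                     # 7 single days plus 6 neighbouring pairs
print(('5 Fri', '6 Sat') in runs)
```

Under these assumptions a pair such as Sunday and Monday is never generated, because the two labels are not adjacent in the sorted order.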

In [15]:
# In the Bethesda police district, the number of accidents is roughly equally high regardless of the time of day (rule 4).
# From Tuesday to Thursday, the number of accidents with seat belts fastened is roughly the same regardless of the time of day (rule 1).
clm = cleverminer(df=df,target='Time Of Day',proc='CFMiner',
               quantifiers= {'RelMax_leq':0.28, 'RelMin':0.22, 'Base':50},
               cond ={
                    'attributes':[
                        {'name': 'Weekday', 'type': 'seq', 'minlen': 1, 'maxlen': 3},
                        {'name': 'Belts', 'type': 'subset', 'minlen': 1, 'maxlen': 1},
                        {'name': 'SubAgency', 'type': 'subset', 'minlen': 1, 'maxlen': 1},
                    ], 'minlen':1, 'maxlen':2, 'type':'con'}
               )

 
clm.print_summary()
clm.print_rulelist()
clm.print_rule(1)

Cleverminer version 1.0.10.
Starting data preparation ...
Automatically reordering numeric categories ...
Encoding columns into bit-form...
Encoding columns into bit-form...done
Data preparation finished.
Will go for  CFMiner
Starting to mine rules.


Done. Total verifications : 155, rules 4, times: prep 0.04sec, processing 0.02sec

CleverMiner task processing summary:

Task type : CFMiner
Number of verifications : 155
Number of rules : 4
Total time needed : 00h 00m 00s
Time of data preparation : 00h 00m 00s
Time of rule mining : 00h 00m 00s


List of rules:
RULEID BASE  S_UP  S_DOWN Condition
     1    54     1     1 Weekday(2 Tue 3 Wed 4 Thu) & Belts(Yes)
     2   204     2     1 Weekday(4 Thu 5 Fri 6 Sat) & SubAgency(2nd District, Bethesda)
     3   506     1     1 Belts(No) & SubAgency(2nd District, Bethesda)
     4   515     2     1 SubAgency(2nd District, Bethesda)



Rule id : 1

Base :    54  Relative base : 0.020  Steps UP (consecutive) :     1  Steps DOWN (consecutive) :     1  Steps UP (any) :     1  Steps DOWN (any) :     2  Histogram maximum :    15  Histogram mi
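
The rule detail reports steps up and down both as "consecutive" and as "any". A hedged sketch of how such counts could be derived from a histogram is given below. It assumes that a step compares two neighbouring bins, that "any" counts all steps in a direction, that "consecutive" is the longest unbroken run in that direction, and that equal neighbours count as neither; the function and the histogram values are made up.

```python
def count_steps(histogram):
    """Return (longest consecutive run, total count) of up and down steps between neighbours."""
    longest = {'up': 0, 'down': 0}
    total = {'up': 0, 'down': 0}
    run = {'up': 0, 'down': 0}
    for prev, curr in zip(histogram, histogram[1:]):
        direction = 'up' if curr > prev else 'down' if curr < prev else None
        for key in run:
            # Extend the run in the current direction and reset the other one.
            run[key] = run[key] + 1 if key == direction else 0
            longest[key] = max(longest[key], run[key])
        if direction:
            total[direction] += 1
    return longest, total

# Made-up four-bin histogram: down, up, down.
longest, total = count_steps([10, 7, 9, 8])
print(longest, total)
```

A histogram that goes down, up, down gives one consecutive step in each direction but two downward steps overall, which is the same pattern as the counts printed for rule 1.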

In [191]:
df['SubAgency'].value_counts()

SubAgency
4th District, Wheaton                              630
3rd District, Silver Spring                        555
2nd District, Bethesda                             515
Headquarters and Special Operations                413
1st District, Rockville                            220
5th District, Germantown                           203
6th District, Gaithersburg / Montgomery Village    200
Name: count, dtype: int64

In [18]:
# (3) On Friday nights, more than 60% of accidents happen in the 3rd district = Silver Spring.
# (1) On Wednesday/Thursday mornings, more than 50% of accidents with property damage happen in the 2nd district = Bethesda.
# (5) On Fri/Sat/Sun evenings, more than 50% of accidents with personal injury happen in the 4th district = Wheaton.

# (8) On Saturday nights, more than 50% of accidents with property damage were handled by Headquarters and Special Operations.
clm = cleverminer(df=df,target='SubAgency',proc='CFMiner',
               quantifiers= {'RelMax':0.5, 'Base':50},
               cond ={
                    'attributes':[
                        {'name': 'Weekday', 'type': 'seq', 'minlen': 1, 'maxlen': 3},
                        {'name': 'Personal Injury', 'type': 'one', 'value':'Yes'},
                        {'name': 'Property Damage', 'type': 'one', 'value':'Yes'},
                        {'name': 'Time Of Day', 'type': 'subset', 'minlen': 1, 'maxlen': 1},
                    ], 'minlen':1, 'maxlen':3, 'type':'con'}
               )

 
clm.print_summary()
clm.print_rulelist()
clm.print_rule(7)

Cleverminer version 1.0.10.
Starting data preparation ...
Automatically reordering numeric categories ...
Encoding columns into bit-form...
Encoding columns into bit-form...done
Data preparation finished.
Will go for  CFMiner
Starting to mine rules.
Done. Total verifications : 191, rules 8, times: prep 0.05sec, processing 0.02sec

CleverMiner task processing summary:

Task type : CFMiner
Number of verifications : 191
Number of rules : 8
Total time needed : 00h 00m 00s
Time of data preparation : 00h 00m 00s
Time of rule mining : 00h 00m 00s


List of rules:
RULEID BASE  S_UP  S_DOWN Condition
     1    59     1     2 Weekday(3 Wed 4 Thu) & Property Damage(Yes) & Time Of Day(a) Morning)
     2    65     1     2 Weekday(4 Thu 5 Fri) & Property Damage(Yes) & Time Of Day(d) Night)
     3    78     1     2 Weekday(5 Fri) & Time Of Day