# Overview

- A search for "Machine Learning" in *Accident Analysis and Prevention* returns 333 results.
- I downloaded all of them over four days, because the UL Library website allows 100 downloads per day.
- I also downloaded the BibTeX citations.  If you set the UL Library website to view 100 results at a time, you can download all 100 citations as one .bib file.  There is no daily limit on citations.
- I "read" 30+ of the articles.
    - I kept all of my notes in the .bib file.
    - I added an *institution* field for the universities, so I can look for active research hubs.
    - If the article had *suggestions for future research*, I added an *addendum* field.
    - Most articles used a local (city, state, or province) database.  If an article used a popular database, like SHRP2, that wasn't already listed in the citation (in the keywords or abstract), I added it to the keywords, so I can see which databases are popular.
    - I read the abstracts and created an *annotation* field.
    - About a third of the articles had nothing to do with machine learning, although the article might have been interesting for other reasons.  I put "Not ML" somewhere in the annotation.
    - I flagged about half of the articles for future review by putting "Interesting" somewhere in the annotation.
- I treated the .bib file as a database file and asked the questions we usually ask of a dataset:
    - Who are the prolific authors?
    - What are the most common keywords?
    - Which universities are the research hubs?
    - Which algorithms, metrics, and databases are most mentioned in the keywords and abstracts?
- I also glanced at *Transportation Research: Part C, Emerging Technologies* and ran the same analysis on that dataset.  A search for "Machine Learning" there returns 500 results.


# Setup

## Import Libraries

In [1]:
import bibtexparser
import pandas as pd
import numpy as np

## Parse .bib Files, Choose Journal, and Create Pandas Dataframe

In [2]:
with open('Accident_Analysis_and_Prevention.bib') as bibtex_file:
    bib_database = bibtexparser.load(bibtex_file)
AAP = pd.DataFrame(bib_database.entries)

with open('Transportation_Research_Part_C.bib') as bibtex_file:
    bib_database = bibtexparser.load(bibtex_file)
TRC = pd.DataFrame(bib_database.entries)

P = AAP
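
Note that `P = AAP` binds a second name to the same DataFrame rather than copying it, so the column assignments in later cells (for example `P['keywords'] = ...`) also modify `AAP`. The following is a minimal sketch of one way to choose the journal while leaving the originals untouched. It uses small made-up frames in place of the parsed .bib data, and the `journals` dictionary and `choice` variable are hypothetical names that are not part of the notebook.

```python
import pandas as pd

# Made-up stand-ins for the DataFrames built from the two .bib files.
AAP = pd.DataFrame({'title': ['a', 'b'], 'keywords': ['Safety', None]})
TRC = pd.DataFrame({'title': ['c'], 'keywords': ['Deep learning']})

# Hypothetical lookup table so the journal is chosen in one place.
journals = {'AAP': AAP, 'TRC': TRC}
choice = 'AAP'

# .copy() gives P its own data; plain assignment would only alias AAP.
P = journals[choice].copy()
P['keywords'] = P['keywords'].fillna('None')

print(AAP['keywords'].isna().sum())  # missing values still present in the original
print(P['keywords'].tolist())
```

Switching to the other journal then only requires changing `choice`.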

# Fields in .bib File

In [3]:
# Iterating over a DataFrame yields its column names, i.e. the .bib fields.
for column in P.columns:
    print(column)

abstract
keywords
author
url
doi
issn
year
pages
volume
journal
title
ENTRYTYPE
ID
addendum
annotation
institution
note
number


# Keywords
## Sort Keywords by Frequency

In [4]:
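P['keywords'] = P['keywords'].fillna('None')
# Split each comma-separated keyword string and flatten into one list.
A = [x.split(', ') for x in P['keywords'].tolist()]
B = [item for sublist in A for item in sublist]
# Count each keyword, then sort by descending frequency.
C = {x: B.count(x) for x in B}
D = dict(sorted(C.items(), key=lambda item: item[1], reverse=True))
D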

{'None': 39,
 'Machine learning': 32,
 'Road safety': 14,
 'Safety': 13,
 'Traffic safety': 9,
 'Deep learning': 9,
 'Automated driving': 7,
 'Crash severity': 7,
 'Data mining': 7,
 'Support vector machine': 7,
 'Connected vehicles': 6,
 'Driving simulator': 6,
 'Fatigue': 6,
 'Driver behavior': 6,
 'Classification': 5,
 'Injury severity': 5,
 'Crash prediction': 5,
 'Drowsiness': 5,
 'Naturalistic driving': 5,
 'Accident analysis': 5,
 'Negative binomial model': 5,
 'Naturalistic driving study': 5,
 'Random forest': 5,
 'Motorcycle': 5,
 'Ergonomics': 5,
 'Driver distraction': 5,
 'Driving behavior': 4,
 'Text mining': 4,
 'Distracted driving': 4,
 'Built environment': 4,
 'Sleep': 4,
 'Traffic conflicts': 4,
 'Neural network': 4,
 'SHRP2': 4,
 'Risk perception': 4,
 'Driver behaviour': 4,
 'Crash': 4,
 'Risk': 4,
 'Accident causation': 4,
 'Police records': 4,
 'Narrative text': 4,
 'Driving': 3,
 'Naturalistic driving data': 3,
 'Spatial analysis': 3,
 'Machine Learning': 3,
 'Deci
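
The counts above treat 'Machine learning' and 'Machine Learning' as different keywords because the comparison is case-sensitive. The following sketch shows one way to merge such variants, using `collections.Counter` with case folding. The `count_keywords` helper is a hypothetical name, and the sample records are made up rather than taken from the .bib file.

```python
from collections import Counter

def count_keywords(keyword_strings, sep=', '):
    """Count keywords across records, ignoring case and stray whitespace."""
    counts = Counter()
    for record in keyword_strings:
        if not isinstance(record, str):   # skip missing values (None or NaN)
            continue
        for keyword in record.split(sep):
            keyword = keyword.strip().casefold()
            if keyword:
                counts[keyword] += 1
    return counts

# Made-up sample records.
sample = [
    'Machine learning, Road safety',
    'Machine Learning, Crash severity',
    None,
    'Road safety',
]
keyword_counts = count_keywords(sample)
for keyword, n in keyword_counts.most_common():
    print(keyword, n)
```

Case folding does not merge spelling variants such as 'Driver behavior' and 'Driver behaviour'; those would still need a manual mapping.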

# Algorithms

## Create Dictionary of Algorithms

In [5]:
Algorithms = {
    'ANN:  Artificial Neural Network': ['Artificial Neural Network'],
    'Bagging': ['Bagging'],
    'Bayesian': ['Bayesian Logistic Regression', 'Bayes'],
    'Binomial Regression': ['Binomial Regression'],
    'Convex Hull Algorithm': ['Convex Hull'],
    'CNN:  Convolutional Neural Network': ['Convolutional Neural Network', 'CNN'],
    'CIF: Cumulative Incidence Function': ['Cumulative Incidence Function'],
    'Decision Jungle': ['Decision Jungle'],
    'Deep Learning': ['Deep Learning', 'deep-learning'],
    'Dimensionality Reduction': ['Dimensionality Reduction'],
    'Dynamic Bayesian Network': ['Dynamic Bayesian'],
    'Ensemble': ['Ensemble'],
    'Ensemble Tree': ['Ensemble Tree'],
    'Feature Extraction': ['Feature Extraction'],
    'Fuzzy Logic': ['Fuzzy Logic'],
    'Genetic Algorithm': ['Genetic Algorithm', 'Genetic Programming'],
    'Hierarchical': ['Hierarchical'],
    'IGA: Intelligent Genetic Algorithm': ['Intelligent Genetic Algorithm'],
    'Logistic Regression': ['Logistic Regression'],
    'LSTM: Long Short-Term Memory': ['Long Short-term Memory'],
    'Marginal Effect Analysis': ['Marginal Effect Analysis'],
    'MDU: Maximum Dissimilarity Undersampling': ['maximum dissimilarity undersampling'],
    'Mixed Methods': ['Mixed Methods'],
    'Neural Network': ['Neural Network'],
    'Random Forest':['Random Forest'],
    'RSF: Random Survival Forest': ['Random Survival Forest'],
    'Self-Organizing Maps': ['Self-Organizing Maps', 'Self Organizing Maps'],
    'Shapley': ['Shapley'],
    'Statistical Learning': ['Statistical learning'],
    'SMO: Synthetic Minority Oversampling': ['synthetic minority oversampling'],
    't-SNE': ['t-SNE'],
    'VIMP: Variable Importance': ['Variable Importance'],
    'XGBoost': ['XGBoost', 'XGB'],
}

## Find Mentions of Algorithms in Abstracts or Keywords

In [6]:
for alg in Algorithms:
    pattern = '|'.join(Algorithms[alg])  # match any of the search strings, ignoring case
    P[alg] = (P['abstract'].str.contains(pattern, case=False, na=False)
              | P['keywords'].str.contains(pattern, case=False, na=False))

## Count Mentions of Algorithms in Abstracts or Keywords

In [7]:
# Each column is boolean, so the sum is the number of articles that mention the algorithm.
A = P[list(Algorithms)].sum()
A.sort_values(ascending=False)

Bayesian                                    45
Neural Network                              33
Random Forest                               28
Logistic Regression                         19
Deep Learning                               16
ANN:  Artificial Neural Network              9
XGBoost                                      9
Hierarchical                                 8
CNN:  Convolutional Neural Network           8
LSTM: Long Short-Term Memory                 8
Genetic Algorithm                            6
Ensemble                                     5
Feature Extraction                           5
SMO: Synthetic Minority Oversampling         4
VIMP: Variable Importance                    4
Statistical Learning                         4
Fuzzy Logic                                  3
Dynamic Bayesian Network                     3
Binomial Regression                          2
Bagging                                      2
Shapley                                      2
t-SNE        
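
When reading these counts, keep in mind that matching is by substring, so the categories overlap: an abstract that mentions "Convolutional Neural Network" is also counted under 'Neural Network', and one that mentions "Dynamic Bayesian" is also counted under 'Bayesian'. The following small sketch uses made-up abstracts and a shortened, hypothetical `patterns` dictionary to show how the overlap between two flag columns could be measured.

```python
import pandas as pd

# Made-up abstracts standing in for P['abstract'].
df = pd.DataFrame({'abstract': [
    'We train a convolutional neural network on video data.',
    'A neural network is compared with logistic regression.',
    'A random forest ranks the crash factors.',
]})

# Shortened stand-in for two entries of the Algorithms dictionary.
patterns = {
    'Neural Network': 'Neural Network',
    'CNN': 'Convolutional Neural Network|CNN',
}
for name, pattern in patterns.items():
    df[name] = df['abstract'].str.contains(pattern, case=False, na=False)

# Rows flagged by both categories, and rows flagged only by the generic one.
both = (df['Neural Network'] & df['CNN']).sum()
only_generic = (df['Neural Network'] & ~df['CNN']).sum()
print(both, only_generic)
```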

# Analysis Tools

## Create Dictionary of Analysis Tools

In [8]:
Analysis_Tools = {
    'Sensitivity': ['Sensitivity'],
    'Area under Curve': ['Area under Curve'],
    'False Alarm Rate': ['False Alarm Rate'],
    'Accuracy': ['accuracy'],
    'Precision': ['macro-average precision'], 
    'Recall': ['macro-average recall'], 
    'Geometric Mean': ['geometric mean'],
    'Hyperparameters': ['Hyperparameter'],
    'Spearman': ['Spearman'],
    'Aggregated Gain': ['Aggregated Gain'],
    'Time Dependencies': ['Time dependencies'],
    'Temporal': ['Temporal'],
    'Kinematic': ['Kinematic'],
    'Visualization': ['Visualization'],
    'F1 Loss Function': ['F1'],
    'Connected Vehicles': ['Connected Vehicles'],
    'Imbalanced Data': ['Imbalanced Data'],
}

## Find Mentions of Analysis Tools in Abstracts or Keywords

In [9]:
for tool in Analysis_Tools:
    pattern = '|'.join(Analysis_Tools[tool])  # match any of the search strings, ignoring case
    P[tool] = (P['abstract'].str.contains(pattern, case=False, na=False)
               | P['keywords'].str.contains(pattern, case=False, na=False))

## Count Mentions of Analysis Tools in Abstracts or Keywords

In [10]:
# Each column is boolean, so the sum is the number of articles that mention the tool.
A = P[list(Analysis_Tools)].sum()
A.sort_values(ascending=False)

Accuracy              72
Sensitivity           29
Temporal              23
Kinematic             10
Imbalanced Data        9
Connected Vehicles     8
False Alarm Rate       6
Visualization          4
F1 Loss Function       4
Geometric Mean         2
Hyperparameters        2
Aggregated Gain        2
Recall                 1
Area under Curve       1
Time Dependencies      1
Precision              1
Spearman               1
dtype: int64

# Datasets
## Create Dictionary of Datasets

In [11]:
Datasets = {
    'Second Highway Research Program (Data Set)': ['Second Highway Research Program', 'SHRP2'],
    'Virginia 100-car Database': ['Virginia', '100-car', '100 car'],
    'NGSIM Trajectory Data': ['NGSIM'],
}

## Find Mentions of Dataset in Abstract and Keywords

In [12]:
for name in Datasets:
    pattern = '|'.join(Datasets[name])  # match any of the search strings, ignoring case
    P[name] = (P['abstract'].str.contains(pattern, case=False, na=False)
               | P['keywords'].str.contains(pattern, case=False, na=False))

## Count Mentions of Datasets in Abstracts and Keywords

In [13]:
# Each column is boolean, so the sum is the number of articles that mention the dataset.
A = P[list(Datasets)].sum()
A.sort_values(ascending=False)

Second Highway Research Program (Data Set)    9
Virginia 100-car Database                     4
NGSIM Trajectory Data                         3
dtype: int64
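
The find-and-count cells for algorithms, analysis tools, and datasets repeat the same two steps. The following sketch shows how they could be folded into one helper. `count_mentions` is a hypothetical name, the sample frame is made up, and the search strings are still treated as regular expressions, as in the cells above.

```python
import pandas as pd

def count_mentions(df, terms, fields=('abstract', 'keywords')):
    """Flag rows whose fields mention any search string, then count per label.

    `terms` maps a label to a list of search strings, like the dictionaries
    above. Adds one boolean column per label to `df` and returns the counts.
    """
    for label, strings in terms.items():
        pattern = '|'.join(strings)
        hit = pd.Series(False, index=df.index)
        for field in fields:
            hit |= df[field].str.contains(pattern, case=False, na=False)
        df[label] = hit
    return df[list(terms)].sum().sort_values(ascending=False)

# Made-up records; the None abstract shows that missing text counts as no match.
sample = pd.DataFrame({
    'abstract': ['Uses SHRP2 trips.', 'NGSIM trajectories.', None],
    'keywords': ['Safety', 'SHRP2, NGSIM', 'Naturalistic driving'],
})
datasets = {'SHRP2': ['SHRP2'], 'NGSIM': ['NGSIM']}
counts = count_mentions(sample, datasets)
print(counts)
```

With a helper like this, each section would reduce to one call, such as `count_mentions(P, Algorithms)`.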

# Authors

## Sort Authors by Frequency

In [14]:
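P['author'] = P['author'].fillna('None')
# BibTeX separates authors with ' and '; split each record and flatten into one list.
A = [x.split(' and ') for x in P['author'].tolist()]
B = [item for sublist in A for item in sublist]
# Count each author, then sort by descending frequency.
C = {x: B.count(x) for x in B}
D = dict(sorted(C.items(), key=lambda item: item[1], reverse=True))
D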

{'Mohamed Abdel-Aty': 14,
 'Zhibin Li': 7,
 'Junhua Wang': 6,
 'Rongjie Yu': 6,
 'Pan Liu': 6,
 'Asad J. Khattak': 5,
 'Ting Fu': 5,
 'Mohammed Quddus': 5,
 'Jinghui Yuan': 5,
 'Mark King': 5,
 'Chengcheng Xu': 5,
 'Dominique Lord': 4,
 'Helai Huang': 4,
 'Qing Cai': 4,
 'Oscar Oviedo-Trespalacios': 4,
 'None': 4,
 'George Yannis': 3,
 'X. Jessie Yang': 3,
 'Mohamed M. Ahmed': 3,
 'Jie He': 3,
 'Alfonso Montella': 3,
 'Xiaomeng Li': 3,
 'Andry Rakotonirainy': 3,
 'Pei Li': 3,
 'Simon Washington': 3,
 'Zulqarnain H. Khattak': 3,
 'Michael D. Fontaine': 3,
 'Yiik Diew Wong': 3,
 'Xiupeng Shi': 3,
 'Kirolos Haleem': 3,
 'William J. Horrey': 3,
 'Matthias Schlögl': 3,
 'Frederik Naujoks': 3,
 'Priyanka Alluri': 3,
 'Richard Forsyth': 3,
 'Richard Wright': 3,
 'Wei Wang': 3,
 'Amirfarrokh Iranitalab': 2,
 'Eleni I. Vlahogianni': 2,
 'Mahama Yahaya': 2,
 'Runhua Guo': 2,
 'Xinguo Jiang': 2,
 'Kamal Bashir': 2,
 'Shiwei Xu': 2,
 'Behram Wali': 2,
 'Tarek Sayed': 2,
 'Jaeyoung Lee': 2,
 'Tiany

## Who are these Authors?

### Mohamed Abdel-Aty
- U of Central Florida
- Editor in Chief Emeritus of the journal
- PhD from UC Davis

### Zhibin Li
- Southeast University, Nanjing

### Junhua Wang
- Tongji U, Shanghai

### Rongjie Yu
- Coauthor with Mohamed Abdel-Aty
- Tongji U, Shanghai

### Pan Liu
- Southeast University, Nanjing
- Coauthors:
    - Jie Bao (2)
    - Satish V. Ukkusuri
    - Xiao Qin 
    - Huaguo Zhou
    - Yanyong Guo 
    - Zhibin Li (2)
    - Yao Wu
    - Wei Wang (2)
    - Chengcheng Xu (2)
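
A coauthor list like the one above can also be tallied directly from the `author` field. The following is a minimal sketch, assuming the authors are separated by ' and ' as in the cell that sorts authors by frequency. The `coauthor_counts` helper is a hypothetical name, and the sample records use placeholder names.

```python
from collections import Counter

def coauthor_counts(author_fields, name):
    """Count how often each coauthor appears on papers that include `name`."""
    counts = Counter()
    for field in author_fields:
        if not isinstance(field, str):   # skip missing values
            continue
        authors = [a.strip() for a in field.split(' and ')]
        if name in authors:
            counts.update(a for a in authors if a != name)
    return counts

# Placeholder author fields in BibTeX "A and B and C" form.
sample = [
    'Ann Lee and Bo Chen',
    'Cy Diaz and Ann Lee and Bo Chen',
    'Dee Evans and Eli Ford',
]
coauthors = coauthor_counts(sample, 'Ann Lee')
print(coauthors.most_common())
```

In the notebook this would be called as, for example, `coauthor_counts(P['author'], 'Pan Liu')`.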

### Asad J. Khattak
- U of Tennessee



# Institutions
- Note that the institutions aren't in the database until I add them manually, so most entries are still 'None'.

## Sort Institutions by Frequency

In [15]:
field = 'institution'
P[field] = P[field].fillna('None')
# Split each comma-separated institution string and flatten into one list.
A = [entry.split(', ') for entry in P[field].tolist()]
B = [item for sublist in A for item in sublist]
C = {name: B.count(name) for name in B}
D = dict(sorted(C.items(), key=lambda item: item[1], reverse=True))
D

{'None': 308,
 'Louisiana State U': 3,
 'Tsinghua U': 2,
 'Tongji U': 2,
 'U of Central Florida': 2,
 'Southeast U': 2,
 'Nanjing': 2,
 'Queensland U of Technology': 2,
 'U of Natural Resources and Life Sciences': 2,
 'Vienna': 2,
 'North Dakota State U': 1,
 'Southwest Jiaotong U': 1,
 'Shanghai': 1,
 'Hefei U of Technology': 1,
 'Changsha U of Technology': 1,
 'Nanyang Technological U': 1,
 'Oak Ridge National Laboratory': 1,
 'Virginia Transportation Research Council': 1,
 'Northwestern U': 1,
 'Shahid Bahonar U': 1,
 'Texas A\\&M U': 1,
 'Nanyang U': 1,
 'City University of Hong Kong': 1,
 'Texas A \\& M U': 1,
 'Federal University of Rio Grade do Sul (Brazil)': 1,
 'Federal Rural University of Semi-Arid (Brazil)': 1,
 'Jiangsu U': 1,
 'Deft U': 1}
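
In the output above, `'Texas A\\&M U'` and `'Texas A \\& M U'` are counted separately because they differ only in spacing. City names such as 'Nanjing' and 'Vienna' also appear as entries, because the field is split on ', '. The following sketch addresses only the first issue, by collapsing the spacing around an ampersand before counting. The `normalize_institution` helper is a hypothetical name, and the rule is a simplification that would also remove the spaces in any other name containing an ampersand.

```python
import re
from collections import Counter

def normalize_institution(name):
    """Collapse spacing around an (optionally backslash-escaped) ampersand."""
    name = re.sub(r'\s*\\?&\s*', '&', name)
    return ' '.join(name.split())   # also squeeze repeated whitespace

# The two spellings from the output above, plus one unaffected name.
sample = ['Texas A\\&M U', 'Texas A \\& M U', 'Tongji U']
institution_counts = Counter(normalize_institution(s) for s in sample)
print(institution_counts.most_common())
```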

# Interesting Articles

In [16]:
P['annotation'] = P['annotation'].fillna('None')
Interesting = P[P['annotation'].str.contains('Interesting', case=False)]
Interesting

Unnamed: 0,abstract,keywords,author,url,doi,issn,year,pages,volume,journal,...,Time Dependencies,Temporal,Kinematic,Visualization,F1 Loss Function,Connected Vehicles,Imbalanced Data,Second Highway Research Program (Data Set),Virginia 100-car Database,NGSIM Trajectory Data
12,Accurate real-time prediction of occupant inju...,"Motor vehicle crashes, Occupant protection, In...",Qingfan Wang and Shun Gan and Wentao Chen and ...,https://www.sciencedirect.com/science/article/...,https://doi.org/10.1016/j.aap.2021.106149,0001-4575,2021,106149,156,Accident Analysis \& Prevention,...,False,False,True,True,False,False,False,False,False,False
71,Highway work zones are most vulnerable roadway...,"Traffic collision/accident severity, Deep lear...",Md Adilur Rahim and Hany M. Hassan,https://www.sciencedirect.com/science/article/...,https://doi.org/10.1016/j.aap.2021.106090,0001-4575,2021,106090,154,Accident Analysis \& Prevention,...,False,False,False,False,True,False,False,False,False,False
84,Transportation agencies utilize Active traffic...,"Active traffic management, Safety, Injury seve...",Zulqarnain H. Khattak and Michael D. Fontaine,https://www.sciencedirect.com/science/article/...,https://doi.org/10.1016/j.aap.2020.105544,0001-4575,2020,105544,145,Accident Analysis \& Prevention,...,False,True,False,False,False,False,False,False,False,False
106,"According to NHTSA, more than 3477 people (inc...","Distracted driving, Secondary tasks, Detection...",Osama A. Osman and Mustafa Hajij and Sogand Ka...,https://www.sciencedirect.com/science/article/...,https://doi.org/10.1016/j.aap.2018.12.005,0001-4575,2019,274-281,123,Accident Analysis \& Prevention,...,False,False,False,False,False,False,False,True,False,False
124,Determining and understanding the environmenta...,"Adverse weather effects, Imbalanced data, Bina...",Matthias Schlögl,https://www.sciencedirect.com/science/article/...,https://doi.org/10.1016/j.aap.2019.105398,0001-4575,2020,105398,136,Accident Analysis \& Prevention,...,False,True,False,False,False,False,True,False,False,False
142,This study designs and evaluates a contextual ...,"Drowsiness, Detection, Dynamic Bayesian Networ...",Anthony D. McDonald and John D. Lee and Chris ...,https://www.sciencedirect.com/science/article/...,https://doi.org/10.1016/j.aap.2018.01.005,0001-4575,2018,25-37,113,Accident Analysis \& Prevention,...,True,True,False,False,False,False,False,False,False,False
148,This study designs a framework of feature extr...,"Driving behaviour, Feature learning, XGBoost, ...",Xiupeng Shi and Yiik Diew Wong and Michael Zhi...,https://www.sciencedirect.com/science/article/...,https://doi.org/10.1016/j.aap.2019.05.005,0001-4575,2019,170-179,129,Accident Analysis \& Prevention,...,False,False,False,False,False,False,True,False,False,True
157,Traffic crash detection is a major component o...,"Crash detection, Deep learning methods, Long s...",Feifeng Jiang and Kwok Kit Richard Yuen and Er...,https://www.sciencedirect.com/science/article/...,https://doi.org/10.1016/j.aap.2020.105520,0001-4575,2020,105520,141,Accident Analysis \& Prevention,...,False,True,False,False,False,False,False,False,False,False
161,Safety analysts usually use post-modeling meth...,"Model Selection, Heuristics, Characteristics o...",Mohammadali Shirazi and Soma Sekhar Dhavala an...,https://www.sciencedirect.com/science/article/...,https://doi.org/10.1016/j.aap.2017.07.002,0001-4575,2017,186-194,107,Accident Analysis \& Prevention,...,False,False,False,False,False,False,False,False,False,False
188,"In the United States, there are approximately ...","Highway-Rail Grade Crossing Consolidation, Clo...",Samira Soleimani and Saleh R. Mousa and Julius...,https://www.sciencedirect.com/science/article/...,https://doi.org/10.1016/j.aap.2019.04.002,0001-4575,2019,65-77,128,Accident Analysis \& Prevention,...,False,False,False,False,False,False,False,False,False,False


# Not Machine Learning

In [17]:
A = P[P['annotation'].str.contains('Not ML', case=False)]
A

Unnamed: 0,abstract,keywords,author,url,doi,issn,year,pages,volume,journal,...,Time Dependencies,Temporal,Kinematic,Visualization,F1 Loss Function,Connected Vehicles,Imbalanced Data,Second Highway Research Program (Data Set),Virginia 100-car Database,NGSIM Trajectory Data
34,"Given the severe traffic safety issue, tremend...","Driving capability assessment, Longitudinal dr...",Rongjie Yu and Xiaojie Long and Mohammed Quddu...,https://www.sciencedirect.com/science/article/...,https://doi.org/10.1016/j.aap.2020.105779,0001-4575,2020,105779,147,Accident Analysis \& Prevention,...,False,False,False,False,False,False,False,False,False,False
57,With the development and maturation of vehicle...,"Freeway safety, Segments with horizontal curva...",Changjian Zhang and Jie He and Mark King and Z...,https://www.sciencedirect.com/science/article/...,https://doi.org/10.1016/j.aap.2020.105911,0001-4575,2021,105911,150,Accident Analysis \& Prevention,...,False,False,False,False,False,False,False,False,False,False
81,The autonomous vehicle is regarded as a promis...,"Autonomous vehicles, Driving strategy, Risk ap...",Can Zhao and Li Li and Xin Pei and Zhiheng Li ...,https://www.sciencedirect.com/science/article/...,https://doi.org/10.1016/j.aap.2020.105937,0001-4575,2021,105937,150,Accident Analysis \& Prevention,...,False,False,False,False,False,False,False,False,False,False
84,Transportation agencies utilize Active traffic...,"Active traffic management, Safety, Injury seve...",Zulqarnain H. Khattak and Michael D. Fontaine,https://www.sciencedirect.com/science/article/...,https://doi.org/10.1016/j.aap.2020.105544,0001-4575,2020,105544,145,Accident Analysis \& Prevention,...,False,True,False,False,False,False,False,False,False,False
101,As part of the emerging world of intelligent t...,"Driver behavior, Vehicle trajectories, Connect...",Zihan Hong and Ying Chen and Yang Wu,https://www.sciencedirect.com/science/article/...,https://doi.org/10.1016/j.aap.2020.105460,0001-4575,2020,105460,139,Accident Analysis \& Prevention,...,False,False,False,False,False,True,False,False,False,False
116,Traditional statistical crash prediction model...,"Bivariate extreme value theory, Video-based ve...",Chen Wang and Chengcheng Xu and Yulu Dai,https://www.sciencedirect.com/science/article/...,https://doi.org/10.1016/j.aap.2018.12.013,0001-4575,2019,365-373,123,Accident Analysis \& Prevention,...,False,False,False,False,False,False,False,False,False,False
130,Mobile phone distracted driving is a major ris...,"Cell phone, Ergonomics, Human-machine interact...",Oscar Oviedo-Trespalacios and Verity Truelove ...,https://www.sciencedirect.com/science/article/...,https://doi.org/10.1016/j.aap.2019.105412,0001-4575,2020,105412,137,Accident Analysis \& Prevention,...,False,False,False,False,False,False,False,False,False,False
170,More than one million people die or suffer non...,"Collision risk, Traffic conflicts, variable se...",Miriam Rocha and Michel Anzanello and Felipe C...,https://www.sciencedirect.com/science/article/...,https://doi.org/10.1016/j.aap.2019.105269,0001-4575,2019,105269,132,Accident Analysis \& Prevention,...,False,False,False,False,False,False,False,False,False,False
197,Measuring risk is critical for collision avoid...,"Collision avoidance, Deceleration curve, Clust...",Xiaoxia Xiong and Meng Wang and Yingfeng Cai a...,https://www.sciencedirect.com/science/article/...,https://doi.org/10.1016/j.aap.2019.05.004,0001-4575,2019,30-43,129,Accident Analysis \& Prevention,...,False,False,False,False,False,False,False,False,True,False
239,The study aims at understanding the relationsh...,"Crash severity, Multi-lane roads, Genetic algo...",Abhishek Das and Mohamed Abdel-Aty,https://www.sciencedirect.com/science/article/...,https://doi.org/10.1016/j.aap.2009.09.021,0001-4575,2010,548-557,42,Accident Analysis \& Prevention,...,False,False,False,False,False,False,False,False,False,False
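
Both listings above are wide, and entry 84 appears in each of them. The following sketch uses made-up rows to show how the two flags could be kept as boolean masks, so that a compact view and the overlap are easy to print. The column subset and the mask names `is_interesting` and `is_not_ml` are hypothetical choices.

```python
import pandas as pd

# Made-up rows standing in for P after the annotation step.
df = pd.DataFrame({
    'title': ['Paper A', 'Paper B', 'Paper C'],
    'year': ['2021', '2019', '2020'],
    'annotation': ['Interesting. Uses LSTM.', None, 'Not ML, but interesting'],
})

# na=False treats a missing annotation as "not flagged".
is_interesting = df['annotation'].str.contains('Interesting', case=False, na=False)
is_not_ml = df['annotation'].str.contains('Not ML', case=False, na=False)

# Keep only the columns needed to find the paper again.
print(df.loc[is_interesting, ['title', 'year']])
# Rows that carry both flags.
print(df.loc[is_interesting & is_not_ml, ['title', 'year']])
```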
