#### High level:
This notebook lists the inconsistencies between fields that were produced with dictionaries (those with `hebrew` in the name) and their respective numeric fields in the `involved_hebrew` table.

The analysis below is based on the data in the `2019-11-16_views_and_main_tables` folder (dated Nov 16, 2019), which can be found here: https://drive.google.com/drive/folders/1StZkyR7KG_cfPpk8xMj5es3HGkIA00C9?usp=sharing

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
sns.set()
pd.options.display.max_rows = 200
pd.options.display.max_columns = 100

In [2]:
involved_raw = pd.read_csv('../../views_and_main_tables_2019_11/involved_hebrew.csv')
i_all = involved_raw[involved_raw['accident_year'] < 2019]

In [9]:
i_all.info(show_counts=True)

<class 'pandas.core.frame.DataFrame'>
Int64Index: 1703597 entries, 0 to 1728510
Data columns (total 49 columns):
accident_id                     1703597 non-null int64
provider_and_id                 1703597 non-null int64
provider_code                   1703597 non-null int64
file_type_police                130428 non-null float64
involved_type                   1703597 non-null int64
involved_type_hebrew            1703597 non-null object
license_acquiring_date          1703597 non-null int64
age_group                       1703597 non-null int64
age_group_hebrew                1703597 non-null object
sex                             1703585 non-null float64
sex_hebrew                      1344620 non-null object
vehicle_type                    1611002 non-null float64
vehicle_type_hebrew             1611002 non-null object
safety_measures                 1703597 non-null int64
safety_measures_hebrew          1703555 non-null object
involve_yishuv_symbol           1703597 non-null int

### Helper functions

In [5]:
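def calc_diff_counts_hebrew(data, feat_name):
    # Keep only rows where both the numeric code and its Hebrew label are present.
    data = data[data[feat_name].notnull() & data[feat_name + '_hebrew'].notnull()]
    print(f'Shape of data: {data.shape}')
    # value_counts() sorts by frequency, so after resetting the index the two
    # series are compared rank by rank; all zeros means the counts line up.
    return data[feat_name].value_counts().reset_index(drop=True) - \
           data[feat_name + '_hebrew'].value_counts().reset_index(drop=True)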
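
The rank-by-rank comparison above only shows that the two frequency tables have the same shape. A stricter check, sketched here as an assumption rather than as part of the original analysis, verifies that codes and labels map one-to-one. The helper name `is_one_to_one` and the demo labels are hypothetical.

```python
import pandas as pd

def is_one_to_one(data, feat_name):
    """Return True if every code has exactly one label and every label one code."""
    heb = feat_name + '_hebrew'
    # Distinct (code, label) pairs among rows where both values are present.
    pairs = data[[feat_name, heb]].dropna().drop_duplicates()
    # A repeated code or a repeated label means the mapping is not one-to-one.
    return bool(pairs[feat_name].is_unique and pairs[heb].is_unique)

# Toy data with placeholder labels (not taken from the real table).
demo = pd.DataFrame({
    'sex': [1, 1, 2, 2, 0],
    'sex_hebrew': ['label_a', 'label_a', 'label_b', 'label_b', None],
})
print(is_one_to_one(demo, 'sex'))
```

Rows with a missing code or label are ignored here, so missing translations still need the separate null checks used in the sections below.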

In [6]:
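def merge_with_hebrew(data, feat_name):
    # Frequency table of the numeric codes.
    nums_df = data[feat_name].value_counts().reset_index()
    nums_df.columns = ['index_' + feat_name, 'count']

    # Frequency table of the Hebrew labels.
    hebrew_df = data[feat_name + '_hebrew'].value_counts().reset_index()
    hebrew_df.columns = ['index_' + feat_name + '_hebrew', 'count']

    # Outer-join on the count: a code and a label that match one-to-one share
    # the same count, so unmatched rows point to a mismatch.
    return pd.merge(nums_df, hebrew_df, how='outer', on='count')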

In [106]:
def merge_with_hebrew_print_split_years(data, feat_name):
    merged = merge_with_hebrew(data, feat_name)

    # Codes whose count matched no single Hebrew label.
    for null_heb in merged[merged['index_' + feat_name + '_hebrew'].isnull()]['index_' + feat_name]:
        print(f'{feat_name} {null_heb}:')
        # All Hebrew labels that appear together with this code.
        val_counts = data[data[feat_name] == null_heb][feat_name + '_hebrew'].value_counts()
        print(val_counts)
        print(f'Total: {val_counts.sum()}')
        # Years in which each of those labels is used.
        for type_h in val_counts.index:
            print(f"Years {type_h}: {data[data[feat_name + '_hebrew'] == type_h]['accident_year'].unique()}")
        print('')
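
The helpers above match codes to labels through their counts, which could mispair two categories that happen to have the same count. As an alternative sketch (an assumption, not the method used in this notebook), the code/label pairs can be counted directly and only the codes with more than one label kept. The name `codes_with_several_labels` and the demo values are hypothetical.

```python
import pandas as pd

def codes_with_several_labels(data, feat_name):
    """List (code, label, count) rows for codes that carry more than one label."""
    heb = feat_name + '_hebrew'
    # Only rows where both the code and the label are present.
    known = data.dropna(subset=[feat_name, heb])
    # Number of rows for every (code, label) combination.
    pairs = known.groupby([feat_name, heb]).size().reset_index(name='count')
    # Number of distinct labels attached to each code.
    n_labels = pairs.groupby(feat_name)[heb].transform('count')
    return pairs[n_labels > 1].reset_index(drop=True)

# Toy data with placeholder labels.
demo = pd.DataFrame({
    'age_group': [2, 2, 2, 3],
    'age_group_hebrew': ['label_a', 'label_a', 'label_b', 'label_c'],
})
print(codes_with_several_labels(demo, 'age_group'))
```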

## involved_type and involved_type_hebrew

In [52]:
# involved_type                   1703597 non-null int64
# involved_type_hebrew            1703597 non-null object

**Null conclusion** - no nulls

**Specific values mismatch investigations:**

In [54]:
calc_diff_counts_hebrew(i_all, 'involved_type')

Shape of data: (1703597, 49)


0    0
1    0
2    0
dtype: int64

**Specific values conclusions:** no issues

## age_group and age_group_hebrew

In [None]:
# age_group                       1703597 non-null int64
# age_group_hebrew                1703597 non-null object

**Null conclusion** - no nulls

**Specific values mismatch investigations:**

In [56]:
merge_with_hebrew(i_all, 'age_group')

| | index_age_group | count | index_age_group_hebrew |
|---|---|---|---|
| 0 | 99.0 | 363145 | לא ידוע |
| 1 | 6.0 | 173064 | 25-29 |
| 2 | 5.0 | 159854 | 20-24 |
| 3 | 7.0 | 154253 | 30-34 |
| 4 | 8.0 | 136046 | 35-39 |
| 5 | 9.0 | 116188 | 40-44 |
| 6 | 10.0 | 96823 | 45-49 |
| 7 | 11.0 | 84808 | 50-54 |
| 8 | 4.0 | 78643 | 15-19 |
| 9 | 12.0 | 76132 | 55-59 |


In [58]:
33566 + 2514, 31873 + 2304

(36080, 34177)

In [60]:
i_all[i_all['age_group_hebrew'] == '05-ספטמבר']['accident_year'].value_counts()

2009    1270
2008    1244
Name: accident_year, dtype: int64

In [61]:
i_all[i_all['age_group_hebrew'] == 'אוקטובר-14']['accident_year'].value_counts()

2009    1183
2008    1121
Name: accident_year, dtype: int64

**Specific values conclusions:** 
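- `age_group` 2 is split into two labels: `age_group_hebrew` == `05-09`, which is correct, and, during 2008-2009, `age_group_hebrew` == `05-ספטמבר`, which is a mistake
- `age_group` 3 is split into two labels: `age_group_hebrew` == `10-14`, which is correct, and, during 2008-2009, `age_group_hebrew` == `אוקטובר-14`, which is a mistake
- `ספטמבר` is Hebrew for September and `אוקטובר` for October, so the wrong labels look like the ranges `05-09` and `10-14` after being auto-converted to dates (possibly by a spreadsheet)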
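
One possible repair, sketched under the assumption that the two mislabeled values should simply be mapped back to the correct ranges, is a direct replacement on the label column. The names `AGE_GROUP_FIXES` and `fix_age_group_labels` are hypothetical.

```python
import pandas as pd

# Assumed mapping from the mislabeled 2008-2009 values to the correct ranges.
AGE_GROUP_FIXES = {
    '05-ספטמבר': '05-09',
    'אוקטובר-14': '10-14',
}

def fix_age_group_labels(data):
    """Return a copy of the frame with the date-like labels replaced."""
    fixed = data.copy()
    fixed['age_group_hebrew'] = fixed['age_group_hebrew'].replace(AGE_GROUP_FIXES)
    return fixed

# Toy frame containing both the correct and the mislabeled values.
demo = pd.DataFrame({
    'age_group': [2, 2, 3, 3],
    'age_group_hebrew': ['05-09', '05-ספטמבר', 'אוקטובר-14', '10-14'],
})
print(fix_age_group_labels(demo)['age_group_hebrew'].value_counts())
```

The function works on a copy, so the raw labels stay available for comparison.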

## sex and sex_hebrew

In [15]:
# sex                             1703585 non-null float64
# sex_hebrew                      1344620 non-null object
i_all[(i_all['sex'].isnull() == False) & (i_all['sex_hebrew'].isnull() == True)].shape

(358965, 49)

In [16]:
i_all[(i_all['sex'].isnull() == False) & (i_all['sex_hebrew'].isnull() == True)].describe()

Unnamed: 0,accident_id,provider_and_id,provider_code,file_type_police,involved_type,license_acquiring_date,age_group,sex,vehicle_type,safety_measures,involve_yishuv_symbol,injury_severity,injured_type,injured_position,population_type,home_region,home_district,home_natural_area,home_municipal_status,home_yishuv_shape,hospital_time,medical_type,release_dest,safety_measures_use,late_deceased,car_id,involve_id,accident_year,accident_month
count,358965.0,358965.0,358965.0,26822.0,358965.0,358965.0,358965.0,358965.0,351077.0,358965.0,358965.0,358965.0,358965.0,358965.0,358965.0,3180.0,358965.0,3180.0,3161.0,3180.0,22.0,22.0,22.0,22.0,42.0,358965.0,358965.0,358965.0,358965.0
mean,2012720000.0,31131850000.0,2.911913,2.891805,1.095678,39.97271,98.881924,0.0,5.643038,4.990339,39.382291,0.180341,0.188896,7.835145,1.014455,3.93239,98.487022,412.734591,12.036381,16.476101,1.681818,1.772727,1.272727,1.318182,1.02381,1.55879,2.34985,2012.679387,6.47876
std,3169954.0,4103690000.0,0.410383,0.452428,0.396055,279.662212,3.309393,0.0,6.685705,0.19376,486.357558,0.711583,0.964801,1.139931,0.205139,1.63156,5.638342,162.910011,30.163011,6.366325,0.476731,1.066004,0.7025,0.646335,0.154303,0.665372,0.931835,3.164852,3.424976
min,2008000000.0,12008000000.0,1.0,1.0,1.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,11.0,111.0,0.0,9.0,1.0,1.0,1.0,1.0,1.0,0.0,1.0,2008.0,1.0
25%,2010027000.0,32010000000.0,3.0,3.0,1.0,0.0,99.0,0.0,1.0,5.0,0.0,0.0,0.0,8.0,1.0,3.0,99.0,311.0,0.0,13.0,1.0,1.0,1.0,1.0,1.0,1.0,2.0,2010.0,4.0
50%,2012062000.0,32012040000.0,3.0,3.0,1.0,0.0,99.0,0.0,1.0,5.0,0.0,0.0,0.0,8.0,1.0,4.0,99.0,441.0,0.0,14.0,2.0,1.0,1.0,1.0,1.0,2.0,2.0,2012.0,6.0
75%,2015094000.0,32015080000.0,3.0,3.0,1.0,0.0,99.0,0.0,11.0,5.0,0.0,0.0,0.0,8.0,1.0,5.0,99.0,511.0,0.0,16.0,2.0,2.0,1.0,1.0,1.0,2.0,3.0,2015.0,9.0
max,2018100000.0,32018100000.0,3.0,3.0,3.0,2017.0,99.0,0.0,25.0,5.0,9800.0,3.0,9.0,9.0,4.0,7.0,99.0,770.0,99.0,51.0,2.0,4.0,3.0,3.0,2.0,20.0,58.0,2018.0,12.0


In [14]:
i_all['sex'].value_counts()

1.0    829823
2.0    514797
0.0    358965
Name: sex, dtype: int64

**Null conclusion:** every row with `sex` == 0 (358,965 rows, spread over 2008-2018) is missing its `sex_hebrew` value
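
The same null check recurs for most fields in this notebook. A small helper, sketched here as an assumption (the name `untranslated_codes` is hypothetical), reports directly which codes lack a Hebrew label and how many rows each one covers.

```python
import pandas as pd

def untranslated_codes(data, feat_name):
    """Count rows per code where the code is present but the label is null."""
    heb = feat_name + '_hebrew'
    mask = data[feat_name].notnull() & data[heb].isnull()
    return data.loc[mask, feat_name].value_counts()

# Toy data with placeholder labels: code 0 never gets a label.
demo = pd.DataFrame({
    'sex': [1, 2, 0, 0, None],
    'sex_hebrew': ['label_a', 'label_b', None, None, None],
})
print(untranslated_codes(demo, 'sex'))
```

On the real table this would be called as `untranslated_codes(i_all, 'sex')`, and likewise for the other fields.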

**Specific values mismatch investigations:**

In [50]:
calc_diff_counts_hebrew(i_all, 'sex')

Shape of data: (1344620, 49)


0    0
1    0
dtype: int64

**Specific values conclusions:** no issues

## vehicle_type and vehicle_type_hebrew

In [None]:
# vehicle_type                    1611002 non-null float64
# vehicle_type_hebrew             1611002 non-null object

**Null conclusion** - both columns have the same number of non-null values (1,611,002), so no translation appears to be missing

**Specific values mismatch investigations:**

In [63]:
merge_with_hebrew(i_all, 'vehicle_type')

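| | index_vehicle_type | count | index_vehicle_type_hebrew |
|---|---|---|---|
| 0 | 1.0 | 1084888 | רכב נוסעים פרטי |
| 1 | 17.0 | 95099 | אחר ולא ידוע |
| 2 | 2.0 | 79623 | NaN |
| 3 | 11.0 | 76383 | אוטובוס |
| 4 | 9.0 | 47586 | NaN |
| 5 | 12.0 | 45268 | מונית |
| 6 | 3.0 | 43267 | NaN |
| 7 | 10.0 | 33010 | NaN |
| 8 | 15.0 | 18915 | אופניים |
| 9 | 4.0 | 18743 | NaN |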


In [107]:
merge_with_hebrew_print_split_years(i_all, 'vehicle_type')

vehicle_type 2.0:
משא עד 3.5 טון - אחוד (טרנזיט)    76288
משא עד 4 טון - אחוד (טרנזיט)       3335
Name: vehicle_type_hebrew, dtype: int64
Total: 79623
Years משא עד 3.5 טון - אחוד (טרנזיט): [2011 2009 2010 2012 2013 2015 2014 2016 2017 2018 2008]
Years משא עד 4 טון - אחוד (טרנזיט): [2008]

vehicle_type 9.0:
אופנוע 51 עד 125 סמ"ק    45255
אופנוע 51 עד 250 סמ"ק     2331
Name: vehicle_type_hebrew, dtype: int64
Total: 47586
Years אופנוע 51 עד 125 סמ"ק: [2009 2010 2011 2012 2013 2014 2015 2016 2017 2018 2008]
Years אופנוע 51 עד 250 סמ"ק: [2008]

vehicle_type 3.0:
משא עד 3.5  טון - לא אחוד (טנדר)    41406
משא עד 4 טון - לא אחוד (טנדר)        1861
Name: vehicle_type_hebrew, dtype: int64
Total: 43267
Years משא עד 3.5  טון - לא אחוד (טנדר): [2011 2012 2009 2010 2013 2015 2016 2014 2017 2018 2008]
Years משא עד 4 טון - לא אחוד (טנדר): [2008]

vehicle_type 10.0:
אופנוע 126 עד 400 סמ"ק    32629
אופנוע 251 עד 500 סמ"ק      381
Name: vehicle_type_hebrew, dtype: int64
Total: 33010
Years אופנוע 126 עד 4

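**Specific values conclusions:** see the output above. In 2008 some categories carried a different label (for example, `vehicle_type` 2 appears both as "up to 4 tons" and as "up to 3.5 tons"). The alternative labels occur only in 2008, alongside the regular ones, so the change seems to have happened in the middle of 2008.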
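
To see when within 2008 the labels changed, the labels of one code can be cross-tabulated against `accident_month`. This is a sketch under the assumption that `accident_month` is reliable for these rows; the helper name `labels_by_month` and the demo labels are hypothetical.

```python
import pandas as pd

def labels_by_month(data, feat_name, code, year):
    """Cross-tabulate the Hebrew labels of one code against the month of a given year."""
    rows = data[(data[feat_name] == code) & (data['accident_year'] == year)]
    return pd.crosstab(rows[feat_name + '_hebrew'], rows['accident_month'])

# Toy data: placeholder labels 'old' and 'new' switch between month 2 and month 7.
demo = pd.DataFrame({
    'vehicle_type': [2, 2, 2, 2, 9],
    'vehicle_type_hebrew': ['old', 'old', 'new', 'new', 'other'],
    'accident_year': [2008, 2008, 2008, 2009, 2008],
    'accident_month': [1, 2, 7, 7, 3],
})
print(labels_by_month(demo, 'vehicle_type', 2, 2008))
```

On the real table, `labels_by_month(i_all, 'vehicle_type', 2.0, 2008)` would show whether the two labels overlap in time or switch cleanly at some month.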

## safety_measures and safety_measures_hebrew

In [19]:
# safety_measures                 1703597 non-null int64
# safety_measures_hebrew          1703555 non-null object
i_all[(i_all['safety_measures'].isnull() == False) & (i_all['safety_measures_hebrew'].isnull() == True)].shape

(42, 49)

In [20]:
i_all[(i_all['safety_measures'].isnull() == False) & (i_all['safety_measures_hebrew'].isnull() == True)].describe()

Unnamed: 0,accident_id,provider_and_id,provider_code,file_type_police,involved_type,license_acquiring_date,age_group,sex,vehicle_type,safety_measures,involve_yishuv_symbol,injury_severity,injured_type,injured_position,population_type,home_region,home_district,home_natural_area,home_municipal_status,home_yishuv_shape,hospital_time,medical_type,release_dest,safety_measures_use,late_deceased,car_id,involve_id,accident_year,accident_month
count,42.0,42.0,42.0,5.0,42.0,42.0,42.0,42.0,42.0,42.0,42.0,42.0,42.0,42.0,42.0,0.0,42.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,42.0,42.0,42.0,42.0
mean,2016458000.0,29159310000.0,2.714286,2.6,2.047619,0.0,99.0,0.02381,15.0,0.0,0.0,2.97619,7.095238,1.333333,1.0,,99.0,,,,,,,,,1.857143,2.190476,2016.404762,7.619048
std,967526.5,7083342000.0,0.708338,0.894427,0.21554,0.0,0.0,0.154303,5.391954,0.0,0.0,0.154303,2.093072,1.508782,0.0,,0.0,,,,,,,,,0.472225,0.833391,0.964227,3.526571
min,2015013000.0,12015020000.0,1.0,1.0,2.0,0.0,99.0,0.0,1.0,0.0,0.0,2.0,2.0,1.0,1.0,,99.0,,,,,,,,,1.0,1.0,2015.0,1.0
25%,2016026000.0,32015050000.0,3.0,3.0,2.0,0.0,99.0,0.0,17.0,0.0,0.0,3.0,8.0,1.0,1.0,,99.0,,,,,,,,,2.0,2.0,2016.0,5.25
50%,2016555000.0,32016080000.0,3.0,3.0,2.0,0.0,99.0,0.0,17.0,0.0,0.0,3.0,8.0,1.0,1.0,,99.0,,,,,,,,,2.0,2.0,2016.5,8.0
75%,2017064000.0,32017050000.0,3.0,3.0,2.0,0.0,99.0,0.0,17.0,0.0,0.0,3.0,8.0,1.0,1.0,,99.0,,,,,,,,,2.0,2.75,2017.0,11.0
max,2018096000.0,32018100000.0,3.0,3.0,3.0,0.0,99.0,1.0,23.0,0.0,0.0,3.0,9.0,8.0,1.0,,99.0,,,,,,,,,3.0,4.0,2018.0,12.0


**Null conclusion:** all rows with `safety_measures` == 0 are missing the translation; these are the 42 rows above, from 2015-2018.

**Specific values mismatch investigations:**

In [75]:
calc_diff_counts_hebrew(i_all, 'safety_measures')

Shape of data: (1703555, 49)


0    0
1    0
2    0
3    0
4    0
dtype: int64

**Specific values conclusions:** no issues

## involve_yishuv_symbol and involve_yishuv_name (not a `hebrew` field)

In [76]:
# involve_yishuv_symbol           1703597 non-null int64
# involve_yishuv_name             1326561 non-null object

In [77]:
i_all[(i_all['involve_yishuv_symbol'].isnull() == False) & (i_all['involve_yishuv_name'].isnull() == True)].shape

(377036, 49)

In [78]:
i_all[(i_all['involve_yishuv_symbol'].isnull() == False) & (i_all['involve_yishuv_name'].isnull() == True)].describe()

Unnamed: 0,accident_id,provider_and_id,provider_code,file_type_police,involved_type,license_acquiring_date,age_group,sex,vehicle_type,safety_measures,involve_yishuv_symbol,injury_severity,injured_type,injured_position,population_type,home_region,home_district,home_natural_area,home_municipal_status,home_yishuv_shape,hospital_time,medical_type,release_dest,safety_measures_use,late_deceased,car_id,involve_id,accident_year,accident_month
count,377036.0,377036.0,377036.0,29043.0,377036.0,377036.0,377036.0,377036.0,368395.0,377036.0,377036.0,377036.0,377036.0,377036.0,377036.0,1988.0,377036.0,1988.0,58.0,1988.0,802.0,802.0,802.0,936.0,318.0,377036.0,377036.0,377036.0,377036.0
mean,2012770000.0,30689130000.0,2.867636,2.823228,1.143716,88.887631,94.219605,0.071144,5.618021,4.93177,17.872315,0.27452,0.291256,7.672278,1.059151,6.946177,98.866901,737.630282,24.87931,37.406942,1.735661,2.133416,1.389027,1.481838,1.003145,1.56397,2.376699,2012.728697,6.492648
std,3171548.0,4971799000.0,0.497201,0.567721,0.474313,411.946684,20.395185,0.309429,6.667865,0.510528,248.357578,0.861604,1.151732,1.529906,0.394227,0.335188,1.855691,43.585082,3.797865,10.888782,0.441256,1.485983,0.892907,0.795503,0.056077,0.670726,1.023745,3.166381,3.423356
min,2008000000.0,12008000000.0,1.0,1.0,1.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,5.0,51.0,512.0,24.0,9.0,1.0,1.0,1.0,1.0,1.0,0.0,1.0,2008.0,1.0
25%,2010031000.0,32009070000.0,3.0,3.0,1.0,0.0,99.0,0.0,1.0,5.0,0.0,0.0,0.0,8.0,1.0,7.0,99.0,730.0,24.0,29.0,1.0,1.0,1.0,1.0,1.0,1.0,2.0,2010.0,4.0
50%,2013005000.0,32012040000.0,3.0,3.0,1.0,0.0,99.0,0.0,1.0,5.0,0.0,0.0,0.0,8.0,1.0,7.0,99.0,740.0,24.0,29.0,2.0,2.0,1.0,1.0,1.0,2.0,2.0,2013.0,6.0
75%,2015100000.0,32015070000.0,3.0,3.0,1.0,0.0,99.0,0.0,11.0,5.0,0.0,0.0,0.0,8.0,1.0,7.0,99.0,770.0,24.0,49.0,2.0,3.0,1.0,2.0,1.0,2.0,3.0,2015.0,9.0
max,2018100000.0,32018100000.0,3.0,3.0,3.0,2018.0,99.0,2.0,25.0,5.0,5800.0,3.0,9.0,9.0,4.0,8.0,99.0,870.0,41.0,59.0,2.0,9.0,5.0,3.0,2.0,20.0,58.0,2018.0,12.0


In [84]:
i_all[(i_all['involve_yishuv_symbol'].isnull() == False) & (i_all['involve_yishuv_name'].isnull() == True)]['involve_yishuv_symbol'].nunique()

297

In [83]:
i_all[(i_all['involve_yishuv_symbol'].isnull() == False) & (i_all['involve_yishuv_name'].isnull() == True)]['involve_yishuv_symbol'].value_counts()

0       375048
3400       225
3700        80
3200        76
3900        70
3800        52
3441        50
3600        37
3677        35
1049        34
3740        26
3415        26
3005        25
3100        25
3500        24
3690        24
3186        23
3181        23
3631        20
3063        18
3305        15
3673        14
3098        14
3803        14
3919        14
3044        14
3060        13
3067        13
3447        13
3300        12
3080        12
3626        12
3073        12
883         12
3921        11
3302        11
3296        11
3522        11
3448        11
3292        11
3336        10
3086        10
3304        10
3918        10
3672        10
3309        10
3621        10
3291         9
3697         9
3134         9
3805         9
3771         9
3843         9
3442         9
3551         9
3189         9
3041         9
3735         9
3691         9
3534         8
3541         8
3732         8
3634         8
3301         8
3666         8
3412         8
3496      

**Null conclusion**: 297 distinct values of `involve_yishuv_symbol` have no `involve_yishuv_name`. Value 0 accounts for 375,048 of the 377,036 affected rows (about 99.5%); the other 296 values cover the remaining 1,988 rows.

**Specific values conclusions:** not done, because there are too many null-value issues

## injury_severity and injury_severity_hebrew

In [21]:
# injury_severity                 1703597 non-null int64
# injury_severity_hebrew          1064910 non-null object
i_all[(i_all['injury_severity'].isnull() == False) & (i_all['injury_severity_hebrew'].isnull() == True)].shape

(638687, 49)

In [22]:
i_all[(i_all['injury_severity'].isnull() == False) & (i_all['injury_severity_hebrew'].isnull() == True)].describe()

Unnamed: 0,accident_id,provider_and_id,provider_code,file_type_police,involved_type,license_acquiring_date,age_group,sex,vehicle_type,safety_measures,involve_yishuv_symbol,injury_severity,injured_type,injured_position,population_type,home_region,home_district,home_natural_area,home_municipal_status,home_yishuv_shape,hospital_time,medical_type,release_dest,safety_measures_use,late_deceased,car_id,involve_id,accident_year,accident_month
count,638687.0,638687.0,638687.0,50646.0,638687.0,638687.0,638687.0,638682.0,638687.0,638687.0,638687.0,638687.0,638687.0,638687.0,638687.0,297053.0,638687.0,297053.0,295346.0,297053.0,489.0,489.0,489.0,490.0,0.0,638687.0,638687.0,638687.0,638687.0
mean,2012783000.0,28143810000.0,2.613103,2.604115,1.0,927.320598,56.977097,0.578552,4.557132,4.769609,2070.354336,0.0,0.0,7.999987,1.12947,3.90361,72.056326,412.836948,17.232334,17.243792,1.601227,1.511247,1.167689,1.42449,,1.524219,2.032788,2012.741941,6.431329
std,3193059.0,7900030000.0,0.790003,0.796905,0.0,994.52039,44.920332,0.676016,5.865761,0.927011,3049.492682,0.0,0.0,0.01001,0.397525,1.657728,31.022415,165.565809,35.634647,6.759922,0.490147,1.061807,0.604372,0.788084,,0.652318,0.992106,3.188179,3.419179
min,2008000000.0,12008000000.0,1.0,1.0,1.0,0.0,1.0,0.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,11.0,111.0,0.0,9.0,1.0,1.0,1.0,1.0,,1.0,1.0,2008.0,1.0
25%,2010030000.0,32008050000.0,3.0,3.0,1.0,0.0,9.0,0.0,1.0,5.0,0.0,0.0,0.0,8.0,1.0,3.0,44.0,311.0,0.0,13.0,1.0,1.0,1.0,1.0,,1.0,1.0,2010.0,3.0
50%,2013008000.0,32011050000.0,3.0,3.0,1.0,0.0,99.0,0.0,1.0,5.0,0.0,0.0,0.0,8.0,1.0,4.0,99.0,431.0,0.0,15.0,2.0,1.0,1.0,1.0,,1.0,2.0,2013.0,6.0
75%,2016003000.0,32015030000.0,3.0,3.0,1.0,1994.0,99.0,1.0,6.0,5.0,3616.0,0.0,0.0,8.0,1.0,5.0,99.0,513.0,0.0,17.0,2.0,2.0,1.0,1.0,,2.0,2.0,2016.0,9.0
max,2018100000.0,32018100000.0,3.0,3.0,1.0,4444.0,99.0,2.0,25.0,5.0,9800.0,0.0,0.0,8.0,4.0,8.0,99.0,999.0,99.0,59.0,2.0,9.0,5.0,3.0,,28.0,77.0,2018.0,12.0


**Null conclusion:** all 638,687 rows without a translation have `injury_severity` == 0; they are spread over 2008-2018

**Specific values mismatch investigations:**

In [86]:
calc_diff_counts_hebrew(i_all, 'injury_severity')

Shape of data: (1064910, 49)


0    0
1    0
2    0
dtype: int64

**Specific values Conclusion:** no issues

## injured_type and injured_type_hebrew

In [23]:
# injured_type                    1703597 non-null int64
# injured_type_hebrew             1064908 non-null object
i_all[(i_all['injured_type'].isnull() == False) & (i_all['injured_type_hebrew'].isnull() == True)].shape

(638689, 49)

In [24]:
i_all[(i_all['injured_type'].isnull() == False) & (i_all['injured_type_hebrew'].isnull() == True)].describe()

Unnamed: 0,accident_id,provider_and_id,provider_code,file_type_police,involved_type,license_acquiring_date,age_group,sex,vehicle_type,safety_measures,involve_yishuv_symbol,injury_severity,injured_type,injured_position,population_type,home_region,home_district,home_natural_area,home_municipal_status,home_yishuv_shape,hospital_time,medical_type,release_dest,safety_measures_use,late_deceased,car_id,involve_id,accident_year,accident_month
count,638689.0,638689.0,638689.0,50646.0,638689.0,638689.0,638689.0,638684.0,638689.0,638689.0,638689.0,638689.0,638689.0,638689.0,638689.0,297055.0,638689.0,297055.0,295348.0,297055.0,489.0,489.0,489.0,490.0,0.0,638689.0,638689.0,638689.0,638689.0
mean,2012783000.0,28143830000.0,2.613104,2.604115,1.000005,927.32082,56.97694,0.578551,4.557121,4.769609,2070.36445,9e-06,0.0,7.999987,1.129467,3.90361,72.056234,412.83704,17.232218,17.24378,1.601227,1.511247,1.167689,1.42449,,1.524218,2.032786,2012.741929,6.431337
std,3193061.0,7900021000.0,0.790003,0.796905,0.002798,994.520409,44.920349,0.676016,5.865755,0.92701,3049.501894,0.005309,0.0,0.01001,0.397529,1.657722,31.022411,165.565256,35.634555,6.759901,0.490147,1.061807,0.604372,0.788084,,0.652318,0.992105,3.188181,3.419177
min,2008000000.0,12008000000.0,1.0,1.0,1.0,0.0,1.0,0.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,11.0,111.0,0.0,9.0,1.0,1.0,1.0,1.0,,1.0,1.0,2008.0,1.0
25%,2010030000.0,32008050000.0,3.0,3.0,1.0,0.0,9.0,0.0,1.0,5.0,0.0,0.0,0.0,8.0,1.0,3.0,44.0,311.0,0.0,13.0,1.0,1.0,1.0,1.0,,1.0,1.0,2010.0,3.0
50%,2013008000.0,32011050000.0,3.0,3.0,1.0,0.0,99.0,0.0,1.0,5.0,0.0,0.0,0.0,8.0,1.0,4.0,99.0,431.0,0.0,15.0,2.0,1.0,1.0,1.0,,1.0,2.0,2013.0,6.0
75%,2016003000.0,32015030000.0,3.0,3.0,1.0,1994.0,99.0,1.0,6.0,5.0,3616.0,0.0,0.0,8.0,1.0,5.0,99.0,513.0,0.0,17.0,2.0,2.0,1.0,1.0,,2.0,2.0,2016.0,9.0
max,2018100000.0,32018100000.0,3.0,3.0,3.0,4444.0,99.0,2.0,25.0,5.0,9800.0,3.0,0.0,8.0,4.0,8.0,99.0,999.0,99.0,59.0,2.0,9.0,5.0,3.0,,28.0,77.0,2018.0,12.0


In [25]:
i_all[i_all['injured_type'] == 0].shape

(638689, 49)

**Null conclusion:** all 638,689 rows with `injured_type` == 0 have no Hebrew translation; they are spread over 2008-2018

**Specific values mismatch investigations:**

In [87]:
calc_diff_counts_hebrew(i_all, 'injured_type')

Shape of data: (1064908, 49)


0    0
1    0
2    0
3    0
4    0
5    0
6    0
7    0
8    0
dtype: int64

**Specific values conclusions:** no issues

## injured_position and injured_position_hebrew

In [26]:
# injured_position                1703597 non-null int64
# injured_position_hebrew         1703596 non-null object

In [28]:
i_all[(i_all['injured_position'].isnull() == False) & (i_all['injured_position_hebrew'].isnull() == True)]

Unnamed: 0,accident_id,provider_and_id,provider_code,file_type_police,involved_type,involved_type_hebrew,license_acquiring_date,age_group,age_group_hebrew,sex,sex_hebrew,vehicle_type,vehicle_type_hebrew,safety_measures,safety_measures_hebrew,involve_yishuv_symbol,involve_yishuv_name,injury_severity,injury_severity_hebrew,injured_type,injured_type_hebrew,injured_position,injured_position_hebrew,population_type,population_type_hebrew,home_region,home_region_hebrew,home_district,home_district_hebrew,home_natural_area,home_natural_area_hebrew,home_municipal_status,home_municipal_status_hebrew,home_yishuv_shape,home_yishuv_shape_hebrew,hospital_time,hospital_time_hebrew,medical_type,medical_type_hebrew,release_dest,release_dest_hebrew,safety_measures_use,safety_measures_use_hebrew,late_deceased,late_deceased_hebrew,car_id,involve_id,accident_year,accident_month
1211284,2014008632,32014008632,3,,1,נהג,0,99,לא ידוע,0.0,,1.0,רכב נוסעים פרטי,5,לא ידוע,0,,0,,0,,0,,1,יהודים,,,99,,,,,,,,,,,,,,,,,,2,2,2014,1


In [29]:
i_all[i_all['injured_position'] == 0].shape

(1, 49)

**Null conclusion:** the single row with `injured_position` == 0 (from 2014) has no Hebrew translation

**Specific values mismatch investigations:**

In [88]:
calc_diff_counts_hebrew(i_all, 'injured_position')

Shape of data: (1703596, 49)


0    0
1    0
2    0
3    0
4    0
5    0
6    0
7    0
8    0
dtype: int64

**Specific values conclusions:** no issues

## population_type and population_type_hebrew

In [89]:
# population_type                 1703597 non-null int64
# population_type_hebrew          1703597 non-null object

**Null conclusion:** no issues

**Specific values investigation:**

In [91]:
calc_diff_counts_hebrew(i_all, 'population_type')

Shape of data: (1703597, 49)


0    0
1    0
2    0
3    0
4    0
dtype: int64

**Specific values conclusion:** no issues

## home_region and home_region_hebrew

In [92]:
# home_region                     1328543 non-null float64
# home_region_hebrew              1328543 non-null object

**Nulls conclusion:** no issues

**Specific values investigation:**

In [94]:
calc_diff_counts_hebrew(i_all, 'home_region')

Shape of data: (1328543, 49)


0    0
1    0
2    0
3    0
4    0
5    0
6    0
7    0
dtype: int64

**Specific values conclusions:** no issues

## home_district and home_district_hebrew

In [30]:
# home_district                   1703597 non-null int64
# home_district_hebrew            1328543 non-null object

In [31]:
i_all[(i_all['home_district'].isnull() == False) & (i_all['home_district_hebrew'].isnull() == True)].shape

(375054, 49)

In [32]:
i_all[(i_all['home_district'].isnull() == False) & (i_all['home_district_hebrew'].isnull() == True)].describe()

Unnamed: 0,accident_id,provider_and_id,provider_code,file_type_police,involved_type,license_acquiring_date,age_group,sex,vehicle_type,safety_measures,involve_yishuv_symbol,injury_severity,injured_type,injured_position,population_type,home_region,home_district,home_natural_area,home_municipal_status,home_yishuv_shape,hospital_time,medical_type,release_dest,safety_measures_use,late_deceased,car_id,involve_id,accident_year,accident_month
count,375054.0,375054.0,375054.0,29032.0,375054.0,375054.0,375054.0,375054.0,366646.0,375054.0,375054.0,375054.0,375054.0,375054.0,375054.0,0.0,375054.0,0.0,0.0,0.0,691.0,691.0,691.0,825.0,278.0,375054.0,375054.0,375054.0,375054.0
mean,2012787000.0,30749130000.0,2.873634,2.823643,1.137434,84.760704,94.671151,0.06542,5.62155,4.937932,0.021706,0.263997,0.282394,7.692138,1.054368,,99.0,,,,1.716353,2.075253,1.347323,1.477576,1.003597,1.56353,2.370267,2012.746082,6.492476
std,3168515.0,4865585000.0,0.486584,0.567119,0.463895,402.706369,19.460125,0.298709,6.672784,0.487813,5.427176,0.846832,1.143942,1.489261,0.388537,,0.0,,,,0.451094,1.508472,0.846684,0.798843,0.059976,0.669543,0.993916,3.163323,3.423042
min,2008000000.0,12008000000.0,1.0,1.0,1.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,,99.0,,,,1.0,1.0,1.0,1.0,1.0,0.0,1.0,2008.0,1.0
25%,2010033000.0,32009070000.0,3.0,3.0,1.0,0.0,99.0,0.0,1.0,5.0,0.0,0.0,0.0,8.0,1.0,,99.0,,,,1.0,1.0,1.0,1.0,1.0,1.0,2.0,2010.0,4.0
50%,2013006000.0,32012040000.0,3.0,3.0,1.0,0.0,99.0,0.0,1.0,5.0,0.0,0.0,0.0,8.0,1.0,,99.0,,,,2.0,2.0,1.0,1.0,1.0,2.0,2.0,2013.0,6.0
75%,2016001000.0,32015070000.0,3.0,3.0,1.0,0.0,99.0,0.0,11.0,5.0,0.0,0.0,0.0,8.0,1.0,,99.0,,,,2.0,3.0,1.0,2.0,1.0,2.0,3.0,2016.0,9.0
max,2018100000.0,32018100000.0,3.0,3.0,3.0,2018.0,99.0,2.0,25.0,5.0,1370.0,3.0,9.0,9.0,4.0,,99.0,,,,2.0,9.0,5.0,3.0,2.0,20.0,58.0,2018.0,12.0


In [33]:
i_all[i_all['home_district'] == 99].shape

(375054, 49)

**Null conclusion**: all 375,054 rows with `home_district` == 99 are missing from the dictionary; they are spread over 2008-2018

**Specific values investigation:**

In [95]:
calc_diff_counts_hebrew(i_all, 'home_district')

Shape of data: (1328543, 49)


0     0
1     0
2     0
3     0
4     0
5     0
6     0
7     0
8     0
9     0
10    0
11    0
12    0
13    0
14    0
15    0
16    0
17    0
18    0
19    0
20    0
21    0
22    0
23    0
24    0
25    0
dtype: int64

**Specific values conclusions:** no issues

## home_natural_area and home_natural_area_hebrew

In [None]:
# home_natural_area               1328543 non-null float64
# home_natural_area_hebrew        1321189 non-null object

In [35]:
i_all[(i_all['home_natural_area'].isnull() == False) & (i_all['home_natural_area_hebrew'].isnull() == True)].shape

(7354, 49)

In [36]:
i_all[(i_all['home_natural_area'].isnull() == False) & (i_all['home_natural_area_hebrew'].isnull() == True)].describe()

Unnamed: 0,accident_id,provider_and_id,provider_code,file_type_police,involved_type,license_acquiring_date,age_group,sex,vehicle_type,safety_measures,involve_yishuv_symbol,injury_severity,injured_type,injured_position,population_type,home_region,home_district,home_natural_area,home_municipal_status,home_yishuv_shape,hospital_time,medical_type,release_dest,safety_measures_use,late_deceased,car_id,involve_id,accident_year,accident_month
count,7354.0,7354.0,7354.0,35.0,7354.0,7354.0,7354.0,7354.0,6916.0,7354.0,7354.0,7354.0,7354.0,7354.0,7354.0,7354.0,7354.0,7354.0,7354.0,7354.0,341.0,341.0,341.0,345.0,11.0,7354.0,7354.0,7354.0,7354.0
mean,2013555000.0,27928700000.0,2.591515,2.485714,2.006119,1418.88904,8.239597,1.382785,3.116397,4.382921,5744.050041,2.209274,1.824449,4.468589,1.254555,4.031819,43.328801,438.01387,21.014686,17.482459,1.542522,1.548387,1.208211,1.394203,1.0,1.369595,1.798069,2013.518085,6.391896
std,638881.8,8063487000.0,0.806349,0.886879,0.724982,905.834675,6.287502,0.492216,4.479039,1.434821,3054.206536,1.314169,1.451907,3.549369,0.504228,0.307341,3.175859,58.08749,36.379261,5.335554,0.498921,0.847924,0.677836,0.625031,0.0,0.688093,1.08944,0.632767,3.437669
min,2013000000.0,12013000000.0,1.0,1.0,1.0,0.0,1.0,0.0,1.0,1.0,64.0,0.0,0.0,1.0,1.0,4.0,43.0,432.0,0.0,15.0,1.0,1.0,1.0,1.0,1.0,0.0,1.0,2013.0,1.0
25%,2013031000.0,32013010000.0,3.0,2.0,1.0,0.0,5.0,1.0,1.0,5.0,2530.0,0.0,0.0,1.0,1.0,4.0,43.0,432.0,0.0,15.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,2013.0,3.0
50%,2013059000.0,32013050000.0,3.0,3.0,2.0,1991.0,8.0,1.0,1.0,5.0,7000.0,3.0,2.0,2.0,1.0,4.0,43.0,432.0,0.0,15.0,2.0,1.0,1.0,1.0,1.0,1.0,2.0,2013.0,6.0
75%,2014039000.0,32014020000.0,3.0,3.0,3.0,2003.0,10.0,2.0,2.0,5.0,8500.0,3.0,3.0,8.0,1.0,4.0,43.0,432.0,30.0,17.0,2.0,2.0,1.0,2.0,1.0,2.0,2.0,2014.0,9.0
max,2018098000.0,32018100000.0,3.0,3.0,3.0,2017.0,99.0,2.0,19.0,5.0,8500.0,3.0,9.0,9.0,4.0,7.0,74.0,999.0,99.0,35.0,2.0,4.0,5.0,3.0,1.0,6.0,16.0,2018.0,12.0


In [37]:
i_all[(i_all['home_natural_area'] == 432) | (i_all['home_natural_area'] == 999)].shape

(21894, 49)

In [41]:
i_all[i_all['home_natural_area'] == 432].shape

(21816, 49)

In [39]:
i_all[(i_all['home_natural_area'] == 432) & (i_all['home_natural_area_hebrew'].isnull())].shape

(7276, 49)

In [40]:
i_all[(i_all['home_natural_area'] == 432) & (i_all['home_natural_area_hebrew'].isnull())].describe()

Unnamed: 0,accident_id,provider_and_id,provider_code,file_type_police,involved_type,license_acquiring_date,age_group,sex,vehicle_type,safety_measures,involve_yishuv_symbol,injury_severity,injured_type,injured_position,population_type,home_region,home_district,home_natural_area,home_municipal_status,home_yishuv_shape,hospital_time,medical_type,release_dest,safety_measures_use,late_deceased,car_id,involve_id,accident_year,accident_month
count,7276.0,7276.0,7276.0,0.0,7276.0,7276.0,7276.0,7276.0,6843.0,7276.0,7276.0,7276.0,7276.0,7276.0,7276.0,7276.0,7276.0,7276.0,7276.0,7276.0,338.0,338.0,338.0,338.0,11.0,7276.0,7276.0,7276.0,7276.0
mean,2013515000.0,27939850000.0,2.592633,,2.005498,1420.651594,8.243953,1.382903,3.105655,4.388813,5764.644035,2.21028,1.82298,4.458906,1.257009,4.0,43.0,432.0,20.457394,17.46619,1.538462,1.54142,1.204142,1.39645,1.0,1.370533,1.799615,2013.478422,6.385514
std,505885.5,8055281000.0,0.805528,,0.724228,905.016013,6.310804,0.49231,4.465081,1.429731,3064.016859,1.313858,1.450204,3.549023,0.505828,0.0,0.0,0.0,36.171185,5.361754,0.499258,0.840235,0.673611,0.623182,0.0,0.688945,1.09179,0.499569,3.439884
min,2013000000.0,12013000000.0,1.0,,1.0,0.0,1.0,0.0,1.0,1.0,64.0,0.0,0.0,1.0,1.0,4.0,43.0,432.0,0.0,15.0,1.0,1.0,1.0,1.0,1.0,0.0,1.0,2013.0,1.0
25%,2013031000.0,32013010000.0,3.0,,1.0,0.0,5.0,1.0,1.0,5.0,2530.0,0.0,0.0,1.0,1.0,4.0,43.0,432.0,0.0,15.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,2013.0,3.0
50%,2013059000.0,32013050000.0,3.0,,2.0,1991.0,8.0,1.0,1.0,5.0,7000.0,3.0,2.0,2.0,1.0,4.0,43.0,432.0,0.0,15.0,2.0,1.0,1.0,1.0,1.0,1.0,2.0,2013.0,6.0
75%,2014037000.0,32014020000.0,3.0,,3.0,2003.0,10.0,2.0,2.0,5.0,8500.0,3.0,3.0,8.0,1.0,4.0,43.0,432.0,30.0,17.0,2.0,2.0,1.0,2.0,1.0,2.0,2.0,2014.0,9.0
max,2014100000.0,32014100000.0,3.0,,3.0,2014.0,99.0,2.0,19.0,5.0,8500.0,3.0,9.0,9.0,4.0,4.0,43.0,432.0,99.0,35.0,2.0,4.0,5.0,3.0,1.0,6.0,16.0,2014.0,12.0


In [42]:
i_all[(i_all['home_natural_area'] == 432) & ((i_all['accident_year'] == 2013) | (i_all['accident_year'] == 2014))].shape

(7276, 49)

In [43]:
i_all[i_all['home_natural_area'] == 999].shape

(78, 49)

In [44]:
i_all[(i_all['home_natural_area'] == 999) & (i_all['home_natural_area_hebrew'].isnull())].shape

(78, 49)

In [45]:
i_all[i_all['home_natural_area'] == 999].describe()

Unnamed: 0,accident_id,provider_and_id,provider_code,file_type_police,involved_type,license_acquiring_date,age_group,sex,vehicle_type,safety_measures,involve_yishuv_symbol,injury_severity,injured_type,injured_position,population_type,home_region,home_district,home_natural_area,home_municipal_status,home_yishuv_shape,hospital_time,medical_type,release_dest,safety_measures_use,late_deceased,car_id,involve_id,accident_year,accident_month
count,78.0,78.0,78.0,35.0,78.0,78.0,78.0,78.0,73.0,78.0,78.0,78.0,78.0,78.0,78.0,78.0,78.0,78.0,78.0,78.0,3.0,3.0,3.0,7.0,0.0,78.0,78.0,78.0,78.0
mean,2017274000.0,26889070000.0,2.487179,2.485714,2.064103,1254.474359,7.833333,1.371795,4.123288,3.833333,3823.0,2.115385,1.961538,5.371795,1.025641,7.0,74.0,999.0,73.0,19.0,2.0,2.333333,1.666667,1.285714,,1.282051,1.653846,2017.217949,6.987179
std,798538.8,8789462000.0,0.878954,0.886879,0.795111,971.380925,3.48435,0.486412,5.582575,1.77586,0.0,1.348214,1.607146,3.486929,0.226455,0.0,0.0,0.0,0.0,0.0,0.0,1.527525,1.154701,0.755929,,0.60081,0.834747,0.800121,3.188834
min,2016002000.0,12016060000.0,1.0,1.0,1.0,0.0,1.0,1.0,1.0,1.0,3823.0,0.0,0.0,1.0,1.0,7.0,74.0,999.0,73.0,19.0,2.0,1.0,1.0,1.0,,0.0,1.0,2016.0,1.0
25%,2017012000.0,17017570000.0,1.5,2.0,1.0,0.0,5.0,1.0,1.0,2.0,3823.0,0.0,0.0,1.0,1.0,7.0,74.0,999.0,73.0,19.0,2.0,1.5,1.0,1.0,,1.0,1.0,2017.0,4.0
50%,2017088000.0,32017020000.0,3.0,3.0,2.0,1984.0,8.0,1.0,1.0,5.0,3823.0,3.0,2.0,8.0,1.0,7.0,74.0,999.0,73.0,19.0,2.0,2.0,1.0,1.0,,1.0,1.0,2017.0,7.0
75%,2018049000.0,32018030000.0,3.0,3.0,3.0,2003.0,10.75,2.0,6.0,5.0,3823.0,3.0,3.0,8.0,1.0,7.0,74.0,999.0,73.0,19.0,2.0,3.0,2.0,1.0,,2.0,2.0,2018.0,10.0
max,2018098000.0,32018100000.0,3.0,3.0,3.0,2017.0,16.0,2.0,19.0,5.0,3823.0,3.0,8.0,9.0,3.0,7.0,74.0,999.0,73.0,19.0,2.0,4.0,3.0,3.0,,3.0,5.0,2018.0,12.0


**Null conclusion**: 
- all 7,276 rows with `home_natural_area` == 432 in 2013-2014 are problematic; the other years are OK
- all 78 rows with `home_natural_area` == 999 are problematic: `home_natural_area_hebrew` is null in every one of them (they appear in 2016-2018)
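
The null checks above can be condensed into one summary per code and year. The sketch below is only an illustration: `null_label_summary` is a hypothetical helper (not one of the notebook's helper functions), and the toy frame is made-up data, not rows from `involved_hebrew.csv`.

```python
import pandas as pd

def null_label_summary(df, feat_name, year_col="accident_year"):
    """Count rows where the numeric code is present but the Hebrew label is missing."""
    label_col = feat_name + "_hebrew"
    missing = df[df[feat_name].notnull() & df[label_col].isnull()]
    # One output row per (code, year) with the number of unlabeled records
    return (
        missing.groupby([feat_name, year_col])
        .size()
        .reset_index(name="n_missing")
    )

# Toy data for illustration only (labels are placeholders)
toy = pd.DataFrame({
    "home_natural_area": [432, 432, 432, 999, 111],
    "home_natural_area_hebrew": [None, None, "label_a", None, "label_b"],
    "accident_year": [2013, 2014, 2015, 2016, 2016],
})
summary = null_label_summary(toy, "home_natural_area")
print(summary)
```

Run on `i_all`, a table like this would presumably show which codes and years account for the gap between the two non-null counts.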

**Specific values investigation**: 

In [112]:
# Drop the problematic rows found above: null codes, code 999, and code 432 in 2013-2014
bad_432 = (i_all['home_natural_area'] == 432) & i_all['accident_year'].isin([2013, 2014])
temp = i_all[i_all['home_natural_area'].notnull() & (i_all['home_natural_area'] != 999) & ~bad_432]
merge_with_hebrew_print_split_years(temp, 'home_natural_area')

home_natural_area 431.0:
אזור לוד        29743
אזור מודיעין     3854
Name: home_natural_area_hebrew, dtype: int64
Total: 33597
Years אזור לוד: [2008 2009 2010 2011 2012 2013 2014]
Years אזור מודיעין: [2015 2016 2017 2018]



In [114]:
merge_with_hebrew(temp, 'home_natural_area')

Unnamed: 0,index_home_natural_area,count,index_home_natural_area_hebrew
0,111.0,98382,הרי יהודה
1,311.0,98310,אזור חיפה
2,511.0,96473,אזור תל אביב
3,513.0,69396,אזור חולון
4,422.0,65956,אזור פתח תקווה
5,237.0,65203,הרי נצרת-תירען
6,512.0,58992,אזור רמת גן
7,623.0,54773,אזור באר שבע
8,241.0,53796,אזור שפרעם
9,442.0,53752,אזור ראשון לציון


**Specific values conclusions:** `home_natural_area` == 431 was labeled `אזור לוד` through 2014 (inclusive) and `אזור מודיעין` from 2015 onward
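
For readers without the helper cells at hand, the idea behind the split-by-years report can be sketched roughly as follows. This is an assumption about the logic, not the notebook's actual `merge_with_hebrew_print_split_years` implementation; `split_years_by_label` and the toy data are hypothetical.

```python
import pandas as pd

def split_years_by_label(df, feat_name, year_col="accident_year"):
    """For every code that maps to more than one label, list the years of each label."""
    label_col = feat_name + "_hebrew"
    valid = df.dropna(subset=[feat_name, label_col])
    # Number of distinct labels attached to each numeric code
    n_labels = valid.groupby(feat_name)[label_col].nunique()
    result = {}
    for code in n_labels[n_labels > 1].index:
        rows = valid[valid[feat_name] == code]
        result[code] = {
            label: sorted(grp[year_col].unique().tolist())
            for label, grp in rows.groupby(label_col)
        }
    return result

# Toy data for illustration only (labels are placeholders)
toy = pd.DataFrame({
    "home_natural_area": [431, 431, 431, 111, 111],
    "home_natural_area_hebrew": ["old_name", "old_name", "new_name", "same", "same"],
    "accident_year": [2013, 2014, 2015, 2013, 2015],
})
result = split_years_by_label(toy, "home_natural_area")
print(result)
```

Codes with a single label (111 in the toy data) are skipped, so only codes whose label changed appear in the result.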


## home_municipal_status and home_municipal_status_hebrew

In [None]:
# home_municipal_status           1322591 non-null float64
# home_municipal_status_hebrew    1322591 non-null object

**Null conclusion:** no issues

**Specific values investigations**:

In [116]:
merge_with_hebrew(i_all, 'home_municipal_status')

Unnamed: 0,index_home_municipal_status,count,index_home_municipal_status_hebrew
0,0.0,986903,עירייה
1,99.0,232018,מועצה מקומית
2,73.0,6940,מטה בנימין
3,26.0,6034,מטה יהודה
4,72.0,4793,שומרון
5,16.0,4506,עמק חפר
6,8.0,4469,הגלבוע
7,9.0,4099,עמק יזרעאל
8,56.0,4026,משגב
9,4.0,3982,מטה אשר


In [118]:
merge_with_hebrew_print_split_years(i_all, 'home_municipal_status')

home_municipal_status 7.0:
בקעת בית שאן    1003
עמק המעיינות     428
Name: home_municipal_status_hebrew, dtype: int64
Total: 1431
Years בקעת בית שאן: [2008 2009 2010 2011 2012 2013 2014]
Years עמק המעיינות: [2015 2016 2017 2018]



**Specific values conclusions:** `home_municipal_status` == 7 was labeled `בקעת בית שאן` in `home_municipal_status_hebrew` through 2014 (inclusive) and `עמק המעיינות` from 2015 onward

## home_yishuv_shape and home_yishuv_shape_hebrew

In [None]:
# home_yishuv_shape               1328543 non-null float64
# home_yishuv_shape_hebrew        1328540 non-null object

In [46]:
i_all[i_all['home_yishuv_shape'].notnull() & i_all['home_yishuv_shape_hebrew'].isnull()]

Unnamed: 0,accident_id,provider_and_id,provider_code,file_type_police,involved_type,involved_type_hebrew,license_acquiring_date,age_group,age_group_hebrew,sex,sex_hebrew,vehicle_type,vehicle_type_hebrew,safety_measures,safety_measures_hebrew,involve_yishuv_symbol,involve_yishuv_name,injury_severity,injury_severity_hebrew,injured_type,injured_type_hebrew,injured_position,injured_position_hebrew,population_type,population_type_hebrew,home_region,home_region_hebrew,home_district,home_district_hebrew,home_natural_area,home_natural_area_hebrew,home_municipal_status,home_municipal_status_hebrew,home_yishuv_shape,home_yishuv_shape_hebrew,hospital_time,hospital_time_hebrew,medical_type,medical_type_hebrew,release_dest,release_dest_hebrew,safety_measures_use,safety_measures_use_hebrew,late_deceased,late_deceased_hebrew,car_id,involve_id,accident_year,accident_month
200167,2012048742,12012048742,1,,2,נהג נפגע,1996,7,30-34,2.0,נקבה,1.0,רכב נוסעים פרטי,5,לא ידוע,1939,"אזור באר שבע של""ש",2,פצוע קשה,2,נהג - רכב בעל 4 גלגלים ויותר,1,ישב ברכב במושב קדמי,1,יהודים,6.0,הדרום,62,באר שבע,623.0,אזור באר שבע,,,53.0,,2.0,\t אשפוז מעל 24 שעות,1.0,\t( קל (1-8\t,1.0,\t בית\t,1.0,\t כן\t,,,1,2,2012,1
200168,2012048742,12012048742,1,,3,נפגע,0,1,00-04,1.0,זכר,1.0,רכב נוסעים פרטי,5,לא ידוע,1939,"אזור באר שבע של""ש",2,פצוע קשה,3,נוסע - רכב בעל 4 גלגלים ויותר,2,ישב ברכב במושב אחורי,1,יהודים,6.0,הדרום,62,באר שבע,623.0,אזור באר שבע,,,53.0,,2.0,\t אשפוז מעל 24 שעות,1.0,\t( קל (1-8\t,1.0,\t בית\t,1.0,\t כן\t,,,1,5,2012,1
200169,2012048742,12012048742,1,,3,נפגע,0,1,00-04,2.0,נקבה,1.0,רכב נוסעים פרטי,5,לא ידוע,1939,"אזור באר שבע של""ש",2,פצוע קשה,3,נוסע - רכב בעל 4 גלגלים ויותר,2,ישב ברכב במושב אחורי,1,יהודים,6.0,הדרום,62,באר שבע,623.0,אזור באר שבע,,,53.0,,2.0,\t אשפוז מעל 24 שעות,2.0,\t(בינוני (9-15\t,1.0,\t בית\t,1.0,\t כן\t,,,1,6,2012,1


In [47]:
i_all[i_all['home_yishuv_shape'] == 53].shape

(274, 49)

In [48]:
i_all[(i_all['home_yishuv_shape'] == 53) & (i_all['accident_year'] == 2012)].shape

(3, 49)

**Null conclusion:** the only rows with a code but no Hebrew label are the 3 rows with `home_yishuv_shape` == 53 in 2012; the other 271 rows with this code (in other years) are OK

**Specific values investigation:**

In [122]:
merge_with_hebrew(i_all, 'home_yishuv_shape')

Unnamed: 0,index_home_yishuv_shape,count,index_home_yishuv_shape_hebrew
0,14.0,220801,
1,16.0,215386,
2,13.0,162232,
3,15.0,126211,
4,12.0,86063,
5,27.0,83770,
6,26.0,78395,
7,28.0,52499,
8,17.0,48057,
9,11.0,45890,ירושלים


In [121]:
merge_with_hebrew_print_split_years(i_all, 'home_yishuv_shape')

home_yishuv_shape 14.0:
100,000-199,999-תושב, יישוב יהודי       129139
יישובים יהודיים 199999-100000 תושבים     91662
Name: home_yishuv_shape_hebrew, dtype: int64
Total: 220801
Years 100,000-199,999-תושב, יישוב יהודי: [2008 2009 2010 2011 2012]
Years יישובים יהודיים 199999-100000 תושבים: [2013 2014 2015 2016 2017 2018]

home_yishuv_shape 16.0:
20,000-49,999-תושב, יישוב יהודי       111529
יישובים יהודיים 49999-20000 תושבים    103857
Name: home_yishuv_shape_hebrew, dtype: int64
Total: 215386
Years 20,000-49,999-תושב, יישוב יהודי: [2008 2009 2010 2011 2012]
Years יישובים יהודיים 49999-20000 תושבים: [2013 2014 2015 2016 2017 2018]

home_yishuv_shape 13.0:
יישובים יהודיים 499999-200000 תושבים    136638
חיפה                                     25594
Name: home_yishuv_shape_hebrew, dtype: int64
Total: 162232
Years יישובים יהודיים 499999-200000 תושבים: [2013 2014 2015 2016 2017 2018]
Years חיפה: [2008 2009 2010 2011 2012]

home_yishuv_shape 15.0:
יישובים יהודיים 99999-50000 תושבים    69733
50,

**Specific values conclusions:** See above: many `home_yishuv_shape` labels changed between 2012 and 2013. Besides that, for `home_yishuv_shape` == 9 there seems to be a mistake in 1 instance in 2015.
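
If one label per code is needed downstream, one possible approach is to map every code to the label used in its most recent year. The sketch below assumes that the newest label is the preferred one, which the data itself does not confirm; `latest_label_per_code` and the toy data are hypothetical.

```python
import pandas as pd

def latest_label_per_code(df, feat_name, year_col="accident_year"):
    """Map each numeric code to the label it carries in its most recent year."""
    label_col = feat_name + "_hebrew"
    valid = df.dropna(subset=[feat_name, label_col])
    # Stable sort by year, so the last row within each code holds the newest label
    newest = (
        valid.sort_values(year_col, kind="mergesort")
        .groupby(feat_name)[label_col]
        .last()
    )
    return newest.to_dict()

# Toy data for illustration only (labels are placeholders)
toy = pd.DataFrame({
    "home_yishuv_shape": [14, 14, 14, 16],
    "home_yishuv_shape_hebrew": ["old_14", "new_14", "new_14", "only_16"],
    "accident_year": [2012, 2013, 2018, 2010],
})
mapping = latest_label_per_code(toy, "home_yishuv_shape")
canonical = toy["home_yishuv_shape"].map(mapping)
print(mapping)
```

A stray record in the final year would win under this rule, so taking the most frequent label within the latest year may be the safer variant.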

## hospital_time and hospital_time_hebrew

In [123]:
# hospital_time                   53022 non-null float64
# hospital_time_hebrew            53022 non-null object

**Null conclusion:** no issues

**Specific values investigation:**

In [125]:
calc_diff_counts_hebrew(i_all,'hospital_time')

Shape of data: (53022, 49)


0    0
1    0
dtype: int64

**Specific values conclusion:** no issues

## medical_type and medical_type_hebrew

In [126]:
# medical_type                    53022 non-null float64
# medical_type_hebrew             53022 non-null object

**Null conclusion:** no issues

**Specific values investigation:**

In [130]:
calc_diff_counts_hebrew(i_all, 'medical_type')

Shape of data: (53022, 49)


0    0
1    0
2    0
3    0
4    0
dtype: int64

**Specific values conclusion:** no issues

## release_dest and release_dest_hebrew

In [127]:
# release_dest                    53022 non-null float64
# release_dest_hebrew             53022 non-null object

**Null conclusion:** no issues

**Specific values investigation:**

In [131]:
calc_diff_counts_hebrew(i_all, 'release_dest')

Shape of data: (53022, 49)


0    0
1    0
2    0
3    0
4    0
dtype: int64

**Specific values conclusion:** no issues

## safety_measures_use and safety_measures_use_hebrew

In [128]:
# safety_measures_use             58180 non-null float64
# safety_measures_use_hebrew      58180 non-null object

**Null conclusion:** no issues

**Specific values investigation:**

In [132]:
calc_diff_counts_hebrew(i_all, 'safety_measures_use')

Shape of data: (58180, 49)


0    0
1    0
2    0
dtype: int64

**Specific values conclusion:** no issues

## late_deceased and late_deceased_hebrew

In [129]:
# late_deceased                   4013 non-null float64
# late_deceased_hebrew            4013 non-null object

**Null conclusion:** no issues

**Specific values investigation:**

In [134]:
calc_diff_counts_hebrew(i_all, 'late_deceased')

Shape of data: (4013, 49)


0    0
1    0
dtype: int64

**Specific values conclusion:** no issues