# San Francisco Crime Classification from a top ranker

This notebook is based on Yannis Pappas's kernel (https://www.kaggle.com/yannisp/sf-crime-analysis-prediction).

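## Data Science Life Cycle
The data science life cycle consists of the steps below, and this competition notebook follows the full cycle:
1. Data wrangling to improve data quality
2. Exploratory data analysis (EDA)
3. Feature engineering to derive additional features from the existing ones
4. Data normalization and transformation (if needed)
5. Creating training and test data for measuring model performance, and tuning parameters
6. Model selection and evaluation, and building the model used for the final predictions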

In [1]:
# Standard library
import os
import re
import shutil
import urllib.request
import zipfile

# Data handling and geospatial
import geopandas as gpd
import numpy as np
import pandas as pd
from shapely.geometry import Point

# Plotting
import contextily as ctx
import geoplot as gplt
import matplotlib.pyplot as plt
import seaborn as sns
from matplotlib import cm

# Modeling
import lightgbm as lgb
from lightgbm import LGBMClassifier
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder

# Model interpretation
import eli5
import shap
from eli5.sklearn import PermutationImportance
from pdpbox import pdp, get_dataset, info_plots

In [2]:
train = pd.read_csv('data/train.csv', parse_dates=['Dates'])
test = pd.read_csv('data/test.csv', parse_dates=['Dates'], index_col='Id')

In [3]:
train.Dates.describe()

count                  878049
unique                 389257
top       2011-01-01 00:01:00
freq                      185
first     2003-01-06 00:01:00
last      2015-05-13 23:53:00
Name: Dates, dtype: object

In [4]:
train.shape

(878049, 9)

The training data covers crimes from 2003-01-06 through 2015-05-13 and has 9 columns in total, including the target `Category`.
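The `first` and `last` rows in the `describe()` output come from an older pandas release; pandas 2.0 and later summarize a datetime column numerically (mean, min, quantiles, max) instead. A minimal sketch of a version-independent check is shown here. It assumes a hypothetical helper name, `summarize_dates`, and uses five rows taken from this notebook's `train.head()` output as a stand-in for `data/train.csv`, so the numbers it produces describe the sample only, not the full data set.

```python
import io

import pandas as pd

# Five rows copied from the notebook's train.head() output.
# This is a stand-in sample; the real data lives in data/train.csv.
sample_csv = """Dates,Category,Descript,DayOfWeek,PdDistrict,Resolution,Address,X,Y
2015-05-13 23:53:00,WARRANTS,WARRANT ARREST,Wednesday,NORTHERN,"ARREST, BOOKED",OAK ST / LAGUNA ST,-122.425892,37.774599
2015-05-13 23:53:00,OTHER OFFENSES,TRAFFIC VIOLATION ARREST,Wednesday,NORTHERN,"ARREST, BOOKED",OAK ST / LAGUNA ST,-122.425892,37.774599
2015-05-13 23:33:00,OTHER OFFENSES,TRAFFIC VIOLATION ARREST,Wednesday,NORTHERN,"ARREST, BOOKED",VANNESS AV / GREENWICH ST,-122.424363,37.800414
2015-05-13 23:30:00,LARCENY/THEFT,GRAND THEFT FROM LOCKED AUTO,Wednesday,NORTHERN,NONE,1500 Block of LOMBARD ST,-122.426995,37.800873
2015-05-13 23:30:00,LARCENY/THEFT,GRAND THEFT FROM LOCKED AUTO,Wednesday,PARK,NONE,100 Block of BRODERICK ST,-122.438738,37.771541
"""

sample = pd.read_csv(io.StringIO(sample_csv), parse_dates=['Dates'])


def summarize_dates(dates):
    """Return count, distinct count, earliest and latest timestamp.

    Hypothetical helper (not part of the original notebook) that avoids
    relying on the version-dependent layout of Series.describe().
    """
    return {
        'count': int(dates.count()),
        'unique': int(dates.nunique()),
        'first': dates.min(),
        'last': dates.max(),
    }


summary = summarize_dates(sample['Dates'])
print(summary)
print(sample.shape)  # (rows, columns) of the sample
```

Calling `summarize_dates(train.Dates)` on the full training frame should reproduce the date range quoted above, whichever pandas version is installed.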

In [5]:
train.head()

,Dates,Category,Descript,DayOfWeek,PdDistrict,Resolution,Address,X,Y
0,2015-05-13 23:53:00,WARRANTS,WARRANT ARREST,Wednesday,NORTHERN,"ARREST, BOOKED",OAK ST / LAGUNA ST,-122.425892,37.774599
1,2015-05-13 23:53:00,OTHER OFFENSES,TRAFFIC VIOLATION ARREST,Wednesday,NORTHERN,"ARREST, BOOKED",OAK ST / LAGUNA ST,-122.425892,37.774599
2,2015-05-13 23:33:00,OTHER OFFENSES,TRAFFIC VIOLATION ARREST,Wednesday,NORTHERN,"ARREST, BOOKED",VANNESS AV / GREENWICH ST,-122.424363,37.800414
3,2015-05-13 23:30:00,LARCENY/THEFT,GRAND THEFT FROM LOCKED AUTO,Wednesday,NORTHERN,NONE,1500 Block of LOMBARD ST,-122.426995,37.800873
4,2015-05-13 23:30:00,LARCENY/THEFT,GRAND THEFT FROM LOCKED AUTO,Wednesday,PARK,NONE,100 Block of BRODERICK ST,-122.438738,37.771541
